# Machine Learning

We offer solutions for fund managers looking to unlock the potential of big data and machine learning. Data analytics is unmistakenly a primary driver for next-generation risk- and asset management.

A competitive advantage can be gained in the following areas: portfolio hedging, credit analysis, portfolio liquidity, …

We combine our insights in data analysis and the strength of rigorous mathematical models to deliver pragmatic solutions on your desk.

Below you can find two examples of the application of modern statistical methods on financial data.

### Portfolio as a Network

In 1928 Jacob Moreno founded the discipline of social network analysis. This is a branch of sociology deals with the quantitative evaluation of an individual’s role in a group. The visualisation of the networks were called **sociograms**. In the financial sector these concepts have been applied in **fraud analytics** **[1]**.

In a network graph (\(G\)) nodes (\(v\)) are connected with edges. In a portfolio-context, a node could for example be a representative for a particular stock holding. The edge represents the relationship between two holdings. The distance between two nodes \(y\) and \(v\) is \(d(y,v)\).

Two different network graphs in portfolio analysis are possible :

- Undirected graphs: each node is a stock, a connexion between two nodes is established if they belong for example to the same sector.
- Directed graphs: each node points to another node. In a portfolio network, the price performance of a stock might be leading the price performance of another holding (lagged correlation).

The geography of a network is measured in a couple of **metrics**:

**Degree**\(D(v)\): Number of connections of node \(v\) with other nodes.**Closeness**\(C(V)\): \(\frac{\sum_y d(y,v)}{N_v}\) where \(N_v\) is the number of nodes that can be reached by node \(v\).**Betweenness**\(B(v) \) is the total number times a node \(v \) is on the shortest path connecting a pair \((x, y)\). This measure informs us about the number of times a node acts a bridge between the other nodes.

**Case Study**

As a case study we considered the portfolio of all the 30 stocks in the **DAX index** and calculated the correlation between each of the components of this index using a 2 week lag. Using betweenness as risk measure, one is able to determine those stocks that are most influential in the German stock market.

#### Bart Baesens, Veronique Van Vlasselaer, Wouter Verbeke (2015) Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection (Wiley and SAS Business Series)

### Clustering Techniques

**Introduction**

Developing an appropriate hedging strategy is key to every investment manager. Some components of the portfolio will prove more difficult to hedge than others. Just imagine you are running a portfolio of convertible bonds. Any increase in the credit spread \(CS\) will have a negative impact on the portfolio. Hedging with **credit default swaps** is very often not going to be a feasible solution. Very often no investment bank will make you market in these kind of instruments. The ultimate hedging vehicle is then to hedge the credit risk with a short position in the underlying shares. This is driven by the view that a weakness in the share price \(S\) matches an increase in the credit spread \(CS\). In practice portfolio manager assumes the following link between \(S\) and \(CS\):

\(

\begin{equation}

CS = CS_0 \times \left(\frac{S_0}{S} \right)^E

\end{equation}

\).

The coefficient \(E\) quantifies the relationship between the credit spreads and the share price levels. \(S_0\) and \(CS_0\) represent respectively the current share price and credit spread level (bps).

**Two Clusters**

The figure above illustrates how a global fit between between credit spreads and stock price levels is not possible. In this example, there were clearly two different regimes at work: *low share prices / high credit spreads * and *high share prices / low credit credit spreads*. The presence of these two clusters, does not permit the use of a global fit using a single coefficient \(E\) that is valid for all the share price levels.

Datamining tools such as **K-Means Clustering **, will alleviate the task of the portfolio manager. These algorithms can be put at work to spot the presence of **clusters**. Once these clusters are located, ordinary least squares determine for each of the clusters the link between equity (\(S\)) and credit (\(CS\)). In stead of one global fit, we are now able to work with two local fits. One of these models is valid when the bond slips for example into distressed territory.