### Risk Management Solutions

February 15, 2017

# Putting Jaccard at Work

### Introduction

In classical portfolio optimization the aim is to construct a portfolio of assets that offers the highest expected return for a given level of risk. Within the classical Markowitz model, the efficient frontier is the set of such optimal portfolios, from which a rational investor picks the portfolio corresponding to the desired risk level. This risk is quantified as a volatility or possibly as an expected shortfall. Aggregating the risk across all assets in which the investor can invest leads to a covariance matrix. The other elements driving this asset allocation model are the expected returns. In a Black-Litterman model these expected returns are a blend of equilibrium returns and the investor's views on the relative or absolute returns of the different underlying assets.

### Diversification

Decades of academic studies since Harry Markowitz (1959) and William Sharpe (1964) have explained why diversification should play a key role in a portfolio's asset allocation. Choueifaty and Coignard [1] proposed a measure of portfolio diversification called the Diversification Ratio ($DR$), defined as the ratio of the portfolio's weighted average volatility to its overall volatility. Consider for example an equally weighted portfolio of two independent assets with the same volatility $\sigma$. Its weighted average volatility is $\sigma$, while its overall volatility is only $\sigma/\sqrt{2}$, so the $DR$ of this two-asset portfolio equals $\sqrt{2}$. For an equally weighted portfolio of $N$ independent assets sharing the same volatility, we have $DR=\sqrt{N}$.
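A minimal Python sketch (assuming NumPy) of the diversification ratio as defined above, checking the $\sqrt{N}$ result numerically. The function name and the 20% volatility are illustrative assumptions; any common volatility gives the same ratio.

```python
import numpy as np


def diversification_ratio(weights, cov):
    """Weighted average asset volatility divided by the portfolio volatility."""
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    asset_vols = np.sqrt(np.diag(cov))       # sigma_i of every asset
    weighted_avg_vol = w @ asset_vols        # sum_i w_i * sigma_i
    portfolio_vol = np.sqrt(w @ cov @ w)     # sqrt(w' Sigma w)
    return weighted_avg_vol / portfolio_vol


sigma = 0.20  # illustrative common volatility (assumption)
for n in (2, 4, 9):
    cov = sigma**2 * np.eye(n)   # independent assets: diagonal covariance matrix
    w = np.full(n, 1.0 / n)      # equally weighted portfolio
    print(n, diversification_ratio(w, cov), np.sqrt(n))
```

With perfectly correlated assets (a covariance matrix whose entries all equal $\sigma^2$) the same function returns 1, i.e. no diversification benefit at all.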

The $DR$ is a purely quantitative measure. Qualitative elements, such as the fact that assets belong to the same industry or have the same geographical focus, are not directly taken into account. This is the starting point of this white paper on portfolio diversification, in which we study the diversification of a convertible bond portfolio in a somewhat different way. Moving away from traditional risk measures based on correlation and volatility, we introduce the Jaccard distance as a measure of how (dis)similar two convertible bonds are. The concept can easily be extended to corporate bonds or other hybrid financial instruments.

Starting from a set of 10 convertible bonds, we illustrate to what extent these bonds are similar and how to quantify this similarity. The similarity is no longer expressed through a numerical measure such as a correlation. Instead, we rely on the attributes of the convertible bonds and check whether the bonds share too many properties. The more features two bonds share, the more similar they are considered and the less they contribute to the portfolio's diversification. As a consequence, a portfolio whose constituents share the same attributes is less diversified. This is where the Jaccard distance comes into the picture. For complex instruments this approach can be beneficial, as it is often very difficult to reduce the complex nature of these instruments to a set of numerical attributes such as yield to put, convexity or delta.

[1] Choueifaty, Yves, and Yves Coignard. “Toward Maximum Diversification.” Journal of Portfolio Management, Vol. 34, No. 4 (2008), pp. 40–51

### The Jaccard similarity coefficient

The Jaccard distance measures the dissimilarity between the properties of two elements, in this case two convertible bonds. When two bonds share exactly the same attributes, their Jaccard distance is zero; in the most extreme scenario, where they have no attribute in common, it equals one. The Jaccard similarity coefficient is defined as one minus the Jaccard distance. It is therefore one when both bonds share the same properties and zero when the two bonds do not have a single property in common.

To explain how the Jaccard distance is computed we take a look at a small sample of 10 convertible bonds. The following questions are used to obtain the properties of the bonds:

• Does the convertible bond have a coupon?
• Is it a perpetual?
• Does the issuer have a rating?
• Is the convertible bond investment grade?
• Is the convertible bond balanced (delta between 40 and 60)?
• Does the convertible bond have a soft call?
• Does the convertible bond have a hard call?
• Is the bond puttable?
• Is the bond a cross-currency convertible bond?

The first table describes the 10 bonds. A 1 in a column means that the answer to the question is yes, and a 0 means that the answer is no.

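| Bond | Has Coupon | Is Perpetual | Is Rated | Is Investment Grade | Is Balanced | Has Soft Call | Has Hard Call | Has Put | Is Cross Currency |
|---|---|---|---|---|---|---|---|---|---|
| IDRSM 1.25% 2023 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| MTIM 5.75% 2019 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| SONPL 1.625% 2019 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| DAMMES 1% 2023 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| HDDGR 8.5% 2017 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| OSIFP 3.25% 2021 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| WHANA 1% 2019 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| VILMIR 5.75% 2018 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| SAFFP 0% 2020 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ORPAR 0% 2019 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |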

The Jaccard similarity coefficient of two bonds $A$ and $B$ is the number of properties that both bonds have in common, divided by the number of properties that at least one of the bonds has: $J(A,B)=\frac{|A\cap B|}{|A\cup B|}$. If neither bond has a certain property, this property is not taken into account in the calculation. As an example, we calculate the Jaccard similarity coefficient of the first two bonds. Both bonds have a coupon, are balanced and have a soft call, so they have three properties in common. Adding the put of IDRSM 1.25% 2023, four properties are held by at least one of the two bonds. Hence we obtain a Jaccard similarity coefficient of $\frac{3}{4}=0.75$.
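The pairwise calculation above can be sketched in Python on the 0/1 rows of the attribute table. The helper names are illustrative, and returning 1.0 when neither bond has any property is our own convention, since the text does not cover that case.

```python
# Column order of the attribute table
ATTRIBUTES = ["Has Coupon", "Is Perpetual", "Is Rated", "Is Investment Grade", "Is Balanced",
              "Has Soft Call", "Has Hard Call", "Has Put", "Is Cross Currency"]


def jaccard_similarity(a, b):
    """|A and B| / |A or B| for two 0/1 attribute vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # properties shared by both bonds
    either = sum(1 for x, y in zip(a, b) if x or y)   # properties held by at least one bond
    if either == 0:
        # Assumption: two bonds without any property are treated as identical.
        return 1.0
    return both / either


def jaccard_distance(a, b):
    return 1.0 - jaccard_similarity(a, b)


idrsm = [1, 0, 0, 0, 1, 1, 0, 1, 0]  # IDRSM 1.25% 2023
mtim = [1, 0, 0, 0, 1, 1, 0, 0, 0]   # MTIM 5.75% 2019

shared = [name for name, x, y in zip(ATTRIBUTES, idrsm, mtim) if x and y]
print(shared)                           # ['Has Coupon', 'Is Balanced', 'Has Soft Call']
print(jaccard_similarity(idrsm, mtim))  # 0.75
print(jaccard_distance(idrsm, mtim))    # 0.25
```

Properties that neither bond has (a 0 in both rows) never enter either count, which is exactly the rule stated above.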

Along the same lines, one can construct a symmetric $10 \times 10$ matrix containing the Jaccard similarity coefficients of all pairs of bonds.
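A vectorised sketch (assuming NumPy) that builds the full similarity matrix from the attribute table, using $|A\cup B| = |A| + |B| - |A\cap B|$. Encoding each bond as a 9-character string of 0/1 flags is purely for compactness.

```python
import numpy as np

# 0/1 answers from the attribute table, one 9-character string per bond, in column order:
# coupon, perpetual, rated, investment grade, balanced, soft call, hard call, put, cross-currency
BONDS = {
    "IDRSM 1.25% 2023": "100011010",
    "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000",
    "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000",
    "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000",
    "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000",
    "ORPAR 0% 2019": "000011000",
}

names = list(BONDS)
X = np.array([[int(c) for c in BONDS[name]] for name in names])  # 10 x 9 matrix of 0/1 flags

intersection = X @ X.T                                     # properties shared by each pair of bonds
counts = X.sum(axis=1)                                     # number of properties of each bond
union = counts[:, None] + counts[None, :] - intersection   # properties held by at least one bond
S = intersection / union  # every bond has at least one property, so union > 0

print(np.round(S, 4))
```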

A network consists of nodes and edges, and we can extend this analogy to a portfolio. In this particular case, the convertible bonds are the nodes, and edges connect them. The connection between two bonds describes the link between these two instruments. This connectivity is described by an adjacency matrix ($A$). Whenever element $(i, j)$ of the adjacency matrix is larger than zero, nodes $i$ and $j$ are connected by an edge, and this edge carries a weight equal to the corresponding entry $A(i,j)$. Because the Jaccard similarity is symmetric, this is an undirected network, unlike a directed network, in which connections are described by arrows pointing from node $i$ to node $j$.
We can construct an adjacency matrix using the Jaccard similarity coefficients. The connection between two bonds is specified by their similarity. To make it more explicit which bonds have similar properties, we decided to set all Jaccard similarity coefficients smaller than $0.5$ equal to $0$.

Jaccard similarity coefficients of the 10 bonds (columns in the same order as the rows). In the adjacency matrix $A$, all entries below $0.5$ are set to $0$:

| Bond | IDRSM | MTIM | SONPL | DAMMES | HDDGR | OSIFP | WHANA | VILMIR | SAFFP | ORPAR |
|---|---|---|---|---|---|---|---|---|---|---|
| IDRSM 1.25% 2023 | 1.0000 | 0.7500 | 0.5000 | 0.6000 | 0.7500 | 0.7500 | 0.5000 | 0.5000 | 0.2500 | 0.5000 |
| MTIM 5.75% 2019 | 0.7500 | 1.0000 | 0.6667 | 0.4000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.3333 | 0.6667 |
| SONPL 1.625% 2019 | 0.5000 | 0.6667 | 1.0000 | 0.5000 | 0.6667 | 0.6667 | 1.0000 | 1.0000 | 0.5000 | 0.3333 |
| DAMMES 1% 2023 | 0.6000 | 0.4000 | 0.5000 | 1.0000 | 0.4000 | 0.4000 | 0.5000 | 0.5000 | 0.2500 | 0.2000 |
| HDDGR 8.5% 2017 | 0.7500 | 1.0000 | 0.6667 | 0.4000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.3333 | 0.6667 |
| OSIFP 3.25% 2021 | 0.7500 | 1.0000 | 0.6667 | 0.4000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.3333 | 0.6667 |
| WHANA 1% 2019 | 0.5000 | 0.6667 | 1.0000 | 0.5000 | 0.6667 | 0.6667 | 1.0000 | 1.0000 | 0.5000 | 0.3333 |
| VILMIR 5.75% 2018 | 0.5000 | 0.6667 | 1.0000 | 0.5000 | 0.6667 | 0.6667 | 1.0000 | 1.0000 | 0.5000 | 0.3333 |
| SAFFP 0% 2020 | 0.2500 | 0.3333 | 0.5000 | 0.2500 | 0.3333 | 0.3333 | 0.5000 | 0.5000 | 1.0000 | 0.5000 |
| ORPAR 0% 2019 | 0.5000 | 0.6667 | 0.3333 | 0.2000 | 0.6667 | 0.6667 | 0.3333 | 0.3333 | 0.5000 | 1.0000 |
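Thresholding the similarity matrix above gives the weighted adjacency matrix $A$. A minimal sketch, assuming NumPy and the networkx library; removing the ones on the diagonal (no self-loops) is our assumption, as the text does not say how the diagonal is treated.

```python
import numpy as np
import networkx as nx

BONDS = {  # 0/1 attribute strings in the column order of the attribute table
    "IDRSM 1.25% 2023": "100011010", "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000", "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000", "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000", "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000", "ORPAR 0% 2019": "000011000",
}

names = list(BONDS)
X = np.array([[int(c) for c in BONDS[name]] for name in names])
inter = X @ X.T
counts = X.sum(axis=1)
S = inter / (counts[:, None] + counts[None, :] - inter)  # Jaccard similarity matrix

THRESHOLD = 0.5
A = np.where(S >= THRESHOLD, S, 0.0)  # similarities smaller than 0.5 are set to 0
np.fill_diagonal(A, 0.0)              # assumption: no self-loops

# Undirected graph: an edge wherever A(i, j) > 0, with edge attribute "weight" = A(i, j)
G = nx.relabel_nodes(nx.from_numpy_array(A), dict(enumerate(names)))

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.neighbors("IDRSM 1.25% 2023")))
```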

### Analysing the Network

A traditional risk analysis of a portfolio offers an extensive set of quantitative risk measures such as a correlation matrix, volatility, VaR, the Sortino ratio, etc. In an approach where we reduce the portfolio to a graph of connected nodes, different metrics are used to describe the network. We distinguish between metrics of the network and metrics of the nodes.

#### Metrics of the network:

• Density of a network
The density $d$ of a network is the ratio of the number of edges to the number of possible edges: the more nodes are connected with each other, the higher the density. For an undirected graph with $n$ nodes and $m$ edges, the density $d$ is defined as:
$$d = \frac{2m}{n(n-1)}$$
• Clusters
Clustering techniques partition the network into clusters. Within a cluster, nodes are strongly connected to each other, while their connections to nodes outside the cluster are weaker.
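The density formula can be checked on the thresholded bond network with the sketch below (NumPy and networkx assumed; `build_graph` is a hypothetical helper that thresholds the Jaccard similarities at 0.5). The white paper does not name its clustering algorithm, so the sketch swaps in networkx's weighted greedy modularity maximisation as a stand-in; its clusters need not coincide with the cluster labels reported for the bonds.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

BONDS = {  # 0/1 attribute strings in the column order of the attribute table
    "IDRSM 1.25% 2023": "100011010", "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000", "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000", "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000", "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000", "ORPAR 0% 2019": "000011000",
}


def build_graph(bonds, threshold=0.5):
    """Hypothetical helper: thresholded Jaccard-similarity graph, edge weight = similarity."""
    names = list(bonds)
    X = np.array([[int(c) for c in bonds[name]] for name in names])
    inter = X @ X.T                                           # shared properties per pair
    counts = X.sum(axis=1)                                    # properties per bond
    S = inter / (counts[:, None] + counts[None, :] - inter)   # Jaccard similarity
    A = np.where(S >= threshold, S, 0.0)                      # below threshold -> no edge
    np.fill_diagonal(A, 0.0)                                  # assumption: no self-loops
    return nx.relabel_nodes(nx.from_numpy_array(A), dict(enumerate(names)))


G = build_graph(BONDS)
n, m = G.number_of_nodes(), G.number_of_edges()
density = 2 * m / (n * (n - 1))   # d = 2m / (n(n-1)) for an undirected graph
print(n, m, round(density, 4))

# Stand-in clustering (not necessarily the paper's method): weighted greedy modularity
clusters = greedy_modularity_communities(G, weight="weight")
for k, members in enumerate(clusters):
    print(k, sorted(members))
```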

#### Metrics of the nodes:

• Betweenness centrality
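The betweenness centrality of a node $i$ is the fraction of the shortest paths between pairs of other nodes that pass through node $i$. Denoting by $g_{st}$ the number of shortest paths between nodes $s$ and $t$, and by $g_{st}(i)$ the number of those paths passing through $i$, the normalised betweenness centrality in a network with $n$ nodes is:
$$B(i) = \frac{2}{(n-1)(n-2)}\sum_{s<t,\; s,t \neq i}\frac{g_{st}(i)}{g_{st}}$$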
• Degree
The degree of a node $i$ is the fraction of all other nodes that are directly connected to node $i$.
• Closeness centrality
The shortest distance between nodes $i$ and $j$ is denoted as $d(i,j)$. In a network with $n$ nodes, the closeness centrality $C(i)$ of node $i$ is defined as:
$$C(i) = \frac{n-1}{\sum_{j \neq i}d(i,j)}$$
The farther a node is from the other nodes, the lower its closeness centrality.
As an example, we calculate the closeness $C(1)$ of the first bond in our sample (IDRSM 1.25% 2023), using the adjacency matrix in which all Jaccard similarities below 0.5 are set equal to 0. Starting from this bond, 8 of the 9 remaining bonds can be reached directly, so their distance to the first bond equals one. The shortest path to the ninth bond (SAFFP 0% 2020) has length two, for instance via SONPL 1.625% 2019. The sum of the distances is therefore $8 \times 1 + 2 = 10$, and we obtain a closeness $C(1)=\frac{10-1}{10}=0.9$. The table below lists the betweenness and closeness centrality of all nodes, together with the cluster to which each bond has been assigned.
| Bond | Betweenness | Closeness | Cluster |
|---|---|---|---|
| IDRSM 1.25% 2023 | 0.0653 | 0.9000 | 0 |
| SONPL 1.625% 2019 | 0.0579 | 0.9000 | 2 |
| WHANA 1% 2019 | 0.0579 | 0.9000 | 2 |
| VILMIR 5.75% 2018 | 0.0579 | 0.9000 | 2 |
| ORPAR 0% 2019 | 0.0278 | 0.6923 | 1 |
| MTIM 5.75% 2019 | 0.0167 | 0.8182 | 1 |
| HDDGR 8.5% 2017 | 0.0167 | 0.8182 | 1 |
| OSIFP 3.25% 2021 | 0.0167 | 0.8182 | 1 |
| SAFFP 0% 2020 | 0.0167 | 0.6429 | 2 |
| DAMMES 1% 2023 | 0.0000 | 0.6429 | 0 |
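The betweenness and closeness values in the table above can be reproduced with networkx under the following assumptions: unweighted shortest paths (each edge counts as one hop), networkx's default normalisation of betweenness by the $(n-1)(n-2)/2$ pairs of other nodes, and no self-loops. `build_graph` is a hypothetical helper that thresholds the Jaccard similarities at 0.5.

```python
import numpy as np
import networkx as nx

BONDS = {  # 0/1 attribute strings in the column order of the attribute table
    "IDRSM 1.25% 2023": "100011010", "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000", "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000", "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000", "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000", "ORPAR 0% 2019": "000011000",
}


def build_graph(bonds, threshold=0.5):
    """Hypothetical helper: thresholded Jaccard-similarity graph, edge weight = similarity."""
    names = list(bonds)
    X = np.array([[int(c) for c in bonds[name]] for name in names])
    inter = X @ X.T
    counts = X.sum(axis=1)
    S = inter / (counts[:, None] + counts[None, :] - inter)
    A = np.where(S >= threshold, S, 0.0)
    np.fill_diagonal(A, 0.0)  # assumption: no self-loops
    return nx.relabel_nodes(nx.from_numpy_array(A), dict(enumerate(names)))


G = build_graph(BONDS)

# Unweighted metrics: the "weight" attribute is ignored, every edge counts as one hop
betweenness = nx.betweenness_centrality(G)  # normalised by the (n-1)(n-2)/2 pairs of other nodes
closeness = nx.closeness_centrality(G)      # (n-1) / sum of shortest-path lengths
degree = nx.degree_centrality(G)            # share of the other n-1 nodes directly connected

for name in sorted(G, key=betweenness.get, reverse=True):
    print(f"{name:<18} {betweenness[name]:.4f} {closeness[name]:.4f} {degree[name]:.4f}")
```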

### The Graph

The adjacency matrix is the basis from which all the different metrics are calculated. We can also make a graphical representation of the network, where the bonds are the nodes and the value of the Jaccard similarity determines whether two nodes are connected. In our example, the color of a node indicates the cluster to which it belongs, and the size of a node reflects its betweenness centrality: the higher the betweenness, the larger the node.
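[Figure: network of the 10 convertible bonds; node color indicates the cluster, node size the betweenness centrality.]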

Building further on this approach, we now add a new bond to the portfolio of 10 bonds. We investigate how the diversification of the portfolio is impacted by this new bond. The bond has the following properties:
| Bond | Has Coupon | Is Perpetual | Is Rated | Is Investment Grade | Is Balanced | Has Soft Call | Has Hard Call | Has Put | Is Cross Currency |
|---|---|---|---|---|---|---|---|---|---|
| RAGSTF 0 21 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |

After adding this new bond to the portfolio, we construct the adjacency matrix again using the Jaccard similarity coefficients. The new bond has a betweenness centrality of 0.0178 and a closeness centrality of 0.7143. The mean betweenness centrality of all bonds decreases from 0.0333 to 0.0323, and the mean closeness centrality decreases from 0.8033 to 0.7845. Adding the bond therefore results in a more diversified portfolio: the lower mean closeness centrality means that, on average, the shortest paths between the bonds in the network have become longer. The portfolio can now be split into 4 clusters, as can be seen in the graphical representation below.
[Figure: network of the 11 bonds after adding RAGSTF 0 21, split into 4 clusters.]
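A sketch of the experiment with the new bond, under the same assumptions as for the 10-bond network (networkx, unweighted paths, threshold 0.5, no self-loops, hypothetical `build_graph` helper). The attribute string of RAGSTF 0 21 is taken from its attribute table.

```python
import numpy as np
import networkx as nx

BONDS = {  # 0/1 attribute strings in the column order of the attribute table
    "IDRSM 1.25% 2023": "100011010", "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000", "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000", "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000", "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000", "ORPAR 0% 2019": "000011000",
}


def build_graph(bonds, threshold=0.5):
    """Hypothetical helper: thresholded Jaccard-similarity graph, edge weight = similarity."""
    names = list(bonds)
    X = np.array([[int(c) for c in bonds[name]] for name in names])
    inter = X @ X.T
    counts = X.sum(axis=1)
    S = inter / (counts[:, None] + counts[None, :] - inter)
    A = np.where(S >= threshold, S, 0.0)
    np.fill_diagonal(A, 0.0)  # assumption: no self-loops
    return nx.relabel_nodes(nx.from_numpy_array(A), dict(enumerate(names)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


extended = {**BONDS, "RAGSTF 0 21": "000011000"}  # the 10 bonds plus the new bond

before, after = build_graph(BONDS), build_graph(extended)
b0, c0 = nx.betweenness_centrality(before), nx.closeness_centrality(before)
b1, c1 = nx.betweenness_centrality(after), nx.closeness_centrality(after)

print(f"RAGSTF 0 21: betweenness {b1['RAGSTF 0 21']:.4f}, closeness {c1['RAGSTF 0 21']:.4f}")
print(f"mean betweenness {mean(b0.values()):.4f} -> {mean(b1.values()):.4f}")
print(f"mean closeness   {mean(c0.values()):.4f} -> {mean(c1.values()):.4f}")
```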

Note:
For the calculation of the closeness and betweenness centrality, we did not use the weights defined in the adjacency matrix. Instead, we used a weight equal to one for all connected nodes and zero otherwise. It is of course perfectly possible to take the weights of the adjacency matrix into account. In that case, the shortest path between two nodes is no longer the minimal number of edges one has to traverse to go from one node to the other, but the minimal sum of the weights of the edges connecting the two nodes. For the clustering of the convertible bond networks, however, we did take into account the weights defined in the matrix $A$.
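The note does not specify how the weights enter the shortest-path calculation. A minimal sketch, assuming each edge length is the Jaccard distance $1 - A(i,j)$, so that more similar bonds lie closer together; the `weight`/`distance` arguments of networkx then switch from hop counts to weighted path lengths. `build_graph` is a hypothetical helper.

```python
import numpy as np
import networkx as nx

BONDS = {  # 0/1 attribute strings in the column order of the attribute table
    "IDRSM 1.25% 2023": "100011010", "MTIM 5.75% 2019": "100011000",
    "SONPL 1.625% 2019": "100001000", "DAMMES 1% 2023": "100001110",
    "HDDGR 8.5% 2017": "100011000", "OSIFP 3.25% 2021": "100011000",
    "WHANA 1% 2019": "100001000", "VILMIR 5.75% 2018": "100001000",
    "SAFFP 0% 2020": "000001000", "ORPAR 0% 2019": "000011000",
}


def build_graph(bonds, threshold=0.5):
    """Hypothetical helper: thresholded Jaccard-similarity graph, edge weight = similarity."""
    names = list(bonds)
    X = np.array([[int(c) for c in bonds[name]] for name in names])
    inter = X @ X.T
    counts = X.sum(axis=1)
    S = inter / (counts[:, None] + counts[None, :] - inter)
    A = np.where(S >= threshold, S, 0.0)
    np.fill_diagonal(A, 0.0)  # assumption: no self-loops
    return nx.relabel_nodes(nx.from_numpy_array(A), dict(enumerate(names)))


G = build_graph(BONDS)
for _, _, data in G.edges(data=True):
    data["distance"] = 1.0 - data["weight"]  # assumption: edge length = Jaccard distance

hop_closeness = nx.closeness_centrality(G)                            # every edge has length 1
weighted_closeness = nx.closeness_centrality(G, distance="distance")  # sum of Jaccard distances

src, dst = "IDRSM 1.25% 2023", "SAFFP 0% 2020"
hops = nx.shortest_path_length(G, src, dst)                       # minimal number of edges
length = nx.shortest_path_length(G, src, dst, weight="distance")  # minimal sum of edge lengths
print(hops, length)
print(round(hop_closeness[src], 4), round(weighted_closeness[src], 4))
```

Identical bonds (similarity 1) get an edge length of 0 under this choice, which Dijkstra-based shortest paths handle without problems.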

### Graph Databases

While the network above adds a lot of intuition and offers an interesting visual interpretation of the diversification of a portfolio, it fails to add value when dealing with a large portfolio. This is where graph databases come in. In a next white paper, we will illustrate how RiskConcile works with solutions such as Neo4j and how graph databases can be put to work in the risk analysis of a portfolio.