For a selection of share price values $s \in [a, b, c, d]$, the value of $\Gamma(s)$ is calculated. These calculations take place overnight and the results are stored in a repository. Intraday, the value of $\Gamma(s)$ is obtained via linear interpolation in the grid. The solution in Figure 1.2 illustrates how the bias between the estimate $\hat{\Gamma}$ and the analytical solution $\Gamma$ can be reduced by using a denser grid. But again, this comes at an additional computational cost.
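As a minimal sketch of this overnight-grid-plus-intraday-interpolation idea (the grid values below are illustrative placeholders, not actual model output):

```python
import numpy as np

# Overnight batch: Gamma is computed exactly on a coarse grid of share prices.
# The numbers below are illustrative placeholders.
grid_s = np.array([80.0, 95.0, 110.0, 125.0])        # s in [a, b, c, d]
grid_gamma = np.array([0.012, 0.031, 0.018, 0.007])  # stored values of Gamma(s)

def gamma_intraday(s: float) -> float:
    """Intraday estimate of Gamma via linear interpolation in the stored grid."""
    return float(np.interp(s, grid_s, grid_gamma))

print(gamma_intraday(100.0))  # fast lookup, no repricing needed
```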
A regression model is similar to a grid solution in that one starts by calculating the value of $\Gamma(s)$ for different values of $s\in [a,b, c, d]$ in a grid. The nature of the function $\Gamma(s)$, however, does not lend itself to linear interpolation: the function is too "wobbly" for this. This is where our grid solution derailed completely and why basis functions are introduced. The model inputs $s_1,s_2,\ldots,s_N$ are organized into a design matrix $X$. In our case this matrix has 4 rows ($N=4$) and 1 column.
$X$ is transformed from a column vector of size $ N \times 1$ to a matrix with dimension $N\times q$:
$$X\rightarrow \Phi(X) \in \mathbb{R}^{N\times q}.$$
In this case we are using a polynomial of order 3 to replicate the analytical solution:
$$
\begin{equation}
\begin{array}{lll}
\Phi(X) &=& [1 , s , s^2 , s^3]\\
&=&\left[
\begin{array}{llll}
1&s_1&s_1^2&s_1^3\\
\vdots&&&\vdots\\
1&s_N&s_N^2&s_N^3\\
\end{array}\right]\\
\end{array}
\end{equation}
$$
For each row $\mathbf{x_i}$ in the design (or data) matrix $X$, there is a corresponding output $y_i$. Estimated outputs are obtained in the regression model as:
$$ \hat{y}_i= \hat{\Gamma}(s_i)= w_0+w_1s_i+w_2s_i^2+w_3s_i^3 $$
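A minimal sketch of this basis-function regression, assuming the grid values $s_i$ and the corresponding $\Gamma(s_i)$ are already available (the numbers below are placeholders):

```python
import numpy as np

# Grid of share prices and their exactly computed Gamma values (placeholders).
s = np.array([80.0, 95.0, 110.0, 125.0])      # N = 4 inputs
y = np.array([0.012, 0.031, 0.018, 0.007])    # y_i = Gamma(s_i)

# Basis expansion Phi(X): each row is [1, s_i, s_i^2, s_i^3].
Phi = np.vander(s, N=4, increasing=True)

# Least-squares fit of the weights w = [w_0, w_1, w_2, w_3].
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def gamma_hat(s_new: float) -> float:
    """Estimated Gamma from the fitted cubic polynomial."""
    return float(np.polyval(w[::-1], s_new))  # polyval expects highest degree first

print(gamma_hat(100.0))
```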
The result of this interpolation is illustrated in Figure 1.3. The choice of the basis function $\Phi(s)$ is far from straightforward. While in our example a polynomial of order 3 seemed a logical choice, a similar basis function might fail for a different set of parameters.
For a large grid of pricing parameters and model parameters, the option prices and sensitivities such as gamma, vega, etc. are calculated and stored in a repository. This data structure is summarized in a dedicated kernel matrix $K$. Once the kernel is calculated, the training of the model is accomplished.
Starting from the kernel $K$ and after applying GPR-related algebra to it, any new pricing of the derivative structure takes place extremely fast.
The results in the figure above, combined with the gain in pricing speed, turned us into big fans of this GPR technique.
As explained above, the first step is the training of the model. For a training set of $N$ observations across different share price levels $s$ and maturities $t$, the function value is calculated.
Each row $\mathbf{x_i}$ in the data matrix $X$ has two entries: a share price level and a time to maturity: $\mathbf{x_i} = [s_i,t_i]$. Each input value $\mathbf{x_i}$ has a corresponding calculated function value $y_i$. In our case this is the Gamma of a cliquet call option: $y_i = \Gamma(s_i,t_i)$.
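As a sketch, the training inputs can be assembled by crossing a set of share price levels with a set of maturities; the function `cliquet_gamma` below is only a placeholder for the actual (slow) exact calculation:

```python
import numpy as np

def cliquet_gamma(s: float, t: float) -> float:
    """Placeholder for the exact, slow calculation of the cliquet Gamma."""
    return 0.0  # substitute the actual pricing model here

s_levels = np.linspace(50.0, 150.0, 20)    # share price levels (placeholders)
maturities = np.linspace(0.1, 3.0, 10)     # times to maturity in years (placeholders)

# Data matrix X: each row x_i = [s_i, t_i]; targets y_i = Gamma(s_i, t_i).
S, T = np.meshgrid(s_levels, maturities)
X = np.column_stack([S.ravel(), T.ravel()])          # shape (N, 2)
y = np.array([cliquet_gamma(s, t) for s, t in X])    # shape (N,)
```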
An example of a training grid is illustrated in the figure below:
Once trained, the GPR model can be used to calculate the function value $y$; in our case, the Gamma of a capped cliquet call option. The training is the only time-consuming component of GPR. Fortunately, each option type only has to be trained once. The next time one is dealing with a cliquet option, perhaps with a different cap level or different parameters for the underlying, one can still benefit from the earlier training work.
Each of the functions $k(\mathbf{x_i},\mathbf{x_j})$ is a kernel function. Plenty of choices are available here. In our example we used a radial basis function (RBF), which is defined as follows:
$$
k(\mathbf{x_i},\mathbf{x_j}) = \alpha \exp{\left(- \sum_{k=1}^{p} \left(\frac{\mid x_{ik}-x_{jk}\mid }{\gamma_k}\right)^\beta\right)}+\sigma
$$
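A direct translation of this kernel into code could look as follows; the default hyperparameter values are arbitrary placeholders:

```python
import numpy as np

def rbf_kernel(x_i, x_j, alpha=1.0, gamma=None, beta=2.0, sigma=1e-6):
    """Generalised RBF kernel k(x_i, x_j) as defined above.

    alpha: overall scale, gamma: per-dimension length scales gamma_1..gamma_p,
    beta: exponent, sigma: additive constant. Default values are placeholders.
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    gamma = np.ones_like(x_i) if gamma is None else np.asarray(gamma, dtype=float)
    dist = np.sum((np.abs(x_i - x_j) / gamma) ** beta)
    return alpha * np.exp(-dist) + sigma

# Example: similarity between two (share price, maturity) points.
print(rbf_kernel([100.0, 1.0], [105.0, 1.5], gamma=[10.0, 0.5]))
```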
Looking for an intuitive explanation of $k(\mathbf{x_i},\mathbf{x_j})$, we can view its value as a similarity measure between the two data points $\mathbf{x_i}$ and $\mathbf{x_j}$ of our dataset $X$: the larger the distance between the points, the smaller the kernel value.
At first this looks like a step in the wrong direction, since introducing the kernel matrix $K(X,X)$ increases the dimension from $N \times p$ for the input matrix to $N \times N$.
A critical reader will also observe that, while we are trying to find an elegant and fast procedure to price derivative instruments, the solution to our problem now looks even further away! Not only did the dimensionality of the problem increase, but one also notices the introduction of new parameters: $\alpha, \gamma_1, \ldots,\gamma_p,\sigma$. Finding the optimal values of these hyperparameters is an integral part of the Gaussian Process Regression approach. But again, one only needs to do this once! Building the dataset $X$, calculating $K(X,X)$ and determining the hyperparameters might be computationally intensive; but once trained, finding the value $f(\mathbf{x})$ is extremely fast and easy.
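To make this last step concrete, here is a minimal sketch of the standard GPR algebra: $K(X,X)$ is built and factorised once (the training step), after which the prediction for a new point $\mathbf{x}^*$ is just the posterior mean $k(\mathbf{x}^*,X)\,K(X,X)^{-1}\mathbf{y}$. The hyperparameter optimisation (typically by maximising the marginal likelihood) is omitted here, and the code reuses the `rbf_kernel` and the training data `X`, `y` sketched above:

```python
import numpy as np

def train_gpr(X, y, kernel):
    """One-off training step: build K(X, X) and pre-compute K(X, X)^{-1} y."""
    N = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    K += 1e-8 * np.eye(N)           # small jitter for numerical stability
    return np.linalg.solve(K, y)    # reused for every subsequent prediction

def predict_gpr(x_new, X, kinv_y, kernel):
    """Fast evaluation: posterior mean k(x_new, X) @ K(X, X)^{-1} y."""
    k_star = np.array([kernel(x_new, x_j) for x_j in X])
    return float(k_star @ kinv_y)

# Usage sketch (X, y and rbf_kernel as in the earlier snippets):
# kinv_y = train_gpr(X, y, rbf_kernel)                          # slow, done once
# gamma_est = predict_gpr([100.0, 1.0], X, kinv_y, rbf_kernel)  # fast, intraday
```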