Introduction

We built an R package stovol to fit two types of stochastic volatility (SV) models on discrete-time observations. The first model is called the basic stochastic volatility model where the two innovations are independent and each of them follows a normal distribution. This model was firstly proposed by Taylor (1986), while the second model is called the asymmetric stochastic volatility (ASV) model introduced by Harvey and Shephard (1996), where the two innovations of the BSV model are assumed to be correlated. In reality, the correlation between the two innovations are usually found to be negative which indicates that the financial returns and their latent volatilities have a leverage effect.

Install the package

To install the package stovol, you need to have the package devtools available in your R or RStudio if you use RStudio as your IDE. Please use install.packages("devtools") to install it first. Then we install our package stovol.

devtools::install_github("zhongxianmen2020/stovol")
## Skipping install of 'stovol' from a github remote, the SHA1 (120d33ef) has not changed since last install.
##   Use `force = TRUE` to force installation

If you want to install stovol again, you need the following

devtools::install_github("zhongxianmen2020/stovol", force=TRUE)
## Downloading GitHub repo zhongxianmen2020/stovol@HEAD
## magrittr (2.0.1 -> 2.0.2) [CRAN]
## Installing 1 packages: magrittr
## Installing package into 'C:/Users/zhongxian/Documents/R/win-library/4.1'
## (as 'lib' is unspecified)
## 
##   There is a binary version available but the source version is later:
##          binary source needs_compilation
## magrittr  2.0.1  2.0.2              TRUE
## installing the source package 'magrittr'
## Warning in i.p(...): installation of package 'magrittr' had non-zero exit status
## * checking for file 'C:\Users\zhongxian\AppData\Local\Temp\RtmpQxAEbN\remotes7a47b6b5541\zhongxianmen2020-stovol-120d33e/DESCRIPTION' ... OK
## * preparing 'stovol':
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## Omitted 'LazyData' from DESCRIPTION
## * building 'stovol_0.1.0.tar.gz'
## 
## Installing package into 'C:/Users/zhongxian/Documents/R/win-library/4.1'
## (as 'lib' is unspecified)

Note Sometimes GitHub may delay your request and the installation process will give you an error. If this happens, please try it again later.

BSV model estimation and assessment

Let \(y_t\) denote the observed asset return at time \(t\), \(t\leq T\), where \(T\) is the sample size. Without loss of generality, we assume that the expectation of \(y_t\) is zero such that \(E(y_t)=0\). Then the BSV model can be expressed as \[ y_{t}=\exp(h_t/2)\epsilon_t, t=1, ..., T,\\ h_{t+1}=\mu+\phi (h_{t}-\mu)+\sigma \eta_{t+1}, t=1,..., T-1,\\ h_0\sim {\mathcal N}(\mu, \sigma^2/(1-\phi^2)) \] The two innovations \(\epsilon_t\) and \(\eta_{t+1}\) are assumed to be independent and each follows a standard univeriate normal distribution.

We assess the performance of the developed MCMC method for BSV model by simulation studies. We generate an artificial return time series using the BSV model with given parameter values. The generated data is then fitted by the derived MCMC estimation algorithm to see whether the estimation method can recover the true parameters used in the artificial data generation.

The values of parameters used to generate artificial interest rate time series can be seen in the code below.

set.seed(22)

T = 2000
k1 = 50000
y = rep(0, T)
z = y

mu = -10.45
phi = 0.97
sigma1 = 0.19

temp =  rnorm(1, 0, 1)
x1 = mu + sigma1/sqrt(1.0-phi*phi)*temp

for (i in 1:(k1-1)){
  temp = rnorm(1,0,1)
  y[1] = exp(x1/2.0)*temp
  temp = rnorm(1,0,1)
  x1 = mu +phi*(x1 - mu) + sigma1*temp
}

for ( i in 1:T){
  temp = rnorm(1,0,1)
  y[i] = exp(x1/2.0)*temp
  z[i] = x1
  temp = rnorm(1,0,1)
  x1 = mu + phi*(x1 - mu) + sigma1*temp
}

The following is the plot of the dynamics of generated returns.

k = 1: length(y)
par(mfrow=c(1,1))
plot(k, y, type="l")

Now we fit the BSV model to the artificially generated data set. We first load the built R package stovol.

library(stovol)
burn_in = 5000
N = 15000
model_fit = stovol(y, burn_in, N, "BSV")

After discarding the first burn_in of 5,000 sampled points, we use the the later 10,000 sampled points for parameter estimation. We plot the three sampled time series after the burn_in period.

ts_plot(model_fit$mcmc_series)

The following table contains the estimated parameters, the standard deviations, and the Bayesian confidence intervals.

model_fit$estimate
##          Est.        Std. HPD CI(95%)_lower HPD CI(95%)_upper
## 1 -10.4474550 0.196569834       -10.8240244       -10.0448543
## 2   0.9731967 0.007405277         0.9586756         0.9873129
## 3   0.2120336 0.020376062         0.1730166         0.2503028

For goodness-of-fit assessment of the estimated BSV models, there are a number of techniques that can be used. One of them is the Kolmogorov-Smirnov (K-S) test, which is often applied to assess whether the realized observation errors originated from the corresponding distribution follow a standard normal distribution.

The standardized residuals calculated using the formula below would follow a standard normal distribution.

\[ Residual_t =y_t*\exp(-\hat{h}_t/2), \]

where \(\hat{x}_t, t=1, ..., T\) are the estimated volatilities which are the by-products of the MCMC estimation method.

QQ_plot(model_fit$standard_residuals)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ress
## D = 0.023687, p-value = 0.2118
## alternative hypothesis: two-sided

The K-S test for the normality assumption shows that the p-value is 0.2118. So we can not reject the non hypothesis that the standardized residuals follow a univariate normal distribution.

Another approach is to assess probability integral transforms (PITs) obtained from fitted BSV models based on the training data. If the fitted BSV model agrees with the data, the PITs will follow a uniform distribution \(U(0,1)\). Again, we use the K-S test to test the Uniform distribution of the PITs.

Suppose that \(\{f(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) is a sequence of conditional densities of \(y_t\) and \(\{p(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) is the corresponding sequence of one-step-ahead density forecasts. The PIT of \(y_t\) is defined as

\[ u(t)=\int_{-\infty}^{y_t} p(z|{\cal F}_{t-1})dz. \]

Under the null hypothesis that the sequence \(\{p(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) coincides with \(\{f(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\), the sequence \(\{u(t)\}_{t=1}^T\) corresponds to i.i.d. observations from the Uniform distribution \(U(0,1)\).

\(\underline{Note}\): For in-sample one-step_ahead prediction, \(u(t)\) can be calculated using the above formula. But for out-of-sample prediction, we have to use auxiliary particle filters (APF) introduced in Pitt and Shephard (1999), which is a consecutive sampling and re-sampling method for non-linear non-normal state space models. This method is a counter part of Kalman filters for linear state space models.

pit_plot(model_fit$pit)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  pit
## D = 0.021504, p-value = 0.3136
## alternative hypothesis: two-sided

The two blue lines in the upper plot are the 95% Bayesian confidence interval. The K-S test for the Uniform distribution assumption shows that the p-value is 0.3136. So we can not reject the non hypothesis that the PITs calculated from the fitted BSV model based on the training data follow a uniform distribution in the interval (0,1).

As usual, we also give AIC and BIC of the fitted model to the data.

model_fit$Aic
## [1] 4114.386
model_fit$Bic
## [1] 4131.189

Now we compare the absolute returns \(abs(y_t)\) with the estimated volatilities \(\exp(\hat{h_t}/2)\).

vol_plot(y, model_fit$est_volatilities)

ASV model estimation and assessment

Let \(y_t\) denote the observed asset return at time \(t\), \(t\leq T\), where \(T\) is the sample size. Without loss of generality, we assume that the expectation of \(y_t\) is zero such that \(E(y_t)=0\). Then the ASV model can be expressed as \[ y_{t}=\exp(h_t/2)\epsilon_t, t=1, ..., T,\\ h_{t+1}=\mu+\phi (h_{t}-\mu)+\sigma \eta_{t+1}, t=1,..., T-1,\\ h_0\sim {\mathcal N}(\mu, \sigma^2/(1-\phi^2)), \] where \[ \left( \begin{matrix} \epsilon_t \\ \eta_{t+1} \\ \end{matrix} \right)\sim {\mathcal N}\left( \begin{matrix} \left[ \begin{array}{c} 0 \\ 0 \\ \end{array} \right],\left[ \begin{array}{ccc} 1 , \rho \\ \rho ,1 \\ \end{array} \right] \end{matrix} \right). \] The two innovations \(\epsilon_t\) and \(\eta_{t+1}\) are assumed to jointly follow an independently and identically distributed (\(i.i.d.\)) bivariate standard normal distribution with a correlation coefficient given by \(\rho=corr(\epsilon_t, \eta_{t+1})\). In order for the ASV model to be weakly stationary, it is assumed that \(|\phi|<1\). A priori, the coefficient \(\rho\) is expected to have a negative sign, which means that the returns \(y_t\) and the future log volatilities \(h_{t+1}\) are expected to be negatively correlated. This is often interpreted as evidence of a leverage effect in time series of asset returns.

We assess the performance of the developed MCMC method for ASV model by simulation studies. We generate an artificial time series using the ASV model with given parameter values. The generated data is then fitted by the derived MCMC estimation algorithm to see whether the estimation method can recover the true parameters used in the artificial data generation.

The values of parameters used to generate artificial interest rate time series can be seen in the code below.

set.seed(22)

T = 2000
k1 = 50000

y = rep(0, T)
x1 = y


mu = -10.45
phi = 0.97
rho = -0.41
sigma1 = 0.19

#  method 1
temp =  rnorm(1, 0, 1)
x11 = mu + sigma1/sqrt(1.0-phi*phi)*temp

for (i in 1:(k1-1)){
  temp = rnorm(1, 0, 1)
  y[1] = exp(x11/2.0)* temp
  temp = rnorm(1, 0, 1)
  x2 = mu + phi * (x11-mu)+ sigma1*rho*y[1]*exp(-x11/2) +sigma1*sqrt(1-rho*rho)*temp
  x11 = x2
}

for (t in 2:T){
  temp = rnorm(1, 0, 1)
  y[t] = exp(x11/2.0)* temp
  temp = rnorm(1, 0, 1)
  x2 = mu + phi * (x11-mu)+ sigma1*rho*y[t]*exp(-x11/2) +sigma1*sqrt(1-rho*rho)*temp
  x11 = x2
}

The following is the plot of the dynamics of the generated returns.

k = 1: length(y)
par(mfrow=c(1,1))
plot(k, y, type="l")

Now we fit the ASV model to the artificially generated data set. We first load the built R package stovol.

library(stovol)
burn_in = 5000
N = 10000
model_fit = stovol(y, burn_in, N, "ASV")

After discarding the first burn_in of 5,000 sampled points, we use the later 10,000 sampled points for parameter estimation. We plot the three sampled time series after the burn_in period.

ts_plot(model_fit$mcmc_series)

The following table contains the estimated parameters, the standard deviations, and the Bayesian confidence intervals.

model_fit$estimate
##          Est.        Std. HPD CI(95%)_lower HPD CI(95%)_upper
## 1 -10.4255809 0.143694715       -10.7215788      -10.15126321
## 2   0.9632309 0.009363314         0.9448949        0.98128374
## 3  -0.2772403 0.085356032        -0.4398026       -0.09953048
## 4   0.2271136 0.024788476         0.1773194        0.27530467

For goodness-of-fit assessment of the estimated SDE models, there are a number of techniques that can be used. One of them is the Kolmogorov-Smirnov (K-S) test, which is often applied to assess whether the realized observation errors originated from the corresponding distribution follow a standard normal distribution.

The standardized residuals calculated using the formula below would follow a standard normal distribution.

\[ Residual_t =y_t*\exp(-\hat{h}_t/2), \]

QQ_plot(model_fit$standard_residuals)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ress
## D = 0.023477, p-value = 0.2203
## alternative hypothesis: two-sided

The K-S test for the normality assumption shows that the p-value is 0.2203. So we can not reject the non hypothesis that the standardized residuals follow a univariate normal distribution.

Another approach is to assess probability integral transforms (PITs) obtained from fitted SDE models. If the fitted ASV model agrees with the data, the PITs will follow a uniform distribution \(U(0,1)\). Again, we use the K-S test to test the uniformality of the PITs.

Suppose that \(\{f(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) is a sequence of conditional densities of \(y_t\) and \(\{p(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) is the corresponding sequence of one-step-ahead density forecasts. The PIT of \(r_t\) is defined as

\[ u(t)=\int_{-\infty}^{y_t} p(z|{\cal F}_{t-1})dz. \]

Under the null hypothesis that the sequence \(\{p(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\) coincides with \(\{f(y_t|{\cal F}_{t-1})\}_{t=1}^{T}\), the sequence \(\{u(t)\}_{t=1}^T\) corresponds to i.i.d. observations from the Uniform distribution \(U(0,1)\).

pit_plot(model_fit$pit)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  pit
## D = 0.021558, p-value = 0.3108
## alternative hypothesis: two-sided

The two blue lines are the 95% confidence interval. The K-S test for the Uniform distributional assumption shows that the p-value is 0.3108. So we can not reject the non hypothesis that the PITs calculated from the fitted ASV model based on the training data follow a Uniform distribution in the interval (0,1).

As usual, we also give AIC and BIC of the fitted model to the data.

model_fit$Aic
## [1] 4104.016
model_fit$Bic
## [1] 4126.42

Now we compare the absolute returns \(abs(y_t)\) with the estimated volatilities \(\exp(-\hat{h_t}/2)\).

vol_plot(y, model_fit$est_volatilities)

Conclution and next steps

We have shown how to use the functions in the package stovol from model fitting to model assessment. Simulation studies shows us that the developed MCMC methods are able to recover the true parameters used in artificial data generation.

This version of the package is written in pure R language. To speed up the MCMC estimation process, we are planing to integrate C/C++ too.

We are also planing to write a package for Python in which the estimation process would be written in C/C++.

We will continue to work on this package by adding more types of SV models such as adding student-\(t\) distribution to the innovation of observational equations, adding mixture distributions, and threhosld distributions.

Please kindly let us zhongxianmen@hotmail.com and chengguo.weng@uwaterloo.ca know your comments and suggestions, so we can improve it in the next version.