Feeds:
Posts

PRODUCT RULE FOR CAPUTO FRACTIONAL DERIVATIVES IS THE KEY TO MITTAG-LEFFLER AUTOCORRELATIONS IN VOLATILITY

The Caputo fractional derivative is the convolution of the normal derivative with a fractional kernel:

$D^\mu f(t) = 1/\Gamma(1-\mu) \int_0^t (t-s)^{-\mu} f'(t) ds$

The fact that the derivative is in the integrand allows us to apply the usual product rule to

$D^\mu (x_t x_{t+h}) = 1/\Gamma(1-\mu) \int_0^t (t-s)^{-\mu} x_{s+h}Dx_s + x_s Dx_{s+h} ds$

Now assume $D^\mu x_t = -\lambda x_t + \sigma dw_t/dt$ with a white noise $dw_t/dt$.  Since $E(dw_t/dt) = 0$ we have

$D^\mu E(x_t x_{t_h}) = E( x_{t+h} D^\mu x_t + x_t D^\mu x_{t+h}) + ER$ where the white noise terms all have zero expectation.  Then $D^\mu E(x_t x_{t+h}) = -2\lambda E(x_t x_{t+h}) + ER$.  Assuming the remainder term is small, this produces for the autocorrelation the fractional equation $D^\mu f(t) = -2\lambda f(t)$ approximately therefore the autocorrelation will be a Mittag-Leffler function.  This simple analysis is important because the empirical volatility autocorrelations are well-matched by $(a,b,c) \rightarrow E_{a,b}(-ct^a)$ and the analysis provides us with the steps toward modeling volatility with closer match to empirical data.

Now let us take a look at some of the fits to empirical data to get a sense for why these sorts of models are useful.

ELEMENTARY COMPUTATION OF AUTCOVARIANCE FOR NONSTOCHASTIC MITTAG-LEFFLER

F. Mainardi (mainardi-2014-approximations-mittag-leffler-function) introduced the following approximation for the Mittag-Leffler function $E_{a}(-t^a)$ when $t \rightarrow 0$

$f(t) = 1/(1+Ct^a)$

with $C=\Gamma(1+a)$.  By Taylor expansion

$f(t+h) \sim f(t) - C a t^{a-1} /(1+Ct^a)^2 h + O(h^2)$

Therefore $f(t)f(t+h) = f(t)^2 ( 1 + f'(t)/f(t) h + O(h^2)$ and

$f(t) f(t+h) = f(t)^2 ( 1 - C a t^{a-1}/(1+ Ct^a) h + O(h^2))$

This is then the behavior of the ‘autocorrelation’ function satisfying $D^a x_t = -x_t$. This elementary example provides some intuition for stochastic models where the autocorrelation function can be fit by Mittag-Leffler functions.

CAPUTO DERIVATIVE

The Caputo derivative is more useful for computations because one can inherit the product rule from the usual derivative and constants are annihilated by it.

$D^\mu f(t) = \frac{1}{\Gamma(1-\mu)} \int_0^t \frac{f'(s)}{(t-s)^\mu} ds$

Suppose $D^\mu x_t = -\lambda x_t$.  Then the product rule inside the integral and integration by parts gives

$D^\mu(x_t x_{t+h}) = - 2 \lambda x_t x_{t+h} - \int_0^t (D^\mu x_s D x_{s+h} + D^\mu x_{s+h} D x_s) ds$

We can now add stochastic components $D^\mu x_t = -\lambda x_t + A dw_t$ with zero expectations and still have

$D^\mu E(x_t x_{t+h} ) = -2 \lambda E(x_t x_{t+h} ) + ER(t,h)$

where $R(t,h) = \int_0^t (D^\mu x_s D x_{s+h} + D^\mu x_{s+h} D x_s) ds$.  Assuming this is small, the Mittag-Leffler function will be a good approximation for the autocorrelation function of $x_t$.

BOTH FRACTIONAL INTEGRATION AND FRACTIONAL BROWNIAN MOTION SHOULD APPEAR IN THE CORRECT MODEL OF VOLATILITY

By ‘correct’ we mean one that produces autocorrelations that fit empirical autocorrelations, i.e. Mittag-Leffer functions $(a,b,c)\rightarrow E_{a,b}(-ct^a)$.

$D_t^{\mu} x_t = (\alpha - \beta x_t) + \sigma \sqrt{x_t} dw_t /dt$

Apply $D^{1-\mu}$ to obtain

$D^1_t x_t = D^{1-\mu} (\alpha - \beta x_t) + \sigma D_t^{1-\mu}(\sqrt{x_t} dw_t/dt)$ (*)

Integrate the second quantity by parts using the fact that $D_t^\alpha = 1/\Gamma(1-\alpha) d/dt \int_0^t (t-s)^{-\alpha} (\cdot) ds$ to get

$C(\alpha) (\sqrt{x_t} - \sqrt{x_0}) dw^{1-\mu}_t/dt - 1/2 \Gamma(-\mu) \sqrt{x_t} dw^{1-\mu}_t/dt$

This then still leaves the fractional integration in the first term of (*).  So this is partly the problem with using fractional Brownian motions only in the model $dX_t = (\alpha-\beta X_t) dt + \sigma \sqrt{X_t} dw^{1-\mu}_t$ it would seem.

THE PROBLEM OF AUTOCORRELATION OF A FRACTIONAL SQUARE ROOT PROCESS

$D^\mu x_t = -k x_t + \gamma \sqrt{x_t + \theta} dw_t/dt$ (*)

where the second term is a white noise whose expectation is zero

$D_t E(x_{t+h} x_t) = -k E( x_t D^{1-\mu} x_{t+h} + x_{t+h} D^{1-\mu} x_t)$

using $D^1_t x_t = D^\mu_t D^{1-\mu}_t$.   So we get

$D^\mu_t E(x_{t+h} x_t) = -k E( I^{1-\mu} (x_t D_t^{1-\mu} x_{t+h} + x_{t+h}D_t^{1-\mu} x_t) )$ (**)

Now use the formula

$D_t^{1-\mu} (uv) = \sum_{k=0}^\infty Choose(\mu,k) D^{1-\mu-k}_t u D^k_t v$

to see that the right hand side of (**) is

$-k E( x_t x_{t+h} + x_t D^{1-\mu}_t x_{t+h} - \sum_{k=1}^\infty Choose(\mu,k) D^{1-\mu}_t x_t D^k x_{t+h})$

So if second term and the infinite sum have sufficient cancellation, we have

$D^\mu E(x_t x_{t+h}) = -k E( x_t x_{t+h} )$

and then we have the Mittag-Leffler function as the autocorrelation of these fractional square root processes (this is important because we can see that they do match data quite well.)

A direct integration by parts using (*) shows

$D^\mu E(x_t x_{t+h}) = - k E(x_t x_{t+h}) - \int_0^t k^2 E(x_s x_{s+h}) + \gamma^2 \sqrt{(x_s+\theta)(x_{s+h} + \theta)} D^{1-\mu} x_{s+h} ds$

Assuming $0 \le D^{1-\mu} x_{s+h}$ we have the inequality

$D^\mu E(x_t x_{t_h}) + k E(x_t x_{t+h}) \ge 0$ (***)

since the integral is positive.  Consider $g(t), f(t)$ satisfying $f(0) = g(0) = E(x_{0}x_{0+h})$ and

$D^\mu g(t) + k g(t) = 0$(****)

and let $f(t) = E(x_tx_{t+h})$.  Then subtracting (****) from (***) tells us that $D^\mu (f-g)(0) \ge 0$ to which we can apply $D^{1-\mu}$ to ensure that $f(t) \ge g(t)$.  Therefore we have:

The covariance of the process (*) majorizes a Mittag-Leffler function with parameters $E_{\mu}( -kt^\mu )$.  This result is a partial answer to the question for volatility processes since the Mittag-Leffler function has, when $\mu < 1$ slower than exponential decay and therefore the model provides ‘at least long memory’.

THE RESOLUTION OF PROBLEM OF LONG MEMORY IN VOLATILITY

The problem of long memory in finance has been on the agenda of top econometricians like Clive Granger since the 1960s.  In fact, he wrote about the typical spectral shape of economic variables in 1966 and then in 1980-81 introduced the long memory models that I have been using to show that long memory is a real phenomenon probably for the n-th time since the 1960s, the fractionally differenced ARIMA models, for which a nice open source implementation in the ‘forecast’ module in R makes this simple.  Well, so Granger and coworkers were responsible for the Ding-Granger-Engle model from 1993 which fit 1900 stock volatility autocorrelations extremely well, which I was trying to understand for the last few days.  The DGE autocorrelation model is:

$\rho_t = a \rho_{t-1}^{b_1} b_2^t t^{b_3}$

which is log-linear and can be fit with $R^2 \sim 0.98-9$ so its very good.  Attempting to analyze this equation directly produces a complicated mess.  But we are fortunate because the fundamental idea behind this model is essentially to interpolate between an exponential and a power law decay, so the question is whether a similar quality fit to actual empirical volatility autocorrelations in stocks can be obtained by a more theoretically justified model, and the answer is yes.  We can consider the model $(a,b,c) \rightarrow E_{a,b}(-ct^a)$ and obtain fits with $R^2 \sim 0.97-9$.  For now here are some pictures of fits.

The natural model to consider for these, following the breakthrough of Comte-Renault in 2003 of an analyzable stochastic volatility model with long memory, where they took a square root process $dX_t = k(\theta - X_t) dt + \sqrt{X_t} dw_t$ as used by Cox-Ingersoll-Ross in 1985 and studied by Feller in 1951 and essentially considered the volatility to be a fractional integral of $X_t$.  This allowed them to use the Heston 1993-type analysis for closed form solutions in the affine class which are solvable — essentially this allows one to take a complicated second order PDE which comes out of Black-Scholes analysis via Ito formula and solve it by ordinary differential equations; the affine class is essentially a generalized view for when this is possible.  Regardless of analytic tractability, there is still no science without some exact numerial model fitting data, and the fits above tell us what the process should be if we did not worry about option pricing: we’d be looking at a fractional version of the square root type process:

$(d/dt)^\mu x_t = k(\theta - x_t) + \sqrt(x_t) dw_t/dt$

which is the non-rigorous way of adding a white noise scaled by $\sqrt(x_t)$ to a fractional ordinary differential equation.  The Mittag-Leffler functions act as analogs of exponentials and describe the autocorrelations of these processes just as exponentials occur as the autocorrelations of the standard square root process.  See samorodnitsky-spectrum-autocorrelation for introduction to long memory and comte-renault-2003-affine-sv

SHORT MEMORY REASONABLE FOR 3-MONTH CONSTANT MATURITY

Long memory for equities is universal but here is a quick exercise to check the spectral density of 3-month constant maturity Treasuries and see that a short memory model does fine.  The data can be obtained here.  cm-rates3m The R code is:

z<-as.numeric(as.character(y[,2]))
z[is.na(z)]<-0
k<-kernel(“daniell”,2)
v<-log(z^2+1e-6)
s<-spec.pgram(v,k,taper=0,log=”no”)
temp<-data.frame(x=s$freq,y=s$spec)
mod<-nls( y ~ exp(a+b*x),data=temp,start=list(a=0,b=0))
plot(temp$x,temp$y,pch=’x’)
lines(temp$x,predict(mod,list(x=temp$x)))

Then you can take a look at the decay of the fit directly by estimated parameter $b=-3987$.

COMTE-RENAULT LONG-MEMORY SQUARE ROOT PROCESS

The standard square root process introduced by Feller (1951), used by Cox-Ingersoll-Ross (1985) and used for modeling stochastic volatility by Heston (1993) is the following

$d\tilde{\sigma}^2(t) = k(a-\tilde{\sigma}^2(t)) dt + \gamma \sqrt{\tilde{\sigma}^2} dw(t)$

The Comte-Renault long-memory square root process comes out of two steps.  First, they consider

$dX(t) = -kX(t)dt + \sqrt{X(t) + \tilde{\theta}}dw(t)$ which is equivalent to $\tilde{\sigma}^2(t) = X(t) + \tilde{\theta}$.  Then they consider the long memory process as

$\tilde{\sigma}^2(t) = | \tilde{\theta} + I^{(\alpha)} X(t) |$ where $I^{(\alpha)}$ is the fractional integration operator.  The variance and autocorrelations of $\sigma^(2)$ they relate to the variance and autocorrelations of the square root process $X(t)$ as follows.

(a)  $Var(X(t)) = Var(\tilde{\sigma}^2(t)) = \frac{\tilde{\theta}\gamma^2}{k}$ and

$Var(\sigma^2(t)) = \frac{\theta\gamma^2}{k^{2\alpha+1}} \frac{\Gamma(1-2\alpha)\Gamma(2\alpha)}{\Gamma(1-\ alpha)\Gamma(\alpha)}$

(b) The autocorrelations are $c_X(h) = \frac{\tilde{\theta}\gamma^2}{2k} e^{-k|h|}$ and

$c_{\sigma^2}(h)/c_{\sigma^2}(0) = 1 - \frac{(kh)^{2\alpha+1}}{2\alpha(2\alpha+1)\Gamma(2\alpha)} + O(h^2)$ for small $h$ and

$c_{\sigma^2}(h)/c_{\sigma^2}(0) = (kh)^{2\alpha-1}/\Gamma(2\alpha)$ for large $h$.

(c)  The spectral density of $\sigma^2(t)$ is

$f_{\sigma^2}(\lambda) = \frac{\tilde{\theta} \gamma^2}{\lambda^{2\alpha}(\lambda^2 + k^2)}$