## WORLD CLASS 1-DAY VOLATILITY PREDICTION FOR INDIVIDUAL STOCKS

In finance, a forecasting regression is considered good when its R-squared is around 0.4, so "world class" is justified for any forecast that improves on this range. Here is a method that uniformly produces volatility forecasts with R-squared in the range 0.6-0.7 using relatively simple time series techniques. The fundamental issue is not statistical or analytic technique but the nature of volatility. These predictions are not souped-up versions of univariate time series models. They rely on the fact that the most predictable aspect of financial markets (in a nontrivial manner) is the aggregate volatility of the entire market, which is extremely predictable and has long memory. Using it as an exogenous variable gives excellent predictions of individual volatilities.
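Reading the R script in this post, the model can be written roughly as follows. The symbols are my own shorthand rather than notation from the script: $r_{i,t}$ is the log return of stock $i$, $v_{i,t}$ its log squared return, $m_t$ the cross-sectional average over the $n_t$ stocks with data on day $t$, and $\Delta$ the first difference. R's `arima` with `xreg` fits a regression with AR errors, which is what the third line reflects.

$$
\begin{aligned}
v_{i,t} &= \log\left(r_{i,t}^2 + 10^{-6}\right), \qquad m_t = \frac{1}{n_t}\sum_{i} v_{i,t},\\
\Delta m_t &= c + \sum_{k=1}^{5} a_k\,(\Delta m_{t-k} - c) + u_t,\\
\Delta v_{i,t} &= \mu + \beta\,\Delta m_t + \eta_t, \qquad \eta_t = \sum_{k=1}^{5}\phi_k\,\eta_{t-k} + \varepsilon_t,\\
\widehat{\Delta v}_{i,t+1} &= \hat\mu + \hat\beta\,\widehat{\Delta m}_{t+1} + \sum_{k=1}^{5}\hat\phi_k\,\hat\eta_{t+1-k}.
\end{aligned}
$$

Both models are refit on a rolling window of about 500 days, and the one-step forecast of the market term is fed into the stock equation as the future value of the exogenous regressor.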

This is a fundamental scientific issue, and I doubt very much that it can be 'arbitraged away', so I am making an effort to keep it open and public.  Here is the code (the algorithm is simple and can be done with readily available R packages).

```
library(e1071)
library(forecast)
library(tseries)
library(Quandl)

verbose=F
P<-read.table('DatedHistPx.csv',sep=',',header=T)

nselected=369
print(colnames(P)[nselected])
ntimes<-dim(P)[1]
nassets<-dim(P)[2]-1

aggvol<-rep(0,ntimes)
numnonna<-rep(0,ntimes)

for (k in 2:nassets){
if (class(P[,k])=="factor"){
next
}
r<-diff(log(P[,k]))
for (j in 2:ntimes){
if (!is.na(r[j-1])){
v<-log(r[j-1]^2+1e-6)
if (abs(v)<100){
numnonna[j]<- numnonna[j]+1
aggvol[j]<-aggvol[j] + v
}
}
}
}

for (j in 1:length(aggvol)){
aggvol[j]<-aggvol[j]/numnonna[j]
}

px<-P[,nselected]
r<-diff(log(px))

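# Align aggvol with the returns: aggvol[j] is built from r[j-1], so drop its
# first (empty) entry, then keep only days where the selected stock has a return.
aggvol<-aggvol[-1]
aggvol<-aggvol[!is.na(r)]
r<-r[!is.na(r)]
# log squared return of the selected stock (r has no NAs at this point)
indvol<-log(r^2+1e-6)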
#adf.test(indvol)

daggvol<-diff(aggvol)
dindvol<-diff(indvol)
write.csv(daggvol,'daggvol.csv')

N<-length(daggvol)
daggvolpred<-rep(0,N)
errors<-rep(0,N)
for (t in 500:(N-1) ){
res=tryCatch({
fit0<-arima( daggvol[(t-500):t], order=c(5,0,0))
fit1<-arima( dindvol[(t-500):t], order=c(5,0,0),xreg=daggvol[(t-500):t])

},error=function(e){
}
)

if (verbose){
print(fit1)

# Ljung-Box tests for residual autocorrelation at several lags;
# print() is needed because results are not auto-printed inside a loop
for (lag in c(1,5,10,20,21,22,23,24)){
print(Box.test( fit1$residuals,lag=lag, type="Ljung-Box"))
}
}
aggpred<-predict(fit0,n.ahead=1)$pred
vpred<-predict(fit1,n.ahead=1,newxreg=aggpred)$pred

daggvolpred[t]<-vpred
actual<-dindvol[t+1]

error<- actual-vpred
errors[t]<-error
print(paste(P[t,1], indvol[t]+actual,indvol[t]+vpred,error))
}

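# Compare predictions with realized changes in the stock's log volatility.
# The prediction stored at index t targets dindvol[t+1], so the actuals are shifted by one.
compvol<-dindvol[501:N]
predcompvol<-daggvolpred[500:(N-1)]

png('composite-market-volpred.png')
predictionlm<-lm(predcompvol~compvol)
plot(compvol,predcompvol)
abline(predictionlm)
dev.off()
print(summary(predictionlm)$r.squared)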

```
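For readers without `DatedHistPx.csv`, here is a minimal sketch of the same rolling one-step scheme on synthetic data, using only base R. Everything about the data is an assumption made for illustration: the market series is a simulated AR(1), the stock series is that factor plus noise, and the names `market_logvol`, `stock_logvol`, `d_market` and `d_stock` are hypothetical. It also passes `method = "ML"` to `arima`, which the script above does not do, to reduce start-up failures on short simulated series. The R-squared it prints says nothing about real stocks.

```r
# Synthetic illustration only: the data-generating process is an assumption,
# not an estimate from real prices.
set.seed(42)
n <- 530
market_logvol <- as.numeric(arima.sim(list(ar = 0.95), n = n))  # persistent market factor
stock_logvol  <- 0.8 * market_logvol + rnorm(n, sd = 0.5)       # factor plus stock noise

d_market <- diff(market_logvol)
d_stock  <- diff(stock_logvol)

window <- 500
pred   <- rep(NA_real_, length(d_stock))

for (t in window:(length(d_stock) - 1)) {
  idx <- (t - window + 1):t
  one_step <- tryCatch({
    # AR(5) for the market series, and AR(5) errors plus the market regressor for the stock
    fit_market <- arima(d_market[idx], order = c(5, 0, 0), method = "ML")
    fit_stock  <- arima(d_stock[idx], order = c(5, 0, 0),
                        xreg = d_market[idx], method = "ML")
    # Forecast the market term first, then feed it in as the future regressor
    market_next <- as.numeric(predict(fit_market, n.ahead = 1)$pred)
    as.numeric(predict(fit_stock, n.ahead = 1, newxreg = market_next)$pred)
  }, error = function(e) NA_real_)  # leave the slot empty if a fit fails
  pred[t + 1] <- one_step
}

# Out-of-sample R-squared between realized and predicted changes
ok <- !is.na(pred)
r_squared <- cor(d_stock[ok], pred[ok])^2
print(r_squared)
```

Unlike the script above, a failed fit here leaves that day's forecast as `NA` instead of silently reusing the previous window's model, so the R-squared is computed only over days with a fresh fit.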