Archive for January, 2014

All military technology comes from pre-existing metaphysical constructions. It is sourced from predatory beings. Frankly, much of the shiny polish and fast technical talk of the military technology industry is hogwash. There is no human invention at all.

I have quite recently been in a metaphysical experience in which a humanoid being at one point offered up three cards.  The central one, the one of interest, was an exquisitely drawn card picturing a humanoid with the head of an eagle or hawk, wearing a Nazi armband.  My task, given to me by myself, and one at which I may not have succeeded completely, was to put the card ‘out of play’.  Behind the card were two other cards, one with drawings of jet fighter planes and the other showing a lonely outpost with a helmeted soldier.  But this is not the first time that I have had experiences suggesting that military technologies are derived from predatory metaphysical beings or that they are pre-designed.  The consistency of four dimensional matter varies, but the ‘metallic’ consistency is not rare, and I have seen much complex machinery in my metaphysical experiences, including military equipment.

Of course, just because some technology exists in four dimensions and is available as models does not mean it would be trivial to implement it in the material world.  But on the other hand, it seems fairly clear to me that human invention is not the primary source of many of these technologies.  I won’t provide any attempt at independent verification of my claims.

More interestingly, the issue arises of who is really playing the war games on Earth.  We have the superficial accounts in which the players are nation states, the obvious players in the calculus of hegemonic or imperial power on Earth.  But there is a reasonable possibility, at least for me who has had many experiences of the connection between military technology and predatory beings, that the real players are the predatory beings themselves and human beings are pawns in the game.  The thought is not new.  I have often considered the ideological conflicts dividing the planet to be analogous to predatory macro-organisms fighting.  For me, who believes in a single human race, the state of affairs suggests a diseased human civilization.  I am often reminded of the famous mantra of J. Robert Oppenheimer, ‘I am Shiva the Destroyer of worlds’, when I consider that perhaps Shiva was indeed involved in an oblique manner in the destruction of Hiroshima and Nagasaki.



Mandelbrot’s pioneering observations about the true wildness of market volatility are stunning.  He introduced fractional Gaussian noise into the study of financial returns, and with it the long memory models that I am using for my volatility prediction work.  This is beautiful work, and it introduced the quantification of roughness and smoothness that made him famous for fractals and chaos.  But what was he missing?  Clearly his work revolutionized the study of time series models in finance and the understanding of market risks, descriptively and perhaps in predictability as well.  I think the missing link in his picture is that he was not looking for what volatility really means beyond some computed quantity called volatility, whether a standard deviation series or a stochastic component.  The real issue about volatility is its source: the EMOTIONAL volatility of large groups of people.  I have been paying precise attention to my own emotions over long periods of time, and I have noticed their extreme volatility even when I make a significant effort to focus and stay rational.  Human beings decide prices in the market, and regardless of the facade of their rationality, terms like ‘market’s expectation’ just sugarcoat the basic truth that trading is done almost completely by emotions.  Some people call this ‘behavioral’, which is too generic a label for the insight that market volatility is due to the natural emotional volatility of the congregation of participants, who are sometimes trying very hard to be cold and rational.  Whenever the issue of winning and losing arises in human beings, there is no way to avoid the volatility of EMOTIONS.  This is the root of the wildness of market volatility.  To the extent that one can mitigate the risk of emotional volatility in a human being, it is exactly to this extent that market volatility risk can be ‘managed’.
THIS is the right behavioral model of finance, and I will return to make this idea more rigorous and to study it further.


The frequency domain QML estimator for long memory processes has a theory of asymptotic normality and efficiency of the parameter estimates that was worked out in the 1980s.  The two fundamental papers are Fox and Taqqu (1986) and Dahlhaus (1989).  For stochastic volatility models with long memory specifically, Breidt, Crato and de Lima (1998) first proposed the LMSV model r_t = \exp(h_t/2) \xi_t with \xi_t \sim N(0,1), in which the unobserved log volatility h_t follows an ARFIMA process.  This is the model I am studying for US stocks.  I am sure this has been done before, but I will be comparing the LMSV model's prediction of future realized volatility over the life of an option against the prediction given by the Black-Scholes implied volatility.  I am specifically interested in understanding the effects of long memory, but I will first simply evaluate the prediction ability.  Realized volatility for an option with n days remaining until expiration will be something like the annualized quantity \sqrt{(252/n) \sum_{i=1}^n r_i^2}, where the r_i are the daily returns.
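The realized volatility target described above can be sketched in a few lines. This is only an illustration under the stated convention of 252 trading days per year; the function name `realized_volatility` is hypothetical and not part of any code I am actually running.

```python
import numpy as np

def realized_volatility(daily_returns, trading_days=252.0):
    """Annualized realized volatility: sqrt((trading_days / n) * sum(r_i^2))."""
    r = np.asarray(daily_returns, dtype=float)
    n = r.size
    if n == 0:
        raise ValueError("need at least one daily return")
    # squared returns are summed without demeaning, as in the formula in the text
    return float(np.sqrt(float(trading_days) / n * np.sum(r ** 2)))
```

The returns are not demeaned here, which matches the formula in the text; whether to subtract a mean over such short windows is a separate choice.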


The long memory stochastic volatility model was introduced by Breidt, Crato and de Lima in the 1990s, but I am following McCloskey to understand the estimation of its parameters by the Whittle approximation to the frequency domain quasi-maximum likelihood (QML) estimator.
For a generic process y_t let w_y(\lambda) be the discrete Fourier transform and let I_y(\lambda) = | w_y(\lambda)|^2 be the periodogram.  Consider a stochastic volatility model r_t = \sigma_t e_t with e_t being IID(0,1) and

\sigma_t = \sigma \exp( h_t / 2)

h_t = v_t

\epsilon_t = \log(e_t^2) - E[ \log( e_t^2 ) ]

and (1-L)^d A(L) v_t = B(L) \eta_t with \eta_t \sim N(0,\sigma_\eta^2).  The parameters are thus \theta = (d, \sigma^2_\eta, \sigma^2_\epsilon, a_1,\dots,a_m,b_1,\dots,b_q), where \sigma^2_\epsilon is the variance of \epsilon_t, A(L) = 1 - \sum_{i=1}^m a_i L^i and B(L) = 1 + \sum_{i=1}^q b_i L^i.

The likelihood function in the frequency domain requires the spectral density function of v_t + \epsilon_t

f(\lambda; \theta) = \frac{ \sigma_\eta^2 | B( e^{-i \lambda})|^2}{ 2 \pi |1 - e^{-i\lambda}|^{2d} | A(e^{-i\lambda})|^2} + \frac{\sigma_\epsilon^2}{2\pi}

Using this spectral density, the log likelihood is

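L_T(\theta) = T^{-1} \sum_j [ \log f(\lambda_j;\theta) + \frac{I_x(\lambda_j)}{f(\lambda_j;\theta)} ]

Here the sum runs over the Fourier frequencies \lambda_j = 2\pi j/T, and I_x is the periodogram of x_t = \log(r_t^2).  Since \log(r_t^2) = \log \sigma^2 + E[\log(e_t^2)] + v_t + \epsilon_t, the series x_t is a constant plus v_t + \epsilon_t, which is why f is the relevant spectral density.  With this sign convention L_T is the negative of the Whittle log likelihood up to constants, so the estimate minimizes it.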

Luckily, the log likelihood of the stochastic volatility model combines the periodogram of log(r_t^2), computed from a discrete Fourier transform, with an explicitly computable spectral density for the long memory term. The entire infrastructure is fascinating to me because this type of model captures some very important and subtle properties of a natural phenomenon: the return series of actual markets.
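To make the Whittle objective concrete, here is a minimal numpy sketch for the simplest case A(L) = B(L) = 1, that is, v_t following ARFIMA(0,d,0). The function names and the periodogram normalization 1/(2 \pi T) are my assumptions for illustration, not taken from McCloskey or from any package.

```python
import numpy as np

def periodogram(x):
    """Fourier frequencies lambda_j = 2*pi*j/T, j = 1..floor((T-1)/2), and I_x(lambda_j)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    x = x - x.mean()  # removes the constant log(sigma^2) + E[log e_t^2]
    w = np.fft.fft(x) / np.sqrt(2.0 * np.pi * T)  # assumed normalization
    j = np.arange(1, (T - 1) // 2 + 1)
    return 2.0 * np.pi * j / T, np.abs(w[j]) ** 2

def lmsv_spectral_density(lam, d, sigma2_eta, sigma2_eps):
    """f(lambda; theta) for v_t ~ ARFIMA(0, d, 0) plus white noise epsilon_t."""
    signal = sigma2_eta / (2.0 * np.pi * np.abs(1.0 - np.exp(-1j * lam)) ** (2.0 * d))
    return signal + sigma2_eps / (2.0 * np.pi)

def whittle_objective(theta, lam, I, T):
    """L_T(theta) = T^{-1} sum_j [log f(lambda_j) + I(lambda_j) / f(lambda_j)], to be minimized."""
    d, sigma2_eta, sigma2_eps = theta
    f = lmsv_spectral_density(lam, d, sigma2_eta, sigma2_eps)
    return float(np.sum(np.log(f) + I / f) / T)
```

In use one would compute `lam, I = periodogram(np.log(r ** 2))` for a return series r with no exact zeros and pass `whittle_objective` to a numerical minimizer over (d, \sigma^2_\eta, \sigma^2_\epsilon), with 0 < d < 1/2 and positive variances.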

For ARFIMA processes, R has several packages.  For me what has worked is the forecast package and the fracdiff package.  The parameters of the ARFIMA are estimated using maximum likelihood.
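For readers who want to see what the fractional difference operator (1-L)^d does without R, the following sketch computes its binomial expansion coefficients with the standard recursion \pi_0 = 1, \pi_k = \pi_{k-1} (k-1-d)/k and applies the truncated filter. It is an illustration only, not what the fracdiff package does internally, and the function names are hypothetical.

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """Coefficients pi_k of (1 - L)^d = sum_k pi_k L^k for k = 0..n_weights-1."""
    if n_weights < 1:
        raise ValueError("n_weights must be at least 1")
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def frac_diff(x, d):
    """Apply (1 - L)^d to x, truncating the filter at the start of the sample."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, x.size)
    # y_t = sum_{k=0}^{t} pi_k * x_{t-k}
    return np.array([np.dot(w[: t + 1], x[t::-1]) for t in range(x.size)])
```

With d = 1 the weights reduce to (1, -1, 0, ...), so the filter is ordinary first differencing; for 0 < d < 1/2 the weights decay slowly, which is the long memory.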


The problem is predicting future stock market returns given not only the past series but also an auxiliary series of predictors.  Quite generally one has a possibly high dimensional series of predictors Z_t and one is interested in predicting r_{t+1} given r_t and Z_t.  In a practical situation one might have a rolling set of regression type models; in our case it happens to be an SVR model.  The model fitted on a window of time is not quite the model we would like, because it describes the window that has just passed rather than the period being predicted.  The idea is to use standard time series forecasting on the fitted parameters and then use the new model for the final prediction.  The question arises of whether this extra effort is justified.  A reasonable answer is that the effort is fully justified to the extent that one trusts the regression type model to capture a relation between the auxiliary series Z_{t-h} and r_{t-h+1} at lag h that is more stable than the series r_t itself.  Even with informative predictors, the problem of predicting future returns is quite challenging because the regression type model needs to adapt as the relation evolves over time.  The parameters of the SVR consist of the weights of the support vectors as well as a parameter vector for the decision function.  Other regression type models have a similar set of parameters.
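The idea of forecasting the fitted parameters can be sketched as follows. To keep the example self-contained I swap the SVR for ordinary least squares, whose coefficients are easy to track over time, and I forecast each coefficient series with a simple AR(1) fit. Both choices, and all function names, are assumptions made for illustration rather than the method I actually use.

```python
import numpy as np

def rolling_coefficients(Z, r, window):
    """Fit r_{t+1} ~ [1, Z_t] by least squares on each rolling window.

    Row k holds the coefficients fitted on pairs (Z_t, r_{t+1}), t = k..k+window-1.
    """
    Z = np.asarray(Z, dtype=float)
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones(len(Z) - 1), Z[:-1]])  # predictors at time t
    y = r[1:]                                           # target at time t + 1
    coefs = []
    for k in range(len(y) - window + 1):
        beta = np.linalg.lstsq(X[k:k + window], y[k:k + window], rcond=None)[0]
        coefs.append(beta)
    return np.array(coefs)

def forecast_next_coefficients(coefs):
    """One-step forecast of each coefficient series from an AR(1) with intercept."""
    nxt = np.empty(coefs.shape[1])
    for j in range(coefs.shape[1]):
        c = coefs[:, j]
        A = np.column_stack([np.ones(len(c) - 1), c[:-1]])
        a, b = np.linalg.lstsq(A, c[1:], rcond=None)[0]
        nxt[j] = a + b * c[-1]
    return nxt

def predict_next_return(Z, r, window):
    """Predict the next return using the forecasted, not the last fitted, coefficients."""
    beta_next = forecast_next_coefficients(rolling_coefficients(Z, r, window))
    z_last = np.atleast_1d(np.asarray(Z, dtype=float)[-1])
    return float(beta_next[0] + np.dot(beta_next[1:], z_last))
```

When the relation between Z_t and r_{t+1} is constant, the forecasted coefficients coincide with the fitted ones and nothing is gained; the extra step only matters when the coefficient series drifts in a forecastable way.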


Science has its entrenched ideologies.  The dominant orthodoxy of empiricists fought tooth and nail against four macroscopic spatial dimensions throughout the twentieth century, which culminated in avoiding the obvious implications of the 5, 8, 10 and 12 fold rotational symmetries observed in crystals, originally by Daniel Shechtman, by giving a Nobel prize in 2011 for ‘the discovery of quasicrystals’.  In the same way, ideologies have developed regarding the financial markets.  Throughout the twentieth century, entrenched positions developed around the issue of ‘efficient markets’.  On the academic side this meant essentially the random walkers, who believe there is no predictive component to the return series.  Among market practitioners, there have developed many theories of market prediction based on technical indicators.  In actual trading, on the other hand, the ‘fundamental analysis’ of value has not very often been useful for those ‘seeking alpha’, the term that came from the CAPM model of William Sharpe and others, with alpha and beta representing regression coefficients against the market.  The random walkers’ picture of the market has been under assault for a long time by many different people, but if I had to choose the most rigorous blow to the theory with strong evidence, I would point to the long memory property clearly addressed by Zhuanxin Ding, Clive Granger and Robert Engle in ‘A Long Memory Property of Stock Market Returns and a New Model’, J. Empirical Finance, 1993, 83–106.  They highlight the fact that stock returns generally do not have significant autocorrelations beyond lag 1, but powers of the absolute value of daily returns have significant autocorrelations going back up to ten years.
This very long memory property of stock returns is sometimes referred to as a ‘stylized fact’, but it is in my opinion a significant breakthrough in our understanding of the natural phenomenon that constitutes the markets.  While the random walkers are definitely wrong, it happens not to be a trivial problem to produce useful predictability of the markets either.  As an interesting example of the line drawn between the past and future, consider my attempts at using a number of indicators to predict stock price movement, where the important issue I had tried to set aside is the time component.  I use 1000 days of indicators and lagged returns to fit a model and several thousand days to test its predictions of future returns.  Across hundreds of stocks I see that the model (which I fit using support vector regressions) has a training accuracy for direction almost uniformly above 65% but a test accuracy almost uniformly close to 50%.  Naively one might say this is evidence for random-walk type efficiency, but it is a more subtle phenomenon: in order to predict future returns correctly, one must pay attention to the delicate time-evolution component of any model.  I will come back to comment on this, but the issue is that the ‘flattening of time’, even with timed indicators, is sufficient to destroy the predictability that is being discovered by the SVR regression.
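The contrast that Ding, Granger and Engle point to can be checked on any return series with the standard sample autocorrelation estimator. The sketch below is a plain numpy version; the helper names are hypothetical, and `power` plays the role of the exponent applied to absolute returns.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def abs_power_acf(returns, power, max_lag):
    """Sample autocorrelations of |r_t|^power, the quantity studied for long memory."""
    r = np.asarray(returns, dtype=float)
    return sample_acf(np.abs(r) ** power, max_lag)
```

For daily returns one would compare `sample_acf(r, 2500)` with `abs_power_acf(r, 1.0, 2500)`; the long memory property is the slow decay of the second, not the first.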

That the ‘flattening of time’ is responsible for the loss of the 65% training accuracy (for direction prediction) out of sample is reinforced by work yesterday showing that in many cases a simple-minded adaptive algorithm with a rolling time window increases the out-of-sample prediction accuracy from 50% to 52-55% on most of the stocks.  Although useful, this is not extremely impressive, but it does nullify the random walkers’ claim that prediction is impossible.  The right sort of thing to do to improve these weak predictability results is to examine the estimated parameters of the SVR over time, which one would expect to be more stable than the return series itself for various reasons, then use time series forecasting methods on these, and then use the modified model for the actual prediction.  The issue here is that we are attempting to use a relation between a number of indicator series and the future return, a relation that is encapsulated by a series of fitted models over a rolling window.  The evolution of the relation is critical to having a higher probability of correct prediction, and it might be stable enough that we can apply standard time series forecasting methods.
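A rolling-window evaluation of the kind described above can be sketched as a walk-forward loop: before each forecast the model is refit on only the most recent window of data, and the sign of the forecast is scored against the sign of the realized return. As a stand-in for the SVR I use least squares here, and the function name is hypothetical; this shows the bookkeeping, not my actual experiment.

```python
import numpy as np

def walk_forward_hit_rate(Z, r, window):
    """Refit on the latest `window` pairs before each forecast and score direction."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones(len(Z)), Z])
    hits, total = 0, 0
    # forecast r[t + 1] from Z[t], training only on pairs (Z[s], r[s + 1]) with s + 1 <= t
    for t in range(window, len(r) - 1):
        train_X = X[t - window:t]
        train_y = r[t - window + 1:t + 1]
        beta = np.linalg.lstsq(train_X, train_y, rcond=None)[0]
        forecast = float(np.dot(X[t], beta))
        if r[t + 1] != 0.0:  # skip days with no direction to predict
            hits += int(np.sign(forecast) == np.sign(r[t + 1]))
            total += 1
    return float(hits) / total if total else float("nan")
```

The important detail is that every training target is dated no later than the forecast origin t, so no future information leaks into the fit.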


The model for returns and the auxiliary series of predictors is not necessarily correctly specified as a multivariate normal sequence, but the important thing for us is to be able to incorporate auxiliary predictor sequences directly in an HMM, so that we can use the conditional distributions to predict direction and large moves.  I have obtained good results on prediction using machine learning algorithms such as SVM.  Conceptually, the HMM framework is appealing for many reasons, including a natural explanation of the leptokurtosis of return distributions as well as the idea that the hidden states represent unobserved economic and other factors.  Here is the code.

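import argparse
import datetime
import sys

import numpy as np
import pandas as pd
import scipy
import scipy.stats
import ghmm
import talib
from scipy.stats import nanmean, nanstd
from sklearn import preprocessing
# pandas.io.data was later split out into the separate pandas_datareader package
from pandas.io.data import DataReader
# presumably provides subselect_stocks and fill_nan, which are not defined here
from equity_utils import *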

symbols = []
f = open( 'symbols-CBOE-opt.txt','r')
for line in f:
    # assumed file layout: one ticker symbol per line
    symbols.append( line.strip() )
f.close()

symbols = subselect_stocks( symbols, datetime.date(1999,1,1))

parser = argparse.ArgumentParser(description='Trading strategy using language triple word probabilities')

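def print_conditionals( T, S, t=0, label='RSI'):
    # mean and std of T where S is above the threshold t, then (assumed,
    # the original line was cut off here) where S is at or below it
    print('%s (%f,%f) (%f,%f)' % (label,nanmean(T[S>t]),nanstd(T[S>t]),
                                  nanmean(T[S<=t]),nanstd(T[S<=t])))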

def positive_probability_HMM( matrices, last_state, positionOfReturns, x_obs=None):
    # Probability that the return coordinate is positive at the next step,
    # as an expectation over the possible next states (weighted by the
    # transition row of last_state) and their Gaussian emissions.
    A = matrices[0]
    B = matrices[1]
    A_row = A[last_state]
    p = 0.0
    for i in range(len(A_row)):
        # emission parameters of state i: mean vector and flattened covariance
        muv = np.array(B[i][0], dtype=float)
        n_features = len(muv)
        sigmaM = np.array(B[i][1], dtype=float).reshape(n_features,n_features)

        # Using marginals
        mu = muv[positionOfReturns]
        var = sigmaM[positionOfReturns,positionOfReturns]

        # Using conditionals, only possible when the other coordinates are observed
        if x_obs is not None:
            rest = [k for k in range(n_features) if k != positionOfReturns]
            sigma22 = sigmaM[np.ix_(rest,rest)]
            sigma12 = sigmaM[positionOfReturns,rest]
            w = np.linalg.solve(sigma22,sigma12)
            mu = mu + np.dot(w, np.asarray(x_obs, dtype=float)[rest] - muv[rest])
            var = var - np.dot(w,sigma12)

        # scale is a standard deviation, so take the square root of the variance
        p += A_row[i]*(1.0 - scipy.stats.norm.cdf(0.0, loc=mu, scale=np.sqrt(var)))
    return p

def score_hmm( hmm,X,y ):
    # fraction of steps at which the predicted direction matches the sign in y
    sigma = ghmm.Float()
    S = ghmm.EmissionSequence(sigma, X.flatten().tolist())
    viterbi_path, _ = hmm.viterbi(S)
    matrices = hmm.asMatrices()

    hits = 0
    N = min( len(viterbi_path), len(y) )
    positionOfReturns = len(X[0]) - 1
    for i in range(N):
        p = positive_probability_HMM( matrices, viterbi_path[i], positionOfReturns )
        pos = np.sign( p - 0.5 )
        if pos == y[i]:
            hits += 1
    if N == 0:
        return 0.0
    return float(hits)/float(N)

def hmm_test_performance( symbol ):
    fname = 'eod/'+symbol+'.csv'
    data = pd.read_csv( fname, skiprows=1,names=['Date','Open','High','Low','Close','Volume'],na_values=['-'])
    rets = np.log(data['Close']/data['Close'].shift(1))
    open_ = fill_nan(data['Open'])  # trailing underscore avoids shadowing the builtin open
    high = fill_nan(data['High'])
    low = fill_nan(data['Low'])
    close = fill_nan(data['Close'])
    volume = fill_nan(data['Volume']).astype(float)

    # Let's add in reference market returns, SPY
    start = datetime.datetime.strptime(data['Date'].values[0],'%Y-%m-%d')
    finish = datetime.datetime.strptime(data['Date'].values[-1],'%Y-%m-%d')
    spydata = DataReader('SPY', 'google', start, finish)
    spyrets = np.log(spydata['Close']/spydata['Close'].shift(1)).values

    # I want to analyze the conditional distributions of returns
    # on technical analysis indicators

    t1 = talib.RSI( np.array(close.values))
    t2,_,_ = talib.MACD(np.array(close.values))
    t3 = talib.ADX(np.array(high.values),np.array(low.values),
                   np.array(close.values))
    t4 = talib.OBV(np.array(close.values),np.array(volume.values))
    t5 = talib.MFI(np.array(high.values),np.array(low.values),
                   np.array(close.values),np.array(volume.values))

    X = []
    y = []

    N = min( len(spyrets),len(t1),len(t2),len(t3),len(t4),len(t5))

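    for i in range(1,N):
        x = [t1[i],t2[i],t3[i],t4[i],t5[i],rets[i-1],spyrets[i-1]]
        if not np.isnan(np.sum(x)):
            X.append( x )
            # target kept in step with X; the original line was lost,
            # so using the same-day return here is an assumption
            y.append( rets[i] )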

    train_rows = 1000

    X_scaled = preprocessing.scale(np.array(X))
    Xtrain = X_scaled[:train_rows,:]
    Xtest  = X_scaled[train_rows:,:]
    y_train = y[:train_rows]
    y_test = y[train_rows:]

    #from sklearn.svm import SVC
    #clf = SVC(kernel='rbf',gamma=0.01)

    sigma = ghmm.Float()
    pi = [1./5]*5
    A = [ [1./5]*5]*5

    n_feature = Xtrain.shape[1]
    mu0 = [ 0.0 ]*n_feature
    sigma0 = np.eye(n_feature).flatten().tolist()
    B = [[mu0,sigma0,[1.0]]]*5
    S = ghmm.EmissionSequence(sigma,Xtrain.flatten().tolist())
    hmm = ghmm.HMMFromMatrices( sigma, ghmm.MultivariateGaussianDistribution(sigma),A,B,pi)
    # fit the HMM to the training sequence by Baum-Welch; this call appears to
    # have been lost, since S is built above but otherwise never used
    hmm.baumWelch(S)

    accuracy_train = score_hmm(hmm, Xtrain,np.sign(y_train))
    accuracy_test  = score_hmm(hmm, Xtest,np.sign(y_test))

    return (np.round(accuracy_train,3),
            np.round(accuracy_test,3))

output = open('results.csv','w')

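for s in symbols:
    (tr,te) = hmm_test_performance(s)
    # one row per symbol: train and test direction accuracy
    output.write('%s,%f,%f\n' % (s,tr,te))
output.close()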
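
The ‘Using conditionals’ step in the code above rests on the standard formula for conditioning one coordinate of a multivariate normal on the others. The sketch below isolates that formula in plain numpy so it can be checked independently of ghmm; the function names are hypothetical and the normal CDF is written with math.erf to avoid extra dependencies.

```python
import math
import numpy as np

def conditional_normal(mu, Sigma, target, x_obs):
    """Mean and variance of coordinate `target` given the other coordinates of x_obs."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    rest = [k for k in range(len(mu)) if k != target]
    S22 = Sigma[np.ix_(rest, rest)]
    S12 = Sigma[target, rest]
    w = np.linalg.solve(S22, S12)  # Sigma_22^{-1} Sigma_21
    cond_mean = mu[target] + np.dot(w, x_obs[rest] - mu[rest])
    cond_var = Sigma[target, target] - np.dot(w, S12)
    return float(cond_mean), float(cond_var)

def prob_positive(mean, var):
    """P(X > 0) for X ~ N(mean, var); the standard deviation is sqrt(var)."""
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

Two details are easy to get wrong: the conditional mean uses the deviation of the observed predictors from their mean, x_obs - mu, and the CDF needs a standard deviation rather than a variance.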

