Archive for March, 2012

Imagine for a moment that we are at the end of 1900.  Physical science is in crisis: the Rayleigh-Jeans law dismally fails to fit the intensity distribution of a blackbody, and the instability of the classical atom, in which electrons move in circular orbits around a positively charged nucleus, is a serious anomaly because it throws open the question of the stability of all matter.  Planck’s solution of the blackbody radiation problem relied on the hypothesis that energy is quantized.  It does not require a detailed theory of how and why energy is quantized, only that it is.  In the same year, Ivar Fredholm resolved a great open problem of nineteenth-century mathematics: he showed that a bounded domain in the plane has a discrete set of pure tones, that is, a discrete set of eigenvalues of the Laplacian.  Now while the observables-as-operators approach of quantum mechanics describes observables by eigenvalues of operators, no direct link had been made in 1900 between the quantization of energy and the pure tones of a geometric object: the entire universe.

The spectrum of the Laplacian on noncompact manifolds need not be discrete; even on Euclidean spaces it is not discrete.  Fredholm’s discreteness result for the spectrum of the Laplacian on bounded domains in the plane could therefore be connected to the energy spectrum only if the universe could be shown to be compact.  If we seek such evidence, we can find it in Penzias and Wilson’s 1964 discovery of the cosmic background radiation, which was instead taken as evidence for Big Bang cosmology and expanding-universe models.

We can proceed as follows:  first, although the Nobel Prize in Chemistry was given to Daniel Shechtman in 2011 for ‘the discovery of quasicrystals’, we can note that what Shechtman and later workers clearly found, beginning in the early 1980s, is that 5, 8, 10, and 12 fold rotational symmetries are observed in crystalline structures.  These can arise as symmetries of four dimensional crystals but not of three dimensional crystals.  Shechtman’s discoveries can be used parsimoniously as evidence for the existence of four macroscopic spatial dimensions.  Of course the actually observed ‘quasi-crystals’ do not have translational invariance, because these rotational symmetries are incompatible with translational invariance in three dimensions, but even the standard models of quasicrystals use projections of higher dimensional crystals, which makes the conclusion of the existence of four macroscopic spatial dimensions reasonable.

Next, although the redshift of distant galaxies has been interpreted as a universal Doppler effect, a parsimonious conclusion would be that it results from some mechanism, such as frictional drag, that reduces the energy of the photons uniformly and thereby lowers the frequency of light towards red.

Now the cosmic background radiation has a uniform lower bound, around 2.7 K.  Assuming that diffusion produced the cosmic background radiation, we can use the Gaussian upper bounds established for heat kernels on complete noncompact Riemannian manifolds with a lower bound on Ricci curvature to conclude that our actual universe must be compact.

So we are living in a compact four dimensional universe.  It has to be a sphere if it is to replicate the quantization of energy for the hydrogen atom.  We can deduce that the radius of the universe must be 1/h in length-equivalent where h is Planck’s constant.  Ricci curvature for any three dimensional submanifold can be calculated using the second fundamental form as well which gives us a version of the gravitational field equations with the ‘cosmological constant’ filled in: it is h^2, and this matches experimental determination of the cosmological constant.

The model of photons often employed locally is a superposition of plane waves.  If we take seriously the description of the universe above, then we must replace the plane waves by spherical harmonics for a 4-sphere.  Recall that one can consider harmonic homogeneous polynomials on 5-dimensional Euclidean space.  Restricted to the 4-sphere, these are precisely the eigenfunctions of the spherical Laplacian: on the n-sphere the degree k harmonic homogeneous polynomials have eigenvalue -k(k+n-1), so on the 4-sphere the eigenvalue is -k(k+3).

Read Full Post »

The universe is an eternal stationary four-dimensional sphere governed by the single force of deterministic SU(2) electromagnetism.  We experience the material three dimensional universe by our usual senses, but then there is the question of how four spatial dimensions can work.  Although most of us are not fully attuned to it, the pineal gland, the so-called third or inner eye, is a literal visual organ.  Magnetic monopoles are abundant in the four dimensional universe and the third eye is capable of seeing objects constructed from magnetic monopoles.  The objectivity of metaphysical reality could be questioned, but the examples of shared metaphysical experiences show us that metaphysical realities cannot fruitfully be classified as purely subjective.

The fact that a parsimonious grand unification theory consistent with the major features of the observed world can be achieved reasonably simply using the 4-sphere model of the universe is further support for this picture of the universe.  Quantum mechanics is a linear approximation of the S4 theory which attempts to explain four-dimensional phenomena by a stochastic component and special rules for subatomic events.  Classical physics on a 4-sphere can overcome the foundational problems that led to quantum mechanics: the blackbody radiation problem is resolved by quantization of energy, but this quantization occurs even for classical physics on a 4-sphere; and an electron orbiting a positively charged nucleus, which is unstable in flat 3D space, is not unstable on a 4-sphere, so we do not have a stability of matter problem with S4 physics.  Sharper equations than the gravitational field equations appear when we consider the Ricci curvature of a three dimensional submanifold of a 4-sphere of radius 1/h in length equivalent, where h is Planck’s constant.

These observations lead already to some significant answers to questions of interest to us.  There cannot have been a creator-God of the 4-sphere universe which has existed for infinite time in the past but at the same time, we must understand that living in a four-dimensional reality, human beings are naturally metaphysical beings and indeed we can identify our four-dimensional extensions as our souls.

We then know that we inhabit not only the same physical world but also share a single metaphysical universe, regardless of how complex or large it might be.  Indeed, we have an answer to how large the metaphysical universe is: it is a four-dimensional sphere of fixed radius and hence is compact with fixed 4D volume.

 

Read Full Post »

We have multiple popular mythologies and stories of natural life on our planet.  The naturalistic explanation of life from a primordial chemical pool and based on the principles developed in modern biology stands at one end of the spectrum, and on the other stand the mythologies of our religions of divine creation of life.  But it is worthwhile for us to consider the question of viruses because when we examine viruses closely, we note that they are extremely engineered microtechnological devices.

Viruses are not alive but are capable of great damage to living organisms.  Their story does not belong strictly either to the principles of life enunciated by the scientific tradition or to the mythological stories of the creation of life.  Thus the question of how they came to be becomes significant.  They seem as though they were designed by conscious intelligence, that they were engineered, and that they were engineered not within the framework of the ecology of life.  I believe that they are technological weapons created by non-human beings, and for the sake of giving a name to these beings, I shall call the beings responsible for the creation of viruses ‘dragons’ without providing further descriptions of such beings.  I believe that viruses were created by dragons in their wars against each other, which had little to do directly with human beings or indeed with life on Earth.  Viruses are found in every ecosystem on Earth and are the most abundant type of biological entity.

The conjectural explanations of the evolution of viruses, that they evolved from plasmids (pieces of DNA that move between cells) or that they evolved from bacteria, seem to beg the question.  Any coherent picture of life must address the origin of viruses because these biological entities have a role in the natural world that contradicts the stories of life that can be developed from observing the rest of the spectrum of life.

Read Full Post »

The conceptual categorization of statistical learning methods provided by Hastie and Tibshirani is enormously insightful.  The supervised learning problem is to produce a method for predicting y from a p-dimensional input x using training data (x_i, y_i).  Hastie and Tibshirani give us two extremes: the linear model, which imposes a global linear structure on the problem, and the nearest neighbor model, which directly imposes a locally constant structure on the problem.  Both approximate the regression function E(Y|X=x).  Then they point out that other supervised learning methods fall between these extremes.  For example, for one-dimensional x, minimizing the penalized RSS
\sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \int f''(x)^2 \, dx

interpolates between no restriction on the shape of f when \lambda = 0 and forcing f to be linear when \lambda = \infty.  They point out that the biggest technical problem in high dimensions is that the neighborhoods of individual points x_i needed to capture enough training data are large, and therefore high dimension p is the central issue facing the learning problem.
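
To make the role of \lambda concrete, here is a minimal Perl sketch.  It assumes evenly spaced inputs and replaces the curvature integral by a sum of squared second differences of the fitted values, which is a simplification made for this sketch and not the estimator from the book.  The function name prss and the toy data are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Penalized residual sum of squares for fitted values on an evenly
# spaced grid (spacing $h).  The roughness penalty \int f''(x)^2 dx is
# approximated by a sum of squared second differences; this is an
# assumption of the sketch.
sub prss {
    my ($y, $f, $lambda, $h) = @_;
    my $n   = scalar @$y;
    my $rss = 0;
    $rss += ($y->[$_] - $f->[$_])**2 for 0 .. $n - 1;
    my $rough = 0;
    for my $i (1 .. $n - 2) {
        my $d2 = ($f->[$i-1] - 2*$f->[$i] + $f->[$i+1]) / ($h*$h);
        $rough += $d2 * $d2 * $h;
    }
    return $rss + $lambda * $rough;
}

# Toy data (invented): noisy points around the line y = x.
my @y      = (0.1, 0.9, 2.2, 2.8, 4.1);
my @interp = @y;                 # interpolates the data: zero RSS, but rough
my @line   = (0, 1, 2, 3, 4);    # straight line: zero roughness

for my $lambda (0, 1, 100) {
    printf "lambda=%g interpolant=%.3f line=%.3f\n", $lambda,
        prss(\@y, \@interp, $lambda, 1), prss(\@y, \@line, $lambda, 1);
}
```

With \lambda = 0 only the residuals count, so the interpolating fit scores best; once \lambda is large the straight line, whose second differences vanish, is preferred.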

In the protein shape determination problem, a priori the number of dimensions of the problem can be quite high, but our approach relies on a simplification which restricts the dimension of the problem.  We select the backbone nitrogen (N) atom of each amino acid, so that the locus of the N atom serves as a proxy for the locus of the entire amino acid.  Furthermore, we encode protein shapes as sequences of elements from a fixed grid on SO(3), transforming the protein shape determination problem into a literal language translation problem.

 

 

Read Full Post »

All that exists is an eternal four-spherical universe governed by deterministic laws of SU(2) electromagnetism where all that is metaphysical is composed of magnetic monopoles governed also by electromagnetic laws. Although the metaphysical architecture is complex and, despite ten thousand years of human effort, largely unclear, there is always the question of the spiritual ideal of Justice, whether there are greater spirits which uphold it, whether human souls are under the influence of any universal system of justice. Since the entire metaphysical universe can be usefully thought of as a gigantic electromagnetic toy with many features hidden from human knowledge, one could consider ‘the currents under the sea’ as the force of karma. Locally, karmic laws have not only been broken but overturned: any of the massacres of children and innocents we know of from the 20th century suffices to show this. For political convenience, two of my examples are the massacre of more than 300 Palestinian children in Gaza and the annihilation of around a million unarmed German POW in 1945 which virtually disappeared from history. On the other hand, a justice system requires a metaphysical power structure that is dominant in a sense. For the past two millennia, we had the dominance of the Abrahamic religions and the powers that back these.

 

Read Full Post »

We have lived through the utilitarian century par excellence, one which saw the productive capacity of science transform us into a statistical civilization.  Since 1945 science in America has been fully funded and controlled by military interests, which is not surprising because it was the atomic bomb that gave the imperials global supremacy. The fixation with the material has thrown man as a metaphysical being into the domain of primitive religions. Without a recognition of human beings as metaphysical beings, we have long been evaluated by the cover, even if the cover includes strong or weak bones or the ability to perform repetitive tasks, so that our rulers feel a rationality in their panic at the population and at the movement towards robots doing much of the ‘work’ of humans, and therefore see a necessity for wars and other events for population control. But if we do not have a firm idea of who we are and where we live, how can we have any idea of how best to organize our society?

With a system of 7 billion people, it is impossible to avoid a sort of statistical governance, where human beings are classified by features convenient for administration.  One of the more contentious issues has been the classification by metrics such as IQ, which Stephen Jay Gould’s The Mismeasure of Man addresses.  But one need not wade into controversial discussions to recognize the statistical governance of society: in economic life, the primary metric is the credit rating, which is managed by global institutions.  Unfortunately, the basis of the science behind this statistical governance is an extremely material science, which does not take into account the fundamental fact that the universe has four macroscopic spatial dimensions and that metaphysical events are objective events as well.  With the focus of science on military dominance, we have had a fundamental misidentification of human beings.  It’s an old misidentification rather than a new one: the classification of human beings into rulers and slaves. The current system has managed to thoroughly commoditize human beings as a potential labor force for the economy.  From this misdefinition follows the concern about overpopulation as a diseased state and hence ‘resolutions’ of this problem with ‘rational’ views of population reduction through wars and famine.  Much of the sort of labor that corporations need can be done by robotic machinery.  The current financial crisis, with America at 8.3% headline unemployment and Europe at over 10.7% unemployment, is an opportunity for us to rethink the definition of what we should consider as human and to rethink as well the sort of politico-economic organization that can address the challenges facing us as a race.  
It is extremely myopic to consider our fundamental challenges to be about ‘economic competition’ because there are more fundamental goals that are largely unmet, such as basic fair and free operation of human beings in safety and with opportunities for achievement and freedom.  These cannot come without a re-examination of human beings as talent rather than labor.

Read Full Post »

The motivating problem is that we have 23,000 protein shapes and we would like to treat the protein shapes as arising from a language translation problem, where the source language is composed of words over an alphabet of amino-triples and the target language has an alphabet consisting of ‘twists’, or elements from a fixed discrete grid on SO(3).  The three-dimensional rotations in SO(3) each have an axis-angle decomposition, which can be modeled as the Cartesian product of a 2-sphere and a line interval representing angles.  The nontrivial part of the discretization problem of SO(3) is the discretization of the 2-sphere.
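
The axis-angle decomposition can be sketched as follows.  This is a minimal illustration, assuming the rotation is given as a 3x3 matrix stored as an array of rows; the function name axis_angle is hypothetical and is not part of discretize.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decompose a 3x3 rotation matrix (array of rows) into a unit axis on
# the 2-sphere and an angle in [0, pi].
sub axis_angle {
    my ($R) = @_;
    # The antisymmetric part of R gives the axis scaled by 2 sin(angle).
    my @w = ( $R->[2][1] - $R->[1][2],
              $R->[0][2] - $R->[2][0],
              $R->[1][0] - $R->[0][1] );
    my $norm  = sqrt( $w[0]**2 + $w[1]**2 + $w[2]**2 );   # = 2 sin(angle)
    my $trace = $R->[0][0] + $R->[1][1] + $R->[2][2];     # = 1 + 2 cos(angle)
    my $angle = atan2( $norm / 2, ($trace - 1) / 2 );
    # Near angle 0 or pi the antisymmetric part vanishes and the axis
    # cannot be recovered this way; this sketch returns undef for it.
    return ( undef, $angle ) if $norm < 1e-9;
    return ( [ map { $_ / $norm } @w ], $angle );
}

# Example: a quarter turn about the z-axis.
my $Rz = [ [ 0, -1, 0 ],
           [ 1,  0, 0 ],
           [ 0,  0, 1 ] ];
my ($axis, $angle) = axis_angle($Rz);
printf "axis=(%.3f, %.3f, %.3f) angle=%.5f\n", @$axis, $angle;
```

The axis is the point on the 2-sphere that has to be discretized; the angle lives on the interval and can be binned directly.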

Our solution to the discretization of the 2-sphere is to inscribe an icosahedron in the 3-ball whose surface is the 2-sphere and to consider a subdivision of its 20 equilateral triangles, which can be projected onto the 2-sphere.  A necessary component of this strategy is a planar point-in-polygon algorithm, which is a well-studied problem in computational geometry.  Here is a discussion of the efficiency of this algorithm.

Given a point on the 2-sphere, we find the vertices of the triangle of the fixed icosahedron to which the point belongs.  For each triple of vertices that forms a triangle of the icosahedron, we apply a linear transformation under which the three vertices all lie in the x-y plane, and then use a trusted planar point-in-polygon routine to determine whether the image of the test point lies within the image triangle.
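
As a cross-check on the flattening approach, the membership test can also be done without a planar routine.  The sketch below is an alternative technique, not the method used in discretize.pl: it writes the test point as a combination of the three vertices and checks that all coefficients are non-negative.  The function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Determinant of the 3x3 matrix whose columns are the vectors $u, $v, $w.
sub det3 {
    my ($u, $v, $w) = @_;
    return $u->[0] * ( $v->[1]*$w->[2] - $v->[2]*$w->[1] )
         - $u->[1] * ( $v->[0]*$w->[2] - $v->[2]*$w->[0] )
         + $u->[2] * ( $v->[0]*$w->[1] - $v->[1]*$w->[0] );
}

# Solve p = s*A + t*B + r*C by Cramer's rule and return (s, t, r), or an
# empty list if A, B, C are (nearly) coplanar with the origin.
sub triangle_coefficients {
    my ($p, $A, $B, $C) = @_;
    my $D = det3($A, $B, $C);
    return () if abs($D) < 1e-12;
    return ( det3($p, $B, $C) / $D,
             det3($A, $p, $C) / $D,
             det3($A, $B, $p) / $D );
}

# A point on the sphere lies in the spherical triangle ABC when all
# three coefficients are non-negative (up to a small tolerance).
sub in_spherical_triangle {
    my @coef = triangle_coefficients(@_);
    return 0 unless @coef;
    return ( grep { $_ < -1e-12 } @coef ) ? 0 : 1;
}

# Example with the triangle spanned by the coordinate axes.
my ($A, $B, $C) = ( [1, 0, 0], [0, 1, 0], [0, 0, 1] );
my $s       = 1 / sqrt(3);
my $inside  = [  $s, $s, $s ];
my $outside = [ -$s, $s, $s ];
printf "inside: %d, outside: %d\n",
    in_spherical_triangle($inside,  $A, $B, $C),
    in_spherical_triangle($outside, $A, $B, $C);
```

Because the test only depends on the cone spanned by the vertices, it gives the same answer whether or not the icosahedron vertices have been normalized to unit length.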

The full working code for discretize.pl is here.  We simply use a 2^5 subdivision grid of the triangular face corresponding to the test point and report the grid point.  This direct determination of the grid point is much faster than fitting to an explicit grid.

#!/usr/bin/perl
use Math::MatrixReal;

# icosahedral vertices

my $phi = ( 1 + sqrt(5) ) / 2;
my $icv = [ [ 0, 1, $phi],
[ 0, -1, -$phi],
[ 0, 1, -$phi],
[ 0, -1, $phi],

[ 1, $phi, 0],
[ -1, -$phi, 0],
[ 1, -$phi, 0],
[ -1, $phi, 0],

[ $phi, 0, 1],
[ -$phi, 0, -1],
[ -$phi, 0, 1],
[ $phi, 0, -1]
];

# Let’s determine the triangle vertices by edge-length 2 criterion

my $edge_indices;
my $eps = 0.00001;
for (my $i = 0; $i < 12; $i++) {
for (my $j = 0; $j < 12; $j++) {
for (my $k = 0; $k < 12; $k++) {

if (abs(&sdist(@{$icv->[$i]},@{$icv->[$j]})-2) < $eps &&
abs(&sdist(@{$icv->[$j]},@{$icv->[$k]})-2) < $eps &&
abs(&sdist(@{$icv->[$i]},@{$icv->[$k]})-2) < $eps ) {
push @$edge_indices, [$i,$j,$k];
}
}
}
}

for (my $i = 0; $i < 12; $i++) {
$icv->[$i]->[0] /= sqrt( 1 + $phi * $phi );
$icv->[$i]->[1] /= sqrt( 1 + $phi * $phi );
$icv->[$i]->[2] /= sqrt( 1 + $phi * $phi );
}

# Let’s prepare some tests to choose the correct edge indices
# from the icosahedron
my @d;
my @fit;
my $mfit;

while (<>) {

chomp;
@d = split ",", $_;
my $x = $d[3];
my $y = $d[4];
my $z = $d[5];
my @v1 = ($x, $y, $z);
my $v = Math::MatrixReal->new_from_cols( [ [$x, $y, $z] ]);
my @vert;

my $transform;
my ( $s1, $s2, $s3 );
foreach my $vertices (@$edge_indices) {

my @ic0 = @{$icv->[ $vertices->[0] ]};
my @ic1 = @{$icv->[ $vertices->[1] ]};
my @ic2 = @{$icv->[ $vertices->[2] ]};

$s1 = &sdist( @v1, @ic0);
$s2 = &sdist( @v1, @ic1);
$s3 = &sdist( @v1, @ic2);

next unless ( $s1 < 2 && $s2 < 2 && $s3 < 2 );

$transform = &flattening_matrix( $icv->[ $vertices->[0] ],
$icv->[ $vertices->[1] ],
$icv->[ $vertices->[2] ] );

next unless abs( $transform ) > 0.001;

my $image = $transform->multiply( $v );

@a_x = (0, 1, 0);
@a_y = (0, 0, 1);

@a = ( $image->element(1,1), $image->element(2,1));

if ( &_pointIsInPolygon( \@a, 3, \@a_x, \@a_y ) ){
@vert = ($vertices->[0],$vertices->[1],$vertices->[2]);
$mfit = Math::MatrixReal->new_from_cols( [ [ int($a[0]*32)/32, int($a[1]*32)/32,0 ] ]);
my $imfit = $transform->inverse->multiply($mfit);
printf "$d[0],$d[1],$d[2],%.5f,%.5f,%.5f,%.5f\n",
$imfit->element(1,1), $imfit->element(2,1) ,
$imfit->element(3,1),$d[6];

last;
}
}

}

sub sdist {
my ($a, $b, $c, $d, $e, $f) = @_;
my $t = 0;
$t += ($a-$d)*($a-$d);
$t += ($b-$e)*($b-$e);
$t += ($c-$f)*($c-$f);
return sqrt($t);
}

# We need a subroutine that, given 3 points in R3 and a test point
# produces the matrix that transforms the 3 points to the x-y plane.
# This transformation can be applied to the test point and then it can
# be determined whether the point projected to the plane of the three
# points lie in the triangle or not

sub flattening_matrix {
my ($pa, $pb, $pc) = @_;

my $Rinv = Math::MatrixReal->new_from_cols (
[ [ $pa->[0] - $pb->[0], $pa->[1] - $pb->[1], $pa->[2] - $pb->[2] ],
[ $pc->[0] - $pb->[0], $pc->[1] - $pb->[1], $pc->[2] - $pb->[2] ],
[ 0, 0, 1 ] ] );
return $Rinv->inverse if $Rinv->det > 0;
$Rinv = Math::MatrixReal->new_from_cols (
[ [ $pa->[0] - $pb->[0], $pa->[1] - $pb->[1], $pa->[2] - $pb->[2] ],
[ $pc->[0] - $pb->[0], $pc->[1] - $pb->[1], $pc->[2] - $pb->[2] ],
[ 0, 0, 2 ] ] );
return $Rinv->inverse;
}

sub _pointIsInPolygon {

my ($a_point, $n, $a_x, $a_y) = @_;

my ($x, $y) = ($a_point->[0], $a_point->[1] );

my @x = @$a_x;
my @y = @$a_y;

my ($i,$j);
my $side = 0;
for ($i = 0, $j = $n - 1; $i < $n; $j = $i++) {
# Toggle the parity when the edge (j,i) crosses the horizontal ray
# running from the test point towards larger x.
if (
(
( ($y[$i] <= $y ) && ( $y < $y[$j] ) ) ||
( ($y[$j] <= $y ) && ( $y < $y[$i] ) )
)
and
( $x
<
( $x[$j] - $x[$i] ) *
( $y - $y[$i] ) / ( $y[$j] - $y[$i] ) + $x[$i] )
) {
$side = not $side;
}
}
return $side ? 1 : 0;
}

This code is reasonably fast in producing discretized twist distributions.  In several hours it had processed around 3,500 protein shapes.  After this discretization step we have before us a vast simplification of the protein shape determination problem into a literal language translation problem.

At this point the key issue is the maximum probability determination of the twist sequence given the amino-triple sequence.  The noisy channel model for translation with an n-gram language model applies literally, and such models have been successful in applications such as translate.google.com.
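
In this setting the noisy channel model picks the twist sequence t that maximizes P(t) P(a | t), where P(t) comes from an n-gram model over twists and P(a | t) scores the observed amino-triples given the twists.  The following is a minimal sketch for the bigram case using Viterbi decoding.  The state labels T1 and T2 and all probabilities are invented for illustration and are not estimated from the protein data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy tables, invented for illustration only.  States are twist labels;
# observations are amino-triple labels.
my @states = ('T1', 'T2');
my %start  = ( T1 => 0.6, T2 => 0.4 );                  # P(t_1)
my %trans  = ( T1 => { T1 => 0.7, T2 => 0.3 },          # P(t_i | t_{i-1})
               T2 => { T1 => 0.4, T2 => 0.6 } );
my %emit   = ( T1 => { 'ILE,PHE,GLU' => 0.8, 'PHE,GLU,MET' => 0.2 },
               T2 => { 'ILE,PHE,GLU' => 0.1, 'PHE,GLU,MET' => 0.9 } );

# Viterbi decoding in log space: returns the most probable state sequence.
sub viterbi {
    my @obs = @_;
    my (@score, @back);
    $score[0]{$_} = log($start{$_}) + log($emit{$_}{$obs[0]}) for @states;
    for my $i (1 .. $#obs) {
        for my $s (@states) {
            my ($best, $arg);
            for my $r (@states) {
                my $cand = $score[$i-1]{$r} + log($trans{$r}{$s});
                ($best, $arg) = ($cand, $r) if !defined $best || $cand > $best;
            }
            $score[$i]{$s} = $best + log($emit{$s}{$obs[$i]});
            $back[$i]{$s}  = $arg;
        }
    }
    # Trace back from the best final state.
    my ($last) = sort { $score[-1]{$b} <=> $score[-1]{$a} } @states;
    my @path = ($last);
    for (my $i = $#obs; $i > 0; $i--) {
        unshift @path, $back[$i]{ $path[0] };
    }
    return @path;
}

my @twists = viterbi('ILE,PHE,GLU', 'PHE,GLU,MET');
print join(' ', @twists), "\n";
```

With a real grid on SO(3) the state set is much larger, so in practice the candidate twists at each position would presumably be restricted to those actually observed for the corresponding amino-triple.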

In an introductory chapter of the great book ‘The Elements of Statistical Learning’ by Hastie, Tibshirani, and Friedman, the authors introduce the two extremes of supervised learning, the linear model and the k-nearest neighbor model.  The first has low variance and high bias, and the latter has low bias and high variance because of the complexity of the decision boundary.  Both, the authors tell us, are attempting to estimate the conditional expectation E( Y | X=x) where Y is the response and X the features.  There are various bells and whistles and modifications to these basic methods used in practice.  In our case, the inputs are sequences of amino triples and the outputs are sequences of twists from a fixed grid on SO(3), and it is thus worthwhile to estimate the conditional expectations directly from the empirical distributions.

In order to ensure that a 2-gram model of protein sequence language is used, rather than consider expectations of twist distributions given the amino-triple, we can consider the expectation of the twist distributions as a response to a quadruple amino sequence.

Let us now consider the accuracy of the shape determination procedure using this simplest of procedures.  We first measure the prediction error in terms of the predicted scalar angles only, ignoring the more difficult problem of accurately predicting the entire twist, in order to see some immediate positive results.  We consider the prediction of the protein 128L, which has 162 residues, and print the predictions along with a last column containing the error of prediction of the scalar angle.  We can see that these predictions are quite accurate:

$ ./pred1.pl < rot/pdb128l.ent.gz | head -50
ILE,PHE,GLU,0.00000,0.00000,0.00000,0.00000,1.82930
PHE,GLU,MET,-0.01545,-0.81661,-0.07242,1.70211,0.01000
GLU,MET,LEU,-0.64290,-0.33304,-0.36109,1.82534,0.02197
MET,LEU,ARG,-0.57574,-0.32459,-0.29883,1.69516,0.02082
LEU,ARG,ILE,0.24985,-0.71411,-0.18425,1.80114,0.02593
ARG,ILE,ASP,-0.26774,-0.67072,-0.18838,1.83238,0.02740
ILE,ASP,GLU,-0.62596,-0.32133,-0.39039,1.82159,0.01567
ASP,GLU,GLY,-0.44846,-0.64328,-0.21500,1.92199,0.01633
GLU,GLY,LEU,-0.15332,0.66341,-0.03253,2.78331,0.13320
GLY,LEU,ARG,0.51562,0.54873,-0.25659,2.37589,0.00665
LEU,ARG,LEU,-0.43813,-0.60392,-0.22962,2.71057,0.02940
ARG,LEU,LYS,0.63005,0.27541,-0.32706,2.34085,0.03960
LEU,LYS,ILE,0.57323,0.37242,-0.28243,2.50481,0.07490
LYS,ILE,TYR,0.31936,0.24167,-0.71256,1.97675,0.02994
ILE,TYR,LYS,-0.29235,0.30124,-0.63592,2.84853,0.03531
TYR,LYS,ASP,0.53204,-0.51246,-0.27904,2.17604,0.01610
LYS,ASP,THR,-0.15415,0.75088,-0.04554,2.47935,0.00561
ASP,THR,GLU,-0.52823,-0.51630,-0.27205,1.76078,0.01924
THR,GLU,GLY,-0.62492,-0.39336,-0.29313,1.80456,0.01923
GLU,GLY,TYR,-0.43210,0.14987,-0.65044,2.06794,0.04172
GLY,TYR,TYR,0.04964,-0.76187,-0.12088,2.77257,0.00457
TYR,TYR,THR,-0.61892,-0.36746,-0.31963,2.09090,0.01223
TYR,THR,ILE,-0.28229,0.63505,-0.19401,2.94013,0.01753
THR,ILE,GLY,-0.28513,-0.70733,-0.18180,2.78674,0.03788
ILE,GLY,ILE,-0.62469,-0.20498,-0.33873,1.99582,0.00640
GLY,ILE,GLY,-0.45262,-0.05377,-0.08809,1.77385,0.02790
ILE,GLY,HIS,-0.38423,-0.67365,-0.22894,1.63042,0.00114
GLY,HIS,LEU,0.22821,0.23999,-0.70323,2.58092,0.02354
HIS,LEU,LEU,-0.52219,-0.58846,-0.26146,2.27405,0.01377
LEU,LEU,THR,0.65420,0.38903,-0.30570,2.56197,0.03748
LEU,THR,LYS,0.29343,0.20157,-0.33087,2.26270,0.11984
THR,LYS,SER,-0.72560,0.02186,-0.34391,2.10914,0.00820
LYS,SER,PRO,0.62407,-0.39187,-0.30560,2.66043,0.00866
SER,PRO,SER,-0.62895,0.39148,-0.31270,2.04670,0.03064
PRO,SER,LEU,-0.56835,-0.54258,-0.25934,2.60999,0.00893
SER,LEU,ASN,0.08298,0.77833,-0.05253,2.17844,0.10926
LEU,ASN,ALA,-0.24304,0.71517,0.01647,1.83585,0.06658
ASN,ALA,ALA,-0.38080,0.63069,-0.25519,1.68472,0.03377
ALA,ALA,LYS,0.55955,0.30762,-0.29974,1.85303,0.01296
ALA,LYS,SER,0.15976,0.69516,-0.02785,1.74466,0.02338
LYS,SER,GLU,-0.18506,0.26990,-0.78810,1.79828,0.01298
SER,GLU,LEU,0.19001,0.52159,-0.17923,1.70544,0.00964
GLU,LEU,ASP,0.29312,0.19942,-0.71027,1.80430,0.01327
LEU,ASP,LYS,-0.09830,0.72205,-0.07200,1.77070,0.03559
ASP,LYS,ALA,-0.44741,0.53774,-0.27296,1.73898,0.00035
LYS,ALA,ILE,0.65346,0.36703,-0.31424,1.78604,0.02450
ALA,ILE,GLY,0.22745,0.24037,-0.72297,1.83885,0.00483
ILE,GLY,ARG,0.43720,-0.59057,-0.25519,2.26561,0.02019
GLY,ARG,ASN,0.57427,-0.44610,-0.29035,2.66629,0.00228
ARG,ASN,THR,0.61347,-0.58314,-0.27114,1.90064,0.00269

The code is here:

#!/usr/bin/perl

my @p;
my @qdata;
my @d;
while (<>) {
    chomp;
    @d = split ",", $_;
    push @p, $d[1];
    push @qdata, [ $d[3], $d[4], $d[5], $d[6] ];
}

my $N = @p + 0;
my @v;

for (my $i = 3; $i < $N; $i++) {

    my $quad = "$p[$i-3]-$p[$i-2]-$p[$i-1]-$p[$i]";
    @v = &mean_twist($quad);
    my $zd = &zdist( \@v, $qdata[$i-1] );
   printf "$p[$i-2],$p[$i-1],$p[$i],%.5f,%.5f,%.5f,%.5f,%.5f\n", @v, $zd;
}

sub zdist {
    my ($a,$b) = @_;
    my $t = 0;
#    $t += ($a->[0] - $b->[0]) * ( $a->[0] - $b->[0] );
#    $t += ($a->[1] - $b->[1]) * ( $a->[1] - $b->[1] );
#    $t += ($a->[2] - $b->[2]) * ( $a->[2] - $b->[2] );
    $t += ($a->[3] - $b->[3]) * ( $a->[3] - $b->[3] );
    return sqrt($t);

}
sub mean_twist {
    my $q = shift;
    my @v = (0,0,0,0);
    # No training file for this quadruple: return the zero twist.
    open my $fh, '<', "/home/zulf/prot/quadruples/$q" or return @v;
    my @d;
    my $N = 0;
    while (<$fh>) {
	chomp;
	@d = split ",", $_;
	$N++;
	$v[$_] += $d[$_] for 0 .. 3;
    }
    close $fh;

    # Average the axis components and the angle over all observed twists.
    if ( $N > 0 ) {
	$v[$_] /= $N for 0 .. 3;
    }
    return @v;
}

Over several thousand protein shapes predicted by this method, we find that the average angular error is around 0.2 radians, which is quite promising for such a simple algorithm.  On a modest laptop, the prediction of a protein shape takes a few seconds.
