Feeds:
Posts

## Sparsity of amino triple distribution given a rotation axis from a grid on the 2-sphere

The problem we are addressing is the determination of protein shape given its amino acid sequence as a problem of language translation.  We are attempting to use the noisy channel model of translation, which for translation of a French sentence f to an English sentence e would seek the e that maximizes probability  P( e | f) which amounts to seeking the e that maximizes the product P(e ) P ( f | e).  In our case, the translation is happening from strings of an alphabet consisting of amino acid triples (f) to one consisting of sequences of rotations (e).  Now P(e), being independent of f, can be precomputed and does not produce any problems for tractability of computation, so the computational difficulty of this maximization problem localizes to P( f | e ).

We report a positive feature of the distribution of amino triples (f) conditioned on fixed twist from a grid on the 2-sphere, which is part of the discretization of the rotation group SO(3) because rotations have the axis-angle decomposition and the axes form a 2-sphere.  The positive feature is that among 8,000 possible amino triples that could arise once we fix a grid point, once we remove noise where noise is defined as a single count, there is an enormous sparsity, and typically less than 10 amino triples actually arise in the distribution (f|e).  This implies that for a fixed twist, the probability maximization amounts to checking ~10 points, and thus for N grid points, this maximization problem is O(N).

Here is a typical example of the output, where the coordinates of the grid points are marked by ===== and then the triples and the counts follow:

=====[-0.0616,0.2664,0.9619]
ALA-LEU-LEU:2
ARG-ARG-ARG:6
GLY-ASP-LYS:2
HIS-HIS-SER:4
LYS-LYS-ILE:2
THR-TRP-TRP:2
VAL-LEU-HIS:3
=====[0.0000,0.2610,0.9653]
ARG-ARG-ARG:3
HIS-PHE-ASP:2
PRO-PRO-GLY:2
THR-PRO-LYS:2
=====[0.0000,0.1527,0.9883]
ARG-ARG-ARG:3
MET-TYR-ALA:3
THR-ILE-LEU:2
=====[0.0000,0.2258,0.9742]
ARG-ARG-ARG:2
=====[-0.0627,0.1937,0.9791]
ARG-ARG-ARG:2
GLY-ASP-LYS:2
HIS-ASP-LYS:9

## Beta code (still being debugged) of putting a point on a 2-sphere into an icosahedral grid

#!/usr/bin/perl

use Sphere;

use Math::Vector::Real;
use Math::MatrixReal;
use Math::Trig;
use Getopt::Long;
use Statistics::Descriptive;

my ($x,$y, $z,$t);
my $L = 1; GetOptions( "x=f" => \$x,
"y=f" => \$y, "z=f" => \$z,
"L=i" => \$L ); # ===== Main program ===== while ( <> ) { chomp; my @d = split ",",$_;
my ($x,$y,$z) = ($d[3], $d[4],$d[5] );
#print "Pt $x$y $z\n"; my ($fit, $error ) = &fit_point_to_grid($x, $y,$z, $L); print "Error =$error\n";
print "------------------------------\n";
}

sub fit_point_to_grid {

my ($x,$y, $z,$L) = @_;

# Check to ensure unit vector
my $Q = V($x, $y,$z);
my $len = abs($Q );

if (abs( $len - 1 ) > 1e-4) { print "Not a unit vector. Length=$len  Scaling\n";
$x =$x/$len;$y = $y/$len;
$z =$z/$len; } @px = ($x, $y,$z);

my $points =$Sphere::iv;

# Get the icosahedral vertices:

my @X = &Sphere::triangulation_vertices( $x,$y, $z,$points );

#print "------ past search for initial icosahedral vertices\n";

my @px = ($x,$y, $z ); my @Y = &Sphere::fit_to_triangular_subdivision ( \@px,$X[0], $X[1],$X[2], $L ); #print "---- past fit to triangular subdivision\n"; my$fitx = V( $Y[0],$Y[1], $Y[2] ); my$givenx = V( $x,$y, $z );$fitx = $fitx->versor; my$error = abs( $givenx -$fitx );

return ( $fit,$error );
}

#!/usr/bin/perl

package Sphere;
use strict;
use Math::Vector::Real;
use Math::MatrixReal;
use Math::Trig;
use Getopt::Long;
use List::Util qw(max min);
use Data::Dumper;

use vars qw( $iv$phi $icoindices); BEGIN { use Exporter(); @Sphere::ISA = qw( Exporter ); @Sphere::EXPORT = qw($iv $phi$icoindices);

# The icosahedron has 12 vertices and 20 triangular faces.
# We produce a grid on the sphere by barycentric division of
# the icosahedron inscribed in the sphere, which has 12 vertices
# and 20 triangular faces

$phi = ( 1 + sqrt(5) ) / 2;$iv = [
[ 0,  1,  $phi ], [ 0, -1,$phi ],
[ 0,  1, -$phi ], [ 0, -1, -$phi ],
[ 1,  $phi, 0 ], [ 1, -$phi,  0 ],
[ -1, $phi, 0 ], [ -1,-$phi,  0 ],
[ $phi, 0, 1 ], [$phi, 0,  -1 ],
[ -$phi, 0, 1 ], [ -$phi, 0, -1] ];

for (my $i = 0;$i < 12; $i++) {$iv->[$i]->[0] /= sqrt( 1 +$phi * $phi);$iv->[$i]->[1] /= sqrt( 1 +$phi * $phi);$iv->[$i]->[2] /= sqrt( 1 +$phi * $phi); } sub icosahedron_indices { my$indices;
my $eps = 1e-3; my$p = $iv; for (my$i = 0; $i < 12;$i++) {
for ( my $j = 0;$j < 12; $j++ ) { for ( my$k = 0; $k < 12;$k++ ) {

my $u = sqrt(1+$phi*$phi)*V( @{$p->[$i]}); my$v = sqrt(1+$phi*$phi)*V( @{$p->[$j]});
my $w = sqrt(1+$phi*$phi)*V( @{$p->[$k]}); my$d1 = abs( $u -$v );
my $d2 = abs($v - $w ); my$d3 = abs( $w -$u );

if ( abs( $d1 - 2) <$eps &&
abs( $d2 - 2) <$eps &&
abs( $d3 - 2) <$eps &&
$i <$j && $j <$k ) {
$indices->{$i}->{$j}->{$k}++;
}

}
}
}

my $ico_indices; foreach my$a ( sort keys %$indices ) { foreach my$b ( sort keys %{$indices->{$a}} ) {
foreach my $c ( sort keys %{$indices->{$a}->{$b}} ) {
push @$ico_indices, [$a, $b,$c];
}
}
}
return $ico_indices; }$icoindices = &icosahedron_indices();
}

sub fit_to_triangular_subdivision {

my ( $x,$u, $v,$w, $L ) = @_; my ($x1, $x2,$x3 ) = @$x; my ($u1, $u2,$u3 ) = @$u; my ($v1, $v2,$v3 ) = @$v; my ($w1, $w2,$w3 ) = @$w; my @px = &project_to_plane($x, $u,$v, $w ); # if ( ! &pntriangle3D( \@px,$u, $v,$w ) ) {
#	print "In fit to triangular subdivision\n";
#	print "The point is not in the triangle formed by the vertices\n";
#	return ($u1,$u2, $u3,$v1, $v2,$v3, $w1,$w2, $w3); # } else { # } # Generate the subdivision to level L my @newpx = ($px[0], $px[1],$px[2] );
my @nu    = ( $u1,$u2, $u3 ); my @nv = ($v1, $v2,$v3 );
my @nw    = ( $w1,$w2, $w3 ); for ( my$i = 0; $i <$L; $i++ ) { my @X = &refine_triangular_subdivision( \@newpx, \@nu, \@nv, \@nw ); @nu = ($X[0], $X[1],$X[2] );
@nv = ( $X[3],$X[4], $X[5] ); @nw = ($X[6], $X[7],$X[8] );
}

# Now we can set x to the closest grid point, which will be nu
# Let us record the distance as well

my $error = abs( V($x1,$x2,$x3) - V($nu[0],$nu[1], $nu[2] )); return ($nu[0], $nu[1],$nu[2], $error ); } # This is code for producing barycentric subdivision one level deeper # and will give coordinates of the triangular corners where the point # x falls sub refine_triangular_subdivision { my ($x, $u,$v, $w,$L ) = @_;

my ($x1,$x2, $x3 ) = @$x;
my ($u1,$u2, $u3 ) = @$u;
my ($v1,$v2, $v3 ) = @$v;
my ($w1,$w2, $w3 ) = @$w;

#    if ( ! &pntriangle3D( $x,$u, $v,$w ) ) {
#	print "In refine triangular subdivision\n";
#	print "The point is not in the triangle formed by the vertices\n";
#	return ($u1,$u2, $u3,$v1, $v2,$v3, $w1,$w2, $w3); # } my$p = [ [ $u1,$u2, $u3], [$v1, $v2,$v3],
[ $w1,$w2, $w3], [ ($u1 + $v1)/2, ($u2+$v2)/2, ($u3+$v3)/2 ], [ ($w1 + $v1)/2, ($w2+$v2)/2, ($w3+$v3)/2 ], [ ($w1 + $u1)/2, ($w2+$u2)/2, ($w3+$u3)/2 ], ]; # my @ix = &vertex_indices_triangulation($x1, $x2,$x3, $points ); # return ( #$ix[0]->[0], $ix[0]->[1],$ix[0]->[2],
#	$ix[1]->[0],$ix[1]->[1], $ix[1]->[2], #$ix[2]->[0], $ix[2]->[1],$ix[2]->[2],
#	);

if ( &pntriangle3D( $x,$p->[0], $p->[3],$p->[5] )) {

return (	$p->[0]->[0],$p->[0]->[1], $p->[0]->[2],$p->[3]->[0], $p->[3]->[1],$p->[3]->[2],
$p->[5]->[0],$p->[5]->[1], $p->[5]->[2] ) ; } if ( &pntriangle3D($x, $p->[1],$p->[3], $p->[4] )) { return ($p->[1]->[0], $p->[1]->[1],$p->[1]->[2],
$p->[3]->[0],$p->[3]->[1], $p->[3]->[2],$p->[4]->[0], $p->[4]->[1],$p->[4]->[2] ) ;
}

if ( &pntriangle3D( $x,$p->[2], $p->[4],$p->[5] )) {

return (	$p->[2]->[0],$p->[2]->[1], $p->[2]->[2],$p->[4]->[0], $p->[4]->[1],$p->[4]->[2],
$p->[5]->[0],$p->[5]->[1], $p->[5]->[2] ) ; } if ( &pntriangle3D($x, $p->[3],$p->[4], $p->[5] )) { return ($p->[3]->[0], $p->[3]->[1],$p->[3]->[2],
$p->[4]->[0],$p->[4]->[1], $p->[4]->[2],$p->[5]->[0], $p->[5]->[1],$p->[5]->[2] ) ;
}

my @ix = &triangulation_vertices_old( $x1,$x2, $x3,$p );

return (
$ix[0]->[0],$ix[0]->[1], $ix[0]->[2],$ix[1]->[0], $ix[1]->[1],$ix[1]->[2],
$ix[2]->[0],$ix[2]->[1], $ix[2]->[2], ); } # Given an x on a 2-sphere, we want to be able to project it # to the plane determined by three points. If the three points are u, v,w # then this problem can be solved by translating by v # so that the points are x-v, u-v, 0, w-v, projecting x-v to the # plane determined by u-v and w-v and then translating the result P(x-v) back # by adding v. sub project_to_plane { my ($x, $u,$v, $w ) = @_; my ($x1, $x2,$x3 ) = @$x; my ($u1, $u2,$u3 ) = @$u; my ($v1, $v2,$v3 ) = @$v; my ($w1, $w2,$w3 ) = @$w; # Translate to origin by subtracting v, the rotate so u-v, w-v are in xy # plane, project to xy plane, reverse rotation add v my$px = Math::MatrixReal->new_from_cols(
[ [ $x1 -$v1, $x2 -$v2, $x3 -$v3 ] ]);
my $v = Math::MatrixReal->new_from_cols( [ [$v1, $v2,$v3 ] ]);

my $up = V($u1 - $v1,$u2 - $v2,$u3 - $v3 ); my$wp = V( $w1 -$v1, $w2 -$v2, $w3 -$v3 );
my $cuw =$up x $wp; if (abs($cuw) > 0) {
$cuw =$cuw->versor;
} else {
return ( $x->[0],$x->[1], $x->[2] ); } my$zaxis = V( 0, 0, 1);

my $angle = acos($cuw * $zaxis ); my$axis = $zaxis x$cuw;
if ( abs( $axis ) < 1e-6 ) {$axis = $zaxis; } else {$axis = $axis->versor; } # We shall create a rotation matrix that transforms cuw to zaxis # so that the images under the rotation of up and wp can be used # in the planar triangle containment code my$R = &rotation_matrix( $axis->[0],$axis->[1], $axis->[2],$angle);
my $Ri =$R->inverse;

my $Rp =$R->multiply( $px ); my$upm = Math::MatrixReal->new_from_cols(
[ [ $up->[0],$up->[1], $up->[2] ] ]); my$wpm = Math::MatrixReal->new_from_cols(
[ [ $wp->[0],$wp->[1], $wp->[2] ] ]); my$image_up = $R->multiply($upm );
my $image_wp =$R->multiply( $wpm ); my$Rp_proj = Math::MatrixReal->new_from_cols(
[ [ $Rp->element(1,1),$Rp->element(2,1), 0 ] ] );

my $ppx =$Ri->multiply( $Rp_proj ) +$v;

return ( $ppx->element(1,1),$ppx->element(2,1), $ppx->element(3,1) ); } # It is useful have the vector to translate the plane determined # by three vectors to the origin by a vector orthogonal to the plane sub orthogonal_translate { my ($u1, $u2,$u3, $v1,$v2, $v3,$w1, $w2,$w3 ) = @_;

my $v = V($v1, $v2,$v3 );
my $tu = V($u1-$v1,$u2 - $v2,$u3 - $v3 ); my$tw = V( $w1-$v1, $w2 -$v2, $w3 -$v3 );

# Subtract off the component parallel to the plane

my $normal =$tu x $tw;$normal = $normal->versor; my$tv = ( $v *$normal ) * $normal; return ($tv->[0], $tv->[1],$tv->[2] );
}

# The following function will take a unit vector on the 2-sphere
# and return the indices of the icosahedral vertices which bound the point
# on the sphere.  It should be generalized to find the vertices bounding
# the point in a general triangulation

sub triangulation_vertices {

my ($x,$y, $z ,$points) = @_;
&check_points( $points ); my @pv = ($x, $y,$z);

my @I = @$Sphere::icoindices; foreach my$t ( @I ) {

my @ti = @$t; #print join(" ", @ti) . "\n"; my @u = @{$points->[$ti[0]]}; my @v = @{$points->[$ti[1]]}; my @w = @{$points->[$ti[2]]}; my @px = &project_to_plane( \@pv, \@u, \@v, \@w ); if ( &pntriangle3D( \@px,$points->[$t->[0]],$points->[$t->[1]],$points->[$t->[2]] )) { return ($points->[$t->[0]],$points->[$t->[1]],$points->[$t->[2]] ); } } print "triangulation_vertices: point not in triangle\n"; return ($points->[0], $points->[1],$points->[2] );
}

sub triangulation_vertices_old {

my ($x,$y, $z ,$points) = @_;

&check_points( $points ); my$p = V( $x,$y, $z); sub distance { my$qq = shift;
my $q = V($qq->[0], $qq->[1],$qq->[2] );

# Use angular distance assuming $p is a unit vector # return acos($p * $q / (abs($q)*abs($p))); return abs($p - $q ); } sub bydistance { &distance($a) <=> &distance($b); } my @X = sort bydistance @$points;

my $N = @$points + 0;

my @pv = ($x,$y, $z); my @px = &project_to_plane( \@pv,$X[0],$X[1],$X[2] );

if (  &pntriangle3D( \@pv, $X[0],$X[1], $X[2] ) ) { return ($X[0], $X[1],$X[2] );

} else {

=pod
print "triangulation_vertices: point not in triangle\n";
print "Projection:\n";
print join( " ",  @px ) . "\n";
print "Vertices:\n";
print join( " ", @{$X[0]}) . "\n"; print join( " ", @{$X[1]}) . "\n";
print join( " ", @{$X[2]}) . "\n"; =cut }$p = V( $px[0],$px[1], $px[2] ); @X = sort bydistance @$points;

return ( $X[0],$X[1], $X[2] ); } sub rotation_matrix { my ($x, $y,$z, $t ) = @_; my$c = cos($t); my$s = sin($t); my$C = 1-$c; my$Q = Math::MatrixReal->new_from_rows(
[
[ $x *$x * $C +$c, $x *$y * $C -$z*$s,$x*$z*$C + $y*$s],
[ $y *$x * $C +$z * $s,$y * $y *$C + $c,$y * $z *$C - $x *$s],
[ $z *$x * $C -$y * $s,$z * $y *$C + $x *$s, $z *$z * $C +$c]
]);

return $Q; } sub pntriangle3D { my ($x, $u,$v, $w ) = @_; my ($x1, $x2,$x3 ) = @$x; my ($u1, $u2,$u3 ) = @$u; my ($v1, $v2,$v3 ) = @$v; my ($w1, $w2,$w3 ) = @$w; my @proj = &project_to_plane($x, $u,$v, $w ); # my$px = Math::MatrixReal->new_from_cols(
#	[ [ $proj[0]-$v1, $proj[1]-$v2, $proj[2]-$v3 ] ]);
my $px = Math::MatrixReal->new_from_cols( [ [$x1-$v1,$x2-$v2,$x3-$v3 ] ]); my$up = V( $u1 -$v1, $u2 -$v2, $u3 -$v3 );
my $wp = V($w1 - $v1,$w2 - $v2,$w3 - $v3 ); my$cuw = $up x$wp;
if ( abs( $cuw ) > 0 ) {$cuw = $cuw->versor; } else { return 0; } my$zaxis = V( 0, 0, 1);

my $angle = acos($cuw * $zaxis ); my$axis = $zaxis x$cuw;
if ( abs($axis) < 1e-5 ) {$axis = $zaxis; } else {$axis = $axis->versor; } # We shall create a rotation matrix that transforms cuw to zaxis # so that the images under the rotation of up and wp can be used # in the planar triangle containment code my$R = &rotation_matrix( $axis->[0],$axis->[1], $axis->[2],$angle);

my $Rp =$R->multiply( $px ); my$upm = Math::MatrixReal->new_from_cols(
[ [ $up->[0],$up->[1], $up->[2] ] ]); my$wpm = Math::MatrixReal->new_from_cols(
[ [ $wp->[0],$wp->[1], $wp->[2] ] ]); my$image_up = $R->multiply($upm );
my $image_wp =$R->multiply( $wpm ); my @vertx = ( 0,$image_up->element(1,1), $image_wp->element(1,1) ); my @verty = ( 0,$image_up->element(2,1), $image_wp->element(2,1) ); =pod print "In pntriangle3D:\n"; print "----------------\n"; print$Rp;
print "\n";
print $image_up; print "\n"; print$image_wp;
print "\n";
=cut
return &pnpoly( 3, \@vertx, \@verty, $Rp->element( 1, 1),$Rp->element(2, 1));

}
#  This is simply a wrapper for point_in_polygon
#
sub pnpoly {
my ($nvert,$x, $y,$tx, $ty ) = @_; # HACK: just create a triangle my @polygon = ( [ [$x->[0], $y->[0] ], [$x->[1], $y->[1] ] ], [ [$x->[1], $y->[1] ], [$x->[2], $y->[2] ] ], [ [$x->[2], $y->[2] ], [$x->[0], $y->[0] ] ], ); my @point = ($tx, $ty ); return &point_in_polygon( \@point, \@polygon ); } # # point in polygon -- tested # sub point_in_polygon { my ($point, $polygon ) = @_; my$count = 0;
foreach my $side ( @$polygon ) {
$count++ if &ray_intersect_segment($point, $side ); } return ($count % 2 == 0 ) ? 0:1;
}

my $eps = 1e-6; my$inf = 1e600;

sub ray_intersect_segment
{
my ( $point,$segment ) = @_;

my ( $A,$B ) = @$segment; my @P = @$point;

($A,$B) = ( $B,$A)  if ($A->[1] >$B->[1]);

$P[1] +=$eps if ( $P[1] ==$A->[1] ) || ( $P[1] ==$B->[1]);

return 0 if ( $P[1] <$A->[1]) || ( $P[1] >$B->[1] ) || ($P[0] > max($A->[0], $B->[1])); return 1 if ($P[0] < min ($A->[0],$B->[0]));

my $m_red = ($A->[0] != $B->[0]) ? ($B->[1] - $A->[1] ) / ($B->[0] - $A->[0] ) :$inf;
my $m_blue = ($A->[0] != $P[0] ) ? ($P[1] - $A->[1])/($P[0] - $A->[0]) :$inf;

return ($m_blue >=$m_red)? 1:0;

}

sub check_points {
my $p = shift; my$N = @$p+0; for (my$i = 0; $i <$N; $i++) { if ( !$p->[$i]->[0] ) {$p->[$i]->[0] = 0; } if ( !$p->[$i]->[1] ) {$p->[$i]->[1] = 0; } if ( !$p->[$i]->[2] ) {$p->[$i]->[2] = 0; } } } 1; Read Full Post » ## Putting a grid on a sphere and approximations of points using the grid From the twist decomposition of protein shapes of 38,737 protein structures I have obtained from RCSB I have a large corpus. The main idea is to consider the shape determination of proteins given amino sequences as a language translation problem, where the noisy channel model from information theory had yielded good enough results that the applications include Google’s translate. These methods use n-gram language models and here is a paper that describes these methods. Recall that the twist decomposition of protein shapes proceeds via subselecting N-atoms along the backbone of the protein and then decomposing this in a series of SO(3) elements. For our purposes, SO(3) can be treated as $S^1 \times S^2$ by the axis-angle decomposition for three-dimensional rotations, although topologically it is not homeomorphic to this space. Our language translation approach to protein shapes as SO(3) sequences would benefit tremendously if we could show that the twists in proteins arise from a finite alphabet, and we would like to check if such an alphabet can be produced from a fixed grid on SO(3). I had shown in 2009 that the scalar angles of these twists are discretized, and earlier this year that the axis distributions are highly concentrated in narrow regions of SO(3). It is worthwhile returning to re-establish these results more rigorously. Now I have learned of a reasonable way to put a grid on the 2-sphere from the work of Shroeder and Sweldens who analyse wavelets on spheres. The idea is to inscribe an icosahedron in the sphere which has 20 triangular faces and then at each refinement step subdivide the triangles. The grid thus produced on the icosahedron then can be projected to the sphere inscribing it which gives the method of arbitrarily fine-grained grids on the 2-sphere. Given a point $x \in S^2$, one expects that a sequence of refinements to the grid would produce less error in approximating this point at the cost of increasing the number of grid points. Given a large number of points on the 2-sphere, the question is to find an appropriate balance between the size of the grid and the accuracy — we would like the smallest possible grid without accepting too much error. These are to be quantified for a rigorous answer to the question of producing a finite alphabet of twists. There are some nontrivial aspects to implementation of this idea in perl. The first issue is that there does not seem to be a module in the public repository with functions that are needed for this sort of analysis. For example, it would be extremely valuable for us to have a function that given three points in the euclidean 3-space and a fourth test point to tell us whether the test point is contained in the triangle formed by the three points. One can adapt the code for the planar case of the problem, for which code can be found here for example. After an initial attempt at resolving this problem it becomes clear that here that one has to implement some basic linear algebraic manipulation tools. With Math::Vector::Real and Math::MatrixReal one can resolve this with the following code: sub pntriangle3D { my ($x, $y,$z, $u1,$u2, $u3,$v1, $v2,$v3, $w1,$w2, $w3 ) = @_; # This matrix transforms the vectors forming the edges of the # triangle the standard basis of the plane, column vectors [0,1] # and [1,0] my$R = Math::MatrixReal->new_from_cols(
[
[ $u1 -$v1, $u2 -$v2, $u3 -$v3],
[ $w1 -$v1, $w2 -$v2, $w3 -$v3],
[         0,         0,         1]
]);
my $p = Math::MatrixReal->new_from_cols( [ [$x - $v1,$y - $v2,$z - $v3 ] ]); my$Rp = $R->multiply($px );

# the triangle is bounded by [0,0], [0,1] and [1,0] and we
# simply need to check that the first two components are in
# these bounds

my $vertx = [ 0, 1, 0 ]; my$verty = [ 0, 0, 1 ];

return &pnpoly( 3, $vertx,$verty, $Rp->element( 1, 1),$Rp->element(2, 1));
}

Another elementary problem that crops up is the following.  Assume that we have a set of N points in 3-space stored in anonymous arrays and we are given a reference point and we must find the three closest points in the array.  A reasonable solution to the problem can use the feature of perl’s sort function that allows one to indicate the comparison function.  If $iv is the set of N points and$x $y$z are coordinates of the reference point, then one can use:

my $p = V($x, $y,$z);

sub distance {
my $qq = shift; my$q = V( $qq->[0],$qq->[1], $qq->[2] ); return abs($p - $q ); } sub bydistance { &distance($a) <=> &distance($b); } my @X = sort bydistance @$iv;

After the sorting is done the first three entries in the sorted list are the vertices of the triangulation in case the reference point lay in the convex hull up the points.

Suppose the $x$ is a unit vector on the 2-sphere and we know the vertices of an icosahedron closest to it.  We want to find the point where the ray drawn from the origin intersects the icosahedral face.  Suppose $n$ is normal to the plane of the face, and suppose known the distance $d$ of the plane from the origin.  Then the orthogonal projection of $x$ onto the normal produces a right triangle that is similar to the one formed by the origin, the orthogonal vector intersecting the plane, and the desired point, the projection of $x$ on to the plane.  Here is a concrete piece of code to accomplish this:

# Given an x on a 2-sphere, we want to be able to project it
# to the plane determined by three points.  If the three points are u, v,w
# then this problem can be solved by translating by v
# so that the points are x-v, u-v, 0, w-v, projecting x-v to the
# plane determined by u-v and w-v and then translating the result P(x-v) back

sub project_to_plane {
my ( $x,$u, $v,$w ) = @_;

my ($x1,$x2, $x3 ) = @$x;
my ($u1,$u2, $u3 ) = @$u;
my ($v1,$v2, $v3 ) = @$v;

my ($w1,$w2, $w3 ) = @$w;

# Translate to origin by subtracting v, the rotate so u-v, w-v are in xy
# plane, project to xy plane, reverse rotation add v

my $px = Math::MatrixReal->new_from_cols( [ [$x1 - $v1,$x2 - $v2,$x3 - $v3 ] ]); my$v = Math::MatrixReal->new_from_cols(
[ [ $v1,$v2, $v3 ] ]); my$up = V( $u1 -$v1, $u2 -$v2, $u3 -$v3 );
my $wp = V($w1 - $v1,$w2 - $v2,$w3 - $v3 ); my$cuw = $up x$wp;
if (abs( $cuw) > 0) {$cuw = $cuw->versor; } else { return ($x->[0], $x->[1],$x->[2] );
}

my $zaxis = V( 0, 0, 1); my$angle = acos( $cuw *$zaxis );

my $axis =$zaxis x $cuw; if ( abs($axis ) < 1e-6 ) {
$axis =$zaxis;
} else {
$axis =$axis->versor;
}
# We shall create a rotation matrix that transforms cuw to zaxis
# so that the images under the rotation of up and wp can be used
# in the planar triangle containment code

my $R = &rotation_matrix($axis->[0], $axis->[1],$axis->[2], $angle); my$Ri = $R->inverse; my$Rp = $R->multiply($px );

my $upm = Math::MatrixReal->new_from_cols( [ [$up->[0], $up->[1],$up->[2] ] ]);
my $wpm = Math::MatrixReal->new_from_cols( [ [$wp->[0], $wp->[1],$wp->[2] ] ]);

my $image_up =$R->multiply( $upm ); my$image_wp = $R->multiply($wpm );

my $Rp_proj = Math::MatrixReal->new_from_cols( [ [$Rp->element(1,1), $Rp->element(2,1), 0 ] ] ); my$ppx = $Ri->multiply($Rp_proj ) + $v; return ($ppx->element(1,1), $ppx->element(2,1),$ppx->element(3,1) );
}

Taking a step back from some of these nitty-gritty issues, it is a reasonable assumption that if we are able to produce a grid that is ‘optimal’ in some quantitative sense then that grid can be used in transferring the protein shape determination problem directly into a language translation problem that can employ precisely the same methods as have been used quite successfully in natural language translation.  Thus the relatively elementary problem being addressed of discrete grid for SO(3) is actually a major step for the protein shape determination problem.

## Protein shape determination model as a translation problem

Early in 2009 I had obtained the result that scalar angles of protein skeleta on wedges of triple amino acids where the N-atom along the backbone proxies the location of the amino acid fall on a discrete grid.  Early this year I had shown that twist decomposition of proteins concentrate on SO(3).  The operation of subselecting N-atoms along the backbone and translating the shape of the protein into a sequence of elements of SO(3) we have dubbed the twist decomposition of protein shapes.  The accuracy retained in the twist decomposition has been verified via rigidity of folds corresponding to long amino subsequences in common in multiple proteins where aggregate distances in SO(3) were seen to be small; this gave evidence that the subselection did not lose much more information about the protein shape.  A natural manner in which to treat the problem of shape determination given the sequence of amino acids is that of language translation with a noisy channel model.  In order to accomplish this we would like a discretization of SO(3) so that protein shapes can be modelled as a finite alphabet.  Our preliminary work on the problem of discetization of the 2-sphere by an icosahedral grid refined by barycentric subdivisions used for refinement suggests that a refinement of level 4 to 6 produces relatively accurate fits.  Thus in the language translation model, the twist alphabet reduces to 12 x 64 = 768 points on the sphere multiplied by the size of the angular grid.  Our working hypothesis will be that a level 6 refinement of an icosahedral grid will be able to describe a protein twist alphabet relatively accurately.

Recall that the simplest noisy channel model that has come out of the work of Claude Shannon in information theory is that the there is an input of a finite alphabet $X =\{ a_1,\dots, a_K\}$ passing through a noisy channel to produce words of another alphabet $Y= \{ b_1,\dots, b_J\}$.  The results mentioned above allow us to treat the protein shape determination in the same framework, where the latter alphabet consists of elements of SO(3).  The information provided by the event $x = a_k$ by the event $y = b_j$ is

$I_{X|Y}( a_k, b_j ) = \log \frac{ P( a_k | b_j ) }{ P( a_k ) }$

The noisy channel model had found early application in automated speech recognition in 1983 by Bahl, Jelinek and Mercer, and then Brown et. al. proposed it in 1990 for natural language translation, and there have been a variety of refinements since then in language translation problems.  The basic idea for the translation of a French sentence f to an English sentence e is to maximize the following likelihood:

$\argmax_e P( e | f) = \argmax_e P(e) P(f|e)$

If we had no worries about tractability then there is no difference conceptually between the language translation problem and the protein shape determination problem if we substitute f with an amino acid sequence and e with the corresponding sequence of twists in SO(3).  If we have available an accurate discretization set for twists in folded proteins, then the computational framework is also identical.

We have already found that a level 5 icosahedral grid on SO(3) captures the observed twists in proteins with an euclidean distance error within 0.02.  Such a grid produces an alphabet of maximum size 768 for the axis-portion while the angular portion also can be discretized.  This is most certainly a suboptimal forced sparseness for twists but immediately produces a fit to the situation of language translation.  Assume that all protein twists are chosen from a finite alphabet chosen from a discretization of SO(3). Given a particular R from this set, we can calculate the conditional probabilities $P( t | R)$ for any fixed t of 8000 amino triples by counting the number of times $R$ occurs in the training set corresponding to $t$ and then dividing by the total number of occurrences of $R$.

## Getting data from a website filled with Javascript

I had been using fairly straightforward perl scripts using WWW::Mechanize to gather protein structure data from http://www.rcsb.com which I had used previously to study protein shapes.  I was able to establish that twist decompositions of protein shapes by subselecting the N-atoms along the backbone of the protein provide information about rigidity of long subsequences in common among different proteins.

So now I face the problem that WWW::Mechanize module does not handle Javascript at all, and I have to re-gather data both because there is a larger set of protein structures now available and because I want to streamline the approach I have used before and it would be good to have a robust system for gathering these data.  A reasonable choice is WWW::Selenium which is the client side of an automation suite.

Selenium comes in three pieces, a Firefox plugin called Selenium IDE, Selenium Server, and Selenium Client software which can be found here.  One needs to ensure that Java is installed, since the Selenium server must be running in order to use it to automate Firefox.  I will update this note with more details on how to get the identifiers and the skeleta of proteins as I have a reasonable method of obtaining these data.

Quick test:  In order to ensure that Selenium works to get past the Javascript apparatus of the RCSB website, we can download the server and start it with

java -jar selenium-server-standalone-2.15.0.jar

and the open up the IDE from Firefox via Tools -> Selenium IDE, press the record button and go to the site

http://www.rcsb.org/pdb/search/smartSubquery.do?smartSearchSubtype=TreeEntityQuery&t=4&n=22610

and press the Next link at the bottom of the page a few times.  In each of these pages there are links containing the identifiers of different protein structures that interest us.  We can play back to ensure this automation works.

Semi-manual solution:  with a bit of manual labor, this solution will obtain the protein IDs without automating the entire process:

#!/usr/bin/perl

# This script allows us to get the protein structure identifiers from
# the RCSB database using a mixture of manual work and automation to
# avoid having to work with the Javascript interface.  In the RCSB database
# there is a browse tab where one can browse by Gene Ontology.  There is
# a relatively short list of Gene Ontologies which one can recover the query
# ID and then use the query ID in this script as a command-line argument
# to recover protein IDs.

use WWW::Mechanize;

#my $GeneOntologyID = ($ARGV[0] || 71840 );
my $qrid =$ARGV[0];

#my $query = "http://www.rcsb.org/pdb/search/smartSubquery.do?smartSearchSubtype=TreeEntity&t=4&n=" .$GeneOntologyID;
my $query = "http://www.rcsb.org/pdb/results/results.do?gotopage=1&qrid=$qrid&tabtoshow=Current";

my $agent = WWW::Mechanize->new(); my @files;$agent->get( $query ); my$totalPages;
$agent->content =~ /Page (\d+) of (\d+)/;$totalPages = $2; print "Total pages =$totalPages\n";

#print $agent->content; my @f = ($agent->content =~ m!structureId=([^"]+?)\"!g);

foreach my $n (@f) { print$n . "\n";
}

for (my $p = 2;$p <= $totalPages;$p++) {

$query = "http://www.rcsb.org/pdb/results/results.do?gotopage=" .$p . "&qrid=$qrid&tabtoshow=Current";$agent->get( $query ); @f = ($agent->content =~ m!structureId=([^"]+?)\"!g);
foreach my $n (@f) { print$n . "\n";
}
}

## Why the Afghanistan war is unwinnable

Early in 2009 the Operation Cast Lead which killed more than 300 children in Gaza with more than 1100 civilian casualties and its reaction in the unanimous support for Israel in the US Congress poisoned my views about the future prospects of the current planetary system for anything but continued dystopia; at the time there were pro-Israel rallies in New York, where I lived, celebrating the attacks. I had been active on Facebook with my propaganda for a planetary Republic which included criticism of Israel.  I had also predicted the that America can’t win the Afghan war.

I will tell you why it’s unwinnable. The American empire had a fantasy story about Afghanistan, where they assumed whatever they wanted to believe and tell the American people — that the Taliban are somehow a criminal force that had taken over Afghanistan against the will of the people and America could come and enforce democracy by the barrel of a gun. This is disingenous at best and essentially stems from imperial arrogance from past experience in Iran. After the Afghan experience of fighting the Russian invasion throughout the 1980s, the Taliban which morphed from the mujahideen which America supported so strongly that Reagan in 1982 dedicated the launch of the space shuttle Columbus to the Afghan ‘freedom fighters’. Then the Americans began their occupation of Afghanistan and the local population could not but see the Taliban still as freedom fighters. Richard Holbrooke, Obama’s special envoy to Pakistan and Afghanistan had said in March 2010, almost a decade after the reaction to 9/11 war that “The Taliban is woven into the fabric of Pashtun society on both sides of the border with Pakistan, Holbrooke said, and almost every Pashtun family has someone involved with the movement. That’s why Afghan president, Hamid Karzai, a Pashtun himself, was reaching out to them. But there were “no negotiations with them,” Holbrooke said firmly. (http://www.globalpost.com/dispatch/worldview/100305/holbrooke-afghanistan-pakistan). Moreover, the actual numbers of Taliban actively fighting the US/NATO force is not substantial. It’s in the order of 10,000 with around 3,000 active at any given point (http://www.cfr.org/afghanistan/taliban-afghanistan/p10551). The “Global War on Terror” gives the mistaken impression that there are some extremist criminal elements that they are fighting but in reality the Global War on Terror is simply cover for imperial operations where the expected reaction is the guerilla tactics used by the Islamists who feel that they are reacting to modern democratic militaries occupying what they consider their homeland — this is a characterization by U Chicago Professor Robert Pape. As they keep killing the Taliban leaders they are replaced by younger and more radicalised Islamists (http://therealnews.com/t2/index.php?option=com_content&task=view&id=31&Itemid=74&jumival=7665)