Feeds:
Posts

## Precalculating amino acid solvent accessible surface areas for use with docking algorithms

The direct geometric estimation of solvent accessible surface area for example by the Shrake-Rupley algorithm adds to the computational burden of docking and other applications.  We would like to treat each amino acid as a rigid body and precalculate the solvent accessible envelope.  Then the shape of the protein can be proxied by these envelopes and their intersections reducing computational burden for docking.  This approach could help produce geometrically smoother envelopes than the rolling ball of 1.4 Angstroms that is commonly used.

The geometric components for an envelope are hemisphere for an end, cylindrical tubes for bonds, and intersections can be smoothened out.  The radius is 1.4 Angstroms larger than the atomic radii.

Basic spherical intersection code to determine the envelope for an individual amino acid is here:

from prody import *
import numpy as np

prot1 = '1FGN'

p = parsePDB(prot1)
res = p.getResnames()
idx = p.getResindices()
C = p.getCoords()
E = p.getElements()
i0 = 0
a1 = []
a1e = []
for j in range(len(C)):
if idx[j] == i0:
a1.append(C[j])
a1e.append(E[j])

N = 200
mesh = np.zeros((N,N,N))
center = np.array([N/2,N/2,N/2])
delta = 0.1 # angstroms

# translate the initial N to the center
v1 = a1[0]

def distance(a,b):
return np.linalg.norm(a-b)

def spherical_boundary( center_coord, radius, mesh ):
for i in range(center_coord[0]-rN,center_coord[0]+rN):
for j in range(center_coord[1]-rN,center_coord[1]+rN):
for k in range(center_coord[2]-rN,center_coord[2]+rN):
d = distance(np.array([i,j,k]),center_coord)*delta
if d < radius-delta: mesh[i,j,k] = 0.0
if d > radius+delta: mesh[i,j,k] = 0.0

for i in range(len(a1)):
acv = a1[i] - v1 + center*delta
atom_center = np.array([int(acv[0]/delta),int(acv[1]/delta),int(acv[2]/delta)])
if a1e[i] == 'C':
if a1e[i] == 'H':
if a1e[i] == 'N':
if a1e[i] == 'O':

# At this point the mesh should contain the surface of the amino acid

## Rigid protein docking problem and sampling S2 x S2

Consider the protein protein docking problem where we are given two protein 3D structures and our goal is to produce the most natural protein complex where these are bound.  The easier case is when the only allowed motions are rigid motions.  A simple observation is that explicit rigid motion sampling is uniform sampling of $S^2 \times S^2$ where each factor represents rotations of each factor.  At each sampling, some metric of fitness of binding can be calculated and then the maximum metric is the choice of the algorithmic complex of the two protein structures.  Of course the rigid body docking problem is easier than the flexible protein docking problem but this sampling approach gives us a simple Monte Carlo approach to both types of problems.

Next, for general images, the so-called image registration problem is the determination of scaling, rotations and translations to identify two images as similar.  It is well known that such problems can be addressed by fast Fourier transforms for both 2D and 3D images.  In our case we can consider a protein P in a given space and the complement of the second protein, Q^c in a compact region of space and transform the rigid docking problem into an image registration problem.

By random sampling of 3D rotation pairs one can attempt to analyze correlations of the rotated shapes and seek a maximum.  The code follows.  One modification needed for this code is to select the centers appropriately for approximate docking matches.  At the moment, I set the center of the second protein to be center1+center2 which is not the best choice.  Suppose two images I1 and I2 are translations of one another. The phase correlation method is to find the maximum of the product of their FFTs. The coordinates of the maximum gives the translation vector.

from prody import *
import numpy as np

def extractBackboneCoords( p ):
xmin,ymin,zmin = 10000,10000,10000
xmax,ymax,zmax = -10000,-10000,-1000
C = p.getCoords()
Backbone = []
J = p.getElements()
for j in range(len(J)):
if J[j] == 'N':
pt = np.array([C[j][0],C[j][1],C[j][2]]).reshape(3,1)
Backbone.append(pt)
if pt[0] < xmin: xmin = pt[0]
if pt[1] < ymin: ymin = pt[1]
if pt[2] < zmin: zmin = pt[2]
if pt[0] > xmax: xmax = pt[0]
if pt[1] > ymax: ymax = pt[1]
if pt[2] > zmax: zmax = pt[2]

return (Backbone,np.array([xmin,ymin,zmin]),np.array([xmax,ymax,zmax]))

def markFromCoords(bbone, vec):
val = 0.0
for p in bbone:
if np.linalg.norm(p-vec) < 0.1:
val =+ 1.0
return val

def rotation_matrix(axis,theta):
#    axis = axis/np.sqrt(np.dot(axis,axis))
a = np.cos(theta/2)
v = -np.sin(theta/2)*np.array(axis)
b,c,d = v[0],v[1],v[2]
return np.array([[a*a+b*b-c*c-d*d, 2*(b*c-a*d), 2*(b*d+a*c)],
[2*(b*c+a*d), a*a+c*c-b*b-d*d, 2*(c*d-a*b)],
[2*(b*d-a*c), 2*(c*d+a*b), a*a+d*d-b*b-c*c]])

prot1 = '1FGN'
prot2 = '1TFH'

p1 = parsePDB( prot1 )
p2 = parsePDB( prot2 )

p1b, p1min, p1max = extractBackboneCoords(p1)
p2b, p2min, p2max = extractBackboneCoords(p2)
N = 100

mmin = np.minimum(p1min,p2min)
mmax = np.maximum(p1max,p2max)

center1 = np.zeros((3,1))
center2 = np.zeros((3,1))
len1 = len(p1b)
len2 = len(p2b)

for p in p1b:
center1 += np.array(p).reshape(3,1)
center1 /= float(len1)

for p in p2b:
center2 += np.array(p).reshape(3,1)
center2 /= float(len1)

delta = 2.0*(mmax-mmin)/float(N)

n_mc = 100

max_corr = -10000
max_axis1 = []
max_axis2 = []
max_angle1 = 0
max_angle2 = 0
for mc in range(n_mc):
print 'Monte carlo ', mc
mesh1 = np.zeros((N,N,N))
mesh2 = np.zeros((N,N,N))

random_axis1 = np.random.normal(size=(3,1))
random_axis1 /= np.linalg.norm(random_axis1)

random_axis2 = np.random.normal(size=(3,1))
random_axis2 /= np.linalg.norm(random_axis2)

random_angle1 = np.random.uniform(high=2*np.pi)
random_angle2 = np.random.uniform(high=2*np.pi)

Rot1 = rotation_matrix(random_axis1,random_angle1).reshape(3,3)
Rot2 = rotation_matrix(random_axis2,random_angle2).reshape(3,3)

for p in p1b:
p = np.array(np.dot(Rot1,np.array(p).reshape(3,1)-center1) + center1)
print p
c1 = int((p[0]-mmin[0])/delta[0])
c2 = int((p[1]-mmin[1])/delta[1])
c3 = int((p[2]-mmin[2])/delta[2])

c1 = min(c1,N-1)
c2 = min(c2,N-1)
c3 = min(c3,N-1)
mesh1[c1,c2,c3] = 1.0

for p in p2b:
p = np.array(np.dot(Rot2,np.array(p).reshape(3,1)-center2) + center2 + center1)
print p
c1 = int((p[0]-mmin[0])/delta[0])
c2 = int((p[1]-mmin[1])/delta[1])
c3 = int((p[2]-mmin[2])/delta[2])

c1 = min(c1,N-1)
c2 = min(c2,N-1)
c3 = min(c3,N-1)
mesh2[c1,c2,c3] = 1.0

f1 = np.fft.fftn(mesh1)
f2 = np.fft.fftn(mesh2)

f12 = f1*f2
r12 = np.linalg.norm(f12)

if r12 > max_corr:
max_corr = r12
max_axis1 = random_axis1
max_axis2 = random_axis2
max_angle1 = random_angle1
max_angle2 = random_angle2

print 'Axis 1'
print max_axis1
print 'Axis 2'
print max_axis2
print 'Angle 1'
print max_angle1
print 'Angle 2'
print max_angle2

## Twist decomposition of protein shapes and volumes of convex polytopes

Suppose the twist decomposition of a protein shape is a sequence $g_1, g_2, \dots, g_N$ and the backbone points are $x_1,\dots,x_N$.  Consider the problem of docking sites for ligands.  For this we consider the volumes of convex polytopes as follows.  For each point $x_i$ we can associate to it a plane by the normal vector in $g_i$.  Then for every subsequece of the twist sequence, we can seek convex polytopes bounded by the associated planes.  If the convex polytopes have sufficient volume we can consider it a possible docking site and reject it as a docking site otherwise.  This is a first approach to search for ligand docking site which might require modifications to produce realistic results.

Fortunately for us, there is a tradition of computational convexity to aid us in the problems of volume computation.  Here is a reference with survey and results, as well as here.  There is quite a bit of research that has been done for polynomial time computation of convex polytopes and we can choose from the available algorithms one that utilizes a sweeping hyperplane method discussed here.

A theorem of Jim Lawrence, stated below, provide us with a method to calculate the volume of a convex polyhedron as a sum over vertices of the polyhedron of a certain functional.

THEOREM.  Suppose $P = \{ x \in \mathbf{R}^n : r_i(x) = b_i - a_i^t x \ge 0, i+1,\dots, m \}$.  Suppose for each vertex $v$ the number of $r_i(v)=0$ is exactly $n$.  Suppose $c\in\mathbf{R}^n$ and $d\in\mathbf{R}$ are such that $f(x)=c^t x+d$ is nonconstant on each edge.  For vertex $v$, let

$N_v = \frac{f(v)^n}{n!\delta_v\gamma_1\cdots\gamma_n}$

where if the indices which are binding at v are $i_1,\dots,i_n$ then $\gamma_i$ are such that

$c = \gamma_1 a_{i_1} + \cdots \gamma_n a_{i_n}$.

Then the volume is $V = \sum_v N_v$.

By this theorem of Lawrence, we must determine the vertices and then these $N_v$ in order to calculate the volume.

## Twist decomposition of randomly chosen protein

In posts several years ago I introduced the twist decompositions of protein skeleta.  Here I post code that will randomly select a protein shape from the PDB database and produce its twist sequence.  It is worth posting because this is a basic analytic tool for protein shapes and the code is quite transparent and simple.  The ProDy library is doing the work of parsing the PDB files easing our work.
import urllib2
import numpy
from prody import *

PDBLIST = ‘ftp://ftp.wwpdb.org/pub/pdb/derived_data/index/source.idx’
f = urllib2.urlopen( PDBLIST )

F = []
for line in f:
p = line.split()[0]
F.append(p)

N = len(F)
i0 = numpy.random.randint(N)

pid = F[i0]

def so3twist( A, B, C):

v1 = A-B
v1 /= norm(v1)

v2 = C-B
v2 /= norm(v2)

v3 = np.cross(v1,v2)
v3 /= norm(v3)

d = np.dot(v1,v2)
return (np.arccos(d), v3)

def extractBackboneCoords( p ):
C = p.getCoords()
Backbone = []
J = p.getElements()
for i in range(len(J)):
if J[j] == ‘N’:
Backbone.insert(C[j])
return Backbone

def twistSequence(C):
N = len(C)
V = []
for i in range(1,N-1):
s = so3twist(C[i-1],C[i],C[i+1])
V.append(s)
return V

p = parsePDB( pid )
C = extractBackboneCoords(p)
T = twistSequence(C)

## Syria and chemical weapons

Update October 6 2013.  A great deal has changed since I wrote this blog first, and today I found a compelling account of how the US was trying to frame Assad with CW for a period of time including new documents that show communication regarding the false flags.

Obama administration is openly discussing cruise missile attacks on Syria after alleged use of chemical weapons by the regime.  I look at this from the big picture.  Zionist neoconservatives planned mass regime changes and Zionists did Cole and 9/11 to trigger the process.  They have used accusations of WMD successfully in Iraq to justify invasion and destruction of the country and against Iran semi-successfully.  It is quite likely they have planted chemical weapons that produces military options against Assad.  It is totally irrational for Assad to use chemical weapons against civilians when the regime forces seem to be winning and when he has Russian backing.

There has been a report of chemical weapon use that preceded the actual alleged use, suggesting deliberate plans to plant and accuse the Assad regime.

UPDATE:  Here is a report where a UK MP blamed the Zionists for planting chemical weapons in Syria, so I don’t feel alone in suggesting it.

A Forbes report says Russians suspect the chemical attack was a provocation by the rebels, which I believe as well:

Shortly after that news broke, Russia’s Foreign Ministry said in a statement that the alleged chemical attack could have been a staged “provocation” by the Syrian opposition and that the U.S. might use it as a pretext to go after Syria.

“All of this makes one recall the events that happened 10 years ago, when, using false information about Iraqis having weapons of mass destruction, the U.S. bypassed the United Nations and started a scheme whose consequences are well known to everyone,” the ministry said in a web-posted statement.

## Mapping parallel code to scipy signatures

The main idea behind CvxCloudOpt is to parallelize matrix computations specified using the python standard library.  Our next step is reproducing the parallel code matching signatures of scipy.sparse.linalg functions for iterative solvers.  The code for this is here, although it requires more debugging.  The MPI and PETSc usage should be completely transparent to the users, taking with them the implementation details for parallelism.

import sys
from numpy import exp, sqrt, random, zeros, dot
import petsc4py
from petsc4py import PETSc
from mpi4py import MPI
petsc4py.init(sys.argv)
from numpy.linalg import norm
import numpy as np

# Next we map to the signatures of solvers in scipy
def setupDistributedMatrix(B,comm):
comm.Barrier()
m,n = B.shape()
A = PETSc.Mat()
A.create(comm)
A.setSizes([m, n])
A.setType('mpidense')
A.setUp()
comm.Bcast([B, MPI.DOUBLE])
v = B
v.reshape(n*m,1)
A.setValues(range(0,m),range(0,n),v)
A.assemblyBegin()
A.assemblyEnd()
return A

def setupKSP(y,comm,solverType='minres'):
ksp = PETSc.KSP()
ksp.create(comm)
ksp.setType(solverType)
x,b = A.getVecs()
x.set(0)
x.assemblyBegin()
x.assemblyEnd()
b.setValues(range(0,n),y)
b.assemblyBegin()
b.assemblyEnd()
return ksp

def bicg(B,b,x0=None,tol=1e-5,maxiter=None,xtype=None,callback=None):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'bicg')
ksp.setOperators(A)
ksp.setFromOptions()
b,x = A.getVecs()
ksp.solve(b,x)
return x

def bicgstab(B,b,x0=None,tol=1e-5,maxiter=None,xtype=None,callback=None):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'bicgstab')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def cg(B,b,x0=None,tol=1e-5,maxiter=None,xtype=None,callback=None):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'cg')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def cgs(B,b,x0=None,tol=1e-5,maxiter=None,xtype=None,callback=None):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'cgs')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def gmres(B,b,x0=None,tol=1e-5,maxiter=None,restart=None,xtype=None,callback=None, restrt=None):
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'gmres')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def lgmres(B, b, x0=None, tol=1e-5,maxiter=1000,M=None,callback=None,inner_m=30,outer_v=None, store_outer_Av=True):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'lgmres')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def minres(B,b,x0=None,shift=0.0,tol=1e-5,maxiter=None,xtype=None,M=None,callback=None,show=False,check=False):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'minres')
b,x = A.getVecs()
ksp.solve(b,x)
return x

def qmr(B, b, x0=None, tol=1e-05, maxiter=None, xtype=None, M1=None, M2=None, callback=None):
comm = MPI.COMM_WORLD
A = setupDistributedMatrix(B,comm)
ksp = setupKSP(b,comm,'qmr')
b,x = A.getVecs()
ksp.solve(b,x)
return x

m, n = 1000, 1000

comm = MPI.COMM_WORLD

if comm.rank == 0:
B = random.normal(size=(m,n))
for i in range(m):
for j in range(i,n):
B[i,j] = B[j,i]

if comm.rank==0:
x0 = zeros((n,1))
for i in range(15):
x0[ random.randint(0,n) ] = 1.0
y = dot(B,x0)
x = minres(B,y)

## Attempt at MINRES with two cores

The last two numbers in the following output are MPI process local errors between expected and solved results which are low:

unknown001ec2bbc1a8:code home\$ mpirun -n 2 python parAug21.py
2
2
Norm(v) = 99.088037Norm(v) = 99.088037

Norm(x0) = 3.741657Norm(x0) = 3.741657

Norm y2=B*x0
Norm y2=B*x0
33.4279130868
33.4279130868
Norm(b)Norm(b)

33.4279130868
33.4279130868
[  4.28548872e-09][  5.40417870e-09]

And the code is the following:

import sys
from numpy import exp, sqrt, random, zeros, dot
import petsc4py
from petsc4py import PETSc
from mpi4py import MPI
petsc4py.init(sys.argv)
from numpy.linalg import norm
import numpy as np
cm = PETSc.COMM_WORLD

p = cm.Get_size()
print p

m, n = 100, 100

comm = MPI.COMM_WORLD

comm.Barrier()

if comm.rank == 0:
B = random.normal(size=(m,n))
for i in range(m):
for j in range(i,n):
B[i,j] = B[j,i]
else:
B = np.empty((m,n))

comm.Bcast([B, MPI.DOUBLE])

v = B
v.reshape(n*m,1)
print 'Norm(v) = %f' % norm(v)

A = PETSc.Mat()
A.create(PETSc.COMM_WORLD)
A.setSizes([m, n])
A.setType('mpidense')
A.setUp()

#Istart, Iend = A.getOwnershipRange()
#for I in range(Istart,Iend):
#    ix = I /n
#    jx = I - ix*n
#    A.setValue(I,jx+ix*n,B[ix,jx])

#rstart = (m/p)*comm.rank
#rend = (m/p)*(comm.rank+1)+1
#z = v[(n*m)*comm.rank/p: (n*m)*(comm.rank+1)/p  ]
#A.setValues(range(rstart,rend),range(0,n),z)
A.setValues(range(0,m),range(0,n),v)
#for ix in range(m):
#    for jx in range(n):
#        A.setValues(ix,jx,B[ix,jx])

A.assemblyBegin()
A.assemblyEnd()

N = 0
#Istart, Iend = A.getOwnershipRange()
#for I in range(Istart,Iend):
#    ix = I /n
#    jx = I - ix*n
#    N = N + ( A[ix,jx] - B[ix,jx] )**2

#print 'Checking A==B'
#print N

if comm.rank==0:
x0 = zeros((n,1))
for i in range(15):
x0[ random.randint(0,n) ] = 1.0
else:
x0 = np.empty((n,1))

comm.Bcast([x0, MPI.DOUBLE])

print 'Norm(x0) = %f' % norm(x0)

y2 = dot(B,x0)
print 'Norm y2=B*x0'
print norm(y2)
x1 = PETSc.Vec().createMPI(n,comm=PETSc.COMM_WORLD)
x1.setValues(range(0,n),x0)
x1.assemblyBegin()
x1.assemblyEnd()

y = PETSc.Vec().createMPI(n,comm=PETSc.COMM_WORLD)
y.set(0)
A.mult(x1,y)
y.assemblyBegin()
y.assemblyEnd()

#y.setValues(range(0,n), y2)

ksp = PETSc.KSP()
ksp.create(PETSc.COMM_WORLD)
ksp.setType('minres')
x,b = A.getVecs()
x.set(0)
x.assemblyBegin()
x.assemblyEnd()
b.setValues(range(0,n),y2)
b.assemblyBegin()
b.assemblyEnd()

print 'Norm(b)'
print b.norm()

ksp.setOperators(A)
ksp.setFromOptions()
ksp.solve(b,x)

K1,K2 = x.getOwnershipRange()
N = 0.0
for i in range(K1,K2):
N = N + (x[i]-x0[i])**2
print N