"Align Protein to Density" button is confusingly named

Started by alwen

jeff101 Lv 1

Say the protein starts with all 3 Euler angles equal to 0 degrees.
Since the Euler angles give the same results every 360 degrees,
you can say the starting Euler angles are p*360,q*360,r*360, where
p,q,r are all positive integers (1,2,3,4,etc.). fminsearch builds
its first simplex by adding 5% to each nonzero variable in turn
(360*1.05=378), so for 3 variables it will make its first simplex
of 4 points as follows:

(1) p*360,q*360,r*360 & its score
(2) p*378,q*360,r*360 & its score
(3) p*360,q*378,r*360 & its score
(4) p*360,q*360,r*378 & its score

This initial simplex sets the range that fminsearch will explore
for each variable. Using different values for p,q,r will explore
different ranges of angles. Small p,q,r gives a more local search.
Large p,q,r gives a more global search. For example, if p=1, the
initial simplex uses the angles 360 and 378 degrees (equivalent
to 0 and 18 degrees). Also, if p=10, the initial simplex uses the
angles 3600 and 3780 degrees (equivalent to 0 and 180 degrees).
For general p, the angles are p*360 and p*378 degrees. p*360 is
equivalent to 0 degrees. p*378 is equivalent to p*360+p*18 or
p*18 degrees. This gives the effective angles below, which repeat
for p=21-40,41-60,61-80,etc.

p    p*378 (mod 360)
1      18
2      36
3      54
4      72
5      90
6     108
7     126
8     144
9     162
10    180
11    198
12    216
13    234
14    252
15    270
16    288
17    306
18    324
19    342
20      0
21     18
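
The table can be reproduced with a short Python sketch. It assumes
the rule documented for MATLAB's fminsearch: each extra simplex vertex
adds 5% to one nonzero starting value, or uses 0.00025 if that value
is exactly zero. The names initial_simplex and effective_angle are
made up for illustration and are not part of any library.

```python
def initial_simplex(x0, rel_step=0.05, zero_step=0.00025):
    """Starting simplex under the assumed fminsearch rule."""
    simplex = [list(x0)]
    for i, value in enumerate(x0):
        vertex = list(x0)
        # 5% step for a nonzero value, tiny fixed step for an exact zero
        vertex[i] = value * (1 + rel_step) if value != 0 else zero_step
        simplex.append(vertex)
    return simplex


def effective_angle(p):
    """Perturbed angle p*378 degrees, reduced to the range 0-359."""
    return (p * 378) % 360


# Starting at exactly 0 degrees gives steps of only 0.00025 degrees,
# while the equivalent start at 360 degrees gives steps of about 18 degrees.
for start in ([0, 0, 0], [360, 360, 360]):
    for vertex in initial_simplex(start):
        print([round(v, 5) for v in vertex])

# Effective perturbed angle for each multiplier p
for p in range(1, 22):
    print(p, effective_angle(p))
```

One thing the sketch makes visible: at p=20 the perturbed angle wraps
back to 0, so that vertex would score the same as the starting point.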

jeff101 Lv 1

Say the unit cell for the electron density cloud
has x=0-60, y=0-80, and z=0-100 and the protein's
center starts at (x,y,z)=(40,60,20) in the same
coordinate system. Since the unit cell repeats
every 60 units in the x-direction, 80 units in
the y-direction, and 100 units in the z-direction,
the initial position of the protein's center is
equivalent to (x,y,z)=(40+60p,60+80q,20+100r);
that is, the initial position of the protein's
center will give the same electron density score
for any set of integers p,q,r.

If we keep p,q,r as positive integers
(1,2,3,4,etc.), the initial simplex for the xyz
coordinates for the protein's center will be as
follows:

(1) 40+60p,60+80q,20+100r & its score
(2) 42+63p,60+80q,20+100r & its score
(3) 40+60p,63+84q,20+100r & its score
(4) 40+60p,60+80q,21+105r & its score

If p=1, the initial simplex uses the x values
100 and 105 (equivalent to 40 and 45). Also,
if p=10, the initial simplex uses the x values
640 and 672 (equivalent to 40 and 12). For
general p, the x values are 40+60p and 42+63p.
40+60p is equivalent to 40, and 42+63p is equivalent
to 42+60p+3p or 42+3p. This and similar logic gives
the effective x,y,z values below (n stands for p, q,
or r in turn), which repeat for n=21-40,41-60,61-80,etc.

  x=42+63n y=63+84n z=21+105n
  x=42+3n  y=63+4n  z=21+5n
n x=0-60   y=0-80   z=0-100
1   45       67       26
2   48       71       31
3   51       75       36
4   54       79       41
5   57        3       46
6    0        7       51
7    3       11       56
8    6       15       61
9    9       19       66
10  12       23       71
11  15       27       76
12  18       31       81
13  21       35       86
14  24       39       91
15  27       43       96
16  30       47        1
17  33       51        6
18  36       55       11
19  39       59       16
20  42       63       21
21  45       67       26
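
The same table can be generated with a small Python sketch. It
assumes the 5% perturbation described above and the example cell
and starting position from this post. The names CELL, START, and
effective_position are hypothetical, chosen only for this example.

```python
CELL = (60, 80, 100)   # unit cell lengths in x, y, z
START = (40, 60, 20)   # protein center inside the cell


def effective_position(n):
    """Perturbed x, y, z for multiplier n, wrapped back into the unit cell."""
    wrapped = []
    for start, length in zip(START, CELL):
        shifted = start + length * n       # equivalent starting coordinate
        perturbed = shifted * 105 // 100   # 5% step, exact for these integers
        wrapped.append(perturbed % length)
    return tuple(wrapped)


for n in range(1, 22):
    x, y, z = effective_position(n)
    print(f"{n:2d} {x:4d} {y:8d} {z:8d}")
```

Integer arithmetic is used so the wrapped values come out exact
instead of picking up floating-point noise from multiplying by 1.05.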

jeff101 Lv 1

If the electron density (ED) were not periodic,
as might occur if the ED were only nonzero within a box covering
x=0-60, y=0-80, and z=0-100, for example,
there is another trick you can do with fminsearch
to control the size of the initial simplex, as below:

Say the protein's center starts at (x,y,z)=(40,60,20)
in the same coordinate system as the electron density.
If you feed these coordinates directly into fminsearch,
the initial simplex will be:

     actual
       xyz
(1) 40,60,20 & its score
(2) 42,60,20 & its score
(3) 40,63,20 & its score
(4) 40,60,21 & its score

If you instead shifted all coordinates by +100 units outside
of fminsearch and then shifted them back by -100 units before
evaluating their score, you'd get for the initial simplex:

      shifted     actual
        xyz         xyz
(1) 140,160,120  40,60,20 & its score
(2) 147,160,120  47,60,20 & its score
(3) 140,168,120  40,68,20 & its score
(4) 140,160,126  40,60,26 & its score

Next, if you shifted all coordinates by +400 units outside
of fminsearch and then shifted them back by -400 units before
evaluating their score, you'd get for the initial simplex:

      shifted     actual
        xyz         xyz
(1) 440,460,420  40,60,20 & its score
(2) 462,460,420  62,60,20 & its score (x goes outside 0-60 here, where ED is zero)
(3) 440,483,420  40,83,20 & its score (y goes outside 0-80 here, where ED is zero)
(4) 440,460,441  40,60,41 & its score

Finally, if you shifted all coordinates by -80 units outside
of fminsearch and then shifted them back by +80 units before
evaluating their score, you'd get for the initial simplex:

      shifted     actual
        xyz         xyz
(1) -40,-20,-60  40,60,20 & its score
(2) -42,-20,-60  38,60,20 & its score 
(3) -40,-21,-60  40,59,20 & its score 
(4) -40,-20,-63  40,60,17 & its score

As you can see, the size of the shift controls the size of the initial simplex
(each step is 5% of the shifted value), and the sign of the shifted coordinates
controls which direction the simplex will explore.
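
Here is a rough Python sketch of the shift trick. It assumes the
same fminsearch rule as before (5% step for a nonzero value, 0.00025
for an exact zero). The helpers shifted_simplex and
make_shifted_objective are hypothetical names, and score stands in
for whatever function evaluates the electron density fit.

```python
def shifted_simplex(start, shift, rel_step=0.05, zero_step=0.00025):
    """Return (shifted, actual) coordinate pairs for the initial simplex."""
    base = [c + shift for c in start]      # what the optimizer sees
    vertices = [base]
    for i, value in enumerate(base):
        vertex = list(base)
        vertex[i] = value * (1 + rel_step) if value != 0 else zero_step
        vertices.append(vertex)
    # Undo the shift to get the coordinates that are actually scored
    return [(v, [c - shift for c in v]) for v in vertices]


def make_shifted_objective(score, shift):
    """Wrap score so the optimizer works in shifted coordinates."""
    def shifted_score(coords):
        return score([c - shift for c in coords])
    return shifted_score


for shift in (0, 100, 400, -80):
    print("shift", shift)
    for shifted, actual in shifted_simplex([40, 60, 20], shift):
        print([round(c, 3) for c in shifted], [round(c, 3) for c in actual])
```

The wrapped objective would be handed to the optimizer together with
the shifted starting point, so the optimizer never needs to know
about the shift.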

jeff101 Lv 1

If you can get the "Center Protein on Density" button
to work better, perhaps as detailed above, it would
be nice if a player could select certain segments
first and then have the "Center Protein on Density"
button optimize the protein's position & orientation
as if the selected segments were the only scoring
segments for the entire protein. This way, if a player
was sure about the structure of a certain section
of the protein, say segments 1-20 and 45-90, he/she
could select just those segments, then press the
"Center Protein on Density" button to find the
position & orientation that gives the best score
for segments 1-20 and 45-90 only.

frood66 Lv 1

"Align Protein to Density" has always been a waste of time. It would be best to remove it, as Flat was going to do years ago. Best to ignore this option.