2258: Electron Density Reconstruction 25

Closed since about 3 years ago

Novice Overall Prediction Electron Density

Summary


Created
January 25, 2023
Expires
Max points
100
Description

The structure of this protein has already been solved and published, but close inspection suggests that there are some problems with the published solution. We'd like to see if Foldit players can use the same electron density data to reconstruct a better model. There are four copies of the same protein in this puzzle and it's a bit large, so you might want to use the Trim tool!

Sequence
MHHHHHHENLYFQGAASMLKKDKSELTDIEYIVTQENGTEPPFMNEYWNHFAKGIYVDKISGKPLFTSEEKFHSECGWPSFSKALDDDEIIELVDKSFGMVRTEVRSEESNSHLGHVFNDGPKESGGLRYCINSAAIQFIPYEKLEELGYGDLISHFDK

Top groups


  11. Gargleblasters 1 pt. 69,082

  1. Sandrix72 Lv 1 100 pts. 83,097
  2. Bruno Kestemont Lv 1 93 pts. 82,795
  3. grogar7 Lv 1 87 pts. 82,444
  4. maithra Lv 1 81 pts. 82,403
  5. gmn Lv 1 75 pts. 81,960
  6. Galaxie Lv 1 69 pts. 81,734
  7. LociOiling Lv 1 64 pts. 81,615
  8. drjr Lv 1 59 pts. 81,483
  9. dcrwheeler Lv 1 55 pts. 81,358
  10. Timo van der Laan Lv 1 50 pts. 81,325

Comments


LociOiling Lv 1

Yep, six chains, but they aren't identical. They add up to 816 segments.

Unfortunately, the recent update to AA Edit doesn't work on this puzzle.

Here are the chains that AA Edit should have detected. The sequence shown in the puzzle comments is first, followed by the six actual chains. The chains have been updated after a more careful inspection of the puzzle:

                                                                                                                      1         1         1         1         1                      
                            1         2         3         4         5         6         7         8         9         0         1         2         3         4                      
                   12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 segment/residue 1-140
                                                                                                                                                                                     
7CTO sequence      kkdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfdk 7CTO sequence        
----------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------
chain A, len = 134    kseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish    chain A, length = 134
chain B, len = 139 kkdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd  chain B, length = 139
chain C, len = 136  kdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish    chain C, length = 136
chain D, len = 136 kkdkseltdieyivtqengteppfmneywnhfakgiyvd@@sgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd  chain D, length = 136
chain E, len = 136    kseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd  chain E, length = 136
chain F, len = 135   dkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish    chain F, length = 135
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                   12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 segment/residue 1-140
                            1         2         3         4         5         6         7         8         9         0         1         2         3         4                      
                                                                                                                      1         1         1         1         1                      

(Edit: corrected the chains, which don't all begin in the same spot. A previous version of chain B also had an error. The gap in chain D is shown by "@@". The sequence info can be scrolled horizontally, at least in Firefox. See additional posts below.)

LociOiling Lv 1

Continuing to peel the onion, this puzzle is a match for PDB 7CTO.

All six chains are supposed to be the same, but there are lots of "missing residues" which were "not located in the experiment".

Looking at the PDB file for 7CTO (under "Display Files" -> "PDB Format"), one chunk of residues is missing from each of the six chains. There's a SEQADV record for each one of these residues. They consist of a methionine, followed by an "expression tag". The expression tag is used to separate the desired protein from all the other stuff in the test tube.

This expression tag is a common one, with six histidines in a row. It's known as the "histidine" or "polyhistidine" tag.

The methionine and the expression tag cover the first part of the sequence shown in the puzzle comments, "MHHHHHHENLYFQGAAS".

Here's what the SEQADV records for the first chain look like:

SEQADV 7CTO MET A  -16  UNP  W8TTH3              INITIATING METHIONINE          
SEQADV 7CTO HIS A  -15  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO HIS A  -14  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO HIS A  -13  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO HIS A  -12  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO HIS A  -11  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO HIS A  -10  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO GLU A   -9  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO ASN A   -8  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO LEU A   -7  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO TYR A   -6  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO PHE A   -5  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO GLN A   -4  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO GLY A   -3  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO ALA A   -2  UNP  W8TTH3              EXPRESSION TAG                 
SEQADV 7CTO ALA A   -1  UNP  W8TTH3              EXPRESSION TAG  
SEQADV 7CTO SER A    0  UNP  W8TTH3              EXPRESSION TAG                  

The SEQADV records give the PDB entry (7CTO), the amino acid (HIS for histidine), the chain (chain A here) and the sequence number of the residue. The sequence numbers helpfully start at -16 and work their way up to zero.

The SEQADV records are repeated for chains B, C, D, E, and F. So there are 102 SEQADV records, covering the first 17 residues of 6 chains.

These first 17 residues are missing from each of the chains. Each chain also has additional missing residues, which will be described in a separate post.
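As a rough illustration of how those records can be read, here is a minimal Python sketch. It assumes the standard fixed-column PDB layout (residue name in columns 13-15, chain ID in column 17, sequence number in columns 19-22); the names `parse_seqadv`, `THREE_TO_ONE`, and `SEQADV_CHAIN_A` are made up for this example. It rebuilds the tag sequence from the chain A records and repeats the 17 x 6 arithmetic.

```python
# Three-letter to one-letter codes, limited to the residues in the tag.
THREE_TO_ONE = {
    "MET": "M", "HIS": "H", "GLU": "E", "ASN": "N", "LEU": "L", "TYR": "Y",
    "PHE": "F", "GLN": "Q", "GLY": "G", "ALA": "A", "SER": "S",
}

# Chain A SEQADV records, copied from the 7CTO listing above.
SEQADV_CHAIN_A = """\
SEQADV 7CTO MET A  -16  UNP  W8TTH3              INITIATING METHIONINE
SEQADV 7CTO HIS A  -15  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO HIS A  -14  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO HIS A  -13  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO HIS A  -12  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO HIS A  -11  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO HIS A  -10  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO GLU A   -9  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO ASN A   -8  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO LEU A   -7  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO TYR A   -6  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO PHE A   -5  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO GLN A   -4  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO GLY A   -3  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO ALA A   -2  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO ALA A   -1  UNP  W8TTH3              EXPRESSION TAG
SEQADV 7CTO SER A    0  UNP  W8TTH3              EXPRESSION TAG
"""


def parse_seqadv(line):
    """Return (residue name, chain ID, sequence number) from one SEQADV line."""
    return line[12:15], line[16], int(line[18:22])


records = [
    parse_seqadv(line)
    for line in SEQADV_CHAIN_A.splitlines()
    if line.startswith("SEQADV")
]

# Order by sequence number and translate to one-letter codes.
tag = "".join(THREE_TO_ONE[res] for res, _chain, _num in sorted(records, key=lambda r: r[2]))

# The same 17 records repeat for each of the six chains.
total_records = len(records) * 6

print(tag)
print(total_records)
```

The rebuilt string should match the start of the sequence shown in the puzzle comments.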

LociOiling Lv 1

The PDB file for 7CTO also has lots of REMARK 465 records that detail each missing residue. Here are the first few records in this section:

REMARK 465 MISSING RESIDUES                                                     
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE                       
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN               
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)                
REMARK 465                                                                      
REMARK 465   M RES C SSSEQI                                                     
REMARK 465     MET A   -16                                                      
REMARK 465     HIS A   -15                                                      
REMARK 465     HIS A   -14                                                      
REMARK 465     HIS A   -13  
...

It turns out that each of the 6 chains is also missing residues 1 and 2, and residue 142. Chains A and E are also missing residues 3, 4, and 5; chain F is missing 3 and 4, and chains C and D are missing 3. Chains A, C, and F are missing 140 and 141 in addition to 142. Residues 42 and 43 are missing from chain D, but are present in the other chains.

The table below summarizes what all 138 of those REMARK 465 records are trying to say:

RES  SEQ  A B C D E F
MET  -16  X X X X X X
HIS  -15  X X X X X X
HIS  -14  X X X X X X
HIS  -13  X X X X X X
HIS  -12  X X X X X X
HIS  -11  X X X X X X
HIS  -10  X X X X X X
GLU   -9  X X X X X X
ASN   -8  X X X X X X
LEU   -7  X X X X X X
TYR   -6  X X X X X X
PHE   -5  X X X X X X
GLN   -4  X X X X X X
GLY   -3  X X X X X X
ALA   -2  X X X X X X
ALA   -1  X X X X X X
SER    0  X X X X X X
MET    1  X X X X X X
LEU    2  X X X X X X
LYS    3  X - X X X X
LYS    4  X - - - X X
ASP    5  X - - - X -
LYS   42  - - - X - -
ILE   43  - - - X - -
PHE  140  X - X - - X
ASP  141  X - X - - X
LYS  142  X X X X X X

The chains are shown by columns A through F. An "X" in a column means the residue is missing from that chain; a "-" means it is present.
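The counts in the table can be cross-checked with a short Python sketch. It assumes the native residues are numbered 1 through 142, as the table suggests, and the dictionary below is simply a hand transcription of the table (the variable names are invented for this example). If the transcription is right, the total should come to 138 records and the per-chain lengths should agree with the chain listing in the first post.

```python
CHAINS = "ABCDEF"

# Residue number -> chains in which that residue is missing (from the table).
missing = {num: "ABCDEF" for num in range(-16, 1)}  # methionine plus expression tag
missing.update({
    1: "ABCDEF", 2: "ABCDEF",
    3: "ACDEF", 4: "AEF", 5: "AE",
    42: "D", 43: "D",
    140: "ACF", 141: "ACF", 142: "ABCDEF",
})

# One REMARK 465 record per missing residue per chain.
total_missing = sum(len(chains) for chains in missing.values())

# Observed length of each chain, counting only native residues 1..142
# (assumption: the native sequence has 142 residues).
NATIVE_LENGTH = 142
observed = {
    chain: NATIVE_LENGTH
    - sum(1 for num, chains in missing.items() if num >= 1 and chain in chains)
    for chain in CHAINS
}

print(total_missing)
print(observed)
print(sum(observed.values()))
```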

alcor29 Lv 1

This thing is too crashy. I almost never crash, and it's crashing when I leave a script running on trimmed segments. Also, I normally never get 502 or 504 errors. The crashes also occur when those errors show up in my client.

LociOiling Lv 1

I've corrected the chains listed above in my first post. The recipe Find the Gap was useful, and points to a strategy for detecting chains on puzzles like this one.

I'm still puzzled by how the missing residues are handled. For example, chain D is missing residues 42 and 43, and the solution is apparently to connect residue 41 to residue 44 and continue. (The puzzle doesn't allow inserting segments to fill that gap or the others.) I can see leaving off residues at the beginning or end of a chain, but that approach makes less sense in the middle.
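One plausible way to spot gaps and chain breaks is to look for consecutive alpha carbons that sit farther apart than usual. The Python sketch below illustrates that idea only; it is not a description of how Find the Gap works internally. The function name `find_gaps`, the 0.5 angstrom tolerance, and the coordinates are all made up for illustration. The 3.8 angstrom figure is the typical distance between neighboring alpha carbons.

```python
import math

IDEAL_CA_DISTANCE = 3.8  # typical distance between neighboring alpha carbons, in angstroms


def find_gaps(ca_coords, tolerance=0.5):
    """Return (segment, next segment, distance) for neighbors that are too far apart.

    Segments are numbered from 1. The tolerance is an arbitrary choice
    for this sketch, not a value taken from any recipe.
    """
    gaps = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if distance > IDEAL_CA_DISTANCE + tolerance:
            gaps.append((i + 1, i + 2, round(distance, 3)))
    return gaps


# Made-up coordinates along a straight line: normal spacing, except for
# a stretched step between the third and fourth alpha carbons.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (12.6, 0.0, 0.0),
    (16.4, 0.0, 0.0),
]

gaps = find_gaps(coords)
print(gaps)
```

A break between two chains would show up the same way, usually with a much larger distance, which is why the same test could help with chain detection.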

spmm Lv 1

I am also getting multiple 502 Bad Gateway errors and dare not touch anything, even when a simple fuse is running. After a restart, chat is not working and no scores are showing in the refreshed puzzle list when the program opens.

BootsMcGraw Lv 1

Eight hundred sixteen residues? Really?? Why did we not get the individual sub-units as multiple puzzles?

This puzzle broke all my scripts, and possibly gave my desktop cancer. Please save the big stuff for AlphaFold.

LociOiling Lv 1

This puzzle was puckered due to missing residues.

The recipe Pucker Picker 3.0 RC 1 detected these puckers:

Pucker Picker 3.0 RC1
2258: Electron Density Reconstruction 25
1  pucker found!
pucker 1 (distance/ideality), segments 447-448 (protein), distance = 5.031, ideality = -2475.958, -2473.121
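
To relate segments 447-448 back to the PDB numbering, here is a small Python sketch. It assumes the puzzle numbers its segments by running through chains A to F in order and skipping the missing residues, and that the native residues are numbered 1 through 142; the missing-residue sets are transcribed from the REMARK 465 table in the earlier post. The names `MISSING` and `segment_map` are invented for this example.

```python
CHAIN_ORDER = "ABCDEF"
NATIVE_LENGTH = 142  # assumption: native residues are numbered 1..142

# Missing native residues per chain, transcribed from the REMARK 465 table.
MISSING = {
    "A": {1, 2, 3, 4, 5, 140, 141, 142},
    "B": {1, 2, 142},
    "C": {1, 2, 3, 140, 141, 142},
    "D": {1, 2, 3, 42, 43, 142},
    "E": {1, 2, 3, 4, 5, 142},
    "F": {1, 2, 3, 4, 140, 141, 142},
}


def segment_map():
    """Map each puzzle segment number to a (chain, PDB residue number) pair."""
    mapping = {}
    segment = 0
    for chain in CHAIN_ORDER:
        for residue in range(1, NATIVE_LENGTH + 1):
            if residue not in MISSING[chain]:
                segment += 1
                mapping[segment] = (chain, residue)
    return mapping


segments = segment_map()
print(len(segments))
print(segments[447], segments[448])
```

Under those assumptions, segment 447 lands on chain D residue 41 and segment 448 on chain D residue 44, which is the gap left by the missing residues 42 and 43.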