grogar7 Lv 1
It seems there are SIX copies of the protein present in the ED cloud!!
Closed since about 3 years ago
Novice, Overall, Prediction, Electron Density
The structure of this protein has already been solved and published, but close inspection suggests that there are some problems with the published solution. We'd like to see if Foldit players can use the same electron density data to reconstruct a better model. There are four copies of the same protein in this puzzle and it's a bit large, so you might want to use the Trim tool!
Yep, six chains, but they aren't identical. They add up to 816 segments.
Unfortunately, the recent update to AA Edit doesn't work on this puzzle.
Here are the chains that AA Edit should have detected. The sequence shown in the puzzle comments is first, followed by the six actual chains. The chains have been updated after a more careful inspection of the puzzle:

                               10        20        30        40        50        60        70        80        90       100       110       120       130       140
                       12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890  (position 1-140 = PDB residue 3-142)
    7CTO sequence      kkdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfdk
    chain A, len = 134    kseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish
    chain B, len = 139 kkdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd
    chain C, len = 136  kdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish
    chain D, len = 136  kdkseltdieyivtqengteppfmneywnhfakgiyvd@@sgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd
    chain E, len = 136    kseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfd
    chain F, len = 135   dkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlish

(Edit: corrected the chains, which don't all begin in the same spot. A previous version of chain B also had an error. The gap in chain D is shown by "@@". The sequence info can be scrolled horizontally, at least in Firefox. See additional posts below.)
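For anyone who wants to line up chains against the reference sequence without counting columns by hand, plain substring search does the job. This is a minimal sketch, not what AA Edit or any recipe does: `locate_pieces` and `to_pdb` are hypothetical helpers. It splits a chain on the "@@" gap marker, finds each piece in order, and returns 1-based ruler positions; it assumes each piece is unique enough that a short repeated motif won't match in the wrong place.

```python
# Reference sequence from the puzzle comments (ruler positions 1-140).
REFERENCE = "kkdkseltdieyivtqengteppfmneywnhfakgiyvdkisgkplftseekfhsecgwpsfskaldddeiielvdksfgmvrtevrseesnshlghvfndgpkesgglrycinsaaiqfipyekleelgygdlishfdk"

# Ruler position 1 corresponds to PDB residue 3 in this listing.
PDB_OFFSET = 2


def locate_pieces(reference: str, chain: str, gap: str = "@"):
    """Return (first, last) 1-based ruler positions for each gap-free piece of chain."""
    pieces = [p for p in chain.split(gap) if p]
    spans = []
    search_from = 0
    for piece in pieces:
        idx = reference.find(piece, search_from)
        if idx < 0:
            raise ValueError(f"piece {piece!r} not found in reference")
        spans.append((idx + 1, idx + len(piece)))
        # Each later piece must come after the previous one.
        search_from = idx + len(piece)
    return spans


def to_pdb(spans):
    """Convert ruler positions to PDB residue numbers."""
    return [(first + PDB_OFFSET, last + PDB_OFFSET) for first, last in spans]


print(to_pdb(locate_pieces(REFERENCE, "kdkseltdie")))      # [(4, 13)]  start of chain C
print(to_pdb(locate_pieces(REFERENCE, "giyvd@@sgkplft")))  # [(37, 41), (44, 50)]  around chain D's gap
```

The jump from 41 to 44 in the second result is the two-residue gap in chain D.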
Continuing to peel the onion: this puzzle is a match for PDB entry 7CTO.
All six chains are supposed to be the same, but there are lots of "missing residues" which were "not located in the experiment".
In the PDB file for 7CTO (under "Display Files" -> "PDB Format"), one chunk of residues is missing from each of the six chains. There's a SEQADV record for each one of these residues. They consist of a methionine followed by an "expression tag", which is used to separate the desired protein from all the other stuff in the test tube.
This expression tag is a common one, with six histidines in a row. It's known as the "histidine" or "polyhistidine" tag.
The methionine and the expression tag cover the first part of the sequence shown in the puzzle comments, "MHHHHHHENLYFQGAAS".
Here's what the SEQADV records for the first chain look like:
SEQADV 7CTO MET A -16 UNP W8TTH3 INITIATING METHIONINE
SEQADV 7CTO HIS A -15 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -14 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -13 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -12 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -11 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -10 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLU A -9 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ASN A -8 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO LEU A -7 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO TYR A -6 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO PHE A -5 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLN A -4 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLY A -3 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ALA A -2 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ALA A -1 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO SER A 0 UNP W8TTH3 EXPRESSION TAG
The SEQADV records give the PDB entry (7CTO), the amino acid (HIS for histidine), the chain (chain A here), the sequence number of the residue, the reference database entry (UNP W8TTH3, where UNP means UniProt), and a comment such as EXPRESSION TAG. The sequence numbers helpfully start at -16 and work their way up to zero.
The SEQADV records are repeated for chains B, C, D, E, and F. So there are 102 SEQADV records, covering the first 17 residues of 6 chains.
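If you'd rather not read SEQADV records by eye, a few lines of Python can pull them apart. This is a minimal sketch that assumes whitespace-separated fields as in the records quoted above; real PDB files use fixed columns, and conflict records (with a database residue filled in) or insertion codes like "100A" would need extra handling. `parse_seqadv` and `THREE_TO_ONE` are illustrative names, not part of any library.

```python
# The chain A records quoted above.
SEQADV_A = """\
SEQADV 7CTO MET A -16 UNP W8TTH3 INITIATING METHIONINE
SEQADV 7CTO HIS A -15 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -14 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -13 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -12 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -11 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO HIS A -10 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLU A -9 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ASN A -8 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO LEU A -7 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO TYR A -6 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO PHE A -5 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLN A -4 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO GLY A -3 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ALA A -2 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO ALA A -1 UNP W8TTH3 EXPRESSION TAG
SEQADV 7CTO SER A 0 UNP W8TTH3 EXPRESSION TAG
"""

# Only the residues that appear in this tag.
THREE_TO_ONE = {"MET": "M", "HIS": "H", "GLU": "E", "ASN": "N", "LEU": "L",
                "TYR": "Y", "PHE": "F", "GLN": "Q", "GLY": "G", "ALA": "A", "SER": "S"}


def parse_seqadv(text):
    """Return (residue, chain, seq_number, comment) tuples for each SEQADV line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SEQADV":
            continue
        # Fields: SEQADV, PDB ID, residue, chain, number, database, accession, comment...
        # (assumes the database residue/number columns are blank, as for expression tags)
        residue, chain, number = fields[2], fields[3], int(fields[4])
        comment = " ".join(fields[7:])
        records.append((residue, chain, number, comment))
    return records


records = parse_seqadv(SEQADV_A)
tag = "".join(THREE_TO_ONE[res] for res, _, _, _ in records)
print(len(records), tag)                                          # 17 MHHHHHHENLYFQGAAS
print([num for _, _, num, _ in records] == list(range(-16, 1)))  # True
```

The one-letter string comes out as the "MHHHHHHENLYFQGAAS" prefix from the puzzle comments, and six chains of 17 records each give the 102 total.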
These first 17 residues are missing from each of the chains. Each chain also has additional missing residues, which will be described in a separate post.
The PDB file for 7CTO also has lots of REMARK 465 records that detail each missing residue. Here are the first few records in this section:
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465 M RES C SSSEQI
REMARK 465 MET A -16
REMARK 465 HIS A -15
REMARK 465 HIS A -14
REMARK 465 HIS A -13
...
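Collecting REMARK 465 lines into a per-residue summary like the table below is mechanical. This sketch assumes the layout shown above (residue, chain, number, with no model-number column and no insertion codes); `parse_remark465` and `summary_rows` are hypothetical names, and the sample input is just a handful of records, not the full file.

```python
def parse_remark465(lines):
    """Map (seq_number, residue) -> set of chain IDs for each missing-residue record."""
    missing = {}
    for line in lines:
        f = line.split()
        # Residue records look like: REMARK 465 MET A -16
        if len(f) != 5 or f[:2] != ["REMARK", "465"] or len(f[3]) != 1:
            continue
        try:
            seq = int(f[4])
        except ValueError:
            continue  # skip any 5-field header text whose last field isn't a number
        missing.setdefault((seq, f[2]), set()).add(f[3])
    return missing


def summary_rows(missing, chains="ABCDEF"):
    """Render markdown table rows, with X where a residue is missing from a chain."""
    rows = ["| RES | SEQ | " + " | ".join(chains) + " |",
            "|" + "---|" * (2 + len(chains))]
    for seq, res in sorted(missing):
        marks = ["X" if c in missing[(seq, res)] else " " for c in chains]
        rows.append(f"| {res} | {seq} | " + " | ".join(marks) + " |")
    return rows


sample = """REMARK 465 MISSING RESIDUES
REMARK 465 M RES C SSSEQI
REMARK 465 MET A -16
REMARK 465 HIS A -15
REMARK 465 LYS D 42
REMARK 465 LYS A 142
REMARK 465 LYS B 142""".splitlines()

for row in summary_rows(parse_remark465(sample)):
    print(row)
```

Feeding it all 138 residue records from the file should give one row per missing residue, in sequence-number order.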
It turns out that each of the six chains is also missing residues 1 and 2, and residue 142. Chains A and E are also missing residues 3, 4, and 5; chain F is missing residues 3 and 4; and chains C and D are missing residue 3. Chains A, C, and F are missing 140 and 141 in addition to 142. Residues 42 and 43 are missing from chain D, but are present in the other chains.
The table below summarizes what all 138 of those REMARK 465 records are trying to say:
| RES | SEQ | A | B | C | D | E | F |
|-----|-----|---|---|---|---|---|---|
| MET | -16 | X | X | X | X | X | X |
| HIS | -15 | X | X | X | X | X | X |
| HIS | -14 | X | X | X | X | X | X |
| HIS | -13 | X | X | X | X | X | X |
| HIS | -12 | X | X | X | X | X | X |
| HIS | -11 | X | X | X | X | X | X |
| HIS | -10 | X | X | X | X | X | X |
| GLU | -9 | X | X | X | X | X | X |
| ASN | -8 | X | X | X | X | X | X |
| LEU | -7 | X | X | X | X | X | X |
| TYR | -6 | X | X | X | X | X | X |
| PHE | -5 | X | X | X | X | X | X |
| GLN | -4 | X | X | X | X | X | X |
| GLY | -3 | X | X | X | X | X | X |
| ALA | -2 | X | X | X | X | X | X |
| ALA | -1 | X | X | X | X | X | X |
| SER | 0 | X | X | X | X | X | X |
| MET | 1 | X | X | X | X | X | X |
| LEU | 2 | X | X | X | X | X | X |
| LYS | 3 | X |   | X | X | X | X |
| LYS | 4 | X |   |   |   | X | X |
| ASP | 5 | X |   |   |   | X |   |
| LYS | 42 |   |   |   | X |   |   |
| ILE | 43 |   |   |   | X |   |   |
| PHE | 140 | X |   | X |   |   | X |
| ASP | 141 | X |   | X |   |   | X |
| LYS | 142 | X | X | X | X | X | X |
The chains are shown by columns A through F. An "X" in a column means the residue is missing from that chain.
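As a cross-check, the chain extents from the first post reproduce these numbers. This sketch describes each chain as runs of present PDB residue numbers, read off the chain listing and lengths (for example, chain D has to run from LYS 4 to ASP 141 with 42-43 missing to come out at 136), and it assumes the full construct spans MET -16 through LYS 142. The variable names are illustrative only.

```python
# Present residues per chain, as inclusive (first, last) runs of PDB numbers.
CHAINS = {
    "A": [(6, 139)],
    "B": [(3, 141)],
    "C": [(4, 139)],
    "D": [(4, 41), (44, 141)],
    "E": [(6, 141)],
    "F": [(5, 139)],
}
CONSTRUCT = range(-16, 143)  # MET -16 through LYS 142


def present(runs):
    """Expand (first, last) runs into a set of residue numbers."""
    return {n for first, last in runs for n in range(first, last + 1)}


present_sets = {c: present(runs) for c, runs in CHAINS.items()}
lengths = {c: len(s) for c, s in present_sets.items()}
missing = {c: [n for n in CONSTRUCT if n not in s] for c, s in present_sets.items()}
total_missing = sum(len(m) for m in missing.values())

print(lengths)                                        # {'A': 134, 'B': 139, 'C': 136, 'D': 136, 'E': 136, 'F': 135}
print(sum(lengths.values()))                          # 816
print(total_missing)                                  # 138
print("".join(c for c in CHAINS if 3 in missing[c]))  # ACDEF
```

The chain lengths add up to the 816 segments in the puzzle, and the missing residues add up to the 138 REMARK 465 records.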
This thing is too crashy. I almost never crash, but it's crashing when I leave a script running on trimmed segments. Also, I never get 502/504 errors, yet the crashing also occurs when they are present in my client.
I've corrected the chains listed above in my first post. The recipe Find the Gap was useful, and points to a strategy for detecting chains on puzzles like this one.
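I don't know exactly how Find the Gap decides where a gap is, but a common way to spot chain breaks is to look for consecutive segments whose alpha carbons are too far apart to be bonded. Adjacent CA atoms in a protein chain sit about 3.8 Å apart, so this sketch flags any pair beyond a cutoff. The 4.2 Å threshold and the `find_breaks` name are assumptions, and the coordinates would have to come from somewhere like a PDB file; the example below uses made-up points.

```python
import math


def find_breaks(ca_coords, max_dist=4.2):
    """Return (i, i+1, distance) for consecutive 1-based segments farther apart than max_dist."""
    breaks = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if d > max_dist:
            breaks.append((i + 1, i + 2, round(d, 3)))
    return breaks


# Made-up CA positions along x: 3.8 Å steps, then a jump between segments 3 and 4.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
print(find_breaks(ca))  # [(3, 4, 12.4)]
```

Each flagged pair marks a place where one chain ends and the next begins, or where residues are missing in the middle of a chain.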
I'm still puzzled by how the missing residues are handled. For example, chain D is missing residues 42 and 43, and the solution is apparently to connect residue 41 directly to residue 44 and continue. (The puzzle doesn't allow inserting segments to fill that gap or the others.) I can see leaving off residues at the beginning or end of a chain, but that approach makes less sense in the middle.
I am also getting multiple 502 Bad Gateway errors and don't dare touch anything, even when a simple Fuse is running. I restarted, but chat is not working and no scores show in the refreshed puzzle list when the program opens.
Eight hundred sixteen residues? Really?? Why did we not get the individual sub-units as multiple puzzles?
This puzzle broke all my scripts, and possibly gave my desktop cancer. Please save the big stuff for AlphaFold.
This puzzle was puckered due to missing residues.
The recipe Pucker Picker 3.0 RC1 detected this pucker:
Pucker Picker 3.0 RC1
2258: Electron Density Reconstruction 25
1 pucker found!
pucker 1 (distance/ideality), segments 447-448 (protein), distance = 5.031, ideality = -2475.958, -2473.121
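To see where that pucker sits, the segment numbers can be mapped back to chains. This sketch assumes the puzzle lists the chains in order A through F and uses the chain extents from the first post (chain D as LYS 4 to ASP 141 without 42-43). Under those assumptions, segments 447 and 448 come out as chain D residues 41 and 44, the two ends of the gap the puzzle joins directly. `segment_to_residue` is a hypothetical helper, not a Foldit function.

```python
# Chains in assumed puzzle order, with present PDB residues as inclusive runs.
CHAINS = [
    ("A", [(6, 139)]),
    ("B", [(3, 141)]),
    ("C", [(4, 139)]),
    ("D", [(4, 41), (44, 141)]),
    ("E", [(6, 141)]),
    ("F", [(5, 139)]),
]


def segment_to_residue(segment):
    """Map a 1-based puzzle segment number to (chain, PDB residue number)."""
    if segment < 1:
        raise ValueError("segments are numbered from 1")
    remaining = segment
    for chain, runs in CHAINS:
        for first, last in runs:
            size = last - first + 1
            if remaining <= size:
                return chain, first + remaining - 1
            remaining -= size
    raise ValueError(f"segment {segment} is past the last chain")


for seg in (447, 448):
    print(seg, segment_to_residue(seg))
# 447 ('D', 41)
# 448 ('D', 44)
```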