
2660: Electron Density Reconstruction 135

Closed 6 months ago

Novice Overall Prediction Electron Density

Summary


Created
September 04, 2025
Expires
Max points
100
Description

The structure of this protein has already been solved and published, but close inspection suggests that there are some problems with the published solution. We'd like to see if Foldit players can use the same electron density data to reconstruct a better model. Note: the map here is really blobby!

Sequence
MGASYSSYLAKADQKRGKKQTARETKKKVLAERRKPLNIDHLNEDKLRDKAKELWDWLYQLQTEKYDFAEQIKRKKYEIVTLRNRIDQAQKHSKKAGAKGKVGGRWK ASMTDQQAEARAFLSEEMIAEFKAAFDMFDADGGGDISTKELGTVMRMLGQNPTKEELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELANCFRIFDKNADGFIDIEELGEILRATGEHVTEEDIEDLMKDSDKNNDGRIDFDEFLKMMEGVQ SDEEKKRRAATARRQHLKSAMLQLAVTEIEKEAAAKEVEKQNYLAEHSPPLSLPGSMQELQELSKKLHAKIDSVDEERYDTEVKLQKTNKELEDLSQKLFDLRGKFKRPPLRRVRMSADAMLRALLGSKHKVNMDLR

Top groups


  1. Go Science 100 pts. 39,659
  2. Contenders 63 pts. 39,613
  3. L'Alliance Francophone 37 pts. 38,879
  4. Anthropic Dreams 21 pts. 38,815
  5. Void Crushers 11 pts. 37,886
  6. FamilyBarmettler 5 pts. 37,478
  7. VeFold 2 pts. 37,454
  8. Australia 1 pt. 37,321
  9. Marvin's bunch 1 pt. 36,883
  10. Gargleblasters 1 pt. 36,474

  21. WBarme1234 Lv 1 16 pts. 37,478
  22. hookedwarm Lv 1 14 pts. 37,454
  23. AlkiP0Ps Lv 1 13 pts. 37,321
  24. montezumasrevenge Lv 1 11 pts. 37,192
  25. jausmh Lv 1 10 pts. 36,883
  26. Trajan464 Lv 1 9 pts. 36,847
  27. Bletchley Park Lv 1 8 pts. 36,789
  28. dizzywings Lv 1 7 pts. 36,474
  29. hada Lv 1 6 pts. 36,325
  30. manu8170 Lv 1 5 pts. 36,205

Comments


LociOiling Lv 1

I guess "decode this ED puzzle" should be a new weekly feature. Once again, the sequence stated above is not quite what you'll see in Foldit, due to the "missing residue" problem. Decoding which PDB entry matches the Foldit puzzle requires looking at not only the primary sequence (amino acids), but also the "missing residues" reported in the PDB. It's a little bit of a black art, hence this very long post.

UPDATE

I was asked to simplify this very long post. The result is "Quick protein identification" on the wiki. Somewhat shorter, and it has pictures.

Along the way, I discovered that the sequence shown on this page actually includes all three chains. It's hard to see, but a space separates the chains.
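Since a single space separates the chains, splitting on whitespace is enough to pull them apart. A minimal Python sketch, assuming the sequence is pasted exactly as it appears on the puzzle page (`PAGE_SEQUENCE` and `split_chains` are illustrative names, not part of any Foldit or PDB tool):

```python
# Sequence as shown on the puzzle page, with one space between chains.
# It is split across string literals here only to keep lines readable.
PAGE_SEQUENCE = (
    "MGASYSSYLAKADQKRGKKQTARETKKKVLAERRKPLNIDHLNEDKLRDKAKELWDWLYQLQTEKYDFAEQIKRKKYEIVTLRNRIDQAQKHSKKAGAKGKVGGRWK"
    " "
    "ASMTDQQAEARAFLSEEMIAEFKAAFDMFDADGGGDISTKELGTVMRMLGQNPTKEELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELANCFRIFDKNADGFIDIEELGEILRATGEHVTEEDIEDLMKDSDKNNDGRIDFDEFLKMMEGVQ"
    " "
    "SDEEKKRRAATARRQHLKSAMLQLAVTEIEKEAAAKEVEKQNYLAEHSPPLSLPGSMQELQELSKKLHAKIDSVDEERYDTEVKLQKTNKELEDLSQKLFDLRGKFKRPPLRRVRMSADAMLRALLGSKHKVNMDLR"
)


def split_chains(page_sequence: str) -> list[str]:
    """Return one string per chain from a space-separated sequence."""
    return page_sequence.split()


chains = split_chains(PAGE_SEQUENCE)
for number, chain in enumerate(chains, start=1):
    # Chain number, chain length, and the first few residues of each chain.
    print(number, len(chain), chain[:8])
```

On this page the chain starting with "ASM" (PDB chain C) comes before the one starting with "SD" (PDB chain B), so the page order doesn't necessarily follow the PDB chain letters.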

I also found that the PDB has a Structure tab, which offers a 3D viewer. The 3D viewer for 1YV0 shows a dotted line where a section of missing residues splits up chain C. Selecting the residues on either side of the gap shows their residue numbers, which was good enough for a match to puzzle 2660. (Almost perfect.)

The dotted line appears only when the missing residues are in the middle of a chain. The 3D viewer also shows the sequence information and grays out the missing residues, which covers gaps at the ends of a chain as well as in the middle. For example, the viewer shows chain A by default, with "MGASYSSY" and "KKAGAKGKVGGRWK" grayed out. The dropdown next to where it says "Troponin T" lets you select the other chains, which likewise have the missing bits grayed out.

Using the 3D viewer is a nice shortcut for 2660, since it avoids digging into the PDB file. I'm not sure if it would work in all cases, but it's worth a try.

The original long post follows.

Intro

In Foldit, the recipe AA Edit 3.0 sees this puzzle as having four chains.

The first chain has this sequence:

lakadqkrgkkqtaretkkkvlaerrkplnidhlnedklrdkakelwdwlyqlqtekydfaeqikrkkyeivtlrnridqaqkhs

Searching for that string at rcsb.org turns up a number of matches. The first match, 1YTZ, does not quite line up with the Foldit puzzle. The second match, 1YV0, is better. Both 1YTZ and 1YV0 have very similar sequence info, but 1YV0 is missing the same residues as the Foldit puzzle.

Chain by chain

Looking at the sequence for 1YV0, there are three chains. In the PDB, chain A starts with "MGASYSSY", then the rest of the Foldit first chain follows. Chain A ends with "KKAGAKGKVGGRWK", which is not part of the Foldit puzzle.
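The trimming can be checked mechanically: uppercase the Foldit chain, find it inside the PDB chain, and whatever is left over on either side is what Foldit leaves out. A minimal Python sketch, assuming the Foldit chain is an exact, contiguous stretch of the PDB chain (`locate` is a hypothetical helper):

```python
from typing import Optional

# Foldit chain A as reported by AA Edit (lowercase one-letter codes).
FOLDIT_CHAIN_A = "lakadqkrgkkqtaretkkkvlaerrkplnidhlnedklrdkakelwdwlyqlqtekydfaeqikrkkyeivtlrnridqaqkhs"
# PDB chain A (troponin T) from the 1YV0 sequence, uppercase one-letter codes.
PDB_CHAIN_A = "MGASYSSYLAKADQKRGKKQTARETKKKVLAERRKPLNIDHLNEDKLRDKAKELWDWLYQLQTEKYDFAEQIKRKKYEIVTLRNRIDQAQKHSKKAGAKGKVGGRWK"


def locate(foldit_chain: str, pdb_chain: str) -> Optional[tuple[str, str]]:
    """Return the PDB residues before and after the Foldit chain.

    Foldit shows one-letter codes in lowercase and the PDB uses uppercase,
    so the comparison ignores case. Returns None when there is no exact match.
    """
    start = pdb_chain.upper().find(foldit_chain.upper())
    if start < 0:
        return None
    end = start + len(foldit_chain)
    return pdb_chain[:start], pdb_chain[end:]


prefix, suffix = locate(FOLDIT_CHAIN_A, PDB_CHAIN_A)
print("missing at start:", prefix)  # missing at start: MGASYSSY
print("missing at end:", suffix)    # missing at end: KKAGAKGKVGGRWK
```

A None result means the Foldit chain is not an exact, contiguous stretch of that PDB chain, which is a hint to try the next search result.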

Chain B in the PDB is similar. It starts with "SD", then the Foldit chain B follows. Foldit's chain B is missing the final sequence "DAMLRALLGSKHKVNMDLR" seen in the PDB.

PDB chain C ends up becoming chains C and D in Foldit. PDB chain C starts with "ASM", followed by Foldit's chain C.

PDB chain C continues with "MKEDAKGKSEE", then Foldit's chain D follows. The end of Foldit chain D is also the end of the PDB sequence.
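The chain C/D split follows from where the unobserved residues fall: trimming them from the ends only shortens a chain, but a gap in the middle leaves two separate runs. A minimal Python sketch, assuming (as observed in this puzzle, not as a documented Foldit rule) that each run of observed residues becomes its own Foldit chain. `observed_runs` is a hypothetical helper; the missing positions would normally come from the PDB's missing-residue records, and here they are found by searching for the known sequences, just for the demo:

```python
def observed_runs(sequence: str, missing: set[int]) -> list[str]:
    """Split a chain into runs of consecutive observed residues.

    `missing` holds 0-based positions of residues with no coordinates.
    """
    runs: list[str] = []
    current: list[str] = []
    for position, residue in enumerate(sequence):
        if position in missing:
            # A missing residue closes the current run, if there is one.
            if current:
                runs.append("".join(current))
                current = []
        else:
            current.append(residue)
    if current:
        runs.append("".join(current))
    return runs


# PDB chain C (troponin C) from the 1YV0 sequence.
PDB_CHAIN_C = "ASMTDQQAEARAFLSEEMIAEFKAAFDMFDADGGGDISTKELGTVMRMLGQNPTKEELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELANCFRIFDKNADGFIDIEELGEILRATGEHVTEEDIEDLMKDSDKNNDGRIDFDEFLKMMEGVQ"

# "ASM" at the start, plus the 11-residue mid-chain gap "MKEDAKGKSEE".
gap = "MKEDAKGKSEE"
gap_start = PDB_CHAIN_C.find(gap)
missing = {0, 1, 2} | set(range(gap_start, gap_start + len(gap)))

foldit_c, foldit_d = observed_runs(PDB_CHAIN_C, missing)
print(foldit_c[:6], "...", foldit_c[-6:])
print(foldit_d[:6], "...", foldit_d[-6:], len(foldit_d))
```

The second run has 66 residues, which lines up with the "PDB #: C96" through "PDB #: C161" range that Foldit reports for chain D.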

Just to add some complexity, the chains are also identified as T, I, and C, after the troponin subunits they represent (troponin T, I, and C). The sequence file, for example, shows "Chain A[auth T]|Troponin T", which corresponds to chain A in Foldit. For even more complexity, this chain is listed last in the PDB.

The missing residues section of PDB 1YV0 uses T, I, and C, so you'll see:

REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465 M RES C SSSEQI
REMARK 465 MET T 156
REMARK 465 GLY T 157
REMARK 465 ALA T 158
REMARK 465 SER T 159
REMARK 465 TYR T 160
REMARK 465 SER T 161
REMARK 465 SER T 162
REMARK 465 TYR T 163
REMARK 465 LYS T 249
REMARK 465 LYS T 250
REMARK 465 ALA T 251
REMARK 465 GLY T 252
…and so on…
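Pulling these records out of the header is easy to automate. A minimal Python sketch, assuming the usual legacy layout in which each residue line holds an optional model number, the residue name, the chain ID, and the residue number with any insertion code attached (`parse_remark_465` is a hypothetical helper, and a blank chain ID would break its whitespace splitting):

```python
import re

# Residue number (possibly zero or negative) plus an optional insertion code.
RESIDUE_ID = re.compile(r"(-?\d+)([A-Z]?)")


def parse_remark_465(lines):
    """Return (residue name, chain, number, insertion code) for each record.

    The heading and explanatory REMARK 465 lines are skipped because they
    don't end in a residue number.
    """
    records = []
    for line in lines:
        fields = line.split()
        if fields[:2] != ["REMARK", "465"]:
            continue
        rest = fields[2:]
        if len(rest) not in (3, 4):  # [model] RES C SSSEQI
            continue
        match = RESIDUE_ID.fullmatch(rest[-1])
        if match is None:
            continue
        records.append((rest[-3], rest[-2], int(match.group(1)), match.group(2)))
    return records


HEADER = """\
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465 M RES C SSSEQI
REMARK 465 MET T 156
REMARK 465 GLY T 157
REMARK 465 MET C 85A
"""

for record in parse_remark_465(HEADER.splitlines()):
    print(record)
```

The records shown in this post have their column padding collapsed; splitting on whitespace copes with either form.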

The first part is "MGASYSSY", which is missing at the start of Foldit chain A.

Sharp-eyed observers will note that the residue sequence numbers for chain A/T do not start at 1 (here they start at 156). There is no PDB requirement to start at 1, and zero or even negative numbers appear in some PDB entries.

Chain A/T ends with another range of missing residues, starting with "KKAG", which is missing from the end of Foldit chain A.

Chain B/I likewise starts with missing residues "SD", then ends with a section "DAMLR" and so on. Both these sections are missing in Foldit. For convenience, chain B/I numbers its residues starting at 1.

Chain C is chain C in the PDB, since it represents troponin C. Just to keep you on your toes, residue numbers start at zero. The initial section "ASM" is listed as missing in the PDB, and it's absent in Foldit. The sequence of Foldit chain C follows, then there's another missing section, "MKEDAKGKSEE". The PDB joyfully lists these as 85A, 85B, 85C, and so on. It's true, residue sequence numbers don't have to be numbers in the PDB. (And it's 85A-85K, so it's not hexadecimal.) The sequence after this second, mid-chain section of missing residues becomes Foldit chain D.

Just for the record, here are the chain C missing residue records with the unusual numbering:

REMARK 465 MET C 85A
REMARK 465 LYS C 85B
REMARK 465 GLU C 85C
REMARK 465 ASP C 85D
REMARK 465 ALA C 85E
REMARK 465 LYS C 85F
REMARK 465 GLY C 85G
REMARK 465 LYS C 85H
REMARK 465 SER C 85I
REMARK 465 GLU C 85J
REMARK 465 GLU C 85K
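Because of insertion codes, residue IDs like these can't be sorted as plain integers, and sorting them as strings puts "9" after "85K". A minimal Python sketch of a sort key, assuming insertion codes order alphabetically after their base number, as 85A through 85K do here (`residue_sort_key` is an illustrative name; in the file itself, record order is what counts):

```python
import re


def residue_sort_key(residue_id: str) -> tuple[int, str]:
    """Split a residue ID such as '85A' into (85, 'A') for sorting.

    A plain number gets an empty insertion code, so '85' sorts before '85A'.
    """
    match = re.fullmatch(r"(-?\d+)([A-Z]?)", residue_id)
    if match is None:
        raise ValueError(f"not a residue ID: {residue_id!r}")
    return int(match.group(1)), match.group(2)


ids = ["96", "85K", "85A", "9", "85", "0", "85B", "-1"]
print(sorted(ids))                        # ['-1', '0', '85', '85A', '85B', '85K', '9', '96']
print(sorted(ids, key=residue_sort_key))  # ['-1', '0', '9', '85', '85A', '85B', '85K', '96']
```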

The Foldit segment information window shows "PDB #: C96" through "PDB #: C161" for what we're calling Foldit chain D. So there's no trace of the creative segment numbers in Foldit. Looking at the full PDB file for 1YV0, the residue numbers for chain C in the ATOM records appear to be all numeric, which makes sense: the 85A-85K residues have no coordinates, so they have no ATOM records.

Summary

To recap, AA Edit 3.0 provides a good way to determine what chains exist in a Foldit puzzle. (Due to limitations in Foldit, it's not 100% guaranteed, however.)

A sequence reported by AA Edit can be used at rcsb.org. Just copy the info for one chain from the AA Edit display and paste it into the search box at rcsb.org.

A PDB search often finds multiple results. Look for a result with "Sequence Identity: 100%" as a starting point for matching to Foldit.

An exact match to Foldit may depend on "missing residues" reported in the header section of the PDB file, or what they're now calling "Legacy PDB Format (Header)". The legacy PDB format includes FORTRAN-style entries like the ones shown here.

The PDB uses uppercase for amino acid codes, with one-letter codes in some places and three-letter abbreviations in others. So methionine is "M" in the PDB sequence file, and "MET" in many of the records in the PDB file itself. In Foldit, methionine is reported as "m", which some recipes expand to "MET" or "met" or "methionine".
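Converting between the two notations is a simple lookup. A minimal Python sketch covering only the 20 standard amino acids (`to_one_letter` is an illustrative name, and mapping anything else, such as a modified residue, to "X" is an assumption of this sketch):

```python
# The 20 standard amino acids: PDB three-letter code -> one-letter code.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(codes, lowercase=False):
    """Convert three-letter residue names to a one-letter sequence.

    Unknown codes become 'X'. lowercase=True mimics Foldit's lowercase style.
    """
    sequence = "".join(THREE_TO_ONE.get(code.upper(), "X") for code in codes)
    return sequence.lower() if lowercase else sequence


print(to_one_letter("MET GLY ALA SER TYR SER SER TYR".split()))                  # MGASYSSY
print(to_one_letter("MET GLY ALA SER TYR SER SER TYR".split(), lowercase=True))  # mgasyssy
```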

Awkwardly, the "FASTA" sequence information is a separate file, and uses single-letter uppercase codes. The sequence info is also reported in the main PDB file (the SEQRES records), as a series of three-letter codes separated by spaces.
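Reading a FASTA file back into per-chain sequences takes only a few lines. A minimal Python sketch (`parse_fasta` is an illustrative name; the header in the example is modeled on the "Chain A[auth T]|Troponin T" text quoted earlier, not copied from the real 1YV0 file, and the sequence is shortened):

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence.

    Sequence lines that follow a header are joined, so wrapped sequences work.
    """
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


EXAMPLE = """\
>example|Chain A[auth T]|Troponin T
MGASYSSY
LAKADQKRGKKQ
"""

for header, sequence in parse_fasta(EXAMPLE).items():
    print(header, sequence)
```

The header is where both the PDB chain letter and the author's chain ID (the "auth" part) show up, which helps when matching a sequence to the T/I/C records in the PDB file.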

The page for a given PDB entry at rcsb.org includes two dropdowns: "Display Files" and "Download Files". "Display Files" gives you "FASTA sequence", "Legacy PDB Format", and "Legacy PDB Format (Header)". "Download Files" is similar, but there's no "Header" option for the PDB file, so you'll get all the ATOM records and other low-level details.

The "Download Files" section also includes compressed downloads (gz), and options for downloading validation information that's not found in "Display Files".

Both the display and download options also reference the newer mmCIF or PDBx format, which is more machine-friendly, but also includes many records that are similar to the legacy format. The newer formats don't have a straightforward representation of the "missing residue" records from the legacy PDB format, making them less suitable for the purpose of matching a Foldit puzzle. (Presumably, the information is in there somewhere….)

LociOiling Lv 1

After a little more digging, I found the missing residues are under "_pdbx_unobs_or_zero_occ_residues" in the mmCIF or PDBx versions of a PDB file.

Here's what the missing residue info looks like in the mmCIF version of 1YV0:

loop_
_pdbx_unobs_or_zero_occ_residues.id
_pdbx_unobs_or_zero_occ_residues.PDB_model_num
_pdbx_unobs_or_zero_occ_residues.polymer_flag
_pdbx_unobs_or_zero_occ_residues.occupancy_flag
_pdbx_unobs_or_zero_occ_residues.auth_asym_id
_pdbx_unobs_or_zero_occ_residues.auth_comp_id
_pdbx_unobs_or_zero_occ_residues.auth_seq_id
_pdbx_unobs_or_zero_occ_residues.PDB_ins_code
_pdbx_unobs_or_zero_occ_residues.label_asym_id
_pdbx_unobs_or_zero_occ_residues.label_comp_id
_pdbx_unobs_or_zero_occ_residues.label_seq_id
1 1 Y 1 T MET 156 ? A MET 1
2 1 Y 1 T GLY 157 ? A GLY 2
3 1 Y 1 T ALA 158 ? A ALA 3
4 1 Y 1 T SER 159 ? A SER 4
5 1 Y 1 T TYR 160 ? A TYR 5
6 1 Y 1 T SER 161 ? A SER 6
7 1 Y 1 T SER 162 ? A SER 7
8 1 Y 1 T TYR 163 ? A TYR 8
…and so on…
#
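For a simple loop like this one, a few lines of Python can turn the rows into dictionaries keyed by column name. A minimal sketch, assuming every value is a single whitespace-free token and each row sits on one line; that holds for the rows shown here but not for mmCIF in general, where quoted and multi-line values are allowed (a full parser such as Biopython's `MMCIF2Dict` handles those). `parse_loop` is an illustrative name:

```python
def parse_loop(text: str, category: str) -> list[dict[str, str]]:
    """Return the rows of each loop_ whose tags all belong to `category`."""
    prefix = category + "."
    lines = [line.strip() for line in text.splitlines()]
    rows: list[dict[str, str]] = []
    i = 0
    while i < len(lines):
        if lines[i] != "loop_":
            i += 1
            continue
        i += 1
        # Tag names come first, one per line.
        tags = []
        while i < len(lines) and lines[i].startswith("_"):
            tags.append(lines[i])
            i += 1
        wanted = bool(tags) and all(tag.startswith(prefix) for tag in tags)
        names = [tag[len(prefix):] for tag in tags]
        # Data rows run until '#', the next loop_, or the next tag.
        while i < len(lines) and lines[i] not in ("#", "loop_") and not lines[i].startswith("_"):
            values = lines[i].split()
            if wanted and values:
                if len(values) != len(names):
                    raise ValueError(f"expected {len(names)} values: {lines[i]!r}")
                rows.append(dict(zip(names, values)))
            i += 1
    return rows


CIF_TEXT = """\
loop_
_pdbx_unobs_or_zero_occ_residues.id
_pdbx_unobs_or_zero_occ_residues.PDB_model_num
_pdbx_unobs_or_zero_occ_residues.polymer_flag
_pdbx_unobs_or_zero_occ_residues.occupancy_flag
_pdbx_unobs_or_zero_occ_residues.auth_asym_id
_pdbx_unobs_or_zero_occ_residues.auth_comp_id
_pdbx_unobs_or_zero_occ_residues.auth_seq_id
_pdbx_unobs_or_zero_occ_residues.PDB_ins_code
_pdbx_unobs_or_zero_occ_residues.label_asym_id
_pdbx_unobs_or_zero_occ_residues.label_comp_id
_pdbx_unobs_or_zero_occ_residues.label_seq_id
1 1 Y 1 T MET 156 ? A MET 1
2 1 Y 1 T GLY 157 ? A GLY 2
3 1 Y 1 T ALA 158 ? A ALA 3
4 1 Y 1 T SER 159 ? A SER 4
5 1 Y 1 T TYR 160 ? A TYR 5
6 1 Y 1 T SER 161 ? A SER 6
7 1 Y 1 T SER 162 ? A SER 7
8 1 Y 1 T TYR 163 ? A TYR 8
#
"""

rows = parse_loop(CIF_TEXT, "_pdbx_unobs_or_zero_occ_residues")
for row in rows:
    # Author chain, residue name, author numbering, and label numbering.
    print(row["auth_asym_id"], row["auth_comp_id"], row["auth_seq_id"], row["label_seq_id"])
```

Comparing auth_seq_id with label_seq_id is one way to deal with numbering that doesn't start at 1: the first residue of chain T is 156 in the author numbering and 1 in the label numbering, which counts positions along the deposited sequence. The "?" in PDB_ins_code means no insertion code is given.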

I think XML would have been a wiser choice for the new format.

Aarav_Awasthi Lv 1

Thanks Loci for that post. While it was fairly hard to understand, I think I have a good grasp on how to match the puzzle to a PDB entry.

In an unrelated remark, I am about to get my first top 3 finish! Thanks to all of the veterans that answered any of my dumb questions, particularly Bravo, Orily, and Loci. This is a great community of individuals, and I am excited to see where Foldit takes me!