
2350: Electron Density Reconstruction 56

Closed since over 2 years ago

Novice Overall Prediction Electron Density

Summary


Created
September 01, 2023
Expires
Max points
100
Description

The structure of this protein has already been solved and published, but close inspection suggests that there are some problems with the published solution. We'd like to see if Foldit players can use the same electron density data to reconstruct a better model. There are four copies of the same protein chain in this structure.

Sequence
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP

Top groups


  1. Anthropic Dreams 100 pts. 29,061
  2. Go Science 68 pts. 28,994
  3. L'Alliance Francophone 44 pts. 28,781
  4. Contenders 27 pts. 28,758
  5. Marvin's bunch 16 pts. 28,011
  6. FamilyBarmettler 9 pts. 28,001
  7. Gargleblasters 5 pts. 27,735
  8. Void Crushers 3 pts. 25,134
  9. BOINC@Poland 1 pt. 25,089
  10. VeFold 1 pt. 23,644

  21. Bletchley Park Lv 1 17 pts. 27,572
  22. Muzuqq Lv 1 16 pts. 27,253
  23. Artoria2e5 Lv 1 14 pts. 27,088
  24. ichwilldiesennamen Lv 1 13 pts. 27,012
  25. Steven Pletsch Lv 1 11 pts. 26,931
  26. rosie4loop Lv 1 10 pts. 26,599
  27. hansvandenhof Lv 1 9 pts. 26,381
  28. manu8170 Lv 1 8 pts. 26,350
  29. alcor29 Lv 1 7 pts. 26,143
  30. Idiotboy Lv 1 6 pts. 25,882

Comments


LociOiling Lv 1

Once again, it appears we have gaps created by missing residues. The missing residues are ones that couldn't be found in the experimental results.

Instead of breaking the protein at these spots, the Foldit puzzle simply connects the segments on either side of the gap. This results in unusually straight spots, and segments with poor ideality scores.

The recipe Tvdl Show Worst 1.1.4 can highlight the segments with the worst ideality. Here are the worst offenders for puzzle 2350:

Segment 193, score = -36833.395
Segment 194, score = -36833.314
Segment 125, score = -33097.589
Segment 126, score = -33093.037
Segment 259, score = -23030.069
Segment 260, score = -23022.072

I'll try to find the protein in the PDB next.

LociOiling Lv 1

The protein appears to be a match for PDB 3ZXB.

The PDB file reveals a number of missing residues:

REMARK 465 MISSING RESIDUES                                                     
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE                       
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN               
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)                
REMARK 465                                                                      
REMARK 465   M RES C SSSEQI                                                     
REMARK 465     LEU A    59                                                      
REMARK 465     VAL A    60                                                      
REMARK 465     GLY A    61                                                      
REMARK 465     GLU A    62                                                      
REMARK 465     ARG A    63                                                      
REMARK 465     ASP A    64                                                      
REMARK 465     SER A    65                                                      
REMARK 465     GLN A    66                                                      
REMARK 465     ASP A    67                                                      
REMARK 465     PRO A    68                                                      
REMARK 465     GLU A    69                                                      
REMARK 465     ILE A    70                                                      
REMARK 465     VAL B    60                                                      
REMARK 465     GLY B    61                                                      
REMARK 465     GLU B    62                                                      
REMARK 465     ARG B    63                                                      
REMARK 465     ASP B    64                                                      
REMARK 465     SER B    65                                                      
REMARK 465     GLN B    66                                                      
REMARK 465     ASP B    67                                                      
REMARK 465     PRO B    68                                                      
REMARK 465     GLU B    69                                                      
REMARK 465     ILE B    70                                                      
REMARK 465     GLY C    61                                                      
REMARK 465     GLU C    62                                                      
REMARK 465     ARG C    63                                                      
REMARK 465     ASP C    64                                                      
REMARK 465     SER C    65                                                      
REMARK 465     GLN C    66                                                      
REMARK 465     ASP C    67                                                      
REMARK 465     PRO C    68                                                      
REMARK 465     GLU C    69                                                      
REMARK 465     ILE C    70                                                      
REMARK 465     LEU D    59                                                      
REMARK 465     VAL D    60                                                      
REMARK 465     GLY D    61                                                      
REMARK 465     GLU D    62                                                      
REMARK 465     ARG D    63                                                      
REMARK 465     ASP D    64                                                      
REMARK 465     SER D    65                                                      
REMARK 465     GLN D    66                                                      
REMARK 465     ASP D    67                                                      
REMARK 465     PRO D    68                                                      
REMARK 465     GLU D    69                                                      
REMARK 465     ILE D    70    

I'll look at where these gaps fit into the Foldit puzzle next.
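
The REMARK 465 records above can be collapsed into one residue range per chain. Here is a minimal Python sketch of that, assuming the records are available as plain text lines in the layout shown. The helper name `missing_ranges` is made up for illustration, and the sample only includes the records for chains B and C to keep it short.

```python
from collections import defaultdict

# REMARK 465 records for chains B and C only, copied from the listing above.
SAMPLE = """\
REMARK 465   M RES C SSSEQI
REMARK 465     VAL B    60
REMARK 465     GLY B    61
REMARK 465     GLU B    62
REMARK 465     ARG B    63
REMARK 465     ASP B    64
REMARK 465     SER B    65
REMARK 465     GLN B    66
REMARK 465     ASP B    67
REMARK 465     PRO B    68
REMARK 465     GLU B    69
REMARK 465     ILE B    70
REMARK 465     GLY C    61
REMARK 465     GLU C    62
REMARK 465     ARG C    63
REMARK 465     ASP C    64
REMARK 465     SER C    65
REMARK 465     GLN C    66
REMARK 465     ASP C    67
REMARK 465     PRO C    68
REMARK 465     GLU C    69
REMARK 465     ILE C    70
"""

def missing_ranges(lines):
    """Return {chain: (first, last)} for the missing residues.

    Hypothetical helper. Assumes data lines look like
    'REMARK 465     LEU A    59' (no model number, no insertion code)
    and that each chain has one contiguous gap.
    """
    numbers = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Data lines split into: REMARK, 465, residue name, chain, number.
        is_data = (
            len(fields) == 5
            and fields[:2] == ["REMARK", "465"]
            and fields[4].lstrip("-").isdigit()
        )
        if is_data:
            numbers[fields[3]].append(int(fields[4]))
    return {chain: (min(n), max(n)) for chain, n in numbers.items()}

ranges = missing_ranges(SAMPLE.splitlines())
print(ranges)
```

Run over the full listing, the same function should give one range for each of the four chains.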

LociOiling Lv 1

For PDB 3ZXB, there are supposed to be four identical chains. Here's the sequence from the PDB entry:

>3ZXB_1|Chains A, B, C, D|SINGLE INSULIN-LIKE GROWTH FACTOR-BINDING DOMAIN PROTEIN-1|CUPIENNIUS SALEI (6928)
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP

Each of the four chains ended up with missing residues in almost exactly the same spot.

The way the Foldit puzzle skips over the gaps means that recipes like print protein think there are eight chains. (Long story, but these recipes now rely on atom distances to tell where chains start and stop.)

Here's how the Foldit version lines up with the original sequence. The "-----" parts indicate where the missing residues were. The rulers show the Foldit segment numbers.

         1         2         3         4         5       5            56     6
1234567890123456789012345678901234567890123456789012345678            90123456
ftcpecrpelcgdpgyceygttkdacdccpvcfqgpggycggpedvfgicadgfacvp------------vgtcvkip  chain A
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP

0                                1         1         1    1           1   1  1
6  7         8         9         0         1         2    2           2   3  3         
78901234567890123456789012345678901234567890123456789012345           67890123
ftcpecrpelcgdpgyceygttkdacdccpvcfqgpggycggpedvfgicadgfacvpl-----------vgtcvkip  chain B
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP

1     1         1         1         1         1         1  1          1     22
3     4         5         6         7         8         9  9          9     00 
456789012345678901234567890123456789012345678901234567890123          45678901
ftcpecrpelcgdpgyceygttkdacdccpvcfqgpggycggpedvfgicadgfacvplv----------vgtcvkip  chain C
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP

                                                                      2      2
         1         2         3         4         5       5            6      6       
1234567890123456789012345678901234567890123456789012345678            01234567           
ftcpecrpelcgdpgyceygttkdacdccpvcfqgpggycggpedvfgicadgfacvp------------vgtcvkip  chain D
FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVPLVGERDSQDPEIVGTCVKIP
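
The segment numbers in the rulers follow from the missing-residue ranges. Here is a short Python sketch of that bookkeeping, assuming Foldit numbers the modelled residues consecutively, chain after chain, and skips the missing ones. The names `MISSING`, `segment_map` and `gap_junctions` are made up for this example and do not come from any Foldit recipe.

```python
SEQUENCE = (
    "FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVP"
    "LVGERDSQDPEIVGTCVKIP"
)

# First and last missing residue per chain, from REMARK 465 of PDB 3ZXB.
MISSING = {"A": (59, 70), "B": (60, 70), "C": (61, 70), "D": (59, 70)}

def segment_map(sequence, missing):
    """Map Foldit segment number -> (chain, residue number, amino acid).

    Assumption: segments are numbered consecutively over the modelled
    residues only, one chain after another.
    """
    segments = {}
    seg = 0
    for chain in sorted(missing):
        first, last = missing[chain]
        for resnum, aa in enumerate(sequence, start=1):
            if first <= resnum <= last:
                continue  # not located in the experiment, so no segment
            seg += 1
            segments[seg] = (chain, resnum, aa)
    return segments

def gap_junctions(segments):
    """Segment pairs that are neighbours in Foldit but not in the real chain."""
    pairs = []
    for seg in range(1, len(segments)):
        chain_a, res_a, _ = segments[seg]
        chain_b, res_b, _ = segments[seg + 1]
        if chain_a == chain_b and res_b != res_a + 1:
            pairs.append((seg, seg + 1))
    return pairs

segments = segment_map(SEQUENCE, MISSING)
print(len(segments))            # 267 segments in total
print(gap_junctions(segments))  # [(58, 59), (125, 126), (193, 194), (259, 260)]
```

Under that assumption, the last three junctions are the same segment pairs that Tvdl Show Worst reported with the worst ideality.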

LociOiling Lv 1

The recipes print protein and Bridge Wiggle both show that there are lots of disulfide bridges at the start of the puzzle. The results from print protein:

24 disulfide bridges found, segment pairs = 
3,26 6,28 11,29 17,32 40,56 50,62 69,92 72,94 77,95 83,98 106,122 116,129 136,159 139,161 
144,162 150,165 173,189 183,197 204,227 207,229 212,230 218,233 241,257 251,263
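
As a sanity check, each of these segment pairs should land on two cysteines in the same chain. Here is a minimal Python sketch, assuming the segments are numbered consecutively over the modelled residues of chains A to D, with the missing residues of PDB 3ZXB skipped. That numbering is an assumption about how the puzzle was built, and the variable names are illustrative only.

```python
SEQUENCE = (
    "FTCPECRPELCGDPGYCEYGTTKDACDCCPVCFQGPGGYCGGPEDVFGICADGFACVP"
    "LVGERDSQDPEIVGTCVKIP"
)

# First and last missing residue per chain, from REMARK 465 of PDB 3ZXB.
MISSING = {"A": (59, 70), "B": (60, 70), "C": (61, 70), "D": (59, 70)}

# Segment pairs as reported by print protein.
BRIDGES = """
3,26 6,28 11,29 17,32 40,56 50,62 69,92 72,94 77,95 83,98 106,122 116,129
136,159 139,161 144,162 150,165 173,189 183,197 204,227 207,229 212,230
218,233 241,257 251,263
"""

# Foldit-style residue list: modelled residues only, chain after chain.
foldit = []
for chain in "ABCD":
    first, last = MISSING[chain]
    for resnum, aa in enumerate(SEQUENCE, start=1):
        if not first <= resnum <= last:
            foldit.append((chain, resnum, aa))

pairs = [tuple(int(n) for n in item.split(",")) for item in BRIDGES.split()]

# Translate each segment pair back to chain and original residue number.
bridges_by_chain = {}
all_cys = True
same_chain = True
for a, b in pairs:
    chain_a, res_a, aa_a = foldit[a - 1]
    chain_b, res_b, aa_b = foldit[b - 1]
    all_cys = all_cys and aa_a == "C" and aa_b == "C"
    same_chain = same_chain and chain_a == chain_b
    bridges_by_chain.setdefault(chain_a, []).append((res_a, res_b))

print(len(pairs), all_cys, same_chain)
for chain, bridges in bridges_by_chain.items():
    print(chain, bridges)
```

With the pairs listed above, every segment is a cysteine, and each chain gets the same six bridges in PDB numbering: 3-26, 6-28, 11-29, 17-32, 40-56 and 50-74.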

LociOiling Lv 1

Here are the missing residues in table format, identified by the chain id and residue number within the chain:

AA   A res  B res  C res  D res
LEU  A 59                 D 59
VAL  A 60   B 60          D 60
GLY  A 61   B 61   C 61   D 61
GLU  A 62   B 62   C 62   D 62
ARG  A 63   B 63   C 63   D 63
ASP  A 64   B 64   C 64   D 64
SER  A 65   B 65   C 65   D 65
GLN  A 66   B 66   C 66   D 66
ASP  A 67   B 67   C 67   D 67
PRO  A 68   B 68   C 68   D 68
GLU  A 69   B 69   C 69   D 69
ILE  A 70   B 70   C 70   D 70

I'm not sure if there's a practical use for any of this information, in terms of solving the Foldit puzzle. It's just kind of interesting to see how the gaps occurred at almost the same spot in each chain. It's not clear to me why that would happen.

The Foldit approach of just ignoring the gaps and connecting the segments on either side doesn't seem like a reasonable approach to the problem.

Artoria2e5 Lv 1

Yeah, it looks like https://fold.it/forum/bugs/puzzle-2323-issues-with-missing-densities-in-alternate-locations-chain-a-and-false-residue-connectivity-chain-c all over again. I don't see what purpose can be served by an unphysical model like this.

"It's not clear to me why that would happen."

Probably just too floppy to give some good densities. The authors got an accidental deletion mutant as PDB 3ZXC, making the floppy part shorter and a bit easier to show.


My highscore has lost some of its disulfides. Not great…

rosie4loop Lv 1

It's better to leave a gap there instead of connecting the segments, as mentioned before. Alternatively, model the missing residues there and include the surrounding density, since a residue is sometimes missed by accident while solving a crystal structure (this can be identified from "big blobs of green densities" in the Fo-Fc map).

"It's not clear to me why that would happen."

Crystal structures are an "averaged structure" of all units in the crystal array. Ideally all units have very similar structures, which results in beautiful clean density for the whole chain. But most of the time flexible regions in each unit cell adopt different conformations, making it difficult to observe density in such regions.

I don't have time to play with the maps in this puzzle, but the same residues are probably missing from each chain in the same unit cell because the region is too flexible to have a "good average position" that can be observed in the map. In some cases a region can also simply be unmodelled (that needs a double check against the original map).

A quick way to check whether a region is accidentally unmodelled or really disordered is to look for big green blobs in the Fo-Fc map in suspicious regions, model the residue there, then recalculate the map and see (1) whether it improves the map CC value/R value, and (2) whether red density appears overlapping the newly modelled residue, which means it's probably wrong to put it there.

In practice it can be more complicated, though, e.g. because of alternative conformations, partial occupancy, or other factors.

rosie4loop Lv 1

Personally I think that the current approach of how Foldit treats such missing regions shouldn't be used for X-ray structure refinement.

Instead, this approach would be useful for modelling splice variants, where the gene product is shortened (see alternative splicing), resulting in a protein with a deleted sequence.

AlphaFold isn't good at this task since it cannot predict something new. Most of the time it would just predict a gap in the structure, the same as in the wild-type crystal, or connect the gap in a less ideal way. I'd prefer conventional template-based modelling with physics-based restraints for that. Validating the predicted structure would take a lot of effort, and a long simulation might also be required to get a reasonable model.

LociOiling Lv 1

This puzzle was puckered due to missing residues.

The recipe Pucker Picker 3.0 RC 1 detected these puckers:

Pucker Picker 3.0 RC1
2350: Electron Density Reconstruction 56
4  puckers found!
pucker 1 (ideality), segments 58-59 (protein), distance = 5.558, ideality = -14621.86, -14617.631
pucker 2 (ideality), segments 125-126 (protein), distance = 8.817, ideality = -33162.61, -33161.632
pucker 3 (ideality), segments 193-194 (protein), distance = 10.303, ideality = -36784.395, -36790.323
pucker 4 (ideality), segments 259-260 (protein), distance = 6.606, ideality = -23014.855, -23005.127