Reconstruction Puzzle Update

Started by horowsah

horowsah Staff Lv 1

It’s been a while since we’ve touched base on the Reconstruction puzzle series. As a recap of the previous blog post: many crystal and cryo-EM structures in the Protein Data Bank (PDB) contain mistakes, and this causes problems when people need to use that data. Luckily, Foldit players are quite good at finding and correcting these mistakes when rebuilding proteins into electron density, so we’ve continued to post Reconstruction puzzles to slowly but surely improve the quality of the protein structures in the PDB. There are a lot of these not-so-great structures out there, so we have plenty to choose from!

One note on protein sequences and structures built from electron density. Whether it’s cryo-EM or crystallography, the biggest enemy of anyone trying to fit a protein into the density is a part of the protein that can occupy multiple positions. As a simple example, if the protein has one conformation, we have plenty of signal to see it in the electron density. However, if two conformations of the same protein exist simultaneously, we have half the signal for each, which might not be enough to distinguish it from noise. Once we get to three conformations, it’s likely we won’t be able to see it at all. The problem is that most proteins have parts that do this, so those parts will typically be invisible to these methods. This is the most common reason we see only a portion of a protein in a solved structure. So in a Reconstruction puzzle, there may well be segments listed in the sequence that don’t show up in the structure. They might actually be there, but their electron density signal is just too low to find.
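To make the arithmetic above concrete, here is a minimal sketch. It assumes all conformations are equally populated, and the noise level is a made-up number chosen only for illustration; real maps are far messier than this.

```python
def signal_per_conformation(n_conformations: int, total_signal: float = 1.0) -> float:
    """Share of the density signal each conformation gets,
    assuming every conformation is equally populated."""
    if n_conformations < 1:
        raise ValueError("need at least one conformation")
    return total_signal / n_conformations


# Hypothetical noise floor, in the same arbitrary units as total_signal.
NOISE_LEVEL = 0.4

for n in (1, 2, 3):
    signal = signal_per_conformation(n)
    status = "above the noise" if signal > NOISE_LEVEL else "lost in the noise"
    print(f"{n} conformation(s): signal {signal:.2f} each, {status}")
```

With this particular noise level, two conformations are still barely visible and three are not. Shift the assumed noise level a little and even two conformations disappear, which is why flexible segments so often go missing.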


Figure: part of a Reconstruction puzzle where the density is poor. This is likely due to the flexibility of this part of the protein, which averages out the signal.

Making this more confusing, not every scientist handles this problem the same way. Some delete the segment entirely from both the structure file and the sequence file. Some delete it from the structure file but leave it in the sequence file. Some leave it in both, but set the “occupancy” of the segment in the structure file to zero, meaning they think it’s there but just can’t find it. That’s why all of these variations can show up in Foldit puzzles!
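For anyone curious what the zero-occupancy case looks like in a file, here is a rough sketch that scans PDB-format ATOM records and lists residues whose atoms all have occupancy 0. In the fixed-column PDB format the occupancy sits in columns 55-60. The helper names and the sample atoms are made up for illustration, and this is not how Foldit itself reads puzzles.

```python
from collections import defaultdict


def atom_line(serial, name, resname, chain, resseq, x, y, z, occ, bfac):
    """Build a fixed-column PDB ATOM record (hypothetical sample data only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}")


def zero_occupancy_residues(pdb_lines):
    """Return sorted (chain, residue number) pairs where every atom has occupancy 0."""
    occupancies = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]                 # column 22
            resseq = int(line[22:26])        # columns 23-26
            occupancy = float(line[54:60])   # columns 55-60
            occupancies[(chain, resseq)].append(occupancy)
    return sorted(res for res, occs in occupancies.items()
                  if all(o == 0.0 for o in occs))


sample = [
    atom_line(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67),
    atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842, 1.00, 10.38),
    atom_line(3, "N", "GLY", "A", 2, 25.112, 24.880, 3.649, 0.00, 0.00),
    atom_line(4, "CA", "GLY", "A", 2, 24.003, 25.790, 3.920, 0.00, 0.00),
]
print(zero_occupancy_residues(sample))
```

Here residue 2 of chain A is the one modeled with zero occupancy: present in the file, but flagged by the depositor as not actually seen in the density.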

As always, keep doing what you do: the work on the Reconstruction puzzles is meaningful and does help. We are continuing to work on tools for electron density puzzles, and hope to be able to preview some new ones for you soon.

Please don’t hesitate to give us feedback on these new electron density tools, such as the Trim tool.

alcor29 Lv 1

Just to clarify something. The head of DeepMind said on yesterday's "60 Minutes" that last year they ran their AI programs and solved the structures of all 200-plus million known proteins. How does our reconstruction of EDs fit into this? Did they not solve all of them? Did they just skip over possible discrepancies? Sorry, but AI is on my mind a lot these days, and not in a friendly way.

rmoretti Staff Lv 1

It likely depends on what you mean by "solved". If you simply mean "put the sequence in the front and got a structure out the back", then yes, DeepMind has run 200+ million proteins through AlphaFold (https://alphafold.ebi.ac.uk/).

Now, are the structures that came out of the AlphaFold process accurate? That's a bigger question. There are certainly sequences where AlphaFold doesn't have high confidence in its output. Some of those don't actually have structure, but there are likely quite a few that do have structure and AlphaFold just doesn't know what it is. There are also cases where AlphaFold thinks it's confident about the structure, but it's not necessarily correct.
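As a rough illustration of what "confidence" means here: AlphaFold reports a per-residue score called pLDDT (0 to 100), and the AlphaFold database groups it into four bands. The sketch below just sorts scores into those bands. The function name and the example scores are made up for illustration.

```python
def plddt_band(plddt: float) -> str:
    """Map a per-residue pLDDT score (0-100) to the AlphaFold database confidence band."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Hypothetical per-residue scores for a short stretch of a predicted model.
example_scores = [96.2, 88.5, 63.0, 41.7]
for residue_number, score in enumerate(example_scores, start=1):
    print(residue_number, score, plddt_band(score))
```

Keep in mind that a high score only says the model is confident, not that it is right, and a very low score can mean either "disordered" or "structured, but AlphaFold can't tell how".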

It's also the case that AlphaFold can potentially get close to the native structure, but whether it's "close enough" is an open question. It might be close enough that the general topology is correct, but there might be small differences in exactly how a helix is placed, or where a sidechain is positioned, that make a difference. That's the sort of thing the Reconstruction puzzles are looking at. These puzzles already have a rough structure that's likely mostly correct, but there might be small differences in how the backbone and the sidechains are placed where the coordinates don't quite match up with the experimental electron density. The hope is to get Foldit players to tweak the structure so it better fits the experimental density. This sometimes depends on small movements of the protein, which are often below the level of detail that AlphaFold typically deals with.

Bletchley Park Lv 1

How can the following be achieved (without breaking any Foldit rules):

  1. Programmatically select and trim a section from within a script (in multiple clients)
  2. Determine the improved structure (backbone and rotamer position) of that trimmed section in those different clients numerically
  3. Apply this numeric 'footprint' of the improvement in that structure from multiple clients back into one original client, where all these changes are then incorporated and merged?

I have a large number of cores available but only use one per density puzzle, which is a total waste.

rosie4loop Lv 1

alcor29 wrote: "Just to clarify something. The head of Deep Mind said on yesterday's "Sixty Minutes" that last year they ran their AI programs and solved the structures of all 200 plus million known proteins. How does our reconstruction of EDs fit in to this. Did they not solve all of them? Did they just skip over possible discrepancies. Sorry but AI is on my mind a lot these days, and not in a friendly way."

Another limit of AlphaFold2 (AF2) models is that, at the moment, only a single conformation can be predicted, and it is not necessarily a "functional" one, particularly since many proteins adopt different conformations to carry out their functions.

This can be a big problem when we want to use AF2 predictions in our research. For example, in a project on an RNA-binding enzyme that I've been working on as a research student, the AF2 model has the catalytic cavity completely closed, which makes it difficult to model the RNA in it without further adjustments.

A review here (https://pubs.acs.org/doi/10.1021/acs.jpcb.2c04346) summarizes several other limits of AF2 models, with a focus on allostery (it may be difficult reading for the general public). I would like to highlight the following points from the abstract of this article:

  • "AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery"
  • "AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities."
  • "However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble."

Bruno Kestemont Lv 1

When a company solved the human genome, and potentially all genomes, some claimed that the biological problem was solved. But without the 3-D structure, nothing was actually solved.

Could we say that we are in the same kind of situation? With the 3-D structures solved, the biological problem is actually not solved yet:
-refinement is important in (most?) cases
-the variety of in vivo structures isn't solved yet
-the interactions between proteins are not solved yet
-do the dynamics of protein building and/or folding play a role in nature?
-etc.