The AlphaFold prediction tool in Foldit

Started by bkoep

bkoep Staff Lv 1

AlphaFold and Foldit use completely different approaches for evaluating proteins; unfortunately neither is faultless. We should expect some results like this that are difficult to interpret. If possible, I would encourage you to try and refine your design so that it satisfies both Foldit and AlphaFold.

On the one hand, AlphaFold confidence is not a perfect predictor. Based on figure 3 above, we recommend aiming for AF confidence > 80%, because it allows us to reject many doomed designs without sacrificing too many successful designs. However, figure 3 also shows that, even for designs with AF confidence > 80%, a large fraction (about 4 in 10) still fail lab testing.

On the other hand, Foldit's scoring algorithms are based on real physical principles, although in practice these algorithms rely heavily on approximations. Given the high similarity of your AF prediction, I'm guessing the 5 BUNS are probably at the boundary between protein surface and core; in this region, it can be difficult for Foldit's coarse surface area calculations to determine reliably whether an atom is exposed to solvent. Furthermore, many of the Foldit Objective targets are imperfect heuristics that are known to improve success rates in lab testing. For example, a bad loop is not necessarily a deal-breaker for a designed protein, but we know that lab success rates increase if we restrict ourselves to ideal loops.

Ultimately, we are faced with two useful-but-imperfect approaches for evaluating a protein design. But they will not necessarily conflict with one another in every case! Our hope is that Foldit players may be able to find designs that look good both to Foldit and to AlphaFold.

argyrw Lv 1

Hi! Do we only need 80% confidence and similarity, or do we also want the maximum Foldit points with that similarity and confidence? Is it necessary to have maximum points, or is the statistic the important thing?

Bletchley Park Lv 1

In a different feedback thread, bkoep wrote:
"Also, you bring up an important point about AlphaFold and natural proteins. In Foldit we are using an "abbreviated" version of AlphaFold that is not expected to work well on natural protein sequences.

The official, complete AlphaFold pipeline requires an extra step, scanning a large database for sequences that are similar to your query sequence. These similar sequences should all be evolutionarily related. AlphaFold is extremely good at extracting patterns from this evolutionary data, and this seems to be one of the reasons it performed so well in CASP."

If I read this a certain way, I get the impression that you do not use the entire PDB as input for your trained model, nor for scanning the evaluated model for natural similarities?

I can imagine you could add that step on a separate database server?

jeff101 Lv 1

Would it be possible to list confidence scores for each segment in our designs so we know which regions are most likely and which are least likely to fold as predicted by AlphaFold? Perhaps each segment's confidence score could be shown when we Tab on that segment. It would also help to have a Lua command to read the confidence score for any given residue.

Also, would it be possible to have Foldit color the protein by the confidence score, with green for the segment with the highest confidence score and red for the one with the lowest confidence score?

Thanks!

bkoep Staff Lv 1

As long as your design has a reasonable Foldit score (maybe >8000 points for a binder design puzzle), then high AF confidence and similarity are probably sufficient. Further optimization for Foldit score is not likely to improve the quality of your design.

This is because the Foldit score reflects the absolute energy of your solution, which is not the same as its folding stability. For more about how Foldit score and energy relate to folding stability, see this previous blog post about energy landscapes.
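As a rough illustration of the rule of thumb in this reply, the triage could be sketched in Python as below. The score and confidence thresholds are the numbers quoted in this thread (Foldit score > 8000 for a binder puzzle, AF confidence > 80%). The similarity cutoff, the `Design` record, and its field names are assumptions made for illustration only; none of this is part of Foldit or AlphaFold.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical record of one design; not a Foldit data structure."""
    name: str
    foldit_score: float   # Foldit points
    af_confidence: float  # AlphaFold confidence, in percent
    af_similarity: float  # similarity of the AF prediction to the design, in percent


def passes_triage(d, min_score=8000.0, min_confidence=80.0, min_similarity=80.0):
    """Return True if the design clears all three thresholds.

    min_score and min_confidence follow the numbers quoted in this thread;
    min_similarity is an assumed cutoff.
    """
    return (
        d.foldit_score > min_score
        and d.af_confidence > min_confidence
        and d.af_similarity > min_similarity
    )


# Made-up example designs.
designs = [
    Design("a", 8500, 85, 90),
    Design("b", 12000, 60, 95),
    Design("c", 7000, 90, 92),
]
kept = [d.name for d in designs if passes_triage(d)]
print(kept)
```

Here design "b" is rejected despite having the highest Foldit score, because its AF confidence is low. Passing this kind of filter is no guarantee of success, since a sizeable fraction of designs with AF confidence > 80% still fail lab testing.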

bkoep Staff Lv 1

Yes, that's mostly correct. When you submit a Foldit solution for an AlphaFold prediction, we do not scan for native proteins that might be similar to your solution.

In theory, yes, we could add that step and it would certainly improve prediction quality for natural proteins. However, that would increase the runtime for AlphaFold predictions and we would not expect it to improve prediction quality for designed proteins, so we do not intend to add this scanning step to the AlphaFold tool in Foldit.

To be clear, the Foldit team has not retrained the AlphaFold model. We are using the same model architecture and parameters that were developed and published by the DeepMind team. Indeed, much of the PDB was used by DeepMind for training.

bkoep Staff Lv 1

Yes, the AlphaFold confidence can be broken down into per-residue confidence, but we need to do a little bit more work in Foldit to support this.
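Once per-residue values are available, the coloring requested above could work along these lines. This is only a sketch in Python: the function name, the linear red-to-green ramp, and the sample values are assumptions, and nothing here is an existing Foldit or Lua command.

```python
def confidence_to_rgb(confidences):
    """Map per-residue confidences to (r, g, b) tuples in 0..255.

    The lowest value in the list becomes pure red and the highest pure
    green, with a linear blend in between (an assumed color ramp).
    """
    lo, hi = min(confidences), max(confidences)
    span = hi - lo
    colors = []
    for c in confidences:
        # If all residues share one value, treat them all as the highest.
        t = 1.0 if span == 0 else (c - lo) / span
        colors.append((round(255 * (1 - t)), round(255 * t), 0))
    return colors


# Hypothetical per-residue confidences for a five-residue stretch.
sample = [50.0, 75.0, 100.0, 62.5, 87.5]
colors = confidence_to_rgb(sample)
print(colors)
```

Scaling to the minimum and maximum of each design matches the request (green for the best segment, red for the worst), but it means colors are not comparable between designs. A fixed scale would be the alternative if that matters.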

Bruno Kestemont Lv 1

Are there Lua commands to:
- save to AF with a name (returning an error message on failure, or "ok" on success)
- read the Confidence and Similarity numbers of an AF by name
- download an original or a prediction of an AF by name