The Neural Net Objective

Started by bkoep

bkoep Staff Lv 1

To help players reach high AlphaFold confidence, we are launching a new Neural Net Objective that can highlight the parts of your protein that are incompatible with an AlphaFold prediction.

A guide for AlphaFold confidence

Since we launched the DeepMind AlphaFold tool in Foldit last summer, players have been able to submit their protein designs for AlphaFold prediction. The confidence of an AlphaFold prediction seems to be a good indicator of design success.


Figure 1. Successful designs (blue) tend to yield AlphaFold predictions with higher confidence than design failures (orange). We would like a way to convert low-confidence solutions into high-confidence solutions.

This is nice because a high AlphaFold confidence gives us some human confidence that our designs will fold in the lab. For especially motivated Foldit players, it suggests when a work-in-progress has become “good enough” and it’s time to start over with another design.

However, a low confidence can be frustrating to work with, because the prediction doesn’t suggest how you can improve your design. You are on your own to try and guess what it is that AlphaFold doesn’t like in your design.

The Neural Net Objective is meant to guide Foldit players towards designs with higher AlphaFold confidence. This new Objective analyzes the underlying data in an AlphaFold prediction, and looks for local regions of your solution that contrast with this data.

Local design quality

Before digging in, let’s re-familiarize ourselves with the concept of local interactions in a protein.

Local interactions occur between residues that are close in sequence – for example, H-bonding between residues in the same loop. By contrast, non-local interactions involve residues that are far from each other in the protein sequence (although they might end up close to one another after the protein folds).


Figure 2. An example protein design, 2003796_1015 by Galaxie, illustrating local and non-local interactions. The green dashed line shows a local H-bond between two residues that are close to one another in the protein sequence. The blue dashed line shows a non-local H-bond between two residues from distant parts of the protein chain.

Seasoned Foldit players might remember back in 2017 when we liked to make a big deal about fragment quality. During our design analysis, Foldit scientists would focus on local interactions in a protein design by breaking it down into small fragments (about 9 residues), which could be easily compared with fragments of natural proteins.
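As a rough illustration of what "breaking a design into small fragments" means, the sketch below slides a 9-residue window along a sequence. The window size comes from the text. The function name `fragments` and the choice of overlapping windows with a step of one residue are assumptions for illustration, not a description of the actual Foldit analysis pipeline.

```python
from typing import Iterator, Tuple


def fragments(sequence: str, size: int = 9) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, fragment) for every overlapping window of `size` residues.

    Hypothetical helper: assumes overlapping windows with a step of one residue.
    """
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]


# Usage: an 11-residue sequence yields 3 overlapping 9-residue fragments.
for start, frag in fragments("MKTAYIAKQRQ"):
    print(start, frag)
```

Each fragment could then be compared against fragments from natural proteins of the same length.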

We found that the Foldit Rebuild tool was introducing unrealistic fragments into Foldit players’ designs, with shapes that didn’t match normal protein fragments. These unrealistic fragments were preventing the whole protein from folding.

It was convenient that we could isolate the problem to a local issue, because it suggested the problem might be corrected locally as well, without disrupting the rest of the protein design. In some cases, it seemed you could just swap out a single bad loop with a better one, and “rescue” the whole design.

Ultimately, the fragment quality analysis led us to revamp the old Rebuild tool into the newer Remix, and focus on “idealized” loops with well-known ABEGO patterns. Now, without the wacky fragments, Foldit players have able to design creative new proteins with a high success rate, as described in the landmark 2019 Foldit design paper.

Of course, in the full picture, there is more to protein folding than local effects! A protein folding landscape includes lots of important “long-range” interactions that can’t be captured in small fragments. But, if we can pinpoint local problems with a design, these are usually the first places to make improvements.

Distograms

Fast-forward to 2020, when protein design researchers were starting to discover the incredible power of deep neural networks like trRosetta. (AlphaFold v2.0 was already announced, but not published until 2021.)

Much neural net research originates from the field of 2D image recognition, and was later adapted to other problems (like protein folding). Instead of modeling the 3D protein structure directly, neural nets will often represent the protein structure as a 2D distogram that predicts the distance between every pair of residues in a protein (very similar to a Contact Map). We covered this idea more in-depth in our discussion of AlphaFold v1.0. Distograms are used heavily by AlphaFold v1.0 and trRosetta; the modern AlphaFold v2.0 adds a 3D representation as well, but still uses distograms internally.
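A predicted distogram gives each residue pair a probability for each of several distance bins. To compare it with a model, the model's actual inter-residue distances first have to be sorted into the same bins. The sketch below does that for a toy list of 3D points (one point per residue, in Å). The bin edges are made up for illustration; AlphaFold and trRosetta each define their own binning, and which atom stands in for a residue is also left as an assumption here.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Illustrative bin edges in Å (an assumption; real networks define their own bins).
# Bin 0 holds distances below 4 Å; bin len(BIN_EDGES) holds distances of 20 Å or more.
BIN_EDGES: List[float] = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def distance_bin(d: float, edges: Sequence[float] = BIN_EDGES) -> int:
    """Return the index of the distance bin containing d."""
    return bisect.bisect_right(edges, d)


def binned_distance_map(coords: Sequence[Point], edges: Sequence[float] = BIN_EDGES) -> List[List[int]]:
    """N x N matrix of bin indices for every residue pair's actual distance."""
    n = len(coords)
    return [[distance_bin(math.dist(coords[i], coords[j]), edges) for j in range(n)]
            for i in range(n)]


# Usage: three residues placed along a line at x = 0, 5 and 12 Å.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(binned_distance_map(coords))  # [[0, 1, 5], [1, 0, 2], [5, 2, 0]]
```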

One way of comparing a protein design to a neural net prediction is to measure how well the predicted distogram matches the actual distances in your model. If the distances in your model match the predicted distogram from the neural net, then your model is in agreement with the neural net.
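For a single residue pair, one common way to score agreement is the cross entropy between the predicted bin probabilities and the bin the model actually falls into. With a one-hot target this reduces to the negative log-probability the network assigned to the observed bin, so a value near zero means agreement and a large value means the model disagrees with the prediction. This is a minimal sketch assuming natural logarithms and already-binned actual distances; it is not the Foldit implementation.

```python
import math
from typing import List, Sequence


def pair_cross_entropy(predicted_probs: Sequence[float], actual_bin: int, eps: float = 1e-9) -> float:
    """Cross entropy between a predicted bin distribution and a one-hot actual bin.

    With a one-hot target this is just -log p(actual bin); `eps` guards against log(0).
    """
    return -math.log(max(predicted_probs[actual_bin], eps))


def cross_entropy_map(predicted: List[List[Sequence[float]]], actual_bins: List[List[int]]) -> List[List[float]]:
    """Apply pair_cross_entropy to every (i, j) pair of an N x N distogram."""
    n = len(actual_bins)
    return [[pair_cross_entropy(predicted[i][j], actual_bins[i][j]) for j in range(n)]
            for i in range(n)]


# Usage: the network puts 70% of its probability on bin 0.
probs = [0.7, 0.2, 0.1]
print(round(pair_cross_entropy(probs, 0), 3))  # model distance in bin 0: 0.357
print(round(pair_cross_entropy(probs, 2), 3))  # model distance in bin 2: 2.303
```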


Figure 3. Visualizing distogram agreement for the design in Figure 2 above. (Left) A heatmap plotting the cross entropy (CE) between the predicted distances from AlphaFold and the actual distances in Galaxie's model; darker cells indicate strong agreement, while lighter cells show disagreement. (Right) The local distogram, ignoring interactions between residues that are distant in sequence. The green and blue squares highlight the same local and non-local interactions from Figure 2 above.
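The local heatmap on the right of Figure 3 keeps only pairs that are close in sequence. A sketch of that masking step follows, assuming a hypothetical sequence-separation cutoff of 8 residues (the post does not say which cutoff was used). Masked-out cells are set to `None` so they can be left blank in a heatmap.

```python
from typing import List, Optional


def local_map(ce_map: List[List[float]], window: int = 8) -> List[List[Optional[float]]]:
    """Keep CE values only for pairs with |i - j| <= window; mask the rest as None.

    `window` is a hypothetical cutoff chosen for illustration.
    """
    n = len(ce_map)
    return [[ce_map[i][j] if abs(i - j) <= window else None for j in range(n)]
            for i in range(n)]


# Usage: with window=1, only the diagonal and its immediate neighbours survive.
full = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]
print(local_map(full, window=1))
# [[0.0, 1.0, None], [1.0, 0.0, 2.0], [None, 2.0, 0.0]]
```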

Using distograms to fix designs

In late 2020, we were joined at the IPD by a talented student and Foldit player Susan Kleinfelter, who discovered that we could use distograms to derive especially useful information about local design quality.

Susan noticed that, if you focused only on local interactions and ignored everything else, you could use a distogram to evaluate the local structure of a protein design – similar to the way we previously evaluated fragment quality. In fact, Susan found that the local distograms from trRosetta were strongly correlated with fragment quality. The local distograms were very good at pointing out regions with local problems.
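To point at specific residues rather than pairs, the pairwise local CE values need to be collapsed into one number per residue. The post does not say how this was done; the sketch below assumes a simple average over each residue's local neighbours (excluding the residue itself), with the same hypothetical window exposed as a parameter.

```python
from typing import List


def per_residue_local_ce(ce_map: List[List[float]], window: int = 8) -> List[float]:
    """Hypothetical per-residue score: mean CE over partners j with 1 <= |i - j| <= window."""
    n = len(ce_map)
    scores = []
    for i in range(n):
        neighbours = [ce_map[i][j]
                      for j in range(max(0, i - window), min(n, i + window + 1))
                      if j != i]
        scores.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return scores


# Usage on a toy 3-residue CE map.
ce = [[0.0, 1.0, 3.0],
      [1.0, 0.0, 2.0],
      [3.0, 2.0, 0.0]]
print(per_residue_local_ce(ce, window=1))  # [1.0, 1.5, 2.0]
```

Averaging is only one option; taking the maximum over neighbours would flag a residue on its single worst local contact instead.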


Figure 4. Distogram agreement predicts fragment quality. (Left) The distribution of local distogram cross entropy for >30,000 fragments from 4000 Foldit designs. Poor-quality fragments (RMSD > 2.0 Å) tend to show more disagreement with local distogram predictions (CE > 2.0); good-quality fragments tend to show better agreement with distogram predictions. (Right) The fragment quality and distogram agreement for every residue in Galaxie's design. Both fragment quality and distogram agreement indicate a problem region around residues 25-30.

Critically, Susan then showed that these local problems could be corrected with a local solution. She could completely rescue a failed design by mutating only the residues in the problematic region, leaving everything else untouched.
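Given a per-residue score, a "problematic region" can be found by grouping consecutive residues that exceed a cutoff. The 2.0 cutoff below matches the CE threshold used in Figures 4 and 5; everything else (the function name, reporting 1-based inclusive residue ranges) is an illustrative assumption.

```python
from typing import List, Optional, Sequence, Tuple


def problem_regions(scores: Sequence[float], threshold: float = 2.0) -> List[Tuple[int, int]]:
    """Return (first, last) residue numbers, 1-based and inclusive, of runs with score > threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for idx, s in enumerate(scores, start=1):
        if s > threshold and start is None:
            start = idx                          # a bad run begins here
        elif s <= threshold and start is not None:
            regions.append((start, idx - 1))     # the run ended on the previous residue
            start = None
    if start is not None:                        # a run extends to the C-terminus
        regions.append((start, len(scores)))
    return regions


print(problem_regions([0.5, 2.5, 3.0, 1.0, 2.1]))  # [(2, 3), (5, 5)]
```

Only the residues inside these ranges would then be candidates for mutation, leaving the rest of the design untouched.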

After AlphaFold 2 was published in 2021, we repeated Susan’s trRosetta experiments with the AlphaFold distograms, and showed that they were also very good at predicting fragment quality (although not quite as good as trRosetta…).


Figure 5. Redesigning problem regions in 4000 Foldit designs. (Left) The distribution of AlphaFold confidence for 4000 designs before and after we redesigned problem regions (distogram CE > 2.0). After redesign, significantly more designs pass our goal of 80% confidence. (Right) The distribution of retained sequence identity for the 4000 redesigned models. Our redesign only mutated a few residues in each solution, with most solutions retaining >80% of their original sequence.

Likewise, we’ve found that the problem regions identified by the AlphaFold distogram are sweet spots for redesign. By redesigning only the regions with poor distogram agreement, we were able to drastically improve the AlphaFold confidence with minimal changes to the overall design. In this dataset of 4000 previous Foldit designs, we mutated only 20% of residues on average, and the number of high-confidence designs went from 17% to 44%!
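The two summary numbers in this comparison are straightforward to compute. The sketch below shows retained sequence identity (assuming the redesign only substitutes residues, with no insertions or deletions, so the sequences line up position by position) and the fraction of designs reaching the 80% confidence goal (assuming exactly 80% counts as passing). The values in the usage lines are toy numbers, not the Foldit dataset.

```python
from typing import Sequence


def sequence_identity(original: str, redesigned: str) -> float:
    """Fraction of positions where the redesigned sequence keeps the original residue."""
    if len(original) != len(redesigned):
        raise ValueError("expected equal-length sequences (substitutions only)")
    same = sum(a == b for a, b in zip(original, redesigned))
    return same / len(original)


def fraction_passing(confidences: Sequence[float], cutoff: float = 80.0) -> float:
    """Fraction of designs whose confidence (in %) meets or exceeds `cutoff`."""
    return sum(c >= cutoff for c in confidences) / len(confidences)


# Usage with toy values: one substitution in ten residues, two of four designs passing.
print(sequence_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # 0.9
print(fraction_passing([66.0, 81.0, 90.0, 75.0]))     # 0.5
```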

For some designs, like Galaxie’s design 2003796_1015 above, the difference is even more stark. By mutating just two residues in the offending region, we can bring the AlphaFold confidence of this design from 66% to 81%, turning a problematic design into a promising one!

The Neural Net Objective

The Neural Net Objective will be included on all future puzzles that allow AlphaFold submissions.

The Objective can only run if it has an AlphaFold prediction to work with, so most of the time it will simply report “No data”. Use the DeepMind AlphaFold tool to submit your solution for an AlphaFold prediction.

After AlphaFold finishes and you load the result (either Load Original or Load Prediction), then the Neural Net Objective will display the AlphaFold confidence in the upper-left Objectives Panel. Click Show to color your solution according to the distogram analysis. For convenience, the AlphaFold panel also includes a checkbox to Show Neural Net Objective.

A blue color reflects agreement between your solution and the AlphaFold distogram. A red color indicates disagreement, and suggests that residues in this area should be mutated.
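As a rough picture of how a blue-to-red coloring can be driven by the analysis, the sketch below maps a per-residue CE value onto an RGB color by linear interpolation. The 0 to 4 range and the linear blend are assumptions for illustration; the post does not describe Foldit's actual color scale.

```python
from typing import Tuple


def ce_to_rgb(ce: float, low: float = 0.0, high: float = 4.0) -> Tuple[int, int, int]:
    """Blend from blue (agreement, ce <= low) to red (disagreement, ce >= high).

    `low` and `high` are hypothetical clamp limits chosen for illustration.
    """
    t = min(max((ce - low) / (high - low), 0.0), 1.0)  # clamp the blend factor to [0, 1]
    return (round(255 * t), 0, round(255 * (1 - t)))


print(ce_to_rgb(0.3))  # mostly blue
print(ce_to_rgb(3.5))  # mostly red
```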

Unfortunately, the AlphaFold distogram doesn’t tell us which amino acids will improve confidence, so you may have to play around with different mutations to find something that works. If no mutations can improve the distogram agreement, you might use the Remix tool to try a new backbone shape.

For now, the Neural Net Objective will not award any score bonus or penalty. But we hope that players will find it useful for improving the AlphaFold confidence of Foldit designs. Higher-confidence designs mean a higher success rate for lab testing, which will lead us to even more exciting science!

Bruno Kestemont Lv 1

Do I understand correctly that a player had the critical idea behind a new, useful tool for science?
Will she be rewarded by being mentioned as a co-author (not only in the list of contributing players)?
Congratulations to Susan Kleinfelter.

Knowing all the contributions and research by Susume concerning ideal loops, I just hope it's her.

bkoep Staff Lv 1

Yes, Susan joined the IPD as an undergraduate student, through the IPD Undergraduate Summer Research Fellows program (currently accepting applications for summer 2022!). Her status as a Foldit player is unrelated, although her Foldit experience may have influenced her thinking about the project.

If this work becomes part of a scientific publication, then Susan will certainly be included as a co-author, since she planned and executed all analyses of trRosetta local distograms.

bkoep Staff Lv 1

Normally we prioritize puzzles that have the most direct impact on current research (e.g. binder design for outstanding targets). Since we already know Foldit players can design monomers without AlphaFold, we haven't run any monomer design puzzles recently.

However, I agree it could be useful to break things up and see what kinds of monomer designs Foldit players can come up with using the new tools!

Bruno Kestemont Lv 1

Problem and Suggestion:
When loading the prediction, the ligand or target is not positioned as before. When loading the original, a colour shows the problematic parts. However, trying to fix them is quite random: select and wiggle? NNM? (Then we move on to another prediction.)

Suggestion:
1) Show the prediction as a guide (it's possible to do this manually: load the prediction, save, then download it as a guide).
2) Since there is already a tool to colour relative to the guide, why not implement a "wiggle to guide" tool?
Advantage: we can "automatically" work on red sections, not disturbing the ligand position.

(Or is this how Foldit tries to align the ligand to the prediction?)