The Foldit protein design paper

Started June 05, 2019 by bkoep

bkoep Staff Lv 1

June 05, 2019

Today, the scientific journal Nature published a paper titled "De novo protein design by citizen scientists," all about the work of Foldit players!

The paper is written for an audience of professional scientists, and gets somewhat technical. This blog post is meant to summarize the main points of the paper, so that everyone can appreciate the significance of this achievement. If you have trouble accessing the paper on the Nature website, try this view-only online version or check the Baker Lab website.

What is 'de novo' protein design?

The Latin phrase de novo translates literally to “from the new”—we usually use it to mean “from scratch.” Veteran Foldit players will recognize this phrase from De-novo Freestyle Foldit puzzles, where players fold up a protein from a completely unfolded starting position (i.e. from scratch), rather than from a partially-folded starting position.

In the field of protein design, this phrase has a special meaning. De novo protein designs are created without referencing the sequences of natural proteins.

To illustrate, you could imagine designing a 3-helix bundle protein just by looking at the sequences of natural 3-helix bundles and choosing the most common amino acid at each position. Since we have lots of data about natural protein sequences, and powerful ways to extract patterns from data, this method is relatively easy. But it will only ever let us design proteins that are similar to natural proteins.
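The consensus approach described above can be sketched in a few lines of Python. This is only an illustration: the function name `consensus_sequence` and the toy alignment are made up for this example, and real consensus design would start from a much larger, carefully aligned set of natural sequences.

```python
from collections import Counter


def consensus_sequence(aligned_sequences):
    """Return the most common residue at each column of an alignment.

    Assumes every sequence has already been aligned to the same length.
    """
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all sequences must have the same aligned length")

    consensus = []
    # zip(*...) walks the alignment column by column
    for column in zip(*aligned_sequences):
        # most_common(1) gives [(residue, count)]; ties go to the residue seen first
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Hypothetical toy alignment (not real protein data)
toy_alignment = [
    "MKELLE",
    "MKELIE",
    "MRELLE",
    "MKDLLE",
]
print(consensus_sequence(toy_alignment))
```

Note that nothing in this sketch knows anything about physics or folding. It only copies patterns from existing sequences, which is why this kind of method stays close to natural proteins.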

On the other hand, de novo protein design is much more difficult. Rather than relying on patterns in massive datasets, de novo design requires an understanding of the physical principles behind protein folding. The advantage is that we can use de novo methods to design brand new proteins that are unlike any proteins found in nature.

Why is protein design hard?

A designed protein must fold entirely on its own, without direction or instruction from any outside source.

The number of possible folds for a protein is huge, and a protein dissolved in solution is generally free to sample any of those possible folds. But if the protein sequence is chosen carefully, then the protein chain will have lower energy in one fold than in any other, and the protein will naturally prefer that lowest-energy fold.

It is difficult to choose the sequence because there are also many possible protein sequences (more than there are atoms in the universe!). And, once we choose a sequence for our target fold, we cannot check all the possible folds to ensure that our target fold has the lowest energy.
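To get a feel for the size of sequence space, here is a rough back-of-the-envelope calculation. It assumes 20 standard amino acid types and uses the commonly quoted estimate of about 10^80 atoms in the observable universe; that estimate and the example length of 100 residues are assumptions for illustration, not figures from the paper.

```python
import math

AMINO_ACID_TYPES = 20          # standard amino acids
LOG10_ATOMS_IN_UNIVERSE = 80   # assumed rough estimate: ~10^80 atoms


def log10_sequence_count(length):
    """log10 of the number of possible sequences of the given length (20^length)."""
    return length * math.log10(AMINO_ACID_TYPES)


def shortest_length_exceeding_atoms():
    """Smallest chain length whose sequence count exceeds the atom estimate."""
    length = 1
    while log10_sequence_count(length) <= LOG10_ATOMS_IN_UNIVERSE:
        length += 1
    return length


print(shortest_length_exceeding_atoms())        # 62
print(round(log10_sequence_count(100), 1))      # 130.1, i.e. about 10^130 sequences
```

Under these assumptions, a chain only needs 62 residues before the number of possible sequences passes the number of atoms, and a 100-residue protein has roughly 10^130 possible sequences.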

For a deeper discussion about the difficulties of protein design, see this previous blog post.

How can computer gamers design proteins?

Figure 1 below shows the Foldit game interface. Foldit players have a number of tools that allow them to change both the fold and the sequence of a virtual protein. The player's score is calculated from the energy of the virtual protein, with a state-of-the-art energy function developed by academic protein scientists. By competing with one another to reach the highest score, Foldit players arrive at virtual proteins with extremely low energies (a high Foldit score corresponds to a low protein energy).

Since energy alone is not enough for protein design, the Foldit team has had to make some adjustments to the Foldit score function. Every step of the way, we’ve relied on the work of Foldit players to expose problems with our score function. Foldit players are excellent at exploring new kinds of protein folds that are unlike anything seen in nature. For this reason, Foldit players are incredibly helpful for identifying unanticipated weaknesses in our energy function, and ultimately can improve our understanding of protein folding.

Figure 1. Protein design in Foldit.

How do Foldit players actually design proteins?

Figure 2 shows that Foldit players design proteins very differently from automatic protein design algorithms. From start to finish, players routinely accept huge penalties (high-energy spikes; colored traces in panel 2a) that ultimately pay off with low-energy designs.

Panel 2b shows snapshots of some key moments in the design of the protein Foldit1, by players Susume, Waya, and Galaxie.

Automatic algorithms, on the other hand, can only accept very small penalties, and they do so less frequently (gray traces in panel 2a).
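One way to see why an automatic search rarely accepts large penalties is the Metropolis acceptance rule, a common ingredient of Monte Carlo optimization. This post does not say which rule the automatic algorithms in the paper use, so treat the sketch below as an assumption; the energy units and the temperature value are arbitrary and chosen only for illustration.

```python
import math
import random


def acceptance_probability(delta_energy, temperature):
    """Metropolis probability of accepting a move that changes energy by delta_energy."""
    if delta_energy <= 0:
        return 1.0  # moves that lower (or keep) the energy are always accepted
    # uphill moves are accepted with exponentially decreasing probability
    return math.exp(-delta_energy / temperature)


def metropolis_accept(delta_energy, temperature, rng=random):
    """Randomly decide whether to accept a move under the Metropolis rule."""
    return rng.random() < acceptance_probability(delta_energy, temperature)


# Hypothetical penalties, in arbitrary energy units, at an arbitrary temperature of 1.0
for penalty in (1.0, 5.0, 50.0):
    probability = acceptance_probability(penalty, temperature=1.0)
    print(f"penalty {penalty:5.1f} -> acceptance probability {probability:.2e}")
```

With these made-up numbers, a penalty of 1 is accepted about 37% of the time, while a penalty of 50 is essentially never accepted. A human player has no such restriction and can push through a large energy spike on purpose if they expect it to pay off later.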

Figure 2. Protein design strategy of Foldit players.

How do virtual Foldit designs behave in real life?

Figure 3 shows data from the lab tests that we perform on protein designs from Foldit players.

The first thing to note, in panel 3a, is that these proteins are extremely diverse and span many different protein folds. Due to the amount of planning and creativity required to conceive a protein fold, a protein engineer will usually focus on a small number of protein folds for a given task. This paper reports a greater number of protein folds than any other protein design paper to date—including a brand new fold that is not observed in any natural proteins!

Panels 3c-f show that these proteins are very well-behaved both on the computer and in the lab. The plots in panel 3c show that Rosetta@home computer simulations predict the designs will fold accurately (details here).

Panels 3d-e show that the proteins don’t aggregate together, and are rigidly structured in solution. And panels 3f-g show that the proteins do not unfold except in extremely harsh conditions (read more here). Most natural proteins unfold with only 3-5 kcal/mol of energy; many of the designed proteins are hyper-stable and require >10 kcal/mol!
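To put the stability numbers in perspective, the sketch below converts an unfolding free energy into the fraction of protein molecules that are unfolded at equilibrium. It assumes a simple two-state model (each molecule is either folded or unfolded), room temperature, and that the kcal/mol values quoted above are free energies of unfolding; these are simplifying assumptions for illustration, not an analysis from the paper.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def fraction_unfolded(delta_g_unfold, temperature_kelvin=298.15):
    """Equilibrium fraction of unfolded molecules in a two-state model.

    delta_g_unfold is the free energy of unfolding in kcal/mol
    (positive means the folded state is favored).
    """
    rt = GAS_CONSTANT * temperature_kelvin
    # K = [unfolded]/[folded] = exp(-dG/RT), so fraction = K / (1 + K)
    return 1.0 / (1.0 + math.exp(delta_g_unfold / rt))


for delta_g in (3.0, 5.0, 10.0):
    print(f"{delta_g:4.1f} kcal/mol -> fraction unfolded {fraction_unfolded(delta_g):.1e}")
```

Under these assumptions, a protein with 3 kcal/mol of stability is unfolded roughly 0.6% of the time, while one with 10 kcal/mol is unfolded only about 5 times in 100 million. Each additional kcal/mol makes the unfolded state several times rarer.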

Figure 3. Foldit player-designed proteins are well-behaved in the lab.

How do we know that the proteins fold up as designed?

Since proteins are smaller than the wavelength of visible light, we can’t see them directly under a microscope. However, in some cases we can use very intensive techniques to determine the structure of a protein indirectly (read more here and here). We used these techniques to solve high-resolution structures of 4 proteins designed by Foldit players.

Figure 4 shows the exact placement of atoms in the real-life protein structures, which is nearly identical to the virtual protein design in every case.

Figure 4. Foldit player-designed proteins fold accurately as designed.

So, what does this all mean?

This is a huge accomplishment for Foldit players! De novo protein design is a very new field, and already citizen scientists are making significant contributions—not just by designing new proteins, but also by helping us improve our understanding of protein design. We hope that scientists in other fields will be able to find similar ways to engage public creativity and enthusiasm, to increase our understanding of the world.

Now that Foldit players can accurately design high-quality proteins from scratch, we can start to challenge Foldit players with more applied protein design problems. We’d like Foldit players to help us design new proteins that can assemble into multi-component structures and materials, or that can bind to biological targets as potent medicines, or that can degrade toxic chemicals!

Because Foldit depends on the cooperation and competition of its player community, our scientific ability grows rapidly with the number of Foldit players. We look forward to expanding the Foldit community and recruiting more creative and curious Foldit players!

Help us design a protein for cancer treatment right now, by playing Puzzle 1683: Integrin Antagonist Design!

Susume Lv 1

June 10, 2019

Looking at Supplementary Info Table 1, I see that some designs were expressed and soluble, but not monomeric. It looks like the test for predicted secondary structure was not carried out on these. Is it possible that these designs folded up as designed but stuck together in well-formed dimers, trimers, etc.? And would the test for secondary structure then show the expected ratios? Or is there reason to believe that not being monomeric means they were ill-formed in other ways as well? It seems like a design that folds as intended but sticks together in well-formed pairs could be counted as a success.

bkoep Staff Lv 1

June 13, 2019

In general, the proteins that failed the "monomeric" test were very poorly behaved—not simply a well-folded dimer or trimer. Either they associated into massive, soluble aggregates (much bigger than dimers or trimers), or else they were poly-disperse (they clumped indiscriminately into all different sizes, without preference for a single, well-defined state). In fact, if a protein fails to fold correctly, we usually expect it to aggregate in this way so that all of the hydrophobic residues are still shielded from solvent.

Below is one such example, from 2002089_1029 (which you helped design!):

If this protein were monomeric, we would expect to see a peak at about 15 mL. Instead we see multiple strong peaks before 13 mL (indicating something larger than a monomer), and as early as 9 mL (indicating some really big aggregates).

I would also argue that even a well-folded dimer cannot be considered a design "success" if the protein was designed as a monomer. I think a "successful" design should necessarily behave as the designer intends. All of the protein designs in this paper were from "Monomer Design" Foldit puzzles, which were set up specifically for proteins that do not associate at all in solution.

Alain L Lv 1

June 21, 2019

Hello,

First, I want to apologize for my basic English... I'm French.
A friend of mine is sick. She has lost part of her stock of neurons. A specialist doctor told her that in a few months a new category of protein would be invented, and her aphasia could improve.
What do you think about that?
Thank you for your answers.

beta_helix Staff Lv 1

June 26, 2019

Unfortunately, Foldit is not involved in this particular treatment.

Even though we have high hopes that Foldit will contribute to the development of remedies for this disease and others, everything we do is early-stage research.

If something does come out of Foldit, it will probably take 10 years or more between the time Foldit players contribute to a puzzle and the time the result of that contribution actually influences the treatment given to patients.

Bruno Kestemont Lv 1

July 09, 2019

That's perfect French!

Bruno Kestemont Lv 1

July 09, 2019

I think the doctor may be referring to the 2016 paper in which Foldit players discovered, by chance, a new family of proteins while working on a protein associated with brain degeneration linked to the toxicity of amyloid overproduction.

Many Foldit players are motivated to contribute to fundamental knowledge about proteins, either because they know people with incurable or degenerative diseases about which not enough is known yet, or because they suffer from such diseases themselves (which does not prevent them from contributing very effectively to science).

Since those results were published, other researchers may have pursued this lead and, who knows, the pharmaceutical industry may already have started developing remedies.

Even though pharmaceutical research has to go through indispensable steps that take years (various tests to check for side effects), there is therefore reason to hope.

Hang in there until then!

Bruno