Foldit

bkoep Staff Lv 1

August 31, 2020

The experimental results are in for Foldit players' 99 binders against the coronavirus spike protein! If you’ve been following along, you know this experiment was planned for earlier this summer, but got held up by some technical problems with our DNA supplier. Well, we found a workaround, got new materials, and ran the binding experiment to test whether any of the 99 Foldit designs bind to the SARS-CoV-2 spike protein.

Unfortunately, we did not see appreciable binding from any of the 99 Foldit designs. Below we’ll walk through the details of the experiment, and we’ll also discuss some exciting news about a successful binder designed by IPD scientists.

The experiment

Our binding experiment uses two techniques called yeast display and fluorescence-activated cell sorting (FACS). You can read more about those techniques in a previous blog post.

In short, we put custom DNA into hundreds of thousands of yeast cells, which then display our protein designs on their surface. After mixing our yeast with fluorescent target protein, we can quickly sort through the yeast cells and pick out those that bind to the target.

Figure 1. (A) Schematic of FACS experiment and (B) example scatter plot of fluorescence from a FACS sort. Each point is a yeast cell, with green fluorescence (expression) on the x-axis, and red fluorescence (binding) on the y-axis. Points in the top right corner represent cells with both red and green fluorescence, indicating good expression and binding.

After each sort, we sequence the DNA of just the collected cells (i.e. the cells that showed expression and binding signal). These DNA sequences can be mapped back to the protein designs that were displayed on the yeast cells.

We count how many times we read each design in the sequencing data. A high number of sequencing counts for a design means that many yeast cells displaying that design were collected, which indicates a successful binder.
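To make the counting step concrete, here is a minimal sketch in Python. It is not the pipeline we actually ran: the sequences, the design names, and the exact-match lookup are all made up for illustration, and a real analysis would also have to deal with read quality and sequencing errors.

```python
from collections import Counter

def count_designs(reads, design_by_sequence):
    """Count how many sequencing reads exactly match each known design.

    `design_by_sequence` maps a DNA sequence to a design name. Both the
    mapping and the exact-match rule are simplifying assumptions.
    """
    # Start every design at zero so designs with no reads still appear.
    counts = Counter({name: 0 for name in design_by_sequence.values()})
    for read in reads:
        name = design_by_sequence.get(read.strip().upper())
        if name is not None:  # reads that match no design are ignored
            counts[name] += 1
    return counts

# Toy data (hypothetical sequences and names)
design_by_sequence = {"ATGGCT": "design_A", "ATGCCC": "design_B"}
reads = ["ATGGCT", "atggct", "ATGCCC", "GGGGGG", "ATGGCT\n"]

counts = count_designs(reads, design_by_sequence)
for name, n in counts.most_common():
    print(name, n)
```

In this toy example, design_A is read three times and design_B once, and the one read that matches no design is dropped.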

The data

Below is a preview of the data. You can download the data for all 99 designs here.

pdb_id       	counts1	counts2	counts3*	counts4	counts5	counts6	BUNS	DDG    	SASA    	SC
2008926_c0022	10     	0      	0       	0      	0      	0      	7   	-33.546	1314.890	0.661
2008926_c0023	21     	1      	0       	0      	0      	0      	8   	-33.030	1391.938	0.663
2008926_c0026	30     	13     	0       	0      	0      	0      	12  	-37.822	1621.635	0.584
2008926_c0034	1073   	2357   	0       	0      	0      	1      	12  	-44.100	1656.985	0.648
2008926_c0036	3      	3      	0       	0      	0      	0      	9   	-46.865	1574.854	0.648
2008926_c0037	590    	4026   	0       	45     	52     	144    	7   	-36.222	1633.888	0.569
2008926_c0040	343    	323    	1       	0      	0      	0      	10  	-35.853	1568.804	0.645
2008926_c0042	57     	199    	0       	0      	0      	0      	6   	-31.511	1407.946	0.490
2008926_c0052	2      	0      	0       	0      	0      	0      	6   	-31.936	1445.994	0.555
...

*Note: There was a sequencing error for sort #3, which is why the counts are mostly zeros in the counts3 column. The counts3 numbers do not represent the actual collected fraction from sort #3, and we should disregard those numbers. Fortunately, since sort #3 was an enrichment sort and we have good data for later sorts, we don’t need those counts to interpret the experiment results.
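If you want to poke at the data yourself, the sketch below shows one way to read the table and skip the unreliable counts3 column. It assumes the downloaded file is laid out like the preview above (whitespace- or tab-separated, same column names). To keep the example self-contained, the nine preview rows are pasted in directly rather than read from a file.

```python
PREVIEW = """\
pdb_id counts1 counts2 counts3* counts4 counts5 counts6 BUNS DDG SASA SC
2008926_c0022 10 0 0 0 0 0 7 -33.546 1314.890 0.661
2008926_c0023 21 1 0 0 0 0 8 -33.030 1391.938 0.663
2008926_c0026 30 13 0 0 0 0 12 -37.822 1621.635 0.584
2008926_c0034 1073 2357 0 0 0 1 12 -44.100 1656.985 0.648
2008926_c0036 3 3 0 0 0 0 9 -46.865 1574.854 0.648
2008926_c0037 590 4026 0 45 52 144 7 -36.222 1633.888 0.569
2008926_c0040 343 323 1 0 0 0 10 -35.853 1568.804 0.645
2008926_c0042 57 199 0 0 0 0 6 -31.511 1407.946 0.490
2008926_c0052 2 0 0 0 0 0 6 -31.936 1445.994 0.555
"""

INTEGER_COLUMNS = {"counts1", "counts2", "counts4", "counts5", "counts6", "BUNS"}

def parse_table(text):
    """Parse the whitespace-separated table into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.rstrip("*") for h in lines[0].split()]  # "counts3*" -> "counts3"
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        row = {"pdb_id": fields[0]}
        for name, value in zip(header[1:], fields[1:]):
            if name == "counts3":
                continue  # sequencing error in sort #3: disregard these counts
            row[name] = int(value) if name in INTEGER_COLUMNS else float(value)
        rows.append(row)
    return rows

rows = parse_table(PREVIEW)

# Total reads per design in the two binding sorts (#5 and #6).
late = {r["pdb_id"]: r["counts5"] + r["counts6"] for r in rows}
for pdb_id, n in sorted(late.items(), key=lambda kv: kv[1], reverse=True):
    if n > 0:
        print(pdb_id, n)
```

Among the preview rows, only 2008926_c0037 has more than a stray read in the final binding sorts (196 reads, against a single read for 2008926_c0034).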

The details

We used a different sorting schedule here than we did in the previous IL6R experiment. In the IL6R experiment, Foldit designs were pooled with a number of IPD designs and were sorted together at the same time. We screened that entire pool against a range of binding conditions (target concentrations from 0.1 to 1000 nM).

In this spike binder experiment, we were able to purify the starting pool so that it was made up almost entirely of Foldit designs. We also took some extra steps to enrich the starting pool, and we only screened against high concentrations of target after enrichment.

Sort schedule

Sort #1: Expression
Sort #2: Enrichment at 1000 nM target
Sort #3: Enrichment at 1000 nM target
Sort #4: Enrichment at 1000 nM target
Sort #5: Binding at 1000 nM target
Sort #6: Binding at 100 nM target

Instead of going directly from the starting pool into binding sorts at different concentrations of target, we first carried out several rounds of enrichment sorting in order to amplify any potential binders. An enrichment sort is very similar to a binding sort: in both, we select yeast cells that show both expression and binding signal. The difference is that the experimental conditions for binding are a little more lenient during an enrichment sort.

The important part of enrichment is that the selected fraction of each enrichment sort provides the input for the following sort. If we do this several times in a row, we can drastically enrich the composition of the pool to favor anything that binds even a little bit. This is a way to increase the presence of any weak binders, and helps to ensure we don’t miss anything that was underrepresented in the starting pool.

Figure 2. Diagram of sort procedure. Each bar represents a pool of cells that undergoes sorting. In sort #1, we collect only cells that show high expression (green fluorescence), and these cells become the input for sort #2. Sorts #2-4 are enrichment sorts which should exponentially increase the presence of any binders in the pool. After enrichment, sorts #5 and #6 screen for cells that show binding signal at different concentrations of target.

For each of the sorts in the figure above, we've also noted the percentage of cells that were collected from the sort. In expression sort #1, we collected cells based only on whether they display any protein on their surface (green fluorescence). In sorts #2-6, we collected cells based on whether they bind to the target (red fluorescence).

If there are any successful binders in the starting pool, their prevalence should increase exponentially during enrichment sorts. After a few rounds of enrichment, successful binders will grow to dominate the pool so that the majority of cells show binding.
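Here is a rough sketch of why that growth is so fast. It is a toy model, not our actual analysis. Suppose a cell displaying a true binder is collected with probability p_binder in each enrichment sort, while any other cell slips through with a small background probability p_background. All of the numbers below are invented for illustration, and the model assumes binders and non-binders regrow equally well between sorts.

```python
def enrich(fraction, p_binder, p_background):
    """Return the binder fraction of the pool after one enrichment sort.

    Binder cells are kept with probability `p_binder`, all other cells with
    probability `p_background`; the collected cells become the next pool.
    """
    kept_binders = fraction * p_binder
    kept_others = (1.0 - fraction) * p_background
    return kept_binders / (kept_binders + kept_others)

# Hypothetical numbers: one good binder among ~100 designs, collected 90% of
# the time, with 2% of non-binding cells collected by chance.
fractions = [0.01]
for _ in range(3):  # three enrichment sorts, as in sorts #2-4
    fractions.append(enrich(fractions[-1], p_binder=0.9, p_background=0.02))

for sort_round, f in enumerate(fractions):
    print(f"after {sort_round} enrichment sorts: {f:.1%} binders")
```

In this model, each sort multiplies the odds of binder to non-binder by p_binder / p_background (45 with the numbers above), so even a binder that starts at 1% of the pool would make up nearly all of it after three rounds.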

Unfortunately, after three rounds of enrichment, we still see that <5% of cells show any binding signal at 1000 nM target concentration. This is a clear sign that nothing in the pool binds significantly at 1000 nM target ("easy" binding conditions).

Figure 3. FACS data for Foldit spike binders. Each point represents a single yeast cell displaying a Foldit binder on its surface. The x-axis is intensity of green fluorescence (how much binder is expressed on the cell surface) and the y-axis is red fluorescence (how much target is bound at the cell surface). If there were any successful binders in the pool, we would expect to see a large population in the top right corner of each plot.

Looking at the sequencing counts, we see that a handful of designs did become more prominent during enrichment and show up consistently in the final binding sorts. This does indicate that these designs tend to stick to the target somewhat more than other designs in the pool. However, these low numbers are consistent with what we could expect from non-specific binding by unfolded proteins, or from very weak binding. It is unlikely these designs are folding and sticking to the target as intended, and we cannot expect to improve them by optimization.

A successful IPD-designed binder

In separate news, scientists at the IPD have successfully designed a binder for the coronavirus spike protein! This result was recently posted as an online preprint (meaning the paper has not yet been peer-reviewed).

Rather than design individual proteins by hand, the IPD scientists used supercomputers to automatically generate millions of designs, then checked whether the designs had good binder metrics. Over 90% of the designs were thrown out because they didn’t meet binder metric criteria. The best designs were then tested for binding using the same kinds of FACS experiments we used to test Foldit designs.

Note that this design strategy is not very efficient and requires heavy computational resources. From the millions of initial designs and the 100,000 that were tested, the researchers found only about 100 designs that showed any binding in the lab.

Afterward, scientists did some additional optimization on the best binders, trying all different mutations at every site on the protein. The final optimized designs can bind to the coronavirus spike extremely tightly--even more tightly than natural antibodies!

Lab tests showed that the binders can stop live virus from infecting human cells in a test tube, but these binders still need to be tested in animals before they can be considered drug candidates for clinical trials.

Figure 4. Coronavirus spike binder designed by IPD scientists. On the left, the designed protein binder LCB1 sits at the receptor binding domain (RBD) of the coronavirus spike protein. On the right, lab tests show that this protein (pink trace) is a potent inhibitor of viral infection in human cell culture. Further tests are needed to determine efficacy and safety in whole organisms.

What does this mean for Foldit?

This binder from IPD scientists is great news, and these results help to outline the future direction of binder design in Foldit.

First, the scientists’ method gives us more confidence in Foldit design tools. The automated design methods use the same score function that is used to calculate your Foldit score. And the researchers selected designs using the same binder metrics we've discussed previously (DDG, SASA, and shape complementarity).

But the strategy of the IPD scientists has some shortcomings. Although these automated methods worked great against the coronavirus spike protein, there are many other binder targets that are poorly suited for this approach. The automated methods work almost exclusively with small 3-helix bundle designs. Other binder targets have convex shapes that aren’t so compatible with a 3-helix bundle fold, or they have protrusions that require special attention. Some binder targets are covered with polar residues that are extremely difficult to satisfy using automatic design.

Those hard problems, where our algorithms fail, are precisely the problems where we think Foldit can excel. We’re looking forward to challenging Foldit players with those tricky problems, and we can get started once we’ve fully integrated the binder metrics into Foldit (we’re almost there -- we appreciate your patience!).

In the meantime, we’ve created a sandbox (non-scoring) puzzle so you can explore the IPD binder in Foldit. Check out the LCB1 Coronavirus Spike Binder puzzle, and get ready for binder metrics to come back in future puzzles!

nspc Lv 1

August 31, 2020

Awesome news!

It is very interesting to see the protein in Foldit.
The SASA is more than 1900, and the contact zone is a flat zone that is very interesting to fit with a triple helix.

The strange thing is the BUNS, which is very high (30!). Maybe SASA is more important than BUNS?

We learn a lot from a working protein, it is very cool.

agcohn821 Staff Lv 1

August 31, 2020

Hey Foldit players! We’ll be discussing BUNS further at the next Office Hour. Day/time TBD–stay tuned!

bkoep Staff Lv 1

September 02, 2020

Our BUNS Objective uses some serious approximations to improve the speed of BUNS calculations. These BUNS are almost certainly false positives.

All of the data we have from high-resolution crystal structures suggests that BUNS are extremely rare in natural proteins. We think that BUNS are very important, at least in principle; but maybe you're right that we should pay less attention to the approximated BUNS in our models.

Since we are already planning to add support for long-running metrics like SASA, it may be worth it to use a more accurate (but slower) algorithm for BUNS in the future.

aofreelancer Lv 1

September 04, 2020

WOW, I ALMOST LOST HOPE THAT YOU WOULD DO THOSE TESTS!
TEST OTHER DESIGNS, AT LEAST 1000 DESIGNS MUST BE TESTED!