Remember CASP? It’s back … in ligand form.

Started by rmoretti

rmoretti Staff Lv 1

CASP16 is now live! CASP (the Critical Assessment of Structure Prediction) is a biennial competition to determine the state of the art in protein structure prediction. It was at CASP14 in 2020 that AlphaFold2 first made waves as an artificial intelligence tool that “solved” protein folding.

“Solved” is in quotes here because as good as AlphaFold is at some tasks, there are a number of other structure prediction problems where it doesn’t work well. Since CASP14, the CASP organizers have increasingly focused on those harder tasks. CASP16 has 7 different prediction categories, many focusing on these “unsolved” tasks. Determining protein-ligand interactions is one of these categories.

In particular, for CASP16 the CASP organizers have obtained the data for a number of drug-design-based “super targets”. Several sets of these super targets will specifically involve ligands that are of high relevance to pharmaceutical research, offering participants a chance to work on problems directly related to therapeutic development. These are systems where a pharmaceutical company has solved the crystal structure of a single protein when it is bound to a wide range of drug-like small molecules, and has generously shared all those structures with the CASP organizers. The goal for the CASP competition is to predict how each small molecule binds to the protein target. CASP participants will get the protein target and the list of small molecules, and will be asked to submit the structure of each ligand bound to the protein.

We felt that this drug-design related task would be an excellent opportunity for the Foldit community to participate in CASP, so today we’re launching a series of puzzles on the CASP targets!

These puzzles are slightly different from the regular small molecule design puzzles we’ve recently been posting (including the CACHE challenge puzzles). Instead of an open design where novel structures will be tested, the CASP competition has a pre-specified list of compounds where they’re interested in knowing the (already determined) structure. The goal isn’t to find which compounds are best, but rather to determine how each small molecule binds.

This is potentially a slight challenge for the current setup of Foldit puzzles - instead of the results being a single top scoring solution per puzzle, what we’re looking for is multiple structures for each puzzle - one per compound identity. As such, if you want to be competitive for the CASP challenge, it’s important to spend time improving each different compound, and not just chase the single best compound. (Due to the number of ligands involved, it’s not possible to have independent puzzles for each compound.) We’ll take all the solutions that make it to the server, break them out by small molecule, and pick the structures to submit from the top-scoring solutions for each ligand.
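To make the per-ligand selection concrete, here is a minimal Python sketch of grouping solutions by compound and keeping the best one for each. This is only an illustration, not the actual pipeline: the `ligand_id`, `score`, and `player` fields, the function name, and the example values are all hypothetical, and it assumes a higher score is better.

```python
from collections import defaultdict


def top_solutions_per_ligand(solutions, n_per_ligand=1):
    """Group solutions by ligand and keep the best-scoring ones for each.

    Assumption: each solution is a dict with hypothetical keys
    "ligand_id" and "score", and a higher score is better.
    """
    by_ligand = defaultdict(list)
    for sol in solutions:
        by_ligand[sol["ligand_id"]].append(sol)

    # Sort each ligand's solutions from best to worst and keep the top few.
    return {
        ligand: sorted(group, key=lambda s: s["score"], reverse=True)[:n_per_ligand]
        for ligand, group in by_ligand.items()
    }


# Made-up example data: three compounds, five solutions.
solutions = [
    {"ligand_id": "A", "score": 9100.0, "player": "p1"},
    {"ligand_id": "A", "score": 9350.0, "player": "p2"},
    {"ligand_id": "B", "score": 8700.0, "player": "p1"},
    {"ligand_id": "C", "score": 8200.0, "player": "p3"},
    {"ligand_id": "B", "score": 8650.0, "player": "p3"},
]

picked = top_solutions_per_ligand(solutions)
for ligand, best in sorted(picked.items()):
    print(ligand, best[0]["score"], best[0]["player"])
```

Note that compound C still gets a pick even though its best score is lower than every solution for A. That is the point of working on every compound rather than only the highest-scoring one.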

We realize that the current setup might not be the best for this sort of puzzle. We’re looking into ways of improving the Foldit client to present such compounds in a more convenient fashion. But due to the CASP timeframe, we wanted to start the puzzles as soon as we could.

Please see the puzzle descriptions for each puzzle to get more information about the systems and how we’re asking you to sample the compounds. And make sure to ask any questions you have in the puzzle comments for each CASP puzzle. As mentioned, the more different compounds you’re able to work with, the more comprehensive and better the results we'll be able to submit.

jeff101 Lv 1

It looks like some CASP16 results have been posted. Our group is probably G201 DrugIt.
https://predictioncenter.org/casp16/ligand_results.cgi?target=L3001&phase=2
https://predictioncenter.org/casp16/ligand_results.cgi?target=L3002&phase=2
https://predictioncenter.org/casp16/ligand_results.cgi?target=L5001v1&phase=4
Each of the above gives a table with many rows for solutions & many columns for various metrics.
If you click an arrow at the top of a column, it re-orders the table's data based on that column.
I don't know what it all means right now. I also don't know if there are more tables listing DrugIt results.

rmoretti Staff Lv 1

RMSD is a lower-better metric. LDDT is a higher-better metric. (LDDT_pli is the protein-ligand interaction metric, whereas the LDDT_lp is the ligand pocket metric.) Kendall's tau is a higher-better metric.
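For anyone wondering why RMSD is lower-better, here is a small Python sketch of a plain coordinate RMSD between a predicted and a reference ligand pose. It assumes the atoms are already matched one-to-one and both poses are in the same coordinate frame; the scoring used in the CASP tables is likely more involved than this, so treat it as an illustration of the idea only. The function name and coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: atom i in coords_a corresponds to atom i in coords_b, and
    both sets are already in the same frame (no superposition is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")

    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up poses: the model is the reference shifted by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model = [(x, y, z + 2.0) for x, y, z in reference]

print(rmsd(reference, reference))  # identical poses
print(rmsd(reference, model))      # every atom is off by the same distance
```

An identical pose gives zero, and the value grows as the predicted atoms move away from the reference positions, which is why smaller is better.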

So in the graphs, it's better to be toward the left side. (For the Affinity prediction, our Kendall's tau is actually negative, which means we would have been better off had we reversed the order of the rankings.)
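To illustrate the point about a negative Kendall's tau, here is a small pure-Python sketch of the simplest form of the statistic (tau-a, which ignores tie corrections). The affinity numbers are invented for illustration and are not our CASP results, and the assessors may use a tie-corrected variant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Assumption: no tie correction is applied; tied pairs count as neither
    concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both sequences order this pair the same way
        elif sign < 0:
            discordant += 1  # the sequences disagree on this pair

    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Invented example: measured affinities vs. a mostly backwards prediction.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [3.5, 4.0, 2.0, 1.0]

tau = kendall_tau(measured, predicted)
tau_flipped = kendall_tau(measured, [-p for p in predicted])
print(tau, tau_flipped)
```

Flipping the predicted order swaps every concordant pair with a discordant one, so the tau simply changes sign. A negative tau therefore means the reversed ranking would have scored positive by the same amount.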