CACHE #3 and CASP Results Posted

Started by rmoretti

rmoretti Staff Lv 1

CACHE #3

Conscience, the organization behind the CACHE Challenge, has now posted results for the third round, which targeted the SARS-CoV-2 Nsp3 macrodomain.

The Foldit Group (Group number 1690) was unfortunately not one of the groups specifically identified as performing well. Although we found the second-highest number of compounds in the first "hit identification" round, the organizers considered all of those compounds to be highly similar to compounds already identified. The "winners" of CACHE round 3 were deemed to be the groups that found active molecules dissimilar to any of the known molecules.

This was actually a rather hard challenge. While most of the first round hits from all groups were considered to be dissimilar, most were not active. Indeed, only four dissimilar active compounds were identified (one each from groups that had only a single active prediction each).

That said, I (rmoretti) feel somewhat remiss in not fully appreciating the emphasis that was placed on structural novelty. In retrospect, we likely should have adjusted our puzzle setups and post-processing approaches to more heavily favor novel compounds. Similarity to the starting molecule for identified actives was also a theme with the prior CACHE challenge, and we're currently brainstorming methods & possible new tools to better support scaffold diversification.
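To make "favoring novel compounds in post-processing" a bit more concrete, here is a minimal sketch of a similarity filter. It is only an illustration: the fingerprint sets, the compound names, and the 0.4 cutoff are all made up for the example, and this is not the metric or threshold the CACHE organizers used, nor an existing Foldit tool. It treats each molecule as a set of fingerprint features and keeps only candidates whose Tanimoto similarity to every known binder falls below the cutoff.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def max_similarity_to_known(candidate: frozenset, known: list) -> float:
    """Highest similarity between a candidate and any known binder."""
    return max((tanimoto(candidate, k) for k in known), default=0.0)


def filter_novel(candidates: dict, known: list, threshold: float = 0.4) -> list:
    """Return names of candidates less similar than `threshold` to all known binders.

    The 0.4 default is an arbitrary illustrative cutoff, not an official value.
    """
    return [
        name
        for name, fp in candidates.items()
        if max_similarity_to_known(fp, known) < threshold
    ]


# Hypothetical fingerprints: integers stand in for substructure features.
known_binders = [frozenset({1, 2, 3, 4})]
candidates = {
    "close_analog": frozenset({1, 2, 3, 5}),  # shares 3 of 5 features -> 0.6
    "new_scaffold": frozenset({1, 7, 8, 9}),  # shares 1 of 7 features -> ~0.14
}

print(filter_novel(candidates, known_binders))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and a filter like this would be combined with the docking score rather than replacing it.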

CASP

Recently the CASP organizers have also released the ligand results for CASP16 (see the blog post announcing Foldit's participation). Be sure to click the L2000/L3000/etc. tabs to get further information. We participated as group G201 (Drugit). Unfortunately, the CASP results page isn't quite suited to summarizing results for these sorts of ligand puzzles (edit: there's a graph page; higher numbers are better in both graphs), but the short version is that Foldit's results are generally middle of the pack. There are some ligands we did respectably on (RMSD < 2.0 Å) but some that we just missed. (And for ligands where we did well, other groups also tended to do well.)
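For anyone unfamiliar with the RMSD cutoff mentioned above, here is a minimal sketch of how a ligand pose RMSD can be computed. It assumes the predicted and reference ligands list the same atoms in the same order and are already in the same protein frame; it does no superposition and no symmetry correction, so it is not necessarily how the CASP assessors compute their numbers. The coordinates are made up.

```python
import math


def ligand_rmsd(pred, ref):
    """Root-mean-square deviation (in the input units, e.g. Angstroms)
    between two equally ordered lists of (x, y, z) atom coordinates."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (p[i] - r[i]) ** 2
        for p, r in zip(pred, ref)
        for i in range(3)
    )
    return math.sqrt(squared / len(pred))


# Hypothetical three-atom ligand: the reference pose, and two predictions
# that are the reference shifted by 1 A and by 3 A along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_miss = [(x + 1.0, y, z) for x, y, z in reference]
far_off = [(x + 3.0, y, z) for x, y, z in reference]

for name, pose in [("near_miss", near_miss), ("far_off", far_off)]:
    rmsd = ligand_rmsd(pose, reference)
    print(name, round(rmsd, 2), "within 2.0 A" if rmsd < 2.0 else "outside 2.0 A")
```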

Once the final bound structures of all the models are released, we'll be doing some post-analysis on what might be missing. In particular, we'll examine how well Foldit scores the actual ligand position: if Foldit has difficulty returning high scores for certain ligands in certain conformations, then you as players have very little hope of figuring out where the molecules dock. (Or at least we wouldn't have been able to pick out that dock as one to submit.)

bravosk8erboy Lv 1

It sounds like they really wanted new compounds. Does that mean that anything we found in the compound library really wasn't worth looking at? Or do they just not want to see the same compounds that were submitted in previous rounds by our team/all teams?

I was also very confused about whether a specific binding site of the protein was supposed to be used. For myself, I found one site usually gave me the best scores in Foldit, so I usually only looked there. Is it possible some binding sites were never checked?

Final question: do I have to upload all my ligand designs to scientists, or will Foldit do it automatically? Is that system based on score or something?

rmoretti Staff Lv 1

By new compounds, they're talking about compounds which bind, which weren't previously known (from the published scientific literature) to bind, and which aren't similar to compounds which were known to bind. By that criterion, the vast majority of the compound library doesn't count as "known" compounds. (They're compounds which are known to possibly exist, but they're not compounds which are known to bind to Nsp3.)

This novelty criterion was also limited to the results in the first "hit identification" rounds (the "SARS-CoV-2 Nsp3 CACHE Challenge 3" rounds). For the "hit optimization/validation" rounds (the "CACHE SARS Nsp3 followup" rounds), they explicitly only want compounds which are similar to the compounds identified by the Foldit group in the "hit identification" rounds (so no totally new compounds).

For the binding pocket, that was left unspecified. You needed to be in the active site of the enzyme well enough to inhibit the activity of the enzyme, but where exactly that would be is an open question. There are a number of different sub-pockets which we started molecules in for the initial rounds, and some of those may indeed have worked better than the others. It could be that Foldit scores things such that one pocket is favored and another (one which might have been a better option) isn't scored as well. The CACHE organizers are determining crystal structures of the top-scoring molecules (including some Foldit compounds!), so once those are officially released, we'll have a better sense of how the novel compounds are binding. (Though looking at the structures, I imagine they're more or less in the same location as the hits Foldit came up with, so it's likely not a binding site issue.)

When playing online, the Foldit client does regularly upload to the Foldit server the best scoring structure (by top-line score) from the past ~5 minutes or so. For small molecule design puzzles, we do consider those auto-upload structures. So you don't necessarily have to manually share your designs in order for them to be recorded. However, if there's a particular structure you think is interesting, particularly if it's not the best scoring structure you've been working with recently (or if you've been working offline or are about to close the client), it may be worth sharing with scientists, just to make sure it gets counted. But don't feel the need to do so.
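As a rough picture of why a structure that isn't your overall best can still end up on the server, here is a toy sketch of "best by score within each recent time window". This is not the actual Foldit client logic: the fixed 300-second windows, the function name, and the example history are all assumptions made for illustration.

```python
def best_per_window(history, window_s=300):
    """Pick the highest-scoring structure in each fixed time window.

    history: list of (timestamp_seconds, score, name) tuples.
    Returns the chosen names in chronological window order.
    """
    best = {}
    for timestamp, score, name in history:
        window = int(timestamp // window_s)
        if window not in best or score > best[window][0]:
            best[window] = (score, name)
    return [best[window][1] for window in sorted(best)]


# Hypothetical session: the player switches from ligand A to a lower-scoring
# ligand B after the first five minutes.
history = [
    (10, 9000.0, "ligand_A_v1"),
    (200, 9100.0, "ligand_A_v2"),
    (400, 8700.0, "ligand_B_v1"),
    (550, 8650.0, "ligand_B_v2"),
]

print(best_per_window(history))
```

In this toy session, ligand_B_v1 is picked for the second window even though it never beats ligand A's score, which is the behavior described above for "recent best" structures.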

Bruno Kestemont Lv 1

Just to be sure I understand correctly: I usually share with scientists my best result for a specific second-best ligand when I don't expect it to improve any further. However, it sometimes happens that I'm AFK before the puzzle expires, and I haven't had time to share the latest second-best clients/ligands still in progress locally.
Do you mean that the server is able to know about them even if I never shared them?

rmoretti Staff Lv 1

Yes, if you have an internet connection to the server, there's an occasional update to the server regarding your recent best structures (even if they aren't the overall best). That should be counted even if things cross puzzle expiration.

jeff101 Lv 1

A book of abstracts for CASP16 has been posted at https://predictioncenter.org/casp16/doc/CASP16_Abstracts.pdf

Foldit/Drugit appears on pp.83-84. Abstracts for other groups doing protein-ligand binding are also listed. Among these groups are arosko (pp.23-24), ClusPro and Kozakov/Vajda (pp.41-42), DeepFold-Interact (pp.72-74), DIMAIO (pp.79-80), GruLab; Convex-PL-R; KORP-PL-W (p.100), Haiping (pp.114-115), Huang-HUST (Lig) (pp.116-117), isyslab-hust (LIG) (p.125), Koes (pp.132-134), KUMC (pp.135-136), LCDD-Team (pp.140-141), McGuffin (pp.149-152), Pfaender (p.203), PocketTracer (pp.211-212), Schneidman (pp.220-221), SNU-CHEM-lig, SNU-CHEM-aff (Ligand) (pp.225-227), VoroAffinity, VoroAffinityB (p.242), Zou_lab (pp.257-259), and CoDock (pp.273-274).

Some groups used a mix of computer tools & human intuition. Among these are CSSB-Human (TS, Assembly) (pp.65-66) and GromihaLab (pp.97-99).

jeff101 Lv 1

For CACHE #2 & #3 and CASP16, it will be neat to see the experimentally determined ligand-bound protein structures for all the ligands they tested. I'm guessing that for Foldit to find these structures more effectively, some changes to Foldit's scoring function are needed. This might mean including bonuses for objectives like BUNS and hydrogen-bond clusters, like we once had for protein dimer, trimer, and other multimeric complexes. It might also mean better accounting for interactions between the protein, ligand, and halogen atoms (-F -Cl -Br -I).

For example, see the article "Principles and Applications of Halogen Bonding in Medicinal Chemistry and Chemical Biology" by Rainer Wilcken et al., Journal of Medicinal Chemistry 2013, 56, 4, 1363-1388 (https://dx.doi.org/10.1021/jm3012068). From it, it seems that if Rosetta/Foldit included a hydrogen-bond-like term for halogen atoms about 3 Angstroms away from protein-backbone carbonyl oxygens, then Rosetta/Foldit could treat one of the most common ligand-protein halogen bonds. Pages 1379-1383 of this article also cite a number of "new" (in 2012) algorithms and forcefields for halogen bonding working their way through the literature.
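To illustrate the kind of geometric test such a term might start from, here is a minimal sketch that flags a possible halogen bond from three atom positions: the carbon bonded to the halogen (C), the halogen (X), and a backbone carbonyl oxygen (O). It checks that the X...O distance is short and that the C-X...O angle is close to linear. The 3.5 Angstrom and 140 degree cutoffs are illustrative assumptions, not values from Rosetta, Foldit, or the article, and a real scoring term would be a smooth energy function rather than a yes/no test.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def halogen_bond_geometry(c, x, o):
    """Return (X...O distance, C-X...O angle in degrees) for three (x, y, z) points."""
    x_to_c = _sub(c, x)
    x_to_o = _sub(o, x)
    dist = _norm(x_to_o)
    cos_angle = sum(p * q for p, q in zip(x_to_c, x_to_o)) / (_norm(x_to_c) * dist)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return dist, math.degrees(math.acos(cos_angle))


def is_halogen_bond(c, x, o, max_dist=3.5, min_angle=140.0):
    """Illustrative test: short X...O contact with a near-linear C-X...O angle."""
    dist, angle = halogen_bond_geometry(c, x, o)
    return dist <= max_dist and angle >= min_angle


# Made-up coordinates: a C-X bond along the x axis.
carbon = (0.0, 0.0, 0.0)
halogen = (1.8, 0.0, 0.0)

linear_o = (4.8, 0.0, 0.0)   # 3.0 A from X, straight ahead
side_on_o = (1.8, 3.0, 0.0)  # 3.0 A from X, but at 90 degrees
distant_o = (6.8, 0.0, 0.0)  # straight ahead, but 5.0 A away

for label, oxygen in [("linear", linear_o), ("side-on", side_on_o), ("distant", distant_o)]:
    print(label, is_halogen_bond(carbon, halogen, oxygen))
```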

jeff101 Lv 1

If certain groups in CACHE or CASP16 did really well at ranking binding affinities for ligands with halogens, it would be good to know how they did this. Did they use forcefields that could be incorporated into Foldit?