SARS Nsp3 CACHE Challenge preliminary results

Started by rmoretti

rmoretti Staff Lv 1

We have more preliminary results to share from the CACHE challenge!

The CACHE Challenge

Earlier this year we launched a puzzle series as part of the CACHE Challenge. CACHE is like CASP for small molecule drug design – it’s an independent, blind prediction task to see if computationalists can do a good job of predicting which small molecules can bind to a given protein structure. We've previously worked on CACHE Challenge #2 (SARS Helicase), and we're still awaiting results on the follow-up of that effort. This next Foldit CACHE puzzle series is for CACHE Challenge #3.

Since the initial puzzle series ended, we've taken all the designs which Foldit players created in the five rounds and filtered them by various quality metrics, as well as checking that they were part of the compound library. Approximately 2000 compounds made the cut, and these were redocked into the protein to make sure that the designed binding mode was specific. From the redocking results, we ranked the compounds and selected ~100 of them to send to the CACHE organizers for ordering and testing.
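For readers who think in code, the selection steps described above (filter, redock, rank, take the top ~100) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Design` fields, the idea that a lower redocking score is better, and the function name are all hypothetical and are not taken from the actual Foldit post-processing scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One player design. All fields are hypothetical placeholders."""
    name: str
    passes_quality: bool   # assumed: result of the quality-metric filters
    in_library: bool       # assumed: compound exists in the purchasable library
    redock_score: float    # assumed: lower means the designed pose redocks better


def select_for_submission(designs, n_submit=100):
    """Filter designs, rank the survivors by redocking score, keep the best n."""
    # Step 1: keep only designs that pass quality checks and are in the library.
    kept = [d for d in designs if d.passes_quality and d.in_library]
    # Step 2: rank by the (assumed) redocking score, best first.
    ranked = sorted(kept, key=lambda d: d.redock_score)
    # Step 3: take the top n for submission.
    return ranked[:n_submit]


if __name__ == "__main__":
    toy = [
        Design("design_a", True, True, -7.0),
        Design("design_b", True, False, -9.0),   # dropped: not in the library
        Design("design_c", True, True, -8.5),
        Design("design_d", False, True, -10.0),  # dropped: fails quality checks
    ]
    for d in select_for_submission(toy, n_submit=2):
        print(d.name, d.redock_score)
```

In the real effort the filter kept roughly 2000 compounds and the final cut was ~100; the toy list here just shows the order of operations.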

These ~100 compounds were then combined with the rest of the molecules submitted by the other ~22 participants, and those ~1700 compounds were the basis of the recent CACHE SARS Nsp3 Reranking challenge puzzle series (the results of that effort are still to be determined).

Results

Since then CACHE has ordered and tested the compounds. Due to the difficulties of chemical synthesis, not all the compounds which we submitted were able to be tested, but 81 of them were. The CACHE organizers and their collaborators did a number of different types of assays to make sure that the compounds were binding to the protein, weren’t binding nonspecifically, and didn’t have odd aggregation or other such interference.

And Foldit players designed two compounds which passed this initial screen! Congratulations to Sandrix72 and ucad for their designs. (If you'd like to be mentioned in future Foldit blog posts and papers, you can go to https://fold.it/profile/edit or click the gear icon in the upper right when logged in to change the “Foldit can share my username” setting.) Both successful compounds were from Round 2, though they weren’t the top scoring compounds from that round.

Hit 1; SMILES CC(CN(C)c1c2c3cc(ccc3[nH]c2ncn1)[Br])C#N

Hit 2; SMILES Cc1c2c(NC(CC(C)(C)C)=O)ncnc2[nH]c1C

Foldit did relatively well in its submission - only 6 out of 23 groups had compounds selected for being advanced to the next phase, and only 12 compounds in total were advanced. In the initial screens, both compounds have an affinity of around 40 µM, which is reasonable but not fantastic. It's possible one of the derivatives found will be better!
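To put those numbers in perspective, here is a small back-of-the-envelope sketch. It assumes the quoted ~40 µM affinity can be treated as a dissociation constant (Kd) at room temperature with a 1 M standard state, which the post does not state explicitly; the function names are made up for this example.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def fraction(part, whole):
    """Simple ratio, e.g. groups advanced out of groups participating."""
    return part / whole


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy in kcal/mol from Kd: dG = R * T * ln(Kd).

    Assumes a 1 M standard state, so Kd is passed in mol/L.
    """
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


if __name__ == "__main__":
    # 6 of 23 groups had compounds advanced to the next phase.
    print(f"groups advancing: {fraction(6, 23):.1%}")
    # 2 of the 81 tested Foldit compounds passed the initial screen.
    print(f"Foldit hit rate:  {fraction(2, 81):.1%}")
    # ~40 µM affinity, treated here as a Kd (an assumption).
    print(f"dG at 40 µM:      {binding_free_energy(40e-6):.1f} kcal/mol")
```

Under those assumptions 40 µM works out to roughly -6 kcal/mol of binding free energy, and each tenfold improvement in affinity is worth about another 1.4 kcal/mol.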

On to the next phase

Since we have compounds which passed preliminary screening, we’ve been invited to participate in the next phase! In this phase, we’re asked to explore the “structure activity relationship” of the compounds we had success with. That is, can we find compounds which are similar to the compounds we submitted, but which have better binding affinity?

Similar to the first phase, only compounds which are in the compound library will be considered. Additionally, we need to submit compounds which are “close enough” to our hit compounds. There isn’t a hard threshold on this, but the intent is to make the hit compounds better, rather than come up with completely novel compounds. Also keep in mind that we don’t have experimental structures of the protein-ligand complex, so the starting location of the compound may not be where or how it actually binds.

UPDATE – Additional compounds!

The CACHE organizers have gotten back to us with good news. It turns out that Foldit players designed four additional compounds with potential activity! These compounds weren't detectable in the initial round of screening because they likely had solubility/aggregation issues under the original assay conditions. When the experimental conditions were adjusted, these additional compounds were discovered. (I should note that the Foldit group is unlikely to be alone in getting additional compounds. While the CACHE organizers haven't mentioned details, other groups have likely also picked up additional compounds.)

Hit 3; SMILES OC(CNC=1N=CN=C2NC=3C=CC(Br)=CC3C12)CC#C Round 2

Hit 4; SMILES CC=1NC=2N=CN=C(NCC3NC(=O)CC3(C)C)C2C1C Round 3

Hit 5; SMILES CC=1NC=2N=CN=C(NC3CCCC3(C)C)C2C1C Round 3

Hit 6; SMILES CCNC(=O)[C@@H](NC=1N=CN=C2NC=C(CC)C12)C(C)C Round 2

Congratulations to nspc and Bruno Kestemont for coming up with these compounds!

Since we have more compounds, we're running a few extra weeks of CACHE #3 puzzles for some of these new compounds. We hope that these new starting points can let you explore new areas of structure/activity space, finding additional new potential compounds.

Participation in CACHE puzzles is subject to the CACHE Terms of Participation, in particular “the Challenge IP [including Challenge Compounds] will be made freely available in the public domain pursuant to Creative Commons Attribution Only (CC-BY 4.0 or subsequent versions) licensing terms, with the intent that such Challenge IP may be Used and practiced by Users for any purpose”.

jeff101 Lv 1

Wow! Congratulations to nspc and Bruno Kestemont (as well as Sandrix72 and ucad) for all their successful designs. This is all very good news.

Assuming nothing else changes, how many more Foldit CACHE #3 puzzles do you plan to have? Will there be one new puzzle per new hit, giving 4 new puzzles (Puzzle 2401 plus three more)? Will each one of these new puzzles let us design things based on any of the 6 hits shown above? I'd imagine many players have ideas based on hits 1 & 2 that they would still like to explore (I know I do).

spvincent Lv 1

Can we get a sense of which compounds, despite maybe being high scoring, don't work? There are doubtless too many to list individually but maybe some generalisations are possible.

jeff101 Lv 1

I'm curious what puzzles each hit came from and what scores they got in those puzzles.

Also, based on Foldit's hits so far for CACHE Challenges #2 and #3, do you have a sense for how Foldit's scoring function could be changed to better predict what will be hits?

nspc Lv 1

In Foldit the measurement is mostly for when the ligand is already attached. But that doesn't measure the path it takes to become attached.

There are also criteria in the CACHE Challenge that we do not have in Foldit, which eliminated certain ligands, if I understood correctly.

Yes, it seems that small, simple ligands that don't score that many points seem to work. So should we make more of this kind and share more?

I'm also curious to know which of the 4 ligands is whose. I think I know which one is mine, but out of the hundred that I've shared, I don't really remember.

rmoretti Staff Lv 1

@jeff101 Due to submission deadlines, we're likely not going to be able to run all 4 additional ligands. We're anticipating probably having time for ~3 extra puzzles (inclusive of the one currently running).

As you may be able to tell from the images, the hit compounds are rather similar to each other. The puzzle setup should be such that you'll be able to work with any of the compounds, and the similarity objective will kick in for whichever is "closest". Feel free to mix and match ideas from any of the rounds – the starting structure is mainly just a jumping off point.
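As a rough picture of what "whichever is closest" could mean, here is a minimal sketch that scores a candidate against several hits and keeps the most similar one. The post does not say which similarity measure the puzzle uses, so the Tanimoto coefficient over feature sets is an assumption, and the integer "fingerprints" below are toy placeholders rather than real fingerprints of the hit compounds.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def closest_hit(candidate, hits):
    """Return (hit_name, similarity) for the hit most similar to the candidate.

    `hits` maps a hit name to its feature set; both are hypothetical inputs.
    """
    best = max(hits, key=lambda name: tanimoto(candidate, hits[name]))
    return best, tanimoto(candidate, hits[best])


if __name__ == "__main__":
    # Toy feature sets, NOT derived from the real hit compounds.
    toy_hits = {
        "toy hit A": {1, 2, 3, 4},
        "toy hit B": {3, 4, 5, 6},
    }
    candidate = {1, 2, 3, 7}
    name, score = closest_hit(candidate, toy_hits)
    print(name, round(score, 2))
```

The point is only that a design gets compared against every hit and judged against its best match, so ideas borrowed from any round can still count as "close enough" to one of the hits.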

I've added info about which rounds the hits came from. Pulling out scores might be a bit more tricky. I'll see if I can get around to it.

Using the results of CACHE to better the Foldit scoring is actually one of the main things I'm interested in looking at going forward. I hope that once the results of CACHE are released, I can use some of the ideas from other participants to help improve compound selection. If there's anything in there which would lend itself to incorporation into the client itself, that would be ideal. (Having the working compounds not being the high-scoring compounds isn't really what we want. Ideally, we would want to just be able to submit the top-scoring compounds and have those be the ones which work.)

@spvincent That's a good idea. We don't have info from other groups, but we do know which Foldit-submitted compounds didn't make the cut – I'll see if I can put together something that might be useful.

@nspc Your compound was Hit 6. (Hits 3, 4 & 5 all came from Bruno.)

We tried our best to have the objectives in Foldit match the CACHE criteria. I don't think there were any major ligand-type considerations which caused a ligand not to be tested. The main filtering criteria were whether it could be purchased from the supplier, and whether it redocked well into the protein at the designed location. Now that we're getting the results back from CACHE, we intend to do a bit more extensive analysis of the post-processing and scoring, and try to figure out how best to change things to optimize results and have the Foldit score better reflect the chance of things working.

jeff101 Lv 1

https://cache-challenge.org/cache-news has a May 9, 2024 post about CACHE Challenge #3 that says:

9 participants to CACHE3 had their predicted SARS-CoV2 NSP3 hit(s) experimentally confirmed in Round1. They each selected up to 50 analogs to generate SAR. We just received 296 compounds from Enamine. CACHE3 Round2 experimental screening is about to begin.

The schedule on the site below says they will share experimental data for this challenge July 1, 2024:
https://cache-challenge.org/challenges/Finding-ligands-targeting-the-macrodomain-of-sars-cov-2-nsp3

Elfi Lv 1

@Floddi recommended this story to me as an example that some of the molecules that didn’t have the highest score in Foldit worked in the actual lab anyway.

I went to the CACHE Challenge's site as I was interested in seeing if the results had come in. I see that they have ongoing challenges. Did they ever give back data afterwards as the story promised?

I decided to check if they had published any papers with CACHE in the title. I found a preprint from 2025:

CACHE Challenge #3: Targeting the Nsp3 Macrodomain of SARS-CoV-2
https://chemrxiv.org/engage/chemrxiv/article-details/68cd59c19008f1a467a03cbf

This is a fun read. I hunted down the places Foldit was mentioned and followed the trail:

"Another four workflows adopted different strategies to leverage Nsp3-bound fragments in the PDB. WF1690 relied on the gaming challenge Drugit within the online platform Foldit [24]. Here, citizen scientists interactively grow fragments inside a 3D rendering of the binding pocket, employing a tool which leverages the ZINC API [25] to then find the closest commercial analog."

"Two of the workflows delivering the most potent but not novel compounds employed design strategies that implied the preservation of chemical scaffolds already in the PDB: fragment growing (WF1690) and fragment linking (WF1716)."

"As in CACHE #2 (WF1414), a citizen scientist using the Drugit platform to grow fragments bound to the target structure designed an excellent compound in CACHE #3 (WF1690): CACHE3HI_1690_48 binds Nsp3-MAC1 with a KD ~ 10 μM (Table 1). This result is sobering, as design strategies developed by computational chemistry experts do not outperform the creation of online gamers (though Drugit participants may be experienced medicinal chemists), but also up-lifting, as so far human neural networks are at least as creative as artificial ones. One should also keep in mind that the molecules invented by Drugit citizen scientists were further evaluated with specialized software (see above) before being submitted to CACHE #3.

Human judgement may also have played an important role in the final evaluation step of computationally selected molecules: while 66% (six out of nine) of the top-performing teams explicitly specified in their workflows that visual inspection of the computationally selected molecules would be applied to finalize their hit list (Figure 7), only 43% (10 out of 23) of all CACHE #3 participants did so (Figure 2)."