
2348: CACHE SARS helicase followup: Round 3

Closed since over 2 years ago

Intermediate Overall Small Molecule Design

Summary


Created
August 18, 2023
Expires
Max points
100
Description

Compete in a challenge to design a drug targeting the SARS-CoV-2 helicase. Use the small molecule design tools and the compound library panel to find library compounds similar to the starting compound which bind to the active site of the enzyme.

Note: To get the most out of the small molecule design tools, we recommend changing your view settings to the Small Molecule Design Preset.

This puzzle is part of Foldit's participation in the CACHE Challenge. From the set of all compounds submitted in the multiple rounds of puzzles, Foldit scientists will select up to 50 compounds based on the CACHE-provided criteria. Only compounds which are in a commercially available library will be selected, so it's beneficial to make use of the Compound Library panel to search for library compounds similar to your current design. But don't limit yourself to the compound library. You're more likely to get good results by alternating: optimize the molecule with the small molecule design tools, find the closest library compound, then further refine it with the design tools.

For this puzzle series, we're looking to examine the Structure Activity Relationship (SAR) of the hit compounds from the previous series. As such, we ask that you attempt to find things which are similar to the starting molecule, rather than creating something completely new. There's a Similarity objective which should show when you're going too far afield.

For this round we're back to the first compound, but we're starting in a potentially different binding orientation. We've also opened up the flexibility of the sidechains in the binding pocket and loosened the thresholds of some of the objectives.

Participation in CACHE puzzles is subject to the CACHE Terms of Participation, in particular “the Challenge IP [including Challenge Compounds] will be made freely available in the public domain pursuant to Creative Commons Attribution Only (CC-BY 4.0 or subsequent versions) licensing terms, with the intent that such Challenge IP may be Used and practiced by Users for any purpose”.

Top groups


  1. Go Science 100 pts. 35,844
  2. Anthropic Dreams 65 pts. 35,560
  3. Gargleblasters 41 pts. 32,742
  4. Contenders 24 pts. 32,450
  5. AlphaFold 14 pts. 31,801
  6. Australia 7 pts. 31,788
  7. FamilyBarmettler 4 pts. 31,726
  8. L'Alliance Francophone 2 pts. 31,645
  9. Void Crushers 1 pt. 31,401
  10. VeFold 1 pt. 31,288

  1. Sandrix72 Lv 1 100 pts. 35,815
  2. gmn Lv 1 93 pts. 35,549
  3. Galaxie Lv 1 87 pts. 33,961
  4. LociOiling Lv 1 80 pts. 33,644
  5. ichwilldiesennamen Lv 1 74 pts. 33,399
  6. ucad Lv 1 69 pts. 33,266
  7. nspc Lv 1 64 pts. 33,234
  8. jeff101 Lv 1 59 pts. 33,183
  9. rosie4loop Lv 1 54 pts. 32,983
  10. Bruno Kestemont Lv 1 50 pts. 32,960

Comments


rmoretti Staff Lv 1

Objectives

Objectives in this puzzle are driven primarily by the evaluation criteria used by CACHE.

Maximum bonus: +10 000

Similarity (max +1000)

Gives a bonus if the current compound is "similar enough" to the starting (hit) compound. The "percent similarity" being calculated is not quite linear from a visual perspective (search for Tanimoto Similarity for further discussion), and is different from the similarity value being calculated for the Compound Library.
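
To illustrate why the percentage does not track visual change linearly, here is a minimal sketch of the Tanimoto coefficient on sets of substructure features. The integer "features" and the 20-feature fingerprint are made up for illustration; the fingerprint Foldit actually uses is not described here and may differ.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """Tanimoto similarity: shared features divided by all features present in either set."""
    union = features_a | features_b
    if not union:
        return 1.0  # convention chosen for this sketch: two empty sets count as identical
    return len(features_a & features_b) / len(union)


# Hypothetical fingerprints: each integer stands for one substructure feature.
start = set(range(20))                                         # starting compound, 20 features
one_swap = (start - {0}) | {100}                               # 1 feature replaced by a new one
five_swaps = (start - set(range(5))) | set(range(100, 105))    # 5 features replaced

print(round(tanimoto(start, one_swap), 3))    # 0.905
print(round(tanimoto(start, five_swaps), 3))  # 0.6
```

Replacing a feature both shrinks the shared set and grows the union, so swapping 5 of 20 features (25%) already lowers the similarity to 60%.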

Compound Library (max +1000)
Gives a bonus if your current compound is in the library. This uses a local cached version of the Compound Library search results to determine if the compound is in the library. If you manually create a compound that happens to be in the library (or if you load a shared solution with an on-library compound), you may need to submit the compound to the compound library search and wait to get the results back before the objective can properly recognize that the compound is in the library. (If the objective is not updating, try wiggling the structure. See this forum post for more discussion.)

Torsion Quality (max +1000)
Keeps bond rotations in a good range. Using Wiggle or Tweak Ligand can fix bad torsions. (Show highlights torsions to be rotated.)

Number of Rotatable Bonds (max +1000)
Intended to keep the ligand from getting too big and floppy. You can reduce rotatable bonds by deleting groups or forming rings. (Show highlights rotatable bonds.)

Ligand TPSA (max +1000)
Topological Polar Surface Area - Keeps the polar surface area (including buried polar surface) low. To improve, try removing oxygens and nitrogens. (Show highlights atoms contributing to higher TPSA.)

Ligand cLogP (max +1000)
A measure of polarity - Keeps the molecule from getting too hydrophobic. To improve, try adding polar oxygens and nitrogens. (Show highlights atoms contributing to higher cLogP.)

Fraction of four-bonded carbons (max +1000)
Measures how many carbons with bonds to four atoms ("sp3 hybridized") there are. Too few (too many double and triple bonded carbons) is bad. (Show highlights carbon atoms at issue.)
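
As a rough sketch of what this objective counts, the fraction can be computed from the number of atoms each carbon is bonded to (hydrogens included). The function name and the list-of-counts input are assumptions made for this example, and the scoring thresholds Foldit applies are not shown.

```python
def fraction_four_bonded(carbon_neighbor_counts):
    """Fraction of carbons that are bonded to exactly four atoms (hydrogens included)."""
    if not carbon_neighbor_counts:
        return 0.0  # no carbons at all: treat the fraction as zero in this sketch
    four_bonded = sum(1 for n in carbon_neighbor_counts if n == 4)
    return four_bonded / len(carbon_neighbor_counts)


# Toluene: six aromatic ring carbons (3 neighbors each) plus one methyl carbon (4 neighbors).
toluene = [3, 3, 3, 3, 3, 3, 4]
# Methylcyclohexane: all seven carbons are bonded to four atoms.
methylcyclohexane = [4] * 7

print(round(fraction_four_bonded(toluene), 3))  # 0.143
print(fraction_four_bonded(methylcyclohexane))  # 1.0
```

Saturating a ring or replacing an aromatic group with an aliphatic one raises this fraction.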

Bad Groups (max +1000)
Gives a bonus for avoiding groups that interfere with assays, or which are far from the compounds in the library. (Show highlights groups at issue.)

Molecular Weight (max +1000)
Keeps the ligand a reasonable size.

Synthetic Accessibility (max +1000)
Keeps the ligand from going too far from the compounds in the library. (Show highlights parts of the molecule at issue.)

Artoria2e5 Lv 1

So this is the docking-derived pose for lead compound 1? A bit contorted, but… the pi-cation interaction sure makes sense…

jeff101 Lv 1

In puzzles 2342 & 2345, I sometimes got much better scores for the starting ligands far from their given starting positions. When this happens, how can we know which positions to take seriously and which to dismiss as unlikely? For example, is it better for the ligand to bind in a cleft or valley on the protein surface than as a new bump protruding from the protein surface? Is it better for the ligand to bind deep inside the protein than closer to the protein surface? Can you list certain segments you want our ligands to bind to? Can you list certain segments you don't want our ligands to bind to? Can you list a certain distance from the ligand's starting position within which you'd expect our ligands to bind? Thanks!

rosie4loop Lv 1

I agree it'd be nice to have this info, if possible.

That said, I believe it's difficult to answer without an experimental structure or other experimental evidence on which residues contribute to binding. I don't see much info about it on the CACHE website (https://cache-challenge.org/challenges/finding-ligands-targeting-the-conserved-rna-binding-site-of-sars-cov-2-nsp13). I haven't gone through all the details there, though; I mainly just checked the computational methods for future reference.

After all, it's a helicase, which binds the big RNA molecule to unzip the zipper, and we are targeting the RNA-binding pocket (see https://apps.thesgc.org/SARSCoV2_pocketome/icm/nsp13_html.html). It needs to be able to bind RNA on multiple surfaces, and must have the flexibility to serve its purpose.

The starting molecule in this puzzle (2348) and in puzzle 2342 looks just like a nucleotide (a building block of DNA/RNA), so in my opinion it's reasonable that there are multiple possible sites of interaction. As for which binding site is more probable or more effective for inhibiting the enzyme, we'd probably need to review the literature, but it'd be nice if the Foldit developers have this information to share with us.

Usually if it's a project I work on, I'll need to do enough literature review to search for this kind of experimental evidence. Unfortunately I tried to avoid covid-related projects in previous years except for teaching, so I don't have much knowledge on it.

But even if we had an experimental model, relaxing it with a force field or redocking it with a program would change the coordinates a bit, ideally with only minor changes compared to the experimental structure. These changes depend on the force field and method. They may reflect the accuracy of the method, or sometimes they just happen; the ligand may really have multiple alternative poses in nature; or there may be a problem in the experimental model.

rosie4loop Lv 1

From a quick glimpse of the link provided by CACHE that I also posted earlier (https://apps.thesgc.org/SARSCoV2_pocketome/icm/nsp13_html.html), my guess would be to stay around the part of the RNA-binding site that overlaps with the crystallized fragments. (Scroll to the bottom of the page and click fragment hits to see them in the viewer, where purple spheres overlap the RNA ribbon.)

And focus on interactions with conserved residues (segments) around there. Don't move toward the ATP binding site since it's not conserved, and avoid the orange patches indicating mutated residues.

rmoretti Staff Lv 1

We really don't know how these compounds bind - we just have experimental evidence that they do.

The goal is to find compounds which compete with the RNA in the RNA binding site of the protein. The compounds that we're starting with are in that pocket, so anything in the same general area will likely work. But we don't have any particular residues which we know are interacting with the compound, or any particular interactions which we definitely need to meet. Something which binds well in the same general area is what we're looking for.

In fact, this "Structure Activity Relationship" phase is attempting to investigate where/how the ligand binds. If we have generally the same ligand, but with slight differences, then that can help narrow down what the binding mode is – you want a binding mode compatible with all the structures which bind, and where the compounds which don't bind have a plausible explanation (e.g. they clash with the protein) for why they don't. (The assumption being that all the compounds in the series will bind in more-or-less the same way, which is an assumption which doesn't always hold, but holds often enough to be useful.)
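
The reasoning above can be sketched as a simple consistency filter, under the simplifying assumption that a compound binds exactly when it fits the binding mode without clashing. The compound names, pose names, and fit results below are made up for illustration and are not real assay or docking data.

```python
def consistent_modes(fits, binds):
    """Return the binding modes that agree with every assay result.

    fits[mode][compound] is True if the compound fits that mode without clashing.
    binds[compound] is True if the compound was observed to bind.
    """
    return [
        mode
        for mode, fit in fits.items()
        if all(fit[compound] == binds[compound] for compound in binds)
    ]


# Hypothetical series: A and B bind, C does not.
binds = {"A": True, "B": True, "C": False}
# Hypothetical candidate binding modes and whether each compound fits them.
fits = {
    "pose_1": {"A": True, "B": True, "C": True},    # cannot explain why C fails to bind
    "pose_2": {"A": True, "B": True, "C": False},   # C clashes, A and B fit
    "pose_3": {"A": True, "B": False, "C": False},  # cannot accommodate binder B
}

print(consistent_modes(fits, binds))  # ['pose_2']
```

Each additional analog with a known result can rule out more candidate modes, which is why small variations on the same scaffold are informative.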

BootsMcGraw Lv 1

Would love to play this round, but once again, no library function available, and no additional playing time to make up for the once again broken tools. Is it any wonder this game is hemorrhaging players?