Foldit

2342: CACHE SARS helicase followup: Round 1

Closed since over 2 years ago

Intermediate, Overall, Small Molecule Design

Summary

Created: August 18, 2023
Expires: August 25, 2023 at 23:00 UTC
Max points: 100

Description

Compete in a challenge to design a drug targeting the SARS-CoV-2 helicase. Use the small molecule design tools and the compound library panel to find library compounds similar to the starting compound which bind to the active site of the enzyme.

Note: To get the most out of the small molecule design tools, we recommend changing your view settings to the Small Molecule Design Preset.

This puzzle is part of Foldit's participation in the CACHE Challenge. From the set of all compounds submitted in the multiple rounds of puzzles, Foldit scientists will select up to 50 compounds based on the CACHE-provided criteria. Only compounds which are in a commercially available library will be selected, so it's beneficial to make use of the Compound Library panel to search for library compounds similar to your current design. But don't limit yourself to the compound library. You're more likely to get good results by alternating: optimize the molecule with the small molecule design tools, find the closest library compound, then further refine it with the design tools.

For this puzzle series, we're looking to examine the Structure Activity Relationship (SAR) of the hit compounds from the previous series. As such, we ask that you attempt to find things which are similar to the starting molecule, rather than creating something completely new. There's a Similarity objective which should show when you're going too far afield.

Participation in CACHE puzzles is subject to the CACHE Terms of Participation, in particular “the Challenge IP [including Challenge Compounds] will be made freely available in the public domain pursuant to Creative Commons Attribution Only (CC-BY 4.0 or subsequent versions) licensing terms, with the intent that such Challenge IP may be Used and practiced by Users for any purpose”.

Top groups

1. Contenders 100 pts. 33,307
2. Go Science 65 pts. 33,199
3. Anthropic Dreams 41 pts. 33,020
4. AlphaFold 24 pts. 32,913
5. FamilyBarmettler 14 pts. 32,265
6. Gargleblasters 7 pts. 32,195
7. VeFold 4 pts. 31,953
8. Marvin's bunch 2 pts. 31,592
9. L'Alliance Francophone 1 pt. 31,533
10. Australia 1 pt. 31,507

1. Sandrix72 Lv 1 100 pts. 33,199
2. Bletchley Park Lv 1 93 pts. 33,144
3. gmn Lv 1 87 pts. 33,020
4. Galaxie Lv 1 81 pts. 32,988
5. jeff101 Lv 1 75 pts. 32,987
6. Bruno Kestemont Lv 1 69 pts. 32,917
7. ucad Lv 1 64 pts. 32,914
8. AlphaFold2 Lv 1 59 pts. 32,913
9. nspc Lv 1 55 pts. 32,705
10. rosie4loop Lv 1 50 pts. 32,700

Comments

rmoretti Staff Lv 1

August 18, 2023

Objectives

Objectives in this puzzle are driven primarily by the evaluation criteria used by CACHE.

Maximum bonus: +10,000

Similarity (max +1000)

Gives a bonus if the current compound is "similar enough" to the starting (hit) compound. The "percent similarity" being calculated is not quite linear from a visual perspective (search for Tanimoto Similarity for further discussion), and is different from the similarity value being calculated for the Compound Library.

Compound Library (max +1000)
Gives a bonus if your current compound is in the library. This uses a local cached version of the Compound Library search results to determine if the compound is in the library. If you manually create a compound that happens to be in the library (or if you load a shared solution with an on-library compound), you may need to submit the compound to the compound library search and wait to get the results back before the objective can properly recognize that the compound is in the library. (If the objective is not updating, try wiggling the structure. See this forum post for more discussion.)

Torsion Quality (max +1000)
Keeps bond rotations in a good range. Using Wiggle or Tweak Ligand can fix bad torsions. (Show highlights torsions to be rotated.)

Number of Rotatable Bonds (max +1000)
Intended to keep the ligand from getting too big and floppy. You can reduce rotatable bonds by deleting groups or forming rings. (Show highlights rotatable bonds.)

Ligand TPSA (max +1000)
Topological Polar Surface Area - Keeps the polar surface area (including buried polar surface) low. To improve, try removing oxygens and nitrogens. (Show highlights atoms contributing to higher TPSA.)

Ligand cLogP (max +1000)
A measure of polarity - Keeps the molecule from getting too hydrophobic. To improve, try adding polar oxygens and nitrogens. (Show highlights atoms contributing to higher cLogP.)

Fraction of four-bonded carbons (max +1000)
Measures how many carbons with bonds to four atoms ("sp3 hybridized") there are. Too few (too many double and triple bonded carbons) is bad. (Show highlights carbon atoms at issue.)

Bad Groups (max +1000)
Gives a bonus for avoiding groups that interfere with assays, or which are far from the compounds in the library. (Show highlights groups at issue.)

Molecular Weight (max +1000)
Keeps the ligand a reasonable size.

Synthetic Accessibility (max +1000)
Keeps the ligand from going too far from the compounds in the library. (Show highlights parts of the molecule at issue.)

Bletchley Park Lv 1

August 19, 2023

@rmoretti See bug report https://fold.it/forum/bugs/unresolved-compound-window-reports-error-result#post_76912

LociOiling Lv 1

August 19, 2023

See also the blog post, SARS-CoV-2 helicase CACHE Challenge preliminary results, which describes what we're trying to do in more detail.

Unfortunately, the compound library depends on the external ZINC server, which is down at the moment. Compound library searches fail with "Error" in the status field. It's not clear when ZINC will be back online.

rosie4loop Lv 1

August 20, 2023

I wonder why the M.W. limit is 400 and the cLogP limit is 3.5 in this puzzle? At the lead optimization stage, such limits make things more difficult.

Is it because of the "similarity" with the starting compounds, since they have an M.W. of around 200-400 and a cLogP of 2.2-2.3?

If yes, using M.W. as an example, several of my top-scoring designs exceeded the M.W. limit of 400, at around 401-410, while the similarity remained at 51-60%, still getting the full similarity bonus. Does that mean the "similarity" index in the objective filter is not accurate enough for that purpose, so it's better to use the fixed value of 400?

If no, it would be nice to know the reason.

I noticed that the two Foldit hits (Participant ID 1414) have the lowest M.W. among all 46 compounds that proceeded to round 2. 18 of the 46 hits have an M.W. higher than 400. For cLogP, several hits from other participants have a cLogP above 4.
(See below the screenshot of the (sorted) table, data taken from the excel file provided by CACHE at https://cache-challenge.org/sites/default/files/downloadable/forms/CACHE2_round1_experimental_data_vsF.2.xlsx)

(Edit: add notes about cLogP)

rosie4loop Lv 1

August 20, 2023

Until the ZINC server is up again, if anyone wants to check whether a design is a library compound or explore similar structures, maybe try searching directly on the Enamine website instead:
https://www.enaminestore.com/search

rmoretti Staff Lv 1

August 20, 2023

@rosie4loop The thresholds are based on the CACHE organizers' suggestions and their "Traffic Light" ranking system for compounds. See the original CACHE paper for a general overview of the traffic light system. Here's what they specifically laid out for the CACHE #2 Challenge:

Similarity is the Tanimoto similarity (based on RDKit fingerprints), which should be (roughly) normalized for size, so fingerprint similarity in and of itself shouldn't affect what size of compounds pass or don't pass the similarity check.
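As a rough illustration of the Tanimoto similarity mentioned above, here is a minimal sketch in plain Python. It treats a fingerprint as a set of "on" bit positions; the bit sets below are made up for illustration and are not real RDKit fingerprints, and this is not necessarily how Foldit computes the objective.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity: shared 'on' bits divided by all distinct 'on' bits."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical fingerprints (bit positions are arbitrary, for illustration only).
start = {1, 4, 7, 9, 12, 15}                 # starting compound
small_edit = {1, 4, 7, 9, 12, 18}            # one bit swapped for another
large_edit = {1, 4, 20, 21, 22, 23, 24, 25}  # most bits changed

print(f"small edit: {tanimoto(start, small_edit):.2f}")
print(f"large edit: {tanimoto(start, large_edit):.2f}")
```

Swapping a single bit out of six already drops the value to 5/7 rather than 5/6, because the changed bit counts in both the numerator and the denominator. That is one reason the percentage does not track visual similarity linearly. Dividing by the total number of distinct bits is also what makes the measure roughly size-normalized.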

I was not aware that the other groups were significantly higher than us in molecular weight. We may consider increasing the thresholds to the "amber" stoplight level for future puzzles in this series.

jeff101 Lv 1

August 22, 2023

The spreadsheet rosie4loop posted above for 2342's starting ligand (CACHE-1414-40 above, & Hit1 at https://fold.it/forum/blog/sars-cov-2-helicase-cache-challenge-preliminary-results#post_76907) gives 311 mol wt & clogP 2.3.
Meanwhile, the Small Molecule Properties window within Foldit Puzzle 2342 gives for the same ligand (from top to bottom):
311.36 23 1 4 2.6131 3 2 1 60.15. This includes 311.36 for the mol wt & 2.6131 for clogP. The molecular weights in both places seem to agree, but the clogP's differ by about 0.3. Is this right? Why the difference?

To keep myself amused in Puzzle 2342 while the Compound Library is down, I built the other top Foldit ligand (CACHE-1414-34 above, & Hit2 at https://fold.it/forum/blog/sars-cov-2-helicase-cache-challenge-preliminary-results#post_76907). The spreadsheet above gives 242 mol wt & clogP 2.2 for it. Meanwhile, the Small Molecule Properties window in Foldit gives for it (from top to bottom): 242.278 18 1 4 2.50292 2 2 1 55.12. This includes 242.278 for the mol wt & 2.50292 for clogP. Again, the molecular weights in both places seem to agree, but the clogP's differ by about 0.3. Is this right? Why the difference?
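The comparison above can be written out as a short calculation. This sketch only uses the numbers quoted in this thread (the CACHE spreadsheet values and the Foldit Small Molecule Properties values), plus the M.W. 400 and cLogP 3.5 limits mentioned earlier by rosie4loop; the dictionary layout and function names are hypothetical.

```python
# Values as quoted in this thread; nothing here is recomputed from structures.
hits = {
    "CACHE-1414-40 (Hit1)": {
        "cache_mw": 311, "foldit_mw": 311.36,
        "cache_clogp": 2.3, "foldit_clogp": 2.6131,
    },
    "CACHE-1414-34 (Hit2)": {
        "cache_mw": 242, "foldit_mw": 242.278,
        "cache_clogp": 2.2, "foldit_clogp": 2.50292,
    },
}

MW_LIMIT = 400.0    # puzzle limit as mentioned in the thread
CLOGP_LIMIT = 3.5   # puzzle limit as mentioned in the thread


def clogp_offset(entry: dict) -> float:
    """Foldit cLogP minus the CACHE spreadsheet cLogP, rounded to 2 decimals."""
    return round(entry["foldit_clogp"] - entry["cache_clogp"], 2)


def within_limits(entry: dict) -> bool:
    """True if the Foldit-reported values are at or below both limits."""
    return entry["foldit_mw"] <= MW_LIMIT and entry["foldit_clogp"] <= CLOGP_LIMIT


for name, entry in hits.items():
    print(name, "cLogP offset:", clogp_offset(entry), "within limits:", within_limits(entry))
```

Both hits show nearly the same offset (about 0.31 and 0.30), and both stay under the two limits with either set of numbers.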

rosie4loop Lv 1

August 22, 2023

I assume it's because Foldit uses RDKit to calculate cLogP, while CACHE provides the rounded cLogP calculated in ICM (Molsoft).

I'm not sure whether the two calculation methods are similar or were calibrated separately, but differences in how each program implements the algorithm may cause the discrepancy. The methods for determining protonation state and predicting tautomers may also make a difference.

Reference of the CACHE method.
https://cache-challenge.org/sites/default/files/downloadable/forms/README_CACHE2_experimental%20results_0.docx

Artoria2e5 Lv 1

August 23, 2023

The LogP you see available for any random compound is a guesstimate from the molecule's structure. How each guess is formulated and trained can cause some difference.

(Lights a pipe) Actually it's only called "cLogP" when it comes from a fragment-based method (one of many); otherwise it's just a "sparkling LogP estimate". RDKit and ICM both call their thing MolLogP, though they are also clearly different from each other.

LociOiling Lv 1

August 23, 2023

Wow, this thread is where the cool kids hang out.

Meantime, the compound library is back, but previously submitted compounds just get an instant "error". It appears something is dwelling on the failures, storing bad results in a cache which might look a lot like a row in a database.

A workaround is to change your compound slightly – add an atom, remove an atom, change an element – then resubmit.

The new compound may get hits, and one of the hits may be your original compound.

Your original compound should then see the compound library bonus.

If the cache belongs to Foldit, perhaps it could be flushed.

Then in the future, "error" results should not be cached, or they should be retried after a decent interval.

(Edit: I'll mention @Sciren just in case this is his area.)
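The caching suggestion above (don't keep "error" results forever; retry them after an interval) could look roughly like the following sketch. It is not Foldit's actual code: `SearchCache`, `flaky_search`, and the 60-second interval are all hypothetical names and values chosen for illustration.

```python
import time


class SearchCache:
    """Caches search results; cached errors expire so they get retried."""

    def __init__(self, search, error_retry_seconds=300.0, clock=time.monotonic):
        self._search = search                      # callable: compound -> result
        self._error_retry_seconds = error_retry_seconds
        self._clock = clock                        # injectable for testing
        self._entries = {}                         # compound -> (result, is_error, stored_at)

    def lookup(self, compound):
        now = self._clock()
        entry = self._entries.get(compound)
        if entry is not None:
            result, is_error, stored_at = entry
            # Successful results are reused; errors only until the interval passes.
            if not is_error or now - stored_at < self._error_retry_seconds:
                return result
        try:
            result, is_error = self._search(compound), False
        except Exception:
            result, is_error = "Error", True
        self._entries[compound] = (result, is_error, now)
        return result


# Demo with a hypothetical backend that fails once, then recovers.
calls = []

def flaky_search(compound):
    calls.append(compound)
    if len(calls) == 1:
        raise ConnectionError("library server unavailable")
    return ["hit for " + compound]

fake_now = [0.0]
cache = SearchCache(flaky_search, error_retry_seconds=60.0, clock=lambda: fake_now[0])

first = cache.lookup("c1ccccc1")   # backend fails, "Error" is stored
second = cache.lookup("c1ccccc1")  # still inside the interval: cached "Error"
fake_now[0] = 61.0
third = cache.lookup("c1ccccc1")   # interval passed: retried, now succeeds
print(first, second, third)
```

Setting `error_retry_seconds` to 0 in this sketch means errors are never reused, which corresponds to the "should not be cached" option.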