
2461: CASP16 Ligand Puzzle L3000

Closed since almost 2 years ago

Intermediate Overall Small Molecule Design

Summary


Created
May 24, 2024
Expires
Max points
100
Description

Use the Ligand Queue (default hotkey 7) and the Compound Library (default hotkey H) to explore how different ligands bind to the protein.

This puzzle is part of the CASP16 competition. Foldit players are participating to see how well they can predict how small molecules bind to proteins. Note that, in contrast to prior drug design puzzles, we're not just interested in the top-scoring small molecule; we're interested in getting good structures for each of the provided ligand compounds. It's worth dividing your time across all the compounds rather than concentrating on a particular one.

The protein target is rat autotaxin. There are several dozen structures of this protein in the Protein Data Bank, many of them bound to ligands. There are 219 ligand structures of interest in this competition. The starting small molecule is one of the co-crystallized ligands, and is provided just to indicate the likely binding site. It's not one of the molecules in the competition - you'll need to use the Ligand Queue and Compound Library tools to load one of the other ligands.

Due to the large number of compounds in the puzzle, not all of them are present in the Ligand Queue tool. Only a selection is present, and the rest must be found by searching the Compound Library. All the compounds should be reachable from a "one level deep" search: each is either one of the compounds in the Ligand Queue, or one of the search results returned for a compound in the Ligand Queue.
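The "one level deep" idea can be pictured with a short sketch. This is only an illustration, not how the Foldit client works internally: `search_library` is a hypothetical stand-in for submitting a compound to the Compound Library tool, and the toy compound names and hits are made up.

```python
from typing import Callable, Iterable


def one_level_deep(queue: Iterable[str],
                   search_library: Callable[[str], list[str]]) -> set[str]:
    """Return the queue compounds plus every hit from searching each of them once."""
    queue = list(queue)
    reachable = set(queue)
    for compound in queue:
        # One search per queue compound; hits are NOT searched again.
        reachable.update(search_library(compound))
    return reachable


# Toy data (hypothetical): two queue compounds and made-up search results.
toy_hits = {
    "A": ["A", "B", "C"],
    "D": ["D", "C", "E"],
}
found = one_level_deep(["A", "D"], lambda c: toy_hits.get(c, []))
print(sorted(found))
```

In the toy data, compound C is returned by both searches but is counted only once, which is why a set is used.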

Since the goal is to predict the structure of the protein-ligand complex, we've allowed full backbone and sidechain flexibility on this puzzle. That said, all of the bound structures are highly similar to each other (and thus to the starting structure). The backbone is very unlikely to change at all from the starting conformation, and there are restraints (unchangeable bands) to the starting conformation -- these will show up as red lines if you move the backbone too far. It should also be noted that we haven't given you the full 846 amino acid long protein the CASP organizers are looking for, but just the ligand binding domain.

Sequence
FTASRIKRAEWDEGPPTVLSDSPWTATSGSCKGRCFELQEVGPPDCRCDNLCKSYSSCCHDFDELCLKTARGWECTKDRCGEVRNEENACHCSEDCLSRGDCCTNYQVVCKGESHWVDDDCEEIKVPECPAGFVRPPLIIFSVDGFRASYMKKGSKVMPNIEKLRSCGTHAPYMRPVYPTKTFPNLYTLATGLYPESHGIVGNSMYDPVFDASFHLRGREKFNHRWWGGQPLWITATKQGVRAGTFFWSVSIPHERRILTILQWLSLPDNERPSVYAFYSEQPDFSGHKYGPFGPEMTNPLREIDKTVGQLMDGLKQLRLHRCVNVIFVGDHGMEDVTCDRTEFLSNYLTNVDDITLVPGTLGRIRAKSINNSKYDPKTIIAALTCKKPDQHFKPYMKQHLPKRLHYANNRRIEDIHLLVDRRWHVARKPLDVYKKPSGKCFFQGDHGFDNKVNSMQTVFVGYGPTFKYRTKVPPFENIELYNVMCDLLGLKPAPNNGTHGSLNHLLRTNTFRPTMPDEVSRPNYPGIMYLQSEFDLGCTCDDKVEPKNKLEELNKRLHTKGSTKERHLLYGRPAVLYRTSYDILYHTDFESGYSEIFLMPLWTSYTISKQAEVSSIPEHLTNCVRPDVRVSPGFSQNCLAYKNDKQMSYGFLFPPYLSSSPEAKYDAFLVTNMVPMYPAFKRVWAYFQRVLVKKYASERNGVNVISGPIFDYNYDGLRDTEDEIKQYVEGSSIPVPTHYYSIITSCLDFTQPADKCDGPLSVSSFILPHRPDNDESCNSSEDESKWVEELMKMHTARVRDIEHLTGLDFYRKTSRSYSEILTLKTYLHTYESEIGGRHHHHHHHH

Top groups


  1. Anthropic Dreams 100 pts. 29,514
  2. Contenders 56 pts. 28,699
  3. Go Science 29 pts. 28,239
  4. Marvin's bunch 14 pts. 26,267
  5. L'Alliance Francophone 6 pts. 26,009
  6. VeFold 2 pts. 25,608
  7. FamilyBarmettler 1 pt. 24,719
  8. Australia 1 pt. 24,314
  9. Foldit Staff 1 pt. 17,321
  10. Void Crushers 1 pt. 17,221

  41. Simek Lv 1 2 pts. 17,221
  42. Larini Lv 1 1 pt. 17,054
  43. utmutt Lv 1 1 pt. 16,894
  44. Merf Lv 1 1 pt. 15,855
  45. Phobos04 Lv 1 1 pt. 15,310
  46. Dr.Sillem Lv 1 1 pt. 13,902
  47. Sciren Lv 1 1 pt. 13,591
  48. rinze Lv 1 1 pt. 13,189
  49. Deleted player 1 pt. 13,154
  50. hada Lv 1 1 pt. 13,124

Comments


rmoretti Staff Lv 1

Objectives

Maximum bonus: +5000

Torsion Quality (max +1000)
Keeps bond rotations in a good range. Using Wiggle or Tweak Ligand can fix bad torsions. (Show highlights torsions to be rotated.)

Compound Library (max +4000)
Bonus if the compound in the structure is one of the desired compounds.

The compounds available from the Ligand Queue should be in the Library, but they will not give the Compound Library bonus until you search for them with the Compound Library tool. If a compound isn't showing a bonus, even after submitting it to search, try running "wiggle".

LociOiling Lv 1

The properties of the CASP 16 L3000 compounds are available on the wiki.

As before, the list was manually transcribed, and may contain errors. Fewer compounds and a slightly more organized approach mean there are probably fewer errors than in last week's L4000 list.

jeff101 Lv 1

In Puzzle 2461, the starting ligand (I call it ic, but Loci calls it L3000) gives -141065.734 -50 (-50 torsion, 0 library bonus).
It has the small molecule properties from top to bottom: 663.727 45 3 7 7.2274 10 1 5 99.1.
If you send ic to the compound library, it gives 31 hits numbered 1-17 18a 18b & 19-30.
None of these hits is an exact match for ic.

Ligand L3228 from the ligand queue, when sent to the compound library, gives 30 hits. Hit 1 is an exact match for L3228.

That's what I know so far. It would be nice to know how many hits each ligand from the ligand queue gives when sent
to the compound library. It would also be nice to know if any of these libraries overlap with each other. Perhaps if one
adds up all the hits these libraries give, the total will be 219. Is the 219 supposed to include ic or not?

Since the ligand queue gives the ligands L3160 L3179 L3187 L3221 L3228, it would be nice if
L3160 gave just 19 hits in the compound library (as if these hits were L3160-L3178),
L3179 gave just 8 hits in the compound library (as if these hits were L3179-L3186),
L3187 gave just 34 hits in the compound library (as if these hits were L3187-L3220), &
L3221 gave just 7 hits in the compound library (as if these hits were L3221-L3227).
For now, I label hits like ic-18a for hit 18a in ic's library of 31, L3228 = L3228-1 for hit 1 in L3228's library of 30,
and L3228-30 for hit 30 in L3228's library of 30. Perhaps later it will become obvious which hits from each
library match hits from other libraries.
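The counts and labels above can be double-checked with a few lines of Python. This is just a sketch of the arithmetic: it assumes the hoped-for case where compound IDs fall in consecutive blocks starting at each Ligand Queue compound, which has not been confirmed, and the variable names are illustrative only.

```python
# Ligand Queue compound IDs as listed above.
queue_ids = [3160, 3179, 3187, 3221, 3228]

# If each queue compound "owned" the IDs up to the next queue compound,
# the block sizes would be the differences between neighbouring IDs.
block_sizes = {f"L{a}": b - a for a, b in zip(queue_ids, queue_ids[1:])}
print(block_sizes)

# Labels for the starting ligand's library: hits 1-17, 18a, 18b, 19-30.
ic_hits = ([str(i) for i in range(1, 18)]
           + ["18a", "18b"]
           + [str(i) for i in range(19, 31)])
ic_labels = [f"ic-{h}" for h in ic_hits]
print(len(ic_labels), ic_labels[17], ic_labels[18])
```

The block sizes come out as 19, 8, 34 and 7, and the starting ligand's library has 31 labels because hit 18 appears twice (18a and 18b).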

jeff101 Lv 1

The protein in Puzzle 2461 has 512 protein segments + 1 ligand. 24 of the protein segments are cysteines,
and it looks like in the starting structure with score -141065.734 -50, 18 of the cysteines give 9 different
disulfide subscores, as if each of the cysteine pairs 31-35, 46-59, 75-92, 80-110, 90-103, 96-102, 121-167,
129-323, & 339-441 forms its own disulfide bond. The remaining 6 cysteines (48 52 58 66 386 & 486) all have
0 for their disulfide subscores. Should we try to make all 24 cysteines end with nonzero disulfide subscores?
Perhaps some of these cysteines actually make disulfide bonds with cysteines in the large chunk of protein
omitted from this puzzle.
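The cysteine bookkeeping above can be checked with a small sketch. The pair list and the unpaired list are copied from the numbers in this comment; the helper `cysteine_positions` and the toy sequence are illustrative assumptions and are not taken from the puzzle files.

```python
# Apparent disulfide pairs (segment numbers) from the starting structure.
pairs = [(31, 35), (46, 59), (75, 92), (80, 110), (90, 103),
         (96, 102), (121, 167), (129, 323), (339, 441)]
# Cysteines whose disulfide subscore is 0.
unpaired = [48, 52, 58, 66, 386, 486]

paired = [residue for pair in pairs for residue in pair]
# No cysteine should appear in two pairs, or be both paired and unpaired.
assert len(set(paired)) == len(paired)
assert not set(paired) & set(unpaired)
total_cysteines = len(paired) + len(unpaired)
print(len(pairs), len(paired), total_cysteines)


def cysteine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of cysteine (C) in a one-letter sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == "C"]


# Toy sequence (hypothetical), just to show the helper.
print(cysteine_positions("ACDCC"))
```

The totals agree with the comment: 9 pairs cover 18 cysteines, and the 6 unpaired ones bring the count to 24.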

LociOiling Lv 1

Compound L3101 doesn't get the 4000 point compound library bonus. The load library panel for L3101 shows two entries, 1a and 1b, both with similarity 1.0. Entries 1a and 1b don't get the compound library bonus, either. L3101 entry 2, with similarity 0.208955, does get the compound library bonus.

rmoretti Staff Lv 1

It would be nice to know how many hits each ligand from the ligand queue gives when sent to the compound library

There should be 30 "base" compounds returned from each search. This may result in multiple results (e.g. a/b), as sometimes the compound specifications from the CASP organizers are missing details about which direction hydrogens should be put on, for example. (The a/b/… entries enumerate the possibilities.) The results should be ordered by similarity (the results are not a pre-specified list - it's a similarity search across the full pool of structures), so there's likely some overlap of compounds between searches, particularly near the lower end of the similarity scale.
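To illustrate why separate searches can overlap, here is a minimal sketch of a top-N similarity search over a shared pool. It is not the Compound Library's actual implementation: the Tanimoto coefficient on feature sets is swapped in as a common stand-in for chemical similarity, and the pool, feature sets, and names (`tanimoto`, `top_hits`, `X1` to `X5`) are all hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_hits(query: str, pool: dict, n: int = 30) -> list:
    """Rank every compound in the pool by similarity to the query; keep the top n."""
    scored = [(tanimoto(pool[query], feats), name) for name, feats in pool.items()]
    # Highest similarity first; ties broken by name so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:n]]


# Toy pool (hypothetical): compound name -> set of made-up feature IDs.
pool = {
    "X1": {1, 2, 3, 4},
    "X2": {1, 2, 3, 5},
    "X3": {1, 2, 6, 7},
    "X4": {8, 9},
    "X5": {1, 8, 9},
}

hits_x1 = top_hits("X1", pool, n=3)
hits_x4 = top_hits("X4", pool, n=3)
overlap = set(hits_x1) & set(hits_x4)
print(hits_x1, hits_x4, overlap)
```

In this toy pool the two queries are dissimilar, yet their top-3 lists still share one compound, and that shared compound sits at the bottom of one of the lists. That is the same effect as the overlap expected near the lower end of the similarity scale.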

Should we try to make all 24 cysteines end with nonzero disulfide subscores?

You can take a look at the parent structures in the PDB to see the crystallized disulfide bonding patterns. There are indeed inter-domain disulfides (e.g. 386) and there are some free cysteines (e.g. 486), and some of the actual disulfides might not be bonded properly (e.g. those in the 31-66 range). But I wouldn't necessarily concern myself too much with the modeling of the disulfides – it's the location and binding of the ligand we're primarily interested in.

Compound L3101 doesn't get the 4000 point compound library bonus.

Unfortunately, this doesn't seem fixable without a client update. The technical issue is that L3101 has some substructures (the nitrogen-containing ring structures) which are technically chiral, but whose handedness can be ignored in certain situations. There's apparently a mismatch between how we're looking up the structure and how we're storing the list of known compound library compounds. There theoretically shouldn't be, but for some unknown reason there is.

rmoretti Staff Lv 1

The puzzle has been updated to remove L3101 from the Ligand Queue and replace it with L3193 instead. The L3101 ligand should still show up in the compound library results, but hopefully not being in the Ligand Queue will make it less of an annoyance.

You may need to restart your client to get the updated puzzle definitions. Note that the old definitions should work perfectly fine, so there's not necessarily a need to.

BootsMcGraw Lv 1

Is anyone else as disappointed as me that FoldIt has degraded from "designing for science" down to "3-D Tetris"?

When is the sunset for FoldIt? Can't be far off.

rosie4loop Lv 1

From the blog

The goal for the CASP competition is to predict how the small molecule binds to the protein target. CASP participants will get the protein target and the list of small molecules, and will be asked to submit the structure of each ligand bound to the protein.

To be fair, at least for the current CASP competition the objective is binding, not design. It's like the protein folding competitions in the past: those were about folding, not design.

From the science point of view, having a tool that allows accurate prediction of how a ligand binds a protein is useful for drug design.

I do want to see design puzzles back though. Even if it's just design sandbox with properties filter for teaching pharma basics.

rmoretti Staff Lv 1

Figuring out how a compound binds to the protein is a very important part of small molecule design. You can design a molecule which would be a fabulous binder, but if you can't put it into the protein properly, then there's no way to know if it actually does bind. Additionally, if a compound is in the protein incorrectly, your attempts to make it better ("oh, I could make a hydrogen bond here and fill a pocket with a ring there") are not necessarily going to be useful, as the parts of the protein it is actually interacting with aren't the parts you think it's interacting with.

CASP is giving us a great way to test the available Foldit tools to see how well this part of the design process works. It isolates just this portion of the design process, and the results will give us feedback on how well the current set of tools is working. If Foldit does well, we know that it's other portions of the design process we should focus efforts on. If we don't do so well, we can dig through the results to figure out where the Foldit tools fall short, and what might be the best way of addressing the issue. (We've already gotten good feedback from players about what you're looking for in this sort of task, and we thank you for that.)