How to do ligand puzzles - A short guide

Started by Floddi

Floddi Lv 1

Introduction:

I want to explain how I design a new compound in the small molecule puzzles, because I think it can be pretty hard to get the hang of it :/
(Info: I have good knowledge of organic chemistry and biochem. I might refer to things which aren't obvious to everyone. Please feel free to ask questions :D)

View options:

I strongly recommend creating a new profile just for this type of puzzle, with the following options enabled:

  • Show clashes, (exposed, voids)
  • Show bonds (side chain, non-protein) (it can also be useful to turn it on for helix and sheet)
  • Color: Score/Hydro+CPK (CPK is needed to identify bondable groups more easily! It also makes acidic/basic groups easier to spot.)
  • View Protein: Cartoon ligand (as thin as possible, hydrogens are visible(!))
    → keep in mind that the atoms have different radii (you can use "Sphere" from time to time, but it's impractical. Showing clashes is the better option.)
  • View hydrogens: Show bondable or all H (bondable H is usually visually clearer; the sidechain H's are normally wiggled away and small enough to be ignored.)
  • View Sidechains: Show All (MOST IMPORTANT! Otherwise, it’s impossible to create a good molecule!)
  • Show sidechains with clashes/exposed
    Without the right View settings it's way harder to design a molecule. (These options may vary with personal preference.)

What I am doing:

I have kept this general and am not referring to a specific puzzle.
It's really important to understand the properties of AAs in proteins (pH, polarity…) and also some basics of organic chemistry. Sadly I cannot teach all the basics in this guide :(
That said, let's get into it:

→ Generally speaking, the only bonus points that really matter are torsion quality, (number of rotatable bonds,) bad groups and synthetic accessibility. Especially in the beginning, it can be useful to turn off bonus points while you shake/wiggle so that results are comparable, because the molecular weight has a huge impact but will be fully gained sooner or later anyway.

  1. The starting molecule is normally in the center of the binding site (where the adjustable AAs [amino acids] are), which is a good area to start your molecule. Therefore, I delete nearly everything except a few carbons and one functional group, e.g. ethanol or methylamine forming hydrogen bonds to 1-2 sidechains. Then I shake (low CI, then CI=1.00) and repeatedly wiggle (CI=0.6, then 1.00, highest wiggle power).
  2. I try to build/add an aromatic cycle out of/to the structure "created" in the first step. The cycle should fit inside the formed pocket as well as possible. It's useful to know common organic compounds to do so. (You might want to turn on IsoSurface (V) to see the space you've got.) Wiggle as in step 1. It's just trial and error with educated guesses. This structure might be changed in the following steps. (Maybe hold off on the wiggle until you've also done step 3.)
    → I like to use: benzene, naphthalene, azulene and indene or heteroaromatic cycles (too many to name them all). Spiro compounds can also be used.
    Keep in mind that you need a direction to expand the molecule further! This is also important in the following steps.
  3. The hydrogen atoms of the cycle might clash with the sidechains, or there might be basic/acidic sidechains nearby; in that case, substitute CH groups with nitrogen atoms. Methyl groups or halogens can be used to fill out spaces around the "base" cycle. Instead of methyl groups, oxygen or nitrogen can be used to bind to the backbone or a sidechain. If you use such groups, they have to form at least two hydrogen bonds. It can be advisable to add methyl groups on N or O to keep the TPSA lower (important later), to remove bad groups, or to fill out spaces. Watch the torsion quality and adjust the atoms accordingly. An imine can be better than an ether in some cases.
    I would recommend using bands to form hydrogen bonds between AA sidechains/backbones and functional groups (sometimes you have to shake+wiggle [CI<0.05] one sidechain specifically and then remove clashes by wiggling the OTHER sidechains away)
    → Ideally, you've reached a good number of points with a negative molecular weight (not important if it's already positive)
  4. Extend your molecule further by adding another cycle (aromatic, but not necessarily). It can be useful to try out different types of "bridges" between both cycles. E.g. you can use a methylene (CH2) group, form an ether, link them as amines (or sometimes via carboxylic acid derivatives such as amides or esters, which add a lot of TPSA!); ideally you form hydrogen bond(s) with the bridge. Repeatedly wiggle and make educated guesses (trial and error) (IsoSurface is really useful). Always focus on the whole molecule! Sometimes you have to use another base cycle and go back to step 2.
    → Bonus points: Full bonus should be given by now. (Puzzle-specific bonuses might still be missing.) If the torsion quality drops (<700; never go lower!), try using different C, N (see if a tertiary amine is better) or O bridges, OR add/remove groups in the ortho position of both cycles, as this also affects the torsion quality. (The compound library bonus could be achieved here.)
    The created structure should fit tightly into the binding site. It's not uncommon for me to create up to 50 different compounds (derivatives based on the same cycles) until I find a good one. Be prepared for nothing to be set in stone :D
  5. Extend the structure with another cycle and basically repeat step 4.
    → As you extend the molecule, TPSA, cLogP and the number of rotatable bonds become more and more important. Primary amines and carboxylic acids (and their derivatives) heavily increase TPSA, and large all-carbon aromatic systems raise the cLogP significantly. It can be useful to adjust those groups or even remove them during development.
    Normally, you just have to edit the cycle which is directly connected to the newly added one to gain a better score. The ortho position is the most important one for solving torsion problems.
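
Steps 3 and 4 rely on forming hydrogen bonds. The game draws them for you, but as a rough mental model, a common textbook criterion is a donor-acceptor distance of at most about 3.5 Å and a donor-H...acceptor angle of at least about 120°. The Python sketch below checks just those two conditions; the cutoffs and the coordinates are assumptions for illustration, not Foldit's internal H-bond scoring.

```python
import math

# Rule-of-thumb cutoffs (assumed; not Foldit's internal values).
MAX_DA_DISTANCE = 3.5   # donor-acceptor distance in angstrom
MIN_DHA_ANGLE = 120.0   # donor-H...acceptor angle in degrees

def angle_deg(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in degrees."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_t = dot / (math.hypot(*ba) * math.hypot(*bc))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding errors
    return math.degrees(math.acos(cos_t))

def is_hbond(donor, hydrogen, acceptor):
    """True if the D-H...A geometry passes both cutoffs."""
    close_enough = math.dist(donor, acceptor) <= MAX_DA_DISTANCE
    straight_enough = angle_deg(donor, hydrogen, acceptor) >= MIN_DHA_ANGLE
    return close_enough and straight_enough

# Hypothetical coordinates: an O-H donor pointing straight at an acceptor...
donor, hydrogen = (0.0, 0.0, 0.0), (0.97, 0.0, 0.0)
print(is_hbond(donor, hydrogen, (2.8, 0.0, 0.0)))  # True
# ...and the same distance, but the acceptor sits off to the side (~71 degrees).
print(is_hbond(donor, hydrogen, (0.0, 2.8, 0.0)))  # False
```

The second case is why a band alone is not always enough: the atoms can be close while the H points the wrong way, so a shake/wiggle of the sidechain is often needed too.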
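
To see why methylating N or O helps (step 3) and why primary amines and carboxylic acids hurt (step 5), here is a minimal Python sketch that adds up polar fragment contributions in the style of Ertl's TPSA method. The numbers are the published contributions for a few common N/O environments as I recall them, so treat them as an assumption; Foldit may compute TPSA differently, and the fragment labels are hypothetical shorthand that you count by hand.

```python
# Approximate Ertl-style TPSA contributions in square angstrom (assumed values).
TPSA_CONTRIBUTIONS = {
    "NH2": 26.02,         # primary amine, R-NH2
    "NH": 12.03,          # secondary amine, R-NH-R
    "N": 3.24,            # tertiary amine, R3N
    "n_aromatic": 12.89,  # pyridine-type ring nitrogen
    "=N-": 12.36,         # imine nitrogen
    "OH": 20.23,          # hydroxyl (also the OH of a carboxylic acid)
    "O_ether": 9.23,      # single-bonded O with two substituents (ether, ester)
    "=O": 17.07,          # carbonyl oxygen
}

def tpsa(fragment_counts):
    """Sum the contributions of hand-counted polar fragments."""
    total = sum(TPSA_CONTRIBUTIONS[f] * n for f, n in fragment_counts.items())
    return round(total, 2)

# Methylating an amine step by step lowers TPSA:
print(tpsa({"NH2": 1}))  # 26.02 (primary amine)
print(tpsa({"NH": 1}))   # 12.03 (N-methyl)
print(tpsa({"N": 1}))    # 3.24  (N,N-dimethyl)
# A carboxylic acid costs more than its methyl ester:
print(tpsa({"=O": 1, "OH": 1}))       # 37.3
print(tpsa({"=O": 1, "O_ether": 1}))  # 26.3
```

Carbon and halogen atoms contribute nothing in this scheme, which is why methyl groups and halogens are a "free" way to fill space as far as TPSA is concerned.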
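
The choice of linker in step 4 also changes the number of rotatable bonds. A common simple definition counts single, non-ring bonds between two non-terminal heavy atoms (real tools add extra rules, e.g. for amides, and the exact definition Foldit uses is not stated here). The Python sketch below finds ring bonds as the edges that are not graph "bridges" (Tarjan's low-link DFS; "bridge" here is the graph-theory term, not the chemical linker) and compares a direct biaryl link with a one-atom ether linker. The hand-built atom numbering is hypothetical and hydrogens are left implicit.

```python
from collections import defaultdict

def ring_bonds(n_atoms, bonds):
    """Indices of bonds that lie in a ring, i.e. that are not graph bridges."""
    adj = defaultdict(list)
    for i, (a, b, _order) in enumerate(bonds):
        adj[a].append((b, i))
        adj[b].append((a, i))
    disc, low, bridges = {}, {}, set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, edge in adj[u]:
            if edge == parent_edge:
                continue
            if v in disc:  # back edge: u can reach an earlier atom
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, edge)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # no way back around: not in a ring
                    bridges.add(edge)

    for atom in range(n_atoms):
        if atom not in disc:
            dfs(atom, None)
    return {i for i in range(len(bonds)) if i not in bridges}

def rotatable_bonds(n_atoms, bonds):
    """Count single, non-ring bonds between non-terminal heavy atoms."""
    degree = defaultdict(int)
    for a, b, _order in bonds:
        degree[a] += 1
        degree[b] += 1
    in_ring = ring_bonds(n_atoms, bonds)
    return sum(
        1
        for i, (a, b, order) in enumerate(bonds)
        if order == 1 and i not in in_ring and degree[a] > 1 and degree[b] > 1
    )

def benzene(start):
    """Six aromatic bonds (order 1.5) closing a ring of atoms start..start+5."""
    return [(start + k, start + (k + 1) % 6, 1.5) for k in range(6)]

# Biphenyl: ring A (atoms 0-5) bonded directly to ring B (atoms 6-11).
biphenyl = benzene(0) + benzene(6) + [(0, 6, 1)]
# Diphenyl ether: ring A (0-5), linking O (6), ring B (7-12).
diphenyl_ether = benzene(0) + benzene(7) + [(0, 6, 1), (6, 7, 1)]

print(rotatable_bonds(12, biphenyl))        # 1
print(rotatable_bonds(13, diphenyl_ether))  # 2
```

Each extra atom in the linker adds one rotatable bond, which is one reason a direct link or a fused/spiro system can be worth trying before a longer linker.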

Now it's just a repetition of step 5 until you reach the limit, which begins with the loss of bonus points. You can accept slightly reduced bonuses for rotatable bonds, TPSA and cLogP if you gain more than you lose. There can be one bad group (850 or 900), but really try to avoid them, because they're unstable and would probably not be used IRL. Keep in mind that you can add more than one cycle to another cycle, so you don't have to make the molecule "linear".
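
When you're juggling many trial compounds (step 4 mentions up to 50), it can help to make the rules of thumb from this guide explicit: keep torsion quality at 700 or above and allow at most one bad group, then take the best score. Here's a minimal Python sketch of that bookkeeping; the `Variant` fields, the made-up scores and the tie-break on bad groups are hypothetical choices, not anything Foldit exposes.

```python
from dataclasses import dataclass

# Rules of thumb from this guide (not hard game limits):
MIN_TORSION_QUALITY = 700  # "never go lower"
MAX_BAD_GROUPS = 1         # one can be tolerated, zero is better

@dataclass
class Variant:
    name: str              # hypothetical label for a trial compound
    score: float           # score you noted in the game
    torsion_quality: float
    bad_groups: int

def acceptable(v: Variant) -> bool:
    return v.torsion_quality >= MIN_TORSION_QUALITY and v.bad_groups <= MAX_BAD_GROUPS

def best_variant(variants):
    """Highest-scoring acceptable variant (fewer bad groups wins a tie), or None."""
    ok = [v for v in variants if acceptable(v)]
    return max(ok, key=lambda v: (v.score, -v.bad_groups), default=None)

# Made-up values, purely for illustration:
trials = [
    Variant("design 1", 11250.0, 820, 0),
    Variant("design 2", 11400.0, 650, 0),  # torsion quality too low
    Variant("design 3", 11300.0, 760, 1),
]
print(best_variant(trials).name)  # design 3
```

Note that "design 2" has the best raw score but is rejected by the torsion rule, which mirrors the advice to never trade torsion quality below 700 for points.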

After you've created a structure which isn't scoring any better, you should use recipes to maximise the score (note the score before running them!) OR delete large parts of the molecule and rebuild them (or even start from scratch).
For the last 2 puzzles I designed 2-3 totally different molecules because I simply started again from step 1. If you do so, keep in mind that your newly built molecule can still be improved by recipes, so compare it with the noted pre-recipe score of the old ones to estimate the gains.

Knowledge is power. The more you know about organic chemistry (OC), the easier it gets to expand the structure without running into bad groups, significant torsion problems and especially poor synthetic accessibility. But sooner or later you'll get a feel for what's good or bad even without understanding the chemistry behind it. (And sometimes it's just a little bit of luck ;D.)

Elfi Lv 1

Thank you, Floddi!

I find this very helpful. I have updated my view settings to follow your specifications, except for one: under View protein, I find the setting Binding site very helpful, as it only shows the area of the protein nearest to the ligand.

Floddi Lv 1

Thanks for the feedback!

I just compared the two settings in puzzle 2706, and it seems like it's only showing tweakable AAs while hiding the non-tweakable ones. This isn't a problem as long as you stay fully inside the suggested binding site, which I didn't in this puzzle ;D
So sometimes it's useful to consider parts of the static structure when you are extending your molecule. It seems like the visible areas of the protein don't grow with the molecule itself. (While "it only shows the area of the protein nearest to the ligand" seems to be right in most cases, there are a few exceptions.)

"Binding site" and "Cartoon Ligand" both show the important H atoms of the ligand (I think it's the same). "Binding site" can make it easier to see buried molecules (maybe use X-ray tunnel; personally I hate it) and gives a more compact view, but in some cases it can be a disadvantage (see above). The similarities outweigh the differences :)

rosie4loop Lv 1

Nice guide with clear formats!

I think those introduction puzzles in Foldit already taught amino acids, so this guide should be good for those who start to try more advanced puzzles.

From my experience it'd be even clearer if figures of the 3D structures of the functional groups mentioned in the guide were shown in the text. That would be particularly useful for players without a science background who still want to contribute to science by playing puzzles here.

I haven't had much time to play this year, but back when I was more active my step 1 was the same :)
In terms of visualization, for puzzles with bigger proteins the binding site view can be quite useful, and I agree that X-ray tunnel makes things harder, except for taking screenshots ;)

Again, thank you for sharing this comprehensive guide!

Floddi Lv 1

@rosie4loop I've considered contributing to the wiki, as the information is more accessible and better preserved there.
I didn't know you could add photos! I think I will make a step by step with a new puzzle to make it more beginner friendly :)

Some of the "good" functional groups are already accessible through the menu (Fragment selection), but I think bad groups are hard to comprehend without chemical knowledge.

Thanks for the feedback!

rosie4loop Lv 1

@Floddi Sounds great! Looking forward to the step-by-step guide!

In addition, for prerequisite knowledge like amino acids, or e.g. the common ADMET issues behind bad groups, you could point to the introduction puzzles, or post a link to the wiki or a LibreTexts page.

A few years ago I made step-by-step notes on this forum for the non-game version of Foldit, e.g. this post on custom electron density setup and this post on small molecule puzzle setup; see if this kind of outline is useful for your writing.

Floddi Lv 1

@rosie4loop Thanks for your guides! I had a quick look and will need to take my time to read them properly :)

The wiki is pretty thin when it comes to information about bad groups :/ Many players seem to struggle with them.
While the introduction puzzles are a nice way to learn a few basics, they're not enough.

rosie4loop Lv 1

@Floddi According to the posts by devs (https://fold.it/forum/posts/76093), bad groups usually include e.g. PAINS patterns plus some additional patterns, which may differ according to the setup or aims of different puzzles.

I have posted some links in my profile to several bioinformatics tools that detect commonly flagged patterns (also in the comments of the same discussion, under the dev's reply).

(Unfortunately I had to take the Jupyter notebook that demonstrates SMARTS detection offline for several reasons.)

Floddi Lv 1

@rosie4loop Thx for the info that bad groups depend on the puzzle! It's a pity that they aren't sharing which groups are considered bad. It might be useful to reference those PAINS patterns on the wiki. There are also chemically unstable or reactive groups which will impact the score negatively; I think those are also part of the predefined "bad group" list.

SwissADME also looks like a good database for the development of new structures. Thanks!

Elfi Lv 1

@Floddi, thanks for explaining why it may sometimes pay to use the full view over the binding site view. I'll look at it once in a while.

@rosie4loop, thx also for your posts on electron density, bad groups and ligand design.

You have both given me a lot to think about and study. :)

By the way, one thing I have wondered about but haven't found an explanation for yet is the starting ligand in each new medicine design round. Where does it come from? Is it the winner from the last round? I hadn't considered deleting it and starting my own before Floddi mentioned deleting part of it. I have rather been modifying it.