Converting general compounds into ones in Foldit's Compound Library:

Started by jeff101

jeff101 Lv 1

Another search I did began with my goal compound's SMILES code N#Cc3c(N)cc(O)c4c1C=NNCc1c2CC(=O)N=Cc2c34. I copied this code into the box above the molecule viewer at https://zinc20.docking.org/substances/home/ and then started removing things from inside the molecule viewer (use the red X button there). First I removed the -NH2, -OH, and cyano groups to end up with O=C1Cc4c(C=N1)c2ccccc2c3C=NNCc34. Then I replaced many things with *'s to get O=*1**4*(*=N1)*2*****2*3*=NN**34 (in the drawing, the *'s and R's are wildcards that stand for general atoms).
Then I replaced the acceptors O= and =N with [#7,#8] and the donor N (NH in the molecule viewer) with [#7,#8;!H0] to get [#7,#8]*1**4*(*[#7,#8]1)*2*****2*3*[#7,#8][#7,#8;!H0]**34. In the resulting drawing, L stands for an acceptor atom, ? stands for a general bond, and * can stand for a general atom or a donor atom. When I clicked on the SMARTS button, I was hoping for a list of compounds with 4 fused 6-rings containing at least 3 acceptors and 1 donor. Unfortunately, the search gave no hits.
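The two substitution steps above were done by hand in the molecule viewer, but for this particular string they can also be mimicked with plain text replacement. The Python sketch below is only an illustration: the function names are made up here, and the regular expressions assume a simple SMILES with no bracket atoms and no elements other than C, N, and O (plus a guard so that Cl is not mangled). It is not a general SMILES tool.

```python
import re

ACCEPTOR = "[#7,#8]"     # N or O
DONOR = "[#7,#8;!H0]"    # N or O carrying at least one H

def carbons_to_wildcards(smiles: str) -> str:
    """Replace every aliphatic (C) or aromatic (c) carbon with the * wildcard."""
    return re.sub(r"C(?!l)|c", "*", smiles)

def generalize_heteroatoms(pattern: str) -> str:
    """Turn 'O=' and '=N' into acceptors and any remaining 'N' into a donor."""
    def swap(match: re.Match) -> str:
        return DONOR if match.group() == "N" else ACCEPTOR
    return re.sub(r"O=|=N|N", swap, pattern)

stripped = "O=C1Cc4c(C=N1)c2ccccc2c3C=NNCc34"  # goal compound minus side groups
skeleton = carbons_to_wildcards(stripped)
query = generalize_heteroatoms(skeleton)
print(skeleton)
print(query)
```

For the stripped-down goal compound this reproduces the two patterns quoted above, but any other molecule should be checked in the viewer before trusting the result.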

jeff101 Lv 1

Next, I started removing the acceptor and donor groups one by one. Sometimes I got hits, like ZINC #'s 3850789, 95561635, 216739553, and 256055642, but for the search that interested me the most, I had removed all of the nitrogen groups to get [#7,#8]*1**4*(**1)*2*****2*3*****34 and https://zinc20.docking.org/substances/?sub_id-matches-sma=%5B%237%2C%238%5D*1**4*%28**1%29*2*****2*3*****34, which gave many hits. Some, like ZINC # 151847900, contained many Cl atoms. Others, like 161972224, had many long chains. Several were 6-way symmetric, like 2149617 with 6 -O-CH3 groups, 38229869 with 6 -OH groups, 72099670 with 6 -O-C(=O)-CH3 groups, and 1570009217 with 6 -NH2 groups. Others were more 2-way symmetric, like 35307227 with 2 -O-CH3 groups and 117800865 with 2 -OH groups. This search also gave me the sense that a design with 5 fused 6-rings shaped like a Y gave more options for having asymmetric side groups, and using a 5-ring at the tip of the Y gave even more options.
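The search URL above is just the SMARTS pattern with its special characters percent-encoded. A minimal Python sketch of that encoding follows. The function name is made up for illustration, and leaving * unencoded is an assumption based only on how the URLs in this thread look.

```python
from urllib.parse import quote

ZINC_SUBSTANCES = "https://zinc20.docking.org/substances/"

def smarts_search_url(smarts: str) -> str:
    """Build a substructure-search URL for a SMARTS pattern."""
    # Keep * literal (as in the URLs in this thread); percent-encode
    # characters such as [ ] # , ( ) ; and =.
    return ZINC_SUBSTANCES + "?sub_id-matches-sma=" + quote(smarts, safe="*")

url = smarts_search_url("[#7,#8]*1**4*(**1)*2*****2*3*****34")
print(url)
```

Pasting the printed URL into a browser should be equivalent to typing the pattern above the molecule viewer and clicking the SMARTS button.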

rosie4loop Lv 1

Apologies if my post was offensive; I never meant to say it's bad to use external sources. In the past, when I had time, I used them to play the puzzles. In real research I do use ZINC more than designing by hand.

I wasn't trying to show off; in fact I'm using highly similar designs in many puzzles because I limit the structures that can be used. It's just lucky that they work in this puzzle. Sometimes they score well, other times just average.

That's why I share just the logic, not my designs.

I was just trying to share the logic I used, which can also be applied in combination with external sources.

I do agree that external sources are useful for exploring ideas. This helps increase diversity.

rosie4loop Lv 1

I appreciate all the effort and the comprehensive examples, which give new ideas and make it easier to explore new substructures that contain, e.g., desirable donor and acceptor properties.

I was just trying to put another idea here: that IN ADDITION to exploring new structures from ZINC, this method can be combined with in-game strategies. E.g., search for scaffolds with matching donor and acceptor placement, and replace a substructure in the current design. Or, when combining two ring systems in a more synthetically friendly way, look for alternative rings on ZINC and see if the combination is in the library.

I'm not good at drug design at all, personally, whether I design by hand or screen a database. It's not related to a score in Foldit; whether I score poorly or well, I know I am not good at it, from my real-life experience. That's why I am always impressed by the players here, and I look forward to seeing how the creativity of the community can outperform those who limit our imagination too much.

rosie4loop Lv 1

That's why I say I always limit the structural diversity of my designs: I only use certain Lego pieces to put things together. Getting new Lego blocks from ZINC is often a good idea. I just tried to share a way to put the Lego pieces together.

rosie4loop Lv 1

I edited my first post; hopefully it is not as misleading as before. My bad if I used a misleading tone in the last version. I aimed to share an idea that I believe may be useful if we combine it with the other examples in this thread.

I always mentioned, also in the first version, that I "limit diversity of designs". I want to clarify that this is why I shared the logic: to see if anyone who found it useful could take it and create something much better.

jeff101 Lv 1

One thing I do when I get a large number of search results is to print what my web browser is showing into a pdf file. Then later I can look at the pdf file instead of re-running a particular search. Often these pdf files have many clickable links, like https://zinc20.docking.org/substances/ZINC000044930779/ for ZINC compound # 44930779, that take you directly to a ZINC compound's web page (using https://zinc20.docking.org/substances/ZINC44930779/ also works).
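If you keep a list of ZINC numbers instead of a pdf, the page links can be rebuilt from the numbers alone. The short Python sketch below assumes the long form of the ID is always "ZINC" followed by the number zero-padded to 12 digits. That is inferred from the single example above, not from any documentation, and the function name is hypothetical.

```python
def zinc_page_url(zinc_number: int) -> str:
    """Return a substance page URL, zero-padding the number to 12 digits."""
    # Assumption: 12-digit padding, as in ZINC000044930779 for # 44930779.
    return f"https://zinc20.docking.org/substances/ZINC{zinc_number:012d}/"

print(zinc_page_url(44930779))
```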

If a search gives too many results, you can add extra clauses to its URL to filter its results. For example, https://zinc20.docking.org/substances/?sub_id-matches=O=S=O lists many compounds containing SO2 groups, while https://zinc20.docking.org/substances/?sub_id-matches=O=S=O&mwt-ge=193 lists only the 12 with molecular weight (mwt) >= 193. It seems like the first clause always starts with ? while later clauses always start with &. You can add multiple clauses, like in https://zinc20.docking.org/substances/?sub_id-matches=O=S=O&sub_id-matches=CN&mwt-gt=160&mwt-lt=180 that finds certain compounds with mwt between 160 and 180, but it is also possible to add too many clauses.
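The ? and & pattern described above can be sketched in a few lines of Python. The helper name is made up for illustration, and the clause names (sub_id-matches, mwt-gt, mwt-lt) are simply the ones seen in the URLs above. The values are joined as-is, without percent-encoding, which mirrors these simple examples but would not be safe for patterns containing characters like # or &.

```python
BASE = "https://zinc20.docking.org/substances/"

def search_url(clauses: list[tuple[str, str]]) -> str:
    """Join (name, value) clauses: ? before the first, & before each later one."""
    if not clauses:
        return BASE
    return BASE + "?" + "&".join(f"{name}={value}" for name, value in clauses)

url = search_url([
    ("sub_id-matches", "O=S=O"),  # must contain an SO2 group
    ("sub_id-matches", "CN"),     # and a C-N fragment
    ("mwt-gt", "160"),            # molecular weight > 160
    ("mwt-lt", "180"),            # molecular weight < 180
])
print(url)
```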

You also don't really need to know how to code in SMILES to use the ZINC20 site. The molecule viewer at https://zinc20.docking.org/substances/home/ lets you draw a compound and then convert that drawing into SMILES code. You can also enter SMILES code, SMARTS code, or a ZINC compound # in the box above the molecule image box, and it will draw an appropriate structure for you.

Below are some sites that give more details about the ZINC database, SMILES code, and SMARTS code. This list is by no means comprehensive. What I've learned has come from many web searches and trial & error.
https://zinc20.docking.org/substances/help/ (its Hybrids section lists properties like hba, logp, mwt, tpsa, etc.)
https://www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
https://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html
https://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html

jeff101 Lv 1

After reviewing the sites https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html and https://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html listed above, I tried something new. In SMARTS, the . symbol seems to let you find molecules containing multiple groups. For example, if you want compounds containing 16 or more chlorine atoms, go to https://zinc20.docking.org/substances/home/ and put Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl in the blank above the molecule viewer. You'll see 16 ClH molecules appear inside the molecule viewer. If you then click on the SMARTS button, you get https://zinc20.docking.org/substances/?sub_id-matches-sma=Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl.Cl, which lists 12 compounds each with 16 or more chlorine atoms.
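Typing the same fragment 16 times is error-prone, so a repeated pattern like the one above can be generated instead. This is a small Python sketch with a made-up function name. It only builds the text of the pattern and does not query ZINC.

```python
def at_least_n(fragment: str, n: int) -> str:
    """Repeat a SMARTS fragment n times, separated by the . symbol."""
    return ".".join([fragment] * n)

pattern = at_least_n("Cl", 16)
print(pattern)
```

The same helper works for other counts or fragments, such as at_least_n("Cl", 8).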

The . symbol allows other things. For example, with *(C#N).*(N).*=N-N*.*=N*(=O)*.*(O) above the molecule viewer at https://zinc20.docking.org/substances/home/, it shows 5 molecules. If you then click on the SMARTS button, it gives https://zinc20.docking.org/substances/?sub_id-matches-sma=*%28C%23N%29.*%28N%29.*%3DN-N*.*%3DN*%28%3DO%29*.*%28O%29, which lists 4 molecules containing all 5 groups, like ZINC # 1614990562. Note that the -NH2 group in that compound isn't so obvious. I think its N appears at one vertex of the pentagonal ring.

jeff101 Lv 1

Going back some, the goal compound N#Cc3c(N)cc(O)c4c1C=NNCc1c2CC(=O)N=Cc2c34 from the start of this thread contains 5 N atoms (3 with 0 H's, 1 with 1 H, & 1 with 2 H's) and 2 O atoms (1 with 0 H's & 1 with 1 H). Using the . symbol in SMARTS, one can enumerate these several ways. The literal way is [#7;H0].[#7;H0].[#7;H0].[#7;H1].[#7;H2].[#8;H0].[#8;H1], where #7 stands for N, #8 stands for O, H0 stands for no H's, H1 for 1 H, and H2 for 2 H's. Another way is to let N's & O's be used interchangeably, as in [#7,#8;H0].[#7,#8;H0].[#7,#8;H0].[#7,#8;H0].[#7,#8;H1].[#7,#8;H1].[#7,#8;H2]. Another way lets the original -NH2 be replaced by -NH & -NH, -NH & -OH, or -OH & -OH, as in [#7,#8;H0].[#7,#8;H0].[#7,#8;H0].[#7,#8;H0].[#7,#8;H1].[#7,#8;H1].[#7,#8;H1].[#7,#8;H1]. Another way tries to keep nearby groups together, as in [#7,#8;H0]*[#7,#8;H0].[#7,#8;H0].[#7,#8;H0][#7,#8;H1].[#7,#8;H1].[#7,#8;H2] or [#7,#8;H0]*[#7,#8;H0].[#7,#8;H0]***([#7,#8;H2])**[#7,#8;H1].[#7,#8;H0][#7,#8;H1]. Which way to choose depends on what groups matter most to you and what compounds each way gives. Usually some compromise is needed.
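The first two enumerations above (the literal one and the one with N and O interchangeable) follow directly from the atom counts, so they can be generated rather than typed. The Python sketch below uses a hypothetical helper that takes (atom spec, H count, how many) triples. It covers only the dot-separated forms, not the variants that keep nearby groups together.

```python
def dotted_smarts(groups):
    """Build a dot-separated SMARTS from (atom_spec, h_count, how_many) triples."""
    parts = []
    for atom_spec, h_count, how_many in groups:
        # e.g. ("#7", 0, 3) contributes "[#7;H0]" three times
        parts.extend([f"[{atom_spec};H{h_count}]"] * how_many)
    return ".".join(parts)

# Literal counts for the goal compound: 5 N (3 H0, 1 H1, 1 H2) and 2 O (1 H0, 1 H1)
literal = dotted_smarts([("#7", 0, 3), ("#7", 1, 1), ("#7", 2, 1),
                         ("#8", 0, 1), ("#8", 1, 1)])

# N and O used interchangeably: 4 with H0, 2 with H1, 1 with H2
mixed = dotted_smarts([("#7,#8", 0, 4), ("#7,#8", 1, 2), ("#7,#8", 2, 1)])

print(literal)
print(mixed)
```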

Going with the last combination above, [#7,#8;H0]*[#7,#8;H0].[#7,#8;H0]***([#7,#8;H2])**[#7,#8;H1].[#7,#8;H0][#7,#8;H1], clicking on the SMARTS button gives https://zinc20.docking.org/substances/?sub_id-matches-sma=%5B%237%2C%238%3BH0%5D*%5B%237%2C%238%3BH0%5D.%5B%237%2C%238%3BH0%5D***%28%5B%237%2C%238%3BH2%5D%29**%5B%237%2C%238%3BH1%5D.%5B%237%2C%238%3BH0%5D%5B%237%2C%238%3BH1%5D with at least 100 hits. If your ligand is for a puzzle like 2419 that wants 4 or fewer rotatable bonds (rb), you can add the clause &rb-le=4 (rb <= 4) or &rb-lt=5 (rb < 5) to the search to get https://zinc20.docking.org/substances/?sub_id-matches-sma=%5B%237%2C%238%3BH0%5D*%5B%237%2C%238%3BH0%5D.%5B%237%2C%238%3BH0%5D***%28%5B%237%2C%238%3BH2%5D%29**%5B%237%2C%238%3BH1%5D.%5B%237%2C%238%3BH0%5D%5B%237%2C%238%3BH1%5D&rb-le=4 which gives only 71 hits. Two of these hits are ZINC #'s 9144826 and 1578205461. In ZINC # 9144826, the goal compound's cyano, -NH2, -OH part is replaced by an N in the lower left pentagon, the nearby -NH2, and the nearby -NH-. Meanwhile, the goal compound's =N-NH- part is replaced by =N-OH, and its =N-C=O part is replaced by the =N-O-N= in the lower right ring. Similarly, in ZINC # 1578205461, the goal compound's cyano, -NH2, -OH part is replaced by the left ring's O, the nearby -NH2, and the nearby -NH-. Meanwhile, the goal compound's =N-NH- part is in the right ring, and its =N-C=O part is replaced by the =N-C-O- going to the right from the right ring. Neither of these ligands is a perfect match for the goal compound, but they both come close to its general topology. With a little twisting, they might fit where the goal compound would fit.