ligand queue interface and compound identification

Started by LociOiling

LociOiling Lv 1

The ligand queue feature is used in puzzle 2360 for the first time.

The ligand queue borrows its interface from a manual remix. Instead of displaying new positions for a few selected segments, the interface lets you browse a list of molecules. Each molecule is different from the others. The right and left arrow buttons let you navigate through the list.

As in the remix interface, you click "+" and then "+" again to save a molecule for further work. Any molecules you save this way end up in a quicksave slot. Up to eight quicksave slots are available.

Alternatively, just closing the ligand queue lets you do further work on the current molecule, skipping the complexity of dealing with the quicksaves.

The ligand queue also has some unique features. Instead of just having a numbered entry like the remix interface, each ligand queue entry also has a "ligand id", which identifies the molecule in SMILES (Simplified Molecular Input Line Entry System) format.

Here's what this looks like:

The SMILES string is "N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1". Unfortunately, there's no way to copy and paste that value. Instead, you have to transcribe it by hand. As I have learned the hard way, you have to be careful not to mistake the letter O (for oxygen) for a zero (0). Generally speaking, you'll see a lot of Os, and 0s should be rare to non-existent, since the digits in a SMILES string like this one mark ring closures, which are normally numbered starting from 1.
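Since hand transcription is error-prone, a quick syntax check can catch some slips before you paste the string into another tool. The sketch below assumes Python 3.9 or later; `check_transcribed_smiles` is a hypothetical helper, not part of Foldit or any library. It flags any digit 0 and any ring-closure label that never gets closed, which is exactly what happens when the O in "C4O" is typed as "C40". It only looks at syntax, so it can't catch a mistake that still produces valid SMILES.

```python
def check_transcribed_smiles(smiles: str) -> list[str]:
    """Return warnings about likely hand-transcription slips in a SMILES string.

    Only checks syntax (brackets, branches, ring-closure pairing and the digit 0);
    it does not check valences or whether the molecule makes chemical sense.
    """
    warnings = []
    open_rings = {}  # ring-closure label -> position where that ring was opened
    depth = 0        # current branch depth from "(" and ")"
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms such as [C@@H]; digits inside them are not ring labels.
            end = smiles.find("]", i)
            if end == -1:
                warnings.append(f"unclosed '[' at position {i}")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                warnings.append(f"unmatched ')' at position {i}")
                depth = 0
        elif ch.isdigit() or ch == "%":
            if ch == "%":
                label, width = smiles[i + 1:i + 3], 3  # two-digit label such as %10
            else:
                label, width = ch, 1
            if ch == "0":
                warnings.append(f"digit '0' at position {i}: possibly the letter O (oxygen)")
            if label in open_rings:
                del open_rings[label]   # second occurrence closes the ring
            else:
                open_rings[label] = i   # first occurrence opens it
            i += width
            continue
        i += 1
    if depth > 0:
        warnings.append(f"{depth} unclosed '(' branch(es)")
    for label, pos in open_rings.items():
        warnings.append(f"ring-closure label {label} opened at position {pos} is never closed")
    return warnings
```

For example, the correct string "N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1" returns an empty list, while the mistyped "N#Cc1ccc(N2CC3CC4CC3C2C40)c(F)c1" returns two warnings: one for the 0 and one for the ring it appears to open.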

LociOiling Lv 1

Once you have a SMILES string, you can see what your compound looks like in other tools.

Without the ligand queue halo, here's what the compound from the previous post looks like in Foldit:

Here's what the compound looks like in Jmol, a molecule viewer:

Jmol reveals the chemical formula, C15H15FN2O. Jmol also identifies each atom when you hover over it.

The Jmol pose has the ring with the fluorine atom rotated about 180 degrees relative to the Foldit pose. Jmol doesn't let you refine the structure in the way that Foldit does.

LociOiling Lv 1

Many online tools can also handle SMILES. This tool at cheminfo.org can generate a schematic molecule diagram from a SMILES string, and also lets you draw the schematic to get the SMILES.

This tool also validates SMILES, and its output points out a limitation of SMILES: there can be more than one SMILES string for a given molecule.

Here, Foldit used "N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1" as the SMILES string. When I pasted that into the SMILES checker (under "paste a lit [sic] of SMILES on the right"), the tool gave the SMILES string as "N#Cc4ccc(N2CC1CC3CC1C2C3O)c(F)c4" (under "SMILES code", on the left). The two strings differ only in how the ring closures are numbered. The two diagrams are slightly different. The diagram on the left explicitly identifies the carbon that's triple-bonded to nitrogen. The diagram on the right doesn't explicitly label this carbon atom, but the structure is the same.
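Because these two particular strings differ only in ring-closure numbering, relabeling the ring closures in a consistent order is enough to make them match. Here's a minimal Python sketch of that idea; `renumber_ring_closures` is a hypothetical helper, and it gives each newly opened ring the lowest number not currently in use. It does not handle the other ways two SMILES for one molecule can differ, such as starting from a different atom or visiting branches in a different order. For a general comparison, a cheminformatics toolkit's canonical SMILES (for example RDKit's `Chem.MolToSmiles`) is the usual approach.

```python
def renumber_ring_closures(smiles: str) -> str:
    """Relabel ring closures so each new ring gets the lowest label not currently open.

    Two SMILES strings that differ only in ring-closure numbering come out identical.
    Bracket atoms such as [C@@H] are copied through unchanged.
    """
    out = []
    mapping = {}    # original label -> new label, for rings that are currently open
    in_use = set()  # new labels belonging to currently open rings
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.index("]", i)
            out.append(smiles[i:end + 1])
            i = end + 1
            continue
        if ch.isdigit() or ch == "%":
            if ch == "%":
                label, i = smiles[i + 1:i + 3], i + 3  # two-digit label such as %12
            else:
                label, i = ch, i + 1
            if label in mapping:
                # Closing an open ring: reuse its new label, then free that label.
                new = mapping.pop(label)
                in_use.discard(new)
            else:
                # Opening a ring: take the lowest label not in use.
                new = 1
                while new in in_use:
                    new += 1
                mapping[label] = new
                in_use.add(new)
            out.append(str(new) if new < 10 else f"%{new:02d}")
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

Run on the checker's version, "N#Cc4ccc(N2CC1CC3CC1C2C3O)c(F)c4", this returns "N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1", the same string Foldit shows.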

LociOiling Lv 1

SMILES also appear in log.txt, but there are a few issues.

The first issue is that log.txt is buffered, so changes don't get written to disk immediately. Messages containing the SMILES for the most recent compounds are sitting in Foldit's memory, and you won't find them in log.txt immediately. Closing Foldit forces the messages to be written to log.txt.

The second issue is that it may be hard to identify a given compound in log.txt. Foldit assigns each compound an internal name like "LG_5006", which appears in messages, but that name may change when you run Foldit again. Names like "LG_5006" don't appear anywhere in the game, so there's no good way to correlate the messages with what's happening in the game.

The third issue is that log.txt messages use a more complete version of SMILES, one which identifies all hydrogens. This results in a much longer SMILES string, and makes it more difficult to compare two different SMILES.

The section of log.txt shown below illustrates these issues.

For this example, I closed Foldit while the compound shown in my previous posts was loaded. Then I opened Foldit again, and closed it again almost immediately. In theory, that means Foldit had to generate only one compound.

The messages in question look like this one, the first in a series of messages:

library…interactive.rosetta_util.ligand.ThreadedRDKitRotamerLibrarySpecification: {0} It took 0.342s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21

I search for "conformers" to find these messages.

That first message actually isn't my compound. My compound contains a fluorine atom, and there's no F in that message.

Foldit then proceeds to generate that same compound, LG_5006, five more times. I have no idea why.

Finally, Foldit generates my compound, which turns out to be called LG_50032. With all hydrogens included, its SMILES string is:

[H]O[C@]1([H])[C@@]2([H])N(c3c([H])c([H])c(C#N)c([H])c3F)C([H])([H])[C@@]3([H])C([H])([H])[C@]1([H])C([H])([H])[C@]32[H]

That's quite a bit longer than the hydrogen-free version shown on the ligand queue entry:

N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1

Unless you are really a SMILES expert, it's a little difficult to tell that those two strings represent the same compound. I loaded them both into Jmol, where they do appear to be the same.
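A rough cross-check that doesn't need Jmol is to count the atoms in each string. The Python sketch below uses hypothetical helper names and handles only organic-subset atoms and simple bracket atoms. It counts only the atoms actually written in the string and does not add implicit hydrogens, so it gives the full formula for the all-hydrogen log.txt version but only the heavy atoms for the ligand queue version; the comparison therefore leaves hydrogen out. Matching counts are necessary but not sufficient: isomers share a formula, so this can rule out a mismatch but can't prove two strings describe the same compound.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter organic-subset atoms, then one-letter ones.
# Bonds, ring digits, parentheses and other punctuation don't match and are skipped.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope, element symbol, optional chirality, optional H count.
BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z][a-z]?|\*)@*(?:H(\d*))?")


def count_atoms(smiles: str) -> Counter:
    """Count the atoms written in a SMILES string (implicit hydrogens are NOT added)."""
    counts = Counter()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            symbol, h_count = BRACKET_ATOM.match(token).groups()
            counts[symbol.capitalize()] += 1
            if h_count is not None:  # e.g. [NH3+] carries 3 hydrogens
                counts["H"] += int(h_count) if h_count else 1
        else:
            counts[token.capitalize()] += 1  # aromatic "c" counts as carbon, etc.
    return counts


def hill_formula(counts: Counter) -> str:
    """Format atom counts in Hill order: C, then H, then the rest alphabetically."""
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


if __name__ == "__main__":
    queue_smiles = "N#Cc1ccc(N2CC3CC4CC3C2C4O)c(F)c1"
    log_smiles = ("[H]O[C@]1([H])[C@@]2([H])N(c3c([H])c([H])c(C#N)c([H])c3F)"
                  "C([H])([H])[C@@]3([H])C([H])([H])[C@]1([H])C([H])([H])[C@]32[H]")
    print(hill_formula(count_atoms(log_smiles)))  # → C15H15FN2O
    heavy_log = {el: n for el, n in count_atoms(log_smiles).items() if el != "H"}
    print(heavy_log == dict(count_atoms(queue_smiles)))  # → True
```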

The complete set of "conformer" messages is shown below, with the long class names abbreviated:

i.a.a.lhu: {0} building library…i.r_u.l.TRRLS: {0} It took 0.342s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.336s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.34s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.328s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.331s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.336s to generate 15 conformers for LG_5006: [H]c1c(C([H])([H])[H])noc1C([H])([H])n1c([H])nc2c(N([H])[H])nc(N([H])[H])nc21
i.r_u.l.TRRLS: {0} It took 0.777s to generate 15 conformers for LG_50032: [H]O[C@]1([H])[C@@]2([H])N(c3c([H])c([H])c(C#N)c([H])c3F)C([H])([H])[C@@]3([H])C([H])([H])[C@]1([H])C([H])([H])[C@]32[H]
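Searching log.txt for "conformers" by hand gets tedious once there are many messages. The Python sketch below groups the messages by Foldit's internal name, which makes repeats like the six LG_5006 messages easy to spot. The regular expression is based only on the sample lines above and may need adjusting for other Foldit versions; the function names are hypothetical. Remember to close Foldit first so the buffered messages actually reach the file.

```python
import re

# Matches the tail of the "conformers" messages quoted above. The wording is copied
# from those sample lines; other Foldit versions may phrase the message differently.
CONFORMER_MSG = re.compile(r"conformers for (?P<name>LG_\d+): (?P<smiles>\S+)")


def summarize_conformer_messages(lines):
    """Group conformer messages by Foldit's internal ligand name (e.g. LG_5006).

    Returns a dict, in order of first appearance, mapping each name to its
    all-hydrogen SMILES and the number of times a message for it was logged.
    """
    summary = {}
    for line in lines:
        match = CONFORMER_MSG.search(line)
        if match is None:
            continue  # not a conformer message
        entry = summary.setdefault(match["name"], {"smiles": match["smiles"], "times": 0})
        entry["times"] += 1
    return summary


def summarize_log_file(path):
    """Summarize a log.txt file (close Foldit first so buffered messages are written)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return summarize_conformer_messages(log)
```

Run against a log containing the seven messages above, `summarize_log_file` would report LG_5006 six times and LG_50032 once. The location of log.txt depends on your Foldit installation, and since the LG_ names may change between runs, the grouping is only meaningful within one session's messages.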

You'll also see even more "conformer" messages when you load compound library results. Since the compound library doesn't show you a SMILES string, you'll be stuck with the long SMILES from log.txt.