Compound Library similarity improvements

Started March 04, 2023 by nspc

nspc Lv 1

March 04, 2023

The "Design Ligand" tool is very powerful, but most of the difficulty is now in the Compound Library.
In my experience, it often gives results with very low similarity (0.2), and not always the kind of similarity I want.
Trying to dock those new ligands in different positions is rarely possible, or very difficult (for me).

Some possible improvements:

-The player can select parts of the ligand that they want to keep in the Compound Library results (parts that make good bonds, for example).
-The player can select atoms where new parts may be added in the Compound Library results. This could be useful for starting with a small ligand and having the Compound Library return similar, larger ligands that don't create clashes when loaded.

This could be a checkbox option that we enable before uploading, and/or something automated.

For example, the Compound Library could automatically try to keep parts that score well (packing / bonding),
and look for ligand variants with new parts only where those new parts are not near the protein.

spvincent Lv 1

March 04, 2023

I agree it would be nice to be able to 'lock' a part of the ligand that you wanted to keep. But I'm guessing it would make searching the compound library for matches significantly slower: maybe even impossible.

rmoretti Staff Lv 1

March 04, 2023

spvincent is correct that the current algorithm we use for searching the compound library does not allow us to specify sub-regions of the ligand to keep fixed. The similarity metric happens at the global scale.

But we'll keep the recommendation in mind, in case we run across a compound search method which might allow us more control in that regard.

spvincent Lv 1

March 07, 2023

I think it would be good to have candidate replacement compounds identified by some kind of alphanumeric label. The current stick diagrams aren't easy to remember: what seems to happen is that, on accepting a new compound from library A, making a few changes, and resubmitting it as library B, you end up with compounds similar to those in A, plus some duplicates that are hard to identify from the stick diagrams.
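
One way such a label could work is sketched below. This is only an illustration, not how Foldit identifies compounds: the function names `compound_label` and `find_duplicates` are hypothetical, and it assumes each compound is available as a SMILES string that is written the same way every time it appears (for example a canonical SMILES), since two different spellings of the same molecule would get different labels.

```python
import hashlib


def compound_label(smiles, length=6):
    """Return a short, stable alphanumeric label derived from a SMILES string.

    Assumption: the same compound always arrives as the same SMILES text.
    """
    digest = hashlib.sha1(smiles.encode("utf-8")).hexdigest()
    return digest[:length].upper()


def find_duplicates(library_a, library_b):
    """List the compounds in library_b whose label already appeared in library_a."""
    labels_a = {compound_label(s) for s in library_a}
    return [s for s in library_b if compound_label(s) in labels_a]


# Made-up example libraries (indole, ethanol, ethylamine).
library_a = ["c1ccc2[nH]ccc2c1", "CCO"]
library_b = ["CCO", "CCN"]
duplicates = find_duplicates(library_a, library_b)
```

With labels like these shown next to each stick diagram, a compound that reappears in library B could be recognised by its label instead of by eye.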

nspc Lv 1

March 21, 2023

In a Compound Library request, there are several different elements that are interesting to keep in a similar ligand result.
Before submitting, it would be useful to select the parts of the ligand we want to keep, and also what type of element we want to keep.

Those types could be:
-Base structure: I mean the same structure, but with different atoms if needed.
-Same atoms: keep a carbon to avoid BUNS, or a nitrogen to keep an interesting bond.
-Same rotation: some parts need to keep a very similar rotation if they have excellent packing. If we change too many atoms, sometimes the rotation can't be kept.

For the parts of the ligand we want to extend, there are some options we could choose too:
-Can add polar atoms (for a non-buried part).
-Can add more atoms here, but no polar atoms, or only carbon (for a buried part, where we don't want to add new BUNS).

This could be an options window where we choose all options before submitting to the Compound Library, but other approaches are possible.

HuubR Lv 1

April 08, 2023

According to the blog posts about the SARS-CoV-2 helicase and Nsp3 macrodomain CACHE Challenges, the Compound Library is based on the ZINC database. I tried looking around on that website, and one of the things that I found is that you can search for compounds containing a ring structure of your choice. Example: https://zinc20.docking.org/rings/indole/substances/ will give you a number of compounds with an indole group.

Now I thought this would be a way to keep the heart of my ligand the same (if it is a ring structure, and provided I can find the name for it), by using the ZINC database to find compounds that are built around that specific structure. But apparently I am missing something here. All of the compounds that I found in this way, and then submitted to the Compound Library objective, were reportedly not in the Compound Library.

Could it be that I am looking at the wrong database? When I look at the subsets that ZINC has, I see fairly large numbers of compounds that are for sale (under the heading Availability), in different categories for delivery times and prices, but only a "small" number not-for-sale (the last of these subsets): just over two million. Even the total of all subsets is nowhere near the 20 to 30 billion that is mentioned in the blog posts.

rmoretti Staff Lv 1

April 10, 2023

The ZINC database has various different subsets. The traditional subset aggregates available structures from a bunch of suppliers. However, the ZINC search API allows us to search different subsets, including specifically the Enamine REAL set. (Which includes compounds which aren't in the traditional set, but also is missing some compounds which are.) For various logistical and legal reasons, the full 20+ billion REAL set isn't available to search by default - we've talked with John Irwin and he's given us access to a version of the API which does the search over the larger dataset.

As I understand it, the ring functionality isn't a "search" per se, but rather it's a pre-filtering of the compounds. That is, they take a particular subset of compounds and then (offline) pre-process them through a ring-search functionality, then package that up with a convenient front-end. This doesn't exactly play well with the large Enamine set, and with the particular API we're using to search. (The ring exploration is a different API than the "Small World" substructure search we're using to search the Enamine REAL set.)

rosie4loop Lv 1

May 14, 2023

I think it would be nice if we could apply a chemical filter in the library search, maybe after the search for a cleaner display, if it's difficult to do before the search. For example:

Filtering compounds with bad groups. In a recent puzzle, I got 70 hits, about 30 of which had bad groups, which makes it inconvenient to look for what I want.
Custom filters, e.g. using simple SMILES/SMARTS input, if it's too much work to implement a GUI filtering tool
- To get rid of compounds with unwanted groups
- Maybe a similar filter could also be used to display only compounds with the functional group chosen by the player, as a kind of workaround for fixing a fragment in the search?

rosie4loop Lv 1

May 14, 2023

In the post above, I assume Foldit does the following in the library search, based on the observed behavior of the tool. So I guess it may be possible to filter ligands after the search, for display. Please correct me if that's not the case.

1. Submit the compound to the Small World search of ZINC as SMILES or SMARTS, which does not preserve the original 3D conformation.
2. Download the hit compounds from ZINC, in either 2D or 3D form, with different isomers/conformers.
2.1. If the compound from ZINC is in 2D, e.g. 2D-SDF or isomeric SMILES, generate the 3D structure within Foldit.
2.2. Before or after step 3, protonate the ligand assuming physiological pH, within Foldit.
3. Align the hit compounds onto the player's design as the player selects each one in the results window.
3.1. Also calculate the similarity score in this step and sort the compounds by that score.

I also have questions about step (3), if my guess is correct. What method does Foldit use for ligand structure alignment? Is it a pharmacophore alignment or a shape alignment? Similar to the suggestions by other users (compound-library-3d-preview-improvements and rank-similarity-in-compound-window-based-submitted-shape), it would be nice if the alignment method could be improved, e.g.

allow the player to choose the alignment method,
allow the player to do pairwise alignment like in PyMOL.

rmoretti Staff Lv 1

May 15, 2023

You have the basic flow of the process down. The search results are downloaded as SMILES, and similarity is calculated on a 2D basis prior to alignment. (Currently using ECFP4/Morgan fingerprints.)
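
As a rough illustration of what a fingerprint-based 2D similarity score means, here is a minimal sketch. It assumes the fingerprints are compared with the Tanimoto coefficient, which is the usual choice for ECFP4/Morgan fingerprints but is not confirmed anywhere in this thread. The bit sets and hit names are invented for the example, and `tanimoto` and `rank_hits` are hypothetical helper names, not Foldit functions.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def rank_hits(query_bits, hits):
    """Score every hit against the query and sort from most to least similar."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in hits.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented on-bit sets standing in for real fingerprints.
query = {1, 5, 9, 12, 20}
hits = {
    "hit_a": {1, 5, 9, 12, 21},          # shares 4 of 6 distinct bits
    "hit_b": {1, 30, 31, 32, 33, 34},    # shares 1 of 10 distinct bits
}
ranking = rank_hits(query, hits)
```

Under this assumed measure, a score near 0.2 would mean that only about a fifth of the combined fingerprint features are shared between the two compounds. Because the score is global, it says nothing about which part of the ligand was preserved.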

The alignment method we're currently using is maximum common substructure. Improving the alignment is something that's high on the priority list, as we're aware it's a limiting factor.