Can the Monte Carlo method used in Mutate be modified?

Started December 15, 2020 by BootsMcGraw

BootsMcGraw Lv 1

December 15, 2020

I was told that the Mutate algorithm for Mutate uses the Monte Carlo method to determine the best sidechain substitutions.

Monte Carlo is random and computationally inefficient. For example, to determine the best choice of four interacting sidechains, we would need to calculate the interactions for 20^4, or 160,000 combinations. With five sidechains, this increases to 20^5, or 3.2 million combinations. This is why Mutate is the slowest function we see in this game.

The main reason Foldit exists is because we, the lowly humans, can look at the structure of a protein and automatically determine if it's "not right" or "possibly good", eliminating untold numbers of useless (the "not right") paths on which to make computations towards the actual structure.

Can Foldit's Monte Carlo algorithm for Mutate be improved in a similar way, to eliminate "not right" substitutions to try? I would think, depending on the components of the subscores, that it would be possible, and depending on the importance/weight/rank of the subscore, that some sidechain substitutions could be immediately eliminated from the calculations.

This may not be the best example, but let's say that the "packing" subscore is weighted the highest among subscores. (I have no idea if that's true.) I have a leucine in a certain position in the core when the Mutate function is invoked. There is no point in seeing if glycine, alanine, serine, or valine would be a better substitution, as they are all smaller than leucine, and would only decrease the packing score. Eliminating these four sidechains reduces the number of calculations to make from 20^x down to 16^x.

I am certain the response is going to be "it's more complicated than that", but with ever-increasing protein sizes bogging down our laptops and desktops, our user experience and our productivity will continually decrease until the Mutate function is made more efficient.

rmoretti Staff Lv 1

December 15, 2020

Mutate does indeed use Monte Carlo. (It's the same algorithm that's used in Shake, it just considers rotamers from multiple amino acid identities, not just the one that's currently there.)

Specifically, it uses an approach called Metropolis Monte Carlo Simulated Annealing. The upshot is that, while random, it doesn't actually have to consider all combinations. Instead, what it does is do a random walk across the energy landscape, using information about how it's doing currently to inform where it needs to go.

You can make an analogy of someone with their eyes closed trying to climb a mountain. You make a step in a random direction, and if that's going uphill, you move your "starting point" to that spot and repeat. If it's downhill, you move back to where you were, mostly. The issue is that never-downhill will leave you stuck on small humps instead of climbing upward. To combat that, the Metropolis criteria allows you to go slightly downhill sometimes, in the hope of crossing the valley and finding the bigger hill that's across the way. (The "simulated annealing" bit refers to the fact that how big a step downhill you're allowed to take changes over time - large in the beginning, but progressively smaller as the process progresses.)

From the analogy, you can see that while you'll stumble around the mountain a bit, you won't have to cover all of it. Once you get pointed uphill, you'll reach the top relatively quickly. In fact, if you do it right, you can prove mathematically that (in the limit) you'll spend time in each state in proportion to their score. (So, very little time in bad scoring states, and most of your time in the good scoring states.)

The algorithm used is the same as that used by the Baker lab (and the rest of the Rosetta community) to do their design runs. (Often there's more complications added on top with respect to optimizations, but the actual mutations are found by this algorithm.)

The concept of pruning the number of possibilities is a good one, and we do indeed attempt to remove rotamers/identities which will likely never work. Most often this comes in the form of removing rotamers which clash with the backbone (or other fixed residues), as they will likely never form part of the productive final set. But attempting to do more pruning tends not to be productive, as (outside of those major clashes) the energies of the different rotamers tend to be either similar or context dependent. (For example, a particular rotamer could be bad, except if it happens to make a hydrogen bond with a nearby residue, in which case it's very good.)

There's various approaches such as "Dead End Elimination" which attempt to use various logic-based arguments to narrow down which rotamers (and combination of rotamers) are present in the best-scoring structure. Some work of combining DEE with Rosetta design has been made, but in practice what you see is that you spend about as much time logic-ing through the combinations as you save by the pruning, so there's not much gain.

The score is indeed a weighted combination of subscores, with known weights, but looking at subscores (rather than the full score) isn't likely to get you any benefits. (Aside from the already existing clash pruning.) Subscore energies tend to be compensatory, so you'll get rotamers with a marginally worse hiding score which enables better hydrogen bonding, or those which have worse electrostatic scores to get better packing scores. You can't uniformly pin the best residue identity on a single subscore - how the subscores combine to give you the best amino acid/conformer at that position is highly context dependent – especially when you have multiple residues which can mutate. (In your hypothetical, you might indeed want to be able to mutate the leucine to an alanine if some other position can then mutate to a tyrosine to fill in the packing gap and also make a hydrogen bond. And there's no real way to figure that out prior to considering it as a possibility.)

BootsMcGraw Lv 1

December 16, 2020

It's more complicated than that. Can I call it? Or can I call it? :-)

Thank you for that in-depth explanation. It sounds like the algorithm used in the Mutate function does some of what I hoped could be done, is not 100% brute force, and in fact, is more efficient than a pure Monte Carlo method.

Formula350 Lv 1

December 23, 2020

Just to reiterate my point: giving us the option (that I think Foldit Standalone has) of being able to Blacklist certain AA's and Whitelist between Only-Philics or Only-Phobics, would also accomplish a lot of what Boots is hoping for.

For one, it literally does what he's ask, but whittling down the list of candidates. However, just as importantly, it allows for our human brain to intervene by saying "I only want an Aromatic here. If it just isn't going to work, then leave what's there!", or "LEU, ISO, VAL, SER, THR only, because once I wiggle, I want this area to pack in tighter! So I don't care if an ALA+TYR combo works better…"

At the moment, with it how it is, we're left with 2 options:
1) Only let Mutate work on one or two selected, to crudely force things to 'maybe' mutate how we want…
2) Use brute-force tactics with a recipe… and sure we might get what we're after, but only by spending 5x longer in the process, since that brute force has no way to simultaneously work on neighboring AAs like Mutate can. It's sadly 1 segment. 1 AA. Trial. Error. Trial. Error. Trial. Err… Move on to the next Segment. Trial. Error. Tr… You get the point lol
</br>
Of course, even then, "there's more to it than that", I know. That feature of Standalone is technically already being used by puzzles to dictate when we can or can't use certain AAs based on structure, or the outright prevention of it being used (CYS). So another system would need to be piggybacked on top of that to respect the Puzzle's restriction first, and then our's. :)

Still, it's DEFINITELY a feature I feel is highly worth the effort and time it'd take to implement! I'm all for making things faster, particularly when it comes along with giving us more creative freedom… So I'll remain hopeful it'll be added! :D
-Form

zo3xiaJonWeinberg Lv 1

December 26, 2020

The AlphaGo paper explained that it beat the board game's Monte Carlo algorithm. AlphaGo was a neural network that used reinforcement learning using backpropagation. I don't understand backpropagation or the formula for gradient descent other than the fact that pi is just a variable and not 3.14 but I do understand neuro-evolution and the ReLU activation function to an extent, which can be used in a game like avoiding asteroids better than any hardcoded strategies though not better than a human. I am not sure whether neuroevolution is better than Monte-Carlo.