This latest round of design puzzles was set up to test a specific aspect of Foldit. It is a test for us as well as for the players. From prediction puzzles, we know players can find the target fold when a sequence is given; now we are wondering whether players can find the right structure AND the right sequence.
The players' task is to rebuild a beta strand and choose a new set of amino acids for it in the context of a partially ripped-open core, namely the three positions highlighted in the first figure. Our task is to figure out what needs to change in Foldit to help players distinguish desirable features.
The second figure below compares the native structure, a highly ranked player solution, and another player solution at roughly 1000th place. This puzzle is interesting because we know the native protein can be improved to be more stable. The target is protein G, a small 56-amino-acid protein used widely in protein biochemistry labs as a model system for studying proteins. Players generally do well and place the strand correctly, and it is clear that people come up with different solutions to the problem.

Now the question: the native protein has a small void at the end of the chain, and one of the best player solutions captures that. But is it actually a better structure than the 1000th-ranked solution, which uses a different strategy and pushes out the void? It is very hard to say one way or the other. From a designer's perspective, the structure with no void is better; looking more closely, it also has fewer buried unsatisfied polar atoms. Yet we know the native works, so whatever is closer to it has very good odds.

The third figure below compares the starting puzzle (dark grey), the native (magenta), and the two solutions discussed above (high-scoring in green, 1000th-ranked in blue).
It shows that the models differ only subtly, with the 1000th-ranked structure being slightly more compact. To really make the call, we'll have to test these sequences in the wet lab, which may happen in the future. Our task is to understand what's lacking in our ability to differentiate good designs from bad ones – knowing that the score isn't a perfect model of the real world.
For me, the biggest problem seems to be time.
Theoretically you are multiplying efforts; in practice, I would think it doubles the time needed. In any case, this is new stuff and we need to learn it, and you know how complicated protein folding is.
So it would probably be better to give players more time.
That said, please don't let puzzles end on the same day; there should be at least two days between them.
[quote]– knowing that the score isn't a perfect model of the real world.[/quote]
I think that last part might be an essential observation.
Any ideas on the problems that arise with the scoring system as it is currently used? (Meaning: do you know of any articles you can link to, or do you have any thoughts on the scoring system yourself? I'd like to learn more about that :) )
In addition to that, I'd like to agree with Lenn that the scheduling of puzzles should be a bit more consistent and predictable.
I don't set the timing for the puzzles, but I can see your point. I'll forward the message.
We intentionally make the first few design puzzles small (and relatively short in duration) so we can quickly see how we are doing with the tools. Are they sufficient, or too complicated? We also want to know whether we are setting up the puzzles correctly. Sometimes there are features we don't want people to break; we need to know we can limit the scope without directing players down the wrong path. This is almost like a "debugging" process; sorry if it wasn't as satisfying as it should have been. We'll try to do better.
Under the hood, the current scoring system is a complicated combination of many different energy terms, each meant to capture part of the physics involved. A good general description can be found in this review paper: http://depts.washington.edu/bakerpg/papers/18410248.pdf
The scoring system is parameterized such that the native fold gives the highest score. This is quite apparent in fold-prediction puzzles: if you fold the protein up correctly, then usually the closer you get to the native fold, the better your score. When the amino acids are allowed to change, things become a little more complicated. From analyses of experimental data, we can sometimes predict whether a mutation is stabilizing or destabilizing, but not yet with 100% reliability. Surface positions, for example, are extremely hard to design; their effects on proteins are important, but sometimes the algorithm will choose a residue for the wrong reason.
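To make that concrete, here is a minimal sketch of a score built as a weighted sum of energy terms, which is the general shape of this kind of scoring function. The term names, weights, and numbers below are purely illustrative placeholders, not the actual terms or parameters used by Foldit.

[code]
# Minimal sketch of a weighted-sum scoring function.
# All term names and weights are illustrative placeholders,
# NOT the real Foldit parameters.

ILLUSTRATIVE_WEIGHTS = {
    "van_der_waals": 1.0,    # packing quality / steric clashes
    "hydrogen_bonds": 1.2,   # backbone and side-chain H-bonds
    "solvation": 0.65,       # burial of hydrophobic vs. polar atoms
    "electrostatics": 0.5,   # charge-charge interactions
}

def total_energy(term_values):
    """Combine per-term energies (term name -> value) into one number.

    Lower energy is better; a game like Foldit can then display a
    transformed value so that a higher score is better.
    """
    return sum(ILLUSTRATIVE_WEIGHTS[name] * value
               for name, value in term_values.items())

# Two hypothetical models of the same protein: model A packs better,
# model B makes more hydrogen bonds. The weighted sum has to trade
# these effects off against each other.
model_a = {"van_der_waals": -210.0, "hydrogen_bonds": -55.0,
           "solvation": -30.0, "electrostatics": -12.0}
model_b = {"van_der_waals": -205.0, "hydrogen_bonds": -61.0,
           "solvation": -30.0, "electrostatics": -12.0}

print(total_energy(model_a))  # -301.5
print(total_energy(model_b))  # -303.7 (lower, so better)
[/code]

This also illustrates why design is harder than prediction: a single mutation can improve one term while quietly hurting another, and the final answer depends on how the weights balance them.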
In my opinion, the score in Foldit is one of the best available, but it can definitely be improved (and improving it is ongoing research). In my response to an earlier comment, I mentioned that we are also learning from these early design puzzles to make sure we are setting them up properly. This is exactly the reason: we know the score generally behaves well, but when setting up design puzzles, we need to make sure the quirky parts of the score don't mislead people. I hope this clarifies things a bit.
Thank you for the response and the link to that article, possu, very much appreciated! :)