The reason this happens is that an "iteration" of rebuild, as currently defined, isn't equal to finding a single match.
Right now, one iteration means rebuild tries 50 different positions. How many of those it actually accepts and lets you use depends essentially on which positions it happens to get that time. It might be 2, it might be 0, it might be 50 (usually somewhere around 1-3, from the sound of it).
The reason it works like this is that rebuild isn't guaranteed to finish if you require it to find x positions. Sometimes rebuild just can't find anything, and in that case it would run forever.
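To make the difference concrete, here is a minimal sketch of the two behaviours, assuming each attempt is accepted independently with some fixed probability. `try_position`, `rebuild_iteration`, `rebuild_until` and the 4% acceptance rate are all hypothetical stand-ins (the rate was picked only so that 50 tries yield about 2 hits on average); this is not Foldit's code.

```python
import random

def try_position(rng: random.Random, accept_prob: float) -> bool:
    """Hypothetical stand-in for one rebuild attempt: True if the position is accepted."""
    return rng.random() < accept_prob

def rebuild_iteration(rng: random.Random, accept_prob: float = 0.04, tries: int = 50) -> int:
    """One 'iteration' as currently defined: a fixed number of tries.
    Returns how many of them happened to be accepted (anywhere from 0 to tries)."""
    return sum(try_position(rng, accept_prob) for _ in range(tries))

def rebuild_until(rng: random.Random, wanted: int, accept_prob: float) -> int:
    """Naive 'find exactly `wanted` positions'. If accept_prob is 0 this never returns."""
    found = 0
    while found < wanted:
        if try_position(rng, accept_prob):
            found += 1
    return found

rng = random.Random(1)
# Five iterations of the same fixed-tries rebuild usually give different counts.
print([rebuild_iteration(rng) for _ in range(5)])
```

The fixed-tries version always returns, at the cost of an unpredictable count. The "until" version returns a predictable count, but only if acceptance is possible at all.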
In short, this isn't a bug - it's just how we've defined iterations in the context of rebuild.
However, I agree that it would be useful to be able to ask rebuild for x positions and get x positions.
Now, one possible way to address this: rebuild would take 2 arguments. The first would be the number of positions you want it to find. The second would be how many positions it tries in a row, counting from the last accepted position, before it gives up. So rebuild would make an effort to find the number of positions you want, but would give up if it can't seem to find anything. We could also have a default value (say, 50) for the second argument, so that old scripts stay backwards compatible.
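A minimal sketch of that two-argument idea, assuming an attempt can be modelled as a function returning True when the position is accepted. The name `rebuild_selected` and the `try_position` callback are hypothetical illustrations, not the game's API. The sketch returns the number of positions actually found, and it always terminates: every attempt either adds a position (at most `wanted` times) or moves the give-up counter closer to its limit.

```python
import random
from typing import Callable

def rebuild_selected(try_position: Callable[[], bool], wanted: int,
                     max_consecutive_fails: int = 50) -> int:
    """Hypothetical two-argument rebuild: keep trying until `wanted` positions
    are accepted, but give up once `max_consecutive_fails` attempts in a row
    have been rejected. Returns how many positions were actually found."""
    found = 0
    fails = 0  # rejections since the last accepted position
    while found < wanted and fails < max_consecutive_fails:
        if try_position():
            found += 1
            fails = 0  # an accepted position resets the give-up counter
        else:
            fails += 1
    return found

# Usage with a random stand-in that accepts roughly 4% of attempts.
rng = random.Random(0)
print(rebuild_selected(lambda: rng.random() < 0.04, wanted=5))
# A rebuild that never succeeds still stops after 50 straight rejections and returns 0.
print(rebuild_selected(lambda: False, wanted=5))
```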
Cool!
2 arguments, the 2nd optional and defaulting to exactly what it does now :)
To keep the numbers smaller, maybe we could treat the second argument as a multiplier?
i.e.:
rebuild(1,2) - will make up to 1 new position in up to 100 tries
rebuild(2,1) - up to 2 positions in max 50 tries
rebuild(1) - 1 new position in 50 tries
rebuild(30) - 30 new positions in 50 tries? Or in 30*50 tries? Because (30,1) should make up to 30 in 50 tries.
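For comparison, here is a sketch of the multiplier reading, in which the second argument sets a total budget of multiplier × 50 attempts rather than a count of consecutive failures. `rebuild_with_multiplier` and `try_position` are hypothetical names. Under this reading an early run of successes does not buy any extra tries. rebuild(30) with the default multiplier of 1 gets only 50 attempts in total, which is exactly the ambiguity raised in the last example.

```python
from typing import Callable

BASE_TRIES = 50  # the current fixed number of tries per iteration

def rebuild_with_multiplier(try_position: Callable[[], bool], wanted: int,
                            multiplier: int = 1) -> tuple[int, int]:
    """Hypothetical multiplier reading: spend at most multiplier * BASE_TRIES
    attempts in total, stopping early once `wanted` positions are accepted.
    Returns (positions found, attempts used)."""
    budget = multiplier * BASE_TRIES
    found = tries = 0
    while found < wanted and tries < budget:
        tries += 1
        if try_position():
            found += 1
    return found, tries

# rebuild(1,2): up to 1 position in at most 100 tries.
print(rebuild_with_multiplier(lambda: False, wanted=1, multiplier=2))  # (0, 100)
# rebuild(30): if every try succeeds, it stops after 30 tries.
print(rebuild_with_multiplier(lambda: True, wanted=30))  # (30, 30)
```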
Thanks for the explanation of why rebuild produces a variable number of results.
The proposed solution, adding another parameter to specify the number of tries, would work for me if and only if there was some way of knowing how many new positions actually got constructed: perhaps the function could return this value.
An explanation of why so many rebuild tries get rejected would be interesting at some point, and would help users make an informed decision about what a good number of tries is.
This is how I understand it -
The tries get rejected for reasons similar to why the alignment tool used to fail a lot.
Rebuild essentially just builds up a library of ways to fold a particular sequence, and then tries a bunch of them out.
When you start up rebuild on a sequence, you can see it working on some initial steps before it actually starts rebuilding. This is when it's building up the library for that particular sequence.
Then, it starts pulling conformations (different ways of folding that particular sequence) and tries to stick them into the protein. The problem is that more often than not, when it does this, one of the ends is going to be totally out of place. It tries to figure out how to glue things back together (close the cuts), but this doesn't always work. So you end up with a lot of failed rebuilds.
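The reason for the failures can be illustrated with a toy model, assuming each library conformation is reduced to the single point where its free end lands, and "closing the cut" is reduced to checking whether that end lies within a tolerance of the fixed anchor on the other side. Everything here is hypothetical: the names `build_library`, `closes_cut` and `try_rebuild`, the ±6 box, and the 1.5 tolerance (arbitrary units). A real closure step tries to adjust the chain to bridge the gap instead of just measuring it. With these made-up numbers, randomly placed ends rarely land inside the small tolerance sphere, so most attempts are rejected.

```python
import math
import random

Point = tuple[float, float, float]

def build_library(rng: random.Random, size: int) -> list[Point]:
    """Toy fragment library: each conformation is reduced to where its
    free end would land, relative to the anchor on the other side of the cut."""
    return [(rng.uniform(-6.0, 6.0), rng.uniform(-6.0, 6.0), rng.uniform(-6.0, 6.0))
            for _ in range(size)]

def closes_cut(fragment_end: Point, anchor: Point, tolerance: float = 1.5) -> bool:
    """Stand-in for cut closure: accept only if the free end lands close
    enough to the anchor to be glued back on."""
    return math.dist(fragment_end, anchor) <= tolerance

def try_rebuild(library: list[Point], anchor: Point, tolerance: float = 1.5) -> tuple[int, int]:
    """Try each conformation in turn; return (accepted, rejected)."""
    accepted = sum(closes_cut(end, anchor, tolerance) for end in library)
    return accepted, len(library) - accepted

rng = random.Random(7)
library = build_library(rng, 50)
print(try_rebuild(library, anchor=(0.0, 0.0, 0.0)))
```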
This should be addressed in the new update with the 2-argument solution. As for returning the number of rebuilds actually performed, I'm not quite sure how to do that yet, but even as is, this should be an improvement. I'll look into adding the actual number performed.
Just for reference - rebuild will now be
structure.RebuildSelected(number_of_rebuilds_desired, number_of_consecutive_fails_before_give_up)
If you just use
structure.RebuildSelected(number_of_rebuilds_desired), it will use a default value of 50 for the second argument.