Foldit

Bruno Kestemont Lv 1

E.g., for https://foldit.fandom.com/wiki/Revisiting_puzzle/68:_Bos_taurus, there are 5 PDB entries, each with about 20 models.

Why are all of these models published? How do scientists select the "best" one?

rmoretti Staff Lv 1

Jan 07

The same protein can get multiple entries in the PDB for various reasons.

One is simply that there are different experimental techniques at play. Techniques like NMR, X-ray diffraction, and neutron diffraction give different information, so scientists may want to apply a different technique to get at that information, even if a structure already exists.

Another is different experimental conditions. Changing the pH, the temperature, the buffer conditions, or which other components are present in solution may change the structure, and scientists may want to see how these environmental changes affect it. (You can see this a bit in revisiting puzzle 68: 1B1G has calcium modeled in, whereas 1CLB is explicitly the apo (unbound) form.)

Crystal form is another, for X-ray structures. Depending on the precise conditions, the same protein can crystallize with different symmetries. These different symmetries might have slightly different structures, due to crystal packing effects. Determining the structure of the same protein in a different crystal form may give information about which parts of the structure are intrinsic, and which might be artifacts of crystal packing. Alternatively, a fair number of proteins undergo conformational changes. If you have one of those proteins, the different crystal forms may represent different conformers which actually occur in solution.

Another factor is that structure determination methods tend to get better over time. You can potentially get a better structure using newer instruments and data processing techniques. This can be obvious with X-ray structures, where the listed resolution of newer structures tends to be better than that of older ones. So it may be worth re-doing a structure which was last solved in 1998, if you need a "better" version.

A final factor is that the deposited structure might not actually be the point of the research. If you're interested in variants of a protein (e.g. mutations or different species), you may want to re-determine the structure of an already deposited protein, just to make sure you have a reference structure which can be compared 1-to-1 with your variant structures. (Structures determined by a different lab under different conditions may have differences related to how they were determined, rather than to the sequence difference itself.) And once you've determined the structure, there's no reason not to deposit it, even if it is technically redundant.

Somewhat related to that, there have been various structural biology consortium efforts to determine protein structures. These are sometimes high-throughput efforts which throw a whole bunch of proteins of a given type (all human proteins, all yeast proteins, all kinases, etc.) through a structure determination pipeline, without necessarily any regard for whether the structure has already been determined. Similarly, it's not uncommon for multiple labs to be working on the same protein, and thus for both of them to determine the same structure at about the same time. Even if you come in second, it's still worth depositing your results.

Regarding which structure to choose, that depends a bit on what you want to use the structure for. Apo versus ligand-bound is a big determinant, as is whether there are any relevant solution conditions which might result in structural changes. (You want to pick the structure that best matches the conditions you're interested in.) The resolution of the structure is often used to pick between structures, though that's not a 100% guarantee of quality. The validation metrics provided with each entry can help determine whether a particular structure is of good quality or might have issues.
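To make that selection logic concrete, here is a minimal Python sketch: filter by ligand state first, then rank by listed resolution. The `Candidate` class, the `shortlist` function, the entry labels, and the resolution values are all made up for illustration; they are not real PDB entries or measurements, and real selection would also weigh validation metrics and solution conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    entry_id: str                # placeholder label, not a real PDB ID
    ligand_bound: bool           # True if a ligand is modeled in, False for apo
    resolution: Optional[float]  # in angstroms; None when no resolution is listed


def shortlist(candidates: List[Candidate], want_ligand_bound: bool) -> List[Candidate]:
    """Keep entries matching the desired ligand state, best resolution first."""
    matching = [c for c in candidates if c.ligand_bound == want_ligand_bound]
    # Lower resolution values are better; entries without one sort last.
    return sorted(matching, key=lambda c: (c.resolution is None, c.resolution or 0.0))


# Made-up example data, purely illustrative.
candidates = [
    Candidate("entry_A", ligand_bound=True, resolution=2.1),
    Candidate("entry_B", ligand_bound=False, resolution=1.6),
    Candidate("entry_C", ligand_bound=True, resolution=1.8),
    Candidate("entry_D", ligand_bound=True, resolution=None),
]

ranked = shortlist(candidates, want_ligand_bound=True)
print([c.entry_id for c in ranked])  # ['entry_C', 'entry_A', 'entry_D']
```

Resolution is only the tie-breaker here, in keeping with the point that it is not a guarantee of quality by itself.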

As a computationalist, I will often take all of the multiple structures and pass all of them through whatever pipeline (e.g. ligand docking) I have set up. I can then compare how the different starting structures behave. If I do need to pick a particular structure, I may do a computational "relax" of all of them, to look at which one has the best score.
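The "run everything and compare" approach might look roughly like the Python sketch below. It is not a real relax protocol: `fake_relax_score` is a hypothetical stand-in for whatever relax-and-score step you actually use, the file names and scores are invented, and the code assumes a lower score is better (as with an energy).

```python
from typing import Callable, Dict, List, Tuple


def compare_structures(
    paths: List[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score every starting structure and return (path, score) pairs, best first."""
    scores: Dict[str, float] = {path: score_fn(path) for path in paths}
    # Assumption: lower score is better (as with an energy); flip for other metrics.
    return sorted(scores.items(), key=lambda item: item[1])


def fake_relax_score(path: str) -> float:
    """Hypothetical stand-in for a real relax-and-score step."""
    made_up_scores = {"a.pdb": -120.5, "b.pdb": -133.0, "c.pdb": -98.2}
    return made_up_scores[path]


ranking = compare_structures(["a.pdb", "b.pdb", "c.pdb"], fake_relax_score)
best_path, best_score = ranking[0]
print(best_path, best_score)  # b.pdb -133.0
```

Keeping the full ranking, rather than only the winner, makes it easy to see how differently the starting structures behave.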

I should also note that you can potentially get a better sense of the reason for the different structures by taking a look at the associated publications. Generally speaking, a paper announcing a new structure will discuss previously published structures, and will likely mention why that structure was insufficient for their needs.