Remarks regarding the Neural Net confidence

Started by Sandrix72

Sandrix72 Lv 1

As far as I can tell, the Neural Net confidence depends ONLY on the amino acid vector, without taking the designed 3D structure of the protein into account at all. This means that a given AA in the vector contributes to NNC (Neural Net confidence) in the same way, regardless of whether it forms a helix, sheet, or loop.
I would like to know whether this is the best way of handling the problem from a purely scientific point of view.

If NNC is determined from a protein folded artificially by the server, where these circumstances are taken into account automatically, then what is the reliable upper limit of NNC that we should reach to yield the best result from a scientific point of view? (Better automatic folding may raise this limit in the future, of course.)

How can the low NNC of proteins "designed" by real life be explained?

bkoep Staff Lv 1

Neural net confidence is sort of an odd concept. It is affected both by the quality of the neural net and by the "quality" of the protein sequence, so it can be difficult to interpret from a scientific point of view. I'm not sure I completely understand this question, so please rephrase and ask again if this answer does not help!

To review: AlphaFold is an algorithm whose only input is a protein sequence. AlphaFold has several outputs: the main output is a set of XYZ coordinates for all the protein atoms (the predicted structure), but AlphaFold also predicts the amount of error in those atom coordinates. The confidence value comes from these error predictions. It's important to note that these are separate predictions: it is possible for an algorithm to be really good at predicting XYZ coordinates and really bad at predicting errors, while another algorithm could be the opposite.
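To make this concrete: AlphaFold writes its per-residue confidence (pLDDT, on a 0-100 scale) into the B-factor column of the output PDB file, so the coordinates and the error predictions travel together in one file. Here is a minimal sketch of reading it back out; the two-residue PDB fragment below is made up for illustration, and this is not Foldit's actual code.

```python
# Sketch: recovering AlphaFold's per-residue confidence (pLDDT) from a PDB file.
# The fragment below is an invented two-residue example, not real output.

sample_pdb = """\
ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 91.30           N
ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00 92.10           C
ATOM      3  CA  ALA A   2      12.500   7.200  -4.000  1.00 55.40           C
"""

def mean_plddt(pdb_text):
    """Average the B-factor (pLDDT) over CA atoms, one per residue."""
    scores = [
        float(line[60:66])                 # B-factor field, PDB columns 61-66
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    return sum(scores) / len(scores)

print(round(mean_plddt(sample_pdb), 2))  # prints 73.75
```

Averaging over CA atoms gives one score per residue, which is roughly what a whole-protein confidence number summarizes.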

No matter how good your algorithm is, there is another complication. Lots of protein sequences simply do not fold in reality (failed designs), meaning they exist only as flexible, floppy chains in solution. So even if you had an algorithm that was perfect and always predicted the correct XYZ coordinates for well-folded proteins, we would still expect large errors for the XYZ coordinates of a failed design. AlphaFold is not perfect; when we see a low confidence prediction, we don't know if that is because of reality (a failed design) or because of flaws in the algorithm.

The upper limit will always be 100% confidence (zero predicted error). As a rule of thumb, we have been targeting 80% confidence in Foldit designs, just because this seems like a good cutoff when we look at actual lab results with AlphaFold. If we switched to a different algorithm (like RoseTTAFold or ESMFold), then we would probably also switch the target confidence.

Practically speaking, there is no such thing as a "perfect" prediction algorithm, so we should never expect 100% confidence. When an algorithm gives a low confidence for a natural protein (which has a known, well-folded structure), you can think of this as imperfections in the algorithm.