
1835: Coronavirus NSP2 Prediction

Closed since almost 6 years ago

Intermediate Overall Prediction

Summary


Created: May 07, 2020
Expires:
Max points: 100
Description

Fold this coronavirus protein! This is one portion of a larger protein encoded in the viral genome of SARS-CoV-2. It is encoded in a region of the genome called NSP2, but the protein's structure and function are still unknown. If we knew how this protein folds, we might be able to figure out its exact function. The puzzle's starting structure shows secondary structure (SS) predictions from PSIPRED, which hint at which parts of the protein might fold into helices or sheets. Refold this protein to find high-scoring solutions, which will tell us how this protein is most likely to fold!



Sequence:


AARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFF

Top groups


  1. Hold My Beer 100 pts. 11,576
  2. Gargleblasters 81 pts. 11,374
  3. Go Science 65 pts. 11,358
  4. Team India 52 pts. 11,325
  5. Marvin's bunch 41 pts. 11,303
  6. Beta Folders 32 pts. 11,219
  7. Contenders 24 pts. 11,197
  8. Anthropic Dreams 18 pts. 11,189
  9. Void Crushers 14 pts. 11,155
  10. L'Alliance Francophone 10 pts. 11,124

  1. Steven Pletsch Lv 1 100 pts. 11,576
  2. actiasluna Lv 1 99 pts. 11,366
  3. mirp Lv 1 98 pts. 11,337
  4. Xartos Lv 1 97 pts. 11,328
  5. TECHFREAK Lv 1 95 pts. 11,325
  6. fpc Lv 1 94 pts. 11,303
  7. Serca Lv 1 93 pts. 11,252
  8. fiendish_ghoul Lv 1 92 pts. 11,247
  9. sgeldhof Lv 1 91 pts. 11,180
  10. Skippysk8s Lv 1 89 pts. 11,178

Comments


bkoep Staff Lv 1


Conf: 951668778889987044289999999998313168999999998664231538712316
Pred: CCCHHHHHHHHHHHHCCCCHHHHHHHHHHHHHCHHHHHHHHHHHHHHCCCHHCCCCCCHH
  AA: AARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMA
              10        20        30        40        50        60


Conf: 783248999999999986558897799999999999986399999999999999600663
Pred: HHHHHHHHHHHHHHHHHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCHHH
  AA: YITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEI
              70        80        90       100       110       120


Conf: 54635564367999998519
Pred: HCCEEEECCHHHHHHHHHHC
  AA: VGGQIVTCAKEIKESVQTFF
             130       140
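
For readers who want the prediction above as a list of segments rather than a per-residue string, here is a minimal sketch. It assumes only that the three "Pred:" rows are concatenated in order and that residues are numbered from 1, as in the ruler lines. The names `PRED` and `segments` are made up for this example and are not part of PSIPRED.

```python
from itertools import groupby

# PSIPRED three-state string, copied from the three "Pred:" rows above
# (H = helix, E = strand, C = coil).
PRED = (
    "CCCHHHHHHHHHHHHCCCCHHHHHHHHHHHHHCHHHHHHHHHHHHHHCCCHHCCCCCCHH"
    "HHHHHHHHHHHHHHHHHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCHHH"
    "HCCEEEECCHHHHHHHHHHC"
)


def segments(pred):
    """Collapse a per-residue string into (state, start, end) runs.

    Residue numbers are 1-based and inclusive.
    """
    runs = []
    position = 1
    for state, group in groupby(pred):
        length = len(list(group))
        runs.append((state, position, position + length - 1))
        position += length
    return runs


if __name__ == "__main__":
    # Print only helices and strands; coil runs are skipped.
    for state, start, end in segments(PRED):
        if state != "C":
            print(f"{state} {start}-{end} ({end - start + 1} residues)")
```

This only summarizes the prediction. It ignores the "Conf:" row, so a low-confidence helix is listed the same way as a high-confidence one.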

beta_helix Staff Lv 1


from http://clavius.bc.edu/~clotelab/DiANNA/

Cys position  Distance  Bond                     Score
116 - 118     2         KFISTCACEIV-ISTCACEIVGG  0.01037
116 - 128     12        KFISTCACEIV-GQIVTCAKEIK  0.01037
118 - 128     10        ISTCACEIVGG-GQIVTCAKEIK  0.0111
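
The first three columns of this table can be reproduced from the sequence alone, which is a handy sanity check on the numbering. The sketch below is not DiANNA's code and does no scoring. It assumes that "Distance" is the sequence separation between the two cysteines and that each half of the "Bond" column is an 11-residue window centered on a cysteine; both assumptions are inferred from the rows above. The function names are made up for this example.

```python
from itertools import combinations

# Puzzle sequence, copied from the "AA:" rows of the PSIPRED output.
SEQUENCE = (
    "AARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMA"
    "YITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEI"
    "VGGQIVTCAKEIKESVQTFF"
)


def cysteine_positions(seq):
    """Return the 1-based positions of every cysteine in seq."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "C"]


def window(seq, pos, flank=5):
    """Return the residues within `flank` of 1-based position `pos`.

    The window is truncated at the ends of the sequence.
    flank=5 is an assumption inferred from the table, not a documented value.
    """
    start = max(pos - 1 - flank, 0)
    return seq[start:pos + flank]


def candidate_pairs(seq):
    """List every cysteine pair as (pos_a, pos_b, separation, windows)."""
    return [
        (a, b, b - a, f"{window(seq, a)}-{window(seq, b)}")
        for a, b in combinations(cysteine_positions(seq), 2)
    ]


if __name__ == "__main__":
    for a, b, separation, windows in candidate_pairs(SEQUENCE):
        print(f"{a} - {b}\t{separation}\t{windows}")
```

With three cysteines there are only three possible pairs, and at most one disulfide bond could form within this fragment.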


and from http://disulfind.dsi.unifi.it/monitor.php?query=jKDK6k

         .........10........20........30........40........50........60........70........
AA       AARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFG
DB_state                                                                                
DB_conf                                                                                 

         80........90........100.......110.......120.......130.......140
AA       TVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFF
DB_state                                     0 0         0            
DB_conf                                      3 1         2  

jeff101 Lv 1

In the DiANNA part above, do scores near 0.01 mean
that line's disulfide bond is not very likely?

In the bottom part, what does DB_state = 0 mean?
Also, what does DB_conf = 1 2 or 3 mean?

jeff101 Lv 1

Going to http://clavius.bc.edu/~clotelab/DiANNA/
and clicking on Help! says the following:

"Disulfide connectivity
For each pair of cysteine in the input sequence,
a neural network trained to recognize disulfide bonds
produce a score ranging from 0 to 1 (higher the score,
higher the prediction reliability)."

It also gives an example with 4 cysteines and
6 possible disulfide bonds with scores ranging
from 0.1 to 0.9. It picks 2 disulfide bonds each
with a score of 0.8 as the best combination.

jeff101 Lv 1

The link given for the disulfind part above says:

DB_state predicted disulfide bonding state (1=disulfide bonded, 0=not disulfide bonded).
DB_conf confidence of disulfide bonding state prediction (0=low to 9=high).

So I guess it predicts that no disulfide bonds form,
but it is not very confident in this prediction.

Serca Lv 1

Any ideas why this 140-residue section of the NSP2 protein is not buried inside the other 498 residues of the protein?

We cannot ignore hydrophobicity in Foldit, so the best strategy for a top score on this puzzle seems to be to skip any secondary structure prediction, and even the real protein's SS, and build your own.

Susume Lv 1

My understanding is the CASP organizers have tentatively divided the larger viral proteins into smaller sections (domains) based on the predictions they got back in the first round of competition. Those predictions came from both servers and human teams.

Serca Lv 1

OK, now it is a bit clearer why we have residues 360-499 of the NSP2 protein. That looks like the domain with the highest distance similarity between different CASP models. And that part has the highest helix propensity according to the SS prediction.

But I still cannot see any evidence that this fragment is spatially separated from the rest of the NSP2 protein, which is what would justify searching for its highest Foldit score on its own. The Hiding score looks important enough to skip any structure trying to hide hydrophobics of this NSP2 fragment inside itself.

And btw, overall NSP2 has 27 cysteines.
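
Since the puzzle numbers residues from 1 while the comment above places the fragment at residues 360-499 of NSP2, a small helper can translate between the two numberings. This is a sketch under the assumption that the fragment is one contiguous stretch and that the 360-499 range is correct as stated. The names are made up for this example.

```python
FRAGMENT_START = 360   # first NSP2 residue in the puzzle, per the comment above
FRAGMENT_LENGTH = 140  # number of residues in the puzzle sequence


def to_full_length(fragment_pos, start=FRAGMENT_START, length=FRAGMENT_LENGTH):
    """Map a 1-based puzzle residue number to full-length NSP2 numbering."""
    if not 1 <= fragment_pos <= length:
        raise ValueError(f"position {fragment_pos} is outside 1-{length}")
    return start + fragment_pos - 1


if __name__ == "__main__":
    # The three cysteines of the puzzle sequence, in puzzle numbering.
    for pos in (116, 118, 128):
        print(f"puzzle residue {pos} -> NSP2 residue {to_full_length(pos)}")
```

Under these assumptions, puzzle residue 140 maps to NSP2 residue 499, which matches the end of the stated range.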

bkoep Staff Lv 1

That's right, Serca. We are simply going off of suggestions from the CASP organizers about tentative domain assignments of this target.

These suggestions likely come from inter-residue distance prediction models, similar to AlphaFold. As far as I know, nobody has collected any empirical data about this protein's structure. So, this sequence might form a well-folded domain; but it might not. Foldit predictions might help us figure that out!