
1841b: Coronavirus NSP6 Prediction

Closed since almost 6 years ago

Intermediate Overall Prediction

Summary


Created
May 22, 2020
Expires
Max points
100
Description

Note: This puzzle replaces Puzzle 1841, which was accidentally posted with an incorrect sequence.



Fold this coronavirus protein! This is a portion of a larger protein encoded in the viral genome of SARS-CoV-2. It is encoded in a region of the genome called NSP6, but the protein's structure and function are still unknown. If we knew how this protein folds, we might be able to figure out its exact function. The puzzle's starting structure shows secondary structure (SS) predictions from PSIPRED, which hint at which parts of the protein might fold into helices or sheets. Refold this protein to find high-scoring solutions, which will tell us how this protein is most likely to fold!



Sequence:


CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQ

Top groups


  1. Go Science 100 pts. 9,500
  2. Anthropic Dreams 77 pts. 9,430
  3. Gargleblasters 58 pts. 9,285
  4. Void Crushers 43 pts. 9,213
  5. L'Alliance Francophone 31 pts. 9,209
  6. Contenders 22 pts. 9,191
  7. Team India 15 pts. 9,150
  8. Beta Folders 11 pts. 9,109
  9. Hold My Beer 7 pts. 9,034
  10. Penny-Arcade 5 pts. 9,034

  81. allie_heather47 Lv 1 32 pts. 8,705
  82. pvc78 Lv 1 31 pts. 8,688
  83. John McLeod Lv 1 31 pts. 8,681
  84. SKSbell Lv 1 30 pts. 8,679
  85. Merf Lv 1 30 pts. 8,673
  86. Lotus23 Lv 1 29 pts. 8,672
  87. Bletchley Park Lv 1 29 pts. 8,671
  88. Hellcat6 Lv 1 28 pts. 8,656
  89. WBarme1234 Lv 1 28 pts. 8,641
  90. APPAAP Lv 1 27 pts. 8,621

Comments


bkoep Staff Lv 1


Conf: 950118999999981995155124106999999985799999957999998887204268
Pred: CCCEEHHHHHHHHHHCCCCEECCEEECHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCCCC
  AA: CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGG
              10        20        30        40        50        60


Conf: 6246425329
Pred: CCEEEEEECC
  AA: KPCIKVATVQ
              70
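
For anyone who wants the Pred line as residue ranges, here is a minimal Python sketch. The `segments` helper is a hypothetical name, not part of PSIPRED or Foldit; the string is the 70-character prediction copied from the comment above, where H is helix, E is strand, and C is coil.

```python
from itertools import groupby

# PSIPRED three-state prediction copied from the comment above,
# written in chunks of 10 residues so it is easy to check by eye.
PRED = (
    "CCCEEHHHHH" "HHHHHCCCCE" "ECCEEECHHH" "HHHHHHCCCC"
    "CCCCHHHHHH" "HHHHHCCCCC" "CCEEEEEECC"
)

def segments(pred):
    """Return (state, start, end) runs using 1-based, inclusive residue numbers."""
    out = []
    pos = 1
    for state, run in groupby(pred):
        length = len(list(run))
        out.append((state, pos, pos + length - 1))
        pos += length
    return out

# Residue ranges predicted as helix (H) and as strand (E).
helices = [(s, e) for state, s, e in segments(PRED) if state == "H"]
strands = [(s, e) for state, s, e in segments(PRED) if state == "E"]

print("helices:", helices)
print("strands:", strands)
```

The ranges are only as reliable as the Conf digits above them; several of the short strand calls sit on low-confidence positions.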

Bautho Lv 1

It looks like the first 11 residues might be buried in a lipid membrane, together with the rest of the protein that is missing from this puzzle.

SS_Prediction

Bautho Lv 1

             1                                           41
Seq.         SAVKRTIKGT HHWLLLTILT SLLVLVQSTQ WSLFFFLYEN AFLPFAMGII
TOPCONS      iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
OCTOPUS      iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
Philius      iiiiiiiiii iMMMMMMMMM MMMMMMMMMM Mooooooooo MMMMMMMMMM
PolyPhobius  iiiiiiiiii iMMMMMMMMM MMMMMMMMMo oooooooooM MMMMMMMMMM
SCAMPI       iiiiiiiiii MMMMMMMMMM MMMMMMMMMM MooooooooM MMMMMMMMMM
SPOCTOPUS    iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
PDB-homology

             51                                          91
Seq.         AMSAFAMMFV KHKHAFLCLF LLPSLATVAY FNMVYMPASW VMRIMTWLDM
TOPCONS      MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
OCTOPUS      MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
Philius      MMMMMMMMMM iiiiiiMMMM MMMMMMMMMM MMMMMMMMoo oooooooooo
PolyPhobius  MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SCAMPI       MMMMMMMMMM iiiiMMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SPOCTOPUS    MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
PDB-homology

             101                                         141
Seq.         VDTSLSGFKL KDCVMYASAV VLLILMTART VYDDGARRVW TLMNVLTLVY
TOPCONS      oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiMMMMM MMMMMMMMMM
OCTOPUS      oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
Philius      oooooooooo ooMMMMMMMM MMMMMMMMMM MMiiiiiiii iiiiiiiiii
PolyPhobius  oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiMMM MMMMMMMMMM
SCAMPI       oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiMMMMMM MMMMMMMMMM
SPOCTOPUS    oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
PDB-homology

             151                                         191
Seq.         KVYYGNALDQ AISMWALIIS VTSNYSGVVT TVMFLARGIV FMCVEYCPIF
TOPCONS      MMMMMMoMMM MMMMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMMMMMMM
OCTOPUS      MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
Philius      iiiiiiiiii iiiiMMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMMMo
PolyPhobius  MMMMMooooo oooooMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMiii
SCAMPI       MMMMMooooo oooooooooo oooooooooo oooooooooo oooooooooo
SPOCTOPUS    MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
PDB-homology

             201                                         241
Seq.         FITGNTLQCI MLVYCFLGYF CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS      ooooooooMM MMMMMMMMMM MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS      oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius      oooooooooM MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius  iiiiiiiiMM MMMMMMMMMM MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI       oooooooooo MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS    oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
PDB-homology

             251                              281
Seq.         FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius  oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI       iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS    iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii

Serca Lv 1

It is the tail domain of the larger 290-residue protein. So the reason the first 11 residues are hydrophobic is probably that they are buried somewhere inside the whole protein.
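
One rough way to check the hydrophobicity claim is to average a hydropathy scale over the first 11 residues and over the remainder. The sketch below uses the Kyte-Doolittle scale, which is an assumption here since the thread does not name a scale; `mean_hydropathy` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# The choice of this scale is an assumption, not something from the thread.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The 70-residue puzzle sequence, in chunks of 10.
SEQ = (
    "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE" "FRYMNSQGLL"
    "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over a stretch of residues."""
    return sum(KD[aa] for aa in seq) / len(seq)

head = mean_hydropathy(SEQ[:11])   # residues 1-11
tail = mean_hydropathy(SEQ[11:])   # residues 12-70

print(f"residues 1-11:  {head:+.2f}")
print(f"residues 12-70: {tail:+.2f}")
```

On this scale the first 11 residues average about +2.0 while the remaining 59 average close to zero. That is consistent with a hydrophobic N-terminal stretch, but it does not by itself distinguish burial in a membrane from burial in a protein core.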

jeff101 Lv 1

I think residues 221-290 in the post above match 
the 70-residue sequence given for this puzzle:

            0000000001 1111111112 2222222223
            1234567890 1234567890 1234567890
--------------------------------------------
Conf        9501189999 9998199515 5124106999 
PSIPRED     CCCEEHHHHH HHHHHCCCCE ECCEEECHHH 
Seq.        CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS     MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS     MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius     MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI      MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS   MMMMMMMMii iiiiiiiiii iiiiiiiiii
--------------------------------------------
            2222222222 2222222222 2222222222
            2222222223 3333333334 4444444445
            1234567890 1234567890 1234567890

            3333333334 4444444445 5555555556 6666666667
            1234567890 1234567890 1234567890 1234567890
-------------------------------------------------------        
Conf        9999857999 9995799999 8887204268 6246425329
PSIPRED     HHHHHHCCCC CCCCHHHHHH HHHHHCCCCC CCEEEEEECC
Seq.        FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS   iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
-------------------------------------------------------
            2222222222 2222222222 2222222222 2222222222
            5555555556 6666666667 7777777778 8888888889
            1234567890 1234567890 1234567890 1234567890

One question I have is what the letters M, i, and o mean above.
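
In the usual convention for these topology predictors, M marks a membrane-spanning residue, i the inside (cytoplasmic) side, and o the outside (non-cytoplasmic) side. The thread itself does not define the letters, so treat that reading as an assumption. Under it, a quick way to see where the predictors agree is to count the M calls per residue. The sketch below does this for puzzle residues 1-20; `membrane_votes` is a hypothetical helper.

```python
# Topology calls for puzzle residues 1-20 (full-protein residues 221-240),
# copied from the table above with the column spaces removed.
CALLS = {
    "TOPCONS":     "MMMMMMMMMi" "iiiiiiiiii",
    "OCTOPUS":     "MMMMMMMMii" "iiiiiiiiii",
    "Philius":     "MMMMMMMMMM" "Miiiiiiiii",
    "PolyPhobius": "MMMMMMMMMM" "Mooooooooo",
    "SCAMPI":      "MMMMMMMMMM" "Miiiiiiiii",
    "SPOCTOPUS":   "MMMMMMMMii" "iiiiiiiiii",
}

def membrane_votes(calls):
    """Count, for each residue, how many predictors label it 'M'."""
    columns = zip(*calls.values())  # one tuple of letters per residue
    return [col.count("M") for col in columns]

votes = membrane_votes(CALLS)

# Residues that at least half of the six predictors place in the membrane.
half = len(CALLS) / 2
in_membrane = [i + 1 for i, v in enumerate(votes) if v >= half]

print(votes)
print(in_membrane)
```

With the rows above, at least half of the six predictors call residues 1-11 membrane and none do beyond residue 11, which lines up with the "first 11 residues" remark earlier in the thread.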

Bruno Kestemont Lv 1

When you provide us with a partial protein, would you change the scoring function to give a smaller weight to the exposure part of the score? Or is there another way to identify the part of the partial protein that might be "inside" the whole protein?

RW-QuantumSec Lv 1

I do not think that it is a membrane protein, as the given sequence is from non-structural protein 6 (NSP6).
A great resource for me is the following: https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html
As stated there, NSP6 works with NSP3 and NSP4 to produce virus bubbles.
Does anyone know what exactly that means?

The RNA sequences are also given there.
Translating it with: https://web.expasy.org/translate/
shows that the given sequence
(CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ) is at the end of the 5'3' Frame 1.

A question I have is why we were not given the preceding 10 amino acids from the open reading frame starting at position 210.
The sequence should then be:
5'3' Frame 1, start_pos=210
MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ

jeff101 Lv 1

Since CTCYFG… starts at position 221,
the sequence given above has MLVYCF…
starting at position 211 not 210.
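
The numbering can be checked with a short Python sketch. It takes the 80-residue reading frame quoted above and the statement that CTCYFG… starts at residue 221 as given; the variable names are only for illustration.

```python
# The 70-residue puzzle sequence, in chunks of 10.
PUZZLE = (
    "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE" "FRYMNSQGLL"
    "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

# The 80-residue reading frame quoted earlier in the thread (MLVYCF...).
ORF = (
    "MLVYCFLGYF" "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE" "FRYMNSQGLL"
    "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

PUZZLE_START = 221  # from the thread: CTCYFG... begins at residue 221

# 0-based offset of the puzzle sequence inside the longer frame.
offset = ORF.find(PUZZLE)

# Residue numbers of the first and last residues of the longer frame.
orf_start = PUZZLE_START - offset
orf_end = orf_start + len(ORF) - 1

print(offset, orf_start, orf_end)
```

The puzzle sequence sits 10 residues into the longer frame, so MLVYCF… begins at 221 - 10 = 211 and the frame ends at residue 290, matching the 290-residue length mentioned earlier.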

jeff101 Lv 1

Would it help to post a follow-up puzzle with
the same protein sequence as in this puzzle
that instead scores the protein as if it were
a membrane protein? I think such a follow-up
puzzle would give higher scores to solutions
with buried hydrophilic (blue) residues and
exposed hydrophobic (orange) ones.