Foldit

1841b: Coronavirus NSP6 Prediction

Closed since almost 6 years ago

Intermediate Overall Prediction

Summary

Created: May 22, 2020
Expires: May 28, 2020 at 23:00 UTC
Max points: 100

Description

Note: This puzzle replaces Puzzle 1841, which was accidentally posted with an incorrect sequence.

Fold this coronavirus protein! This is a portion of a larger protein encoded in the viral genome of SARS-CoV-2. It is encoded in a region of the genome called NSP6, but the protein's structure and function are still unknown. If we knew how this protein folds, we might be able to figure out its exact function. The puzzle's starting structure shows secondary structure (SS) predictions from PSIPRED, which hint at which parts of the protein might fold into helices or sheets. Refold this protein to find high-scoring solutions, which will tell us how this protein is most likely to fold!

Sequence:

CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQ

Top groups

1. Go Science 100 pts. 9,500
2. Anthropic Dreams 77 pts. 9,430
3. Gargleblasters 58 pts. 9,285
4. Void Crushers 43 pts. 9,213
5. L'Alliance Francophone 31 pts. 9,209
6. Contenders 22 pts. 9,191
7. Team India 15 pts. 9,150
8. Beta Folders 11 pts. 9,109
9. Hold My Beer 7 pts. 9,034
10. Penny-Arcade 5 pts. 9,034

81. allie_heather47 Lv 1 32 pts. 8,705
82. pvc78 Lv 1 31 pts. 8,688
83. John McLeod Lv 1 31 pts. 8,681
84. SKSbell Lv 1 30 pts. 8,679
85. Merf Lv 1 30 pts. 8,673
86. Lotus23 Lv 1 29 pts. 8,672
87. Bletchley Park Lv 1 29 pts. 8,671
88. Hellcat6 Lv 1 28 pts. 8,656
89. WBarme1234 Lv 1 28 pts. 8,641
90. APPAAP Lv 1 27 pts. 8,621

Comments

bkoep Staff Lv 1

May 22, 2020


Conf: 950118999999981995155124106999999985799999957999998887204268
Pred: CCCEEHHHHHHHHHHCCCCEECCEEECHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCCCC
  AA: CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGG
              10        20        30        40        50        60


Conf: 6246425329
Pred: CCEEEEEECC
  AA: KPCIKVATVQ
              70
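
For readers who want to work with the PSIPRED output above as data, here is a minimal sketch that joins the two rows and lists each helix (H) and strand (E) run with its mean confidence digit. The function name `segments` and the choice to average the confidence digits are illustrative assumptions, not part of PSIPRED or Foldit.

```python
from itertools import groupby

# PSIPRED tracks copied from the post above (both rows joined).
CONF = ("950118999999981995155124106999999985799999957999998887204268"
        "6246425329")
PRED = ("CCCEEHHHHHHHHHHCCCCEECCEEECHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCCCC"
        "CCEEEEEECC")


def segments(pred, conf):
    """Return (state, start, end, mean_confidence) runs, 1-based inclusive."""
    out = []
    pos = 1
    for state, run in groupby(pred):
        n = len(list(run))
        scores = [int(c) for c in conf[pos - 1:pos - 1 + n]]
        out.append((state, pos, pos + n - 1, sum(scores) / n))
        pos += n
    return out


SEGMENTS = segments(PRED, CONF)
for state, start, end, mean_conf in SEGMENTS:
    if state != "C":  # skip coil, show only helices and strands
        print(f"{state} {start:>2}-{end:<2} mean conf {mean_conf:.1f}")
```

Runs with a low mean confidence are the ones where the starting structure is least trustworthy as a guide.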

Bautho Lv 1

May 22, 2020

It looks like the first 11 residues might be embedded in a lipid membrane, together with the rest of the protein that is missing from this puzzle.

SS_Prediction

Bautho Lv 1

May 22, 2020

            1                                           41
Seq.        SAVKRTIKGT HHWLLLTILT SLLVLVQSTQ WSLFFFLYEN AFLPFAMGII
TOPCONS     iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
OCTOPUS     iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
Philius     iiiiiiiiii iMMMMMMMMM MMMMMMMMMM Mooooooooo MMMMMMMMMM
PolyPhobius iiiiiiiiii iMMMMMMMMM MMMMMMMMMo oooooooooM MMMMMMMMMM
SCAMPI      iiiiiiiiii MMMMMMMMMM MMMMMMMMMM MooooooooM MMMMMMMMMM
SPOCTOPUS   iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
PDB-homology

            51                                          91
Seq.        AMSAFAMMFV KHKHAFLCLF LLPSLATVAY FNMVYMPASW VMRIMTWLDM
TOPCONS     MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
OCTOPUS     MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
Philius     MMMMMMMMMM iiiiiiMMMM MMMMMMMMMM MMMMMMMMoo oooooooooo
PolyPhobius MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SCAMPI      MMMMMMMMMM iiiiMMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SPOCTOPUS   MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
PDB-homology

            101                                         141
Seq.        VDTSLSGFKL KDCVMYASAV VLLILMTART VYDDGARRVW TLMNVLTLVY
TOPCONS     oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiMMMMM MMMMMMMMMM
OCTOPUS     oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
Philius     oooooooooo ooMMMMMMMM MMMMMMMMMM MMiiiiiiii iiiiiiiiii
PolyPhobius oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiMMM MMMMMMMMMM
SCAMPI      oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiMMMMMM MMMMMMMMMM
SPOCTOPUS   oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
PDB-homology

            151                                         191
Seq.        KVYYGNALDQ AISMWALIIS VTSNYSGVVT TVMFLARGIV FMCVEYCPIF
TOPCONS     MMMMMMoMMM MMMMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMMMMMMM
OCTOPUS     MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
Philius     iiiiiiiiii iiiiMMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMMMo
PolyPhobius MMMMMooooo oooooMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMiii
SCAMPI      MMMMMooooo oooooooooo oooooooooo oooooooooo oooooooooo
SPOCTOPUS   MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
PDB-homology

            201                                         241
Seq.        FITGNTLQCI MLVYCFLGYF CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS     ooooooooMM MMMMMMMMMM MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS     oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius     oooooooooM MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius iiiiiiiiMM MMMMMMMMMM MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI      oooooooooo MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS   oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
PDB-homology

            251                              281
Seq.        FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS   iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii

Serca Lv 1

May 23, 2020

This is the tail domain of the larger 290-residue protein. So the first 11 residues are probably hydrophobic because they are buried somewhere inside the whole protein.
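
One rough way to check the hydrophobicity claim is to compare the average hydropathy of the first 11 residues with that of the remaining 59. The sketch below uses the Kyte-Doolittle scale, which is my own choice for illustration and is not how Foldit scores hydrophobic burial; the helper name `mean_hydropathy` is likewise hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
# Assumption: this scale is used only for illustration here.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Puzzle sequence as given in the description.
PUZZLE_SEQ = ("CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
              "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ")


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over a stretch of residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


HEAD = mean_hydropathy(PUZZLE_SEQ[:11])   # residues 1-11
REST = mean_hydropathy(PUZZLE_SEQ[11:])   # residues 12-70
print(f"first 11 residues: {HEAD:+.2f}")
print(f"remaining residues: {REST:+.2f}")
```

On this scale the first 11 residues average clearly higher than the rest, which is consistent with either a buried or a membrane-embedded segment; the numbers alone cannot tell the two apart.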

jeff101 Lv 1

May 24, 2020

I think residues 221-290 in the post above match 
the 70-residue sequence given for this puzzle:

            0000000001 1111111112 2222222223
            1234567890 1234567890 1234567890
--------------------------------------------
Conf        9501189999 9998199515 5124106999 
PSIPRED     CCCEEHHHHH HHHHHCCCCE ECCEEECHHH 
Seq.        CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS     MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS     MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius     MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI      MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS   MMMMMMMMii iiiiiiiiii iiiiiiiiii
--------------------------------------------
            2222222222 2222222222 2222222222
            2222222223 3333333334 4444444445
            1234567890 1234567890 1234567890

            3333333334 4444444445 5555555556 6666666667
            1234567890 1234567890 1234567890 1234567890
-------------------------------------------------------        
Conf        9999857999 9995799999 8887204268 6246425329
PSIPRED     HHHHHHCCCC CCCCHHHHHH HHHHHCCCCC CCEEEEEECC
Seq.        FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS   iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
-------------------------------------------------------
            2222222222 2222222222 2222222222 2222222222
            5555555556 6666666667 7777777778 8888888889
            1234567890 1234567890 1234567890 1234567890

One question I have is what the letters M, i, and o mean above.

puxatudo Lv 1

May 24, 2020

i = inside
o = outside
M = membrane

That is, assuming this is a membrane protein.
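
With that legend, the topology rows in jeff101's table can be read mechanically. As a minimal sketch, the snippet below counts how many of the first 11 puzzle residues each predictor labels M; the strings are the first 30 columns of that table with spaces removed, and the names `TOPOLOGY` and `membrane_count` are just illustrative.

```python
LEGEND = {"i": "inside", "o": "outside", "M": "membrane"}

# Puzzle residues 1-30 (full-protein 221-250), from jeff101's table above.
TOPOLOGY = {
    "TOPCONS":     "M" * 9 + "i" * 21,
    "OCTOPUS":     "M" * 8 + "i" * 22,
    "Philius":     "M" * 11 + "i" * 19,
    "PolyPhobius": "M" * 11 + "o" * 19,
    "SCAMPI":      "M" * 11 + "i" * 19,
    "SPOCTOPUS":   "M" * 8 + "i" * 22,
}


def membrane_count(track, n=11):
    """Number of the first n residues labelled M (membrane)."""
    return track[:n].count("M")


COUNTS = {name: membrane_count(track) for name, track in TOPOLOGY.items()}
for name, track in TOPOLOGY.items():
    after = LEGEND[track[11]]  # label of puzzle residue 12
    print(f"{name:<12}{COUNTS[name]:>2} of 11 in membrane, then {after}")
```

All six predictors put at least 8 of the first 11 residues in the membrane, though PolyPhobius disagrees with the others about which side the rest of the fragment sits on.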

Bruno Kestemont Lv 1

May 25, 2020

When you provide us with a partial protein, would you change the scoring function to give a smaller weight to the exposed part of the score? Or is there another means of identifying the part of the partial protein that might be "inside" the whole protein?

RW-QuantumSec Lv 1

May 25, 2020

I do not think that it is a membrane protein, as the given sequence is from the Non-Structural Protein 6 (NSP6).
A great resource for me is the following: https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html
As stated there, NSP6 works with NSP3 and NSP4 to produce virus bubbles.
Does anyone know what exactly that means?

The RNA sequences are also given.
Translating them with https://web.expasy.org/translate/
shows that the given sequence
(CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ) is at the end of the 5'3' Frame 1.

One question I have is: why does the puzzle not include the preceding 10 amino acids, from the open reading frame starting at position 210?
The sequence should then be:
5'3' Frame 1, start_pos=210
MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ
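
For anyone curious what a translation tool like the one linked above does, here is a minimal sketch of forward-frame translation with the standard genetic code. It is not ExPASy's implementation, the function name `translate` is my own, and the 15-base `DEMO` string is made up for illustration rather than taken from the viral genome.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, frame=0):
    """Translate a DNA or RNA string in forward frame 0, 1, or 2."""
    dna = seq.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Made-up example sequence (an assumption, not real SARS-CoV-2 data).
DEMO = "ATGTTAGTTTATTGT"
for frame in range(3):
    print(f"5'3' Frame {frame + 1}: {translate(DEMO, frame)}")
```

Shifting the frame by one base changes every codon, which is why the tool reports each frame separately.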

jeff101 Lv 1

May 25, 2020

Since CTCYFG… starts at position 221,
the sequence given above has MLVYCF…
starting at position 211 not 210.
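
The numbering can be checked directly against the 290-residue sequence posted by Bautho. This is a small sketch that joins the rows of that table and looks up both fragments with 1-based positions; the variable names are only illustrative.

```python
# Rows of the full sequence, copied from the topology table above.
ROWS = [
    "SAVKRTIKGT HHWLLLTILT SLLVLVQSTQ WSLFFFLYEN AFLPFAMGII",
    "AMSAFAMMFV KHKHAFLCLF LLPSLATVAY FNMVYMPASW VMRIMTWLDM",
    "VDTSLSGFKL KDCVMYASAV VLLILMTART VYDDGARRVW TLMNVLTLVY",
    "KVYYGNALDQ AISMWALIIS VTSNYSGVVT TVMFLARGIV FMCVEYCPIF",
    "FITGNTLQCI MLVYCFLGYF CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE",
    "FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ",
]
FULL = "".join(row.replace(" ", "") for row in ROWS)

# Puzzle sequence as given in the description.
PUZZLE_SEQ = ("CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
              "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ")

# str.find is 0-based, so add 1 to get residue numbers.
PUZZLE_START = FULL.find(PUZZLE_SEQ) + 1
ORF_START = FULL.find("MLVYCF") + 1

print(f"full length: {len(FULL)}")
print(f"puzzle spans {PUZZLE_START}-{PUZZLE_START + len(PUZZLE_SEQ) - 1}")
print(f"MLVYCF... starts at {ORF_START}")
```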

jeff101 Lv 1

May 25, 2020

Would it help to post a follow-up puzzle with
the same protein sequence as in this puzzle
that instead scores the protein as if it were
a membrane protein? I think such a follow-up
puzzle would give higher scores to solutions
with buried hydrophilic (blue) residues and
exposed hydrophobic (orange) ones.