Note: This puzzle replaces Puzzle 1841, which was accidentally posted with an incorrect sequence.
Fold this coronavirus protein! This is a portion of a larger protein encoded in the viral genome of SARS-CoV-2. It is encoded in a region of the genome called NSP6, but the protein's structure and function are still unknown. If we knew how this protein folds, we might be able to figure out its exact function. The puzzle's starting structure shows secondary structure (SS) predictions from PSIPRED, which hint at the parts of the protein that might fold into helices or sheets. Refold this protein to find high-scoring solutions, which will tell us how this protein is most likely to fold!
It looks like the first 11 residues might be buried inside a lipid membrane, together with the other part of the protein that is missing from this puzzle.
It is the tail domain of the larger, 290-residue protein. So the first 11 residues are probably hydrophobic because they are buried somewhere inside the whole protein.
When you give us a partial protein, could you change the scoring function so that the exposed part of the score carries a smaller weight? Or is there another way to identify which part of the partial protein might be "inside" the whole protein?
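For anyone who wants to check the "first 11 residues are hydrophobic" observation, here is a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale, which is a common textbook scale and not necessarily what the Foldit score uses, and it takes the puzzle sequence as quoted elsewhere in this thread. The variable names are only illustrative.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
# Assumption: this scale is used only for illustration; Foldit's own
# scoring of exposed/buried residues works differently.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Puzzle sequence as quoted in this thread (line break removed).
puzzle_seq = (
    "CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
    "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ"
)


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over all residues in seq."""
    return sum(KD[aa] for aa in seq) / len(seq)


head = puzzle_seq[:11]   # the first 11 residues discussed above
tail = puzzle_seq[11:]   # the rest of the puzzle sequence

head_mean = mean_hydropathy(head)
tail_mean = mean_hydropathy(tail)
print(f"first 11 residues: {head_mean:.2f}")
print(f"remaining residues: {tail_mean:.2f}")
```

A clearly higher average for the first 11 residues than for the rest would be consistent with that stretch being buried, either in a membrane or inside the full protein, but it cannot tell those two cases apart.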
The RNA sequences are also given.
Translating it with: https://web.expasy.org/translate/
shows that the given sequence
(CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ) is at the end of the 5'3' Frame 1.
My question is: why were we not given the 10 preceding amino acids from the open reading frame starting at position 210?
The sequence should then be:
5'3' Frame 1, start_pos=210
MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ
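The comparison between the two sequences can be checked with a short sketch like the one below. Both strings are copied from this post with the line breaks removed; the variable names are only illustrative, and nothing here depends on the translate tool itself.

```python
# Sequence given in the puzzle (line break removed).
puzzle_seq = (
    "CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
    "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ"
)

# Open reading frame from 5'3' Frame 1, start_pos=210, as quoted above.
orf_seq = (
    "MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
    "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ"
)

# Is the puzzle sequence exactly the end of the open reading frame?
is_tail = orf_seq.endswith(puzzle_seq)

# Residues present in the open reading frame but missing from the puzzle.
extra = orf_seq[: len(orf_seq) - len(puzzle_seq)]

print("puzzle is the tail of the ORF:", is_tail)
print("puzzle length:", len(puzzle_seq), "ORF length:", len(orf_seq))
print("preceding residues:", extra, f"({len(extra)} amino acids)")
```

This only confirms that the puzzle sequence is the open reading frame with its leading residues removed; it does not say why those residues were left out.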
Would it help to post a follow-up puzzle with
the same protein sequence as in this puzzle
that instead scores the protein as if it were
a membrane protein? I think such a follow-up
puzzle would give higher scores to solutions
with buried hydrophilic (blue) residues and
exposed hydrophobic (orange) ones.