This puzzle starts with an unfolded sequence with secondary structure assigned from PSIPRED. The target protein is LepB, which is currently being investigated as a drug discovery target against tuberculosis (TB). TB is caused by the bacillus Mycobacterium tuberculosis and killed more than 1.5 million people in 2014. Right now, no crystal structure exists for this target. Models created by Foldit players will be used to help solve the structure when crystals become available.
I think a puzzle this size is quite intractable, particularly when the secondary structure is so ill-defined. How about a few starting structures in the alignment palette?
There is no evidence that there are disulfide bonds in LepB. One of the scientists working on this problem responded:
To predict whether a protein forms disulfide bonds, scientists look at which cellular compartment the protein resides in and that compartment's oxidative properties.
You check what kind of bacterium the TB bacillus is: Gram-positive or Gram-negative.
With 6 cysteines, there are
1 way to have no disulfides,
15 ways to have 1 disulfide,
45 ways to have 2 disulfides, and
15 ways to have 3 disulfides.
If we number the cysteines 1-6
so that 12,34 means 2 disulfides
(one between cysteines 1 & 2 and
another between cysteines 3 & 4),
below are all the different ways:
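A minimal sketch that enumerates these pairings, using the same 12,34 notation. The function and variable names (disulfide_patterns, label, by_count) are made up for illustration, and the code only counts combinations; it says nothing about which pairings are physically plausible.

```python
def disulfide_patterns(cysteines):
    """Yield every set of disjoint cysteine pairs, including the empty set."""
    cysteines = list(cysteines)
    if len(cysteines) < 2:
        # Zero or one cysteine left: the only option is "no more bonds".
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Case 1: the first cysteine stays unpaired.
    yield from disulfide_patterns(rest)
    # Case 2: the first cysteine bonds with one of the remaining cysteines.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in disulfide_patterns(remaining):
            yield [(first, partner)] + pattern


def label(pattern):
    """Format a pattern as e.g. '12,34' (empty string for no disulfides)."""
    return ",".join(f"{a}{b}" for a, b in sorted(pattern))


# Group the patterns for cysteines 1-6 by how many disulfides they contain.
by_count = {}
for pattern in disulfide_patterns(range(1, 7)):
    by_count.setdefault(len(pattern), []).append(label(pattern))

for n_bonds in sorted(by_count):
    labels = sorted(by_count[n_bonds])
    print(f"{n_bonds} disulfide(s): {len(labels)} ways")
    print("  " + " ".join(labels))
```

The grouped counts come out as 1, 15, 45, and 15, matching the numbers above, for 76 patterns in total.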
bandsome (https://fold.it/portal/recipe/43861)
has a web page with discussion and links about
disulfide bonds. It says that more disulfide
bonds form in an oxidizing environment (like in
the blood, spinal fluid, extracellular medium,
lumen of the rough endoplasmic reticulum,
mitochondrial intermembrane space, secretory
proteins, lysosomal proteins, exoplasmic domains
of membrane proteins, hair, and feathers) than
in a reducing environment (like in the cytosol
and most cellular compartments).
This puzzle has about twice the usual number of amino acids in it.
Why not give us about twice the usual amount of time to work on it?
Also, please let us load this puzzle's solutions into future puzzles.
I think you are hitting upon one of the reasons why this specific target is so difficult. This target is bound to the cell membrane of Mycobacterium tuberculosis through an 80-residue linker. For this puzzle, we have already taken that 80-residue linker out to focus more on the fold of the protein. Additionally, the protein is only ~25% identical to the closest homolog with a crystal structure.
For the first phase, I would like to see what types of interesting ideas come from Foldit. Why? Because this is a difficult problem, and Foldit players think about the puzzles in a different way than I or the other scientists working on this puzzle do. This is a great asset, especially in a field where ideas have become a little stagnant. I value the models and ideas from Foldit players.
Phase 2 will incorporate folded proteins from phase 1 along with a homology model that I created and a couple of template proteins (remember, the templates are bad because of the low homology).
I have no problem extending the puzzle for a week. Phase 2 will include models from phase 1 to work on. I know this is a hard puzzle (I have been playing it too, and for the life of me I can't get my score that high…).
Several recent De-novo puzzles (1252,1243,1237,1231,1224) have been followed by
Predicted Contacts puzzles (1255,1246,1240/1240b,1234,1227), where Contact Maps
are predicted using co-evolution data. Will there be a Predicted Contacts puzzle
for 1258's protein as well?
The above sequence of 213 amino acids seems to be residues 82-294 in Fig.4 of http://jb.asm.org/content/194/10/2614.full.pdf that includes boxes B, C, D, and E.
Box B contains Ser94 & Ser96 while Box D contains Lys174.
p.2617 of the above article says "Amino acid alignments of LepB with SPaseI
of other bacterial species identified a short intracellular domain and a large
extracellular domain containing the conserved regions boxes B, C, D, and E. The
predicted catalytically active residues of the characteristic serine-lysine dyad
are located in box B and box D." p.2617 also says "These results indicate that
the Ser94, Ser96, and Lys174 residues are essential for LepB function."
p.2618 says "we hypothesize that Ser94 and Lys174 form the catalytic center of the
protein, while Ser96 likely stabilizes the interaction with the preprotein and the
catalytic serine residue". pp.2618-9 says "the active site is located on the outside
of the cytoplasmic membrane, making it relatively accessible for small molecules."
Finally, p.2614 says "Stepwise translocation of the preprotein across the membrane
is driven by SecA-mediated ATP hydrolysis. After translocation, LepB cleaves the
signal peptide from the preprotein, releasing the mature protein into the periplasm."
All these things make me think the protein in Puzzle 1258 is an extracellular one.
They also make me think Ser94 and Lys174 should be close to each other.
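To find these residues in the puzzle itself, the paper's numbering has to be shifted. Here is a rough sketch, assuming the puzzle sequence really is residues 82-294 with no gaps and that the puzzle numbers its residues from 1; both are guesses based on the alignment above, and the names PUZZLE_START, PUZZLE_LENGTH, and to_puzzle_index are made up for illustration.

```python
# Assumption: puzzle residue 1 corresponds to residue 82 of full-length LepB.
PUZZLE_START = 82
# From the post above: the puzzle sequence has 213 amino acids (82-294).
PUZZLE_LENGTH = 213


def to_puzzle_index(full_length_number):
    """Convert a residue number from the paper to a 1-based puzzle index."""
    index = full_length_number - PUZZLE_START + 1
    if not 1 <= index <= PUZZLE_LENGTH:
        raise ValueError(f"residue {full_length_number} is outside the puzzle")
    return index


# Residues the paper reports as essential for LepB function.
catalytic = {"Ser94": 94, "Ser96": 96, "Lys174": 174}
puzzle_positions = {name: to_puzzle_index(n) for name, n in catalytic.items()}

for name, index in puzzle_positions.items():
    print(f"{name} -> puzzle residue {index}")
```

Under those assumptions Ser94, Ser96, and Lys174 would be puzzle residues 13, 15, and 93. It is worth checking that residues 13 and 15 are serines and residue 93 is a lysine in the puzzle before relying on the offset.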