Foldit

Recipe: AA Edit 2.0.1

created by LociOiling

Like recipe 28 likes Add to Cookbook Clone

Profile

Name

AA Edit 2.0.1

ID

102879

Shared with

Public

Parent

AA Edit 1.2

Children

AA Edit 2.0.1 (COPY)

Created on

April 16, 2020 at 20:24 PM UTC

Updated on

April 16, 2020 at 20:24 PM UTC

Description

AA Edit displays the protein 's amino sequence using 1-character codes. For puzzles with "mutable" segments, it applies any updates entered to the protein. Version 2.0 handles multiple chains, and puzzles with DNA or RNA. Version 2.0.1 has a quick fix for proline at the N terminal.

Best for

Code

--[[ AA Edit - get and set primary structure The primary structure of a protein is the sequence of the amino acids that make up the protein. AA Edit displays the current primary structure as a sequence of single-character amino acid codes. (Similar codes are used for DNA and RNA, see "special cases" below.) The displayed value can be selected and cut or copied. The string of single-character codes is similar to the FASTA format accepted by many protein related tools. FASTA also allows for header information, which some tools, such as BLAST, require. If there are any "mutable" segments, the "Change" button is displayed, and a a new value can be pasted in. If there are no mutable segments, any input to the "seq" box is ignored. When the "Change" button is clicked, the currently displayed primary structure is applied to the protein. The input amino acid codes are converted to lower case. The recipe checks each amino acid code against the list of 20 amino acids used in Foldit. Any input not found in the list is ignored, and the corresponding segment is left unchanged. Some puzzles have a mix of mutable and non-mutable segments. The recipe does not attempt to change any non-mutable segments. If the structure list is longer than the protein, AA Edit discards the extra entries at the end of the list. If the structure list is shorter than the protein, AA Edit applies the list to the first *n* segments of the protein, where *n* is the length of the list. Any remaining segments are unchanged. All changes are written to the scriptlog. special cases ------------- Some puzzles contain two or more separate protein chains. The "insulin mutant" puzzle, which appears periodically as a revisiting puzzle, is an example. AA Edit detects the beginning and end of a protein chain by checking the atom count. Each chain is presented separately, identified by a chain id: "A", "B", "C", and so on. All the normal rules apply to each chain. Some puzzles have one or more ligands, each represented by a segment which returns "x" or "unk" for its amino acid type. This code and anything else not found in the normal list of 20 amino acids is changed to "x" for the purposes of this recipe. Segments with an "x" for their amino acid code in the replacment string are not changed. Each ligand is presented as a separate chain. Very rarely, Foldit puzzles may contain RNA or DNA. These are chains of nucleobases instead of amino acids. Each segment is one nucleobase. Foldit uses two-character codes for RNA and DNA. AA Edit translates these codes into single-character codes. The single-character codes are ambiguous, for example, RNA adenine is code "ra" in Foldit, and DNA adenine is "da". Both become "a" externally, which is also used for alanine in a protein. AA Edit treats each DNA or RNA section as a separate chain. This allows it to keep the ambiguous codes straight. The handling of RNA and DNA has only been tested for RNA. So far, RNA has only appeared on one science puzzle, and the RNA was not mutable in that puzzle. DNA has appeared only in intro puzzles, which don't allow recipes. It's possible that problems may appear if there are ever for-credit DNA or RNA puzzles again. An even rarer case was in found in puzzle 879, segment 134, and puzzle 1378b, segment 30, where an amino acid was modified by a glycan. The code "unk" was used for these modified amino acids, but they did not have the secondary structure code "M" used for ligands. A modified amino acid like this is treated as protein, and does not break the amino acid chain. See "AA Copy Paste Compare v 1.1.1 -- Brow42" for a full-function recipe that works with primary and primary structures. version 1.2 -- 2016/12/23 -- LociOiling * clone of PS Edit v1.2 * enable 1-step undo with undo.SetUndo ( false ) version 2.0 -- 2018/09/02 -- LociOiling * detect and report multiple chains * force filters on at beginning and end * handle DNA and RNA, use single-letter codes externally * refine scriptlog output, eliminate timing calls version 2.0.1 -- 2020/04/16 -- LociOiling * handle proline at N-terminal correctly ]]-- -- -- Globals -- Recipe = "AA Edit" Version = "2.0.1" ReVersion = Recipe .. " " .. Version mutable = false -- true if any mutable segments found AALONG = 1 AACODE = 2 -- redundant for proteins, needed for DNA and RNA AAATOM = 3 AATYPE = 4 -- -- amino acid names and abbeviations, -- third element is mid-chain atom count -- AANames = { a = { "alanine", "a", 10, "P", }, c = { "cysteine", "c", 11, "P", }, d = { "aspartate", "d", 12, "P", }, e = { "glutamate", "e", 15, "P", }, f = { "phenylalanine", "f", 20, "P", }, g = { "glycine", "g", 7, "P", }, h = { "histidine", "h", 17, "P", }, i = { "isoleucine", "i", 19, "P", }, k = { "lysine", "k", 22, "P", }, l = { "leucine", "l", 19, "P", }, m = { "methionine", "m", 17, "P", }, n = { "asparagine", "n", 14, "P", }, p = { "proline", "p", 15, "P", }, q = { "glutamine", "q", 17, "P", }, r = { "arginine", "r", 24, "P", }, s = { "serine", "s", 11, "P", }, t = { "threonine", "t", 14, "P", }, v = { "valine", "v", 16, "P", }, w = { "tryptophan", "w", 24, "P", }, y = { "tyrosine", "y", 21, "P", }, -- -- bonus! codes for ligands ("x" is common, but "unk" is historic) -- x = { "ligand", "x", 0, "M", }, unk = { "ligand", "x", 0, "M", }, -- -- bonus! RNA nucleotides -- ra = { "adenine", "a", 0, "R", }, rc = { "cytosine", "c", 0, "R", }, rg = { "guanine", "g", 0, "R", }, ru = { "uracil", "u", 0, "R", }, -- -- bonus! DNA nucleotides (as seen in PDB, not confirmed for Foldit) -- da = { "adenine", "a", 0, "D", }, dc = { "cytosine", "c", 0, "D", }, dg = { "guanine", "g", 0, "D", }, dt = { "thymine", "t", 0, "D", }, } AA_ATOM_MAX = 27 -- modified AA if over this count -- -- tables for converting external nucleobase codes to Foldit internal codes -- RNAin = { a = "ra", c = "rc", g = "rg", u = "ru", } DNAin = { a = "da", c = "dc", g = "dg", t = "dt", } Ctypes = { P = "protein", D = "DNA", R = "RNA", M = "ligand", } -- -- begin protNfo Beta package version 0.2a -- -- version 0.2a is packaged as a psuedo-class or psuedo-module -- containing a mix of data fields and functions -- -- all entries must be terminated with a comma to keep Lua happy -- -- the commas aren't necessary if only function definitions are present -- -- removed some items found in 0.1 not needed here, -- added N-terminal and C-terminal checks, first and last analysis -- -- this version depends on the external AANames table and associated codes, -- so still a work in progress -- -- version 0.2a contains a quick fix for proline at N-terminal -- -- need to reconcile this version with the more extensive version in print protein -- protNfo = { PROTEIN = "P", LIGAND = "M", RNA = "R", DNA = "D", UNKNOWN_AA = "x", UNKNOWN_BASE = "xx", CYSTEINE_AA = "c", PROLINE_AA = "p", aa = {}, -- amino acid codes ss = {}, -- secondary structure codes atom = {}, -- atom counts mute = {}, -- mutable flag ctype = {}, -- segment type - P, M, R, D first = {}, -- true if segment is first in chain last = {}, -- true if segment is last in chain nterm = {}, -- true if protein and if n-terminal cterm = {}, -- true if protein and if c-terminal fastac = {}, -- external code for FASTA-style output setNfo = function () local segCnt = structure.GetCount () -- -- initial scan: retrieve basic information from Foldit -- for ii = 1, segCnt do local nterm = false local cterm = false protNfo.aa [ #protNfo.aa + 1 ] = structure.GetAminoAcid ( ii ) protNfo.ss [ #protNfo.ss + 1 ] = structure.GetSecondaryStructure ( ii ) protNfo.atom [ #protNfo.atom + 1 ] = structure.GetAtomCount ( ii ) protNfo.mute [ #protNfo.mute + 1 ] = structure.IsMutable ( ii ) local aatab = AANames [ protNfo.aa [ ii ] ] if aatab ~= nil then protNfo.ctype [ #protNfo.ctype + 1 ] = aatab [ AATYPE ] -- -- special case for puzzles 879, 1378b, and similar -- -- if unknown amino acid, but secondary structure is not -- ligand, mark it as protein -- -- segment 134 in puzzle 879 is the example -- if protNfo.ctype [ ii ] == protNfo.LIGAND and protNfo.ss [ ii ] ~= protNfo.LIGAND then protNfo.ctype [ ii ] = protNfo.PROTEIN end else protNfo.ctype [ #protNfo.ctype + 1 ] = protNfo.LIGAND aa = protNfo.UNKNOWN_AA end -- -- for proteins, determine n-terminal and c-terminal -- based on atom count -- if protNfo.ctype [ ii ] == protNfo.PROTEIN then local ttyp = "" local noteable = false local ac = protNfo.atom [ ii ] -- actual atom count local act = aatab [ AAATOM ] -- reference mid-chain atom count if ac ~= act or ( protNfo.aa [ ii ] == protNfo.CYSTEINE_AA and ac == act ) then ttyp = "non-standard amino acid" if ac == act + 2 then ttyp = "N-terminal" nterm = true notable = true elseif ac == act + 1 then ttyp = "C-terminal" cterm = true notable = true elseif protNfo.aa [ ii ] == protNfo.PROLINE_AA and ac == act + 3 then ttyp = "N-terminal" nterm = true notable = true end if protNfo.aa [ ii ] == protNfo.CYSTEINE_AA then local ds = current.GetSegmentEnergySubscore ( ii, "Disulfides" ) -- print ( "cysteine at " .. ii .. ", disulfides score = " .. ds ) if ds ~= 0 and math.abs ( ds ) > 0.01 then nterm = false cterm = false ttyp = "disulfide bridge" if ac == act + 1 then ttyp = "N-terminal" nterm = true elseif ac == act then ttyp = "C-terminal" cterm = true end notable = true else ttyp = "unpaired cysteine" notable = false end end if notable then print ( ttyp .. " detected at segment " .. ii .. ", amino acid = \'" .. protNfo.aa [ ii ] .. "\', atom count = " .. ac .. ", reference count = " .. act .. ", secondary structure = " .. protNfo.ss [ ii ] ) end end end if protNfo.ctype [ ii ] == protNfo.LIGAND then print ( "ligand detected at segment " .. ii ) end protNfo.nterm [ #protNfo.nterm + 1 ] = nterm protNfo.cterm [ #protNfo.cterm + 1 ] = cterm protNfo.fastac [ #protNfo.fastac + 1 ] = aatab [ AACODE ] end -- -- rescan to determine first and last in chain for all types -- it's necessary to "peek" at neighbors for DNA and RNA -- for ii = 1, segCnt do local nterm = protNfo.nterm [ ii ] local cterm = protNfo.cterm [ ii ] local first = false local last = false if ii == 1 then first = true end if ii == segCnt then last = true end if protNfo.ctype [ ii ] == protNfo.PROTEIN then if protNfo.nterm [ ii ] then first = true end if protNfo.cterm [ ii ] then last = true end -- -- special case for puzzles 879, 1378b, and similar -- -- if modified AA ends or begins a chain, mark -- it as C-terminal or N-terminal -- -- hypothetical: no way to test so far! -- if AANames [ protNfo.aa [ ii ] ] [ AACODE ] == protNfo.UNKNOWN_AA then if ii > 1 and protNfo.ctype [ ii - 1 ] ~= protNfo.ctype [ ii ] then first = true protNfo.nterm [ ii ] = true print ( "non-standard amino acid at segment " .. ii .. " marked as N-terminal" ) end if ii < segCnt and protNfo.ctype [ ii + 1 ] ~= protNfo.ctype [ ii ] then last = true protNfo.cterm [ ii ] = true print ( "non-standard amino acid at segment " .. ii .. " marked as C-terminal" ) end end elseif protNfo.ctype [ ii ] == protNfo.DNA or protNfo.ctype [ ii ] == protNfo.RNA then if ii > 1 and protNfo.ctype [ ii - 1 ] ~= protNfo.ctype [ ii ] then first = true end if ii < segCnt and protNfo.ctype [ ii + 1 ] ~= protNfo.ctype [ ii ] then last = true end else -- ligand first = true last = true end protNfo.first [ #protNfo.first + 1 ] = first protNfo.last [ #protNfo.last + 1 ] = last end end, } -- -- end protNfo Beta package version 0.2 -- -- -- end of globals section -- function getChains () -- -- getChains - build a table of the chains found -- -- Most Foldit puzzles contain only a single protein (peptide) chain. -- A few puzzles contain ligands, and some puzzles have had two -- protein chains. Foldit puzzles may also contain RNA or DNA. -- -- For proteins, the atom count can be used to identify the first -- (N terminal) and last (C terminal) ends of the chain. The AANames -- table has the mid-chain atom counts for each amino acid. -- -- Cysteine is a special case, since the presence of a disulfide -- bridge also changes the atom count. -- -- For DNA and RNA, the beginning and end of the chain is determined -- by context at present. For example, if the previous segment was protein -- and this segment is DNA, it's the start of a chain. -- -- Each ligand is treated as a chain of its own, with a length of 1. -- -- chain table entries -- ------------------- -- -- ctype - chain type - "P" for protein, "M" for ligand, "R" for RNA, "D" for DNA -- fasta - FASTA-format sequence, single-letter codes (does not include FASTA header) -- fastab - "backup" of fasta -- start - Foldit segment number of sequence start -- stop - Foldit segment number of sequence end -- len - length of sequence -- chainid - chain id assigned to entry, "A", "B", "C", and so on -- -- For DNA and RNA, fasta and fastab contain single-letter codes, so "a" for adenine. -- The codes overlap the amino acid codes (for example, "a" for alanine). -- The DNA and RNA codes must be converted to the appropriate two-letter codes Foldit -- uses internally, for example "ra" for RNA adenine and "da" for DNA adenine. -- -- -- we're assuming Foldit won't ever have more chains -- local chainid = { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" } local chainz = {} local chindx = 0 local curchn = nil local segCnt = structure.GetCount () for ii = 1, segCnt do if protNfo.first [ ii ] then chindx = chindx + 1 chainz [ chindx ] = {} curchn = chainz [ chindx ] curchn.ctype = protNfo.ctype [ ii ] curchn.fasta = "" curchn.start = ii curchn.chainid = chainid [ chindx ] curchn.mute = 0 end curchn.fasta = curchn.fasta .. protNfo.fastac [ ii ] if protNfo.mute [ ii ] then curchn.mute = curchn.mute + 1 end if protNfo.last [ ii ] then curchn.stop = ii curchn.len = curchn.stop - ( curchn.start - 1 ) end end for ii = 1, #chainz do chainz [ ii ].fastab = chainz [ ii ].fasta end return chainz end function setChain ( chain ) local changes = 0 local errz = 0 local offset = chain.start - 1 local fastan = "" -- possibly changed chain for ii = 1, chain.stop - ( chain.start - 1 ) do local sType = chain.fasta:sub ( ii, ii ) local oType = chain.fastab:sub ( ii, ii ) -- -- for DNA and RNA, convert FASTA to Foldit -- if chain.ctype == protNfo.DNA then sType = DNAin [ sType ] if sType == nil then sType = protNfo.UNKNOWN_BASE end oType = DNAin [ oType ] if oType == nil then oType = protNfo.UNKNOWN_BASE end elseif chain.ctype == protNfo.RNA then sType = RNAin [ sType ] if sType == nil then sType = protNfo.UNKNOWN_BASE end oType = RNAin [ oType ] if oType == nil then oType = protNfo.UNKNOWN_BASE end end if sType ~= oType then local sName = AANames [ sType ] if sName ~= nil then if protNfo.mute [ ii + offset ] then structure.SetAminoAcid ( ii + offset, sType ) local newaa = structure.GetAminoAcid ( ii + offset ) if newaa == sType then changes = changes + 1 fastan = fastan .. AANames [ sType ] [ AACODE ] else print ( "segment " .. ii + offset .. " (" .. chain.chainid .. ":" .. ii .. ") mutation to type \"" .. sType .. "\" failed" ) errz = errz + 1 fastan = fastan .. AANames [ oType ] [ AACODE ] end else print ( "segment " .. ii + offset .. " (" .. chain.chainid .. ":" .. ii .. ") is not mutable, skipping change to type \"" .. sType .. "\"" ) errz = errz + 1 fastan = fastan .. AANames [ oType ] [ AACODE ] end else print ( "segment " .. ii + offset .. " (" .. chain.chainid .. ":" .. ii .. "), skipping invalid type \"" .. sType .. "\"" ) errz = errz + 1 fastan = fastan .. AANames [ oType ] [ AACODE ] end else fastan = fastan .. AANames [ oType ] [ AACODE ] end end chain.fasta = fastan chain.fastab = fastan return changes, errz end function GetParameters ( chnz ) local dlog = dialog.CreateDialog ( ReVersion ) dlog.sc0 = dialog.AddLabel ( "segment count = " .. structure.GetCount () ) local cwd = "chain" if #chnz > 1 then cwd = "chains" end dlog.chz = dialog.AddLabel ( #chnz .. " chains" ) for ii = 1, #chnz do local chain = chnz [ ii ] dlog [ "chn" .. ii .. "l1" ] = dialog.AddLabel ( "Chain " .. chain.chainid .. " (" .. Ctypes [ chnz [ ii ].ctype ] .. ")" ) dlog [ "chn" .. ii .. "l2" ] = dialog.AddLabel ( "segments " .. chain.start .. "-" .. chain.stop .. ", mutables = " .. chain.mute .. ", length = " .. chain.len ) dlog [ "chn" .. ii .. "ps" ] = dialog.AddTextbox ( "seq", chain.fasta ) end dlog.u0 = dialog.AddLabel ( "" ) if mutable then dlog.u1 = dialog.AddLabel ( "Usage: click in text box, " ) dlog.u2 = dialog.AddLabel ( "then use select all and copy, cut, or paste" ) dlog.u3 = dialog.AddLabel ( "to save or change primary structure" ) else dlog.u1 = dialog.AddLabel ( "Usage: click in text box," ) dlog.u2 = dialog.AddLabel ( "then use select all and copy" ) dlog.u3 = dialog.AddLabel ( "to save primary structure" ) end dlog.w0 = dialog.AddLabel ( "" ) if mutable then dlog.w1 = dialog.AddLabel ( "Windows: ctrl + a = select all" ) dlog.w2 = dialog.AddLabel ( "Windows: ctrl + x = cut" ) dlog.w3 = dialog.AddLabel ( "Windows: ctrl + c = copy" ) dlog.w4 = dialog.AddLabel ( "Windows: ctrl + v = paste" ) else dlog.w1 = dialog.AddLabel ( "Windows: ctrl + a = select all" ) dlog.w3 = dialog.AddLabel ( "Windows: ctrl + c = copy" ) end dlog.z0 = dialog.AddLabel ( "" ) if mutable then dlog.ok = dialog.AddButton ( "Change" , 1 ) end dlog.exit = dialog.AddButton ( "Exit" , 0 ) if ( dialog.Show ( dlog ) > 0 ) then for ii = 1, #chnz do chnz [ ii ].fasta = ( dlog [ "chn" .. ii .. "ps" ].value:lower ()):sub ( 1, chnz [ ii ].len ) end return true else return false end end function main () print ( ReVersion ) print ( "Puzzle: " .. puzzle.GetName () ) print ( "Track: " .. ui.GetTrackName () ) undo.SetUndo ( false ) protNfo.setNfo () for ii = 1, structure.GetCount () do if protNfo.mute [ ii ] == true then mutable = true break end end local changeNum = 0 local chnTbl = {} -- chains as table of tables chnTbl = getChains () print ( #chnTbl .. " chains" ) for ii = 1, #chnTbl do local chain = chnTbl [ ii ] if chain.stop == nil then chain.stop = 999999 end print ( "chain " .. chain.chainid .. ", start = " .. chain.start .. ", end = " .. chain.stop ) print ( chain.fasta ) end while GetParameters ( chnTbl ) do for ii = 1, #chnTbl do local chain = chnTbl [ ii ] if chain.fasta ~= chain.fastab then print ( "--" ) print ( "chain " .. chain.chainid .. " changed" ) local old = chain.fastab changeNum = changeNum + 1 local start_time = os.time () behavior.SetFiltersDisabled ( true ) local sChg, sErr = setChain ( chnTbl [ ii ] ) behavior.SetFiltersDisabled ( false ) print ( "segments changed = " .. sChg .. ", errors = " .. sErr ) print ( "old chain " .. chain.chainid .. ": " ) print ( old ) print ( "new chain " .. chain.chainid .. ": " ) print ( chain.fastab ) end end end cleanup () end function cleanup ( errmsg ) -- -- do not loop if cleanup causes an error -- if CLEANUPENTRY ~= nil then return end CLEANUPENTRY = true print ( "---" ) -- -- model 100 - print recipe name, puzzle, track, time, score, and gain -- local reason local start, stop, line, msg if errmsg == nil then reason = "complete" else -- -- model 120 - civilized errmsg reporting, -- thanks to Bruno K. and Jean-Bob -- start, stop, line, msg = errmsg:find ( ":(%d+):%s()" ) if msg ~= nil then errmsg = errmsg:sub ( msg, #errmsg ) end if errmsg:find ( "Cancelled" ) ~= nil then reason = "cancelled" else reason = "error" end end print ( ReVersion .. " " .. reason ) print ( "Puzzle: " .. puzzle.GetName () ) print ( "Track: " .. ui.GetTrackName () ) if reason == "error" then print ( "Unexpected error detected" ) print ( "Error line: " .. line ) print ( "Error: \"" .. errmsg .. "\"" ) end behavior.SetFiltersDisabled ( false ) end xpcall ( main, cleanup )

Comments

LociOiling Lv 1

September 03, 2018

AA Edit 2.0 now displays each protein chain or non-protein section separately, making it easier to use with the PDB and other "real scientist" tools. There are several other minor changes, such as displaying the segment count and revisions to the scriptlog output, plus new support for RNA, DNA, and other rare puzzle features. See "Version 2.0 changes" below for more detail.

As before, AA Edit displays the primary structure of the protein as a sequence of single-character amino acid codes. The sequence can be copied and pasted using the normal keyboard shortcuts. (For example, on Windows, ctrl-c for copy, and ctrl-v for paste.)

For proteins which contain "mutable" segments, a new sequence can be pasted into the "seq" box. When you click the "change" button, the recipe attempts to mutate each segment to the new value entered.

For puzzles with multiple chains, there will be chain A, chain B, and so on. The sequence for each chain can be cut and pasted separately.

The recipe does not attempt to change non-mutable segments. It does not change any segments where the current amino acid is the same as the new amino acid. It does not change any segments where the new amino acid code is not one of the 20 amino acids recognized by Foldit.

In puzzles with ligands, the ligand may be represented by one or more segments which return amino acid type "unk" instead of one of the normal codes. The recipe changes "unk" to "x" for display purposes. Since "x" is also not one of the recognized Foldit codes, it's ignored when the change is applied.

The "change" button is not displayed for proteins with no mutable segments, and any changes to the "seq" box are ignored in this case.

The general behavior of this recipe is similar to the companion recipe, SS Edit 1.2. If the input sequence is shorter than the length of the protein, the segments at the end of the protein are left unchanged. If the input sequence is longer than the protein, the extra portion of the input is ignored.

A new sequence can be pasted in its entirety by clicking to the left of the first character of the displayed sequence, then typing ctrl-v or the equivalent "paste" shortcut in your environment.

Alternately, you can clear the input field (via ctrl-x or the equivalent) then copy and paste the new sequence on a "blank slate".

The recipe also references the "select all" shortcut (ctrl-a or the equivalent). In general, this does not seem necessary in a Foldit textbox. For example, clicking anywhere the textbox and typing ctrl-c copies the entire contents of the textbox. Clicking anywhere in the textbox and typing ctrl-x clears the entire textbox. Use "select all" if your environment exhibits different behavior. Thanks to Bertro for pointing out the nuances of copy-and-paste in Foldit.

See also print protein 2.8 for a recipe which captures more information about the protein.

Version 2.0 changes

As mentioned above, handling multiple chains is the key feature in version 2.0.

A few puzzles have had move than one protein chain. Examples include the "insulin mutant", periodically seen as a revisiting puzzle, and the recent hydrogen bond puzzles 1552, 1555, and 1564.

AA Edit now displays each chain separately, so you'll see chain A and chain B for the puzzles above. Most Foldit puzzles have only one chain, which will be displayed as chain A.

For clarity, each ligand is now treated as a separate chain. On a typical ligand puzzle, you'll see protein chain A and ligand chain B.

AA Edit now also handles DNA and RNA, although testing has been very limited. Each section of DNA or RNA is treated as a separate chain. The single-character codes are displayed. We've only had one RNA puzzle, 1463, and it was all RNA (no protein), and did not allow mutations. Although there are mutable DNA intro puzzles, they don't allow recipes. Bugs may emerge if we ever get more DNA or RNA puzzles, but at least AA Edit won't display all x's any more.

AA Edit now handles non-standard amino acids, such as the ones found in puzzles 879 and 1378b. Both these puzzles had one amino acid with an attached glycan. These AAs were indicated by the code "unk", but unlike a ligand, had a normal secondary structure code (H, E, or L). AA Edit recognizes these non-standard AAs, and treats them as a part of the protein chain.

chain detection

AA Edit uses at least three different methods to detect chains.

For proteins, the atom count can be used to detect the ends of the chain. Since a disulfide bridge can also change the atom count, the recipe includes some disulfide logic, but it doesn't try to determine which cysteines are part of a bridge.

The secondary structure code is another indicator, with type "M" (for molecule) indicating a ligand. Each ligand is treated as a one-segment chain.

Foldit uses special codes for RNA and DNA. For example, "ra" indicates RNA adenine, and "da" presumably would mean DNA adenine. These codes are the third way AA Edit detects a different chain.

The current chain logic may not be 100%. For example, a glycosylated (had to look that one up) or other non-standard amino acid at the beginning or end of a chain might be an issue. There's also no known way to detect the case of two separate but adjacent RNA or DNA chains. Based on the limited information available, it seems like the atom count method won't work for RNA or DNA.

The code in AA Edit 2.0 is a work in progress, but may be useful in other recipes. The next step is to add the new logic to the "print protein" family.

[Edit: that's "insulin mutant"…]

LociOiling Lv 1

April 16, 2020

Version 2.0.1 fixes chain detection when proline is at the N terminal. In this case, the atom count is 18, instead of 15 when proline is mid-chain. Usually the atom count increases by 2 at the N terminal.

(Edit: mid-chain count is 15.)

WBarme1234 Lv 1

November 02, 2022

Problem seen (Puzzle 2217) with a ligand:
Error line: 594
Error: "attempt to concatenate field 'len' (a nil value)"