
Recipe: NetSurfP 1.2

created by LociOiling

Profile


Name
NetSurfP 1.2
ID
102397
Shared with
Public
Parent
NetSurfP 1.1
Children
None
Created on
June 27, 2017 at 20:08 PM UTC
Updated on
June 27, 2017 at 20:08 PM UTC
Description

Convert NetSurfP webpage output into a secondary structure prediction and a copy-and-paste spreadsheet format. V1.1 handles blank lines gracefully and adds a confidence prediction. V1.2 lets you copy-and-paste the entire NetSurfP page, and handles a missing CRLF or newline on the last data line.

Code


--[[
    NetSurfP - convert NetSurfP format

    NetSurfP, www.cbs.dtu.dk/services/NetSurfP/, takes a primary
    sequence and outputs predicted surface accessibility and
    secondary structure.

    "Surface accessibility" seems to be more or less the inverse
    of what's called "predicted residue burial" in Foldit.
    It's the chances that a given residue will be on the outside
    of a protein.

    NetSurfP outputs most of its values as probabilities, and it
    uses a columnar format. Unfortunately, the formatting used on
    the output page does not lend itself to being copied and
    pasted into a spreadsheet.

    This recipe converts the columnar format of NetSurfP output
    to a tab-delimited format which can be copied and pasted into
    a spreadsheet.

    The recipe also attempts to create a Foldit secondary structure
    string from the NetSurfP probabilities.

    version 1.0 -- 2017/01/08 -- LociOiling
    * new recipe

    version 1.1 -- 2017/01/14 -- LociOiling
    * ignore blank lines in input
    * add confidence prediction

    version 1.2 -- 2017/06/27 -- LociOiling
    * handle missing newline or CRLF on last line
    * handle copy of entire page
]]--
--
--  Globals
--
Recipe = "NetSurfP"
Version = "1.2"
ReVersion = Recipe .. " v." .. Version
--
--  end of globals section
--
function NSPReader ( nspentry )
    local linecnt = 0
    local comments = 0
    local unknown = 0
    local louts = ""    -- whole shebang in spreadsheet format
    local ssp = ""      -- ss prediction
    local ssc = ""      -- ss confidence
    --
    --  manifest constants for column positions,
    --  change these if NetSurfP format changes
    --
    local PHELIX = 8
    local PSHEET = 9
    local PLOOP = 10
    --
    --  column header, also needs attention in input format changes
    --
    local cHead = "\"burial\"\t\"aa\"\t\"seqnam\"\t\"segnum\"\t\"rsa\"\t\"absaccess\"\t\"zFit\"\t\"pHelix\"\t\"pSheet\"\t\"pLoop\"\n"

    --
    --  split on CR or LF; a CRLF pair yields an extra empty line,
    --  which is skipped below
    --
    for line in nspentry:gmatch ( "(.-)[\r\n]" ) do
        if line ~= nil and line:len () > 0 then
            local pHelix = 0
            local pSheet = 0
            local pLoop = 0
            --
            --  anything containing "#" or shorter than a data row
            --  is treated as a comment
            --
            if line:match ( "#" ) or line:len () < 78 then
                comments = comments + 1
            else
                local lout = ""
                local col = 0
                for toke in line:gmatch ( "[%S]+" ) do
                    if lout:len () > 0 then
                        lout = lout .. "\t"
                    end
                    local num = tonumber ( toke )
                    if num == nil then
                        lout = lout .. "\"" .. toke .. "\""
                    else
                        lout = lout .. toke
                    end
                    col = col + 1
                    --
                    --  keep the probabilities as numbers, so the
                    --  comparisons below are numeric, not string
                    --
                    if num ~= nil then
                        if col == PHELIX then
                            pHelix = num
                        elseif col == PSHEET then
                            pSheet = num
                        elseif col == PLOOP then
                            pLoop = num
                        end
                    end
                end
                if col > 0 then
                    lout = lout .. "\n"
                    if louts:len () == 0 then
                        louts = louts .. cHead
                    end
                    louts = louts .. lout
                    --
                    --  pick highest probability for secondary structure
                    --
                    local pred = "L"
                    local prob = pLoop
                    if pHelix > pLoop then
                        if pHelix > pSheet then
                            pred = "H"
                            prob = pHelix
                        else
                            pred = "E"
                            prob = pSheet
                        end
                    else
                        if pSheet > pLoop then
                            pred = "E"
                            prob = pSheet
                        end
                    end
                    ssp = ssp .. pred
                    if prob < 1.0 then
                        ssc = ssc .. string.sub ( tostring ( prob * 10 ), 1, 1 )
                    else
                        ssc = ssc .. "9"    -- should never occur
                    end
                end
            end
        end
        linecnt = linecnt + 1
    end
    print ( "number of lines = " .. linecnt )
    print ( "number of comments = " .. comments )
    return louts, ssp, ssc
end

function GetNetSurfP ()
    local dlog = dialog.CreateDialog ( ReVersion )
    local tab = ""
    dlog.tab0 = dialog.AddLabel ( "NetSurfP Output" )
    dlog.tab = dialog.AddTextbox ( "output", tab )
    dlog.u0 = dialog.AddLabel ( "" )
    dlog.u1 = dialog.AddLabel ( "Usage: copy the NetSurfP output for a sequence" )
    dlog.u2 = dialog.AddLabel ( "and paste into the output box" )
    dlog.w0 = dialog.AddLabel ( "" )
    dlog.ok = dialog.AddButton ( "OK" , 1 )
    dlog.exit = dialog.AddButton ( "Exit" , 0 )
    if ( dialog.Show ( dlog ) > 0 ) then
        tab = dlog.tab.value
        --
        --  make sure the last data line is terminated
        --
        return tab .. "\n"
    else
        return ""
    end
end

function ShowResults ( csv, ssp, ssc )
    local dlog = dialog.CreateDialog ( ReVersion )
    dlog.tab0 = dialog.AddLabel ( "NetSurfP Reformatted Output" )
    dlog.lines = dialog.AddLabel ( "segments = " .. ssp:len () )
    dlog.sp1 = dialog.AddLabel ( "" )
    dlog.csv = dialog.AddTextbox ( "csv", csv )
    dlog.ssp = dialog.AddTextbox ( "SS pred", ssp )
    dlog.ssc = dialog.AddTextbox ( "SS conf", ssc )
    --
    --  each label gets its own key, so later labels
    --  don't overwrite earlier ones
    --
    dlog.u0 = dialog.AddLabel ( "" )
    dlog.u1 = dialog.AddLabel ( "csv is \"comma separated values\" for spreadsheet" )
    dlog.u2 = dialog.AddLabel ( "SS pred is secondary structure prediction" )
    dlog.u3 = dialog.AddLabel ( "SS conf is prediction confidence, 1 low, 9 high" )
    dlog.v0 = dialog.AddLabel ( "" )
    dlog.v1 = dialog.AddLabel ( "Usage: use select all and copy, cut, or paste" )
    dlog.v2 = dialog.AddLabel ( "to save or change secondary structure" )
    dlog.w0 = dialog.AddLabel ( "" )
    dlog.w1 = dialog.AddLabel ( "Windows: ctrl + a = select all" )
    dlog.w2 = dialog.AddLabel ( "Windows: ctrl + x = cut" )
    dlog.w3 = dialog.AddLabel ( "Windows: ctrl + c = copy" )
    dlog.w4 = dialog.AddLabel ( "Windows: ctrl + v = paste" )
    dlog.z0 = dialog.AddLabel ( "" )
    dlog.ok = dialog.AddButton ( "OK" , 1 )
    dialog.Show ( dlog )
end

function main ()
    print ( ReVersion )
    print ( "Puzzle: " .. puzzle.GetName () )
    print ( "Track: " .. ui.GetTrackName () )

    local nsp = ""
    nsp = GetNetSurfP ()
    print ( "input length = " .. nsp:len () )
    if nsp:len () > 0 then
        local csv = ""
        local ssp = ""
        local ssc = ""
        csv, ssp, ssc = NSPReader ( nsp )
        if  csv ~= nil and csv:len () > 0
        and ssp ~= nil and ssp:len () > 0 then
            ShowResults ( csv, ssp, ssc )
            print ( "---spreadsheet format---" )
            print ( csv )
            print ( "---secondary structure prediction---" )
            print ( ssp )
            print ( "---prediction confidence---" )
            print ( ssc )
        else
            print ( "no results, input format may be wrong" )
        end
    end
    cleanup ()
end

function cleanup ( errmsg )
    --
    --  optionally, do not loop if cleanup causes an error
    --  (any loop here is automatically terminated after a few iterations, however)
    --
    if CLEANUPENTRY ~= nil then
        return
    end
    CLEANUPENTRY = true
    print ( "---" )
    --
    --  model 100 - print recipe name, puzzle, track, time, score, and gain
    --
    local reason
    local start, stop, line, msg
    if errmsg == nil then
        reason = "complete"
    else
        --
        --  model 120 - civilized errmsg reporting,
        --  thanks to Bruno K. and Jean-Bob
        --
        start, stop, line, msg = errmsg:find ( ":(%d+):%s()" )
        if msg ~= nil then
            errmsg = errmsg:sub ( msg, #errmsg )
        end
        if errmsg:find ( "Cancelled" ) ~= nil then
            reason = "cancelled"
        else
            reason = "error"
        end
    end
    print ( ReVersion .. " " .. reason )
    print ( "Puzzle: " .. puzzle.GetName () )
    print ( "Track: " .. ui.GetTrackName () )
    if reason == "error" then
        print ( "Unexpected error detected" )
        --
        --  line is nil if the error message had no line number
        --
        print ( "Error line: " .. tostring ( line ) )
        print ( "Error: \"" .. errmsg .. "\"" )
    end
end

xpcall ( main, cleanup )

Comments


LociOiling Lv 1

Version 1.2 of the NetSurfP recipe lets you copy the entire NetSurfP page and paste it into the recipe. Previous versions expected only the "data" part of the page, and were a bit sensitive at that. Thanks to Susume for suggesting the improvements.

NetSurfP is yet another web-based secondary structure prediction service. (JPred is another.)

NetSurfP outputs its results in a columnar format. The predictions for helix, sheet, and loop are expressed as probabilities.

(NetSurfP also predicts the "surface accessibility" of a given residue, which seems to be more or less the inverse of the likelihood the residue is buried in the hydrophobic core.)

The formatting for the NetSurfP results doesn't lend itself to being pasted directly into a spreadsheet.

This recipe does three things. First, it converts the NetSurfP output to a tab-delimited format that can be pasted into a spreadsheet. Second, it creates a secondary structure string. Third, it creates a confidence prediction string. For each segment in the input, the confidence ranges from 0 to 9, with 0 being low confidence.
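
The first step, turning one columnar row into a tab-delimited row, can be sketched roughly as follows. This is an illustration only: toTabRow is a hypothetical helper name, not a function in the recipe. It assumes fields are separated by whitespace and that anything that doesn't parse as a number should be wrapped in double quotes.

```lua
-- Sketch only: toTabRow is a hypothetical helper, not part of the recipe.
-- Splits a row on whitespace, quotes non-numeric fields, joins with tabs.
function toTabRow ( line )
    local fields = {}
    for toke in line:gmatch ( "%S+" ) do
        if tonumber ( toke ) == nil then
            -- text field: wrap in double quotes for the spreadsheet
            fields [ #fields + 1 ] = "\"" .. toke .. "\""
        else
            -- numeric field: copy as-is
            fields [ #fields + 1 ] = toke
        end
    end
    return table.concat ( fields, "\t" )
end

print ( toTabRow ( "E T  Sequence  1  0.865 120.003  0.476  0.003  0.003  0.994" ) )
```

Quoting the text fields should keep a spreadsheet from trying to interpret single-letter amino acid codes as anything other than text.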

The secondary structure string can be copied and pasted into SS Edit 1.2 to change the secondary structure of your protein.
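
For anyone curious what applying a secondary structure string amounts to, here is a minimal sketch. applySS is a hypothetical helper, and the setter is passed in as a parameter so the sketch runs outside Foldit. Inside Foldit, the setter would presumably be something like structure.SetSecondaryStructure, but treat that as an assumption about the Foldit API and not as a description of how SS Edit works.

```lua
-- Sketch only: applySS is a hypothetical helper, not taken from SS Edit.
-- setSS is any function taking ( segmentIndex, code ); it is injected so
-- the sketch does not depend on the Foldit API being available.
function applySS ( ssString, setSS )
    local applied = 0
    for ii = 1, ssString:len () do
        local code = ssString:sub ( ii, ii )
        -- only helix, sheet, and loop codes are passed through
        if code == "H" or code == "E" or code == "L" then
            setSS ( ii, code )
            applied = applied + 1
        end
    end
    return applied
end

-- stand-in setter that just logs what would be changed
applySS ( "LHHE", function ( seg, code )
    print ( "segment " .. seg .. " -> " .. code )
end )
```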

To use NetSurfP 1.2, run a NetSurfP prediction. NetSurfP needs the primary structure of the protein as input, as a string of one-character amino acid codes. You can use print protein 2.4 or AA Edit 1.2 to get the required primary structure string.
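
If you'd rather build the primary structure string yourself, the idea is roughly as sketched here. buildSequence is a hypothetical helper; the segment count and the per-segment lookup are passed in so the sketch runs anywhere. Inside Foldit you would presumably pass structure.GetCount () and structure.GetAminoAcid, which is an assumption about the Foldit API and not something taken from print protein or AA Edit.

```lua
-- Sketch only: buildSequence is a hypothetical helper.
-- count is the number of segments; getAA returns the one-letter
-- amino acid code for a segment index.
function buildSequence ( count, getAA )
    local seq = {}
    for ii = 1, count do
        -- uppercase one-letter codes, one per segment
        seq [ #seq + 1 ] = getAA ( ii ):upper ()
    end
    return table.concat ( seq )
end

-- stand-in lookup table in place of a real protein
local demo = { "t", "e", "e", "r", "k" }
print ( buildSequence ( #demo, function ( ii ) return demo [ ii ] end ) )
```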

Once NetSurfP completes its prediction, copy the output to the clipboard. Using NetSurfP 1.2, you can simply select the entire results page (control + a or the equivalent) and copy it.

Start the recipe, and paste the NetSurfP output into the text box on the first screen. When you click OK, the second screen displays three text boxes, one containing the spreadsheet output, one containing the secondary structure string, and one containing the confidence string. The contents of the text boxes can be copied and pasted, and they also appear in the recipe's scriptlog.

The secondary structure string is created by picking the secondary structure type with the highest probability for each segment. The picking logic is quite simple, and doesn't worry about ties or close finishes.
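
The picking logic boils down to something like the sketch below. pickSS is a hypothetical standalone name used here for illustration. The comparisons mirror the ones in the recipe, so ties aren't treated specially: loop is the default, and helix has to strictly beat both loop and sheet to win.

```lua
-- Sketch only: pickSS is a hypothetical name for the recipe's picking step.
-- Returns the Foldit secondary structure code and the winning probability.
function pickSS ( pHelix, pSheet, pLoop )
    local pred = "L"
    local prob = pLoop
    if pHelix > pLoop then
        if pHelix > pSheet then
            pred = "H"
            prob = pHelix
        else
            pred = "E"
            prob = pSheet
        end
    elseif pSheet > pLoop then
        pred = "E"
        prob = pSheet
    end
    return pred, prob
end

print ( pickSS ( 0.694, 0.003, 0.303 ) )
```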

The confidence string is simply the first digit of the probability of the winning structure prediction for each segment, so 0.994 gives confidence "9", and 0.590 gives "5".
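
As a sketch, the digit can be computed like this. confidenceDigit is a hypothetical name; the approach of multiplying by ten and keeping the first character follows the recipe, including the cap at "9" for a probability of 1.0 or more.

```lua
-- Sketch only: confidenceDigit is a hypothetical name.
-- Maps a probability to a single confidence character, "0" through "9".
function confidenceDigit ( prob )
    if prob < 1.0 then
        -- 0.994 * 10 = 9.94, first character is "9"
        return string.sub ( tostring ( prob * 10 ), 1, 1 )
    end
    return "9"    -- a probability of 1.0 or more is capped at 9
end

print ( confidenceDigit ( 0.994 ), confidenceDigit ( 0.590 ) )
```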

This recipe depends heavily on the NetSurfP output format. Any changes to NetSurfP output may require revisions to the recipe.
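
To make the format dependence concrete, here is a rough sketch of how data rows get separated from everything else on the page. dataLines and MIN_ROW_LEN are hypothetical names. The two tests, a "#" anywhere in the line and a minimum length of 78 characters, are the ones the recipe uses. The line-splitting pattern is a simplified stand-in.

```lua
-- Sketch only: dataLines and MIN_ROW_LEN are hypothetical names.
-- 78 is the length of a data row in the current NetSurfP output.
MIN_ROW_LEN = 78

function dataLines ( text )
    local rows = {}
    -- the appended newline terminates a last line that lacks one;
    -- CRLF pairs produce empty lines, which fail the length test
    for line in ( text .. "\n" ):gmatch ( "([^\r\n]*)[\r\n]" ) do
        if line:len () >= MIN_ROW_LEN and not line:find ( "#", 1, true ) then
            rows [ #rows + 1 ] = line
        end
    end
    return rows
end
```

If NetSurfP ever shortens its rows or adds columns, the length test and the column positions for helix, sheet, and loop (8, 9, and 10) are the places to look first.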

Sample NetSurfP output:

# For publication of results, please cite:
# A generic method for assignment of reliability scores applied to solvent accessibility predictions.
# Bent Petersen, Thomas Nordahl Petersen, Pernille Andersen, Morten Nielsen and Claus Lundegaard
# BMC Structural Biology 2009, 9:51 doi:10.1186/1472-6807-9-51
#
# Column 1: Class assignment - B for buried or E for Exposed - Threshold: 25% exposure, but not based on RSA
# Column 2: Amino acid
# Column 3: Sequence name
# Column 4: Amino acid number
# Column 5: Relative Surface Accessibility - RSA
# Column 6: Absolute Surface Accessibility
# Column 7: Z-fit score for RSA prediction
# Column 8: Probability for Alpha-Helix
# Column 9: Probability for Beta-strand
# Column 10: Probability for Coil
E T  Sequence               1    0.865 120.003   0.476   0.003   0.003   0.994
E E  Sequence               2    0.758 132.423   0.324   0.694   0.003   0.303
E E  Sequence               3    0.741 129.488   0.588   0.782   0.003   0.216
E R  Sequence               4    0.409  93.707   0.281   0.858   0.002   0.139
E K  Sequence               5    0.380  78.063   0.459   0.923   0.002   0.076
E K  Sequence               6    0.597 122.844   1.114   0.938   0.007   0.055
E E  Sequence               7    0.609 106.340   1.316   0.970   0.001   0.030
B I  Sequence               8    0.066  12.284  -0.022   0.970   0.001   0.030
E Q  Sequence               9    0.436  77.941   0.867   0.970   0.001   0.030
E K  Sequence              10    0.613 126.012   1.216   0.970   0.001   0.030
...

Sample scriptlog output:

---secondary structure prediction---
LHHHHHHHHHHHHHHLLLLLLLHHHHHHHHHHHHHHLLLLLEEEELLLLEEEEEELLLLLLLLLL
---prediction confidence---
96899999999988758888558899999999987668996688758865678856798666789