
Recipe: NetSurfP 1.1

created by LociOiling

Profile


Name
NetSurfP 1.1
ID
102282
Shared with
Public
Parent
NetSurfP 1.0
Children
Created on
January 14, 2017 at 09:01 AM UTC
Updated on
January 14, 2017 at 09:01 AM UTC
Description

Convert NetSurfP webpage output into secondary structure prediction and copy-and-paste spreadsheet format. V1.1 handles blank lines gracefully and adds a confidence prediction.

Best for


Code


--[[

    NetSurfP - convert NetSurfP format

    NetSurfP, www.cbs.dtu.dk/services/NetSurfP/, takes a primary
    sequence and outputs predicted surface accessibility and
    secondary structure.

    "Surface accessibility" seems to be more or less the inverse
    of what's called "predicted residue burial" in Foldit.
    It's the chances that a given residue will be on the outside
    of a protein.

    NetSurfP outputs most of its values as probabilities, and
    it uses a columnar format. Unfortunately, the formatting
    used on the output page does not lend itself to being
    copied and pasted into a spreadsheet.

    This recipe converts the columnar format of NetSurfP output
    to a tab-delimited format which can be copied and pasted
    into a spreadsheet.

    The recipe also attempts to create a Foldit secondary
    structure string from the NetSurfP probabilities.

    version 1.0 -- 2017/01/08 -- LociOiling
     * new recipe

    version 1.1 -- 2017/01/14 -- LociOiling
     * ignore blank lines in input
     * add confidence prediction

]]--

--
--  Globals
--
Recipe = "NetSurfP"
Version = "1.1"
ReVersion = Recipe .. " v." .. Version
--
--  end of globals section
--

function NSPReader ( nspentry )
    local linecnt = 0
    local comments = 0
    local unknown = 0
    local louts = ""    -- whole shebang in spreadsheet format
    local ssp = ""      -- ss prediction
    local ssc = ""      -- ss confidence
--
--  manifest constants for column positions,
--  change these if NetSurfP format changes
--
    local PHELIX = 8
    local PSHEET = 9
    local PLOOP = 10
--
--  column header, also needs attention in input format changes
--
    local cHead = "\"burial\"\t\"aa\"\t\"seqnam\"\t\"segnum\"\t\"rsa\"\t\"absaccess\"\t\"zFit\"\t\"pHelix\"\t\"pSheet\"\t\"pLoop\"\n"
--
--  split on CR or LF only ("*" inside a character class is a literal),
--  and append a newline so a final unterminated line is not dropped
--
    for line in ( nspentry .. "\n" ):gmatch ( "(.-)[\r\n]" ) do
        if line ~= nil and line:len () > 0 then
            local pHelix = 0
            local pSheet = 0
            local pLoop = 0
            if line:match ( "#" ) then
                comments = comments + 1
            else
                local lout = ""
                local col = 0
                for toke in line:gmatch ( "[%S]+" ) do
                    if lout:len () > 0 then
                        lout = lout .. "\t"
                    end
                    if tonumber ( toke ) == nil then
                        lout = lout .. "\"" .. toke .. "\""
                    else
                        lout = lout .. toke
                    end
                    col = col + 1
--
--  convert to numbers so the comparisons below are numeric,
--  not string or mixed string/number comparisons
--
                    if col == PHELIX then
                        pHelix = tonumber ( toke ) or 0
                    elseif col == PSHEET then
                        pSheet = tonumber ( toke ) or 0
                    elseif col == PLOOP then
                        pLoop = tonumber ( toke ) or 0
                    end
                end
                if col > 0 then
                    lout = lout .. "\n"
                    if louts:len () == 0 then
                        louts = louts .. cHead
                    end
                    louts = louts .. lout
--
--  pick highest probability for secondary structure
--
                    local pred = "L"
                    local prob = pLoop
                    if pHelix > pLoop then
                        if pHelix > pSheet then
                            pred = "H"
                            prob = pHelix
                        else
                            pred = "E"
                            prob = pSheet
                        end
                    else
                        if pSheet > pLoop then
                            pred = "E"
                            prob = pSheet
                        end
                    end
                    ssp = ssp .. pred
                    if tonumber ( prob ) < 1.0 then
                        ssc = ssc .. string.sub ( tostring ( prob * 10 ), 1, 1 )
                    else
                        ssc = ssc .. "9"    -- should never occur
                    end
                end
            end
        end
        linecnt = linecnt + 1
    end
    print ( "number of lines = " .. linecnt )
    print ( "number of comments = " .. comments )
    return louts, ssp, ssc
end

function GetNetSurfP ()
    local dlog = dialog.CreateDialog ( ReVersion )
    local tab = ""
    dlog.tab0 = dialog.AddLabel ( "NetSurfP Output" )
    dlog.tab = dialog.AddTextbox ( "output", tab )
    dlog.u0 = dialog.AddLabel ( "" )
    dlog.u1 = dialog.AddLabel ( "Usage: copy the NetSurfP output for a sequence" )
    dlog.u2 = dialog.AddLabel ( "and paste into the output box" )
    dlog.w0 = dialog.AddLabel ( "" )
    dlog.ok = dialog.AddButton ( "OK" , 1 )
    dlog.exit = dialog.AddButton ( "Exit" , 0 )
    if ( dialog.Show ( dlog ) > 0 ) then
        tab = dlog.tab.value
        return tab
    end
    return ""
end

function ShowResults ( csv, ssp, ssc )
    local dlog = dialog.CreateDialog ( ReVersion )
    dlog.tab0 = dialog.AddLabel ( "NetSurfP Reformatted Output" )
    dlog.csv = dialog.AddTextbox ( "csv", csv )
    dlog.ssp = dialog.AddTextbox ( "SS pred", ssp )
    dlog.ssc = dialog.AddTextbox ( "SS conf", ssc )
--
--  each label gets its own field name, so a later label
--  does not replace an earlier one
--
    dlog.u0 = dialog.AddLabel ( "" )
    dlog.u1 = dialog.AddLabel ( "csv is \"comma separated values\" for spreadsheet" )
    dlog.u2 = dialog.AddLabel ( "SS pred is secondary structure prediction" )
    dlog.u3 = dialog.AddLabel ( "SS conf is prediction confidence, 1 low, 9 high" )
    dlog.v0 = dialog.AddLabel ( "" )
    dlog.u4 = dialog.AddLabel ( "Usage: use select all and copy, cut, or paste" )
    dlog.u5 = dialog.AddLabel ( "to save or change secondary structure" )
    dlog.w0 = dialog.AddLabel ( "" )
    dlog.w1 = dialog.AddLabel ( "Windows: ctrl + a = select all" )
    dlog.w2 = dialog.AddLabel ( "Windows: ctrl + x = cut" )
    dlog.w3 = dialog.AddLabel ( "Windows: ctrl + c = copy" )
    dlog.w4 = dialog.AddLabel ( "Windows: ctrl + v = paste" )
    dlog.z0 = dialog.AddLabel ( "" )
    dlog.ok = dialog.AddButton ( "OK" , 1 )
    dialog.Show ( dlog )
end

function main ()
    print ( ReVersion )
    print ( "Puzzle: " .. puzzle.GetName () )
    print ( "Track: " .. ui.GetTrackName () )

    local nsp = ""
    nsp = GetNetSurfP ()
    if nsp:len () > 0 then
        local csv = ""
        local ssp = ""
        local ssc = ""
        csv, ssp, ssc = NSPReader ( nsp )
        if  csv ~= nil and csv:len () > 0
        and ssp ~= nil and ssp:len () > 0 then
            ShowResults ( csv, ssp, ssc )
            print ( "---spreadsheet format---" )
            print ( csv )
            print ( "---secondary structure prediction---" )
            print ( ssp )
            print ( "---prediction confidence---" )
            print ( ssc )
        else
            print ( "no results, input format may be wrong" )
        end
    end
    cleanup ()
end

function cleanup ( errmsg )
--
--  optionally, do not loop if cleanup causes an error
--  (any loop here is automatically terminated after a few iterations, however)
--
    if CLEANUPENTRY ~= nil then
        return
    end
    CLEANUPENTRY = true
    print ( "---" )
--
--  model 100 - print recipe name, puzzle, track, time, score, and gain
--
    local reason
    local start, stop, line, msg
    if errmsg == nil then
        reason = "complete"
    else
--
--  model 120 - civilized errmsg reporting,
--              thanks to Bruno K. and Jean-Bob
--
        start, stop, line, msg = errmsg:find ( ":(%d+):%s()" )
        if msg ~= nil then
            errmsg = errmsg:sub ( msg, #errmsg )
        end
        if errmsg:find ( "Cancelled" ) ~= nil then
            reason = "cancelled"
        else
            reason = "error"
        end
    end
    print ( ReVersion .. " " .. reason )
    print ( "Puzzle: " .. puzzle.GetName () )
    print ( "Track: " .. ui.GetTrackName () )
    if reason == "error" then
        print ( "Unexpected error detected" )
        print ( "Error line: " .. tostring ( line ) )  -- line is nil if the message had no line number
        print ( "Error: \"" .. errmsg .. "\"" )
    end
end

xpcall ( main, cleanup )

Comments


LociOiling Lv 1

NetSurfP 1.1 reformats the secondary structure prediction from NetSurfP to make it more useful to Foldit players.

NetSurfP 1.1 adds secondary structure prediction confidence to the output of NetSurfP 1.0. For each segment in the input, the confidence ranges from 0 to 9, with 0 being low confidence. The confidence is simply the first decimal digit of the probability of the winning structure prediction for that segment, so 0.994 gives confidence "9", and 0.590 gives "5".

Sample scriptlog output:

---secondary structure prediction---
LHHHHHHHHHHHHHHLLLLLLLHHHHHHHHHHHHHHLLLLLEEEELLLLEEEEEELLLLLLLLLL
---prediction confidence---
96899999999988758888558899999999987668996688758865678856798666789
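
The confidence rule described above can be sketched in a few lines of Lua. ConfidenceDigit is a hypothetical helper name used only for illustration; the recipe applies the same rule inline rather than through a separate function, and returning "?" for non-numeric input is an assumption made for this sketch.

```lua
-- Sketch only: ConfidenceDigit is a hypothetical helper, not part of the recipe.
-- It mirrors the recipe's rule: take the first character of (probability * 10).
function ConfidenceDigit ( prob )
    prob = tonumber ( prob )
    if prob == nil then
        return "?"      -- assumption: flag input that is not a number
    end
    if prob < 1.0 then
        return string.sub ( tostring ( prob * 10 ), 1, 1 )
    end
    return "9"          -- a probability of 1.0 or more is capped at 9
end

print ( ConfidenceDigit ( "0.994" ) )   -- prints 9
print ( ConfidenceDigit ( "0.590" ) )   -- prints 5
```

Since the winning probability is the largest of three values, low digits should be uncommon in practice.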

NetSurfP 1.1 also ignores blank lines in the input. These blank lines may occur when copying from the NetSurfP output page.
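
As a rough illustration of skipping blank lines, the sketch below splits pasted text into lines and keeps only those with visible content. SplitLines is a hypothetical helper written for this example, and the sample text is made up; the recipe does its own line handling inside NSPReader.

```lua
-- Sketch only: SplitLines is a hypothetical helper, not the recipe's own code.
function SplitLines ( text )
    local lines = {}
    -- append a newline so a final line without one is still seen;
    -- "\r?\n" accepts both Windows and Unix line endings
    for line in ( text .. "\n" ):gmatch ( "(.-)\r?\n" ) do
        if line:match ( "%S" ) then     -- skip empty and whitespace-only lines
            lines [ #lines + 1 ] = line
        end
    end
    return lines
end

-- made-up sample with blank lines and mixed line endings
local sample = "first\n\n\nsecond\r\n\nthird"
print ( #SplitLines ( sample ) )    -- prints 3
```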

See NetSurfP 1.0 for usage.