Making AlphaFold's sequence_data files more useful

Started by jeff101

jeff101 Lv 1

I noticed that when I run an AlphaFold calculation, files about the calculation get stored on my computer in a folder called sequence_data. This folder seems to contain many subfolders, one for each AlphaFold calculation. The subfolders have opaque names like 29ba81c431f732ab2e5860e23b0fa94c that don't mean much to me. Each one does, however, have a date modified, and these dates give some clue about which puzzle each AlphaFold calculation was done in. For one of my Foldit clients, the folder dates range from Dec 18 2022 to Jan 24 2023.

Inside each folder, like the one called 29ba81c431f732ab2e5860e23b0fa94c, there seem to be 3 files: 1.alphafold.distogram, 1.alphafold.metadata, and 1.alphafold.pdb. Each of these 1.alphafold.* files contains the amino acid sequence for its AlphaFold calculation. Two of them list the sequence as an easy-to-use one-line string of single-letter amino acid codes, but the *.pdb file lists it in a long column, often hundreds of lines long. Also, the *.metadata file lists the variable "lddt_confidence", which I suspect is the % confidence AlphaFold gave for the sequence listed in that file.
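The layout described above can be inventoried from outside Foldit with a short script. What follows is only a minimal sketch: it assumes a sequence_data folder whose subfolders each hold the three 1.alphafold.* files named above, and the function name list_calculations is a made-up placeholder. Where sequence_data actually lives depends on the Foldit install, so the path has to be supplied by whoever runs it.

```python
from datetime import datetime
from pathlib import Path

# File names as observed in this post; other clients or versions may differ.
EXPECTED_FILES = (
    "1.alphafold.distogram",
    "1.alphafold.metadata",
    "1.alphafold.pdb",
)


def list_calculations(sequence_data_dir):
    """Return (folder name, date modified, missing files) per subfolder, oldest first."""
    rows = []
    for folder in Path(sequence_data_dir).iterdir():
        if not folder.is_dir():
            continue  # skip any stray files sitting next to the subfolders
        modified = datetime.fromtimestamp(folder.stat().st_mtime)
        missing = [name for name in EXPECTED_FILES
                   if not (folder / name).is_file()]
        rows.append((folder.name, modified, missing))
    rows.sort(key=lambda row: row[1])  # sort by date modified
    return rows
```

Calling list_calculations with the path to a sequence_data folder gives one row per calculation, sorted by date modified, which is currently the only hint about which puzzle each folder belongs to.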

Below are some suggestions that I think will make these files more useful:

In each file, using that file type's format for comments, list the date the calculation was done, the name and number of the puzzle it was done in, the name/title we saved for the sequence/calculation (as shown in the AlphaFold menu in the Foldit client), the amino acid sequence used (as a single-line string of one-letter amino acid codes), and the confidence score AlphaFold gave for it. With all this, we could use the folders and files in sequence_data to keep track of results from previous puzzles. It would also save us some bookkeeping and make it easier to find, for example, the confidence score or sequence for a particular calculation (if we gave it a meaningful title in the AlphaFold menu in the Foldit client). I can imagine folks writing or using external, non-Foldit programs to search or examine the contents of the sequence_data folder.
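An external search program of the kind imagined above could look roughly like the sketch below. It rests on assumptions that this post cannot confirm: that 1.alphafold.metadata is plain text, that it contains the sequence as one unbroken string, and that the name lddt_confidence is followed on the same line by a number. The regular expression is therefore a guess that may need adjusting, and find_sequence and read_confidence are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: "lddt_confidence" is followed on the same line by a number,
# possibly after quotes, a colon, or an equals sign. The real format may differ.
CONFIDENCE_RE = re.compile(r"lddt_confidence[^0-9\n+-]*([-+]?\d+(?:\.\d+)?)")


def read_confidence(text):
    """Return the first lddt_confidence value found in text, or None."""
    match = CONFIDENCE_RE.search(text)
    return float(match.group(1)) if match else None


def find_sequence(sequence_data_dir, sequence):
    """Return (folder name, confidence) for folders whose metadata contains sequence."""
    hits = []
    pattern = "*/1.alphafold.metadata"
    for meta in sorted(Path(sequence_data_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes in it
        text = meta.read_text(errors="replace")
        if sequence in text:
            hits.append((meta.parent.name, read_confidence(text)))
    return hits
```

Because the folder names are opaque, a search like this can only match on the sequence itself. If the files also carried the puzzle name and the saved title as suggested, the same loop could match on those instead.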