Upload 20 files
- .gitattributes +10 -0
- Tesei-trained_Model/.DS_Store +0 -0
- Tesei-trained_Model/OBfmt_5-1500.npy +3 -0
- Tesei-trained_Model/README.txt +34 -0
- Tesei-trained_Model/Tesei_w2_Ree_preds.csv +0 -0
- Tesei-trained_Model/exper_seqs_master.csv +65 -0
- Tesei-trained_Model/exper_seqs_w2preds.csv +65 -0
- Tesei-trained_Model/extract_Rg.py +75 -0
- Tesei-trained_Model/extract_w2.py +123 -0
- Tesei-trained_Model/theory_functions.py +303 -0
- Tesei-trained_Model/weights/.DS_Store +0 -0
- Tesei-trained_Model/weights/weights_0.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_1.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_2.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_3.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_4.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_5.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_6.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_7.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_8.best.hdf5 +3 -0
- Tesei-trained_Model/weights/weights_9.best.hdf5 +3 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
Tesei-trained_Model/weights/weights_0.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
Tesei-trained_Model/weights/weights_1.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
Tesei-trained_Model/weights/weights_2.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
Tesei-trained_Model/weights/weights_3.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
Tesei-trained_Model/weights/weights_4.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 41 |
+
Tesei-trained_Model/weights/weights_5.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 42 |
+
Tesei-trained_Model/weights/weights_6.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 43 |
+
Tesei-trained_Model/weights/weights_7.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 44 |
+
Tesei-trained_Model/weights/weights_8.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
| 45 |
+
Tesei-trained_Model/weights/weights_9.best.hdf5 filter=lfs diff=lfs merge=lfs -text
|
Tesei-trained_Model/.DS_Store
ADDED
|
Binary file (6.15 kB).
|
|
|
Tesei-trained_Model/OBfmt_5-1500.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c157256540326ba182c0a7e5fb94995b222e1757baa0b0526d2a91c060ee50ed
|
| 3 |
+
size 36032
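The three lines above are a Git LFS pointer, not the array itself: they record the spec version, the SHA-256 digest, and the byte size of the real file. A minimal sketch of how a downloaded copy could be checked against such a pointer follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers for illustration and are not part of this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and SHA-256 digest in the pointer."""
    data = Path(path).read_bytes()
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c157256540326ba182c0a7e5fb94995b222e1757baa0b0526d2a91c060ee50ed
size 36032
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

Assuming the real file has been fetched to a local path, `matches_pointer("Tesei-trained_Model/OBfmt_5-1500.npy", pointer)` would report whether it matches the recorded size and digest.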
|
Tesei-trained_Model/README.txt
ADDED
|
@@ -0,0 +1,34 @@
|
| 1 |
+
This readme file was generated on 2024-07-23 by Lilianna Houston
|
| 2 |
+
|
| 3 |
+
GENERAL INFORMATION
|
| 4 |
+
|
| 5 |
+
Title of Project: PML, Tesei-trained Model
|
| 6 |
+
|
| 7 |
+
Principal Investigator Information
|
| 8 |
+
Name: Kingshuk Ghosh
|
| 9 |
+
Institution: University of Denver
|
| 10 |
+
Email: kingshuk.ghosh@du.edu
|
| 11 |
+
|
| 12 |
+
Author Information
|
| 13 |
+
Name: Lilianna Houston
|
| 14 |
+
Institution: University of Denver
|
| 15 |
+
Email: lili.houston@du.edu
|
| 16 |
+
|
| 17 |
+
DATA & FILE OVERVIEW
|
| 18 |
+
|
| 19 |
+
File List:
|
| 20 |
+
|
| 21 |
+
"weights" -> Folder containing weights from the Tesei-trained CNN that predicts omega_2 from sequence. We trained the model 10 separate times on all omega_2 calculations from the Tesei 2023 dataset and provide all 10 resulting weights.
|
| 22 |
+
|
| 23 |
+
"Tesei_w2_Ree_preds" -> CSV containing calculated and ML-predicted omega_2 (w2) (predicted using 10-fold cross-validation), as well as reported and predicted R_ee for the Tesei 2023 dataset. Sequences where the omega_2 calculation failed are omitted.
|
| 24 |
+
|
| 25 |
+
"exper_seqs_master" -> CSV of our compiled experimental sequences, including source, sequence, salt, pH, temperature, reported R_g, and our predicted R_g (our value is averaged across the results of all 10 trained models). This is used as the input file for extract_w2.py. [Use a different CSV if you want to use a different sequence or set of sequences.]
|
| 26 |
+
|
| 27 |
+
"extract_w2" -> .py file that extracts the omega_2s of a specified list of sequences using a specified set of weights. If you change the input file, make sure the script points to the new file, and also change the output file name at the end of the code.
|
| 28 |
+
|
| 29 |
+
"exper_seqs_w2preds" -> CSV file. Same content as "exper_seqs_master," with the addition of predicted w2s using weights_0 from the "weights" folder. This is used as the input for extract_Rg.
|
| 30 |
+
|
| 31 |
+
"extract_Rg" -> .py file that extracts the x, R_ee, and R_gs of a specified list of sequences using omega_2. Currently w2 is read from exper_seqs_w2preds; use a different file if extract_w2 wrote its output elsewhere.
|
| 32 |
+
|
| 33 |
+
"OBfmt_5-1500.npy" -> Helper file for "extract_Rg" containing precalculated terms.
|
| 34 |
+
"theory_functions" -> .py helper file for "extract_Rg" containing constants and functions needed for R_g calculation.
|
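The README states that the predicted R_g is averaged across the results of all 10 trained models. The sketch below shows one way such an element-wise average could be taken once per-model predictions are available. The function `ensemble_average` and the toy numbers are hypothetical; the actual loading and prediction code lives in extract_w2.py and extract_Rg.py, which are not reproduced here.

```python
from statistics import mean

# File names follow the "weights" folder listed in this commit.
WEIGHT_FILES = [f"weights/weights_{i}.best.hdf5" for i in range(10)]


def ensemble_average(per_model_preds):
    """Average predictions element-wise across models.

    per_model_preds holds one list per model; each list has one value per
    sequence, and all lists use the same sequence order.
    """
    n_seqs = len(per_model_preds[0])
    if any(len(preds) != n_seqs for preds in per_model_preds):
        raise ValueError("every model must predict the same number of sequences")
    # zip(*...) groups the i-th prediction of every model together.
    return [mean(values) for values in zip(*per_model_preds)]


# Hypothetical numbers for two models and three sequences, for illustration only.
toy_preds = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
averaged = ensemble_average(toy_preds)
```

In practice `per_model_preds` would hold ten lists, one for each file in `WEIGHT_FILES`, produced by running the pipeline once per set of weights.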
Tesei-trained_Model/Tesei_w2_Ree_preds.csv
ADDED
|
The diff for this file is too large to render.
|
|
|
Tesei-trained_Model/exper_seqs_master.csv
ADDED
|
@@ -0,0 +1,65 @@
|
| 1 |
+
protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred
|
| 2 |
+
hnRNPA1,WT,Martin,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,50,7,23,2.72,2.77954575
|
| 3 |
+
hnRNPA2,Aro-,Martin,GSMAFASSFQRGRYGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSYGGGQYFAKPRNQGGYGGSSFSSSYGSGRRF,137,50,7,23,2.89,2.694002975
|
| 4 |
+
hnRNPA3,Aro--,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNFGRGGNSSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNSGGGGSSNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGYGGSSSSSSSGSGRRF,137,50,7,23,3.01,2.886092558
|
| 5 |
+
hnRNPA4,Aro+,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNSGRGGNSSGRGGFGGSRGGGGSGGSGDGYNGSGNDGSNSGGGGSSNDFGNSNNQSSNSGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGSGGSSSSSSSGSGRRS,137,50,7,23,2.44,3.011271692
|
| 6 |
+
hnRNPA1_(A1-LCD),WT-NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.76,2.667853665
|
| 7 |
+
hnRNPA1_(A1-LCD),WT+NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.583,2.665065329
|
| 8 |
+
hnRNPA1_(A1-LCD),-12F+12Y,Bremer,GSMASASSSQRGRSGSGNYGGGRGGGYGGNDNYGRGGNYSGRGGYGGSRGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGRSSGGSGGGGQYYAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7,25,2.604,2.593430947
|
| 9 |
+
hnRNPA1_(A1-LCD),+7F-7Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGFGGSGDGFNGFGNDGSNFGGGGSFNDFGNFNNQSSNFGPMKGGNFGGRSSGGSGGGGQFFAKPRNQGGFGGSSSSSSFGSGRRF,137,150,7,25,2.718,2.72649201
|
| 10 |
+
hnRNPA1_(A1-LCD),-9F+6Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNYGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7,25,2.655,2.67330763
|
| 11 |
+
hnRNPA1_(A1-LCD),-8F+4Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.707,2.69069985
|
| 12 |
+
hnRNPA1_(A1-LCD),-9F+3Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNGNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRS,137,150,7,25,2.683,2.720835079
|
| 13 |
+
hnRNPA1_(A1-LCD),-10R,Bremer,GSMASASSSQGGSSGSGNFGGGGGGGFGGNDNFGGGGNFSGSGGFGGSGGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGGF,137,150,7,25,2.671,2.698054512
|
| 14 |
+
hnRNPA1_(A1-LCD),-6R,Bremer,GSMASASSSQGGRSGSGNFGGGRGGGFGGNDNFGGGGNFSGSGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGRF,137,150,7,25,2.573,2.647302437
|
| 15 |
+
hnRNPA1_(A1-LCD),+7R,Bremer,GSMASASSSQRGRSGRGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGRYGGSGDRYNGFGNDGRNFGGGGSYNDFGNYNNQSSNFGPMKGGNFRGRSSGPYGRGGQYFAKPRNQGGYGGSSSSRSYGSGRRF,137,150,7,25,2.709,2.8897102
|
| 16 |
+
hnRNPA1_(A1-LCD),-3R+3K,Bremer,GSMASASSSQRGKSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7,25,2.634,2.734381901
|
| 17 |
+
hnRNPA1_(A1-LCD),-6R+6K,Bremer,GSMASASSSQKGKSGSGNFGGGRGGGFGGNDNFGKGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7,25,2.787,2.794997238
|
| 18 |
+
hnRNPA1_(A1-LCD),-4D,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNGNFGRGGNFSGRGGFGGSRGGGGYGGSGGGYNGFGNSGSNFGGGGSYNGFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.642,2.808935655
|
| 19 |
+
hnRNPA1_(A1-LCD),+4D,Bremer,GSMASASSSQRDRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGDFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQGGYGGSSSSSSYDSGRRF,137,150,7,25,2.718,2.660307676
|
| 20 |
+
hnRNPA1_(A1-LCD),+8D,Bremer,GSMASASSSQRDRSGSGNFGGGRDGGFGGNDNFGRGDNFSGRGDFGGSRDGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQDGYGGSSSSSSYDSGRRF,137,150,7,25,2.685,2.700351745
|
| 21 |
+
hnRNPA1_(A1-LCD),+12D,Bremer,GSMASADSSQRDRDDSGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGGYGGDGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFDPMKGGNFGDRSSGPYDGGGQYFAKPRNQGGYGGSSSSSSYGSDRRF,137,150,7,25,2.801,2.73413
|
| 22 |
+
hnRNPA1_(A1-LCD),+12E,Bremer,GSMASAESSQREREESGNFGEGRGGGFGGNDNFGRGGNFSERGGFGGSRGEGGYGGEGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFEPMKGGNFGERSSGPYEGGGQYFAKPRNQGGYGGSSSSSSYGSERRF,137,150,7,25,2.852,2.765836221
|
| 23 |
+
hnRNPA1_(A1-LCD),+7K+12D,Bremer,GSMASADSSQRDRDDKGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGKYGGDGDKYNGFGNDGKNFGGGGSYNDFGNYNNQSSNFDPMKGGNFKDRSSGPYDKGGQYFAKPRNQGGYGGSSSSKSYGSDRRF,137,150,7,25,2.921,2.864720166
|
| 24 |
+
hnRNPA1_(A1-LCD),+7K+12D_blocky,Bremer,GSMASAKSSQRDRDDDGNFGKGRGGGFGGNKNFGRGGNFSKRGGFGGSRGKGKYGGKGDDYNGFGNDGDNFGGGGSYNDFGNYNNQSSNFDPMDGGNFDDRSSGPYDDGGQYFADPRNQGGYGGSSSSKSYGSKRRF,137,150,7,25,2.562,2.595524581
|
| 25 |
+
hnRNPA1_(A1-LCD),+2R,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFRNDGSNFGGGGRYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.623,2.74310832
|
| 26 |
+
hnRNPA1_(A1-LCD),-10R+10K,Bremer,GSMASASSSQKGKSGSGNFGGGKGGGFGGNDNFGKGGNFSGKGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPKNQGGYGGSSSSSSYGSGKKF,137,150,7,25,2.849,2.883381638
|
| 27 |
+
hnRNPA1_(A1-LCD),-12F+12Y-10R,Bremer,GSMASASSSQGGSSGSGNYGGGGGGGYGGNDNYGGGGNYSGSGGYGGSGGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGSSSGPYGGGGQYYAKPGNQGGYGGSSSSSSYGSGGGY,137,150,7,25,2.607,2.624120722
|
| 28 |
+
hnRNPA1_(A1-LCD),-10F+7R+12D,Bremer,GSMASADSSQRDRDDRGNFGDGRGGGGGGNDNFGRGGNGSDRGGGGGSRGDGRYGGDGDRYNGGGNDGRNGGGGGSYNDGGNYNNQSSNGDPMKGGNGRDRSSGPYDRGGQYGAKPRNQGGYGGSSSSRSYGSDRRG,137,150,7,25,2.86,2.808875725
|
| 29 |
+
pNT,pNT,Riback,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGDALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.000444014
|
| 30 |
+
fHua,fHua,Riback,ESAWGPAATIAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTYDHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGGLLNMVSKRPTTEPL,143,150,7.5,25,3.34,3.288455309
|
| 31 |
+
RNasea,RNasea,Riback,KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKDGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV,124,150,7.5,25,3.36,3.182804126
|
| 32 |
+
tau,ht40,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,441,150,7.4,15,6.5,6.181000307
|
| 33 |
+
tau,K32,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,197,150,7.4,15,4.2,4.320015395
|
| 34 |
+
tau,K16,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,175,150,7.4,15,3.9,4.124442757
|
| 35 |
+
tau,K18,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,129,150,7.4,15,3.8,3.461192152
|
| 36 |
+
tau,ht23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,352,150,7.4,15,5.3,5.717470647
|
| 37 |
+
tau,K27,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,165,150,7.4,15,3.7,3.957930324
|
| 38 |
+
tau,K17,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,143,150,7.4,15,3.6,3.700929003
|
| 39 |
+
tau,K19,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,98,150,7.4,15,3.5,2.955708005
|
| 40 |
+
tau,K44,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,283,150,7.4,15,5.2,5.072666156
|
| 41 |
+
tau,K10,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,167,150,7.4,15,4,3.865354564
|
| 42 |
+
tau,K25,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRL,185,150,7.4,15,4.1,3.914059813
|
| 43 |
+
tau,K23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLTHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,254,150,7.4,15,4.9,4.567234787
|
| 44 |
+
ACTR,ACTR,Kjaergaard,EQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQAL,71,200,7.4,5,2.63,2.348116852
|
| 45 |
+
hNHE1cdt,hNHE1cdt,Kjaergaard,INNYLTVPAHKLDSPTMSRARIGSDPLAYEPKEDLPVITIDPASPQSPESVDLVNEELKGKVLGLSRDPAKVAEEDEDDDGGIMMRSKETSSPGTDDVFTPAPSDSPSSQRIQRCLSDPGPHPEPGEGEPFFPKGQ,136,200,7.4,5,3.75,3.309963467
|
| 46 |
+
sic1,sic1,Gomes,GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT,92,200,7.5,20,3,2.753155449
|
| 47 |
+
p15PAF,p15PAF,De Biasio,MVRTKADSVPGTYRKVVAARAPRKVLGSSTSATNSTSVSSRKAENKYAGGNPVCVRPTPKWQKGIGEFFRLSPKDSEKENQIPEEAGSSGLGKAKRKACPLQPDHTNDEKE,111,150,7,25,2.81,2.883730178
|
| 48 |
+
alphaSyn,alphaSyn,Ahmed,MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA,140,200,7.4,20,3.55,3.650767716
|
| 49 |
+
OPN220,OPN220,Platzer,MHQDHVDSQSQEHLQQTQNDLASLQQTHYSSEENADVPEQPDFPDVPSKSQETVDDDDDDDNDSNDTDESDEVFTDFPTEAPVAPFNRGDNAGRGDSVAYGFRAKAHVVKASKIRKAARKLIEDDATTEDGDSQPAGLWWPKESREQNSRELPQHQSVENDSRPKFDSREVDGGDSKASAGVDSRESQGSVPAVDASNQTLESAEDAEDRHSIENNEVTR,220,150,6.5,25,5.13,4.196087874
|
| 50 |
+
IN,IN,Hofmann,GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,60,100,7,22,2.221,2.212767227
|
| 51 |
+
R15,R15,Hofmann,KLKEANKQQNFNTGIKDFDFWLSEVEALLASEDYGKDLCSVNNLLKKHQLLEADISAHEDRLKDLNSQADSLMTSSAFDTSQVKDKRETINGRFQRIKCMAAARRAKLNESHRL,114,100,7,22,2.612,2.86574043
|
| 52 |
+
ProTα-N,ProTa-N,Hofmann,CDAAVDTSSEITTKDLKEKKEVVEEAENGRDAPANGNANEENGEQEADNEVDEEC,55,100,7,22,2.549,2.519264356
|
| 53 |
+
ProTα-C,ProTa-C,Hofmann,CEEGGEEEEEEEEGDGEEEDGDEDEEAESATGKRAAEDDEDDDVDTKKQKTDEDC,55,100,7,22,2.998,3.3869979
|
| 54 |
+
hCyp,hCyp,Hofmann,GPMCNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSSFHRIIPGFMSQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFISTAKTEFLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADSGQLC,167,75,7,22,2.534,3.455838139
|
| 55 |
+
ACTR,ACTR,Borgia,GPSGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQALEPKQDSGGPR,79,100,7,25,2.51,2.49971119
|
| 56 |
+
R17d,R17d,Borgia,GSRLEESLEYQQFVANVEEEEAWINEKMTLVASEDYGDTLAAIQGLLKKHEAFETDFTVHKDRVNDVAANGEDLIKKNNHHVENITAKMKGAKGKVSDLEKAAAQRKAKLDENSAFLQ,118,100,7,25,2.817,2.964929957
|
| 57 |
+
sh4ud,LL,Shrestha,MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGSAWSHPQFEK,95,200,8,25,2.71,2.652345841
|
| 58 |
+
colNT,LL,Johnson,MGSNGADNAHNNAFGGGKNPGIGNTSGAGSNGSASSNRGNSNGWSWSNKPHKNDGFHSDGSYHITFHGDNNSKPKPGGNSGNRGNNGDGASSHHHHHH,98,400,7.6,25,2.83,2.617435651
|
| 59 |
+
PNt,PNt,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.002265334
|
| 60 |
+
PNt,swap1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSDDGIRRFLGTVTVLAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVLGGLHIGALQSLQPEDDPPSRVVLRDTNVTAVPASGAPAAVSVLGASLLTLDGGHITGGRAAGVAAMQGAVVHEQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.92,4.998767173
|
| 61 |
+
PNt,swap3,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSTDGTRRFLGDVIVKAGLLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVDVLRLAIVDGGLHIGALQSQQPETSPPSRVVLRDTNVTAVPASGAPAAVSVQGASEQTLDGGAITGGRAAGVAAMLGHVVHLLRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.058,5.024235918
|
| 62 |
+
PNt,swap4,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRDLGRDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVRVQREAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLRLDGGHIHEGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.337,5.080732149
|
| 63 |
+
PNt,swap4.1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRRLGDDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVEVQRRAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLELDGGHIHRGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.445,5.005591773
|
| 64 |
+
PNt,swap5,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIEDFLGTVTVDAGELVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIEDGANVTVQESAIVDGGLHIGALQSLQPRRLPPSRVVLRKTNVTAVPASGAPAAVSVLGASKLTLRGGHITGGRAAGVAAMQGAVVHLQRATIRRGRALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.871,4.760132157
|
| 65 |
+
PNt,swap6,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDRGIDRFLGTVTVEAGKLVADHATLANVGDTWDKDGIALYVAGRQAQASIADSTLQGAGGVQIREGANVTVQRSAIVDGGLHIGALQSLQPERLPPSDVVLRDTNVTAVPASGAPAAVSVLGASRLTLDGGHITGGDAAGVAAMQGAVVHLQRATIERGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.261,5.117488247
|
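Each row of exper_seqs_master.csv pairs a reported R_g (`Rg[nm]`) with the model-averaged prediction (`Rg_av_pred`). A minimal sketch of how the two columns could be compared with the standard library follows. The helper names are hypothetical, and the inline sample reuses the numeric values of the first two data rows with the sequence replaced by a placeholder string to keep the example short.

```python
import csv
import io


def rg_errors(csv_file):
    """Yield (protein, varient_name, predicted minus reported R_g in nm) per row."""
    for row in csv.DictReader(csv_file):
        reported = float(row["Rg[nm]"])
        predicted = float(row["Rg_av_pred"])
        yield row["protein"], row["varient_name"], predicted - reported


def mean_absolute_error(errors):
    """Mean of |predicted - reported| over all rows."""
    errors = list(errors)
    return sum(abs(err) for _, _, err in errors) / len(errors)


# Header copied from the CSV; SEQ_PLACEHOLDER stands in for the real sequences.
SAMPLE = io.StringIO(
    "protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred\n"
    "hnRNPA1,WT,Martin,SEQ_PLACEHOLDER,137,50,7,23,2.72,2.77954575\n"
    "hnRNPA2,Aro-,Martin,SEQ_PLACEHOLDER,137,50,7,23,2.89,2.694002975\n"
)
sample_mae = mean_absolute_error(rg_errors(SAMPLE))
```

To run it on the full file, open it with `open("Tesei-trained_Model/exper_seqs_master.csv", newline="")` and pass the handle to `rg_errors`; the column name `varient_name` is spelled as in the CSV header.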
Tesei-trained_Model/exper_seqs_w2preds.csv
ADDED
|
@@ -0,0 +1,65 @@
|
| 1 |
+
protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred,w2_preds_tesei_model
|
| 2 |
+
hnRNPA1,WT,Martin,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,50,7.0,23,2.72,2.77954575,-0.34484723
|
| 3 |
+
hnRNPA2,Aro-,Martin,GSMAFASSFQRGRYGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSYGGGQYFAKPRNQGGYGGSSFSSSYGSGRRF,137,50,7.0,23,2.89,2.694002975,-0.38453168
|
| 4 |
+
hnRNPA3,Aro--,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNFGRGGNSSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNSGGGGSSNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGYGGSSSSSSSGSGRRF,137,50,7.0,23,3.01,2.886092558,-0.28979945
|
| 5 |
+
hnRNPA4,Aro+,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNSGRGGNSSGRGGFGGSRGGGGSGGSGDGYNGSGNDGSNSGGGGSSNDFGNSNNQSSNSGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGSGGSSSSSSSGSGRRS,137,50,7.0,23,2.44,3.011271692,-0.2247001
|
| 6 |
+
hnRNPA1_(A1-LCD),WT-NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.76,2.667853665,-0.34484723
|
| 7 |
+
hnRNPA1_(A1-LCD),WT+NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.583,2.665065329,-0.34647438
|
| 8 |
+
hnRNPA1_(A1-LCD),-12F+12Y,Bremer,GSMASASSSQRGRSGSGNYGGGRGGGYGGNDNYGRGGNYSGRGGYGGSRGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGRSSGGSGGGGQYYAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7.0,25,2.604,2.593430947,-0.37725022
|
| 9 |
+
hnRNPA1_(A1-LCD),+7F-7Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGFGGSGDGFNGFGNDGSNFGGGGSFNDFGNFNNQSSNFGPMKGGNFGGRSSGGSGGGGQFFAKPRNQGGFGGSSSSSSFGSGRRF,137,150,7.0,25,2.718,2.72649201,-0.31061214
|
| 10 |
+
hnRNPA1_(A1-LCD),-9F+6Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNYGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7.0,25,2.655,2.67330763,-0.3397461
|
| 11 |
+
hnRNPA1_(A1-LCD),-8F+4Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.707,2.69069985,-0.33270133
|
| 12 |
+
hnRNPA1_(A1-LCD),-9F+3Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNGNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRS,137,150,7.0,25,2.683,2.720835079,-0.31949827
|
| 13 |
+
hnRNPA1_(A1-LCD),-10R,Bremer,GSMASASSSQGGSSGSGNFGGGGGGGFGGNDNFGGGGNFSGSGGFGGSGGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGGF,137,150,7.0,25,2.671,2.698054512,-0.25107563
|
| 14 |
+
hnRNPA1_(A1-LCD),-6R,Bremer,GSMASASSSQGGRSGSGNFGGGRGGGFGGNDNFGGGGNFSGSGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGRF,137,150,7.0,25,2.573,2.647302437,-0.2631922
|
| 15 |
+
hnRNPA1_(A1-LCD),+7R,Bremer,GSMASASSSQRGRSGRGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGRYGGSGDRYNGFGNDGRNFGGGGSYNDFGNYNNQSSNFGPMKGGNFRGRSSGPYGRGGQYFAKPRNQGGYGGSSSSRSYGSGRRF,137,150,7.0,25,2.709,2.8897102,-0.5265355
|
| 16 |
+
hnRNPA1_(A1-LCD),-3R+3K,Bremer,GSMASASSSQRGKSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7.0,25,2.634,2.734381901,-0.31496823
|
| 17 |
+
hnRNPA1_(A1-LCD),-6R+6K,Bremer,GSMASASSSQKGKSGSGNFGGGRGGGFGGNDNFGKGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7.0,25,2.787,2.794997238,-0.28669593
|
| 18 |
+
hnRNPA1_(A1-LCD),-4D,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNGNFGRGGNFSGRGGFGGSRGGGGYGGSGGGYNGFGNSGSNFGGGGSYNGFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.642,2.808935655,-0.4755267
|
| 19 |
+
hnRNPA1_(A1-LCD),+4D,Bremer,GSMASASSSQRDRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGDFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQGGYGGSSSSSSYDSGRRF,137,150,7.0,25,2.718,2.660307676,-0.25231382
|
| 20 |
+
hnRNPA1_(A1-LCD),+8D,Bremer,GSMASASSSQRDRSGSGNFGGGRDGGFGGNDNFGRGDNFSGRGDFGGSRDGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQDGYGGSSSSSSYDSGRRF,137,150,7.0,25,2.685,2.700351745,-0.20195933
|
| 21 |
+
hnRNPA1_(A1-LCD),+12D,Bremer,GSMASADSSQRDRDDSGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGGYGGDGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFDPMKGGNFGDRSSGPYDGGGQYFAKPRNQGGYGGSSSSSSYGSDRRF,137,150,7.0,25,2.801,2.73413,-0.24407695
|
| 22 |
+
hnRNPA1_(A1-LCD),+12E,Bremer,GSMASAESSQREREESGNFGEGRGGGFGGNDNFGRGGNFSERGGFGGSRGEGGYGGEGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFEPMKGGNFGERSSGPYEGGGQYFAKPRNQGGYGGSSSSSSYGSERRF,137,150,7.0,25,2.852,2.765836221,-0.23236628
|
| 23 |
+
hnRNPA1_(A1-LCD),+7K+12D,Bremer,GSMASADSSQRDRDDKGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGKYGGDGDKYNGFGNDGKNFGGGGSYNDFGNYNNQSSNFDPMKGGNFKDRSSGPYDKGGQYFAKPRNQGGYGGSSSSKSYGSDRRF,137,150,7.0,25,2.921,2.864720166,-0.118673995
|
| 24 |
+
hnRNPA1_(A1-LCD),+7K+12D_blocky,Bremer,GSMASAKSSQRDRDDDGNFGKGRGGGFGGNKNFGRGGNFSKRGGFGGSRGKGKYGGKGDDYNGFGNDGDNFGGGGSYNDFGNYNNQSSNFDPMDGGNFDDRSSGPYDDGGQYFADPRNQGGYGGSSSSKSYGSKRRF,137,150,7.0,25,2.562,2.595524581,-0.23424682
|
| 25 |
+
hnRNPA1_(A1-LCD),+2R,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFRNDGSNFGGGGRYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.623,2.74310832,-0.39167032
|
| 26 |
+
hnRNPA1_(A1-LCD),-10R+10K,Bremer,GSMASASSSQKGKSGSGNFGGGKGGGFGGNDNFGKGGNFSGKGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPKNQGGYGGSSSSSSYGSGKKF,137,150,7.0,25,2.849,2.883381638,-0.23390317
|
| 27 |
+
hnRNPA1_(A1-LCD),-12F+12Y-10R,Bremer,GSMASASSSQGGSSGSGNYGGGGGGGYGGNDNYGGGGNYSGSGGYGGSGGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGSSSGPYGGGGQYYAKPGNQGGYGGSSSSSSYGSGGGY,137,150,7.0,25,2.607,2.624120722,-0.28584272
|
| 28 |
+
hnRNPA1_(A1-LCD),-10F+7R+12D,Bremer,GSMASADSSQRDRDDRGNFGDGRGGGGGGNDNFGRGGNGSDRGGGGGSRGDGRYGGDGDRYNGGGNDGRNGGGGGSYNDGGNYNNQSSNGDPMKGGNGRDRSSGPYDRGGQYGAKPRNQGGYGGSSSSRSYGSDRRG,137,150,7.0,25,2.86,2.808875725,-0.16749647
|
| 29 |
+
pNT,pNT,Riback,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGDALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.000444014,-0.046264842
|
| 30 |
+
fHua,fHua,Riback,ESAWGPAATIAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTYDHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGGLLNMVSKRPTTEPL,143,150,7.5,25,3.34,3.288455309,0.08028786
|
| 31 |
+
RNasea,RNasea,Riback,KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKDGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV,124,150,7.5,25,3.36,3.182804126,0.16724284
|
| 32 |
+
tau,ht40,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,441,150,7.4,15,6.5,6.181000307,0.08360584
|
| 33 |
+
tau,K32,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,197,150,7.4,15,4.2,4.320015395,-0.07681902
|
| 34 |
+
tau,K16,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,175,150,7.4,15,3.9,4.124442757,-0.08814843
|
| 35 |
+
tau,K18,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,129,150,7.4,15,3.8,3.461192152,0.12808333
|
| 36 |
+
tau,ht23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,352,150,7.4,15,5.3,5.717470647,0.04805489
|
| 37 |
+
tau,K27,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,165,150,7.4,15,3.7,3.957930324,-0.06621514
|
| 38 |
+
tau,K17,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,143,150,7.4,15,3.6,3.700929003,-0.084508374
|
| 39 |
+
tau,K19,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,98,150,7.4,15,3.5,2.955708005,0.17149426
|
| 40 |
+
tau,K44,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,283,150,7.4,15,5.2,5.072666156,0.007707132
|
| 41 |
+
tau,K10,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,167,150,7.4,15,4.0,3.865354564,0.18579094
|
| 42 |
+
tau,K25,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRL,185,150,7.4,15,4.1,3.914059813,0.0866615
|
| 43 |
+
tau,K23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLTHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,254,150,7.4,15,4.9,4.567234787,0.10713528
ACTR,ACTR,Kjaergaard,EQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQAL,71,200,7.4,5,2.63,2.348116852,-0.023581075
hNHE1cdt,hNHE1cdt,Kjaergaard,INNYLTVPAHKLDSPTMSRARIGSDPLAYEPKEDLPVITIDPASPQSPESVDLVNEELKGKVLGLSRDPAKVAEEDEDDDGGIMMRSKETSSPGTDDVFTPAPSDSPSSQRIQRCLSDPGPHPEPGEGEPFFPKGQ,136,200,7.4,5,3.75,3.309963467,-0.13009122
sic1,sic1,Gomes,GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT,92,200,7.5,20,3.0,2.753155449,-0.10162979
p15PAF,p15PAF,De Biasio,MVRTKADSVPGTYRKVVAARAPRKVLGSSTSATNSTSVSSRKAENKYAGGNPVCVRPTPKWQKGIGEFFRLSPKDSEKENQIPEEAGSSGLGKAKRKACPLQPDHTNDEKE,111,150,7.0,25,2.81,2.883730178,-0.05214207
alphaSyn,alphaSyn,Ahmed,MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA,140,200,7.4,20,3.55,3.650767716,0.22115915
OPN220,OPN220,Platzer,MHQDHVDSQSQEHLQQTQNDLASLQQTHYSSEENADVPEQPDFPDVPSKSQETVDDDDDDDNDSNDTDESDEVFTDFPTEAPVAPFNRGDNAGRGDSVAYGFRAKAHVVKASKIRKAARKLIEDDATTEDGDSQPAGLWWPKESREQNSRELPQHQSVENDSRPKFDSREVDGGDSKASAGVDSRESQGSVPAVDASNQTLESAEDAEDRHSIENNEVTR,220,150,6.5,25,5.13,4.196087874,-0.52987176
IN,IN,Hofmann,GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,60,100,7.0,22,2.221,2.212767227,0.19296344
R15,R15,Hofmann,KLKEANKQQNFNTGIKDFDFWLSEVEALLASEDYGKDLCSVNNLLKKHQLLEADISAHEDRLKDLNSQADSLMTSSAFDTSQVKDKRETINGRFQRIKCMAAARRAKLNESHRL,114,100,7.0,22,2.612,2.86574043,0.15593176
ProTα-N,ProTa-N,Hofmann,CDAAVDTSSEITTKDLKEKKEVVEEAENGRDAPANGNANEENGEQEADNEVDEEC,55,100,7.0,22,2.549,2.519264356,-0.5270907
ProTα-C,ProTa-C,Hofmann,CEEGGEEEEEEEEGDGEEEDGDEDEEAESATGKRAAEDDEDDDVDTKKQKTDEDC,55,100,7.0,22,2.998,3.3869979,-2.1427064
hCyp,hCyp,Hofmann,GPMCNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSSFHRIIPGFMSQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFISTAKTEFLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADSGQLC,167,75,7.0,22,2.534,3.455838139,0.0024694633
ACTR,ACTR,Borgia,GPSGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQALEPKQDSGGPR,79,100,7.0,25,2.51,2.49971119,0.03898625
R17d,R17d,Borgia,GSRLEESLEYQQFVANVEEEEAWINEKMTLVASEDYGDTLAAIQGLLKKHEAFETDFTVHKDRVNDVAANGEDLIKKNNHHVENITAKMKGAKGKVSDLEKAAAQRKAKLDENSAFLQ,118,100,7.0,25,2.817,2.964929957,0.13355108
sh4ud,LL,Shrestha,MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGSAWSHPQFEK,95,200,8.0,25,2.71,2.652345841,0.07127841
colNT,LL,Johnson,MGSNGADNAHNNAFGGGKNPGIGNTSGAGSNGSASSNRGNSNGWSWSNKPHKNDGFHSDGSYHITFHGDNNSKPKPGGNSGNRGNNGDGASSHHHHHH,98,400,7.6,25,2.83,2.617435651,0.0075647365
PNt,PNt,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.002265334,-0.043667927
PNt,swap1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSDDGIRRFLGTVTVLAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVLGGLHIGALQSLQPEDDPPSRVVLRDTNVTAVPASGAPAAVSVLGASLLTLDGGHITGGRAAGVAAMQGAVVHEQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.92,4.998767173,-0.041700497
PNt,swap3,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSTDGTRRFLGDVIVKAGLLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVDVLRLAIVDGGLHIGALQSQQPETSPPSRVVLRDTNVTAVPASGAPAAVSVQGASEQTLDGGAITGGRAAGVAAMLGHVVHLLRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.058,5.024235918,-0.030256333
PNt,swap4,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRDLGRDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVRVQREAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLRLDGGHIHEGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.337,5.080732149,-0.011465641
PNt,swap4.1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRRLGDDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVEVQRRAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLELDGGHIHRGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.445,5.005591773,-0.04429303
PNt,swap5,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIEDFLGTVTVDAGELVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIEDGANVTVQESAIVDGGLHIGALQSLQPRRLPPSRVVLRKTNVTAVPASGAPAAVSVLGASKLTLRGGHITGGRAAGVAAMQGAVVHLQRATIRRGRALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.871,4.760132157,-0.09892239
PNt,swap6,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDRGIDRFLGTVTVEAGKLVADHATLANVGDTWDKDGIALYVAGRQAQASIADSTLQGAGGVQIREGANVTVQRSAIVDGGLHIGALQSLQPERLPPSDVVLRDTNVTAVPASGAPAAVSVLGASRLTLDGGHITGGDAAGVAAMQGAVVHLQRATIERGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.261,5.117488247,0.0033773314
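For orientation, the rows above can be read positionally. The sketch below assumes the column order implied by extract_Rg.py (index 3 = sequence, 5 = salt, 6 = pH, 10 = predicted omega_2) and additionally assumes that index 4 holds the sequence length and that salt is given in mM. The helper name `parse_row` is hypothetical and is not part of the repository.

```python
import csv
import io

# One row copied from the table above (the "IN" construct, Hofmann).
ROW = ("IN,IN,Hofmann,"
       "GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,"
       "60,100,7.0,22,2.221,2.212767227,0.19296344")

def parse_row(line):
    """Split one CSV row into the fields used by the scripts (hypothetical helper)."""
    fields = next(csv.reader(io.StringIO(line)))
    return {
        "seq": fields[3],             # amino-acid sequence
        "N": int(fields[4]),          # sequence length (assumed meaning)
        "salt_mM": float(fields[5]),  # salt concentration (assumed to be mM)
        "pH": float(fields[6]),
        "w2": float(fields[10]),      # omega_2 predicted by the model
    }

row = parse_row(ROW)
print(row["N"], len(row["seq"]), row["w2"])
```

Comparing `N` with `len(seq)` is a cheap consistency check before a row is passed on to the theory code.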
|
Tesei-trained_Model/extract_Rg.py
ADDED
|
@@ -0,0 +1,75 @@
|
"""
Title: Calculate R_g, R_ee and x using Tesei-trained PML model Omega_2 values
Author: Lilianna Houston, Ghosh Lab
Date: July 22nd 2024
Purpose: This code calculates the R_g, R_ee and x values of protein sequences
using omega_2 values extracted from the Tesei-trained ML model.
Inputs: CSV of protein sequences and omega_2 (w2) values.
Outputs: CSV of protein sequences with R_g, R_ee and x values.
"""

# Path to the CSV file containing protein sequences and omega_2 values
data_path = "exper_seqs_w2preds.csv"
# Specify sequence column
seq_column = 3
# Specify salt column
salt_column = 5
# Specify pH column
pH_column = 6
# Specify omega_2 column
w2_column = 10

# Import packages
import theory_functions # Custom module with all theory functions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sympy import symbols, Eq, solve
from scipy.optimize import minimize, Bounds
from scipy.optimize import minimize_scalar
from scipy import special
import sympy as sp
import math
import time
import sys

# Load the data from the CSV file
data = pd.read_csv(data_path)

# Initialize arrays to store results
xs = np.zeros(len(data))
Rs = np.zeros(len(data))
Rgs = np.zeros(len(data))

# Load the pre-calculated Omega and B values once (columns: N, Omega, B)
OBfmt = np.load('OBfmt_5-1500.npy')

# Process every row of the input file
for i in range(len(data)):
    o_seq = data.iloc[i, seq_column]
    seq = theory_functions.process_seq(o_seq, False, False, True)
    N = len(seq)
    salt = data.iloc[i, salt_column]
    pH = data.iloc[i, pH_column]
    w2 = data.iloc[i, w2_column]

    # Get the Omega and B values corresponding to the sequence length N
    index = np.where(OBfmt[:,0]==N)[0][0]
    Omega, B = OBfmt[index,1:]

    O_term = Omega*w2
    B_term = B

    # Calculate x, R_ee, and R_g using the theoretical model
    x, Ree, Rg = theory_functions.calc_x_w_load(N, w2, seq, .1, O_term, B_term, salt, pH)
    xs[i] = (x)
    Rs[i] = (Ree)
    Rgs[i] = (Rg)

# Add the values to the DataFrame and save it to a CSV file
data["x_pred"] = xs
data["Ree_pred"] = Rs
data["Rg_pred"] = Rgs
data.to_csv('exper_Rg_preds.csv', index=False)
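As a sanity check on the step that turns omega_2 into chain dimensions, the sketch below minimizes only the first three terms of the free energy written in theory_functions.py (the electrostatic term is left out), with the constants taken from that file (b = 3.8, l_k = 8, w3 = 0.2). The function names and the use of `minimize_scalar` are choices made here for illustration; the repository's own solver is `calc_x_w_load`, which uses Nelder-Mead.

```python
import math
from scipy.optimize import minimize_scalar

B_LEN = 3.8   # bond length in Angstroms (value from theory_functions.py)
L_K = 8.0     # Kuhn length in Angstroms (value from theory_functions.py)
W3 = 0.2      # three-body parameter (value from theory_functions.py)

def neutral_free_energy(x, o_term, b_term):
    """beta*F(x) without the electrostatic term (illustrative simplification)."""
    return (1.5 * (x - math.log(x))
            + (3 / (2 * math.pi)) ** 1.5 * o_term / x ** 1.5
            + W3 * (3 / (2 * math.pi)) ** 3 / 2 * b_term / x ** 3)

def solve_x(o_term, b_term):
    """Minimize the free energy over the same x range used in calc_x_w_load."""
    res = minimize_scalar(neutral_free_energy, bounds=(0.01, 10),
                          args=(o_term, b_term), method="bounded")
    return res.x

def rg_nm(x, n):
    """Radius of gyration in nanometers, same formula as get_Rg."""
    return math.sqrt(x * n * B_LEN * L_K / 6) * 0.1

x_ideal = solve_x(0.0, 0.0)      # no interactions: x should be close to 1
x_repulsive = solve_x(5.0, 0.0)  # a positive two-body term swells the chain
print(x_ideal, x_repulsive, rg_nm(x_ideal, 100))
```

With both interaction terms set to zero the minimum sits at x = 1, the ideal-chain limit, which is a quick way to confirm that the prefactors are wired correctly before the charge term is added.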
|
Tesei-trained_Model/extract_w2.py
ADDED
|
@@ -0,0 +1,123 @@
|
"""
Title: Extract Omega_2 from Tesei-trained PML model
Author: Lilianna Houston, Ghosh Lab
Date: July 22nd 2024
Purpose: This code extracts the omega_2 (w2) value of protein sequences from an ML
model trained on the Tesei 2023 dataset.
Inputs: CSV of protein sequences and weights of the ML model.
Outputs: CSV of protein sequences with omega_2 predictions.
"""

# Enter path to desired weights
path_to_weights = "weights/weights_0.best.hdf5"
# Enter path to data file containing sequences
path_to_data = "exper_seqs_master.csv"
# Specify sequence column
seq_column = 3

# Import packages
import numpy as np
import os
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.metrics import matthews_corrcoef, confusion_matrix
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from sklearn.metrics import auc
import matplotlib.pyplot as plt
import sys

# Define a dictionary of amino acid residues (alphabetical by full name,
# stored in letter representation) and their charge.
amino_acid_data = {
    "A": 0, # Alanine
    "R": 1, # Arginine
    "N": 0, # Asparagine
    "D": -1, # Aspartic acid
    "C": 0, # Cysteine
    "E": -1, # Glutamic acid
    "Q": 0, # Glutamine
    "G": 0, # Glycine
    "H": 0, # Histidine
    "I": 0, # Isoleucine
    "L": 0, # Leucine
    "K": 1, # Lysine
    "M": 0, # Methionine
    "F": 0, # Phenylalanine
    "P": 0, # Proline
    "S": 0, # Serine
    "T": 0, # Threonine
    "W": 0, # Tryptophan
    "Y": 0, # Tyrosine
    "V": 0 # Valine
}

# Function to one-hot encode a protein sequence
def hotcode_seq(seq):
    hotcode_matrix = np.zeros((21, 1496))
    for i in range(len(seq)):
        index = list(amino_acid_data.keys()).index(seq[i]) # Find the index of the amino acid
        hotcode_matrix[index, i] = 1 # Set the corresponding position in the matrix to 1
    hotcode_matrix[20, (i+1):] = 1 # Set remaining positions in the last row to 1
    return hotcode_matrix

# Function to convert a list of sequences to their one-hot encoded matrices
def make_hotcodes(data):
    hotcodes = []
    for i in range(len(data)):
        hotcodes.append(hotcode_seq(data[i]))
    return np.asarray(hotcodes)

# -------------- Create a model framework in which to load weights -------------------

# The Tesei-trained model uses a maximum sequence length of 1496, the length
# of the longest sequence in the Tesei 2023 dataset.
model_input_shape = (21, 1496, 1)

image_input = keras.Input(shape=model_input_shape)

# Convolutional layer with 29 filters, kernel size (21, 6), and ReLU activation
conv1 = layers.Conv2D(29, kernel_size=(21, 6), activation='relu')(image_input)

# Flatten the output from the convolutional layer
flatten = layers.Flatten()(conv1)

# Dense layer with 100 units and softsign activation
dense1 = layers.Dense(100, activation='softsign')(flatten)
# Dense layer with 30 units and softsign activation
dense2 = layers.Dense(30, activation='softsign')(dense1)
# Output layer with 1 unit and linear activation
output = layers.Dense(1, activation='linear')(dense2)

model = keras.Model(inputs=image_input, outputs=output, name="model")
# --------------------------------------------------------------------------------------

# Load pre-trained weights into model
model.load_weights(path_to_weights, skip_mismatch=False)

# Compile the model with Adam optimizer and mean squared error loss
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

# Load data file
data = pd.read_csv(path_to_data)

# Extract the sequences from the data
seqs = data.iloc[:, seq_column]

# Convert the sequences to their one-hot encoded matrices
hots = make_hotcodes(seqs)

# Predict the output for the one-hot encoded sequences using the model
preds = model.predict(hots)

# Extract the predictions from the model output
w2_preds = preds[:, 0]

# Add the predictions to the data
data["w2_preds_tesei_model"] = w2_preds

# Save the data with predictions to a new CSV file
data.to_csv("exper_seqs_w2preds.csv", index=False)
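The encoding step can be illustrated on a toy sequence. The sketch below assumes that the 21st row is a padding channel that is 1 only at positions past the end of the sequence, and it uses a short `max_len` so the result is easy to inspect. `one_hot`, `ALPHABET` and `AA_INDEX` are names introduced here; only the residue order is taken from the `amino_acid_data` dictionary in extract_w2.py.

```python
import numpy as np

# Residue order follows the amino_acid_data dictionary in extract_w2.py.
ALPHABET = "ARNDCEQGHILKMFPSTWYV"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot(seq, max_len=1496):
    """Return a (21, max_len) matrix: 20 residue rows plus one padding row."""
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    mat = np.zeros((21, max_len))
    for pos, aa in enumerate(seq):
        mat[AA_INDEX[aa], pos] = 1
    # Assumed convention: the padding flag is set only after the last residue.
    mat[20, len(seq):] = 1
    return mat

m = one_hot("ARK", max_len=6)
print(m[20])          # padding row
print(m.sum(axis=0))  # column sums
```

Building the index dictionary once avoids the repeated `list(...).index(...)` lookup per residue, and the explicit length check gives a clear error for sequences longer than the model input instead of an out-of-bounds index.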
|
Tesei-trained_Model/theory_functions.py
ADDED
|
@@ -0,0 +1,303 @@
|
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sympy import symbols, Eq, solve
from scipy.optimize import minimize, Bounds
from scipy.optimize import minimize_scalar
from scipy import special
import sympy as sp
import math


# ----TEMPERATURE----
# general (kelvin)
general_T = 293
# LL
#T = 310
# Mittal
#T = 300
T = 298


# ----KUHN LENGTH----
# Kuhn length (Angstroms) (sometimes written without a subscript)
l_k = 8
# bond length (Angstroms)
b = 3.8
# Bjerrum length (7.12 Angstroms at 20 C), rescaled to temperature T
l_b_20 = 7.12
l_b = l_b_20 * (general_T / T)

#salt = 150

# ----SALT----
# Convert mM to a number density in A^-3:
# mM -> mol/L, mol -> particles (Avogadro), L -> cm^3, cm^3 -> A^3
def mM_to_A(mM):
    return mM * 10**(-3) * 6.022*10**(23) * (1/1000) * 1 / (10**8)**3

# Inverse Debye length for a monovalent salt, multiplied by the bond length
def kl_func(salt):
    return np.sqrt(8 * math.pi * l_b * mM_to_A(salt)) * b

#----OMEGA 3----
w3 = .2


#----Constants----
amino_acid_data = {
    "A": 0,
    "R": 1,
    "N": 0,
    "D": -1,
    "C": 0,
    "E": -1,
    "Q": 0,
    "G": 0,
    "H": .5,
    "I": 0,
    "L": 0,
    "K": 1,
    "M": 0,
    "F": 0,
    "P": 0,
    "S": 0,
    "T": 0,
    "W": 0,
    "Y": 0,
    "V": 0,
    "B": 2,
    "Z": -2
}

pKa_values = {
    "R": 12.3,
    "D": 3.5,
    "C": 6.8,
    "E": 4.2,
    "H": 6.6,
    "K": 10.5,
    "Y": 10.3,
    "B": 7.7,
    "Z": 3.3
}

s_values = {
    "R": 1,
    "D": -1,
    "C": -1,
    "E": -1,
    "H": 1,
    "K": 1,
    "Y": -1,
    "B": 1,
    "Z": -1
}

#----Functions----
def adjust_pH(pH):
    for key in pKa_values:
        if key in amino_acid_data:
            s = s_values[key]
            amino_acid_data[key] = convert_charge_pH(pH, key, s)

def convert_charge_pH(pH, key, s):
    pKa = pKa_values[key]
    return s / (1 + 10**(s*(pH - pKa)))

def get_x(Ree, N):
    return Ree**2 / (N * b * l_k)

# * .1 converts to nanometers
def get_Ree(x, N):
    return np.sqrt(x * (N * b * l_k)) * .1

# * .1 converts to nanometers
def get_Rg(x, N):
    return np.sqrt(x * N * b * l_k / 6) * .1

def process_seq(seq, n, c, idp):
    new_seq = seq

    n_aa = new_seq[0]
    c_aa = new_seq[-1]

    if n or idp:
        if n_aa == "R" or n_aa == "K": new_seq = "B" + new_seq[1:]
        elif n_aa == "D" or n_aa == "E": new_seq = "A" + new_seq[1:]
        else: new_seq = "R" + new_seq[1:]

    if c or idp:
        if c_aa == "R" or c_aa == "K": new_seq = new_seq[:-1] + "A"
        elif c_aa == "D" or c_aa == "E": new_seq = new_seq[:-1] + "Z"
        else: new_seq = new_seq[:-1] + "D"

    return new_seq

def get_charge(m, n, seq):
    aam = seq[m-1]
    aan = seq[n-1]

    qm = amino_acid_data.get(aam)
    qn = amino_acid_data.get(aan)

    return qm * qn

def set_vars(data, i):
    name = data.iloc[i, 0]

    raw_seq = data.iloc[i, 2]
    N = int(data.iloc[i, 3])
    nterm = data.iloc[i, 4]
    cterm = data.iloc[i, 5]
    nandc = data.iloc[i, 6]

    x = data.iloc[i, 9]
    w2 = data.iloc[i, 10]
    if w2 != "None":
        w2 = float(w2)

    seq = process_seq(raw_seq, nterm, cterm, nandc)

    return N, w2, seq, x, name

def calc_x_w_load(N, w2, seq, seed, O_term, B_term, salt, pH):
    adjust_pH(pH)
    print("salt:", salt, "kappa:", kl_func(salt))
    mn_array = mnArray_Q_prime(N, seq)
    bounds = Bounds(.01, 10, keep_feasible=False)
    result = minimize(function_to_solve, seed, method="Nelder-Mead", args=(N, w2, seq, mn_array, O_term, B_term, salt), bounds=bounds)
    x = result.x[0]
    return x, get_Ree(x, N), get_Rg(x, N)

# Upper limits use N + 1 because Python ranges exclude the end point
def Omega(N, w2):
    result = 0.0
    for m in range(2, N + 1):
        for n in range(1, m):
            result += w2 * ((m - n) ** (-0.5))
    return 1/N * result

def mn_Omega(N):
    result = 0.0
    for m in range(2, N + 1):
        for n in range(1, m):
            result += (m - n) ** (-0.5)
    return 1/N * result

def B(N):
    result = 0.0
    for p in range(3, N+1):
        for m in range(2, p):
            for n in range(1, m):
                result += (p - n)/(((p-m)*(m-n))**(3/2))
    return 1/N * result

# Sequence charge decoration (SCD); uncomment the print to test
def Q(N, seq):
    result = 0.0
    for m in range(2, N+1):
        for n in range(1, m):
            result += get_charge(m, n, seq) * ((m - n) ** (0.5))
    output = 1/N * result
    #print(output)
    return output

# A_prime needs the salt concentration, so it is passed through here
def Q_prime(N, seq, x, salt):
    result = 0.0
    for m in range(2, N+1):
        for n in range(1, m):
            result += get_charge(m, n, seq) * ((m-n)**2) * A_prime(m, n, x, salt)
    output = 1/N * result
    return output

def mn_Q_prime(N, seq, x, mn_array, salt):
    result = 0.0
    i = 0
    for m in range(2, N+1):
        for n in range(1, m):
            result += mn_array[i] * A_prime(m, n, x, salt)
            i += 1
    output = 1/N * result
    return output

def mnArray_Q_prime(N, seq):
    result = []
    for m in range(2, N+1):
        for n in range(1, m):
            result.append(get_charge(m, n, seq) * ((m-n)**2))
    return result

def A_prime(m, n, x, salt):
    term1 = 1/2 * (6*math.pi/x)**(1/2) * (1/(m-n)**(3/2))
    term2 = kl_func(salt) * (math.pi/2) * (1/(m-n))
    term3 = special.erfcx(np.sqrt(kl_func(salt)**2 * x * (m-n) / 6))
    return term1 - term2 * term3

def free_energy(x, N, w2, seq, mn_array, O_term, B_term, salt):
    # beta F(x) = 3/2 (x-ln(x))
    # + (3/(2pi))**(3/2) * Omega * 1/x**(3/2)
    # + w_3 (3/(2 pi))**3 * B/2 * 1/(x**3)
    # + (l_b / b) * (2/pi) * Q'(x), the salt-screened electrostatic term

    #Q_term = Q(N, seq)
    Q_term = mn_Q_prime(N, seq, x, mn_array, salt)
    # Define the equation
    eq = ( (
        3/2 * (x - np.log(x))
        + (3/(2*math.pi))**(3/2) * O_term * (1/(x**(3/2)))
        + (w3*(3/(2*math.pi))**(3))/2 * B_term * (1/(x**3))
        + (l_b / b) * 2/math.pi * Q_term
        ))

    return eq

def function_to_solve(argument, N, w2, seq, mn_array, O_term, B_term, salt):
    """Objective passed to the minimizer: the free energy as a function of x."""

    x = argument

    sol = free_energy(x, N, w2, seq, mn_array, O_term, B_term, salt)
    return sol


# A_prime_derv needs the salt concentration, so it is passed through here
def Q_prime_derv(N, seq, x, salt):
    result = 0.0
    for m in range(2, N+1):
        for n in range(1, m):
            result += get_charge(m, n, seq) * ((m-n)**2) * A_prime_derv(m, n, x, salt)
    output = 1/N * result
    return output

def A_prime_derv(m, n, x, salt):
    # kl was previously undefined; compute it from the salt concentration
    kl = kl_func(salt)
    term1 = (np.sqrt(math.pi)/4) * (6/x)**(3/2) * (1/(m-n)**(3/2))
    term2 = kl**2 * (np.sqrt(math.pi)/2) * (6/x)**(1/2) * (1/(m-n)**(1/2))
    term3 = kl**3 * (math.pi/2) * special.erfcx(np.sqrt(kl**2 * x * (m-n) / 6))
    return (-1/6) * (term1 - term2 + term3)

def solve_for_w2_eq(w2, N, x, seq, mn_O_term, B_term, Q_term):
    O_term = mn_O_term * w2

    # Derivative of the free energy equation
    eq = (
        3/2 * (x - 1)/x
        + (-9 * np.sqrt(3/2)/(4*(math.pi)**(3/2))) * O_term * (1/(x**(5/2)))
        + (-w3 * 81) / (16 * (math.pi)**(3)) * B_term * (1/(x**4))
        + (l_b / b) * 2/math.pi * Q_term
    )

    return eq

def solve_for_w2(N, seq, x, salt):

    # B depends only on the chain length
    B_term = B(N)
    Q_term = Q_prime_derv(N, seq, x, salt)
    mn_O_term = mn_Omega(N)

    # Define the variable
    w2 = sp.Symbol('w2')

    # Define the stationarity condition dF/dx = 0 as an expression in w2
    equation = solve_for_w2_eq(w2, N, x, seq, mn_O_term, B_term, Q_term)

    # Solve for w2 at the given value of x
    solutions = sp.solve(equation, w2)
    return solutions[0]
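The pH adjustment in `convert_charge_pH` is the Henderson-Hasselbalch fraction with a sign `s` that selects acids (s = -1) or bases (s = +1). A minimal worked example, using a subset of the pKa values from the file above and a hypothetical function name `fractional_charge`:

```python
# pKa values and charge signs copied from theory_functions.py (subset).
PKA = {"D": 3.5, "E": 4.2, "H": 6.6, "K": 10.5, "R": 12.3}
SIGN = {"D": -1, "E": -1, "H": 1, "K": 1, "R": 1}

def fractional_charge(residue, pH):
    """Mean charge of a titratable residue at the given pH."""
    s = SIGN[residue]
    return s / (1 + 10 ** (s * (pH - PKA[residue])))

# Print the mean charge of each residue at pH 7.4.
for aa in "DEHKR":
    print(aa, round(fractional_charge(aa, 7.4), 3))
```

At pH equal to the pKa the residue is half charged, which is why histidine (pKa 6.6 here) carries only a small positive charge at pH 7.4 while aspartate, glutamate, lysine and arginine stay close to their full charges.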
|
Tesei-trained_Model/weights/.DS_Store
ADDED
|
Binary file (6.15 kB).
|
|
|
Tesei-trained_Model/weights/weights_0.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:4c5ff706e6f32054b429f29f4a2b0e64b8a4762592bac0e2e61ae3fa7438bfa8
size 52008368
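Each weights file is stored as a Git LFS pointer like the one above, so a downloaded `.hdf5` file can be checked against the recorded size and SHA-256 digest. The sketch below is one way to do that; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the check is run on in-memory bytes rather than on the real 52 MB file.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(blob, pointer):
    """Check that raw bytes agree with the pointer's size and SHA-256 digest."""
    return (len(blob) == pointer["size"]
            and hashlib.sha256(blob).hexdigest() == pointer["digest"])

# Pointer text for weights_0.best.hdf5, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4c5ff706e6f32054b429f29f4a2b0e64b8a4762592bac0e2e61ae3fa7438bfa8
size 52008368
"""
info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

For a real file, the bytes would be read in binary mode and passed to `matches_pointer`; for large files a chunked hash update would be the more memory-friendly variant.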
|
Tesei-trained_Model/weights/weights_1.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:74b5ceaec3ff343b2b5aa9211ad46b6436d1a442c2f73a33f31ef5aeca20aef2
size 52008368
|
Tesei-trained_Model/weights/weights_2.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:ad2207c16437a5b180c9c0ea684194ca8546336dfe2a16cd40b2b2fee8c73489
size 52008368
|
Tesei-trained_Model/weights/weights_3.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:28a1c6b46d3a1bfa11d6daec68faa0f9d3038a9e3ead4e6f322849c9ac24448a
size 52008368
|
Tesei-trained_Model/weights/weights_4.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:66930549677374a057066479a377461d9d214e7f22874a3b4b44e5e4e89ae3f4
size 52008368
|
Tesei-trained_Model/weights/weights_5.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:10c1140515d62471aea98ad54400ff87d0bcac29c4c2f2438c7f54067109836a
size 52008368
|
Tesei-trained_Model/weights/weights_6.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:cd84177337c8689375039429933b5ae7f145863d5361ccf41c9a322f0994a100
size 52008368
|
Tesei-trained_Model/weights/weights_7.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:4d56654caf132b8522820515dec8aac96176b7f42f2f782f92bf5ca9e3b65f52
size 52008368
|
Tesei-trained_Model/weights/weights_8.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:c15d0ccf9bb77efdc0d0a13c5744d2816b0bf71b54494cf5ba9a06282cc84838
size 52008368
|
Tesei-trained_Model/weights/weights_9.best.hdf5
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:52269623b4998156e7a346cecd268ea85339e49f125f78455629a81872f2f1b6
size 52008368
|