IDPLab committed on
Commit
6572b13
·
verified ·
1 Parent(s): db53ffa

Upload 20 files

.gitattributes CHANGED
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Tesei-trained_Model/weights/weights_0.best.hdf5 filter=lfs diff=lfs merge=lfs -text
37
+ Tesei-trained_Model/weights/weights_1.best.hdf5 filter=lfs diff=lfs merge=lfs -text
38
+ Tesei-trained_Model/weights/weights_2.best.hdf5 filter=lfs diff=lfs merge=lfs -text
39
+ Tesei-trained_Model/weights/weights_3.best.hdf5 filter=lfs diff=lfs merge=lfs -text
40
+ Tesei-trained_Model/weights/weights_4.best.hdf5 filter=lfs diff=lfs merge=lfs -text
41
+ Tesei-trained_Model/weights/weights_5.best.hdf5 filter=lfs diff=lfs merge=lfs -text
42
+ Tesei-trained_Model/weights/weights_6.best.hdf5 filter=lfs diff=lfs merge=lfs -text
43
+ Tesei-trained_Model/weights/weights_7.best.hdf5 filter=lfs diff=lfs merge=lfs -text
44
+ Tesei-trained_Model/weights/weights_8.best.hdf5 filter=lfs diff=lfs merge=lfs -text
45
+ Tesei-trained_Model/weights/weights_9.best.hdf5 filter=lfs diff=lfs merge=lfs -text
Tesei-trained_Model/.DS_Store ADDED
Binary file (6.15 kB).
 
Tesei-trained_Model/OBfmt_5-1500.npy ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c157256540326ba182c0a7e5fb94995b222e1757baa0b0526d2a91c060ee50ed
3
+ size 36032
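The three lines above are a Git LFS pointer, not the array itself: the real `OBfmt_5-1500.npy` is stored in LFS and identified by its SHA-256 digest and byte size. As a minimal sketch, assuming the pointer text has already been read into a string, the fields can be pulled apart like this (the function name `parse_lfs_pointer` is hypothetical, not part of this repository):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c157256540326ba182c0a7e5fb94995b222e1757baa0b0526d2a91c060ee50ed
size 36032
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

If a local copy of the file turns out to be a few hundred bytes of text like this rather than a binary array, the LFS content was probably not downloaded.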
Tesei-trained_Model/README.txt ADDED
@@ -0,0 +1,34 @@
1
+ This readme file was generated on 2024-07-23 by Lilianna Houston
2
+
3
+ GENERAL INFORMATION
4
+
5
+ Title of Project: PML, Tesei-trained Model
6
+
7
+ Principal Investigator Information
8
+ Name: Kingshuk Ghosh
9
+ Institution: University of Denver
10
+ Email: kingshuk.ghosh@du.edu
11
+
12
+ Author Information
13
+ Name: Lilianna Houston
14
+ Institution: University of Denver
15
+ Email: lili.houston@du.edu
16
+
17
+ DATA & FILE OVERVIEW
18
+
19
+ File List:
20
+
21
+ "weights" -> Folder containing weights from the Tesei-trained CNN that predicts omega_2 from sequence. We trained the model 10 separate times on all omega_2 calculations from the Tesei 2023 dataset and provide all 10 resulting weights.
22
+
23
+ "Tesei_w2_Ree_preds" -> CSV containing calculated and ML-predicted omega_2 (w2) (predicted using 10-fold cross-validation), as well as reported and predicted R_ee for the Tesei 2023 dataset. Sequences where the omega_2 calculation failed are omitted.
24
+
25
+ "exper_seqs_master" -> CSV of our compiled experimental sequences, including source, sequences, salt, pH, temperature, reported R_g and our predicted R_g (our value is averaged across the results of all 10 trained models). This is used as the input file for extract_w2.py. [Use a different CSV if you want to use a different sequence or set of sequences.]
26
+
27
+ "extract_w2" -> .py file that extracts the omega_2s of a specified list of sequences using a specified set of weights. If you replace the current input file, make sure the script points to the new one, and also change the output file at the end of the code.
28
+
29
+ "exper_seqs_w2preds" -> CSV file. Same content as "exper_seqs_master," with the addition of predicted w2s using weights_0 from the "weights" folder. This is used as the input for extract_Rg.
30
+
31
+ "extract_Rg" -> .py file that extracts the x, R_ee, and R_gs of a specified list of sequences using omega_2. Currently w2 is read from exper_seqs_w2preds; use a different file if you changed the output file of extract_w2.
32
+
33
+ "OBfmt_5-1500.npy" -> Helper file for "extract_Rg" containing precalculated terms.
34
+ "theory_functions" -> .py helper file for "extract_Rg" containing constants and functions needed for R_g calculation.
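The README says the predicted R_g in "exper_seqs_master" is averaged across the results of all 10 trained models. The sketch below only illustrates that averaging step. The weight file paths are taken from the .gitattributes entries in this commit; the helper `average_over_models` and the placeholder per-model values are assumptions for illustration, and the actual model loading and prediction code lives in extract_w2.py and extract_Rg.py, which are not shown here.

```python
from statistics import mean

N_MODELS = 10

# One weight file per independent training run, as listed in .gitattributes.
weight_paths = [
    f"Tesei-trained_Model/weights/weights_{i}.best.hdf5" for i in range(N_MODELS)
]


def average_over_models(per_model_values):
    """Average one sequence's predicted value over all trained models.

    Hypothetical helper: expects exactly one value per weight file.
    """
    if len(per_model_values) != N_MODELS:
        raise ValueError(
            f"expected {N_MODELS} values, got {len(per_model_values)}"
        )
    return mean(per_model_values)


# Placeholder values, NOT real predictions: they stand in for the R_g obtained
# with each of the 10 weight files for a single sequence.
placeholder_rg = [float(i) for i in range(N_MODELS)]
rg_av_pred = average_over_models(placeholder_rg)
print(len(weight_paths), rg_av_pred)
```

Keeping the per-model values before averaging also makes it possible to report their spread as a rough indication of model-to-model variability.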
Tesei-trained_Model/Tesei_w2_Ree_preds.csv ADDED
The diff for this file is too large to render.
 
Tesei-trained_Model/exper_seqs_master.csv ADDED
@@ -0,0 +1,65 @@
1
+ protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred
2
+ hnRNPA1,WT,Martin,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,50,7,23,2.72,2.77954575
3
+ hnRNPA2,Aro-,Martin,GSMAFASSFQRGRYGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSYGGGQYFAKPRNQGGYGGSSFSSSYGSGRRF,137,50,7,23,2.89,2.694002975
4
+ hnRNPA3,Aro--,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNFGRGGNSSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNSGGGGSSNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGYGGSSSSSSSGSGRRF,137,50,7,23,3.01,2.886092558
5
+ hnRNPA4,Aro+,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNSGRGGNSSGRGGFGGSRGGGGSGGSGDGYNGSGNDGSNSGGGGSSNDFGNSNNQSSNSGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGSGGSSSSSSSGSGRRS,137,50,7,23,2.44,3.011271692
6
+ hnRNPA1_(A1-LCD),WT-NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.76,2.667853665
7
+ hnRNPA1_(A1-LCD),WT+NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.583,2.665065329
8
+ hnRNPA1_(A1-LCD),-12F+12Y,Bremer,GSMASASSSQRGRSGSGNYGGGRGGGYGGNDNYGRGGNYSGRGGYGGSRGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGRSSGGSGGGGQYYAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7,25,2.604,2.593430947
9
+ hnRNPA1_(A1-LCD),+7F-7Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGFGGSGDGFNGFGNDGSNFGGGGSFNDFGNFNNQSSNFGPMKGGNFGGRSSGGSGGGGQFFAKPRNQGGFGGSSSSSSFGSGRRF,137,150,7,25,2.718,2.72649201
10
+ hnRNPA1_(A1-LCD),-9F+6Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNYGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7,25,2.655,2.67330763
11
+ hnRNPA1_(A1-LCD),-8F+4Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.707,2.69069985
12
+ hnRNPA1_(A1-LCD),-9F+3Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNGNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRS,137,150,7,25,2.683,2.720835079
13
+ hnRNPA1_(A1-LCD),-10R,Bremer,GSMASASSSQGGSSGSGNFGGGGGGGFGGNDNFGGGGNFSGSGGFGGSGGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGGF,137,150,7,25,2.671,2.698054512
14
+ hnRNPA1_(A1-LCD),-6R,Bremer,GSMASASSSQGGRSGSGNFGGGRGGGFGGNDNFGGGGNFSGSGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGRF,137,150,7,25,2.573,2.647302437
15
+ hnRNPA1_(A1-LCD),+7R,Bremer,GSMASASSSQRGRSGRGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGRYGGSGDRYNGFGNDGRNFGGGGSYNDFGNYNNQSSNFGPMKGGNFRGRSSGPYGRGGQYFAKPRNQGGYGGSSSSRSYGSGRRF,137,150,7,25,2.709,2.8897102
16
+ hnRNPA1_(A1-LCD),-3R+3K,Bremer,GSMASASSSQRGKSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7,25,2.634,2.734381901
17
+ hnRNPA1_(A1-LCD),-6R+6K,Bremer,GSMASASSSQKGKSGSGNFGGGRGGGFGGNDNFGKGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7,25,2.787,2.794997238
18
+ hnRNPA1_(A1-LCD),-4D,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNGNFGRGGNFSGRGGFGGSRGGGGYGGSGGGYNGFGNSGSNFGGGGSYNGFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.642,2.808935655
19
+ hnRNPA1_(A1-LCD),+4D,Bremer,GSMASASSSQRDRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGDFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQGGYGGSSSSSSYDSGRRF,137,150,7,25,2.718,2.660307676
20
+ hnRNPA1_(A1-LCD),+8D,Bremer,GSMASASSSQRDRSGSGNFGGGRDGGFGGNDNFGRGDNFSGRGDFGGSRDGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQDGYGGSSSSSSYDSGRRF,137,150,7,25,2.685,2.700351745
21
+ hnRNPA1_(A1-LCD),+12D,Bremer,GSMASADSSQRDRDDSGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGGYGGDGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFDPMKGGNFGDRSSGPYDGGGQYFAKPRNQGGYGGSSSSSSYGSDRRF,137,150,7,25,2.801,2.73413
22
+ hnRNPA1_(A1-LCD),+12E,Bremer,GSMASAESSQREREESGNFGEGRGGGFGGNDNFGRGGNFSERGGFGGSRGEGGYGGEGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFEPMKGGNFGERSSGPYEGGGQYFAKPRNQGGYGGSSSSSSYGSERRF,137,150,7,25,2.852,2.765836221
23
+ hnRNPA1_(A1-LCD),+7K+12D,Bremer,GSMASADSSQRDRDDKGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGKYGGDGDKYNGFGNDGKNFGGGGSYNDFGNYNNQSSNFDPMKGGNFKDRSSGPYDKGGQYFAKPRNQGGYGGSSSSKSYGSDRRF,137,150,7,25,2.921,2.864720166
24
+ hnRNPA1_(A1-LCD),+7K+12D_blocky,Bremer,GSMASAKSSQRDRDDDGNFGKGRGGGFGGNKNFGRGGNFSKRGGFGGSRGKGKYGGKGDDYNGFGNDGDNFGGGGSYNDFGNYNNQSSNFDPMDGGNFDDRSSGPYDDGGQYFADPRNQGGYGGSSSSKSYGSKRRF,137,150,7,25,2.562,2.595524581
25
+ hnRNPA1_(A1-LCD),+2R,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFRNDGSNFGGGGRYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7,25,2.623,2.74310832
26
+ hnRNPA1_(A1-LCD),-10R+10K,Bremer,GSMASASSSQKGKSGSGNFGGGKGGGFGGNDNFGKGGNFSGKGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPKNQGGYGGSSSSSSYGSGKKF,137,150,7,25,2.849,2.883381638
27
+ hnRNPA1_(A1-LCD),-12F+12Y-10R,Bremer,GSMASASSSQGGSSGSGNYGGGGGGGYGGNDNYGGGGNYSGSGGYGGSGGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGSSSGPYGGGGQYYAKPGNQGGYGGSSSSSSYGSGGGY,137,150,7,25,2.607,2.624120722
28
+ hnRNPA1_(A1-LCD),-10F+7R+12D,Bremer,GSMASADSSQRDRDDRGNFGDGRGGGGGGNDNFGRGGNGSDRGGGGGSRGDGRYGGDGDRYNGGGNDGRNGGGGGSYNDGGNYNNQSSNGDPMKGGNGRDRSSGPYDRGGQYGAKPRNQGGYGGSSSSRSYGSDRRG,137,150,7,25,2.86,2.808875725
29
+ pNT,pNT,Riback,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGDALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.000444014
30
+ fHua,fHua,Riback,ESAWGPAATIAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTYDHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGGLLNMVSKRPTTEPL,143,150,7.5,25,3.34,3.288455309
31
+ RNasea,RNasea,Riback,KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKDGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV,124,150,7.5,25,3.36,3.182804126
32
+ tau,ht40,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,441,150,7.4,15,6.5,6.181000307
33
+ tau,K32,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,197,150,7.4,15,4.2,4.320015395
34
+ tau,K16,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,175,150,7.4,15,3.9,4.124442757
35
+ tau,K18,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,129,150,7.4,15,3.8,3.461192152
36
+ tau,ht23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,352,150,7.4,15,5.3,5.717470647
37
+ tau,K27,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,165,150,7.4,15,3.7,3.957930324
38
+ tau,K17,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,143,150,7.4,15,3.6,3.700929003
39
+ tau,K19,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,98,150,7.4,15,3.5,2.955708005
40
+ tau,K44,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,283,150,7.4,15,5.2,5.072666156
41
+ tau,K10,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,167,150,7.4,15,4,3.865354564
42
+ tau,K25,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRL,185,150,7.4,15,4.1,3.914059813
43
+ tau,K23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLTHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,254,150,7.4,15,4.9,4.567234787
44
+ ACTR,ACTR,Kjaergaard,EQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQAL,71,200,7.4,5,2.63,2.348116852
45
+ hNHE1cdt,hNHE1cdt,Kjaergaard,INNYLTVPAHKLDSPTMSRARIGSDPLAYEPKEDLPVITIDPASPQSPESVDLVNEELKGKVLGLSRDPAKVAEEDEDDDGGIMMRSKETSSPGTDDVFTPAPSDSPSSQRIQRCLSDPGPHPEPGEGEPFFPKGQ,136,200,7.4,5,3.75,3.309963467
46
+ sic1,sic1,Gomes,GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT,92,200,7.5,20,3,2.753155449
47
+ p15PAF,p15PAF,De Biasio,MVRTKADSVPGTYRKVVAARAPRKVLGSSTSATNSTSVSSRKAENKYAGGNPVCVRPTPKWQKGIGEFFRLSPKDSEKENQIPEEAGSSGLGKAKRKACPLQPDHTNDEKE,111,150,7,25,2.81,2.883730178
48
+ alphaSyn,alphaSyn,Ahmed,MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA,140,200,7.4,20,3.55,3.650767716
49
+ OPN220,OPN220,Platzer,MHQDHVDSQSQEHLQQTQNDLASLQQTHYSSEENADVPEQPDFPDVPSKSQETVDDDDDDDNDSNDTDESDEVFTDFPTEAPVAPFNRGDNAGRGDSVAYGFRAKAHVVKASKIRKAARKLIEDDATTEDGDSQPAGLWWPKESREQNSRELPQHQSVENDSRPKFDSREVDGGDSKASAGVDSRESQGSVPAVDASNQTLESAEDAEDRHSIENNEVTR,220,150,6.5,25,5.13,4.196087874
50
+ IN,IN,Hofmann,GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,60,100,7,22,2.221,2.212767227
51
+ R15,R15,Hofmann,KLKEANKQQNFNTGIKDFDFWLSEVEALLASEDYGKDLCSVNNLLKKHQLLEADISAHEDRLKDLNSQADSLMTSSAFDTSQVKDKRETINGRFQRIKCMAAARRAKLNESHRL,114,100,7,22,2.612,2.86574043
52
+ ProTα-N,ProTa-N,Hofmann,CDAAVDTSSEITTKDLKEKKEVVEEAENGRDAPANGNANEENGEQEADNEVDEEC,55,100,7,22,2.549,2.519264356
53
+ ProTα-C,ProTa-C,Hofmann,CEEGGEEEEEEEEGDGEEEDGDEDEEAESATGKRAAEDDEDDDVDTKKQKTDEDC,55,100,7,22,2.998,3.3869979
54
+ hCyp,hCyp,Hofmann,GPMCNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSSFHRIIPGFMSQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFISTAKTEFLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADSGQLC,167,75,7,22,2.534,3.455838139
55
+ ACTR,ACTR,Borgia,GPSGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQALEPKQDSGGPR,79,100,7,25,2.51,2.49971119
56
+ R17d,R17d,Borgia,GSRLEESLEYQQFVANVEEEEAWINEKMTLVASEDYGDTLAAIQGLLKKHEAFETDFTVHKDRVNDVAANGEDLIKKNNHHVENITAKMKGAKGKVSDLEKAAAQRKAKLDENSAFLQ,118,100,7,25,2.817,2.964929957
57
+ sh4ud,LL,Shrestha,MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGSAWSHPQFEK,95,200,8,25,2.71,2.652345841
58
+ colNT,LL,Johnson,MGSNGADNAHNNAFGGGKNPGIGNTSGAGSNGSASSNRGNSNGWSWSNKPHKNDGFHSDGSYHITFHGDNNSKPKPGGNSGNRGNNGDGASSHHHHHH,98,400,7.6,25,2.83,2.617435651
59
+ PNt,PNt,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.002265334
60
+ PNt,swap1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSDDGIRRFLGTVTVLAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVLGGLHIGALQSLQPEDDPPSRVVLRDTNVTAVPASGAPAAVSVLGASLLTLDGGHITGGRAAGVAAMQGAVVHEQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.92,4.998767173
61
+ PNt,swap3,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSTDGTRRFLGDVIVKAGLLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVDVLRLAIVDGGLHIGALQSQQPETSPPSRVVLRDTNVTAVPASGAPAAVSVQGASEQTLDGGAITGGRAAGVAAMLGHVVHLLRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.058,5.024235918
62
+ PNt,swap4,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRDLGRDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVRVQREAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLRLDGGHIHEGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.337,5.080732149
63
+ PNt,swap4.1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRRLGDDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVEVQRRAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLELDGGHIHRGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.445,5.005591773
64
+ PNt,swap5,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIEDFLGTVTVDAGELVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIEDGANVTVQESAIVDGGLHIGALQSLQPRRLPPSRVVLRKTNVTAVPASGAPAAVSVLGASKLTLRGGHITGGRAAGVAAMQGAVVHLQRATIRRGRALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.871,4.760132157
65
+ PNt,swap6,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDRGIDRFLGTVTVEAGKLVADHATLANVGDTWDKDGIALYVAGRQAQASIADSTLQGAGGVQIREGANVTVQRSAIVDGGLHIGALQSLQPERLPPSDVVLRDTNVTAVPASGAPAAVSVLGASRLTLDGGHITGGDAAGVAAMQGAVVHLQRATIERGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.261,5.117488247
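A quick consistency check on this table can be sketched with the standard library alone. The excerpt below copies the header and two of the shorter rows (IN and the Kjaergaard ACTR entry) from the CSV above; the function name `check_rows` is hypothetical. For each row it tests whether the length of `Sequence` equals `N` and computes the absolute difference between the reported `Rg[nm]` and `Rg_av_pred`.

```python
import csv
import io

# Header and two rows copied verbatim from exper_seqs_master.csv.
EXCERPT = """\
protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred
IN,IN,Hofmann,GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,60,100,7,22,2.221,2.212767227
ACTR,ACTR,Kjaergaard,EQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQAL,71,200,7.4,5,2.63,2.348116852
"""


def check_rows(csv_text):
    """Return (variant name, length matches N, |Rg - Rg_av_pred|) per row."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        length_ok = len(row["Sequence"]) == int(row["N"])
        abs_err = abs(float(row["Rg[nm]"]) - float(row["Rg_av_pred"]))
        # "varient_name" is the column name as spelled in the file.
        results.append((row["varient_name"], length_ok, abs_err))
    return results


results = check_rows(EXCERPT)
for name, length_ok, abs_err in results:
    print(f"{name}: length matches N = {length_ok}, |Rg - Rg_av_pred| = {abs_err:.3f} nm")
```

To run the same check on the full table, pass the contents of the CSV file instead of `EXCERPT`.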
Tesei-trained_Model/exper_seqs_w2preds.csv ADDED
@@ -0,0 +1,65 @@
1
+ protein,varient_name,paper,Sequence,N,salt,pH,temp,Rg[nm],Rg_av_pred,w2_preds_tesei_model
2
+ hnRNPA1,WT,Martin,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,50,7.0,23,2.72,2.77954575,-0.34484723
3
+ hnRNPA2,Aro-,Martin,GSMAFASSFQRGRYGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSYGGGQYFAKPRNQGGYGGSSFSSSYGSGRRF,137,50,7.0,23,2.89,2.694002975,-0.38453168
4
+ hnRNPA3,Aro--,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNFGRGGNSSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNSGGGGSSNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGYGGSSSSSSSGSGRRF,137,50,7.0,23,3.01,2.886092558,-0.28979945
5
+ hnRNPA4,Aro+,Martin,GSMASASSSQRGRSGSGNSGGGRGGGFGGNDNSGRGGNSSGRGGFGGSRGGGGSGGSGDGYNGSGNDGSNSGGGGSSNDFGNSNNQSSNSGPMKGGNFGGRSSGGSGGGGQYSAKPRNQGGSGGSSSSSSSGSGRRS,137,50,7.0,23,2.44,3.011271692,-0.2247001
6
+ hnRNPA1_(A1-LCD),WT-NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.76,2.667853665,-0.34484723
7
+ hnRNPA1_(A1-LCD),WT+NLS,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.583,2.665065329,-0.34647438
8
+ hnRNPA1_(A1-LCD),-12F+12Y,Bremer,GSMASASSSQRGRSGSGNYGGGRGGGYGGNDNYGRGGNYSGRGGYGGSRGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGRSSGGSGGGGQYYAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7.0,25,2.604,2.593430947,-0.37725022
9
+ hnRNPA1_(A1-LCD),+7F-7Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGFGGSGDGFNGFGNDGSNFGGGGSFNDFGNFNNQSSNFGPMKGGNFGGRSSGGSGGGGQFFAKPRNQGGFGGSSSSSSFGSGRRF,137,150,7.0,25,2.718,2.72649201,-0.31061214
10
+ hnRNPA1_(A1-LCD),-9F+6Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNYGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRY,137,150,7.0,25,2.655,2.67330763,-0.3397461
11
+ hnRNPA1_(A1-LCD),-8F+4Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNYNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.707,2.69069985,-0.33270133
12
+ hnRNPA1_(A1-LCD),-9F+3Y,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGYGGNDNGGRGGNYSGRGGFGGSRGGGGYGGSGDGYNGGGNDGSNYGGGGSYNDSGNGNNQSSNFGPMKGGNYGGRSSGGSGGGGQYGAKPRNQGGYGGSSSSSSYGSGRRS,137,150,7.0,25,2.683,2.720835079,-0.31949827
13
+ hnRNPA1_(A1-LCD),-10R,Bremer,GSMASASSSQGGSSGSGNFGGGGGGGFGGNDNFGGGGNFSGSGGFGGSGGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGGF,137,150,7.0,25,2.671,2.698054512,-0.25107563
14
+ hnRNPA1_(A1-LCD),-6R,Bremer,GSMASASSSQGGRSGSGNFGGGRGGGFGGNDNFGGGGNFSGSGGFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGSSSGPYGGGGQYFAKPGNQGGYGGSSSSSSYGSGGRF,137,150,7.0,25,2.573,2.647302437,-0.2631922
15
+ hnRNPA1_(A1-LCD),+7R,Bremer,GSMASASSSQRGRSGRGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGRYGGSGDRYNGFGNDGRNFGGGGSYNDFGNYNNQSSNFGPMKGGNFRGRSSGPYGRGGQYFAKPRNQGGYGGSSSSRSYGSGRRF,137,150,7.0,25,2.709,2.8897102,-0.5265355
16
+ hnRNPA1_(A1-LCD),-3R+3K,Bremer,GSMASASSSQRGKSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7.0,25,2.634,2.734381901,-0.31496823
17
+ hnRNPA1_(A1-LCD),-6R+6K,Bremer,GSMASASSSQKGKSGSGNFGGGRGGGFGGNDNFGKGGNFSGRGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPRNQGGYGGSSSSSSYGSGRKF,137,150,7.0,25,2.787,2.794997238,-0.28669593
18
+ hnRNPA1_(A1-LCD),-4D,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNGNFGRGGNFSGRGGFGGSRGGGGYGGSGGGYNGFGNSGSNFGGGGSYNGFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.642,2.808935655,-0.4755267
19
+ hnRNPA1_(A1-LCD),+4D,Bremer,GSMASASSSQRDRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGDFGGSRGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQGGYGGSSSSSSYDSGRRF,137,150,7.0,25,2.718,2.660307676,-0.25231382
20
+ hnRNPA1_(A1-LCD),+8D,Bremer,GSMASASSSQRDRSGSGNFGGGRDGGFGGNDNFGRGDNFSGRGDFGGSRDGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGRSSDPYGGGGQYFAKPRNQDGYGGSSSSSSYDSGRRF,137,150,7.0,25,2.685,2.700351745,-0.20195933
21
+ hnRNPA1_(A1-LCD),+12D,Bremer,GSMASADSSQRDRDDSGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGGYGGDGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFDPMKGGNFGDRSSGPYDGGGQYFAKPRNQGGYGGSSSSSSYGSDRRF,137,150,7.0,25,2.801,2.73413,-0.24407695
22
+ hnRNPA1_(A1-LCD),+12E,Bremer,GSMASAESSQREREESGNFGEGRGGGFGGNDNFGRGGNFSERGGFGGSRGEGGYGGEGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFEPMKGGNFGERSSGPYEGGGQYFAKPRNQGGYGGSSSSSSYGSERRF,137,150,7.0,25,2.852,2.765836221,-0.23236628
23
+ hnRNPA1_(A1-LCD),+7K+12D,Bremer,GSMASADSSQRDRDDKGNFGDGRGGGFGGNDNFGRGGNFSDRGGFGGSRGDGKYGGDGDKYNGFGNDGKNFGGGGSYNDFGNYNNQSSNFDPMKGGNFKDRSSGPYDKGGQYFAKPRNQGGYGGSSSSKSYGSDRRF,137,150,7.0,25,2.921,2.864720166,-0.118673995
24
+ hnRNPA1_(A1-LCD),+7K+12D_blocky,Bremer,GSMASAKSSQRDRDDDGNFGKGRGGGFGGNKNFGRGGNFSKRGGFGGSRGKGKYGGKGDDYNGFGNDGDNFGGGGSYNDFGNYNNQSSNFDPMDGGNFDDRSSGPYDDGGQYFADPRNQGGYGGSSSSKSYGSKRRF,137,150,7.0,25,2.562,2.595524581,-0.23424682
25
+ hnRNPA1_(A1-LCD),+2R,Bremer,GSMASASSSQRGRSGSGNFGGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFRNDGSNFGGGGRYNDFGNYNNQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF,137,150,7.0,25,2.623,2.74310832,-0.39167032
26
+ hnRNPA1_(A1-LCD),-10R+10K,Bremer,GSMASASSSQKGKSGSGNFGGGKGGGFGGNDNFGKGGNFSGKGGFGGSKGGGGYGGSGDGYNGFGNDGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGKSSGGSGGGGQYFAKPKNQGGYGGSSSSSSYGSGKKF,137,150,7.0,25,2.849,2.883381638,-0.23390317
27
+ hnRNPA1_(A1-LCD),-12F+12Y-10R,Bremer,GSMASASSSQGGSSGSGNYGGGGGGGYGGNDNYGGGGNYSGSGGYGGSGGGGGYGGSGDGYNGYGNDGSNYGGGGSYNDYGNYNNQSSNYGPMKGGNYGGSSSGPYGGGGQYYAKPGNQGGYGGSSSSSSYGSGGGY,137,150,7.0,25,2.607,2.624120722,-0.28584272
28
+ hnRNPA1_(A1-LCD),-10F+7R+12D,Bremer,GSMASADSSQRDRDDRGNFGDGRGGGGGGNDNFGRGGNGSDRGGGGGSRGDGRYGGDGDRYNGGGNDGRNGGGGGSYNDGGNYNNQSSNGDPMKGGNGRDRSSGPYDRGGQYGAKPRNQGGYGGSSSSRSYGSDRRG,137,150,7.0,25,2.86,2.808875725,-0.16749647
29
+ pNT,pNT,Riback,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGDALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.000444014,-0.046264842
30
+ fHua,fHua,Riback,ESAWGPAATIAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTYDHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGGLLNMVSKRPTTEPL,143,150,7.5,25,3.34,3.288455309,0.08028786
31
+ RNasea,RNasea,Riback,KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKDGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV,124,150,7.5,25,3.36,3.182804126,0.16724284
32
+ tau,ht40,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,441,150,7.4,15,6.5,6.181000307,0.08360584
33
+ tau,K32,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,197,150,7.4,15,4.2,4.320015395,-0.07681902
34
+ tau,K16,Mylonas,SSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,175,150,7.4,15,3.9,4.124442757,-0.08814843
35
+ tau,K18,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLDLSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,129,150,7.4,15,3.8,3.461192152,0.12808333
36
+ tau,ht23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,352,150,7.4,15,5.3,5.717470647,0.04805489
37
+ tau,K27,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVY,165,150,7.4,15,3.7,3.957930324,-0.06621514
38
+ tau,K17,Mylonas,SPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,143,150,7.4,15,3.6,3.700929003,-0.084508374
39
+ tau,K19,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,98,150,7.4,15,3.5,2.955708005,0.17149426
40
+ tau,K44,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIE,283,150,7.4,15,5.2,5.072666156,0.007707132
41
+ tau,K10,Mylonas,QTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEKLDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,167,150,7.4,15,4.0,3.865354564,0.18579094
42
+ tau,K25,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRL,185,150,7.4,15,4.1,3.914059813,0.0866615
43
+ tau,K23,Mylonas,MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDKKAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREPKKVAVVRTPPKSPSSAKSRLTHKLTFRENAKAKTDHGAEIVYKSPVVSGDTSPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL,254,150,7.4,15,4.9,4.567234787,0.10713528
44
+ ACTR,ACTR,Kjaergaard,EQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQAL,71,200,7.4,5,2.63,2.348116852,-0.023581075
45
+ hNHE1cdt,hNHE1cdt,Kjaergaard,INNYLTVPAHKLDSPTMSRARIGSDPLAYEPKEDLPVITIDPASPQSPESVDLVNEELKGKVLGLSRDPAKVAEEDEDDDGGIMMRSKETSSPGTDDVFTPAPSDSPSSQRIQRCLSDPGPHPEPGEGEPFFPKGQ,136,200,7.4,5,3.75,3.309963467,-0.13009122
46
+ sic1,sic1,Gomes,GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT,92,200,7.5,20,3.0,2.753155449,-0.10162979
47
+ p15PAF,p15PAF,De Biasio,MVRTKADSVPGTYRKVVAARAPRKVLGSSTSATNSTSVSSRKAENKYAGGNPVCVRPTPKWQKGIGEFFRLSPKDSEKENQIPEEAGSSGLGKAKRKACPLQPDHTNDEKE,111,150,7.0,25,2.81,2.883730178,-0.05214207
48
+ alphaSyn,alphaSyn,Ahmed,MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA,140,200,7.4,20,3.55,3.650767716,0.22115915
49
+ OPN220,OPN220,Platzer,MHQDHVDSQSQEHLQQTQNDLASLQQTHYSSEENADVPEQPDFPDVPSKSQETVDDDDDDDNDSNDTDESDEVFTDFPTEAPVAPFNRGDNAGRGDSVAYGFRAKAHVVKASKIRKAARKLIEDDATTEDGDSQPAGLWWPKESREQNSRELPQHQSVENDSRPKFDSREVDGGDSKASAGVDSRESQGSVPAVDASNQTLESAEDAEDRHSIENNEVTR,220,150,6.5,25,5.13,4.196087874,-0.52987176
50
+ IN,IN,Hofmann,GSHCFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDC,60,100,7.0,22,2.221,2.212767227,0.19296344
51
+ R15,R15,Hofmann,KLKEANKQQNFNTGIKDFDFWLSEVEALLASEDYGKDLCSVNNLLKKHQLLEADISAHEDRLKDLNSQADSLMTSSAFDTSQVKDKRETINGRFQRIKCMAAARRAKLNESHRL,114,100,7.0,22,2.612,2.86574043,0.15593176
52
+ ProTα-N,ProTa-N,Hofmann,CDAAVDTSSEITTKDLKEKKEVVEEAENGRDAPANGNANEENGEQEADNEVDEEC,55,100,7.0,22,2.549,2.519264356,-0.5270907
53
+ ProTα-C,ProTa-C,Hofmann,CEEGGEEEEEEEEGDGEEEDGDEDEEAESATGKRAAEDDEDDDVDTKKQKTDEDC,55,100,7.0,22,2.998,3.3869979,-2.1427064
54
+ hCyp,hCyp,Hofmann,GPMCNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSSFHRIIPGFMSQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFISTAKTEFLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADSGQLC,167,75,7.0,22,2.534,3.455838139,0.0024694633
55
+ ACTR,ACTR,Borgia,GPSGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQALEPKQDSGGPR,79,100,7.0,25,2.51,2.49971119,0.03898625
56
+ R17d,R17d,Borgia,GSRLEESLEYQQFVANVEEEEAWINEKMTLVASEDYGDTLAAIQGLLKKHEAFETDFTVHKDRVNDVAANGEDLIKKNNHHVENITAKMKGAKGKVSDLEKAAAQRKAKLDENSAFLQ,118,100,7.0,25,2.817,2.964929957,0.13355108
57
+ sh4ud,LL,Shrestha,MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGSAWSHPQFEK,95,200,8.0,25,2.71,2.652345841,0.07127841
58
+ colNT,LL,Johnson,MGSNGADNAHNNAFGGGKNPGIGNTSGAGSNGSASSNRGNSNGWSWSNKPHKNDGFHSDGSYHITFHGDNNSKPKPGGNSGNRGNNGDGASSHHHHHH,98,400,7.6,25,2.83,2.617435651,0.0075647365
+ PNt,PNt,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIRRFLGTVTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVDGGLHIGALQSLQPEDLPPSRVVLRDTNVTAVPASGAPAAVSVLGASELTLDGGHITGGRAAGVAAMQGAVVHLQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.11,5.002265334,-0.043667927
+ PNt,swap1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSDDGIRRFLGTVTVLAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVTVQRSAIVLGGLHIGALQSLQPEDDPPSRVVLRDTNVTAVPASGAPAAVSVLGASLLTLDGGHITGGRAAGVAAMQGAVVHEQRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.92,4.998767173,-0.041700497
+ PNt,swap3,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQKSTDGTRRFLGDVIVKAGLLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGANVDVLRLAIVDGGLHIGALQSQQPETSPPSRVVLRDTNVTAVPASGAPAAVSVQGASEQTLDGGAITGGRAAGVAAMLGHVVHLLRATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.058,5.024235918,-0.030256333
+ PNt,swap4,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRDLGRDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVRVQREAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLRLDGGHIHEGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.337,5.080732149,-0.011465641
+ PNt,swap4.1,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSFVGITRRLGDDTVKAGKLVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIERGADVEVQRRAIVDGGLHNGALQSLQPSILPPSTVVLRDTNVTAVPASGAPAAVLVSGASGLELDGGHIHRGRAAGVAAMQGAVVTLQTATIRRGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.445,5.005591773,-0.04429303
+ PNt,swap5,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDDGIEDFLGTVTVDAGELVADHATLANVGDTWDDDGIALYVAGEQAQASIADSTLQGAGGVQIEDGANVTVQESAIVDGGLHIGALQSLQPRRLPPSRVVLRKTNVTAVPASGAPAAVSVLGASKLTLRGGHITGGRAAGVAAMQGAVVHLQRATIRRGRALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,4.871,4.760132157,-0.09892239
+ PNt,swap6,Bowman,DWNNQSIVKTGERQHGIHIQGSDPGGVRTASGTTIKVSGRQAQGILLENPAAELQFRNGSVTSSGQLSDRGIDRFLGTVTVEAGKLVADHATLANVGDTWDKDGIALYVAGRQAQASIADSTLQGAGGVQIREGANVTVQRSAIVDGGLHIGALQSLQPERLPPSDVVLRDTNVTAVPASGAPAAVSVLGASRLTLDGGHITGGDAAGVAAMQGAVVHLQRATIERGEALAGGAVPGGAVPGGAVPGGFGPGGFGPVLDGWYGVDVSGSSVELAQSIVEAPELGAAIRVGRGARVTVPGGSLSAPHGNVIETGGARRFAPQAAPLSITLQAGAH,334,150,7.5,25,5.261,5.117488247,0.0033773314
Tesei-trained_Model/extract_Rg.py ADDED
@@ -0,0 +1,75 @@
+ """
+ Title: Calculate R_g, R_ee and x using Tesei-trained PML model Omega_2 values
+ Author: Lilianna Houston, Ghosh Lab
+ Date: July 22nd 2024
+ Purpose: This code calculates the R_g, R_ee and x values of protein sequences
+ using omega_2 values extracted from the Tesei-trained ML model.
+ Inputs: CSV of protein sequences and omega_2 (w2) values.
+ Outputs: CSV of protein sequences with R_g, R_ee and x values.
+ """
+
+ # Path to the CSV file containing protein sequences and omega_2 values
+ data_path = "exper_seqs_w2preds.csv"
+ # Specify sequence column
+ seq_column = 3
+ # Specify salt column
+ salt_column = 5
+ # Specify pH column
+ pH_column = 6
+ # Specify omega_2 column
+ w2_column = 10
+
+ # Import packages
+ import theory_functions # Custom module with all theory functions
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from sympy import symbols, Eq, solve
+ from scipy.optimize import minimize, Bounds
+ from scipy.optimize import minimize_scalar
+ from scipy import special
+ import sympy as sp
+ import math
+ import time
+ import sys
+
+ # Load the data from the CSV file
+ data = pd.read_csv(data_path)
+
+ # Initialize arrays to store results
+ xs = np.zeros(len(data))
+ Rs = np.zeros(len(data))
+ Rgs = np.zeros(len(data))
+
+ # Load pre-calculated Omega and B values (one row per sequence length)
+ OBfmt = np.load('OBfmt_5-1500.npy')
+
+ # Process every row of the input CSV
+ for i in range(len(data)):
+     o_seq = data.iloc[i, seq_column]
+     seq = theory_functions.process_seq(o_seq, False, False, True)
+     N = len(seq)
+     salt = data.iloc[i, salt_column]
+     pH = data.iloc[i, pH_column]
+     w2 = data.iloc[i, w2_column]
+
+     # Get the Omega and B values corresponding to the sequence length N
+     index = np.where(OBfmt[:,0]==N)[0][0]
+     Omega, B = OBfmt[index,1:]
+
+     O_term = Omega*w2
+     B_term = B
+
+     # Calculate x, R_ee, and R_g using the theoretical model
+     x, Ree, Rg = theory_functions.calc_x_w_load(N, w2, seq, .1, O_term, B_term, salt, pH)
+     xs[i] = x
+     Rs[i] = Ree
+     Rgs[i] = Rg
+
+ # Add the values to the DataFrame and save it to a CSV file
+ data["x_pred"] = xs
+ data["Ree_pred"] = Rs
+ data["Rg_pred"] = Rgs
+ data.to_csv('exper_Rg_preds.csv', index=False)
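The script above looks up Omega and B by sequence length in `OBfmt_5-1500.npy`. The following is a minimal sketch of that lookup, assuming the table has three columns `[N, Omega, B]` (inferred from how the script indexes it) and assuming the stored values come from the same double and triple sums as `mn_Omega` and `B` in `theory_functions.py`. The names `build_table` and `lookup` are hypothetical helpers, not part of the repository.

```python
import numpy as np

def mn_omega(N):
    """(1/N) * sum over 1 <= n < m <= N of (m - n)^(-1/2)."""
    total = 0.0
    for m in range(2, N + 1):
        for n in range(1, m):
            total += (m - n) ** -0.5
    return total / N

def b_sum(N):
    """(1/N) * sum over 1 <= n < m < p <= N of (p - n) / ((p - m)(m - n))^(3/2)."""
    total = 0.0
    for p in range(3, N + 1):
        for m in range(2, p):
            for n in range(1, m):
                total += (p - n) / ((p - m) * (m - n)) ** 1.5
    return total / N

def build_table(n_min, n_max):
    """Hypothetical helper: one row [N, Omega, B] per sequence length."""
    return np.array([[N, mn_omega(N), b_sum(N)] for N in range(n_min, n_max + 1)])

def lookup(table, N):
    """Return (Omega, B) for length N, failing loudly if N is not tabulated."""
    rows = np.where(table[:, 0] == N)[0]
    if rows.size == 0:
        raise ValueError(f"no table entry for N={N}")
    omega, b = table[rows[0], 1:]
    return omega, b

table = build_table(5, 8)
omega, b = lookup(table, 6)
w2 = 0.1                 # illustrative value only
O_term = omega * w2      # same scaling as O_term = Omega*w2 in the script
```

Raising an explicit error for an untabulated length is a design choice of this sketch; the script itself would fail with an `IndexError` in that case.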
Tesei-trained_Model/extract_w2.py ADDED
@@ -0,0 +1,123 @@
+ """
+ Title: Extract Omega_2 from Tesei-trained PML model
+ Author: Lilianna Houston, Ghosh Lab
+ Date: July 22nd 2024
+ Purpose: This code extracts the omega_2 (w2) value of protein sequences from a ML
+ model trained on the Tesei 2023 dataset.
+ Inputs: CSV of protein sequences and weights of the ML model.
+ Outputs: CSV of protein sequences with omega_2 predictions.
+ """
+
+ # Enter path to desired weights
+ path_to_weights = "weights/weights_0.best.hdf5"
+ # Enter path to data file containing sequences
+ path_to_data = "exper_seqs_master.csv"
+ # Specify sequence column
+ seq_column = 3
+
+ # Import packages
+ import numpy as np
+ import os
+ import pandas as pd
+ import tensorflow as tf
+ from tensorflow import keras
+ from tensorflow.keras import layers
+ from tensorflow.keras.callbacks import ModelCheckpoint
+ from sklearn.metrics import matthews_corrcoef, confusion_matrix
+ from sklearn.metrics import precision_recall_curve
+ from sklearn.metrics import f1_score
+ from sklearn.metrics import auc
+ import matplotlib.pyplot as plt
+ import sys
+
+ # Define a dictionary of amino acid residues (alphabetical by full name,
+ # stored in letter representation) and their charge.
+ amino_acid_data = {
+     "A": 0, # Alanine
+     "R": 1, # Arginine
+     "N": 0, # Asparagine
+     "D": -1, # Aspartic acid
+     "C": 0, # Cysteine
+     "E": -1, # Glutamic acid
+     "Q": 0, # Glutamine
+     "G": 0, # Glycine
+     "H": 0, # Histidine
+     "I": 0, # Isoleucine
+     "L": 0, # Leucine
+     "K": 1, # Lysine
+     "M": 0, # Methionine
+     "F": 0, # Phenylalanine
+     "P": 0, # Proline
+     "S": 0, # Serine
+     "T": 0, # Threonine
+     "W": 0, # Tryptophan
+     "Y": 0, # Tyrosine
+     "V": 0 # Valine
+ }
+
+ # Function to one-hot encode a protein sequence
+ def hotcode_seq(seq):
+     hotcode_matrix = np.zeros((21, 1496))
+     for i in range(len(seq)):
+         index = list(amino_acid_data.keys()).index(seq[i]) # Find the index of the amino acid
+         hotcode_matrix[index, i] = 1 # Set the corresponding position in the matrix to 1
+     hotcode_matrix[20, len(seq):] = 1 # Mark the padding positions past the end of the sequence in the last row
+     return hotcode_matrix
+
+ # Function to convert a list of sequences to their one-hot encoded matrices
+ def make_hotcodes(data):
+     hotcodes = []
+     for i in range(len(data)):
+         hotcodes.append(hotcode_seq(data[i]))
+     return np.asarray(hotcodes)
+
+ # -------------- Create a model framework in which to load weights -------------------
+
+ # The Tesei-trained model uses a maximum sequence length of 1496, the length
+ # of the longest sequence in the Tesei 2023 dataset.
+ model_input_shape = (21, 1496, 1)
+
+ image_input = keras.Input(shape=model_input_shape)
+
+ # Convolutional layer with 29 filters, kernel size (21, 6), and ReLU activation
+ conv1 = layers.Conv2D(29, kernel_size=(21, 6), activation='relu')(image_input)
+
+ # Flatten the output from the convolutional layer
+ flatten = layers.Flatten()(conv1)
+
+ # Dense layer with 100 units and softsign activation
+ dense1 = layers.Dense(100, activation='softsign')(flatten)
+ # Dense layer with 30 units and softsign activation
+ dense2 = layers.Dense(30, activation='softsign')(dense1)
+ # Output layer with 1 unit and linear activation
+ output = layers.Dense(1, activation='linear')(dense2)
+
+ model = keras.Model(inputs=image_input, outputs=output, name="model")
+ # --------------------------------------------------------------------------------------
+
+ # Load pre-trained weights into model
+ model.load_weights(path_to_weights, skip_mismatch=False)
+
+ # Compile the model with Adam optimizer and mean squared error loss
+ model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
+
+ # Load data file
+ data = pd.read_csv(path_to_data)
+
+ # Extract the sequences from the data
+ seqs = data.iloc[:, seq_column]
+
+ # Convert the sequences to their one-hot encoded matrices
+ hots = make_hotcodes(seqs)
+
+ # Predict the output for the one-hot encoded sequences using the model
+ preds = model.predict(hots)
+
+ # Extract the predictions from the model output
+ w2_preds = preds[:, 0]
+
+ # Add the predictions to the data
+ data["w2_preds_tesei_model"] = w2_preds
+
+ # Save the data with predictions to a new CSV file
+ data.to_csv("exper_seqs_w2preds.csv", index=False)
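The encoding used by `hotcode_seq` can be illustrated on a small example. This is a sketch under stated assumptions, not the repository's code: the residue order is taken from the key order of `amino_acid_data` above, while the `max_len` parameter and the explicit length check are additions for illustration (the script hard-codes 1496 columns and does no check).

```python
import numpy as np

# Residue order follows the keys of amino_acid_data in extract_w2.py
ALPHABET = "ARNDCEQGHILKMFPSTWYV"
PAD_ROW = len(ALPHABET)  # row 20 flags padding positions

def one_hot(seq, max_len=1496):
    """Return a (21, max_len) matrix: rows 0-19 mark residues, row 20 marks padding."""
    if len(seq) > max_len:
        raise ValueError(f"sequence length {len(seq)} exceeds max_len={max_len}")
    matrix = np.zeros((PAD_ROW + 1, max_len))
    for position, residue in enumerate(seq):
        matrix[ALPHABET.index(residue), position] = 1
    matrix[PAD_ROW, len(seq):] = 1  # everything past the sequence end is padding
    return matrix

example = one_hot("AR", max_len=5)
```

Every column of the result contains exactly one nonzero entry: a residue row for positions inside the sequence, and the padding row for positions beyond it.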
Tesei-trained_Model/theory_functions.py ADDED
@@ -0,0 +1,303 @@
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from sympy import symbols, Eq, solve
+ from scipy.optimize import minimize, Bounds
+ from scipy.optimize import minimize_scalar
+ from scipy import special
+ import sympy as sp
+ import math
+
+
+ # ----TEMPERATURE----
+ # Reference temperature (K) at which the Bjerrum length below is quoted (20 C)
+ general_T = 293
+ # LL
+ #T = 310
+ # Mittal
+ #T = 300
+ T = 298
+
+
+ # ----KUHN LENGTH----
+ # Kuhn length (Angstroms) (sometimes written without a subscript)
+ l_k = 8
+ # bond length (Angstroms)
+ b = 3.8
+ # Bjerrum length (7.12 Angstroms at 20 C), rescaled to temperature T
+ l_b_20 = 7.12
+ l_b = l_b_20 * (general_T / T)
+
+ #salt = 150
+
+ # ----SALT----
+ # Convert a concentration in mM to a number density in A^-3:
+ # mM -> mol/L -> particles/L -> particles/cm^3 -> particles/A^3
+ def mM_to_A(mM):
+     return mM * 10**(-3) * 6.022*10**(23) * (1/1000) * 1 / (10**8)**3
+
+ # Inverse screening length kappa multiplied by the bond length b (dimensionless)
+ def kl_func(salt):
+     return np.sqrt(8 * math.pi * l_b * mM_to_A(salt)) * b
+
+ #----OMEGA 3----
+ w3 = .2
+
+
+ #----Constants----
+ # Residue charges; "B" and "Z" are the codes process_seq assigns to a
+ # basic N-terminal residue and an acidic C-terminal residue, respectively.
+ amino_acid_data = {
+     "A": 0,
+     "R": 1,
+     "N": 0,
+     "D": -1,
+     "C": 0,
+     "E": -1,
+     "Q": 0,
+     "G": 0,
+     "H": .5,
+     "I": 0,
+     "L": 0,
+     "K": 1,
+     "M": 0,
+     "F": 0,
+     "P": 0,
+     "S": 0,
+     "T": 0,
+     "W": 0,
+     "Y": 0,
+     "V": 0,
+     "B": 2,
+     "Z": -2
+ }
+
+ pKa_values = {
+     "R": 12.3,
+     "D": 3.5,
+     "C": 6.8,
+     "E": 4.2,
+     "H": 6.6,
+     "K": 10.5,
+     "Y": 10.3,
+     "B": 7.7,
+     "Z": 3.3
+ }
+
+ # Sign of the charge each titratable residue carries when ionized
+ s_values = {
+     "R": 1,
+     "D": -1,
+     "C": -1,
+     "E": -1,
+     "H": 1,
+     "K": 1,
+     "Y": -1,
+     "B": 1,
+     "Z": -1
+ }
+
+ #----Functions----
+ def adjust_pH(pH):
+     for key in pKa_values:
+         if key in amino_acid_data:
+             s = s_values[key]
+             amino_acid_data[key] = convert_charge_pH(pH, key, s)
+
+ def convert_charge_pH(pH, key, s):
+     pKa = pKa_values[key]
+     return s / (1 + 10**(s*(pH - pKa)))
+
+ def get_x(Ree, N):
+     return Ree**2 / (N * b * l_k)
+
+ # * .1 converts to nanometers
+ def get_Ree(x, N):
+     return np.sqrt(x * (N * b * l_k)) * .1
+
+ # * .1 converts to nanometers
+ def get_Rg(x, N):
+     return np.sqrt(x * N * b * l_k / 6) * .1
+
+ def process_seq(seq, n, c, idp):
+     new_seq = seq
+
+     n_aa = new_seq[0]
+     c_aa = new_seq[-1]
+
+     if n or idp:
+         if n_aa == "R" or n_aa == "K": new_seq = "B" + new_seq[1:]
+         elif n_aa == "D" or n_aa == "E": new_seq = "A" + new_seq[1:]
+         else: new_seq = "R" + new_seq[1:]
+
+     if c or idp:
+         if c_aa == "R" or c_aa == "K": new_seq = new_seq[:-1] + "A"
+         elif c_aa == "D" or c_aa == "E": new_seq = new_seq[:-1] + "Z"
+         else: new_seq = new_seq[:-1] + "D"
+
+     return new_seq
+
+ def get_charge(m, n, seq):
+     aam = seq[m-1]
+     aan = seq[n-1]
+
+     qm = amino_acid_data.get(aam)
+     qn = amino_acid_data.get(aan)
+
+     return qm * qn
+
+ def set_vars(data, i):
+     name = data.iloc[i, 0]
+
+     raw_seq = data.iloc[i, 2]
+     N = int(data.iloc[i, 3])
+     nterm = data.iloc[i, 4]
+     cterm = data.iloc[i, 5]
+     nandc = data.iloc[i, 6]
+
+     x = data.iloc[i, 9]
+     w2 = data.iloc[i, 10]
+     if w2 != "None":
+         w2 = float(w2)
+
+     seq = process_seq(raw_seq, nterm, cterm, nandc)
+
+     return N, w2, seq, x, name
+
+ def calc_x_w_load(N, w2, seq, seed, O_term, B_term, salt, pH):
+     adjust_pH(pH)
+     print("salt:", salt, "kappa:", kl_func(salt))
+     mn_array = mnArray_Q_prime(N, seq)
+     bounds = Bounds(.01, 10, keep_feasible=False)
+     result = minimize(function_to_solve, seed, method="Nelder-Mead", args=(N, w2, seq, mn_array, O_term, B_term, salt), bounds=bounds)
+     x = result.x[0]
+     return x, get_Ree(x, N), get_Rg(x, N)
+
+ # range() excludes its end point, so N + 1 is used to include m = N in the sums
+ def Omega(N, w2):
+     result = 0.0
+     for m in range(2, N + 1):
+         for n in range(1, m):
+             result += w2 * ((m - n) ** (-0.5))
+     return 1/N * result
+
+ def mn_Omega(N):
+     result = 0.0
+     for m in range(2, N + 1):
+         for n in range(1, m):
+             result += (m - n) ** (-0.5)
+     return 1/N * result
+
+ def B(N):
+     result = 0.0
+     for p in range(3, N+1):
+         for m in range(2, p):
+             for n in range(1, m):
+                 result += (p - n)/(((p-m)*(m-n))**(3/2))
+     return 1/N * result
+
+ # Unscreened charge sum; uncomment the print to inspect the value
+ def Q(N, seq):
+     result = 0.0
+     for m in range(2, N+1):
+         for n in range(1, m):
+             result += get_charge(m, n, seq) * ((m - n) ** (0.5))
+     output = 1/N * result
+     #print(output)
+     return output
+
+ def Q_prime(N, seq, x, salt):
+     result = 0.0
+     for m in range(2, N+1):
+         for n in range(1, m):
+             result += get_charge(m, n, seq) * ((m-n)**2) * A_prime(m, n, x, salt)
+     output = 1/N * result
+     return output
+
+ # Same sum as Q_prime, using charge products precomputed by mnArray_Q_prime
+ def mn_Q_prime(N, seq, x, mn_array, salt):
+     result = 0.0
+     i = 0
+     for m in range(2, N+1):
+         for n in range(1, m):
+             result += mn_array[i] * A_prime(m, n, x, salt)
+             i += 1
+     output = 1/N * result
+     return output
+
+ def mnArray_Q_prime(N, seq):
+     result = []
+     for m in range(2, N+1):
+         for n in range(1, m):
+             result.append(get_charge(m, n, seq) * ((m-n)**2))
+     return result
+
+ def A_prime(m, n, x, salt):
+     term1 = 1/2 * (6*math.pi/x)**(1/2) * (1/(m-n)**(3/2))
+     term2 = kl_func(salt) * (math.pi/2) * (1/(m-n))
+     term3 = special.erfcx(np.sqrt(kl_func(salt)**2 * x * (m-n) / 6))
+     return term1 - term2 * term3
+
+ def free_energy(x, N, w2, seq, mn_array, O_term, B_term, salt):
+     # beta F(x) = 3/2 (x - ln(x))
+     # + (3/(2 pi))**(3/2) * Omega * 1/x**(3/2)
+     # + w_3 (3/(2 pi))**3 * B/2 * 1/(x**3)
+     # + (l_b / b) * (2/pi) * Q'(x)
+     # Without salt the last term reduces to (l_b / b) * Q * sqrt(6/pi) * 1/x**(1/2).
+
+     #Q_term = Q(N, seq)
+     Q_term = mn_Q_prime(N, seq, x, mn_array, salt)
+     # Define the equation
+     eq = ( (
+         3/2 * (x - np.log(x))
+         + (3/(2*math.pi))**(3/2) * O_term * (1/(x**(3/2)))
+         + (w3*(3/(2*math.pi))**(3))/2 * B_term * (1/(x**3))
+         + (l_b / b) * 2/math.pi * Q_term
+     ))
+
+     return eq
+
+ def function_to_solve(argument, N, w2, seq, mn_array, O_term, B_term, salt):
+     """Objective passed to the minimizer: the free energy as a function of x."""
+
+     x = argument
+
+     sol = free_energy(x, N, w2, seq, mn_array, O_term, B_term, salt)
+     return sol
+
+
+ def Q_prime_derv(N, seq, x, salt):
+     result = 0.0
+     for m in range(2, N+1):
+         for n in range(1, m):
+             result += get_charge(m, n, seq) * ((m-n)**2) * A_prime_derv(m, n, x, salt)
+     output = 1/N * result
+     return output
+
+ # Derivative of A_prime with respect to x
+ def A_prime_derv(m, n, x, salt):
+     kl = kl_func(salt)
+     term1 = (np.sqrt(math.pi)/4) * (6/x)**(3/2) * (1/(m-n)**(3/2))
+     term2 = kl**2 * (np.sqrt(math.pi)/2) * (6/x)**(1/2) * (1/(m-n)**(1/2))
+     term3 = kl**3 * (math.pi/2) * special.erfcx(np.sqrt(kl**2 * x * (m-n) / 6))
+     return (-1/6) * (term1 - term2 + term3)
+
+ def solve_for_w2_eq(w2, N, x, seq, mn_O_term, B_term, Q_term):
+     O_term = mn_O_term * w2
+
+     # Derivative of the free energy equation
+     eq = (
+         3/2 * (x - 1)/x
+         + (-9 * np.sqrt(3/2)/(4*(math.pi)**(3/2))) * O_term * (1/(x**(5/2)))
+         + (-w3 * 81) / (16 * (math.pi)**(3)) * B_term * (1/(x**4))
+         + (l_b / b) * 2/math.pi * Q_term
+     )
+
+     return eq
+
+ def solve_for_w2(N, seq, x, salt):
+
+     B_term = B(N)
+     Q_term = Q_prime_derv(N, seq, x, salt)
+     mn_O_term = mn_Omega(N)
+
+     # Define the unknown
+     w2 = sp.Symbol('w2')
+
+     # Derivative of the free energy at the given x, as an expression in w2
+     equation = solve_for_w2_eq(w2, N, x, seq, mn_O_term, B_term, Q_term)
+
+     # Set the derivative to zero and solve for w2
+     solutions = sp.solve(equation, w2)
+     return solutions[0]
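Two of the building blocks in `theory_functions.py` are easy to check by hand: the pH-dependent charge `s / (1 + 10**(s*(pH - pKa)))` and the salt screening parameter built from `sqrt(8 * pi * l_b * c)`. The sketch below reimplements both as standalone functions, reusing the constants from the file (`l_b_20 = 7.12`, `general_T = 293`, `T = 298`, `b = 3.8`). The function names here are hypothetical and chosen for readability.

```python
import math

L_B_20 = 7.12                 # Bjerrum length in Angstroms at 293 K, as in the file
T_REF, T = 293, 298           # reference and working temperatures in K
L_B = L_B_20 * (T_REF / T)    # Bjerrum length rescaled to T
BOND = 3.8                    # bond length b in Angstroms

def fractional_charge(pH, pKa, s):
    """Mean charge of a titratable group; s is +1 for bases and -1 for acids."""
    return s / (1 + 10 ** (s * (pH - pKa)))

def number_density(mM):
    """Concentration in mM converted to particles per cubic Angstrom."""
    return mM * 1e-3 * 6.022e23 / 1000 / 1e24

def kappa(mM):
    """Inverse screening length in 1/Angstrom for a given salt concentration."""
    return math.sqrt(8 * math.pi * L_B * number_density(mM))

his_at_pKa = fractional_charge(6.6, 6.6, +1)   # half protonated when pH equals pKa
lys_at_7 = fractional_charge(7.0, 10.5, +1)    # almost fully charged
glu_at_7 = fractional_charge(7.0, 4.2, -1)     # almost fully charged
debye_length = 1 / kappa(150)                  # in Angstroms, at 150 mM
kappa_b = kappa(150) * BOND                    # the dimensionless quantity kl_func returns
```

At 150 mM the screening length comes out at a little under 8 Angstroms, which is the commonly quoted order of magnitude for physiological salt.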
Tesei-trained_Model/weights/.DS_Store ADDED
Binary file (6.15 kB)
Tesei-trained_Model/weights/weights_0.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c5ff706e6f32054b429f29f4a2b0e64b8a4762592bac0e2e61ae3fa7438bfa8
+ size 52008368
Tesei-trained_Model/weights/weights_1.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:74b5ceaec3ff343b2b5aa9211ad46b6436d1a442c2f73a33f31ef5aeca20aef2
+ size 52008368
Tesei-trained_Model/weights/weights_2.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad2207c16437a5b180c9c0ea684194ca8546336dfe2a16cd40b2b2fee8c73489
+ size 52008368
Tesei-trained_Model/weights/weights_3.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28a1c6b46d3a1bfa11d6daec68faa0f9d3038a9e3ead4e6f322849c9ac24448a
+ size 52008368
Tesei-trained_Model/weights/weights_4.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66930549677374a057066479a377461d9d214e7f22874a3b4b44e5e4e89ae3f4
+ size 52008368
Tesei-trained_Model/weights/weights_5.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10c1140515d62471aea98ad54400ff87d0bcac29c4c2f2438c7f54067109836a
+ size 52008368
Tesei-trained_Model/weights/weights_6.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd84177337c8689375039429933b5ae7f145863d5361ccf41c9a322f0994a100
+ size 52008368
Tesei-trained_Model/weights/weights_7.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d56654caf132b8522820515dec8aac96176b7f42f2f782f92bf5ca9e3b65f52
+ size 52008368
Tesei-trained_Model/weights/weights_8.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c15d0ccf9bb77efdc0d0a13c5744d2816b0bf71b54494cf5ba9a06282cc84838
+ size 52008368
Tesei-trained_Model/weights/weights_9.best.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52269623b4998156e7a346cecd268ea85339e49f125f78455629a81872f2f1b6
+ size 52008368
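Each weights file in this commit is stored as a Git LFS pointer giving a SHA-256 digest and a byte size. A downloaded file could be checked against its pointer roughly as follows. This is a sketch only: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the check runs on in-memory bytes so that the example stays self-contained.

```python
import hashlib

def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])

# Pointer for weights_0.best.hdf5, copied from the diff above
weights_0_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4c5ff706e6f32054b429f29f4a2b0e64b8a4762592bac0e2e61ae3fa7438bfa8\n"
    "size 52008368\n"
)

# Self-contained demonstration on a small made-up payload
payload = b"example bytes"
demo_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
ok = matches_pointer(payload, demo_pointer)
```

For a real weights file, the bytes would be read from disk (for a file of about 52 MB, hashing in chunks would be kinder to memory than reading it whole).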