IDPLab committed on
Commit
c724b0d
·
verified ·
1 Parent(s): 11e0137

Upload 2 files

Tesei-trained_Model/README.txt ADDED
@@ -0,0 +1,53 @@
+ This readme file was generated on 2024-07-23 by Lilianna Houston
+
+ GENERAL INFORMATION
+
+ Title of Project: PML, Tesei-trained Model
+
+ Principal Investigator Information
+ Name: Kingshuk Ghosh
+ Institution: University of Denver
+ Email: kingshuk.ghosh@du.edu
+
+ Author Information
+ Name: Lilianna Houston
+ Institution: University of Denver
+ Email: lili.houston@du.edu
+
+ DATA & FILE OVERVIEW
+
+ File List:
+
+ "weights" -> Folder containing weights from the Tesei-trained CNN that predicts omega_2 from sequence. We trained the model 10 separate times on all omega_2 calculations from the Tesei 2023 dataset and provide all 10 resulting sets of weights.
+
+ "Tesei_w2_Ree_preds" -> CSV containing calculated and ML-predicted omega_2 (w2) values (predicted using 10-fold cross-validation), as well as reported and predicted R_ee, for the Tesei 2023 dataset. Sequences for which the omega_2 calculation failed are omitted.
+
+ "exper_seqs_master" -> CSV of our compiled experimental sequences, including source, sequences, salt, pH, temperature, reported R_g, and our predicted R_g (our value is averaged across the results of all 10 trained models). This is the input file for extract_w2.py. [Use a different CSV if you want to use a different sequence or set of sequences.]
+
+ "extract_w2" -> .py file that extracts the omega_2 values of a specified list of sequences using a specified set of weights. If you change the input file, make sure the code points to the correct one, and also change the output file name at the end of the code.
+
+ "exper_seqs_w2preds" -> CSV file. Same content as "exper_seqs_master," with the addition of w2 values predicted using weights_0 from the "weights" folder. This is the input file for extract_Rg.
+
+ "extract_Rg" -> .py file that extracts the x, R_ee, and R_g values of a specified list of sequences using omega_2. Currently w2 is read from exper_seqs_w2preds; use a different file if you changed the output file of extract_w2.
+
+ "OBfmt_5-1500.npy" -> Helper file for "extract_Rg" containing precalculated terms.
+
+ "theory_functions" -> .py helper file for "extract_Rg" containing constants and functions needed for the R_g calculation.
+
+ "environment.yml" -> yml file used to create the conda environment in which to run "extract_w2".
+
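The "exper_seqs_master" entry in the file list says the predicted R_g is averaged across the results of all 10 trained models. A minimal sketch of that kind of per-sequence averaging is shown here. It is an illustration only, not the repository's code: the function name `average_across_models` and the example numbers are hypothetical.

```python
from statistics import mean


def average_across_models(per_model_preds):
    """Average per-sequence predictions over several trained models.

    per_model_preds[m][i] is the prediction of model m for sequence i.
    Every model must predict the same number of sequences.
    Returns one averaged value per sequence.
    """
    n_seq = len(per_model_preds[0])
    if any(len(preds) != n_seq for preds in per_model_preds):
        raise ValueError("every model must predict the same number of sequences")
    # zip(*...) regroups the values so that each tuple holds all model
    # predictions for a single sequence.
    return [mean(values) for values in zip(*per_model_preds)]


# Hypothetical predictions from 3 models for 2 sequences (made-up numbers).
example = [[2.0, 3.0], [2.5, 3.5], [3.0, 4.0]]
averaged = average_across_models(example)
print(averaged)
```

With 10 sets of weights, the outer list would simply have 10 entries, one per trained model.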
+ USAGE
+
+ To run "extract_w2," you must first create a conda environment on Linux to ensure you have the necessary ML packages installed.
+ Step 1) clone the Hugging Face repository: git clone https://huggingface.co/IDPLab/IDPconformation
+ Step 2) create the conda environment: conda env create -f environment.yml
+ Step 3) activate the environment: conda activate kingml
+ Step 4) make sure to have the following modules loaded:
+ module load compilers/anaconda-3.8-2020.11
+ module load cuda11.8/toolkit/11.8.0
+ module load libraries/cuDNN/7.6.5
+ Step 5) run "extract_w2": python extract_w2.py
+
+
+
+
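Before running Step 5 of the USAGE section, it can help to confirm that the files the script depends on are present in the working folder. The sketch below is a hypothetical pre-flight check, not part of the repository. The names come from the README file list, but the ".csv" extension on "exper_seqs_master" is an assumption, as is the helper name `missing_files`.

```python
from pathlib import Path

# Names from the README file list. "extract_w2.py" and "environment.yml"
# appear with extensions in the README; the ".csv" suffix is an assumption.
EXPECTED = ["weights", "exper_seqs_master.csv", "extract_w2.py", "environment.yml"]


def missing_files(folder, expected=EXPECTED):
    """Return the expected names that do not exist inside `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).exists()]


# Check the current directory and report anything that is absent.
missing = missing_files(".")
if missing:
    print("Missing before running extract_w2:", ", ".join(missing))
else:
    print("All expected files found.")
```

If you switch to a different input CSV, pass your own list as the `expected` argument instead of editing `EXPECTED`.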
Tesei-trained_Model/environment.yml ADDED
@@ -0,0 +1,26 @@
+ name: kingml
+ channels:
+ - anaconda
+ - defaults
+ dependencies:
+ - _libgcc_mutex=0.1=main
+ - _openmp_mutex=5.1=1_gnu
+ - c-ares=1.18.1=h7f8727e_0
+ - ca-certificates=2023.01.10=h06a4308_0
+ - cudatoolkit=11.8.0=h6a678d5_0
+ - hdf5=1.12.1=h70be1eb_2
+ - krb5=1.19.4=h568e23c_0
+ - libcurl=7.87.0=h91b91d3_0
+ - libedit=3.1.20221030=h5eee18b_0
+ - libev=4.33=h7f8727e_1
+ - libgcc-ng=11.2.0=h1234567_1
+ - libgfortran-ng=11.2.0=h00389a5_1
+ - libgfortran5=11.2.0=h1234567_1
+ - libgomp=11.2.0=h1234567_1
+ - libnghttp2=1.46.0=hce63b2e_0
+ - libssh2=1.10.0=h8f2d780_0
+ - libstdcxx-ng=11.2.0=h1234567_1
+ - ncurses=6.4=h6a678d5_0
+ - openssl=1.1.1s=h7f8727e_0
+ - zlib=1.2.13=h5eee18b_0
+ prefix: /home/atorres/.conda/envs/mlnew
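Each dependency in this environment.yml is pinned in the form name=version=build. As a reading aid, the sketch below splits such an entry into its three parts. The helper `parse_conda_pin` is hypothetical and written only for illustration; it is not a conda API.

```python
def parse_conda_pin(spec):
    """Split a 'name=version=build' entry into (name, version, build).

    A leading YAML list dash is tolerated. Parts that are absent are
    returned as None, so 'zlib' gives ('zlib', None, None).
    """
    text = spec.strip()
    if text.startswith("-"):
        text = text[1:].strip()  # drop the YAML list marker
    parts = text.split("=")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"not a name=version=build entry: {spec!r}")
    name, version, build = (parts + [None, None])[:3]
    return name, version, build


print(parse_conda_pin("- cudatoolkit=11.8.0=h6a678d5_0"))
```

Build strings are generally specific to a platform, so an exact pin like these may not resolve on a different operating system.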