Upload 2 files

Files changed (2)

Tesei-trained_Model/README.txt +53 -0
Tesei-trained_Model/environment.yml +26 -0

Tesei-trained_Model/README.txt ADDED

	@@ -0,0 +1,53 @@

+This readme file was generated on 2024-07-23 by Lilianna Houston
+GENERAL INFORMATION
+Title of Project: PML, Tesei-trained Model
+Principal Investigator Information
+Name: Kingshuk Ghosh
+Institution: University of Denver
+Email: kingshuk.ghosh@du.edu
+Author Information
+Name: Lilianna Houston
+Institution: University of Denver
+Email: lili.houston@du.edu
+DATA & FILE OVERVIEW
+File List:
+"weights" -> Folder containing weights from the Tesei-trained CNN that predicts omega_2 from sequence. We trained the model 10 separate times on all omega_2 calculations from the Tesei 2023 dataset and provide all 10 resulting weights.
+"Tesei_w2_Ree_preds" -> CSV containing calculated and ML-predicted omega_2 (w2) (predicted using 10-fold cross-validation), as well as reported and predicted R_ee for the Tesei 2023 dataset. Sequences where the omega_2 calculation failed are omitted.
+"exper_seqs_master" -> CSV of our compiled experimental sequences, including source, sequences, salt, pH, temperature, reported R_g and our predicted R_g (our value is averaged across the results of all 10 trained models). This is the input file for extract_w2.py. [Use a different CSV if you want to use a different sequence or set of sequences.]
+"extract_w2" -> .py file that extracts the omega_2s of a specified list of sequences using a specified set of weights. If you change the input file, make sure the script points to the correct one, and also change the output file name at the end of the code.
+"exper_seqs_w2preds" -> CSV with the same content as "exper_seqs_master," plus w2s predicted using weights_0 from the "weights" folder. This is the input file for extract_Rg.
+"extract_Rg" -> .py file that extracts the x, R_ee, and R_gs of a specified list of sequences using omega_2. Currently w2 is read from exper_seqs_w2preds; use a different file if you changed the output of extract_w2 above.
+"OBfmt_5-1500.npy" -> Helper file for "extract_Rg" containing precalculated terms.
+"theory_functions" -> .py helper file for "extract_Rg" containing constants and functions needed for R_g calculation.
+"environment.yml" -> yml file used to create conda environment in which to run "extract_w2".
+USAGE
+To run "extract_w2," you must first create a conda environment on Linux to ensure you have the necessary ML packages installed.
+Step 1) clone the Hugging Face repository: git clone https://huggingface.co/IDPLab/IDPconformation
+Step 2) create the conda environment: conda env create -f environment.yml
+Step 3) activate the environment: conda activate kingml
+Step 4) make sure to have the following modules loaded:
+             module load compilers/anaconda-3.8-2020.11
+             module load cuda11.8/toolkit/11.8.0
+             module load libraries/cuDNN/7.6.5
+Step 5) run "extract_w2": python extract_w2.py
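
The file list above notes that the predicted R_g in "exper_seqs_master" is averaged across the results of all 10 trained models. The snippet below is a minimal sketch of that kind of ensemble averaging, not the authors' code: the array `per_model_rg` and its placeholder values are hypothetical and stand in for the per-model predictions that would come from running the pipeline once per set of weights.

```python
import numpy as np

# Hypothetical per-model predictions: one row per trained model (10 rows),
# one column per sequence. The numbers are placeholders, not real results.
rng = np.random.default_rng(0)
n_models, n_sequences = 10, 3
per_model_rg = 2.0 + 0.1 * rng.standard_normal((n_models, n_sequences))

# Ensemble prediction: average over the model axis (axis 0).
mean_rg = per_model_rg.mean(axis=0)

# Sample standard deviation across models, a rough indicator of how much
# the prediction varies from one training run to another.
std_rg = per_model_rg.std(axis=0, ddof=1)

for i, (m, s) in enumerate(zip(mean_rg, std_rg)):
    print(f"sequence {i}: mean R_g = {m:.3f}, std across models = {s:.3f}")
```

Reporting the spread alongside the mean is an optional addition here; the README itself only states that the mean is used.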

Tesei-trained_Model/environment.yml ADDED

	@@ -0,0 +1,26 @@

+name: kingml
+channels:
+  - anaconda
+  - defaults
+dependencies:
+  - _libgcc_mutex=0.1=main
+  - _openmp_mutex=5.1=1_gnu
+  - c-ares=1.18.1=h7f8727e_0
+  - ca-certificates=2023.01.10=h06a4308_0
+  - cudatoolkit=11.8.0=h6a678d5_0
+  - hdf5=1.12.1=h70be1eb_2
+  - krb5=1.19.4=h568e23c_0
+  - libcurl=7.87.0=h91b91d3_0
+  - libedit=3.1.20221030=h5eee18b_0
+  - libev=4.33=h7f8727e_1
+  - libgcc-ng=11.2.0=h1234567_1
+  - libgfortran-ng=11.2.0=h00389a5_1
+  - libgfortran5=11.2.0=h1234567_1
+  - libgomp=11.2.0=h1234567_1
+  - libnghttp2=1.46.0=hce63b2e_0
+  - libssh2=1.10.0=h8f2d780_0
+  - libstdcxx-ng=11.2.0=h1234567_1
+  - ncurses=6.4=h6a678d5_0
+  - openssl=1.1.1s=h7f8727e_0
+  - zlib=1.2.13=h5eee18b_0
+prefix: /home/atorres/.conda/envs/mlnew