kundajelab/encode-chrombpnet-DNASE-ENCSR622HTS-ENCSR193CJS

@@ -14,15 +14,15 @@ As part of the ENCODE 4 Project, we trained ChromBPNet models on 1,512 ENCODE DN
 For more information about the models, see:
 - Main ENCODE 4 Paper
-- [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347)
-- [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221)
 ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
 - Model: ChromBPNet
 - Assay: DNASE-seq
 - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
 - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
-- Biosample: right cardiac atrium (Homo sapiens right cardiac atrium tissue female adult (59 years))
 - Cell slim(s): None
 - Organ slim(s): heart
 - Developmental slim(s): mesoderm
@@ -30,7 +30,7 @@ For more information about the models, see:
 - Assembly: hg38
 ## Directory structure
-- `fold_0`: Model: Cross-validation fold: Fold 0
     - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
     - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
     - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
@@ -38,15 +38,15 @@ For more information about the models, see:
     - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
     - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
     - `logs.models.fold_0.encid`: folder containing log files for training models
-- `fold_1`: Model: Cross-validation fold: Fold 1
-- `fold_2`: Model: Cross-validation fold: Fold 2
-- `fold_3`: Model: Cross-validation fold: Fold 3
-- `fold_4`: Model: Cross-validation fold: Fold 4
 # Instructions
-## (1) Pseudocode for loading models in .h5 format
-(1) Use the code in python after appropriately defining `model_in_h5_format` and `inputs`.
 (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
 number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].
@@ -77,10 +77,10 @@ def softmax(x, temp=1):
 predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
 ```
-## (2) Pseudocode for loading models in .tar format
-(1) First untar the directory as follows `tar -xvf model.tar`
-(2) Use the code below in python after appropriately defining `model_dir_untared` and `inputs`
 (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
 of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.
@@ -109,7 +109,7 @@ predictions = softmax(outputs["logits_profile_predictions"]) * (np.exp(outputs["
 ```
 ## Docker image to load and use the models
-https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)
 ## Code for ChromBPNet
 - https://github.com/kundajelab/chrombpnet/

 For more information about the models, see:
 - Main ENCODE 4 Paper
+- [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347) (Deshpande et al., Zenodo 2025)
+- [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221) (Pampari et al., bioRxiv 2024)
 ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
 - Model: ChromBPNet
 - Assay: DNASE-seq
 - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
 - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
+- Biosample: right cardiac atrium (Full name: Homo sapiens right cardiac atrium tissue female adult (59 years))
 - Cell slim(s): None
 - Organ slim(s): heart
 - Developmental slim(s): mesoderm
 - Assembly: hg38
 ## Directory structure
+- `fold_0`: Model of 5-fold cross-validation: Fold 0
     - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
     - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
     - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
     - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
     - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
     - `logs.models.fold_0.encid`: folder containing log files for training models
+- `fold_1`: Model of 5-fold cross-validation: Fold 1
+- `fold_2`: Model of 5-fold cross-validation: Fold 2
+- `fold_3`: Model of 5-fold cross-validation: Fold 3
+- `fold_4`: Model of 5-fold cross-validation: Fold 4
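
The layout above can be turned into file paths with a small helper. This is only a sketch: it assumes that every fold directory mirrors the files listed for `fold_0`, and that `encid` in the file names stands in for a file-specific ENCODE identifier. The function name `fold_files` and its dictionary keys are hypothetical.

```python
from pathlib import PurePosixPath

def fold_files(fold, encid="encid"):
    """Return the expected model files for one cross-validation fold.

    Assumption: folds 1-4 follow the same naming pattern as fold_0.
    """
    root = PurePosixPath(f"fold_{fold}")
    stem = f"fold_{fold}.{encid}"
    return {
        "full_h5": root / f"model.chrombpnet.{stem}.h5",
        "nobias_h5": root / f"model.chrombpnet_nobias.{stem}.h5",
        "bias_h5": root / f"model.bias_scaled.{stem}.h5",
        "nobias_tar": root / f"model.chrombpnet_nobias.{stem}.tar",
        "bias_tar": root / f"model.bias_scaled.{stem}.tar",
        "logs": root / f"logs.models.{stem}",
    }

# The bias-corrected .h5 model is the one recommended for biological discovery.
nobias_paths = [str(fold_files(i)["nobias_h5"]) for i in range(5)]
print(nobias_paths[0])
```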
 # Instructions
+## 1. Pseudocode for loading models in .h5 format
+(1) Use the code below in Python after appropriately defining `model_in_h5_format` and `inputs`. \
 (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
 number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].
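
One way to build `inputs` with the shape described above is sketched here. The helper name `one_hot_encode` is hypothetical, and leaving ambiguous bases such as `N` as all-zero rows is an assumption rather than something the README specifies.

```python
import numpy as np

INPUT_LEN = 2114  # input sequence length stated in the README
BASES = "ACGT"    # channel order [A, C, G, T]

def one_hot_encode(seqs):
    """Encode DNA strings of length 2114 as a float32 array of shape (N, 2114, 4)."""
    lookup = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seqs), INPUT_LEN, len(BASES)), dtype=np.float32)
    for n, seq in enumerate(seqs):
        if len(seq) != INPUT_LEN:
            raise ValueError(f"sequence {n} has length {len(seq)}, expected {INPUT_LEN}")
        for pos, base in enumerate(seq.upper()):
            idx = lookup.get(base)
            if idx is not None:  # assumption: N and other ambiguous bases stay all-zero
                encoded[n, pos, idx] = 1.0
    return encoded

# Toy sequences for illustration; real inputs would be hg38 sequence windows.
example_seqs = ["ACGT" * 528 + "AC", "N" * INPUT_LEN]
inputs = one_hot_encode(example_seqs)
print(inputs.shape)  # (2, 2114, 4)
```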
 predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
 ```
+## 2. Pseudocode for loading models in .tar format
+(1) First untar the archive with `tar -xvf model.tar`. \
+(2) Use the code below in Python after appropriately defining `model_dir_untared` and `inputs`. \
 (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
 of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.
 ```
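
To make the final `predictions` step concrete, the sketch below applies the same formula, `softmax(outputs[0]) * (np.exp(outputs[1])-1)`, to stand-in arrays instead of real model outputs. It assumes the first output holds per-base profile logits and the second holds log(1 + total counts), which is what the `np.exp(...) - 1` term suggests. The profile length of 1000 and all values are illustrative, and this `softmax` is a generic numerically stable version that may differ in detail from the one in the README.

```python
import numpy as np

def softmax(x, temp=1):
    """Softmax over the last (profile) axis; subtracting the max avoids overflow."""
    z = temp * (x - np.max(x, axis=-1, keepdims=True))
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Stand-in outputs for N=2 sequences (made-up values, not model predictions).
rng = np.random.default_rng(0)
profile_logits = rng.normal(size=(2, 1000))         # assumed shape (N, profile length)
log_counts = np.log1p(np.array([[150.0], [30.0]]))  # assumed log(1 + total counts), shape (N, 1)
outputs = [profile_logits, log_counts]

# The softmax gives a probability profile per sequence; scaling by the
# recovered total counts yields predicted counts at each base.
predictions = softmax(outputs[0]) * (np.exp(outputs[1]) - 1)

print(predictions.shape)        # (2, 1000)
print(predictions.sum(axis=1))  # approximately 150 and 30, the stand-in totals
```

Because each softmax row sums to 1, the per-base predictions for a sequence add up to its predicted total count.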
 ## Docker image to load and use the models
+- https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)
 ## Code for ChromBPNet
 - https://github.com/kundajelab/chrombpnet/