Add files using upload-large-folder tool
README.md
CHANGED
@@ -14,15 +14,15 @@ As part of the ENCODE 4 Project, we trained ChromBPNet models on 1,512 ENCODE DN

 For more information about the models, see:
 - Main ENCODE 4 Paper
-- [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347)
-- [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221)

 ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
 - Model: ChromBPNet
 - Assay: DNASE-seq
 - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
 - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
-- Biosample: right cardiac atrium (Homo sapiens right cardiac atrium tissue female adult (59 years))
 - Cell slim(s): None
 - Organ slim(s): heart
 - Developmental slim(s): mesoderm
@@ -30,7 +30,7 @@ For more information about the models, see:
 - Assembly: hg38

 ## Directory structure
-- `fold_0`: Model
 - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
 - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
 - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
@@ -38,15 +38,15 @@ For more information about the models, see:
 - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
 - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
 - `logs.models.fold_0.encid`: folder containing log files for training models
-- `fold_1`: Model
-- `fold_2`: Model
-- `fold_3`: Model
-- `fold_4`: Model

 # Instructions
-##

-(1) Use the code in python after appropriately defining `model_in_h5_format` and `inputs`.
 (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
 number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].

@@ -77,10 +77,10 @@ def softmax(x, temp=1):
 predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
 ```

-##

-(1) First untar the directory as follows `tar -xvf model.tar`
-(2) Use the code below in python after appropriately defining `model_dir_untared` and `inputs`
 (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
 of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.

@@ -109,7 +109,7 @@ predictions = softmax(outputs["logits_profile_predictions"]) * (np.exp(outputs["
 ```

 ## Docker image to load and use the models
-https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)

 ## Code for ChromBPNet
 - https://github.com/kundajelab/chrombpnet/

@@ -14,15 +14,15 @@ As part of the ENCODE 4 Project, we trained ChromBPNet models on 1,512 ENCODE DN

 For more information about the models, see:
 - Main ENCODE 4 Paper
+- [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347) (Deshpande et al., Zenodo 2025)
+- [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221) (Pampari et al., bioRxiv 2024)

 ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
 - Model: ChromBPNet
 - Assay: DNASE-seq
 - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
 - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
+- Biosample: right cardiac atrium (Full name: Homo sapiens right cardiac atrium tissue female adult (59 years))
 - Cell slim(s): None
 - Organ slim(s): heart
 - Developmental slim(s): mesoderm
@@ -30,7 +30,7 @@ For more information about the models, see:
 - Assembly: hg38

 ## Directory structure
+- `fold_0`: Model of 5-fold cross-validation: Fold 0
 - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
 - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
 - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
@@ -38,15 +38,15 @@ For more information about the models, see:
 - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
 - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
 - `logs.models.fold_0.encid`: folder containing log files for training models
+- `fold_1`: Model of 5-fold cross-validation: Fold 1
+- `fold_2`: Model of 5-fold cross-validation: Fold 2
+- `fold_3`: Model of 5-fold cross-validation: Fold 3
+- `fold_4`: Model of 5-fold cross-validation: Fold 4

 # Instructions
+## 1. Pseudocode for loading models in .h5 format

+(1) Use the code in python after appropriately defining `model_in_h5_format` and `inputs`. \
 (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
 number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].

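The README asks for `inputs` of shape (N,2114,4) in [A,C,G,T] channel order but this diff does not show how to build it. The following is a minimal sketch of one way to do so with NumPy. The helper name `one_hot_encode`, the placeholder sequences, and the choice to encode ambiguous bases such as `N` as all-zero rows are assumptions made for illustration, not part of the README.

```python
import numpy as np

# Channel order [A, C, G, T], as stated in the README.
BASES = "ACGT"
INPUT_LEN = 2114  # input sequence length stated in the README


def one_hot_encode(seqs):
    """Encode equal-length DNA strings as a float32 array of shape (N, L, 4).

    Bases other than A/C/G/T (e.g. N) are left as all-zero rows; that choice
    is an assumption of this sketch.
    """
    lookup = {base: i for i, base in enumerate(BASES)}
    length = len(seqs[0])
    encoded = np.zeros((len(seqs), length, 4), dtype=np.float32)
    for n, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for pos, base in enumerate(seq.upper()):
            idx = lookup.get(base)
            if idx is not None:
                encoded[n, pos, idx] = 1.0
    return encoded


# Two placeholder sequences of the required length (hypothetical data).
seqs = ["ACGT" * 528 + "AC", "N" * INPUT_LEN]
inputs = one_hot_encode(seqs)
print(inputs.shape)  # (2, 2114, 4)
```

In practice the sequences would be 2114 bp windows taken from the hg38 assembly named above. How those windows are chosen is outside the scope of this sketch.
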
@@ -77,10 +77,10 @@ def softmax(x, temp=1):
 predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
 ```

+## 2. Pseudocode for loading models in .tar format

+(1) First untar the directory as follows `tar -xvf model.tar`. \
+(2) Use the code below in python after appropriately defining `model_dir_untared` and `inputs`. \
 (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
 of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.

@@ -109,7 +109,7 @@ predictions = softmax(outputs["logits_profile_predictions"]) * (np.exp(outputs["
 ```

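Both loading snippets end with the same expression, `softmax(profile logits) * (np.exp(log counts) - 1)`, but the diff shows only the signature `def softmax(x, temp=1):`. The sketch below illustrates what that expression computes, using random stand-in arrays instead of a loaded model. The softmax body, the output shapes (N, 1000) and (N, 1), and the variable names are assumptions. The `- 1` suggests the count output is on a log(1 + counts) scale, which is an inference from the expression and is not stated in the README.

```python
import numpy as np


def softmax(x, temp=1):
    """Numerically stable softmax over the last axis (assumed to be position)."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Stand-in model outputs: random profile logits and log counts.
# The shapes (N, 1000) and (N, 1) are assumptions made for illustration.
rng = np.random.default_rng(0)
profile_logits = rng.normal(size=(3, 1000))
log_counts = rng.uniform(2.0, 6.0, size=(3, 1))

# Same form as the README expression: a per-position probability profile
# scaled by the predicted total count for each sequence.
predictions = softmax(profile_logits) * (np.exp(log_counts) - 1)

# Each row of `predictions` sums to that sequence's predicted total count.
row_totals = predictions.sum(axis=-1, keepdims=True)
print(np.allclose(row_totals, np.exp(log_counts) - 1))
```

Under these assumptions the softmax supplies the shape of the profile and the count term supplies its scale, so each row of `predictions` is a base-resolution signal whose sum equals the predicted total.
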
 ## Docker image to load and use the models
+- https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)

 ## Code for ChromBPNet
 - https://github.com/kundajelab/chrombpnet/