chang-m-yun committed on
Commit bce9e2d · verified · 1 Parent(s): 22926d1

Add files using upload-large-folder tool

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -14,15 +14,15 @@ As part of the ENCODE 4 Project, we trained ChromBPNet models on 1,512 ENCODE DN
 
  For more information about the models, see:
  - Main ENCODE 4 Paper
- - [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347)
- - [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221)
 
  ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
  - Model: ChromBPNet
  - Assay: DNASE-seq
  - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
  - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
- - Biosample: right cardiac atrium (Homo sapiens right cardiac atrium tissue female adult (59 years))
  - Cell slim(s): None
  - Organ slim(s): heart
  - Developmental slim(s): mesoderm
@@ -30,7 +30,7 @@ For more information about the models, see:
  - Assembly: hg38
 
  ## Directory structure
- - `fold_0`: Model: Cross-validation fold: Fold 0
  - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
  - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
  - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
@@ -38,15 +38,15 @@ For more information about the models, see:
  - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
  - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
  - `logs.models.fold_0.encid`: folder containing log files for training models
- - `fold_1`: Model: Cross-validation fold: Fold 1
- - `fold_2`: Model: Cross-validation fold: Fold 2
- - `fold_3`: Model: Cross-validation fold: Fold 3
- - `fold_4`: Model: Cross-validation fold: Fold 4
 
  # Instructions
- ## (1) Pseudocode for loading models in .h5 format
 
- (1) Use the code in python after appropriately defining `model_in_h5_format` and `inputs`.
  (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
  number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].
 
@@ -77,10 +77,10 @@ def softmax(x, temp=1):
  predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
  ```
 
80
- ## (2) Pseudocode for loading models in .tar format
81
 
82
- (1) First untar the directory as follows `tar -xvf model.tar`
83
- (2) Use the code below in python after appropriately defining `model_dir_untared` and `inputs`
84
  (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
85
  of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.
86
 
@@ -109,7 +109,7 @@ predictions = softmax(outputs["logits_profile_predictions"]) * (np.exp(outputs["
  ```
 
  ## Docker image to load and use the models
- https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)
 
  ## Code for ChromBPNet
  - https://github.com/kundajelab/chrombpnet/
 
 
  For more information about the models, see:
  - Main ENCODE 4 Paper
+ - [A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays](https://doi.org/10.5281/zenodo.17123347) (Deshpande et al., Zenodo 2025)
+ - [ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants](https://doi.org/10.1101/2024.12.25.630221) (Pampari et al., bioRxiv 2024)
 
  ## ChromBPNet model: DNASE in right cardiac atrium (ENCSR622HTS)
  - Model: ChromBPNet
  - Assay: DNASE-seq
  - Experiment: [ENCSR622HTS](https://www.encodeproject.org/experiments/ENCSR622HTS/)
  - Model annotation: [ENCSR193CJS](https://www.encodeproject.org/annotations/ENCSR193CJS/)
+ - Biosample: right cardiac atrium (Full name: Homo sapiens right cardiac atrium tissue female adult (59 years))
  - Cell slim(s): None
  - Organ slim(s): heart
  - Developmental slim(s): mesoderm
 
  - Assembly: hg38
 
  ## Directory structure
+ - `fold_0`: Model of 5-fold cross-validation: Fold 0
  - `model.chrombpnet.fold_0.encid.h5`: full chrombpnet model that combines both bias and corrected model in .h5 format
  - `model.chrombpnet_nobias.fold_0.encid.h5`: bias-corrected accessibility model in .h5 format (Use for all biological discovery)
  - `model.bias_scaled.fold_0.encid.h5`: bias model in .h5 format
 
  - `model.chrombpnet_nobias.fold_0.encid.tar`: bias-corrected accessibility model in SavedModel format (Use for all biological discovery). After being untarred, it results in a directory named "chrombpnet_wo_bias".
  - `model.bias_scaled.fold_0.encid.tar`: bias model in SavedModel format. After being untarred, it results in a directory named "bias_model_scaled".
  - `logs.models.fold_0.encid`: folder containing log files for training models
+ - `fold_1`: Model of 5-fold cross-validation: Fold 1
+ - `fold_2`: Model of 5-fold cross-validation: Fold 2
+ - `fold_3`: Model of 5-fold cross-validation: Fold 3
+ - `fold_4`: Model of 5-fold cross-validation: Fold 4
 
  # Instructions
+ ## 1. Pseudocode for loading models in .h5 format
 
+ (1) Use the code in python after appropriately defining `model_in_h5_format` and `inputs`. \
  (2) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the
  number of tested sequences, 2114 is the input sequence length and 4 corresponds to [A,C,G,T].
 
 
  predictions = softmax(outputs[0]) * (np.exp(outputs[1])-1)
  ```
 
80
+ ## 2. Pseudocode for loading models in .tar format
81
 
82
+ (1) First untar the directory as follows `tar -xvf model.tar`. \
83
+ (2) Use the code below in python after appropriately defining `model_dir_untared` and `inputs`. \
84
  (3) `inputs` is a one hot encoded sequence of shape (N,2114,4). Here N corresponds to the number
85
  of tested sequences, 2114 is the input sequence length and 4 corresponds to ACGT.
86
 
 
  ```
 
  ## Docker image to load and use the models
+ - https://hub.docker.com/r/kundajelab/chrombpnet-atlas/ (tag:v1)
 
  ## Code for ChromBPNet
  - https://github.com/kundajelab/chrombpnet/
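
The README instructions in this diff describe `inputs` as a one-hot encoded array of shape (N,2114,4) in [A,C,G,T] order, and combine the two model outputs as `softmax(outputs[0]) * (np.exp(outputs[1])-1)`. The sketch below illustrates both steps with NumPy only. The helper names `one_hot_encode` and `combine_outputs`, the all-zero encoding of non-ACGT bases, and the max-shifted `softmax` body are assumptions made for illustration (the README's own `softmax` body is not visible in this diff), and no model is loaded here.

```python
import numpy as np

# Channel order [A, C, G, T], as stated in the README.
BASES = "ACGT"
INPUT_LEN = 2114  # input sequence length given in the README


def one_hot_encode(seqs, length=INPUT_LEN):
    """Encode equal-length DNA strings into an array of shape (N, length, 4).

    Assumption of this sketch: bases other than A/C/G/T (for example N)
    are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seqs), length, 4), dtype=np.float32)
    for n, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError(
                f"sequence {n} has length {len(seq)}, expected {length}"
            )
        for pos, base in enumerate(seq.upper()):
            channel = index.get(base)
            if channel is not None:
                encoded[n, pos, channel] = 1.0
    return encoded


def softmax(x, temp=1):
    """Numerically stable softmax along axis 1 (hypothetical body)."""
    shifted = temp * (x - np.max(x, axis=1, keepdims=True))
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)


def combine_outputs(profile_logits, log_counts):
    """Scale per-position probabilities by the predicted total counts.

    Mirrors the README line:
        softmax(outputs[0]) * (np.exp(outputs[1]) - 1)
    Assumed shapes: profile_logits (N, L) and log_counts (N, 1).
    """
    return softmax(profile_logits) * (np.exp(log_counts) - 1)
```

With a loaded .h5 model, `profile_logits` and `log_counts` would presumably be `outputs[0]` and `outputs[1]`. The assumed shapes (N, L) and (N, 1) let NumPy broadcasting scale each sequence's profile by its own count.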