avantikalal committed
Commit a492bc0 · 1 Parent(s): 43339f8

Upload folder using huggingface_hub (#1)

- Upload folder using huggingface_hub (9a999e2cc1a3541b78b3f6c7f084cfb3c820fb5f)

Files changed (5)
  1. 2_train_GM12878_DNase.ipynb +0 -0
  2. 3_evaluate_model.ipynb +0 -0
  3. README.md +42 -3
  4. model.ckpt +3 -0
  5. output.log +40 -0
2_train_GM12878_DNase.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
3_evaluate_model.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ ---
+ # 1. Metadata Block
+ license: mit
+ library_name: pytorch-lightning
+ pipeline_tag: tabular-regression
+ tags:
+ - biology
+ - genomics
+ datasets:
+ - Genentech/GM12878_dnase-data
+ ---
+
+ # GM12878_dnase-model
+
+ ## Model Description
+ This model is a single-task regression model trained to take in 2114 bp genomic intervals and predict the total GM12878 DNase-seq coverage in the central 1000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).
+
+ - **Architecture:** DilatedConvModel (gReLU)
+ - **Input:** Genomic sequences (hg38)
+ - **Output:** Total DNase-seq coverage in the central 1000 bp.
+
+ ## Repository Content
+ 1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
+ 2. `2_train_GM12878_DNase.ipynb`: Jupyter notebook for training the model.
+ 3. `3_evaluate_model.ipynb`: Jupyter notebook for evaluating the trained model.
+ 4. `output.log`: Training logs.
+
+ ## How to use
+ To load this model for inference or fine-tuning, use the `grelu` interface:
+
+ ```python
+ from grelu.lightning import LightningModel
+ from huggingface_hub import hf_hub_download
+
+ ckpt_path = hf_hub_download(
+     repo_id="Genentech/GM12878_dnase-model",
+     filename="model.ckpt"
+ )
+
+ model = LightningModel.load_from_checkpoint(ckpt_path)
+ model.eval()
+ ```
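The interval geometry stated in the Model Description (2114 bp input, coverage summed over the central 1000 bp) can be made concrete with a little arithmetic. The sketch below assumes the output window is exactly centered, so each flank is (2114 - 1000) / 2 = 557 bp. The function name `central_window` and the example start coordinate are hypothetical and are not part of gReLU.

```python
INPUT_LEN = 2114   # bp per input interval, from the model description
OUTPUT_LEN = 1000  # bp of the central region whose coverage is predicted


def central_window(start, input_len=INPUT_LEN, output_len=OUTPUT_LEN):
    """Return (start, end) of the centered output window, 0-based half-open.

    Assumes the output window is exactly centered in the input interval.
    """
    flank, remainder = divmod(input_len - output_len, 2)
    if remainder:
        raise ValueError("input and output lengths must differ by an even number")
    return start + flank, start + flank + output_len


# Hypothetical interval starting at coordinate 100000
window = central_window(100_000)
print(window)
```

Under this assumption, an interval starting at coordinate 0 is scored over positions 557 to 1557.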
model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f266b296f49b4e97e3ac52e94594da0aaae1d467778ea58b421e1ee7c4482bea
+ size 31906519
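The three lines above are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded `model.ckpt` could be checked against this pointer, using only the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the commented-out call is an assumption.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict with 'version', 'oid', 'size'."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    sha = hashlib.sha256()
    n_bytes = 0
    with open(path, "rb") as fh:
        # Read in chunks so a large checkpoint is never fully held in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
            n_bytes += len(chunk)
    return n_bytes == pointer["size"] and sha.hexdigest() == pointer["oid"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f266b296f49b4e97e3ac52e94594da0aaae1d467778ea58b421e1ee7c4482bea
size 31906519
"""
pointer = parse_lfs_pointer(POINTER)
# ok = matches_pointer("model.ckpt", pointer)  # assumed local path to the download
```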
output.log ADDED
@@ -0,0 +1,40 @@
+ wandb: 1 of 1 files downloaded.
+ Selecting training samples
+ Keeping 390473 intervals
+
+
+ Selecting validation samples
+ Keeping 21987 intervals
+
+
+ Selecting test samples
+ Keeping 22595 intervals
+ Final sizes: train: (390473, 3), val: (21987, 3), test: (22595, 3)
+ GPU available: True (cuda), used: True
+ TPU available: False, using: 0 TPU cores
+ HPU available: False, using: 0 HPUs
+ /opt/conda/lib/python3.11/site-packages/pytorch_lightning/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
+ LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+ Validation DataLoader 0: 100%|██████████| 43/43 [00:09<00:00, 4.60it/s]
+ /opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The variance of predictions or target is close to zero. This can cause instability in Pearson correlationcoefficient, leading to wrong results. Consider re-scaling the input if possible or computing using alarger dtype (currently using torch.float32).
+ LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
+
+   | Name         | Type             | Params | Mode
+ ----------------------------------------------------------
+ 0 | model        | DilatedConvModel | 6.3 M  | train
+ 1 | loss         | MSELoss          | 0      | train
+ 2 | activation   | Identity         | 0      | train
+ 3 | val_metrics  | MetricCollection | 0      | train
+ 4 | test_metrics | MetricCollection | 0      | train
+ 5 | transform    | Identity         | 0      | train
+ ----------------------------------------------------------
+ 6.3 M     Trainable params
+ 0         Non-trainable params
+ 6.3 M     Total params
+ 25.358    Total estimated model params size (MB)
+ 131       Modules in train mode
+ 0         Modules in eval mode
+ Epoch 14: 100%|██████████| 763/763 [07:51<00:00, 1.62it/s, v_num=arkd, train_loss_step=0.551, train_loss_epoch=0.456]
+ /opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The variance of predictions or target is close to zero. This can cause instability in Pearson correlationcoefficient, leading to wrong results. Consider re-scaling the input if possible or computing using alarger dtype (currently using torch.float32).
+
+ `Trainer.fit` stopped: `max_epochs=15` reached.
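The log does not record the batch size, but the progress bars give step counts (763 training steps per epoch, 43 validation steps) alongside the split sizes, so it can be inferred. A minimal sketch, assuming a single process iterates over each full split and the final partial batch is kept; neither assumption is confirmed by the log. `candidate_batch_sizes` is a hypothetical helper.

```python
def candidate_batch_sizes(n_samples, n_steps, max_batch=4096):
    """List batch sizes b for which ceil(n_samples / b) equals n_steps."""
    # -(-n // b) is integer ceiling division, avoiding float rounding
    return [b for b in range(1, max_batch + 1) if -(-n_samples // b) == n_steps]


# Split sizes and step counts as reported in output.log
train_candidates = candidate_batch_sizes(390473, 763)
val_candidates = candidate_batch_sizes(21987, 43)

# Batch sizes consistent with both the training and validation step counts
consistent = sorted(set(train_candidates) & set(val_candidates))
print(consistent)
```

Under those assumptions, 512 is the only batch size consistent with both step counts.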