Commit
·
a492bc0
1
Parent(s):
43339f8
Upload folder using huggingface_hub (#1)
- Upload folder using huggingface_hub (9a999e2cc1a3541b78b3f6c7f084cfb3c820fb5f)
- 2_train_GM12878_DNase.ipynb +0 -0
- 3_evaluate_model.ipynb +0 -0
- README.md +42 -3
- model.ckpt +3 -0
- output.log +40 -0
2_train_GM12878_DNase.ipynb
ADDED
The diff for this file is too large to render.
3_evaluate_model.ipynb
ADDED
The diff for this file is too large to render.
README.md
CHANGED
|
@@ -1,3 +1,42 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
# 1. Metadata Block
|
| 3 |
+
license: mit
|
| 4 |
+
library_name: pytorch-lightning
|
| 5 |
+
pipeline_tag: tabular-regression
|
| 6 |
+
tags:
|
| 7 |
+
- biology
|
| 8 |
+
- genomics
|
| 9 |
+
datasets:
|
| 10 |
+
- Genentech/GM12878_dnase-data
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# GM12878_dnase-model
|
| 14 |
+
|
| 15 |
+
## Model Description
|
| 16 |
+
This single-task regression model takes 2114 bp genomic intervals as input and predicts the total GM12878 DNase-seq coverage in the central 1000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).
|
| 17 |
+
|
| 18 |
+
- **Architecture:** DilatedConvModel (gReLU)
|
| 19 |
+
- **Input:** Genomic sequences (hg38)
|
| 20 |
+
- **Output:** Total DNase-seq coverage in the central 1000 bp.
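The input and output geometry described above can be sketched as follows. This is only an illustration, not the library's own preprocessing: it assumes the 1000 bp target region is exactly centered in the 2114 bp interval and that bases are one-hot encoded in A, C, G, T order. `central_window` and `one_hot_encode` are hypothetical helper names.

```python
import numpy as np

SEQ_LEN = 2114     # input interval length stated in the model card
TARGET_LEN = 1000  # central region over which coverage is totalled


def central_window(seq_len=SEQ_LEN, target_len=TARGET_LEN):
    """Return (start, end) offsets of the central region, end exclusive.

    Assumes the target region is exactly centered in the input interval.
    """
    flank = (seq_len - target_len) // 2
    return flank, flank + target_len


def one_hot_encode(seq):
    """One-hot encode a DNA string into a (4, len(seq)) float32 array.

    Rows are A, C, G, T (an assumed channel order). Any other character,
    such as N, becomes an all-zero column.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((4, len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        row = index.get(base)
        if row is not None:
            out[row, pos] = 1.0
    return out


start, end = central_window()
print(start, end)  # 557 1557
```

Under these assumptions each side of the input carries 557 bp of flanking context that informs the prediction but is not itself scored.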
|
| 21 |
+
|
| 22 |
+
## Repository Content
|
| 23 |
+
1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
|
| 24 |
+
2. `2_train_GM12878_DNase.ipynb`: Jupyter notebook for training the model.
|
| 25 |
+
3. `3_evaluate_model.ipynb`: Jupyter notebook for evaluating the trained model.
|
| 26 |
+
4. `output.log`: Training logs.
|
| 27 |
+
|
| 28 |
+
## How to use
|
| 29 |
+
To load this model for inference or fine-tuning, use the `grelu` interface:
|
| 30 |
+
|
| 31 |
+
```python
|
| 32 |
+
from grelu.lightning import LightningModel
|
| 33 |
+
from huggingface_hub import hf_hub_download
|
| 34 |
+
|
| 35 |
+
ckpt_path = hf_hub_download(
|
| 36 |
+
repo_id="Genentech/GM12878_dnase-model",
|
| 37 |
+
filename="model.ckpt"
|
| 38 |
+
)
|
| 39 |
+
|
| 40 |
+
model = LightningModel.load_from_checkpoint(ckpt_path)
|
| 41 |
+
model.eval()
|
| 42 |
+
```
|
model.ckpt
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f266b296f49b4e97e3ac52e94594da0aaae1d467778ea58b421e1ee7c4482bea
|
| 3 |
+
size 31906519
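The three added lines form a Git LFS pointer: the repository stores this small text stub, while the checkpoint itself (about 31.9 MB) lives in LFS storage. A minimal sketch of parsing the pointer and checking a downloaded file against it is shown here. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library, and the check assumes you already have the checkpoint at a local path.

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:f266b296f49b4e97e3ac52e94594da0aaae1d467778ea58b421e1ee7c4482bea
size 31906519
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 in `pointer`."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so a large checkpoint is never fully held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])  # sha256 31906519
```

Calling `matches_pointer(ckpt_path, pointer)` on a downloaded `model.ckpt` would then confirm that the local copy is the file this commit refers to.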
|
output.log
ADDED
|
@@ -0,0 +1,40 @@
| 1 |
+
wandb: 1 of 1 files downloaded.
|
| 2 |
+
Selecting training samples
|
| 3 |
+
Keeping 390473 intervals
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
Selecting validation samples
|
| 7 |
+
Keeping 21987 intervals
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
Selecting test samples
|
| 11 |
+
Keeping 22595 intervals
|
| 12 |
+
Final sizes: train: (390473, 3), val: (21987, 3), test: (22595, 3)
|
| 13 |
+
GPU available: True (cuda), used: True
|
| 14 |
+
TPU available: False, using: 0 TPU cores
|
| 15 |
+
HPU available: False, using: 0 HPUs
|
| 16 |
+
/opt/conda/lib/python3.11/site-packages/pytorch_lightning/loggers/wandb.py:397: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`.
|
| 17 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
|
| 18 |
+
Validation DataLoader 0: 100%|██████████| 43/43 [00:09<00:00, 4.60it/s]
|
| 19 |
+
/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The variance of predictions or target is close to zero. This can cause instability in Pearson correlationcoefficient, leading to wrong results. Consider re-scaling the input if possible or computing using alarger dtype (currently using torch.float32).
|
| 20 |
+
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
|
| 21 |
+
|
| 22 |
+
| Name | Type | Params | Mode
|
| 23 |
+
----------------------------------------------------------
|
| 24 |
+
0 | model | DilatedConvModel | 6.3 M | train
|
| 25 |
+
1 | loss | MSELoss | 0 | train
|
| 26 |
+
2 | activation | Identity | 0 | train
|
| 27 |
+
3 | val_metrics | MetricCollection | 0 | train
|
| 28 |
+
4 | test_metrics | MetricCollection | 0 | train
|
| 29 |
+
5 | transform | Identity | 0 | train
|
| 30 |
+
----------------------------------------------------------
|
| 31 |
+
6.3 M Trainable params
|
| 32 |
+
0 Non-trainable params
|
| 33 |
+
6.3 M Total params
|
| 34 |
+
25.358 Total estimated model params size (MB)
|
| 35 |
+
131 Modules in train mode
|
| 36 |
+
0 Modules in eval mode
|
| 37 |
+
Epoch 14: 100%|██████████| 763/763 [07:51<00:00, 1.62it/s, v_num=arkd, train_loss_step=0.551, train_loss_epoch=0.456]
|
| 38 |
+
/opt/conda/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The variance of predictions or target is close to zero. This can cause instability in Pearson correlationcoefficient, leading to wrong results. Consider re-scaling the input if possible or computing using alarger dtype (currently using torch.float32).
|
| 39 |
+
|
| 40 |
+
`Trainer.fit` stopped: `max_epochs=15` reached.
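The numbers in this log can be cross-checked with a little arithmetic. The log does not state the batch size, so the sketch below infers which batch sizes are consistent with the reported step counts (763 training steps and 43 validation batches). It assumes a single data-loading process and that the last partial batch is kept. It also estimates the parameter count from the reported model size, assuming 4 bytes per float32 parameter and 1 MB = 10^6 bytes. `consistent_batch_sizes` is a hypothetical helper name.

```python
# Values copied from the log above.
N_TRAIN, N_VAL = 390473, 21987
TRAIN_STEPS, VAL_STEPS = 763, 43
MODEL_SIZE_MB = 25.358


def consistent_batch_sizes(n_samples, n_steps, max_batch=4096):
    """Batch sizes b for which ceil(n_samples / b) equals n_steps.

    Assumes the final partial batch is kept (no drop_last).
    """
    return [
        b for b in range(1, max_batch + 1)
        if -(-n_samples // b) == n_steps  # integer ceiling division
    ]


train_candidates = set(consistent_batch_sizes(N_TRAIN, TRAIN_STEPS))
val_candidates = set(consistent_batch_sizes(N_VAL, VAL_STEPS))
shared = sorted(train_candidates & val_candidates)
print(shared)  # [512]

# Parameter count implied by the size estimate, assuming 4 bytes per parameter.
approx_params = round(MODEL_SIZE_MB * 1e6 / 4)
print(approx_params)  # 6339500
```

Under these assumptions, a batch size of 512 is the only value that matches both step counts, and the implied parameter count of roughly 6.34 million agrees with the 6.3 M shown in the model summary.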
|