metadata
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
- biology
- genomics
datasets:
- Genentech/GM12878_dnase-data
GM12878_dnase-model
Model Description
This single-task regression model takes 2,114 bp genomic intervals as input and predicts the total GM12878 DNase-seq coverage in the central 1,000 bp of each interval. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).
- Architecture: DilatedConvModel (gReLU)
- Input: 2,114 bp genomic sequences (hg38)
- Output: Total DNase-seq coverage in the central 1000 bp
- Parameters: 6.3M
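The relationship between the input interval and the predicted window can be sketched as below. This is a minimal illustration, assuming half-open coordinates and an input window centered on a position of interest; the function name `windows_for_center` and the example coordinate are hypothetical and not part of gReLU.

```python
INPUT_LEN = 2114   # input interval length in bp (from the model description)
OUTPUT_LEN = 1000  # central window over which coverage is predicted


def windows_for_center(center):
    """Return ((input_start, input_end), (output_start, output_end)).

    Coordinates are half-open [start, end). Hypothetical helper for
    illustration only.
    """
    input_start = center - INPUT_LEN // 2
    input_end = input_start + INPUT_LEN
    # Flank on each side that is seen by the model but not scored.
    flank = (INPUT_LEN - OUTPUT_LEN) // 2
    output_start = input_start + flank
    output_end = output_start + OUTPUT_LEN
    return (input_start, input_end), (output_start, output_end)


input_window, output_window = windows_for_center(100_000)
print(input_window, output_window)
```

Each side of the input contributes 557 bp of sequence context that informs the prediction without being part of the scored window.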
Performance
| Split | MSE | Pearson |
|---|---|---|
| Validation | 0.4458 | 0.7524 |
| Test | 0.4113 | 0.8056 |
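For readers who want to compare their own predictions against these numbers, the two metrics can be computed as in the sketch below. It uses the standard definitions of mean squared error and the Pearson correlation coefficient; whether the reported values were computed on transformed coverage or with a particular averaging scheme is not stated here, so treat this as an assumption rather than the exact evaluation procedure.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy values for illustration only; these are not model outputs.
labels = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 1.5, 3.5, 3.5]
print(mse(labels, preds), pearson(labels, preds))
```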
Training Details
| Parameter | Value |
|---|---|
| Task | Regression |
| Loss | MSE |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 15 |
| Channels | 512 |
| n_conv | 9 |
| crop_len | 557 |
| grelu version | 1.0.4.post1.dev39 |
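The `crop_len` value is consistent with the input and output lengths given in the model description. The check below is a small sketch, assuming that `crop_len` positions are removed from each end of the model output and that the output has the same length as the input before cropping; the function name `cropped_length` is hypothetical.

```python
def cropped_length(input_len, crop_len):
    """Length remaining after removing crop_len positions from each end.

    Assumes symmetric cropping and no change in length before the crop.
    """
    return input_len - 2 * crop_len


remaining = cropped_length(input_len=2114, crop_len=557)
print(remaining)
```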
Repository Content
- model.ckpt: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
- 2_train_GM12878_DNase.ipynb: Jupyter notebook for training the model.
- 3_evaluate_model.ipynb: Jupyter notebook for evaluating the trained model.
- output.log: Training logs.
How to use
To load this model for inference or fine-tuning, use the grelu interface:
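from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-model",
    filename="model.ckpt",
)

# weights_only=False allows the pickled hyperparameters stored in the checkpoint to load;
# only use this setting with checkpoints from a trusted source.
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # switch to inference mode (disables dropout, fixes batch-norm statistics)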