---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
- biology
- genomics
datasets:
- Genentech/GM12878_dnase-data
---
# GM12878_dnase-model
## Model Description
This single-task regression model takes 2,114 bp genomic intervals as input and predicts the total GM12878 DNase-seq coverage in the central 1,000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).
- **Architecture:** DilatedConvModel (gReLU)
- **Input:** 2,114 bp genomic sequences (hg38)
- **Output:** Total DNase-seq coverage in the central 1,000 bp
- **Parameters:** 6.3M
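
The input and output geometry listed above can be sketched with a little coordinate arithmetic. The helper below is hypothetical (it is not part of gReLU) and assumes 0-based, half-open coordinates centered on a position of interest; how the training intervals were actually selected is not stated in this card.

```python
INPUT_LEN = 2114   # bp of sequence seen by the model
TARGET_LEN = 1000  # central bp over which coverage is totalled

def windows_around(center, input_len=INPUT_LEN, target_len=TARGET_LEN):
    """Return (input_interval, target_interval) as 0-based, half-open (start, end) pairs.

    Hypothetical helper for illustration only; both windows share the same center.
    """
    input_start = center - input_len // 2
    target_start = center - target_len // 2
    return (
        (input_start, input_start + input_len),
        (target_start, target_start + target_len),
    )

input_iv, target_iv = windows_around(1_000_000)
print(input_iv)   # (998943, 1001057)
print(target_iv)  # (999500, 1000500)
```

Under these assumptions, each side of the input contributes 557 bp of flanking context that is read by the model but not scored.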
## Performance
| Split | MSE | Pearson |
|-------|-----|---------|
| Validation | 0.4458 | 0.7524 |
| Test | 0.4113 | 0.8056 |
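
For readers who want to reproduce metrics of this kind on their own predictions, here is a minimal pure-Python sketch of MSE and Pearson correlation. The function names and the toy values are illustrative assumptions; the exact evaluation procedure behind the table (for example, any transformation applied to coverage beforehand) is documented in `3_evaluate_model.ipynb`, not here.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Made-up values purely for illustration: predictions offset by a constant 0.5
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
print(mse(y_true, y_pred))      # 0.25
print(pearson(y_true, y_pred))  # 1.0
```

The toy example shows why both metrics are worth reporting: a constant offset leaves the Pearson correlation at 1 while still producing a nonzero MSE.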
## Training Details
| Parameter | Value |
|-----------|-------|
| Task | Regression |
| Loss | MSE |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 15 |
| Channels | 512 |
| n_conv | 9 |
| crop_len | 557 |
| grelu version | 1.0.4.post1.dev39 |
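
The `crop_len` value is consistent with the input and output sizes given in the model description. The check below is a sketch that assumes `crop_len` counts positions trimmed from each end of the model output and that the model keeps one output position per base pair; `cropped_length` is a hypothetical helper, not a gReLU function.

```python
def cropped_length(input_len, crop_len):
    """Length remaining after trimming crop_len positions from both ends.

    Assumes one output position per input base pair (an assumption, see above).
    """
    if 2 * crop_len >= input_len:
        raise ValueError("crop_len removes the entire sequence")
    return input_len - 2 * crop_len

print(cropped_length(2114, 557))  # 1000
```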
## Repository Content
1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train_GM12878_DNase.ipynb`: Jupyter notebook for training the model.
3. `3_evaluate_model.ipynb`: Jupyter notebook for evaluating the trained model.
4. `output.log`: Training logs.
## How to use
To load this model for inference or fine-tuning, use the `grelu` interface:
```python
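from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
ckpt_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-model",
    filename="model.ckpt",
)

# weights_only=False permits unpickling non-tensor objects stored in the checkpoint;
# only use it with checkpoints you trust
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # switch to inference mode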
```