---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
  - biology
  - genomics
datasets:
  - Genentech/GM12878_dnase-data
---

GM12878_dnase-model

Model Description

This is a single-task regression model that takes a 2,114 bp genomic interval as input and predicts the total GM12878 DNase-seq coverage in its central 1,000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).

  • Architecture: DilatedConvModel (gReLU)
  • Input: 2,114 bp genomic sequences (hg38)
  • Output: Total GM12878 DNase-seq coverage in the central 1,000 bp
  • Parameters: 6.3M
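To make the output definition concrete: a 2,114 bp interval with a centered 1,000 bp window leaves 557 bp of flanking sequence on each side. The sketch below is a hypothetical illustration (pure Python, made-up per-base coverage values) of summing coverage over that window. The function names are invented for this example, and the card does not describe the actual target preprocessing, such as any log transform or normalization.

```python
# Hypothetical illustration of "total coverage in the central 1,000 bp".
# Coverage values are toy data; the real target preprocessing is not
# described in this card.
INPUT_LEN = 2114   # input interval length (bp), from the model description
OUTPUT_LEN = 1000  # length of the central window whose coverage is summed (bp)


def central_window(input_len: int, output_len: int) -> tuple[int, int]:
    """Return 0-based [start, end) offsets of a window centered in the input."""
    if output_len > input_len or (input_len - output_len) % 2:
        raise ValueError("window must fit symmetrically inside the input")
    flank = (input_len - output_len) // 2
    return flank, flank + output_len


def total_central_coverage(coverage: list[float]) -> float:
    """Sum per-base coverage over the central OUTPUT_LEN bases of one interval."""
    if len(coverage) != INPUT_LEN:
        raise ValueError(f"expected {INPUT_LEN} values, got {len(coverage)}")
    start, end = central_window(INPUT_LEN, OUTPUT_LEN)
    return float(sum(coverage[start:end]))


if __name__ == "__main__":
    print(central_window(INPUT_LEN, OUTPUT_LEN))       # (557, 1557)
    print(total_central_coverage([1.0] * INPUT_LEN))   # 1000.0
```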

Performance

| Split | MSE | Pearson |
|---|---|---|
| Validation | 0.4458 | 0.7524 |
| Test | 0.4113 | 0.8056 |
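Both metrics in the table are standard. For reference, here is a minimal pure-Python sketch of their definitions. It is not the code used in 3_evaluate_model.ipynb, and the card does not say whether targets were transformed (for example, log-scaled) before scoring.

```python
import math


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error over paired predictions and targets."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson_r(pred: list[float], target: list[float]) -> float:
    """Pearson correlation coefficient between predictions and targets."""
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("need at least two paired values of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Sum of cross-deviations and root sums of squared deviations;
    # the 1/n factors cancel in the ratio, so they are omitted.
    cross = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    rss_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    rss_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    if rss_p == 0 or rss_t == 0:
        raise ValueError("Pearson r is undefined for constant inputs")
    return cross / (rss_p * rss_t)
```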

Training Details

| Parameter | Value |
|---|---|
| Task | Regression |
| Loss | MSE |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 15 |
| Channels | 512 |
| n_conv | 9 |
| crop_len | 557 |
| grelu version | 1.0.4.post1.dev39 |
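The crop_len value appears consistent with the input and output sizes in the model description. This assumes crop_len is the number of positions trimmed from each end of a length-preserving, base-resolution output, which this card does not define: 2,114 - 2 * 557 = 1,000. A minimal check of that arithmetic (the helper name is hypothetical):

```python
# Consistency check between input length, crop_len, and the scored window.
# ASSUMPTION: crop_len positions are trimmed from *each* end of a
# base-resolution output that has the same length as the input.
INPUT_LEN = 2114  # bp, from the model description
CROP_LEN = 557    # from the training-details table


def output_len_after_crop(input_len: int, crop_len: int) -> int:
    """Return the length left after trimming crop_len positions from both ends."""
    remaining = input_len - 2 * crop_len
    if remaining <= 0:
        raise ValueError("crop_len is too large for this input length")
    return remaining


print(output_len_after_crop(INPUT_LEN, CROP_LEN))  # 1000
```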

Repository Content

  1. model.ckpt: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
  2. 2_train_GM12878_DNase.ipynb: Jupyter notebook for training the model.
  3. 3_evaluate_model.ipynb: Jupyter notebook for evaluating the trained model.
  4. output.log: Training logs.

How to use

To load this model for inference or fine-tuning, use the grelu interface:

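```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Fetch model.ckpt from the Hugging Face Hub (cached locally after the first call)
ckpt_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-model",
    filename="model.ckpt",
)

# weights_only=False is needed because recent PyTorch versions default
# torch.load to weights_only=True, which can reject the non-tensor objects
# (e.g. hyperparameters) stored in a full Lightning checkpoint
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # put the model in evaluation mode for inference
```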
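Once loaded, the model needs one-hot encoded DNA as input. The sketch below is a minimal, dependency-free encoder. The A/C/G/T channel order, the all-zero encoding for N and other unknown bases, and the name one_hot are assumptions for illustration, not taken from this card.

```python
# Hypothetical input-preparation sketch.
# ASSUMPTIONS: channel order A, C, G, T; unknown bases (e.g. N) encoded as all zeros.
BASES = "ACGT"
SEQ_LEN = 2114  # input length expected by the model (bp)


def one_hot(seq: str) -> list[list[float]]:
    """Encode a DNA string as a 4 x SEQ_LEN nested list (rows follow BASES)."""
    seq = seq.upper()
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected a {SEQ_LEN} bp sequence, got {len(seq)} bp")
    matrix = [[0.0] * SEQ_LEN for _ in BASES]
    for pos, base in enumerate(seq):
        idx = BASES.find(base)
        if idx >= 0:  # bases outside ACGT leave the whole column at zero
            matrix[idx][pos] = 1.0
    return matrix
```

With PyTorch installed and `model` loaded as in the snippet above, `x = torch.tensor([one_hot(seq)])` has shape (1, 4, 2114), and `with torch.no_grad(): pred = model(x)` should return the prediction, provided the model's forward accepts a (batch, 4, length) float tensor, as is usual for gReLU models. In practice, gReLU's own sequence utilities under `grelu.sequence` are likely the better choice.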