---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
  - biology
  - genomics
datasets:
  - Genentech/GM12878_dnase-data
---

GM12878_dnase-model

Model Description

This model is a single-task regression model that takes 2,114 bp genomic intervals as input and predicts the total GM12878 DNase-seq coverage over the central 1,000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).

Architecture: DilatedConvModel (gReLU)
Input: 2,114 bp genomic sequences (hg38)
Output: Total DNase-seq coverage in the central 1,000 bp
Parameters: 6.3M
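
The model consumes sequence, not raw coordinates, so each 2,114 bp interval has to be turned into a numeric tensor first. gReLU ships its own sequence-encoding utilities and will normally do this for you; the stdlib sketch below only illustrates the expected shape (4 channels x 2,114 positions). The channel order A, C, G, T and the all-zero encoding of N are assumptions for illustration, not details taken from this card.

```python
# Minimal one-hot encoder for DNA, for illustration only.
# Assumptions (not stated in this model card): channel order A, C, G, T,
# and ambiguous bases such as N encoded as all-zero columns.
from typing import List

INPUT_LEN = 2114  # input length stated in this card
BASES = "ACGT"    # assumed channel order


def one_hot(seq: str) -> List[List[float]]:
    """Return a 4 x len(seq) nested list (channels x positions)."""
    seq = seq.upper()
    if len(seq) != INPUT_LEN:
        raise ValueError(f"expected {INPUT_LEN} bp, got {len(seq)}")
    encoded = [[0.0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq):
        idx = BASES.find(base)
        if idx >= 0:  # non-ACGT positions (e.g. N) stay all-zero
            encoded[idx][pos] = 1.0
    return encoded


x = one_hot("ACGTN" * 422 + "ACGT")  # 422 * 5 + 4 = 2114 bp
print(len(x), len(x[0]))  # 4 2114
```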

Performance

| Split | MSE | Pearson r |
|---|---|---|
| Validation | 0.4458 | 0.7524 |
| Test | 0.4113 | 0.8056 |
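
For reference, the two metrics in the table can be computed as below. This card does not say whether coverage was transformed (for example log-scaled) before evaluation, so numbers from this sketch are only comparable to the table if you apply the same target transform as 3_evaluate_model.ipynb; the toy inputs are made up purely to exercise the functions.

```python
# Plain-Python definitions of mean squared error and Pearson correlation.
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (sd_t * sd_p)


# Toy values, not model outputs
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]
print(mse(y_true, y_pred))  # 0.1875
print(pearson(y_true, y_pred))
```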

Training Details

| Parameter | Value |
|---|---|
| Task | Regression |
| Loss | MSE |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 15 |
| Channels | 512 |
| n_conv | 9 |
| crop_len | 557 |
| grelu version | 1.0.4.post1.dev39 |
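
The crop_len value is consistent with the "central 1,000 bp" output described above: 2,114 - 2 x 557 = 1,000. The short check below makes that arithmetic explicit. It assumes crop_len positions are trimmed from each end of a per-position output that has the same length as the input; that reading fits the numbers but is an interpretation, not something this card states.

```python
# Check that trimming crop_len from each end of a 2,114-position output
# leaves the central 1,000 positions described in this card.
# Assumption: crop_len is removed from *both* ends and no other layer
# changes the sequence length.
from typing import Tuple

INPUT_LEN = 2114
CROP_LEN = 557


def output_window(input_len: int, crop_len: int) -> Tuple[int, int]:
    """Return (start, end) of the retained window, 0-based, end-exclusive."""
    start = crop_len
    end = input_len - crop_len
    if end <= start:
        raise ValueError("crop_len is too large for this input length")
    return start, end


start, end = output_window(INPUT_LEN, CROP_LEN)
print(start, end, end - start)  # 557 1557 1000
```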

Repository Content

model.ckpt: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2_train_GM12878_DNase.ipynb: Jupyter notebook for training the model.
3_evaluate_model.ipynb: Jupyter notebook for evaluating the trained model.
output.log: Training logs.

How to use

To load this model for inference or fine-tuning, use the grelu interface:

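```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the Lightning checkpoint from the Hugging Face Hub (cached locally)
ckpt_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-model",
    filename="model.ckpt",
)

# weights_only=False lets torch unpickle the stored hyperparameters along with
# the weights; only use it for checkpoints from a source you trust
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()  # inference mode: disables dropout and similar training behavior
```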
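
To score a site of interest, you first need a 2,114 bp hg38 interval centered on it. The helper below is a hypothetical sketch (the `Interval` type and `centered_interval` function are illustrative names, not part of gReLU) that builds BED-style coordinates (0-based, end-exclusive) for both the input window and the central 1,000 bp window the prediction refers to. Fetching the actual sequence and checking against chromosome lengths are left to your own tooling.

```python
# Hypothetical helper for building centered model-input intervals.
# Coordinates are 0-based and end-exclusive (BED convention).
from typing import NamedTuple

INPUT_LEN = 2114   # model input length from this card
OUTPUT_LEN = 1000  # central region whose coverage is predicted


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def centered_interval(chrom: str, center: int, length: int) -> Interval:
    """Return an interval of `length` bp centered on `center`."""
    start = center - length // 2
    if start < 0:
        raise ValueError("interval runs off the start of the chromosome")
    return Interval(chrom, start, start + length)


site = ("chr1", 1_000_000)
inp = centered_interval(*site, INPUT_LEN)
out = centered_interval(*site, OUTPUT_LEN)
print(inp)  # Interval(chrom='chr1', start=998943, end=1001057)
print(out)  # Interval(chrom='chr1', start=999500, end=1000500)
```

Note that the output window sits 557 bp inside each end of the input window, matching crop_len in the training table.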