---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
- biology
- genomics
datasets:
- Genentech/GM12878_dnase-data
---

# GM12878_dnase-model

## Model Description

This model is a single-task regression model trained to take in 2114 bp genomic intervals and predict the total GM12878 DNase-seq coverage in the central 1000 bp. It is described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z).

- **Architecture:** DilatedConvModel (gReLU)
- **Input:** 2,114 bp genomic sequences (hg38)
- **Output:** Total DNase-seq coverage in the central 1000 bp
- **Parameters:** 6.3M

## Performance

| Split | MSE | Pearson |
|-------|-----|---------|
| Validation | 0.4458 | 0.7524 |
| Test | 0.4113 | 0.8056 |

## Training Details

| Parameter | Value |
|-----------|-------|
| Task | Regression |
| Loss | MSE |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 15 |
| Channels | 512 |
| n_conv | 9 |
| crop_len | 557 |
| grelu version | 1.0.4.post1.dev39 |

## Repository Content

1. `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
2. `2_train_GM12878_DNase.ipynb`: Jupyter notebook for training the model.
3. `3_evaluate_model.ipynb`: Jupyter notebook for evaluating the trained model.
4. `output.log`: Training logs.

## How to use

To load this model for inference or fine-tuning, use the `grelu` interface:

```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-model",
    filename="model.ckpt"
)

model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()
```
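
When preparing inputs, it can help to check how the 2,114 bp input window relates to the central 1000 bp that the model scores. The sketch below is illustrative only. It assumes that `crop_len` (557) is trimmed from each side of the input window, which is consistent with 2114 - 2 × 557 = 1000. The helper `window_for_center` and the example coordinate are hypothetical and are not part of gReLU.

```python
INPUT_LEN = 2114  # input window length in bp (from the model description)
CROP_LEN = 557    # crop_len from the training details; assumed to apply to each side
OUTPUT_LEN = INPUT_LEN - 2 * CROP_LEN  # length of the scored central region


def window_for_center(center):
    """Return (input_start, input_end, output_start, output_end).

    Coordinates are 0-based and half-open. Hypothetical helper for
    illustration; not a gReLU function.
    """
    input_start = center - INPUT_LEN // 2
    input_end = input_start + INPUT_LEN
    # The scored region is the input window with CROP_LEN removed from each end.
    output_start = input_start + CROP_LEN
    output_end = input_end - CROP_LEN
    return input_start, input_end, output_start, output_end


coords = window_for_center(1_000_000)  # arbitrary example position
print(OUTPUT_LEN)  # 1000
print(coords)      # (998943, 1001057, 999500, 1000500)
```

Under this assumption, an interval passed to the model should be exactly 2,114 bp long, and the predicted coverage refers to the 1000 bp between `output_start` and `output_end`.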