---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
- biology
- genomics
datasets:
- Genentech/decima-data
---

# Decima

## Model Description

Decima is a multi-task regression model that predicts gene expression from genomic DNA sequence. It was developed by fine-tuning the **Borzoi** architecture and maps a genomic DNA sequence to quantitative expression levels across diverse cell types and conditions. For more details, please refer to the original paper: https://www.biorxiv.org/content/10.1101/2024.10.09.617507v3.

- **Architecture:** Fine-tuned Borzoi
- **Task:** Multi-task regression
- **Input:** Genomic sequences (hg38)
- **Output:** Predicted expression values, log(CPM + 1), for 8,856 pseudobulks

## Repository Content

This repository contains four model replicates (`rep0` through `rep3`). Each replicate is provided in two formats:

1. **`.ckpt`**: PyTorch Lightning checkpoints containing model weights, optimizer states, and hyperparameters.
2. **`.safetensors`**: A lightweight, secure format containing weights only.

**Files:**

* `rep0.ckpt`, `rep1.ckpt`, `rep2.ckpt`, `rep3.ckpt`
* `rep0.safetensors`, `rep1.safetensors`, `rep2.safetensors`, `rep3.safetensors`

## How to Use

You can load any of the model replicates for inference or further fine-tuning using the `decima` package (https://github.com/Genentech/decima).

### Loading via PyTorch Lightning Checkpoint

```python
from decima.model.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download a specific replicate (e.g., rep0)
ckpt_path = hf_hub_download(
    repo_id="Genentech/decima-model",
    filename="rep0.ckpt",
)

# Load the model and switch it to inference mode
model = LightningModel.load_from_checkpoint(ckpt_path)
model.eval()

# For a safetensor file, use LightningModel.load_safetensor(path)
```
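
Since the repository ships four replicates, it can be convenient to load them all in one pass. The following is a minimal sketch, not part of the `decima` package: `replicate_filename` and `load_all_replicates` are hypothetical helper names introduced here for illustration, and the sketch assumes the file names and the `LightningModel.load_from_checkpoint` call shown above.

```python
from typing import List

REPO_ID = "Genentech/decima-model"
NUM_REPLICATES = 4  # rep0 through rep3, as listed under "Files"
FORMATS = ("ckpt", "safetensors")


def replicate_filename(rep: int, fmt: str = "ckpt") -> str:
    """Build the repository file name for one replicate, e.g. 'rep0.ckpt'.

    Hypothetical helper for illustration; not part of the decima API.
    """
    if rep not in range(NUM_REPLICATES):
        raise ValueError(f"rep must be in 0..{NUM_REPLICATES - 1}, got {rep}")
    if fmt not in FORMATS:
        raise ValueError(f"fmt must be one of {FORMATS}, got {fmt!r}")
    return f"rep{rep}.{fmt}"


def load_all_replicates() -> List[object]:
    """Download every .ckpt replicate and load it in eval mode.

    Requires network access plus the decima and huggingface_hub packages,
    which are imported lazily so replicate_filename works without them.
    """
    from decima.model.lightning import LightningModel
    from huggingface_hub import hf_hub_download

    models = []
    for rep in range(NUM_REPLICATES):
        ckpt_path = hf_hub_download(
            repo_id=REPO_ID,
            filename=replicate_filename(rep, "ckpt"),
        )
        model = LightningModel.load_from_checkpoint(ckpt_path)
        model.eval()
        models.append(model)
    return models
```

Calling `load_all_replicates()` returns a list of four models in replicate order. Each checkpoint also contains optimizer state, so loading all four at once uses correspondingly more memory than loading a single replicate.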