---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
- biology
- genomics
datasets:
- Genentech/enformer-data
---
# Enformer Model (Avsec et al. 2021)
## Model Description
This repository contains the weights for the Enformer model, a long-range transformer architecture designed to predict functional genomic tracks from genomic DNA sequences.
- **Architecture:** Convolutions followed by Transformer layers.
- **Input:** 196,608 bp of genomic DNA sequence.
- **Output Resolution:** 128 bp bins.
- **Source:** [Avsec, Ž. et al. Nature Methods (2021)](https://www.nature.com/articles/s41592-021-01252-x)
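
The relationship between input length and output resolution can be checked with a few lines of arithmetic. This is a minimal sketch: the input length and bin size come from the list above, while the 896-bin output window is an assumption taken from the original publication and may not hold for every converted checkpoint.

```python
INPUT_LEN = 196_608  # bp, from the model description above
BIN_SIZE = 128       # bp per output bin, from the model description above

# Number of 128 bp bins needed to tile the whole input window
total_bins = INPUT_LEN // BIN_SIZE

# Assumption (original publication, not stated in this README):
# predictions are reported only for the central 896 bins.
OUTPUT_BINS = 896
covered_bp = OUTPUT_BINS * BIN_SIZE
flank_bp = (INPUT_LEN - covered_bp) // 2  # context-only sequence on each side

print(total_bins, covered_bp, flank_bp)
```

Under that assumption, the flanking sequence on each side contributes context to the prediction but receives no output bins of its own.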
## Repository Content
The repository includes full PyTorch Lightning checkpoints and weights-only state dictionaries for both the human and mouse versions of the model. The weights are derived from the original publication, but the model has been converted into the PyTorch Lightning format used by [gReLU](https://github.com/Genentech/gReLU).
| File | Type | Description |
| :--- | :--- | :--- |
| `human.ckpt` | PyTorch Lightning | Full checkpoint including base model and human head. |
| `mouse.ckpt` | PyTorch Lightning | Full checkpoint including base model and mouse head. |
| `human_state_dict.h5` | HDF5 | Weights-only state dictionary for the human model. |
| `mouse_state_dict.h5` | HDF5 | Weights-only state dictionary for the mouse model. |
| `save_wandb_enformer_human.ipynb` | Jupyter Notebook | Code used to create `human.ckpt`. |
| `save_wandb_enformer_mouse.ipynb` | Jupyter Notebook | Code used to create `mouse.ckpt`. |
## Model Heads & Output Tracks
Both `.ckpt` files share the same core transformer trunk and differ only in their species-specific output heads.
### Outputs
- **Human head:** 5,313 total tracks.
- **Mouse head:** 1,643 total tracks.
## Usage
The models are intended for use with the `grelu` library.
```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download
# Download the desired checkpoint
ckpt_path = hf_hub_download(
    repo_id="Genentech/enformer-model",
    filename="human.ckpt",
)
# Load the model
model = LightningModel.load_from_checkpoint(ckpt_path)
model.eval()
```
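
Once the model is loaded, a prediction call might look like the sketch below. The helper names `make_random_sequence` and `predict_tracks` are hypothetical and used only for illustration, and the use of `predict_on_seqs` with a list of DNA strings is an assumption about the gReLU API; consult the gReLU documentation for the exact signature and output shape.

```python
import random

SEQ_LEN = 196_608  # input length in bp, from the model description


def make_random_sequence(length=SEQ_LEN, seed=0):
    """Return a random DNA string of the requested length (placeholder input)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def predict_tracks(model, seq):
    """Run one sequence through an already-loaded model.

    Assumption: `model` is a gReLU LightningModel that exposes
    `predict_on_seqs`; this is not confirmed by this README.
    """
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} bp, got {len(seq)} bp")
    return model.predict_on_seqs([seq], device="cpu")


seq = make_random_sequence()
# preds = predict_tracks(model, seq)  # `model` is the object loaded above
```

A random sequence is only a placeholder for checking that the pipeline runs; meaningful predictions require real genomic sequence of the full input length.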