metadata
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-regression
tags:
  - biology
  - genomics
datasets:
  - Genentech/enformer-data

Enformer Model (Avsec et al. 2021)

Model Description

This repository contains the weights for the Enformer model, a long-range transformer architecture designed to predict functional genomic tracks from genomic DNA sequences.

Repository Content

The repository includes both full PyTorch Lightning checkpoints and raw state dictionaries for the human and mouse versions of the model. The weights are derived from the original publication (Avsec et al. 2021), but the model has been converted into the PyTorch Lightning format used by gReLU (https://github.com/Genentech/gReLU).

| File | Type | Description |
| --- | --- | --- |
| human.ckpt | PyTorch Lightning | Full checkpoint including base model and human head. |
| mouse.ckpt | PyTorch Lightning | Full checkpoint including base model and mouse head. |
| human_state_dict.h5 | HDF5 | Weights-only state dictionary for the human model. |
| mouse_state_dict.h5 | HDF5 | Weights-only state dictionary for the mouse model. |
| save_wandb_enformer_human.ipynb | Jupyter Notebook | Code used to create human.ckpt |
| save_wandb_enformer_mouse.ipynb | Jupyter Notebook | Code used to create mouse.ckpt |

Model Heads & Output Tracks

Both .ckpt files share the same core transformer trunk but differ in their species-specific output heads.

Outputs

- Human head: 5,313 total tracks
- Mouse head: 1,643 total tracks
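
As a quick sanity check, the track counts above can be written down once and compared against the size of a prediction's track axis. The following is a minimal sketch: `EXPECTED_TRACKS` and `check_track_count` are hypothetical names introduced here for illustration and are not part of gReLU, and which axis of the model output holds the tracks should be confirmed against the gReLU documentation.

```python
# Track counts per species-specific output head, as listed above.
EXPECTED_TRACKS = {"human": 5313, "mouse": 1643}


def check_track_count(species: str, n_tracks: int) -> None:
    """Raise ValueError if n_tracks does not match the head for `species`.

    Hypothetical helper for illustration; not part of the gReLU API.
    """
    if species not in EXPECTED_TRACKS:
        raise ValueError(
            f"Unknown species {species!r}; expected one of {sorted(EXPECTED_TRACKS)}"
        )
    expected = EXPECTED_TRACKS[species]
    if n_tracks != expected:
        raise ValueError(
            f"{species} head should produce {expected} tracks, got {n_tracks}"
        )
```

A mismatch here usually means the wrong checkpoint was loaded for the species being analysed, for example `mouse.ckpt` where `human.ckpt` was intended.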

Usage

The models are intended for use with the grelu library.

from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the desired checkpoint
ckpt_path = hf_hub_download(
    repo_id="Genentech/enformer-model", 
    filename="human.ckpt"
)

# Load the model
model = LightningModel.load_from_checkpoint(ckpt_path)
model.eval()
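
The same steps work for the mouse checkpoint by changing the filename. The sketch below wraps the download and load calls in one function that accepts a species name. `checkpoint_filename` and `load_enformer` are hypothetical helper names introduced here, not part of gReLU; the library calls are the same ones used in the snippet above.

```python
VALID_SPECIES = ("human", "mouse")


def checkpoint_filename(species: str) -> str:
    """Return the Lightning checkpoint filename for a species, e.g. 'human.ckpt'."""
    if species not in VALID_SPECIES:
        raise ValueError(
            f"Unknown species {species!r}; expected one of {VALID_SPECIES}"
        )
    return f"{species}.ckpt"


def load_enformer(species: str = "human"):
    """Download the checkpoint for `species` and return the model in eval mode.

    Hypothetical convenience wrapper; not part of the gReLU API.
    """
    # Validate the species name before starting any download.
    filename = checkpoint_filename(species)

    # Imported lazily so checkpoint_filename works without grelu installed.
    from grelu.lightning import LightningModel
    from huggingface_hub import hf_hub_download

    ckpt_path = hf_hub_download(
        repo_id="Genentech/enformer-model",
        filename=filename,
    )
    model = LightningModel.load_from_checkpoint(ckpt_path)
    model.eval()
    return model
```

Calling `load_enformer("mouse")` would then fetch `mouse.ckpt` and return the model with the mouse head. `hf_hub_download` caches files locally, so repeated calls should not download the checkpoint again.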