---
license: apache-2.0
library_name: alphagenome-pytorch
tags:
- genomics
- biology
- dna
- deep-learning
- regulatory-genomics
- chromatin-accessibility
- gene-expression
pipeline_tag: other
---
# AlphaGenome PyTorch
A PyTorch port of AlphaGenome, the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1M bp.
This is an accessible, readable, and hackable implementation, intended for integration into existing PyTorch pipelines, fine-tuning on custom datasets, and building on top of.
## Model Details
- Parameters: 450M
- Input: One-hot encoded DNA sequence
- Organisms: Human, Mouse
- Weights: Converted from the official JAX checkpoint
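Since the model takes one-hot encoded DNA as input, the encoding step is worth making concrete. The package ships its own helper (`sequence_to_onehot_tensor`, shown in the Usage section), but the transformation itself is simple. A minimal illustrative sketch in NumPy, assuming ACGT channel order and all-zero rows for ambiguous bases such as `N` (the package helper's exact conventions may differ):

```python
import numpy as np

def one_hot_encode(sequence: str) -> np.ndarray:
    """One-hot encode a DNA string to shape (len, 4).

    Assumes ACGT channel order; unknown bases (e.g. 'N') map to an
    all-zero row. Illustrative sketch only -- not the package's own
    `sequence_to_onehot_tensor`, whose conventions may differ.
    """
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    onehot = np.zeros((len(sequence), 4), dtype=np.float32)
    for i, base in enumerate(sequence.upper()):
        idx = mapping.get(base)
        if idx is not None:
            onehot[i, idx] = 1.0
    return onehot

print(one_hot_encode("ACGTN"))
```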
## Download Weights
Available weight files:

- `model_all_folds.safetensors` - trained on all data (recommended)
- `model_fold_0.safetensors` through `model_fold_3.safetensors` - individual CV folds
```shell
# Using Hugging Face CLI
hf download gtca/alphagenome_pytorch model_all_folds.safetensors --local-dir .

# Or using Python
pip install huggingface_hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('gtca/alphagenome_pytorch', 'model_all_folds.safetensors', local_dir='.')"
```
## Usage
```python
from alphagenome_pytorch import AlphaGenome
from alphagenome_pytorch.utils.sequence import sequence_to_onehot_tensor
import pyfaidx

model = AlphaGenome.from_pretrained("model_all_folds.safetensors")

# Extract a 131,072 bp window from the reference genome
with pyfaidx.Fasta("hg38.fa") as genome:
    sequence = str(genome["chr1"][1_000_000:1_131_072])

dna_onehot = sequence_to_onehot_tensor(sequence).unsqueeze(0)
preds = model.predict(dna_onehot, organism_index=0)  # 0=human, 1=mouse

# Access predictions by head name and resolution:
# - preds['atac'][1]: 1bp resolution, shape (batch, 131072, 256)
# - preds['atac'][128]: 128bp resolution, shape (batch, 1024, 256)
```
## Model Outputs
| Head | Tracks | Resolutions | Description |
|---|---|---|---|
| atac | 256 | 1bp, 128bp | Chromatin accessibility |
| dnase | 384 | 1bp, 128bp | DNase-seq |
| procap | 128 | 1bp, 128bp | Transcription initiation |
| cage | 640 | 1bp, 128bp | 5' cap RNA |
| rnaseq | 768 | 1bp, 128bp | RNA expression |
| chip_tf | 1664 | 128bp | TF binding |
| chip_histone | 1152 | 128bp | Histone modifications |
| contact_maps | 28 | 64x64 | 3D chromatin contacts |
| splice_sites | 5 | 1bp | Splice site classification (D+, A+, D−, A−, None) |
| splice_junctions | 734 | pairwise | Junction read counts |
| splice_site_usage | 734 | 1bp | Splice site usage fraction |
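The output shapes in the Usage example follow directly from this table: a 1D track head at resolution `r` bp over an input of `L` bp yields `L / r` positions, each with one value per track. A small illustrative helper (not part of the package API) that computes the expected shape:

```python
def expected_track_shape(seq_len: int, resolution_bp: int, num_tracks: int) -> tuple:
    """Expected (positions, tracks) shape for a 1D track head.

    Positions = input length divided by the head's resolution in bp.
    Illustrative helper only -- not part of the alphagenome-pytorch API.
    """
    assert seq_len % resolution_bp == 0, "input length must be divisible by resolution"
    return (seq_len // resolution_bp, num_tracks)

# The atac head (256 tracks) at both resolutions, for a 131,072 bp input:
print(expected_track_shape(131_072, 1, 256))    # (131072, 256)
print(expected_track_shape(131_072, 128, 256))  # (1024, 256)
```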
## Installation

```shell
pip install alphagenome-pytorch
```
## License

The weights were ported from the official checkpoint created by Google DeepMind and are the property of Google LLC. They are released under the Apache 2.0 license, consistent with the official release on Kaggle, and are subject to the model terms at https://deepmind.google.com/science/alphagenome/model-terms.
## Links
- GitHub Repository
- Reference JAX Implementation (by Google DeepMind)
- AlphaGenome Paper
- AlphaGenome Documentation
## Citation

```bibtex
@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and others},
  journal={Nature},
  volume={649},
  number={8099},
  pages={1206--1218},
  year={2026},
  publisher={Nature Publishing Group UK London}
}
```