---
license: apache-2.0
library_name: alphagenome-pytorch
tags:
- genomics
- biology
- dna
- deep-learning
- regulatory-genomics
- chromatin-accessibility
- gene-expression
pipeline_tag: other
---

# AlphaGenome PyTorch

A PyTorch port of [AlphaGenome](https://www.nature.com/articles/s41586-025-10014-0), the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1M bp. This is an accessible, readable, and hackable implementation for integrating into existing PyTorch pipelines, fine-tuning on custom datasets, and building on top of.

## Model Details

- **Parameters**: 450M
- **Input**: One-hot encoded DNA sequence
- **Organisms**: Human, Mouse
- **Weights**: Converted from the official JAX checkpoint

## Download Weights

Available weight files:

- `model_all_folds.safetensors` - trained on all data (recommended)
- `model_fold_0.safetensors` through `model_fold_3.safetensors` - individual CV folds

```bash
# Using Hugging Face CLI
hf download gtca/alphagenome_pytorch model_all_folds.safetensors --local-dir .
# Or using Python
pip install huggingface_hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('gtca/alphagenome_pytorch', 'model_all_folds.safetensors', local_dir='.')"
```

## Usage

```python
from alphagenome_pytorch import AlphaGenome
from alphagenome_pytorch.utils.sequence import sequence_to_onehot_tensor
import pyfaidx

model = AlphaGenome.from_pretrained("model_all_folds.safetensors")

with pyfaidx.Fasta("hg38.fa") as genome:
    sequence = str(genome["chr1"][1_000_000:1_131_072])

dna_onehot = sequence_to_onehot_tensor(sequence).unsqueeze(0)
preds = model.predict(dna_onehot, organism_index=0)  # 0=human, 1=mouse

# Access predictions by head name and resolution:
# - preds['atac'][1]: 1bp resolution, shape (batch, 131072, 256)
# - preds['atac'][128]: 128bp resolution, shape (batch, 1024, 256)
```

## Model Outputs

| Head | Tracks | Resolutions | Description |
|------|--------|-------------|-------------|
| atac | 256 | 1bp, 128bp | Chromatin accessibility |
| dnase | 384 | 1bp, 128bp | DNase-seq |
| procap | 128 | 1bp, 128bp | Transcription initiation |
| cage | 640 | 1bp, 128bp | 5' cap RNA |
| rnaseq | 768 | 1bp, 128bp | RNA expression |
| chip_tf | 1664 | 128bp | TF binding |
| chip_histone | 1152 | 128bp | Histone modifications |
| contact_maps | 28 | 64x64 | 3D chromatin contacts |
| splice_sites | 5 | 1bp | Splice site classification (D+, A+, D−, A−, None) |
| splice_junctions | 734 | pairwise | Junction read counts |
| splice_site_usage | 734 | 1bp | Splice site usage fraction |

## Installation

```bash
pip install alphagenome-pytorch
```

## License

The weights were ported from the weights [provided by Google DeepMind](https://www.kaggle.com/models/google/alphagenome). Those weights were created by Google DeepMind and are the property of Google LLC. They are released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0), consistent with the [official release on Kaggle](https://www.kaggle.com/models/google/alphagenome).
They are subject to the model terms at https://deepmind.google.com/science/alphagenome/model-terms.

## Links

- [GitHub Repository](https://github.com/genomicsxai/alphagenome-pytorch)
- [Reference JAX Implementation](https://github.com/google-deepmind/alphagenome_research) (by Google DeepMind)
- [AlphaGenome Paper](https://www.nature.com/articles/s41586-025-10014-0)
- [AlphaGenome Documentation](https://www.alphagenomedocs.com/)

## Citation

```bibtex
@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and others},
  journal={Nature},
  volume={649},
  number={8099},
  pages={1206--1218},
  year={2026},
  publisher={Nature Publishing Group UK London}
}
```
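For reference, the one-hot input format expected by the model can be sketched in plain NumPy. This is an illustrative standalone equivalent of the library's `sequence_to_onehot_tensor` helper (which returns a torch tensor); the A/C/G/T column order and the all-zero encoding of ambiguous bases such as `N` are assumptions here, not guaranteed to match the library exactly.

```python
import numpy as np

# Assumed channel order A, C, G, T; bases outside this set map to all zeros.
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def onehot_encode(sequence: str) -> np.ndarray:
    """Map a DNA string to a (len(sequence), 4) one-hot float array."""
    out = np.zeros((len(sequence), 4), dtype=np.float32)
    for i, base in enumerate(sequence.upper()):
        idx = BASE_TO_INDEX.get(base)
        if idx is not None:
            out[i, idx] = 1.0
    return out

encoded = onehot_encode("ACGTN")
print(encoded.shape)        # (5, 4)
print(encoded[0].tolist())  # [1.0, 0.0, 0.0, 0.0]  ('A')
print(encoded[4].sum())     # 0.0  ('N' encodes as all zeros)
```

Adding a leading batch dimension (as `.unsqueeze(0)` does in the usage example) then yields the `(batch, length, 4)` input shape the model consumes.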