CpGPT Mammalian Dependencies
Pre-computed DNA sequence embeddings for 309 species, required to run CpGPT on the Horvath Mammalian Methylation Array and in cross-species analyses.
Contents
dna_embeddings/
  {species_name}/
    nucleotide-transformer-v2-500m-multi-species/
      2001bp_dna_embeddings.mmap  # Per-species DNA sequence embeddings
Each species has pre-computed 1024-dimensional embeddings from the Nucleotide Transformer v2 (500M) model for 2001bp windows centered on each CpG site covered by the mammalian methylation array.
Total size: ~37 GB across 309 species.
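As a rough illustration of how such a file might be read, the sketch below opens a flat binary file as a read-only NumPy memory map with 1024 columns. The element type (float32) and the row-major, one-row-per-CpG layout are assumptions, not something this README documents, and `open_embeddings` is a hypothetical helper rather than part of CpGPT. The demo runs on a small synthetic file so it does not require the real data.

```python
import os
import tempfile

import numpy as np

EMBEDDING_DIM = 1024  # embedding width stated in this README


def open_embeddings(path, dim=EMBEDDING_DIM, dtype=np.float32):
    """Open a flat binary file as a read-only (n_rows, dim) array.

    ASSUMPTION: float32 values stored row-major, one row per CpG site.
    Check the CpGPT code for the actual dtype and layout before relying on this.
    """
    row_bytes = dim * np.dtype(dtype).itemsize
    n_bytes = os.path.getsize(path)
    if n_bytes % row_bytes != 0:
        raise ValueError(
            f"{path}: size {n_bytes} is not a multiple of {row_bytes} bytes per row"
        )
    n_rows = n_bytes // row_bytes
    # mode="r" maps the file read-only; rows are paged in only when accessed.
    return np.memmap(path, dtype=dtype, mode="r", shape=(n_rows, dim))


# Demo on a synthetic 3-row file (not real embeddings).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "2001bp_dna_embeddings.mmap")
    fake = np.arange(3 * EMBEDDING_DIM, dtype=np.float32).reshape(3, EMBEDDING_DIM)
    fake.tofile(demo_path)

    emb = open_embeddings(demo_path)
    demo_shape = emb.shape
    row1_first_value = float(emb[1, 0])
    del emb  # release the mapping before the temporary directory is removed
```

Because the array is memory-mapped, indexing a handful of rows reads only those rows from disk, which matters when the full download is tens of gigabytes.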
Download
# Install huggingface_hub
pip install huggingface_hub
# Download all mammalian dependencies (~37 GB)
huggingface-cli download lucascamillomd/cpgpt-mammalian-dependencies --local-dir dependencies/mammalian
# Or download a specific species
huggingface-cli download lucascamillomd/cpgpt-mammalian-dependencies --include "dna_embeddings/homo_sapiens/*" --local-dir dependencies/mammalian
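The same per-species download can also be done from Python. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`, mirroring the `--include` pattern above. The helper names `species_patterns` and `download_species` are hypothetical, and any species directory name other than `homo_sapiens` is an assumption that should be checked against the repository's file listing.

```python
REPO_ID = "lucascamillomd/cpgpt-mammalian-dependencies"


def species_patterns(species):
    """Build one glob pattern per species directory under dna_embeddings/."""
    return [f"dna_embeddings/{name}/*" for name in species]


def download_species(species, local_dir="dependencies/mammalian"):
    """Download only the embeddings for the given species; returns the local path.

    Requires network access and `pip install huggingface_hub`.
    """
    # Imported here so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=species_patterns(species),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Fetches the human embeddings only, as in the CLI example.
    print(download_species(["homo_sapiens"]))
```

Passing several names to `download_species` fetches them in one call, which can be more convenient than repeating the CLI command per species.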
Related Repositories
- Model weights: lucascamillomd/cpgpt-models
- Human dependencies: lucascamillomd/cpgpt-human-dependencies
- Code & tutorials: CpGPT GitHub
Citation
@article{camillo2024cpgpt,
title={CpGPT: A Foundation Model for DNA Methylation},
author={de Lima Camillo, Lucas Paulo and others},
journal={bioRxiv},
year={2024},
doi={10.1101/2024.10.24.619766}
}
License
MIT License. See the GitHub repository for details.