Genos-m
Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts of up to one million tokens.
For instructions, details, benchmarks, and examples, please refer to the Genos-m GitHub repository and the paper.
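To make "single-nucleotide resolution" concrete: each base becomes one token, so a sequence of n bases yields n tokens. The following is a minimal sketch under stated assumptions, not the actual Genos-m tokenizer. The vocabulary ids and the helpers `encode`, `decode`, and `next_token_pairs` are hypothetical and chosen only for illustration.

```python
# Hypothetical character-level vocabulary: these ids are illustrative only
# and are NOT the real Genos-m token ids.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
ID_TO_BASE = {token_id: base for base, token_id in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Map each nucleotide to one token id; unknown symbols fall back to N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def decode(ids: list[int]) -> str:
    """Inverse of encode for ids in the hypothetical vocabulary."""
    return "".join(ID_TO_BASE[token_id] for token_id in ids)


def next_token_pairs(ids: list[int]) -> tuple[list[int], list[int]]:
    """Build (inputs, targets) for next-token prediction: targets are the
    inputs shifted left by one position."""
    return ids[:-1], ids[1:]


ids = encode("ACGTNacgt")
inputs, targets = next_token_pairs(ids)
print(len(ids))      # one token per base
print(decode(ids))   # lower-case input is normalised to upper case
print(inputs, targets)
```

Because every base is its own token, a one-million-token context corresponds to roughly one million bases under this assumption.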
Model Specification
| Specification | Genos-m-4.7B |
|---|---|
| Total parameters | 4.7B |
| Activated parameters | 0.33B |
| Architecture type | MoE |
| Number of experts | 32 |
| Selected experts per token | 2 |
| Number of layers | 12 |
| Attention hidden size | 1024 |
| Number of attention heads | 16 |
| Query groups | 8 |
| MoE hidden size per expert | 4096 |
| Vocabulary size | 128 (padded) |
| Context length | up to 1M |
| Training objective | next-token prediction |
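
A few quantities follow directly from the table above. The sketch below derives them with plain arithmetic. It assumes the common convention that the per-head dimension equals the attention hidden size divided by the number of heads, which the table does not state explicitly. The class name `GenosMSpec` and its field names are illustrative, not identifiers from the Genos-m code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenosMSpec:
    """Values copied from the specification table (field names are illustrative)."""
    total_params_b: float = 4.7
    activated_params_b: float = 0.33
    num_experts: int = 32
    experts_per_token: int = 2
    num_layers: int = 12
    hidden_size: int = 1024
    num_attention_heads: int = 16
    num_query_groups: int = 8
    moe_ffn_hidden_size: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_attention_heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def heads_per_query_group(self) -> int:
        # Grouped-query attention: query heads sharing one key/value group.
        return self.num_attention_heads // self.num_query_groups

    @property
    def expert_fraction_per_token(self) -> float:
        # Share of experts that process any single token.
        return self.experts_per_token / self.num_experts

    @property
    def activated_param_fraction(self) -> float:
        # Share of total parameters used per token, from the table values.
        return self.activated_params_b / self.total_params_b


spec = GenosMSpec()
print(spec.head_dim)                            # 64
print(spec.heads_per_query_group)               # 2
print(f"{spec.expert_fraction_per_token:.2%}")  # 6.25%
print(f"{spec.activated_param_fraction:.1%}")   # 7.0%
```

In other words, each token is routed to 2 of 32 experts, and about 7% of the total parameters are active per token according to the figures in the table.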
Training Data
Genos-m was pretrained on curated microbial genome resources, including GTDB R220 representative prokaryotic genomes, public human-associated microbial genomes, in-house high-quality human gut MAGs, and UHGV human gut phage genomes. The final pre-training corpus contains approximately 1.2T tokens and covers 186 phyla, 3,448 families, and 69,056 species. Within this corpus, the retained human-associated prokaryotic subset covers 45 phyla, 585 families, and 12,273 species across major human microbial habitats, including the gut, oral cavity, skin, respiratory tract, and female reproductive tract.
Checkpoints
- HF-Transformers checkpoint: BGI-HangzhouAI/Genos-m-4.7B
- Megatron-LM checkpoint: BGI-HangzhouAI/Genos-m-Megatron-4.7B
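
A minimal sketch of loading the HF-Transformers checkpoint, assuming the repository id listed above resolves on the Hugging Face Hub and that `AutoModel` can resolve its architecture. The helper `load_genos_m` and its `trust_remote_code` parameter are illustrative. Whether the repository ships custom modeling code is not stated here, so the flag defaults to `False`. The Megatron-LM checkpoint is intended for Megatron-LM and is not covered by this sketch.

```python
HF_REPO_ID = "BGI-HangzhouAI/Genos-m-4.7B"  # HF-Transformers checkpoint listed above


def load_genos_m(repo_id: str = HF_REPO_ID, trust_remote_code: bool = False):
    """Load the checkpoint with Transformers.

    Assumptions (not confirmed by this model card):
    - the repo id exists on the Hub and AutoModel can resolve its architecture;
    - set trust_remote_code=True only if the repo ships custom modeling code
      that you have reviewed.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        dtype="auto",  # older Transformers releases name this argument torch_dtype
        trust_remote_code=trust_remote_code,
    )
```

Calling `load_genos_m()` downloads the weights on first use, so expect a multi-gigabyte transfer for a 4.7B-parameter model.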
License
The Genos-m model and code are released under the Apache License 2.0.
Contact
For questions and suggestions, please open an issue.