---
license: mit
tags:
- biology
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65a9e8563b9e1f0f308378b7/H2qI2OOSl-KqOlg01fRGR.png" width="100%" />
</div>
# OneGenomeRice (OGR)
OGR is a generative genomic foundation model for AI-driven precision breeding and functional genomics in rice. It processes DNA sequences of up to **1 million** base pairs, has **1.25B** total parameters in a **Mixture-of-Experts (MoE)** architecture, and was pre-trained on a curated corpus of **422** rice genomes spanning cultivated and wild *Oryza* diversity.
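In an MoE model only a few experts run for each token, which is how the total parameter count (1.25B) can be much larger than the parameters activated per token (0.33B). The pure-Python sketch below illustrates top-2 routing over 8 experts, matching the expert counts in the specification table. It is a minimal illustration under stated assumptions, not OGR's implementation: the softmax router, the renormalization of the two selected weights, the ReLU feed-forward experts, and the class names `Expert` and `Top2MoE` are all hypothetical, and the dimensions are shrunk from the real 1024/4096 so the example runs quickly.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x is a vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Hypothetical expert: d_model -> d_ff -> d_model with ReLU (assumed form)."""

    def __init__(self, d_model, d_ff, rng):
        self.w_in = rand_matrix(d_ff, d_model, rng)
        self.w_out = rand_matrix(d_model, d_ff, rng)

    def __call__(self, x):
        hidden = [max(0.0, v) for v in matvec(self.w_in, x)]
        return matvec(self.w_out, hidden)


class Top2MoE:
    """Hypothetical sparse MoE layer: each token is sent to its top_k experts."""

    def __init__(self, d_model, d_ff, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        # Router: one logit per expert, computed from the token's hidden vector.
        self.router = rand_matrix(num_experts, d_model, rng)
        self.experts = [Expert(d_model, d_ff, rng) for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x):
        # Pick the top_k experts by router probability, then renormalize
        # their weights so they sum to 1 (an assumed convention).
        probs = softmax(matvec(self.router, x))
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        return [(i, probs[i] / norm) for i in chosen]

    def __call__(self, x):
        # Only the selected experts are evaluated; the others are skipped.
        out = [0.0] * len(x)
        for idx, weight in self.route(x):
            y = self.experts[idx](x)
            out = [o + weight * v for o, v in zip(out, y)]
        return out


if __name__ == "__main__":
    moe = Top2MoE(d_model=16, d_ff=64)
    token = [0.1 * i for i in range(16)]
    print("selected experts and weights:", moe.route(token))
    print("output length:", len(moe(token)))
```

Because only 2 of the 8 expert feed-forward blocks run per token, per-token compute scales with the activated parameters rather than the total, while the full set of experts still adds model capacity.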
For instructions, details, and examples, see the project repository: [OGR GitHub](https://github.com/zhejianglab/OneGenomeRice).
The table below summarizes the model's scale and key architecture hyperparameters.
| Model Specification | OGR |
| --- | --- |
| **Model Scale** | |
| Total Parameters | 1.25B |
| Activated Parameters | 0.33B |
| **Architecture** | |
| Architecture | MoE |
| Number of Experts | 8 |
| Selected Experts per Token | 2 |
| Number of Layers | 12 |
| Attention Hidden Dimension | 1024 |
| Number of Attention Heads | 16 (GQA, 8 KV groups) |
| MoE Hidden Dimension (per Expert) | 4096 |
| Vocabulary Size | 128 (padded) |
| Context Length | up to 1M bp |
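The attention layers use grouped-query attention (GQA): several query heads share one key/value head, which shrinks the key/value projections and the KV cache, a meaningful saving at contexts up to 1M. The sketch below shows the idea for a single query position in pure Python. It reads "16 (GQA, 8 KV groups)" as 16 query heads sharing 8 key/value heads (2 query heads per KV head); that reading, the consecutive-head grouping, and the helper names `kv_head_for` and `gqa_attention` are assumptions for illustration, not confirmed details of OGR.

```python
import math


def kv_head_for(q_head, num_q_heads, num_kv_heads):
    # Assumed grouping: consecutive query heads share one KV head.
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def gqa_attention(q, k, v, num_kv_heads):
    """Grouped-query attention for one query position.

    q: [num_q_heads][head_dim]
    k, v: [num_kv_heads][seq_len][head_dim]
    Returns: [num_q_heads][head_dim]
    """
    num_q_heads = len(q)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(num_q_heads):
        kv = kv_head_for(h, num_q_heads, num_kv_heads)
        # Scaled dot-product scores against the shared KV head's keys.
        scores = [scale * sum(a * b for a, b in zip(q[h], key)) for key in k[kv]]
        # Stable softmax over key positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the shared KV head's values.
        out.append(
            [sum(w * val[d] for w, val in zip(weights, v[kv])) for d in range(head_dim)]
        )
    return out


if __name__ == "__main__":
    num_q_heads, num_kv_heads, head_dim = 16, 8, 4
    print([kv_head_for(h, num_q_heads, num_kv_heads) for h in range(num_q_heads)])
    q = [[1.0] * head_dim for _ in range(num_q_heads)]
    k = [[[0.0] * head_dim, [1.0] * head_dim] for _ in range(num_kv_heads)]
    v = [[[1.0] * head_dim, [3.0] * head_dim] for _ in range(num_kv_heads)]
    print(gqa_attention(q, k, v, num_kv_heads)[0])
```

Under this reading, and assuming a head dimension of 1024 / 16 = 64, each layer would project keys and values to 8 × 64 = 512 dimensions instead of 1024, halving the KV cache compared with standard multi-head attention.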