### Pre-training Details
The pre-training of PlantBiMoE was distributed across a single compute node with 8 NVIDIA A800 (80 GB) GPUs, with a per-GPU batch size of 4. Combined with 8-step gradient accumulation, this gives an effective batch size of 256. The AdamW optimizer was used with \\(\beta_{1}\\) set to 0.95, \\(\beta_{2}\\) set to 0.9, and a weight decay of 0.1. The total number of training steps was equivalent to 10 epochs. During the initial 2% of training steps, the learning rate increased linearly from 0 to 0.008, followed by a cosine decay to 0.004. Mixed-precision training with bf16 was adopted to improve training efficiency and reduce memory overhead, resulting in a total pre-training time of approximately 166 hours.
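The effective batch size and learning-rate schedule above can be sketched in plain Python. This is an illustrative reconstruction from the reported hyperparameters, not the authors' training code; `total_steps` is a placeholder, since the actual value depends on the pre-training corpus size and the 10-epoch budget.

```python
import math

# Reported hyperparameters (illustrative reconstruction, not official code)
gpus = 8                    # NVIDIA A800 80 GB GPUs on one node
per_gpu_batch = 4
grad_accum_steps = 8
effective_batch = gpus * per_gpu_batch * grad_accum_steps  # 8 * 4 * 8 = 256

peak_lr, final_lr = 0.008, 0.004
total_steps = 10_000                    # placeholder; the paper trains for 10 epochs
warmup_steps = int(0.02 * total_steps)  # first 2% of training steps

def lr_at(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to final_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

With a framework such as PyTorch, `lr_at(step) / peak_lr` could be passed to `torch.optim.lr_scheduler.LambdaLR` alongside an `AdamW` optimizer configured with the betas and weight decay listed above.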
## BibTeX entry and citation info
If you use this model, please cite the following paper:

```bibtex
@article{lin2025plantbimoe,
  title={PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes},
  author={Lin, Kepeng and Zhang, Qizhe and Wang, Rui and Hu, Xuehai and Xu, Wei},
  journal={arXiv preprint arXiv:2512.07113},
  year={2025},
  url={https://arxiv.org/pdf/2512.07113}
}
```