### Pre-training Details
The pre-training of PlantBiMoE was distributed across a single compute node with 8 NVIDIA A800 (80 GB) GPUs, with a per-GPU batch size of 4. Combined with 8-step gradient accumulation, this gives an effective batch size of 256. The AdamW optimizer was used with \\(\beta_{1}\\) set to 0.95, \\(\beta_{2}\\) set to 0.9, and a weight decay of 0.1. The total number of training steps was equivalent to 10 epochs. During the initial 2% of training steps, the learning rate increased linearly from 0 to 0.008, followed by a cosine decay to 0.004. Mixed-precision training with bf16 was adopted to improve training efficiency and reduce memory overhead, resulting in a total pre-training time of approximately 166 hours.
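The effective batch size and learning-rate schedule above can be sketched in plain Python. This is an illustrative reconstruction from the reported hyperparameters, not the authors' training code; `total_steps` is a placeholder, since the actual value depends on the pre-training corpus size and the 10-epoch budget.

```python
import math

# Reported hyperparameters (illustrative reconstruction, not official code)
gpus = 8                    # NVIDIA A800 80 GB GPUs on one node
per_gpu_batch = 4
grad_accum_steps = 8
effective_batch = gpus * per_gpu_batch * grad_accum_steps  # 8 * 4 * 8 = 256

peak_lr, final_lr = 0.008, 0.004
total_steps = 10_000                    # placeholder; the paper trains for 10 epochs
warmup_steps = int(0.02 * total_steps)  # first 2% of training steps

def lr_at(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to final_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

With a framework such as PyTorch, `lr_at(step) / peak_lr` could be passed to `torch.optim.lr_scheduler.LambdaLR` alongside an `AdamW` optimizer configured with the betas and weight decay listed above.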
## BibTeX entry and citation info
If you use this model, please cite the following paper:

```bibtex
@article{lin2025plantbimoe,
  title={PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes},
  author={Lin, Kepeng and Zhang, Qizhe and Wang, Rui and Hu, Xuehai and Xu, Wei},
  journal={arXiv preprint arXiv:2512.07113},
  year={2025},
  url={https://arxiv.org/pdf/2512.07113}
}
```