PyTorch
plantbimoe
biology
genomics
language model
plants
custom_code
linkp committed on
Commit f962c52 · 1 Parent(s): ae0b514

citation info

Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -61,4 +61,14 @@ The pre-training strategy of PlantBiMoE utilizes Masked Language Modeling. Durin
  ### Pre-training Details
  The pre-training of PlantBiMoE was distributed across a computing node with 8 Nvidia A800-80G GPUs, where the batch size for each GPU was set to 4. With 8-step gradient accumulation, the effective batch size became 256. The AdamW optimizer was used, with \\(\beta_{1}\\) set to 0.95, \\(\beta_{2}\\) to 0.9, and a weight decay of 0.1. The total number of training steps was equivalent to 10 epochs. During the initial 2% of the training steps, the learning rate increased linearly from 0 to 0.008, followed by a cosine decay to 0.004. Mixed precision training with bf16 was adopted to improve training efficiency and reduce memory overhead, resulting in a total pre-training time of approximately 166 hours.
 
- ## BibTeX entry and citation info
+ ## BibTeX entry and citation info
+ If you use this model, please cite the following paper:
+
+ ```bibtex
+ @article{lin2025plantbimoe,
+   title={PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes},
+   author={Lin, Kepeng and Zhang, Qizhe and Wang, Rui and Hu, Xuehai and Xu, Wei},
+   journal={arXiv preprint arXiv:2512.07113},
+   year={2025},
+   url={https://arxiv.org/pdf/2512.07113}
+ }
+ ```
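
The learning-rate schedule described in the Pre-training Details (linear warmup from 0 to 0.008 over the first 2% of steps, then cosine decay down to 0.004) can be sketched as a plain function. This is a minimal illustration only; the function name, signature, and step arithmetic are assumptions for clarity, not code from the PlantBiMoE repository.

```python
import math

def lr_at_step(step, total_steps, peak_lr=0.008, final_lr=0.004, warmup_frac=0.02):
    """Hypothetical sketch of the schedule from the model card:
    linear warmup from 0 to peak_lr over the first warmup_frac of training,
    then cosine decay from peak_lr down to final_lr."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, peak_lr at the end of warmup.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr to final_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

As a consistency check on the card's batch-size figures: 8 GPUs × a per-GPU batch size of 4 × 8-step gradient accumulation gives the stated effective batch size of 256.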