Improve model card metadata and content
Hi! I'm Niels from the Hugging Face community team. This PR improves the model card for LLaDA-Instruct-JustGRPO by:
- Adding `pipeline_tag`, `license`, and `library_name` metadata.
- Linking the model to its research paper.
- Updating the placeholder arXiv link with the correct ID.
- Adding the BibTeX citation for the work.
These changes help improve the discoverability and documentation of your model. Please feel free to merge if this looks good to you!
README.md
CHANGED
````diff
@@ -1,13 +1,27 @@
+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+base_model: GSAI-ML/LLaDA-8B-Instruct
+tags:
+- reasoning
+- math
+- diffusion-language-model
+---
+
 # LLaDA-Instruct-JustGRPO
 
 This model is [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) fine-tuned with **JustGRPO** on GSM8K.
 
+It was introduced in the paper [The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models](https://huggingface.co/papers/2601.15165).
+
 ## Method
 
-JustGRPO is a minimalist RL approach for diffusion language models. Instead of complex diffusion-specific RL adaptations, we simply treat dLLMs as autoregressive models during training and apply standard GRPO. See our paper
+JustGRPO is a minimalist RL approach for diffusion language models. Instead of complex diffusion-specific RL adaptations, we simply treat dLLMs as autoregressive models during training and apply standard GRPO. See our paper for details.
 
-**
-**
+- **Project Page:** [https://nzl-thu.github.io/the-flexibility-trap](https://nzl-thu.github.io/the-flexibility-trap)
+- **Paper:** [arXiv:2601.15165](https://arxiv.org/abs/2601.15165)
+- **Code:** [https://github.com/LeapLabTHU/JustGRPO](https://github.com/LeapLabTHU/JustGRPO)
 
 ## Performance on GSM8K
 
@@ -19,3 +33,13 @@ JustGRPO is a minimalist RL approach for diffusion language models. Instead of c
 
 For generation and evaluation, please refer to our [GitHub repository](https://github.com/LeapLabTHU/JustGRPO).
 
+## Citation
+
+```bibtex
+@article{ni2026flexibility,
+  title={The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models},
+  author={Ni, Zanlin and Wang, Shenzhi and Yue, Yang and Yu, Tianyu and Zhao, Weilin and Hua, Yeguo and Chen, Tianyi and Song, Jun and Yu, Cheng and Zheng, Bo and Huang, Gao},
+  journal={arXiv preprint arXiv:2601.15165},
+  year={2026}
+}
+```
````