nielsr (HF Staff) committed
Commit f552c2b · verified · 1 Parent(s): 9b9ec5d

Improve model card metadata and content


Hi! I'm Niels from the Hugging Face community team. This PR improves the model card for LLaDA-Instruct-JustGRPO by:
- Adding `pipeline_tag`, `license`, and `library_name` metadata.
- Linking the model to its research paper.
- Updating the placeholder arXiv link with the correct ID.
- Adding the BibTeX citation for the work.

These changes help improve the discoverability and documentation of your model. Please feel free to merge if this looks good to you!

Files changed (1)
  1. README.md +27 -3
README.md CHANGED
@@ -1,13 +1,27 @@
+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+base_model: GSAI-ML/LLaDA-8B-Instruct
+tags:
+- reasoning
+- math
+- diffusion-language-model
+---
+
 # LLaDA-Instruct-JustGRPO
 
 This model is [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) fine-tuned with **JustGRPO** on GSM8K.
 
+It was introduced in the paper [The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models](https://huggingface.co/papers/2601.15165).
+
 ## Method
 
-JustGRPO is a minimalist RL approach for diffusion language models. Instead of complex diffusion-specific RL adaptations, we simply treat dLLMs as autoregressive models during training and apply standard GRPO. See our paper *"The Flexibility Trap: Rethinking the Value of Arbitrary Order in Diffusion Language Models"* for details.
+JustGRPO is a minimalist RL approach for diffusion language models. Instead of complex diffusion-specific RL adaptations, we simply treat dLLMs as autoregressive models during training and apply standard GRPO. See our paper for details.
 
-**Paper:** [arXiv:2601.xxxxx](https://arxiv.org/abs/2601.xxxxx)
-**Code:** [https://github.com/LeapLabTHU/JustGRPO](https://github.com/LeapLabTHU/JustGRPO)
+- **Project Page:** [https://nzl-thu.github.io/the-flexibility-trap](https://nzl-thu.github.io/the-flexibility-trap)
+- **Paper:** [arXiv:2601.15165](https://arxiv.org/abs/2601.15165)
+- **Code:** [https://github.com/LeapLabTHU/JustGRPO](https://github.com/LeapLabTHU/JustGRPO)
 
 ## Performance on GSM8K
 
@@ -19,3 +33,13 @@ JustGRPO is a minimalist RL approach for diffusion language models. Instead of c
 
 For generation and evaluation, please refer to our [GitHub repository](https://github.com/LeapLabTHU/JustGRPO).
 
+## Citation
+
+```bibtex
+@article{ni2026flexibility,
+  title={The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models},
+  author={Ni, Zanlin and Wang, Shenzhi and Yue, Yang and Yu, Tianyu and Zhao, Weilin and Hua, Yeguo and Chen, Tianyi and Song, Jun and Yu, Cheng and Zheng, Bo and Huang, Gao},
+  journal={arXiv preprint arXiv:2601.15165},
+  year={2026}
+}
+```
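For context on the "Method" section of the card above: the core of standard GRPO is a group-relative advantage, where each sampled completion's reward is normalized against the mean and standard deviation of its own sampling group. The sketch below is purely illustrative (the function name and the choice of a binary GSM8K-style correctness reward are assumptions, not code from the JustGRPO repository):

```python
# Illustrative sketch of the group-relative advantage used in standard GRPO.
# `grpo_advantages` is a hypothetical helper, not part of the JustGRPO codebase.
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# With binary correctness rewards for a group of 4 sampled answers,
# correct answers receive a positive advantage and incorrect ones a negative advantage.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Treating the dLLM as autoregressive, as the card describes, means these advantages can be plugged into an ordinary token-level policy-gradient objective with no diffusion-specific machinery.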