SpiceRL/DRA-DR.GRPO

nielsr HF Staff committed on Jun 18, 2025

Commit 9182f46 (verified) · Parent: 2e2054a

Add metadata

This PR adds model card metadata (`library_name`, `pipeline_tag`, and `base_model`) so that a "Use this model" button appears at the top right of the model page and the base model is linked.
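As a rough illustration of what the new metadata enables, the sketch below loads the checkpoint through the `transformers` text-generation pipeline. It assumes the repository id is `SpiceRL/DRA-DR.GRPO` (taken from the page header) and that `transformers` and a backend such as PyTorch are installed; the helper name `generate` and the `max_new_tokens` default are hypothetical, and the snippet the Hub actually displays may differ.

```python
# Assumption: repository id taken from the page header of this commit.
MODEL_ID = "SpiceRL/DRA-DR.GRPO"


def generate(prompt, max_new_tokens=64):
    """Generate a continuation for `prompt` (hypothetical helper).

    The import is done lazily so that defining this function does not
    require transformers to be installed or the weights to be downloaded.
    """
    from transformers import pipeline

    # pipeline_tag: text-generation in the model card matches this task name.
    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate("What is 12 * 7?")` would download the weights on first use, so it is best tried on a machine with enough disk space and memory for a 1.5B-parameter model.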

Files changed (1)

README.md CHANGED

@@ -1,5 +1,9 @@
 ---
 license: cc-by-4.0
+library_name: transformers
+pipeline_tag: text-generation
+base_model:
+- Qwen/Qwen2.5-1.5B-Instruct
 ---
 This model is described in the paper [DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models](https://arxiv.org/abs/2505.09655).
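
To sanity-check a change like this one, the front matter can be read back and the three added fields confirmed. The following is a minimal sketch using only the standard library. It handles just the small YAML subset used in this README (`key: value` pairs and `key:` followed by `- item` lines), not YAML in general; the function name `parse_front_matter` and the placeholder body text are hypothetical.

```python
# Front matter as it appears after this commit; the body line is a placeholder.
README = """\
---
license: cc-by-4.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---
Model card body goes here.
"""


def parse_front_matter(text):
    """Parse `key: value` and `key:` + `- item` lists between the --- markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None  # key currently collecting list items, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing marker: end of metadata
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_key = None
        else:
            meta[key] = []  # a bare `key:` starts a list
            current_key = key
    return meta


meta = parse_front_matter(README)
missing = [k for k in ("library_name", "pipeline_tag", "base_model") if k not in meta]
print(meta)
print("missing fields:", missing)
```

For real model cards a full YAML parser is the safer choice, since this sketch ignores quoting, nesting, and multi-line values.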