HuangXinBa/GRPO

@@ -1,19 +1,20 @@
 ---
 license: apache-2.0
 language: en
 tags:
-  - text-generation
-  - causal-lm
-  - reinforcement-learning
-  - GRPO
-  - instruction-tuning
-  - chain-of-thought
 datasets:
-  - gsm8k
 pipeline_tag: text-generation
 widget:
-  - text: "What is 27 plus 16? Let's think step by step."
 ---
 # GRPO: Finetuned Causal Language Model using Generalized Reinforcement Policy Optimization

 ---
 license: apache-2.0
 language: en
 tags:
+- text-generation
+- causal-lm
+- reinforcement-learning
+- GRPO
+- instruction-tuning
+- chain-of-thought
+- trl
+- grpo
 datasets:
+- gsm8k
 pipeline_tag: text-generation
 widget:
+- text: What is 27 plus 16? Let's think step by step.
 ---
 # GRPO: Finetuned Causal Language Model using Generalized Reinforcement Policy Optimization