Lansechen committed · Commit d954f6a · verified · 1 parent: 4d35c6d

Model save

Files changed (4)
  1. README.md +6 -5
  2. all_results.json +3 -3
  3. train_results.json +3 -3
  4. trainer_state.json +0 -0
README.md CHANGED
@@ -1,16 +1,17 @@
 ---
 base_model: Qwen/Qwen2.5-7B
-datasets: DigitalLearningGmbH/MATH-lighteval
 library_name: transformers
+model_name: Qwen2.5-7B-Open-R1-GRPO-math-lighteval-weighted-sync
 tags:
 - generated_from_trainer
-- open-r1
+- trl
+- grpo
 licence: license
 ---
 
-# Model Card for None
+# Model Card for Qwen2.5-7B-Open-R1-GRPO-math-lighteval-weighted-sync
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on the [DigitalLearningGmbH/MATH-lighteval](https://huggingface.co/datasets/DigitalLearningGmbH/MATH-lighteval) dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -26,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/chenran1995-the-chinese-university-of-hong-kong/huggingface/runs/ljeo7hai)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/chenran1995-the-chinese-university-of-hong-kong/huggingface/runs/plhx359x)
 
 
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
all_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "total_flos": 0.0,
-    "train_loss": 0.02047422327097703,
-    "train_runtime": 34869.9992,
+    "train_loss": 0.019874975508476684,
+    "train_runtime": 35069.7154,
     "train_samples": 7500,
-    "train_samples_per_second": 0.43,
+    "train_samples_per_second": 0.428,
     "train_steps_per_second": 0.004
 }
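The updated metrics above are internally consistent if training covered the 7,500 samples twice, i.e. for two epochs. This is an assumption: the epoch count is not visible in this diff (it would appear in trainer_state.json). A minimal sketch of the check:

```python
# Sanity-check the updated throughput in all_results.json:
# samples_per_second ≈ train_samples * epochs / train_runtime.
# NOTE: num_train_epochs = 2 is an assumption, not stated in this diff.
train_samples = 7500
train_runtime = 35069.7154   # seconds, from the new all_results.json
num_train_epochs = 2         # assumed

samples_per_second = train_samples * num_train_epochs / train_runtime
print(round(samples_per_second, 3))  # 0.428, matching the reported value
```

The same arithmetic reproduces the old value: 7500 × 2 / 34869.9992 ≈ 0.43.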
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "total_flos": 0.0,
-    "train_loss": 0.02047422327097703,
-    "train_runtime": 34869.9992,
+    "train_loss": 0.019874975508476684,
+    "train_runtime": 35069.7154,
     "train_samples": 7500,
-    "train_samples_per_second": 0.43,
+    "train_samples_per_second": 0.428,
     "train_steps_per_second": 0.004
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff