Model save

Files changed (6)

README.md CHANGED

@@ -26,7 +26,7 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/hzy312/20250213_r1_reproduction_Qwen2.5-7B-Instruct/runs/7260140985.63412-d3ebea55-1c2d-453c-ad42-f0417f673921)
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/hzy312/20250218_knowledge_r1_Qwen2.5-1.5B-Instruct/runs/7260138335.90879-c2e38c2a-119b-4bdb-a39e-ea5d90047159)
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

all_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "total_flos": 0.0,
-    "train_loss": 5.353166522324368,
-    "train_runtime": 7842.24,
     "train_samples": 15000,
-    "train_samples_per_second": 1.913,
-    "train_steps_per_second": 0.06
 }

 {
     "total_flos": 0.0,
+    "train_loss": 0.0492804956890872,
+    "train_runtime": 5735.3452,
     "train_samples": 15000,
+    "train_samples_per_second": 5.231,
+    "train_steps_per_second": 0.374
 }
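
As a quick sanity check on the metrics above, the following sketch multiplies `train_runtime` by `train_samples_per_second` for each version of the file and reports the relative change in runtime. The names `old`, `new`, `samples_processed`, and `relative_change` are hypothetical helpers introduced here; the values are copied from the diff. This is only arithmetic on the reported numbers, not a description of how the trainer derives them.

```python
# Metrics copied from the removed (-) and added (+) lines of all_results.json.
old = {
    "train_loss": 5.353166522324368,
    "train_runtime": 7842.24,
    "train_samples": 15000,
    "train_samples_per_second": 1.913,
    "train_steps_per_second": 0.06,
}
new = {
    "train_loss": 0.0492804956890872,
    "train_runtime": 5735.3452,
    "train_samples": 15000,
    "train_samples_per_second": 5.231,
    "train_steps_per_second": 0.374,
}


def samples_processed(metrics):
    """Approximate samples seen: runtime (s) times throughput (samples/s)."""
    return metrics["train_runtime"] * metrics["train_samples_per_second"]


def relative_change(before, after, key):
    """Fractional change of one metric between the two commits."""
    return (after[key] - before[key]) / before[key]


for name, m in (("old", old), ("new", new)):
    print(f"{name}: ~{samples_processed(m):.0f} samples processed, "
          f"train_samples={m['train_samples']}")
print(f"runtime change: {relative_change(old, new, 'train_runtime'):+.1%}")
```

For the old run the product lands close to `train_samples`, while for the new run it comes out near twice that value. The diff does not say why, so any explanation (for example a different number of epochs) would be a guess.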

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ceb357dd975e64492dee77cfb9f0f4b2351efa227fa3727c0d9de1c23c49f9e
 size 3554214752

 version https://git-lfs.github.com/spec/v1
+oid sha256:f0ecc6d92dada0cad8f4276dc9085e0f9730519e1c364ea6e61a2707f0f51e1a
 size 3554214752
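
The entries above are Git LFS pointer files rather than the weights themselves: each holds a `version`, an `oid` (hash algorithm and digest), and a `size` in bytes. A minimal sketch of reading one, assuming the three-line layout shown in this diff; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New pointer for model.safetensors, copied from the added (+) side of the diff.
new_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f0ecc6d92dada0cad8f4276dc9085e0f9730519e1c364ea6e61a2707f0f51e1a
size 3554214752
"""

info = parse_lfs_pointer(new_pointer)
print(info["hash_algo"], info["digest"][:12], info["size"])
```

Only the `oid` line differs between the two versions of `model.safetensors`; the `size` is unchanged, which fits a file whose contents changed but whose byte length did not.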

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "total_flos": 0.0,
-    "train_loss": 5.353166522324368,
-    "train_runtime": 7842.24,
     "train_samples": 15000,
-    "train_samples_per_second": 1.913,
-    "train_steps_per_second": 0.06
 }

 {
     "total_flos": 0.0,
+    "train_loss": 0.0492804956890872,
+    "train_runtime": 5735.3452,
     "train_samples": 15000,
+    "train_samples_per_second": 5.231,
+    "train_steps_per_second": 0.374
 }

trainer_state.json CHANGED

The diff for this file is too large to render.

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0e44d60999224ad48d7078952f9d80a8dbfe63fe134329c74de5b663f87ebb7b
-size 7544

 version https://git-lfs.github.com/spec/v1
+oid sha256:85e88a7cb4ca0e7b67f0a7dc72b9e03de20126bc8ad6cd52ec1f06c76878f217
+size 7608