Update README.md
README.md CHANGED
@@ -20,7 +20,7 @@ It has been trained using [TRL](https://github.com/huggingface/trl), [unsloth](h
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/captainhpy-beijing-university-of-technology/tiny-reasoning/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/captainhpy-beijing-university-of-technology/tiny-reasoning/runs/6zmbkin8)
 
 - This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 - Dataset: [unsloth/OpenMathReasoning-mini](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini)