Update README.md
README.md
@@ -12,7 +12,7 @@ But as the bleeding edge of small models is becoming clear, reasoning models are
 
 So, in order to learn the nuances of training models, I decided to train a small 3B model using GRPO techniques instead of PPO.
 
-##
+## ---------------------------------------------------------------------------------------------------------------------
 
 The base model was Qwen2.5 3B, it is very smart as is, and even smarter with reasoning.
 