Sweaterdog/Smol-Reason

Sweaterdog committed on Mar 13, 2025

Commit 76169e5 · verified · 1 Parent(s): 284933f

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ But as the bleeding edge of small models is becoming clear, reasoning models are
 So, in order to learn the nuances of training models, I decided to train a small 3B model using GRPO techniques instead of PPO.
-## ----------------------------------------------------------------------------------------------------------------------
+## ---------------------------------------------------------------------------------------------------------------------
 The base model was Qwen2.5 3B, it is very smart as is, and even smarter with reasoning.