---
license: apache-2.0
---

# 🧠 Smol-reason, a 3B model test for future models 🧠

## Why?

When making the Andy series of models, I have been using PPO techniques.

But at the bleeding edge of small models, reasoning models are the clear winners.

So, to learn the nuances of training reasoning models, I decided to train a small 3B model using GRPO instead of PPO.
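
The training code isn't included here, but as a rough sketch of the idea: unlike PPO, GRPO drops the learned value model and instead samples a group of completions per prompt, scores each with simple reward functions, and reinforces the completions that beat the group average. A hypothetical format reward over the `<think>`/`<answer>` tags described below might look like:

```python
import re

# Hypothetical GRPO-style format reward -- an illustration, not the actual
# training code. GRPO scores every completion in a sampled group; completions
# that score above the group mean get reinforced.
THINK_ANSWER = re.compile(
    r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL
)

def format_reward(completions: list[str]) -> list[float]:
    """Give 1.0 to completions that follow the <think>/<answer> format."""
    return [1.0 if THINK_ANSWER.search(c) else 0.0 for c in completions]

print(format_reward(["<think>2+2=4</think>\n<answer>4</answer>", "just 4"]))
# [1.0, 0.0]
```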

---

The base model was Qwen2.5 3B; it is very smart as is, and even smarter with reasoning.

This model uses the following format while responding:
```
<think>
--reasoning content here--
</think>
<answer>
--answer content here--
</answer>
```

This is similar to the common XML reasoning format, but changed to use DeepSeek-R1 / QwQ-style thinking blocks.
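
If you're using the model programmatically, a minimal sketch (assuming the response follows the format above) for pulling out just the answer:

```python
import re

def extract_answer(response: str) -> str | None:
    """Return the text inside <answer>...</answer>, or None if absent."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None

response = "<think>Paris is the capital of France.</think>\n<answer>Paris</answer>"
print(extract_answer(response))  # Paris
```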