Sweaterdog committed on
Commit
7ead420
·
verified ·
1 Parent(s): 7712430

Update README.md

  1. README.md +29 -3
README.md CHANGED
@@ -1,3 +1,29 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ # 🧠 Smol-reason, a 3B model test for future models 🧠
6
+
7
+ ## Why?
8
+
9
+ While building the Andy series of models, I have been training with PPO techniques.
10
+
11
+ But as the bleeding edge of small models comes into focus, it is becoming clear that reasoning models are the winners.
12
+
13
+ So, in order to learn the nuances of training models, I decided to train a small 3B model using GRPO techniques instead of PPO.
14
+
15
+ ---
16
+
17
+ The base model is Qwen2.5 3B. It is very smart as is, and even smarter with reasoning.
18
+
19
+ This model uses the following format while responding:
20
+ ```
21
+ <think>
22
+ --reasoning content here--
23
+ </think>
24
+ <answer>
25
+ --answer content here--
26
+ </answer>
27
+ ```
28
+
29
+ This is similar to the XML reasoning format, but changed to use DeepSeek-R1 / QwQ style `<think>` blocks.
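
To make the response format above concrete, here is a minimal sketch of how a reply could be split into its reasoning and answer parts. It is not part of the commit or the model's tooling: the helper name `split_response`, the regular expression, and the sample string are all hypothetical, and the sketch assumes the model emits exactly one `<think>` block followed by one `<answer>` block.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: one <think>...</think> block followed by one
# <answer>...</answer> block, as described in the README format.
_RESPONSE_RE = re.compile(
    r"<think>\s*(?P<reasoning>.*?)\s*</think>\s*"
    r"<answer>\s*(?P<answer>.*?)\s*</answer>",
    re.DOTALL,  # let '.' match newlines inside the blocks
)


def split_response(text: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if the format is not followed.

    Hypothetical helper for illustration only.
    """
    match = _RESPONSE_RE.search(text)
    if match is None:
        return None
    return match.group("reasoning"), match.group("answer")


# Made-up sample reply, not real model output.
sample = "<think>\n2 + 2 is 4.\n</think>\n<answer>\n4\n</answer>"
print(split_response(sample))
```

Returning `None` for replies that do not match keeps the caller in charge of deciding what to do with malformed output, for example falling back to the raw text.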