Update README.md
README.md
@@ -23,7 +23,7 @@ Full precision version of Superthoughts lite v2 MoE. 3.91B parameters, 2 experts
 This is the non-experimental version of Superthoughts Lite v2, offering better accuracy on all tasks, better performance, and less looping while generating responses.

 We trained it by first creating a base model for all the experts, which was fine-tuned with GRPO techniques using *Unsloth* on top of meta-llama/Llama-3.2-1B-Instruct.

-After making the base model, we trained each
+After making the base model, we trained each potential expert using SFT. After SFT, we ran GRPO again. In total there are 4 experts:
 - Chat reasoning expert,
 - Math reasoning expert,
 - Code reasoning expert,