ToT-Reasoner-Qwen3-1.7B
Model Description
This model is a fine-tuned version of ziadrone/oneplusaries1, trained with Supervised Fine-Tuning (SFT) on the math split of open-r1/Mixture-of-Thoughts. It is optimized for mathematical reasoning.
Training Data
- Source: open-r1/Mixture-of-Thoughts (math split, up to 50 samples).
- Format: Prompts with <reasoning>...</reasoning><answer>...</answer> structure.
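To make the tag structure concrete, here is a minimal sketch of building and parsing text in that layout. The helper names `format_target` and `parse_output` are hypothetical and not part of this model's code; only the tag names come from the format described above.

```python
import re

# Tag layout taken from the Format bullet; everything else here is illustrative.
TAGGED_OUTPUT = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,  # reasoning traces usually span several lines
)


def format_target(reasoning: str, answer: str) -> str:
    """Wrap a reasoning trace and a final answer in the expected tags."""
    return f"<reasoning>{reasoning.strip()}</reasoning><answer>{answer.strip()}</answer>"


def parse_output(text: str):
    """Return (reasoning, answer), or None if the tags are missing."""
    match = TAGGED_OUTPUT.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("answer").strip()


sample = format_target("2 + 2 equals 4.", "4")
print(parse_output(sample))
```

Returning None for untagged text lets a caller distinguish a malformed generation from an empty answer.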
Fine-Tuning Process
- Method: SFT with a learning rate of 1e-5, 3 epochs, and a batch size of 1.
- Setup: Google Colab Pro with T4 GPU.
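As a rough illustration of what these settings imply, the sketch below collects the stated hyperparameters in a hypothetical `SFTRunConfig` container and derives an upper bound on the number of optimizer steps. The gradient accumulation value is an assumption (it is not stated above), and the class is not the actual training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTRunConfig:
    """Hypothetical container for the hyperparameters listed above."""

    learning_rate: float = 1e-5
    num_epochs: int = 3
    batch_size: int = 1
    max_samples: int = 50      # upper bound from the Training Data section
    grad_accum_steps: int = 1  # assumption: not stated in this card

    def max_optimizer_steps(self) -> int:
        """Upper bound on optimizer steps across all epochs."""
        effective_batch = self.batch_size * self.grad_accum_steps
        steps_per_epoch = math.ceil(self.max_samples / effective_batch)
        return steps_per_epoch * self.num_epochs


config = SFTRunConfig()
print(config.max_optimizer_steps())  # 150
```

Under these assumptions the run performs at most 150 weight updates, which is a very small fine-tuning budget.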