ToT-Reasoner-Qwen3-1.7B

Model Description

ToT-Reasoner-Qwen3-1.7B is a fine-tune of ziadrone/oneplusaries1, trained with Supervised Fine-Tuning (SFT) on the math split of open-r1/Mixture-of-Thoughts. It targets mathematical reasoning.

Training Data

Source: open-r1/Mixture-of-Thoughts (math split, up to 50 samples).
Format: Prompts with <reasoning>...</reasoning><answer>...</answer> structure.
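
The tag structure above can be handled with a small helper. The following is a minimal sketch, not the preprocessing code used for this model: the function names `format_example` and `parse_completion` are hypothetical, and it assumes the two tags appear once each, in this order, without nesting.

```python
import re

# Matches one <reasoning>...</reasoning> block followed by one <answer>...</answer> block.
# Assumption: tags are not nested and may be separated only by whitespace.
_TAGGED = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def format_example(reasoning: str, answer: str) -> str:
    """Wrap a reasoning trace and a final answer in the tag structure."""
    return f"<reasoning>{reasoning}</reasoning><answer>{answer}</answer>"


def parse_completion(text: str):
    """Return (reasoning, answer) with surrounding whitespace stripped, or None if the tags are missing."""
    match = _TAGGED.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("answer").strip()
```

Returning `None` for untagged text lets a caller tell a malformed completion apart from an empty answer.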

Fine-Tuning Process

Method: SFT with a learning rate of 1e-5, 3 epochs, and a batch size of 1.
Setup: Google Colab Pro with a T4 GPU.
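
To put the stated hyperparameters in perspective, the sketch below collects them in a dictionary and computes an upper bound on the number of optimizer steps. The card does not name the training library, so the key names (borrowed from Hugging Face `TrainingArguments`), the single-GPU setup, and the absence of gradient accumulation are all assumptions; `max_optimizer_steps` is a hypothetical helper.

```python
import math

# Values from the card; key names follow Hugging Face TrainingArguments (assumption).
SFT_HYPERPARAMS = {
    "learning_rate": 1e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
}

MAX_SAMPLES = 50  # the card says "up to 50 samples"


def max_optimizer_steps(num_samples: int, hparams: dict, grad_accum: int = 1) -> int:
    """Upper bound on optimizer steps for one GPU (gradient accumulation assumed to be 1)."""
    effective_batch = hparams["per_device_train_batch_size"] * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * hparams["num_train_epochs"]


steps = max_optimizer_steps(MAX_SAMPLES, SFT_HYPERPARAMS)
```

Under these assumptions the run performs at most 150 weight updates, so it is a very light fine-tune.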

Usage
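
A minimal loading sketch with the Hugging Face `transformers` library follows. Several things here are assumptions rather than documented behavior of this model: the repository id is taken to be `ziadrone/oneplusaries2` (substitute the actual id if it differs), the question is passed as a raw prompt because the card does not specify a prompt template, and `generate_answer` is a hypothetical helper name.

```python
MODEL_ID = "ziadrone/oneplusaries2"  # assumption: replace with the actual repository id


def generate_answer(question: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Generate a completion for a math question (sketch; prompt template is an assumption)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate_answer("What is 12 * 13?"))
```

If the model follows its training format, the output should contain a `<reasoning>` block followed by an `<answer>` block.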