
ToT-Reasoner-Qwen3-1.7B

Model Description

This model is a fine-tune of ziadrone/oneplusaries1, trained with Supervised Fine-Tuning (SFT) on the math split of open-r1/Mixture-of-Thoughts. It is optimized for mathematical reasoning.

Training Data

  • Source: open-r1/Mixture-of-Thoughts (math split, up to 50 samples).
  • Format: Prompts with <reasoning>...</reasoning><answer>...</answer> structure.
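The tag structure above can be built and parsed with a small helper. This is a minimal sketch, not the card author's preprocessing code: the names `format_example` and `parse_output` are hypothetical, and it assumes the two tags appear once each, in the order shown.

```python
import re

# Template mirroring the <reasoning>...</reasoning><answer>...</answer> structure.
TEMPLATE = "<reasoning>{reasoning}</reasoning><answer>{answer}</answer>"

# DOTALL lets the reasoning span multiple lines; the match is non-greedy.
_PATTERN = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL
)


def format_example(reasoning: str, answer: str) -> str:
    """Wrap a reasoning trace and final answer in the tag structure (hypothetical helper)."""
    return TEMPLATE.format(reasoning=reasoning.strip(), answer=answer.strip())


def parse_output(text: str):
    """Return (reasoning, answer) if the text contains both tags in order, else None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

Returning `None` for text without the tags makes it easy to detect generations that did not follow the expected format.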

Fine-Tuning Process

  • Method: SFT with learning rate=1e-5, 3 epochs, batch size=1.
  • Setup: Google Colab Pro with T4 GPU.
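For a sense of scale, the hyperparameters above imply a very short training run. The sketch below computes an upper bound on the number of optimizer steps, assuming all 50 samples are used, no gradient accumulation, and that a final partial batch is kept; the card does not state these details, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_samples: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps for a run, assuming no gradient accumulation (an assumption)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# Values from the card: up to 50 samples, 3 epochs, batch size 1.
total_steps = optimizer_steps(num_samples=50, epochs=3, batch_size=1)
print(total_steps)  # 150
```

Because the card says "up to 50 samples", 150 steps is a ceiling rather than an exact count.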

Usage