Ravi โ€” Math Tutor (Llama 3.1 8B Instruct)

A fine-tuned version of Llama 3.1 8B Instruct trained to function as Ravi, a math tutoring assistant specializing in algebra, calculus, and word problems. Trained with QLoRA via Unsloth + TRL SFTTrainer on Google Colab T4.

Ravi teaches rather than just answers โ€” it scaffolds understanding, asks checkpoint questions, handles student misconceptions with a 4-tier escalation protocol, and redirects out-of-domain queries.


Model Details

Property Value
Base model unsloth/llama-3.1-8b-instruct-bnb-4bit
Fine-tuning method QLoRA (4-bit NF4 quantization)
LoRA rank 16
LoRA alpha 16
LoRA dropout 0
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training steps 200
Learning rate 2e-4 (cosine schedule, 10 warmup steps)
Effective batch size 8 (2 per device ร— 4 gradient accumulation)
Optimizer AdamW 8-bit
Max sequence length 2,048 tokens
Packing Disabled
Loss masking Student turns masked (-100); model trains on teacher responses only
Platform Google Colab T4 (15GB VRAM, ~13.6GB used)
Framework Unsloth + PEFT + TRL SFTTrainer

Dataset

Training data: Sai345/math-tutor-sft-dataset

Source Examples Description
MathDial (eth-nlped/mathdial) 1,696 Multi-turn math tutoring dialogues. Filtered to self_correctness == "Yes" only. Converted from pipe-delimited format to Llama 3.1 chat template. Teacher tags stripped. License: CC-BY-SA 4.0
Synthetic (Groq Llama 4 Scout 17B) 455 5 typed categories: algebra scaffolding (150), word problem scaffolding (80), misconception correction (120), difficulty adaptation (80), OOD refusal (25). All follow the Ravi persona and 4-tier escalation protocol.
Total 2,151 1,935 train / 216 test (90/10 split)

Each example is a full multi-turn conversation formatted as a single training instance with the system prompt embedded.


Usage

Requirements

pip install transformers peft bitsandbytes accelerate
Downloads last month
39
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Dataset used to train Sai345/llama-3.1-8b-math-tutor