KTT Math Tutor – Models

Companion model artefacts for the AIMS KTT Hackathon Tier-3 submission S2.T3.1 AI Math Tutor for Early Learners. Source code and training scripts: https://github.com/DrUkachi/ktt-math-tutor.

What's here

Subfolder / file                      Size     Role
whisper-tiny-child-lora-ct2int8/      44 MB    child-voice LoRA-tuned Whisper-tiny, merged, CTranslate2 int8 for CPU
tinyllama-numeracy-qlora-adapter/     21 MB    QLoRA adapter (r=16, NF4 base) trained on 200 synthetic numeracy instructions
tinyllama-numeracy-Q4_K_M.gguf        637 MB   the adapter merged into TinyLlama-1.1B and quantised to Q4_K_M
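
Each artefact can be fetched on its own with the huggingface_hub client; a minimal sketch using the file and folder names listed above (not part of the repo's own tooling):

from huggingface_hub import hf_hub_download, snapshot_download

# fetch only the merged Q4_K_M GGUF
gguf_path = hf_hub_download(
    repo_id="DrUkachi/ktt-math-tutor-models",
    filename="tinyllama-numeracy-Q4_K_M.gguf",
)

# fetch only the CTranslate2 Whisper folder; it ends up under
# <asr_root>/whisper-tiny-child-lora-ct2int8/
asr_root = snapshot_download(
    repo_id="DrUkachi/ktt-math-tutor-models",
    allow_patterns="whisper-tiny-child-lora-ct2int8/*",
)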

How to use

ASR (child-voice Whisper)

from faster_whisper import WhisperModel

# LoRA-merged Whisper-tiny, exported to CTranslate2 int8 (CPU-friendly)
model = WhisperModel("DrUkachi/ktt-math-tutor-models",
                     device="cpu", compute_type="int8",
                     local_files_only=False)

wav = "clip.wav"  # path to a child-voice audio file (example)
segments, _ = model.transcribe(wav, language="en", beam_size=1)
print(" ".join(seg.text.strip() for seg in segments))

Or, via the tutor's wrapper (auto-picks tutor/asr_model/ from the repo):

git clone https://github.com/DrUkachi/ktt-math-tutor
cd ktt-math-tutor && pip install -r requirements.txt
python demo.py

Evaluation on the in-distribution child-voice corpus (36 clips, pitch-shifted by +3/+4.5/+6 semitones):

  • Baseline vanilla Whisper-tiny int8: WER 0.7048
  • This LoRA-tuned model: WER 0.0000

See scripts/eval_wer.py and metrics/wer_*.json in the code repo.
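
WER here is the standard word error rate. For orientation, a minimal sketch of the comparison using the jiwer package, with made-up strings; the repo's actual harness is scripts/eval_wer.py:

from jiwer import wer

reference  = "three plus four is seven"   # ground-truth transcript (example)
hypothesis = "three plus for is seven"    # ASR output (example)
print(wer(reference, hypothesis))         # 0.2: one substitution over five reference words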

LLM head (weekly parent summary)

from llama_cpp import Llama

# download tinyllama-numeracy-Q4_K_M.gguf from this repo first
# (e.g. via huggingface_hub.hf_hub_download or Llama.from_pretrained)
llm = Llama(
    model_path="tinyllama-numeracy-Q4_K_M.gguf",
    n_ctx=512, n_threads=4, verbose=False,
)
r = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a warm math tutor. One short sentence."},
    {"role": "user", "content": "The child is strong at addition; needs practice on number sense."},
])
print(r["choices"][0]["message"]["content"])

Or via the tutor's wrapper (tutor/llm_head.py): the model is resolved in this order: $TUTOR_LLM_GGUF → this tuned Q4_K_M → community TinyLlama base → deterministic fallback. The LLM is not in the inference hot path; it runs once per learner per week to produce the voiced parent summary.
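
A rough sketch of that resolution order, for illustration only; the actual logic, including the base-model and deterministic fallbacks, lives in tutor/llm_head.py:

import os
from pathlib import Path

def resolve_gguf():
    # 1. explicit override via $TUTOR_LLM_GGUF
    override = os.environ.get("TUTOR_LLM_GGUF")
    if override and Path(override).is_file():
        return override
    # 2. the tuned Q4_K_M shipped in this repo
    tuned = Path("tinyllama-numeracy-Q4_K_M.gguf")
    if tuned.is_file():
        return str(tuned)
    # 3./4. community TinyLlama base, then deterministic fallback,
    #       both handled by the wrapper itself
    return None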

Training recipes

  • ASR LoRA: scripts/train_whisper_lora.py – 4 epochs on an L4 GPU, LoRA r=16 on q_proj/v_proj, merge, export to CT2 int8.
  • LLM QLoRA: scripts/train_llm_qlora.py – 2 epochs on an L4 GPU, NF4 4-bit base, LoRA r=16 on q/k/v/o_proj, merge, convert to GGUF via the pinned llama.cpp b4400 conversion script, quantise to Q4_K_M via the llama_cpp.llama_model_quantize Python binding (configs sketched below).
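
A minimal sketch of the corresponding LoRA/QLoRA configurations, assuming the standard peft and bitsandbytes APIs; lora_alpha and the compute dtype are illustrative, and the authoritative settings are in the two scripts above:

import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# ASR LoRA (train_whisper_lora.py): r=16 on Whisper's attention q/v projections
asr_lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

# LLM QLoRA (train_llm_qlora.py): NF4 4-bit base + r=16 LoRA on q/k/v/o projections
nf4_base = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
llm_lora = LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)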

License

MIT. Attribution welcomed; not required.
