Instructions to use DrUkachi/ktt-math-tutor-models with libraries, inference providers, notebooks, and local apps.

Libraries

How to use DrUkachi/ktt-math-tutor-models with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="DrUkachi/ktt-math-tutor-models",
	filename="tinyllama-numeracy-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "The child is strong at addition; needs practice on number sense."}
	]
)

Local Apps

llama.cpp

How to use DrUkachi/ktt-math-tutor-models with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DrUkachi/ktt-math-tutor-models:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf DrUkachi/ktt-math-tutor-models:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DrUkachi/ktt-math-tutor-models:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf DrUkachi/ktt-math-tutor-models:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf DrUkachi/ktt-math-tutor-models:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf DrUkachi/ktt-math-tutor-models:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf DrUkachi/ktt-math-tutor-models:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf DrUkachi/ktt-math-tutor-models:Q4_K_M

Use Docker

docker model run hf.co/DrUkachi/ktt-math-tutor-models:Q4_K_M
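
Once `llama-server` is running through any of the routes above, it can be queried over HTTP. The following is a minimal sketch using only the Python standard library. It assumes the server listens on llama.cpp's default port 8080 and exposes the OpenAI-style `/v1/chat/completions` route; the helper names `build_request` and `ask` are illustrative and do not come from the model repo.

```python
import json
import urllib.request

# Assumption: llama-server was started locally and uses its default port (8080).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(user_text: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (illustrative helper)."""
    payload = {
        "messages": [
            {"role": "system", "content": "You are a warm math tutor. One short sentence."},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 64,  # keep the reply short; adjust as needed
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(user_text), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(ask("The child is strong at addition; needs practice on number sense."))
```

If the server was started with a different host or port, pass the matching URL to `build_request`.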

Ollama
How to use DrUkachi/ktt-math-tutor-models with Ollama:
```
ollama run hf.co/DrUkachi/ktt-math-tutor-models:Q4_K_M
```

Unsloth Studio

How to use DrUkachi/ktt-math-tutor-models with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for DrUkachi/ktt-math-tutor-models to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for DrUkachi/ktt-math-tutor-models to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for DrUkachi/ktt-math-tutor-models to start chatting

Docker Model Runner
How to use DrUkachi/ktt-math-tutor-models with Docker Model Runner:
```
docker model run hf.co/DrUkachi/ktt-math-tutor-models:Q4_K_M
```

Lemonade

How to use DrUkachi/ktt-math-tutor-models with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull DrUkachi/ktt-math-tutor-models:Q4_K_M

Run and chat with the model

lemonade run user.ktt-math-tutor-models-Q4_K_M

List all available models

lemonade list

KTT Math Tutor — Models

Companion model artefacts for the AIMS KTT Hackathon Tier-3 submission S2.T3.1 AI Math Tutor for Early Learners. Source code and training scripts: https://github.com/DrUkachi/ktt-math-tutor.

What's here

| Subfolder / file | Size | Role |
|---|---|---|
| `whisper-tiny-child-lora-ct2int8/` | 44 MB | child-voice LoRA-tuned Whisper-tiny, merged, CTranslate2 int8 for CPU |
| `tinyllama-numeracy-qlora-adapter/` | 21 MB | QLoRA adapter (r=16, NF4 base) trained on 200 synthetic numeracy instructions |
| `tinyllama-numeracy-Q4_K_M.gguf` | 637 MB | the adapter merged into TinyLlama-1.1B and quantised to Q4_K_M |

How to use

ASR (child-voice Whisper)

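import os
from faster_whisper import WhisperModel
from huggingface_hub import snapshot_download

# The CT2 model sits in a subfolder of this repo (see the table above), so fetch that folder and load it by path.
repo_dir = snapshot_download("DrUkachi/ktt-math-tutor-models",
                             allow_patterns="whisper-tiny-child-lora-ct2int8/*")
model = WhisperModel(os.path.join(repo_dir, "whisper-tiny-child-lora-ct2int8"),
                     device="cpu", compute_type="int8")
segments, _ = model.transcribe(wav, language="en", beam_size=1)  # wav: path to an audio file or a 16 kHz float32 array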

Or, via the tutor's wrapper (auto-picks tutor/asr_model/ from the repo):

git clone https://github.com/DrUkachi/ktt-math-tutor
cd ktt-math-tutor && pip install -r requirements.txt
python demo.py

Evaluation on the in-distribution child-voice corpus (36 clips, pitch-shifted up by 3, 4.5, and 6 semitones):

Baseline vanilla Whisper-tiny int8: WER 0.7048
This LoRA-tuned model: WER 0.0000

See scripts/eval_wer.py and metrics/wer_*.json in the code repo.
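
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows that standard per-utterance definition. It is not the code in scripts/eval_wer.py, which may normalise text or aggregate across clips differently; the function name `word_error_rate` and the lowercase-and-split normalisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0 means every reference word was reproduced exactly; values above 1 are possible when the hypothesis contains many extra words.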

LLM head (weekly parent summary)

from llama_cpp import Llama
llm = Llama(
    model_path="tinyllama-numeracy-Q4_K_M.gguf",
    n_ctx=512, n_threads=4, verbose=False,
)
r = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a warm math tutor. One short sentence."},
    {"role": "user", "content": "The child is strong at addition; needs practice on number sense."},
])
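
The call above returns an OpenAI-style response dictionary, so the generated sentence has to be pulled out of it. A small sketch of that step follows; `reply_text` is a hypothetical helper, and the `example` dictionary is a made-up stand-in for `r` so the snippet runs without loading the model.

```python
def reply_text(response: dict) -> str:
    """Return the assistant text from a chat completion response."""
    return response["choices"][0]["message"]["content"].strip()


# Made-up stand-in with the same shape as the value returned by
# llm.create_chat_completion(...); real model output will differ.
example = {
    "choices": [
        {"message": {"role": "assistant", "content": " Great work on addition! "}}
    ]
}
print(reply_text(example))
```

With the real model, call `reply_text(r)` on the result of `create_chat_completion`.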

Or via the tutor's wrapper (tutor/llm_head.py), which resolves the model in this order: $TUTOR_LLM_GGUF → this tuned Q4_K_M → community TinyLlama base → deterministic fallback. The LLM is not on the inference hot path; it runs once per learner per week to produce the voiced parent summary.
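
The resolution order can be pictured as a simple fallback chain. The sketch below is not the code in tutor/llm_head.py: the function names, the base-model filename, the template sentence, and the choice to skip an override that points at a missing file are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TUNED_GGUF = Path("tinyllama-numeracy-Q4_K_M.gguf")
# Hypothetical filename for a community TinyLlama base model.
BASE_GGUF = Path("tinyllama-1.1b-chat.Q4_K_M.gguf")


def resolve_gguf(
    env: Mapping[str, str] = os.environ,
    tuned: Path = TUNED_GGUF,
    base: Path = BASE_GGUF,
) -> Optional[Path]:
    """Return the first model file that exists, or None if no model is available."""
    override = env.get("TUTOR_LLM_GGUF")
    candidates = [Path(override)] if override else []
    candidates += [tuned, base]
    for path in candidates:
        if path.is_file():
            return path
    return None  # caller should use the deterministic fallback


def fallback_summary(strong: str, weak: str) -> str:
    """Deterministic fallback: a fixed template, no model required."""
    return f"Your child is strong at {strong} and would benefit from more practice on {weak}."


model_path = resolve_gguf()
if model_path is None:
    print(fallback_summary("addition", "number sense"))
else:
    print(f"Would load {model_path}")
```

Keeping a template at the end of the chain means the weekly summary can still be produced on a device where no GGUF file is present.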

Training recipes

ASR LoRA: scripts/train_whisper_lora.py — 4 epochs on L4 GPU, LoRA r=16 on q_proj/v_proj, merge, export to CT2 int8.
LLM QLoRA: scripts/train_llm_qlora.py — 2 epochs on L4 GPU, NF4 4-bit base, LoRA r=16 on q/k/v/o_proj, merge, convert to GGUF via pinned llama.cpp b4400 script, quantise to Q4_K_M via the llama_cpp.llama_model_quantize Python binding.

License

MIT. Attribution welcomed; not required.
