How to use Ahmed5/AIMS_KTT_Day3 with libraries and local apps.

Libraries

How to use Ahmed5/AIMS_KTT_Day3 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ahmed5/AIMS_KTT_Day3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ahmed5/AIMS_KTT_Day3")
model = AutoModelForCausalLM.from_pretrained("Ahmed5/AIMS_KTT_Day3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use Ahmed5/AIMS_KTT_Day3 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it runs in the foreground, so send requests from a second terminal):
vllm serve "Ahmed5/AIMS_KTT_Day3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ahmed5/AIMS_KTT_Day3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
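Both servers in this section expose an OpenAI-compatible chat endpoint, so the curl call can also be made from Python. The following is a minimal standard-library sketch, assuming a server is already listening at `BASE_URL` (port 8000 for the vLLM command above, 30000 for the SGLang commands); `build_chat_payload`, `extract_reply`, and `chat` are hypothetical helper names, and the `max_tokens` default is an arbitrary illustrative value.

```python
import json
import urllib.request

# Assumption: a vLLM server (port 8000) or SGLang server (port 30000) is already running.
BASE_URL = "http://localhost:8000"
MODEL_ID = "Ahmed5/AIMS_KTT_Day3"


def build_chat_payload(prompt: str, model: str = MODEL_ID, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style /v1/chat/completions request body for one user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, not a server requirement
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST a single-turn chat request and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.loads(resp.read().decode("utf-8")))
```

For example, `chat("What is the capital of France?")` targets the vLLM server, and `chat("What is the capital of France?", base_url="http://localhost:30000")` targets SGLang. Keeping payload construction and response parsing in separate functions lets them be checked without a running server.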

Use Docker

```
docker model run hf.co/Ahmed5/AIMS_KTT_Day3
```

SGLang

How to use Ahmed5/AIMS_KTT_Day3 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it runs in the foreground, so send requests from a second terminal):
python3 -m sglang.launch_server \
    --model-path "Ahmed5/AIMS_KTT_Day3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ahmed5/AIMS_KTT_Day3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ahmed5/AIMS_KTT_Day3" \
        --host 0.0.0.0 \
        --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ahmed5/AIMS_KTT_Day3",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


Math tutor — QLoRA merged weights (Modal training)

This directory holds the merged, unquantized bf16 weights from a QLoRA fine-tuning run on Modal, produced by scripts/train_qlora_modal.py. The base model is TinyLlama/TinyLlama-1.1B-Chat-v1.0 unless --base-model was overridden.

Training method (QLoRA)

Training on Modal uses the standard QLoRA stack:

4-bit quantization of the base model (NF4, double quantization, bf16 compute).
prepare_model_for_kbit_training, followed by LoRA adapters (PEFT) on the attention projections q_proj, k_proj, v_proj, and o_proj (defaults: r=8, alpha=16, dropout=0.05).
8-bit paged AdamW optimizer during SFT.
After training, the adapter is merged into the base model and saved as a standard causal LM checkpoint (this folder’s config.json, tokenizer files, and weight shards, if present).
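The merge step in the last item follows from the LoRA update rule: an adapted layer computes W x + (alpha / r) * B A x, so folding the adapter in gives W' = W + (alpha / r) * B A, which with the defaults above (r=8, alpha=16) is a scale of 2. The sketch below is a toy, dependency-free illustration on small Python lists, not the training script's code; PEFT performs the equivalent operation on tensors when merging. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Fold a LoRA update into a frozen weight: W' = W + (alpha / r) * B @ A.

    Shapes: w is d_out x d_in, lora_a is r x d_in, lora_b is d_out x r.
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # d_out x d_in, rank at most r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_forward(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int, x: Vector) -> Vector:
    """Unmerged forward pass: W x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(w, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * l for b, l in zip(base, low_rank)]
```

Because the merged weight reproduces the adapted layer exactly, the saved checkpoint needs no PEFT dependency at inference time; the trade-off is that the adapter can no longer be detached from the merged file.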

In short, the Modal job uses QLoRA; it does not fine-tune all of the base weights.
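To see why QLoRA trains only a small fraction of the model, the adapter size can be counted directly: each adapted d_out x d_in projection gains r * (d_in + d_out) trainable parameters. This sketch assumes TinyLlama's published architecture (hidden size 2048, 32 attention heads, 4 key/value heads, 22 layers); verify these values against the base model's config.json before relying on the numbers. The function names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to one frozen d_out x d_in weight."""
    return r * (d_in + d_out)


def attention_lora_params(hidden: int, n_heads: int, n_kv_heads: int, n_layers: int, r: int) -> int:
    """Trainable LoRA parameters on q/k/v/o in Llama-style blocks with grouped-query attention."""
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim  # output size of k_proj and v_proj under GQA
    per_layer = (
        lora_param_count(hidden, hidden, r)    # q_proj
        + lora_param_count(hidden, kv_dim, r)  # k_proj
        + lora_param_count(hidden, kv_dim, r)  # v_proj
        + lora_param_count(hidden, hidden, r)  # o_proj
    )
    return n_layers * per_layer


# Assumed TinyLlama-1.1B architecture values (check the base model's config.json).
TINYLLAMA = dict(hidden=2048, n_heads=32, n_kv_heads=4, n_layers=22)
trainable = attention_lora_params(r=8, **TINYLLAMA)
```

With r=8 this gives 2,252,800 trainable parameters, roughly 0.2% of the ~1.1B base weights. PEFT's `print_trainable_parameters()` on the wrapped model should report a figure of the same size.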

Data

Instruction rows are built inside the training image from the project curriculum via build_instruction_set in scripts/train_qlora.py: synthetic tutor-style turns in English, French, and Kinyarwanda derived from the numeracy items (about 684 JSONL records with the default seed).
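A quick sanity check on the generated instruction set is to count records per language and confirm the total is near the expected ~684. The README does not document the record schema, so the `"lang"` key and the language codes used when testing this sketch are hypothetical placeholders; adjust them to whatever build_instruction_set actually writes. `read_jsonl` and `language_counts` are illustrative helper names.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterator

# Hypothetical: the record schema is not documented here, so "lang" is a placeholder key.
LANG_FIELD = "lang"


def read_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one dict per non-blank line, reporting the line number on bad JSON."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def language_counts(path: Path, lang_field: str = LANG_FIELD) -> Counter:
    """Count records per language; records without the field are counted as 'unknown'."""
    return Counter(record.get(lang_field, "unknown") for record in read_jsonl(path))
```

`sum(language_counts(path).values())` then gives the total record count to compare against the expected size.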

How these files got here

Run on Modal (example):
```
modal run scripts/train_qlora_modal.py
```
Pull the merged checkpoint from the math-tutor-checkpoints volume:
```
modal volume get math-tutor-checkpoints /math_tutor_merged ./checkpoints/math_tutor_merged
```
If your local layout matches this repo, the merged weights and tokenizer should end up under checkpoints/ or checkpoints/math_tutor_merged/. Copy or symlink this README.md into that folder so it sits next to the files you upload to the Hub.
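Before uploading, it can help to confirm that the folder actually contains a loadable checkpoint and this README. This is a minimal sketch: config.json is named in this README, while `tokenizer_config.json` and the `*.safetensors` / `*.bin` weight patterns are assumed from the usual Transformers `save_pretrained` output; `checkpoint_problems` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# config.json is named in this README; the tokenizer file and weight patterns
# are the usual Transformers save_pretrained outputs (assumption).
REQUIRED_FILES = ["config.json", "tokenizer_config.json"]
WEIGHT_PATTERNS = ["*.safetensors", "*.bin"]


def checkpoint_problems(folder: Path) -> List[str]:
    """Return human-readable problems; an empty list means the folder looks complete."""
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (folder / name).is_file()]
    if not any(any(folder.glob(pattern)) for pattern in WEIGHT_PATTERNS):
        problems.append("no weight files (*.safetensors or *.bin)")
    if not (folder / "README.md").is_file():
        problems.append("missing README.md (model card)")
    return problems
```

For example, `checkpoint_problems(Path("checkpoints/math_tutor_merged"))` should return an empty list once the download and the README copy are done.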

Loading (Transformers)

Replace paths with your actual folder or Hub repo id.

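```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "."  # or "your-username/your-repo"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Quick smoke test using the chat template saved with the tokenizer
messages = [{"role": "user", "content": "What is 7 + 5?"}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Print only the generated continuation, not the prompt
print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```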

Pushing to the Hugging Face Hub

Either pass --push-to your-username/repo-name with HF_TOKEN set when running Modal, or upload this folder after training:

```
huggingface-cli upload your-username/your-repo . --repo-type model
```
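The same upload can be scripted with the `huggingface_hub` Python library. This sketch assumes `huggingface_hub` is installed and a token is available via `HF_TOKEN` or a previous login; `upload_checkpoint` is a hypothetical wrapper around `HfApi.create_repo` and `HfApi.upload_folder`.

```python
def upload_checkpoint(folder: str, repo_id: str, private: bool = False) -> None:
    """Create the model repo if needed, then upload every file in `folder` to it.

    The token is read from HF_TOKEN or a cached `huggingface-cli login`.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # exist_ok=True makes re-running the upload safe when the repo already exists.
    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```

For example: `upload_checkpoint("./checkpoints/math_tutor_merged", "your-username/your-repo")`.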

Use this file as the repo README.md on the Hub (same content is valid as the model card).

Limits

Intended as a small model for numeracy tutoring and feedback-style replies, not for general chat.
Merged weights are not int4 GGUF; GGUF export is a separate step (llama.cpp convert/quantize) if you need that format.
Base model and dataset licenses apply in addition to this project’s MIT license for the training code and generated adapter/merge recipe.
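If a GGUF file is needed, the llama.cpp route is a conversion to a bf16 GGUF followed by quantization. The sketch below only assembles the two command lines in Python and leaves execution to `run_commands`. The llama.cpp checkout location, the `build/bin/llama-quantize` path (typical of a CMake build), the output file names, and the `Q4_K_M` quantization type are all assumptions to adapt and to check against the llama.cpp version you use.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def gguf_commands(merged_dir: Path, llama_cpp_dir: Path, quant_type: str = "Q4_K_M") -> List[List[str]]:
    """Build the convert and quantize command lines; nothing is executed here."""
    bf16_gguf = merged_dir / "model-bf16.gguf"  # hypothetical output name
    quant_gguf = merged_dir / f"model-{quant_type.lower()}.gguf"  # hypothetical output name
    convert = [
        sys.executable,
        str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(merged_dir),
        "--outfile", str(bf16_gguf),
        "--outtype", "bf16",
    ]
    # Assumed location of the quantize binary after a CMake build of llama.cpp.
    quantize = [
        str(llama_cpp_dir / "build" / "bin" / "llama-quantize"),
        str(bf16_gguf),
        str(quant_gguf),
        quant_type,
    ]
    return [convert, quantize]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Converting to bf16 first keeps the GGUF step lossless relative to these merged weights, so the only precision loss comes from the explicit quantization type you choose.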

Citation

If you use this checkpoint, cite the TinyLlama base model and link your Hub repo or this project’s repository as appropriate.


Model details

Weights format: Safetensors
Model size: 1B params
Tensor type: BF16
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0