Math tutor — QLoRA merged weights (Modal training)
This directory holds merged, unquantized (bf16) weights from a QLoRA
fine-tuning run on Modal, produced by
scripts/train_qlora_modal.py. The base model
is TinyLlama/TinyLlama-1.1B-Chat-v1.0 unless you overrode --base-model.
Training method (QLoRA)
Training on Modal uses the standard QLoRA stack:
- 4-bit quantization of the base model (NF4, double quantization, bf16 compute).
- prepare_model_for_kbit_training, then LoRA (PEFT) on attention projections q_proj, k_proj, v_proj, o_proj (defaults: r=8, alpha=16, dropout 0.05).
- 8-bit paged AdamW optimizer during SFT.
- After training, the adapter is merged into the base and saved as a normal
causal LM checkpoint (this folder’s
config.json, tokenizer files, and weight shards if present).
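The stack listed above could be expressed roughly as follows. This is a minimal sketch, not a copy of scripts/train_qlora_modal.py: the names QLORA_DEFAULTS and build_qlora_configs are hypothetical, and mapping alpha to lora_alpha, the paged optimizer to the string "paged_adamw_8bit", and bias="none" are assumptions about how the script wires things up. Library imports are deferred so the settings can be inspected without a GPU environment.

```python
"""Sketch of the QLoRA settings described above (names are illustrative)."""

# Values come from the list above; the dict layout itself is an assumption.
QLORA_DEFAULTS = {
    "quant_type": "nf4",
    "double_quant": True,
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "optim": "paged_adamw_8bit",  # assumed spelling of "8-bit paged AdamW"
}


def build_qlora_configs(settings=QLORA_DEFAULTS):
    """Return (BitsAndBytesConfig, LoraConfig) built from the settings.

    Requires torch, transformers, peft, and bitsandbytes to be installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    # 4-bit NF4 quantization with double quantization and bf16 compute.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type=settings["quant_type"],
        bnb_4bit_use_double_quant=settings["double_quant"],
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # LoRA adapters on the four attention projections only.
    lora_config = LoraConfig(
        r=settings["lora_r"],
        lora_alpha=settings["lora_alpha"],
        lora_dropout=settings["lora_dropout"],
        target_modules=settings["target_modules"],
        bias="none",  # assumption: bias terms are left untouched
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config
```

In a typical PEFT workflow the first object is passed as quantization_config to AutoModelForCausalLM.from_pretrained, the loaded model goes through prepare_model_for_kbit_training, and get_peft_model attaches the adapter; the optimizer string would be given to TrainingArguments(optim=...).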
In short, the Modal job is QLoRA: only the LoRA adapter is trained, not all of the base weights.
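To give a sense of scale, the number of trainable adapter parameters can be estimated by hand. The sketch below assumes the default base model's published shape (hidden size 2048, 22 layers, 32 attention heads, 4 key/value heads); these figures are not stated in this README, so check them against config.json if you changed --base-model.

```python
def lora_param_count(in_features, out_features, r):
    """A LoRA pair adds an (r x in) matrix and an (out x r) matrix."""
    return r * (in_features + out_features)


# Assumed TinyLlama-1.1B-Chat-v1.0 shape; verify against config.json.
HIDDEN = 2048
LAYERS = 22
HEADS = 32
KV_HEADS = 4
R = 8  # default LoRA rank from the list above

head_dim = HIDDEN // HEADS      # width of one attention head
kv_dim = head_dim * KV_HEADS    # k_proj / v_proj output width (grouped-query attention)

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, R)    # q_proj
    + lora_param_count(HIDDEN, kv_dim, R)  # k_proj
    + lora_param_count(HIDDEN, kv_dim, R)  # v_proj
    + lora_param_count(HIDDEN, HIDDEN, R)  # o_proj
)
total_adapter_params = per_layer * LAYERS
print(f"trainable LoRA parameters: {total_adapter_params:,}")
```

Under these assumptions the adapter holds about 2.25 million parameters, roughly 0.2% of a 1.1-billion-parameter base.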
Data
Instruction rows are built inside the training image from the project curriculum
via build_instruction_set in scripts/train_qlora.py:
synthetic tutor-style turns in English, French, and Kinyarwanda derived from
the numeracy items (about 684 JSONL records for the default seed).
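If you export the instruction set to disk, a quick sanity check is to count the records and list the field names that occur. The helper below is a hypothetical sketch: it assumes only the JSONL convention of one JSON object per line and makes no claim about the fields that build_instruction_set actually emits or where the file is written.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, sorted_field_names) for a JSONL file."""
    count = 0
    keys = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record)
    return count, sorted(keys)
```

Calling summarize_jsonl on your exported file should report a count close to the figure quoted above when the default seed is used.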
How these files got here
- Run on Modal (example):
  modal run scripts/train_qlora_modal.py
- Pull the merged checkpoint from the math-tutor-checkpoints volume:
  modal volume get math-tutor-checkpoints /math_tutor_merged ./checkpoints/math_tutor_merged
- If your local layout matches this repo, the merged weights and tokenizer should end up under checkpoints/ (or checkpoints/math_tutor_merged/). Copy or symlink so this README.md sits next to the Hub upload.
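After pulling the volume, it can help to confirm the folder looks like a loadable checkpoint before uploading. The helper below is a hypothetical sketch: it assumes tokenizer files have names starting with "tokenizer" and that weights are stored as .safetensors or .bin files, which is the usual save_pretrained layout but is not guaranteed by this README.

```python
from pathlib import Path


def missing_checkpoint_parts(folder):
    """Return a list of the expected checkpoint parts that are absent."""
    folder = Path(folder)
    missing = []
    if not (folder / "config.json").is_file():
        missing.append("config.json")
    # Assumption: tokenizer.json, tokenizer.model, tokenizer_config.json, ...
    if not any(folder.glob("tokenizer*")):
        missing.append("tokenizer files")
    # Assumption: weights are saved as *.safetensors or *.bin shards.
    if not (any(folder.glob("*.safetensors")) or any(folder.glob("*.bin"))):
        missing.append("weight shards")
    return missing
```

An empty list from missing_checkpoint_parts("./checkpoints/math_tutor_merged") suggests the download completed; it does not validate the contents of the files.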
Loading (Transformers)
Replace paths with your actual folder or Hub repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Local checkpoint folder, or a Hub repo id such as "your-username/your-repo".
path = "."

tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,  # the merged weights are stored in bf16
    device_map="auto",  # needs the accelerate package installed
    trust_remote_code=True,
)
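Once tok and model are loaded as above, a single tutoring turn could be generated along these lines. This is a sketch under assumptions: build_messages and generate_reply are hypothetical helper names, it relies on the tokenizer shipping a chat template (as the default chat base model does), and the example question and token budget are arbitrary.

```python
def build_messages(question):
    """Wrap a learner question in the chat-message format."""
    return [{"role": "user", "content": question}]


def generate_reply(model, tok, question, max_new_tokens=64):
    """Generate one reply using the tokenizer's chat template."""
    inputs = tok.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tok.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

For example, print(generate_reply(model, tok, "What is 7 + 5?")) would print the model's answer to one numeracy question.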
Pushing to Hugging Face Hub
Either pass --push-to your-username/repo-name with HF_TOKEN set when running
Modal, or upload this folder after training:
huggingface-cli upload your-username/your-repo . --repo-type model
Use this file as the repo README.md on the Hub (same content is valid as
the model card).
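The upload can also be done from Python with the huggingface_hub client instead of the CLI. The sketch below is one possible approach, not what the Modal script does: resolve_token and upload_checkpoint are hypothetical names, and creating the repo first with exist_ok=True is an added convenience.

```python
import os


def resolve_token(env=os.environ):
    """Read HF_TOKEN from the environment, failing early if it is unset."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return token


def upload_checkpoint(folder, repo_id, token=None):
    """Upload a checkpoint folder to a model repo on the Hub.

    Requires the huggingface_hub package; imported lazily on purpose.
    """
    from huggingface_hub import HfApi

    api = HfApi(token=token or resolve_token())
    # Create the target repo if it does not exist yet.
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    return api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="model",
    )
```

Calling upload_checkpoint(".", "your-username/your-repo") from this folder mirrors the CLI command above.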
Limits
- Intended as a small numeracy / feedback-style language head, not general chat.
- Merged weights are not int4 GGUF; GGUF export is a separate step
  (llama.cpp convert/quantize) if you need that format.
- Base model and dataset licenses apply in addition to this project’s MIT license for the training code and generated adapter/merge recipe.
Citation
If you use this checkpoint, cite the TinyLlama base model and link your Hub repo or this project’s repository as appropriate.
Model tree for Ahmed5/AIMS_KTT_Day3
Base model
TinyLlama/TinyLlama-1.1B-Chat-v1.0