monty: a LoRA persona adapter for Qwen2.5-0.5B-Instruct
A small LoRA adapter that gives Qwen2.5-0.5B-Instruct an opinionated, slightly grumpy "Monty" voice. Trained as the Phase A milestone of learn-you-an-sft, a from-scratch tour of supervised fine-tuning.
The base model weights are not distributed here. This repo ships only the LoRA adapter (adapter_model.safetensors, tens of MB) plus tokenizer config. You load the base model separately and apply the adapter on top of it at load time.
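Applying or merging a LoRA adapter amounts to adding a scaled low-rank product to each targeted weight matrix: W' = W + (alpha / r) * B * A. The sketch below shows that update on a toy matrix in plain Python. The shapes and values are made up for illustration (r=1 keeps it readable); with this adapter's settings of alpha=32 and r=16 the scale would also be 2.0.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A), i.e. the weight with the adapter folded in."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example (hypothetical values): out=2, in=3, r=1, alpha=2 -> scale 2.0
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]  # r x in
lora_b = [[0.5], [0.0]]     # out x r
merged = merge_lora(weight, lora_a, lora_b, alpha=2, r=1)
print(merged)
```

With PEFT, calling `merge_and_unload()` on the loaded `PeftModel` performs this folding for every adapted layer and returns a plain Transformers model, which can be convenient for export.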
Model Details
Model Description
- Developed by: Arun Manivannan (@arunma)
- Model type: Causal language model, LoRA adapter (PEFT)
- Language(s) (NLP): English
- License: Apache 2.0 (inherits from base model)
- Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct
Model Sources
- Repository: https://github.com/arunma/learn-you-an-sft
- Training script: runs/sft_v1_trl/train.py
Uses
Direct Use
Educational. Demonstrates how a few thousand persona-shaped Q&A pairs can shift a small instruction-tuned model's voice and disposition without touching factual knowledge.
Out-of-Scope Use
- Production assistants. This is a 0.5B model trained on ~14k synthetic pairs; it will hallucinate, contradict itself, and produce dated information.
- Safety-critical workflows.
- Any use that requires a model free of synthetic training data from another model (Gemini), unless you have completed a separate license review.
Bias, Risks, and Limitations
- Persona injection over knowledge: the adapter changes how the model talks, not what it knows. Underlying factual gaps and biases of Qwen2.5-0.5B remain.
- Synthetic data lineage: training pairs were distilled from Gemini Pro. Any systematic biases in Gemini's outputs propagate here.
- Small corpus, small model: 14k examples on a 0.5B base produce a noticeably opinionated voice but do not guarantee consistency across topics.
- No safety tuning: no RLHF/DPO step. The adapter does not refuse harmful requests any better than the base.
Recommendations
Treat outputs as drafts, not facts. If you fork this for your own persona, plan on a separate evaluation pass.
How to Get Started
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "arunma/monty"

# The adapter repo ships the tokenizer config; the base weights come from base_id.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16", device_map="auto")

# Apply the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are Monty."},
    {"role": "user", "content": "Should I learn Rust?"},
]

# Render the chat template to token IDs; add_generation_prompt opens the assistant turn.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
Training Details
Training Data
- Source: synthetic Q&A pairs generated by Gemini Pro using a Monty persona prompt, plus a small handcrafted seed.
- Pipeline: ingest → normalize → language filter (fastText `lid.176`, English only) → MinHash near-dedup → 95/5 train/val split.
- Counts: 14,768 loaded → 14,566 after filter+dedup → 14,293 train / 273 val.
- Data lives in the repo's data/pipeline
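For readers unfamiliar with the near-dedup step, here is a minimal MinHash sketch in plain Python. It is an illustration of the technique, not the repo's implementation: the word 3-gram shingles, the 64 hash functions, the 0.8 threshold, and all function names are assumptions chosen for the example.

```python
import hashlib


def shingles(text, n=3):
    """Split text into a set of lowercase word n-grams."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_perm=64):
    """For each of num_perm salted hash functions, keep the minimum hash over all shingles."""
    items = [s.encode("utf-8") for s in shingles(text)]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(hashlib.blake2b(item, digest_size=8, salt=salt).digest(), "big")
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_dedup(texts, threshold=0.8):
    """Keep a text only if it is not too similar to any text already kept."""
    kept, kept_sigs = [], []
    for text in texts:
        sig = minhash_signature(text)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept


docs = [
    "the cat sat on the mat today",
    "the cat sat on the mat today",
    "an entirely unrelated sentence about fine tuning",
]
kept = near_dedup(docs)
print(len(kept))
```

The pairwise comparison here is quadratic in the number of documents; at larger scale, MinHash is usually paired with locality-sensitive hashing so that only likely duplicates are compared.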
Training Procedure
- Framework: TRL `SFTTrainer` with `assistant_only_loss=True` (loss masked to assistant tokens only via the Qwen chat template).
- Adapter: LoRA, attention-only target modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`).
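To make assistant-only loss concrete: tokens outside the assistant turn receive the label -100, which PyTorch's cross-entropy loss ignores by default, so only assistant tokens contribute gradient. The sketch below shows just that masking idea. It is not TRL's code, and the token IDs and mask are invented for illustration; in practice the mask is derived from the chat template.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_labels(input_ids, assistant_mask):
    """Copy token IDs into labels, replacing non-assistant positions with IGNORE_INDEX."""
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Hypothetical sequence: five system/user/template tokens, then three assistant tokens.
input_ids = [101, 7, 8, 9, 201, 42, 43, 44]
assistant_mask = [0, 0, 0, 0, 0, 1, 1, 1]
labels = mask_labels(input_ids, assistant_mask)
print(labels)
```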
Hyperparameters
| Setting | Value |
|---|---|
| LoRA rank `r` | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Epochs | 3 |
| Per-device batch size | 8 |
| Max sequence length | 1024 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.0 |
| Optimizer | AdamW (default) |
| Precision | bfloat16 |
| Gradient checkpointing | enabled |
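The schedule rows in the table can be sketched as a step-to-learning-rate function. This is a plain-Python approximation of a cosine schedule with linear warmup, not code from the training script. It assumes a single GPU with no gradient accumulation (accumulation is not stated in this card), a partial final batch in each epoch, and a warmup length of `ceil(total_steps * warmup_ratio)`.

```python
import math


def cosine_lr(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup from 0 to base_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count: 14,293 training pairs, batch size 8, 3 epochs, no accumulation.
steps_per_epoch = math.ceil(14_293 / 8)
total_steps = steps_per_epoch * 3
warmup_steps = math.ceil(total_steps * 0.03)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

If the actual run used gradient accumulation, the number of optimizer steps would shrink accordingly, but the shape of the curve would stay the same.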
Compute
- Hardware: 1× NVIDIA RTX 5090 (32 GB), RunPod
- Software: PyTorch 2.x, Transformers 4.4x, TRL 0.18+, PEFT 0.11+
- Approximate training time: ~20-25 minutes for 3 epochs over 14.3k pairs
Evaluation
Currently uses only training-loop signals (train loss, eval loss on the 273-pair val split, mean token accuracy). A judge-based persona-fidelity eval (Lesson 8 of the parent project) is planned but not yet attached to this checkpoint.
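As a rough illustration of the mean token accuracy signal, the sketch below computes the share of supervised positions where the predicted token matches the label, skipping positions masked with -100. It assumes predictions are already aligned with their labels (the usual one-position shift has been applied) and is not the trainer's own implementation; the token IDs are invented.

```python
IGNORE_INDEX = -100  # masked positions are excluded from the metric


def mean_token_accuracy(predicted_ids, labels):
    """Fraction of non-masked positions where the prediction equals the label."""
    pairs = [(p, y) for p, y in zip(predicted_ids, labels) if y != IGNORE_INDEX]
    if not pairs:
        return 0.0
    return sum(p == y for p, y in pairs) / len(pairs)


# Hypothetical example: two masked prompt positions, three assistant positions.
predicted = [5, 6, 42, 43, 99]
labels = [-100, -100, 42, 43, 44]
acc = mean_token_accuracy(predicted, labels)
print(acc)
```

A metric like this tracks how well the adapter reproduces the training text token by token; it says nothing on its own about persona fidelity, which is why a judge-based evaluation is a natural follow-up.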
Framework versions
- PEFT 0.11+
- TRL 0.18+
- Transformers 4.4x
- PyTorch 2.x