monty — a LoRA persona adapter for Qwen2.5-0.5B-Instruct

A small LoRA adapter that gives Qwen2.5-0.5B-Instruct an opinionated, slightly-grumpy "Monty" voice. Trained as the Phase A milestone of learn-you-an-sft — a from-scratch tour of supervised fine-tuning.

The base model weights are not distributed here. This repo ships only the LoRA adapter (adapter_model.safetensors, ~tens of MB) plus tokenizer config. You attach it to the base model at load time.

Model Details

Model Description

Developed by: Arun Manivannan (@arunma)
Model type: Causal language model, LoRA adapter (PEFT)
Language(s) (NLP): English
License: Apache 2.0 (inherits from base model)
Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct

Model Sources

Repository: https://github.com/arunma/learn-you-an-sft
Training script: runs/sft_v1_trl/train.py

Uses

Direct Use

Educational. Demonstrates how about 14k persona-shaped Q&A pairs can shift a small instruction-tuned model's voice and disposition without touching factual knowledge.

Out-of-Scope Use

Production assistants. This is a 0.5B model trained on ~14k synthetic pairs — it will hallucinate, contradict itself, and produce dated information.
Safety-critical workflows.
Any setting that rules out models fine-tuned on another model's synthetic outputs (here, Gemini), unless you carry out a separate license review.

Bias, Risks, and Limitations

Persona injection over knowledge: the adapter changes how the model talks, not what it knows. Underlying factual gaps and biases of Qwen2.5-0.5B remain.
Synthetic data lineage: training pairs were distilled from Gemini Pro. Any systematic biases in Gemini's outputs propagate here.
Small corpus, small model: 14k examples on a 0.5B base produce a noticeably opinionated voice but do not guarantee consistency across topics.
No safety tuning: no RLHF/DPO step. The adapter does not refuse harmful requests any better than the base.

Recommendations

Treat outputs as drafts, not facts. If you fork this for your own persona, plan on a separate evaluation pass.

How to Get Started

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "arunma/monty"

# The tokenizer config ships with the adapter repo; the weights come from the base repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16", device_map="auto")
# Attach the LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are Monty."},
    {"role": "user", "content": "Should I learn Rust?"},
]
# return_dict=True yields input_ids and attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_length:], skip_special_tokens=True))
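To make the role of add_generation_prompt concrete, here is a minimal sketch of the ChatML-style text that the Qwen chat template renders before tokenization. The helper name render_chatml is hypothetical, and the real template stored in the tokenizer may differ in edge cases such as tool calls or a missing system message, so treat this as an approximation for reading prompts, not a replacement for apply_chat_template.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML text produced for a list of chat messages."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Monty."},
    {"role": "user", "content": "Should I learn Rust?"},
]
prompt = render_chatml(messages)
print(prompt)
```

Without the trailing assistant header the model has no cue that it is its turn to speak, which is why the loading example passes add_generation_prompt=True.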

Training Details

Training Data

Source: synthetic Q&A pairs generated by Gemini Pro using a Monty persona prompt, plus a small handcrafted seed.
Pipeline: ingest → normalize → language filter (fastText lid.176, English only) → MinHash near-dedup → 95/5 train/val split.
Counts: 14,768 loaded → 14,566 after filter+dedup → 14,293 train / 273 val.
Data lives in the repo's data/ pipeline.
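The MinHash near-dedup step in the pipeline above can be illustrated with a small standard-library sketch. This is not the repo's implementation: the function names, the word 3-gram shingles, the 64 hash functions, and the 0.7 threshold are all assumptions chosen for illustration, and the pairwise comparison is quadratic, whereas a real pipeline would more likely use LSH banding.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of word n-grams of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(items, num_perm=64):
    """Take the minimum of each seeded hash; each seed mimics one permutation."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_dedup(texts, threshold=0.7):
    """Keep the first text of each near-duplicate group (quadratic scan)."""
    kept, kept_signatures = [], []
    for text in texts:
        signature = minhash_signature(shingles(text))
        if all(estimated_jaccard(signature, other) < threshold for other in kept_signatures):
            kept.append(text)
            kept_signatures.append(signature)
    return kept


texts = [
    "Should I learn Rust this year?",
    "should I  learn rust this year?",   # differs only in case and spacing
    "Tabs or spaces for Python code?",
]
deduped = near_dedup(texts)
print(deduped)
```

The second text collapses into the first because both produce identical shingle sets, while the third shares no shingles with either and is kept.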

Training Procedure

Framework: TRL SFTTrainer with assistant_only_loss=True (loss masked to assistant tokens only via the Qwen chat template).
Adapter: LoRA, attention-only target modules (q_proj, k_proj, v_proj, o_proj).
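The assistant-only loss mentioned above comes down to label masking: tokens outside assistant turns receive the label -100, which PyTorch's cross-entropy loss ignores by default. The sketch below shows only that idea on plain lists; the helper name and the token ids are hypothetical, and TRL derives the assistant mask from the chat template rather than from a hand-written list.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_non_assistant(input_ids, assistant_mask):
    """Copy input_ids into labels, hiding every token outside assistant turns."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Hypothetical token ids: 4 prompt tokens followed by 3 assistant tokens.
input_ids = [101, 102, 103, 104, 201, 202, 203]
assistant_mask = [0, 0, 0, 0, 1, 1, 1]
labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, 201, 202, 203]
```

Only the three assistant tokens contribute to the loss, so the model is trained to produce Monty's replies and not to reproduce system or user text.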

Hyperparameters

Setting	Value
LoRA rank `r`	16
LoRA alpha	32
LoRA dropout	0.05
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`
Epochs	3
Per-device batch size	8
Max sequence length	1024
Learning rate	2e-4
LR scheduler	cosine
Warmup ratio	0.03
Weight decay	0.0
Optimizer	AdamW (default)
Precision	bfloat16
Gradient checkpointing	enabled
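As a rough worked example, the rank and target modules in the table determine how many parameters the adapter trains. The model dimensions below (hidden size 896, 24 layers, 14 attention heads, 2 key/value heads) are assumptions taken from the published Qwen2.5-0.5B configuration and are not stated in this card, so check them against the base model's config.json before relying on the result.

```python
# Assumed Qwen2.5-0.5B dimensions (not stated in this card; verify in config.json).
HIDDEN_SIZE = 896
NUM_LAYERS = 24
NUM_HEADS = 14
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64

# Values from the hyperparameter table.
RANK = 16
ALPHA = 32

# (in_features, out_features) of each targeted linear layer.
projection_shapes = {
    "q_proj": (HIDDEN_SIZE, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN_SIZE),
}


def lora_param_count(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


params_per_layer = sum(
    lora_param_count(n_in, n_out, RANK) for n_in, n_out in projection_shapes.values()
)
total_lora_params = params_per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(params_per_layer)   # 90112
print(total_lora_params)  # 2162688
print(scaling)            # 2.0
```

Under these assumptions the adapter trains about 2.2M parameters, well under 1% of the 0.5B base, and each low-rank update is scaled by alpha / r = 2.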

Compute

Hardware: 1× NVIDIA RTX 5090 (32 GB), RunPod
Software: PyTorch 2.x, Transformers 4.4x, TRL 0.18+, PEFT 0.11+
Approximate training time: 20-25 minutes for 3 epochs over 14.3k pairs
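The train split size, batch size, epoch count, and warmup ratio reported in this card imply a step count and learning-rate curve that can be sketched with plain arithmetic. The sketch assumes one GPU, no gradient accumulation, and no sequence packing (none of which the card states), and lr_at is a hypothetical helper that mirrors the usual linear-warmup-then-cosine shape, not code from the training script.

```python
import math

TRAIN_EXAMPLES = 14_293
PER_DEVICE_BATCH = 8
NUM_DEVICES = 1
GRAD_ACCUM_STEPS = 1  # assumption: not stated in the card
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)  # 1787
total_steps = steps_per_epoch * EPOCHS                         # 5361
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)           # 161


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

If the actual run used gradient accumulation or packing, the step counts shrink accordingly, but the shape of the schedule stays the same.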

Evaluation

Evaluation currently relies only on training-loop signals (train loss, eval loss on the 273-pair val split, mean token accuracy). A judge-based persona-fidelity eval (Lesson 8 of the parent project) is planned but not yet attached to this checkpoint.

Framework versions

PEFT 0.11+
TRL 0.18+
Transformers 4.4x
PyTorch 2.x
