monty: a LoRA persona adapter for Qwen2.5-0.5B-Instruct

A small LoRA adapter that gives Qwen2.5-0.5B-Instruct an opinionated, slightly-grumpy "Monty" voice. Trained as the Phase A milestone of learn-you-an-sft, a from-scratch tour of supervised fine-tuning.

The base model weights are not distributed here. This repo ships only the LoRA adapter (adapter_model.safetensors, ~tens of MB) plus the tokenizer config. You apply the adapter on top of the base model at load time.
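
For intuition about what applying or merging an adapter means numerically, here is a minimal pure-Python sketch. It assumes the standard LoRA formulation, where the effective weight is W + (alpha / r) * B A. The matrices and sizes are toy values chosen for illustration, not taken from this checkpoint.

```python
# Toy sketch of folding a LoRA update into a frozen weight matrix.
# All shapes and values are illustrative assumptions, not monty's weights.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def linear(weight, x):
    """Apply y = W x to a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy layer: out=2, in=3, rank r=1, alpha=2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.0]]   # (r, in)
B = [[1.0], [2.0]]      # (out, r)
x = [1.0, 2.0, 3.0]
ALPHA, R = 2, 1

# Unmerged path: base output plus the scaled low-rank branch.
branch = linear(B, linear(A, x))
unmerged_out = [b + (ALPHA / R) * d for b, d in zip(linear(W, x), branch)]

# Merged path: a single matrix with the update folded in.
merged_out = linear(merge_lora(W, A, B, ALPHA, R), x)

print(unmerged_out, merged_out)
```

Both paths give the same output. That is why an adapter can either stay attached as a side branch or be folded into the base weights once.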

Model Details

Model Description

  • Developed by: Arun Manivannan (@arunma)
  • Model type: Causal language model, LoRA adapter (PEFT)
  • Language(s) (NLP): English
  • License: Apache 2.0 (inherits from base model)
  • Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct

Model Sources

Uses

Direct Use

Educational. Demonstrates how a few thousand persona-shaped Q&A pairs can shift a small instruction-tuned model's voice and disposition without touching factual knowledge.

Out-of-Scope Use

  • Production assistants. This is a 0.5B model trained on ~14k synthetic pairs: it will hallucinate, contradict itself, and produce dated information.
  • Safety-critical workflows.
  • Any use that requires a model free of synthetic training data from another model (Gemini), unless you complete a separate license review.

Bias, Risks, and Limitations

  • Persona injection over knowledge: the adapter changes how the model talks, not what it knows. Underlying factual gaps and biases of Qwen2.5-0.5B remain.
  • Synthetic data lineage: training pairs were distilled from Gemini Pro. Any systematic biases in Gemini's outputs propagate here.
  • Small corpus, small model: 14k examples on a 0.5B base produce a noticeably opinionated voice but do not guarantee consistency across topics.
  • No safety tuning: no RLHF/DPO step. The adapter does not refuse harmful requests any better than the base.

Recommendations

Treat outputs as drafts, not facts. If you fork this for your own persona, plan on a separate evaluation pass.

How to Get Started

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "arunma/monty"

# The tokenizer config ships with the adapter repo; the weights come from the base.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# device_map="auto" requires the accelerate package.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter
model.eval()

messages = [
    {"role": "system", "content": "You are Monty."},
    {"role": "user", "content": "Should I learn Rust?"},
]
# return_dict=True yields input_ids and attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))

Training Details

Training Data

  • Source: synthetic Q&A pairs generated by Gemini Pro using a Monty persona prompt, plus a small handcrafted seed.
  • Pipeline: ingest → normalize → language filter (fastText lid.176, English only) → MinHash near-dedup → 95/5 train/val split.
  • Counts: 14,768 loaded → 14,566 after filter+dedup → 14,293 train / 273 val.
  • Data lives in the repo's data/ pipeline
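
The MinHash near-dedup step can be sketched in plain Python as below. This is an illustration of the technique, not the project's actual pipeline code. The shingle size, number of hash functions, and similarity threshold are assumptions, and the language filter is omitted. The pairwise comparison is quadratic; a real pipeline over ~14k pairs would more likely use LSH banding.

```python
import hashlib
import re

NUM_HASHES = 64   # number of hash functions (assumed)
SHINGLE = 3       # word n-gram size (assumed)
THRESHOLD = 0.8   # estimated Jaccard at or above which a text is dropped (assumed)

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def shingles(text, n=SHINGLE):
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def seeded_hash(seed, item):
    """Deterministic 64-bit hash of item under a given seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")

def minhash(text):
    """Signature: the minimum hash over all shingles, once per seed."""
    items = shingles(text)
    return [min(seeded_hash(seed, s) for s in items) for seed in range(NUM_HASHES)]

def similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def dedup(texts):
    """Keep a text only if it is not a near-duplicate of one already kept."""
    kept, signatures = [], []
    for text in texts:
        sig = minhash(text)
        if all(similarity(sig, other) < THRESHOLD for other in signatures):
            kept.append(text)
            signatures.append(sig)
    return kept

questions = [
    "Should I learn Rust?",
    "should i learn rust",                       # same text after normalization
    "What is a good breakfast for a long hike",  # unrelated
]
kept = dedup(questions)
print(kept)
```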

Training Procedure

  • Framework: TRL SFTTrainer with assistant_only_loss=True (loss masked to assistant tokens only via the Qwen chat template).
  • Adapter: LoRA, attention-only target modules (q_proj, k_proj, v_proj, o_proj).
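
To make the assistant-only loss concrete, here is a small sketch of the label masking it implies. The token ids and the mask are hand-written, hypothetical values. In a real run the mask would be derived from the chat template rather than typed in, and TRL's internals may differ from this simplification.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def build_labels(input_ids, assistant_mask):
    """Copy token ids where the mask is 1; use IGNORE_INDEX everywhere else."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]

# Hypothetical ids for one system + user + assistant exchange.
input_ids      = [11, 12, 13, 21, 22, 23, 31, 32, 33, 34]
assistant_mask = [0,  0,  0,  0,  0,  0,  1,  1,  1,  1]

labels = build_labels(input_ids, assistant_mask)
print(labels)
```

Only the last four positions contribute to the loss, so the model is trained to produce Monty's replies and not to reproduce system or user text.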

Hyperparameters

| Setting | Value |
| --- | --- |
| LoRA rank r | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Epochs | 3 |
| Per-device batch size | 8 |
| Max sequence length | 1024 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.0 |
| Optimizer | AdamW (default) |
| Precision | bfloat16 |
| Gradient checkpointing | enabled |
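
As a rough size check, the rank and target modules above determine how many parameters the adapter trains. The sketch below assumes the base model's dimensions (hidden size 896, 24 layers, 14 attention heads, 2 key/value heads). These are not stated in this card, so verify them against the base model's config.json.

```python
# Assumed Qwen2.5-0.5B-Instruct dimensions; not stated in this card.
HIDDEN = 896
NUM_LAYERS = 24
NUM_HEADS = 14
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN // NUM_HEADS  # 64

# From the hyperparameter table.
R = 16
ALPHA = 32

# (in_features, out_features) for each targeted projection.
proj_shapes = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
}

def lora_params(in_features, out_features, r=R):
    """A is (r, in) and B is (out, r), so a LoRA pair holds r * (in + out) weights."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o) for i, o in proj_shapes.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"per layer: {per_layer:,}  total: {total:,}  scaling: {scaling}")
```

Under these assumptions the adapter trains about 2.2M parameters, a small fraction of the 0.5B base.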

Compute

  • Hardware: 1× NVIDIA RTX 5090 (32 GB), RunPod
  • Software: PyTorch 2.x, Transformers 4.4x, TRL 0.18+, PEFT 0.11+
  • Approximate training time: ~20-25 minutes for 3 epochs over 14.3k pairs
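
The run length follows from the data counts and hyperparameters. The sketch below estimates the optimizer step count and evaluates a cosine schedule with linear warmup. It assumes a single GPU, no gradient accumulation, and no sequence packing, none of which this card states, so treat the step counts as estimates.

```python
import math

TRAIN_EXAMPLES = 14_293
BATCH_SIZE = 8
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03
GRAD_ACCUM = 1  # assumption: no gradient accumulation

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```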

Evaluation

Evaluation currently relies only on training-loop signals: train loss, eval loss on the 273-pair val split, and mean token accuracy. A judge-based persona-fidelity eval (Lesson 8 of the parent project) is planned but not yet attached to this checkpoint.
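
For reference, mean token accuracy can be sketched as the share of unmasked label positions where the model's top prediction matches the label. The version below is a simplified illustration with hypothetical token ids. It assumes predictions and labels are already aligned (the usual one-token shift has been applied) and that masked positions carry the label -100.

```python
IGNORE_INDEX = -100  # masked positions (e.g. non-assistant tokens) are excluded

def mean_token_accuracy(predictions, labels):
    """Fraction of non-masked positions where the prediction equals the label."""
    correct = 0
    total = 0
    for pred, label in zip(predictions, labels):
        if label == IGNORE_INDEX:
            continue
        total += 1
        correct += int(pred == label)
    return correct / total if total else 0.0

# Hypothetical ids: two masked prompt tokens, four assistant tokens.
labels      = [-100, -100, 5, 6, 7, 8]
predictions = [1,    2,    5, 6, 9, 8]

accuracy = mean_token_accuracy(predictions, labels)
print(accuracy)
```

Token accuracy measures next-token agreement with the training targets, not persona fidelity, which is why a separate judge-based eval is still needed.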

Framework versions

  • PEFT 0.11+
  • TRL 0.18+
  • Transformers 4.4x
  • PyTorch 2.x