# SmolQwen-0.5B-Chat
SmolQwen-0.5B-Chat is a lightweight, fine-tuned instruction-following model designed for edge devices and local deployment. It is based on Qwen2.5-0.5B-Instruct and has been fine-tuned using LoRA (Low-Rank Adaptation) on the Magpie-Ultra subset of the SmolTalk2 dataset.
This model is optimized to be a "smart assistant" that adheres to strict constraints (e.g., negative instructions) and writes robust, production-style code with error handling, correcting common failure modes observed in the base model.
## Model Details

### Model Description
- Developed by: LVN-Architech
- Model type: Causal Language Model (Fine-tuned with LoRA)
- Language(s): English
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct
## Training Details

### Training Data
The model was trained on 10,000 examples from the Magpie-Ultra subset of the SmolTalk2 dataset.
## Uses

### Direct Use
This model is designed for:
- Edge Computing: Running on Raspberry Pi, mobile phones, or consumer laptops without a dedicated GPU.
- Smart Home Commands: Parsing natural language into JSON or strict commands.
- Coding Assistance: Writing simple Python functions with built-in error handling and input validation.
- Chatbots: General conversation with a consistent, helpful persona.
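For smart-home use, a common pattern is to prompt the model for strict JSON and then validate its output before acting on it. Below is a minimal sketch of the validation side; the `device`/`action` schema and the sample string standing in for a model response are illustrative assumptions, not part of this model's training format.

```python
import json

# Hypothetical schema you might instruct the model to emit for smart-home commands.
REQUIRED_KEYS = {"device", "action"}

def parse_command(model_output: str) -> dict:
    """Validate a JSON command produced by the model before executing it."""
    try:
        command = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - command.keys()
    if missing:
        raise ValueError(f"Command is missing keys: {sorted(missing)}")
    return command

# The string below stands in for a model response.
print(parse_command('{"device": "living_room_light", "action": "off"}'))
```

Validating before execution matters on edge devices: a malformed model response should raise a clear error rather than trigger an unintended device action.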
### Out-of-Scope Use
- Complex Reasoning: Do not use for advanced math, logic puzzles, or medical/legal advice. The 0.5B parameter size limits its "world knowledge."
- Long Context: Performance may degrade with extremely long prompts (>8k tokens).
## Key Improvements vs. Base Model
We benchmarked this adapter against the original Qwen-0.5B-Instruct. The fine-tuning significantly alters behavior in three key areas:
| Test Case | Base Model (Qwen-0.5B) | SmolQwen-0.5B-Chat (This Model) |
|---|---|---|
| Instruction: "Decline a job offer." | ❌ Failed: Wrote an email accepting an interview request. | ✅ Success: Correctly wrote a polite email declining the offer. |
| Coding Style | ⚠️ Basic: Writes simple, functional code. | ✅ Robust: Defaults to adding try/except blocks and input validation. |
| Tone | 📖 Dry: Sounds like a textbook. | 🗣️ Engaging: Uses warmer language and better analogies. |
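To illustrate the "robust" coding style described above, here is the kind of function the fine-tuned model tends to produce: validated inputs plus a try/except around the failure-prone operation. This sample was written for this card as an illustration, not generated by the model.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers with input validation and error handling."""
    # Input validation: reject non-numeric arguments up front.
    if not isinstance(numerator, (int, float)) or not isinstance(denominator, (int, float)):
        raise TypeError("Both arguments must be numbers.")
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Graceful handling instead of crashing the caller.
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None
```

The base model typically emits only the bare `numerator / denominator` expression; the fine-tuned model adds the guard rails shown here.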
## How to Get Started with the Model
You can use this model with the `peft` library: load the base model first, then apply the LoRA adapter on top.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# 1. Load the base model
base_model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# 2. Load the LoRA adapter
adapter_model_name = "LVN-Architech/SmolQwen-0.5B-Chat"
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# 3. Run inference
prompt = "Write a professional email declining a job offer politely."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Send inputs to wherever device_map placed the model (works on CPU-only machines too)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Use a low temperature for consistent results; sampling must be enabled
# for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```