---
title: Qwen2.5 LoRA Fine-tuning
emoji: 🚀
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🚀 LoRA Fine-tuning for Qwen2.5-7B-Instruct
Train Qwen2.5-7B-Instruct with LoRA on your sysadmin personality dataset using proper Qwen2 chat templates.
## Features

- ✅ **Qwen2 Chat Template** - Proper system/user/assistant formatting
- ✅ **4-bit Quantization** - QLoRA for memory efficiency
- ✅ **PEFT Integration** - Parameter-efficient fine-tuning
- ✅ **Custom System Prompt** - Configurable personality
- ✅ **Gradio UI** - Easy web interface
- ✅ **Auto Push to Hub** - Direct upload after training
## Quick Start

1. **Upgrade to GPU**: Settings → Hardware → Select GPU (A10G recommended)
2. **Configure Training**: Set your dataset and parameters
3. **Set System Prompt**: Customize the AI personality
4. **Add HF Token**: (Optional) For private datasets or pushing models
5. **Start Training**: Click the button and monitor progress
## Default Configuration

**Optimized for A10G (24GB VRAM)** (sketched in code below):
- Batch size: 4
- Gradient accumulation: 4
- Max sequence length: 2048
- LoRA rank: 16
- 4-bit quantization: Enabled
**For T4 (16GB VRAM):**
- Reduce batch size to 2
- Increase gradient accumulation to 8
- Reduce max sequence length to 1024
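These defaults correspond roughly to the following quantization, LoRA, and trainer settings. This is a minimal sketch using `BitsAndBytesConfig`, `LoraConfig`, and `TrainingArguments`; values not listed above (LoRA alpha, dropout, target modules) are assumptions and the exact parameter names in `app.py` may differ.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# QLoRA: load the 7B base model in 4-bit so it fits in 24GB VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank 16 per the defaults above; alpha, dropout, and targets are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# A10G defaults; for a T4, drop batch size to 2, raise accumulation to 8,
# and cap sequences at 1024 tokens when tokenizing (max length is applied
# during tokenization, not via TrainingArguments)
training_args = TrainingArguments(
    output_dir="qwen-lora-sysadmin",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
)
```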
## Qwen2 Chat Template
The training automatically formats your data using Qwen2's chat template:
```
<|im_start|>system
You are an experienced Linux system administrator.<|im_end|>
<|im_start|>user
How do I check disk usage?<|im_end|>
<|im_start|>assistant
Use the 'df -h' command...<|im_end|>
```
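A single training example can be rendered into this format with the tokenizer's built-in chat template; the formatting logic in `app.py` may differ, but the output is the same:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You are an experienced Linux system administrator."},
    {"role": "user", "content": "How do I check disk usage?"},
    {"role": "assistant", "content": "Use the 'df -h' command..."},
]

# Produces the <|im_start|>...<|im_end|> training text shown above
print(tokenizer.apply_chat_template(messages, tokenize=False))
```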
## Dataset Format
Your dataset should have Q&A pairs in one of these formats:
{"question": "...", "answer": "..."}{"instruction": "...", "response": "..."}{"input": "...", "output": "..."}{"text": "..."}
The app auto-detects and converts to proper chat format.
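A hedged sketch of what this auto-detection might look like (the field names mirror the formats above; the actual logic in `app.py` may differ):

```python
def to_messages(example, system_prompt):
    """Map a raw dataset row in any supported format to chat messages."""
    for q_key, a_key in (("question", "answer"), ("instruction", "response"), ("input", "output")):
        if q_key in example and a_key in example:
            return [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": example[q_key]},
                {"role": "assistant", "content": example[a_key]},
            ]
    if "text" in example:
        # Plain-text rows are passed through unchanged
        return example["text"]
    raise ValueError(f"Unrecognized example format: {list(example.keys())}")
```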
## Using Your Fine-tuned Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "crazycog/qwen-lora-sysadmin")
tokenizer = AutoTokenizer.from_pretrained("crazycog/qwen-lora-sysadmin")

# Use with chat template
messages = [
    {"role": "system", "content": "You are an experienced Linux system administrator."},
    {"role": "user", "content": "How do I check memory usage?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
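If you want a checkpoint that can be served without the PEFT dependency, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. Continuing from the snippet above (the output directory name is illustrative):

```python
# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-7b-sysadmin-merged")
tokenizer.save_pretrained("qwen2.5-7b-sysadmin-merged")
```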
## Training Time & Cost
| GPU | Cost/Hour | Training Time (10k examples) | Total Cost |
|---|---|---|---|
| T4 (16GB) | $0.60 | ~4 hours | ~$2.40 |
| A10G (24GB) ⭐ | $1.00 | ~2 hours | ~$2.00 |
| A100 (40GB) | $3.00 | ~1 hour | ~$3.00 |
## System Prompt Examples

**Linux SysAdmin:**
You are an experienced Linux system administrator with deep knowledge of system operations, troubleshooting, and best practices.

**DevOps Engineer:**
You are a DevOps engineer specializing in cloud infrastructure, CI/CD, and container orchestration.

**Security Expert:**
You are a cybersecurity expert specializing in Linux hardening, network security, and threat detection.
## Troubleshooting

### Out of Memory
- Enable 4-bit quantization ✅
- Reduce batch size to 2
- Increase gradient accumulation to 8
- Reduce max sequence length to 1024
### Slow Training
- Verify GPU is enabled in Space settings
- Upgrade to A10G or A100
- Reduce max sequence length if most examples are shorter
### Upload Failed
- Check HF token has write permissions
- Verify repo name doesn't conflict
- Try creating the repo manually first
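Creating the repo manually can also be done from Python with `huggingface_hub` (a minimal sketch; the repo name is an example):

```python
from huggingface_hub import create_repo

# Requires a token with write permission (e.g. via `huggingface-cli login` or HF_TOKEN)
create_repo("crazycog/qwen-lora-sysadmin", exist_ok=True)
```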
## License
Apache 2.0