---
title: Qwen2.5 LoRA Fine-tuning
emoji: πŸ€–
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---

# πŸ€– LoRA Fine-tuning for Qwen2.5-7B-Instruct

Fine-tune Qwen2.5-7B-Instruct with LoRA on your sysadmin personality dataset, using the proper Qwen2 chat template.

## Features

- βœ… **Qwen2 Chat Template** - Proper system/user/assistant formatting
- βœ… **4-bit Quantization** - QLoRA for memory efficiency
- βœ… **PEFT Integration** - Parameter-efficient fine-tuning
- βœ… **Custom System Prompt** - Configurable personality
- βœ… **Gradio UI** - Easy web interface
- βœ… **Auto Push to Hub** - Direct upload after training

## Quick Start

1. **Upgrade to GPU**: Settings β†’ Hardware β†’ select a GPU (A10G recommended)
2. **Configure Training**: Set your dataset and parameters
3. **Set System Prompt**: Customize the AI personality
4. **Add HF Token** (optional): For private datasets or pushing models
5. **Start Training**: Click the button and monitor progress

## Default Configuration

Optimized for A10G (24GB VRAM):

- Batch size: 4
- Gradient accumulation: 4
- Max sequence length: 2048
- LoRA rank: 16
- 4-bit quantization: enabled
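The defaults above roughly correspond to a QLoRA setup with `peft` and `bitsandbytes`. This is an illustrative sketch only - the app's actual variable names, target modules, and `lora_alpha` choice are assumptions, not taken from its source:

```python
# Sketch of the defaults above as peft/transformers config objects.
# target_modules and lora_alpha are assumed values, not the app's actual ones.
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # 4-bit quantization: enabled
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

lora_config = LoraConfig(
    r=16,                              # LoRA rank: 16
    lora_alpha=32,                     # common convention: 2x the rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```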

For T4 (16GB VRAM):

- Reduce batch size to 2
- Increase gradient accumulation to 8
- Reduce max sequence length to 1024
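Note that both presets keep the same effective batch size (per-device batch size Γ— accumulation steps), so the T4 settings trade training speed for memory without changing the optimizer-step batch:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Batch size seen by the optimizer at each update step."""
    return per_device * grad_accum * num_gpus

print(effective_batch_size(4, 4))  # A10G preset -> 16
print(effective_batch_size(2, 8))  # T4 preset   -> 16
```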

## Qwen2 Chat Template

The training automatically formats your data using Qwen2's chat template:

```
<|im_start|>system
You are an experienced Linux system administrator.<|im_end|>
<|im_start|>user
How do I check disk usage?<|im_end|>
<|im_start|>assistant
Use the 'df -h' command...<|im_end|>
```
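In practice the tokenizer's `apply_chat_template` handles this rendering, but the layout itself can be sketched in plain Python (a minimal illustration of the format above, not the app's code):

```python
def qwen2_chat_format(messages, add_generation_prompt=False):
    """Render messages in the <|im_start|>/<|im_end|> layout shown above."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # At inference time, cue the model to produce the assistant turn
        text += "<|im_start|>assistant\n"
    return text

example = qwen2_chat_format([
    {"role": "system", "content": "You are an experienced Linux system administrator."},
    {"role": "user", "content": "How do I check disk usage?"},
])
print(example)
```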

## Dataset Format

Your dataset should have Q&A pairs in one of these formats:

- `{"question": "...", "answer": "..."}`
- `{"instruction": "...", "response": "..."}`
- `{"input": "...", "output": "..."}`
- `{"text": "..."}`

The app auto-detects and converts to proper chat format.
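The detection logic might look something like this (a hypothetical sketch based on the field names listed above; the app's actual implementation may differ):

```python
# Supported (question, answer) key pairs, checked in order.
FIELD_PAIRS = [("question", "answer"), ("instruction", "response"), ("input", "output")]

def to_chat_messages(example, system_prompt):
    """Map any supported Q&A schema to chat messages; 'text' passes through as-is."""
    for q_key, a_key in FIELD_PAIRS:
        if q_key in example and a_key in example:
            return [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": example[q_key]},
                {"role": "assistant", "content": example[a_key]},
            ]
    if "text" in example:
        return example["text"]  # already pre-formatted
    raise ValueError(f"Unrecognized example keys: {sorted(example)}")

msgs = to_chat_messages(
    {"instruction": "List files", "response": "Use ls -l"},
    "You are a sysadmin.",
)
print(msgs[1]["content"])  # -> List files
```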

## Using Your Fine-tuned Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model on GPU and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "crazycog/qwen-lora-sysadmin")
tokenizer = AutoTokenizer.from_pretrained("crazycog/qwen-lora-sysadmin")

# Generate with the Qwen2 chat template
messages = [
    {"role": "system", "content": "You are an experienced Linux system administrator."},
    {"role": "user", "content": "How do I check memory usage?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

## Training Time & Cost

| GPU | Cost/Hour | Training Time (10k examples) | Total Cost |
|---|---|---|---|
| T4 (16GB) | $0.60 | ~4 hours | ~$2.40 |
| A10G (24GB) ⭐ | $1.00 | ~2 hours | ~$2.00 |
| A100 (40GB) | $3.00 | ~1 hour | ~$3.00 |

## System Prompt Examples

**Linux SysAdmin:**

```
You are an experienced Linux system administrator with deep knowledge of system operations, troubleshooting, and best practices.
```

**DevOps Engineer:**

```
You are a DevOps engineer specializing in cloud infrastructure, CI/CD, and container orchestration.
```

**Security Expert:**

```
You are a cybersecurity expert specializing in Linux hardening, network security, and threat detection.
```

## Troubleshooting

### Out of Memory

- Enable 4-bit quantization βœ“
- Reduce batch size to 2
- Increase gradient accumulation to 8
- Reduce max sequence length to 1024

### Slow Training

- Verify a GPU is enabled in the Space settings
- Upgrade to A10G or A100
- Reduce max sequence length if most examples are shorter

### Upload Failed

- Check that your HF token has write permissions
- Verify the repo name doesn't conflict with an existing repo
- Try creating the repo manually first

## License

Apache 2.0