---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
---
# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.
## Dataset

- **Name:** open-r1/codeforces-cots
- **Size:** ~48K competitive programming problems with chain-of-thought solutions
- **Format:** Chat format with problem descriptions and step-by-step reasoning
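To make the chat format concrete, the sketch below shows the rough shape of one record and a simplified way to flatten it into a training string. The field names here are an assumption for illustration; check the dataset card on the Hub for the actual schema, and note that real training should use the tokenizer's chat template rather than this hand-rolled rendering.

```python
# Illustrative record shape; the exact column names in open-r1/codeforces-cots
# are an assumption here -- inspect the dataset card to confirm the schema.
example = {
    "messages": [
        {"role": "user", "content": "Given an array of n integers, ..."},
        {"role": "assistant", "content": "Let's reason step by step. First, ..."},
    ]
}

def render_chat(messages):
    """Flatten a chat-format record into a single string (simplified; real
    training would apply the tokenizer's chat template instead)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

print(render_chat(example["messages"]))
```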
## Model

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Training Method:** QLoRA (4-bit quantization + LoRA)
- **Target Modules:** All attention and MLP layers
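For readers unfamiliar with QLoRA, the configuration objects below sketch what "4-bit quantization + LoRA on all attention and MLP layers" typically looks like with `transformers` and `peft`. This is a hedged sketch, not the contents of `finetune.py`: the NF4 quantization type, compute dtype, and `lora_dropout` value are assumptions, though the module names listed do match Qwen2-family projection layers.

```python
# Hedged sketch of a QLoRA setup consistent with this README's description;
# the actual finetune.py may differ in detail.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for the frozen base model (NF4 + bf16 compute assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projections (Qwen2-family names)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed value
    task_type="CAUSAL_LM",
)
```

Passing `bnb_config` to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and wrapping the model with `lora_config` via `peft.get_peft_model` is the usual way these two pieces fit together.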
## Setup

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Training

### Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:

```bash
python finetune.py
```

**Note:** Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.
### Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:

- See `README_HF_SPACES.md` for detailed deployment instructions
- Upload this project to a new HF Space with GPU hardware
- Use the included Gradio interface (`app.py`) to monitor training in real time
- Training time on a T4 GPU: ~2-3 hours for 1000 steps

This approach is recommended because it provides:

- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via a web interface
- Automatic checkpoint saving
- Easy model download after training
## Training Configuration

- **Batch Size:** 4 per device (with gradient accumulation of 4)
- **Effective Batch Size:** 16
- **Learning Rate:** 2e-4
- **Epochs:** 1
- **Max Sequence Length:** 2048
- **LoRA r:** 16
- **LoRA alpha:** 32
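Two quick sanity checks on these numbers: the effective batch size is the per-device batch size times the gradient accumulation steps, and the number of trainable parameters LoRA adds to each adapted linear layer is `r * (d_in + d_out)` (an `d_in x r` A matrix plus an `r x d_out` B matrix). The layer dimension used below is purely illustrative, not Qwen2.5-0.5B's actual hidden size.

```python
# Effective batch size = per-device batch size x gradient accumulation steps
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 16

def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    a (d_in x r) A matrix plus an (r x d_out) B matrix."""
    return r * (d_in + d_out)

# Hypothetical square projection layer, for illustration only
print(lora_param_count(896, 896, 16))  # 28672
```

With `lora_alpha = 32` and `r = 16`, the adapter output is scaled by `alpha / r = 2`, a common default ratio.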
## Output

The fine-tuned model will be saved to `./qwen-codeforces-cots/`.
## Usage

After training, load the LoRA adapter on top of the base model and run inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a prompt with the model's chat template
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate and decode the response
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- Training uses 4-bit quantization to reduce memory requirements
- LoRA enables efficient fine-tuning with a small fraction of trainable parameters
- Training time will vary depending on your hardware