
Space metadata (YAML frontmatter)
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false

Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.

Dataset

  • Name: open-r1/codeforces-cots
  • Size: ~48K competitive programming problems with chain-of-thought solutions
  • Format: Chat format with problem descriptions and step-by-step reasoning
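
To get a quick look at the data before training, the dataset can be pulled straight from the Hub with the datasets library. This is only a sketch: the config name ("solutions") is an assumption based on the dataset card and may not match what finetune.py uses.

from datasets import load_dataset

# Config name "solutions" is an assumption; check the dataset card on the Hub
dataset = load_dataset("open-r1/codeforces-cots", "solutions", split="train")
print(dataset.column_names)
print(dataset[0])  # one example with a problem and step-by-step reasoning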

Model

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Training Method: QLoRA (4-bit quantization + LoRA)
  • Target Modules: All attention and MLP layers
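
As a rough sketch of what this QLoRA setup looks like in code (4-bit base model plus LoRA adapters on the attention and MLP projections), assuming the standard Qwen2 module names; the actual finetune.py may be organized differently:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit (NF4) to keep memory usage low
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    quantization_config=bnb_config,
)

# LoRA adapters on all attention and MLP projections (Qwen2 module names)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable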

Setup

  1. Create and activate virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
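
A quick way to confirm the environment is usable before starting a run (the library list below is an assumption based on the stack described in this README):

# Sanity check: core libraries import and CUDA is visible if a GPU is present
import torch, transformers, peft, datasets
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())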

Training

Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:

python finetune.py

Note: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.

Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:

  1. See README_HF_SPACES.md for detailed deployment instructions
  2. Upload this project to a new HF Space with GPU hardware
  3. Use the included Gradio interface (app.py) to monitor training in real-time
  4. Training time on T4 GPU: ~2-3 hours for 1000 steps

This is the recommended approach as it provides:

  • Access to GPU hardware (T4, A10G, or A100)
  • Real-time training monitoring via web interface
  • Automatic checkpoint saving
  • Easy model download after training
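
The included app.py provides the web interface. Purely as an illustration of what such a monitoring page involves (not the actual app), a minimal Gradio sketch that tails a hypothetical training.log file could look like this:

import gradio as gr
from pathlib import Path

LOG_FILE = Path("training.log")  # hypothetical log path, not necessarily what app.py uses

def read_log():
    # Show whatever the training process has written so far
    return LOG_FILE.read_text() if LOG_FILE.exists() else "No training output yet."

with gr.Blocks() as demo:
    gr.Markdown("Training monitor")
    log_box = gr.Textbox(label="Training log", lines=20)
    gr.Button("Refresh").click(read_log, outputs=log_box)

demo.launch()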

Training Configuration

  • Batch Size: 4 per device (with gradient accumulation of 4)
  • Effective Batch Size: 16
  • Learning Rate: 2e-4
  • Epochs: 1
  • Max Sequence Length: 2048
  • LoRA r: 16
  • LoRA alpha: 32
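
As a sketch of how these hyperparameters might map onto transformers' TrainingArguments (finetune.py may instead use trl's SFT trainer, where the max sequence length is set on the trainer config; the LoRA r and alpha values belong to the LoraConfig shown in the Model section):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qwen-codeforces-cots",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
)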

Output

The fine-tuned model (LoRA adapters and tokenizer) will be saved to ./qwen-codeforces-cots/.
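
At the end of training, saving typically amounts to writing the adapters and tokenizer to that directory (a sketch; finetune.py may call trainer.save_model() instead):

# Write adapter weights and tokenizer files to the output directory
model.save_pretrained("./qwen-codeforces-cots")
tokenizer.save_pretrained("./qwen-codeforces-cots")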

Usage

After training, you can use the model with:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapters on top of it
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a chat-formatted prompt and generate a completion
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
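
If you prefer a standalone checkpoint without a peft dependency at inference time, the adapters can be merged into the base weights (a standard peft feature; the merged output path below is just an example):

# Fold the LoRA weights into the base model and save a plain transformers checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./qwen-codeforces-cots-merged")
tokenizer.save_pretrained("./qwen-codeforces-cots-merged")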

Notes

  • The training uses 4-bit quantization to reduce memory requirements
  • LoRA allows efficient fine-tuning with minimal trainable parameters
  • Training time will vary depending on your hardware