---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
---
# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.
## Dataset

- **Name:** open-r1/codeforces-cots
- **Size:** ~48K competitive programming problems with chain-of-thought solutions
- **Format:** Chat format with problem descriptions and step-by-step reasoning
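To make the chat format concrete, the sketch below shows the rough shape of one record and a simplified way to flatten it into a training string. The field names here are an assumption for illustration; check the dataset card on the Hub for the actual schema, and note that real training should use the tokenizer's chat template rather than this hand-rolled rendering.

```python
# Illustrative record shape; the exact column names in open-r1/codeforces-cots
# are an assumption here -- inspect the dataset card to confirm the schema.
example = {
    "messages": [
        {"role": "user", "content": "Given an array of n integers, ..."},
        {"role": "assistant", "content": "Let's reason step by step. First, ..."},
    ]
}

def render_chat(messages):
    """Flatten a chat-format record into a single string (simplified; real
    training would apply the tokenizer's chat template instead)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

print(render_chat(example["messages"]))
```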
## Model

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Training Method:** QLoRA (4-bit quantization + LoRA)
- **Target Modules:** All attention and MLP layers
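For readers unfamiliar with QLoRA, the configuration objects below sketch what "4-bit quantization + LoRA on all attention and MLP layers" typically looks like with `transformers` and `peft`. This is a hedged sketch, not the contents of `finetune.py`: the NF4 quantization type, compute dtype, and `lora_dropout` value are assumptions, though the module names listed do match Qwen2-family projection layers.

```python
# Hedged sketch of a QLoRA setup consistent with this README's description;
# the actual finetune.py may differ in detail.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for the frozen base model (NF4 + bf16 compute assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projections (Qwen2-family names)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed value
    task_type="CAUSAL_LM",
)
```

Passing `bnb_config` to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and wrapping the model with `lora_config` via `peft.get_peft_model` is the usual way these two pieces fit together.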
## Setup

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Training

### Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:

```bash
python finetune.py
```

**Note:** Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.
### Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:

- See `README_HF_SPACES.md` for detailed deployment instructions
- Upload this project to a new HF Space with GPU hardware
- Use the included Gradio interface (`app.py`) to monitor training in real time
- Training time on a T4 GPU: ~2-3 hours for 1000 steps

This approach is recommended because it provides:

- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via a web interface
- Automatic checkpoint saving
- Easy model download after training
## Training Configuration

- **Batch Size:** 4 per device (with gradient accumulation of 4)
- **Effective Batch Size:** 16
- **Learning Rate:** 2e-4
- **Epochs:** 1
- **Max Sequence Length:** 2048
- **LoRA r:** 16
- **LoRA alpha:** 32
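Two quick sanity checks on these numbers: the effective batch size is the per-device batch size times the gradient accumulation steps, and the number of trainable parameters LoRA adds to each adapted linear layer is `r * (d_in + d_out)` (an `d_in x r` A matrix plus an `r x d_out` B matrix). The layer dimension used below is purely illustrative, not Qwen2.5-0.5B's actual hidden size.

```python
# Effective batch size = per-device batch size x gradient accumulation steps
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 16

def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    a (d_in x r) A matrix plus an (r x d_out) B matrix."""
    return r * (d_in + d_out)

# Hypothetical square projection layer, for illustration only
print(lora_param_count(896, 896, 16))  # 28672
```

With `lora_alpha = 32` and `r = 16`, the adapter output is scaled by `alpha / r = 2`, a common default ratio.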
## Output

The fine-tuned model will be saved to `./qwen-codeforces-cots/`.
## Usage

After training, load the LoRA adapter on top of the base model and run inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a prompt with the model's chat template
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate and decode the response
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- Training uses 4-bit quantization to reduce memory requirements
- LoRA enables efficient fine-tuning with a small fraction of trainable parameters
- Training time will vary depending on your hardware