---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
pinned: false
---
# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs
Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.
## Dataset
- **Name**: open-r1/codeforces-cots
- **Size**: ~48K competitive programming problems with chain-of-thought solutions
- **Format**: Chat format with problem descriptions and step-by-step reasoning
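The mapping from a raw dataset record to chat format can be sketched as below. Note the field names (`problem`, `solution`) are illustrative assumptions; check the actual column names of open-r1/codeforces-cots before relying on them.

```python
# Hypothetical sketch: convert one raw record into chat-format messages.
# Field names are assumptions, not the dataset's confirmed schema.
def to_chat_messages(record: dict) -> list[dict]:
    """Map a problem/solution pair to the chat format used for training."""
    return [
        {"role": "user", "content": record["problem"]},
        {"role": "assistant", "content": record["solution"]},
    ]

example = {
    "problem": "Given n integers, print their sum.",
    "solution": "Read the numbers, accumulate a running total, then print it.",
}
print(to_chat_messages(example))
```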
## Model
- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- **Training Method**: QLoRA (4-bit quantization + LoRA)
- **Target Modules**: All attention and MLP layers
## Setup
1. Create and activate virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## Training
### Option 1: Local Training (CPU/GPU)
Run the fine-tuning script locally:
```bash
python finetune.py
```
**Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.
### Option 2: Hugging Face Spaces with GPU (Recommended)
If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:
1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
2. Upload this project to a new HF Space with GPU hardware
3. Use the included Gradio interface (`app.py`) to monitor training in real-time
4. Training time on T4 GPU: ~2-3 hours for 1000 steps
This is the **recommended approach** as it provides:
- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via web interface
- Automatic checkpoint saving
- Easy model download after training
### Training Configuration
- **Batch Size**: 4 per device (with gradient accumulation of 4)
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA r**: 16
- **LoRA alpha**: 32
## Output
The fine-tuned LoRA adapter and tokenizer will be saved to `./qwen-codeforces-cots/`.
## Usage
After training, you can use the model with:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then apply the trained LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Format the prompt with the model's chat template and generate a response
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes
- The training uses 4-bit quantization to reduce memory requirements
- LoRA allows efficient fine-tuning with minimal trainable parameters
- Training time will vary depending on your hardware