---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
pinned: false
---

# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset to improve instruction following with chain-of-thought reasoning on competitive programming problems.

## Dataset

- **Name**: open-r1/codeforces-cots
- **Size**: ~48K competitive programming problems with chain-of-thought solutions
- **Format**: Chat format with problem descriptions and step-by-step reasoning
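Before training, it can help to inspect a few records. A minimal sketch, assuming the `datasets` library is installed; if the dataset exposes multiple subsets on the Hub, pass the subset name as the second argument to `load_dataset` (check the dataset card for the exact names):

```python
from datasets import load_dataset

# Stream the dataset so the full ~48K examples aren't downloaded up front
ds = load_dataset("open-r1/codeforces-cots", split="train", streaming=True)

# Peek at the first record to see the chat-format fields used for SFT
first = next(iter(ds))
print(first.keys())
```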

## Model

- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- **Training Method**: QLoRA (4-bit quantization + LoRA)
- **Target Modules**: All attention and MLP layers
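The QLoRA setup above can be sketched roughly as follows. This is a non-authoritative sketch: the module names match Qwen2's attention and MLP projections, but the dropout value is an assumption, and the authoritative configuration lives in `finetune.py`:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapters on all attention and MLP projections (Qwen2 module names)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumed; not stated in this README
    task_type="CAUSAL_LM",
)
```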

## Setup

1. Create and activate virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Training

### Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:
```bash
python finetune.py
```

**Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.

### Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription (or otherwise have access to paid Space hardware), you can train on a GPU using Hugging Face Spaces:

1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
2. Upload this project to a new HF Space with GPU hardware
3. Use the included Gradio interface (`app.py`) to monitor training in real-time
4. Training time on T4 GPU: ~2-3 hours for 1000 steps

This is the **recommended approach** as it provides:
- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via web interface
- Automatic checkpoint saving
- Easy model download after training

### Training Configuration

- **Batch Size**: 4 per device (with gradient accumulation of 4)
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA r**: 16
- **LoRA alpha**: 32
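The effective batch size in the table follows from the per-device batch size times the gradient accumulation steps (a single GPU is assumed here). A quick sanity check:

```python
# Values from the Training Configuration above
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # single-GPU Space assumed

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 16
```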

## Output

The fine-tuned LoRA adapter will be saved to `./qwen-codeforces-cots/`.

## Usage

After training, you can use the model with:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the trained LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a prompt with the model's chat template, then generate
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
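If you want a standalone checkpoint that can be loaded without PEFT at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. A sketch, with an illustrative output path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")

# Fold the LoRA weights into the base model and save a plain checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./qwen-codeforces-cots-merged")  # illustrative path
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")
tokenizer.save_pretrained("./qwen-codeforces-cots-merged")
```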

## Notes

- The training uses 4-bit quantization to reduce memory requirements
- LoRA allows efficient fine-tuning with minimal trainable parameters
- Training time will vary depending on your hardware