# Fine-tuning Guide: XCoder-80K Dataset

This guide explains how to fine-tune Ollama models on the XCoder-80K code dataset.

## Overview

The `finetune_models.py` script fine-tunes open-source code models on the XCoder-80K dataset from Hugging Face:

| Ollama Model | HuggingFace Model | Size | Recommended |
|---|---|---|---|
| `llama3.2:latest` | meta-llama/Llama-2-7b-hf | 7B | βœ“ Best for code |
| `gemma3:4b` | google/gemma-7b | 7B | βœ“ Good alternative |
| `gemma3:1b` | google/gemma-2b | 2B | Lightweight option |
| `llava:latest` | Not suitable | Multimodal | βœ— Skip (vision model, not a code model) |

**Dataset:** [banksy235/XCoder-80K](https://huggingface.co/datasets/banksy235/XCoder-80K)
- 80,000 code examples
- Covers multiple programming languages
- Suitable for code generation and repair
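
Instruction-tuning code datasets like this typically pair a task description with a solution, which the training script flattens into a single prompt string. A minimal formatting helper might look like the sketch below; the `instruction`/`output` field names are assumptions, so check the actual XCoder-80K schema against a sample record first:

```python
def format_example(record: dict) -> str:
    """Render one dataset record as a single training string.

    Assumes 'instruction' and 'output' fields; adjust to the
    actual XCoder-80K schema after inspecting a sample record.
    """
    return (
        "### Instruction:\n"
        f"{record['instruction']}\n\n"
        "### Response:\n"
        f"{record['output']}"
    )

sample = {
    "instruction": "Write a function that adds two numbers.",
    "output": "def add(a, b):\n    return a + b",
}
print(format_example(sample))
```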

## Installation

### Quick Install (Recommended)

**Windows:**
```bash
install_finetune.bat
```

**Linux/macOS:**
```bash
bash install_finetune.sh
```

### Manual Installation

1. **Install PyTorch with CUDA 12.1 support:**
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

2. **Install fine-tuning dependencies:**
```bash
pip install -r requirements-finetune.txt
```

3. **Verify installation:**
```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'GPU: {torch.cuda.is_available()}')"
```

### Install Hugging Face CLI (Optional)

For easier dataset management:
```bash
# macOS/Linux
curl -LsSf https://hf.co/cli/install.sh | bash -s

# Or via pip
pip install huggingface_hub

# Login (for private datasets)
huggingface-cli login
```

## Usage

### Option 1: Fine-tune Single Model

Fine-tune Llama-2-7b on XCoder-80K (recommended for fastest start):
```bash
python finetune_models.py --model llama3.2 \
  --num-epochs 3 \
  --batch-size 4 \
  --learning-rate 2e-4
```

### Option 2: Fine-tune All Models Sequentially

```bash
python finetune_models.py --all-models \
  --num-epochs 3 \
  --batch-size 4 \
  --max-samples 5000
```

### Option 3: Custom Configuration

```bash
python finetune_models.py \
  --model llama3.2 \
  --output-dir ./my_finetuned_models \
  --num-epochs 5 \
  --batch-size 8 \
  --learning-rate 1e-4 \
  --max-samples 10000 \
  --no-lora  # Disable LoRA (full fine-tuning)
```

## Training Arguments Explained

| Argument | Default | Description |
|---|---|---|
| `--model` | `llama3.2` | Model to fine-tune |
| `--all-models` | False | Fine-tune all available models |
| `--output-dir` | `./finetuned_models` | Where to save fine-tuned models |
| `--num-epochs` | 3 | Training epochs (more = longer training) |
| `--batch-size` | 4 | Batch size (larger = more VRAM needed) |
| `--learning-rate` | 2e-4 | Learning rate (lower = smaller, more stable updates) |
| `--max-samples` | None | Limit samples (None = use all 80K) |
| `--no-lora` | False | Disable LoRA (full fine-tuning) |
| `--no-gradient-checkpointing` | False | Disable gradient checkpointing |
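
These arguments interact: total optimizer steps β‰ˆ ceil(samples / batch size) Γ— epochs, which is what drives training time. A quick back-of-envelope helper (pure arithmetic, assuming no gradient accumulation):

```python
import math

def total_steps(num_samples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps for a run, ignoring gradient accumulation."""
    return math.ceil(num_samples / batch_size) * num_epochs

# Full 80K dataset with the defaults (--batch-size 4, --num-epochs 3):
print(total_steps(80_000, 4, 3))  # 60000 steps
# A quick smoke run with --max-samples 1000:
print(total_steps(1_000, 4, 3))   # 750 steps
```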

## Output

After training, models are saved to:
```
finetuned_models/
β”œβ”€β”€ llama3_2/
β”‚   β”œβ”€β”€ final/
β”‚   β”‚   β”œβ”€β”€ pytorch_model.bin
β”‚   β”‚   β”œβ”€β”€ config.json
β”‚   β”‚   └── tokenizer.json
β”‚   └── metadata.json
β”œβ”€β”€ gemma3_4b/
β”‚   └── ...
└── gemma3_1b/
    └── ...
```

## Using Fine-tuned Models with Ollama

After fine-tuning, you can create custom Ollama models. Create a `Modelfile`:

Note that Ollama Modelfiles have no `COPY` instruction and expect GGUF-format weights. Either convert the merged fine-tuned model to GGUF (e.g. with llama.cpp's conversion script) and point `FROM` at the file, or keep the base model and attach the trained LoRA adapter with `ADAPTER`:

```dockerfile
# Option A: merged fine-tuned weights, converted to GGUF first
# (the .gguf filename depends on your conversion step)
FROM ./finetuned_models/llama3_2/final/model.gguf

# Option B: base model plus the trained LoRA adapter
# FROM llama3.2:latest
# ADAPTER ./finetuned_models/llama3_2/final

# Optional: set sampling parameters
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
```

Then create and run:
```bash
ollama create my-finetuned-llama -f Modelfile
ollama run my-finetuned-llama "your prompt here"
```

Or use directly in Python:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./finetuned_models/llama3_2/final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Use the model
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Hardware Requirements

| Configuration | VRAM | Training Speed | Recommended |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | ~2 hours | βœ“ Excellent |
| RTX 4080 (16GB) | 16GB | ~3-4 hours | βœ“ Good |
| RTX 4070 (12GB) | 12GB | ~5-6 hours | Acceptable |
| Tesla T4 (16GB) | 16GB | ~4-5 hours | Cloud-friendly |
| CPU only | N/A | ~1-2 days | Not recommended |

**Optimization Tips:**
- Use `--batch-size 2` for GPUs with <12GB VRAM
- Enable `--max-samples 1000` to train on subset first
- LoRA (default) uses roughly 70% less VRAM than full fine-tuning
- Gradient checkpointing (default) trades extra compute for a VRAM reduction of roughly 30%
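
The tips above can be turned into a rough sizing rule. The sketch below estimates a VRAM floor from parameter count; the byte counts (fp16 weights, Adam states for trainable parameters only) are coarse assumptions, not measurements, and activations are ignored entirely:

```python
def estimate_vram_gb(params_b: float, trainable_frac: float = 0.01) -> float:
    """Very rough VRAM floor in GB for mixed-precision training.

    params_b: model size in billions of parameters.
    trainable_frac: fraction of params actually trained
                    (~1% with LoRA, 1.0 for full fine-tuning).
    Weights: 2 bytes/param (fp16).
    Optimizer states + gradients: ~12 bytes per *trainable* param (Adam, fp32).
    Activations and framework overhead are ignored, so treat this as a floor.
    """
    weights = params_b * 1e9 * 2
    optimizer = params_b * 1e9 * trainable_frac * 12
    return (weights + optimizer) / 1e9

print(f"7B with LoRA: ~{estimate_vram_gb(7, 0.01):.1f} GB floor")
print(f"7B full:      ~{estimate_vram_gb(7, 1.0):.1f} GB floor")
```

This is consistent with the hardware table: a 7B model under LoRA fits a 16 GB card with headroom for activations, while full fine-tuning does not fit any single consumer GPU.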

## Integration with CodeArena RL

To use fine-tuned models with the CodeArena RL environment:

1. **Export to Ollama** (see above)
2. **Update Dashboard.jsx** to use the new model:
   ```javascript
   const [ollamaModel, setOllamaModel] = useState('my-finetuned-llama');
   ```
3. **Or update ollama_rl_rollout.py:**
   ```bash
   python ollama_rl_rollout.py --ollama-model my-finetuned-llama
   ```
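
If you would rather call the fine-tuned model programmatically than edit the dashboard, Ollama exposes a local HTTP API (default port 11434). A minimal sketch using only the standard library; the `/api/generate` endpoint and payload shape follow Ollama's documented REST API:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("my-finetuned-llama", "def fibonacci(n):")
# Sending it requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```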

## Monitoring Training

Training logs are saved to TensorBoard format:
```bash
tensorboard --logdir ./finetuned_models/llama3_2
```

Open http://localhost:6006 to monitor:
- Training loss
- Learning rate schedules
- GPU usage

## Troubleshooting

### Out of Memory (OOM)
```bash
# Reduce batch size
python finetune_models.py --batch-size 2

# Or limit samples
python finetune_models.py --max-samples 1000
```
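
A common recovery pattern for OOM is to halve the batch size and retry until training fits. The sketch below shows the pattern with a placeholder training function; in real PyTorch code you would catch `torch.cuda.OutOfMemoryError` rather than the plain `MemoryError` used here:

```python
def train_with_fallback(train_fn, batch_size: int, min_batch: int = 1):
    """Retry training with a halved batch size on out-of-memory errors.

    train_fn: callable taking batch_size; raises MemoryError on OOM
    (stand-in for torch.cuda.OutOfMemoryError in real code).
    """
    while batch_size >= min_batch:
        try:
            return train_fn(batch_size)
        except MemoryError:
            batch_size //= 2
    raise RuntimeError("Out of memory even at the minimum batch size")

# Toy run: pretend any batch size above 2 blows up.
def fake_train(bs):
    if bs > 2:
        raise MemoryError
    return f"trained with batch_size={bs}"

print(train_with_fallback(fake_train, 8))  # trained with batch_size=2
```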

### Slow Training
- Ensure GPU is being used: `nvidia-smi`
- Use smaller model: `--model gemma3:1b`
- Reduce max_length in tokenization (in code)

### Dataset Not Found
```bash
# Download manually first
python -c "from datasets import load_dataset; load_dataset('banksy235/XCoder-80K')"

# Or use the Hugging Face CLI
huggingface-cli download banksy235/XCoder-80K --repo-type dataset
```

## Dataset Structure

The XCoder-80K dataset contains code examples with metadata. The script automatically handles:
- Multi-language code (Python, JavaScript, Java, C++, etc.)
- Code with comments and docstrings
- Various programming tasks (algorithms, utilities, etc.)

## Next Steps

1. **Run fine-tuning:** `python finetune_models.py --model llama3.2`
2. **Monitor training:** `tensorboard --logdir ./finetuned_models/llama3_2`
3. **Export to Ollama:** Create custom Modelfile and `ollama create`
4. **Test in CodeArena:** Update dashboard to use fine-tuned model
5. **Measure improvements:** Run `python plot_rewards.py` to see RL performance gains

## References

- [XCoder-80K Dataset](https://huggingface.co/datasets/banksy235/XCoder-80K)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl)
- [Ollama Documentation](https://ollama.ai)
- [PEFT (Parameter-Efficient Fine-Tuning)](https://github.com/huggingface/peft)