# Fine-tuning Guide: XCoder-80K Dataset

This guide explains how to fine-tune Ollama models on the XCoder-80K code dataset.

## Overview

The `finetune_models.py` script fine-tunes open-source code models on the XCoder-80K dataset from Hugging Face:

| Ollama Model | HuggingFace Model | Size | Recommended |
|---|---|---|---|
| `llama3.2:latest` | meta-llama/Llama-2-7b-hf | 7B | ✓ Best for code |
| `gemma3:4b` | google/gemma-7b | 7B | ✓ Good alternative |
| `gemma3:1b` | google/gemma-2b | 2B | Lightweight option |
| `llava:latest` | Not suitable | Multimodal | ✗ Skip (vision model) |

**Dataset:** [banksy235/XCoder-80K](https://huggingface.co/datasets/banksy235/XCoder-80K)

- 80,000 code examples
- Covers multiple programming languages
- Suitable for code generation and repair

## Installation

### Quick Install (Recommended)

**Windows:**

```bash
install_finetune.bat
```

**Linux/macOS:**

```bash
bash install_finetune.sh
```

### Manual Installation

1. **Install PyTorch with CUDA 12.1 support:**

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

2. **Install fine-tuning dependencies:**

   ```bash
   pip install -r requirements-finetune.txt
   ```

3. **Verify the installation:**

   ```bash
   python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'GPU: {torch.cuda.is_available()}')"
   ```

### Install Hugging Face CLI (Optional)

For easier dataset management:

```bash
# macOS/Linux
curl -LsSf https://hf.co/cli/install.sh | bash -s

# Or via pip
pip install huggingface_hub

# Login (for private datasets)
huggingface-cli login
```

## Usage

### Option 1: Fine-tune a Single Model

Fine-tune Llama-2-7b on XCoder-80K (recommended for the fastest start):

```bash
python finetune_models.py --model llama3.2 \
    --num-epochs 3 \
    --batch-size 4 \
    --learning-rate 2e-4
```

### Option 2: Fine-tune All Models Sequentially

```bash
python finetune_models.py --all-models \
    --num-epochs 3 \
    --batch-size 4 \
    --max-samples 5000
```

### Option 3: Custom Configuration

```bash
python finetune_models.py \
    --model llama3.2 \
    --output-dir ./my_finetuned_models \
    --num-epochs 5 \
    --batch-size 8 \
    --learning-rate 1e-4 \
    --max-samples 10000 \
    --no-lora  # Disable LoRA (full fine-tuning)
```

## Training Arguments Explained

| Argument | Default | Description |
|---|---|---|
| `--model` | `llama3.2` | Model to fine-tune |
| `--all-models` | False | Fine-tune all available models |
| `--output-dir` | `./finetuned_models` | Where to save fine-tuned models |
| `--num-epochs` | 3 | Training epochs (more = longer training) |
| `--batch-size` | 4 | Batch size (larger = more VRAM needed) |
| `--learning-rate` | 2e-4 | Learning rate (lower = smaller updates) |
| `--max-samples` | None | Limit samples (None = use all 80K) |
| `--no-lora` | False | Disable LoRA (full fine-tuning) |
| `--no-gradient-checkpointing` | False | Disable gradient checkpointing |
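To make the last two flags concrete, here is roughly what the defaults correspond to under the hood. This is an illustrative sketch only: the actual hyperparameters (rank, alpha, target modules) live in `finetune_models.py`, and the values below are assumptions, not the script's settings.

```python
# Illustrative sketch only -- hyperparameter values are assumptions,
# not the actual settings used by finetune_models.py.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Gradient checkpointing (on by default): trades compute for VRAM by
# recomputing activations during the backward pass.
model.gradient_checkpointing_enable()

# LoRA (on by default): train small low-rank adapter matrices instead
# of all 7B weights.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed value)
    lora_alpha=32,                        # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

With LoRA, only the small adapter matrices receive gradients and optimizer state, which is where most of the VRAM savings come from; `--no-lora` switches back to updating all weights.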
## Output

After training, models are saved to:

```
finetuned_models/
├── llama3_2/
│   ├── final/
│   │   ├── pytorch_model.bin
│   │   ├── config.json
│   │   └── tokenizer.json
│   └── metadata.json
├── gemma3_4b/
│   └── ...
└── gemma3_1b/
    └── ...
```

## Using Fine-tuned Models with Ollama

After fine-tuning, you can create custom Ollama models. Create a `Modelfile`:

```dockerfile
FROM llama3.2:latest

# Apply the fine-tuned LoRA adapter on top of the base model
ADAPTER ./finetuned_models/llama3_2/final

# Optional: Set parameters
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
```

Ollama's `ADAPTER` instruction layers a LoRA adapter over the base model; if `ollama create` cannot read the checkpoint directly, merge the adapter into the base weights first, as sketched at the end of this section.

Then create and run:

```bash
ollama create my-finetuned-llama -f Modelfile
ollama run my-finetuned-llama "your prompt here"
```

Or use the weights directly in Python:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./finetuned_models/llama3_2/final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# If final/ holds full model weights this works as-is; if it holds only a
# LoRA adapter (the default), load it with peft.AutoPeftModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(model_id)

# Use the model
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
```
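If Ollama rejects the raw checkpoint, a common route is to merge the LoRA adapter into the base weights and export the merged model instead. A minimal sketch using the `peft` library, assuming the `final/` directory holds a PEFT adapter checkpoint (the `merged/` output path is illustrative):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "./finetuned_models/llama3_2/final"
merged_dir = "./finetuned_models/llama3_2/merged"  # illustrative path

# Load the base model with the adapter attached, then fold the adapter
# weights into the base weights so no PEFT dependency remains.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)
model = model.merge_and_unload()

# Save a standalone checkpoint (plus tokenizer) that standard loaders,
# or a GGUF conversion step for Ollama, can consume.
model.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(merged_dir)
```

The merged directory can then be pointed at by the Modelfile's `FROM`, or converted to GGUF first (e.g., via llama.cpp's conversion tooling) if your Ollama setup requires it.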
## Hardware Requirements

| Configuration | VRAM | Training Time | Recommended |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | ~2 hours | ✓ Excellent |
| RTX 4080 (16GB) | 16GB | ~3-4 hours | ✓ Good |
| RTX 4070 (12GB) | 12GB | ~5-6 hours | Acceptable |
| Tesla T4 (16GB) | 16GB | ~4-5 hours | Cloud-friendly |
| CPU only | N/A | ~1-2 days | Not recommended |

**Optimization Tips:**

- Use `--batch-size 2` for GPUs with <12GB VRAM
- Use `--max-samples 1000` to train on a subset first
- LoRA (default) uses ~70% less VRAM than full fine-tuning
- Gradient checkpointing (default) reduces VRAM by ~30%

## Integration with CodeArena RL

To use fine-tuned models with the CodeArena RL environment:

1. **Export to Ollama** (see above)
2. **Update Dashboard.jsx** to use the new model:

   ```javascript
   const [ollamaModel, setOllamaModel] = useState('my-finetuned-llama');
   ```

3. **Or update ollama_rl_rollout.py:**

   ```bash
   python ollama_rl_rollout.py --ollama-model my-finetuned-llama
   ```

## Monitoring Training

Training logs are saved in TensorBoard format:

```bash
tensorboard --logdir ./finetuned_models/llama3_2
```

Open http://localhost:6006 to monitor:

- Training loss
- Learning rate schedules
- GPU usage

## Troubleshooting

### Out of Memory (OOM)

```bash
# Reduce batch size
python finetune_models.py --batch-size 2

# Or limit samples
python finetune_models.py --max-samples 1000
```

### Slow Training

- Ensure the GPU is being used: `nvidia-smi`
- Use a smaller model: `--model gemma3:1b`
- Reduce the tokenization `max_length` (set in the script)

### Dataset Not Found

```bash
# Download manually first
python -c "from datasets import load_dataset; load_dataset('banksy235/XCoder-80K')"

# Or use the Hugging Face CLI
hf download banksy235/XCoder-80K
```

## Dataset Structure

The XCoder-80K dataset contains code examples with metadata. The script automatically handles:

- Multi-language code (Python, JavaScript, Java, C++, etc.)
- Code with comments and docstrings
- Various programming tasks (algorithms, utilities, etc.)

## Next Steps

1. **Run fine-tuning:** `python finetune_models.py --model llama3.2`
2. **Monitor training:** `tensorboard --logdir ./finetuned_models/llama3_2`
3. **Export to Ollama:** Create a custom Modelfile and run `ollama create`
4. **Test in CodeArena:** Update the dashboard to use the fine-tuned model
5. **Measure improvements:** Run `python plot_rewards.py` to see RL performance gains

## References

- [XCoder-80K Dataset](https://huggingface.co/datasets/banksy235/XCoder-80K)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl)
- [Ollama Documentation](https://ollama.ai)
- [PEFT (Parameter-Efficient Fine-Tuning)](https://github.com/huggingface/peft)