# Fine-tuning Guide: XCoder-80K Dataset
This guide explains how to fine-tune Ollama models on the XCoder-80K code dataset.
## Overview

The `finetune_models.py` script fine-tunes open-source code models on the XCoder-80K dataset from Hugging Face:
| Ollama Model | HuggingFace Model | Size | Recommended |
|---|---|---|---|
| `llama3.2:latest` | `meta-llama/Llama-2-7b-hf` | 7B | ✅ Best for code |
| `gemma3:4b` | `google/gemma-7b` | 7B | ✅ Good alternative |
| `gemma3:1b` | `google/gemma-2b` | 2B | Lightweight option |
| `llava:latest` | Not suitable | Multimodal | ❌ Skip (vision-only) |
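Under the hood, the script follows the standard `transformers` + `peft` recipe. The sketch below is a minimal, hedged outline of that loop, not the script's actual code; the text column name and the LoRA rank are assumptions:

```python
# Minimal LoRA fine-tuning sketch (assumptions: the "text" column name
# and LoRA rank; see finetune_models.py for the real implementation).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA: train small adapter matrices instead of all 7B parameters
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16))

dataset = load_dataset("banksy235/XCoder-80K", split="train")

def tokenize(batch):
    # "text" is an assumed column name; check dataset.column_names first
    return tokenizer(batch["text"], truncation=True, max_length=512)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./finetuned_models/llama3_2",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
    train_dataset=dataset.map(tokenize, batched=True,
                              remove_columns=dataset.column_names),
    # mlm=False turns inputs into causal-LM labels for the loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```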
**Dataset:** `banksy235/XCoder-80K`
- 80,000 code examples
- Covers multiple programming languages
- Suitable for code generation and repair
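To get a feel for the data before training, you can load and inspect it directly; column names vary between datasets, so print the schema rather than assuming it:

```python
# Quick look at the dataset before training
from datasets import load_dataset

ds = load_dataset("banksy235/XCoder-80K", split="train")
print(ds)               # row count and column names
print(ds.column_names)  # inspect the schema
print(ds[0])            # first example
```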
## Installation

### Quick Install (Recommended)

**Windows:**

```bat
install_finetune.bat
```

**Linux/macOS:**

```bash
bash install_finetune.sh
```
### Manual Installation

- Install PyTorch with CUDA 12.1 support:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  ```

- Install fine-tuning dependencies:

  ```bash
  pip install -r requirements-finetune.txt
  ```

- Verify the installation:

  ```bash
  python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'GPU: {torch.cuda.is_available()}')"
  ```
### Install Hugging Face CLI (Optional)

For easier dataset management:

```bash
# macOS/Linux
curl -LsSf https://hf.co/cli/install.sh | bash -s

# Or via pip
pip install huggingface_hub

# Login (for private datasets)
huggingface-cli login
```
## Usage

### Option 1: Fine-tune a Single Model

Fine-tune Llama-2-7b on XCoder-80K (recommended for the fastest start):

```bash
python finetune_models.py --model llama3.2 \
    --num-epochs 3 \
    --batch-size 4 \
    --learning-rate 2e-4
```
### Option 2: Fine-tune All Models Sequentially

```bash
python finetune_models.py --all-models \
    --num-epochs 3 \
    --batch-size 4 \
    --max-samples 5000
```
### Option 3: Custom Configuration

```bash
python finetune_models.py \
    --model llama3.2 \
    --output-dir ./my_finetuned_models \
    --num-epochs 5 \
    --batch-size 8 \
    --learning-rate 1e-4 \
    --max-samples 10000 \
    --no-lora  # Disable LoRA (full fine-tuning)
```
## Training Arguments Explained

| Argument | Default | Description |
|---|---|---|
| `--model` | `llama3.2` | Model to fine-tune |
| `--all-models` | `False` | Fine-tune all available models |
| `--output-dir` | `./finetuned_models` | Where to save fine-tuned models |
| `--num-epochs` | `3` | Training epochs (more = longer training) |
| `--batch-size` | `4` | Batch size (larger = more VRAM needed) |
| `--learning-rate` | `2e-4` | Learning rate (lower = slower updates) |
| `--max-samples` | `None` | Limit samples (`None` = use all 80K) |
| `--no-lora` | `False` | Disable LoRA (full fine-tuning) |
| `--no-gradient-checkpointing` | `False` | Disable gradient checkpointing |
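For reference, here is one way these flags could be declared with `argparse`; the actual definitions in `finetune_models.py` may differ:

```python
# Hedged sketch of the CLI above; defaults mirror the table.
import argparse

parser = argparse.ArgumentParser(description="Fine-tune models on XCoder-80K")
parser.add_argument("--model", default="llama3.2")
parser.add_argument("--all-models", action="store_true")
parser.add_argument("--output-dir", default="./finetuned_models")
parser.add_argument("--num-epochs", type=int, default=3)
parser.add_argument("--batch-size", type=int, default=4)
parser.add_argument("--learning-rate", type=float, default=2e-4)
parser.add_argument("--max-samples", type=int, default=None)
parser.add_argument("--no-lora", action="store_true")
parser.add_argument("--no-gradient-checkpointing", action="store_true")
args = parser.parse_args()
```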
## Output

After training, models are saved to:

```
finetuned_models/
├── llama3_2/
│   ├── final/
│   │   ├── pytorch_model.bin
│   │   ├── config.json
│   │   └── tokenizer.json
│   └── metadata.json
├── gemma3_4b/
│   └── ...
└── gemma3_1b/
    └── ...
```
## Using Fine-tuned Models with Ollama

After fine-tuning, you can create custom Ollama models. Create a `Modelfile`:

```
FROM llama3.2:latest

# Apply the fine-tuned adapter weights (Modelfiles have no COPY
# instruction; ADAPTER layers a LoRA adapter onto the base model)
ADAPTER ./finetuned_models/llama3_2/final

# Optional: set sampling parameters
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
```
Then create and run:

```bash
ollama create my-finetuned-llama -f Modelfile
ollama run my-finetuned-llama "your prompt here"
```
Or use the weights directly in Python:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./finetuned_models/llama3_2/final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a completion from the fine-tuned model
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
```
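Note that with LoRA enabled (the default), `final/` may hold adapter weights rather than a full model. In that case, load the adapter on top of the base model with `peft` (a sketch; the base model ID is taken from the table above):

```python
# Sketch: load a LoRA adapter onto its base model with peft.
# Assumes ./finetuned_models/llama3_2/final contains adapter weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "./finetuned_models/llama3_2/final")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```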
## Hardware Requirements

| Configuration | VRAM | Training Speed | Recommended |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | ~2 hours | ✅ Excellent |
| RTX 4080 (16GB) | 16GB | ~3-4 hours | ✅ Good |
| RTX 4070 (12GB) | 12GB | ~5-6 hours | Acceptable |
| Tesla T4 (16GB) | 16GB | ~4-5 hours | Cloud-friendly |
| CPU only | N/A | ~1-2 days | Not recommended |
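To see where your machine falls in this table, you can query the GPU from PyTorch before committing to a batch size:

```python
# Report the GPU name and total VRAM before picking training settings
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; training would fall back to CPU.")
```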
**Optimization Tips:**

- Use `--batch-size 2` for GPUs with <12GB VRAM
- Enable `--max-samples 1000` to train on a subset first
- LoRA (default) uses 70% less VRAM than full fine-tuning (see the sketch below)
- Gradient checkpointing (default) reduces VRAM by 30%
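The VRAM saving from LoRA comes from training only a small adapter, so gradients and optimizer state exist for a tiny fraction of the weights; `peft` can report that fraction directly (a sketch, using the 2B model for speed):

```python
# Show how few parameters LoRA actually trains
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16))

# Prints "trainable params: ... || all params: ... || trainable%: ..."
model.print_trainable_parameters()
```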
## Integration with CodeArena RL

To use fine-tuned models with the CodeArena RL environment:

- Export to Ollama (see above)
- Update `Dashboard.jsx` to use the new model:

  ```jsx
  const [ollamaModel, setOllamaModel] = useState('my-finetuned-llama');
  ```

- Or update `ollama_rl_rollout.py`:

  ```bash
  python ollama_rl_rollout.py --ollama-model my-finetuned-llama
  ```
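You can also sanity-check the exported model programmatically through Ollama's local HTTP API (a sketch; assumes `ollama serve` is running on the default port 11434):

```python
# Query the fine-tuned model through Ollama's local REST API
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-finetuned-llama",
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,
    },
)
print(resp.json()["response"])
```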
## Monitoring Training

Training logs are saved in TensorBoard format:

```bash
tensorboard --logdir ./finetuned_models/llama3_2
```

Open http://localhost:6006 to monitor:

- Training loss
- Learning rate schedules
- GPU usage
## Troubleshooting

### Out of Memory (OOM)

```bash
# Reduce batch size
python finetune_models.py --batch-size 2

# Or limit samples
python finetune_models.py --max-samples 1000
```
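If a smaller batch hurts convergence, gradient accumulation keeps the effective batch size while lowering peak memory; a sketch with `transformers` `TrainingArguments`:

```python
# Effective batch of 8 while holding only 2 samples in memory:
# 2 (per device) x 4 (accumulation steps) = 8
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./finetuned_models",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,  # trades compute for memory
)
```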
### Slow Training

- Ensure the GPU is being used: `nvidia-smi`
- Use a smaller model: `--model gemma3:1b`
- Reduce `max_length` in the tokenization step (in code)
### Dataset Not Found

```bash
# Download manually first
python -c "from datasets import load_dataset; load_dataset('banksy235/XCoder-80K')"

# Or use the Hugging Face CLI (--repo-type dataset is needed for dataset repos)
hf download banksy235/XCoder-80K --repo-type dataset
```
## Dataset Structure

The XCoder-80K dataset contains code examples with metadata. The script automatically handles:

- Multi-language code (Python, JavaScript, Java, C++, etc.)
- Code with comments and docstrings
- Various programming tasks (algorithms, utilities, etc.)
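Exactly how examples are rendered into training text is internal to `finetune_models.py`; the hypothetical helper below shows one common instruction-style layout, with `instruction` and `output` as assumed field names:

```python
# Hypothetical formatting helper; "instruction" and "output" are assumed
# field names -- check ds.column_names for the real schema.
def format_example(example: dict) -> str:
    return (
        "### Instruction:\n"
        f"{example['instruction']}\n\n"
        "### Response:\n"
        f"{example['output']}"
    )
```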
## Next Steps

- Run fine-tuning: `python finetune_models.py --model llama3.2`
- Monitor training: `tensorboard --logdir ./finetuned_models/llama3_2`
- Export to Ollama: create a custom `Modelfile` and run `ollama create`
- Test in CodeArena: update the dashboard to use the fine-tuned model
- Measure improvements: run `python plot_rewards.py` to see RL performance gains