# Fine-tuning Guide: XCoder-80K Dataset

This guide explains how to fine-tune Ollama models on the XCoder-80K code dataset.

## Overview

The `finetune_models.py` script fine-tunes open-source code models on the XCoder-80K dataset from Hugging Face:

| Ollama Model | HuggingFace Model | Size | Recommended |
|---|---|---|---|
| `llama3.2:latest` | meta-llama/Llama-2-7b-hf | 7B | ✅ Best for code |
| `gemma3:4b` | google/gemma-7b | 7B | ✅ Good alternative |
| `gemma3:1b` | google/gemma-2b | 2B | Lightweight option |
| `llava:latest` | Not suitable | Multimodal | ❌ Skip (vision-only) |

**Dataset:** [banksy235/XCoder-80K](https://huggingface.co/datasets/banksy235/XCoder-80K)

- 80,000 code examples
- Covers multiple programming languages
- Suitable for code generation and repair
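
To get a feel for the data before training, you can load it with the `datasets` library and inspect a record; a minimal sketch (the split and field names depend on the dataset's actual schema):

```python
from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub on first use
ds = load_dataset("banksy235/XCoder-80K")

print(ds)               # splits, row counts, and column names
split = next(iter(ds))  # e.g. "train"
print(ds[split][0])     # inspect one raw example
```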
## Installation

### Quick Install (Recommended)

**Windows:**

```bash
install_finetune.bat
```

**Linux/macOS:**

```bash
bash install_finetune.sh
```

### Manual Installation

1. **Install PyTorch with CUDA 12.1 support:**

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

2. **Install fine-tuning dependencies:**

   ```bash
   pip install -r requirements-finetune.txt
   ```

3. **Verify installation:**

   ```bash
   python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'GPU: {torch.cuda.is_available()}')"
   ```
### Install Hugging Face CLI (Optional)

For easier dataset management:

```bash
# macOS/Linux
curl -LsSf https://hf.co/cli/install.sh | bash -s

# Or via pip
pip install huggingface_hub

# Login (for private datasets)
huggingface-cli login
```
## Usage

### Option 1: Fine-tune a Single Model

Fine-tune Llama-2-7b on XCoder-80K (the recommended starting point):

```bash
python finetune_models.py --model llama3.2 \
    --num-epochs 3 \
    --batch-size 4 \
    --learning-rate 2e-4
```

### Option 2: Fine-tune All Models Sequentially

```bash
python finetune_models.py --all-models \
    --num-epochs 3 \
    --batch-size 4 \
    --max-samples 5000
```

### Option 3: Custom Configuration

```bash
python finetune_models.py \
    --model llama3.2 \
    --output-dir ./my_finetuned_models \
    --num-epochs 5 \
    --batch-size 8 \
    --learning-rate 1e-4 \
    --max-samples 10000 \
    --no-lora  # Disable LoRA (full fine-tuning)
```
## Training Arguments Explained

| Argument | Default | Description |
|---|---|---|
| `--model` | `llama3.2` | Model to fine-tune |
| `--all-models` | False | Fine-tune all available models |
| `--output-dir` | `./finetuned_models` | Where to save fine-tuned models |
| `--num-epochs` | 3 | Training epochs (more = longer training) |
| `--batch-size` | 4 | Batch size (larger = more VRAM needed) |
| `--learning-rate` | 2e-4 | Learning rate (lower = smaller, more stable updates) |
| `--max-samples` | None | Limit training samples (None = use all 80K) |
| `--no-lora` | False | Disable LoRA (full fine-tuning) |
| `--no-gradient-checkpointing` | False | Disable gradient checkpointing |
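
For orientation, here is a hedged sketch of how these flags could map onto Hugging Face `TrainingArguments`; the actual wiring inside `finetune_models.py` may differ:

```python
from transformers import TrainingArguments

# Hypothetical mapping from the CLI flags above; values shown are the defaults
args = TrainingArguments(
    output_dir="./finetuned_models/llama3_2",  # --output-dir
    num_train_epochs=3,                        # --num-epochs
    per_device_train_batch_size=4,             # --batch-size
    learning_rate=2e-4,                        # --learning-rate
    gradient_checkpointing=True,               # off with --no-gradient-checkpointing
    report_to="tensorboard",                   # produces the logs used in Monitoring below
)
```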
## Output

After training, models are saved to:

```
finetuned_models/
├── llama3_2/
│   ├── final/
│   │   ├── pytorch_model.bin
│   │   ├── config.json
│   │   └── tokenizer.json
│   └── metadata.json
├── gemma3_4b/
│   └── ...
└── gemma3_1b/
    └── ...
```
## Using Fine-tuned Models with Ollama

After fine-tuning, you can create custom Ollama models. Since training uses LoRA by default, apply the saved adapter on top of the base model with the `ADAPTER` instruction (Modelfiles have no `COPY` instruction). Create a `Modelfile`:

```dockerfile
FROM llama3.2:latest

# Apply the fine-tuned LoRA adapter weights
# (Ollama may require the adapter converted to a supported format, e.g. GGUF)
ADAPTER ./finetuned_models/llama3_2/final

# Optional: set sampling parameters
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
```
Then create and run:

```bash
ollama create my-finetuned-llama -f Modelfile
ollama run my-finetuned-llama "your prompt here"
```
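
You can also query the new model programmatically through Ollama's local REST API (assuming the default server on port 11434):

```python
import requests

# Single non-streaming completion from the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "my-finetuned-llama", "prompt": "def fibonacci", "stream": False},
)
print(resp.json()["response"])
```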
Or load the fine-tuned weights directly with Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./finetuned_models/llama3_2/final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a completion with the fine-tuned model
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
```
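
If you trained with LoRA (the default), the `final/` directory may hold only adapter weights rather than a full model. A sketch of loading them with PEFT, assuming the adapter was saved via `save_pretrained` on top of the Llama-2-7b base from the table above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_dir = "./finetuned_models/llama3_2/final"

tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the LoRA adapter

# Optionally fold the adapter into the base weights for standalone export
model = model.merge_and_unload()
```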
## Hardware Requirements

| Configuration | VRAM | Training Speed | Recommended |
|---|---|---|---|
| RTX 4090 (24GB) | 24GB | ~2 hours | ✅ Excellent |
| RTX 4080 (16GB) | 16GB | ~3-4 hours | ✅ Good |
| RTX 4070 (12GB) | 12GB | ~5-6 hours | Acceptable |
| Tesla T4 (16GB) | 16GB | ~4-5 hours | Cloud-friendly |
| CPU only | N/A | ~1-2 days | Not recommended |

**Optimization Tips:**

- Use `--batch-size 2` for GPUs with <12GB VRAM
- Use `--max-samples 1000` to train on a subset first
- LoRA (the default) uses ~70% less VRAM than full fine-tuning; see the sketch below
- Gradient checkpointing (the default) reduces VRAM by ~30%
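
To see where the LoRA savings come from, you can count trainable parameters; a minimal sketch with an assumed `LoraConfig` (the ranks and target modules used in `finetune_models.py` may differ):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Only the small adapter matrices are trainable; the base model stays frozen,
# so gradient and optimizer-state memory shrink accordingly
model.print_trainable_parameters()
```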
## Integration with CodeArena RL

To use fine-tuned models with the CodeArena RL environment:

1. **Export to Ollama** (see above)
2. **Update `Dashboard.jsx`** to use the new model:

   ```javascript
   const [ollamaModel, setOllamaModel] = useState('my-finetuned-llama');
   ```

3. **Or update `ollama_rl_rollout.py`:**

   ```bash
   python ollama_rl_rollout.py --ollama-model my-finetuned-llama
   ```
## Monitoring Training

Training logs are saved in TensorBoard format:

```bash
tensorboard --logdir ./finetuned_models/llama3_2
```

Open http://localhost:6006 to monitor:

- Training loss
- Learning rate schedule
- GPU usage
## Troubleshooting

### Out of Memory (OOM)

```bash
# Reduce batch size
python finetune_models.py --batch-size 2

# Or limit samples
python finetune_models.py --max-samples 1000
```

### Slow Training

- Ensure the GPU is being used: `nvidia-smi`
- Use a smaller model: `--model gemma3:1b`
- Reduce `max_length` in tokenization (in code)

### Dataset Not Found

```bash
# Download manually first
python -c "from datasets import load_dataset; load_dataset('banksy235/XCoder-80K')"

# Or use the Hugging Face CLI
hf download banksy235/XCoder-80K
```
## Dataset Structure

The XCoder-80K dataset contains code examples with metadata. The script automatically handles the following (a preprocessing sketch follows this list):

- Multi-language code (Python, JavaScript, Java, C++, etc.)
- Code with comments and docstrings
- Various programming tasks (algorithms, utilities, etc.)
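
As an illustration, preprocessing along these lines could turn each raw record into a single training string; the field names (`instruction`, `output`) are assumptions about the schema, so check a real record first (see the loading snippet in the Overview):

```python
def format_example(example):
    """Hypothetical formatter: field names are assumed, not confirmed --
    inspect a real record and adjust to the dataset's actual schema."""
    prompt = example.get("instruction", "")
    answer = example.get("output", "")
    return {"text": f"### Instruction:\n{prompt}\n\n### Response:\n{answer}"}

# dataset = dataset.map(format_example)  # applied before tokenization
```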
## Next Steps

1. **Run fine-tuning:** `python finetune_models.py --model llama3.2`
2. **Monitor training:** `tensorboard --logdir ./finetuned_models/llama3_2`
3. **Export to Ollama:** create a custom Modelfile and run `ollama create`
4. **Test in CodeArena:** update the dashboard to use the fine-tuned model
5. **Measure improvements:** run `python plot_rewards.py` to see RL performance gains
## References

- [XCoder-80K Dataset](https://huggingface.co/datasets/banksy235/XCoder-80K)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl)
- [Ollama Documentation](https://ollama.ai)
- [PEFT (Parameter-Efficient Fine-Tuning)](https://github.com/huggingface/peft)