---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
pinned: false
---
# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.

## Dataset

- **Name**: open-r1/codeforces-cots
- **Size**: ~48K competitive programming problems with chain-of-thought solutions
- **Format**: Chat format with problem descriptions and step-by-step reasoning
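
To illustrate the chat format, a record can be flattened into a messages list before tokenization. This is a minimal sketch; the `problem` and `solution` field names are assumptions for illustration, and the actual dataset columns may differ:

```python
def build_chat(problem: str, solution: str) -> list[dict]:
    """Turn one problem/solution pair into the chat-message format
    consumed by tokenizer.apply_chat_template."""
    return [
        {"role": "user", "content": problem},
        {"role": "assistant", "content": solution},
    ]

# Hypothetical record
chat = build_chat(
    "Given n integers, print their sum.",
    "Read the integers into a list, then print(sum(nums)).",
)
print(chat[0]["role"], chat[1]["role"])  # user assistant
```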
## Model

- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- **Training Method**: QLoRA (4-bit quantization + LoRA)
- **Target Modules**: All attention and MLP layers
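
A QLoRA setup along these lines might look as follows. This is a sketch only: the exact quantization dtypes and module names are assumptions (the module names shown are the attention and MLP projections used by Qwen2-family models), and `finetune.py` remains the source of truth:

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

# LoRA adapters on all attention and MLP projection layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    ],
    task_type="CAUSAL_LM",
)
```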
## Setup

1. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Training

### Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:

```bash
python finetune.py
```

**Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.

### Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:

1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
2. Upload this project to a new HF Space with GPU hardware
3. Use the included Gradio interface (`app.py`) to monitor training in real time
4. Training time on a T4 GPU: ~2-3 hours for 1000 steps
This is the **recommended approach**, as it provides:

- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via web interface
- Automatic checkpoint saving
- Easy model download after training

### Training Configuration

- **Batch Size**: 4 per device (with gradient accumulation of 4)
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA r**: 16
- **LoRA alpha**: 32
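
These hyperparameters translate into a `TrainingArguments` object roughly like the following. This is a hypothetical mirror of the list above, not the actual contents of `finetune.py`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./qwen-codeforces-cots",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 * 4 = 16
    learning_rate=2e-4,
    num_train_epochs=1,
)
```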
## Output

The fine-tuned model will be saved to `./qwen-codeforces-cots/`.

## Usage

After training, you can use the model with:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the trained LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a chat prompt and generate a response
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- The training uses 4-bit quantization to reduce memory requirements
- LoRA allows efficient fine-tuning with minimal trainable parameters
- Training time will vary depending on your hardware