---
title: Qwen Fine-tuning on Codeforces CoTs
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
pinned: false
---
# Qwen2.5-0.5B Fine-tuning on Codeforces CoTs

Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.

## Dataset

- **Name**: open-r1/codeforces-cots
- **Size**: ~48K competitive programming problems with chain-of-thought solutions
- **Format**: Chat format with problem descriptions and step-by-step reasoning
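
To illustrate the chat format, a record can be flattened into a messages list before tokenization. This is a minimal sketch; the `problem` and `solution` field names are assumptions for illustration, and the actual dataset columns may differ:

```python
def build_chat(problem: str, solution: str) -> list[dict]:
    """Turn one problem/solution pair into the chat-message format
    consumed by tokenizer.apply_chat_template."""
    return [
        {"role": "user", "content": problem},
        {"role": "assistant", "content": solution},
    ]

# Hypothetical record
chat = build_chat(
    "Given n integers, print their sum.",
    "Read the integers into a list, then print(sum(nums)).",
)
print(chat[0]["role"], chat[1]["role"])  # user assistant
```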
## Model

- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- **Training Method**: QLoRA (4-bit quantization + LoRA)
- **Target Modules**: All attention and MLP layers
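
A QLoRA setup along these lines might look as follows. This is a sketch only: the exact quantization dtypes and module names are assumptions (the module names shown are the attention and MLP projections used by Qwen2-family models), and `finetune.py` remains the source of truth:

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

# LoRA adapters on all attention and MLP projection layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    ],
    task_type="CAUSAL_LM",
)
```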
## Setup

1. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Training

### Option 1: Local Training (CPU/GPU)

Run the fine-tuning script locally:

```bash
python finetune.py
```

**Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.

### Option 2: Hugging Face Spaces with GPU (Recommended)

If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:

1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
2. Upload this project to a new HF Space with GPU hardware
3. Use the included Gradio interface (`app.py`) to monitor training in real time
4. Training time on a T4 GPU: ~2-3 hours for 1000 steps
This is the **recommended approach**, as it provides:

- Access to GPU hardware (T4, A10G, or A100)
- Real-time training monitoring via web interface
- Automatic checkpoint saving
- Easy model download after training

### Training Configuration

- **Batch Size**: 4 per device (with gradient accumulation of 4)
- **Effective Batch Size**: 16
- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA r**: 16
- **LoRA alpha**: 32
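
These hyperparameters translate into a `TrainingArguments` object roughly like the following. This is a hypothetical mirror of the list above, not the actual contents of `finetune.py`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./qwen-codeforces-cots",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 * 4 = 16
    learning_rate=2e-4,
    num_train_epochs=1,
)
```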
## Output

The fine-tuned model will be saved to `./qwen-codeforces-cots/`.

## Usage

After training, you can use the model with:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the trained LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")

# Build a chat prompt and generate a response
messages = [{"role": "user", "content": "Your problem here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

- The training uses 4-bit quantization to reduce memory requirements
- LoRA allows efficient fine-tuning with minimal trainable parameters
- Training time will vary depending on your hardware