kghamilton89 Claude Sonnet 4.5 committed on
Commit e261fbe · 1 Parent(s): b41a704

Add Qwen2.5-0.5B fine-tuning on Codeforces CoTs


- Fine-tuning script with QLoRA (4-bit quantization + LoRA)
- Gradio web interface for monitoring training progress
- Training on open-r1/codeforces-cots dataset (~48K examples)
- Auto-detects CUDA for GPU training with BitsAndBytes quantization
- Saves checkpoints every 200 steps
- Model testing script included

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (7)
  1. .gitignore +27 -0
  2. README.md +93 -14
  3. README_HF_SPACES.md +164 -0
  4. app.py +90 -4
  5. finetune.py +170 -0
  6. requirements.txt +8 -0
  7. test_model.py +64 -0
.gitignore ADDED
@@ -0,0 +1,27 @@
+ # Virtual environment
+ venv/
+ env/
+ *.pyc
+ __pycache__/
+
+ # Model outputs
+ qwen-codeforces-cots/
+ *.bin
+ *.safetensors
+
+ # Dataset cache
+ .cache/
+
+ # Logs
+ *.log
+ wandb/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
README.md CHANGED
@@ -1,14 +1,93 @@
- ---
- title: Claude Code Fine Tune
- emoji: 🐠
- colorFrom: blue
- colorTo: red
- sdk: gradio
- sdk_version: 6.0.2
- app_file: app.py
- pinned: false
- license: mit
- short_description: https://huggingface.co/blog/hf-skills-training
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Qwen2.5-0.5B Fine-tuning on Codeforces CoTs
+
+ Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.
+
+ ## Dataset
+
+ - **Name**: open-r1/codeforces-cots
+ - **Size**: ~48K competitive programming problems with chain-of-thought solutions
+ - **Format**: Chat format with problem descriptions and step-by-step reasoning
+
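For reference, here is a quick way to look at one training example before fine-tuning. This is a hedged sketch: the `messages` field and its role/content layout are assumed from how `finetune.py` (included in this commit) consumes the dataset.

```python
from datasets import load_dataset

# Load the training split and inspect the chat-formatted messages of one example.
dataset = load_dataset("open-r1/codeforces-cots", split="train")
example = dataset[0]
for message in example["messages"]:
    # Each entry is assumed to carry a "role" and a "content" string.
    print(message["role"], "->", message["content"][:200])
```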
+ ## Model
+
+ - **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
+ - **Training Method**: QLoRA (4-bit quantization + LoRA)
+ - **Target Modules**: All attention and MLP layers
+
+ ## Setup
+
+ 1. Create and activate a virtual environment:
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Training
+
+ ### Option 1: Local Training (CPU/GPU)
+
+ Run the fine-tuning script locally:
+ ```bash
+ python finetune.py
+ ```
+
+ **Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.
+
+ ### Option 2: Hugging Face Spaces with GPU (Recommended)
+
+ If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:
+
+ 1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
+ 2. Upload this project to a new HF Space with GPU hardware
+ 3. Use the included Gradio interface (`app.py`) to monitor training in real time
+ 4. Training time on a T4 GPU: ~2-3 hours for 1000 steps
+
+ This is the **recommended approach** because it provides:
+ - Access to GPU hardware (T4, A10G, or A100)
+ - Real-time training monitoring via the web interface
+ - Automatic checkpoint saving
+ - Easy model download after training
+
+ ### Training Configuration
+
+ - **Batch Size**: 1 per device (with gradient accumulation of 16)
+ - **Effective Batch Size**: 16
+ - **Learning Rate**: 2e-4
+ - **Epochs**: 1 (capped at 1000 steps by `max_steps`)
+ - **Max Sequence Length**: 2048
+ - **LoRA r**: 16
+ - **LoRA alpha**: 32
+
+ ## Output
+
+ The fine-tuned LoRA adapter is saved to `./qwen-codeforces-cots/`.
+
+ ## Usage
+
+ After training, you can use the model with:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
+ tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")
+
+ messages = [{"role": "user", "content": "Your problem here"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Notes
+
+ - Training uses 4-bit quantization to reduce memory requirements
+ - LoRA enables efficient fine-tuning with a small fraction of trainable parameters
+ - Training time varies with your hardware
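Since training produces a LoRA adapter rather than full model weights, a standalone checkpoint can be handy for deployment. A minimal sketch, assuming the adapter saved by `finetune.py` below; the merged output directory name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and apply the trained adapter.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")

# Fold the LoRA weights into the base model so peft is no longer needed at inference.
merged = model.merge_and_unload()
merged.save_pretrained("./qwen-codeforces-cots-merged")  # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")
tokenizer.save_pretrained("./qwen-codeforces-cots-merged")
```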
README_HF_SPACES.md ADDED
@@ -0,0 +1,164 @@
+ # Deploying to Hugging Face Spaces with GPU
+
+ This guide shows how to deploy the fine-tuning project to Hugging Face Spaces to leverage GPU training.
+
+ ## Prerequisites
+
+ - Hugging Face account with a Pro subscription (for GPU access)
+ - Hugging Face CLI installed and authenticated
+
+ ## Setup Steps
+
+ ### 1. Authenticate with Hugging Face
+
+ ```bash
+ huggingface-cli login
+ ```
+
+ Enter your HF token when prompted.
+
+ ### 2. Create a New Space
+
+ Go to https://huggingface.co/spaces and click "Create new Space":
+
+ - **Owner**: Your username/organization
+ - **Space name**: `qwen-codeforces-finetune` (or your preferred name)
+ - **License**: Apache 2.0 (or your choice)
+ - **Space SDK**: Gradio
+ - **Space hardware**: GPU - T4 small (or higher for faster training)
+   - **Important**: You need HF Pro to access GPU hardware
+   - T4 small is sufficient for this 0.5B model
+   - For faster training, consider A10G or A100
+
+ ### 3. Clone Your New Space
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/qwen-codeforces-finetune
+ cd qwen-codeforces-finetune
+ ```
+
+ ### 4. Copy Project Files
+
+ From your local project directory, copy these files into the Space checkout:
+
+ ```bash
+ cp app.py requirements.txt finetune.py test_model.py README.md .gitignore ./qwen-codeforces-finetune/
+ ```
+
+ ### 5. Push to the Space
+
+ ```bash
+ git add .
+ git commit -m "Initial commit: Qwen fine-tuning on Codeforces CoTs"
+ git push
+ ```
+
+ ### 6. Configure Space Hardware
+
+ After pushing, go to your Space settings:
+ - Navigate to the "Settings" tab
+ - Under "Space hardware", select a GPU option:
+   - **T4 small**: Good for testing (16 GB VRAM)
+   - **A10G small**: Faster training (24 GB VRAM)
+   - **A100**: Fastest but more expensive (40 GB VRAM)
+
+ ### 7. Monitor Training
+
+ Once the Space builds and runs:
+ 1. Click the "Start Training" button
+ 2. Watch the real-time output in the interface
+ 3. Training saves checkpoints every 200 steps
+ 4. The final model is saved to `./qwen-codeforces-cots/`
+
+ ## Training Time Estimates
+
+ With 1000 steps and batch size 1 (gradient accumulation 16):
+
+ - **T4 small**: ~2-3 hours
+ - **A10G small**: ~1-2 hours
+ - **A100**: ~30-60 minutes
+
+ ## Downloading the Trained Model
+
+ After training completes on Spaces:
+
+ ### Option 1: Via the Files Tab
+ 1. Go to your Space's "Files" tab
+ 2. Navigate to `qwen-codeforces-cots/`
+ 3. Download the adapter files:
+    - `adapter_config.json`
+    - `adapter_model.safetensors` (or `.bin`)
+    - `tokenizer_config.json`
+    - `special_tokens_map.json`
+    - Other tokenizer files
+
+ ### Option 2: Via Git
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/qwen-codeforces-finetune
+ cd qwen-codeforces-finetune
+ # The trained adapter will be in the qwen-codeforces-cots/ directory
+ ```
+
+ ### Option 3: Upload to the Model Hub
+ After training, you can upload the adapter to the Hugging Face Model Hub:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()
+ api.upload_folder(
+     folder_path="./qwen-codeforces-cots",
+     repo_id="YOUR_USERNAME/qwen-codeforces-cots-lora",
+     repo_type="model",
+ )
+ ```
+
+ Then load it anywhere:
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/qwen-codeforces-cots-lora")
+ ```
+
+ ## Cost Considerations
+
+ With HF Pro ($9/month):
+ - You get 5 free GPU hours per month
+ - Additional GPU time is charged by hardware tier:
+   - T4 small: ~$0.60/hour
+   - A10G small: ~$3.15/hour
+
+ For 1000 steps (~2-3 hours on a T4), training costs:
+ - Within the free tier: $0
+ - If exceeding free hours: ~$1.20-1.80
+
+ ## Troubleshooting
+
+ ### Space Crashes or OOM
+ - Reduce `per_device_train_batch_size` in finetune.py
+ - Reduce `max_seq_length` to 1024 or 512 (see the sketch below)
+ - Ensure you selected a GPU hardware option
+
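A sketch of the memory-reducing changes to `finetune.py`, with illustrative values; `gradient_checkpointing` is an addition not present in the original script:

```python
from transformers import TrainingArguments

max_seq_length = 1024  # down from 2048; shorter sequences roughly halve activation memory

training_args = TrainingArguments(
    output_dir="./qwen-codeforces-cots",
    per_device_train_batch_size=1,   # already the minimum used by finetune.py
    gradient_accumulation_steps=16,  # keeps the effective batch size at 16
    gradient_checkpointing=True,     # assumption: trades recompute for lower memory
    fp16=True,
    max_steps=1000,
    report_to="none",
)
```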
+ ### Training Not Starting
+ - Check the Space logs in the "Logs" tab
+ - Verify all dependencies are in requirements.txt
+ - Make sure GPU hardware is selected (not CPU)
+
+ ### Slow Training
+ - Upgrade to A10G or A100 hardware
+ - Increase the batch size if you have VRAM headroom
+ - Check that 4-bit quantization is in use (it is enabled automatically with CUDA)
+
+ ## Alternative: Hugging Face AutoTrain
+
+ For a no-code option, consider using Hugging Face AutoTrain:
+ ```bash
+ pip install autotrain-advanced
+ autotrain llm --train --model Qwen/Qwen2.5-0.5B-Instruct \
+   --data-path . --lr 2e-4 --batch-size 1 \
+   --epochs 1 --trainer sft
+ ```
+
+ See: https://huggingface.co/docs/autotrain/
app.py CHANGED
@@ -1,7 +1,93 @@
  import gradio as gr
+ import subprocess
+ import os
+ from pathlib import Path

- def greet(name):
-     return "Hello " + name + "!!"
-
- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
+ def run_training():
+     """Run the fine-tuning process and stream output."""
+     output_text = "Starting training...\n\n"
+     yield output_text
+
+     # Run the training script as a subprocess
+     process = subprocess.Popen(
+         ["python", "finetune.py"],
+         stdout=subprocess.PIPE,
+         stderr=subprocess.STDOUT,
+         text=True,
+         bufsize=1
+     )
+
+     # Stream output line by line as it arrives
+     for line in process.stdout:
+         output_text += line
+         yield output_text
+
+     process.wait()
+
+     if process.returncode == 0:
+         output_text += "\n\n✅ Training completed successfully!"
+         output_text += f"\n\nModel saved to: {os.path.abspath('./qwen-codeforces-cots')}"
+     else:
+         output_text += f"\n\n❌ Training failed with exit code {process.returncode}"
+
+     yield output_text
+
+ def check_gpu():
+     """Check GPU availability."""
+     import torch
+     if torch.cuda.is_available():
+         gpu_name = torch.cuda.get_device_name(0)
+         gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
+         return f"✅ GPU Available: {gpu_name} ({gpu_memory:.1f} GB)"
+     else:
+         return "❌ No GPU available - training will be slow!"
+
+ # Create the Gradio interface
+ with gr.Blocks(title="Qwen2.5 Fine-tuning on Codeforces") as demo:
+     gr.Markdown("""
+     # 🚀 Qwen2.5-0.5B Fine-tuning on Codeforces CoTs
+
+     Fine-tuning Qwen2.5-0.5B-Instruct on competitive programming problems with chain-of-thought reasoning.
+
+     **Dataset**: open-r1/codeforces-cots (~48K examples)
+     **Method**: QLoRA (LoRA + 4-bit quantization)
+     **Training**: 1000 steps with checkpoints every 200 steps
+     """)
+
+     gpu_status = gr.Textbox(label="GPU Status", value=check_gpu(), interactive=False)
+
+     gr.Markdown("### Training Configuration")
+     gr.Markdown("""
+     - **Model**: Qwen/Qwen2.5-0.5B-Instruct
+     - **Batch Size**: 1 (with gradient accumulation of 16)
+     - **Learning Rate**: 2e-4
+     - **Max Steps**: 1000
+     - **LoRA Rank**: 16
+     - **Trainable Parameters**: ~8.8M (1.75% of total)
+     """)
+
+     start_btn = gr.Button("🎯 Start Training", variant="primary", size="lg")
+     output = gr.Textbox(
+         label="Training Output",
+         lines=25,
+         max_lines=50,
+         show_copy_button=True
+     )
+
+     start_btn.click(
+         fn=run_training,
+         inputs=[],
+         outputs=[output]
+     )
+
+     gr.Markdown("""
+     ### 📝 Notes
+     - Training will take several hours depending on GPU speed
+     - Checkpoints are saved every 200 steps to `./qwen-codeforces-cots/`
+     - You can download the final model after training completes
+     - The adapter is compatible with the base Qwen2.5-0.5B-Instruct architecture
+     """)
+
+ if __name__ == "__main__":
+     demo.launch()
finetune.py ADDED
@@ -0,0 +1,170 @@
+ import torch
+ from datasets import load_dataset
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     TrainingArguments,
+     Trainer,
+     DataCollatorForLanguageModeling,
+ )
+ from peft import LoraConfig, get_peft_model
+
+ def main():
+     # Configuration
+     model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # Using 0.5B, as Qwen2.5 has no 0.6B variant
+     output_dir = "./qwen-codeforces-cots"
+     max_seq_length = 2048
+
+     # Detect device - prefer CUDA for GPU training
+     if torch.cuda.is_available():
+         device = "cuda"
+         use_fp16 = True
+         print(f"Using device: CUDA ({torch.cuda.get_device_name(0)})")
+     else:
+         device = "cpu"
+         use_fp16 = False
+         print("Using device: CPU (training will be slow)")
+
+     print("Loading dataset...")
+     dataset = load_dataset("open-r1/codeforces-cots", split="train")
+
+     # Split into train and eval
+     dataset = dataset.train_test_split(test_size=0.05, seed=42)
+     train_dataset = dataset["train"]
+     eval_dataset = dataset["test"]
+
+     print(f"Train samples: {len(train_dataset)}")
+     print(f"Eval samples: {len(eval_dataset)}")
+
+     print("Loading tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(
+         model_name,
+         trust_remote_code=True,
+     )
+     tokenizer.pad_token = tokenizer.eos_token
+     tokenizer.padding_side = "right"
+
+     print("Loading model...")
+     # Use appropriate dtype and device_map based on hardware
+     if torch.cuda.is_available():
+         from transformers import BitsAndBytesConfig
+         # Use 4-bit quantization for memory-efficient GPU training
+         bnb_config = BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_quant_type="nf4",
+             bnb_4bit_compute_dtype=torch.float16,
+             bnb_4bit_use_double_quant=True,
+         )
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             quantization_config=bnb_config,
+             device_map="auto",
+             trust_remote_code=True,
+         )
+         from peft import prepare_model_for_kbit_training
+         model = prepare_model_for_kbit_training(model)
+     else:
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype=torch.float32,
+             trust_remote_code=True,
+         )
+
+     # LoRA config
+     lora_config = LoraConfig(
+         r=16,
+         lora_alpha=32,
+         target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+         lora_dropout=0.05,
+         bias="none",
+         task_type="CAUSAL_LM",
+     )
+
+     # Apply LoRA
+     model = get_peft_model(model, lora_config)
+     model.print_trainable_parameters()
+
+     # Format and tokenize dataset
+     def format_and_tokenize(example):
+         # Render the chat messages into a single training string
+         text = tokenizer.apply_chat_template(
+             example["messages"],
+             tokenize=False,
+             add_generation_prompt=False
+         )
+         # Tokenize
+         tokenized = tokenizer(
+             text,
+             truncation=True,
+             max_length=max_seq_length,
+             padding=False,
+             return_tensors=None,
+         )
+         # Add labels for causal language modeling
+         tokenized["labels"] = tokenized["input_ids"].copy()
+         return tokenized
+
+     print("Formatting and tokenizing dataset...")
+     train_dataset = train_dataset.map(
+         format_and_tokenize,
+         remove_columns=train_dataset.column_names,
+         desc="Formatting train dataset"
+     )
+     eval_dataset = eval_dataset.map(
+         format_and_tokenize,
+         remove_columns=eval_dataset.column_names,
+         desc="Formatting eval dataset"
+     )
+
+     # Data collator for padding
+     data_collator = DataCollatorForLanguageModeling(
+         tokenizer=tokenizer,
+         mlm=False,  # We're doing causal LM, not masked LM
+     )
+
+     # Training arguments - conservative settings that also work on CPU
+     training_args = TrainingArguments(
+         output_dir=output_dir,
+         per_device_train_batch_size=1,
+         per_device_eval_batch_size=1,
+         gradient_accumulation_steps=16,  # Effective batch size of 16
+         num_train_epochs=1,
+         max_steps=1000,  # Caps training at 1000 steps
+         learning_rate=2e-4,
+         fp16=use_fp16,
+         save_strategy="steps",
+         save_steps=200,
+         eval_strategy="steps",
+         eval_steps=200,
+         logging_steps=10,
+         warmup_steps=50,
+         lr_scheduler_type="cosine",
+         optim="adamw_torch",
+         report_to="none",
+         max_grad_norm=0.3,
+         save_total_limit=2,
+         load_best_model_at_end=False,  # Disabled to avoid adapter reload issues
+         dataloader_num_workers=0,  # Single-process data loading for stability
+     )
+
+     # Trainer
+     trainer = Trainer(
+         model=model,
+         args=training_args,
+         train_dataset=train_dataset,
+         eval_dataset=eval_dataset,
+         data_collator=data_collator,
+     )
+
+     print("Starting training...")
+     trainer.train()
+
+     print("Saving model...")
+     trainer.save_model(output_dir)
+     tokenizer.save_pretrained(output_dir)
+
+     print("Training complete!")
+     print(f"Model saved to: {output_dir}")
+
+ if __name__ == "__main__":
+     main()
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ torch
+ transformers
+ datasets
+ accelerate
+ peft
+ trl
+ bitsandbytes
+ gradio
test_model.py ADDED
@@ -0,0 +1,64 @@
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ def test_model():
+     base_model_name = "Qwen/Qwen2.5-0.5B-Instruct"
+     adapter_path = "./qwen-codeforces-cots"
+
+     print("Loading tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)
+
+     print("Loading base model...")
+     base_model = AutoModelForCausalLM.from_pretrained(
+         base_model_name,
+         torch_dtype=torch.float32,
+         trust_remote_code=True,
+     )
+
+     print("Loading fine-tuned adapter...")
+     model = PeftModel.from_pretrained(base_model, adapter_path)
+     model.eval()
+
+     # Test with a simple programming problem
+     test_problem = """You are given an array a of n integers. Find the maximum element in the array.
+
+ Input format:
+ The first line contains an integer n (1 ≤ n ≤ 100).
+ The second line contains n integers a₁, a₂, ..., aₙ (1 ≤ aᵢ ≤ 1000).
+
+ Output format:
+ Print the maximum element."""
+
+     messages = [
+         {"role": "user", "content": f"Please reason step by step about the solution, then provide a complete implementation.\n\n# Problem\n\n{test_problem}"}
+     ]
+
+     text = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     inputs = tokenizer(text, return_tensors="pt")
+
+     print("\nGenerating response...")
+     with torch.no_grad():
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=512,
+             temperature=0.7,
+             do_sample=True,
+             top_p=0.9,
+         )
+
+     response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+     print("\n" + "="*80)
+     print("MODEL RESPONSE:")
+     print("="*80)
+     print(response)
+     print("="*80)
+
+ if __name__ == "__main__":
+     test_model()