kghamilton89 Claude Sonnet 4.5 committed on
Commit e261fbe · 1 Parent(s): b41a704

Add Qwen2.5-0.5B fine-tuning on Codeforces CoTs


- Fine-tuning script with QLoRA (4-bit quantization + LoRA)
- Gradio web interface for monitoring training progress
- Training on open-r1/codeforces-cots dataset (~48K examples)
- Auto-detects CUDA for GPU training with BitsAndBytes quantization
- Saves checkpoints every 200 steps
- Model testing script included

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (7)
  1. .gitignore +27 -0
  2. README.md +93 -14
  3. README_HF_SPACES.md +164 -0
  4. app.py +90 -4
  5. finetune.py +170 -0
  6. requirements.txt +8 -0
  7. test_model.py +64 -0
.gitignore ADDED
@@ -0,0 +1,27 @@
+ # Virtual environment
+ venv/
+ env/
+ *.pyc
+ __pycache__/
+
+ # Model outputs
+ qwen-codeforces-cots/
+ *.bin
+ *.safetensors
+
+ # Dataset cache
+ .cache/
+
+ # Logs
+ *.log
+ wandb/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
README.md CHANGED
@@ -1,14 +1,93 @@
- ---
- title: Claude Code Fine Tune
- emoji: 🐠
- colorFrom: blue
- colorTo: red
- sdk: gradio
- sdk_version: 6.0.2
- app_file: app.py
- pinned: false
- license: mit
- short_description: https://huggingface.co/blog/hf-skills-training
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Qwen2.5-0.5B Fine-tuning on Codeforces CoTs
+
+ Fine-tuning Qwen2.5-0.5B-Instruct on the open-r1/codeforces-cots dataset for instruction following with chain-of-thought reasoning.
+
+ ## Dataset
+
+ - **Name**: open-r1/codeforces-cots
+ - **Size**: ~48K competitive programming problems with chain-of-thought solutions
+ - **Format**: Chat format with problem descriptions and step-by-step reasoning
+
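For reference, here is a quick way to look at one training example before fine-tuning. This is a hedged sketch: the `messages` field and its role/content layout are assumed from how `finetune.py` (included in this commit) consumes the dataset.

```python
from datasets import load_dataset

# Load the training split and inspect the chat-formatted messages of one example.
dataset = load_dataset("open-r1/codeforces-cots", split="train")
example = dataset[0]
for message in example["messages"]:
    # Each entry is assumed to carry a "role" and a "content" string.
    print(message["role"], "->", message["content"][:200])
```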
+ ## Model
+
+ - **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
+ - **Training Method**: QLoRA (4-bit quantization + LoRA)
+ - **Target Modules**: All attention and MLP layers
+
+ ## Setup
+
+ 1. Create and activate a virtual environment:
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Training
+
+ ### Option 1: Local Training (CPU/GPU)
+
+ Run the fine-tuning script locally:
+ ```bash
+ python finetune.py
+ ```
+
+ **Note**: Local CPU training will be very slow. GPU training requires CUDA-compatible hardware.
+
+ ### Option 2: Hugging Face Spaces with GPU (Recommended)
+
+ If you have a Hugging Face Pro subscription, you can train on a GPU using Hugging Face Spaces:
+
+ 1. See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions
+ 2. Upload this project to a new HF Space with GPU hardware
+ 3. Use the included Gradio interface (`app.py`) to monitor training in real time
+ 4. Training time on a T4 GPU: ~2-3 hours for 1000 steps
+
+ This is the **recommended approach** because it provides:
+ - Access to GPU hardware (T4, A10G, or A100)
+ - Real-time training monitoring via the web interface
+ - Automatic checkpoint saving
+ - Easy model download after training
+
+ ### Training Configuration
+
+ - **Batch Size**: 1 per device (with gradient accumulation of 16)
+ - **Effective Batch Size**: 16
+ - **Learning Rate**: 2e-4
+ - **Epochs**: 1 (capped at 1000 steps by `max_steps`)
+ - **Max Sequence Length**: 2048
+ - **LoRA r**: 16
+ - **LoRA alpha**: 32
+
+ ## Output
+
+ The fine-tuned LoRA adapter is saved to `./qwen-codeforces-cots/`.
+
+ ## Usage
+
+ After training, you can use the model with:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")
+ tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")
+
+ messages = [{"role": "user", "content": "Your problem here"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Notes
+
+ - Training uses 4-bit quantization to reduce memory requirements
+ - LoRA enables efficient fine-tuning with a small fraction of trainable parameters
+ - Training time varies with your hardware
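Since training produces a LoRA adapter rather than full model weights, a standalone checkpoint can be handy for deployment. A minimal sketch, assuming the adapter saved by `finetune.py` below; the merged output directory name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and apply the trained adapter.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "./qwen-codeforces-cots")

# Fold the LoRA weights into the base model so peft is no longer needed at inference.
merged = model.merge_and_unload()
merged.save_pretrained("./qwen-codeforces-cots-merged")  # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained("./qwen-codeforces-cots")
tokenizer.save_pretrained("./qwen-codeforces-cots-merged")
```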
README_HF_SPACES.md ADDED
@@ -0,0 +1,164 @@
+ # Deploying to Hugging Face Spaces with GPU
+
+ This guide shows how to deploy the fine-tuning project to Hugging Face Spaces to leverage GPU training.
+
+ ## Prerequisites
+
+ - Hugging Face account with a Pro subscription (for GPU access)
+ - Hugging Face CLI installed and authenticated
+
+ ## Setup Steps
+
+ ### 1. Authenticate with Hugging Face
+
+ ```bash
+ huggingface-cli login
+ ```
+
+ Enter your HF token when prompted.
+
+ ### 2. Create a New Space
+
+ Go to https://huggingface.co/spaces and click "Create new Space":
+
+ - **Owner**: Your username/organization
+ - **Space name**: `qwen-codeforces-finetune` (or your preferred name)
+ - **License**: Apache 2.0 (or your choice)
+ - **Space SDK**: Gradio
+ - **Space hardware**: GPU - T4 small (or higher for faster training)
+   - **Important**: You need HF Pro to access GPU hardware
+   - T4 small is sufficient for this 0.5B model
+   - For faster training, consider A10G or A100
+
+ ### 3. Clone Your New Space
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/qwen-codeforces-finetune
+ cd qwen-codeforces-finetune
+ ```
+
+ ### 4. Copy Project Files
+
+ From your local project directory, copy these files into the Space checkout:
+
+ ```bash
+ cp app.py requirements.txt finetune.py test_model.py README.md .gitignore ./qwen-codeforces-finetune/
+ ```
+
+ ### 5. Push to the Space
+
+ ```bash
+ git add .
+ git commit -m "Initial commit: Qwen fine-tuning on Codeforces CoTs"
+ git push
+ ```
+
+ ### 6. Configure Space Hardware
+
+ After pushing, go to your Space settings:
+ - Navigate to the "Settings" tab
+ - Under "Space hardware", select a GPU option:
+   - **T4 small**: Good for testing (16 GB VRAM)
+   - **A10G small**: Faster training (24 GB VRAM)
+   - **A100**: Fastest but more expensive (40 GB VRAM)
+
+ ### 7. Monitor Training
+
+ Once the Space builds and runs:
+ 1. Click the "Start Training" button
+ 2. Watch the real-time output in the interface
+ 3. Training saves checkpoints every 200 steps
+ 4. The final model is saved to `./qwen-codeforces-cots/`
+
+ ## Training Time Estimates
+
+ With 1000 steps and batch size 1 (gradient accumulation 16):
+
+ - **T4 small**: ~2-3 hours
+ - **A10G small**: ~1-2 hours
+ - **A100**: ~30-60 minutes
+
+ ## Downloading the Trained Model
+
+ After training completes on Spaces:
+
+ ### Option 1: Via the Files Tab
+ 1. Go to your Space's "Files" tab
+ 2. Navigate to `qwen-codeforces-cots/`
+ 3. Download the adapter files:
+    - `adapter_config.json`
+    - `adapter_model.safetensors` (or `.bin`)
+    - `tokenizer_config.json`
+    - `special_tokens_map.json`
+    - Other tokenizer files
+
+ ### Option 2: Via Git
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/qwen-codeforces-finetune
+ cd qwen-codeforces-finetune
+ # The trained adapter will be in the qwen-codeforces-cots/ directory
+ ```
+
+ ### Option 3: Upload to the Model Hub
+ After training, you can upload the adapter to the Hugging Face Model Hub:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()
+ api.upload_folder(
+     folder_path="./qwen-codeforces-cots",
+     repo_id="YOUR_USERNAME/qwen-codeforces-cots-lora",
+     repo_type="model",
+ )
+ ```
+
+ Then load it anywhere:
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
+ model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/qwen-codeforces-cots-lora")
+ ```
+
+ ## Cost Considerations
+
+ With HF Pro ($9/month):
+ - You get 5 free GPU hours per month
+ - Additional GPU time is charged by hardware tier:
+   - T4 small: ~$0.60/hour
+   - A10G small: ~$3.15/hour
+
+ For 1000 steps (~2-3 hours on a T4), training costs:
+ - Within the free tier: $0
+ - If exceeding free hours: ~$1.20-1.80
+
+ ## Troubleshooting
+
+ ### Space Crashes or OOM
+ - Reduce `per_device_train_batch_size` in finetune.py
+ - Reduce `max_seq_length` to 1024 or 512 (see the sketch below)
+ - Ensure you selected a GPU hardware option
+
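A sketch of the memory-reducing changes to `finetune.py`, with illustrative values; `gradient_checkpointing` is an addition not present in the original script:

```python
from transformers import TrainingArguments

max_seq_length = 1024  # down from 2048; shorter sequences roughly halve activation memory

training_args = TrainingArguments(
    output_dir="./qwen-codeforces-cots",
    per_device_train_batch_size=1,   # already the minimum used by finetune.py
    gradient_accumulation_steps=16,  # keeps the effective batch size at 16
    gradient_checkpointing=True,     # assumption: trades recompute for lower memory
    fp16=True,
    max_steps=1000,
    report_to="none",
)
```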
+ ### Training Not Starting
+ - Check the Space logs in the "Logs" tab
+ - Verify all dependencies are in requirements.txt
+ - Make sure GPU hardware is selected (not CPU)
+
+ ### Slow Training
+ - Upgrade to A10G or A100 hardware
+ - Increase the batch size if you have VRAM headroom
+ - Check that 4-bit quantization is in use (it is enabled automatically with CUDA)
+
+ ## Alternative: Hugging Face AutoTrain
+
+ For a no-code option, consider using Hugging Face AutoTrain:
+ ```bash
+ pip install autotrain-advanced
+ autotrain llm --train --model Qwen/Qwen2.5-0.5B-Instruct \
+   --data-path . --lr 2e-4 --batch-size 1 \
+   --epochs 1 --trainer sft
+ ```
+
+ See: https://huggingface.co/docs/autotrain/
app.py CHANGED
@@ -1,7 +1,93 @@
  import gradio as gr
+ import subprocess
+ import os
+ from pathlib import Path

- def greet(name):
-     return "Hello " + name + "!!"
-
- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
+ def run_training():
+     """Run the fine-tuning process and stream output."""
+     output_text = "Starting training...\n\n"
+     yield output_text
+
+     # Run the training script as a subprocess
+     process = subprocess.Popen(
+         ["python", "finetune.py"],
+         stdout=subprocess.PIPE,
+         stderr=subprocess.STDOUT,
+         text=True,
+         bufsize=1
+     )
+
+     # Stream output line by line as it arrives
+     for line in process.stdout:
+         output_text += line
+         yield output_text
+
+     process.wait()
+
+     if process.returncode == 0:
+         output_text += "\n\n✅ Training completed successfully!"
+         output_text += f"\n\nModel saved to: {os.path.abspath('./qwen-codeforces-cots')}"
+     else:
+         output_text += f"\n\n❌ Training failed with exit code {process.returncode}"
+
+     yield output_text
+
+ def check_gpu():
+     """Check GPU availability."""
+     import torch
+     if torch.cuda.is_available():
+         gpu_name = torch.cuda.get_device_name(0)
+         gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
+         return f"✅ GPU Available: {gpu_name} ({gpu_memory:.1f} GB)"
+     else:
+         return "❌ No GPU available - training will be slow!"
+
+ # Create the Gradio interface
+ with gr.Blocks(title="Qwen2.5 Fine-tuning on Codeforces") as demo:
+     gr.Markdown("""
+     # 🚀 Qwen2.5-0.5B Fine-tuning on Codeforces CoTs
+
+     Fine-tuning Qwen2.5-0.5B-Instruct on competitive programming problems with chain-of-thought reasoning.
+
+     **Dataset**: open-r1/codeforces-cots (~48K examples)
+     **Method**: QLoRA (LoRA + 4-bit quantization)
+     **Training**: 1000 steps with checkpoints every 200 steps
+     """)
+
+     gpu_status = gr.Textbox(label="GPU Status", value=check_gpu(), interactive=False)
+
+     gr.Markdown("### Training Configuration")
+     gr.Markdown("""
+     - **Model**: Qwen/Qwen2.5-0.5B-Instruct
+     - **Batch Size**: 1 (with gradient accumulation of 16)
+     - **Learning Rate**: 2e-4
+     - **Max Steps**: 1000
+     - **LoRA Rank**: 16
+     - **Trainable Parameters**: ~8.8M (1.75% of total)
+     """)
+
+     start_btn = gr.Button("🎯 Start Training", variant="primary", size="lg")
+     output = gr.Textbox(
+         label="Training Output",
+         lines=25,
+         max_lines=50,
+         show_copy_button=True
+     )
+
+     start_btn.click(
+         fn=run_training,
+         inputs=[],
+         outputs=[output]
+     )
+
+     gr.Markdown("""
+     ### 📝 Notes
+     - Training will take several hours depending on GPU speed
+     - Checkpoints are saved every 200 steps to `./qwen-codeforces-cots/`
+     - You can download the final model after training completes
+     - The adapter is compatible with the base Qwen2.5-0.5B-Instruct architecture
+     """)
+
+ if __name__ == "__main__":
+     demo.launch()
finetune.py ADDED
@@ -0,0 +1,170 @@
+ import torch
+ from datasets import load_dataset
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     TrainingArguments,
+     Trainer,
+     DataCollatorForLanguageModeling,
+ )
+ from peft import LoraConfig, get_peft_model
+
+ def main():
+     # Configuration
+     model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # Using 0.5B, as Qwen2.5 has no 0.6B variant
+     output_dir = "./qwen-codeforces-cots"
+     max_seq_length = 2048
+
+     # Detect device - prefer CUDA for GPU training
+     if torch.cuda.is_available():
+         device = "cuda"
+         use_fp16 = True
+         print(f"Using device: CUDA ({torch.cuda.get_device_name(0)})")
+     else:
+         device = "cpu"
+         use_fp16 = False
+         print("Using device: CPU (training will be slow)")
+
+     print("Loading dataset...")
+     dataset = load_dataset("open-r1/codeforces-cots", split="train")
+
+     # Split into train and eval
+     dataset = dataset.train_test_split(test_size=0.05, seed=42)
+     train_dataset = dataset["train"]
+     eval_dataset = dataset["test"]
+
+     print(f"Train samples: {len(train_dataset)}")
+     print(f"Eval samples: {len(eval_dataset)}")
+
+     print("Loading tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(
+         model_name,
+         trust_remote_code=True,
+     )
+     tokenizer.pad_token = tokenizer.eos_token
+     tokenizer.padding_side = "right"
+
+     print("Loading model...")
+     # Use appropriate dtype and device_map based on hardware
+     if torch.cuda.is_available():
+         from transformers import BitsAndBytesConfig
+         # Use 4-bit quantization for memory-efficient GPU training
+         bnb_config = BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_quant_type="nf4",
+             bnb_4bit_compute_dtype=torch.float16,
+             bnb_4bit_use_double_quant=True,
+         )
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             quantization_config=bnb_config,
+             device_map="auto",
+             trust_remote_code=True,
+         )
+         from peft import prepare_model_for_kbit_training
+         model = prepare_model_for_kbit_training(model)
+     else:
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype=torch.float32,
+             trust_remote_code=True,
+         )
+
+     # LoRA config
+     lora_config = LoraConfig(
+         r=16,
+         lora_alpha=32,
+         target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+         lora_dropout=0.05,
+         bias="none",
+         task_type="CAUSAL_LM",
+     )
+
+     # Apply LoRA
+     model = get_peft_model(model, lora_config)
+     model.print_trainable_parameters()
+
+     # Format and tokenize dataset
+     def format_and_tokenize(example):
+         # Render the chat messages into a single training string
+         text = tokenizer.apply_chat_template(
+             example["messages"],
+             tokenize=False,
+             add_generation_prompt=False
+         )
+         # Tokenize
+         tokenized = tokenizer(
+             text,
+             truncation=True,
+             max_length=max_seq_length,
+             padding=False,
+             return_tensors=None,
+         )
+         # Add labels for causal language modeling
+         tokenized["labels"] = tokenized["input_ids"].copy()
+         return tokenized
+
+     print("Formatting and tokenizing dataset...")
+     train_dataset = train_dataset.map(
+         format_and_tokenize,
+         remove_columns=train_dataset.column_names,
+         desc="Formatting train dataset"
+     )
+     eval_dataset = eval_dataset.map(
+         format_and_tokenize,
+         remove_columns=eval_dataset.column_names,
+         desc="Formatting eval dataset"
+     )
+
+     # Data collator for padding
+     data_collator = DataCollatorForLanguageModeling(
+         tokenizer=tokenizer,
+         mlm=False,  # We're doing causal LM, not masked LM
+     )
+
+     # Training arguments - conservative settings that also work on CPU
+     training_args = TrainingArguments(
+         output_dir=output_dir,
+         per_device_train_batch_size=1,
+         per_device_eval_batch_size=1,
+         gradient_accumulation_steps=16,  # Effective batch size of 16
+         num_train_epochs=1,
+         max_steps=1000,  # Caps training at 1000 steps
+         learning_rate=2e-4,
+         fp16=use_fp16,
+         save_strategy="steps",
+         save_steps=200,
+         eval_strategy="steps",
+         eval_steps=200,
+         logging_steps=10,
+         warmup_steps=50,
+         lr_scheduler_type="cosine",
+         optim="adamw_torch",
+         report_to="none",
+         max_grad_norm=0.3,
+         save_total_limit=2,
+         load_best_model_at_end=False,  # Disabled to avoid adapter reload issues
+         dataloader_num_workers=0,  # Single-process data loading for stability
+     )
+
+     # Trainer
+     trainer = Trainer(
+         model=model,
+         args=training_args,
+         train_dataset=train_dataset,
+         eval_dataset=eval_dataset,
+         data_collator=data_collator,
+     )
+
+     print("Starting training...")
+     trainer.train()
+
+     print("Saving model...")
+     trainer.save_model(output_dir)
+     tokenizer.save_pretrained(output_dir)
+
+     print("Training complete!")
+     print(f"Model saved to: {output_dir}")
+
+ if __name__ == "__main__":
+     main()
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ torch
+ transformers
+ datasets
+ accelerate
+ peft
+ trl
+ bitsandbytes
+ gradio
test_model.py ADDED
@@ -0,0 +1,64 @@
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ def test_model():
+     base_model_name = "Qwen/Qwen2.5-0.5B-Instruct"
+     adapter_path = "./qwen-codeforces-cots"
+
+     print("Loading tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)
+
+     print("Loading base model...")
+     base_model = AutoModelForCausalLM.from_pretrained(
+         base_model_name,
+         torch_dtype=torch.float32,
+         trust_remote_code=True,
+     )
+
+     print("Loading fine-tuned adapter...")
+     model = PeftModel.from_pretrained(base_model, adapter_path)
+     model.eval()
+
+     # Test with a simple programming problem
+     test_problem = """You are given an array a of n integers. Find the maximum element in the array.
+
+ Input format:
+ The first line contains an integer n (1 ≤ n ≤ 100).
+ The second line contains n integers a₁, a₂, ..., aₙ (1 ≤ aᵢ ≤ 1000).
+
+ Output format:
+ Print the maximum element."""
+
+     messages = [
+         {"role": "user", "content": f"Please reason step by step about the solution, then provide a complete implementation.\n\n# Problem\n\n{test_problem}"}
+     ]
+
+     text = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     inputs = tokenizer(text, return_tensors="pt")
+
+     print("\nGenerating response...")
+     with torch.no_grad():
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=512,
+             temperature=0.7,
+             do_sample=True,
+             top_p=0.9,
+         )
+
+     response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+     print("\n" + "="*80)
+     print("MODEL RESPONSE:")
+     print("="*80)
+     print(response)
+     print("="*80)
+
+ if __name__ == "__main__":
+     test_model()