# HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1

**Model Name**: HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
**Model Type**: Supervised Fine-Tuned (SFT) - Merged LoRA + Base Model
**Base Model**: Qwen/Qwen2.5-Coder-7B-Instruct
**Fine-tuning**: checkpoint-1000 (1000 training steps on Java bug-fixing)
**Version**: v1.0
**Release Date**: 2026-01-02
**Status**: βœ… Ready for Production / Further Training

---

## πŸ“Š Model Performance

This model is the result of merging checkpoint-1000 (LoRA adapter) into the base Qwen2.5-Coder-7B-Instruct model.

### MultiPL-E Java Benchmark Results

| Model | Pass@1 | Passed | Total | Improvement |
|-------|--------|--------|-------|-------------|
| **Base Model (Qwen2.5-Coder-7B-Instruct)** | 67.72% | 107 | 158 | Baseline |
| **This Model (Fine-Tuned)** | **82.28%** | **130** | **158** | **+14.56 pp** βœ… |

**Key Achievements**:
- βœ… **+23 problems solved** compared to base model
- βœ… **27 problems** where SFT passes but base fails
- βœ… **103 problems** where both models pass
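
As a sanity check on these counts, the gain can be tested with an exact sign test (McNemar's test) on the discordant problems: from the numbers above, the base model alone passes 107 βˆ’ 103 = 4 problems, versus 27 that only the fine-tuned model solves. The choice of test is our assumption (the report does not state which test was run on this benchmark); a minimal stdlib-only sketch:

```python
from math import comb

sft_only, base_only = 27, 4   # discordant problems, from the counts above
n = sft_only + base_only      # 31 problems where exactly one model passes

# Exact two-sided sign test: under H0, each discordant problem is a fair coin.
k = min(sft_only, base_only)
p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n

pass_at_1 = 130 / 158         # fine-tuned Pass@1
print(f"p = {p:.2e}, Pass@1 = {pass_at_1:.2%}")
```

The resulting p-value is far below 0.05, so the MultiPL-E improvement is unlikely to be chance even at this sample size.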

**Benchmark Details**:
- **Dataset**: MultiPL-E Java (158 programming problems translated from HumanEval)
- **Evaluation Date**: 2026-01-08
- **Temperature**: 0.0 (deterministic)
- **Max Tokens**: 1024

### Internal Evaluation Results (50-sample test set)

| Metric | Base Model | This Model (Merged) | Improvement |
|--------|-----------|---------------------|-------------|
| **Overall Accuracy** | 9/50 (18%) | 14/50 (28%) | **+10 pp (+55.6% relative)** βœ… |
| **Syntax Errors** | 6/10 (60%) | 9/10 (90%) | **+30 pp** βœ… |
| **Logic Bugs** | 3/10 (30%) | 4/10 (40%) | **+10 pp** βœ… |
| **API Misuse** | 0/10 (0%) | 0/10 (0%) | No change |
| **Edge Cases** | 0/10 (0%) | 0/10 (0%) | No change |
| **OOD JavaScript** | 0/2 (0%) | 1/2 (50%) | **+50 pp** βœ… |

**Statistical Significance**: p = 0.0238 (significant at Ξ± = 0.05)

---

## 🎯 Use Cases

### 1. Further Training
Use this merged model as the base for continued fine-tuning:

```yaml
# LLaMA-Factory training config
model_name_or_path: ./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
finetuning_type: lora  # Can apply new LoRA on top
lora_target: q_proj,v_proj
```

**Benefits**:
- Start from improved baseline (28% accuracy vs 18%)
- No adapter overhead during training
- Can apply new LoRA adapters for specialized tasks

### 2. Direct Inference
Use for production inference without adapter loading:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1")

# No adapter loading needed!
```

**Benefits**:
- Faster loading (no adapter merge at runtime)
- Simpler deployment (single model, no adapter files)
- Same performance as base + adapter

### 3. Production Deployment
Deploy directly to production environments:

```bash
# Copy to deployment server
scp -r HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1 user@server:/models/

# Use in production
python inference_server.py --model /models/HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
```

---

## πŸ“ Model Files

| File | Size | Description |
|------|------|-------------|
| `model-00001-of-00004.safetensors` | ~3.5GB | Model weights (shard 1) |
| `model-00002-of-00004.safetensors` | ~3.5GB | Model weights (shard 2) |
| `model-00003-of-00004.safetensors` | ~3.5GB | Model weights (shard 3) |
| `model-00004-of-00004.safetensors` | ~3.5GB | Model weights (shard 4) |
| `config.json` | ~1KB | Model configuration |
| `tokenizer.json` | ~7MB | Tokenizer vocabulary |
| `generation_config.json` | ~1KB | Generation parameters |

**Total Size**: ~14GB

---

## πŸ”§ Training Details

### Original LoRA Training (checkpoint-1000)
- **Training Steps**: 1000
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **Target Modules**: q_proj, v_proj
- **Dropout**: 0.05
- **Training Data**: Java bug-fixing samples

### Merge Process
- **Method**: `merge_and_unload()` from PEFT library
- **Precision**: float16
- **Merge Date**: 2026-01-02
- **Verification**: Passed (model loads successfully)

---

## πŸš€ Quick Start

### Load for Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    trust_remote_code=True
)

# Generate
prompt = "Fix the bug in this Java code: int x = 10"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Load for Further Training
```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Load merged model as base
base_model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Apply new LoRA for specialized training
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # Can expand targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, lora_config)

# Continue training...
```

---

## πŸ“Š Comparison with Alternatives

| Model | Exact Match | Pros | Cons |
|-------|-------------|------|------|
| **Base Model** | 9/50 (18%) | General purpose | Lower accuracy on Java bugs |
| **Base + LoRA Adapter** | 14/50 (28%) | Modular, smaller files | Requires adapter loading |
| **This Merged Model** | 14/50 (28%) | βœ… Fast loading<br/>βœ… Simple deployment<br/>βœ… Ready for more training | Larger file size (~14GB) |

---

## ⚠️ Known Limitations

Based on evaluation, this model still struggles with:
- **API Misuse Detection** (0% accuracy)
- **Edge Case Handling** (0% accuracy)
- **Null Pointer Exception Fixes** (0% accuracy)
- **Python Bug Fixing** (0% accuracy on OOD samples)

**Recommendation**: Continue training with more diverse samples focusing on these categories.

---

## πŸ“š Related Files

- **Evaluation Report**: `../local_inference/CHECKPOINT_COMPARISON_54_vs_1000.md`
- **Original LoRA Checkpoint**: `../checkpoint-1000/`
- **Merge Script**: `../merge_lora_to_base.py`
- **Evaluation Results**: `../local_inference/evaluation_results_sequential_*.json`

---

## πŸ”„ Version History

| Version | Date | Description |
|---------|------|-------------|
| v1.0 | 2026-01-02 | Initial merge of checkpoint-1000 into base model |

---

## πŸ“ License

Inherits license from base model: Qwen/Qwen2.5-Coder-7B-Instruct

---

## πŸ™ Acknowledgments

- **Base Model**: Qwen Team (Alibaba Cloud)
- **Fine-tuning Framework**: LLaMA-Factory
- **Evaluation Framework**: Custom 50-sample test suite

---

**For questions or issues, refer to the evaluation documentation in `local_inference/`**