# HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
- Model Name: HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
- Model Type: Supervised Fine-Tuned (SFT), merged LoRA + base model
- Base Model: Qwen/Qwen2.5-Coder-7B-Instruct
- Fine-tuning: checkpoint-1000 (1,000 training steps on Java bug-fixing)
- Version: v1.0
- Release Date: 2026-01-02
- Status: Ready for Production / Further Training
## Model Performance
This model is the result of merging checkpoint-1000 (LoRA adapter) into the base Qwen2.5-Coder-7B-Instruct model.
### MultiPL-E Java Benchmark Results
| Model | Pass@1 | Passed | Total | Improvement |
|---|---|---|---|---|
| Base Model (Qwen2.5-Coder-7B-Instruct) | 67.72% | 107 | 158 | Baseline |
| This Model (Fine-Tuned) | 82.28% | 130 | 158 | +14.56 pp |
**Key Achievements:**
- +23 problems solved compared to the base model (130 vs. 107)
- 27 problems where the fine-tuned model passes but the base model fails
- 103 problems where both models pass
**Benchmark Details:**
- Dataset: MultiPL-E Java (158 programming problems translated from HumanEval)
- Evaluation Date: 2026-01-08
- Temperature: 0.0 (deterministic)
- Max Tokens: 1024
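At temperature 0.0 the benchmark generates a single greedy completion per problem, so pass@1 reduces to the fraction of problems whose completion passes all tests. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the evaluation harness):

```python
def pass_at_1(passed: int, total: int) -> float:
    """Greedy (temperature 0.0) pass@1: fraction of problems solved in one attempt."""
    return passed / total

# Figures from the table above
base = pass_at_1(107, 158)  # base model
sft = pass_at_1(130, 158)   # fine-tuned model
print(f"{base:.2%} -> {sft:.2%} (+{(sft - base) * 100:.2f} pp)")
# 67.72% -> 82.28% (+14.56 pp)
```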
### Internal Evaluation Results (50-sample test set)
| Metric | Base Model | This Model (Merged) | Improvement |
|---|---|---|---|
| Overall Accuracy | 9/50 (18%) | 14/50 (28%) | +55.6% |
| Syntax Errors | 6/10 (60%) | 9/10 (90%) | +50% |
| Logic Bugs | 3/10 (30%) | 4/10 (40%) | +33% |
| API Misuse | 0/10 (0%) | 0/10 (0%) | No change |
| Edge Cases | 0/10 (0%) | 0/10 (0%) | No change |
| OOD JavaScript | 0/2 (0%) | 1/2 (50%) | +50 pp |
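Note that the Improvement column reports gains relative to the base model's count (e.g. 9/50 → 14/50 is (14 − 9)/9 ≈ +55.6%), unlike the MultiPL-E table above, which reports the absolute change in pass@1. A quick check, with an illustrative helper:

```python
def rel_improvement(base: int, tuned: int) -> float:
    """Relative gain over the base model's count, in percent."""
    return (tuned - base) / base * 100

print(f"{rel_improvement(9, 14):.1f}%")  # overall accuracy row -> 55.6%
print(f"{rel_improvement(6, 9):.1f}%")   # syntax errors row    -> 50.0%
```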
**Statistical Significance:** p = 0.0238 (significant at α = 0.05)
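The MultiPL-E per-problem outcomes above also admit a significance check: 27 problems pass only under the fine-tuned model, 103 pass under both, so 107 − 103 = 4 pass only under the base model. An exact McNemar test on those discordant pairs can be sketched with the stdlib (the reported p = 0.0238 comes from the separate 50-sample evaluation, whose test procedure is not shown here):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pairs.
    b = cases only model B solves, c = cases only model C solves."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 27 SFT-only passes vs. 4 base-only passes (derived from the benchmark table)
p = mcnemar_exact(27, 4)
print(f"p = {p:.2e}")  # well below 0.05
```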
## Use Cases

### 1. Further Training

Use this merged model as the base for continued fine-tuning:
```yaml
# LLaMA-Factory training config
model_name_or_path: ./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
finetuning_type: lora  # can apply a new LoRA on top
lora_target: q_proj,v_proj
```
Benefits:
- Start from improved baseline (28% accuracy vs 18%)
- No adapter overhead during training
- Can apply new LoRA adapters for specialized tasks
### 2. Direct Inference

Use for production inference without adapter loading:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1")
# No adapter loading needed -- the LoRA weights are already merged
```
Benefits:
- Faster loading (no adapter merge at runtime)
- Simpler deployment (single model, no adapter files)
- Same performance as base + adapter
### 3. Production Deployment

Deploy directly to production environments:
```bash
# Copy to the deployment server
scp -r HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1 user@server:/models/

# Use in production
python inference_server.py --model /models/HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
```
## Model Files
| File | Size | Description |
|---|---|---|
| `model-00001-of-00004.safetensors` | ~3.5 GB | Model weights (shard 1) |
| `model-00002-of-00004.safetensors` | ~3.5 GB | Model weights (shard 2) |
| `model-00003-of-00004.safetensors` | ~3.5 GB | Model weights (shard 3) |
| `model-00004-of-00004.safetensors` | ~3.5 GB | Model weights (shard 4) |
| `config.json` | ~1 KB | Model configuration |
| `tokenizer.json` | ~7 MB | Tokenizer vocabulary |
| `generation_config.json` | ~1 KB | Generation parameters |

**Total Size:** ~14 GB
## Training Details

### Original LoRA Training (checkpoint-1000)
- Training Steps: 1000
- LoRA Rank (r): 16
- LoRA Alpha: 32
- Target Modules: q_proj, v_proj
- Dropout: 0.05
- Training Data: Java bug-fixing samples
### Merge Process
- Method: `merge_and_unload()` from the PEFT library
- Precision: float16
- Merge Date: 2026-01-02
- Verification: Passed (model loads successfully)
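Conceptually, `merge_and_unload()` folds each adapter into its frozen target weight as W ← W + (α/r)·B·A, after which the adapter modules are discarded. A toy numeric sketch of that update (pure Python with r = 1 for readability; the real merge uses r = 16, α = 32, which yields the same scaling factor α/r = 2):

```python
def merge_lora(W, B, A, alpha, r):
    """Return W + (alpha / r) * B @ A for list-of-lists matrices."""
    scale = alpha / r
    rows, cols, inner = len(B), len(A[0]), len(A)
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d x d)
B = [[1.0], [0.0]]            # LoRA up-projection (d x r)
A = [[0.5, 0.5]]              # LoRA down-projection (r x d)
print(merge_lora(W, B, A, alpha=2, r=1))  # [[2.0, 1.0], [0.0, 1.0]]
```

After merging, inference touches only the combined weight, which is why the merged checkpoint needs no adapter files at load time.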
## Quick Start

### Load for Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    trust_remote_code=True,
)

# Generate
prompt = "Fix the bug in this Java code: int x = 10"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Load for Further Training
```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the merged model as the new base
base_model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply a fresh LoRA for specialized training
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # can expand targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
# Continue training...
```
## Comparison with Alternatives
| Model | Exact Match | Pros | Cons |
|---|---|---|---|
| Base Model | 9/50 (18%) | General purpose | Lower accuracy on Java bugs |
| Base + LoRA Adapter | 14/50 (28%) | Modular, smaller files | Requires adapter loading |
| This Merged Model | 14/50 (28%) | Fast loading, simple deployment, ready for further training | Larger file size (~14 GB) |
## Known Limitations
Based on evaluation, this model still struggles with:
- API Misuse Detection (0% accuracy)
- Edge Case Handling (0% accuracy)
- Null Pointer Exception Fixes (0% accuracy)
- Python Bug Fixing (0% accuracy on OOD samples)
Recommendation: Continue training with more diverse samples focusing on these categories.
## Related Files

- Evaluation Report: `../local_inference/CHECKPOINT_COMPARISON_54_vs_1000.md`
- Original LoRA Checkpoint: `../checkpoint-1000/`
- Merge Script: `../merge_lora_to_base.py`
- Evaluation Results: `../local_inference/evaluation_results_sequential_*.json`
## Version History
| Version | Date | Description |
|---|---|---|
| v1.0 | 2026-01-02 | Initial merge of checkpoint-1000 into base model |
## License
Inherits license from base model: Qwen/Qwen2.5-Coder-7B-Instruct
## Acknowledgments
- Base Model: Qwen Team (Alibaba Cloud)
- Fine-tuning Framework: LLaMA-Factory
- Evaluation Framework: Custom 50-sample test suite
For questions or issues, refer to the evaluation documentation in `local_inference/`.