
# HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1

- **Model Name:** HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
- **Model Type:** Supervised Fine-Tuned (SFT), merged LoRA + base model
- **Base Model:** Qwen/Qwen2.5-Coder-7B-Instruct
- **Fine-tuning:** checkpoint-1000 (1000 training steps on Java bug-fixing)
- **Version:** v1.0
- **Release Date:** 2026-01-02
- **Status:** ✅ Ready for production / further training


## 📊 Model Performance

This model is the result of merging checkpoint-1000 (LoRA adapter) into the base Qwen2.5-Coder-7B-Instruct model.

### MultiPL-E Java Benchmark Results

| Model | Pass@1 | Passed | Total | Improvement |
|---|---|---|---|---|
| Base Model (Qwen2.5-Coder-7B-Instruct) | 67.72% | 107 | 158 | Baseline |
| This Model (Fine-Tuned) | 82.28% | 130 | 158 | +14.56 points ✅ |

**Key Achievements:**

- ✅ 23 more problems solved than the base model
- ✅ 27 problems where the SFT model passes but the base model fails
- ✅ 103 problems where both models pass

**Benchmark Details:**

- **Dataset:** MultiPL-E Java (158 programming problems translated from HumanEval)
- **Evaluation Date:** 2026-01-08
- **Temperature:** 0.0 (deterministic greedy decoding)
- **Max Tokens:** 1024
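With greedy decoding and a single sample per problem, pass@1 reduces to the plain pass rate, so the percentages in the table follow directly from the pass counts:

```python
def pass_at_1(passed: int, total: int) -> float:
    """With one greedy sample per problem, pass@1 is just the pass rate (%)."""
    return 100 * passed / total

print(f"base: {pass_at_1(107, 158):.2f}%")  # 67.72%
print(f"SFT:  {pass_at_1(130, 158):.2f}%")  # 82.28%
```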

### Internal Evaluation Results (50-sample test set)

| Metric | Base Model | This Model (Merged) | Improvement |
|---|---|---|---|
| Overall Accuracy | 9/50 (18%) | 14/50 (28%) | +55.6% relative ✅ |
| Syntax Errors | 6/10 (60%) | 9/10 (90%) | +50% ✅ |
| Logic Bugs | 3/10 (30%) | 4/10 (40%) | +33% ✅ |
| API Misuse | 0/10 (0%) | 0/10 (0%) | No change |
| Edge Cases | 0/10 (0%) | 0/10 (0%) | No change |
| OOD JavaScript | 0/2 (0%) | 1/2 (50%) | +50% ✅ |

**Statistical Significance:** p-value = 0.0238 (significant at α = 0.05)
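The reported p = 0.0238 is for the internal 50-sample evaluation (the card does not state which test was used). For the MultiPL-E results, the overlap counts given above (103 both pass, 27 SFT-only, and therefore 107 − 103 = 4 base-only) are enough for an exact McNemar test, a standard choice for paired pass/fail outcomes:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pair counts b and c."""
    n = b + c
    k = min(b, c)
    # Double the tail probability of the smaller discordant count under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# MultiPL-E discordant pairs: 4 base-only vs 27 SFT-only
p = mcnemar_exact(4, 27)
print(f"McNemar exact p = {p:.2e}")  # p ≈ 3.40e-05
```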


## 🎯 Use Cases

### 1. Further Training

Use this merged model as the base for continued fine-tuning:

```yaml
# LLaMA-Factory training config
model_name_or_path: ./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
finetuning_type: lora  # can apply a new LoRA on top
lora_target: q_proj,v_proj
```

**Benefits:**

- Start from an improved baseline (28% accuracy vs 18% on the internal test set)
- No adapter overhead during training
- Can apply new LoRA adapters for specialized tasks

### 2. Direct Inference

Use for production inference without adapter loading:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1")

# No adapter loading needed
```

**Benefits:**

- Faster loading (no adapter merge at runtime)
- Simpler deployment (single model, no adapter files)
- Same performance as base + adapter

### 3. Production Deployment

Deploy directly to production environments:

```bash
# Copy to the deployment server
scp -r HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1 user@server:/models/

# Use in production
python inference_server.py --model /models/HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
```

πŸ“ Model Files

File Size Description
model-00001-of-00004.safetensors ~3.5GB Model weights (shard 1)
model-00002-of-00004.safetensors ~3.5GB Model weights (shard 2)
model-00003-of-00004.safetensors ~3.5GB Model weights (shard 3)
model-00004-of-00004.safetensors ~3.5GB Model weights (shard 4)
config.json ~1KB Model configuration
tokenizer.json ~7MB Tokenizer vocabulary
generation_config.json ~1KB Generation parameters

Total Size: ~14GB


## 🔧 Training Details

### Original LoRA Training (checkpoint-1000)

- **Training Steps:** 1000
- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, v_proj
- **Dropout:** 0.05
- **Training Data:** Java bug-fixing samples

### Merge Process

- **Method:** `merge_and_unload()` from the PEFT library
- **Precision:** float16
- **Merge Date:** 2026-01-02
- **Verification:** Passed (model loads successfully)
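The merge step can be reproduced with PEFT's `merge_and_unload()`. A minimal sketch of what `../merge_lora_to_base.py` presumably does (imports are deferred inside the function so the snippet parses even without PEFT installed; the paths are this repo's):

```python
def merge_lora(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Merge a LoRA adapter into its base model and save standalone fp16 weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)

# merge_lora("Qwen/Qwen2.5-Coder-7B-Instruct", "../checkpoint-1000",
#            "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1")
```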

## 🚀 Quick Start

### Load for Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    trust_remote_code=True
)

# Generate
prompt = "Fix the bug in this Java code: int x = 10"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Load for Further Training

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Load merged model as base
base_model = AutoModelForCausalLM.from_pretrained(
    "./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Apply a new LoRA for specialized training
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # can expand targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, lora_config)

# Continue training...
```

## 📊 Comparison with Alternatives

| Model | Exact Match | Pros | Cons |
|---|---|---|---|
| Base Model | 9/50 (18%) | General purpose | Lower accuracy on Java bugs |
| Base + LoRA Adapter | 14/50 (28%) | Modular, smaller files | Requires adapter loading |
| This Merged Model | 14/50 (28%) | ✅ Fast loading<br>✅ Simple deployment<br>✅ Ready for more training | Larger file size (~14GB) |

## ⚠️ Known Limitations

Based on evaluation, this model still struggles with:

- **API Misuse Detection** (0% accuracy)
- **Edge Case Handling** (0% accuracy)
- **Null Pointer Exception Fixes** (0% accuracy)
- **Python Bug Fixing** (0% accuracy on OOD samples)

**Recommendation:** Continue training with more diverse samples focusing on these categories.
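In LLaMA-Factory terms, such a continuation run could look like the config below, extending the snippet shown earlier. The dataset names are hypothetical placeholders for collections targeting the weak categories, not files that ship with this repo:

```yaml
model_name_or_path: ./HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
finetuning_type: lora
lora_target: q_proj,v_proj,k_proj,o_proj
# hypothetical dataset names -- register your own in dataset_info.json
dataset: java_api_misuse,java_edge_cases,java_npe_fixes
learning_rate: 1.0e-5
num_train_epochs: 2.0
```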


## 📚 Related Files

- **Evaluation Report:** `../local_inference/CHECKPOINT_COMPARISON_54_vs_1000.md`
- **Original LoRA Checkpoint:** `../checkpoint-1000/`
- **Merge Script:** `../merge_lora_to_base.py`
- **Evaluation Results:** `../local_inference/evaluation_results_sequential_*.json`

## 🔄 Version History

| Version | Date | Description |
|---|---|---|
| v1.0 | 2026-01-02 | Initial merge of checkpoint-1000 into base model |

πŸ“ License

Inherits license from base model: Qwen/Qwen2.5-Coder-7B-Instruct


πŸ™ Acknowledgments

  • Base Model: Qwen Team (Alibaba Cloud)
  • Fine-tuning Framework: LLaMA-Factory
  • Evaluation Framework: Custom 50-sample test suite

For questions or issues, refer to the evaluation documentation in local_inference/
