---
language:
  - en
  - code
license: apache-2.0
tags:
  - code
  - java
  - bug-fixing
  - code-repair
  - qwen2.5
  - supervised-fine-tuning
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
model_type: qwen2
pipeline_tag: text-generation
---

HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1

Model Description

HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1 is a specialized code-repair model fine-tuned on Java bug-fixing tasks. It is based on Qwen2.5-Coder-7B-Instruct and was trained with supervised fine-tuning (SFT) using LoRA adapters, which were then merged into the base model so it loads as a standalone checkpoint.

Model Details

  • Model Name: HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
  • Version: v1.0
  • Base Model: Qwen/Qwen2.5-Coder-7B-Instruct
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Architecture: Qwen2ForCausalLM
  • Parameters: 7.61B
  • Precision: FP16
  • Context Length: 32,768 tokens
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) merged into base
  • Training Steps: 1,000
  • Release Date: 2026-01-02

Intended Use

This model is designed for:

  • Java bug detection and repair
  • Syntax error correction
  • Logic bug fixing
  • Code quality improvement
  • Automated code review assistance
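A consistent prompt format helps across these tasks. A minimal prompt builder in the style of the Quick Start example below; note that the exact prompt wording used during training is not published, so this phrasing is an assumption:

```python
def build_repair_prompt(code: str, language: str = "Java") -> str:
    """Format a bug-fixing request in the style of the Quick Start
    example. The training prompt template is not published; this
    wording is an assumption."""
    return f"Fix the bug in the following {language} code:\n\n{code}"
```

The same builder can target the other listed tasks by varying the instruction sentence.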

Performance

Evaluated on a 50-sample diverse test set:

| Metric           | Base Model | HaiJava-Surgeon v1 | Relative Improvement |
|------------------|------------|--------------------|----------------------|
| Overall Accuracy | 18%        | 28%                | +55.6%               |
| Syntax Errors    | 60%        | 90%                | +50%                 |
| Logic Bugs       | 30%        | 40%                | +33%                 |

Statistical Significance: p = 0.0238 (paired t-test; significant at α = 0.05)
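The rightmost column above reports the gain relative to the base model, not the absolute percentage-point difference. For overall accuracy:

```python
base_acc, tuned_acc = 0.18, 0.28

# Relative improvement: (new - old) / old
relative_gain = (tuned_acc - base_acc) / base_acc
print(f"{relative_gain:.1%}")  # 55.6%
```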

Strengths

  • Excellent at syntax error detection and repair (90% accuracy)
  • Good at logic bug fixing (40% accuracy)
  • Shows some generalization to JavaScript (50% accuracy, though on only 2 OOD samples)

Limitations

  • ⚠️ Struggles with API misuse detection (0% accuracy)
  • ⚠️ Limited edge case handling (0% accuracy)
  • ⚠️ Needs improvement on null pointer exception fixes (0% accuracy)
  • ⚠️ Limited Python support (0% accuracy on 3 OOD samples)

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1",
    trust_remote_code=True
)

# Prepare input
buggy_code = """
public class Example {
    public static void main(String[] args) {
        int x = 10
        System.out.println(x);
    }
}
"""

prompt = f"Fix the bug in the following Java code:\n\n{buggy_code}"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate fix
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)

# Decode response
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
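The model typically replies with explanatory prose plus a markdown-fenced code block. A small helper to pull out just the repaired code, assuming the reply uses a fenced `java` block (adjust the pattern if your outputs differ):

```python
import re

FENCE = "`" * 3  # a literal triple backtick

def extract_java(response: str) -> str:
    """Return the contents of the first fenced code block in a reply,
    or the raw text when no fence is present."""
    pattern = FENCE + r"(?:java)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```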

Recommended Generation Parameters

For Maximum Accuracy:

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,      # Greedy decoding
    num_beams=5,          # Beam search
)

For Speed:

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

Training Details

Training Data

  • Domain: Java bug-fixing
  • Categories: Syntax errors, logic bugs, API misuse, edge cases, null handling
  • Training Steps: 1,000

Training Configuration

  • Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, v_proj
  • Dropout: 0.05
  • Optimizer: AdamW
  • Learning Rate: 5e-5

Hardware

  • GPU: NVIDIA GPU with CUDA support
  • Training Time: ~2-3 hours
  • Framework: LLaMA-Factory + PyTorch

Evaluation

Evaluated on 50 diverse samples covering:

  • Syntax errors (10 samples)
  • Logic bugs (10 samples)
  • API misuse (10 samples)
  • Edge cases (10 samples)
  • Null handling (5 samples)
  • Out-of-distribution: JavaScript (2 samples)
  • Out-of-distribution: Python (3 samples)

Evaluation Metrics:

  • Exact match accuracy
  • Normalized edit distance
  • Statistical significance testing (paired t-test)
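The exact evaluation script is not published; one plausible implementation of the normalized edit distance metric (Levenshtein distance scaled by the longer string's length, so 0.0 means identical and 1.0 means maximally different):

```python
def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance between a and b, divided by the length of
    the longer string. Uses the standard two-row dynamic program."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1] / max(len(a), len(b))
```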

License

This model inherits the Apache 2.0 license from the base model: Qwen/Qwen2.5-Coder-7B-Instruct

Citation

If you use this model, please cite:

@misc{haijava-surgeon-v1,
  title={HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1: A Specialized Java Bug-Fixing Model},
  author={Your Name/Organization},
  year={2026},
  url={https://huggingface.co/your-username/HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1}
}

Acknowledgments

  • Base Model: Qwen Team (Alibaba Cloud)
  • Fine-tuning Framework: LLaMA-Factory
  • Evaluation: Custom 50-sample test suite