---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
- fine-tune
- qwen
- coding-assistant
- gguf
language:
- en
pipeline_tag: text-generation
datasets:
- AronDaron/dataset-gen-v2
---

# Qwen2.5-Coder-7B-Instruct — Dataset Generator V2 Fine-tune

Fine-tuned version of [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) 
trained on [Dataset Generator V2](https://huggingface.co/datasets/AronDaron/dataset-gen-v2) 
— synthetic coding dataset generated with [Dataset Generator](https://github.com/AronDaron/dataset-generator).

## Benchmark Results

| Model | HumanEval | HumanEval+ |
|---|---|---|
| Base Qwen2.5-Coder-7B-Instruct | 55.5% (±2.1) | 49.0% (±1.9) |
| **This model (FT V2)** | **60.0% (±0.9)** | **54.0% (±1.8)** |

**+4.5pp on HumanEval, +5.0pp on HumanEval+** vs base — error bars don't 
overlap, statistically significant improvement (5 runs averaged).

<img src="./benchmark-v2.png" alt="Benchmark" width="600">

## Training

- **Method:** QLoRA fine-tuning via Unsloth
- **Base model:** Qwen2.5-Coder-7B-Instruct
- **Dataset:** Dataset Generator V2 (1,135 multi-turn examples)
- **Hardware:** RTX 4070 Ti 12GB
- **Quantization:** Q4_K_M GGUF (quantized by Unsloth)
- **Chat template:** ChatML (embedded in GGUF)
- **Context length:** 32,768 tokens
- **Evaluation:** 5 runs on HumanEval/HumanEval+ at temp 0.2

Training logs and exact hyperparameters were not preserved — this was 
an exploratory fine-tune.

## Training Data

Trained on [Dataset Generator V2](https://huggingface.co/datasets/AronDaron/dataset-gen-v2) 
— 1,135 multi-turn conversations across 8 coding categories:

- Code Generation & Debugging
- API, DevOps & Infrastructure
- Architecture, Testing & Refactoring
- Terminal, CLI & Tooling
- Algorithms & Data Manipulation
- Data Processing & Transformation
- Code Reasoning & Review
- Practical Multi-step Problem Solving

See the [dataset card](https://huggingface.co/datasets/AronDaron/dataset-gen-v2) 
for full details including generation models and methodology.

## Limitations

- **Optimized for algorithmic coding and reasoning** — shows measurable 
  improvement on HumanEval/HumanEval+
- **Not optimized for library-heavy workflows** (pandas, numpy, requests) — 
  for those use cases, train on a dataset with library-focused categories 
  using [Dataset Generator](https://github.com/AronDaron/dataset-generator)
- **Multi-turn conversational style** — produces explanations alongside code

## Support

If this helped you:
- Ko-fi: https://ko-fi.com/arondaron
- ETH: 0xA6910bDa2a89ee38cA42883e365BB2DdFba3C2A1
- BTC: bc1qamarkursch3x8399qaly4md32ck5xgthnr9jpl
- SOL: 797jTzFRm9dd4joHPqvUjryeXi5rPbMwG6Rqj3wJrgMt

## License

Apache-2.0 — inherited from base model Qwen2.5-Coder-7B-Instruct.

Built with [Dataset Generator](https://github.com/AronDaron/dataset-generator).