CVE Backport Code Generation – Qwen2.5-Coder-32B

Fine-tuned Qwen2.5-Coder-32B-Instruct for security patch backporting via per-hunk code generation.

Instead of generating unified diffs directly, this model takes a vulnerable code region plus a description of the fix and outputs the fixed version of that region. A programmatic diff then produces the final patch. This plays to LLM strengths in code completion and sidesteps the brittleness of asking a model to emit well-formed unified-diff syntax.
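The "region in, fixed region out" flow means the final patch is produced mechanically. A minimal sketch of that last step using Python's `difflib` (the helper name and the C snippet are illustrative, not from the released tool, and a real implementation would also offset the hunk headers to the region's position in the file):

```python
import difflib

def region_to_patch(original: str, fixed: str, path: str) -> str:
    """Diff the model's fixed region against the original to get a unified hunk."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)

# Illustrative C snippet: the model added a bounds check and changed nothing else.
patch = region_to_patch(
    "len = strlen(s);\nbuf[len] = 0;\n",
    "len = strlen(s);\nif (len < sizeof(buf))\n    buf[len] = 0;\n",
    "lib/url.c",
)
```

Because the model never emits diff markers itself, a malformed hunk header or miscounted context line can never come out of the generation step.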

Quick Start

The easiest way to use this model is with the cve-backport-tool CLI, which handles the full pipeline: parsing the upstream patch, extracting per-hunk regions, calling the model, and reconstructing a unified diff.

# Download and serve the model
./setup.sh

# Generate a backport patch
python3 cve-backport.py \
    --cve CVE-2025-3887 \
    --package gstreamer-plugins-bad \
    --patch upstream.patch \
    --obs-fetch --backend openai --retry 3

GGUF Downloads

| File | Quant | Size | Dataset | Notes |
|------|-------|------|---------|-------|
| cve-backport-codegen-v3-q8_0.gguf | Q8_0 | 33 GB | 35,667 examples | Recommended – highest precision |
| cve-backport-codegen-v2-q8_0.gguf | Q8_0 | 33 GB | 24,452 examples | Previous release |
| cve-backport-codegen-v1-q8_0.gguf | Q8_0 | 33 GB | 17,007 examples | First release |

Older GGUFs (v1, v2) are available in the legacy repo.

Evaluation

Per-hunk evaluation on held-out test cases the model never saw during training:

| Metric | v1 | v2 | v3 |
|--------|----|----|----|
| Average recall | 91% | 94% | 94% |
| Average precision | – | 93% | 98% |
| Exact match | – | 15/20 | 16/20 |
| Perfect hunks (>=95%) | 16/18 | 17/20 | 17/20 |
| Fail (<10%) | 1/18 | 0/20 | 0/20 |

By tier:

  • Identical (upstream patch applies directly): 95% recall, 98% precision
  • Adapted (line numbers/context differ): 89% recall, 97% precision
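The recall and precision figures above are per-hunk. The exact metric definition is not given here, so as a hedged illustration only (this scorer is an assumption, not the project's evaluation harness), a line-level per-hunk score might be computed like this:

```python
def hunk_line_scores(reference: str, generated: str) -> tuple[float, float]:
    """Line-level recall/precision of a generated hunk against the reference fix.

    Assumed definition: recall = fraction of reference lines the model
    reproduced; precision = fraction of generated lines found in the reference.
    """
    ref = reference.splitlines()
    gen = generated.splitlines()
    ref_set, gen_set = set(ref), set(gen)
    recall = sum(1 for line in ref if line in gen_set) / len(ref) if ref else 1.0
    precision = sum(1 for line in gen if line in ref_set) / len(gen) if gen else 1.0
    return recall, precision
```

Under this definition, a hunk scoring >=95% on recall would count toward the "perfect hunks" row, and one below 10% toward the "fail" row.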

Training Details

| | v1 | v2 | v3 |
|---|----|----|----|
| Dataset | 17,007 | 24,452 | 35,667 |
| Learning rate | 2e-4 | 2e-4 | 1e-4 |
| Epochs | 2 | 2 | 2 |
| Training time | 13h | 27h | 41h |
| Hardware | H100 NVL | H100 NVL | H100 NVL |
| Method | QLoRA r=64 α=128 | QLoRA r=64 α=128 | QLoRA r=64 α=128 |

v3 uses a lower learning rate (1e-4 vs 2e-4) for stability with the larger dataset, and includes data quality filtering to remove toxic examples (XML test data, huge outputs) that caused training instability in earlier runs.

Prompt Format

ChatML format. Each prompt covers one hunk region with 15 lines of context padding:

System:

You are a security patch backporting assistant.

Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.

Rules:
- Output ONLY the fixed code, nothing else – no explanations, no markdown fences
- Preserve exact formatting, indentation, and style of the original
- Make ONLY the changes described in the fix โ€” do not modify anything else
- Do not add comments about what you changed

User:

## File: lib/url.c
## Lines: 100-130

\`\`\`c
<vulnerable source code region>
\`\`\`

## Fix
CVE-2024-1234: fix buffer overflow in url_parse()

\`\`\`diff
<upstream patch>
\`\`\`

Assistant: (the fixed source code, raw, no fences)
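The region extraction and user-prompt assembly described above can be sketched in a few lines. This is a hedged illustration: `extract_region` and `build_user_prompt` are hypothetical helper names, not part of the released tool; only the 15-line context padding and the template layout come from this document.

```python
FENCE = "`" * 3  # literal ``` for the nested code fences in the template

CONTEXT = 15  # lines of padding around the hunk, per the prompt format above

def extract_region(source: str, hunk_start: int, hunk_end: int):
    """Return (lo, hi, text) for the hunk plus context; lines are 1-based, inclusive."""
    lines = source.splitlines()
    lo = max(1, hunk_start - CONTEXT)
    hi = min(len(lines), hunk_end + CONTEXT)
    return lo, hi, "\n".join(lines[lo - 1:hi])

def build_user_prompt(path, lo, hi, region, fix_title, upstream_patch):
    # Mirrors the "User" template above, section by section.
    return (
        f"## File: {path}\n## Lines: {lo}-{hi}\n\n"
        f"{FENCE}c\n{region}\n{FENCE}\n\n"
        f"## Fix\n{fix_title}\n\n"
        f"{FENCE}diff\n{upstream_patch}\n{FENCE}"
    )
```

Clamping `lo` and `hi` to the file bounds matters for hunks near the top or bottom of a file, where fewer than 15 context lines exist.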

Training Data

anicka/cve-backport-codegen-dataset – 35,667 per-hunk examples from openSUSE maintenance patches, covering 90+ packages and 2,300+ CVEs.

Intended Use

This model assists with security patch backporting in Linux distribution maintenance. It is a research tool: all generated patches must be reviewed by a maintainer before application. "Applies and builds" validates mechanical correctness, not semantic correctness.

License

Apache-2.0 (inherited from Qwen2.5-Coder-32B-Instruct).
