CVE Backport Code Generation – Qwen2.5-Coder-32B (v4)

Fine-tuned Qwen2.5-Coder-32B-Instruct for security patch backporting via per-hunk code generation, with CVE test case generation.

Instead of generating unified diffs, this model takes a vulnerable code region and a fix description, and outputs the fixed version of the code. A programmatic diff then produces the final patch. Optionally, the model can also generate a test case that verifies the fix.
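The final step described above can be sketched in a few lines: given the vulnerable region and the model's fixed version, a standard unified diff yields the patch hunk. A minimal sketch using Python's `difflib`; `region_to_hunk` and the sample strings are illustrative, not the tool's actual API.

```python
import difflib

def region_to_hunk(original: str, fixed: str, filename: str) -> str:
    """Diff the vulnerable region against the model's fixed version."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)

# Illustrative before/after regions (not real CVE code):
original = "if (len > 0)\n    memcpy(dst, src, len);\n"
fixed = "if (len > 0 && len < cap)\n    memcpy(dst, src, len);\n"
print(region_to_hunk(original, fixed, "lib/ftp.c"))
```

Because the model only emits code, any formatting quirks in its output surface as spurious diff lines, which is why the output must match the original region's formatting exactly.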

Quick Start

```sh
git clone https://github.com/openSUSE/cve-backport-tool
cd cve-backport-tool
./setup.sh                  # downloads GGUF, registers with ollama

python3 cve-backport.py \
    --cve CVE-2024-1234 \
    --package curl \
    --patch upstream-fix.patch \
    --obs-fetch --obs-project openSUSE:Leap:15.6:Update \
    --retry 3
```

GGUF Downloads

| File | Quant | Size | Notes |
| --- | --- | --- | --- |
| cve-backport-codegen-v4-q8_0.gguf | Q8_0 | 33 GB | Recommended (v4, 36K dataset + test generation) |
| cve-backport-codegen-v3-q8_0.gguf | Q8_0 | 33 GB | v3 (35K dataset, 98% precision) |

Evaluation (v4)

Per-hunk evaluation on 100 held-out examples the model never saw during training:

| Metric | v3 (n=20) | v4 (n=100) |
| --- | --- | --- |
| Average recall | 94% | 93% |
| Average precision | 98% | 95% |
| Exact match | 16/20 | 87/100 |
| Failures (recall <10%) | 0/20 | 4/100 |

By tier:

  • Identical (upstream patch applies directly): 94% recall
  • Adapted (line numbers/context differ): 86% recall
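One plausible way per-hunk recall and precision like the numbers above could be computed, treating the reference patch's changed lines as ground truth; the actual eval harness is not shown here, so `score_hunk` is an assumption.

```python
def score_hunk(reference_added: set, generated_added: set):
    """Line-level recall/precision of a generated fix against the reference."""
    tp = len(reference_added & generated_added)
    recall = tp / len(reference_added) if reference_added else 1.0
    precision = tp / len(generated_added) if generated_added else 1.0
    return recall, precision

# Illustrative: the reference fix adds two lines, the model recovers one.
ref = {"+ if (!entry) return;", "+ path[len] = '\\0';"}
gen = {"+ path[len] = '\\0';"}
r, p = score_hunk(ref, gen)
# r == 0.5, p == 1.0
```

Under this scoring, "exact match" corresponds to recall and precision both reaching 1.0 on a hunk.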

Test Generation (new in v4)

50 held-out CVEs with known reference tests:

  • Average quality score: 0.67
  • All 50 produced structurally valid tests
  • 17/50 matched reference test exactly

Comparison with Frontier Models

Same evaluation on the same 100 examples; the frontier models were given optimized prompts and markdown fences were stripped from their outputs:

| Model | Recall | Precision | Exact | Failures |
| --- | --- | --- | --- | --- |
| CVE Backport v4 (32B fine-tuned) | 93% | 95% | 87/100 | 4 |
| Gemini 3.1 Pro (frontier, zero-shot) | 27% | 24% | 10/100 | 50 |
| Gemini 2.0 Flash (frontier, zero-shot) | 13% | 17% | 4/100 | 81 |

Fine-tuning on 36K domain-specific examples outperforms frontier models by 3-7x on this task.

Prompt Format

ChatML format. Each prompt covers one hunk region with 15 lines of context padding.
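A minimal sketch of cutting that padded region out of a source file, assuming the 15-line padding is applied symmetrically around the hunk; `extract_region` and `PAD` are illustrative names, not the tool's API.

```python
PAD = 15  # context lines on each side of the hunk (per the description above)

def extract_region(source_lines, hunk_start, hunk_end):
    """Return (first_line_no, region_text) for a 1-based inclusive hunk range."""
    lo = max(0, hunk_start - 1 - PAD)          # clamp padding at file start
    hi = min(len(source_lines), hunk_end + PAD)  # clamp padding at file end
    return lo + 1, "".join(source_lines[lo:hi])

lines = [f"line {i}\n" for i in range(1, 101)]
start, region = extract_region(lines, 40, 45)
# start == 25; region spans lines 25..60
```

The returned starting line number is what would populate the `## Lines:` header in the prompt.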

Code Generation (3-turn)

System:

You are a security patch backporting assistant.

Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.

Rules:
- Output ONLY the fixed code, nothing else: no explanations, no markdown fences
- Preserve exact formatting, indentation, and style of the original
- Make ONLY the changes described in the fix โ€” do not modify anything else
- Do not add comments about what you changed

User:

## File: lib/ftp.c
## Lines: 2836-2912

```c
{vulnerable code region with 15-line padding}
```

## Fix
CVE-2017-8817: FTP wildcard matching – zero terminate the entry path

```diff
{upstream patch}
```

Assistant: The fixed code (same region with the security fix applied).

Test Generation (5-turn, new in v4)

After the code-generation assistant turn, an optional follow-up user/assistant exchange:

User:

Write a test case that:
1. Triggers the vulnerability in the original code above
2. Passes after applying your fix

Output ONLY the test code, nothing else.

Assistant: Test code targeting the specific CVE.
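The five turns above can be laid out as the ChatML-style message list a Qwen chat template consumes. A sketch with placeholder contents; the literal strings are not the exact training prompts.

```python
SYSTEM = "You are a security patch backporting assistant. ..."

messages = [
    {"role": "system", "content": SYSTEM},
    # Turn 1: vulnerable region + fix description
    {"role": "user", "content": "## File: lib/ftp.c\n## Lines: 2836-2912\n..."},
    # Turn 2: the fixed code region
    {"role": "assistant", "content": "/* fixed code region */"},
    # Turn 3: optional test-generation request (new in v4)
    {"role": "user", "content": "Write a test case that:\n1. Triggers ...\n2. Passes ..."},
    # Turn 4: the generated test
    {"role": "assistant", "content": "/* test code */"},
]

# With a Hugging Face tokenizer this would render to ChatML via
# tokenizer.apply_chat_template(messages, tokenize=False)
roles = [m["role"] for m in messages]
```

With multi-turn-aware label masking (see Training below), the loss is computed on both assistant segments and masked on the system and user segments.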

Training

| Setting | Value |
| --- | --- |
| Base model | Qwen2.5-Coder-32B-Instruct |
| Method | QLoRA (4-bit NF4, r=64, alpha=128) |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Max sequence length | 4,096 tokens |
| Batch size | 1 (gradient accumulation 8) |
| Training examples | 36,166 (35,396 codegen + 770 codegen+test) |
| Training time | 41.2 hours |
| Hardware | 2x NVIDIA H100 NVL 94GB |
| Label masking | Multi-turn aware (both assistant segments trained) |
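The hyperparameters above map onto the kind of config a QLoRA run (e.g. via peft + bitsandbytes) would consume. Field names follow common peft/TRL conventions and are assumptions; the actual Teapot config schema is not shown here.

```python
# Hedged sketch of the training configuration, expressed as a plain dict.
qlora_config = {
    "base_model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "quantization": {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
    "lora": {"r": 64, "lora_alpha": 128},
    "num_train_epochs": 2,
    "learning_rate": 1e-4,
    "max_seq_length": 4096,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Effective global batch size per optimizer step across the two H100s:
effective_batch = (
    qlora_config["per_device_train_batch_size"]
    * qlora_config["gradient_accumulation_steps"]
    * 2  # GPUs
)
# effective_batch == 16
```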

Training Data

openSUSE/cve-backport-codegen-dataset: 36,166 per-hunk examples from openSUSE maintenance patches, covering 145+ packages and 2,300+ CVEs, with per-example SPDX license metadata.

Reproducibility

Trained using the Teapot composable training pipeline:

```sh
teapot compose configs/cve-backport.config
teapot train configs/cve-backport.config --backend qlora-hf
teapot eval configs/cve-backport.config
```

Dataset: openSUSE/cve-backport-codegen-dataset (train.jsonl + eval.jsonl).

Intended Use

This model assists with security patch backporting in Linux distribution maintenance. It is a research tool: all generated patches must be reviewed by a maintainer before application.

Important: This model was fine-tuned for code generation accuracy, not for safety alignment. It inherits the base model's safety training but has no additional guardrails. In particular:

  • The model follows fix descriptions literally. If the fix description contains malicious instructions (e.g., "add a backdoor"), the model will comply. Fix descriptions must come from trusted sources, typically upstream patches, never user input.
  • The tool is designed for use with trusted inputs (upstream CVE patches, OBS source packages). It should not be exposed as a public API without input validation.
  • Generated patches and test cases must always be reviewed by a maintainer before application.
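One possible shape for the input-validation mitigation described above: only accept patches that originate from an allowlisted set of hosts. The allowlist and helper below are illustrative, not part of the tool.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of patch sources; a real deployment would manage
# this per-distribution policy.
TRUSTED_HOSTS = {"github.com", "git.kernel.org", "build.opensuse.org"}

def patch_source_trusted(patch_url: str) -> bool:
    """Accept a patch URL only if its host is on the allowlist."""
    host = urlparse(patch_url).hostname or ""
    return host in TRUSTED_HOSTS

ok = patch_source_trusted("https://github.com/curl/curl/commit/abc123.patch")
bad = patch_source_trusted("https://evil.example/backdoor.patch")
# ok == True, bad == False
```

Provenance checks like this gate what reaches the model; they complement, rather than replace, maintainer review of the output.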

Adding safety training to the fine-tuning was considered but deliberately deferred: our evaluation showed that domain precision (98% in v3) is sensitive to training data composition, and mixing in safety examples risks degrading the model's core capability. The correct mitigation is input validation in the tool, not model-level refusal.

Known Issues

  • Prompt echo (v4): The v4 model occasionally echoes prompt structure (`## File:` headers, markdown fences) into its code output, likely from the 5-turn test generation training data. The CLI tool strips these automatically. This is a minor regression from v3.
  • Test generation quality varies: Test cases for simple vulnerability patterns (null deref, bounds check, injection) are useful. For complex multi-file patches with adapted context, the model may produce generic placeholder tests.
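The kind of output stripping the CLI performs for the prompt-echo issue could look like the sketch below; the tool's exact rules are not shown here, so the helper and its filters are assumptions.

```python
def strip_prompt_echo(text: str) -> str:
    """Drop echoed prompt headers and markdown fences from model output."""
    fence = "`" * 3  # a markdown code-fence marker
    kept = []
    for line in text.splitlines():
        if line.startswith(("## File:", "## Lines:")):
            continue  # echoed prompt header
        if line.startswith(fence):
            continue  # echoed markdown fence
        kept.append(line)
    return "\n".join(kept)

raw = "## File: lib/ftp.c\n" + "`" * 3 + "c\nint x = 1;\n" + "`" * 3
strip_prompt_echo(raw)
# -> "int x = 1;"
```

Line-prefix filtering like this is deliberately conservative: it cannot misfire on ordinary C code, since `##` and backtick fences are not valid at the start of a C line.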

License

Apache-2.0 (inherited from Qwen2.5-Coder-32B-Instruct).
