CVE Backport Model — Phase 5 (Qwen2.5-Coder-32B QLoRA)

A security patch backporting assistant fine-tuned to adapt upstream CVE fixes for SUSE/openSUSE package versions. Given a CVE description and an upstream patch, the model generates a backported patch in unified diff format.

Model Details

Base model: Qwen/Qwen2.5-Coder-32B-Instruct
Method: QLoRA (4-bit NF4, LoRA r=16, alpha=32, dropout=0.05)
Training data: anicka/cve-backport-v5-dataset — 21,173 train / 2,361 eval examples across 994 packages and 2,055 CVEs
GGUF quantizations: anicka/cve-backport-qwen32b-phase5-gguf (Q4_K_M 19GB, Q8_0 33GB)

Training

Parameter	Value
Epochs	1
Steps	1,184
Batch size	1 (grad accum 16 = effective 16)
Learning rate	1e-4
Max sequence length	4,096 tokens
LoRA rank / alpha	16 / 32
LoRA targets	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Hardware	NVIDIA H100 SXM 80GB (vast.ai)
Training time	~17 hours
Final train loss	0.004

Loss Curve

Step	Loss
10	0.053
100	0.016
300	0.010
600	0.007
1000	0.004
1184	0.004

Dataset

The v5 dataset contains real SUSE/openSUSE security patches paired with their upstream fixes:

23,534 examples (21,173 train + 2,361 eval, zero CVE overlap)
994 packages, 2,055 CVEs
15% with source context (3,536 examples include surrounding code)
89% identical tier (upstream applies directly), 11% adapted (requires modification)
Languages: C (75%), C++ (10%), Python (7%), shell (3%), Java (2%), JS (1%), Go (1%), other (1%)
100% SPDX license coverage — GPL-2.x (30%), LGPL (17%), GPL-3.x (15%), MIT (11%), BSD (8%), Apache (5%), other (14%)
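The zero-CVE-overlap property comes from splitting on CVE identifier rather than on individual examples. The following is a minimal sketch of such a split and of an overlap check, not the procedure actually used to build the dataset. It assumes each example is a dict with a hypothetical `cve` field; the real dataset schema is not described here.

```python
import random
from collections import defaultdict


def split_by_cve(examples, eval_fraction=0.1, seed=0):
    """Split examples so that every CVE lands entirely in train or in eval."""
    by_cve = defaultdict(list)
    for ex in examples:
        by_cve[ex["cve"]].append(ex)  # "cve" is a hypothetical field name

    # Shuffle CVE ids (not examples) so the split is made at the CVE level.
    cves = sorted(by_cve)
    random.Random(seed).shuffle(cves)
    n_eval = max(1, round(len(cves) * eval_fraction))

    eval_set = [ex for cve in cves[:n_eval] for ex in by_cve[cve]]
    train_set = [ex for cve in cves[n_eval:] for ex in by_cve[cve]]
    return train_set, eval_set


def cve_overlap(train_set, eval_set):
    """Return the CVE ids that appear on both sides (empty set if disjoint)."""
    return {ex["cve"] for ex in train_set} & {ex["cve"] for ex in eval_set}
```

Because the split is made over CVE ids, the eval share of examples only approximates `eval_fraction` when CVEs have uneven numbers of patches.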

Evaluation

All evaluation uses the Q4_K_M GGUF quantization via ollama (temperature=0.1, num_ctx=4096). The eval set has zero CVE overlap with training — completely held-out examples.

50-Package Benchmark (no source context)

Tested on 50 unique packages from the held-out eval set (49 identical-tier, 1 adapted-tier):

Metric	Result
Exact match	49/50 (98%)
High similarity (>=90%)	50/50 (100%)
Valid diff format	50/50 (100%)
Avg time per test	17s

Identical-tier: 49/49 (100%) exact match — character-for-character reproduction
Adapted-tier: 1/1 at 93% similarity (libxml2 — reasonable adaptation)
Languages tested: C, C++, Go, Python, JavaScript — all 100% exact on identical tier
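This card does not state how the similarity percentage is computed. One plausible reading, assumed here purely for illustration, is a character-level ratio such as the one Python's `difflib.SequenceMatcher` provides. The function names and the bucket labels below are hypothetical; only the 90% threshold comes from the tables.

```python
import difflib


def patch_similarity(reference: str, generated: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for the card's metric."""
    matcher = difflib.SequenceMatcher(None, reference, generated, autojunk=False)
    return matcher.ratio()


def classify(reference: str, generated: str, threshold: float = 0.90) -> str:
    """Bucket a generated patch as exact, high-similarity, or low-similarity."""
    if generated == reference:
        return "exact"
    if patch_similarity(reference, generated) >= threshold:
        return "high"
    return "low"
```

`autojunk=False` disables difflib's heuristic that ignores very frequent characters in long inputs, which would otherwise skew scores on large patches.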

Source-Context Adapted Patches (the hard test)

14 tests where the prompt includes target SUSE source code and the model must adapt the upstream patch (12 adapted-tier, 2 identical-tier):

Metric	Adapted (12)	Identical (2)
Exact match	1/12 (8%)	2/2 (100%)
High similarity (>=90%)	7/12 (58%)	2/2 (100%)
Avg similarity	81.6%	100%
Valid diff format	12/12 (100%)	2/2 (100%)

Per-package adapted results (sorted by similarity):

Package	CVE	Similarity
libxml2	—	100% (exact)
php5	CVE-2017-9224	96%
gnutls	—	95%
libxml2	—	93%
libssh	CVE-2018-10933	92%
openssh	CVE-2018-15473	91%
gnutls	—	91%
php5	CVE-2017-9224	88%
libxml2	—	86%
ImageMagick	—	74%
libssh	CVE-2018-10933	47%
libssh	CVE-2018-10933	26%

Nine of the 12 adapted patches score 86-100% similarity; the rest are ImageMagick (74%) and two libssh cases (47% and 26%). The two low-scoring libssh cases involve complex multi-function adaptations where the model generates plausible but structurally different patches.

Patch Application to Real Source Trees

We tested whether the 12 adapted-tier patches from the source-context test actually apply to real SUSE source trees, using patch --dry-run with various strip levels and fuzz=3:

Status	Reference patches	Model patches
Fully applies	8/12	2/12
Partially applies	0/12	2/12
Fails	4/12	8/12

4 patches fail for both reference and model because the source version available for testing did not match the target. Comparing only the 8 cases where the reference applies:

Model result	Count	Examples
Fully applies	2/8	php5 (96% sim), libssh (92% sim)
Partially applies	2/8	gnutls 1/3 hunks (95% sim), openssh 3/5 hunks (91% sim)
Fails	4/8	gnutls (91%), libxml2 (86%), libssh (47%), libssh (26%)
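The apply test described above can be approximated with a small wrapper around GNU patch. This is a sketch under stated assumptions, not the harness used for these results: the strip levels tried (0 to 2) and the function names are hypothetical, while `--dry-run`, `--fuzz`, `-p`, `--batch`, and `-i` are standard GNU patch options.

```python
import subprocess
from pathlib import Path


def patch_cmd(patch_file, strip, fuzz=3):
    """Build a GNU patch command that checks applicability without touching files."""
    return [
        "patch",
        f"-p{strip}",      # number of leading path components to strip
        "--dry-run",       # report what would happen, change nothing
        f"--fuzz={fuzz}",  # tolerate up to `fuzz` mismatched context lines
        "--batch",         # never prompt for input
        "-i",
        str(patch_file),
    ]


def first_applying_strip(patch_file, source_dir, strips=(0, 1, 2), fuzz=3):
    """Return the first strip level at which every hunk applies, else None."""
    patch_path = Path(patch_file).resolve()
    for strip in strips:
        result = subprocess.run(
            patch_cmd(patch_path, strip, fuzz),
            cwd=source_dir,
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:  # exit status 0 means all hunks applied
            return strip
    return None
```

This only separates "fully applies" from everything else; telling a partial apply from a complete failure would require inspecting the per-hunk output of patch.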

Failure modes for model patches that don't apply:

Wrong context lines — model assumes variables/lines exist that don't in the target version (e.g., modifying int version; when no such variable exists)
Wrong line numbers — hunk offsets differ by 6-57 lines from the correct position
Wrong diff format — outputs diff --git a/file b/file instead of Index: pkg-version/file with .orig paths (SUSE quilt format)
Wrong version in paths — e.g., libssh-0.7.3 instead of libssh-0.7.5

The model consistently gets the code changes right for high-similarity patches (>90%) but sometimes gets the location wrong (line numbers, file paths, diff format).
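Of the failure modes listed, the diff-format one is the most mechanical to repair after generation. The sketch below rewrites git-style file headers into a quilt-style layout. The exact layout it emits (an `Index:` line, a 67-character separator, and `pkg-version.orig/` versus `pkg-version/` paths) is an assumption based on common quilt output, and `pkg_dir` must be supplied by the caller since the model may get the version wrong.

```python
import re

GIT_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def to_quilt(diff_text: str, pkg_dir: str) -> str:
    """Rewrite `diff --git` file headers as quilt-style `Index:` headers."""
    out = []
    in_header = False  # True between a `diff --git` line and the first hunk
    for line in diff_text.splitlines():
        match = GIT_HEADER.match(line)
        if match:
            in_header = True
            out.append(f"Index: {pkg_dir}/{match.group(2)}")
            out.append("=" * 67)
        elif line.startswith("@@"):
            in_header = False
            out.append(line)
        elif in_header and line.startswith("index "):
            continue  # drop git's blob-hash line
        elif in_header and line.startswith("--- a/"):
            out.append(f"--- {pkg_dir}.orig/{line[6:]}")
        elif in_header and line.startswith("+++ b/"):
            out.append(f"+++ {pkg_dir}/{line[6:]}")
        else:
            out.append(line)  # hunk bodies pass through unchanged
    return "\n".join(out) + "\n"
```

Header rewriting is restricted to the region before the first hunk of each file, so a removed source line that happens to start with `-- a/` is not mistaken for a file header.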

Fresh OpenSSL CVEs (January 2026 — completely unseen)

Tested on 6 CVEs from the January 27, 2026 OpenSSL advisory (none in training data, upstream 3.4+/3.6+ backported to SUSE's OpenSSL 3.1.4):

CVE	Severity	Description	Result
CVE-2025-69421	Low	PKCS12 NULL pointer dereference	Valid diff, exact
CVE-2025-69420	Low	ASN1_TYPE type confusion	Valid diff, exact
CVE-2025-69418	Low	OCB trailing bytes	Valid diff, exact
CVE-2025-15467	High	AEAD CMS IV length validation	Valid diff, exact
CVE-2025-11187	Moderate	PBMAC1 salt/keylength validation	Valid diff, exact
CVE-2025-68160	Low	BIO_f_linebuffer heap overflow	Valid diff, exact

All 6/6 produced valid unified diff output. These patches apply directly to SUSE's OpenSSL 3.1.4 (the affected code is identical between versions).
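What counts as a "valid diff" is not spelled out in this card. A minimal structural check, offered here as one possible definition rather than the one used for these results, is that every hunk header's line counts match the hunk body that follows it:

```python
import re

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def is_valid_unified_diff(text: str) -> bool:
    """Check that each hunk body has the line counts its header promises."""
    lines = text.splitlines()
    i, hunks = 0, 0
    while i < len(lines):
        match = HUNK.match(lines[i])
        if not match:
            i += 1  # file headers and other non-hunk lines are skipped
            continue
        old_left = int(match.group(2) or 1)  # an omitted count means 1
        new_left = int(match.group(4) or 1)
        i += 1
        while old_left > 0 or new_left > 0:
            if i >= len(lines):
                return False  # hunk is truncated
            tag = lines[i][:1]
            if tag in (" ", ""):  # context line (blank lines tolerated)
                old_left -= 1
                new_left -= 1
            elif tag == "-":
                old_left -= 1
            elif tag == "+":
                new_left -= 1
            elif tag != "\\":  # allow "\ No newline at end of file"
                return False
            i += 1
        if old_left < 0 or new_left < 0:
            return False  # body has more lines than the header declared
        hunks += 1
    return hunks > 0
```

A check like this says nothing about whether the patch applies; it only catches malformed or truncated output.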

Usage

With ollama (GGUF)

# Download and create model
ollama create cve-backport -f Modelfile
# Modelfile contents:
# FROM cve-backport-qwen32b-q4_k_m.gguf
# PARAMETER temperature 0.1
# PARAMETER num_ctx 4096
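Once the model has been created, it can be queried through ollama's HTTP API. The sketch below is one way to do that, not the author's tooling: it builds the user message in the prompt format this card documents and posts it to `/api/chat` with the evaluation settings (temperature 0.1, num_ctx 4096). The helper names, the default host, and the expectation that ollama applies a ChatML template to this GGUF are assumptions, and the exact whitespace of the training prompts is not documented.

```python
import json
import urllib.request

FENCE = "`" * 3  # markdown code fence, built here to keep this example readable

SYSTEM_PROMPT = (
    "You are a security patch backporting assistant for SUSE/openSUSE.\n\n"
    "Given a CVE description and an upstream fix, adapt the patch to work with\n"
    "the target SUSE package version. Output the backported patch in unified\n"
    "diff format. If the upstream fix applies directly, output it unchanged."
)


def build_user_prompt(cve_id, description, package, upstream_patch):
    """Assemble the user message in the card's prompt layout."""
    return (
        f"## CVE: {cve_id}\n\n{description}\n\n"
        f"## Package\n{package}\n\n"
        f"## Upstream Fix\n{FENCE}diff\n{upstream_patch.rstrip()}\n{FENCE}\n\n"
        "Generate a backported patch for the SUSE target version."
    )


def build_chat_request(user_prompt, model="cve-backport"):
    """Build the JSON payload for ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
        "options": {"temperature": 0.1, "num_ctx": 4096},
    }


def generate_backport(user_prompt, host="http://localhost:11434"):
    """Send the request to a local ollama server and return the model's reply."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_request(user_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With num_ctx at 4096, the prompt and the generated patch must fit in the same window, so long upstream patches may need to be split per file before calling `generate_backport`.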

Prompt Format (ChatML)

<|im_start|>system
You are a security patch backporting assistant for SUSE/openSUSE.

Given a CVE description and an upstream fix, adapt the patch to work with
the target SUSE package version. Output the backported patch in unified
diff format. If the upstream fix applies directly, output it unchanged.
<|im_end|>
<|im_start|>user
## CVE: CVE-2025-XXXXX

Description of the vulnerability...

## Package
package-name

## Upstream Fix
```diff
--- a/file.c
+++ b/file.c
@@ -10,6 +10,8 @@
...
```

Generate a backported patch for the SUSE target version.
<|im_end|>
<|im_start|>assistant

With transformers (adapter)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16 across available GPUs (device_map needs accelerate).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter to the base weights.
model = PeftModel.from_pretrained(base, "anicka/cve-backport-qwen32b-phase5")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```

Limitations

Identical-tier patches are near-perfect (100% exact on 49 tested) — this is the model's strength
Adapted patches get the code changes right but often fail to apply — wrong line numbers, wrong diff format, wrong context lines (2/8 fully apply, 2/8 partially apply)
Complex adaptations (major API changes between versions) may produce plausible but structurally wrong patches
The model sometimes outputs diff --git format instead of SUSE's Index: quilt format
Always review and test generated patches before applying
Performance degrades for very long patches (>2000 lines)

Known Issues and Future Directions

What works well

Reproducing identical-tier patches (upstream fix applies directly): 100% exact match
Understanding CVE context and generating valid unified diff format: 100%
Getting the right code changes for adapted patches with >90% similarity

What needs improvement

Diff format consistency — the model mixes diff --git and Index: (quilt) formats. A post-processing step could normalize output to the expected SUSE format, or training data could be standardized to one format.
Line number accuracy — model hunks are often offset by 5-60 lines from the correct position. Possible approaches:
- Include more surrounding source context in prompts (currently only ~15% of examples have it)
- Train with augmented examples where line numbers are varied
- Post-process with patch --fuzz or a smarter re-anchoring tool that finds the right location using the context lines
Context line correctness — the model sometimes hallucinates context lines that don't exist in the target version (e.g., referencing variables from a different version). This could be addressed by:
- Always providing target source context in prompts (not just 15% of examples)
- Training a second stage that validates/fixes context against the actual file
Version awareness — the model occasionally uses the wrong package version in paths (e.g., libssh-0.7.3 instead of 0.7.5). Explicit version information in prompts and stricter prompt templates could help.
Complex multi-function adaptations — patches requiring significant structural changes (like the libssh cases at 26-47% similarity) likely need a fundamentally different approach, possibly agentic (iterative generation with compilation feedback).
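The re-anchoring idea mentioned under line number accuracy can be sketched as follows: ignore the line numbers the model wrote, search the target file for the hunk's old-side lines, and rewrite the header only if they occur exactly once. This is an illustrative sketch, not an existing tool. The function name and the `delta` parameter (the net number of lines added by earlier hunks in the same file) are hypothetical.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


def reanchor_hunk(hunk, target_lines, delta=0):
    """Return the hunk with corrected start lines, or None if it cannot be placed.

    hunk: list of lines, header first. target_lines: the target file's lines
    without trailing newlines.
    """
    header = HUNK_HEADER.match(hunk[0])
    if header is None:
        return None
    body = hunk[1:]
    # The old side is what must already exist in the target: context + removals.
    old_side = [line[1:] for line in body if line[:1] in (" ", "-")]
    new_count = sum(1 for line in body if line[:1] in (" ", "+"))
    n = len(old_side)
    if n == 0:
        return None
    hits = [
        i for i in range(len(target_lines) - n + 1)
        if target_lines[i:i + n] == old_side
    ]
    if len(hits) != 1:
        return None  # missing (hallucinated context) or ambiguous
    start = hits[0] + 1  # diff line numbers are 1-based
    fixed = f"@@ -{start},{n} +{start + delta},{new_count} @@{header.group(5)}"
    return [fixed] + body
```

A `None` result is itself useful: it flags hunks whose context lines do not exist in the target version, which is the hallucinated-context failure rather than a pure offset error.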

Potential architecture improvements

Retrieval-augmented generation (RAG) — feed the actual target source file to the model, not just the upstream patch. The 15% of training examples with source context already show this helps.
Two-stage pipeline — Stage 1: generate the code changes. Stage 2: given the target file, locate the correct insertion point and format the diff.
Compilation feedback loop — generate patch → apply → build → if fails, feed error back to model for correction. This agentic approach could dramatically improve the apply rate.
Larger context window — the current 4096 token limit constrains how much source context can be provided. Training with 8K-16K context could help adapted patches significantly.
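The compilation feedback loop proposed above reduces to a small control loop once generation and checking are treated as pluggable steps. The sketch below shows only that control flow under stated assumptions: `generate` and `check` are hypothetical callables supplied by the caller (for example, a model call and an apply-then-build step), and nothing here reflects an existing implementation.

```python
from typing import Callable, Optional, Tuple


def backport_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    prompt: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Generate a patch, check it, and retry with the failure log as feedback.

    generate(prompt, feedback) returns a candidate patch; feedback is None on
    the first round. check(patch) returns (ok, log), where log describes why
    applying or building failed.
    """
    feedback = None
    for _ in range(max_rounds):
        patch = generate(prompt, feedback)
        ok, log = check(patch)
        if ok:
            return patch
        feedback = log  # feed the error back into the next attempt
    return None  # give up after max_rounds failed attempts
```

Bounding the loop with `max_rounds` matters in practice, since each round costs a full generation plus a build.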

Dataset: anicka/cve-backport-v5-dataset
GGUF: anicka/cve-backport-qwen32b-phase5-gguf
Tool: cve-backport-tool (OBS integration)
