CVE Backport Model: Phase 5 (Qwen2.5-Coder-32B QLoRA)
A security patch backporting assistant fine-tuned to adapt upstream CVE fixes for SUSE/openSUSE package versions. Given a CVE description and an upstream patch, the model generates a backported patch in unified diff format.
Model Details
- Base model: Qwen/Qwen2.5-Coder-32B-Instruct
- Method: QLoRA (4-bit NF4, LoRA r=16, alpha=32, dropout=0.05)
- Training data: anicka/cve-backport-v5-dataset (21,173 train / 2,361 eval examples across 994 packages and 2,055 CVEs)
- GGUF quantizations: anicka/cve-backport-qwen32b-phase5-gguf (Q4_K_M 19GB, Q8_0 33GB)
Training
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Steps | 1,184 |
| Batch size | 1 (grad accum 16 = effective 16) |
| Learning rate | 1e-4 |
| Max sequence length | 4,096 tokens |
| LoRA rank / alpha | 16 / 32 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Hardware | NVIDIA H100 SXM 80GB (vast.ai) |
| Training time | ~17 hours |
| Final train loss | 0.004 |
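The quantization and adapter settings in the table map fairly directly onto the `transformers` and `peft` configuration objects. The following is a minimal sketch of that mapping, not the actual training script; the bfloat16 compute dtype is an assumption, since the card does not state it.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base weights (values from the table).
# ASSUMPTION: bfloat16 compute dtype; the model card does not specify it.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank, alpha, dropout, and target modules as listed in the table.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

`bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to `peft.get_peft_model`.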
Loss Curve
| Step | Loss |
|---|---|
| 10 | 0.053 |
| 100 | 0.016 |
| 300 | 0.010 |
| 600 | 0.007 |
| 1000 | 0.004 |
| 1184 | 0.004 |
Dataset
The v5 dataset contains real SUSE/openSUSE security patches paired with their upstream fixes:
- 23,534 examples (21,173 train + 2,361 eval, zero CVE overlap)
- 994 packages, 2,055 CVEs
- 15% with source context (3,536 examples include surrounding code)
- 89% identical tier (upstream applies directly), 11% adapted (requires modification)
- Languages: C (75%), C++ (10%), Python (7%), shell (3%), Java (2%), JS (1%), Go (1%), other (1%)
- 100% SPDX license coverage: GPL-2.x (30%), LGPL (17%), GPL-3.x (15%), MIT (11%), BSD (8%), Apache (5%), other (14%)
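A split with zero CVE overlap can be obtained by partitioning on CVE ID rather than on individual examples. The sketch below illustrates the idea; it is not the dataset's actual build script, and the `cve` key and `eval_fraction` parameter are hypothetical names.

```python
import random
from collections import defaultdict


def split_by_cve(examples, eval_fraction=0.1, seed=0):
    """Split examples so that no CVE ID appears in both train and eval.

    ASSUMPTION: each example is a dict with a hypothetical "cve" key.
    """
    by_cve = defaultdict(list)
    for ex in examples:
        by_cve[ex["cve"]].append(ex)

    # Shuffle CVE IDs (not examples) so whole CVEs land on one side.
    cves = sorted(by_cve)
    random.Random(seed).shuffle(cves)

    n_eval = max(1, round(len(cves) * eval_fraction))
    eval_set = [ex for cve in cves[:n_eval] for ex in by_cve[cve]]
    train_set = [ex for cve in cves[n_eval:] for ex in by_cve[cve]]
    return train_set, eval_set
```

Because the fraction applies to CVE IDs, the resulting example counts only approximate `eval_fraction` when CVEs have uneven numbers of patches.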
Evaluation
All evaluation uses the Q4_K_M GGUF quantization via ollama (temperature=0.1, num_ctx=4096). The eval set has zero CVE overlap with training, so every example is completely held out.
50-Package Benchmark (no source context)
Tested on 50 unique packages from the held-out eval set (49 identical-tier, 1 adapted-tier):
| Metric | Result |
|---|---|
| Exact match | 49/50 (98%) |
| High similarity (>=90%) | 50/50 (100%) |
| Valid diff format | 50/50 (100%) |
| Avg time per test | 17s |
- Identical-tier: 49/49 (100%) exact match, a character-for-character reproduction
- Adapted-tier: 1/1 at 93% similarity (libxml2, a reasonable adaptation)
- Languages tested: C, C++, Go, Python, JavaScript, all 100% exact on identical tier
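The card does not define how "valid diff format" was checked. One plausible check, shown here as an assumption rather than the actual evaluation code, is that the output has `---`/`+++` file headers and that each hunk body matches the line counts declared in its `@@` header.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def is_valid_unified_diff(text):
    """Check file headers and that hunk bodies match their declared line counts."""
    lines = text.splitlines()
    saw_header = saw_hunk = False
    i = 0
    while i < len(lines):
        if (lines[i].startswith("--- ") and i + 1 < len(lines)
                and lines[i + 1].startswith("+++ ")):
            saw_header = True
            i += 2
            continue
        m = HUNK_RE.match(lines[i])
        if not m:
            i += 1
            continue
        if not saw_header:
            return False
        # A missing count in the hunk header means 1.
        old_left = int(m.group(1)) if m.group(1) is not None else 1
        new_left = int(m.group(2)) if m.group(2) is not None else 1
        i += 1
        while old_left > 0 or new_left > 0:
            if i >= len(lines):
                return False  # hunk is truncated
            body = lines[i]
            if body.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            elif body.startswith("-"):
                old_left -= 1
            elif body.startswith("+"):
                new_left -= 1
            elif body.startswith(" ") or body == "":
                old_left -= 1
                new_left -= 1
            else:
                return False
            i += 1
        if old_left < 0 or new_left < 0:
            return False  # body has more lines than the header declares
        saw_hunk = True
    return saw_hunk
```

A check like this only validates syntax; it says nothing about whether the patch applies to a given source tree.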
Source-Context Adapted Patches (the hard test)
14 tests where the prompt includes target SUSE source code and the model must adapt the upstream patch (12 adapted-tier, 2 identical-tier):
| Metric | Adapted (12) | Identical (2) |
|---|---|---|
| Exact match | 1/12 (8%) | 2/2 (100%) |
| High similarity (>=90%) | 7/12 (58%) | 2/2 (100%) |
| Avg similarity | 81.6% | 100% |
| Valid diff format | 12/12 (100%) | 2/2 (100%) |
Per-package adapted results (sorted by similarity):
| Package | CVE | Similarity |
|---|---|---|
| libxml2 | - | 100% (exact) |
| php5 | CVE-2017-9224 | 96% |
| gnutls | - | 95% |
| libxml2 | - | 93% |
| libssh | CVE-2018-10933 | 92% |
| openssh | CVE-2018-15473 | 91% |
| gnutls | - | 91% |
| php5 | CVE-2017-9224 | 88% |
| libxml2 | - | 86% |
| ImageMagick | - | 74% |
| libssh | CVE-2018-10933 | 47% |
| libssh | CVE-2018-10933 | 26% |
Nine of the 12 adapted patches reach 86-100% similarity. The two low-scoring libssh cases involve complex multi-function adaptations where the model generates plausible but structurally different patches.
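The card does not say how the similarity percentages were computed. A common choice for this kind of score, used here purely as an assumption, is a line-level `difflib.SequenceMatcher` ratio between the reference patch and the generated patch.

```python
import difflib


def patch_similarity(reference, generated):
    """Return a 0-100 line-level similarity score between two patch texts.

    ASSUMPTION: the model card's metric may be defined differently.
    """
    ref_lines = reference.splitlines()
    gen_lines = generated.splitlines()
    matcher = difflib.SequenceMatcher(None, ref_lines, gen_lines, autojunk=False)
    return 100.0 * matcher.ratio()
```

Under this definition, an "exact match" is simply a score of 100, and changing one line of a four-line patch gives 75.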
Patch Application to Real Source Trees
We tested whether the 12 adapted-tier patches actually apply to real SUSE source trees, using `patch --dry-run` with various strip levels and fuzz=3:
| Status | Reference patches | Model patches |
|---|---|---|
| Fully applies | 8/12 | 2/12 |
| Partially applies | 0/12 | 2/12 |
| Fails | 4/12 | 8/12 |
Four patches fail for both the reference and the model because the source version available for testing was the wrong one. Comparing only the 8 cases where the reference applies:
| Model result | Count | Examples |
|---|---|---|
| Fully applies | 2/8 | php5 (96% sim), libssh (92% sim) |
| Partially applies | 2/8 | gnutls 1/3 hunks (95% sim), openssh 3/5 hunks (91% sim) |
| Fails | 4/8 | gnutls (91%), libxml2 (86%), libssh (47%), libssh (26%) |
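The applicability test described above can be scripted around GNU `patch`. This is a rough sketch of such a harness, not the one actually used; the function names and the set of strip levels tried are assumptions.

```python
import os
import shutil
import subprocess


def dry_run_cmd(patch_file, source_dir, strip, fuzz=3):
    """Build a GNU patch command that checks applicability without touching files."""
    return [
        "patch", f"-p{strip}", "--dry-run", f"--fuzz={fuzz}",
        "-d", str(source_dir),
        # patch changes into source_dir first, so pass the patch as an absolute path.
        "-i", os.path.abspath(patch_file),
    ]


def first_applying_strip(patch_file, source_dir, strip_levels=(0, 1, 2), fuzz=3):
    """Return the first strip level at which every hunk applies, else None."""
    if shutil.which("patch") is None:
        raise RuntimeError("GNU patch is not installed")
    for strip in strip_levels:
        result = subprocess.run(
            dry_run_cmd(patch_file, source_dir, strip, fuzz),
            stdin=subprocess.DEVNULL,  # never block on an interactive prompt
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return strip
    return None
```

Distinguishing "partially applies" from "fails" would additionally require parsing the per-hunk messages in `result.stdout`, which this sketch leaves out.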
Failure modes for model patches that don't apply:
- Wrong context lines: the model assumes variables/lines exist that don't in the target version (e.g., modifying `int version;` when no such variable exists)
- Wrong line numbers: hunk offsets differ by 6-57 lines from the correct position
- Wrong diff format: outputs `diff --git a/file b/file` instead of `Index: pkg-version/file` with `.orig` paths (SUSE quilt format)
- Wrong version in paths: e.g., `libssh-0.7.3` instead of `libssh-0.7.5`
The model consistently gets the code changes right for high-similarity patches (>90%) but sometimes gets the location wrong (line numbers, file paths, diff format).
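The format mismatch is mechanical and can in principle be repaired after generation. Below is a minimal sketch of such a normalizer. The exact quilt header layout (the separator line, and `.orig` placed on the package directory) is an assumption based on typical quilt output, and `pkg_dir` is a hypothetical parameter supplied by the caller.

```python
import re

SEPARATOR = "=" * 67  # ASSUMPTION: separator line as emitted by typical quilt output


def to_quilt_headers(diff_text, pkg_dir):
    """Rewrite git-style file headers into Index:-style headers.

    Limitation: a removed source line that itself starts with "-- a/" would be
    mistaken for a file header; a real tool should track hunk boundaries.
    """
    out = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git ") or line.startswith("index "):
            continue  # git-only metadata, dropped
        old = re.match(r"^--- a/(.+)$", line)
        if old:
            path = old.group(1)
            out += [
                f"Index: {pkg_dir}/{path}",
                SEPARATOR,
                f"--- {pkg_dir}.orig/{path}",
            ]
            continue
        new = re.match(r"^\+\+\+ b/(.+)$", line)
        if new:
            out.append(f"+++ {pkg_dir}/{new.group(1)}")
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Passing the correct `pkg_dir` (for example `libssh-0.7.5`) from the build environment would also sidestep the wrong-version-in-path failure, since the model's guess is discarded.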
Fresh OpenSSL CVEs (January 2026, completely unseen)
Tested on 6 CVEs from the January 27, 2026 OpenSSL advisory (none in training data, upstream 3.4+/3.6+ backported to SUSE's OpenSSL 3.1.4):
| CVE | Severity | Description | Result |
|---|---|---|---|
| CVE-2025-69421 | Low | PKCS12 NULL pointer dereference | Valid diff, exact |
| CVE-2025-69420 | Low | ASN1_TYPE type confusion | Valid diff, exact |
| CVE-2025-69418 | Low | OCB trailing bytes | Valid diff, exact |
| CVE-2025-15467 | High | AEAD CMS IV length validation | Valid diff, exact |
| CVE-2025-11187 | Moderate | PBMAC1 salt/keylength validation | Valid diff, exact |
| CVE-2025-68160 | Low | BIO_f_linebuffer heap overflow | Valid diff, exact |
All 6 produced valid unified diff output. These patches apply directly to SUSE's OpenSSL 3.1.4 (the affected code is identical between versions).
Usage
With ollama (GGUF)
# Download and create model
ollama create cve-backport -f Modelfile
# Modelfile contents:
# FROM cve-backport-qwen32b-q4_k_m.gguf
# PARAMETER temperature 0.1
# PARAMETER num_ctx 4096
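Once the model has been created, it can be queried through ollama's local HTTP API. The sketch below builds a chat request using the system prompt from the Prompt Format section and the sampling options from the Modelfile. The function names and the default `localhost:11434` endpoint are assumptions about the local setup.

```python
import json
import urllib.request

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable

SYSTEM_PROMPT = (
    "You are a security patch backporting assistant for SUSE/openSUSE.\n"
    "Given a CVE description and an upstream fix, adapt the patch to work with\n"
    "the target SUSE package version. Output the backported patch in unified\n"
    "diff format. If the upstream fix applies directly, output it unchanged."
)


def build_chat_request(cve_id, description, package, upstream_patch,
                       model="cve-backport"):
    """Assemble the JSON payload for ollama's /api/chat endpoint."""
    user = (
        f"## CVE: {cve_id}\n{description}\n"
        f"## Package\n{package}\n"
        f"## Upstream Fix\n{FENCE}diff\n{upstream_patch}\n{FENCE}\n"
        "Generate a backported patch for the SUSE target version."
    )
    return {
        "model": model,
        "stream": False,
        "options": {"temperature": 0.1, "num_ctx": 4096},
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
        ],
    }


def send(payload, url="http://localhost:11434/api/chat"):
    """POST the payload to a running ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

Using the chat endpoint lets ollama apply the model's ChatML template, so the `<|im_start|>` markers do not need to be written by hand.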
Prompt Format (ChatML)
<|im_start|>system
You are a security patch backporting assistant for SUSE/openSUSE.
Given a CVE description and an upstream fix, adapt the patch to work with
the target SUSE package version. Output the backported patch in unified
diff format. If the upstream fix applies directly, output it unchanged.
<|im_end|>
<|im_start|>user
## CVE: CVE-2025-XXXXX
Description of the vulnerability...
## Package
package-name
## Upstream Fix
```diff
--- a/file.c
+++ b/file.c
@@ -10,6 +10,8 @@
...
```
Generate a backported patch for the SUSE target version.
<|im_end|>
<|im_start|>assistant
With transformers (adapter)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model first, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
model = PeftModel.from_pretrained(base, "anicka/cve-backport-qwen32b-phase5")
# The adapter reuses the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```
Limitations
- Identical-tier patches are near-perfect (100% exact on 49 tested); this is the model's strength
- Adapted patches get the code changes right but often fail to apply: wrong line numbers, wrong diff format, wrong context lines (2/8 fully apply, 2/8 partially apply)
- Complex adaptations (major API changes between versions) may produce plausible but structurally wrong patches
- The model sometimes outputs `diff --git` format instead of SUSE's `Index:` quilt format
- Always review and test generated patches before applying
- Performance degrades for very long patches (>2000 lines)
Known Issues and Future Directions
What works well
- Reproducing identical-tier patches (upstream fix applies directly): 100% exact match
- Understanding CVE context and generating valid unified diff format: 100%
- Getting the right code changes for adapted patches with >90% similarity
What needs improvement
Diff format consistency: the model mixes `diff --git` and `Index:` (quilt) formats. A post-processing step could normalize output to the expected SUSE format, or training data could be standardized to one format.
Line number accuracy: model hunks are often offset by 5-60 lines from the correct position. Possible approaches:
- Include more surrounding source context in prompts (currently only ~15% of examples have it)
- Train with augmented examples where line numbers are varied
- Post-process with `patch --fuzz` or a smarter re-anchoring tool that finds the right location using the context lines
Context line correctness: the model sometimes hallucinates context lines that don't exist in the target version (e.g., referencing variables from a different version). This could be addressed by:
- Always providing target source context in prompts (not just 15% of examples)
- Training a second stage that validates/fixes context against the actual file
Version awareness: the model occasionally uses the wrong package version in paths (e.g., `libssh-0.7.3` instead of `0.7.5`). Explicit version information in prompts and stricter prompt templates could help.
Complex multi-function adaptations: patches requiring significant structural changes (like the libssh cases at 26-47% similarity) likely need a fundamentally different approach, possibly agentic (iterative generation with compilation feedback).
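The re-anchoring idea for line-number errors can be illustrated with a small search: take the hunk's old-side lines (context plus removed lines) and look for where they actually occur in the target file. This is a sketch under the assumption that the context lines are correct and only the offset is wrong; the function names are hypothetical.

```python
def old_side_of(hunk_body):
    """Context and removed lines of a hunk, with the leading diff marker stripped."""
    return [line[1:] for line in hunk_body if line[:1] in (" ", "-")]


def reanchor(target_lines, old_side, claimed_start):
    """Return the 1-based line where old_side really occurs in the target file.

    If the block occurs several times, the occurrence closest to the hunk's
    claimed start wins. Returns None if the block is not present at all.
    """
    n = len(old_side)
    if n == 0:
        return None
    candidates = [
        i + 1
        for i in range(len(target_lines) - n + 1)
        if target_lines[i:i + n] == old_side
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda start: abs(start - claimed_start))
```

A `None` result is itself informative: it indicates hallucinated context rather than a wrong offset, which re-anchoring alone cannot fix.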
Potential architecture improvements
- Retrieval-augmented generation (RAG): feed the actual target source file to the model, not just the upstream patch. The 15% of training examples with source context already show this helps.
- Two-stage pipeline: Stage 1 generates the code changes. Stage 2, given the target file, locates the correct insertion point and formats the diff.
- Compilation feedback loop: generate patch -> apply -> build -> if it fails, feed the error back to the model for correction. This agentic approach could dramatically improve the apply rate.
- Larger context window: the current 4096 token limit constrains how much source context can be provided. Training with 8K-16K context could help adapted patches significantly.
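The compilation feedback loop can be expressed as a small driver that is independent of how patches are generated, applied, or built. This is a hypothetical sketch of the control flow only: `generate`, `apply_patch`, and `build` are caller-supplied callables, not part of any existing tool.

```python
def backport_with_feedback(generate, apply_patch, build, max_rounds=3):
    """Retry patch generation, feeding failure logs back to the generator.

    generate(feedback) -> patch text (feedback is None on the first round)
    apply_patch(patch) -> (ok, log)
    build()            -> (ok, log)
    Returns (patch, rounds_used), or (None, max_rounds) if every round failed.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        patch = generate(feedback)
        ok, log = apply_patch(patch)
        if ok:
            # Only attempt the build once the patch has applied cleanly.
            ok, log = build()
        if ok:
            return patch, round_no
        feedback = log  # the error output becomes context for the next attempt
    return None, max_rounds
```

Capping the number of rounds keeps the loop from running indefinitely on patches the model cannot fix, such as the complex multi-function cases.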
Related
- Dataset: anicka/cve-backport-v5-dataset
- GGUF: anicka/cve-backport-qwen32b-phase5-gguf
- Tool: cve-backport-tool (OBS integration)