CVE Backport Code Generation – Qwen2.5-Coder-32B (v4)
Fine-tuned Qwen2.5-Coder-32B-Instruct for security patch backporting via per-hunk code generation, with CVE test case generation.
Instead of generating unified diffs, this model takes a vulnerable code region and a fix description, and outputs the fixed version of the code. A programmatic diff then produces the final patch. Optionally, the model can also generate a test case that verifies the fix.
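The programmatic diff step can be sketched with Python's `difflib`, assuming the model returns the full fixed region as plain text (the helper name and example snippet here are illustrative, not the tool's actual code):

```python
import difflib

def region_to_patch(original: str, fixed: str, path: str) -> str:
    """Turn a model-generated fixed region into a unified diff.

    `original` is the vulnerable region sent to the model, `fixed` is
    the model's output, `path` locates the region's file.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)

# Example: a one-line fix inside a two-line region.
patch = region_to_patch(
    "len = strlen(s);\ncopy(buf, s);\n",
    "len = strlen(s);\ncopy(buf, s, len);\n",
    "lib/ftp.c",
)
```

Because the model only ever emits code, the diff is always syntactically valid patch output regardless of how the model formats its answer.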
## Quick Start

```sh
git clone https://github.com/openSUSE/cve-backport-tool
cd cve-backport-tool
./setup.sh   # downloads GGUF, registers with ollama

python3 cve-backport.py \
  --cve CVE-2024-1234 \
  --package curl \
  --patch upstream-fix.patch \
  --obs-fetch --obs-project openSUSE:Leap:15.6:Update \
  --retry 3
```
## GGUF Downloads

| File | Quant | Size | Notes |
|---|---|---|---|
| cve-backport-codegen-v4-q8_0.gguf | Q8_0 | 33 GB | Recommended (v4, 36K dataset + test generation) |
| cve-backport-codegen-v3-q8_0.gguf | Q8_0 | 33 GB | v3 (35K dataset, 98% precision) |
## Evaluation (v4)
Per-hunk evaluation on 100 held-out examples the model never saw during training:
| Metric | v3 (n=20) | v4 (n=100) |
|---|---|---|
| Average recall | 94% | 93% |
| Average precision | 98% | 95% |
| Exact match | 16/20 | 87/100 |
| Failures (<10%) | 0/20 | 4/100 |
By tier:
- Identical (upstream patch applies directly): 94% recall
- Adapted (line numbers/context differ): 86% recall
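The card does not spell out how per-hunk recall and precision are scored; one plausible reading, treated here purely as an assumption, is set overlap between the reference patch's changed lines and the generated patch's changed lines:

```python
def line_metrics(reference_changed: set[str],
                 generated_changed: set[str]) -> tuple[float, float]:
    """Precision/recall over changed lines.

    ASSUMED metric for illustration only; the repo's actual scoring
    may differ (e.g. it may weight hunks or normalize whitespace).
    """
    tp = len(reference_changed & generated_changed)  # correctly changed lines
    precision = tp / len(generated_changed) if generated_changed else 0.0
    recall = tp / len(reference_changed) if reference_changed else 0.0
    return precision, recall
```

Under this reading, "exact match" corresponds to precision = recall = 1.0 for a hunk, and a "failure" is a hunk whose recall falls below the 10% threshold in the table above.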
## Test Generation (new in v4)
50 held-out CVEs with known reference tests:
- Average quality score: 0.67
- All 50 produced structurally valid tests
- 17/50 matched reference test exactly
## Comparison with Frontier Models
Same eval, same 100 examples, optimized prompts with markdown stripping:
| Model | Recall | Precision | Exact | Failures |
|---|---|---|---|---|
| CVE Backport v4 (32B fine-tuned) | 93% | 95% | 87/100 | 4 |
| Gemini 3.1 Pro (frontier, zero-shot) | 27% | 24% | 10/100 | 50 |
| Gemini 2.0 Flash (frontier, zero-shot) | 13% | 17% | 4/100 | 81 |
Fine-tuning on 36K domain-specific examples outperforms frontier models by 3-7x on this task.
## Prompt Format
ChatML format. Each prompt covers one hunk region with 15 lines of context padding.
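Extracting one hunk region with its context padding can be sketched as follows (a hypothetical helper, not the tool's actual code; it assumes 1-based hunk line numbers):

```python
def hunk_region(file_lines: list[str], hunk_start: int, hunk_end: int,
                pad: int = 15) -> tuple[int, int, str]:
    """Return the 1-based start/end line numbers and the text of the
    hunk region plus `pad` lines of context on each side, clamped to
    the file boundaries."""
    start = max(1, hunk_start - pad)
    end = min(len(file_lines), hunk_end + pad)
    return start, end, "".join(file_lines[start - 1:end])
```

The returned `start`/`end` pair is what fills the `## Lines:` header in the prompt below.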
### Code Generation (3-turn)
System:
You are a security patch backporting assistant.
Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.
Rules:
- Output ONLY the fixed code, nothing else – no explanations, no markdown fences
- Preserve exact formatting, indentation, and style of the original
- Make ONLY the changes described in the fix – do not modify anything else
- Do not add comments about what you changed
User:
## File: lib/ftp.c
## Lines: 2836-2912
```c
{vulnerable code region with 15-line padding}
```
## Fix
CVE-2017-8817: FTP wildcard matching – zero terminate the entry path
```diff
{upstream patch}
```
Assistant: The fixed code (same region with the security fix applied).
### Test Generation (5-turn, new in v4)
After the code generation exchange, an optional follow-up user turn requests a test:
User:
Write a test case that:
1. Triggers the vulnerability in the original code above
2. Passes after applying your fix
Output ONLY the test code, nothing else.
Assistant: Test code targeting the specific CVE.
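Assuming the standard Qwen ChatML layout (`<|im_start|>role ... <|im_end|>`), a single-hunk prompt can be assembled roughly as below; the helper and the truncated fix text are illustrative, and in practice the tokenizer's own chat template would normally do this rendering:

```python
def chatml(messages: list[dict[str, str]]) -> str:
    """Render messages in ChatML and end with the assistant header,
    so the model completes with the fixed code."""
    body = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    )
    return body + "<|im_start|>assistant\n"

SYSTEM = (
    "You are a security patch backporting assistant.\n"
    "Given vulnerable source code and a description of the upstream "
    "fix, output the FIXED version of the code."
)

# {code} and {patch} stand in for the padded region and upstream diff.
user = (
    "## File: lib/ftp.c\n## Lines: 2836-2912\n"
    "```c\n{code}\n```\n"
    "## Fix\nCVE-2017-8817: FTP wildcard matching\n"
    "```diff\n{patch}\n```"
)

prompt = chatml([{"role": "system", "content": SYSTEM},
                 {"role": "user", "content": user}])
```

For the 5-turn variant, the assistant's fixed code and the test-request user turn are appended as two more messages before the final assistant header.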
## Training

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-Coder-32B-Instruct |
| Method | QLoRA (4-bit NF4, r=64, alpha=128) |
| Epochs | 2 |
| Learning rate | 1e-4 |
| Max sequence length | 4,096 tokens |
| Batch size | 1 (gradient accumulation 8) |
| Training examples | 36,166 (35,396 codegen + 770 codegen+test) |
| Training time | 41.2 hours |
| Hardware | 2x NVIDIA H100 NVL 94GB |
| Label masking | Multi-turn aware (both assistant segments trained) |
## Training Data

openSUSE/cve-backport-codegen-dataset – 36,166 per-hunk examples from openSUSE maintenance patches, covering 145+ packages and 2,300+ CVEs, with per-example SPDX license metadata.
## Reproducibility

Trained using the Teapot composable training pipeline:

```sh
teapot compose configs/cve-backport.config
teapot train configs/cve-backport.config --backend qlora-hf
teapot eval configs/cve-backport.config
```
Dataset: openSUSE/cve-backport-codegen-dataset (train.jsonl + eval.jsonl).
## Intended Use

This model assists with security patch backporting in Linux distribution maintenance. It is a research tool: all generated patches must be reviewed by a maintainer before application.
Important: This model was fine-tuned for code generation accuracy, not for safety alignment. It inherits the base model's safety training but has no additional guardrails. In particular:
- The model follows fix descriptions literally. If the fix description contains malicious instructions (e.g., "add a backdoor"), the model will comply. Fix descriptions must come from trusted sources – typically upstream patches, not user input.
- The tool is designed for use with trusted inputs (upstream CVE patches, OBS source packages). It should not be exposed as a public API without input validation.
- Generated patches and test cases must always be reviewed by a maintainer before application.
Adding safety training to the fine-tuning was considered but deliberately deferred: our evaluation showed that domain precision (98% in v3) is sensitive to training data composition, and mixing safety examples risks degrading the model's core capability. The correct mitigation is input validation in the tool, not model-level refusal.
## Known Issues
- **Prompt echo (v4):** The v4 model occasionally echoes prompt structure (`## File:`, markdown fences) into its code output, likely from the 5-turn test generation training data. The CLI tool strips these automatically. This is a minor regression from v3.
- **Test generation quality varies:** Test cases for simple vulnerability patterns (null deref, bounds check, injection) are useful. For complex multi-file patches with adapted context, the model may produce generic placeholder tests.
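The stripping the CLI performs for the prompt-echo issue can be sketched as follows (a hypothetical helper; the tool's actual implementation may differ):

```python
import re

def strip_prompt_echo(output: str) -> str:
    """Remove echoed prompt structure from generated code:
    markdown fences and '## File:' / '## Lines:' / '## Fix' headers."""
    kept = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            continue  # drop opening/closing markdown fences
        if re.match(r"^## (File|Lines|Fix)\b", stripped):
            continue  # drop echoed prompt section headers
        kept.append(line)
    return "\n".join(kept).strip("\n") + "\n"
```

Real code lines that merely contain `#` comments pass through untouched, since only exact fence and header patterns are dropped.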
## License
Apache-2.0 (inherited from Qwen2.5-Coder-32B-Instruct).