Instructions for using exploitintel/cve-cwe-qwen3-32b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use exploitintel/cve-cwe-qwen3-32b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="exploitintel/cve-cwe-qwen3-32b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("exploitintel/cve-cwe-qwen3-32b")
model = AutoModelForCausalLM.from_pretrained("exploitintel/cve-cwe-qwen3-32b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
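If you call the model from your own code, a thin wrapper that turns a CVE description into a set of CWE IDs keeps the parsing in one place. This is a sketch under stated assumptions: `classify_cve` and `generate_fn` are hypothetical names, the model is assumed to answer with `CWE-<number>` identifiers (the pattern the evaluation script on this page parses), and a Qwen3-style `<think>...</think>` block is assumed possible before the answer, so it is stripped to avoid counting CWEs mentioned while reasoning. The description is sent as a single user turn; the system prompt used during fine-tuning is not shown here, so you may need to match it.

```python
import re
from typing import Callable

CWE_RE = re.compile(r"CWE-\d+")
# Assumption: a Qwen3-style reasoning block may appear before the final answer.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def classify_cve(description: str, generate_fn: Callable[[list[dict]], str]) -> list[str]:
    """Return the sorted, de-duplicated CWE IDs predicted for a CVE description.

    generate_fn is any callable that takes chat messages and returns the assistant
    text, e.g. a wrapper around a transformers pipeline or an HTTP client.
    """
    messages = [{"role": "user", "content": description}]
    answer = THINK_RE.sub("", generate_fn(messages))  # ignore IDs mentioned while reasoning
    # Sort numerically so CWE-79 comes before CWE-416 (a plain string sort would not).
    return sorted(set(CWE_RE.findall(answer)), key=lambda c: int(c.split("-")[1]))
```

With the pipeline above, `generate_fn` could be `lambda msgs: pipe(msgs, max_new_tokens=64)[0]["generated_text"][-1]["content"]`, assuming the chat-style pipeline output where `generated_text` holds the whole conversation with the reply last. Passing the generator in as a callable also lets you test the parsing without loading a 32B model.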

llama-cpp-python

How to use exploitintel/cve-cwe-qwen3-32b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="exploitintel/cve-cwe-qwen3-32b",
	filename="q32-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
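`create_chat_completion` returns an OpenAI-style dictionary, not a plain string. A minimal sketch of unpacking it, with `completion_text` as a hypothetical helper name; it also flags replies that stopped on the token limit (`finish_reason == "length"`), since a cut-off answer may be missing CWE IDs:

```python
from typing import Any


def completion_text(response: dict[str, Any]) -> tuple[str, bool]:
    """Return (assistant text, truncated) from an OpenAI-style chat completion dict.

    `truncated` is True when generation stopped on the token limit, so the
    answer may have been cut off mid-list.
    """
    choice = response["choices"][0]
    text = choice["message"]["content"] or ""  # guard against a None content field
    return text, choice.get("finish_reason") == "length"
```

You can pass the return value of `llm.create_chat_completion(...)` straight in; if `truncated` comes back True, retry with a larger `max_tokens`.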


llama.cpp

How to use exploitintel/cve-cwe-qwen3-32b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Use Docker

docker model run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M


vLLM

How to use exploitintel/cve-cwe-qwen3-32b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "exploitintel/cve-cwe-qwen3-32b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
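The curl call above can also be made from Python with only the standard library. This is a sketch: `build_chat_request` and `chat` are hypothetical helper names, it assumes the server listens on the default `http://localhost:8000` shown above, and it sets `temperature` to 0 on the assumption that repeatable labels matter more than variety. The same helper should work against the SGLang server (port 30000) or `llama-server` (default port 8080), since both expose the same OpenAI-compatible route.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 64
) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,  # assumption: deterministic-ish output for labeling
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant message text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "exploitintel/cve-cwe-qwen3-32b", "What is the capital of France?")` mirrors the curl request while the vLLM server is running.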


SGLang

How to use exploitintel/cve-cwe-qwen3-32b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "exploitintel/cve-cwe-qwen3-32b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "exploitintel/cve-cwe-qwen3-32b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use exploitintel/cve-cwe-qwen3-32b with Ollama:
```
ollama run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M
```
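Once `ollama run` has pulled the model, the Ollama daemon also serves it over HTTP. A stdlib-only sketch, assuming Ollama's default `http://localhost:11434` address and its `/api/chat` endpoint with streaming turned off; `ollama_payload` and `ollama_chat` are hypothetical helper names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M"  # name used by `ollama run` above


def ollama_payload(prompt: str, num_predict: int = 64) -> dict:
    """Build a non-streaming /api/chat request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": 0, "num_predict": num_predict},  # num_predict caps output tokens
    }


def ollama_chat(prompt: str, timeout: float = 300.0) -> str:
    """POST the payload (a Request with data defaults to POST) and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(ollama_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["message"]["content"]
```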

Unsloth Studio

How to use exploitintel/cve-cwe-qwen3-32b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Pi

How to use exploitintel/cve-cwe-qwen3-32b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "exploitintel/cve-cwe-qwen3-32b:Q4_K_M"
        }
      ]
    }
  }
}
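If `~/.pi/agent/models.json` already lists other providers, pasting the snippet over it would drop them. A hedged sketch that merges the entry instead, assuming the file is plain JSON with the `providers` layout shown above; `add_llama_cpp_provider` is a hypothetical helper, not part of Pi:

```python
import json
from pathlib import Path

# Provider entry copied from the snippet above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "exploitintel/cve-cwe-qwen3-32b:Q4_K_M"}],
}


def add_llama_cpp_provider(path: Path) -> dict:
    """Merge the llama-cpp provider into models.json without dropping other providers."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    providers = config.setdefault("providers", {})
    # An existing llama-cpp entry keeps its own baseUrl/api settings; only the model is added.
    entry = providers.setdefault("llama-cpp", {**PROVIDER, "models": []})
    ids = {m.get("id") for m in entry.setdefault("models", [])}
    for model in PROVIDER["models"]:
        if model["id"] not in ids:  # skip duplicates so repeated runs are idempotent
            entry["models"].append(dict(model))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example: `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")`.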

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use exploitintel/cve-cwe-qwen3-32b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use exploitintel/cve-cwe-qwen3-32b with Docker Model Runner:
```
docker model run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M
```

Lemonade

How to use exploitintel/cve-cwe-qwen3-32b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Run and chat with the model

lemonade run user.cve-cwe-qwen3-32b-Q4_K_M

List all available models

lemonade list

cve-cwe-qwen3-32b


#!/usr/bin/env python3
"""Evaluate a fine-tuned CVE -> CWE model on the held-out test split.

Reports exact-match accuracy plus micro/macro multi-label F1, stratified into
"easy" (the weakness is named in the description) vs "hard" (it must be inferred),
so you see real-world performance instead of a single flattering average.

Loads with plain transformers. Newer architectures (e.g. model_type ``gemma4``,
used by gemma-4-E4B) need **transformers >= 5.5** -- older versions raise
``KeyError: 'gemma4'``. Note: do NOT load gemma4 through unsloth in a Studio env
whose transformers was upgraded -- the upgrade pulls ``huggingface_hub`` 1.x,
which breaks ``unsloth_zoo``'s config lookup. Plain transformers is the clean path.

    python evaluate.py --model "C:\\path\\to\\exported\\merged_model" --limit 500
    python evaluate.py --model "C:\\path\\to\\exported\\merged_model"

Needs: transformers>=5.5, torch, datasets, accelerate.
"""

from __future__ import annotations

import argparse
import re

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

CWE_RE = re.compile(r"CWE-\d+")

# A row is "easy" if the description literally names the weakness (the model can
# keyword-match); "hard" rows require inferring the CWE from the prose.
EASY_KW = [
    "sql injection",
    "cross-site scripting",
    "cross site scripting",
    "xss",
    "buffer overflow",
    "use after free",
    "use-after-free",
    "path traversal",
    "command injection",
    "out-of-bounds",
    "out of bounds",
    "race condition",
    "deserialization",
    "ssrf",
    "server-side request forgery",
    "csrf",
    "cross-site request forgery",
    "open redirect",
    "integer overflow",
]


def parse_cwes(text: str) -> set[str]:
    return set(CWE_RE.findall(text))


def is_easy(description: str) -> bool:
    text = description.lower()  # lowercase once, not once per keyword
    return any(k in text for k in EASY_KW)


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def build_prompt(tok, messages: list[dict]) -> str:
    """Prompt = everything up to (but not including) the assistant answer."""
    convo = messages[:-1]
    try:
        return tok.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
    except Exception:
        # Some chat templates (e.g. Gemma) reject a separate "system" role;
        # fold the system text into the user turn instead.
        sys_txt = next((m["content"] for m in convo if m["role"] == "system"), "")
        usr_txt = next((m["content"] for m in convo if m["role"] == "user"), "")
        folded = [{"role": "user", "content": f"{sys_txt}\n\n{usr_txt}".strip()}]
        return tok.apply_chat_template(folded, tokenize=False, add_generation_prompt=True)


def score(truths: list[set[str]], preds: list[set[str]], easies: list[bool]) -> None:
    micro = [0, 0, 0]  # tp, fp, fn
    per_label: dict[str, list[int]] = {}
    exact = 0
    strata = {"easy": [0, 0, 0, 0, 0], "hard": [0, 0, 0, 0, 0]}  # tp,fp,fn,exact,n

    for true, pred, easy in zip(truths, preds, easies):
        tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
        micro[0] += tp
        micro[1] += fp
        micro[2] += fn
        ex = int(pred == true)
        exact += ex
        for lab in true | pred:
            d = per_label.setdefault(lab, [0, 0, 0])
            if lab in true and lab in pred:
                d[0] += 1
            elif lab in pred:
                d[1] += 1
            else:
                d[2] += 1
        s = strata["easy" if easy else "hard"]
        s[0] += tp
        s[1] += fp
        s[2] += fn
        s[3] += ex
        s[4] += 1

    n = len(truths)
    if n == 0:
        print("no examples to score")
        return
    micro_f1 = prf(*micro)[2]
    macro_f1 = sum(prf(*v)[2] for v in per_label.values()) / len(per_label) if per_label else 0.0

    print("\n=== CVE -> CWE evaluation ===")
    print(f"examples             : {n}")
    print(f"exact-match accuracy : {exact / n:.3f}   (predicted CWE set == true set)")
    print(f"micro-F1             : {micro_f1:.3f}")
    print(f"macro-F1             : {macro_f1:.3f}   (unweighted mean over {len(per_label)} CWEs)")
    print("\n-- by difficulty --")
    for name, label in (("easy", "easy (weakness named)"), ("hard", "hard (must infer) ")):
        tp, fp, fn, ex, m = strata[name]
        if m:
            print(f"  {label:22s} n={m:5d}  exact={ex / m:.3f}  micro-F1={prf(tp, fp, fn)[2]:.3f}")


def main() -> None:
    ap = argparse.ArgumentParser(description="Evaluate a CVE->CWE model on the test split.")
    ap.add_argument("--model", required=True, help="path or HF id of the fine-tuned (merged) model")
    ap.add_argument("--dataset", default="exploitintel/cve-cwe-consensus")
    ap.add_argument("--split", default="test")
    ap.add_argument(
        "--limit", type=int, default=None, help="evaluate only the first N rows (quick check)"
    )
    ap.add_argument("--batch-size", type=int, default=16)
    ap.add_argument("--max-new-tokens", type=int, default=32)
    args = ap.parse_args()

    print(f"loading model: {args.model}")
    try:
        tok = AutoTokenizer.from_pretrained(args.model)
    except (AttributeError, TypeError):
        # Some Gemma tokenizer configs store `extra_special_tokens` as a list, which
        # trips a transformers bug ('list' object has no attribute 'keys').
        tok = AutoTokenizer.from_pretrained(args.model, extra_special_tokens={})
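    tok.padding_side = "left"  # decoder-only batched generation needs left padding
    # If a prompt exceeds max_length, trim from the start so the assistant header at the
    # end of the chat template (where generation begins) is never cut off.
    tok.truncation_side = "left"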
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
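    # device_map="auto" (needs accelerate) places the weights on the available GPU(s),
    # sharding across cards or offloading to CPU when one device is too small for the model.
    try:
        model = AutoModelForCausalLM.from_pretrained(args.model, dtype="auto", device_map="auto")
    except TypeError:
        # Newer transformers releases accept `dtype`; older ones only know `torch_dtype`.
        model = AutoModelForCausalLM.from_pretrained(
            args.model, torch_dtype="auto", device_map="auto"
        )
    model.eval()
    device = model.device  # batches go to the device holding the first layers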

    ds = load_dataset(args.dataset, split=args.split)
    if args.limit:
        ds = ds.select(range(min(args.limit, len(ds))))

    prompts, truths, easies = [], [], []
    for ex in ds:
        msgs = ex["messages"]
        prompts.append(build_prompt(tok, msgs))
        truths.append(parse_cwes(msgs[-1]["content"]))
        usr = next((m["content"] for m in msgs if m["role"] == "user"), "")
        easies.append(is_easy(usr))

    preds: list[set[str]] = []
    for i in range(0, len(prompts), args.batch_size):
        batch = prompts[i : i + args.batch_size]
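        # The chat template already inserted any BOS/special tokens; don't add them twice.
        enc = tok(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=1024,
            add_special_tokens=False,
        ).to(device)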
        with torch.no_grad():
            out = model.generate(
                **enc,
                max_new_tokens=args.max_new_tokens,
                do_sample=False,  # greedy = deterministic
                pad_token_id=tok.pad_token_id,
            )
        new_tokens = out[:, enc["input_ids"].shape[1] :]  # drop the prompt, keep the answer
        for row in new_tokens:
            preds.append(parse_cwes(tok.decode(row, skip_special_tokens=True)))
        print(f"  {min(i + args.batch_size, len(prompts))}/{len(prompts)}", end="\r")
    print()

    score(truths, preds, easies)


if __name__ == "__main__":
    main()
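To see why the script reports micro- and macro-F1 side by side, here is a toy run of the same counting scheme: micro pools true/false positives and false negatives across all rows, while macro averages the per-CWE F1, so a label the model gets wrong drags macro down far more than micro. The three rows are made-up illustrations, not results from this model, and `toy_scores` is a hypothetical helper that re-implements the scoring logic above without the printing or the easy/hard split.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def toy_scores(truths: list[set[str]], preds: list[set[str]]) -> dict[str, float]:
    micro = [0, 0, 0]  # tp, fp, fn pooled over every row
    per_label: dict[str, list[int]] = {}
    exact = 0
    for true, pred in zip(truths, preds):
        micro[0] += len(pred & true)
        micro[1] += len(pred - true)
        micro[2] += len(true - pred)
        exact += pred == true
        for lab in true | pred:
            d = per_label.setdefault(lab, [0, 0, 0])
            if lab in true and lab in pred:
                d[0] += 1  # true positive for this label
            elif lab in pred:
                d[1] += 1  # false positive
            else:
                d[2] += 1  # false negative
    return {
        "exact": exact / len(truths),
        "micro_f1": prf(*micro)[2],
        "macro_f1": sum(prf(*v)[2] for v in per_label.values()) / len(per_label),
    }


# Made-up rows: two hits on CWE-79, one miss where the true label is CWE-416.
truths = [{"CWE-79"}, {"CWE-79"}, {"CWE-416"}]
preds = [{"CWE-79"}, {"CWE-79"}, {"CWE-787"}]
print(toy_scores(truths, preds))
```

Here exact-match and micro-F1 both come out near 0.667, but macro-F1 is near 0.333: CWE-79 scores a perfect F1, while CWE-416 (missed) and CWE-787 (predicted but never true) each score 0.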