Instructions to use exploitintel/cve-cwe-qwen3-32b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use exploitintel/cve-cwe-qwen3-32b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="exploitintel/cve-cwe-qwen3-32b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("exploitintel/cve-cwe-qwen3-32b")
model = AutoModelForCausalLM.from_pretrained("exploitintel/cve-cwe-qwen3-32b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use exploitintel/cve-cwe-qwen3-32b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="exploitintel/cve-cwe-qwen3-32b",
	filename="q32-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
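
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. As a minimal sketch, the helpers below pull the reply text out of that dictionary and extract any `CWE-<number>` IDs from it, using the same regular expression as the evaluate.py script in this repository. The helper names `reply_text` and `predicted_cwes` are hypothetical, and `sample_response` is a hand-written stand-in that mirrors the response shape, not real model output.

```python
import re

# Same pattern that evaluate.py in this repository uses to find CWE IDs.
CWE_RE = re.compile(r"CWE-\d+")


def reply_text(response):
    """Return the assistant message from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def predicted_cwes(response):
    """Return the distinct CWE IDs in the reply, in order of first appearance."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(CWE_RE.findall(reply_text(response))))


# Hand-written stand-in for the value returned by llm.create_chat_completion(...).
sample_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "CWE-89, CWE-79 (and again CWE-89)"}}
    ]
}

print(predicted_cwes(sample_response))
```

In real use, pass the return value of `llm.create_chat_completion(...)` in place of `sample_response`.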

Local Apps

llama.cpp

How to use exploitintel/cve-cwe-qwen3-32b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Use Docker

docker model run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M

vLLM

How to use exploitintel/cve-cwe-qwen3-32b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "exploitintel/cve-cwe-qwen3-32b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
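
The same request can be sent from Python using only the standard library. This is a rough sketch, assuming the server started by `vllm serve` above is listening on `localhost:8000`; `build_chat_request` and `send` are hypothetical helper names, and the network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_content):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120.0):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "exploitintel/cve-cwe-qwen3-32b",
    "What is the capital of France?",
)
print(request.full_url)
# With the server running, uncomment the next line to send the request:
# print(send(request))
```

Because the endpoint path and payload follow the OpenAI chat-completions shape, the same helper should work against any other OpenAI-compatible server by changing `base_url`.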

Use Docker

docker model run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M

SGLang

How to use exploitintel/cve-cwe-qwen3-32b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "exploitintel/cve-cwe-qwen3-32b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "exploitintel/cve-cwe-qwen3-32b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "exploitintel/cve-cwe-qwen3-32b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use exploitintel/cve-cwe-qwen3-32b with Ollama:
```
ollama run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M
```

Unsloth Studio

How to use exploitintel/cve-cwe-qwen3-32b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for exploitintel/cve-cwe-qwen3-32b to start chatting

Pi

How to use exploitintel/cve-cwe-qwen3-32b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "exploitintel/cve-cwe-qwen3-32b:Q4_K_M"
        }
      ]
    }
  }
}
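
If `models.json` already lists other providers, pasting the block above over the file would drop them. One way to avoid that, sketched here under the assumption that the file has the `providers` layout shown above, is to merge the entry in with a small script. `merge_provider` and `update_models_file` are hypothetical helpers, not part of Pi, and the `"other"` provider in the demo is made up.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "exploitintel/cve-cwe-qwen3-32b:Q4_K_M"}],
}


def merge_provider(config, name, provider):
    """Return a copy of config with providers[name] set, keeping other providers."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path):
    """Merge the llama-cpp provider into the JSON file at path, creating it if needed."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_provider(existing, "llama-cpp", LLAMA_CPP_PROVIDER)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# In-memory demo; "other" is a made-up provider used only for illustration.
demo = merge_provider({"providers": {"other": {}}}, "llama-cpp", LLAMA_CPP_PROVIDER)
print(json.dumps(demo, indent=2))
# To apply it to the real file:
# update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```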

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use exploitintel/cve-cwe-qwen3-32b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use exploitintel/cve-cwe-qwen3-32b with Docker Model Runner:
```
docker model run hf.co/exploitintel/cve-cwe-qwen3-32b:Q4_K_M
```

Lemonade

How to use exploitintel/cve-cwe-qwen3-32b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull exploitintel/cve-cwe-qwen3-32b:Q4_K_M

Run and chat with the model

lemonade run user.cve-cwe-qwen3-32b-Q4_K_M

List all available models

lemonade list

cve-cwe-qwen3-32b / evaluate.py

exploitintel

Add evaluation script

16d4233 verified 5 days ago

7.41 kB

	#!/usr/bin/env python3
	"""Evaluate a fine-tuned CVE -> CWE model on the held-out test split.

	Reports exact-match accuracy plus micro/macro multi-label F1, stratified into
	"easy" (the weakness is named in the description) vs "hard" (it must be inferred),
	so you see real-world performance instead of one flattered average.

	Loads with plain transformers. Newer architectures (e.g. model_type ``gemma4``,
	used by gemma-4-E4B) need transformers >= 5.5 -- older versions raise
	``KeyError: 'gemma4'``. Note: do NOT load gemma4 through unsloth in a Studio env
	whose transformers was upgraded -- the upgrade pulls ``huggingface_hub`` 1.x,
	which breaks ``unsloth_zoo``'s config lookup. Plain transformers is the clean path.

	python evaluate.py --model "C:\\path\\to\\exported\\merged_model" --limit 500
	python evaluate.py --model "C:\\path\\to\\exported\\merged_model"

	Needs: transformers>=5.5, torch, datasets, accelerate.
	"""

	from __future__ import annotations

	import argparse
	import re

	import torch
	from datasets import load_dataset
	from transformers import AutoModelForCausalLM, AutoTokenizer

	CWE_RE = re.compile(r"CWE-\d+")

	# A row is "easy" if the description literally names the weakness (the model can
	# keyword-match); "hard" rows require inferring the CWE from the prose.
	EASY_KW = [
	    "sql injection",
	    "cross-site scripting",
	    "cross site scripting",
	    "xss",
	    "buffer overflow",
	    "use after free",
	    "use-after-free",
	    "path traversal",
	    "command injection",
	    "out-of-bounds",
	    "out of bounds",
	    "race condition",
	    "deserialization",
	    "ssrf",
	    "server-side request forgery",
	    "csrf",
	    "cross-site request forgery",
	    "open redirect",
	    "integer overflow",
	]


	def parse_cwes(text: str) -> set[str]:
	    return set(CWE_RE.findall(text))


	def is_easy(description: str) -> bool:
	    return any(k in description.lower() for k in EASY_KW)


	def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
	    p = tp / (tp + fp) if (tp + fp) else 0.0
	    r = tp / (tp + fn) if (tp + fn) else 0.0
	    f = 2 * p * r / (p + r) if (p + r) else 0.0
	    return p, r, f


	def build_prompt(tok, messages: list[dict]) -> str:
	    """Prompt = everything up to (but not including) the assistant answer."""
	    convo = messages[:-1]
	    try:
	        return tok.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
	    except Exception:
	        # Some chat templates (e.g. Gemma) reject a separate "system" role;
	        # fold the system text into the user turn instead.
	        sys_txt = next((m["content"] for m in convo if m["role"] == "system"), "")
	        usr_txt = next((m["content"] for m in convo if m["role"] == "user"), "")
	        folded = [{"role": "user", "content": f"{sys_txt}\n\n{usr_txt}".strip()}]
	        return tok.apply_chat_template(folded, tokenize=False, add_generation_prompt=True)


	def score(truths: list[set[str]], preds: list[set[str]], easies: list[bool]) -> None:
	    micro = [0, 0, 0]  # tp, fp, fn
	    per_label: dict[str, list[int]] = {}
	    exact = 0
	    strata = {"easy": [0, 0, 0, 0, 0], "hard": [0, 0, 0, 0, 0]}  # tp,fp,fn,exact,n

	    for true, pred, easy in zip(truths, preds, easies):
	        tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
	        micro[0] += tp
	        micro[1] += fp
	        micro[2] += fn
	        ex = int(pred == true)
	        exact += ex
	        for lab in true | pred:
	            d = per_label.setdefault(lab, [0, 0, 0])
	            if lab in true and lab in pred:
	                d[0] += 1
	            elif lab in pred:
	                d[1] += 1
	            else:
	                d[2] += 1
	        s = strata["easy" if easy else "hard"]
	        s[0] += tp
	        s[1] += fp
	        s[2] += fn
	        s[3] += ex
	        s[4] += 1

	    n = len(truths)
	    micro_f1 = prf(*micro)[2]
	    macro_f1 = sum(prf(*v)[2] for v in per_label.values()) / len(per_label) if per_label else 0.0

	    print("\n=== CVE -> CWE evaluation ===")
	    print(f"examples : {n}")
	    print(f"exact-match accuracy : {exact / n:.3f} (predicted CWE set == true set)")
	    print(f"micro-F1 : {micro_f1:.3f}")
	    print(f"macro-F1 : {macro_f1:.3f} (unweighted mean over {len(per_label)} CWEs)")
	    print("\n-- by difficulty --")
	    for name, label in (("easy", "easy (weakness named)"), ("hard", "hard (must infer) ")):
	        tp, fp, fn, ex, m = strata[name]
	        if m:
	            print(f" {label:22s} n={m:5d} exact={ex / m:.3f} micro-F1={prf(tp, fp, fn)[2]:.3f}")


	def main() -> None:
	    ap = argparse.ArgumentParser(description="Evaluate a CVE->CWE model on the test split.")
	    ap.add_argument("--model", required=True, help="path or HF id of the fine-tuned (merged) model")
	    ap.add_argument("--dataset", default="exploitintel/cve-cwe-consensus")
	    ap.add_argument("--split", default="test")
	    ap.add_argument(
	        "--limit", type=int, default=None, help="evaluate only the first N rows (quick check)"
	    )
	    ap.add_argument("--batch-size", type=int, default=16)
	    ap.add_argument("--max-new-tokens", type=int, default=32)
	    args = ap.parse_args()

	    print(f"loading model: {args.model}")
	    try:
	        tok = AutoTokenizer.from_pretrained(args.model)
	    except (AttributeError, TypeError):
	        # Some Gemma tokenizer configs store `extra_special_tokens` as a list, which
	        # trips a transformers bug ('list' object has no attribute 'keys').
	        tok = AutoTokenizer.from_pretrained(args.model, extra_special_tokens={})
	    tok.padding_side = "left"  # decoder-only batched generation needs left padding
	    if tok.pad_token is None:
	        tok.pad_token = tok.eos_token
	    device = "cuda" if torch.cuda.is_available() else "cpu"
	    try:
	        model = AutoModelForCausalLM.from_pretrained(args.model, dtype="auto").to(device)
	    except TypeError:
	        # `dtype` is the transformers 5.x name; older releases use `torch_dtype`.
	        model = AutoModelForCausalLM.from_pretrained(args.model, torch_dtype="auto").to(device)
	    model.eval()

	    ds = load_dataset(args.dataset, split=args.split)
	    if args.limit:
	        ds = ds.select(range(min(args.limit, len(ds))))

	    prompts, truths, easies = [], [], []
	    for ex in ds:
	        msgs = ex["messages"]
	        prompts.append(build_prompt(tok, msgs))
	        truths.append(parse_cwes(msgs[-1]["content"]))
	        usr = next((m["content"] for m in msgs if m["role"] == "user"), "")
	        easies.append(is_easy(usr))

	    preds: list[set[str]] = []
	    for i in range(0, len(prompts), args.batch_size):
	        batch = prompts[i : i + args.batch_size]
	        enc = tok(batch, return_tensors="pt", padding=True, truncation=True, max_length=1024).to(
	            device
	        )
	        with torch.no_grad():
	            out = model.generate(
	                **enc,
	                max_new_tokens=args.max_new_tokens,
	                do_sample=False,  # greedy = deterministic
	                pad_token_id=tok.pad_token_id,
	            )
	        new_tokens = out[:, enc["input_ids"].shape[1] :]  # drop the prompt, keep the answer
	        for row in new_tokens:
	            preds.append(parse_cwes(tok.decode(row, skip_special_tokens=True)))
	        print(f" {min(i + args.batch_size, len(prompts))}/{len(prompts)}", end="\r")
	    print()

	    score(truths, preds, easies)


	if __name__ == "__main__":
	    main()
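
To see how the three headline metrics in the script can diverge, the sketch below recomputes them on three hand-made (reference, output) pairs. It mirrors the counting logic of `parse_cwes`, `prf`, and `score` in compact form but is not the script itself. The pairs are invented for illustration and are not model results, and `f1` is a shortened stand-in for `prf` that returns only the F1 value.

```python
import re

CWE_RE = re.compile(r"CWE-\d+")


def parse_cwes(text):
    """Collect the distinct CWE IDs mentioned in a string."""
    return set(CWE_RE.findall(text))


def f1(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when a denominator would be zero."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented (reference answer, model output) pairs; not real model results.
pairs = [
    ("CWE-79", "CWE-79"),             # exact match
    ("CWE-89", "CWE-89 and CWE-20"),  # correct label plus one spurious label
    ("CWE-787", "CWE-125"),           # wrong label
]

micro = [0, 0, 0]  # tp, fp, fn pooled over every example
per_label = {}     # label -> [tp, fp, fn]
exact = 0
for truth_text, pred_text in pairs:
    true, pred = parse_cwes(truth_text), parse_cwes(pred_text)
    exact += int(pred == true)
    for lab in true | pred:
        # 0 = true positive, 1 = false positive, 2 = false negative
        slot = 0 if (lab in true and lab in pred) else (1 if lab in pred else 2)
        micro[slot] += 1
        per_label.setdefault(lab, [0, 0, 0])[slot] += 1

exact_match = exact / len(pairs)
micro_f1 = f1(*micro)
macro_f1 = sum(f1(*counts) for counts in per_label.values()) / len(per_label)

print(f"exact-match {exact_match:.3f}  micro-F1 {micro_f1:.3f}  macro-F1 {macro_f1:.3f}")
```

On these pairs the pooled counts are 2 true positives, 2 false positives, and 1 false negative, giving exact-match 0.333, micro-F1 0.571, and macro-F1 0.400. Macro-F1 is lowest because three of the five labels involved (CWE-20, CWE-125, CWE-787) each score zero and count equally in the unweighted mean.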