#!/usr/bin/env python3
"""Evaluate a fine-tuned CVE -> CWE model on the held-out test split.

Reports exact-match accuracy plus micro/macro multi-label F1, stratified into
"easy" (the weakness is named in the description) vs "hard" (it must be inferred),
so you see real-world performance instead of one flattered average.

Loads with plain transformers. Newer architectures (e.g. model_type ``gemma4``,
used by gemma-4-E4B) need **transformers >= 5.5** -- older versions raise
``KeyError: 'gemma4'``. Note: do NOT load gemma4 through unsloth in a Studio env
whose transformers was upgraded -- the upgrade pulls ``huggingface_hub`` 1.x,
which breaks ``unsloth_zoo``'s config lookup. Plain transformers is the clean path.

    python evaluate.py --model "C:\\path\\to\\exported\\merged_model" --limit 500
    python evaluate.py --model "C:\\path\\to\\exported\\merged_model"

Needs: transformers>=5.5, torch, datasets, accelerate.
"""

from __future__ import annotations

import argparse
import re

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

CWE_RE = re.compile(r"CWE-\d+")

# A row is "easy" if the description literally names the weakness (the model can
# keyword-match); "hard" rows require inferring the CWE from the prose.
EASY_KW = [
    "sql injection",
    "cross-site scripting",
    "cross site scripting",
    "xss",
    "buffer overflow",
    "use after free",
    "use-after-free",
    "path traversal",
    "command injection",
    "out-of-bounds",
    "out of bounds",
    "race condition",
    "deserialization",
    "ssrf",
    "server-side request forgery",
    "csrf",
    "cross-site request forgery",
    "open redirect",
    "integer overflow",
]


def parse_cwes(text: str) -> set[str]:
    return set(CWE_RE.findall(text))


def is_easy(description: str) -> bool:
    return any(k in description.lower() for k in EASY_KW)


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
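

# Worked example (an illustrative sketch, not part of the original evaluation flow).
# The hypothetical helper below shows why the script reports both micro-F1 and
# macro-F1: micro-F1 pools tp/fp/fn over all examples, so frequent CWEs dominate,
# while macro-F1 averages per-CWE F1, so every rare CWE that is missed counts fully.
# Assumptions: the function name and the toy CWE labels are made up for illustration,
# and the helper is deliberately self-contained (it does not call prf) so that it
# can be run in isolation.
def _toy_micro_vs_macro_f1() -> tuple[float, float]:
    """Return (micro_f1, macro_f1) for four toy single-label predictions."""
    truths = [{"CWE-79"}, {"CWE-79"}, {"CWE-89"}, {"CWE-416"}]
    preds = [{"CWE-79"}, {"CWE-79"}, {"CWE-89"}, {"CWE-787"}]

    def f1(tp: int, fp: int, fn: int) -> float:
        # Algebraically the same F1 as 2*p*r / (p + r); 0.0 when nothing was counted.
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    micro = [0, 0, 0]  # tp, fp, fn pooled over all examples
    per_label: dict[str, list[int]] = {}  # tp, fp, fn per CWE
    for true, pred in zip(truths, preds):
        for lab in true | pred:
            counts = per_label.setdefault(lab, [0, 0, 0])
            # 0 = true positive, 1 = false positive, 2 = false negative
            idx = 0 if (lab in true and lab in pred) else (1 if lab in pred else 2)
            counts[idx] += 1
            micro[idx] += 1

    micro_f1 = f1(*micro)  # tp=3, fp=1, fn=1 -> 0.75
    macro_f1 = sum(f1(*c) for c in per_label.values()) / len(per_label)  # (1+1+0+0)/4 -> 0.5
    return micro_f1, macro_f1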


def build_prompt(tok, messages: list[dict]) -> str:
    """Prompt = everything up to (but not including) the assistant answer."""
    convo = messages[:-1]
    try:
        return tok.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
    except Exception:
        # Some chat templates (e.g. Gemma) reject a separate "system" role;
        # fold the system text into the user turn instead.
        sys_txt = next((m["content"] for m in convo if m["role"] == "system"), "")
        usr_txt = next((m["content"] for m in convo if m["role"] == "user"), "")
        folded = [{"role": "user", "content": f"{sys_txt}\n\n{usr_txt}".strip()}]
        return tok.apply_chat_template(folded, tokenize=False, add_generation_prompt=True)


def score(truths: list[set[str]], preds: list[set[str]], easies: list[bool]) -> None:
    micro = [0, 0, 0]  # tp, fp, fn
    per_label: dict[str, list[int]] = {}
    exact = 0
    strata = {"easy": [0, 0, 0, 0, 0], "hard": [0, 0, 0, 0, 0]}  # tp,fp,fn,exact,n
    for true, pred, easy in zip(truths, preds, easies):
        tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
        micro[0] += tp
        micro[1] += fp
        micro[2] += fn
        ex = int(pred == true)
        exact += ex
        for lab in true | pred:
            d = per_label.setdefault(lab, [0, 0, 0])
            if lab in true and lab in pred:
                d[0] += 1
            elif lab in pred:
                d[1] += 1
            else:
                d[2] += 1
        s = strata["easy" if easy else "hard"]
        s[0] += tp
        s[1] += fp
        s[2] += fn
        s[3] += ex
        s[4] += 1

    n = len(truths)
    micro_f1 = prf(*micro)[2]
    macro_f1 = sum(prf(*v)[2] for v in per_label.values()) / len(per_label) if per_label else 0.0

    print("\n=== CVE -> CWE evaluation ===")
    print(f"examples : {n}")
    print(f"exact-match accuracy : {exact / n:.3f} (predicted CWE set == true set)")
    print(f"micro-F1 : {micro_f1:.3f}")
    print(f"macro-F1 : {macro_f1:.3f} (unweighted mean over {len(per_label)} CWEs)")
    print("\n-- by difficulty --")
    for name, label in (("easy", "easy (weakness named)"), ("hard", "hard (must infer) ")):
        tp, fp, fn, ex, m = strata[name]
        if m:
            print(f" {label:22s} n={m:5d} exact={ex / m:.3f} micro-F1={prf(tp, fp, fn)[2]:.3f}")


def main() -> None:
    ap = argparse.ArgumentParser(description="Evaluate a CVE->CWE model on the test split.")
    ap.add_argument("--model", required=True, help="path or HF id of the fine-tuned (merged) model")
    ap.add_argument("--dataset", default="exploitintel/cve-cwe-consensus")
    ap.add_argument("--split", default="test")
    ap.add_argument(
        "--limit", type=int, default=None, help="evaluate only the first N rows (quick check)"
    )
    ap.add_argument("--batch-size", type=int, default=16)
    ap.add_argument("--max-new-tokens", type=int, default=32)
    args = ap.parse_args()

    print(f"loading model: {args.model}")
    try:
        tok = AutoTokenizer.from_pretrained(args.model)
    except (AttributeError, TypeError):
        # Some Gemma tokenizer configs store `extra_special_tokens` as a list, which
        # trips a transformers bug ('list' object has no attribute 'keys').
        tok = AutoTokenizer.from_pretrained(args.model, extra_special_tokens={})
    tok.padding_side = "left"  # decoder-only batched generation needs left padding
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token

    device = "cuda" if torch.cuda.is_available() else "cpu"
    try:
        model = AutoModelForCausalLM.from_pretrained(args.model, dtype="auto").to(device)
    except TypeError:
        # `dtype` is the transformers 5.x name; older releases use `torch_dtype`.
        model = AutoModelForCausalLM.from_pretrained(args.model, torch_dtype="auto").to(device)
    model.eval()

    ds = load_dataset(args.dataset, split=args.split)
    if args.limit:
        ds = ds.select(range(min(args.limit, len(ds))))

    prompts, truths, easies = [], [], []
    for ex in ds:
        msgs = ex["messages"]
        prompts.append(build_prompt(tok, msgs))
        truths.append(parse_cwes(msgs[-1]["content"]))
        usr = next((m["content"] for m in msgs if m["role"] == "user"), "")
        easies.append(is_easy(usr))

    preds: list[set[str]] = []
    for i in range(0, len(prompts), args.batch_size):
        batch = prompts[i : i + args.batch_size]
        enc = tok(batch, return_tensors="pt", padding=True, truncation=True, max_length=1024).to(
            device
        )
        with torch.no_grad():
            out = model.generate(
                **enc,
                max_new_tokens=args.max_new_tokens,
                do_sample=False,  # greedy = deterministic
                pad_token_id=tok.pad_token_id,
            )
        new_tokens = out[:, enc["input_ids"].shape[1] :]  # drop the prompt, keep the answer
        for row in new_tokens:
            preds.append(parse_cwes(tok.decode(row, skip_special_tokens=True)))
        print(f" {min(i + args.batch_size, len(prompts))}/{len(prompts)}", end="\r")
    print()

    score(truths, preds, easies)


if __name__ == "__main__":
    main()