	---
	language:
	- en
	- code
	license: apache-2.0
	tags:
	- security
	- vulnerability-detection
	- code-analysis
	- reasoning
	- llm
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-7B-Instruct
	---

	# VulnLLM-R-7B: Specialized Reasoning LLM for Vulnerability Detection

	VulnLLM-R is the first reasoning Large Language Model (LLM) specialized for software vulnerability detection.

	Unlike traditional static analysis tools (such as CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to reason step by step about data flow, control flow, and security context. It mimics the thought process of a human security auditor to identify complex logic vulnerabilities.

	## 🔗 Quick Links
	* Paper: [arXiv:2512.07533](https://arxiv.org/abs/2512.07533)
	* Code & Data: [GitHub](https://github.com/ucsb-mlsec/VulnLLM-R)
	* Demo: [Web demo](https://huggingface.co/spaces/UCSB-SURFI/VulnLLM-R)

	## 💡 Key Features
	* Reasoning-Based Detection: Rather than only classifying code, the model generates a chain-of-thought that explains why a vulnerability exists.
	* Superior Accuracy: Outperforms commercial models (such as Claude-3.7-Sonnet and o3-mini) and widely used tools (CodeQL, AFL++) on the benchmarks reported in the paper.
	* Efficiency: Achieves state-of-the-art performance with only 7B parameters, making it 30x smaller and significantly faster than general-purpose reasoning models.
	* Broad Coverage: Trained and tested on C, C++, Python, and Java (zero-shot generalization).
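
As a rough illustration of what the 7B size means in practice, the memory needed for the weights alone can be estimated from the parameter count and the bytes stored per parameter. This is a back-of-the-envelope sketch: the nominal parameter count of 7e9 and the helper name `weight_memory_gib` are assumptions, and activations and the KV cache are ignored.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for the model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: nominal "7B" parameter count; the exact count may differ.
NOMINAL_PARAMS = 7e9

for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gib = weight_memory_gib(NOMINAL_PARAMS, nbytes)
    print(f"{dtype_name}: ~{gib:.1f} GiB for weights")
```

Actual usage during generation will be higher, since it also depends on prompt length, `max_new_tokens`, and batch size.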

	## 🚀 Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_name = "UCSB-SURFI/VulnLLM-R-7B"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,
	    device_map="auto"
	)

	# Example Code Snippet
	code_snippet = """
	void vulnerable_function(char *input) {
	    char buffer[50];
	    strcpy(buffer, input); // Potential buffer overflow
	}
	"""

	# Prompt Template (Triggering Reasoning)
	prompt = f"""You are an advanced vulnerability detection model.
	Please analyze the following code step-by-step to determine if it contains a vulnerability.

	Code:
	{code_snippet}

	Please provide your reasoning followed by the final answer.
	"""

	messages = [
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

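	# Pass the full encoding so generate() also receives the attention mask.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512
	)
	# Drop the prompt tokens so that only the newly generated text is decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]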

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
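
To analyze several functions with the same prompt, the template from the Quick Start can be wrapped in a small helper. The following is a minimal sketch and not part of the released code: `PROMPT_TEMPLATE` and `build_messages` are hypothetical names, and the template text mirrors the example above. It only builds chat messages and does not load the model.

```python
# Hypothetical helper; the names are illustrative and not taken from the VulnLLM-R repository.
PROMPT_TEMPLATE = (
    "You are an advanced vulnerability detection model.\n"
    "Please analyze the following code step-by-step to determine "
    "if it contains a vulnerability.\n\n"
    "Code:\n{code}\n\n"
    "Please provide your reasoning followed by the final answer.\n"
)


def build_messages(code_snippets):
    """Return one single-turn chat (a list of message dicts) per code snippet."""
    return [
        [{"role": "user", "content": PROMPT_TEMPLATE.format(code=code)}]
        for code in code_snippets
    ]


snippets = [
    "void f(char *s) { char buf[8]; strcpy(buf, s); }",
    "int add(int a, int b) { return a + b; }",
]
chats = build_messages(snippets)
print(len(chats), "chats built")
```

Each element of `chats` can then be passed to `tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)` as in the Quick Start. Tokenizing several prompts in one batch would additionally require padding.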

	## 📊 Performance

	VulnLLM-R-7B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

	<img width="600" alt="model_size_vs_f1_scatter_01" src="https://github.com/user-attachments/assets/fc9e6942-14f8-4f34-8229-74596b05c7c5" />

	See Figure 1 and Table 4 in the paper for detailed metrics.

	## 📚 Citation

	If you use this model in your research, please cite our paper:

	```bibtex
	@article{nie2025vulnllmr,
	  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
	  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
	  journal={arXiv preprint arXiv:2512.07533},
	  year={2025}
	}
	```