---
language:
- en
- code
license: apache-2.0
tags:
- security
- vulnerability-detection
- code-analysis
- reasoning
- llm
pipeline_tag: text-generation
base_model: UCSB-SURFI/VulnLLM-R-7B
---

# VulnLLM-R-7B: Specialized Reasoning LLM for Vulnerability Detection

**VulnLLM-R** is the first specialized **reasoning** large language model (LLM) designed specifically for software vulnerability detection.

Unlike traditional static analysis tools (such as CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to **reason step by step** about data flow, control flow, and security context. It follows the thought process of a human security auditor to identify complex logic vulnerabilities with high accuracy.

## Quick Links
* **Paper:** [arXiv:2512.07533](https://arxiv.org/abs/2512.07533)
* **Code & Data:** [GitHub](https://github.com/ucsb-mlsec/VulnLLM-R)
* **Demo:** [Web demo](https://huggingface.co/spaces/UCSB-SURFI/VulnLLM-R)

## Key Features
* **Reasoning-Based Detection:** Rather than only classifying code, the model generates a chain of thought that explains *why* a vulnerability exists.
* **Superior Accuracy:** Outperforms commercial models (such as Claude-3.7-Sonnet and o3-mini) and industry-standard tools (CodeQL, AFL++) on key benchmarks.
* **Efficiency:** Achieves state-of-the-art (SOTA) performance with only **7B parameters**, making it 30x smaller and significantly faster than general-purpose reasoning models.
* **Broad Coverage:** Trained and tested on C, C++, Python, and Java (zero-shot generalization).

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "UCSB-SURFI/VulnLLM-R-7B"

# Load the tokenizer and the model (bfloat16, placed automatically on available devices)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example code snippet
code_snippet = """
void vulnerable_function(char *input) {
    char buffer[50];
    strcpy(buffer, input); // Potential buffer overflow
}
"""

# Prompt template (triggers step-by-step reasoning)
prompt = f"""You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Pass the full encoding so the attention mask is used during generation
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens and keep only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

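To scan more than one snippet, the Quick Start steps can be wrapped in small helper functions. The sketch below is one possible arrangement, not part of the released code: `PROMPT_TEMPLATE`, `build_prompt`, and `analyze_snippet` are hypothetical names introduced here, and the prompt wording is copied from the Quick Start example.

```python
# Hypothetical helpers (not part of the VulnLLM-R repository).
# The prompt wording is copied from the Quick Start example.
PROMPT_TEMPLATE = """You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code}

Please provide your reasoning followed by the final answer.
"""


def build_prompt(code_snippet: str) -> str:
    """Fill the Quick Start prompt wording with one code snippet."""
    return PROMPT_TEMPLATE.format(code=code_snippet)


def analyze_snippet(model, tokenizer, code_snippet: str, max_new_tokens: int = 512) -> str:
    """Run one snippet through the model and return only the newly generated text."""
    messages = [{"role": "user", "content": build_prompt(code_snippet)}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt tokens so only the model's answer is decoded
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in the Quick Start, `analyze_snippet(model, tokenizer, code_snippet)` returns the reasoning and final answer as a single string. If the reasoning is cut off, a larger `max_new_tokens` value may help.
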
## Performance

VulnLLM-R-7B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

<img width="600" alt="model_size_vs_f1_scatter_01" src="https://github.com/user-attachments/assets/fc9e6942-14f8-4f34-8229-74596b05c7c5" />

(Refer to Figure 1 and Table 4 in the paper for detailed metrics.)

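The figure plots F1 against model size. For readers who want to score the model on their own labeled samples, the sketch below computes precision, recall, and F1 for a binary vulnerable/benign decision. It is a generic illustration: the function name `precision_recall_f1` and the toy labels are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def precision_recall_f1(labels, predictions):
    """Compute precision, recall, and F1, treating True as 'vulnerable'."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)        # vulnerable, flagged
    fp = sum(1 for y, p in pairs if not y and p)    # benign, flagged
    fn = sum(1 for y, p in pairs if y and not p)    # vulnerable, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data for illustration only, not results from the paper.
labels = [True, True, False, False, True]
predictions = [True, False, False, True, True]
precision, recall, f1 = precision_recall_f1(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```
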
## Citation

If you use this model in your research, please cite our paper:

```bibtex
@article{nie2025vulnllmr,
  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
  journal={arXiv preprint arXiv:2512.07533},
  year={2025}
}
```