---
language:
- en
- code
license: apache-2.0
tags:
- security
- vulnerability-detection
- code-analysis
- reasoning
- llm
pipeline_tag: text-generation
base_model: Qwen/Qwen3-8B-Instruct
---

# VulnLLM-R-8B: Specialized Reasoning LLM for Vulnerability Detection

**VulnLLM-R** is the first specialized **reasoning** Large Language Model designed specifically for software vulnerability detection.

Unlike traditional static analysis tools (like CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to **reason step-by-step** about data flow, control flow, and security context. It mimics the thought process of a human security auditor to identify complex logic vulnerabilities with high accuracy.

## Quick Links

* **Paper:** [arXiv:2512.07533](https://arxiv.org/abs/2512.07533)
* **Code & Data:** [GitHub Repository](https://github.com/ucsb-mlsec/VulnLLM-R)
* **Demo:** [HuggingFace Space / Web Demo](https://huggingface.co/spaces/UCSB-SURFI/VulnLLM-R)

## Key Features

* **Reasoning-Based Detection:** Does not just classify code; it generates a "Chain-of-Thought" to analyze *why* a vulnerability exists.
* **Superior Accuracy:** Outperforms commercial models (such as Claude-3.7-Sonnet and o3-mini) and industry-standard tools (CodeQL, AFL++) on key benchmarks.
* **Efficiency:** Achieves SOTA performance with only **8B parameters**, making it 30x smaller and significantly faster than general-purpose reasoning models.
* **Broad Coverage:** Trained and tested on C, C++, Python, and Java (zero-shot generalization).

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "UCSB-SURFI/VulnLLM-R-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example code snippet
code_snippet = """
void vulnerable_function(char *input) {
    char buffer[50];
    strcpy(buffer, input); // Potential buffer overflow
}
"""

# Prompt template (triggers step-by-step reasoning)
prompt = f"""You are an advanced vulnerability detection model.
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
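Since the model emits its reasoning followed by a final answer, downstream code usually wants just the verdict. A minimal sketch of such a post-processing helper is below; note that the `Final Answer:` marker is an assumed output format for illustration only — check the paper and repository for the model's actual answer format.

```python
def extract_verdict(response: str) -> str:
    """Return the text after the last 'Final Answer:' marker,
    falling back to the last non-empty line of the response.

    NOTE: the 'Final Answer:' marker is an assumption for illustration;
    verify the real output format against the VulnLLM-R paper/repo.
    """
    marker = "Final Answer:"
    if marker in response:
        return response.rsplit(marker, 1)[1].strip()
    return response.strip().splitlines()[-1].strip()

# Hypothetical response text, for demonstration only
sample = (
    "The call copies `input` into a 50-byte stack buffer without a bounds check.\n"
    "Final Answer: VULNERABLE (stack buffer overflow)"
)
print(extract_verdict(sample))  # VULNERABLE (stack buffer overflow)
```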

## Performance

VulnLLM-R-8B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

<img width="600" alt="model_size_vs_f1_scatter_01" src="https://github.com/user-attachments/assets/fc9e6942-14f8-4f34-8229-74596b05c7c5" />

(Refer to Figure 1 and Table 4 in the paper for detailed metrics.)

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@article{nie2025vulnllmr,
  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
  journal={arXiv preprint arXiv:2512.07533},
  year={2025}
}
```