---
language:
- en
license: mit
library_name: transformers
tags:
- security
- code
- vulnerability-detection
- grpo
- reinforcement-learning
- unsloth
- openenv
- agentbeats
base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
datasets:
- custom
pipeline_tag: text-generation
---
# VulnHunter: AI Security Agent
**An AI agent trained with GRPO to detect and fix web application security vulnerabilities.**
[GitHub](https://github.com/gateremark/vulnhunter) · [W&B Run](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p) · [AgentX / AgentBeats](https://rdi.berkeley.edu/agentx-agentbeats)
This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
## Model Description
VulnHunter is a fine-tuned Qwen2.5-Coder-7B model specialized for security vulnerability detection and patching. It was trained using **GRPO (Group Relative Policy Optimization)** with a custom security reward function.
### Capabilities
- ✅ **SQL Injection Detection** - Identifies unsanitized SQL queries
- ✅ **XSS Detection** - Finds unescaped user input in HTML
- ✅ **Path Traversal Detection** - Detects unchecked file paths
- ✅ **Automatic Fix Generation** - Suggests secure code patches
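The three vulnerability classes above follow well-known patterns. As a minimal illustration (the helper names `render_greeting` and `safe_join` are hypothetical, not part of the model), here is each unsafe pattern next to a safe counterpart:

```python
import html
import os

# SQL injection: string interpolation puts attacker input into the query text;
# a parameterized query lets the driver escape the value instead.
unsafe_sql = f"SELECT * FROM users WHERE id = {1}"
safe_sql = ("SELECT * FROM users WHERE id = %s", (1,))

# XSS: escape user input before embedding it in HTML.
def render_greeting(username: str) -> str:
    return f"<h1>Hello {html.escape(username)}</h1>"

# Path traversal: resolve the joined path and confirm it stays under the base dir.
def safe_join(base: str, user_path: str):
    resolved = os.path.realpath(os.path.join(base, user_path))
    base_real = os.path.realpath(base)
    return resolved if resolved.startswith(base_real + os.sep) else None
```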
## Quick Start
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "gateremark/vulnhunter-agent",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
# Analyze vulnerable code
prompt = """Analyze this code for security vulnerabilities:
query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
### Base Model
- **Model:** Qwen2.5-Coder-7B-Instruct
- **Quantization:** 4-bit (BitsAndBytes)
- **Framework:** Unsloth + TRL
### Why Qwen2.5-Coder?
1. Pre-trained on code - understands Python, SQL, security patterns
2. Instruct variant - follows instructions out-of-the-box
3. 7B size - sweet spot between capability and cost
4. Unsloth support - 2x faster training
### Training Configuration
| Parameter | Value |
|-----------|-------|
| Method | GRPO (Group Relative Policy Optimization) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training Time | ~90 minutes |
| Steps | 200 |
| LoRA Rank | 32 |
| Learning Rate | 2e-5 |
| Batch Size | 1 per device (×4 gradient accumulation = 4 effective) |
| Group Size | 4 generations |
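The table above maps onto trainer arguments roughly as follows. This is a plain dict sketch for illustration; the keys mirror common TRL/GRPO argument names but are not a verified `GRPOConfig` call:

```python
# Hyperparameters from the table, expressed as a plain dict.
# Key names are illustrative stand-ins for the actual trainer arguments.
grpo_hparams = {
    "max_steps": 200,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "num_generations": 4,   # GRPO group size: completions sampled per prompt
    "lora_rank": 32,        # goes to the Unsloth/PEFT adapter, not the trainer
}

# Effective batch size seen by each optimizer step:
effective_batch = (grpo_hparams["per_device_train_batch_size"]
                   * grpo_hparams["gradient_accumulation_steps"])
```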
### Why GRPO?
| Method | Memory | Verdict |
|--------|--------|---------|
| SFT | Low | Too passive for exploit-driven feedback |
| PPO | High (needs critic) | Memory-prohibitive |
| DPO | Medium | Needs preference pairs |
| **GRPO** | Low | ✅ Works directly with scalar rewards |
GRPO eliminates the critic model by comparing responses within a sampled group, delivering PPO-quality learning without the roughly 2x memory overhead of a separate value network.
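The in-group comparison can be made concrete: each prompt is sampled several times, and each completion's advantage is its reward minus the group mean, normalized by the group standard deviation, so no learned value network is needed. A minimal sketch:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each completion = (reward - group mean) / group std.
    The group itself serves as the baseline, replacing PPO's learned critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, a group of 4 sampled completions scored by the reward function:
advs = group_relative_advantages([1.5, 0.3, -0.2, 0.3])
```

Completions scoring above the group mean get positive advantage (their tokens are reinforced); those below get negative advantage, with no extra model held in memory.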
### Reward Function
| Event | Reward |
|-------|--------|
| Identify vulnerability type | +0.3 |
| Generate valid patch | +0.2 |
| Patch blocks exploit | +1.0 |
| Syntax error in patch | -0.2 |
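The table above can be sketched as a scoring function. The boolean parameters here are hypothetical stand-ins for the actual checks (vulnerability classifier, patch parser, sandboxed exploit harness), not the training code itself:

```python
def security_reward(identified_vuln: bool,
                    patch_valid: bool,
                    patch_blocks_exploit: bool,
                    patch_syntax_error: bool) -> float:
    """Sum the reward components from the table above."""
    reward = 0.0
    if identified_vuln:
        reward += 0.3   # named the correct vulnerability class
    if patch_valid:
        reward += 0.2   # produced a parseable patch
    if patch_blocks_exploit:
        reward += 1.0   # exploit no longer succeeds against patched code
    if patch_syntax_error:
        reward -= 0.2   # penalize unparseable patches
    return reward
```

A perfect episode (correct class, valid patch, exploit blocked) scores 1.5, the maximum under this scheme.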
## Evaluation Results
### Test Cases
**SQL Injection:**
```python
# Input
query = f"SELECT * FROM users WHERE username = '{username}'"
# VulnHunter Output
# "SQL injection vulnerability. Use parameterized queries:
# query = 'SELECT * FROM users WHERE username = %s'
# cursor.execute(query, (username,))"
```
**XSS:**
```python
# Input
return f"