---
language:
- en
license: mit
library_name: transformers
tags:
- security
- code
- vulnerability-detection
- grpo
- reinforcement-learning
- unsloth
- openenv
- agentbeats
base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
datasets:
- custom
pipeline_tag: text-generation
---

# VulnHunter: AI Security Agent

**An AI agent trained with GRPO to detect and fix web application security vulnerabilities.**

[![GitHub](https://img.shields.io/badge/GitHub-vulnhunter-black)](https://github.com/gateremark/vulnhunter)
[![W&B](https://img.shields.io/badge/W%26B-Training%20Run-orange)](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
[![AgentBeats](https://img.shields.io/badge/AgentBeats-OpenEnv%20Challenge-green)](https://rdi.berkeley.edu/agentx-agentbeats)

This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## Model Description

VulnHunter is a fine-tuned Qwen2.5-Coder-7B model specialized for security vulnerability detection and patching. It was trained using **GRPO (Group Relative Policy Optimization)** with a custom security reward function.

### Capabilities

- ✅ **SQL Injection Detection** - Identifies unsanitized SQL queries
- ✅ **XSS Detection** - Finds unescaped user input in HTML
- ✅ **Path Traversal Detection** - Detects unchecked file paths
- ✅ **Automatic Fix Generation** - Suggests secure code patches

## Quick Start

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    "gateremark/vulnhunter-agent"
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Analyze vulnerable code
prompt = """Analyze this code for security vulnerabilities:

query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
"""

# Move the inputs to the model's device before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Base Model

- **Model:** Qwen2.5-Coder-7B-Instruct
- **Quantization:** 4-bit (BitsAndBytes)
- **Framework:** Unsloth + TRL

### Why Qwen2.5-Coder?

1. Pre-trained on code: understands Python, SQL, and common security patterns
2. Instruct variant: follows instructions out of the box
3. 7B size: balances capability against training cost
4. Unsloth support: 2x faster training

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Method | GRPO (Group Relative Policy Optimization) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training Time | ~90 minutes |
| Steps | 200 |
| LoRA Rank | 32 |
| Learning Rate | 2e-5 |
| Batch Size | 1 (gradient accumulation: 4) |
| Group Size | 4 generations |

### Why GRPO?

| Method | Memory | Assessment |
|--------|--------|------------|
| SFT | Low | Too passive: imitates examples, no reward signal |
| PPO | High (needs critic) | Memory-prohibitive |
| DPO | Medium | Needs preference pairs |
| **GRPO** | Low | ✅ Works directly with reward functions |

GRPO drops the critic model and instead scores each response relative to the other responses sampled for the same prompt. This gives PPO-style learning from rewards without the memory cost of holding a second model.

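The group-relative comparison can be sketched in a few lines. This is a minimal illustration, not the training code used for VulnHunter: the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the four example rewards are all assumptions chosen for clarity. Each response's advantage is its reward minus the group mean, divided by the group's standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of sampled responses.

    Hypothetical helper: eps and the population standard deviation
    are illustrative choices, not values taken from the training run.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Responses better than the group average get a positive advantage,
    # worse ones a negative advantage; no critic model is involved.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for a group of 4 generations on the same prompt
group_rewards = [1.5, 0.5, 0.3, 0.1]
advantages = group_relative_advantages(group_rewards)

for reward, advantage in zip(group_rewards, advantages):
    print(f"reward={reward:+.1f}  advantage={advantage:+.3f}")
```

If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason a reward function with several graded levels helps.
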
### Reward Function

| Event | Reward |
|-------|--------|
| Identify vulnerability type | +0.3 |
| Generate valid patch | +0.2 |
| Patch blocks exploit | +1.0 |
| Syntax error in patch | -0.2 |

## Evaluation Results

### Test Cases

**SQL Injection:**

```python
# Input
query = f"SELECT * FROM users WHERE username = '{username}'"

# VulnHunter Output
# "SQL injection vulnerability. Use parameterized queries:
#  query = 'SELECT * FROM users WHERE username = %s'
#  cursor.execute(query, (username,))"
```

**XSS:**

```python
# Input
return f"Results for: {query}"

# VulnHunter Output
# "XSS vulnerability. Escape user input:
#  from html import escape
#  return f'Results for: {escape(query)}'"
```

**Path Traversal:**

```python
# Input
return send_file(f"/uploads/{filename}")

# VulnHunter Output
# "Path traversal vulnerability. Use safe_join:
#  from werkzeug.utils import safe_join
#  return send_file(safe_join('/uploads', filename))"
```

## OpenEnv Environment

VulnHunter comes with an OpenEnv-compatible RL environment:

```bash
git clone https://github.com/gateremark/vulnhunter
cd vulnhunter
uvicorn vulnhunter.env_server.server:app --port 8000
```

## Green Agent (A2A Protocol)

The model includes an A2A-compatible agent wrapper:

```bash
cd vulnhunter/green_agent
python server.py
# Agent at http://localhost:9009
```

### Agent Card

```json
{
  "name": "VulnHunter",
  "skills": [{"id": "analyze_code", "name": "Analyze Code"}]
}
```

## Links

- **GitHub:** [github.com/gateremark/vulnhunter](https://github.com/gateremark/vulnhunter)
- **W&B Training:** [wandb.ai/gatere-ai/huggingface/runs/v0dge86p](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
- **OpenEnv:** [github.com/meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)

## Citation

```bibtex
@misc{vulnhunter2026,
  author = {gateremark},
  title = {VulnHunter: AI Security Agent with GRPO},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/gateremark/vulnhunter-agent}
}
```

## Acknowledgments

Built for the **AgentBeats OpenEnv Challenge** sponsored by PyTorch, Hugging Face, and Unsloth.

---

*Built with ❤️ by [gateremark](https://github.com/gateremark)*