gateremark commited on
Commit
45aaf8b
·
verified ·
1 Parent(s): de81091

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +187 -11
README.md CHANGED
@@ -1,19 +1,195 @@
1
  ---
2
- base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
3
- tags:
4
- - text-generation-inference
5
- - transformers
6
- - unsloth
7
- - qwen2
8
- license: apache-2.0
9
  language:
10
  - en
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
  ---
12
 
13
- # Uploaded finetuned model
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
 
15
- - **Finetuned from model :** unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
16
 
17
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
 
19
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
1
  ---
 
 
 
 
 
 
 
2
  language:
3
  - en
4
+ license: mit
5
+ library_name: transformers
6
+ tags:
7
+ - security
8
+ - code
9
+ - vulnerability-detection
10
+ - grpo
11
+ - reinforcement-learning
12
+ - unsloth
13
+ - openenv
14
+ - agentbeats
15
+ base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
16
+ datasets:
17
+ - custom
18
+ pipeline_tag: text-generation
19
  ---
20
 
21
+ # VulnHunter: AI Security Agent
22
+
23
+ **An AI agent trained with GRPO to detect and fix web application security vulnerabilities.**
24
+
25
+ [![GitHub](https://img.shields.io/badge/GitHub-vulnhunter-black)](https://github.com/gateremark/vulnhunter)
26
+ [![W&B](https://img.shields.io/badge/W%26B-Training%20Run-orange)](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
27
+ [![AgentBeats](https://img.shields.io/badge/AgentBeats-OpenEnv%20Challenge-green)](https://rdi.berkeley.edu/agentx-agentbeats)
28
+
29
+ ## Model Description
30
+
31
+ VulnHunter is a fine-tuned Qwen2.5-Coder-7B model specialized for security vulnerability detection and patching. It was trained using **GRPO (Group Relative Policy Optimization)** with a custom security reward function.
32
+
33
+ ### Capabilities
34
+
35
+ - ✅ **SQL Injection Detection** - Identifies unsanitized SQL queries
36
+ - ✅ **XSS Detection** - Finds unescaped user input in HTML
37
+ - ✅ **Path Traversal Detection** - Detects unchecked file paths
38
+ - ✅ **Automatic Fix Generation** - Suggests secure code patches
39
+
40
+ ## Quick Start
41
+
42
+ ```python
43
+ from unsloth import FastLanguageModel
44
+
45
+ model, tokenizer = FastLanguageModel.from_pretrained(
46
+ "gateremark/vulnhunter-agent"
47
+ )
48
+
49
+ # Analyze vulnerable code
50
+ prompt = """Analyze this code for security vulnerabilities:
51
+ query = f"SELECT * FROM users WHERE id = {user_id}"
52
+ cursor.execute(query)
53
+ """
54
+
55
+ inputs = tokenizer(prompt, return_tensors="pt")
56
+ outputs = model.generate(**inputs, max_new_tokens=256)
57
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
58
+ ```
59
+
60
+ ## Training Details
61
+
62
+ ### Base Model
63
+ - **Model:** Qwen2.5-Coder-7B-Instruct
64
+ - **Quantization:** 4-bit (BitsAndBytes)
65
+ - **Framework:** Unsloth + TRL
66
+
67
+ ### Why Qwen2.5-Coder?
68
+ 1. Pre-trained on code - understands Python, SQL, security patterns
69
+ 2. Instruct variant - follows instructions out-of-the-box
70
+ 3. 7B size - sweet spot between capability and cost
71
+ 4. Unsloth support - 2x faster training
72
+
73
+ ### Training Configuration
74
+
75
+ | Parameter | Value |
76
+ |-----------|-------|
77
+ | Method | GRPO (Group Relative Policy Optimization) |
78
+ | Hardware | NVIDIA A100-SXM4-40GB |
79
+ | Training Time | ~90 minutes |
80
+ | Steps | 200 |
81
+ | LoRA Rank | 32 |
82
+ | Learning Rate | 2e-5 |
83
+ | Batch Size | 1 (4 gradient accumulation) |
84
+ | Group Size | 4 generations |
85
+
86
+ ### Why GRPO?
87
+
88
+ | Method | Memory | Our Choice |
89
+ |--------|--------|------------|
90
+ | SFT | Low | Too passive |
91
+ | PPO | High (needs critic) | Memory-prohibitive |
92
+ | DPO | Medium | Needs preference pairs |
93
+ | **GRPO** | Low | ✅ Perfect for rewards |
94
 
95
+ GRPO eliminates the critic model by comparing responses within groups, giving PPO-quality learning without 2x memory overhead.
96
 
97
+ ### Reward Function
98
+
99
+ | Event | Reward |
100
+ |-------|--------|
101
+ | Identify vulnerability type | +0.3 |
102
+ | Generate valid patch | +0.2 |
103
+ | Patch blocks exploit | +1.0 |
104
+ | Syntax error in patch | -0.2 |
105
+
106
+ ## Evaluation Results
107
+
108
+ ### Test Cases
109
+
110
+ **SQL Injection:**
111
+ ```python
112
+ # Input
113
+ query = f"SELECT * FROM users WHERE username = '{username}'"
114
+
115
+ # VulnHunter Output
116
+ # "SQL injection vulnerability. Use parameterized queries:
117
+ # query = 'SELECT * FROM users WHERE username = %s'
118
+ # cursor.execute(query, (username,))"
119
+ ```
120
+
121
+ **XSS:**
122
+ ```python
123
+ # Input
124
+ return f"<h1>Results for: {query}</h1>"
125
+
126
+ # VulnHunter Output
127
+ # "XSS vulnerability. Escape user input:
128
+ # from html import escape
129
+ # return f'<h1>Results for: {escape(query)}</h1>'"
130
+ ```
131
+
132
+ **Path Traversal:**
133
+ ```python
134
+ # Input
135
+ return send_file(f"/uploads/{filename}")
136
+
137
+ # VulnHunter Output
138
+ # "Path traversal vulnerability. Use safe_join:
139
+ # from werkzeug.utils import safe_join
140
+ # return send_file(safe_join('/uploads', filename))"
141
+ ```
142
+
143
+ ## OpenEnv Environment
144
+
145
+ VulnHunter comes with an OpenEnv-compatible RL environment:
146
+
147
+ ```bash
148
+ git clone https://github.com/gateremark/vulnhunter
149
+ cd vulnhunter
150
+ uvicorn vulnhunter.env_server.server:app --port 8000
151
+ ```
152
+
153
+ ## Green Agent (A2A Protocol)
154
+
155
+ The model includes an A2A-compatible agent wrapper:
156
+
157
+ ```bash
158
+ cd vulnhunter/green_agent
159
+ python server.py
160
+ # Agent at http://localhost:9009
161
+ ```
162
+
163
+ ### Agent Card
164
+ ```json
165
+ {
166
+ "name": "VulnHunter",
167
+ "skills": [{"id": "analyze_code", "name": "Analyze Code"}]
168
+ }
169
+ ```
170
+
171
+ ## Links
172
+
173
+ - **GitHub:** [github.com/gateremark/vulnhunter](https://github.com/gateremark/vulnhunter)
174
+ - **W&B Training:** [wandb.ai/gatere-ai/huggingface/runs/v0dge86p](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
175
+ - **OpenEnv:** [github.com/meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
176
+
177
+ ## Citation
178
+
179
+ ```bibtex
180
+ @misc{vulnhunter2026,
181
+ author = {gateremark},
182
+ title = {VulnHunter: AI Security Agent with GRPO},
183
+ year = {2026},
184
+ publisher = {HuggingFace},
185
+ url = {https://huggingface.co/gateremark/vulnhunter-agent}
186
+ }
187
+ ```
188
+
189
+ ## Acknowledgments
190
+
191
+ Built for the **AgentBeats OpenEnv Challenge** sponsored by PyTorch, Hugging Face, and Unsloth.
192
+
193
+ ---
194
 
195
+ *Built with ❤️ by [gateremark](https://github.com/gateremark)*