Upload README.md with huggingface_hub

README.md +187 -11

@@ -1,19 +1,195 @@
 ---
-base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- qwen2
-license: apache-2.0
 language:
 - en
 ---
-# Uploaded finetuned  model
-- **Finetuned from model :** unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 language:
 - en
+license: mit
+library_name: transformers
+tags:
+- security
+- code
+- vulnerability-detection
+- grpo
+- reinforcement-learning
+- unsloth
+- openenv
+- agentbeats
+base_model: unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
+datasets:
+- custom
+pipeline_tag: text-generation
 ---
+# VulnHunter: AI Security Agent
+**An AI agent trained with GRPO to detect and fix web application security vulnerabilities.**
+[![GitHub](https://img.shields.io/badge/GitHub-vulnhunter-black)](https://github.com/gateremark/vulnhunter)
+[![W&B](https://img.shields.io/badge/W%26B-Training%20Run-orange)](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
+[![AgentBeats](https://img.shields.io/badge/AgentBeats-OpenEnv%20Challenge-green)](https://rdi.berkeley.edu/agentx-agentbeats)
+## Model Description
+VulnHunter is a fine-tuned Qwen2.5-Coder-7B-Instruct model specialized for detecting and patching security vulnerabilities in web application code. It was trained using **GRPO (Group Relative Policy Optimization)** with a custom security reward function.
+### Capabilities
+- ✅ **SQL Injection Detection** - Identifies unsanitized SQL queries
+- ✅ **XSS Detection** - Finds unescaped user input in HTML
+- ✅ **Path Traversal Detection** - Detects unchecked file paths
+- ✅ **Automatic Fix Generation** - Suggests secure code patches
+## Quick Start
+```python
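+from unsloth import FastLanguageModel
+
+# Load the fine-tuned model; the base checkpoint is 4-bit (bitsandbytes)
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="gateremark/vulnhunter-agent",
+    load_in_4bit=True,
+)
+FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode
+
+# Analyze vulnerable code
+prompt = """Analyze this code for security vulnerabilities:
+query = f"SELECT * FROM users WHERE id = {user_id}"
+cursor.execute(query)
+"""
+# Move the tokenized inputs to the same device as the model
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))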
+```
+## Training Details
+### Base Model
+- **Model:** Qwen2.5-Coder-7B-Instruct
+- **Quantization:** 4-bit (BitsAndBytes)
+- **Framework:** Unsloth + TRL
+### Why Qwen2.5-Coder?
+1. Pre-trained on code - understands Python, SQL, security patterns
+2. Instruct variant - follows instructions out-of-the-box
+3. 7B size - sweet spot between capability and cost
+4. Unsloth support - 2x faster training
+### Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| Method | GRPO (Group Relative Policy Optimization) |
+| Hardware | NVIDIA A100-SXM4-40GB |
+| Training Time | ~90 minutes |
+| Steps | 200 |
+| LoRA Rank | 32 |
+| Learning Rate | 2e-5 |
+| Batch Size | 1 (4 gradient accumulation steps) |
+| Group Size | 4 generations per prompt |
+### Why GRPO?
+| Method | Memory | Assessment |
+|--------|--------|------------|
+| SFT | Low | Too passive: imitates labeled fixes, no reward signal |
+| PPO | High (needs critic) | Memory-prohibitive |
+| DPO | Medium | Needs preference pairs |
+| **GRPO** | Low | ✅ Chosen: optimizes a reward function directly |
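+GRPO eliminates the critic model: it samples a group of responses per prompt and scores each one relative to the group's average reward, which gives PPO-style policy optimization without the memory cost of a separate value network.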
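To make the group-relative idea concrete, here is a minimal sketch of how advantages are commonly computed in GRPO: each reward is centered on the group mean and scaled by the group standard deviation. This is not code from the VulnHunter repository; the function name `group_advantages`, the use of the population standard deviation, and the `eps` constant are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Return one advantage per response: (reward - group mean) / (group std + eps).

    Hypothetical helper for illustration; real trainers do this on tensors.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example rewards for four responses sampled for the same prompt
rewards = [1.5, 0.3, 0.3, -0.2]
for r, a in zip(rewards, group_advantages(rewards)):
    print(f"reward={r:+.1f}  advantage={a:+.2f}")
```

Responses that beat the group average get a positive advantage and are reinforced; below-average ones are pushed down. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.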
+### Reward Function
+| Event | Reward |
+|-------|--------|
+| Identify vulnerability type | +0.3 |
+| Generate valid patch | +0.2 |
+| Patch blocks exploit | +1.0 |
+| Syntax error in patch | -0.2 |
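One possible reading of the table above, sketched in Python. The README does not say how each event is detected or whether the components stack, so this sketch makes several assumptions: rewards are additive, "valid patch" means the patch parses as Python, and a syntax error forfeits the patch rewards. The names `security_reward` and `patch_has_valid_syntax` are hypothetical, and `blocks_exploit` stands in for whatever exploit check the environment actually runs.

```python
import ast
from typing import Optional


def patch_has_valid_syntax(patch_code: str) -> bool:
    """Assumption: a 'valid' patch is one that parses as Python source."""
    try:
        ast.parse(patch_code)
        return True
    except SyntaxError:
        return False


def security_reward(identified_type: bool,
                    patch_code: Optional[str],
                    blocks_exploit: bool) -> float:
    """Combine the table's reward components additively (an assumption)."""
    reward = 0.0
    if identified_type:
        reward += 0.3          # identified the vulnerability type
    if patch_code is not None:
        if not patch_has_valid_syntax(patch_code):
            return reward - 0.2  # syntax error in patch
        reward += 0.2          # generated a valid patch
        if blocks_exploit:
            reward += 1.0      # patch blocks the exploit
    return reward


print(security_reward(True, "x = 1", True))   # all three positive components
print(security_reward(True, "x = (", True))   # broken patch is penalized
```

Under these assumptions the maximum reward per response is 1.5, and most of it comes from the patch actually blocking the exploit rather than from naming the vulnerability.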
+## Evaluation Results
+### Test Cases
+**SQL Injection:**
+```python
+# Input
+query = f"SELECT * FROM users WHERE username = '{username}'"
+# VulnHunter Output
+# "SQL injection vulnerability. Use parameterized queries:
+# query = 'SELECT * FROM users WHERE username = %s'
+# cursor.execute(query, (username,))"
+```
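The difference between the vulnerable query and the suggested fix can be checked with a small, self-contained sketch. It uses the standard-library `sqlite3` module, whose placeholder is `?` rather than the `%s` style shown in the model output (`%s` is used by drivers such as psycopg2 and MySQLdb). The table, rows, and function names are made up for illustration.

```python
import sqlite3

# In-memory database with two illustrative rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_unsafe(username):
    # Vulnerable: user input is interpolated into the SQL string
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(username):
    # Fixed: the driver binds the value, so it is never parsed as SQL
    return conn.execute(
        "SELECT * FROM users WHERE username = ?", (username,)
    ).fetchall()


payload = "' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR clause matches every row
print(len(find_user_safe(payload)))    # 0: no user has that literal name
```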
+**XSS:**
+```python
+# Input
+return f"<h1>Results for: {query}</h1>"
+# VulnHunter Output
+# "XSS vulnerability. Escape user input:
+# from html import escape
+# return f'<h1>Results for: {escape(query)}</h1>'"
+```
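A quick way to see what the suggested fix does is to render a script payload through `html.escape`. The function name `render_results` is hypothetical; only `html.escape` comes from the model output above.

```python
from html import escape


def render_results(query: str) -> str:
    # Escape user input before placing it in HTML
    return f"<h1>Results for: {escape(query)}</h1>"


payload = "<script>alert(1)</script>"
print(render_results(payload))
# <h1>Results for: &lt;script&gt;alert(1)&lt;/script&gt;</h1>
```

The browser displays the escaped payload as text instead of executing it. Escaping like this covers HTML body content; input placed inside JavaScript or URLs needs context-specific encoding.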
+**Path Traversal:**
+```python
+# Input
+return send_file(f"/uploads/{filename}")
+# VulnHunter Output
+# "Path traversal vulnerability. Use safe_join:
+# from werkzeug.utils import safe_join
+# return send_file(safe_join('/uploads', filename))"
+```
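The containment check behind a fix like this can be sketched with the standard library alone. This is not how `safe_join` is implemented; `resolve_upload` and `UPLOAD_DIR` are hypothetical names, and the sketch assumes a POSIX-style filesystem. Note that Werkzeug's `safe_join` returns `None` for an unsafe path, so its result should be checked before it is passed to `send_file`.

```python
from pathlib import Path

UPLOAD_DIR = Path("/uploads")  # illustrative base directory


def resolve_upload(filename: str):
    """Return the resolved path if it stays inside UPLOAD_DIR, else None."""
    base = UPLOAD_DIR.resolve()
    candidate = (base / filename).resolve()  # collapses ".." segments
    if base not in candidate.parents:
        return None  # path escapes the upload directory (or is the directory itself)
    return candidate


print(resolve_upload("report.pdf"))      # a path under the upload directory
print(resolve_upload("../etc/passwd"))   # None
print(resolve_upload("/etc/passwd"))     # None: an absolute path replaces the base
```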
+## OpenEnv Environment
+VulnHunter comes with an OpenEnv-compatible RL environment:
+```bash
+git clone https://github.com/gateremark/vulnhunter
+cd vulnhunter
+uvicorn vulnhunter.env_server.server:app --port 8000
+```
+## Green Agent (A2A Protocol)
+The repository includes an A2A-compatible agent wrapper:
+```bash
+cd vulnhunter/green_agent
+python server.py
+# Agent at http://localhost:9009
+```
+### Agent Card
+```json
+{
+  "name": "VulnHunter",
+  "skills": [{"id": "analyze_code", "name": "Analyze Code"}]
+}
+```
+## Links
+- **GitHub:** [github.com/gateremark/vulnhunter](https://github.com/gateremark/vulnhunter)
+- **W&B Training:** [wandb.ai/gatere-ai/huggingface/runs/v0dge86p](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
+- **OpenEnv:** [github.com/meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
+## Citation
+```bibtex
+@misc{vulnhunter2026,
+  author = {gateremark},
+  title = {VulnHunter: AI Security Agent with GRPO},
+  year = {2026},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/gateremark/vulnhunter-agent}
+}
+```
+## Acknowledgments
+Built for the **AgentBeats OpenEnv Challenge** sponsored by PyTorch, Hugging Face, and Unsloth.
+---
+*Built with ❤️ by [gateremark](https://github.com/gateremark)*