# VEXT Pentest-7B

**The first open-source language model built for penetration testing and security analysis.**

Fine-tuned from [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on 260,000+ curated security examples drawn from real pentesting engagements, CTF challenges, bug bounty programs, MITRE ATT&CK, and OWASP methodologies, then aligned with DPO using validated vulnerability findings as the preference signal.

Runs on a single consumer GPU, on a MacBook via Ollama, or CPU-only with quantized weights. No API keys. No cloud dependency. Your data stays on your machine.

**[Hugging Face Model](https://huggingface.co/vext-labs/pentest-7b)** | **[VEXT Platform](https://tryvext.com)** | **[Discord](https://discord.gg/vext-security)**

---

## What It Does

- **Vulnerability Analysis** -- Explain CVEs, classify weaknesses, and assess impact
- **Pentest Report Writing** -- Generate executive summaries, technical findings, and remediation sections
- **Attack Planning** -- Suggest prioritized attack paths aligned with MITRE ATT&CK and OWASP
- **Security Code Review** -- Identify injection flaws, auth bypasses, and OWASP Top 10 issues
- **Remediation Guidance** -- Provide actionable fix recommendations with code examples
- **Compliance Mapping** -- Map findings to PCI DSS, SOC 2, HIPAA, and ISO 27001

## Installation

### Option 1: Ollama (Easiest)

```bash
ollama pull vext-labs/pentest-7b
ollama run vext-labs/pentest-7b
```

### Option 2: pip (Transformers)

```bash
pip install transformers torch accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" (via accelerate) places the weights on available GPUs, falling back to CPU
model = AutoModelForCausalLM.from_pretrained("vext-labs/pentest-7b", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("vext-labs/pentest-7b")
```

### Option 3: vLLM (Production Serving)

```bash
pip install vllm
vllm serve vext-labs/pentest-7b --port 8000
```

The server exposes an OpenAI-compatible API; send chat requests to `http://localhost:8000/v1/chat/completions`.
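
As a minimal sketch, the client below calls that endpoint using only the Python standard library. It assumes the server was started with the `vllm serve` command above, which exposes the model under the name `vext-labs/pentest-7b`. The helper names `build_chat_payload` and `chat` are illustrative and are not part of this repository.

```python
import json
import urllib.request

# Assumes `vllm serve vext-labs/pentest-7b --port 8000` is running locally.
# By default vLLM serves the model under the same name passed to `vllm serve`.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "vext-labs/pentest-7b"


def build_chat_payload(user_prompt: str,
                       system_prompt: str = "You are an expert penetration tester.",
                       max_tokens: int = 512,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(user_prompt: str, url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST one chat request and return the assistant's reply text."""
    data = json.dumps(build_chat_payload(user_prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

For example, once the server is up, `print(chat("Explain the impact of CVE-2021-44228 on a Java web app."))` should print the model's answer. Any OpenAI-compatible client, such as the `openai` package pointed at `http://localhost:8000/v1`, should also work.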

### Option 4: Docker

```bash
docker run --gpus all -p 8000:8000 ghcr.io/vext-labs/pentest-7b:latest
```

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vext-labs/pentest-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert penetration tester."},
    {"role": "user", "content": "I found an IDOR on /api/users/{id}/profile. Write the finding for my report."},
]

# Render the conversation with the model's chat template and open the assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Benchmarks

| Benchmark | Pentest-7B | Qwen2.5-7B (base) | GPT-4o |
|---|---|---|---|
| SecBench (vuln classification) | **82.4%** | 61.2% | 79.8% |
| CyberMetric (security knowledge) | **74.1%** | 52.7% | 71.3% |
| PentestQA (methodology) | **88.6%** | 44.3% | 83.1% |
| Finding Quality (human eval, 1-5 scale) | 4.2 | 2.1 | **4.4** |
| False Positive Rate (lower is better) | **12.3%** | 41.7% | 15.8% |

*Temperature=0, greedy decoding. Human evaluation by 3 senior pentesters on 200 findings. Bold marks the best result in each row.*

## Training Summary

```
Qwen2.5-7B-Instruct
  -> QLoRA SFT (260K examples, 3 epochs, r=16, alpha=32)
  -> DPO Alignment (2K+ preference pairs, beta=0.1)
  -> Adapter Merge
  -> AWQ 4-bit Quantization (optional)
```
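
The sketch below shows what the hyperparameters in this pipeline control. It uses plain Python and the standard LoRA and DPO definitions, not this project's training code, which is not shown here.

- With `r=16` and `alpha=32`, LoRA scales its low-rank update by `alpha / r = 2`.
- `beta=0.1` is the DPO temperature: larger values keep the trained policy closer to the reference model.

The 4096-wide projection in the usage lines is an illustrative size, not this model's actual configuration.

```python
import math


def lora_scaling(r: int, alpha: int) -> float:
    """LoRA multiplies the low-rank update (B @ A) by alpha / r."""
    return alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable params LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under either
    the policy being trained or the frozen reference (SFT) model.
    """
    # How much more the policy favors the chosen response than the reference does
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed without overflow for large |x|
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    print(lora_scaling(r=16, alpha=32))          # 2.0
    print(lora_extra_params(4096, 4096, r=16))   # 131072 for one square 4096x4096 projection
    print(round(dpo_loss(-120.0, -135.0, -125.0, -130.0), 4))
```

When the policy and reference agree exactly, the margin is zero and the loss is `log 2`. The loss falls below that as the policy shifts probability toward the chosen response more than the reference does.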

**Training data sources:** Production pentesting traces (anonymized), CTF walkthroughs, public bug bounty write-ups, MITRE ATT&CK, OWASP, and CVE analysis. No raw exploits or malicious payloads.

See the [Hugging Face model card](https://huggingface.co/vext-labs/pentest-7b) for full training details.

## Hardware Requirements

| Format | GPU VRAM | RAM (CPU-only) |
|---|---|---|
| Unquantized (bf16) | 16 GB | 32 GB |
| AWQ 4-bit | 6 GB | 16 GB |
| GGUF Q4_K_M (Ollama) | -- | 8 GB |
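
To sanity-check these figures, compute the memory the weights alone occupy: parameters × bits per parameter ÷ 8. The sketch assumes roughly 7.6 billion parameters, which is approximately the size of Qwen2.5-7B.

Real usage is higher than this estimate, for two reasons:

- The KV cache, activations, and runtime overhead add to the weight memory.
- 4-bit formats also store per-group scales, so their footprint is somewhat above the pure 4-bit figure.

```python
# Approximate parameter count for a Qwen2.5-7B-class model (assumption: ~7.6B).
N_PARAMS = 7.6e9


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("bf16", 16), ("4-bit", 4)]:
        # Excludes KV cache, activations, and quantization scale overhead
        print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB of weights")
```

This gives about 15.2 GB for bf16 and 3.8 GB at 4 bits, consistent with the 16 GB and 6 GB GPU rows once runtime overhead is added.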

## Telemetry

Opt-in only. Off by default. Collects only anonymous aggregate stats (vuln categories, tool success rates). Never collects URLs, IPs, credentials, or vulnerability details.

```bash
export VEXT_TELEMETRY=on   # opt in
export VEXT_TELEMETRY=off  # opt out (default)
```

Source: [`telemetry/collector.py`](telemetry/collector.py) -- fully auditable.
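
Here is a hypothetical sketch of how an opt-in, allowlist-based design like the one described above can work. It is not the contents of `telemetry/collector.py`; the names `ALLOWED_FIELDS`, `telemetry_enabled`, and `scrub_event`, and the field names, are assumptions for illustration.

- Collection is enabled only when `VEXT_TELEMETRY` is explicitly `on`.
- Any field not on the allowlist, such as a URL, IP, or finding detail, is dropped before an event could be sent.

```python
import os
from typing import Mapping, Optional

# Hypothetical allowlist of anonymous, aggregate fields (assumed names).
ALLOWED_FIELDS = {"vuln_category", "tool_name", "tool_success"}


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Opt-in: only an explicit VEXT_TELEMETRY=on enables collection; unset means off."""
    env = os.environ if env is None else env
    return env.get("VEXT_TELEMETRY", "off").strip().lower() == "on"


def scrub_event(event: Mapping[str, object]) -> dict:
    """Keep allowlisted fields only; everything else (URLs, IPs, credentials) is dropped."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
```

An allowlist is the safer choice here: a field added later is excluded by default rather than leaked by default.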

## Repository Structure

```
.
├── README.md                 # This file
├── config.json               # Model configuration
├── tokenizer_config.json     # Tokenizer configuration
├── model*.safetensors        # Model weights
├── telemetry/
│   └── collector.py          # Opt-in telemetry (off by default)
└── examples/
    ├── chat.py               # Basic chat example
    ├── serve_vllm.sh         # vLLM serving script
    └── ollama_modelfile      # Ollama Modelfile
```

## Contributing

We welcome contributions:

1. **Bug reports** -- Open an issue with reproduction steps.
2. **Evaluation benchmarks** -- Add new security-specific benchmarks or improve existing ones.
3. **Training data** -- Contribute anonymized, non-sensitive security examples (CTF write-ups, methodology guides).
4. **Documentation** -- Improve examples, add tutorials, translate the model card.
5. **Integrations** -- Build plugins for Burp Suite, OWASP ZAP, or other security tools.

### Development

```bash
git clone https://github.com/vext-labs/pentest-7b.git
cd pentest-7b
pip install -e ".[dev]"
pytest tests/
```

### Code of Conduct

This project is intended for **authorized security testing only**. Contributors must not submit training data containing:

- Credentials, PII, or sensitive business data
- Exploits targeting unpatched zero-days without responsible disclosure
- Content that facilitates unauthorized access
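
Before submitting training data, a quick automated scan can catch some of the obvious problems listed above. The helper below is hypothetical; it is not a tool shipped in this repository. Its regular expressions are illustrative and will miss many kinds of sensitive data, so they complement manual review rather than replace it.

```python
import re

# Illustrative patterns only; real secret/PII scanning needs far broader coverage.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return {pattern_name: matches} for every pattern that fires on `text`."""
    hits: dict[str, list[str]] = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)  # groups are non-capturing, so full matches
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    sample = "Logged in as admin@example.com from 192.168.1.10"
    for kind, found in find_sensitive(sample).items():
        print(f"{kind}: {found}")
```

A hit flags an example for anonymization, not automatic rejection. Placeholder addresses are common in write-ups, but real ones should be replaced.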

## Responsible Use

- Use it only against systems you have **written authorization** to test.
- Always **verify findings manually** before reporting.
- This model is a tool, not a replacement for professional judgment.
- See the [Hugging Face model card](https://huggingface.co/vext-labs/pentest-7b) for full limitations.

## License

[Apache 2.0](LICENSE)

## Citation

```bibtex
@misc{vext-pentest-7b-2026,
  title  = {VEXT Pentest-7B: An Open-Source Language Model for Penetration Testing and Security Analysis},
  author = {{VEXT Labs}},
  year   = {2026},
  url    = {https://huggingface.co/vext-labs/pentest-7b},
}
```

---

Built by [VEXT Labs, Inc.](https://tryvext.com)