---
license: apache-2.0
language:
- en
tags:
- security
- pentesting
- cybersecurity
- vulnerability-detection
- red-team
- bug-bounty
- owasp
- mitre-attack
pipeline_tag: text-generation
model-index:
- name: vext-pentest-7b
  results: []
---

# VEXT Pentest-7B -- The First Open-Source Security AI Model

**Pentest-7B** is a 7-billion-parameter language model fine-tuned specifically for offensive security, penetration testing, and vulnerability analysis. Built by [VEXT Labs](https://tryvext.com) on top of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained on **260,000+ curated security examples** drawn from real-world engagements and public security corpora, this is the first open-weight model purpose-built for the security profession.

Pentest-7B runs on a single consumer GPU (16 GB VRAM), a MacBook with 16 GB RAM via Ollama, or CPU-only with quantized weights. No API keys, no cloud dependency, no data leaves your machine.
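
As a rough sanity check on those hardware figures, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes a round 7 billion parameters and ignores activations, KV cache, and runtime overhead, so real usage will be higher; the function name is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7e9  # assumption: round figure taken from "7-billion-parameter"

for label, bits in [("16-bit (bf16/fp16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>20}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

At 16 bits the weights alone come to roughly 14 GB, which is why a 16 GB GPU is close to the floor for unquantized inference, while 4-bit weights leave far more headroom.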

## Key Capabilities

| Capability | Description |
|---|---|
| **Vulnerability Explanation** | Given a CVE ID, CWE, or raw scan output, produce a clear technical explanation of the vulnerability, its root cause, and real-world impact. |
| **Pentest Report Writing** | Generate executive summaries, technical finding write-ups, risk ratings, and remediation sections in standard pentest report format. |
| **Attack Strategy Planning** | Given a target technology stack, suggest prioritized attack paths aligned with MITRE ATT&CK and OWASP Testing Guide methodologies. |
| **Remediation Guidance** | Provide specific, actionable fix recommendations with code examples for common vulnerability classes. |
| **Compliance Assessment** | Map findings to compliance frameworks (PCI DSS, SOC 2, HIPAA, ISO 27001) and articulate control gaps. |
| **Threat Briefing** | Summarize threat intelligence, emerging CVEs, and APT campaign TTPs for stakeholder communication. |
| **Security Code Review** | Analyze code snippets for injection flaws, authentication bypasses, insecure deserialization, and other OWASP Top 10 issues. |

## Training

### Data

Pentest-7B was trained on **260,000+ curated examples** spanning:

- **Production pentesting traces** -- Real (anonymized) action-observation pairs from VEXT's autonomous security agents running against authorized bug bounty targets. Includes successful exploitation chains, false positive patterns, and tool output interpretation.
- **CTF challenge solutions** -- Structured walkthroughs from capture-the-flag competitions covering web, pwn, crypto, reverse engineering, and forensics categories.
- **Bug bounty write-ups** -- Public responsible disclosure reports with structured vulnerability descriptions, reproduction steps, and impact assessments.
- **MITRE ATT&CK corpus** -- Technique descriptions, procedure examples, detection guidance, and mitigation strategies across all 14 tactics.
- **OWASP materials** -- Testing Guide procedures, ASVS requirements, cheat sheets, and vulnerability classifications.
- **CVE analysis** -- Detailed analysis of 50,000+ CVEs including root cause, affected versions, exploit conditions, and patch diffs.
- **DPO preference pairs** -- 2,000+ pairs where validated real findings are preferred over false positives, teaching the model to distinguish true vulnerabilities from noise.
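
To make the preference-pair idea concrete, here is a minimal sketch of the standard DPO sigmoid loss for a single pair, written with plain Python floats. The record layout, field names, and log-probability values are invented for illustration; they are not taken from VEXT's dataset or training code. The default `beta=0.1` mirrors the value listed in the training pipeline.

```python
import math


def dpo_sigmoid_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favors each response than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


# Hypothetical preference pair (illustrative text only).
pair = {
    "prompt": "Scanner flags possible SQL injection on /search?q=. Is it real?",
    "chosen": "Confirmed: a time-based payload delays the response consistently.",
    "rejected": "Likely SQL injection because the parameter is named 'q'.",
}

# Made-up sequence log-probabilities.
untrained = dpo_sigmoid_loss(-42.0, -40.0, -42.0, -40.0)  # policy equals reference
improved = dpo_sigmoid_loss(-38.0, -45.0, -42.0, -40.0)   # policy now prefers "chosen"
print(f"loss before: {untrained:.4f}, after: {improved:.4f}")
```

When the policy matches the reference the loss is `log(2)`; it falls as the policy shifts probability toward the validated finding and away from the false positive.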

**What is NOT in the training data:** Raw exploit code, weaponized payloads, malware source, credentials, PII, or any data that could be directly used for unauthorized access. The model is trained to *reason about* security, not to serve as an exploit toolkit.

### Architecture and Training Pipeline

```
Qwen2.5-7B-Instruct (base)
        |
        v
QLoRA Fine-Tuning (SFT)
  - Rank: 16, Alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj
  - 3 epochs, effective batch size 32
  - Max sequence length: 4096 tokens
  - Learning rate: 2e-4, cosine schedule
        |
        v
DPO Alignment
  - Beta: 0.1, sigmoid loss
  - 1 epoch, learning rate 5e-6
  - Preference signal: validated findings (chosen) vs false positives (rejected)
        |
        v
Adapter Merge + AWQ 4-bit Quantization (optional)
        |
        v
VEXT Pentest-7B
```
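
For a sense of scale, the LoRA settings above can be turned into a trainable-parameter count. The sketch below assumes the dimensions in the published Qwen2.5-7B-Instruct configuration (hidden size 3584, 28 layers, 4 key/value heads, head dimension 128); check these against the `config.json` you actually load. Each adapted linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters.

```python
# Assumed Qwen2.5-7B dimensions; verify against the model's config.json.
HIDDEN = 3584
NUM_LAYERS = 28
HEAD_DIM = 128
NUM_KV_HEADS = 4
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 512, smaller than HIDDEN due to grouped-query attention

RANK = 16
ALPHA = 32

# (input features, output features) of each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to the frozen weight."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling factor (alpha / rank): {ALPHA / RANK}")
```

Under these assumptions the adapters hold about 10 million trainable parameters, well under 1% of the base model, which is what keeps fine-tuning on a quantized base affordable.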

### Hardware

- SFT: 8x NVIDIA A100 40GB (SageMaker ml.p4d.24xlarge), ~18 hours
- DPO: 8x NVIDIA A100 40GB, ~4 hours
- Quantization: Single A10G 24GB (SageMaker ml.g5.2xlarge)

## Usage

### Transformers (Full Precision)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vext-labs/pentest-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert penetration tester and security analyst. "
            "Provide detailed, technically accurate security guidance."
        ),
    },
    {
        "role": "user",
        "content": (
            "I found a reflected XSS in a search parameter on an e-commerce site "
            "during a bug bounty engagement. The input is reflected inside a "
            "JavaScript string literal in the response. Write the finding for my "
            "pentest report, including severity rating, impact, and remediation."
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,  # sampling must be on for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
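
For readers curious what `apply_chat_template` produces, Qwen2.5 models use a ChatML-style layout. The helper below is a simplified, hypothetical re-implementation for illustration only; it assumes Pentest-7B keeps the base model's template and skips details such as tool calls and the default system prompt. In real code, keep using the tokenizer's own template.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in a ChatML-style layout (simplified sketch)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo = [
    {"role": "system", "content": "You are a senior penetration tester."},
    {"role": "user", "content": "Summarize the risk of an exposed .git directory."},
]
print(render_chatml(demo))
```

Seeing the rendered string makes it clearer why raw prompts sent without the template tend to produce weaker answers from an instruction-tuned model.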

### vLLM (Production Serving)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="vext-labs/pentest-7b",
    tensor_parallel_size=1,  # single GPU
    max_model_len=4096,
    gpu_memory_utilization=0.90,
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompts = [
    "Explain CVE-2024-3094 (XZ Utils backdoor): root cause, impact, and detection methods.",
    "Given an exposed .git directory on a production web server, outline an attack plan.",
]

outputs = llm.generate(prompts, sampling)
for output in outputs:
    print(output.outputs[0].text)
```

**OpenAI-compatible API with vLLM:**

```bash
vllm serve vext-labs/pentest-7b --port 8000
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="vext-labs/pentest-7b",
    messages=[
        {"role": "system", "content": "You are a senior penetration tester."},
        {"role": "user", "content": "Analyze this Nmap output and suggest next steps:\n\nPORT STATE SERVICE VERSION\n22/tcp open ssh OpenSSH 7.4\n80/tcp open http Apache 2.4.6\n443/tcp open ssl/http Apache 2.4.6\n3306/tcp open mysql MySQL 5.7.38"},
    ],
    temperature=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

### Ollama (Local, Quantized)

```bash
# Pull the model (GGUF Q4_K_M quantization, ~4.5 GB)
ollama pull vext-labs/pentest-7b

# Interactive chat
ollama run vext-labs/pentest-7b

# API
curl http://localhost:11434/api/chat -d '{
  "model": "vext-labs/pentest-7b",
  "messages": [
    {"role": "user", "content": "What are the top 5 things to check when auditing a JWT implementation?"}
  ]
}'
```
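
The same chat endpoint can be called from Python using only the standard library. This is a minimal sketch, assuming Ollama is listening on its default port and the model tag from the commands above has been pulled; `build_chat_request` and `chat` are illustrative helper names. Setting `"stream": false` asks Ollama for a single JSON object instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint
MODEL = "vext-labs/pentest-7b"


def build_chat_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]


# Requires a running Ollama server with the model pulled:
# print(chat("What are the top 5 things to check when auditing a JWT implementation?"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without a server running.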

### Docker (Isolated Serving)

```bash
docker run --gpus all -p 8000:8000 \
  ghcr.io/vext-labs/pentest-7b:latest \
  --model vext-labs/pentest-7b --port 8000
```

## Telemetry

Pentest-7B includes an **opt-in** telemetry collector to help us improve the model. It is **off by default** and collects only anonymized aggregate statistics (vulnerability categories, tool success rates, session metadata). It **never** collects URLs, IPs, credentials, vulnerability details, request/response bodies, file paths, or user identity.

```bash
# Enable (opt-in)
export VEXT_TELEMETRY=on

# Disable (default)
export VEXT_TELEMETRY=off

# See exactly what is collected
python -c "from vext_telemetry import what_we_collect; what_we_collect()"
```

Full telemetry source code is included in the repository for audit: [`telemetry/collector.py`](telemetry/collector.py).
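
An opt-in switch like this usually comes down to a strict check on the environment variable. The sketch below is hypothetical and is not the code in `telemetry/collector.py`; it only illustrates the off-by-default behavior described above, where anything other than an explicit `on` leaves telemetry disabled.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when VEXT_TELEMETRY is explicitly set to "on"."""
    env = os.environ if env is None else env
    # Unset, empty, "off", or any unrecognized value all mean disabled.
    return env.get("VEXT_TELEMETRY", "off").strip().lower() == "on"


print(telemetry_enabled({}))                         # variable unset
print(telemetry_enabled({"VEXT_TELEMETRY": "off"}))  # explicit opt-out
print(telemetry_enabled({"VEXT_TELEMETRY": "on"}))   # explicit opt-in
```

Treating unknown values as "off" is the conservative choice for a security tool: a typo in the variable can never switch collection on by accident.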

## Evaluation

| Benchmark | Pentest-7B | Qwen2.5-7B-Instruct (base) | GPT-4o (API) |
|---|---|---|---|
| SecBench (vuln classification) | **82.4%** | 61.2% | 79.8% |
| CyberMetric (security knowledge) | **74.1%** | 52.7% | 71.3% |
| PentestQA (methodology) | **88.6%** | 44.3% | 83.1% |
| Finding Quality (human eval, 1-5) | 4.2 | 2.1 | **4.4** |
| False Positive Rate (lower is better) | **12.3%** | 41.7% | 15.8% |

*Bold marks the best result in each row. Benchmarks were run with greedy decoding (temperature=0). Human evaluation by 3 senior pentesters on 200 randomly sampled findings.*

## Intended Use

This model is built for **authorized security professionals**:

- Penetration testers writing reports and planning engagements
- Bug bounty hunters analyzing targets and drafting submissions
- Security engineers triaging vulnerabilities and planning remediation
- SOC analysts interpreting alerts and assessing threat severity
- Compliance teams mapping findings to regulatory frameworks
- Security researchers studying vulnerability patterns

## Limitations and Responsible Use

- **Not a replacement for human expertise.** Always validate model outputs with manual testing and professional judgment.
- **Authorization required.** Do not use this model's output to test systems without explicit written authorization from the system owner.
- **No guarantee of accuracy.** The model can hallucinate CVE details, suggest inapplicable techniques, or miss critical context. Treat outputs as a starting point, not a final answer.
- **Scope of training.** The model is strongest on web application security, network infrastructure, and common vulnerability classes. It has limited depth on hardware security, ICS/SCADA, mobile reversing, and cryptographic implementation review.
- **Not an exploit generator.** The model is trained to reason about security concepts, not to produce weaponized code. Attempts to extract raw exploit payloads will produce lower-quality outputs by design.

## License

Apache 2.0. Use it, modify it, deploy it commercially. Redistributions must keep the license text and any existing attribution notices, as the license requires.

## Citation

```bibtex
@misc{vext-pentest-7b-2026,
  title = {VEXT Pentest-7B: An Open-Source Language Model for Penetration Testing and Security Analysis},
  author = {VEXT Labs},
  year = {2026},
  url = {https://huggingface.co/vext-labs/pentest-7b},
  note = {Fine-tuned from Qwen2.5-7B-Instruct on 260K+ curated security examples with QLoRA SFT and DPO alignment},
}
```

## Links

- **VEXT Platform:** [https://tryvext.com](https://tryvext.com)
- **GitHub:** [https://github.com/vext-labs/pentest-7b](https://github.com/vext-labs/pentest-7b)
- **Discord:** [https://discord.gg/vext-security](https://discord.gg/vext-security)
- **Paper (coming soon):** Technical report with full training methodology and ablation studies

## Built By

[VEXT Labs, Inc.](https://tryvext.com) -- Building autonomous security testing infrastructure. Pentest-7B is the open-source foundation of our platform's security reasoning capabilities.

---

*If you use Pentest-7B in your research or product, we would love to hear about it. Open an issue or reach out at [oss@tryvext.com](mailto:oss@tryvext.com).*