---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
model_name: corvus-v2-8b
license: llama3.1
language:
- en
tags:
- security
- vulnerability-triage
- cybersecurity
- compliance
- lora
- qlora
- sft
- transformers
- trl
pipeline_tag: text-generation
datasets:
- custom
model_type: llama
---

# Corvus™ v2: Vulnerability Triage Model

**Corvus™ v2** is a fine-tuned Llama 3.1 8B model that produces structured triage decisions for software vulnerabilities. Given CVE data, CVSS scores, EPSS probability, KEV listing status, and asset context, it outputs a JSON decision containing an optional severity override, a priority, a recommended action, its reasoning, and a confidence score.

Built by [CVERiskPilot](https://cveriskpilot.com). 100% Veteran Owned, Texas, USA.

## Distribution Status

Corvus v2 weights are **not currently distributed on Hugging Face**. This repository remains public for model documentation, licensing terms, and release-status updates while distribution strategy is under review.

## Why This Exists

Offensive AI is accelerating. AI fuzzers are finding thousands of zero-days across major codebases. The scanning problem is being solved; the triage problem is getting 10x harder. Security teams are drowning in findings they can't prioritize fast enough.

Attackers exploit in 5 days. Defenders patch in 209. That gap gets worse every quarter.

Corvus doesn't find vulnerabilities. It decides what to do about them at machine speed, on local hardware, with no data leaving your environment.
## Model Details

| Property | Value |
|----------|-------|
| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
| Fine-tuning method | QLoRA (r=16, alpha=32, dropout=0.05) |
| Training examples | 50,000+ labeled vulnerability triage decisions |
| Training compute | 8x NVIDIA A100 (Vertex AI), ~1.2 hours |
| Priority accuracy | 94.8% |
| Full match (priority + action) | 82.7% |
| Training loss (final) | 0.461 |
| Throughput | 11.9 samples/sec |

## Intended Use

**Use this model for:** Prioritizing and triaging software vulnerabilities in security operations workflows: deciding which CVEs need immediate attention, which can wait for scheduled patching, and which are candidates for risk acceptance.

**Do not use this model for:** Generating exploits, finding vulnerabilities, offensive security operations, or any purpose that could harm system security. This is a defensive triage tool.

**Human oversight required:** Model outputs are recommendations, not autonomous decisions. All triage decisions should be reviewed by a qualified security professional before any action is taken.

## Output Format

Corvus outputs structured JSON with five fields:

```json
{
  "severityOverride": "EPSS in top 1% with active exploitation; upgrading from MEDIUM to CRITICAL",
  "priority": "CRITICAL",
  "recommendedAction": "PATCH_IMMEDIATELY",
  "reasoning": "CVE-2024-XXXXX affects the authentication module in a production-facing service. EPSS score of 0.94 indicates high exploitation probability. Listed in CISA KEV with a remediation deadline. The affected package is a direct dependency with no available workaround. Asset is internet-facing with access to PII.",
  "confidenceScore": 0.92
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `severityOverride` | `string \| null` | Explanation if the model's priority differs from the raw CVSS severity |
| `priority` | `string` | `CRITICAL`, `HIGH`, `MEDIUM`, or `LOW` |
| `recommendedAction` | `string` | One of 6 actions (see Action Taxonomy) |
| `reasoning` | `string` | Detailed explanation referencing specific technical factors |
| `confidenceScore` | `number` | 0.0 to 1.0: the model's confidence in the decision |

### Action Taxonomy

| Action | When to use |
|--------|-------------|
| `PATCH_IMMEDIATELY` | Active exploitation, critical asset, no workaround |
| `SCHEDULE_PATCH` | Important but not actively exploited, patch available |
| `MITIGATE` | Patch unavailable or risky; apply compensating controls |
| `ACCEPT_RISK` | Low impact, unreachable code path, network-isolated asset |
| `INVESTIGATE` | Insufficient data to make a confident decision |
| `DEFER` | Non-critical, low EPSS, no KEV listing, internal-only asset |

## Availability

If distribution resumes, CVERiskPilot will publish updated instructions here. For evaluation, partnership, or commercial licensing inquiries, contact [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com).
## Input Format

The model expects vulnerability data as a newline-separated key-value string:

```
CVE: CVE-2024-3094
Title: XZ Utils Backdoor
Severity: CRITICAL
CVSS: 10.0
EPSS: 0.97
KEV: Yes
Package: xz-utils@5.6.0
Description: Malicious backdoor in XZ Utils compression library allowing unauthorized access via modified liblzma in SSH authentication path
```

### Supported Fields

| Field | Required | Description |
|-------|----------|-------------|
| `Title` | Yes | Vulnerability title or summary |
| `CVE` | No | CVE identifier(s), comma-separated |
| `Severity` | No | CVSS severity label (CRITICAL/HIGH/MEDIUM/LOW) |
| `CVSS` | No | CVSS base score (0.0-10.0) |
| `EPSS` | No | EPSS exploitation probability (0.0-1.0) |
| `KEV` | No | CISA Known Exploited Vulnerabilities listing (Yes/No) |
| `Package` | No | Affected package name and version |
| `Description` | No | Vulnerability description (truncated to 500 chars) |

The model performs best when given more context; supplying EPSS, KEV, and CVSS together produces the most accurate triage decisions.

## Training Data

The model was trained on 50,000+ labeled vulnerability triage examples produced by the following synthetic data pipeline, which includes a 6-layer quality gate (step 4):

1. **Real CVE data** from NVD, GHSA, OSV, and ExploitDB
2. **Enrichment** with EPSS scores, KEV status, and CVSS vectors
3. **Synthetic triage decisions** generated by Claude with domain-specific prompting
4. **6-layer quality gate**: schema validation, field completeness, reasoning coherence, action-priority alignment, confidence calibration, cross-reference consistency
5. **Class balancing** across all 6 action types and 4 priority levels
6. **Human review** of edge cases and override patterns

The training data is not included in this release.
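
To assemble the key-value string described under Input Format from structured scanner or enrichment data, a client has to order the fields, omit missing optional ones, and respect the 500-character Description limit. Below is a minimal Python sketch; `build_triage_input` is a hypothetical helper, and its specific behaviors (joining multiple CVE IDs with `", "`, collapsing whitespace so each value stays on one line, and truncating on the client side) are assumptions rather than documented Corvus behavior.

```python
# Field order follows the Input Format example; only "Title" is required.
FIELD_ORDER = ["CVE", "Title", "Severity", "CVSS", "EPSS", "KEV", "Package", "Description"]
MAX_DESCRIPTION_CHARS = 500  # documented Description limit


def build_triage_input(fields: dict) -> str:
    """Render a dict of supported fields as the newline-separated key-value prompt.

    Hypothetical helper. Optional fields that are absent or None are omitted.
    """
    if not fields.get("Title"):
        raise ValueError("Title is required")
    unknown = set(fields) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")

    lines = []
    for key in FIELD_ORDER:
        value = fields.get(key)
        if value is None:
            continue
        if key == "CVE" and isinstance(value, (list, tuple)):
            value = ", ".join(value)  # assumption: ", " between multiple CVE IDs
        elif key == "KEV" and isinstance(value, bool):
            value = "Yes" if value else "No"
        # Collapse newlines and runs of whitespace so each value stays on one line.
        text = " ".join(str(value).split())
        if key == "Description":
            text = text[:MAX_DESCRIPTION_CHARS]  # client-side truncation (assumption)
        lines.append(f"{key}: {text}")
    return "\n".join(lines)
```

For example, passing `{"CVE": ["CVE-2024-3094"], "Title": "XZ Utils Backdoor", "CVSS": 10.0, "KEV": True}` yields the lines `CVE: CVE-2024-3094`, `Title: XZ Utils Backdoor`, `CVSS: 10.0`, and `KEV: Yes`, in the same order as the example above.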
## Evaluation

Evaluated on a held-out test set of 5,000 examples:

| Metric | Score |
|--------|-------|
| Priority accuracy (4-class) | 94.8% |
| Action accuracy (6-class) | 84.4% |
| Full match (priority + action) | 82.7% |
| Confidence calibration (ECE, lower is better) | 0.08 |

### Known Limitations

- **Trained on public CVE data only.** The model has no knowledge of proprietary or internal vulnerability disclosures.
- **No asset topology reasoning.** The model uses asset context fields provided in the input but cannot reason about network topology or dependency chains on its own.
- **English only.** Training data consists exclusively of English-language CVE descriptions.
- **Temporal cutoff.** Training data includes CVEs through early 2026. The model may be less accurate on novel vulnerability classes that emerge after this date.
- **Not a scanner.** Corvus triages known vulnerabilities. It does not discover, detect, or exploit vulnerabilities.

## Ethical Considerations

This model is designed exclusively for defensive security operations. It helps security teams prioritize remediation work; it is not a tool for bypassing security controls.

We release it openly because we believe defensive AI capabilities should not be gated behind enterprise contracts while offensive AI capabilities continue to advance. Security teams at organizations of every size deserve access to intelligent triage.

The model outputs recommendations, not autonomous actions. Every decision should be reviewed by a qualified professional before implementation.
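
The Evaluation section above reports exact-match accuracies and an expected calibration error (ECE). For teams that want to compute comparable numbers on their own labeled findings, here is a minimal Python sketch. The card does not say how its ECE was binned or what counted as a correct prediction, so this sketch assumes 10 equal-width bins over `confidenceScore` and treats a full match (priority and action both correct) as correct; `Example`, `accuracy_metrics`, and `expected_calibration_error` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Example:
    true_priority: str
    true_action: str
    pred_priority: str
    pred_action: str
    confidence: float  # the model's confidenceScore for this prediction


def _full_match(e: Example) -> bool:
    return e.pred_priority == e.true_priority and e.pred_action == e.true_action


def accuracy_metrics(examples: list[Example]) -> dict[str, float]:
    """Priority, action, and full-match accuracy as fractions in [0, 1]."""
    if not examples:
        raise ValueError("need at least one example")
    n = len(examples)
    return {
        "priority_accuracy": sum(e.pred_priority == e.true_priority for e in examples) / n,
        "action_accuracy": sum(e.pred_action == e.true_action for e in examples) / n,
        "full_match": sum(_full_match(e) for e in examples) / n,
    }


def expected_calibration_error(examples: list[Example], n_bins: int = 10) -> float:
    """Equal-width binned ECE, counting a full match as a correct prediction."""
    if not examples:
        raise ValueError("need at least one example")
    bins: list[list[Example]] = [[] for _ in range(n_bins)]
    for e in examples:
        # A confidence of exactly 1.0 goes into the last bin, not past the end.
        idx = min(int(e.confidence * n_bins), n_bins - 1)
        bins[idx].append(e)

    n = len(examples)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(e.confidence for e in bucket) / len(bucket)
        acc = sum(_full_match(e) for e in bucket) / len(bucket)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(bucket) / n * abs(acc - avg_conf)
    return ece
```

An ECE of 0.0 means that, within every bin, average confidence equals observed accuracy. Changing the bin count or the correctness criterion changes the result, so numbers computed this way are not directly comparable to the 0.08 reported above.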
## Training Procedure

- **Method:** QLoRA (4-bit quantization + Low-Rank Adaptation)
- **Rank:** 16
- **Alpha:** 32
- **Dropout:** 0.05
- **Learning rate:** 2e-4 with cosine schedule
- **Epochs:** 3
- **Batch size:** 4 per device, gradient accumulation 4 (effective batch size 128 across 8 GPUs: 4 × 4 × 8)
- **Optimizer:** AdamW (8-bit)
- **Max sequence length:** 2048
- **Compute:** 8x NVIDIA A100 80GB (Vertex AI Custom Job)
- **Training time:** ~1.2 hours (4,432 seconds)
- **Cost:** ~$30 (Vertex AI spot pricing)

### Quantization

GGUF quantization performed with `llama-cpp-python`:

| Quantization | Size | Quality | Use case |
|-------------|------|---------|----------|
| f16 | 16 GB | Unquantized (16-bit) | Research, benchmarking |
| Q4_K_M | 4.6 GB | Minimal loss | Production, single GPU |

### Framework Versions

- PEFT 0.18.1
- TRL 1.0.0
- Transformers 5.5.0
- PyTorch 2.7.1+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2

## Citation

```bibtex
@misc{corvus-v2-2026,
  title={Corvus v2: A Fine-Tuned Language Model for Vulnerability Triage},
  author={CVERiskPilot},
  year={2026},
  url={https://huggingface.co/CVRP/corvus-v2-8b},
  note={QLoRA fine-tuned Llama 3.1 8B on 50K+ vulnerability triage examples}
}
```

## License

- **Model weights:** [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) + CVERiskPilot Commercial Use Addendum (see below)
- **Modelfile, documentation, and evaluation code:** Apache 2.0

### CVERiskPilot Commercial Use Addendum

This model is released under the Llama 3.1 Community License with the following additional terms from CVERiskPilot LLC:

**Permitted use:**

- Internal vulnerability triage within your organization
- Research, benchmarking, and academic use
- Integration into internal security tooling and workflows
- Fine-tuning for your own internal use
- Educational and non-commercial use

**Restricted use (requires a commercial license from CVERiskPilot LLC):**

- Offering this model or any derivative as a hosted, managed, or API-accessible vulnerability triage service to third parties
- Embedding this model or any derivative in a commercial product sold or licensed to third parties
- Redistributing fine-tuned versions for commercial purposes

**Trademark notice:**

- No trademark license is granted under this repository, the Llama 3.1 Community License, or this addendum.
- "Corvus", "Corvus AI", "CVERiskPilot", and related logos are claimed trademarks or common-law marks of CVERiskPilot LLC.
- Derivative works may not use the "Corvus™" or "CVERiskPilot™" names, logos, or branding without written permission.
- You may make factual nominative reference to Corvus v2 only to identify the original model, provided that use does not imply endorsement, affiliation, certification, or sponsorship by CVERiskPilot LLC.

For commercial licensing inquiries: [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com)

## Provenance and First Use

Corvus™ was created by CVERiskPilot LLC and has been in continuous development and commercial use since January 2026.

| Milestone | Date |
|-----------|------|
| CVERiskPilot LLC incorporated (Texas) | 2026 |
| Corvus v1 (Strix) internal deployment | January 2026 |
| Corvus v2 training data pipeline (50K+ examples) | February–March 2026 |
| Corvus v2 QLoRA training completed (Vertex AI) | April 5, 2026 |
| Corvus v2 deployed to production (CVERiskPilot platform) | April 8, 2026 |
| Corvus v2 public Hugging Face documentation repository | April 2026 |
| NVIDIA Inception program membership | Active |

All training artifacts, commit history, GCP job logs, and deployment records are retained by CVERiskPilot LLC as evidence of continuous use and first use in commerce.

## Contact

- **Website:** [cveriskpilot.com](https://cveriskpilot.com)
- **LinkedIn:** [CVERiskPilot](https://linkedin.com/company/cveriskpilot)
- **Commercial licensing:** [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com)

CVERiskPilot LLC | 100% Veteran Owned | Texas, USA