---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
model_name: corvus-v2-8b
license: llama3.1
language:
  - en
tags:
  - security
  - vulnerability-triage
  - cybersecurity
  - compliance
  - lora
  - qlora
  - sft
  - transformers
  - trl
pipeline_tag: text-generation
datasets:
  - custom
model_type: llama
---

# Corvus™ v2 — Vulnerability Triage Model

**Corvus™ v2** is a fine-tuned Llama 3.1 8B model that produces structured triage decisions for software vulnerabilities. Given CVE data, CVSS scores, EPSS probability, KEV listing status, and asset context, it outputs a JSON decision with priority, recommended action, reasoning, and confidence score.

Built by [CVERiskPilot](https://cveriskpilot.com) — 100% Veteran Owned, Texas, USA.

## Distribution Status

Corvus v2 weights are **not currently distributed on Hugging Face**.

This repository remains public for model documentation, licensing terms, and release-status updates while distribution strategy is under review.

## Why This Exists

Offensive AI is accelerating. AI fuzzers are finding thousands of zero-days across every major codebase. The scanning problem is being solved. The triage problem is getting 10x harder.

Security teams are drowning in findings they can't prioritize fast enough. Attackers exploit in 5 days. Defenders patch in 209. That gap gets worse every quarter.

Corvus doesn't find vulnerabilities. It decides what to do about them — at machine speed, on local hardware, with no data leaving your environment.

## Model Details

| Property | Value |
|----------|-------|
| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
| Fine-tuning method | QLoRA (r=16, alpha=32, dropout=0.05) |
| Training examples | 50,000+ labeled vulnerability triage decisions |
| Training compute | 8x NVIDIA A100 (Vertex AI), ~1.2 hours |
| Priority accuracy | 94.8% |
| Full match (priority + action) | 82.7% |
| Training loss (final) | 0.461 |
| Throughput | 11.9 samples/sec |

## Intended Use

**Use this model for:** Prioritizing and triaging software vulnerabilities in security operations workflows. Deciding which CVEs need immediate attention vs. scheduled patching vs. risk acceptance.

**Do not use this model for:** Generating exploits, finding vulnerabilities, offensive security operations, or any purpose that could harm system security. This is a defensive triage tool.

**Human oversight required:** Model outputs are recommendations, not autonomous decisions. All triage decisions should be reviewed by a qualified security professional before action.

## Output Format

Corvus outputs structured JSON with five fields:

```json
{
  "severityOverride": "EPSS in top 1% with active exploitation — upgrading from MEDIUM to CRITICAL",
  "priority": "CRITICAL",
  "recommendedAction": "PATCH_IMMEDIATELY",
  "reasoning": "CVE-2024-XXXXX affects the authentication module in a production-facing service. EPSS score of 0.94 indicates high exploitation probability. Listed in CISA KEV with a remediation deadline. The affected package is a direct dependency with no available workaround. Asset is internet-facing with access to PII.",
  "confidenceScore": 0.92
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `severityOverride` | `string \| null` | Explanation if the model's priority differs from raw CVSS severity |
| `priority` | `string` | `CRITICAL`, `HIGH`, `MEDIUM`, or `LOW` |
| `recommendedAction` | `string` | One of 6 actions (see below) |
| `reasoning` | `string` | Detailed explanation referencing specific technical factors |
| `confidenceScore` | `number` | 0.0 to 1.0 — model's confidence in the decision |

### Action Taxonomy

| Action | When to use |
|--------|-------------|
| `PATCH_IMMEDIATELY` | Active exploitation, critical asset, no workaround |
| `SCHEDULE_PATCH` | Important but not actively exploited, patch available |
| `MITIGATE` | Patch unavailable or risky — apply compensating controls |
| `ACCEPT_RISK` | Low impact, unreachable code path, network-isolated asset |
| `INVESTIGATE` | Insufficient data to make a confident decision |
| `DEFER` | Non-critical, low EPSS, no KEV listing, internal-only asset |
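
Because the schema is fixed, downstream code can reject malformed responses before anyone acts on them. The following is a minimal Python sketch, not part of any Corvus tooling: `parse_triage_decision` is a hypothetical helper name, and treating a missing `severityOverride` key the same as `null` is an assumption.

```python
import json

# Allowed values copied from the Fields and Action Taxonomy tables above.
PRIORITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
ACTIONS = {
    "PATCH_IMMEDIATELY", "SCHEDULE_PATCH", "MITIGATE",
    "ACCEPT_RISK", "INVESTIGATE", "DEFER",
}


def parse_triage_decision(raw: str) -> dict:
    """Parse a model response and check it against the documented schema.

    Hypothetical helper: raises ValueError on the first violation found.
    """
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(decision, dict):
        raise ValueError("response must be a JSON object")

    # Assumption: an absent severityOverride is treated like null.
    override = decision.get("severityOverride")
    if override is not None and not isinstance(override, str):
        raise ValueError("severityOverride must be a string or null")

    priority = decision.get("priority")
    if not isinstance(priority, str) or priority not in PRIORITIES:
        raise ValueError(f"unexpected priority: {priority!r}")

    action = decision.get("recommendedAction")
    if not isinstance(action, str) or action not in ACTIONS:
        raise ValueError(f"unexpected recommendedAction: {action!r}")

    if not isinstance(decision.get("reasoning"), str):
        raise ValueError("reasoning must be a string")

    score = decision.get("confidenceScore")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("confidenceScore must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidenceScore must be between 0.0 and 1.0")

    return decision
```

A response that fails validation can be routed to manual review rather than silently dropped, which keeps the human-oversight requirement intact.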

## Availability

If distribution resumes, CVERiskPilot will publish updated instructions here.

For evaluation, partnership, or commercial licensing inquiries, contact [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com).

## Input Format

The model expects vulnerability data as a newline-separated key-value string:

```
CVE: CVE-2024-3094
Title: XZ Utils Backdoor
Severity: CRITICAL
CVSS: 10.0
EPSS: 0.97
KEV: Yes
Package: xz-utils@5.6.0
Description: Malicious backdoor in XZ Utils compression library allowing unauthorized access via modified liblzma in SSH authentication path
```

### Supported Fields

| Field | Required | Description |
|-------|----------|-------------|
| `Title` | Yes | Vulnerability title or summary |
| `CVE` | No | CVE identifier(s), comma-separated |
| `Severity` | No | CVSS severity label (CRITICAL/HIGH/MEDIUM/LOW) |
| `CVSS` | No | CVSS base score (0.0-10.0) |
| `EPSS` | No | EPSS exploitation probability (0.0-1.0) |
| `KEV` | No | CISA Known Exploited Vulnerabilities listing (Yes/No) |
| `Package` | No | Affected package name and version |
| `Description` | No | Vulnerability description (truncated to 500 chars) |

The model performs best with more context. Providing EPSS, KEV, and CVSS together produces the most accurate triage decisions.
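
As an illustration of the format, the sketch below assembles the key-value string from a Python dictionary. It is not an official client: `build_triage_input` is a hypothetical helper, the field order simply mirrors the example above, and truncating `Description` on the caller's side and joining multiple CVE IDs with `", "` are assumptions.

```python
# Field order follows the example above; only Title is required.
FIELD_ORDER = [
    "CVE", "Title", "Severity", "CVSS", "EPSS", "KEV", "Package", "Description",
]
MAX_DESCRIPTION_CHARS = 500  # limit stated in the Supported Fields table


def build_triage_input(finding: dict) -> str:
    """Render a finding as newline-separated 'Key: value' lines.

    Hypothetical helper: optional fields that are missing are skipped.
    """
    if not finding.get("Title"):
        raise ValueError("Title is required")

    lines = []
    for field in FIELD_ORDER:
        value = finding.get(field)
        if value is None:
            continue
        if field == "KEV" and isinstance(value, bool):
            text = "Yes" if value else "No"
        elif field == "CVE" and isinstance(value, (list, tuple)):
            text = ", ".join(value)
        else:
            text = str(value)
        if field == "Description":
            text = text[:MAX_DESCRIPTION_CHARS]
        if text:
            lines.append(f"{field}: {text}")
    return "\n".join(lines)


print(build_triage_input({
    "CVE": ["CVE-2024-3094"],
    "Title": "XZ Utils Backdoor",
    "Severity": "CRITICAL",
    "CVSS": 10.0,
    "EPSS": 0.97,
    "KEV": True,
    "Package": "xz-utils@5.6.0",
}))
```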

## Training Data

The model was trained on 50,000+ labeled vulnerability triage examples generated through a synthetic data pipeline with 6-layer quality validation:

1. **Real CVE data** from NVD, GHSA, OSV, and ExploitDB
2. **Enrichment** with EPSS scores, KEV status, and CVSS vectors
3. **Synthetic triage decisions** generated by Claude with domain-specific prompting
4. **6-layer quality gate** — schema validation, field completeness, reasoning coherence, action-priority alignment, confidence calibration, cross-reference consistency
5. **Class balancing** across all 6 action types and 4 priority levels
6. **Human review** of edge cases and override patterns

The training data is not included in this release.

## Evaluation

Evaluated on a held-out test set of 5,000 examples:

| Metric | Score |
|--------|-------|
| Priority accuracy (4-class) | 94.8% |
| Action accuracy (6-class) | 84.4% |
| Full match (priority + action) | 82.7% |
| Confidence calibration (ECE) | 0.08 |
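
For readers who want to compute comparable numbers on their own labeled data, here is a minimal Python sketch of the four metrics. It is not the evaluation code used for the table: `triage_metrics` is a hypothetical function, and the ECE definition (10 equal-width confidence bins, with "correct" meaning a full priority + action match) is an assumption, since the exact binning is not documented here.

```python
def triage_metrics(predictions, labels, n_bins=10):
    """Compute accuracy metrics and expected calibration error (ECE).

    predictions: dicts with priority, recommendedAction, confidenceScore.
    labels: dicts with priority and recommendedAction.
    """
    n = len(labels)
    if n == 0 or n != len(predictions):
        raise ValueError("predictions and labels must be non-empty and aligned")

    priority_hits = action_hits = full_hits = 0
    # Each bin tracks [example count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]

    for pred, gold in zip(predictions, labels):
        priority_ok = pred["priority"] == gold["priority"]
        action_ok = pred["recommendedAction"] == gold["recommendedAction"]
        full_ok = priority_ok and action_ok
        priority_hits += priority_ok
        action_hits += action_ok
        full_hits += full_ok

        conf = pred["confidenceScore"]
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += full_ok

    # ECE: weighted mean of |accuracy - mean confidence| over non-empty bins.
    ece = sum(
        abs(correct / count - conf_sum / count) * count / n
        for count, conf_sum, correct in bins
        if count
    )
    return {
        "priority_accuracy": priority_hits / n,
        "action_accuracy": action_hits / n,
        "full_match": full_hits / n,
        "ece": ece,
    }
```

Full match can never exceed either of the two individual accuracies, which is consistent with the table: 82.7% is below both 94.8% and 84.4%.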

### Known Limitations

- **Trained on public CVE data only.** The model has no knowledge of proprietary or internal vulnerability disclosures.
- **No asset topology reasoning.** The model uses asset context fields provided in the input but cannot reason about network topology or dependency chains on its own.
- **English only.** Training data is exclusively English-language CVE descriptions.
- **Temporal cutoff.** Training data includes CVEs through early 2026. The model may be less accurate on novel vulnerability classes that emerge after this date.
- **Not a scanner.** Corvus triages known vulnerabilities. It does not discover, detect, or exploit vulnerabilities.

## Ethical Considerations

This model is designed exclusively for defensive security operations. It helps security teams prioritize remediation work, not bypass security controls.

We release it openly because we believe defensive AI capabilities should not be gated behind enterprise contracts while offensive AI capabilities continue to advance. Security teams at organizations of every size deserve access to intelligent triage.

The model outputs recommendations, not autonomous actions. Every decision should be reviewed by a qualified professional before implementation.

## Training Procedure

- **Method:** QLoRA (4-bit quantization + Low-Rank Adaptation)
- **Rank:** 16
- **Alpha:** 32
- **Dropout:** 0.05
- **Learning rate:** 2e-4 with cosine schedule
- **Epochs:** 3
- **Batch size:** 4 per device, gradient accumulation 4 (effective batch 128 on 8 GPUs)
- **Optimizer:** AdamW (8-bit)
- **Max sequence length:** 2048
- **Compute:** 8x NVIDIA A100 80GB (Vertex AI Custom Job)
- **Training time:** 1.2 hours (4,432 seconds)
- **Cost:** ~$30 (Vertex AI spot pricing)
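
The batch-size and training-time figures above follow from simple arithmetic, reproduced in the Python sketch below. The step counts are only a rough estimate: they assume exactly 50,000 training examples, whereas the card states "50,000+", and the variable names are illustrative.

```python
import math

per_device_batch = 4
grad_accum_steps = 4
num_gpus = 8

# Sequences contributing to each optimizer step across all devices.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 128

training_seconds = 4432
print(round(training_seconds / 3600, 2))  # 1.23 hours

# Assumption: exactly 50,000 examples (the card only says "50,000+").
assumed_examples = 50_000
epochs = 3
steps_per_epoch = math.ceil(assumed_examples / effective_batch)
print(steps_per_epoch, steps_per_epoch * epochs)  # 391 1173
```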

### Quantization

GGUF quantization was performed with `llama-cpp-python`:

| Quantization | Size | Quality | Use case |
|-------------|------|---------|----------|
| f16 | 16 GB | Full precision | Research, benchmarking |
| Q4_K_M | 4.6 GB | Minimal loss | Production, single GPU |

### Framework Versions

- PEFT 0.18.1
- TRL 1.0.0
- Transformers 5.5.0
- PyTorch 2.7.1+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2

## Citation

```bibtex
@misc{corvus-v2-2026,
  title={Corvus v2: A Fine-Tuned Language Model for Vulnerability Triage},
  author={CVERiskPilot},
  year={2026},
  url={https://huggingface.co/CVRP/corvus-v2-8b},
  note={QLoRA fine-tuned Llama 3.1 8B on 50K+ vulnerability triage examples}
}
```

## License

- **Model weights:** [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) + CVERiskPilot Commercial Use Addendum (see below)
- **Modelfile, documentation, and evaluation code:** Apache 2.0

### CVERiskPilot Commercial Use Addendum

This model is released under the Llama 3.1 Community License with the following additional terms from CVERiskPilot LLC:

**Permitted use:**
- Internal vulnerability triage within your organization
- Research, benchmarking, and academic use
- Integration into internal security tooling and workflows
- Fine-tuning for your own internal use
- Educational and non-commercial use

**Restricted use (requires a commercial license from CVERiskPilot LLC):**
- Offering this model or any derivative as a hosted, managed, or API-accessible vulnerability triage service to third parties
- Embedding this model or any derivative in a commercial product sold or licensed to third parties
- Redistributing fine-tuned versions for commercial purposes

**Trademark notice:**
- No trademark license is granted under this repository, the Llama 3.1 Community License, or this addendum.
- "Corvus", "Corvus AI", "CVERiskPilot", and related logos are claimed trademarks or common-law marks of CVERiskPilot LLC.
- Derivative works may not use the "Corvus™" or "CVERiskPilot™" names, logos, or branding without written permission.
- You may make factual nominative reference to Corvus v2 only to identify the original model, provided that use does not imply endorsement, affiliation, certification, or sponsorship by CVERiskPilot LLC.

For commercial licensing inquiries: [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com)

## Provenance and First Use

Corvus™ was created by CVERiskPilot LLC and has been in continuous development and commercial use since January 2026.

| Milestone | Date |
|-----------|------|
| CVERiskPilot LLC incorporated (Texas) | 2026 |
| Corvus v1 (Strix) internal deployment | January 2026 |
| Corvus v2 training data pipeline (50K+ examples) | February–March 2026 |
| Corvus v2 QLoRA training completed (Vertex AI) | April 5, 2026 |
| Corvus v2 deployed to production (CVERiskPilot platform) | April 8, 2026 |
| Corvus v2 public Hugging Face documentation repository | April 2026 |
| NVIDIA Inception program membership | Active |

All training artifacts, commit history, GCP job logs, and deployment records are retained by CVERiskPilot LLC as evidence of continuous use and first use in commerce.

## Contact

- **Website:** [cveriskpilot.com](https://cveriskpilot.com)
- **LinkedIn:** [CVERiskPilot](https://linkedin.com/company/cveriskpilot)
- **Commercial licensing:** [sales@cveriskpilot.com](mailto:sales@cveriskpilot.com)

CVERiskPilot LLC | 100% Veteran Owned | Texas, USA