---
library_name: peft
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- veris
- cybersecurity
- incident-classification
- lora
- qlora
- mistral
datasets:
- vibesecurityguy/veris-classifier-training
- vibesecurityguy/veris-incident-classifications
language:
- en
pipeline_tag: text-generation
---
# VERIS Classifier v2
A fine-tuned [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) model that classifies cybersecurity incident descriptions into the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.
Given a plain-English incident description, the model outputs structured JSON with the correct VERIS categories for **action**, **actor**, **asset**, and **attribute**.
**[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)** — no API key required, runs on ZeroGPU.
## Example
**Input:**
> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.
**Output:**
```json
{
"action": {"malware": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
"actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
"asset": {"assets": [{"variety": "S - Database"}]},
"attribute": {"availability": {"variety": ["Obscuration"]}}
}
```
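One way to consume this structure downstream is to flatten it into rows. The following is a minimal sketch, not part of the model or its source code: `flatten_classification` is a hypothetical helper, the sample string is an abbreviated copy of the output above, and the code assumes the nesting shown there (a list of asset objects under `asset.assets`, and lists of strings under the other axes).

```python
import json

# Abbreviated copy of the example output above, used only for illustration.
sample = """
{
  "action": {"social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
"""


def flatten_classification(classification: dict) -> list[tuple[str, str, str, str]]:
    """Turn nested VERIS-style JSON into (axis, category, field, value) rows."""
    rows = []
    for axis, categories in classification.items():
        if axis == "asset":
            # Assets are a list of objects rather than a category mapping.
            for asset in categories.get("assets", []):
                rows.append((axis, "assets", "variety", asset["variety"]))
            continue
        for category, fields in categories.items():
            for field, values in fields.items():
                # Tolerate a bare string where a list is expected.
                if isinstance(values, str):
                    values = [values]
                for value in values:
                    rows.append((axis, category, field, value))
    return rows


rows = flatten_classification(json.loads(sample))
for row in rows:
    print(" / ".join(row))
```

Flat rows of this kind are easier to load into a spreadsheet or database than the nested form, at the cost of losing the grouping per incident.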
## Training Details
| Parameter | Value |
|-----------|-------|
| **Base model** | mistralai/Mistral-7B-Instruct-v0.3 |
| **Method** | QLoRA (4-bit NF4 quantization + LoRA) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | All linear (q, k, v, o, gate, up, down) |
| **Training examples** | 9,813 train / 517 eval |
| **Epochs** | 3 |
| **Batch size** | 2 x 4 gradient accumulation = 8 effective |
| **Learning rate** | 2e-4 (cosine schedule, 10% warmup) |
| **Precision** | bf16 |
| **Optimizer** | AdamW |
| **Max sequence length** | 2,048 tokens |
| **Hardware** | NVIDIA A10G (24GB VRAM) |
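To make the table concrete, the sketch below works out what these settings imply: the number of trainable LoRA parameters, the LoRA scaling factor, and the approximate shape of the learning-rate schedule. It rests on assumptions that are not stated in this card: the Mistral-7B layer shapes (hidden size 4096, key/value projection size 1024, MLP size 14336, 32 layers), a single GPU, and a schedule with linear warmup followed by cosine decay to zero. The actual training script may round step counts differently.

```python
import math

# Values taken from the table above.
RANK, ALPHA = 16, 32
TRAIN_EXAMPLES, EPOCHS = 9_813, 3
PER_STEP_BATCH, GRAD_ACCUM = 2, 4
PEAK_LR, WARMUP_FRACTION = 2e-4, 0.10

# Assumed Mistral-7B shapes (in_features, out_features); not stated in this card.
HIDDEN, KV_DIM, MLP_DIM, LAYERS = 4096, 1024, 14336, 32
TARGET_SHAPES = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, MLP_DIM),
    "up": (HIDDEN, MLP_DIM),
    "down": (MLP_DIM, HIDDEN),
}

# LoRA adds two low-rank matrices per adapted weight: A (r x in) and B (out x r).
lora_params = LAYERS * sum(RANK * (i + o) for i, o in TARGET_SHAPES.values())
lora_scale = ALPHA / RANK  # the low-rank update is multiplied by alpha / r

effective_batch = PER_STEP_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_FRACTION * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"Trainable LoRA parameters: {lora_params:,} (scale alpha/r = {lora_scale})")
print(f"Optimizer steps: {total_steps} total, {warmup_steps} warmup")
```

Under these assumptions the adapter trains roughly 42 million parameters, well under 1% of the 7B base model, which is what lets the run fit on a single 24GB GPU with the base weights quantized to 4 bits.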
## Training Data
Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:
- **10,019 classification examples**: synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (VERIS Community Database) records, paired with their ground-truth VERIS classifications
- **311 Q&A pairs**: questions and answers about the VERIS framework itself
The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
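As a quick sanity check, the counts above can be reconciled with the train/eval split reported under Training Details. This is only arithmetic on the numbers quoted in this card; the variable names are illustrative.

```python
# Counts quoted in this model card.
classification_examples = 10_019
qa_pairs = 311
train_examples, eval_examples = 9_813, 517
source_incidents = 8_559

total = classification_examples + qa_pairs
split_matches = total == train_examples + eval_examples
eval_fraction = eval_examples / total
descriptions_per_incident = classification_examples / source_incidents

print(f"{total:,} examples in total; split accounts for all of them: {split_matches}")
print(f"{eval_fraction:.1%} held out for evaluation")
print(f"about {descriptions_per_incident:.2f} descriptions per source incident")
```

The totals agree, so the Q&A pairs appear to be included in the same roughly 95/5 split as the classification examples.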
## How to Use
### With Transformers + PEFT
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "mistralai/Mistral-7B-Instruct-v0.3"
adapter = "vibesecurityguy/veris-classifier-v2"

# Load the base model in bf16, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]

# The chat template already inserts the BOS token, so do not add it a second time.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
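Generated text may contain more than the JSON object itself, for example an echoed prompt or trailing commentary. The sketch below shows one possible way to pull the first parseable JSON object out of a decoded string using only the standard library. `extract_first_json_object` is a hypothetical helper, and the `decoded` string is a made-up stand-in for model output, not a real generation.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object that parses cleanly in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Hypothetical decoded text, for illustration only.
decoded = 'Classification:\n{"action": {"error": {"variety": ["Loss"]}}}\nEnd of answer.'
print(extract_first_json_object(decoded))
```

If the decoded text still includes a prompt that itself contains JSON, this helper would return the prompt's object first, so it works best on the newly generated tokens only.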
## Intended Use
This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry
## Limitations
- **VCDB bias**: Training data over-represents healthcare (HIPAA mandatory disclosure) and US-based incidents
- **Schema version**: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
- **Not a replacement for human analysis**: Output should be reviewed by a security analyst
- **English only**: Trained on English-language incident descriptions
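Because output should be reviewed by an analyst, a pipeline might at least flag structurally incomplete classifications before they reach a reviewer. The sketch below is one hypothetical gate, not something shipped with the model: it only checks that the output is a JSON object with the four top-level axes named in this card, and it says nothing about whether the chosen categories are correct. It applies to classification output, not to free-text answers about VERIS.

```python
import json

# The four axes this card says the model outputs.
REQUIRED_AXES = ("action", "actor", "asset", "attribute")


def review_reasons(raw_output: str) -> list[str]:
    """List structural problems; an empty list does not mean the labels are right."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["output is not a JSON object"]
    reasons = []
    for axis in REQUIRED_AXES:
        value = parsed.get(axis)
        if not isinstance(value, dict) or not value:
            reasons.append(f"axis '{axis}' is missing or empty")
    return reasons


# Hypothetical partial output, for illustration only.
partial = '{"action": {"error": {"variety": ["Loss"]}}}'
for reason in review_reasons(partial):
    print(reason)
```

An empty result should still go to an analyst; the check only separates malformed output from output that is worth a human's time to read.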
## Links
- **Live Demo:** [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
- **Training Data:** [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
- **Source Code:** [github.com/petershamoon/veris-classifier](https://github.com/petershamoon/veris-classifier)
- **VERIS Framework:** [verisframework.org](https://verisframework.org/)
## Model Card Authors
Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))