---
library_name: peft
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- veris
- cybersecurity
- incident-classification
- lora
- qlora
- mistral
datasets:
- vibesecurityguy/veris-classifier-training
- vibesecurityguy/veris-incident-classifications
language:
- en
pipeline_tag: text-generation
---
# VERIS Classifier v2
A fine-tuned [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) model that classifies cybersecurity incident descriptions into the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.
Given a plain-English incident description, the model outputs structured JSON with the correct VERIS categories for **action**, **actor**, **asset**, and **attribute**.
**[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)** — no API key required, runs on ZeroGPU.
## Example
**Input:**
> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.
**Output:**
```json
{
"action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
"actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
"asset": {"assets": [{"variety": "S - Database"}]},
"attribute": {"availability": {"variety": ["Obscuration"]}}
}
```
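Because the output is plain JSON, a caller can sanity-check it before trusting it. The following is a minimal sketch, not part of the released code: `check_axes` and `EXPECTED_AXES` are hypothetical names, and the check only covers the four top-level axes named above, not the full VERIS schema.

```python
import json

# The four top-level axes this README says the model emits.
EXPECTED_AXES = {"action", "actor", "asset", "attribute"}


def check_axes(raw_output: str) -> dict:
    """Parse model output and report missing or unexpected top-level keys."""
    parsed = json.loads(raw_output)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    keys = set(parsed)
    return {
        "parsed": parsed,
        "missing": sorted(EXPECTED_AXES - keys),
        "unexpected": sorted(keys - EXPECTED_AXES),
    }


# The example output from this section, as a string.
example_output = """
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
"""

report = check_axes(example_output)
print("missing:", report["missing"], "unexpected:", report["unexpected"])
```

A non-empty `missing` list is a reasonable signal to route the incident to manual review rather than store a partial record.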
## Training Details
| Parameter | Value |
|-----------|-------|
| **Base model** | mistralai/Mistral-7B-Instruct-v0.3 |
| **Method** | QLoRA (4-bit NF4 quantization + LoRA) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | All linear (q, k, v, o, gate, up, down) |
| **Training examples** | 9,813 train / 517 eval |
| **Epochs** | 3 |
| **Batch size** | 2 × 4 gradient-accumulation steps = 8 effective |
| **Learning rate** | 2e-4 (cosine schedule, 10% warmup) |
| **Precision** | bf16 |
| **Optimizer** | AdamW |
| **Max sequence length** | 2,048 tokens |
| **Hardware** | NVIDIA A10G (24GB VRAM) |
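As a rough worked example of what these settings imply, the sketch below estimates the number of trainable LoRA parameters and the length of the training schedule. The layer shapes are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); they are not stated in this README. The step counts are approximate, since the exact figure depends on how the trainer handles the final partial batch.

```python
import math

# Assumed layer shapes from the public Mistral-7B config (not from this README).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

# (in_features, out_features) for each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

RANK, ALPHA = 16, 32  # from the table above


def lora_param_count(rank: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGETS.values())
    return per_layer * NUM_LAYERS


trainable_params = lora_param_count(RANK)
lora_scaling = ALPHA / RANK  # multiplier applied to the low-rank update

# Schedule arithmetic from the table: 9,813 examples, batch 2 x 4, 3 epochs.
TRAIN_EXAMPLES, EPOCHS = 9_813, 3
EFFECTIVE_BATCH = 2 * 4
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(0.10 * total_steps)

print(f"trainable LoRA parameters: {trainable_params:,}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"~{steps_per_epoch} steps/epoch, ~{total_steps} total, ~{warmup_steps} warmup")
```

Under these assumptions the adapter holds about 42 million trainable parameters, well under 1% of the 7B base model, which is what makes training on a single 24GB GPU plausible.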
## Training Data
Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:
- **10,019 classification examples**: synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (VERIS Community Database) records, paired with their ground-truth VERIS classifications
- **311 Q&A pairs**: questions and answers about the VERIS framework itself
The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
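The counts in this section and in the training table can be reconciled with a little arithmetic. This is only a consistency check on the published numbers; how the eval split was drawn is not stated here, so the sketch makes no claim about it.

```python
# Figures quoted in this README.
classification_examples = 10_019
qa_pairs = 311
train_examples, eval_examples = 9_813, 517
source_incidents = 8_559

total_examples = classification_examples + qa_pairs
eval_fraction = eval_examples / total_examples
descriptions_per_incident = classification_examples / source_incidents

print(f"total examples: {total_examples:,}")
print(f"train + eval:   {train_examples + eval_examples:,}")
print(f"eval fraction:  {eval_fraction:.1%}")
print(f"descriptions per source incident: {descriptions_per_incident:.2f}")
```

The two totals agree, and the eval set is roughly 5% of the data. The ratio above 1 indicates that some VCDB incidents have more than one synthetic description.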
## How to Use
### With Transformers + PEFT
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "mistralai/Mistral-7B-Instruct-v0.3"
adapter = "vibesecurityguy/veris-classifier-v2"

# Load the base model in bf16 (the training precision; assumes a GPU with
# bf16 support), then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, device_map="auto", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
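The generation call returns text, not a parsed object, and the JSON may be surrounded by other tokens. The helper below is a minimal sketch for pulling the first JSON object out of a decoded string; `extract_first_json_object` is a hypothetical name, not part of this repository, and the sample string is made up for illustration.

```python
import json


def extract_first_json_object(text: str):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


# Made-up decoded output, used only to exercise the helper.
decoded = 'Here is the classification: {"action": {"error": {"variety": ["Loss"]}}} Done.'
classification = extract_first_json_object(decoded)
print(classification)
```

If the decoded string still includes the prompt and the prompt itself contains JSON, slice the prompt off first so the helper does not return the prompt's object instead of the model's answer.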
## Intended Use
This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry
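For structured data entry, nested output is often easier to review as flat rows. The sketch below turns a classification into `(path, value)` pairs with dotted paths. `flatten_classification` is a hypothetical helper, and the dotted-path layout is an illustrative choice rather than a format this model or repository defines.

```python
def flatten_classification(node, prefix=""):
    """Flatten nested dicts and lists into (dotted_path, value) rows."""
    rows = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            rows.extend(flatten_classification(value, path))
    elif isinstance(node, list):
        # List items share the parent path: one row per enumerated value.
        for item in node:
            rows.extend(flatten_classification(item, prefix))
    else:
        rows.append((prefix, node))
    return rows


# Part of the example output shown earlier in this README.
sample = {
    "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
    "asset": {"assets": [{"variety": "S - Database"}]},
}

rows = flatten_classification(sample)
for path, value in rows:
    print(f"{path} = {value}")
```

Each row can then be shown to an analyst as a single field to confirm or correct.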
## Limitations
- **VCDB bias**: Training data over-represents healthcare incidents (HIPAA mandates breach disclosure, so more of them are public) and US-based incidents
- **Schema version**: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
- **Not a replacement for human analysis**: Output should be reviewed by a security analyst
- **English only**: Trained on English-language incident descriptions
## Links
- **Live Demo:** [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
- **Training Data:** [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
- **Source Code:** [github.com/petershamoon/veris-classifier](https://github.com/petershamoon/veris-classifier)
- **VERIS Framework:** [verisframework.org](https://verisframework.org/)
## Model Card Authors
Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))