VERIS Classifier v1
A fine-tuned Qwen2.5-7B-Instruct model that classifies cybersecurity incident descriptions into the VERIS (Vocabulary for Event Recording and Incident Sharing) framework.
Given a plain-English incident description, the model outputs structured JSON with its predicted VERIS categories for action, actor, asset, and attribute.
Try the live demo (see Links): no API key is required, and it runs on ZeroGPU.
Example
Input:
An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.
Output:
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
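For downstream use, it can help to flatten the nested output into dotted field paths. The following is a minimal sketch, assuming the output parses as JSON shaped like the example above. `flatten_veris` is a hypothetical helper written for illustration; it is not part of the model or of any VERIS tooling.

```python
import json

# Example output copied from the model card above.
raw_output = """
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
"""


def flatten_veris(node, prefix=""):
    """Hypothetical helper: flatten nested dicts into {dotted.path: leaf} pairs.

    Lists are treated as leaves so enumeration lists stay intact.
    """
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_veris(value, path))
        else:
            flat[path] = value
    return flat


classification = json.loads(raw_output)
flat = flatten_veris(classification)
for path, value in sorted(flat.items()):
    print(path, "=", value)
```

Keeping lists as leaves means a field such as `asset.assets` stays a list of objects rather than being split by index, which may or may not suit a given pipeline.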
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA (4-bit NF4 quantization + LoRA) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | All linear (q, k, v, o, gate, up, down) |
| Trainable parameters | 40.4M / 4.4B (0.92%) |
| Training examples | 9,813 train / 517 eval |
| Epochs | 3 |
| Batch size | 2 x 4 gradient accumulation steps = 8 effective |
| Learning rate | 2e-4 (cosine schedule, 10% warmup) |
| Precision | bf16 |
| Optimizer | AdamW |
| Max sequence length | 2,048 tokens |
| Hardware | NVIDIA A10G (24 GB VRAM) |
| Adapter size | 162 MB |
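Several figures in the table can be cross-checked with simple arithmetic. The sketch below relies on assumptions that are not stated in this card: the layer dimensions are taken from the published Qwen2.5-7B configuration, the adapter is assumed to be stored as 4-byte floats, and the step count is approximate because trainers round partial batches differently.

```python
import math

# Assumed Qwen2.5-7B dimensions from the base model's published config
# (not stated in this model card).
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size (MLP width)
KV_DIM = 4 * 128       # num_key_value_heads * head_dim
LAYERS = 28            # num_hidden_layers
RANK = 16              # LoRA rank from the table

# (in_features, out_features) of each targeted linear layer in one block.
targets = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, INTERMEDIATE),
    "up": (HIDDEN, INTERMEDIATE),
    "down": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per layer: r * (in + out) parameters.
per_block = sum(RANK * (n_in + n_out) for n_in, n_out in targets.values())
trainable = per_block * LAYERS
adapter_mb = trainable * 4 / 1e6  # assumes fp32 (4-byte) adapter weights

# Approximate schedule length; exact rounding depends on the trainer.
effective_batch = 2 * 4
steps_per_epoch = math.ceil(9813 / effective_batch)
total_steps = steps_per_epoch * 3
warmup_steps = int(0.10 * total_steps)

print(f"trainable params: {trainable:,} (~{trainable / 1e6:.1f}M)")
print(f"adapter size:     ~{adapter_mb:.1f} MB before file overhead")
print(f"optimizer steps:  ~{total_steps} ({warmup_steps} warmup)")
```

Under these assumptions the parameter count rounds to the 40.4M reported in the table, and the estimated adapter size lands close to the reported 162 MB.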
Training Data
Fine-tuned on vibesecurityguy/veris-classifier-training, which contains:
- 10,019 classification examples: synthetic incident descriptions generated from real VCDB (VERIS Community Database) records, paired with their ground-truth VERIS classifications
- 311 Q&A pairs — questions and answers about the VERIS framework itself
The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
How to Use
With Transformers + PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "vibesecurityguy/veris-classifier-v1"
# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]
# Render the chat template to text, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
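The decoded text is a string, and it may contain extra text around the JSON object. A minimal sketch for pulling out the first parseable object follows. `extract_first_json` is a hypothetical helper, and the `generated` string is made up for illustration; real model output may look different.

```python
import json


def extract_first_json(text):
    """Hypothetical helper: return the first decodable JSON object in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores anything that follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Made-up generated text for illustration only.
generated = 'Here is the classification:\n{"action": {"error": {"variety": ["Loss"]}}}\nDone.'
result = extract_first_json(generated)
print(result)
```

Returning `None` instead of raising lets the caller decide whether to retry generation or route the incident to manual classification.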
Intended Use
This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry
Limitations
- VCDB bias: Training data over-represents healthcare (where HIPAA mandates breach disclosure) and US-based incidents
- Schema version: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
- Not a replacement for human analysis: Output should be reviewed by a security analyst
- English only: Trained on English-language incident descriptions
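Because predictions need analyst review and the schema coverage is incomplete, a lightweight structural check can help decide which outputs to look at first. The sketch below is an assumption-laden illustration: `review_flags` is a hypothetical helper, and the category sets are written from memory of the VERIS schema, so they should be verified against the schema version in use. It checks structure only and does not validate individual variety values.

```python
# Top-level sections and category keys as recalled from the VERIS schema.
# Assumption: verify these sets against the schema version you target.
EXPECTED_SECTIONS = {"action", "actor", "asset", "attribute"}
KNOWN_CATEGORIES = {
    "action": {"malware", "hacking", "social", "misuse", "physical",
               "error", "environmental", "unknown"},
    "actor": {"external", "internal", "partner", "unknown"},
    "attribute": {"confidentiality", "integrity", "availability"},
}


def review_flags(classification):
    """Hypothetical check: list reasons a prediction needs analyst attention."""
    flags = []
    present_sections = set(classification)
    for section in sorted(EXPECTED_SECTIONS - present_sections):
        flags.append(f"missing section: {section}")
    for section in sorted(present_sections - EXPECTED_SECTIONS):
        flags.append(f"unexpected section: {section}")
    for section, known in KNOWN_CATEGORIES.items():
        present = classification.get(section, {})
        if not isinstance(present, dict):
            flags.append(f"malformed section: {section}")
            continue
        for category in sorted(set(present) - known):
            flags.append(f"unknown {section} category: {category}")
    return flags


# Made-up prediction with two deliberate problems.
prediction = {
    "action": {"hacking": {"variety": ["Unknown"]}, "ransom": {}},
    "actor": {"external": {"variety": ["Unaffiliated"]}},
    "asset": {"assets": []},
}
flags = review_flags(prediction)
print(flags)
```

An empty list means only that the structure looks plausible; the classification itself still needs human review.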
Links
- Live Demo: huggingface.co/spaces/vibesecurityguy/veris-classifier
- Training Data: huggingface.co/datasets/vibesecurityguy/veris-classifier-training
- Source Code: github.com/pshamoon/veris-classifier
- VERIS Framework: verisframework.org
Model Card Authors
Peter Shamoon (@vibesecurityguy)