VERIS Classifier v2

A fine-tuned Mistral-7B-Instruct-v0.3 model that classifies cybersecurity incident descriptions into the VERIS (Vocabulary for Event Recording and Incident Sharing) framework.

Given a plain-English incident description, the model outputs structured JSON with its predicted VERIS categories for action, actor, asset, and attribute.

Try the live demo: no API key is required, and it runs on ZeroGPU.

Example

Input:

An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.

Output:

{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
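
Downstream code will usually want to confirm that all four top-level axes are present before storing a result. The following is a minimal sketch, assuming the model's reply is a JSON string shaped like the example above. The helper name `summarize_veris` is hypothetical and is not part of the model or any library.

```python
import json

# The four top-level VERIS axes the model is described as emitting.
EXPECTED_AXES = ("action", "actor", "asset", "attribute")


def summarize_veris(raw_json: str) -> dict:
    """Parse a classification and list the category names under each axis.

    Hypothetical helper for illustration; raises ValueError if an axis is missing.
    """
    record = json.loads(raw_json)
    missing = [axis for axis in EXPECTED_AXES if axis not in record]
    if missing:
        raise ValueError(f"missing VERIS axes: {missing}")
    # Each axis maps category names (e.g. "social") to their details.
    return {axis: sorted(record[axis]) for axis in EXPECTED_AXES}


# The example output from this section, verbatim.
example = """{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}"""

print(summarize_veris(example))
```

A reply that fails this check is a reasonable candidate for manual review rather than automatic ingestion.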

Training Details

Parameter | Value
--- | ---
Base model | mistralai/Mistral-7B-Instruct-v0.3
Method | QLoRA (4-bit NF4 quantization + LoRA)
LoRA rank (r) | 16
LoRA alpha | 32
LoRA dropout | 0.05
Target modules | All linear (q, k, v, o, gate, up, down)
Training examples | 9,813 train / 517 eval
Epochs | 3
Batch size | 2 x 4 gradient accumulation = 8 effective
Learning rate | 2e-4 (cosine schedule, 10% warmup)
Precision | bf16
Optimizer | AdamW
Max sequence length | 2,048 tokens
Hardware | NVIDIA A10G (24 GB VRAM)
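
The table above can be restated as configuration values plus a little schedule arithmetic. This is a sketch under stated assumptions, not the author's training script: the dictionary keys follow the parameter names of `peft.LoraConfig` and `transformers.BitsAndBytesConfig`, the module names are the standard Mistral projection layers, and the step counts assume a single GPU and round the last partial batch up. Real trainers may round differently.

```python
import math

# Keyword arguments mirroring the table. Key names follow peft.LoraConfig;
# whether the author passed exactly these values this way is an assumption.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Key names follow transformers.BitsAndBytesConfig (4-bit NF4 quantization).
quant_kwargs = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}

# Schedule arithmetic derived from the table (single-GPU assumption).
train_examples = 9_813
per_device_batch = 2
grad_accum_steps = 4
epochs = 3
warmup_ratio = 0.10

effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = round(total_steps * warmup_ratio)

# Standard LoRA scales the adapter update by alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, lora_scale)
# → 8 1227 3681 368 2.0
```

Under these assumptions the run is roughly 3,700 optimizer steps, with the learning rate warming up over the first 370 or so before the cosine decay begins.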

Training Data

Fine-tuned on vibesecurityguy/veris-classifier-training, which contains:

  • 10,019 classification examples: synthetic incident descriptions generated from real VCDB (VERIS Community Database) records, paired with their ground-truth VERIS classifications
  • 311 Q&A pairs: questions and answers about the VERIS framework itself

The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
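
As a quick consistency check, the dataset composition above can be reconciled with the train/eval split reported under Training Details. The sketch below uses only figures stated in this card; the variable names are illustrative.

```python
# Dataset composition as stated in this section.
classification_examples = 10_019
qa_pairs = 311

# Split sizes as stated under Training Details.
train_split = 9_813
eval_split = 517

total = classification_examples + qa_pairs
splits_match = total == train_split + eval_split
eval_fraction = eval_split / total

print(total, splits_match, f"{eval_fraction:.1%}")
# → 10330 True 5.0%
```

The two breakdowns agree at 10,330 examples, with about 5% held out for evaluation.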

How to Use

With Transformers + PEFT

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "mistralai/Mistral-7B-Instruct-v0.3"
adapter = "vibesecurityguy/veris-classifier-v2"

# Load the base model in bf16 (the training precision), then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]

# Render the chat prompt as text, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
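
The decoded text may contain more than the JSON itself, such as the echoed prompt or trailing commentary. The following is a minimal sketch of pulling out the first JSON object using only the standard library. The function name `extract_first_json` and the sample string are hypothetical; the sample is hand-written for illustration and is not real model output.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Hand-written sample standing in for decoded generation text.
sample = 'Classification: {"action": {"error": {"variety": ["Loss"]}}} End of answer.'
print(extract_first_json(sample))
print(extract_first_json("no JSON here"))
```

Returning `None` instead of raising lets the caller decide whether to retry generation or route the incident to an analyst.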

Intended Use

This model is designed for:

  • Classifying cybersecurity incidents into the VERIS framework
  • Answering questions about VERIS categories and taxonomy
  • Assisting incident response teams with structured data entry

Limitations

  • VCDB bias: Training data over-represents US-based incidents and healthcare incidents, the latter because HIPAA mandates breach disclosure
  • Schema version: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
  • Not a replacement for human analysis: Output should be reviewed by a security analyst
  • English only: Trained on English-language incident descriptions

Model Card Authors

Peter Shamoon (@vibesecurityguy)
