---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - veris
  - cybersecurity
  - incident-classification
  - lora
  - qlora
  - qwen2
datasets:
  - vibesecurityguy/veris-classifier-training
  - vibesecurityguy/veris-incident-classifications
language:
  - en
pipeline_tag: text-generation
---

# VERIS Classifier v1

A fine-tuned [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model that classifies cybersecurity incident descriptions into the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.

Given a plain-English incident description, the model outputs structured JSON with the correct VERIS categories for **action**, **actor**, **asset**, and **attribute**.

**[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)** — no API key required, runs on ZeroGPU.

## Example

**Input:**
> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.

**Output:**
```json
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
```

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | Qwen/Qwen2.5-7B-Instruct |
| **Method** | QLoRA (4-bit NF4 quantization + LoRA) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | All linear (q, k, v, o, gate, up, down) |
| **Trainable parameters** | 40.4M / 4.4B (0.92%) |
| **Training examples** | 9,813 train / 517 eval |
| **Epochs** | 3 |
| **Batch size** | 2 x 4 gradient accumulation = 8 effective |
| **Learning rate** | 2e-4 (cosine schedule, 10% warmup) |
| **Precision** | bf16 |
| **Optimizer** | AdamW |
| **Max sequence length** | 2,048 tokens |
| **Hardware** | NVIDIA A10G (24GB VRAM) |
| **Adapter size** | 162 MB |

## Training Data

Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:

- **10,019 classification examples** — synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (Verizon Community Database) records, paired with their ground-truth VERIS classifications
- **311 Q&A pairs** — questions and answers about the VERIS framework itself

The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.

## How to Use

### With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "vibesecurityguy/veris-classifier-v1"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Intended Use

This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry

## Limitations

- **VCDB bias**: Training data over-represents healthcare (HIPAA mandatory disclosure) and US-based incidents
- **Schema version**: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
- **Not a replacement for human analysis**: Output should be reviewed by a security analyst
- **English only**: Trained on English-language incident descriptions

## Links

- **Live Demo:** [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
- **Training Data:** [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
- **Source Code:** [github.com/pshamoon/veris-classifier](https://github.com/pshamoon/veris-classifier)
- **VERIS Framework:** [verisframework.org](https://verisframework.org/)

## Model Card Authors

Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))