---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - veris
  - cybersecurity
  - incident-classification
  - lora
  - qlora
  - qwen2
datasets:
  - vibesecurityguy/veris-classifier-training
  - vibesecurityguy/veris-incident-classifications
language:
  - en
pipeline_tag: text-generation
---

# VERIS Classifier v1

A fine-tuned [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model that classifies cybersecurity incident descriptions into the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.

Given a plain-English incident description, the model outputs structured JSON with predicted VERIS categories for the four core sections: **action**, **actor**, **asset**, and **attribute**.

**[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)** — no API key required, runs on ZeroGPU.

## Example

**Input:**
> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.

**Output:**
```json
{
  "action": {"malware": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
```
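The model's reply is free text, so code that consumes it usually needs to pull out the JSON object and check it before use. Below is a minimal sketch. It assumes the reply contains a single JSON object (optionally wrapped in a Markdown code fence) with the four top-level sections shown above. `parse_veris_output` is a hypothetical helper, not part of this repository:

```python
import json
import re

# The four top-level VERIS sections this model is trained to emit
REQUIRED_KEYS = ("action", "actor", "asset", "attribute")


def parse_veris_output(text: str) -> dict:
    """Extract and sanity-check the JSON object in a model reply (hypothetical helper)."""
    # Prefer JSON inside an optional Markdown code fence
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Fall back to the span from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        candidate = text[start : end + 1]

    result = json.loads(candidate)
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError(f"missing VERIS sections: {missing}")
    return result


reply = '{"action": {"social": {"variety": ["Phishing"]}}, "actor": {}, "asset": {}, "attribute": {}}'
parsed = parse_veris_output(reply)
print(sorted(parsed["action"]))  # ['social']
```

Raising on a missing section, rather than silently filling it in, makes malformed generations easy to route to a human reviewer.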

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | Qwen/Qwen2.5-7B-Instruct |
| **Method** | QLoRA (4-bit NF4 quantization + LoRA) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | All linear (q, k, v, o, gate, up, down) |
| **Trainable parameters** | 40.4M / 4.4B (0.92%; denominator as counted on the 4-bit quantized model, about 7.6B unquantized) |
| **Training examples** | 9,813 train / 517 eval |
| **Epochs** | 3 |
| **Batch size** | 2 per device × 4 gradient accumulation steps = 8 effective |
| **Learning rate** | 2e-4 (cosine schedule, 10% warmup) |
| **Precision** | bf16 |
| **Optimizer** | AdamW |
| **Max sequence length** | 2,048 tokens |
| **Hardware** | NVIDIA A10G (24GB VRAM) |
| **Adapter size** | 162 MB |
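The 40.4M trainable-parameter figure can be checked by hand. A LoRA adapter of rank r on a linear layer with input size d_in and output size d_out adds r × (d_in + d_out) weights (the low-rank matrices A and B). Below is a minimal sketch of that arithmetic. It assumes the published Qwen2.5-7B configuration (hidden size 3,584, MLP size 18,944, 28 layers, 4 key/value heads of dimension 128), which comes from the base model's config rather than from this card:

```python
# Qwen2.5-7B dimensions (assumed from the base model's config.json)
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size
LAYERS = 28            # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)
RANK = 16              # LoRA rank r from the table above

# (in_features, out_features) for each adapted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Count LoRA weights: A is (rank x d_in) and B is (d_out x rank) for every target layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * LAYERS


total = lora_params(RANK)
print(f"{total:,}")  # 40,370,176 (about 40.4M)
```

The MLP projections dominate the count because their intermediate dimension is more than five times the hidden size.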

## Training Data

Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:

- **10,019 classification examples** — synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (Verizon Community Database) records, paired with their ground-truth VERIS classifications
- **311 Q&A pairs** — questions and answers about the VERIS framework itself

The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
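One plausible way to pair a VCDB record with a classification target is to keep only its four VERIS sections and drop metadata such as the victim or timeline. Below is a minimal sketch. It assumes VCDB's JSON layout, where `action`, `actor`, `asset`, and `attribute` are top-level keys. `to_training_target` and the sample record are hypothetical and not necessarily how this dataset was built:

```python
import json

# The four VERIS sections covered by the classifier's output
TARGET_SECTIONS = ("action", "actor", "asset", "attribute")


def to_training_target(record: dict) -> dict:
    """Hypothetical: keep only the four core VERIS sections of a full VCDB record."""
    return {section: record.get(section, {}) for section in TARGET_SECTIONS}


# A trimmed, made-up VCDB-style record for illustration only
record = {
    "incident_id": "example-0001",
    "summary": "Laptop with unencrypted customer data lost by an employee.",
    "action": {"error": {"variety": ["Loss"]}},
    "actor": {"internal": {"variety": ["End-user"]}},
    "asset": {"assets": [{"variety": "U - Laptop"}]},
    "attribute": {"confidentiality": {"data_disclosure": "Potentially"}},
    "victim": {"industry": "522110"},
}

print(json.dumps(to_training_target(record)))
```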

## How to Use

### With Transformers + PEFT

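```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "vibesecurityguy/veris-classifier-v1"

tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load the base model in bf16 (the training precision) instead of the fp32 default
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    # System prompt abbreviated here
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Greedy decoding keeps the JSON output reproducible
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```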

## Intended Use

This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry

## Limitations

- **VCDB bias**: Training data over-represents healthcare incidents (US HIPAA breach-notification rules make them more likely to be publicly disclosed) and US-based incidents
- **Schema version**: Trained primarily on the VERIS 1.3.x schema; may not cover all 1.4 additions
- **Not a replacement for human analysis**: Output should be reviewed by a security analyst
- **English only**: Trained on English-language incident descriptions
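Because output should be reviewed by an analyst, it can help to measure how often the model's labels survive that review. Below is a minimal sketch. It assumes both the model output and the analyst-corrected record use the same nested VERIS JSON shape. `flatten_veris` and `label_agreement` are hypothetical helpers, and weighting every leaf label equally is an illustrative choice, not an evaluation method from this project:

```python
def flatten_veris(node, prefix=""):
    """Turn nested VERIS JSON into a set of dotted label paths, e.g. 'action.social.variety.Phishing'."""
    labels = set()
    if isinstance(node, dict):
        for key, value in node.items():
            labels |= flatten_veris(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # List items share their parent's path (e.g. several varieties, several assets)
        for item in node:
            labels |= flatten_veris(item, prefix)
    else:
        labels.add(f"{prefix}.{node}" if prefix else str(node))
    return labels


def label_agreement(predicted: dict, reviewed: dict) -> dict:
    """Compare model labels with analyst-reviewed labels as sets of leaf paths."""
    pred, gold = flatten_veris(predicted), flatten_veris(reviewed)
    hits = len(pred & gold)
    return {
        "precision": hits / len(pred) if pred else 0.0,
        "recall": hits / len(gold) if gold else 0.0,
        "missing": sorted(gold - pred),
        "extra": sorted(pred - gold),
    }


predicted = {
    "action": {"social": {"variety": ["Phishing"]}, "malware": {"variety": ["Ransomware"]}},
    "actor": {"external": {"variety": ["Unaffiliated"]}},
}
reviewed = {
    "action": {"social": {"variety": ["Phishing"]}, "malware": {"variety": ["Ransomware"]}},
    "actor": {"external": {"variety": ["Organized crime"]}},
}
result = label_agreement(predicted, reviewed)
print(result["missing"])  # ['actor.external.variety.Organized crime']
```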

## Links

- **Live Demo:** [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
- **Training Data:** [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
- **Source Code:** [github.com/pshamoon/veris-classifier](https://github.com/pshamoon/veris-classifier)
- **VERIS Framework:** [verisframework.org](https://verisframework.org/)

## Model Card Authors

Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))