---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- veris
- cybersecurity
- incident-classification
- lora
- qlora
- qwen2
datasets:
- vibesecurityguy/veris-classifier-training
- vibesecurityguy/veris-incident-classifications
language:
- en
pipeline_tag: text-generation
---
# VERIS Classifier v1
A fine-tuned [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model that classifies cybersecurity incident descriptions into the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.
Given a plain-English incident description, the model outputs structured JSON with its predicted VERIS categories for **action**, **actor**, **asset**, and **attribute**.
**[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)** — no API key required, runs on ZeroGPU.
## Example
**Input:**
> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.
**Output:**
```json
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
```
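Because the output is plain JSON, a caller can check it before storing it. The following is a minimal sketch, not part of the released code: `check_axes` and `EXPECTED_AXES` are hypothetical names, and the check covers only the four top-level axes named above, not the full VERIS schema.

```python
import json

# The four top-level axes named in this model card.
EXPECTED_AXES = ("action", "actor", "asset", "attribute")


def check_axes(raw: str) -> dict:
    """Parse model output and report which top-level axes are missing.

    Hypothetical helper: it validates shape only, not VERIS enumerations.
    """
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [axis for axis in EXPECTED_AXES if axis not in parsed]
    return {"parsed": parsed, "missing": missing}


# The example output from this section, as a single string.
sample = """
{
  "action": {"hacking": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
"""

result = check_axes(sample)
print("missing axes:", result["missing"])
```

An empty `missing` list only means the four axes are present; whether the values are right still needs analyst review.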
## Training Details
| Parameter | Value |
|-----------|-------|
| **Base model** | Qwen/Qwen2.5-7B-Instruct |
| **Method** | QLoRA (4-bit NF4 quantization + LoRA) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | All linear (q, k, v, o, gate, up, down) |
| **Trainable parameters** | 40.4M / 4.4B (0.92%) |
| **Training examples** | 9,813 train / 517 eval |
| **Epochs** | 3 |
| **Batch size** | 2 x 4 gradient accumulation = 8 effective |
| **Learning rate** | 2e-4 (cosine schedule, 10% warmup) |
| **Precision** | bf16 |
| **Optimizer** | AdamW |
| **Max sequence length** | 2,048 tokens |
| **Hardware** | NVIDIA A10G (24GB VRAM) |
| **Adapter size** | 162 MB |
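The trainable-parameter figure in the table can be reproduced from the LoRA rank and the shapes of the targeted layers. The sketch below assumes the base model's published dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of 128 dimensions) and assumes the table's short names map to the usual `q_proj` ... `down_proj` modules; neither is stated in this card.

```python
# Assumed Qwen2.5-7B dimensions (taken from the base model's config, not this card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x 128 dims per head

RANK = 16  # LoRA rank from the table above

# (in_features, out_features) for each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes: dict, num_layers: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per layer: rank * (in + out)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total = lora_params(RANK, SHAPES, NUM_LAYERS)
print(f"{total:,} trainable LoRA parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 40,370,176, which rounds to the 40.4M reported in the table.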
## Training Data
Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:
- **10,019 classification examples**: synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (VERIS Community Database) records, paired with their ground-truth VERIS classifications
- **311 Q&A pairs**: questions and answers about the VERIS framework itself
The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
## How to Use
### With Transformers + PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "vibesecurityguy/veris-classifier-v1"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a VERIS classification expert..."},
    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
]

# Render the chat template to text, then tokenize and move to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# outputs[0] holds the prompt tokens followed by the generated tokens,
# so the printed string includes the prompt as well as the model's answer.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
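The decoded string can contain more than the JSON object, for instance the echoed prompt or stray text around the answer. A minimal sketch for pulling out the classification, using only the standard library; `extract_first_json` is a hypothetical helper and the `decoded` string is an invented illustration, not real model output.

```python
import json


def extract_first_json(text: str):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None


# Invented example of decoded text with surrounding chatter.
decoded = 'assistant\n{"action": {"error": {"variety": ["Loss"]}}}\nEnd of answer.'

classification = extract_first_json(decoded)
print(classification)
```

Returning `None` instead of raising lets the caller decide whether to retry generation or route the incident to manual classification.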
## Intended Use
This model is designed for:
- Classifying cybersecurity incidents into the VERIS framework
- Answering questions about VERIS categories and taxonomy
- Assisting incident response teams with structured data entry
## Limitations
- **VCDB bias**: Training data over-represents healthcare (HIPAA mandatory disclosure) and US-based incidents
- **Schema version**: Trained primarily on VERIS 1.3.x schema; may not cover all 1.4 additions
- **Not a replacement for human analysis**: Output should be reviewed by a security analyst
- **English only**: Trained on English-language incident descriptions
## Links
- **Live Demo:** [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
- **Training Data:** [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
- **Source Code:** [github.com/pshamoon/veris-classifier](https://github.com/pshamoon/veris-classifier)
- **VERIS Framework:** [verisframework.org](https://verisframework.org/)
## Model Card Authors
Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))