	---
	library_name: peft
	license: apache-2.0
	base_model: mistralai/Mistral-7B-Instruct-v0.3
	tags:
	- veris
	- cybersecurity
	- incident-classification
	- lora
	- qlora
	- mistral
	datasets:
	- vibesecurityguy/veris-classifier-training
	- vibesecurityguy/veris-incident-classifications
	language:
	- en
	pipeline_tag: text-generation
	---

	# VERIS Classifier v2

	A QLoRA fine-tune of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), distributed as a LoRA adapter, that classifies cybersecurity incident descriptions using the [VERIS](http://veriscommunity.net/) (Vocabulary for Event Recording and Incident Sharing) framework.

	Given a plain-English incident description, the model outputs structured JSON that assigns VERIS categories for action, actor, asset, and attribute (the "4 A's").

	[Try the live demo](https://huggingface.co/spaces/vibesecurityguy/veris-classifier): no API key required, and it runs on ZeroGPU.

	## Example

	Input:
	> An employee at a hospital clicked a phishing email, which installed ransomware that encrypted patient records.

	Output:
	```json
	{
	  "action": {"malware": {"variety": ["Ransomware"]}, "social": {"variety": ["Phishing"]}},
	  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
	  "asset": {"assets": [{"variety": "S - Database"}]},
	  "attribute": {"availability": {"variety": ["Obscuration"]}}
	}
	```
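Nested VERIS JSON like the example output is often easier to count or compare once it is flattened into dotted paths such as `action.social.variety.Phishing`. The sketch below does that with the standard library only; `flatten_veris` is a hypothetical helper (not part of this repository), and the sample is a trimmed copy of the example output with the action limited to its social branch.

```python
import json

def flatten_veris(node, prefix=""):
    """Flatten nested VERIS JSON into dotted enumeration paths.

    Dict keys extend the path, lists contribute one path per item, and
    leaf values (strings, numbers) become the final path segment.
    """
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.extend(flatten_veris(value, child))
    elif isinstance(node, list):
        # e.g. asset.assets is a list of objects; variety lists hold strings
        for item in node:
            paths.extend(flatten_veris(item, prefix))
    else:
        paths.append(f"{prefix}.{node}" if prefix else str(node))
    return paths

example = json.loads("""
{
  "action": {"social": {"variety": ["Phishing"]}},
  "actor": {"external": {"variety": ["Unaffiliated"], "motive": ["Financial"]}},
  "asset": {"assets": [{"variety": "S - Database"}]},
  "attribute": {"availability": {"variety": ["Obscuration"]}}
}
""")

for path in sorted(flatten_veris(example)):
    print(path)
```

Flat paths make it straightforward to compare a prediction with a ground-truth record as two sets of labels.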

	## Training Details

	| Parameter | Value |
	|-----------|-------|
	| Base model | `mistralai/Mistral-7B-Instruct-v0.3` |
	| Method | QLoRA (4-bit NF4 quantization + LoRA) |
	| LoRA rank (r) | 16 |
	| LoRA alpha | 32 |
	| LoRA dropout | 0.05 |
	| Target modules | All linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) |
	| Training examples | 9,813 train / 517 eval |
	| Epochs | 3 |
	| Batch size | 2 per device × 4 gradient accumulation steps = 8 effective |
	| Learning rate | 2e-4 (cosine schedule, 10% warmup) |
	| Precision | bf16 |
	| Optimizer | AdamW |
	| Max sequence length | 2,048 tokens |
	| Hardware | NVIDIA A10G (24 GB VRAM) |
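The hyperparameters in the table imply a few quantities worth sanity-checking. The sketch below computes them under stated assumptions: Mistral-7B layer dimensions from the base model's published config (hidden size 4096, MLP size 14336, 32 decoder layers, 8 key/value heads of dimension 128), one LoRA A/B pair per target module, and a step count that keeps the final partial batch. `lora_params` and `cosine_lr` are illustrative helpers; the trainer actually used may count steps slightly differently.

```python
import math

# Assumed Mistral-7B dimensions (base model config)
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size
LAYERS = 32            # num_hidden_layers
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) of each LoRA target module in one decoder layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(r: int) -> int:
    """Trainable LoRA weights: A is (r x d_in) and B is (d_out x r) per adapted layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * LAYERS

def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-4, warmup_ratio: float = 0.10) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay back to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

r, alpha = 16, 32
print(f"LoRA scaling alpha/r = {alpha / r}")            # 2.0
print(f"Trainable params at r={r}: {lora_params(r):,}")  # 41,943,040

effective_batch = 2 * 4                                 # per-device batch x accumulation steps
steps_per_epoch = math.ceil(9_813 / effective_batch)    # assumes the last partial batch is kept
total_steps = 3 * steps_per_epoch
print(f"effective batch={effective_batch}, steps/epoch={steps_per_epoch}, total={total_steps}")
print(f"lr at start, end of warmup, end: {cosine_lr(0, total_steps)}, "
      f"{cosine_lr(math.ceil(total_steps * 0.10), total_steps)}, {cosine_lr(total_steps, total_steps)}")
```

With r = 16 on all seven projection types, the adapter trains about 42M parameters, well under 1% of the base model's roughly 7B, and the alpha/r factor of 2 scales the low-rank update before it is added to the frozen weights.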

	## Training Data

	Fine-tuned on [vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training), which contains:

	- 10,019 classification examples: synthetic incident descriptions generated from real [VCDB](https://github.com/vz-risk/VCDB) (VERIS Community Database) records, each paired with its ground-truth VERIS classification
	- 311 Q&A pairs: questions and answers about the VERIS framework itself

	Together these 10,330 examples were split into 9,813 for training and 517 for evaluation. The source classifications come from 8,559 real-world incidents in VCDB, spanning healthcare, finance, retail, government, and other industries.
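The exact record layout of the training dataset is not shown here, so the following is only a hypothetical illustration of how one description and its ground-truth classification could be packed into the same system/user/assistant chat format used in the inference example below. `to_chat_example`, the `messages` field, and the sample labels are assumptions; the system prompt stays abbreviated exactly as it appears in this card.

```python
import json

# Placeholder: the real system prompt is abbreviated in this card
SYSTEM_PROMPT = "You are a VERIS classification expert..."

def to_chat_example(description: str, veris: dict) -> dict:
    """Pack one incident description and its VERIS JSON into a chat-style record (hypothetical format)."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify this incident: {description}"},
            # The target the model learns to generate: the classification serialized as JSON text
            {"role": "assistant", "content": json.dumps(veris)},
        ]
    }

# Illustrative, hand-written and incomplete classification (not taken from the dataset)
record = to_chat_example(
    "An employee lost a laptop containing unencrypted customer data.",
    {
        "action": {"error": {"variety": ["Loss"]}},
        "asset": {"assets": [{"variety": "U - Laptop"}]},
    },
)
print(json.dumps(record, indent=2))
```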

	## How to Use

	### With Transformers + PEFT

	```python
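	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	base_model = "mistralai/Mistral-7B-Instruct-v0.3"
	adapter = "vibesecurityguy/veris-classifier-v2"

	tokenizer = AutoTokenizer.from_pretrained(base_model)
	# Load base weights in bf16; some transformers versions otherwise default to float32 (~2x memory)
	model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
	# Attach the VERIS LoRA adapter to the base model
	model = PeftModel.from_pretrained(model, adapter)
	model.eval()

	messages = [
	    # The system prompt is abbreviated here ("...")
	    {"role": "system", "content": "You are a VERIS classification expert..."},
	    {"role": "user", "content": "Classify this incident: An employee lost a laptop containing unencrypted customer data."},
	]

	# The chat template already inserts the BOS token, so don't add special tokens a second time
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
	# Greedy decoding for repeatable classifications
	outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

	# Decode only the newly generated tokens, not the echoed prompt
	reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
	print(reply)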
	```
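Instruction-tuned models sometimes wrap their JSON in prose or a Markdown fence, so it helps to parse the decoded reply defensively. A minimal standard-library sketch: `extract_veris_json` (a hypothetical helper) returns the first decodable JSON object in the text using `json.JSONDecoder.raw_decode`, and `missing_sections` reports which of the four top-level VERIS sections are absent.

```python
import json

REQUIRED_SECTIONS = ("action", "actor", "asset", "attribute")

def extract_veris_json(text: str) -> dict:
    """Return the first JSON object found in `text`, skipping surrounding prose or fences."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object starting at this brace; try the next one
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

def missing_sections(obj: dict) -> list:
    """List the top-level VERIS sections absent from a parsed prediction."""
    return [s for s in REQUIRED_SECTIONS if s not in obj]

fence = "`" * 3  # built from single backticks so it does not end this Markdown code block
reply = "Here is the classification:\n" + fence + 'json\n{"action": {"error": {"variety": ["Loss"]}}}\n' + fence

prediction = extract_veris_json(reply)
print(prediction)
print(missing_sections(prediction))  # ['actor', 'asset', 'attribute']
```

A non-empty result from `missing_sections` is a cheap signal that the generation was truncated (for example by `max_new_tokens`) or incomplete.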

	## Intended Use

	This model is designed for:
	- Classifying cybersecurity incidents into the VERIS framework
	- Answering questions about VERIS categories and taxonomy
	- Assisting incident response teams with structured data entry

	## Limitations

	- VCDB bias: Training data over-represents healthcare incidents (HIPAA breach-notification rules make many of them public) and US-based incidents
	- Schema version: Trained primarily on the VERIS 1.3.x schema; it may not cover all 1.4 additions
	- Not a replacement for human analysis: Outputs should be reviewed by a security analyst
	- English only: Trained on English-language incident descriptions
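Since outputs still need analyst review, one cheap first pass is to flag structurally suspect predictions before a human looks at them. A minimal sketch, assuming predictions are already parsed into Python dicts: `VERIS_CATEGORIES` lists the top-level category keys for action, actor, and attribute in the VERIS 1.3 schema, and `review_flags` is a hypothetical helper. It checks structure only, not whether a classification is correct.

```python
# Top-level category keys per section in the VERIS 1.3 schema (lowercase, as in this model's output)
VERIS_CATEGORIES = {
    "action": {"environmental", "error", "hacking", "malware", "misuse",
               "physical", "social", "unknown"},
    "actor": {"external", "internal", "partner", "unknown"},
    "attribute": {"confidentiality", "integrity", "availability"},
}

def review_flags(prediction: dict) -> list:
    """Return human-readable reasons a prediction deserves a closer look."""
    flags = []
    for section in ("action", "actor", "asset", "attribute"):
        if section not in prediction:
            flags.append(f"missing section: {section}")
    for section, allowed in VERIS_CATEGORIES.items():
        value = prediction.get(section, {})
        if not isinstance(value, dict):
            flags.append(f"malformed section: {section}")
            continue
        for key in value:
            if key not in allowed:
                flags.append(f"unknown {section} category: {key}")
    return flags

suspect = {"action": {"social": {}, "ransomware": {}}, "actor": {"external": {}}}
for reason in review_flags(suspect):
    print(reason)
```

Predictions that return an empty list can go into a normal review queue, while flagged ones can be routed to an analyst first.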

	## Links

	- Live Demo: [huggingface.co/spaces/vibesecurityguy/veris-classifier](https://huggingface.co/spaces/vibesecurityguy/veris-classifier)
	- Training Data: [huggingface.co/datasets/vibesecurityguy/veris-classifier-training](https://huggingface.co/datasets/vibesecurityguy/veris-classifier-training)
	- Source Code: [github.com/petershamoon/veris-classifier](https://github.com/petershamoon/veris-classifier)
	- VERIS Framework: [verisframework.org](https://verisframework.org/)

	## Model Card Authors

	Peter Shamoon ([@vibesecurityguy](https://huggingface.co/vibesecurityguy))