---
base_model: unsloth/Phi-3-mini-4k-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- cybersecurity
- threat-intelligence
- cve
license: apache-2.0
language:
- en
---

# 🛡️ CyberThreat Intel LLM (Phi-3-mini Fine-Tuned)

This is a fine-tuned version of Microsoft's **Phi-3-mini-4k-instruct**, optimized specifically to act as a Cybersecurity Threat Analyst. It takes raw CVE vulnerability data and generates professional, structured threat intelligence reports.

**▶️ Try the Live Demo:** [CyberThreat Intel Analyzer (Hugging Face Space)](https://huggingface.co/spaces/vanshkamra12/CyberThreat-Intel-Analyzer)

**💻 Code & Dataset:** [GitHub Repository](https://github.com/vanshkamra12/CyberThreat-Intel-LLM)

---

## 🎯 What it does

Feed the model a raw CVE description, CVSS score, and vendor, and it will generate a comprehensive report including:
- **Executive Summary** (Plain English explanation)
- **Technical Analysis** (Vectors, complexity, privileges)
- **Indicators of Compromise (IOCs)**
- **MITRE ATT&CK Mappings**
- **Risk Assessment**
- **Remediation Steps**
- **Detection Rules** (YARA/Sigma)
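
As a minimal sketch, the input fields can be assembled into the Alpaca-style prompt that this card's usage example passes to the model. The names `PROMPT_TEMPLATE`, `DEFAULT_INSTRUCTION`, and `build_prompt` are hypothetical helpers introduced here for illustration; they are not part of the model or of any library.

```python
# Hypothetical helper: fills the prompt layout shown in this card's usage example.
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
CVE ID: {cve_id}
Description: {description}
CVSS Score: {cvss}
Vendor: {vendor}

### Response:
"""

DEFAULT_INSTRUCTION = (
    "Analyze the following vulnerability data and produce "
    "a structured threat intelligence report."
)


def build_prompt(cve_id, description, cvss, vendor, instruction=DEFAULT_INSTRUCTION):
    """Return the full prompt string for one CVE record."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        cve_id=cve_id,
        description=description,
        cvss=cvss,
        vendor=vendor,
    )


prompt = build_prompt(
    cve_id="CVE-2024-21762",
    description="Out-of-bounds write in FortiOS SSL VPN.",
    cvss="9.8 CRITICAL",
    vendor="Fortinet",
)
```

Keeping the template in one place makes it easier to send the model the same layout as the usage example, which is presumably the layout it was fine-tuned on.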

## 🧠 Model Details
- **Base Model:** `Phi-3-mini-4k-instruct` (3.8B parameters)
- **Training Method:** QLoRA (4-bit quantization) with Unsloth
- **Trainable Parameters:** 29.8M (0.78% of total)
- **Training Data:** 471 synthetic instruction-tuning pairs generated using Llama 3.1 8B from raw NIST NVD CVE data.
- **Final Training Loss:** 0.337
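
The trainable-parameter share quoted above follows directly from the two rounded counts in the list. A quick check, assuming the rounded figures of 29.8M trainable and 3.8B total parameters:

```python
# Rounded figures taken from the Model Details list above.
total_params = 3.8e9       # base model size
trainable_params = 29.8e6  # adapter weights updated during QLoRA training

fraction = trainable_params / total_params
share = f"{fraction:.2%}"
print(share)  # 0.78%
```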

## 🚀 How to use in Python

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vanshkamra12/CyberSecurity-Model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto"
)

prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze the following vulnerability data and produce a structured threat intelligence report.

### Input:
CVE ID: CVE-2024-21762
Description: An out-of-bounds write vulnerability in FortiOS SSL VPN allows a remote unauthenticated attacker to execute arbitrary code or commands via specially crafted HTTP requests.
CVSS Score: 9.8 CRITICAL
Vendor: Fortinet

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
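# do_sample=True is needed for temperature to take effect; without it, decoding is greedy.
outputs = model.generate(**inputs, max_new_tokens=1000, do_sample=True, temperature=0.7)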
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```