---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# SIEM Log Generator - Mistral 7B QLoRA
A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. The model was trained with QLoRA (LoRA adapters over a 4-bit-quantized base model) on multiple cybersecurity log sources to understand and generate security-related event data.
## Model Description
This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping
### Training Data Sources
The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events
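The card does not show the exact serialized training format; as an illustrative sketch only (field names and completions below are assumed, modeled on the key=value prompts in the usage examples further down), each event could be rendered as a prompt/completion pair and wrapped in the Mistral-Instruct template:

```python
# Illustrative (assumed) training-pair format: a key=value event string as the
# prompt, with the desired analysis/labels as the completion.
samples = [
    {
        "prompt": "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce",
        "completion": "Brute-force authentication attempt against user 'admin'; maps to MITRE ATT&CK T1110.",
    },
    {
        "prompt": "event=network attack=DDoS",
        "completion": "Volumetric denial-of-service traffic; maps to MITRE ATT&CK T1499.",
    },
]

def to_mistral_instruct(sample):
    """Wrap a pair in the Mistral-Instruct [INST] template used at inference."""
    return f"[INST] {sample['prompt']} [/INST] {sample['completion']}"
```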
### MITRE ATT&CK Coverage
The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Discovery (formerly Network Service Scanning)
- T1110: Brute Force
- T1110.004: Credential Stuffing
- T1078.004: Valid Accounts: Cloud Accounts
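For post-processing model output, the covered techniques can be kept in a small lookup table. A minimal sketch (technique names follow current MITRE ATT&CK naming; the helper is illustrative, not part of this repo):

```python
# MITRE ATT&CK techniques covered by the model
MITRE_TECHNIQUES = {
    "T1499": "Endpoint Denial of Service",
    "T1046": "Network Service Discovery",
    "T1110": "Brute Force",
    "T1110.004": "Credential Stuffing",
    "T1078.004": "Valid Accounts: Cloud Accounts",
}

def lookup_technique(technique_id):
    """Return the human-readable name for a technique ID, if covered."""
    return MITRE_TECHNIQUES.get(technique_id, "not covered by this model")
```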
## Training Details
### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, v_proj
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50
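The configuration above corresponds roughly to the following `peft`/`transformers` setup. This is a sketch, not the exact training script; the argument names come from the public `peft` and `bitsandbytes` APIs:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 precision, per the card
)

# LoRA adapters matching the hyperparameters listed above
lora_config = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```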
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes
## Usage
### Installation
```bash
pip install transformers peft torch bitsandbytes accelerate
```
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit NF4 quantization
# (the bare load_in_4bit=True kwarg is deprecated in recent transformers)
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis.
# No literal "<s>" in the prompt: the tokenizer adds the BOS token automatically.
prompt = "[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Inference Example
```python
# Analyze a security event (the tokenizer adds the BOS token automatically)
event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
prompt = f"[INST] {event} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Use Cases
### 1. Security Event Classification
Classify incoming logs into attack types or benign traffic.
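A minimal post-processing sketch for this use case, assuming the generated text echoes key=value fields as in the inference example above (`extract_attack_label` is a hypothetical helper, not part of this repo):

```python
import re

def extract_attack_label(generated_text):
    """Pull an attack=<Label> field out of generated key=value text;
    treat output with no attack field as benign."""
    match = re.search(r"attack=(\w+)", generated_text)
    return match.group(1) if match else "Benign"

print(extract_attack_label("event=auth user=admin attack=BruteForce"))  # BruteForce
print(extract_attack_label("event=network src=10.0.0.5 dst=10.0.0.9"))  # Benign
```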
### 2. MITRE ATT&CK Mapping
Automatically map security events to MITRE ATT&CK framework techniques.
### 3. Log Enrichment
Generate additional context and metadata for security events.
### 4. Threat Intelligence
Analyze patterns and generate threat reports from log data.
### 5. Training Data Generation
Create synthetic security logs for testing SIEM systems.
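A sketch of use case 5: building randomized prompts that ask the model to emit synthetic logs. The attack/event vocabulary below is illustrative, drawn from the examples in this card:

```python
import random

# Illustrative attack types and their event categories (from the card's examples)
ATTACKS = ["DDoS", "PortScan", "BruteForce", "CredentialStuffing"]
EVENT_TYPES = {"DDoS": "network", "PortScan": "network",
               "BruteForce": "auth", "CredentialStuffing": "auth"}

def synthetic_log_prompt(seed=None):
    """Build an [INST] prompt asking the model for a synthetic SIEM log."""
    rng = random.Random(seed)
    attack = rng.choice(ATTACKS)
    return f"[INST] event={EVENT_TYPES[attack]} attack={attack} [/INST]"

# Feed these prompts through model.generate() as in the inference example
prompts = [synthetic_log_prompt(seed=i) for i in range(4)]
```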
## Limitations
- **Training Data**: Trained on only ~500 samples, for demonstration purposes
- **Domain Specific**: Optimized for SIEM/security logs, not general purpose
- **Language**: English only
- **Real-time**: Not optimized for ultra-low latency applications
- **Accuracy**: Should be used as an assistive tool, not sole decision-maker
## Ethical Considerations
⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use in conjunction with human security experts
## Model Card Authors
Created by the SIEM Research Team
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{siem-log-generator-2025,
author = {Your Name},
title = {SIEM Log Generator - Mistral 7B QLoRA},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```
## License
This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
## Acknowledgments
- **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
- **CICIDS2017** dataset contributors
- **Hugging Face** for the model hosting platform
- **QLoRA paper** authors for the efficient fine-tuning method
## Contact
For questions or issues, please open an issue on the model repository.
---
**Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.