---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# SIEM Log Generator - Mistral 7B QLoRA

A QLoRA fine-tune of Mistral-7B-Instruct-v0.2 specialized in Security Information and Event Management (SIEM) log analysis and generation. LoRA adapters were trained on top of the 4-bit quantized base model using several cybersecurity log sources, so that the model can interpret and generate security event data.

## Model Description

This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping

### Training Data Sources

The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events

### MITRE ATT&CK Coverage

The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Discovery (formerly Network Service Scanning)
- T1110: Brute Force
- T1110.004: Brute Force: Credential Stuffing
- T1078.004: Valid Accounts: Cloud Accounts
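
The coverage list can be expressed as a small lookup table for downstream tooling. This is a minimal sketch, not part of the model: the technique IDs come from the list above and the names follow current ATT&CK naming, but the label strings (`DDoS`, `PortScan`, `BruteForce`, `CredentialStuffing`, `CloudAccountAbuse`) and the `map_attack_label` helper are assumptions chosen for illustration.

```python
# Hypothetical label-to-technique table. The keys are assumed label strings;
# only the technique IDs are taken from this model card.
ATTACK_TO_TECHNIQUE = {
    "DDoS": ("T1499", "Endpoint Denial of Service"),
    "PortScan": ("T1046", "Network Service Discovery"),
    "BruteForce": ("T1110", "Brute Force"),
    "CredentialStuffing": ("T1110.004", "Brute Force: Credential Stuffing"),
    "CloudAccountAbuse": ("T1078.004", "Valid Accounts: Cloud Accounts"),
}


def map_attack_label(label):
    """Return (technique_id, technique_name), or None for an unknown label."""
    return ATTACK_TO_TECHNIQUE.get(label)


print(map_attack_label("PortScan"))
print(map_attack_label("SomethingElse"))
```

Returning `None` for unknown labels keeps unmapped events visible to the caller instead of silently assigning them a technique.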

## Training Details

### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, v_proj
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50
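
To put these settings in perspective, the sketch below estimates the adapter size and the training budget they imply. The layer dimensions are assumptions taken from the public Mistral-7B architecture (32 layers, hidden size 4096, 8 key/value heads of 128 dimensions), not from this card, and the budget assumes the batch size of 8 is the effective batch with no gradient accumulation.

```python
# Assumed Mistral-7B dimensions (public architecture, not stated in this card)
HIDDEN_SIZE = 4096   # input width of q_proj and v_proj
Q_OUT = 4096         # 32 query heads x 128 dims
V_OUT = 1024         # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

# Values from the training configuration above
LORA_RANK = 8
LORA_ALPHA = 16
BATCH_SIZE = 8
TRAINING_STEPS = 50
TRAINING_SAMPLES = 500  # approximate


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped linear layer."""
    return rank * (in_features + out_features)


per_layer = (lora_params(HIDDEN_SIZE, Q_OUT, LORA_RANK)
             + lora_params(HIDDEN_SIZE, V_OUT, LORA_RANK))
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_RANK

# Assumes no gradient accumulation, so one step consumes one batch
samples_seen = TRAINING_STEPS * BATCH_SIZE
epochs = samples_seen / TRAINING_SAMPLES

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"samples seen: {samples_seen} (~{epochs:.1f} epochs)")
```

Under these assumptions the adapters hold roughly 3.4 million trainable parameters, and 50 steps cover about 400 samples, which is slightly less than one pass over the ~500-sample training set.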

### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes

## Usage

### Installation

```bash
pip install transformers peft torch bitsandbytes accelerate
```

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit quantization.
# Quantization settings go through BitsAndBytesConfig; passing load_in_4bit
# directly to from_pretrained is deprecated in recent transformers releases.
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Load the LoRA adapters and the tokenizer published with them
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis
prompt = "<s>[INST] event=network attack=DDoS [/INST]"
# Move inputs to wherever device_map placed the model instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Inference Example

```python
# Analyze a security event.
# Reuses `model`, `tokenizer`, and `torch` from the loading snippet above.
event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
prompt = f"<s>[INST] {event} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
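
For a decoder-only model, `generate` returns the prompt tokens followed by the completion, so the decoded `response` will usually repeat the input event. The helpers below are a minimal sketch for handling the `key=value` event format and for trimming the echoed prompt. Both function names are hypothetical, and the sketch assumes that events are space-separated `key=value` pairs and that `[/INST]` survives decoding as plain text.

```python
def parse_event(event):
    """Split a space-separated key=value log line into a dict."""
    fields = {}
    for token in event.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def extract_completion(decoded):
    """Return the text after the last [/INST] marker (whole text if absent)."""
    return decoded.rpartition("[/INST]")[2].strip()


event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
print(parse_event(event))

# Hand-written stand-in for a decoded response, not real model output
decoded = "[INST] event=auth attack=BruteForce [/INST] placeholder completion"
print(extract_completion(decoded))
```

Parsing the event before prompting also makes it easy to reject malformed input early, for example lines that are missing the `event` field.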

## Use Cases

### 1. Security Event Classification
Classify incoming logs into attack types or benign traffic.

### 2. MITRE ATT&CK Mapping
Automatically map security events to MITRE ATT&CK framework techniques.

### 3. Log Enrichment
Generate additional context and metadata for security events.

### 4. Threat Intelligence
Analyze patterns and generate threat reports from log data.

### 5. Training Data Generation
Create synthetic security logs for testing SIEM systems.

## Limitations

- **Training Data**: Model trained on limited samples (~500) for demonstration
- **Domain Specific**: Optimized for SIEM/security logs, not general purpose
- **Language**: English only
- **Real-time**: Not optimized for ultra-low latency applications
- **Accuracy**: Should be used as an assistive tool, not sole decision-maker

## Ethical Considerations

⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use in conjunction with human security experts
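
One lightweight way to validate outputs before acting on them is to check any ATT&CK technique IDs the model emits against the set this card claims to cover. The sketch below is an illustration under assumptions, not a complete validation pipeline: the `check_techniques` helper is hypothetical, and the sample string is hand-written rather than real model output.

```python
import re

# Technique IDs listed in this card's MITRE ATT&CK coverage section
KNOWN_TECHNIQUES = {"T1499", "T1046", "T1110", "T1110.004", "T1078.004"}

# ATT&CK technique IDs look like T1234 or T1234.001
TECHNIQUE_PATTERN = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def check_techniques(model_output):
    """Split technique IDs found in the text into (known, unknown) sorted lists."""
    found = set(TECHNIQUE_PATTERN.findall(model_output))
    known = sorted(found & KNOWN_TECHNIQUES)
    unknown = sorted(found - KNOWN_TECHNIQUES)
    return known, unknown


# Hand-written stand-in for a model response, not real model output
sample = "Likely credential stuffing (T1110.004), possibly followed by T1059."
known, unknown = check_techniques(sample)
print("known:", known)
print("needs review:", unknown)
```

IDs outside the covered set are not necessarily wrong, but they fall outside what the model was trained to map and are good candidates for review by an analyst.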

## Model Card Authors

Created by the SIEM Research Team

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{siem-log-generator-2025,
  author = {Your Name},
  title = {SIEM Log Generator - Mistral 7B QLoRA},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```

## License

This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.

## Acknowledgments

- **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
- **CICIDS2017** dataset contributors
- **Hugging Face** for the model hosting platform
- **QLoRA paper** authors for the efficient fine-tuning method

## Contact

For questions or issues, please open an issue on the model repository.

---

**Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.