---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# SIEM Log Generator - Mistral 7B QLoRA
A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. This model has been trained using QLoRA (4-bit quantization) on multiple cybersecurity log sources to understand and generate security-related event data.
## Model Description
This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping
### Training Data Sources
The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events
### MITRE ATT&CK Coverage
The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Scanning
- T1110: Brute Force
- T1110.004: Credential Stuffing
- T1078.004: Cloud Account Access
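The mapping above can be kept alongside the model as a small lookup table for post-processing its output. This is an illustrative sketch (the `technique_name` helper is not part of the model's API); technique names follow the list above.

```python
# Lookup table for the MITRE ATT&CK techniques covered by the model,
# using the names as listed in this card.
MITRE_TECHNIQUES = {
    "T1499": "Endpoint Denial of Service (DDoS)",
    "T1046": "Network Service Scanning",
    "T1110": "Brute Force",
    "T1110.004": "Credential Stuffing",
    "T1078.004": "Cloud Account Access",
}

def technique_name(technique_id: str) -> str:
    """Resolve a technique ID to its name, falling back to the parent
    technique for unlisted sub-techniques (e.g. T1110.003 -> T1110)."""
    if technique_id in MITRE_TECHNIQUES:
        return MITRE_TECHNIQUES[technique_id]
    parent = technique_id.split(".")[0]
    return MITRE_TECHNIQUES.get(parent, "Unknown")
```

A fallback to the parent technique is useful because the model may emit sub-technique IDs it saw only rarely during training.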
## Training Details
### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, v_proj
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50
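The hyperparameters above can be expressed as a `peft` LoRA configuration. This is a hedged reconstruction, not the exact training script: the rank, alpha, and target modules come from the list above, while `lora_dropout` and `bias` are assumed defaults not stated in this card.

```python
from peft import LoraConfig

# QLoRA adapter configuration matching the values listed above.
lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,                    # assumed; not stated in this card
    bias="none",                          # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
```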
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes
## Usage
### Installation
```bash
pip install transformers peft torch bitsandbytes accelerate
```
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit quantization (QLoRA-style).
# Recent transformers versions expect a BitsAndBytesConfig rather than
# the deprecated load_in_4bit kwarg.
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis
prompt = "<s>[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Inference Example
```python
# Analyze a security event (assumes model and tokenizer are loaded as above)
event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
prompt = f"<s>[INST] {event} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Use Cases
### 1. Security Event Classification
Classify incoming logs into attack types or benign traffic.
### 2. MITRE ATT&CK Mapping
Automatically map security events to MITRE ATT&CK framework techniques.
### 3. Log Enrichment
Generate additional context and metadata for security events.
### 4. Threat Intelligence
Analyze patterns and generate threat reports from log data.
### 5. Training Data Generation
Create synthetic security logs for testing SIEM systems.
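For the use cases above, prompts can be assembled around the `<s>[INST] ... [/INST]` template shown in the Usage section. A minimal sketch; the `build_prompt` helper is illustrative and not part of the model's API:

```python
# Wrap a raw log line (optionally with a task instruction) in the
# Mistral instruct template used in the Usage examples above.
def build_prompt(event: str, instruction: str = "") -> str:
    body = f"{instruction}\n{event}" if instruction else event
    return f"<s>[INST] {body} [/INST]"

# Use case 1/2: classification plus MITRE ATT&CK mapping
classify_prompt = build_prompt(
    "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce",
    "Classify this event and map it to a MITRE ATT&CK technique.",
)

# Use case 5: synthetic log generation
generate_prompt = build_prompt("event=network attack=DDoS")
```

The same helper covers all five use cases; only the instruction text changes.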
## Limitations
- **Training Data**: Trained on only ~500 samples, for demonstration purposes
- **Domain Specific**: Optimized for SIEM/security logs, not general-purpose use
- **Language**: English only
- **Latency**: Not optimized for ultra-low-latency, real-time applications
- **Accuracy**: Should be used as an assistive tool, not as the sole decision-maker
## Ethical Considerations
⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use in conjunction with human security experts
## Model Card Authors
Created by the SIEM Research Team
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{siem-log-generator-2025,
author = {Your Name},
title = {SIEM Log Generator - Mistral 7B QLoRA},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```
## License
This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
## Acknowledgments
- **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
- **CICIDS2017** dataset contributors
- **Hugging Face** for the model hosting platform
- **QLoRA paper** authors for the efficient fine-tuning method
## Contact
For questions or issues, please open an issue on the model repository.
---
**Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.