SIEM Cloud Log Generator - Mistral 7B QLoRA
A QLoRA fine-tune of Mistral-7B-Instruct-v0.2 that generates realistic, synthetic cloud security logs for AWS, Azure, and GCP. Given a structured prompt, it produces SIEM-style log records for security testing, training, and analysis.
Model Description
This model generates production-style cloud security logs, including:
- Network Traffic Logs: VPC Flow Logs (AWS), NSG Flow Logs (Azure), VPC Flow Logs (GCP)
- Authentication Events: CloudTrail (AWS), Sign-In Logs (Azure), Audit Logs (GCP)
- Security Events: DDoS attacks, brute force attempts, port scans, infiltration attempts
- MITRE ATT&CK Mapping: Automatic mapping to threat techniques
Training Data
Trained on diverse security datasets:
- CIC-IDS2017: 1,500+ network traffic samples covering 30+ attack types
- RBA Dataset: 500+ authentication events (benign and malicious)
- System Logs: 100+ real syslog entries
- Variations: 2x variations per sample for diversity (~4,200 training samples)
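The approximate total follows from the per-source counts above. A quick arithmetic check, treating the "+" figures as lower bounds and assuming the 2x factor means each base sample yields two training samples:

```python
# Lower-bound sample counts taken from the list above.
base_samples = {
    "CIC-IDS2017": 1500,
    "RBA Dataset": 500,
    "System Logs": 100,
}
variations_per_sample = 2  # assumption: each base sample yields 2 training samples

total_base = sum(base_samples.values())
total_training = total_base * variations_per_sample

print(total_base)      # 2100
print(total_training)  # 4200
```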
Attack Types Covered
- DDoS variants (LOIC-HTTP, HOIC, Syn, UDP, NTP, DNS, etc.)
- DoS attacks (Hulk, GoldenEye, Slowloris, Slowhttptest)
- Brute Force (SSH, FTP, Web)
- Port Scanning
- Web Attacks (XSS, SQLi)
- Infiltration
- Botnet activity
- Benign traffic
MITRE ATT&CK Coverage
- T1498: Network Denial of Service
- T1499: Endpoint Denial of Service variants
- T1046: Network Service Discovery (formerly Network Service Scanning)
- T1110: Brute Force
- T1071: Application Layer Protocol
- T1190: Exploit Public-Facing Application
- T1078: Valid Accounts
Training Details
Configuration
- Base Model: mistralai/Mistral-7B-Instruct-v0.2
- Method: QLoRA (4-bit quantization with LoRA adapters)
- LoRA Rank: 16
- LoRA Alpha: 32
- Target Modules: q_proj, v_proj, k_proj, o_proj
- Trainable Parameters: 13.6M (0.19% of total)
- Training Samples: ~4,200
- Epochs: 1
- Batch Size: 8 (effective: 16 with gradient accumulation)
- Learning Rate: 3e-4
- Max Sequence Length: 384 tokens
- Precision: bfloat16
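The trainable-parameter figure can be reproduced from the LoRA rank and the shapes of the targeted projections. The sketch below is a back-of-the-envelope check, not the training script: the layer dimensions come from the public Mistral-7B configuration (they are not stated in this card), the total parameter count is an approximation, and the gradient accumulation step count of 2 is inferred from the stated effective batch size.

```python
# Mistral-7B attention dimensions (assumption: taken from the public model config).
hidden_size = 4096
num_layers = 32
num_kv_heads = 8
head_dim = 128
kv_dim = num_kv_heads * head_dim  # 1024, because k/v use grouped-query attention

lora_rank = 16

# (in_features, out_features) for each targeted projection.
target_shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
}

def lora_params(in_features, out_features, rank):
    # LoRA adds two matrices: A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, lora_rank) for i, o in target_shapes.values())
trainable = per_layer * num_layers

print(per_layer)  # 425984
print(trainable)  # 13631488, i.e. about 13.6M

# Share of the base model, assuming roughly 7.24B total parameters.
approx_total_params = 7.24e9
print(round(100 * trainable / approx_total_params, 2))  # 0.19

# Effective batch size = per-step batch size x gradient accumulation steps.
batch_size = 8
grad_accum_steps = 2  # assumption: implied by the stated effective batch of 16
print(batch_size * grad_accum_steps)  # 16
```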
Hardware
- GPU: NVIDIA Tesla T4 (16GB VRAM)
- Platform: Kaggle Notebooks
- Training Time: ~7-9 hours
Usage
Installation
pip install transformers peft torch bitsandbytes accelerate
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
import json

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load base model
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, "sohomn/siem-log-generator-v1")
tokenizer = AutoTokenizer.from_pretrained("sohomn/siem-log-generator-v1")

# Set to evaluation mode
model.eval()
Generating Logs
def generate_log(prompt, max_tokens=400):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract generated log (after [/INST])
    if "[/INST]" in response:
        log_text = response.split("[/INST]")[-1].strip()
        try:
            # Return a dict when the output is valid JSON
            return json.loads(log_text)
        except json.JSONDecodeError:
            # Otherwise fall back to the raw generated text
            return log_text
    return response

# Example 1: Generate DDoS attack log
prompt = "[INST] Generate cloud security log for DDoS attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))
# Example 2: Generate authentication failure
prompt = "[INST] Generate authentication log for BruteForce [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))
# Example 3: Generate benign traffic
prompt = "[INST] Generate cloud security log for BENIGN attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))
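generate_log returns the raw string whenever json.loads fails, which can happen if the model wraps the JSON in extra text. The following is a minimal sketch of a more tolerant parser built on the standard library's JSONDecoder.raw_decode; the helper name extract_first_json and the sample string are hypothetical and not part of the model repository.

```python
import json

def extract_first_json(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None

# Hypothetical raw output with extra text around the JSON payload.
sample = 'Here is the log: {"eventName": "ConsoleLogin", "severity": "high"} End of log.'
print(extract_first_json(sample))
print(extract_first_json("no json here"))
```

Applied to the string returned by generate_log, this recovers a dict in cases where the payload is valid JSON but surrounded by other text; truncated or malformed JSON still yields None.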
Example Outputs
AWS CloudTrail DDoS Event:
{
"eventVersion": "1.08",
"eventTime": "2026-01-15T14:32:45.123Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "NetworkActivity.DDoS",
"awsRegion": "us-east-1",
"sourceIPAddress": "203.45.67.89",
"destinationIPAddress": "10.0.1.15",
"threatIntel": {
"attackType": "DDoS",
"mitreAttack": "T1498",
"severity": "high"
}
}
Azure Sign-In Brute Force:
{
"time": "2026-01-15T14:35:22.456Z",
"operationName": "Sign-in activity",
"properties": {
"userPrincipalName": "admin@domain.com",
"ipAddress": "203.45.67.100",
"attackType": "BruteForce",
"mitreAttackId": "T1110"
}
}
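Because outputs should be validated before use, one simple check is that the technique ID in a generated log belongs to the set listed under MITRE ATT&CK Coverage. This is a sketch that assumes the ID appears in one of the two locations used by the examples above (threatIntel.mitreAttack or properties.mitreAttackId); other generated formats may place it elsewhere, and the helper names are hypothetical.

```python
# Technique IDs listed under "MITRE ATT&CK Coverage".
COVERED_TECHNIQUES = {"T1498", "T1499", "T1046", "T1110", "T1071", "T1190", "T1078"}

def find_technique_id(log):
    """Look for a technique ID in the two locations used by the example outputs."""
    threat = log.get("threatIntel")
    if isinstance(threat, dict) and "mitreAttack" in threat:
        return threat["mitreAttack"]
    props = log.get("properties")
    if isinstance(props, dict) and "mitreAttackId" in props:
        return props["mitreAttackId"]
    return None

def is_covered(log):
    """True if the log carries a technique ID from the documented coverage list."""
    return find_technique_id(log) in COVERED_TECHNIQUES

# Trimmed copies of the example outputs, plus one made-up ID for contrast.
aws_log = {"eventName": "NetworkActivity.DDoS",
           "threatIntel": {"attackType": "DDoS", "mitreAttack": "T1498"}}
azure_log = {"operationName": "Sign-in activity",
             "properties": {"attackType": "BruteForce", "mitreAttackId": "T1110"}}
unknown_log = {"properties": {"mitreAttackId": "T9999"}}

print(is_covered(aws_log))      # True
print(is_covered(azure_log))    # True
print(is_covered(unknown_log))  # False
```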
Use Cases
- SIEM Testing: Generate realistic logs for testing SIEM detection rules
- Security Training: Create training datasets for SOC analysts
- ML Training: Generate synthetic data for security ML models
- Threat Intelligence: Simulate attack scenarios for analysis
- Compliance Testing: Validate log collection and analysis pipelines
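For pipeline and SIEM testing, generated logs usually need to be written in a format the collector can ingest. The sketch below assumes the target pipeline accepts newline-delimited JSON (one record per line); the write_jsonl helper and the sample records are hypothetical, and string outputs that failed to parse are skipped.

```python
import io
import json

def write_jsonl(logs, stream):
    """Write each log dict as one compact JSON line; return the count written."""
    count = 0
    for log in logs:
        if not isinstance(log, dict):
            continue  # skip outputs that did not parse as JSON objects
        stream.write(json.dumps(log, separators=(",", ":")) + "\n")
        count += 1
    return count

# Hypothetical batch: two parsed logs and one unparsed string.
logs = [
    {"eventName": "NetworkActivity.DDoS", "awsRegion": "us-east-1"},
    "unparsed model output",
    {"operationName": "Sign-in activity"},
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
written = write_jsonl(logs, buffer)
print(written)  # 2
print(buffer.getvalue())
```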
Supported Platforms
- AWS: CloudTrail, VPC Flow Logs, GuardDuty format
- Azure: Sign-In Logs, NSG Flow Logs, Security Center format
- GCP: Cloud Audit Logs, VPC Flow Logs, Security Command Center format
Limitations
- Trained on a limited dataset (~4,200 samples), so it may not cover all edge cases
- Trained for a single epoch to keep training time short; extended training may improve quality
- English language only
- Requires GPU for efficient inference
- Should be used for testing/training, not as authoritative security data
Ethical Considerations
⚠️ Important:
- For defensive security purposes only
- Do not use for malicious activities
- Validate all outputs before use in production
- Comply with applicable laws and regulations
- Use alongside human security experts
Citation
@misc{siem-log-generator-2026,
author = {Sohom Nandi},
title = {SIEM Cloud Log Generator - Mistral 7B QLoRA},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/sohomn/siem-log-generator-v1}
}
License
Apache 2.0 (inherits from Mistral-7B-Instruct-v0.2)
Acknowledgments
- Mistral AI for Mistral-7B-Instruct-v0.2
- CIC-IDS2017 dataset contributors
- Hugging Face for model hosting
- QLoRA authors for efficient fine-tuning method
Contact
For issues or questions, please open an issue on the model repository.
Model Version: v1.0
Last Updated: January 2026
Status: Experimental - Use with validation