SIEM Cloud Log Generator - Mistral 7B QLoRA

A fine-tuned Mistral-7B model specialized in generating realistic cloud security logs for AWS, Azure, and GCP platforms. This model generates authentic SIEM logs from structured prompts for security testing, training, and analysis.

Model Description

This model generates production-quality cloud security logs including:

  • Network Traffic Logs: VPC Flow Logs (AWS), NSG Logs (Azure), VPC Flow Logs (GCP)
  • Authentication Events: CloudTrail (AWS), Sign-In Logs (Azure), Audit Logs (GCP)
  • Security Events: DDoS attacks, brute force attempts, port scans, infiltration attempts
  • MITRE ATT&CK Mapping: Automatic mapping to threat techniques

Training Data

Trained on diverse security datasets:

  • CIC-IDS2017: 1,500+ network traffic samples covering 30+ attack types
  • RBA Dataset: 500+ authentication events (benign and malicious)
  • System Logs: 100+ real syslog entries
  • Variations: 2 variations per sample for diversity (~4,200 training samples total)

Attack Types Covered

  • DDoS variants (LOIC-HTTP, HOIC, Syn, UDP, NTP, DNS, etc.)
  • DoS attacks (Hulk, GoldenEye, Slowloris, Slowhttptest)
  • Brute Force (SSH, FTP, Web)
  • Port Scanning
  • Web Attacks (XSS, SQLi)
  • Infiltration
  • Botnet activity
  • Benign traffic

MITRE ATT&CK Coverage

  • T1498: Network Denial of Service
  • T1499: Endpoint Denial of Service
  • T1046: Network Service Scanning
  • T1110: Brute Force
  • T1071: Application Layer Protocol
  • T1190: Exploit Public-Facing Application
  • T1078: Valid Accounts
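For programmatic use, the technique coverage above can be expressed as a simple lookup. This is a hedged sketch: the pairing of attack categories to technique IDs follows the lists in this card, but the exact category strings the model emits may differ, and the Infiltration/T1078 pairing in particular is an assumption.

```python
# Mapping of attack categories to MITRE ATT&CK technique IDs, following
# the coverage listed above. Category strings are illustrative; adjust
# them to match the labels your generated logs actually carry.
MITRE_MAP = {
    "DDoS": "T1498",          # Network Denial of Service
    "DoS": "T1499",           # Endpoint Denial of Service
    "PortScan": "T1046",      # Network Service Scanning
    "BruteForce": "T1110",    # Brute Force
    "Botnet": "T1071",        # Application Layer Protocol
    "WebAttack": "T1190",     # Exploit Public-Facing Application
    "Infiltration": "T1078",  # Valid Accounts (assumed pairing)
}

def mitre_id(attack_type: str) -> str:
    """Return the MITRE technique ID for an attack category, or 'unknown'."""
    return MITRE_MAP.get(attack_type, "unknown")
```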

Training Details

Configuration

  • Base Model: mistralai/Mistral-7B-Instruct-v0.2
  • Method: QLoRA (4-bit quantization with LoRA adapters)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, v_proj, k_proj, o_proj
  • Trainable Parameters: 13.6M (0.19% of total)
  • Training Samples: ~4,200
  • Epochs: 1
  • Batch Size: 8 (effective: 16 with gradient accumulation)
  • Learning Rate: 3e-4
  • Max Sequence Length: 384 tokens
  • Precision: bfloat16
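The hyperparameters above correspond roughly to the following peft adapter configuration. This is a sketch, not the original training script: the dropout and bias values are assumptions, since they are not listed in this card.

```python
from peft import LoraConfig

# LoRA adapter configuration matching the hyperparameters listed above.
# lora_dropout and bias are assumed values; the training script itself
# is not published.
lora_config = LoraConfig(
    r=16,                    # LoRA rank
    lora_alpha=32,           # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,       # assumption
    bias="none",             # assumption
    task_type="CAUSAL_LM",
)
```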

Hardware

  • GPU: NVIDIA Tesla T4 (16GB VRAM)
  • Platform: Kaggle Notebooks
  • Training Time: ~7-9 hours

Usage

Installation

pip install transformers peft torch bitsandbytes accelerate

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
import json

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load base model
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, "sohomn/siem-log-generator-v1")
tokenizer = AutoTokenizer.from_pretrained("sohomn/siem-log-generator-v1")

# Set to evaluation mode
model.eval()

Generating Logs

def generate_log(prompt, max_tokens=400):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract the generated log (everything after the [/INST] tag)
    if "[/INST]" in response:
        log_text = response.split("[/INST]")[-1].strip()
        try:
            return json.loads(log_text)
        except json.JSONDecodeError:
            # Output was not valid JSON; fall back to the raw text
            return log_text
    return response

# Example 1: Generate DDoS attack log
prompt = "[INST] Generate cloud security log for DDoS attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

# Example 2: Generate authentication failure
prompt = "[INST] Generate authentication log for BruteForce [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

# Example 3: Generate benign traffic
prompt = "[INST] Generate cloud security log for BENIGN attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

Example Outputs

AWS CloudTrail DDoS Event:

{
  "eventVersion": "1.08",
  "eventTime": "2026-01-15T14:32:45.123Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "NetworkActivity.DDoS",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.45.67.89",
  "destinationIPAddress": "10.0.1.15",
  "threatIntel": {
    "attackType": "DDoS",
    "mitreAttack": "T1498",
    "severity": "high"
  }
}

Azure Sign-In Brute Force:

{
  "time": "2026-01-15T14:35:22.456Z",
  "operationName": "Sign-in activity",
  "properties": {
    "userPrincipalName": "admin@domain.com",
    "ipAddress": "203.45.67.100",
    "attackType": "BruteForce",
    "mitreAttackId": "T1110"
  }
}

Use Cases

  1. SIEM Testing: Generate realistic logs for testing SIEM detection rules
  2. Security Training: Create training datasets for SOC analysts
  3. ML Training: Generate synthetic data for security ML models
  4. Threat Intelligence: Simulate attack scenarios for analysis
  5. Compliance Testing: Validate log collection and analysis pipelines
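For SIEM testing (use case 1), generated logs should be sanity-checked before ingestion. A minimal validator might look like the sketch below; the field names (`threatIntel`, `attackType`, `mitreAttack`) follow the AWS-style example output above and should be adapted per platform.

```python
import json

# Minimal threat-metadata fields, following the AWS CloudTrail-style
# example output in this card; other platforms use different names.
REQUIRED_THREAT_FIELDS = {"attackType", "mitreAttack"}

def validate_log(raw: str) -> dict:
    """Parse a generated log and check it carries minimal threat metadata.

    Returns the parsed log on success; raises ValueError otherwise.
    """
    try:
        log = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    threat = log.get("threatIntel", {})
    missing = REQUIRED_THREAT_FIELDS - threat.keys()
    if missing:
        raise ValueError(f"missing threat fields: {sorted(missing)}")
    return log
```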

Supported Platforms

  • AWS: CloudTrail, VPC Flow Logs, GuardDuty format
  • Azure: Sign-In Logs, NSG Flow Logs, Security Center format
  • GCP: Cloud Audit Logs, VPC Flow Logs, Security Command Center format

Limitations

  • Trained on a limited sample set (~4,200) - may not cover all edge cases
  • Optimized for speed (1 epoch) - extended training may improve quality
  • English language only
  • Requires GPU for efficient inference
  • Should be used for testing/training, not as authoritative security data

Ethical Considerations

⚠️ Important:

  • For defensive security purposes only
  • Do not use for malicious activities
  • Validate all outputs before use in production
  • Comply with applicable laws and regulations
  • Use alongside human security experts

Citation

@misc{siem-log-generator-2026,
  author = {Sohom Nandi},
  title = {SIEM Cloud Log Generator - Mistral 7B QLoRA},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/sohomn/siem-log-generator-v1}
}

License

Apache 2.0 (inherits from Mistral-7B-Instruct-v0.2)

Acknowledgments

  • Mistral AI for Mistral-7B-Instruct-v0.2
  • CIC-IDS2017 dataset contributors
  • Hugging Face for model hosting
  • QLoRA authors for efficient fine-tuning method

Contact

For issues or questions, please open an issue on the model repository.


Model Version: v1.0
Last Updated: January 2026
Status: Experimental - Use with validation
