SIEM Cloud Log Generator - Mistral 7B QLoRA

A fine-tuned Mistral-7B model specialized in generating realistic cloud security logs for AWS, Azure, and GCP platforms. This model generates authentic SIEM logs from structured prompts for security testing, training, and analysis.

Model Description

This model generates production-quality cloud security logs including:

  • Network Traffic Logs: VPC Flow Logs (AWS), NSG Logs (Azure), VPC Flow Logs (GCP)
  • Authentication Events: CloudTrail (AWS), Sign-In Logs (Azure), Audit Logs (GCP)
  • Security Events: DDoS attacks, brute force attempts, port scans, infiltration attempts
  • MITRE ATT&CK Mapping: Automatic mapping to threat techniques

Training Data

Trained on diverse security datasets:

  • CIC-IDS2017: 1,500+ network traffic samples covering 30+ attack types
  • RBA Dataset: 500+ authentication events (benign and malicious)
  • System Logs: 100+ real syslog entries
  • Variations: 2 variations per sample for diversity (~4,200 training samples total)

Attack Types Covered

  • DDoS variants (LOIC-HTTP, HOIC, Syn, UDP, NTP, DNS, etc.)
  • DoS attacks (Hulk, GoldenEye, Slowloris, Slowhttptest)
  • Brute Force (SSH, FTP, Web)
  • Port Scanning
  • Web Attacks (XSS, SQLi)
  • Infiltration
  • Botnet activity
  • Benign traffic

MITRE ATT&CK Coverage

  • T1498: Network Denial of Service
  • T1499: Endpoint Denial of Service
  • T1046: Network Service Scanning
  • T1110: Brute Force
  • T1071: Application Layer Protocol
  • T1190: Exploit Public-Facing Application
  • T1078: Valid Accounts
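For programmatic use, the technique coverage above can be expressed as a simple lookup. This is a hedged sketch: the pairing of attack categories to technique IDs follows the lists in this card, but the exact category strings the model emits may differ, and the Infiltration/T1078 pairing in particular is an assumption.

```python
# Mapping of attack categories to MITRE ATT&CK technique IDs, following
# the coverage listed above. Category strings are illustrative; adjust
# them to match the labels your generated logs actually carry.
MITRE_MAP = {
    "DDoS": "T1498",          # Network Denial of Service
    "DoS": "T1499",           # Endpoint Denial of Service
    "PortScan": "T1046",      # Network Service Scanning
    "BruteForce": "T1110",    # Brute Force
    "Botnet": "T1071",        # Application Layer Protocol
    "WebAttack": "T1190",     # Exploit Public-Facing Application
    "Infiltration": "T1078",  # Valid Accounts (assumed pairing)
}

def mitre_id(attack_type: str) -> str:
    """Return the MITRE technique ID for an attack category, or 'unknown'."""
    return MITRE_MAP.get(attack_type, "unknown")
```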

Training Details

Configuration

  • Base Model: mistralai/Mistral-7B-Instruct-v0.2
  • Method: QLoRA (4-bit quantization with LoRA adapters)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, v_proj, k_proj, o_proj
  • Trainable Parameters: 13.6M (0.19% of total)
  • Training Samples: ~4,200
  • Epochs: 1
  • Batch Size: 8 (effective: 16 with gradient accumulation)
  • Learning Rate: 3e-4
  • Max Sequence Length: 384 tokens
  • Precision: bfloat16
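The hyperparameters above correspond roughly to the following peft adapter configuration. This is a sketch, not the original training script: the dropout and bias values are assumptions, since they are not listed in this card.

```python
from peft import LoraConfig

# LoRA adapter configuration matching the hyperparameters listed above.
# lora_dropout and bias are assumed values; the training script itself
# is not published.
lora_config = LoraConfig(
    r=16,                    # LoRA rank
    lora_alpha=32,           # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,       # assumption
    bias="none",             # assumption
    task_type="CAUSAL_LM",
)
```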

Hardware

  • GPU: NVIDIA Tesla T4 (16GB VRAM)
  • Platform: Kaggle Notebooks
  • Training Time: ~7-9 hours

Usage

Installation

pip install transformers peft torch bitsandbytes accelerate

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
import json

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load base model
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, "sohomn/siem-log-generator-v1")
tokenizer = AutoTokenizer.from_pretrained("sohomn/siem-log-generator-v1")

# Set to evaluation mode
model.eval()

Generating Logs

def generate_log(prompt, max_tokens=400):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract the generated log (everything after the [/INST] tag)
    if "[/INST]" in response:
        log_text = response.split("[/INST]")[-1].strip()
        try:
            return json.loads(log_text)
        except json.JSONDecodeError:
            # Output was not valid JSON; fall back to the raw text
            return log_text
    return response

# Example 1: Generate DDoS attack log
prompt = "[INST] Generate cloud security log for DDoS attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

# Example 2: Generate authentication failure
prompt = "[INST] Generate authentication log for BruteForce [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

# Example 3: Generate benign traffic
prompt = "[INST] Generate cloud security log for BENIGN attack [/INST]"
log = generate_log(prompt)
print(json.dumps(log, indent=2))

Example Outputs

AWS CloudTrail DDoS Event:

{
  "eventVersion": "1.08",
  "eventTime": "2026-01-15T14:32:45.123Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "NetworkActivity.DDoS",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.45.67.89",
  "destinationIPAddress": "10.0.1.15",
  "threatIntel": {
    "attackType": "DDoS",
    "mitreAttack": "T1498",
    "severity": "high"
  }
}

Azure Sign-In Brute Force:

{
  "time": "2026-01-15T14:35:22.456Z",
  "operationName": "Sign-in activity",
  "properties": {
    "userPrincipalName": "admin@domain.com",
    "ipAddress": "203.45.67.100",
    "attackType": "BruteForce",
    "mitreAttackId": "T1110"
  }
}

Use Cases

  1. SIEM Testing: Generate realistic logs for testing SIEM detection rules
  2. Security Training: Create training datasets for SOC analysts
  3. ML Training: Generate synthetic data for security ML models
  4. Threat Intelligence: Simulate attack scenarios for analysis
  5. Compliance Testing: Validate log collection and analysis pipelines
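For SIEM testing (use case 1), generated logs should be sanity-checked before ingestion. A minimal validator might look like the sketch below; the field names (`threatIntel`, `attackType`, `mitreAttack`) follow the AWS-style example output above and should be adapted per platform.

```python
import json

# Minimal threat-metadata fields, following the AWS CloudTrail-style
# example output in this card; other platforms use different names.
REQUIRED_THREAT_FIELDS = {"attackType", "mitreAttack"}

def validate_log(raw: str) -> dict:
    """Parse a generated log and check it carries minimal threat metadata.

    Returns the parsed log on success; raises ValueError otherwise.
    """
    try:
        log = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    threat = log.get("threatIntel", {})
    missing = REQUIRED_THREAT_FIELDS - threat.keys()
    if missing:
        raise ValueError(f"missing threat fields: {sorted(missing)}")
    return log
```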

Supported Platforms

  • AWS: CloudTrail, VPC Flow Logs, GuardDuty format
  • Azure: Sign-In Logs, NSG Flow Logs, Security Center format
  • GCP: Cloud Audit Logs, VPC Flow Logs, Security Command Center format

Limitations

  • Trained on a limited sample set (~4,200) - may not cover all edge cases
  • Optimized for speed (1 epoch) - extended training may improve quality
  • English language only
  • Requires GPU for efficient inference
  • Should be used for testing/training, not as authoritative security data

Ethical Considerations

⚠️ Important:

  • For defensive security purposes only
  • Do not use for malicious activities
  • Validate all outputs before use in production
  • Comply with applicable laws and regulations
  • Use alongside human security experts

Citation

@misc{siem-log-generator-2026,
  author = {Sohom Nandi},
  title = {SIEM Cloud Log Generator - Mistral 7B QLoRA},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/sohomn/siem-log-generator-v1}
}

License

Apache 2.0 (inherits from Mistral-7B-Instruct-v0.2)

Acknowledgments

  • Mistral AI for Mistral-7B-Instruct-v0.2
  • CIC-IDS2017 dataset contributors
  • Hugging Face for model hosting
  • QLoRA authors for efficient fine-tuning method

Contact

For issues or questions, please open an issue on the model repository.


Model Version: v1.0
Last Updated: January 2026
Status: Experimental - Use with validation
