
πŸ›‘οΈ SIEM Multisource Log Generator


πŸš€ Why This Model?

Security teams need large volumes of realistic SIEM data to build, test, and validate detections β€” but real logs are often sensitive, restricted, or unavailable.

The SIEM Multisource Log Generator produces high-quality synthetic security logs that resemble real-world telemetry across multiple sources, enabling safer experimentation, faster iteration, and better analyst training without touching production data.


πŸ“Œ Model Summary

The SIEM Multisource Log Generator is a transformer-based language model fine-tuned to generate synthetic Security Information and Event Management (SIEM) logs. It is designed for cybersecurity research, detection engineering, SOC training, and SIEM validation workflows in non-production environments.


🧠 Model Details

  • Developed by: Adarsh Ranjan
  • Model type: Transformer-based causal language model
  • Base model: mistralai/Mistral-7B-Instruct-v0.3
  • Language: English
  • License: MIT
  • Framework: πŸ€— Hugging Face Transformers
  • Model format: Safetensors

πŸ“„ Associated Papers

This model is fine-tuned from mistralai/Mistral-7B-Instruct-v0.3. The base architecture and training philosophy are described in the Mistral 7B paper (Jiang et al., 2023, https://arxiv.org/abs/2310.06825).


πŸ“– What Does It Generate?

The model produces structured and semi-structured SIEM-style logs, including signals from:

  • πŸ” Authentication and identity systems
  • 🌐 Firewalls and network devices
  • πŸ’» Endpoint and host-based agents

All outputs are fully synthetic and safe for testing and research.
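
To illustrate the kind of record the model targets, here is a hypothetical synthetic authentication event. The field names are illustrative only, not a guaranteed output schema, and the IP comes from a documentation-reserved range:

```python
import json

# Hypothetical example of a synthetic authentication event; field names
# are illustrative, not a fixed schema the model is guaranteed to emit.
sample_log = {
    "timestamp": "2025-01-15T08:42:17Z",
    "source": "auth",
    "event": "login_failure",
    "src_ip": "203.0.113.45",  # TEST-NET-3 documentation range, fully synthetic
    "username": "jdoe",
    "host": "web-01",
    "outcome": "failure",
}

print(json.dumps(sample_log, indent=2))
```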


🎯 Intended Use

βœ… Direct Use

  • Synthetic SIEM log generation
  • Detection rule and alert testing
  • Security analytics experimentation
  • SOC analyst training and simulations

πŸ” Downstream Use

  • Fine-tuning for organization-specific log formats
  • Integration into SIEM test or staging environments

🚫 Out-of-Scope Use

  • Production ingestion of real security logs
  • Automated security decisions without human oversight
  • Real-world attack execution or facilitation

⚠️ Bias, Risks, and Limitations

  • Synthetic logs may not fully capture real attacker behavior
  • Rare or advanced attack techniques may be underrepresented
  • Benchmarks are qualitative and task-oriented
  • Outputs should be reviewed by cybersecurity professionals

πŸš€ Getting Started

πŸ“¦ Installation

```bash
pip install transformers safetensors
```

πŸ§ͺ Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "adarsh-aur/siem-multisource-log-generator"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Generate SIEM logs for a suspicious login scenario.\n"
    "Include timestamp, source IP, username, host, and outcome."
)

inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds the generated continuation regardless of prompt length
outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
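
Generated text usually needs light post-processing before it can be ingested. A minimal sketch, assuming the model was prompted to return one JSON record per line (the helper name is illustrative, not part of the model's API):

```python
import json

def parse_json_lines(text):
    """Parse model output where each non-empty line is expected to be a
    JSON log record; lines that fail to parse are skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # the model may emit prose around the logs
    return records

# Simulated model output: two valid records and one stray prose line.
raw = (
    '{"user": "alice", "outcome": "failure"}\n'
    "not json\n"
    '{"user": "bob", "outcome": "success"}'
)
print(parse_json_lines(raw))
```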

🧠 Chain-of-Thought–Safe Prompting (Recommended)

To remain policy-safe and improve output quality, avoid asking for reasoning or explanations.
Instead, request structured outputs directly.

βœ… Preferred

```
Generate SIEM logs showing a brute-force login attempt.
Return only the logs in JSON format.
```

❌ Avoid

```
Explain step by step how an attacker performs a brute-force attack.
```

The model is optimized for output generation, not procedural reasoning.


🧩 Prompt Templates

πŸ” Authentication Anomaly

Generate SIEM logs for multiple failed login attempts followed by a success.
Include timestamp, username, source IP, host, and result.

🌐 Firewall Activity

Generate firewall logs showing blocked outbound traffic to malicious IPs.
Include rule_id, destination_ip, port, protocol, and action.

πŸ’» Endpoint Detection

Generate endpoint logs for suspicious PowerShell execution.
Include process_name, command_line, parent_process, and severity.
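
The templates above share a two-line structure, so prompts can be assembled programmatically. A small sketch (the helper is illustrative, not part of the model's API):

```python
def build_prompt(scenario, fields):
    """Assemble a generation prompt in the two-line style of the
    templates above: a scenario line, then a field list."""
    return (
        f"Generate SIEM logs for {scenario}.\n"
        f"Include {', '.join(fields)}."
    )

print(build_prompt(
    "multiple failed login attempts followed by a success",
    ["timestamp", "username", "source IP", "host", "result"],
))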

πŸ“ˆ Detection Rule Examples

Example: Brute Force Detection (Pseudo-SPL)

```
index=auth_logs action=failure
| stats count by src_ip, user
| where count > 5
```

Example: Suspicious PowerShell Execution

```
index=endpoint_logs process_name="powershell.exe"
| search command_line="*EncodedCommand*"
```

These rules can be validated using synthetic logs generated by this model.
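
For quick validation outside a SIEM, the brute-force rule above can be reproduced directly over parsed synthetic logs. A Python analogue of the pseudo-SPL (field names assume the schema used in the prompt templates):

```python
from collections import Counter

def brute_force_hits(events, threshold=5):
    """Python analogue of the pseudo-SPL brute-force rule: flag
    (src_ip, user) pairs with more than `threshold` failed logins."""
    counts = Counter(
        (e["src_ip"], e["user"])
        for e in events
        if e.get("action") == "failure"
    )
    return {pair: n for pair, n in counts.items() if n > threshold}

# Synthetic events: six failures from one IP, a single failure from another.
events = [{"src_ip": "203.0.113.9", "user": "admin", "action": "failure"}] * 6
events.append({"src_ip": "198.51.100.2", "user": "bob", "action": "failure"})
print(brute_force_hits(events))  # β†’ {('203.0.113.9', 'admin'): 6}
```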


πŸ”¬ Benchmarks (Qualitative)

| Task | Result Summary |
|------|----------------|
| Log Structure Consistency | High |
| Field Coherence | High |
| Scenario Diversity | Medium–High |
| Detection Rule Compatibility | High |

Note: Benchmarks are qualitative and based on domain inspection. No automated scoring metrics are published.


🧾 Dataset Card (Embedded)

Dataset Description

  • Type: Synthetic text-based SIEM logs
  • Sources: Authentication, network, endpoint-style events
  • Sensitive Data: None (fully synthetic)

Dataset Usage

  • SIEM testing and validation
  • Detection engineering
  • Cybersecurity research and education

Dataset Limitations

  • May not reflect organization-specific schemas
  • Rare attack patterns may be underrepresented

πŸ‹οΈ Training Details

Exact training datasets, preprocessing steps, and hyperparameters have not been publicly disclosed.
The model is assumed to be fine-tuned on curated or synthetic SIEM-style log text.


🌱 Environmental Impact

Training-related carbon emissions were not recorded.

Environmental impact can be estimated using:
Lacoste et al. (2019), Quantifying the Carbon Emissions of Machine Learning
https://arxiv.org/abs/1910.09700


βš™οΈ Technical Specifications

  • Architecture: Transformer-based causal language model
  • Objective: Synthetic SIEM log generation
  • Software: Python, PyTorch, Hugging Face Transformers
  • Hardware: Not publicly documented

πŸ“š Citation

@misc{siem_multisource_log_generator,
  title={SIEM Multisource Log Generator},
  author={Adarsh Ranjan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/adarsh-aur/siem-multisource-log-generator}
}

πŸ‘€ Model Card Author

Adarsh Ranjan


πŸ’¬ Contact

For questions, feedback, or contributions, please use the Hugging Face model repository discussion page.
