
πŸ›‘οΈ SIEM Multisource Log Generator


πŸš€ Why This Model?

Security teams need large volumes of realistic SIEM data to build, test, and validate detections β€” but real logs are often sensitive, restricted, or unavailable.

The SIEM Multisource Log Generator produces high-quality synthetic security logs that resemble real-world telemetry across multiple sources, enabling safer experimentation, faster iteration, and better analyst training without touching production data.


πŸ“Œ Model Summary

The SIEM Multisource Log Generator is a transformer-based language model fine-tuned to generate synthetic Security Information and Event Management (SIEM) logs. It is designed for cybersecurity research, detection engineering, SOC training, and SIEM validation workflows in non-production environments.


🧠 Model Details

  • Developed by: Adarsh Ranjan
  • Model type: Transformer-based causal language model
  • Base model: mistralai/Mistral-7B-Instruct-v0.3
  • Language: English
  • License: MIT
  • Framework: πŸ€— Hugging Face Transformers
  • Model format: Safetensors

πŸ“„ Associated Papers

This model is fine-tuned from mistralai/Mistral-7B-Instruct-v0.3. The base architecture and training philosophy are described in the Mistral 7B paper (Jiang et al., 2023, https://arxiv.org/abs/2310.06825).


πŸ“– What Does It Generate?

The model produces structured and semi-structured SIEM-style logs, including signals from:

  • πŸ” Authentication and identity systems
  • 🌐 Firewalls and network devices
  • πŸ’» Endpoint and host-based agents

All outputs are fully synthetic and safe for testing and research.
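
To illustrate the kind of record the model targets, here is a hypothetical synthetic authentication event. The field names are illustrative only, not a guaranteed output schema, and the IP comes from a documentation-reserved range:

```python
import json

# Hypothetical example of a synthetic authentication event; field names
# are illustrative, not a fixed schema the model is guaranteed to emit.
sample_log = {
    "timestamp": "2025-01-15T08:42:17Z",
    "source": "auth",
    "event": "login_failure",
    "src_ip": "203.0.113.45",  # TEST-NET-3 documentation range, fully synthetic
    "username": "jdoe",
    "host": "web-01",
    "outcome": "failure",
}

print(json.dumps(sample_log, indent=2))
```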


🎯 Intended Use

βœ… Direct Use

  • Synthetic SIEM log generation
  • Detection rule and alert testing
  • Security analytics experimentation
  • SOC analyst training and simulations

πŸ” Downstream Use

  • Fine-tuning for organization-specific log formats
  • Integration into SIEM test or staging environments

🚫 Out-of-Scope Use

  • Production ingestion of real security logs
  • Automated security decisions without human oversight
  • Real-world attack execution or facilitation

⚠️ Bias, Risks, and Limitations

  • Synthetic logs may not fully capture real attacker behavior
  • Rare or advanced attack techniques may be underrepresented
  • Benchmarks are qualitative and task-oriented
  • Outputs should be reviewed by cybersecurity professionals

πŸš€ Getting Started

πŸ“¦ Installation

```bash
pip install transformers safetensors
```

πŸ§ͺ Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "adarsh-aur/siem-multisource-log-generator"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Generate SIEM logs for a suspicious login scenario.\n"
    "Include timestamp, source IP, username, host, and outcome."
)

inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds the generated continuation regardless of prompt length
outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
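
Generated text usually needs light post-processing before it can be ingested. A minimal sketch, assuming the model was prompted to return one JSON record per line (the helper name is illustrative, not part of the model's API):

```python
import json

def parse_json_lines(text):
    """Parse model output where each non-empty line is expected to be a
    JSON log record; lines that fail to parse are skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # the model may emit prose around the logs
    return records

# Simulated model output: two valid records and one stray prose line.
raw = (
    '{"user": "alice", "outcome": "failure"}\n'
    "not json\n"
    '{"user": "bob", "outcome": "success"}'
)
print(parse_json_lines(raw))
```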

🧠 Chain-of-Thought–Safe Prompting (Recommended)

To remain policy-safe and improve output quality, avoid asking for reasoning or explanations.
Instead, request structured outputs directly.

βœ… Preferred

```
Generate SIEM logs showing a brute-force login attempt.
Return only the logs in JSON format.
```

❌ Avoid

```
Explain step by step how an attacker performs a brute-force attack.
```

The model is optimized for output generation, not procedural reasoning.


🧩 Prompt Templates

πŸ” Authentication Anomaly

Generate SIEM logs for multiple failed login attempts followed by a success.
Include timestamp, username, source IP, host, and result.

🌐 Firewall Activity

Generate firewall logs showing blocked outbound traffic to malicious IPs.
Include rule_id, destination_ip, port, protocol, and action.

πŸ’» Endpoint Detection

Generate endpoint logs for suspicious PowerShell execution.
Include process_name, command_line, parent_process, and severity.
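
The templates above share a two-line structure, so prompts can be assembled programmatically. A small sketch (the helper is illustrative, not part of the model's API):

```python
def build_prompt(scenario, fields):
    """Assemble a generation prompt in the two-line style of the
    templates above: a scenario line, then a field list."""
    return (
        f"Generate SIEM logs for {scenario}.\n"
        f"Include {', '.join(fields)}."
    )

print(build_prompt(
    "multiple failed login attempts followed by a success",
    ["timestamp", "username", "source IP", "host", "result"],
))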

πŸ“ˆ Detection Rule Examples

Example: Brute Force Detection (Pseudo-SPL)

```
index=auth_logs action=failure
| stats count by src_ip, user
| where count > 5
```

Example: Suspicious PowerShell Execution

```
index=endpoint_logs process_name="powershell.exe"
| search command_line="*EncodedCommand*"
```

These rules can be validated using synthetic logs generated by this model.
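
For quick validation outside a SIEM, the brute-force rule above can be reproduced directly over parsed synthetic logs. A Python analogue of the pseudo-SPL (field names assume the schema used in the prompt templates):

```python
from collections import Counter

def brute_force_hits(events, threshold=5):
    """Python analogue of the pseudo-SPL brute-force rule: flag
    (src_ip, user) pairs with more than `threshold` failed logins."""
    counts = Counter(
        (e["src_ip"], e["user"])
        for e in events
        if e.get("action") == "failure"
    )
    return {pair: n for pair, n in counts.items() if n > threshold}

# Synthetic events: six failures from one IP, a single failure from another.
events = [{"src_ip": "203.0.113.9", "user": "admin", "action": "failure"}] * 6
events.append({"src_ip": "198.51.100.2", "user": "bob", "action": "failure"})
print(brute_force_hits(events))  # β†’ {('203.0.113.9', 'admin'): 6}
```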


πŸ”¬ Benchmarks (Qualitative)

| Task | Result Summary |
|------|----------------|
| Log Structure Consistency | High |
| Field Coherence | High |
| Scenario Diversity | Medium–High |
| Detection Rule Compatibility | High |

Note: Benchmarks are qualitative and based on domain inspection. No automated scoring metrics are published.


🧾 Dataset Card (Embedded)

Dataset Description

  • Type: Synthetic text-based SIEM logs
  • Sources: Authentication, network, endpoint-style events
  • Sensitive Data: None (fully synthetic)

Dataset Usage

  • SIEM testing and validation
  • Detection engineering
  • Cybersecurity research and education

Dataset Limitations

  • May not reflect organization-specific schemas
  • Rare attack patterns may be underrepresented

πŸ‹οΈ Training Details

Exact training datasets, preprocessing steps, and hyperparameters have not been publicly disclosed.
The model is assumed to be fine-tuned on curated or synthetic SIEM-style log text.


🌱 Environmental Impact

Training-related carbon emissions were not recorded.

Environmental impact can be estimated using:
Lacoste et al. (2019), Quantifying the Carbon Emissions of Machine Learning
https://arxiv.org/abs/1910.09700


βš™οΈ Technical Specifications

  • Architecture: Transformer-based causal language model
  • Objective: Synthetic SIEM log generation
  • Software: Python, PyTorch, Hugging Face Transformers
  • Hardware: Not publicly documented

πŸ“š Citation

@misc{siem_multisource_log_generator,
  title={SIEM Multisource Log Generator},
  author={Adarsh Ranjan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/adarsh-aur/siem-multisource-log-generator}
}

πŸ‘€ Model Card Author

Adarsh Ranjan


πŸ’¬ Contact

For questions, feedback, or contributions, please use the Hugging Face model repository discussion page.
