library_name: transformers
license: mit
base_model:
  - mistralai/Mistral-7B-Instruct-v0.3

🛡️ SIEM Multisource Log Generator


🚀 Why This Model?

Security teams need large volumes of realistic SIEM data to build, test, and validate detections, but real logs are often sensitive, restricted, or unavailable.

The SIEM Multisource Log Generator produces high-quality synthetic security logs that resemble real-world telemetry across multiple sources, enabling safer experimentation, faster iteration, and better analyst training without touching production data.


📌 Model Summary

The SIEM Multisource Log Generator is a transformer-based language model fine-tuned to generate synthetic Security Information and Event Management (SIEM) logs. It is designed for cybersecurity research, detection engineering, SOC training, and SIEM validation workflows in non-production environments.


🧠 Model Details

  • Developed by: Adarsh Ranjan
  • Model type: Transformer-based causal language model
  • Base model: mistralai/Mistral-7B-Instruct-v0.3
  • Language: English
  • License: MIT
  • Framework: 🤗 Hugging Face Transformers
  • Model format: Safetensors

📄 Associated Papers

This model is fine-tuned from Mistral 7B and is grounded in the following foundational research:

These papers describe the base architecture and training philosophy underlying the model.


📖 What Does It Generate?

The model produces structured and semi-structured SIEM-style logs, including signals from:

  • πŸ” Authentication and identity systems
  • 🌐 Firewalls and network devices
  • πŸ’» Endpoint and host-based agents

All outputs are fully synthetic and safe for testing and research.


🎯 Intended Use

✅ Direct Use

  • Synthetic SIEM log generation
  • Detection rule and alert testing
  • Security analytics experimentation
  • SOC analyst training and simulations

🔁 Downstream Use

  • Fine-tuning for organization-specific log formats
  • Integration into SIEM test or staging environments

🚫 Out-of-Scope Use

  • Production ingestion of real security logs
  • Automated security decisions without human oversight
  • Real-world attack execution or facilitation

⚠️ Bias, Risks, and Limitations

  • Synthetic logs may not fully capture real attacker behavior
  • Rare or advanced attack techniques may be underrepresented
  • Benchmarks are qualitative and task-oriented
  • Outputs should be reviewed by cybersecurity professionals

🚀 Getting Started

📦 Installation

pip install transformers safetensors torch

🧪 Basic Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "adarsh-aur/siem-multisource-log-generator"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Generate SIEM logs for a suspicious login scenario.\n"
    "Include timestamp, source IP, username, host, and outcome."
)

inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens caps only the generated text; max_length would also count the prompt tokens.
outputs = model.generate(**inputs, max_new_tokens=256)

# The decoded text contains the prompt followed by the generated logs.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🧠 Chain-of-Thought–Safe Prompting (Recommended)

To remain policy-safe and improve output quality, avoid asking for reasoning or explanations.
Instead, request structured outputs directly.

✅ Preferred

Generate SIEM logs showing a brute-force login attempt.
Return only the logs in JSON format.

❌ Avoid

Explain step by step how an attacker performs a brute-force attack.

The model is optimized for output generation, not procedural reasoning.
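
When a prompt asks for JSON only, the decoded text can still contain the echoed prompt or a final object cut off by the token limit. The sketch below filters the generated text down to lines that parse as JSON objects. It assumes the model emits one JSON object per line, which this card does not document; `extract_json_logs` and the sample text are hypothetical and only illustrate the idea.

```python
import json


def extract_json_logs(generated_text):
    """Return every line of generated_text that parses as a JSON object."""
    logs = []
    for line in generated_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip prompt echo, blank lines, and other non-JSON text
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed objects
        if isinstance(record, dict):
            logs.append(record)
    return logs


# Hypothetical model output (assumption), for illustration only.
sample = (
    "Generate SIEM logs showing a brute-force login attempt.\n"
    '{"user": "alice", "src_ip": "203.0.113.7", "action": "failure"}\n'
    '{"user": "alice", "src_ip": "203.0.113.7", "action": "fail'
)
print(extract_json_logs(sample))
```

Dropping unparseable lines, not raising on them, keeps a single truncated record from discarding an otherwise usable batch.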


🧩 Prompt Templates

πŸ” Authentication Anomaly

Generate SIEM logs for multiple failed login attempts followed by a success.
Include timestamp, username, source IP, host, and result.

🌐 Firewall Activity

Generate firewall logs showing blocked outbound traffic to malicious IPs.
Include rule_id, destination_ip, port, protocol, and action.

💻 Endpoint Detection

Generate endpoint logs for suspicious PowerShell execution.
Include process_name, command_line, parent_process, and severity.
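
The three templates above can be kept in one place and combined with the output-only instruction from the prompting guidance. This is a minimal sketch, not part of the model's API: `PROMPT_TEMPLATES`, its keys, and `build_prompt` are hypothetical names chosen for illustration.

```python
# Template text is copied from this card; the dictionary keys are assumptions.
PROMPT_TEMPLATES = {
    "authentication_anomaly": (
        "Generate SIEM logs for multiple failed login attempts followed by a success.\n"
        "Include timestamp, username, source IP, host, and result."
    ),
    "firewall_activity": (
        "Generate firewall logs showing blocked outbound traffic to malicious IPs.\n"
        "Include rule_id, destination_ip, port, protocol, and action."
    ),
    "endpoint_detection": (
        "Generate endpoint logs for suspicious PowerShell execution.\n"
        "Include process_name, command_line, parent_process, and severity."
    ),
}


def build_prompt(scenario, output_format="JSON"):
    """Look up a template by name and append an output-only instruction."""
    template = PROMPT_TEMPLATES[scenario]  # raises KeyError for unknown scenarios
    return f"{template}\nReturn only the logs in {output_format} format."


print(build_prompt("firewall_activity"))
```

The resulting string can be passed to the tokenizer in place of the hand-written prompt in the basic usage example.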

📈 Detection Rule Examples

Example: Brute Force Detection (Pseudo-SPL)

index=auth_logs action=failure
| stats count by src_ip, user
| where count > 5

Example: Suspicious PowerShell Execution

index=endpoint_logs process_name="powershell.exe"
| search command_line="*EncodedCommand*"

These rules can be validated using synthetic logs generated by this model.
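
To check a rule without a SIEM instance, the same logic can be applied to parsed logs in plain Python. The sketch below mirrors the two pseudo-SPL rules above, assuming each log has already been parsed into a dictionary that uses the field names from those rules (`src_ip`, `user`, `action`, `process_name`, `command_line`). The function names and sample events are hypothetical, and the case-insensitive matching is an assumption.

```python
from collections import Counter


def brute_force_sources(events, threshold=5):
    """Count failed logins per (src_ip, user) and keep pairs with count > threshold."""
    failures = Counter(
        (e.get("src_ip"), e.get("user"))
        for e in events
        if e.get("action") == "failure"
    )
    return {pair: count for pair, count in failures.items() if count > threshold}


def encoded_powershell(events):
    """Return powershell.exe events whose command line contains 'EncodedCommand'."""
    return [
        e for e in events
        if e.get("process_name", "").lower() == "powershell.exe"
        and "encodedcommand" in e.get("command_line", "").lower()
    ]


# Hypothetical parsed logs (assumption), for illustration only.
auth_events = (
    [{"src_ip": "198.51.100.23", "user": "bob", "action": "failure"}] * 6
    + [{"src_ip": "192.0.2.10", "user": "carol", "action": "failure"}] * 2
    + [{"src_ip": "198.51.100.23", "user": "bob", "action": "success"}]
)
endpoint_events = [
    {"process_name": "powershell.exe", "command_line": "powershell.exe -EncodedCommand <base64>"},
    {"process_name": "powershell.exe", "command_line": "powershell.exe Get-Date"},
    {"process_name": "cmd.exe", "command_line": "cmd.exe /c dir"},
]

print(brute_force_sources(auth_events))
print(len(encoded_powershell(endpoint_events)))
```

A generated scenario that is meant to trigger a rule but returns no matches here points to either a gap in the rule or a mismatch between the generated field names and the ones the rule expects.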


🔬 Benchmarks (Qualitative)

| Task | Result Summary |
|---|---|
| Log Structure Consistency | High |
| Field Coherence | High |
| Scenario Diversity | Medium–High |
| Detection Rule Compatibility | High |

Note: Benchmarks are qualitative and based on domain inspection. No automated scoring metrics are published.
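
Since no automated metrics are published, readers who want a rough numeric check of structure consistency could compute one themselves. The sketch below is one possible measure, not the method behind the ratings above: it reports the fraction of parsed logs that contain every requested field. `field_coverage`, the field names, and the sample logs are all hypothetical.

```python
def field_coverage(logs, required_fields):
    """Fraction of logs that contain every required field with a non-empty value."""
    if not logs:
        return 0.0
    complete = sum(
        1
        for log in logs
        if all(log.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(logs)


# Hypothetical parsed logs (assumption); the last one is missing "host".
required = ["timestamp", "src_ip", "username", "host", "outcome"]
sample_logs = [
    {"timestamp": "t1", "src_ip": "203.0.113.7", "username": "alice", "host": "h1", "outcome": "failure"},
    {"timestamp": "t2", "src_ip": "203.0.113.7", "username": "alice", "host": "h1", "outcome": "failure"},
    {"timestamp": "t3", "src_ip": "203.0.113.7", "username": "alice", "host": "h1", "outcome": "success"},
    {"timestamp": "t4", "src_ip": "203.0.113.7", "username": "alice", "outcome": "success"},
]
print(field_coverage(sample_logs, required))
```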


🧾 Dataset Card (Embedded)

Dataset Description

  • Type: Synthetic text-based SIEM logs
  • Sources: Authentication, network, endpoint-style events
  • Sensitive Data: None (fully synthetic)

Dataset Usage

  • SIEM testing and validation
  • Detection engineering
  • Cybersecurity research and education

Dataset Limitations

  • May not reflect organization-specific schemas
  • Rare attack patterns may be underrepresented

πŸ‹οΈ Training Details

Exact training datasets, preprocessing steps, and hyperparameters have not been publicly disclosed.
The model is assumed to be fine-tuned on curated or synthetic SIEM-style log text.


🌱 Environmental Impact

Training-related carbon emissions were not recorded.

Environmental impact can be estimated using:
Lacoste et al. (2019), Quantifying the Carbon Emissions of Machine Learning
https://arxiv.org/abs/1910.09700
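
As a rough illustration of that kind of estimate, emissions are commonly approximated as hardware power draw times runtime times the carbon intensity of the local grid. The sketch below applies that approximation. The function name, the optional `pue` data-center overhead factor, and all input values are hypothetical; none of them describe how this model was actually trained.

```python
def estimate_emissions_kg(power_kw, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Approximate emissions in kg CO2-eq: energy used times grid carbon intensity."""
    energy_kwh = power_kw * hours * pue  # pue scales for data-center overhead
    return energy_kwh * carbon_intensity_kg_per_kwh


# Hypothetical inputs (assumption): one 0.3 kW accelerator, 10 hours, 0.4 kg CO2-eq per kWh.
estimate = estimate_emissions_kg(power_kw=0.3, hours=10, carbon_intensity_kg_per_kwh=0.4)
print(f"{estimate:.2f} kg CO2-eq")
```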


⚙️ Technical Specifications

  • Architecture: Transformer-based causal language model
  • Objective: Synthetic SIEM log generation
  • Software: Python, PyTorch, Hugging Face Transformers
  • Hardware: Not publicly documented

📚 Citation

@misc{siem_multisource_log_generator,
  title={SIEM Multisource Log Generator},
  author={Adarsh Ranjan},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/adarsh-aur/siem-multisource-log-generator}
}

👤 Model Card Author

Adarsh Ranjan


💬 Contact

For questions, feedback, or contributions, please use the Hugging Face model repository discussion page.