---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# SIEM Log Generator - Mistral 7B QLoRA
A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. The model was trained with QLoRA (LoRA adapters over a 4-bit-quantized base model) on multiple cybersecurity log sources to understand and generate security-related event data.
## Model Description
This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping
### Training Data Sources
The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events
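The card does not show the exact serialized training format; as an illustrative sketch only (field names and completions below are assumed, modeled on the key=value prompts in the usage examples further down), each event could be rendered as a prompt/completion pair and wrapped in the Mistral-Instruct template:

```python
# Illustrative (assumed) training-pair format: a key=value event string as the
# prompt, with the desired analysis/labels as the completion.
samples = [
    {
        "prompt": "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce",
        "completion": "Brute-force authentication attempt against user 'admin'; maps to MITRE ATT&CK T1110.",
    },
    {
        "prompt": "event=network attack=DDoS",
        "completion": "Volumetric denial-of-service traffic; maps to MITRE ATT&CK T1499.",
    },
]

def to_mistral_instruct(sample):
    """Wrap a pair in the Mistral-Instruct [INST] template used at inference."""
    return f"[INST] {sample['prompt']} [/INST] {sample['completion']}"
```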
### MITRE ATT&CK Coverage
The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Discovery (formerly Network Service Scanning)
- T1110: Brute Force
- T1110.004: Credential Stuffing
- T1078.004: Valid Accounts: Cloud Accounts
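For post-processing model output, the covered techniques can be kept in a small lookup table. A minimal sketch (technique names follow current MITRE ATT&CK naming; the helper is illustrative, not part of this repo):

```python
# MITRE ATT&CK techniques covered by the model
MITRE_TECHNIQUES = {
    "T1499": "Endpoint Denial of Service",
    "T1046": "Network Service Discovery",
    "T1110": "Brute Force",
    "T1110.004": "Credential Stuffing",
    "T1078.004": "Valid Accounts: Cloud Accounts",
}

def lookup_technique(technique_id):
    """Return the human-readable name for a technique ID, if covered."""
    return MITRE_TECHNIQUES.get(technique_id, "not covered by this model")
```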
## Training Details
### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, v_proj
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50
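The configuration above corresponds roughly to the following `peft`/`transformers` setup. This is a sketch, not the exact training script; the argument names come from the public `peft` and `bitsandbytes` APIs:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 precision, per the card
)

# LoRA adapters matching the hyperparameters listed above
lora_config = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```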
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes
## Usage
### Installation
```bash
pip install transformers peft torch bitsandbytes accelerate
```
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit NF4 quantization
# (the bare load_in_4bit=True kwarg is deprecated in recent transformers)
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis.
# No literal "<s>" in the prompt: the tokenizer adds the BOS token automatically.
prompt = "[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Inference Example
```python
# Analyze a security event (the tokenizer adds the BOS token automatically)
event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
prompt = f"[INST] {event} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Use Cases
### 1. Security Event Classification
Classify incoming logs into attack types or benign traffic.
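A minimal post-processing sketch for this use case, assuming the generated text echoes key=value fields as in the inference example above (`extract_attack_label` is a hypothetical helper, not part of this repo):

```python
import re

def extract_attack_label(generated_text):
    """Pull an attack=<Label> field out of generated key=value text;
    treat output with no attack field as benign."""
    match = re.search(r"attack=(\w+)", generated_text)
    return match.group(1) if match else "Benign"

print(extract_attack_label("event=auth user=admin attack=BruteForce"))  # BruteForce
print(extract_attack_label("event=network src=10.0.0.5 dst=10.0.0.9"))  # Benign
```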
### 2. MITRE ATT&CK Mapping
Automatically map security events to MITRE ATT&CK framework techniques.
### 3. Log Enrichment
Generate additional context and metadata for security events.
### 4. Threat Intelligence
Analyze patterns and generate threat reports from log data.
### 5. Training Data Generation
Create synthetic security logs for testing SIEM systems.
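A sketch of use case 5: building randomized prompts that ask the model to emit synthetic logs. The attack/event vocabulary below is illustrative, drawn from the examples in this card:

```python
import random

# Illustrative attack types and their event categories (from the card's examples)
ATTACKS = ["DDoS", "PortScan", "BruteForce", "CredentialStuffing"]
EVENT_TYPES = {"DDoS": "network", "PortScan": "network",
               "BruteForce": "auth", "CredentialStuffing": "auth"}

def synthetic_log_prompt(seed=None):
    """Build an [INST] prompt asking the model for a synthetic SIEM log."""
    rng = random.Random(seed)
    attack = rng.choice(ATTACKS)
    return f"[INST] event={EVENT_TYPES[attack]} attack={attack} [/INST]"

# Feed these prompts through model.generate() as in the inference example
prompts = [synthetic_log_prompt(seed=i) for i in range(4)]
```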
## Limitations
- **Training Data**: Trained on only ~500 samples, for demonstration purposes
- **Domain Specific**: Optimized for SIEM/security logs, not general purpose
- **Language**: English only
- **Real-time**: Not optimized for ultra-low latency applications
- **Accuracy**: Should be used as an assistive tool, not sole decision-maker
## Ethical Considerations
⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use in conjunction with human security experts
## Model Card Authors
Created by the SIEM Research Team
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{siem-log-generator-2025,
author = {Your Name},
title = {SIEM Log Generator - Mistral 7B QLoRA},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```
## License
This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
## Acknowledgments
- **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
- **CICIDS2017** dataset contributors
- **Hugging Face** for the model hosting platform
- **QLoRA paper** authors for the efficient fine-tuning method
## Contact
For questions or issues, please open an issue on the model repository.
---
**Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.