	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-Instruct-v0.2
	tags:
	- cybersecurity
	- siem
	- log-analysis
	- threat-detection
	- qlora
	- peft
	- mistral
	- fine-tuned
	datasets:
	- custom
	language:
	- en
	pipeline_tag: text-generation
	library_name: transformers
	---

	# SIEM Log Generator - Mistral 7B QLoRA

	A Mistral-7B-Instruct-v0.2 model fine-tuned for Security Information and Event Management (SIEM) log analysis and generation. It was trained with QLoRA (LoRA adapters on a 4-bit quantized base model) on several cybersecurity log sources to interpret and generate security event data.

	## Model Description

	This model is a variant of Mistral-7B-Instruct-v0.2 fine-tuned for SIEM tasks, including:
	- Network traffic analysis (DDoS detection, port scanning)
	- Authentication event monitoring (credential stuffing, brute force)
	- Cloud security events (AWS CloudTrail analysis)
	- System log interpretation
	- MITRE ATT&CK framework mapping

	### Training Data Sources

	The model was trained on roughly 500 events drawn from four log sources:
	- Network Logs: CICIDS2017 dataset (DDoS, PortScan patterns)
	- Authentication Logs: Risk-based authentication events
	- System Logs: Linux/Unix syslog events
	- Cloud Logs: AWS CloudTrail security events

	### MITRE ATT&CK Coverage

	The model recognizes and maps events to MITRE ATT&CK techniques:
	- T1499: Endpoint Denial of Service (DDoS)
	- T1046: Network Service Discovery (port scanning)
	- T1110: Brute Force
	- T1110.004: Credential Stuffing
	- T1078.004: Valid Accounts: Cloud Accounts
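
As a rough illustration of how the technique list above could be used downstream, the sketch below maps attack labels to technique IDs. The `DDoS`, `PortScan`, and `BruteForce` labels appear elsewhere in this card; `CredentialStuffing` and `CloudAccountAccess`, along with both function names, are hypothetical and chosen only for this example.

```python
# Hypothetical label-to-technique lookup built from the list above.
# The label spellings are assumptions, not a documented output format.
ATTACK_TO_TECHNIQUE = {
    "DDoS": "T1499",
    "PortScan": "T1046",
    "BruteForce": "T1110",
    "CredentialStuffing": "T1110.004",
    "CloudAccountAccess": "T1078.004",
}


def technique_for(attack_label):
    """Return the ATT&CK technique ID for a label, or None if unmapped."""
    return ATTACK_TO_TECHNIQUE.get(attack_label)


def parent_technique(technique_id):
    """Strip a sub-technique suffix, e.g. 'T1110.004' becomes 'T1110'."""
    return technique_id.split(".", 1)[0]
```

A lookup like this can serve as a fixed reference when checking the technique IDs the model emits.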

	## Training Details

	### Training Configuration
	- Base Model: mistralai/Mistral-7B-Instruct-v0.2
	- Method: QLoRA (4-bit quantization with LoRA adapters)
	- LoRA Rank: 8
	- LoRA Alpha: 16
	- Target Modules: q_proj, v_proj
	- Training Samples: ~500 diverse security events
	- Batch Size: 8
	- Learning Rate: 5e-4
	- Precision: bfloat16
	- Training Steps: 50
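
To put these settings in perspective, the following back-of-the-envelope sketch estimates the adapter size and the amount of data seen during training. The layer dimensions are the published Mistral-7B values rather than anything stated in this card, and the sample count assumes the batch size of 8 is the effective batch size (no gradient accumulation).

```python
# Published Mistral-7B dimensions (an assumption; not taken from this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8   # grouped-query attention: v_proj is narrower than q_proj
HEAD_DIM = 128

# Values from the training configuration above.
LORA_RANK = 8
LORA_ALPHA = 16
BATCH_SIZE = 8
TRAINING_STEPS = 50


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, LORA_RANK)
trainable = NUM_LAYERS * (q_proj + v_proj)

scaling = LORA_ALPHA / LORA_RANK
samples_seen = BATCH_SIZE * TRAINING_STEPS  # assumes no gradient accumulation

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"samples seen in training: {samples_seen}")
```

Under these assumptions the adapters hold about 3.4 million trainable parameters, the LoRA update is scaled by 2.0, and the 50 steps cover 400 samples, which is less than one full pass over the ~500 training events.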

	### Hardware
	- GPU: NVIDIA Tesla T4 (16GB VRAM)
	- Platform: Kaggle Notebooks
	- Training Time: ~5-10 minutes

	## Usage

	### Installation

	```bash
	pip install transformers peft torch bitsandbytes accelerate
	```

	### Loading the Model

	```python
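	from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
	from peft import PeftModel
	import torch

	# Load the base model with 4-bit quantization.
	# Passing load_in_4bit directly to from_pretrained is deprecated in recent
	# transformers releases; BitsAndBytesConfig is the supported route.
	base_model = "mistralai/Mistral-7B-Instruct-v0.2"
	bnb_config = BitsAndBytesConfig(load_in_4bit=True)
	model = AutoModelForCausalLM.from_pretrained(
	    base_model,
	    quantization_config=bnb_config,
	    device_map="auto",
	)

	# Load the LoRA adapters and the tokenizer from the adapter repository
	model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
	tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

	# Generate security event analysis.
	# The Mistral tokenizer prepends the <s> BOS token by default, so the
	# prompt omits it to avoid a doubled BOS.
	prompt = "[INST] event=network attack=DDoS [/INST]"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))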
	```

	### Inference Example

	```python
	# Analyze a security event.
	# The Mistral tokenizer prepends <s> by default, so the prompt omits it.
	event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
	prompt = f"[INST] {event} [/INST]"

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=150,
	        temperature=0.7,
	        top_p=0.9,
	        do_sample=True,
	    )

	# The decoded string contains the prompt followed by the generated text
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
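
The examples above build the prompt by hand and print the full decoded sequence, prompt included. A minimal sketch of two helpers that could wrap those steps follows. Both `build_prompt` and `extract_response` are hypothetical names, BOS handling is left to the tokenizer, and the extraction assumes the decoded string still contains the `[/INST]` marker.

```python
def build_prompt(fields):
    """Format key=value pairs into the [INST] ... [/INST] prompt shape."""
    event = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[INST] {event} [/INST]"


def extract_response(decoded):
    """Return only the text after the last [/INST] marker.

    Falls back to the whole string when the marker is absent.
    """
    _, found, tail = decoded.rpartition("[/INST]")
    return tail.strip() if found else decoded.strip()
```

For example, `build_prompt({"event": "network", "attack": "DDoS"})` yields `[INST] event=network attack=DDoS [/INST]`, and passing the decoded generation through `extract_response` leaves only the model's continuation.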

	## Use Cases

	### 1. Security Event Classification
	Classify incoming logs into attack types or benign traffic.
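
One way to read a label back out of a key=value event line is sketched below. It assumes lines follow the space-separated `key=value` shape used in the prompts in this card; the function names and the `benign` default are hypothetical.

```python
def parse_event(line):
    """Split a space-separated key=value log line into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that carry no '='
            fields[key] = value
    return fields


def attack_label(fields):
    """Return the attack field, or 'benign' when none is present (assumed default)."""
    return fields.get("attack", "benign")
```

Values containing spaces would need a quoting convention that this sketch does not handle.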

	### 2. MITRE ATT&CK Mapping
	Automatically map security events to MITRE ATT&CK framework techniques.

	### 3. Log Enrichment
	Generate additional context and metadata for security events.

	### 4. Threat Intelligence
	Analyze patterns and generate threat reports from log data.

	### 5. Training Data Generation
	Create synthetic security logs for testing SIEM systems.
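
As a sketch of how seed events for synthetic generation might be produced, the helper below emits reproducible key=value lines that could then be passed to the model as prompts. The event and attack vocabulary is taken from labels used in this card; the pairing of events to attacks, the timestamp scheme, and the function name are assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

# Assumed pairing of event types to attack labels, for illustration only.
EVENT_ATTACKS = {
    "network": ["DDoS", "PortScan"],
    "auth": ["BruteForce", "CredentialStuffing"],
}


def synthetic_events(count, seed=0):
    """Return `count` reproducible key=value event lines, one second apart."""
    rng = random.Random(seed)  # local RNG so results depend only on the seed
    start = datetime(2024, 1, 14, 10, 30, tzinfo=timezone.utc)
    lines = []
    for i in range(count):
        event = rng.choice(sorted(EVENT_ATTACKS))
        attack = rng.choice(EVENT_ATTACKS[event])
        stamp = (start + timedelta(seconds=i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        lines.append(f"timestamp={stamp} event={event} attack={attack}")
    return lines
```

Fixing the seed keeps test fixtures stable across runs, which helps when comparing SIEM rule behavior before and after a change.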

	## Limitations

	- Training Data: Model trained on limited samples (~500) for demonstration
	- Domain Specific: Optimized for SIEM/security logs, not general purpose
	- Language: English only
	- Real-time: Not optimized for ultra-low latency applications
	- Accuracy: Should be used as an assistive tool, not sole decision-maker

	## Ethical Considerations

	⚠️ Important Security Notice:
	- This model is for defensive cybersecurity purposes only
	- Do not use for malicious activities or unauthorized access
	- Always comply with applicable laws and regulations
	- Validate all model outputs before taking action
	- Use in conjunction with human security experts
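
One simple form of output validation is to check that any ATT&CK technique ID in generated text belongs to the set this card documents. The sketch below does that with a regular expression; the function name is hypothetical, and an ID outside the set is not necessarily wrong, only outside what the model was trained to map.

```python
import re

# Technique IDs listed under "MITRE ATT&CK Coverage" in this card.
KNOWN_TECHNIQUES = {"T1499", "T1046", "T1110", "T1110.004", "T1078.004"}

# Matches IDs such as T1110 or T1110.004.
TECHNIQUE_PATTERN = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def check_techniques(text):
    """Split technique IDs found in text into (known, unknown) sorted lists."""
    found = set(TECHNIQUE_PATTERN.findall(text))
    known = sorted(found & KNOWN_TECHNIQUES)
    unknown = sorted(found - KNOWN_TECHNIQUES)
    return known, unknown
```

Outputs with a non-empty `unknown` list can be routed to an analyst for review instead of being acted on automatically.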

	## Model Card Authors

	Created by the SIEM Research Team

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{siem-log-generator-2025,
	author = {Your Name},
	title = {SIEM Log Generator - Mistral 7B QLoRA},
	year = {2025},
	publisher = {Hugging Face},
	url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
	}
	```

	## License

	This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.

	## Acknowledgments

	- Mistral AI for the base Mistral-7B-Instruct-v0.2 model
	- CICIDS2017 dataset contributors
	- Hugging Face for the model hosting platform
	- QLoRA paper authors for the efficient fine-tuning method

	## Contact

	For questions or issues, please open an issue on the model repository.

	---

	Note: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.