Commit 11644c2 (verified) by sohomn · Parent: 07ebf36

Upload SIEM Log Generator - QLoRA fine-tuned Mistral-7B
README.md ADDED
@@ -0,0 +1,194 @@
---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# SIEM Log Generator - Mistral 7B QLoRA

A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. This model has been trained using QLoRA (4-bit quantization) on multiple cybersecurity log sources to understand and generate security-related event data.

## Model Description

This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping

### Training Data Sources

The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events

### MITRE ATT&CK Coverage

The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Scanning
- T1110: Brute Force
- T1110.004: Credential Stuffing
- T1078.004: Cloud Account Access

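The mapping above is small enough to keep as a static lookup table alongside the model. A minimal sketch — the attack-label strings are illustrative assumptions matching the key=value log format used later in this card, not part of the model itself:

```python
# Lookup from attack labels (as they appear in the key=value log format
# used in this card) to MITRE ATT&CK technique IDs. The label strings are
# illustrative assumptions, not an API defined by the model.
MITRE_MAP = {
    "DDoS": "T1499",                # Endpoint Denial of Service
    "PortScan": "T1046",            # Network Service Scanning
    "BruteForce": "T1110",          # Brute Force
    "CredentialStuffing": "T1110.004",
    "CloudAccountAccess": "T1078.004",
}

def map_to_mitre(attack_label: str) -> str:
    """Return the ATT&CK technique ID for a label, or 'unknown'."""
    return MITRE_MAP.get(attack_label, "unknown")
```

Such a table is useful for validating the model's ATT&CK mappings, since the model's own output should be checked before it is trusted (see Limitations below).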
## Training Details

### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 32
- **LoRA Alpha**: 64
- **LoRA Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj (per the shipped `adapter_config.json`)
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50

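For reference, the hyperparameters above correspond to a `peft` `LoraConfig` along these lines — a sketch using the values stored in the shipped `adapter_config.json`; the quantized base-model loading and training loop are omitted:

```python
# Sketch of the LoRA adapter configuration, using the hyperparameters
# found in the shipped adapter_config.json (r=32, alpha=64, dropout=0.05,
# all four attention projections). Not the full training script.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```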
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes

## Usage

### Installation

```bash
pip install transformers peft torch bitsandbytes accelerate
```

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit NF4 quantization
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapters and tokenizer
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis. The tokenizer prepends <s> itself
# (add_bos_token is true), so the prompt should not include it.
prompt = "[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

103
+
104
+ ### Inference Example
105
+
106
+ ```python
107
+ # Analyze a security event
108
+ event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
109
+ prompt = f"<s>[INST] {event} [/INST]"
110
+
111
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
112
+ with torch.no_grad():
113
+ outputs = model.generate(
114
+ **inputs,
115
+ max_new_tokens=150,
116
+ temperature=0.7,
117
+ top_p=0.9,
118
+ do_sample=True
119
+ )
120
+
121
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
122
+ print(response)
123
+ ```
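The key=value event format shown above is easy to parse before or after calling the model. A minimal sketch, assuming values contain no spaces (which holds for the examples in this card):

```python
def parse_event(line: str) -> dict:
    """Parse a space-separated key=value log line into a dict.
    Assumes values contain no spaces, as in the examples in this card."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)

event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
parsed = parse_event(event)
# e.g. parsed["attack"] is "BruteForce" and parsed["user"] is "admin"
```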
124
+
125
+ ## Use Cases
126
+
127
+ ### 1. Security Event Classification
128
+ Classify incoming logs into attack types or benign traffic.
129
+
130
+ ### 2. MITRE ATT&CK Mapping
131
+ Automatically map security events to MITRE ATT&CK framework techniques.
132
+
133
+ ### 3. Log Enrichment
134
+ Generate additional context and metadata for security events.
135
+
136
+ ### 4. Threat Intelligence
137
+ Analyze patterns and generate threat reports from log data.
138
+
139
+ ### 5. Training Data Generation
140
+ Create synthetic security logs for testing SIEM systems.
141
+
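For use case 5, synthetic logs in the same key=value format can also be generated without the model at all, e.g. to smoke-test a SIEM ingestion pipeline. A minimal sketch — all field values here are illustrative assumptions:

```python
import random
from datetime import datetime, timedelta, timezone

def synth_auth_logs(n: int, seed: int = 0) -> list[str]:
    """Generate n synthetic authentication events in the key=value format
    used in this card. Field values are illustrative, not real data."""
    rng = random.Random(seed)  # seeded for reproducibility
    start = datetime(2024, 1, 14, tzinfo=timezone.utc)
    users = ["admin", "alice", "bob"]
    attacks = ["BruteForce", "CredentialStuffing", "none"]
    logs = []
    for i in range(n):
        ts = (start + timedelta(seconds=30 * i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        logs.append(
            f"timestamp={ts} event=auth user={rng.choice(users)} "
            f"attack={rng.choice(attacks)}"
        )
    return logs

logs = synth_auth_logs(3)
```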
## Limitations

- **Training Data**: Trained on a limited sample (~500 events) for demonstration purposes
- **Domain Specific**: Optimized for SIEM/security logs, not general-purpose use
- **Language**: English only
- **Real-time**: Not optimized for ultra-low-latency applications
- **Accuracy**: Should be used as an assistive tool, not the sole decision-maker

## Ethical Considerations

⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use it for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use it in conjunction with human security experts

## Model Card Authors

Created by the SIEM Research Team

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{siem-log-generator-2025,
  author = {Your Name},
  title = {SIEM Log Generator - Mistral 7B QLoRA},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```

176
+
177
+ ## License
178
+
179
+ This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
180
+
181
+ ## Acknowledgments
182
+
183
+ - **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
184
+ - **CICIDS2017** dataset contributors
185
+ - **Hugging Face** for the model hosting platform
186
+ - **QLoRA paper** authors for the efficient fine-tuning method
187
+
188
+ ## Contact
189
+
190
+ For questions or issues, please open an issue on the model repository.
191
+
192
+ ---
193
+
194
+ **Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.
adapter_config.json ADDED
@@ -0,0 +1,43 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.1",
  "qalora_group_size": 16,
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "o_proj",
    "k_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
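Two fields in this config determine the adapter's behaviour at inference time: the update added to each of the four attention projections is scaled by `lora_alpha / r`. A small sketch reading the relevant fields (config abridged to those fields):

```python
import json

# Abridged copy of the adapter_config.json above, keeping only the fields
# that determine the adapter's size, placement, and scaling.
config = json.loads("""{
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": ["v_proj", "o_proj", "k_proj", "q_proj"],
    "task_type": "CAUSAL_LM"
}""")

# Effective scaling factor applied to each LoRA update at inference.
scaling = config["lora_alpha"] / config["r"]  # 64 / 32 = 2.0
```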
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e59459571e16b42f81d4c445dbd900f2db9465537a21b3f6e6d763a11d4c7ae
size 54560624
chat_template.jinja ADDED
@@ -0,0 +1,24 @@
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif %}
    {%- if message['role'] == 'user' %}
        {%- if loop.first and system_message is defined %}
            {{- ' [INST] ' + system_message + '\n\n' + message['content'] + ' [/INST]' }}
        {%- else %}
            {{- ' [INST] ' + message['content'] + ' [/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}
    {%- endif %}
{%- endfor %}
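The template above can be mirrored in plain Python to see exactly what string reaches the model. This is a sketch re-implementing the Jinja logic by hand (not the `transformers` chat-template API); the bos/eos tokens match special_tokens_map.json:

```python
def render_chat(messages, bos_token="<s>", eos_token="</s>"):
    """Pure-Python rendering of the Mistral chat template above."""
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]
    else:
        system = None
    out = bos_token
    for i, msg in enumerate(messages):
        # Roles must alternate user/assistant after the optional system turn
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("roles must alternate user/assistant")
        if msg["role"] == "user":
            if i == 0 and system is not None:
                # The system message is folded into the first user turn
                out += f" [INST] {system}\n\n{msg['content']} [/INST]"
            else:
                out += f" [INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            out += f" {msg['content']}{eos_token}"
        else:
            raise ValueError("only user and assistant roles are supported")
    return out
```

For example, a single user turn renders as `<s> [INST] … [/INST]`, which is why the usage examples in the README should not add `<s>` by hand.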
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}