Commit 11644c2 (verified) by sohomn · Parent: 07ebf36

Upload SIEM Log Generator - QLoRA fine-tuned Mistral-7B
README.md ADDED
@@ -0,0 +1,194 @@
---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- cybersecurity
- siem
- log-analysis
- threat-detection
- qlora
- peft
- mistral
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# SIEM Log Generator - Mistral 7B QLoRA

A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. This model has been trained using QLoRA (4-bit quantization) on multiple cybersecurity log sources to understand and generate security-related event data.

## Model Description

This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:
- Network traffic analysis (DDoS detection, port scanning)
- Authentication event monitoring (credential stuffing, brute force)
- Cloud security events (AWS CloudTrail analysis)
- System log interpretation
- MITRE ATT&CK framework mapping

### Training Data Sources

The model was trained on a diverse set of security logs:
- **Network Logs**: CICIDS2017 dataset (DDoS, PortScan patterns)
- **Authentication Logs**: Risk-based authentication events
- **System Logs**: Linux/Unix syslog events
- **Cloud Logs**: AWS CloudTrail security events

### MITRE ATT&CK Coverage

The model recognizes and maps events to MITRE ATT&CK techniques:
- T1499: Endpoint Denial of Service (DDoS)
- T1046: Network Service Scanning
- T1110: Brute Force
- T1110.004: Credential Stuffing
- T1078.004: Cloud Account Access

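The mapping above is small enough to keep as a static lookup table alongside the model. A minimal sketch — the attack-label strings are illustrative assumptions matching the key=value log format used later in this card, not part of the model itself:

```python
# Lookup from attack labels (as they appear in the key=value log format
# used in this card) to MITRE ATT&CK technique IDs. The label strings are
# illustrative assumptions, not an API defined by the model.
MITRE_MAP = {
    "DDoS": "T1499",                # Endpoint Denial of Service
    "PortScan": "T1046",            # Network Service Scanning
    "BruteForce": "T1110",          # Brute Force
    "CredentialStuffing": "T1110.004",
    "CloudAccountAccess": "T1078.004",
}

def map_to_mitre(attack_label: str) -> str:
    """Return the ATT&CK technique ID for a label, or 'unknown'."""
    return MITRE_MAP.get(attack_label, "unknown")
```

Such a table is useful for validating the model's ATT&CK mappings, since the model's own output should be checked before it is trusted (see Limitations below).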
## Training Details

### Training Configuration
- **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
- **Method**: QLoRA (4-bit quantization with LoRA adapters)
- **LoRA Rank**: 32
- **LoRA Alpha**: 64
- **LoRA Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj (per the shipped `adapter_config.json`)
- **Training Samples**: ~500 diverse security events
- **Batch Size**: 8
- **Learning Rate**: 5e-4
- **Precision**: bfloat16
- **Training Steps**: 50

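For reference, the hyperparameters above correspond to a `peft` `LoraConfig` along these lines — a sketch using the values stored in the shipped `adapter_config.json`; the quantized base-model loading and training loop are omitted:

```python
# Sketch of the LoRA adapter configuration, using the hyperparameters
# found in the shipped adapter_config.json (r=32, alpha=64, dropout=0.05,
# all four attention projections). Not the full training script.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```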
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB VRAM)
- **Platform**: Kaggle Notebooks
- **Training Time**: ~5-10 minutes

## Usage

### Installation

```bash
pip install transformers peft torch bitsandbytes accelerate
```

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit NF4 quantization
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapters and tokenizer
model = PeftModel.from_pretrained(model, "your-username/siem-log-generator-mistral-7b-qlora")
tokenizer = AutoTokenizer.from_pretrained("your-username/siem-log-generator-mistral-7b-qlora")

# Generate a security event analysis. The tokenizer prepends <s> itself
# (add_bos_token is true), so the prompt should not include it.
prompt = "[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

103
+
104
+ ### Inference Example
105
+
106
+ ```python
107
+ # Analyze a security event
108
+ event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
109
+ prompt = f"<s>[INST] {event} [/INST]"
110
+
111
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
112
+ with torch.no_grad():
113
+ outputs = model.generate(
114
+ **inputs,
115
+ max_new_tokens=150,
116
+ temperature=0.7,
117
+ top_p=0.9,
118
+ do_sample=True
119
+ )
120
+
121
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
122
+ print(response)
123
+ ```
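The key=value event format shown above is easy to parse before or after calling the model. A minimal sketch, assuming values contain no spaces (which holds for the examples in this card):

```python
def parse_event(line: str) -> dict:
    """Parse a space-separated key=value log line into a dict.
    Assumes values contain no spaces, as in the examples in this card."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)

event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
parsed = parse_event(event)
# e.g. parsed["attack"] is "BruteForce" and parsed["user"] is "admin"
```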
124
+
125
+ ## Use Cases
126
+
127
+ ### 1. Security Event Classification
128
+ Classify incoming logs into attack types or benign traffic.
129
+
130
+ ### 2. MITRE ATT&CK Mapping
131
+ Automatically map security events to MITRE ATT&CK framework techniques.
132
+
133
+ ### 3. Log Enrichment
134
+ Generate additional context and metadata for security events.
135
+
136
+ ### 4. Threat Intelligence
137
+ Analyze patterns and generate threat reports from log data.
138
+
139
+ ### 5. Training Data Generation
140
+ Create synthetic security logs for testing SIEM systems.
141
+
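For use case 5, synthetic logs in the same key=value format can also be generated without the model at all, e.g. to smoke-test a SIEM ingestion pipeline. A minimal sketch — all field values here are illustrative assumptions:

```python
import random
from datetime import datetime, timedelta, timezone

def synth_auth_logs(n: int, seed: int = 0) -> list[str]:
    """Generate n synthetic authentication events in the key=value format
    used in this card. Field values are illustrative, not real data."""
    rng = random.Random(seed)  # seeded for reproducibility
    start = datetime(2024, 1, 14, tzinfo=timezone.utc)
    users = ["admin", "alice", "bob"]
    attacks = ["BruteForce", "CredentialStuffing", "none"]
    logs = []
    for i in range(n):
        ts = (start + timedelta(seconds=30 * i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        logs.append(
            f"timestamp={ts} event=auth user={rng.choice(users)} "
            f"attack={rng.choice(attacks)}"
        )
    return logs

logs = synth_auth_logs(3)
```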
## Limitations

- **Training Data**: Trained on a limited sample (~500 events) for demonstration purposes
- **Domain Specific**: Optimized for SIEM/security logs, not general-purpose use
- **Language**: English only
- **Real-time**: Not optimized for ultra-low-latency applications
- **Accuracy**: Should be used as an assistive tool, not the sole decision-maker

## Ethical Considerations

⚠️ **Important Security Notice**:
- This model is for **defensive cybersecurity purposes only**
- Do not use it for malicious activities or unauthorized access
- Always comply with applicable laws and regulations
- Validate all model outputs before taking action
- Use it in conjunction with human security experts

## Model Card Authors

Created by the SIEM Research Team

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{siem-log-generator-2025,
  author = {Your Name},
  title = {SIEM Log Generator - Mistral 7B QLoRA},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}
```

176
+
177
+ ## License
178
+
179
+ This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.
180
+
181
+ ## Acknowledgments
182
+
183
+ - **Mistral AI** for the base Mistral-7B-Instruct-v0.2 model
184
+ - **CICIDS2017** dataset contributors
185
+ - **Hugging Face** for the model hosting platform
186
+ - **QLoRA paper** authors for the efficient fine-tuning method
187
+
188
+ ## Contact
189
+
190
+ For questions or issues, please open an issue on the model repository.
191
+
192
+ ---
193
+
194
+ **Note**: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.
adapter_config.json ADDED
@@ -0,0 +1,43 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.1",
  "qalora_group_size": 16,
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "o_proj",
    "k_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
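Two fields in this config determine the adapter's behaviour at inference time: the update added to each of the four attention projections is scaled by `lora_alpha / r`. A small sketch reading the relevant fields (config abridged to those fields):

```python
import json

# Abridged copy of the adapter_config.json above, keeping only the fields
# that determine the adapter's size, placement, and scaling.
config = json.loads("""{
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": ["v_proj", "o_proj", "k_proj", "q_proj"],
    "task_type": "CAUSAL_LM"
}""")

# Effective scaling factor applied to each LoRA update at inference.
scaling = config["lora_alpha"] / config["r"]  # 64 / 32 = 2.0
```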
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e59459571e16b42f81d4c445dbd900f2db9465537a21b3f6e6d763a11d4c7ae
size 54560624
chat_template.jinja ADDED
@@ -0,0 +1,24 @@
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif %}
    {%- if message['role'] == 'user' %}
        {%- if loop.first and system_message is defined %}
            {{- ' [INST] ' + system_message + '\n\n' + message['content'] + ' [/INST]' }}
        {%- else %}
            {{- ' [INST] ' + message['content'] + ' [/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}
    {%- endif %}
{%- endfor %}
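The template above can be mirrored in plain Python to see exactly what string reaches the model. This is a sketch re-implementing the Jinja logic by hand (not the `transformers` chat-template API); the bos/eos tokens match special_tokens_map.json:

```python
def render_chat(messages, bos_token="<s>", eos_token="</s>"):
    """Pure-Python rendering of the Mistral chat template above."""
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]
    else:
        system = None
    out = bos_token
    for i, msg in enumerate(messages):
        # Roles must alternate user/assistant after the optional system turn
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("roles must alternate user/assistant")
        if msg["role"] == "user":
            if i == 0 and system is not None:
                # The system message is folded into the first user turn
                out += f" [INST] {system}\n\n{msg['content']} [/INST]"
            else:
                out += f" [INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            out += f" {msg['content']}{eos_token}"
        else:
            raise ValueError("only user and assistant roles are supported")
    return out
```

For example, a single user turn renders as `<s> [INST] … [/INST]`, which is why the usage examples in the README should not add `<s>` by hand.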
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}