HassanShehata committed on
Commit b362ce4 · verified · 1 Parent(s): f644e61

Upload 10 files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
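The added rule routes `tokenizer.json` to Git LFS alongside the existing wildcard patterns. As a rough sketch of how these glob rules select files (using Python's `fnmatch` as a stand-in; real gitattributes matching has extra path semantics, so this is an approximation, not git's implementation):

```python
from fnmatch import fnmatch

# LFS patterns from .gitattributes after this commit (filename globs only;
# fnmatch approximates gitattributes matching for bare filenames).
lfs_patterns = ["*.zip", "*.zst", "*tfevents*", "tokenizer.json"]

def uses_lfs(filename: str) -> bool:
    # A file goes through LFS if any pattern matches it.
    return any(fnmatch(filename, p) for p in lfs_patterns)

print(uses_lfs("tokenizer.json"))         # True: covered by the new rule
print(uses_lfs("tokenizer_config.json"))  # False: stays a normal git blob
```

This explains why `tokenizer.json` appears below as an LFS pointer while `tokenizer_config.json` is committed inline.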
README.md CHANGED
@@ -1,252 +1,210 @@
  ---
- license: apache-2.0
- base_model:
- - Qwen/Qwen3-0.6B
  pipeline_tag: text-generation
  tags:
- - cybersecurity
- - siem
- - log-analysis
- - field-extraction
- - security-automation
- - fine-tuned
  ---

- # LLMSIEM/logem

- LLMSIEM/logem is a specialized language model fine-tuned for Security Information and Event Management (SIEM) tasks, particularly excelling at structured field extraction from security logs and events.

  ## Model Details

  ### Model Description

- LLMSIEM/logem is a fine-tuned version of Qwen3-0.6B, specifically optimized for cybersecurity applications. The model demonstrates that targeted fine-tuning can dramatically improve performance on domain-specific tasks, achieving superior results compared to much larger general-purpose models.

- - **Developed by:** [Your Name/Organization]
- - **Model type:** Causal Language Model (Fine-tuned)
- - **Language(s):** English
- - **License:** Apache 2.0
- - **Finetuned from model:** Qwen/Qwen3-0.6B
- - **Model size:** 1.2 GB (FP16), 396 MB (Q4_K_M quantized)
- - **Parameters:** 0.6B

- ### Model Sources

- - **Repository:** [Your GitHub Repository]
- - **Paper:** [Research Paper Link if available]
- - **Blog Post:** [LinkedIn/Blog Series Link]

- ## Performance Highlights

- 🏆 **Best-in-class performance** for SIEM field extraction tasks:
- - **66.7% perfect matches** (FP16 version)
- - **0.833 F1 score** - outperforms 12B parameter models
- - **1.00s average response time** - 3x faster than larger alternatives
- - **Zero complete failures** on standardized test suite

  ## Uses

- ### Direct Use

- The model is designed for cybersecurity professionals and SIEM engineers who need to:
- - Extract structured fields from security logs
- - Parse and normalize security event data
- - Automate log analysis workflows
- - Generate structured outputs from unstructured security data

- ### Example Use Cases

- ```python
- # Example: Extract fields from a security log
- input_text = "Extract fields from: Failed login attempt from 192.168.1.100 for user admin at 2024-01-15T10:30:45Z"

- # Model will output structured JSON with relevant fields:
- # {
- #   "event_type": "failed_login",
- #   "source_ip": "192.168.1.100",
- #   "username": "admin",
- #   "timestamp": "2024-01-15T10:30:45Z"
- # }
- ```

- ### Downstream Use

- - Integration into SIEM platforms (Splunk, ELK, QRadar)
- - Security orchestration and automated response (SOAR) workflows
- - Threat hunting and incident response automation
- - Security data lake processing pipelines

  ### Out-of-Scope Use

- - General-purpose text generation
- - Non-security related field extraction
- - Real-time processing without proper input validation
- - Decision-making for critical security responses without human oversight

  ## Bias, Risks, and Limitations

- ### Technical Limitations
- - Optimized specifically for security log formats seen during training
- - May struggle with completely novel log formats or schemas
- - Performance may degrade on logs with unusual encoding or formatting
- - Quantized version (Q4_K_M) shows 5% accuracy reduction vs FP16

- ### Security Considerations
- - Model outputs should be validated before use in automated security workflows
- - Not suitable for real-time critical security decisions without human oversight
- - Training data may contain biases from specific security environments
- - Should not be the sole source of truth for security incident classification

  ### Recommendations

- - Always validate model outputs in production security environments
- - Implement fallback mechanisms for handling novel or malformed inputs
- - Regular retraining recommended as new log formats emerge
- - Use FP16 version for maximum accuracy, Q4_K_M for resource-constrained deployments

- ## How to Get Started with the Model
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch

- # Load the model and tokenizer
- tokenizer = AutoTokenizer.from_pretrained("LLMSIEM/logem")
- model = AutoModelForCausalLM.from_pretrained("LLMSIEM/logem")

- # Example usage
- prompt = "Extract security fields from the following log: [your log here]"
- inputs = tokenizer(prompt, return_tensors="pt")

- with torch.no_grad():
-     outputs = model.generate(
-         inputs.input_ids,
-         max_length=512,
-         temperature=0.1,
-         do_sample=False,
-         pad_token_id=tokenizer.eos_token_id
-     )

- result = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(result)
- ```

- ### Using with Ollama (Recommended for Production)

- ```bash
- # Pull the quantized version
- ollama pull LLMSIEM/logem

- # Run inference
- ollama run LLMSIEM/logem "Extract fields from: SSH login from 10.0.0.5 by root"
- ```

- ## Training Details

- ### Training Data

- The model was fine-tuned on a curated dataset of security logs and corresponding structured field extractions, including:
- - Network security events (firewall, IDS/IPS)
- - Authentication logs (successful/failed logins)
- - System security events (file access, process execution)
- - Application security logs (web servers, databases)

- Dataset characteristics:
- - 21 standardized test cases for evaluation
- - Diverse log formats and security event types
- - JSON-formatted target outputs for structured field extraction

- ### Training Procedure

  #### Training Hyperparameters

- - **Base model:** Qwen3-0.6B
- - **Training regime:** Mixed precision (fp16)
- - **Fine-tuning approach:** Supervised fine-tuning on field extraction tasks
- - **Optimization:** Task-specific training for SIEM applications

- #### Model Variants

- - **FP16 Version:** 1.2 GB, maximum accuracy (0.833 F1)
- - **Q4_K_M Quantized:** 396 MB, production-optimized (0.800 F1)

  ## Evaluation

  ### Testing Data, Factors & Metrics

  #### Testing Data
- - 21 standardized security log parsing test cases
- - Diverse log formats from multiple security tools
- - Ground truth structured outputs for comparison

  #### Metrics
- - **Perfect Match Rate:** Percentage of test cases with 100% accurate field extraction
- - **F1 Score:** Harmonic mean of precision and recall for field detection
- - **Precision:** Accuracy of extracted fields
- - **Response Time:** Average inference latency

  ### Results

- | Model | Perfect Matches | Avg F1 | Precision | Speed | Size |
- |-------|----------------|--------|-----------|-------|------|
- | **LLMSIEM/logem (FP16)** | **14/21 (66.7%)** | **0.833** | **0.848** | **1.00s** | **1.2 GB** |
- | LLMSIEM/logem (Q4_K_M) | 13/21 (61.9%) | 0.800 | 0.819 | 1.00s | 396 MB |
- | Gemma:12B | 15/21 (71.4%) | 0.790 | 0.788 | 3.06s | 5.0 GB |
- | Qwen3:0.6B (base) | 9/21 (42.9%) | 0.651 | 0.636 | 1.57s | 522 MB |

- #### Key Findings
- - **+28% F1 improvement** over base Qwen3-0.6B model
- - **Outperforms 12B models** in F1 score despite being 20x smaller
- - **3x faster** than comparable accuracy models
- - **12.6x smaller** than Gemma while maintaining superior performance

  ## Environmental Impact

- Training a specialized 0.6B parameter model requires significantly less computational resources compared to training larger models from scratch:

- - **Hardware Type:** NVIDIA GPU (specific details TBD)
- - **Training approach:** Fine-tuning (more efficient than training from scratch)
- - **Base model efficiency:** Starting from pre-trained Qwen3-0.6B reduces carbon footprint
- - **Production efficiency:** Smaller model size reduces inference energy consumption

- ## Technical Specifications

- ### Model Architecture
- - **Architecture:** Transformer decoder (Qwen3 family)
- - **Parameters:** 0.6 billion
- - **Context length:** [Inherited from Qwen3-0.6B]
- - **Vocabulary size:** [Inherited from Qwen3-0.6B]

  ### Compute Infrastructure
- - **Training:** Fine-tuning on security-specific datasets
- - **Inference:** Optimized for CPU and GPU deployment
- - **Quantization:** GGML Q4_K_M for edge deployment

- ## Citation

- If you use this model in your research or applications, please cite:

- ```bibtex
- @misc{llmsiem-logem-2024,
-   title={LLMSIEM/logem: A Fine-tuned Language Model for Security Log Analysis},
-   author={[Your Name]},
-   year={2024},
-   url={https://huggingface.co/LLMSIEM/logem},
-   note={Fine-tuned from Qwen3-0.6B for SIEM applications}
- }
- ```

- ## Model Card Authors

- [Your Name/Organization]

- ## Model Card Contact

- For questions about this model, please contact:
- - **Email:** [your-email@domain.com]
- - **LinkedIn:** [Your LinkedIn Profile]
- - **GitHub:** [Your GitHub Profile]

- ---

- *This model is part of the LLMSIEM research series exploring the application of Large Language Models in cybersecurity and SIEM workflows.*
  ---
+ base_model: unsloth/qwen3-0.6b-bnb-4bit
+ library_name: peft
  pipeline_tag: text-generation
  tags:
+ - base_model:adapter:unsloth/qwen3-0.6b-bnb-4bit
+ - lora
+ - sft
+ - transformers
+ - trl
+ - unsloth
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

  ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

  ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ [More Information Needed]

+ ### Downstream Use [optional]

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ [More Information Needed]

  ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]

  ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ [More Information Needed]

  ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model.

+ [More Information Needed]

+ ## Training Details

+ ### Training Data

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ [More Information Needed]

+ ### Training Procedure

+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing [optional]

+ [More Information Needed]

  #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]

  ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->
+
  ### Testing Data, Factors & Metrics

  #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]

  #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]

  ### Results

+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->

+ [More Information Needed]

  ## Environmental Impact

+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]

+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]

  ### Compute Infrastructure

+ [More Information Needed]

+ #### Hardware

+ [More Information Needed]

+ #### Software

+ [More Information Needed]

+ ## Citation [optional]

+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions

+ - PEFT 0.17.0
adapter_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "unsloth/qwen3-0.6b-bnb-4bit",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "up_proj",
+     "v_proj",
+     "o_proj",
+     "gate_proj",
+     "q_proj",
+     "k_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
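This config describes a rank-16 LoRA adapter over every attention and MLP projection. A back-of-the-envelope size check is possible: each targeted weight of shape (in, out) contributes r·(in + out) trainable parameters via its A and B matrices. The Qwen3-0.6B dimensions below are assumptions taken from the base model's published config, not from this commit:

```python
# Rough LoRA size check for this adapter (r = 16, all attention + MLP
# projections targeted, per adapter_config.json). Assumed Qwen3-0.6B dims:
r = 16
hidden, n_layers = 1024, 28
head_dim, n_q_heads, n_kv_heads = 128, 16, 8
mlp = 3072

shapes = {  # (d_in, d_out) of each targeted projection
    "q_proj": (hidden, n_q_heads * head_dim),
    "k_proj": (hidden, n_kv_heads * head_dim),
    "v_proj": (hidden, n_kv_heads * head_dim),
    "o_proj": (n_q_heads * head_dim, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

# A is (r x d_in), B is (d_out x r): r * (d_in + d_out) params per matrix.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * n_layers
print(total)      # ~10.1M trainable parameters
print(total * 4)  # ~40.4 MB if stored in fp32
```

Under these assumptions the estimate (~40.4 MB in fp32) lines up with the 40,422,168-byte `adapter_model.safetensors` added below.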
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b40ed4316708bff050f2cc842c43fc21962c03ed62257e500bf4ea7aaa1a53f6
+ size 40422168
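What the repo actually stores here is a Git LFS pointer file, not the weights; the blob is fetched from LFS storage by its oid. The three-line key/value format is trivial to parse:

```python
# Minimal parser for the Git LFS pointer format committed above
# (key and value separated by a single space, one pair per line).
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b40ed4316708bff050f2cc842c43fc21962c03ed62257e500bf4ea7aaa1a53f6
size 40422168
"""

fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)
print(algo, int(fields["size"]))  # sha256 40422168
```

The `size` field is the byte count of the real file, which is how the ~40 MB adapter size is known without downloading it.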
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
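These entries sit above the regular BPE vocabulary, so the chat markers and reasoning tags never collide with ordinary text tokens. A quick sketch of that layout (the 151,643 base-vocab size is an assumption consistent with `<|endoftext|>` being the first added ID; only a subset of the map is shown):

```python
# Subset of added_tokens.json: chat delimiters and reasoning tags.
added = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<think>": 151667,
    "</think>": 151668,
}

BASE_VOCAB = 151643  # assumed size of the base BPE vocab (IDs 0..151642)

# Every added token lands in the extended range, outside normal text IDs.
assert all(token_id >= BASE_VOCAB for token_id in added.values())
print(added["<|im_end|>"] - added["<|im_start|>"])  # 1: adjacent paired delimiters
```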
chat_template.jinja ADDED
@@ -0,0 +1,99 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0].role == 'system' %}
+         {{- messages[0].content + '\n\n' }}
+     {%- endif %}
+     {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0].role == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for forward_message in messages %}
+     {%- set index = (messages|length - 1) - loop.index0 %}
+     {%- set message = messages[index] %}
+     {%- set current_content = message.content if message.content is defined and message.content is not none else '' %}
+     {%- set tool_start = '<tool_response>' %}
+     {%- set tool_start_length = tool_start|length %}
+     {%- set start_of_message = current_content[:tool_start_length] %}
+     {%- set tool_end = '</tool_response>' %}
+     {%- set tool_end_length = tool_end|length %}
+     {%- set start_pos = (current_content|length) - tool_end_length %}
+     {%- if start_pos < 0 %}
+         {%- set start_pos = 0 %}
+     {%- endif %}
+     {%- set end_of_message = current_content[start_pos:] %}
+     {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
+         {%- set ns.multi_step_tool = false %}
+         {%- set ns.last_query_index = index %}
+     {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {%- set m_content = message.content if message.content is defined and message.content is not none else '' %}
+         {%- set content = m_content %}
+         {%- set reasoning_content = '' %}
+         {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
+             {%- set reasoning_content = message.reasoning_content %}
+         {%- else %}
+             {%- if '</think>' in m_content %}
+                 {%- set content = (m_content.split('</think>')|last).lstrip('\n') %}
+                 {%- set reasoning_content = (m_content.split('</think>')|first).rstrip('\n') %}
+                 {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
+             {%- endif %}
+         {%- endif %}
+         {%- if loop.index0 > ns.last_query_index %}
+             {%- if loop.last or (not loop.last and (not reasoning_content.strip() == '')) %}
+                 {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+             {%- else %}
+                 {{- '<|im_start|>' + message.role + '\n' + content }}
+             {%- endif %}
+         {%- else %}
+             {{- '<|im_start|>' + message.role + '\n' + content }}
+         {%- endif %}
+         {%- if message.tool_calls %}
+             {%- for tool_call in message.tool_calls %}
+                 {%- if (loop.first and content) or (not loop.first) %}
+                     {{- '\n' }}
+                 {%- endif %}
+                 {%- if tool_call.function %}
+                     {%- set tool_call = tool_call.function %}
+                 {%- endif %}
+                 {{- '<tool_call>\n{"name": "' }}
+                 {{- tool_call.name }}
+                 {{- '", "arguments": ' }}
+                 {%- if tool_call.arguments is string %}
+                     {{- tool_call.arguments }}
+                 {%- else %}
+                     {{- tool_call.arguments | tojson }}
+                 {%- endif %}
+                 {{- '}\n</tool_call>' }}
+             {%- endfor %}
+         {%- endif %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+     {%- if enable_thinking is defined and enable_thinking is false %}
+         {{- '<think>\n\n</think>\n\n' }}
+     {%- endif %}
+ {%- endif %}
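For the common path (no tools, no stored reasoning content), the template above reduces to ChatML formatting: each message wrapped in `<|im_start|>role … <|im_end|>`, with an assistant header appended when a generation prompt is requested. A simplified Python sketch of just that path (the real Jinja template additionally handles tool calls, tool responses, and `<think>` splitting):

```python
# Simplified sketch of the plain ChatML path of the chat template;
# not a full reimplementation of the Jinja logic above.
def render_chatml(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

print(render_chatml([
    {"role": "system", "content": "You are a log parser."},
    {"role": "user", "content": "Extract fields from: SSH login from 10.0.0.5"},
]))
```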
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|vision_pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,240 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 40960,
+   "pad_token": "<|vision_pad|>",
+   "padding_side": "right",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
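Two details in this config are worth noting: padding uses `<|vision_pad|>` rather than the EOS token, so pad positions can never be mistaken for end-of-turn markers during batched generation, and `model_max_length` is 40960 tokens. A small sanity check over a fragment of the config (the JSON below is a subset copied from the file, not the full config):

```python
import json

# Subset of tokenizer_config.json relevant to padding and context length.
config = json.loads("""{
  "eos_token": "<|im_end|>",
  "pad_token": "<|vision_pad|>",
  "padding_side": "right",
  "model_max_length": 40960
}""")

# Distinct pad and EOS tokens keep padding unambiguous in batched decoding.
assert config["pad_token"] != config["eos_token"]
print(config["model_max_length"])  # 40960
```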
vocab.json ADDED
The diff for this file is too large to render. See raw diff