ykhrustalev committed
Commit 0948ce9 · verified · 1 parent: 298152f

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ onnx/model.onnx_data filter=lfs diff=lfs merge=lfs -text
+ onnx/model_fp16.onnx_data filter=lfs diff=lfs merge=lfs -text
+ onnx/model_q4.onnx_data filter=lfs diff=lfs merge=lfs -text
+ onnx/model_q4f16.onnx_data filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,257 @@
+ ---
+ license: other
+ license_name: lfm1.0
+ license_link: LICENSE
+ language:
+ - en
+ - ja
+ - ko
+ - fr
+ - es
+ - de
+ - it
+ - pt
+ - ar
+ - zh
+ pipeline_tag: text-generation
+ tags:
+ - liquid
+ - edge
+ - lfm2.5
+ - onnx
+ - onnxruntime
+ - webgpu
+ base_model:
+ - LiquidAI/LFM2.5-350M
+ ---
+
+ ![Liquid AI](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png)
+
+ [**Try LFM**](https://playground.liquid.ai/) • [**Documentation**](https://docs.liquid.ai/lfm) • [**LEAP**](https://leap.liquid.ai/) • [**Blog**](https://www.liquid.ai/blog/)
+
+ ## LFM2.5-350M-ONNX
+
+ ONNX export of [LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) for cross-platform inference.
+
+ ## Variants
+
+ | Variant | Size | Description |
+ |---------|------|-------------|
+ | FP32 | ~1.4GB | All weights in FP32 |
+ | FP16 | ~692MB | All weights in FP16 |
+ | Q4 | ~298MB | INT4 MatMul weights (including lm_head); FP16 embeddings and norms |
+ | Q4F16 | ~298MB | INT4 MatMul weights; FP16 lm_head, embeddings, and norms |
+
+ Q4 and Q4F16 use symmetric block-wise quantization (block_size=32) via MatMulNBits.
+ The difference is that Q4 also quantizes the lm_head projection to INT4, while Q4F16
+ keeps it in FP16. Both store the token embedding (Gather) and normalization weights in FP16.
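+
+ For reference, a minimal sketch of how a Q4-style file can be produced with
+ onnxruntime's `MatMul4BitsQuantizer`. The actual export script is not part of this
+ repo, so the options below are assumptions matching the description above:
+
+ ```python
+ from onnx import load_model
+ from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer
+
+ model = load_model("onnx/model.onnx")
+ quant = MatMul4BitsQuantizer(
+     model,
+     block_size=32,      # block-wise quantization, block_size=32 as described above
+     is_symmetric=True,  # symmetric INT4 via MatMulNBits
+ )
+ quant.process()
+ # External data keeps the quantized weights in a separate .onnx_data file
+ quant.model.save_model_to_file("model_q4_custom.onnx", use_external_data_format=True)
+ ```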
+
+ ## Model Files
+
+ ```
+ onnx/
+ ├── model.onnx        # FP32
+ ├── model_fp16.onnx   # FP16
+ ├── model_q4.onnx     # Q4
+ └── model_q4f16.onnx  # Q4F16
+ ```
+
+ ## Python
+
+ ### Installation
+
+ ```bash
+ pip install onnxruntime transformers numpy huggingface_hub
+ # or with GPU support:
+ pip install onnxruntime-gpu transformers numpy huggingface_hub
+ ```
+
+ ### Inference
+
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+ from huggingface_hub import hf_hub_download
+ from transformers import AutoTokenizer
+
+ # Download model and external weights (they must live side by side)
+ model_id = "LiquidAI/LFM2.5-350M-ONNX"
+ model_path = hf_hub_download(model_id, "onnx/model_q4.onnx")
+ data_path = hf_hub_download(model_id, "onnx/model_q4.onnx_data")  # cached next to the .onnx
+
+ # Load model and tokenizer
+ session = ort.InferenceSession(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+
+ # Prepare chat input
+ messages = [{"role": "user", "content": "What is the capital of France?"}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ input_ids = np.array([tokenizer.encode(prompt, add_special_tokens=False)], dtype=np.int64)
+
+ # Initialize KV cache (dynamic sequence dims start at length 0)
+ ONNX_DTYPE = {"tensor(float)": np.float32, "tensor(float16)": np.float16, "tensor(int64)": np.int64}
+ cache = {}
+ for inp in session.get_inputs():
+     if inp.name in {"input_ids", "attention_mask", "position_ids"}:
+         continue
+     shape = [d if isinstance(d, int) else 1 for d in inp.shape]
+     for i, d in enumerate(inp.shape):
+         if isinstance(d, str) and "sequence" in d.lower():
+             shape[i] = 0
+     cache[inp.name] = np.zeros(shape, dtype=ONNX_DTYPE.get(inp.type, np.float32))
+
+ # Check if model uses position_ids
+ input_names = {inp.name for inp in session.get_inputs()}
+ use_position_ids = "position_ids" in input_names
+
+ # Generate tokens
+ seq_len = input_ids.shape[1]
+ generated_tokens = []
+
+ for step in range(512):  # max new tokens
+     if step == 0:
+         ids = input_ids
+         pos = np.arange(seq_len, dtype=np.int64).reshape(1, -1)
+     else:
+         ids = np.array([[generated_tokens[-1]]], dtype=np.int64)
+         pos = np.array([[seq_len + len(generated_tokens) - 1]], dtype=np.int64)
+
+     attn_mask = np.ones((1, seq_len + len(generated_tokens)), dtype=np.int64)
+     feed = {"input_ids": ids, "attention_mask": attn_mask, **cache}
+     if use_position_ids:
+         feed["position_ids"] = pos
+
+     outputs = session.run(None, feed)
+     next_token = int(np.argmax(outputs[0][0, -1]))
+     generated_tokens.append(next_token)
+
+     # Update cache: present_* outputs become past_* inputs of the next step
+     for i, out in enumerate(session.get_outputs()[1:], 1):
+         name = out.name.replace("present_conv", "past_conv").replace("present.", "past_key_values.")
+         if name in cache:
+             cache[name] = outputs[i]
+
+     if next_token == tokenizer.eos_token_id:
+         break
+
+ print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
+ ```
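+
+ The loop above decodes greedily with `argmax`. As a sketch, the token-selection
+ line can be swapped for temperature sampling; the `sample` helper below is
+ illustrative, not part of this repo:
+
+ ```python
+ import numpy as np
+
+ def sample(logits: np.ndarray, temperature: float = 0.8) -> int:
+     # Softmax over temperature-scaled logits, then draw one token id
+     scaled = logits.astype(np.float32) / temperature
+     probs = np.exp(scaled - scaled.max())
+     probs /= probs.sum()
+     return int(np.random.choice(len(probs), p=probs))
+
+ # In the generation loop, replace:
+ #     next_token = int(np.argmax(outputs[0][0, -1]))
+ # with:
+ #     next_token = sample(outputs[0][0, -1])
+ ```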
+
+ ## WebGPU (Browser)
+
+ ### Installation
+
+ ```bash
+ npm install onnxruntime-web @huggingface/transformers
+ ```
+
+ ### Enable WebGPU
+
+ WebGPU is required for browser inference. If it is not already enabled in your browser:
+
+ 1. **Chrome/Edge**: Navigate to `chrome://flags/#enable-unsafe-webgpu`, enable, and restart
+ 2. **Verify**: Check `chrome://gpu` for "WebGPU" status
+ 3. **Test**: Run `navigator.gpu.requestAdapter()` in the DevTools console
+
+ ### Inference
+
+ ```javascript
+ import * as ort from "onnxruntime-web/webgpu";
+ import { AutoTokenizer } from "@huggingface/transformers";
+
+ // Check WebGPU availability
+ if (!navigator.gpu) {
+   throw new Error("WebGPU not available. Enable at chrome://flags/#enable-unsafe-webgpu");
+ }
+ const adapter = await navigator.gpu.requestAdapter();
+ if (!adapter) {
+   throw new Error("WebGPU adapter not found. Check chrome://gpu for status.");
+ }
+
+ // Single-threaded WASM avoids the cross-origin-isolation requirement of multithreading
+ ort.env.wasm.numThreads = 1;
+
+ const modelId = "LiquidAI/LFM2.5-350M-ONNX";
+ const modelBase = `https://huggingface.co/${modelId}/resolve/main`;
+
+ // Load tokenizer
+ const tokenizer = await AutoTokenizer.from_pretrained(modelId);
+
+ // Load ONNX session with external data
+ const onnxPath = `${modelBase}/onnx/model_q4.onnx`;
+ const dataPath = `${modelBase}/onnx/model_q4.onnx_data`;
+ const session = await ort.InferenceSession.create(onnxPath, {
+   executionProviders: ["webgpu"],
+   externalData: [{ path: "model_q4.onnx_data", data: dataPath }],
+ });
+
+ // Model config (from config.json)
+ const hiddenSize = 1024;
+ const numKVHeads = 8;
+ const headDim = 64;
+
+ // Initialize KV cache: conv layers carry a fixed [1, hidden, 3] state,
+ // attention layers start with an empty [1, kvHeads, 0, headDim] cache
+ function initCache() {
+   const cache = {};
+   for (const name of session.inputNames) {
+     if (name.startsWith("past_conv")) {
+       cache[name] = new ort.Tensor("float32", new Float32Array(hiddenSize * 3), [1, hiddenSize, 3]);
+     } else if (name.startsWith("past_key_values")) {
+       cache[name] = new ort.Tensor("float32", new Float32Array(0), [1, numKVHeads, 0, headDim]);
+     }
+   }
+   return cache;
+ }
+
+ // Update cache from outputs
+ function updateCache(cache, outputs) {
+   for (const [name, tensor] of Object.entries(outputs)) {
+     if (name.startsWith("present_conv")) {
+       cache[name.replace("present_conv", "past_conv")] = tensor;
+     } else if (name.startsWith("present.")) {
+       cache[name.replace("present.", "past_key_values.")] = tensor;
+     }
+   }
+ }
+
+ // Build prompt and tokenize (the template already adds BOS, so skip special tokens)
+ const messages = [{ role: "user", content: "What is the capital of France?" }];
+ const prompt = tokenizer.apply_chat_template(messages, { add_generation_prompt: true, tokenize: false });
+ const inputIds = tokenizer.encode(prompt, { add_special_tokens: false });
+
+ // Generation loop
+ const cache = initCache();
+ const eosTokenId = tokenizer.eos_token_id;
+ const generatedTokens = [];
+ let curLen = inputIds.length;
+ let ids = inputIds;
+
+ for (let step = 0; step < 512; step++) {
+   const inputIdsTensor = new ort.Tensor("int64", new BigInt64Array(ids.map(BigInt)), [1, ids.length]);
+   const attentionMask = new ort.Tensor("int64", new BigInt64Array(curLen).fill(1n), [1, curLen]);
+
+   const outputs = await session.run({ input_ids: inputIdsTensor, attention_mask: attentionMask, ...cache });
+
+   // Greedy decode: manual argmax over the last token's logits
+   // (Math.max(...arr) would exceed the engine's argument limit at vocab_size 65536)
+   const logits = outputs.logits;
+   const vocabSize = logits.dims[2];
+   const lastLogits = logits.data.slice((logits.dims[1] - 1) * vocabSize);
+   let nextToken = 0;
+   for (let i = 1; i < lastLogits.length; i++) {
+     if (lastLogits[i] > lastLogits[nextToken]) nextToken = i;
+   }
+
+   generatedTokens.push(nextToken);
+   if (nextToken === eosTokenId) break;
+
+   updateCache(cache, outputs);
+   ids = [nextToken];
+   curLen++;
+ }
+
+ console.log(tokenizer.decode(generatedTokens, { skip_special_tokens: true }));
+ ```
+
+ ### WebGPU Notes
+
+ * Models use external data files (`.onnx_data`) that are loaded automatically
+ * int64 tensors require `BigInt64Array`
+
+ ## License
+
+ This model is released under the [LFM 1.0 License](LICENSE).
chat_template.jinja ADDED
@@ -0,0 +1,64 @@
+ {{- bos_token -}}
+ {%- set keep_past_thinking = keep_past_thinking | default(false) -%}
+ {%- set ns = namespace(system_prompt="") -%}
+ {%- if messages[0]["role"] == "system" -%}
+     {%- set sys_content = messages[0]["content"] -%}
+     {%- if sys_content is not string -%}
+         {%- for item in sys_content -%}
+             {%- if item["type"] == "text" -%}
+                 {%- set ns.system_prompt = ns.system_prompt + item["text"] -%}
+             {%- endif -%}
+         {%- endfor -%}
+     {%- else -%}
+         {%- set ns.system_prompt = sys_content -%}
+     {%- endif -%}
+     {%- set messages = messages[1:] -%}
+ {%- endif -%}
+ {%- if tools -%}
+     {%- set ns.system_prompt = ns.system_prompt + ("\n" if ns.system_prompt else "") + "List of tools: [" -%}
+     {%- for tool in tools -%}
+         {%- if tool is not string -%}
+             {%- set tool = tool | tojson -%}
+         {%- endif -%}
+         {%- set ns.system_prompt = ns.system_prompt + tool -%}
+         {%- if not loop.last -%}
+             {%- set ns.system_prompt = ns.system_prompt + ", " -%}
+         {%- endif -%}
+     {%- endfor -%}
+     {%- set ns.system_prompt = ns.system_prompt + "]" -%}
+ {%- endif -%}
+ {%- if ns.system_prompt -%}
+     {{- "<|im_start|>system\n" + ns.system_prompt + "<|im_end|>\n" -}}
+ {%- endif -%}
+ {%- set ns.last_assistant_index = -1 -%}
+ {%- for message in messages -%}
+     {%- if message["role"] == "assistant" -%}
+         {%- set ns.last_assistant_index = loop.index0 -%}
+     {%- endif -%}
+ {%- endfor -%}
+ {%- for message in messages -%}
+     {{- "<|im_start|>" + message["role"] + "\n" -}}
+     {%- set content = message["content"] -%}
+     {%- if content is not string -%}
+         {%- set ns.content = "" -%}
+         {%- for item in content -%}
+             {%- if item["type"] == "image" -%}
+                 {%- set ns.content = ns.content + "<image>" -%}
+             {%- elif item["type"] == "text" -%}
+                 {%- set ns.content = ns.content + item["text"] -%}
+             {%- else -%}
+                 {%- set ns.content = ns.content + item | tojson -%}
+             {%- endif -%}
+         {%- endfor -%}
+         {%- set content = ns.content -%}
+     {%- endif -%}
+     {%- if message["role"] == "assistant" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}
+         {%- if "</think>" in content -%}
+             {%- set content = content.split("</think>")[-1] | trim -%}
+         {%- endif -%}
+     {%- endif -%}
+     {{- content + "<|im_end|>\n" -}}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+     {{- "<|im_start|>assistant\n" -}}
+ {%- endif -%}
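
A minimal sketch of rendering this template outside `transformers`, with plain `jinja2` (the `bos_token` value comes from `tokenizer_config.json` below; the message content is only an example):

```python
from jinja2 import Template

template = Template(open("chat_template.jinja").read())
prompt = template.render(
    bos_token="<|startoftext|>",  # from tokenizer_config.json
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    add_generation_prompt=True,
)
print(prompt)
# <|startoftext|><|im_start|>user
# What is the capital of France?<|im_end|>
# <|im_start|>assistant
```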
config.json ADDED
@@ -0,0 +1,66 @@
+ {
+   "architectures": [
+     "Lfm2ForCausalLM"
+   ],
+   "block_auto_adjust_ff_dim": true,
+   "block_dim": 1024,
+   "block_ff_dim": 6656,
+   "block_ffn_dim_multiplier": 1.0,
+   "block_mlp_init_scale": 1.0,
+   "block_multiple_of": 256,
+   "block_norm_eps": 1e-05,
+   "block_out_init_scale": 1.0,
+   "block_use_swiglu": true,
+   "block_use_xavier_init": true,
+   "bos_token_id": 1,
+   "conv_L_cache": 3,
+   "conv_bias": false,
+   "conv_dim": 1024,
+   "conv_use_xavier_init": true,
+   "dtype": "bfloat16",
+   "eos_token_id": 7,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 6656,
+   "layer_types": [
+     "conv",
+     "conv",
+     "full_attention",
+     "conv",
+     "conv",
+     "full_attention",
+     "conv",
+     "conv",
+     "full_attention",
+     "conv",
+     "full_attention",
+     "conv",
+     "full_attention",
+     "conv",
+     "full_attention",
+     "conv"
+   ],
+   "max_position_embeddings": 128000,
+   "model_type": "lfm2",
+   "norm_eps": 1e-05,
+   "num_attention_heads": 16,
+   "num_heads": 16,
+   "num_hidden_layers": 16,
+   "num_key_value_heads": 8,
+   "pad_token_id": 0,
+   "rope_parameters": {
+     "rope_theta": 1000000.0,
+     "rope_type": "default"
+   },
+   "tie_embedding": true,
+   "transformers_version": "5.0.0.dev0",
+   "use_cache": true,
+   "use_pos_enc": true,
+   "vocab_size": 65536,
+   "transformers.js_config": {
+     "kv_cache_dtype": {
+       "fp32": "float32"
+     },
+     "use_external_data_format": true
+   }
+ }
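
The hard-coded cache constants in the README examples follow from this config. A small sketch of deriving them, with the cache shape conventions assumed as in the README's `initCache`:

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]  # 1024 // 16 = 64
num_kv_heads = cfg["num_key_value_heads"]                    # 8
conv_len = cfg["conv_L_cache"]                               # 3

for i, kind in enumerate(cfg["layer_types"]):
    if kind == "full_attention":
        # attention layers: past_key_values caches of [batch, kv_heads, past_len, head_dim]
        print(f"layer {i}: KV cache [1, {num_kv_heads}, 0, {head_dim}]")
    else:
        # conv layers: past_conv caches of [batch, conv_dim, conv_L_cache]
        print(f"layer {i}: conv cache [1, {cfg['conv_dim']}, {conv_len}]")
```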
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 7,
+   "pad_token_id": 0,
+   "transformers_version": "4.54.0"
+ }
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c43efd6be4b93b69676b8d6c9699042cd0aef76329eea1934636111103880ff
+ size 145288
onnx/model.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00ec28c8f2c6e5cbca8c5e6b15d6b51cf849a91b5f51a4f495f7fbc3dc6ca0dc
+ size 1450700800
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea5f5fa192a5c3dae08ffbcaa3c8eeeeffb417aa3ac7d9e51ae71a8c5b108f64
+ size 151040
onnx/model_fp16.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdca4e312c904c91527ad62299d4c17e5e9b699938cd1fb18c8a7d90c8468ddc
+ size 725350400
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5542e4d0c077cbe9537d78ac1ad7d03d7da79a1baa26dc3ac338fdf9059adc33
+ size 177899
onnx/model_q4.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15e418caccb304d414625da177ad412792332f54078532924c1fda5a98cbb484
+ size 312342528
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c73ea401925f1325119f1c1e24b5015c43d9ed7393283ec45dcc7ce376683e39
+ size 178589
onnx/model_q4f16.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15e418caccb304d414625da177ad412792332f54078532924c1fda5a98cbb484
+ size 312342528
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "additional_special_tokens": null,
+   "backend": "tokenizers",
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "is_local": false,
+   "legacy": false,
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 1000000000000000019884624838656,
+   "model_specific_special_tokens": {},
+   "pad_token": "<|pad|>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "TokenizersBackend",
+   "use_default_system_prompt": false,
+   "use_fast": true,
+   "chat_template": "{{- bos_token -}}\n{%- set keep_past_thinking = keep_past_thinking | default(false) -%}\n{%- set ns = namespace(system_prompt=\"\") -%}\n{%- if messages[0][\"role\"] == \"system\" -%}\n    {%- set sys_content = messages[0][\"content\"] -%}\n    {%- if sys_content is not string -%}\n        {%- for item in sys_content -%}\n            {%- if item[\"type\"] == \"text\" -%}\n                {%- set ns.system_prompt = ns.system_prompt + item[\"text\"] -%}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- else -%}\n        {%- set ns.system_prompt = sys_content -%}\n    {%- endif -%}\n    {%- set messages = messages[1:] -%}\n{%- endif -%}\n{%- if tools -%}\n    {%- set ns.system_prompt = ns.system_prompt + (\"\\n\" if ns.system_prompt else \"\") + \"List of tools: [\" -%}\n    {%- for tool in tools -%}\n        {%- if tool is not string -%}\n            {%- set tool = tool | tojson -%}\n        {%- endif -%}\n        {%- set ns.system_prompt = ns.system_prompt + tool -%}\n        {%- if not loop.last -%}\n            {%- set ns.system_prompt = ns.system_prompt + \", \" -%}\n        {%- endif -%}\n    {%- endfor -%}\n    {%- set ns.system_prompt = ns.system_prompt + \"]\" -%}\n{%- endif -%}\n{%- if ns.system_prompt -%}\n    {{- \"<|im_start|>system\\n\" + ns.system_prompt + \"<|im_end|>\\n\" -}}\n{%- endif -%}\n{%- set ns.last_assistant_index = -1 -%}\n{%- for message in messages -%}\n    {%- if message[\"role\"] == \"assistant\" -%}\n        {%- set ns.last_assistant_index = loop.index0 -%}\n    {%- endif -%}\n{%- endfor -%}\n{%- for message in messages -%}\n    {{- \"<|im_start|>\" + message[\"role\"] + \"\\n\" -}}\n    {%- set content = message[\"content\"] -%}\n    {%- if content is not string -%}\n        {%- set ns.content = \"\" -%}\n        {%- for item in content -%}\n            {%- if item[\"type\"] == \"image\" -%}\n                {%- set ns.content = ns.content + \"<image>\" -%}\n            {%- elif item[\"type\"] == \"text\" -%}\n                {%- set ns.content = ns.content + item[\"text\"] -%}\n            {%- else -%}\n                {%- set ns.content = ns.content + item | tojson -%}\n            {%- endif -%}\n        {%- endfor -%}\n        {%- set content = ns.content -%}\n    {%- endif -%}\n    {%- if message[\"role\"] == \"assistant\" and not keep_past_thinking and loop.index0 != ns.last_assistant_index -%}\n        {%- if \"</think>\" in content -%}\n            {%- set content = content.split(\"</think>\")[-1] | trim -%}\n        {%- endif -%}\n    {%- endif -%}\n    {{- content + \"<|im_end|>\\n\" -}}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{- \"<|im_start|>assistant\\n\" -}}\n{%- endif -%}"
+ }