Matjac5 committed on
Commit
76056b7
·
verified ·
1 Parent(s): a6d65b9

Upload rag SentenceTransformer

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
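
The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy inputs are hypothetical, and it assumes padding positions are marked with 0 in an attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping masked positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three 2-dimensional token vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model the vectors have 1024 dimensions (`word_embedding_dimension`), and the same averaging is applied per sequence in a batch.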
README.md ADDED
@@ -0,0 +1,394 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:268861
+ - loss:MultipleNegativesRankingLoss
+ base_model: Qwen/Qwen3-0.6B-Base
+ widget:
+ - source_sentence: how many seconds will a 450 m long train take to cross a man walking
+     with a speed of 3 km / hr in the direction of the moving train if the speed of
+     the train is 63 km / hr ?
+   sentences:
+   - ''''
+   - '['
+   - '2'
+ - source_sentence: 'A patient of CSOM has choleastatoma and presents with veigo .
+     Treatment of choice would be:'
+   sentences:
+   - A
+   - ''''
+   - ''''
+ - source_sentence: Dhoni spent 25 percent of his earning last month on rent and 10
+     percent less than what he spent on rent to purchase a new dishwasher. What percent
+     of last month's earning did Dhoni have left over?
+   sentences:
+   - C
+   - ''''
+   - '%'
+ - source_sentence: 'On the xy co-ordinate plane, point C is (5,-2) and point D is
+     (-1,2). The point on line segment CD that is twice as far from C as from D is:'
+   sentences:
+   - '1'
+   - n
+   - y
+ - source_sentence: car a runs at the speed of 35 km / hr & reaches its destination
+     in 9 hr . car b runs at the speed of 43 km / h & reaches its destination in 10
+     h . what is the respective ratio of distances covered by car a & car b ?
+   sentences:
+   - ' '
+   - R
+   - ''''
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on Qwen/Qwen3-0.6B-Base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) <!-- at revision 11214f7f3465775dcce23c3752ecea5a42ee0ddc -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: Qwen3Model
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'car a runs at the speed of 35 km / hr & reaches its destination in 9 hr . car b runs at the speed of 43 km / h & reaches its destination in 10 h . what is the respective ratio of distances covered by car a & car b ?',
+     ' ',
+     "'",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 268,861 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0                                                                         | sentence_1                                                                      |
+   |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                          |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 48.06 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 0 tokens</li><li>mean: 0.98 tokens</li><li>max: 1 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+   | <code>What is known to cause pedal Botryomycosis</code> | <code>A</code> |
+   | <code>Two friends plan to walk along a 33-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 20% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?</code> | <code>5</code> |
+   | <code>The average age of a husband and a wife is 23 years when they were married five years ago but now the average age of the husband, wife and child is 20 years(the child was born during the interval). What is the present age of the child?</code> | <code>)</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 1
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step  | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.0298 | 500   | 2.7788        |
+ | 0.0595 | 1000  | 2.5217        |
+ | 0.0893 | 1500  | 2.5004        |
+ | 0.1190 | 2000  | 2.5451        |
+ | 0.1488 | 2500  | 2.5165        |
+ | 0.1785 | 3000  | 2.5384        |
+ | 0.2083 | 3500  | 2.4994        |
+ | 0.2380 | 4000  | 0.0           |
+ | 0.2678 | 4500  | 0.0           |
+ | 0.2975 | 5000  | 0.0           |
+ | 0.3273 | 5500  | 0.0           |
+ | 0.3571 | 6000  | 0.0           |
+ | 0.3868 | 6500  | 0.0           |
+ | 0.4166 | 7000  | 0.0           |
+ | 0.4463 | 7500  | 0.0           |
+ | 0.4761 | 8000  | 0.0           |
+ | 0.5058 | 8500  | 0.0           |
+ | 0.5356 | 9000  | 0.0           |
+ | 0.5653 | 9500  | 0.0           |
+ | 0.5951 | 10000 | 0.0           |
+ | 0.6249 | 10500 | 0.0           |
+ | 0.6546 | 11000 | 0.0           |
+ | 0.6844 | 11500 | 0.0           |
+ | 0.7141 | 12000 | 0.0           |
+ | 0.7439 | 12500 | 0.0           |
+ | 0.7736 | 13000 | 0.0           |
+ | 0.8034 | 13500 | 0.0           |
+ | 0.8331 | 14000 | 0.0           |
+ | 0.8629 | 14500 | 0.0           |
+ | 0.8926 | 15000 | 0.0           |
+ | 0.9224 | 15500 | 0.0           |
+ | 0.9522 | 16000 | 0.0           |
+ | 0.9819 | 16500 | 0.0           |
+
+
+ ### Framework Versions
+ - Python: 3.11.13
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.52.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.7.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
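
The model card above records `MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim`. As a rough illustration of what that objective computes, here is a pure-Python sketch under stated assumptions: each anchor's own positive is the correct "class", the other positives in the batch act as negatives, and the loss is the mean cross-entropy over scaled cosine similarities. The helper names `cos_sim` and `mnr_loss` and the toy vectors are hypothetical, and this is not the library's code.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
well_separated = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
indistinguishable = mnr_loss(anchors, [[1.0, 1.0], [1.0, 1.0]])
print(well_separated, indistinguishable)
```

When every candidate scores the same, the loss equals the natural log of the batch size. For the batch size of 16 listed above that is ln 16 ≈ 2.77, which is close to the first logged training loss, so the early log values may correspond to near-chance ranking.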
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
chat_template.jinja ADDED
@@ -0,0 +1,85 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0].role == 'system' %}
+         {{- messages[0].content + '\n\n' }}
+     {%- endif %}
+     {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0].role == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+     {%- set index = (messages|length - 1) - loop.index0 %}
+     {%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+         {%- set ns.multi_step_tool = false %}
+         {%- set ns.last_query_index = index %}
+     {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {%- set content = message.content %}
+         {%- set reasoning_content = '' %}
+         {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
+             {%- set reasoning_content = message.reasoning_content %}
+         {%- else %}
+             {%- if '</think>' in message.content %}
+                 {%- set content = message.content.split('</think>')[-1].lstrip('\n') %}
+                 {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+             {%- endif %}
+         {%- endif %}
+         {%- if loop.index0 > ns.last_query_index %}
+             {%- if loop.last or (not loop.last and reasoning_content) %}
+                 {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+             {%- else %}
+                 {{- '<|im_start|>' + message.role + '\n' + content }}
+             {%- endif %}
+         {%- else %}
+             {{- '<|im_start|>' + message.role + '\n' + content }}
+         {%- endif %}
+         {%- if message.tool_calls %}
+             {%- for tool_call in message.tool_calls %}
+                 {%- if (loop.first and content) or (not loop.first) %}
+                     {{- '\n' }}
+                 {%- endif %}
+                 {%- if tool_call.function %}
+                     {%- set tool_call = tool_call.function %}
+                 {%- endif %}
+                 {{- '<tool_call>\n{"name": "' }}
+                 {{- tool_call.name }}
+                 {{- '", "arguments": ' }}
+                 {%- if tool_call.arguments is string %}
+                     {{- tool_call.arguments }}
+                 {%- else %}
+                     {{- tool_call.arguments | tojson }}
+                 {%- endif %}
+                 {{- '}\n</tool_call>' }}
+             {%- endfor %}
+         {%- endif %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+     {%- if enable_thinking is defined and enable_thinking is false %}
+         {{- '<think>\n\n</think>\n\n' }}
+     {%- endif %}
+ {%- endif %}
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "architectures": [
+     "Qwen3Model"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "model_type": "qwen3",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.1.0",
+     "transformers": "4.52.3",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdeae6d28d0a1704d66cdb36938add1c6ef97c23633f5a1e29a8bcf009486f9f
+ size 2384233112
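
As a sanity check, the LFS pointer size above can be compared with a parameter count derived from `config.json`. The sketch below is an estimate, not an authoritative breakdown: the per-layer layout (bias-free q/k/v/o projections, per-head RMSNorm on queries and keys, a three-matrix gated MLP, two RMSNorms per layer, one final norm, no separate output head) is an assumption about the Qwen3 architecture that the diff itself does not spell out. The function name `count_parameters` is hypothetical.

```python
# Values copied from config.json in this commit.
config = {
    "vocab_size": 151936,
    "hidden_size": 1024,
    "intermediate_size": 3072,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
}


def count_parameters(cfg):
    """Estimate the weight count under the assumed Qwen3 layer layout."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    embed = cfg["vocab_size"] * h
    # q_proj, k_proj, v_proj, o_proj (assumed bias-free, per attention_bias=false)
    attention = h * q_out + 2 * h * kv_out + q_out * h
    # gate_proj, up_proj, down_proj
    mlp = 3 * h * cfg["intermediate_size"]
    # two layer norms of width h, plus q/k norms of width head_dim (assumption)
    norms = 2 * h + 2 * cfg["head_dim"]
    per_layer = attention + mlp + norms
    return embed + cfg["num_hidden_layers"] * per_layer + h  # + final norm


params = count_parameters(config)
weight_bytes = params * 4          # torch_dtype is float32, 4 bytes per weight
file_bytes = 2384233112            # "size" from the LFS pointer above
print(params, weight_bytes, file_bytes - weight_bytes)
```

Under these assumptions the estimate comes to about 596 million weights, and the small remainder between `file_bytes` and `weight_bytes` would be consistent with file header metadata rather than additional tensors.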
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
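
`modules.json` lists the pipeline stages in order: a Transformer module followed by a Pooling module. The sketch below only parses that file's content with the standard library to show the ordering; it does not load any model. The function name `describe_pipeline` is hypothetical, and reading the empty `path` as "the repository root" is an assumption about the library's convention.

```python
import json

# Content of modules.json as added in this commit.
modules_json = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling",
   "type": "sentence_transformers.models.Pooling"}
]
"""


def describe_pipeline(text):
    """Return (class name, folder) pairs in execution order."""
    modules = sorted(json.loads(text), key=lambda m: m["idx"])
    # An empty path is shown as "." (assumed to mean the repository root).
    return [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]


pipeline = describe_pipeline(modules_json)
for class_name, folder in pipeline:
    print(f"{class_name:12s} <- {folder}")
```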
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c9573ae979ec2d2616f50161510156609a81f0842bbc4e8d1f161995c5cd8f4
+ size 11422920
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 128,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff