fanjiang98 committed 69a934e (verified)
1 parent: 03f3b74

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,179 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ - ar
+ - de
+ - es
+ - fr
+ - ko
+ - ja
+ - pt
+ - tr
+ - id
+ - it
+ - nl
+ - pl
+ - ru
+ - vi
+ - th
+ - he
+ - uk
+ - ms
+ - bn
+ - cs
+ - ur
+ - kk
+ - el
+ - ro
+ - hu
+ - ne
+ - az
+ library_name: transformers
+ tags:
+ - moe
+ - mixture-of-experts
+ - multilingual
+ - upcycling
+ datasets:
+ - nvidia/Nemotron-CC-v2
+ - nvidia/Nemotron-Pretraining-SFT-v1
+ - nvidia/Nemotron-Pretraining-Specialized-v1
+ - nvidia/Nemotron-CC-v2.1
+ - allenai/dolmino-mix-1124
+ - nvidia/Nemotron-CC-Math-v1
+ - nvidia/OpenMathInstruct-2
+ - HuggingFaceTB/finemath
+ - LLM360/MegaMath
+ - open-thoughts/OpenThoughts3-1.2M
+ - opencsg/Fineweb-Edu-Chinese-V2.1
+ - HuggingFaceFW/fineweb-2
+ - allenai/dolma3_dolmino_mix-100B-1125
+ ---
+
+ # Marco-Nano-Base
+
+ **Marco-Nano-Base** is a compact, highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.6B out of 8B total parameters** (7.5% activation ratio) per token, achieving strong English and multilingual performance across 29 languages while requiring significantly less compute than comparable dense models.
+
+ ## Model Description
+
+ Marco-Nano is built on a decoder-only Transformer architecture with sparse MoE layers replacing standard FFN layers. It is upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using a fine-grained sub-matrix splitting strategy combined with Drop-Upcycling to promote expert diversification.
+
+ | Configuration | Value |
+ |:---|:---:|
+ | Total Parameters | 8B |
+ | Activated Parameters | 0.6B |
+ | Activation Ratio | 7.5% |
+ | Num Layers | 28 |
+ | Model Dimension | 1024 |
+ | FFN Intermediate Dimension | 3072 |
+ | Q-Heads | 16 |
+ | KV-Heads | 8 |
+ | Head Dimension | 128 |
+ | Expert Dimension | 384 |
+ | Total Experts | 232 |
+ | Activated Experts | 8 |
+ | Tie Embeddings | True |
+ | Training FLOPs | $1.40 \times 10^{23}$ |
+
+ ## Training Details
+
+ Marco-Nano was pre-trained on **5.1 trillion tokens** using a four-stage curriculum:
+
+ 1. **Stage 1 (0 - 2.4T tokens): Foundational Training** — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
+ 2. **Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling** — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data, with learning-rate decay applied during this stage.
+ 3. **Stage 3 (4.1T - 4.6T tokens): Language Expansion** — Added 9 new languages (Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani) and upsampled medium-resource languages.
+ 4. **Stage 4 (4.6T - 5.1T tokens): Synthetic Data Integration** — Curated multilingual synthetic data including cultural content (Fineweb2-Culture) and synthetic regional MCQs.
+
+ ## Supported Languages
+
+ English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
+
+ ## Evaluation
+
+ We compare Marco-Nano against baselines of comparable size: **Qwen3-1.7B** (1.7B activated), **Trinity Nano** (1.09B activated), and **Granite4-Tiny** (1.47B activated). With only **0.6B activated parameters**, Marco-Nano activates the fewest parameters of all compared models.
+
+ ### English
+
+ | Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | **Marco-Nano** |
+ |:---|:---:|:---:|:---:|:---:|:---:|
+ | MMLU _(Acc)_ | 5-shot | 65.1 | 64.7 | **69.1** | 64.7 |
+ | MMLU-Redux _(Acc)_ | 0-shot | 61.2 | 60.1 | **65.8** | 62.9 |
+ | MMLU-Pro _(Acc)_ | 5-shot | 33.2 | 32.0 | 32.1 | **35.9** |
+ | AGIEval _(Acc)_ | 0-shot | 35.9 | 31.4 | 36.1 | **38.4** |
+ | BBH _(EM)_ | 3-shot | 54.5 | 49.3 | **59.9** | 53.5 |
+ | ARC-Easy _(Acc)_ | 0-shot | 69.3 | 77.9 | **78.5** | 75.3 |
+ | ARC-Challenge _(Acc)_ | 0-shot | 42.8 | **53.5** | 52.3 | 49.4 |
+ | HellaSwag _(Acc)_ | 0-shot | 66.6 | 77.4 | **77.9** | 69.2 |
+ | WinoGrande _(Acc)_ | 0-shot | 57.1 | 57.1 | **58.6** | 53.4 |
+ | BoolQ _(Acc)_ | 0-shot | **74.6** | 71.5 | 63.5 | 71.2 |
+ | CommonsenseQA _(Acc)_ | 0-shot | 49.5 | 54.1 | **55.9** | 55.7 |
+ | OpenBookQA _(Acc)_ | 0-shot | 36.4 | 42.0 | **43.6** | 39.4 |
+ | PIQA _(Acc)_ | 0-shot | 75.5 | 69.6 | **80.6** | 76.5 |
+ | SIQA _(Acc)_ | 0-shot | 47.8 | 52.7 | **53.0** | 46.0 |
+ | GSM8K _(EM)_ | 5-shot | 69.1 | 57.8 | **70.7** | 69.7 |
+ | **Average** | - | 55.9 | 56.7 | **59.8** | 57.5 |
+
+ ### Multilingual — General
+
+ | Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | **Marco-Nano** |
+ |:---|:---:|:---:|:---:|:---:|:---:|
+ | GlobalMMLU _(Acc)_ | 5-shot | 49.6 | 43.6 | **54.8** | 52.2 |
+ | MMMLU _(Acc)_ | 0-shot | 48.6 | 41.2 | 52.3 | **52.6** |
+ | MMLU-ProX-Lite _(Acc)_ | 5-shot | 27.2 | 20.3 | **30.1** | 28.9 |
+ | BELEBELE _(Acc)_ | 0-shot | 67.5 | 54.5 | 61.2 | **73.8** |
+ | mHellaSwag _(Acc_norm)_ | 0-shot | 43.9 | 42.5 | **53.2** | 48.8 |
+ | mARC-Challenge _(Acc_norm)_ | 0-shot | 34.7 | 30.9 | **39.9** | 36.9 |
+ | FLORES-200 En→Xx _(BLEU)_ | 5-shot | 18.6 | 15.1 | **25.4** | 24.7 |
+ | FLORES-200 Xx→En _(BLEU)_ | 5-shot | 31.5 | 31.1 | **36.7** | 33.6 |
+ | WMT24++ En→Xx _(BLEU)_ | 5-shot | 18.3 | 15.0 | **21.9** | 20.7 |
+ | WMT24++ Xx→En _(BLEU)_ | 5-shot | 28.3 | 28.0 | **30.7** | 28.1 |
+ | MGSM _(EM)_ | 8-shot | 58.8 | 40.6 | 56.7 | **65.3** |
+ | **Average** | - | 38.8 | 33.0 | 42.1 | **42.3** |
+
+ ### Multilingual — Cultural & Regional
+
+ | Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | **Marco-Nano** |
+ |:---|:---:|:---:|:---:|:---:|:---:|
+ | INCLUDE _(Acc)_ | 5-shot | 51.2 | 43.9 | 52.1 | **53.2** |
+ | Global-PIQA _(Acc_norm)_ | 0-shot | 60.3 | 52.3 | 64.0 | **64.3** |
+ | CMMLU _(Acc)_ | 5-shot | **66.1** | 49.6 | 53.5 | 55.5 |
+ | C-Eval _(Acc)_ | 5-shot | **65.1** | 47.6 | 50.9 | 56.0 |
+ | ArabicMMLU _(Acc)_ | 3-shot | 57.6 | 44.0 | **60.5** | 55.8 |
+ | TurkishMMLU _(Acc)_ | 5-shot | 47.9 | 29.6 | 41.8 | **48.9** |
+ | GreekMMLU _(Acc)_ | 5-shot | 58.1 | 52.2 | 62.3 | **64.1** |
+ | KazakhMMLU _(Acc)_ | 5-shot | 52.1 | 43.1 | 52.6 | **53.1** |
+ | IndoMMLU _(Acc)_ | 0-shot | **51.0** | 41.5 | 49.0 | **51.0** |
+ | IndoCareer _(Acc)_ | 3-shot | **53.9** | 46.7 | 53.0 | 52.1 |
+ | IndoCulture _(Acc)_ | 0-shot | 51.6 | 49.8 | 51.3 | **57.4** |
+ | **Average** | - | **55.9** | 45.5 | 53.7 | 55.6 |
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "AIDC-AI/Marco-Nano-Base"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+
+ input_text = "The capital of France is"
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=50)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{marco-moe,
+ title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
+ author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
+ year={2026}
+ }
+ ```
+
+ ## License
+
+ This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
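
The README names the upcycling recipe (fine-grained sub-matrix splitting plus Drop-Upcycling) but does not spell it out. The Python sketch below shows one possible reading of it. It is not the authors' code. One part of it does hold for any SwiGLU FFN: if you slice the gate and up projections by rows and the down projection by columns along the 3072-wide intermediate dimension, you get 3072/384 = 8 sub-FFNs whose outputs sum exactly to the dense output. Everything else is an assumption:

- expert `e` starts from slice `e % 8`, since 232 = 29 × 8;
- a hypothetical fraction `drop_ratio` of each expert's intermediate neurons is re-drawn from a normal distribution matched to the source matrix's mean and standard deviation. This is our understanding of Drop-Upcycling.

All function names are illustrative.

```python
import numpy as np

# Sizes from the README table / config.json.
DENSE_INTER, EXPERT_INTER, NUM_EXPERTS = 3072, 384, 232
SLICES = DENSE_INTER // EXPERT_INTER  # 8 sub-matrices per dense FFN
assert DENSE_INTER % EXPERT_INTER == 0 and NUM_EXPERTS % SLICES == 0
print(SLICES, NUM_EXPERTS // SLICES)  # 8 29


def silu(x):
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, gate, up, down):
    """SwiGLU FFN: down @ (silu(gate @ x) * (up @ x))."""
    return down @ (silu(gate @ x) * (up @ x))


def split_dense_ffn(gate, up, down, expert_inter):
    """Cut a dense FFN into sub-FFNs along the intermediate dimension.
    gate/up: (inter, hidden) -> row blocks; down: (hidden, inter) -> column blocks."""
    n = gate.shape[0] // expert_inter
    return [(gate[i * expert_inter:(i + 1) * expert_inter],
             up[i * expert_inter:(i + 1) * expert_inter],
             down[:, i * expert_inter:(i + 1) * expert_inter]) for i in range(n)]


def drop_reinit(weights, drop_ratio, rng):
    """Drop-Upcycling-style partial re-init (hypothetical drop_ratio): re-draw a
    random subset of intermediate neurons from N(mean, std) of each source matrix."""
    gate, up, down = (w.copy() for w in weights)
    n_drop = int(round(drop_ratio * gate.shape[0]))
    if n_drop == 0:
        return gate, up, down
    idx = rng.choice(gate.shape[0], size=n_drop, replace=False)
    gate[idx] = rng.normal(gate.mean(), gate.std(), size=(n_drop, gate.shape[1]))
    up[idx] = rng.normal(up.mean(), up.std(), size=(n_drop, up.shape[1]))
    down[:, idx] = rng.normal(down.mean(), down.std(), size=(down.shape[0], n_drop))
    return gate, up, down


def upcycle(gate, up, down, num_experts, expert_inter, drop_ratio, seed=0):
    """Assumed assignment: expert e starts from slice e % n_slices, then is perturbed."""
    rng = np.random.default_rng(seed)
    slices = split_dense_ffn(gate, up, down, expert_inter)
    return [drop_reinit(slices[e % len(slices)], drop_ratio, rng) for e in range(num_experts)]


# Toy demo: hidden=16, dense inter=32, expert inter=4 -> 8 slices, 24 experts.
rng = np.random.default_rng(1)
g, u, d = rng.normal(size=(32, 16)), rng.normal(size=(32, 16)), rng.normal(size=(16, 32))
x = rng.normal(size=16)
parts = split_dense_ffn(g, u, d, 4)
print(np.allclose(sum(swiglu_ffn(x, *p) for p in parts), swiglu_ffn(x, g, u, d)))  # True
experts = upcycle(g, u, d, num_experts=24, expert_inter=4, drop_ratio=0.5)
```

With `drop_ratio=0`, the 29 experts built from the same slice would start out identical. The partial re-initialization is what breaks that symmetry, which fits the README's stated aim of promoting expert diversification.
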
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "architectures": [
+     "Qwen3MoeForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "decoder_sparse_step": 1,
+   "dtype": "float32",
+   "eos_token_id": 151643,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "mlp_only_layers": [],
+   "model_type": "qwen3_moe",
+   "moe_intermediate_size": 384,
+   "norm_topk_prob": true,
+   "num_attention_heads": 16,
+   "num_experts": 232,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 8,
+   "output_router_logits": false,
+   "qkv_bias": false,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "router_aux_loss_coef": 0.001,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "transformers_version": "4.57.1",
+   "use_cache": true,
+   "use_qk_norm": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
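
The routing fields in config.json are `num_experts: 232`, `num_experts_per_tok: 8`, `norm_topk_prob: true`, `moe_intermediate_size: 384` and `hidden_act: silu`. The NumPy sketch below shows how they shape one sparse MoE FFN. It follows our understanding of the Qwen3-MoE block in transformers:

1. A bias-free linear router produces one logit per expert.
2. A softmax runs over all experts.
3. The top-k probabilities are kept and renormalized to sum to 1.
4. The output is the weighted sum of the chosen SwiGLU experts.

With `decoder_sparse_step: 1` and an empty `mlp_only_layers`, we expect all 28 layers to use such a block. The sketch shrinks the sizes, keeps explicit loops for readability, and omits the auxiliary load-balancing loss. It is an illustration, not the library implementation.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def silu(x):
    return x / (1.0 + np.exp(-x))


def sparse_moe_block(x, router_w, experts, top_k, norm_topk_prob=True):
    """Illustrative top-k MoE FFN in the style of Qwen3-MoE.

    x:        (tokens, hidden) hidden states
    router_w: (num_experts, hidden) bias-free router projection
    experts:  list of (gate, up, down) SwiGLU weights with shapes
              (inter, hidden), (inter, hidden), (hidden, inter)
    """
    probs = softmax(x @ router_w.T)                        # (tokens, num_experts)
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]      # best experts per token
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)  # their probabilities
    if norm_topk_prob:                                     # "norm_topk_prob": true
        topk_w = topk_w / topk_w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                            # loops for clarity only
        for w, e in zip(topk_w[t], topk_idx[t]):
            gate, up, down = experts[e]
            out[t] += w * (down @ (silu(gate @ x[t]) * (up @ x[t])))
    return out, topk_idx, topk_w


# Tiny stand-in sizes; the config uses hidden=1024, 232 experts, top_k=8, inter=384.
rng = np.random.default_rng(0)
HIDDEN, NUM_EXPERTS, TOP_K, INTER = 16, 12, 3, 4
experts = [(rng.normal(size=(INTER, HIDDEN)), rng.normal(size=(INTER, HIDDEN)),
            rng.normal(size=(HIDDEN, INTER))) for _ in range(NUM_EXPERTS)]
router_w = rng.normal(size=(NUM_EXPERTS, HIDDEN))
x = rng.normal(size=(5, HIDDEN))
out, idx, w = sparse_moe_block(x, router_w, experts, TOP_K)
```

Only `TOP_K` of the expert FFNs run per token, which is why the activated parameter count stays far below the total.
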
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-generation"}
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "transformers_version": "4.57.1"
+ }
model-00001-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43928a210920006f4895c25f366edd6568131837910c27cd0c4f31eacc7725a1
+ size 1998974704
model-00002-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9a585d1032af8d93ef6d69e639b63d4a78ef689608adef083ab5fdfe78d9dc1
+ size 1999760536
model-00003-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e09c8f17f298ba72fefd87d4840ea8889e1c633201f4d89052cfb3b33e12d7c
+ size 1999764464
model-00004-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae23ebbc85dc8dc47b6e12ca1c6d5454a67466e52f3cd40ca85ccc60a8382cb7
+ size 2000074512
model-00005-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:639923c4f1a89d6d0f75dc810ea42707bc4d47d6737520f445550b028003c9a6
+ size 1999766736
model-00006-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8eccab512ec959d3253a970f9a46a028bed6cace2223d37688130b6cd9a05d48
+ size 2000074624
model-00007-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64db2f612bdeec7718138b5cc15b40355d3223e59de0ed91c3616f042ec50ce8
+ size 1999766640
model-00008-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:083855cc8286d4a9e49d2ad035b524de8f2e43b1414d79473cb56ab6bd47fb09
+ size 2000074720
model-00009-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:790390ae35ce32737520da0e2ee1e3e6ed951ebc830fcf56befd0e9a53466035
+ size 318250520
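
As a cross-check on the README's 8B total / 0.6B activated figures, the sketch below does two things:

- it counts parameters from the config.json values;
- it compares that count with the nine shard sizes in the LFS pointers above.

The accounting rules are our assumption:

- attention: q/k/v/o projections plus per-head q/k RMSNorm weights (`use_qk_norm: true`);
- per layer: a bias-free router, bias-free SwiGLU experts, and two RMSNorms;
- outside the layers: one final norm and a single tied embedding matrix, counted as activated.

```python
# Values copied from config.json and the LFS pointers in this commit.
CFG = {
    "vocab_size": 151936, "hidden_size": 1024, "num_hidden_layers": 28,
    "num_attention_heads": 16, "num_key_value_heads": 8, "head_dim": 128,
    "num_experts": 232, "num_experts_per_tok": 8, "moe_intermediate_size": 384,
}
SHARD_BYTES = [1998974704, 1999760536, 1999764464, 2000074512, 1999766736,
               2000074624, 1999766640, 2000074720, 318250520]


def param_counts(c):
    """Return (total, activated) parameter counts under the stated assumptions."""
    h, d = c["hidden_size"], c["head_dim"]
    q_dim = c["num_attention_heads"] * d
    kv_dim = c["num_key_value_heads"] * d
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h + 2 * d  # q,k,v,o + q_norm,k_norm
    expert = 3 * h * c["moe_intermediate_size"]            # gate, up, down (no biases)
    shared = attn + h * c["num_experts"] + 2 * h           # + router + 2 RMSNorms
    per_layer_total = shared + c["num_experts"] * expert
    per_layer_active = shared + c["num_experts_per_tok"] * expert
    outside = c["vocab_size"] * h + h                      # tied embedding + final norm
    n_layers = c["num_hidden_layers"]
    return n_layers * per_layer_total + outside, n_layers * per_layer_active + outside


total, active = param_counts(CFG)
print(f"{total / 1e9:.2f}B total, {active / 1e9:.2f}B activated ({active / total:.1%})")
# 8.00B total, 0.60B activated (7.5%)
print(f"{sum(SHARD_BYTES) / total:.2f} bytes per parameter on disk")
# 2.04 bytes per parameter on disk
```

Both counts match the README's 8B, 0.6B and 7.5% figures. The on-disk size works out to roughly 2 bytes per parameter. That suggests 16-bit weights, even though config.json records `"dtype": "float32"`. The exact per-tensor dtypes are recorded in the shard headers, which this page does not show.
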
model.safetensors.index.json ADDED
The diff for this file is too large to render.
 
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
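
The `chat_template` above wraps each turn in ChatML markers, `<|im_start|>{role}\n...<|im_end|>\n`. When `add_generation_prompt` is set, it also appends `<|im_start|>assistant\n`. The plain-Python function below mirrors only the no-tools path (system, user and assistant text turns), so you can inspect the resulting string without loading the tokenizer. It deliberately does not handle tool definitions, tool calls or `tool` messages, and `render_chatml` is a hypothetical name. For real use, `tokenizer.apply_chat_template` renders the actual template.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hypothetical helper mirroring the no-tools branch of chat_template.

    messages: list of {"role": ..., "content": ...} dicts whose roles are
    "system", "user" or "assistant".
    """
    parts = []
    # A leading system message is emitted once, before the main loop.
    if messages and messages[0]["role"] == "system":
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "tool" or msg.get("tool_calls"):
            raise ValueError("the tool path of the template is not mirrored by this sketch")
        content = msg.get("content")
        if not isinstance(content, str):  # the template substitutes '' for non-strings
            content = ""
        if role == "user" or (role == "system" and i > 0):
            parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n{content}<|im_end|>\n")
        elif role != "system":
            raise ValueError(f"unsupported role {role!r}")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
```

Note the difference in end-of-sequence settings. The tokenizer's `eos_token` is `<|im_end|>`, while config.json and generation_config.json set `eos_token_id` to 151643 (`<|endoftext|>`). Marco-Nano-Base is a base model, and the README does not say it was trained on this chat format.
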
vocab.json ADDED
The diff for this file is too large to render.