fanjiang98 committed
Commit 9656f61 · verified · 1 Parent(s): f543d57

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,197 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
language:
- en
- zh
- ar
- de
- es
- fr
- ko
- ja
- pt
- tr
- id
- it
- nl
- pl
- ru
- vi
- th
- he
- uk
- ms
- bn
- cs
- ur
- kk
- el
- ro
- hu
- ne
- az
library_name: transformers
tags:
- moe
- mixture-of-experts
- multilingual
- upcycling
datasets:
- allenai/Dolci-Instruct-SFT
- nvidia/Nemotron-Cascade-2-SFT-Data
- nvidia/Nemotron-RL-instruction_following
- nvidia/Nemotron-RL-instruction_following-structured_outputs
- nvidia/Nemotron-RL-ReasoningGym-v1
- nvidia/Nemotron-RL-knowledge-mcqa
- nvidia/Nemotron-Cascade-RL-RLHF
- BytedTsinghua-SIA/DAPO-Math-17k
- Skywork/Skywork-OR1-RL-Data
- nvidia/Nemotron-SFT-Multilingual-v1
---

# Marco-Nano-Instruct

**Marco-Nano-Instruct** is the post-trained variant of [Marco-Nano-Base](https://huggingface.co/AIDC-AI/Marco-Nano-Base), a highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.6B of its 8B total parameters** (a 7.5% activation ratio) per token. Despite this extreme sparsity, Marco-Nano-Instruct achieves the **best average performance** across English, multilingual general, and multilingual cultural benchmarks among comparable instruct models with up to 3.84B activated parameters.

## Model Description

Marco-Nano-Instruct shares the same architecture as [Marco-Nano-Base](https://huggingface.co/AIDC-AI/Marco-Nano-Base): a decoder-only Transformer with sparse MoE layers replacing standard FFN layers, upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using fine-grained sub-matrix splitting combined with Drop-Upcycling.

| Configuration | Value |
|:---|:---:|
| Total Parameters | 8B |
| Activated Parameters | 0.6B |
| Activation Ratio | 7.5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 384 |
| Total Experts | 232 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.40 \times 10^{23}$ |

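As a sanity check, the activated and total parameter counts in the table can be roughly reproduced from the architecture numbers above. The sketch below is a back-of-the-envelope estimate, not released code: it ignores RMSNorm weights, assumes no attention biases, and takes the vocabulary size (151,936) from the released config.json.

```python
# Rough parameter count from the configuration table above.
d_model, n_layers = 1024, 28
q_heads, kv_heads, head_dim = 16, 8, 128
expert_dim, total_experts, active_experts = 384, 232, 8
vocab = 151_936  # from config.json; embeddings are tied, so counted once

attn = d_model * (q_heads * head_dim) * 2      # q_proj + o_proj
attn += d_model * (kv_heads * head_dim) * 2    # k_proj + v_proj
per_expert = 3 * d_model * expert_dim          # gate, up, down projections
router = d_model * total_experts               # routing layer per block

embed = vocab * d_model
activated = embed + n_layers * (attn + router + active_experts * per_expert)
total = embed + n_layers * (attn + router + total_experts * per_expert)

print(f"activated ≈ {activated / 1e9:.2f}B, total ≈ {total / 1e9:.2f}B, "
      f"ratio ≈ {activated / total:.1%}")
# → activated ≈ 0.60B, total ≈ 8.00B, ratio ≈ 7.5%
```

The estimate lands on 0.6B activated of 8B total, matching the table's 7.5% activation ratio.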
## Post-Training Details

Marco-Nano-Instruct is trained from [Marco-Nano-Base](https://huggingface.co/AIDC-AI/Marco-Nano-Base) using a two-stage post-training pipeline implemented with the SLIME framework:

### Stage 1: Supervised Fine-Tuning (SFT)

- **Duration:** ~24 hours on 64 GPUs
- **Steps:** ~4,000 (1 epoch)
- **Learning rate:** 1e-5 with cosine decay to 1e-6
- **Batch size:** 512, context length 8,192 tokens

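The schedule above (1e-5 decaying to a 1e-6 floor on a cosine curve over roughly 4,000 steps) can be written out explicitly. This is the standard cosine-with-floor formula, shown as an illustration rather than the actual training code:

```python
import math

MAX_LR, MIN_LR, TOTAL_STEPS = 1e-5, 1e-6, 4_000  # SFT hyperparameters above

def cosine_lr(step: int) -> float:
    """Cosine decay from MAX_LR at step 0 to MIN_LR at TOTAL_STEPS."""
    progress = min(step / TOTAL_STEPS, 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0), cosine_lr(2_000), cosine_lr(4_000))
# starts at ~1e-5, passes 5.5e-6 at the midpoint, ends at the 1e-6 floor
```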
**Data sources:**

1. **General instructions** — Dolci-Instruct dataset, augmented with Nemotron-Cascade-2 data
2. **Knowledge-intensive data** — scientific prompts from Nemotron-Cascade-2, with responses distilled from Gemini3-Flash
3. **Translation data** — web-mined NLLB translation pairs, filtered and scored with [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) (top 10K per language)
4. **Multilingual & cultural data** — Wikidata-sourced content with Gemini3-Flash text synthesis for cultural concepts

### Stage 2: On-Policy Distillation (OPD)

- **Duration:** ~110 hours on 64 GPUs
- **Steps:** ~2,900 total (2 responses sampled per prompt)
- **Learning rate:** 1e-6 (constant)

**Cascaded distillation:**

1. ~1,900 steps with Qwen3-30B-A3B-Instruct as the teacher
2. ~1,000 steps with Qwen3-Next-80B-A3B-Instruct as a stronger teacher

**OPD data mixture:**

| Category | Datasets | Ratio |
|:---|:---|:---:|
| Instruction Following | Nemotron-RL-instruction-following + structured outputs | 25% |
| Knowledge & Reasoning | Nemotron-RL-ReasoningGym-v1 + knowledge-mcqa | 25% |
| Alignment | Nemotron-Cascade-RL-RLHF | 10% |
| Math | DAPO-Math-17k + Skywork-OR1-RL-Data | 10% |
| Multilingual | Translation + Cultural + Nemotron-SFT-Multilingual-v1 | 30% |

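The ratios above sum to 100%. One simple way such a mixture can be realized is to draw each prompt's category from the table's weights; the sketch below illustrates that interpretation and is not the released pipeline:

```python
import random

# OPD data mixture from the table above (weights sum to 1.0)
MIXTURE = {
    "instruction_following": 0.25,
    "knowledge_reasoning": 0.25,
    "alignment": 0.10,
    "math": 0.10,
    "multilingual": 0.30,
}

def sample_category(rng: random.Random) -> str:
    """Draw one prompt category according to the mixture weights."""
    return rng.choices(list(MIXTURE), weights=list(MIXTURE.values()), k=1)[0]

rng = random.Random(0)
draws = [sample_category(rng) for _ in range(100_000)]
for cat, w in MIXTURE.items():
    print(f"{cat}: target {w:.0%}, observed {draws.count(cat) / len(draws):.1%}")
```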
## Supported Languages

English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani

## Evaluation

We compare Marco-Nano-Instruct against instruct models of comparable size: **Qwen3-1.7B-Instruct** (1.7B activated), **Qwen3-VL-2B-Instruct** (2B activated), **Ministral3-3B-Instruct** (3.84B activated), **LFM2-8B-A1B** (1.5B activated), and **Granite4-Tiny-Instruct** (1.47B activated). Marco-Nano-Instruct uses only **0.6B activated parameters** — the smallest among all baselines. Avg@8 accuracies are reported, except for GlobalMMLU and MMMLU, where Acc@1 is reported.

### English

| Benchmark | Qwen3-1.7B | Qwen3-VL-2B | Ministral3-3B | LFM2-8B-A1B | Granite4-Tiny | **Marco-Nano** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| MMLU _(Acc)_ | 62.4 | 62.1 | 69.8 | 72.1 | 50.8 | **73.2** |
| MMLU-Redux _(Acc)_ | 62.4 | 62.2 | 69.6 | 71.9 | 51.2 | **73.3** |
| MMLU-Pro _(Acc)_ | 35.2 | 38.3 | 49.5 | 49.5 | 25.3 | **54.5** |
| AGIEval _(Acc)_ | 39.6 | 33.0 | 44.7 | 45.2 | 30.7 | **49.8** |
| GPQA-Diamond _(Acc)_ | 27.5 | 21.0 | 31.6 | **31.9** | 28.3 | 22.2 |
| GSM8K _(EM)_ | 77.9 | 79.7 | 79.0 | 84.6 | 71.1 | **86.7** |
| MATH _(EM)_ | 70.6 | 73.7 | 70.2 | **82.6** | 53.4 | 79.6 |
| **Average** | 53.7 | 52.9 | 59.2 | 62.5 | 44.4 | **62.8** |

### Multilingual — General

| Benchmark | Qwen3-1.7B | Qwen3-VL-2B | Ministral3-3B | LFM2-8B-A1B | Granite4-Tiny | **Marco-Nano** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| GlobalMMLU _(Acc)_ | 46.3 | 45.9 | 38.4 | 49.0 | 43.0 | **58.7** |
| MMMLU _(Acc)_ | 49.0 | 49.0 | 39.4 | 56.5 | 44.1 | **59.9** |
| MMLU-ProX-Lite _(Acc)_ | 28.6 | 30.3 | 26.7 | 33.8 | 22.1 | **43.2** |
| MGPQA _(Acc)_ | 25.3 | 22.3 | 18.8 | **27.2** | 25.9 | 21.6 |
| FLORES-200 En→Xx _(BLEU)_ | 12.7 | 15.3 | 8.3 | 14.9 | **22.5** | 22.3 |
| FLORES-200 Xx→En _(BLEU)_ | 28.2 | 28.6 | 18.9 | 20.1 | 30.4 | **31.1** |
| WMT24++ En→Xx _(BLEU)_ | 13.2 | 14.6 | 4.4 | 14.6 | **18.9** | 18.7 |
| WMT24++ Xx→En _(BLEU)_ | 26.4 | 26.2 | 8.3 | 17.9 | 25.1 | **27.3** |
| MGSM _(EM)_ | 63.6 | 67.6 | 47.0 | 56.5 | 55.3 | **76.5** |
| PolyMath _(EM)_ | 23.4 | 25.5 | 16.3 | 26.5 | 18.7 | **29.6** |
| **Average** | 31.7 | 32.5 | 22.7 | 31.7 | 30.6 | **38.9** |

### Multilingual — Cultural & Regional

| Benchmark | Qwen3-1.7B | Qwen3-VL-2B | Ministral3-3B | LFM2-8B-A1B | Granite4-Tiny | **Marco-Nano** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| INCLUDE _(Acc)_ | 44.9 | 44.4 | 35.4 | 43.5 | 38.6 | **54.3** |
| Global-PIQA _(Acc)_ | 62.0 | 65.8 | 50.6 | 60.8 | 63.3 | **70.7** |
| CMMLU _(Acc)_ | 60.4 | **63.3** | 48.9 | 52.7 | 39.2 | 60.0 |
| C-Eval _(Acc)_ | 58.7 | **63.2** | 50.6 | 50.8 | 39.4 | 60.8 |
| ArabicMMLU _(Acc)_ | 48.8 | 46.9 | 22.7 | **56.5** | 43.4 | **56.5** |
| TurkishMMLU _(Acc)_ | 42.7 | 39.6 | 38.6 | 26.3 | 31.6 | **59.9** |
| GreekMMLU _(Acc)_ | 48.7 | 48.0 | 38.4 | 40.0 | 44.8 | **61.6** |
| KazakhMMLU _(Acc)_ | 46.0 | 47.1 | 41.4 | 39.6 | 39.6 | **56.3** |
| IndoMMLU _(Acc)_ | 48.8 | 49.3 | 35.2 | 41.1 | 37.2 | **56.3** |
| IndoCareer _(Acc)_ | 46.1 | 45.7 | 36.0 | 41.7 | 34.7 | **54.9** |
| IndoCulture _(Acc)_ | 45.8 | 47.7 | 37.2 | 45.9 | 42.8 | **59.1** |
| **Average** | 50.3 | 51.0 | 39.5 | 45.4 | 41.3 | **59.1** |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Nano-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Citation

```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
config.json ADDED
@@ -0,0 +1,40 @@
{
  "architectures": [
    "Qwen3MoeForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "dtype": "float32",
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "mlp_only_layers": [],
  "model_type": "qwen3_moe",
  "moe_intermediate_size": 384,
  "norm_topk_prob": true,
  "num_attention_heads": 16,
  "num_experts": 232,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "output_router_logits": false,
  "qkv_bias": false,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "use_qk_norm": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
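Per this config, the router scores all 232 experts for each token but only the top 8 (`num_experts_per_tok`) are evaluated, and `norm_topk_prob: true` means the kept gate probabilities are renormalized to sum to 1. A minimal sketch of that routing step, assuming standard Qwen3-MoE top-k gating:

```python
import math
import random

def route(router_logits, top_k=8):
    """Softmax over all expert logits, keep the top-k experts, and
    renormalize their probabilities (the behaviour norm_topk_prob implies)."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    weights = [probs[i] / norm for i in top]          # gate weights sum to 1
    return top, weights

# 232 experts, 8 active per token, as in this config
rng = random.Random(0)
experts, weights = route([rng.gauss(0, 1) for _ in range(232)])
print(experts, [round(w, 3) for w in weights])
```

Only the 8 selected experts' FFNs run for each token, which is what keeps the activated parameter count near 0.6B despite 8B total.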
configuration.json ADDED
@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "transformers_version": "4.57.1"
}
model-00000-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da1e1111e03afb05e163fcce14436cf3ba0f45d83704d1ed231e056b56d57536
size 5368922728
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6e1de44f5e55986fd4245cf25a0f4a52d64a0be0d340756a378793d40c34c5b
size 5369237272
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70d71fd309de72269bd8c55fd5add7cd2814ce1c8cdd5319f75af028ee462638
size 5267196360
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff