fanjiang98 committed
Commit 3793a62 · verified · 1 Parent(s): 3274ca7

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,3 +1,198 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ - ar
+ - de
+ - es
+ - fr
+ - ko
+ - ja
+ - pt
+ - tr
+ - id
+ - it
+ - nl
+ - pl
+ - ru
+ - vi
+ - th
+ - he
+ - uk
+ - ms
+ - bn
+ - cs
+ - ur
+ - kk
+ - el
+ - ro
+ - hu
+ - ne
+ - az
+ library_name: transformers
+ tags:
+ - moe
+ - mixture-of-experts
+ - multilingual
+ - upcycling
+ - on-policy distillation
+ datasets:
+ - allenai/Dolci-Instruct-SFT
+ - nvidia/Nemotron-Cascade-2-SFT-Data
+ - nvidia/Nemotron-RL-instruction_following
+ - nvidia/Nemotron-RL-instruction_following-structured_outputs
+ - nvidia/Nemotron-RL-ReasoningGym-v1
+ - nvidia/Nemotron-RL-knowledge-mcqa
+ - nvidia/Nemotron-Cascade-RL-RLHF
+ - BytedTsinghua-SIA/DAPO-Math-17k
+ - Skywork/Skywork-OR1-RL-Data
+ - nvidia/Nemotron-SFT-Multilingual-v1
+ ---
+
+ # Marco-Mini-Instruct
+
+ **Marco-Mini-Instruct** is the instruction-tuned variant of [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base), a highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only **0.86B of its 17.3B total parameters** (a 5% activation ratio) per token. Marco-Mini-Instruct achieves the **best average performance** across English, multilingual general, and multilingual cultural benchmarks compared with instruct models with up to 12B activated parameters, including Qwen3-4B-Instruct, Ministral3-8B-Instruct, Gemma3-12B-Instruct, LFM2-24B-A2B, and Granite4-Small-Instruct.
+
+ ## Model Description
+
+ Marco-Mini-Instruct shares the same architecture as [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base): a decoder-only Transformer in which sparse MoE layers replace the standard FFN layers, upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using fine-grained sub-matrix splitting combined with Drop-Upcycling.
+
+ | Configuration | Value |
+ |:---|:---:|
+ | Total Parameters | 17.3B |
+ | Activated Parameters | 0.86B |
+ | Activation Ratio | 5% |
+ | Num Layers | 28 |
+ | Model Dimension | 1024 |
+ | FFN Intermediate Dimension | 3072 |
+ | Q-Heads | 16 |
+ | KV-Heads | 8 |
+ | Head Dimension | 128 |
+ | Expert Dimension | 768 |
+ | Total Experts | 256 |
+ | Activated Experts | 8 |
+ | Tie Embeddings | True |
+ | Training FLOPs | $1.56 \times 10^{23}$ |
+
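The activated-parameter figure in the table can be sanity-checked with back-of-envelope arithmetic from the configuration values above. The sketch below is an approximation under simplifying assumptions: norms and biases are ignored, the router is counted as a dense projection, and the vocabulary size is taken from the repository's config.json.

```python
# Back-of-envelope check of activated parameters per token, using the
# configuration from the table above (norms and biases omitted).
d_model, n_layers = 1024, 28
q_heads, kv_heads, head_dim = 16, 8, 128
expert_dim, active_experts, total_experts = 768, 8, 256
vocab = 151936  # vocab_size from config.json

attn = d_model * q_heads * head_dim        # q_proj
attn += 2 * d_model * kv_heads * head_dim  # k_proj + v_proj
attn += q_heads * head_dim * d_model       # o_proj

moe = active_experts * 3 * d_model * expert_dim  # gate/up/down of active experts
router = d_model * total_experts                 # router scores all 256 experts

embeddings = vocab * d_model  # tied input/output embeddings, counted once
activated = n_layers * (attn + moe + router) + embeddings
print(f"{activated / 1e9:.2f}B")  # -> 0.87B, matching the reported 0.86B up to rounding
```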
+ ## Post-Training Details
+
+ Marco-Mini-Instruct is trained from [Marco-Mini-Base](https://huggingface.co/AIDC-AI/Marco-Mini-Base) using a two-stage post-training pipeline implemented with the SLIME framework:
+
+ ### Stage 1: Supervised Fine-Tuning (SFT)
+
+ - **Duration:** ~24 hours on 64 GPUs
+ - **Steps:** ~4,000 (1 epoch)
+ - **Learning rate:** 1e-5 with cosine decay to 1e-6
+ - **Batch size:** 512, context length 8,192 tokens
+
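The stated Stage-1 schedule can be sketched as a standard cosine decay from 1e-5 to 1e-6 over the ~4,000 steps. This is an illustration of the schedule as described, not the SLIME framework's actual scheduler code:

```python
import math

# Cosine decay from lr_max to lr_min over total_steps, per the bullets above.
def sft_lr(step, total_steps=4000, lr_max=1e-5, lr_min=1e-6):
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print(sft_lr(0))     # ~1e-5 at the start of training
print(sft_lr(4000))  # 1e-6 at the end
```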
+ **Data sources:**
+ 1. **General instructions** — Dolci-Instruct dataset, augmented with Nemotron-Cascade-2 data
+ 2. **Knowledge-intensive data** — Scientific prompts from Nemotron-Cascade-2, responses distilled from Gemini3-Flash
+ 3. **Translation data** — Web-mined NLLB translation pairs, filtered and scored with Qwen3-Embedding-8B (top 10K per language)
+ 4. **Multilingual & cultural data** — Wikidata-sourced content with Gemini3-Flash text synthesis for cultural concepts
+
+ ### Stage 2: On-Policy Distillation (OPD)
+
+ - **Duration:** ~110 hours on 64 GPUs
+ - **Steps:** ~3,800 total (2 responses sampled per prompt)
+ - **Learning rate:** 1e-6 (constant)
+
+ **Cascaded distillation:**
+ 1. ~1,900 steps with Qwen3-30B-A3B-Instruct as teacher
+ 2. ~1,900 steps with Qwen3-Next-80B-A3B-Instruct as stronger teacher
+
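On-policy distillation trains on responses sampled from the student itself, penalizing the per-token divergence between student and teacher next-token distributions at each position of the sampled response. A minimal pure-Python sketch of one such per-token term, using a generic reverse-KL formulation (the exact objective used in SLIME may differ):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def reverse_kl(student_logits, teacher_logits):
    # Per-token KL(student || teacher) over the vocabulary, evaluated at one
    # position of a student-sampled response.
    s, t = log_softmax(student_logits), log_softmax(teacher_logits)
    return sum(math.exp(si) * (si - ti) for si, ti in zip(s, t))

print(reverse_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))      # 0.0 when student matches teacher
print(reverse_kl([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0)  # True: divergence is penalized
```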
+ **OPD data mixture:**
+
+ | Category | Datasets | Ratio |
+ |:---|:---|:---:|
+ | Instruction Following | Nemotron-RL-instruction-following + structured outputs | 25% |
+ | Knowledge & Reasoning | Nemotron-RL-ReasoningGym-v1 + knowledge-mcqa | 25% |
+ | Alignment | Nemotron-Cascade-RL-RLHF | 10% |
+ | Math | DAPO-Math-17k + Skywork-OR1-RL-Data | 10% |
+ | Multilingual | Translation + Cultural + Nemotron-SFT-Multilingual-v1 | 30% |
+
115
+ ## Supported Languages
116
+
117
+ English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
118
+
119
+ ## Evaluation
120
+
121
+ We compare Marco-Mini-Instruct against strong instruct baselines: **Qwen3-4B-Instruct** (4B activated), **Ministral3-8B-Instruct** (8.8B activated), **Gemma3-12B-Instruct** (12B activated), **Granite4-Small-Instruct** (9B activated), and **LFM2-24B-A2B** (2B activated). Marco-Mini-Instruct uses only **0.86B activated parameters**. Avg@8 accuracies are reported, except for GlobalMMLU and MMMLU where Acc@1 is reported.
122
+
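The Avg@8 protocol can be made concrete with a short sketch. This is our reading of the metric (8 sampled responses per question, mean accuracy over the 8 runs); the authors' evaluation harness may differ in details such as sampling temperature:

```python
def avg_at_k(correct, k=8):
    # correct[i][j] = 1 if sampled run i answered question j correctly, else 0
    assert len(correct) == k
    per_run = [sum(run) / len(run) for run in correct]
    return sum(per_run) / k

# Two questions, 8 sampled runs: 4 runs get one right, 4 runs get both right.
runs = [[1, 0]] * 4 + [[1, 1]] * 4
print(avg_at_k(runs))  # 0.75
```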
+ ### English
+
+ | Benchmark | Qwen3-4B | Ministral3-8B | Gemma3-12B | Granite4-Small | LFM2-24B-A2B | **Marco-Mini** |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | MMLU _(Acc)_ | 80.8 | 79.8 | 76.2 | 76.7 | 74.9 | **83.4** |
+ | MMLU-Redux _(Acc)_ | 80.9 | 79.9 | 76.2 | 76.7 | 74.9 | **83.5** |
+ | MMLU-Pro _(Acc)_ | 66.9 | 63.9 | 55.8 | 57.1 | 57.6 | **70.7** |
+ | AGIEval _(Acc)_ | 51.7 | 52.4 | 43.6 | 44.7 | 49.0 | **55.4** |
+ | GPQA-Diamond _(Acc)_ | **50.8** | 44.8 | 35.2 | 38.6 | 39.7 | 50.3 |
+ | GSM8K _(EM)_ | 88.6 | 89.5 | 89.7 | 83.9 | 87.2 | **93.1** |
+ | MATH _(EM)_ | **93.4** | 86.2 | 83.8 | 75.7 | 83.9 | 91.8 |
+ | **Average** | 73.3 | 70.9 | 65.8 | 64.8 | 66.7 | **75.5** |
+
+ ### Multilingual — General
+
+ | Benchmark | Qwen3-4B | Ministral3-8B | Gemma3-12B | Granite4-Small | LFM2-24B-A2B | **Marco-Mini** |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | GlobalMMLU _(Acc)_ | 70.2 | 55.4 | 69.2 | 67.4 | 57.0 | **73.3** |
+ | MMMLU _(Acc)_ | 71.3 | 56.4 | 69.4 | 68.1 | 62.3 | **73.7** |
+ | MMLU-ProX-Lite _(Acc)_ | 58.3 | 43.3 | 51.3 | 51.6 | 43.3 | **61.2** |
+ | MGPQA _(Acc)_ | 41.0 | 30.5 | 32.8 | 35.0 | 32.7 | **41.8** |
+ | FLORES-200 En→Xx _(BLEU)_ | 22.1 | 17.5 | **35.6** | 31.9 | 19.2 | 30.6 |
+ | FLORES-200 Xx→En _(BLEU)_ | 33.5 | 31.0 | **40.3** | 32.2 | 22.7 | 36.8 |
+ | WMT24++ En→Xx _(BLEU)_ | 20.9 | 14.4 | **32.1** | 26.6 | 16.0 | 26.8 |
+ | WMT24++ Xx→En _(BLEU)_ | 29.9 | 24.2 | **35.5** | 27.5 | 18.8 | 31.3 |
+ | MGSM _(EM)_ | 84.4 | 68.7 | 84.0 | 75.7 | 67.8 | **87.4** |
+ | PolyMath _(EM)_ | **47.2** | 26.4 | 35.5 | 28.9 | 29.3 | 44.7 |
+ | **Average** | 47.9 | 36.8 | 48.6 | 44.5 | 36.9 | **50.8** |
+
+ ### Multilingual — Cultural & Regional
+
+ | Benchmark | Qwen3-4B | Ministral3-8B | Gemma3-12B | Granite4-Small | LFM2-24B-A2B | **Marco-Mini** |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | INCLUDE _(Acc)_ | 63.8 | 50.7 | 65.0 | 60.3 | 49.1 | **65.6** |
+ | Global-PIQA _(Acc)_ | 79.6 | 61.3 | 82.2 | 80.2 | 69.0 | **84.2** |
+ | CMMLU _(Acc)_ | **78.6** | 67.4 | 60.8 | 59.6 | 56.7 | 75.3 |
+ | C-Eval _(Acc)_ | **80.4** | 68.0 | 59.7 | 59.4 | 56.7 | 75.4 |
+ | ArabicMMLU _(Acc)_ | 66.0 | 41.4 | **70.1** | 66.3 | 61.3 | 67.8 |
+ | TurkishMMLU _(Acc)_ | 71.6 | 48.2 | 64.4 | 57.9 | 33.4 | **74.7** |
+ | GreekMMLU _(Acc)_ | 68.6 | 49.5 | **77.7** | 71.7 | 44.7 | 72.5 |
+ | KazakhMMLU _(Acc)_ | 66.6 | 59.1 | 66.8 | 63.5 | 47.6 | **68.8** |
+ | IndoMMLU _(Acc)_ | 64.4 | 52.4 | 65.3 | 59.6 | 42.7 | **65.7** |
+ | IndoCareer _(Acc)_ | 62.2 | 53.4 | 63.2 | 56.3 | 43.7 | **64.4** |
+ | IndoCulture _(Acc)_ | 58.7 | 47.8 | **69.6** | 59.3 | 44.2 | 67.1 |
+ | **Average** | 69.1 | 54.5 | 67.7 | 63.1 | 49.9 | **71.0** |
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "AIDC-AI/Marco-Mini-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "What is the capital of France?"}
+ ]
+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
+ outputs = model.generate(inputs, max_new_tokens=256)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{marco-moe,
+   title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
+   author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
+   year={2026}
+ }
+ ```
+
+ ## License
+
+ This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "architectures": [
+     "Qwen3MoeForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "decoder_sparse_step": 1,
+   "dtype": "float32",
+   "eos_token_id": 151643,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "mlp_only_layers": [],
+   "model_type": "qwen3_moe",
+   "moe_intermediate_size": 768,
+   "norm_topk_prob": true,
+   "num_attention_heads": 16,
+   "num_experts": 256,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 8,
+   "output_router_logits": false,
+   "qkv_bias": false,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "router_aux_loss_coef": 0.001,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "transformers_version": "4.57.1",
+   "use_cache": true,
+   "use_qk_norm": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-generation"}
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "transformers_version": "4.57.1"
+ }
model-00000-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b19e7172bcd8b7ffe266ead406aabcb392b711ae3c220d08418f331a6ee4cc0
+ size 5368320120
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bf0abe85405e521bb875032eccfb9eb303ebf4a40a17308d514100122334262
+ size 5367589824
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c4618c2a2a2af2abec0ee6526bdec4bc28811360cd8d9f8e4017894f2832a9c
+ size 5369132800
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37e1a800efe2e84ffc70bed96800992f9634ce9761bea0e25b07e8e8f2dcf967
+ size 5367593048
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9695175d9c065c195a915756a6635a4bd44e28d77ce6ffe3c222c64732f1a9ad
+ size 5367593272
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50ec013c1c1ce3dba2eeba8e5edee2b8501b6a14d84ea4059d22cb79089b47bd
+ size 5368609760
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2361868fa0ce57386c5a539e485c77c656c6990bfbeb921c8fabd4a44b285253
+ size 2295024184
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null,
+   "add_bos_token": false
+ }
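The chat_template above frames conversations in the ChatML style. A minimal pure-Python rendering of the no-tools, string-content path, just to illustrate the prompt format the tokenizer produces (in practice, `tokenizer.apply_chat_template` is the real API and handles tools and tool responses too):

```python
def render(messages, add_generation_prompt=True):
    # Mirror of the template's simple path: each message becomes
    # <|im_start|>{role}\n{content}<|im_end|>\n, then an open assistant turn.
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render([{"role": "user", "content": "Hi"}]))  # prints the ChatML-framed prompt
```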
vocab.json ADDED
The diff for this file is too large to render. See raw diff