matichon zhouyx1998 committed on
Commit 8b0024e · 0 Parent(s)

Duplicate from openbmb/VoxCPM2

Co-authored-by: Yixuan Zhou <zhouyx1998@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
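To see which files in this commit those attribute rules route through Git LFS, the matching can be approximated in Python. This is only a rough sketch: it uses `fnmatch` on the file's basename, which is an assumption that holds for the slash-free patterns but does not reproduce full gitattributes semantics (for example `saved_model/**/*`). The helper name `is_lfs_tracked` and the shortened pattern list are illustrative, not part of the repository.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes diff (not the full list).
LFS_PATTERNS = ["*.bin", "*.pth", "*.safetensors", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the basename against each glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


# File names taken from this commit.
for path in ["audiovae.pth", "model.safetensors", "config.json", "tokenizer.json"]:
    kind = "LFS pointer" if is_lfs_tracked(path) else "regular Git blob"
    print(f"{path}: {kind}")
```

Under this approximation, `audiovae.pth` and `model.safetensors` match, which agrees with the LFS pointer contents shown for those two files in this commit.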
README.md ADDED
@@ -0,0 +1,225 @@
+ ---
+ language:
+ - zh
+ - en
+ - ar
+ - my
+ - da
+ - nl
+ - fi
+ - fr
+ - de
+ - el
+ - he
+ - hi
+ - id
+ - it
+ - ja
+ - km
+ - ko
+ - lo
+ - ms
+ - no
+ - pl
+ - pt
+ - ru
+ - es
+ - sw
+ - sv
+ - tl
+ - th
+ - tr
+ - vi
+ license: apache-2.0
+ library_name: voxcpm
+ tags:
+ - text-to-speech
+ - tts
+ - multilingual
+ - voice-cloning
+ - voice-design
+ - diffusion
+ - audio
+ pipeline_tag: text-to-speech
+ ---
+
+ # VoxCPM2
+
+ **VoxCPM2** is a tokenizer-free, diffusion autoregressive Text-to-Speech model with **2B parameters**, support for **30 languages**, and **48kHz** audio output, trained on over **2 million hours** of multilingual speech data.
+
+ [![GitHub](https://img.shields.io/badge/GitHub-VoxCPM-blue?logo=github)](https://github.com/OpenBMB/VoxCPM)
+ [![Docs](https://img.shields.io/badge/Docs-ReadTheDocs-8CA1AF)](https://voxcpm.readthedocs.io/en/latest/)
+ [![Demo](https://img.shields.io/badge/Live%20Playground-Demo-orange)](https://huggingface.co/spaces/OpenBMB/VoxCPM-Demo)
+ [![Audio Samples](https://img.shields.io/badge/Audio%20Samples-Demo%20Page-green)](https://openbmb.github.io/voxcpm2-demopage)
+ [![Discord](https://img.shields.io/badge/Discord-VoxCPM-5865F2?logo=discord&logoColor=white)](https://discord.gg/KZUx7tVNwz)
+
+ ## Highlights
+
+ - 🌍 **30-Language Multilingual**: No language tag needed; input text in any supported language directly
+ - 🎨 **Voice Design**: Generate a novel voice from a natural-language description alone (gender, age, tone, emotion, pace…); no reference audio required
+ - 🎛️ **Controllable Cloning**: Clone any voice from a short clip, with optional style guidance to steer emotion, pace, and expression while preserving timbre
+ - 🎙️ **Ultimate Cloning**: Provide reference audio plus its transcript for audio-continuation cloning that faithfully reproduces every vocal nuance
+ - 🔊 **48kHz Studio-Quality Output**: Accepts 16kHz reference audio and outputs 48kHz via AudioVAE V2's built-in super-resolution; no external upsampler needed
+ - 🧠 **Context-Aware Synthesis**: Automatically infers appropriate prosody and expressiveness from text content
+ - ⚡ **Real-Time Streaming**: RTF as low as ~0.3 on an NVIDIA RTX 4090, and ~0.13 when accelerated by [Nano-VLLM](https://github.com/a710128/nanovllm-voxcpm)
+ - 📜 **Fully Open-Source & Commercial-Ready**: Apache-2.0 license, free for commercial use
+
+ <details>
+ <summary><b>Supported Languages (30)</b></summary>
+
+ Arabic, Burmese, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Khmer, Korean, Lao, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Vietnamese
+
+ Chinese Dialects: 四川话, 粤语, 吴语, 东北话, 河南话, 陕西话, 山东话, 天津话, 闽南话
+ </details>
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install voxcpm
+ ```
+
+ **Requirements:** Python ≥ 3.10, PyTorch ≥ 2.5.0, CUDA ≥ 12.0 · [Full Quick Start →](https://voxcpm.readthedocs.io/en/latest/quickstart.html)
+
+ ### Text-to-Speech
+
+ ```python
+ from voxcpm import VoxCPM
+ import soundfile as sf
+
+ # Load the pretrained model once; the examples below reuse `model` and `sf`.
+ model = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)
+
+ wav = model.generate(
+     text="VoxCPM2 brings multilingual support, creative voice design, and controllable voice cloning.",
+     cfg_value=2.0,
+     inference_timesteps=10,
+ )
+ sf.write("output.wav", wav, model.tts_model.sample_rate)
+ ```
+
+ ### Voice Design
+
+ Put the voice description in parentheses at the start of `text`, followed by the content to synthesize:
+
+ ```python
+ wav = model.generate(
+     text="(A young woman, gentle and sweet voice)Hello, welcome to VoxCPM2!",
+     cfg_value=2.0,
+     inference_timesteps=10,
+ )
+ sf.write("voice_design.wav", wav, model.tts_model.sample_rate)
+ ```
+
+ ### Controllable Voice Cloning
+
+ ```python
+ # Basic cloning
+ wav = model.generate(
+     text="This is a cloned voice generated by VoxCPM2.",
+     reference_wav_path="speaker.wav",
+ )
+ sf.write("clone.wav", wav, model.tts_model.sample_rate)
+
+ # Cloning with style control
+ wav = model.generate(
+     text="(slightly faster, cheerful tone)This is a cloned voice with style control.",
+     reference_wav_path="speaker.wav",
+     cfg_value=2.0,
+     inference_timesteps=10,
+ )
+ sf.write("controllable_clone.wav", wav, model.tts_model.sample_rate)
+ ```
+
+ ### Ultimate Cloning
+
+ Provide both the reference audio and its exact transcript for maximum fidelity. Pass the same clip to both `reference_wav_path` and `prompt_wav_path` for highest similarity:
+
+ ```python
+ wav = model.generate(
+     text="This is an ultimate cloning demonstration using VoxCPM2.",
+     prompt_wav_path="speaker_reference.wav",
+     prompt_text="The transcript of the reference audio.",
+     reference_wav_path="speaker_reference.wav",
+ )
+ sf.write("hifi_clone.wav", wav, model.tts_model.sample_rate)
+ ```
+
+ ### Streaming
+
+ ```python
+ import numpy as np
+
+ # Collect the streamed chunks, then join them into a single waveform.
+ chunks = []
+ for chunk in model.generate_streaming(text="Streaming is easy with VoxCPM!"):
+     chunks.append(chunk)
+ wav = np.concatenate(chunks)
+ sf.write("streaming.wav", wav, model.tts_model.sample_rate)
+ ```
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Architecture | Tokenizer-free Diffusion Autoregressive (LocEnc → TSLM → RALM → LocDiT) |
+ | Backbone | Based on MiniCPM-4, 2B parameters in total |
+ | Audio VAE | AudioVAE V2 (asymmetric encode/decode, 16kHz in → 48kHz out) |
+ | Training Data | 2M+ hours multilingual speech |
+ | LM Token Rate | 6.25 Hz |
+ | Max Sequence Length | 8192 tokens |
+ | dtype | bfloat16 |
+ | VRAM | ~8 GB |
+ | RTF (RTX 4090) | ~0.30 (standard) / ~0.13 (Nano-vLLM) |
+
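As a rough reading of the RTF and token-rate rows, the arithmetic can be sketched as follows. It assumes the usual definition of real-time factor (processing time divided by audio duration) and treats the 8192-token limit as if every token were an audio token, which gives only an upper bound because text and prompt tokens presumably share the same sequence. The function names are illustrative.

```python
LM_TOKEN_RATE_HZ = 6.25      # "LM Token Rate" row
MAX_SEQUENCE_TOKENS = 8192   # "Max Sequence Length" row


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """RTF = processing time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


def max_audio_seconds(num_tokens: int = MAX_SEQUENCE_TOKENS,
                      token_rate_hz: float = LM_TOKEN_RATE_HZ) -> float:
    """Upper bound on audio length if every token in the sequence were audio."""
    return num_tokens / token_rate_hz


for label, rtf in [("standard", 0.30), ("Nano-vLLM", 0.13)]:
    print(f"{label}: ~{synthesis_seconds(10.0, rtf):.1f} s to synthesize 10 s of audio")

print(f"upper bound: ~{max_audio_seconds():.0f} s of audio in {MAX_SEQUENCE_TOKENS} tokens")
```

An RTF below 1.0 means audio is produced faster than it plays back, which is what makes streaming practical.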
+ ## Performance
+
+ VoxCPM2 achieves state-of-the-art or competitive results on major zero-shot and controllable TTS benchmarks.
+
+ See the [GitHub repo](https://github.com/OpenBMB/VoxCPM#-performance) for full benchmark tables (Seed-TTS-eval, CV3-eval, InstructTTSEval, MiniMax Multilingual Test).
+
+ ## Fine-tuning
+
+ VoxCPM2 supports both full SFT and LoRA fine-tuning with as little as 5–10 minutes of audio:
+
+ ```bash
+ # LoRA fine-tuning (recommended)
+ python scripts/train_voxcpm_finetune.py \
+     --config_path conf/voxcpm_v2/voxcpm_finetune_lora.yaml
+
+ # Full fine-tuning
+ python scripts/train_voxcpm_finetune.py \
+     --config_path conf/voxcpm_v2/voxcpm_finetune_all.yaml
+ ```
+
+ See the [Fine-tuning Guide](https://voxcpm.readthedocs.io/en/latest/finetuning/finetune.html) for full instructions.
+
+ ## Limitations
+
+ - Voice Design and Style Control results may vary between runs; generating 1–3 times is recommended to obtain the desired output.
+ - Performance varies across languages depending on training data availability.
+ - Occasional instability may occur with very long or highly expressive inputs.
+ - Use for impersonation, fraud, or disinformation is **strictly forbidden**. AI-generated content should be clearly labeled.
+
+ ## Citation
+
+ ```bibtex
+ @article{voxcpm2_2026,
+   title   = {VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning},
+   author  = {VoxCPM Team},
+   journal = {GitHub},
+   year    = {2026},
+ }
+
+ @article{voxcpm2025,
+   title   = {VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning},
+   author  = {Zhou, Yixuan and Zeng, Guoyang and Liu, Xin and Li, Xiang and
+              Yu, Renjie and Wang, Ziyang and Ye, Runchuan and Sun, Weiyue and
+              Gui, Jiancheng and Li, Kehan and Wu, Zhiyong and Liu, Zhiyuan},
+   journal = {arXiv preprint arXiv:2509.24650},
+   year    = {2025},
+ }
+ ```
+
+ ## License
+
+ Released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license, free for commercial use. For production deployments, we recommend thorough testing and safety evaluation tailored to your use case.
audiovae.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94b5d51e107e0507d4acc976cfdadb64edd6fd06d1f751dadbf2fd1594274bf1
+ size 376951122
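The three lines above are a Git LFS pointer, not the weights themselves: the spec version, the SHA-256 object id of the real file, and its size in bytes. A minimal sketch of reading such a pointer is shown below; the helper name `parse_lfs_pointer` is hypothetical, and the parsing simply assumes one `key value` pair per line as seen here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines from a Git LFS pointer file into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the audiovae.pth diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:94b5d51e107e0507d4acc976cfdadb64edd6fd06d1f751dadbf2fd1594274bf1
size 376951122
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

After downloading the real file, its SHA-256 digest and byte size can be compared against these two values to confirm the download is intact.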
config.json ADDED
@@ -0,0 +1,67 @@
+ {
+ "architecture": "voxcpm2",
+ "lm_config": {
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "hidden_size": 2048,
+ "intermediate_size": 6144,
+ "max_position_embeddings": 32768,
+ "num_attention_heads": 16,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-05,
+ "rope_theta": 10000,
+ "kv_channels": 128,
+ "rope_scaling": {
+ "type": "longrope",
+ "long_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.615569542115128, 5.2684819496549835, 6.014438591970396, 6.858830049237097, 7.804668263503327, 8.851768731513417, 9.99600492938444, 11.228766118181639, 12.536757560834843, 13.902257701387796, 15.303885189125953, 16.717837610115794, 18.119465097853947, 19.484965238406907, 20.792956681060105, 22.02571786985731, 23.16995406772833, 24.217054535738416, 25.16289275000465, 26.007284207271347, 26.753240849586767, 27.40615325712662, 27.973003419175363, 28.461674954469114, 28.880393889607006, 29.237306864684626, 29.540186419591297, 29.79624387177199, 30.01202719065413, 30.193382037992453, 30.34545697551969, 30.47273746338473, 30.579096895249787, 30.66785612408345, 30.741845563814174, 30.80346599254902, 30.85474569563567, 30.897392663720595, 30.932841297560394, 30.962293553185553, 30.986754758742034, 31.007064503249293, 31.02392307921529],
+ "short_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.615569542115128, 5.2684819496549835, 6.014438591970396, 6.858830049237097, 7.804668263503327, 8.851768731513417, 9.99600492938444, 11.228766118181639, 12.536757560834843, 13.902257701387796, 15.303885189125953, 16.717837610115794, 18.119465097853947, 19.484965238406907, 20.792956681060105, 22.02571786985731, 23.16995406772833, 24.217054535738416, 25.16289275000465, 26.007284207271347, 26.753240849586767, 27.40615325712662, 27.973003419175363, 28.461674954469114, 28.880393889607006, 29.237306864684626, 29.540186419591297, 29.79624387177199, 30.01202719065413, 30.193382037992453, 30.34545697551969, 30.47273746338473, 30.579096895249787, 30.66785612408345, 30.741845563814174, 30.80346599254902, 30.85474569563567, 30.897392663720595, 30.932841297560394, 30.962293553185553, 30.986754758742034, 31.007064503249293, 31.02392307921529],
+ "original_max_position_embeddings": 32768
+ },
+ "vocab_size": 73448,
+ "use_mup": false,
+ "scale_emb": 12,
+ "dim_model_base": 256,
+ "scale_depth": 1.4
+ },
+ "patch_size": 4,
+ "feat_dim": 64,
+ "scalar_quantization_latent_dim": 512,
+ "scalar_quantization_scale": 9,
+ "residual_lm_num_layers": 8,
+ "residual_lm_no_rope": true,
+ "encoder_config": {
+ "hidden_dim": 1024,
+ "ffn_dim": 4096,
+ "num_heads": 16,
+ "num_layers": 12,
+ "kv_channels": 128
+ },
+ "dit_config": {
+ "hidden_dim": 1024,
+ "ffn_dim": 4096,
+ "num_heads": 16,
+ "num_layers": 12,
+ "kv_channels": 128,
+ "mean_mode": false,
+ "cfm_config": {
+ "sigma_min": 1e-06,
+ "solver": "euler",
+ "t_scheduler": "log-norm",
+ "inference_cfg_rate": 2.0
+ }
+ },
+ "audio_vae_config": {
+ "encoder_dim": 128,
+ "encoder_rates": [2, 5, 8, 8],
+ "latent_dim": 64,
+ "decoder_dim": 2048,
+ "decoder_rates": [8, 6, 5, 2, 2, 2],
+ "sr_bin_boundaries": [20000, 30000, 40000],
+ "sample_rate": 16000,
+ "out_sample_rate": 48000
+ },
+ "max_length": 8192,
+ "device": "cuda",
+ "dtype": "bfloat16"
+ }
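The `audio_vae_config` and `patch_size` values appear to line up with the README's 16kHz-in, 48kHz-out, and 6.25 Hz figures. The sketch below checks that arithmetic under two assumptions that the config itself does not state: that `encoder_rates` and `decoder_rates` are per-stage stride factors whose product gives the total down/upsampling ratio, and that `patch_size` latent frames are grouped into one LM step.

```python
from math import prod

# Values copied from config.json above.
audio_vae_config = {
    "encoder_rates": [2, 5, 8, 8],
    "decoder_rates": [8, 6, 5, 2, 2, 2],
    "sample_rate": 16000,
    "out_sample_rate": 48000,
}
patch_size = 4

# Assumption: the product of the stage rates is the total resampling factor.
downsample = prod(audio_vae_config["encoder_rates"])        # input samples per latent frame
upsample = prod(audio_vae_config["decoder_rates"])          # output samples per latent frame

latent_rate_hz = audio_vae_config["sample_rate"] / downsample
decoded_rate_hz = latent_rate_hz * upsample

# Assumption: one LM step covers `patch_size` latent frames.
lm_step_rate_hz = latent_rate_hz / patch_size

print(f"latent frames per second: {latent_rate_hz}")
print(f"decoded sample rate:      {decoded_rate_hz}")
print(f"LM steps per second:      {lm_step_rate_hz}")
```

If these assumptions hold, the asymmetric encoder and decoder strides are what provide the built-in 16kHz to 48kHz super-resolution without a separate upsampler.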
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7f964cfa9da23653baec6e6f7750719977ad944ed9f95fe52fe3a620506891d
+ size 4580080592
special_tokens_map.json ADDED
@@ -0,0 +1,81 @@
+ {
+ "additional_special_tokens": [
+ {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|tool_call|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|execute_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|execute_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,212 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "<|audio_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "<|audio_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "<|audio_prompt_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "<|audio_prompt_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "105": {
+ "content": "<|background|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "106": {
+ "content": "<|/background|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "107": {
+ "content": "<|characters|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "108": {
+ "content": "<|/characters|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "109": {
+ "content": "<|speaker_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110": {
+ "content": "<|/speaker_id|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "111": {
+ "content": "<|span|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "112": {
+ "content": "<|/span|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73440": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73441": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73442": {
+ "content": "<|tool_call|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73443": {
+ "content": "<|execute_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73444": {
+ "content": "<|execute_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73445": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73446": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73447": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_end|>",
+ "<|im_start|>",
+ "<|tool_call|>",
+ "<|execute_start|>",
+ "<|execute_end|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>"
+ ],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "legacy": true,
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": null,
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false,
+ "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
+ }
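The `chat_template` entry is a Jinja template that wraps each message in `<|im_start|>` and `<|im_end|>` markers. The plain-Python function below is a hand-written equivalent for illustration only; it is not how the tokenizer evaluates the template, and this commit does not say whether the TTS pipeline uses the chat template at all. The name `render_chat` is hypothetical.

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the chat_template: one '<|im_start|>role\\ncontent<|im_end|>\\n' block per message."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # The template appends an open assistant turn when a reply is expected.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chat(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
print(rendered)
```

Note that `tokenizer_config.json` sets `eos_token` to `<|im_end|>`, which matches the marker that closes each turn in this template.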