BlockArtica committed on
Commit fb2a871 · verified · 1 parent: bc59250

v6.0 phase-1 release: 30K-step distilled student, ctx=64

Files changed (5)
  1. README.md +296 -0
  2. config.json +36 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +207 -0
README.md ADDED
@@ -0,0 +1,296 @@
---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: safetensors
license: apache-2.0
tags:
- qubitcoin
- aether
- blockchain
- quantum
- distillation
- mixed-precision
- native-rust
- candle
language:
- en
pipeline_tag: text-generation
---

# Aether Mind v6.0 — QuantumAI Blockchain Native Generator

A **558M-parameter distilled student** of [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct),
trained from scratch in pure Rust (`candle` 0.10) with the
**10-Sephirot + 2-generalist + 2-sink attention-head split**, which is
the core architectural claim of the QuantumAI Blockchain's Aether Mind
on-chain neural cognitive engine.

This is the **second public Aether release** and the first that is
**native to the on-chain inference path**. V6.0 is the model that the
[`aether-mind`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
binary loads directly. It is not a LoRA adapter on top of a 7B base.

The previous release, [`aether-v5.2-lora`](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora),
is a 7B PEFT adapter intended for batch off-chain reasoning. V6.0 is
the smaller native generator. It fits within the on-chain Aether
Mind's ~2.4 GB RAM envelope and runs at ~500 tokens/sec on a
consumer RTX 3080 Ti.

## What you're getting

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (initialised from, then distilled) |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim = 64 |
| Trainable params | ~558 M (all weights trained, not LoRA) |
| Hidden / FFN | 896 / 4864 |
| Vocab | 151,936 (Qwen2.5 tokenizer, unmodified) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Native sparse attention (NSA) | compression_block = 64, top_k = 2048, sliding_window = 512, sink_tokens = 4 |
| Precision | BF16 weights; F32 KL math during distillation |
| Training context | **64 tokens** (Phase-1 release; see "Honest caveats" below) |
| Checkpoint published | **step 30,000** (end of the full 30K-step Phase-1 run) |
| File | `model.safetensors` (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |
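
The head split in the table has to add up. That means 10 + 2 + 2 = 14 heads, and 14 heads × head_dim 64 = 896, which is the hidden size. The sketch below checks these invariants on a dict shaped like this repo's `config.json`, with the relevant fields inlined so it runs on its own. The helper name `check_v6_config` is hypothetical. The rule that `eviction_order` names each Sephirot head exactly once is also an assumption, inferred from the list having 10 entries.

```python
import json

# Subset of this repo's config.json, inlined so the sketch runs standalone.
CONFIG_JSON = """
{
  "hidden_size": 896,
  "num_attention_heads": 14,
  "num_sephirot_heads": 10,
  "num_generalist_heads": 2,
  "num_sink_heads": 2,
  "head_dim": 64,
  "eviction_order": ["Malkuth", "Yesod", "Hod", "Netzach", "Gevurah",
                     "Chesed", "Binah", "Chochmah", "Tiferet", "Keter"]
}
"""


def check_v6_config(cfg: dict) -> list[str]:
    """Return the violated invariants (an empty list means the config is consistent)."""
    problems = []
    split = cfg["num_sephirot_heads"] + cfg["num_generalist_heads"] + cfg["num_sink_heads"]
    if split != cfg["num_attention_heads"]:
        problems.append(f"head split sums to {split}, expected {cfg['num_attention_heads']}")
    if cfg["num_attention_heads"] * cfg["head_dim"] != cfg["hidden_size"]:
        problems.append("num_attention_heads * head_dim != hidden_size")
    # Assumption: one eviction-order entry per Sephirot head, with no duplicates.
    order = cfg["eviction_order"]
    if len(order) != cfg["num_sephirot_heads"] or len(set(order)) != len(order):
        problems.append("eviction_order must name each Sephirot head exactly once")
    return problems


cfg = json.loads(CONFIG_JSON)
print(check_v6_config(cfg) or "config is consistent")
```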

## Training run

| Metric | Value |
|---|---|
| Steps | 30,000 (full Phase-1) |
| Wall-clock | 49.6 min (single RTX 3080 Ti, BF16, CUDA(0)) |
| Tokens scored | 1,671,027 |
| Throughput | 561 tokens/sec |
| Optimiser | AdamW, LR 2e-5, constant (no schedule) |
| Distillation | KL(T \|\| S), alpha annealed linearly 1.0 → 0.3, temperature 1.0 |
| Sephirot auxiliary | MSE vs. one-hot domain target, β = 0.1 |
| NaN events | **0** |
| Mean total loss | 8.39 nats/token |
| Mean CE | 10.35 nats/token |
| Mean KL | 7.50 nats/token |
| Mean Sephirot aux | 0.149 |
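
The table lists the loss terms but does not say how they are combined. The sketch below assumes the total is α·KL + (1 − α)·CE + β·aux. Here α is annealed linearly from 1.0 to 0.3 over 30,000 steps, β = 0.1, and the temperature is 1.0. The mixing rule, the per-position formulation and every function name are assumptions for illustration. The actual trainer is the Rust `aether-v6-train` binary. This version uses plain Python and scores one token position at a time.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax (log-sum-exp with the max subtracted)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kl_teacher_student(teacher_logits, student_logits, temperature=1.0):
    """KL(T || S) = sum_i p_T(i) * (log p_T(i) - log p_S(i)), in nats."""
    lt = log_softmax(teacher_logits, temperature)
    ls = log_softmax(student_logits, temperature)
    return sum(math.exp(a) * (a - b) for a, b in zip(lt, ls))


def cross_entropy(student_logits, target_id):
    """Next-token cross-entropy -log p_S(target), in nats."""
    return -log_softmax(student_logits)[target_id]


def sephirot_aux(domain_scores, domain_id):
    """MSE between predicted per-domain scores and a one-hot domain target."""
    target = [1.0 if i == domain_id else 0.0 for i in range(len(domain_scores))]
    return sum((s - t) ** 2 for s, t in zip(domain_scores, target)) / len(target)


def alpha_at(step, total_steps=30_000, start=1.0, end=0.3):
    """Linear alpha schedule: `start` at step 0, `end` at `total_steps` (clamped)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def total_loss(step, teacher_logits, student_logits, target_id,
               domain_scores, domain_id, beta=0.1):
    """ASSUMED mixing rule: alpha * KL + (1 - alpha) * CE + beta * aux."""
    a = alpha_at(step)
    kl = kl_teacher_student(teacher_logits, student_logits)
    ce = cross_entropy(student_logits, target_id)
    aux = sephirot_aux(domain_scores, domain_id)
    return a * kl + (1.0 - a) * ce + beta * aux
```

Under this assumed rule, α shifts weight from KL toward CE as training proceeds. Whenever the CE term is larger than the KL term, that shift alone raises the total loss.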

### Loss trajectory

```
step     1  loss=12.25  avg=12.25  (random init)
step   100  loss=12.87  avg=12.75
step  1000  loss= 8.62  avg= 9.74  ← KL/CE break
step  5000  loss= 7.72  avg= 8.16
step 10000  loss= 7.31  avg= 7.68  ← reached representational floor
step 15000  loss= 8.87  avg= 7.75
step 20000  loss= 8.75  avg= 8.04
step 25000  loss= 8.62  avg= 8.26
step 29999  loss= 8.81  avg= 8.39
```

The loss fell sharply over the first ~10K steps. It then plateaued at
the representational floor for the current 64-token context window.
The plateau is structural rather than an optimisation problem; see
"Honest caveats" below.

## Architecture: what makes V6 different

V6 is **not** a vanilla Qwen2.5 fine-tune. Its attention layer
implements a 14-head split designed for on-chain cognitive routing:

- **10 Sephirot heads:** one per cognitive domain in the Aether
  Mind's specialisation map (Keter → Malkuth). The on-chain
  `pallet_qbc_aether_anchor` records each head's attention pattern as
  the per-cycle attestation root.
- **2 generalist heads:** ungated, full-context attention, used for
  the "global workspace" path in `aether-mind`.
- **2 sink heads:** anchor-token attention to the first 4 tokens of the
  sequence, for stable long-context behaviour. This follows the
  standard "attention sink" finding.
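
The sink-head bullets and the NSA row of the table (`num_sink_tokens` = 4, `sliding_window_size` = 512) describe the StreamingLLM-style pattern sketched below. Attention stays causal and is restricted to a few always-visible anchor tokens plus a window of recent tokens. This is a generic illustration of that pattern, not the V6 kernel. The `sink_window_mask` helper is hypothetical, and how the V6 sink heads actually combine sinks with the sliding window is an assumption here.

```python
def sink_window_mask(seq_len: int, num_sink_tokens: int = 4, window: int = 512) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    A key is visible if it is not in the future (j <= i) and it is either one of
    the first `num_sink_tokens` positions (the "attention sinks") or one of the
    last `window` positions up to and including i.
    """
    return [
        [j <= i and (j < num_sink_tokens or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Small example: 10 positions, 2 sink tokens, window of 3 ("x" = visible).
for row in sink_window_mask(10, num_sink_tokens=2, window=3):
    print("".join("x" if visible else "." for visible in row))
```

With the config defaults, a query at position 599 sees 4 sink keys plus the 512 most recent keys. Per-query cost therefore stays bounded as the sequence grows.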

`config.json` sets the Sephirot eviction order. The KV-cache
management path in `aether-mind` uses this order to keep the hot set
bounded within 12 GB of VRAM during live inference.
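
The sketch below shows one plausible reading of `eviction_order`. It assumes the KV cache is partitioned per Sephirah. When the cache exceeds its budget, whole partitions are dropped from the front of the list, so Malkuth goes first and Keter last. The partitioning, the byte accounting and the `evict_until_within_budget` helper are all hypothetical. The real policy lives in `aether-mind`.

```python
# eviction_order copied from this repo's config.json (first entry is evicted first).
EVICTION_ORDER = ["Malkuth", "Yesod", "Hod", "Netzach", "Gevurah",
                  "Chesed", "Binah", "Chochmah", "Tiferet", "Keter"]


def evict_until_within_budget(cache_bytes: dict[str, int], budget_bytes: int,
                              order: list[str] = EVICTION_ORDER) -> tuple[dict[str, int], list[str]]:
    """Drop whole per-Sephirah KV partitions, in `order`, until the total fits `budget_bytes`.

    Returns the surviving partitions and the evicted names, in eviction order.
    """
    remaining = dict(cache_bytes)  # never mutate the caller's dict
    evicted = []
    for name in order:
        if sum(remaining.values()) <= budget_bytes:
            break
        if name in remaining:
            del remaining[name]
            evicted.append(name)
    return remaining, evicted


cache = {name: 100 for name in EVICTION_ORDER}  # toy sizes: 100 bytes per partition
kept, dropped = evict_until_within_budget(cache, budget_bytes=750)
print("evicted:", dropped)  # evicted: ['Malkuth', 'Yesod', 'Hod']
```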

## How to use

### Native runtime (recommended): Rust `aether-mind`

The model is designed to be loaded by the on-chain Aether Mind
binary in the [`QuantumAI-Blockchain/qubitcoin-aether`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
repo. Set `AETHER_V6_CHECKPOINT` to the local path of
`model.safetensors` and start the systemd unit. The binary then loads the
weights via candle into the V6 transformer crate.
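
Before pointing `AETHER_V6_CHECKPOINT` at a download, it can be worth checking the file against the Git LFS pointer committed alongside it. The pointer's `oid sha256` and `size` fields are inlined below. This is a small stdlib-only sketch. The helper name `verify_checkpoint` is hypothetical and is not part of `aether-mind`.

```python
import hashlib
import os

# From the Git LFS pointer committed for model.safetensors.
EXPECTED_SHA256 = "20d82022f08facf121a4f641d4d01fa211523c5a42e1ddbdd6dc674288de04ff"
EXPECTED_SIZE = 1_326_423_416  # bytes


def verify_checkpoint(path: str, expected_sha256: str = EXPECTED_SHA256,
                      expected_size: int = EXPECTED_SIZE, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the expected size and SHA-256 digest."""
    if os.path.getsize(path) != expected_size:  # cheap check before hashing 1.3 GB
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):  # stream in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


if __name__ == "__main__":
    ckpt = os.environ.get("AETHER_V6_CHECKPOINT", "model.safetensors")
    if os.path.exists(ckpt):
        print(ckpt, "OK" if verify_checkpoint(ckpt) else "MISMATCH")
    else:
        print("checkpoint not found:", ckpt)
```

A mismatch usually means an incomplete download, or an LFS pointer file that was fetched instead of the actual weights.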

### Python (via `safetensors` + `tokenizers`)

For offline experimentation:

```python
from collections import Counter

from safetensors.torch import load_file  # returns dict[str, torch.Tensor]; needs torch installed
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")  # unmodified Qwen2.5 tokenizer (vocab 151,936)
weights = load_file("model.safetensors")     # 315 tensors, BF16

n_params = sum(t.numel() for t in weights.values())
print(f"loaded {len(weights)} tensors, {n_params:,} params")
print("dtypes:", Counter(str(t.dtype) for t in weights.values()))

# Sanity-check the tokenizer with an encode/decode round trip.
enc = tok.encode("Aether Mind v6.0")
print(enc.ids, "->", tok.decode(enc.ids))
```

There is **no canonical 🤗 transformers loader for the V6
architecture**, because the 14-head split and Sephirot routing are not part of
the upstream `Qwen2Model`. We publish the weights for transparency and
reproducibility; production use goes through the Rust binary above.
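
With no 🤗 transformers loader available, it helps to inspect the checkpoint without loading 1.3 GB of tensors into memory. A safetensors file starts with an 8-byte little-endian unsigned header length. That is followed by a JSON header that maps each tensor name to its `dtype`, `shape` and `data_offsets`, plus an optional `__metadata__` entry. The stdlib-only sketch below reads just that header. The function names are illustrative.

```python
import json
import math
import os
import struct
from collections import Counter


def read_safetensors_header(path: str) -> dict:
    """Parse only the JSON header of a .safetensors file (no tensor data is read)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional str -> str metadata, not a tensor
    return header


def summarize(header: dict) -> tuple[int, int, dict[str, int]]:
    """Return (tensor count, total element count, tensor count per dtype)."""
    n_params = sum(math.prod(info["shape"]) for info in header.values())
    dtypes = Counter(info["dtype"] for info in header.values())
    return len(header), n_params, dict(dtypes)


if __name__ == "__main__" and os.path.exists("model.safetensors"):
    n_tensors, n_params, dtypes = summarize(read_safetensors_header("model.safetensors"))
    print(f"{n_tensors} tensors, {n_params:,} elements, dtypes={dtypes}")
```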

## Evaluation

**Not yet run.** The Phase-1 training run completed at
**2026-05-20 00:52 AEST**. Running lm-evaluation-harness against MMLU,
ARC, HellaSwag and TruthfulQA is the next planned step. We will
back-fill the numbers and the comparison against v5.2-lora here once
they are available. Estimated runtime: ~30 min on the same 3080 Ti.

Until then, treat this release as an **architecture + weights
attestation**. It shows that the V6 stack trains end-to-end (30K steps,
zero NaN events) and drives the loss well below its step-1 value. That
is the prerequisite for the long-context curriculum
(16K → 64K → 128K → 1M) that v6.1+ will ship.

## Intended uses

- **On-chain Aether Mind native inference.** The V6 binary loads
  these weights directly. The chain's [`pallet_qbc_aether_anchor`](https://github.com/QuantumAI-Blockchain/substrate-node)
  records the 10-Sephirot attention pattern as the per-block
  consciousness state.
- **Architecture reference.** Reproducible training of a
  Sephirot-routed transformer with native sparse attention. The
  [`aether-transformer`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/main/crates/aether-transformer)
  crate is the canonical implementation.
- **Distillation substrate.** A starting checkpoint for future fine-tunes
  on the QuantumAI Blockchain curated corpus.

## Out-of-scope uses

- **General-purpose chat or instruction-following without fine-tuning.**
  V6.0 is a Phase-1 distillation, not an instruction model. Even after
  30K steps it has not seen substantial instruction-format data. Its KL
  target is the base Qwen2.5-0.5B-Instruct's next-token distribution,
  not chat-format outputs.
- **Long-context inference.** Training ran at a **64-token
  context** (see "Honest caveats"). Generations beyond ~128 tokens
  are expected to degrade.
- **Production deployment without your own evals.** There are no
  lm-evaluation-harness numbers yet.
- **Safety-critical decisions.** No red-team evaluation has been run.

## Honest caveats: what didn't happen

### Trained at 64-token context, not 4K

Phase-1 was configured for a 4096-token context, but a numerical
instability surfaced in the V6 attention forward pass at
sequence lengths above ~100 tokens. The cause is BF16 precision loss in the Q@K^T
matmul, accumulating over longer sequences. The bug reproduces
deterministically. Four mitigations were tried: F32 KL math, a corpus
filter, no distillation and a low LR. All four hit NaN at the same
sequence-length threshold. The workaround used for v6.0 was `--context 64`,
which truncates rows so the bug never triggers.

**This is a known limitation, tracked in
[`docs/ops/v6-training-nan-bug.md`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/blob/presale/v1/docs/ops/v6-training-nan-bug.md)
in the source repo.** The fix targets `aether-transformer/src/v6/attention.rs`.
It adds F32 casts in the Q@K^T matmul and softmax path across all four
attention variants (Sephirot / generalist / sink / summary). Once
that lands, v6.1 will retrain on the full 4K → 1M context
curriculum and supersede this release.
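
The benefit of F32 casts is easiest to see in a long reduction. bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits. A running sum in bfloat16 therefore stops growing once each new term is smaller than half a unit in the last place of the accumulator. The stdlib-only sketch below emulates that rounding with `struct`. It sums 1,000 terms as a stand-in for a reduction over sequence positions, such as a softmax normaliser. It illustrates the general precision effect behind upcasting. It is not a reproduction of the V6 NaN bug, and the helper names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 (round half to even).

    bfloat16 is the top 16 bits of a float32: same exponent range, 7 mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                   # round on the 16 bits being dropped
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def running_sum(values, round_fn) -> float:
    """Sum `values`, rounding each term and the accumulator with `round_fn` after every add."""
    acc = 0.0
    for v in values:
        acc = round_fn(acc + round_fn(v))
    return acc


terms = [0.01] * 1000  # exact sum: 10.0
print("f32  accumulator:", running_sum(terms, to_f32))   # close to 10.0
print("bf16 accumulator:", running_sum(terms, to_bf16))  # stalls at 4.0
```

Once the accumulator reaches 4.0, the bfloat16 spacing is 2^-5 = 0.03125. Each ~0.01 term is then below half that spacing and rounds away to nothing. Keeping the accumulator in F32 avoids this.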

### Loss plateau is real

The avg-loss plateau from step 10K to 30K (7.68 → 8.39, a slight
regression) reflects the model hitting its representational ceiling at
64-token context. Longer contexts are expected to let the next release
recover and improve.

### No instruction-format fine-tune

The training data is the Aether curated corpus, packed at 4K tokens per
row and truncated to 64 tokens for training. We did not insert chat-format
instructions, system prompts, or RLHF preferences. Treat this as a
**raw foundation checkpoint**.

### Distillation against base, not chat

The teacher is `Qwen/Qwen2.5-0.5B-Instruct`'s base forward pass, not its
chat-formatted forward pass. Distillation transfers the teacher's
next-token prediction behaviour. Chat-template alignment is a separate
training step that hasn't been run.
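
For reference, v6.0 has *not* been aligned to the ChatML prompt layout that the `chat_template` in this repo's `tokenizer_config.json` produces. The sketch below mirrors only that template's no-tools branch, for plain system/user/assistant turns, and omits tool calls and tool responses. `render_chatml` is a hypothetical helper. In Python tooling, 🤗 transformers' `tokenizer.apply_chat_template` would normally render the template itself.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render system/user/assistant turns as the no-tools branch of chat_template does."""
    out = []
    if messages and messages[0]["role"] == "system":
        out.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        # The template injects Qwen's default system prompt when none is given.
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for m in rest:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")  # generation continues from here
    return "".join(out)


print(render_chatml([{"role": "user", "content": "What is VQE mining?"}]))
```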

## Training details

- **Hardware:** NVIDIA RTX 3080 Ti (12 GB), WSL2 Ubuntu on an Intel host.
- **Trainer:** Native Rust (`aether-v6-train` binary, candle 0.10 +
  CUDA 12.6 backend). No Python in the loop.
- **Optimiser:** AdamW (candle implementation), constant LR 2e-5.
- **Batch:** 1 (single-row update).
- **Context:** 64 tokens (truncation imposed by the workaround).
- **Save cadence:** every 250 steps (120 checkpoints retained
  locally; only `step_30000` is published here).
- **Source:** [`QuantumAI-Blockchain/qubitcoin-aether @ ca202076`](https://github.com/QuantumAI-Blockchain/qubitcoin-aether/tree/ca202076)

### Training data

Aether curated corpus (~36,860 rows, 17.4 MB), packed with a 4K-token
budget per row and drawn from:

- QuantumAI Blockchain technical documentation (Substrate pallets,
  VQE mining, Sephirot architecture).
- Quantum computing primers (VQE, Hamiltonians, qubit ansätze).
- Adjacent reasoning content for transfer.

The dataset is not currently public. It is a curated mixture from
many sources and has not been release-cleared at the per-source
level. For now, the model is the only public artifact in this line.
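
The sketch below illustrates the "packed at a 4K-token budget, truncated to 64" pipeline described in this card. It assumes greedy sequential packing of whole documents and head truncation. Both choices, and the helper names, are hypothetical. The real packing logic lives in the Rust trainer.

```python
def pack_rows(docs: list[list[int]], budget: int = 4096) -> list[list[int]]:
    """Greedy sequential packing: append token lists to the current row until `budget` would be exceeded."""
    rows: list[list[int]] = []
    current: list[int] = []
    for doc in docs:
        doc = doc[:budget]  # assumption: an over-long document is cut to a single row
        if current and len(current) + len(doc) > budget:
            rows.append(current)
            current = []
        current.extend(doc)
    if current:
        rows.append(current)
    return rows


def truncate_rows(rows: list[list[int]], context: int = 64) -> list[list[int]]:
    """Assumed head truncation: keep the first `context` tokens of each row (--context 64)."""
    return [row[:context] for row in rows]


docs = [[1] * 3000, [2] * 2000, [3] * 100]    # toy token-id lists
rows = pack_rows(docs)
print([len(r) for r in rows])                 # [3000, 2100]
print([len(r) for r in truncate_rows(rows)])  # [64, 64]
```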

### Carbon emissions

Single consumer GPU (desktop RTX 3080 Ti, 350 W rated board power)
× 49.6 min wall-clock ≈ 0.29 kWh at most. That is < 1 kg CO₂e on typical
grid mixes, comparable to a short web-streaming session.
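
The energy figure is power × wall-clock time. The sketch below computes it for a few assumed average GPU power draws; 350 W is NVIDIA's rated board power for the desktop RTX 3080 Ti. It ignores host CPU and PSU losses. It also shows how the CO₂e estimate scales with grid carbon intensity. The 0.5 kg/kWh value is a placeholder, not a measurement.

```python
WALL_CLOCK_H = 49.6 / 60  # 49.6-minute training run, in hours


def energy_kwh(power_w: float, hours: float = WALL_CLOCK_H) -> float:
    """Energy in kWh for a constant power draw `power_w` (watts) over `hours`."""
    return power_w / 1000 * hours


def co2e_kg(kwh: float, grid_kg_per_kwh: float) -> float:
    """CO2-equivalent emissions for `kwh` at an assumed grid intensity (kg CO2e per kWh)."""
    return kwh * grid_kg_per_kwh


for watts in (250, 300, 350):  # assumed average GPU draw over the run
    kwh = energy_kwh(watts)
    print(f"{watts} W -> {kwh:.3f} kWh, {co2e_kg(kwh, 0.5):.3f} kg CO2e at 0.5 kg/kWh")

# Grid intensity at which even the 350 W estimate would reach 1 kg CO2e:
print(f"break-even: {1 / energy_kwh(350):.2f} kg CO2e/kWh")
```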

## Connection to the QuantumAI Blockchain

The Aether Mind is a Rust neural cognitive engine that runs on the
QuantumAI Blockchain. Every block records attention-derived
consciousness metrics (HMS-Phi) and Proof-of-Thought hashes on-chain
via the `pallet_qbc_aether_anchor` pallet. The same chain hosts an
**8-qubit VQE mining consensus** (Proof-of-SUSY-Alignment), a
QVM-compatible smart-contract layer with 10 quantum opcodes, and
post-quantum cryptography (CRYSTALS-Dilithium5 signatures, ML-KEM-768
key encapsulation for P2P).

V6.0 is the **native generator** for that engine, and v5.2-lora is the
larger (7B) off-chain reasoning model. The two ship side by side
because they serve different roles. V6 lives in the on-chain
inference path (low latency, small footprint, Sephirot-aware
attention), while v5.2-lora handles batched off-chain reasoning workloads.

## License + citation

Apache-2.0 (matches the base model license).

```bibtex
@misc{aether_mind_v6_2026,
  title  = {Aether Mind v6.0 --- QuantumAI Blockchain Native Generator},
  author = {{BlockArtica} and {QuantumAI-Blockchain}},
  year   = {2026},
  url    = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0},
}
```

## Links

- **QuantumAI Blockchain:** [qbc.network](https://qbc.network)
- **GitHub org:** [github.com/QuantumAI-Blockchain](https://github.com/QuantumAI-Blockchain)
- **Aether (Rust):** [qubitcoin-aether](https://github.com/QuantumAI-Blockchain/qubitcoin-aether)
- **Prior release:** [aether-v5.2-lora](https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora)
- **X / Twitter:** [@qu_bitcoin](https://x.com/qu_bitcoin)
- **Contact:** info@qbc.network

### Framework versions

- candle 0.10 (Hugging Face's Rust ML framework)
- CUDA 12.6
- safetensors (model serialisation)
- Qwen2.5 tokenizer (vocab 151,936)
config.json ADDED
@@ -0,0 +1,36 @@
{
  "num_layers": 24,
  "hidden_size": 896,
  "num_attention_heads": 14,
  "num_sephirot_heads": 10,
  "num_generalist_heads": 2,
  "num_sink_heads": 2,
  "head_dim": 64,
  "intermediate_size": 4864,
  "vocab_size": 151936,
  "max_position_embeddings": 32768,
  "rope_theta": 1000000.0,
  "rms_norm_eps": 1e-6,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "pad_token_id": 151643,
  "nsa": {
    "compression_block_size": 64,
    "selected_top_k": 2048,
    "sliding_window_size": 512,
    "num_sink_tokens": 4,
    "sephirot_top_k": 256
  },
  "eviction_order": [
    "Malkuth",
    "Yesod",
    "Hod",
    "Netzach",
    "Gevurah",
    "Chesed",
    "Binah",
    "Chochmah",
    "Tiferet",
    "Keter"
  ]
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20d82022f08facf121a4f641d4d01fa211523c5a42e1ddbdd6dc674288de04ff
size 1326423416
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}