Commit ·
766ec4a
0
Parent(s):
Duplicate from g-ntovas/Qwen3.5-0.8B-LiteRT
Browse filesCo-authored-by: John <g-ntovas@users.noreply.huggingface.co>
- .gitattributes +39 -0
- README.md +161 -0
- inference_tflite.py +216 -0
- qwen35_embedder_q8.tflite +3 -0
- qwen35_mm_q8_ekv2048.litertlm +3 -0
- qwen35_mm_q8_ekv2048.tflite +3 -0
- qwen35_vision_adapter_q8.tflite +3 -0
- qwen35_vision_encoder_q8.tflite +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +305 -0
.gitattributes
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
qwen35_q8_ekv2048.litertlm filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
qwen35_mm_q8_ekv2048.litertlm filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
qwen35_mm_q4_block32_ekv4096.litertlm filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
base_model:
|
| 4 |
+
- Qwen/Qwen3.5-0.8B
|
| 5 |
+
pipeline_tag: image-text-to-text
|
| 6 |
+
library_name: litert-lm
|
| 7 |
+
tags:
|
| 8 |
+
- Qwen3.5
|
| 9 |
+
- litert
|
| 10 |
+
- litert-lm
|
| 11 |
+
- tflite
|
| 12 |
+
- on-device
|
| 13 |
+
- hybrid-attention
|
| 14 |
+
- GatedDeltaNet
|
| 15 |
+
- multimodal
|
| 16 |
+
- vision
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
# Qwen3.5-0.8B LiteRT (Multimodal)
|
| 20 |
+
|
| 21 |
+
This repository contains a [LiteRT](https://ai.google.dev/edge/litert) (formerly TFLite) conversion of [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) for on-device inference, packaged in the [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) `.litertlm` format. Includes the **full multimodal pipeline**: language model, vision encoder, and vision adapter for image understanding.
|
| 22 |
+
|
| 23 |
+
## Model Details
|
| 24 |
+
|
| 25 |
+
| Property | Value |
|
| 26 |
+
|----------|-------|
|
| 27 |
+
| **Base Model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
|
| 28 |
+
| **Architecture** | Hybrid attention (GatedDeltaNet + Full Attention) + ViT vision encoder |
|
| 29 |
+
| **Parameters** | 752M (language) + 675M (vision encoder) + 10M (vision adapter) |
|
| 30 |
+
| **Quantization** | Dynamic INT8 |
|
| 31 |
+
| **KV Cache Length** | 2048 |
|
| 32 |
+
| **Prefill Signatures** | 64, 128, 256, 512 |
|
| 33 |
+
| **Vision Signatures** | 256, 576, 1024, 2304 patches |
|
| 34 |
+
| **Format** | `.litertlm` (LiteRT-LM container) |
|
| 35 |
+
|
| 36 |
+
## Architecture
|
| 37 |
+
|
| 38 |
+
### Language Model
|
| 39 |
+
|
| 40 |
+
Qwen3.5-0.8B uses a **hybrid attention** architecture that combines:
|
| 41 |
+
|
| 42 |
+
- **18 GatedDeltaNet layers** (linear attention with recurrent delta rule) at positions 0-2, 4-6, 8-10, 12-14, 16-18, 20-22
|
| 43 |
+
- **6 Full Attention layers** (standard multi-head attention with output gating and partial RoPE) at positions 3, 7, 11, 15, 19, 23
|
| 44 |
+
|
| 45 |
+
### Vision Encoder
|
| 46 |
+
|
| 47 |
+
The vision encoder is a 27-layer Vision Transformer (ViT):
|
| 48 |
+
|
| 49 |
+
- **Patch embedding**: Conv3d (3→1152, kernel=[2,16,16]) with learned position embeddings (bilinear interpolation from 48×48 grid)
|
| 50 |
+
- **27 VisionBlocks**: LayerNorm → Self-Attention (16 heads, head_dim=72, 2D rotary pos emb) → MLP (1152→4304→1152, GELU)
|
| 51 |
+
- **Patch merger** (vision adapter): Groups 4 adjacent patches (spatial_merge_size=2) and projects to language model dimension (4608→1024)
|
| 52 |
+
|
| 53 |
+
The model was **re-authored from scratch** using the LiteRT Generative API. The vision encoder and adapter are exported as separate TFLite models bundled alongside the language model.
|
| 54 |
+
|
| 55 |
+
## Files
|
| 56 |
+
|
| 57 |
+
| File | Size | Description |
|
| 58 |
+
|------|------|-------------|
|
| 59 |
+
| `qwen35_mm_q8_ekv2048.litertlm` | ~1.2 GB | LiteRT-LM bundle (LM + vision encoder + vision adapter + tokenizer) |
|
| 60 |
+
| `qwen35_mm_q8_ekv2048.tflite` | ~757 MB | Language model TFLite |
|
| 61 |
+
| `qwen35_vision_encoder_q8.tflite` | ~88 MB | Vision encoder TFLite |
|
| 62 |
+
| `qwen35_vision_adapter_q8.tflite` | ~12 MB | Vision adapter TFLite |
|
| 63 |
+
| `qwen35_embedder_q8.tflite` | ~245 MB | Text embedder TFLite |
|
| 64 |
+
| `tokenizer.json` | ~11 MB | HuggingFace tokenizer |
|
| 65 |
+
| `tokenizer_config.json` | ~2 KB | Tokenizer configuration |
|
| 66 |
+
|
| 67 |
+
## Signatures
|
| 68 |
+
|
| 69 |
+
### Language Model
|
| 70 |
+
|
| 71 |
+
| Signature | Input Length | Outputs |
|
| 72 |
+
|-----------|-------------|---------|
|
| 73 |
+
| `prefill_64` | 64 tokens | Updated KV cache |
|
| 74 |
+
| `prefill_128` | 128 tokens | Updated KV cache |
|
| 75 |
+
| `prefill_256` | 256 tokens | Updated KV cache |
|
| 76 |
+
| `prefill_512` | 512 tokens | Updated KV cache |
|
| 77 |
+
| `decode` | 1 token | Logits + Updated KV cache |
|
| 78 |
+
|
| 79 |
+
### Vision Encoder
|
| 80 |
+
|
| 81 |
+
| Signature | Patches | Approx. Image Size |
|
| 82 |
+
|-----------|---------|---------------------|
|
| 83 |
+
| `encode_256` | 256 | 256×256 |
|
| 84 |
+
| `encode_576` | 576 | 384×384 |
|
| 85 |
+
| `encode_1024` | 1024 | 512×512 |
|
| 86 |
+
| `encode_2304` | 2304 | 768×768 |
|
| 87 |
+
|
| 88 |
+
### Vision Adapter
|
| 89 |
+
|
| 90 |
+
| Signature | Merged Tokens | From Patches |
|
| 91 |
+
|-----------|---------------|--------------|
|
| 92 |
+
| `adapt_64` | 64 | 256 |
|
| 93 |
+
| `adapt_144` | 144 | 576 |
|
| 94 |
+
| `adapt_256` | 256 | 1024 |
|
| 95 |
+
| `adapt_576` | 576 | 2304 |
|
| 96 |
+
|
| 97 |
+
## Usage
|
| 98 |
+
|
| 99 |
+
### Python (ai-edge-litert)
|
| 100 |
+
|
| 101 |
+
```python
|
| 102 |
+
import numpy as np
|
| 103 |
+
from ai_edge_litert import interpreter as tfl_interpreter
|
| 104 |
+
|
| 105 |
+
# Load model
|
| 106 |
+
interp = tfl_interpreter.Interpreter(model_path="qwen35_mm_q8_ekv2048.tflite")
|
| 107 |
+
interp.allocate_tensors()
|
| 108 |
+
|
| 109 |
+
# Initialize KV cache (24 layers, mixed shapes)
|
| 110 |
+
kv_cache = {} # See inference_tflite.py for full initialization
|
| 111 |
+
|
| 112 |
+
# Prefill
|
| 113 |
+
prefill_runner = interp.get_signature_runner("prefill_64")
|
| 114 |
+
tokens = np.array([[...]], dtype=np.int32) # Padded to 64
|
| 115 |
+
input_pos = np.arange(64, dtype=np.int32)
|
| 116 |
+
output = prefill_runner(tokens=tokens, input_pos=input_pos, **kv_cache)
|
| 117 |
+
|
| 118 |
+
# Decode loop
|
| 119 |
+
decode_runner = interp.get_signature_runner("decode")
|
| 120 |
+
for step in range(max_tokens):
|
| 121 |
+
output = decode_runner(tokens=next_token, input_pos=pos, **kv_cache)
|
| 122 |
+
next_token = np.argmax(output["logits"][0, -1])
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
### Tokenizer
|
| 126 |
+
|
| 127 |
+
```python
|
| 128 |
+
from transformers import AutoTokenizer
|
| 129 |
+
tokenizer = AutoTokenizer.from_pretrained("g-ntovas/Qwen3.5-0.8B-LiteRT")
|
| 130 |
+
```
|
| 131 |
+
|
| 132 |
+
## Conversion Details
|
| 133 |
+
|
| 134 |
+
- **Source**: [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) (multimodal model)
|
| 135 |
+
- **Method**: Custom re-authoring using LiteRT Generative API
|
| 136 |
+
- **Quantization**: Dynamic INT8 (`dynamic_int8`)
|
| 137 |
+
- **Export**: Per-signature tracing with fixed prefill lengths and patch counts
|
| 138 |
+
- **Vision**: Encoder and adapter exported as separate TFLite models, bundled into `.litertlm`
|
| 139 |
+
|
| 140 |
+
## Limitations
|
| 141 |
+
|
| 142 |
+
- Video input is not yet supported (encoder architecture supports it, but the data processor returns UNIMPLEMENTED for video)
|
| 143 |
+
- Prompts are padded to the nearest prefill signature length, which may introduce minor quality differences for the linear attention layers
|
| 144 |
+
- The recurrent GatedDeltaNet implementation may produce slightly different outputs compared to the chunk-based HuggingFace implementation due to floating-point operation ordering
|
| 145 |
+
|
| 146 |
+
## License
|
| 147 |
+
|
| 148 |
+
This model inherits the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) from the original [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) model.
|
| 149 |
+
|
| 150 |
+
## Citation
|
| 151 |
+
|
| 152 |
+
If you use this model, please cite the original Qwen3.5 paper:
|
| 153 |
+
|
| 154 |
+
```bibtex
|
| 155 |
+
@misc{qwen3.5,
|
| 156 |
+
title={Qwen3.5 Technical Report},
|
| 157 |
+
author={Qwen Team},
|
| 158 |
+
year={2026},
|
| 159 |
+
url={https://huggingface.co/Qwen/Qwen3.5-0.8B}
|
| 160 |
+
}
|
| 161 |
+
```
|
inference_tflite.py
ADDED
|
@@ -0,0 +1,216 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Run text generation inference on the exported Qwen3.5-0.8B TFLite model.
|
| 3 |
+
|
| 4 |
+
Usage:
|
| 5 |
+
python inference_tflite.py --model_path output/qwen35_0.8b/qwen35_q8_ekv2048.tflite
|
| 6 |
+
python inference_tflite.py --prompt "Explain gravity" --max_new_tokens 100
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
import argparse
import glob
import logging
import os
import time

import numpy as np
import transformers

from ai_edge_litert import interpreter as tfl_interpreter
|
| 17 |
+
|
| 18 |
+
logging.basicConfig(
|
| 19 |
+
level=logging.INFO,
|
| 20 |
+
format="%(asctime)s [%(levelname)s] %(message)s",
|
| 21 |
+
)
|
| 22 |
+
logger = logging.getLogger(__name__)
|
| 23 |
+
|
| 24 |
+
# Architecture constants (must match qwen35_model.py)
NUM_LAYERS = 24
# Repeating pattern: three GatedDeltaNet ("linear") layers, then one
# full-attention layer, for 24 layers total.
LAYER_TYPES = ["linear", "linear", "linear", "full"] * 6
LINEAR_QKV_DIM = 6144
LINEAR_CONV_KERNEL = 4
LINEAR_NUM_HEADS = 16
LINEAR_K_HEAD_DIM = 128
LINEAR_V_HEAD_DIM = 128
FULL_ATTN_NUM_KV_HEADS = 2
FULL_ATTN_HEAD_DIM = 256

MODEL_ID = "Qwen/Qwen3.5-0.8B"


def create_initial_kv_cache(kv_cache_max_len, batch_size=1):
    """Create zero-initialized KV cache arrays matching the model's per-layer shapes.

    Args:
        kv_cache_max_len: Sequence capacity of the full-attention KV caches.
        batch_size: Leading batch dimension for every cache tensor.

    Returns:
        Dict mapping "kv_cache_k_{i}" / "kv_cache_v_{i}" to float32 zero arrays,
        one pair per layer. Linear (GatedDeltaNet) layers get fixed-size state
        tensors; full-attention layers get sequence-length caches.
    """
    cache = {}
    for layer_idx, layer_type in enumerate(LAYER_TYPES):
        if layer_type == "linear":
            # Linear-attention state: conv window (kernel - 1 past steps) for
            # "k", and a per-head recurrent state matrix for "v".
            k_shape = (batch_size, LINEAR_QKV_DIM, LINEAR_CONV_KERNEL - 1)
            v_shape = (batch_size, LINEAR_NUM_HEADS, LINEAR_K_HEAD_DIM, LINEAR_V_HEAD_DIM)
        else:
            # Standard attention: [batch, seq, kv_heads, head_dim] for both.
            k_shape = (batch_size, kv_cache_max_len, FULL_ATTN_NUM_KV_HEADS, FULL_ATTN_HEAD_DIM)
            v_shape = k_shape
        cache[f"kv_cache_k_{layer_idx}"] = np.zeros(k_shape, dtype=np.float32)
        cache[f"kv_cache_v_{layer_idx}"] = np.zeros(v_shape, dtype=np.float32)
    return cache
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
def find_prefill_signature(signatures, seq_len):
    """Pick the smallest prefill signature whose length fits ``seq_len``.

    Args:
        signatures: Iterable of signature names (e.g. "prefill_64", "decode").
        seq_len: Number of prompt tokens that must fit.

    Returns:
        Tuple of (signature_name, signature_length). Falls back to the largest
        available prefill signature when the prompt exceeds all of them.

    Raises:
        ValueError: If no "prefill_*" signature exists.
    """
    candidates = []
    for name in signatures:
        if name.startswith("prefill_"):
            candidates.append((int(name.split("_")[1]), name))
    if not candidates:
        raise ValueError("No prefill signatures found in model")
    candidates.sort()

    for length, name in candidates:
        if length >= seq_len:
            return name, length

    # Prompt is longer than every signature; use the largest one.
    length, name = candidates[-1]
    return name, length
|
| 87 |
+
|
| 88 |
+
|
| 89 |
+
def generate(model_path, prompt, max_new_tokens, kv_cache_max_len):
    """Run text generation with the TFLite model.

    Args:
        model_path: Path to the exported .tflite language model.
        prompt: Text prompt to complete.
        max_new_tokens: Upper bound on the number of tokens to decode.
        kv_cache_max_len: KV cache length the model was exported with.

    Raises:
        ValueError: If the tokenized prompt is longer than the largest
            available prefill signature.
    """
    # Load tokenizer
    logger.info("Loading tokenizer from: %s", MODEL_ID)
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True
    )

    # Tokenize prompt
    input_ids = tokenizer.encode(prompt)
    logger.info("Prompt: %s", prompt)
    logger.info("Token count: %d", len(input_ids))

    # Load TFLite model
    logger.info("Loading TFLite model from: %s", model_path)
    t0 = time.time()
    interp = tfl_interpreter.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    logger.info("Model loaded in %.1fs", time.time() - t0)

    signatures = interp.get_signature_list()
    logger.info("Available signatures: %s", list(signatures.keys()))

    # Initialize KV cache
    kv_cache = create_initial_kv_cache(kv_cache_max_len)

    # --- Prefill phase ---
    sig_name, sig_len = find_prefill_signature(signatures, len(input_ids))
    if len(input_ids) > sig_len:
        # find_prefill_signature falls back to the largest signature when the
        # prompt exceeds all of them. Feeding more tokens than the signature
        # accepts would previously produce a negative pad count and fail with
        # an opaque shape error inside the interpreter — fail early instead.
        raise ValueError(
            f"Prompt has {len(input_ids)} tokens but the largest prefill "
            f"signature only accepts {sig_len}"
        )
    logger.info("Using prefill signature: %s (padding %d -> %d)", sig_name, len(input_ids), sig_len)

    # Pad input with zeros to match the fixed signature length.
    padded_ids = input_ids + [0] * (sig_len - len(input_ids))
    tokens = np.array([padded_ids], dtype=np.int32)
    input_pos = np.arange(sig_len, dtype=np.int32)

    prefill_runner = interp.get_signature_runner(sig_name)
    t0 = time.time()
    prefill_out = prefill_runner(tokens=tokens, input_pos=input_pos, **kv_cache)
    prefill_time = time.time() - t0
    logger.info("Prefill done in %.2fs", prefill_time)

    # Update KV cache from prefill output
    for key in kv_cache:
        if key in prefill_out:
            kv_cache[key] = prefill_out[key]

    # --- Decode phase ---
    # Prefill processed sig_len tokens (including padding). Next decode
    # position is sig_len. We feed the last real token to get the first
    # generated token.
    decode_runner = interp.get_signature_runner("decode")
    generated_ids = list(input_ids)
    current_pos = sig_len  # continue after prefill

    logger.info("Starting decode (max %d tokens)...", max_new_tokens)
    print(f"\n--- Generated text ---\n{prompt}", end="", flush=True)

    t0 = time.time()
    for step in range(max_new_tokens):
        # Feed last token, get next
        tok = np.array([[generated_ids[-1]]], dtype=np.int32)
        pos = np.array([current_pos], dtype=np.int32)
        decode_out = decode_runner(tokens=tok, input_pos=pos, **kv_cache)

        # Update KV cache
        for key in kv_cache:
            if key in decode_out:
                kv_cache[key] = decode_out[key]

        # Greedy decoding: argmax over the logits of the last position.
        next_token = int(np.argmax(decode_out["logits"][0, -1]))
        generated_ids.append(next_token)
        current_pos += 1

        # Print token
        word = tokenizer.decode([next_token])
        print(word, end="", flush=True)

        # Stop on EOS
        if next_token == tokenizer.eos_token_id:
            break

    decode_time = time.time() - t0
    num_decoded = len(generated_ids) - len(input_ids)
    print(f"\n\n--- Stats ---")
    print(f"Prefill: {prefill_time:.2f}s ({len(input_ids)} tokens)")
    print(f"Decode: {decode_time:.2f}s ({num_decoded} tokens, {num_decoded/decode_time:.1f} tok/s)")
|
| 175 |
+
|
| 176 |
+
|
| 177 |
+
def main():
    """Parse CLI arguments and run generation, auto-discovering a model if needed."""
    parser = argparse.ArgumentParser(description="TFLite inference for Qwen3.5-0.8B")
    parser.add_argument(
        "--model_path",
        default=None,
        help="Path to .tflite model file",
    )
    parser.add_argument(
        "--prompt",
        default="What is the meaning of life?",
        help="Input prompt",
    )
    parser.add_argument(
        "--max_new_tokens",
        type=int,
        default=50,
        help="Maximum tokens to generate",
    )
    parser.add_argument(
        "--kv_cache_max_len",
        type=int,
        default=2048,
        help="KV cache max length (must match exported model)",
    )
    args = parser.parse_args()

    # Auto-find model if not specified: pick the most recently modified
    # .tflite under output/. (Was `__import__("os")` inline; use the
    # module-level `import os` instead.)
    if args.model_path is None:
        files = glob.glob("output/**/*.tflite", recursive=True)
        if files:
            args.model_path = max(files, key=os.path.getmtime)
            logger.info("Auto-found model: %s", args.model_path)
        else:
            raise FileNotFoundError("No .tflite files found in output/")

    generate(args.model_path, args.prompt, args.max_new_tokens, args.kv_cache_max_len)


if __name__ == "__main__":
    main()
|
qwen35_embedder_q8.tflite
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0a3cc2102f1c345110215d23bc6963a1369c358d8ef91fe4f295b8606dc1df27
|
| 3 |
+
size 257260872
|
qwen35_mm_q8_ekv2048.litertlm
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:92999fe4a9242c983e99892d6e57f368e8cd7a4534bc9a383a9551155b7f70a5
|
| 3 |
+
size 1159757824
|
qwen35_mm_q8_ekv2048.tflite
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2a59a2b85cf06e1245a5ea4a0b0e1e0b0348de8c803e8c806dac42951a3035ed
|
| 3 |
+
size 793905384
|
qwen35_vision_adapter_q8.tflite
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b1032a0082a74a38c8d1a56e024c2f596de48973c18bf54aeed1acff2e11d1a4
|
| 3 |
+
size 12662960
|
qwen35_vision_encoder_q8.tflite
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2e58590d0a610d399438223c854192ac3ccbfc98b0bd57f0aedb84ddae17540a
|
| 3 |
+
size 92250944
|
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5f9e4d4901a92b997e463c1f46055088b6cca5ca61a6522d1b9f64c4bb81cb42
|
| 3 |
+
size 12807982
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,305 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"add_prefix_space": false,
|
| 3 |
+
"added_tokens_decoder": {
|
| 4 |
+
"248044": {
|
| 5 |
+
"content": "<|endoftext|>",
|
| 6 |
+
"lstrip": false,
|
| 7 |
+
"normalized": false,
|
| 8 |
+
"rstrip": false,
|
| 9 |
+
"single_word": false,
|
| 10 |
+
"special": true
|
| 11 |
+
},
|
| 12 |
+
"248045": {
|
| 13 |
+
"content": "<|im_start|>",
|
| 14 |
+
"lstrip": false,
|
| 15 |
+
"normalized": false,
|
| 16 |
+
"rstrip": false,
|
| 17 |
+
"single_word": false,
|
| 18 |
+
"special": true
|
| 19 |
+
},
|
| 20 |
+
"248046": {
|
| 21 |
+
"content": "<|im_end|>",
|
| 22 |
+
"lstrip": false,
|
| 23 |
+
"normalized": false,
|
| 24 |
+
"rstrip": false,
|
| 25 |
+
"single_word": false,
|
| 26 |
+
"special": true
|
| 27 |
+
},
|
| 28 |
+
"248047": {
|
| 29 |
+
"content": "<|object_ref_start|>",
|
| 30 |
+
"lstrip": false,
|
| 31 |
+
"normalized": false,
|
| 32 |
+
"rstrip": false,
|
| 33 |
+
"single_word": false,
|
| 34 |
+
"special": true
|
| 35 |
+
},
|
| 36 |
+
"248048": {
|
| 37 |
+
"content": "<|object_ref_end|>",
|
| 38 |
+
"lstrip": false,
|
| 39 |
+
"normalized": false,
|
| 40 |
+
"rstrip": false,
|
| 41 |
+
"single_word": false,
|
| 42 |
+
"special": true
|
| 43 |
+
},
|
| 44 |
+
"248049": {
|
| 45 |
+
"content": "<|box_start|>",
|
| 46 |
+
"lstrip": false,
|
| 47 |
+
"normalized": false,
|
| 48 |
+
"rstrip": false,
|
| 49 |
+
"single_word": false,
|
| 50 |
+
"special": true
|
| 51 |
+
},
|
| 52 |
+
"248050": {
|
| 53 |
+
"content": "<|box_end|>",
|
| 54 |
+
"lstrip": false,
|
| 55 |
+
"normalized": false,
|
| 56 |
+
"rstrip": false,
|
| 57 |
+
"single_word": false,
|
| 58 |
+
"special": true
|
| 59 |
+
},
|
| 60 |
+
"248051": {
|
| 61 |
+
"content": "<|quad_start|>",
|
| 62 |
+
"lstrip": false,
|
| 63 |
+
"normalized": false,
|
| 64 |
+
"rstrip": false,
|
| 65 |
+
"single_word": false,
|
| 66 |
+
"special": true
|
| 67 |
+
},
|
| 68 |
+
"248052": {
|
| 69 |
+
"content": "<|quad_end|>",
|
| 70 |
+
"lstrip": false,
|
| 71 |
+
"normalized": false,
|
| 72 |
+
"rstrip": false,
|
| 73 |
+
"single_word": false,
|
| 74 |
+
"special": true
|
| 75 |
+
},
|
| 76 |
+
"248053": {
|
| 77 |
+
"content": "<|vision_start|>",
|
| 78 |
+
"lstrip": false,
|
| 79 |
+
"normalized": false,
|
| 80 |
+
"rstrip": false,
|
| 81 |
+
"single_word": false,
|
| 82 |
+
"special": true
|
| 83 |
+
},
|
| 84 |
+
"248054": {
|
| 85 |
+
"content": "<|vision_end|>",
|
| 86 |
+
"lstrip": false,
|
| 87 |
+
"normalized": false,
|
| 88 |
+
"rstrip": false,
|
| 89 |
+
"single_word": false,
|
| 90 |
+
"special": true
|
| 91 |
+
},
|
| 92 |
+
"248055": {
|
| 93 |
+
"content": "<|vision_pad|>",
|
| 94 |
+
"lstrip": false,
|
| 95 |
+
"normalized": false,
|
| 96 |
+
"rstrip": false,
|
| 97 |
+
"single_word": false,
|
| 98 |
+
"special": true
|
| 99 |
+
},
|
| 100 |
+
"248056": {
|
| 101 |
+
"content": "<|image_pad|>",
|
| 102 |
+
"lstrip": false,
|
| 103 |
+
"normalized": false,
|
| 104 |
+
"rstrip": false,
|
| 105 |
+
"single_word": false,
|
| 106 |
+
"special": true
|
| 107 |
+
},
|
| 108 |
+
"248057": {
|
| 109 |
+
"content": "<|video_pad|>",
|
| 110 |
+
"lstrip": false,
|
| 111 |
+
"normalized": false,
|
| 112 |
+
"rstrip": false,
|
| 113 |
+
"single_word": false,
|
| 114 |
+
"special": true
|
| 115 |
+
},
|
| 116 |
+
"248058": {
|
| 117 |
+
"content": "<tool_call>",
|
| 118 |
+
"lstrip": false,
|
| 119 |
+
"normalized": false,
|
| 120 |
+
"rstrip": false,
|
| 121 |
+
"single_word": false,
|
| 122 |
+
"special": false
|
| 123 |
+
},
|
| 124 |
+
"248059": {
|
| 125 |
+
"content": "</tool_call>",
|
| 126 |
+
"lstrip": false,
|
| 127 |
+
"normalized": false,
|
| 128 |
+
"rstrip": false,
|
| 129 |
+
"single_word": false,
|
| 130 |
+
"special": false
|
| 131 |
+
},
|
| 132 |
+
"248060": {
|
| 133 |
+
"content": "<|fim_prefix|>",
|
| 134 |
+
"lstrip": false,
|
| 135 |
+
"normalized": false,
|
| 136 |
+
"rstrip": false,
|
| 137 |
+
"single_word": false,
|
| 138 |
+
"special": false
|
| 139 |
+
},
|
| 140 |
+
"248061": {
|
| 141 |
+
"content": "<|fim_middle|>",
|
| 142 |
+
"lstrip": false,
|
| 143 |
+
"normalized": false,
|
| 144 |
+
"rstrip": false,
|
| 145 |
+
"single_word": false,
|
| 146 |
+
"special": false
|
| 147 |
+
},
|
| 148 |
+
"248062": {
|
| 149 |
+
"content": "<|fim_suffix|>",
|
| 150 |
+
"lstrip": false,
|
| 151 |
+
"normalized": false,
|
| 152 |
+
"rstrip": false,
|
| 153 |
+
"single_word": false,
|
| 154 |
+
"special": false
|
| 155 |
+
},
|
| 156 |
+
"248063": {
|
| 157 |
+
"content": "<|fim_pad|>",
|
| 158 |
+
"lstrip": false,
|
| 159 |
+
"normalized": false,
|
| 160 |
+
"rstrip": false,
|
| 161 |
+
"single_word": false,
|
| 162 |
+
"special": false
|
| 163 |
+
},
|
| 164 |
+
"248064": {
|
| 165 |
+
"content": "<|repo_name|>",
|
| 166 |
+
"lstrip": false,
|
| 167 |
+
"normalized": false,
|
| 168 |
+
"rstrip": false,
|
| 169 |
+
"single_word": false,
|
| 170 |
+
"special": false
|
| 171 |
+
},
|
| 172 |
+
"248065": {
|
| 173 |
+
"content": "<|file_sep|>",
|
| 174 |
+
"lstrip": false,
|
| 175 |
+
"normalized": false,
|
| 176 |
+
"rstrip": false,
|
| 177 |
+
"single_word": false,
|
| 178 |
+
"special": false
|
| 179 |
+
},
|
| 180 |
+
"248066": {
|
| 181 |
+
"content": "<tool_response>",
|
| 182 |
+
"lstrip": false,
|
| 183 |
+
"normalized": false,
|
| 184 |
+
"rstrip": false,
|
| 185 |
+
"single_word": false,
|
| 186 |
+
"special": false
|
| 187 |
+
},
|
| 188 |
+
"248067": {
|
| 189 |
+
"content": "</tool_response>",
|
| 190 |
+
"lstrip": false,
|
| 191 |
+
"normalized": false,
|
| 192 |
+
"rstrip": false,
|
| 193 |
+
"single_word": false,
|
| 194 |
+
"special": false
|
| 195 |
+
},
|
| 196 |
+
"248068": {
|
| 197 |
+
"content": "<think>",
|
| 198 |
+
"lstrip": false,
|
| 199 |
+
"normalized": false,
|
| 200 |
+
"rstrip": false,
|
| 201 |
+
"single_word": false,
|
| 202 |
+
"special": false
|
| 203 |
+
},
|
| 204 |
+
"248069": {
|
| 205 |
+
"content": "</think>",
|
| 206 |
+
"lstrip": false,
|
| 207 |
+
"normalized": false,
|
| 208 |
+
"rstrip": false,
|
| 209 |
+
"single_word": false,
|
| 210 |
+
"special": false
|
| 211 |
+
},
|
| 212 |
+
"248070": {
|
| 213 |
+
"content": "<|audio_start|>",
|
| 214 |
+
"lstrip": false,
|
| 215 |
+
"normalized": false,
|
| 216 |
+
"rstrip": false,
|
| 217 |
+
"single_word": false,
|
| 218 |
+
"special": true
|
| 219 |
+
},
|
| 220 |
+
"248071": {
|
| 221 |
+
"content": "<|audio_end|>",
|
| 222 |
+
"lstrip": false,
|
| 223 |
+
"normalized": false,
|
| 224 |
+
"rstrip": false,
|
| 225 |
+
"single_word": false,
|
| 226 |
+
"special": true
|
| 227 |
+
},
|
| 228 |
+
"248072": {
|
| 229 |
+
"content": "<tts_pad>",
|
| 230 |
+
"lstrip": false,
|
| 231 |
+
"normalized": false,
|
| 232 |
+
"rstrip": false,
|
| 233 |
+
"single_word": false,
|
| 234 |
+
"special": true
|
| 235 |
+
},
|
| 236 |
+
"248073": {
|
| 237 |
+
"content": "<tts_text_bos>",
|
| 238 |
+
"lstrip": false,
|
| 239 |
+
"normalized": false,
|
| 240 |
+
"rstrip": false,
|
| 241 |
+
"single_word": false,
|
| 242 |
+
"special": true
|
| 243 |
+
},
|
| 244 |
+
"248074": {
|
| 245 |
+
"content": "<tts_text_eod>",
|
| 246 |
+
"lstrip": false,
|
| 247 |
+
"normalized": false,
|
| 248 |
+
"rstrip": false,
|
| 249 |
+
"single_word": false,
|
| 250 |
+
"special": true
|
| 251 |
+
},
|
| 252 |
+
"248075": {
|
| 253 |
+
"content": "<tts_text_bos_single>",
|
| 254 |
+
"lstrip": false,
|
| 255 |
+
"normalized": false,
|
| 256 |
+
"rstrip": false,
|
| 257 |
+
"single_word": false,
|
| 258 |
+
"special": true
|
| 259 |
+
},
|
| 260 |
+
"248076": {
|
| 261 |
+
"content": "<|audio_pad|>",
|
| 262 |
+
"lstrip": false,
|
| 263 |
+
"normalized": false,
|
| 264 |
+
"rstrip": false,
|
| 265 |
+
"single_word": false,
|
| 266 |
+
"special": true
|
| 267 |
+
}
|
| 268 |
+
},
|
| 269 |
+
"additional_special_tokens": [
|
| 270 |
+
"<|im_start|>",
|
| 271 |
+
"<|im_end|>",
|
| 272 |
+
"<|object_ref_start|>",
|
| 273 |
+
"<|object_ref_end|>",
|
| 274 |
+
"<|box_start|>",
|
| 275 |
+
"<|box_end|>",
|
| 276 |
+
"<|quad_start|>",
|
| 277 |
+
"<|quad_end|>",
|
| 278 |
+
"<|vision_start|>",
|
| 279 |
+
"<|vision_end|>",
|
| 280 |
+
"<|vision_pad|>",
|
| 281 |
+
"<|image_pad|>",
|
| 282 |
+
"<|video_pad|>"
|
| 283 |
+
],
|
| 284 |
+
"bos_token": null,
|
| 285 |
+
"chat_template": "{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set image_count.value = image_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Picture ' ~ image_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|image_pad|><|vision_end|>' }}\n {%- elif 'video' in item or item.type == 'video' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain videos.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set video_count.value = video_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Video ' ~ video_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|video_pad|><|vision_end|>' }}\n {%- elif 'text' in item %}\n {{- item.text }}\n {%- else %}\n {{- raise_exception('Unexpected item type in content.') }}\n {%- endif %}\n {%- endfor %}\n {%- elif content is none or content is undefined %}\n {{- '' }}\n {%- else %}\n {{- raise_exception('Unexpected content type.') }}\n {%- endif %}\n{%- endmacro %}\n{%- if not messages %}\n {{- raise_exception('No messages provided.') }}\n{%- endif %}\n{%- if tools and tools is iterable and tools is not mapping %}\n {{- '<|im_start|>system\\n' }}\n {{- \"# Tools\\n\\nYou have access to the following functions:\\n\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\" }}\n {{- '\\n\\nIf you choose to call a function ONLY reply in the following format with NO 
suffix:\\n\\n<tool_call>\\n<function=example_function_name>\\n<parameter=example_parameter_1>\\nvalue_1\\n</parameter>\\n<parameter=example_parameter_2>\\nThis is the value for the second parameter\\nthat can span\\nmultiple lines\\n</parameter>\\n</function>\\n</tool_call>\\n\\n<IMPORTANT>\\nReminder:\\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\\n- Required parameters MUST be specified\\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\\n</IMPORTANT>' }}\n {%- if messages[0].role == 'system' %}\n {%- set content = render_content(messages[0].content, false, true)|trim %}\n {%- if content %}\n {{- '\\n\\n' + content }}\n {%- endif %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {%- set content = render_content(messages[0].content, false, true)|trim %}\n {{- '<|im_start|>system\\n' + content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" %}\n {%- set content = render_content(message.content, false)|trim %}\n {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if ns.multi_step_tool %}\n {{- raise_exception('No user query found in messages.') }}\n{%- endif %}\n{%- for message in messages %}\n {%- set content = render_content(message.content, true)|trim %}\n {%- if message.role == \"system\" 
%}\n {%- if not loop.first %}\n {{- raise_exception('System message must be at the beginning.') }}\n {%- endif %}\n {%- elif message.role == \"user\" %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- set reasoning_content = reasoning_content|trim %}\n {%- if loop.index0 > ns.last_query_index %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content + '\\n</think>\\n\\n' + content }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {%- if loop.first %}\n {%- if content|trim %}\n {{- '\\n\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- else %}\n {{- '<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- else %}\n {{- '\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- if tool_call.arguments is defined %}\n {%- for args_name, args_value in tool_call.arguments|items %}\n {{- '<parameter=' + args_name + '>\\n' }}\n {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}\n {{- args_value }}\n {{- '\\n</parameter>\\n' }}\n {%- endfor %}\n {%- endif %}\n {{- '</function>\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- 
'<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.previtem and loop.previtem.role != \"tool\" %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if not loop.last and loop.nextitem.role != \"tool\" %}\n {{- '<|im_end|>\\n' }}\n {%- elif loop.last %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- else %}\n {{- raise_exception('Unexpected message role.') }}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is true %}\n {{- '<think>\\n' }}\n {%- else %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
|
| 286 |
+
"clean_up_tokenization_spaces": false,
|
| 287 |
+
"eos_token": "<|im_end|>",
|
| 288 |
+
"errors": "replace",
|
| 289 |
+
"model_max_length": 262144,
|
| 290 |
+
"pad_token": "<|endoftext|>",
|
| 291 |
+
"split_special_tokens": false,
|
| 292 |
+
"tokenizer_class": "Qwen2Tokenizer",
|
| 293 |
+
"unk_token": null,
|
| 294 |
+
"add_bos_token": false,
|
| 295 |
+
"pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
|
| 296 |
+
"extra_special_tokens": {
|
| 297 |
+
"audio_bos_token": "<|audio_start|>",
|
| 298 |
+
"audio_eos_token": "<|audio_end|>",
|
| 299 |
+
"audio_token": "<|audio_pad|>",
|
| 300 |
+
"image_token": "<|image_pad|>",
|
| 301 |
+
"video_token": "<|video_pad|>",
|
| 302 |
+
"vision_bos_token": "<|vision_start|>",
|
| 303 |
+
"vision_eos_token": "<|vision_end|>"
|
| 304 |
+
}
|
| 305 |
+
}
|