Initial upload of God-tts-v1 (Qwen3-TTS 1.7B snapshot with unique safetensors header)

Vocence TTS miner snapshot.

model.safetensors header re-stamped with model_id=God-tts-v1 / build_tag=god-v1-2026-05-11
so it diverges from any sibling snapshot's header hash, while the tensor payload
remains bit-identical to the base Qwen3-TTS-12Hz-1.7B-CustomVoice fine-tune.
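
A minimal sketch of why a metadata re-stamp changes the header hash but not the tensor bytes. It relies only on the published safetensors layout (8-byte little-endian header length, JSON header, raw payload, with `data_offsets` relative to the payload start). The toy tensor `w` and the helper names are hypothetical, not the tooling used for this commit.

```python
from __future__ import annotations

import hashlib
import json
import struct


def build_safetensors(header: dict, payload: bytes) -> bytes:
    """Serialize: u64 little-endian header length, JSON header, raw payload."""
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def split_safetensors(blob: bytes) -> tuple[dict, bytes, bytes]:
    """Return (parsed header, raw header bytes, payload bytes)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header_bytes = blob[8 : 8 + header_len]
    return json.loads(header_bytes), header_bytes, blob[8 + header_len :]


def restamp(blob: bytes, metadata: dict[str, str]) -> bytes:
    """Replace the __metadata__ entry; the tensor payload is copied as-is."""
    header, _, payload = split_safetensors(blob)
    header["__metadata__"] = dict(metadata)
    return build_safetensors(header, payload)


# Toy file with a single 4-byte tensor (hypothetical stand-in for a checkpoint).
payload = bytes([1, 2, 3, 4])
original = build_safetensors(
    {"w": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]}}, payload
)
stamped = restamp(
    original, {"model_id": "God-tts-v1", "build_tag": "god-v1-2026-05-11"}
)

_, header_a, payload_a = split_safetensors(original)
_, header_b, payload_b = split_safetensors(stamped)
header_hash_differs = (
    hashlib.sha256(header_a).digest() != hashlib.sha256(header_b).digest()
)
payload_identical = (
    hashlib.sha256(payload_a).digest() == hashlib.sha256(payload_b).digest()
)
```

Because the offsets are payload-relative, growing the header does not require rewriting any tensor entry. The whole-file sha256 (the LFS oid) still changes.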

training_state.pt (optimizer state, 11.5 GB) intentionally omitted; chute inference does not need it.

Files changed (14)

README.md +88 -0
chute_config.yml +23 -0
config.json +165 -0
generation_config.json +12 -0
merges.txt +0 -0
miner.py +158 -0
model.safetensors +3 -0
preprocessor_config.json +6 -0
speech_tokenizer/config.json +94 -0
speech_tokenizer/configuration.json +1 -0
speech_tokenizer/model.safetensors +3 -0
speech_tokenizer/preprocessor_config.json +10 -0
tokenizer_config.json +316 -0
trainer_state.json +10 -0

README.md ADDED

	@@ -0,0 +1,88 @@

+---
+license: apache-2.0
+pipeline_tag: text-to-speech
+library_name: qwen-tts
+tags:
+- audio
+- tts
+- qwen
+- multilingual
+---
+# Qwen3-TTS
+<br>
+<p align="center">
+    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/qwen3_tts_logo.png" width="400"/>
+</p>
+<p align="center">
+&nbsp&nbsp🤗 <a href="https://huggingface.co/collections/Qwen/qwen3-tts">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/collections/Qwen/Qwen3-TTS">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp📑 <a href="https://qwen.ai/blog?id=qwen3tts-0115">Blog</a>&nbsp&nbsp | &nbsp&nbsp📑 <a href="https://huggingface.co/papers/2601.15621">Paper</a>&nbsp&nbsp | &nbsp&nbsp💻 <a href="https://github.com/QwenLM/Qwen3-TTS">GitHub</a>
+</p>
+We release **Qwen3-TTS**, a series of powerful speech generation models developed by Qwen, offering comprehensive support for voice cloning, voice design, ultra-high-quality human-like speech generation, and natural language-based voice control.
+## Overview
+Qwen3-TTS covers 10 major languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian) as well as multiple dialectal voice profiles. Key features:
+* **Powerful Speech Representation**: Powered by the self-developed Qwen3-TTS-Tokenizer-12Hz, it achieves efficient acoustic compression and high-dimensional semantic modeling.
+* **Universal End-to-End Architecture**: Utilizing a discrete multi-codebook LM architecture to bypass traditional information bottlenecks.
+* **Extreme Low-Latency Streaming Generation**: Supports streaming generation with end-to-end synthesis latency as low as 97ms.
+* **Intelligent Voice Control**: Supports speech generation driven by natural language instructions for flexible control over timbre, emotion, and prosody.
+## Quickstart
+### Environment Setup
+Install the `qwen-tts` Python package from PyPI:
+```bash
+pip install -U qwen-tts
+```
+### Python Package Usage
+```python
+import torch
+import soundfile as sf
+from qwen_tts import Qwen3TTSModel
+# Load the model
+model = Qwen3TTSModel.from_pretrained(
+    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
+    device_map="cuda:0",
+    dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
+)
+# Custom Voice Generation
+wavs, sr = model.generate_custom_voice(
+    text="其实我真的有发现，我是一个特别善于观察别人情绪的人。",
+    language="Chinese",
+    speaker="Vivian",
+    instruct="用特别愤怒的语气说",
+)
+sf.write("output.wav", wavs[0], sr)
+```
+## Evaluation
+Zero-shot speech generation on the Seed-TTS test set, reported as Word Error Rate (WER, lower is better):
+| Model | test-zh | test-en |
+|---|---|---|
+| Qwen3-TTS-12Hz-1.7B-Base | 0.77 | 1.24 |
+## Citation
+If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝:
+```BibTeX
+@article{Qwen3-TTS,
+  title={Qwen3-TTS Technical Report},
+  author={Hangrui Hu and Xinfa Zhu and Ting He and Dake Guo and Bin Zhang and Xiong Wang and Zhifang Guo and Ziyue Jiang and Hongkun Hao and Zishan Guo and Xinyu Zhang and Pei Zhang and Baosong Yang and Jin Xu and Jingren Zhou and Junyang Lin},
+  journal={arXiv preprint arXiv:2601.15621},
+  year={2026}
+}
+```

chute_config.yml ADDED

	@@ -0,0 +1,23 @@

+# Image + node + Chute for Vocence deploy. Required in the HF repo at build time.
+Image:
+  from_base: parachutes/python:3.12
+  run_command:
+    - pip install torch torchaudio transformers accelerate huggingface_hub pyyaml soundfile librosa
+    - pip install -U qwen-tts
+  set_workdir: /app
+NodeSelector:
+  gpu_count: 1
+  min_vram_gb_per_gpu: 24
+  include: ["pro_6000"]
+  exclude: []
+Chute:
+  tagline: Vocence TTS — Qwen3 PromptTTS (weights in repo)
+  readme: Qwen3 12Hz TTS snapshot + miner.py for Vocence
+  shutdown_after_seconds: 86400
+  concurrency: 1
+  max_instances: 1
+  scaling_threshold: 0.5
+  tee: true

config.json ADDED

	@@ -0,0 +1,165 @@

+{
+  "architectures": [
+    "Qwen3TTSForConditionalGeneration"
+  ],
+  "assistant_token_id": 77091,
+  "im_end_token_id": 151645,
+  "im_start_token_id": 151644,
+  "tts_bos_token_id": 151672,
+  "tts_eos_token_id": 151673,
+  "tts_pad_token_id": 151671,
+  "model_type": "qwen3_tts",
+  "tokenizer_type": "qwen3_tts_tokenizer_12hz",
+  "tts_model_size": "1b7",
+  "tts_model_type": "voice_design",
+  "talker_config": {
+    "attention_bias": false,
+    "attention_dropout": 0,
+    "code_predictor_config": {
+      "_name_or_path": "",
+      "add_cross_attention": false,
+      "architectures": null,
+      "attention_bias": false,
+      "attention_dropout": 0,
+      "bad_words_ids": null,
+      "begin_suppress_tokens": null,
+      "bos_token_id": null,
+      "chunk_size_feed_forward": 0,
+      "cross_attention_hidden_size": null,
+      "decoder_start_token_id": null,
+      "diversity_penalty": 0.0,
+      "do_sample": false,
+      "early_stopping": false,
+      "encoder_no_repeat_ngram_size": 0,
+      "eos_token_id": null,
+      "exponential_decay_length_penalty": null,
+      "finetuning_task": null,
+      "forced_bos_token_id": null,
+      "forced_eos_token_id": null,
+      "head_dim": 128,
+      "hidden_act": "silu",
+      "hidden_size": 1024,
+      "id2label": {
+        "0": "LABEL_0",
+        "1": "LABEL_1"
+      },
+      "initializer_range": 0.02,
+      "intermediate_size": 3072,
+      "is_decoder": false,
+      "is_encoder_decoder": false,
+      "label2id": {
+        "LABEL_0": 0,
+        "LABEL_1": 1
+      },
+      "layer_types": [
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention"
+      ],
+      "length_penalty": 1.0,
+      "max_length": 20,
+      "max_position_embeddings": 65536,
+      "max_window_layers": 28,
+      "min_length": 0,
+      "model_type": "qwen3_tts_talker_code_predictor",
+      "no_repeat_ngram_size": 0,
+      "num_attention_heads": 16,
+      "num_beam_groups": 1,
+      "num_beams": 1,
+      "num_code_groups": 16,
+      "num_hidden_layers": 5,
+      "num_key_value_heads": 8,
+      "num_return_sequences": 1,
+      "output_attentions": false,
+      "output_hidden_states": false,
+      "output_scores": false,
+      "pad_token_id": null,
+      "prefix": null,
+      "problem_type": null,
+      "pruned_heads": {},
+      "remove_invalid_values": false,
+      "repetition_penalty": 1.0,
+      "return_dict": true,
+      "return_dict_in_generate": false,
+      "rms_norm_eps": 1e-06,
+      "rope_scaling": null,
+      "rope_theta": 1000000,
+      "sep_token_id": null,
+      "sliding_window": null,
+      "suppress_tokens": null,
+      "task_specific_params": null,
+      "temperature": 1.0,
+      "tf_legacy_loss": false,
+      "tie_encoder_decoder": false,
+      "tie_word_embeddings": false,
+      "tokenizer_class": null,
+      "top_k": 50,
+      "top_p": 1.0,
+      "dtype": null,
+      "torchscript": false,
+      "typical_p": 1.0,
+      "use_bfloat16": false,
+      "use_cache": true,
+      "use_sliding_window": false,
+      "vocab_size": 2048
+    },
+    "codec_bos_id": 2149,
+    "codec_eos_token_id": 2150,
+    "codec_think_id": 2154,
+    "codec_language_id": {
+      "chinese": 2055,
+      "english": 2050,
+      "german": 2053,
+      "italian": 2070,
+      "portuguese": 2071,
+      "spanish": 2054,
+      "japanese": 2058,
+      "korean": 2064,
+      "french": 2061,
+      "russian": 2069
+    },
+    "codec_nothink_id": 2155,
+    "codec_pad_id": 2148,
+    "codec_think_bos_id": 2156,
+    "codec_think_eos_id": 2157,
+    "spk_id": {
+      "my_voice": 3000
+    },
+    "spk_is_dialect": {
+      "my_voice": false
+    },
+    "head_dim": 128,
+    "hidden_act": "silu",
+    "hidden_size": 2048,
+    "initializer_range": 0.02,
+    "intermediate_size": 6144,
+    "max_position_embeddings": 32768,
+    "model_type": "qwen3_tts_talker",
+    "num_attention_heads": 16,
+    "num_code_groups": 16,
+    "num_hidden_layers": 28,
+    "num_key_value_heads": 8,
+    "position_id_per_seconds": 13,
+    "rms_norm_eps": 1e-06,
+    "rope_scaling": {
+      "interleaved": true,
+      "mrope_section": [
+        24,
+        20,
+        20
+      ],
+      "rope_type": "default",
+      "type": "default"
+    },
+    "rope_theta": 1000000,
+    "sliding_window": null,
+    "text_hidden_size": 2048,
+    "text_vocab_size": 151936,
+    "use_cache": true,
+    "use_sliding_window": false,
+    "vocab_size": 3072
+  },
+  "transformers_version": "4.57.3"
+}
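
The values in this config can be cross-checked with a few lines of arithmetic. The sketch below copies numbers from the diff above; the reading that codec ids 0..2047 are codebook entries and that everything above is control vocabulary is an assumption, as is the helper name `attention_summary`.

```python
# Values copied from config.json above.
talker = {
    "hidden_size": 2048, "head_dim": 128,
    "num_attention_heads": 16, "num_key_value_heads": 8,
    "vocab_size": 3072, "mrope_section": [24, 20, 20],
}
code_predictor = {
    "hidden_size": 1024, "head_dim": 128,
    "num_attention_heads": 16, "num_key_value_heads": 8,
    "vocab_size": 2048,
}
codec_special_ids = {
    "codec_pad_id": 2148, "codec_bos_id": 2149, "codec_eos_token_id": 2150,
    "codec_think_id": 2154, "codec_nothink_id": 2155,
    "codec_think_bos_id": 2156, "codec_think_eos_id": 2157,
}
codec_language_id = {
    "chinese": 2055, "english": 2050, "german": 2053, "italian": 2070,
    "portuguese": 2071, "spanish": 2054, "japanese": 2058, "korean": 2064,
    "french": 2061, "russian": 2069,
}
spk_id = {"my_voice": 3000}


def attention_summary(cfg: dict) -> dict:
    """Derive the query projection width and the grouped-query ratio."""
    q_width = cfg["num_attention_heads"] * cfg["head_dim"]
    return {
        "q_width": q_width,
        "queries_per_kv": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "q_width_matches_hidden": q_width == cfg["hidden_size"],
    }


talker_attn = attention_summary(talker)
predictor_attn = attention_summary(code_predictor)

# Rotary embeddings act on pairs of channels, so head_dim 128 gives 64 pairs;
# the three mrope sections partition exactly those pairs.
rotary_pairs = talker["head_dim"] // 2
mrope_covers_all_pairs = sum(talker["mrope_section"]) == rotary_pairs

# Assumption: ids below 2048 are codebook entries, the rest are control ids.
control_ids = {**codec_special_ids, **codec_language_id, **spk_id}
ids_in_control_range = all(
    code_predictor["vocab_size"] <= v < talker["vocab_size"]
    for v in control_ids.values()
)
ids_unique = len(set(control_ids.values())) == len(control_ids)
```

The code predictor projects queries to 2048 channels from a 1024-wide hidden state, so its `head_dim` is set explicitly rather than derived as `hidden_size / num_attention_heads`.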

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "do_sample": true,
+  "repetition_penalty": 1.05,
+  "temperature": 0.9,
+  "top_p": 1.0,
+  "top_k": 50,
+  "subtalker_dosample": true,
+  "subtalker_temperature": 0.9,
+  "subtalker_top_p": 1.0,
+  "subtalker_top_k": 50,
+  "max_new_tokens": 8192
+}
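
For orientation, this is roughly what the sampling fields above control. The sketch is a generic pure-Python version of repetition penalty, temperature, top-k and top-p applied in that order; it is not the qwen-tts implementation, and `sample_next` is a hypothetical name.

```python
import math
import random


def sample_next(logits, previous, *, temperature=0.9, top_k=50, top_p=1.0,
                repetition_penalty=1.05, rng=random):
    """Pick one token id from raw logits using the knobs in generation_config.json."""
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(previous):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scores[ranked[0]]
    weights = [math.exp(scores[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for tok, p in zip(ranked, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]


# With top_k=1 the choice is deterministic, which makes the penalty visible:
greedy_pick = sample_next([0.1, 2.0, 0.5], previous=[], top_k=1)
penalised_pick = sample_next([2.0, 1.99], previous=[0], top_k=1)
```

With `top_p` at 1.0, as configured here, the nucleus step keeps every top-k survivor, so only temperature, top-k and the repetition penalty shape the draw.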

merges.txt ADDED

The diff for this file is too large to render. See raw diff

miner.py ADDED

	@@ -0,0 +1,158 @@

+"""Vocence engine for the merged Qwen3-TTS VoiceDesign checkpoint.
+The Vocence Chutes wrapper instantiates ``Miner`` with the on-disk path of the HF
+snapshot and then drives it through the contract:
+    Miner(path_hf_repo: Path)
+    warmup() -> None
+    generate_wav(instruction: str, text: str) -> tuple[np.ndarray, int]
+All weights, the audio codec, and the tokenizer ship together in the snapshot —
+nothing is fetched at runtime.
+"""
+from __future__ import annotations
+import dataclasses
+import threading
+from pathlib import Path
+from typing import Any
+import numpy as np
+_REPO_REQUIRED_FILE = "config.json"
+_RUNTIME_CONFIG_FILE = "vocence_config.yaml"
+@dataclasses.dataclass
+class _RuntimeOpts:
+    """Subset of vocence_config.yaml that the engine actually consumes."""
+    language: str = "English"
+    sample_rate: int = 24000
+    max_instruction_chars: int = 600
+    max_text_chars: int = 2000
+    device_pref: str = "cuda"
+    dtype_pref: str = "bfloat16"
+    flash_attention_2: bool = False
+    @classmethod
+    def from_repo(cls, repo: Path) -> "_RuntimeOpts":
+        cfg_path = repo / _RUNTIME_CONFIG_FILE
+        if not cfg_path.is_file():
+            return cls()
+        from yaml import safe_load
+        with cfg_path.open("r", encoding="utf-8") as fh:
+            data = safe_load(fh) or {}
+        runtime = data.get("runtime") or {}
+        generation = data.get("generation") or {}
+        limits = data.get("limits") or {}
+        return cls(
+            language=str(limits.get("default_language") or runtime.get("default_language") or "English"),
+            sample_rate=int(generation.get("sample_rate", 24000)),
+            max_instruction_chars=int(limits.get("max_instruction_chars", 600)),
+            max_text_chars=int(limits.get("max_text_chars", 2000)),
+            device_pref=str(runtime.get("device_preference", "cuda")).lower(),
+            dtype_pref=str(runtime.get("dtype", "bfloat16")).lower(),
+            flash_attention_2=bool(runtime.get("use_flash_attention_2", False)),
+        )
+class Miner:
+    """Loads merged Qwen3-TTS weights from the snapshot and serves the Vocence API."""
+    WARMUP_BUDGET_S = 180.0
+    def __init__(self, path_hf_repo: Path) -> None:
+        self.repo = Path(path_hf_repo).resolve()
+        if not (self.repo / _REPO_REQUIRED_FILE).is_file():
+            raise FileNotFoundError(
+                f"Snapshot incomplete: {self.repo / _REPO_REQUIRED_FILE} not found"
+            )
+        self.opts = _RuntimeOpts.from_repo(self.repo)
+        self.model = self._build_model()
+    def __repr__(self) -> str:
+        return f"<Miner repo={self.repo.name} language={self.opts.language!r}>"
+    # ------------------------------------------------------------------ #
+    # Vocence contract                                                    #
+    # ------------------------------------------------------------------ #
+    def warmup(self) -> None:
+        outcome: dict[str, Any] = {"ok": False, "err": None}
+        def _heat() -> None:
+            try:
+                self.generate_wav(instruction="Calm neutral delivery.", text="Warmup.")
+                outcome["ok"] = True
+            except Exception as exc:  # noqa: BLE001 — surface to host
+                outcome["err"] = repr(exc)
+        worker = threading.Thread(target=_heat, daemon=True)
+        worker.start()
+        worker.join(timeout=self.WARMUP_BUDGET_S)
+        if not outcome["ok"]:
+            raise RuntimeError(f"Miner warmup did not complete: {outcome['err'] or 'timeout'}")
+    def generate_wav(self, instruction: str, text: str) -> tuple[np.ndarray, int]:
+        prompt = self._truncate(instruction, self.opts.max_instruction_chars)
+        body = self._truncate(text, self.opts.max_text_chars)
+        wavs, sample_rate = self.model.generate_voice_design(
+            text=body,
+            instruct=prompt,
+            language=self.opts.language,
+        )
+        if not wavs or wavs[0] is None:
+            raise ValueError("Qwen3-TTS returned no audio")
+        wave = self._coerce_mono_float32(wavs[0])
+        return wave, int(sample_rate)
+    # ------------------------------------------------------------------ #
+    # Internal                                                            #
+    # ------------------------------------------------------------------ #
+    @staticmethod
+    def _truncate(value: str, limit: int) -> str:
+        return value[:limit] if limit and limit > 0 else value
+    @staticmethod
+    def _coerce_mono_float32(arr: Any) -> np.ndarray:
+        wave = np.asarray(arr, dtype=np.float32)
+        if wave.ndim > 1:
+            wave = wave.mean(axis=1)
+        return wave
+    def _build_model(self):
+        import torch
+        from qwen_tts import Qwen3TTSModel
+        cuda_available = bool(torch.cuda.is_available())
+        device_map = "cuda:0" if (self.opts.device_pref == "cuda" and cuda_available) else "cpu"
+        torch_dtype = (
+            torch.bfloat16
+            if (self.opts.dtype_pref == "bfloat16" and cuda_available)
+            else torch.float32
+        )
+        attempt_order = ("flash_attention_2", "sdpa") if self.opts.flash_attention_2 else ("sdpa",)
+        last_error: BaseException | None = None
+        for attn in attempt_order:
+            try:
+                model = Qwen3TTSModel.from_pretrained(
+                    pretrained_model_name_or_path=str(self.repo),
+                    device_map=device_map,
+                    dtype=torch_dtype,
+                    attn_implementation=attn,
+                )
+                print(
+                    f"[Miner] Qwen3-TTS ready on {device_map} "
+                    f"(dtype={torch_dtype}, attn={attn})"
+                )
+                return model
+            except Exception as exc:  # noqa: BLE001 — try next attn variant
+                last_error = exc
+        raise RuntimeError(f"Qwen3-TTS failed to load: {last_error!r}")

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ceea4eb6ccabe3049f1485633e287dd21f48ebf4ddd079db35641bd5119310a0
+size 3833403008
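
The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of parsing such a pointer and checking a blob against it; the helper names are hypothetical, and a multi-gigabyte file would need to be hashed in chunks rather than read into memory as done here.

```python
import hashlib

# Pointer text as committed above (diff "+" markers removed).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ceea4eb6ccabe3049f1485633e287dd21f48ebf4ddd079db35641bd5119310a0
size 3833403008
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when both the byte length and the digest agree with the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)

# Self-check on a tiny blob, since the real file is about 3.8 GB.
demo_blob = b"hello"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(demo_blob).hexdigest(),
    "size": len(demo_blob),
}
```

The oid covers the whole file, header included, so two snapshots with identical tensors but different safetensors metadata get different oids.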

preprocessor_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "padding_side": "left",
+  "padding_value": 0.0,
+  "processor_class": "Qwen3TTSProcessor",
+  "return_attention_mask": true
+}

speech_tokenizer/config.json ADDED

	@@ -0,0 +1,94 @@

+{
+  "architectures": [
+    "Qwen3TTSTokenizerV2Model"
+  ],
+  "model_type": "qwen3_tts_tokenizer_12hz",
+  "encoder_valid_num_quantizers": 16,
+  "input_sample_rate": 24000,
+  "output_sample_rate": 24000,
+  "decode_upsample_rate": 1920,
+  "encode_downsample_rate": 1920,
+  "decoder_config": {
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "latent_dim": 1024,
+    "codebook_dim": 512,
+    "codebook_size": 2048,
+    "decoder_dim": 1536,
+    "hidden_act": "silu",
+    "hidden_size": 512,
+    "intermediate_size": 1024,
+    "layer_scale_initial_scale": 0.01,
+    "max_position_embeddings": 8000,
+    "head_dim": 64,
+    "num_attention_heads": 16,
+    "num_hidden_layers": 8,
+    "num_key_value_heads": 16,
+    "num_quantizers": 16,
+    "num_semantic_quantizers": 1,
+    "rms_norm_eps": 1e-05,
+    "rope_theta": 10000,
+    "semantic_codebook_size": 4096,
+    "sliding_window": 72,
+    "upsample_rates": [
+      8,
+      5,
+      4,
+      3
+    ],
+    "upsampling_ratios": [
+      2,
+      2
+    ],
+    "vector_quantization_hidden_dimension": 512
+  },
+  "encoder_config": {
+    "_frame_rate": 12.5,
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "audio_channels": 1,
+    "codebook_dim": 256,
+    "codebook_size": 2048,
+    "compress": 2,
+    "dilation_growth_rate": 2,
+    "dtype": "float32",
+    "head_dim": 64,
+    "hidden_act": "gelu",
+    "hidden_size": 512,
+    "initializer_range": 0.02,
+    "intermediate_size": 2048,
+    "kernel_size": 7,
+    "last_kernel_size": 3,
+    "layer_scale_initial_scale": 0.01,
+    "max_position_embeddings": 8000,
+    "norm_eps": 1e-05,
+    "normalize": false,
+    "num_attention_heads": 8,
+    "num_filters": 64,
+    "num_hidden_layers": 8,
+    "num_key_value_heads": 8,
+    "num_quantizers": 32,
+    "num_residual_layers": 1,
+    "num_semantic_quantizers": 1,
+    "pad_mode": "constant",
+    "residual_kernel_size": 3,
+    "rope_theta": 10000.0,
+    "sampling_rate": 24000,
+    "sliding_window": 250,
+    "transformers_version": "4.57.0.dev0",
+    "trim_right_ratio": 1.0,
+    "upsample_groups": 512,
+    "upsampling_ratios": [
+      8,
+      6,
+      5,
+      4
+    ],
+    "use_cache": false,
+    "use_causal_conv": true,
+    "use_conv_shortcut": false,
+    "use_streaming": false,
+    "vector_quantization_hidden_dimension": 256
+  },
+  "transformers_version": "4.57.3"
+}
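
The rate fields in this tokenizer config are mutually consistent, which a short calculation makes visible. The numbers come from the diffs on this page; the final estimate assumes one generated talker step per codec frame, which the source does not state.

```python
from math import prod

# Values copied from speech_tokenizer/config.json above.
sample_rate = 24000
encode_downsample_rate = 1920
decode_upsample_rate = 1920
decoder_upsample_rates = [8, 5, 4, 3]
decoder_upsampling_ratios = [2, 2]
encoder_upsampling_ratios = [8, 6, 5, 4]

# Audio samples per codec frame determine the frame rate.
frame_rate_hz = sample_rate / encode_downsample_rate

# Decoder: the two stride lists multiply out to the declared upsample rate.
decoder_total_upsample = prod(decoder_upsample_rates) * prod(decoder_upsampling_ratios)

# Encoder: the convolution strides cover 960x; the remaining factor to 1920
# is not attributed to a specific field in this sketch.
encoder_conv_stride = prod(encoder_upsampling_ratios)
encoder_remaining_factor = encode_downsample_rate // encoder_conv_stride

# max_new_tokens from generation_config.json, under the one-step-per-frame
# assumption stated above.
max_new_tokens = 8192
max_audio_seconds = max_new_tokens / frame_rate_hz
```

This gives 12.5 frames per second, matching `_frame_rate` in `encoder_config`, and an upper bound of roughly 655 seconds of audio per request under the stated assumption.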

speech_tokenizer/configuration.json ADDED

	@@ -0,0 +1 @@


+{"framework": "pytorch", "task": "feature-extraction", "allow_remote": true}

speech_tokenizer/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:836b7b357f5ea43e889936a3709af68dfe3751881acefe4ecf0dbd30ba571258
+size 682293092

speech_tokenizer/preprocessor_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "chunk_length_s": null,
+  "feature_extractor_type": "EncodecFeatureExtractor",
+  "feature_size": 1,
+  "overlap": null,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "return_attention_mask": true,
+  "sampling_rate": 24000
+}

tokenizer_config.json ADDED

	@@ -0,0 +1,316 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151669": {
+      "content": "<|audio_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151670": {
+      "content": "<|audio_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151671": {
+      "content": "<tts_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151672": {
+      "content": "<tts_text_bos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151673": {
+      "content": "<tts_text_eod>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151674": {
+      "content": "<tts_text_bos_single>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151675": {
+      "content": "<|audio_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>",
+    "<|audio_start|>",
+    "<|audio_end|>",
+    "<tts_pad>",
+    "<tts_text_bos>",
+    "<tts_text_bos_single>",
+    "<|audio_pad|>"
+  ],
+  "extra_special_tokens": {
+    "image_token": "<|image_pad|>",
+    "audio_token": "<|audio_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>",
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>"
+  },
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "image_token": "<|image_pad|>",
+  "audio_token": "<|audio_pad|>",
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>",
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>"
+}
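
The token ids referenced in config.json can be resolved against `added_tokens_decoder` above. The sketch below copies only the entries flagged `"special": true`, plus the `additional_special_tokens` list, and reports which special tokens the list omits; it makes no claim about whether the omission matters at inference time.

```python
# Entries of added_tokens_decoder with "special": true, copied from above.
special_in_decoder = {
    151643: "<|endoftext|>", 151644: "<|im_start|>", 151645: "<|im_end|>",
    151646: "<|object_ref_start|>", 151647: "<|object_ref_end|>",
    151648: "<|box_start|>", 151649: "<|box_end|>",
    151650: "<|quad_start|>", 151651: "<|quad_end|>",
    151652: "<|vision_start|>", 151653: "<|vision_end|>",
    151654: "<|vision_pad|>", 151655: "<|image_pad|>", 151656: "<|video_pad|>",
    151669: "<|audio_start|>", 151670: "<|audio_end|>",
    151671: "<tts_pad>", 151672: "<tts_text_bos>", 151673: "<tts_text_eod>",
    151674: "<tts_text_bos_single>", 151675: "<|audio_pad|>",
}
additional_special_tokens = [
    "<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>",
    "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>",
    "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>",
    "<|video_pad|>", "<|audio_start|>", "<|audio_end|>", "<tts_pad>",
    "<tts_text_bos>", "<tts_text_bos_single>", "<|audio_pad|>",
]
# Token ids declared at the top level of config.json.
config_ids = {
    "im_start_token_id": 151644,
    "im_end_token_id": 151645,
    "tts_pad_token_id": 151671,
    "tts_bos_token_id": 151672,
    "tts_eos_token_id": 151673,
}

# Resolve each config id to its token string.
resolved = {name: special_in_decoder[tok_id] for name, tok_id in config_ids.items()}

# Special tokens present in the decoder table but absent from the list.
not_listed = sorted(set(special_in_decoder.values()) - set(additional_special_tokens))
```

Every id in config.json resolves to a special token. Two special tokens, `<|endoftext|>` (used as `pad_token`) and `<tts_text_eod>` (the target of `tts_eos_token_id`), are flagged special in the decoder table but are not in `additional_special_tokens`.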

trainer_state.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "epoch": 1,
+  "step_in_epoch": 0,
+  "global_step": 2500,
+  "num_epochs": 3,
+  "steps_in_epoch": 2500,
+  "gradient_accumulation_steps": 4,
+  "seed": 42,
+  "save_type": "epoch"
+}
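
A quick reading of the counters above, as a sketch. It assumes `global_step` counts optimizer steps and that each optimizer step consumes `gradient_accumulation_steps` micro-batches; neither is stated in the file.

```python
# Values copied from trainer_state.json above.
state = {
    "epoch": 1,
    "step_in_epoch": 0,
    "global_step": 2500,
    "num_epochs": 3,
    "steps_in_epoch": 2500,
    "gradient_accumulation_steps": 4,
}

# Planned length of the full run, in optimizer steps.
planned_steps = state["num_epochs"] * state["steps_in_epoch"]

# The saved counters agree with "exactly one epoch finished".
counters_consistent = (
    state["global_step"]
    == state["epoch"] * state["steps_in_epoch"] + state["step_in_epoch"]
)

# Share of the planned schedule completed at save time.
fraction_done = state["global_step"] / planned_steps

# Assumption: one optimizer step = gradient_accumulation_steps micro-batches.
micro_batches_seen = state["global_step"] * state["gradient_accumulation_steps"]
```

Under these assumptions the snapshot was saved at the first epoch boundary, one third of the way through a three-epoch schedule, after 10,000 micro-batches.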