---
license: cc-by-nc-sa-4.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
pipeline_tag: text-to-speech
library_name: transformers
language:
- en
tags:
- tts
- qwen3-tts
- voice-design
- prompttts
- vocence
- bittensor
---

Inference uses **`qwen_tts.Qwen3TTSModel`**, loaded from the repo root via `from_pretrained(this_folder)`.

## Layout

| Path | Role |
|------|------|
| `config.json`, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
| `miner.py` | Vocence engine: `Miner`, `warmup()`, `generate_wav(instruction, text)` |
| `vocence_config.yaml` | Device, dtype, caps, language |
| `chute_config.yml` | Chutes image / GPU / scaling / TEE |
| `demo.py` | Optional local smoke test (if present) |

## Vocence API

Validators call your deployed chute with JSON shaped like:

```json
{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}
```

The miner forwards **`text`** → `generate_voice_design(..., text=...)` and **`instruction`** → `instruct=...`, using **`language`** from config (default English).

## Configure (`vocence_config.yaml`)

| Area | Keys |
|------|------|
| Runtime | `device_preference` (`cuda` / `cpu`), `dtype` (`bfloat16` / `float32`), `use_flash_attention_2`, `default_language` |
| Generation | `sample_rate` (e.g. 24000), `max_seconds` |
| Limits | `max_text_chars`, `max_instruction_chars`, `default_language` |

Warmup runs one short `generate_voice_design` with a **180 s** timeout.
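To make the request shape and the `max_text_chars` / `max_instruction_chars` caps above more concrete, here is a minimal sketch of building an `instruction` string and checking a payload before sending it. The helper names (`build_instruction`, `validate_request`) and the numeric limits are hypothetical and not part of `miner.py`. Rejecting over-long input, rather than truncating it, is also an assumption; check `vocence_config.yaml` and `miner.py` for the real behaviour.

```python
from typing import Mapping

# Hypothetical caps for illustration; the real values live in vocence_config.yaml.
MAX_TEXT_CHARS = 500
MAX_INSTRUCTION_CHARS = 300


def build_instruction(attrs: Mapping[str, str]) -> str:
    """Join attributes as 'key: value' pairs separated by ' | '."""
    return " | ".join(f"{key}: {value}" for key, value in attrs.items())


def validate_request(
    payload: Mapping[str, str],
    max_text_chars: int = MAX_TEXT_CHARS,
    max_instruction_chars: int = MAX_INSTRUCTION_CHARS,
) -> None:
    """Raise ValueError if a field is missing, empty, or over its cap.

    Assumption: over-long input is rejected here; the actual miner may
    truncate or handle it differently.
    """
    caps = (("text", max_text_chars), ("instruction", max_instruction_chars))
    for field, cap in caps:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' must be a non-empty string")
        if len(value) > cap:
            raise ValueError(f"'{field}' exceeds {cap} characters")


# Rebuild the example payload from the Vocence API section.
payload = {
    "text": "Words to speak.",
    "instruction": build_instruction({
        "gender": "male",
        "pitch": "mid",
        "speed": "normal",
        "age_group": "adult",
        "emotion": "neutral",
        "tone": "casual",
        "accent": "us",
    }),
}
validate_request(payload)  # passes silently for a well-formed payload
```

A client-side check like this only catches obviously malformed requests early; the deployed chute remains the source of truth for what it accepts.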
## Local quick test

Install PyTorch (CUDA if available), then:

```bash
pip install "qwen-tts" pyyaml soundfile numpy
```

```python
from pathlib import Path

from miner import Miner

# Point the miner at the repo root (the folder holding config.json and weights).
miner = Miner(Path("."))
miner.warmup()  # runs one short generation so later calls are fast

wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello, this is a short synthesis check.",
)
```

Or load the class directly from the transformers-style layout:

```python
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)
```

Replace `"."` with your HF repo id after upload, e.g. `"your-org/your-repo"`.

## Chutes / Vocence deploy

1. Push this layout to a Hugging Face **model** repo; pin a **commit SHA** for `VOCENCE_REVISION`.
2. Render the canonical Vocence chute script with `VOCENCE_REPO`, `VOCENCE_REVISION`, `VOCENCE_CHUTES_USER`, `VOCENCE_CHUTE_ID`.
3. Run `chutes build … --wait`, then `chutes deploy … --accept-fee`.
4. Commit on chain: `model_name`, `model_revision` (HF SHA), `chute_id` (UUID from Chutes).

The chute **name** must contain **`vocence`** (case-insensitive). See **`miner_sample/MINER_GUIDE.md`** in the Vocence repo.

## Training / fine-tuning

Fine-tuning is done **outside** Chutes on your own GPU. Export a full snapshot compatible with **`Qwen3TTSModel.from_pretrained(...)`**, then replace the weights in this repo layout and push a new revision.

## License

**CC BY-NC-SA 4.0**: see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.