---
license: cc-by-nc-sa-4.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
pipeline_tag: text-to-speech
library_name: transformers
language:
  - en
tags:
  - tts
  - qwen3-tts
  - voice-design
  - prompttts
  - vocence
  - bittensor
---

Inference uses qwen_tts.Qwen3TTSModel, loaded from the repo root via from_pretrained(this_folder).

Layout

| Path | Role |
| --- | --- |
| config.json, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
| miner.py | Vocence engine: Miner, warmup(), generate_wav(instruction, text) |
| vocence_config.yaml | Device, dtype, caps, language |
| chute_config.yml | Chutes image / GPU / scaling / TEE |
| demo.py | Optional local smoke test (if present) |

Vocence API

Validators call your deployed chute with JSON shaped like:

{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}

The miner forwards text to generate_voice_design(..., text=...) and instruction to instruct=..., using the language from the config (default: English).
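
The request body can be assembled and checked locally before a chute is involved. The sketch below is illustrative only: build_instruction, parse_instruction and build_request are hypothetical helpers, not part of miner.py, and the pipe-delimited "key: value" form is taken from the example above rather than from a documented schema (free-form instructions simply parse to an empty dict).

```python
import json


def build_instruction(attrs: dict) -> str:
    """Join attribute pairs into the 'key: value | key: value' form."""
    return " | ".join(f"{key}: {value}" for key, value in attrs.items())


def parse_instruction(instruction: str) -> dict:
    """Split a 'key: value | key: value' string back into a dict."""
    attrs = {}
    for part in instruction.split("|"):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that have no 'key: value' shape
            attrs[key.strip()] = value.strip()
    return attrs


def build_request(text: str, attrs: dict) -> str:
    """Serialize the two-field JSON body shown in the example request."""
    return json.dumps({"text": text, "instruction": build_instruction(attrs)})


# Attribute names and values copied from the example request above.
attrs = {
    "gender": "male", "pitch": "mid", "speed": "normal", "age_group": "adult",
    "emotion": "neutral", "tone": "casual", "accent": "us",
}
body = build_request("Words to speak.", attrs)
print(body)
```

Round-tripping the instruction through parse_instruction is a cheap way to catch a malformed attribute string before sending it.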

Configure (vocence_config.yaml)

| Area | Keys |
| --- | --- |
| Runtime | device_preference (cuda / cpu), dtype (bfloat16 / float32), use_flash_attention_2, default_language |
| Generation | sample_rate (e.g. 24000), max_seconds |
| Limits | max_text_chars, max_instruction_chars, default_language |
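
One way to picture how these keys might be consumed is a small settings object with defaults plus a length check. This is a sketch, not the code in miner.py: VocenceConfig, load_config and check_limits are hypothetical names, every default value is a placeholder, a flat key layout is assumed, and whether the real miner rejects or truncates over-long input is not stated here.

```python
from dataclasses import dataclass, fields


@dataclass
class VocenceConfig:
    # Key names come from the table above; every default value here is a
    # placeholder assumed for illustration.
    device_preference: str = "cuda"
    dtype: str = "bfloat16"
    use_flash_attention_2: bool = False
    default_language: str = "english"
    sample_rate: int = 24000
    max_seconds: float = 30.0
    max_text_chars: int = 500
    max_instruction_chars: int = 300


def load_config(raw: dict) -> VocenceConfig:
    """Overlay a parsed YAML mapping on the defaults, ignoring unknown keys."""
    known = {f.name for f in fields(VocenceConfig)}
    return VocenceConfig(**{k: v for k, v in raw.items() if k in known})


def check_limits(cfg: VocenceConfig, text: str, instruction: str) -> None:
    """Raise ValueError when either input exceeds its configured cap."""
    if len(text) > cfg.max_text_chars:
        raise ValueError(f"text exceeds max_text_chars ({cfg.max_text_chars})")
    if len(instruction) > cfg.max_instruction_chars:
        raise ValueError(
            f"instruction exceeds max_instruction_chars ({cfg.max_instruction_chars})"
        )


# In practice `raw` would be the mapping parsed from vocence_config.yaml,
# for example with yaml.safe_load; a literal dict keeps the sketch standalone.
cfg = load_config({"dtype": "float32", "max_text_chars": 20, "unknown_key": 1})
check_limits(cfg, "Short text.", "Calm narrator.")
```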

Warmup runs one short generate_voice_design call with a 180 s timeout.
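
A timeout around a blocking call can be sketched with a worker thread. How miner.py actually enforces the 180 s limit is not shown in this README, so run_with_timeout and fake_warmup below are hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

WARMUP_TIMEOUT_S = 180  # value stated in this README


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and raise TimeoutError if it takes too long.

    The worker is not killed on timeout, only abandoned; this is a known
    limitation of thread-based timeouts in Python.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"call did not finish within {timeout_s} s") from None
    finally:
        pool.shutdown(wait=False)  # do not block the caller on a stuck worker


def fake_warmup():
    # Stand-in for one short generate_voice_design call.
    time.sleep(0.01)
    return "ready"


status = run_with_timeout(fake_warmup, WARMUP_TIMEOUT_S)
print(status)
```

Because the abandoned thread keeps running, a process-level timeout is the safer choice when a hung model load must be terminated for real.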

Local quick test

Install PyTorch (CUDA if available), then:

pip install "qwen-tts" pyyaml soundfile numpy

from pathlib import Path
from miner import Miner

miner = Miner(Path("."))
miner.warmup()
wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello — this is a short synthesis check.",
)
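
To listen to the result, the samples need to be written to a file. The sketch below assumes generate_wav returns mono float samples in the range [-1.0, 1.0] together with a sample rate, which this README does not confirm. save_wav is a hypothetical helper that uses only the standard library; soundfile.write(path, wave, sr) from the packages installed above would be the more usual route.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(int(sample_rate))
        wf.writeframes(bytes(frames))
    return len(frames) // 2  # number of frames written


# Stand-in for the output of miner.generate_wav(...): 0.1 s of a 440 Hz tone.
sr = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
out_path = os.path.join(tempfile.gettempdir(), "synthesis_check.wav")
written = save_wav(tone, sr, out_path)
```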

Or load the class directly from the transformers-style layout:

from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)

Replace "." with your HF repo id after upload, e.g. "your-org/your-repo".

Chutes / Vocence deploy

  1. Push this layout to a Hugging Face model repo; pin a commit SHA for VOCENCE_REVISION.
  2. Render the canonical Vocence chute script with VOCENCE_REPO, VOCENCE_REVISION, VOCENCE_CHUTES_USER, VOCENCE_CHUTE_ID.
  3. chutes build … --wait then chutes deploy … --accept-fee.
  4. Commit on chain: model_name, model_revision (HF SHA), chute_id (UUID from Chutes).

The chute name must contain vocence (case-insensitive). See miner_sample/MINER_GUIDE.md in the Vocence repo.
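
A quick pre-flight check can catch the most common mistakes in the steps above before anything is built. This is a sketch under assumptions: check_deploy_settings is a hypothetical helper, and requiring a full 40-character hex SHA is this sketch's reading of "pin a commit SHA", not a rule quoted from the Vocence guide.

```python
import re

REQUIRED_VARS = (
    "VOCENCE_REPO", "VOCENCE_REVISION", "VOCENCE_CHUTES_USER", "VOCENCE_CHUTE_ID",
)
FULL_SHA = re.compile(r"[0-9a-f]{40}")  # assumed form of a pinned commit


def check_deploy_settings(env: dict, chute_name: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    revision = env.get("VOCENCE_REVISION", "")
    if revision and not FULL_SHA.fullmatch(revision):
        problems.append("VOCENCE_REVISION should be a full 40-character commit SHA")
    if "vocence" not in chute_name.lower():  # case-insensitive, as stated above
        problems.append("chute name must contain 'vocence'")
    return problems


# All values below are placeholders for illustration.
settings = {
    "VOCENCE_REPO": "your-org/your-repo",
    "VOCENCE_REVISION": "a" * 40,
    "VOCENCE_CHUTES_USER": "your-user",
    "VOCENCE_CHUTE_ID": "placeholder-id",
}
print(check_deploy_settings(settings, "My-Vocence-TTS"))
```

Passing os.environ as env runs the same check against the real shell environment.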

Training / fine-tuning

Fine-tuning happens outside Chutes, on your own GPU. Export a full snapshot compatible with Qwen3TTSModel.from_pretrained(...), replace the weights in this repo layout, and push a new revision.

License

CC BY-NC-SA 4.0 — see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.