---
license: cc-by-nc-sa-4.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
pipeline_tag: text-to-speech
library_name: transformers
language:
  - en
tags:
  - tts
  - qwen3-tts
  - voice-design
  - prompttts
  - vocence
  - bittensor
---

Inference uses `qwen_tts.Qwen3TTSModel`, loaded from the repo root via `from_pretrained(this_folder)`.

Layout

| Path | Role |
|---|---|
| `config.json`, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
| `miner.py` | Vocence engine: `Miner`, `warmup()`, `generate_wav(instruction, text)` |
| `vocence_config.yaml` | Device, dtype, caps, language |
| `chute_config.yml` | Chutes image / GPU / scaling / TEE |
| `demo.py` | Optional local smoke test (if present) |

Vocence API

Validators call your deployed chute with JSON shaped like:

{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}

The miner passes `text` to `generate_voice_design(..., text=...)` and `instruction` to the same call as `instruct=...`. The language comes from `default_language` in the config (English by default).
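For building or inspecting requests locally, the `key: value | key: value` shape of the example instruction can be handled with a few lines of Python. This is a minimal sketch: `build_payload` and `parse_instruction` are hypothetical helper names, not part of `miner.py`, and the miner is described above as forwarding the instruction string unchanged, so free-form instructions without colons simply parse to an empty dict here.

```python
def build_payload(text: str, **traits: str) -> dict[str, str]:
    """Assemble a request body in the JSON shape shown above."""
    instruction = " | ".join(f"{key}: {value}" for key, value in traits.items())
    return {"text": text, "instruction": instruction}


def parse_instruction(instruction: str) -> dict[str, str]:
    """Split 'key: value | key: value' into a dict, skipping malformed segments."""
    fields: dict[str, str] = {}
    for segment in instruction.split("|"):
        key, sep, value = segment.partition(":")
        # Keep only segments that actually contain a non-empty key and a colon.
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    payload = build_payload("Words to speak.", gender="male", pitch="mid", accent="us")
    print(payload)
    print(parse_instruction(payload["instruction"]))
```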

Configure (`vocence_config.yaml`)

| Area | Keys |
|---|---|
| Runtime | `device_preference` (`cuda` / `cpu`), `dtype` (`bfloat16` / `float32`), `use_flash_attention_2`, `default_language` |
| Generation | `sample_rate` (e.g. 24000), `max_seconds` |
| Limits | `max_text_chars`, `max_instruction_chars` |
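The generation and limit keys can be pictured as simple request guards. The sketch below is an assumption about how such caps might be applied, not a copy of `miner.py`: the `Limits` class, `check_request`, and `max_samples` are hypothetical names, every default except `sample_rate` is a placeholder value, and it assumes `max_seconds` caps the length of the generated audio and that over-long input is rejected rather than truncated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Key names follow vocence_config.yaml; values other than sample_rate
    # are placeholders, not the shipped defaults.
    sample_rate: int = 24000
    max_seconds: float = 30.0
    max_text_chars: int = 500
    max_instruction_chars: int = 300


def check_request(text: str, instruction: str, limits: Limits) -> None:
    """Raise ValueError if the text is empty or either field exceeds its cap."""
    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > limits.max_text_chars:
        raise ValueError(f"text exceeds {limits.max_text_chars} characters")
    if len(instruction) > limits.max_instruction_chars:
        raise ValueError(f"instruction exceeds {limits.max_instruction_chars} characters")


def max_samples(limits: Limits) -> int:
    """Longest waveform, in samples, that max_seconds allows at sample_rate."""
    return int(limits.sample_rate * limits.max_seconds)


if __name__ == "__main__":
    limits = Limits()
    check_request("Words to speak.", "gender: male | pitch: mid", limits)
    print(max_samples(limits))
```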

`warmup()` runs one short `generate_voice_design` call with a 180 s timeout.
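How `miner.py` enforces the 180 s limit is not shown here. One common way to bound a blocking call is to run it in a worker thread and wait on the future, as in the sketch below. `run_with_timeout` and `fake_generate` are hypothetical names; `fake_generate` stands in for the real model call.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s: float):
    """Run fn() in a worker thread; raise TimeoutError if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"warmup exceeded {timeout_s} s") from None
    finally:
        # Do not block on a stuck worker. The thread cannot be killed,
        # so a hung call keeps running in the background.
        pool.shutdown(wait=False)


def fake_generate() -> str:
    """Stand-in for one short generate_voice_design call."""
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    print(run_with_timeout(fake_generate, timeout_s=180))
```

A thread-based timeout only stops the caller from waiting; if a hung generation must actually be terminated, a separate process is the usual alternative.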

Local quick test

Install PyTorch (CUDA if available), then:

pip install "qwen-tts" pyyaml soundfile numpy

from pathlib import Path
from miner import Miner

miner = Miner(Path("."))
miner.warmup()
wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello — this is a short synthesis check.",
)

Or load the model class directly from the transformers-style layout:

from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)

Replace "." with your HF repo id after upload, e.g. "your-org/your-repo".
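Neither snippet above writes the audio to disk. With the `soundfile` package from the install line, `soundfile.write(path, data, sr)` is the likely choice. The sketch below uses only the standard library instead, and it assumes the waveform (`wave`, or one element of `wavs`) is a mono, one-dimensional sequence of floats in [-1, 1]; that shape is an assumption, not something this README states. `save_wav` is a hypothetical helper name.

```python
import struct
import wave as wave_io  # aliased so it does not clash with a `wave` variable
from typing import Iterable


def save_wav(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM; return the frame count."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave_io.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        out.setframerate(sample_rate)
        out.writeframes(bytes(frames))
    return len(frames) // 2
```

Usage would look like `save_wav("out.wav", wave, sr)` after the `generate_wav` call.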

Chutes / Vocence deploy

1. Push this layout to a Hugging Face model repo; pin a commit SHA for `VOCENCE_REVISION`.
2. Render the canonical Vocence chute script with `VOCENCE_REPO`, `VOCENCE_REVISION`, `VOCENCE_CHUTES_USER`, `VOCENCE_CHUTE_ID`.
3. Run `chutes build … --wait`, then `chutes deploy … --accept-fee`.
4. Commit on chain: `model_name`, `model_revision` (HF SHA), `chute_id` (UUID from Chutes).

The chute name must contain `vocence` (case-insensitive). See `miner_sample/MINER_GUIDE.md` in the Vocence repo.
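Before committing on chain, the values described above can be sanity-checked locally. This is a minimal sketch with a hypothetical helper, `check_commitment`. It assumes the pinned revision is a full 40-character Git commit SHA and that the chute id parses as a standard UUID; the validators' actual checks may differ.

```python
import re
import uuid

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def check_commitment(model_revision: str, chute_id: str, chute_name: str) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems: list[str] = []
    if not _FULL_SHA.fullmatch(model_revision):
        problems.append("model_revision should be a full 40-character commit SHA")
    try:
        uuid.UUID(chute_id)
    except ValueError:
        problems.append("chute_id is not a valid UUID")
    # The name check is case-insensitive, as stated above.
    if "vocence" not in chute_name.lower():
        problems.append("chute name must contain 'vocence'")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(check_commitment("a" * 40, "123e4567-e89b-12d3-a456-426614174000", "my-Vocence-tts"))
```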

Training / fine-tuning

Fine-tuning is done outside Chutes, on your own GPU. Export a full snapshot compatible with `Qwen3TTSModel.from_pretrained(...)`, replace the weights in this repo layout, and push a new revision.

License

CC BY-NC-SA 4.0 — see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.