
Inference uses qwen_tts.Qwen3TTSModel, loaded from the repo root via from_pretrained(this_folder).

Layout

| Path | Role |
|---|---|
| config.json, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped with the upstream model) |
| miner.py | Vocence engine: Miner, warmup(), generate_wav(instruction, text) |
| vocence_config.yaml | Device, dtype, length caps, language |
| chute_config.yml | Chutes image / GPU / scaling / TEE |
| demo.py | Optional local smoke test (if present) |

Vocence API

Validators call your deployed chute with JSON shaped like:

{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}

The miner forwards text → generate_voice_design(..., text=...) and instruction → instruct=..., using the language from the config (default: English).
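A minimal sketch of that forwarding step. handle_request and FakeModel are hypothetical names, not code from miner.py. The sketch assumes generate_voice_design returns a list of waveforms plus a sample rate (as the Qwen3TTSModel example later in this card suggests) and that the first waveform is the one to return.

```python
from typing import Any, Dict, Tuple


def handle_request(payload: Dict[str, Any], model: Any, language: str = "English") -> Tuple[Any, int]:
    """Hypothetical handler: map a Vocence request onto generate_voice_design."""
    text = payload["text"]                # words to speak
    instruction = payload["instruction"]  # voice description, forwarded verbatim
    # Assumption: returns (list_of_waveforms, sample_rate) like the card's example.
    wavs, sr = model.generate_voice_design(
        text=text,
        instruct=instruction,
        language=language,
    )
    return wavs[0], sr


class FakeModel:
    """Stand-in for Qwen3TTSModel so the sketch runs without weights."""

    def generate_voice_design(self, text: str, instruct: str, language: str):
        # Record the call so the forwarding can be inspected.
        self.last_call = {"text": text, "instruct": instruct, "language": language}
        return [[0.0] * 10], 24000


model = FakeModel()
audio, sr = handle_request(
    {"text": "Words to speak.", "instruction": "gender: male | pitch: mid"},
    model,
)
print(sr, model.last_call["instruct"])  # 24000 gender: male | pitch: mid
```

The instruction string is passed through untouched; any interpretation of the "key: value | ..." fields is left to the model.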

Configure (vocence_config.yaml)

| Area | Keys |
|---|---|
| Runtime | device_preference (cuda / cpu), dtype (bfloat16 / float32), use_flash_attention_2, default_language |
| Generation | sample_rate (e.g. 24000), max_seconds |
| Limits | max_text_chars, max_instruction_chars, default_language |
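The card does not say exactly how these limits are applied. The sketch below assumes the character caps reject over-long requests and that max_seconds clips output audio to sample_rate * max_seconds samples. The flat CONFIG dict, the helper names, and every value except sample_rate=24000 are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical values; only sample_rate=24000 comes from the table above.
CONFIG = {
    "sample_rate": 24000,
    "max_seconds": 20,
    "max_text_chars": 500,
    "max_instruction_chars": 300,
}


def check_request(text: str, instruction: str, cfg: Dict[str, int]) -> None:
    """Raise ValueError if either field exceeds its configured character cap."""
    if len(text) > cfg["max_text_chars"]:
        raise ValueError(f"text has {len(text)} chars, cap is {cfg['max_text_chars']}")
    if len(instruction) > cfg["max_instruction_chars"]:
        raise ValueError(
            f"instruction has {len(instruction)} chars, cap is {cfg['max_instruction_chars']}"
        )


def clip_to_max_seconds(samples: Sequence[float], cfg: Dict[str, int]) -> List[float]:
    """Keep at most sample_rate * max_seconds samples of mono audio."""
    max_samples = cfg["sample_rate"] * cfg["max_seconds"]
    return list(samples[:max_samples])


check_request("Words to speak.", "gender: male | pitch: mid", CONFIG)  # passes
clipped = clip_to_max_seconds([0.0] * 600_000, CONFIG)
print(len(clipped))  # 480000 (24000 * 20)
```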

Warmup runs one short generate_voice_design call with a 180 s timeout.
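The card does not show how that timeout is enforced. One common approach, sketched here with only the standard library, is to run the warmup call in a worker thread and stop waiting after the deadline. run_with_timeout and fake_warmup are hypothetical names; miner.py may implement this differently.

```python
import concurrent.futures
import time
from typing import Any, Callable

WARMUP_TIMEOUT_S = 180  # the 180 s budget mentioned above


def run_with_timeout(fn: Callable[[], Any], timeout_s: float = WARMUP_TIMEOUT_S) -> Any:
    """Run fn in a worker thread and wait at most timeout_s seconds for its result.

    Raises concurrent.futures.TimeoutError if fn has not finished in time.
    Python cannot kill the thread, so a timed-out fn keeps running in the
    background; the caller only stops waiting for it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn)
        return future.result(timeout=timeout_s)
    finally:
        # wait=False: do not block here on a still-running fn.
        pool.shutdown(wait=False)


def fake_warmup() -> str:
    """Hypothetical stand-in for one short generate_voice_design call."""
    time.sleep(0.1)
    return "warm"


print(run_with_timeout(fake_warmup))  # warm
```

A `with ThreadPoolExecutor(...)` block is avoided on purpose: its exit waits for the worker, which would defeat the timeout.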

Local quick test

Install PyTorch (CUDA if available), then:

pip install "qwen-tts" pyyaml soundfile numpy

Then, from the repo root:

from pathlib import Path
from miner import Miner

miner = Miner(Path("."))  # repo root: weights plus vocence_config.yaml
miner.warmup()            # one short generation (180 s timeout)
wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello, this is a short synthesis check.",
)
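The snippet above stops once it has the waveform. To listen to it, write it to disk. The pip line installs soundfile for that, but the sketch below uses only Python's standard wave module so it runs anywhere. It assumes the audio is a mono sequence of floats in [-1.0, 1.0], which the card does not state; write_wav_pcm16 is a hypothetical helper, and a synthetic tone stands in for miner output.

```python
import array
import math
import sys
import wave
from typing import Sequence


def write_wav_pcm16(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


# Synthetic 0.5 s, 440 Hz tone in place of real miner output.
sr = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
write_wav_pcm16("check.wav", tone, sr)

with wave.open("check.wav", "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(info)  # (1, 2, 24000, 12000)
```

If you run this in the same session as the quick test, bind the result to a name other than wave (for example audio, sr = miner.generate_wav(...)), because import wave would otherwise replace it.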

Or load the model class directly from the transformers-style layout:

from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(".")  # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)

Replace "." with your HF repo id after upload, e.g. "your-org/your-repo".

Chutes / Vocence deploy

  1. Push this layout to a Hugging Face model repo; pin a commit SHA for VOCENCE_REVISION.
  2. Render the canonical Vocence chute script with VOCENCE_REPO, VOCENCE_REVISION, VOCENCE_CHUTES_USER, VOCENCE_CHUTE_ID.
  3. chutes build … --wait, then chutes deploy … --accept-fee.
  4. Commit on chain: model_name, model_revision (HF SHA), chute_id (UUID from Chutes).

The chute name must contain "vocence" (case-insensitive). See miner_sample/MINER_GUIDE.md in the Vocence repo.
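Before committing on chain, a quick sanity check of the values can catch copy-paste mistakes. check_commitment is a hypothetical helper, not part of the Vocence tooling. It assumes model_revision is the full 40-character lowercase commit SHA shown by Hugging Face, and checks the chute_id format and the naming rule stated above.

```python
import re
import uuid
from typing import List

SHA1_RE = re.compile(r"[0-9a-f]{40}")  # assumed: full lowercase Git commit SHA


def check_commitment(model_revision: str, chute_id: str, chute_name: str) -> List[str]:
    """Return a list of problems with the values to commit (empty if none found)."""
    problems = []
    if not SHA1_RE.fullmatch(model_revision):
        problems.append("model_revision should be a full 40-character commit SHA")
    try:
        uuid.UUID(chute_id)  # raises ValueError on malformed input
    except ValueError:
        problems.append("chute_id is not a valid UUID")
    if "vocence" not in chute_name.lower():
        problems.append("chute name must contain 'vocence' (case-insensitive)")
    return problems


ok = check_commitment(
    model_revision="0123456789abcdef0123456789abcdef01234567",
    chute_id="123e4567-e89b-12d3-a456-426614174000",
    chute_name="My-Vocence-TTS",
)
print(ok)  # []
```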

Training / fine-tuning

Fine-tuning is done outside Chutes on your own GPU; export a full snapshot compatible with Qwen3TTSModel.from_pretrained(...), then replace weights in this repo layout and push a new revision.
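Before pushing a new revision, it may help to confirm the exported folder still has the expected shape. This is a hypothetical pre-push check: it assumes weights are top-level *.safetensors files and only looks for file names that appear in the Layout table; the tokenizer and codec directory names are not checked because the card does not list them. "model.safetensors" in the demo is a placeholder name.

```python
import tempfile
from pathlib import Path
from typing import List


def snapshot_problems(folder: Path) -> List[str]:
    """List missing pieces of an exported snapshot plus the Vocence files."""
    problems = []
    if not (folder / "config.json").is_file():
        problems.append("missing config.json")
    if not any(folder.glob("*.safetensors")):
        problems.append("no *.safetensors weight files at the top level")
    for name in ("miner.py", "vocence_config.yaml", "chute_config.yml"):
        if not (folder / name).is_file():
            problems.append(f"missing {name}")
    return problems


# Demo on a throwaway folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("config.json", "model.safetensors", "miner.py",
                 "vocence_config.yaml", "chute_config.yml"):
        (root / name).touch()
    complete = snapshot_problems(root)
    (root / "config.json").unlink()
    after_removal = snapshot_problems(root)
print(complete, after_removal)  # [] ['missing config.json']
```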

License

CC BY-NC-SA 4.0. See the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.

Safetensors
Model size: 2B params
Tensor type: BF16