Inference uses qwen_tts.Qwen3TTSModel, loaded from the repo root via from_pretrained(this_folder).
Layout
| Path | Role |
|---|---|
| config.json, weights, tokenizer, codec dirs | Qwen3-TTS snapshot (as shipped by the upstream model card) |
| miner.py | Vocence engine: Miner, warmup(), generate_wav(instruction, text) |
| vocence_config.yaml | Device, dtype, caps, language |
| chute_config.yml | Chutes image / GPU / scaling / TEE |
| demo.py | Optional local smoke test (if present) |
Vocence API
Validators call your deployed chute with JSON shaped like:
{
  "text": "Words to speak.",
  "instruction": "gender: male | pitch: mid | speed: normal | age_group: adult | emotion: neutral | tone: casual | accent: us"
}
The miner forwards text → generate_voice_design(..., text=...) and instruction → instruct=..., using the language from config (default English).
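As a rough illustration of the request shape above, the following sketch builds the instruction string from a dictionary of traits and serializes the payload. The helper names build_instruction, parse_instruction, and build_payload are hypothetical and not part of the Vocence code; only the text and instruction keys and the "key: value | key: value" format come from the sample request.

```python
import json


def build_instruction(traits: dict[str, str]) -> str:
    """Join traits as 'key: value' pairs separated by ' | ' (format of the sample request)."""
    return " | ".join(f"{key}: {value}" for key, value in traits.items())


def parse_instruction(instruction: str) -> dict[str, str]:
    """Inverse of build_instruction; segments without a colon are skipped."""
    traits = {}
    for segment in instruction.split("|"):
        key, sep, value = segment.partition(":")
        if sep:
            traits[key.strip()] = value.strip()
    return traits


def build_payload(text: str, traits: dict[str, str]) -> str:
    """Serialize a request body with the two keys shown in the sample request."""
    return json.dumps({"text": text, "instruction": build_instruction(traits)})


traits = {"gender": "male", "pitch": "mid", "speed": "normal", "accent": "us"}
print(build_payload("Words to speak.", traits))
```

A free-form instruction with no "key: value" pairs parses to an empty dictionary in this sketch, so the parser is only useful when the structured format is used.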
Configure (vocence_config.yaml)
| Area | Keys |
|---|---|
| Runtime | device_preference (cuda / cpu), dtype (bfloat16 / float32), use_flash_attention_2, default_language |
| Generation | sample_rate (e.g. 24000), max_seconds |
| Limits | max_text_chars, max_instruction_chars, default_language |
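To make the limit keys concrete, here is a minimal sketch of how a request might be checked against max_text_chars and max_instruction_chars. The numeric defaults and the check_request helper are assumptions for illustration; the real values live in vocence_config.yaml, and miner.py may enforce them differently (for example by truncating rather than rejecting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Placeholder values only; the real ones come from vocence_config.yaml.
    max_text_chars: int = 500
    max_instruction_chars: int = 300


def check_request(text: str, instruction: str, limits: Limits) -> list[str]:
    """Return human-readable problems; an empty list means the request fits."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if len(text) > limits.max_text_chars:
        problems.append(f"text exceeds {limits.max_text_chars} characters")
    if len(instruction) > limits.max_instruction_chars:
        problems.append(f"instruction exceeds {limits.max_instruction_chars} characters")
    return problems


limits = Limits(max_text_chars=20, max_instruction_chars=40)
print(check_request("Hello.", "tone: casual", limits))    # fits
print(check_request("x" * 21, "tone: casual", limits))    # text too long
```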
Warmup runs one short generate_voice_design call with a 180 s timeout.
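One way a warmup with a timeout could be structured is sketched below using only the standard library. This is not the code in miner.py: run_with_timeout and fake_warmup are hypothetical stand-ins, and only the 180 s figure comes from the text. A Python thread cannot be forcibly stopped, so on timeout the call is abandoned rather than cancelled.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s: float):
    """Run fn() in a worker thread; return (finished, result)."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Do not block on the worker; a timed-out call keeps running in the background.
        executor.shutdown(wait=False)


def fake_warmup():
    # Stand-in for one short generate_voice_design call.
    time.sleep(0.01)
    return "ready"


WARMUP_TIMEOUT_S = 180  # value stated in the text
finished, result = run_with_timeout(fake_warmup, WARMUP_TIMEOUT_S)
print(finished, result)
```

Exceptions raised inside fn propagate to the caller in this sketch, which lets a failed warmup surface as an error instead of being mistaken for a timeout.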
Local quick test
Install PyTorch (CUDA if available), then:
pip install "qwen-tts" pyyaml soundfile numpy
from pathlib import Path
from miner import Miner
miner = Miner(Path("."))
miner.warmup()
wave, sr = miner.generate_wav(
    instruction="A calm, clear narrator, neutral US accent.",
    text="Hello, this is a short synthesis check.",
)
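To listen to the result, the waveform can be written to disk. The sketch below uses only the standard library and assumes generate_wav returns a one-dimensional sequence of float samples in [-1, 1] together with the sample rate; that return format is an assumption, not something this README confirms. A synthetic sine tone stands in for the model output so the snippet runs without the model, and write_wav_mono16 is a hypothetical helper.

```python
import math
import os
import struct
import tempfile
import wave as wave_module  # aliased so it does not clash with a variable named "wave"


def write_wav_mono16(path: str, samples, sample_rate: int) -> int:
    """Write float samples in [-1, 1] as 16-bit mono PCM; return the frame count."""
    frames = bytearray()
    count = 0
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1
    with wave_module.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return count


sr = 24000  # example sample_rate from the configuration table
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
out_path = os.path.join(tempfile.gettempdir(), "check.wav")
print(write_wav_mono16(out_path, tone, sr), "frames written to", out_path)
```

With soundfile installed (it is in the pip line above), `soundfile.write(path, data, samplerate)` should achieve the same in one call, provided the data is in a layout it accepts.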
Or load the class directly from the Transformers-style layout:
from qwen_tts import Qwen3TTSModel
model = Qwen3TTSModel.from_pretrained(".") # or your HF repo id
wavs, sr = model.generate_voice_design(
    text="Hello fellas.",
    instruct="Cute voice.",
    language="english",
)
Replace "." with your HF repo id after upload, e.g. "your-org/your-repo".
Chutes / Vocence deploy
- Push this layout to a Hugging Face model repo; pin a commit SHA for VOCENCE_REVISION.
- Render the canonical Vocence chute script with VOCENCE_REPO, VOCENCE_REVISION, VOCENCE_CHUTES_USER, VOCENCE_CHUTE_ID.
- chutes build … --wait then chutes deploy … --accept-fee.
- Commit on chain: model_name, model_revision (HF SHA), chute_id (UUID from Chutes).
Chute name must contain vocence (case-insensitive). See miner_sample/MINER_GUIDE.md in the Vocence repo.
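The constraints in this section can be checked before committing. The sketch below is a hypothetical pre-flight helper, not part of the Vocence tooling: it assumes the HF revision is a full 40-character hexadecimal commit SHA and that chute_id parses as a UUID, and it applies the case-insensitive name rule stated above. The sample values are made up.

```python
import re
import uuid

SHA_RE = re.compile(r"[0-9a-f]{40}")  # assumption: full Git commit SHA-1


def check_commitment(model_revision: str, chute_id: str, chute_name: str) -> list[str]:
    """Return a list of problems; an empty list means all three checks passed."""
    problems = []
    if not SHA_RE.fullmatch(model_revision.lower()):
        problems.append("model_revision is not a 40-character hex SHA")
    try:
        uuid.UUID(chute_id)
    except ValueError:
        problems.append("chute_id is not a valid UUID")
    if "vocence" not in chute_name.lower():
        problems.append("chute name does not contain 'vocence'")
    return problems


print(check_commitment(
    model_revision="0123456789abcdef0123456789abcdef01234567",  # made-up SHA
    chute_id="12345678-1234-5678-1234-567812345678",            # made-up UUID
    chute_name="my-Vocence-tts",
))
```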
Training / fine-tuning
Fine-tuning is done outside Chutes on your own GPU; export a full snapshot compatible with Qwen3TTSModel.from_pretrained(...), then replace weights in this repo layout and push a new revision.
License
CC BY-NC-SA 4.0; see the license file in this repo. Respect upstream Qwen / Alibaba terms for the base checkpoint.
Model tree for aiseosae/minerTTS
Base model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign