| license: cc-by-nc-sa-4.0 | |
| base_model: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | |
| pipeline_tag: text-to-speech | |
| library_name: transformers | |
| language: | |
| - en | |
| tags: | |
| - tts | |
| - prompttts | |
| - qwen3-tts | |
| - voice-design | |
| - vocence | |
| # vocence_miner_v3 | |
| A reliability-and-naturalness pass over the prompt-driven Qwen3-TTS-12Hz-1.7B-VoiceDesign backbone. v3 ships two changes that matter at inference time: | |
| **1. Full-sentence generation.** Earlier checkpoints would sometimes render only the first clause of a longer input: the rest of the sentence would be cut off, dropped, or replaced with silence. v3 generates the entire input from start to end, including longer sentences with intermediate clauses, em-dashes, and parenthetical asides. | |
| **2. More natural delivery.** Across the same prompt set, v3 produces audibly smoother prosody: fewer flat reads on neutral prompts, less of a "narrated" surface on short utterances, and more believable breath placement on persona reads. | |
| Everything else stays the same: free-form English `instruct`, 24 kHz mono output, single-call inference, no reference audio. | |
| --- | |
| ## Use it | |
| ```bash | |
| pip install qwen-tts transformers torch soundfile | |
| ``` | |
| ```python | |
| from qwen_tts import Qwen3TTSModel | |
| import soundfile as sf | |
| # Load the checkpoint. | |
| m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v3") | |
| # `instruct` describes the voice in free-form English; no reference audio is passed. | |
| wavs, sr = m.generate_voice_design( | |
|     text="When I got home, the lights were on, the back door was wide open, and somebody had left tea brewing on the kitchen counter.", | |
|     instruct="A nervous middle-aged man recounting the moment, slightly hushed, slightly fast.", | |
|     language="english", | |
| ) | |
| # Save the first returned waveform at the returned sample rate. | |
| sf.write("out.wav", wavs[0], sr) | |
| ``` | |
| The example deliberately uses a long, multi-clause sentence, the kind that earlier checkpoints would clip mid-read. | |
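One way to smoke-test the full-sentence behaviour after generation is to check that the clip is long enough for its text and does not end in a long run of silence. The sketch below is a heuristic under stated assumptions, not part of the model or of `qwen_tts`: the helper names, the 5 words-per-second ceiling, and the silence thresholds are illustrative choices, and it assumes `wavs[0]` is a 1-D sequence of float samples.

```python
def trailing_silence_seconds(samples, sample_rate, threshold=0.01):
    """Length, in seconds, of the run of near-silent samples at the end of the clip."""
    quiet = 0
    for value in reversed(samples):
        if abs(value) > threshold:
            break
        quiet += 1
    return quiet / sample_rate


def looks_truncated(samples, sample_rate, text,
                    max_words_per_second=5.0, max_trailing_silence=1.5):
    """Heuristic: flag clips that are too short for the text or end in long silence.

    Both limits are assumed values for illustration; tune them on your own prompts.
    """
    duration = len(samples) / sample_rate
    # Speech faster than max_words_per_second is treated as implausible.
    min_duration = len(text.split()) / max_words_per_second
    too_short = duration < min_duration
    silent_tail = trailing_silence_seconds(samples, sample_rate) > max_trailing_silence
    return too_short or silent_tail
```

With the example above, `looks_truncated(wavs[0], sr, text)` would return `True` for a clip that stops after the first clause or pads the remainder with silence. A pass does not prove every word was spoken; it only rules out the most obvious clipping.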
| --- | |
| ## What `instruct` understands | |
| | Axis | Working values | | |
| |------|----------------| | |
| | Gender | male, female | | |
| | Pitch | deep, low, medium, high, thin | | |
| | Pace | slow, halting, moderate, brisk, fast | | |
| | Affect | neutral, happy, sad, angry, fearful, urgent, calm, projected, whispered, sarcastic | | |
| | Persona | bedtime storyteller, news anchor, sports announcer, stern parent, weary narrator | | |
| Lead with gender on emotion-heavy prompts to avoid timbre drift. | |
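The table and the gender-first advice can be turned into a small prompt builder. This is a minimal sketch: `build_instruct` and its phrase template are hypothetical, since the card only states that `instruct` is free-form English, so any wording that keeps the gender term first should serve equally well.

```python
def build_instruct(gender, pitch=None, pace=None, affect=None, persona=None):
    """Compose an instruct string that always leads with the gender term.

    The template is an assumption for illustration, not a format the model requires.
    """
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    parts = [f"{gender.capitalize()} voice"]
    if pitch:
        parts.append(f"{pitch} pitch")
    if pace:
        parts.append(f"{pace} pace")
    if affect:
        parts.append(f"{affect} delivery")
    if persona:
        parts.append(f"in the style of a {persona}")
    return ", ".join(parts) + "."


print(build_instruct("female", pitch="low", pace="slow",
                     affect="sad", persona="weary narrator"))
```

The resulting string can be passed as the `instruct` argument of `generate_voice_design`.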
| --- | |
| ## Caveats | |
| - English only: other languages were not part of this checkpoint's adaptation set. | |
| - Strongly expressive reads (drawn-out sad reads, projected announcer reads) may be transcribed slightly less accurately by automatic speech recognition than the same reads from the base model. The trade-off was made deliberately for delivery character. | |
| - CC BY-NC-SA 4.0: research and non-commercial use only. | |
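To put a number on the transcription caveat for your own prompts, one common measure is word error rate (WER) between the input text and an ASR transcript of the generated audio. The sketch below assumes you already have a transcript from a speech recognizer of your choice; the card does not name one, and the function names and normalization rule here are illustrative.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation so only word identity is compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)
```

Comparing the WER of the same prompts rendered by this checkpoint and by the base model gives a like-for-like view of the trade-off.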
| --- | |
| ## What's in the repo | |
| - `model.safetensors`: merged Talker weights | |
| - `speech_tokenizer/`: Qwen3 12 Hz audio codec | |
| - `tokenizer.json`, `vocab.json`, `merges.txt`, configs: text-side assets | |
| - `miner.py`, `chute_config.yml`, `vocence_config.yaml`: Vocence engine glue (TEE / pro_6000) | |
| - `demo.py`: quick smoke test | |
| The Vocence files make this repo deployable on **Bittensor SN78 (Vocence)** via the canonical Vocence/Chutes wrapper without modification. | |
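Before deploying from a local checkout, it can help to confirm that the files listed above are present. The sketch below is a hypothetical helper, not something shipped in the repo; it checks only the names spelled out in the list and leaves out the unnamed config files.

```python
from pathlib import Path

# Names taken from the list above; the unnamed "configs" are deliberately not guessed.
EXPECTED = [
    "model.safetensors",
    "speech_tokenizer",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "demo.py",
]


def missing_repo_files(repo_dir, expected=EXPECTED):
    """Return the expected names that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).exists()]
```

An empty list from `missing_repo_files("path/to/checkout")` means every named file and directory was found.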