vocence_miner_v8

A prompt-driven TTS model fine-tuned by fxstar from Qwen3-TTS: you describe the voice in plain English and the model speaks the text in that voice. Two things distinguish this checkpoint:

British English coverage. Phrasings like "A man with a British English accent", "A Scottish woman, conversational", "a Welsh narrator" land on a real distribution rather than slipping back to neutral US English.
Conversational subtlety. Tuned for everyday delivery — "speaking warmly", "softly sad", "with a touch of anger, controlled" — rather than theatrical intensity. The model deliberately steps back when you don't ask for drama.

24 kHz mono WAV output, single forward call, no reference audio, no PEFT runtime. Everything ships in this repo.

Training & Advanced Features

This model was trained by fxstar using:

Base Model: Qwen3-TTS (Alibaba's state-of-the-art TTS architecture)
Advanced Training Techniques: Optimized hyperparameters, custom dataset curation, and fine-tuning for naturalness
Specialized Features:
- Enhanced British English accent modeling (Scottish, Welsh, Northern Irish variations)
- Conversational subtlety optimization for everyday speech patterns
- Emotion and persona control without theatrical exaggeration
- High-quality audio synthesis with minimal artifacts

Generate

pip install qwen-tts transformers torch soundfile

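from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the checkpoint by repo id.
m = Qwen3TTSModel.from_pretrained("fxstar1128/training-04")

# `text` is what gets spoken; `instruct` describes the voice.
wavs, sr = m.generate_voice_design(
    text="The train to Edinburgh departs from platform four.",
    instruct="A man with a British English accent, calm and natural.",
    language="english",
)
# Save the first returned waveform at the sample rate the model reports.
sf.write("out.wav", wavs[0], sr)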

demo.py walks through three preset prompts.
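
To compare several voice descriptions on the same sentence, a small loop over `generate_voice_design` is enough. The sketch below is not the contents of demo.py (which are not reproduced here); `synthesize_variants` is a hypothetical helper name, and it assumes only the call shape shown in the snippet above: keyword arguments `text`, `instruct`, `language`, returning an indexable batch of waveforms plus a sample rate.

```python
from typing import Any, Iterable, List, Tuple


def synthesize_variants(
    model: Any,
    text: str,
    instructs: Iterable[str],
    language: str = "english",
) -> List[Tuple[str, Any, int]]:
    """Render the same sentence once per instruct prompt.

    `model` is any object exposing generate_voice_design(text=, instruct=,
    language=) that returns (wavs, sample_rate), as in the example above.
    """
    results = []
    for instruct in instructs:
        wavs, sr = model.generate_voice_design(
            text=text, instruct=instruct, language=language
        )
        # Keep the first waveform of each call alongside the prompt that made it.
        results.append((instruct, wavs[0], sr))
    return results
```

With the model loaded as `m`, each returned tuple can be written out with `sf.write(f"out_{i}.wav", wav, sr)` inside an `enumerate` loop.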

How to write `instruct`

The model responds best to subtle, conversational language — not intensifiers like "intensely sad" or "nearly shouting". Stack these elements freely:

Layer	Phrasings
Accent / region	British English, Scottish, Welsh, Northern Irish, Irish, unspecified
Gender	a man, a woman, a British woman
Mood	speaking warmly, softly sad, quietly pleased, with a touch of anger
Persona	bedtime storyteller, soft and warm; news anchor, professional and neutral; meditation guide, soft and serene
Pace	unhurried, brisk steady, naturally measured

Some example prompts that work well:

A British man speaks calmly and naturally.
A woman with a Scottish accent, in an everyday speaking tone.
A man, softly sad, calm and unhurried.
A British news anchor, professional and neutral, at a brisk steady pace.
A clear, neutral voice reading the sentence.
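
If you generate prompts programmatically, the layers in the table can be joined into one sentence. This is a minimal sketch, not part of the repo: `build_instruct` and its parameter names are hypothetical, and the comma-separated pattern simply mirrors the example prompts above. A persona goes in `speaker`, since the examples place it in subject position.

```python
from typing import Optional


def build_instruct(
    speaker: str = "A man",
    accent: Optional[str] = None,
    mood: Optional[str] = None,
    pace: Optional[str] = None,
) -> str:
    """Join optional layers into one comma-separated instruct sentence."""
    # Attach the accent to the speaker, e.g. "A woman with a Scottish accent".
    subject = f"{speaker} with a {accent} accent" if accent else speaker
    # Append only the layers that were actually supplied.
    parts = [subject] + [layer for layer in (mood, pace) if layer]
    return ", ".join(parts) + "."


print(build_instruct("A woman", accent="Scottish", mood="speaking warmly", pace="unhurried"))
# A woman with a Scottish accent, speaking warmly, unhurried.
```

Leaving a layer as `None` drops it from the sentence, which matches the table's "unspecified" option for accent.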

Best-fit and not-fit

Best at:

Natural, everyday English — both US and UK
Bedtime storyteller / news anchor / meditation guide style reads
Conversational sadness, warmth, mild anger, gentle pleasure

Less suited for:

Theatrical / caricatured delivery (loud anger, shouted joy, dramatic sadness)
Extreme intensifier prompts ("nearly shouting", "intensely sad") — the model intentionally tones these down
Languages other than English
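
Because intensifier-heavy prompts are toned down rather than rejected, it can help to flag them before synthesis. The check below is a hedged sketch and not part of the repo: `find_intensifiers` is a hypothetical helper, and its watch-list contains only words taken from the "Less suited for" items above, so extend it to suit your own prompts.

```python
import re
from typing import List

# Hypothetical watch-list, drawn from the phrasings this card calls out.
INTENSIFIERS = ("intensely", "shouting", "shouted", "loud", "dramatic")


def find_intensifiers(instruct: str) -> List[str]:
    """Return the watch-list words that occur as whole words in the prompt."""
    lowered = instruct.lower()
    return [
        term
        for term in INTENSIFIERS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]


print(find_intensifiers("A man, intensely sad, nearly shouting."))
# ['intensely', 'shouting']
print(find_intensifiers("A man, softly sad, calm and unhurried."))
# []
```

A non-empty result is only a hint that the delivery may come out milder than the prompt suggests.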

Credits

Developer: fxstar
Base Architecture: Qwen3-TTS by Alibaba
Training Framework: Custom optimization pipeline

License

CC BY-NC-SA 4.0 — research and non-commercial use only.

Files

model.safetensors            # merged Talker weights (3.6 GB)
speech_tokenizer/            # Qwen3 12 Hz audio codec (~650 MB)
tokenizer.json + ...         # text tokenizer
config.json + ...            # model configs
miner.py                     # Vocence engine
chute_config.yml             # Chutes build (TEE / pro_6000)
vocence_config.yaml          # runtime knobs
demo.py                      # quick smoke test

The Vocence files make this repo deployable on Bittensor SN78 (Vocence) via the canonical Vocence/Chutes wrapper without modification.

Safetensors

Model size: 2B params
Tensor type: BF16