vocence_miner_v8

A naturalness-first, prompt-driven text-to-speech (TTS) model, built on top of magma90909/vocence_miner_v7. Two things distinguish this checkpoint:

- British English coverage. Prompts such as "A man with a British English accent", "A Scottish woman, conversational", or "a Welsh narrator" produce the requested accent instead of drifting back to neutral US English.
- Conversational subtlety. Tuned for everyday delivery ("speaking warmly", "softly sad", "with a touch of anger, controlled") rather than theatrical intensity. When you don't ask for drama, the model deliberately holds back.

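Output is 24 kHz mono audio, saved as WAV in the example below. Generation takes a single `generate_voice_design` call with no reference audio, and PEFT is not needed at runtime because the fine-tuned Talker weights are already merged. Everything you need is in this repo.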

Generate

```bash
pip install qwen-tts transformers torch soundfile
```

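```python
from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load the merged checkpoint (Talker weights + speech tokenizer) from the Hub.
m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v8")

# Voice design: `text` is what gets spoken, `instruct` describes the voice.
wavs, sr = m.generate_voice_design(
    text="The train to Edinburgh departs from platform four.",
    instruct="A man with a British English accent, calm and natural.",
    language="english",
)

# Save the first (here the only) waveform; sr is the output sample rate.
sf.write("out.wav", wavs[0], sr)
```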
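If you would rather not depend on `soundfile`, the sketch below writes the same 24 kHz mono output as 16-bit PCM WAV using only Python's standard `wave` module plus NumPy. It assumes `wavs[0]` is a 1-D float array scaled to [-1, 1]; this card does not state the array type, so check that against your installed `qwen-tts` version. `write_pcm16_wav` is a hypothetical helper, not part of `qwen-tts` or this repo.

```python
import wave

import numpy as np


def write_pcm16_wav(path: str, wav: np.ndarray, sr: int = 24_000) -> float:
    """Write a mono float waveform as 16-bit PCM WAV and return its duration in seconds.

    Assumes (hypothetically) that `wav` is 1-D float audio in [-1, 1].
    """
    samples = np.asarray(wav, dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError(f"expected a mono (1-D) waveform, got shape {samples.shape}")
    # Clip before scaling so out-of-range peaks saturate instead of wrapping around.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit
        f.setframerate(sr)  # 24 kHz for this model
        f.writeframes(pcm.tobytes())
    return len(samples) / sr
```

For example, `duration = write_pcm16_wav("out.wav", wavs[0], sr)` stands in for the `sf.write` line and returns the clip length in seconds. Clipping trades a little accuracy on rare out-of-range peaks for never producing wrap-around distortion.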

demo.py walks through three preset prompts.

How to write `instruct`

The model responds best to subtle, conversational language rather than intensifiers such as "intensely sad" or "nearly shouting". Combine the layers below freely:

| Layer | Example phrasings |
|---|---|
| Accent / region | British English, Scottish, Welsh, Northern Irish, Irish, unspecified |
| Gender | a man, a woman, a British woman |
| Mood | speaking warmly, softly sad, quietly pleased, with a touch of anger |
| Persona | bedtime storyteller, soft and warm; news anchor, professional and neutral; meditation guide, soft and serene |
| Pace | unhurried, brisk steady, naturally measured |
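The layers are plain text, so composing them is ordinary string joining. The hypothetical `build_instruct` helper sketched here (not part of `qwen-tts` or this repo) puts gender and accent into a single `speaker` phrase and appends persona, mood, and pace as comma-separated clauses, which matches the shape of this card's example prompts.

```python
from typing import Optional


def build_instruct(
    speaker: str,
    *,
    persona: Optional[str] = None,
    mood: Optional[str] = None,
    pace: Optional[str] = None,
) -> str:
    """Join prompt layers into one sentence-style `instruct` string.

    `speaker` carries gender and, optionally, accent/region
    (e.g. "a woman with a Scottish accent"); missing layers are skipped.
    """
    cleaned = []
    for layer in (speaker, persona, mood, pace):
        # Skip missing layers; drop stray whitespace and trailing punctuation.
        text = (layer or "").strip().rstrip(".,").strip()
        if text:
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one non-empty layer is required")
    sentence = ", ".join(cleaned)
    # Capitalize only the first character; str.capitalize() would lowercase "British".
    return sentence[0].upper() + sentence[1:] + "."
```

For example, `build_instruct("a man", mood="softly sad", pace="calm and unhurried")` returns `"A man, softly sad, calm and unhurried."`, ready to pass as `instruct=`. Keeping the helper to simple joining means any phrasing in the table can be dropped in unchanged.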

Some example prompts that work well:

- A British man speaks calmly and naturally.
- A woman with a Scottish accent, in an everyday speaking tone.
- A man, softly sad, calm and unhurried.
- A British news anchor, professional and neutral, at a brisk steady pace.
- A clear, neutral voice reading the sentence.

Best-fit and not-fit

Best at:

- Natural, everyday English, in both US and UK accents
- Bedtime-storyteller, news-anchor, and meditation-guide style reads
- Conversational sadness, warmth, mild anger, and gentle pleasure

Less suited for:

- Theatrical or caricatured delivery (loud anger, shouted joy, dramatic sadness)
- Extreme intensifier prompts ("nearly shouting", "intensely sad"); the model intentionally tones these down
- Languages other than English
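Because strong intensifiers get toned down anyway, it can help to catch them before a request is sent. The following is a hypothetical pre-flight check (not part of `qwen-tts` or the Vocence engine); its phrase list is copied from the not-fit examples on this card and is an assumption to extend, not an exhaustive list of what the model softens.

```python
import re
from typing import Iterable, List

# Phrasings this card lists as poor fits (assumed starting list; extend as needed).
DISCOURAGED_PHRASES = (
    "nearly shouting",
    "intensely sad",
    "loud anger",
    "shouted joy",
    "dramatic sadness",
)


def find_intensifiers(instruct: str, phrases: Iterable[str] = DISCOURAGED_PHRASES) -> List[str]:
    """Return the discouraged phrases found in `instruct`, in list order."""
    lowered = instruct.lower()
    found = []
    for phrase in phrases:
        # Word boundaries stop "intensely sad" from matching "intensely saddened".
        if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered):
            found.append(phrase)
    return found
```

An empty list means none of the listed phrasings appear. A match is a hint to rephrase toward the conversational mood phrasings (for example "softly sad" instead of "intensely sad"), not a hard error.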

License: CC BY-NC-SA 4.0 (research and non-commercial use only).

Files

```text
model.safetensors            # merged Talker weights (3.6 GB)
speech_tokenizer/            # Qwen3 12 Hz audio codec (~650 MB)
tokenizer.json + ...         # text tokenizer
config.json + ...            # model configs
miner.py                     # Vocence engine
chute_config.yml             # Chutes build (TEE / pro_6000)
vocence_config.yaml          # runtime knobs
demo.py                      # quick smoke test
```
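Before building or deploying, a quick check that a local copy of the repo contains the top-level entries in this listing can save a failed build. This is a hypothetical sketch: it checks only the named files and the `speech_tokenizer/` directory, and skips the extra tokenizer and config files elided as "..." in the listing.

```python
from pathlib import Path
from typing import List, Sequence

# Top-level entries named in the file listing; the "..." extras are not enumerated.
EXPECTED_ENTRIES = (
    "model.safetensors",
    "speech_tokenizer",
    "tokenizer.json",
    "config.json",
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "demo.py",
)


def missing_entries(repo_dir: str, expected: Sequence[str] = EXPECTED_ENTRIES) -> List[str]:
    """Return the expected files/directories that are absent from `repo_dir`."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).exists()]
```

`repo_dir` can be a local clone or, for example, the path returned by `huggingface_hub.snapshot_download("magma90909/vocence_miner_v8")`; an empty result means every listed entry is present.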

The Vocence files (`miner.py`, `chute_config.yml`, `vocence_config.yaml`) let this repo be deployed unmodified on Bittensor SN78 (Vocence) through the canonical Vocence/Chutes wrapper.


Format: Safetensors · Model size: 2B params · Tensor type: BF16
