vocence_miner_v8

A naturalness-first, prompt-driven TTS model, built on top of magma90909/vocence_miner_v7. Two things distinguish this checkpoint:

  • British English coverage. Phrasings like "A man with a British English accent", "A Scottish woman, conversational", "a Welsh narrator" land on a real distribution rather than slipping back to neutral US English.
  • Conversational subtlety. Tuned for everyday delivery ("speaking warmly", "softly sad", "with a touch of anger, controlled") rather than theatrical intensity. The model deliberately steps back when you don't ask for drama.

24 kHz mono WAV output, single forward call, no reference audio, no PEFT runtime. Everything ships in this repo.
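To confirm that a generated file matches the stated 24 kHz mono format, a small check with Python's standard `wave` module is one option. This is a minimal sketch, not part of the repo: the helper names `describe_wav` and `is_expected_format` are hypothetical, and it assumes the file was written as integer PCM (the `wave` module does not read floating-point WAV data).

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return sample_rate, channels, frames / sample_rate


def is_expected_format(path, sample_rate=24000, channels=1):
    """True if the file is 24 kHz mono (the format stated on this card)."""
    actual_rate, actual_channels, _ = describe_wav(path)
    return actual_rate == sample_rate and actual_channels == channels
```

Calling `is_expected_format("out.wav")` on a file produced by the Generate snippet should return `True` if the output was saved as PCM.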

Generate

pip install qwen-tts transformers torch soundfile

from qwen_tts import Qwen3TTSModel
import soundfile as sf

m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v8")

wavs, sr = m.generate_voice_design(
    text="The train to Edinburgh departs from platform four.",
    instruct="A man with a British English accent, calm and natural.",
    language="english",
)
sf.write("out.wav", wavs[0], sr)

demo.py walks through three preset prompts.
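The contents of demo.py are not reproduced here, but looping over several prompts could look roughly like the sketch below. The helper `render_prompts` is hypothetical; it only assumes a model object exposing `generate_voice_design(text=..., instruct=..., language=...)` with the return shape used in the snippet above.

```python
def render_prompts(model, text, instructs, language="english"):
    """Generate one clip per instruct string.

    Returns a list of (instruct, waveform, sample_rate) tuples.
    `model` is assumed to behave like the Qwen3TTSModel instance `m` above.
    """
    results = []
    for instruct in instructs:
        wavs, sr = model.generate_voice_design(
            text=text,
            instruct=instruct,
            language=language,
        )
        # Keep only the first waveform, as the single-prompt example does.
        results.append((instruct, wavs[0], sr))
    return results


# Possible usage with the objects from the Generate snippet:
#   clips = render_prompts(m, "Good evening.", ["A British man speaks calmly and naturally."])
#   for i, (_, wav, sr) in enumerate(clips):
#       sf.write(f"out_{i}.wav", wav, sr)
```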

How to write instruct

The model responds best to subtle, conversational language, not intensifiers like "intensely sad" or "nearly shouting". Stack these elements freely:

Layer            Phrasings
Accent / region  British English, Scottish, Welsh, Northern Irish, Irish, unspecified
Gender           a man, a woman, a British woman
Mood             speaking warmly, softly sad, quietly pleased, with a touch of anger
Persona          bedtime storyteller, soft and warm; news anchor, professional and neutral; meditation guide, soft and serene
Pace             unhurried, brisk steady, naturally measured
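One way to stack these layers programmatically is a small string builder. This is an illustrative sketch only: `build_instruct` is a hypothetical helper, and the comma-joined sentence shape is an assumption modelled on the example prompts, not a format the model requires.

```python
def build_instruct(speaker="A man", accent=None, mood=None, persona=None, pace=None):
    """Compose an instruct string from optional layers.

    Layers that are None are skipped; the rest are joined with commas.
    """
    subject = speaker
    if accent:
        subject = f"{speaker} with a {accent} accent"
    parts = [subject]
    for layer in (persona, mood, pace):
        if layer:
            parts.append(layer)
    return ", ".join(parts) + "."


print(build_instruct("A woman", accent="Scottish", mood="speaking warmly", pace="unhurried"))
# -> A woman with a Scottish accent, speaking warmly, unhurried.
```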

Some example prompts that work well:

A British man speaks calmly and naturally.
A woman with a Scottish accent, in an everyday speaking tone.
A man, softly sad, calm and unhurried.
A British news anchor, professional and neutral, at a brisk steady pace.
A clear, neutral voice reading the sentence.

Best-fit and not-fit

Best at:

  • Natural, everyday English, both US and UK
  • Bedtime storyteller / news anchor / meditation guide style reads
  • Conversational sadness, warmth, mild anger, gentle pleasure

Less suited for:

  • Theatrical / caricatured delivery (loud anger, shouted joy, dramatic sadness)
  • Extreme intensifier prompts ("nearly shouting", "intensely sad"); the model intentionally tones these down
  • Languages other than English
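Because intensifier wording gets toned down, it can help to flag such phrases before sending a prompt. The sketch below is a hypothetical pre-check: the term list is illustrative, drawn from the examples on this card, and is not an exhaustive or official vocabulary.

```python
import re

# Illustrative terms taken from the "less suited" examples above (an assumption,
# not a list published with the model).
STRONG_TERMS = ("intensely", "nearly shouting", "shouted", "loud", "dramatic")


def find_strong_terms(instruct):
    """Return the strong terms that appear as whole words in an instruct string."""
    lowered = instruct.lower()
    return [
        term
        for term in STRONG_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


print(find_strong_terms("A man, intensely sad, nearly shouting."))
# -> ['intensely', 'nearly shouting']
```

An empty result does not guarantee a good prompt; it only means none of the listed terms were found.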

License: CC BY-NC-SA 4.0 (research and non-commercial use only).

Files

model.safetensors            # merged Talker weights (3.6 GB)
speech_tokenizer/            # Qwen3 12 Hz audio codec (~650 MB)
tokenizer.json + ...         # text tokenizer
config.json + ...            # model configs
miner.py                     # Vocence engine
chute_config.yml             # Chutes build (TEE / pro_6000)
vocence_config.yaml          # runtime knobs
demo.py                      # quick smoke test

The Vocence files make this repo deployable on Bittensor SN78 (Vocence) via the canonical Vocence/Chutes wrapper without modification.
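Before deploying from a local clone, a quick presence check on the named files can catch an incomplete download. This is a minimal sketch: `missing_entries` is a hypothetical helper, and it covers only the entries named explicitly in the listing above, not the files elided with "+ ...".

```python
from pathlib import Path

# Entries named explicitly in the Files listing; elided files are not checked.
EXPECTED = (
    "model.safetensors",
    "speech_tokenizer",
    "tokenizer.json",
    "config.json",
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "demo.py",
)


def missing_entries(root):
    """Return the expected files or directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

An empty list from `missing_entries("path/to/clone")` means every named entry exists; it does not validate file contents or sizes.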

Safetensors: model size 2B params, tensor type BF16