aiseosae/2026-TTS

	---
	license: cc-by-nc-sa-4.0
	base_model: magma90909/vocence_miner_v7
	pipeline_tag: text-to-speech
	library_name: transformers
	language:
	- en
	tags:
	- tts
	- prompttts
	- qwen3-tts
	- voice-design
	- vocence
	- british-english
	- uk-accent
	---

	# vocence_miner_v8

	A naturalness-first, prompt-driven TTS model built on top of `magma90909/vocence_miner_v7`. Two things distinguish this checkpoint:

	* British English coverage. Prompts such as "A man with a British English accent", "A Scottish woman, conversational", or "a Welsh narrator" land on a real accent distribution rather than slipping back to neutral US English.
	* Conversational subtlety. Tuned for everyday delivery ("speaking warmly", "softly sad", "with a touch of anger, controlled") rather than theatrical intensity. The model deliberately holds back when you don't ask for drama.

	Output is 24 kHz mono WAV from a single forward call, with no reference audio and no PEFT runtime. Everything ships in this repo.

	## Generate

	```bash
	pip install qwen-tts transformers torch soundfile
	```

	```python
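	from qwen_tts import Qwen3TTSModel
	import soundfile as sf

	# Load the checkpoint.
	m = Qwen3TTSModel.from_pretrained("magma90909/vocence_miner_v8")

	# `instruct` describes the voice in plain language; no reference audio is passed.
	wavs, sr = m.generate_voice_design(
	    text="The train to Edinburgh departs from platform four.",
	    instruct="A man with a British English accent, calm and natural.",
	    language="english",
	)

	# Save the first generated waveform at the sample rate the model returned.
	sf.write("out.wav", wavs[0], sr)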
	```

	`demo.py` walks through three preset prompts.
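
To confirm that a generated file matches the 24 kHz mono format this card describes, a small check with Python's standard `wave` module can help. This is a sketch rather than part of the repo: `describe_wav` and `matches_card_format` are hypothetical helper names, and it assumes `out.wav` was written as plain PCM, which `wave` can read.

```python
import wave

EXPECTED_RATE = 24_000   # 24 kHz, as stated in this card
EXPECTED_CHANNELS = 1    # mono


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def matches_card_format(path):
    """True if the file is 24 kHz mono, the format this card describes."""
    rate, channels, _ = describe_wav(path)
    return rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
```

Calling `matches_card_format("out.wav")` after the snippet above should return `True` if the output really is 24 kHz mono; `describe_wav` also reports the clip length in seconds.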

	## How to write `instruct`

	The model responds best to subtle, conversational language — not intensifiers like "intensely sad" or "nearly shouting". Stack these elements freely:

	| Layer | Phrasings |
	|-------|-----------|
	| Accent / region | British English, Scottish, Welsh, Northern Irish, Irish, unspecified |
	| Gender | a man, a woman, a British woman |
	| Mood | speaking warmly, softly sad, quietly pleased, with a touch of anger |
	| Persona | bedtime storyteller, soft and warm; news anchor, professional and neutral; meditation guide, soft and serene |
	| Pace | unhurried, brisk steady, naturally measured |

	Some example prompts that work well:

	```
	A British man speaks calmly and naturally.
	A woman with a Scottish accent, in an everyday speaking tone.
	A man, softly sad, calm and unhurried.
	A British news anchor, professional and neutral, at a brisk steady pace.
	A clear, neutral voice reading the sentence.
	```
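
The layers in the table can also be combined programmatically. The sketch below joins a speaker, an optional accent, and optional mood and pace phrases into one sentence shaped like the examples above. `build_instruct` is a hypothetical helper, and the exact sentence template is an assumption: the card only lists the layers and sample prompts, so templated prompts may not behave identically to hand-written ones.

```python
def build_instruct(speaker, accent=None, mood=None, pace=None):
    """Join the non-empty layers into one comma-separated instruct sentence."""
    head = speaker.strip()
    if accent and accent.strip():
        # Mirrors the "with a Scottish accent" phrasing used in the examples.
        head += f" with a {accent.strip()} accent"
    extras = [part.strip() for part in (mood, pace) if part and part.strip()]
    return ", ".join([head, *extras]) + "."


print(build_instruct("A woman", accent="Scottish", mood="speaking warmly", pace="unhurried"))
print(build_instruct("A man", mood="softly sad"))
```

The returned string would be passed as the `instruct` argument of `generate_voice_design`. A persona phrase such as "A news anchor, professional and neutral" can be supplied as the `speaker`.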

	## Best-fit and not-fit

	Best at:
	* Natural, everyday English — both US and UK
	* Bedtime storyteller / news anchor / meditation guide style reads
	* Conversational sadness, warmth, mild anger, gentle pleasure

	Less suited for:
	* Theatrical / caricatured delivery (loud anger, shouted joy, dramatic sadness)
	* Extreme intensifier prompts ("nearly shouting", "intensely sad") — the model intentionally tones these down
	* Languages other than English
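
Since the model tones down intensifier prompts, it can be useful to flag them before generating. The following is a minimal sketch, not a feature of the model or repo: `find_intensifiers` is a hypothetical helper, and its term list is taken only from the phrases quoted in the lists above, so it is illustrative rather than exhaustive.

```python
import re

# Terms drawn from this card's own examples ("intensely sad", "nearly shouting",
# "loud anger", "shouted joy", "dramatic sadness"); extend as needed.
FLAGGED_TERMS = ("intensely", "nearly shouting", "loud", "shouted", "dramatic")


def find_intensifiers(instruct):
    """Return the flagged terms found in the prompt, in FLAGGED_TERMS order."""
    hits = []
    for term in FLAGGED_TERMS:
        # Whole-word, case-insensitive match so "loud" does not hit "aloud".
        if re.search(rf"\b{re.escape(term)}\b", instruct, flags=re.IGNORECASE):
            hits.append(term)
    return hits
```

An empty result means none of the listed terms appear. A non-empty result suggests rephrasing toward the subtler wording in the "Best at" list, for example "with a touch of anger".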

	Licensed under CC BY-NC-SA 4.0: research and non-commercial use only.

	## Files

	```
	model.safetensors      # merged Talker weights (3.6 GB)
	speech_tokenizer/      # Qwen3 12 Hz audio codec (~650 MB)
	tokenizer.json + ...   # text tokenizer
	config.json + ...      # model configs
	miner.py               # Vocence engine
	chute_config.yml       # Chutes build (TEE / pro_6000)
	vocence_config.yaml    # runtime knobs
	demo.py                # quick smoke test
```
	```

	The Vocence files make this repo deployable on Bittensor SN78 (Vocence) via the canonical Vocence/Chutes wrapper without modification.
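
Before deploying, it may help to confirm that a local copy of the repo contains the entries named in the listing. This sketch checks only the explicitly named files and the `speech_tokenizer/` directory; the entries elided as `+ ...` are not guessed at. `missing_entries` is a hypothetical helper, and `repo_dir` is assumed to be wherever the repo was cloned or downloaded.

```python
from pathlib import Path

# Only the names spelled out in the Files listing; "+ ..." entries are skipped.
EXPECTED_FILES = [
    "model.safetensors",
    "tokenizer.json",
    "config.json",
    "miner.py",
    "chute_config.yml",
    "vocence_config.yaml",
    "demo.py",
]
EXPECTED_DIRS = ["speech_tokenizer"]


def missing_entries(repo_dir):
    """Return the expected files and directories absent from repo_dir."""
    root = Path(repo_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means every named entry is present. It does not verify file contents or sizes.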