Sad-tts-v1 (Vocence PromptTTS)

Sad-tts-v1 is a Vocence SN78 miner bundle built on the OmniVoice backbone and published on the Hub as KGSS/Sad-tts-v1. It takes a natural-language instruction plus text as input and returns a mono WAV via miner.py.

Base weights align with k2-fsa/OmniVoice · Apache-2.0 · omnivoice runtime. Pin commits in miner_deploy_sad_tts_v1.py and VOCENCE_HF.md.

Vocence contract

| Method | Role |
| --- | --- |
| Miner(path_hf_repo) | Load checkpoint from a directory (or HF snapshot) containing config.json, model.safetensors, tokenizers, and audio_tokenizer/. |
| warmup() | One short synthesis to prime the stack. |
| generate_wav(instruction, text) | Returns (float32 mono ndarray, sample_rate); typically 24 kHz. |
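
The contract above can be written down as a structural type, which makes it easy to check a miner class before deploying it. This is a minimal sketch, not the Vocence interface definition: the names PromptTTSMiner and SilentMiner are hypothetical, and the stub returns a plain list of zeros where the real miner.py returns a float32 ndarray.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class PromptTTSMiner(Protocol):
    """Hypothetical name for the two-method contract described above."""

    def warmup(self) -> None: ...

    def generate_wav(self, instruction: str, text: str) -> Tuple[Any, int]: ...


class SilentMiner:
    """Stand-in miner that emits 0.1 s of silence; for shape checks only."""

    SAMPLE_RATE = 24_000  # the README says output is typically 24 kHz

    def __init__(self, path_hf_repo):
        # A real miner would load config.json, model.safetensors, etc. here.
        self.path_hf_repo = path_hf_repo

    def warmup(self) -> None:
        # One short synthesis, mirroring the documented role of warmup().
        self.generate_wav("", "warmup")

    def generate_wav(self, instruction: str, text: str):
        samples = [0.0] * (self.SAMPLE_RATE // 10)
        return samples, self.SAMPLE_RATE


miner = SilentMiner("./OmniVoice_weights")
miner.warmup()
wav, sr = miner.generate_wav("Calm female voice.", "Hello.")
print(isinstance(miner, PromptTTSMiner), len(wav), sr)
```

The runtime isinstance check only confirms that both methods exist; it does not validate argument or return types.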

Validators send a free-form instruction. OmniVoice voice design accepts only whitelisted attribute tags, so miner.py maps keywords (gender, age, pitch, whisper, accent, Chinese dialects) to those tags. Instructions that match no keyword fall back to runtime.default_instruct in vocence_config.yaml.
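
The keyword-to-tag mapping with a default fallback can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in miner.py: the keyword table, the tag strings, and the DEFAULT_INSTRUCT value are all invented for illustration, and the real whitelist is defined by OmniVoice.

```python
import re

# Hypothetical keyword -> tag table; the actual whitelist and tag spelling
# come from OmniVoice and miner.py, not from this sketch.
KEYWORD_TAGS = {
    "female": "female",
    "woman": "female",
    "male": "male",
    "man": "male",
    "elderly": "elderly",
    "child": "child",
    "whisper": "whisper",
    "british": "british accent",
}

# Stand-in for runtime.default_instruct from vocence_config.yaml.
DEFAULT_INSTRUCT = "female, moderate pitch"


def map_instruction(instruction: str, default: str = DEFAULT_INSTRUCT) -> str:
    """Map a free-form instruction to a comma-separated tag string."""
    # Match whole words so that "female" does not also trigger "male".
    words = re.findall(r"[a-z]+", instruction.lower())
    tags = []
    for word in words:
        tag = KEYWORD_TAGS.get(word)
        if tag and tag not in tags:
            tags.append(tag)
    # No keyword matched: fall back to the configured default.
    return ", ".join(tags) if tags else default


print(map_instruction("Calm female voice, British accent."))
print(map_instruction("Read this like a pirate."))
```

Matching on whole words keeps the lookup simple but misses inflected forms such as "whispering"; a production mapping would likely need stemming or pattern rules.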

Repo layout

File Role
miner.py Engine + NL → instruct mapping
chute_config.yml Chutes image (PyTorch cu128 + omnivoice)
vocence_config.yaml Limits, default voice tags, num_step / guidance_scale
Weight files Shipped in this Hub repo (model.safetensors, audio_tokenizer/); see VOCENCE_HF.md

Local quick check (GPU)

pip install omnivoice torch torchaudio  # match CUDA index from chute_config.yml
# Copy snapshot: huggingface-cli download k2-fsa/OmniVoice --local-dir ./OmniVoice_weights
python -c "
from pathlib import Path
from miner import Miner
m = Miner(Path('./OmniVoice_weights'))
m.warmup()
w, sr = m.generate_wav('Calm female voice, British accent.', 'Hello from OmniVoice on Vocence.')
print(w.shape, sr)
"
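
To listen to the result of the quick check, the returned samples can be written to disk. The helper below is a sketch using only the Python standard library; save_wav is a hypothetical name, and it assumes the waveform is a one-dimensional sequence of floats in [-1, 1], which is how the contract describes the mono output.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        # Clip out-of-range values, then scale to signed 16-bit.
        clipped = max(-1.0, min(1.0, float(x)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
```

For example, after the quick check, save_wav('hello.wav', w, sr) would store the synthesized audio. The per-sample loop is slow for long clips; a NumPy-based conversion would be faster if that matters.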

License

Apache-2.0 for this packaging layout and miner glue; OmniVoice weights and upstream code remain under their stated licenses on the model card.
