Sad-tts-v1 (Vocence PromptTTS)

Sad-tts-v1 is a Vocence SN78 miner bundle built on the OmniVoice backbone, published on the Hub as KGSS/Sad-tts-v1. It takes a natural-language instruction plus text as input and returns mono WAV audio via miner.py.

Base weights align with k2-fsa/OmniVoice (Apache-2.0) and use the omnivoice runtime. Pin the commits in miner_deploy_sad_tts_v1.py and VOCENCE_HF.md.

Vocence contract

| Method | Role |
| --- | --- |
| `Miner(path_hf_repo)` | Load checkpoint from a directory (or HF snapshot) containing `config.json`, `model.safetensors`, tokenizers, and `audio_tokenizer/`. |
| `warmup()` | One short synthesis to prime the stack. |
| `generate_wav(instruction, text)` | Returns `(float32 mono ndarray, sample_rate)`; typically 24 kHz. |

Validators send free-form instructions. OmniVoice voice design accepts only whitelisted attribute tags, so miner.py maps keywords (gender, age, pitch, whisper, accent, Chinese dialects) to those tags. Instructions that match no keyword fall back to runtime.default_instruct in vocence_config.yaml.
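The keyword mapping described above could look roughly like the following. This is a minimal sketch, not the code in miner.py: the `KEYWORD_TAGS` table, the tag strings, the comma-joined output format, and the `map_instruction` name are all hypothetical, chosen only to illustrate whole-word keyword matching with a default fallback.

```python
import re

# Hypothetical keyword -> tag table for illustration only.
# The real whitelist and keyword set live in miner.py and the omnivoice runtime.
KEYWORD_TAGS = {
    "female": "female",
    "woman": "female",
    "male": "male",
    "man": "male",
    "elderly": "elderly",
    "young": "young",
    "whisper": "whisper",
    "british": "british accent",
    "american": "american accent",
}


def map_instruction(instruction: str, default_instruct: str) -> str:
    """Map a free-form instruction to a comma-joined tag string.

    Falls back to default_instruct when no keyword matches.
    """
    text = instruction.lower()
    tags = []
    for keyword, tag in KEYWORD_TAGS.items():
        # Word boundaries keep "male" from matching inside "female".
        if re.search(rf"\b{re.escape(keyword)}\b", text) and tag not in tags:
            tags.append(tag)
    return ", ".join(tags) if tags else default_instruct
```

With this table, `map_instruction("Calm female voice, British accent.", "neutral")` yields `"female, british accent"`, while an instruction with no known keyword returns the default unchanged.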

Repo layout

| File | Role |
| --- | --- |
| `miner.py` | Engine + NL → `instruct` mapping |
| `chute_config.yml` | Chutes image (PyTorch cu128 + `omnivoice`) |
| `vocence_config.yaml` | Limits, default voice tags, `num_step` / `guidance_scale` |
| Weight files | Shipped in this Hub repo (`model.safetensors`, `audio_tokenizer/`); see VOCENCE_HF.md |

Local quick check (GPU)

pip install omnivoice torch torchaudio  # match CUDA index from chute_config.yml
# Copy snapshot: huggingface-cli download k2-fsa/OmniVoice --local-dir ./OmniVoice_weights
python -c "
from pathlib import Path
from miner import Miner
m = Miner(Path('./OmniVoice_weights'))
m.warmup()
w, sr = m.generate_wav('Calm female voice, British accent.', 'Hello from OmniVoice on Vocence.')
print(w.shape, sr)
"
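To listen to the result, the returned samples can be written to a WAV file. The sketch below uses only the Python standard library; the `save_wav` helper is hypothetical and not part of this repo, and it assumes the samples lie in the range [-1, 1] (values outside are clipped), which the source does not state.

```python
import sys
import wave
from array import array
from typing import Iterable


def save_wav(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples as 16-bit PCM WAV; return the frame count."""
    pcm = array("h")  # signed 16-bit integers
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # assumed range [-1, 1]
        pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

For example, after the quick check above, `save_wav("hello.wav", w, sr)` would store the synthesized audio at the sample rate the miner reported.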

License

Apache-2.0 for this packaging layout and miner glue; OmniVoice weights and upstream code remain under their stated licenses on the model card.
