Add 23 pre-converted voice GGUFs (DE/EN/FR/IN/IT/JP/KR/NL/PL/PT/SP)

#1
by lokegud - opened

Adds the 23 voice GGUFs not currently in this repo, pre-converted from the upstream Microsoft VibeVoice demo voices in microsoft/VibeVoice/demo/voices/streaming_model/.

The repo previously had two voices: voice-en-Carter_man.gguf and voice-en-Emma.gguf. After this PR, all 25 voices ship pre-converted: anyone using vibevoice.cpp can run hf download mudler/vibevoice.cpp-models --local-dir models and have every voice ready, with no torch install and no conversion script run required.
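
For anyone who wants to confirm the download landed, here is a minimal sketch of a sanity check. The helper names are hypothetical and not part of vibevoice.cpp; the sketch assumes the voice files sit directly in the target directory and match voice-*.gguf, as the two existing files do.

```python
from pathlib import Path


def list_voice_files(models_dir):
    """Return the sorted names of files matching voice-*.gguf in models_dir."""
    return sorted(
        p.name for p in Path(models_dir).glob("voice-*.gguf") if p.is_file()
    )


def count_voices_by_language(models_dir):
    """Count voices per language code, read from voice-<lang>-... file names."""
    counts = {}
    for name in list_voice_files(models_dir):
        # "voice-en-Emma.gguf" -> ["voice", "en", "Emma.gguf"]
        parts = name.split("-", 2)
        if len(parts) == 3:
            counts[parts[1]] = counts.get(parts[1], 0) + 1
    return counts
```

If the counts in this PR hold, len(list_voice_files("models")) should come out at 25 after a full download.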

Voices added

  • EN (4): Davis (m), Frank (m), Grace (w), Mike (m)
  • DE: Spk0 (m), Spk1 (w)
  • FR: Spk0 (m), Spk1 (w)
  • IT: Spk0 (w), Spk1 (m)
  • SP: Spk0 (w), Spk1 (m)
  • PT: Spk0 (w), Spk1 (m)
  • NL: Spk0 (m), Spk1 (w)
  • PL: Spk0 (m), Spk1 (w)
  • JP: Spk0 (m), Spk1 (w)
  • KR: Spk0 (w), Spk1 (m)
  • IN: Samuel (m)

Total ~176 MB across 23 files.

Conversion details

  • Script: scripts/convert_voice_to_gguf.py from mudler/vibevoice.cpp (this repo's tooling project)
  • Source: https://github.com/microsoft/VibeVoice/raw/main/demo/voices/streaming_model/<name>.pt
  • Smoke-tested four voices (PL-Spk1, FR-Spk1, JP-Spk1, IN-Samuel) end-to-end with English text and the realtime-0.5B-q8_0 model. All four produce valid 24 kHz mono PCM WAVs. Foreign-language voices speak English with their native accent, as expected: the voice GGUF carries timbre and prosody, while the model handles language.
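
The "valid 24 kHz mono PCM WAV" check can be reproduced with something like the sketch below. It is an illustration, not the exact check used for this PR: check_wav is a hypothetical helper, and it relies only on Python's standard wave module, which reads uncompressed PCM files, so a successful open already suggests PCM data.

```python
import wave


def check_wav(path, expected_rate=24000, expected_channels=1):
    """Return (ok, info); ok is True when the WAV header matches expectations."""
    with wave.open(str(path), "rb") as wf:
        info = {
            "rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frames": wf.getnframes(),
        }
    ok = (
        info["rate"] == expected_rate
        and info["channels"] == expected_channels
        and info["frames"] > 0  # an empty file is not a useful smoke-test result
    )
    return ok, info
```

The sample width is reported but not enforced, since the bit depth of the generated files is not stated here.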

Naming convention

Kept upstream's naming: <lang>-<name>_<gender>.pt becomes voice-<lang>-<name>_<gender>.gguf. Note that the existing voice-en-Emma.gguf in this repo dropped the _woman suffix, while voice-en-Carter_man.gguf kept its suffix. If you'd like all files normalized one way (with or without the gender suffix), happy to follow up.
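
The mapping can be sketched as a small function. This is an illustration only, not code from scripts/convert_voice_to_gguf.py: gguf_name and its keep_gender flag are hypothetical, and the regex assumes the gender suffix is always _man or _woman, the two forms mentioned in this PR.

```python
import re

# Assumed upstream pattern: <lang>-<name>_<gender>.pt, with gender man or woman.
_UPSTREAM = re.compile(
    r"^(?P<lang>[A-Za-z]+)-(?P<name>[^_]+)_(?P<gender>man|woman)\.pt$"
)


def gguf_name(upstream_filename, keep_gender=True):
    """Map an upstream .pt voice file name to its voice-*.gguf name."""
    m = _UPSTREAM.match(upstream_filename)
    if m is None:
        raise ValueError(f"unexpected voice file name: {upstream_filename!r}")
    stem = f"{m['lang']}-{m['name']}"
    if keep_gender:
        stem += f"_{m['gender']}"
    return f"voice-{stem}.gguf"
```

With keep_gender=False the function produces the suffix-free form that voice-en-Emma.gguf uses, so either normalization is a single-flag change.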

Licensing

Voices are part of the official Microsoft VibeVoice demo distribution (MIT). The conversion tooling here is also MIT. Conversion is a straightforward derivative work and introduces no new license obligations.
