KugelAudio-0-Open GGUF

GGUF conversions of kugelaudio/kugelaudio-0-open for use with CrispASR.

Files

| File | Quant | Size | Notes |
|---|---|---|---|
| kugelaudio-0-open-f16.gguf | F16 | 16.1 GB | Unquantized 16-bit float, inference-only (no encoders) |
| kugelaudio-0-open-q4_k.gguf | Q4_K | 5.3 GB | 4-bit quantized, inference-only (no encoders) |

Both files contain the inference-only components (--no-encoders): Qwen2.5-7B language model, 4-layer DiT diffusion head, acoustic connector, and acoustic VAE decoder. Encoder weights (for voice cloning from raw audio) are omitted since the open-source release only supports pre-encoded voices.
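As a rough sanity check on the sizes listed above, the following sketch derives the parameter count implied by the F16 file and the average bits per weight of the Q4_K file. It assumes that "GB" means 10^9 bytes and that the F16 file consists almost entirely of 2-byte weights (metadata overhead is ignored), so the results are estimates only.

```python
# Back-of-envelope arithmetic from the file sizes in the Files table.
# Assumptions (not stated in the model card): "GB" means 10**9 bytes, and
# the F16 file is almost entirely 2-byte weights (metadata is ignored).

F16_BYTES = 16.1e9   # kugelaudio-0-open-f16.gguf
Q4_K_BYTES = 5.3e9   # kugelaudio-0-open-q4_k.gguf


def implied_params(f16_bytes: float) -> float:
    """Parameter count implied by an all-F16 file (2 bytes per weight)."""
    return f16_bytes / 2


def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average storage cost per weight, in bits."""
    return file_bytes * 8 / n_params


params = implied_params(F16_BYTES)
print(f"implied parameters:       {params / 1e9:.2f}B")
print(f"Q4_K average bits/weight: {bits_per_weight(Q4_K_BYTES, params):.2f}")
print(f"size ratio Q4_K / F16:    {Q4_K_BYTES / F16_BYTES:.2f}")
```

An average above 4 bits per weight would be expected here, since K-quant formats store per-block scales and typically keep some tensors at higher precision.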

Architecture

Text → Qwen2.5-7B (28L, 3584d, GQA 28/4)
  → AR decode with constrained token set
  → 4-layer DiT diffusion head (AdaLN, SwiGLU, v-prediction)
  → 20-step SDE-DPMSolver++ (cosine beta schedule)
  → Acoustic VAE decoder (6-stage ConvNeXt, 3200x upsample)
  → 24 kHz mono PCM
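The 3200x upsampling factor and the 24 kHz output rate together imply the rate of the acoustic latents fed to the VAE decoder. The sketch below works through that arithmetic. It assumes each latent frame maps to exactly 3200 output samples, ignoring any padding or edge effects in the decoder; the function names are illustrative and not part of CrispASR.

```python
import math

SAMPLE_RATE_HZ = 24_000   # from the pipeline: 24 kHz mono PCM
UPSAMPLE_FACTOR = 3_200   # from the pipeline: VAE decoder upsampling


def latent_frame_rate(sample_rate: int = SAMPLE_RATE_HZ,
                      upsample: int = UPSAMPLE_FACTOR) -> float:
    """Latent frames per second implied by the upsampling factor."""
    return sample_rate / upsample


def frames_for_duration(seconds: float) -> int:
    """Latent frames needed to cover `seconds` of audio (rounded up)."""
    return math.ceil(seconds * latent_frame_rate())


def samples_for_frames(n_frames: int) -> int:
    """PCM samples produced by decoding `n_frames` latent frames."""
    return n_frames * UPSAMPLE_FACTOR


print(latent_frame_rate())                 # latent frames per second
print(1000 / latent_frame_rate())          # milliseconds of audio per frame
print(frames_for_duration(10.0))           # frames for 10 s of speech
print(samples_for_frames(frames_for_duration(10.0)))
```

Under these assumptions the latent sequence is very short relative to the waveform, which is what makes autoregressive decoding over latents practical.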
  • 23 languages: en, de, fr, es, it, pt, nl, pl, ru, uk, cs, ro, hu, sv, da, fi, no, el, bg, sk, hr, sr, tr
  • License: MIT
  • Output: 24 kHz mono PCM
  • Classifier-free guidance: cfg_scale=3.0 (default)
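The model card does not spell out how cfg_scale is applied. The sketch below shows the commonly used classifier-free guidance combination, in which the unconditional prediction is pushed toward the conditional one by the guidance scale. Whether KugelAudio or CrispASR uses exactly this form is an assumption, and `cfg_combine` is a hypothetical helper, not a CrispASR function.

```python
from typing import List, Sequence


def cfg_combine(cond: Sequence[float],
                uncond: Sequence[float],
                cfg_scale: float = 3.0) -> List[float]:
    """Standard classifier-free guidance mix (assumed, not confirmed):

        guided = uncond + cfg_scale * (cond - uncond)

    cfg_scale = 1.0 returns the conditional prediction unchanged;
    larger values extrapolate further away from the unconditional one.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy two-element "predictions", purely for illustration.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
print(cfg_combine(cond, uncond))         # default scale of 3.0
print(cfg_combine(cond, uncond, 1.0))    # reduces to the conditional output
```

In this formulation each sampler step needs two evaluations of the diffusion head, one conditional and one unconditional.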

Usage with CrispASR

# Synthesize speech
crispasr --backend kugelaudio \
  -m kugelaudio-0-open-q4_k.gguf \
  --tts "Hello, this is a test of the speech synthesis system." \
  --tts-output output.wav

# With auto-download
crispasr --backend kugelaudio -m auto \
  --tts "Hallo, dies ist ein Test." -l de
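For scripted use, the same invocations can be assembled from Python. This is a minimal sketch that only builds the argument list from the flags shown above (`--backend`, `-m`, `--tts`, `--tts-output`, `-l`); the helper name `build_tts_command` is hypothetical, and running the command requires `crispasr` to be installed and on PATH.

```python
import shlex
from typing import List, Optional


def build_tts_command(text: str,
                      model: str = "auto",
                      output: Optional[str] = None,
                      language: Optional[str] = None) -> List[str]:
    """Build a crispasr TTS command using only the flags shown above."""
    cmd = ["crispasr", "--backend", "kugelaudio", "-m", model, "--tts", text]
    if output is not None:
        cmd += ["--tts-output", output]
    if language is not None:
        cmd += ["-l", language]
    return cmd


cmd = build_tts_command("Hallo, dies ist ein Test.", language="de")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually synthesize (requires crispasr on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.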

Conversion

Converted with:

python models/convert-kugelaudio-to-gguf.py \
  --input kugelaudio/kugelaudio-0-open \
  --output kugelaudio-0-open-f16.gguf \
  --no-encoders --type f16

crispasr-quantize kugelaudio-0-open-f16.gguf kugelaudio-0-open-q4_k.gguf q4_k
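After conversion or quantization, a quick way to confirm that the output is a well-formed GGUF file is to read its fixed-size header. The sketch below assumes a little-endian GGUF file of version 2 or later, where the 4-byte magic "GGUF" is followed by a uint32 version and two uint64 counts; `read_gguf_header` is an illustrative helper, not part of CrispASR.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed 24-byte GGUF header.

    Assumes little-endian byte order and GGUF version >= 2
    (version 1 used 32-bit counts and is not handled here).
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)


# Example (path taken from the conversion commands above):
#   print(read_gguf_header("kugelaudio-0-open-q4_k.gguf"))
```

The F16 and Q4_K files should report the same tensor count, since quantization changes tensor encodings rather than the set of tensors.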

Original Model

  • Paper/repo: kugelaudio/kugelaudio-0-open
  • Architecture: Based on Microsoft VibeVoice (hybrid AR + diffusion TTS)
  • Parameters: ~7B (Qwen2.5-7B backbone)
  • Training data: Undisclosed
  • License: MIT