JuzeZhang/ViBES-Audio

@@ -23,15 +23,8 @@ frozen speech **tokenizer** (Whisper-VQ, 12.5 Hz) and **decoder** (CosyVoice flo
 are reused unchanged.
 It is the **speech/text backbone** of [ViBES](https://github.com/Juzezhang/ViBES) (our
-speech-language-behavior model). ViBES therefore comes in two sizes that differ only in this backbone:
-| ViBES variant | Speech/text backbone (Expert-0) | Use |
-|---|---|---|
-| **ViBES (9B)** | GLM-4-Voice-9B | best quality |
-| **ViBES (0.5B)** | **ViBES-Audio (this model)** | ~15× smaller backbone, low-latency / on-device |
-The motion experts are released separately: [`ViBES-Face`](https://huggingface.co/JuzeZhang/ViBES-Face)
-(and ViBES-Body).
+speech-language-behavior model) — a lightweight, low-latency alternative to the GLM-4-Voice-9B base.
+The motion experts are released separately: [`ViBES-Face`](https://huggingface.co/JuzeZhang/ViBES-Face).
 ## Model