tamarher committed
Commit 95a503f · verified · 1 Parent(s): 0c66ae4

Upload README.md with huggingface_hub

  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ language:
+ - zh
+ - en
+ license: apache-2.0
+ library_name: mlx
+ pipeline_tag: text-to-speech
+ tags:
+ - mlx
+ - tts
+ - speech
+ - voice-conditioned
+ - long-form
+ - diffusion
+ - apple-silicon
+ - quantized
+ - 8bit
+ ---
+
+ # VibeVoice — MLX
+
+ VibeVoice, converted and quantized for native MLX inference on Apple Silicon.
+
+ VibeVoice is a hybrid LLM + diffusion architecture built for long-form speech and voice-conditioned generation. It supports both greedy and sampled decoding and produces natural-sounding output on long inputs.
+
+ ## Variants
+
+ | Path | Precision |
+ | --- | --- |
+ | `mlx-int8/` | int8 quantized weights |
+
+ ## How to Get Started
+
+ Via [mlx-speech](https://github.com/appautomaton/mlx-speech), using its generation script:
+
+ ```bash
+ python scripts/generate_vibevoice.py \
+ --text "Hello from VibeVoice." \
+ --output outputs/vibevoice.wav
+ ```
+
+ Or load the model from Python:
+
+ ```python
+ from mlx_speech.generation import VibeVoiceModel
+
+ # Path to the directory holding the int8 weights (the `mlx-int8/` variant above)
+ model = VibeVoiceModel.from_path("mlx-int8")
+ ```
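
The Python snippet above takes a local path. One way to get that path is to download only the `mlx-int8/` folder from the Hub with `huggingface_hub`. This is a minimal sketch and not part of mlx-speech: the helper names `variant_dir` and `download_variant` are made up for illustration, and the repository id is left as a placeholder because this README does not state it.

```python
from pathlib import Path


def variant_dir(snapshot_root, variant="mlx-int8"):
    """Return the directory of one variant inside a downloaded snapshot."""
    return Path(snapshot_root) / variant


def download_variant(repo_id, variant="mlx-int8"):
    """Download only the files under `<variant>/` and return their local directory.

    Hypothetical helper; needs `pip install huggingface_hub` and network access.
    """
    # Imported lazily so variant_dir() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # allow_patterns restricts the download to the chosen variant folder.
    root = snapshot_download(repo_id=repo_id, allow_patterns=[f"{variant}/*"])
    return variant_dir(root, variant)


# Example (placeholder repo id, not taken from this README):
# model_path = download_variant("<user>/<repo>")
# model = VibeVoiceModel.from_path(str(model_path))
```

Restricting the download to one folder avoids fetching other variants if more are added later. Whether `from_path` also accepts a `Path` object is not stated here, so the example converts to `str`.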
+
+ ## Model Details
+
+ VibeVoice uses a 9B-parameter hybrid architecture that combines a Qwen2 language model backbone with a continuous diffusion acoustic decoder. The weights were converted to MLX with explicit weight remapping, so inference does not require PyTorch.
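
As a rough guide to memory needs, the weight footprint can be estimated from the parameter count and the bit width. The sketch below assumes all 9B parameters are stored at 8 bits with group-wise quantization, where each group of 64 weights carries a 16-bit scale and a 16-bit bias. Those group settings are assumptions for illustration, not values read from this checkpoint, and the estimate ignores activations and any layers left unquantized.

```python
def estimate_weight_bytes(n_params, bits, group_size=None, scale_bits=16, bias_bits=16):
    """Estimate weight storage in bytes.

    If group_size is given, each group of weights adds one scale and one bias
    (assumed layout for group-wise affine quantization).
    """
    bits_per_weight = float(bits)
    if group_size is not None:
        bits_per_weight += (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8


N_PARAMS = 9e9  # parameter count stated in this README

int8_bytes = estimate_weight_bytes(N_PARAMS, bits=8, group_size=64)
fp16_bytes = estimate_weight_bytes(N_PARAMS, bits=16)

print(f"int8 (grouped): {int8_bytes / 1e9:.2f} GB")  # 9.56 GB
print(f"fp16:           {fp16_bytes / 1e9:.2f} GB")  # 18.00 GB
```

Under these assumptions the int8 weights take roughly half the space of an fp16 copy, with the per-group scales and biases adding about 0.5 bits per weight.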
+
+ See [mlx-speech](https://github.com/appautomaton/mlx-speech) for the full runtime and conversion code.
+
+ ## License
+
+ Apache 2.0.