Commit dc9e5c1 by acul3 (verified) · 1 parent: 2784721

Upload README.md with huggingface_hub

  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
---
license: apache-2.0
base_model: openbmb/VoxCPM2
tags:
- mlx
- tts
- text-to-speech
- voice-cloning
- voice-design
- multilingual
library_name: mlx-audio
pipeline_tag: text-to-speech
language:
- en
- zh
- id
- ja
- ko
- multilingual
---

# VoxCPM2 - 4-bit quantized

MLX port of [openbmb/VoxCPM2](https://huggingface.co/openbmb/VoxCPM2), a 2B-parameter multilingual TTS model with 48 kHz studio-quality output, voice cloning, and voice design.

This variant quantizes only the LM layers to 4 bits; the VAE and DiT stay at full precision. It is the smallest and fastest of the three variants listed under Performance, with minimal quality loss.

## Features
- **30 languages**: including English, Chinese, Indonesian, Japanese, Korean, and more
- **48 kHz output**: studio-quality audio
- **Voice Design**: create voices from text descriptions (no reference audio needed)
- **Voice Cloning**: clone any voice from a short audio reference
- **4 generation modes**: zero-shot, continuation, reference cloning, and combined

## Usage

```bash
pip install mlx-audio

# Zero-shot
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit --text "Hello world" --verbose

# Voice design
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
    --text "Hello world" \
    --instruct "A young woman, gentle and sweet voice"

# Voice cloning
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
    --text "Hello world" \
    --ref_audio speaker.wav --ref_text "reference text"
```
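When driving the CLI from a script, the three invocations above differ only in their optional flags. The following is a minimal sketch that assembles the argument list using only the flags shown in this README. `build_tts_command` is a hypothetical helper name, not part of mlx-audio, and the sketch prints the command rather than running it.

```python
import shlex
import sys
from typing import List, Optional

MODEL_ID = "mlx-community/VoxCPM2-4bit"


def build_tts_command(
    text: str,
    instruct: Optional[str] = None,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    model: str = MODEL_ID,
) -> List[str]:
    """Assemble an argv list for `python -m mlx_audio.tts.generate`.

    Hypothetical helper: it only emits flags that appear in this README.
    """
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", model, "--text", text]
    if instruct is not None:   # voice design
        cmd += ["--instruct", instruct]
    if ref_audio is not None:  # voice cloning
        cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:   # transcript of the reference audio
        cmd += ["--ref_text", ref_text]
    return cmd


if __name__ == "__main__":
    demo = build_tts_command(
        "Hello world", instruct="A young woman, gentle and sweet voice"
    )
    # Print a shell-quoted command; pass `demo` to subprocess.run to execute it.
    print(shlex.join(demo))
```

Leaving all optional arguments unset reproduces the zero-shot command; passing the list to `subprocess.run` would execute it on a machine with mlx-audio installed.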

### Python API

```python
from mlx_audio.tts import load_model

model = load_model("mlx-community/VoxCPM2-4bit")

# generate() yields one result per generated segment
for result in model.generate(
    text="Hello, this is VoxCPM2 on Apple Silicon.",
    inference_timesteps=7,  # same setting as the Performance table
    cfg_value=2.0,
):
    print(f"Duration: {result.audio_duration}")
```
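The loop above only prints each segment's duration. To keep the audio, one option is to write the samples to a 48 kHz WAV file with the standard library. The sketch below assumes the waveform is available as a 1-D sequence of floats in [-1, 1]; the README does not show which attribute of `result` holds it, so `write_wav` is a hypothetical helper and is demonstrated on a synthetic tone rather than on model output.

```python
import array
import math
import os
import sys
import tempfile
import wave
from typing import Iterable

SAMPLE_RATE = 48_000  # output rate stated in this README


def write_wav(path: str, samples: Iterable[float], sample_rate: int = SAMPLE_RATE) -> int:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return the frame count."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)


if __name__ == "__main__":
    # Stand-in for model output: one second of a 440 Hz tone.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    out_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    frames = write_wav(out_path, tone)
    print(f"wrote {frames} frames to {out_path}")
```

With real output, the samples would first need converting from whatever array type the result object carries into plain floats; that conversion is left out here because it depends on the mlx-audio version in use.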

## Performance (Apple Silicon)

| Variant | Size | RTF (7 timesteps) |
|---------|------|-------------------|
| bf16 | 4.96 GB | 0.48x |
| **8-bit** | **3.23 GB** | **0.85x** |
| **4-bit** | **2.30 GB** | **0.90x** |

*RTF = real-time factor, reported here as a speed multiple: values above 1.0 mean faster than real time, so all three variants listed run below real-time speed at 7 timesteps.*
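To make the table concrete, the sketch below turns each RTF into an expected wait for a 10-second clip and expresses each size relative to bf16. It assumes RTF is audio duration divided by processing time, which is inferred from the ">1.0 = faster than realtime" note; the README does not state the formula, and the function names are illustrative.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTF under the assumed convention: seconds of audio per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def seconds_to_generate(audio_seconds: float, rtf: float) -> float:
    """Invert the RTF definition to estimate wall-clock generation time."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# (size in GB, RTF at 7 timesteps), copied from the table above
VARIANTS = {"bf16": (4.96, 0.48), "8-bit": (3.23, 0.85), "4-bit": (2.30, 0.90)}

if __name__ == "__main__":
    bf16_size = VARIANTS["bf16"][0]
    for name, (size_gb, rtf) in VARIANTS.items():
        wait = seconds_to_generate(10.0, rtf)
        print(f"{name}: 10 s of audio in ~{wait:.1f} s, {size_gb / bf16_size:.0%} of bf16 size")
```

Under that assumption, the 4-bit variant needs roughly 11.1 s for 10 s of audio versus roughly 20.8 s for bf16, at about 46% of the bf16 download size.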

## Original Model
- [openbmb/VoxCPM2](https://huggingface.co/openbmb/VoxCPM2)
- Apache 2.0 License

Converted with [mlx-audio](https://github.com/Blaizzy/mlx-audio).