
license: apache-2.0
base_model: openbmb/VoxCPM2
tags:
  - mlx
  - tts
  - text-to-speech
  - voice-cloning
  - voice-design
  - multilingual
library_name: mlx-audio
pipeline_tag: text-to-speech
language:
  - en
  - zh
  - id
  - ja
  - ko
  - multilingual

VoxCPM2 - 4-bit quantized

MLX port of openbmb/VoxCPM2 — a 2B-parameter multilingual TTS model with 48kHz studio-quality output, voice cloning, and voice design.

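Only the language-model (LM) layers are quantized to 4 bits; the VAE and DiT remain at full precision. Of the three variants listed under Performance, this is the smallest (2.30 GB) and fastest (0.90x RTF), with minimal quality loss.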
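To give a rough sense of what 4-bit weight quantization involves, the sketch below applies generic affine (min/max) quantization to one group of weights in plain Python. It is an illustration only: the group size of 64, the helper names `quantize_group` and `dequantize_group`, and the random test weights are assumptions, and the scheme actually used to produce this checkpoint may differ in its details.

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1                # 15 distinct steps for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


random.seed(0)
# Hypothetical weights; a group size of 64 is an assumption for illustration.
group = [random.gauss(0.0, 0.02) for _ in range(64)]

codes, scale, lo = quantize_group(group)
restored = dequantize_group(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(f"codes use values {min(codes)}..{max(codes)}")
print(f"max reconstruction error: {max_error:.6f} (half a step is {scale / 2:.6f})")
```

Each weight is stored as a 4-bit code plus a shared scale and offset per group, so the reconstruction error is bounded by half a quantization step. Leaving the VAE and DiT unquantized, as this variant does, keeps that error out of the audio-generating components.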

Features

30 languages — including English, Chinese, Indonesian, Japanese, and Korean
48kHz output — studio-quality audio
Voice Design — create voices from text descriptions (no reference audio needed)
Voice Cloning — clone any voice from a short audio reference
4 generation modes — zero-shot, continuation, reference cloning, combined

Usage

pip install mlx-audio

# Zero-shot
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit --text "Hello world" --verbose

# Voice design
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
  --text "Hello world" \
  --instruct "A young woman, gentle and sweet voice"

# Voice cloning
python -m mlx_audio.tts.generate --model mlx-community/VoxCPM2-4bit \
  --text "Hello world" \
  --ref_audio speaker.wav --ref_text "reference text"
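If you want to drive these commands from a script, one possible approach is to assemble the argument list in Python and hand it to `subprocess`. The sketch below uses only the flags shown above; the `build_command` helper and its parameter names are hypothetical, not part of mlx-audio.

```python
import shlex
import sys

MODEL = "mlx-community/VoxCPM2-4bit"


def build_command(text, instruct=None, ref_audio=None, ref_text=None, verbose=False):
    """Return an argv list for the mlx_audio.tts.generate CLI (hypothetical helper)."""
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", MODEL, "--text", text]
    if instruct is not None:
        # Voice design: describe the voice in text.
        cmd += ["--instruct", instruct]
    if ref_audio is not None:
        # Voice cloning: reference audio, optionally with its transcript.
        cmd += ["--ref_audio", ref_audio]
        if ref_text is not None:
            cmd += ["--ref_text", ref_text]
    if verbose:
        cmd.append("--verbose")
    return cmd


design = build_command("Hello world", instruct="A young woman, gentle and sweet voice")
clone = build_command("Hello world", ref_audio="speaker.wav", ref_text="reference text")

# Print the commands; to run one, use subprocess.run(design, check=True).
print(shlex.join(design))
print(shlex.join(clone))
```

Passing a list rather than a shell string avoids quoting problems when the text or voice description contains spaces or quotes.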

Python API

from mlx_audio.tts import load_model

model = load_model("mlx-community/VoxCPM2-4bit")

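# Generate speech; generate() yields result objects as audio is produced.
for result in model.generate(
    text="Hello, this is VoxCPM2 on Apple Silicon.",
    inference_timesteps=7,  # same step count as the RTF figures under Performance
    cfg_value=2.0,          # guidance strength (presumably classifier-free guidance)
):
    print(f"Duration: {result.audio_duration}")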

Performance (Apple Silicon)

Variant	Size	RTF (7 timesteps)
bf16	4.96 GB	0.48x
8-bit	3.23 GB	0.85x
4-bit	2.30 GB	0.90x

RTF = Real-Time Factor. Higher is faster; values above 1.0 mean faster than real time.
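To make the table easier to read in practical terms, the sketch below converts each row into a relative size and an estimated generation time. It assumes RTF here means seconds of audio produced per second of wall-clock time, which is inferred from the ">1.0 = faster than realtime" note rather than stated outright; the function names are illustrative.

```python
# (size in GB, RTF at 7 timesteps), copied from the table above
VARIANTS = {
    "bf16": (4.96, 0.48),
    "8-bit": (3.23, 0.85),
    "4-bit": (2.30, 0.90),
}


def real_time_factor(audio_seconds, wall_seconds):
    """RTF under the assumed convention: audio produced per second of compute."""
    return audio_seconds / wall_seconds


def seconds_to_generate(audio_seconds, rtf):
    """Estimated wall-clock time needed to produce `audio_seconds` of speech."""
    return audio_seconds / rtf


bf16_size = VARIANTS["bf16"][0]
for name, (size_gb, rtf) in VARIANTS.items():
    wall = seconds_to_generate(10.0, rtf)
    print(f"{name}: {size_gb / bf16_size:.0%} of bf16 size, "
          f"about {wall:.1f} s to generate 10 s of audio")
```

Under that reading, the 4-bit variant is a little under half the size of bf16 and needs roughly 11 seconds to produce 10 seconds of audio, against roughly 21 seconds for bf16.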

Original Model

openbmb/VoxCPM2
Apache 2.0 License

Converted with mlx-audio.