---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/CosyVoice2-0.5B
tags:
- mlx
- tts
- cosyvoice2
pipeline_tag: text-to-speech
language:
- en
- zh
- ja
- ko
---
# mlx-community/CosyVoice2-0.5B-8bit
This model was converted to MLX format from [FunAudioLLM/CosyVoice2-0.5B](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.2**.
## Usage
```bash
pip install -U mlx-audio-plus
```
### Inference modes
| Mode | Parameters | Description |
|------|------------|-------------|
| Cross-lingual | `ref_audio` | Zero-shot TTS from the reference audio alone (default) |
| Zero-shot | `ref_audio` + `ref_text` | Zero-shot TTS with a transcription of the reference audio; better quality |
| Instruct | `ref_audio` + `instruct_text` | Style control (e.g., "speak slowly") |
| Voice Conversion | `source_audio` + `ref_audio` | Convert audio to target voice |
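
The table can be read as a decision rule over which optional inputs are supplied alongside `ref_audio`. The sketch below spells that rule out. `infer_mode` is a hypothetical helper written for illustration, not part of mlx-audio-plus, and the precedence it uses when several optional inputs are given (voice conversion, then instruct, then zero-shot) is an assumption; the library's own dispatch may differ.

```python
from typing import Optional


def infer_mode(
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
) -> str:
    """Return the mode name implied by the supplied inputs.

    Hypothetical helper mirroring the table above. The precedence among
    optional inputs is an assumption, not documented library behavior.
    """
    if ref_audio is None:
        # Every row of the table lists ref_audio as a parameter.
        raise ValueError("every mode in the table requires ref_audio")
    if source_audio is not None:
        return "voice-conversion"
    if instruct_text is not None:
        return "instruct"
    if ref_text is not None:
        return "zero-shot"
    # Only ref_audio was given: the default mode.
    return "cross-lingual"
```

For example, `infer_mode(ref_audio="ref.wav", ref_text="Transcription of ref audio.")` selects the zero-shot row, while passing `ref_audio` alone falls through to the cross-lingual default.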
### Command line
```bash
# Cross-lingual (default)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav
# Zero-shot (with transcription)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --ref_text "Transcription of ref audio."
# Instruct (style control)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --instruct_text "Speak slowly and calmly"
# Voice Conversion
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --source_audio source.wav --ref_audio ref.wav
```
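
When scripting many invocations, it can help to assemble the argument list programmatically rather than by string concatenation. The following is a minimal sketch that uses only the flags shown in the commands above; `build_tts_command` and `MODEL_ID` are hypothetical names introduced here, not part of mlx-audio-plus.

```python
import shlex
from typing import List, Optional

MODEL_ID = "mlx-community/CosyVoice2-0.5B-8bit"


def build_tts_command(
    ref_audio: str,
    text: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
    model: str = MODEL_ID,
) -> List[str]:
    """Build an argument list for the mlx_audio.tts CLI (hypothetical helper).

    Only the flags that appear in the commands above are emitted.
    """
    cmd = ["mlx_audio.tts", "--model", model]
    optional_flags = [
        ("--text", text),
        ("--source_audio", source_audio),
        ("--ref_audio", ref_audio),
        ("--ref_text", ref_text),
        ("--instruct_text", instruct_text),
    ]
    for flag, value in optional_flags:
        # Skip flags whose value was not supplied.
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = build_tts_command(
    "ref.wav", text="Hello!", ref_text="Transcription of ref audio."
)
# Print a shell-quoted form for inspection.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Passing a list rather than a single string avoids shell-quoting problems when the text contains spaces or punctuation.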
### Python
```python
from mlx_audio.tts.generate import generate_audio

# Cross-lingual mode: only a reference recording is supplied.
generate_audio(
    text="Hello, this is CosyVoice2 on MLX!",
    model="mlx-community/CosyVoice2-0.5B-8bit",
    ref_audio="reference.wav",
    file_prefix="output",
)
```