---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/CosyVoice2-0.5B
tags:
- mlx
- tts
- cosyvoice2
pipeline_tag: text-to-speech
language:
- en
- zh
- ja
- ko
---
# mlx-community/CosyVoice2-0.5B-8bit
This model was converted to MLX format from [FunAudioLLM/CosyVoice2-0.5B](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.2**.
## Usage
```bash
pip install -U mlx-audio-plus
```
### Inference Modes
| Mode | Parameters | Description |
|------|------------|-------------|
| Cross-lingual | `ref_audio` | Zero-shot TTS from the reference audio alone (default) |
| Zero-shot | `ref_audio` + `ref_text` | Better quality when a transcription of the reference audio is supplied |
| Instruct | `ref_audio` + `instruct_text` | Style control (e.g., "speak slowly") |
| Voice Conversion | `source_audio` + `ref_audio` | Convert audio to target voice |
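
To make the table concrete, here is a minimal sketch of how a set of parameters maps to a mode. The function `infer_mode` is hypothetical and is not part of mlx-audio-plus. The precedence it applies when several optional parameters are given at once (voice conversion, then instruct, then zero-shot) is an assumption of this sketch, not documented library behavior.

```python
from typing import Optional


def infer_mode(
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
) -> str:
    """Return the mode name from the table for a given set of parameters.

    Hypothetical helper: the precedence below is an assumption, chosen only
    so that each row of the table maps to exactly one result.
    """
    # Every row of the table lists ref_audio as a parameter.
    if ref_audio is None:
        raise ValueError("every mode in the table requires ref_audio")
    if source_audio is not None:
        return "Voice Conversion"
    if instruct_text is not None:
        return "Instruct"
    if ref_text is not None:
        return "Zero-shot"
    # Only ref_audio was given: the default mode.
    return "Cross-lingual"


print(infer_mode(ref_audio="ref.wav"))
print(infer_mode(ref_audio="ref.wav", ref_text="Transcription of ref audio."))
```
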
### Command line
```bash
# Cross-lingual (default)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav
# Zero-shot (with transcription)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --ref_text "Transcription of ref audio."
# Instruct (style control)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --instruct_text "Speak slowly and calmly"
# Voice Conversion
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --source_audio source.wav --ref_audio ref.wav
```
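
When scripting the commands above, it can help to assemble the argument list in Python rather than concatenating shell strings. The sketch below uses only the flags shown in the examples above. `build_tts_command` and `run_tts` are hypothetical helper names, and running them assumes the `mlx_audio.tts` executable is on your `PATH`.

```python
import shlex
import subprocess
from typing import List, Optional

MODEL = "mlx-community/CosyVoice2-0.5B-8bit"


def build_tts_command(
    ref_audio: str,
    text: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
    model: str = MODEL,
) -> List[str]:
    """Build an argv list for mlx_audio.tts (hypothetical helper)."""
    cmd = ["mlx_audio.tts", "--model", model]
    if text is not None:
        cmd += ["--text", text]
    if source_audio is not None:
        cmd += ["--source_audio", source_audio]
    cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]
    if instruct_text is not None:
        cmd += ["--instruct_text", instruct_text]
    return cmd


def run_tts(**kwargs) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_tts_command(**kwargs), check=True)


# Print the shell-quoted command instead of executing it.
print(shlex.join(build_tts_command(ref_audio="ref.wav", text="Hello!")))
```

Passing a list to `subprocess.run` avoids shell quoting problems when the text or transcription contains spaces or punctuation.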
### Python
```python
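from mlx_audio.tts.generate import generate_audio

# Cross-lingual mode: only a reference audio clip is supplied.
generate_audio(
    text="Hello, this is CosyVoice2 on MLX!",
    model="mlx-community/CosyVoice2-0.5B-8bit",
    ref_audio="reference.wav",  # voice to clone
    file_prefix="output",  # prefix for the generated audio file name
)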
```