---
library_name: mlx-audio-plus
base_model:
- FunAudioLLM/CosyVoice2-0.5B
tags:
- mlx
- tts
- cosyvoice2
pipeline_tag: text-to-speech
language:
- en
- zh
- ja
- ko
---

# mlx-community/CosyVoice2-0.5B-8bit

This model was converted to MLX format from [FunAudioLLM/CosyVoice2-0.5B](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version **0.1.2**.

## Usage

```bash
pip install -U mlx-audio-plus
```

### Inference Modes

| Mode | Parameters | Description |
|------|------------|-------------|
| Cross-lingual | `ref_audio` | Zero-shot TTS (default) |
| Zero-shot | `ref_audio` + `ref_text` | Better quality with transcription |
| Instruct | `ref_audio` + `instruct_text` | Style control (e.g., "speak slowly") |
| Voice Conversion | `source_audio` + `ref_audio` | Convert audio to target voice |

### Command line

```bash
# Cross-lingual (default)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav

# Zero-shot (with transcription)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --ref_text "Transcription of ref audio."

# Instruct (style control)
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --instruct_text "Speak slowly and calmly"

# Voice Conversion
mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --source_audio source.wav --ref_audio ref.wav
```

### Python

```python
from mlx_audio.tts.generate import generate_audio

generate_audio(
    text="Hello, this is CosyVoice2 on MLX!",
    model="mlx-community/CosyVoice2-0.5B-8bit",
    ref_audio="reference.wav",
    file_prefix="output",
)
```
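
The mapping in the Inference Modes table can be sketched as a small helper that picks a mode from the parameters supplied and assembles the matching `mlx_audio.tts` command. This is an illustrative sketch only: `infer_mode` and `build_command` are hypothetical names, not part of mlx-audio-plus, and the precedence used when several optional parameters are given at once is an assumption rather than documented library behavior. Only the flags shown in the command-line examples above are used.

```python
import shlex

MODEL = "mlx-community/CosyVoice2-0.5B-8bit"


def infer_mode(ref_audio=None, ref_text=None, instruct_text=None, source_audio=None):
    """Return the mode name from the Inference Modes table (hypothetical helper)."""
    if ref_audio is None:
        raise ValueError("every mode in the table requires ref_audio")
    # Assumed precedence when several optional parameters are supplied together.
    if source_audio is not None:
        return "Voice Conversion"
    if instruct_text is not None:
        return "Instruct"
    if ref_text is not None:
        return "Zero-shot"
    return "Cross-lingual"


def build_command(text=None, ref_audio=None, ref_text=None,
                  instruct_text=None, source_audio=None, model=MODEL):
    """Assemble an argument list using only the flags shown in the examples above."""
    mode = infer_mode(ref_audio, ref_text, instruct_text, source_audio)
    args = ["mlx_audio.tts", "--model", model]
    if mode == "Voice Conversion":
        # Voice conversion takes source audio instead of text.
        args += ["--source_audio", source_audio]
    else:
        if text is None:
            raise ValueError(f"{mode} mode requires text")
        args += ["--text", text]
    args += ["--ref_audio", ref_audio]
    if mode == "Zero-shot":
        args += ["--ref_text", ref_text]
    elif mode == "Instruct":
        args += ["--instruct_text", instruct_text]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command for the default (cross-lingual) mode.
    print(shlex.join(build_command(text="Hello!", ref_audio="ref.wav")))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell-quoting problems when the text or transcription contains spaces or quotes.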