	---
	library_name: mlx-audio-plus
	base_model:
	- FunAudioLLM/CosyVoice2-0.5B
	tags:
	- mlx
	- tts
	- cosyvoice2
	pipeline_tag: text-to-speech
	language:
	- en
	- zh
	- ja
	- ko
	---

	# mlx-community/CosyVoice2-0.5B-8bit

	This model was converted to MLX format from [FunAudioLLM/CosyVoice2-0.5B](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) using [mlx-audio-plus](https://github.com/DePasqualeOrg/mlx-audio-plus) version 0.1.2.

	## Usage

	```bash
	pip install -U mlx-audio-plus
	```

	### Inference Modes

	| Mode | Parameters | Description |
	|------|------------|-------------|
	| Cross-lingual | `ref_audio` | Zero-shot TTS from reference audio alone (default) |
	| Zero-shot | `ref_audio` + `ref_text` | Better quality, using a transcription of the reference audio |
	| Instruct | `ref_audio` + `instruct_text` | Style control (e.g., "speak slowly") |
	| Voice Conversion | `source_audio` + `ref_audio` | Convert source audio to the target (reference) voice |
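
The table above maps parameter combinations to modes. As a minimal sketch of that mapping, the hypothetical helper `infer_mode` below (not part of mlx-audio-plus) returns the mode that a given set of arguments corresponds to. Rejecting combinations the table does not list is an assumption of this sketch; the library itself may handle them differently.

```python
from typing import Optional


def infer_mode(
    ref_audio: Optional[str],
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
) -> str:
    """Return the inference mode implied by the arguments, per the table.

    Hypothetical helper for illustration only; it is not an mlx-audio-plus API.
    """
    # Every row of the table includes ref_audio.
    if not ref_audio:
        raise ValueError("ref_audio is required in every mode")

    # Voice conversion: source_audio + ref_audio.
    if source_audio:
        # ASSUMPTION: mixing in ref_text/instruct_text is treated as an error here.
        if ref_text or instruct_text:
            raise ValueError("voice conversion takes only source_audio and ref_audio")
        return "voice_conversion"

    # ASSUMPTION: the table lists no mode that uses both, so reject the combination.
    if ref_text and instruct_text:
        raise ValueError("pass either ref_text or instruct_text, not both")

    if instruct_text:
        return "instruct"
    if ref_text:
        return "zero_shot"

    # Only ref_audio was given: the default mode.
    return "cross_lingual"
```

For example, `infer_mode("ref.wav", instruct_text="Speak slowly and calmly")` selects the instruct mode, while `infer_mode("ref.wav")` falls through to the cross-lingual default.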

	### Command line

	```bash
	# Cross-lingual (default)
	mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav

	# Zero-shot (with transcription)
	mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --ref_text "Transcription of ref audio."

	# Instruct (style control)
	mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --text "Hello!" --ref_audio ref.wav --instruct_text "Speak slowly and calmly"

	# Voice Conversion
	mlx_audio.tts --model mlx-community/CosyVoice2-0.5B-8bit --source_audio source.wav --ref_audio ref.wav
	```

	### Python

	```python
	from mlx_audio.tts.generate import generate_audio

	generate_audio(
	    text="Hello, this is CosyVoice2 on MLX!",
	    model="mlx-community/CosyVoice2-0.5B-8bit",
	    ref_audio="reference.wav",
	    file_prefix="output",
	)
	```
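
The Python example above covers only the default cross-lingual mode. The sketch below mirrors the four command-line examples as keyword arguments. It assumes that `generate_audio` accepts `ref_text`, `instruct_text`, and `source_audio` keywords matching the CLI flags, and that `text` may be omitted for voice conversion; neither is confirmed by this README, so check the mlx-audio-plus documentation before relying on it. `cli_equivalent_kwargs` and `run` are hypothetical names introduced for illustration.

```python
from typing import Dict

MODEL = "mlx-community/CosyVoice2-0.5B-8bit"


def cli_equivalent_kwargs() -> Dict[str, Dict[str, str]]:
    """Keyword arguments mirroring the four command-line examples, keyed by mode."""
    base = {"model": MODEL, "ref_audio": "ref.wav"}
    return {
        "cross_lingual": {**base, "text": "Hello!"},
        "zero_shot": {
            **base,
            "text": "Hello!",
            "ref_text": "Transcription of ref audio.",
        },
        "instruct": {
            **base,
            "text": "Hello!",
            "instruct_text": "Speak slowly and calmly",
        },
        # The CLI example for voice conversion passes no --text.
        "voice_conversion": {**base, "source_audio": "source.wav"},
    }


def run(mode: str) -> None:
    """Generate audio for one mode (requires mlx-audio-plus to be installed)."""
    # Imported lazily so the mapping above can be inspected without MLX.
    from mlx_audio.tts.generate import generate_audio

    # ASSUMPTION: generate_audio accepts these keywords under the same names
    # as the CLI flags; this is not confirmed by the README.
    generate_audio(file_prefix=f"output_{mode}", **cli_equivalent_kwargs()[mode])
```

Calling `run("zero_shot")` would then correspond to the second command-line example, with the output file prefix set to `output_zero_shot`.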