@@ -1,8 +1,5 @@
 ---
-license: mit
-language:
-- zh
-- en
 tags:
 - mlx
 - text-to-speech
@@ -11,8 +8,8 @@ tags:
 - voice cloning
 - tts
 - mlx-audio
-library_name: mlx-audio
 ---
 # mlx-community/LongCat-AudioDiT-3.5B-5bit
 This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-3.5B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-3.5B) using mlx-audio version **0.4.3**.
@@ -25,22 +22,83 @@ Refer to the [original model card](https://huggingface.co/meituan-longcat/LongCa
 pip install -U mlx-audio
 ```
-### CLI Example:
-```bash
-python -m mlx_audio.tts.generate --model mlx-community/LongCat-AudioDiT-3.5B-5bit --text "Hello, this is a test."
 ```
-### Python Example:
-```python
-from mlx_audio.tts.utils import load_model
-from mlx_audio.tts.generate import generate_audio
-model = load_model("mlx-community/LongCat-AudioDiT-3.5B-5bit")
-generate_audio(
-    model=model,
-    text="Hello, this is a test.",
-    ref_audio="path_to_audio.wav",
-    file_prefix="test_audio",
-)
 ```

 ---
+library_name: mlx-audio
 tags:
 - mlx
 - text-to-speech
 - voice cloning
 - tts
 - mlx-audio
 ---
 # mlx-community/LongCat-AudioDiT-3.5B-5bit
 This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-3.5B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-3.5B) using mlx-audio version **0.4.3**.
 pip install -U mlx-audio
 ```
+## Usage
+```python
+from mlx_audio.tts.utils import load
+model = load("mlx-community/LongCat-AudioDiT-3.5B-5bit")
+result = next(model.generate("Hello, this is a test of AudioDiT."))
+audio = result.audio  # MLX array of audio samples at 24 kHz
+```
+Play audio directly:
+```python
+from mlx_audio.tts.audio_player import AudioPlayer
+player = AudioPlayer(sample_rate=24000)
+result = next(model.generate("The quick brown fox jumps over the lazy dog."))
+player.queue_audio(result.audio)
+player.wait_for_drain()
+player.stop()
 ```
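To keep the generated audio instead of only playing it, the samples can be written to a WAV file. The following is a minimal sketch using only the Python standard library. `save_wav` is a hypothetical helper, not part of mlx-audio, and it assumes the samples are mono floats in the range [-1, 1] at 24 kHz (for example, `result.audio` converted to a plain Python list first; that conversion step is itself an assumption and is not shown).

```python
import io
import struct
import wave


def save_wav(samples, target, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV.

    `target` may be a file path string or a writable binary file object.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)  # 24 kHz by default
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Example with synthetic data: 0.5 s of silence written to an in-memory buffer.
buffer = io.BytesIO()
save_wav([0.0] * 12000, buffer)
```

Passing a path such as `"output.wav"` instead of the buffer writes the file to disk.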
+## Voice Cloning
+Clone a voice from a reference audio sample and its transcript. For the best cloning quality, set `guidance_method="apg"`:
+```python
+result = next(model.generate(
+    text="Today is warm turning to rain, with good air quality.",
+    ref_audio="reference.wav",
+    ref_text="Transcript of the reference audio.",
+    guidance_method="apg",
+    cfg_strength=4.0,
+    steps=16,
+))
 ```
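The parameter table lists the reference audio as 24 kHz, so it can be worth checking a WAV file's sample rate before passing it as `ref_audio`. This is a small sketch using only the standard library. `wav_sample_rate` and `is_24khz` are hypothetical helpers, and whether mlx-audio resamples other rates automatically is not stated in this card.

```python
import wave


def wav_sample_rate(source):
    """Return the sample rate (Hz) stored in a PCM WAV header.

    `source` may be a file path string or a readable binary file object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def is_24khz(source):
    """Return True if the WAV at `source` is sampled at 24 kHz."""
    return wav_sample_rate(source) == 24000
```

For example, `is_24khz("reference.wav")` could be called before the `model.generate(...)` call above.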
+## Zero-Shot Generation (Chinese)
+```python
+result = next(model.generate(
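+    # English: "Today is sunny and warm, turning overcast with rain; air quality is excellent to good, and relative humidity is fairly low."
+    text="今天晴暖转阴雨，空气质量优至良，空气相对湿度较低。",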
+    steps=16,
+    cfg_strength=4.0,
+))
+```
+## Generation Parameters
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `steps` | 16 | Number of Euler ODE solver steps. More steps improve quality but slow generation |
+| `cfg_strength` | 4.0 | Classifier-free guidance strength |
+| `guidance_method` | `"cfg"` | Guidance method: `"cfg"` for plain TTS, `"apg"` for voice cloning |
+| `seed` | 1024 | Random seed for reproducibility |
+| `ref_audio` | `None` | Reference audio for voice cloning (24 kHz) |
+| `ref_text` | `None` | Transcript of the reference audio |
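To make `steps` and `cfg_strength` more concrete, here is a toy sketch in plain Python. It is not the model's sampler: `euler_integrate` and `cfg_combine` are hypothetical names, the velocity field is a one-dimensional stand-in, and the guidance formula is one common form of classifier-free guidance that may differ from what mlx-audio implements.

```python
import math


def euler_integrate(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)  # one velocity evaluation per step
    return x


def cfg_combine(v_cond, v_uncond, cfg_strength):
    """One common classifier-free guidance form (an assumption here):
    extrapolate from the unconditional prediction toward the conditional one."""
    return v_uncond + cfg_strength * (v_cond - v_uncond)


# Toy ODE dx/dt = -x with x(0) = 1, whose exact value at t=1 is exp(-1).
exact = math.exp(-1.0)
errors = {
    n: abs(euler_integrate(lambda x, t: -x, 1.0, n) - exact)
    for n in (4, 16, 64)
}
```

In this toy problem the error shrinks as the step count grows, at the cost of one velocity evaluation per step, which mirrors the quality versus speed trade-off described for `steps`.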
+## CLI
+```bash
+# Zero-shot TTS
+python -m mlx_audio.tts.generate \
+  --model mlx-community/LongCat-AudioDiT-3.5B-5bit \
+  --text "Hello, this is a test of AudioDiT." \
+  --play
+# Voice cloning
+python -m mlx_audio.tts.generate \
+  --model mlx-community/LongCat-AudioDiT-3.5B-5bit \
+  --text "Today is warm turning to rain." \
+  --ref_audio reference.wav \
+  --ref_text "Transcript of the reference audio." \
+  --play
+```
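The same CLI invocations can be assembled from Python, for instance when scripting batch runs. This is a hedged sketch: `build_tts_command` is a hypothetical helper, and it only uses the flags that appear in the commands above (`--model`, `--text`, `--ref_audio`, `--ref_text`, `--play`).

```python
import shlex
import sys


def build_tts_command(model, text, ref_audio=None, ref_text=None, play=True):
    """Build the argument list for `python -m mlx_audio.tts.generate`."""
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", model, "--text", text]
    if ref_audio is not None:
        cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]
    if play:
        cmd.append("--play")
    return cmd


cmd = build_tts_command(
    "mlx-community/LongCat-AudioDiT-3.5B-5bit",
    "Today is warm turning to rain.",
    ref_audio="reference.wav",
    ref_text="Transcript of the reference audio.",
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with texts that contain spaces or quotes.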
+## License
+LongCat-AudioDiT weights and code are released under the [MIT License](https://github.com/meituan-longcat/LongCat-AudioDiT/blob/main/LICENSE).