prince-canuma committed on
Commit 2dbc9eb · verified · 1 Parent(s): 05fb9b8

Update README with usage examples and mlx-audio version

Files changed (1)
  1. README.md +77 -19
README.md CHANGED
@@ -1,8 +1,5 @@
1
  ---
2
- license: mit
3
- language:
4
- - zh
5
- - en
6
  tags:
7
  - mlx
8
  - text-to-speech
@@ -11,8 +8,8 @@ tags:
11
  - voice cloning
12
  - tts
13
  - mlx-audio
14
- library_name: mlx-audio
15
  ---
 
16
  # mlx-community/LongCat-AudioDiT-3.5B-5bit
17
 
18
  This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-3.5B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-3.5B) using mlx-audio version **0.4.3**.
@@ -25,22 +22,83 @@ Refer to the [original model card](https://huggingface.co/meituan-longcat/LongCa
25
  pip install -U mlx-audio
26
  ```
27
 
28
- ### CLI Example:
29
- ```bash
30
- python -m mlx_audio.tts.generate --model mlx-community/LongCat-AudioDiT-3.5B-5bit --text "Hello, this is a test."
31
  ```
32
 
33
- ### Python Example:
34
- ```python
35
- from mlx_audio.tts.utils import load_model
36
- from mlx_audio.tts.generate import generate_audio
37
 
38
- model = load_model("mlx-community/LongCat-AudioDiT-3.5B-5bit")
39
- generate_audio(
40
- model=model,
41
- text="Hello, this is a test.",
42
- ref_audio="path_to_audio.wav",
43
- file_prefix="test_audio",
44
- )
45
 
46
  ```
 
1
  ---
2
+ library_name: mlx-audio
3
  tags:
4
  - mlx
5
  - text-to-speech
 
8
  - voice cloning
9
  - tts
10
  - mlx-audio
 
11
  ---
12
+
13
  # mlx-community/LongCat-AudioDiT-3.5B-5bit
14
 
15
  This model was converted to MLX format from [`meituan-longcat/LongCat-AudioDiT-3.5B`](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-3.5B) using mlx-audio version **0.4.3**.
 
22
  pip install -U mlx-audio
23
  ```
24
 
25
+ ## Usage
26
+
27
+ ```python
28
+ from mlx_audio.tts.utils import load
29
+
30
+ model = load("mlx-community/LongCat-AudioDiT-3.5B-5bit")
31
+
32
+ result = next(model.generate("Hello, this is a test of AudioDiT."))
33
+ audio = result.audio # mlx array, 24kHz
34
+ ```
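The usage snippet added here leaves `result.audio` in memory as a 24 kHz array. A minimal sketch of writing such samples to a WAV file with the Python standard library is shown below. The helper name `write_wav_24k` is hypothetical and not part of mlx-audio, the samples are assumed to be mono floats in [-1.0, 1.0], and a synthetic tone stands in for real model output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_24k(path, samples, sample_rate=24000):
    """Hypothetical helper: write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumption)
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)  # 24 kHz, as noted in the README
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Stand-in for `result.audio`: one second of a 440 Hz tone at 24 kHz.
# With real output, converting first (for example `np.array(result.audio).tolist()`)
# is assumed to be needed; that conversion is not shown in the README.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
wav_path = os.path.join(tempfile.gettempdir(), "tone_24k.wav")
frames_written = write_wav_24k(wav_path, tone)
```

Keeping the file writer independent of the model makes it easy to check the sample rate and frame count before plugging in generated audio.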
35
+
36
+ Play audio directly:
37
+
38
+ ```python
39
+ from mlx_audio.tts.audio_player import AudioPlayer
40
+
41
+ player = AudioPlayer(sample_rate=24000)
42
+ result = next(model.generate("The quick brown fox jumps over the lazy dog."))
43
+ player.queue_audio(result.audio)
44
+ player.wait_for_drain()
45
+ player.stop()
46
  ```
47
 
48
+ ## Voice Cloning
49
 
50
+ Clone any voice using a reference audio sample and its transcript. Use `guidance_method="apg"` for best voice cloning quality:
51
 
52
+ ```python
53
+ result = next(model.generate(
54
+ text="Today is warm turning to rain, with good air quality.",
55
+ ref_audio="reference.wav",
56
+ ref_text="Transcript of the reference audio.",
57
+ guidance_method="apg",
58
+ cfg_strength=4.0,
59
+ steps=16,
60
+ ))
61
  ```
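The README's guidance (`"apg"` for voice cloning, `"cfg"` for plain TTS) can be captured in a small wrapper. The sketch below is a hypothetical helper, not part of mlx-audio: `build_generate_kwargs` only assembles a dictionary, using the defaults listed in the README's parameter table. Requiring `ref_audio` and `ref_text` together is an assumption drawn from the wording "a reference audio sample and its transcript".

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    text: str,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    steps: int = 16,
    cfg_strength: float = 4.0,
    seed: int = 1024,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for model.generate()."""
    # Assumption: a reference clip is only useful together with its transcript.
    if (ref_audio is None) != (ref_text is None):
        raise ValueError("ref_audio and ref_text must be given together")

    kwargs: Dict[str, Any] = {
        "text": text,
        "steps": steps,
        "cfg_strength": cfg_strength,
        "seed": seed,
    }
    if ref_audio is not None:
        # Voice cloning: the README recommends APG guidance.
        kwargs.update(ref_audio=ref_audio, ref_text=ref_text, guidance_method="apg")
    else:
        # Plain TTS: the README lists "cfg" as the default.
        kwargs["guidance_method"] = "cfg"
    return kwargs


tts_kwargs = build_generate_kwargs("Hello, this is a test of AudioDiT.")
clone_kwargs = build_generate_kwargs(
    "Today is warm turning to rain, with good air quality.",
    ref_audio="reference.wav",
    ref_text="Transcript of the reference audio.",
)
```

The resulting dictionaries would then be passed along as `next(model.generate(**clone_kwargs))`, assuming `generate` accepts every key shown in the parameter table.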
62
+
63
+ ## Zero-Shot Generation (Chinese)
64
+
65
+ ```python
66
+ result = next(model.generate(
67
+ text="今天晴暖转阴雨,空气质量优至良,空气相对湿度较低。",
68
+ steps=16,
69
+ cfg_strength=4.0,
70
+ ))
71
+ ```
72
+
73
+ ## Generation Parameters
74
+
75
+ | Parameter | Default | Description |
76
+ |-----------|---------|-------------|
77
+ | `steps` | 16 | Euler ODE solver steps. Higher = better quality, slower |
78
+ | `cfg_strength` | 4.0 | Classifier-free guidance strength |
79
+ | `guidance_method` | `"cfg"` | `"cfg"` for TTS, `"apg"` for voice cloning |
80
+ | `seed` | 1024 | Random seed for reproducibility |
81
+ | `ref_audio` | `None` | Reference audio for voice cloning (24kHz) |
82
+ | `ref_text` | `None` | Transcript of the reference audio |
83
+
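To give some intuition for `steps` and `cfg_strength`, here is a toy sketch of fixed-step Euler integration and of one common form of classifier-free guidance. It is not the model's sampler: the function names are hypothetical, the ODE is a simple decay problem chosen because its exact solution is known, and the guidance formula is a widely used variant that may differ from what LongCat-AudioDiT implements.

```python
import math


def guided_velocity(v_cond, v_uncond, cfg_strength):
    """One common CFG form: move from the unconditional toward the conditional prediction."""
    return [u + cfg_strength * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0, velocity_fn, steps=16):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def decay(x, t):
    # Toy velocity field dx/dt = -x, whose exact solution at t=1 is x0 * exp(-1).
    return [-xi for xi in x]


exact = math.exp(-1.0)
error_16 = abs(euler_sample([1.0], decay, steps=16)[0] - exact)
error_64 = abs(euler_sample([1.0], decay, steps=64)[0] - exact)
```

In this toy problem `error_64` is smaller than `error_16`, which mirrors the table's "Higher = better quality, slower" note: each extra step costs one more evaluation of the velocity function.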
84
+ ## CLI
85
+
86
+ ```bash
87
+ # Zero-shot TTS
88
+ python -m mlx_audio.tts.generate \
89
+ --model mlx-community/LongCat-AudioDiT-3.5B-5bit \
90
+ --text "Hello, this is a test of AudioDiT." \
91
+ --play
92
+
93
+ # Voice cloning
94
+ python -m mlx_audio.tts.generate \
95
+ --model mlx-community/LongCat-AudioDiT-3.5B-5bit \
96
+ --text "Today is warm turning to rain." \
97
+ --ref_audio reference.wav \
98
+ --ref_text "Transcript of the reference audio." \
99
+ --play
100
+ ```
101
+
102
+ ## License
103
+
104
+ LongCat-AudioDiT weights and code are released under the [MIT License](https://github.com/meituan-longcat/LongCat-AudioDiT/blob/main/LICENSE).