Paper: Stable Audio Open (arXiv:2407.14358)
Text-to-audio generation on Apple Silicon using MLX. Supports MusicGen and Stable Audio Open.
Runs entirely on-device on the Metal GPU; no cloud API is needed.
| Model | Output | Sample Rate | Architecture |
|---|---|---|---|
| MusicGen (small/medium/large) | Mono | 32 kHz | Autoregressive (T5 + Transformer + EnCodec) |
| Stable Audio Open (small/1.0) | Stereo | 44.1 kHz | Diffusion (T5 + DiT + Oobleck VAE) |
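
The sample rates and channel counts in the table determine how many samples a clip of a given length contains. The sketch below works through that arithmetic for a 10-second MusicGen clip and a 15-second Stable Audio Open clip. The helper names `frames` and `pcm_bytes` are hypothetical, and the 16-bit sample width is an assumption for illustration; the bit depth this tool actually writes is not stated here.

```bash
# Sample frames in a clip: seconds * sample rate.
frames() { echo $(( $1 * $2 )); }

# Approximate raw PCM payload in bytes: seconds * rate * channels * 2.
# The factor 2 assumes 16-bit samples (an assumption, not a documented format).
pcm_bytes() { echo $(( $1 * $2 * $3 * 2 )); }

frames 10 32000          # MusicGen, 10 s at 32 kHz
pcm_bytes 10 32000 1     # mono
frames 15 44100          # Stable Audio Open, 15 s at 44.1 kHz
pcm_bytes 15 44100 2     # stereo
```

Under these assumptions the stereo 44.1 kHz clip is roughly four times the size of the mono 32 kHz clip despite being only 1.5 times as long.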
```bash
# Install
git clone https://github.com/jasonvassallo/mlx-audio-generate
cd mlx-audio-generate
uv sync

# Convert weights (one-time per model)
uv run mlx-audio-convert --model facebook/musicgen-small --output ./converted/musicgen-small

# Generate audio
uv run mlx-audio-generate \
  --model musicgen \
  --prompt "happy upbeat rock song with electric guitar" \
  --seconds 10 \
  --weights-dir ./converted/musicgen-small \
  --output my_song.wav
```
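
To render several prompts in one go, the generate command can be wrapped in a small loop. This is a minimal sketch that uses only the flags shown above. The function name `generate_batch`, the `DRY_RUN` variable, the second prompt, and the `take_N.wav` output names are all hypothetical. By default it only prints the commands it would run.

```bash
# Hypothetical helper: one generation command per prompt argument.
# With DRY_RUN=1 (the default) commands are printed, not executed.
generate_batch() {
  i=0
  for prompt in "$@"; do
    i=$((i + 1))
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "uv run mlx-audio-generate --model musicgen --prompt \"$prompt\" --seconds 10 --weights-dir ./converted/musicgen-small --output take_${i}.wav"
    else
      uv run mlx-audio-generate \
        --model musicgen \
        --prompt "$prompt" \
        --seconds 10 \
        --weights-dir ./converted/musicgen-small \
        --output "take_${i}.wav"
    fi
  done
}

generate_batch \
  "happy upbeat rock song with electric guitar" \
  "slow acoustic folk ballad"
```

Set `DRY_RUN=0` in the environment to invoke the CLI for real once the printed commands look right.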
```bash
# Convert weights
uv run mlx-audio-convert --model stabilityai/stable-audio-open-small --output ./converted/stable-audio

# Generate (stereo, 44.1 kHz)
uv run mlx-audio-generate \
  --model stable_audio \
  --prompt "ambient electronic pad with warm reverb" \
  --seconds 15 \
  --steps 100 \
  --cfg-scale 7.0 \
  --weights-dir ./converted/stable-audio \
  --output ambient.wav
```