mlx-audio-generate

Text-to-audio generation on Apple Silicon using MLX. Supports MusicGen and Stable Audio Open.

Runs entirely on-device on the Metal GPU; no cloud API is needed.

Supported Models

| Model | Output | Sample Rate | Architecture |
|---|---|---|---|
| MusicGen (small/medium/large) | Mono | 32 kHz | Autoregressive (T5 + Transformer + EnCodec) |
| Stable Audio Open (small/1.0) | Stereo | 44.1 kHz | Diffusion (T5 + DiT + Oobleck VAE) |
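
To get a feel for what the sample rates and channel counts in the table above mean in practice, the following minimal sketch computes how many audio samples a clip of a given length contains. The helper names `frame_count` and `sample_count` are illustrative and are not part of mlx-audio-generate.

```python
def frame_count(seconds: float, sample_rate: int) -> int:
    """Number of audio frames (one sample per channel) in a clip."""
    return round(seconds * sample_rate)


def sample_count(seconds: float, sample_rate: int, channels: int) -> int:
    """Total number of samples across all channels."""
    return frame_count(seconds, sample_rate) * channels


# Sample rates and channel counts are taken from the table above.
musicgen_samples = sample_count(10, 32_000, channels=1)      # mono, 32 kHz
stable_audio_samples = sample_count(15, 44_100, channels=2)  # stereo, 44.1 kHz

print(musicgen_samples)      # 320000
print(stable_audio_samples)  # 1323000
```

A 15-second stereo clip at 44.1 kHz therefore holds roughly four times as many samples as a 10-second mono clip at 32 kHz.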

Quick Start

# Install
git clone https://github.com/jasonvassallo/mlx-audio-generate
cd mlx-audio-generate
uv sync

# Convert weights (one-time per model)
uv run mlx-audio-convert --model facebook/musicgen-small --output ./converted/musicgen-small

# Generate audio
uv run mlx-audio-generate \
  --model musicgen \
  --prompt "happy upbeat rock song with electric guitar" \
  --seconds 10 \
  --weights-dir ./converted/musicgen-small \
  --output my_song.wav
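
When generating several clips, the command above can be assembled from Python instead of being typed by hand. The sketch below only builds the argument lists, using the flags shown in the Quick Start; the helper `build_generate_command` and the `clip_NN.wav` naming scheme are hypothetical and not part of the project.

```python
import shlex


def build_generate_command(
    prompt: str,
    output: str,
    seconds: int = 10,
    model: str = "musicgen",
    weights_dir: str = "./converted/musicgen-small",
) -> list[str]:
    """Build the argv list for one `uv run mlx-audio-generate` call."""
    return [
        "uv", "run", "mlx-audio-generate",
        "--model", model,
        "--prompt", prompt,
        "--seconds", str(seconds),
        "--weights-dir", weights_dir,
        "--output", output,
    ]


prompts = [
    "happy upbeat rock song with electric guitar",
    "slow acoustic ballad with soft piano",
]

# One command per prompt, each writing to its own (hypothetical) file name.
commands = [
    build_generate_command(prompt, output=f"clip_{i:02d}.wav")
    for i, prompt in enumerate(prompts, start=1)
]

for cmd in commands:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

To actually run each command, pass the list to `subprocess.run(cmd, check=True)` from the repository directory.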

Stable Audio Example

# Convert weights
uv run mlx-audio-convert --model stabilityai/stable-audio-open-small --output ./converted/stable-audio

# Generate (stereo, 44.1 kHz)
uv run mlx-audio-generate \
  --model stable_audio \
  --prompt "ambient electronic pad with warm reverb" \
  --seconds 15 \
  --steps 100 \
  --cfg-scale 7.0 \
  --weights-dir ./converted/stable-audio \
  --output ambient.wav
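
One way to confirm that a generated file matches the expected format (stereo at 44.1 kHz here) is to read its header with Python's standard `wave` module. This is a sketch under the assumption that the tool writes integer PCM WAV files, which `wave` can read; a float-encoded WAV would need a different reader. The helper `describe_wav` is illustrative.

```python
import wave
from pathlib import Path


def describe_wav(path: str) -> tuple[int, int, float]:
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


# Only inspect the file if the generation step has already produced it.
if Path("ambient.wav").exists():
    channels, sample_rate, duration = describe_wav("ambient.wav")
    print(f"{channels} channel(s), {sample_rate} Hz, {duration:.1f} s")
```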

Requirements

  • Python 3.11+
  • Apple Silicon Mac (M1/M2/M3/M4)
  • uv package manager (recommended)
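
The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; it assumes that a native Python on Apple Silicon reports the system as `Darwin` and the machine as `arm64`, and the helper `meets_requirements` is illustrative rather than something the project ships.

```python
import platform
import shutil
import sys


def meets_requirements(python_version, system: str, machine: str) -> bool:
    """True for Python 3.11+ running natively on an Apple Silicon Mac."""
    new_enough = tuple(python_version[:2]) >= (3, 11)
    apple_silicon = system == "Darwin" and machine == "arm64"
    return new_enough and apple_silicon


ok = meets_requirements(sys.version_info, platform.system(), platform.machine())
print("Python 3.11+ on Apple Silicon:", ok)

# uv is recommended rather than required, so report it separately.
print("uv found on PATH:", shutil.which("uv") is not None)
```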
