AudioGen Medium (MLX)

This is the MLX-native port of facebook/audiogen-medium, a 1.5B-parameter autoregressive transformer for text-to-audio generation.

Model Details

  • Architecture: Autoregressive Transformer LM over EnCodec discrete tokens
  • Parameters: ~1.5B (LM) + EnCodec compression model
  • Sampling rate: 16 kHz
  • Frame rate: 50 Hz (4 codebooks, delay pattern)
  • Text encoder: T5-large (d_model=1024, 24 layers, 16 heads)
  • Max duration: 10 seconds (configurable)
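
The numbers above imply a few sizes that are useful when budgeting generation time and memory. The Python sketch below works them out. The helper name generation_sizes is hypothetical, and the decoding-step count assumes the common delay pattern in which codebook k is shifted by k steps; this card does not confirm that exact layout for this port.

```python
def generation_sizes(duration_s, sample_rate=16_000, frame_rate=50, codebooks=4):
    """Rough sizes implied by the model card's numbers (hypothetical helper)."""
    frames = round(duration_s * frame_rate)        # token frames per codebook
    samples = round(duration_s * sample_rate)      # output waveform samples
    samples_per_frame = sample_rate // frame_rate  # audio samples covered by one frame
    # Assumption: codebook k lags codebook 0 by k steps, so the last
    # codebook finishes (codebooks - 1) steps after the first one.
    decode_steps = frames + codebooks - 1
    return {
        "frames": frames,
        "samples": samples,
        "samples_per_frame": samples_per_frame,
        "tokens": frames * codebooks,
        "decode_steps": decode_steps,
    }


sizes = generation_sizes(10.0)  # the default maximum duration
print(sizes)
```

At the 10 second default this gives 500 frames per codebook and 160,000 output samples, with each frame covering 320 samples.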

Files

  • config.json: Model configuration (includes t5_model_name reference)
  • model.safetensors: LM + EnCodec weights
  • model.safetensors.index.json: Weight index (for sharded variants)

T5 Conditioner (extracted separately)

The T5-large text encoder weights are not included in this repository. Use extract_t5.py to extract them from the original facebook/audiogen-medium checkpoint:

python extract_t5.py --output /path/to/audiogen-mlx/t5

This produces a t5/ directory with config.json, model.safetensors, and tokenizer files.

Note: The T5 safetensors keys use MLX-compatible naming (.layer_0. / .layer_1. instead of HuggingFace's .layer.0. / .layer.1.). This is required because MLX's ModuleParameters.unflattened() splits on all dots.
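
The renaming described in the note can be illustrated with a small Python sketch. It is not taken from extract_t5.py; the function name to_mlx_key and the regular expression are assumptions, and the sample keys follow HuggingFace's usual T5 naming. Only the .layer.N. segment is rewritten here, since that is the only change the note describes.

```python
import re

# Matches a ".layer.<index>." segment in a HuggingFace-style T5 weight key.
_LAYER_SEGMENT = re.compile(r"\.layer\.(\d+)\.")


def to_mlx_key(hf_key: str) -> str:
    """Rewrite '.layer.N.' as '.layer_N.' (hypothetical helper, not the real script)."""
    return _LAYER_SEGMENT.sub(r".layer_\1.", hf_key)


examples = [
    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "encoder.block.3.layer.1.DenseReluDense.wi.weight",
    "shared.weight",  # no layer segment, so it is returned unchanged
]
for key in examples:
    print(key, "->", to_mlx_key(key))
```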

Usage (Swift/MLX)

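import MLXAudioGen

// modelURL: local folder containing config.json and model.safetensors.
// t5URL: the t5/ folder produced by extract_t5.py.
let model = try await AudioGenModel.fromPretrained(
    modelFolder: modelURL,
    t5Folder: t5URL
)

// Generate EnCodec tokens from a text prompt.
let tokens = try await model.generate(
    descriptions: ["dog barking"],
    duration: 5.0,      // seconds of audio
    cfgCoef: 3.0,       // classifier-free guidance coefficient
    temperature: 1.0,   // sampling temperature
    topK: 250           // top-k sampling cutoff
)

// Decode the tokens to an audio waveform with EnCodec.
let audio = model.decode(tokens: tokens)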

T5 Attention

T5's self-attention intentionally does not scale scores by 1/sqrt(d_k). This is a deliberate design choice in the T5 architecture, so do not add scaling in the inference code.
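
To make the effect concrete, here is a minimal pure-Python sketch that compares attention weights with and without the 1/sqrt(d_k) factor. It is an illustration only, not the port's inference code: the helper names are hypothetical, and it leaves out the value projection and the relative position bias that T5 adds to the scores.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, scale=False):
    """Softmax of Q.K^T; scale=False is the T5-style unscaled variant."""
    d_k = len(keys[0])
    factor = 1.0 / math.sqrt(d_k) if scale else 1.0
    scores = [
        [factor * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


q = [[1.0, 0.0, 1.0, 0.0]]
k = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

t5_style = attention_weights(q, k)           # raw scores [2, 0]
scaled = attention_weights(q, k, scale=True)  # scores become [1, 0] for d_k = 4
print(t5_style[0], scaled[0])
```

With the same inputs, the scaled variant produces a flatter distribution than the unscaled one, so adding the factor by habit would change the text embeddings the encoder was trained to produce.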

License

This model is published under the CC-BY-NC 4.0 license (non-commercial use only), following the original AudioGen license.
