LongCat AudioDiT 3.5B: MLX 8-bit

This repository contains a self-contained MLX-native int8 conversion of LongCat AudioDiT 3.5B for local text-to-speech on Apple Silicon.

It is intended for local speech generation with mlx-speech, without a PyTorch runtime at inference time.

Model Details

  • Developed by: AppAutomaton
  • Shared by: AppAutomaton on Hugging Face
  • Upstream model: meituan-longcat/LongCat-AudioDiT-3.5B
  • Task: text-to-speech and voice cloning
  • Runtime: MLX on Apple Silicon
  • Precision: int8 quantized weights (tokenizer bundled)

Bundle Contents

This bundle is self-contained and includes:

  • config.json
  • model.safetensors
  • tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json)
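Before loading a downloaded copy, it can help to confirm that none of the files listed above is missing. The following is a minimal sketch of such a check. The helper name `missing_bundle_files` and the example directory are illustrative assumptions and are not part of mlx-speech.

```python
from pathlib import Path

# File names taken from the bundle listing above.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_bundle_files(bundle_dir):
    """Return the expected bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; point this at wherever the bundle was downloaded.
    missing = missing_bundle_files("models/longcat-audiodit-3.5b-8bit-mlx")
    print("bundle complete" if not missing else f"missing: {missing}")
```

An empty list means every listed file is present. The check does not validate file contents.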

How to Get Started

Basic generation:

python scripts/generate/longcat_audiodit.py \
  --text "Hello from LongCat AudioDiT." \
  --output-audio outputs/longcat.wav

Voice cloning:

python scripts/generate/longcat_audiodit.py \
  --text "Hello from LongCat AudioDiT." \
  --prompt-text "Original speaker text." \
  --prompt-audio /path/to/prompt.wav \
  --output-audio outputs/longcat_clone.wav \
  --guidance-method apg

Minimal Python usage:

from mlx_speech.generation.longcat_audiodit import generate_longcat_audiodit

# Synthesize speech for the given text and write it to a WAV file.
result = generate_longcat_audiodit(
    text="Hello from LongCat AudioDiT.",
    output_audio="outputs/longcat.wav",
)
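The Python example above covers basic generation only. For voice cloning from Python, one option that relies solely on the CLI flags documented in this card is to assemble the command and hand it to a subprocess. The sketch below assumes the working directory contains `scripts/generate/longcat_audiodit.py`. The helper `build_longcat_command` is hypothetical, and its rule that prompt text and prompt audio must be supplied together mirrors the cloning example rather than any confirmed requirement of the script.

```python
import shlex
import sys

# Script path as used in the CLI examples above; assumes the working
# directory is a checkout that contains this script.
SCRIPT = "scripts/generate/longcat_audiodit.py"

# Guidance methods named in this model card.
GUIDANCE_METHODS = ("cfg", "apg")


def build_longcat_command(text, output_audio, prompt_text=None,
                          prompt_audio=None, guidance_method=None,
                          guidance_strength=None):
    """Assemble the argv list for the generation script (hypothetical helper)."""
    # Assumption: cloning takes the reference audio and its transcript
    # together, as in the CLI cloning example.
    if (prompt_text is None) != (prompt_audio is None):
        raise ValueError("prompt_text and prompt_audio must be given together")
    if guidance_method is not None and guidance_method not in GUIDANCE_METHODS:
        raise ValueError(f"guidance_method must be one of {GUIDANCE_METHODS}")

    cmd = [sys.executable, SCRIPT, "--text", text,
           "--output-audio", str(output_audio)]
    if prompt_text is not None:
        cmd += ["--prompt-text", prompt_text,
                "--prompt-audio", str(prompt_audio)]
    if guidance_method is not None:
        cmd += ["--guidance-method", guidance_method]
    if guidance_strength is not None:
        cmd += ["--guidance-strength", str(guidance_strength)]
    return cmd


if __name__ == "__main__":
    cmd = build_longcat_command(
        "Hello from LongCat AudioDiT.",
        "outputs/longcat_clone.wav",
        prompt_text="Original speaker text.",
        prompt_audio="/path/to/prompt.wav",
        guidance_method="apg",
    )
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the helper easy to inspect and avoids shell quoting problems with arbitrary text.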

Notes

  • This repo contains the quantized MLX runtime artifact only.
  • The conversion preserves the LongCat AudioDiT diffusion transformer and bundled VAE for waveform decode.
  • Voice cloning supports two guidance methods via --guidance-method: cfg (Classifier-Free Guidance, the default) and apg (Adaptive Projected Guidance). --guidance-strength controls how closely the output follows the prompt speaker (default: 4.0).
  • The current bundle is intended for local MLX runtime use and parity validation.
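To make the difference between the two guidance methods mentioned above concrete, here is a simplified sketch of the general techniques on plain Python lists. It illustrates the published ideas only and is not the mlx-speech implementation: the function names, the `eta` parameter, and the mapping of `--guidance-strength` onto `strength` are assumptions, and the rescaling and momentum terms of the full APG method are omitted.

```python
def cfg(cond, uncond, strength):
    """Classifier-free guidance: extrapolate from uncond toward cond."""
    return [u + strength * (c - u) for c, u in zip(cond, uncond)]


def apg(cond, uncond, strength, eta=0.0):
    """Simplified projected guidance: damp the update component parallel to cond."""
    diff = [c - u for c, u in zip(cond, uncond)]
    norm_sq = sum(c * c for c in cond)
    # Projection coefficient of diff onto cond (0 if cond is the zero vector).
    coeff = sum(d * c for d, c in zip(diff, cond)) / norm_sq if norm_sq else 0.0
    parallel = [coeff * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    # eta scales the parallel part; eta=1.0 keeps it in full.
    return [
        c + (strength - 1.0) * (o + eta * p)
        for c, o, p in zip(cond, orthogonal, parallel)
    ]


if __name__ == "__main__":
    cond, uncond = [1.0, 0.0], [0.0, 1.0]
    print(cfg(cond, uncond, 4.0))
    print(apg(cond, uncond, 4.0, eta=0.0))
```

In this sketch, setting `eta=1.0` makes `apg` algebraically identical to `cfg`, while smaller values shrink the part of the update that points along the conditional prediction.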

License

MIT License, following the upstream license published with meituan-longcat/LongCat-AudioDiT-3.5B.
