LTX-2 MLX (Audio-Video)

MLX-optimized version of Lightricks/LTX-2 for Apple Silicon Macs.

Generate synchronized video and audio from text prompts, running entirely on your Mac's GPU.

Features

Text-to-Video + Audio: Generate videos with synchronized audio from text
Image-to-Video + Audio: Animate a still image with synchronized audio
Unified format: Single file for fast loading
Native bfloat16: Optimized for Apple Silicon

Requirements

macOS with Apple Silicon (M1/M2/M3/M4)
~45GB of RAM (unified memory) for generation
ffmpeg (brew install ffmpeg)
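
The requirements above can be checked before installing. The following is a minimal sketch, not part of the package: the function name check_requirements is hypothetical, and it only looks for a macOS arm64 host and an ffmpeg binary on PATH. It does not measure available memory.

```python
import platform
import shutil


def check_requirements() -> dict:
    """Report which of the listed prerequisites this machine appears to meet.

    Hypothetical helper for illustration; not shipped with the package.
    """
    return {
        # macOS identifies itself as "Darwin"
        "macos": platform.system() == "Darwin",
        # Apple Silicon reports the "arm64" machine type
        "apple_silicon": platform.machine() == "arm64",
        # ffmpeg must be discoverable on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```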

Installation

pip install git+https://github.com/james-see/mlx-video-with-audio.git

Usage

Text-to-Video with Audio

uv run mlx_video.generate_av \
    --prompt "A jazz band playing in a smoky club" \
    --model-repo notapalindrome/ltx2-mlx-av

Landscape, approximately 16:9 (768x448, about 4 seconds at 24 FPS)

uv run mlx_video.generate_av \
    --prompt "Ocean waves crashing on rocks at sunset" \
    --width 768 --height 448 --num-frames 97 \
    --model-repo notapalindrome/ltx2-mlx-av
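
The value 97 passed to --num-frames above follows from the 8n+1 frame rule and the 24 FPS default: 97 frames / 24 FPS is about 4.04 seconds. A small sketch for picking a frame count from a target duration is shown here. The helper name frames_for_duration is hypothetical, and the minimum of n = 1 (9 frames) is an assumption based on the documented list of valid values.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Return the frame count of the form 8n+1 closest to seconds * fps.

    Hypothetical helper; assumes n >= 1, so the smallest result is 9.
    """
    target = seconds * fps
    n = max(1, round((target - 1) / 8))
    return 8 * n + 1


print(frames_for_duration(4))    # 97
print(frames_for_duration(2.7))  # 65
```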

Image-to-Video with Audio

uv run mlx_video.generate_av \
    --prompt "A person dancing to music" \
    --image photo.jpg \
    --model-repo notapalindrome/ltx2-mlx-av
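
For scripted or batch runs, the commands above can be assembled in Python. This is a sketch under stated assumptions: build_command is a hypothetical helper, it uses only the flags shown in this README (--prompt, --image, --model-repo), and it builds the argument list without executing it.

```python
import shlex
from typing import List, Optional


def build_command(prompt: str,
                  image: Optional[str] = None,
                  model_repo: str = "notapalindrome/ltx2-mlx-av") -> List[str]:
    """Assemble the CLI invocation as an argument list (hypothetical helper)."""
    cmd = ["uv", "run", "mlx_video.generate_av", "--prompt", prompt]
    if image is not None:
        # Adding --image switches from text-to-video to image-to-video
        cmd += ["--image", image]
    cmd += ["--model-repo", model_repo]
    return cmd


cmd = build_command("A person dancing to music", image="photo.jpg")
print(shlex.join(cmd))
# To run it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list rather than a single string avoids shell quoting problems when prompts contain spaces or quotes.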

Model Details

Component	Details
Format	Unified MLX safetensors
Dtype	bfloat16
Size	~42GB
Transformer	48 layers, 19B parameters
Video VAE	128 latent channels
Audio VAE	8 latent channels, 64 mel bins
Vocoder	HiFi-GAN, 24kHz stereo output
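
The listed size is consistent with the parameter count and dtype: bfloat16 uses 2 bytes per parameter, so the 19B-parameter transformer alone accounts for roughly 38GB. The arithmetic is sketched below. The helper name bf16_size_gb is hypothetical, and the source does not break down how the remaining ~4GB is split among the other components.

```python
def bf16_size_gb(num_params: float) -> float:
    """Approximate weight size in GB (10^9 bytes) at 2 bytes per parameter."""
    bytes_per_param = 2  # bfloat16 is 16 bits
    return num_params * bytes_per_param / 1e9


print(bf16_size_gb(19e9))  # 38.0
```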

Generation Parameters

Parameter	Default	Notes
Height/Width	512	Must be divisible by 64
Frames	65	Must be 8n+1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97...)
FPS	24	Video framerate
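
The constraints in the table can be checked before starting a long generation. This is a minimal sketch: validate_params is a hypothetical helper, not the package's own validation, and it assumes the smallest valid frame count is 9 (n = 1), matching the list above.

```python
def validate_params(height: int = 512, width: int = 512, frames: int = 65) -> list:
    """Return a list of constraint violations; an empty list means valid.

    Hypothetical helper; defaults mirror the table above.
    """
    errors = []
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % 64 != 0:
            errors.append(f"{name}={value} is not a positive multiple of 64")
    # Assumption: n >= 1, so frame counts below 9 are rejected
    if frames < 9 or (frames - 1) % 8 != 0:
        errors.append(f"frames={frames} is not of the form 8n+1")
    return errors


print(validate_params())               # []
print(validate_params(448, 768, 97))   # []
print(validate_params(500, 512, 64))   # two violations: height and frames
```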

Credits

Original model: Lightricks/LTX-2
MLX conversion: mlx-video-with-audio

License

This model inherits the LTX-Video license from Lightricks.
