LTX-2 MLX (Audio-Video)

MLX-optimized version of Lightricks/LTX-2 for Apple Silicon Macs.

Generate synchronized video and audio from text prompts, running entirely on your Mac's GPU.

Features

  • Text-to-Video + Audio: Generate videos with synchronized audio from text
  • Image-to-Video + Audio: Animate images with audio
  • Unified format: Single file for fast loading
  • Native bfloat16: Optimized for Apple Silicon

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • ~45GB RAM for generation
  • ffmpeg (brew install ffmpeg)

Installation

pip install git+https://github.com/james-see/mlx-video-with-audio.git

Usage

Text-to-Video with Audio

uv run mlx_video.generate_av \
    --prompt "A jazz band playing in a smoky club" \
    --model-repo notapalindrome/ltx2-mlx-av

Landscape, approximately 16:9 (about 4 seconds at 24 FPS)

uv run mlx_video.generate_av \
    --prompt "Ocean waves crashing on rocks at sunset" \
    --width 768 --height 448 --num-frames 97 \
    --model-repo notapalindrome/ltx2-mlx-av
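
The 97 frames in the command above correspond to about 4 seconds at the default 24 FPS. To pick --num-frames for a different clip length, a small helper like the following can snap a target duration to the nearest frame count that satisfies the 8n+1 rule listed under Generation Parameters. This is a minimal sketch: frames_for_duration is a hypothetical name and is not part of mlx_video.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Return the 8n+1 frame count closest to the requested duration.

    Hypothetical helper for illustration; not part of the mlx_video package.
    """
    target = seconds * fps                 # ideal (unconstrained) frame count
    n = max(1, round((target - 1) / 8))    # nearest n, with 9 frames as the minimum
    return 8 * n + 1


if __name__ == "__main__":
    for secs in (2, 4, 6):
        print(f"{secs}s -> --num-frames {frames_for_duration(secs)}")
```

The result can be passed straight to --num-frames; the actual clip length is then the frame count divided by the FPS.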

Image-to-Video with Audio

uv run mlx_video.generate_av \
    --prompt "A person dancing to music" \
    --image photo.jpg \
    --model-repo notapalindrome/ltx2-mlx-av

Model Details

Component     Details
Format        Unified MLX safetensors
Dtype         bfloat16
Size          ~42GB
Transformer   48 layers, 19B parameters
Video VAE     128 latent channels
Audio VAE     8 latent channels, 64 mel bins
Vocoder       HiFi-GAN, 24kHz stereo output
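
As a rough sanity check on the figures above: bfloat16 stores each parameter in 2 bytes, so the 19B transformer parameters alone account for roughly 38 GB of the ~42GB file. The remainder is presumably the other components packed into the unified file; that split is an assumption, not a measured breakdown. The helper name below is hypothetical.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the on-disk size of model weights in decimal gigabytes.

    bytes_per_param defaults to 2, the width of a bfloat16 value.
    Hypothetical helper for back-of-envelope estimates only.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    transformer_gb = weights_size_gb(19e9)
    print(f"Transformer weights in bfloat16: ~{transformer_gb:.0f} GB")
```

This also helps explain the ~45GB RAM requirement: the weights dominate memory use, with some headroom needed for activations during generation.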

Generation Parameters

Parameter      Default   Notes
Height/Width   512       Must be divisible by 64
Frames         65        Must be 8n+1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, ...)
FPS            24        Video frame rate
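
The constraints in the table can be checked before starting a long generation run. The sketch below encodes only the two rules stated above (dimensions divisible by 64, frame count of the form 8n+1); check_generation_params is a hypothetical helper, not a function provided by mlx_video, and the CLI may enforce additional limits.

```python
def check_generation_params(width: int, height: int, num_frames: int) -> list[str]:
    """Return a list of problems with the given settings (empty if none).

    Hypothetical helper; it only encodes the rules from the table above.
    """
    problems = []
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 64 != 0:
            problems.append(f"{name}={value} is not a positive multiple of 64")
    if num_frames < 9 or (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames={num_frames} is not of the form 8n+1 (n >= 1)")
    return problems


if __name__ == "__main__":
    # The landscape example from the Usage section passes both checks.
    print(check_generation_params(768, 448, 97))
    # 720x480 with 96 frames violates all three constraints.
    for problem in check_generation_params(720, 480, 96):
        print(problem)
```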

Credits

License

This model inherits the LTX-Video license from Lightricks.
