LTX-2 MLX (Audio-Video)
MLX-optimized version of Lightricks/LTX-2 for Apple Silicon Macs.
Generate synchronized video and audio from text prompts, running entirely on your Mac's GPU.
Features
- Text-to-Video + Audio: Generate videos with synchronized audio from text
- Image-to-Video + Audio: Animate images with audio
- Unified format: Single file for fast loading
- Native bfloat16: Optimized for Apple Silicon
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- ~45GB RAM for generation
- ffmpeg (brew install ffmpeg)
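The platform and ffmpeg requirements above can be checked before downloading the model. The following is a minimal sketch using only the Python standard library; the function name check_environment is hypothetical and is not part of mlx-video-with-audio. It does not check available RAM.

```python
import platform
import shutil


def check_environment() -> dict:
    """Report whether the platform and ffmpeg requirements appear to be met."""
    return {
        # A native Python build on an Apple Silicon Mac reports Darwin / arm64.
        # A Python running under Rosetta reports x86_64 instead.
        "macos": platform.system() == "Darwin",
        "apple_silicon": platform.machine() == "arm64",
        # shutil.which returns None when ffmpeg is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If ffmpeg is reported as missing, install it with the brew command listed in the requirements.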
Installation
pip install git+https://github.com/james-see/mlx-video-with-audio.git
Usage
Text-to-Video with Audio
uv run mlx_video.generate_av \
--prompt "A jazz band playing in a smoky club" \
--model-repo notapalindrome/ltx2-mlx-av
Landscape format, close to 16:9 (about 4 seconds at 24 FPS)
uv run mlx_video.generate_av \
--prompt "Ocean waves crashing on rocks at sunset" \
--width 768 --height 448 --num-frames 97 \
--model-repo notapalindrome/ltx2-mlx-av
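The frame count in the command above follows from the target duration: 97 frames at 24 FPS is about 4.04 seconds, and 97 has the form 8n+1 that the model requires. As a rough sketch, a helper like the one below (hypothetical, not part of the package) picks the valid frame count closest to a requested duration, assuming the default 24 FPS and a minimum of 9 frames.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Return the frame count of the form 8n + 1 closest to seconds * fps."""
    target = seconds * fps
    # Solve 8n + 1 = target for n, round to the nearest integer, and keep n >= 1
    # so the result is never below 9 frames.
    n = max(1, round((target - 1) / 8))
    return 8 * n + 1


if __name__ == "__main__":
    for seconds in (1, 2, 4):
        frames = frames_for_duration(seconds)
        print(f"{seconds}s -> {frames} frames ({frames / 24:.2f}s at 24 FPS)")
```

The result can be passed to --num-frames.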
Image-to-Video with Audio
uv run mlx_video.generate_av \
--prompt "A person dancing to music" \
--image photo.jpg \
--model-repo notapalindrome/ltx2-mlx-av
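When generation is driven from a script, the same commands can be assembled in Python. This is a sketch under the assumption that the CLI is invoked exactly as in the examples above; it uses only the flags shown there (--prompt, --width, --height, --num-frames, --image, --model-repo). The function build_generate_command is hypothetical and not part of the package.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    prompt: str,
    model_repo: str = "notapalindrome/ltx2-mlx-av",
    width: Optional[int] = None,
    height: Optional[int] = None,
    num_frames: Optional[int] = None,
    image: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the generate_av CLI."""
    cmd = ["uv", "run", "mlx_video.generate_av", "--prompt", prompt]
    # Optional flags are added only when a value is given, so the CLI
    # falls back to its own defaults otherwise.
    if width is not None:
        cmd += ["--width", str(width)]
    if height is not None:
        cmd += ["--height", str(height)]
    if num_frames is not None:
        cmd += ["--num-frames", str(num_frames)]
    if image is not None:
        cmd += ["--image", image]
    cmd += ["--model-repo", model_repo]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command("A person dancing to music", image="photo.jpg")
    # Print a shell-quoted form for inspection.
    print(shlex.join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.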
Model Details
| Component | Details |
|---|---|
| Format | Unified MLX safetensors |
| Dtype | bfloat16 |
| Size | ~42GB |
| Transformer | 48 layers, 19B parameters |
| Video VAE | 128 latent channels |
| Audio VAE | 8 latent channels, 64 mel bins |
| Vocoder | HiFi-GAN, 24kHz stereo output |
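The file size in the table is roughly what the parameter count predicts. bfloat16 stores each parameter in 2 bytes, so the 19B-parameter transformer accounts for about 38 GB of the ~42 GB file. The short calculation below is back-of-the-envelope arithmetic on the table's rounded figures, not a measurement; attributing the remaining ~4 GB to the other components in the file is an assumption.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def weight_size_gb(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1_000_000_000


if __name__ == "__main__":
    transformer_gb = weight_size_gb(19_000_000_000)
    print(f"Transformer weights: ~{transformer_gb:.0f} GB")
    # Remainder of the ~42 GB file, assumed to be the other components.
    print(f"Other components: ~{42 - transformer_gb:.0f} GB")
```

This also suggests why about 45 GB of RAM is listed as a requirement: the weights alone take most of it.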
Generation Parameters
| Parameter | Default | Notes |
|---|---|---|
| Height/Width | 512 | Must be divisible by 64 |
| Frames | 65 | Must be 8n+1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97...) |
| FPS | 24 | Video framerate |
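The constraints in the table can be checked before starting a long generation run. The sketch below is a hypothetical helper, not part of the package. It assumes the smallest accepted frame count is 9, the first value listed in the table; whether the CLI accepts smaller values is not stated.

```python
def validate_generation_params(
    width: int = 512, height: int = 512, num_frames: int = 65, fps: int = 24
) -> float:
    """Check the documented constraints and return the clip duration in seconds."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 64 != 0:
            raise ValueError(f"{name}={value} must be a positive multiple of 64")
    # Valid frame counts have the form 8n + 1: 9, 17, 25, ...
    if num_frames < 9 or (num_frames - 1) % 8 != 0:
        raise ValueError(f"num_frames={num_frames} must be 8n + 1 (9, 17, 25, ...)")
    if fps <= 0:
        raise ValueError(f"fps={fps} must be positive")
    return num_frames / fps


if __name__ == "__main__":
    duration = validate_generation_params(width=768, height=448, num_frames=97)
    print(f"OK, clip length is {duration:.2f} seconds")
```

With the defaults (512x512, 65 frames, 24 FPS) the clip is about 2.7 seconds long.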
Credits
- Original model: Lightricks/LTX-2
- MLX conversion: mlx-video-with-audio
License
This model inherits the LTX-Video license from Lightricks.