AEmotionStudio/mmaudio-models

+---
+license: cc-by-nc-4.0
+tags:
+  - mmaudio
+  - audio-generation
+  - video-to-audio
+  - fp16
+---
+# MMAudio Models (FP16 Safetensors)
+FP16 `.safetensors` conversions of [MMAudio](https://github.com/hkchengrex/MMAudio) model checkpoints for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
+## Models
+| File | Description | FP16 Size | Original Size |
+|------|-------------|-----------|---------------|
+| `mmaudio_large_44k_v2.safetensors` | MMAudio large model (44kHz, v2) | 1,966 MB | 3,932 MB |
+| `v1-44.safetensors` | VAE decoder (44kHz) | 583 MB | 1,165 MB |
+| `synchformer_state_dict.safetensors` | Synchformer temporal encoder | 453 MB | 906 MB |
+All models have been converted from FP32 `.pth` → **FP16** `.safetensors` (50% size reduction).
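The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the script used to produce these files: the helper names `to_fp16` and `convert_checkpoint` are hypothetical, and it assumes each `.pth` file holds a flat `{name: tensor}` state dict with no tensors sharing memory (which `safetensors` would reject).

```python
from pathlib import Path


def to_fp16(state_dict):
    """Return a copy of a flat state dict with floating-point tensors cast to FP16."""
    converted = {}
    for name, tensor in state_dict.items():
        # Integer and boolean tensors (e.g. index buffers) keep their dtype.
        if tensor.is_floating_point():
            tensor = tensor.half()
        # safetensors refuses to save non-contiguous tensors.
        converted[name] = tensor.contiguous()
    return converted


def convert_checkpoint(src: Path, dst: Path) -> None:
    """Load an FP32 .pth checkpoint and write an FP16 .safetensors file."""
    # Imported lazily so to_fp16 stays usable without these packages installed.
    import torch
    from safetensors.torch import save_file

    # Assumption: the checkpoint is a flat {name: tensor} mapping.
    state_dict = torch.load(src, map_location="cpu", weights_only=True)
    save_file(to_fp16(state_dict), str(dst))
```

Halving the bytes per floating-point weight (4 to 2) is what yields the roughly 50% size reduction shown in the table.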
+## Usage
+These models are automatically downloaded by the `generate_audio` skill in ComfyUI-FFMPEGA. No manual setup needed.
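+**Manual installation:** the automatic download converts each `.safetensors` file to `.pth`, so the model directory is expected to look like this: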
+```
+ComfyUI/models/mmaudio/
+├── mmaudio_large_44k_v2.pth   (converted from .safetensors on download)
+├── v1-44.pth
+└── synchformer_state_dict.pth
+```
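For a manual setup, the layout above could be reproduced with a short script along these lines. This is a sketch under stated assumptions, not the logic of the `generate_audio` skill: the repository id `AEmotionStudio/mmaudio-models` is assumed from where this card is hosted, the helpers `pth_target` and `install` are hypothetical, and it assumes the loader accepts a plain state dict saved with `torch.save` in FP16.

```python
from pathlib import Path

REPO_ID = "AEmotionStudio/mmaudio-models"  # assumed repository id
FILES = [
    "mmaudio_large_44k_v2.safetensors",
    "v1-44.safetensors",
    "synchformer_state_dict.safetensors",
]


def pth_target(models_dir: Path, filename: str) -> Path:
    """Map a .safetensors filename to its .pth path under models_dir."""
    return models_dir / Path(filename).with_suffix(".pth").name


def install(models_dir: Path) -> None:
    """Download each file and store it as a .pth state dict in models_dir."""
    # Imported lazily so pth_target works without the heavy dependencies.
    import torch
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    models_dir.mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        target = pth_target(models_dir, filename)
        if target.exists():
            continue  # already installed
        cached = hf_hub_download(repo_id=REPO_ID, filename=filename)
        # Tensors stay in FP16; no upcasting is attempted here.
        torch.save(load_file(cached), target)
```

Calling `install(Path("ComfyUI/models/mmaudio"))` from the directory that contains `ComfyUI/` would populate the tree shown above.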
+## License
+> ⚠️ **CC-BY-NC 4.0** — These model checkpoints are licensed under [Creative Commons Attribution-NonCommercial 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+> Commercial use of the models is not permitted under this license. The code that loads and runs them is licensed separately (MIT/GPL-3.0).
+## Source
+Original models: [hkchengrex/MMAudio](https://huggingface.co/hkchengrex/MMAudio)
+Paper: *Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis* (CVPR 2025)