---
license: cc-by-nc-4.0
tags:
- mmaudio
- audio-generation
- video-to-audio
- fp16
---

# MMAudio Models (FP16 Safetensors)

FP16 `.safetensors` conversions of the [MMAudio](https://github.com/hkchengrex/MMAudio) model checkpoints, packaged for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

## Models

| File | Description | FP16 size | Original (FP32) size |
|------|-------------|-----------|----------------------|
| `mmaudio_large_44k_v2.safetensors` | MMAudio large model (44 kHz, v2) | 1,966 MB | 3,932 MB |
| `v1-44.safetensors` | VAE decoder (44 kHz) | 583 MB | 1,165 MB |
| `synchformer_state_dict.safetensors` | Synchformer temporal encoder | 453 MB | 906 MB |

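Because each file is a standard `.safetensors` container, its precision can be checked without loading any weights: the file starts with an 8-byte little-endian header length, followed by a JSON header that records each tensor's `dtype`, `shape`, and byte offsets. The stdlib-only sketch below reads that header and counts tensors per dtype. `read_safetensors_header` and `dtype_summary` are hypothetical helpers, not part of ComfyUI-FFMPEGA or the `safetensors` package.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def dtype_summary(path):
    """Count tensors per dtype string (e.g. "F16", "F32")."""
    header = read_safetensors_header(path)
    return Counter(entry["dtype"] for entry in header.values())


if __name__ == "__main__":
    import sys

    # Print a dtype histogram for every file given on the command line.
    for name in sys.argv[1:]:
        print(Path(name).name, dict(dtype_summary(name)))
```

Saved as a script (for example the hypothetical `check_dtype.py`) and run against `mmaudio_large_44k_v2.safetensors`, it prints a dtype histogram; for these FP16 conversions the floating-point tensors should appear as `F16`.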

All three checkpoints were converted from FP32 `.pth` to **FP16** `.safetensors`, which roughly halves their size (2 bytes per weight instead of 4).

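The size reduction follows directly from the storage format: FP32 uses 4 bytes per value and FP16 uses 2. A typical PyTorch workflow would load the `.pth` state dict with `torch.load`, cast floating-point tensors with `.half()`, and write them with `safetensors.torch.save_file`; this README does not say which tool produced these files. To show the byte-level effect without external dependencies, the sketch below packs values as IEEE 754 half precision with `struct` and writes a minimal F16 `.safetensors` file. `fp16_bytes` and `write_fp16_safetensors` are hypothetical illustration helpers, not the converter used for this repository.

```python
import json
import math
import struct


def fp16_bytes(values):
    """Pack floats as little-endian IEEE 754 half precision (2 bytes each)."""
    # struct's "e" format rounds to the nearest FP16 value and raises
    # OverflowError when the rounded magnitude exceeds the FP16 maximum (65504).
    return struct.pack(f"<{len(values)}e", *values)


def write_fp16_safetensors(path, tensors):
    """Write {name: (shape, flat_values)} as a .safetensors file with F16 tensors."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        if math.prod(shape) != len(values):
            raise ValueError(f"{name}: shape {shape} does not match {len(values)} values")
        data = fp16_bytes(values)
        # Offsets are relative to the start of the data section after the header.
        header[name] = {
            "dtype": "F16",
            "shape": list(shape),
            "data_offsets": [offset, offset + len(data)],
        }
        chunks.append(data)
        offset += len(data)
    blob = json.dumps(header, separators=(",", ":")).encode("utf-8")
    blob += b" " * (-len(blob) % 8)  # pad the header with spaces to an 8-byte boundary
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # 8-byte little-endian header length
        f.write(blob)
        for data in chunks:
            f.write(data)
```

The cast is lossy: FP16 keeps only about 3 significant decimal digits and cannot hold values whose magnitude rounds beyond 65504, which is why `struct` raises `OverflowError` for them instead of silently saturating.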

## Usage

ComfyUI-FFMPEGA's `generate_audio` skill downloads these models automatically, so no manual setup is required.

**Manual installation:** place the checkpoints in the following layout:

```
ComfyUI/models/mmaudio/
├── mmaudio_large_44k_v2.pth (converted from .safetensors on download)
├── v1-44.pth
└── synchformer_state_dict.pth
```

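For a manual install, a quick way to confirm the layout is to check that each file from the tree exists under `ComfyUI/models/mmaudio/`. The stdlib sketch below assumes the `.pth` file names exactly as listed in that tree; `missing_mmaudio_files` is a hypothetical helper, and if your version of ComfyUI-FFMPEGA expects different names or extensions, adjust `EXPECTED_FILES`.

```python
from pathlib import Path

# Assumed file names, copied from the manual-installation tree.
EXPECTED_FILES = (
    "mmaudio_large_44k_v2.pth",
    "v1-44.pth",
    "synchformer_state_dict.pth",
)


def missing_mmaudio_files(comfyui_root):
    """Return the expected checkpoint names not found under <root>/models/mmaudio/."""
    model_dir = Path(comfyui_root) / "models" / "mmaudio"
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

`missing_mmaudio_files("ComfyUI")` returns an empty list when all three checkpoints are in place; otherwise it lists the names that are still missing.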

## License

> ⚠️ **CC-BY-NC 4.0**: these model checkpoints are licensed under [Creative Commons Attribution-NonCommercial 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
> Commercial use of the checkpoints is not permitted. The code that loads and runs them is licensed separately under MIT/GPL-3.0.

## Source

Original models: [hkchengrex/MMAudio](https://huggingface.co/hkchengrex/MMAudio)

Paper: *Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis* (CVPR 2025)