AEmotionStudio committed on
Commit 29ebf9d · verified · 1 Parent(s): ff8ba65

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - mmaudio
+ - audio-generation
+ - video-to-audio
+ - fp16
+ ---
+
+ # MMAudio Models (FP16 Safetensors)
+
+ FP16 `.safetensors` conversions of [MMAudio](https://github.com/hkchengrex/MMAudio) model checkpoints for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
+
+ ## Models
+
+ | File | Description | FP16 Size | Original Size |
+ |------|-------------|-----------|---------------|
+ | `mmaudio_large_44k_v2.safetensors` | MMAudio large model (44kHz, v2) | 1,966 MB | 3,932 MB |
+ | `v1-44.safetensors` | VAE decoder (44kHz) | 583 MB | 1,165 MB |
+ | `synchformer_state_dict.safetensors` | Synchformer temporal encoder | 453 MB | 906 MB |
+
+ All models have been converted from FP32 `.pth` → **FP16** `.safetensors` (50% size reduction).
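
Reviewer note: the 50% figure follows from FP16 storing each value in 2 bytes instead of 4. The sketch below is only an illustration using Python's standard `struct` module, not the conversion script behind this commit. It assumes nearly all checkpoint bytes are floating-point weights, with the sizes copied from the table above; the helper names `fp16_round_trip` and `size_ratios` are hypothetical.

```python
import struct

# Bytes per value: "f" is IEEE 754 single precision, "e" is half precision.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")

# (FP16 size, original size) in MB, copied from the Models table.
SIZES_MB = {
    "mmaudio_large_44k_v2.safetensors": (1966, 3932),
    "v1-44.safetensors": (583, 1165),
    "synchformer_state_dict.safetensors": (453, 906),
}


def fp16_round_trip(value: float) -> float:
    """Store a value as FP16 and read it back, exposing any precision loss."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def size_ratios(sizes):
    """Return the FP16/original size ratio for each file."""
    return {name: fp16 / orig for name, (fp16, orig) in sizes.items()}


if __name__ == "__main__":
    print(f"bytes per value: FP32={FP32_BYTES}, FP16={FP16_BYTES}")
    for name, ratio in size_ratios(SIZES_MB).items():
        print(f"{name}: {ratio:.3f}")
    # Values that are not exactly representable in FP16 change slightly.
    print(fp16_round_trip(0.1))
```

The ratios land at or very near 0.5 for all three files, matching the README's claim. The round trip shows the trade-off: halving storage also reduces per-value precision.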
+
+ ## Usage
+
+ These models are automatically downloaded by the `generate_audio` skill in ComfyUI-FFMPEGA. No manual setup is needed.
+
+ **Manual installation:**
+ ```
+ ComfyUI/models/mmaudio/
+ ├── mmaudio_large_44k_v2.pth (converted from .safetensors on download)
+ ├── v1-44.pth
+ └── synchformer_state_dict.pth
+ ```
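
Reviewer note: for a manual installation, a quick check that the directory matches the layout above can save a debugging round. This is a minimal sketch assuming the three `.pth` names and the `models/mmaudio/` path shown in the listing. The function `missing_mmaudio_files` is a hypothetical helper, not part of ComfyUI-FFMPEGA.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = (
    "mmaudio_large_44k_v2.pth",
    "v1-44.pth",
    "synchformer_state_dict.pth",
)


def missing_mmaudio_files(comfyui_root):
    """Return expected file names absent from <comfyui_root>/models/mmaudio/."""
    model_dir = Path(comfyui_root) / "models" / "mmaudio"
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust to the local setup.
    missing = missing_mmaudio_files("ComfyUI")
    print("all present" if not missing else f"missing: {missing}")
```

An empty result means all three files are in place. The check only looks at file names, so it does not confirm that the files are valid checkpoints.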
+
+ ## License
+
+ > ⚠️ **CC-BY-NC 4.0**: These model checkpoints are licensed under [Creative Commons Attribution-NonCommercial 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+ > Commercial use of the models is not permitted under this license. The code that loads and runs them is licensed separately (MIT/GPL-3.0).
+
+ ## Source
+
+ Original models: [hkchengrex/MMAudio](https://huggingface.co/hkchengrex/MMAudio)
+ Paper: *Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis* (CVPR 2025)