AEmotionStudio committed on
Commit 9edfe69 · verified · 1 Parent(s): 8c9027c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - audiox
+ - audio-generation
+ - music-generation
+ - text-to-audio
+ - video-to-audio
+ - audio-inpainting
+ - safetensors
+ base_model:
+ - HKUSTAudio/AudioX
+ - HKUSTAudio/AudioX-MAF
+ pipeline_tag: text-to-audio
+ ---
+
+ # AudioX Models (Safetensors)
+
+ `.safetensors` conversions of [AudioX-MAF](https://huggingface.co/HKUSTAudio/AudioX-MAF) model checkpoints for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
+
+ AudioX is a unified anything-to-audio model from ICLR 2026 that supports text-to-audio, text-to-music, video-to-audio, and audio inpainting.
+
+ ## Models
+
+ | File | Description | Size |
+ |------|-------------|------|
+ | `model.safetensors` | AudioX-MAF DiT model (full precision) | 5.19 GB |
+ | `synchformer_state_dict.safetensors` | Synchformer temporal encoder (shared with MMAudio) | 475 MB |
+ | `config.json` | Model architecture configuration | 3.3 KB |
+
+ ## Sources
+
+ All models were downloaded from their **original sources** and converted by us:
+
+ - **AudioX-MAF**: [HKUSTAudio/AudioX-MAF](https://huggingface.co/HKUSTAudio/AudioX-MAF)
+ - **Synchformer**: Shared with [MMAudio](https://huggingface.co/hkchengrex/MMAudio)
+
+ ## Usage
+
+ These models are automatically downloaded by the `generate_music` and `audio_inpaint` skills in ComfyUI-FFMPEGA. No manual setup needed.
+
+ **Manual installation:**
+ ```
+ ComfyUI/models/audiox/
+ ├── model.safetensors
+ ├── synchformer_state_dict.safetensors
+ └── config.json
+ ```
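
For a manual install, the layout above can be sanity-checked with a few lines of Python. This is only a sketch: `missing_audiox_files` is a hypothetical helper, not part of ComfyUI-FFMPEGA, and it assumes the ComfyUI root is passed in explicitly. It looks only inside `models/audiox/`.

```python
from pathlib import Path

# File names taken from the README's manual-installation tree.
EXPECTED_FILES = (
    "model.safetensors",
    "synchformer_state_dict.safetensors",
    "config.json",
)


def missing_audiox_files(comfyui_root):
    """Return the expected AudioX files that are absent (hypothetical helper).

    Only checks <comfyui_root>/models/audiox/; it does not account for the
    Synchformer checkpoint being reused from another folder.
    """
    model_dir = Path(comfyui_root) / "models" / "audiox"
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

Calling `missing_audiox_files("ComfyUI")` returns an empty list once all three files are in place.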
+
+ > **Note:** The `synchformer_state_dict.safetensors` file is shared with MMAudio. If you already have it in `ComfyUI/models/mmaudio/`, AudioX will reuse it automatically, so no duplicate download is needed.
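
The reuse described in the note amounts to a fallback lookup across two model folders. The sketch below illustrates that idea only: `find_synchformer` is a hypothetical helper, and the search order (`audiox` first, then `mmaudio`) is an assumption, not the documented behavior of ComfyUI-FFMPEGA.

```python
from pathlib import Path

SYNCHFORMER_NAME = "synchformer_state_dict.safetensors"


def find_synchformer(comfyui_root):
    """Return the first existing Synchformer checkpoint path, or None.

    The search order (audiox, then mmaudio) is an assumption made for this
    sketch; the real loader may resolve the file differently.
    """
    models_dir = Path(comfyui_root) / "models"
    for folder in ("audiox", "mmaudio"):
        candidate = models_dir / folder / SYNCHFORMER_NAME
        if candidate.is_file():
            return candidate
    return None
```

A `None` result means neither folder holds the checkpoint, in which case it would have to be downloaded.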
51
+
52
+ ## Capabilities
53
+
54
+ | Skill | Description |
55
+ |-------|-------------|
56
+ | `generate_music` | Text-to-music and video-to-music generation |
57
+ | `audio_inpaint` | Fill gaps, extend, or regenerate sections of audio |
58
+
+ ## License
+
+ > ⚠️ **CC-BY-NC 4.0**: AudioX model weights are licensed under [Creative Commons Attribution-NonCommercial 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+ > Commercial use of the models is not permitted under this license. The code that loads/runs them is GPL-3.0.
+
+ ## Paper
+
+ *AudioX: Diffusion Transformer for Anything-to-Audio Generation* (ICLR 2026)
+ [arXiv:2503.10522](https://arxiv.org/abs/2503.10522)