---
license: cc-by-nc-4.0
tags:
- audiox
- audio-generation
- music-generation
- text-to-audio
- video-to-audio
- audio-inpainting
- safetensors
base_model:
- HKUSTAudio/AudioX
- HKUSTAudio/AudioX-MAF
pipeline_tag: text-to-audio
---

# AudioX Models (Safetensors)

`.safetensors` conversions of [AudioX-MAF](https://huggingface.co/HKUSTAudio/AudioX-MAF) model checkpoints for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

AudioX is a unified anything-to-audio model from ICLR 2026 that supports text-to-audio, text-to-music, video-to-audio, and audio inpainting.

## Models

| File | Description | Size |
|------|-------------|------|
| `model.safetensors` | AudioX-MAF DiT model (full precision) | 5.19 GB |
| `synchformer_state_dict.safetensors` | Synchformer temporal encoder (shared with MMAudio) | 475 MB |
| `config.json` | Model architecture configuration | 3.3 KB |

## Sources

All models were downloaded from their **original sources** and converted by us:

- **AudioX-MAF**: [HKUSTAudio/AudioX-MAF](https://huggingface.co/HKUSTAudio/AudioX-MAF)
- **Synchformer**: Shared with [MMAudio](https://huggingface.co/hkchengrex/MMAudio)

## Usage

These models are automatically downloaded by the `generate_music` and `audio_inpaint` skills in ComfyUI-FFMPEGA. No manual setup needed.

**Manual installation:**

```
ComfyUI/models/audiox/
├── model.safetensors
├── synchformer_state_dict.safetensors
└── config.json
```

> **Note:** The `synchformer_state_dict.safetensors` is shared with MMAudio. If you already have it in `ComfyUI/models/mmaudio/`, AudioX will reuse it automatically; no duplicate download needed.

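
To check a manual installation, the layout and the MMAudio fallback described above can be sketched as a small script. This is only an illustration: `locate_audiox_files` is a hypothetical helper, not part of ComfyUI-FFMPEGA, and the actual lookup logic in the skills may differ. It assumes the models directory is `ComfyUI/models` relative to where the script runs.

```python
from pathlib import Path
from typing import Dict, Optional

# File names taken from the Models table in this README.
AUDIOX_FILES = (
    "model.safetensors",
    "synchformer_state_dict.safetensors",
    "config.json",
)
# The one file that may also live in the MMAudio folder.
SHARED_WITH_MMAUDIO = "synchformer_state_dict.safetensors"


def locate_audiox_files(models_root: Path) -> Dict[str, Optional[Path]]:
    """Map each expected file name to its path, or None if it is missing.

    Hypothetical helper: it mirrors the directory layout described in this
    README and is not the lookup code used by ComfyUI-FFMPEGA.
    """
    found: Dict[str, Optional[Path]] = {}
    for name in AUDIOX_FILES:
        candidates = [models_root / "audiox" / name]
        if name == SHARED_WITH_MMAUDIO:
            # Fall back to an existing MMAudio copy of the Synchformer weights.
            candidates.append(models_root / "mmaudio" / name)
        found[name] = next((p for p in candidates if p.is_file()), None)
    return found


if __name__ == "__main__":
    # Assumption: run from the directory that contains ComfyUI/.
    for name, path in locate_audiox_files(Path("ComfyUI/models")).items():
        print(f"{name}: {path if path else 'MISSING'}")
```

Any file reported as `MISSING` would need to be placed in `ComfyUI/models/audiox/` by hand, or left for the skills to download on first use.
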
## Capabilities

| Skill | Description |
|-------|-------------|
| `generate_music` | Text-to-music and video-to-music generation |
| `audio_inpaint` | Fill gaps, extend, or regenerate sections of audio |

## License

> ⚠️ **CC-BY-NC 4.0**: AudioX model weights are licensed under [Creative Commons Attribution-NonCommercial 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
> Commercial use of the model weights is not permitted under this license. The code that loads and runs them is licensed separately under GPL-3.0.

## Paper

*AudioX: Diffusion Transformer for Anything-to-Audio Generation* (ICLR 2026)
[arXiv:2503.10522](https://arxiv.org/abs/2503.10522)