---
license: mit
---

<div align="center">

<p align="center">

<h1>MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows</h1>

<!-- <a href=>Paper</a> | <a href="https://meanaudio.github.io/">Webpage</a> -->

[arXiv](https://arxiv.org/abs/2508.06098)
[GitHub](https://github.com/xiquan-li/MeanAudio?tab=readme-ov-file)
[Hugging Face Model](https://huggingface.co/AndreasXi/MeanAudio)
[Hugging Face Space](https://huggingface.co/spaces/chenxie95/MeanAudio)
[Project Page](https://meanaudio.github.io/)

</p>

</div>

## Overview

MeanAudio is a novel MeanFlow-based model tailored for fast and faithful text-to-audio generation. It can synthesize realistic sound in a single step, achieving a real-time factor (RTF) of 0.013 on a single NVIDIA RTX 3090 GPU. It also demonstrates strong performance in multi-step generation.
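
For intuition about the RTF figure, the short sketch below converts it into wall-clock generation time. It assumes the common definition of RTF as generation time divided by the duration of the generated audio; the clip lengths and the `gen_time` helper are illustrative and do not come from the MeanAudio code or paper.

```bash
# Assumption: RTF = generation time / duration of the generated audio.
rtf=0.013   # single-step RTF reported in the overview

# Hypothetical helper: estimated generation time (seconds) for a clip
# of the given duration (seconds).
gen_time() {
  awk -v rtf="$rtf" -v dur="$1" 'BEGIN { printf "%.3f\n", rtf * dur }'
}

for dur in 5 10 30; do   # illustrative clip lengths
  echo "${dur}s clip: ~$(gen_time "$dur")s to generate"
done
```

Under the same assumption, an RTF of 0.013 corresponds to generating audio roughly 77 times faster than real time (1 / 0.013 ≈ 76.9).
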

## Environment Setup

**1. Create a new conda environment:**

```bash
# Create and activate a dedicated Python 3.11 environment
conda create -n meanaudio python=3.11 -y
conda activate meanaudio
# Install PyTorch from the CUDA 11.8 (cu118) wheel index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
```

<!-- ```
conda install -c conda-forge 'ffmpeg<7'
```
(Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->

**2. Install with pip:**

```bash
git clone https://github.com/xiquan-li/MeanAudio.git
cd MeanAudio
# Install MeanAudio in editable (development) mode
pip install -e .
```

<!-- (If you encounter the File "setup.py" not found error, upgrade your pip with pip install --upgrade pip) -->
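
A quick sanity check after the two setup steps can catch a missing tool or a CPU-only PyTorch build early. The following is a minimal sketch, not part of the MeanAudio repository: `missing_tools` is a hypothetical helper written for this example, and the PyTorch check relies only on the standard `torch.cuda.is_available()` call.

```bash
# Hypothetical helper (not part of MeanAudio): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used by the setup steps above.
missing=$(missing_tools conda git python pip)
if [ -n "$missing" ]; then
  echo "Missing from PATH:" $missing
fi

# Report the PyTorch version and whether a CUDA device is visible;
# fall back to a message if PyTorch cannot be imported.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())" 2>/dev/null \
  || echo "PyTorch is not importable in the current environment."
```

If the second value printed is `False`, PyTorch was installed but does not see a GPU, which usually points to a driver or wheel mismatch rather than a MeanAudio problem.
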

## Quick Start

<!-- **1. Download pre-trained models:** -->

To generate audio with our pre-trained model, run:

```bash
python demo.py --prompt 'your prompt' --num_steps 1
```

This automatically downloads the pre-trained checkpoints from Hugging Face and generates audio from your prompt.

The generated audio is saved to `MeanAudio/output/`, and the checkpoints are stored in `MeanAudio/weights/`.
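
To generate several clips in one go, the same command can be wrapped in a loop. This is a sketch under a few assumptions: it is run from the `MeanAudio` repository root, the prompts are placeholders, and only the `--prompt` and `--num_steps` flags from the command above are used. The `DRY_RUN` switch and the `generate` function are additions for this example, not features of `demo.py`; by default the script only prints the commands it would run.

```bash
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
NUM_STEPS=1   # single-step generation, as in the Quick Start command

# Hypothetical wrapper: print or run the demo command for one prompt.
generate() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "python demo.py --prompt '$1' --num_steps $NUM_STEPS"
  else
    python demo.py --prompt "$1" --num_steps "$NUM_STEPS"
  fi
}

# Placeholder prompts; replace them with your own.
for prompt in "a dog barking in the distance" "rain falling on a tin roof"; do
  generate "$prompt"
done
```

Setting `NUM_STEPS` above 1 should request multi-step generation, assuming `--num_steps` accepts larger values; check `demo.py` for the supported range.
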

Have fun with MeanAudio!