---
license: mit
---

<div align="center">
<p align="center">
<h2>Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models</h2>

[Paper](https://arxiv.org/abs/2603.11661)
[Code](https://github.com/xiquan-li/Resonate)
[Model](https://huggingface.co/AndreasXi/Resonate)
[Demo](https://huggingface.co/spaces/chenxie95/Resonate)
[Webpage](https://resonatedemo.github.io/)
</p>
</div>
## Overview

Resonate is a state-of-the-art text-to-audio generator reinforced with the online GRPO algorithm.
It leverages the reasoning capabilities of modern Large Audio Language Models as reward models.
This repo provides a comprehensive pipeline for text-to-audio generation, covering pre-training, SFT, DPO, and GRPO.
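For intuition, the core idea of GRPO is to score a *group* of generations for the same prompt and normalize each reward against the group's statistics. The sketch below is illustrative only (plain Python, hypothetical reward values); Resonate's actual rewards come from a Large Audio Language Model judging the generated audio.

```python
# Minimal sketch of the group-relative advantage used by GRPO.
# The reward values here are hypothetical; in Resonate they come from an
# audio language model scoring each generated clip for one prompt.
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-8):
    """Normalize a group of per-sample rewards to zero mean / unit std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled clips for one prompt, each with a judge-assigned reward
advs = grpo_advantages([0.9, 0.4, 0.7, 0.2])
print([round(a, 3) for a in advs])
```

Samples scoring above the group mean get positive advantages (and are reinforced); those below get negative ones, with no learned value network required.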

## Environment Setup

1. Create a new conda environment:

```bash
conda create -n resonate python=3.11 -y
conda activate resonate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
```
<!-- ```
conda install -c conda-forge 'ffmpeg<7'
```
(Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->

2. Install with pip:

```bash
git clone https://github.com/xiquan-li/Resonate.git
cd Resonate
pip install -e .
```

<!-- (If you encounter the File "setup.py" not found error, upgrade your pip with pip install --upgrade pip) -->

## Quick Start

To generate audio with our pre-trained model, simply run:

```bash
python demo.py --prompt 'your prompt'
```

This will automatically download the pre-trained checkpoints from Hugging Face and generate audio from your prompt.
By default, it uses [Resonate-GRPO](https://huggingface.co/AndreasXi/Resonate/blob/main/Resonate_GRPO.pth).
The generated audio is saved to `Resonate/output/`, and the downloaded checkpoints to `Resonate/weights/`.
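To generate several clips in one go, one option is to loop over prompts and invoke `demo.py` once per prompt. The wrapper below is a hypothetical convenience script, not part of the repo; only `demo.py` and its `--prompt` flag come from the Quick Start above.

```python
# Hypothetical batch wrapper around demo.py (the helper and prompt list are
# illustrative; only demo.py and its --prompt flag come from this repo).
import subprocess

def build_cmd(prompt: str) -> list[str]:
    """Build the CLI call demo.py expects for a single prompt."""
    return ["python", "demo.py", "--prompt", prompt]

prompts = ["a dog barking in the rain", "footsteps on gravel"]
cmds = [build_cmd(p) for p in prompts]
print(cmds[0])

# Uncomment to actually generate (requires the installed repo and checkpoints):
# for c in cmds:
#     subprocess.run(c, check=True)
```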