Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models
[Paper](https://arxiv.org/abs/2603.11661)
[Code](https://github.com/xiquan-li/Resonate)
[Model](https://huggingface.co/AndreasXi/Resonate)
[Demo](https://huggingface.co/spaces/chenxie95/Resonate)
[Project Page](https://resonatedemo.github.io/)
## Overview
Resonate is a state-of-the-art text-to-audio generator reinforced with an online GRPO algorithm.
It leverages the sophisticated reasoning capabilities of modern Large Audio Language Models as reward models.
This repo provides a comprehensive pipeline for audio generation, covering pre-training, SFT, DPO, and GRPO.
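To give a feel for the GRPO stage: GRPO samples a *group* of candidate audios per prompt, has the reward model (here, an audio language model) score each one, and normalizes each reward against the group's mean and standard deviation to get per-sample advantages. The repo's actual training code is not reproduced here; the snippet below is only a minimal sketch of that group-relative advantage computation, with all names hypothetical:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward by its group's
    mean and (population) standard deviation.

    rewards: scalar rewards, one per sampled audio for the SAME prompt,
             e.g. scores assigned by an audio-LM judge.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical judge scores for 4 candidate clips of one prompt
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
# Advantages sum to ~0; the best-scored clip gets the largest advantage.
```

In the full algorithm these advantages weight the policy-gradient update of the generator, so clips the audio-LM judges as better than their group-mates are reinforced and worse ones are suppressed.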
## Environment Setup
1. Create a new conda environment:
```bash
conda create -n resonate python=3.11 -y
conda activate resonate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
```
2. Install with pip:
```bash
git clone https://github.com/xiquan-li/Resonate.git
cd Resonate
pip install -e .
```
## Quick Start
To generate audio with our pre-trained model, simply run:
```bash
python demo.py --prompt 'your prompt'
```
This will automatically download the pre-trained checkpoints from Hugging Face and generate audio from your prompt.
By default, this will use [Resonate-GRPO](https://huggingface.co/AndreasXi/Resonate/blob/main/Resonate_GRPO.pth).
The output audio will be saved to `Resonate/output/`, and the downloaded checkpoints to `Resonate/weights/`.