metadata
license: mit
Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models
Overview
Resonate is a state-of-the-art (SOTA) text-to-audio generator reinforced with the online GRPO algorithm. It uses the reasoning capabilities of modern Large Audio Language Models as reward models. This repo provides a complete pipeline for audio generation, covering pre-training, SFT, DPO, and GRPO.
Environment Setup
- Create a new conda environment:
conda create -n resonate python=3.11 -y
conda activate resonate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
- Install with pip:
git clone https://github.com/xiquan-li/Resonate.git
cd Resonate
pip install -e .
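After the two setup steps, it can help to confirm that the environment looks as expected before running anything heavier. The following is a minimal sketch, not part of the repo: the helper name `describe_environment` is hypothetical, and it only reports the Python version, whether `torch` is importable, and whether CUDA is visible to PyTorch.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Collect basic facts about the current Python environment.

    Hypothetical helper for illustration; it is not shipped with Resonate.
    """
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    if info["torch_installed"]:
        # Imported lazily so the script still runs when torch is missing.
        import torch

        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

In the environment created above, `python` should report 3.11. If `cuda_available` is False on a GPU machine, the installed PyTorch build may not match the local driver.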
Quick Start
To generate audio with our pre-trained model, simply run:
python demo.py --prompt 'your prompt'
This automatically downloads the pre-trained checkpoints from Hugging Face and generates audio according to your prompt.
By default, the demo uses Resonate-GRPO.
The output audio is saved to Resonate/output/, and the checkpoints are saved to Resonate/weights/.
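To generate audio for several prompts, one option is to call the demo script once per prompt. The sketch below is an illustration under stated assumptions: it relies only on the `--prompt` flag shown above, the helper names `build_command` and `generate_all` are hypothetical, and it assumes it is run from (or pointed at) the Resonate repo root. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import sys


def build_command(prompt: str) -> list:
    """Build the demo command for one prompt, using only the --prompt flag."""
    return [sys.executable, "demo.py", "--prompt", prompt]


def generate_all(prompts, repo_dir=".", dry_run=True) -> int:
    """Run demo.py once per prompt; return the number of prompts handled.

    repo_dir is assumed to be the Resonate repo root (where demo.py lives).
    With dry_run=True the commands are printed instead of executed.
    """
    count = 0
    for prompt in prompts:
        cmd = build_command(prompt)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the demo exits non-zero.
            subprocess.run(cmd, cwd=repo_dir, check=True)
        count += 1
    return count


if __name__ == "__main__":
    example_prompts = ["a dog barking in the rain", "gentle piano in a large hall"]
    generate_all(example_prompts, dry_run=True)
```

Passing `dry_run=False` executes the commands. Each call starts a fresh process, so the model is presumably reloaded for every prompt; this is acceptable for a handful of prompts but slow for large batches.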