metadata
license: mit

Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models

Paper | Code | Hugging Face Model | Hugging Face Space | Webpage

Overview

Resonate is a state-of-the-art (SOTA) text-to-audio generator reinforced with the online GRPO algorithm. It leverages the reasoning capabilities of modern Large Audio Language Models as reward models. This repo provides a complete pipeline for audio generation, covering pre-training, SFT, DPO, and GRPO.
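GRPO (Group Relative Policy Optimization) draws several samples for the same prompt and normalizes each sample's reward against the rest of its group. The snippet below is a minimal sketch of that normalization step only. The function name, the reward values, and the use of the population standard deviation are illustrative assumptions, not this repository's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Hypothetical helper for illustration; not part of the Resonate codebase.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps keeps the division finite when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reward scores for four audio clips generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples scored above the group mean receive a positive advantage and are reinforced; samples below it receive a negative one. If every sample in a group gets the same reward, all advantages are zero and the group contributes no learning signal.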

Environmental Setup

  1. Create a new conda environment:
conda create -n resonate python=3.11 -y
conda activate resonate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
  2. Clone the repository and install with pip:
git clone https://github.com/xiquan-li/Resonate.git

cd Resonate
pip install -e .
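After the setup steps above, a quick sanity check can confirm that the PyTorch packages are importable and whether a CUDA device is visible. This is a minimal sketch; `check_environment` is a hypothetical helper, not a script shipped with the repository.

```python
import importlib.util


def check_environment():
    """Report whether the PyTorch packages are installed and whether CUDA is visible.

    Hypothetical helper for illustration; not part of the Resonate codebase.
    """
    report = {}
    for name in ("torch", "torchvision", "torchaudio"):
        # find_spec locates the package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # a broken install can fail at import time; treat CUDA as unavailable
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'not available'}")
```

If `cuda` is reported as not available on a GPU machine, the installed PyTorch wheel may not match the local driver or CUDA version.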

Quick Start

To generate audio with our pre-trained model, simply run:

python demo.py --prompt 'your prompt'

This automatically downloads the pre-trained checkpoints from Hugging Face and generates audio from your prompt. By default, it uses Resonate-GRPO. The output audio is saved to Resonate/output/, and the checkpoints to Resonate/weights/.
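To generate audio for several prompts, the command above can be wrapped in a short Python script. This sketch relies only on the `--prompt` flag shown above. The helper names and example prompts are hypothetical, and it assumes the script is run from the repository root.

```python
import subprocess
import sys


def build_command(prompt, script="demo.py"):
    """Build the argument list for one demo.py invocation (hypothetical helper)."""
    return [sys.executable, script, "--prompt", prompt]


def generate_all(prompts, repo_dir="."):
    """Run demo.py once per prompt; raises CalledProcessError if a run fails."""
    for prompt in prompts:
        subprocess.run(build_command(prompt), cwd=repo_dir, check=True)


# Example prompts (made up for illustration).
prompts = ["a dog barking in the rain", "footsteps on gravel"]
commands = [build_command(p) for p in prompts]

# Uncomment to actually run generation from the repository root:
# generate_all(prompts)
```

Passing the prompt as a separate list element avoids shell-quoting problems with spaces or apostrophes. Each invocation starts a new process and reloads the model, so this is convenient rather than fast. Whether successive runs overwrite earlier files in the output directory is not stated here, so check the output after the first two runs.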