license: mit

Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models

Overview

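Resonate is a state-of-the-art text-to-audio generator trained with online reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm. It uses modern Large Audio Language Models (LALMs) as reward models, drawing on their audio understanding and reasoning capabilities to provide feedback on generated audio. This repo provides a complete pipeline for audio generation, covering pre-training, supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and GRPO.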
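The central step of GRPO is its group-relative advantage. Several audio clips are sampled for the same text prompt, each clip is scored by the reward model, and each score is normalized against the mean and standard deviation of its group, so no separate value network is needed. The sketch below shows only that normalization step as it appears in the general GRPO formulation. It is not taken from this repository, and the reward scores are made-up placeholders standing in for LALM judgments.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples generated for the same prompt.

    Each advantage is (r_i - mean(r)) / (std(r) + eps), so clips that the reward
    model scores above the group average receive a positive advantage and clips
    below it receive a negative one.
    """
    if len(rewards) < 2:
        # A single sample has no group to compare against, so it gets no signal.
        return [0.0 for _ in rewards]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical LALM reward scores for 4 clips sampled from one text prompt.
    scores = [0.2, 0.8, 0.5, 0.5]
    print([round(a, 3) for a in group_relative_advantages(scores)])
```

The `eps` term keeps the division finite when every clip in a group receives the same score; in that case all advantages are zero and the group contributes no gradient.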

Environment Setup

Create a new conda environment and install PyTorch (the index URL below points to the CUDA 11.8 wheels):

conda create -n resonate python=3.11 -y
conda activate resonate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade

Then clone the repository and install it in editable mode with pip:

git clone https://github.com/xiquan-li/Resonate.git

cd Resonate
pip install -e .

Quick Start

To generate audio with our pre-trained model, simply run:

python demo.py --prompt 'your prompt'

This automatically downloads the pre-trained checkpoints from Hugging Face and generates audio for your prompt. By default, the Resonate-GRPO checkpoint is used. Generated audio is saved to Resonate/output/, and the downloaded checkpoints are stored in Resonate/weights/.
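The Quick Start command generates audio for one prompt per invocation. To generate clips for several prompts, a small wrapper can call the script once per prompt. This is a minimal sketch, assuming it is run against the cloned repository root and that `--prompt` is the only flag needed. The helper names `build_demo_commands`, `run_batch` and `list_outputs` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_commands(prompts: list[str], demo_script: str = "demo.py") -> list[list[str]]:
    """Build one `python demo.py --prompt ...` argument list per prompt.

    Passing arguments as a list (not a shell string) avoids quoting problems
    with prompts that contain spaces or apostrophes.
    """
    return [[sys.executable, demo_script, "--prompt", p] for p in prompts]


def run_batch(prompts: list[str], repo_dir: str = ".", dry_run: bool = False) -> list[list[str]]:
    """Run demo.py once per prompt with the repository root as working directory."""
    commands = build_demo_commands(prompts)
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a generation run fails.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


def list_outputs(repo_dir: str = ".") -> list[Path]:
    """List files in the output/ folder (the default location stated above)."""
    out_dir = Path(repo_dir) / "output"
    if not out_dir.is_dir():
        return []
    return sorted(p for p in out_dir.iterdir() if p.is_file())
```

For example, `run_batch(["a dog barking in the distance", "rain on a tin roof"], repo_dir="Resonate")` generates one clip per prompt, after which `list_outputs("Resonate")` lists the files in the output folder. Because each prompt starts a new process, the checkpoints are reloaded every time; for large batches, loading the model once inside a single Python process would be faster, but that depends on the repository's internal API, which is not covered here.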