---
title: Stable Audio Open
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---

# 🎵 Stable Audio Open

An open-source web interface for generating high-quality audio from text prompts using advanced AI models. Create music, sound effects, ambient sounds, and more with simple text descriptions.

## Features

- 🎼 **Text-to-Audio Generation**: Convert text prompts into audio
- 🎚️ **Customizable Duration**: Generate audio from 1 to 30 seconds
- 🎲 **Reproducible Results**: Use seeds for consistent generation
- 🎧 **Real-time Playback**: Listen to generated audio instantly
- 📝 **Example Prompts**: Pre-built examples to get you started

## Setup

### Hugging Face Authentication (Recommended)

For the best experience and to avoid rate limits, set up Hugging Face authentication:

1. Create a Hugging Face account at [huggingface.co](https://huggingface.co)
2. Go to **Settings > Access Tokens** and create a new token
3. Copy `env-example.txt` to `.env` and add your token:

   ```
   HF_TOKEN=hf_your_token_here
   ```

4. Install the dependencies: `pip install -r requirements.txt`

**Note:** Without authentication, you may experience rate limits or reduced access to some models.
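The `.env` step above can be sketched as a small loader that exports `HF_TOKEN` into the process environment. This is a minimal standard-library sketch; the app itself may use a package such as `python-dotenv` instead:

```python
import os

def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Blank lines, comments, and lines without '=' are skipped; a missing
    file simply yields an empty dict.
    """
    env = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    except FileNotFoundError:
        pass
    return env

# Make HF_TOKEN (and any other keys) visible to libraries that read
# credentials from the environment, without overwriting existing values.
for key, value in load_env_file().items():
    os.environ.setdefault(key, value)
```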

## Usage

1. Enter a text description of the audio you want to generate
2. Adjust the duration slider (1-30 seconds)
3. Optionally set a random seed for reproducible results
4. Click **Generate Audio** to create your sound
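Steps 2 and 3 interact: for a given prompt and duration, a fixed seed should reproduce the same audio. The contract can be illustrated with a hypothetical stand-in for the generation step (seeded NumPy noise in place of real model output; the function name, sample rate, and internals are illustrative assumptions, not the app's actual code):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed output sample rate for this sketch

def generate_placeholder(prompt, duration=5.0, seed=None):
    """Hypothetical stand-in for the app's generate step.

    With a fixed seed the returned waveform is bit-for-bit
    reproducible; with seed=None each call differs.
    """
    rng = np.random.default_rng(seed)
    n = int(duration * SAMPLE_RATE)
    # Seeded noise as a placeholder for the model's waveform output.
    return rng.standard_normal(n).astype(np.float32)

a = generate_placeholder("rain on a tin roof", duration=2.0, seed=42)
b = generate_placeholder("rain on a tin roof", duration=2.0, seed=42)
assert np.array_equal(a, b)  # same seed, identical audio
```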

## Examples

- "A gentle piano melody playing in a cozy room"
- "Upbeat electronic dance music with synthesizers"
- "Rain falling on a tin roof with distant thunder"
- "Classical violin concerto with orchestra accompaniment"

## Technical Details

This application uses:

- **AudioLDM2** (`cvssp/audioldm2`): AI model for text-to-audio generation
- **Gradio** for the web interface
- **PyTorch** & **Diffusers** for model inference
- **NumPy** for audio processing and fallback synthesis
- Automatic fallback to simple synthesis if the model is unavailable
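The fallback path can be as simple as NumPy tone synthesis. The sketch below shows the idea; the frequency, envelope, and sample rate are assumptions for illustration, not the app's actual fallback code:

```python
import numpy as np

def fallback_tone(duration=2.0, freq=440.0, sample_rate=16000):
    """Simple sine-tone synthesis as a stand-in for model output.

    Used when the real model cannot be loaded; returns a float32
    waveform in [-1, 1] with a short fade in/out to avoid clicks.
    """
    t = np.linspace(0.0, duration, int(duration * sample_rate), endpoint=False)
    # Linear fade over the first/last 100 ms, clamped to 1.0 in between.
    envelope = np.minimum(1.0, np.minimum(t, duration - t) * 10.0)
    return (np.sin(2 * np.pi * freq * t) * envelope).astype(np.float32)
```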

## Model Information

- **Model:** `cvssp/audioldm2`
- **First run:** the model is downloaded automatically (~1-2 GB)
- **Device:** automatically uses a GPU if available, falls back to CPU
- **Caching:** the model is cached in memory for faster subsequent generations
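The "GPU if available, else CPU" rule can be probed without importing PyTorch unconditionally. This is a sketch of that device-selection step (the app's actual logic may differ):

```python
import importlib.util

def pick_device():
    """Return "cuda" when PyTorch is installed and reports a GPU,
    otherwise fall back to "cpu"."""
    # Only import torch if it is actually installed.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"
```

A pipeline would then be moved to the chosen device once and kept in memory, which is what the caching bullet above refers to.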

## Contributing

This is an open-source project and contributions are welcome! Feel free to:

- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation

## License

This project is open source and available under the MIT License.


Built with ❤️ using Hugging Face Spaces and Gradio