metadata
title: Stable Audio Open
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
🎵 Stable Audio Open
An open-source web interface for generating audio from text prompts with the AudioLDM2 text-to-audio model. Create music, sound effects, and ambient sounds from short text descriptions.
Features
- 🎼 Text-to-Audio Generation: Convert text prompts into audio
- 🎛️ Customizable Duration: Generate audio from 1-30 seconds
- 🎲 Reproducible Results: Use seeds for consistent generation
- 🎧 Real-time Playback: Listen to generated audio instantly
- Example Prompts: Pre-built examples to get you started
Setup
Hugging Face Authentication (Recommended)
For the best experience and to avoid rate limits, set up Hugging Face authentication:
- Create a Hugging Face account at huggingface.co
- Go to Settings > Access Tokens and create a new token
- Copy env-example.txt to .env and add your token: HF_TOKEN=hf_your_token_here
- Install dependencies: pip install -r requirements.txt
Note: Without authentication, you may experience rate limits or reduced access to some models.
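As a rough illustration of how the token in .env could reach the application, the sketch below parses KEY=VALUE lines with the standard library. The helper names load_env_file and get_hf_token are hypothetical, and the actual app.py may use a different mechanism (for example, the python-dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: read KEY=VALUE pairs from a .env-style file."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # no file: the caller falls back to unauthenticated access
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer an already-exported HF_TOKEN, then the .env file; None if unset."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

If get_hf_token() returns None, the app would simply run unauthenticated, which matches the rate-limit caveat in the note above.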
Usage
- Enter a text description of the audio you want to generate
- Adjust the duration slider (1-30 seconds)
- Optionally set a random seed for reproducible results
- Click "Generate Audio" to create your sound
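The steps above map naturally onto a small Gradio form. The following is a minimal sketch, not the project's actual app.py: normalize_request and build_ui are hypothetical names, the default duration of 10 seconds is an assumption, and generate_fn stands in for whatever function produces the audio.

```python
import random

MIN_SECONDS, MAX_SECONDS = 1, 30  # slider range stated in the usage steps


def normalize_request(prompt, duration, seed=None):
    """Validate the prompt, clamp the duration, and fill in a seed if none is given."""
    prompt = (prompt or "").strip()
    if not prompt:
        raise ValueError("Please enter a text description.")
    duration = max(MIN_SECONDS, min(MAX_SECONDS, float(duration)))
    if seed is None:
        seed = random.randrange(2**32)  # random seed when the field is left empty
    return prompt, duration, int(seed)


def build_ui(generate_fn):
    """Build the form; generate_fn(prompt, duration, seed) is supplied by the caller."""
    import gradio as gr  # imported lazily so normalize_request has no dependencies

    return gr.Interface(
        fn=generate_fn,
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Slider(minimum=MIN_SECONDS, maximum=MAX_SECONDS, value=10, step=1,
                      label="Duration (seconds)"),
            gr.Number(label="Seed (optional)", precision=0),
        ],
        outputs=gr.Audio(label="Generated audio"),
        title="Stable Audio Open",
    )
```

Calling build_ui(my_generate).launch() would start the interface locally. Passing the same prompt, duration, and seed again should then reproduce the same request.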
Examples
- "A gentle piano melody playing in a cozy room"
- "Upbeat electronic dance music with synthesizers"
- "Rain falling on a tin roof with distant thunder"
- "Classical violin concerto with orchestra accompaniment"
Technical Details
This application uses:
- AudioLDM2 (cvssp/audioldm2) - Advanced AI model for text-to-audio generation
- Gradio for the web interface
- PyTorch & Diffusers for model inference
- NumPy for audio processing and fallback synthesis
- Automatic fallback to simple synthesis if the model is unavailable
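To make the fallback idea concrete, here is a minimal sketch of a seeded sine-tone generator wrapped around a model call. It is an illustration under assumptions, not the project's implementation: it uses only the standard library rather than NumPy so it runs without extra packages, the names fallback_tone and generate_with_fallback are hypothetical, and the 16 kHz sample rate is chosen to match AudioLDM2's output rate.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed; matches AudioLDM2's 16 kHz output


def fallback_tone(duration_s, seed=0, sample_rate=SAMPLE_RATE):
    """Return a list of float samples in [-0.3, 0.3]: one sine tone chosen by the seed."""
    rng = random.Random(seed)
    freq = rng.choice([220.0, 261.63, 329.63, 440.0])  # pitch depends only on the seed
    n = int(duration_s * sample_rate)
    fade = max(1, int(0.01 * sample_rate))  # 10 ms fade in/out to avoid clicks
    samples = []
    for i in range(n):
        envelope = min(1.0, i / fade, (n - 1 - i) / fade)
        samples.append(0.3 * envelope * math.sin(2 * math.pi * freq * i / sample_rate))
    return samples


def generate_with_fallback(model_fn, prompt, duration_s, seed):
    """Try the model first; on any failure, return the simple synthesized tone."""
    try:
        return model_fn(prompt, duration_s, seed)
    except Exception:
        return fallback_tone(duration_s, seed)
```

Because the tone depends only on the seed and duration, the fallback path stays reproducible in the same way the model path is meant to be.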
Model Information
- Model: cvssp/audioldm2
- First Run: Model will be automatically downloaded (~1-2 GB)
- Device: Automatically uses GPU if available, falls back to CPU
- Caching: Model is cached in memory for faster subsequent generations
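The device selection and in-memory caching described above could look roughly like this. It is a sketch under assumptions rather than the code in app.py: pick_device, get_pipeline, and generate are hypothetical names, half precision on GPU is an assumed choice, and torch and diffusers are imported lazily so the module can be imported without them.

```python
from functools import lru_cache

MODEL_ID = "cvssp/audioldm2"
SAMPLE_RATE = 16000  # AudioLDM2 produces 16 kHz audio


def pick_device(cuda_available):
    """Use the GPU when one is available, otherwise fall back to the CPU."""
    return "cuda" if cuda_available else "cpu"


@lru_cache(maxsize=1)
def get_pipeline(model_id=MODEL_ID):
    """Load the pipeline once; later calls reuse the instance cached in memory."""
    import torch
    from diffusers import AudioLDM2Pipeline

    device = pick_device(torch.cuda.is_available())
    dtype = torch.float16 if device == "cuda" else torch.float32  # assumed choice
    pipe = AudioLDM2Pipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)


def generate(prompt, duration_s, seed):
    """Return (sample_rate, waveform) for the prompt, seeded for reproducibility."""
    import torch

    pipe = get_pipeline()  # the first call downloads and loads the model
    generator = torch.Generator(device=pipe.device).manual_seed(seed)
    result = pipe(prompt, audio_length_in_s=duration_s, generator=generator)
    return SAMPLE_RATE, result.audios[0]
```

The first call to generate pays the download and load cost. After that, get_pipeline returns the cached object, so only inference time remains.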
Contributing
This is an open-source project. Contributions are welcome! Feel free to:
- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation
License
This project is open source and available under the MIT License.
Built with ❤️ using Hugging Face Spaces and Gradio