| title: Stable Audio Open | |
| emoji: 🎵 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 6.2.0 | |
| app_file: app.py | |
| pinned: false | |
| # 🎵 Stable Audio Open | |
| An open-source web interface for generating audio from text prompts with the AudioLDM2 model. Describe the sound you want in plain text to create music, sound effects, ambient sounds, and more. | |
| ## Features | |
| - 🎼 **Text-to-Audio Generation**: Convert text prompts into audio | |
| - **Customizable Duration**: Generate 1 to 30 seconds of audio | |
| - 🎲 **Reproducible Results**: Use seeds for consistent generation | |
| - 🎧 **Real-time Playback**: Listen to generated audio instantly | |
| - **Example Prompts**: Pre-built examples to get you started | |
| ## Setup | |
| ### Hugging Face Authentication (Recommended) | |
| For the best experience and to avoid rate limits, set up Hugging Face authentication: | |
| 1. Create a Hugging Face account at [huggingface.co](https://huggingface.co/join) | |
| 2. Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a new token | |
| 3. Copy `env-example.txt` to `.env` and add your token: | |
| ``` | |
| HF_TOKEN=hf_your_token_here | |
| ``` | |
| 4. Install dependencies: `pip install -r requirements.txt` | |
| **Note**: Without authentication, you may experience rate limits or reduced access to some models. | |
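
The `.env` step above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not the app's actual loading code: `load_env_file` and `get_hf_token` are hypothetical helper names, and a real project might use a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # No .env file: the caller falls back to unauthenticated access.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path=".env"):
    """Return HF_TOKEN from the environment, else from the .env file, else None."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

The returned token could then be handed to `huggingface_hub.login(token=...)` or to a `from_pretrained(..., token=...)` call. If it is `None`, the app would simply run unauthenticated.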
| ## Usage | |
| 1. Enter a text description of the audio you want to generate | |
| 2. Adjust the duration slider (1-30 seconds) | |
| 3. Optionally set a random seed for reproducible results | |
| 4. Click "Generate Audio" to create your sound | |
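
The inputs from the steps above might be normalized before generation roughly as follows. This is a sketch under assumptions: `normalize_inputs` is a hypothetical helper, the 1 to 30 second bounds come from the slider range described above, and drawing a random seed when none is given is one possible convention rather than confirmed app behavior.

```python
import random

MIN_DURATION_S = 1.0
MAX_DURATION_S = 30.0


def normalize_inputs(prompt, duration_s, seed=None):
    """Validate the prompt, clamp the duration, and settle on a concrete seed."""
    text = prompt.strip()
    if not text:
        raise ValueError("Please enter a text description.")
    # Clamp to the slider range so out-of-range values cannot reach the model.
    duration = min(max(float(duration_s), MIN_DURATION_S), MAX_DURATION_S)
    # Assumption: a missing seed means "pick one at random" (32-bit range).
    if seed is None:
        seed = random.randrange(2**32)
    return text, duration, int(seed)
```

Returning the seed that was actually used lets the interface display it, so a result you like can be regenerated later with the same prompt, duration, and seed.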
| ## Examples | |
| - "A gentle piano melody playing in a cozy room" | |
| - "Upbeat electronic dance music with synthesizers" | |
| - "Rain falling on a tin roof with distant thunder" | |
| - "Classical violin concerto with orchestra accompaniment" | |
| ## Technical Details | |
| This application uses: | |
| - **AudioLDM2** (`cvssp/audioldm2`) - Latent diffusion model for text-to-audio generation | |
| - **Gradio** for the web interface | |
| - **PyTorch & Diffusers** for model inference | |
| - **NumPy** for audio processing and fallback synthesis | |
| - **Automatic fallback** to simple synthesis if the model is unavailable | |
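
The fallback behavior listed above could look roughly like the sketch below. It is illustrative only: `synthesize_tone` and `generate_with_fallback` are hypothetical names, the 16 kHz sample rate and 440 Hz tone are assumed values, and the tone is built with the standard `math` module to keep the example dependency-free, whereas the list above names NumPy for this step.

```python
import math

SAMPLE_RATE = 16_000  # Assumed sample rate for this sketch.


def synthesize_tone(duration_s, freq_hz=440.0, sample_rate=SAMPLE_RATE):
    """Return a sine tone as floats in [-1, 1] with a short fade in and out."""
    n = int(duration_s * sample_rate)
    fade = max(1, min(n // 2, sample_rate // 100))  # Up to 10 ms per edge.
    samples = []
    for i in range(n):
        # A linear envelope avoids clicks at the start and end of the clip.
        envelope = min(1.0, i / fade, (n - 1 - i) / fade)
        phase = 2 * math.pi * freq_hz * i / sample_rate
        samples.append(0.5 * envelope * math.sin(phase))
    return samples


def generate_with_fallback(prompt, duration_s, model_fn=None):
    """Try the model first; fall back to simple synthesis if it is missing or fails."""
    if model_fn is not None:
        try:
            return model_fn(prompt, duration_s), "model"
        except Exception:
            pass  # Model unavailable at runtime: use the fallback below.
    return synthesize_tone(duration_s), "fallback"
```

Returning a label alongside the audio is one way to let the interface tell the user that they are hearing a placeholder tone rather than model output.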
| ### Model Information | |
| - **Model**: `cvssp/audioldm2` | |
| - **First Run**: The model is downloaded automatically on first use (~1-2 GB) | |
| - **Device**: Uses a GPU automatically if one is available, otherwise falls back to CPU | |
| - **Caching**: The loaded model is kept in memory, so later generations start faster | |
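
The device selection and in-memory caching described above might be implemented along these lines. This is a hedged sketch, not the contents of `app.py`: `pick_device`, `get_pipeline`, and `generate` are hypothetical names, the half-precision choice on GPU and the 16 kHz output rate are assumptions based on common AudioLDM2 usage, and the heavy imports sit inside the functions so the module loads even when PyTorch is absent.

```python
from functools import lru_cache

MODEL_ID = "cvssp/audioldm2"
SAMPLE_RATE = 16_000  # Assumed output rate for AudioLDM2.


def pick_device(cuda_available):
    """Prefer the GPU when one is available, otherwise use the CPU."""
    return "cuda" if cuda_available else "cpu"


@lru_cache(maxsize=1)
def get_pipeline():
    """Load the pipeline once; later calls reuse the cached instance."""
    import torch
    from diffusers import AudioLDM2Pipeline

    device = pick_device(torch.cuda.is_available())
    # Assumption: half precision on GPU to save memory, full precision on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AudioLDM2Pipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    return pipe.to(device)


def generate(prompt, duration_s=10.0, seed=None, steps=200):
    """Return (sample_rate, waveform) for a prompt, seeded when a seed is given."""
    import torch

    pipe = get_pipeline()
    generator = None
    if seed is not None:
        generator = torch.Generator(pipe.device.type).manual_seed(int(seed))
    result = pipe(
        prompt,
        audio_length_in_s=float(duration_s),
        num_inference_steps=steps,
        generator=generator,
    )
    return SAMPLE_RATE, result.audios[0]
```

The first call to `get_pipeline()` triggers the download and load. Because of `lru_cache`, every later call in the same process returns the same object.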
| ## Contributing | |
| This is an open-source project. Contributions are welcome! Feel free to: | |
| - Report bugs and issues | |
| - Suggest new features | |
| - Submit pull requests | |
| - Improve documentation | |
| ## License | |
| This project is open source and available under the MIT License. | |
| --- | |
| *Built with ❤️ using Hugging Face Spaces and Gradio* | |