---
title: Stable Audio Open
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---

# 🎵 Stable Audio Open

An open-source web interface for generating high-quality audio from text prompts using advanced AI models. Create music, sound effects, ambient sounds, and more with simple text descriptions.

## Features

- 🎼 **Text-to-Audio Generation**: Convert text prompts into audio
- 🎚️ **Customizable Duration**: Generate audio from 1-30 seconds
- 🎲 **Reproducible Results**: Use seeds for consistent generation
- 🎧 **Real-time Playback**: Listen to generated audio instantly
- 📝 **Example Prompts**: Pre-built examples to get you started

## Setup

### Hugging Face Authentication (Recommended)

For the best experience and to avoid rate limits, set up Hugging Face authentication:

1. Create a Hugging Face account at [huggingface.co](https://huggingface.co/join)
2. Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a new token
3. Copy `env-example.txt` to `.env` and add your token:

   ```
   HF_TOKEN=hf_your_token_here
   ```

4. Install dependencies: `pip install -r requirements.txt`

**Note**: Without authentication, you may experience rate limits or reduced access to some models.

## Usage

1. Enter a text description of the audio you want to generate
2. Adjust the duration slider (1-30 seconds)
3. Optionally set a random seed for reproducible results
4. Click "Generate Audio" to create your sound

## Examples

- "A gentle piano melody playing in a cozy room"
- "Upbeat electronic dance music with synthesizers"
- "Rain falling on a tin roof with distant thunder"
- "Classical violin concerto with orchestra accompaniment"

## Technical Details

This application uses:

- **AudioLDM2** (`cvssp/audioldm2`) - Advanced AI model for text-to-audio generation
- **Gradio** for the web interface
- **PyTorch & Diffusers** for model inference
- **NumPy** for audio processing and fallback synthesis
- **Automatic fallback** to simple synthesis if the model is unavailable

### Model Information

- **Model**: `cvssp/audioldm2`
- **First Run**: The model is downloaded automatically (~1-2 GB)
- **Device**: Uses a GPU automatically if available, otherwise falls back to CPU
- **Caching**: The model is cached in memory for faster subsequent generations

## Contributing

This is an open-source project. Contributions are welcome! Feel free to:

- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation

## License

This project is open source and available under the MIT License.

---

*Built with ❤️ using Hugging Face Spaces and Gradio*