---
title: Stable Audio Open
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---
# 🎵 Stable Audio Open
An open-source web interface for generating high-quality audio from text prompts using advanced AI models. Create music, sound effects, ambient sounds, and more with simple text descriptions.
## Features
- 🎼 **Text-to-Audio Generation**: Convert text prompts into audio
- 🎚️ **Customizable Duration**: Generate clips from 1 to 30 seconds long
- 🎲 **Reproducible Results**: Use seeds for consistent generation
- 🎧 **Real-time Playback**: Listen to generated audio instantly
- πŸ“ **Example Prompts**: Pre-built examples to get you started
## Setup
### Hugging Face Authentication (Recommended)
For the best experience and to avoid rate limits, set up Hugging Face authentication:
1. Create a Hugging Face account at [huggingface.co](https://huggingface.co/join)
2. Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a new token
3. Copy `env-example.txt` to `.env` and add your token:
```
HF_TOKEN=hf_your_token_here
```
4. Install dependencies: `pip install -r requirements.txt`
**Note**: Without authentication, you may experience rate limits or reduced access to some models.
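The `.env` file is plain `KEY=VALUE` lines. A minimal, stdlib-only sketch of how such a file could be loaded into the environment (the app itself may use a library like python-dotenv instead; `load_env` is a hypothetical helper name):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Load KEY=VALUE lines into os.environ, skipping comments and blanks."""
    env_file = Path(path)
    if not env_file.exists():
        return  # nothing to load; rely on existing environment variables
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: a variable already set in the shell wins over .env
        os.environ.setdefault(key.strip(), value.strip())

load_env()  # afterwards, os.environ.get("HF_TOKEN") holds your token if set
```

Existing shell variables take precedence over the file, so exporting `HF_TOKEN` directly also works.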
## Usage
1. Enter a text description of the audio you want to generate
2. Adjust the duration slider (1-30 seconds)
3. Optionally set a random seed for reproducible results
4. Click "Generate Audio" to create your sound
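The seed in step 3 fixes the random state so the same inputs yield the same audio. A minimal sketch of the idea using NumPy placeholder audio (`seeded_noise` is a hypothetical helper, not the app's actual generator):

```python
import numpy as np

def seeded_noise(duration_s: float, seed=None, sr: int = 16000) -> np.ndarray:
    """Deterministic placeholder audio: the same seed yields the same samples."""
    rng = np.random.default_rng(seed)
    n_samples = int(sr * duration_s)
    return rng.uniform(-1.0, 1.0, n_samples).astype(np.float32)

a = seeded_noise(2.0, seed=42)
b = seeded_noise(2.0, seed=42)
assert np.array_equal(a, b)  # identical output for the same seed
```

With no seed given, each run produces a fresh result; fixing it makes experiments repeatable.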
## Examples
- "A gentle piano melody playing in a cozy room"
- "Upbeat electronic dance music with synthesizers"
- "Rain falling on a tin roof with distant thunder"
- "Classical violin concerto with orchestra accompaniment"
## Technical Details
This application uses:
- **AudioLDM2** (`cvssp/audioldm2`) - A latent diffusion model for text-to-audio generation
- **Gradio** for the web interface
- **PyTorch & Diffusers** for model inference
- **NumPy** for audio processing and fallback synthesis
- **Automatic fallback** to simple synthesis if model is unavailable
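The fallback behavior described above can be sketched as follows, assuming the Diffusers `AudioLDM2Pipeline` API; `generate` and `fallback_tone` are hypothetical names, and the actual app's logic may differ:

```python
import numpy as np

SAMPLE_RATE = 16000  # AudioLDM2 outputs 16 kHz audio

def fallback_tone(duration_s: float, freq: float = 440.0) -> np.ndarray:
    """Simple sine-wave synthesis used when the model is unavailable."""
    t = np.linspace(0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq * t)

def generate(prompt: str, duration_s: float = 5.0) -> np.ndarray:
    """Try AudioLDM2; on any failure, fall back to plain synthesis."""
    try:
        import torch
        from diffusers import AudioLDM2Pipeline
        pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2")
        pipe.to("cuda" if torch.cuda.is_available() else "cpu")
        return pipe(prompt, audio_length_in_s=duration_s).audios[0]
    except Exception:
        # Missing dependency, download failure, or out-of-memory: degrade
        # gracefully rather than crash the interface.
        return fallback_tone(duration_s)
```

Either path returns a NumPy array, so the Gradio audio component can play the result regardless of which branch ran.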
### Model Information
- **Model**: `cvssp/audioldm2`
- **First Run**: Model will be automatically downloaded (~1-2 GB)
- **Device**: Automatically uses GPU if available, falls back to CPU
- **Caching**: Model is cached in memory for faster subsequent generations
## Contributing
This is an open-source project. Contributions are welcome! Feel free to:
- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation
## License
This project is open source and available under the MIT License.
---
*Built with ❤️ using Hugging Face Spaces and Gradio*