---
title: Stable Audio Open
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
---

# 🎵 Stable Audio Open

An open-source web interface for generating audio from text prompts with the AudioLDM2 text-to-audio model. Create music, sound effects, ambient sounds, and more from simple text descriptions.

## Features

- 🎼 **Text-to-Audio Generation**: Convert text prompts into audio
- 🎚️ **Customizable Duration**: Generate audio from 1-30 seconds
- 🎲 **Reproducible Results**: Use seeds for consistent generation
- 🎧 **Real-time Playback**: Listen to generated audio instantly
- 📝 **Example Prompts**: Pre-built examples to get you started

## Setup

### Hugging Face Authentication (Recommended)

For the best experience and to avoid rate limits, set up Hugging Face authentication:

1. Create a Hugging Face account at [huggingface.co](https://huggingface.co/join)
2. Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a new token
3. Copy `env-example.txt` to `.env` and add your token:
   ```
   HF_TOKEN=hf_your_token_here
   ```
4. Install dependencies: `pip install -r requirements.txt`

**Note**: Without authentication, you may experience rate limits or reduced access to some models.
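
As a rough illustration of step 3, the sketch below reads `HF_TOKEN` from a `.env` file using only the standard library. The helper names `load_env_file` and `get_hf_token` are hypothetical and not taken from `app.py`; the app itself may load the file differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE pairs from a dotenv-style file into os.environ.

    Hypothetical helper for illustration. Variables that are already set
    in the environment are left untouched. Returns the pairs found.
    """
    pairs = {}
    env_path = Path(path)
    if not env_path.is_file():
        return pairs
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        pairs[key] = value
        os.environ.setdefault(key, value)
    return pairs


def get_hf_token():
    """Return the token if one is configured, or None to run unauthenticated."""
    load_env_file()
    return os.environ.get("HF_TOKEN") or None
```

Calling `get_hf_token()` once at startup is enough in this sketch; when it returns `None`, the app would simply continue without authentication, which matches the note above.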

## Usage

1. Enter a text description of the audio you want to generate
2. Adjust the duration slider (1-30 seconds)
3. Optionally set a random seed for reproducible results
4. Click "Generate Audio" to create your sound
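
The inputs from the steps above could be normalized before generation roughly as follows. This is a minimal sketch, not the app's actual code: the function name `normalize_request` and the convention that a missing or negative seed means "pick one at random" are assumptions made for illustration.

```python
import random

MIN_SECONDS = 1   # lower bound of the duration slider
MAX_SECONDS = 30  # upper bound of the duration slider


def normalize_request(prompt, duration, seed=None):
    """Validate the prompt, clamp the duration, and resolve the seed.

    Assumption: seed=None or a negative seed requests a random seed.
    """
    prompt = (prompt or "").strip()
    if not prompt:
        raise ValueError("Please enter a text description.")
    # Keep the duration inside the 1-30 second range.
    duration = float(min(max(duration, MIN_SECONDS), MAX_SECONDS))
    if seed is None or seed < 0:
        seed = random.randrange(2**32)
    return prompt, duration, int(seed)
```

Returning the resolved seed lets the interface display it, so a result produced with a random seed can still be reproduced later.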

## Examples

- "A gentle piano melody playing in a cozy room"
- "Upbeat electronic dance music with synthesizers"
- "Rain falling on a tin roof with distant thunder"
- "Classical violin concerto with orchestra accompaniment"

## Technical Details

This application uses:
- **AudioLDM2** (`cvssp/audioldm2`) - Latent diffusion model for text-to-audio generation
- **Gradio** for the web interface
- **PyTorch & Diffusers** for model inference
- **NumPy** for audio processing and fallback synthesis
- **Automatic fallback** to simple synthesis if model is unavailable
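
The README does not describe what the fallback synthesis produces, so the sketch below is only one plausible shape for it: a seeded sine tone with a little noise, built with NumPy. The function name `synthesize_fallback`, the frequency range, and the 16 kHz sample rate (chosen to match AudioLDM2's output rate) are assumptions.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed, to match AudioLDM2's 16 kHz output


def synthesize_fallback(duration_s, seed=0, sample_rate=SAMPLE_RATE):
    """Return (sample_rate, float32 mono waveform) without using any model.

    Hypothetical stand-in for the app's fallback: the same seed always
    yields the same waveform.
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate
    freq = rng.uniform(220.0, 880.0)           # seed-dependent pitch in Hz
    tone = np.sin(2.0 * np.pi * freq * t)
    noise = 0.05 * rng.standard_normal(n)
    # Short linear fade-in and fade-out to avoid clicks at the edges.
    fade = min(n // 2, int(0.05 * sample_rate))
    envelope = np.ones(n)
    if fade > 0:
        ramp = np.linspace(0.0, 1.0, fade)
        envelope[:fade] = ramp
        envelope[-fade:] = ramp[::-1]
    audio = 0.5 * (tone + noise) * envelope
    return sample_rate, np.clip(audio, -1.0, 1.0).astype(np.float32)
```

The `(sample_rate, array)` return shape is the tuple form that Gradio's `Audio` component accepts, so a fallback like this could be returned from the same handler as the model output.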

### Model Information
- **Model**: `cvssp/audioldm2`
- **First Run**: Model will be automatically downloaded (~1-2 GB)
- **Device**: Automatically uses GPU if available, falls back to CPU
- **Caching**: Model is cached in memory for faster subsequent generations
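
The device selection and in-memory caching described above might look roughly like this with Diffusers. It is a sketch under assumptions rather than the contents of `app.py`: `pick_device`, `get_pipeline`, and `generate` are hypothetical names, and the step count and dtype choices are illustrative defaults.

```python
from functools import lru_cache

MODEL_ID = "cvssp/audioldm2"
SAMPLE_RATE = 16_000  # AudioLDM2 produces 16 kHz audio


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@lru_cache(maxsize=1)
def get_pipeline():
    """Load the pipeline once; later calls reuse the cached object."""
    import torch
    from diffusers import AudioLDM2Pipeline

    device = pick_device()
    # Half precision on GPU saves memory; CPU inference stays in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AudioLDM2Pipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    return pipe.to(device)


def generate(prompt, duration_s=10.0, seed=0, steps=200):
    """Generate one waveform; the same seed gives the same noise start."""
    import torch

    pipe = get_pipeline()
    generator = torch.Generator(device=pick_device()).manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,
        audio_length_in_s=duration_s,
        generator=generator,
    )
    return SAMPLE_RATE, result.audios[0]
```

The heavy imports sit inside the functions so that the module still loads when PyTorch or Diffusers is missing, which is where an automatic fallback could take over.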

## Contributing

This is an open-source project. Contributions are welcome! Feel free to:
- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation

## License

This project is open source and available under the MIT License.

---

*Built with ❤️ using Hugging Face Spaces and Gradio*