---
title: Stable Audio Open Small - 4 Variations
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: other
---

# Stable Audio Open Small - 4 Variations

Generate up to 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.

## Model Information

**Model**: [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small)

- **Type**: Latent diffusion model (DiT) with autoencoder
- **Sample Rate**: 44.1 kHz
- **Format**: Stereo audio
- **Max Duration**: 11 seconds
- **License**: Stability AI Community License

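As a quick sanity check on the figures above, the sample rate and maximum duration together imply the size of a full-length clip. The sketch below is plain arithmetic. The helper name `clip_shape` is hypothetical, and the tensor the model actually returns may be padded to a different length.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, from the list above
NUM_CHANNELS = 2         # stereo
MAX_SECONDS = 11         # maximum duration


def clip_shape(seconds: float) -> tuple[int, int]:
    """Return (channels, samples_per_channel) for a clip of the given length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}], got {seconds}")
    return NUM_CHANNELS, round(seconds * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    channels, samples = clip_shape(MAX_SECONDS)
    print(channels, samples)  # 2 485100
```

A full 11-second stereo clip therefore holds 485,100 samples per channel.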
## Features

- **4 Variations**: Generate 4 different audio variations from a single prompt
- **Text-to-Audio**: Simple text prompt interface
- **Variable Duration**: Control audio length (1-11 seconds)
- **Fast Generation**: Uses optimized pingpong sampler with 8 steps

## Setup

This model requires accepting the license agreement on Hugging Face. To use this Space:

1. **Accept the model license**: Visit [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small) and accept the license agreement
2. **Create an access token**: Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a token with "read" permissions
3. **Add token to Space**: In your Space settings, go to "Variables and secrets" and add a new secret:
   - Name: `HF_TOKEN`
   - Value: Your access token
   - Make sure it's marked as private

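Space secrets are exposed to the running app as environment variables, so the app can pick up `HF_TOKEN` at startup. The following is a minimal sketch of that lookup, not the code in `app.py`. The helper name `read_hf_token` is hypothetical, and the commented `huggingface_hub.login` call shows one assumed way to use the token.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN secret, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it under 'Variables and secrets' "
            "in the Space settings."
        )
    return token


# Assumed usage in the app (requires the huggingface_hub package):
#   from huggingface_hub import login
#   login(token=read_hf_token())
```

Failing early with an explicit message makes a missing secret easier to diagnose than a download error for the gated model later on.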
## Usage

1. Enter a text prompt describing the audio you want to generate
2. Adjust the duration slider (1-11 seconds)
3. Click "Generate" to create 4 variations
4. Listen to and download your favorite variations

## Example Prompts

- "128 BPM tech house drum loop"
- "Ocean waves crashing on beach"
- "Jazz piano melody"
- "Rainforest ambience with bird calls"
- "Electronic synth pad"

## Model Limitations

- The model cannot generate realistic vocals
- Trained on English descriptions, so it may not perform as well in other languages
- Better at generating sound effects and field recordings than music
- Performance varies across different music styles and cultures
- Prompt engineering may be required for best results

## Technical Details

- **Steps**: 8 (optimized for speed)
- **CFG Scale**: 1.0
- **Sampler**: pingpong
- **Batch Size**: 4 (for generating variations)

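The settings above can be collected into one set of generation arguments. The sketch below assumes the keyword names (`steps`, `cfg_scale`, `sampler_type`, `batch_size`, `conditioning`) accepted by `generate_diffusion_cond` in `stable-audio-tools`, and assumes that the conditioning list needs one entry per batch item. The helper `build_generation_kwargs` is hypothetical and is not taken from `app.py`.

```python
NUM_VARIATIONS = 4  # batch size: one batch item per variation
MAX_SECONDS = 11    # upper end of the duration slider


def build_generation_kwargs(prompt: str, seconds: float) -> dict:
    """Bundle the documented sampler settings with per-variation conditioning."""
    # Clamp the requested duration to the 1-11 second slider range.
    seconds = max(1.0, min(float(seconds), float(MAX_SECONDS)))
    # Every batch item gets the same prompt and duration (assumed layout).
    conditioning = [
        {"prompt": prompt, "seconds_total": seconds}
        for _ in range(NUM_VARIATIONS)
    ]
    return {
        "steps": 8,
        "cfg_scale": 1.0,
        "sampler_type": "pingpong",
        "batch_size": NUM_VARIATIONS,
        "conditioning": conditioning,
    }


# Assumed usage (requires stable-audio-tools and a loaded model):
#   output = generate_diffusion_cond(
#       model, sample_size=sample_size, device=device,
#       **build_generation_kwargs("Jazz piano melody", 8),
#   )
```

All four batch items share the same prompt, so under this setup the variations would differ only because each item starts from different random noise.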
## License

This Space uses the Stability AI Community License. For commercial use, please refer to [stability.ai/license](https://stability.ai/license).

## Model Card

For more information about the model, training data, and limitations, see the [model card](https://huggingface.co/stabilityai/stable-audio-open-small).

## Research Paper

[Fast Text-to-Audio Generation with Adversarial Post-Training](https://arxiv.org/abs/2505.08175)