---
title: Stable Audio Open Small - 4 Variations
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: other
---
# Stable Audio Open Small - 4 Variations
Generate up to 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.
## Model Information
**Model**: [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small)
- **Type**: Latent diffusion model: a diffusion transformer (DiT) operating in the latent space of an autoencoder, conditioned on T5-based text embeddings
- **Sample Rate**: 44.1 kHz
- **Format**: Stereo audio
- **Max Duration**: 11 seconds
- **License**: Stability AI Community License
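At 44.1 kHz stereo, a clip's raw size follows directly from its duration. The helper below is a hypothetical sketch (not code from this Space's `app.py`) that turns a requested duration into the expected `(channels, frames)` shape and rejects durations beyond the 11-second limit. It only covers the arithmetic; the model itself may work on a larger fixed-size buffer internally.

```python
# Hypothetical helper, not taken from this Space's app.py.
SAMPLE_RATE = 44_100  # Hz, per the model information above
CHANNELS = 2          # stereo output
MAX_SECONDS = 11.0    # maximum duration supported by the model


def audio_shape(seconds: float, sample_rate: int = SAMPLE_RATE,
                channels: int = CHANNELS) -> tuple[int, int]:
    """Return the (channels, frames) shape of a clip of the given length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    frames = round(seconds * sample_rate)
    return channels, frames


print(audio_shape(11))   # (2, 485100)
print(audio_shape(2.5))  # (2, 110250)
```

So a full-length 11-second variation holds 485,100 samples per channel, and a batch of four holds four such clips.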
## Features
- **4 Variations**: Generate 4 different audio variations from a single prompt
- **Text-to-Audio**: Simple text prompt interface
- **Variable Duration**: Control audio length (1-11 seconds)
- **Fast Generation**: Uses the pingpong sampler with only 8 diffusion steps
## Setup
This model requires accepting the license agreement on Hugging Face. To use this Space:
1. **Accept the model license**: Visit [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small) and accept the license agreement.
2. **Create an access token**: Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a token with "read" permissions.
3. **Add the token to the Space**: In your Space settings, go to "Variables and secrets" and add a new secret:
   - Name: `HF_TOKEN`
   - Value: your access token
   - Add it as a secret, not as a public variable, so the token stays private
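Spaces expose secrets to the running app as environment variables, so the token is available as `HF_TOKEN`. How `app.py` actually consumes it isn't shown here; a minimal sketch, assuming the `huggingface_hub` package, could look like this (`read_hf_token` and `authenticate` are hypothetical names).

```python
import os
from collections.abc import Mapping


def read_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN secret, failing early with a clear message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it under the Space's "
            "'Variables and secrets' settings."
        )
    return token


def authenticate() -> None:
    """Log in to the Hugging Face Hub so the gated model files can be downloaded."""
    # Imported lazily so read_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token())
```

`huggingface_hub` also reads `HF_TOKEN` from the environment on its own, so an explicit check like this mainly serves to fail fast with a readable error when the secret is missing.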
## Usage
1. Enter a text prompt describing the audio you want to generate
2. Adjust the duration slider (1-11 seconds)
3. Click "Generate" to create 4 variations
4. Listen to and download your favorite variations
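Before a variation can be played or downloaded, its stereo waveform has to be written to an audio file. The Space's own saving code isn't shown here; as a hedged sketch using only the Python standard library, one variation could be peak-normalized and written as 16-bit stereo WAV as follows. `save_variation` is a hypothetical name, and the input is assumed to be two equal-length lists of floats, one per channel.

```python
import array
import sys
import wave


def save_variation(path: str, left: list[float], right: list[float],
                   sample_rate: int = 44_100) -> None:
    """Peak-normalize a stereo clip and write it as a 16-bit PCM WAV file."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Scale so the loudest sample reaches full scale; avoid dividing by zero.
    peak = max((abs(x) for x in left + right), default=0.0) or 1.0
    # Interleave channels frame by frame: L0, R0, L1, R1, ...
    pcm = array.array("h")
    for frame in zip(left, right):
        for x in frame:
            pcm.append(int(max(-1.0, min(1.0, x / peak)) * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

Calling it once per variation (for example `variation_1.wav` through `variation_4.wav`) yields four separate downloadable files.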
## Example Prompts
- "128 BPM tech house drum loop"
- "Ocean waves crashing on beach"
- "Jazz piano melody"
- "Rainforest ambience with bird calls"
- "Electronic synth pad"
## Model Limitations
- The model is not able to generate realistic vocals
- Trained on English descriptions, so it may not perform as well with prompts in other languages
- Better at generating sound effects and field recordings than music
- Performance varies across different music styles and cultures
- Prompt engineering may be required for best results
## Technical Details
- **Steps**: 8 (optimized for speed)
- **CFG Scale**: 1.0 (classifier-free guidance effectively disabled)
- **Sampler**: pingpong
- **Batch Size**: 4 (for generating variations)
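These settings correspond to arguments of `generate_diffusion_cond` from Stability AI's `stable-audio-tools` package, which the model card uses for inference. The sketch below assumes that package and is not necessarily what this Space's `app.py` does; `build_generation_kwargs` and `generate_variations` are hypothetical names. Repeating the conditioning dict once per batch item gives every variation the same prompt, while each batch item starts from its own random noise, which is what makes the four outputs differ.

```python
# Sketch assuming the stable-audio-tools package; helper names are hypothetical.
STEPS = 8             # few-step sampling, per the settings above
CFG_SCALE = 1.0       # guidance effectively off
SAMPLER = "pingpong"
BATCH_SIZE = 4        # one batch item per variation


def build_generation_kwargs(prompt: str, seconds_total: float,
                            batch_size: int = BATCH_SIZE) -> dict:
    """Collect keyword arguments for one batched generation call."""
    # One conditioning dict per batch item, all sharing the same prompt.
    conditioning = [{"prompt": prompt, "seconds_total": seconds_total}
                    for _ in range(batch_size)]
    return {
        "steps": STEPS,
        "cfg_scale": CFG_SCALE,
        "sampler_type": SAMPLER,
        "batch_size": batch_size,
        "conditioning": conditioning,
    }


def generate_variations(model, model_config: dict, prompt: str,
                        seconds_total: float, device: str = "cuda"):
    """Return a (batch, channels, samples) tensor of audio variations."""
    # Deferred import so build_generation_kwargs stays dependency-free.
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    audio = generate_diffusion_cond(
        model,
        sample_size=model_config["sample_size"],
        device=device,
        **build_generation_kwargs(prompt, seconds_total),
    )
    # The output buffer is model_config["sample_size"] samples long;
    # trim it to the requested duration.
    num_samples = int(seconds_total * model_config["sample_rate"])
    return audio[:, :, :num_samples]
```

In practice the model would be loaded once at startup, for example with `model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")` from `stable_audio_tools` followed by `model.to(device)`, and then reused across requests.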
## License
This Space uses the Stability AI Community License. For commercial use, please refer to [stability.ai/license](https://stability.ai/license).
## Model Card
For more information about the model, training data, and limitations, see the [model card](https://huggingface.co/stabilityai/stable-audio-open-small).
## Research Paper
[Stable Audio Open: An Open Generative Audio Model](https://arxiv.org/abs/2505.08175)