---
title: Stable Audio Open Small - 4 Variations
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: other
---

# Stable Audio Open Small - 4 Variations

Generate 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.

## Model Information

**Model**: [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small)

- **Type**: Latent diffusion model (DiT) with autoencoder
- **Sample Rate**: 44.1 kHz
- **Format**: Stereo audio
- **Max Duration**: 11 seconds
- **License**: Stability AI Community License
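
For a sense of scale, the figures above imply the following buffer sizes. This is a small arithmetic sketch: the helper name `audio_buffer_size` and the choice of 16-bit PCM for the byte count are assumptions for illustration, not part of this Space's code.

```python
SAMPLE_RATE_HZ = 44_100   # sample rate listed above
CHANNELS = 2              # stereo
MAX_SECONDS = 11          # maximum clip length


def audio_buffer_size(seconds: float, bytes_per_sample: int = 2) -> tuple[int, int]:
    """Return (frames per channel, raw byte size) for a clip of the given length.

    bytes_per_sample=2 assumes 16-bit PCM, which is an illustrative choice.
    """
    frames = round(seconds * SAMPLE_RATE_HZ)
    return frames, frames * CHANNELS * bytes_per_sample


frames, size_bytes = audio_buffer_size(MAX_SECONDS)
print(frames, size_bytes)  # 485100 1940400
```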

## Features

- **4 Variations**: Generate 4 different audio variations from a single prompt
- **Text-to-Audio**: Simple text prompt interface
- **Variable Duration**: Control audio length (1-11 seconds)
- **Fast Generation**: Uses the pingpong sampler with only 8 sampling steps

## Setup

This model requires accepting the license agreement on Hugging Face. To use this Space:

1. **Accept the model license**: Visit [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small) and accept the license agreement
2. **Create an access token**: Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a token with "read" permissions
3. **Add token to Space**: In your Space settings, go to "Variables and secrets" and add a new secret:
   - Name: `HF_TOKEN`
   - Value: Your access token
   - Add it as a secret, not a variable, so the value stays private
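
Inside the Space, a secret named `HF_TOKEN` is exposed to the app as an environment variable. A minimal sketch of reading it at startup follows. The helper name `read_hf_token` and the error message are hypothetical, and the Space's actual `app.py` may handle this differently.

```python
import os


def read_hf_token(env=None) -> str:
    """Return the Hugging Face token stored in the HF_TOKEN secret.

    `env` defaults to os.environ; it is a parameter only so the function
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings "
            "after accepting the model license."
        )
    return token
```

The returned token can then be passed to `huggingface_hub.login(token=...)` before the gated model is downloaded.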

## Usage

1. Enter a text prompt describing the audio you want to generate
2. Adjust the duration slider (1-11 seconds)
3. Click "Generate" to create 4 variations
4. Listen to and download your favorite variations
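
The slider range in step 2 can also be enforced in code before a request reaches the model. The following is a hedged sketch: `normalize_request` is a hypothetical helper, not taken from this Space's `app.py`.

```python
MIN_SECONDS = 1
MAX_SECONDS = 11


def normalize_request(prompt: str, seconds: float) -> tuple[str, float]:
    """Trim the prompt and clamp the duration to the supported 1-11 s range."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("Prompt must not be empty.")
    seconds = min(max(float(seconds), MIN_SECONDS), MAX_SECONDS)
    return prompt, seconds
```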

## Example Prompts

- "128 BPM tech house drum loop"
- "Ocean waves crashing on beach"
- "Jazz piano melody"
- "Rainforest ambience with bird calls"
- "Electronic synth pad"

## Model Limitations

- Cannot generate realistic vocals
- Trained on English descriptions, so prompts in other languages may perform worse
- Better at generating sound effects and field recordings than music
- Performance varies across different music styles and cultures
- Prompt engineering may be required for best results

## Technical Details

- **Steps**: 8 (optimized for speed)
- **CFG Scale**: 1.0
- **Sampler**: pingpong
- **Batch Size**: 4 (for generating variations)
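
The settings above map onto a generation call roughly as follows. This sketch assumes the `stable-audio-tools` package and its `get_pretrained_model` and `generate_diffusion_cond` functions, as used in the model card. `build_generation_kwargs` and `generate_variations` are hypothetical helper names, and the Space's actual `app.py` may be organized differently.

```python
def build_generation_kwargs(prompt: str, seconds: float, n_variations: int = 4) -> dict:
    """Collect the sampler settings listed above into keyword arguments."""
    # One conditioning entry per batch item. With identical prompts, the
    # variations differ through the random noise drawn for each batch item.
    conditioning = [{"prompt": prompt, "seconds_total": seconds}] * n_variations
    return {
        "steps": 8,
        "cfg_scale": 1.0,
        "sampler_type": "pingpong",
        "batch_size": n_variations,
        "conditioning": conditioning,
    }


def generate_variations(prompt: str, seconds: float, device: str = "cpu"):
    """Run one batched generation; needs stable-audio-tools and license access."""
    # Imported lazily so the helper above stays usable without the heavy deps.
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
    model = model.to(device)
    return generate_diffusion_cond(
        model,
        sample_size=model_config["sample_size"],
        device=device,
        **build_generation_kwargs(prompt, seconds),
    )
```

Keeping the settings in a plain dictionary makes them easy to inspect or log without loading the model.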

## License

The model used by this Space is released under the Stability AI Community License. For commercial use terms, see [stability.ai/license](https://stability.ai/license).

## Model Card

For more information about the model, training data, and limitations, see the [model card](https://huggingface.co/stabilityai/stable-audio-open-small).

## Research Paper

[Fast Text-to-Audio Generation with Adversarial Post-Training](https://arxiv.org/abs/2505.08175)