hugofloresgarcia

Add HF_TOKEN authentication support for model access

	---
	title: Stable Audio Open Small - 4 Variations
	emoji: 🎵
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.20.0
	app_file: app.py
	pinned: false
	license: other
	---

	# Stable Audio Open Small - 4 Variations

	Generate up to 4 audio variations from a single text prompt using Stability AI's Stable Audio Open Small model.

	## Model Information

	Model: [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small)

	- Type: Latent diffusion model (DiT) with autoencoder
	- Sample Rate: 44.1 kHz
	- Format: Stereo audio
	- Max Duration: 11 seconds
	- License: Stability AI Community License
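The sample rate and maximum duration above determine how many sample frames a clip contains. A minimal sketch of that arithmetic, assuming requested durations are clamped to the 1-11 second range this Space exposes; the names `clamp_seconds` and `frames_for` are hypothetical and not taken from `app.py`:

```python
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, from the model information above
MIN_SECONDS = 1           # assumed lower bound of the duration control
MAX_SECONDS = 11          # maximum duration listed for the model
CHANNELS = 2              # stereo output


def clamp_seconds(seconds: float) -> float:
    """Clamp a requested duration to the supported 1-11 second range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))


def frames_for(seconds: float) -> int:
    """Number of sample frames per channel for a (clamped) duration."""
    return int(round(clamp_seconds(seconds) * SAMPLE_RATE_HZ))


if __name__ == "__main__":
    frames = frames_for(11)
    print(frames)             # 485100 frames per channel
    print(frames * CHANNELS)  # 970200 individual samples in a stereo clip
```

A full-length stereo clip is therefore about 485,100 frames per channel.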

	## Features

	- 4 Variations: Generate 4 different audio variations from a single prompt
	- Text-to-Audio: Simple text prompt interface
	- Variable Duration: Control audio length (1-11 seconds)
	- Fast Generation: Uses the pingpong sampler with only 8 steps

	## Setup

	This model requires accepting the license agreement on Hugging Face. To use this Space:

	1. Accept the model license: Visit [stabilityai/stable-audio-open-small](https://huggingface.co/stabilityai/stable-audio-open-small) and accept the license agreement
	2. Create an access token: Go to [Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a token with "read" permissions
	3. Add token to Space: In your Space settings, go to "Variables and secrets" and add a new secret:
	   - Name: `HF_TOKEN`
	   - Value: Your access token
	   - Make sure it is added as a secret, not a variable, so the value stays private
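Inside a running Space, a secret is exposed to the app as an environment variable. A minimal sketch of how an app could read the token and authenticate before downloading the gated model, assuming the `huggingface_hub` package is installed. The helper names `read_hf_token` and `authenticate` are hypothetical, and the actual `app.py` may handle this differently:

```python
import os


def read_hf_token(env=None) -> str:
    """Return the HF_TOKEN value, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings."
        )
    return token


def authenticate(env=None) -> None:
    """Log in to the Hugging Face Hub so gated model files can be downloaded."""
    # Imported lazily so the token check above works without the package.
    from huggingface_hub import login

    login(token=read_hf_token(env))
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authorization error during model download.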

	## Usage

	1. Enter a text prompt describing the audio you want to generate
	2. Adjust the duration slider (1-11 seconds)
	3. Click "Generate" to create 4 variations
	4. Listen to and download your favorite variations

	## Example Prompts

	- "128 BPM tech house drum loop"
	- "Ocean waves crashing on beach"
	- "Jazz piano melody"
	- "Rainforest ambience with bird calls"
	- "Electronic synth pad"

	## Model Limitations

	- The model is not able to generate realistic vocals
	- Trained with English descriptions - may not perform as well in other languages
	- Better at generating sound effects and field recordings than music
	- Performance varies across different music styles and cultures
	- Prompt engineering may be required for best results

	## Technical Details

	- Steps: 8 (optimized for speed)
	- CFG Scale: 1.0
	- Sampler: pingpong
	- Batch Size: 4 (for generating variations)
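These settings map onto a single batched sampling call. The sketch below assumes the `stable-audio-tools` package and its `get_pretrained_model` and `generate_diffusion_cond` functions; `build_generation_args` and `generate_variations` are hypothetical helpers for illustration, and the actual `app.py` may be organized differently:

```python
STEPS = 8             # sampling steps
CFG_SCALE = 1.0       # classifier-free guidance scale
SAMPLER = "pingpong"  # sampler type
BATCH_SIZE = 4        # one batch item per variation
MAX_SECONDS = 11      # maximum clip duration


def build_generation_args(prompt: str, seconds: float) -> dict:
    """Collect the sampling settings listed above into keyword arguments."""
    seconds = max(1, min(MAX_SECONDS, seconds))
    # One conditioning entry per batch item: the same prompt is repeated and
    # each item starts from different random noise.
    conditioning = [
        {"prompt": prompt, "seconds_total": seconds} for _ in range(BATCH_SIZE)
    ]
    return {
        "steps": STEPS,
        "cfg_scale": CFG_SCALE,
        "sampler_type": SAMPLER,
        "batch_size": BATCH_SIZE,
        "conditioning": conditioning,
    }


def generate_variations(prompt: str, seconds: float = MAX_SECONDS):
    """Generate BATCH_SIZE variations; requires torch and stable-audio-tools."""
    # Heavy third-party imports are deferred until generation is requested.
    import torch
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
    model = model.to(device)
    # Returns an audio tensor shaped (batch, channels, samples).
    return generate_diffusion_cond(
        model,
        sample_size=model_config["sample_size"],
        device=device,
        **build_generation_args(prompt, seconds),
    )
```

In a long-running app the model would normally be loaded once at startup rather than on every call; it is loaded inside the function here only to keep the sketch self-contained.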

	## License

	This Space uses the Stability AI Community License. For commercial use, please refer to [stability.ai/license](https://stability.ai/license).

	## Model Card

	For more information about the model, training data, and limitations, see the [model card](https://huggingface.co/stabilityai/stable-audio-open-small).

	## Research Paper

	[Fast Text-to-Audio Generation with Adversarial Post-Training](https://arxiv.org/abs/2505.08175)