	---
	license: other
	license_name: stability-ai-community
	license_link: LICENSE.md
	tags:
	- audio
	- text-to-audio
	- sound-effects
	- ambient
	- diffusion
	- stable-audio
	- safetensors
	- maestraea
	pipeline_tag: text-to-audio
	base_model: stabilityai/stable-audio-open-1.0
	---

	# Stable Audio Open 1.0 (Mæstræa Mirror)

	Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz

	[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License

	> This is an ungated mirror of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included (legacy `.ckpt` files stripped). All credits go to the original authors.

	## What's in This Repo

	| Path | Description | Size |
	|------|-------------|------|
	| `model.safetensors` | Main model checkpoint | ~3 GB |
	| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
	| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
	| `vae/diffusion_pytorch_model.safetensors` | VAE decoder | ~150 MB |
	| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
	| `tokenizer/` | T5 tokenizer files | < 10 MB |
	| `model_config.json` | Model architecture config | < 1 KB |
	| `model_index.json` | Diffusers pipeline index | < 1 KB |
	| `scheduler/` | Scheduler config | < 1 KB |
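
The table above appears to cover two layouts: the single `model.safetensors` plus `model_config.json`, and the per-component folders indexed by `model_index.json`. Below is a minimal sketch of fetching only one of them, assuming that grouping is accurate for this mirror. `LAYOUTS`, `patterns_for`, and `fetch` are hypothetical names introduced here; `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
from typing import List, Optional

REPO_ID = "AEmotionStudio/stable-audio-open-models"

# Assumed grouping of the files listed in the table above.
LAYOUTS = {
    "diffusers": [
        "model_index.json",
        "transformer/*",
        "text_encoder/*",
        "vae/*",
        "projection_model/*",
        "tokenizer/*",
        "scheduler/*",
    ],
    "stable-audio-tools": ["model.safetensors", "model_config.json"],
}


def patterns_for(layout: str) -> List[str]:
    """Return the glob patterns for one layout, or raise on an unknown name."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout {layout!r}; expected one of {sorted(LAYOUTS)}")
    return list(LAYOUTS[layout])


def fetch(layout: str, local_dir: Optional[str] = None) -> str:
    """Download only the files for `layout` and return the local snapshot path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        REPO_ID,
        allow_patterns=patterns_for(layout),
        local_dir=local_dir,
    )
```

Calling `fetch("diffusers")` would skip the standalone `model.safetensors`, which is worth doing only if nothing else on the machine needs that file.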

	## What Stable Audio Open Does

	Stable Audio Open generates stereo audio at 44.1kHz from text prompts. It excels at:

	- Sound effects — Foley, impacts, transitions
	- Ambient textures — Rain, wind, crowds, environments
	- Musical textures — Pads, drones, atmospheric sounds
	- Audio scenes — Complex layered soundscapes

	Up to 47 seconds of stereo audio per generation.
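
For a sense of scale, the figures above translate into concrete sample counts. The sketch below works through the arithmetic; the 16-bit PCM assumption used for the file size is illustrative and is not stated anywhere in this model card.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, from the model card
CHANNELS = 2             # stereo
MAX_SECONDS = 47         # maximum clip length, from the model card
BYTES_PER_SAMPLE = 2     # assumption: clip saved as 16-bit PCM WAV


def frames(seconds: float) -> int:
    """Number of sample frames (one sample per channel) in a clip."""
    return round(seconds * SAMPLE_RATE_HZ)


def pcm_bytes(seconds: float) -> int:
    """Uncompressed audio payload size in bytes, excluding the WAV header."""
    return frames(seconds) * CHANNELS * BYTES_PER_SAMPLE


print(frames(MAX_SECONDS))     # 2072700 frames per channel
print(pcm_bytes(MAX_SECONDS))  # 8290800 bytes, roughly 7.9 MiB
```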

	### What It's NOT Good At

	- Full songs with vocals
	- High-fidelity musical instruments (use Foundation-1 for that)
	- Speech synthesis

	### VRAM Requirements

	- Minimum: ~4 GB (FP16)
	- Recommended: ~7 GB (FP16, longer durations)
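
A minimal sketch of checking a GPU against these figures before loading the model. The thresholds are the approximate numbers listed above; `vram_tier` and `current_tier` are hypothetical helpers. `torch.cuda.mem_get_info` reports free memory at call time, which is only a rough proxy for what a generation run will need.

```python
MIN_GB = 4.0          # approximate FP16 minimum from the list above
RECOMMENDED_GB = 7.0  # approximate FP16 figure for longer durations


def vram_tier(free_gb: float) -> str:
    """Classify free GPU memory against the model card's rough figures."""
    if free_gb < MIN_GB:
        return "insufficient"
    if free_gb < RECOMMENDED_GB:
        return "minimum"
    return "recommended"


def current_tier(device_index: int = 0) -> str:
    """Query the given CUDA device and classify its free memory."""
    import torch  # imported lazily so vram_tier works without torch

    if not torch.cuda.is_available():
        return "insufficient"
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device_index)
    return vram_tier(free_bytes / 1024**3)
```

On a card that lands in the `"minimum"` tier, shorter clips are the safer choice; Diffusers pipelines also offer `enable_model_cpu_offload()`, though how much it helps with this model is not something this card documents.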

	## Usage with Mæstræa

	These models are automatically downloaded by the Mæstræa AI Workstation backend.

	### Direct Usage (diffusers)

	```python
	import torch
	from diffusers import StableAudioPipeline

	# Load the pipeline in half precision and move it to the GPU
	pipe = StableAudioPipeline.from_pretrained(
	    "AEmotionStudio/stable-audio-open-models",
	    torch_dtype=torch.float16,
	).to("cuda")

	# Generate a 10-second clip; .audios holds one waveform per prompt
	audio = pipe(
	    prompt="Thunderstorm with heavy rain and distant rolling thunder",
	    negative_prompt="low quality, distorted",
	    audio_end_in_s=10.0,  # clip length in seconds (up to 47)
	    num_inference_steps=100,
	).audios[0]
	```
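
The pipeline call returns the waveform in memory and does not write a file. One way to save it using only the standard library is sketched below. `write_wav` is a hypothetical helper, not part of Diffusers; it assumes the clip arrives as one sequence of floats per channel in the range [-1, 1] and stores it as 16-bit PCM.

```python
import array
import sys
import wave


def write_wav(path, channels, sample_rate):
    """Write `channels` (one float sequence per channel) as a 16-bit PCM WAV.

    Returns the number of frames written.
    """
    if not channels or len({len(c) for c in channels}) != 1:
        raise ValueError("expected at least one channel, all of equal length")

    pcm = array.array("h")
    # Interleave frames (L, R, L, R, ...) and clip to [-1, 1] before scaling.
    for frame in zip(*channels):
        for sample in frame:
            pcm.append(int(max(-1.0, min(1.0, sample)) * 32767))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(path, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(int(sample_rate))
        wav.writeframes(pcm.tobytes())
    return len(pcm) // len(channels)
```

With the pipeline above, this might be called as `write_wav("thunderstorm.wav", audio.float().cpu().tolist(), pipe.vae.sampling_rate)`, assuming `audio` is a `(channels, samples)` tensor. Libraries such as `soundfile` do the same job faster for long clips.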

	### Using stable-audio-tools

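	```python
	from stable_audio_tools import get_pretrained_model

	# Downloads model_config.json and the model weights from the Hub
	model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
	sample_rate = model_config["sample_rate"]  # needed when saving generated audio
	```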
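
`get_pretrained_model` only loads the model; generation goes through a separate sampling call. The sketch below follows the pattern from the upstream Stable Audio Open model card, under the assumption that this mirror behaves the same way. `build_conditioning` and `generate` are hypothetical wrappers, and the sampler settings are the upstream example values, not tuned recommendations.

```python
MAX_SECONDS = 47  # upper bound stated in this model card


def build_conditioning(prompt, seconds_total, seconds_start=0):
    """Build the conditioning list expected by generate_diffusion_cond."""
    if not 0 <= seconds_start < seconds_total <= MAX_SECONDS:
        raise ValueError(
            f"need 0 <= seconds_start < seconds_total <= {MAX_SECONDS}"
        )
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def generate(prompt, seconds_total=10, steps=100, cfg_scale=7.0):
    """Load the model and sample one clip; returns (audio tensor, sample rate)."""
    # Heavy imports are deferred so build_conditioning stays usable on its own.
    import torch
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model(
        "AEmotionStudio/stable-audio-open-models"
    )
    model = model.to(device)

    output = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=cfg_scale,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )
    return output, model_config["sample_rate"]
```

The returned tensor is batched, so a single clip would be `output[0]` with shape `(channels, samples)`.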

	## License

	Stability AI Community License — see [LICENSE.md](LICENSE.md) for full terms.

	Key points:
	- Free for research and non-commercial use
	- Commercial use is permitted for individuals and organizations with annual revenue under $1M; above that threshold, a separate license from Stability AI is required
	- Model outputs cannot be used to train competing models

	## Credits

	- Model: [Stability AI](https://stability.ai/)
	- Paper: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
	- Training Data: Freesound + Free Music Archive (see attribution CSVs)
	- Mirror by: [AEmotionStudio](https://huggingface.co/AEmotionStudio)