---
license: other
license_name: stability-ai-community
license_link: LICENSE.md
tags:
- audio
- text-to-audio
- sound-effects
- ambient
- diffusion
- stable-audio
- safetensors
- maestraea
pipeline_tag: text-to-audio
base_model: stabilityai/stable-audio-open-1.0
---

# Stable Audio Open 1.0 (Mæstræa Mirror)

**Text-to-Audio SFX & Ambient Textures: Up to 47s Stereo @ 44.1 kHz**

[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License

> This is an **ungated mirror** of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included (legacy `.ckpt` files stripped). All credits go to the original authors.

## What's in This Repo

| Path | Description | Size |
|------|-------------|------|
| `model.safetensors` | Main model checkpoint | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE decoder | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |

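
To confirm that a local copy of this repo is complete, the paths listed in the table can be checked with a few lines of Python. This is a minimal sketch: `missing_paths` and `EXPECTED_PATHS` are hypothetical names introduced here for illustration, and the check only tests that each entry exists, not its size or integrity.

```python
from pathlib import Path

# Paths copied from the table above; entries ending in "/" are directories.
EXPECTED_PATHS = [
    "model.safetensors",
    "transformer/diffusion_pytorch_model.safetensors",
    "text_encoder/model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
    "projection_model/diffusion_pytorch_model.safetensors",
    "tokenizer/",
    "model_config.json",
    "model_index.json",
    "scheduler/",
]


def missing_paths(repo_dir):
    """Return the expected entries that are absent from repo_dir (hypothetical helper)."""
    root = Path(repo_dir)
    missing = []
    for rel in EXPECTED_PATHS:
        target = root / rel
        # Directory entries must be directories; everything else must be a file.
        present = target.is_dir() if rel.endswith("/") else target.is_file()
        if not present:
            missing.append(rel)
    return missing
```

Calling `missing_paths("path/to/local/copy")` returns an empty list when every entry from the table is present.
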
## What Stable Audio Open Does

Stable Audio Open generates stereo audio at 44.1 kHz from text prompts. It excels at:

- **Sound effects**: Foley, impacts, transitions
- **Ambient textures**: Rain, wind, crowds, environments
- **Musical textures**: Pads, drones, atmospheric sounds
- **Audio scenes**: Complex layered soundscapes

Up to 47 seconds of stereo audio per generation.

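
For planning buffers or disk space, the figures above (44.1 kHz, stereo, 47 s maximum) translate directly into frame and byte counts. The sketch below is plain arithmetic on those numbers; `clip_size` is a hypothetical helper, and the byte counts describe uncompressed audio only.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate quoted above
MAX_DURATION_S = 47      # maximum clip length quoted above
CHANNELS = 2             # stereo


def clip_size(duration_s, bytes_per_sample):
    """Return (frames, bytes) for an uncompressed stereo clip.

    Durations above the 47 s cap are clamped to it.
    """
    duration_s = max(0.0, min(float(duration_s), MAX_DURATION_S))
    frames = round(duration_s * SAMPLE_RATE_HZ)
    return frames, frames * CHANNELS * bytes_per_sample


frames, pcm16_bytes = clip_size(47, bytes_per_sample=2)
print(frames, pcm16_bytes)  # 2072700 8290800
```

A full-length clip is therefore about 2.07 million frames, or roughly 8.3 MB as 16-bit PCM.
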
### What It's NOT Good At

- Full songs with vocals
- High-fidelity musical instruments (use Foundation-1 for that)
- Speech synthesis

### VRAM Requirements

- **Minimum**: ~4 GB (FP16)
- **Recommended**: ~7 GB (FP16, longer durations)

## Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.

### Direct Usage (diffusers)

```python
import torch
from diffusers import StableAudioPipeline

# Load the pipeline in half precision and move it to the GPU.
pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

# Generate a 10-second clip; .audios[0] is the first waveform in the batch.
audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,
    num_inference_steps=100,
).audios[0]
```

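
The snippet above stops at the in-memory waveform. One way to write it to disk, sketched here with only the Python standard library, is to convert the samples to interleaved 16-bit PCM. `write_wav_pcm16` is a hypothetical helper, and it assumes the samples arrive as one sequence of floats per channel in the range [-1, 1].

```python
import struct
import wave


def write_wav_pcm16(path, channels, sample_rate=44_100):
    """Write per-channel float samples in [-1, 1] as a 16-bit PCM WAV file.

    `channels` is a sequence of equal-length sample sequences, one per channel.
    Returns the number of frames written.
    """
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same length")
    ints = []
    for i in range(n_frames):
        for ch in channels:  # interleave: L0, R0, L1, R1, ...
            clipped = max(-1.0, min(1.0, float(ch[i])))
            ints.append(int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return n_frames
```

Assuming `audio` from the snippet above is a tensor shaped `(channels, samples)`, a call such as `write_wav_pcm16("thunder.wav", audio.float().cpu().tolist(), pipe.vae.sampling_rate)` should work. For long clips, an array-based writer such as the `soundfile` package will be considerably faster than this pure-Python loop.
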
### Using stable-audio-tools

```python
from stable_audio_tools import get_pretrained_model

# Returns the loaded model together with its configuration.
model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```

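
Once the model is loaded, generation in stable-audio-tools is driven by a list of conditioning dictionaries. The sketch below only builds that list. The keys `prompt`, `seconds_start`, and `seconds_total` follow the example in the stable-audio-tools README as far as we know, while `build_conditioning` itself is a hypothetical helper and the 47 s clamp is taken from this README, not from the library.

```python
MAX_SECONDS = 47  # upper bound quoted in this README


def build_conditioning(prompt, seconds_total, seconds_start=0):
    """Hypothetical helper: build a one-item conditioning list for a timed text prompt."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    # Clamp the requested length to the model's stated maximum.
    total = min(float(seconds_total), MAX_SECONDS)
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": total,
    }]


conditioning = build_conditioning("Rain on a tin roof", seconds_total=10)
# The list would then be passed to the library's generation function, e.g.
# generate_diffusion_cond(model, conditioning=conditioning, ...), as in the
# stable-audio-tools README (not executed here).
```
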
## License

**Stability AI Community License**: see [LICENSE.md](LICENSE.md) for full terms.

Key points:

- Free for research and non-commercial use
- Commercial use is permitted for organizations with annual revenue under $1M; above that, a separate license from Stability AI is required
- Model outputs cannot be used to train competing models

## Credits

- **Model**: [Stability AI](https://stability.ai/)
- **Paper**: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
- **Training Data**: FreeSound + Free Music Archive (see attribution CSVs)
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)