---
license: other
license_name: stability-ai-community
license_link: LICENSE.md
tags:
- audio
- text-to-audio
- sound-effects
- ambient
- diffusion
- stable-audio
- safetensors
- maestraea
pipeline_tag: text-to-audio
base_model: stabilityai/stable-audio-open-1.0
---
# Stable Audio Open 1.0 (Mæstræa Mirror)
**Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz**
[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License
> This is an **ungated mirror** of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included (legacy `.ckpt` files stripped). All credits go to the original authors.
## What's in This Repo
| Path | Description | Size |
|------|-------------|------|
| `model.safetensors` | Full model checkpoint (stable-audio-tools format) | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE (Oobleck autoencoder) | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config (stable-audio-tools) | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |
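The repo holds two layouts side by side: `model.safetensors` plus `model_config.json` are what stable-audio-tools loads, while `model_index.json` and the component subfolders make up the diffusers pipeline. If you only need one backend, a minimal sketch like the one below can restrict the download using `huggingface_hub.snapshot_download(allow_patterns=...)`. The `BACKEND_PATTERNS` mapping and the `patterns_for` / `download` helpers are hypothetical names for illustration, and the pattern lists simply mirror the table.

```python
"""Select and download only the files one backend needs (illustrative sketch)."""

# Hypothetical mapping from backend name to glob patterns, based on the repo layout table.
BACKEND_PATTERNS = {
    "stable-audio-tools": ["model.safetensors", "model_config.json"],
    "diffusers": [
        "model_index.json",
        "transformer/*",
        "text_encoder/*",
        "vae/*",
        "projection_model/*",
        "tokenizer/*",
        "scheduler/*",
    ],
}


def patterns_for(backend: str) -> list[str]:
    """Return a copy of the allow_patterns list for a backend; raise for unknown names."""
    try:
        return list(BACKEND_PATTERNS[backend])
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(BACKEND_PATTERNS)}"
        ) from None


def download(backend: str) -> str:
    """Fetch only the files for `backend` and return the local snapshot directory."""
    # Imported lazily so patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/stable-audio-open-models",
        allow_patterns=patterns_for(backend),
    )
```

For example, `download("diffusers")` would skip the root `model.safetensors`, which the diffusers pipeline does not load.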
## What Stable Audio Open Does
Stable Audio Open generates stereo audio at 44.1kHz from text prompts. It excels at:
- **Sound effects** — Foley, impacts, transitions
- **Ambient textures** — Rain, wind, crowds, environments
- **Musical textures** — Pads, drones, atmospheric sounds
- **Audio scenes** — Complex layered soundscapes
Each generation produces up to 47 seconds of stereo audio.
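At 44.1 kHz, the 47-second ceiling works out to 47 × 44,100 = 2,072,700 samples per channel. The sketch below validates a requested clip length and converts it to a per-channel sample count before you call the model. `MAX_SECONDS` and `samples_for` are hypothetical names, and the 47 s limit is taken from this card rather than read from the model config.

```python
SAMPLE_RATE = 44_100  # Hz, as stated on this card
MAX_SECONDS = 47.0    # per-generation ceiling stated on this card


def samples_for(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Return the per-channel sample count for a clip, rejecting out-of-range lengths."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] s, got {seconds}")
    return round(seconds * sample_rate)
```

For a 10-second clip this gives 441,000 samples per channel, so a stereo result holds 882,000 values in total.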
### What It's NOT Good At
- Full songs with vocals
- High-fidelity musical instruments (use Foundation-1 for that)
- Speech synthesis
### VRAM Requirements
- **Minimum**: ~4 GB (FP16)
- **Recommended**: ~7 GB (FP16, longer durations)
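Before loading the FP16 pipeline, you can compare free GPU memory against these figures. This is a rough, assumption-laden sketch: `vram_tier` and `current_tier` are hypothetical helpers, the 4 GB and 7 GB thresholds are copied from this card (treated here as GiB), and real usage varies with clip length, batch size, and driver overhead.

```python
GIB = 1024 ** 3
MIN_BYTES = 4 * GIB          # "Minimum: ~4 GB (FP16)" from this card
RECOMMENDED_BYTES = 7 * GIB  # "Recommended: ~7 GB (FP16, longer durations)" from this card


def vram_tier(free_bytes: int) -> str:
    """Classify free VRAM against the card's rough FP16 requirements."""
    if free_bytes < MIN_BYTES:
        return "insufficient"
    if free_bytes < RECOMMENDED_BYTES:
        return "minimum"  # per this card, longer durations may need more memory
    return "recommended"


def current_tier(device: int = 0) -> str:
    """Query free memory on a CUDA device (requires PyTorch with CUDA)."""
    import torch  # imported lazily so vram_tier() has no torch dependency

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return vram_tier(free_bytes)
```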
## Usage with Mæstræa
These models are automatically downloaded by the Mæstræa AI Workstation backend.
### Direct Usage (diffusers)
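```python
import soundfile as sf
import torch
from diffusers import StableAudioPipeline

# Load the diffusers-format components (model_index.json + subfolders) in FP16 on the GPU
pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,  # clip length in seconds (up to ~47)
    num_inference_steps=100,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducibility
).audios[0]

# audios[0] is a (channels, samples) tensor; soundfile expects (samples, channels)
sf.write("thunderstorm.wav", audio.T.float().cpu().numpy(), pipe.vae.config.sampling_rate)
```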
### Using stable-audio-tools
```python
from stable_audio_tools import get_pretrained_model
model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```
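`get_pretrained_model` only returns the model and its config; generation itself goes through stable-audio-tools' sampling functions (for example `generate_diffusion_cond`), which return raw float waveforms rather than a file. Assuming you have moved one generated clip to the CPU as a NumPy array shaped `(channels, samples)`, the sketch below peak-normalizes it and writes 16-bit PCM with the standard-library `wave` module. `save_wav_int16` is a hypothetical helper; `model_config["sample_rate"]` is the natural value to pass as `sample_rate`.

```python
import wave

import numpy as np


def save_wav_int16(waveform: np.ndarray, path: str, sample_rate: int = 44_100) -> None:
    """Peak-normalize a (channels, samples) float waveform and write it as 16-bit PCM WAV."""
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim != 2:
        raise ValueError(f"expected (channels, samples), got shape {audio.shape}")
    peak = float(np.max(np.abs(audio)))
    if peak > 0:  # avoid dividing by zero on silent clips
        audio = audio / peak
    # Clip to [-1, 1] and scale to the int16 range (little-endian, as WAV requires).
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    # Transposing to (samples, channels) and serializing in C order interleaves L/R frames.
    frames = pcm.T.tobytes()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(audio.shape[0])
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

Peak normalization mirrors common practice for diffusion outputs, whose amplitude is not guaranteed to stay within [-1, 1]; skip it if you need to preserve relative loudness across clips.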
## License
**Stability AI Community License** — see [LICENSE.md](LICENSE.md) for full terms.
Key points:
- Free for research and non-commercial use
- Commercial use is permitted for individuals and organizations with under $1M in annual revenue; above that threshold, a separate enterprise license from Stability AI is required
- Model outputs cannot be used to train competing models
## Credits
- **Model**: [Stability AI](https://stability.ai/)
- **Paper**: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
- **Training Data**: FreeSound + Free Music Archive (see attribution CSVs)
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)