---
license: other
license_name: stability-ai-community
license_link: LICENSE.md
tags:
  - audio
  - text-to-audio
  - sound-effects
  - ambient
  - diffusion
  - stable-audio
  - safetensors
  - maestraea
pipeline_tag: text-to-audio
base_model: stabilityai/stable-audio-open-1.0
---

# Stable Audio Open 1.0 (Mæstræa Mirror)

**Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz**

[Original Model](https://huggingface.co/stabilityai/stable-audio-open-1.0) by [Stability AI](https://stability.ai/) · Stability AI Community License

> This is an **ungated mirror** of the Stable Audio Open 1.0 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). Only safetensors-format weights are included (legacy `.ckpt` files stripped). All credits go to the original authors.

## What's in This Repo

| Path | Description | Size |
|------|-------------|------|
| `model.safetensors` | Single-file checkpoint (stable-audio-tools format) | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE (audio autoencoder) | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config (stable-audio-tools) | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |
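The repo holds two layouts side by side: the diffusers component folders indexed by `model_index.json`, and the single-file `model.safetensors` plus `model_config.json` used by stable-audio-tools. If you only need one of them, a minimal sketch like the following can select the relevant paths from a file listing. The pattern lists and the helper `files_for_loader` are hypothetical, derived from the table above; `fnmatch` is used because `huggingface_hub.snapshot_download(allow_patterns=...)` also takes glob-style patterns.

```python
from fnmatch import fnmatch

# Hypothetical glob patterns per loader, based on the file table in this README.
LOADER_PATTERNS = {
    "diffusers": [
        "model_index.json",
        "transformer/*",
        "text_encoder/*",
        "vae/*",
        "projection_model/*",
        "tokenizer/*",
        "scheduler/*",
    ],
    "stable-audio-tools": ["model_config.json", "model.safetensors"],
}


def files_for_loader(repo_files, loader):
    """Return the repo paths a given loader needs, preserving input order."""
    patterns = LOADER_PATTERNS[loader]
    return [path for path in repo_files if any(fnmatch(path, p) for p in patterns)]


# Illustrative listing; the tokenizer and scheduler file names are assumptions.
repo_files = [
    "model.safetensors",
    "model_config.json",
    "model_index.json",
    "transformer/diffusion_pytorch_model.safetensors",
    "text_encoder/model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
    "projection_model/diffusion_pytorch_model.safetensors",
    "tokenizer/tokenizer_config.json",
    "scheduler/scheduler_config.json",
]

print(files_for_loader(repo_files, "stable-audio-tools"))
# ['model.safetensors', 'model_config.json']
```

Passing one of these lists as `allow_patterns` to `snapshot_download` should fetch only that loader's files instead of the full mirror.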

## What Stable Audio Open Does

Stable Audio Open generates stereo audio at 44.1kHz from text prompts. It excels at:

- **Sound effects** — Foley, impacts, transitions
- **Ambient textures** — Rain, wind, crowds, environments
- **Musical textures** — Pads, drones, atmospheric sounds
- **Audio scenes** — Complex layered soundscapes

Up to 47 seconds of stereo audio per generation.
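Because output is fixed at 44.1kHz stereo, a clip's length maps directly onto sample counts and buffer sizes. The sketch below uses a hypothetical helper, `clip_size`, that checks a requested duration against the 47-second ceiling stated above and reports the per-channel sample count and the raw size of a float32 stereo buffer. It is plain arithmetic and ignores any padding the model applies internally.

```python
SAMPLE_RATE = 44_100  # Hz, per this model card
CHANNELS = 2  # stereo
MAX_SECONDS = 47.0  # per-generation ceiling stated in this card
BYTES_PER_FLOAT32 = 4


def clip_size(seconds):
    """Return (samples_per_channel, float32_stereo_bytes) for a clip length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds, got {seconds}")
    samples = round(seconds * SAMPLE_RATE)
    return samples, samples * CHANNELS * BYTES_PER_FLOAT32


print(clip_size(10.0))  # (441000, 3528000)
print(clip_size(47.0))  # (2072700, 16581600)
```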

### What It's NOT Good At

- Full songs with vocals
- High-fidelity musical instruments (use Foundation-1 for that)
- Speech synthesis

### VRAM Requirements

- **Minimum**: ~4 GB (FP16)
- **Recommended**: ~7 GB (FP16, longer durations)
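One way to act on these figures before loading the pipeline is a simple threshold check. The sketch below uses only the two numbers listed above; the function name and the tier labels are assumptions, and real memory use also depends on clip length, batch size, and driver overhead. With PyTorch installed, the free-memory figure could come from `torch.cuda.mem_get_info()`, which returns `(free_bytes, total_bytes)`.

```python
MIN_VRAM_GB = 4.0  # minimum listed in this card (FP16)
RECOMMENDED_VRAM_GB = 7.0  # recommended for longer durations (FP16)


def vram_tier(free_gb):
    """Classify available GPU memory (in GB) against the card's FP16 guidance."""
    if free_gb < MIN_VRAM_GB:
        return "below minimum"
    if free_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


print(vram_tier(3.5))  # below minimum
print(vram_tier(6.0))  # minimum
print(vram_tier(8.0))  # recommended

# Assuming PyTorch with CUDA is available:
# free_bytes, _total_bytes = torch.cuda.mem_get_info()
# print(vram_tier(free_bytes / 1e9))
```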

## Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend.

### Direct Usage (diffusers)

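```python
import soundfile as sf
import torch
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

# Fixed seed for reproducible results
generator = torch.Generator("cuda").manual_seed(0)

audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,  # clip length in seconds (max ~47)
    num_inference_steps=100,
    generator=generator,
).audios[0]

# audios[0] is shaped (channels, samples); soundfile expects (samples, channels)
output = audio.T.float().cpu().numpy()
sf.write("thunderstorm.wav", output, pipe.vae.sampling_rate)
```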
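The pipeline returns floating-point samples shaped `(channels, samples)`. To save them without extra audio libraries, one option is to peak-normalize, clip to [-1, 1], convert to 16-bit PCM, and write with Python's standard `wave` module, mirroring the post-processing in the stable-audio-tools example on the original model card. This is a minimal sketch: the helper `write_wav_int16` is hypothetical, and it expects plain nested Python lists, such as those produced by `audio.float().cpu().tolist()`.

```python
import array
import sys
import wave


def write_wav_int16(path, channels, sample_rate=44_100):
    """Write float audio shaped (channels, samples) to a 16-bit PCM WAV file."""
    num_channels = len(channels)
    num_samples = len(channels[0])
    # Peak-normalize so the loudest sample reaches full scale (skip if silent).
    peak = max((abs(x) for ch in channels for x in ch), default=0.0)
    scale = 1.0 / peak if peak > 0 else 1.0
    # Interleave channels frame by frame: L0, R0, L1, R1, ...
    pcm = array.array("h")
    for i in range(num_samples):
        for ch in channels:
            value = max(-1.0, min(1.0, ch[i] * scale))
            pcm.append(round(value * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


# Usage with the diffusers example above (tensor shaped (channels, samples)):
# write_wav_int16("thunderstorm.wav", audio.float().cpu().tolist())
```

A pure-Python loop is slow for a full 47-second clip; with NumPy available, the same normalize, clip, and interleave steps can be vectorized.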

### Using stable-audio-tools

```python
from stable_audio_tools import get_pretrained_model

# Fetches model_config.json and model.safetensors from this repo
model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```
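`get_pretrained_model` returns the model together with its parsed `model_config.json`. In the stable-audio-tools example on the original model card, the top-level keys `sample_rate` and `sample_size` give the output rate and the number of samples generated per call, and each prompt is passed as a conditioning dict with `prompt`, `seconds_start`, and `seconds_total`. The sketch below assumes those key names; the helpers `timing_limits` and `build_conditioning` are hypothetical, and the generation call itself (e.g. `generate_diffusion_cond` from `stable_audio_tools.inference.generation`) is not shown.

```python
def timing_limits(model_config):
    """Return (sample_rate, seconds_per_call) from a stable-audio-tools config dict."""
    sample_rate = model_config["sample_rate"]
    return sample_rate, model_config["sample_size"] / sample_rate


def build_conditioning(prompt, seconds_total, max_seconds):
    """Build a text + timing conditioning list in the stable-audio-tools format."""
    if not 0 < seconds_total <= max_seconds:
        raise ValueError(f"seconds_total must be in (0, {max_seconds:.2f}]")
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds_total}]


# Illustrative values; read the real ones from the model_config returned above.
example_config = {"sample_rate": 44_100, "sample_size": 2_097_152}
sample_rate, max_seconds = timing_limits(example_config)
print(round(max_seconds, 2))  # 47.55

conditioning = build_conditioning("Rain on a tin roof", 30, max_seconds)
```

Validating `seconds_total` up front gives a clear error instead of asking the model for more audio than one call produces.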

## License

**Stability AI Community License** — see [LICENSE.md](LICENSE.md) for full terms.

Key points:
- Free for research and non-commercial use
- Commercial use is permitted for individuals and organizations with under $1M in annual revenue; above that, a separate license from Stability AI is required
- Model outputs cannot be used to train competing models

## Credits

- **Model**: [Stability AI](https://stability.ai/)
- **Paper**: [Stable Audio Open](https://stability.ai/research/stable-audio-open)
- **Training Data**: FreeSound + Free Music Archive (see the attribution CSVs published with the original model)
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)