---
license: other
license_name: stabilityai-community
license_link: LICENSE.md
base_model: stabilityai/stable-audio-open-1.0
pipeline_tag: text-to-audio
library_name: stable-audio-tools
tags:
- text-to-audio
- latent-diffusion
- stereo
- 44100hz
---

# stable-audio-open-1.0

Generates variable-length (up to 47 s) stereo audio at 44.1 kHz from text prompts. The model is a latent diffusion architecture with three parts: an autoencoder, a T5 text encoder, and a transformer-based diffusion model (DiT) that operates in the autoencoder's latent space.

This repository is an unmodified redistribution of [`stabilityai/stable-audio-open-1.0`](https://huggingface.co/stabilityai/stable-audio-open-1.0). Weights, configs, license, and dataset attribution files are preserved verbatim.

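To make the "up to 47 s at 44.1 kHz" figure concrete, here is a minimal sketch of the duration-to-size arithmetic. The helper names `frames_for` and `pcm_bytes` are hypothetical, and the 16-bit sample width is an assumption taken from the int16 export used in the inference example, not a property of the model itself.

```python
SAMPLE_RATE_HZ = 44_100   # stated on this model card
MAX_SECONDS = 47          # stated on this model card
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # assumption: int16 PCM export


def frames_for(seconds, sample_rate=SAMPLE_RATE_HZ):
    """Number of audio frames (samples per channel) covering `seconds`."""
    return int(round(seconds * sample_rate))


def pcm_bytes(seconds, channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw PCM payload size in bytes, excluding any WAV header."""
    return frames_for(seconds) * channels * bytes_per_sample


print(frames_for(MAX_SECONDS))  # 2072700 frames per channel
print(pcm_bytes(MAX_SECONDS))   # 8290800 bytes, roughly 8.3 MB
```

The `sample_size` value in `model_config.json` remains the authoritative generation window and need not equal the rounded figure computed here.
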
## Files

- `model.safetensors` (~4.85 GB): primary weights.
- `model.ckpt` (~4.85 GB): same weights in `.ckpt` format for `stable_audio_tools`.
- `model_config.json`, `model_index.json`: pipeline configs.
- `LICENSE.md`: Stability AI Community License (verbatim).
- `fma_dataset_attribution2.csv`, `freesound_dataset_attribution2.csv`: training-data attribution (required by the license).

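Since the files are described as verbatim copies, one way to check a local download is to hash it and compare the digest with the one shown for the same file in the upstream repository. The sketch below is only an illustration: `sha256_of_file` is a hypothetical helper built on the standard library, and the path in the usage note assumes the file has already been downloaded into the working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Stream the file so multi-gigabyte weights are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file("model.safetensors")` should match the digest of the upstream `model.safetensors` if the redistribution is byte-identical.
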
## Inference

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the model and read the audio settings from its config
model, model_config = get_pretrained_model("cudabenchmarktest/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Set up text and timing conditioning
conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_start": 0, "seconds_total": 30}]

# Generate stereo audio
output = generate_diffusion_cond(
    model, steps=100, cfg_scale=7, conditioning=conditioning,
    sample_size=sample_size, sigma_min=0.3, sigma_max=500,
    sampler_type="dpmpp-3m-sde", device=device,
)

# Rearrange the audio batch into a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
```

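The last step of the snippet (peak normalize, clip, convert to int16) can be factored into a small helper. The sketch below is a variation, not the upstream code: `to_int16_pcm` is a hypothetical name, the zero-peak guard is an addition for silent clips, and the trim to `seconds_total` assumes that frames beyond the requested duration are unwanted padding.

```python
import torch


def to_int16_pcm(audio: torch.Tensor, sample_rate: int, seconds_total: float) -> torch.Tensor:
    """Trim a (channels, frames) float tensor and convert it to int16 PCM."""
    # Keep only the frames that cover the requested duration.
    n_frames = min(audio.shape[-1], int(round(seconds_total * sample_rate)))
    audio = audio[..., :n_frames].to(torch.float32)
    # Peak normalize; the clamp avoids dividing by zero on an all-silent clip.
    peak = audio.abs().max().clamp(min=1e-8)
    audio = (audio / peak).clamp(-1, 1)
    # Scale to the int16 range and move to CPU for saving.
    return audio.mul(32767).to(torch.int16).cpu()
```

With the variables from the inference example, the result of `to_int16_pcm(output, sample_rate, 30)` can be passed to `torchaudio.save` in place of the one-line conversion.
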
## License and attribution

Governed by the **Stability AI Community License Agreement** (see `LICENSE.md`). It permits research use, non-commercial use, and commercial use by organizations or individuals with less than $1M USD in total annual revenue. Above that threshold, a separate Stability Enterprise license is required.

Training-data attribution: see the FMA and Freesound CSV files. The license requires these attribution files to be distributed alongside the weights, and they are preserved here.

- Original release: Stability AI (`stabilityai/stable-audio-open-1.0`).
- This redistribution: weights and configs unmodified, LICENSE preserved, README replaced. No additional modifications.