stable-audio-open-1.0
Generates variable-length stereo audio (up to 47 s) at 44.1 kHz from text prompts. The model is a latent diffusion architecture made up of an autoencoder, a T5 text encoder, and a transformer-based diffusion model (DiT) that operates in the autoencoder's latent space.
This repository is an unmodified redistribution of stabilityai/stable-audio-open-1.0. Weights, configs, license, and dataset attribution files are preserved verbatim.
Files
- model.safetensors (~4.85 GB): primary weights.
- model.ckpt (~4.85 GB): same weights in .ckpt format for stable_audio_tools.
- model_config.json, model_index.json: pipeline configs.
- LICENSE.md: Stability AI Community License (verbatim).
- fma_dataset_attribution2.csv, freesound_dataset_attribution2.csv: training-data attribution (required by the license).
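Because the attribution CSVs and the license file must travel with the weights, it can be useful to confirm that a local copy of the repository is complete. The sketch below checks a directory against the file names listed above. `EXPECTED_FILES` and `missing_files` are hypothetical names introduced for illustration and are not part of stable_audio_tools.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = (
    "model.safetensors",
    "model.ckpt",
    "model_config.json",
    "model_index.json",
    "LICENSE.md",
    "fma_dataset_attribution2.csv",
    "freesound_dataset_attribution2.csv",
)


def missing_files(directory):
    """Return the expected file names that are absent from `directory`, sorted."""
    root = Path(directory)
    return sorted(name for name in EXPECTED_FILES if not (root / name).is_file())
```

Calling `missing_files` on the path of a local checkout returns an empty list when every listed file is present, and the names of the absent files otherwise.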
Inference
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the model and read the audio settings from its config
model, model_config = get_pretrained_model("cudabenchmarktest/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Set up text and timing conditioning (30 seconds requested)
conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_start": 0, "seconds_total": 30}]

# Generate stereo audio
output = generate_diffusion_cond(
    model, steps=100, cfg_scale=7, conditioning=conditioning,
    sample_size=sample_size, sigma_min=0.3, sigma_max=500,
    sampler_type="dpmpp-3m-sde", device=device,
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
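The conditioning requests 30 seconds, while the sampler is given the full `sample_size` window, so the saved file may run longer than the requested duration. A minimal sketch of the arithmetic for trimming, assuming the tensor has shape (channels, samples) after the `rearrange` step. `samples_for_duration` is a hypothetical helper, and the numbers 44100 and 2097152 are illustrative assumptions standing in for the values that `model_config` reports.

```python
def samples_for_duration(seconds, sample_rate, sample_size):
    """Number of samples covering `seconds`, capped at the model's window."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return min(int(round(seconds * sample_rate)), sample_size)


# Illustrative values only; read the real ones from model_config.
n = samples_for_duration(30, 44100, 2097152)
print(n)  # 1323000
```

With the tensors from the snippet above, slicing with `output = output[:, :n]` before `torchaudio.save` would keep only the first 30 seconds.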
License and attribution
Governed by the Stability AI Community License Agreement (see LICENSE.md). Permits research, non-commercial use, and commercial use for organizations or individuals with less than $1M USD in total annual revenue. Above that threshold a separate Stability Enterprise license is required.
Training-data attribution: see the FMA and Freesound CSV files. Distribution of these attribution files alongside the weights is a license requirement and is preserved here.
- Original release: Stability AI (stabilityai/stable-audio-open-1.0).
- This redistribution: weights and configs unmodified, LICENSE preserved, README replaced. No additional modifications.