How to use stabilityai/stable-audio-open-1.0 with Stable Audio Tools:
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
Trim output audio function + colab player + fun output sample
#9
by asigalov61 - opened
Hello, Stability AI team! :)
I just wanted to thank you for making and sharing this model. It's very nice and capable, and I enjoyed it a lot!
I also wanted to contribute a bit by posting this code, which extends the default inference example to make things simpler: it trims trailing silence from the output audio and plays the result inline in Google Colab.
import torch
from IPython.display import display, Audio

def trim_silence(audio_tensor):
    # Flip the tensor along the second dimension (time dimension)
    flipped = torch.flip(audio_tensor, [1])
    # Find the time indices of all non-zero elements in the flipped tensor
    non_zero_indices = torch.nonzero(flipped, as_tuple=True)[1]
    # If there are no non-zero elements, return a tensor with zero samples
    # (torch.empty_like would return an uninitialized tensor of the full length)
    if non_zero_indices.size(0) == 0:
        return audio_tensor[:, :0]
    # Find the index of the last non-zero element in the original tensor
    last_non_zero = audio_tensor.size(1) - torch.min(non_zero_indices) - 1
    # Slice the tensor up to and including the last non-zero element
    trimmed = audio_tensor[:, :last_non_zero + 1]
    return trimmed

# `output` and `sample_rate` come from the default inference example
trimmed_audio = trim_silence(output)
display(Audio(trimmed_audio, rate=sample_rate))
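If the tail of a generated clip contains very quiet but non-zero samples, an exact-zero check like the one above would keep them. The following is a minimal sketch of a threshold-based variant, assuming a tensor of shape (channels, samples). The function name `trim_trailing_quiet` and the threshold value used in the demo are hypothetical and not part of Stable Audio Tools.

```python
import torch

def trim_trailing_quiet(audio: torch.Tensor, threshold: int = 0) -> torch.Tensor:
    """Drop trailing samples whose absolute value is <= threshold in every channel.

    Hypothetical helper: `audio` is assumed to have shape (channels, samples).
    """
    # True for each time step where at least one channel exceeds the threshold
    audible = (audio.abs() > threshold).any(dim=0)
    idx = torch.nonzero(audible).flatten()
    # Nothing audible at all: return a tensor with zero samples
    if idx.numel() == 0:
        return audio[:, :0]
    last = int(idx[-1])
    return audio[:, : last + 1]

# Small synthetic stereo example (int16, like the saved output)
demo = torch.tensor([[0, 900, -3, 2, 0],
                     [0, 0, 700, -1, 0]], dtype=torch.int16)
exact = trim_trailing_quiet(demo)               # removes only exact zeros
loose = trim_trailing_quiet(demo, threshold=5)  # also removes the near-silent step
silent = trim_trailing_quiet(torch.zeros(2, 4, dtype=torch.int16))
```

With `threshold=0` this behaves like the exact-zero trim. A suitable non-zero threshold depends on the material, so the value 5 here is only illustrative.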
And last but not least, I wanted to share one output sample I liked:
This was generated with the following settings:
# Set up text and timing conditioning
conditioning = [{
    "prompt": "So close, no matter how far, Couldn't be much more from the heart, Forever trusting who we are, And nothing else matters!",
    "seconds_start": 0,
    "seconds_total": 47
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=300,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)
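Since `seconds_total` is set explicitly here, another way to cut the output is by requested duration rather than by detecting silence. The sketch below only does the arithmetic. The helper name `samples_for_duration` is hypothetical, and the sample rate and window size in the demo are assumed values; in practice they should be read from `model_config["sample_rate"]` and `model_config["sample_size"]`.

```python
def samples_for_duration(seconds_total: float, sample_rate: int, sample_size: int) -> int:
    """Number of samples covering `seconds_total`, capped at the generation window.

    Hypothetical helper, not part of Stable Audio Tools.
    """
    wanted = int(round(seconds_total * sample_rate))
    # Never ask for more samples than were generated, and never fewer than zero
    return max(0, min(wanted, sample_size))

# Assumed illustrative values: 44.1 kHz audio, window of 2,097,152 samples
keep = samples_for_duration(47, 44100, 2097152)
print(keep)  # 2072700
```

For a single generated clip, after the `rearrange` step, slicing with `output[:, :keep]` would then keep only the requested 47 seconds.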
Thanks again!
Sincerely,
Alex