specifying `init_audio` when calling `generate_diffusion_cond` doesn't work as expected
#47
by jps-la - opened
Hi. I'm using the example Python code from the model card page. I load an input .wav into a tensor and pass it, together with its sample rate, to generate_diffusion_cond() like this:
...
inputInfo = torchaudio.info("./input.wav") # AudioMetaData(sample_rate=44100, num_frames=2097152, num_channels=2, bits_per_sample=16, encoding=PCM_S)
(inputTensor, inputSampleRate) = torchaudio.load("./input.wav")
inputTuple = tuple([inputSampleRate, inputTensor])
...
output = generate_diffusion_cond(
model,
steps=100,
cfg_scale=7,
conditioning=conditioning,
sample_size=sample_size,
init_audio=inputTuple,
sigma_min=0.3,
sigma_max=500,
sampler_type="dpmpp-3m-sde",
device=device
)
Basically, the generated output.wav is identical to input.wav, as though the conditioning prompt was completely ignored. I was expecting the input file to be transformed by the conditioning prompt into something different (for example, changing rock music into piano).
Forking behavior seems to preclude using the Python debugger. Did I misunderstand something?
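To put a number on "identical" rather than judging by ear, a rough check like the one below could help. It is only a sketch: `waveform_difference` is a hypothetical helper, not part of stable-audio-tools or torchaudio, and it assumes both signals have already been brought to the same sample rate, channel and amplitude scale (the model card example peak-normalizes and converts to int16 before saving, so raw values will not match the loaded input without rescaling).

```python
import math


def waveform_difference(reference, candidate):
    """Compare two sample sequences over their common length.

    Hypothetical helper for illustration. Both arguments are plain
    sequences of floats for a single channel, on the same amplitude scale.
    """
    n = min(len(reference), len(candidate))
    if n == 0:
        raise ValueError("both sequences must contain at least one sample")
    # Sample-by-sample difference over the overlapping region
    diffs = [candidate[i] - reference[i] for i in range(n)]
    max_abs = max(abs(d) for d in diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    return {"samples_compared": n, "max_abs_diff": max_abs, "rms_diff": rms}


# Toy data standing in for one channel of input.wav and output.wav
original = [0.0, 0.5, -0.5, 0.25]
same = list(original)
changed = [0.0, 0.4, -0.5, 0.25]

print(waveform_difference(original, same))
print(waveform_difference(original, changed))
```

With the real tensors, one channel could be passed in as something like `inputTensor[0].tolist()`. Differences at or near zero would confirm that the init audio is passing through essentially unchanged, while small but nonzero values would suggest the sampler is only lightly perturbing it.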