Model cannot produce the human voice?
#26
by sanjeev-bhandari01 - opened
I tried Stable Audio Open 1.0 (from the Hugging Face Space) and Stable Audio AudioSparx 2.0 from stableaudio.com.
Neither model could generate the human voice properly.
Is this a limitation of the model, or is it designed not to generate the human voice?
Is there a paper published alongside the model?
Thank you
Here's the paper:
https://arxiv.org/abs/2404.10301v1
Neither Stable Audio Open nor Stable Audio 2.0 is trained to produce coherent vocals or speech, by design.
Thanks @julian-parker
sanjeev-bhandari01 changed discussion status to closed