Audio Quality

#4
by andriih1112312 - opened

I tried running it locally and also tested it on the HF Space: https://huggingface.co/spaces/multimodalart/Foundation-1 . I attempted to replicate one of the samples listed in the README.md, but every time the output quality is very poor: the sound is crackly, and the virtual speaker is clipping.

Prompt: "Bass, FM Bass, Medium Delay, Medium Reverb, Low Distortion, Phaser, Sub Bass, Bass, Upper Mids, Acid, Gritty, Wide, Dubstep, Thick, Silky, Warm, Rich, Overdriven, Crisp, Deep, Clean, Pitch Bend, 303, 8 Bars, 140 BPM, E minor"
Output Audio:

Audio from the README for the same prompt:

Where could the problem be?

Same problem here, on 5090, Linux.

EDIT: Discovered there's a problem with torchsave on versions of torchaudio>2.7.
Here's what worked for me:
Remove torch and torchaudio from the setup.py file.
If you've already gone through setup once, uninstall torch before the next step:
pip uninstall -y torch torchvision torchaudio
Otherwise, run:
pip install stable-audio-tools
pip install .
Then:
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
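
For anyone unsure which versions ended up in their environment, here is a minimal sketch that reads the installed torchaudio version and flags anything newer than the 2.7 series. The cutoff is only an assumption taken from the report above, and the function names are hypothetical, not part of the project:

```python
from importlib import metadata
from typing import Tuple

# Assumption from this thread: releases newer than the 2.7 series are affected.
LAST_GOOD = (2, 7)


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.7.0+cu128'."""
    public = version.split("+", 1)[0]  # drop a local tag like '+cu128'
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def is_newer_than_last_good(version: str) -> bool:
    """True if the version is newer than the 2.7 series."""
    return major_minor(version) > LAST_GOOD


def report(package: str = "torchaudio") -> str:
    """Describe the installed version of `package`, if it is installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if is_newer_than_last_good(installed):
        return f"{package} {installed} is newer than 2.7.x; consider pinning 2.7.0"
    return f"{package} {installed} is not newer than 2.7.x"


if __name__ == "__main__":
    print(report())
```

Only the major and minor numbers are compared, so 2.7.1 is treated the same as 2.7.0. Whether patch releases behave the same was not tested here.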

I noticed the issue is gone after applying the audio distortion fix that the authors committed yesterday:
https://huggingface.co/spaces/multimodalart/Foundation-1/commit/4e71cc614f8ed10032858102b61ac30678dc1131

If you are using the RC-stable-audio-tools repository locally instead of the RC Gradio Space, apply the same fix here:
https://github.com/RoyalCities/RC-stable-audio-tools/blob/main/stable_audio_tools/interface/gradio.py#L545-L565
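
To check whether a render still clips after applying either fix, one generic approach is to look at the peak level and at how many samples sit at full scale. The sketch below does not reproduce the linked commit. It assumes the audio is already available as a sequence of float samples nominally in [-1.0, 1.0], and all function names are hypothetical:

```python
from typing import List, Sequence


def peak_level(samples: Sequence[float]) -> float:
    """Largest absolute sample value, or 0.0 for empty input."""
    return max((abs(s) for s in samples), default=0.0)


def clipped_fraction(samples: Sequence[float], full_scale: float = 1.0) -> float:
    """Fraction of samples whose magnitude reaches or exceeds full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return clipped / len(samples)


def scale_to_peak(samples: Sequence[float], target_peak: float = 0.98) -> List[float]:
    """Scale the signal down so its peak equals target_peak.

    Signals that are already at or below target_peak are returned unchanged.
    """
    peak = peak_level(samples)
    if peak <= target_peak:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    demo = [0.2, -1.3, 0.9, 1.1]
    print("peak:", peak_level(demo))
    print("clipped fraction:", clipped_fraction(demo))
    print("peak after scaling:", peak_level(scale_to_peak(demo)))
```

A peak well above 1.0 before saving suggests the signal is being hard-clipped when it is written out as integer PCM, which would match the crackly, overdriven sound described above.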

Thanks! Pinning torch and torchaudio to 2.7.0 also fixed the distorted, overdriven sound on an M4 Mac (and cut generation time from 1 min to ~45 secs).

One more fix was required to resolve a libsndfile.dylib (no such file) error when starting the app with python run_gradio.py:

conda install -c conda-forge libsndfile -y

So the working path on Mac is via conda (I could not make it work with a Python venv as in the original instructions):

conda create -n RC-stable-audio-tools python=3.10 -y
conda activate RC-stable-audio-tools

pip install stable-audio-tools
pip install .

pip uninstall -y torch torchvision torchaudio
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0

conda install -c conda-forge libsndfile -y

python run_gradio.py
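
Before launching the app, a quick way to see whether the system loader can locate libsndfile is the standard-library lookup sketched below. The helper names are hypothetical. A negative result is only a hint, since the lookup follows the system's search paths and may not see a library installed inside a conda environment:

```python
import ctypes.util
from typing import Optional


def find_libsndfile() -> Optional[str]:
    """Return the name or path the system lookup reports for libsndfile, or None."""
    return ctypes.util.find_library("sndfile")


def has_libsndfile() -> bool:
    """True if the system lookup found a libsndfile shared library."""
    return find_libsndfile() is not None


if __name__ == "__main__":
    location = find_libsndfile()
    if location is None:
        print("libsndfile not found by the system lookup")
    else:
        print("libsndfile found:", location)
```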

Hey all - I appreciate you finding a fix. I'll need to get an M-series Mac or look into a cloud provider to test, then add extra instructions for Mac on the Gradio side. I don't own one, and MPS support was added after the fact by another contributor, so it was tough to test thoroughly. But I appreciate all of this since it's a good reference point!

Here's the fix for the self-hosted version: https://github.com/rsxdalv/tts-webui.foundation1/commit/62076d9c32fefc04fe4765700b7373cf7f98e835

I made a detached fork where you can submit issues/PRs a little more easily. I already have a fork of stable-audio, so I can't fork foundation1 because of how GitHub forks work.

My request would be for RoyalCities to make a detached fork/mirror and/or to enable issues on their fork.
