Audio Quality
I tried running it locally and also tested it on the HF Space: https://huggingface.co/spaces/multimodalart/Foundation-1. I attempted to replicate one of the samples listed in the README.md, but every time the output quality is very poor: the sound is crackly and clips, like an overdriven speaker.
Prompt: "Bass, FM Bass, Medium Delay, Medium Reverb, Low Distortion, Phaser, Sub Bass, Bass, Upper Mids, Acid, Gritty, Wide, Dubstep, Thick, Silky, Warm, Rich, Overdriven, Crisp, Deep, Clean, Pitch Bend, 303, 8 Bars, 140 BPM, E minor"
Output Audio:
Audio from the README for the same prompt:
Where could the problem be?
Same problem here, on 5090, Linux.
EDIT: Discovered there's a problem with torchsave on versions of torchaudio>2.7.
Here's what worked for me:
Remove torch and torchaudio from the setup.py file.
If you've already gone through setup once, make sure you uninstall torch before the next step:
pip uninstall -y torch torchvision torchaudio
Otherwise, run the normal setup first:
pip install stable-audio-tools
pip install .
Then install the pinned versions:
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
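To confirm the pins took effect, a small check like the one below may help. It is only a sketch: the helper names (check_pins, base_version) are made up for illustration, and it reads installed versions with the standard library's importlib.metadata, so it does not import torch itself.

```python
from importlib import metadata

# Versions taken from the pip command above.
PINNED = {"torch": "2.7.0", "torchvision": "0.22.0", "torchaudio": "2.7.0"}


def base_version(version):
    """Drop a local build suffix such as '+cu128' from a version string."""
    return version.split("+", 1)[0]


def check_pins(pinned=PINNED, get_version=metadata.version):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, base_version(installed) == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINNED[name]} {status}")
```

Run it inside the same environment you start the app from; any MISMATCH line means another install step pulled in a different version afterwards.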
I noticed the issue is gone after applying the audio distortion fix that the authors committed yesterday:
https://huggingface.co/spaces/multimodalart/Foundation-1/commit/4e71cc614f8ed10032858102b61ac30678dc1131
If you are using the RC-stable-audio-tools repository locally instead of the RC Gradio Space, apply the same fix here:
https://github.com/RoyalCities/RC-stable-audio-tools/blob/main/stable_audio_tools/interface/gradio.py#L545-L565
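If you want a rough way to check whether a generated file still clips after applying the fix, something like the sketch below could work. It is not taken from the linked commit. It assumes the output was saved as a 16-bit PCM WAV, and the function name clipped_fraction is hypothetical. It simply counts how many samples sit at full scale.

```python
import struct
import wave


def clipped_fraction(path, threshold=32767):
    """Fraction of samples at or beyond full scale in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = wav.readframes(wav.getnframes())

    count = len(raw) // 2  # two bytes per sample, all channels interleaved
    if count == 0:
        return 0.0

    samples = struct.unpack("<%dh" % count, raw)
    clipped = sum(1 for sample in samples if abs(sample) >= threshold)
    return clipped / count
```

A heavily distorted render will usually show a noticeably higher fraction than a clean one for the same prompt, though the exact numbers depend on the material, so compare before and after rather than relying on a fixed cutoff.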
Thanks! Pinning torch and torchaudio to 2.7.0 as described above fixed the distorted, overdriven sound on an M4 Mac as well (and cut generation time from 1 min to ~45 s).
One more fix was still required to resolve the libsndfile.dylib (no such file) error when starting the app with python run_gradio.py:
conda install -c conda-forge libsndfile -y
So the working path on Mac is via conda (I could not make it work with a Python venv as in the original instructions):
conda create -n RC-stable-audio-tools python=3.10 -y
conda activate RC-stable-audio-tools
pip install stable-audio-tools
pip install .
pip uninstall -y torch torchvision torchaudio
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
conda install -c conda-forge libsndfile -y
python run_gradio.py
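Before launching the app, you can check whether libsndfile is visible to Python at all. The sketch below is an assumption-laden helper (locate_libsndfile is a made-up name). It first asks ctypes.util.find_library, then falls back to looking in the active environment's lib directory, which is where a conda-forge install would normally place it. It may still miss libraries in unusual locations.

```python
import ctypes.util
import glob
import os
import sys


def locate_libsndfile(prefix=sys.prefix):
    """Return a name or path for libsndfile if one can be found, else None."""
    # Standard lookup through the platform's library search mechanism.
    found = ctypes.util.find_library("sndfile")
    if found:
        return found

    # Fallback: look inside the active environment (e.g. a conda env).
    matches = sorted(glob.glob(os.path.join(prefix, "lib", "libsndfile*")))
    return matches[0] if matches else None


if __name__ == "__main__":
    location = locate_libsndfile()
    if location:
        print(f"libsndfile found: {location}")
    else:
        print("libsndfile not found; try: conda install -c conda-forge libsndfile -y")
```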
Hey all, I appreciate you finding a fix. I'll need to get an M-series Mac or look into a cloud provider to test this, then add extra Mac instructions for the Gradio app. I don't own one, and MPS support was added after the fact by another contributor, so it was tough to test thoroughly. But I appreciate all of this since it's a good reference point!
Here's the fix for the self-hosted version: https://github.com/rsxdalv/tts-webui.foundation1/commit/62076d9c32fefc04fe4765700b7373cf7f98e835
I made a detached fork where you can submit issues/PRs a little more easily. I already have a fork of stable-audio, so I can't fork foundation1 because of how GitHub works.
My request would be for RoyalCities to make a detached fork/mirror and/or to enable issues on their fork.