Could you share your inference setting?
I ran inference with the "tango-full-ft-audiocaps" checkpoint and got poor evaluation scores (FD 28, FAD 2.3, KL 2.0, etc.). My inference settings were: num_steps 200, guidance 3.0, no seed.
Could you use this script instead? https://github.com/declare-lab/tango/blob/master/inference_hf.py
The sampling rate should be 16 kHz.
Yes, I used that script, but the results are still not good. Should the ground truth files be resampled from 32 kHz to 16 kHz?
Yes! You need to resample everything to 16 kHz. See this issue on GitHub: https://github.com/declare-lab/tango/issues/28
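For anyone following along, here is a minimal sketch of one way to batch-resample a folder of reference files to 16 kHz. It assumes `ffmpeg` is installed and on the PATH; the helper names `ffmpeg_resample_cmd` and `resample_dir` and the folder layout are made up for illustration and are not part of the Tango repo. `-ar` and `-c:a` are standard ffmpeg options for the output sample rate and the audio codec.

```python
import subprocess
from pathlib import Path


def ffmpeg_resample_cmd(src, dst, sample_rate=16000, codec=None):
    """Build the ffmpeg argument list that resamples `src` into `dst`.

    `codec` is optional (e.g. "pcm_s16le"); when None, ffmpeg picks its
    default codec for the output container.
    """
    cmd = ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate)]
    if codec is not None:
        cmd += ["-c:a", codec]
    cmd.append(str(dst))
    return cmd


def resample_dir(src_dir, dst_dir, sample_rate=16000, codec=None):
    """Resample every .wav file in `src_dir` into `dst_dir` (hypothetical layout)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.wav")):
        cmd = ffmpeg_resample_cmd(src, dst_dir / src.name, sample_rate, codec)
        subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
```

Calling `resample_dir("refs_32k", "refs_16k")` would then write 16 kHz copies next to the originals, with both folder names being placeholders.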
Thanks! After resampling the reference files to 16 kHz, I got a better FD (19.5) and KL_softmax (1.148), but FAD got worse, going from 2.3 to 51.021. Could you give some advice on how to fix it?
I changed the reference audio encoding from pcm_f32le to pcm_s16le so that it matches the original 32 kHz reference audio. FAD decreased from 54 to 2.7, but that is still not great. orz
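Since the encoding change above moved FAD so much, it may be worth checking that the reference and generated files really share the same sample rate and bit depth before running the evaluation. Below is a rough sketch using only Python's standard `wave` module; `wav_format` and `formats_match` are hypothetical helper names. One caveat: as far as I know, `wave` only reads integer PCM, so a 32-bit float file (pcm_f32le) is expected to raise `wave.Error`, and the sketch reports that case as `None`.

```python
import wave


def wav_format(path):
    """Return (sample_rate, bits_per_sample, channels) for an integer-PCM WAV.

    Returns None when the `wave` module cannot parse the file, which is
    the expected outcome for float-encoded WAVs such as pcm_f32le.
    """
    try:
        with wave.open(str(path), "rb") as wf:
            return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()
    except wave.Error:
        return None


def formats_match(path_a, path_b):
    """True only if both files are readable PCM WAVs with identical formats."""
    fmt_a, fmt_b = wav_format(path_a), wav_format(path_b)
    return fmt_a is not None and fmt_a == fmt_b
```

Running `formats_match` on one reference file and one generated file is a quick sanity check; a `False` result suggests the two sets were written with different settings.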