Convert and separate audio using models, and generate speech with TTS
Inpaint images using prompts
Create a spectrogram and get audio info
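
As a rough illustration of the last item, here is a minimal sketch of reading audio info and computing a magnitude spectrogram. It uses only the Python standard library, which is an assumption made to keep the example self-contained; the source does not say which libraries or models the tool uses. The helper names (`audio_info`, `spectrogram`, `make_test_tone`) are hypothetical.

```python
import cmath
import io
import math
import struct
import wave


def audio_info(wav_bytes: bytes) -> dict:
    """Return basic metadata for a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT with a Hann window.

    Returns a list of frames; each frame holds frame_size // 2 + 1 magnitudes.
    A naive DFT is slow and only suitable for small illustrative inputs.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def make_test_tone(freq_hz=1000.0, rate=8000, count=800):
    """Build a mono 16-bit WAV in memory (hypothetical test input)."""
    samples = [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(count)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        pcm = struct.pack(f"<{count}h", *(int(s * 32767) for s in samples))
        wav.writeframes(pcm)
    return buf.getvalue(), samples


if __name__ == "__main__":
    wav_bytes, samples = make_test_tone()
    print(audio_info(wav_bytes))
    spec = spectrogram(samples)
    print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

For real recordings an FFT-based routine would be the practical choice; the naive DFT here only keeps the sketch free of dependencies.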