Separate songs into individual stems (vocals, drums, etc.)
Convert and separate audio using models and text-to-speech (TTS)