Generate customized realistic photos from face images
Convert and separate audio using models and TTS