Generate spoken audio from text using selectable voices
Generate a talking face video from an image and audio