Generate images from text prompts using a lightweight SDXL model
Generate a talking-face video from an image and an audio clip