Visual Intelligence, Pretrained Vision-and-Language Model, Embodied AI, Collaborative Agents, Vision Task(Object Detection, Segmentation)
Generate images from text prompts using a lightweight SDXL model
Answer questions about images with text prompts