Media understanding
Answer questions about images (visual question answering) using the BLIP model
Detect human poses using YOLO
Generate images from text using Stable Diffusion v1.4