I am interested in multimodal video understanding, anomaly detection, and generative video models, with a focus on modeling complex real-world phenomena such as motion, behavior, semantics, and cross-modal interactions across visual, textual, and audio data. My work aims to develop real-time systems that understand and generate video content in safety-critical and dynamic environments.