Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale Paper • 2509.14008 • Published Sep 17, 2025 • 88
Motion-Aware Concept Alignment for Consistent Video Editing Paper • 2506.01004 • Published Jun 1, 2025 • 8
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Paper • 2501.03218 • Published Jan 6, 2025 • 35
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 74
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23