MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation Paper • 2604.23789 • Published 21 days ago • 6
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs Paper • 2510.01954 • Published Oct 2, 2025 • 14