EgoLCD: Egocentric Video Generation with Long Context Diffusion Paper • 2512.04515 • Published 8 days ago • 5
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published 7 days ago • 45
PRInTS: Reward Modeling for Long-Horizon Information Seeking Paper • 2511.19314 • Published 17 days ago • 6
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos Paper • 2512.01707 • Published 11 days ago • 7
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning Paper • 2506.03525 • Published Jun 4 • 6
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Paper • 2504.13143 • Published Apr 17 • 7
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 303
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 45
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Paper • 2501.12224 • Published Jan 21 • 48
BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation Paper • 2402.08712 • Published Feb 13, 2024 • 1
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9