Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling Paper • 2310.04991 • Published Oct 8, 2023 • 1
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 50
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling Paper • 2505.11196 • Published May 16, 2025 • 14
Vidi2: Large Multimodal Models for Video Understanding and Creation Paper • 2511.19529 • Published Nov 24, 2025 • 2
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning Paper • 2511.21375 • Published Nov 26, 2025
UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification Paper • 2605.06221 • Published 5 days ago • 20
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling Paper • 2603.06199 • Published Mar 6 • 9