BLIP3o-NEXT: Next Frontier of Native Image Generation Paper • 2510.15857 • Published Oct 17, 2025 • 26
VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published Oct 2, 2025 • 10
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published Nov 6, 2025 • 10
Benchmarking Visual State Tracking in Multimodal Video Understanding Paper • 2606.03920 • Published 24 days ago • 50
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 107
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
Cambrian-S-Data Collection Data used during Cambrian-S's 4-stage training • 4 items • Updated Feb 27 • 5