TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation Paper • 2603.19039 • Published 5 days ago • 43
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models Paper • 2603.19466 • Published 5 days ago • 34
On Large Multimodal Models as Open-World Image Classifiers Paper • 2503.21851 • Published Mar 27, 2025 • 8
Specificity-aware reinforcement learning for fine-grained open-world classification Paper • 2603.03197 • Published 21 days ago • 16
Large Multimodal Models as General In-Context Classifiers Paper • 2602.23229 • Published 26 days ago • 26
Compositional Caching for Training-free Open-vocabulary Attribute Detection Paper • 2503.19145 • Published Mar 24, 2025 • 3
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published Dec 19, 2025 • 20
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer Paper • 2408.08793 • Published Aug 16, 2024 • 7
λ-Orthogonality Regularization for Compatible Representation Learning Paper • 2509.16664 • Published Sep 20, 2025
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking Paper • 2407.03540 • Published Jul 3, 2024 • 3
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding Paper • 2407.03550 • Published Jul 4, 2024 • 2
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024 • 24