SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models Paper • 2603.19028 • Published 5 days ago • 11
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation Paper • 2603.19039 • Published 5 days ago • 42
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models Paper • 2603.19466 • Published 5 days ago • 33
Specificity-aware reinforcement learning for fine-grained open-world classification Paper • 2603.03197 • Published 21 days ago • 16
How to Take a Memorable Picture? Empowering Users with Actionable Feedback Paper • 2602.21877 • Published 27 days ago • 16
Large Multimodal Models as General In-Context Classifiers Paper • 2602.23229 • Published 26 days ago • 26
On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers Paper • 2308.09610 • Published Aug 18, 2023
Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning Paper • 2405.15633 • Published May 24, 2024