Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery Paper • 2011.09766 • Published Nov 19, 2020
BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response Paper • 2501.06019 • Published Jan 10, 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation Paper • 2503.10497 • Published Mar 13, 2025 • 2
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Paper • 2505.20236 • Published May 26, 2025 • 3
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response Paper • 2505.21089 • Published May 27, 2025 • 4
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding Paper • 2505.21076 • Published May 27, 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards Paper • 2509.21882 • Published Sep 26, 2025
Taming Object Hallucinations with Verified Atomic Confidence Estimation Paper • 2511.09228 • Published Nov 12, 2025
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published 16 days ago • 24
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 143
SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery Paper • 2406.18151 • Published Jun 26, 2024 • 1
Thinking Out Loud: Do Reasoning Models Know When They're Right? Paper • 2504.06564 • Published Apr 9, 2025