Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 2 • 2
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2, 2025 • 10 • 3
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2, 2025 • 10 • 3
A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models Paper • 2501.18463 • Published Jan 30, 2025 • 1 • 1
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding Paper • 2505.20298 • Published May 26, 2025 • 9 • 2
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 6 • 2