JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction Paper • 2512.14620 • Published Dec 16, 2025 • 2
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 2
ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution Paper • 2509.19349 • Published Sep 17, 2025 • 2
FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation Paper • 2409.18459 • Published Sep 27, 2024
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 6, 2025 • 5
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding Paper • 2505.20298 • Published May 26, 2025 • 9
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering Paper • 2506.09050 • Published Jun 10, 2025 • 6
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2, 2025 • 10
A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models Paper • 2501.18463 • Published Jan 30, 2025 • 1