MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published 3 days ago • 17
A Unified Agentic Framework for Evaluating Conditional Image Generation Paper • 2504.07046 • Published Apr 9, 2025 • 30