EvalAgent: Discovering Implicit Evaluation Criteria from the Web Paper • 2504.15219 • Published Apr 21, 2025 • 1
QUDsim: Quantifying Discourse Similarities in LLM-Generated Text Paper • 2504.09373 • Published Apr 12, 2025
Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents Paper • 2602.16699 • Published 22 days ago • 15
SkillFactory: Self-Distillation For Learning Cognitive Behaviors Paper • 2512.04072 • Published Dec 3, 2025 • 5
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9, 2025 • 2
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025 • 16
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering Paper • 2305.14869 • Published May 24, 2023 • 1
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce Paper • 2406.10173 • Published Jun 14, 2024