Efficient Agent Evaluation via Diversity-Guided User Simulation • Paper 2604.21480 • Published 14 days ago • 15
Alignment Makes Language Models Normative, Not Descriptive • Paper 2603.17218 • Published Mar 17 • 46
LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals • Paper 2601.10700 • Published Jan 15 • 18
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations • Paper 2505.18125 • Published May 23, 2025 • 112
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments • Paper 2410.05254 • Published Oct 7, 2024 • 85