Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published 4 days ago • 19
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency Paper • 2601.05905 • Published 4 days ago • 13
InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published Dec 1, 2025 • 35
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 111
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30, 2025 • 35