Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning • Paper • 2602.01745 • Published Feb 2 • 8
Dr.Mi-Bench: A Modular-integrated Benchmark for Scientific Deep Research Agent • Paper • 2512.00986 • Published Nov 30, 2025 • 1
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback • Paper • 2510.06186 • Published Oct 7, 2025 • 1
Generative Archetype-Grounded Item Representations for Sequential Recommendation • Paper • 2606.11023 • Published 16 days ago • 1
From Knowing to Acting: Benchmarking Self-Awareness Capability of LLM Agents • Paper • 2606.20661 • Published 16 days ago • 1