AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite • Paper • 2510.21652 • Published Oct 24, 2025 • 4
TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents • Paper • 2510.06579 • Published Oct 8, 2025
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis • Paper • 2604.23072 • Published 19 days ago
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning • Paper • 2502.01100 • Published Feb 3, 2025 • 21