What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance Paper • 2602.20300 • Published 19 days ago • 4
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published 5 days ago • 32
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published Jan 26 • 35