LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis Paper • 2605.30434 • Published 30 days ago • 23
Exploring Autonomous Agentic Data Engineering for Model Specialization Paper • 2605.30407 • Published 30 days ago • 23
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning Paper • 2605.30260 • Published 30 days ago • 44
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models Paper • 2605.30219 • Published 30 days ago • 26
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills Paper • 2605.20876 • Published May 20 • 7
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published May 18 • 50
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models Paper • 2605.06196 • Published May 7 • 9
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models Paper • 2605.06196 • Published May 7 • 9
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 171
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 5
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 5
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 5
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published Apr 20 • 6
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published Apr 20 • 6
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published Apr 20 • 6
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization Paper • 2410.04087 • Published Oct 5, 2024