AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published 15 days ago • 40
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads Paper • 2602.09443 • Published Feb 10 • 57
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning Paper • 2602.04634 • Published Feb 4 • 96
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 102
AgentDoG Collection A Diagnostic Guardrail Framework for AI Agent Safety and Security • 8 items • Updated 11 days ago • 106
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published Jan 22 • 53
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey Paper • 2601.11655 • Published Jan 15 • 61