view article Article Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance 17 days ago • 81
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code Paper • 2508.18106 • Published Aug 25 • 346
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10 • 56