Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes Paper • 2605.05724 • Published 24 days ago • 15
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 95
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9, 2025 • 28
OpenCUA: Open Foundations for Computer-Use Agents Collection This is the official versions of OpenCUA models and AgentNet datasets. Website: https://opencua.xlang.ai/ • 7 items • Updated Mar 2 • 25
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25, 2025 • 34
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19, 2025 • 46
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning Paper • 2410.14208 • Published Oct 18, 2024 • 3
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 114
view article Article Fine-tuning Llama 2 70B using PyTorch FSDP +2 smangrul, sgugger, lewtun, philschmid • Sep 13, 2023 • 32
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 52