Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training Paper • 2602.07824 • Published Feb 8 • 18
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published Jan 26 • 126
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 66
What Generative Search Engines Like and How to Optimize Web Content Cooperatively Paper • 2510.11438 • Published Oct 13, 2025 • 12
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs Paper • 2507.03253 • Published Jul 4, 2025 • 19
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 48
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25, 2025 • 34