Beyond Language Modeling: An Exploration of Multimodal Pretraining • Paper • 2603.03276 • Published 16 days ago • 97
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training • Paper • 2602.07824 • Published Feb 8 • 16
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently • Paper • 2602.02619 • Published Feb 2 • 52
daVinci-Dev: Agent-native Mid-training for Software Engineering • Paper • 2601.18418 • Published Jan 26 • 126
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published Jan 22 • 55
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts • Paper • 2601.11044 • Published Jan 16 • 34
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation • Paper • 2512.23576 • Published Dec 29, 2025 • 66
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution • Paper • 2510.25726 • Published Oct 29, 2025 • 46
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published Oct 13, 2025 • 170
HuggingFaceTB/SmolVLM2-256M-Video-Instruct • Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 152k • 98
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning • Paper • 2507.16812 • Published Jul 22, 2025 • 64