Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 10 days ago • 88
Solaris: Building a Multiplayer Video World Model in Minecraft Paper • 2602.22208 • Published 16 days ago • 28
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 54
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 168
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20, 2025 • 64
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 25
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published Jan 16, 2025 • 72
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 63