Thoth: Mid-Training Bridges LLMs to Time Series Understanding Paper • 2603.01042 • Published 5 days ago
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published Jan 27 • 25
Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound Paper • 2512.00883 • Published Nov 30, 2025
Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper • 2505.14357 • Published May 20, 2025 • 27
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors Paper • 2502.11167 • Published Feb 16, 2025 • 10
Long-Sequence Recommendation Models Need Decoupled Embeddings Paper • 2410.02604 • Published Oct 3, 2024
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning Paper • 2305.18499 • Published May 29, 2023
Supercompiler Code Optimization with Zero-Shot Reinforcement Learning Paper • 2404.16077 • Published Apr 24, 2024
Out-of-Dynamics Imitation Learning from Multimodal Demonstrations Paper • 2211.06839 • Published Nov 13, 2022
Flowformer: Linearizing Transformers with Conservation Flows Paper • 2202.06258 • Published Feb 13, 2022
Supported Policy Optimization for Offline Reinforcement Learning Paper • 2202.06239 • Published Feb 13, 2022
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published May 24, 2024 • 17