-
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper • 2601.02427 • Published • 41 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 264 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 49 -
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper • 2601.02151 • Published • 98
Collections
Discover the best community collections!
Collections including paper arxiv:2601.00393
-
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper • 2512.21004 • Published • 12 -
Spatia: Video Generation with Updatable Spatial Memory
Paper • 2512.15716 • Published • 32 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
Captain Safari: A World Engine
Paper • 2511.22815 • Published • 10 -
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
Paper • 2512.08478 • Published • 76 -
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 69 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 105 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 544 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 264 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 127
-
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122 -
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Paper • 2511.12884 • Published • 20 -
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 135
-
MagicWorld: Interactive Geometry-driven Video World Exploration
Paper • 2511.18886 • Published • 19 -
EvoVLA: Self-Evolving Vision-Language-Action Model
Paper • 2511.16166 • Published • 5 -
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Paper • 2511.17889 • Published • 5 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper • 2601.02427 • Published • 41 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 264 -
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 49 -
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper • 2601.02151 • Published • 98
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 544 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 264 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 127
-
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper • 2512.21004 • Published • 12 -
Spatia: Video Generation with Updatable Spatial Memory
Paper • 2512.15716 • Published • 32 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122 -
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Paper • 2511.12884 • Published • 20 -
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 135
-
Captain Safari: A World Engine
Paper • 2511.22815 • Published • 10 -
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
Paper • 2512.08478 • Published • 76 -
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 69 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
MagicWorld: Interactive Geometry-driven Video World Exploration
Paper • 2511.18886 • Published • 19 -
EvoVLA: Self-Evolving Vision-Language-Action Model
Paper • 2511.16166 • Published • 5 -
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Paper • 2511.17889 • Published • 5 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 122
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 105 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4