Collections
Discover the best community collections!
Collections including paper arxiv:2604.10905
-
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Paper • 2602.20161 • Published • 23 -
A Very Big Video Reasoning Suite
Paper • 2602.20159 • Published • 519 -
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
Paper • 2603.21986 • Published • 123 -
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Paper • 2604.04184 • Published • 50
-
Reinforcement Learning via Self-Distillation
Paper • 2601.20802 • Published • 43 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Image-Text-to-Text • 28B • Updated • 577k • 2.73k -
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Paper • 2604.10905 • Published • 26
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 7 -
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 33 -
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 158 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 89
-
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
Paper • 2603.25746 • Published • 155 -
TAPS: Task Aware Proposal Distributions for Speculative Sampling
Paper • 2603.27027 • Published • 142 -
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Paper • 2603.25716 • Published • 154 -
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Paper • 2603.27538 • Published • 143
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8
-
MuLan: A Joint Embedding of Music Audio and Natural Language
Paper • 2208.12415 • Published -
CoCa: Contrastive Captioners are Image-Text Foundation Models
Paper • 2205.01917 • Published • 3 -
SoundStream: An End-to-End Neural Audio Codec
Paper • 2107.03312 • Published -
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Paper • 2604.10905 • Published • 26
-
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
Paper • 2603.25746 • Published • 155 -
TAPS: Task Aware Proposal Distributions for Speculative Sampling
Paper • 2603.27027 • Published • 142 -
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Paper • 2603.25716 • Published • 154 -
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Paper • 2603.27538 • Published • 143
-
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Paper • 2602.20161 • Published • 23 -
A Very Big Video Reasoning Suite
Paper • 2602.20159 • Published • 519 -
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
Paper • 2603.21986 • Published • 123 -
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Paper • 2604.04184 • Published • 50
-
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper • 2512.23988 • Published • 19 -
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper • 2512.25075 • Published • 15 -
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper • 2512.24176 • Published • 8
-
Reinforcement Learning via Self-Distillation
Paper • 2601.20802 • Published • 43 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Image-Text-to-Text • 28B • Updated • 577k • 2.73k -
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Paper • 2604.10905 • Published • 26
-
MuLan: A Joint Embedding of Music Audio and Natural Language
Paper • 2208.12415 • Published -
CoCa: Contrastive Captioners are Image-Text Foundation Models
Paper • 2205.01917 • Published • 3 -
SoundStream: An End-to-End Neural Audio Codec
Paper • 2107.03312 • Published -
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Paper • 2604.10905 • Published • 26
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 7 -
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 33 -
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 158 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 89