Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 5 days ago • 13
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 5 days ago • 51
Agentic-R: Learning to Retrieve for Agentic Search Paper • 2601.11888 • Published 10 days ago • 19
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published 7 days ago • 44
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 13 days ago • 82
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published 15 days ago • 51
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 13 days ago • 32
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 13 days ago • 124
How We Built a Semantic Highlight Model To Save Token Cost for RAG Article • 12 days ago • 60
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published 18 days ago • 23
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 19 days ago • 50
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published 26 days ago • 129
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 22 days ago • 29
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 23 days ago • 51