Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory Paper • 2601.16296 • Published 4 days ago • 13
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 5 days ago • 71
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Paper • 2601.15369 • Published 5 days ago • 16
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 4 days ago • 50
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published 6 days ago • 23
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published 6 days ago • 43
CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation Paper • 2601.11096 • Published 10 days ago • 8
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published 11 days ago • 26
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 12 days ago • 31
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 18 days ago • 49