JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published 19 days ago • 14
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation Paper • 2602.12160 • Published 29 days ago • 38
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction Paper • 2601.00796 • Published Jan 2 • 32
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 57
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 133
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization Paper • 2512.10955 • Published Dec 11, 2025 • 7
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time Paper • 2512.08924 • Published Dec 9, 2025 • 20
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 50
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 21, 2025 • 28
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs Paper • 2511.17220 • Published Nov 21, 2025 • 19
Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation Paper • 2511.10547 • Published Nov 13, 2025 • 5
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 38
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models Paper • 2511.10629 • Published Nov 13, 2025 • 127
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13, 2025 • 99
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 128