LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 115
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion Paper • 2602.07775 • Published Feb 8 • 8
Concerto: Reconstruct and visualize 3D point clouds from videos or PLY files • Running on Zero • 19
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 183
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Paper • 2509.20360 • Published Sep 24, 2025 • 18
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9, 2025 • 60
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 80
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Paper • 2506.22434 • Published Jun 27, 2025 • 10
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22, 2025 • 24
Video-P2P: Video Editing with Cross-attention Control Paper • 2303.04761 • Published Mar 8, 2023 • 2
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024 • 2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 49
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1