From Vision to Motion - a Vanqi Collection

updated 1 day ago

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published Mar 17 • 110
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

Paper • 2603.19708 • Published Mar 20 • 13
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published Mar 26 • 32
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

Paper • 2603.25791 • Published Mar 26 • 7
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published Apr 3 • 37
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Paper • 2604.02029 • Published Apr 2 • 152
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203
Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Paper • 2604.23774 • Published Apr 29 • 17
End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

Paper • 2605.00503 • Published May 1 • 13
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Paper • 2605.04128 • Published May 5 • 17
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published May 12 • 33
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

Paper • 2605.25979 • Published May 25 • 27
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published May 27 • 93
Representation Forcing for Bottleneck-Free Unified Multimodal Models

Paper • 2605.31604 • Published May 29 • 63
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Paper • 2507.07095 • Published Jul 9, 2025 • 56
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Paper • 2606.13364 • Published Jun 11 • 20
Motion4Motion: Motion Transfer Across Subjects at Inference

Paper • 2607.11644 • Published 6 days ago • 9
Latent-Identity Tuning in Text-to-Image Personalization Models

Paper • 2607.11885 • Published 6 days ago • 10
4D Human-Scene Reconstruction from Low-Overlap Captures

Paper • 2607.09125 • Published 9 days ago • 49
PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Paper • 2606.28128 • Published 23 days ago • 52
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Paper • 2606.25041 • Published 26 days ago • 119
VideoChat3: Fully Open Video MLLM for Efficient and Generalist Video Understanding

Paper • 2607.14935 • Published 3 days ago • 116