Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints
Abstract
Efficient control module for controllable video generation of hand-object interactions uses sparse 3D hand joints as explicit control signals to achieve 3D-consistent hand articulation and realistic motion propagation.
Controllable video generation for complex hand-object interactions is a critical step toward building visual world models. However, existing methods often struggle to achieve fine-grained, 3D-consistent hand articulation in generated videos. By relying on dense 2D trajectories or implicit pose representations, they collapse crucial geometric structures into spatially ambiguous signals, leading to severe motion inconsistencies and hallucinated artifacts under egocentric occlusions. To address this, we propose leveraging sparse 3D hand joints as explicit control signals with three key advantages: explicit geometry to resolve occlusions, an intuitive interface for interactive editing, and cross-embodiment generalization to robotic hands. Built upon this, our efficient control module extracts occlusion-aware features from the source reference frame by penalizing unreliable visual features from hidden joints, and employs a 3D-based weighting mechanism to handle dynamically occluded target joints during motion propagation. Meanwhile, it directly injects 3D geometric embeddings into the latent space to enforce structural consistency. To facilitate robust training and evaluation, we develop an automated annotation pipeline, yielding 1M high-quality egocentric video clips paired with precise hand trajectories. Experiments demonstrate that our approach outperforms state-of-the-art baselines, generating high-fidelity egocentric videos with realistic hand-object interactions.
Get this paper in your agent:
hf papers read 2603.11755 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 1
Datasets citing this paper 1
cy0307/awesome-egocentric-atlas
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper