Papers
arxiv:2605.12498

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Published on May 12
· Submitted by
Christen Miller
on May 13
Authors:

Abstract

EgoForce is a monocular 3D hand reconstruction framework that uses a single unified network to recover robust, absolute camera-space hand pose and position across diverse camera models, combining a differentiable forearm representation, an arm-hand transformer, and a ray-space solver.

AI-generated summary

Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and position in the user's (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines three components: a differentiable forearm representation that stabilizes hand pose; a unified arm-hand transformer that predicts both hand and forearm geometry from a single egocentric view, mitigating depth-scale ambiguity; and a ray-space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space MPJPE by up to 28% on the HOT3D dataset compared to prior methods while maintaining consistent performance across camera configurations. For more details, visit the project page at https://dfki-av.github.io/EgoForce.
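The paper does not spell out its ray-space solver here, but the general idea of a camera-model-agnostic closed-form recovery of absolute position can be sketched: given root-relative 3D joints predicted by a network and one unit viewing ray per joint (obtained by unprojecting each 2D detection through whatever camera model is in use), the root translation minimizing the point-to-ray distances has a linear closed-form solution. The function name and formulation below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def solve_root_translation(X_rel, rays):
    """Closed-form camera-space translation from joints and viewing rays.

    X_rel : (N, 3) root-relative 3D joints predicted by the network.
    rays  : (N, 3) unit viewing rays, one per joint, obtained by
            unprojecting 2D detections with the device's camera model
            (fisheye, perspective, ...), which keeps the solver itself
            camera-agnostic.

    Solves  min_t  sum_i || (I - r_i r_i^T) (X_i + t) ||^2,
    i.e. translates the hand so each joint lies as close as
    possible to its ray. Normal equations give a 3x3 linear system.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    I = np.eye(3)
    for x, r in zip(X_rel, rays):
        P = I - np.outer(r, r)  # projector onto the plane normal to ray r
        A += P
        b -= P @ x
    return np.linalg.solve(A, b)
```

With exact, noise-free rays this recovers the true translation; with noisy detections it returns the least-squares optimum. The system is well-posed as long as the rays are not all parallel, which holds for any hand spanning more than a single pixel.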

Community

Paper author · Paper submitter

Absolute 3D hand pose and shape reconstruction from a single head-mounted camera is essential for smart-glasses-based AR, telepresence, and hand-centric manipulation. However, monocular RGB methods suffer from depth–scale ambiguity and poor generalization across diverse head-mounted camera models, often requiring costly device-specific training data. We introduce EgoForce, a unified monocular 3D hand reconstruction framework that recovers robust camera-space hand pose and position across fisheye, perspective, and distorted wide-FOV cameras. EgoForce combines a differentiable forearm representation, a unified arm–hand transformer, and a ray-space closed-form solver to stabilize pose estimation and resolve absolute 3D geometry across camera models.


Get this paper in your agent:

hf papers read 2605.12498
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 0