arxiv:2605.12498

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Published on May 12

· Submitted by

Christen Millerdurai on May 13

Augmented Vision

Authors:

Christen Millerdurai ,

Abstract

EgoForce is a monocular 3D hand reconstruction framework that uses a single unified network to recover robust, absolute hand pose and position across different camera models, combining a differentiable forearm representation, an arm-hand transformer, and a ray-space closed-form solver.

AI-generated summary

Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and position in camera space, that is, from the user's viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines three components: a differentiable forearm representation that stabilizes hand pose; a unified arm-hand transformer that predicts both hand and forearm geometry from a single egocentric view, mitigating depth-scale ambiguity; and a ray-space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space mean per-joint position error (MPJPE) by up to 28% on the HOT3D dataset compared to prior methods and maintaining consistent performance across camera configurations. For more details, visit the project page at https://dfki-av.github.io/EgoForce.
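The abstract reports accuracy as camera-space MPJPE. The sketch below illustrates why that is stricter than a root-relative error: a prediction with perfect articulation but the wrong absolute depth scores zero root-relative error and a large camera-space error. The function names, the three-joint toy "hand", and the 50 mm offset are all hypothetical values chosen for illustration; none of them come from the paper or its evaluation code.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints (same units as the input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def root_relative(joints, root=0):
    """Subtract the root joint so that only the articulation remains."""
    r = joints[root]
    return [tuple(c - rc for c, rc in zip(j, r)) for j in joints]

# Hypothetical toy data: three joints in camera space, in millimetres.
gt = [(0.0, 0.0, 400.0), (30.0, 0.0, 400.0), (60.0, 0.0, 400.0)]

# Prediction with correct articulation, placed 50 mm too far from the camera.
pred = [(x, y, z + 50.0) for x, y, z in gt]

camera_space_error = mpjpe(pred, gt)                                  # penalizes the depth offset
root_relative_error = mpjpe(root_relative(pred), root_relative(gt))   # ignores it

print(f"camera-space MPJPE:  {camera_space_error:.1f} mm")
print(f"root-relative MPJPE: {root_relative_error:.1f} mm")
```

In this toy case the camera-space error equals the depth offset, while the root-relative error vanishes. That gap is the depth-scale ambiguity the abstract refers to.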

Community

chris10

Paper author Paper submitter about 21 hours ago

Absolute 3D hand pose and shape reconstruction from a single head-mounted camera is essential for smart-glasses-based AR, telepresence, and hand-centric manipulation. However, monocular RGB methods suffer from depth–scale ambiguity and poor generalization across diverse head-mounted camera models, often requiring costly device-specific training data. We introduce EgoForce, a unified monocular 3D hand reconstruction framework that recovers robust camera-space hand pose and position across fisheye, perspective, and distorted wide-FOV cameras. EgoForce combines a differentiable forearm representation, a unified arm–hand transformer, and a ray-space closed-form solver to stabilize pose estimation and resolve absolute 3D geometry across camera models.
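To make the idea of a ray-space closed-form solver concrete, here is a minimal sketch of one textbook formulation; it is an assumption for illustration and not necessarily the solver used in EgoForce. Each observed joint is represented by a viewing ray d_i through the camera origin (obtained from whatever unprojection the camera model provides, fisheye or perspective), and the hand is given as root-relative joints X_i. The translation t that minimizes the summed squared distance of X_i + t to its ray satisfies the 3x3 linear system (Σ P_i) t = -Σ P_i X_i, with P_i = I - d_i d_iᵀ. All function names and the toy data are hypothetical.

```python
import math

def unit(v):
    """Normalize a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate ray configuration (rays are nearly parallel)")
    x = []
    for k in range(3):
        # Replace column k of a with b.
        ak = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        x.append(det3(ak) / d)
    return tuple(x)

def solve_translation(rel_joints, rays):
    """Least-squares translation t that moves each joint X_i + t onto its ray d_i."""
    a = [[0.0] * 3 for _ in range(3)]   # accumulates sum of P_i
    b = [0.0] * 3                       # accumulates -sum of P_i X_i
    for x, d in zip(rel_joints, rays):
        d = unit(d)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]  # P_i = I - d d^T
                a[i][j] += p
                b[i] -= p * x[j]
    return solve3(a, b)

# Hypothetical toy check: build noise-free rays from a known translation (mm).
true_t = (20.0, -10.0, 400.0)
rel = [(0.0, 0.0, 0.0), (30.0, 5.0, -10.0), (60.0, -5.0, 15.0), (10.0, 40.0, 5.0)]
rays = [unit(tuple(c + t for c, t in zip(j, true_t))) for j in rel]

est_t = solve_translation(rel, rays)
print("estimated translation:", tuple(round(c, 6) for c in est_t))
```

Because the solver only consumes rays, the camera model enters solely through the unprojection step that produces them. This is one plausible reading of how a ray-space formulation can serve fisheye, perspective, and distorted wide-FOV cameras with the same downstream math.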

Get this paper in your agent:

hf papers read 2605.12498

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 1
