# WiLoR finetuned on POV-Surgery

A WiLoR hand-pose estimator finetuned on the POV-Surgery synthetic surgical dataset. It matches or beats the finetuned-WiLoR baseline from the CPCI paper (MICCAI 2293).

## Results on the POV-Surgery full test set (26,418 frames)

| Metric        | Off-the-shelf | Finetuned (this) | CPCI paper ft-WiLoR |
|---------------|---------------|------------------|---------------------|
| MPJPE (mm)    | 50.36         | 10.59            | 13.72               |
| PA-MPJPE (mm) | 10.69         | 4.36             | 4.33                |
| PVE (mm)      | 47.92         | 10.11            | 12.91               |
| PA-PVE (mm)   | 10.01         | 4.19             | 4.20                |
| P2D (px)      | 25.99         | 29.40            | 18.48               |
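For reference, MPJPE and PA-MPJPE are typically computed as below. This is a generic sketch (orthogonal Procrustes alignment for the PA variant), not the exact evaluation code used for this table.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: remove global similarity transform first.

    pred, gt: (N, 3) joint arrays for a single hand.
    """
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance (orthogonal Procrustes)
    U, S, Vt = np.linalg.svd(p.T @ g)
    M = U @ Vt
    if np.linalg.det(M) < 0:  # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        M = U @ Vt
    scale = S.sum() / (p ** 2).sum()  # optimal isotropic scale
    aligned = scale * p @ M + mu_g
    return mpjpe(aligned, gt)
```

PA metrics factor out global rotation, translation, and scale, which is why they are much lower than the unaligned ones in the table above.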

## Training

  • Base model: wilor_final.ckpt (wilor_mini default)
  • Data: POV-Surgery official train split (~38k samples, right hand only)
  • Loss: 3D-joint L1 + 2D-joint L1 + MANO-param MSE (CPCI recipe)
  • Optimizer: AdamW, LR 5e-6, weight decay 1e-4, fp16-mixed
  • 50,000 steps (~20 epochs), batch size 16
  • Wall-clock: ~13.2 h on a single RTX PRO 6000 Blackwell Max-Q
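The combined loss above can be sketched roughly as follows. The weighting coefficients and the dict layout are illustrative assumptions, not the exact values from the CPCI recipe.

```python
import torch
import torch.nn.functional as F

def wilor_ft_loss(pred, gt, w3d=1.0, w2d=1.0, wmano=0.1):
    """3D-joint L1 + 2D-joint L1 + MANO-parameter MSE.

    pred, gt: dicts with 'joints3d' (B, 21, 3), 'joints2d' (B, 21, 2),
    and 'mano' (B, D) flattened MANO parameters. Weights are hypothetical.
    """
    l3d = F.l1_loss(pred["joints3d"], gt["joints3d"])
    l2d = F.l1_loss(pred["joints2d"], gt["joints2d"])
    lmano = F.mse_loss(pred["mano"], gt["mano"])
    return w3d * l3d + w2d * l2d + wmano * lmano
```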

## Usage

See https://github.com/monk1337/AIM_2_Project (branch feat/finetuning), specifically finetuning/README.md (training) and ood_eval/README.md (OOD evaluation on Aria + AIxSuture).

```python
# Load the finetuned checkpoint into the standard WiLoR pipeline
import torch
from wilor_mini.pipelines.wilor_hand_pose3d_estimation_pipeline import (
    WiLorHandPose3dEstimationPipeline,
)

pipe = WiLorHandPose3dEstimationPipeline(device="cuda", dtype=torch.float16)
ckpt = torch.load("last.ckpt", map_location="cpu", weights_only=False)
pipe.wilor_model.load_state_dict(ckpt["state_dict"], strict=False)
pipe.wilor_model.eval().to("cuda", dtype=torch.float16)
```
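Because `strict=False` silently skips mismatched keys, it can be worth checking what `load_state_dict` actually matched after loading the checkpoint. A minimal sketch with a toy module (the model and keys here are illustrative, not WiLoR's):

```python
import torch
from torch import nn

# strict=False ignores key mismatches; the return value records them.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
partial = {"0.weight": torch.zeros(8, 4)}  # deliberately incomplete
result = model.load_state_dict(partial, strict=False)
print(result.missing_keys)     # keys the model expected but the dict lacked
print(result.unexpected_keys)  # keys in the dict the model doesn't have
```

If `missing_keys` is unexpectedly long when loading `last.ckpt`, the checkpoint's key prefixes (e.g. from a Lightning wrapper) may need stripping before the load.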