# WiLoR finetuned on POV-Surgery

A WiLoR hand-pose estimator finetuned on the POV-Surgery synthetic surgical dataset. It matches or beats the finetuned-WiLoR baseline from the CPCI paper (MICCAI 2293).

## Results on the POV-Surgery full test set (26,418 frames)

| Metric        | Off-the-shelf | Finetuned (this) | CPCI paper ft-WiLoR |
|---------------|---------------|------------------|---------------------|
| MPJPE (mm)    | 50.36         | 10.59            | 13.72               |
| PA-MPJPE (mm) | 10.69         | 4.36             | 4.33                |
| PVE (mm)      | 47.92         | 10.11            | 12.91               |
| PA-PVE (mm)   | 10.01         | 4.19             | 4.20                |
| P2D (px)      | 25.99         | 29.40            | 18.48               |
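For reference, MPJPE and PA-MPJPE are typically computed as below. This is a generic sketch (orthogonal Procrustes alignment for the PA variant), not the exact evaluation code used for this table.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: remove global similarity transform first.

    pred, gt: (N, 3) joint arrays for a single hand.
    """
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance (orthogonal Procrustes)
    U, S, Vt = np.linalg.svd(p.T @ g)
    M = U @ Vt
    if np.linalg.det(M) < 0:  # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        M = U @ Vt
    scale = S.sum() / (p ** 2).sum()  # optimal isotropic scale
    aligned = scale * p @ M + mu_g
    return mpjpe(aligned, gt)
```

PA metrics factor out global rotation, translation, and scale, which is why they are much lower than the unaligned ones in the table above.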

## Training

  • Base model: wilor_final.ckpt (wilor_mini default)
  • Data: POV-Surgery official train split (~38k samples, right hand only)
  • Loss: 3D-joint L1 + 2D-joint L1 + MANO-param MSE (CPCI recipe)
  • Optimizer: AdamW, LR 5e-6, weight decay 1e-4, fp16-mixed
  • 50,000 steps (~20 epochs), batch size 16
  • Wall-clock: ~13.2 h on a single RTX PRO 6000 Blackwell Max-Q
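The combined loss above can be sketched roughly as follows. The weighting coefficients and the dict layout are illustrative assumptions, not the exact values from the CPCI recipe.

```python
import torch
import torch.nn.functional as F

def wilor_ft_loss(pred, gt, w3d=1.0, w2d=1.0, wmano=0.1):
    """3D-joint L1 + 2D-joint L1 + MANO-parameter MSE.

    pred, gt: dicts with 'joints3d' (B, 21, 3), 'joints2d' (B, 21, 2),
    and 'mano' (B, D) flattened MANO parameters. Weights are hypothetical.
    """
    l3d = F.l1_loss(pred["joints3d"], gt["joints3d"])
    l2d = F.l1_loss(pred["joints2d"], gt["joints2d"])
    lmano = F.mse_loss(pred["mano"], gt["mano"])
    return w3d * l3d + w2d * l2d + wmano * lmano
```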

## Usage

See https://github.com/monk1337/AIM_2_Project (branch feat/finetuning), specifically finetuning/README.md (training) and ood_eval/README.md (OOD evaluation on Aria + AIxSuture).

```python
# Load the finetuned checkpoint into the standard WiLoR pipeline
import torch
from wilor_mini.pipelines.wilor_hand_pose3d_estimation_pipeline import (
    WiLorHandPose3dEstimationPipeline,
)

pipe = WiLorHandPose3dEstimationPipeline(device="cuda", dtype=torch.float16)
ckpt = torch.load("last.ckpt", map_location="cpu", weights_only=False)
pipe.wilor_model.load_state_dict(ckpt["state_dict"], strict=False)
pipe.wilor_model.eval().to("cuda", dtype=torch.float16)
```
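Because `strict=False` silently skips mismatched keys, it can be worth checking what `load_state_dict` actually matched after loading the checkpoint. A minimal sketch with a toy module (the model and keys here are illustrative, not WiLoR's):

```python
import torch
from torch import nn

# strict=False ignores key mismatches; the return value records them.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
partial = {"0.weight": torch.zeros(8, 4)}  # deliberately incomplete
result = model.load_state_dict(partial, strict=False)
print(result.missing_keys)     # keys the model expected but the dict lacked
print(result.unexpected_keys)  # keys in the dict the model doesn't have
```

If `missing_keys` is unexpectedly long when loading `last.ckpt`, the checkpoint's key prefixes (e.g. from a Lightning wrapper) may need stripping before the load.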