# WiLoR finetuned on POV-Surgery

A WiLoR hand-pose estimator finetuned on the POV-Surgery synthetic surgical dataset. It matches or beats the CPCI paper's finetuned-WiLoR baseline (MICCAI 2293).
## Results on the POV-Surgery full test set (26,418 frames)
| Metric | Off-the-shelf | Finetuned (this) | CPCI paper ft-WiLoR |
|---|---|---|---|
| MPJPE (mm) | 50.36 | 10.59 | 13.72 |
| PA-MPJPE (mm) | 10.69 | 4.36 | 4.33 |
| PVE (mm) | 47.92 | 10.11 | 12.91 |
| PA-PVE (mm) | 10.01 | 4.19 | 4.20 |
| P2D (px) | 25.99 | 29.40 | 18.48 |
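The PA-prefixed metrics in the table are Procrustes-aligned: the prediction is similarity-aligned (rotation, scale, translation) to the ground truth before the mean per-joint error is taken, which factors out global pose error. A minimal NumPy sketch of PA-MPJPE (joint count and mm units as in the table; not the evaluation code used here):

```python
import numpy as np

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Procrustes-aligned MPJPE.

    pred, gt: (J, 3) joint arrays in mm. Similarity-aligns pred to gt
    (rotation + uniform scale + translation), then returns the mean
    per-joint Euclidean distance.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g  # center both sets
    # Optimal rotation from the SVD of the cross-covariance matrix
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    scale = S.sum() / (p ** 2).sum()  # optimal uniform scale
    aligned = scale * p @ R.T + mu_g
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```

This is why PA-MPJPE (4.36 mm) is much lower than MPJPE (10.59 mm): the residual after alignment reflects only articulation error, not global rotation or depth error.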
## Training
- Base model: `wilor_final.ckpt` (the `wilor_mini` default)
- Data: POV-Surgery official train split (~38k samples, right hand only)
- Loss: 3D-joint L1 + 2D-joint L1 + MANO-param MSE (CPCI recipe)
- Optimizer: AdamW, LR 5e-6, weight decay 1e-4, fp16-mixed
- 50,000 steps (~20 epochs), batch size 16
- Wall-clock: ~13.2 h on a single RTX PRO 6000 Blackwell Max-Q
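The training objective above (3D-joint L1 + 2D-joint L1 + MANO-parameter MSE) can be sketched as follows; the dict keys, tensor shapes, and unit loss weights are illustrative assumptions, not the CPCI paper's exact values:

```python
import torch
import torch.nn.functional as F

def finetune_loss(pred: dict, gt: dict,
                  w3d: float = 1.0, w2d: float = 1.0, wmano: float = 1.0):
    """Combined finetuning loss (weights are placeholders).

    pred / gt: dicts with 'joints3d' (B, 21, 3), 'joints2d' (B, 21, 2),
    and 'mano' (B, D) flattened MANO pose+shape parameters.
    """
    l3d = F.l1_loss(pred["joints3d"], gt["joints3d"])    # 3D-joint L1
    l2d = F.l1_loss(pred["joints2d"], gt["joints2d"])    # 2D-joint L1
    lmano = F.mse_loss(pred["mano"], gt["mano"])         # MANO-param MSE
    return w3d * l3d + w2d * l2d + wmano * lmano
```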
## Usage
See https://github.com/monk1337/AIM_2_Project (branch feat/finetuning),
specifically finetuning/README.md (training) and ood_eval/README.md
(OOD evaluation on Aria + AIxSuture).
```python
# Load into the standard WiLoR pipeline
import torch

from wilor_mini.pipelines.wilor_hand_pose3d_estimation_pipeline import (
    WiLorHandPose3dEstimationPipeline,
)

pipe = WiLorHandPose3dEstimationPipeline(device="cuda", dtype=torch.float16)
ckpt = torch.load("last.ckpt", map_location="cpu", weights_only=False)
pipe.wilor_model.load_state_dict(ckpt["state_dict"], strict=False)
pipe.wilor_model.eval().to("cuda", dtype=torch.float16)
```