68-pt dlib → 478-pt MP mesh (PCA-bottleneck)

Cross-topology bridge: lets users running py-feat's default Detector (mobilefacenet 68-pt landmarks) render the rich 478-vertex MediaPipe FaceMesh without running MPDetector.

68-pt dlib landmarks → 478-pt MP mesh PCA-bottleneck regression (v2)

A PCA-bottleneck linear regression mapping aligned 68-point dlib landmarks (the mobilefacenet output of py-feat's Detector) to the 478-vertex MediaPipe FaceMesh in pose-canonical (MP-canonical-aligned) space.

Method

  • PCA the training-set 478-pt mesh into k=50 components (captures 0.9959 of variance)
  • Linear regression: 136-d aligned 68-pt landmarks → 50 PCA scores
  • Absorb pca.inverse_transform into (coef, intercept) so inference is single matmul
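
The absorption step folds pca.inverse_transform (scores @ components_ + mean_) into one affine map. A minimal sketch with random stand-in data; shapes follow the description above, and the sklearn estimators are assumptions about the training pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 136))    # aligned 68-pt landmarks, flattened
Y = rng.normal(size=(500, 1434))   # flattened 478-pt meshes

pca = PCA(n_components=50).fit(Y)
reg = LinearRegression().fit(X, pca.transform(Y))

# Absorb pca.inverse_transform into a single (coef, intercept) pair
coef = reg.coef_.T @ pca.components_                        # (136, 1434)
intercept = reg.intercept_ @ pca.components_ + pca.mean_    # (1434,)

# Single-matmul inference matches the two-stage pipeline exactly
two_stage = pca.inverse_transform(reg.predict(X))
one_matmul = X @ coef + intercept
assert np.allclose(two_stage, one_matmul, atol=1e-8)
```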

In a 3-way comparison on the same train/test split (compare_68_to_478.py):

  • PCA-bottleneck k=50: R²=0.4789 (selected)
  • PLS direct k=136 (full rank): R²=0.4794 (statistical tie)
  • TPS warp: catastrophic failure (units mismatch; fixable but skipped)

Training data

  • 341,793 frames from 9,937 CelebV-HQ celebrity videos
  • 68-pt source: mobilefacenet (Detector(face_model="img2pose", landmark_model="mobilefacenet"))
  • 478-pt source: mp_facemesh_v2 (MPDetector), GPA-aligned to MP canonical reference (cm units)
  • Inner-joined on (video_id, frame); pose-filtered to |yaw| ≤ 40°, |pitch| ≤ 30°
  • 2D Procrustes + GPA aligned 68-pt landmarks (population-mean reference, 5-iter convergence)
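
The inner join and pose filter amount to a standard merge plus a boolean mask. A toy sketch; column names here are illustrative stand-ins, not the actual dataset schema:

```python
import pandas as pd

# Toy stand-ins for the two per-frame tables
lm68 = pd.DataFrame({"video_id": ["a", "a", "b"], "frame": [0, 1, 0],
                     "x_0": [1.0, 2.0, 3.0]})
mesh = pd.DataFrame({"video_id": ["a", "b", "b"], "frame": [0, 0, 1],
                     "yaw": [10.0, 55.0, 5.0], "pitch": [0.0, 0.0, 0.0]})

# Inner-join on (video_id, frame), then pose-filter
df = lm68.merge(mesh, on=["video_id", "frame"], how="inner")
df = df[(df["yaw"].abs() <= 40) & (df["pitch"].abs() <= 30)]
```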

Out-of-sample performance (fold 0 of 5-fold GroupKFold by video_id)

  • R² = 0.4789 (variance-weighted across 1,434 dims)
  • MAE = 0.1801 cm
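
GroupKFold by video_id guarantees that no video contributes frames to both train and test, so the score reflects generalization to unseen subjects. A minimal sketch of the split:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# 10 toy "videos" with 5 frames each; groups carry the video id
groups = np.repeat(np.arange(10), 5)
X = np.zeros((50, 136))

gkf = GroupKFold(n_splits=5)
train_idx, test_idx = next(iter(gkf.split(X, groups=groups)))  # fold 0

# No video id appears on both sides of the split
assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```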

For reference: the AU → 478-pt mesh model achieves R²=0.244. The 68→478 model scores nearly 2× higher because dlib landmarks share rich spatial information with the MP mesh that 20 AU intensities cannot encode.

Inference

import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")

# 1. User has raw 68-pt landmarks from Detector (image-space pixels), shape (68, 2)
raw = ...  # from fex.landmarks_2d (or extract from x_0..x_67, y_0..y_67 columns)

# 2. Procrustes-align raw landmarks to the saved reference (8 stable upper-face anchors)
#    See feat.utils.image_operations or save_landmarks68_to_mesh478_pca.py for fit_similarity_batched_2d
anchor_indices = m["anchor_indices_dlib68"]      # [27, 28, 29, 30, 36, 39, 42, 45]
ref_anchors = m["reference_dlib_anchors"]        # (8, 2) population-mean dlib anchor positions
aligned = procrustes_align_2d(raw, anchor_indices, ref_anchors)   # (68, 2)

# 3. Predict 478-pt MP mesh
flat = np.concatenate([aligned[:, 0], aligned[:, 1]]) @ m["coef"] + m["intercept"]
# 4. Unpack to (478, 3). IMPORTANT: axis-major layout, NOT interleaved
mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)
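
procrustes_align_2d is not shipped with the NPZ. A minimal numpy sketch of the similarity alignment (rotation + uniform scale + translation via the Umeyama/Kabsch solution, fit on the 8 anchors and applied to all 68 points); the exact behavior of the original fit_similarity_batched_2d is an assumption:

```python
import numpy as np

def procrustes_align_2d(raw, anchor_indices, ref_anchors):
    """Similarity-align raw (68, 2) landmarks so the anchor subset best
    matches ref_anchors (least-squares rotation + scale + translation)."""
    src = raw[anchor_indices]                          # anchor subset, (8, 2)
    src_mu, ref_mu = src.mean(axis=0), ref_anchors.mean(axis=0)
    src_c, ref_c = src - src_mu, ref_anchors - ref_mu
    # Umeyama/Kabsch: optimal rotation from the SVD of the 2x2 covariance
    U, S, Vt = np.linalg.svd(src_c.T @ ref_c)
    d = np.sign(np.linalg.det(U @ Vt))                 # guard against reflection
    R = U @ np.diag([1.0, d]) @ Vt
    scale = (S * np.array([1.0, d])).sum() / (src_c ** 2).sum()
    # Apply the anchor-fit transform to all 68 points
    return scale * (raw - src_mu) @ R + ref_mu
```

When the anchors are an exact similarity transform of the reference, the alignment recovers the reference anchors to machine precision; in practice non-rigid facial motion leaves a small residual on the anchors.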

Citation context

Adapts ideas from Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly): Cheong's model used PLS for AU → 68 dlib landmarks. Here we extend the linear-regression approach to a cross-topology problem (68 dlib → 478 MP), use a PCA bottleneck for output regularization, and train on 10× more data (~340K wild-celebrity frames).

File format

NPZ with:

  • coef (136, 1434) float32 – absorbed weights, axis-major layout
  • intercept (1434,) float32
  • input_columns (136,) str – input feature order: lm_x_0..lm_x_67, lm_y_0..lm_y_67
  • reference_dlib_anchors (8, 2) float32 – for Procrustes-aligning raw input
  • anchor_indices_dlib68 (8,) int32
  • mean_aligned_dlib_landmarks (68, 2) float32 – population mean dlib landmarks (aligned)
  • mean_predicted_mesh (478, 3) float32 – population mean predicted mesh (cm units)
  • n_pca_components () int32 – = 50
  • pca_variance_ratio () float32 – = 0.9959
  • model_card () str
  • training_metadata () str (JSON)

Loader: np.load(...); no extra deps.

Applying pose post-hoc (optional)

The model produces a pose-canonical mesh (frontal, MP head-centric cm coords). To align it with the head pose of the original observed image, or to render it at any user-chosen pose, apply a rigid transform after prediction:

import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")

# 1) Predict the canonical mesh as before; aligned_lm is the flattened 136-d
#    Procrustes-aligned landmark vector from the inference section above
flat = aligned_lm @ m["coef"] + m["intercept"]
canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)   # (478, 3)

# 2) Build a 3x3 rotation matrix R from your pose source. With img2pose
#    Pitch/Yaw/Roll (radians) returned by py-feat's Detector:
from scipy.spatial.transform import Rotation
R = Rotation.from_euler("xyz", [pitch, yaw, roll]).as_matrix()    # (3, 3)

# 3) (Optional) scale s and translation t:
posed_mesh = canonical_mesh @ R.T                                  # (478, 3)
# posed_mesh = s * (canonical_mesh @ R.T) + t                       # if you want re-projection

# Or, if you have MP's full 4x4 facial_transformation_matrix (M):
#   homog = np.concatenate([canonical_mesh, np.ones((478, 1))], axis=1)
#   posed = (homog @ M.T)[:, :3]

Convention notes:

  • MP canonical y-axis points UP (forehead at +y, chin at -y), x-axis to subject's left, z-axis out of face. Standard right-handed.
  • In scipy, lowercase "xyz" in Rotation.from_euler means extrinsic rotations about fixed axes (composite R = Rz·Ry·Rx); uppercase "XYZ" means intrinsic (R = Rx·Ry·Rz). Spot-check img2pose's Pitch/Yaw/Roll convention against a few rendered frames before relying on either ordering.
  • For pure visualization (not re-projecting onto an image), s=1 and t=0 are fine.
  • The model is pose-INVARIANT by design: same input always yields the same canonical mesh; pose is decoupled and you control it at render time.
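
The extrinsic/intrinsic distinction is easy to verify directly. This check only demonstrates scipy's composition order; it does not establish which convention img2pose uses:

```python
import numpy as np
from scipy.spatial.transform import Rotation

a, b, c = 0.1, 0.2, 0.3
Rx = Rotation.from_euler("x", a).as_matrix()
Ry = Rotation.from_euler("y", b).as_matrix()
Rz = Rotation.from_euler("z", c).as_matrix()

# Lowercase "xyz": extrinsic, fixed axes, later rotations left-multiply
assert np.allclose(Rotation.from_euler("xyz", [a, b, c]).as_matrix(), Rz @ Ry @ Rx)
# Uppercase "XYZ": intrinsic, body axes, later rotations right-multiply
assert np.allclose(Rotation.from_euler("XYZ", [a, b, c]).as_matrix(), Rx @ Ry @ Rz)
```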

Figures

examples.png

per_vertex_error.png
