68-pt dlib → 478-pt MP mesh (PCA-bottleneck)
Cross-topology bridge: lets users running py-feat's default Detector (mobilefacenet 68-pt landmarks) render the rich 478-vertex MediaPipe FaceMesh without running MPDetector.
68-pt dlib landmarks → 478-pt MP mesh PCA-bottleneck regression (v2)
A linear regression with a PCA bottleneck that maps aligned 68-point dlib landmarks (the mobilefacenet output of py-feat's Detector) to the 478-vertex MediaPipe FaceMesh in pose-canonical (MP-canonical-aligned) space.
Method
- PCA the training-set 478-pt mesh into k=50 components (captures 99.59% of variance)
- Linear regression: 136-d aligned 68-pt landmarks → 50 PCA scores
- Absorb pca.inverse_transform into (coef, intercept) so inference is a single matmul
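In code, the absorption step amounts to the following (a sketch, assuming scikit-learn; X and Y stand for the (N, 136) aligned landmarks and (N, 1434) flattened meshes):
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
pca = PCA(n_components=50).fit(Y)                  # Y: (N, 1434) flattened meshes
reg = LinearRegression().fit(X, pca.transform(Y))  # X: (N, 136) -> 50 PCA scores
# Fold pca.inverse_transform (z @ components_ + mean_) into one affine map:
coef = reg.coef_.T @ pca.components_                      # (136, 1434)
intercept = reg.intercept_ @ pca.components_ + pca.mean_  # (1434,)
# Inference is now a single matmul: y_hat = x @ coef + intercept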
In a 3-way comparison on the same train/test split (compare_68_to_478.py):
- PCA-bottleneck k=50: R²=0.4789 (selected)
- PLS direct k=136 (full rank): R²=0.4794 (statistical tie)
- TPS warp: catastrophic failure (units mismatch; fixable but skipped)
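The PLS-direct baseline corresponds to something like the following (again assuming scikit-learn; X_test is the held-out fold's inputs):
from sklearn.cross_decomposition import PLSRegression
pls = PLSRegression(n_components=136).fit(X, Y)  # full rank: one component per input dim
Y_hat = pls.predict(X_test)                      # direct 136-d -> 1434-d prediction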
Training data
- 341,793 frames from 9,937 CelebV-HQ celebrity videos
- 68-pt source: mobilefacenet (Detector(face_model="img2pose", landmark_model="mobilefacenet"))
- 478-pt source: mp_facemesh_v2 (MPDetector), GPA-aligned to the MP canonical reference (cm units)
- Inner-joined on (video_id, frame); pose-filtered to |yaw| ≤ 40°, |pitch| ≤ 30°
- 2D Procrustes + GPA aligned 68-pt landmarks (population-mean reference, 5-iter convergence)
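The pairing and filtering amount to roughly the following (a sketch assuming pandas DataFrames df68 and df478 keyed by (video_id, frame); the yaw/pitch column names are illustrative):
import pandas as pd
# df68: one row per (video_id, frame) with aligned 68-pt landmark columns
# df478: one row per (video_id, frame) with aligned 478-pt mesh columns
paired = df68.merge(df478, on=["video_id", "frame"], how="inner")
paired = paired[(paired["yaw"].abs() <= 40) & (paired["pitch"].abs() <= 30)]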
Out-of-sample performance (fold 0 of 5-fold GroupKFold by video_id)
- R² = 0.4789 (variance-weighted across 1,434 dims)
- MAE = 0.1801 cm
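The split corresponds to (a sketch with scikit-learn's GroupKFold; fold 0 is the first split it yields, and video_ids is the per-row group label):
from sklearn.model_selection import GroupKFold
# Grouping by video_id ensures no video contributes frames to both train and test
train_idx, test_idx = next(GroupKFold(n_splits=5).split(X, Y, groups=video_ids))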
For reference: the AU → 478-pt mesh model achieves R²=0.244. The 68→478 model scores nearly 2× higher because dlib landmarks share rich spatial information with the MP mesh that 20 AU intensities cannot encode.
Inference
import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
# 1. User has raw 68-pt landmarks from Detector (image-space pixels), shape (68, 2)
raw = ... # from fex.landmarks_2d (or extract from x_0..x_67, y_0..y_67 columns)
# 2. Procrustes-align raw landmarks to the saved reference (8 stable upper-face anchors)
# See feat.utils.image_operations or save_landmarks68_to_mesh478_pca.py for fit_similarity_batched_2d
anchor_indices = m["anchor_indices_dlib68"] # [27, 28, 29, 30, 36, 39, 42, 45]
ref_anchors = m["reference_dlib_anchors"] # (8, 2) population-mean dlib anchor positions
aligned = procrustes_align_2d(raw, anchor_indices, ref_anchors) # (68, 2)
# 3. Predict 478-pt MP mesh
flat = np.concatenate([aligned[:, 0], aligned[:, 1]]) @ m["coef"] + m["intercept"]
# 4. Unpack to (478, 3); IMPORTANT: axis-major layout, NOT interleaved
mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)
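procrustes_align_2d is referenced above but not bundled (the reference implementation lives in save_landmarks68_to_mesh478_pca.py); a minimal standalone sketch of the 2D similarity alignment (closed-form Umeyama/Kabsch on the 8 anchors) looks like this:
def procrustes_align_2d(raw, anchor_indices, ref_anchors):
    # Similarity-align raw (68, 2) landmarks so the anchor subset best
    # matches ref_anchors (8, 2) in the least-squares sense.
    src = raw[anchor_indices]                      # (8, 2)
    mu_s, mu_r = src.mean(axis=0), ref_anchors.mean(axis=0)
    src_c, ref_c = src - mu_s, ref_anchors - mu_r
    U, S, Vt = np.linalg.svd(src_c.T @ ref_c)      # 2x2 cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))             # guard against reflections
    Q = U @ np.diag([1.0, d]) @ Vt                 # optimal rotation (right-mult)
    s = (S * [1.0, d]).sum() / (src_c ** 2).sum()  # optimal isotropic scale
    return s * (raw - mu_s) @ Q + mu_r             # apply to all 68 points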
Citation context
Adapts ideas from Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly): Cheong's model used PLS for AU → 68 dlib landmarks. Here we extend the linear-regression approach to a cross-topology problem (68 dlib → 478 MP), use a PCA bottleneck for output regularization, and train on 10× more data (~340K wild-celebrity frames).
File format
NPZ with:
- coef (136, 1434) float32 - absorbed weights, axis-major layout
- intercept (1434,) float32
- input_columns (136,) str - input feature order: lm_x_0..lm_x_67, lm_y_0..lm_y_67
- reference_dlib_anchors (8, 2) float32 - for Procrustes-aligning raw input
- anchor_indices_dlib68 (8,) int32
- mean_aligned_dlib_landmarks (68, 2) float32 - population-mean dlib landmarks (aligned)
- mean_predicted_mesh (478, 3) float32 - population-mean predicted mesh (cm units)
- n_pca_components () int32 - = 50
- pca_variance_ratio () float32 - = 0.9959
- model_card () str
- training_metadata () str (JSON)
Loader: np.load(...); no extra deps.
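The scalar string entries load as 0-d arrays, so .item() unwraps them, e.g.:
import json, numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
print(m["model_card"].item())                     # human-readable card
meta = json.loads(m["training_metadata"].item())  # training details as a dict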
Applying pose post-hoc (optional)
The model produces a pose-canonical mesh (frontal, MP head-centric cm coords). To align it with the head pose of the original observed image, or to render it at any user-chosen pose, apply a rigid transform after prediction:
import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
# 1) Predict the canonical mesh as before; aligned_lm is the 136-d
#    [x_0..x_67, y_0..y_67] vector built in step 2 of the Inference section
flat = aligned_lm @ m["coef"] + m["intercept"]
canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1) # (478, 3)
# 2) Build a 3x3 rotation matrix R from your pose source. With img2pose
# Pitch/Yaw/Roll (radians) returned by py-feat's Detector:
from scipy.spatial.transform import Rotation
R = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()  # (3, 3), intrinsic x-y-z
# 3) (Optional) scale s and translation t:
posed_mesh = canonical_mesh @ R.T # (478, 3)
# posed_mesh = s * (canonical_mesh @ R.T) + t # if you want re-projection
# Or, if you have MP's full 4x4 facial_transformation_matrix (M):
# homog = np.concatenate([canonical_mesh, np.ones((478, 1))], axis=1)
# posed = (homog @ M.T)[:, :3]
Convention notes:
- MP canonical y-axis points UP (forehead at +y, chin at -y), x-axis to subject's left, z-axis out of face. Standard right-handed.
- Rotation.from_euler("XYZ", ...) is intrinsic x-y-z (R = Rx·Ry·Rz), the most common convention; note that scipy denotes intrinsic sequences with uppercase axes (lowercase "xyz" would be extrinsic). For img2pose Pitch/Yaw/Roll, this matches.
- For pure visualization (not re-projecting onto an image), s=1 and t=0 are fine.
- The model is pose-INVARIANT by design: same input always yields the same canonical mesh; pose is decoupled and you control it at render time.
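For a quick sanity check, an orthographic 2D scatter of the posed mesh (assumes matplotlib, which the model itself does not require):
import matplotlib.pyplot as plt
plt.scatter(posed_mesh[:, 0], posed_mesh[:, 1], s=1)
plt.gca().set_aspect("equal")  # keep cm units square; with R = identity, +y (forehead) plots at the top
plt.show()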

