68-pt dlib → 478-pt MP mesh (PCA-bottleneck)
Cross-topology bridge: lets users running py-feat's default Detector (mobilefacenet 68-pt landmarks) render the rich 478-vertex MediaPipe FaceMesh without running MPDetector.
68-pt dlib landmarks → 478-pt MP mesh PCA-bottleneck regression (v2)
A linear regression with a PCA bottleneck that maps aligned 68-point dlib landmarks (the mobilefacenet output of py-feat's Detector) to the 478-vertex MediaPipe FaceMesh in pose-canonical (MP-canonical-aligned) space.
Method
- PCA the training-set 478-pt mesh into k=50 components (captures 99.59% of variance)
- Linear regression: 136-d aligned 68-pt landmarks → 50 PCA scores
- Absorb pca.inverse_transform into (coef, intercept) so inference is a single matmul
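In code, the absorption step amounts to the following (a sketch, assuming scikit-learn; X and Y stand for the (N, 136) aligned landmarks and (N, 1434) flattened meshes):
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
pca = PCA(n_components=50).fit(Y)                  # Y: (N, 1434) flattened meshes
reg = LinearRegression().fit(X, pca.transform(Y))  # X: (N, 136) -> 50 PCA scores
# Fold pca.inverse_transform (z @ components_ + mean_) into one affine map:
coef = reg.coef_.T @ pca.components_                      # (136, 1434)
intercept = reg.intercept_ @ pca.components_ + pca.mean_  # (1434,)
# Inference is now a single matmul: y_hat = x @ coef + intercept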
In a 3-way comparison on the same train/test split (compare_68_to_478.py):
- PCA-bottleneck k=50: R²=0.4789 (selected)
- PLS direct k=136 (full rank): R²=0.4794 (statistical tie)
- TPS warp: catastrophic failure (units mismatch; fixable but skipped)
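The PLS-direct baseline corresponds to something like the following (again assuming scikit-learn; X_test is the held-out fold's inputs):
from sklearn.cross_decomposition import PLSRegression
pls = PLSRegression(n_components=136).fit(X, Y)  # full rank: one component per input dim
Y_hat = pls.predict(X_test)                      # direct 136-d -> 1434-d prediction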
Training data
- 341,793 frames from 9,937 CelebV-HQ celebrity videos
- 68-pt source: mobilefacenet (Detector(face_model="img2pose", landmark_model="mobilefacenet"))
- 478-pt source: mp_facemesh_v2 (MPDetector), GPA-aligned to the MP canonical reference (cm units)
- Inner-joined on (video_id, frame); pose-filtered to |yaw| ≤ 40°, |pitch| ≤ 30°
- 2D Procrustes + GPA aligned 68-pt landmarks (population-mean reference, 5-iter convergence)
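The pairing and filtering amount to roughly the following (a sketch assuming pandas DataFrames df68 and df478 keyed by (video_id, frame); the yaw/pitch column names are illustrative):
import pandas as pd
# df68: one row per (video_id, frame) with aligned 68-pt landmark columns
# df478: one row per (video_id, frame) with aligned 478-pt mesh columns
paired = df68.merge(df478, on=["video_id", "frame"], how="inner")
paired = paired[(paired["yaw"].abs() <= 40) & (paired["pitch"].abs() <= 30)]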
Out-of-sample performance (fold 0 of 5-fold GroupKFold by video_id)
- R² = 0.4789 (variance-weighted across 1,434 dims)
- MAE = 0.1801 cm
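The split corresponds to (a sketch with scikit-learn's GroupKFold; fold 0 is the first split it yields, and video_ids is the per-row group label):
from sklearn.model_selection import GroupKFold
# Grouping by video_id ensures no video contributes frames to both train and test
train_idx, test_idx = next(GroupKFold(n_splits=5).split(X, Y, groups=video_ids))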
For reference: the AU → 478-pt mesh model achieves R²=0.244. The 68→478 model scores nearly 2× higher because dlib landmarks share rich spatial information with the MP mesh that 20 AU intensities cannot encode.
Inference
import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
# 1. User has raw 68-pt landmarks from Detector (image-space pixels), shape (68, 2)
raw = ... # from fex.landmarks_2d (or extract from x_0..x_67, y_0..y_67 columns)
# 2. Procrustes-align raw landmarks to the saved reference (8 stable upper-face anchors)
# See feat.utils.image_operations or save_landmarks68_to_mesh478_pca.py for fit_similarity_batched_2d
anchor_indices = m["anchor_indices_dlib68"] # [27, 28, 29, 30, 36, 39, 42, 45]
ref_anchors = m["reference_dlib_anchors"] # (8, 2) population-mean dlib anchor positions
aligned = procrustes_align_2d(raw, anchor_indices, ref_anchors) # (68, 2)
# 3. Predict 478-pt MP mesh
flat = np.concatenate([aligned[:, 0], aligned[:, 1]]) @ m["coef"] + m["intercept"]
# 4. Unpack to (478, 3); IMPORTANT: axis-major layout, NOT interleaved
mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)
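procrustes_align_2d is referenced above but not bundled (the reference implementation lives in save_landmarks68_to_mesh478_pca.py); a minimal standalone sketch of the 2D similarity alignment (closed-form Umeyama/Kabsch on the 8 anchors) looks like this:
def procrustes_align_2d(raw, anchor_indices, ref_anchors):
    # Similarity-align raw (68, 2) landmarks so the anchor subset best
    # matches ref_anchors (8, 2) in the least-squares sense.
    src = raw[anchor_indices]                      # (8, 2)
    mu_s, mu_r = src.mean(axis=0), ref_anchors.mean(axis=0)
    src_c, ref_c = src - mu_s, ref_anchors - mu_r
    U, S, Vt = np.linalg.svd(src_c.T @ ref_c)      # 2x2 cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))             # guard against reflections
    Q = U @ np.diag([1.0, d]) @ Vt                 # optimal rotation (right-mult)
    s = (S * [1.0, d]).sum() / (src_c ** 2).sum()  # optimal isotropic scale
    return s * (raw - mu_s) @ Q + mu_r             # apply to all 68 points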
Citation context
Adapts ideas from Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly): Cheong's model used PLS for AU → 68 dlib landmarks. Here we extend the linear-regression approach to a cross-topology problem (68 dlib → 478 MP), use a PCA bottleneck for output regularization, and train on 10× more data (~340K wild-celebrity frames).
File format
NPZ with:
- coef (136, 1434) float32 - absorbed weights, axis-major layout
- intercept (1434,) float32
- input_columns (136,) str - input feature order: lm_x_0..lm_x_67, lm_y_0..lm_y_67
- reference_dlib_anchors (8, 2) float32 - for Procrustes-aligning raw input
- anchor_indices_dlib68 (8,) int32
- mean_aligned_dlib_landmarks (68, 2) float32 - population-mean dlib landmarks (aligned)
- mean_predicted_mesh (478, 3) float32 - population-mean predicted mesh (cm units)
- n_pca_components () int32 - = 50
- pca_variance_ratio () float32 - = 0.9959
- model_card () str
- training_metadata () str (JSON)
Loader: np.load(...); no extra deps.
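The scalar string entries load as 0-d arrays, so .item() unwraps them, e.g.:
import json, numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
print(m["model_card"].item())                     # human-readable card
meta = json.loads(m["training_metadata"].item())  # training details as a dict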
Applying pose post-hoc (optional)
The model produces a pose-canonical mesh (frontal, MP head-centric cm coords). To align it with the head pose of the original observed image, or to render it at any user-chosen pose, apply a rigid transform after prediction:
import numpy as np
m = np.load("landmarks68_to_mesh478_pca_v2.npz")
# 1) Predict the canonical mesh as before; aligned_lm is the 136-d
#    [x_0..x_67, y_0..y_67] vector built in step 2 of the Inference section
flat = aligned_lm @ m["coef"] + m["intercept"]
canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1) # (478, 3)
# 2) Build a 3x3 rotation matrix R from your pose source. With img2pose
# Pitch/Yaw/Roll (radians) returned by py-feat's Detector:
from scipy.spatial.transform import Rotation
R = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()  # (3, 3), intrinsic x-y-z
# 3) (Optional) scale s and translation t:
posed_mesh = canonical_mesh @ R.T # (478, 3)
# posed_mesh = s * (canonical_mesh @ R.T) + t # if you want re-projection
# Or, if you have MP's full 4x4 facial_transformation_matrix (M):
# homog = np.concatenate([canonical_mesh, np.ones((478, 1))], axis=1)
# posed = (homog @ M.T)[:, :3]
Convention notes:
- MP canonical y-axis points UP (forehead at +y, chin at -y), x-axis to subject's left, z-axis out of face. Standard right-handed.
- Rotation.from_euler("XYZ", ...) is intrinsic x-y-z (R = Rx·Ry·Rz), the most common convention; note that scipy denotes intrinsic sequences with uppercase axes (lowercase "xyz" would be extrinsic). For img2pose Pitch/Yaw/Roll, this matches.
- For pure visualization (not re-projecting onto an image), s=1 and t=0 are fine.
- The model is pose-INVARIANT by design: same input always yields the same canonical mesh; pose is decoupled and you control it at render time.
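For a quick sanity check, an orthographic 2D scatter of the posed mesh (assumes matplotlib, which the model itself does not require):
import matplotlib.pyplot as plt
plt.scatter(posed_mesh[:, 0], posed_mesh[:, 1], s=1)
plt.gca().set_aspect("equal")  # keep cm units square; with R = identity, +y (forehead) plots at the top
plt.show()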

