# AU → 68-pt dlib landmarks (py-feat visualization model)
This repository hosts the AU → 68-point dlib facial-landmark visualization
model used by py-feat's `feat.plotting.predict()` / `load_viz_model()` to
render schematic facial expressions from action-unit (AU) intensity vectors.
Two versions are provided. The v2 PLS model is the default; v1 is preserved for reproducibility / rollback.
## v2 (default, recommended) — `au_to_landmarks_pls_v2.npz`
Cheong et al. / py-feat tutorial-06 style: linear PLS regression mapping 20 FACS AU intensities to 68 `mobilefacenet` dlib landmarks (136-d output, 68 × 2). Retrained on 27× more data than v1 with stricter preprocessing.
### Training data
- 350,568 frames from ~10,000 CelebV-HQ celebrity videos
- Landmark source: `Detector(landmark_model="mobilefacenet")` (the default Detector landmark model)
- AU source: `Detector(face_model="img2pose", au_model="xgb")` (the default AU model)
- Pose source: img2pose 6DoF (pitch, yaw, roll; radians)
- Pose-filtered to |yaw| ≤ 40°, |pitch| ≤ 30°
- 2D Procrustes + GPA-aligned 68-pt landmarks (8-anchor, population-mean reference); a minimal alignment sketch follows this list
- **No** per-subject neutral subtraction (the population mean shape *is* the AU=0 target)
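
The alignment step can be pictured with a minimal 2D Procrustes sketch. This is not the training code: the anchor indices (`ANCHOR_IDX`) are placeholders, the reflection edge case is ignored, and the iterative GPA loop (re-estimating the mean shape until convergence) is omitted.

```python
import numpy as np

def procrustes_align(landmarks, ref_anchors, anchor_idx):
    """Least-squares similarity alignment (scale + rotation + translation) of a
    (68, 2) landmark array onto reference anchor points, estimating the
    transform from the anchor subset only. Reflections are not handled here."""
    src = landmarks[anchor_idx]
    src_mean, ref_mean = src.mean(0), ref_anchors.mean(0)
    src_c, ref_c = src - src_mean, ref_anchors - ref_mean
    u, s, vt = np.linalg.svd(ref_c.T @ src_c)        # 2x2 cross-covariance
    rot = u @ vt                                     # optimal rotation
    scale = s.sum() / (src_c ** 2).sum()             # optimal isotropic scale
    return scale * (landmarks - src_mean) @ rot.T + ref_mean

ANCHOR_IDX = np.arange(8)  # placeholder indices, not the real 8 anchors
# e.g. (hedged): aligned = procrustes_align(raw_landmarks, m["reference_anchors"], ANCHOR_IDX)
```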
### Method
- `PLSRegression(n_components=20, scale=True)` — full input rank
- Training inputs: 20 AU + 3 pose + 60 pose×AU = 83 features
- Deployed inputs: 23 (AU + pose); the pose×AU terms vanish at pose = 0 by construction (see the feature sketch below)
- No pairwise AU interactions (tested; they degraded out-of-sample R²)
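
For concreteness, here is a hedged sketch of how the 83 training features and the 23 deployed features relate, assuming the pose×AU block is a plain outer product of the 3 pose angles with the 20 AUs; the exact feature ordering in the released model may differ.

```python
import numpy as np

au = np.random.rand(20)             # 20 AU intensities
pose = np.array([0.1, -0.2, 0.0])   # pitch, yaw, roll (radians)

# Training-time features: 20 AU + 3 pose + 60 pose×AU interactions = 83.
# The interaction ordering here is an assumption for illustration only.
interactions = np.outer(pose, au).ravel()            # (60,)
x_train = np.concatenate([au, pose, interactions])   # (83,)

# Deployment-time features: at pose = 0 every pose×AU term is 0, so only the
# first 23 columns (AU + pose) matter, which is what the NPZ model exposes.
x_deploy = np.concatenate([au, np.zeros(3)])          # (23,)
```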
### Out-of-sample performance (3-fold GroupKFold by `video_id`)
- R² = 0.794 ± 0.004 (variance-weighted across 136 dims)
- MAE = 3.59 px
For comparison, v1 (Cheong et al. 2023) was trained on ~13K class-balanced rows from EmotioNet/DISFA/BP4D and reports R² ≈ 0.4–0.5.
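
The evaluation protocol above can be reproduced in outline as follows. This is a sketch rather than the actual training script: `X`, `Y`, and `groups` are placeholders for the 83-feature inputs, the 136-d flattened landmarks, and the per-frame video IDs.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GroupKFold
from sklearn.metrics import r2_score, mean_absolute_error

def evaluate(X, Y, groups, n_components=20):
    """3-fold GroupKFold by video, variance-weighted R² over the 136 outputs."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=3).split(X, Y, groups):
        model = PLSRegression(n_components=n_components, scale=True)
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append((
            r2_score(Y[test_idx], pred, multioutput="variance_weighted"),
            mean_absolute_error(Y[test_idx], pred),
        ))
    return np.mean(scores, axis=0)  # (mean R², mean MAE in px)
```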
### Inference
```python
import numpy as np

m = np.load("au_to_landmarks_pls_v2.npz")

# Build the 23-dim input: 20 AU intensities + 3 pose angles (pitch, yaw, roll)
au = np.zeros(20)
au[m["au_columns"].tolist().index("AU12")] = 1.0  # smile
pose = np.zeros(3)
x = np.concatenate([au, pose])                    # (23,)

flat = x @ m["coef"] + m["intercept"]             # (136,)

# IMPORTANT: layout is axis-major [all x | all y], NOT interleaved
landmarks = np.stack([flat[:68], flat[68:]], axis=1)  # (68, 2)
```
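
To eyeball the predicted shape, a plain matplotlib scatter is enough; inverting the y-axis assumes the landmarks are in image-style (y-down) coordinates.

```python
import matplotlib.pyplot as plt

plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10)
plt.gca().invert_yaxis()          # assumes image-style (y-down) coordinates
plt.gca().set_aspect("equal")
plt.title("AU12 = 1.0 schematic face")
plt.show()
```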
The NPZ also includes:
- `mean_aligned_landmarks` (68, 2) — population-mean canonical landmarks
- `mean_low_au_landmarks` (68, 2) — mean of low-AU-sum frames (a cleaner neutral)
- `reference_anchors` (8, 2) — for input alignment at inference
- `model_card`, `training_metadata`
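
Continuing the inference snippet above, one way to use the bundled arrays is as a neutral baseline; the comparison below is illustrative only.

```python
# Sketch: treat the low-AU mean shape as a neutral face and look at the
# displacement induced by the AU12 prediction from the snippet above.
neutral = m["mean_low_au_landmarks"]          # (68, 2)
delta = landmarks - neutral                   # per-landmark displacement, px
print(np.linalg.norm(delta, axis=1).max())    # largest single-landmark shift
```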
## v1 (legacy) — `pyfeat_aus_to_landmarks.{joblib,h5}`
Original Cheong et al. 2023 / py-feat tutorial-06 model: trained on a class-balanced EmotioNet + DISFA + BP4D subsample (~13K rows), with affine registration to a single neutral template and a 20-component `PLSRegression`.
Preserved here for reproducibility and to allow rollback. Loadable via
`feat.plotting.load_viz_model("pyfeat_aus_to_landmarks")` if you want the
original predictions; a minimal loading sketch follows.
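
A hedged sketch of pulling v1 through py-feat, assuming the returned object exposes the underlying sklearn `predict()` on a 20-dim AU row (the v1 output layout is not documented here):

```python
import numpy as np
from feat.plotting import load_viz_model

viz = load_viz_model("pyfeat_aus_to_landmarks")
au = np.zeros((1, 20))
au[0, 11] = 1.0                    # index is illustrative, not a fixed AU12 slot
landmarks_flat = viz.predict(au)   # shape assumption: (1, 136)
```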
See py-feat tutorial 06 for the original training recipe.
## File summary
| File | Version | Format | Size | Status |
|---|---|---|---|---|
| `au_to_landmarks_pls_v2.npz` | v2 | NPZ | 20 KB | default; loaded automatically by py-feat |
| `pyfeat_aus_to_landmarks.joblib` | v1 | sklearn pickle | 24 MB | legacy fallback |
| `pyfeat_aus_to_landmarks.h5` | v1 | HDF5 metadata | 20 MB | legacy fallback (paired with the .joblib) |
## Figures
## Citation
If you use these models, please cite py-feat and Cheong et al. 2023 (for the original v1 method, which v2 extends):
Cheong, J. H., Jolly, E., et al. (2023). Py-Feat: Python Facial Expression Analysis Toolbox. Affective Science. https://doi.org/10.1007/s42761-023-00191-4



