---
license: mit
library_name: py-feat
tags:
- face-action-units
- facs
- facial-landmarks
- regression
- pls
- mediapipe
- py-feat
---
# AU → MP-mesh PLS (visualization)
> Predicts the 478-vertex MediaPipe FaceMesh deformation from 20 AU intensities. Trained with pose covariates and pose×AU interactions for cleaner AU coefficients; deployed with a 23-d input (20 AU + 3 pose). Pass pose=0 at inference for the standard canonical frontal visualization.
# AU+pose -> MP mesh PLS (v2)
Linear PLS regression mapping 20 FACS AU intensities + 3 pose covariates
(Pitch, Yaw, Roll) to 478×3 = 1434 MediaPipe FaceMesh vertex coordinates in a
pose-canonical frame. Used to visualize how the MP face mesh deforms as AU
sliders move.
## Training data
- 344,418 frames from 9,978 CelebV-HQ celebrity videos
- Mesh source: MPDetector `mp_facemesh_v2` (478-vertex MediaPipe topology)
- AU source: Detector with img2pose face + xgb AU on the same frames
- Pose source: MPDetector `facial_transformation_matrices` (Pitch, Yaw, Roll, radians)
- Pose-filtered to |yaw| ≤ 40° and |pitch| ≤ 30°
- Per-frame Umeyama similarity Procrustes alignment to a reference template
  using 12 stable upper-face anchors (forehead 10/9/8/151, nose bridge 6/168/197/195,
  outer canthi 33/263, inner canthi 133/362). This removes rotation, scale and translation (R, s, t).
- Top 1% of frames by max anchor residual dropped (alignment outliers).
- IMPORTANT: NO per-subject neutral subtraction. Predicting absolute aligned
  coords directly is the Cheong / Py-Feat-tutorial-06 recipe. Per-subject
  neutral subtraction was tested and capped R² at 0.083; switching to absolute
  coords raised R² to 0.143 (no pose) and 0.244 (with pose).
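The alignment and outlier steps above can be sketched as follows. This is a minimal NumPy illustration of the standard Umeyama (1991) closed-form similarity fit, not the training code behind this model; the anchor order in `ANCHORS` and all helper names are assumptions made for the example.

```python
import numpy as np

# Anchor vertices listed above; the order used here is an assumption.
ANCHORS = [10, 9, 8, 151, 6, 168, 197, 195, 33, 263, 133, 362]


def umeyama_similarity(src, dst):
    """Least-squares (s, R, t) such that dst ≈ s * src @ R.T + t (Umeyama 1991)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)               # (3, 3) cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # forbid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.sum(D * np.diag(S)) / (src_c ** 2).sum(axis=1).mean()
    t = mu_dst - s * R @ mu_src
    return s, R, t


def align_frame(mesh, reference_anchors, anchors=ANCHORS):
    """Map a (478, 3) mesh onto the reference anchors; also return its max anchor residual."""
    s, R, t = umeyama_similarity(mesh[anchors], reference_anchors)
    aligned = s * mesh @ R.T + t
    residual = np.linalg.norm(aligned[anchors] - reference_anchors, axis=1).max()
    return aligned, residual


def align_and_filter(meshes, reference_anchors, drop_frac=0.01):
    """Align every frame, then drop the top `drop_frac` of frames by max anchor residual."""
    pairs = [align_frame(m, reference_anchors) for m in meshes]
    aligned = np.stack([a for a, _ in pairs])
    residuals = np.array([r for _, r in pairs])
    keep = residuals <= np.quantile(residuals, 1.0 - drop_frac)
    return aligned[keep], keep
```

Fitting only on the anchors, then applying (s, R, t) to all 478 vertices, is what lets lower-face AU motion survive the alignment instead of being averaged away.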
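## Method
- `PLSRegression(n_components=83, scale=True)`, Cheong style
- Inputs: [20 AU | 3 pose (Pitch, Yaw, Roll)] = 23 features in the exported model.
  Training also included pose×AU interaction terms; since scikit-learn's `PLSRegression`
  cannot use more components than input features, `n_components=83` implies at least
  83 training columns (presumably 20 + 3 + 20×3 = 83). The interaction terms are zero
  at pose=0 and are not part of the exported weights.
- Outputs: 1434-d absolute pose-canonical mesh coords (image-space pixels)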
## Out-of-sample performance (3-fold GroupKFold by video_id)
- **Variance-weighted R² = 0.2443 ± 0.0034** (mean ± SD over folds) across 1434 dims
- Per-fold R² = [0.2395, 0.2472, 0.2463]
- MAE = 0.223 px (canonical-frame image space)
R² is modest in absolute terms because AU intensities cannot fully describe a
1434-d mesh deformation: continuous AU intensities from xgb are noisier than
human FACS coding, and many micro-expressions aren't captured by 20 AUs. For
visualization, getting the qualitative direction of each AU's deformation right
matters more than R².
## Inference
```python
import numpy as np

m = np.load("au_to_mesh_pls_v2.npz")
au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0  # smile (AU12, lip corner puller)
pose = np.zeros(3)                                 # [pitch, yaw, roll]
x = np.concatenate([au, pose])                     # (23,)
flat = x @ m["coef"] + m["intercept"]              # (1434,)
# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)  # (478, 3)
# Render with mediapipe.solutions.face_mesh.FACEMESH_TESSELATION
# Optionally apply a user-chosen rigid (R, s, t) post-hoc.
```
The NPZ also includes:
- `mean_aligned_mesh` (478, 3): population mean canonical mesh (the "AU=0" face)
- `mean_low_au_mesh` (478, 3): mean of low-AU-sum frames (a cleaner neutral)
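Because the model is a single affine map, raising AU k by `a` with every other input fixed moves the mesh by exactly `a * coef[k]`, reshaped with the same axis-major layout. The sketch below packages that and uses `reshape(3, 478).T`, which is equivalent to the three-slice `np.stack` in the inference snippet. The helper names are hypothetical, and the randomly filled `m` only stands in for the real NPZ so the example runs on its own; its `AU01` to `AU20` column names are placeholders, not the model's actual AU list.

```python
import numpy as np

N_VERTICES = 478


def to_vertices(flat):
    """Axis-major (1434,) [all x | all y | all z] -> (478, 3)."""
    return np.asarray(flat).reshape(3, N_VERTICES).T


def predict_mesh(m, au_values, pose=(0.0, 0.0, 0.0)):
    """Build the input vector by column name, then predict a (478, 3) mesh."""
    cols = m["input_columns"].tolist()
    x = np.zeros(len(cols))
    for name, value in au_values.items():
        x[cols.index(name)] = value
    for name, value in zip(("Pitch", "Yaw", "Roll"), pose):
        x[cols.index(name)] = value
    return to_vertices(x @ m["coef"] + m["intercept"])


def au_displacement(m, au_name, intensity=1.0):
    """(478, 3) per-vertex offset from raising one AU, all other inputs fixed."""
    k = m["input_columns"].tolist().index(au_name)
    return intensity * to_vertices(m["coef"][k])


# Synthetic stand-in for np.load("au_to_mesh_pls_v2.npz"); placeholder AU names.
rng = np.random.default_rng(0)
m = {
    "input_columns": np.array([f"AU{i:02d}" for i in range(1, 21)] + ["Pitch", "Yaw", "Roll"]),
    "coef": rng.normal(size=(23, 1434)).astype(np.float32),
    "intercept": rng.normal(size=1434).astype(np.float32),
}
smile = predict_mesh(m, {"AU12": 0.7})
```

With the real file, `np.linalg.norm(au_displacement(m, "AU12"), axis=1)` gives a per-vertex movement magnitude that could, for example, be used to color the mesh.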
## Citation context
Adapts Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly), which maps affine-aligned
68-point dlib landmarks + pose to 20 AUs via PLS. This model inverts the direction
(AU + pose -> mesh) and scales it up to MP's 478-vertex mesh on 10× larger
in-the-wild celebrity data.
## File format
NPZ with:
- `coef` (23, 1434) float32: linear weights; rows match `input_columns`
- `intercept` (1434,) float32: bias
- `input_columns` (23,) str: input feature order, AU01..AU43 then Pitch, Yaw, Roll
- `au_columns` (20,) str: convenience, just the AU subset
- `pose_columns` (3,) str: convenience, just the pose subset
- `mean_aligned_mesh` (478, 3) float32: population mean (visualization default)
- `mean_low_au_mesh` (478, 3) float32: low-AU-frame mean (cleaner neutral)
- `reference_anchors` (12, 3) float32: Procrustes reference
- `anchor_indices` (12,) int32: MP vertex indices used as anchors
- `n_components` () int32
- `model_card` () str
- `training_metadata` () str (JSON)

Loader: `np.load("au_to_mesh_pls_v2.npz")`; no extra dependencies needed.
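A quick way to confirm that a downloaded file matches the layout above is to check shapes and column consistency after loading. `check_au_to_mesh_npz` is a hypothetical helper, not part of Py-Feat, and it assumes the string fields are stored as NumPy unicode arrays (if they were saved as object arrays, `np.load` would need `allow_pickle=True`). `synthetic_npz` builds an in-memory file with the documented shapes so the example runs without the real weights; its AU names are placeholders.

```python
import io
import json

import numpy as np

EXPECTED_SHAPES = {
    "coef": (23, 1434), "intercept": (1434,), "input_columns": (23,),
    "au_columns": (20,), "pose_columns": (3,),
    "mean_aligned_mesh": (478, 3), "mean_low_au_mesh": (478, 3),
    "reference_anchors": (12, 3), "anchor_indices": (12,),
    "n_components": (), "model_card": (), "training_metadata": (),
}


def check_au_to_mesh_npz(m):
    """Raise AssertionError if `m` deviates from the documented layout."""
    for key, shape in EXPECTED_SHAPES.items():
        assert m[key].shape == shape, f"{key}: got {m[key].shape}, expected {shape}"
    cols = m["input_columns"].tolist()
    assert cols[:20] == m["au_columns"].tolist(), "AU columns must come first"
    assert cols[20:] == m["pose_columns"].tolist(), "pose columns must come last"
    json.loads(m["training_metadata"].item())  # must parse as JSON
    return True


def synthetic_npz():
    """In-memory stand-in with the documented shapes (values are meaningless)."""
    au = [f"AU{i:02d}" for i in range(1, 21)]  # placeholder AU names
    arrays = {k: np.zeros(s, dtype=np.float32) for k, s in EXPECTED_SHAPES.items()}
    arrays.update(
        input_columns=np.array(au + ["Pitch", "Yaw", "Roll"]),
        au_columns=np.array(au),
        pose_columns=np.array(["Pitch", "Yaw", "Roll"]),
        anchor_indices=np.array([10, 9, 8, 151, 6, 168, 197, 195, 33, 263, 133, 362], dtype=np.int32),
        n_components=np.int32(83),
        model_card=np.array("synthetic"),
        training_metadata=np.array(json.dumps({"synthetic": True})),
    )
    buf = io.BytesIO()
    np.savez(buf, **arrays)
    buf.seek(0)
    return np.load(buf)
```

With the real weights the call would be `check_au_to_mesh_npz(np.load("au_to_mesh_pls_v2.npz"))`.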
## Applying pose post-hoc (optional)
The model is trained with pose covariates as input, but at inference users
typically pass `pose=0` to get the canonical (frontal) deformation. To render
the result at any chosen head pose, apply a rigid transform AFTER prediction:
```python
import numpy as np
from scipy.spatial.transform import Rotation

m = np.load("au_to_mesh_pls_v2.npz")

# 1) Predict the canonical mesh from an AU vector at pose=0:
au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0
x = np.concatenate([au, np.zeros(3)])              # (23,)
flat = x @ m["coef"] + m["intercept"]              # (1434,)
# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)  # (478, 3)

# 2) Apply a user-chosen rigid pose (example angles, in radians):
pitch, yaw, roll = 0.0, np.deg2rad(20.0), 0.0
R = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()  # intrinsic: R = Rx @ Ry @ Rz
# Rotate about the mesh centroid: coords are image-space pixels, so rotating
# about the origin would also swing the whole face across the frame.
center = canonical_mesh.mean(axis=0)
posed_mesh = (canonical_mesh - center) @ R.T + center          # (478, 3)
# For re-projection onto an image: s * (canonical_mesh @ R.T) + t
```
**Two ways to control pose at inference:**
- **Render-time rotation** (recommended for clean separation): set `pose=0` in
  the input vector, then rotate the predicted mesh post-hoc as above.
- **Pose-conditioned prediction** (if you want the model's residual pose-correlated
  bias): pass non-zero pose values directly in the input vector. Although training
  used pose and pose×AU interaction terms, the exported `coef` has only 23 rows
  (20 AU + 3 pose), so non-zero pose adds an AU-independent offset (`pose @ coef[20:23]`)
  to the prediction. That offset comes on top of the rigid rotation still needed at
  render time.

For most use cases, render-time rotation is cleaner: the AU-driven
deformation stays canonical and pose is purely a viewing transform.
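Since the exported `coef` has no pose×AU rows, the pose-conditioned option shifts the flat prediction by `pose @ coef[20:23]` regardless of the AU settings. The sketch below makes that concrete with random stand-in weights (not the real model); `pose_offset` and `predict_flat` are hypothetical helpers that assume the documented column order of 20 AUs followed by Pitch, Yaw, Roll.

```python
import numpy as np


def pose_offset(coef, pose, n_au=20):
    """Flat (1434,) change from passing `pose` instead of zeros in the input vector."""
    return np.asarray(pose, dtype=float) @ coef[n_au:n_au + 3]


def predict_flat(coef, intercept, au, pose):
    """Flat (1434,) prediction for a 20-d AU vector and a 3-d pose vector."""
    return np.concatenate([au, pose]) @ coef + intercept


rng = np.random.default_rng(0)
coef = rng.normal(size=(23, 1434))    # random stand-in for m["coef"]
intercept = rng.normal(size=1434)     # random stand-in for m["intercept"]
pose = np.array([0.1, -0.2, 0.05])    # example Pitch, Yaw, Roll in radians

shifts = [
    predict_flat(coef, intercept, au, pose) - predict_flat(coef, intercept, au, np.zeros(3))
    for au in (np.zeros(20), rng.uniform(0.0, 1.0, size=20))
]
# Both entries of `shifts` equal pose_offset(coef, pose), up to float rounding.
```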
**Convention notes**:
- MP canonical: y-axis up, x-axis to the subject's left, z-axis out of the face
  (standard right-handed).
- SciPy's `Rotation.from_euler` is case-sensitive: lowercase `"xyz"` is *extrinsic*
  (R = Rz·Ry·Rx), uppercase `"XYZ"` is *intrinsic* (R = Rx·Ry·Rz). The intrinsic
  R = Rx·Ry·Rz composition is the one that matches img2pose Pitch/Yaw/Roll output,
  so use `"XYZ"`.
- If you have MP's full 4×4 `facial_transformation_matrix` (M), apply it in homogeneous coordinates:
  `posed = (np.concatenate([canonical, np.ones((478, 1))], axis=1) @ M.T)[:, :3]`.
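The sketch below checks the SciPy case convention numerically against hand-built axis rotations, and shows that applying a 4×4 similarity matrix in homogeneous coordinates equals `s * (v @ R.T) + t`. The angles, scale, translation and random vertices are arbitrary example values, `similarity_matrix` is a hypothetical helper, and nothing here verifies which convention MediaPipe or img2pose actually emit.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def similarity_matrix(s, R, t):
    """4x4 homogeneous matrix M with M @ [v, 1] = [s * R @ v + t, 1]."""
    M = np.eye(4)
    M[:3, :3] = s * R
    M[:3, 3] = t
    return M


pitch, yaw, roll = 0.2, -0.4, 0.1  # example angles, radians
R_intrinsic = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()
R_extrinsic = Rotation.from_euler("xyz", [pitch, yaw, roll]).as_matrix()

verts = np.random.default_rng(0).normal(size=(478, 3))
t = np.array([10.0, -4.0, 2.0])
M = similarity_matrix(1.5, R_intrinsic, t)
posed_h = (np.concatenate([verts, np.ones((478, 1))], axis=1) @ M.T)[:, :3]
posed_direct = 1.5 * verts @ R_intrinsic.T + t
```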
## Figures