	---
	license: mit
	library_name: py-feat
	tags:
	- face-action-units
	- facs
	- facial-landmarks
	- regression
	- pls
	- mediapipe
	- py-feat
	---

	# AU → MP-mesh PLS (visualization)

	> Predicts the 478-vertex MediaPipe FaceMesh deformation from 20 AU intensities. Trained with pose covariates and pose×AU interactions for cleaner AU coefficients. Deployed as a 23-d input (20 AU + 3 pose); set pose=0 at inference for the standard canonical frontal visualization.

	# AU+pose -> MP mesh PLS (v2)

	Linear PLS regression mapping 20 FACS AU intensities + 3 pose covariates
	(Pitch, Yaw, Roll) to 478×3 = 1434 MediaPipe FaceMesh vertex coordinates in a
	pose-canonical frame. Used to visualize how the MP face mesh deforms as AU
	sliders move.

	## Training data
	- 344,418 frames from 9,978 CelebV-HQ celebrity videos
	- Mesh source: MPDetector mp_facemesh_v2 (478-vertex MediaPipe topology)
	- AU source: Detector with img2pose face + xgb AU on the same frames
	- Pose source: MPDetector facial_transformation_matrices (Pitch, Yaw, Roll, radians)
	- Pose-filtered to |yaw| <= 40°, |pitch| <= 30°
	- Per-frame Umeyama similarity Procrustes alignment to a reference template
	using 12 stable upper-face anchors (forehead 10/9/8/151, nose bridge 6/168/197/195,
	outer canthi 33/263, inner canthi 133/362). Removes (R, s, t).
	- Top 1% of frames by max anchor residual dropped (alignment outliers).
	- IMPORTANT: NO per-subject neutral subtraction — predicting absolute aligned
	coords directly is the Cheong / Py-Feat-tutorial-06 recipe. Per-subject
	neutral subtraction was tested and capped R² at 0.083; switching to absolute
	coords raised R² to 0.143 (no-pose) and 0.244 (with pose).
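
The alignment step above can be sketched with a textbook Umeyama solver. This is a minimal illustration under stated assumptions, not the training code: the function name `umeyama_similarity`, the random stand-in template, and the synthetic frame are all hypothetical; only the 12 anchor indices come from the list above.

```python
import numpy as np

# Anchor indices as listed in this card (forehead, nose bridge, canthi).
ANCHORS = np.array([10, 9, 8, 151, 6, 168, 197, 195, 33, 263, 133, 362])

def umeyama_similarity(src, dst):
    """Least-squares (s, R, t) such that dst ~= s * src @ R.T + t."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # 3x3 cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection guard so that det(R) = +1.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * (R @ mu_src)
    return s, R, t

# Synthetic check: a rotated, scaled, shifted copy of a stand-in template.
rng = np.random.default_rng(0)
template = rng.normal(size=(478, 3))  # hypothetical reference mesh
theta = 0.3
R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
frame = 2.5 * template @ R0.T + np.array([10.0, -4.0, 1.0])

# Estimate the transform from the anchors only, then apply it to all vertices.
s, R, t = umeyama_similarity(frame[ANCHORS], template[ANCHORS])
aligned = s * frame @ R.T + t

# Per-frame statistic that an outlier filter could threshold on.
max_anchor_residual = np.linalg.norm(
    aligned[ANCHORS] - template[ANCHORS], axis=1
).max()
```

On real frames the residual is non-zero because expression and detector noise move the anchors slightly; a filter like the one described above would rank frames by `max_anchor_residual` and drop the largest.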

	## Method
	- PLSRegression(n_components=83, scale=True), Cheong style
	- Inputs: [20 AU | 3 pose (Pitch, Yaw, Roll)] = 23 features
	- Outputs: 1434-d absolute pose-canonical mesh coords (image-space pixels)

	## Out-of-sample performance (3-fold GroupKFold by video_id)
	- Variance-weighted R² = 0.2443 ± 0.0034 across 1434 dims
	- Per-fold R² = [0.2395, 0.2472, 0.2463]
	- MAE = 0.223 px (canonical-frame image space)
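
For readers who want to recompute these numbers, here is a small NumPy sketch of a variance-weighted R² and of the fold aggregation. The helper name `variance_weighted_r2` is hypothetical, and the grouped split itself (for example with scikit-learn's `GroupKFold`) is not shown. The reported ± is reproduced when the spread is taken as the population standard deviation (`ddof=0`).

```python
import numpy as np

def variance_weighted_r2(y_true, y_pred):
    """R^2 pooled over output dims, each dim weighted by its variance."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res.sum() / ss_tot.sum()

# Per-fold scores as reported above.
fold_r2 = np.array([0.2395, 0.2472, 0.2463])
mean_r2 = fold_r2.mean()
std_r2 = fold_r2.std()  # NumPy default ddof=0 (population std)

# Sanity checks on synthetic data: a perfect prediction scores 1,
# predicting the per-dimension mean scores 0.
rng = np.random.default_rng(0)
y = rng.normal(size=(100, 6))
r2_perfect = variance_weighted_r2(y, y)
r2_mean_only = variance_weighted_r2(y, np.tile(y.mean(axis=0), (100, 1)))
```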

	R² is modest in absolute terms because 20 AU intensities cannot fully describe
	a 1434-d mesh deformation: continuous AU intensities from xgb are noisier than
	human FACS coding, and many micro-expressions are not captured by 20 AUs. For
	visualization, qualitative AU-direction correctness matters more than R².

	## Inference

	```python
	import numpy as np
	m = np.load("au_to_mesh_pls_v2.npz")
	au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0 # smile
	pose = np.zeros(3) # [pitch, yaw, roll]
	x = np.concatenate([au, pose]) # (23,)
	flat = x @ m["coef"] + m["intercept"] # (1434,)
	# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
	mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1) # (478, 3)
	# Render with mediapipe.solutions.face_mesh.FACEMESH_TESSELATION
	# Optionally apply user-chosen rigid (R, s, t) post-hoc.
	```

	The NPZ also includes:
	- `mean_aligned_mesh` (478, 3) — population mean canonical mesh ("AU=0" face)
	- `mean_low_au_mesh` (478, 3) — mean of low-AU-sum frames (cleaner neutral)
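
Because the model is linear, the effect of a single AU at pose=0 is simply its row of `coef`, reshaped. The sketch below shows this with random stand-in arrays in place of the real NPZ; the helper names `unflatten` and `predict_mesh` and the AU name list are assumptions for illustration (read the real order from `m["au_columns"]`).

```python
import numpy as np

N_VERTS = 478
rng = np.random.default_rng(0)

# Stand-ins for m["au_columns"], m["coef"], m["intercept"].
# The AU names below are an assumed order, used only to make the demo run.
au_columns = ["AU01", "AU02", "AU04", "AU05", "AU06", "AU07", "AU09", "AU10",
              "AU11", "AU12", "AU14", "AU15", "AU17", "AU20", "AU23", "AU24",
              "AU25", "AU26", "AU28", "AU43"]
coef = rng.normal(size=(23, 3 * N_VERTS)).astype(np.float32)
intercept = rng.normal(size=3 * N_VERTS).astype(np.float32)

def unflatten(flat):
    """Axis-major [all x | all y | all z] -> (478, 3)."""
    return flat.reshape(3, N_VERTS).T

def predict_mesh(au, pose=None):
    """Linear prediction for a 20-d AU vector and an optional 3-d pose."""
    pose = np.zeros(3) if pose is None else pose
    x = np.concatenate([au, pose])  # (23,)
    return unflatten(x @ coef + intercept)

au = np.zeros(20)
neutral = predict_mesh(au)  # prediction at AU=0, pose=0
au[au_columns.index("AU12")] = 1.0
smile = predict_mesh(au)

# Per-vertex displacement caused by AU12 alone, and its magnitude
# (the kind of quantity an effect map would color by).
displacement = smile - neutral
per_vertex_shift = np.linalg.norm(displacement, axis=1)  # (478,)
```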

	## Citation context
	Adapts Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly): affine-aligned 68
	dlib landmarks + pose -> 20 AUs via PLS. Direction inverted (AU+pose -> mesh)
	and scaled to MP's 478-vertex mesh on 10x larger wild-celebrity data.

	## File format
	NPZ with:
	- coef (23, 1434) float32 — linear weights, rows match input_columns
	- intercept (1434,) float32 — bias
	- input_columns (23,) str — input feature order: AU01..AU43, Pitch, Yaw, Roll
	- au_columns (20,) str — convenience: just the AU subset
	- pose_columns (3,) str — convenience: just the pose subset
	- mean_aligned_mesh (478, 3) float32 — population mean (viz default)
	- mean_low_au_mesh (478, 3) float32 — low-AU-frame mean (cleaner neutral)
	- reference_anchors (12, 3) float32 — Procrustes reference
	- anchor_indices (12,) int32 — MP indices used as anchors
	- n_components () int32
	- model_card () str
	- training_metadata () str (JSON)

	Loader: np.load("au_to_mesh_pls_v2.npz") — no extra deps needed.
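
A quick shape check after loading can catch a truncated or mismatched file early. The sketch below is one possible way to do it: `check_npz` is a hypothetical helper, and the in-memory archive is a synthetic stand-in with placeholder values so the example runs without the real file.

```python
import io
import json
import numpy as np

# Expected shapes, copied from the key list above.
EXPECTED_SHAPES = {
    "coef": (23, 1434), "intercept": (1434,),
    "input_columns": (23,), "au_columns": (20,), "pose_columns": (3,),
    "mean_aligned_mesh": (478, 3), "mean_low_au_mesh": (478, 3),
    "reference_anchors": (12, 3), "anchor_indices": (12,),
    "n_components": (), "model_card": (), "training_metadata": (),
}

def check_npz(npz):
    """Return a list of human-readable problems; empty means all keys match."""
    problems = []
    for key, shape in EXPECTED_SHAPES.items():
        if key not in npz.files:
            problems.append(f"missing key: {key}")
        elif npz[key].shape != shape:
            problems.append(f"{key}: expected {shape}, got {npz[key].shape}")
    return problems

# Synthetic stand-in archive; every value here is a placeholder.
buf = io.BytesIO()
np.savez(
    buf,
    coef=np.zeros((23, 1434), np.float32),
    intercept=np.zeros(1434, np.float32),
    input_columns=np.array([f"f{i}" for i in range(23)]),
    au_columns=np.array([f"f{i}" for i in range(20)]),
    pose_columns=np.array(["Pitch", "Yaw", "Roll"]),
    mean_aligned_mesh=np.zeros((478, 3), np.float32),
    mean_low_au_mesh=np.zeros((478, 3), np.float32),
    reference_anchors=np.zeros((12, 3), np.float32),
    anchor_indices=np.zeros(12, np.int32),
    n_components=np.int32(83),
    model_card=np.array("placeholder"),
    training_metadata=np.array(json.dumps({"placeholder_key": 1})),
)
buf.seek(0)

npz = np.load(buf)  # with the real file: np.load("au_to_mesh_pls_v2.npz")
problems = check_npz(npz)
# 0-d string arrays unwrap to a Python str with .item().
metadata = json.loads(npz["training_metadata"].item())
```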


	## Applying pose post-hoc (optional)

	The model is trained with pose covariates as input but at inference users
	typically pass `pose=0` to get the canonical (frontal) deformation. To render
	the result at any chosen head pose, apply a rigid transform AFTER prediction:

	```python
	import numpy as np
	from scipy.spatial.transform import Rotation
	m = np.load("au_to_mesh_pls_v2.npz")

	# 1) Predict canonical mesh from AU vector at pose=0:
	au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0
	x = np.concatenate([au, np.zeros(3)]) # (23,)
	flat = x @ m["coef"] + m["intercept"] # (1434,)
	# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
	canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1) # (478, 3)

	# 2) Apply user-chosen rigid pose (example angles in radians; substitute your own):
	pitch, yaw, roll = np.deg2rad([10.0, 20.0, 0.0])
	R = Rotation.from_euler("xyz", [pitch, yaw, roll]).as_matrix() # (3, 3)
	posed_mesh = canonical_mesh @ R.T # (478, 3)
	# Or for re-projection onto image: s * (canonical_mesh @ R.T) + t
	```
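
The `s * (mesh @ R.T) + t` form can equivalently be packed into a single 4x4 homogeneous matrix, which is convenient when chaining transforms. The sketch below is illustrative only: `similarity_to_matrix` and `apply_homogeneous` are hypothetical helpers, and the mesh is random stand-in data.

```python
import numpy as np

def similarity_to_matrix(R, s=1.0, t=(0.0, 0.0, 0.0)):
    """Pack rotation R (3x3), scale s and translation t into a 4x4 matrix."""
    M = np.eye(4)
    M[:3, :3] = s * R
    M[:3, 3] = t
    return M

def apply_homogeneous(mesh, M):
    """Apply a 4x4 transform to an (N, 3) array of row-vector points."""
    homo = np.concatenate([mesh, np.ones((len(mesh), 1))], axis=1)  # (N, 4)
    return (homo @ M.T)[:, :3]

# Example: 20 degree rotation about the y axis, plus scale and shift.
yaw = np.deg2rad(20.0)
R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
              [ 0.0,         1.0, 0.0        ],
              [-np.sin(yaw), 0.0, np.cos(yaw)]])
s, t = 1.5, np.array([0.1, -0.2, 0.0])

rng = np.random.default_rng(0)
mesh = rng.normal(size=(478, 3))  # stand-in for canonical_mesh

direct = s * (mesh @ R.T) + t
via_matrix = apply_homogeneous(mesh, similarity_to_matrix(R, s, t))
```

Both routes give the same posed vertices; the matrix form composes with other 4x4 transforms by plain matrix multiplication.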

	Two ways to control pose at inference:
	- Render-time rotation (recommended for clean separation): set `pose=0` in
	the input vector, then rotate the predicted mesh post-hoc as above.
	- Pose-conditioned prediction (if you want the model's residual pose-correlated
	bias): pass non-zero pose values directly into the input vector. The model was
	trained with pose+pose×AU interactions, so non-zero pose changes the prediction
	in addition to the rigid rotation needed at render time.

	For most use cases, render-time rotation is cleaner — the model's AU-driven
	deformation stays canonical and pose is purely a viewing transform.

	Convention notes:
	- MP canonical: y-axis UP, x-axis to subject's left, z-axis out of face.
	Standard right-handed.
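	- `Rotation.from_euler("xyz", ...)` with lowercase axes is SciPy's extrinsic
	convention (R = Rz·Ry·Rx); the intrinsic form (R = Rx·Ry·Rz) is uppercase
	`"XYZ"`. Check which one matches your img2pose Pitch/Yaw/Roll output.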
	- If you have MP's full 4x4 `facial_transformation_matrix` (M), apply via
	`posed = (np.concatenate([canonical, np.ones((478, 1))], axis=1) @ M.T)[:, :3]`.
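
Euler conventions are easy to mix up, so it can help to check numerically which matrix product a given `from_euler` call produces before trusting a rendered pose. The sketch below compares SciPy's lowercase and uppercase axis strings against explicit single-axis products; the angle values are arbitrary examples.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Arbitrary example angles in radians.
pitch, yaw, roll = np.deg2rad([10.0, 25.0, -5.0])

# Single-axis rotations, where there is no ordering ambiguity.
Rx = Rotation.from_euler("x", pitch).as_matrix()
Ry = Rotation.from_euler("y", yaw).as_matrix()
Rz = Rotation.from_euler("z", roll).as_matrix()

# Lowercase axes: extrinsic rotations (about the fixed frame).
extrinsic = Rotation.from_euler("xyz", [pitch, yaw, roll]).as_matrix()
# Uppercase axes: intrinsic rotations (about the moving frame).
intrinsic = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()

lower_is_zyx_product = np.allclose(extrinsic, Rz @ Ry @ Rx)
upper_is_xyz_product = np.allclose(intrinsic, Rx @ Ry @ Rz)
```

If the pose angles come from another tool, running the same comparison against a rotation matrix exported by that tool is a simple way to pick the matching string.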

	## Figures

	![au_solid_panels.png](./au_solid_panels.png)

	![au_solid_neutral_vs_activated.png](./au_solid_neutral_vs_activated.png)

	![au_compare_overlay.png](./au_compare_overlay.png)

	![au_effect_maps.png](./au_effect_maps.png)