Uploaded by ljchang: "Upload folder using huggingface_hub" (commit aa7e600, verified 25 days ago)


	---
	license: mit
	library_name: py-feat
	tags:
	- face-action-units
	- facs
	- facial-landmarks
	- regression
	- pls
	- mediapipe
	- py-feat
	---

	# BS → AU PLS (v2)

	> Predicts 20 FACS Action Unit (AU) intensities from 52 MediaPipe blendshapes via Cheong-style PLS regression, so that MPDetector can output AU columns comparable to the output of Detector's xgb AU model.

	Linear PLS regression mapping 52 MediaPipe blendshapes to 20 FACS Action Unit
	intensities. Used to give MPDetector an AU output stream comparable to
	Detector's xgb AU model output.

	## Training data
	- 350,568 frames from ~10,000 CelebV-HQ celebrity videos
	- Paired blendshapes (from MPDetector's mp_blendshapes head) with AU intensities
	  (from Detector with the img2pose face model and xgb AU model, run on the same frames)
	- Pose-filtered to |yaw| ≤ 40° and |pitch| ≤ 30°, leaving 347,897 of 350,568 frames
	- 9,994 unique videos after filtering
	- See /Storage/Projects/mp_blendshapes for the underlying training pipeline
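The pose filter above reduces to a per-frame boolean mask. Below is a minimal sketch. It assumes per-frame yaw and pitch estimates in degrees are already available as arrays. The names yaw, pitch, video_id and pose_mask are hypothetical, and the values are toy data, not CelebV-HQ frames.

```python
import numpy as np

# Thresholds from the training-data description, in degrees (inclusive bounds).
MAX_ABS_YAW = 40.0
MAX_ABS_PITCH = 30.0


def pose_mask(yaw, pitch):
    """Return a boolean mask of frames with |yaw| <= 40 and |pitch| <= 30."""
    yaw = np.asarray(yaw, dtype=float)
    pitch = np.asarray(pitch, dtype=float)
    return (np.abs(yaw) <= MAX_ABS_YAW) & (np.abs(pitch) <= MAX_ABS_PITCH)


# Hypothetical per-frame head-pose estimates and source-video ids (toy values).
yaw = np.array([5.0, -42.0, 39.9, 12.0, -10.0])
pitch = np.array([0.0, 3.0, -30.0, 31.0, 8.0])
video_id = np.array(["a", "a", "b", "b", "c"])

keep = pose_mask(yaw, pitch)
print(keep.sum(), "frames kept from", np.unique(video_id[keep]).size, "videos")
```

Counting unique video ids after masking is how a figure like "9,994 unique videos after filtering" would be obtained.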

	## Method
	- scikit-learn PLSRegression(n_components=20, scale=True), following the Cheong / Py-Feat recipe
	- 20 components, i.e. full rank for the 52 × 20 coefficient matrix (rank ≤ min(n_features=52, n_targets=20))
	- Linear features only. Pairwise blendshape interactions were tested in nested CV
	  (2026-05-05) and lowered out-of-sample R² (0.236 for blendshapes only vs 0.214
	  with pairwise terms), with 4-6× higher fold-to-fold standard deviation
	- No pose covariates: kept pose-agnostic since MP blendshapes are
	designed to be pose-canonical
	- No clipping during training (clip predictions to [0, 1] at inference if desired)

	## Performance (3-fold GroupKFold by video_id)
	- Overall R² = 0.236 ± 0.008 (variance-weighted across the 20 AUs)
	- Overall MAE = 0.171
	- Strong on AU06, AU12 and AU43 (R² ≈ 0.50)
	- Moderate on AU01, AU02 and AU09 (R² ≈ 0.29)
	- Weak on AU11, AU15 and AU28 (R² < 0.10); these AUs are rare or visually subtle

	## Citation context
	- Cheong et al. 2023 (Py-Feat AU visualization model, tutorial 06 by E. Jolly):
	  affine-aligned 68-point dlib landmarks -> 20 AUs via PLS, trained on EmotioNet/DISFA/BP4D
	  (~13K class-balanced rows). Our model scales this recipe up to MediaPipe blendshapes
	  on in-the-wild celebrity data with roughly 25× as many rows (~348K frames).

	## Inference
	The saved coef + intercept absorb PLSRegression's scale=True standardization,
	so inference is a single matmul:

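	```python
	import numpy as np
	npz = np.load("bs_to_au_pls_v2.npz")  # blendshapes: (n, 52), columns in npz["bs_columns"] order
	au = blendshapes @ npz["coef"] + npz["intercept"]  # (n, 52) @ (52, 20) + (20,) = (n, 20)
	au = np.clip(au, 0.0, 1.0)  # optional
	```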
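The standardization can be absorbed because scale=True predicts in z-scored space and then un-scales the output, and both steps are affine. Expanding ((x - x_mean) / x_std) @ B * y_std + y_mean gives x @ coef + intercept, with coef = B / x_std[:, None] * y_std and intercept = y_mean - ((x_mean / x_std) @ B) * y_std. Below is a minimal NumPy check of that algebra. It assumes the standardized-space weights B and the training means and standard deviations are known. All values are random placeholders, and fold_standardization is a hypothetical helper, not part of py-feat or scikit-learn.

```python
import numpy as np


def fold_standardization(B, x_mean, x_std, y_mean, y_std):
    """Fold input z-scoring and output un-scaling into a single affine map.

    Two-step model: y = ((x - x_mean) / x_std) @ B * y_std + y_mean
    Folded model:   y = x @ coef + intercept
    """
    coef = (B / x_std[:, None]) * y_std[None, :]        # (n_in, n_out)
    intercept = y_mean - ((x_mean / x_std) @ B) * y_std  # (n_out,)
    return coef, intercept


rng = np.random.default_rng(2)
n_bs, n_au = 52, 20
B = rng.normal(size=(n_bs, n_au))  # placeholder standardized-space weights
x_mean, x_std = rng.uniform(0.0, 1.0, n_bs), rng.uniform(0.1, 1.0, n_bs)
y_mean, y_std = rng.uniform(0.0, 1.0, n_au), rng.uniform(0.1, 1.0, n_au)
X = rng.uniform(0.0, 1.0, size=(5, n_bs))

coef, intercept = fold_standardization(B, x_mean, x_std, y_mean, y_std)
two_step = ((X - x_mean) / x_std) @ B * y_std + y_mean
assert np.allclose(X @ coef + intercept, two_step)
```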

	## File format
	NPZ with:
	- coef (52, 20) float32 — linear weights, rows match bs_columns
	- intercept (20,) float32 — bias, matches au_columns
	- bs_columns (52,) str — input feature order
	- au_columns (20,) str — output AU order
	- model_card () str — this markdown
	- training_metadata () str — JSON dict with training context

	Loader: np.load("bs_to_au_pls_v2.npz") — no extra deps needed.
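Below is a self-contained loading sketch that follows the documented NPZ layout. It writes a dummy file with placeholder column names and metadata (not the real model) and reloads it. It then builds the input in bs_columns order from a name-to-values mapping and applies the affine map. The helpers stack_in_model_order and predict_aus are hypothetical. The sketch assumes the string fields are stored as NumPy unicode arrays. If the real file stores them as object arrays, np.load would need allow_pickle=True.

```python
import json
import os
import tempfile

import numpy as np


def stack_in_model_order(features, model):
    """Build the (n, 52) input from a {blendshape name: (n,) values} mapping, in bs_columns order."""
    return np.column_stack(
        [np.asarray(features[str(name)], dtype=np.float32) for name in model["bs_columns"]]
    )


def predict_aus(model, blendshapes, clip=False):
    """Apply au = blendshapes @ coef + intercept, optionally clipped to [0, 1]."""
    X = np.asarray(blendshapes, dtype=np.float32)
    if X.shape[1] != model["coef"].shape[0]:
        raise ValueError(f"expected {model['coef'].shape[0]} columns, got {X.shape[1]}")
    au = X @ model["coef"] + model["intercept"]
    return np.clip(au, 0.0, 1.0) if clip else au


# Dummy file with the documented keys and shapes; names and values are placeholders.
path = os.path.join(tempfile.mkdtemp(), "bs_to_au_pls_v2.npz")
np.savez(
    path,
    coef=np.zeros((52, 20), dtype=np.float32),
    intercept=np.full(20, 0.1, dtype=np.float32),
    bs_columns=np.array([f"bs_{i}" for i in range(52)]),
    au_columns=np.array([f"AU_{j}" for j in range(20)]),
    model_card=np.array("# placeholder card"),
    training_metadata=np.array(json.dumps({"note": "placeholder"})),
)

# Copy every array out so the file handle can be closed.
with np.load(path) as npz:
    model = {key: npz[key] for key in npz.files}

meta = json.loads(model["training_metadata"].item())  # 0-d str array -> Python str
rng = np.random.default_rng(0)
features = {str(name): rng.uniform(size=4) for name in model["bs_columns"]}
au = predict_aus(model, stack_in_model_order(features, model), clip=True)
print(au.shape, meta)
```

Building the input by column name, rather than assuming a column order, guards against a blendshape source that emits the 52 values in a different order than bs_columns.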