py-feat/face_multitask_v2

Image Classification

facial-expression-analysis

emotion-recognition

gaze-estimation

ljchang committed on Jun 3

Commit 0137f1d · verified · 1 Parent(s): 64546af

Upload README.md with huggingface_hub

README.md +76 -0

	@@ -0,0 +1,76 @@

+---
+license: other
+license_name: research-only
+tags:
+  - facial-action-units
+  - facial-expression-recognition
+  - gaze-estimation
+  - facial-landmarks
+  - multitask
+  - py-feat
+library_name: py-feat
+---
+# face_multitask_v2
+Multitask facial-behavior model for **[py-feat](https://github.com/cosanlab/py-feat)**'s
+`Detectorv2`. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly
+predicts, from one forward pass:
+- **20 facial action units** (FACS, presence probabilities)
+- **7-class emotion** (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
+- **valence / arousal** (continuous, [-1, 1])
+- **gaze** (yaw, pitch)
+- **478-point MediaPipe-topology face mesh**
+- **6-DoF head pose**
+`face_multitask_v2.pt` contains `{model: state_dict, config: ModelV2Config, ...}`.
+py-feat loads it via `feat.multitask.inference.MultitaskModel` inside `Detectorv2`.
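As a rough illustration of how the outputs listed above might be consumed, the sketch below maps a 7-way emotion probability vector to a label and thresholds AU presence probabilities. The class index order is assumed to follow the list above, and the `{name: probability}` AU mapping and the 0.5 cutoff are hypothetical; none of these are confirmed by this card or by py-feat's actual output format.

```python
# Assumed class order, copied from the list above; the checkpoint's real
# index order is not confirmed by this card.
EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Anger"]


def top_emotion(probs):
    """Return (label, probability) for the highest-scoring emotion class."""
    if len(probs) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} probabilities, got {len(probs)}")
    idx = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[idx], probs[idx]


def active_aus(au_probs, threshold=0.5):
    """Return sorted names of AUs whose presence probability meets the threshold.

    `au_probs` is a hypothetical {name: probability} mapping, and 0.5 is an
    illustrative cutoff rather than a value taken from this card.
    """
    return sorted(name for name, p in au_probs.items() if p >= threshold)
```

A per-AU threshold tuned on validation data would likely work better than a single cutoff; the fixed value here only keeps the sketch short.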
+## Architecture
+- **Backbone:** ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretrain).
+- **AU head:** ANFL graph: AFG (per-AU branches) → FGG (k-NN cosine GCN) → cosine-similarity
+  classifier. No MEFL edge head: an ablation showed it was inert and ~4× slower (see
+  notes).
+- **Unified features:** backbone global-average-pooled (GAP) features concatenated (∥) with
+  projected mesh-xy coordinates, feeding the emotion and gaze heads (OpenFace 3.0-style).
+  The emotion head is additionally conditioned on the AU probabilities.
+- **Gaze:** L2CS-style 4-FC head. **Landmark / pose:** MLP heads (frozen after stage 1).
+- Multi-task loss with Kendall homoscedastic uncertainty weighting. ~37.6M params.
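The uncertainty weighting named in the last bullet can be sketched as follows. This uses one common simplified form of the Kendall et al. objective, `sum_i exp(-s_i) * L_i + s_i`, where `s_i` is a learned log-variance per task. The exact form used to train this model (for example, any 1/2 factors or per-task variants) is not stated in this card, and the task names are illustrative.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    task_losses: {task_name: scalar loss L_i}
    log_vars:    {task_name: log-variance s_i} (learned parameters in practice)

    A larger s_i down-weights task i, while the additive s_i term penalizes
    inflating the variance without bound.
    """
    if task_losses.keys() != log_vars.keys():
        raise ValueError("task_losses and log_vars must have the same keys")
    return sum(
        math.exp(-log_vars[task]) * task_losses[task] + log_vars[task]
        for task in task_losses
    )
```

With all `s_i = 0` this reduces to a plain sum of the task losses, which is a convenient sanity check when wiring it into a training loop.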
+Input: a 256×256 aligned face crop → center-crop 224 → ImageNet normalize.
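The input pipeline above (256×256 crop, center-crop to 224, ImageNet normalization) can be sketched in plain Python. The mean and standard deviation are the standard ImageNet values; RGB channel order and scaling 8-bit pixels to [0, 1] before normalizing are assumptions here, and py-feat's own preprocessing may differ in detail.

```python
# Standard ImageNet per-channel statistics (RGB order assumed).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(size=256, crop=224):
    """Return (left, top, right, bottom) of a centered square crop."""
    if crop > size:
        raise ValueError("crop must not exceed the input size")
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply ImageNet mean/std."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

For the sizes stated above, the crop box trims a 16-pixel border from each side of the aligned face chip.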
+## What changed vs v1 (face_multitask_v1)
+- Dropped 4 poorly represented AUs (AU16/18/27/45), leaving **20 AUs** (v1 had 24).
+- Dropped **Contempt**, leaving **7-class** emotion (v1 had 8 classes).
+- Removed the MEFL edge head (≈4× faster model forward pass; single-image latency roughly matches OpenFace 3.0).
+- Emotion improved substantially (RAF-DB macro-F1 0.751 → 0.849); valence/arousal strengthened (Aff-Wild2 CCC 0.79 / 0.74 → 0.816 / 0.783).
+## Benchmarks (py-feat end-to-end harness)
+| task | v2 | v1 (v2.3) | OpenFace 3.0 |
+|---|---|---|---|
+| DISFA+ AU macro-F1 (8-AU matched) | 0.756 | 0.757 | 0.732 |
+| AffectNet-7 macro-F1 | 0.330 | 0.264 | ~0.40* |
+| RAF-DB test macro-F1 | **0.849** | 0.751 | — |
+| Aff-Wild2 valence / arousal CCC | **0.816 / 0.783** | 0.79 / 0.74 | (no V/A head) |
+| MPIIGaze mean angular err | 3.92° | 3.33° | 2.56° |
+| Gaze360 mean angular err | **6.81°** | 5.81° | 10.6° |
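+\* OpenFace 3.0 (OF3) emotion was measured on our face chips (approximate crop); its paper reports higher.
+AU and emotion rows report macro-F1, the valence/arousal row reports CCC, and gaze rows report mean angular error in degrees (lower is better), all on held-out external test sets.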
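For readers unfamiliar with the two score types in the table, the sketch below gives the standard definitions of the concordance correlation coefficient (CCC) and macro-averaged F1. It is a reference implementation of the textbook formulas, not the code of the py-feat harness, which this card does not show.

```python
def concordance_cc(pred, target):
    """Concordance correlation coefficient: 2*cov / (var_p + var_t + (mean_p - mean_t)^2)."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and targets, which is why it is the usual choice for valence/arousal regression.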
+## Usage
+```python
+from feat.detector_v2 import Detectorv2
+det = Detectorv2(device="cuda")           # downloads this model
+fex = det.detect("face.jpg", data_type="image")
+```
+## Notes / license
+Research use only. Trained on a mix of public AU, emotion, gaze, and landmark datasets;
+respect each source dataset's license. The optional ArcFace identity branch in
+Detectorv2 is a separate non-commercial-research model (not part of this checkpoint).