Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,76 @@
---
license: other
license_name: research-only
tags:
- facial-action-units
- facial-expression-recognition
- gaze-estimation
- facial-landmarks
- multitask
- py-feat
library_name: py-feat
---

# face_multitask_v2

Multitask facial-behavior model for **[py-feat](https://github.com/cosanlab/py-feat)**'s
`Detectorv2`. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly
predicts, from one forward pass:

- **20 facial action units** (FACS, presence probabilities)
- **7-class emotion** (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
- **valence / arousal** (continuous, [-1, 1])
- **gaze** (yaw, pitch)
- **478-point MediaPipe-topology face mesh**
- **6-DoF head pose**

`face_multitask_v2.pt` contains `{model: state_dict, config: ModelV2Config, ...}`.
py-feat loads it via `feat.multitask.inference.MultitaskModel` inside `Detectorv2`.
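
As a rough illustration of how the per-face outputs listed above might be decoded, the sketch below turns seven emotion logits into a label and thresholds AU presence probabilities. It assumes that class indices follow the order in which the emotions are listed here and that 0.5 is a sensible presence threshold; neither is confirmed by this card. `decode_emotion` and `active_aus` are hypothetical helpers, not py-feat functions.

```python
import math

# Assumption: class indices follow the order in which the emotions are listed above.
EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise", "Fear", "Disgust", "Anger"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotion(logits):
    """Return (label, probability) for the most likely emotion class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def active_aus(au_probs, threshold=0.5):
    """Return indices of AUs whose presence probability reaches the threshold.

    The 0.5 default is an assumption, not a value taken from the checkpoint.
    """
    return [i for i, p in enumerate(au_probs) if p >= threshold]


label, prob = decode_emotion([0.1, 2.3, -0.4, 0.0, -1.2, -0.8, 0.3])
print(label, round(prob, 3))
print(active_aus([0.9, 0.2, 0.55, 0.05]))
```

In practice `Detectorv2` presumably performs this decoding itself; the sketch only shows the shape of the outputs.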

## Architecture

- **Backbone:** ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretrain).
- **AU head:** ANFL graph: AFG (per-AU branches) → FGG (k-NN cosine GCN) →
  cosine-similarity classifier. (No MEFL edge head: an ablation showed it was inert
  and ~4× slower; see notes.)
- **Unified features:** backbone GAP features concatenated (∥) with projected mesh-xy,
  feeding the emotion and gaze heads (OpenFace 3.0-style). The emotion head is
  additionally conditioned on the AU probabilities.
- **Gaze:** L2CS-style 4-FC head. **Landmark / pose:** MLP heads (frozen after stage 1).
- **Loss:** multi-task loss with Kendall homoscedastic uncertainty weighting. ~37.6M parameters.
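
For readers unfamiliar with Kendall-style uncertainty weighting, here is a minimal sketch of the commonly used simplified form, in which each task loss L_i is scaled by exp(-s_i) and the log-variance s_i is added as a penalty. The exact variant used to train this checkpoint is not stated here, so treat the formula and the `uncertainty_weighted_loss` name as assumptions. During training the s_i would be learnable parameters; here they are plain floats.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


# With all s_i = 0 every weight is 1, so the total is the plain sum of the losses.
print(uncertainty_weighted_loss([0.7, 1.2, 0.3], [0.0, 0.0, 0.0]))
# A larger s_i down-weights that task's loss but adds s_i itself as a penalty.
print(uncertainty_weighted_loss([0.7, 1.2, 0.3], [0.0, 1.0, 0.0]))
```

The penalty term keeps the optimizer from driving every weight to zero by inflating all s_i.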

Input: a 256×256 aligned face crop, center-cropped to 224×224 and then ImageNet-normalized.
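
The input pipeline above can be sketched with two small helpers. This is only an illustration: it assumes RGB channel order, scaling to [0, 1] before normalization, and the standard ImageNet mean/std values, none of which this card spells out. `center_crop_box` and `normalize_pixel` are hypothetical names; `Detectorv2` presumably handles preprocessing internally.

```python
# Standard ImageNet channel statistics (RGB order); assumed, not read from the checkpoint.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(size=256, crop=224):
    """Return (left, top, right, bottom) of a centered square crop."""
    if crop > size:
        raise ValueError("crop must not exceed the input size")
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply ImageNet mean/std per channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


print(center_crop_box())  # (16, 16, 240, 240)
print(normalize_pixel((124, 116, 104)))
```

The crop box can be passed to any imaging library's crop call; a 256 → 224 center crop trims 16 pixels from each side.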

## What changed vs v1 (face_multitask_v1)

- Dropped 4 poorly represented AUs (AU16/18/27/45) → **20 AUs** (v1 had 24).
- Dropped **Contempt** → **7-class** emotion (v1 had 8 classes).
- Removed the MEFL edge head (≈4× faster model forward pass; single-image latency ≈ OpenFace 3.0).
- Emotion improved substantially (RAF-DB macro-F1 0.751 → 0.849); valence/arousal added/strengthened.

## Benchmarks (py-feat end-to-end harness)

| Task | v2 | v1 (v2.3) | OpenFace 3.0 |
|---|---|---|---|
| DISFA+ AU macro-F1 (8-AU matched) | 0.756 | 0.757 | 0.732 |
| AffectNet-7 macro-F1 | 0.330 | 0.264 | ~0.40* |
| RAF-DB test macro-F1 | **0.849** | 0.751 | n/a |
| Aff-Wild2 valence / arousal CCC | **0.816 / 0.783** | 0.79 / 0.74 | (no V/A head) |
| MPIIGaze mean angular err | 3.92° | 3.33° | 2.56° |
| Gaze360 mean angular err | 6.81° | 5.81° | 10.6° |

\* OpenFace 3.0 (OF3) emotion was measured on our chips (approximate crop); its paper reports higher.
Numbers are macro-F1 / CCC / degrees on held-out external test sets. Higher is better for
macro-F1 and CCC; lower is better for angular error.
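
For reference, the two score-type metrics in the table can be computed as in the sketch below. The benchmark harness's own implementation is not shown here, so this assumes the usual definitions: CCC with population variances, and macro-F1 as the unweighted mean of per-class F1, with a class that has no true positives, false positives, or false negatives scoring 0.

```python
def ccc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def macro_f1(pred, target, num_classes):
    """Unweighted mean of per-class F1 over integer class labels."""
    scores = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(pred, target))
        fp = sum(p == c and t != c for p, t in zip(pred, target))
        fn = sum(p != c and t == c for p, t in zip(pred, target))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


print(ccc([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))
print(macro_f1([0, 1, 1, 0], [0, 1, 0, 0], num_classes=2))
```

Unlike Pearson correlation, CCC also penalizes shifts in mean and scale, which is why it is commonly reported for valence/arousal.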

## Usage

```python
from feat.detector_v2 import Detectorv2
det = Detectorv2(device="cuda")  # downloads this model
fex = det.detect("face.jpg", data_type="image")
```

## Notes / license

Research use only. Trained on a mix of public AU, emotion, gaze, and landmark datasets;
respect each source dataset's license. The optional ArcFace identity branch in
`Detectorv2` is a separate non-commercial-research model (not part of this checkpoint).