ljchang committed
Commit 0137f1d · verified · 1 Parent(s): 64546af

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ license: other
+ license_name: research-only
+ tags:
+ - facial-action-units
+ - facial-expression-recognition
+ - gaze-estimation
+ - facial-landmarks
+ - multitask
+ - py-feat
+ library_name: py-feat
+ ---
+
+ # face_multitask_v2
+
+ Multitask facial-behavior model for **[py-feat](https://github.com/cosanlab/py-feat)**'s
+ `Detectorv2`. A single ConvNeXt V2-Tiny backbone with lightweight task heads jointly
+ predicts, in a single forward pass:
+
+ - **20 facial action units** (FACS, presence probabilities)
+ - **7-class emotion** (Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger)
+ - **valence / arousal** (continuous, [-1, 1])
+ - **gaze** (yaw, pitch)
+ - **478-point MediaPipe-topology face mesh**
+ - **6-DoF head pose**
+
+ `face_multitask_v2.pt` contains `{model: state_dict, config: ModelV2Config, ...}`.
+ py-feat loads it via `feat.multitask.inference.MultitaskModel` inside `Detectorv2`.
+
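As a minimal sketch of the checkpoint layout described above, the helper below splits an already-loaded checkpoint dictionary into its weights and config. The name `split_checkpoint` is hypothetical and not part of py-feat; only the `model` and `config` keys come from this README, and the extra entries in the stand-in dictionary are placeholders. In practice the dictionary would presumably come from `torch.load` on the `.pt` file.

```python
def split_checkpoint(ckpt):
    """Return (state_dict, config) from a checkpoint-like dict.

    Hypothetical helper for illustration only; py-feat's own loader is
    `feat.multitask.inference.MultitaskModel`.
    """
    missing = [key for key in ("model", "config") if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    return ckpt["model"], ckpt["config"]


# Stand-in for the real file contents (placeholder keys and values).
# With PyTorch installed, something like
# torch.load("face_multitask_v2.pt", map_location="cpu") would supply this dict;
# because the file stores a config object, a trusted copy may need
# weights_only=False on recent PyTorch versions (an assumption, not verified here).
fake_ckpt = {"model": {"backbone.weight": [0.0]}, "config": {"num_aus": 20}, "epoch": 1}
state_dict, config = split_checkpoint(fake_ckpt)
print(sorted(state_dict), config)
```

Failing early on a missing key keeps a truncated or mismatched download from surfacing later as an obscure shape error.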
+ ## Architecture
+
+ - **Backbone:** ConvNeXt V2-Tiny (FCMAE + ImageNet-22k/1k pretraining).
+ - **AU head:** ANFL graph: AFG (per-AU branches) → FGG (k-NN cosine GCN) →
+ cosine-similarity classifier. (No MEFL edge head: an ablation showed it was inert and
+ ~4× slower; see notes.)
+ - **Unified features:** backbone global-average-pooled (GAP) features concatenated (∥)
+ with projected mesh-xy, feeding the emotion and gaze heads (OF3-style). The emotion head
+ is additionally conditioned on the AU probabilities.
+ - **Gaze:** L2CS-style 4-FC head. **Landmark / pose:** MLP heads (frozen after stage 1).
+ - Multi-task loss with Kendall homoscedastic uncertainty weighting. ~37.6M parameters.
+
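The uncertainty weighting named in the last bullet can be sketched as follows. This assumes the common log-variance parameterization of Kendall et al., where each task `i` has a learned scalar `s_i = log σ_i²` and contributes `0.5 * exp(-s_i) * L_i + 0.5 * s_i`; the README does not state which variant the training code uses, and the function name is hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with homoscedastic uncertainty weights.

    task_losses: per-task loss values L_i.
    log_vars:    per-task s_i = log(sigma_i^2), learned in a real model.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        # exp(-s) down-weights noisy tasks; the +s term keeps s from growing without bound.
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


# With all log-variances at zero, every task gets the same weight of 0.5.
print(uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0]))
```

In a trainable model the `log_vars` would be parameters optimized jointly with the network, so the balance between AU, emotion, gaze, and landmark losses is learned and not hand-tuned.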
+ Input: a 256×256 aligned face crop → center-crop to 224×224 → ImageNet mean/std normalization.
+
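The input pipeline above can be illustrated with a dependency-free sketch. It assumes RGB channel order, pixel values scaled to [0, 1] before normalization, and the standard ImageNet mean/std constants; the function names are hypothetical, and the actual py-feat preprocessing may differ in detail.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # standard ImageNet RGB channel means
IMAGENET_STD = (0.229, 0.224, 0.225)   # standard ImageNet RGB channel stds


def center_crop_box(size=256, crop=224):
    """Return (top, left, bottom, right) of a centered crop x crop window."""
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop


def normalize_pixel(rgb):
    """Scale 0..255 RGB to 0..1, then apply per-channel mean/std normalization."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def preprocess(image):
    """Square nested list of (r, g, b) tuples -> 3 x 224 x 224 nested lists (CHW)."""
    top, left, bottom, right = center_crop_box(len(image), 224)
    cropped = [[normalize_pixel(px) for px in row[left:right]] for row in image[top:bottom]]
    # Reorder from height x width x channel to channel x height x width.
    return [[[px[c] for px in row] for row in cropped] for c in range(3)]


print(center_crop_box(256, 224))
```

For a 256×256 chip the crop discards a 16-pixel border on each side, which is why the alignment step that produces the chip matters for accuracy.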
+ ## What changed vs v1 (face_multitask_v1)
+
+ - Dropped 4 poorly represented AUs (AU16/18/27/45) → **20 AUs** (v1 had 24).
+ - Dropped **Contempt** → **7-class** emotion (v1 had 8).
+ - Removed the MEFL edge head (≈4× faster model forward pass; single-image latency ≈ OpenFace 3.0).
+ - Emotion improved substantially (RAF-DB macro-F1 0.751 → 0.849); valence/arousal also
+ improved (Aff-Wild2 CCC 0.79 / 0.74 → 0.816 / 0.783).
+
+ ## Benchmarks (py-feat end-to-end harness)
+
+ | Task | v2 | v1 (v2.3) | OpenFace 3.0 |
+ |---|---|---|---|
+ | DISFA+ AU macro-F1 (8-AU matched) | 0.756 | 0.757 | 0.732 |
+ | AffectNet-7 macro-F1 | 0.330 | 0.264 | ~0.40* |
+ | RAF-DB test macro-F1 | **0.849** | 0.751 | — |
+ | Aff-Wild2 valence / arousal CCC | **0.816 / 0.783** | 0.79 / 0.74 | (no V/A head) |
+ | MPIIGaze mean angular error | 3.92° | 3.33° | 2.56° |
+ | Gaze360 mean angular error | **6.81°** | 5.81° | 10.6° |
+
+ \* OpenFace 3.0 (OF3) emotion was measured on our face chips (approximate crop); its paper
+ reports higher scores.
+ Numbers are macro-F1 / CCC / degrees on held-out external test sets; higher is better for
+ macro-F1 and CCC, lower is better for angular error.
+
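For readers unfamiliar with two of the metrics in the table, here is a minimal sketch of the concordance correlation coefficient (CCC) and of a mean-angular-error building block for gaze. It assumes population (divide-by-n) statistics for CCC and yaw/pitch given in radians with one particular unit-vector convention; the benchmark harness may use different conventions, and the function names are hypothetical.

```python
import math


def ccc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    # Undefined (division by zero) when both inputs are the same constant.
    return 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def gaze_vector(yaw, pitch):
    """Unit 3D direction for yaw/pitch in radians (one assumed convention)."""
    return (math.cos(pitch) * math.sin(yaw), math.sin(pitch), math.cos(pitch) * math.cos(yaw))


def angular_error_deg(pred, target):
    """Angle in degrees between two (yaw, pitch) gaze directions."""
    a, b = gaze_vector(*pred), gaze_vector(*target)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))


print(ccc([1, 2, 3], [2, 3, 4]), angular_error_deg((0.0, 0.0), (math.pi / 2, 0.0)))
```

Unlike Pearson correlation, CCC penalizes a constant offset between prediction and target, which is why it is the usual choice for valence/arousal regression.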
+ ## Usage
+
+ ```python
+ from feat.detector_v2 import Detectorv2
+ det = Detectorv2(device="cuda") # downloads this model
+ fex = det.detect("face.jpg", data_type="image")
+ ```
+
+ ## Notes / license
+
+ Research use only. Trained on a mix of public AU, emotion, gaze, and landmark datasets;
+ respect each source dataset's license. The optional ArcFace identity branch in
+ `Detectorv2` is a separate non-commercial-research model (not part of this checkpoint).