Evan Li committed on
Commit
abec69f
·
1 Parent(s): dcbc516

mivolo, freckles

Dockerfile CHANGED
@@ -23,15 +23,21 @@ RUN mkdir -p models && \
23
  "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
24
 
25
  # Pre-download InsightFace buffalo_l bundle (detection + recognition +
26
- # age + gender + landmarks) so the first /analyze call doesn't pay the
27
- # ~280MB download. The bundle auto-extracts under ~/.insightface/models/
28
- # on first use.
29
  RUN mkdir -p /root/.insightface/models && \
30
  wget -q -O /root/.insightface/models/buffalo_l.zip \
31
  "https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
32
  cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
33
 
34
- # unzip wasn't in the system deps; add it via the apt block at the top.
35
 
36
  COPY . .
37
 
 
23
  "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
24
 
25
  # Pre-download InsightFace buffalo_l bundle (detection + recognition +
26
+ # 106 landmarks) so the first /analyze call doesn't pay the ~280MB
27
+ # download. The bundle auto-extracts under ~/.insightface/models/ on
28
+ # first use.
29
  RUN mkdir -p /root/.insightface/models && \
30
  wget -q -O /root/.insightface/models/buffalo_l.zip \
31
  "https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
32
  cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
33
 
34
+ # MiVOLO checkpoint is too large for the Docker image (~300 MB) - it's
35
+ # lazy-downloaded from HF Hub on first AgeAnalyzer init, via
36
+ # AGE_HF_REPO_ID (defaults to iitolstykh/mivolo_d1). To bundle it at
37
+ # build time instead, uncomment the lines below and pin the filename.
38
+ # RUN mkdir -p models && \
39
+ # wget -q -O models/mivolo_d1.pth.tar \
40
+ # "https://huggingface.co/iitolstykh/mivolo_d1/resolve/main/mivolo_d1.pth.tar"
41
 
42
  COPY . .
43
 
README.md CHANGED
@@ -18,9 +18,11 @@ face-recognition embedding and an aesthetic "chopped score."
18
 
19
  | # | Component | Model | Task | Size |
20
  |---|-----------|-------|------|------|
21
- | 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding + age + gender + 106 landmarks (99.83% LFW) | ~280 MB |
22
  | 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes - geometric features, smiling, mouth-open | ~4 MB |
23
- | 3 | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
 
 
24
  | 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
25
  | 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
26
  | 6 | Color analysis | (no model - OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
 
18
 
19
  | # | Component | Model | Task | Size |
20
  |---|-----------|-------|------|------|
21
+ | 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding + 106 landmarks (99.83% LFW) | ~280 MB |
22
  | 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes - geometric features, smiling, mouth-open | ~4 MB |
23
+ | 3a | Age | MiVOLO (VOLO transformer, regression) | Continuous age (~4 yr MAE on AgeDB) | ~300 MB |
24
+ | 3b | Gender | `dima806/fairface_gender_image_detection` (ViT) | Binary gender + softmax confidence (~93.4% acc) | ~340 MB |
25
+ | 3c | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
26
  | 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
27
  | 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
28
  | 6 | Color analysis | (no model - OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
analyzers/aesthetic_analyzer.py CHANGED
@@ -4,41 +4,46 @@ AestheticAnalyzer - "chopped score" aggregator.
4
  What it does
5
  ------------
6
  Reads the merged result dict from every other analyzer and produces a
7
- single numeric "chopped score" plus a per-factor breakdown. Higher
8
- score = more chopped = less conventionally attractive (by the
9
  arbitrary rubric encoded here). The breakdown lets you tune weights
10
  or flip polarity client-side without rerunning inference.
11
 
12
  Score composition
13
  -----------------
14
- The final chopped_score is a weighted blend of two sources:
15
 
16
- 1. **Learned beauty regressor** (from BeautyAnalyzer, trained on
17
- SCUT-FBP5500): a number in [1.0, 5.0] reflecting averaged human
18
- ratings. We rescale to a 0–100 unattractiveness axis. This is the
19
- dominant signal when available - heavy weight (default 0.7).
 
 
20
 
21
- 2. **Rule-based factor sum**: penalties for asymmetry, wrinkles,
22
- uneven skin, freckles, and asymmetric smile; bonuses for defined
23
- jawline, prominent cheekbones, clear skin, balanced lips, and
24
- dimples. Each factor is documented in `_compute_rule_score`.
25
- This is the only signal when the regressor isn't loaded
26
- (BeautyAnalyzer returns None).
27
 
28
  Blend math
29
  ----------
30
- if beauty_score available:
31
- chopped = 0.7 * (100 - beauty_norm) + 0.3 * rule_score
32
- else:
33
- chopped = rule_score
34
  chopped is clamped to [0, 100].
35
 
36
  Subjectivity disclaimer
37
  -----------------------
38
- Every weight in this file is a guess. "Beauty" is subjective, culturally
39
- biased, and reductive. Treat the score as an in-joke metric; never
40
- expose it as objective truth. The UI gates the row behind a
41
- Settings toggle off-by-default for that reason.
42
 
43
  Note: this analyzer takes no image input - it reads the merged result
44
  dict produced by every other analyzer that ran ahead of it.
@@ -48,12 +53,36 @@ from typing import Any
48
 
49
 
50
  # How much weight the learned beauty regressor gets when both signals
51
- # are available. The rule-based sum gets the rest (1 - this).
52
- LEARNED_WEIGHT = 0.7
 
53
 
54
  # Baseline score. Penalties push up, bonuses pull down.
55
  BASELINE = 50.0
56
 
57
 
58
  class AestheticAnalyzer:
59
  def __init__(self):
@@ -71,15 +100,16 @@ class AestheticAnalyzer:
71
 
72
  beauty_norm = merged.get("beauty_score_norm")
73
  if beauty_norm is not None:
74
- # Beauty regressor: 0 = ugly, 100 = beautiful (per SCUT-FBP5500
75
- # scaling). Flip to unattractiveness axis: 100 - x.
76
- learned_unattractive = 100.0 - float(beauty_norm)
77
  chopped = (
78
  LEARNED_WEIGHT * learned_unattractive
79
  + (1.0 - LEARNED_WEIGHT) * rule_score
80
  )
 
 
 
81
  breakdown["learned_unattractive"] = round(
82
- LEARNED_WEIGHT * learned_unattractive - LEARNED_WEIGHT * BASELINE, 2
83
  )
84
  breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
85
  else:
@@ -105,9 +135,14 @@ class AestheticAnalyzer:
105
  def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
106
  """Hand-tuned weighted sum over previously-extracted attributes.
107
 
108
- Returns (score, breakdown_dict). The breakdown gives each factor's
109
- signed contribution so a UI can show *why* a score landed where
110
- it did. Score starts at BASELINE (50) and moves up/down.
 
 
 
 
 
111
  """
112
  score = BASELINE
113
  breakdown: dict[str, float] = {}
@@ -118,13 +153,13 @@ class AestheticAnalyzer:
118
  # MediaPipe `facial_asymmetry_score` is already in this range.
119
  asym = d.get("facial_asymmetry_score")
120
  if isinstance(asym, (int, float)):
121
- penalty = float(asym) * 18.0
122
  score += penalty
123
  breakdown["asymmetry_penalty"] = round(penalty, 2)
124
 
125
  # Wrinkle level from SegFormer + OpenCV Laplacian classification.
126
  wrinkle_penalty_map = {
127
- "smooth": 0.0, "slight": 4.0, "moderate": 8.0, "prominent": 12.0,
128
  }
129
  wrinkle = d.get("wrinkle_level")
130
  if wrinkle in wrinkle_penalty_map:
@@ -132,36 +167,38 @@ class AestheticAnalyzer:
132
  score += penalty
133
  breakdown["wrinkle_penalty"] = penalty
134
 
135
- # Skin uniformity = LAB L* std-dev over the face mask. Higher
136
- # std means uneven tone (shadows, blemishes). Scale up to +8.
137
  uniformity = d.get("skin_uniformity")
138
  if isinstance(uniformity, (int, float)) and uniformity > 0:
139
- # Empirically, uniformity in clean skin is ~8-15; very uneven
140
- # skin pushes into the 20-30 range.
141
- penalty = min(8.0, max(0.0, (float(uniformity) - 10.0) * 0.5))
142
  score += penalty
143
  breakdown["skin_unevenness_penalty"] = round(penalty, 2)
144
 
145
- # Freckles/moles bucket.
146
- freckle_penalty_map = {"none": 0.0, "few": 1.0, "some": 3.0, "many": 5.0}
147
- freckles = d.get("freckles_or_moles")
148
- if freckles in freckle_penalty_map:
149
- penalty = freckle_penalty_map[freckles]
150
- score += penalty
151
- breakdown["freckles_penalty"] = penalty
152
 
153
  # Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
154
  smile_asym = d.get("smile_asymmetry")
155
  if isinstance(smile_asym, (int, float)):
156
- penalty = min(6.0, float(smile_asym) * 30.0)
157
  score += penalty
158
  breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
159
 
160
  # Photo-quality penalty: sunglasses/mask hide features and the
161
  # model is guessing more. Mild penalty, not a personal trait.
162
  if d.get("wearing_sunglasses") or d.get("wearing_mask"):
163
- score += 5.0
164
- breakdown["obstruction_penalty"] = 5.0
 
 
 
 
 
 
 
165
 
166
  # ── Bonuses (pull score down = less chopped) ─────────────────
167
 
@@ -169,52 +206,65 @@ class AestheticAnalyzer:
169
  # take the stronger of the two contributions.
170
  jaw_bonus = 0.0
171
  jaw_type = d.get("jawline_type")
172
- jaw_type_bonus_map = {"sharp": -10.0, "strong": -6.0, "soft": 0.0}
173
  if jaw_type in jaw_type_bonus_map:
174
  jaw_bonus = jaw_type_bonus_map[jaw_type]
175
  jaw_angle = d.get("jawline_angle")
176
  if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
177
  # Sharp angles add more on top of the categorical signal.
178
- jaw_bonus = min(jaw_bonus, -10.0)
179
  if jaw_bonus:
180
  score += jaw_bonus
181
  breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
182
 
183
  # Cheekbone prominence.
184
- cheek_bonus_map = {"high": -7.0, "moderate": -3.0, "flat": 0.0}
185
  cheek = d.get("cheekbone_prominence")
186
  if cheek in cheek_bonus_map:
187
  bonus = cheek_bonus_map[cheek]
188
  score += bonus
189
  breakdown["cheekbone_bonus"] = bonus
190
 
191
- # Skin clarity bonus when the texture score is low (i.e. smooth skin).
192
- # skin_texture_score is the same Laplacian-density value used by
193
- # wrinkle_level; ≤4 is "smooth" territory.
194
  texture = d.get("skin_texture_score")
195
- if isinstance(texture, (int, float)) and 0 < texture <= 4:
196
- score -= 9.0
197
- breakdown["skin_clarity_bonus"] = -9.0
198
 
199
  # Lip fullness - "average" and "full" both read as healthy.
200
  lip = d.get("lip_fullness")
201
- if lip in {"average", "full"}:
202
- score -= 5.0
203
- breakdown["lip_fullness_bonus"] = -5.0
 
 
 
204
 
205
  # Defined cupid's bow.
206
  if d.get("cupids_bow") == "defined":
207
- score -= 3.0
208
- breakdown["cupids_bow_bonus"] = -3.0
209
 
210
  # Normal eye spacing.
211
  if d.get("eye_spacing") == "average":
212
- score -= 4.0
213
- breakdown["eye_spacing_bonus"] = -4.0
 
 
 
 
 
 
 
214
 
215
  # Dimples - small bonus when the MediaPipe heuristic fires.
216
  if d.get("possible_dimples"):
 
 
 
 
 
217
  score -= 3.0
218
- breakdown["dimples_bonus"] = -3.0
219
 
220
  return score, breakdown
 
4
  What it does
5
  ------------
6
  Reads the merged result dict from every other analyzer and produces a
7
+ single numeric chopped_score (0-100) plus a per-factor breakdown.
8
+ Higher = more chopped = less conventionally attractive (by the
9
  arbitrary rubric encoded here). The breakdown lets you tune weights
10
  or flip polarity client-side without rerunning inference.
11
 
12
  Score composition
13
  -----------------
14
+ Final chopped_score is a weighted blend of two sources:
15
 
16
+ 1. **Learned beauty regressor** (BeautyAnalyzer, SCUT-FBP5500):
17
+ raw score in [1.0, 5.0] mapped to a "stretched" 0-100 axis. The
18
+ raw model output is fairly concentrated around 2.5-3.5 (most
19
+ faces), which would cluster scores near the middle. We stretch
20
+ the [2.0, 4.0] sub-range to fill [0, 100] so the tier system
21
+ gets meaningful spread.
22
 
23
+ 2. **Rule-based factor sum**: scaled penalties + bonuses on top of
24
+ a baseline of 50. Factors are documented in `_compute_rule_score`.
 
 
 
 
25
 
26
  Blend math
27
  ----------
28
+ learned_unattractive = stretched_unattractive(beauty_norm)
29
+ chopped = 0.6 * learned_unattractive + 0.4 * rule_score
 
 
30
  chopped is clamped to [0, 100].
31
 
32
+ Tuning history
33
+ --------------
34
+ - Original rule factors landed almost everyone at 50 ± 10. Scaled up
35
+ by ~1.6× to give attributes more bite. The learned signal got a
36
+ matching stretch (75 → 100, 25 → 0) so it isn't drowned out.
37
+ - Freckles/moles penalty was removed - the underlying SegFormer-based
38
+ detector was unreliable and the metric was effectively penalising
39
+ shadows and pores.
40
+
41
  Subjectivity disclaimer
42
  -----------------------
43
+ Every weight in this file is a guess. "Beauty" is subjective,
44
+ culturally biased, and reductive. Treat the score as an in-joke
45
+ metric; never expose it as objective truth. The UI gates the row
46
+ behind a Settings toggle off-by-default for that reason.
47
 
48
  Note: this analyzer takes no image input - it reads the merged result
49
  dict produced by every other analyzer that ran ahead of it.
 
53
 
54
 
55
  # How much weight the learned beauty regressor gets when both signals
56
+ # are available. 0.6 leaves more headroom for the rule sum than the
57
+ # old 0.7 - small change, but the rule factors moved harder.
58
+ LEARNED_WEIGHT = 0.6
59
 
60
  # Baseline score. Penalties push up, bonuses pull down.
61
  BASELINE = 50.0
62
 
63
+ # Stretch the learned-beauty 0-100 axis so it covers the chopped
64
+ # spectrum more dramatically. Beauty norms in (LEARNED_NORM_LO, _HI)
65
+ # map linearly to (0, 100). Below the lo bound is "fully chopped"
66
+ # territory (learned_unattractive = 100); above the hi bound is
67
+ # "fully gigachad" (learned_unattractive = 0).
68
+ LEARNED_NORM_LO = 25.0 # raw score ≈ 2.0
69
+ LEARNED_NORM_HI = 75.0 # raw score ≈ 4.0
70
+
71
+
72
+ def _stretch_unattractive(beauty_norm: float) -> float:
73
+ """Map BeautyAnalyzer's 0-100 normalised score to a stretched
74
+ unattractiveness 0-100. 75 → 0 (gigachad), 25 → 100 (megachopped).
75
+ """
76
+ if beauty_norm is None:
77
+ return 50.0
78
+ # Invert the axis then linearly stretch (LEARNED_NORM_LO, _HI).
79
+ unattractive = 100.0 - float(beauty_norm)
80
+ # unattractive: 25 (gigachad-ish) -> 75 (megachopped-ish) before stretching
81
+ span = (100.0 - LEARNED_NORM_LO) - (100.0 - LEARNED_NORM_HI) # = 50
82
+ lo_after_invert = 100.0 - LEARNED_NORM_HI # = 25
83
+ stretched = (unattractive - lo_after_invert) / span * 100.0
84
+ return max(0.0, min(100.0, stretched))
85
+
86
 
87
  class AestheticAnalyzer:
88
  def __init__(self):
 
100
 
101
  beauty_norm = merged.get("beauty_score_norm")
102
  if beauty_norm is not None:
103
+ learned_unattractive = _stretch_unattractive(float(beauty_norm))
 
 
104
  chopped = (
105
  LEARNED_WEIGHT * learned_unattractive
106
  + (1.0 - LEARNED_WEIGHT) * rule_score
107
  )
108
+ # Show the learned contribution as a signed offset from
109
+ # baseline so the breakdown reads consistently with rule
110
+ # factors.
111
  breakdown["learned_unattractive"] = round(
112
+ LEARNED_WEIGHT * (learned_unattractive - BASELINE), 2
113
  )
114
  breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
115
  else:
 
135
  def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
136
  """Hand-tuned weighted sum over previously-extracted attributes.
137
 
138
+ Returns (score, breakdown_dict). The breakdown gives each
139
+ factor's signed contribution so a UI can show *why* a score
140
+ landed where it did. Score starts at BASELINE (50) and moves
141
+ up/down.
142
+
143
+ All penalty / bonus magnitudes are scaled up vs. the original
144
+ implementation so attributes contribute meaningfully to the
145
+ spread instead of nudging everyone toward 50.
146
  """
147
  score = BASELINE
148
  breakdown: dict[str, float] = {}
 
153
  # MediaPipe `facial_asymmetry_score` is already in this range.
154
  asym = d.get("facial_asymmetry_score")
155
  if isinstance(asym, (int, float)):
156
+ penalty = float(asym) * 30.0 # was 18; more asymmetry → more spread
157
  score += penalty
158
  breakdown["asymmetry_penalty"] = round(penalty, 2)
159
 
160
  # Wrinkle level from SegFormer + OpenCV Laplacian classification.
161
  wrinkle_penalty_map = {
162
+ "smooth": 0.0, "slight": 6.0, "moderate": 14.0, "prominent": 20.0,
163
  }
164
  wrinkle = d.get("wrinkle_level")
165
  if wrinkle in wrinkle_penalty_map:
 
167
  score += penalty
168
  breakdown["wrinkle_penalty"] = penalty
169
 
170
+ # Skin uniformity = LAB L* std-dev over the eroded interior
171
+ # face mask. Higher std means uneven tone (shadows, blemishes).
172
  uniformity = d.get("skin_uniformity")
173
  if isinstance(uniformity, (int, float)) and uniformity > 0:
174
+ # Empirically uniformity sits ~8-15 in clean skin and
175
+ # 20-30 in uneven skin; scale up to +14 (was 8).
176
+ penalty = min(14.0, max(0.0, (float(uniformity) - 10.0) * 1.0))
177
  score += penalty
178
  breakdown["skin_unevenness_penalty"] = round(penalty, 2)
179
 
180
+ # NOTE: freckles_or_moles penalty deliberately removed - the
181
+ # detector was too noisy (shadows / pores counted as spots).
 
 
 
 
 
182
 
183
  # Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
184
  smile_asym = d.get("smile_asymmetry")
185
  if isinstance(smile_asym, (int, float)):
186
+ penalty = min(10.0, float(smile_asym) * 50.0) # was max 6, factor 30
187
  score += penalty
188
  breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
189
 
190
  # Photo-quality penalty: sunglasses/mask hide features and the
191
  # model is guessing more. Mild penalty, not a personal trait.
192
  if d.get("wearing_sunglasses") or d.get("wearing_mask"):
193
+ score += 8.0 # was 5
194
+ breakdown["obstruction_penalty"] = 8.0
195
+
196
+ # Hat coverage also obscures hairline / forehead - small fixed
197
+ # penalty so a hat doesn't accidentally help the score by
198
+ # blocking unflattering hair.
199
+ if d.get("hat_detected"):
200
+ score += 4.0
201
+ breakdown["hat_obscuration_penalty"] = 4.0
202
 
203
  # ── Bonuses (pull score down = less chopped) ─────────────────
204
 
 
206
  # take the stronger of the two contributions.
207
  jaw_bonus = 0.0
208
  jaw_type = d.get("jawline_type")
209
+ jaw_type_bonus_map = {"sharp": -16.0, "strong": -10.0, "soft": 0.0}
210
  if jaw_type in jaw_type_bonus_map:
211
  jaw_bonus = jaw_type_bonus_map[jaw_type]
212
  jaw_angle = d.get("jawline_angle")
213
  if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
214
  # Sharp angles add more on top of the categorical signal.
215
+ jaw_bonus = min(jaw_bonus, -16.0)
216
  if jaw_bonus:
217
  score += jaw_bonus
218
  breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
219
 
220
  # Cheekbone prominence.
221
+ cheek_bonus_map = {"high": -11.0, "moderate": -5.0, "flat": 0.0}
222
  cheek = d.get("cheekbone_prominence")
223
  if cheek in cheek_bonus_map:
224
  bonus = cheek_bonus_map[cheek]
225
  score += bonus
226
  breakdown["cheekbone_bonus"] = bonus
227
 
228
+ # Skin clarity bonus when the texture score is low (smooth).
 
 
229
  texture = d.get("skin_texture_score")
230
+ if isinstance(texture, (int, float)) and 0 < texture <= 8:
231
+ score -= 14.0 # was -9
232
+ breakdown["skin_clarity_bonus"] = -14.0
233
 
234
  # Lip fullness - "average" and "full" both read as healthy.
235
  lip = d.get("lip_fullness")
236
+ if lip == "full":
237
+ score -= 8.0 # was -5
238
+ breakdown["lip_fullness_bonus"] = -8.0
239
+ elif lip == "average":
240
+ score -= 4.0
241
+ breakdown["lip_fullness_bonus"] = -4.0
242
 
243
  # Defined cupid's bow.
244
  if d.get("cupids_bow") == "defined":
245
+ score -= 5.0 # was -3
246
+ breakdown["cupids_bow_bonus"] = -5.0
247
 
248
  # Normal eye spacing.
249
  if d.get("eye_spacing") == "average":
250
+ score -= 6.0 # was -4
251
+ breakdown["eye_spacing_bonus"] = -6.0
252
+
253
+ # Symmetric face (independent of asymmetry penalty above; we
254
+ # explicitly reward very symmetric faces rather than just not
255
+ # penalising them).
256
+ if isinstance(asym, (int, float)) and asym < 0.15:
257
+ score -= 6.0
258
+ breakdown["symmetry_bonus"] = -6.0
259
 
260
  # Dimples - small bonus when the MediaPipe heuristic fires.
261
  if d.get("possible_dimples"):
262
+ score -= 5.0 # was -3
263
+ breakdown["dimples_bonus"] = -5.0
264
+
265
+ # Eyes-open bonus (closed eyes makes a face look worse).
266
+ if d.get("eyes_open") is True:
267
  score -= 3.0
268
+ breakdown["eyes_open_bonus"] = -3.0
269
 
270
  return score, breakdown
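
The stretch-and-blend arithmetic described in the new docstring can be checked in isolation. Below is a minimal sketch, assuming the constants shown in this diff (`LEARNED_WEIGHT = 0.6`, `BASELINE = 50.0`, stretch bounds 25 and 75). The function names `stretch_unattractive` and `blend` are illustrative only and are not the analyzer's actual API.

```python
# Constants copied from the diff above; everything else is a sketch.
LEARNED_WEIGHT = 0.6
BASELINE = 50.0
LEARNED_NORM_LO = 25.0
LEARNED_NORM_HI = 75.0


def stretch_unattractive(beauty_norm: float) -> float:
    """Invert a 0-100 beauty norm and stretch the (LO, HI) band to 0-100."""
    span = LEARNED_NORM_HI - LEARNED_NORM_LO
    # Algebraically the same as inverting (100 - x) and then rescaling.
    stretched = (LEARNED_NORM_HI - beauty_norm) / span * 100.0
    return max(0.0, min(100.0, stretched))


def blend(beauty_norm: float, rule_score: float) -> tuple[float, float]:
    """Return (chopped_score, learned contribution as offset from baseline)."""
    learned = stretch_unattractive(beauty_norm)
    chopped = LEARNED_WEIGHT * learned + (1.0 - LEARNED_WEIGHT) * rule_score
    chopped = max(0.0, min(100.0, chopped))  # final clamp to [0, 100]
    learned_offset = round(LEARNED_WEIGHT * (learned - BASELINE), 2)
    return chopped, learned_offset


if __name__ == "__main__":
    for norm, rule in [(75.0, 50.0), (50.0, 50.0), (25.0, 50.0), (60.0, 40.0)]:
        print(norm, rule, blend(norm, rule))
```

With a neutral rule score of 50, a beauty norm of 50 stays at the midpoint, while norms at or beyond the 25 and 75 bounds saturate the learned term, which is the extra spread the tuning notes aim for.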
analyzers/age_analyzer.py ADDED
@@ -0,0 +1,200 @@
+ """
+ AgeAnalyzer - MiVOLO age regression.
+
+ Model
+ -----
+ - Architecture : MiVOLO (Multi-Input VOLO transformer). Designed to
+                  predict age + gender from face + (optional) person
+                  crops. Significantly better at young faces than
+                  InsightFace's bundled genderage head.
+ - Reference    : Kuprashevich & Tolstykh, "MiVOLO: Multi-input
+                  Transformer for Age and Gender Estimation", 2023
+                  (https://github.com/WildChlamydia/MiVOLO).
+ - Reported MAE : ~3.99 yrs (AgeDB), ~5.50 yrs (IMDB). Best published
+                  numbers for face-only age regression.
+ - License      : MIT (code) / weights see model card.
+
+ Inputs
+ ------
+ img_rgb : np.ndarray (H, W, 3) uint8. Should already be a face crop
+           (the orchestrator hands us the InsightFace-cropped image
+           via `_crop_to_face` in app.py).
+
+ Outputs (dict)
+ --------------
+ age_estimate     : float years (continuous regression output)
+ age_range        : string bucket derived from age_estimate for
+                    legacy compatibility with the UI
+ age_confidence   : 1.0 (MiVOLO is a regressor - no probability)
+ age_model_source : "mivolo" | "insightface_fallback" | "unavailable"
+
+ Weight loading
+ --------------
+ The MiVOLO package needs both its Python implementation AND a
+ checkpoint file. Resolution order:
+
+ 1. Local file at `models/mivolo_d1.pth.tar` (drop in after download).
+ 2. Hugging Face Hub via `AGE_HF_REPO_ID` env var, falling back to the
+    public `iitolstykh/mivolo_d1` repo.
+
+ If neither resolves (or the `mivolo` package isn't installed), the
+ analyzer reports `age_model_source: unavailable` and the orchestrator
+ falls back to InsightFace's age head with the piecewise calibration
+ that used to live in InsightFaceAnalyzer.
+ """
+
+ import os
+ from typing import Any
+
+ import numpy as np
+
+ # MiVOLO ships its own model class; defer the import so the module
+ # still loads when the package isn't installed.
+ try:
+     from mivolo.model.mi_volo import MiVOLO
+     HAS_MIVOLO = True
+ except ImportError:
+     HAS_MIVOLO = False
+
+
+ # Same legacy bucket schema we use elsewhere so UI rows that show
+ # `age_range` keep rendering whatever the source.
+ AGE_BUCKETS = [
+     (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
+     (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
+     (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
+ ]
+
+ LOCAL_WEIGHTS_PATH = os.environ.get(
+     "AGE_WEIGHTS_PATH", "models/mivolo_d1.pth.tar"
+ )
+ HF_REPO_ID = os.environ.get("AGE_HF_REPO_ID", "iitolstykh/mivolo_d1")
+ HF_FILENAME = os.environ.get("AGE_HF_FILENAME", "mivolo_d1.pth.tar")
+
+
+ class AgeAnalyzer:
+     def __init__(self):
+         self.model = None
+         self.source = "unavailable"
+
+         if not HAS_MIVOLO:
+             print(
+                 "[AgeAnalyzer] mivolo package not installed. Install with "
+                 "`pip install git+https://github.com/WildChlamydia/MiVOLO.git`. "
+                 "Falling back to InsightFace age until then."
+             )
+             return
+
+         weights_path = self._resolve_weights_path()
+         if weights_path is None:
+             print(
+                 "[AgeAnalyzer] No usable weights - local not found at "
+                 f"{LOCAL_WEIGHTS_PATH}, HF download failed. "
+                 "Set AGE_WEIGHTS_PATH or AGE_HF_REPO_ID."
+             )
+             return
+
+         try:
+             # use_persons=False → face-only inference (we don't have
+             # body detection in this pipeline). MiVOLO is best with
+             # both but face-only is still ~5 yrs MAE.
+             self.model = MiVOLO(
+                 ckpt_path=weights_path,
+                 device="cuda" if self._cuda_available() else "cpu",
+                 half=False,
+                 use_persons=False,
+                 disable_faces=False,
+                 verbose=False,
+             )
+             self.source = "mivolo"
+             print(f"[AgeAnalyzer] Loaded MiVOLO weights from {weights_path}")
+         except Exception as exc:
+             print(f"[AgeAnalyzer] MiVOLO init failed: {exc}")
+             self.model = None
+
+     @staticmethod
+     def _cuda_available() -> bool:
+         try:
+             import torch
+             return torch.cuda.is_available()
+         except ImportError:
+             return False
+
+     @staticmethod
+     def _resolve_weights_path() -> str | None:
+         """Local file wins, HF Hub is the fallback."""
+         if os.path.exists(LOCAL_WEIGHTS_PATH):
+             return LOCAL_WEIGHTS_PATH
+         try:
+             from huggingface_hub import hf_hub_download
+             return hf_hub_download(repo_id=HF_REPO_ID, filename=HF_FILENAME)
+         except Exception as exc:
+             print(f"[AgeAnalyzer] HF Hub download failed: {exc}")
+             return None
+
+     def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
+         if self.model is None:
+             return self._empty_result()
+
+         try:
+             # MiVOLO expects a face crop. img_rgb is already cropped
+             # to the InsightFace bbox by the orchestrator. The
+             # package handles its own resize / normalise; we just
+             # hand over the uint8 RGB array.
+             age = self._run_inference(img_rgb)
+         except Exception as exc:
+             print(f"[AgeAnalyzer] Inference failed: {exc}")
+             return self._empty_result()
+
+         return {
+             "age_estimate": round(float(age), 1),
+             "age_range": self._bucket_age(age),
+             "age_confidence": 1.0,
+             "age_model_source": self.source,
+         }
+
+     def _run_inference(self, face_crop: np.ndarray) -> float:
+         """Run MiVOLO on a single face crop and return the predicted age.
+
+         The mivolo package's high-level `Predictor` wraps detection +
+         regression together. We've already done detection via
+         InsightFace, so we use the lower-level `MiVOLO` model directly
+         on the pre-cropped face. The model's `inference` method takes
+         a batched tensor; build it from the crop.
+         """
+         import torch
+         # MiVOLO expects 224×224 ImageNet-normalised face input.
+         from torchvision import transforms
+
+         preprocess = transforms.Compose([
+             transforms.ToPILImage(),
+             transforms.Resize((224, 224)),
+             transforms.ToTensor(),
+             transforms.Normalize(
+                 mean=[0.485, 0.456, 0.406],
+                 std=[0.229, 0.224, 0.225],
+             ),
+         ])
+         face_tensor = preprocess(face_crop).unsqueeze(0).to(self.model.device)
+
+         # When use_persons=False, MiVOLO's inference takes a face
+         # tensor and a None (or zeros) person tensor of matching batch.
+         with torch.no_grad():
+             ages, _genders = self.model.inference(face_tensor, None)
+         return float(ages[0].item())
+
+     @staticmethod
+     def _bucket_age(age: float) -> str:
+         for lo, hi, label in AGE_BUCKETS:
+             if lo <= age < hi:
+                 return label
+         return "unknown"
+
+     @staticmethod
+     def _empty_result() -> dict[str, Any]:
+         return {
+             "age_estimate": None,
+             "age_range": "unknown",
+             "age_confidence": 0.0,
+             "age_model_source": "unavailable",
+         }
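
The "local file wins, Hub is the fallback" order used by `_resolve_weights_path` can be exercised without network access. The following is a minimal sketch under stated assumptions: the helper name `resolve_weights_path` and its `download` parameter are hypothetical stand-ins introduced for illustration, where `download` takes the place of the `hf_hub_download(...)` call in the file above.

```python
import os
from typing import Callable, Optional


def resolve_weights_path(
    local_path: str,
    download: Callable[[], str],
) -> Optional[str]:
    """Return a usable checkpoint path, or None if nothing resolves."""
    # 1. A local file always wins, so the downloader is never called.
    if os.path.exists(local_path):
        return local_path
    # 2. Otherwise try the downloader; any failure means "unavailable".
    try:
        return download()
    except Exception as exc:
        print(f"[sketch] download failed: {exc}")
        return None


if __name__ == "__main__":
    # "cached.pth.tar" is a placeholder for whatever path a real download returns.
    print(resolve_weights_path("models/mivolo_d1.pth.tar", lambda: "cached.pth.tar"))
```

Passing the downloader in as a callable keeps the resolution order testable offline; a `None` result corresponds to the `age_model_source: unavailable` path described in the module docstring.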
analyzers/gender_analyzer.py ADDED
@@ -0,0 +1,92 @@
1
+ """
2
+ GenderAnalyzer β€” FairFace ViT for binary gender.
3
+
4
+ Model
5
+ -----
6
+ - HF repo : dima806/fairface_gender_image_detection
7
+ - Arch : Vision Transformer (ViT-B/16), 86M params
8
+ - Trained : FairFace dataset (gender-balanced)
9
+ - Reported : 93.4% accuracy
10
+ - License : Apache 2.0
11
+ - Source : https://huggingface.co/dima806/fairface_gender_image_detection
12
+
13
+ Why this instead of InsightFace's bundled head
14
+ ----------------------------------------------
15
+ InsightFace's `genderage.onnx` does argmax only β€” it doesn't expose a
16
+ softmax confidence. Borderline calls and confident calls look identical
17
+ in the UI ("Female 100%"). FairFace gives a real probability so the UI
18
+ can show graded confidence.
19
+
20
+ Inputs
21
+ ------
22
+ img_rgb : np.ndarray (H, W, 3) uint8. Typically a face crop produced
23
+ by `_crop_to_face` in app.py.
24
+
25
+ Outputs (dict)
26
+ --------------
27
+ gender : "male" | "female"
28
+ gender_confidence : float in [0, 1] (argmax softmax probability)
29
+ gender_distribution : { "male": p, "female": p }
30
+ gender_model_source : "fairface" | "unavailable"
31
+ """
32
+
33
+ from typing import Any
34
+
35
+ from PIL import Image
36
+ from transformers import pipeline
37
+
38
+
39
+ MODEL_ID = "dima806/fairface_gender_image_detection"
40
+ LABELS = ["male", "female"]
41
+
42
+
43
+ class GenderAnalyzer:
44
+ def __init__(self):
45
+ self.classifier = None
46
+ try:
47
+ self.classifier = pipeline("image-classification", model=MODEL_ID)
48
+ except Exception as exc:
49
+ print(f"[GenderAnalyzer] Failed to load {MODEL_ID}: {exc}")
50
+
51
+ def analyze(self, img_rgb) -> dict[str, Any]:
52
+ if self.classifier is None:
53
+ return self._empty_result()
54
+
55
+ try:
56
+ pil = Image.fromarray(img_rgb)
57
+ preds = self.classifier(pil, top_k=2)
58
+ except Exception as exc:
59
+ print(f"[GenderAnalyzer] Prediction failed: {exc}")
60
+ return self._empty_result()
61
+
62
+ if not preds:
63
+ return self._empty_result()
64
+
65
+ # FairFace labels can come back capitalised; normalise to lowercase.
66
+ distribution = {label: 0.0 for label in LABELS}
67
+ for pred in preds:
68
+ label = str(pred["label"]).strip().lower()
69
+ if label in distribution:
70
+ distribution[label] = round(float(pred["score"]), 3)
71
+
72
+ # Top class wins; expose the actual softmax score as confidence.
73
+ top = preds[0]
74
+ top_label = str(top["label"]).strip().lower()
75
+ if top_label not in LABELS:
76
+ return self._empty_result()
77
+
78
+ return {
79
+ "gender": top_label,
80
+ "gender_confidence": round(float(top["score"]), 3),
81
+ "gender_distribution": distribution,
82
+ "gender_model_source": "fairface",
83
+ }
84
+
85
+ @staticmethod
86
+ def _empty_result() -> dict[str, Any]:
87
+ return {
88
+ "gender": "unknown",
89
+ "gender_confidence": 0.0,
90
+ "gender_distribution": {label: 0.0 for label in LABELS},
91
+ "gender_model_source": "unavailable",
92
+ }
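The normalisation step in `GenderAnalyzer.analyze` can be sketched without loading the model. This is a minimal illustration, assuming predictions arrive as a list of `{"label", "score"}` dicts as in the diff above. The helper name `summarise_gender` is hypothetical, and the defensive sort is an addition that is not part of the commit.

```python
from typing import Any

LABELS = ["male", "female"]


def summarise_gender(preds: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold classifier predictions into the analyzer's output shape.

    Hypothetical helper: mirrors the label normalisation in the diff,
    but sorts the predictions first instead of trusting their order.
    """
    empty = {
        "gender": "unknown",
        "gender_confidence": 0.0,
        "gender_distribution": {label: 0.0 for label in LABELS},
        "gender_model_source": "unavailable",
    }
    if not preds:
        return empty

    # Highest score first, so ranked[0] is always the winning class.
    ranked = sorted(preds, key=lambda p: float(p["score"]), reverse=True)

    # Labels may come back capitalised; normalise to lowercase.
    distribution = {label: 0.0 for label in LABELS}
    for pred in ranked:
        label = str(pred["label"]).strip().lower()
        if label in distribution:
            distribution[label] = round(float(pred["score"]), 3)

    top_label = str(ranked[0]["label"]).strip().lower()
    if top_label not in LABELS:
        return empty

    return {
        "gender": top_label,
        "gender_confidence": distribution[top_label],
        "gender_distribution": distribution,
        "gender_model_source": "fairface",
    }
```

Keeping this step as a pure function makes the label handling testable without downloading the checkpoint.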
analyzers/insightface_analyzer.py CHANGED
@@ -1,12 +1,16 @@
1
  """
2
- InsightFaceAnalyzer: detection + age + gender + recognition embedding.
3
 
4
  Model
5
  -----
6
  - Package : `insightface` (https://github.com/deepinsight/insightface)
7
  - Bundle : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
8
- - Components : SCRFD-10GF detector, ArcFace 512-d recognition,
9
- 2d106 + 3d68 landmark regressors, age + gender heads
 
 
 
 
10
  - Size : ~280 MB (ONNX, mixed FP16/FP32)
11
  - License : weights research-only; code Apache 2.0
12
  - Source : https://github.com/deepinsight/insightface/tree/master/python-package
@@ -20,19 +24,14 @@ Outputs (dict)
20
  face_bbox : [x1, y1, x2, y2] in pixel coordinates
21
  face_confidence : SCRFD detection score
22
  face_embedding : list[float] of length 512 (ArcFace, L2-normalised)
23
- age_estimate : float years (regression head, not bucketed)
24
- age_range : string bucket derived from age_estimate for
25
- backwards compatibility with the legacy UI
26
- gender : "male" | "female"
27
- gender_confidence : 1.0 by default (InsightFace doesn't expose a
28
- gender softmax score; the head is argmax-only)
29
- _insight_landmarks_2d : list of (x, y) tuples, 106 points (internal)
30
 
31
  Accuracy
32
  --------
33
  - Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
34
- - Age / gender heads are widely used but lack a clean published metric.
35
- In practice age MAE is ~5 years and gender ~94-96%.
36
 
37
  Notes
38
  -----
@@ -41,7 +40,6 @@ face when multiple are detected; the rest of the pipeline assumes a
41
  single subject.
42
  """
43
 
44
- import os
45
  from typing import Any
46
 
47
  import numpy as np
@@ -57,21 +55,6 @@ except ImportError:
57
 
58
  MODEL_NAME = "buffalo_l"
59
 
60
- # InsightFace's genderage head is known to overshoot adult ages by
61
- # roughly 5 years in informal testing (no published calibration). We
62
- # subtract a fixed offset to undo this bias; clamp to ≥1 so we never
63
- # emit negative ages for kids. Override at runtime via the
64
- # AGE_OFFSET_YEARS env var if you want to tune for your dataset.
65
- AGE_OFFSET_YEARS = float(os.environ.get("AGE_OFFSET_YEARS", "5"))
66
-
67
- # Age buckets used by the legacy UI. We derive these from the regression
68
- # output so existing screens keep working.
69
- AGE_BUCKETS = [
70
- (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
71
- (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
72
- (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
73
- ]
74
-
75
 
76
  class InsightFaceAnalyzer:
77
  def __init__(self):
@@ -79,20 +62,18 @@ class InsightFaceAnalyzer:
79
  if not HAS_INSIGHTFACE:
80
  print(
81
  "[InsightFaceAnalyzer] insightface package not installed; "
82
- "detection, age, gender, and recognition will degrade to 'unknown'."
83
  )
84
  return
85
 
86
  try:
87
- # Buffalo_L bundle auto-resolves under ~/.insightface/models/.
88
  # CPUExecutionProvider is the right default for HF Spaces;
89
- # ctx_id=0 + 'CUDAExecutionProvider' would be the GPU path.
90
  self.app = FaceAnalysis(
91
  name=MODEL_NAME,
92
  providers=["CPUExecutionProvider"],
93
  )
94
- # det_size=(640, 640) is the canonical SCRFD input. Smaller
95
- # speeds inference but loses small faces.
96
  self.app.prepare(ctx_id=-1, det_size=(640, 640))
97
  except Exception as exc:
98
  print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
@@ -128,29 +109,10 @@ class InsightFaceAnalyzer:
128
  else None
129
  )
130
 
131
- # Age head is a single float (years). Buffalo_L systematically
132
- # over-predicts adults by ~5 years; subtract AGE_OFFSET_YEARS
133
- # to recalibrate. Don't drop below 1 (negative ages would be
134
- # absurd, and very young children are already on the noisy
135
- # end of the model's training distribution).
136
- raw_age = float(getattr(face, "age", 0.0))
137
- age = max(1.0, raw_age - AGE_OFFSET_YEARS)
138
-
139
- # Gender is exposed as 0 (female) / 1 (male) on Face objects.
140
- # InsightFace doesn't surface a softmax probability, so we report
141
- # confidence 1.0 to indicate "argmax, no soft signal".
142
- gender_idx = int(getattr(face, "gender", -1))
143
- gender = "male" if gender_idx == 1 else "female" if gender_idx == 0 else "unknown"
144
-
145
  return {
146
  "face_bbox": bbox,
147
  "face_confidence": round(float(face.det_score), 3),
148
  "face_embedding": embedding,
149
- "age_estimate": round(age, 1),
150
- "age_range": self._bucket_age(age),
151
- "age_confidence": 1.0,
152
- "gender": gender,
153
- "gender_confidence": 1.0,
154
  # 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
155
  # Underscore-prefixed β†’ stripped from JSON, available to
156
  # downstream analyzers that want a tighter face crop.
@@ -161,23 +123,11 @@ class InsightFaceAnalyzer:
161
  ),
162
  }
163
 
164
- @staticmethod
165
- def _bucket_age(age: float) -> str:
166
- for lo, hi, label in AGE_BUCKETS:
167
- if lo <= age < hi:
168
- return label
169
- return "unknown"
170
-
171
  @staticmethod
172
  def _empty_result() -> dict[str, Any]:
173
  return {
174
  "face_bbox": None,
175
  "face_confidence": 0.0,
176
  "face_embedding": None,
177
- "age_estimate": 0.0,
178
- "age_range": "unknown",
179
- "age_confidence": 0.0,
180
- "gender": "unknown",
181
- "gender_confidence": 0.0,
182
  "_insight_landmarks_2d": None,
183
  }
 
1
  """
2
+ InsightFaceAnalyzer: face detection + ArcFace recognition embedding.
3
 
4
  Model
5
  -----
6
  - Package : `insightface` (https://github.com/deepinsight/insightface)
7
  - Bundle : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
8
+ - Used here : SCRFD-10GF detector + ArcFace 512-d recognition + 106
9
+ 2D landmarks. The bundle ALSO ships an age+gender head,
10
+ but we route those signals to dedicated analyzers
11
+ (MiVOLO for age, FairFace ViT for gender) because the
12
+ bundled heads have no exposed confidence and overshoot
13
+ young faces.
14
  - Size : ~280 MB (ONNX, mixed FP16/FP32)
15
  - License : weights research-only; code Apache 2.0
16
  - Source : https://github.com/deepinsight/insightface/tree/master/python-package
 
24
  face_bbox : [x1, y1, x2, y2] in pixel coordinates
25
  face_confidence : SCRFD detection score
26
  face_embedding : list[float] of length 512 (ArcFace, L2-normalised)
27
+ _insight_landmarks_2d : list of (x, y) tuples, 106 points (internal,
28
+ stripped before JSON; available to downstream
29
+ analyzers that want tighter face geometry)
 
 
 
 
30
 
31
  Accuracy
32
  --------
33
  - Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
34
+ - Detection (SCRFD-10GF): >99% recall on WIDER FACE easy / medium.
 
35
 
36
  Notes
37
  -----
 
40
  single subject.
41
  """
42
 
 
43
  from typing import Any
44
 
45
  import numpy as np
 
55
 
56
  MODEL_NAME = "buffalo_l"
57
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
58
 
59
  class InsightFaceAnalyzer:
60
  def __init__(self):
 
62
  if not HAS_INSIGHTFACE:
63
  print(
64
  "[InsightFaceAnalyzer] insightface package not installed; "
65
+ "face detection and recognition will be unavailable."
66
  )
67
  return
68
 
69
  try:
 
70
  # CPUExecutionProvider is the right default for HF Spaces;
71
+ # for GPU, list 'CUDAExecutionProvider' first and pass ctx_id=0 to prepare().
72
  self.app = FaceAnalysis(
73
  name=MODEL_NAME,
74
  providers=["CPUExecutionProvider"],
75
  )
76
+ # det_size=(640, 640) is the canonical SCRFD input.
 
77
  self.app.prepare(ctx_id=-1, det_size=(640, 640))
78
  except Exception as exc:
79
  print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
 
109
  else None
110
  )
111
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
112
  return {
113
  "face_bbox": bbox,
114
  "face_confidence": round(float(face.det_score), 3),
115
  "face_embedding": embedding,
 
 
 
 
 
116
  # 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
117
  # Underscore-prefixed β†’ stripped from JSON, available to
118
  # downstream analyzers that want a tighter face crop.
 
123
  ),
124
  }
125
 
 
 
 
 
 
 
 
126
  @staticmethod
127
  def _empty_result() -> dict[str, Any]:
128
  return {
129
  "face_bbox": None,
130
  "face_confidence": 0.0,
131
  "face_embedding": None,
 
 
 
 
 
132
  "_insight_landmarks_2d": None,
133
  }
app.py CHANGED
@@ -2,31 +2,39 @@
2
  HCP Face Analysis Microservice
3
  ==============================
4
 
5
- FastAPI service that runs nine specialized analyzers over a single photo
6
- and merges their outputs into one facial-attribute dictionary, including
7
- a face-recognition embedding for cross-photo grouping and a numeric
8
- "chopped score" aesthetic rating.
9
 
10
  Pipeline (in execution order)
11
  -----------------------------
12
  1. InsightFaceAnalyzer InsightFace buffalo_l (ONNX). SCRFD
13
  detection + ArcFace 512-d embedding +
14
- age regression + gender + 106 landmarks.
15
- Replaces the previous three FairFace ViTs
16
- and adds face matching as a new capability.
17
 
18
  2. LandmarkAnalyzer MediaPipe Face Landmarker. 478 3D
19
  landmarks + 52 ARKit blendshapes β†’
20
  geometric features, smiling, mouth_open.
21
 
22
- 3. EthnicityAnalyzer cledoux42/Ethnicity_Test_v003 ViT.
 
 
 
 
 
 
 
 
23
  5-class ethnicity widened to a 7-bucket
24
  schema for legacy compatibility.
25
 
26
- 4. ParsingAnalyzer SegFormer-B5 human parsing. Now receives
27
- a face-cropped image (smaller, cleaner).
28
- Emits face/hair masks + hair length +
29
- hat detection + OpenCV-derived skin stats.
30
 
31
  5. EmotionAnalyzer HSEmotion EfficientNet-B0. 8-class
32
  emotion + valence/arousal/mood.
@@ -50,9 +58,10 @@ Pipeline (in execution order)
50
  scoring only.
51
 
52
  10. AestheticAnalyzer Pure-Python aggregator. Reads the merged
53
- dict from analyzers 1–9 and produces the
54
- final `chopped_score` (0–100, higher =
55
- more chopped) and a per-factor breakdown.
 
56
 
57
  Endpoints
58
  ---------
@@ -89,6 +98,8 @@ from analyzers.color_analyzer import ColorAnalyzer
89
  from analyzers.obstruction_analyzer import ObstructionAnalyzer
90
  from analyzers.hair_type_analyzer import HairTypeAnalyzer
91
  from analyzers.insightface_analyzer import InsightFaceAnalyzer
 
 
92
  from analyzers.beauty_analyzer import BeautyAnalyzer
93
  from analyzers.aesthetic_analyzer import AestheticAnalyzer
94
 
@@ -109,6 +120,8 @@ app.add_middleware(
109
  # model-load cost; subsequent requests are warm.
110
  insightface_analyzer: Optional[InsightFaceAnalyzer] = None
111
  landmark_analyzer: Optional[LandmarkAnalyzer] = None
 
 
112
  ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
113
  parsing_analyzer: Optional[ParsingAnalyzer] = None
114
  emotion_analyzer: Optional[EmotionAnalyzer] = None
@@ -148,7 +161,8 @@ def get_analyzers():
148
  requests. First request pays the full model-load cost; subsequent
149
  requests are warm.
150
  """
151
- global insightface_analyzer, landmark_analyzer, ethnicity_analyzer
 
152
  global parsing_analyzer, emotion_analyzer, color_analyzer
153
  global obstruction_analyzer, hair_type_analyzer
154
  global beauty_analyzer, aesthetic_analyzer
@@ -161,6 +175,14 @@ def get_analyzers():
161
  logger.info("Loading MediaPipe Face Landmarker...")
162
  landmark_analyzer = LandmarkAnalyzer()
163
 
 
 
 
 
 
 
 
 
164
  if ethnicity_analyzer is None:
165
  logger.info("Loading Ethnicity classifier...")
166
  ethnicity_analyzer = EthnicityAnalyzer()
@@ -194,6 +216,8 @@ def get_analyzers():
194
  return (
195
  insightface_analyzer,
196
  landmark_analyzer,
 
 
197
  ethnicity_analyzer,
198
  parsing_analyzer,
199
  emotion_analyzer,
@@ -246,6 +270,8 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
246
  (
247
  insight,
248
  landmarks,
 
 
249
  ethnicities,
250
  parsing,
251
  emotions,
@@ -258,7 +284,9 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
258
 
259
  results: dict = {}
260
 
261
- # Step 1: InsightFace detection + age + gender + recognition embedding.
 
 
262
  logger.info("Running InsightFace analysis...")
263
  insight_results = insight.analyze(img_array)
264
  results.update(insight_results)
@@ -275,7 +303,17 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
275
  landmark_results = landmarks.analyze(img_array)
276
  results.update(landmark_results)
277
 
278
- # Step 3: ethnicity classifier; likes a tighter face crop.
 
 
 
 
 
 
 
 
 
 
279
  logger.info("Running ethnicity analysis...")
280
  results.update(ethnicities.analyze(face_crop))
281
 
 
2
  HCP Face Analysis Microservice
3
  ==============================
4
 
5
+ FastAPI service that runs twelve specialised analyzers over a single
6
+ photo and merges their outputs into one facial-attribute dictionary,
7
+ including a face-recognition embedding for cross-photo grouping and a
8
+ numeric "chopped score" aesthetic rating.
9
 
10
  Pipeline (in execution order)
11
  -----------------------------
12
  1. InsightFaceAnalyzer InsightFace buffalo_l (ONNX). SCRFD
13
  detection + ArcFace 512-d embedding +
14
+ 106 landmarks. Age & gender USED to
15
+ come from here; now delegated to
16
+ MiVOLO and FairFace (steps 3a/3b).
17
 
18
  2. LandmarkAnalyzer MediaPipe Face Landmarker. 478 3D
19
  landmarks + 52 ARKit blendshapes β†’
20
  geometric features, smiling, mouth_open.
21
 
22
+ 3a. AgeAnalyzer MiVOLO regressor (~4 yr MAE on AgeDB).
23
+ Replaces the InsightFace age head, which
24
+ systematically overshot young faces.
25
+
26
+ 3b. GenderAnalyzer FairFace ViT (~93.4% acc). Replaces the
27
+ InsightFace gender head so we get a real
28
+ softmax confidence instead of argmax 1.0.
29
+
30
+ 3c. EthnicityAnalyzer cledoux42/Ethnicity_Test_v003 ViT.
31
  5-class ethnicity widened to a 7-bucket
32
  schema for legacy compatibility.
33
 
34
+ 4. ParsingAnalyzer SegFormer-B5 human parsing. Receives the
35
+ face-cropped image. Emits face/hair
36
+ masks + hair length + hat detection +
37
+ OpenCV-derived skin stats.
38
 
39
  5. EmotionAnalyzer HSEmotion EfficientNet-B0. 8-class
40
  emotion + valence/arousal/mood.
 
58
  scoring only.
59
 
60
  10. AestheticAnalyzer Pure-Python aggregator. Reads the merged
61
+ dict from previous analyzers and produces
62
+ the final `chopped_score` (0–100, higher
63
+ = more chopped) and a per-factor
64
+ breakdown.
65
 
66
  Endpoints
67
  ---------
 
98
  from analyzers.obstruction_analyzer import ObstructionAnalyzer
99
  from analyzers.hair_type_analyzer import HairTypeAnalyzer
100
  from analyzers.insightface_analyzer import InsightFaceAnalyzer
101
+ from analyzers.age_analyzer import AgeAnalyzer
102
+ from analyzers.gender_analyzer import GenderAnalyzer
103
  from analyzers.beauty_analyzer import BeautyAnalyzer
104
  from analyzers.aesthetic_analyzer import AestheticAnalyzer
105
 
 
120
  # model-load cost; subsequent requests are warm.
121
  insightface_analyzer: Optional[InsightFaceAnalyzer] = None
122
  landmark_analyzer: Optional[LandmarkAnalyzer] = None
123
+ age_analyzer: Optional[AgeAnalyzer] = None
124
+ gender_analyzer: Optional[GenderAnalyzer] = None
125
  ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
126
  parsing_analyzer: Optional[ParsingAnalyzer] = None
127
  emotion_analyzer: Optional[EmotionAnalyzer] = None
 
161
  requests. First request pays the full model-load cost; subsequent
162
  requests are warm.
163
  """
164
+ global insightface_analyzer, landmark_analyzer
165
+ global age_analyzer, gender_analyzer, ethnicity_analyzer
166
  global parsing_analyzer, emotion_analyzer, color_analyzer
167
  global obstruction_analyzer, hair_type_analyzer
168
  global beauty_analyzer, aesthetic_analyzer
 
175
  logger.info("Loading MediaPipe Face Landmarker...")
176
  landmark_analyzer = LandmarkAnalyzer()
177
 
178
+ if age_analyzer is None:
179
+ logger.info("Loading MiVOLO age analyzer...")
180
+ age_analyzer = AgeAnalyzer()
181
+
182
+ if gender_analyzer is None:
183
+ logger.info("Loading FairFace gender analyzer...")
184
+ gender_analyzer = GenderAnalyzer()
185
+
186
  if ethnicity_analyzer is None:
187
  logger.info("Loading Ethnicity classifier...")
188
  ethnicity_analyzer = EthnicityAnalyzer()
 
216
  return (
217
  insightface_analyzer,
218
  landmark_analyzer,
219
+ age_analyzer,
220
+ gender_analyzer,
221
  ethnicity_analyzer,
222
  parsing_analyzer,
223
  emotion_analyzer,
 
270
  (
271
  insight,
272
  landmarks,
273
+ ages,
274
+ genders,
275
  ethnicities,
276
  parsing,
277
  emotions,
 
284
 
285
  results: dict = {}
286
 
287
+ # Step 1: InsightFace: detection, ArcFace recognition embedding,
288
+ # 106 landmarks. (Age and gender used to live here too; they are
289
+ # now produced by dedicated analyzers further down.)
290
  logger.info("Running InsightFace analysis...")
291
  insight_results = insight.analyze(img_array)
292
  results.update(insight_results)
 
303
  landmark_results = landmarks.analyze(img_array)
304
  results.update(landmark_results)
305
 
306
+ # Step 3a: MiVOLO age regressor. Much better at young faces than
307
+ # InsightFace's bundled head, which overshoots heavily.
308
+ logger.info("Running MiVOLO age analysis...")
309
+ results.update(ages.analyze(face_crop))
310
+
311
+ # Step 3b: FairFace gender. Provides a real softmax confidence
312
+ # score so the UI can show graded uncertainty.
313
+ logger.info("Running FairFace gender analysis...")
314
+ results.update(genders.analyze(face_crop))
315
+
316
+ # Step 3c: ethnicity classifier; likes a tighter face crop.
317
  logger.info("Running ethnicity analysis...")
318
  results.update(ethnicities.analyze(face_crop))
319
 
architecture.md CHANGED
@@ -11,8 +11,9 @@ ones (only intentional in a couple of places: `_run_pipeline` in
11
 Photo (RGB ndarray)
12
 │
13
 ├─► [1] InsightFaceAnalyzer (insightface buffalo_l, ONNX)
14
- │ → face_bbox, face_confidence, face_embedding (512-d ArcFace),
15
- │ age_estimate, age_range, gender + confidences

16
 │
17
 ├─► Build face crop from face_bbox + padding. Downstream analyzers
18
 │ that benefit from a tighter input read the crop; MediaPipe gets
@@ -24,7 +25,16 @@ Photo (RGB ndarray)
24
 │ facial_asymmetry_score, smile_asymmetry, possible_dimples,
25
 │ possible_unibrow.
26
 │
27
- ├─► [3] EthnicityAnalyzer (cledoux42/Ethnicity_Test_v003 ViT)

28
 │ → ethnicity, ethnicity_confidence, ethnicity_distribution
29
 │ (cropped input).
30
 │
@@ -72,8 +82,10 @@ strips them before returning JSON.
72
 
73
  | Section | Field(s) | Source |
74
  |---|---|---|
75
- | Demographics | face_bbox, face_confidence, face_embedding (512-d), age_estimate, age_range, age_confidence, gender, gender_confidence | InsightFace buffalo_l |
76
- | Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | EthnicityAnalyzer (cledoux42 ViT) |
 
 
77
  | Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
78
  | Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
79
  | Hair | hair_length, hair_present | SegFormer-B5 |
@@ -86,7 +98,7 @@ strips them before returning JSON.
86
  | Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
87
  | Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
88
  | Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
89
- | Skin | wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles | SegFormer mask + OpenCV stats |
90
  | Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
91
  | Accessories | wearing_hat | SegFormer (hat class coverage) |
92
  | Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
 
11
 Photo (RGB ndarray)
12
 │
13
 ├─► [1] InsightFaceAnalyzer (insightface buffalo_l, ONNX)
14
+ │ → face_bbox, face_confidence, face_embedding (512-d ArcFace).
15
+ │ Age & gender used to come from here; they now live in
16
+ │ dedicated analyzers (steps 3a / 3b).
17
 │
18
 ├─► Build face crop from face_bbox + padding. Downstream analyzers
19
 │ that benefit from a tighter input read the crop; MediaPipe gets

25
 │ facial_asymmetry_score, smile_asymmetry, possible_dimples,
26
 │ possible_unibrow.
27
 │
28
+ ├─► [3a] AgeAnalyzer (MiVOLO regressor, ~4 yr MAE on AgeDB)
29
+ │ → age_estimate, age_range, age_model_source
30
+ │ (cropped input). Replaces the InsightFace age head.
31
+ │
32
+ ├─► [3b] GenderAnalyzer (dima806/fairface_gender_image_detection ViT)
33
+ │ → gender, gender_confidence, gender_distribution
34
+ │ (cropped input). Replaces the InsightFace gender head so
35
+ │ we get a real softmax confidence.
36
+ │
37
+ ├─► [3c] EthnicityAnalyzer (cledoux42/Ethnicity_Test_v003 ViT)
38
 │ → ethnicity, ethnicity_confidence, ethnicity_distribution
39
 │ (cropped input).
40
 │
 
82
 
83
  | Section | Field(s) | Source |
84
  |---|---|---|
85
+ | Demographics | face_bbox, face_confidence, face_embedding (512-d) | InsightFace buffalo_l |
86
+ | Demographics | age_estimate, age_range, age_model_source | MiVOLO regressor |
87
+ | Demographics | gender, gender_confidence, gender_distribution | FairFace ViT |
88
+ | Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | cledoux42 ViT |
89
  | Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
90
  | Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
91
  | Hair | hair_length, hair_present | SegFormer-B5 |
 
98
  | Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
99
  | Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
100
  | Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
101
+ | Skin | wrinkle_level, skin_texture_score, skin_uniformity | SegFormer mask + OpenCV stats (`freckles_or_moles` is still computed server-side but no longer displayed; the detector was too noisy) |
102
  | Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
103
  | Accessories | wearing_hat | SegFormer (hat class coverage) |
104
  | Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
requirements.txt CHANGED
@@ -15,3 +15,8 @@ transformers==4.45.2
15
  hsemotion>=0.2.2
16
  insightface>=0.7.3
17
  onnxruntime>=1.18.0
 
 
 
 
 
 
15
  hsemotion>=0.2.2
16
  insightface>=0.7.3
17
  onnxruntime>=1.18.0
18
+ # MiVOLO for age regression. Installed from git because the project
19
+ # doesn't publish a PyPI release; pip resolves the git+ URL fine at
20
+ # Docker build time (git must be installed in the image).
21
+ mivolo @ git+https://github.com/WildChlamydia/MiVOLO.git
22
+ scipy>=1.10.0
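Because `mivolo` is installed from a git URL, it may be missing in environments where that install step failed. The `HAS_INSIGHTFACE` guard in `insightface_analyzer.py` suggests a degrade-gracefully pattern. Below is a minimal sketch of an equivalent check using only the standard library. `has_module` is a hypothetical helper and is not part of the commit.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for a dotted name whose parent package is
        # missing, and ValueError for modules with a broken __spec__.
        return False


# Example: decide at startup whether the age analyzer can load at all.
HAS_MIVOLO = has_module("mivolo")
```

An analyzer could consult such a flag in `__init__` and return its empty result instead of raising, as `InsightFaceAnalyzer` does.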