Evan Li committed on
Commit ·
abec69f
Parent(s): dcbc516
mivolo, freckles
- Dockerfile +10 -4
- README.md +4 -2
- analyzers/aesthetic_analyzer.py +114 -64
- analyzers/age_analyzer.py +200 -0
- analyzers/gender_analyzer.py +92 -0
- analyzers/insightface_analyzer.py +14 -64
- app.py +56 -18
- architecture.md +18 -6
- requirements.txt +5 -0
Dockerfile
CHANGED
|
@@ -23,15 +23,21 @@ RUN mkdir -p models && \
|
|
| 23 |
"https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
|
| 24 |
|
| 25 |
# Pre-download InsightFace buffalo_l bundle (detection + recognition +
|
| 26 |
-
#
|
| 27 |
-
#
|
| 28 |
-
#
|
| 29 |
RUN mkdir -p /root/.insightface/models && \
|
| 30 |
wget -q -O /root/.insightface/models/buffalo_l.zip \
|
| 31 |
"https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
|
| 32 |
cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
|
| 33 |
|
| 34 |
-
#
|
| 35 |
|
| 36 |
COPY . .
|
| 37 |
|
|
|
|
| 23 |
"https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
|
| 24 |
|
| 25 |
# Pre-download InsightFace buffalo_l bundle (detection + recognition +
|
| 26 |
+
# 106 landmarks) so the first /analyze call doesn't pay the ~280MB
|
| 27 |
+
# download. The bundle auto-extracts under ~/.insightface/models/ on
|
| 28 |
+
# first use.
|
| 29 |
RUN mkdir -p /root/.insightface/models && \
|
| 30 |
wget -q -O /root/.insightface/models/buffalo_l.zip \
|
| 31 |
"https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
|
| 32 |
cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
|
| 33 |
|
| 34 |
+
# MiVOLO checkpoint is too large for the Docker image (~300 MB) - it's
|
| 35 |
+
# lazy-downloaded from HF Hub on first AgeAnalyzer init, via
|
| 36 |
+
# AGE_HF_REPO_ID (defaults to iitolstykh/mivolo_d1). To bundle it at
|
| 37 |
+
# build time instead, uncomment the lines below and pin the filename.
|
| 38 |
+
# RUN mkdir -p models && \
|
| 39 |
+
# wget -q -O models/mivolo_d1.pth.tar \
|
| 40 |
+
# "https://huggingface.co/iitolstykh/mivolo_d1/resolve/main/mivolo_d1.pth.tar"
|
| 41 |
|
| 42 |
COPY . .
|
| 43 |
|
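The Dockerfile comment above leaves the MiVOLO checkpoint out of the image and relies on environment variables at runtime. As a rough sketch of how those overrides resolve, assuming the variable names and defaults shown in analyzers/age_analyzer.py in this commit (the `age_weight_settings` helper and the mirror repo name are hypothetical and not part of the commit):

```python
import os
from typing import Mapping

# Defaults copied from the diff; the helper below is illustrative only.
DEFAULTS = {
    "AGE_WEIGHTS_PATH": "models/mivolo_d1.pth.tar",
    "AGE_HF_REPO_ID": "iitolstykh/mivolo_d1",
    "AGE_HF_FILENAME": "mivolo_d1.pth.tar",
}


def age_weight_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the effective settings: an env value wins, else the default."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    # No overrides: every setting falls back to its default.
    print(age_weight_settings({}))
    # "my-org/mivolo-mirror" is a placeholder repo id, not a real mirror.
    print(age_weight_settings({"AGE_HF_REPO_ID": "my-org/mivolo-mirror"}))
```

Passing the environment in as a mapping keeps the lookup easy to exercise without touching the real process environment.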
README.md
CHANGED
|
@@ -18,9 +18,11 @@ face-recognition embedding and an aesthetic "chopped score."
|
|
| 18 |
|
| 19 |
| # | Component | Model | Task | Size |
|
| 20 |
|---|-----------|-------|------|------|
|
| 21 |
-
| 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding +
|
| 22 |
| 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes → geometric features, smiling, mouth-open | ~4 MB |
|
| 23 |
-
|
| 24 |
| 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
|
| 25 |
| 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
|
| 26 |
| 6 | Color analysis | (no model - OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
|
|
|
|
| 18 |
|
| 19 |
| # | Component | Model | Task | Size |
|
| 20 |
|---|-----------|-------|------|------|
|
| 21 |
+
| 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding + 106 landmarks (99.83% LFW) | ~280 MB |
|
| 22 |
| 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes → geometric features, smiling, mouth-open | ~4 MB |
|
| 23 |
+
| 3a | Age | MiVOLO (VOLO transformer, regression) | Continuous age (~4 yr MAE on AgeDB) | ~300 MB |
|
| 24 |
+
| 3b | Gender | `dima806/fairface_gender_image_detection` (ViT) | Binary gender + softmax confidence (~93.4% acc) | ~340 MB |
|
| 25 |
+
| 3c | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
|
| 26 |
| 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
|
| 27 |
| 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
|
| 28 |
| 6 | Color analysis | (no model - OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
|
analyzers/aesthetic_analyzer.py
CHANGED
|
@@ -4,41 +4,46 @@ AestheticAnalyzer - "chopped score" aggregator.
|
|
| 4 |
What it does
|
| 5 |
------------
|
| 6 |
Reads the merged result dict from every other analyzer and produces a
|
| 7 |
-
single numeric
|
| 8 |
-
|
| 9 |
arbitrary rubric encoded here). The breakdown lets you tune weights
|
| 10 |
or flip polarity client-side without rerunning inference.
|
| 11 |
|
| 12 |
Score composition
|
| 13 |
-----------------
|
| 14 |
-
|
| 15 |
|
| 16 |
-
1. **Learned beauty regressor** (
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
|
| 20 |
|
| 21 |
-
2. **Rule-based factor sum**: penalties
|
| 22 |
-
|
| 23 |
-
jawline, prominent cheekbones, clear skin, balanced lips, and
|
| 24 |
-
dimples. Each factor is documented in `_compute_rule_score`.
|
| 25 |
-
This is the only signal when the regressor isn't loaded
|
| 26 |
-
(BeautyAnalyzer returns None).
|
| 27 |
|
| 28 |
Blend math
|
| 29 |
----------
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
else:
|
| 33 |
-
chopped = rule_score
|
| 34 |
chopped is clamped to [0, 100].
|
| 35 |
|
|
| 36 |
Subjectivity disclaimer
|
| 37 |
-----------------------
|
| 38 |
-
Every weight in this file is a guess. "Beauty" is subjective,
|
| 39 |
-
biased, and reductive. Treat the score as an in-joke
|
| 40 |
-
expose it as objective truth. The UI gates the row
|
| 41 |
-
Settings toggle off-by-default for that reason.
|
| 42 |
|
| 43 |
Note: this analyzer takes no image input - it reads the merged result
|
| 44 |
dict produced by every other analyzer that ran ahead of it.
|
|
@@ -48,12 +53,36 @@ from typing import Any
|
|
| 48 |
|
| 49 |
|
| 50 |
# How much weight the learned beauty regressor gets when both signals
|
| 51 |
-
# are available.
|
| 52 |
-
|
|
|
|
| 53 |
|
| 54 |
# Baseline score. Penalties push up, bonuses pull down.
|
| 55 |
BASELINE = 50.0
|
| 56 |
|
| 57 |
|
| 58 |
class AestheticAnalyzer:
|
| 59 |
def __init__(self):
|
|
@@ -71,15 +100,16 @@ class AestheticAnalyzer:
|
|
| 71 |
|
| 72 |
beauty_norm = merged.get("beauty_score_norm")
|
| 73 |
if beauty_norm is not None:
|
| 74 |
-
|
| 75 |
-
# scaling). Flip to unattractiveness axis: 100 - x.
|
| 76 |
-
learned_unattractive = 100.0 - float(beauty_norm)
|
| 77 |
chopped = (
|
| 78 |
LEARNED_WEIGHT * learned_unattractive
|
| 79 |
+ (1.0 - LEARNED_WEIGHT) * rule_score
|
| 80 |
)
|
| 81 |
breakdown["learned_unattractive"] = round(
|
| 82 |
-
LEARNED_WEIGHT * learned_unattractive -
|
| 83 |
)
|
| 84 |
breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
|
| 85 |
else:
|
|
@@ -105,9 +135,14 @@ class AestheticAnalyzer:
|
|
| 105 |
def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
|
| 106 |
"""Hand-tuned weighted sum over previously-extracted attributes.
|
| 107 |
|
| 108 |
-
Returns (score, breakdown_dict). The breakdown gives each
|
| 109 |
-
signed contribution so a UI can show *why* a score
|
| 110 |
-
it did. Score starts at BASELINE (50) and moves
|
| 111 |
"""
|
| 112 |
score = BASELINE
|
| 113 |
breakdown: dict[str, float] = {}
|
|
@@ -118,13 +153,13 @@ class AestheticAnalyzer:
|
|
| 118 |
# MediaPipe `facial_asymmetry_score` is already in this range.
|
| 119 |
asym = d.get("facial_asymmetry_score")
|
| 120 |
if isinstance(asym, (int, float)):
|
| 121 |
-
penalty = float(asym) *
|
| 122 |
score += penalty
|
| 123 |
breakdown["asymmetry_penalty"] = round(penalty, 2)
|
| 124 |
|
| 125 |
# Wrinkle level from SegFormer + OpenCV Laplacian classification.
|
| 126 |
wrinkle_penalty_map = {
|
| 127 |
-
"smooth": 0.0, "slight":
|
| 128 |
}
|
| 129 |
wrinkle = d.get("wrinkle_level")
|
| 130 |
if wrinkle in wrinkle_penalty_map:
|
|
@@ -132,36 +167,38 @@ class AestheticAnalyzer:
|
|
| 132 |
score += penalty
|
| 133 |
breakdown["wrinkle_penalty"] = penalty
|
| 134 |
|
| 135 |
-
# Skin uniformity = LAB L* std-dev over the
|
| 136 |
-
# std means uneven tone (shadows, blemishes).
|
| 137 |
uniformity = d.get("skin_uniformity")
|
| 138 |
if isinstance(uniformity, (int, float)) and uniformity > 0:
|
| 139 |
-
# Empirically
|
| 140 |
-
# skin
|
| 141 |
-
penalty = min(
|
| 142 |
score += penalty
|
| 143 |
breakdown["skin_unevenness_penalty"] = round(penalty, 2)
|
| 144 |
|
| 145 |
-
#
|
| 146 |
-
|
| 147 |
-
freckles = d.get("freckles_or_moles")
|
| 148 |
-
if freckles in freckle_penalty_map:
|
| 149 |
-
penalty = freckle_penalty_map[freckles]
|
| 150 |
-
score += penalty
|
| 151 |
-
breakdown["freckles_penalty"] = penalty
|
| 152 |
|
| 153 |
# Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
|
| 154 |
smile_asym = d.get("smile_asymmetry")
|
| 155 |
if isinstance(smile_asym, (int, float)):
|
| 156 |
-
penalty = min(
|
| 157 |
score += penalty
|
| 158 |
breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
|
| 159 |
|
| 160 |
# Photo-quality penalty: sunglasses/mask hide features and the
|
| 161 |
# model is guessing more. Mild penalty, not a personal trait.
|
| 162 |
if d.get("wearing_sunglasses") or d.get("wearing_mask"):
|
| 163 |
-
score +=
|
| 164 |
-
breakdown["obstruction_penalty"] =
|
| 165 |
|
| 166 |
# ── Bonuses (pull score down = less chopped) ─────────────────
|
| 167 |
|
|
@@ -169,52 +206,65 @@ class AestheticAnalyzer:
|
|
| 169 |
# take the stronger of the two contributions.
|
| 170 |
jaw_bonus = 0.0
|
| 171 |
jaw_type = d.get("jawline_type")
|
| 172 |
-
jaw_type_bonus_map = {"sharp": -
|
| 173 |
if jaw_type in jaw_type_bonus_map:
|
| 174 |
jaw_bonus = jaw_type_bonus_map[jaw_type]
|
| 175 |
jaw_angle = d.get("jawline_angle")
|
| 176 |
if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
|
| 177 |
# Sharp angles add more on top of the categorical signal.
|
| 178 |
-
jaw_bonus = min(jaw_bonus, -
|
| 179 |
if jaw_bonus:
|
| 180 |
score += jaw_bonus
|
| 181 |
breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
|
| 182 |
|
| 183 |
# Cheekbone prominence.
|
| 184 |
-
cheek_bonus_map = {"high": -
|
| 185 |
cheek = d.get("cheekbone_prominence")
|
| 186 |
if cheek in cheek_bonus_map:
|
| 187 |
bonus = cheek_bonus_map[cheek]
|
| 188 |
score += bonus
|
| 189 |
breakdown["cheekbone_bonus"] = bonus
|
| 190 |
|
| 191 |
-
# Skin clarity bonus when the texture score is low (
|
| 192 |
-
# skin_texture_score is the same Laplacian-density value used by
|
| 193 |
-
# wrinkle_level; ≤4 is "smooth" territory.
|
| 194 |
texture = d.get("skin_texture_score")
|
| 195 |
-
if isinstance(texture, (int, float)) and 0 < texture <=
|
| 196 |
-
score -=
|
| 197 |
-
breakdown["skin_clarity_bonus"] = -
|
| 198 |
|
| 199 |
# Lip fullness - "average" and "full" both read as healthy.
|
| 200 |
lip = d.get("lip_fullness")
|
| 201 |
-
if lip
|
| 202 |
-
score -=
|
| 203 |
-
breakdown["lip_fullness_bonus"] = -
|
| 204 |
|
| 205 |
# Defined cupid's bow.
|
| 206 |
if d.get("cupids_bow") == "defined":
|
| 207 |
-
score -=
|
| 208 |
-
breakdown["cupids_bow_bonus"] = -
|
| 209 |
|
| 210 |
# Normal eye spacing.
|
| 211 |
if d.get("eye_spacing") == "average":
|
| 212 |
-
score -=
|
| 213 |
-
breakdown["eye_spacing_bonus"] = -
|
| 214 |
|
| 215 |
# Dimples - small bonus when the MediaPipe heuristic fires.
|
| 216 |
if d.get("possible_dimples"):
|
| 217 |
score -= 3.0
|
| 218 |
-
breakdown["
|
| 219 |
|
| 220 |
return score, breakdown
|
|
|
|
| 4 |
What it does
|
| 5 |
------------
|
| 6 |
Reads the merged result dict from every other analyzer and produces a
|
| 7 |
+
single numeric chopped_score (0-100) plus a per-factor breakdown.
|
| 8 |
+
Higher = more chopped = less conventionally attractive (by the
|
| 9 |
arbitrary rubric encoded here). The breakdown lets you tune weights
|
| 10 |
or flip polarity client-side without rerunning inference.
|
| 11 |
|
| 12 |
Score composition
|
| 13 |
-----------------
|
| 14 |
+
Final chopped_score is a weighted blend of two sources:
|
| 15 |
|
| 16 |
+
1. **Learned beauty regressor** (BeautyAnalyzer, SCUT-FBP5500):
|
| 17 |
+
raw score in [1.0, 5.0] mapped to a "stretched" 0-100 axis. The
|
| 18 |
+
raw model output is fairly concentrated around 2.5-3.5 (most
|
| 19 |
+
faces), which would cluster scores near the middle. We stretch
|
| 20 |
+
the [2.0, 4.0] sub-range to fill [0, 100] so the tier system
|
| 21 |
+
gets meaningful spread.
|
| 22 |
|
| 23 |
+
2. **Rule-based factor sum**: scaled penalties + bonuses on top of
|
| 24 |
+
a baseline of 50. Factors are documented in `_compute_rule_score`.
|
| 25 |
|
| 26 |
Blend math
|
| 27 |
----------
|
| 28 |
+
learned_unattractive = stretched_unattractive(beauty_norm)
|
| 29 |
+
chopped = 0.6 * learned_unattractive + 0.4 * rule_score
|
| 30 |
chopped is clamped to [0, 100].
|
| 31 |
|
| 32 |
+
Tuning history
|
| 33 |
+
--------------
|
| 34 |
+
- Original rule factors landed almost everyone at 50 ± 10. Scaled up
|
| 35 |
+
by ~1.6× to give attributes more bite. The learned signal got a
|
| 36 |
+
matching stretch (unattractiveness 75 → 100, 25 → 0) so it isn't drowned out.
|
| 37 |
+
- Freckles/moles penalty was removed - the underlying SegFormer-based
|
| 38 |
+
detector was unreliable and the metric was effectively penalising
|
| 39 |
+
shadows and pores.
|
| 40 |
+
|
| 41 |
Subjectivity disclaimer
|
| 42 |
-----------------------
|
| 43 |
+
Every weight in this file is a guess. "Beauty" is subjective,
|
| 44 |
+
culturally biased, and reductive. Treat the score as an in-joke
|
| 45 |
+
metric; never expose it as objective truth. The UI gates the row
|
| 46 |
+
behind a Settings toggle off-by-default for that reason.
|
| 47 |
|
| 48 |
Note: this analyzer takes no image input - it reads the merged result
|
| 49 |
dict produced by every other analyzer that ran ahead of it.
|
|
|
|
| 53 |
|
| 54 |
|
| 55 |
# How much weight the learned beauty regressor gets when both signals
|
| 56 |
+
# are available. 0.6 leaves more headroom for the rule sum than the
|
| 57 |
+
# old 0.7 β small change, but the rule factors moved harder.
|
| 58 |
+
LEARNED_WEIGHT = 0.6
|
| 59 |
|
| 60 |
# Baseline score. Penalties push up, bonuses pull down.
|
| 61 |
BASELINE = 50.0
|
| 62 |
|
| 63 |
+
# Stretch the learned-beauty 0-100 axis so it covers the chopped
|
| 64 |
+
# spectrum more dramatically. Beauty norms in (LEARNED_NORM_LO, _HI)
|
| 65 |
+
# map linearly to (0, 100). Below the lo bound is "fully chopped"
|
| 66 |
+
# territory (learned_unattractive = 100); above the hi bound is
|
| 67 |
+
# "fully gigachad" (learned_unattractive = 0).
|
| 68 |
+
LEARNED_NORM_LO = 25.0 # raw score ≈ 2.0
|
| 69 |
+
LEARNED_NORM_HI = 75.0 # raw score ≈ 4.0
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
def _stretch_unattractive(beauty_norm: float) -> float:
|
| 73 |
+
"""Map BeautyAnalyzer's 0-100 normalised score to a stretched
|
| 74 |
+
unattractiveness 0-100. 75 → 0 (gigachad), 25 → 100 (megachopped).
|
| 75 |
+
"""
|
| 76 |
+
if beauty_norm is None:
|
| 77 |
+
return 50.0
|
| 78 |
+
# Invert the axis then linearly stretch (LEARNED_NORM_LO, _HI).
|
| 79 |
+
unattractive = 100.0 - float(beauty_norm)
|
| 80 |
+
# unattractive now spans 25 (gigachad-ish) to 75 (megachopped-ish); map that to 0-100
|
| 81 |
+
span = (100.0 - LEARNED_NORM_LO) - (100.0 - LEARNED_NORM_HI) # = 50
|
| 82 |
+
lo_after_invert = 100.0 - LEARNED_NORM_HI # = 25
|
| 83 |
+
stretched = (unattractive - lo_after_invert) / span * 100.0
|
| 84 |
+
return max(0.0, min(100.0, stretched))
|
| 85 |
+
|
| 86 |
|
| 87 |
class AestheticAnalyzer:
|
| 88 |
def __init__(self):
|
|
|
|
| 100 |
|
| 101 |
beauty_norm = merged.get("beauty_score_norm")
|
| 102 |
if beauty_norm is not None:
|
| 103 |
+
learned_unattractive = _stretch_unattractive(float(beauty_norm))
|
| 104 |
chopped = (
|
| 105 |
LEARNED_WEIGHT * learned_unattractive
|
| 106 |
+ (1.0 - LEARNED_WEIGHT) * rule_score
|
| 107 |
)
|
| 108 |
+
# Show the learned contribution as a signed offset from
|
| 109 |
+
# baseline so the breakdown reads consistently with rule
|
| 110 |
+
# factors.
|
| 111 |
breakdown["learned_unattractive"] = round(
|
| 112 |
+
LEARNED_WEIGHT * (learned_unattractive - BASELINE), 2
|
| 113 |
)
|
| 114 |
breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
|
| 115 |
else:
|
|
|
|
| 135 |
def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
|
| 136 |
"""Hand-tuned weighted sum over previously-extracted attributes.
|
| 137 |
|
| 138 |
+
Returns (score, breakdown_dict). The breakdown gives each
|
| 139 |
+
factor's signed contribution so a UI can show *why* a score
|
| 140 |
+
landed where it did. Score starts at BASELINE (50) and moves
|
| 141 |
+
up/down.
|
| 142 |
+
|
| 143 |
+
All penalty / bonus magnitudes are scaled up vs. the original
|
| 144 |
+
implementation so attributes contribute meaningfully to the
|
| 145 |
+
spread instead of nudging everyone toward 50.
|
| 146 |
"""
|
| 147 |
score = BASELINE
|
| 148 |
breakdown: dict[str, float] = {}
|
|
|
|
| 153 |
# MediaPipe `facial_asymmetry_score` is already in this range.
|
| 154 |
asym = d.get("facial_asymmetry_score")
|
| 155 |
if isinstance(asym, (int, float)):
|
| 156 |
+
penalty = float(asym) * 30.0 # was 18; bigger asymmetry → more spread
|
| 157 |
score += penalty
|
| 158 |
breakdown["asymmetry_penalty"] = round(penalty, 2)
|
| 159 |
|
| 160 |
# Wrinkle level from SegFormer + OpenCV Laplacian classification.
|
| 161 |
wrinkle_penalty_map = {
|
| 162 |
+
"smooth": 0.0, "slight": 6.0, "moderate": 14.0, "prominent": 20.0,
|
| 163 |
}
|
| 164 |
wrinkle = d.get("wrinkle_level")
|
| 165 |
if wrinkle in wrinkle_penalty_map:
|
|
|
|
| 167 |
score += penalty
|
| 168 |
breakdown["wrinkle_penalty"] = penalty
|
| 169 |
|
| 170 |
+
# Skin uniformity = LAB L* std-dev over the eroded interior
|
| 171 |
+
# face mask. Higher std means uneven tone (shadows, blemishes).
|
| 172 |
uniformity = d.get("skin_uniformity")
|
| 173 |
if isinstance(uniformity, (int, float)) and uniformity > 0:
|
| 174 |
+
# Empirically uniformity sits ~8-15 in clean skin and
|
| 175 |
+
# 20-30 in uneven skin; scale up to +14 (was 8).
|
| 176 |
+
penalty = min(14.0, max(0.0, (float(uniformity) - 10.0) * 1.0))
|
| 177 |
score += penalty
|
| 178 |
breakdown["skin_unevenness_penalty"] = round(penalty, 2)
|
| 179 |
|
| 180 |
+
# NOTE: freckles_or_moles penalty deliberately removed - the
|
| 181 |
+
# detector was too noisy (shadows / pores counted as spots).
|
| 182 |
|
| 183 |
# Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
|
| 184 |
smile_asym = d.get("smile_asymmetry")
|
| 185 |
if isinstance(smile_asym, (int, float)):
|
| 186 |
+
penalty = min(10.0, float(smile_asym) * 50.0) # was max 6, factor 30
|
| 187 |
score += penalty
|
| 188 |
breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
|
| 189 |
|
| 190 |
# Photo-quality penalty: sunglasses/mask hide features and the
|
| 191 |
# model is guessing more. Mild penalty, not a personal trait.
|
| 192 |
if d.get("wearing_sunglasses") or d.get("wearing_mask"):
|
| 193 |
+
score += 8.0 # was 5
|
| 194 |
+
breakdown["obstruction_penalty"] = 8.0
|
| 195 |
+
|
| 196 |
+
# Hat coverage also obscures hairline / forehead β small fixed
|
| 197 |
+
# penalty so a hat doesn't accidentally help the score by
|
| 198 |
+
# blocking unflattering hair.
|
| 199 |
+
if d.get("hat_detected"):
|
| 200 |
+
score += 4.0
|
| 201 |
+
breakdown["hat_obscuration_penalty"] = 4.0
|
| 202 |
|
| 203 |
# ── Bonuses (pull score down = less chopped) ─────────────────
|
| 204 |
|
|
|
|
| 206 |
# take the stronger of the two contributions.
|
| 207 |
jaw_bonus = 0.0
|
| 208 |
jaw_type = d.get("jawline_type")
|
| 209 |
+
jaw_type_bonus_map = {"sharp": -16.0, "strong": -10.0, "soft": 0.0}
|
| 210 |
if jaw_type in jaw_type_bonus_map:
|
| 211 |
jaw_bonus = jaw_type_bonus_map[jaw_type]
|
| 212 |
jaw_angle = d.get("jawline_angle")
|
| 213 |
if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
|
| 214 |
# Sharp angles add more on top of the categorical signal.
|
| 215 |
+
jaw_bonus = min(jaw_bonus, -16.0)
|
| 216 |
if jaw_bonus:
|
| 217 |
score += jaw_bonus
|
| 218 |
breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
|
| 219 |
|
| 220 |
# Cheekbone prominence.
|
| 221 |
+
cheek_bonus_map = {"high": -11.0, "moderate": -5.0, "flat": 0.0}
|
| 222 |
cheek = d.get("cheekbone_prominence")
|
| 223 |
if cheek in cheek_bonus_map:
|
| 224 |
bonus = cheek_bonus_map[cheek]
|
| 225 |
score += bonus
|
| 226 |
breakdown["cheekbone_bonus"] = bonus
|
| 227 |
|
| 228 |
+
# Skin clarity bonus when the texture score is low (smooth).
|
| 229 |
texture = d.get("skin_texture_score")
|
| 230 |
+
if isinstance(texture, (int, float)) and 0 < texture <= 8:
|
| 231 |
+
score -= 14.0 # was -9
|
| 232 |
+
breakdown["skin_clarity_bonus"] = -14.0
|
| 233 |
|
| 234 |
# Lip fullness - "average" and "full" both read as healthy.
|
| 235 |
lip = d.get("lip_fullness")
|
| 236 |
+
if lip == "full":
|
| 237 |
+
score -= 8.0 # was -5
|
| 238 |
+
breakdown["lip_fullness_bonus"] = -8.0
|
| 239 |
+
elif lip == "average":
|
| 240 |
+
score -= 4.0
|
| 241 |
+
breakdown["lip_fullness_bonus"] = -4.0
|
| 242 |
|
| 243 |
# Defined cupid's bow.
|
| 244 |
if d.get("cupids_bow") == "defined":
|
| 245 |
+
score -= 5.0 # was -3
|
| 246 |
+
breakdown["cupids_bow_bonus"] = -5.0
|
| 247 |
|
| 248 |
# Normal eye spacing.
|
| 249 |
if d.get("eye_spacing") == "average":
|
| 250 |
+
score -= 6.0 # was -4
|
| 251 |
+
breakdown["eye_spacing_bonus"] = -6.0
|
| 252 |
+
|
| 253 |
+
# Symmetric face (independent of asymmetry penalty above; we
|
| 254 |
+
# explicitly reward very symmetric faces rather than just not
|
| 255 |
+
# penalising them).
|
| 256 |
+
if isinstance(asym, (int, float)) and asym < 0.15:
|
| 257 |
+
score -= 6.0
|
| 258 |
+
breakdown["symmetry_bonus"] = -6.0
|
| 259 |
|
| 260 |
# Dimples - small bonus when the MediaPipe heuristic fires.
|
| 261 |
if d.get("possible_dimples"):
|
| 262 |
+
score -= 5.0 # was -3
|
| 263 |
+
breakdown["dimples_bonus"] = -5.0
|
| 264 |
+
|
| 265 |
+
# Eyes-open bonus (closed eyes make a face look worse).
|
| 266 |
+
if d.get("eyes_open") is True:
|
| 267 |
score -= 3.0
|
| 268 |
+
breakdown["eyes_open_bonus"] = -3.0
|
| 269 |
|
| 270 |
return score, breakdown
|
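The blend described in the aesthetic_analyzer.py docstring can be checked by hand. The sketch below re-implements the stretch and the blend from the diff in standalone form, using the constants the commit introduces; the function names `stretch_unattractive` and `blend` and the sample inputs are illustrative and do not appear in the commit.

```python
LEARNED_WEIGHT = 0.6     # blend weight from the diff
LEARNED_NORM_LO = 25.0   # beauty_norm at or below this maps to 100
LEARNED_NORM_HI = 75.0   # beauty_norm at or above this maps to 0


def stretch_unattractive(beauty_norm: float) -> float:
    """Invert the 0-100 beauty axis, then stretch (LO, HI) onto (0, 100)."""
    unattractive = 100.0 - beauty_norm
    lo_after_invert = 100.0 - LEARNED_NORM_HI   # 25
    span = LEARNED_NORM_HI - LEARNED_NORM_LO    # 50
    stretched = (unattractive - lo_after_invert) / span * 100.0
    return max(0.0, min(100.0, stretched))


def blend(beauty_norm: float, rule_score: float) -> float:
    """Weighted blend of the learned signal and the rule score, clamped."""
    learned = stretch_unattractive(beauty_norm)
    chopped = LEARNED_WEIGHT * learned + (1.0 - LEARNED_WEIGHT) * rule_score
    return max(0.0, min(100.0, chopped))


if __name__ == "__main__":
    # With a neutral rule score of 50, only the learned signal moves the result.
    for norm in (25.0, 50.0, 60.0, 75.0):
        print(norm, stretch_unattractive(norm), blend(norm, 50.0))
```

For a beauty_norm of 60 the inverted value is 40, the stretch gives (40 - 25) / 50 * 100 = 30, and with a rule score of 50 the blend is 0.6 * 30 + 0.4 * 50 = 38.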
analyzers/age_analyzer.py
ADDED
|
@@ -0,0 +1,200 @@
|
|
| 1 |
+
"""
|
| 2 |
+
AgeAnalyzer - MiVOLO age regression.
|
| 3 |
+
|
| 4 |
+
Model
|
| 5 |
+
-----
|
| 6 |
+
- Architecture : MiVOLO (Multi-Input VOLO transformer). Designed to
|
| 7 |
+
predict age + gender from face + (optional) person
|
| 8 |
+
crops. Significantly better at young faces than
|
| 9 |
+
InsightFace's bundled genderage head.
|
| 10 |
+
- Reference : Kuprashevich & Tolstykh, "MiVOLO: Multi-input
|
| 11 |
+
Transformer for Age and Gender Estimation", 2023
|
| 12 |
+
(https://github.com/WildChlamydia/MiVOLO).
|
| 13 |
+
- Reported MAE : ~3.99 yrs (AgeDB), ~5.50 yrs (IMDB). Best published
|
| 14 |
+
numbers for face-only age regression.
|
| 15 |
+
- License : MIT (code) / weights see model card.
|
| 16 |
+
|
| 17 |
+
Inputs
|
| 18 |
+
------
|
| 19 |
+
img_rgb : np.ndarray (H, W, 3) uint8. Should already be a face crop
|
| 20 |
+
(the orchestrator hands us the InsightFace-cropped image
|
| 21 |
+
via `_crop_to_face` in app.py).
|
| 22 |
+
|
| 23 |
+
Outputs (dict)
|
| 24 |
+
--------------
|
| 25 |
+
age_estimate : float years (continuous regression output)
|
| 26 |
+
age_range : string bucket derived from age_estimate for
|
| 27 |
+
legacy compatibility with the UI
|
| 28 |
+
age_confidence : 1.0 (MiVOLO is a regressor - no probability)
|
| 29 |
+
age_model_source : "mivolo" | "insightface_fallback" | "unavailable"
|
| 30 |
+
|
| 31 |
+
Weight loading
|
| 32 |
+
--------------
|
| 33 |
+
The MiVOLO package needs both its Python implementation AND a
|
| 34 |
+
checkpoint file. Resolution order:
|
| 35 |
+
|
| 36 |
+
1. Local file at `models/mivolo_d1.pth.tar` (drop in after download).
|
| 37 |
+
2. Hugging Face Hub via `AGE_HF_REPO_ID` env var, falling back to the
|
| 38 |
+
public `iitolstykh/mivolo_d1` repo.
|
| 39 |
+
|
| 40 |
+
If neither resolves (or the `mivolo` package isn't installed), the
|
| 41 |
+
analyzer reports `age_model_source: unavailable` and the orchestrator
|
| 42 |
+
falls back to InsightFace's age head with the piecewise calibration
|
| 43 |
+
that used to live in InsightFaceAnalyzer.
|
| 44 |
+
"""
|
| 45 |
+
|
| 46 |
+
import os
|
| 47 |
+
from typing import Any
|
| 48 |
+
|
| 49 |
+
import numpy as np
|
| 50 |
+
|
| 51 |
+
# MiVOLO ships its own model class; defer the import so the module
|
| 52 |
+
# still loads when the package isn't installed.
|
| 53 |
+
try:
|
| 54 |
+
from mivolo.model.mi_volo import MiVOLO
|
| 55 |
+
HAS_MIVOLO = True
|
| 56 |
+
except ImportError:
|
| 57 |
+
HAS_MIVOLO = False
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
# Same legacy bucket schema we use elsewhere so UI rows that show
|
| 61 |
+
# `age_range` keep rendering whatever the source.
|
| 62 |
+
AGE_BUCKETS = [
|
| 63 |
+
(0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
|
| 64 |
+
(20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
|
| 65 |
+
(50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
|
| 66 |
+
]
|
| 67 |
+
|
| 68 |
+
LOCAL_WEIGHTS_PATH = os.environ.get(
|
| 69 |
+
"AGE_WEIGHTS_PATH", "models/mivolo_d1.pth.tar"
|
| 70 |
+
)
|
| 71 |
+
HF_REPO_ID = os.environ.get("AGE_HF_REPO_ID", "iitolstykh/mivolo_d1")
|
| 72 |
+
HF_FILENAME = os.environ.get("AGE_HF_FILENAME", "mivolo_d1.pth.tar")
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
class AgeAnalyzer:
|
| 76 |
+
def __init__(self):
|
| 77 |
+
self.model = None
|
| 78 |
+
self.source = "unavailable"
|
| 79 |
+
|
| 80 |
+
if not HAS_MIVOLO:
|
| 81 |
+
print(
|
| 82 |
+
"[AgeAnalyzer] mivolo package not installed. Install with "
|
| 83 |
+
"`pip install git+https://github.com/WildChlamydia/MiVOLO.git`. "
|
| 84 |
+
"Falling back to InsightFace age until then."
|
| 85 |
+
)
|
| 86 |
+
return
|
| 87 |
+
|
| 88 |
+
weights_path = self._resolve_weights_path()
|
| 89 |
+
if weights_path is None:
|
| 90 |
+
print(
|
| 91 |
+
"[AgeAnalyzer] No usable weights - local not found at "
|
| 92 |
+
f"{LOCAL_WEIGHTS_PATH}, HF download failed. "
|
| 93 |
+
"Set AGE_WEIGHTS_PATH or AGE_HF_REPO_ID."
|
| 94 |
+
)
|
| 95 |
+
return
|
| 96 |
+
|
| 97 |
+
try:
|
| 98 |
+
# use_persons=False → face-only inference (we don't have
|
| 99 |
+
# body detection in this pipeline). MiVOLO is best with
|
| 100 |
+
# both but face-only is still ~5 yrs MAE.
|
| 101 |
+
self.model = MiVOLO(
|
| 102 |
+
ckpt_path=weights_path,
|
| 103 |
+
device="cuda" if self._cuda_available() else "cpu",
|
| 104 |
+
half=False,
|
| 105 |
+
use_persons=False,
|
| 106 |
+
disable_faces=False,
|
| 107 |
+
verbose=False,
|
| 108 |
+
)
|
| 109 |
+
self.source = "mivolo"
|
| 110 |
+
print(f"[AgeAnalyzer] Loaded MiVOLO weights from {weights_path}")
|
| 111 |
+
except Exception as exc:
|
| 112 |
+
print(f"[AgeAnalyzer] MiVOLO init failed: {exc}")
|
| 113 |
+
self.model = None
|
| 114 |
+
|
| 115 |
+
@staticmethod
|
| 116 |
+
def _cuda_available() -> bool:
|
| 117 |
+
try:
|
| 118 |
+
import torch
|
| 119 |
+
return torch.cuda.is_available()
|
| 120 |
+
except ImportError:
|
| 121 |
+
return False
|
| 122 |
+
|
| 123 |
+
@staticmethod
|
| 124 |
+
def _resolve_weights_path() -> str | None:
|
| 125 |
+
"""Local file wins, HF Hub is the fallback."""
|
| 126 |
+
if os.path.exists(LOCAL_WEIGHTS_PATH):
|
| 127 |
+
return LOCAL_WEIGHTS_PATH
|
| 128 |
+
try:
|
| 129 |
+
from huggingface_hub import hf_hub_download
|
| 130 |
+
return hf_hub_download(repo_id=HF_REPO_ID, filename=HF_FILENAME)
|
| 131 |
+
except Exception as exc:
|
| 132 |
+
print(f"[AgeAnalyzer] HF Hub download failed: {exc}")
|
| 133 |
+
return None
|
| 134 |
+
|
| 135 |
+
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 136 |
+
if self.model is None:
|
| 137 |
+
return self._empty_result()
|
| 138 |
+
|
| 139 |
+
try:
|
| 140 |
+
# MiVOLO expects a face crop. img_rgb is already cropped
|
| 141 |
+
# to the InsightFace bbox by the orchestrator. The
|
| 142 |
+
# package handles its own resize / normalise; we just
|
| 143 |
+
# hand over the uint8 RGB array.
|
| 144 |
+
age = self._run_inference(img_rgb)
|
| 145 |
+
except Exception as exc:
|
| 146 |
+
print(f"[AgeAnalyzer] Inference failed: {exc}")
|
| 147 |
+
return self._empty_result()
|
| 148 |
+
|
| 149 |
+
return {
|
| 150 |
+
"age_estimate": round(float(age), 1),
|
| 151 |
+
"age_range": self._bucket_age(age),
|
| 152 |
+
"age_confidence": 1.0,
|
| 153 |
+
"age_model_source": self.source,
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
def _run_inference(self, face_crop: np.ndarray) -> float:
|
| 157 |
+
"""Run MiVOLO on a single face crop and return the predicted age.
|
| 158 |
+
|
| 159 |
+
The mivolo package's high-level `Predictor` wraps detection +
|
| 160 |
+
regression together. We've already done detection via
|
| 161 |
+
InsightFace, so we use the lower-level `MiVOLO` model directly
|
| 162 |
+
on the pre-cropped face. The model's `inference` method takes
|
| 163 |
+
a batched tensor; build it from the crop.
|
| 164 |
+
"""
|
| 165 |
+
import torch
|
| 166 |
+
# MiVOLO expects 224×224 ImageNet-normalised face input.
|
| 167 |
+
from torchvision import transforms
|
| 168 |
+
|
| 169 |
+
preprocess = transforms.Compose([
|
| 170 |
+
transforms.ToPILImage(),
|
| 171 |
+
transforms.Resize((224, 224)),
|
| 172 |
+
transforms.ToTensor(),
|
| 173 |
+
transforms.Normalize(
|
| 174 |
+
mean=[0.485, 0.456, 0.406],
|
| 175 |
+
std=[0.229, 0.224, 0.225],
|
| 176 |
+
),
|
| 177 |
+
])
|
| 178 |
+
face_tensor = preprocess(face_crop).unsqueeze(0).to(self.model.device)
|
| 179 |
+
|
| 180 |
+
# When use_persons=False, MiVOLO's inference takes a face
|
| 181 |
+
# tensor and a None (or zeros) person tensor of matching batch.
|
| 182 |
+
with torch.no_grad():
|
| 183 |
+
ages, _genders = self.model.inference(face_tensor, None)
|
| 184 |
+
return float(ages[0].item())
|
| 185 |
+
|
| 186 |
+
@staticmethod
|
| 187 |
+
def _bucket_age(age: float) -> str:
|
| 188 |
+
for lo, hi, label in AGE_BUCKETS:
|
| 189 |
+
if lo <= age < hi:
|
| 190 |
+
return label
|
| 191 |
+
return "unknown"
|
| 192 |
+
|
| 193 |
+
@staticmethod
|
| 194 |
+
def _empty_result() -> dict[str, Any]:
|
| 195 |
+
return {
|
| 196 |
+
"age_estimate": None,
|
| 197 |
+
"age_range": "unknown",
|
| 198 |
+
"age_confidence": 0.0,
|
| 199 |
+
"age_model_source": "unavailable",
|
| 200 |
+
}
|
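The bucketing done by `_bucket_age` relies on half-open `[lo, hi)` ranges. A minimal sketch of that behaviour, assuming the `AGE_BUCKETS` table in `age_analyzer.py` matches the one this commit removes from `insightface_analyzer.py` (the new module's own definition is not visible in this part of the diff):

```python
# Assumption: mirrors the AGE_BUCKETS table deleted from insightface_analyzer.py.
AGE_BUCKETS = [
    (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
    (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
    (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
]


def bucket_age(age: float) -> str:
    """Map a regressed age in years to a label using half-open [lo, hi) ranges."""
    for lo, hi, label in AGE_BUCKETS:
        if lo <= age < hi:
            return label
    # Negative or implausibly large values match no bucket.
    return "unknown"


print(bucket_age(29.9))  # 20-29
print(bucket_age(30.0))  # 30-39
print(bucket_age(-1.0))  # unknown
```

One consequence worth noting: `analyze` rounds `age_estimate` to one decimal but buckets the unrounded value, so a raw prediction of 29.96 would be reported as `30.0` with `age_range` `"20-29"`.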
analyzers/gender_analyzer.py
ADDED
|
@@ -0,0 +1,92 @@
+"""
+GenderAnalyzer: FairFace ViT for binary gender.
+
+Model
+-----
+- HF repo : dima806/fairface_gender_image_detection
+- Arch : Vision Transformer (ViT-B/16), 86M params
+- Trained : FairFace dataset (gender-balanced)
+- Reported : 93.4% accuracy
+- License : Apache 2.0
+- Source : https://huggingface.co/dima806/fairface_gender_image_detection
+
+Why this instead of InsightFace's bundled head
+----------------------------------------------
+InsightFace's `genderage.onnx` does argmax only; it doesn't expose a
+softmax confidence. Borderline calls and confident calls look identical
+in the UI ("Female 100%"). FairFace gives a real probability so the UI
+can show graded confidence.
+
+Inputs
+------
+img_rgb : np.ndarray (H, W, 3) uint8. Typically a face crop produced
+by `_crop_to_face` in app.py.
+
+Outputs (dict)
+--------------
+gender : "male" | "female"
+gender_confidence : float in [0, 1] (argmax softmax probability)
+gender_distribution : { "male": p, "female": p }
+gender_model_source : "fairface" | "unavailable"
+"""
+
+from typing import Any
+
+from PIL import Image
+from transformers import pipeline
+
+
+MODEL_ID = "dima806/fairface_gender_image_detection"
+LABELS = ["male", "female"]
+
+
+class GenderAnalyzer:
+    def __init__(self):
+        self.classifier = None
+        try:
+            self.classifier = pipeline("image-classification", model=MODEL_ID)
+        except Exception as exc:
+            print(f"[GenderAnalyzer] Failed to load {MODEL_ID}: {exc}")
+
+    def analyze(self, img_rgb) -> dict[str, Any]:
+        if self.classifier is None:
+            return self._empty_result()
+
+        try:
+            pil = Image.fromarray(img_rgb)
+            preds = self.classifier(pil, top_k=2)
+        except Exception as exc:
+            print(f"[GenderAnalyzer] Prediction failed: {exc}")
+            return self._empty_result()
+
+        if not preds:
+            return self._empty_result()
+
+        # FairFace labels can come back capitalised; normalise to lowercase.
+        distribution = {label: 0.0 for label in LABELS}
+        for pred in preds:
+            label = str(pred["label"]).strip().lower()
+            if label in distribution:
+                distribution[label] = round(float(pred["score"]), 3)
+
+        # Top class wins; expose the actual softmax score as confidence.
+        top = preds[0]
+        top_label = str(top["label"]).strip().lower()
+        if top_label not in LABELS:
+            return self._empty_result()
+
+        return {
+            "gender": top_label,
+            "gender_confidence": round(float(top["score"]), 3),
+            "gender_distribution": distribution,
+            "gender_model_source": "fairface",
+        }
+
+    @staticmethod
+    def _empty_result() -> dict[str, Any]:
+        return {
+            "gender": "unknown",
+            "gender_confidence": 0.0,
+            "gender_distribution": {label: 0.0 for label in LABELS},
+            "gender_model_source": "unavailable",
+        }
|
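The label normalisation in `GenderAnalyzer.analyze` can be exercised without downloading the model. The sketch below reuses the same logic on a hand-written prediction list; `summarise_gender` is a hypothetical helper name and the scores are made up, shaped like the list of `{"label", "score"}` dicts that an image-classification pipeline returns.

```python
LABELS = ["male", "female"]


def summarise_gender(preds):
    """Collapse pipeline-style predictions into (label, confidence, distribution)."""
    distribution = {label: 0.0 for label in LABELS}
    for pred in preds:
        # Labels may arrive capitalised or padded; normalise before matching.
        label = str(pred["label"]).strip().lower()
        if label in distribution:
            distribution[label] = round(float(pred["score"]), 3)

    # Predictions are assumed to be sorted by score, highest first.
    top = preds[0]
    top_label = str(top["label"]).strip().lower()
    return top_label, round(float(top["score"]), 3), distribution


# Hypothetical scores for a borderline call.
example = [
    {"label": "Female", "score": 0.5731},
    {"label": "Male", "score": 0.4269},
]
print(summarise_gender(example))
# ('female', 0.573, {'male': 0.427, 'female': 0.573})
```

A borderline call like this one is exactly the case the commit targets: the UI can now show roughly 57% instead of a flat 100%.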
analyzers/insightface_analyzer.py
CHANGED
|
@@ -1,12 +1,16 @@
|
|
| 1 |
"""
|
| 2 |
-
InsightFaceAnalyzer: detection +
|
| 3 |
|
| 4 |
Model
|
| 5 |
-----
|
| 6 |
- Package : `insightface` (https://github.com/deepinsight/insightface)
|
| 7 |
- Bundle : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
|
| 8 |
-
-
|
| 9 |
-
|
| 10 |
- Size : ~280 MB (ONNX, mixed FP16/FP32)
|
| 11 |
- License : weights research-only; code Apache 2.0
|
| 12 |
- Source : https://github.com/deepinsight/insightface/tree/master/python-package
|
|
@@ -20,19 +24,14 @@ Outputs (dict)
|
|
| 20 |
face_bbox : [x1, y1, x2, y2] in pixel coordinates
|
| 21 |
face_confidence : SCRFD detection score
|
| 22 |
face_embedding : list[float] of length 512 (ArcFace, L2-normalised)
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
gender : "male" | "female"
|
| 27 |
-
gender_confidence : 1.0 by default (InsightFace doesn't expose a
|
| 28 |
-
gender softmax score; the head is argmax-only)
|
| 29 |
-
_insight_landmarks_2d : list of (x, y) tuples, 106 points (internal)
|
| 30 |
|
| 31 |
Accuracy
|
| 32 |
--------
|
| 33 |
- Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
|
| 34 |
-
-
|
| 35 |
-
In practice age MAE is ~5 years and gender ~94-96%.
|
| 36 |
|
| 37 |
Notes
|
| 38 |
-----
|
|
@@ -41,7 +40,6 @@ face when multiple are detected; the rest of the pipeline assumes a
|
|
| 41 |
single subject.
|
| 42 |
"""
|
| 43 |
|
| 44 |
-
import os
|
| 45 |
from typing import Any
|
| 46 |
|
| 47 |
import numpy as np
|
|
@@ -57,21 +55,6 @@ except ImportError:
|
|
| 57 |
|
| 58 |
MODEL_NAME = "buffalo_l"
|
| 59 |
|
| 60 |
-
# InsightFace's genderage head is known to overshoot adult ages by
|
| 61 |
-
# roughly 5 years in informal testing (no published calibration). We
|
| 62 |
-
# subtract a fixed offset to undo this bias; clamp to ≥1 so we never
|
| 63 |
-
# emit negative ages for kids. Override at runtime via the
|
| 64 |
-
# AGE_OFFSET_YEARS env var if you want to tune for your dataset.
|
| 65 |
-
AGE_OFFSET_YEARS = float(os.environ.get("AGE_OFFSET_YEARS", "5"))
|
| 66 |
-
|
| 67 |
-
# Age buckets used by the legacy UI. We derive these from the regression
|
| 68 |
-
# output so existing screens keep working.
|
| 69 |
-
AGE_BUCKETS = [
|
| 70 |
-
(0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
|
| 71 |
-
(20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
|
| 72 |
-
(50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
|
| 73 |
-
]
|
| 74 |
-
|
| 75 |
|
| 76 |
class InsightFaceAnalyzer:
|
| 77 |
def __init__(self):
|
|
@@ -79,20 +62,18 @@ class InsightFaceAnalyzer:
|
|
| 79 |
if not HAS_INSIGHTFACE:
|
| 80 |
print(
|
| 81 |
"[InsightFaceAnalyzer] insightface package not installed; "
|
| 82 |
-
"detection
|
| 83 |
)
|
| 84 |
return
|
| 85 |
|
| 86 |
try:
|
| 87 |
-
# Buffalo_L bundle auto-resolves under ~/.insightface/models/.
|
| 88 |
# CPUExecutionProvider is the right default for HF Spaces;
|
| 89 |
-
#
|
| 90 |
self.app = FaceAnalysis(
|
| 91 |
name=MODEL_NAME,
|
| 92 |
providers=["CPUExecutionProvider"],
|
| 93 |
)
|
| 94 |
-
# det_size=(640, 640) is the canonical SCRFD input.
|
| 95 |
-
# speeds inference but loses small faces.
|
| 96 |
self.app.prepare(ctx_id=-1, det_size=(640, 640))
|
| 97 |
except Exception as exc:
|
| 98 |
print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
|
|
@@ -128,29 +109,10 @@ class InsightFaceAnalyzer:
|
|
| 128 |
else None
|
| 129 |
)
|
| 130 |
|
| 131 |
-
# Age head is a single float (years). Buffalo_L systematically
|
| 132 |
-
# over-predicts adults by ~5 years; subtract AGE_OFFSET_YEARS
|
| 133 |
-
# to recalibrate. Don't drop below 1 (negative ages would be
|
| 134 |
-
# absurd, and very young children are already on the noisy
|
| 135 |
-
# end of the model's training distribution).
|
| 136 |
-
raw_age = float(getattr(face, "age", 0.0))
|
| 137 |
-
age = max(1.0, raw_age - AGE_OFFSET_YEARS)
|
| 138 |
-
|
| 139 |
-
# Gender is exposed as 0 (female) / 1 (male) on Face objects.
|
| 140 |
-
# InsightFace doesn't surface a softmax probability; we report
|
| 141 |
-
# confidence 1.0 to indicate "argmax, no soft signal".
|
| 142 |
-
gender_idx = int(getattr(face, "gender", -1))
|
| 143 |
-
gender = "male" if gender_idx == 1 else "female" if gender_idx == 0 else "unknown"
|
| 144 |
-
|
| 145 |
return {
|
| 146 |
"face_bbox": bbox,
|
| 147 |
"face_confidence": round(float(face.det_score), 3),
|
| 148 |
"face_embedding": embedding,
|
| 149 |
-
"age_estimate": round(age, 1),
|
| 150 |
-
"age_range": self._bucket_age(age),
|
| 151 |
-
"age_confidence": 1.0,
|
| 152 |
-
"gender": gender,
|
| 153 |
-
"gender_confidence": 1.0,
|
| 154 |
# 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
|
| 155 |
# Underscore-prefixed: stripped from JSON, available to
|
| 156 |
# downstream analyzers that want a tighter face crop.
|
|
@@ -161,23 +123,11 @@ class InsightFaceAnalyzer:
|
|
| 161 |
),
|
| 162 |
}
|
| 163 |
|
| 164 |
-
@staticmethod
|
| 165 |
-
def _bucket_age(age: float) -> str:
|
| 166 |
-
for lo, hi, label in AGE_BUCKETS:
|
| 167 |
-
if lo <= age < hi:
|
| 168 |
-
return label
|
| 169 |
-
return "unknown"
|
| 170 |
-
|
| 171 |
@staticmethod
|
| 172 |
def _empty_result() -> dict[str, Any]:
|
| 173 |
return {
|
| 174 |
"face_bbox": None,
|
| 175 |
"face_confidence": 0.0,
|
| 176 |
"face_embedding": None,
|
| 177 |
-
"age_estimate": 0.0,
|
| 178 |
-
"age_range": "unknown",
|
| 179 |
-
"age_confidence": 0.0,
|
| 180 |
-
"gender": "unknown",
|
| 181 |
-
"gender_confidence": 0.0,
|
| 182 |
"_insight_landmarks_2d": None,
|
| 183 |
}
|
|
|
|
| 1 |
"""
|
| 2 |
+
InsightFaceAnalyzer: face detection + ArcFace recognition embedding.
|
| 3 |
|
| 4 |
Model
|
| 5 |
-----
|
| 6 |
- Package : `insightface` (https://github.com/deepinsight/insightface)
|
| 7 |
- Bundle : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
|
| 8 |
+
- Used here : SCRFD-10GF detector + ArcFace 512-d recognition + 106
|
| 9 |
+
2D landmarks. The bundle ALSO ships an age+gender head,
|
| 10 |
+
but we route those signals to dedicated analyzers
|
| 11 |
+
(MiVOLO for age, FairFace ViT for gender) because the
|
| 12 |
+
bundled heads have no exposed confidence and overshoot
|
| 13 |
+
young faces.
|
| 14 |
- Size : ~280 MB (ONNX, mixed FP16/FP32)
|
| 15 |
- License : weights research-only; code Apache 2.0
|
| 16 |
- Source : https://github.com/deepinsight/insightface/tree/master/python-package
|
|
|
|
| 24 |
face_bbox : [x1, y1, x2, y2] in pixel coordinates
|
| 25 |
face_confidence : SCRFD detection score
|
| 26 |
face_embedding : list[float] of length 512 (ArcFace, L2-normalised)
|
| 27 |
+
_insight_landmarks_2d : list of (x, y) tuples, 106 points (internal,
|
| 28 |
+
stripped before JSON; available to downstream
|
| 29 |
+
analyzers that want tighter face geometry)
|
| 30 |
|
| 31 |
Accuracy
|
| 32 |
--------
|
| 33 |
- Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
|
| 34 |
+
- Detection (SCRFD-10GF): >99% recall on WIDER FACE easy / medium.
|
|
|
|
| 35 |
|
| 36 |
Notes
|
| 37 |
-----
|
|
|
|
| 40 |
single subject.
|
| 41 |
"""
|
| 42 |
|
|
|
|
| 43 |
from typing import Any
|
| 44 |
|
| 45 |
import numpy as np
|
|
|
|
| 55 |
|
| 56 |
MODEL_NAME = "buffalo_l"
|
| 57 |
|
| 58 |
|
| 59 |
class InsightFaceAnalyzer:
|
| 60 |
def __init__(self):
|
|
|
|
| 62 |
if not HAS_INSIGHTFACE:
|
| 63 |
print(
|
| 64 |
"[InsightFaceAnalyzer] insightface package not installed; "
|
| 65 |
+
"face detection and recognition will be unavailable."
|
| 66 |
)
|
| 67 |
return
|
| 68 |
|
| 69 |
try:
|
|
|
|
| 70 |
# CPUExecutionProvider is the right default for HF Spaces;
|
| 71 |
+
# add 'CUDAExecutionProvider' first for GPU.
|
| 72 |
self.app = FaceAnalysis(
|
| 73 |
name=MODEL_NAME,
|
| 74 |
providers=["CPUExecutionProvider"],
|
| 75 |
)
|
| 76 |
+
# det_size=(640, 640) is the canonical SCRFD input.
|
|
|
|
| 77 |
self.app.prepare(ctx_id=-1, det_size=(640, 640))
|
| 78 |
except Exception as exc:
|
| 79 |
print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
|
|
|
|
| 109 |
else None
|
| 110 |
)
|
| 111 |
|
| 112 |
return {
|
| 113 |
"face_bbox": bbox,
|
| 114 |
"face_confidence": round(float(face.det_score), 3),
|
| 115 |
"face_embedding": embedding,
|
| 116 |
# 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
|
| 117 |
# Underscore-prefixed: stripped from JSON, available to
|
| 118 |
# downstream analyzers that want a tighter face crop.
|
|
|
|
| 123 |
),
|
| 124 |
}
|
| 125 |
|
| 126 |
@staticmethod
|
| 127 |
def _empty_result() -> dict[str, Any]:
|
| 128 |
return {
|
| 129 |
"face_bbox": None,
|
| 130 |
"face_confidence": 0.0,
|
| 131 |
"face_embedding": None,
|
| 132 |
"_insight_landmarks_2d": None,
|
| 133 |
}
|
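The comment about adding `'CUDAExecutionProvider'` first for GPU can be made less fragile by choosing providers from what the runtime actually reports. This is only a sketch: `choose_providers` is a hypothetical helper, not part of this commit, and it takes the available list as an argument so it can be checked without ONNX Runtime installed.

```python
def choose_providers(available):
    """Prefer CUDA when the runtime offers it; always fall back to CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # If nothing matched (e.g. an empty list was passed), default to CPU.
    return chosen or ["CPUExecutionProvider"]


print(choose_providers(["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

In the analyzer this would presumably be fed from `onnxruntime.get_available_providers()` and passed as `providers=` to `FaceAnalysis`. As far as I understand, `prepare(ctx_id=-1, ...)` keeps the sessions on CPU, so a GPU deployment would likely also need `ctx_id=0`; treat that as an assumption to verify against the installed insightface version.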
app.py
CHANGED
|
@@ -2,31 +2,39 @@
|
|
| 2 |
HCP Face Analysis Microservice
|
| 3 |
==============================
|
| 4 |
|
| 5 |
-
FastAPI service that runs
|
| 6 |
-
and merges their outputs into one facial-attribute dictionary,
|
| 7 |
-
a face-recognition embedding for cross-photo grouping and a
|
| 8 |
-
"chopped score" aesthetic rating.
|
| 9 |
|
| 10 |
Pipeline (in execution order)
|
| 11 |
-----------------------------
|
| 12 |
1. InsightFaceAnalyzer InsightFace buffalo_l (ONNX). SCRFD
|
| 13 |
detection + ArcFace 512-d embedding +
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
and
|
| 17 |
|
| 18 |
2. LandmarkAnalyzer MediaPipe Face Landmarker. 478 3D
|
| 19 |
landmarks + 52 ARKit blendshapes →
|
| 20 |
geometric features, smiling, mouth_open.
|
| 21 |
|
| 22 |
-
|
| 23 |
5-class ethnicity widened to a 7-bucket
|
| 24 |
schema for legacy compatibility.
|
| 25 |
|
| 26 |
-
4. ParsingAnalyzer SegFormer-B5 human parsing.
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
|
| 31 |
5. EmotionAnalyzer HSEmotion EfficientNet-B0. 8-class
|
| 32 |
emotion + valence/arousal/mood.
|
|
@@ -50,9 +58,10 @@ Pipeline (in execution order)
|
|
| 50 |
scoring only.
|
| 51 |
|
| 52 |
10. AestheticAnalyzer Pure-Python aggregator. Reads the merged
|
| 53 |
-
dict from analyzers
|
| 54 |
-
final `chopped_score` (0–100, higher
|
| 55 |
-
more chopped) and a per-factor
|
|
|
|
| 56 |
|
| 57 |
Endpoints
|
| 58 |
---------
|
|
@@ -89,6 +98,8 @@ from analyzers.color_analyzer import ColorAnalyzer
|
|
| 89 |
from analyzers.obstruction_analyzer import ObstructionAnalyzer
|
| 90 |
from analyzers.hair_type_analyzer import HairTypeAnalyzer
|
| 91 |
from analyzers.insightface_analyzer import InsightFaceAnalyzer
|
|
|
|
|
|
|
| 92 |
from analyzers.beauty_analyzer import BeautyAnalyzer
|
| 93 |
from analyzers.aesthetic_analyzer import AestheticAnalyzer
|
| 94 |
|
|
@@ -109,6 +120,8 @@ app.add_middleware(
|
|
| 109 |
# model-load cost; subsequent requests are warm.
|
| 110 |
insightface_analyzer: Optional[InsightFaceAnalyzer] = None
|
| 111 |
landmark_analyzer: Optional[LandmarkAnalyzer] = None
|
|
|
|
|
|
|
| 112 |
ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
|
| 113 |
parsing_analyzer: Optional[ParsingAnalyzer] = None
|
| 114 |
emotion_analyzer: Optional[EmotionAnalyzer] = None
|
|
@@ -148,7 +161,8 @@ def get_analyzers():
|
|
| 148 |
requests. First request pays the full model-load cost; subsequent
|
| 149 |
requests are warm.
|
| 150 |
"""
|
| 151 |
-
global insightface_analyzer, landmark_analyzer
|
|
|
|
| 152 |
global parsing_analyzer, emotion_analyzer, color_analyzer
|
| 153 |
global obstruction_analyzer, hair_type_analyzer
|
| 154 |
global beauty_analyzer, aesthetic_analyzer
|
|
@@ -161,6 +175,14 @@ def get_analyzers():
|
|
| 161 |
logger.info("Loading MediaPipe Face Landmarker...")
|
| 162 |
landmark_analyzer = LandmarkAnalyzer()
|
| 163 |
|
| 164 |
if ethnicity_analyzer is None:
|
| 165 |
logger.info("Loading Ethnicity classifier...")
|
| 166 |
ethnicity_analyzer = EthnicityAnalyzer()
|
|
@@ -194,6 +216,8 @@ def get_analyzers():
|
|
| 194 |
return (
|
| 195 |
insightface_analyzer,
|
| 196 |
landmark_analyzer,
|
|
|
|
|
|
|
| 197 |
ethnicity_analyzer,
|
| 198 |
parsing_analyzer,
|
| 199 |
emotion_analyzer,
|
|
@@ -246,6 +270,8 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
|
|
| 246 |
(
|
| 247 |
insight,
|
| 248 |
landmarks,
|
|
|
|
|
|
|
| 249 |
ethnicities,
|
| 250 |
parsing,
|
| 251 |
emotions,
|
|
@@ -258,7 +284,9 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
|
|
| 258 |
|
| 259 |
results: dict = {}
|
| 260 |
|
| 261 |
-
# Step 1: InsightFace detection
|
|
|
|
|
|
|
| 262 |
logger.info("Running InsightFace analysis...")
|
| 263 |
insight_results = insight.analyze(img_array)
|
| 264 |
results.update(insight_results)
|
|
@@ -275,7 +303,17 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
|
|
| 275 |
landmark_results = landmarks.analyze(img_array)
|
| 276 |
results.update(landmark_results)
|
| 277 |
|
| 278 |
-
# Step
|
| 279 |
logger.info("Running ethnicity analysis...")
|
| 280 |
results.update(ethnicities.analyze(face_crop))
|
| 281 |
|
|
|
|
| 2 |
HCP Face Analysis Microservice
|
| 3 |
==============================
|
| 4 |
|
| 5 |
+
FastAPI service that runs twelve specialised analyzers over a single
|
| 6 |
+
photo and merges their outputs into one facial-attribute dictionary,
|
| 7 |
+
including a face-recognition embedding for cross-photo grouping and a
|
| 8 |
+
numeric "chopped score" aesthetic rating.
|
| 9 |
|
| 10 |
Pipeline (in execution order)
|
| 11 |
-----------------------------
|
| 12 |
1. InsightFaceAnalyzer InsightFace buffalo_l (ONNX). SCRFD
|
| 13 |
detection + ArcFace 512-d embedding +
|
| 14 |
+
106 landmarks. Age & gender USED to
|
| 15 |
+
come from here; now delegated to
|
| 16 |
+
MiVOLO and FairFace (steps 3a/3b).
|
| 17 |
|
| 18 |
2. LandmarkAnalyzer MediaPipe Face Landmarker. 478 3D
|
| 19 |
landmarks + 52 ARKit blendshapes →
|
| 20 |
geometric features, smiling, mouth_open.
|
| 21 |
|
| 22 |
+
3a. AgeAnalyzer MiVOLO regressor (~4 yr MAE on AgeDB).
|
| 23 |
+
Replaces the InsightFace age head, which
|
| 24 |
+
systematically overshot young faces.
|
| 25 |
+
|
| 26 |
+
3b. GenderAnalyzer FairFace ViT (~93.4% acc). Replaces the
|
| 27 |
+
InsightFace gender head so we get a real
|
| 28 |
+
softmax confidence instead of argmax 1.0.
|
| 29 |
+
|
| 30 |
+
3c. EthnicityAnalyzer cledoux42/Ethnicity_Test_v003 ViT.
|
| 31 |
5-class ethnicity widened to a 7-bucket
|
| 32 |
schema for legacy compatibility.
|
| 33 |
|
| 34 |
+
4. ParsingAnalyzer SegFormer-B5 human parsing. Receives the
|
| 35 |
+
face-cropped image. Emits face/hair
|
| 36 |
+
masks + hair length + hat detection +
|
| 37 |
+
OpenCV-derived skin stats.
|
| 38 |
|
| 39 |
5. EmotionAnalyzer HSEmotion EfficientNet-B0. 8-class
|
| 40 |
emotion + valence/arousal/mood.
|
|
|
|
| 58 |
scoring only.
|
| 59 |
|
| 60 |
10. AestheticAnalyzer Pure-Python aggregator. Reads the merged
|
| 61 |
+
dict from previous analyzers and produces
|
| 62 |
+
the final `chopped_score` (0–100, higher
|
| 63 |
+
= more chopped) and a per-factor
|
| 64 |
+
breakdown.
|
| 65 |
|
| 66 |
Endpoints
|
| 67 |
---------
|
|
|
|
| 98 |
from analyzers.obstruction_analyzer import ObstructionAnalyzer
|
| 99 |
from analyzers.hair_type_analyzer import HairTypeAnalyzer
|
| 100 |
from analyzers.insightface_analyzer import InsightFaceAnalyzer
|
| 101 |
+
from analyzers.age_analyzer import AgeAnalyzer
|
| 102 |
+
from analyzers.gender_analyzer import GenderAnalyzer
|
| 103 |
from analyzers.beauty_analyzer import BeautyAnalyzer
|
| 104 |
from analyzers.aesthetic_analyzer import AestheticAnalyzer
|
| 105 |
|
|
|
|
| 120 |
# model-load cost; subsequent requests are warm.
|
| 121 |
insightface_analyzer: Optional[InsightFaceAnalyzer] = None
|
| 122 |
landmark_analyzer: Optional[LandmarkAnalyzer] = None
|
| 123 |
+
age_analyzer: Optional[AgeAnalyzer] = None
|
| 124 |
+
gender_analyzer: Optional[GenderAnalyzer] = None
|
| 125 |
ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
|
| 126 |
parsing_analyzer: Optional[ParsingAnalyzer] = None
|
| 127 |
emotion_analyzer: Optional[EmotionAnalyzer] = None
|
|
|
|
| 161 |
requests. First request pays the full model-load cost; subsequent
|
| 162 |
requests are warm.
|
| 163 |
"""
|
| 164 |
+
global insightface_analyzer, landmark_analyzer
|
| 165 |
+
global age_analyzer, gender_analyzer, ethnicity_analyzer
|
| 166 |
global parsing_analyzer, emotion_analyzer, color_analyzer
|
| 167 |
global obstruction_analyzer, hair_type_analyzer
|
| 168 |
global beauty_analyzer, aesthetic_analyzer
|
|
|
|
| 175 |
logger.info("Loading MediaPipe Face Landmarker...")
|
| 176 |
landmark_analyzer = LandmarkAnalyzer()
|
| 177 |
|
| 178 |
+
if age_analyzer is None:
|
| 179 |
+
logger.info("Loading MiVOLO age analyzer...")
|
| 180 |
+
age_analyzer = AgeAnalyzer()
|
| 181 |
+
|
| 182 |
+
if gender_analyzer is None:
|
| 183 |
+
logger.info("Loading FairFace gender analyzer...")
|
| 184 |
+
gender_analyzer = GenderAnalyzer()
|
| 185 |
+
|
| 186 |
if ethnicity_analyzer is None:
|
| 187 |
logger.info("Loading Ethnicity classifier...")
|
| 188 |
ethnicity_analyzer = EthnicityAnalyzer()
|
|
|
|
| 216 |
return (
|
| 217 |
insightface_analyzer,
|
| 218 |
landmark_analyzer,
|
| 219 |
+
age_analyzer,
|
| 220 |
+
gender_analyzer,
|
| 221 |
ethnicity_analyzer,
|
| 222 |
parsing_analyzer,
|
| 223 |
emotion_analyzer,
|
|
|
|
| 270 |
(
|
| 271 |
insight,
|
| 272 |
landmarks,
|
| 273 |
+
ages,
|
| 274 |
+
genders,
|
| 275 |
ethnicities,
|
| 276 |
parsing,
|
| 277 |
emotions,
|
|
|
|
| 284 |
|
| 285 |
results: dict = {}
|
| 286 |
|
| 287 |
+
# Step 1: InsightFace: detection, ArcFace recognition embedding,
|
| 288 |
+
# 106 landmarks. (Age and gender used to live here too; they are
|
| 289 |
+
# now produced by dedicated analyzers further down.)
|
| 290 |
logger.info("Running InsightFace analysis...")
|
| 291 |
insight_results = insight.analyze(img_array)
|
| 292 |
results.update(insight_results)
|
|
|
|
| 303 |
landmark_results = landmarks.analyze(img_array)
|
| 304 |
results.update(landmark_results)
|
| 305 |
|
| 306 |
+
# Step 3a: MiVOLO age regressor. Much better at young faces than
|
| 307 |
+
# InsightFace's bundled head, which overshoots heavily.
|
| 308 |
+
logger.info("Running MiVOLO age analysis...")
|
| 309 |
+
results.update(ages.analyze(face_crop))
|
| 310 |
+
|
| 311 |
+
# Step 3b: FairFace gender. Provides a real softmax confidence
|
| 312 |
+
# score so the UI can show graded uncertainty.
|
| 313 |
+
logger.info("Running FairFace gender analysis...")
|
| 314 |
+
results.update(genders.analyze(face_crop))
|
| 315 |
+
|
| 316 |
+
# Step 3c: ethnicity classifier; likes a tighter face crop.
|
| 317 |
logger.info("Running ethnicity analysis...")
|
| 318 |
results.update(ethnicities.analyze(face_crop))
|
| 319 |
|
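Steps 3a to 3c all receive `face_crop`, which is built from `face_bbox` plus padding. The body of `_crop_to_face` is not part of this diff, so the following is only a sketch of how a padded, bounds-clamped crop box might be computed; `padded_crop_box` and `pad_ratio` are hypothetical names, and the actual padding used by the service may differ.

```python
def padded_crop_box(bbox, img_w, img_h, pad_ratio=0.25):
    """Expand [x1, y1, x2, y2] by pad_ratio per side and clamp to the image."""
    x1, y1, x2, y2 = bbox
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    left = max(0, int(x1 - pad_x))
    top = max(0, int(y1 - pad_y))
    right = min(img_w, int(x2 + pad_x))
    bottom = min(img_h, int(y2 + pad_y))
    return left, top, right, bottom


# A 200x300 face box near the top-right of a 320x400 image.
print(padded_crop_box([100, 50, 300, 350], img_w=320, img_h=400))
# (50, 0, 320, 400)
```

With a NumPy image the crop would then be `img[top:bottom, left:right]`. Clamping matters because a detection near the frame edge would otherwise produce negative indices, which NumPy silently interprets as offsets from the end.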
architecture.md
CHANGED
|
@@ -11,8 +11,9 @@ ones (only intentional in a couple of places: `_run_pipeline` in
|
|
| 11 |
Photo (RGB ndarray)
|
| 12 |
│
|
| 13 |
├─► [1] InsightFaceAnalyzer (insightface buffalo_l, ONNX)
|
| 14 |
-
│ → face_bbox, face_confidence, face_embedding (512-d ArcFace)
|
| 15 |
-
│
|
|
|
|
|
| 16 |
│
|
| 17 |
├─► Build face crop from face_bbox + padding. Downstream analyzers
|
| 18 |
│ that benefit from a tighter input read the crop; MediaPipe gets
|
|
@@ -24,7 +25,16 @@ Photo (RGB ndarray)
|
|
| 24 |
│ facial_asymmetry_score, smile_asymmetry, possible_dimples,
|
| 25 |
│ possible_unibrow.
|
| 26 |
│
|
| 27 |
-
├─► [
|
| 28 |
│ → ethnicity, ethnicity_confidence, ethnicity_distribution
|
| 29 |
│ (cropped input).
|
| 30 |
│
|
|
@@ -72,8 +82,10 @@ strips them before returning JSON.
|
|
| 72 |
|
| 73 |
| Section | Field(s) | Source |
|
| 74 |
|---|---|---|
|
| 75 |
-
| Demographics | face_bbox, face_confidence, face_embedding (512-d)
|
| 76 |
-
| Demographics |
|
|
|
|
|
|
|
| 77 |
| Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
|
| 78 |
| Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
|
| 79 |
| Hair | hair_length, hair_present | SegFormer-B5 |
|
|
@@ -86,7 +98,7 @@ strips them before returning JSON.
|
|
| 86 |
| Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
|
| 87 |
| Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
|
| 88 |
| Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
|
| 89 |
-
| Skin | wrinkle_level, skin_texture_score, skin_uniformity
|
| 90 |
| Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
|
| 91 |
| Accessories | wearing_hat | SegFormer (hat class coverage) |
|
| 92 |
| Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
|
|
|
|
| 11 |
Photo (RGB ndarray)
|
| 12 |
│
|
| 13 |
├─► [1] InsightFaceAnalyzer (insightface buffalo_l, ONNX)
|
| 14 |
+
│ → face_bbox, face_confidence, face_embedding (512-d ArcFace).
|
| 15 |
+
│ Age & gender used to come from here; they now live in
|
| 16 |
+
│ dedicated analyzers (steps 3a / 3b).
|
| 17 |
│
|
| 18 |
├─► Build face crop from face_bbox + padding. Downstream analyzers
|
| 19 |
│ that benefit from a tighter input read the crop; MediaPipe gets
|
|
|
|
|
| 25 |
│ facial_asymmetry_score, smile_asymmetry, possible_dimples,
|
| 26 |
│ possible_unibrow.
|
| 27 |
│
|
| 28 |
+
├─► [3a] AgeAnalyzer (MiVOLO regressor, ~4 yr MAE on AgeDB)
|
| 29 |
+
│ → age_estimate, age_range, age_model_source
|
| 30 |
+
│ (cropped input). Replaces the InsightFace age head.
|
| 31 |
+
│
|
| 32 |
+
├─► [3b] GenderAnalyzer (dima806/fairface_gender ViT)
|
| 33 |
+
│ → gender, gender_confidence, gender_distribution
|
| 34 |
+
│ (cropped input). Replaces the InsightFace gender head so
|
| 35 |
+
│ we get a real softmax confidence.
|
| 36 |
+
│
|
| 37 |
+
├─► [3c] EthnicityAnalyzer (cledoux42/Ethnicity_Test_v003 ViT)
|
| 38 |
│ → ethnicity, ethnicity_confidence, ethnicity_distribution
|
| 39 |
│ (cropped input).
|
| 40 |
│
|
|
|
|
| 82 |
|
| 83 |
| Section | Field(s) | Source |
|
| 84 |
|---|---|---|
|
| 85 |
+
| Demographics | face_bbox, face_confidence, face_embedding (512-d) | InsightFace buffalo_l |
|
| 86 |
+
| Demographics | age_estimate, age_range, age_model_source | MiVOLO regressor |
|
| 87 |
+
| Demographics | gender, gender_confidence, gender_distribution | FairFace ViT |
|
| 88 |
+
| Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | cledoux42 ViT |
|
| 89 |
| Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
|
| 90 |
| Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
|
| 91 |
| Hair | hair_length, hair_present | SegFormer-B5 |
|
|
|
|
| 98 |
| Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
|
| 99 |
| Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
|
| 100 |
| Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
|
| 101 |
+
| Skin | wrinkle_level, skin_texture_score, skin_uniformity | SegFormer mask + OpenCV stats (`freckles_or_moles` still computed server-side but no longer displayed; detector was too noisy) |
|
| 102 |
| Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
|
| 103 |
| Accessories | wearing_hat | SegFormer (hat class coverage) |
|
| 104 |
| Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
|
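Underscore-prefixed keys such as `_insight_landmarks_2d` are internal and are stripped before the response is serialised. The stripping code is not shown in this diff; a minimal sketch of the idea, with `strip_internal_fields` as a hypothetical helper name and made-up field values:

```python
def strip_internal_fields(results: dict) -> dict:
    """Return a copy of the merged results without underscore-prefixed keys."""
    return {key: value for key, value in results.items() if not key.startswith("_")}


merged = {
    "face_confidence": 0.912,
    "age_estimate": 27.4,
    "_insight_landmarks_2d": [(10.0, 20.0), (11.5, 21.0)],
}
print(strip_internal_fields(merged))
# {'face_confidence': 0.912, 'age_estimate': 27.4}
```

Building a new dict rather than deleting keys in place leaves the merged dict intact for any analyzer that still needs the internal fields later in the pipeline.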
requirements.txt
CHANGED
|
@@ -15,3 +15,8 @@ transformers==4.45.2
|
|
| 15 |
hsemotion>=0.2.2
|
| 16 |
insightface>=0.7.3
|
| 17 |
onnxruntime>=1.18.0
|
| 15 |
hsemotion>=0.2.2
|
| 16 |
insightface>=0.7.3
|
| 17 |
onnxruntime>=1.18.0
|
| 18 |
+
# MiVOLO for age regression. Installed from git because the project
|
| 19 |
+
# doesn't publish a PyPI release; pip resolves the git+ URL fine at
|
| 20 |
+
# Docker build time.
|
| 21 |
+
mivolo @ git+https://github.com/WildChlamydia/MiVOLO.git
|
| 22 |
+
scipy>=1.10.0
|