Spaces: evanlyhf/RememberMe

Evan Li committed 11 days ago

Commit

abec69f

1 Parent(s): dcbc516

mivolo, freckles

Files changed (9)

Dockerfile +10 -4
README.md +4 -2
analyzers/aesthetic_analyzer.py +114 -64
analyzers/age_analyzer.py +200 -0
analyzers/gender_analyzer.py +92 -0
analyzers/insightface_analyzer.py +14 -64
app.py +56 -18
architecture.md +18 -6
requirements.txt +5 -0

Dockerfile CHANGED

@@ -23,15 +23,21 @@ RUN mkdir -p models && \
     "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
 # Pre-download InsightFace buffalo_l bundle (detection + recognition +
-# age + gender + landmarks) so the first /analyze call doesn't pay the
-# ~280MB download. The bundle auto-extracts under ~/.insightface/models/
-# on first use.
 RUN mkdir -p /root/.insightface/models && \
     wget -q -O /root/.insightface/models/buffalo_l.zip \
     "https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
     cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
-# unzip wasn't in the system deps; add it via the apt block at the top.
 COPY . .

     "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
 # Pre-download InsightFace buffalo_l bundle (detection + recognition +
+# 106 landmarks) so the first /analyze call doesn't pay the ~280MB
+# download. The bundle auto-extracts under ~/.insightface/models/ on
+# first use.
 RUN mkdir -p /root/.insightface/models && \
     wget -q -O /root/.insightface/models/buffalo_l.zip \
     "https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip" && \
     cd /root/.insightface/models && unzip -q buffalo_l.zip -d buffalo_l && rm buffalo_l.zip
+# MiVOLO checkpoint is too large for the Docker image (~300 MB) — it's
+# lazy-downloaded from HF Hub on first AgeAnalyzer init, via
+# AGE_HF_REPO_ID (defaults to iitolstykh/mivolo_d1). To bundle it at
+# build time instead, uncomment the lines below and pin the filename.
+# RUN mkdir -p models && \
+#     wget -q -O models/mivolo_d1.pth.tar \
+#     "https://huggingface.co/iitolstykh/mivolo_d1/resolve/main/mivolo_d1.pth.tar"
 COPY . .

README.md CHANGED

@@ -18,9 +18,11 @@ face-recognition embedding and an aesthetic "chopped score."
 | # | Component | Model | Task | Size |
 |---|-----------|-------|------|------|
-| 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding + age + gender + 106 landmarks (99.83% LFW) | ~280 MB |
 | 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes — geometric features, smiling, mouth-open | ~4 MB |
-| 3 | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
 | 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
 | 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
 | 6 | Color analysis | (no model — OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |

 | # | Component | Model | Task | Size |
 |---|-----------|-------|------|------|
+| 1 | InsightFace | `buffalo_l` (SCRFD + ArcFace ResNet50, ONNX) | Detection + 512-d recognition embedding + 106 landmarks (99.83% LFW) | ~280 MB |
 | 2 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes — geometric features, smiling, mouth-open | ~4 MB |
+| 3a | Age | MiVOLO (VOLO transformer, regression) | Continuous age (~4 yr MAE on AgeDB) | ~300 MB |
+| 3b | Gender | `dima806/fairface_gender_image_detection` (ViT) | Binary gender + softmax confidence (~93.4% acc) | ~340 MB |
+| 3c | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
 | 4 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
 | 5 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
 | 6 | Color analysis | (no model — OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |

analyzers/aesthetic_analyzer.py CHANGED

@@ -4,41 +4,46 @@ AestheticAnalyzer — "chopped score" aggregator.
 What it does
 ------------
 Reads the merged result dict from every other analyzer and produces a
-single numeric "chopped score" plus a per-factor breakdown. Higher
-score = more chopped = less conventionally attractive (by the
 arbitrary rubric encoded here). The breakdown lets you tune weights
 or flip polarity client-side without rerunning inference.
 Score composition
 -----------------
-The final chopped_score is a weighted blend of two sources:
-1. **Learned beauty regressor** (from BeautyAnalyzer, trained on
-   SCUT-FBP5500): a number in [1.0, 5.0] reflecting averaged human
-   ratings. We rescale to a 0–100 unattractiveness axis. This is the
-   dominant signal when available — heavy weight (default 0.7).
-2. **Rule-based factor sum**: penalties for asymmetry, wrinkles,
-   uneven skin, freckles, and asymmetric smile; bonuses for defined
-   jawline, prominent cheekbones, clear skin, balanced lips, and
-   dimples. Each factor is documented in `_compute_rule_score`.
-   This is the only signal when the regressor isn't loaded
-   (BeautyAnalyzer returns None).
 Blend math
 ----------
-    if beauty_score available:
-        chopped = 0.7 * (100 - beauty_norm) + 0.3 * rule_score
-    else:
-        chopped = rule_score
     chopped is clamped to [0, 100].
 Subjectivity disclaimer
 -----------------------
-Every weight in this file is a guess. "Beauty" is subjective, culturally
-biased, and reductive. Treat the score as an in-joke metric; never
-expose it as objective truth. The UI gates the row behind a
-Settings toggle off-by-default for that reason.
 Note: this analyzer takes no image input — it reads the merged result
 dict produced by every other analyzer that ran ahead of it.
@@ -48,12 +53,36 @@ from typing import Any
 # How much weight the learned beauty regressor gets when both signals
-# are available. The rule-based sum gets the rest (1 - this).
-LEARNED_WEIGHT = 0.7
 # Baseline score. Penalties push up, bonuses pull down.
 BASELINE = 50.0
 class AestheticAnalyzer:
     def __init__(self):
@@ -71,15 +100,16 @@ class AestheticAnalyzer:
         beauty_norm = merged.get("beauty_score_norm")
         if beauty_norm is not None:
-            # Beauty regressor: 0 = ugly, 100 = beautiful (per SCUT-FBP5500
-            # scaling). Flip to unattractiveness axis: 100 - x.
-            learned_unattractive = 100.0 - float(beauty_norm)
             chopped = (
                 LEARNED_WEIGHT * learned_unattractive
                 + (1.0 - LEARNED_WEIGHT) * rule_score
             )
             breakdown["learned_unattractive"] = round(
-                LEARNED_WEIGHT * learned_unattractive - LEARNED_WEIGHT * BASELINE, 2
             )
             breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
         else:
@@ -105,9 +135,14 @@ class AestheticAnalyzer:
     def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
         """Hand-tuned weighted sum over previously-extracted attributes.
-        Returns (score, breakdown_dict). The breakdown gives each factor's
-        signed contribution so a UI can show *why* a score landed where
-        it did. Score starts at BASELINE (50) and moves up/down.
         """
         score = BASELINE
         breakdown: dict[str, float] = {}
@@ -118,13 +153,13 @@ class AestheticAnalyzer:
         # MediaPipe `facial_asymmetry_score` is already in this range.
         asym = d.get("facial_asymmetry_score")
         if isinstance(asym, (int, float)):
-            penalty = float(asym) * 18.0
             score += penalty
             breakdown["asymmetry_penalty"] = round(penalty, 2)
         # Wrinkle level from SegFormer + OpenCV Laplacian classification.
         wrinkle_penalty_map = {
-            "smooth": 0.0, "slight": 4.0, "moderate": 8.0, "prominent": 12.0,
         }
         wrinkle = d.get("wrinkle_level")
         if wrinkle in wrinkle_penalty_map:
@@ -132,36 +167,38 @@ class AestheticAnalyzer:
             score += penalty
             breakdown["wrinkle_penalty"] = penalty
-        # Skin uniformity = LAB L* std-dev over the face mask. Higher
-        # std means uneven tone (shadows, blemishes). Scale up to +8.
         uniformity = d.get("skin_uniformity")
         if isinstance(uniformity, (int, float)) and uniformity > 0:
-            # Empirically, uniformity in clean skin is ~8-15; very uneven
-            # skin pushes into the 20-30 range.
-            penalty = min(8.0, max(0.0, (float(uniformity) - 10.0) * 0.5))
             score += penalty
             breakdown["skin_unevenness_penalty"] = round(penalty, 2)
-        # Freckles/moles bucket.
-        freckle_penalty_map = {"none": 0.0, "few": 1.0, "some": 3.0, "many": 5.0}
-        freckles = d.get("freckles_or_moles")
-        if freckles in freckle_penalty_map:
-            penalty = freckle_penalty_map[freckles]
-            score += penalty
-            breakdown["freckles_penalty"] = penalty
         # Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
         smile_asym = d.get("smile_asymmetry")
         if isinstance(smile_asym, (int, float)):
-            penalty = min(6.0, float(smile_asym) * 30.0)
             score += penalty
             breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
         # Photo-quality penalty: sunglasses/mask hide features and the
         # model is guessing more. Mild penalty, not a personal trait.
         if d.get("wearing_sunglasses") or d.get("wearing_mask"):
-            score += 5.0
-            breakdown["obstruction_penalty"] = 5.0
         # ── Bonuses (pull score down = less chopped) ─────────────────
@@ -169,52 +206,65 @@ class AestheticAnalyzer:
         # take the stronger of the two contributions.
         jaw_bonus = 0.0
         jaw_type = d.get("jawline_type")
-        jaw_type_bonus_map = {"sharp": -10.0, "strong": -6.0, "soft": 0.0}
         if jaw_type in jaw_type_bonus_map:
             jaw_bonus = jaw_type_bonus_map[jaw_type]
         jaw_angle = d.get("jawline_angle")
         if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
             # Sharp angles add more on top of the categorical signal.
-            jaw_bonus = min(jaw_bonus, -10.0)
         if jaw_bonus:
             score += jaw_bonus
             breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
         # Cheekbone prominence.
-        cheek_bonus_map = {"high": -7.0, "moderate": -3.0, "flat": 0.0}
         cheek = d.get("cheekbone_prominence")
         if cheek in cheek_bonus_map:
             bonus = cheek_bonus_map[cheek]
             score += bonus
             breakdown["cheekbone_bonus"] = bonus
-        # Skin clarity bonus when the texture score is low (i.e. smooth skin).
-        # skin_texture_score is the same Laplacian-density value used by
-        # wrinkle_level; ≤4 is "smooth" territory.
         texture = d.get("skin_texture_score")
-        if isinstance(texture, (int, float)) and 0 < texture <= 4:
-            score -= 9.0
-            breakdown["skin_clarity_bonus"] = -9.0
         # Lip fullness — "average" and "full" both read as healthy.
         lip = d.get("lip_fullness")
-        if lip in {"average", "full"}:
-            score -= 5.0
-            breakdown["lip_fullness_bonus"] = -5.0
         # Defined cupid's bow.
         if d.get("cupids_bow") == "defined":
-            score -= 3.0
-            breakdown["cupids_bow_bonus"] = -3.0
         # Normal eye spacing.
         if d.get("eye_spacing") == "average":
-            score -= 4.0
-            breakdown["eye_spacing_bonus"] = -4.0
         # Dimples — small bonus when the MediaPipe heuristic fires.
         if d.get("possible_dimples"):
             score -= 3.0
-            breakdown["dimples_bonus"] = -3.0
         return score, breakdown

 What it does
 ------------
 Reads the merged result dict from every other analyzer and produces a
+single numeric chopped_score (0-100) plus a per-factor breakdown.
+Higher = more chopped = less conventionally attractive (by the
 arbitrary rubric encoded here). The breakdown lets you tune weights
 or flip polarity client-side without rerunning inference.
 Score composition
 -----------------
+Final chopped_score is a weighted blend of two sources:
+1. **Learned beauty regressor** (BeautyAnalyzer, SCUT-FBP5500):
+   raw score in [1.0, 5.0] mapped to a "stretched" 0-100 axis. The
+   raw model output is fairly concentrated around 2.5-3.5 (most
+   faces), which would cluster scores near the middle. We stretch
+   the [2.0, 4.0] sub-range to fill [0, 100] so the tier system
+   gets meaningful spread.
+2. **Rule-based factor sum**: scaled penalties + bonuses on top of
+   a baseline of 50. Factors are documented in `_compute_rule_score`.
 Blend math
 ----------
+    learned_unattractive = _stretch_unattractive(beauty_norm)
+    chopped = 0.6 * learned_unattractive + 0.4 * rule_score
     chopped is clamped to [0, 100].
+Tuning history
+--------------
+- Original rule factors landed almost everyone at 50 ± 10. Scaled up
+  by ~1.6× to give attributes more bite. The learned signal got a
+  matching stretch (75 → 100, 25 → 0) so it isn't drowned out.
+- Freckles/moles penalty was removed — the underlying SegFormer-based
+  detector was unreliable and the metric was effectively penalising
+  shadows and pores.
 Subjectivity disclaimer
 -----------------------
+Every weight in this file is a guess. "Beauty" is subjective,
+culturally biased, and reductive. Treat the score as an in-joke
+metric; never expose it as objective truth. The UI gates the row
+behind a Settings toggle off-by-default for that reason.
 Note: this analyzer takes no image input — it reads the merged result
 dict produced by every other analyzer that ran ahead of it.
 # How much weight the learned beauty regressor gets when both signals
+# are available. 0.6 leaves more headroom for the rule sum than the
+# old 0.7 — small change, but the rule factors moved harder.
+LEARNED_WEIGHT = 0.6
 # Baseline score. Penalties push up, bonuses pull down.
 BASELINE = 50.0
+# Stretch the learned-beauty 0-100 axis so it covers the chopped
+# spectrum more dramatically. Beauty norms in (LEARNED_NORM_LO, _HI)
+# map linearly to (0, 100). Below the lo bound is "fully chopped"
+# territory (learned_unattractive = 100); above the hi bound is
+# "fully gigachad" (learned_unattractive = 0).
+LEARNED_NORM_LO = 25.0   # raw score ≈ 2.0
+LEARNED_NORM_HI = 75.0   # raw score ≈ 4.0
+def _stretch_unattractive(beauty_norm: float | None) -> float:
+    """Map BeautyAnalyzer's 0-100 normalised score to a stretched
+    unattractiveness 0-100. 75 → 0 (gigachad), 25 → 100 (megachopped).
+    """
+    if beauty_norm is None:
+        return 50.0
+    # Invert the axis then linearly stretch (LEARNED_NORM_LO, _HI).
+    unattractive = 100.0 - float(beauty_norm)
+    # unattractive in [25 (gigachad-ish), 75 (megachopped-ish)] maps to [0, 100]
+    span = (100.0 - LEARNED_NORM_LO) - (100.0 - LEARNED_NORM_HI)  # = 50
+    lo_after_invert = 100.0 - LEARNED_NORM_HI                     # = 25
+    stretched = (unattractive - lo_after_invert) / span * 100.0
+    return max(0.0, min(100.0, stretched))
 class AestheticAnalyzer:
     def __init__(self):
         beauty_norm = merged.get("beauty_score_norm")
         if beauty_norm is not None:
+            learned_unattractive = _stretch_unattractive(float(beauty_norm))
             chopped = (
                 LEARNED_WEIGHT * learned_unattractive
                 + (1.0 - LEARNED_WEIGHT) * rule_score
             )
+            # Show the learned contribution as a signed offset from
+            # baseline so the breakdown reads consistently with rule
+            # factors.
             breakdown["learned_unattractive"] = round(
+                LEARNED_WEIGHT * (learned_unattractive - BASELINE), 2
             )
             breakdown["_blend_weight_learned"] = LEARNED_WEIGHT
         else:
     def _compute_rule_score(d: dict[str, Any]) -> tuple[float, dict[str, float]]:
         """Hand-tuned weighted sum over previously-extracted attributes.
+        Returns (score, breakdown_dict). The breakdown gives each
+        factor's signed contribution so a UI can show *why* a score
+        landed where it did. Score starts at BASELINE (50) and moves
+        up/down.
+        All penalty / bonus magnitudes are scaled up vs. the original
+        implementation so attributes contribute meaningfully to the
+        spread instead of nudging everyone toward 50.
         """
         score = BASELINE
         breakdown: dict[str, float] = {}
         # MediaPipe `facial_asymmetry_score` is already in this range.
         asym = d.get("facial_asymmetry_score")
         if isinstance(asym, (int, float)):
+            penalty = float(asym) * 30.0   # was 18; bigger asymmetric → more spread
             score += penalty
             breakdown["asymmetry_penalty"] = round(penalty, 2)
         # Wrinkle level from SegFormer + OpenCV Laplacian classification.
         wrinkle_penalty_map = {
+            "smooth": 0.0, "slight": 6.0, "moderate": 14.0, "prominent": 20.0,
         }
         wrinkle = d.get("wrinkle_level")
         if wrinkle in wrinkle_penalty_map:
             score += penalty
             breakdown["wrinkle_penalty"] = penalty
+        # Skin uniformity = LAB L* std-dev over the eroded interior
+        # face mask. Higher std means uneven tone (shadows, blemishes).
         uniformity = d.get("skin_uniformity")
         if isinstance(uniformity, (int, float)) and uniformity > 0:
+            # Empirically uniformity sits ~8-15 in clean skin and
+            # 20-30 in uneven skin; scale up to +14 (was 8).
+            penalty = min(14.0, max(0.0, (float(uniformity) - 10.0) * 1.0))
             score += penalty
             breakdown["skin_unevenness_penalty"] = round(penalty, 2)
+        # NOTE: freckles_or_moles penalty deliberately removed — the
+        # detector was too noisy (shadows / pores counted as spots).
         # Smile asymmetry: 0 = perfectly symmetric smile, larger = lopsided.
         smile_asym = d.get("smile_asymmetry")
         if isinstance(smile_asym, (int, float)):
+            penalty = min(10.0, float(smile_asym) * 50.0)  # was max 6, factor 30
             score += penalty
             breakdown["smile_asymmetry_penalty"] = round(penalty, 2)
         # Photo-quality penalty: sunglasses/mask hide features and the
         # model is guessing more. Mild penalty, not a personal trait.
         if d.get("wearing_sunglasses") or d.get("wearing_mask"):
+            score += 8.0   # was 5
+            breakdown["obstruction_penalty"] = 8.0
+        # Hat coverage also obscures hairline / forehead — small fixed
+        # penalty so a hat doesn't accidentally help the score by
+        # blocking unflattering hair.
+        if d.get("hat_detected"):
+            score += 4.0
+            breakdown["hat_obscuration_penalty"] = 4.0
         # ── Bonuses (pull score down = less chopped) ─────────────────
         # take the stronger of the two contributions.
         jaw_bonus = 0.0
         jaw_type = d.get("jawline_type")
+        jaw_type_bonus_map = {"sharp": -16.0, "strong": -10.0, "soft": 0.0}
         if jaw_type in jaw_type_bonus_map:
             jaw_bonus = jaw_type_bonus_map[jaw_type]
         jaw_angle = d.get("jawline_angle")
         if isinstance(jaw_angle, (int, float)) and jaw_angle < 115:
             # Sharp angles add more on top of the categorical signal.
+            jaw_bonus = min(jaw_bonus, -16.0)
         if jaw_bonus:
             score += jaw_bonus
             breakdown["jaw_definition_bonus"] = round(jaw_bonus, 2)
         # Cheekbone prominence.
+        cheek_bonus_map = {"high": -11.0, "moderate": -5.0, "flat": 0.0}
         cheek = d.get("cheekbone_prominence")
         if cheek in cheek_bonus_map:
             bonus = cheek_bonus_map[cheek]
             score += bonus
             breakdown["cheekbone_bonus"] = bonus
+        # Skin clarity bonus when the texture score is low (smooth).
         texture = d.get("skin_texture_score")
+        if isinstance(texture, (int, float)) and 0 < texture <= 8:
+            score -= 14.0   # was -9
+            breakdown["skin_clarity_bonus"] = -14.0
         # Lip fullness — "average" and "full" both read as healthy.
         lip = d.get("lip_fullness")
+        if lip == "full":
+            score -= 8.0   # was -5
+            breakdown["lip_fullness_bonus"] = -8.0
+        elif lip == "average":
+            score -= 4.0
+            breakdown["lip_fullness_bonus"] = -4.0
         # Defined cupid's bow.
         if d.get("cupids_bow") == "defined":
+            score -= 5.0   # was -3
+            breakdown["cupids_bow_bonus"] = -5.0
         # Normal eye spacing.
         if d.get("eye_spacing") == "average":
+            score -= 6.0   # was -4
+            breakdown["eye_spacing_bonus"] = -6.0
+        # Symmetric face (independent of asymmetry penalty above; we
+        # explicitly reward very symmetric faces rather than just not
+        # penalising them).
+        if isinstance(asym, (int, float)) and asym < 0.15:
+            score -= 6.0
+            breakdown["symmetry_bonus"] = -6.0
         # Dimples — small bonus when the MediaPipe heuristic fires.
         if d.get("possible_dimples"):
+            score -= 5.0   # was -3
+            breakdown["dimples_bonus"] = -5.0
+        # Eyes-open bonus (closed eyes makes a face look worse).
+        if d.get("eyes_open") is True:
             score -= 3.0
+            breakdown["eyes_open_bonus"] = -3.0
         return score, breakdown

analyzers/age_analyzer.py ADDED

	@@ -0,0 +1,200 @@

+"""
+AgeAnalyzer — MiVOLO age regression.
+Model
+-----
+- Architecture : MiVOLO (Multi-Input VOLO transformer). Designed to
+                 predict age + gender from face + (optional) person
+                 crops. Significantly better at young faces than
+                 InsightFace's bundled genderage head.
+- Reference    : Kuprashevich & Tolstykh, "MiVOLO: Multi-input
+                 Transformer for Age and Gender Estimation", 2023
+                 (https://github.com/WildChlamydia/MiVOLO).
+- Reported MAE : ~3.99 yrs (AgeDB), ~5.50 yrs (IMDB). Best published
+                 numbers for face-only age regression.
+- License      : MIT (code) / weights see model card.
+Inputs
+------
+img_rgb : np.ndarray (H, W, 3) uint8. Should already be a face crop
+          (the orchestrator hands us the InsightFace-cropped image
+          via `_crop_to_face` in app.py).
+Outputs (dict)
+--------------
+age_estimate          : float years (continuous regression output)
+age_range             : string bucket derived from age_estimate for
+                        legacy compatibility with the UI
+age_confidence        : 1.0 (MiVOLO is a regressor — no probability)
+age_model_source      : "mivolo" | "insightface_fallback" | "unavailable"
+Weight loading
+--------------
+The MiVOLO package needs both its Python implementation AND a
+checkpoint file. Resolution order:
+1. Local file at `models/mivolo_d1.pth.tar` (drop in after download).
+2. Hugging Face Hub via `AGE_HF_REPO_ID` env var, falling back to the
+   public `iitolstykh/mivolo_d1` repo.
+If neither resolves (or the `mivolo` package isn't installed), the
+analyzer reports `age_model_source: unavailable` and the orchestrator
+falls back to InsightFace's age head with the piecewise calibration
+that used to live in InsightFaceAnalyzer.
+"""
+import os
+from typing import Any
+import numpy as np
+# MiVOLO ships its own model class; defer the import so the module
+# still loads when the package isn't installed.
+try:
+    from mivolo.model.mi_volo import MiVOLO
+    HAS_MIVOLO = True
+except ImportError:
+    HAS_MIVOLO = False
+# Same legacy bucket schema we use elsewhere so UI rows that show
+# `age_range` keep rendering whatever the source.
+AGE_BUCKETS = [
+    (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
+    (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
+    (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
+]
+LOCAL_WEIGHTS_PATH = os.environ.get(
+    "AGE_WEIGHTS_PATH", "models/mivolo_d1.pth.tar"
+)
+HF_REPO_ID = os.environ.get("AGE_HF_REPO_ID", "iitolstykh/mivolo_d1")
+HF_FILENAME = os.environ.get("AGE_HF_FILENAME", "mivolo_d1.pth.tar")
+class AgeAnalyzer:
+    def __init__(self):
+        self.model = None
+        self.source = "unavailable"
+        if not HAS_MIVOLO:
+            print(
+                "[AgeAnalyzer] mivolo package not installed. Install with "
+                "`pip install git+https://github.com/WildChlamydia/MiVOLO.git`. "
+                "Falling back to InsightFace age until then."
+            )
+            return
+        weights_path = self._resolve_weights_path()
+        if weights_path is None:
+            print(
+                "[AgeAnalyzer] No usable weights — local not found at "
+                f"{LOCAL_WEIGHTS_PATH}, HF download failed. "
+                "Set AGE_WEIGHTS_PATH or AGE_HF_REPO_ID."
+            )
+            return
+        try:
+            # use_persons=False → face-only inference (we don't have
+            # body detection in this pipeline). MiVOLO is best with
+            # both but face-only is still ~5 yrs MAE.
+            self.model = MiVOLO(
+                ckpt_path=weights_path,
+                device="cuda" if self._cuda_available() else "cpu",
+                half=False,
+                use_persons=False,
+                disable_faces=False,
+                verbose=False,
+            )
+            self.source = "mivolo"
+            print(f"[AgeAnalyzer] Loaded MiVOLO weights from {weights_path}")
+        except Exception as exc:
+            print(f"[AgeAnalyzer] MiVOLO init failed: {exc}")
+            self.model = None
+    @staticmethod
+    def _cuda_available() -> bool:
+        try:
+            import torch
+            return torch.cuda.is_available()
+        except ImportError:
+            return False
+    @staticmethod
+    def _resolve_weights_path() -> str | None:
+        """Local file wins, HF Hub is the fallback."""
+        if os.path.exists(LOCAL_WEIGHTS_PATH):
+            return LOCAL_WEIGHTS_PATH
+        try:
+            from huggingface_hub import hf_hub_download
+            return hf_hub_download(repo_id=HF_REPO_ID, filename=HF_FILENAME)
+        except Exception as exc:
+            print(f"[AgeAnalyzer] HF Hub download failed: {exc}")
+            return None
+    def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
+        if self.model is None:
+            return self._empty_result()
+        try:
+            # MiVOLO expects a face crop. img_rgb is already cropped
+            # to the InsightFace bbox by the orchestrator. The
+            # package handles its own resize / normalise; we just
+            # hand over the uint8 RGB array.
+            age = self._run_inference(img_rgb)
+        except Exception as exc:
+            print(f"[AgeAnalyzer] Inference failed: {exc}")
+            return self._empty_result()
+        return {
+            "age_estimate": round(float(age), 1),
+            "age_range": self._bucket_age(age),
+            "age_confidence": 1.0,
+            "age_model_source": self.source,
+        }
+    def _run_inference(self, face_crop: np.ndarray) -> float:
+        """Run MiVOLO on a single face crop and return the predicted age.
+        The mivolo package's high-level `Predictor` wraps detection +
+        regression together. We've already done detection via
+        InsightFace, so we use the lower-level `MiVOLO` model directly
+        on the pre-cropped face. The model's `inference` method takes
+        a batched tensor; build it from the crop.
+        """
+        import torch
+        # MiVOLO expects 224×224 ImageNet-normalised face input.
+        from torchvision import transforms
+        preprocess = transforms.Compose([
+            transforms.ToPILImage(),
+            transforms.Resize((224, 224)),
+            transforms.ToTensor(),
+            transforms.Normalize(
+                mean=[0.485, 0.456, 0.406],
+                std=[0.229, 0.224, 0.225],
+            ),
+        ])
+        face_tensor = preprocess(face_crop).unsqueeze(0).to(self.model.device)
+        # When use_persons=False, MiVOLO's inference takes a face
+        # tensor and a None (or zeros) person tensor of matching batch.
+        with torch.no_grad():
+            ages, _genders = self.model.inference(face_tensor, None)
+        return float(ages[0].item())
+    @staticmethod
+    def _bucket_age(age: float) -> str:
+        for lo, hi, label in AGE_BUCKETS:
+            if lo <= age < hi:
+                return label
+        return "unknown"
+    @staticmethod
+    def _empty_result() -> dict[str, Any]:
+        return {
+            "age_estimate": None,
+            "age_range": "unknown",
+            "age_confidence": 0.0,
+            "age_model_source": "unavailable",
+        }

analyzers/gender_analyzer.py ADDED

	@@ -0,0 +1,92 @@

+"""
+GenderAnalyzer — FairFace ViT for binary gender.
+Model
+-----
+- HF repo  : dima806/fairface_gender_image_detection
+- Arch     : Vision Transformer (ViT-B/16), 86M params
+- Trained  : FairFace dataset (gender-balanced)
+- Reported : 93.4% accuracy
+- License  : Apache 2.0
+- Source   : https://huggingface.co/dima806/fairface_gender_image_detection
+Why this instead of InsightFace's bundled head
+----------------------------------------------
+InsightFace's `genderage.onnx` does argmax only — it doesn't expose a
+softmax confidence. Borderline calls and confident calls look identical
+in the UI ("Female 100%"). FairFace gives a real probability so the UI
+can show graded confidence.
+Inputs
+------
+img_rgb : np.ndarray (H, W, 3) uint8. Typically a face crop produced
+          by `_crop_to_face` in app.py.
+Outputs (dict)
+--------------
+gender              : "male" | "female"
+gender_confidence   : float in [0, 1] (argmax softmax probability)
+gender_distribution : { "male": p, "female": p }
+gender_model_source : "fairface" | "unavailable"
+"""
+from typing import Any
+from PIL import Image
+from transformers import pipeline
+MODEL_ID = "dima806/fairface_gender_image_detection"
+LABELS = ["male", "female"]
+class GenderAnalyzer:
+    def __init__(self):
+        self.classifier = None
+        try:
+            self.classifier = pipeline("image-classification", model=MODEL_ID)
+        except Exception as exc:
+            print(f"[GenderAnalyzer] Failed to load {MODEL_ID}: {exc}")
+    def analyze(self, img_rgb) -> dict[str, Any]:
+        if self.classifier is None:
+            return self._empty_result()
+        try:
+            pil = Image.fromarray(img_rgb)
+            preds = self.classifier(pil, top_k=2)
+        except Exception as exc:
+            print(f"[GenderAnalyzer] Prediction failed: {exc}")
+            return self._empty_result()
+        if not preds:
+            return self._empty_result()
+        # FairFace labels can come back capitalised; normalise to lowercase.
+        distribution = {label: 0.0 for label in LABELS}
+        for pred in preds:
+            label = str(pred["label"]).strip().lower()
+            if label in distribution:
+                distribution[label] = round(float(pred["score"]), 3)
+        # Top class wins; expose the actual softmax score as confidence.
+        top = preds[0]
+        top_label = str(top["label"]).strip().lower()
+        if top_label not in LABELS:
+            return self._empty_result()
+        return {
+            "gender": top_label,
+            "gender_confidence": round(float(top["score"]), 3),
+            "gender_distribution": distribution,
+            "gender_model_source": "fairface",
+        }
+    @staticmethod
+    def _empty_result() -> dict[str, Any]:
+        return {
+            "gender": "unknown",
+            "gender_confidence": 0.0,
+            "gender_distribution": {label: 0.0 for label in LABELS},
+            "gender_model_source": "unavailable",
+        }
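The label normalisation and confidence logic in `analyze` can be exercised without downloading the model. The following is a minimal sketch. It assumes `preds` has the shape returned by a transformers image-classification pipeline: a list of `{"label", "score"}` dicts. `summarise_gender` is a hypothetical standalone name, not part of the service. Unlike the diff, it picks the top entry with `max` instead of relying on list order.

```python
from typing import Any

LABELS = ["male", "female"]


def summarise_gender(preds: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical standalone mirror of GenderAnalyzer's post-processing."""
    distribution = {label: 0.0 for label in LABELS}
    for pred in preds:
        # FairFace labels may be capitalised ("Female"); normalise them.
        label = str(pred["label"]).strip().lower()
        if label in distribution:
            distribution[label] = round(float(pred["score"]), 3)
    unknown = {
        "gender": "unknown",
        "gender_confidence": 0.0,
        "gender_distribution": distribution,
    }
    if not preds:
        return unknown
    # Highest softmax score wins; its probability becomes the confidence.
    top = max(preds, key=lambda p: float(p["score"]))
    top_label = str(top["label"]).strip().lower()
    if top_label not in LABELS:
        return unknown
    return {
        "gender": top_label,
        "gender_confidence": round(float(top["score"]), 3),
        "gender_distribution": distribution,
    }


result = summarise_gender(
    [{"label": "Female", "score": 0.5812}, {"label": "Male", "score": 0.4188}]
)
print(result["gender"], result["gender_confidence"])
```

Take a borderline 58/42 split. An argmax-only head would show this as "Female 100%". Here the UI receives 0.581, so it can display graded uncertainty.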

analyzers/insightface_analyzer.py CHANGED

@@ -1,12 +1,16 @@
 """
-InsightFaceAnalyzer — detection + age + gender + recognition embedding.
 Model
 -----
 - Package    : `insightface` (https://github.com/deepinsight/insightface)
 - Bundle     : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
-- Components : SCRFD-10GF detector, ArcFace 512-d recognition,
-               2d106 + 3d68 landmark regressors, age + gender heads
 - Size       : ~280 MB (ONNX, mixed FP16/FP32)
 - License    : weights research-only; code Apache 2.0
 - Source     : https://github.com/deepinsight/insightface/tree/master/python-package
@@ -20,19 +24,14 @@ Outputs (dict)
 face_bbox            : [x1, y1, x2, y2] in pixel coordinates
 face_confidence      : SCRFD detection score
 face_embedding       : list[float] of length 512 (ArcFace, L2-normalised)
-age_estimate         : float years (regression head, not bucketed)
-age_range            : string bucket derived from age_estimate for
-                       backwards compatibility with the legacy UI
-gender               : "male" | "female"
-gender_confidence    : 1.0 by default (InsightFace doesn't expose a
-                       gender softmax score; the head is argmax-only)
-_insight_landmarks_2d : list of (x, y) tuples — 106 points (internal)
 Accuracy
 --------
 - Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
-- Age / gender heads are widely used but lack a clean published metric.
-  In practice age MAE is ~5 years and gender ~94-96%.
 Notes
 -----
@@ -41,7 +40,6 @@ face when multiple are detected — the rest of the pipeline assumes a
 single subject.
 """
-import os
 from typing import Any
 import numpy as np
@@ -57,21 +55,6 @@ except ImportError:
 MODEL_NAME = "buffalo_l"
-# InsightFace's genderage head is known to overshoot adult ages by
-# roughly 5 years in informal testing (no published calibration). We
-# subtract a fixed offset to undo this bias; clamp to ≥1 so we never
-# emit negative ages for kids. Override at runtime via the
-# AGE_OFFSET_YEARS env var if you want to tune for your dataset.
-AGE_OFFSET_YEARS = float(os.environ.get("AGE_OFFSET_YEARS", "5"))
-# Age buckets used by the legacy UI. We derive these from the regression
-# output so existing screens keep working.
-AGE_BUCKETS = [
-    (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
-    (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
-    (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
-]
 class InsightFaceAnalyzer:
     def __init__(self):
@@ -79,20 +62,18 @@ class InsightFaceAnalyzer:
         if not HAS_INSIGHTFACE:
             print(
                 "[InsightFaceAnalyzer] insightface package not installed; "
-                "detection, age, gender, and recognition will degrade to 'unknown'."
             )
             return
         try:
-            # Buffalo_L bundle auto-resolves under ~/.insightface/models/.
             # CPUExecutionProvider is the right default for HF Spaces;
-            # ctx_id=0 + 'CUDAExecutionProvider' would be the GPU path.
             self.app = FaceAnalysis(
                 name=MODEL_NAME,
                 providers=["CPUExecutionProvider"],
             )
-            # det_size=(640, 640) is the canonical SCRFD input. Smaller
-            # speeds inference but loses small faces.
             self.app.prepare(ctx_id=-1, det_size=(640, 640))
         except Exception as exc:
             print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
@@ -128,29 +109,10 @@ class InsightFaceAnalyzer:
             else None
         )
-        # Age head is a single float (years). Buffalo_L systematically
-        # over-predicts adults by ~5 years; subtract AGE_OFFSET_YEARS
-        # to recalibrate. Don't drop below 1 (negative ages would be
-        # absurd, and very young children are already on the noisy
-        # end of the model's training distribution).
-        raw_age = float(getattr(face, "age", 0.0))
-        age = max(1.0, raw_age - AGE_OFFSET_YEARS)
-        # Gender is exposed as 0 (female) / 1 (male) on Face objects.
-        # InsightFace doesn't surface a softmax probability — we report
-        # confidence 1.0 to indicate "argmax, no soft signal".
-        gender_idx = int(getattr(face, "gender", -1))
-        gender = "male" if gender_idx == 1 else "female" if gender_idx == 0 else "unknown"
         return {
             "face_bbox": bbox,
             "face_confidence": round(float(face.det_score), 3),
             "face_embedding": embedding,
-            "age_estimate": round(age, 1),
-            "age_range": self._bucket_age(age),
-            "age_confidence": 1.0,
-            "gender": gender,
-            "gender_confidence": 1.0,
             # 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
             # Underscore-prefixed → stripped from JSON, available to
             # downstream analyzers that want a tighter face crop.
@@ -161,23 +123,11 @@ class InsightFaceAnalyzer:
             ),
         }
-    @staticmethod
-    def _bucket_age(age: float) -> str:
-        for lo, hi, label in AGE_BUCKETS:
-            if lo <= age < hi:
-                return label
-        return "unknown"
     @staticmethod
     def _empty_result() -> dict[str, Any]:
         return {
             "face_bbox": None,
             "face_confidence": 0.0,
             "face_embedding": None,
-            "age_estimate": 0.0,
-            "age_range": "unknown",
-            "age_confidence": 0.0,
-            "gender": "unknown",
-            "gender_confidence": 0.0,
             "_insight_landmarks_2d": None,
         }
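The deleted recalibrate-then-bucket path reduces to two pure functions, shown below for reference. The constants are copied from the removed lines, and the `AGE_OFFSET_YEARS` default of 5 is shown without the env-var override. This diff does not show whether the new `AgeAnalyzer` reuses these bucket edges for its `age_range` output; treat that as an assumption.

```python
# Constants copied from the removed InsightFace code path.
AGE_OFFSET_YEARS = 5.0
AGE_BUCKETS = [
    (0, 3, "0-2"), (3, 10, "3-9"), (10, 20, "10-19"),
    (20, 30, "20-29"), (30, 40, "30-39"), (40, 50, "40-49"),
    (50, 60, "50-59"), (60, 70, "60-69"), (70, 200, "70+"),
]


def calibrate_age(raw_age: float, offset: float = AGE_OFFSET_YEARS) -> float:
    # Subtract the fixed bias and clamp so the result never drops below 1 year.
    return max(1.0, raw_age - offset)


def bucket_age(age: float) -> str:
    # Half-open intervals [lo, hi): 30.0 falls in "30-39", not "20-29".
    for lo, hi, label in AGE_BUCKETS:
        if lo <= age < hi:
            return label
    return "unknown"


print(bucket_age(calibrate_age(32.0)))
```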

 """
+InsightFaceAnalyzer — face detection + ArcFace recognition embedding.
 Model
 -----
 - Package    : `insightface` (https://github.com/deepinsight/insightface)
 - Bundle     : buffalo_l (ResNet50@WebFace600K backbone, ONNX)
+- Used here  : SCRFD-10GF detector + ArcFace 512-d recognition + 106
+               2D landmarks. The bundle ALSO ships an age+gender head,
+               but we route those signals to dedicated analyzers
+               (MiVOLO for age, FairFace ViT for gender) because the
+               bundled gender head exposes no confidence score and the
+               bundled age head overshoots young faces.
 - Size       : ~280 MB (ONNX, mixed FP16/FP32)
 - License    : weights research-only; code Apache 2.0
 - Source     : https://github.com/deepinsight/insightface/tree/master/python-package
 face_bbox            : [x1, y1, x2, y2] in pixel coordinates
 face_confidence      : SCRFD detection score
 face_embedding       : list[float] of length 512 (ArcFace, L2-normalised)
+_insight_landmarks_2d : list of (x, y) tuples — 106 points (internal,
+                       stripped before JSON; available to downstream
+                       analyzers that want tighter face geometry)
 Accuracy
 --------
 - Recognition (ArcFace via buffalo_l): 99.83% LFW, 96.21% IJB-B FAR=1e-4.
+- Detection (SCRFD-10GF): ~95% / ~94% AP on WIDER FACE easy / medium (published SCRFD validation numbers).
 Notes
 -----
 single subject.
 """
 from typing import Any
 import numpy as np
 MODEL_NAME = "buffalo_l"
 class InsightFaceAnalyzer:
     def __init__(self):
         if not HAS_INSIGHTFACE:
             print(
                 "[InsightFaceAnalyzer] insightface package not installed; "
+                "face detection and recognition will be unavailable."
             )
             return
         try:
             # CPUExecutionProvider is the right default for HF Spaces;
+            # For GPU, list 'CUDAExecutionProvider' first and pass
+            # ctx_id=0 to prepare(); ctx_id=-1 forces CPU.
             self.app = FaceAnalysis(
                 name=MODEL_NAME,
                 providers=["CPUExecutionProvider"],
             )
+            # det_size=(640, 640) is the canonical SCRFD input.
             self.app.prepare(ctx_id=-1, det_size=(640, 640))
         except Exception as exc:
             print(f"[InsightFaceAnalyzer] Failed to load {MODEL_NAME}: {exc}")
             else None
         )
         return {
             "face_bbox": bbox,
             "face_confidence": round(float(face.det_score), 3),
             "face_embedding": embedding,
             # 106 2D landmarks (forehead, jaw, brows, eyes, nose, lips).
             # Underscore-prefixed → stripped from JSON, available to
             # downstream analyzers that want a tighter face crop.
             ),
         }
     @staticmethod
     def _empty_result() -> dict[str, Any]:
         return {
             "face_bbox": None,
             "face_confidence": 0.0,
             "face_embedding": None,
             "_insight_landmarks_2d": None,
         }
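`face_embedding` is an L2-normalised 512-d ArcFace vector, and the service uses it for cross-photo grouping. The comparison step can be sketched with cosine similarity, which reduces to a dot product for unit vectors. The sketch has three assumptions. The names `cosine_similarity` and `group_embeddings` are hypothetical. The 0.4 threshold is a placeholder to tune on your own data. The greedy first-fit grouping (each group is represented by its first member) is one simple choice and not necessarily what the service does.

```python
from typing import Optional, Sequence

import numpy as np


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    va = np.asarray(a, dtype=np.float64)
    vb = np.asarray(b, dtype=np.float64)
    na, nb = np.linalg.norm(va), np.linalg.norm(vb)
    if na == 0.0 or nb == 0.0:
        return 0.0
    # Dividing by the norms is a no-op for already L2-normalised embeddings.
    return float(np.dot(va, vb) / (na * nb))


def group_embeddings(
    embeddings: Sequence[Optional[Sequence[float]]], threshold: float = 0.4
) -> list[list[int]]:
    """Greedy first-fit grouping; threshold is a hypothetical placeholder."""
    groups: list[list[int]] = []
    reps: list[Sequence[float]] = []
    for i, emb in enumerate(embeddings):
        if emb is None:  # photo where no face was detected
            continue
        for members, rep in zip(groups, reps):
            if cosine_similarity(emb, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append([i])
            reps.append(emb)
    return groups


print(group_embeddings([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]))
```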

app.py CHANGED

@@ -2,31 +2,39 @@
 HCP Face Analysis Microservice
 ==============================
-FastAPI service that runs nine specialized analyzers over a single photo
-and merges their outputs into one facial-attribute dictionary, including
-a face-recognition embedding for cross-photo grouping and a numeric
-"chopped score" aesthetic rating.
 Pipeline (in execution order)
 -----------------------------
 1.  InsightFaceAnalyzer        InsightFace buffalo_l (ONNX). SCRFD
                                detection + ArcFace 512-d embedding +
-                               age regression + gender + 106 landmarks.
-                               Replaces the previous three FairFace ViTs
-                               and adds face matching as a new capability.
 2.  LandmarkAnalyzer           MediaPipe Face Landmarker. 478 3D
                                landmarks + 52 ARKit blendshapes →
                                geometric features, smiling, mouth_open.
-3.  EthnicityAnalyzer          cledoux42/Ethnicity_Test_v003 ViT.
                                5-class ethnicity widened to a 7-bucket
                                schema for legacy compatibility.
-4.  ParsingAnalyzer            SegFormer-B5 human parsing. Now receives
-                               a face-cropped image (smaller, cleaner).
-                               Emits face/hair masks + hair length +
-                               hat detection + OpenCV-derived skin stats.
 5.  EmotionAnalyzer            HSEmotion EfficientNet-B0. 8-class
                                emotion + valence/arousal/mood.
@@ -50,9 +58,10 @@ Pipeline (in execution order)
                                scoring only.
 10. AestheticAnalyzer          Pure-Python aggregator. Reads the merged
-                               dict from analyzers 1–9 and produces the
-                               final `chopped_score` (0–100, higher =
-                               more chopped) and a per-factor breakdown.
 Endpoints
 ---------
@@ -89,6 +98,8 @@ from analyzers.color_analyzer import ColorAnalyzer
 from analyzers.obstruction_analyzer import ObstructionAnalyzer
 from analyzers.hair_type_analyzer import HairTypeAnalyzer
 from analyzers.insightface_analyzer import InsightFaceAnalyzer
 from analyzers.beauty_analyzer import BeautyAnalyzer
 from analyzers.aesthetic_analyzer import AestheticAnalyzer
@@ -109,6 +120,8 @@ app.add_middleware(
 # model-load cost; subsequent requests are warm.
 insightface_analyzer: Optional[InsightFaceAnalyzer] = None
 landmark_analyzer: Optional[LandmarkAnalyzer] = None
 ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
 parsing_analyzer: Optional[ParsingAnalyzer] = None
 emotion_analyzer: Optional[EmotionAnalyzer] = None
@@ -148,7 +161,8 @@ def get_analyzers():
     requests. First request pays the full model-load cost; subsequent
     requests are warm.
     """
-    global insightface_analyzer, landmark_analyzer, ethnicity_analyzer
     global parsing_analyzer, emotion_analyzer, color_analyzer
     global obstruction_analyzer, hair_type_analyzer
     global beauty_analyzer, aesthetic_analyzer
@@ -161,6 +175,14 @@ def get_analyzers():
         logger.info("Loading MediaPipe Face Landmarker...")
         landmark_analyzer = LandmarkAnalyzer()
     if ethnicity_analyzer is None:
         logger.info("Loading Ethnicity classifier...")
         ethnicity_analyzer = EthnicityAnalyzer()
@@ -194,6 +216,8 @@ def get_analyzers():
     return (
         insightface_analyzer,
         landmark_analyzer,
         ethnicity_analyzer,
         parsing_analyzer,
         emotion_analyzer,
@@ -246,6 +270,8 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
     (
         insight,
         landmarks,
         ethnicities,
         parsing,
         emotions,
@@ -258,7 +284,9 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
     results: dict = {}
-    # Step 1: InsightFace detection + age + gender + recognition embedding.
     logger.info("Running InsightFace analysis...")
     insight_results = insight.analyze(img_array)
     results.update(insight_results)
@@ -275,7 +303,17 @@ def _run_pipeline(img_array: np.ndarray) -> dict:
     landmark_results = landmarks.analyze(img_array)
     results.update(landmark_results)
-    # Step 3: ethnicity classifier — likes a tighter face crop.
     logger.info("Running ethnicity analysis...")
     results.update(ethnicities.analyze(face_crop))

 HCP Face Analysis Microservice
 ==============================
+FastAPI service that runs twelve specialised analyzers over a single
+photo and merges their outputs into one facial-attribute dictionary,
+including a face-recognition embedding for cross-photo grouping and a
+numeric "chopped score" aesthetic rating.
 Pipeline (in execution order)
 -----------------------------
 1.  InsightFaceAnalyzer        InsightFace buffalo_l (ONNX). SCRFD
                                detection + ArcFace 512-d embedding +
+                               106 landmarks. Age & gender USED to
+                               come from here; now delegated to
+                               MiVOLO and FairFace (steps 3a/3b).
 2.  LandmarkAnalyzer           MediaPipe Face Landmarker. 478 3D
                                landmarks + 52 ARKit blendshapes →
                                geometric features, smiling, mouth_open.
+3a. AgeAnalyzer                MiVOLO regressor (~4 yr MAE on AgeDB).
+                               Replaces the InsightFace age head, which
+                               systematically overshot young faces.
+3b. GenderAnalyzer             FairFace ViT (~93.4% acc). Replaces the
+                               InsightFace gender head so we get a real
+                               softmax confidence instead of argmax 1.0.
+3c. EthnicityAnalyzer          cledoux42/Ethnicity_Test_v003 ViT.
                                5-class ethnicity widened to a 7-bucket
                                schema for legacy compatibility.
+4.  ParsingAnalyzer            SegFormer-B5 human parsing. Receives the
+                               face-cropped image. Emits face/hair
+                               masks + hair length + hat detection +
+                               OpenCV-derived skin stats.
 5.  EmotionAnalyzer            HSEmotion EfficientNet-B0. 8-class
                                emotion + valence/arousal/mood.
                                scoring only.
 10. AestheticAnalyzer          Pure-Python aggregator. Reads the merged
+                               dict from previous analyzers and produces
+                               the final `chopped_score` (0–100, higher
+                               = more chopped) and a per-factor
+                               breakdown.
 Endpoints
 ---------
 from analyzers.obstruction_analyzer import ObstructionAnalyzer
 from analyzers.hair_type_analyzer import HairTypeAnalyzer
 from analyzers.insightface_analyzer import InsightFaceAnalyzer
+from analyzers.age_analyzer import AgeAnalyzer
+from analyzers.gender_analyzer import GenderAnalyzer
 from analyzers.beauty_analyzer import BeautyAnalyzer
 from analyzers.aesthetic_analyzer import AestheticAnalyzer
 # model-load cost; subsequent requests are warm.
 insightface_analyzer: Optional[InsightFaceAnalyzer] = None
 landmark_analyzer: Optional[LandmarkAnalyzer] = None
+age_analyzer: Optional[AgeAnalyzer] = None
+gender_analyzer: Optional[GenderAnalyzer] = None
 ethnicity_analyzer: Optional[EthnicityAnalyzer] = None
 parsing_analyzer: Optional[ParsingAnalyzer] = None
 emotion_analyzer: Optional[EmotionAnalyzer] = None
     requests. First request pays the full model-load cost; subsequent
     requests are warm.
     """
+    global insightface_analyzer, landmark_analyzer
+    global age_analyzer, gender_analyzer, ethnicity_analyzer
     global parsing_analyzer, emotion_analyzer, color_analyzer
     global obstruction_analyzer, hair_type_analyzer
     global beauty_analyzer, aesthetic_analyzer
         logger.info("Loading MediaPipe Face Landmarker...")
         landmark_analyzer = LandmarkAnalyzer()
+    if age_analyzer is None:
+        logger.info("Loading MiVOLO age analyzer...")
+        age_analyzer = AgeAnalyzer()
+    if gender_analyzer is None:
+        logger.info("Loading FairFace gender analyzer...")
+        gender_analyzer = GenderAnalyzer()
     if ethnicity_analyzer is None:
         logger.info("Loading Ethnicity classifier...")
         ethnicity_analyzer = EthnicityAnalyzer()
     return (
         insightface_analyzer,
         landmark_analyzer,
+        age_analyzer,
+        gender_analyzer,
         ethnicity_analyzer,
         parsing_analyzer,
         emotion_analyzer,
     (
         insight,
         landmarks,
+        ages,
+        genders,
         ethnicities,
         parsing,
         emotions,
     results: dict = {}
+    # Step 1: InsightFace — detection, ArcFace recognition embedding,
+    # 106 landmarks. (Age and gender used to live here too; they are
+    # now produced by dedicated analyzers further down.)
     logger.info("Running InsightFace analysis...")
     insight_results = insight.analyze(img_array)
     results.update(insight_results)
     landmark_results = landmarks.analyze(img_array)
     results.update(landmark_results)
+    # Step 3a: MiVOLO age regressor. Much better at young faces than
+    # InsightFace's bundled head, which overshoots heavily.
+    logger.info("Running MiVOLO age analysis...")
+    results.update(ages.analyze(face_crop))
+    # Step 3b: FairFace gender. Provides a real softmax confidence
+    # score so the UI can show graded uncertainty.
+    logger.info("Running FairFace gender analysis...")
+    results.update(genders.analyze(face_crop))
+    # Step 3c: ethnicity classifier — likes a tighter face crop.
     logger.info("Running ethnicity analysis...")
     results.update(ethnicities.analyze(face_crop))
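`_run_pipeline` merges each analyzer's dict into `results` with `dict.update`. Underscore-prefixed keys such as `_insight_landmarks_2d` are stripped before the JSON response. A minimal sketch of both steps follows. The helper names are hypothetical, and this diff does not show the actual stripping code in app.py. The sketch assumes stripping applies only to top-level keys.

```python
from typing import Any


def merge_results(*partials: dict[str, Any]) -> dict[str, Any]:
    merged: dict[str, Any] = {}
    for part in partials:
        # Later analyzers overwrite earlier ones on key collisions.
        merged.update(part)
    return merged


def to_public_json(results: dict[str, Any]) -> dict[str, Any]:
    # Drop internal, underscore-prefixed keys (top level only).
    return {k: v for k, v in results.items() if not k.startswith("_")}


insight = {"face_bbox": [0, 0, 10, 10], "_insight_landmarks_2d": [(1.0, 2.0)]}
ages = {"age_estimate": 27.0}
print(sorted(to_public_json(merge_results(insight, ages))))
```

Update order decides which analyzer wins a key collision. Moving age and gender out of InsightFace means those keys now have a single producer.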

architecture.md CHANGED

@@ -11,8 +11,9 @@ ones (only intentional in a couple of places — `_run_pipeline` in
 Photo (RGB ndarray)
   │
   ├─► [1] InsightFaceAnalyzer  (insightface buffalo_l, ONNX)
-  │       → face_bbox, face_confidence, face_embedding (512-d ArcFace),
-  │         age_estimate, age_range, gender + confidences
   │
   ├─► Build face crop from face_bbox + padding. Downstream analyzers
   │   that benefit from a tighter input read the crop; MediaPipe gets
@@ -24,7 +25,16 @@ Photo (RGB ndarray)
   │       facial_asymmetry_score, smile_asymmetry, possible_dimples,
   │       possible_unibrow.
   │
-  ├─► [3] EthnicityAnalyzer  (cledoux42/Ethnicity_Test_v003 ViT)
   │       → ethnicity, ethnicity_confidence, ethnicity_distribution
   │       (cropped input).
   │
@@ -72,8 +82,10 @@ strips them before returning JSON.
 | Section | Field(s) | Source |
 |---|---|---|
-| Demographics | face_bbox, face_confidence, face_embedding (512-d), age_estimate, age_range, age_confidence, gender, gender_confidence | InsightFace buffalo_l |
-| Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | EthnicityAnalyzer (cledoux42 ViT) |
 | Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
 | Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
 | Hair | hair_length, hair_present | SegFormer-B5 |
@@ -86,7 +98,7 @@ strips them before returning JSON.
 | Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
 | Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
 | Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
-| Skin | wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles | SegFormer mask + OpenCV stats |
 | Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
 | Accessories | wearing_hat | SegFormer (hat class coverage) |
 | Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |

 Photo (RGB ndarray)
   │
   ├─► [1] InsightFaceAnalyzer  (insightface buffalo_l, ONNX)
+  │       → face_bbox, face_confidence, face_embedding (512-d ArcFace).
+  │         Age & gender used to come from here; they now live in
+  │         dedicated analyzers (steps 3a / 3b).
   │
   ├─► Build face crop from face_bbox + padding. Downstream analyzers
   │   that benefit from a tighter input read the crop; MediaPipe gets
   │       facial_asymmetry_score, smile_asymmetry, possible_dimples,
   │       possible_unibrow.
   │
+  ├─► [3a] AgeAnalyzer  (MiVOLO regressor, ~4 yr MAE on AgeDB)
+  │       → age_estimate, age_range, age_model_source
+  │       (cropped input). Replaces the InsightFace age head.
+  │
+  ├─► [3b] GenderAnalyzer  (dima806/fairface_gender_image_detection ViT)
+  │       → gender, gender_confidence, gender_distribution
+  │       (cropped input). Replaces the InsightFace gender head so
+  │       we get a real softmax confidence.
+  │
+  ├─► [3c] EthnicityAnalyzer  (cledoux42/Ethnicity_Test_v003 ViT)
   │       → ethnicity, ethnicity_confidence, ethnicity_distribution
   │       (cropped input).
   │
 | Section | Field(s) | Source |
 |---|---|---|
+| Demographics | face_bbox, face_confidence, face_embedding (512-d) | InsightFace buffalo_l |
+| Demographics | age_estimate, age_range, age_model_source | MiVOLO regressor |
+| Demographics | gender, gender_confidence, gender_distribution | FairFace ViT |
+| Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | cledoux42 ViT |
 | Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
 | Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
 | Hair | hair_length, hair_present | SegFormer-B5 |
 | Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
 | Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
 | Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
+| Skin | wrinkle_level, skin_texture_score, skin_uniformity | SegFormer mask + OpenCV stats (`freckles_or_moles` still computed server-side but no longer displayed — detector was too noisy) |
 | Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
 | Accessories | wearing_hat | SegFormer (hat class coverage) |
 | Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
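The diagram's "Build face crop from face_bbox + padding" step, performed by `_crop_to_face` in app.py, can be sketched as a padded, clamped slice. The body of `_crop_to_face` is not shown in this diff. The name `pad_and_crop`, the 25% padding fraction and the full-frame fallback are all hypothetical choices.

```python
from typing import Optional, Sequence

import numpy as np


def pad_and_crop(
    img: np.ndarray, bbox: Optional[Sequence[float]], pad_frac: float = 0.25
) -> np.ndarray:
    """Hypothetical padded face crop; pad_frac is an assumed default."""
    if bbox is None:
        return img  # no detection: fall back to the full frame
    h, w = img.shape[:2]
    x1, y1, x2, y2 = (float(v) for v in bbox)
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    # Expand the box by the padding, then clamp to the image bounds.
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(w, int(round(x2 + pad_x)))
    bottom = min(h, int(round(y2 + pad_y)))
    if right <= left or bottom <= top:
        return img  # box lies outside the image: use the full frame
    # Copy into a contiguous array so downstream consumers get a clean buffer.
    return np.ascontiguousarray(img[top:bottom, left:right])


frame = np.zeros((100, 200, 3), dtype=np.uint8)
print(pad_and_crop(frame, [50, 20, 90, 60]).shape)
```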

requirements.txt CHANGED

@@ -15,3 +15,8 @@ transformers==4.45.2
 hsemotion>=0.2.2
 insightface>=0.7.3
 onnxruntime>=1.18.0
+# MiVOLO for age regression. Installed from git because the project
+# doesn't publish a PyPI release; pip resolves the git+ URL at Docker
+# build time, which requires `git` to be installed in the image.
+mivolo @ git+https://github.com/WildChlamydia/MiVOLO.git
+scipy>=1.10.0