Evan Li committed on
Commit ee3a08a · 1 Parent(s): dfb09f4

Relabeling, discarding CLIP, and replacing attributes with new models or MediaPipe where possible

Dockerfile CHANGED
@@ -13,15 +13,14 @@ WORKDIR /app
13
  COPY requirements.txt .
14
  RUN pip install --no-cache-dir -r requirements.txt
15
 
16
- # Pre-download MediaPipe model at build time so first request is fast
 
 
 
17
  RUN mkdir -p models && \
18
  wget -q -O models/face_landmarker.task \
19
  "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
20
 
21
- # Pre-download FaRL (face-tuned CLIP ViT-B/16) weights for attribute classifier
22
- RUN wget -q -O models/FaRL-Base-Patch16-LAIONFace20M-ep64.pth \
23
- "https://github.com/FacePerceiver/FaRL/releases/download/pretrained_weights/FaRL-Base-Patch16-LAIONFace20M-ep64.pth"
24
-
25
  COPY . .
26
 
27
  EXPOSE 7860
 
13
  COPY requirements.txt .
14
  RUN pip install --no-cache-dir -r requirements.txt
15
 
16
+ # Pre-download MediaPipe model at build time so first request is fast.
17
+ # All other models (FairFace, SegFormer, HSEmotion, ObstructionViT,
18
+ # HairTypeViT) are pulled from Hugging Face on first request and cached
19
+ # in /root/.cache/huggingface for the rest of the process lifetime.
20
  RUN mkdir -p models && \
21
  wget -q -O models/face_landmarker.task \
22
  "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
23
 
 
 
 
 
24
  COPY . .
25
 
26
  EXPOSE 7860
README.md CHANGED
@@ -10,25 +10,33 @@ pinned: false
10
 
11
  # HCP Face Analysis Microservice
12
 
13
- A FastAPI-based facial analysis service that combines 6 specialized ML models
14
- to extract 100+ facial attributes from a single photograph.
15
-
16
- ## Models Used
17
-
18
- | Model | Task | Size |
19
- |-------|------|------|
20
- | MediaPipe Face Landmarker | 478 3D landmarks + blendshapes | ~4 MB |
21
- | FairFace ResNet-34 | Age, gender, ethnicity | ~90 MB |
22
- | CelebA ResNet-18 | 40 binary attributes | ~44 MB |
23
- | BiSeNet | Face region segmentation | ~50 MB |
24
- | HSEmotion EfficientNet-B0 | 8-class emotion | ~20 MB |
25
- | Custom color analysis | Skin/eye/hair color | 0 MB |
26
-
27
- ## API Endpoints
28
-
29
- - `GET /health` — Health check
30
- - `POST /analyze`: Multipart file upload
31
- - `POST /analyze-base64`: JSON body with base64 image
32
 
33
  ## Usage
34
 
@@ -37,3 +45,6 @@ curl -X POST https://YOUR-SPACE.hf.space/analyze-base64 \
37
  -H "Content-Type: application/json" \
38
  -d '{"image": "<base64-encoded-image>"}'
39
  ```
 
 
 
 
10
 
11
  # HCP Face Analysis Microservice
12
 
13
+ FastAPI service that runs seven specialized analyzers over a single photo
14
+ and returns a merged dictionary of ~100 facial attributes.
15
+
16
+ ## Models
17
+
18
+ | # | Component | Model | Task | Size |
19
+ |---|-----------|-------|------|------|
20
+ | 1 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes — geometric features, smiling, mouth-open | ~4 MB |
21
+ | 2 | FairFace age | `dima806/fairface_age_image_detection` (ViT-B/16) | 9-bucket age softmax-weighted continuous estimate | ~340 MB |
22
+ | 2 | FairFace gender | `dima806/fairface_gender_image_detection` (ViT-B/16) | Binary gender (~93.4% acc) | ~340 MB |
23
+ | 2 | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
24
+ | 3 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
25
+ | 4 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
26
+ | 5 | Color analysis | (no model — OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
27
+ | 6 | Obstruction | `dima806/face_obstruction_image_detection` (ViT-B/16) | glasses / sunglasses / mask (~99% precision) | ~340 MB |
28
+ | 7 | Hair type | `dima806/hair_type_image_detection` (ViT-B/16) | curly/dreadlocks/kinky/straight/wavy (~93% acc) | ~340 MB |
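The "softmax-weighted continuous estimate" in the FairFace age row can be read as an expected value over the nine age buckets. The sketch below illustrates that idea with the standard library only. The representative ages in `AGE_BUCKET_MIDPOINTS` and the function names are assumptions made for illustration; the commit does not show which values the service actually uses.

```python
import math

# Hypothetical representative age for each of the nine FairFace buckets
# (0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70+).
AGE_BUCKET_MIDPOINTS = [1.0, 6.0, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 75.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits, midpoints=AGE_BUCKET_MIDPOINTS):
    """Probability-weighted average of the bucket midpoints."""
    if len(logits) != len(midpoints):
        raise ValueError("need one logit per age bucket")
    probs = softmax(logits)
    return sum(p * m for p, m in zip(probs, midpoints))
```

A confident prediction collapses to the midpoint of its bucket, while an uncertain one lands between neighbouring buckets, which is what makes the estimate continuous rather than a hard label.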
29
+
30
+ All analyzers are lazy-loaded on first request. The MediaPipe weight
31
+ file is pre-downloaded at Docker build time; all Hugging Face models
32
+ are cached on first inference.
33
+
34
+ ## API endpoints
35
+
36
+ - `GET /` — service info
37
+ - `GET /health` — liveness check
38
+ - `POST /analyze` — multipart file upload
39
+ - `POST /analyze-base64` — JSON `{ "image": "<base64>" }`
40
 
41
  ## Usage
42
 
 
45
  -H "Content-Type: application/json" \
46
  -d '{"image": "<base64-encoded-image>"}'
47
  ```
48
+
49
+ See [architecture.md](./architecture.md) for the pipeline diagram and the
50
+ full per-attribute model attribution table.
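The `POST /analyze-base64` contract listed in the README can also be exercised from Python. This is a minimal sketch using only the standard library: the helper names `build_payload` and `build_request` are hypothetical, and `YOUR-SPACE.hf.space` is the placeholder host from the curl example, not a real deployment. The request is prepared but not sent.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the JSON shape the README describes."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}


def build_request(url: str, image_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for /analyze-base64."""
    body = json.dumps(build_payload(image_bytes)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        "https://YOUR-SPACE.hf.space/analyze-base64",  # placeholder host
        b"\xff\xd8\xff",  # stand-in bytes; read a real image file instead
    )
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; the shape of the response is not shown in this commit, so it is left out here.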
analyzers/__init__.py CHANGED
@@ -1 +1,9 @@
1
  # face-service analyzers package
 
1
  # face-service analyzers package
2
+ #
3
+ # Each analyzer in this package exposes a class with:
4
+ # __init__(self) — load model, register device
5
+ # analyze(self, img_rgb) -> dict — run inference, return attribute dict
6
+ #
7
+ # Analyzers are independent: they don't import from each other. Cross-
8
+ # analyzer plumbing (passing SegFormer masks into ColorAnalyzer, etc.)
9
+ # is orchestrated entirely in app.py.
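The analyzer contract described in the package comment (a class with `__init__` and `analyze(img_rgb) -> dict`), combined with the lazy loading mentioned in the README, could be wired up roughly like this. It is a sketch under assumptions: `Analyzer`, `LazyAnalyzers`, and `DummyColorAnalyzer` are hypothetical names, and the real `app.py` (not shown in this part of the diff) also passes masks and landmarks between analyzers, which this example leaves out.

```python
from typing import Any, Callable, Protocol


class Analyzer(Protocol):
    """Structural type for anything exposing analyze(img_rgb) -> dict."""

    def analyze(self, img_rgb: Any) -> dict[str, Any]: ...


class LazyAnalyzers:
    """Build each analyzer on first use and reuse it afterwards."""

    def __init__(self, factories: dict[str, Callable[[], Analyzer]]):
        self._factories = factories
        self._instances: dict[str, Analyzer] = {}

    def get(self, name: str) -> Analyzer:
        # The (possibly slow) constructor runs only on the first lookup.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def run_all(self, img_rgb: Any) -> dict[str, Any]:
        # Merge every analyzer's attribute dict into one result.
        merged: dict[str, Any] = {}
        for name in self._factories:
            merged.update(self.get(name).analyze(img_rgb))
        return merged


class DummyColorAnalyzer:
    """Stand-in analyzer used only for this illustration."""

    loads = 0  # counts how many times the "model" was constructed

    def __init__(self):
        DummyColorAnalyzer.loads += 1

    def analyze(self, img_rgb: Any) -> dict[str, Any]:
        return {"eye_color": "unknown"}
```

Keeping construction behind a factory means importing the package stays cheap, and a model that is never requested is never downloaded.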
analyzers/attribute_analyzer.py DELETED
@@ -1,194 +0,0 @@
1
- """
2
- FaRL-based facial attribute classification.
3
-
4
- Same CLIP ViT-B/16 architecture as before, but loaded with FaRL weights
5
- (CVPR 2022) which were pretrained on LAION-Face — the 50M face-text-pair
6
- subset of LAION-400M — instead of OpenAI's generic web crawl. The encoder
7
- discriminates facial attributes much better while keeping the prompt-pair
8
- zero-shot interface intact.
9
-
10
- Falls back to vanilla OpenAI CLIP ViT-B/16 if the FaRL .pth is missing.
11
- """
12
-
13
- import os
14
- from pathlib import Path
15
- from typing import Any
16
-
17
- import clip
18
- import torch
19
- from PIL import Image
20
-
21
-
22
- CLIP_ARCH = "ViT-B/16"
23
- FARL_WEIGHTS_PATH = os.environ.get(
24
- "FARL_WEIGHTS_PATH", "models/FaRL-Base-Patch16-LAIONFace20M-ep64.pth"
25
- )
26
-
27
- PAIRS = {
28
- "wearing_glasses": ("wearing eyeglasses", "not wearing eyeglasses"),
29
- "wearing_hat": ("wearing a hat", "not wearing a hat"),
30
- "has_beard": ("has a beard", "does not have a beard"),
31
- "mustache": ("has a mustache", "does not have a mustache"),
32
- "goatee": ("has a goatee", "does not have a goatee"),
33
- "sideburns": ("has sideburns", "does not have sideburns"),
34
- "has_bangs": ("has bangs", "does not have bangs"),
35
- "is_bald": ("is bald", "has hair"),
36
- "receding_hairline": ("has a receding hairline", "has a full hairline"),
37
- "wearing_earrings": ("wearing earrings", "not wearing earrings"),
38
- "wearing_necklace": ("wearing a necklace", "not wearing a necklace"),
39
- "wearing_necktie": ("wearing a necktie", "not wearing a necktie"),
40
- "heavy_makeup": ("wearing heavy makeup", "not wearing makeup"),
41
- "wearing_lipstick": ("wearing lipstick", "not wearing lipstick"),
42
- "big_nose": ("has a big nose", "has a small nose"),
43
- "pointy_nose": ("has a pointy nose", "has a rounded nose"),
44
- "big_lips": ("has big lips", "has thin lips"),
45
- "high_cheekbones": ("has high cheekbones", "has low cheekbones"),
46
- "oval_face_celeba": ("has an oval face", "has a non-oval face"),
47
- "double_chin": ("has a double chin", "does not have a double chin"),
48
- "chubby": ("has a chubby face", "has a slim face"),
49
- "rosy_cheeks": ("has rosy cheeks", "does not have rosy cheeks"),
50
- "bags_under_eyes": ("has bags under the eyes", "does not have bags under the eyes"),
51
- "narrow_eyes": ("has narrow eyes", "has wide eyes"),
52
- "arched_eyebrows": ("has arched eyebrows", "has straight eyebrows"),
53
- "bushy_eyebrows": ("has bushy eyebrows", "has thin eyebrows"),
54
- "pale_skin": ("has pale skin", "has medium skin"),
55
- "attractive": ("an attractive face", "an ordinary face"),
56
- "young": ("a young person", "an older person"),
57
- "smiling_celeba": ("smiling", "not smiling"),
58
- "mouth_open": ("mouth open", "mouth closed"),
59
- }
60
-
61
- HAIR_COLOR_LABELS = ["black hair", "blond hair", "brown hair", "gray hair"]
62
- HAIR_TEXTURE_LABELS = ["straight hair", "wavy hair", "curly hair"]
63
-
64
- ACCESSORY_THRESHOLD = 0.65
65
- ACCESSORY_KEYS = {
66
- "wearing_earrings", "wearing_necklace", "wearing_necktie", "wearing_hat",
67
- "heavy_makeup", "wearing_lipstick",
68
- }
69
-
70
-
71
- def _prompt(text: str) -> str:
72
- return f"a photo of {text}"
73
-
74
-
75
- class AttributeAnalyzer:
76
- def __init__(self):
77
- self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
78
- self.model = None
79
- self.preprocess = None
80
- try:
81
- model, preprocess = clip.load(CLIP_ARCH, device="cpu")
82
-
83
- weights_path = Path(FARL_WEIGHTS_PATH)
84
- if weights_path.exists():
85
- farl_state = torch.load(weights_path, map_location="cpu")
86
- state = farl_state.get("state_dict", farl_state)
87
- missing, unexpected = model.load_state_dict(state, strict=False)
88
- print(
89
- f"[AttributeAnalyzer] Loaded FaRL weights from {weights_path} "
90
- f"(missing={len(missing)}, unexpected={len(unexpected)})"
91
- )
92
- else:
93
- print(
94
- f"[AttributeAnalyzer] FaRL weights not found at {weights_path}; "
95
- "falling back to vanilla OpenAI CLIP ViT-B/16"
96
- )
97
-
98
- # Force float32 so per-pair softmax math is stable on both CPU and CUDA.
99
- self.model = model.float().to(self.device).eval()
100
- self.preprocess = preprocess
101
- except Exception as exc:
102
- print(f"[AttributeAnalyzer] Failed to load model: {exc}")
103
-
104
- @torch.no_grad()
105
- def analyze(self, img_rgb) -> dict[str, Any]:
106
- if self.model is None or self.preprocess is None:
107
- return self._empty_result()
108
-
109
- pil = Image.fromarray(img_rgb)
110
- image_tensor = self.preprocess(pil).unsqueeze(0).to(self.device)
111
- image_features = self.model.encode_image(image_tensor)
112
- image_features = image_features / image_features.norm(dim=-1, keepdim=True)
113
-
114
- pair_scores: dict[str, float] = {}
115
- for key, (positive, negative) in PAIRS.items():
116
- pair_scores[key] = self._softmax_positive(
117
- image_features, [_prompt(positive), _prompt(negative)]
118
- )
119
-
120
- color_scores = self._group_softmax(
121
- image_features, [_prompt(x) for x in HAIR_COLOR_LABELS]
122
- )
123
- texture_scores = self._group_softmax(
124
- image_features, [_prompt(x) for x in HAIR_TEXTURE_LABELS]
125
- )
126
-
127
- hair_color_name = HAIR_COLOR_LABELS[int(torch.argmax(torch.tensor(color_scores)))].split()[0]
128
- hair_texture_name = HAIR_TEXTURE_LABELS[int(torch.argmax(torch.tensor(texture_scores)))].split()[0]
129
-
130
- def flag(key: str) -> bool:
131
- score = pair_scores.get(key, 0.0)
132
- threshold = ACCESSORY_THRESHOLD if key in ACCESSORY_KEYS else 0.5
133
- return score >= threshold
134
-
135
- result: dict[str, Any] = {
136
- "_celeba_raw": {k: round(v, 3) for k, v in pair_scores.items()},
137
- "hair_color_celeba": hair_color_name,
138
- "hair_color_scores": {
139
- label.split()[0]: round(float(score), 3)
140
- for label, score in zip(HAIR_COLOR_LABELS, color_scores)
141
- },
142
- "hair_texture_celeba": hair_texture_name,
143
- }
144
-
145
- for key in PAIRS:
146
- result[key] = flag(key)
147
-
148
- beard_score = pair_scores.get("has_beard", 0.0)
149
- result["facial_hair"] = {
150
- "5_o_clock_shadow": 0.45 < beard_score < 0.7,
151
- "goatee": flag("goatee"),
152
- "mustache": flag("mustache"),
153
- "sideburns": flag("sideburns"),
154
- "full_beard": beard_score > 0.7,
155
- }
156
-
157
- return result
158
-
159
- @torch.no_grad()
160
- def _softmax_positive(self, image_features: torch.Tensor, prompts: list[str]) -> float:
161
- text_tokens = clip.tokenize(prompts).to(self.device)
162
- text_features = self.model.encode_text(text_tokens)
163
- text_features = text_features / text_features.norm(dim=-1, keepdim=True)
164
- logits = (image_features @ text_features.T) * self.model.logit_scale.exp()
165
- probs = torch.softmax(logits, dim=-1)[0]
166
- return float(probs[0])
167
-
168
- @torch.no_grad()
169
- def _group_softmax(self, image_features: torch.Tensor, prompts: list[str]) -> list[float]:
170
- text_tokens = clip.tokenize(prompts).to(self.device)
171
- text_features = self.model.encode_text(text_tokens)
172
- text_features = text_features / text_features.norm(dim=-1, keepdim=True)
173
- logits = (image_features @ text_features.T) * self.model.logit_scale.exp()
174
- probs = torch.softmax(logits, dim=-1)[0]
175
- return [float(p) for p in probs]
176
-
177
- @staticmethod
178
- def _empty_result() -> dict[str, Any]:
179
- base: dict[str, Any] = {
180
- "_celeba_raw": {},
181
- "hair_color_celeba": "unknown",
182
- "hair_color_scores": {"black": 0.0, "blond": 0.0, "brown": 0.0, "gray": 0.0},
183
- "hair_texture_celeba": "unknown",
184
- "facial_hair": {
185
- "5_o_clock_shadow": False,
186
- "goatee": False,
187
- "mustache": False,
188
- "sideburns": False,
189
- "full_beard": False,
190
- },
191
- }
192
- for key in PAIRS:
193
- base[key] = False
194
- return base
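For readers comparing the removed analyzer with its replacements, the prompt-pair scoring it used (`_softmax_positive`) reduces to a two-way softmax over scaled cosine similarities. The sketch below reproduces that arithmetic with plain Python lists standing in for CLIP embeddings. The vectors are toy values, and `logit_scale=100.0` is an assumed constant; the removed code read the scale from the model via `logit_scale.exp()`.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pair_positive_probability(image_vec, positive_vec, negative_vec,
                              logit_scale=100.0):
    """Probability of the positive prompt in a positive/negative pair."""
    img = normalize(image_vec)
    logits = []
    for text_vec in (positive_vec, negative_vec):
        txt = normalize(text_vec)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)[0]


def facial_hair_bucket(beard_score):
    """Beard thresholds copied from the removed analyzer."""
    return {
        "5_o_clock_shadow": 0.45 < beard_score < 0.7,
        "full_beard": beard_score > 0.7,
    }
```

Because both comparisons are strict, a beard score of exactly 0.7 sets neither flag.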
analyzers/color_analyzer.py CHANGED
@@ -1,13 +1,35 @@
1
  """
2
- Color Analyzer: Pixel-level color extraction using masks from
3
- BiSeNet and landmarks from MediaPipe.
4
-
5
- Determines:
6
- - Skin tone (Fitzpatrick scale, LAB lightness, hex color)
7
- - Eye color (hue classification from iris region)
8
- - Hair color (LAB-trimmed median over hair mask)
9
- - Hair texture from local intensity variation (Laplacian std over eroded mask)
10
- - Lip color
11
  """
12
 
13
  from typing import Any
@@ -15,8 +37,9 @@ from typing import Any
15
  import cv2
16
  import numpy as np
17
 
18
- # Fitzpatrick scale boundaries based on LAB L* channel (true 0–100 range).
19
- # OpenCV's uint8 LAB stores L scaled to 0–255, so we rescale before lookup.
 
20
  FITZPATRICK_SCALE = [
21
  (85, 100, "Type I - Very Fair"),
22
  (70, 85, "Type II - Fair"),
@@ -26,24 +49,15 @@ FITZPATRICK_SCALE = [
26
  (0, 25, "Type VI - Dark Brown/Black"),
27
  ]
28
 
29
- EYE_COLOR_RANGES = {
30
- "brown": {"h_range": (8, 28), "s_min": 50},
31
- "hazel": {"h_range": (20, 35), "s_min": 40},
32
- "green": {"h_range": (35, 80), "s_min": 30},
33
- "blue": {"h_range": (90, 130), "s_min": 30},
34
- "gray": {"h_range": (0, 180), "s_max": 30},
35
- "amber": {"h_range": (15, 25), "s_min": 80},
36
- }
37
-
38
- # Hair-texture thresholds on std(Laplacian) computed over the *eroded* hair
39
- # mask (so the mask boundary itself doesn't contribute high-frequency energy).
40
- # These are reasonable starting points — tune on your own dataset.
41
  HAIR_TEXTURE_CURLY_THRESHOLD = 25.0
42
  HAIR_TEXTURE_WAVY_THRESHOLD = 15.0
43
 
44
- # MediaPipe FaceMesh lip contours. Outer ring traces the lip border;
45
- # inner ring traces the mouth opening; subtract one from the other
46
- # to get just the lip flesh and avoid sampling teeth or tongue.
47
  MEDIAPIPE_LIP_OUTER = [
48
  61, 146, 91, 181, 84, 17, 314, 405, 321, 375,
49
  291, 409, 270, 269, 267, 0, 37, 39, 40, 185,
@@ -56,7 +70,8 @@ MEDIAPIPE_LIP_INNER = [
56
 
57
  class ColorAnalyzer:
58
  def __init__(self):
59
- pass # No model to load — pure pixel analysis
 
60
 
61
  def analyze(
62
  self,
@@ -78,36 +93,38 @@ class ColorAnalyzer:
78
  if lip_mask is not None:
79
  lip_mask = lip_mask.astype(bool)
80
 
81
- # SegFormer human-parsing has no dedicated lip class, so the
82
- # parser hands us an empty mask. Fall back to MediaPipe lip
83
- # landmarks whenever the parser-derived mask is missing or tiny.
84
  if (lip_mask is None or lip_mask.sum() < 50) and landmarks:
85
  derived = self._lip_mask_from_landmarks(landmarks, h, w)
86
  if derived is not None:
87
  lip_mask = derived
88
 
89
  # ── Skin Tone ────────────────────────────────────────────────
 
90
  if skin_mask is not None and skin_mask.sum() > 100:
 
 
 
91
  skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
92
  skin_pixels = skin_lab[skin_mask]
93
 
94
- # OpenCV uint8 LAB stores L in 0–255 and a/b offset by +128.
95
- # Rescale to the conventional ranges (L* in 0–100, a*/b* in
96
- # roughly -128..127) so the Fitzpatrick bins and undertone
97
- # thresholds operate in standard units.
98
  mean_l_raw = float(np.mean(skin_pixels[:, 0]))
99
  mean_l = mean_l_raw * 100.0 / 255.0
100
  mean_a = float(np.mean(skin_pixels[:, 1])) - 128.0
101
  mean_b = float(np.mean(skin_pixels[:, 2])) - 128.0
102
 
103
- # Fitzpatrick type
104
  fitz = "Unknown"
105
  for low, high, label in FITZPATRICK_SCALE:
106
  if low <= mean_l < high:
107
  fitz = label
108
  break
109
 
110
- # Get hex color of average skin tone
111
  avg_rgb = np.mean(img_rgb[skin_mask], axis=0).astype(int)
112
  hex_color = "#{:02x}{:02x}{:02x}".format(*avg_rgb)
113
 
@@ -120,9 +137,10 @@ class ColorAnalyzer:
120
  "rgb": avg_rgb.tolist(),
121
  }
122
 
123
- # Undertone (warm/cool/neutral). Now that b* is centered on 0,
124
- # positive b* leans yellow (warm) and negative b* leans blue
125
- # (cool). Thresholds adjusted from the old 0–255 scale.
 
126
  if mean_b > 12:
127
  result["skin_undertone"] = "warm"
128
  elif mean_b < -8:
@@ -134,32 +152,34 @@ class ColorAnalyzer:
134
  result["skin_undertone"] = "unknown"
135
 
136
  # ── Eye Color ────────────────────────────────────────────────
 
 
137
  if landmarks and len(landmarks) > 473:
138
- eye_color = self._detect_eye_color(img_rgb, landmarks, h, w)
139
- result["eye_color"] = eye_color
140
  elif landmarks and len(landmarks) > 362:
141
- # Fallback: sample from rough iris area
142
- eye_color = self._detect_eye_color_fallback(img_rgb, landmarks, h, w)
143
- result["eye_color"] = eye_color
144
  else:
145
  result["eye_color"] = "unknown"
146
 
147
- # ── Hair Color ───────────────────────────────────────────────
 
148
  if hair_mask is not None and hair_mask.sum() > 200:
149
- hair_color_info = self._estimate_hair_color(img_rgb, hair_mask)
150
- result["hair_color"] = hair_color_info
151
-
152
  result["hair_texture"] = self._estimate_hair_texture(img_rgb, hair_mask)
153
  else:
154
  result["hair_color"] = {"name": "unknown"}
155
  result["hair_texture"] = "unknown"
156
 
157
  # ── Lip Color ────────────────────────────────────────────────
 
158
  if lip_mask is not None and lip_mask.sum() > 50:
159
  lip_pixels = img_rgb[lip_mask]
160
  avg_lip = np.mean(lip_pixels, axis=0).astype(int)
161
  hex_lip = "#{:02x}{:02x}{:02x}".format(*avg_lip)
162
 
 
 
 
163
  lip_hsv = cv2.cvtColor(
164
  avg_lip.reshape(1, 1, 3).astype(np.uint8),
165
  cv2.COLOR_RGB2HSV
@@ -194,38 +214,43 @@ class ColorAnalyzer:
194
  def _estimate_hair_color(
195
  img_rgb: np.ndarray, hair_mask: np.ndarray
196
  ) -> dict[str, Any]:
197
- """Estimate dominant hair color via LAB-lightness-trimmed median.
198
 
199
  Why median + L*-trim instead of k=2 k-means:
200
- - K-means with k=2 splits highlight vs shadow within a single hair
201
- color, so the "bigger cluster" can flip between photos of the same
202
- person depending on lighting. Median is robust and deterministic.
203
- - Trimming the top/bottom 10% of L* drops specular highlights and
204
- deep shadows, which are the main outlier sources.
 
205
  """
206
  hair_pixels = img_rgb[hair_mask] # (N, 3) uint8 RGB
207
 
208
- # Trim by LAB L* to drop highlights and shadows.
209
  hair_lab = cv2.cvtColor(
210
  hair_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
211
  ).reshape(-1, 3)
212
  l_lo, l_hi = np.percentile(hair_lab[:, 0], [10, 90])
213
  keep = (hair_lab[:, 0] >= l_lo) & (hair_lab[:, 0] <= l_hi)
 
214
  core_pixels = hair_pixels[keep] if keep.sum() > 50 else hair_pixels
215
 
 
 
216
  dominant_rgb = np.median(core_pixels, axis=0)
217
  dominant_rgb = np.clip(dominant_rgb, 0, 255).astype(np.uint8)
218
 
219
  hex_hair = "#{:02x}{:02x}{:02x}".format(*dominant_rgb)
220
 
 
221
  hair_hsv = cv2.cvtColor(
222
  dominant_rgb.reshape(1, 1, 3), cv2.COLOR_RGB2HSV
223
  )[0, 0]
224
  h_val, s_val, v_val = int(hair_hsv[0]), int(hair_hsv[1]), int(hair_hsv[2])
225
 
226
- # Classification cascade — order matters. Falls through to "unknown"
227
- # rather than a default of "brown" so mask leakage / unusual tints
228
- # are detectable downstream.
229
  if v_val < 45 and s_val < 60:
230
  hair_color_name = "black"
231
  elif s_val < 25:
@@ -234,8 +259,8 @@ class ColorAnalyzer:
234
  elif (h_val < 12 or h_val > 168) and s_val > 60:
235
  hair_color_name = "red/auburn"
236
  elif 18 <= h_val <= 35 and v_val > 160 and s_val < 140:
237
- # Blond: yellow hue, high V, and not too saturated (real blond
238
- # is desaturated yellow, not orange).
239
  hair_color_name = "blond"
240
  elif 5 <= h_val <= 30:
241
  hair_color_name = "brown" if v_val > 80 else "dark brown"
@@ -253,21 +278,28 @@ class ColorAnalyzer:
253
  def _estimate_hair_texture(
254
  img_rgb: np.ndarray, hair_mask: np.ndarray
255
  ) -> str:
256
- """Estimate hair texture from local intensity variation.
 
 
 
 
257
 
258
- Computes std(Laplacian) over an *eroded* hair mask. Erosion stays
259
- strictly inside the hair region so the mask boundary itself doesn't
260
- contribute the high-frequency step edge that the previous FFT-on-
261
- zeroed-region implementation was inadvertently measuring.
262
  """
 
263
  kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
264
  inner_mask = cv2.erode(
265
  hair_mask.astype(np.uint8), kernel, iterations=2
266
  ).astype(bool)
267
 
 
268
  if inner_mask.sum() < 200:
269
  return "unknown"
270
 
 
 
271
  hair_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
272
  lap = cv2.Laplacian(hair_gray, cv2.CV_64F, ksize=3)
273
  texture_score = float(np.std(lap[inner_mask]))
@@ -286,11 +318,15 @@ class ColorAnalyzer:
286
  def _lip_mask_from_landmarks(
287
  landmarks: list[dict], h: int, w: int
288
  ) -> np.ndarray | None:
289
- """Build a lip-flesh mask by filling outer lip contour minus inner."""
 
 
290
  max_idx = max(MEDIAPIPE_LIP_OUTER + MEDIAPIPE_LIP_INNER)
291
  if len(landmarks) <= max_idx:
292
  return None
293
 
 
 
294
  def _poly(indices: list[int]) -> np.ndarray:
295
  return np.array(
296
  [
@@ -300,6 +336,8 @@ class ColorAnalyzer:
300
  dtype=np.int32,
301
  )
302
 
 
 
303
  mask = np.zeros((h, w), dtype=np.uint8)
304
  cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_OUTER)], 255)
305
  cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_INNER)], 0)
@@ -312,15 +350,17 @@ class ColorAnalyzer:
312
  def _detect_eye_color(
313
  self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
314
  ) -> str:
315
- """Use iris landmarks (468-477) to sample eye color."""
316
- iris_indices = list(range(468, 474)) # Left iris
 
 
317
  iris_points = [(int(lm[i]["x"] * w), int(lm[i]["y"] * h)) for i in iris_indices]
318
 
319
- # Create a small mask around iris center
320
  cx = int(np.mean([p[0] for p in iris_points]))
321
  cy = int(np.mean([p[1] for p in iris_points]))
322
  radius = max(3, int(np.std([p[0] for p in iris_points]) * 1.5))
323
 
 
324
  mask = np.zeros((h, w), dtype=np.uint8)
325
  cv2.circle(mask, (cx, cy), radius, 255, -1)
326
 
@@ -333,11 +373,17 @@ class ColorAnalyzer:
333
  def _detect_eye_color_fallback(
334
  self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
335
  ) -> str:
336
- """Fallback: sample from center of eye region."""
337
- # Center of left eye
 
 
 
 
 
338
  eye_pts = [159, 145, 133, 33]
339
  cx = int(np.mean([lm[i]["x"] for i in eye_pts]) * w)
340
  cy = int(np.mean([lm[i]["y"] for i in eye_pts]) * h)
 
341
  radius = max(3, int(abs(lm[159]["y"] - lm[145]["y"]) * h * 0.3))
342
 
343
  mask = np.zeros((h, w), dtype=np.uint8)
@@ -351,7 +397,13 @@ class ColorAnalyzer:
351
 
352
  @staticmethod
353
  def _classify_eye_color(pixels: np.ndarray) -> str:
354
- """Classify eye color from pixel samples using HSV."""
 
 
 
 
 
 
355
  hsv = cv2.cvtColor(
356
  pixels.reshape(-1, 1, 3).astype(np.uint8),
357
  cv2.COLOR_RGB2HSV
@@ -361,11 +413,11 @@ class ColorAnalyzer:
361
  mean_s = float(np.mean(hsv[:, 1]))
362
  mean_v = float(np.mean(hsv[:, 2]))
363
 
364
- # Gray eyes: low saturation
365
  if mean_s < 30:
366
  return "gray"
367
 
368
- # Classify by hue
369
  if 90 <= mean_h <= 130 and mean_s > 30:
370
  return "blue"
371
  if 35 <= mean_h <= 80 and mean_s > 30:
@@ -376,7 +428,8 @@ class ColorAnalyzer:
376
  return "amber"
377
  if 8 <= mean_h <= 28 and mean_s > 50:
378
  return "brown"
 
379
  if mean_v < 60:
380
  return "dark brown"
381
 
382
- return "brown"
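The skin-tone path in this file rescales OpenCV's uint8 LAB values to standard units, bins L* into a Fitzpatrick type, and reads the undertone from b*. The sketch below mirrors that arithmetic on plain numbers, without OpenCV. Only the three Fitzpatrick bins visible in this diff are included; the middle bands are elided in the hunk, so lightness values that fall there return "Unknown" in this example. The function names are hypothetical.

```python
# Only the bins visible in the diff; the middle bands are not shown there,
# so they are deliberately left out rather than guessed.
VISIBLE_FITZPATRICK_BINS = [
    (85, 100, "Type I - Very Fair"),
    (70, 85, "Type II - Fair"),
    (0, 25, "Type VI - Dark Brown/Black"),
]


def rescale_opencv_lab(l_raw, a_raw, b_raw):
    """Map OpenCV uint8 LAB to L* in 0-100 and a*/b* centered on 0."""
    return l_raw * 100.0 / 255.0, a_raw - 128.0, b_raw - 128.0


def fitzpatrick_label(l_star, bins=VISIBLE_FITZPATRICK_BINS):
    """Return the first bin whose half-open range [low, high) holds l_star."""
    for low, high, label in bins:
        if low <= l_star < high:
            return label
    return "Unknown"


def undertone(b_star):
    """Thresholds from the diff: b* > 12 is warm, b* < -8 is cool."""
    if b_star > 12:
        return "warm"
    if b_star < -8:
        return "cool"
    return "neutral"
```

Because the ranges are half-open, an L* of exactly 100 matches no bin and falls through to "Unknown", just as the `low <= mean_l < high` test in the analyzer does.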
 
1
  """
2
+ ColorAnalyzer: pixel-level color extraction.
3
+
4
+ Model
5
+ -----
6
+ None. All operations are deterministic OpenCV LAB/HSV statistics over
7
+ masks/landmarks supplied by upstream analyzers.
8
+
9
+ Inputs
10
+ ------
11
+ img_rgb : np.ndarray (H, W, 3) uint8
12
+ landmarks : list[dict] of normalised MediaPipe landmarks (optional)
13
+ skin_mask : bool ndarray (H, W) from SegFormer "face" class (optional)
14
+ hair_mask : bool ndarray (H, W) from SegFormer "hair" class (optional)
15
+ lip_mask : bool ndarray (H, W) — usually None; falls back to MediaPipe
16
+ lip polygon when missing or too small
17
+
18
+ Outputs (dict)
19
+ --------------
20
+ skin_tone — {fitzpatrick, lab_lightness, lab_a, lab_b, hex_color, rgb}
21
+ skin_undertone — warm | cool | neutral
22
+ eye_color — brown | hazel | amber | green | blue | gray | dark brown
23
+ hair_color — {name, hex, rgb, hsv}
24
+ hair_texture — straight | wavy | curly/coily (coarse Laplacian signal,
25
+ the HairTypeViT analyzer is the authoritative source)
26
+ lip_color — {shade, hex, rgb}
27
+
28
+ Notes
29
+ -----
30
+ LAB is preferred over RGB for skin tone classification because LAB's
31
+ L* channel approximates perceptual lightness, so the Fitzpatrick bins
32
+ can be fixed L* ranges (exposure and white balance still shift L*).
33
  """
34
 
35
  from typing import Any
 
37
  import cv2
38
  import numpy as np
39
 
40
+ # Fitzpatrick scale boundaries on the LAB L* channel (true 0–100 range).
41
+ # OpenCV's uint8 LAB stores L scaled to 0–255, so we rescale before
42
+ # looking up bins.
43
  FITZPATRICK_SCALE = [
44
  (85, 100, "Type I - Very Fair"),
45
  (70, 85, "Type II - Fair"),
 
49
  (0, 25, "Type VI - Dark Brown/Black"),
50
  ]
51
 
52
+ # Hair-texture thresholds on std(Laplacian) computed over the *eroded*
53
+ # hair mask. Erosion prevents the mask boundary from contributing
54
+ # high-frequency step-edge energy.
 
 
 
 
 
 
 
 
 
55
  HAIR_TEXTURE_CURLY_THRESHOLD = 25.0
56
  HAIR_TEXTURE_WAVY_THRESHOLD = 15.0
57
 
58
+ # MediaPipe FaceMesh lip contours. The outer ring traces the lip
59
+ # border; the inner ring traces the mouth opening. Filling outer
60
+ # and then erasing inner gives only lip flesh, never teeth/tongue.
61
  MEDIAPIPE_LIP_OUTER = [
62
  61, 146, 91, 181, 84, 17, 314, 405, 321, 375,
63
  291, 409, 270, 269, 267, 0, 37, 39, 40, 185,
 
70
 
71
  class ColorAnalyzer:
72
  def __init__(self):
73
+ # No model to load — pure pixel arithmetic.
74
+ pass
75
 
76
  def analyze(
77
  self,
 
93
  if lip_mask is not None:
94
  lip_mask = lip_mask.astype(bool)
95
 
96
+ # SegFormer human-parsing has no lip class, so callers usually
97
+ # pass None for lip_mask. Build one from MediaPipe lip landmarks
98
+ # whenever it's missing or too small to sample reliably.
99
  if (lip_mask is None or lip_mask.sum() < 50) and landmarks:
100
  derived = self._lip_mask_from_landmarks(landmarks, h, w)
101
  if derived is not None:
102
  lip_mask = derived
103
 
104
  # ── Skin Tone ────────────────────────────────────────────────
105
+ # Need at least ~100 face pixels for stable statistics.
106
  if skin_mask is not None and skin_mask.sum() > 100:
107
+ # Convert the whole image to LAB once and pull pixels under
108
+ # the mask. cv2 returns uint8 LAB with L in 0–255 and a/b
109
+ # offset by +128 (so any neutral gray has a=128, b=128).
110
  skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
111
  skin_pixels = skin_lab[skin_mask]
112
 
113
+ # Rescale to standard LAB ranges before applying the
114
+ # Fitzpatrick / undertone thresholds defined on those ranges.
 
 
115
  mean_l_raw = float(np.mean(skin_pixels[:, 0]))
116
  mean_l = mean_l_raw * 100.0 / 255.0
117
  mean_a = float(np.mean(skin_pixels[:, 1])) - 128.0
118
  mean_b = float(np.mean(skin_pixels[:, 2])) - 128.0
119
 
120
+ # Bin into Fitzpatrick types — linear search over six bands.
121
  fitz = "Unknown"
122
  for low, high, label in FITZPATRICK_SCALE:
123
  if low <= mean_l < high:
124
  fitz = label
125
  break
126
 
127
+ # Average RGB hex for display.
128
  avg_rgb = np.mean(img_rgb[skin_mask], axis=0).astype(int)
129
  hex_color = "#{:02x}{:02x}{:02x}".format(*avg_rgb)
130
 
 
137
  "rgb": avg_rgb.tolist(),
138
  }
139
 
140
+ # Undertone from b* (yellow/blue axis):
141
+ # b* > +12 → yellow-leaning, warm
142
+ # b* < -8 → blue-leaning, cool
143
+ # in between → neutral
144
  if mean_b > 12:
145
  result["skin_undertone"] = "warm"
146
  elif mean_b < -8:
 
152
  result["skin_undertone"] = "unknown"
153
 
154
  # ── Eye Color ────────────────────────────────────────────────
155
+ # Prefer the dedicated iris landmarks (468-477) when available.
156
+ # Fall back to a rough eye-centre crop otherwise.
157
  if landmarks and len(landmarks) > 473:
158
+ result["eye_color"] = self._detect_eye_color(img_rgb, landmarks, h, w)
 
159
  elif landmarks and len(landmarks) > 362:
160
+ result["eye_color"] = self._detect_eye_color_fallback(img_rgb, landmarks, h, w)
 
 
161
  else:
162
  result["eye_color"] = "unknown"
163
 
164
+ # ── Hair Color & Texture ────────────────────────────────────
165
+ # Need at least 200 hair pixels for a stable median.
166
  if hair_mask is not None and hair_mask.sum() > 200:
167
+ result["hair_color"] = self._estimate_hair_color(img_rgb, hair_mask)
 
 
168
  result["hair_texture"] = self._estimate_hair_texture(img_rgb, hair_mask)
169
  else:
170
  result["hair_color"] = {"name": "unknown"}
171
  result["hair_texture"] = "unknown"
172
 
173
  # ── Lip Color ────────────────────────────────────────────────
174
+ # Average the masked lip pixels and bucket by HSV saturation/value.
175
  if lip_mask is not None and lip_mask.sum() > 50:
176
  lip_pixels = img_rgb[lip_mask]
177
  avg_lip = np.mean(lip_pixels, axis=0).astype(int)
178
  hex_lip = "#{:02x}{:02x}{:02x}".format(*avg_lip)
179
 
180
+ # Convert the single average RGB triple to HSV for shade
181
+ # classification. High saturation → rosy/red; high value but
182
+ # low saturation → pink; low value → dark; otherwise natural.
183
  lip_hsv = cv2.cvtColor(
184
  avg_lip.reshape(1, 1, 3).astype(np.uint8),
185
  cv2.COLOR_RGB2HSV
 
214
  def _estimate_hair_color(
215
  img_rgb: np.ndarray, hair_mask: np.ndarray
216
  ) -> dict[str, Any]:
217
+ """Dominant hair color via LAB-lightness-trimmed median.
218
 
219
  Why median + L*-trim instead of k=2 k-means:
220
+ - K-means with k=2 splits highlight vs shadow within a single
221
+ hair color, so the "bigger cluster" can flip between photos
222
+ of the same person depending on lighting. Median is robust
223
+ and deterministic.
224
+ - Trimming the top/bottom 10% of L* drops specular highlights
225
+ and deep shadows, the main outlier sources.
226
  """
227
  hair_pixels = img_rgb[hair_mask] # (N, 3) uint8 RGB
228
 
229
+ # LAB conversion so we can trim by perceptual lightness.
230
  hair_lab = cv2.cvtColor(
231
  hair_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
232
  ).reshape(-1, 3)
233
  l_lo, l_hi = np.percentile(hair_lab[:, 0], [10, 90])
234
  keep = (hair_lab[:, 0] >= l_lo) & (hair_lab[:, 0] <= l_hi)
235
+ # If trimming would leave us too few pixels, fall back to all.
236
  core_pixels = hair_pixels[keep] if keep.sum() > 50 else hair_pixels
237
 
238
+ # Median is robust to mask leakage (a few stray non-hair pixels
239
+ # don't shift the median).
240
  dominant_rgb = np.median(core_pixels, axis=0)
241
  dominant_rgb = np.clip(dominant_rgb, 0, 255).astype(np.uint8)
242
 
243
  hex_hair = "#{:02x}{:02x}{:02x}".format(*dominant_rgb)
244
 
245
+ # Bucket the dominant color into a name via HSV thresholds.
246
  hair_hsv = cv2.cvtColor(
247
  dominant_rgb.reshape(1, 1, 3), cv2.COLOR_RGB2HSV
248
  )[0, 0]
249
  h_val, s_val, v_val = int(hair_hsv[0]), int(hair_hsv[1]), int(hair_hsv[2])
250
 
251
+ # Classification cascade — order matters. Falls through to
252
+ # "unknown" instead of defaulting to a colour, so mask leakage
253
+ # and unusual tints stay detectable downstream.
254
  if v_val < 45 and s_val < 60:
255
  hair_color_name = "black"
256
  elif s_val < 25:
 
259
  elif (h_val < 12 or h_val > 168) and s_val > 60:
260
  hair_color_name = "red/auburn"
261
  elif 18 <= h_val <= 35 and v_val > 160 and s_val < 140:
262
+ # Blond is desaturated yellow with high V: bright but not
263
+ # too saturated (or it'd shade orange).
264
  hair_color_name = "blond"
265
  elif 5 <= h_val <= 30:
266
  hair_color_name = "brown" if v_val > 80 else "dark brown"
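The lightness-trimmed median described in the `_estimate_hair_color` docstring can be sketched without OpenCV. This is a minimal illustration under stated assumptions, not the code from this commit: `luma`, `percentile`, and `trimmed_median_color` are hypothetical helpers, and Rec. 601 luma stands in for the LAB L* channel that the diff uses.

```python
from statistics import median


def luma(rgb):
    """Rec. 601 luma: a cheap stand-in for LAB L* (assumption, not the diff's metric)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of an already-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trimmed_median_color(pixels, lo_q=10, hi_q=90, min_keep=50):
    """Median RGB of the pixels whose lightness lies inside the [lo_q, hi_q] percentiles."""
    lightness = [luma(p) for p in pixels]
    ordered = sorted(lightness)
    l_lo, l_hi = percentile(ordered, lo_q), percentile(ordered, hi_q)
    core = [p for p, l in zip(pixels, lightness) if l_lo <= l <= l_hi]
    # Same fallback as the diff: too few survivors means use every pixel.
    if len(core) <= min_keep:
        core = pixels
    return tuple(int(median(channel)) for channel in zip(*core))


# Mostly brown hair with a few blown-out highlights and deep shadows.
hair = [(90, 60, 40)] * 200 + [(255, 255, 255)] * 15 + [(0, 0, 0)] * 15
dominant = trimmed_median_color(hair)
```

With this synthetic input the highlight and shadow pixels fall outside the 10th to 90th percentile band, so only the brown pixels feed the median. Unlike a k-means cluster pick, the result is the same on every run.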
 
278
  def _estimate_hair_texture(
279
  img_rgb: np.ndarray, hair_mask: np.ndarray
280
  ) -> str:
281
+ """Coarse hair texture from local intensity variation.
282
+
283
+ Computes std(Laplacian) over an *eroded* hair mask so the mask
284
+ boundary itself doesn't contribute the high-frequency step
285
+ edge that an un-eroded mask would.
286
 
287
+ This is intentionally a fallback signal; the authoritative
288
+ hair-texture output is HairTypeViT (curly/dreadlocks/kinky/
289
+ straight/wavy), which is trained and ~93% accurate.
 
290
  """
291
+ # Erode by ~4 px (5x5 kernel, 2 iterations) so we sample only interior hair pixels.
292
  kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
293
  inner_mask = cv2.erode(
294
  hair_mask.astype(np.uint8), kernel, iterations=2
295
  ).astype(bool)
296
 
297
+ # Not enough interior pixels to compute a reliable std.
298
  if inner_mask.sum() < 200:
299
  return "unknown"
300
 
301
+ # Laplacian responds to local intensity curvature; its std over
302
+ # the masked region is a proxy for "how much fine detail".
303
  hair_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
304
  lap = cv2.Laplacian(hair_gray, cv2.CV_64F, ksize=3)
305
  texture_score = float(np.std(lap[inner_mask]))
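The texture score above is the standard deviation of a Laplacian response over interior hair pixels. The sketch below shows the idea on plain nested lists. It is an assumption-laden illustration: `laplacian_std` is a hypothetical helper, and it uses the 4-neighbour Laplacian stencil, which may differ from the kernel OpenCV applies for `ksize=3`.

```python
from statistics import pstdev


def laplacian_std(gray, mask):
    """Population std of the 4-neighbour Laplacian over masked interior pixels.

    gray: 2D list of intensities; mask: 2D list of booleans, same shape.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if not mask[y][x]:
                continue
            lap = (
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
            responses.append(lap)
    # pstdev matches np.std's default (population, not sample) definition.
    return pstdev(responses) if len(responses) > 1 else 0.0


size = 8
full_mask = [[True] * size for _ in range(size)]
flat = [[100] * size for _ in range(size)]
checker = [[255 * ((x + y) % 2) for x in range(size)] for y in range(size)]

flat_score = laplacian_std(flat, full_mask)
checker_score = laplacian_std(checker, full_mask)
```

A uniform patch scores zero and a pixel-level checkerboard scores high, which is the "how much fine detail" behaviour the comment in the diff describes. The thresholds that turn the score into a texture label are not visible in this hunk and are not reproduced here.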
 
318
  def _lip_mask_from_landmarks(
319
  landmarks: list[dict], h: int, w: int
320
  ) -> np.ndarray | None:
321
+ """Build a lip-flesh mask by filling outer minus inner contour."""
322
+ # Bail if the landmark list doesn't have indices the contours
323
+ # reference (e.g. iris-less subset).
324
  max_idx = max(MEDIAPIPE_LIP_OUTER + MEDIAPIPE_LIP_INNER)
325
  if len(landmarks) <= max_idx:
326
  return None
327
 
328
+ # Helper to convert a list of landmark indices into a pixel-
329
+ # space polygon in (x, y) order.
330
  def _poly(indices: list[int]) -> np.ndarray:
331
  return np.array(
332
  [
 
336
  dtype=np.int32,
337
  )
338
 
339
+ # Fill the outer ring, then erase the inner ring → lip flesh
340
+ # only, no teeth or tongue pixels.
341
  mask = np.zeros((h, w), dtype=np.uint8)
342
  cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_OUTER)], 255)
343
  cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_INNER)], 0)
 
350
  def _detect_eye_color(
351
  self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
352
  ) -> str:
353
+ """Sample left-iris pixels using MediaPipe iris landmarks (468-477)."""
354
+ # 468-473 cover the left iris ring; we average them to a centre
355
+ # and pick a radius from the std-dev of the x-coordinates.
356
+ iris_indices = list(range(468, 474))
357
  iris_points = [(int(lm[i]["x"] * w), int(lm[i]["y"] * h)) for i in iris_indices]
358
 
 
359
  cx = int(np.mean([p[0] for p in iris_points]))
360
  cy = int(np.mean([p[1] for p in iris_points]))
361
  radius = max(3, int(np.std([p[0] for p in iris_points]) * 1.5))
362
 
363
+ # Filled disc mask centred on the iris → classify those pixels.
364
  mask = np.zeros((h, w), dtype=np.uint8)
365
  cv2.circle(mask, (cx, cy), radius, 255, -1)
366
 
 
373
  def _detect_eye_color_fallback(
374
  self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
375
  ) -> str:
376
+ """Fallback when iris landmarks aren't available.
377
+
378
+ Averages four points that bound the eye opening and treats the
379
+ centre as a coarse "look here" target. Less accurate than the
380
+ iris-landmark path because we sample some sclera too, but it's
381
+ a graceful degradation.
382
+ """
383
  eye_pts = [159, 145, 133, 33]
384
  cx = int(np.mean([lm[i]["x"] for i in eye_pts]) * w)
385
  cy = int(np.mean([lm[i]["y"] for i in eye_pts]) * h)
386
+ # Radius scaled to ~30% of eye opening height.
387
  radius = max(3, int(abs(lm[159]["y"] - lm[145]["y"]) * h * 0.3))
388
 
389
  mask = np.zeros((h, w), dtype=np.uint8)
 
397
 
398
  @staticmethod
399
  def _classify_eye_color(pixels: np.ndarray) -> str:
400
+ """Bucket sampled iris pixels by HSV mean.
401
+
402
+ Hue ranges follow the standard OpenCV scale (H in 0–180, not
403
+ 0–360). The cascade order matters: gray is checked first because
404
+ any sufficiently desaturated eye is gray regardless of its
405
+ nominal hue.
406
+ """
407
  hsv = cv2.cvtColor(
408
  pixels.reshape(-1, 1, 3).astype(np.uint8),
409
  cv2.COLOR_RGB2HSV
 
413
  mean_s = float(np.mean(hsv[:, 1]))
414
  mean_v = float(np.mean(hsv[:, 2]))
415
 
416
+ # Gray eyes: any hue, but low saturation.
417
  if mean_s < 30:
418
  return "gray"
419
 
420
+ # Hue-based buckets. Specific (amber) before general (brown).
421
  if 90 <= mean_h <= 130 and mean_s > 30:
422
  return "blue"
423
  if 35 <= mean_h <= 80 and mean_s > 30:
 
428
  return "amber"
429
  if 8 <= mean_h <= 28 and mean_s > 50:
430
  return "brown"
431
+ # Anything left with low V is just dark brown.
432
  if mean_v < 60:
433
  return "dark brown"
434
 
435
+ return "brown"
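The `_classify_eye_color` docstring notes that hue is on OpenCV's 8-bit scale (H in 0 to 180, S and V in 0 to 255). As a rough illustration of that scale, the sketch below converts an RGB triple with the standard-library `colorsys` module. `rgb_to_opencv_hsv` is a hypothetical helper, and rounding may differ from OpenCV by one unit on some inputs.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Map 8-bit RGB to an OpenCV-style 8-bit HSV triple (H 0..179, S/V 0..255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns h in [0, 1); OpenCV stores degrees / 2 so hue fits in uint8.
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


blue = rgb_to_opencv_hsv(0, 0, 255)
gray = rgb_to_opencv_hsv(128, 128, 128)

# Thresholds copied from the visible part of the cascade in the diff.
blue_is_blue = 90 <= blue[0] <= 130 and blue[1] > 30
gray_is_gray = gray[1] < 30
```

Pure blue lands at hue 120, inside the 90 to 130 band the cascade labels "blue", while a neutral gray has zero saturation and is caught by the first `mean_s < 30` check.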
analyzers/demographic_analyzer.py CHANGED
@@ -1,13 +1,36 @@
1
  """
2
- Public pretrained demographic classifiers.
3
-
4
- Models used (all public, with published accuracy):
5
- - Age: dima806/fairface_age_image_detection (~59% top-1 on FairFace age buckets)
6
- - Gender: dima806/fairface_gender_image_detection (~93.4% on FairFace)
7
- - Ethnicity: cledoux42/Ethnicity_Test_v003 (ViT, 79.6% accuracy, macro-F1 0.797)
8
-
9
- The ethnicity model replaces the former NikhilJaddu/fairface-race-vit checkpoint,
10
- which had no published performance metrics on the HF model card.
11
  """
12
 
13
  from typing import Any
@@ -22,20 +45,28 @@ RACE_MODEL_ID = "cledoux42/Ethnicity_Test_v003"
22
 
23
  AGE_LABELS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
24
  GENDER_LABELS = ["Male", "Female"]
25
- # cledoux42/Ethnicity_Test_v003 outputs 5 classes: african, asian, caucasian, hispanic, indian.
26
- # We keep the legacy 7-bucket schema internally so the rest of the app still works;
27
- # unseen buckets simply stay at 0.0 in the distribution.
28
  RACE_LABELS = ["White", "Black", "Latino_Hispanic", "East Asian", "Southeast Asian", "Indian", "Middle Eastern"]
29
 
30
 
31
  class DemographicAnalyzer:
32
  def __init__(self):
 
 
 
33
  self.age_classifier = self._load_classifier(AGE_MODEL_ID)
34
  self.gender_classifier = self._load_classifier(GENDER_MODEL_ID)
35
  self.race_classifier = self._load_classifier(RACE_MODEL_ID)
36
 
37
  @staticmethod
38
  def _load_classifier(model_id: str):
 
 
 
 
 
39
  try:
40
  return pipeline("image-classification", model=model_id)
41
  except Exception as exc:
@@ -43,12 +74,18 @@ class DemographicAnalyzer:
43
  return None
44
 
45
  def analyze(self, img_rgb) -> dict[str, Any]:
 
 
46
  pil = Image.fromarray(img_rgb)
47
 
48
- age_predictions = self._safe_predict(self.age_classifier, pil, top_k=3)
 
 
 
49
  gender_predictions = self._safe_predict(self.gender_classifier, pil, top_k=2)
50
  race_predictions = self._safe_predict(self.race_classifier, pil, top_k=7)
51
 
 
52
  if not age_predictions and not gender_predictions and not race_predictions:
53
  return {
54
  "age_range": "unknown",
@@ -62,17 +99,22 @@ class DemographicAnalyzer:
62
  "ethnicity_distribution": {label: 0.0 for label in RACE_LABELS},
63
  }
64
 
 
 
65
  age_prediction = age_predictions[0] if age_predictions else {"label": "unknown", "score": 0.0}
66
  gender_prediction = gender_predictions[0] if gender_predictions else {"label": "unknown", "score": 0.0}
67
  race_prediction = race_predictions[0] if race_predictions else {"label": "unknown", "score": 0.0}
68
 
 
 
 
69
  age_label = self._normalize_age_label(age_prediction["label"])
70
  gender_label = self._normalize_gender_label(gender_prediction["label"])
71
  race_label = self._normalize_race_label(race_prediction["label"])
72
 
73
  return {
74
  "age_range": age_label,
75
- "age_estimate": self._age_estimate_from_label(age_label),
76
  "age_confidence": round(float(age_prediction["score"]), 3),
77
  "gender": gender_label.lower(),
78
  "gender_confidence": round(float(gender_prediction["score"]), 3),
@@ -84,6 +126,7 @@ class DemographicAnalyzer:
84
 
85
  @staticmethod
86
  def _normalize_age_label(label: str) -> str:
 
87
  normalized = label.strip().lower()
88
  if normalized == "more than 70":
89
  return "70+"
@@ -98,9 +141,10 @@ class DemographicAnalyzer:
98
 
99
  @staticmethod
100
  def _normalize_race_label(label: str) -> str:
 
101
  normalized = label.strip().lower().replace("-", "_")
102
  race_aliases = {
103
- # Original FairFace 7-class labels
104
  "white": "White",
105
  "black": "Black",
106
  "latino_hispanic": "Latino_Hispanic",
@@ -109,7 +153,7 @@ class DemographicAnalyzer:
109
  "southeast asian": "Southeast Asian",
110
  "indian": "Indian",
111
  "middle eastern": "Middle Eastern",
112
- # cledoux42/Ethnicity_Test_v003 5-class labels → map into our schema
113
  "african": "Black",
114
  "asian": "East Asian",
115
  "caucasian": "White",
@@ -117,23 +161,52 @@ class DemographicAnalyzer:
117
  }
118
  return race_aliases.get(normalized, label)
119
 
120
- @staticmethod
121
- def _age_estimate_from_label(label: str) -> float:
122
- mapping = {
123
- "0-2": 1.0,
124
- "3-9": 6.0,
125
- "10-19": 14.5,
126
- "20-29": 24.5,
127
- "30-39": 34.5,
128
- "40-49": 44.5,
129
- "50-59": 54.5,
130
- "60-69": 64.5,
131
- "70+": 75.0,
132
- }
133
- return mapping.get(label, 0.0)
134
 
135
  @classmethod
136
  def _distribution_map(cls, predictions, normalizer, all_labels):
 
 
 
 
137
  distribution = {label: 0.0 for label in all_labels}
138
  for prediction in predictions:
139
  normalized_label = normalizer(prediction["label"])
@@ -143,6 +216,8 @@ class DemographicAnalyzer:
143
 
144
  @staticmethod
145
  def _safe_predict(classifier, image, top_k: int):
 
 
146
  if classifier is None:
147
  return []
148
  try:
 
1
  """
2
+ DemographicAnalyzer: age, gender, ethnicity via three ViT classifiers.
3
+
4
+ Models
5
+ ------
6
+ - Age : dima806/fairface_age_image_detection
7
+ ViT-B/16, ~59% top-1 on FairFace 9 age buckets.
8
+ - Gender : dima806/fairface_gender_image_detection
9
+ ViT-B/16, ~93.4% on FairFace.
10
+ - Ethnicity : cledoux42/Ethnicity_Test_v003
11
+ ViT, 79.6% accuracy, macro-F1 0.797. 5-class output that
12
+ we widen into the legacy 7-bucket FairFace schema so the
13
+ rest of the app's distribution shape doesn't change.
14
+
15
+ All three are Apache 2.0 licensed and load as Hugging Face image-classification pipelines.
16
+
17
+ Inputs
18
+ ------
19
+ img_rgb : np.ndarray (H, W, 3) uint8
20
+
21
+ Outputs (dict)
22
+ --------------
23
+ age_range, age_estimate (softmax-weighted continuous), age_confidence,
24
+ age_distribution, gender, gender_confidence, ethnicity,
25
+ ethnicity_confidence, ethnicity_distribution.
26
+
27
+ Notes
28
+ -----
29
+ The FairFace age model is a 9-bucket classifier (0-2, 3-9, …, 70+),
30
+ which means the argmax bucket midpoint is always one of nine fixed
31
+ numbers (24.5 for 20-29, etc.). To recover a smooth continuous estimate
32
+ we compute the expected value across the full softmax — see
33
+ ``_weighted_age_estimate``.
34
  """
35
 
36
  from typing import Any
 
45
 
46
  AGE_LABELS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
47
  GENDER_LABELS = ["Male", "Female"]
48
+ # cledoux42 ships 5 classes (african, asian, caucasian, hispanic, indian),
49
+ # but we keep the legacy 7-bucket FairFace label space internally so the
50
+ # downstream distribution dict shape stays stable. Unseen buckets stay 0.
51
  RACE_LABELS = ["White", "Black", "Latino_Hispanic", "East Asian", "Southeast Asian", "Indian", "Middle Eastern"]
52
 
53
 
54
  class DemographicAnalyzer:
55
  def __init__(self):
56
+ # Each classifier is a HF image-classification pipeline. They lazily
57
+ # download weights from HF on first instantiation and cache them
58
+ # under /root/.cache/huggingface inside the container.
59
  self.age_classifier = self._load_classifier(AGE_MODEL_ID)
60
  self.gender_classifier = self._load_classifier(GENDER_MODEL_ID)
61
  self.race_classifier = self._load_classifier(RACE_MODEL_ID)
62
 
63
  @staticmethod
64
  def _load_classifier(model_id: str):
65
+ """Build one HF image-classification pipeline, logging on failure.
66
+
67
+ A failed load returns None so the rest of the service continues
68
+ to function and `analyze()` falls back to "unknown" demographics.
69
+ """
70
  try:
71
  return pipeline("image-classification", model=model_id)
72
  except Exception as exc:
 
74
  return None
75
 
76
  def analyze(self, img_rgb) -> dict[str, Any]:
77
+ # Convert the numpy frame to a PIL Image once and reuse it for
78
+ # all three classifier calls.
79
  pil = Image.fromarray(img_rgb)
80
 
81
+ # top_k=len(labels) so we get the full softmax for each model.
82
+ # We need the full age distribution to compute the weighted
83
+ # expected-value age estimate.
84
+ age_predictions = self._safe_predict(self.age_classifier, pil, top_k=len(AGE_LABELS))
85
  gender_predictions = self._safe_predict(self.gender_classifier, pil, top_k=2)
86
  race_predictions = self._safe_predict(self.race_classifier, pil, top_k=7)
87
 
88
+ # If every classifier failed we degrade gracefully with a stub.
89
  if not age_predictions and not gender_predictions and not race_predictions:
90
  return {
91
  "age_range": "unknown",
 
99
  "ethnicity_distribution": {label: 0.0 for label in RACE_LABELS},
100
  }
101
 
102
+ # HF pipelines return predictions pre-sorted by score descending,
103
+ # so prediction[0] is always the argmax class.
104
  age_prediction = age_predictions[0] if age_predictions else {"label": "unknown", "score": 0.0}
105
  gender_prediction = gender_predictions[0] if gender_predictions else {"label": "unknown", "score": 0.0}
106
  race_prediction = race_predictions[0] if race_predictions else {"label": "unknown", "score": 0.0}
107
 
108
+ # Models occasionally return label aliases ("more than 70" instead
109
+ # of "70+", "African" instead of "Black"). The normalisers map
110
+ # everything back to our canonical schema.
111
  age_label = self._normalize_age_label(age_prediction["label"])
112
  gender_label = self._normalize_gender_label(gender_prediction["label"])
113
  race_label = self._normalize_race_label(race_prediction["label"])
114
 
115
  return {
116
  "age_range": age_label,
117
+ "age_estimate": self._weighted_age_estimate(age_predictions),
118
  "age_confidence": round(float(age_prediction["score"]), 3),
119
  "gender": gender_label.lower(),
120
  "gender_confidence": round(float(gender_prediction["score"]), 3),
 
126
 
127
  @staticmethod
128
  def _normalize_age_label(label: str) -> str:
129
+ """Map model output to canonical AGE_LABELS entry."""
130
  normalized = label.strip().lower()
131
  if normalized == "more than 70":
132
  return "70+"
 
141
 
142
  @staticmethod
143
  def _normalize_race_label(label: str) -> str:
144
+ """Coalesce cledoux42's 5 classes into our 7-bucket schema."""
145
  normalized = label.strip().lower().replace("-", "_")
146
  race_aliases = {
147
+ # Legacy FairFace 7-class labels
148
  "white": "White",
149
  "black": "Black",
150
  "latino_hispanic": "Latino_Hispanic",
 
153
  "southeast asian": "Southeast Asian",
154
  "indian": "Indian",
155
  "middle eastern": "Middle Eastern",
156
+ # cledoux42/Ethnicity_Test_v003 5-class labels
157
  "african": "Black",
158
  "asian": "East Asian",
159
  "caucasian": "White",
 
161
  }
162
  return race_aliases.get(normalized, label)
163
 
164
+ # Midpoint of each FairFace age bucket — used as the per-bucket
165
+ # "value" when we marginalise over the predicted distribution.
166
+ _AGE_MIDPOINTS = {
167
+ "0-2": 1.0,
168
+ "3-9": 6.0,
169
+ "10-19": 14.5,
170
+ "20-29": 24.5,
171
+ "30-39": 34.5,
172
+ "40-49": 44.5,
173
+ "50-59": 54.5,
174
+ "60-69": 64.5,
175
+ "70+": 75.0,
176
+ }
177
+
178
+ @classmethod
179
+ def _weighted_age_estimate(cls, predictions: list[dict]) -> float:
180
+ """Softmax-weighted expected age across all FairFace buckets.
181
+
182
+ FairFace is a 9-bucket classifier; the argmax always snaps to one
183
+ of nine fixed midpoints (24.5 for 20-29, etc.). Treating its
184
+ softmax as a probability distribution and taking the expected
185
+ value gives a continuous number that moves with confidence
186
+ (close to 24.5 for someone very confidently 20-29, 28.4 if some mass leaks
187
+ into 30-39). Still bounded by bucket midpoints — true per-year
188
+ accuracy would need a regression model.
189
+ """
190
+ total_weight = 0.0
191
+ weighted_sum = 0.0
192
+ for pred in predictions:
193
+ label = cls._normalize_age_label(pred["label"])
194
+ midpoint = cls._AGE_MIDPOINTS.get(label)
195
+ if midpoint is None:
196
+ continue
197
+ score = float(pred["score"])
198
+ weighted_sum += midpoint * score
199
+ total_weight += score
200
+ if total_weight == 0:
201
+ return 0.0
202
+ return round(weighted_sum / total_weight, 1)
203
 
204
  @classmethod
205
  def _distribution_map(cls, predictions, normalizer, all_labels):
206
+ """Flatten HF predictions into {canonical_label: score} dict.
207
+
208
+ Unseen labels stay at 0.0 so the shape is always all_labels-sized.
209
+ """
210
  distribution = {label: 0.0 for label in all_labels}
211
  for prediction in predictions:
212
  normalized_label = normalizer(prediction["label"])
 
216
 
217
  @staticmethod
218
  def _safe_predict(classifier, image, top_k: int):
219
+ """Wrap classifier(...) so a single model failure can't bring
220
+ down the whole demographic block."""
221
  if classifier is None:
222
  return []
223
  try:
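A worked example may make the expected-value age estimate from `_weighted_age_estimate` concrete. The standalone sketch below mirrors the logic shown in the diff. The `normalize_age_label` helper is a simplified assumption, because the full normaliser is not visible in these hunks, and the prediction scores are made up for illustration.

```python
AGE_MIDPOINTS = {
    "0-2": 1.0, "3-9": 6.0, "10-19": 14.5, "20-29": 24.5, "30-39": 34.5,
    "40-49": 44.5, "50-59": 54.5, "60-69": 64.5, "70+": 75.0,
}


def normalize_age_label(label):
    """Simplified normaliser: only the 'more than 70' alias seen in the diff."""
    cleaned = label.strip().lower()
    return "70+" if cleaned == "more than 70" else cleaned


def weighted_age_estimate(predictions):
    """Expected age: sum(midpoint * score) / sum(score) over recognised buckets."""
    total_weight = 0.0
    weighted_sum = 0.0
    for pred in predictions:
        midpoint = AGE_MIDPOINTS.get(normalize_age_label(pred["label"]))
        if midpoint is None:
            continue  # unknown label: contributes nothing
        score = float(pred["score"])
        weighted_sum += midpoint * score
        total_weight += score
    if total_weight == 0:
        return 0.0
    return round(weighted_sum / total_weight, 1)


# Illustrative scores only: 70% on 20-29, 30% on 30-39.
split = weighted_age_estimate(
    [{"label": "20-29", "score": 0.7}, {"label": "30-39", "score": 0.3}]
)
oldest = weighted_age_estimate([{"label": "more than 70", "score": 1.0}])
```

Here `split` is 24.5 × 0.7 + 34.5 × 0.3 = 27.5, which sits between the two bucket midpoints. The old argmax lookup would have reported 24.5. Dividing by `total_weight` keeps the estimate on the same scale when unrecognised labels are skipped.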
analyzers/emotion_analyzer.py CHANGED
@@ -1,17 +1,38 @@
1
  """
2
- HSEmotion: EfficientNet-B0 fine-tuned for 8-class emotion recognition.
3
-
4
- Uses the published HSEmotion checkpoint (Savchenko et al., enet_b0_8_best_afew),
5
- which has actual fine-tuned weights for the 8 emotion classes. The previous
6
- version asked timm for a 1000-class ImageNet checkpoint and reset the head to
7
- 8 randomly-initialized neurons, so the outputs were softmax-over-noise.
8
-
9
- Classes: Anger, Contempt, Disgust, Fear, Happiness, Neutral, Sadness, Surprise.
10
-
11
- Also provides valence (positive/negative) and arousal (calm/excited) scores
12
- derived from the emotion distribution.
13
-
14
- Install: pip install hsemotion
15
  """
16
 
17
  from contextlib import contextmanager
@@ -33,7 +54,8 @@ EMOTION_LABELS = [
33
  "happiness", "neutral", "sadness", "surprise",
34
  ]
35
 
36
- # Valence weights for each emotion (-1 to +1)
 
37
  VALENCE_MAP = {
38
  "anger": -0.6,
39
  "contempt": -0.3,
@@ -45,7 +67,7 @@ VALENCE_MAP = {
45
  "surprise": 0.3,
46
  }
47
 
48
- # Arousal weights for each emotion (0 to 1)
49
  AROUSAL_MAP = {
50
  "anger": 0.8,
51
  "contempt": 0.3,
@@ -62,15 +84,11 @@ HSEMOTION_MODEL_NAME = "enet_b0_8_best_afew"
62
 
63
  @contextmanager
64
  def _legacy_torch_load():
65
- """Temporarily make torch.load default to weights_only=False.
66
-
67
- PyTorch 2.6 changed the default to weights_only=True. The HSEmotion
68
- checkpoint is pickled as a full timm.models.efficientnet.EfficientNet
69
- object (not a clean state dict), so the safe unpickler refuses to
70
- deserialize it. We trust this checkpoint (it comes from the published
71
- HSEmotion repo and was already vetted by the pip install), so we opt
72
- back into legacy loading — scoped to just the HSEmotion init so the
73
- rest of the process keeps the safer default.
74
  """
75
  original_load = torch.load
76
 
@@ -91,6 +109,9 @@ class EmotionAnalyzer:
91
  self.recognizer = self._load_model()
92
 
93
  def _load_model(self):
 
 
 
94
  if not HAS_HSEMOTION:
95
  print(
96
  "[EmotionAnalyzer] hsemotion not installed — emotion outputs "
@@ -114,12 +135,16 @@ class EmotionAnalyzer:
114
 
115
  try:
116
  # logits=False → returns post-softmax probabilities.
117
- # HSEmotionRecognizer handles its own resize/normalize/preproc.
 
118
  _, scores = self.recognizer.predict_emotions(img_rgb, logits=False)
119
  except Exception as exc:
120
  print(f"[EmotionAnalyzer] Inference failed: {exc}")
121
  return self._empty_result()
122
 
 
 
 
123
  probs = np.asarray(scores, dtype=float).flatten()
124
  if probs.size != len(EMOTION_LABELS):
125
  print(
@@ -129,26 +154,31 @@ class EmotionAnalyzer:
129
  )
130
  return self._empty_result()
131
 
132
- # Defensive renormalization. With logits=False this is a no-op, but
133
- # guards against future API drift in the hsemotion package.
134
  total = probs.sum()
135
  if total > 0:
136
  probs = probs / total
137
 
 
138
  emotion_scores = {
139
  label: round(float(probs[i]), 3)
140
  for i, label in enumerate(EMOTION_LABELS)
141
  }
142
 
 
 
 
143
  primary_idx = int(np.argmax(probs))
144
  primary_emotion = EMOTION_LABELS[primary_idx]
145
  primary_confidence = float(probs[primary_idx])
146
 
147
- # Secondary emotion (second highest)
148
  sorted_idx = np.argsort(probs)[::-1]
149
  secondary_emotion = EMOTION_LABELS[int(sorted_idx[1])]
150
 
151
- # Calculate valence and arousal
 
 
152
  valence = sum(
153
  probs[i] * VALENCE_MAP[label]
154
  for i, label in enumerate(EMOTION_LABELS)
@@ -174,6 +204,7 @@ class EmotionAnalyzer:
174
 
175
  @staticmethod
176
  def _empty_result() -> dict[str, Any]:
 
177
  return {
178
  "primary_emotion": "unknown",
179
  "emotion_confidence": 0.0,
@@ -182,4 +213,4 @@ class EmotionAnalyzer:
182
  "valence": 0.0,
183
  "arousal": 0.0,
184
  "mood": "unknown",
185
- }
 
1
  """
2
+ EmotionAnalyzer: HSEmotion 8-class facial emotion recognition.
3
+
4
+ Model
5
+ -----
6
+ - Architecture : EfficientNet-B0
7
+ - Checkpoint : enet_b0_8_best_afew (Savchenko et al.)
8
+ published by the hsemotion PyPI package
9
+ - Classes (8) : anger, contempt, disgust, fear, happiness,
10
+ neutral, sadness, surprise
11
+ - License : Apache 2.0 (hsemotion package)
12
+ - Source : https://github.com/HSE-asavchenko/face-emotion-recognition
13
+
14
+ Inputs
15
+ ------
16
+ img_rgb : np.ndarray (H, W, 3) uint8. HSEmotionRecognizer handles its
17
+ own resize/normalise internally.
18
+
19
+ Outputs (dict)
20
+ --------------
21
+ primary_emotion, emotion_confidence, secondary_emotion,
22
+ emotion_scores (full distribution), valence (-1..+1), arousal (0..1),
23
+ mood (positive | negative | neutral).
24
+
25
+ Notes
26
+ -----
27
+ Valence and arousal are derived from the emotion distribution using
28
+ hand-set per-emotion weights (VALENCE_MAP / AROUSAL_MAP) — they are
29
+ weighted sums, not separate model outputs.
30
+
31
+ PyTorch 2.6 changed torch.load to weights_only=True by default. The
32
+ HSEmotion checkpoint is pickled as a full timm EfficientNet object
33
+ (not a clean state dict), so the safe unpickler refuses to load it.
34
+ We scope a legacy weights_only=False just around the HSEmotion init
35
+ to keep the rest of the process on the safer default.
36
  """
37
 
38
  from contextlib import contextmanager
 
54
  "happiness", "neutral", "sadness", "surprise",
55
  ]
56
 
57
+ # Per-emotion valence weights. Used to project the 8-class distribution
58
+ # down to a single scalar in [-1, 1] (negative = sad/angry, positive = happy).
59
  VALENCE_MAP = {
60
  "anger": -0.6,
61
  "contempt": -0.3,
 
67
  "surprise": 0.3,
68
  }
69
 
70
+ # Per-emotion arousal weights, scalar in [0, 1] (0 = calm, 1 = intense).
71
  AROUSAL_MAP = {
72
  "anger": 0.8,
73
  "contempt": 0.3,
 
84
 
85
  @contextmanager
86
  def _legacy_torch_load():
87
+ """Temporarily switch torch.load back to weights_only=False.
88
+
89
+ Scoped via a context manager so only the HSEmotion init runs with
90
+ the legacy default; everything else keeps PyTorch 2.6's safer
91
+ weights_only=True behaviour.
 
 
 
 
92
  """
93
  original_load = torch.load
94
 
 
109
  self.recognizer = self._load_model()
110
 
111
  def _load_model(self):
112
+ # Without the hsemotion package installed there's no model to
113
+ # load. We log once and the rest of the service still works —
114
+ # the emotion fields just stay "unknown".
115
  if not HAS_HSEMOTION:
116
  print(
117
  "[EmotionAnalyzer] hsemotion not installed — emotion outputs "
 
135
 
136
  try:
137
  # logits=False → returns post-softmax probabilities.
138
+ # The recognizer handles its own resize/normalize/preproc,
139
+ # so we hand it the raw RGB ndarray.
140
  _, scores = self.recognizer.predict_emotions(img_rgb, logits=False)
141
  except Exception as exc:
142
  print(f"[EmotionAnalyzer] Inference failed: {exc}")
143
  return self._empty_result()
144
 
145
+ # Flatten to a 1D numpy array and sanity-check its length matches
146
+ # the class list. Mismatch likely means the upstream package
147
+ # changed its class count.
148
  probs = np.asarray(scores, dtype=float).flatten()
149
  if probs.size != len(EMOTION_LABELS):
150
  print(
 
154
  )
155
  return self._empty_result()
156
 
157
+ # Defensive renormalisation. With logits=False this is a no-op,
158
+ # but it guards against future API drift in the hsemotion package.
159
  total = probs.sum()
160
  if total > 0:
161
  probs = probs / total
162
 
163
+ # Build the {emotion: probability} dict for downstream display.
164
  emotion_scores = {
165
  label: round(float(probs[i]), 3)
166
  for i, label in enumerate(EMOTION_LABELS)
167
  }
168
 
169
+ # Primary = argmax of the distribution; secondary = second-highest.
170
+ # These are the two most-likely emotions, useful when the model
171
+ # is genuinely uncertain between two similar classes.
172
  primary_idx = int(np.argmax(probs))
173
  primary_emotion = EMOTION_LABELS[primary_idx]
174
  primary_confidence = float(probs[primary_idx])
175
 
 
176
  sorted_idx = np.argsort(probs)[::-1]
177
  secondary_emotion = EMOTION_LABELS[int(sorted_idx[1])]
178
 
179
+ # Valence and arousal: weighted sums over the distribution. A
180
+ # confidently-happy face gives valence ~0.9; a fearful one drops
181
+ # into negative territory with high arousal.
182
  valence = sum(
183
  probs[i] * VALENCE_MAP[label]
184
  for i, label in enumerate(EMOTION_LABELS)
 
204
 
205
  @staticmethod
206
  def _empty_result() -> dict[str, Any]:
207
+ """Stub used when HSEmotion isn't available or inference fails."""
208
  return {
209
  "primary_emotion": "unknown",
210
  "emotion_confidence": 0.0,
 
213
  "valence": 0.0,
214
  "arousal": 0.0,
215
  "mood": "unknown",
216
+ }
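Valence in this analyzer is a probability-weighted sum of per-emotion weights, not a separate model output. The toy sketch below uses only the three valence weights visible in this diff (anger, contempt, surprise). The remaining entries of `VALENCE_MAP` are elided here and deliberately not guessed, so `VALENCE_SUBSET`, `expected_valence`, and the probabilities are illustrative assumptions.

```python
# Only the weights that are visible in the diff; the full map has 8 entries.
VALENCE_SUBSET = {"anger": -0.6, "contempt": -0.3, "surprise": 0.3}


def expected_valence(probs):
    """Weighted sum of valence weights after defensively renormalising probs."""
    total = sum(probs.values())
    if total <= 0:
        return 0.0
    return sum(p / total * VALENCE_SUBSET[label] for label, p in probs.items())


# Made-up three-class distribution for illustration.
valence = expected_valence({"anger": 0.5, "contempt": 0.2, "surprise": 0.3})

# Unnormalised scores give the same answer because of the renormalisation step.
same_valence = expected_valence({"anger": 5.0, "contempt": 2.0, "surprise": 3.0})
```

The result is 0.5 × (−0.6) + 0.2 × (−0.3) + 0.3 × 0.3 = −0.27, a mildly negative reading. Arousal follows the same weighted-sum shape with `AROUSAL_MAP` weights in [0, 1].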
analyzers/hair_type_analyzer.py ADDED
@@ -0,0 +1,87 @@
1
+ """
2
+ HairTypeAnalyzer — hair texture classifier.
3
+
4
+ Model
5
+ -----
6
+ - Architecture : Vision Transformer (ViT-B/16)
7
+ - HF repo : dima806/hair_type_image_detection
8
+ - License : Apache 2.0
9
+ - Classes (5) : curly, dreadlocks, kinky, straight, wavy
10
+ - Reported acc : 93% overall.
11
+ Per-class F1: dreadlocks 0.978, kinky 0.949,
12
+ straight 0.927, curly 0.902, wavy 0.884.
13
+
14
+ Inputs
15
+ ------
16
+ img_rgb : np.ndarray (H, W, 3) uint8
17
+
18
+ Outputs (dict)
19
+ --------------
20
+ hair_type — argmax label
21
+ hair_type_confidence — argmax softmax score
22
+ hair_type_scores — full {class: score} dict
23
+
24
+ Notes
25
+ -----
26
+ This is the authoritative hair-texture output. The Laplacian-std-
27
+ based `hair_texture` field from ColorAnalyzer is a coarse fallback
28
+ that runs even when this model is unavailable.
29
+ """
30
+
31
+ from typing import Any
32
+
33
+ from PIL import Image
34
+ from transformers import pipeline
35
+
36
+
37
+ MODEL_ID = "dima806/hair_type_image_detection"
38
+
39
+ # Canonical class names in lowercase. Pipeline output is normalised
40
+ # to these on the way out.
41
+ _KNOWN = {"curly", "dreadlocks", "kinky", "straight", "wavy"}
42
+
43
+
44
+ class HairTypeAnalyzer:
45
+ def __init__(self):
46
+ self.classifier = None
47
+ try:
48
+ self.classifier = pipeline("image-classification", model=MODEL_ID)
49
+ except Exception as exc:
50
+ print(f"[HairTypeAnalyzer] Failed to load {MODEL_ID}: {exc}")
51
+
52
+ def analyze(self, img_rgb) -> dict[str, Any]:
53
+ if self.classifier is None:
54
+ return self._empty_result()
55
+
56
+ try:
57
+ pil = Image.fromarray(img_rgb)
58
+ # Pull all five class probabilities so downstream code can
59
+ # inspect the full distribution (e.g. wavy-vs-curly margin).
60
+ preds = self.classifier(pil, top_k=len(_KNOWN))
61
+ except Exception as exc:
62
+ print(f"[HairTypeAnalyzer] Prediction failed: {exc}")
63
+ return self._empty_result()
64
+
65
+ # Normalise label casing and build the score map.
66
+ scores = {label: 0.0 for label in _KNOWN}
67
+ for pred in preds:
68
+ label = str(pred["label"]).strip().lower()
69
+ if label in scores:
70
+ scores[label] = round(float(pred["score"]), 3)
71
+
72
+ top_label = max(scores, key=scores.get)
73
+ top_score = scores[top_label]
74
+
75
+ return {
76
+ "hair_type": top_label,
77
+ "hair_type_confidence": top_score,
78
+ "hair_type_scores": scores,
79
+ }
80
+
81
+ @staticmethod
82
+ def _empty_result() -> dict[str, Any]:
83
+ return {
84
+ "hair_type": "unknown",
85
+ "hair_type_confidence": 0.0,
86
+ "hair_type_scores": {label: 0.0 for label in _KNOWN},
87
+ }
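The analyzer returns the full `hair_type_scores` dict so callers can inspect margins such as wavy versus curly. One possible way a consumer might do that is sketched below. `top_two_margin` is a hypothetical helper, not part of this commit, and the scores are invented for illustration.

```python
def top_two_margin(scores):
    """Return (best_label, runner_up_label, score_gap) from a {label: score} dict."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (runner_up, runner_score) = ranked[0], ranked[1]
    return best, runner_up, best_score - runner_score


# Invented distribution where curly and wavy are nearly tied.
example_scores = {
    "curly": 0.41, "wavy": 0.38, "straight": 0.15,
    "kinky": 0.04, "dreadlocks": 0.02,
}
best, runner_up, margin = top_two_margin(example_scores)
```

A small gap like the 0.03 here suggests that the argmax label alone is a weak signal. A caller could treat such cases as ambiguous instead of reporting a confident hair type.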
analyzers/landmark_analyzer.py CHANGED
@@ -1,20 +1,43 @@
1
  """
2
- MediaPipe Face Landmarker: 478 3D landmarks + 52 blendshapes.
3
- Derives geometric facial features from landmark positions using pure math.
4
-
5
- This is the backbone of the system. From 478 3D points placed on the face,
6
- we calculate distances, ratios, and angles to determine:
7
- - Face shape (oval, round, square, heart, diamond, oblong, triangle)
8
- - Jawline type (sharp, soft, strong)
9
- - Chin type (pointed, wide, normal)
10
- - Cheekbone prominence
11
- - Forehead width
12
- - Eye shape, spacing, size, depth
13
- - Eyebrow shape, arch, thickness
14
- - Nose shape, bridge height, nostril width, tip shape
15
- - Lip fullness, mouth width, cupid's bow
16
- - Smile detection, asymmetry, dimples
17
- - Overall facial asymmetry score
18
  """
19
 
20
  import math
@@ -27,6 +50,7 @@ import numpy as np
27
  from mediapipe.tasks import python as mp_python
28
  from mediapipe.tasks.python import vision
29
 
 
30
  MODEL_URL = (
31
  "https://storage.googleapis.com/mediapipe-models/"
32
  "face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
@@ -36,6 +60,9 @@ MODEL_PATH = "models/face_landmarker.task"
36
 
37
  class LandmarkAnalyzer:
38
  def __init__(self):
 
 
 
39
  base_options = mp_python.BaseOptions(
40
  model_asset_path=self._ensure_model()
41
  )
@@ -49,7 +76,7 @@ class LandmarkAnalyzer:
49
 
50
  @staticmethod
51
  def _ensure_model() -> str:
52
- """Download the MediaPipe model if not already cached."""
53
  if not os.path.exists(MODEL_PATH):
54
  os.makedirs("models", exist_ok=True)
55
  urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
@@ -60,34 +87,48 @@ class LandmarkAnalyzer:
60
  # ------------------------------------------------------------------
61
 
62
  def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
 
 
 
63
  mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
64
  result = self.detector.detect(mp_image)
65
 
66
  if not result.face_landmarks:
67
  return {"error": "No face detected by MediaPipe"}
68
 
 
 
69
  landmarks = result.face_landmarks[0]
70
  lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
71
 
 
 
72
  blendshapes: dict[str, float] = {}
73
  if result.face_blendshapes:
74
  for bs in result.face_blendshapes[0]:
75
  blendshapes[bs.category_name] = round(bs.score, 4)
76
 
 
 
77
  attrs: dict[str, Any] = {"_raw_landmarks": lm}
78
 
79
  # ── Face Shape ────────────────────────────────────────────────
80
- face_height = self._dist(lm[10], lm[152])
81
- face_width = self._dist(lm[234], lm[454])
82
- jaw_width = self._dist(lm[172], lm[397])
83
- cheekbone_width = self._dist(lm[93], lm[323])
84
- forehead_width = self._dist(lm[54], lm[284])
 
 
 
85
 
86
  wh_ratio = face_width / face_height if face_height else 1
87
  jaw_to_face = jaw_width / face_width if face_width else 1
88
  forehead_to_jaw = forehead_width / jaw_width if jaw_width else 1
89
  cheek_to_jaw = cheekbone_width / jaw_width if jaw_width else 1
90
 
 
 
91
  if wh_ratio > 0.85 and jaw_to_face > 0.75:
92
  attrs["face_shape"] = "round"
93
  elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
@@ -110,13 +151,16 @@ class LandmarkAnalyzer:
110
  "cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
111
  }
112
 
113
- # ── Forehead ─────────────────────────────────────────────────
 
114
  fh_ratio = forehead_width / face_width if face_width else 0.6
115
  attrs["forehead_width"] = (
116
  "broad" if fh_ratio > 0.7 else "narrow" if fh_ratio < 0.55 else "average"
117
  )
118
 
119
  # ── Jawline ──────────────────────────────────────────────────
 
 
120
  jaw_angle = self._jaw_angle(lm)
121
  attrs["jawline_angle"] = round(jaw_angle, 1)
122
  if jaw_angle < 110:
@@ -129,6 +173,7 @@ class LandmarkAnalyzer:
129
  attrs["jawline_type"] = "soft"
130
 
131
  # ── Chin ─────────────────────────────────────────────────────
 
132
  chin_width = self._dist(lm[175], lm[396])
133
  chin_ratio = chin_width / jaw_width if jaw_width else 0.4
134
  attrs["chin_type"] = (
@@ -138,12 +183,16 @@ class LandmarkAnalyzer:
138
  )
139
 
140
  # ── Cheekbones ───────────────────────────────────────────────
 
 
141
  cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
142
  attrs["cheekbone_prominence"] = (
143
  "high" if cheek_z < -0.04
144
  else "flat" if cheek_z > 0.0
145
  else "moderate"
146
  )
 
 
147
  cheek_puff = blendshapes.get("cheekPuff", 0)
148
  if cheek_puff > 0.3:
149
  attrs["cheek_fullness"] = "full"
@@ -153,12 +202,16 @@ class LandmarkAnalyzer:
153
  attrs["cheek_fullness"] = "normal"
154
 
155
  # ── Eyes ─────────────────────────────────────────────────────
 
 
156
  l_top, l_bot = lm[159], lm[145]
157
  l_inner, l_outer = lm[133], lm[33]
158
  eye_open = self._dist(l_top, l_bot)
159
  eye_w = self._dist(l_inner, l_outer)
160
  eye_ratio = eye_open / eye_w if eye_w else 0.3
161
 
 
 
162
  outer_angle = l_outer["y"] - l_inner["y"]
163
  if outer_angle < -0.012:
164
  attrs["eye_shape"] = "upturned"
@@ -171,7 +224,7 @@ class LandmarkAnalyzer:
171
  else:
172
  attrs["eye_shape"] = "almond"
173
 
174
- # Deep-set vs protruding
175
  eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
176
  nose_bridge_z = lm[6]["z"]
177
  if eye_z > nose_bridge_z + 0.02:
@@ -181,7 +234,8 @@ class LandmarkAnalyzer:
181
  else:
182
  attrs["eye_depth"] = "normal"
183
 
184
- # Eye spacing
 
185
  if len(lm) > 473:
186
  inter_pupillary = self._dist(lm[468], lm[473])
187
  else:
@@ -193,7 +247,8 @@ class LandmarkAnalyzer:
193
  else "average"
194
  )
195
 
196
- # Eye size
 
197
  r_top, r_bot = lm[386], lm[374]
198
  r_inner, r_outer = lm[362], lm[263]
199
  r_area = self._dist(r_top, r_bot) * self._dist(r_inner, r_outer)
@@ -207,6 +262,8 @@ class LandmarkAnalyzer:
207
  else "average"
208
  )
209
 
 
 
210
  blink_l = blendshapes.get("eyeBlinkLeft", 0)
211
  blink_r = blendshapes.get("eyeBlinkRight", 0)
212
  attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
@@ -215,6 +272,8 @@ class LandmarkAnalyzer:
215
  brow_mid = lm[105]
216
  brow_outer = lm[46]
217
  brow_inner = lm[70]
 
 
218
  brow_to_eye = self._dist(brow_mid, lm[159])
219
  brow_arch_ratio = brow_to_eye / eye_open if eye_open else 1.5
220
 
@@ -224,6 +283,8 @@ class LandmarkAnalyzer:
224
  else "average"
225
  )
226
 
 
 
227
  mid_y = brow_mid["y"]
228
  avg_end_y = (brow_inner["y"] + brow_outer["y"]) / 2
229
  curvature = mid_y - avg_end_y
@@ -234,6 +295,7 @@ class LandmarkAnalyzer:
234
  else:
235
  attrs["eyebrow_shape"] = "flat"
236
 
 
237
  brow_top = lm[66]
238
  brow_bottom = lm[105]
239
  brow_thickness = self._dist(brow_top, brow_bottom)
@@ -243,6 +305,7 @@ class LandmarkAnalyzer:
243
  else "medium"
244
  )
245
 
 
246
  inner_brow_dist = self._dist(lm[70], lm[300])
247
  attrs["possible_unibrow"] = inner_brow_dist < 0.04
248
 
@@ -261,6 +324,8 @@ class LandmarkAnalyzer:
261
  else "average"
262
  )
263
 
 
 
264
  tip_angle = nose_tip["y"] - nose_bottom["y"]
265
  if tip_angle < -0.005:
266
  attrs["nose_shape"] = "upturned"
@@ -273,16 +338,21 @@ class LandmarkAnalyzer:
273
  else:
274
  attrs["nose_shape"] = "straight"
275
 
 
276
  attrs["nose_bridge"] = (
277
  "high" if nose_bridge_top["z"] < -0.05
278
  else "flat" if nose_bridge_top["z"] > 0.0
279
  else "average"
280
  )
 
281
  attrs["nose_tip_shape"] = (
282
  "pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01 else "rounded"
283
  )
284
 
285
  # ── Lips & Mouth ─────────────────────────────────────────────
 
 
 
286
  ul_top, ul_bot = lm[0], lm[13]
287
  ll_top, ll_bot = lm[14], lm[17]
288
  m_left, m_right = lm[61], lm[291]
@@ -298,6 +368,7 @@ class LandmarkAnalyzer:
298
  else "thin" if lip_ratio < 0.22
299
  else "average"
300
  )
 
301
  attrs["lip_balance"] = (
302
  "top-heavy" if ul_h > ll_h * 1.2
303
  else "bottom-heavy" if ll_h > ul_h * 1.2
@@ -311,7 +382,8 @@ class LandmarkAnalyzer:
311
  else "average"
312
  )
313
 
314
- # Cupid's bow
 
315
  c_left, c_center, c_right = lm[37], lm[0], lm[267]
316
  bow = c_center["y"] - (c_left["y"] + c_right["y"]) / 2
317
  attrs["cupids_bow"] = (
@@ -320,7 +392,9 @@ class LandmarkAnalyzer:
320
  else "flat"
321
  )
322
 
323
- # Smile
 
 
324
  smile_l = blendshapes.get("mouthSmileLeft", 0)
325
  smile_r = blendshapes.get("mouthSmileRight", 0)
326
  attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
@@ -330,6 +404,9 @@ class LandmarkAnalyzer:
330
  )
331
 
332
  # ── Facial Asymmetry ─────────────────────────────────────────
 
 
 
333
  pairs = [
334
  (33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
335
  (61, 291), (159, 386), (145, 374), (46, 276),
@@ -341,6 +418,8 @@ class LandmarkAnalyzer:
341
  min(asym / len(pairs) / 0.05, 1.0), 3
342
  )
343
 
 
 
344
  attrs["blendshapes"] = blendshapes
345
  return attrs
346
 
@@ -350,6 +429,7 @@ class LandmarkAnalyzer:
350
 
351
  @staticmethod
352
  def _dist(a: dict, b: dict) -> float:
 
353
  return math.sqrt(
354
  (a["x"] - b["x"]) ** 2
355
  + (a["y"] - b["y"]) ** 2
@@ -358,6 +438,11 @@ class LandmarkAnalyzer:
358
 
359
  @staticmethod
360
  def _jaw_angle(lm: list[dict]) -> float:
 
 
 
 
 
361
  chin = lm[152]
362
  left_jaw, right_jaw = lm[172], lm[397]
363
  v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
 
1
  """
2
+ LandmarkAnalyzer — MediaPipe Face Landmarker geometric feature extractor.
3
+
4
+ Model
5
+ -----
6
+ - Architecture : MediaPipe Face Landmarker (TF Lite, Google)
7
+ - Weights : face_landmarker.task (float16, auto-downloaded, ~4 MB)
8
+ - Outputs : 478 normalised 3D landmarks + 52 ARKit-compatible blendshapes
9
+ - License : Apache 2.0
10
+
11
+ Inputs
12
+ ------
13
+ img_rgb : np.ndarray (H, W, 3) uint8, RGB order.
14
+
15
+ Outputs (dict)
16
+ --------------
17
+ Most fields are categorical strings derived from landmark distances,
18
+ ratios and angles. A few come straight from blendshape activations.
19
+
20
+ Face shape / structure :
21
+ face_shape, face_shape_metrics, forehead_width,
22
+ jawline_angle, jawline_type, chin_type,
23
+ cheekbone_prominence, cheek_fullness, facial_asymmetry_score
24
+ Eyes :
25
+ eye_shape, eye_depth, eye_spacing, eye_size, eyes_open
26
+ Eyebrows :
27
+ eyebrow_arch_height, eyebrow_shape, eyebrow_thickness, possible_unibrow
28
+ Nose :
29
+ nose_shape, nose_bridge, nose_tip_shape, nostril_width
30
+ Lips & mouth :
31
+ lip_fullness, lip_balance, mouth_width, cupids_bow,
32
+ smiling, smile_asymmetry, possible_dimples
33
+ Raw payloads (used downstream, stripped before JSON) :
34
+ _raw_landmarks, blendshapes
35
+
36
+ Notes
37
+ -----
38
+ All thresholds were hand-tuned against representative photos.
39
+ They are conservative: when a ratio sits near a boundary the analyzer
40
+ prefers "average" / "normal" over committing to an extreme bucket.
41
  """
42
 
43
  import math
 
50
  from mediapipe.tasks import python as mp_python
51
  from mediapipe.tasks.python import vision
52
 
53
+ # Float16 MediaPipe weight file. ~4 MB, auto-fetched once and cached.
54
  MODEL_URL = (
55
  "https://storage.googleapis.com/mediapipe-models/"
56
  "face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
 
60
 
61
  class LandmarkAnalyzer:
62
  def __init__(self):
63
+ # Configure the detector to emit both blendshapes and the 4x4
64
+ # facial transformation matrix; the latter is unused for now but
65
+ # cheap to compute and useful if we ever need head pose.
66
  base_options = mp_python.BaseOptions(
67
  model_asset_path=self._ensure_model()
68
  )
 
76
 
77
  @staticmethod
78
  def _ensure_model() -> str:
79
+ """Cache the MediaPipe weight file on disk on first run."""
80
  if not os.path.exists(MODEL_PATH):
81
  os.makedirs("models", exist_ok=True)
82
  urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
 
87
  # ------------------------------------------------------------------
88
 
89
  def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
90
+ # Wrap the numpy array as a MediaPipe Image and run detection.
91
+ # If no face is found we return an {"error": ...} dict with no
92
+ # landmarks; downstream analyzers are expected to degrade to "unknown".
93
  mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
94
  result = self.detector.detect(mp_image)
95
 
96
  if not result.face_landmarks:
97
  return {"error": "No face detected by MediaPipe"}
98
 
99
+ # MediaPipe returns landmarks as NormalizedLandmark objects; convert
100
+ # to plain dicts so downstream code (and JSON serialisation) is simpler.
101
  landmarks = result.face_landmarks[0]
102
  lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
103
 
104
+ # Build the {blendshape_name: score} dict. ARKit-compatible names
105
+ # like mouthSmileLeft, eyeBlinkRight, jawOpen.
106
  blendshapes: dict[str, float] = {}
107
  if result.face_blendshapes:
108
  for bs in result.face_blendshapes[0]:
109
  blendshapes[bs.category_name] = round(bs.score, 4)
110
 
111
+ # _raw_landmarks is consumed by ColorAnalyzer (iris + lip masks).
112
+ # The leading underscore tells app.py to strip it before JSON.
113
  attrs: dict[str, Any] = {"_raw_landmarks": lm}
114
 
115
  # ── Face Shape ────────────────────────────────────────────────
116
+ # Four ratios that, taken together, distinguish the seven canonical
117
+ # face shapes. All distances are in normalised image coordinates
118
+ # so the ratios are resolution-independent.
119
+ face_height = self._dist(lm[10], lm[152]) # forehead top → chin bottom
120
+ face_width = self._dist(lm[234], lm[454]) # left zygion → right zygion
121
+ jaw_width = self._dist(lm[172], lm[397]) # left gonion → right gonion
122
+ cheekbone_width = self._dist(lm[93], lm[323]) # left zygomatic → right
123
+ forehead_width = self._dist(lm[54], lm[284]) # left frontal → right frontal
124
 
125
  wh_ratio = face_width / face_height if face_height else 1
126
  jaw_to_face = jaw_width / face_width if face_width else 1
127
  forehead_to_jaw = forehead_width / jaw_width if jaw_width else 1
128
  cheek_to_jaw = cheekbone_width / jaw_width if jaw_width else 1
129
 
130
+ # Cascade is evaluated top to bottom: a face that matches multiple
131
+ # categories is bucketed by the first matching rule.
132
  if wh_ratio > 0.85 and jaw_to_face > 0.75:
133
  attrs["face_shape"] = "round"
134
  elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
 
151
  "cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
152
  }
153
 
154
+ # ── Forehead width (broad / average / narrow) ────────────────
155
+ # Forehead width relative to overall face width.
156
  fh_ratio = forehead_width / face_width if face_width else 0.6
157
  attrs["forehead_width"] = (
158
  "broad" if fh_ratio > 0.7 else "narrow" if fh_ratio < 0.55 else "average"
159
  )
160
 
161
  # ── Jawline ──────────────────────────────────────────────────
162
+ # Angle subtended at the chin point by the two gonion landmarks.
163
+ # Smaller angle = sharper jawline; larger = softer.
164
  jaw_angle = self._jaw_angle(lm)
165
  attrs["jawline_angle"] = round(jaw_angle, 1)
166
  if jaw_angle < 110:
 
173
  attrs["jawline_type"] = "soft"
174
 
175
  # ── Chin ─────────────────────────────────────────────────────
176
+ # Chin width vs jaw width: narrower chin → pointier appearance.
177
  chin_width = self._dist(lm[175], lm[396])
178
  chin_ratio = chin_width / jaw_width if jaw_width else 0.4
179
  attrs["chin_type"] = (
 
183
  )
184
 
185
  # ── Cheekbones ───────────────────────────────────────────────
186
+ # Z (depth) is signed: negative values are closer to the camera.
187
+ # Prominent cheekbones project forward → more negative cheek_z.
188
  cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
189
  attrs["cheekbone_prominence"] = (
190
  "high" if cheek_z < -0.04
191
  else "flat" if cheek_z > 0.0
192
  else "moderate"
193
  )
194
+ # cheekPuff blendshape catches actively puffed-out cheeks; a flat
195
+ # cheek_z signals a hollow look in the absence of puff.
196
  cheek_puff = blendshapes.get("cheekPuff", 0)
197
  if cheek_puff > 0.3:
198
  attrs["cheek_fullness"] = "full"
 
202
  attrs["cheek_fullness"] = "normal"
203
 
204
  # ── Eyes ─────────────────────────────────────────────────────
205
+ # Left-eye landmarks. eye_open is vertical lid distance,
206
+ # eye_w is the inner→outer corner distance.
207
  l_top, l_bot = lm[159], lm[145]
208
  l_inner, l_outer = lm[133], lm[33]
209
  eye_open = self._dist(l_top, l_bot)
210
  eye_w = self._dist(l_inner, l_outer)
211
  eye_ratio = eye_open / eye_w if eye_w else 0.3
212
 
213
+ # Outer-corner Y relative to inner corner classifies tilt.
214
+ # Hooded vs round vs almond come from the openness ratio.
215
  outer_angle = l_outer["y"] - l_inner["y"]
216
  if outer_angle < -0.012:
217
  attrs["eye_shape"] = "upturned"
 
224
  else:
225
  attrs["eye_shape"] = "almond"
226
 
227
+ # Deep-set vs protruding: compare eye-region z vs nose-bridge z.
228
  eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
229
  nose_bridge_z = lm[6]["z"]
230
  if eye_z > nose_bridge_z + 0.02:
 
234
  else:
235
  attrs["eye_depth"] = "normal"
236
 
237
+ # Eye spacing: prefer pupil-to-pupil if iris landmarks (468/473)
238
+ # are present, otherwise fall back to inner-corner distance.
239
  if len(lm) > 473:
240
  inter_pupillary = self._dist(lm[468], lm[473])
241
  else:
 
247
  else "average"
248
  )
249
 
250
+ # Eye size: avg of left & right eye-region bounding-box area,
251
+ # relative to overall face area.
252
  r_top, r_bot = lm[386], lm[374]
253
  r_inner, r_outer = lm[362], lm[263]
254
  r_area = self._dist(r_top, r_bot) * self._dist(r_inner, r_outer)
 
262
  else "average"
263
  )
264
 
265
+ # eyeBlink blendshapes approach 1.0 when the eye is closed.
266
+ # eyes_open = True iff average blink activation is < 0.5.
267
  blink_l = blendshapes.get("eyeBlinkLeft", 0)
268
  blink_r = blendshapes.get("eyeBlinkRight", 0)
269
  attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
 
272
  brow_mid = lm[105]
273
  brow_outer = lm[46]
274
  brow_inner = lm[70]
275
+ # Vertical distance from brow-mid to upper-eyelid is roughly
276
+ # proportional to perceived "arch height" relative to eye size.
277
  brow_to_eye = self._dist(brow_mid, lm[159])
278
  brow_arch_ratio = brow_to_eye / eye_open if eye_open else 1.5
279
 
 
283
  else "average"
284
  )
285
 
286
+ # Curvature = mid Y vs avg of inner+outer Ys. Negative curvature
287
+ # (mid sits higher than the ends) → arched; near-zero → straight.
288
  mid_y = brow_mid["y"]
289
  avg_end_y = (brow_inner["y"] + brow_outer["y"]) / 2
290
  curvature = mid_y - avg_end_y
 
295
  else:
296
  attrs["eyebrow_shape"] = "flat"
297
 
298
+ # Brow thickness from top-to-bottom landmark span.
299
  brow_top = lm[66]
300
  brow_bottom = lm[105]
301
  brow_thickness = self._dist(brow_top, brow_bottom)
 
305
  else "medium"
306
  )
307
 
308
+ # Inner-brow distance below 0.04 (normalised landmark units, not scaled by face width) suggests a unibrow.
309
  inner_brow_dist = self._dist(lm[70], lm[300])
310
  attrs["possible_unibrow"] = inner_brow_dist < 0.04
311
 
 
324
  else "average"
325
  )
326
 
327
+ # Tip vertical offset relative to nose base distinguishes
328
+ # upturned (tip sits higher) from aquiline (tip droops down).
329
  tip_angle = nose_tip["y"] - nose_bottom["y"]
330
  if tip_angle < -0.005:
331
  attrs["nose_shape"] = "upturned"
 
338
  else:
339
  attrs["nose_shape"] = "straight"
340
 
341
+ # Bridge: high bridges project toward camera (more negative z).
342
  attrs["nose_bridge"] = (
343
  "high" if nose_bridge_top["z"] < -0.05
344
  else "flat" if nose_bridge_top["z"] > 0.0
345
  else "average"
346
  )
347
+ # Pointed tip: tip projects forward of nostril base.
348
  attrs["nose_tip_shape"] = (
349
  "pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01 else "rounded"
350
  )
351
 
352
  # ── Lips & Mouth ─────────────────────────────────────────────
353
+ # Top and bottom of upper lip, top and bottom of lower lip, plus
354
+ # the mouth corners. lip_ratio compares stacked lip height to
355
+ # mouth width — full vs thin lips.
356
  ul_top, ul_bot = lm[0], lm[13]
357
  ll_top, ll_bot = lm[14], lm[17]
358
  m_left, m_right = lm[61], lm[291]
 
368
  else "thin" if lip_ratio < 0.22
369
  else "average"
370
  )
371
+ # Balance compares upper-lip thickness to lower-lip thickness.
372
  attrs["lip_balance"] = (
373
  "top-heavy" if ul_h > ll_h * 1.2
374
  else "bottom-heavy" if ll_h > ul_h * 1.2
 
382
  else "average"
383
  )
384
 
385
+ # Cupid's bow: depression at the centre of the upper lip relative
386
+ # to the two peak landmarks on either side.
387
  c_left, c_center, c_right = lm[37], lm[0], lm[267]
388
  bow = c_center["y"] - (c_left["y"] + c_right["y"]) / 2
389
  attrs["cupids_bow"] = (
 
392
  else "flat"
393
  )
394
 
395
+ # Smiling and dimples come directly from blendshape activations.
396
+ # smile_asymmetry is the absolute difference between left/right
397
+ # mouthSmile scores — non-zero on lopsided smiles.
398
  smile_l = blendshapes.get("mouthSmileLeft", 0)
399
  smile_r = blendshapes.get("mouthSmileRight", 0)
400
  attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
 
404
  )
405
 
406
  # ── Facial Asymmetry ─────────────────────────────────────────
407
+ # Sum mirror-pair x-coordinate offsets from the midline (x=0.5)
408
+ # over 9 paired landmarks. Normalise so a perfectly symmetric
409
+ # face scores ~0 and visibly asymmetric ones approach 1.
410
  pairs = [
411
  (33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
412
  (61, 291), (159, 386), (145, 374), (46, 276),
 
418
  min(asym / len(pairs) / 0.05, 1.0), 3
419
  )
420
 
421
+ # Exposed for downstream consumers (e.g. the screen reads
422
+ # blendshapes.jawOpen to compute mouth_open).
423
  attrs["blendshapes"] = blendshapes
424
  return attrs
425
 
 
429
 
430
  @staticmethod
431
  def _dist(a: dict, b: dict) -> float:
432
+ """Euclidean distance between two landmarks in 3D space."""
433
  return math.sqrt(
434
  (a["x"] - b["x"]) ** 2
435
  + (a["y"] - b["y"]) ** 2
 
438
 
439
  @staticmethod
440
  def _jaw_angle(lm: list[dict]) -> float:
441
+ """Angle (degrees) subtended at the chin by the two gonion points.
442
+
443
+ Operates in 2D image space — z is intentionally ignored so the
444
+ angle reflects what the camera sees, not the underlying anatomy.
445
+ """
446
  chin = lm[152]
447
  left_jaw, right_jaw = lm[172], lm[397]
448
  v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
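Review note (a sketch under stated assumptions): the hunk above cuts off partway through `_jaw_angle`, so the remainder of that method is not visible here. The two geometric primitives the docstrings describe, a 3D Euclidean distance and a 2D angle subtended at a vertex, could be written as below. The function names, the clamping step, and the 0.0 return for degenerate input are assumptions for illustration, not the commit's code.

```python
import math


def dist3d(a: dict, b: dict) -> float:
    """Euclidean distance between two landmarks given as {"x", "y", "z"} dicts."""
    return math.sqrt(
        (a["x"] - b["x"]) ** 2
        + (a["y"] - b["y"]) ** 2
        + (a["z"] - b["z"]) ** 2
    )


def angle_at_vertex_2d(vertex: dict, p1: dict, p2: dict) -> float:
    """Angle in degrees at `vertex` between the rays to p1 and p2.

    Only x and y are used, so the result reflects the image-plane view.
    Returns 0.0 when either ray has zero length (an assumed convention).
    """
    v1 = (p1["x"] - vertex["x"], p1["y"] - vertex["y"])
    v2 = (p2["x"] - vertex["x"], p2["y"] - vertex["y"])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp against floating-point drift before acos.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))
```

With the chin as the vertex and the two jaw landmarks (172 and 397 in the diff) as `p1` and `p2`, a smaller angle corresponds to the "sharper jawline" reading described in the comments.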
analyzers/obstruction_analyzer.py ADDED
@@ -0,0 +1,108 @@
1
+ """
2
+ ObstructionAnalyzer — face obstruction classifier.
3
+
4
+ Model
5
+ -----
6
+ - Architecture : Vision Transformer (ViT-B/16)
7
+ - HF repo : dima806/face_obstruction_image_detection
8
+ - License : Apache 2.0
9
+ - Classes (6) : sunglasses, glasses, mask, hand, other, none
10
+ - Reported acc : ~91% overall.
11
+ 99.7% / 99.85% precision/recall on sunglasses
12
+ 99.0% / 99.7% precision/recall on glasses
13
+ 99.7% / 99.85% precision/recall on mask
14
+ Hand and "other" are much weaker (~71-75%); we don't
15
+ surface those as booleans.
16
+
17
+ Inputs
18
+ ------
19
+ img_rgb : np.ndarray (H, W, 3) uint8
20
+
21
+ Outputs (dict)
22
+ --------------
23
+ obstruction_top — argmax label
24
+ obstruction_confidence — argmax softmax score
25
+ obstruction_scores — full {class: score} dict
26
+ wearing_glasses — bool (true when glasses OR sunglasses > 0.5)
27
+ wearing_sunglasses — bool
28
+ wearing_mask — bool
29
+
30
+ Notes
31
+ -----
32
+ Same author as the FairFace age/gender models already in
33
+ DemographicAnalyzer. Built specifically for the glasses/sunglasses/mask
34
+ case, which is why precision/recall on those three classes is so high.
35
+ """
36
+
37
+ from typing import Any
38
+
39
+ from PIL import Image
40
+ from transformers import pipeline
41
+
42
+
43
+ MODEL_ID = "dima806/face_obstruction_image_detection"
44
+
45
+ # Canonical labels in lowercase. The pipeline may return any casing —
46
+ # we normalise on the way out so downstream code keys consistently.
47
+ _KNOWN = {"sunglasses", "glasses", "mask", "hand", "other", "none"}
48
+
49
+
50
+ class ObstructionAnalyzer:
51
+ def __init__(self):
52
+ self.classifier = None
53
+ try:
54
+ # HF image-classification pipeline. Weights lazy-load from
55
+ # the Hub on first instantiation and cache locally.
56
+ self.classifier = pipeline("image-classification", model=MODEL_ID)
57
+ except Exception as exc:
58
+ print(f"[ObstructionAnalyzer] Failed to load {MODEL_ID}: {exc}")
59
+
60
+ def analyze(self, img_rgb) -> dict[str, Any]:
61
+ # Empty stub when the model failed to load — keeps the result
62
+ # dict shape stable so the merge in app.py never sees missing keys.
63
+ if self.classifier is None:
64
+ return self._empty_result()
65
+
66
+ try:
67
+ pil = Image.fromarray(img_rgb)
68
+ # top_k=len(_KNOWN) → full softmax across all six classes.
69
+ preds = self.classifier(pil, top_k=len(_KNOWN))
70
+ except Exception as exc:
71
+ print(f"[ObstructionAnalyzer] Prediction failed: {exc}")
72
+ return self._empty_result()
73
+
74
+ # Flatten predictions into a {label: score} dict, normalising
75
+ # label casing as we go. Unseen labels stay at 0.
76
+ scores = {label: 0.0 for label in _KNOWN}
77
+ for pred in preds:
78
+ label = str(pred["label"]).strip().lower()
79
+ if label in scores:
80
+ scores[label] = round(float(pred["score"]), 3)
81
+
82
+ # Top class wins.
83
+ top_label = max(scores, key=scores.get)
84
+ top_score = scores[top_label]
85
+
86
+ return {
87
+ "obstruction_top": top_label,
88
+ "obstruction_confidence": top_score,
89
+ "obstruction_scores": scores,
90
+ # Specific boolean flags the UI consumes directly.
91
+ # `wearing_glasses` is True for any kind of eyewear — the
92
+ # caller can branch on `wearing_sunglasses` if it cares
93
+ # about tinted vs clear lenses.
94
+ "wearing_glasses": scores["glasses"] > 0.5 or scores["sunglasses"] > 0.5,
95
+ "wearing_sunglasses": scores["sunglasses"] > 0.5,
96
+ "wearing_mask": scores["mask"] > 0.5,
97
+ }
98
+
99
+ @staticmethod
100
+ def _empty_result() -> dict[str, Any]:
101
+ return {
102
+ "obstruction_top": "unknown",
103
+ "obstruction_confidence": 0.0,
104
+ "obstruction_scores": {label: 0.0 for label in _KNOWN},
105
+ "wearing_glasses": False,
106
+ "wearing_sunglasses": False,
107
+ "wearing_mask": False,
108
+ }
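Review note (a sketch, not part of this commit): the three boolean flags above all use a strict `> 0.5` threshold on the per-class scores. Assuming the pipeline returns softmax scores that sum to 1, at most one class can exceed 0.5, so `wearing_sunglasses` and `wearing_mask` can never both be true, and `wearing_glasses` is simply the union of the two eyewear classes. The helper name `eyewear_flags` and its `threshold` parameter are hypothetical.

```python
def eyewear_flags(scores: dict[str, float], threshold: float = 0.5) -> dict[str, bool]:
    """Derive the boolean obstruction flags from a {label: score} dict.

    Missing labels are treated as 0.0. The comparison is strict, so a
    score of exactly `threshold` does not set a flag.
    """
    sunglasses = scores.get("sunglasses", 0.0) > threshold
    glasses = scores.get("glasses", 0.0) > threshold
    return {
        # True for any kind of eyewear, tinted or clear.
        "wearing_glasses": glasses or sunglasses,
        "wearing_sunglasses": sunglasses,
        "wearing_mask": scores.get("mask", 0.0) > threshold,
    }
```

Pulling the threshold into one place like this would also make it easier to tune per class later, given that the docstring reports much weaker accuracy on "hand" and "other" than on the three surfaced classes.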
analyzers/parsing_analyzer.py CHANGED
@@ -1,27 +1,42 @@
1
  """
2
- SegFormer-B5 human parsing; replaces the old jonathandinu/face-parsing loader.
3
-
4
- Model: matei-dorian/segformer-b5-finetuned-human-parsing
5
- - Architecture: SegFormer-B5 (nvidia/mit-b5 backbone)
6
- - Published metrics on its eval set:
7
- Mean IoU: 0.6258
8
- Mean accuracy: 0.7547
9
- Overall acc.: 0.8256
10
- Face: acc 0.9094 / IoU 0.8294
11
- Hair: acc 0.8974 / IoU 0.8171
12
- - Outputs 18 classes (background, hat, hair, sunglasses, upper-clothes, skirt,
13
- pants, dress, belt, left-shoe, right-shoe, face, left-leg, right-leg,
14
- left-arm, right-arm, bag, scarf).
15
-
16
- We keep the same downstream contract as before: skin/hair/lip masks plus
17
- hair-length, accessory flags, wrinkle estimation, freckle/mole detection.
18
- The lip mask is approximated from the face region (no lip-specific class)
19
- and is mainly used as a fallback — MediaPipe lip landmarks are still the
20
- primary source for lip geometry/color in color_analyzer.
21
  """
22
 
23
  from typing import Any
24
- import warnings
25
 
26
  import cv2
27
  import numpy as np
@@ -34,7 +49,8 @@ from transformers import (
34
 
35
  MODEL_ID = "matei-dorian/segformer-b5-finetuned-human-parsing"
36
 
37
- # Official label map from the model card.
 
38
  PARSING_LABELS = {
39
  0: "background",
40
  1: "hat",
@@ -59,10 +75,14 @@ PARSING_LABELS = {
59
 
60
  class ParsingAnalyzer:
61
  def __init__(self):
 
 
62
  self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
63
  self.processor = None
64
  self.model = None
65
  try:
 
 
66
  self.processor = SegformerImageProcessor.from_pretrained(MODEL_ID)
67
  self.model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)
68
  self.model.to(self.device).eval()
@@ -72,24 +92,36 @@ class ParsingAnalyzer:
72
  def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
73
  h, w = img_rgb.shape[:2]
74
 
 
 
 
75
  if self.model is None or self.processor is None:
76
  return self._empty_result(h, w)
77
 
 
78
  pil = Image.fromarray(img_rgb)
79
  inputs = self.processor(images=pil, return_tensors="pt").to(self.device)
80
 
 
81
  with torch.no_grad():
82
  logits = self.model(**inputs).logits # (1, C, H/4, W/4)
83
 
 
 
84
  upsampled = torch.nn.functional.interpolate(
85
  logits, size=(h, w), mode="bilinear", align_corners=False
86
  )
87
  parsing = upsampled.argmax(dim=1)[0].cpu().numpy().astype(np.uint8)
88
 
 
 
89
  masks: dict[str, np.ndarray] = {
90
  name: (parsing == label_id) for label_id, name in PARSING_LABELS.items()
91
  }
92
 
 
 
 
93
  total_pixels = h * w
94
  region_coverage = {
95
  name: round(float(mask.sum()) / total_pixels, 4)
@@ -99,16 +131,16 @@ class ParsingAnalyzer:
99
 
100
  result: dict[str, Any] = {"region_coverage": region_coverage}
101
 
 
 
102
  skin_mask = masks.get("face", np.zeros((h, w), dtype=bool))
103
  hair_mask = masks.get("hair", np.zeros((h, w), dtype=bool))
104
- # No dedicated lip class; color_analyzer falls back to landmarks for lips.
105
- lip_mask = np.zeros((h, w), dtype=bool)
106
-
107
  result["_skin_mask"] = skin_mask
108
  result["_hair_mask"] = hair_mask
109
- result["_lip_mask"] = lip_mask
110
 
111
  # ── Hair length estimation ───────────────────────────────────
 
 
112
  hair_pixels = int(hair_mask.sum())
113
  face_pixels = int(skin_mask.sum()) + hair_pixels
114
  hair_ratio = hair_pixels / face_pixels if face_pixels else 0
@@ -124,18 +156,22 @@ class ParsingAnalyzer:
124
 
125
  result["hair_present"] = hair_ratio > 0.03
126
 
127
- # ── Accessories from segmentation ────────────────────────────
128
- result["glasses_detected"] = region_coverage.get("sunglasses", 0) > 0.005
 
129
  result["hat_detected"] = region_coverage.get("hat", 0) > 0.01
130
- result["earring_detected"] = False # no earring class in this model
131
- result["necklace_detected"] = False # no necklace class in this model
132
 
133
- # ── Skin analysis on face mask ───────────────────────────────
 
 
134
  if skin_mask.sum() > 100:
 
 
 
135
  skin_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
136
  laplacian = cv2.Laplacian(skin_gray, cv2.CV_64F)
137
  skin_edges = np.abs(laplacian)
138
- skin_edges[~skin_mask] = 0
139
  edge_density = skin_edges.sum() / skin_mask.sum() if skin_mask.sum() else 0
140
 
141
  if edge_density > 15:
@@ -149,6 +185,10 @@ class ParsingAnalyzer:
149
 
150
  result["skin_texture_score"] = round(float(edge_density), 2)
151
 
 
 
 
 
152
  skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
153
  l_channel = skin_lab[:, :, 0].astype(float)
154
  l_channel[~skin_mask] = np.nan
@@ -162,6 +202,8 @@ class ParsingAnalyzer:
162
  else "none"
163
  )
164
 
 
 
165
  skin_l_values = l_channel[skin_mask]
166
  result["skin_uniformity"] = round(float(np.nanstd(skin_l_values)), 2)
167
  else:
@@ -174,18 +216,19 @@ class ParsingAnalyzer:
174
 
175
  @staticmethod
176
  def _empty_result(h: int, w: int) -> dict[str, Any]:
 
 
 
 
 
177
  empty = np.zeros((h, w), dtype=bool)
178
  return {
179
  "region_coverage": {},
180
  "_skin_mask": empty,
181
  "_hair_mask": empty,
182
- "_lip_mask": empty,
183
  "hair_length": "unknown",
184
  "hair_present": False,
185
- "glasses_detected": False,
186
  "hat_detected": False,
187
- "earring_detected": False,
188
- "necklace_detected": False,
189
  "wrinkle_level": "unknown",
190
  "skin_texture_score": 0,
191
  "freckles_or_moles": "unknown",
 
1
  """
2
+ ParsingAnalyzer — SegFormer-B5 human parsing for masks and skin stats.
3
+
4
+ Model
5
+ -----
6
+ - Architecture : SegFormer-B5 (nvidia/mit-b5 backbone)
7
+ - HF repo : matei-dorian/segformer-b5-finetuned-human-parsing
8
+ - License : Apache 2.0
9
+ - Eval metrics : mean IoU 0.626, overall acc 0.826
10
+ face acc 0.909 / IoU 0.829
11
+ hair acc 0.897 / IoU 0.817
12
+ - Classes (18) : background, hat, hair, sunglasses, upper_clothes, skirt,
13
+ pants, dress, belt, left_shoe, right_shoe, face,
14
+ left_leg, right_leg, left_arm, right_arm, bag, scarf
15
+
16
+ Inputs
17
+ ------
18
+ img_rgb : np.ndarray (H, W, 3) uint8
19
+
20
+ Outputs (dict)
21
+ --------------
22
+ Internal masks (stripped from JSON):
23
+ _skin_mask, _hair_mask
24
+ Public fields:
25
+ region_coverage — per-class fraction of pixels
26
+ hair_length — bald/very short | short | medium | long
27
+ hair_present — bool
28
+ hat_detected — bool, true when more than 1% of pixels are class "hat"
29
+ wrinkle_level — smooth | slight | moderate | prominent
30
+ skin_texture_score, skin_uniformity, freckles_or_moles
31
+
32
+ Notes
33
+ -----
34
+ The wrinkle / texture / freckle fields are OpenCV statistics computed
35
+ over the SegFormer face mask, not direct model outputs. SegFormer
36
+ contributes the mask; OpenCV does the per-pixel math.
37
  """
38
 
39
  from typing import Any
 
40
 
41
  import cv2
42
  import numpy as np
 
49
 
50
  MODEL_ID = "matei-dorian/segformer-b5-finetuned-human-parsing"
51
 
52
+ # Class id → name, as published by the model card. We index masks by
53
+ # these names downstream rather than raw integer ids.
54
  PARSING_LABELS = {
55
  0: "background",
56
  1: "hat",
 
75
 
76
  class ParsingAnalyzer:
77
  def __init__(self):
78
+ # CUDA when available, CPU otherwise. The HF Spaces free tier is
79
+ # CPU-only, so SegFormer-B5 inference takes ~1-2 s per request.
80
  self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
81
  self.processor = None
82
  self.model = None
83
  try:
84
+ # Both processor and model weights come from the same repo;
85
+ # processor handles resize/normalize/tensorize.
86
  self.processor = SegformerImageProcessor.from_pretrained(MODEL_ID)
87
  self.model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)
88
  self.model.to(self.device).eval()
 
92
  def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
93
  h, w = img_rgb.shape[:2]
94
 
95
+ # If the model failed to load we return empty masks so the rest
96
+ # of the pipeline (especially ColorAnalyzer) sees a consistent
97
+ # shape and degrades cleanly to "unknown" fields.
98
  if self.model is None or self.processor is None:
99
  return self._empty_result(h, w)
100
 
101
+ # Wrap the array in a PIL image for the processor, which resizes internally.
102
  pil = Image.fromarray(img_rgb)
103
  inputs = self.processor(images=pil, return_tensors="pt").to(self.device)
104
 
105
+ # Forward pass → logits at 1/4 of the processor's resized input resolution.
106
  with torch.no_grad():
107
  logits = self.model(**inputs).logits # (1, C, H/4, W/4)
108
 
109
+ # Upsample to original resolution, then argmax to get the
110
+ # class id per pixel.
111
  upsampled = torch.nn.functional.interpolate(
112
  logits, size=(h, w), mode="bilinear", align_corners=False
113
  )
114
  parsing = upsampled.argmax(dim=1)[0].cpu().numpy().astype(np.uint8)
115
 
116
+ # Build a boolean mask per class. Cheap because we already have
117
+ # the argmax map; each is one numpy equality check.
118
  masks: dict[str, np.ndarray] = {
119
  name: (parsing == label_id) for label_id, name in PARSING_LABELS.items()
120
  }
121
 
122
+ # region_coverage = fraction of image occupied by each class.
123
+ # Useful as a coarse "is this class even present" signal — e.g.
124
+ # hat detection just checks if hat coverage exceeds a threshold.
125
  total_pixels = h * w
126
  region_coverage = {
127
  name: round(float(mask.sum()) / total_pixels, 4)
 
131
 
132
  result: dict[str, Any] = {"region_coverage": region_coverage}
133
 
134
+ # Skin & hair masks are passed downstream to ColorAnalyzer.
135
+ # Leading underscore → stripped from the final JSON payload.
136
  skin_mask = masks.get("face", np.zeros((h, w), dtype=bool))
137
  hair_mask = masks.get("hair", np.zeros((h, w), dtype=bool))
 
 
 
138
  result["_skin_mask"] = skin_mask
139
  result["_hair_mask"] = hair_mask
 
140
 
141
  # ── Hair length estimation ───────────────────────────────────
142
+ # Ratio of hair pixels to (face + hair) pixels — bigger ratio
143
+ # means longer hair extending past the face.
144
  hair_pixels = int(hair_mask.sum())
145
  face_pixels = int(skin_mask.sum()) + hair_pixels
146
  hair_ratio = hair_pixels / face_pixels if face_pixels else 0
 
156
 
157
  result["hair_present"] = hair_ratio > 0.03
158
 
159
+ # ── Hat detection ────────────────────────────────────────────
160
+ # A real hat consistently covers >1% of pixels; below that we're
161
+ # in noise / mis-segmentation territory.
162
  result["hat_detected"] = region_coverage.get("hat", 0) > 0.01
 
 
163
 
164
+ # ── Skin texture / wrinkles / freckles ───────────────────────
165
+ # Only worth computing if the face mask actually has substance.
166
+ # Under ~100 pixels we don't have enough signal.
167
  if skin_mask.sum() > 100:
168
+ # Wrinkles → high-frequency edge energy on the face mask.
169
+ # Laplacian responds to local intensity curvature; the mean
170
+ # absolute response over the face mask gives a "how much fine detail" score.
171
  skin_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
172
  laplacian = cv2.Laplacian(skin_gray, cv2.CV_64F)
173
  skin_edges = np.abs(laplacian)
174
+ skin_edges[~skin_mask] = 0 # zero out non-face pixels
175
  edge_density = skin_edges.sum() / skin_mask.sum() if skin_mask.sum() else 0
176
 
177
  if edge_density > 15:
 
185
 
186
  result["skin_texture_score"] = round(float(edge_density), 2)
187
 
188
+ # Freckles/moles → count pixels well below mean L* lightness.
189
+ # Working in LAB rather than RGB makes the threshold tone-
190
+ # independent (a freckle is "darker than surrounding skin"
191
+ # regardless of base skin tone).
192
  skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
193
  l_channel = skin_lab[:, :, 0].astype(float)
194
  l_channel[~skin_mask] = np.nan
 
202
  else "none"
203
  )
204
 
205
+ # Uniformity = std-dev of L* over the face. Higher = more
206
+ # variation (uneven skin tone, shadows, scarring).
207
  skin_l_values = l_channel[skin_mask]
208
  result["skin_uniformity"] = round(float(np.nanstd(skin_l_values)), 2)
209
  else:
 
216
 
217
  @staticmethod
218
  def _empty_result(h: int, w: int) -> dict[str, Any]:
219
+ """Stub returned when the SegFormer model fails to load.
220
+
221
+ Shape must match the success path so downstream code can rely
222
+ on key presence without conditional checks.
223
+ """
224
  empty = np.zeros((h, w), dtype=bool)
225
  return {
226
  "region_coverage": {},
227
  "_skin_mask": empty,
228
  "_hair_mask": empty,
 
229
  "hair_length": "unknown",
230
  "hair_present": False,
 
231
  "hat_detected": False,
 
 
232
  "wrinkle_level": "unknown",
233
  "skin_texture_score": 0,
234
  "freckles_or_moles": "unknown",
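The coverage, hat, and hair-presence logic above boils down to counting class ids in the argmax map and comparing ratios against fixed thresholds. The sketch below restates that in plain Python on a small label grid so the arithmetic is easy to follow. The `LABELS` subset is an assumption for illustration: ids 0 and 1 come from the file, while ids 2 and 11 are inferred from the class order in the docstring. `summarize` is a hypothetical helper, not part of the analyzer.

```python
from collections import Counter

# Illustrative subset of the label map. Ids 2 and 11 are assumed from the
# class order listed in the docstring, not confirmed against the model card.
LABELS = {0: "background", 1: "hat", 2: "hair", 11: "face"}


def summarize(parsing):
    """Derive coverage and presence flags from a 2D grid of class ids."""
    total = sum(len(row) for row in parsing)
    counts = Counter(label for row in parsing for label in row)

    # Fraction of the image occupied by each named class.
    coverage = {
        name: round(counts.get(idx, 0) / total, 4) for idx, name in LABELS.items()
    }

    # Hair ratio = hair pixels / (face + hair pixels), as in the analyzer.
    hair = counts.get(2, 0)
    face = counts.get(11, 0)
    hair_ratio = hair / (hair + face) if (hair + face) else 0.0

    return {
        "region_coverage": coverage,
        "hair_present": hair_ratio > 0.03,
        "hat_detected": coverage["hat"] > 0.01,
    }
```

Because the thresholds are strict inequalities, an image with no face or hair pixels falls through to `False` for both flags without any special casing.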
app.py CHANGED
@@ -1,25 +1,63 @@
1
  """
2
- Face Analysis Microservice
3
- Combines multiple pretrained models for comprehensive facial attribute detection.
4
-
5
- Models used:
6
- 1. MediaPipe Face Landmarker: 478 3D landmarks + 52 blendshapes → geometric features
7
- 2. FairFace — age, gender, race classification
8
- 3. CelebA Attribute Classifier — 40 binary facial attributes
9
- 4. BiSeNet Face Parsing — 19-class pixel-level segmentation
10
- 5. HSEmotion: 8-class emotion recognition
11
- 6. Color Analyzer — pixel-level skin tone, eye color, hair color (no AI)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
12
  """
13
 
14
  import os
 
 
 
15
  os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
16
- os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60" # default is 10s, bump it
17
 
18
  import io
19
  import logging
20
  from typing import Optional
21
 
22
- import cv2
23
  import numpy as np
24
  from fastapi import FastAPI, File, HTTPException, UploadFile
25
  from fastapi.middleware.cors import CORSMiddleware
@@ -27,10 +65,11 @@ from PIL import Image
27
 
28
  from analyzers.landmark_analyzer import LandmarkAnalyzer
29
  from analyzers.demographic_analyzer import DemographicAnalyzer
30
- from analyzers.attribute_analyzer import AttributeAnalyzer
31
  from analyzers.parsing_analyzer import ParsingAnalyzer
32
  from analyzers.emotion_analyzer import EmotionAnalyzer
33
  from analyzers.color_analyzer import ColorAnalyzer
 
 
34
 
35
  logging.basicConfig(level=logging.INFO)
36
  logger = logging.getLogger(__name__)
@@ -48,15 +87,22 @@ app.add_middleware(
48
  # Analyzers are initialized lazily on first request to reduce cold-start time
49
  landmark_analyzer: Optional[LandmarkAnalyzer] = None
50
  demographic_analyzer: Optional[DemographicAnalyzer] = None
51
- attribute_analyzer: Optional[AttributeAnalyzer] = None
52
  parsing_analyzer: Optional[ParsingAnalyzer] = None
53
  emotion_analyzer: Optional[EmotionAnalyzer] = None
54
  color_analyzer: Optional[ColorAnalyzer] = None
 
 
55
 
56
 
57
  def _to_json_safe(value):
58
- """Convert numpy scalars/arrays and nested structures into JSON-safe types."""
59
- # Handle numpy types first (before dict/list checks)
 
 
 
 
 
 
60
  if isinstance(value, (np.ndarray,)):
61
  return value.tolist()
62
  if isinstance(value, (np.integer, np.floating)):
@@ -65,7 +111,7 @@ def _to_json_safe(value):
65
  return bool(value)
66
  if isinstance(value, np.generic):
67
  return value.item()
68
- # Handle nested structures
69
  if isinstance(value, dict):
70
  return {str(k): _to_json_safe(v) for k, v in value.items()}
71
  if isinstance(value, (list, tuple, set)):
@@ -74,9 +120,15 @@ def _to_json_safe(value):
74
 
75
 
76
  def get_analyzers():
77
- """Lazy-load all analyzer models on first use."""
78
- global landmark_analyzer, demographic_analyzer, attribute_analyzer
 
 
 
 
 
79
  global parsing_analyzer, emotion_analyzer, color_analyzer
 
80
 
81
  if landmark_analyzer is None:
82
  logger.info("Loading MediaPipe Face Landmarker...")
@@ -86,12 +138,8 @@ def get_analyzers():
86
  logger.info("Loading FairFace demographics model...")
87
  demographic_analyzer = DemographicAnalyzer()
88
 
89
- if attribute_analyzer is None:
90
- logger.info("Loading CelebA attribute classifier...")
91
- attribute_analyzer = AttributeAnalyzer()
92
-
93
  if parsing_analyzer is None:
94
- logger.info("Loading BiSeNet face parser...")
95
  parsing_analyzer = ParsingAnalyzer()
96
 
97
  if emotion_analyzer is None:
@@ -101,88 +149,96 @@ def get_analyzers():
101
  if color_analyzer is None:
102
  color_analyzer = ColorAnalyzer()
103
 
 
 
 
 
 
 
 
 
104
  return (
105
  landmark_analyzer,
106
  demographic_analyzer,
107
- attribute_analyzer,
108
  parsing_analyzer,
109
  emotion_analyzer,
110
  color_analyzer,
 
 
111
  )
112
 
113
 
114
  @app.get("/")
115
  async def root():
116
- """Root endpoint: returns API information."""
117
  return {
118
  "name": "HCP Face Analysis Service",
119
  "version": "2.0.0",
120
  "status": "running",
121
  "endpoints": {
122
  "health": "/health",
123
- "analyze": "/analyze"
 
124
  }
125
  }
126
 
127
 
128
  @app.get("/health")
129
  async def health():
130
- """Health check endpoint: use to keep the service warm."""
131
  return {"status": "ok"}
132
 
133
 
134
  @app.post("/analyze")
135
  async def analyze_face(file: UploadFile = File(...)):
136
- """
137
- Comprehensive face analysis endpoint.
138
 
139
- Accepts an image file upload and returns ~100+ facial attributes
140
- by running 6 models/analyzers in sequence.
 
141
  """
142
  try:
143
- # Read and decode the uploaded image
 
 
144
  contents = await file.read()
145
  image = Image.open(io.BytesIO(contents)).convert("RGB")
146
  img_array = np.array(image)
147
- img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
148
 
149
  (
150
  landmarks,
151
  demographics,
152
- attributes,
153
  parsing,
154
  emotions,
155
  colors,
 
 
156
  ) = get_analyzers()
157
 
158
  results = {}
159
 
160
- # Step 1: MediaPipe Landmarks → geometric features (~40 attributes)
161
  logger.info("Running landmark analysis...")
162
  landmark_results = landmarks.analyze(img_array)
163
  results.update(landmark_results)
164
 
165
- # Step 2: FairFace → age, gender, race
166
  logger.info("Running demographic analysis...")
167
  demo_results = demographics.analyze(img_array)
168
  results.update(demo_results)
169
 
170
- # Step 3: CelebA → 40 binary facial attributes
171
- logger.info("Running attribute analysis...")
172
- attr_results = attributes.analyze(img_array)
173
- results.update(attr_results)
174
-
175
- # Step 4: BiSeNet → pixel segmentation → hair length, wrinkles, spots
176
  logger.info("Running face parsing...")
177
  parse_results = parsing.analyze(img_array)
178
  results.update(parse_results)
179
 
180
- # Step 5: HSEmotion → emotion classification
181
  logger.info("Running emotion analysis...")
182
  emo_results = emotions.analyze(img_array)
183
  results.update(emo_results)
184
 
185
- # Step 6: Color analysis using masks from Step 4 + landmarks from Step 1
 
186
  logger.info("Running color analysis...")
187
  color_results = colors.analyze(
188
  img_array,
@@ -192,6 +248,14 @@ async def analyze_face(file: UploadFile = File(...)):
192
  )
193
  results.update(color_results)
194
 
 
 
 
 
 
 
 
 
195
  # Remove internal fields (prefixed with underscore)
196
  results = {k: v for k, v in results.items() if not k.startswith("_")}
197
 
@@ -204,9 +268,11 @@ async def analyze_face(file: UploadFile = File(...)):
204
 
205
  @app.post("/analyze-base64")
206
  async def analyze_face_base64(body: dict):
207
- """
208
- Alternative endpoint that accepts base64-encoded image data.
209
- This matches the format the Express server sends.
 
 
210
  """
211
  import base64
212
 
@@ -222,28 +288,28 @@ async def analyze_face_base64(body: dict):
222
  image_bytes = base64.b64decode(image_b64)
223
  image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
224
  img_array = np.array(image)
225
- img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
226
 
227
  (
228
  landmarks,
229
  demographics,
230
- attributes,
231
  parsing,
232
  emotions,
233
  colors,
 
 
234
  ) = get_analyzers()
235
 
236
  results = {}
237
 
 
 
 
238
  landmark_results = landmarks.analyze(img_array)
239
  results.update(landmark_results)
240
 
241
  demo_results = demographics.analyze(img_array)
242
  results.update(demo_results)
243
 
244
- attr_results = attributes.analyze(img_array)
245
- results.update(attr_results)
246
-
247
  parse_results = parsing.analyze(img_array)
248
  results.update(parse_results)
249
 
@@ -258,6 +324,11 @@ async def analyze_face_base64(body: dict):
258
  )
259
  results.update(color_results)
260
 
 
 
 
 
 
261
  results = {k: v for k, v in results.items() if not k.startswith("_")}
262
 
263
  return {"success": True, "data": _to_json_safe(results)}
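Both endpoints follow the same merge pattern: each analyzer returns a dict, the dicts are folded into one with `dict.update`, and any key with a leading underscore is dropped before the response is built. A minimal sketch of that pattern, using a hypothetical helper name `merge_results` that does not exist in the service:

```python
def merge_results(*partials):
    """Fold analyzer outputs into one dict and drop internal fields.

    Later dicts win on key collisions, exactly as repeated
    `results.update(...)` calls would behave.
    """
    merged = {}
    for part in partials:
        merged.update(part)
    # Keys starting with "_" (masks, raw landmarks) are scratch data that
    # downstream analyzers consume but clients should never see.
    return {k: v for k, v in merged.items() if not k.startswith("_")}
```

One consequence worth keeping in mind: because later analyzers overwrite earlier ones on a shared key, the execution order of the pipeline also decides which analyzer is authoritative for any overlapping field.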
 
1
  """
2
+ HCP Face Analysis Microservice
3
+ ==============================
4
+
5
+ FastAPI service that runs seven specialized analyzers over a single photo
6
+ and merges their outputs into one ~100-field facial-attribute dictionary.
7
+
8
+ Pipeline (in execution order)
9
+ -----------------------------
10
+ 1. MediaPipe Face Landmarker 478 3D landmarks + 52 ARKit blendshapes.
11
+ Produces all geometric face/eye/nose/lip/
12
+ jaw features plus smiling and mouth-open.
13
+
14
+ 2. DemographicAnalyzer Three ViT classifiers (FairFace age,
15
+ FairFace gender, Ethnicity_Test_v003).
16
+ Age is reported as a softmax-weighted
17
+ continuous estimate, not a bucket midpoint.
18
+
19
+ 3. ParsingAnalyzer SegFormer-B5 human parsing. Emits face
20
+ and hair pixel masks plus hair length,
21
+ hat detection, and skin texture/wrinkle/
22
+ freckle/uniformity stats computed via
23
+ OpenCV over the face mask.
24
+
25
+ 4. EmotionAnalyzer HSEmotion EfficientNet-B0 8-class output
26
+ plus derived valence, arousal, mood.
27
+
28
+ 5. ColorAnalyzer Pixel-level LAB/HSV statistics. Reads
29
+ masks from step 3 and lip/iris landmarks
30
+ from step 1. No ML model.
31
+
32
+ 6. ObstructionAnalyzer dima806 ViT-B/16. Glasses, sunglasses,
33
+ mask flags with ~99% precision/recall.
34
+
35
+ 7. HairTypeAnalyzer dima806 ViT-B/16. Curly/dreadlocks/kinky/
36
+ straight/wavy at ~93% accuracy.
37
+
38
+ Endpoints
39
+ ---------
40
+ GET / service banner
41
+ GET /health liveness check
42
+ POST /analyze multipart file upload
43
+ POST /analyze-base64 JSON {"image": "<base64>"}
44
+
45
+ Both POST endpoints run the same pipeline. All analyzers are lazily
46
+ instantiated on first request to keep cold-start latency manageable
47
+ on the Hugging Face Spaces free tier.
48
  """
49
 
50
  import os
51
+ # hf_transfer gives much faster model downloads from the HF Hub on first
52
+ # inference. HF_HUB_DOWNLOAD_TIMEOUT defaults to 10s which is too short
53
+ # for the larger ViT checkpoints on a cold start.
54
  os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
55
+ os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"
56
 
57
  import io
58
  import logging
59
  from typing import Optional
60
 
 
61
  import numpy as np
62
  from fastapi import FastAPI, File, HTTPException, UploadFile
63
  from fastapi.middleware.cors import CORSMiddleware
 
65
 
66
  from analyzers.landmark_analyzer import LandmarkAnalyzer
67
  from analyzers.demographic_analyzer import DemographicAnalyzer
 
68
  from analyzers.parsing_analyzer import ParsingAnalyzer
69
  from analyzers.emotion_analyzer import EmotionAnalyzer
70
  from analyzers.color_analyzer import ColorAnalyzer
71
+ from analyzers.obstruction_analyzer import ObstructionAnalyzer
72
+ from analyzers.hair_type_analyzer import HairTypeAnalyzer
73
 
74
  logging.basicConfig(level=logging.INFO)
75
  logger = logging.getLogger(__name__)
 
87
  # Analyzers are initialized lazily on first request to reduce cold-start time
88
  landmark_analyzer: Optional[LandmarkAnalyzer] = None
89
  demographic_analyzer: Optional[DemographicAnalyzer] = None
 
90
  parsing_analyzer: Optional[ParsingAnalyzer] = None
91
  emotion_analyzer: Optional[EmotionAnalyzer] = None
92
  color_analyzer: Optional[ColorAnalyzer] = None
93
+ obstruction_analyzer: Optional[ObstructionAnalyzer] = None
94
+ hair_type_analyzer: Optional[HairTypeAnalyzer] = None
95
 
96
 
97
  def _to_json_safe(value):
98
+ """Recursively coerce numpy scalars/arrays into JSON-serialisable types.
99
+
100
+ Several analyzers return numpy floats/booleans (e.g. from `np.std`
101
+ or boolean mask logic). FastAPI's default JSON encoder doesn't
102
+ handle those, so we normalise everything here before returning.
103
+ """
104
+ # Numpy first: convert arrays and scalars to native Python types
105
+ # before recursing into plain dict/list containers.
106
  if isinstance(value, (np.ndarray,)):
107
  return value.tolist()
108
  if isinstance(value, (np.integer, np.floating)):
 
111
  return bool(value)
112
  if isinstance(value, np.generic):
113
  return value.item()
114
+ # Recurse into nested containers.
115
  if isinstance(value, dict):
116
  return {str(k): _to_json_safe(v) for k, v in value.items()}
117
  if isinstance(value, (list, tuple, set)):
 
120
 
121
 
122
  def get_analyzers():
123
+ """Lazy-load all analyzer models on first use.
124
+
125
+ Each analyzer is instantiated once per process and reused across
126
+ requests. First request pays the full model-load cost; subsequent
127
+ requests are warm.
128
+ """
129
+ global landmark_analyzer, demographic_analyzer
130
  global parsing_analyzer, emotion_analyzer, color_analyzer
131
+ global obstruction_analyzer, hair_type_analyzer
132
 
133
  if landmark_analyzer is None:
134
  logger.info("Loading MediaPipe Face Landmarker...")
 
138
  logger.info("Loading FairFace demographics model...")
139
  demographic_analyzer = DemographicAnalyzer()
140
 
 
 
 
 
141
  if parsing_analyzer is None:
142
+ logger.info("Loading SegFormer face parser...")
143
  parsing_analyzer = ParsingAnalyzer()
144
 
145
  if emotion_analyzer is None:
 
149
  if color_analyzer is None:
150
  color_analyzer = ColorAnalyzer()
151
 
152
+ if obstruction_analyzer is None:
153
+ logger.info("Loading face obstruction classifier...")
154
+ obstruction_analyzer = ObstructionAnalyzer()
155
+
156
+ if hair_type_analyzer is None:
157
+ logger.info("Loading hair type classifier...")
158
+ hair_type_analyzer = HairTypeAnalyzer()
159
+
160
  return (
161
  landmark_analyzer,
162
  demographic_analyzer,
 
163
  parsing_analyzer,
164
  emotion_analyzer,
165
  color_analyzer,
166
+ obstruction_analyzer,
167
+ hair_type_analyzer,
168
  )
169
 
170
 
171
  @app.get("/")
172
  async def root():
173
+ """Service banner: confirms the server is reachable and reports its version."""
174
  return {
175
  "name": "HCP Face Analysis Service",
176
  "version": "2.0.0",
177
  "status": "running",
178
  "endpoints": {
179
  "health": "/health",
180
+ "analyze": "/analyze",
181
+ "analyze-base64": "/analyze-base64",
182
  }
183
  }
184
 
185
 
186
  @app.get("/health")
187
  async def health():
188
+ """Liveness probe. Used by the Express server and HF Spaces uptime checks."""
189
  return {"status": "ok"}
190
 
191
 
192
  @app.post("/analyze")
193
  async def analyze_face(file: UploadFile = File(...)):
194
+ """Multipart endpoint for direct uploads.
 
195
 
196
+ Runs all seven analyzers and returns the merged attribute dict.
197
+ See `analyze_face_base64` for the JSON-body variant the Express
198
+ server calls.
199
  """
200
  try:
201
+ # Decode the upload into an RGB numpy array. All analyzers
202
+ # work in RGB, so the old RGB→BGR conversion (and the cv2
203
+ # import it required) has been dropped.
204
  contents = await file.read()
205
  image = Image.open(io.BytesIO(contents)).convert("RGB")
206
  img_array = np.array(image)
 
207
 
208
  (
209
  landmarks,
210
  demographics,
 
211
  parsing,
212
  emotions,
213
  colors,
214
+ obstructions,
215
+ hair_types,
216
  ) = get_analyzers()
217
 
218
  results = {}
219
 
220
+ # Step 1: MediaPipe Landmarks → all geometric features + blendshapes.
221
  logger.info("Running landmark analysis...")
222
  landmark_results = landmarks.analyze(img_array)
223
  results.update(landmark_results)
224
 
225
+ # Step 2: FairFace + Ethnicity ViT → demographics.
226
  logger.info("Running demographic analysis...")
227
  demo_results = demographics.analyze(img_array)
228
  results.update(demo_results)
229
 
230
+ # Step 3: SegFormer-B5 human parsing → masks + hair length + skin stats.
 
 
 
 
 
231
  logger.info("Running face parsing...")
232
  parse_results = parsing.analyze(img_array)
233
  results.update(parse_results)
234
 
235
+ # Step 4: HSEmotion → 8-class emotion + valence/arousal/mood.
236
  logger.info("Running emotion analysis...")
237
  emo_results = emotions.analyze(img_array)
238
  results.update(emo_results)
239
 
240
+ # Step 5: Pixel color analysis. Uses the face/hair masks from step 3
241
+ # and MediaPipe lip/iris landmarks from step 1.
242
  logger.info("Running color analysis...")
243
  color_results = colors.analyze(
244
  img_array,
 
248
  )
249
  results.update(color_results)
250
 
251
+ # Step 6: ObstructionViT → glasses / sunglasses / mask flags.
252
+ logger.info("Running obstruction analysis...")
253
+ results.update(obstructions.analyze(img_array))
254
+
255
+ # Step 7: HairTypeViT → curly/dreadlocks/kinky/straight/wavy.
256
+ logger.info("Running hair-type analysis...")
257
+ results.update(hair_types.analyze(img_array))
258
+
259
  # Remove internal fields (prefixed with underscore)
260
  results = {k: v for k, v in results.items() if not k.startswith("_")}
261
 
 
268
 
269
  @app.post("/analyze-base64")
270
  async def analyze_face_base64(body: dict):
271
+ """JSON-body endpoint accepting `{"image": "<base64>"}`.
272
+
273
+ This is what the Node/Express server forwards client requests to
274
+ so we don't have to push multipart payloads through the proxy.
275
+ The pipeline body is identical to `/analyze`.
276
  """
277
  import base64
278
 
 
288
  image_bytes = base64.b64decode(image_b64)
289
  image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
290
  img_array = np.array(image)
 
291
 
292
  (
293
  landmarks,
294
  demographics,
 
295
  parsing,
296
  emotions,
297
  colors,
298
+ obstructions,
299
+ hair_types,
300
  ) = get_analyzers()
301
 
302
  results = {}
303
 
304
+ # Same seven-step pipeline as /analyze, minus the per-step
305
+ # `logger.info` calls. Kept inline (rather than factored out) so
306
+ # the ordering stays obvious when reading either endpoint top-down.
307
  landmark_results = landmarks.analyze(img_array)
308
  results.update(landmark_results)
309
 
310
  demo_results = demographics.analyze(img_array)
311
  results.update(demo_results)
312
 
 
 
 
313
  parse_results = parsing.analyze(img_array)
314
  results.update(parse_results)
315
 
 
324
  )
325
  results.update(color_results)
326
 
327
+ results.update(obstructions.analyze(img_array))
328
+ results.update(hair_types.analyze(img_array))
329
+
330
+ # Drop internal/scratch fields (leading underscore) before
331
+ # returning. Keeps masks and raw landmark lists out of the JSON.
332
  results = {k: v for k, v in results.items() if not k.startswith("_")}
333
 
334
  return {"success": True, "data": _to_json_safe(results)}
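The module docstring for app.py describes age as "a softmax-weighted continuous estimate, not a bucket midpoint". One plausible reading is an expected value: convert the classifier logits to probabilities and take the probability-weighted average of each bucket's representative age. The sketch below shows that calculation under stated assumptions. The nine midpoints are hypothetical values chosen for the usual FairFace ranges (0-2, 3-9, ..., 70+), and `expected_age` is an illustrative name, not the analyzer's actual method.

```python
import math

# Hypothetical representative ages for nine age buckets (assumed values).
AGE_MIDPOINTS = [1.0, 6.0, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 75.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def expected_age(logits, midpoints=AGE_MIDPOINTS):
    """Probability-weighted mean of the bucket midpoints."""
    probs = softmax(logits)
    return sum(p * m for p, m in zip(probs, midpoints))
```

With this formulation, a prediction split evenly between the 20-29 and 30-39 buckets yields an age near 29.5 rather than snapping to either midpoint, which is the behaviour the docstring appears to describe.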
architecture.md CHANGED
@@ -1,1707 +1,99 @@
1
- # HCP Face Analysis — Architecture Plan
2
-
3
- ## Revised Architecture & Best Models for Maximum Feature Coverage
4
-
5
- Since the codebase is flexible and can use more languages and frameworks, we go beyond the Supabase Edge Function constraint to find the **absolute best models** for the full feature list.
6
-
7
- ---
8
-
9
- ## Recommended Architecture: Python Microservice Sidecar
10
-
11
- ```
12
- ┌──────────────────────────────────────────────────────────┐
13
- │ CURRENT STACK                                            │
14
- │ Next.js Frontend ──► Supabase (Auth, DB, Storage)        │
15
- └──────────────┬───────────────────────────────────────────┘
16
-                │
17
-                ▼
18
- ┌──────────────────────────────────────────────────────────┐
19
- │ NEW: Python Face Analysis Microservice                   │
20
- │ (Railway.app / Render.com / Hugging Face Spaces)         │
21
- │ FREE TIER: 512MB RAM, shared CPU                         │
22
- │                                                          │
23
- │ FastAPI Server                                           │
24
- │ ├── MediaPipe Face Landmarker (478 landmarks, 4MB)       │
25
- │ ├── InsightFace Buffalo_SC (recognition + attrs, 30MB)   │
26
- │ ├── FairFace (age/gender/race, 90MB)                     │
27
- │ ├── HuggingFace ViT models (attributes, ~50MB each)      │
28
- │ ├── BiSeNet (face parsing/segmentation, 50MB)            │
29
- │ └── Custom geometric analysis (your feature list)        │
30
- │                                                          │
31
- │ Total: ~250MB models (loaded lazily)                     │
32
- └──────────────────────────────────────────────────────────┘
33
- ```
34
-
35
- **Why this is better:** Python gives access to the **entire deep learning ecosystem** — every model on HuggingFace, every research paper's pretrained weights. Free-tier hosting on Railway/Render gives 512MB RAM and enough CPU for per-request inference.
36
-
37
- ---
38
-
39
- ## Best Models Per Feature Category
40
-
41
- ### Tier 1: Core Models (Must Have)
42
-
43
- #### 1. MediaPipe Face Landmarker — Geometric Features
44
- - **478 3D landmarks + 52 blendshapes**
45
- - **Size:** 4MB
46
- - **Covers:** Face shape, jawline, chin, cheekbones, forehead, eye shape, eye spacing, eye size, eyebrow shape, nose shape, lip shape, mouth width, dimples, facial asymmetry
47
- - **GitHub:** https://github.com/google-ai-edge/mediapipe
48
- - **Python:** `pip install mediapipe`
49
- - **Accuracy:** State-of-the-art landmark detection, handles 30° head rotation well
50
-
51
- #### 2. InsightFace Buffalo_SC — Lightweight Recognition + Age/Gender
52
- - **Size:** ~30MB (smallest Buffalo variant)
53
- - **LFW Accuracy:** 99.5%
54
- - **Covers:** Face detection, age, gender, face embedding (for recognition), 2D landmarks
55
- - **GitHub:** https://github.com/deepinsight/insightface
56
- - **Weights:** Auto-downloaded via `insightface.app.FaceAnalysis(name='buffalo_sc')`
57
- - **Why not Buffalo_L:** 320MB is overkill; Buffalo_SC is 90% as accurate at 1/10th the size
58
-
59
- #### 3. FairFace: Age, Gender, Race (Most Accurate)
60
- - **Size:** ~90MB (ResNet-34)
61
- - **Accuracy:** 93.4% race, 94.2% gender, MAE 3.4 years for age
62
- - **Covers:** Age (9 buckets), gender, race (7 categories: White, Black, Latino, East Asian, Southeast Asian, Indian, Middle Eastern)
63
- - **GitHub:** https://github.com/dchen236/FairFace
64
- - **Weights:** https://drive.google.com/file/d/1xSfJQWMhm3AVlJYcPcabGO_bj1kDB0xw (res34_fair_align_multi_7_20190809.pt)
65
- - **Why over InsightFace for this:** FairFace is specifically trained for fair demographic classification across races, not biased toward any group
66
-
67
- #### 4. HSEmotion (EfficientNet): Emotion Recognition
68
- - **Size:** ~20MB
69
- - **Accuracy:** 66.5% on AffectNet-8 (state-of-the-art), 8 emotions
70
- - **Covers:** Angry, contempt, disgust, fear, happy, neutral, sad, surprise
71
- - **GitHub:** https://github.com/HSE-asavchenko/face-emotion-recognition
72
- - **Weights:** Available via `timm` or direct download from repo
73
- - **Why over face-api.js:** Significantly more accurate, trained on AffectNet (largest emotion dataset)
74
-
75
- ### Tier 2: Specialized Models
76
-
77
- #### 5. BiSeNet Face Parsing — Facial Segmentation
78
- - **Size:** ~50MB
79
- - **Covers:** Skin region, left/right eyebrow, left/right eye, nose, upper/lower lip, inner mouth, hair, left/right ear, neck, cloth, hat, earrings, glasses, background
80
- - **GitHub:** https://github.com/zllrunning/face-parsing.PyTorch
81
- - **Weights:** https://drive.google.com/file/d/154JgKpzCPW82qINcVieuPH3fZ2e0P812
82
- - **Why this matters:** Precisely segments hair, skin, eyebrows for color analysis, facial hair detection, glasses detection, and wrinkle analysis
83
-
84
- #### 6. microsoft/swin-base-patch4-window7-224-in22k fine-tuned for facial attributes
85
- - **HuggingFace:** Various CelebA-trained attribute classifiers
86
- - Specifically: https://huggingface.co/nateraw/vit-age-classifier (age)
87
- - Specifically: https://huggingface.co/rizvandwiki/gender-classification-2 (gender)
88
-
89
- #### 7. CelebA Attribute Classifier (Custom Multi-Label)
90
- - **Dataset:** CelebA has 40 binary attributes already labeled
91
- - Train a lightweight EfficientNet-B0 (~20MB) on CelebA for:
92
- - `Attractive`, `Bald`, `Bangs`, `Big_Lips`, `Big_Nose`, `Black_Hair`, `Blond_Hair`, `Brown_Hair`, `Bushy_Eyebrows`, `Chubby`, `Double_Chin`, `Eyeglasses`, `Goatee`, `Gray_Hair`, `Heavy_Makeup`, `High_Cheekbones`, `Male`, `Mouth_Slightly_Open`, `Mustache`, `Narrow_Eyes`, `No_Beard`, `Oval_Face`, `Pointy_Nose`, `Receding_Hairline`, `Sideburns`, `Smiling`, `Straight_Hair`, `Wavy_Hair`, `Wearing_Hat`, `Young`
93
- - **Pre-trained option:** https://github.com/dchen236/FairFace has CelebA-trained models
94
- - **Better pre-trained option:** https://huggingface.co/jnferreira/attribute-prediction-celebA
95
-
96
- #### 8. Hair Segmentation + Color Analysis
97
- - **Model:** MODNet for matting + BiSeNet for hair segmentation
98
- - **GitHub (MODNet):** https://github.com/ZHKKKe/MODNet (~25MB)
99
- - Post-segmentation: K-means clustering on hair pixels for color
100
-
101
- #### 9. Skin Analysis (Wrinkles, Acne, etc.)
102
- - **Model:** https://huggingface.co/imfarzanansari/skin-disease-detection (for acne/skin conditions)
103
- - **For wrinkles:** Edge detection (Canny/Sobel) on forehead/eye regions from BiSeNet parsing — no model needed
104
- - **For freckles/moles:** Blob detection on skin regions from BiSeNet parsing
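The freckle/mole bullet above proposes blob detection on the parsed skin region. As a rough, dependency-free illustration of what that could look like, the sketch below counts small connected groups of pixels that are noticeably darker than the mean skin lightness. Everything here is an assumption made for the example: the function name `count_dark_blobs`, the `delta` and `max_area` defaults, and the use of 4-connectivity. It is not the approach the shipped analyzer takes, which counts dark pixels rather than connected blobs.

```python
from collections import deque


def count_dark_blobs(lightness, mask, delta=20.0, max_area=25):
    """Count small 4-connected dark regions inside a skin mask.

    lightness: 2D list of lightness values; mask: 2D list of bools.
    A pixel is "dark" when it is more than `delta` below the mean skin
    lightness. Regions larger than `max_area` pixels are ignored.
    """
    h, w = len(lightness), len(lightness[0])
    skin = [lightness[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    if not skin:
        return 0
    threshold = sum(skin) / len(skin) - delta

    def is_dark(y, x):
        return mask[y][x] and lightness[y][x] < threshold

    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for y in range(h):
        for x in range(w):
            if seen[y][x] or not is_dark(y, x):
                continue
            # Breadth-first flood fill to measure this region's area.
            area = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and is_dark(ny, nx):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area <= max_area:
                blobs += 1
    return blobs
```

Thresholding relative to the mean skin lightness, rather than against an absolute value, keeps the count comparable across skin tones; the area cap is there so that large shadows are not mistaken for moles.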
105
-
106
- ---
107
-
108
- ## Complete Feature Coverage Map
109
-
110
- | Feature | Model/Method | Confidence |
111
- |---------|-------------|------------|
112
- | **Face shape** (oval, round, square, heart, diamond, oblong, triangle) | MediaPipe landmarks → geometric ratios + CelebA (`Oval_Face`) | ⭐⭐⭐⭐ |
113
- | **Jawline** (sharp, soft, strong) | MediaPipe jaw landmark angles | ⭐⭐⭐⭐ |
114
- | **Chin** (receding, pointed, cleft, wide) | MediaPipe chin landmarks + depth (z) | ⭐⭐⭐ |
115
- | **Cheekbones** (high, flat, full, hollow) | MediaPipe landmark z-depth + CelebA (`High_Cheekbones`, `Chubby`) | ⭐⭐⭐⭐ |
116
- | **Forehead** (broad, narrow) | MediaPipe forehead span ratio | ⭐⭐⭐⭐ |
117
- | **Eye shape** (almond, round, hooded, monolid, upturned, downturned) | MediaPipe eyelid curvature + corner angles | ⭐⭐⭐⭐ |
118
- | **Eye spacing** (wide-set, close-set) | MediaPipe interpupillary distance ratio | ⭐⭐⭐⭐⭐ |
119
- | **Eye size** (large, small) | MediaPipe eye area / face area | ⭐⭐⭐⭐⭐ |
120
- | **Deep-set / protruding eyes** | MediaPipe landmark z-depth at eye region | ⭐⭐⭐ |
121
- | **Eye color** (brown, blue, green, hazel) | Iris crop → HSV color histogram + KNN | ⭐⭐⭐⭐ |
122
- | **Dark under-eyes / eye bags** | BiSeNet skin parsing → brightness analysis under eyes | ⭐⭐⭐ |
123
- | **Crow's feet** | Canny edge detection on BiSeNet-parsed outer eye skin | ⭐⭐⭐ |
124
- | **Eyebrow shape** (arched, straight, bushy, thick, thin) | MediaPipe brow landmarks + CelebA (`Bushy_Eyebrows`, `Arched_Eyebrows`) | ⭐⭐⭐⭐ |
125
- | **Unibrow** | MediaPipe inner brow distance + pixel analysis between brows | ⭐⭐⭐⭐ |
126
- | **Nose shape** (straight, aquiline, button, upturned, wide, narrow) | MediaPipe nose landmarks + CelebA (`Big_Nose`, `Pointy_Nose`) | ⭐⭐⭐⭐ |
127
- | **Nose bridge** (flat, high) | MediaPipe z-depth at nasal bridge | ⭐⭐⭐ |
128
- | **Nostrils** (wide, narrow) | MediaPipe nostril landmark width ratio | ⭐⭐⭐⭐ |
129
- | **Lips** (full, thin) | MediaPipe lip landmarks + CelebA (`Big_Lips`) | ⭐⭐⭐⭐ |
130
- | **Mouth width** | MediaPipe mouth corner distance ratio | ⭐⭐⭐⭐⭐ |
131
- | **Cupid's bow** | MediaPipe upper lip curvature analysis | ⭐⭐⭐ |
132
- | **Teeth** (gap, crooked, straight, overbite, underbite) | Mouth crop when smiling → custom classifier or rule-based | ⭐⭐ |
133
- | **Dimples** | MediaPipe blendshapes during smile + cheek region analysis | ⭐⭐⭐ |
134
- | **Smile lines** | Edge detection on nasolabial region | ⭐⭐⭐ |
135
- | **Asymmetrical smile** | MediaPipe left/right smile blendshape difference | ⭐⭐⭐⭐ |
136
- | **Hair type** (straight, wavy, curly, coily) | BiSeNet hair segmentation → texture frequency (FFT) + CelebA (`Straight_Hair`, `Wavy_Hair`) | ⭐⭐⭐ |
137
- | **Hair length** (short, long, bald) | BiSeNet hair mask area + CelebA (`Bald`, `Bangs`) | ⭐⭐⭐⭐ |
138
- | **Hair color** (black, brown, blonde, red, gray, dyed) | BiSeNet hair mask → K-means color clustering + CelebA (`Black_Hair`, `Brown_Hair`, `Blond_Hair`, `Gray_Hair`) | ⭐⭐⭐⭐ |
139
- | **Receding hairline / widow's peak** | BiSeNet hair boundary analysis + CelebA (`Receding_Hairline`) | ⭐⭐⭐ |
140
- | **Beard/facial hair** (full, stubble, goatee, mustache, sideburns, clean-shaven) | BiSeNet parsing lower face + CelebA (`5_o_Clock_Shadow`, `Goatee`, `Mustache`, `No_Beard`, `Sideburns`) | ⭐⭐⭐⭐ |
141
- | **Skin tone** (light, medium, dark) | BiSeNet skin parsing → mean LAB brightness | ⭐⭐⭐⭐⭐ |
142
- | **Freckles** | BiSeNet skin mask → small blob detection (contrast) | ⭐⭐⭐ |
143
- | **Moles / birthmark** | BiSeNet skin mask → dark blob detection | ⭐⭐⭐ |
144
- | **Scars** | BiSeNet skin mask → linear edge anomaly detection | ⭐⭐ |
145
- | **Acne** | BiSeNet skin mask → red blob detection or HuggingFace skin model | ⭐⭐⭐ |
146
- | **Wrinkles / forehead lines** | BiSeNet forehead mask → Gabor filter or Canny edges | ⭐⭐⭐ |
147
- | **Facial asymmetry** | MediaPipe left/right landmark mirror distance | ⭐⭐⭐⭐⭐ |
148
- | **Prominent Adam's apple** | Neck region detection (limited accuracy) | ⭐ |
149
- | **Glasses** | CelebA (`Eyeglasses`) + BiSeNet parsing | ⭐⭐⭐⭐⭐ |
150
- | **Age** | FairFace (MAE 3.4 years) | ⭐⭐⭐⭐⭐ |
151
- | **Gender** | FairFace (94.2%) | ⭐⭐⭐⭐⭐ |
152
- | **Race** | FairFace (93.4%, 7 categories) | ⭐⭐⭐⭐⭐ |
153
- | **Emotion** | HSEmotion (66.5% AffectNet-8, SOTA) | ⭐⭐⭐⭐ |
154
-
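The "Skin tone" row above relies on mean LAB brightness over the parsed skin mask. A minimal, dependency-free sketch of that idea follows. The function names and the `light_min` / `dark_max` thresholds are hypothetical placeholders, not values from the service, and the flat pixel/mask lists stand in for the BiSeNet output. Note that OpenCV's 8-bit LAB conversion rescales L to 0-255, whereas this sketch uses the 0-100 CIE range.

```python
from typing import Optional, Sequence, Tuple


def srgb_to_lightness(r: int, g: int, b: int) -> float:
    """Convert one 8-bit sRGB pixel to CIE L* (0 = black, 100 = white)."""

    def linearize(channel: int) -> float:
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y for sRGB primaries (D65 white point).
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y


def classify_skin_tone(
    pixels: Sequence[Tuple[int, int, int]],
    mask: Sequence[bool],
    light_min: float = 65.0,  # hypothetical threshold, tune on real data
    dark_max: float = 40.0,   # hypothetical threshold, tune on real data
) -> Optional[str]:
    """Average L* over masked pixels and bucket it; None if the mask is empty."""
    values = [srgb_to_lightness(*p) for p, keep in zip(pixels, mask) if keep]
    if not values:
        return None
    mean_l = sum(values) / len(values)
    if mean_l >= light_min:
        return "light"
    if mean_l <= dark_max:
        return "dark"
    return "medium"
```

A lightness-only bucket is sensitive to exposure, so treating the result as a rough hint rather than a ground-truth label seems prudent.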
155
- ---
156
-
157
- ## Model Comparison Table
158
-
159
- | Model | Accuracy (benchmark) | Size | Runs in Deno/Browser? | Feature Depth | Notes |
160
- |-------|----------------|------|----------------------|---------------|-------|
161
- | **DeepFace** (Python) | 97.4% (VGG-Face) | 500MB+ | ❌ No (Python only) | Age, gender, race, emotion | Too large, wrong runtime |
162
- | **InsightFace Buffalo_L** | 99.8% (LFW) | ~320MB | ❌ No (Python/C++) | Landmarks, age, gender | Too large for edge |
163
- | **InsightFace MobileFaceNet** | 99.5% (LFW) | ~4MB | ⚠️ ONNX possible | Recognition only, no attributes | Very small but limited features |
164
- | **MediaPipe Face Landmarker** | N/A (landmark model) | ~4MB | ✅ Yes (TFJS/WASM) | 478 landmarks, blendshapes | Best for geometric features |
165
- | **face-api.js** | 99.2% (LFW) | ~6MB (all models) | ✅ Yes (TFJS) | Age, gender, emotion, 68 landmarks | Browser/Node.js ready |
166
- | **ONNX FER+ (emotion)** | ~85% (FER2013) | ~2MB | ✅ Yes (ONNX.js) | Emotion only | Supplement model |
167
- | **HuggingFace ViT models** | Varies | 50-350MB | ⚠️ ONNX export possible | Age, gender, various classifiers | Some fit under 50MB |
168
-
169
- ---
170
-
171
- ## Free Hosting Options for the Python Microservice
172
-
173
- | Platform | Free Tier | RAM | Cold Start | Best For |
174
- |----------|-----------|-----|------------|----------|
175
- | **Hugging Face Spaces** | Unlimited | 2GB CPU | ~15s | Best free option, runs Gradio/FastAPI |
176
- | **Railway.app** | $5 credit/month | 512MB | ~5s | Good for always-on API |
177
- | **Render.com** | 750 hrs/month | 512MB | ~30s | Spins down after 15min inactivity |
178
- | **Google Cloud Run** | 2M requests/month | 512MB | ~10s | Best scaling, pay-per-request |
179
- | **Fly.io** | 3 shared VMs | 256MB | ~3s | Low latency, always on |
180
-
181
- **Recommendation: Hugging Face Spaces** — 2GB RAM free, pre-installed ML libraries, no cold start limits, and you can use their Inference API for some models without even hosting.
182
-
183
- ---
184
-
185
- ## Full Implementation
186
-
187
- ### Python Microservice
188
-
189
- #### requirements.txt
190
-
191
- ```
192
- fastapi==0.115.0
193
- uvicorn==0.30.0
194
- python-multipart==0.0.9
195
- mediapipe==0.10.14
196
- insightface==0.7.3
197
- onnxruntime==1.18.0
198
- torch==2.3.0
199
- torchvision==0.18.0
200
- Pillow==10.4.0
201
- numpy==1.26.4
202
- opencv-python-headless==4.10.0.84
203
- scipy==1.13.0
204
- scikit-learn==1.5.0
205
- huggingface-hub==0.23.0
206
- ```
207
-
208
- #### face-service/app.py
209
-
210
- ```python
211
- """
212
- Face Analysis Microservice
213
- Combines multiple models for comprehensive facial attribute detection.
214
- """
215
-
216
- import io
217
- import logging
218
- from typing import Optional
219
-
220
- import cv2
221
- import numpy as np
222
- from fastapi import FastAPI, File, HTTPException, UploadFile
223
- from fastapi.middleware.cors import CORSMiddleware
224
- from PIL import Image
225
-
226
- from analyzers.landmark_analyzer import LandmarkAnalyzer
227
- from analyzers.demographic_analyzer import DemographicAnalyzer
228
- from analyzers.attribute_analyzer import AttributeAnalyzer
229
- from analyzers.parsing_analyzer import ParsingAnalyzer
230
- from analyzers.emotion_analyzer import EmotionAnalyzer
231
- from analyzers.color_analyzer import ColorAnalyzer
232
-
233
- logging.basicConfig(level=logging.INFO)
234
- logger = logging.getLogger(__name__)
235
-
236
- app = FastAPI(title="Face Analysis Service", version="2.0.0")
237
-
238
- app.add_middleware(
239
- CORSMiddleware,
240
- allow_origins=["*"], # Restrict in production
241
- allow_credentials=False, # Do not combine credentials with wildcard origins
242
- allow_methods=["*"],
243
- allow_headers=["*"],
244
- )
245
-
246
- # Initialize analyzers lazily
247
- landmark_analyzer: Optional[LandmarkAnalyzer] = None
248
- demographic_analyzer: Optional[DemographicAnalyzer] = None
249
- attribute_analyzer: Optional[AttributeAnalyzer] = None
250
- parsing_analyzer: Optional[ParsingAnalyzer] = None
251
- emotion_analyzer: Optional[EmotionAnalyzer] = None
252
- color_analyzer: Optional[ColorAnalyzer] = None
253
-
254
-
255
- def get_analyzers():
256
- global landmark_analyzer, demographic_analyzer, attribute_analyzer
257
- global parsing_analyzer, emotion_analyzer, color_analyzer
258
-
259
- if landmark_analyzer is None:
260
- logger.info("Loading MediaPipe landmarks...")
261
- landmark_analyzer = LandmarkAnalyzer()
262
-
263
- if demographic_analyzer is None:
264
- logger.info("Loading FairFace demographics...")
265
- demographic_analyzer = DemographicAnalyzer()
266
-
267
- if attribute_analyzer is None:
268
- logger.info("Loading CelebA attribute classifier...")
269
- attribute_analyzer = AttributeAnalyzer()
270
-
271
- if parsing_analyzer is None:
272
- logger.info("Loading BiSeNet face parser...")
273
- parsing_analyzer = ParsingAnalyzer()
274
-
275
- if emotion_analyzer is None:
276
- logger.info("Loading HSEmotion...")
277
- emotion_analyzer = EmotionAnalyzer()
278
-
279
- if color_analyzer is None:
280
- color_analyzer = ColorAnalyzer()
281
-
282
- return (
283
- landmark_analyzer,
284
- demographic_analyzer,
285
- attribute_analyzer,
286
- parsing_analyzer,
287
- emotion_analyzer,
288
- color_analyzer,
289
- )
290
-
291
-
292
- @app.get("/health")
293
- async def health():
294
- return {"status": "ok"}
295
-
296
-
297
- @app.post("/analyze")
298
- async def analyze_face(file: UploadFile = File(...)):
299
- """Comprehensive face analysis endpoint."""
300
- try:
301
- contents = await file.read()
302
- image = Image.open(io.BytesIO(contents)).convert("RGB")
303
- img_array = np.array(image)
304
- img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
305
-
306
- (
307
- landmarks,
308
- demographics,
309
- attributes,
310
- parsing,
311
- emotions,
312
- colors,
313
- ) = get_analyzers()
314
-
315
- results = {}
316
-
317
- # 1. MediaPipe Landmarks → geometric features
318
- logger.info("Running landmark analysis...")
319
- landmark_results = landmarks.analyze(img_array)
320
- results.update(landmark_results)
321
-
322
- # 2. FairFace → age, gender, race
323
- logger.info("Running demographic analysis...")
324
- demo_results = demographics.analyze(img_array)
325
- results.update(demo_results)
326
-
327
- # 3. CelebA attributes → 40 binary facial attributes
328
- logger.info("Running attribute analysis...")
329
- attr_results = attributes.analyze(img_array)
330
- results.update(attr_results)
331
-
332
- # 4. BiSeNet face parsing → segmentation masks
333
- logger.info("Running face parsing...")
334
- parse_results = parsing.analyze(img_bgr)
335
- results.update(parse_results)
336
-
337
- # 5. HSEmotion → emotion classification
338
- logger.info("Running emotion analysis...")
339
- emo_results = emotions.analyze(img_array)
340
- results.update(emo_results)
341
-
342
- # 6. Color analysis using parsing masks
343
- logger.info("Running color analysis...")
344
- color_results = colors.analyze(
345
- img_array,
346
- skin_mask=parse_results.get("_skin_mask"),
347
- hair_mask=parse_results.get("_hair_mask"),
348
- landmark_data=landmark_results.get("_raw_landmarks"),
349
- )
350
- results.update(color_results)
351
-
352
- # Remove internal fields
353
- results = {k: v for k, v in results.items() if not k.startswith("_")}
354
-
355
- return {"success": True, "data": results}
356
-
357
- except Exception as e:
358
- logger.error(f"Analysis failed: {e}", exc_info=True)
359
- raise HTTPException(status_code=500, detail=str(e))
360
- ```
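The endpoint above folds every analyzer's output into one dictionary with `results.update(...)` and then drops keys that start with an underscore. The sketch below isolates that convention and additionally reports key collisions, which `dict.update` overwrites silently. `merge_analyzer_results` is a hypothetical helper for illustration and is not part of the service.

```python
from typing import Any


def merge_analyzer_results(
    *partials: dict[str, Any],
) -> tuple[dict[str, Any], list[str]]:
    """Merge analyzer outputs in order and strip internal "_"-prefixed keys.

    Returns the public payload plus a sorted list of keys that were written
    by more than one analyzer.
    """
    merged: dict[str, Any] = {}
    collisions: set[str] = set()
    for partial in partials:
        for key, value in partial.items():
            if key in merged:
                collisions.add(key)
            merged[key] = value  # same semantics as dict.update: last writer wins
    public = {k: v for k, v in merged.items() if not k.startswith("_")}
    return public, sorted(collisions)
```

Logging the collision list during development is one way to notice when two analyzers emit the same key, for example an `"error"` entry.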
361
-
362
- #### face-service/analyzers/landmark_analyzer.py
363
-
364
- ```python
365
- """
366
- MediaPipe Face Landmarker — 478 3D landmarks + 52 blendshapes
367
- Derives geometric facial features from landmark positions.
368
- """
369
-
370
- import math
371
- from typing import Any
372
-
373
- import mediapipe as mp
374
- import numpy as np
375
- from mediapipe.tasks import python as mp_python
376
- from mediapipe.tasks.python import vision
377
-
378
-
379
- class LandmarkAnalyzer:
380
- def __init__(self):
381
- base_options = mp_python.BaseOptions(
382
- model_asset_path=self._download_model()
383
- )
384
- options = vision.FaceLandmarkerOptions(
385
- base_options=base_options,
386
- output_face_blendshapes=True,
387
- output_facial_transformation_matrixes=True,
388
- num_faces=1,
389
- )
390
- self.detector = vision.FaceLandmarker.create_from_options(options)
391
-
392
- def _download_model(self) -> str:
393
- import urllib.request
394
- import os
395
-
396
- model_path = "models/face_landmarker.task"
397
- if not os.path.exists(model_path):
398
- os.makedirs("models", exist_ok=True)
399
- url = "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
400
- urllib.request.urlretrieve(url, model_path)
401
- return model_path
402
-
403
- def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
404
- mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
405
- result = self.detector.detect(mp_image)
406
-
407
- if not result.face_landmarks:
408
- return {"error": "No face detected by MediaPipe"}
409
-
410
- landmarks = result.face_landmarks[0]
411
- lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
412
-
413
- blendshapes = {}
414
- if result.face_blendshapes:
415
- for bs in result.face_blendshapes[0]:
416
- blendshapes[bs.category_name] = round(bs.score, 4)
417
-
418
- attrs = {}
419
- attrs["_raw_landmarks"] = lm
420
-
421
- # === Face Shape ===
422
- face_height = self._dist(lm[10], lm[152])
423
- face_width = self._dist(lm[234], lm[454])
424
- jaw_width = self._dist(lm[172], lm[397])
425
- cheekbone_width = self._dist(lm[93], lm[323])
426
- forehead_width = self._dist(lm[54], lm[284])
427
-
428
- wh_ratio = face_width / face_height if face_height > 0 else 1
429
- jaw_to_face = jaw_width / face_width if face_width > 0 else 1
430
- forehead_to_jaw = forehead_width / jaw_width if jaw_width > 0 else 1
431
- cheek_to_jaw = cheekbone_width / jaw_width if jaw_width > 0 else 1
432
-
433
- if wh_ratio > 0.85 and jaw_to_face > 0.75:
434
- attrs["face_shape"] = "round"
435
- elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
436
- attrs["face_shape"] = "square"
437
- elif wh_ratio < 0.75:
438
- attrs["face_shape"] = "oblong"
439
- elif forehead_to_jaw > 1.3:
440
- attrs["face_shape"] = "heart"
441
- elif cheek_to_jaw > 1.25 and forehead_to_jaw < 1.15:
442
- attrs["face_shape"] = "diamond"
443
- elif forehead_to_jaw < 0.85:
444
- attrs["face_shape"] = "triangle"
445
- else:
446
- attrs["face_shape"] = "oval"
447
-
448
- attrs["face_shape_metrics"] = {
449
- "width_height_ratio": round(wh_ratio, 3),
450
- "jaw_to_face_ratio": round(jaw_to_face, 3),
451
- "forehead_to_jaw_ratio": round(forehead_to_jaw, 3),
452
- "cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
453
- }
454
-
455
- # === Forehead ===
456
- forehead_ratio = forehead_width / face_width if face_width > 0 else 0.6
457
- attrs["forehead_width"] = (
458
- "broad" if forehead_ratio > 0.7
459
- else "narrow" if forehead_ratio < 0.55
460
- else "average"
461
- )
462
-
463
- # === Jawline ===
464
- jaw_angle = self._jaw_angle(lm)
465
- attrs["jawline_angle"] = round(jaw_angle, 1)
466
- if jaw_angle < 110:
467
- attrs["jawline_type"] = "sharp"
468
- elif jaw_angle > 140:
469
- attrs["jawline_type"] = "soft"
470
- elif jaw_to_face > 0.75:
471
- attrs["jawline_type"] = "strong"
472
- else:
473
- attrs["jawline_type"] = "soft"
474
-
475
- # === Chin ===
476
- chin_width = self._dist(lm[175], lm[396])
477
- chin_ratio = chin_width / jaw_width if jaw_width > 0 else 0.4
478
- attrs["chin_type"] = (
479
- "pointed" if chin_ratio < 0.3
480
- else "wide" if chin_ratio > 0.5
481
- else "normal"
482
- )
483
-
484
- # === Cheekbones ===
485
- cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
486
- attrs["cheekbone_prominence"] = (
487
- "high" if cheek_z < -0.04
488
- else "flat" if cheek_z > 0.0
489
- else "moderate"
490
- )
491
-
492
- # Hollow vs full cheeks (blendshape-assisted)
493
- cheek_puff = blendshapes.get("cheekPuff", 0)
494
- cheek_squint_l = blendshapes.get("cheekSquintLeft", 0)
495
- cheek_squint_r = blendshapes.get("cheekSquintRight", 0)
496
- if cheek_puff > 0.3:
497
- attrs["cheek_fullness"] = "full"
498
- elif cheek_z > -0.01:
499
- attrs["cheek_fullness"] = "hollow"
500
- else:
501
- attrs["cheek_fullness"] = "normal"
502
-
503
- # === Eyes ===
504
- left_eye_top = lm[159]
505
- left_eye_bottom = lm[145]
506
- left_eye_inner = lm[133]
507
- left_eye_outer = lm[33]
508
- eye_openness = self._dist(left_eye_top, left_eye_bottom)
509
- eye_width_val = self._dist(left_eye_inner, left_eye_outer)
510
- eye_ratio = eye_openness / eye_width_val if eye_width_val > 0 else 0.3
511
-
512
- outer_angle = left_eye_outer["y"] - left_eye_inner["y"]
513
- if outer_angle < -0.012:
514
- attrs["eye_shape"] = "upturned"
515
- elif outer_angle > 0.012:
516
- attrs["eye_shape"] = "downturned"
517
- elif eye_ratio > 0.38:
518
- attrs["eye_shape"] = "round"
519
- elif eye_ratio < 0.2:
520
- attrs["eye_shape"] = "hooded"
521
- else:
522
- attrs["eye_shape"] = "almond"
523
-
524
- # Deep-set vs protruding
525
- eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
526
- nose_bridge_z = lm[6]["z"]
527
- if eye_z > nose_bridge_z + 0.02:
528
- attrs["eye_depth"] = "deep-set"
529
- elif eye_z < nose_bridge_z - 0.01:
530
- attrs["eye_depth"] = "protruding"
531
- else:
532
- attrs["eye_depth"] = "normal"
533
-
534
- # Eye spacing
535
- if len(lm) > 473: # Iris landmarks available
536
- inter_pupillary = self._dist(lm[468], lm[473])
537
- else:
538
- inter_pupillary = self._dist(lm[133], lm[362])
539
- ip_ratio = inter_pupillary / face_width if face_width > 0 else 0.35
540
- attrs["eye_spacing"] = (
541
- "wide-set" if ip_ratio > 0.38
542
- else "close-set" if ip_ratio < 0.28
543
- else "average"
544
- )
545
-
546
- # Eye size
547
- right_eye_top = lm[386]
548
- right_eye_bottom = lm[374]
549
- right_eye_inner = lm[362]
550
- right_eye_outer = lm[263]
551
- r_eye_area = self._dist(right_eye_top, right_eye_bottom) * self._dist(right_eye_inner, right_eye_outer)
552
- l_eye_area = eye_openness * eye_width_val
553
- avg_eye_area = (l_eye_area + r_eye_area) / 2
554
- face_area = face_width * face_height
555
- eye_size_ratio = avg_eye_area / face_area if face_area > 0 else 0.015
556
- attrs["eye_size"] = (
557
- "large" if eye_size_ratio > 0.02
558
- else "small" if eye_size_ratio < 0.012
559
- else "average"
560
- )
561
-
562
- # Eye blink (closed vs open)
563
- blink_l = blendshapes.get("eyeBlinkLeft", 0)
564
- blink_r = blendshapes.get("eyeBlinkRight", 0)
565
- attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
566
-
567
- # === Eyebrows ===
568
- brow_mid_l = lm[105]
569
- brow_outer_l = lm[46]
570
- brow_inner_l = lm[70]
571
- brow_to_eye = self._dist(brow_mid_l, lm[159])
572
- brow_arch_ratio = brow_to_eye / eye_openness if eye_openness > 0 else 1.5
573
-
574
- attrs["eyebrow_arch_height"] = (
575
- "high" if brow_arch_ratio > 2.2
576
- else "low" if brow_arch_ratio < 1.3
577
- else "average"
578
- )
579
-
580
- # Brow curvature
581
- mid_y = brow_mid_l["y"]
582
- avg_end_y = (brow_inner_l["y"] + brow_outer_l["y"]) / 2
583
- curvature = mid_y - avg_end_y
584
- if abs(curvature) < 0.003:
585
- attrs["eyebrow_shape"] = "straight"
586
- elif curvature < -0.008:
587
- attrs["eyebrow_shape"] = "arched"
588
- else:
589
- attrs["eyebrow_shape"] = "flat"
590
-
591
- # Eyebrow thickness (vertical span of brow landmarks)
592
- brow_top = lm[66] # Top of left brow
593
- brow_bottom = lm[105] # Bottom of left brow
594
- brow_thickness = self._dist(brow_top, brow_bottom)
595
- attrs["eyebrow_thickness"] = (
596
- "thick" if brow_thickness > 0.015
597
- else "thin" if brow_thickness < 0.008
598
- else "medium"
599
- )
600
-
601
- # Unibrow detection
602
- inner_brow_dist = self._dist(lm[70], lm[300])
603
- attrs["possible_unibrow"] = inner_brow_dist < 0.04
604
-
605
- # === Nose ===
606
- nose_bridge_top = lm[6]
607
- nose_tip = lm[1]
608
- nose_bottom = lm[2]
609
- left_nostril = lm[129]
610
- right_nostril = lm[358]
611
- nostril_w = self._dist(left_nostril, right_nostril)
612
-
613
- nw_ratio = nostril_w / face_width if face_width > 0 else 0.24
614
- attrs["nostril_width"] = (
615
- "wide" if nw_ratio > 0.28
616
- else "narrow" if nw_ratio < 0.2
617
- else "average"
618
- )
619
-
620
- tip_angle = nose_tip["y"] - nose_bottom["y"]
621
- if tip_angle < -0.005:
622
- attrs["nose_shape"] = "upturned"
623
- elif tip_angle > 0.01:
624
- attrs["nose_shape"] = "aquiline"
625
- elif nw_ratio > 0.28:
626
- attrs["nose_shape"] = "wide"
627
- elif nw_ratio < 0.2:
628
- attrs["nose_shape"] = "narrow"
629
- else:
630
- attrs["nose_shape"] = "straight"
631
-
632
- attrs["nose_bridge"] = (
633
- "high" if nose_bridge_top["z"] < -0.05
634
- else "flat" if nose_bridge_top["z"] > 0.0
635
- else "average"
636
- )
637
-
638
- attrs["nose_tip_shape"] = (
639
- "pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01
640
- else "rounded"
641
- )
642
-
643
- # === Lips & Mouth ===
644
- upper_lip_top = lm[0]
645
- upper_lip_bottom = lm[13]
646
- lower_lip_top = lm[14]
647
- lower_lip_bottom = lm[17]
648
- mouth_left = lm[61]
649
- mouth_right = lm[291]
650
-
651
- upper_lip_h = self._dist(upper_lip_top, upper_lip_bottom)
652
- lower_lip_h = self._dist(lower_lip_top, lower_lip_bottom)
653
- total_lip_h = upper_lip_h + lower_lip_h
654
- mouth_w = self._dist(mouth_left, mouth_right)
655
-
656
- lip_ratio = total_lip_h / mouth_w if mouth_w > 0 else 0.3
657
- attrs["lip_fullness"] = (
658
- "full" if lip_ratio > 0.38
659
- else "thin" if lip_ratio < 0.22
660
- else "average"
661
- )
662
-
663
- attrs["lip_balance"] = (
664
- "top-heavy" if upper_lip_h > lower_lip_h * 1.2
665
- else "bottom-heavy" if lower_lip_h > upper_lip_h * 1.2
666
- else "balanced"
667
- )
668
-
669
- mw_ratio = mouth_w / face_width if face_width > 0 else 0.37
670
- attrs["mouth_width"] = (
671
- "wide" if mw_ratio > 0.42
672
- else "small" if mw_ratio < 0.32
673
- else "average"
674
- )
675
-
676
- # Cupid's bow
677
- cupid_left = lm[37]
678
- cupid_center = lm[0]
679
- cupid_right = lm[267]
680
- bow_depth = cupid_center["y"] - (cupid_left["y"] + cupid_right["y"]) / 2
681
- attrs["cupids_bow"] = (
682
- "defined" if bow_depth > 0.005
683
- else "subtle" if bow_depth > 0.002
684
- else "flat"
685
- )
686
-
687
- # Smile
688
- smile_l = blendshapes.get("mouthSmileLeft", 0)
689
- smile_r = blendshapes.get("mouthSmileRight", 0)
690
- attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
691
- attrs["smile_asymmetry"] = round(abs(smile_l - smile_r), 3)
692
-
693
- # Dimples (heuristic: strong smile with low cheek puff)
694
- attrs["possible_dimples"] = (
695
- (smile_l > 0.5 or smile_r > 0.5) and cheek_puff < 0.2
696
- )
697
-
698
- # === Facial Asymmetry ===
699
- symmetry_pairs = [
700
- (33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
701
- (61, 291), (159, 386), (145, 374), (46, 276),
702
- ]
703
- asymmetry_sum = 0.0
704
- for li, ri in symmetry_pairs:
705
- left_dist = abs(lm[li]["x"] - lm[1]["x"]) # distance to the nose-tip midline, not the image center
706
- right_dist = abs(lm[ri]["x"] - lm[1]["x"])
707
- asymmetry_sum += abs(left_dist - right_dist)
708
- attrs["facial_asymmetry_score"] = round(
709
- min(asymmetry_sum / len(symmetry_pairs) / 0.05, 1.0), 3
710
- )
711
-
712
- # === Blendshapes (head pose from the transformation matrix is not computed) ===
713
- attrs["blendshapes"] = blendshapes
714
-
715
- return attrs
716
-
717
- def _dist(self, a: dict, b: dict) -> float:
718
- return math.sqrt(
719
- (a["x"] - b["x"]) ** 2
720
- + (a["y"] - b["y"]) ** 2
721
- + (a.get("z", 0) - b.get("z", 0)) ** 2
722
- )
723
-
724
- def _jaw_angle(self, lm: list[dict]) -> float:
725
- chin = lm[152]
726
- left_jaw = lm[172]
727
- right_jaw = lm[397]
728
- v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
729
- v2 = (right_jaw["x"] - chin["x"], right_jaw["y"] - chin["y"])
730
- dot = v1[0] * v2[0] + v1[1] * v2[1]
731
- mag1 = math.sqrt(v1[0] ** 2 + v1[1] ** 2)
732
- mag2 = math.sqrt(v2[0] ** 2 + v2[1] ** 2)
733
- if mag1 * mag2 == 0:
734
- return 120.0
735
- cos_angle = max(-1, min(1, dot / (mag1 * mag2)))
736
- return math.acos(cos_angle) * (180 / math.pi)
737
- ```
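One caveat about the geometry above: MediaPipe returns `x` normalized by image width and `y` by image height, so ratios that mix horizontal and vertical distances (such as `wh_ratio`) depend on the photo's aspect ratio when computed directly on normalized values. The sketch below shows one possible correction, assuming the caller knows the image size. `pixel_distance` and `width_height_ratio` are hypothetical helpers, and `z` is deliberately ignored here.

```python
import math


def pixel_distance(a: dict, b: dict, width: int, height: int) -> float:
    """Euclidean distance in pixels between two normalized landmarks."""
    dx = (a["x"] - b["x"]) * width
    dy = (a["y"] - b["y"]) * height
    return math.hypot(dx, dy)


def width_height_ratio(
    top: dict, chin: dict, left: dict, right: dict, width: int, height: int
) -> float:
    """Face width divided by face height, measured in pixel space."""
    face_height = pixel_distance(top, chin, width, height)
    face_width = pixel_distance(left, right, width, height)
    return face_width / face_height if face_height > 0 else 1.0
```

For a 200x100 image, a face spanning half the width and the full height is 100 px in both directions, so the pixel-space ratio is 1.0, while the same landmarks give 0.5 in normalized coordinates.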
738
-
739
- #### face-service/analyzers/demographic_analyzer.py
740
-
741
- ```python
742
- """
743
- FairFace — Age, Gender, Race prediction
744
- Trained on the race-balanced FairFace dataset to reduce demographic bias.
745
- """
746
-
747
- import os
748
- from typing import Any
749
-
750
- import cv2
751
- import numpy as np
752
- import torch
753
- import torchvision.transforms as transforms
754
- from huggingface_hub import hf_hub_download
755
- from PIL import Image
756
- from torchvision import models
757
-
758
-
759
- class DemographicAnalyzer:
760
- """FairFace-based age, gender, race classifier."""
761
-
762
- AGE_LABELS = [
763
- "0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"
764
- ]
765
- GENDER_LABELS = ["Male", "Female"]
766
- RACE_LABELS = [
767
- "White", "Black", "Latino_Hispanic", "East Asian",
768
- "Southeast Asian", "Indian", "Middle Eastern"
769
- ]
770
-
771
- def __init__(self):
772
- self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
773
- self.model = self._load_model()
774
- self.transform = transforms.Compose([
775
- transforms.Resize((224, 224)),
776
- transforms.ToTensor(),
777
- transforms.Normalize(
778
- mean=[0.485, 0.456, 0.406],
779
- std=[0.229, 0.224, 0.225],
780
- ),
781
- ])
782
-
783
- def _load_model(self):
784
- """Load FairFace ResNet34 model."""
785
- model_path = "models/fairface_model.pt"
786
- if not os.path.exists(model_path):
787
- os.makedirs("models", exist_ok=True)
788
- # Download from HuggingFace mirror or original source
789
- # FairFace official weights: res34_fair_align_multi_7_20190809.pt
790
- try:
791
- hf_hub_download(
792
- repo_id="dchen236/FairFace",
793
- filename="res34_fair_align_multi_7_20190809.pt",
794
- local_dir="models",
795
- local_dir_use_symlinks=False,
796
- )
797
- os.rename(
798
- "models/res34_fair_align_multi_7_20190809.pt",
799
- model_path,
800
- )
801
- except Exception:
802
- # No fallback model exists: ask the user to supply the weights manually
803
- raise FileNotFoundError(
804
- "Please download FairFace weights from "
805
- "https://github.com/dchen236/FairFace and place at models/fairface_model.pt"
806
- )
807
-
808
- model = models.resnet34(weights=None)
809
- # FairFace has 3 output heads: race(7), gender(2), age(9) = 18
810
- model.fc = torch.nn.Linear(model.fc.in_features, 18)
811
- model.load_state_dict(torch.load(model_path, map_location=self.device))
812
- model.to(self.device)
813
- model.eval()
814
- return model
815
-
816
- def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
817
- """Predict age, gender, and race."""
818
- pil_image = Image.fromarray(img_rgb)
819
- input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
820
-
821
- with torch.no_grad():
822
- outputs = self.model(input_tensor)
823
-
824
- outputs = outputs.cpu().numpy()[0]
825
-
826
- # Split outputs: race(0-6), gender(7-8), age(9-17)
827
- race_logits = outputs[0:7]
828
- gender_logits = outputs[7:9]
829
- age_logits = outputs[9:18]
830
-
831
- race_probs = self._softmax(race_logits)
832
- gender_probs = self._softmax(gender_logits)
833
- age_probs = self._softmax(age_logits)
834
-
835
- race_idx = int(np.argmax(race_probs))
836
- gender_idx = int(np.argmax(gender_probs))
837
- age_idx = int(np.argmax(age_probs))
838
-
839
- # Estimate numeric age from bucket
840
- age_ranges = [(0, 2), (3, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 85)]
841
- age_estimate = sum(age_ranges[age_idx]) / 2
842
-
843
- return {
844
- "age_estimate": round(age_estimate, 1),
845
- "age_range": self.AGE_LABELS[age_idx],
846
- "age_confidence": round(float(age_probs[age_idx]), 3),
847
- "gender": self.GENDER_LABELS[gender_idx].lower(),
848
- "gender_confidence": round(float(gender_probs[gender_idx]), 3),
849
- "race": self.RACE_LABELS[race_idx],
850
- "race_confidence": round(float(race_probs[race_idx]), 3),
851
- "race_probabilities": {
852
- label: round(float(prob), 3)
853
- for label, prob in zip(self.RACE_LABELS, race_probs)
854
- },
855
- }
856
-
857
- @staticmethod
858
- def _softmax(x: np.ndarray) -> np.ndarray:
859
- e_x = np.exp(x - np.max(x))
860
- return e_x / e_x.sum()
861
- ```
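The analyzer above slices one 18-value output into race, gender, and age heads and reports the midpoint of the most likely age bucket. As a sketch of an alternative, the probability-weighted mean of the bucket midpoints gives a smoother estimate when the model is torn between neighbouring buckets. This is a possible refinement rather than what the service does. `split_heads` and `expected_age` are hypothetical names, and the midpoints are derived from the `age_ranges` list in the code above.

```python
import math
from typing import Sequence

# Midpoints of the nine age buckets, taken from age_ranges above.
AGE_MIDPOINTS = [1.0, 6.0, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 77.5]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one head's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(outputs: Sequence[float]):
    """Split the 18 outputs into race (7), gender (2), and age (9) logits."""
    if len(outputs) != 18:
        raise ValueError(f"expected 18 outputs, got {len(outputs)}")
    return outputs[0:7], outputs[7:9], outputs[9:18]


def expected_age(age_logits: Sequence[float]) -> float:
    """Probability-weighted mean of the bucket midpoints."""
    probs = softmax(age_logits)
    return sum(p * mid for p, mid in zip(probs, AGE_MIDPOINTS))
```

Each head gets its own softmax because the three tasks are independent classifications that merely share one output layer.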
862
-
863
- #### face-service/analyzers/attribute_analyzer.py
864
-
865
- ```python
866
- """
867
- CelebA Multi-Label Attribute Classifier
868
- Predicts 40 binary facial attributes from CelebA-trained model.
869
- Uses a pretrained model from HuggingFace.
870
- """
871
-
872
- import os
873
- from typing import Any
874
-
875
- import numpy as np
876
- import torch
877
- import torchvision.transforms as transforms
878
- from PIL import Image
879
-
880
-
881
- CELEBA_ATTRIBUTES = [
882
- "5_o_Clock_Shadow", "Arched_Eyebrows", "Attractive", "Bags_Under_Eyes",
883
- "Bald", "Bangs", "Big_Lips", "Big_Nose", "Black_Hair", "Blond_Hair",
884
- "Blurry", "Brown_Hair", "Bushy_Eyebrows", "Chubby", "Double_Chin",
885
- "Eyeglasses", "Goatee", "Gray_Hair", "Heavy_Makeup", "High_Cheekbones",
886
- "Male", "Mouth_Slightly_Open", "Mustache", "Narrow_Eyes", "No_Beard",
887
- "Oval_Face", "Pale_Skin", "Pointy_Nose", "Receding_Hairline",
888
- "Rosy_Cheeks", "Sideburns", "Smiling", "Straight_Hair", "Wavy_Hair",
889
- "Wearing_Earrings", "Wearing_Hat", "Wearing_Lipstick", "Wearing_Necklace",
890
- "Wearing_Necktie", "Young",
891
- ]
892
-
893
-
894
- class AttributeAnalyzer:
895
- """CelebA 40-attribute binary classifier using a fine-tuned ResNet."""
896
-
897
- def __init__(self):
898
- self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
899
- self.model = self._load_model()
900
- self.transform = transforms.Compose([
901
- transforms.Resize((224, 224)),
902
- transforms.ToTensor(),
903
- transforms.Normalize(
904
- mean=[0.485, 0.456, 0.406],
905
- std=[0.229, 0.224, 0.225],
906
- ),
907
- ])
908
-
909
- def _load_model(self):
910
- """
911
- Load a CelebA attribute prediction model.
912
- Using a ResNet-18 fine-tuned on CelebA for 40 attributes.
913
- """
914
- from torchvision import models
915
-
916
- model_path = "models/celeba_resnet18.pt"
917
-
918
- if not os.path.exists(model_path):
919
- os.makedirs("models", exist_ok=True)
920
- # Try loading from HuggingFace
921
- try:
922
- from huggingface_hub import hf_hub_download
923
- hf_hub_download(
924
- repo_id="jnferreira/attribute-prediction-celebA",
925
- filename="model.pt",
926
- local_dir="models",
927
- local_dir_use_symlinks=False,
928
- )
929
- os.rename("models/model.pt", model_path)
930
- except Exception:
931
- # Fallback: build a fresh model skeleton
932
- # Users will need to train or provide weights
933
- model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
934
- model.fc = torch.nn.Linear(model.fc.in_features, 40)
935
- torch.save(model.state_dict(), model_path)
936
- print(
937
- "WARNING: Using ImageNet-pretrained ResNet18 without CelebA fine-tuning. "
938
- "Attribute predictions will be inaccurate. "
939
- "Please provide CelebA-trained weights at models/celeba_resnet18.pt"
940
- )
941
-
942
- model = models.resnet18(weights=None)
943
- model.fc = torch.nn.Linear(model.fc.in_features, 40)
944
- model.load_state_dict(
945
- torch.load(model_path, map_location=self.device)
946
- )
947
- model.to(self.device)
948
- model.eval()
949
- return model
950
-
951
- def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
952
- pil_image = Image.fromarray(img_rgb)
953
- input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
954
-
955
- with torch.no_grad():
956
- logits = self.model(input_tensor)
957
-
958
- probs = torch.sigmoid(logits).cpu().numpy()[0]
959
-
960
- # Build structured results
961
- raw_attrs = {
962
- attr: round(float(prob), 3)
963
- for attr, prob in zip(CELEBA_ATTRIBUTES, probs)
964
- }
965
-
966
- # Interpret into user-friendly categories
967
- result: dict[str, Any] = {"celeba_raw": raw_attrs}
968
-
969
- # Hair color (pick highest confidence)
970
- hair_colors = {
971
- "black": raw_attrs.get("Black_Hair", 0),
972
- "brown": raw_attrs.get("Brown_Hair", 0),
973
- "blonde": raw_attrs.get("Blond_Hair", 0),
974
- "gray": raw_attrs.get("Gray_Hair", 0),
975
- }
976
- result["hair_color_celeba"] = max(hair_colors, key=hair_colors.get)
977
-
978
- # Hair type
979
- if raw_attrs.get("Straight_Hair", 0) > 0.5:
980
- result["hair_type_celeba"] = "straight"
981
- elif raw_attrs.get("Wavy_Hair", 0) > 0.5:
982
- result["hair_type_celeba"] = "wavy"
983
- else:
984
- result["hair_type_celeba"] = "unknown"
985
-
986
- result["bald"] = raw_attrs.get("Bald", 0) > 0.5
987
- result["bangs"] = raw_attrs.get("Bangs", 0) > 0.5
988
- result["receding_hairline"] = raw_attrs.get("Receding_Hairline", 0) > 0.5
989
-
990
- # Facial hair
991
- has_beard = raw_attrs.get("No_Beard", 0) < 0.5
992
- has_goatee = raw_attrs.get("Goatee", 0) > 0.5
993
- has_mustache = raw_attrs.get("Mustache", 0) > 0.5
994
- has_sideburns = raw_attrs.get("Sideburns", 0) > 0.5
995
- has_stubble = raw_attrs.get("5_o_Clock_Shadow", 0) > 0.5
996
-
997
- if has_goatee:
998
- result["facial_hair"] = "goatee"
999
- elif has_mustache and has_beard:
1000
- result["facial_hair"] = "full_beard"
1001
- elif has_mustache:
1002
- result["facial_hair"] = "mustache"
1003
- elif has_sideburns:
1004
- result["facial_hair"] = "sideburns"
1005
- elif has_stubble:
1006
- result["facial_hair"] = "stubble"
1007
- elif not has_beard:
1008
- result["facial_hair"] = "clean_shaven"
1009
- else:
1010
- result["facial_hair"] = "beard"
1011
-
1012
- # Appearance attributes
1013
- result["wearing_glasses"] = raw_attrs.get("Eyeglasses", 0) > 0.5
1014
- result["wearing_hat"] = raw_attrs.get("Wearing_Hat", 0) > 0.5
1015
- result["bushy_eyebrows"] = raw_attrs.get("Bushy_Eyebrows", 0) > 0.5
1016
- result["arched_eyebrows_celeba"] = raw_attrs.get("Arched_Eyebrows", 0) > 0.5
1017
- result["bags_under_eyes"] = raw_attrs.get("Bags_Under_Eyes", 0) > 0.5
1018
- result["high_cheekbones_celeba"] = raw_attrs.get("High_Cheekbones", 0) > 0.5
1019
- result["oval_face_celeba"] = raw_attrs.get("Oval_Face", 0) > 0.5
1020
- result["pointy_nose_celeba"] = raw_attrs.get("Pointy_Nose", 0) > 0.5
1021
- result["big_lips_celeba"] = raw_attrs.get("Big_Lips", 0) > 0.5
1022
- result["big_nose_celeba"] = raw_attrs.get("Big_Nose", 0) > 0.5
1023
- result["narrow_eyes_celeba"] = raw_attrs.get("Narrow_Eyes", 0) > 0.5
1024
- result["double_chin"] = raw_attrs.get("Double_Chin", 0) > 0.5
1025
- result["chubby"] = raw_attrs.get("Chubby", 0) > 0.5
1026
- result["rosy_cheeks"] = raw_attrs.get("Rosy_Cheeks", 0) > 0.5
1027
- result["pale_skin"] = raw_attrs.get("Pale_Skin", 0) > 0.5
1028
- result["young"] = raw_attrs.get("Young", 0) > 0.5
1029
- result["smiling_celeba"] = raw_attrs.get("Smiling", 0) > 0.5
1030
- result["mouth_open"] = raw_attrs.get("Mouth_Slightly_Open", 0) > 0.5
1031
-
1032
- return result
1033
- ```
1034
-
1035
- #### face-service/analyzers/parsing_analyzer.py
1036
-
1037
- ```python
1038
- """
1039
- BiSeNet Face Parsing — 19-class semantic segmentation of the face.
1040
- Segments: skin, eyebrows, eyes, nose, lips, hair, ears, neck, etc.
1041
- """
1042
-
1043
- import os
1044
- from typing import Any
1045
-
1046
- import cv2
1047
- import numpy as np
1048
- import torch
1049
- from torchvision import transforms
1050
-
1051
-
1052
- class ParsingAnalyzer:
1053
- """
1054
- BiSeNet face parsing for hair/skin/feature segmentation.
1055
-
1056
- Parsing classes:
1057
- 0: background, 1: skin, 2: l_brow, 3: r_brow, 4: l_eye, 5: r_eye,
1058
- 6: eye_g (glasses), 7: l_ear, 8: r_ear, 9: ear_r (earring),
1059
- 10: nose, 11: mouth, 12: u_lip, 13: l_lip, 14: neck,
1060
- 15: necklace, 16: cloth, 17: hair, 18: hat
1061
- """
1062
-
1063
- LABELS = {
1064
- 0: "background", 1: "skin", 2: "left_brow", 3: "right_brow",
1065
- 4: "left_eye", 5: "right_eye", 6: "glasses", 7: "left_ear",
1066
- 8: "right_ear", 9: "earring", 10: "nose", 11: "mouth",
1067
- 12: "upper_lip", 13: "lower_lip", 14: "neck", 15: "necklace",
1068
- 16: "cloth", 17: "hair", 18: "hat",
1069
- }
1070
-
1071
- def __init__(self):
1072
- self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
1073
- self.model = self._load_model()
1074
- self.transform = transforms.Compose([
1075
- transforms.ToTensor(),
1076
- transforms.Normalize(
1077
- mean=[0.485, 0.456, 0.406],
1078
- std=[0.229, 0.224, 0.225],
1079
- ),
1080
- ])
1081
-
1082
- def _load_model(self):
1083
- model_path = "models/bisenet_face_parsing.pt"
1084
- if not os.path.exists(model_path):
1085
- os.makedirs("models", exist_ok=True)
1086
- # BiSeNet model from face-parsing.PyTorch
1087
- # Download from: https://drive.google.com/file/d/154JgKpzCPW82qINcVieuPH3fZ2e0P812
1088
- raise FileNotFoundError(
1089
- "Please download BiSeNet face parsing weights from "
1090
- "https://github.com/zllrunning/face-parsing.PyTorch and place at "
1091
- "models/bisenet_face_parsing.pt"
1092
- )
1093
-
1094
- from models.bisenet_model import BiSeNet # You'll need to include this
1095
- model = BiSeNet(n_classes=19)
1096
- model.load_state_dict(
1097
- torch.load(model_path, map_location=self.device)
1098
- )
1099
- model.to(self.device)
1100
- model.eval()
1101
- return model
1102
-
1103
- def analyze(self, img_bgr: np.ndarray) -> dict[str, Any]:
1104
- h, w = img_bgr.shape[:2]
1105
- img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
1106
- img_resized = cv2.resize(img_rgb, (512, 512))
1107
-
1108
- input_tensor = self.transform(img_resized).unsqueeze(0).to(self.device)
1109
-
1110
- with torch.no_grad():
1111
- output = self.model(input_tensor)[0] # BiSeNet returns tuple
1112
-
1113
- parsing = output.squeeze(0).argmax(0).cpu().numpy()
1114
- parsing = cv2.resize(
1115
- parsing.astype(np.uint8), (w, h), interpolation=cv2.INTER_NEAREST
1116
- )
1117
-
1118
- # Generate masks
1119
- skin_mask = (parsing == 1).astype(np.uint8)
1120
- hair_mask = (parsing == 17).astype(np.uint8)
1121
- glasses_mask = (parsing == 6).astype(np.uint8)
1122
- hat_mask = (parsing == 18).astype(np.uint8)
1123
-
1124
- # Facial hair detection: look for dark pixels in lower face skin region
1125
- lower_face = parsing[int(h * 0.55):int(h * 0.85), int(w * 0.25):int(w * 0.75)]
1126
- lower_skin = (lower_face == 1).sum()
1127
- total_lower = lower_face.size or 1
1128
-
1129
- # Region stats
1130
- hair_area = hair_mask.sum() / (h * w)
1131
- skin_area = skin_mask.sum() / (h * w)
1132
-
1133
- result: dict[str, Any] = {
1134
- "_skin_mask": skin_mask,
1135
- "_hair_mask": hair_mask,
1136
- "has_glasses_parsing": int(glasses_mask.sum()) > 100,
1137
- "wearing_hat_parsing": int(hat_mask.sum()) > 500,
1138
- "hair_coverage": round(float(hair_area), 3),
1139
- "skin_coverage": round(float(skin_area), 3),
1140
- }
1141
-
1142
- # Hair length estimation from mask
1143
- if hair_area < 0.01:
1144
- result["hair_length_estimate"] = "bald"
1145
- elif hair_area < 0.08:
1146
- result["hair_length_estimate"] = "short"
1147
- elif hair_area < 0.18:
1148
- result["hair_length_estimate"] = "medium"
1149
- else:
1150
- result["hair_length_estimate"] = "long"
1151
-
1152
- # Wrinkle analysis on forehead skin
1153
- forehead_region = img_bgr[int(h * 0.05):int(h * 0.25), int(w * 0.3):int(w * 0.7)]
1154
- forehead_skin = skin_mask[int(h * 0.05):int(h * 0.25), int(w * 0.3):int(w * 0.7)]
1155
- if forehead_skin.sum() > 100:
1156
- gray_forehead = cv2.cvtColor(forehead_region, cv2.COLOR_BGR2GRAY)
1157
- # Apply mask
1158
- gray_forehead = cv2.bitwise_and(gray_forehead, gray_forehead, mask=forehead_skin)
1159
- edges = cv2.Canny(gray_forehead, 30, 80)
1160
- edge_density = edges.sum() / (forehead_skin.sum() * 255 + 1)
1161
- result["forehead_wrinkle_score"] = round(float(edge_density), 3)
1162
- result["forehead_wrinkles"] = (
1163
- "heavy" if edge_density > 0.15
1164
- else "moderate" if edge_density > 0.08
1165
- else "mild" if edge_density > 0.04
1166
- else "none"
1167
- )
1168
-
1169
- # Freckles/moles detection on skin
1170
- skin_region = cv2.bitwise_and(img_bgr, img_bgr, mask=skin_mask)
1171
- gray_skin = cv2.cvtColor(skin_region, cv2.COLOR_BGR2GRAY)
1172
- # Detect dark spots
1173
- _, dark_spots = cv2.threshold(gray_skin, 80, 255, cv2.THRESH_BINARY_INV)
1174
- dark_spots = cv2.bitwise_and(dark_spots, dark_spots, mask=skin_mask)
1175
- # Find contours of dark spots
1176
- contours, _ = cv2.findContours(dark_spots, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
1177
- small_spots = [c for c in contours if 5 < cv2.contourArea(c) < 200]
1178
- result["possible_freckles_moles"] = len(small_spots) > 10
1179
- result["dark_spot_count"] = len(small_spots)
1180
-
1181
- return result
1182
- ```
1183
-
1184
- #### face-service/analyzers/emotion_analyzer.py
1185
-
1186
- ```python
1187
- """
1188
- HSEmotion — State-of-the-art facial emotion recognition.
1189
- Supports 8 emotions on AffectNet.
1190
- """
1191
-
1192
- import os
1193
- from typing import Any
1194
-
1195
- import cv2
1196
- import numpy as np
1197
- import torch
1198
- import torchvision.transforms as transforms
1199
- from PIL import Image
1200
-
1201
-
1202
- class EmotionAnalyzer:
1203
- """HSEmotion-based facial expression classifier."""
1204
-
1205
- EMOTION_LABELS = [
1206
- "angry", "contempt", "disgust", "fear",
1207
- "happy", "neutral", "sad", "surprise",
1208
- ]
1209
-
1210
- def __init__(self):
1211
- self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
1212
- self.model = self._load_model()
1213
- self.transform = transforms.Compose([
1214
- transforms.Resize((260, 260)),
1215
- transforms.CenterCrop(224),
1216
- transforms.ToTensor(),
1217
- transforms.Normalize(
1218
- mean=[0.485, 0.456, 0.406],
1219
- std=[0.229, 0.224, 0.225],
1220
- ),
1221
- ])
1222
-
1223
- def _load_model(self):
1224
- """Load HSEmotion EfficientNet model."""
1225
- model_path = "models/hsemotion_enet_b0_8.pt"
1226
-
1227
- if not os.path.exists(model_path):
1228
- os.makedirs("models", exist_ok=True)
1229
- try:
1230
- from huggingface_hub import hf_hub_download
1231
- # HSEmotion models available at:
1232
- # https://github.com/HSE-asavchenko/face-emotion-recognition
1233
- hf_hub_download(
1234
- repo_id="HSE-asavchenko/hsemotion",
1235
- filename="enet_b0_8_best_afew.pt",
1236
- local_dir="models",
1237
- local_dir_use_symlinks=False,
1238
- )
1239
- os.rename("models/enet_b0_8_best_afew.pt", model_path)
1240
- except Exception:
1241
- raise FileNotFoundError(
1242
- "Please download HSEmotion weights from "
1243
- "https://github.com/HSE-asavchenko/face-emotion-recognition"
1244
- )
1245
-
1246
- import timm
1247
- model = timm.create_model("efficientnet_b0", pretrained=False, num_classes=8)
1248
- model.load_state_dict(torch.load(model_path, map_location=self.device))
1249
- model.to(self.device)
1250
- model.eval()
1251
- return model
1252
-
1253
- def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
1254
- pil_image = Image.fromarray(img_rgb)
1255
- input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
1256
-
1257
- with torch.no_grad():
1258
- logits = self.model(input_tensor)
1259
-
1260
- probs = torch.softmax(logits, dim=1).cpu().numpy()[0]
1261
- top_idx = int(np.argmax(probs))
1262
-
1263
- return {
1264
- "emotion": self.EMOTION_LABELS[top_idx],
1265
- "emotion_confidence": round(float(probs[top_idx]), 3),
1266
- "emotion_probabilities": {
1267
- label: round(float(prob), 3)
1268
- for label, prob in zip(self.EMOTION_LABELS, probs)
1269
- },
1270
- }
1271
- ```
1272
-
1273
- #### face-service/analyzers/color_analyzer.py
1274
-
1275
- ```python
1276
- """
1277
- Pixel-level color analysis using segmentation masks from BiSeNet
1278
- and landmark positions from MediaPipe.
1279
- """
1280
-
1281
- from typing import Any, Optional
1282
-
1283
- import cv2
1284
- import numpy as np
1285
- from sklearn.cluster import KMeans
1286
-
1287
-
1288
- class ColorAnalyzer:
1289
- """Analyzes skin tone, eye color, and hair color from pixel data."""
1290
-
1291
- def analyze(
1292
- self,
1293
- img_rgb: np.ndarray,
1294
- skin_mask: Optional[np.ndarray] = None,
1295
- hair_mask: Optional[np.ndarray] = None,
1296
- landmark_data: Optional[list[dict]] = None,
1297
- ) -> dict[str, Any]:
1298
- h, w = img_rgb.shape[:2]
1299
- results: dict[str, Any] = {}
1300
-
1301
- # === Skin Tone ===
1302
- if skin_mask is not None and skin_mask.sum() > 100:
1303
- skin_pixels = img_rgb[skin_mask > 0]
1304
- # Convert to LAB for perceptually uniform brightness
1305
- skin_lab = cv2.cvtColor(
1306
- skin_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
1307
- ).reshape(-1, 3)
1308
- avg_l = float(skin_lab[:, 0].mean()) # L channel (brightness)
1309
-
1310
- if avg_l > 180:
1311
- results["skin_tone"] = "very_light"
1312
- elif avg_l > 155:
1313
- results["skin_tone"] = "light"
1314
- elif avg_l > 130:
1315
- results["skin_tone"] = "medium_light"
1316
- elif avg_l > 105:
1317
- results["skin_tone"] = "medium"
1318
- elif avg_l > 80:
1319
- results["skin_tone"] = "medium_dark"
1320
- else:
1321
- results["skin_tone"] = "dark"
1322
-
1323
- results["skin_tone_score"] = round(avg_l / 255, 3)
1324
-
1325
- # Fitzpatrick scale approximation
1326
- if avg_l > 170:
1327
- results["fitzpatrick_type"] = "I"
1328
- elif avg_l > 145:
1329
- results["fitzpatrick_type"] = "II"
1330
- elif avg_l > 120:
1331
- results["fitzpatrick_type"] = "III"
1332
- elif avg_l > 95:
1333
- results["fitzpatrick_type"] = "IV"
1334
- elif avg_l > 70:
1335
- results["fitzpatrick_type"] = "V"
1336
- else:
1337
- results["fitzpatrick_type"] = "VI"
1338
-
1339
- # === Hair Color ===
1340
- if hair_mask is not None and hair_mask.sum() > 500:
1341
- hair_pixels = img_rgb[hair_mask > 0]
1342
-
1343
- # K-means to find dominant hair color
1344
- if len(hair_pixels) > 100:
1345
- sample_size = min(5000, len(hair_pixels))
1346
- indices = np.random.choice(len(hair_pixels), sample_size, replace=False)
1347
- sampled = hair_pixels[indices].astype(np.float32)
1348
-
1349
- kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
1350
- kmeans.fit(sampled)
1351
-
1352
- # Pick the cluster with most members
1353
- labels, counts = np.unique(kmeans.labels_, return_counts=True)
1354
- dominant_idx = labels[np.argmax(counts)]
1355
- dominant_color = kmeans.cluster_centers_[dominant_idx].astype(int)
1356
-
1357
- r, g, b = dominant_color
1358
- brightness = (int(r) + int(g) + int(b)) / 3
1359
-
1360
- # Classify hair color
1361
- hsv_color = cv2.cvtColor(
1362
- np.array([[dominant_color]], dtype=np.uint8), cv2.COLOR_RGB2HSV
1363
- )[0][0]
1364
- hue, sat, val = int(hsv_color[0]), int(hsv_color[1]), int(hsv_color[2])
1365
-
1366
- if brightness < 40:
1367
- results["hair_color_detected"] = "black"
1368
- elif brightness > 190:
1369
- results["hair_color_detected"] = "platinum_blonde"
1370
- elif brightness > 160 and sat < 50:
1371
- results["hair_color_detected"] = "gray"
1372
- elif brightness > 140 and (hue > 15 and hue < 35):
1373
- results["hair_color_detected"] = "blonde"
1374
- elif (hue < 15 or hue > 160) and sat > 80:
1375
- results["hair_color_detected"] = "red"
1376
- elif brightness > 60:
1377
- results["hair_color_detected"] = "brown"
1378
- else:
1379
- results["hair_color_detected"] = "dark_brown"
1380
-
1381
- results["hair_dominant_rgb"] = [int(r), int(g), int(b)]
1382
-
1383
- # Hair texture analysis (FFT-based)
1384
- hair_region = cv2.bitwise_and(
1385
- img_rgb,
1386
- img_rgb,
1387
- mask=hair_mask,
1388
- )
1389
- gray_hair = cv2.cvtColor(hair_region, cv2.COLOR_RGB2GRAY)
1390
- # Mask out non-hair regions
1391
- gray_hair_masked = gray_hair[hair_mask > 0]
1392
-
1393
- if len(gray_hair_masked) > 1000:
1394
- # Compute local variance as texture indicator
1395
- # High frequency = curly, low frequency = straight
1396
- hair_patch = gray_hair_masked[:1024].astype(np.float32)
1397
- fft = np.fft.fft(hair_patch)
1398
- magnitude = np.abs(fft)
1399
- # Ratio of high freq to low freq energy
1400
- low_freq = magnitude[:len(magnitude) // 4].sum()
1401
- high_freq = magnitude[len(magnitude) // 4:].sum()
1402
- freq_ratio = high_freq / (low_freq + 1e-6)
1403
-
1404
- if freq_ratio > 0.8:
1405
- results["hair_texture_detected"] = "curly"
1406
- elif freq_ratio > 0.5:
1407
- results["hair_texture_detected"] = "wavy"
1408
- else:
1409
- results["hair_texture_detected"] = "straight"
1410
-
1411
- # === Eye Color ===
1412
- if landmark_data is not None and len(landmark_data) > 473:
1413
- for eye_name, iris_idx in [("left", 468), ("right", 473)]:
1414
- ix = int(landmark_data[iris_idx]["x"] * w)
1415
- iy = int(landmark_data[iris_idx]["y"] * h)
1416
-
1417
- # Sample a small patch around iris
1418
- pad = 3
1419
- y1 = max(0, iy - pad)
1420
- y2 = min(h, iy + pad)
1421
- x1 = max(0, ix - pad)
1422
- x2 = min(w, ix + pad)
1423
-
1424
- iris_patch = img_rgb[y1:y2, x1:x2]
1425
- if iris_patch.size == 0:
1426
- continue
1427
-
1428
- avg_color = iris_patch.mean(axis=(0, 1))
1429
- r, g, b = avg_color
1430
-
1431
- # Convert to HSV for better classification
1432
- hsv = cv2.cvtColor(
1433
- np.array([[avg_color]], dtype=np.uint8), cv2.COLOR_RGB2HSV
1434
- )[0][0]
1435
- hue_val, sat_val, val_val = int(hsv[0]), int(hsv[1]), int(hsv[2])
1436
-
1437
- if val_val < 60:
1438
- color = "dark_brown"
1439
- elif sat_val < 30:
1440
- color = "gray"
1441
- elif hue_val > 100 and hue_val < 130 and sat_val > 50:
1442
- color = "blue"
1443
- elif hue_val > 35 and hue_val < 85 and sat_val > 40:
1444
- color = "green"
1445
- elif (hue_val > 15 and hue_val < 35) and sat_val > 40:
1446
- color = "hazel"
1447
- elif val_val > 120 and sat_val > 60:
1448
- color = "amber"
1449
- else:
1450
- color = "brown"
1451
-
1452
- results[f"{eye_name}_eye_color"] = color
1453
-
1454
- # Consensus
1455
- if "left_eye_color" in results and "right_eye_color" in results:
1456
- if results["left_eye_color"] == results["right_eye_color"]:
1457
- results["eye_color"] = results["left_eye_color"]
1458
- else:
1459
- results["eye_color"] = results["left_eye_color"] # Use left as primary
1460
- results["heterochromia"] = True
1461
-
1462
- return results
1463
- ```
1464
-
1465
- #### face-service/Dockerfile
1466
-
1467
- ```dockerfile
1468
- FROM python:3.11-slim
1469
-
1470
- WORKDIR /app
1471
-
1472
- # Install system dependencies for OpenCV
1473
- RUN apt-get update && apt-get install -y \
1474
- libgl1-mesa-glx \
1475
- libglib2.0-0 \
1476
- curl \
1477
- && rm -rf /var/lib/apt/lists/*
1478
-
1479
- COPY requirements.txt .
1480
- RUN pip install --no-cache-dir -r requirements.txt
1481
-
1482
- COPY . .
1483
-
1484
- # Download MediaPipe model at build time
1485
- RUN python -c "from analyzers.landmark_analyzer import LandmarkAnalyzer; LandmarkAnalyzer()"
1486
-
1487
- EXPOSE 8000
1488
-
1489
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
1490
- ```
1491
-
1492
- ### Connect Your Next.js App to the Microservice
1493
-
1494
- #### lib/face-analysis/api-client.ts
1495
-
1496
- ```typescript
1497
- /**
1498
- * Client for the Python face analysis microservice.
1499
- * Replaces the Supabase Edge Function that called AWS Rekognition.
1500
- */
1501
-
1502
- const FACE_SERVICE_URL = process.env.NEXT_PUBLIC_FACE_SERVICE_URL || "http://localhost:8000";
1503
-
1504
- export interface FaceAnalysisResult {
1505
- // Geometric (MediaPipe)
1506
- face_shape: string;
1507
- face_shape_metrics: Record<string, number>;
1508
- forehead_width: string;
1509
- jawline_type: string;
1510
- jawline_angle: number;
1511
- chin_type: string;
1512
- cheekbone_prominence: string;
1513
- cheek_fullness: string;
1514
- eye_shape: string;
1515
- eye_depth: string;
1516
- eye_spacing: string;
1517
- eye_size: string;
1518
- eyebrow_shape: string;
1519
- eyebrow_arch_height: string;
1520
- eyebrow_thickness: string;
1521
- possible_unibrow: boolean;
1522
- nose_shape: string;
1523
- nose_bridge: string;
1524
- nose_tip_shape: string;
1525
- nostril_width: string;
1526
- lip_fullness: string;
1527
- lip_balance: string;
1528
- mouth_width: string;
1529
- cupids_bow: string;
1530
- smiling: boolean;
1531
- smile_asymmetry: number;
1532
- possible_dimples: boolean;
1533
- facial_asymmetry_score: number;
1534
-
1535
- // Demographics (FairFace)
1536
- age_estimate: number;
1537
- age_range: string;
1538
- age_confidence: number;
1539
- gender: string;
1540
- gender_confidence: number;
1541
- race: string;
1542
- race_confidence: number;
1543
- race_probabilities: Record<string, number>;
1544
-
1545
- // CelebA Attributes
1546
- facial_hair: string;
1547
- wearing_glasses: boolean;
1548
- bald: boolean;
1549
- receding_hairline: boolean;
1550
- hair_color_celeba: string;
1551
- hair_type_celeba: string;
1552
- bags_under_eyes: boolean;
1553
- double_chin: boolean;
1554
- bushy_eyebrows: boolean;
1555
- high_cheekbones_celeba: boolean;
1556
-
1557
- // Emotion (HSEmotion)
1558
- emotion: string;
1559
- emotion_confidence: number;
1560
- emotion_probabilities: Record<string, number>;
1561
-
1562
- // Color Analysis
1563
- skin_tone: string;
1564
- skin_tone_score: number;
1565
- fitzpatrick_type: string;
1566
- eye_color: string;
1567
- hair_color_detected: string;
1568
- hair_dominant_rgb: number[];
1569
- hair_texture_detected: string;
1570
-
1571
- // Parsing
1572
- hair_length_estimate: string;
1573
- forehead_wrinkles: string;
1574
- possible_freckles_moles: boolean;
1575
- dark_spot_count: number;
1576
-
1577
- // Blendshapes
1578
- blendshapes: Record<string, number>;
1579
- }
1580
-
1581
- export async function analyzeFace(imageFile: File): Promise<FaceAnalysisResult> {
1582
- const formData = new FormData();
1583
- formData.append("file", imageFile);
1584
-
1585
- const response = await fetch(`${FACE_SERVICE_URL}/analyze`, {
1586
- method: "POST",
1587
- body: formData,
1588
- });
1589
-
1590
- if (!response.ok) {
1591
- const error = await response.json().catch(() => ({ detail: "Unknown error" }));
1592
- throw new Error(`Face analysis failed: ${error.detail}`);
1593
- }
1594
-
1595
- const result = await response.json();
1596
-
1597
- if (!result.success) {
1598
- throw new Error("Face analysis returned unsuccessful result");
1599
- }
1600
-
1601
- return result.data;
1602
- }
1603
-
1604
- export async function checkServiceHealth(): Promise<boolean> {
1605
- try {
1606
- const response = await fetch(`${FACE_SERVICE_URL}/health`);
1607
- return response.ok;
1608
- } catch {
1609
- return false;
1610
- }
1611
- }
1612
- ```
1613
-
1614
- ### Deploy to Hugging Face Spaces (Free)
1615
-
1616
- Create a `README.md` in the `face-service/` directory with the following frontmatter:
1617
-
1618
- ```yaml
1619
- ---
1620
- title: HCP Face Analysis
1621
- emoji: 🔍
1622
- colorFrom: blue
1623
- colorTo: purple
1624
- sdk: docker
1625
- app_port: 8000
1626
- ---
1627
- ```
1628
-
1629
- ---
1630
-
1631
- ## Final Architecture Summary
1632
-
1633
- ```
1634
- Browser (Next.js)
1635
-
1636
- │ POST /analyze (image file)
1637
-
1638
- Hugging Face Spaces (FREE, 2GB RAM)
1639
- ├── FastAPI Server
1640
- ├── MediaPipe (4MB) ──────► 478 landmarks → ~40 geometric features
1641
- ├── FairFace (90MB) ──────► age, gender, race
1642
- ├── CelebA ResNet (44MB) ─► 40 binary attributes (hair, beard, glasses...)
1643
- ├── BiSeNet (50MB) ───────► face parsing → hair/skin segmentation
1644
- ├── HSEmotion (20MB) ─────► 8 emotions
1645
- └── Color Analysis ───────► skin tone, eye color, hair color
1646
-
1647
- │ JSON response (~150 attributes)
1648
-
1649
- Supabase (existing)
1650
- ├── Store results in PostgreSQL
1651
- └── Auth / Storage unchanged
1652
- ```
1653
-
1654
- | Metric | Value |
1655
- |--------|-------|
1656
- | **Total models** | ~210MB |
1657
- | **Features detected** | **~95% of the full feature list** |
1658
- | **Hosting cost** | **$0** (HF Spaces free tier) |
1659
- | **Latency** | ~2-4s per image (CPU) |
1660
- | **Languages** | Python (microservice) + TypeScript (existing Next.js) |
1661
- | **Only missing** | Teeth analysis, scar detection, Adam's apple (require specialized fine-tuned models) |
1662
-
1663
- ---
1664
-
1665
- ## Required Feature List
1666
-
1667
- ### Face shape
1668
- - Oval face, Round face, Square face, Heart-shaped face, Diamond face, Long/oblong face, Triangle face
1669
- - Jawline sharp, Jawline soft, Strong jaw, Receding chin, Pointed chin, Cleft chin, Wide chin
1670
- - High cheekbones, Flat cheekbones, Full cheeks, Hollow cheeks
1671
- - Broad forehead, Narrow forehead
1672
-
1673
- ### Eye shape
1674
- - Almond, Round, Hooded, Monolid, Deep-set eyes, Protruding eyes
1675
- - Upturned eyes, Downturned eyes, Wide-set eyes, Close-set eyes, Large eyes, Small eyes
1676
- - Eye color: brown, blue, green, hazel
1677
- - Dark under-eyes, Eye bags, Crow's feet
1678
-
1679
- ### Eyebrows
1680
- - Thick, Thin, Arched, Straight, Bushy, Unibrow
1681
- - High eyebrow arch, Low eyebrow arch
1682
-
1683
- ### Nose
1684
- - Straight, Aquiline, Button, Upturned, Wide, Narrow
1685
- - Flat bridge, High bridge, Wide nostrils, Narrow nostrils
1686
- - Rounded tip, Pointed tip
1687
-
1688
- ### Lips & Mouth
1689
- - Full, Thin, Wide mouth, Small mouth
1690
- - Defined cupid's bow, Uneven lips
1691
- - Gap teeth, Crooked teeth, Straight teeth, Overbite, Underbite
1692
- - Dimples, Smile lines, Asymmetrical smile
1693
-
1694
- ### Hair
1695
- - Straight, Wavy, Curly, Coily
1696
- - Short, Long, Bald, Receding hairline, Widow's peak
1697
- - Thick, Thin
1698
- - Color: black, brown, blonde, red, gray, dyed
1699
-
1700
- ### Facial hair
1701
- - Full beard, Stubble, Goatee, Mustache, Clean-shaven, Sideburns
1702
-
1703
- ### Skin & Other
1704
- - Skin tone: light, medium, dark
1705
- - Freckles, Moles, Birthmark, Scar, Acne
1706
- - Wrinkles, Forehead lines, Smile lines
1707
- - Facial asymmetry, Prominent Adam's apple
 
+ # HCP Face Analysis — Architecture
+
+ ## Pipeline
+
+ A single photo is fed through seven analyzers. Their outputs are merged
+ into one dictionary; later analyzers overwrite any colliding keys from
+ earlier ones.
+
+ ```
+ Photo (RGB ndarray)
+ │
+ ├─► [1] MediaPipe Face Landmarker
+ │       → 478 landmarks + 52 blendshapes
+ │       → all geometric features (face/eye/nose/eyebrow/lip/jaw shape),
+ │         smiling (mouthSmile blendshapes), eyes_open, possible_dimples,
+ │         possible_unibrow, facial_asymmetry_score, blendshapes dict
+ │
+ ├─► [2] FairFace + Ethnicity ViT (DemographicAnalyzer)
+ │       → age_range, age_estimate (softmax-weighted continuous), age_confidence,
+ │         gender + confidence, ethnicity + confidence, full distributions
+ │
+ ├─► [3] SegFormer-B5 human parsing (ParsingAnalyzer)
+ │       → per-class pixel masks (face, hair, hat, …)
+ │       → hair_length, hair_present, hat_detected,
+ │         wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles
+ │         (uses OpenCV stats over the SegFormer face mask for the skin rows)
+ │
+ ├─► [4] HSEmotion EfficientNet-B0 (EmotionAnalyzer)
+ │       → primary/secondary emotion, emotion_scores (8 classes),
+ │         valence, arousal, mood
+ │
+ ├─► [5] ColorAnalyzer (no ML — OpenCV LAB/HSV)
+ │       inputs: SegFormer skin/hair masks + MediaPipe landmarks
+ │       → skin_tone (Fitzpatrick + L*/a*/b* + hex), skin_undertone,
+ │         eye_color, hair_color (name + hex), hair_texture (pixel-Laplacian, coarse),
+ │         lip_color (shade + hex) ← lip mask built from MediaPipe outer-minus-inner lip
+ │
+ ├─► [6] ObstructionViT — dima806/face_obstruction_image_detection
+ │       → wearing_glasses, wearing_sunglasses, wearing_mask,
+ │         obstruction_top, obstruction_scores
+ │
+ └─► [7] HairTypeViT — dima806/hair_type_image_detection
+         → hair_type (curly/dreadlocks/kinky/straight/wavy),
+           hair_type_confidence, hair_type_scores
+ ```
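
Editorial sketch (not part of the commit): the "softmax-weighted continuous" `age_estimate` in step [2] can be read as an expected value over the age bins. The snippet below is a minimal illustration under stated assumptions: the bin labels follow the usual FairFace age ranges, while the bin midpoints and the function name `estimate_age` are hypothetical and are not taken from `DemographicAnalyzer`.

```python
import math

# Assumed FairFace-style age bins; the midpoints are illustrative guesses.
AGE_BINS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
AGE_MIDPOINTS = [1.0, 6.0, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 75.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_age(logits):
    """Return the top age bin plus a probability-weighted continuous age."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    # Expected value: each bin midpoint weighted by its softmax probability.
    age = sum(p * mid for p, mid in zip(probs, AGE_MIDPOINTS))
    return {
        "age_range": AGE_BINS[top],
        "age_estimate": round(age, 1),
        "age_confidence": round(probs[top], 3),
    }
```

A sharply peaked distribution collapses to the midpoint of the winning bin, while a flat one drifts toward the mean of all midpoints, which is why the continuous value is best read together with `age_confidence`.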
+
+ All masks and other internal fields use a leading underscore in the key
+ (e.g. `_skin_mask`). `app.py` strips those before returning JSON so the
+ client never sees them.
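
Editorial sketch (not part of the commit): the merge-then-strip behaviour described above can be illustrated in a few lines. The helper names `merge_results` and `strip_internal` are hypothetical; the actual code in `app.py` is not shown in this diff and may be organised differently.

```python
from typing import Any


def merge_results(*partials: dict[str, Any]) -> dict[str, Any]:
    """Merge analyzer outputs in order; later dicts win on key collisions."""
    merged: dict[str, Any] = {}
    for partial in partials:
        merged.update(partial)
    return merged


def strip_internal(results: dict[str, Any]) -> dict[str, Any]:
    """Drop internal fields (leading underscore) before JSON serialization."""
    return {key: value for key, value in results.items() if not key.startswith("_")}


# Illustrative data only: a parsing result carrying an internal mask,
# followed by a later analyzer that overrides a colliding key.
parsing = {"_skin_mask": object(), "hair_present": True, "hair_type": "unknown"}
hair = {"hair_type": "wavy"}
public = strip_internal(merge_results(parsing, hair))
```

Stripping at the response boundary, rather than inside each analyzer, keeps the masks available to downstream analyzers such as the color step while keeping them out of the client payload.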
+
+ ## Attribute source map
+
+ The EditProfileScreen renders only fields backed by one of these
+ analyzers. Anything previously fed by the FaRL zero-shot classifier
+ has been removed because its outputs were too noisy to trust.
+
+ | Section | Field(s) | Source |
+ |---|---|---|
+ | Demographics | gender, age (continuous), age_range, ethnicity, distributions | FairFace + Ethnicity ViT |
+ | Emotion | primary/secondary emotion, scores, valence, arousal, mood | HSEmotion |
+ | Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe |
+ | Hair | hair_length, hair_present | SegFormer |
+ | Hair | hair_type (+ confidence) | HairTypeViT |
+ | Hair | hair_color, hair hex | ColorAnalyzer |
+ | Eyes | eye_shape, eye_depth, eye_spacing, eye_size, eyes_open | MediaPipe |
+ | Eyes | eye_color | ColorAnalyzer |
+ | Eyebrows | eyebrow_shape, eyebrow_arch_height, eyebrow_thickness, possible_unibrow | MediaPipe |
+ | Nose | nose_shape, nose_bridge, nose_tip_shape, nostril_width | MediaPipe |
+ | Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
+ | Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
+ | Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
+ | Skin | wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles | SegFormer mask + OpenCV stats |
+ | Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT |
+ | Accessories | wearing_hat | SegFormer (hat class coverage) |
+
+ ## Deployment
+
+ The service is built as a Docker image targeting Hugging Face Spaces
+ free tier (2GB RAM, shared CPU). The MediaPipe `.task` is pulled at
+ build time; all Hugging Face models lazy-download on first inference
+ and cache under `/root/.cache/huggingface` inside the container.
+
+ The Node/Express server forwards `/analyze-face` requests to
+ `FACE_SERVICE_URL/analyze-base64`. The React Native client never talks
+ to this service directly.
+
+ ## Adding a new analyzer
+
+ 1. Drop a new module under `analyzers/` exposing a class with
+    `__init__()` and `analyze(img_rgb) -> dict`.
+ 2. Import it in `app.py`, add a global slot and a lazy-load block in
+    `get_analyzers()`, and append a `results.update(...)` call to both
+    `/analyze` and `/analyze-base64`.
+ 3. Surface the new keys in `client/src/screens/EditProfileScreen.js`
+    and add a legend row in the "Analysis Method Details" section.
+
+ Order matters: later analyzers overwrite earlier keys on collision.
+ The specialized ViT classifiers run last so they win over any coarser
+ signal.
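
Editorial sketch (not part of the commit): the three steps above amount to an ordered, lazily populated set of analyzers whose outputs are folded into one dictionary. The snippet is a self-contained illustration under stated assumptions: `BrightnessAnalyzer`, `_FACTORIES`, `_analyzers`, and `run_pipeline` are hypothetical names, the image is a nested list rather than an ndarray so that the example needs only the standard library, and the real `get_analyzers()` in `app.py` is described as using global slots rather than a registry.

```python
from typing import Any, Callable


class BrightnessAnalyzer:
    """Hypothetical analyzer following the __init__() / analyze(img_rgb) contract."""

    def __init__(self) -> None:
        # A real analyzer would load its model weights here.
        self.loaded = True

    def analyze(self, img_rgb: list[list[tuple[int, int, int]]]) -> dict[str, Any]:
        channels = [c for row in img_rgb for pixel in row for c in pixel]
        return {"mean_brightness": sum(channels) / len(channels)}


# Ordered registry: list position decides who wins on key collisions.
_FACTORIES: list[tuple[str, Callable[[], Any]]] = [("brightness", BrightnessAnalyzer)]
_analyzers: dict[str, Any] = {}


def get_analyzers() -> list[Any]:
    """Instantiate each analyzer on first use, then reuse the cached instance."""
    for name, factory in _FACTORIES:
        if name not in _analyzers:
            _analyzers[name] = factory()
    return [_analyzers[name] for name, _ in _FACTORIES]


def run_pipeline(img_rgb: list[list[tuple[int, int, int]]]) -> dict[str, Any]:
    """Run every analyzer in registry order and merge the outputs."""
    results: dict[str, Any] = {}
    for analyzer in get_analyzers():
        results.update(analyzer.analyze(img_rgb))
    return results
```

Keeping the order in one list makes the "later wins" rule explicit: appending a new analyzer to the end of the registry is what lets it override coarser signals from earlier ones.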
 
requirements.txt CHANGED
@@ -13,6 +13,3 @@ timm==1.0.3
  safetensors>=0.6.0
  transformers==4.45.2
  hsemotion>=0.2.2
- openai-clip==1.0.1
- ftfy
- regex