Commit ee3a08a, committed by Evan Li
Parent: dfb09f4
Relabeling, discarding CLIP, replacing attributes where they can with new models or mediapipe
- Dockerfile +4 -5
- README.md +30 -19
- analyzers/__init__.py +8 -0
- analyzers/attribute_analyzer.py +0 -194
- analyzers/color_analyzer.py +128 -75
- analyzers/demographic_analyzer.py +105 -30
- analyzers/emotion_analyzer.py +61 -30
- analyzers/hair_type_analyzer.py +87 -0
- analyzers/landmark_analyzer.py +113 -28
- analyzers/obstruction_analyzer.py +108 -0
- analyzers/parsing_analyzer.py +78 -35
- app.py +124 -53
- architecture.md +99 -1707
- requirements.txt +0 -3
Dockerfile
CHANGED
@@ -13,15 +13,14 @@ WORKDIR /app
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-# Pre-download MediaPipe model at build time so first request is fast
+# Pre-download MediaPipe model at build time so first request is fast.
+# All other models (FairFace, SegFormer, HSEmotion, ObstructionViT,
+# HairTypeViT) are pulled from Hugging Face on first request and cached
+# in /root/.cache/huggingface for the rest of the process lifetime.
 RUN mkdir -p models && \
     wget -q -O models/face_landmarker.task \
     "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
 
-# Pre-download FaRL (face-tuned CLIP ViT-B/16) weights for attribute classifier
-RUN wget -q -O models/FaRL-Base-Patch16-LAIONFace20M-ep64.pth \
-    "https://github.com/FacePerceiver/FaRL/releases/download/pretrained_weights/FaRL-Base-Patch16-LAIONFace20M-ep64.pth"
-
 COPY . .
 
 EXPOSE 7860
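The Dockerfile comment describes models that are fetched on first request and then kept for the lifetime of the process. A minimal sketch of that load-once pattern follows; `register`, `get_analyzer`, and `_FACTORIES` are hypothetical names chosen for illustration and are not taken from `app.py`.

```python
from functools import lru_cache
from typing import Callable

# Hypothetical registry: analyzer name -> zero-argument factory.
_FACTORIES: dict[str, Callable[[], object]] = {}


def register(name: str, factory: Callable[[], object]) -> None:
    """Record how to build an analyzer without building it yet."""
    _FACTORIES[name] = factory


@lru_cache(maxsize=None)
def get_analyzer(name: str) -> object:
    """Build the analyzer on first use; later calls reuse the cached instance."""
    return _FACTORIES[name]()
```

With this shape the first request pays the download and load cost, and every later request in the same process reuses the instance. Whether the service uses a cache decorator or module-level globals is not visible in this diff.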
README.md
CHANGED
@@ -10,25 +10,33 @@ pinned: false
 
 # HCP Face Analysis Microservice
 
-
-
-
-## Models
-
-| Model | Task | Size |
-|-------|------|------|
-| MediaPipe Face Landmarker | 478 3D landmarks + blendshapes | ~4 MB |
-| FairFace
-|
-|
-|
-|
-
-
-
-
--
-
+FastAPI service that runs seven specialized analyzers over a single photo
+and returns a merged dictionary of ~100 facial attributes.
+
+## Models
+
+| # | Component | Model | Task | Size |
+|---|-----------|-------|------|------|
+| 1 | MediaPipe Face Landmarker | `face_landmarker.task` (Google) | 478 3D landmarks + 52 ARKit blendshapes — geometric features, smiling, mouth-open | ~4 MB |
+| 2 | FairFace age | `dima806/fairface_age_image_detection` (ViT-B/16) | 9-bucket age → softmax-weighted continuous estimate | ~340 MB |
+| 2 | FairFace gender | `dima806/fairface_gender_image_detection` (ViT-B/16) | Binary gender (~93.4% acc) | ~340 MB |
+| 2 | Ethnicity | `cledoux42/Ethnicity_Test_v003` (ViT) | 5-class ethnicity (~79.6% acc) | ~340 MB |
+| 3 | Human parsing | `matei-dorian/segformer-b5-finetuned-human-parsing` | 18-class pixel segmentation → masks + hair length + hat | ~340 MB |
+| 4 | Emotion | HSEmotion `enet_b0_8_best_afew` (EfficientNet-B0) | 8-class emotion + valence/arousal | ~20 MB |
+| 5 | Color analysis | (no model — OpenCV LAB/HSV) | Skin tone, hair color, eye color, lip color | 0 MB |
+| 6 | Obstruction | `dima806/face_obstruction_image_detection` (ViT-B/16) | glasses / sunglasses / mask (~99% precision) | ~340 MB |
+| 7 | Hair type | `dima806/hair_type_image_detection` (ViT-B/16) | curly/dreadlocks/kinky/straight/wavy (~93% acc) | ~340 MB |
+
+All analyzers are lazy-loaded on first request. The MediaPipe weight
+file is pre-downloaded at Docker build time; all Hugging Face models
+are cached on first inference.
+
+## API endpoints
+
+- `GET /` — service info
+- `GET /health` — liveness check
+- `POST /analyze` — multipart file upload
+- `POST /analyze-base64` — JSON `{ "image": "<base64>" }`
 
 ## Usage
 
@@ -37,3 +45,6 @@ curl -X POST https://YOUR-SPACE.hf.space/analyze-base64 \
   -H "Content-Type: application/json" \
   -d '{"image": "<base64-encoded-image>"}'
 ```
+
+See [architecture.md](./architecture.md) for the pipeline diagram and the
+full per-attribute model attribution table.
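The README's "9-bucket age → softmax-weighted continuous estimate" can be read as an expected value over the age buckets. The sketch below illustrates that reading under stated assumptions: the bucket labels follow FairFace's nine age ranges, but the representative age assigned to each bucket and the label order are assumptions, and `expected_age` is a hypothetical helper rather than the service's function.

```python
import math

# FairFace's nine age ranges. The representative age for each bucket is an
# assumption made for this sketch, as is the ordering of the labels.
AGE_BUCKETS = [
    ("0-2", 1.0), ("3-9", 6.0), ("10-19", 14.5), ("20-29", 24.5),
    ("30-39", 34.5), ("40-49", 44.5), ("50-59", 54.5), ("60-69", 64.5),
    ("more than 70", 75.0),
]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits: list[float]) -> float:
    """Weight each bucket's representative age by its softmax probability."""
    if len(logits) != len(AGE_BUCKETS):
        raise ValueError("expected one logit per age bucket")
    probs = softmax(logits)
    return sum(p * age for p, (_, age) in zip(probs, AGE_BUCKETS))
```

Compared with taking the arg-max bucket, the weighted sum moves smoothly between buckets when the classifier is split between two neighbouring ranges.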
analyzers/__init__.py
CHANGED
@@ -1 +1,9 @@
 # face-service analyzers package
+#
+# Each analyzer in this package exposes a class with:
+#   __init__(self) — load model, register device
+#   analyze(self, img_rgb) -> dict — run inference, return attribute dict
+#
+# Analyzers are independent: they don't import from each other. Cross-
+# analyzer plumbing (passing SegFormer masks into ColorAnalyzer, etc.)
+# is orchestrated entirely in app.py.
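The package comment describes a shared analyzer interface with all cross-analyzer wiring kept in `app.py`. A minimal sketch of that arrangement is shown here; the stub classes, their return values, and `run_pipeline` are hypothetical stand-ins, not the service's real analyzers or orchestration code.

```python
from typing import Any, Protocol


class Analyzer(Protocol):
    """Interface from the package comment: analyze(img_rgb) -> dict."""

    def analyze(self, img_rgb: Any) -> dict[str, Any]: ...


class StubParsingAnalyzer:
    """Hypothetical stand-in for the SegFormer-based parsing analyzer."""

    def analyze(self, img_rgb: Any) -> dict[str, Any]:
        return {"hair_mask": "mask-placeholder", "hair_length": "short"}


class StubColorAnalyzer:
    """Hypothetical stand-in that accepts a mask from an upstream analyzer."""

    def analyze(self, img_rgb: Any, hair_mask: Any = None) -> dict[str, Any]:
        name = "unknown" if hair_mask is None else "brown"
        return {"hair_color": {"name": name}}


def run_pipeline(img_rgb: Any, parsing: Any, color: Any) -> dict[str, Any]:
    """Orchestrator owns the plumbing; the analyzers never import each other."""
    parsed = parsing.analyze(img_rgb)
    merged: dict[str, Any] = {"hair_length": parsed["hair_length"]}
    merged.update(color.analyze(img_rgb, hair_mask=parsed["hair_mask"]))
    return merged
```

Keeping the mask hand-off in one function means an analyzer can be replaced or removed, as this commit does with the CLIP-based one, without touching its siblings.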
analyzers/attribute_analyzer.py
DELETED
@@ -1,194 +0,0 @@
-"""
-FaRL-based facial attribute classification.
-
-Same CLIP ViT-B/16 architecture as before, but loaded with FaRL weights
-(CVPR 2022) which were pretrained on LAION-Face — the 50M face-text-pair
-subset of LAION-400M — instead of OpenAI's generic web crawl. The encoder
-discriminates facial attributes much better while keeping the prompt-pair
-zero-shot interface intact.
-
-Falls back to vanilla OpenAI CLIP ViT-B/16 if the FaRL .pth is missing.
-"""
-
-import os
-from pathlib import Path
-from typing import Any
-
-import clip
-import torch
-from PIL import Image
-
-
-CLIP_ARCH = "ViT-B/16"
-FARL_WEIGHTS_PATH = os.environ.get(
-    "FARL_WEIGHTS_PATH", "models/FaRL-Base-Patch16-LAIONFace20M-ep64.pth"
-)
-
-PAIRS = {
-    "wearing_glasses": ("wearing eyeglasses", "not wearing eyeglasses"),
-    "wearing_hat": ("wearing a hat", "not wearing a hat"),
-    "has_beard": ("has a beard", "does not have a beard"),
-    "mustache": ("has a mustache", "does not have a mustache"),
-    "goatee": ("has a goatee", "does not have a goatee"),
-    "sideburns": ("has sideburns", "does not have sideburns"),
-    "has_bangs": ("has bangs", "does not have bangs"),
-    "is_bald": ("is bald", "has hair"),
-    "receding_hairline": ("has a receding hairline", "has a full hairline"),
-    "wearing_earrings": ("wearing earrings", "not wearing earrings"),
-    "wearing_necklace": ("wearing a necklace", "not wearing a necklace"),
-    "wearing_necktie": ("wearing a necktie", "not wearing a necktie"),
-    "heavy_makeup": ("wearing heavy makeup", "not wearing makeup"),
-    "wearing_lipstick": ("wearing lipstick", "not wearing lipstick"),
-    "big_nose": ("has a big nose", "has a small nose"),
-    "pointy_nose": ("has a pointy nose", "has a rounded nose"),
-    "big_lips": ("has big lips", "has thin lips"),
-    "high_cheekbones": ("has high cheekbones", "has low cheekbones"),
-    "oval_face_celeba": ("has an oval face", "has a non-oval face"),
-    "double_chin": ("has a double chin", "does not have a double chin"),
-    "chubby": ("has a chubby face", "has a slim face"),
-    "rosy_cheeks": ("has rosy cheeks", "does not have rosy cheeks"),
-    "bags_under_eyes": ("has bags under the eyes", "does not have bags under the eyes"),
-    "narrow_eyes": ("has narrow eyes", "has wide eyes"),
-    "arched_eyebrows": ("has arched eyebrows", "has straight eyebrows"),
-    "bushy_eyebrows": ("has bushy eyebrows", "has thin eyebrows"),
-    "pale_skin": ("has pale skin", "has medium skin"),
-    "attractive": ("an attractive face", "an ordinary face"),
-    "young": ("a young person", "an older person"),
-    "smiling_celeba": ("smiling", "not smiling"),
-    "mouth_open": ("mouth open", "mouth closed"),
-}
-
-HAIR_COLOR_LABELS = ["black hair", "blond hair", "brown hair", "gray hair"]
-HAIR_TEXTURE_LABELS = ["straight hair", "wavy hair", "curly hair"]
-
-ACCESSORY_THRESHOLD = 0.65
-ACCESSORY_KEYS = {
-    "wearing_earrings", "wearing_necklace", "wearing_necktie", "wearing_hat",
-    "heavy_makeup", "wearing_lipstick",
-}
-
-
-def _prompt(text: str) -> str:
-    return f"a photo of {text}"
-
-
-class AttributeAnalyzer:
-    def __init__(self):
-        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-        self.model = None
-        self.preprocess = None
-        try:
-            model, preprocess = clip.load(CLIP_ARCH, device="cpu")
-
-            weights_path = Path(FARL_WEIGHTS_PATH)
-            if weights_path.exists():
-                farl_state = torch.load(weights_path, map_location="cpu")
-                state = farl_state.get("state_dict", farl_state)
-                missing, unexpected = model.load_state_dict(state, strict=False)
-                print(
-                    f"[AttributeAnalyzer] Loaded FaRL weights from {weights_path} "
-                    f"(missing={len(missing)}, unexpected={len(unexpected)})"
-                )
-            else:
-                print(
-                    f"[AttributeAnalyzer] FaRL weights not found at {weights_path}; "
-                    "falling back to vanilla OpenAI CLIP ViT-B/16"
-                )
-
-            # Force float32 so per-pair softmax math is stable on both CPU and CUDA.
-            self.model = model.float().to(self.device).eval()
-            self.preprocess = preprocess
-        except Exception as exc:
-            print(f"[AttributeAnalyzer] Failed to load model: {exc}")
-
-    @torch.no_grad()
-    def analyze(self, img_rgb) -> dict[str, Any]:
-        if self.model is None or self.preprocess is None:
-            return self._empty_result()
-
-        pil = Image.fromarray(img_rgb)
-        image_tensor = self.preprocess(pil).unsqueeze(0).to(self.device)
-        image_features = self.model.encode_image(image_tensor)
-        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
-
-        pair_scores: dict[str, float] = {}
-        for key, (positive, negative) in PAIRS.items():
-            pair_scores[key] = self._softmax_positive(
-                image_features, [_prompt(positive), _prompt(negative)]
-            )
-
-        color_scores = self._group_softmax(
-            image_features, [_prompt(x) for x in HAIR_COLOR_LABELS]
-        )
-        texture_scores = self._group_softmax(
-            image_features, [_prompt(x) for x in HAIR_TEXTURE_LABELS]
-        )
-
-        hair_color_name = HAIR_COLOR_LABELS[int(torch.argmax(torch.tensor(color_scores)))].split()[0]
-        hair_texture_name = HAIR_TEXTURE_LABELS[int(torch.argmax(torch.tensor(texture_scores)))].split()[0]
-
-        def flag(key: str) -> bool:
-            score = pair_scores.get(key, 0.0)
-            threshold = ACCESSORY_THRESHOLD if key in ACCESSORY_KEYS else 0.5
-            return score >= threshold
-
-        result: dict[str, Any] = {
-            "_celeba_raw": {k: round(v, 3) for k, v in pair_scores.items()},
-            "hair_color_celeba": hair_color_name,
-            "hair_color_scores": {
-                label.split()[0]: round(float(score), 3)
-                for label, score in zip(HAIR_COLOR_LABELS, color_scores)
-            },
-            "hair_texture_celeba": hair_texture_name,
-        }
-
-        for key in PAIRS:
-            result[key] = flag(key)
-
-        beard_score = pair_scores.get("has_beard", 0.0)
-        result["facial_hair"] = {
-            "5_o_clock_shadow": 0.45 < beard_score < 0.7,
-            "goatee": flag("goatee"),
-            "mustache": flag("mustache"),
-            "sideburns": flag("sideburns"),
-            "full_beard": beard_score > 0.7,
-        }
-
-        return result
-
-    @torch.no_grad()
-    def _softmax_positive(self, image_features: torch.Tensor, prompts: list[str]) -> float:
-        text_tokens = clip.tokenize(prompts).to(self.device)
-        text_features = self.model.encode_text(text_tokens)
-        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
-        logits = (image_features @ text_features.T) * self.model.logit_scale.exp()
-        probs = torch.softmax(logits, dim=-1)[0]
-        return float(probs[0])
-
-    @torch.no_grad()
-    def _group_softmax(self, image_features: torch.Tensor, prompts: list[str]) -> list[float]:
-        text_tokens = clip.tokenize(prompts).to(self.device)
-        text_features = self.model.encode_text(text_tokens)
-        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
-        logits = (image_features @ text_features.T) * self.model.logit_scale.exp()
-        probs = torch.softmax(logits, dim=-1)[0]
-        return [float(p) for p in probs]
-
-    @staticmethod
-    def _empty_result() -> dict[str, Any]:
-        base: dict[str, Any] = {
-            "_celeba_raw": {},
-            "hair_color_celeba": "unknown",
-            "hair_color_scores": {"black": 0.0, "blond": 0.0, "brown": 0.0, "gray": 0.0},
-            "hair_texture_celeba": "unknown",
-            "facial_hair": {
-                "5_o_clock_shadow": False,
-                "goatee": False,
-                "mustache": False,
-                "sideburns": False,
-                "full_beard": False,
-            },
-        }
-        for key in PAIRS:
-            base[key] = False
-        return base
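The deleted analyzer scored each attribute by taking a softmax over a positive and a negative prompt and then thresholding the positive probability. The arithmetic can be sketched without the model, as below. The cosine similarities are passed in as plain floats, the default `logit_scale` of 100.0 is an assumption standing in for `self.model.logit_scale.exp()`, and `pair_probability` and `flag` are illustrative names; the 0.65 and 0.5 thresholds are the ones in the removed file.

```python
import math


def pair_probability(sim_pos: float, sim_neg: float, logit_scale: float = 100.0) -> float:
    """Softmax over two scaled cosine similarities; returns P(positive prompt)."""
    a = sim_pos * logit_scale
    b = sim_neg * logit_scale
    peak = max(a, b)  # subtract the max for numerical stability
    exp_a = math.exp(a - peak)
    exp_b = math.exp(b - peak)
    return exp_a / (exp_a + exp_b)


def flag(prob: float, is_accessory: bool) -> bool:
    """Accessories used the stricter 0.65 threshold; everything else used 0.5."""
    threshold = 0.65 if is_accessory else 0.5
    return prob >= threshold
```

Because the similarities are multiplied by a large scale, a gap of only 0.01 between the two prompts already moves the probability to roughly 0.73, so these scores are very sensitive to prompt wording.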
analyzers/color_analyzer.py
CHANGED
@@ -1,13 +1,35 @@
 """
-
-
-
-
-
-
-
-
--
 """
 
 from typing import Any
@@ -15,8 +37,9 @@ from typing import Any
 import cv2
 import numpy as np
 
-# Fitzpatrick scale boundaries
-# OpenCV's uint8 LAB stores L scaled to 0–255, so we rescale before
 FITZPATRICK_SCALE = [
     (85, 100, "Type I - Very Fair"),
     (70, 85, "Type II - Fair"),
@@ -26,24 +49,15 @@ FITZPATRICK_SCALE = [
     (0, 25, "Type VI - Dark Brown/Black"),
 ]
 
-
-
-
-    "green": {"h_range": (35, 80), "s_min": 30},
-    "blue": {"h_range": (90, 130), "s_min": 30},
-    "gray": {"h_range": (0, 180), "s_max": 30},
-    "amber": {"h_range": (15, 25), "s_min": 80},
-}
-
-# Hair-texture thresholds on std(Laplacian) computed over the *eroded* hair
-# mask (so the mask boundary itself doesn't contribute high-frequency energy).
-# These are reasonable starting points — tune on your own dataset.
 HAIR_TEXTURE_CURLY_THRESHOLD = 25.0
 HAIR_TEXTURE_WAVY_THRESHOLD = 15.0
 
-# MediaPipe FaceMesh lip contours.
-# inner ring traces the mouth opening
-#
 MEDIAPIPE_LIP_OUTER = [
     61, 146, 91, 181, 84, 17, 314, 405, 321, 375,
     291, 409, 270, 269, 267, 0, 37, 39, 40, 185,
@@ -56,7 +70,8 @@ MEDIAPIPE_LIP_INNER = [
 
 class ColorAnalyzer:
     def __init__(self):
-
 
     def analyze(
         self,
@@ -78,36 +93,38 @@ class ColorAnalyzer:
         if lip_mask is not None:
             lip_mask = lip_mask.astype(bool)
 
-        # SegFormer human-parsing has no
-        #
-        #
         if (lip_mask is None or lip_mask.sum() < 50) and landmarks:
             derived = self._lip_mask_from_landmarks(landmarks, h, w)
             if derived is not None:
                 lip_mask = derived
 
         # ── Skin Tone ────────────────────────────────────────────────
         if skin_mask is not None and skin_mask.sum() > 100:
             skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
             skin_pixels = skin_lab[skin_mask]
 
-            #
-            #
-            # roughly -128..127) so the Fitzpatrick bins and undertone
-            # thresholds operate in standard units.
             mean_l_raw = float(np.mean(skin_pixels[:, 0]))
             mean_l = mean_l_raw * 100.0 / 255.0
             mean_a = float(np.mean(skin_pixels[:, 1])) - 128.0
             mean_b = float(np.mean(skin_pixels[:, 2])) - 128.0
 
-            # Fitzpatrick
             fitz = "Unknown"
             for low, high, label in FITZPATRICK_SCALE:
                 if low <= mean_l < high:
                     fitz = label
                     break
 
-            #
             avg_rgb = np.mean(img_rgb[skin_mask], axis=0).astype(int)
             hex_color = "#{:02x}{:02x}{:02x}".format(*avg_rgb)
 
@@ -120,9 +137,10 @@ class ColorAnalyzer:
                 "rgb": avg_rgb.tolist(),
             }
 
-            # Undertone
-            #
-            #
             if mean_b > 12:
                 result["skin_undertone"] = "warm"
             elif mean_b < -8:
@@ -134,32 +152,34 @@ class ColorAnalyzer:
             result["skin_undertone"] = "unknown"
 
         # ── Eye Color ────────────────────────────────────────────────
         if landmarks and len(landmarks) > 473:
-            eye_color = self._detect_eye_color(img_rgb, landmarks, h, w)
-            result["eye_color"] = eye_color
         elif landmarks and len(landmarks) > 362:
-
-            eye_color = self._detect_eye_color_fallback(img_rgb, landmarks, h, w)
-            result["eye_color"] = eye_color
         else:
             result["eye_color"] = "unknown"
 
-        # ── Hair Color ────────────────────────────────────
         if hair_mask is not None and hair_mask.sum() > 200:
-
-            result["hair_color"] = hair_color_info
-
             result["hair_texture"] = self._estimate_hair_texture(img_rgb, hair_mask)
         else:
             result["hair_color"] = {"name": "unknown"}
             result["hair_texture"] = "unknown"
 
         # ── Lip Color ────────────────────────────────────────────────
         if lip_mask is not None and lip_mask.sum() > 50:
             lip_pixels = img_rgb[lip_mask]
             avg_lip = np.mean(lip_pixels, axis=0).astype(int)
             hex_lip = "#{:02x}{:02x}{:02x}".format(*avg_lip)
 
             lip_hsv = cv2.cvtColor(
                 avg_lip.reshape(1, 1, 3).astype(np.uint8),
                 cv2.COLOR_RGB2HSV
@@ -194,38 +214,43 @@ class ColorAnalyzer:
     def _estimate_hair_color(
         img_rgb: np.ndarray, hair_mask: np.ndarray
     ) -> dict[str, Any]:
-        """
 
         Why median + L*-trim instead of k=2 k-means:
-        - K-means with k=2 splits highlight vs shadow within a single
-          color, so the "bigger cluster" can flip between photos
-          person depending on lighting. Median is robust
-
-
         """
         hair_pixels = img_rgb[hair_mask]  # (N, 3) uint8 RGB
 
-        #
         hair_lab = cv2.cvtColor(
             hair_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
         ).reshape(-1, 3)
         l_lo, l_hi = np.percentile(hair_lab[:, 0], [10, 90])
         keep = (hair_lab[:, 0] >= l_lo) & (hair_lab[:, 0] <= l_hi)
         core_pixels = hair_pixels[keep] if keep.sum() > 50 else hair_pixels
 
         dominant_rgb = np.median(core_pixels, axis=0)
         dominant_rgb = np.clip(dominant_rgb, 0, 255).astype(np.uint8)
 
         hex_hair = "#{:02x}{:02x}{:02x}".format(*dominant_rgb)
 
         hair_hsv = cv2.cvtColor(
             dominant_rgb.reshape(1, 1, 3), cv2.COLOR_RGB2HSV
         )[0, 0]
         h_val, s_val, v_val = int(hair_hsv[0]), int(hair_hsv[1]), int(hair_hsv[2])
 
-        # Classification cascade — order matters. Falls through to
-        #
-        #
         if v_val < 45 and s_val < 60:
             hair_color_name = "black"
         elif s_val < 25:
@@ -234,8 +259,8 @@ class ColorAnalyzer:
         elif (h_val < 12 or h_val > 168) and s_val > 60:
             hair_color_name = "red/auburn"
         elif 18 <= h_val <= 35 and v_val > 160 and s_val < 140:
-            # Blond
-            #
             hair_color_name = "blond"
         elif 5 <= h_val <= 30:
             hair_color_name = "brown" if v_val > 80 else "dark brown"
@@ -253,21 +278,28 @@ class ColorAnalyzer:
     def _estimate_hair_texture(
         img_rgb: np.ndarray, hair_mask: np.ndarray
     ) -> str:
-        """
 
-
-
-
-        zeroed-region implementation was inadvertently measuring.
         """
         kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
         inner_mask = cv2.erode(
             hair_mask.astype(np.uint8), kernel, iterations=2
         ).astype(bool)
 
         if inner_mask.sum() < 200:
             return "unknown"
 
         hair_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
         lap = cv2.Laplacian(hair_gray, cv2.CV_64F, ksize=3)
         texture_score = float(np.std(lap[inner_mask]))
@@ -286,11 +318,15 @@ class ColorAnalyzer:
     def _lip_mask_from_landmarks(
         landmarks: list[dict], h: int, w: int
     ) -> np.ndarray | None:
-        """Build a lip-flesh mask by filling outer
         max_idx = max(MEDIAPIPE_LIP_OUTER + MEDIAPIPE_LIP_INNER)
         if len(landmarks) <= max_idx:
             return None
 
         def _poly(indices: list[int]) -> np.ndarray:
             return np.array(
                 [
@@ -300,6 +336,8 @@ class ColorAnalyzer:
                 dtype=np.int32,
             )
 
         mask = np.zeros((h, w), dtype=np.uint8)
         cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_OUTER)], 255)
         cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_INNER)], 0)
@@ -312,15 +350,17 @@ class ColorAnalyzer:
     def _detect_eye_color(
         self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
     ) -> str:
-        """
-
         iris_points = [(int(lm[i]["x"] * w), int(lm[i]["y"] * h)) for i in iris_indices]
 
-        # Create a small mask around iris center
         cx = int(np.mean([p[0] for p in iris_points]))
         cy = int(np.mean([p[1] for p in iris_points]))
         radius = max(3, int(np.std([p[0] for p in iris_points]) * 1.5))
 
         mask = np.zeros((h, w), dtype=np.uint8)
         cv2.circle(mask, (cx, cy), radius, 255, -1)
 
@@ -333,11 +373,17 @@ class ColorAnalyzer:
     def _detect_eye_color_fallback(
         self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
     ) -> str:
-        """Fallback
-
         eye_pts = [159, 145, 133, 33]
         cx = int(np.mean([lm[i]["x"] for i in eye_pts]) * w)
         cy = int(np.mean([lm[i]["y"] for i in eye_pts]) * h)
         radius = max(3, int(abs(lm[159]["y"] - lm[145]["y"]) * h * 0.3))
 
         mask = np.zeros((h, w), dtype=np.uint8)
@@ -351,7 +397,13 @@ class ColorAnalyzer:
 
     @staticmethod
     def _classify_eye_color(pixels: np.ndarray) -> str:
-        """
         hsv = cv2.cvtColor(
             pixels.reshape(-1, 1, 3).astype(np.uint8),
             cv2.COLOR_RGB2HSV
@@ -361,11 +413,11 @@ class ColorAnalyzer:
         mean_s = float(np.mean(hsv[:, 1]))
         mean_v = float(np.mean(hsv[:, 2]))
 
-        # Gray eyes: low saturation
         if mean_s < 30:
             return "gray"
 
-        #
         if 90 <= mean_h <= 130 and mean_s > 30:
             return "blue"
         if 35 <= mean_h <= 80 and mean_s > 30:
@@ -376,7 +428,8 @@ class ColorAnalyzer:
             return "amber"
         if 8 <= mean_h <= 28 and mean_s > 50:
             return "brown"
         if mean_v < 60:
             return "dark brown"
 
-        return "brown"
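The skin-tone branch in this file rescales OpenCV's uint8 LAB means to standard LAB units, looks up a Fitzpatrick band by L*, and derives the undertone from b*. The sketch below mirrors that arithmetic in plain Python. It includes only the three Fitzpatrick bands visible in this diff, since the middle bands fall outside the hunks shown and are deliberately left out, and the helper names (`rescale_lab`, `fitzpatrick`, `undertone`) are hypothetical.

```python
# Only the bands visible in the diff; the middle bands are elided there and
# are intentionally not guessed here, so mid-range L* values give "Unknown".
FITZPATRICK_VISIBLE = [
    (85, 100, "Type I - Very Fair"),
    (70, 85, "Type II - Fair"),
    (0, 25, "Type VI - Dark Brown/Black"),
]


def rescale_lab(l_u8: float, a_u8: float, b_u8: float) -> tuple[float, float, float]:
    """Map OpenCV uint8 LAB (L in 0-255, a/b offset by 128) to standard units."""
    return l_u8 * 100.0 / 255.0, a_u8 - 128.0, b_u8 - 128.0


def fitzpatrick(mean_l: float, scale=FITZPATRICK_VISIBLE) -> str:
    """Linear search over half-open [low, high) bands, as in the analyzer."""
    for low, high, label in scale:
        if low <= mean_l < high:
            return label
    return "Unknown"


def undertone(mean_b: float) -> str:
    """b* above +12 reads as warm, below -8 as cool, otherwise neutral."""
    if mean_b > 12:
        return "warm"
    if mean_b < -8:
        return "cool"
    return "neutral"
```

One consequence of the half-open bands is that an L* of exactly 100 matches no band, because the top band is `85 <= L < 100`.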
| 1 |
"""
|
| 2 |
+
ColorAnalyzer — pixel-level color extraction.
|
| 3 |
+
|
| 4 |
+
Model
|
| 5 |
+
-----
|
| 6 |
+
None. All operations are deterministic OpenCV LAB/HSV statistics over
|
| 7 |
+
masks/landmarks supplied by upstream analyzers.
|
| 8 |
+
|
| 9 |
+
Inputs
|
| 10 |
+
------
|
| 11 |
+
img_rgb : np.ndarray (H, W, 3) uint8
|
| 12 |
+
landmarks : list[dict] of normalised MediaPipe landmarks (optional)
|
| 13 |
+
skin_mask : bool ndarray (H, W) from SegFormer "face" class (optional)
|
| 14 |
+
hair_mask : bool ndarray (H, W) from SegFormer "hair" class (optional)
|
| 15 |
+
lip_mask : bool ndarray (H, W) — usually None; falls back to MediaPipe
|
| 16 |
+
lip polygon when missing or too small
|
| 17 |
+
|
| 18 |
+
Outputs (dict)
|
| 19 |
+
--------------
|
| 20 |
+
skin_tone — {fitzpatrick, lab_lightness, lab_a, lab_b, hex_color, rgb}
|
| 21 |
+
skin_undertone — warm | cool | neutral
|
| 22 |
+
eye_color — brown | hazel | amber | green | blue | gray | dark brown
|
| 23 |
+
hair_color — {name, hex, rgb, hsv}
|
| 24 |
+
hair_texture — straight | wavy | curly/coily (coarse Laplacian signal,
|
| 25 |
+
the HairTypeViT analyzer is the authoritative source)
|
| 26 |
+
lip_color — {shade, hex, rgb}
|
| 27 |
+
|
| 28 |
+
Notes
|
| 29 |
+
-----
|
| 30 |
+
LAB is preferred over RGB for skin tone classification because LAB's
|
| 31 |
+
L* channel is a perceptual lightness — Fitzpatrick bins line up with
|
| 32 |
+
fixed L* ranges regardless of camera white balance.
|
| 33 |
"""
|
| 34 |
|
| 35 |
from typing import Any
|
|
|
|
| 37 |
import cv2
|
| 38 |
import numpy as np
|
| 39 |
|
| 40 |
+
# Fitzpatrick scale boundaries on the LAB L* channel (true 0–100 range).
|
| 41 |
+
# OpenCV's uint8 LAB stores L scaled to 0–255, so we rescale before
|
| 42 |
+
# looking up bins.
|
| 43 |
FITZPATRICK_SCALE = [
|
| 44 |
(85, 100, "Type I - Very Fair"),
|
| 45 |
(70, 85, "Type II - Fair"),
|
|
|
|
| 49 |
(0, 25, "Type VI - Dark Brown/Black"),
|
| 50 |
]
|
| 51 |
|
| 52 |
+
# Hair-texture thresholds on std(Laplacian) computed over the *eroded*
|
| 53 |
+
# hair mask. Erosion prevents the mask boundary from contributing
|
| 54 |
+
# high-frequency step-edge energy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
HAIR_TEXTURE_CURLY_THRESHOLD = 25.0
|
| 56 |
HAIR_TEXTURE_WAVY_THRESHOLD = 15.0
|
| 57 |
|
| 58 |
+
# MediaPipe FaceMesh lip contours. The outer ring traces the lip
|
| 59 |
+
# border; the inner ring traces the mouth opening. Filling outer
|
| 60 |
+
# and then erasing inner gives only lip flesh, never teeth/tongue.
|
| 61 |
MEDIAPIPE_LIP_OUTER = [
|
| 62 |
61, 146, 91, 181, 84, 17, 314, 405, 321, 375,
|
| 63 |
291, 409, 270, 269, 267, 0, 37, 39, 40, 185,
|
|
|
|
| 70 |
|
| 71 |
class ColorAnalyzer:
|
| 72 |
def __init__(self):
|
| 73 |
+
# No model to load — pure pixel arithmetic.
|
| 74 |
+
pass
|
| 75 |
|
| 76 |
def analyze(
|
| 77 |
self,
|
|
|
|
| 93 |
if lip_mask is not None:
|
| 94 |
lip_mask = lip_mask.astype(bool)
|
| 95 |
|
| 96 |
+
# SegFormer human-parsing has no lip class, so callers usually
|
| 97 |
+
# pass None for lip_mask. Build one from MediaPipe lip landmarks
|
| 98 |
+
# whenever it's missing or too small to sample reliably.
|
| 99 |
if (lip_mask is None or lip_mask.sum() < 50) and landmarks:
|
| 100 |
derived = self._lip_mask_from_landmarks(landmarks, h, w)
|
| 101 |
if derived is not None:
|
| 102 |
lip_mask = derived
|
| 103 |
|
| 104 |
# ── Skin Tone ────────────────────────────────────────────────
|
| 105 |
+
# Need at least ~100 face pixels for stable statistics.
|
| 106 |
if skin_mask is not None and skin_mask.sum() > 100:
|
| 107 |
+
# Convert the whole image to LAB once and pull pixels under
|
| 108 |
+
# the mask. cv2 returns uint8 LAB with L in 0–255 and a/b
|
| 109 |
+
# offset by +128 (so neutral gray is L=128, a=128, b=128).
|
| 110 |
skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
|
| 111 |
skin_pixels = skin_lab[skin_mask]
|
| 112 |
|
| 113 |
+
# Rescale to standard LAB ranges before applying the
|
| 114 |
+
# Fitzpatrick / undertone thresholds defined on those ranges.
|
|
|
|
|
|
|
| 115 |
mean_l_raw = float(np.mean(skin_pixels[:, 0]))
|
| 116 |
mean_l = mean_l_raw * 100.0 / 255.0
|
| 117 |
mean_a = float(np.mean(skin_pixels[:, 1])) - 128.0
|
| 118 |
mean_b = float(np.mean(skin_pixels[:, 2])) - 128.0
|
| 119 |
|
| 120 |
+
# Bin into Fitzpatrick types — linear search over six bands.
|
| 121 |
fitz = "Unknown"
|
| 122 |
for low, high, label in FITZPATRICK_SCALE:
|
| 123 |
if low <= mean_l < high:
|
| 124 |
fitz = label
|
| 125 |
break
|
| 126 |
|
| 127 |
+
# Average RGB → hex for display.
|
| 128 |
avg_rgb = np.mean(img_rgb[skin_mask], axis=0).astype(int)
|
| 129 |
hex_color = "#{:02x}{:02x}{:02x}".format(*avg_rgb)
|
| 130 |
|
|
|
|
| 137 |
"rgb": avg_rgb.tolist(),
|
| 138 |
}
|
| 139 |
|
| 140 |
+
# Undertone from b* (yellow ↔ blue axis):
|
| 141 |
+
# b* > +12 → yellow-leaning, warm
|
| 142 |
+
# b* < -8 → blue-leaning, cool
|
| 143 |
+
# in between → neutral
|
| 144 |
if mean_b > 12:
|
| 145 |
result["skin_undertone"] = "warm"
|
| 146 |
elif mean_b < -8:
|
|
|
|
| 152 |
result["skin_undertone"] = "unknown"
|
| 153 |
|
| 154 |
# ── Eye Color ────────────────────────────────────────────────
|
| 155 |
+
# Prefer the dedicated iris landmarks (468-477) when available.
|
| 156 |
+
# Fall back to a rough eye-centre crop otherwise.
|
| 157 |
if landmarks and len(landmarks) > 473:
|
| 158 |
+
result["eye_color"] = self._detect_eye_color(img_rgb, landmarks, h, w)
|
|
|
|
| 159 |
elif landmarks and len(landmarks) > 362:
|
| 160 |
+
result["eye_color"] = self._detect_eye_color_fallback(img_rgb, landmarks, h, w)
|
|
|
|
|
|
|
| 161 |
else:
|
| 162 |
result["eye_color"] = "unknown"
|
| 163 |
|
| 164 |
+
# ── Hair Color & Texture ────────────────────────────────────
|
| 165 |
+
# Need at least 200 hair pixels for a stable median.
|
| 166 |
if hair_mask is not None and hair_mask.sum() > 200:
|
| 167 |
+
result["hair_color"] = self._estimate_hair_color(img_rgb, hair_mask)
|
|
|
|
|
|
|
| 168 |
result["hair_texture"] = self._estimate_hair_texture(img_rgb, hair_mask)
|
| 169 |
else:
|
| 170 |
result["hair_color"] = {"name": "unknown"}
|
| 171 |
result["hair_texture"] = "unknown"
|
| 172 |
|
| 173 |
# ── Lip Color ────────────────────────────────────────────────
|
| 174 |
+
# Average the masked lip pixels and bucket by HSV saturation/value.
|
| 175 |
if lip_mask is not None and lip_mask.sum() > 50:
|
| 176 |
lip_pixels = img_rgb[lip_mask]
|
| 177 |
avg_lip = np.mean(lip_pixels, axis=0).astype(int)
|
| 178 |
hex_lip = "#{:02x}{:02x}{:02x}".format(*avg_lip)
|
| 179 |
|
| 180 |
+
# Convert the single average RGB triple to HSV for shade
|
| 181 |
+
# classification. High saturation → rosy/red; high value but
|
| 182 |
+
# low saturation → pink; low value → dark; otherwise natural.
|
| 183 |
lip_hsv = cv2.cvtColor(
|
| 184 |
avg_lip.reshape(1, 1, 3).astype(np.uint8),
|
| 185 |
cv2.COLOR_RGB2HSV
|
|
|
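The skin-tone branch of the method above rescales OpenCV's 8-bit LAB values before thresholding on b*. A minimal sketch of that arithmetic in plain Python follows; the helper names `rescale_cv2_lab` and `undertone_from_b` are hypothetical and not part of the commit, while the thresholds (+12 and -8) are the ones quoted in the diff comments.

```python
def rescale_cv2_lab(l_raw: float, a_raw: float, b_raw: float) -> tuple[float, float, float]:
    """Map OpenCV's uint8 LAB encoding back to standard CIELAB ranges."""
    l_star = l_raw * 100.0 / 255.0  # L is stored as 0..255, standard range is 0..100
    a_star = a_raw - 128.0          # a is stored with a +128 offset
    b_star = b_raw - 128.0          # b is stored with a +128 offset
    return l_star, a_star, b_star


def undertone_from_b(b_star: float) -> str:
    """Bucket the yellow/blue axis using the thresholds from the diff comments."""
    if b_star > 12:
        return "warm"
    if b_star < -8:
        return "cool"
    return "neutral"


# A mean skin pixel stored by OpenCV as (L=180, a=140, b=150).
l_star, a_star, b_star = rescale_cv2_lab(180, 140, 150)
print(round(l_star, 1), a_star, b_star, undertone_from_b(b_star))
```

Keeping the rescale separate from the thresholding makes it easier to check that the thresholds are applied on the standard scale rather than on the raw uint8 values.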
|
| 214 |
def _estimate_hair_color(
|
| 215 |
img_rgb: np.ndarray, hair_mask: np.ndarray
|
| 216 |
) -> dict[str, Any]:
|
| 217 |
+
"""Dominant hair color via LAB-lightness-trimmed median.
|
| 218 |
|
| 219 |
Why median + L*-trim instead of k=2 k-means:
|
| 220 |
+
- K-means with k=2 splits highlight vs shadow within a single
|
| 221 |
+
hair color, so the "bigger cluster" can flip between photos
|
| 222 |
+
of the same person depending on lighting. Median is robust
|
| 223 |
+
and deterministic.
|
| 224 |
+
- Trimming the top/bottom 10% of L* drops specular highlights
|
| 225 |
+
and deep shadows, the main outlier sources.
|
| 226 |
"""
|
| 227 |
hair_pixels = img_rgb[hair_mask] # (N, 3) uint8 RGB
|
| 228 |
|
| 229 |
+
# LAB conversion so we can trim by perceptual lightness.
|
| 230 |
hair_lab = cv2.cvtColor(
|
| 231 |
hair_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
|
| 232 |
).reshape(-1, 3)
|
| 233 |
l_lo, l_hi = np.percentile(hair_lab[:, 0], [10, 90])
|
| 234 |
keep = (hair_lab[:, 0] >= l_lo) & (hair_lab[:, 0] <= l_hi)
|
| 235 |
+
# If trimming would leave us too few pixels, fall back to all.
|
| 236 |
core_pixels = hair_pixels[keep] if keep.sum() > 50 else hair_pixels
|
| 237 |
|
| 238 |
+
# Median is robust to mask leakage (a few stray non-hair pixels
|
| 239 |
+
# don't shift the median).
|
| 240 |
dominant_rgb = np.median(core_pixels, axis=0)
|
| 241 |
dominant_rgb = np.clip(dominant_rgb, 0, 255).astype(np.uint8)
|
| 242 |
|
| 243 |
hex_hair = "#{:02x}{:02x}{:02x}".format(*dominant_rgb)
|
| 244 |
|
| 245 |
+
# Bucket the dominant color into a name via HSV thresholds.
|
| 246 |
hair_hsv = cv2.cvtColor(
|
| 247 |
dominant_rgb.reshape(1, 1, 3), cv2.COLOR_RGB2HSV
|
| 248 |
)[0, 0]
|
| 249 |
h_val, s_val, v_val = int(hair_hsv[0]), int(hair_hsv[1]), int(hair_hsv[2])
|
| 250 |
|
| 251 |
+
# Classification cascade — order matters. Falls through to
|
| 252 |
+
# "unknown" instead of defaulting to a colour, so mask leakage
|
| 253 |
+
# and unusual tints stay detectable downstream.
|
| 254 |
if v_val < 45 and s_val < 60:
|
| 255 |
hair_color_name = "black"
|
| 256 |
elif s_val < 25:
|
|
|
|
| 259 |
elif (h_val < 12 or h_val > 168) and s_val > 60:
|
| 260 |
hair_color_name = "red/auburn"
|
| 261 |
elif 18 <= h_val <= 35 and v_val > 160 and s_val < 140:
|
| 262 |
+
# Blond is desaturated yellow with high V — bright but not
|
| 263 |
+
# too saturated (or it'd shade orange).
|
| 264 |
hair_color_name = "blond"
|
| 265 |
elif 5 <= h_val <= 30:
|
| 266 |
hair_color_name = "brown" if v_val > 80 else "dark brown"
|
|
|
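The lightness-trimmed median used by `_estimate_hair_color` can be illustrated without OpenCV. The sketch below assumes the lightness values are already available as a plain list; `percentile` and `trimmed_median_rgb` are hypothetical helpers written for this illustration, with the 10/90 percentile band and the 50-pixel fallback taken from the diff.

```python
from statistics import median


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def trimmed_median_rgb(pixels, lightness, min_keep=50):
    """Per-channel median of pixels whose lightness lies in the 10th..90th percentile band.

    Falls back to all pixels when the trimmed set has min_keep or fewer entries.
    """
    l_lo = percentile(lightness, 10)
    l_hi = percentile(lightness, 90)
    core = [p for p, l in zip(pixels, lightness) if l_lo <= l <= l_hi]
    if len(core) <= min_keep:
        core = pixels
    return tuple(median(channel) for channel in zip(*core))


# 80 mid-tone hair pixels plus 10 deep-shadow and 10 specular-highlight pixels.
pixels = [(0, 0, 0)] * 10 + [(100, 60, 30)] * 80 + [(255, 255, 255)] * 10
lightness = [0] * 10 + [50] * 80 + [100] * 10
dominant = trimmed_median_rgb(pixels, lightness)
print(dominant)
```

Because both steps are order statistics, the result does not depend on any random initialisation, which is the determinism argument the docstring makes against k-means.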
|
| 278 |
def _estimate_hair_texture(
|
| 279 |
img_rgb: np.ndarray, hair_mask: np.ndarray
|
| 280 |
) -> str:
|
| 281 |
+
"""Coarse hair texture from local intensity variation.
|
| 282 |
+
|
| 283 |
+
Computes std(Laplacian) over an *eroded* hair mask so the mask
|
| 284 |
+
boundary itself doesn't contribute the high-frequency step
|
| 285 |
+
edge that an un-eroded mask would.
|
| 286 |
|
| 287 |
+
This is intentionally a fallback signal; the authoritative
|
| 288 |
+
hair-texture output is HairTypeViT (curly/dreadlocks/kinky/
|
| 289 |
+
straight/wavy), which is trained and ~93% accurate.
|
|
|
|
| 290 |
"""
|
| 291 |
+
# Erode by ~4 px (5x5 kernel, two passes) so we sample only interior hair pixels.
|
| 292 |
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
|
| 293 |
inner_mask = cv2.erode(
|
| 294 |
hair_mask.astype(np.uint8), kernel, iterations=2
|
| 295 |
).astype(bool)
|
| 296 |
|
| 297 |
+
# Not enough interior pixels to compute a reliable std.
|
| 298 |
if inner_mask.sum() < 200:
|
| 299 |
return "unknown"
|
| 300 |
|
| 301 |
+
# Laplacian responds to local intensity curvature; its std over
|
| 302 |
+
# the masked region is a proxy for "how much fine detail".
|
| 303 |
hair_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
|
| 304 |
lap = cv2.Laplacian(hair_gray, cv2.CV_64F, ksize=3)
|
| 305 |
texture_score = float(np.std(lap[inner_mask]))
|
|
|
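How far the hair mask shrinks under erosion follows from the kernel size and iteration count. The pure-Python sketch below is an illustration only (`erode` here is a hypothetical stand-in, not `cv2.erode`); it assumes a square kernel and places the test shape away from the image border so that border handling plays no role.

```python
def erode(mask, k=5):
    """Binary erosion with a k x k square kernel on a list-of-lists boolean mask."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            keep = True
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Out-of-bounds neighbours are ignored in this sketch.
                    if 0 <= yy < h and 0 <= xx < w and not mask[yy][xx]:
                        keep = False
            row.append(keep)
        out.append(row)
    return out


SIZE = 30
# A 20 x 20 foreground square centred in a 30 x 30 grid.
square = [[5 <= y < 25 and 5 <= x < 25 for x in range(SIZE)] for y in range(SIZE)]
once = erode(square)
twice = erode(once)

# Width of the foreground along the middle row after 0, 1 and 2 passes.
widths = [sum(m[15]) for m in (square, once, twice)]
print(widths)
```

Each pass of a 5x5 kernel removes two pixels from every side, so two passes remove four pixels per side.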
|
| 318 |
def _lip_mask_from_landmarks(
|
| 319 |
landmarks: list[dict], h: int, w: int
|
| 320 |
) -> np.ndarray | None:
|
| 321 |
+
"""Build a lip-flesh mask by filling outer minus inner contour."""
|
| 322 |
+
# Bail if the landmark list doesn't have the indices the contours
|
| 323 |
+
# reference (e.g. a truncated list).
|
| 324 |
max_idx = max(MEDIAPIPE_LIP_OUTER + MEDIAPIPE_LIP_INNER)
|
| 325 |
if len(landmarks) <= max_idx:
|
| 326 |
return None
|
| 327 |
|
| 328 |
+
# Helper to convert a list of landmark indices into a pixel-
|
| 329 |
+
# space polygon in (x, y) order.
|
| 330 |
def _poly(indices: list[int]) -> np.ndarray:
|
| 331 |
return np.array(
|
| 332 |
[
|
|
|
|
| 336 |
dtype=np.int32,
|
| 337 |
)
|
| 338 |
|
| 339 |
+
# Fill the outer ring, then erase the inner ring → lip flesh
|
| 340 |
+
# only, no teeth or tongue pixels.
|
| 341 |
mask = np.zeros((h, w), dtype=np.uint8)
|
| 342 |
cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_OUTER)], 255)
|
| 343 |
cv2.fillPoly(mask, [_poly(MEDIAPIPE_LIP_INNER)], 0)
|
|
|
|
| 350 |
def _detect_eye_color(
|
| 351 |
self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
|
| 352 |
) -> str:
|
| 353 |
+
"""Sample left-iris pixels using MediaPipe iris landmarks (468–477)."""
|
| 354 |
+
# 468 is this iris's centre and 469-472 its ring (473 begins the other
|
| 355 |
+
# iris); we average them to a centre and take a radius from the x std-dev.
|
| 356 |
+
iris_indices = list(range(468, 473))
|
| 357 |
iris_points = [(int(lm[i]["x"] * w), int(lm[i]["y"] * h)) for i in iris_indices]
|
| 358 |
|
|
|
|
| 359 |
cx = int(np.mean([p[0] for p in iris_points]))
|
| 360 |
cy = int(np.mean([p[1] for p in iris_points]))
|
| 361 |
radius = max(3, int(np.std([p[0] for p in iris_points]) * 1.5))
|
| 362 |
|
| 363 |
+
# Filled disc mask centred on the iris → classify those pixels.
|
| 364 |
mask = np.zeros((h, w), dtype=np.uint8)
|
| 365 |
cv2.circle(mask, (cx, cy), radius, 255, -1)
|
| 366 |
|
|
|
|
| 373 |
def _detect_eye_color_fallback(
|
| 374 |
self, img_rgb: np.ndarray, lm: list[dict], h: int, w: int
|
| 375 |
) -> str:
|
| 376 |
+
"""Fallback when iris landmarks aren't available.
|
| 377 |
+
|
| 378 |
+
Averages four points that bound the eye opening and treats the
|
| 379 |
+
centre as a coarse "look here" target. Less accurate than the
|
| 380 |
+
iris-landmark path because we sample some sclera too, but it's
|
| 381 |
+
a graceful degradation.
|
| 382 |
+
"""
|
| 383 |
eye_pts = [159, 145, 133, 33]
|
| 384 |
cx = int(np.mean([lm[i]["x"] for i in eye_pts]) * w)
|
| 385 |
cy = int(np.mean([lm[i]["y"] for i in eye_pts]) * h)
|
| 386 |
+
# Radius scaled to ~30% of eye opening height.
|
| 387 |
radius = max(3, int(abs(lm[159]["y"] - lm[145]["y"]) * h * 0.3))
|
| 388 |
|
| 389 |
mask = np.zeros((h, w), dtype=np.uint8)
|
|
|
|
| 397 |
|
| 398 |
@staticmethod
|
| 399 |
def _classify_eye_color(pixels: np.ndarray) -> str:
|
| 400 |
+
"""Bucket sampled iris pixels by HSV mean.
|
| 401 |
+
|
| 402 |
+
Hue ranges follow OpenCV's 8-bit scale (H in 0–179, i.e. degrees / 2,
|
| 403 |
+
not 0–360). The cascade order matters: gray is checked first because
|
| 404 |
+
any sufficiently desaturated eye is gray regardless of its
|
| 405 |
+
nominal hue.
|
| 406 |
+
"""
|
| 407 |
hsv = cv2.cvtColor(
|
| 408 |
pixels.reshape(-1, 1, 3).astype(np.uint8),
|
| 409 |
cv2.COLOR_RGB2HSV
|
|
|
|
| 413 |
mean_s = float(np.mean(hsv[:, 1]))
|
| 414 |
mean_v = float(np.mean(hsv[:, 2]))
|
| 415 |
|
| 416 |
+
# Gray eyes: any hue, but low saturation.
|
| 417 |
if mean_s < 30:
|
| 418 |
return "gray"
|
| 419 |
|
| 420 |
+
# Hue-based buckets. Specific (amber) before general (brown).
|
| 421 |
if 90 <= mean_h <= 130 and mean_s > 30:
|
| 422 |
return "blue"
|
| 423 |
if 35 <= mean_h <= 80 and mean_s > 30:
|
|
|
|
| 428 |
return "amber"
|
| 429 |
if 8 <= mean_h <= 28 and mean_s > 50:
|
| 430 |
return "brown"
|
| 431 |
+
# Anything left with low V is just dark brown.
|
| 432 |
if mean_v < 60:
|
| 433 |
return "dark brown"
|
| 434 |
|
| 435 |
+
return "brown"
|
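The hue thresholds in `_classify_eye_color` are expressed on OpenCV's 8-bit HSV scale, where hue is stored as degrees divided by two. The sketch below approximates that encoding with the standard-library `colorsys` module; `rgb_to_cv2_style_hsv` and `in_blue_band` are hypothetical helpers, and the rounding may differ slightly from what OpenCV itself produces. The blue band (90 to 130, saturation above 30) is copied from the diff.

```python
import colorsys


def rgb_to_cv2_style_hsv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Approximate OpenCV's uint8 HSV encoding: H in 0..179, S and V in 0..255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180.0)) % 180, int(round(s * 255.0)), int(round(v * 255.0))


def in_blue_band(h: int, s: int) -> bool:
    """Blue test as written in the diff: 90 <= H <= 130 with S above 30."""
    return 90 <= h <= 130 and s > 30


pure_blue = rgb_to_cv2_style_hsv(0, 0, 255)
mid_gray = rgb_to_cv2_style_hsv(128, 128, 128)
print(pure_blue, in_blue_band(pure_blue[0], pure_blue[1]))
print(mid_gray, in_blue_band(mid_gray[0], mid_gray[1]))
```

Pure blue sits at 240 degrees, which lands at 120 on the halved scale and therefore inside the blue band; a gray pixel has zero saturation and is rejected regardless of its nominal hue.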
analyzers/demographic_analyzer.py
CHANGED
|
@@ -1,13 +1,36 @@
|
|
| 1 |
"""
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
Models
|
| 5 |
-
-
|
| 6 |
-
-
|
| 7 |
-
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
"""
|
| 12 |
|
| 13 |
from typing import Any
|
|
@@ -22,20 +45,28 @@ RACE_MODEL_ID = "cledoux42/Ethnicity_Test_v003"
|
|
| 22 |
|
| 23 |
AGE_LABELS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
|
| 24 |
GENDER_LABELS = ["Male", "Female"]
|
| 25 |
-
# cledoux42
|
| 26 |
-
#
|
| 27 |
-
#
|
| 28 |
RACE_LABELS = ["White", "Black", "Latino_Hispanic", "East Asian", "Southeast Asian", "Indian", "Middle Eastern"]
|
| 29 |
|
| 30 |
|
| 31 |
class DemographicAnalyzer:
|
| 32 |
def __init__(self):
|
|
|
|
|
|
|
|
|
|
| 33 |
self.age_classifier = self._load_classifier(AGE_MODEL_ID)
|
| 34 |
self.gender_classifier = self._load_classifier(GENDER_MODEL_ID)
|
| 35 |
self.race_classifier = self._load_classifier(RACE_MODEL_ID)
|
| 36 |
|
| 37 |
@staticmethod
|
| 38 |
def _load_classifier(model_id: str):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
try:
|
| 40 |
return pipeline("image-classification", model=model_id)
|
| 41 |
except Exception as exc:
|
|
@@ -43,12 +74,18 @@ class DemographicAnalyzer:
|
|
| 43 |
return None
|
| 44 |
|
| 45 |
def analyze(self, img_rgb) -> dict[str, Any]:
|
|
|
|
|
|
|
| 46 |
pil = Image.fromarray(img_rgb)
|
| 47 |
|
| 48 |
-
|
|
|
|
|
|
|
|
|
|
| 49 |
gender_predictions = self._safe_predict(self.gender_classifier, pil, top_k=2)
|
| 50 |
race_predictions = self._safe_predict(self.race_classifier, pil, top_k=7)
|
| 51 |
|
|
|
|
| 52 |
if not age_predictions and not gender_predictions and not race_predictions:
|
| 53 |
return {
|
| 54 |
"age_range": "unknown",
|
|
@@ -62,17 +99,22 @@ class DemographicAnalyzer:
|
|
| 62 |
"ethnicity_distribution": {label: 0.0 for label in RACE_LABELS},
|
| 63 |
}
|
| 64 |
|
|
|
|
|
|
|
| 65 |
age_prediction = age_predictions[0] if age_predictions else {"label": "unknown", "score": 0.0}
|
| 66 |
gender_prediction = gender_predictions[0] if gender_predictions else {"label": "unknown", "score": 0.0}
|
| 67 |
race_prediction = race_predictions[0] if race_predictions else {"label": "unknown", "score": 0.0}
|
| 68 |
|
|
|
|
|
|
|
|
|
|
| 69 |
age_label = self._normalize_age_label(age_prediction["label"])
|
| 70 |
gender_label = self._normalize_gender_label(gender_prediction["label"])
|
| 71 |
race_label = self._normalize_race_label(race_prediction["label"])
|
| 72 |
|
| 73 |
return {
|
| 74 |
"age_range": age_label,
|
| 75 |
-
"age_estimate": self.
|
| 76 |
"age_confidence": round(float(age_prediction["score"]), 3),
|
| 77 |
"gender": gender_label.lower(),
|
| 78 |
"gender_confidence": round(float(gender_prediction["score"]), 3),
|
|
@@ -84,6 +126,7 @@ class DemographicAnalyzer:
|
|
| 84 |
|
| 85 |
@staticmethod
|
| 86 |
def _normalize_age_label(label: str) -> str:
|
|
|
|
| 87 |
normalized = label.strip().lower()
|
| 88 |
if normalized == "more than 70":
|
| 89 |
return "70+"
|
|
@@ -98,9 +141,10 @@ class DemographicAnalyzer:
|
|
| 98 |
|
| 99 |
@staticmethod
|
| 100 |
def _normalize_race_label(label: str) -> str:
|
|
|
|
| 101 |
normalized = label.strip().lower().replace("-", "_")
|
| 102 |
race_aliases = {
|
| 103 |
-
#
|
| 104 |
"white": "White",
|
| 105 |
"black": "Black",
|
| 106 |
"latino_hispanic": "Latino_Hispanic",
|
|
@@ -109,7 +153,7 @@ class DemographicAnalyzer:
|
|
| 109 |
"southeast asian": "Southeast Asian",
|
| 110 |
"indian": "Indian",
|
| 111 |
"middle eastern": "Middle Eastern",
|
| 112 |
-
# cledoux42/Ethnicity_Test_v003 5-class labels
|
| 113 |
"african": "Black",
|
| 114 |
"asian": "East Asian",
|
| 115 |
"caucasian": "White",
|
|
@@ -117,23 +161,52 @@ class DemographicAnalyzer:
|
|
| 117 |
}
|
| 118 |
return race_aliases.get(normalized, label)
|
| 119 |
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
|
| 135 |
@classmethod
|
| 136 |
def _distribution_map(cls, predictions, normalizer, all_labels):
|
|
|
|
|
|
|
|
|
|
|
|
|
| 137 |
distribution = {label: 0.0 for label in all_labels}
|
| 138 |
for prediction in predictions:
|
| 139 |
normalized_label = normalizer(prediction["label"])
|
|
@@ -143,6 +216,8 @@ class DemographicAnalyzer:
|
|
| 143 |
|
| 144 |
@staticmethod
|
| 145 |
def _safe_predict(classifier, image, top_k: int):
|
|
|
|
|
|
|
| 146 |
if classifier is None:
|
| 147 |
return []
|
| 148 |
try:
|
|
|
|
| 1 |
"""
|
| 2 |
+
DemographicAnalyzer — age, gender, ethnicity via three ViT classifiers.
|
| 3 |
+
|
| 4 |
+
Models
|
| 5 |
+
------
|
| 6 |
+
- Age : dima806/fairface_age_image_detection
|
| 7 |
+
ViT-B/16, ~59% top-1 on FairFace 9 age buckets.
|
| 8 |
+
- Gender : dima806/fairface_gender_image_detection
|
| 9 |
+
ViT-B/16, ~93.4% on FairFace.
|
| 10 |
+
- Ethnicity : cledoux42/Ethnicity_Test_v003
|
| 11 |
+
ViT, 79.6% accuracy, macro-F1 0.797. 5-class output that
|
| 12 |
+
we widen into the legacy 7-bucket FairFace schema so the
|
| 13 |
+
rest of the app's distribution shape doesn't change.
|
| 14 |
+
|
| 15 |
+
All three are Apache 2.0 licensed and run as Hugging Face image-classification pipelines.
|
| 16 |
+
|
| 17 |
+
Inputs
|
| 18 |
+
------
|
| 19 |
+
img_rgb : np.ndarray (H, W, 3) uint8
|
| 20 |
+
|
| 21 |
+
Outputs (dict)
|
| 22 |
+
--------------
|
| 23 |
+
age_range, age_estimate (softmax-weighted continuous), age_confidence,
|
| 24 |
+
age_distribution, gender, gender_confidence, ethnicity,
|
| 25 |
+
ethnicity_confidence, ethnicity_distribution.
|
| 26 |
+
|
| 27 |
+
Notes
|
| 28 |
+
-----
|
| 29 |
+
The FairFace age model is a 9-bucket classifier (0-2, 3-9, …, 70+),
|
| 30 |
+
which means the argmax bucket midpoint is always one of nine fixed
|
| 31 |
+
numbers (24.5 for 20-29, etc.). To recover a smooth continuous estimate
|
| 32 |
+
we compute the expected value across the full softmax — see
|
| 33 |
+
``_weighted_age_estimate``.
|
| 34 |
"""
|
| 35 |
|
| 36 |
from typing import Any
|
|
|
|
| 45 |
|
| 46 |
AGE_LABELS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
|
| 47 |
GENDER_LABELS = ["Male", "Female"]
|
| 48 |
+
# cledoux42 ships 5 classes (african, asian, caucasian, hispanic, indian),
|
| 49 |
+
# but we keep the legacy 7-bucket FairFace label space internally so the
|
| 50 |
+
# downstream distribution dict shape stays stable. Unseen buckets stay 0.
|
| 51 |
RACE_LABELS = ["White", "Black", "Latino_Hispanic", "East Asian", "Southeast Asian", "Indian", "Middle Eastern"]
|
| 52 |
|
| 53 |
|
| 54 |
class DemographicAnalyzer:
|
| 55 |
def __init__(self):
|
| 56 |
+
# Each classifier is an HF image-classification pipeline. Weights are
|
| 57 |
+
# downloaded from HF on first instantiation and cached
|
| 58 |
+
# under /root/.cache/huggingface inside the container.
|
| 59 |
self.age_classifier = self._load_classifier(AGE_MODEL_ID)
|
| 60 |
self.gender_classifier = self._load_classifier(GENDER_MODEL_ID)
|
| 61 |
self.race_classifier = self._load_classifier(RACE_MODEL_ID)
|
| 62 |
|
| 63 |
@staticmethod
|
| 64 |
def _load_classifier(model_id: str):
|
| 65 |
+
"""Build one HF image-classification pipeline, logging on failure.
|
| 66 |
+
|
| 67 |
+
A failed load returns None so the rest of the service continues
|
| 68 |
+
to function and `analyze()` falls back to "unknown" demographics.
|
| 69 |
+
"""
|
| 70 |
try:
|
| 71 |
return pipeline("image-classification", model=model_id)
|
| 72 |
except Exception as exc:
|
|
|
|
| 74 |
return None
|
| 75 |
|
| 76 |
def analyze(self, img_rgb) -> dict[str, Any]:
|
| 77 |
+
# Convert the numpy frame to a PIL Image once and reuse it for
|
| 78 |
+
# all three classifier calls.
|
| 79 |
pil = Image.fromarray(img_rgb)
|
| 80 |
|
| 81 |
+
# top_k=len(labels) so we get the full softmax for each model.
|
| 82 |
+
# We need the full age distribution to compute the weighted
|
| 83 |
+
# expected-value age estimate.
|
| 84 |
+
age_predictions = self._safe_predict(self.age_classifier, pil, top_k=len(AGE_LABELS))
|
| 85 |
gender_predictions = self._safe_predict(self.gender_classifier, pil, top_k=2)
|
| 86 |
race_predictions = self._safe_predict(self.race_classifier, pil, top_k=7)
|
| 87 |
|
| 88 |
+
# If every classifier failed we degrade gracefully with a stub.
|
| 89 |
if not age_predictions and not gender_predictions and not race_predictions:
|
| 90 |
return {
|
| 91 |
"age_range": "unknown",
|
|
|
|
| 99 |
"ethnicity_distribution": {label: 0.0 for label in RACE_LABELS},
|
| 100 |
}
|
| 101 |
|
| 102 |
+
# HF pipelines return predictions pre-sorted by score descending,
|
| 103 |
+
# so element [0] of each prediction list is always the argmax class.
|
| 104 |
age_prediction = age_predictions[0] if age_predictions else {"label": "unknown", "score": 0.0}
|
| 105 |
gender_prediction = gender_predictions[0] if gender_predictions else {"label": "unknown", "score": 0.0}
|
| 106 |
race_prediction = race_predictions[0] if race_predictions else {"label": "unknown", "score": 0.0}
|
| 107 |
|
| 108 |
+
# Models occasionally return label aliases ("more than 70" instead
|
| 109 |
+
# of "70+", "African" instead of "Black"). The normalisers map
|
| 110 |
+
# everything back to our canonical schema.
|
| 111 |
age_label = self._normalize_age_label(age_prediction["label"])
|
| 112 |
gender_label = self._normalize_gender_label(gender_prediction["label"])
|
| 113 |
race_label = self._normalize_race_label(race_prediction["label"])
|
| 114 |
|
| 115 |
return {
|
| 116 |
"age_range": age_label,
|
| 117 |
+
"age_estimate": self._weighted_age_estimate(age_predictions),
|
| 118 |
"age_confidence": round(float(age_prediction["score"]), 3),
|
| 119 |
"gender": gender_label.lower(),
|
| 120 |
"gender_confidence": round(float(gender_prediction["score"]), 3),
|
|
|
|
| 126 |
|
| 127 |
@staticmethod
|
| 128 |
def _normalize_age_label(label: str) -> str:
|
| 129 |
+
"""Map model output to canonical AGE_LABELS entry."""
|
| 130 |
normalized = label.strip().lower()
|
| 131 |
if normalized == "more than 70":
|
| 132 |
return "70+"
|
|
|
|
| 141 |
|
| 142 |
@staticmethod
|
| 143 |
def _normalize_race_label(label: str) -> str:
|
| 144 |
+
"""Coalesce cledoux42's 5 classes into our 7-bucket schema."""
|
| 145 |
normalized = label.strip().lower().replace("-", "_")
|
| 146 |
race_aliases = {
|
| 147 |
+
# Legacy FairFace 7-class labels
|
| 148 |
"white": "White",
|
| 149 |
"black": "Black",
|
| 150 |
"latino_hispanic": "Latino_Hispanic",
|
|
|
|
| 153 |
"southeast asian": "Southeast Asian",
|
| 154 |
"indian": "Indian",
|
| 155 |
"middle eastern": "Middle Eastern",
|
| 156 |
+
# cledoux42/Ethnicity_Test_v003 5-class labels
|
| 157 |
"african": "Black",
|
| 158 |
"asian": "East Asian",
|
| 159 |
"caucasian": "White",
|
|
|
|
| 161 |
}
|
| 162 |
return race_aliases.get(normalized, label)
|
| 163 |
|
| 164 |
+
# Midpoint of each FairFace age bucket — used as the per-bucket
|
| 165 |
+
# "value" when we marginalise over the predicted distribution.
|
| 166 |
+
_AGE_MIDPOINTS = {
|
| 167 |
+
"0-2": 1.0,
|
| 168 |
+
"3-9": 6.0,
|
| 169 |
+
"10-19": 14.5,
|
| 170 |
+
"20-29": 24.5,
|
| 171 |
+
"30-39": 34.5,
|
| 172 |
+
"40-49": 44.5,
|
| 173 |
+
"50-59": 54.5,
|
| 174 |
+
"60-69": 64.5,
|
| 175 |
+
"70+": 75.0,
|
| 176 |
+
}
|
| 177 |
+
|
| 178 |
+
@classmethod
|
| 179 |
+
def _weighted_age_estimate(cls, predictions: list[dict]) -> float:
|
| 180 |
+
"""Softmax-weighted expected age across all FairFace buckets.
|
| 181 |
+
|
| 182 |
+
FairFace is a 9-bucket classifier; the argmax always snaps to one
|
| 183 |
+
of nine fixed midpoints (24.5 for 20-29, etc.). Treating its
|
| 184 |
+
softmax as a probability distribution and taking the expected
|
| 185 |
+
value gives a continuous number that moves with confidence
|
| 186 |
+
(about 24.5 for someone very confidently 20-29, 28.4 if some mass leaks
|
| 187 |
+
into 30-39). Still confined to the midpoint range (1.0 to 75.0); true per-year
|
| 188 |
+
accuracy would need a regression model.
|
| 189 |
+
"""
|
| 190 |
+
total_weight = 0.0
|
| 191 |
+
weighted_sum = 0.0
|
| 192 |
+
for pred in predictions:
|
| 193 |
+
label = cls._normalize_age_label(pred["label"])
|
| 194 |
+
midpoint = cls._AGE_MIDPOINTS.get(label)
|
| 195 |
+
if midpoint is None:
|
| 196 |
+
continue
|
| 197 |
+
score = float(pred["score"])
|
| 198 |
+
weighted_sum += midpoint * score
|
| 199 |
+
total_weight += score
|
| 200 |
+
if total_weight == 0:
|
| 201 |
+
return 0.0
|
| 202 |
+
return round(weighted_sum / total_weight, 1)
|
| 203 |
|
| 204 |
@classmethod
|
| 205 |
def _distribution_map(cls, predictions, normalizer, all_labels):
|
| 206 |
+
"""Flatten HF predictions into {canonical_label: score} dict.
|
| 207 |
+
|
| 208 |
+
Unseen labels stay at 0.0 so the shape is always all_labels-sized.
|
| 209 |
+
"""
|
| 210 |
distribution = {label: 0.0 for label in all_labels}
|
| 211 |
for prediction in predictions:
|
| 212 |
normalized_label = normalizer(prediction["label"])
|
|
|
|
| 216 |
|
| 217 |
@staticmethod
|
| 218 |
def _safe_predict(classifier, image, top_k: int):
|
| 219 |
+
"""Wrap classifier(...) so a single model failure can't bring
|
| 220 |
+
down the whole demographic block."""
|
| 221 |
if classifier is None:
|
| 222 |
return []
|
| 223 |
try:
|
analyzers/emotion_analyzer.py
CHANGED
|
@@ -1,17 +1,38 @@
|
|
| 1 |
"""
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
Classes:
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
"""
|
| 16 |
|
| 17 |
from contextlib import contextmanager
|
|
@@ -33,7 +54,8 @@ EMOTION_LABELS = [
|
|
| 33 |
"happiness", "neutral", "sadness", "surprise",
|
| 34 |
]
|
| 35 |
|
| 36 |
-
#
|
|
|
|
| 37 |
VALENCE_MAP = {
|
| 38 |
"anger": -0.6,
|
| 39 |
"contempt": -0.3,
|
|
@@ -45,7 +67,7 @@ VALENCE_MAP = {
|
|
| 45 |
"surprise": 0.3,
|
| 46 |
}
|
| 47 |
|
| 48 |
-
#
|
| 49 |
AROUSAL_MAP = {
|
| 50 |
"anger": 0.8,
|
| 51 |
"contempt": 0.3,
|
|
@@ -62,15 +84,11 @@ HSEMOTION_MODEL_NAME = "enet_b0_8_best_afew"
|
|
| 62 |
|
| 63 |
@contextmanager
|
| 64 |
def _legacy_torch_load():
|
| 65 |
-
"""Temporarily
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
deserialize it. We trust this checkpoint (it comes from the published
|
| 71 |
-
HSEmotion repo and was already vetted by the pip install), so we opt
|
| 72 |
-
back into legacy loading — scoped to just the HSEmotion init so the
|
| 73 |
-
rest of the process keeps the safer default.
|
| 74 |
"""
|
| 75 |
original_load = torch.load
|
| 76 |
|
|
@@ -91,6 +109,9 @@ class EmotionAnalyzer:
|
|
| 91 |
self.recognizer = self._load_model()
|
| 92 |
|
| 93 |
def _load_model(self):
|
|
|
|
|
|
|
|
|
|
| 94 |
if not HAS_HSEMOTION:
|
| 95 |
print(
|
| 96 |
"[EmotionAnalyzer] hsemotion not installed — emotion outputs "
|
|
@@ -114,12 +135,16 @@ class EmotionAnalyzer:
|
|
| 114 |
|
| 115 |
try:
|
| 116 |
# logits=False → returns post-softmax probabilities.
|
| 117 |
-
#
|
|
|
|
| 118 |
_, scores = self.recognizer.predict_emotions(img_rgb, logits=False)
|
| 119 |
except Exception as exc:
|
| 120 |
print(f"[EmotionAnalyzer] Inference failed: {exc}")
|
| 121 |
return self._empty_result()
|
| 122 |
|
|
|
|
|
|
|
|
|
|
| 123 |
probs = np.asarray(scores, dtype=float).flatten()
|
| 124 |
if probs.size != len(EMOTION_LABELS):
|
| 125 |
print(
|
|
@@ -129,26 +154,31 @@ class EmotionAnalyzer:
|
|
| 129 |
)
|
| 130 |
return self._empty_result()
|
| 131 |
|
| 132 |
-
# Defensive
|
| 133 |
-
# guards against future API drift in the hsemotion package.
|
| 134 |
total = probs.sum()
|
| 135 |
if total > 0:
|
| 136 |
probs = probs / total
|
| 137 |
|
|
|
|
| 138 |
emotion_scores = {
|
| 139 |
label: round(float(probs[i]), 3)
|
| 140 |
for i, label in enumerate(EMOTION_LABELS)
|
| 141 |
}
|
| 142 |
|
|
|
|
|
|
|
|
|
|
| 143 |
primary_idx = int(np.argmax(probs))
|
| 144 |
primary_emotion = EMOTION_LABELS[primary_idx]
|
| 145 |
primary_confidence = float(probs[primary_idx])
|
| 146 |
|
| 147 |
-
# Secondary emotion (second highest)
|
| 148 |
sorted_idx = np.argsort(probs)[::-1]
|
| 149 |
secondary_emotion = EMOTION_LABELS[int(sorted_idx[1])]
|
| 150 |
|
| 151 |
-
#
|
|
|
|
|
|
|
| 152 |
valence = sum(
|
| 153 |
probs[i] * VALENCE_MAP[label]
|
| 154 |
for i, label in enumerate(EMOTION_LABELS)
|
|
@@ -174,6 +204,7 @@ class EmotionAnalyzer:
|
|
| 174 |
|
| 175 |
@staticmethod
|
| 176 |
def _empty_result() -> dict[str, Any]:
|
|
|
|
| 177 |
return {
|
| 178 |
"primary_emotion": "unknown",
|
| 179 |
"emotion_confidence": 0.0,
|
|
@@ -182,4 +213,4 @@ class EmotionAnalyzer:
|
|
| 182 |
"valence": 0.0,
|
| 183 |
"arousal": 0.0,
|
| 184 |
"mood": "unknown",
|
| 185 |
-
}
|
|
|
|
| 1 |
"""
|
| 2 |
+
EmotionAnalyzer — HSEmotion 8-class facial emotion recognition.
|
| 3 |
+
|
| 4 |
+
Model
|
| 5 |
+
-----
|
| 6 |
+
- Architecture : EfficientNet-B0
|
| 7 |
+
- Checkpoint : enet_b0_8_best_afew (Savchenko et al.)
|
| 8 |
+
published by the hsemotion PyPI package
|
| 9 |
+
- Classes (8) : anger, contempt, disgust, fear, happiness,
|
| 10 |
+
neutral, sadness, surprise
|
| 11 |
+
- License : Apache 2.0 (hsemotion package)
|
| 12 |
+
- Source : https://github.com/HSE-asavchenko/face-emotion-recognition
|
| 13 |
+
|
| 14 |
+
Inputs
|
| 15 |
+
------
|
| 16 |
+
img_rgb : np.ndarray (H, W, 3) uint8. HSEmotionRecognizer handles its
|
| 17 |
+
own resize/normalise internally.
|
| 18 |
+
|
| 19 |
+
Outputs (dict)
|
| 20 |
+
--------------
|
| 21 |
+
primary_emotion, emotion_confidence, secondary_emotion,
|
| 22 |
+
emotion_scores (full distribution), valence (-1..+1), arousal (0..1),
|
| 23 |
+
mood (positive | negative | neutral).
|
| 24 |
+
|
| 25 |
+
Notes
|
| 26 |
+
-----
|
| 27 |
+
Valence and arousal are derived from the emotion distribution using
|
| 28 |
+
hand-set per-emotion weights (VALENCE_MAP / AROUSAL_MAP) — they are
|
| 29 |
+
weighted sums, not separate model outputs.
|
| 30 |
+
|
| 31 |
+
PyTorch 2.6 changed torch.load to weights_only=True by default. The
|
| 32 |
+
HSEmotion checkpoint is pickled as a full timm EfficientNet object
|
| 33 |
+
(not a clean state dict), so the safe unpickler refuses to load it.
|
| 34 |
+
We scope a legacy weights_only=False just around the HSEmotion init
|
| 35 |
+
to keep the rest of the process on the safer default.
|
| 36 |
"""
|
| 37 |
|
| 38 |
from contextlib import contextmanager
|
|
|
|
| 54 |
"happiness", "neutral", "sadness", "surprise",
|
| 55 |
]
|
| 56 |
|
| 57 |
+
# Per-emotion valence weights. Used to project the 8-class distribution
|
| 58 |
+
# down to a single scalar in [-1, 1] (negative = sad/angry, positive = happy).
|
| 59 |
VALENCE_MAP = {
|
| 60 |
"anger": -0.6,
|
| 61 |
"contempt": -0.3,
|
|
|
|
| 67 |
"surprise": 0.3,
|
| 68 |
}
|
| 69 |
|
| 70 |
+
# Per-emotion arousal weights, scalar in [0, 1] (0 = calm, 1 = intense).
|
| 71 |
AROUSAL_MAP = {
|
| 72 |
"anger": 0.8,
|
| 73 |
"contempt": 0.3,
|
|
|
|
| 84 |
|
| 85 |
@contextmanager
|
| 86 |
def _legacy_torch_load():
|
| 87 |
+
"""Temporarily switch torch.load back to weights_only=False.
|
| 88 |
+
|
| 89 |
+
Scoped via a context manager so only the HSEmotion init runs with
|
| 90 |
+
the legacy default; everything else keeps PyTorch 2.6's safer
|
| 91 |
+
weights_only=True behaviour.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 92 |
"""
|
| 93 |
original_load = torch.load
|
| 94 |
|
|
|
|
| 109 |
self.recognizer = self._load_model()
|
| 110 |
|
| 111 |
def _load_model(self):
|
| 112 |
+
# Without the hsemotion package installed there's no model to
|
| 113 |
+
# load. We log once and the rest of the service still works —
|
| 114 |
+
# the emotion fields just stay "unknown".
|
| 115 |
if not HAS_HSEMOTION:
|
| 116 |
print(
|
| 117 |
"[EmotionAnalyzer] hsemotion not installed — emotion outputs "
|
|
|
|
| 135 |
|
| 136 |
try:
|
| 137 |
# logits=False → returns post-softmax probabilities.
|
| 138 |
+
# The recognizer handles its own resize/normalize/preproc,
|
| 139 |
+
# so we hand it the raw RGB ndarray.
|
| 140 |
_, scores = self.recognizer.predict_emotions(img_rgb, logits=False)
|
| 141 |
except Exception as exc:
|
| 142 |
print(f"[EmotionAnalyzer] Inference failed: {exc}")
|
| 143 |
return self._empty_result()
|
| 144 |
|
| 145 |
+
# Flatten to a 1D numpy array and sanity-check its length matches
|
| 146 |
+
# the class list. Mismatch likely means the upstream package
|
| 147 |
+
# changed its class count.
|
| 148 |
probs = np.asarray(scores, dtype=float).flatten()
|
| 149 |
if probs.size != len(EMOTION_LABELS):
|
| 150 |
print(
|
|
|
|
| 154 |
)
|
| 155 |
return self._empty_result()
|
| 156 |
|
| 157 |
+
# Defensive renormalisation. With logits=False this is a no-op,
|
| 158 |
+
# but it guards against future API drift in the hsemotion package.
|
| 159 |
total = probs.sum()
|
| 160 |
if total > 0:
|
| 161 |
probs = probs / total
|
| 162 |
|
| 163 |
+
# Build the {emotion: probability} dict for downstream display.
|
| 164 |
emotion_scores = {
|
| 165 |
label: round(float(probs[i]), 3)
|
| 166 |
for i, label in enumerate(EMOTION_LABELS)
|
| 167 |
}
|
| 168 |
|
| 169 |
+
# Primary = argmax of the distribution; secondary = second-highest.
|
| 170 |
+
# These are the two most-likely emotions, useful when the model
|
| 171 |
+
# is genuinely uncertain between two similar classes.
|
| 172 |
primary_idx = int(np.argmax(probs))
|
| 173 |
primary_emotion = EMOTION_LABELS[primary_idx]
|
| 174 |
primary_confidence = float(probs[primary_idx])
|
| 175 |
|
|
|
|
| 176 |
sorted_idx = np.argsort(probs)[::-1]
|
| 177 |
secondary_emotion = EMOTION_LABELS[int(sorted_idx[1])]
|
| 178 |
|
| 179 |
+
# Valence and arousal: weighted sums over the distribution. A
|
| 180 |
+
# confidently-happy face gives valence ~0.9; a fearful one drops
|
| 181 |
+
# into negative territory with high arousal.
|
| 182 |
valence = sum(
|
| 183 |
probs[i] * VALENCE_MAP[label]
|
| 184 |
for i, label in enumerate(EMOTION_LABELS)
|
|
|
|
| 204 |
|
| 205 |
@staticmethod
|
| 206 |
def _empty_result() -> dict[str, Any]:
|
| 207 |
+
"""Stub used when HSEmotion isn't available or inference fails."""
|
| 208 |
return {
|
| 209 |
"primary_emotion": "unknown",
|
| 210 |
"emotion_confidence": 0.0,
|
|
|
|
| 213 |
"valence": 0.0,
|
| 214 |
"arousal": 0.0,
|
| 215 |
"mood": "unknown",
|
| 216 |
+
}
|
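The post-processing steps in `EmotionAnalyzer.analyze` above (renormalise, pick the top two classes, take weighted sums for valence and arousal) can be sketched in isolation. This is a minimal illustration, not the committed code: it uses plain Python lists instead of NumPy so it runs without dependencies, the helper name `summarise` is hypothetical, and the label order and the `VALENCE_MAP` / `AROUSAL_MAP` values are placeholder assumptions, since the real constants are defined outside the hunks shown here.

```python
# Placeholder label order and affect weights (assumptions for illustration only;
# the real EMOTION_LABELS / VALENCE_MAP live elsewhere in emotion_analyzer.py).
EMOTION_LABELS = ["angry", "happy", "neutral", "sad"]
VALENCE_MAP = {"angry": -0.7, "happy": 0.9, "neutral": 0.0, "sad": -0.8}
AROUSAL_MAP = {"angry": 0.8, "happy": 0.6, "neutral": 0.0, "sad": -0.4}


def summarise(scores):
    """Turn raw class scores into primary/secondary labels plus valence/arousal."""
    probs = [float(s) for s in scores]
    if len(probs) != len(EMOTION_LABELS):
        raise ValueError("score vector does not match the label list")

    # Defensive renormalisation; a no-op when the input already sums to 1.
    total = sum(probs)
    if total > 0:
        probs = [p / total for p in probs]

    # Class indices sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Valence and arousal are expectation values over the distribution.
    valence = sum(p * VALENCE_MAP[label] for p, label in zip(probs, EMOTION_LABELS))
    arousal = sum(p * AROUSAL_MAP[label] for p, label in zip(probs, EMOTION_LABELS))

    return {
        "primary_emotion": EMOTION_LABELS[order[0]],
        "secondary_emotion": EMOTION_LABELS[order[1]],
        "emotion_confidence": round(probs[order[0]], 3),
        "valence": round(valence, 3),
        "arousal": round(arousal, 3),
    }
```

Because valence and arousal are weighted sums, an uncertain prediction is pulled toward the neutral midpoint rather than flipping between extremes, which is presumably why the analyzer reports them alongside the hard argmax label.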
analyzers/hair_type_analyzer.py
ADDED
|
@@ -0,0 +1,87 @@
+"""
+HairTypeAnalyzer — hair texture classifier.
+
+Model
+-----
+- Architecture : Vision Transformer (ViT-B/16)
+- HF repo      : dima806/hair_type_image_detection
+- License      : Apache 2.0
+- Classes (5)  : curly, dreadlocks, kinky, straight, wavy
+- Reported acc : 93% overall.
+                 Per-class F1: dreadlocks 0.978, kinky 0.949,
+                 straight 0.927, curly 0.902, wavy 0.884.
+
+Inputs
+------
+img_rgb : np.ndarray (H, W, 3) uint8
+
+Outputs (dict)
+--------------
+hair_type            — argmax label
+hair_type_confidence — argmax softmax score
+hair_type_scores     — full {class: score} dict
+
+Notes
+-----
+This is the authoritative hair-texture output. The Laplacian-std-
+based `hair_texture` field from ColorAnalyzer is a coarse fallback
+that runs even when this model is unavailable.
+"""
+
+from typing import Any
+
+from PIL import Image
+from transformers import pipeline
+
+
+MODEL_ID = "dima806/hair_type_image_detection"
+
+# Canonical class names in lowercase. Pipeline output is normalised
+# to these on the way out.
+_KNOWN = {"curly", "dreadlocks", "kinky", "straight", "wavy"}
+
+
+class HairTypeAnalyzer:
+    def __init__(self):
+        self.classifier = None
+        try:
+            self.classifier = pipeline("image-classification", model=MODEL_ID)
+        except Exception as exc:
+            print(f"[HairTypeAnalyzer] Failed to load {MODEL_ID}: {exc}")
+
+    def analyze(self, img_rgb) -> dict[str, Any]:
+        if self.classifier is None:
+            return self._empty_result()
+
+        try:
+            pil = Image.fromarray(img_rgb)
+            # Pull all five class probabilities so downstream code can
+            # inspect the full distribution (e.g. wavy-vs-curly margin).
+            preds = self.classifier(pil, top_k=len(_KNOWN))
+        except Exception as exc:
+            print(f"[HairTypeAnalyzer] Prediction failed: {exc}")
+            return self._empty_result()
+
+        # Normalise label casing and build the score map.
+        scores = {label: 0.0 for label in _KNOWN}
+        for pred in preds:
+            label = str(pred["label"]).strip().lower()
+            if label in scores:
+                scores[label] = round(float(pred["score"]), 3)
+
+        top_label = max(scores, key=scores.get)
+        top_score = scores[top_label]
+
+        return {
+            "hair_type": top_label,
+            "hair_type_confidence": top_score,
+            "hair_type_scores": scores,
+        }
+
+    @staticmethod
+    def _empty_result() -> dict[str, Any]:
+        return {
+            "hair_type": "unknown",
+            "hair_type_confidence": 0.0,
+            "hair_type_scores": {label: 0.0 for label in _KNOWN},
+        }
|
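The label-normalisation step in `HairTypeAnalyzer.analyze` can be exercised without loading the model. The sketch below is a hedged variant, not the committed code: `build_score_map` and `HAIR_LABELS` are hypothetical names, the labels are kept in a tuple (rather than a set) so the key order is the same on every run, and two behaviours are additions of this sketch: a fall back to `"unknown"` when no prediction matches a known label, and a top-two `margin` field. The input is assumed to be a list of `{"label": ..., "score": ...}` dicts, which is the shape the Transformers image-classification pipeline returns.

```python
# Tuple instead of set: deterministic key order in the resulting dicts.
HAIR_LABELS = ("curly", "dreadlocks", "kinky", "straight", "wavy")


def build_score_map(preds, labels=HAIR_LABELS):
    """Normalise pipeline predictions into a stable {label: score} summary."""
    scores = {label: 0.0 for label in labels}
    for pred in preds:
        # Pipeline labels may differ in casing or carry stray whitespace.
        label = str(pred["label"]).strip().lower()
        if label in scores:
            scores[label] = round(float(pred["score"]), 3)

    ranked = sorted(scores, key=scores.get, reverse=True)
    top_label, runner_up = ranked[0], ranked[1]

    # Addition in this sketch: if nothing matched, every score is 0.0 and
    # the "winner" would be arbitrary, so report "unknown" instead.
    if scores[top_label] == 0.0:
        return {
            "hair_type": "unknown",
            "hair_type_confidence": 0.0,
            "hair_type_scores": scores,
            "margin": 0.0,
        }

    return {
        "hair_type": top_label,
        "hair_type_confidence": scores[top_label],
        "hair_type_scores": scores,
        # Gap between the two most likely classes (e.g. wavy vs curly).
        "margin": round(scores[top_label] - scores[runner_up], 3),
    }
```

A small margin is a hint that the caller may want to present both candidates instead of committing to the argmax.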
analyzers/landmark_analyzer.py
CHANGED
|
@@ -1,20 +1,43 @@
|
|
| 1 |
"""
|
| 2 |
-
MediaPipe Face Landmarker
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
-
|
| 8 |
-
-
|
| 9 |
-
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
-
|
| 17 |
-
|
| 18 |
"""
|
| 19 |
|
| 20 |
import math
|
|
@@ -27,6 +50,7 @@ import numpy as np
|
|
| 27 |
from mediapipe.tasks import python as mp_python
|
| 28 |
from mediapipe.tasks.python import vision
|
| 29 |
|
|
|
|
| 30 |
MODEL_URL = (
|
| 31 |
"https://storage.googleapis.com/mediapipe-models/"
|
| 32 |
"face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
|
|
@@ -36,6 +60,9 @@ MODEL_PATH = "models/face_landmarker.task"
|
|
| 36 |
|
| 37 |
class LandmarkAnalyzer:
|
| 38 |
def __init__(self):
|
|
|
|
|
|
|
|
|
|
| 39 |
base_options = mp_python.BaseOptions(
|
| 40 |
model_asset_path=self._ensure_model()
|
| 41 |
)
|
|
@@ -49,7 +76,7 @@ class LandmarkAnalyzer:
|
|
| 49 |
|
| 50 |
@staticmethod
|
| 51 |
def _ensure_model() -> str:
|
| 52 |
-
"""
|
| 53 |
if not os.path.exists(MODEL_PATH):
|
| 54 |
os.makedirs("models", exist_ok=True)
|
| 55 |
urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
|
|
@@ -60,34 +87,48 @@ class LandmarkAnalyzer:
|
|
| 60 |
# ------------------------------------------------------------------
|
| 61 |
|
| 62 |
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
|
|
|
|
|
|
|
|
|
| 63 |
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
|
| 64 |
result = self.detector.detect(mp_image)
|
| 65 |
|
| 66 |
if not result.face_landmarks:
|
| 67 |
return {"error": "No face detected by MediaPipe"}
|
| 68 |
|
|
|
|
|
|
|
| 69 |
landmarks = result.face_landmarks[0]
|
| 70 |
lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
|
| 71 |
|
|
|
|
|
|
|
| 72 |
blendshapes: dict[str, float] = {}
|
| 73 |
if result.face_blendshapes:
|
| 74 |
for bs in result.face_blendshapes[0]:
|
| 75 |
blendshapes[bs.category_name] = round(bs.score, 4)
|
| 76 |
|
|
|
|
|
|
|
| 77 |
attrs: dict[str, Any] = {"_raw_landmarks": lm}
|
| 78 |
|
| 79 |
# ── Face Shape ────────────────────────────────────────────────
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
|
|
|
|
|
|
|
|
|
| 85 |
|
| 86 |
wh_ratio = face_width / face_height if face_height else 1
|
| 87 |
jaw_to_face = jaw_width / face_width if face_width else 1
|
| 88 |
forehead_to_jaw = forehead_width / jaw_width if jaw_width else 1
|
| 89 |
cheek_to_jaw = cheekbone_width / jaw_width if jaw_width else 1
|
| 90 |
|
|
|
|
|
|
|
| 91 |
if wh_ratio > 0.85 and jaw_to_face > 0.75:
|
| 92 |
attrs["face_shape"] = "round"
|
| 93 |
elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
|
|
@@ -110,13 +151,16 @@ class LandmarkAnalyzer:
|
|
| 110 |
"cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
|
| 111 |
}
|
| 112 |
|
| 113 |
-
# ── Forehead ────────────────
|
|
|
|
| 114 |
fh_ratio = forehead_width / face_width if face_width else 0.6
|
| 115 |
attrs["forehead_width"] = (
|
| 116 |
"broad" if fh_ratio > 0.7 else "narrow" if fh_ratio < 0.55 else "average"
|
| 117 |
)
|
| 118 |
|
| 119 |
# ── Jawline ──────────────────────────────────────────────────
|
|
|
|
|
|
|
| 120 |
jaw_angle = self._jaw_angle(lm)
|
| 121 |
attrs["jawline_angle"] = round(jaw_angle, 1)
|
| 122 |
if jaw_angle < 110:
|
|
@@ -129,6 +173,7 @@ class LandmarkAnalyzer:
|
|
| 129 |
attrs["jawline_type"] = "soft"
|
| 130 |
|
| 131 |
# ── Chin ─────────────────────────────────────────────────────
|
|
|
|
| 132 |
chin_width = self._dist(lm[175], lm[396])
|
| 133 |
chin_ratio = chin_width / jaw_width if jaw_width else 0.4
|
| 134 |
attrs["chin_type"] = (
|
|
@@ -138,12 +183,16 @@ class LandmarkAnalyzer:
|
|
| 138 |
)
|
| 139 |
|
| 140 |
# ── Cheekbones ───────────────────────────────────────────────
|
|
|
|
|
|
|
| 141 |
cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
|
| 142 |
attrs["cheekbone_prominence"] = (
|
| 143 |
"high" if cheek_z < -0.04
|
| 144 |
else "flat" if cheek_z > 0.0
|
| 145 |
else "moderate"
|
| 146 |
)
|
|
|
|
|
|
|
| 147 |
cheek_puff = blendshapes.get("cheekPuff", 0)
|
| 148 |
if cheek_puff > 0.3:
|
| 149 |
attrs["cheek_fullness"] = "full"
|
|
@@ -153,12 +202,16 @@ class LandmarkAnalyzer:
|
|
| 153 |
attrs["cheek_fullness"] = "normal"
|
| 154 |
|
| 155 |
# ── Eyes ─────────────────────────────────────────────────────
|
|
|
|
|
|
|
| 156 |
l_top, l_bot = lm[159], lm[145]
|
| 157 |
l_inner, l_outer = lm[133], lm[33]
|
| 158 |
eye_open = self._dist(l_top, l_bot)
|
| 159 |
eye_w = self._dist(l_inner, l_outer)
|
| 160 |
eye_ratio = eye_open / eye_w if eye_w else 0.3
|
| 161 |
|
|
|
|
|
|
|
| 162 |
outer_angle = l_outer["y"] - l_inner["y"]
|
| 163 |
if outer_angle < -0.012:
|
| 164 |
attrs["eye_shape"] = "upturned"
|
|
@@ -171,7 +224,7 @@ class LandmarkAnalyzer:
|
|
| 171 |
else:
|
| 172 |
attrs["eye_shape"] = "almond"
|
| 173 |
|
| 174 |
-
# Deep-set vs protruding
|
| 175 |
eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
|
| 176 |
nose_bridge_z = lm[6]["z"]
|
| 177 |
if eye_z > nose_bridge_z + 0.02:
|
|
@@ -181,7 +234,8 @@ class LandmarkAnalyzer:
|
|
| 181 |
else:
|
| 182 |
attrs["eye_depth"] = "normal"
|
| 183 |
|
| 184 |
-
# Eye spacing
|
|
|
|
| 185 |
if len(lm) > 473:
|
| 186 |
inter_pupillary = self._dist(lm[468], lm[473])
|
| 187 |
else:
|
|
@@ -193,7 +247,8 @@ class LandmarkAnalyzer:
|
|
| 193 |
else "average"
|
| 194 |
)
|
| 195 |
|
| 196 |
-
# Eye size
|
|
|
|
| 197 |
r_top, r_bot = lm[386], lm[374]
|
| 198 |
r_inner, r_outer = lm[362], lm[263]
|
| 199 |
r_area = self._dist(r_top, r_bot) * self._dist(r_inner, r_outer)
|
|
@@ -207,6 +262,8 @@ class LandmarkAnalyzer:
|
|
| 207 |
else "average"
|
| 208 |
)
|
| 209 |
|
|
|
|
|
|
|
| 210 |
blink_l = blendshapes.get("eyeBlinkLeft", 0)
|
| 211 |
blink_r = blendshapes.get("eyeBlinkRight", 0)
|
| 212 |
attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
|
|
@@ -215,6 +272,8 @@ class LandmarkAnalyzer:
|
|
| 215 |
brow_mid = lm[105]
|
| 216 |
brow_outer = lm[46]
|
| 217 |
brow_inner = lm[70]
|
|
|
|
|
|
|
| 218 |
brow_to_eye = self._dist(brow_mid, lm[159])
|
| 219 |
brow_arch_ratio = brow_to_eye / eye_open if eye_open else 1.5
|
| 220 |
|
|
@@ -224,6 +283,8 @@ class LandmarkAnalyzer:
|
|
| 224 |
else "average"
|
| 225 |
)
|
| 226 |
|
|
|
|
|
|
|
| 227 |
mid_y = brow_mid["y"]
|
| 228 |
avg_end_y = (brow_inner["y"] + brow_outer["y"]) / 2
|
| 229 |
curvature = mid_y - avg_end_y
|
|
@@ -234,6 +295,7 @@ class LandmarkAnalyzer:
|
|
| 234 |
else:
|
| 235 |
attrs["eyebrow_shape"] = "flat"
|
| 236 |
|
|
|
|
| 237 |
brow_top = lm[66]
|
| 238 |
brow_bottom = lm[105]
|
| 239 |
brow_thickness = self._dist(brow_top, brow_bottom)
|
|
@@ -243,6 +305,7 @@ class LandmarkAnalyzer:
|
|
| 243 |
else "medium"
|
| 244 |
)
|
| 245 |
|
|
|
|
| 246 |
inner_brow_dist = self._dist(lm[70], lm[300])
|
| 247 |
attrs["possible_unibrow"] = inner_brow_dist < 0.04
|
| 248 |
|
|
@@ -261,6 +324,8 @@ class LandmarkAnalyzer:
|
|
| 261 |
else "average"
|
| 262 |
)
|
| 263 |
|
|
|
|
|
|
|
| 264 |
tip_angle = nose_tip["y"] - nose_bottom["y"]
|
| 265 |
if tip_angle < -0.005:
|
| 266 |
attrs["nose_shape"] = "upturned"
|
|
@@ -273,16 +338,21 @@ class LandmarkAnalyzer:
|
|
| 273 |
else:
|
| 274 |
attrs["nose_shape"] = "straight"
|
| 275 |
|
|
|
|
| 276 |
attrs["nose_bridge"] = (
|
| 277 |
"high" if nose_bridge_top["z"] < -0.05
|
| 278 |
else "flat" if nose_bridge_top["z"] > 0.0
|
| 279 |
else "average"
|
| 280 |
)
|
|
|
|
| 281 |
attrs["nose_tip_shape"] = (
|
| 282 |
"pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01 else "rounded"
|
| 283 |
)
|
| 284 |
|
| 285 |
# ── Lips & Mouth ─────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
| 286 |
ul_top, ul_bot = lm[0], lm[13]
|
| 287 |
ll_top, ll_bot = lm[14], lm[17]
|
| 288 |
m_left, m_right = lm[61], lm[291]
|
|
@@ -298,6 +368,7 @@ class LandmarkAnalyzer:
|
|
| 298 |
else "thin" if lip_ratio < 0.22
|
| 299 |
else "average"
|
| 300 |
)
|
|
|
|
| 301 |
attrs["lip_balance"] = (
|
| 302 |
"top-heavy" if ul_h > ll_h * 1.2
|
| 303 |
else "bottom-heavy" if ll_h > ul_h * 1.2
|
|
@@ -311,7 +382,8 @@ class LandmarkAnalyzer:
|
|
| 311 |
else "average"
|
| 312 |
)
|
| 313 |
|
| 314 |
-
# Cupid's bow
|
|
|
|
| 315 |
c_left, c_center, c_right = lm[37], lm[0], lm[267]
|
| 316 |
bow = c_center["y"] - (c_left["y"] + c_right["y"]) / 2
|
| 317 |
attrs["cupids_bow"] = (
|
|
@@ -320,7 +392,9 @@ class LandmarkAnalyzer:
|
|
| 320 |
else "flat"
|
| 321 |
)
|
| 322 |
|
| 323 |
-
#
|
|
|
|
|
|
|
| 324 |
smile_l = blendshapes.get("mouthSmileLeft", 0)
|
| 325 |
smile_r = blendshapes.get("mouthSmileRight", 0)
|
| 326 |
attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
|
|
@@ -330,6 +404,9 @@ class LandmarkAnalyzer:
|
|
| 330 |
)
|
| 331 |
|
| 332 |
# ── Facial Asymmetry ─────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
| 333 |
pairs = [
|
| 334 |
(33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
|
| 335 |
(61, 291), (159, 386), (145, 374), (46, 276),
|
|
@@ -341,6 +418,8 @@ class LandmarkAnalyzer:
|
|
| 341 |
min(asym / len(pairs) / 0.05, 1.0), 3
|
| 342 |
)
|
| 343 |
|
|
|
|
|
|
|
| 344 |
attrs["blendshapes"] = blendshapes
|
| 345 |
return attrs
|
| 346 |
|
|
@@ -350,6 +429,7 @@ class LandmarkAnalyzer:
|
|
| 350 |
|
| 351 |
@staticmethod
|
| 352 |
def _dist(a: dict, b: dict) -> float:
|
|
|
|
| 353 |
return math.sqrt(
|
| 354 |
(a["x"] - b["x"]) ** 2
|
| 355 |
+ (a["y"] - b["y"]) ** 2
|
|
@@ -358,6 +438,11 @@ class LandmarkAnalyzer:
|
|
| 358 |
|
| 359 |
@staticmethod
|
| 360 |
def _jaw_angle(lm: list[dict]) -> float:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 361 |
chin = lm[152]
|
| 362 |
left_jaw, right_jaw = lm[172], lm[397]
|
| 363 |
v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
|
|
|
|
| 1 |
"""
|
| 2 |
+
LandmarkAnalyzer — MediaPipe Face Landmarker geometric feature extractor.
|
| 3 |
+
|
| 4 |
+
Model
|
| 5 |
+
-----
|
| 6 |
+
- Architecture : MediaPipe Face Landmarker (TF Lite, Google)
|
| 7 |
+
- Weights : face_landmarker.task (float16, auto-downloaded, ~4 MB)
|
| 8 |
+
- Outputs : 478 normalised 3D landmarks + 52 ARKit-compatible blendshapes
|
| 9 |
+
- License : Apache 2.0
|
| 10 |
+
|
| 11 |
+
Inputs
|
| 12 |
+
------
|
| 13 |
+
img_rgb : np.ndarray (H, W, 3) uint8, RGB order.
|
| 14 |
+
|
| 15 |
+
Outputs (dict)
|
| 16 |
+
--------------
|
| 17 |
+
Most fields are categorical strings derived from landmark distances,
|
| 18 |
+
ratios and angles. A few come straight from blendshape activations.
|
| 19 |
+
|
| 20 |
+
Face shape / structure :
|
| 21 |
+
face_shape, face_shape_metrics, forehead_width,
|
| 22 |
+
jawline_angle, jawline_type, chin_type,
|
| 23 |
+
cheekbone_prominence, cheek_fullness, facial_asymmetry_score
|
| 24 |
+
Eyes :
|
| 25 |
+
eye_shape, eye_depth, eye_spacing, eye_size, eyes_open
|
| 26 |
+
Eyebrows :
|
| 27 |
+
eyebrow_arch_height, eyebrow_shape, eyebrow_thickness, possible_unibrow
|
| 28 |
+
Nose :
|
| 29 |
+
nose_shape, nose_bridge, nose_tip_shape, nostril_width
|
| 30 |
+
Lips & mouth :
|
| 31 |
+
lip_fullness, lip_balance, mouth_width, cupids_bow,
|
| 32 |
+
smiling, smile_asymmetry, possible_dimples
|
| 33 |
+
Raw payloads (used downstream, stripped before JSON) :
|
| 34 |
+
_raw_landmarks, blendshapes
|
| 35 |
+
|
| 36 |
+
Notes
|
| 37 |
+
-----
|
| 38 |
+
All thresholds were hand-tuned against representative photos.
|
| 39 |
+
They are conservative: when a ratio sits near a boundary the analyzer
|
| 40 |
+
prefers "average" / "normal" over committing to an extreme bucket.
|
| 41 |
"""
|
| 42 |
|
| 43 |
import math
|
|
|
|
| 50 |
from mediapipe.tasks import python as mp_python
|
| 51 |
from mediapipe.tasks.python import vision
|
| 52 |
|
| 53 |
+
# Float16 MediaPipe weight file. ~4 MB, auto-fetched once and cached.
|
| 54 |
MODEL_URL = (
|
| 55 |
"https://storage.googleapis.com/mediapipe-models/"
|
| 56 |
"face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
|
|
|
|
| 60 |
|
| 61 |
class LandmarkAnalyzer:
|
| 62 |
def __init__(self):
|
| 63 |
+
# Configure the detector to emit both blendshapes and the 4x4
|
| 64 |
+
# facial transformation matrix; the latter is unused for now but
|
| 65 |
+
# cheap to compute and useful if we ever need head pose.
|
| 66 |
base_options = mp_python.BaseOptions(
|
| 67 |
model_asset_path=self._ensure_model()
|
| 68 |
)
|
|
|
|
| 76 |
|
| 77 |
@staticmethod
|
| 78 |
def _ensure_model() -> str:
|
| 79 |
+
"""Cache the MediaPipe weight file on disk on first run."""
|
| 80 |
if not os.path.exists(MODEL_PATH):
|
| 81 |
os.makedirs("models", exist_ok=True)
|
| 82 |
urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
|
|
|
|
| 87 |
# ------------------------------------------------------------------
|
| 88 |
|
| 89 |
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 90 |
+
# Wrap the numpy array as a MediaPipe Image and run detection.
|
| 91 |
+
# If no face is found, downstream analyzers will see no landmarks
|
| 92 |
+
# and gracefully degrade to "unknown" fields.
|
| 93 |
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
|
| 94 |
result = self.detector.detect(mp_image)
|
| 95 |
|
| 96 |
if not result.face_landmarks:
|
| 97 |
return {"error": "No face detected by MediaPipe"}
|
| 98 |
|
| 99 |
+
# MediaPipe returns NormalizedLandmark objects; convert to plain
|
| 100 |
+
# dicts so downstream code (and JSON serialisation) is simpler.
|
| 101 |
landmarks = result.face_landmarks[0]
|
| 102 |
lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
|
| 103 |
|
| 104 |
+
# Build the {blendshape_name: score} dict. ARKit-compatible names
|
| 105 |
+
# like mouthSmileLeft, eyeBlinkRight, jawOpen.
|
| 106 |
blendshapes: dict[str, float] = {}
|
| 107 |
if result.face_blendshapes:
|
| 108 |
for bs in result.face_blendshapes[0]:
|
| 109 |
blendshapes[bs.category_name] = round(bs.score, 4)
|
| 110 |
|
| 111 |
+
# _raw_landmarks is consumed by ColorAnalyzer (iris + lip masks).
|
| 112 |
+
# The leading underscore tells app.py to strip it before JSON.
|
| 113 |
attrs: dict[str, Any] = {"_raw_landmarks": lm}
|
| 114 |
|
| 115 |
# ── Face Shape ────────────────────────────────────────────────
|
| 116 |
+
# Four ratios that, taken together, distinguish the seven canonical
|
| 117 |
+
# face shapes. All distances are in normalised image coordinates
|
| 118 |
+
# so the ratios are resolution-independent.
|
| 119 |
+
face_height = self._dist(lm[10], lm[152]) # forehead top → chin bottom
|
| 120 |
+
face_width = self._dist(lm[234], lm[454]) # left zygion → right zygion
|
| 121 |
+
jaw_width = self._dist(lm[172], lm[397]) # left gonion → right gonion
|
| 122 |
+
cheekbone_width = self._dist(lm[93], lm[323]) # left zygomatic → right
|
| 123 |
+
forehead_width = self._dist(lm[54], lm[284]) # left frontal → right frontal
|
| 124 |
|
| 125 |
wh_ratio = face_width / face_height if face_height else 1
|
| 126 |
jaw_to_face = jaw_width / face_width if face_width else 1
|
| 127 |
forehead_to_jaw = forehead_width / jaw_width if jaw_width else 1
|
| 128 |
cheek_to_jaw = cheekbone_width / jaw_width if jaw_width else 1
|
| 129 |
|
| 130 |
+
# Cascade ordered by specificity — a face that matches multiple
|
| 131 |
+
# categories is bucketed by the strictest matching rule.
|
| 132 |
if wh_ratio > 0.85 and jaw_to_face > 0.75:
|
| 133 |
attrs["face_shape"] = "round"
|
| 134 |
elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
|
|
|
|
| 151 |
"cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
|
| 152 |
}
|
| 153 |
|
| 154 |
+
# ── Forehead width (broad / average / narrow) ────────────────
|
| 155 |
+
# Forehead width relative to overall face width.
|
| 156 |
fh_ratio = forehead_width / face_width if face_width else 0.6
|
| 157 |
attrs["forehead_width"] = (
|
| 158 |
"broad" if fh_ratio > 0.7 else "narrow" if fh_ratio < 0.55 else "average"
|
| 159 |
)
|
| 160 |
|
| 161 |
# ── Jawline ──────────────────────────────────────────────────
|
| 162 |
+
# Angle subtended at the chin point by the two gonion landmarks.
|
| 163 |
+
# Smaller angle = sharper jawline; larger = softer.
|
| 164 |
jaw_angle = self._jaw_angle(lm)
|
| 165 |
attrs["jawline_angle"] = round(jaw_angle, 1)
|
| 166 |
if jaw_angle < 110:
|
|
|
|
| 173 |
attrs["jawline_type"] = "soft"
|
| 174 |
|
| 175 |
# ── Chin ─────────────────────────────────────────────────────
|
| 176 |
+
# Chin width vs jaw width: narrower chin → pointier appearance.
|
| 177 |
chin_width = self._dist(lm[175], lm[396])
|
| 178 |
chin_ratio = chin_width / jaw_width if jaw_width else 0.4
|
| 179 |
attrs["chin_type"] = (
|
|
|
|
| 183 |
)
|
| 184 |
|
| 185 |
# ── Cheekbones ───────────────────────────────────────────────
|
| 186 |
+
# Z (depth) is signed: negative values are closer to the camera.
|
| 187 |
+
# Prominent cheekbones project forward → more negative cheek_z.
|
| 188 |
cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
|
| 189 |
attrs["cheekbone_prominence"] = (
|
| 190 |
"high" if cheek_z < -0.04
|
| 191 |
else "flat" if cheek_z > 0.0
|
| 192 |
else "moderate"
|
| 193 |
)
|
| 194 |
+
# cheekPuff blendshape catches actively puffed-out cheeks; a flat
|
| 195 |
+
# cheek_z signals a hollow look in the absence of puff.
|
| 196 |
cheek_puff = blendshapes.get("cheekPuff", 0)
|
| 197 |
if cheek_puff > 0.3:
|
| 198 |
attrs["cheek_fullness"] = "full"
|
|
|
|
| 202 |
attrs["cheek_fullness"] = "normal"
|
| 203 |
|
| 204 |
# ── Eyes ─────────────────────────────────────────────────────
|
| 205 |
+
# Left-eye landmarks. eye_open is vertical lid distance,
|
| 206 |
+
# eye_w is the inner→outer corner distance.
|
| 207 |
l_top, l_bot = lm[159], lm[145]
|
| 208 |
l_inner, l_outer = lm[133], lm[33]
|
| 209 |
eye_open = self._dist(l_top, l_bot)
|
| 210 |
eye_w = self._dist(l_inner, l_outer)
|
| 211 |
eye_ratio = eye_open / eye_w if eye_w else 0.3
|
| 212 |
|
| 213 |
+
# Outer-corner Y relative to inner corner classifies tilt.
|
| 214 |
+
# Hooded vs round vs almond come from the openness ratio.
|
| 215 |
outer_angle = l_outer["y"] - l_inner["y"]
|
| 216 |
if outer_angle < -0.012:
|
| 217 |
attrs["eye_shape"] = "upturned"
|
|
|
|
| 224 |
else:
|
| 225 |
attrs["eye_shape"] = "almond"
|
| 226 |
|
| 227 |
+
# Deep-set vs protruding: compare eye-region z vs nose-bridge z.
|
| 228 |
eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
|
| 229 |
nose_bridge_z = lm[6]["z"]
|
| 230 |
if eye_z > nose_bridge_z + 0.02:
|
|
|
|
| 234 |
else:
|
| 235 |
attrs["eye_depth"] = "normal"
|
| 236 |
|
| 237 |
+
# Eye spacing: prefer pupil-to-pupil if iris landmarks (468/473)
|
| 238 |
+
# are present, otherwise fall back to inner-corner distance.
|
| 239 |
if len(lm) > 473:
|
| 240 |
inter_pupillary = self._dist(lm[468], lm[473])
|
| 241 |
else:
|
|
|
|
| 247 |
else "average"
|
| 248 |
)
|
| 249 |
|
| 250 |
+
# Eye size: avg of left & right eye-region bounding-box area,
|
| 251 |
+
# relative to overall face area.
|
| 252 |
r_top, r_bot = lm[386], lm[374]
|
| 253 |
r_inner, r_outer = lm[362], lm[263]
|
| 254 |
r_area = self._dist(r_top, r_bot) * self._dist(r_inner, r_outer)
|
|
|
|
| 262 |
else "average"
|
| 263 |
)
|
| 264 |
|
| 265 |
+
# eyeBlink blendshapes flip to ~1.0 when the eye is closed.
|
| 266 |
+
# eyes_open = True iff average blink activation is < 0.5.
|
| 267 |
blink_l = blendshapes.get("eyeBlinkLeft", 0)
|
| 268 |
blink_r = blendshapes.get("eyeBlinkRight", 0)
|
| 269 |
attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
|
|
|
|
| 272 |
brow_mid = lm[105]
|
| 273 |
brow_outer = lm[46]
|
| 274 |
brow_inner = lm[70]
|
| 275 |
+
# Vertical distance from brow-mid to upper-eyelid is roughly
|
| 276 |
+
# proportional to perceived "arch height" relative to eye size.
|
| 277 |
brow_to_eye = self._dist(brow_mid, lm[159])
|
| 278 |
brow_arch_ratio = brow_to_eye / eye_open if eye_open else 1.5
|
| 279 |
|
|
|
|
| 283 |
else "average"
|
| 284 |
)
|
| 285 |
|
| 286 |
+
# Curvature = mid Y vs avg of inner+outer Ys. Negative curvature
|
| 287 |
+
# (mid sits higher than the ends) → arched; near-zero → straight.
|
| 288 |
mid_y = brow_mid["y"]
|
| 289 |
avg_end_y = (brow_inner["y"] + brow_outer["y"]) / 2
|
| 290 |
curvature = mid_y - avg_end_y
|
|
|
|
| 295 |
else:
|
| 296 |
attrs["eyebrow_shape"] = "flat"
|
| 297 |
|
| 298 |
+
# Brow thickness from top-to-bottom landmark span.
|
| 299 |
brow_top = lm[66]
|
| 300 |
brow_bottom = lm[105]
|
| 301 |
brow_thickness = self._dist(brow_top, brow_bottom)
|
|
|
|
| 305 |
else "medium"
|
| 306 |
)
|
| 307 |
|
| 308 |
+
# Inner-brow distance below 0.04 (normalised image units, not scaled by face width) suggests a unibrow.
|
| 309 |
inner_brow_dist = self._dist(lm[70], lm[300])
|
| 310 |
attrs["possible_unibrow"] = inner_brow_dist < 0.04
|
| 311 |
|
|
|
|
| 324 |
else "average"
|
| 325 |
)
|
| 326 |
|
| 327 |
+
# Tip vertical offset relative to nose base distinguishes
|
| 328 |
+
# upturned (tip sits higher) from aquiline (tip droops down).
|
| 329 |
tip_angle = nose_tip["y"] - nose_bottom["y"]
|
| 330 |
if tip_angle < -0.005:
|
| 331 |
attrs["nose_shape"] = "upturned"
|
|
|
|
| 338 |
else:
|
| 339 |
attrs["nose_shape"] = "straight"
|
| 340 |
|
| 341 |
+
# Bridge: high bridges project toward camera (more negative z).
|
| 342 |
attrs["nose_bridge"] = (
|
| 343 |
"high" if nose_bridge_top["z"] < -0.05
|
| 344 |
else "flat" if nose_bridge_top["z"] > 0.0
|
| 345 |
else "average"
|
| 346 |
)
|
| 347 |
+
# Pointed tip: tip projects forward of nostril base.
|
| 348 |
attrs["nose_tip_shape"] = (
|
| 349 |
"pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01 else "rounded"
|
| 350 |
)
|
| 351 |
|
| 352 |
# ── Lips & Mouth ─────────────────────────────────────────────
|
| 353 |
+
# Top and bottom of upper lip, top and bottom of lower lip, plus
|
| 354 |
+
# the mouth corners. lip_ratio compares stacked lip height to
|
| 355 |
+
# mouth width — full vs thin lips.
|
| 356 |
ul_top, ul_bot = lm[0], lm[13]
|
| 357 |
ll_top, ll_bot = lm[14], lm[17]
|
| 358 |
m_left, m_right = lm[61], lm[291]
|
|
|
|
| 368 |
else "thin" if lip_ratio < 0.22
|
| 369 |
else "average"
|
| 370 |
)
|
| 371 |
+
# Balance compares upper-lip thickness to lower-lip thickness.
|
| 372 |
attrs["lip_balance"] = (
|
| 373 |
"top-heavy" if ul_h > ll_h * 1.2
|
| 374 |
else "bottom-heavy" if ll_h > ul_h * 1.2
|
|
|
|
| 382 |
else "average"
|
| 383 |
)
|
| 384 |
|
| 385 |
+
# Cupid's bow: depression at the centre of the upper lip relative
|
| 386 |
+
# to the two peak landmarks on either side.
|
| 387 |
c_left, c_center, c_right = lm[37], lm[0], lm[267]
|
| 388 |
bow = c_center["y"] - (c_left["y"] + c_right["y"]) / 2
|
| 389 |
attrs["cupids_bow"] = (
|
|
|
|
| 392 |
else "flat"
|
| 393 |
)
|
| 394 |
|
| 395 |
+
# Smiling and dimples come directly from blendshape activations.
|
| 396 |
+
# smile_asymmetry is the absolute difference between left/right
|
| 397 |
+
# mouthSmile scores — non-zero on lopsided smiles.
|
| 398 |
smile_l = blendshapes.get("mouthSmileLeft", 0)
|
| 399 |
smile_r = blendshapes.get("mouthSmileRight", 0)
|
| 400 |
attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
|
|
|
|
| 404 |
)
|
| 405 |
|
| 406 |
# ── Facial Asymmetry ─────────────────────────────────────────
|
| 407 |
+
# Sum mirror-pair x-coordinate offsets from the midline (x=0.5)
|
| 408 |
+
# over 9 paired landmarks. Normalise so a perfectly symmetric
|
| 409 |
+
# face scores ~0 and visibly asymmetric ones approach 1.
|
| 410 |
pairs = [
|
| 411 |
(33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
|
| 412 |
(61, 291), (159, 386), (145, 374), (46, 276),
|
|
|
|
| 418 |
min(asym / len(pairs) / 0.05, 1.0), 3
|
| 419 |
)
|
| 420 |
|
| 421 |
+
# Exposed for downstream consumers (e.g. the screen reads
|
| 422 |
+
# blendshapes.jawOpen to compute mouth_open).
|
| 423 |
attrs["blendshapes"] = blendshapes
|
| 424 |
return attrs
|
| 425 |
|
|
|
|
| 429 |
|
| 430 |
@staticmethod
|
| 431 |
def _dist(a: dict, b: dict) -> float:
|
| 432 |
+
"""Euclidean distance between two landmarks in 3D space."""
|
| 433 |
return math.sqrt(
|
| 434 |
(a["x"] - b["x"]) ** 2
|
| 435 |
+ (a["y"] - b["y"]) ** 2
|
|
|
|
| 438 |
|
| 439 |
@staticmethod
|
| 440 |
def _jaw_angle(lm: list[dict]) -> float:
|
| 441 |
+
"""Angle (degrees) subtended at the chin by the two gonion points.
|
| 442 |
+
|
| 443 |
+
Operates in 2D image space — z is intentionally ignored so the
|
| 444 |
+
angle reflects what the camera sees, not the underlying anatomy.
|
| 445 |
+
"""
|
| 446 |
chin = lm[152]
|
| 447 |
left_jaw, right_jaw = lm[172], lm[397]
|
| 448 |
v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
|
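The `_jaw_angle` hunk above is cut off after the first vector. A minimal sketch of the remaining steps follows, assuming the usual dot-product formula for the angle between two 2D vectors; the function name `jaw_angle`, its three-argument signature, and the `0.0` return for degenerate input are choices made for this illustration and may differ from the committed code.

```python
import math


def jaw_angle(chin, left_jaw, right_jaw):
    """Angle in degrees at `chin` between the rays to the two jaw corners.

    Each argument is a dict with "x" and "y" keys in normalised image
    coordinates; z is ignored, matching the 2D intent of the docstring.
    """
    v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
    v2 = (right_jaw["x"] - chin["x"], right_jaw["y"] - chin["y"])

    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        # Degenerate case: a jaw corner coincides with the chin.
        return 0.0

    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))
```

With the thresholds shown in the diff, an angle below 110 degrees falls into the sharpest jawline bucket, and wider angles move toward "soft".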
analyzers/obstruction_analyzer.py
ADDED
|
@@ -0,0 +1,108 @@
+"""
+ObstructionAnalyzer — face obstruction classifier.
+
+Model
+-----
+- Architecture : Vision Transformer (ViT-B/16)
+- HF repo      : dima806/face_obstruction_image_detection
+- License      : Apache 2.0
+- Classes (6)  : sunglasses, glasses, mask, hand, other, none
+- Reported acc : ~91% overall.
+                 99.7% / 99.85% precision/recall on sunglasses
+                 99.0% / 99.7% precision/recall on glasses
+                 99.7% / 99.85% precision/recall on mask
+                 Hand and "other" are much weaker (~71-75%); we don't
+                 surface those as booleans.
+
+Inputs
+------
+img_rgb : np.ndarray (H, W, 3) uint8
+
+Outputs (dict)
+--------------
+obstruction_top        — argmax label
+obstruction_confidence — argmax softmax score
+obstruction_scores     — full {class: score} dict
+wearing_glasses        — bool (true when glasses OR sunglasses > 0.5)
+wearing_sunglasses     — bool
+wearing_mask           — bool
+
+Notes
+-----
+Same author as the FairFace age/gender models already in
+DemographicAnalyzer. Built specifically for the glasses/sunglasses/mask
+case, which is why precision/recall on those three classes is so high.
+"""
+
+from typing import Any
+
+from PIL import Image
+from transformers import pipeline
+
+
+MODEL_ID = "dima806/face_obstruction_image_detection"
+
+# Canonical labels in lowercase. The pipeline may return any casing —
+# we normalise on the way out so downstream code keys consistently.
+_KNOWN = {"sunglasses", "glasses", "mask", "hand", "other", "none"}
+
+
+class ObstructionAnalyzer:
+    def __init__(self):
+        self.classifier = None
+        try:
+            # HF image-classification pipeline. Weights lazy-load from
+            # the Hub on first instantiation and cache locally.
+            self.classifier = pipeline("image-classification", model=MODEL_ID)
+        except Exception as exc:
+            print(f"[ObstructionAnalyzer] Failed to load {MODEL_ID}: {exc}")
+
+    def analyze(self, img_rgb) -> dict[str, Any]:
+        # Empty stub when the model failed to load — keeps the result
+        # dict shape stable so the merge in app.py never sees missing keys.
+        if self.classifier is None:
+            return self._empty_result()
+
+        try:
+            pil = Image.fromarray(img_rgb)
+            # top_k=len(_KNOWN) → full softmax across all six classes.
+            preds = self.classifier(pil, top_k=len(_KNOWN))
+        except Exception as exc:
+            print(f"[ObstructionAnalyzer] Prediction failed: {exc}")
+            return self._empty_result()
+
+        # Flatten predictions into a {label: score} dict, normalising
+        # label casing as we go. Unseen labels stay at 0.
+        scores = {label: 0.0 for label in _KNOWN}
+        for pred in preds:
+            label = str(pred["label"]).strip().lower()
+            if label in scores:
+                scores[label] = round(float(pred["score"]), 3)
+
+        # Top class wins.
+        top_label = max(scores, key=scores.get)
+        top_score = scores[top_label]
+
+        return {
+            "obstruction_top": top_label,
+            "obstruction_confidence": top_score,
+            "obstruction_scores": scores,
+            # Specific boolean flags the UI consumes directly.
+            # `wearing_glasses` is True for any kind of eyewear — the
+            # caller can branch on `wearing_sunglasses` if it cares
+            # about tinted vs clear lenses.
+            "wearing_glasses": scores["glasses"] > 0.5 or scores["sunglasses"] > 0.5,
+            "wearing_sunglasses": scores["sunglasses"] > 0.5,
+            "wearing_mask": scores["mask"] > 0.5,
+        }
+
+    @staticmethod
+    def _empty_result() -> dict[str, Any]:
+        return {
+            "obstruction_top": "unknown",
+            "obstruction_confidence": 0.0,
+            "obstruction_scores": {label: 0.0 for label in _KNOWN},
+            "wearing_glasses": False,
+            "wearing_sunglasses": False,
+            "wearing_mask": False,
+        }
|
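The score-flattening and flag logic in `ObstructionAnalyzer.analyze` can be tried without downloading the model. The sketch below is a minimal stand-alone version, assuming the pipeline returns a list of `{"label", "score"}` dicts; `flags_from_predictions`, `KNOWN`, and the sample scores are hypothetical names and values used only for illustration.

```python
# Hypothetical stand-alone version of the flag logic; no model is loaded.
# A tuple (rather than a set) keeps iteration order fixed, so ties in
# max() resolve the same way on every run.
KNOWN = ("sunglasses", "glasses", "mask", "hand", "other", "none")


def flags_from_predictions(preds, threshold=0.5):
    """Flatten [{'label', 'score'}, ...] into scores plus boolean flags."""
    scores = {label: 0.0 for label in KNOWN}
    for pred in preds:
        # Normalise casing/whitespace so "Sunglasses" and "sunglasses" match.
        label = str(pred["label"]).strip().lower()
        if label in scores:
            scores[label] = round(float(pred["score"]), 3)

    top_label = max(scores, key=scores.get)
    return {
        "obstruction_top": top_label,
        "obstruction_confidence": scores[top_label],
        "obstruction_scores": scores,
        # Any eyewear counts as glasses; sunglasses also get their own flag.
        "wearing_glasses": scores["glasses"] > threshold
        or scores["sunglasses"] > threshold,
        "wearing_sunglasses": scores["sunglasses"] > threshold,
        "wearing_mask": scores["mask"] > threshold,
    }


# Made-up scores standing in for a real pipeline response.
sample = [
    {"label": "Sunglasses", "score": 0.91},
    {"label": "glasses", "score": 0.06},
    {"label": "none", "score": 0.03},
]
result = flags_from_predictions(sample)
```

With these sample scores both `wearing_glasses` and `wearing_sunglasses` come out true, which matches the "any eyewear" reading in the analyzer's comments. The tuple is a design choice of this sketch, not of the commit: it only matters when two classes tie for the top score.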
analyzers/parsing_analyzer.py
CHANGED
|
@@ -1,27 +1,42 @@
|
|
| 1 |
"""
|
| 2 |
-
SegFormer-B5 human parsing
|
| 3 |
-
|
| 4 |
-
Model
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
"""
|
| 22 |
|
| 23 |
from typing import Any
|
| 24 |
-
import warnings
|
| 25 |
|
| 26 |
import cv2
|
| 27 |
import numpy as np
|
|
@@ -34,7 +49,8 @@ from transformers import (
|
|
| 34 |
|
| 35 |
MODEL_ID = "matei-dorian/segformer-b5-finetuned-human-parsing"
|
| 36 |
|
| 37 |
-
#
|
|
|
|
| 38 |
PARSING_LABELS = {
|
| 39 |
0: "background",
|
| 40 |
1: "hat",
|
|
@@ -59,10 +75,14 @@ PARSING_LABELS = {
|
|
| 59 |
|
| 60 |
class ParsingAnalyzer:
|
| 61 |
def __init__(self):
|
|
|
|
|
|
|
| 62 |
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 63 |
self.processor = None
|
| 64 |
self.model = None
|
| 65 |
try:
|
|
|
|
|
|
|
| 66 |
self.processor = SegformerImageProcessor.from_pretrained(MODEL_ID)
|
| 67 |
self.model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)
|
| 68 |
self.model.to(self.device).eval()
|
|
@@ -72,24 +92,36 @@ class ParsingAnalyzer:
|
|
| 72 |
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 73 |
h, w = img_rgb.shape[:2]
|
| 74 |
|
|
|
|
|
|
|
|
|
|
| 75 |
if self.model is None or self.processor is None:
|
| 76 |
return self._empty_result(h, w)
|
| 77 |
|
|
|
|
| 78 |
pil = Image.fromarray(img_rgb)
|
| 79 |
inputs = self.processor(images=pil, return_tensors="pt").to(self.device)
|
| 80 |
|
|
|
|
| 81 |
with torch.no_grad():
|
| 82 |
logits = self.model(**inputs).logits # (1, C, H/4, W/4)
|
| 83 |
|
|
|
|
|
|
|
| 84 |
upsampled = torch.nn.functional.interpolate(
|
| 85 |
logits, size=(h, w), mode="bilinear", align_corners=False
|
| 86 |
)
|
| 87 |
parsing = upsampled.argmax(dim=1)[0].cpu().numpy().astype(np.uint8)
|
| 88 |
|
|
|
|
|
|
|
| 89 |
masks: dict[str, np.ndarray] = {
|
| 90 |
name: (parsing == label_id) for label_id, name in PARSING_LABELS.items()
|
| 91 |
}
|
| 92 |
|
|
|
|
|
|
|
|
|
|
| 93 |
total_pixels = h * w
|
| 94 |
region_coverage = {
|
| 95 |
name: round(float(mask.sum()) / total_pixels, 4)
|
|
@@ -99,16 +131,16 @@ class ParsingAnalyzer:
|
|
| 99 |
|
| 100 |
result: dict[str, Any] = {"region_coverage": region_coverage}
|
| 101 |
|
|
|
|
|
|
|
| 102 |
skin_mask = masks.get("face", np.zeros((h, w), dtype=bool))
|
| 103 |
hair_mask = masks.get("hair", np.zeros((h, w), dtype=bool))
|
| 104 |
-
# No dedicated lip class; color_analyzer falls back to landmarks for lips.
|
| 105 |
-
lip_mask = np.zeros((h, w), dtype=bool)
|
| 106 |
-
|
| 107 |
result["_skin_mask"] = skin_mask
|
| 108 |
result["_hair_mask"] = hair_mask
|
| 109 |
-
result["_lip_mask"] = lip_mask
|
| 110 |
|
| 111 |
# ── Hair length estimation ───────────────────────────────────
|
|
|
|
|
|
|
| 112 |
hair_pixels = int(hair_mask.sum())
|
| 113 |
face_pixels = int(skin_mask.sum()) + hair_pixels
|
| 114 |
hair_ratio = hair_pixels / face_pixels if face_pixels else 0
|
|
@@ -124,18 +156,22 @@ class ParsingAnalyzer:
|
|
| 124 |
|
| 125 |
result["hair_present"] = hair_ratio > 0.03
|
| 126 |
|
| 127 |
-
# ──
|
| 128 |
-
|
|
|
|
| 129 |
result["hat_detected"] = region_coverage.get("hat", 0) > 0.01
|
| 130 |
-
result["earring_detected"] = False # no earring class in this model
|
| 131 |
-
result["necklace_detected"] = False # no necklace class in this model
|
| 132 |
|
| 133 |
-
# ── Skin
|
|
|
|
|
|
|
| 134 |
if skin_mask.sum() > 100:
|
|
|
|
|
|
|
|
|
|
| 135 |
skin_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
|
| 136 |
laplacian = cv2.Laplacian(skin_gray, cv2.CV_64F)
|
| 137 |
skin_edges = np.abs(laplacian)
|
| 138 |
-
skin_edges[~skin_mask] = 0
|
| 139 |
edge_density = skin_edges.sum() / skin_mask.sum() if skin_mask.sum() else 0
|
| 140 |
|
| 141 |
if edge_density > 15:
|
|
@@ -149,6 +185,10 @@ class ParsingAnalyzer:
|
|
| 149 |
|
| 150 |
result["skin_texture_score"] = round(float(edge_density), 2)
|
| 151 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 152 |
skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
|
| 153 |
l_channel = skin_lab[:, :, 0].astype(float)
|
| 154 |
l_channel[~skin_mask] = np.nan
|
|
@@ -162,6 +202,8 @@ class ParsingAnalyzer:
|
|
| 162 |
else "none"
|
| 163 |
)
|
| 164 |
|
|
|
|
|
|
|
| 165 |
skin_l_values = l_channel[skin_mask]
|
| 166 |
result["skin_uniformity"] = round(float(np.nanstd(skin_l_values)), 2)
|
| 167 |
else:
|
|
@@ -174,18 +216,19 @@ class ParsingAnalyzer:
|
|
| 174 |
|
| 175 |
@staticmethod
|
| 176 |
def _empty_result(h: int, w: int) -> dict[str, Any]:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 177 |
empty = np.zeros((h, w), dtype=bool)
|
| 178 |
return {
|
| 179 |
"region_coverage": {},
|
| 180 |
"_skin_mask": empty,
|
| 181 |
"_hair_mask": empty,
|
| 182 |
-
"_lip_mask": empty,
|
| 183 |
"hair_length": "unknown",
|
| 184 |
"hair_present": False,
|
| 185 |
-
"glasses_detected": False,
|
| 186 |
"hat_detected": False,
|
| 187 |
-
"earring_detected": False,
|
| 188 |
-
"necklace_detected": False,
|
| 189 |
"wrinkle_level": "unknown",
|
| 190 |
"skin_texture_score": 0,
|
| 191 |
"freckles_or_moles": "unknown",
|
|
|
|
| 1 |
"""
|
| 2 |
+
ParsingAnalyzer — SegFormer-B5 human parsing for masks and skin stats.
|
| 3 |
+
|
| 4 |
+
Model
|
| 5 |
+
-----
|
| 6 |
+
- Architecture : SegFormer-B5 (nvidia/mit-b5 backbone)
|
| 7 |
+
- HF repo : matei-dorian/segformer-b5-finetuned-human-parsing
|
| 8 |
+
- License : Apache 2.0
|
| 9 |
+
- Eval metrics : mean IoU 0.626, overall acc 0.826
|
| 10 |
+
face acc 0.909 / IoU 0.829
|
| 11 |
+
hair acc 0.897 / IoU 0.817
|
| 12 |
+
- Classes (18) : background, hat, hair, sunglasses, upper_clothes, skirt,
|
| 13 |
+
pants, dress, belt, left_shoe, right_shoe, face,
|
| 14 |
+
left_leg, right_leg, left_arm, right_arm, bag, scarf
|
| 15 |
+
|
| 16 |
+
Inputs
|
| 17 |
+
------
|
| 18 |
+
img_rgb : np.ndarray (H, W, 3) uint8
|
| 19 |
+
|
| 20 |
+
Outputs (dict)
|
| 21 |
+
--------------
|
| 22 |
+
Internal masks (stripped from JSON):
|
| 23 |
+
_skin_mask, _hair_mask
|
| 24 |
+
Public fields:
|
| 25 |
+
region_coverage — per-class fraction of pixels
|
| 26 |
+
hair_length — bald/very short | short | medium | long
|
| 27 |
+
hair_present — bool
|
| 28 |
+
hat_detected — bool, true when >1% of pixels are class "hat"
|
| 29 |
+
wrinkle_level — smooth | slight | moderate | prominent
|
| 30 |
+
skin_texture_score, skin_uniformity, freckles_or_moles
|
| 31 |
+
|
| 32 |
+
Notes
|
| 33 |
+
-----
|
| 34 |
+
The wrinkle / texture / freckle fields are OpenCV statistics computed
|
| 35 |
+
over the SegFormer face mask, not direct model outputs. SegFormer
|
| 36 |
+
contributes the mask; OpenCV does the per-pixel math.
|
| 37 |
"""
|
| 38 |
|
| 39 |
from typing import Any
|
|
|
|
| 40 |
|
| 41 |
import cv2
|
| 42 |
import numpy as np
|
|
|
|
| 49 |
|
| 50 |
MODEL_ID = "matei-dorian/segformer-b5-finetuned-human-parsing"
|
| 51 |
|
| 52 |
+
# Class id → name as published by the model card. We index masks by
|
| 53 |
+
# these names downstream rather than raw integer ids.
|
| 54 |
PARSING_LABELS = {
|
| 55 |
0: "background",
|
| 56 |
1: "hat",
|
|
|
|
| 75 |
|
| 76 |
class ParsingAnalyzer:
|
| 77 |
def __init__(self):
|
| 78 |
+
# CUDA when available, CPU otherwise. The HF Spaces free tier is
|
| 79 |
+
# CPU-only, so SegFormer-B5 inference takes ~1-2 s per request.
|
| 80 |
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 81 |
self.processor = None
|
| 82 |
self.model = None
|
| 83 |
try:
|
| 84 |
+
# Both processor and model weights come from the same repo;
|
| 85 |
+
# processor handles resize/normalize/tensorize.
|
| 86 |
self.processor = SegformerImageProcessor.from_pretrained(MODEL_ID)
|
| 87 |
self.model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)
|
| 88 |
self.model.to(self.device).eval()
|
|
|
|
| 92 |
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 93 |
h, w = img_rgb.shape[:2]
|
| 94 |
|
| 95 |
+
# If the model failed to load we return empty masks so the rest
|
| 96 |
+
# of the pipeline (especially ColorAnalyzer) sees a consistent
|
| 97 |
+
# shape and degrades cleanly to "unknown" fields.
|
| 98 |
if self.model is None or self.processor is None:
|
| 99 |
return self._empty_result(h, w)
|
| 100 |
|
| 101 |
+
# SegFormer expects PIL; processor will resize internally.
|
| 102 |
pil = Image.fromarray(img_rgb)
|
| 103 |
inputs = self.processor(images=pil, return_tensors="pt").to(self.device)
|
| 104 |
|
| 105 |
+
# Forward pass → logits at H/4 × W/4 resolution.
|
| 106 |
with torch.no_grad():
|
| 107 |
logits = self.model(**inputs).logits # (1, C, H/4, W/4)
|
| 108 |
|
| 109 |
+
# Upsample to original resolution, then argmax to get the
|
| 110 |
+
# class id per pixel.
|
| 111 |
upsampled = torch.nn.functional.interpolate(
|
| 112 |
logits, size=(h, w), mode="bilinear", align_corners=False
|
| 113 |
)
|
| 114 |
parsing = upsampled.argmax(dim=1)[0].cpu().numpy().astype(np.uint8)
|
| 115 |
|
| 116 |
+
# Build a boolean mask per class. Cheap because we already have
|
| 117 |
+
# the argmax map; each is one numpy equality check.
|
| 118 |
masks: dict[str, np.ndarray] = {
|
| 119 |
name: (parsing == label_id) for label_id, name in PARSING_LABELS.items()
|
| 120 |
}
|
| 121 |
|
| 122 |
+
# region_coverage = fraction of image occupied by each class.
|
| 123 |
+
# Useful as a coarse "is this class even present" signal — e.g.
|
| 124 |
+
# hat detection just checks if hat coverage exceeds a threshold.
|
| 125 |
total_pixels = h * w
|
| 126 |
region_coverage = {
|
| 127 |
name: round(float(mask.sum()) / total_pixels, 4)
|
|
|
|
| 131 |
|
| 132 |
result: dict[str, Any] = {"region_coverage": region_coverage}
|
| 133 |
|
| 134 |
+
# Skin & hair masks are passed downstream to ColorAnalyzer.
|
| 135 |
+
# Leading underscore → stripped from the final JSON payload.
|
| 136 |
skin_mask = masks.get("face", np.zeros((h, w), dtype=bool))
|
| 137 |
hair_mask = masks.get("hair", np.zeros((h, w), dtype=bool))
|
|
|
|
|
|
|
|
|
|
| 138 |
result["_skin_mask"] = skin_mask
|
| 139 |
result["_hair_mask"] = hair_mask
|
|
|
|
| 140 |
|
| 141 |
# ── Hair length estimation ───────────────────────────────────
|
| 142 |
+
# Ratio of hair pixels to (face + hair) pixels — bigger ratio
|
| 143 |
+
# means longer hair extending past the face.
|
| 144 |
hair_pixels = int(hair_mask.sum())
|
| 145 |
face_pixels = int(skin_mask.sum()) + hair_pixels
|
| 146 |
hair_ratio = hair_pixels / face_pixels if face_pixels else 0
|
|
|
|
| 156 |
|
| 157 |
result["hair_present"] = hair_ratio > 0.03
|
| 158 |
|
| 159 |
+
# ── Hat detection ────────────────────────────────────────────
|
| 160 |
+
# A real hat consistently covers >1% of pixels; below that we're
|
| 161 |
+
# in noise / mis-segmentation territory.
|
| 162 |
result["hat_detected"] = region_coverage.get("hat", 0) > 0.01
|
|
|
|
|
|
|
| 163 |
|
| 164 |
+
# ── Skin texture / wrinkles / freckles ───────────────────────
|
| 165 |
+
# Only worth computing if the face mask actually has substance.
|
| 166 |
+
# Under ~100 pixels we don't have enough signal.
|
| 167 |
if skin_mask.sum() > 100:
|
| 168 |
+
# Wrinkles → high-frequency edge energy on the face mask.
|
| 169 |
+
# Laplacian responds to local intensity curvature; the mean
|
| 170 |
+
# absolute response over the masked region gives a "how much fine detail" score.
|
| 171 |
skin_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
|
| 172 |
laplacian = cv2.Laplacian(skin_gray, cv2.CV_64F)
|
| 173 |
skin_edges = np.abs(laplacian)
|
| 174 |
+
skin_edges[~skin_mask] = 0 # zero out non-face pixels
|
| 175 |
edge_density = skin_edges.sum() / skin_mask.sum() if skin_mask.sum() else 0
|
| 176 |
|
| 177 |
if edge_density > 15:
|
|
|
|
| 185 |
|
| 186 |
result["skin_texture_score"] = round(float(edge_density), 2)
|
| 187 |
|
| 188 |
+
# Freckles/moles → count pixels well below mean L* lightness.
|
| 189 |
+
# Working in LAB rather than RGB makes the threshold tone-
|
| 190 |
+
# independent (a freckle is "darker than surrounding skin"
|
| 191 |
+
# regardless of base skin tone).
|
| 192 |
skin_lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB)
|
| 193 |
l_channel = skin_lab[:, :, 0].astype(float)
|
| 194 |
l_channel[~skin_mask] = np.nan
|
|
|
|
| 202 |
else "none"
|
| 203 |
)
|
| 204 |
|
| 205 |
+
# Uniformity = std-dev of L* over the face. Higher = more
|
| 206 |
+
# variation (uneven skin tone, shadows, scarring).
|
| 207 |
skin_l_values = l_channel[skin_mask]
|
| 208 |
result["skin_uniformity"] = round(float(np.nanstd(skin_l_values)), 2)
|
| 209 |
else:
|
|
|
|
| 216 |
|
| 217 |
@staticmethod
|
| 218 |
def _empty_result(h: int, w: int) -> dict[str, Any]:
|
| 219 |
+
"""Stub returned when the SegFormer model fails to load.
|
| 220 |
+
|
| 221 |
+
Shape must match the success path so downstream code can rely
|
| 222 |
+
on key presence without conditional checks.
|
| 223 |
+
"""
|
| 224 |
empty = np.zeros((h, w), dtype=bool)
|
| 225 |
return {
|
| 226 |
"region_coverage": {},
|
| 227 |
"_skin_mask": empty,
|
| 228 |
"_hair_mask": empty,
|
|
|
|
| 229 |
"hair_length": "unknown",
|
| 230 |
"hair_present": False,
|
|
|
|
| 231 |
"hat_detected": False,
|
|
|
|
|
|
|
| 232 |
"wrinkle_level": "unknown",
|
| 233 |
"skin_texture_score": 0,
|
| 234 |
"freckles_or_moles": "unknown",
|
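The coverage, hair-ratio, and hat checks in `ParsingAnalyzer.analyze` reduce to pixel counting over the argmax label map. The sketch below shows that arithmetic on a tiny hand-written map using only the standard library. The names `LABELS` and `coverage_stats` are hypothetical, and the ids for hair (2) and face (11) are an assumption taken from the order of the class list in the module docstring; only ids 0 and 1 are visible in the diff.

```python
from collections import Counter

# Ids 0 and 1 appear in the diff; 2 (hair) and 11 (face) are assumed
# from the order of the class list in the docstring.
LABELS = {0: "background", 1: "hat", 2: "hair", 11: "face"}


def coverage_stats(label_map):
    """Compute per-class coverage and hair/hat flags from a 2D label map."""
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())

    # Fraction of the image occupied by each class.
    coverage = {
        name: round(counts.get(cid, 0) / total, 4) if total else 0.0
        for cid, name in LABELS.items()
    }

    # Hair ratio = hair pixels / (face + hair) pixels.
    hair = counts.get(2, 0)
    face_plus_hair = counts.get(11, 0) + hair
    hair_ratio = hair / face_plus_hair if face_plus_hair else 0

    return {
        "region_coverage": coverage,
        "hair_ratio": hair_ratio,
        "hair_present": hair_ratio > 0.03,   # same threshold as the analyzer
        "hat_detected": coverage["hat"] > 0.01,
    }


# 4 x 5 toy label map: 10 background, 1 hat, 6 hair, 3 face pixels.
toy_map = [
    [0, 1, 2, 2, 0],
    [0, 2, 11, 2, 0],
    [0, 2, 11, 2, 0],
    [0, 0, 11, 0, 0],
]
stats = coverage_stats(toy_map)
```

On this toy map hair covers 6 of 20 pixels and the hair ratio is 6 / 9, so both `hair_present` and `hat_detected` are true. The real analyzer does the same counting with boolean numpy masks, which is much faster on full-size images.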
app.py
CHANGED
|
@@ -1,25 +1,63 @@
|
|
| 1 |
"""
|
| 2 |
-
Face Analysis Microservice
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
"""
|
| 13 |
|
| 14 |
import os
|
|
|
|
|
|
|
|
|
|
| 15 |
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
|
| 16 |
-
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"
|
| 17 |
|
| 18 |
import io
|
| 19 |
import logging
|
| 20 |
from typing import Optional
|
| 21 |
|
| 22 |
-
import cv2
|
| 23 |
import numpy as np
|
| 24 |
from fastapi import FastAPI, File, HTTPException, UploadFile
|
| 25 |
from fastapi.middleware.cors import CORSMiddleware
|
|
@@ -27,10 +65,11 @@ from PIL import Image
|
|
| 27 |
|
| 28 |
from analyzers.landmark_analyzer import LandmarkAnalyzer
|
| 29 |
from analyzers.demographic_analyzer import DemographicAnalyzer
|
| 30 |
-
from analyzers.attribute_analyzer import AttributeAnalyzer
|
| 31 |
from analyzers.parsing_analyzer import ParsingAnalyzer
|
| 32 |
from analyzers.emotion_analyzer import EmotionAnalyzer
|
| 33 |
from analyzers.color_analyzer import ColorAnalyzer
|
|
|
|
|
|
|
| 34 |
|
| 35 |
logging.basicConfig(level=logging.INFO)
|
| 36 |
logger = logging.getLogger(__name__)
|
|
@@ -48,15 +87,22 @@ app.add_middleware(
|
|
| 48 |
# Analyzers are initialized lazily on first request to reduce cold-start time
|
| 49 |
landmark_analyzer: Optional[LandmarkAnalyzer] = None
|
| 50 |
demographic_analyzer: Optional[DemographicAnalyzer] = None
|
| 51 |
-
attribute_analyzer: Optional[AttributeAnalyzer] = None
|
| 52 |
parsing_analyzer: Optional[ParsingAnalyzer] = None
|
| 53 |
emotion_analyzer: Optional[EmotionAnalyzer] = None
|
| 54 |
color_analyzer: Optional[ColorAnalyzer] = None
|
|
|
|
|
|
|
| 55 |
|
| 56 |
|
| 57 |
def _to_json_safe(value):
|
| 58 |
-
"""
|
| 59 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 60 |
if isinstance(value, (np.ndarray,)):
|
| 61 |
return value.tolist()
|
| 62 |
if isinstance(value, (np.integer, np.floating)):
|
|
@@ -65,7 +111,7 @@ def _to_json_safe(value):
|
|
| 65 |
return bool(value)
|
| 66 |
if isinstance(value, np.generic):
|
| 67 |
return value.item()
|
| 68 |
-
#
|
| 69 |
if isinstance(value, dict):
|
| 70 |
return {str(k): _to_json_safe(v) for k, v in value.items()}
|
| 71 |
if isinstance(value, (list, tuple, set)):
|
|
@@ -74,9 +120,15 @@ def _to_json_safe(value):
|
|
| 74 |
|
| 75 |
|
| 76 |
def get_analyzers():
|
| 77 |
-
"""Lazy-load all analyzer models on first use.
|
| 78 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 79 |
global parsing_analyzer, emotion_analyzer, color_analyzer
|
|
|
|
| 80 |
|
| 81 |
if landmark_analyzer is None:
|
| 82 |
logger.info("Loading MediaPipe Face Landmarker...")
|
|
@@ -86,12 +138,8 @@ def get_analyzers():
|
|
| 86 |
logger.info("Loading FairFace demographics model...")
|
| 87 |
demographic_analyzer = DemographicAnalyzer()
|
| 88 |
|
| 89 |
-
if attribute_analyzer is None:
|
| 90 |
-
logger.info("Loading CelebA attribute classifier...")
|
| 91 |
-
attribute_analyzer = AttributeAnalyzer()
|
| 92 |
-
|
| 93 |
if parsing_analyzer is None:
|
| 94 |
-
logger.info("Loading
|
| 95 |
parsing_analyzer = ParsingAnalyzer()
|
| 96 |
|
| 97 |
if emotion_analyzer is None:
|
|
@@ -101,88 +149,96 @@ def get_analyzers():
|
|
| 101 |
if color_analyzer is None:
|
| 102 |
color_analyzer = ColorAnalyzer()
|
| 103 |
|
| 104 |
return (
|
| 105 |
landmark_analyzer,
|
| 106 |
demographic_analyzer,
|
| 107 |
-
attribute_analyzer,
|
| 108 |
parsing_analyzer,
|
| 109 |
emotion_analyzer,
|
| 110 |
color_analyzer,
|
|
|
|
|
|
|
| 111 |
)
|
| 112 |
|
| 113 |
|
| 114 |
@app.get("/")
|
| 115 |
async def root():
|
| 116 |
-
"""
|
| 117 |
return {
|
| 118 |
"name": "HCP Face Analysis Service",
|
| 119 |
"version": "2.0.0",
|
| 120 |
"status": "running",
|
| 121 |
"endpoints": {
|
| 122 |
"health": "/health",
|
| 123 |
-
"analyze": "/analyze"
|
|
|
|
| 124 |
}
|
| 125 |
}
|
| 126 |
|
| 127 |
|
| 128 |
@app.get("/health")
|
| 129 |
async def health():
|
| 130 |
-
"""
|
| 131 |
return {"status": "ok"}
|
| 132 |
|
| 133 |
|
| 134 |
@app.post("/analyze")
|
| 135 |
async def analyze_face(file: UploadFile = File(...)):
|
| 136 |
-
"""
|
| 137 |
-
Comprehensive face analysis endpoint.
|
| 138 |
|
| 139 |
-
|
| 140 |
-
|
|
|
|
| 141 |
"""
|
| 142 |
try:
|
| 143 |
-
#
|
|
|
|
|
|
|
| 144 |
contents = await file.read()
|
| 145 |
image = Image.open(io.BytesIO(contents)).convert("RGB")
|
| 146 |
img_array = np.array(image)
|
| 147 |
-
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
|
| 148 |
|
| 149 |
(
|
| 150 |
landmarks,
|
| 151 |
demographics,
|
| 152 |
-
attributes,
|
| 153 |
parsing,
|
| 154 |
emotions,
|
| 155 |
colors,
|
|
|
|
|
|
|
| 156 |
) = get_analyzers()
|
| 157 |
|
| 158 |
results = {}
|
| 159 |
|
| 160 |
-
# Step 1: MediaPipe Landmarks → geometric features
|
| 161 |
logger.info("Running landmark analysis...")
|
| 162 |
landmark_results = landmarks.analyze(img_array)
|
| 163 |
results.update(landmark_results)
|
| 164 |
|
| 165 |
-
# Step 2: FairFace
|
| 166 |
logger.info("Running demographic analysis...")
|
| 167 |
demo_results = demographics.analyze(img_array)
|
| 168 |
results.update(demo_results)
|
| 169 |
|
| 170 |
-
# Step 3:
|
| 171 |
-
logger.info("Running attribute analysis...")
|
| 172 |
-
attr_results = attributes.analyze(img_array)
|
| 173 |
-
results.update(attr_results)
|
| 174 |
-
|
| 175 |
-
# Step 4: BiSeNet → pixel segmentation → hair length, wrinkles, spots
|
| 176 |
logger.info("Running face parsing...")
|
| 177 |
parse_results = parsing.analyze(img_array)
|
| 178 |
results.update(parse_results)
|
| 179 |
|
| 180 |
-
# Step
|
| 181 |
logger.info("Running emotion analysis...")
|
| 182 |
emo_results = emotions.analyze(img_array)
|
| 183 |
results.update(emo_results)
|
| 184 |
|
| 185 |
-
# Step
|
|
|
|
| 186 |
logger.info("Running color analysis...")
|
| 187 |
color_results = colors.analyze(
|
| 188 |
img_array,
|
|
@@ -192,6 +248,14 @@ async def analyze_face(file: UploadFile = File(...)):
|
|
| 192 |
)
|
| 193 |
results.update(color_results)
|
| 194 |
|
| 195 |
# Remove internal fields (prefixed with underscore)
|
| 196 |
results = {k: v for k, v in results.items() if not k.startswith("_")}
|
| 197 |
|
|
@@ -204,9 +268,11 @@ async def analyze_face(file: UploadFile = File(...)):
|
|
| 204 |
|
| 205 |
@app.post("/analyze-base64")
|
| 206 |
async def analyze_face_base64(body: dict):
|
| 207 |
-
"""
|
| 208 |
-
|
| 209 |
-
This
|
|
|
|
|
|
|
| 210 |
"""
|
| 211 |
import base64
|
| 212 |
|
|
@@ -222,28 +288,28 @@ async def analyze_face_base64(body: dict):
|
|
| 222 |
image_bytes = base64.b64decode(image_b64)
|
| 223 |
image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
|
| 224 |
img_array = np.array(image)
|
| 225 |
-
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
|
| 226 |
|
| 227 |
(
|
| 228 |
landmarks,
|
| 229 |
demographics,
|
| 230 |
-
attributes,
|
| 231 |
parsing,
|
| 232 |
emotions,
|
| 233 |
colors,
|
|
|
|
|
|
|
| 234 |
) = get_analyzers()
|
| 235 |
|
| 236 |
results = {}
|
| 237 |
|
|
|
|
|
|
|
|
|
|
| 238 |
landmark_results = landmarks.analyze(img_array)
|
| 239 |
results.update(landmark_results)
|
| 240 |
|
| 241 |
demo_results = demographics.analyze(img_array)
|
| 242 |
results.update(demo_results)
|
| 243 |
|
| 244 |
-
attr_results = attributes.analyze(img_array)
|
| 245 |
-
results.update(attr_results)
|
| 246 |
-
|
| 247 |
parse_results = parsing.analyze(img_array)
|
| 248 |
results.update(parse_results)
|
| 249 |
|
|
@@ -258,6 +324,11 @@ async def analyze_face_base64(body: dict):
|
|
| 258 |
)
|
| 259 |
results.update(color_results)
|
| 260 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 261 |
results = {k: v for k, v in results.items() if not k.startswith("_")}
|
| 262 |
|
| 263 |
return {"success": True, "data": _to_json_safe(results)}
|
|
|
|
| 1 |
"""
|
| 2 |
+
HCP Face Analysis Microservice
|
| 3 |
+
==============================
|
| 4 |
+
|
| 5 |
+
FastAPI service that runs seven specialized analyzers over a single photo
|
| 6 |
+
and merges their outputs into one ~100-field facial-attribute dictionary.
|
| 7 |
+
|
| 8 |
+
Pipeline (in execution order)
|
| 9 |
+
-----------------------------
|
| 10 |
+
1. MediaPipe Face Landmarker 478 3D landmarks + 52 ARKit blendshapes.
|
| 11 |
+
Produces all geometric face/eye/nose/lip/
|
| 12 |
+
jaw features plus smiling and mouth-open.
|
| 13 |
+
|
| 14 |
+
2. DemographicAnalyzer Three ViT classifiers (FairFace age,
|
| 15 |
+
FairFace gender, Ethnicity_Test_v003).
|
| 16 |
+
Age is reported as a softmax-weighted
|
| 17 |
+
continuous estimate, not a bucket midpoint.
|
| 18 |
+
|
| 19 |
+
3. ParsingAnalyzer SegFormer-B5 human parsing. Emits face
|
| 20 |
+
and hair pixel masks plus hair length,
|
| 21 |
+
hat detection, and skin texture/wrinkle/
|
| 22 |
+
freckle/uniformity stats computed via
|
| 23 |
+
OpenCV over the face mask.
|
| 24 |
+
|
| 25 |
+
4. EmotionAnalyzer HSEmotion EfficientNet-B0 8-class output
|
| 26 |
+
plus derived valence, arousal, mood.
|
| 27 |
+
|
| 28 |
+
5. ColorAnalyzer Pixel-level LAB/HSV statistics. Reads
|
| 29 |
+
masks from step 3 and lip/iris landmarks
|
| 30 |
+
from step 1. No ML model.
|
| 31 |
+
|
| 32 |
+
6. ObstructionAnalyzer dima806 ViT-B/16. Glasses, sunglasses,
|
| 33 |
+
mask flags with ~99% precision/recall.
|
| 34 |
+
|
| 35 |
+
7. HairTypeAnalyzer dima806 ViT-B/16. Curly/dreadlocks/kinky/
|
| 36 |
+
straight/wavy at ~93% accuracy.
|
| 37 |
+
|
| 38 |
+
Endpoints
|
| 39 |
+
---------
|
| 40 |
+
GET / service banner
|
| 41 |
+
GET /health liveness check
|
| 42 |
+
POST /analyze multipart file upload
|
| 43 |
+
POST /analyze-base64 JSON {"image": "<base64>"}
|
| 44 |
+
|
| 45 |
+
Both POST endpoints run the same pipeline. All analyzers are lazily
|
| 46 |
+
instantiated on first request to keep cold-start latency manageable
|
| 47 |
+
on the Hugging Face Spaces free tier.
|
| 48 |
"""
|
| 49 |
|
| 50 |
import os
|
| 51 |
+
# hf_transfer gives much faster model downloads from the HF Hub on first
|
| 52 |
+
# inference. HF_HUB_DOWNLOAD_TIMEOUT defaults to 10s which is too short
|
| 53 |
+
# for the larger ViT checkpoints on a cold start.
|
| 54 |
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
|
| 55 |
+
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"
|
| 56 |
|
| 57 |
import io
|
| 58 |
import logging
|
| 59 |
from typing import Optional
|
| 60 |
|
|
|
|
| 61 |
import numpy as np
|
| 62 |
from fastapi import FastAPI, File, HTTPException, UploadFile
|
| 63 |
from fastapi.middleware.cors import CORSMiddleware
|
|
|
|
| 65 |
|
| 66 |
from analyzers.landmark_analyzer import LandmarkAnalyzer
|
| 67 |
from analyzers.demographic_analyzer import DemographicAnalyzer
|
|
|
|
| 68 |
from analyzers.parsing_analyzer import ParsingAnalyzer
|
| 69 |
from analyzers.emotion_analyzer import EmotionAnalyzer
|
| 70 |
from analyzers.color_analyzer import ColorAnalyzer
|
| 71 |
+
from analyzers.obstruction_analyzer import ObstructionAnalyzer
|
| 72 |
+
from analyzers.hair_type_analyzer import HairTypeAnalyzer
|
| 73 |
|
| 74 |
logging.basicConfig(level=logging.INFO)
|
| 75 |
logger = logging.getLogger(__name__)
|
|
|
|
| 87 |
# Analyzers are initialized lazily on first request to reduce cold-start time
|
| 88 |
landmark_analyzer: Optional[LandmarkAnalyzer] = None
|
| 89 |
demographic_analyzer: Optional[DemographicAnalyzer] = None
|
|
|
|
| 90 |
parsing_analyzer: Optional[ParsingAnalyzer] = None
|
| 91 |
emotion_analyzer: Optional[EmotionAnalyzer] = None
|
| 92 |
color_analyzer: Optional[ColorAnalyzer] = None
|
| 93 |
+
obstruction_analyzer: Optional[ObstructionAnalyzer] = None
|
| 94 |
+
hair_type_analyzer: Optional[HairTypeAnalyzer] = None
|
| 95 |
|
| 96 |
|
| 97 |
def _to_json_safe(value):
|
| 98 |
+
"""Recursively coerce numpy scalars/arrays into JSON-serialisable types.
|
| 99 |
+
|
| 100 |
+
Several analyzers return numpy floats/booleans (e.g. from `np.std`
|
| 101 |
+
or boolean mask logic). FastAPI's default JSON encoder doesn't
|
| 102 |
+
handle those, so we normalise everything here before returning.
|
| 103 |
+
"""
|
| 104 |
+
# Numpy first: arrays and numpy scalars are not JSON-serialisable, so
|
| 105 |
+
# convert them to plain Python values before recursing into containers.
|
| 106 |
if isinstance(value, (np.ndarray,)):
|
| 107 |
return value.tolist()
|
| 108 |
if isinstance(value, (np.integer, np.floating)):
|
|
|
|
| 111 |
return bool(value)
|
| 112 |
if isinstance(value, np.generic):
|
| 113 |
return value.item()
|
| 114 |
+
# Recurse into nested containers.
|
| 115 |
if isinstance(value, dict):
|
| 116 |
return {str(k): _to_json_safe(v) for k, v in value.items()}
|
| 117 |
if isinstance(value, (list, tuple, set)):
|
|
|
|
| 120 |
|
| 121 |
|
| 122 |
def get_analyzers():
|
| 123 |
+
"""Lazy-load all analyzer models on first use.
|
| 124 |
+
|
| 125 |
+
Each analyzer is instantiated once per process and reused across
|
| 126 |
+
requests. First request pays the full model-load cost; subsequent
|
| 127 |
+
requests are warm.
|
| 128 |
+
"""
|
| 129 |
+
global landmark_analyzer, demographic_analyzer
|
| 130 |
global parsing_analyzer, emotion_analyzer, color_analyzer
|
| 131 |
+
global obstruction_analyzer, hair_type_analyzer
|
| 132 |
|
| 133 |
if landmark_analyzer is None:
|
| 134 |
logger.info("Loading MediaPipe Face Landmarker...")
|
|
|
|
| 138 |
logger.info("Loading FairFace demographics model...")
|
| 139 |
demographic_analyzer = DemographicAnalyzer()
|
| 140 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 141 |
if parsing_analyzer is None:
|
| 142 |
+
logger.info("Loading SegFormer face parser...")
|
| 143 |
parsing_analyzer = ParsingAnalyzer()
|
| 144 |
|
| 145 |
if emotion_analyzer is None:
|
|
|
|
| 149 |
if color_analyzer is None:
|
| 150 |
color_analyzer = ColorAnalyzer()
|
| 151 |
|
| 152 |
+
if obstruction_analyzer is None:
|
| 153 |
+
logger.info("Loading face obstruction classifier...")
|
| 154 |
+
obstruction_analyzer = ObstructionAnalyzer()
|
| 155 |
+
|
| 156 |
+
if hair_type_analyzer is None:
|
| 157 |
+
logger.info("Loading hair type classifier...")
|
| 158 |
+
hair_type_analyzer = HairTypeAnalyzer()
|
| 159 |
+
|
| 160 |
return (
|
| 161 |
landmark_analyzer,
|
| 162 |
demographic_analyzer,
|
|
|
|
| 163 |
parsing_analyzer,
|
| 164 |
emotion_analyzer,
|
| 165 |
color_analyzer,
|
| 166 |
+
obstruction_analyzer,
|
| 167 |
+
hair_type_analyzer,
|
| 168 |
)
|
| 169 |
|
| 170 |
|
| 171 |
@app.get("/")
|
| 172 |
async def root():
|
| 173 |
+
"""Service banner — confirms the server is reachable and which version."""
|
| 174 |
return {
|
| 175 |
"name": "HCP Face Analysis Service",
|
| 176 |
"version": "2.0.0",
|
| 177 |
"status": "running",
|
| 178 |
"endpoints": {
|
| 179 |
"health": "/health",
|
| 180 |
+
"analyze": "/analyze",
|
| 181 |
+
"analyze-base64": "/analyze-base64",
|
| 182 |
}
|
| 183 |
}
|
| 184 |
|
| 185 |
|
| 186 |
@app.get("/health")
|
| 187 |
async def health():
|
| 188 |
+
"""Liveness probe. Used by the Express server and HF Spaces uptime checks."""
|
| 189 |
return {"status": "ok"}
|
| 190 |
|
| 191 |
|
| 192 |
@app.post("/analyze")
|
| 193 |
async def analyze_face(file: UploadFile = File(...)):
|
| 194 |
+
"""Multipart endpoint for direct uploads.
|
|
|
|
| 195 |
|
| 196 |
+
Runs all seven analyzers and returns the merged attribute dict.
|
| 197 |
+
See `analyze_face_base64` for the JSON-body variant the Express
|
| 198 |
+
server calls.
|
| 199 |
"""
|
| 200 |
try:
|
| 201 |
+
# Decode the upload into an RGB numpy array. All analyzers
|
| 202 |
+
# work in RGB, so the BGR copy this endpoint used to build
|
| 203 |
+
# has been dropped.
|
| 204 |
contents = await file.read()
|
| 205 |
image = Image.open(io.BytesIO(contents)).convert("RGB")
|
| 206 |
img_array = np.array(image)
|
|
|
|
| 207 |
|
| 208 |
(
|
| 209 |
landmarks,
|
| 210 |
demographics,
|
|
|
|
| 211 |
parsing,
|
| 212 |
emotions,
|
| 213 |
colors,
|
| 214 |
+
obstructions,
|
| 215 |
+
hair_types,
|
| 216 |
) = get_analyzers()
|
| 217 |
|
| 218 |
results = {}
|
| 219 |
|
| 220 |
+
# Step 1: MediaPipe Landmarks → all geometric features + blendshapes.
|
| 221 |
logger.info("Running landmark analysis...")
|
| 222 |
landmark_results = landmarks.analyze(img_array)
|
| 223 |
results.update(landmark_results)
|
| 224 |
|
| 225 |
+
# Step 2: FairFace + Ethnicity ViT → demographics.
|
| 226 |
logger.info("Running demographic analysis...")
|
| 227 |
demo_results = demographics.analyze(img_array)
|
| 228 |
results.update(demo_results)
|
| 229 |
|
| 230 |
+
# Step 3: SegFormer-B5 human parsing → masks + hair length + skin stats.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 231 |
logger.info("Running face parsing...")
|
| 232 |
parse_results = parsing.analyze(img_array)
|
| 233 |
results.update(parse_results)
|
| 234 |
|
| 235 |
+
# Step 4: HSEmotion → 8-class emotion + valence/arousal/mood.
|
| 236 |
logger.info("Running emotion analysis...")
|
| 237 |
emo_results = emotions.analyze(img_array)
|
| 238 |
results.update(emo_results)
|
| 239 |
|
| 240 |
+
# Step 5: Pixel color analysis. Uses the face/hair masks from step 3
|
| 241 |
+
# and MediaPipe lip/iris landmarks from step 1.
|
| 242 |
logger.info("Running color analysis...")
|
| 243 |
color_results = colors.analyze(
|
| 244 |
img_array,
|
|
|
|
| 248 |
)
|
| 249 |
results.update(color_results)
|
| 250 |
|
| 251 |
+
# Step 6: ObstructionViT → glasses / sunglasses / mask flags.
|
| 252 |
+
logger.info("Running obstruction analysis...")
|
| 253 |
+
results.update(obstructions.analyze(img_array))
|
| 254 |
+
|
| 255 |
+
# Step 7: HairTypeViT → curly/dreadlocks/kinky/straight/wavy.
|
| 256 |
+
logger.info("Running hair-type analysis...")
|
| 257 |
+
results.update(hair_types.analyze(img_array))
|
| 258 |
+
|
| 259 |
# Remove internal fields (prefixed with underscore)
|
| 260 |
results = {k: v for k, v in results.items() if not k.startswith("_")}
|
| 261 |
|
|
|
|
| 268 |
|
| 269 |
@app.post("/analyze-base64")
|
| 270 |
async def analyze_face_base64(body: dict):
|
| 271 |
+
"""JSON-body endpoint accepting `{"image": "<base64>"}`.
|
| 272 |
+
|
| 273 |
+
This is what the Node/Express server forwards client requests to
|
| 274 |
+
so we don't have to push multipart payloads through the proxy.
|
| 275 |
+
The pipeline body is identical to `/analyze`.
|
| 276 |
"""
|
| 277 |
import base64
|
| 278 |
|
|
|
|
| 288 |
image_bytes = base64.b64decode(image_b64)
|
| 289 |
image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
|
| 290 |
img_array = np.array(image)
|
|
|
|
| 291 |
|
| 292 |
(
|
| 293 |
landmarks,
|
| 294 |
demographics,
|
|
|
|
| 295 |
parsing,
|
| 296 |
emotions,
|
| 297 |
colors,
|
| 298 |
+
obstructions,
|
| 299 |
+
hair_types,
|
| 300 |
) = get_analyzers()
|
| 301 |
|
| 302 |
results = {}
|
| 303 |
|
| 304 |
+
# Same seven-step pipeline as /analyze. Kept inline (rather
|
| 305 |
+
# than factored out) so the per-step `logger.info` cadence and
|
| 306 |
+
# ordering stay obvious when reading either endpoint top-down.
|
| 307 |
landmark_results = landmarks.analyze(img_array)
|
| 308 |
results.update(landmark_results)
|
| 309 |
|
| 310 |
demo_results = demographics.analyze(img_array)
|
| 311 |
results.update(demo_results)
|
| 312 |
|
| 313 |
parse_results = parsing.analyze(img_array)
|
| 314 |
results.update(parse_results)
|
| 315 |
|
|
|
|
| 324 |
)
|
| 325 |
results.update(color_results)
|
| 326 |
|
| 327 |
+
results.update(obstructions.analyze(img_array))
|
| 328 |
+
results.update(hair_types.analyze(img_array))
|
| 329 |
+
|
| 330 |
+
# Drop internal/scratch fields (leading underscore) before
|
| 331 |
+
# returning. Keeps masks and raw landmark lists out of the JSON.
|
| 332 |
results = {k: v for k, v in results.items() if not k.startswith("_")}
|
| 333 |
|
| 334 |
return {"success": True, "data": _to_json_safe(results)}
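Both endpoints above finish the same way: each analyzer's dict is merged into `results` with `update`, and any key starting with an underscore is dropped before the response is built. The following is a minimal sketch of that merge-and-strip step in isolation. The helper name `merge_and_strip` and the sample analyzer outputs are hypothetical and are not part of this commit.

```python
from typing import Any


def merge_and_strip(*analyzer_outputs: dict[str, Any]) -> dict[str, Any]:
    """Merge analyzer outputs in order, then drop underscore-prefixed keys."""
    merged: dict[str, Any] = {}
    for output in analyzer_outputs:
        # dict.update overwrites on key collision, so a later analyzer
        # wins if two analyzers emit the same key.
        merged.update(output)
    # Internal scratch fields (masks, raw landmark lists) never reach the client.
    return {k: v for k, v in merged.items() if not k.startswith("_")}


# Hypothetical analyzer outputs, for illustration only.
landmark_out = {"face_shape": "oval", "_raw_landmarks": [{"x": 0.5, "y": 0.5, "z": 0.0}]}
parsing_out = {"hair_length": "short", "_hair_mask": [[0, 1], [1, 1]]}

public = merge_and_strip(landmark_out, parsing_out)
print(public)
```

Because the merge is order-dependent, the order of the seven steps matters whenever two analyzers report the same attribute name.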
|
architecture.md
CHANGED
|
@@ -1,1707 +1,99 @@
|
|
| 1 |
-
# HCP Face Analysis — Architecture
|
| 101 |
-
#### 9. Skin Analysis (Wrinkles, Acne, etc.)
|
| 102 |
-
- **Model:** https://huggingface.co/imfarzanansari/skin-disease-detection (for acne/skin conditions)
|
| 103 |
-
- **For wrinkles:** Edge detection (Canny/Sobel) on forehead/eye regions from BiSeNet parsing — no model needed
|
| 104 |
-
- **For freckles/moles:** Blob detection on skin regions from BiSeNet parsing
|
| 105 |
-
|
| 106 |
-
---
|
| 107 |
-
|
| 108 |
-
## Complete Feature Coverage Map
|
| 109 |
-
|
| 110 |
-
| Feature | Model/Method | Confidence |
|
| 111 |
-
|---------|-------------|------------|
|
| 112 |
-
| **Face shape** (oval, round, square, heart, diamond, oblong, triangle) | MediaPipe landmarks geometric ratios + CelebA (`Oval_Face`) | ⭐⭐⭐⭐ |
|
| 113 |
-
| **Jawline** (sharp, soft, strong) | MediaPipe jaw landmark angles | ⭐⭐⭐⭐ |
|
| 114 |
-
| **Chin** (receding, pointed, cleft, wide) | MediaPipe chin landmarks + depth (z) | ⭐⭐⭐ |
|
| 115 |
-
| **Cheekbones** (high, flat, full, hollow) | MediaPipe landmark z-depth + CelebA (`High_Cheekbones`, `Chubby`) | ⭐⭐⭐⭐ |
|
| 116 |
-
| **Forehead** (broad, narrow) | MediaPipe forehead span ratio | ⭐⭐⭐⭐ |
|
| 117 |
-
| **Eye shape** (almond, round, hooded, monolid, upturned, downturned) | MediaPipe eyelid curvature + corner angles | ⭐⭐⭐⭐ |
|
| 118 |
-
| **Eye spacing** (wide-set, close-set) | MediaPipe interpupillary distance ratio | ⭐⭐⭐⭐⭐ |
|
| 119 |
-
| **Eye size** (large, small) | MediaPipe eye area / face area | ⭐⭐⭐⭐⭐ |
|
| 120 |
-
| **Deep-set / protruding eyes** | MediaPipe landmark z-depth at eye region | ⭐⭐⭐ |
|
| 121 |
-
| **Eye color** (brown, blue, green, hazel) | Iris crop → HSV color histogram + KNN | ⭐⭐⭐⭐ |
|
| 122 |
-
| **Dark under-eyes / eye bags** | BiSeNet skin parsing → brightness analysis under eyes | ⭐⭐⭐ |
|
| 123 |
-
| **Crow's feet** | Canny edge detection on BiSeNet-parsed outer eye skin | ⭐⭐⭐ |
|
| 124 |
-
| **Eyebrow shape** (arched, straight, bushy, thick, thin) | MediaPipe brow landmarks + CelebA (`Bushy_Eyebrows`, `Arched_Eyebrows`) | ⭐⭐⭐⭐ |
|
| 125 |
-
| **Unibrow** | MediaPipe inner brow distance + pixel analysis between brows | ⭐⭐⭐⭐ |
|
| 126 |
-
| **Nose shape** (straight, aquiline, button, upturned, wide, narrow) | MediaPipe nose landmarks + CelebA (`Big_Nose`, `Pointy_Nose`) | ⭐⭐⭐⭐ |
|
| 127 |
-
| **Nose bridge** (flat, high) | MediaPipe z-depth at nasal bridge | ⭐⭐⭐ |
|
| 128 |
-
| **Nostrils** (wide, narrow) | MediaPipe nostril landmark width ratio | ⭐⭐⭐⭐ |
|
| 129 |
-
| **Lips** (full, thin) | MediaPipe lip landmarks + CelebA (`Big_Lips`) | ⭐⭐⭐⭐ |
|
| 130 |
-
| **Mouth width** | MediaPipe mouth corner distance ratio | ⭐⭐⭐⭐⭐ |
|
| 131 |
-
| **Cupid's bow** | MediaPipe upper lip curvature analysis | ⭐⭐⭐ |
|
| 132 |
-
| **Teeth** (gap, crooked, straight, overbite, underbite) | Mouth crop when smiling → custom classifier or rule-based | ⭐⭐ |
|
| 133 |
-
| **Dimples** | MediaPipe blendshapes during smile + cheek region analysis | ⭐⭐⭐ |
|
| 134 |
-
| **Smile lines** | Edge detection on nasolabial region | ⭐⭐⭐ |
|
| 135 |
-
| **Asymmetrical smile** | MediaPipe left/right smile blendshape difference | ⭐⭐⭐⭐ |
|
| 136 |
-
| **Hair type** (straight, wavy, curly, coily) | BiSeNet hair segmentation → texture frequency (FFT) + CelebA (`Straight_Hair`, `Wavy_Hair`) | ⭐⭐⭐ |
|
| 137 |
-
| **Hair length** (short, long, bald) | BiSeNet hair mask area + CelebA (`Bald`, `Bangs`) | ⭐⭐⭐⭐ |
|
| 138 |
-
| **Hair color** (black, brown, blonde, red, gray, dyed) | BiSeNet hair mask → K-means color clustering + CelebA (`Black_Hair`, `Brown_Hair`, `Blond_Hair`, `Gray_Hair`) | ⭐⭐⭐⭐ |
|
| 139 |
-
| **Receding hairline / widow's peak** | BiSeNet hair boundary analysis + CelebA (`Receding_Hairline`) | ⭐⭐⭐ |
|
| 140 |
-
| **Beard/facial hair** (full, stubble, goatee, mustache, sideburns, clean-shaven) | BiSeNet parsing lower face + CelebA (`5_o_Clock_Shadow`, `Goatee`, `Mustache`, `No_Beard`, `Sideburns`) | ⭐⭐⭐⭐ |
|
| 141 |
-
| **Skin tone** (light, medium, dark) | BiSeNet skin parsing → mean LAB brightness | ⭐⭐⭐⭐⭐ |
|
| 142 |
-
| **Freckles** | BiSeNet skin mask → small blob detection (contrast) | ⭐⭐⭐ |
|
| 143 |
-
| **Moles / birthmark** | BiSeNet skin mask → dark blob detection | ⭐⭐⭐ |
|
| 144 |
-
| **Scars** | BiSeNet skin mask → linear edge anomaly detection | ⭐⭐ |
|
| 145 |
-
| **Acne** | BiSeNet skin mask → red blob detection or HuggingFace skin model | ⭐⭐⭐ |
|
| 146 |
-
| **Wrinkles / forehead lines** | BiSeNet forehead mask → Gabor filter or Canny edges | ⭐⭐⭐ |
|
| 147 |
-
| **Facial asymmetry** | MediaPipe left/right landmark mirror distance | ⭐⭐⭐⭐⭐ |
|
| 148 |
-
| **Prominent Adam's apple** | Neck region detection (limited accuracy) | ⭐ |
|
| 149 |
-
| **Glasses** | CelebA (`Eyeglasses`) + BiSeNet parsing | ⭐⭐⭐⭐⭐ |
|
| 150 |
-
| **Age** | FairFace (MAE 3.4 years) | ⭐⭐⭐⭐⭐ |
|
| 151 |
-
| **Gender** | FairFace (94.2%) | ⭐⭐⭐⭐⭐ |
|
| 152 |
-
| **Race** | FairFace (93.4%, 7 categories) | ⭐⭐⭐⭐⭐ |
|
| 153 |
-
| **Emotion** | HSEmotion (66.5% AffectNet-8, SOTA) | ⭐⭐⭐⭐ |
|
| 154 |
-
|
| 155 |
-
---
|
| 156 |
-
|
| 157 |
-
## Model Comparison Table
|
| 158 |
-
|
| 159 |
-
| Model | Accuracy (LFW) | Size | Runs in Deno/Browser? | Feature Depth | Notes |
|
| 160 |
-
|-------|----------------|------|----------------------|---------------|-------|
|
| 161 |
-
| **DeepFace** (Python) | 97.4% (VGG-Face) | 500MB+ | ❌ No (Python only) | Age, gender, race, emotion | Too large, wrong runtime |
|
| 162 |
-
| **InsightFace Buffalo_L** | 99.8% (LFW) | ~320MB | ❌ No (Python/C++) | Landmarks, age, gender | Too large for edge |
|
| 163 |
-
| **InsightFace MobileFaceNet** | 99.5% (LFW) | ~4MB | ⚠️ ONNX possible | Recognition only, no attributes | Very small but limited features |
|
| 164 |
-
| **MediaPipe Face Landmarker** | N/A (landmark model) | ~4MB | ✅ Yes (TFJS/WASM) | 478 landmarks, blendshapes | Best for geometric features |
|
| 165 |
-
| **face-api.js** | 99.2% (LFW) | ~6MB (all models) | ✅ Yes (TFJS) | Age, gender, emotion, 68 landmarks | Browser/Node.js ready |
|
| 166 |
-
| **ONNX FER+ (emotion)** | ~85% (FER2013) | ~2MB | ✅ Yes (ONNX.js) | Emotion only | Supplement model |
|
| 167 |
-
| **HuggingFace ViT models** | Varies | 50-350MB | ⚠️ ONNX export possible | Age, gender, various classifiers | Some fit under 50MB |
|
| 168 |
-
|
| 169 |
-
---
|
| 170 |
-
|
| 171 |
-
## Free Hosting Options for the Python Microservice
|
| 172 |
-
|
| 173 |
-
| Platform | Free Tier | RAM | Cold Start | Best For |
|
| 174 |
-
|----------|-----------|-----|------------|----------|
|
| 175 |
-
| **Hugging Face Spaces** | Unlimited | 2GB CPU | ~15s | Best free option, runs Gradio/FastAPI |
|
| 176 |
-
| **Railway.app** | $5 credit/month | 512MB | ~5s | Good for always-on API |
|
| 177 |
-
| **Render.com** | 750 hrs/month | 512MB | ~30s | Spins down after 15min inactivity |
|
| 178 |
-
| **Google Cloud Run** | 2M requests/month | 512MB | ~10s | Best scaling, pay-per-request |
|
| 179 |
-
| **Fly.io** | 3 shared VMs | 256MB | ~3s | Low latency, always on |
|
| 180 |
-
|
| 181 |
-
**Recommendation: Hugging Face Spaces** — 2GB RAM free, pre-installed ML libraries, no cold start limits, and you can use their Inference API for some models without even hosting.
|
| 182 |
-
|
| 183 |
-
---
|
| 184 |
-
|
| 185 |
-
## Full Implementation
|
| 186 |
-
|
| 187 |
-
### Python Microservice
|
| 188 |
-
|
| 189 |
-
#### requirements.txt
|
| 190 |
-
|
| 191 |
-
```
|
| 192 |
-
fastapi==0.115.0
|
| 193 |
-
uvicorn==0.30.0
|
| 194 |
-
python-multipart==0.0.9
|
| 195 |
-
mediapipe==0.10.14
|
| 196 |
-
insightface==0.7.3
|
| 197 |
-
onnxruntime==1.18.0
|
| 198 |
-
torch==2.3.0
|
| 199 |
-
torchvision==0.18.0
|
| 200 |
-
Pillow==10.4.0
|
| 201 |
-
numpy==1.26.4
|
| 202 |
-
opencv-python-headless==4.10.0.84
|
| 203 |
-
scipy==1.13.0
|
| 204 |
-
scikit-learn==1.5.0
|
| 205 |
-
huggingface-hub==0.23.0
|
| 206 |
-
```
|
| 207 |
-
|
| 208 |
-
#### face-service/app.py
|
| 209 |
-
|
| 210 |
-
```python
|
| 211 |
-
"""
|
| 212 |
-
Face Analysis Microservice
|
| 213 |
-
Combines multiple models for comprehensive facial attribute detection.
|
| 214 |
-
"""
|
| 215 |
-
|
| 216 |
-
import io
|
| 217 |
-
import logging
|
| 218 |
-
from typing import Optional
|
| 219 |
-
|
| 220 |
-
import cv2
|
| 221 |
-
import numpy as np
|
| 222 |
-
from fastapi import FastAPI, File, HTTPException, UploadFile
|
| 223 |
-
from fastapi.middleware.cors import CORSMiddleware
|
| 224 |
-
from PIL import Image
|
| 225 |
-
|
| 226 |
-
from analyzers.landmark_analyzer import LandmarkAnalyzer
|
| 227 |
-
from analyzers.demographic_analyzer import DemographicAnalyzer
|
| 228 |
-
from analyzers.attribute_analyzer import AttributeAnalyzer
|
| 229 |
-
from analyzers.parsing_analyzer import ParsingAnalyzer
|
| 230 |
-
from analyzers.emotion_analyzer import EmotionAnalyzer
|
| 231 |
-
from analyzers.color_analyzer import ColorAnalyzer
|
| 232 |
-
|
| 233 |
-
logging.basicConfig(level=logging.INFO)
|
| 234 |
-
logger = logging.getLogger(__name__)
|
| 235 |
-
|
| 236 |
-
app = FastAPI(title="Face Analysis Service", version="2.0.0")
|
| 237 |
-
|
| 238 |
-
app.add_middleware(
|
| 239 |
-
CORSMiddleware,
|
| 240 |
-
allow_origins=["*"], # Restrict in production
|
| 241 |
-
allow_credentials=True,
|
| 242 |
-
allow_methods=["*"],
|
| 243 |
-
allow_headers=["*"],
|
| 244 |
-
)
|
| 245 |
-
|
| 246 |
-
# Initialize analyzers lazily
|
| 247 |
-
landmark_analyzer: Optional[LandmarkAnalyzer] = None
|
| 248 |
-
demographic_analyzer: Optional[DemographicAnalyzer] = None
|
| 249 |
-
attribute_analyzer: Optional[AttributeAnalyzer] = None
|
| 250 |
-
parsing_analyzer: Optional[ParsingAnalyzer] = None
|
| 251 |
-
emotion_analyzer: Optional[EmotionAnalyzer] = None
|
| 252 |
-
color_analyzer: Optional[ColorAnalyzer] = None
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
def get_analyzers():
|
| 256 |
-
global landmark_analyzer, demographic_analyzer, attribute_analyzer
|
| 257 |
-
global parsing_analyzer, emotion_analyzer, color_analyzer
|
| 258 |
-
|
| 259 |
-
if landmark_analyzer is None:
|
| 260 |
-
logger.info("Loading MediaPipe landmarks...")
|
| 261 |
-
landmark_analyzer = LandmarkAnalyzer()
|
| 262 |
-
|
| 263 |
-
if demographic_analyzer is None:
|
| 264 |
-
logger.info("Loading FairFace demographics...")
|
| 265 |
-
demographic_analyzer = DemographicAnalyzer()
|
| 266 |
-
|
| 267 |
-
if attribute_analyzer is None:
|
| 268 |
-
logger.info("Loading CelebA attribute classifier...")
|
| 269 |
-
attribute_analyzer = AttributeAnalyzer()
|
| 270 |
-
|
| 271 |
-
if parsing_analyzer is None:
|
| 272 |
-
logger.info("Loading BiSeNet face parser...")
|
| 273 |
-
parsing_analyzer = ParsingAnalyzer()
|
| 274 |
-
|
| 275 |
-
if emotion_analyzer is None:
|
| 276 |
-
logger.info("Loading HSEmotion...")
|
| 277 |
-
emotion_analyzer = EmotionAnalyzer()
|
| 278 |
-
|
| 279 |
-
if color_analyzer is None:
|
| 280 |
-
color_analyzer = ColorAnalyzer()
|
| 281 |
-
|
| 282 |
-
return (
|
| 283 |
-
landmark_analyzer,
|
| 284 |
-
demographic_analyzer,
|
| 285 |
-
attribute_analyzer,
|
| 286 |
-
parsing_analyzer,
|
| 287 |
-
emotion_analyzer,
|
| 288 |
-
color_analyzer,
|
| 289 |
-
)
|
| 290 |
-
|
| 291 |
-
|
| 292 |
-
@app.get("/health")
|
| 293 |
-
async def health():
|
| 294 |
-
return {"status": "ok"}
|
| 295 |
-
|
| 296 |
-
|
| 297 |
-
@app.post("/analyze")
|
| 298 |
-
async def analyze_face(file: UploadFile = File(...)):
|
| 299 |
-
"""Comprehensive face analysis endpoint."""
|
| 300 |
-
try:
|
| 301 |
-
contents = await file.read()
|
| 302 |
-
image = Image.open(io.BytesIO(contents)).convert("RGB")
|
| 303 |
-
img_array = np.array(image)
|
| 304 |
-
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
|
| 305 |
-
|
| 306 |
-
(
|
| 307 |
-
landmarks,
|
| 308 |
-
demographics,
|
| 309 |
-
attributes,
|
| 310 |
-
parsing,
|
| 311 |
-
emotions,
|
| 312 |
-
colors,
|
| 313 |
-
) = get_analyzers()
|
| 314 |
-
|
| 315 |
-
results = {}
|
| 316 |
-
|
| 317 |
-
# 1. MediaPipe Landmarks → geometric features
|
| 318 |
-
logger.info("Running landmark analysis...")
|
| 319 |
-
landmark_results = landmarks.analyze(img_array)
|
| 320 |
-
results.update(landmark_results)
|
| 321 |
-
|
| 322 |
-
# 2. FairFace → age, gender, race
|
| 323 |
-
logger.info("Running demographic analysis...")
|
| 324 |
-
demo_results = demographics.analyze(img_array)
|
| 325 |
-
results.update(demo_results)
|
| 326 |
-
|
| 327 |
-
# 3. CelebA attributes → 40 binary facial attributes
|
| 328 |
-
logger.info("Running attribute analysis...")
|
| 329 |
-
attr_results = attributes.analyze(img_array)
|
| 330 |
-
results.update(attr_results)
|
| 331 |
-
|
| 332 |
-
# 4. BiSeNet face parsing → segmentation masks
|
| 333 |
-
logger.info("Running face parsing...")
|
| 334 |
-
parse_results = parsing.analyze(img_bgr)
|
| 335 |
-
results.update(parse_results)
|
| 336 |
-
|
| 337 |
-
# 5. HSEmotion → emotion classification
|
| 338 |
-
logger.info("Running emotion analysis...")
|
| 339 |
-
emo_results = emotions.analyze(img_array)
|
| 340 |
-
results.update(emo_results)
|
| 341 |
-
|
| 342 |
-
# 6. Color analysis using parsing masks
|
| 343 |
-
logger.info("Running color analysis...")
|
| 344 |
-
color_results = colors.analyze(
|
| 345 |
-
img_array,
|
| 346 |
-
skin_mask=parse_results.get("_skin_mask"),
|
| 347 |
-
hair_mask=parse_results.get("_hair_mask"),
|
| 348 |
-
landmark_data=landmark_results.get("_raw_landmarks"),
|
| 349 |
-
)
|
| 350 |
-
results.update(color_results)
|
| 351 |
-
|
| 352 |
-
# Remove internal fields
|
| 353 |
-
results = {k: v for k, v in results.items() if not k.startswith("_")}
|
| 354 |
-
|
| 355 |
-
return {"success": True, "data": results}
|
| 356 |
-
|
| 357 |
-
except Exception as e:
|
| 358 |
-
logger.error(f"Analysis failed: {e}", exc_info=True)
|
| 359 |
-
raise HTTPException(status_code=500, detail=str(e))
|
| 360 |
-
```
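The removed `get_analyzers()` above builds each analyzer on first use through module-level globals and `is None` checks. One alternative, shown here only as a sketch, is to let `functools.lru_cache` hold the single tuple of instances. `StubAnalyzer` is a hypothetical stand-in for the real analyzer classes, which load model weights in their constructors.

```python
from functools import lru_cache


class StubAnalyzer:
    """Hypothetical stand-in for an analyzer whose __init__ loads a model."""

    instances_created = 0

    def __init__(self, name: str) -> None:
        StubAnalyzer.instances_created += 1
        self.name = name

    def analyze(self, image) -> dict:
        return {f"{self.name}_ok": True}


@lru_cache(maxsize=1)
def get_analyzers() -> tuple:
    """Build every analyzer on the first call; later calls reuse the cached tuple."""
    return (StubAnalyzer("landmarks"), StubAnalyzer("demographics"))


first = get_analyzers()
second = get_analyzers()
print(first is second, StubAnalyzer.instances_created)
```

As with the global-variable version, two requests that arrive before the first call completes could still both run the constructor, so a lock or an explicit startup hook may be preferable if model loading is expensive.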
|
| 361 |
-
|
| 362 |
-
#### face-service/analyzers/landmark_analyzer.py
|
| 363 |
-
|
| 364 |
-
```python
|
| 365 |
-
"""
|
| 366 |
-
MediaPipe Face Landmarker — 478 3D landmarks + 52 blendshapes
|
| 367 |
-
Derives geometric facial features from landmark positions.
|
| 368 |
-
"""
|
| 369 |
-
|
| 370 |
-
import math
|
| 371 |
-
from typing import Any
|
| 372 |
-
|
| 373 |
-
import mediapipe as mp
|
| 374 |
-
import numpy as np
|
| 375 |
-
from mediapipe.tasks import python as mp_python
|
| 376 |
-
from mediapipe.tasks.python import vision
|
| 377 |
-
|
| 378 |
-
|
| 379 |
-
class LandmarkAnalyzer:
|
| 380 |
-
def __init__(self):
|
| 381 |
-
base_options = mp_python.BaseOptions(
|
| 382 |
-
model_asset_path=self._download_model()
|
| 383 |
-
)
|
| 384 |
-
options = vision.FaceLandmarkerOptions(
|
| 385 |
-
base_options=base_options,
|
| 386 |
-
output_face_blendshapes=True,
|
| 387 |
-
output_facial_transformation_matrixes=True,
|
| 388 |
-
num_faces=1,
|
| 389 |
-
)
|
| 390 |
-
self.detector = vision.FaceLandmarker.create_from_options(options)
|
| 391 |
-
|
| 392 |
-
def _download_model(self) -> str:
|
| 393 |
-
import urllib.request
|
| 394 |
-
import os
|
| 395 |
-
|
| 396 |
-
model_path = "models/face_landmarker.task"
|
| 397 |
-
if not os.path.exists(model_path):
|
| 398 |
-
os.makedirs("models", exist_ok=True)
|
| 399 |
-
url = "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task"
|
| 400 |
-
urllib.request.urlretrieve(url, model_path)
|
| 401 |
-
return model_path
|
| 402 |
-
|
| 403 |
-
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 404 |
-
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
|
| 405 |
-
result = self.detector.detect(mp_image)
|
| 406 |
-
|
| 407 |
-
if not result.face_landmarks:
|
| 408 |
-
return {"error": "No face detected by MediaPipe"}
|
| 409 |
-
|
| 410 |
-
landmarks = result.face_landmarks[0]
|
| 411 |
-
lm = [{"x": l.x, "y": l.y, "z": l.z} for l in landmarks]
|
| 412 |
-
|
| 413 |
-
blendshapes = {}
|
| 414 |
-
if result.face_blendshapes:
|
| 415 |
-
for bs in result.face_blendshapes[0]:
|
| 416 |
-
blendshapes[bs.category_name] = round(bs.score, 4)
|
| 417 |
-
|
| 418 |
-
attrs = {}
|
| 419 |
-
attrs["_raw_landmarks"] = lm
|
| 420 |
-
|
| 421 |
-
# === Face Shape ===
|
| 422 |
-
face_height = self._dist(lm[10], lm[152])
|
| 423 |
-
face_width = self._dist(lm[234], lm[454])
|
| 424 |
-
jaw_width = self._dist(lm[172], lm[397])
|
| 425 |
-
cheekbone_width = self._dist(lm[93], lm[323])
|
| 426 |
-
forehead_width = self._dist(lm[54], lm[284])
|
| 427 |
-
|
| 428 |
-
wh_ratio = face_width / face_height if face_height > 0 else 1
|
| 429 |
-
jaw_to_face = jaw_width / face_width if face_width > 0 else 1
|
| 430 |
-
forehead_to_jaw = forehead_width / jaw_width if jaw_width > 0 else 1
|
| 431 |
-
cheek_to_jaw = cheekbone_width / jaw_width if jaw_width > 0 else 1
|
| 432 |
-
|
| 433 |
-
if wh_ratio > 0.85 and jaw_to_face > 0.75:
|
| 434 |
-
attrs["face_shape"] = "round"
|
| 435 |
-
elif wh_ratio > 0.8 and jaw_to_face > 0.8 and forehead_to_jaw < 1.1:
|
| 436 |
-
attrs["face_shape"] = "square"
|
| 437 |
-
elif wh_ratio < 0.75:
|
| 438 |
-
attrs["face_shape"] = "oblong"
|
| 439 |
-
elif forehead_to_jaw > 1.3:
|
| 440 |
-
attrs["face_shape"] = "heart"
|
| 441 |
-
elif cheek_to_jaw > 1.25 and forehead_to_jaw < 1.15:
|
| 442 |
-
attrs["face_shape"] = "diamond"
|
| 443 |
-
elif forehead_to_jaw < 0.85:
|
| 444 |
-
attrs["face_shape"] = "triangle"
|
| 445 |
-
else:
|
| 446 |
-
attrs["face_shape"] = "oval"
|
| 447 |
-
|
| 448 |
-
attrs["face_shape_metrics"] = {
|
| 449 |
-
"width_height_ratio": round(wh_ratio, 3),
|
| 450 |
-
"jaw_to_face_ratio": round(jaw_to_face, 3),
|
| 451 |
-
"forehead_to_jaw_ratio": round(forehead_to_jaw, 3),
|
| 452 |
-
"cheekbone_to_jaw_ratio": round(cheek_to_jaw, 3),
|
| 453 |
-
}
|
| 454 |
-
|
| 455 |
-
# === Forehead ===
|
| 456 |
-
forehead_ratio = forehead_width / face_width if face_width > 0 else 0.6
|
| 457 |
-
attrs["forehead_width"] = (
|
| 458 |
-
"broad" if forehead_ratio > 0.7
|
| 459 |
-
else "narrow" if forehead_ratio < 0.55
|
| 460 |
-
else "average"
|
| 461 |
-
)
|
| 462 |
-
|
| 463 |
-
# === Jawline ===
|
| 464 |
-
jaw_angle = self._jaw_angle(lm)
|
| 465 |
-
attrs["jawline_angle"] = round(jaw_angle, 1)
|
| 466 |
-
if jaw_angle < 110:
|
| 467 |
-
attrs["jawline_type"] = "sharp"
|
| 468 |
-
elif jaw_angle > 140:
|
| 469 |
-
attrs["jawline_type"] = "soft"
|
| 470 |
-
elif jaw_to_face > 0.75:
|
| 471 |
-
attrs["jawline_type"] = "strong"
|
| 472 |
-
else:
|
| 473 |
-
attrs["jawline_type"] = "soft"
|
| 474 |
-
|
| 475 |
-
# === Chin ===
|
| 476 |
-
chin_width = self._dist(lm[175], lm[396])
|
| 477 |
-
chin_ratio = chin_width / jaw_width if jaw_width > 0 else 0.4
|
| 478 |
-
attrs["chin_type"] = (
|
| 479 |
-
"pointed" if chin_ratio < 0.3
|
| 480 |
-
else "wide" if chin_ratio > 0.5
|
| 481 |
-
else "normal"
|
| 482 |
-
)
|
| 483 |
-
|
| 484 |
-
# === Cheekbones ===
|
| 485 |
-
cheek_z = (lm[93]["z"] + lm[323]["z"]) / 2
|
| 486 |
-
attrs["cheekbone_prominence"] = (
|
| 487 |
-
"high" if cheek_z < -0.04
|
| 488 |
-
else "flat" if cheek_z > 0.0
|
| 489 |
-
else "moderate"
|
| 490 |
-
)
|
| 491 |
-
|
| 492 |
-
# Hollow vs full cheeks (blendshape-assisted)
|
| 493 |
-
cheek_puff = blendshapes.get("cheekPuff", 0)
|
| 494 |
-
cheek_squint_l = blendshapes.get("cheekSquintLeft", 0)
|
| 495 |
-
cheek_squint_r = blendshapes.get("cheekSquintRight", 0)
|
| 496 |
-
if cheek_puff > 0.3:
|
| 497 |
-
attrs["cheek_fullness"] = "full"
|
| 498 |
-
elif cheek_z > -0.01:
|
| 499 |
-
attrs["cheek_fullness"] = "hollow"
|
| 500 |
-
else:
|
| 501 |
-
attrs["cheek_fullness"] = "normal"
|
| 502 |
-
|
| 503 |
-
# === Eyes ===
|
| 504 |
-
left_eye_top = lm[159]
|
| 505 |
-
left_eye_bottom = lm[145]
|
| 506 |
-
left_eye_inner = lm[133]
|
| 507 |
-
left_eye_outer = lm[33]
|
| 508 |
-
eye_openness = self._dist(left_eye_top, left_eye_bottom)
|
| 509 |
-
eye_width_val = self._dist(left_eye_inner, left_eye_outer)
|
| 510 |
-
eye_ratio = eye_openness / eye_width_val if eye_width_val > 0 else 0.3
|
| 511 |
-
|
| 512 |
-
outer_angle = left_eye_outer["y"] - left_eye_inner["y"]
|
| 513 |
-
if outer_angle < -0.012:
|
| 514 |
-
attrs["eye_shape"] = "upturned"
|
| 515 |
-
elif outer_angle > 0.012:
|
| 516 |
-
attrs["eye_shape"] = "downturned"
|
| 517 |
-
elif eye_ratio > 0.38:
|
| 518 |
-
attrs["eye_shape"] = "round"
|
| 519 |
-
elif eye_ratio < 0.2:
|
| 520 |
-
attrs["eye_shape"] = "hooded"
|
| 521 |
-
else:
|
| 522 |
-
attrs["eye_shape"] = "almond"
|
| 523 |
-
|
| 524 |
-
# Deep-set vs protruding
|
| 525 |
-
eye_z = (lm[159]["z"] + lm[145]["z"]) / 2
|
| 526 |
-
nose_bridge_z = lm[6]["z"]
|
| 527 |
-
if eye_z > nose_bridge_z + 0.02:
|
| 528 |
-
attrs["eye_depth"] = "deep-set"
|
| 529 |
-
elif eye_z < nose_bridge_z - 0.01:
|
| 530 |
-
attrs["eye_depth"] = "protruding"
|
| 531 |
-
else:
|
| 532 |
-
attrs["eye_depth"] = "normal"
|
| 533 |
-
|
| 534 |
-
# Eye spacing
|
| 535 |
-
if len(lm) > 473: # Iris landmarks available
|
| 536 |
-
inter_pupillary = self._dist(lm[468], lm[473])
|
| 537 |
-
else:
|
| 538 |
-
inter_pupillary = self._dist(lm[133], lm[362])
|
| 539 |
-
ip_ratio = inter_pupillary / face_width if face_width > 0 else 0.35
|
| 540 |
-
attrs["eye_spacing"] = (
|
| 541 |
-
"wide-set" if ip_ratio > 0.38
|
| 542 |
-
else "close-set" if ip_ratio < 0.28
|
| 543 |
-
else "average"
|
| 544 |
-
)
|
| 545 |
-
|
| 546 |
-
# Eye size
|
| 547 |
-
right_eye_top = lm[386]
|
| 548 |
-
right_eye_bottom = lm[374]
|
| 549 |
-
right_eye_inner = lm[362]
|
| 550 |
-
right_eye_outer = lm[263]
|
| 551 |
-
r_eye_area = self._dist(right_eye_top, right_eye_bottom) * self._dist(right_eye_inner, right_eye_outer)
|
| 552 |
-
l_eye_area = eye_openness * eye_width_val
|
| 553 |
-
avg_eye_area = (l_eye_area + r_eye_area) / 2
|
| 554 |
-
face_area = face_width * face_height
|
| 555 |
-
eye_size_ratio = avg_eye_area / face_area if face_area > 0 else 0.015
|
| 556 |
-
attrs["eye_size"] = (
|
| 557 |
-
"large" if eye_size_ratio > 0.02
|
| 558 |
-
else "small" if eye_size_ratio < 0.012
|
| 559 |
-
else "average"
|
| 560 |
-
)
|
| 561 |
-
|
| 562 |
-
# Eye blink (closed vs open)
|
| 563 |
-
blink_l = blendshapes.get("eyeBlinkLeft", 0)
|
| 564 |
-
blink_r = blendshapes.get("eyeBlinkRight", 0)
|
| 565 |
-
attrs["eyes_open"] = (blink_l + blink_r) / 2 < 0.5
|
| 566 |
-
|
| 567 |
-
# === Eyebrows ===
|
| 568 |
-
brow_mid_l = lm[105]
|
| 569 |
-
brow_outer_l = lm[46]
|
| 570 |
-
brow_inner_l = lm[70]
|
| 571 |
-
brow_to_eye = self._dist(brow_mid_l, lm[159])
|
| 572 |
-
brow_arch_ratio = brow_to_eye / eye_openness if eye_openness > 0 else 1.5
|
| 573 |
-
|
| 574 |
-
attrs["eyebrow_arch_height"] = (
|
| 575 |
-
"high" if brow_arch_ratio > 2.2
|
| 576 |
-
else "low" if brow_arch_ratio < 1.3
|
| 577 |
-
else "average"
|
| 578 |
-
)
|
| 579 |
-
|
| 580 |
-
# Brow curvature
|
| 581 |
-
mid_y = brow_mid_l["y"]
|
| 582 |
-
avg_end_y = (brow_inner_l["y"] + brow_outer_l["y"]) / 2
|
| 583 |
-
curvature = mid_y - avg_end_y
|
| 584 |
-
if abs(curvature) < 0.003:
|
| 585 |
-
attrs["eyebrow_shape"] = "straight"
|
| 586 |
-
elif curvature < -0.008:
|
| 587 |
-
attrs["eyebrow_shape"] = "arched"
|
| 588 |
-
else:
|
| 589 |
-
attrs["eyebrow_shape"] = "flat"
|
| 590 |
-
|
| 591 |
-
# Eyebrow thickness (vertical span of brow landmarks)
|
| 592 |
-
brow_top = lm[66] # Top of left brow
|
| 593 |
-
brow_bottom = lm[105] # Bottom of left brow
|
| 594 |
-
brow_thickness = self._dist(brow_top, brow_bottom)
|
| 595 |
-
attrs["eyebrow_thickness"] = (
|
| 596 |
-
"thick" if brow_thickness > 0.015
|
| 597 |
-
else "thin" if brow_thickness < 0.008
|
| 598 |
-
else "medium"
|
| 599 |
-
)
|
| 600 |
-
|
| 601 |
-
# Unibrow detection
|
| 602 |
-
inner_brow_dist = self._dist(lm[70], lm[300])
|
| 603 |
-
attrs["possible_unibrow"] = inner_brow_dist < 0.04
|
| 604 |
-
|
| 605 |
-
# === Nose ===
|
| 606 |
-
nose_bridge_top = lm[6]
|
| 607 |
-
nose_tip = lm[1]
|
| 608 |
-
nose_bottom = lm[2]
|
| 609 |
-
left_nostril = lm[129]
|
| 610 |
-
right_nostril = lm[358]
|
| 611 |
-
nostril_w = self._dist(left_nostril, right_nostril)
|
| 612 |
-
|
| 613 |
-
nw_ratio = nostril_w / face_width if face_width > 0 else 0.24
|
| 614 |
-
attrs["nostril_width"] = (
|
| 615 |
-
"wide" if nw_ratio > 0.28
|
| 616 |
-
else "narrow" if nw_ratio < 0.2
|
| 617 |
-
else "average"
|
| 618 |
-
)
|
| 619 |
-
|
| 620 |
-
tip_angle = nose_tip["y"] - nose_bottom["y"]
|
| 621 |
-
if tip_angle < -0.005:
|
| 622 |
-
attrs["nose_shape"] = "upturned"
|
| 623 |
-
elif tip_angle > 0.01:
|
| 624 |
-
attrs["nose_shape"] = "aquiline"
|
| 625 |
-
elif nw_ratio > 0.28:
|
| 626 |
-
attrs["nose_shape"] = "wide"
|
| 627 |
-
elif nw_ratio < 0.2:
|
| 628 |
-
attrs["nose_shape"] = "narrow"
|
| 629 |
-
else:
|
| 630 |
-
attrs["nose_shape"] = "straight"
|
| 631 |
-
|
| 632 |
-
attrs["nose_bridge"] = (
|
| 633 |
-
"high" if nose_bridge_top["z"] < -0.05
|
| 634 |
-
else "flat" if nose_bridge_top["z"] > 0.0
|
| 635 |
-
else "average"
|
| 636 |
-
)
|
| 637 |
-
|
| 638 |
-
attrs["nose_tip_shape"] = (
|
| 639 |
-
"pointed" if nose_tip["z"] < nose_bottom["z"] - 0.01
|
| 640 |
-
else "rounded"
|
| 641 |
-
)
|
| 642 |
-
|
| 643 |
-
# === Lips & Mouth ===
|
| 644 |
-
upper_lip_top = lm[0]
|
| 645 |
-
upper_lip_bottom = lm[13]
|
| 646 |
-
lower_lip_top = lm[14]
|
| 647 |
-
lower_lip_bottom = lm[17]
|
| 648 |
-
mouth_left = lm[61]
|
| 649 |
-
mouth_right = lm[291]
|
| 650 |
-
|
| 651 |
-
upper_lip_h = self._dist(upper_lip_top, upper_lip_bottom)
|
| 652 |
-
lower_lip_h = self._dist(lower_lip_top, lower_lip_bottom)
|
| 653 |
-
total_lip_h = upper_lip_h + lower_lip_h
|
| 654 |
-
mouth_w = self._dist(mouth_left, mouth_right)
|
| 655 |
-
|
| 656 |
-
lip_ratio = total_lip_h / mouth_w if mouth_w > 0 else 0.3
|
| 657 |
-
attrs["lip_fullness"] = (
|
| 658 |
-
"full" if lip_ratio > 0.38
|
| 659 |
-
else "thin" if lip_ratio < 0.22
|
| 660 |
-
else "average"
|
| 661 |
-
)
|
| 662 |
-
|
| 663 |
-
attrs["lip_balance"] = (
|
| 664 |
-
"top-heavy" if upper_lip_h > lower_lip_h * 1.2
|
| 665 |
-
else "bottom-heavy" if lower_lip_h > upper_lip_h * 1.2
|
| 666 |
-
else "balanced"
|
| 667 |
-
)
|
| 668 |
-
|
| 669 |
-
mw_ratio = mouth_w / face_width if face_width > 0 else 0.37
|
| 670 |
-
attrs["mouth_width"] = (
|
| 671 |
-
"wide" if mw_ratio > 0.42
|
| 672 |
-
else "small" if mw_ratio < 0.32
|
| 673 |
-
else "average"
|
| 674 |
-
)
|
| 675 |
-
|
| 676 |
-
# Cupid's bow
|
| 677 |
-
cupid_left = lm[37]
|
| 678 |
-
cupid_center = lm[0]
|
| 679 |
-
cupid_right = lm[267]
|
| 680 |
-
bow_depth = cupid_center["y"] - (cupid_left["y"] + cupid_right["y"]) / 2
|
| 681 |
-
attrs["cupids_bow"] = (
|
| 682 |
-
"defined" if bow_depth > 0.005
|
| 683 |
-
else "subtle" if bow_depth > 0.002
|
| 684 |
-
else "flat"
|
| 685 |
-
)
|
| 686 |
-
|
| 687 |
-
# Smile
|
| 688 |
-
smile_l = blendshapes.get("mouthSmileLeft", 0)
|
| 689 |
-
smile_r = blendshapes.get("mouthSmileRight", 0)
|
| 690 |
-
attrs["smiling"] = (smile_l + smile_r) / 2 > 0.4
|
| 691 |
-
attrs["smile_asymmetry"] = round(abs(smile_l - smile_r), 3)
|
| 692 |
-
|
| 693 |
-
# Dimples (heuristic: strong smile with low cheek puff)
|
| 694 |
-
attrs["possible_dimples"] = (
|
| 695 |
-
(smile_l > 0.5 or smile_r > 0.5) and cheek_puff < 0.2
|
| 696 |
-
)
|
| 697 |
-
|
| 698 |
-
# === Facial Asymmetry ===
|
| 699 |
-
symmetry_pairs = [
|
| 700 |
-
(33, 263), (133, 362), (70, 300), (93, 323), (172, 397),
|
| 701 |
-
(61, 291), (159, 386), (145, 374), (46, 276),
|
| 702 |
-
]
|
| 703 |
-
asymmetry_sum = 0.0
|
| 704 |
-
for li, ri in symmetry_pairs:
|
| 705 |
-
left_dist = abs(lm[li]["x"] - 0.5)
|
| 706 |
-
right_dist = abs(lm[ri]["x"] - 0.5)
|
| 707 |
-
asymmetry_sum += abs(left_dist - right_dist)
|
| 708 |
-
attrs["facial_asymmetry_score"] = round(
|
| 709 |
-
min(asymmetry_sum / len(symmetry_pairs) / 0.05, 1.0), 3
|
| 710 |
-
)
|
| 711 |
-
|
| 712 |
-
# === Head Pose (from transformation matrix) ===
|
| 713 |
-
attrs["blendshapes"] = blendshapes
|
| 714 |
-
|
| 715 |
-
return attrs
|
| 716 |
-
|
| 717 |
-
def _dist(self, a: dict, b: dict) -> float:
|
| 718 |
-
return math.sqrt(
|
| 719 |
-
(a["x"] - b["x"]) ** 2
|
| 720 |
-
+ (a["y"] - b["y"]) ** 2
|
| 721 |
-
+ (a.get("z", 0) - b.get("z", 0)) ** 2
|
| 722 |
-
)
|
| 723 |
-
|
| 724 |
-
def _jaw_angle(self, lm: list[dict]) -> float:
|
| 725 |
-
chin = lm[152]
|
| 726 |
-
left_jaw = lm[172]
|
| 727 |
-
right_jaw = lm[397]
|
| 728 |
-
v1 = (left_jaw["x"] - chin["x"], left_jaw["y"] - chin["y"])
|
| 729 |
-
v2 = (right_jaw["x"] - chin["x"], right_jaw["y"] - chin["y"])
|
| 730 |
-
dot = v1[0] * v2[0] + v1[1] * v2[1]
|
| 731 |
-
mag1 = math.sqrt(v1[0] ** 2 + v1[1] ** 2)
|
| 732 |
-
mag2 = math.sqrt(v2[0] ** 2 + v2[1] ** 2)
|
| 733 |
-
if mag1 * mag2 == 0:
|
| 734 |
-
return 120.0
|
| 735 |
-
cos_angle = max(-1, min(1, dot / (mag1 * mag2)))
|
| 736 |
-
return math.acos(cos_angle) * (180 / math.pi)
|
| 737 |
-
```
|
| 738 |
-
|
| 739 |
-
#### face-service/analyzers/demographic_analyzer.py
|
| 740 |
-
|
| 741 |
-
```python
|
| 742 |
-
"""
|
| 743 |
-
FairFace — Age, Gender, Race prediction
|
| 744 |
-
Demographic classifier trained on the race-balanced FairFace dataset.
|
| 745 |
-
"""
|
| 746 |
-
|
| 747 |
-
import os
|
| 748 |
-
from typing import Any
|
| 749 |
-
|
| 750 |
-
import cv2
|
| 751 |
-
import numpy as np
|
| 752 |
-
import torch
|
| 753 |
-
import torchvision.transforms as transforms
|
| 754 |
-
from huggingface_hub import hf_hub_download
|
| 755 |
-
from PIL import Image
|
| 756 |
-
from torchvision import models
|
| 757 |
-
|
| 758 |
-
|
| 759 |
-
class DemographicAnalyzer:
|
| 760 |
-
"""FairFace-based age, gender, race classifier."""
|
| 761 |
-
|
| 762 |
-
AGE_LABELS = [
|
| 763 |
-
"0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"
|
| 764 |
-
]
|
| 765 |
-
GENDER_LABELS = ["Male", "Female"]
|
| 766 |
-
RACE_LABELS = [
|
| 767 |
-
"White", "Black", "Latino_Hispanic", "East Asian",
|
| 768 |
-
"Southeast Asian", "Indian", "Middle Eastern"
|
| 769 |
-
]
|
| 770 |
-
|
| 771 |
-
def __init__(self):
|
| 772 |
-
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 773 |
-
self.model = self._load_model()
|
| 774 |
-
self.transform = transforms.Compose([
|
| 775 |
-
transforms.Resize((224, 224)),
|
| 776 |
-
transforms.ToTensor(),
|
| 777 |
-
transforms.Normalize(
|
| 778 |
-
mean=[0.485, 0.456, 0.406],
|
| 779 |
-
std=[0.229, 0.224, 0.225],
|
| 780 |
-
),
|
| 781 |
-
])
|
| 782 |
-
|
| 783 |
-
def _load_model(self):
|
| 784 |
-
"""Load FairFace ResNet34 model."""
|
| 785 |
-
model_path = "models/fairface_model.pt"
|
| 786 |
-
if not os.path.exists(model_path):
|
| 787 |
-
os.makedirs("models", exist_ok=True)
|
| 788 |
-
# Download from HuggingFace mirror or original source
|
| 789 |
-
# FairFace official weights: res34_fair_align_multi_7_20190809.pt
|
| 790 |
-
try:
|
| 791 |
-
hf_hub_download(
|
| 792 |
-
repo_id="dchen236/FairFace",
|
| 793 |
-
filename="res34_fair_align_multi_7_20190809.pt",
|
| 794 |
-
local_dir="models",
|
| 795 |
-
local_dir_use_symlinks=False,
|
| 796 |
-
)
|
| 797 |
-
os.rename(
|
| 798 |
-
"models/res34_fair_align_multi_7_20190809.pt",
|
| 799 |
-
model_path,
|
| 800 |
-
)
|
| 801 |
-
except Exception:
|
| 802 |
-
# No automatic fallback: ask the user to supply the weights manually
|
| 803 |
-
raise FileNotFoundError(
|
| 804 |
-
"Please download FairFace weights from "
|
| 805 |
-
"https://github.com/dchen236/FairFace and place at models/fairface_model.pt"
|
| 806 |
-
)
|
| 807 |
-
|
| 808 |
-
model = models.resnet34(pretrained=False)
|
| 809 |
-
# Single 18-way output layer, split into race(7), gender(2), age(9)
|
| 810 |
-
model.fc = torch.nn.Linear(model.fc.in_features, 18)
|
| 811 |
-
model.load_state_dict(torch.load(model_path, map_location=self.device))
|
| 812 |
-
model.to(self.device)
|
| 813 |
-
model.eval()
|
| 814 |
-
return model
|
| 815 |
-
|
| 816 |
-
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 817 |
-
"""Predict age, gender, and race."""
|
| 818 |
-
pil_image = Image.fromarray(img_rgb)
|
| 819 |
-
input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
|
| 820 |
-
|
| 821 |
-
with torch.no_grad():
|
| 822 |
-
outputs = self.model(input_tensor)
|
| 823 |
-
|
| 824 |
-
outputs = outputs.cpu().numpy()[0]
|
| 825 |
-
|
| 826 |
-
# Split outputs: race(0-6), gender(7-8), age(9-17)
|
| 827 |
-
race_logits = outputs[0:7]
|
| 828 |
-
gender_logits = outputs[7:9]
|
| 829 |
-
age_logits = outputs[9:18]
|
| 830 |
-
|
| 831 |
-
race_probs = self._softmax(race_logits)
|
| 832 |
-
gender_probs = self._softmax(gender_logits)
|
| 833 |
-
age_probs = self._softmax(age_logits)
|
| 834 |
-
|
| 835 |
-
race_idx = int(np.argmax(race_probs))
|
| 836 |
-
gender_idx = int(np.argmax(gender_probs))
|
| 837 |
-
age_idx = int(np.argmax(age_probs))
|
| 838 |
-
|
| 839 |
-
# Estimate numeric age from bucket
|
| 840 |
-
age_ranges = [(0, 2), (3, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 85)]
|
| 841 |
-
age_estimate = sum(age_ranges[age_idx]) / 2
|
| 842 |
-
|
| 843 |
-
return {
|
| 844 |
-
"age_estimate": round(age_estimate, 1),
|
| 845 |
-
"age_range": self.AGE_LABELS[age_idx],
|
| 846 |
-
"age_confidence": round(float(age_probs[age_idx]), 3),
|
| 847 |
-
"gender": self.GENDER_LABELS[gender_idx].lower(),
|
| 848 |
-
"gender_confidence": round(float(gender_probs[gender_idx]), 3),
|
| 849 |
-
"race": self.RACE_LABELS[race_idx],
|
| 850 |
-
"race_confidence": round(float(race_probs[race_idx]), 3),
|
| 851 |
-
"race_probabilities": {
|
| 852 |
-
label: round(float(prob), 3)
|
| 853 |
-
for label, prob in zip(self.RACE_LABELS, race_probs)
|
| 854 |
-
},
|
| 855 |
-
}
|
| 856 |
-
|
| 857 |
-
@staticmethod
|
| 858 |
-
def _softmax(x: np.ndarray) -> np.ndarray:
|
| 859 |
-
e_x = np.exp(x - np.max(x))
|
| 860 |
-
return e_x / e_x.sum()
|
| 861 |
-
```
|
| 862 |
-
|
| 863 |
-
#### face-service/analyzers/attribute_analyzer.py
|
| 864 |
-
|
| 865 |
-
```python
|
| 866 |
-
"""
|
| 867 |
-
CelebA Multi-Label Attribute Classifier
|
| 868 |
-
Predicts 40 binary facial attributes from CelebA-trained model.
|
| 869 |
-
Uses a pretrained model from HuggingFace.
|
| 870 |
-
"""
|
| 871 |
-
|
| 872 |
-
import os
|
| 873 |
-
from typing import Any
|
| 874 |
-
|
| 875 |
-
import numpy as np
|
| 876 |
-
import torch
|
| 877 |
-
import torchvision.transforms as transforms
|
| 878 |
-
from PIL import Image
|
| 879 |
-
|
| 880 |
-
|
| 881 |
-
CELEBA_ATTRIBUTES = [
|
| 882 |
-
"5_o_Clock_Shadow", "Arched_Eyebrows", "Attractive", "Bags_Under_Eyes",
|
| 883 |
-
"Bald", "Bangs", "Big_Lips", "Big_Nose", "Black_Hair", "Blond_Hair",
|
| 884 |
-
"Blurry", "Brown_Hair", "Bushy_Eyebrows", "Chubby", "Double_Chin",
|
| 885 |
-
"Eyeglasses", "Goatee", "Gray_Hair", "Heavy_Makeup", "High_Cheekbones",
|
| 886 |
-
"Male", "Mouth_Slightly_Open", "Mustache", "Narrow_Eyes", "No_Beard",
|
| 887 |
-
"Oval_Face", "Pale_Skin", "Pointy_Nose", "Receding_Hairline",
|
| 888 |
-
"Rosy_Cheeks", "Sideburns", "Smiling", "Straight_Hair", "Wavy_Hair",
|
| 889 |
-
"Wearing_Earrings", "Wearing_Hat", "Wearing_Lipstick", "Wearing_Necklace",
|
| 890 |
-
"Wearing_Necktie", "Young",
|
| 891 |
-
]
|
| 892 |
-
|
| 893 |
-
|
| 894 |
-
class AttributeAnalyzer:
|
| 895 |
-
"""CelebA 40-attribute binary classifier using a fine-tuned ResNet."""
|
| 896 |
-
|
| 897 |
-
def __init__(self):
|
| 898 |
-
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 899 |
-
self.model = self._load_model()
|
| 900 |
-
self.transform = transforms.Compose([
|
| 901 |
-
transforms.Resize((224, 224)),
|
| 902 |
-
transforms.ToTensor(),
|
| 903 |
-
transforms.Normalize(
|
| 904 |
-
mean=[0.485, 0.456, 0.406],
|
| 905 |
-
std=[0.229, 0.224, 0.225],
|
| 906 |
-
),
|
| 907 |
-
])
|
| 908 |
-
|
| 909 |
-
def _load_model(self):
|
| 910 |
-
"""
|
| 911 |
-
Load a CelebA attribute prediction model.
|
| 912 |
-
Using a ResNet-18 fine-tuned on CelebA for 40 attributes.
|
| 913 |
-
"""
|
| 914 |
-
from torchvision import models
|
| 915 |
-
|
| 916 |
-
model_path = "models/celeba_resnet18.pt"
|
| 917 |
-
|
| 918 |
-
if not os.path.exists(model_path):
|
| 919 |
-
os.makedirs("models", exist_ok=True)
|
| 920 |
-
# Try loading from HuggingFace
|
| 921 |
-
try:
|
| 922 |
-
from huggingface_hub import hf_hub_download
|
| 923 |
-
hf_hub_download(
|
| 924 |
-
repo_id="jnferreira/attribute-prediction-celebA",
|
| 925 |
-
filename="model.pt",
|
| 926 |
-
local_dir="models",
|
| 927 |
-
local_dir_use_symlinks=False,
|
| 928 |
-
)
|
| 929 |
-
os.rename("models/model.pt", model_path)
|
| 930 |
-
except Exception:
|
| 931 |
-
# Fallback: build a fresh model skeleton
|
| 932 |
-
# Users will need to train or provide weights
|
| 933 |
-
model = models.resnet18(pretrained=True)
|
| 934 |
-
model.fc = torch.nn.Linear(model.fc.in_features, 40)
|
| 935 |
-
torch.save(model.state_dict(), model_path)
|
| 936 |
-
print(
|
| 937 |
-
"WARNING: Using ImageNet-pretrained ResNet18 without CelebA fine-tuning. "
|
| 938 |
-
"Attribute predictions will be inaccurate. "
|
| 939 |
-
"Please provide CelebA-trained weights at models/celeba_resnet18.pt"
|
| 940 |
-
)
|
| 941 |
-
|
| 942 |
-
model = models.resnet18(pretrained=False)
|
| 943 |
-
model.fc = torch.nn.Linear(model.fc.in_features, 40)
|
| 944 |
-
model.load_state_dict(
|
| 945 |
-
torch.load(model_path, map_location=self.device)
|
| 946 |
-
)
|
| 947 |
-
model.to(self.device)
|
| 948 |
-
model.eval()
|
| 949 |
-
return model
|
| 950 |
-
|
| 951 |
-
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 952 |
-
pil_image = Image.fromarray(img_rgb)
|
| 953 |
-
input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
|
| 954 |
-
|
| 955 |
-
with torch.no_grad():
|
| 956 |
-
logits = self.model(input_tensor)
|
| 957 |
-
|
| 958 |
-
probs = torch.sigmoid(logits).cpu().numpy()[0]
|
| 959 |
-
|
| 960 |
-
# Build structured results
|
| 961 |
-
raw_attrs = {
|
| 962 |
-
attr: round(float(prob), 3)
|
| 963 |
-
for attr, prob in zip(CELEBA_ATTRIBUTES, probs)
|
| 964 |
-
}
|
| 965 |
-
|
| 966 |
-
# Interpret into user-friendly categories
|
| 967 |
-
result: dict[str, Any] = {"celeba_raw": raw_attrs}
|
| 968 |
-
|
| 969 |
-
# Hair color (pick highest confidence)
|
| 970 |
-
hair_colors = {
|
| 971 |
-
"black": raw_attrs.get("Black_Hair", 0),
|
| 972 |
-
"brown": raw_attrs.get("Brown_Hair", 0),
|
| 973 |
-
"blonde": raw_attrs.get("Blond_Hair", 0),
|
| 974 |
-
"gray": raw_attrs.get("Gray_Hair", 0),
|
| 975 |
-
}
|
| 976 |
-
result["hair_color_celeba"] = max(hair_colors, key=hair_colors.get)
|
| 977 |
-
|
| 978 |
-
# Hair type
|
| 979 |
-
if raw_attrs.get("Straight_Hair", 0) > 0.5:
|
| 980 |
-
result["hair_type_celeba"] = "straight"
|
| 981 |
-
elif raw_attrs.get("Wavy_Hair", 0) > 0.5:
|
| 982 |
-
result["hair_type_celeba"] = "wavy"
|
| 983 |
-
else:
|
| 984 |
-
result["hair_type_celeba"] = "unknown"
|
| 985 |
-
|
| 986 |
-
result["bald"] = raw_attrs.get("Bald", 0) > 0.5
|
| 987 |
-
result["bangs"] = raw_attrs.get("Bangs", 0) > 0.5
|
| 988 |
-
result["receding_hairline"] = raw_attrs.get("Receding_Hairline", 0) > 0.5
|
| 989 |
-
|
| 990 |
-
# Facial hair
|
| 991 |
-
has_beard = raw_attrs.get("No_Beard", 0) < 0.5
|
| 992 |
-
has_goatee = raw_attrs.get("Goatee", 0) > 0.5
|
| 993 |
-
has_mustache = raw_attrs.get("Mustache", 0) > 0.5
|
| 994 |
-
has_sideburns = raw_attrs.get("Sideburns", 0) > 0.5
|
| 995 |
-
has_stubble = raw_attrs.get("5_o_Clock_Shadow", 0) > 0.5
|
| 996 |
-
|
| 997 |
-
if has_goatee:
|
| 998 |
-
result["facial_hair"] = "goatee"
|
| 999 |
-
elif has_mustache and has_beard:
|
| 1000 |
-
result["facial_hair"] = "full_beard"
|
| 1001 |
-
elif has_mustache:
|
| 1002 |
-
result["facial_hair"] = "mustache"
|
| 1003 |
-
elif has_sideburns:
|
| 1004 |
-
result["facial_hair"] = "sideburns"
|
| 1005 |
-
elif has_stubble:
|
| 1006 |
-
result["facial_hair"] = "stubble"
|
| 1007 |
-
elif not has_beard:
|
| 1008 |
-
result["facial_hair"] = "clean_shaven"
|
| 1009 |
-
else:
|
| 1010 |
-
result["facial_hair"] = "beard"
|
| 1011 |
-
|
| 1012 |
-
# Appearance attributes
|
| 1013 |
-
result["wearing_glasses"] = raw_attrs.get("Eyeglasses", 0) > 0.5
|
| 1014 |
-
result["wearing_hat"] = raw_attrs.get("Wearing_Hat", 0) > 0.5
|
| 1015 |
-
result["bushy_eyebrows"] = raw_attrs.get("Bushy_Eyebrows", 0) > 0.5
|
| 1016 |
-
result["arched_eyebrows_celeba"] = raw_attrs.get("Arched_Eyebrows", 0) > 0.5
|
| 1017 |
-
result["bags_under_eyes"] = raw_attrs.get("Bags_Under_Eyes", 0) > 0.5
|
| 1018 |
-
result["high_cheekbones_celeba"] = raw_attrs.get("High_Cheekbones", 0) > 0.5
|
| 1019 |
-
result["oval_face_celeba"] = raw_attrs.get("Oval_Face", 0) > 0.5
|
| 1020 |
-
result["pointy_nose_celeba"] = raw_attrs.get("Pointy_Nose", 0) > 0.5
|
| 1021 |
-
result["big_lips_celeba"] = raw_attrs.get("Big_Lips", 0) > 0.5
|
| 1022 |
-
result["big_nose_celeba"] = raw_attrs.get("Big_Nose", 0) > 0.5
|
| 1023 |
-
result["narrow_eyes_celeba"] = raw_attrs.get("Narrow_Eyes", 0) > 0.5
|
| 1024 |
-
result["double_chin"] = raw_attrs.get("Double_Chin", 0) > 0.5
|
| 1025 |
-
result["chubby"] = raw_attrs.get("Chubby", 0) > 0.5
|
| 1026 |
-
result["rosy_cheeks"] = raw_attrs.get("Rosy_Cheeks", 0) > 0.5
|
| 1027 |
-
result["pale_skin"] = raw_attrs.get("Pale_Skin", 0) > 0.5
|
| 1028 |
-
result["young"] = raw_attrs.get("Young", 0) > 0.5
|
| 1029 |
-
result["smiling_celeba"] = raw_attrs.get("Smiling", 0) > 0.5
|
| 1030 |
-
result["mouth_open"] = raw_attrs.get("Mouth_Slightly_Open", 0) > 0.5
|
| 1031 |
-
|
| 1032 |
-
return result
|
| 1033 |
-
```
|
| 1034 |
-
|
| 1035 |
-
#### face-service/analyzers/parsing_analyzer.py
|
| 1036 |
-
|
| 1037 |
-
```python
|
| 1038 |
-
"""
|
| 1039 |
-
BiSeNet Face Parsing — 19-class semantic segmentation of the face.
|
| 1040 |
-
Segments: skin, eyebrows, eyes, nose, lips, hair, ears, neck, etc.
|
| 1041 |
-
"""
|
| 1042 |
-
|
| 1043 |
-
import os
|
| 1044 |
-
from typing import Any
|
| 1045 |
-
|
| 1046 |
-
import cv2
|
| 1047 |
-
import numpy as np
|
| 1048 |
-
import torch
|
| 1049 |
-
from torchvision import transforms
|
| 1050 |
-
|
| 1051 |
-
|
| 1052 |
-
class ParsingAnalyzer:
|
| 1053 |
-
"""
|
| 1054 |
-
BiSeNet face parsing for hair/skin/feature segmentation.
|
| 1055 |
-
|
| 1056 |
-
Parsing classes:
|
| 1057 |
-
0: background, 1: skin, 2: l_brow, 3: r_brow, 4: l_eye, 5: r_eye,
|
| 1058 |
-
6: eye_g (glasses), 7: l_ear, 8: r_ear, 9: ear_r (earring),
|
| 1059 |
-
10: nose, 11: mouth, 12: u_lip, 13: l_lip, 14: neck,
|
| 1060 |
-
15: necklace, 16: cloth, 17: hair, 18: hat
|
| 1061 |
-
"""
|
| 1062 |
-
|
| 1063 |
-
LABELS = {
|
| 1064 |
-
0: "background", 1: "skin", 2: "left_brow", 3: "right_brow",
|
| 1065 |
-
4: "left_eye", 5: "right_eye", 6: "glasses", 7: "left_ear",
|
| 1066 |
-
8: "right_ear", 9: "earring", 10: "nose", 11: "mouth",
|
| 1067 |
-
12: "upper_lip", 13: "lower_lip", 14: "neck", 15: "necklace",
|
| 1068 |
-
16: "cloth", 17: "hair", 18: "hat",
|
| 1069 |
-
}
|
| 1070 |
-
|
| 1071 |
-
def __init__(self):
|
| 1072 |
-
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 1073 |
-
self.model = self._load_model()
|
| 1074 |
-
self.transform = transforms.Compose([
|
| 1075 |
-
transforms.ToTensor(),
|
| 1076 |
-
transforms.Normalize(
|
| 1077 |
-
mean=[0.485, 0.456, 0.406],
|
| 1078 |
-
std=[0.229, 0.224, 0.225],
|
| 1079 |
-
),
|
| 1080 |
-
])
|
| 1081 |
-
|
| 1082 |
-
def _load_model(self):
|
| 1083 |
-
model_path = "models/bisenet_face_parsing.pt"
|
| 1084 |
-
if not os.path.exists(model_path):
|
| 1085 |
-
os.makedirs("models", exist_ok=True)
|
| 1086 |
-
# BiSeNet model from face-parsing.PyTorch
|
| 1087 |
-
# Download from: https://drive.google.com/file/d/154JgKpzCPW82qINcVieuPH3fZ2e0P812
|
| 1088 |
-
raise FileNotFoundError(
|
| 1089 |
-
"Please download BiSeNet face parsing weights from "
|
| 1090 |
-
"https://github.com/zllrunning/face-parsing.PyTorch and place at "
|
| 1091 |
-
"models/bisenet_face_parsing.pt"
|
| 1092 |
-
)
|
| 1093 |
-
|
| 1094 |
-
from models.bisenet_model import BiSeNet # You'll need to include this
|
| 1095 |
-
model = BiSeNet(n_classes=19)
|
| 1096 |
-
model.load_state_dict(
|
| 1097 |
-
torch.load(model_path, map_location=self.device)
|
| 1098 |
-
)
|
| 1099 |
-
model.to(self.device)
|
| 1100 |
-
model.eval()
|
| 1101 |
-
return model
|
| 1102 |
-
|
| 1103 |
-
def analyze(self, img_bgr: np.ndarray) -> dict[str, Any]:
|
| 1104 |
-
h, w = img_bgr.shape[:2]
|
| 1105 |
-
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
|
| 1106 |
-
img_resized = cv2.resize(img_rgb, (512, 512))
|
| 1107 |
-
|
| 1108 |
-
input_tensor = self.transform(img_resized).unsqueeze(0).to(self.device)
|
| 1109 |
-
|
| 1110 |
-
with torch.no_grad():
|
| 1111 |
-
output = self.model(input_tensor)[0] # BiSeNet returns tuple
|
| 1112 |
-
|
| 1113 |
-
parsing = output.squeeze(0).argmax(0).cpu().numpy()
|
| 1114 |
-
parsing = cv2.resize(
|
| 1115 |
-
parsing.astype(np.uint8), (w, h), interpolation=cv2.INTER_NEAREST
|
| 1116 |
-
)
|
| 1117 |
-
|
| 1118 |
-
# Generate masks
|
| 1119 |
-
skin_mask = (parsing == 1).astype(np.uint8)
|
| 1120 |
-
hair_mask = (parsing == 17).astype(np.uint8)
|
| 1121 |
-
glasses_mask = (parsing == 6).astype(np.uint8)
|
| 1122 |
-
hat_mask = (parsing == 18).astype(np.uint8)
|
| 1123 |
-
|
| 1124 |
-
# Lower-face skin statistics (intended for facial hair detection; not used in the result)
|
| 1125 |
-
lower_face = parsing[int(h * 0.55):int(h * 0.85), int(w * 0.25):int(w * 0.75)]
|
| 1126 |
-
lower_skin = (lower_face == 1).sum()
|
| 1127 |
-
total_lower = lower_face.size or 1
|
| 1128 |
-
|
| 1129 |
-
# Region stats
|
| 1130 |
-
hair_area = hair_mask.sum() / (h * w)
|
| 1131 |
-
skin_area = skin_mask.sum() / (h * w)
|
| 1132 |
-
|
| 1133 |
-
result: dict[str, Any] = {
|
| 1134 |
-
"_skin_mask": skin_mask,
|
| 1135 |
-
"_hair_mask": hair_mask,
|
| 1136 |
-
"has_glasses_parsing": int(glasses_mask.sum()) > 100,
|
| 1137 |
-
"wearing_hat_parsing": int(hat_mask.sum()) > 500,
|
| 1138 |
-
"hair_coverage": round(float(hair_area), 3),
|
| 1139 |
-
"skin_coverage": round(float(skin_area), 3),
|
| 1140 |
-
}
|
| 1141 |
-
|
| 1142 |
-
# Hair length estimation from mask
|
| 1143 |
-
if hair_area < 0.01:
|
| 1144 |
-
result["hair_length_estimate"] = "bald"
|
| 1145 |
-
elif hair_area < 0.08:
|
| 1146 |
-
result["hair_length_estimate"] = "short"
|
| 1147 |
-
elif hair_area < 0.18:
|
| 1148 |
-
result["hair_length_estimate"] = "medium"
|
| 1149 |
-
else:
|
| 1150 |
-
result["hair_length_estimate"] = "long"
|
| 1151 |
-
|
| 1152 |
-
# Wrinkle analysis on forehead skin
|
| 1153 |
-
forehead_region = img_bgr[int(h * 0.05):int(h * 0.25), int(w * 0.3):int(w * 0.7)]
|
| 1154 |
-
forehead_skin = skin_mask[int(h * 0.05):int(h * 0.25), int(w * 0.3):int(w * 0.7)]
|
| 1155 |
-
if forehead_skin.sum() > 100:
|
| 1156 |
-
gray_forehead = cv2.cvtColor(forehead_region, cv2.COLOR_BGR2GRAY)
|
| 1157 |
-
# Apply mask
|
| 1158 |
-
gray_forehead = cv2.bitwise_and(gray_forehead, gray_forehead, mask=forehead_skin)
|
| 1159 |
-
edges = cv2.Canny(gray_forehead, 30, 80)
|
| 1160 |
-
edge_density = edges.sum() / (forehead_skin.sum() * 255 + 1)
|
| 1161 |
-
result["forehead_wrinkle_score"] = round(float(edge_density), 3)
|
| 1162 |
-
result["forehead_wrinkles"] = (
|
| 1163 |
-
"heavy" if edge_density > 0.15
|
| 1164 |
-
else "moderate" if edge_density > 0.08
|
| 1165 |
-
else "mild" if edge_density > 0.04
|
| 1166 |
-
else "none"
|
| 1167 |
-
)
|
| 1168 |
-
|
| 1169 |
-
# Freckles/moles detection on skin
|
| 1170 |
-
skin_region = cv2.bitwise_and(img_bgr, img_bgr, mask=skin_mask)
|
| 1171 |
-
gray_skin = cv2.cvtColor(skin_region, cv2.COLOR_BGR2GRAY)
|
| 1172 |
-
# Detect dark spots
|
| 1173 |
-
_, dark_spots = cv2.threshold(gray_skin, 80, 255, cv2.THRESH_BINARY_INV)
|
| 1174 |
-
dark_spots = cv2.bitwise_and(dark_spots, dark_spots, mask=skin_mask)
|
| 1175 |
-
# Find contours of dark spots
|
| 1176 |
-
contours, _ = cv2.findContours(dark_spots, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
|
| 1177 |
-
small_spots = [c for c in contours if 5 < cv2.contourArea(c) < 200]
|
| 1178 |
-
result["possible_freckles_moles"] = len(small_spots) > 10
|
| 1179 |
-
result["dark_spot_count"] = len(small_spots)
|
| 1180 |
-
|
| 1181 |
-
return result
|
| 1182 |
-
```
|
| 1183 |
-
|
| 1184 |
-
#### face-service/analyzers/emotion_analyzer.py
|
| 1185 |
-
|
| 1186 |
-
```python
|
| 1187 |
-
"""
|
| 1188 |
-
HSEmotion — State-of-the-art facial emotion recognition.
|
| 1189 |
-
Supports 8 emotions on AffectNet.
|
| 1190 |
-
"""
|
| 1191 |
-
|
| 1192 |
-
import os
|
| 1193 |
-
from typing import Any
|
| 1194 |
-
|
| 1195 |
-
import cv2
|
| 1196 |
-
import numpy as np
|
| 1197 |
-
import torch
|
| 1198 |
-
import torchvision.transforms as transforms
|
| 1199 |
-
from PIL import Image
|
| 1200 |
-
|
| 1201 |
-
|
| 1202 |
-
class EmotionAnalyzer:
|
| 1203 |
-
"""HSEmotion-based facial expression classifier."""
|
| 1204 |
-
|
| 1205 |
-
EMOTION_LABELS = [
|
| 1206 |
-
"angry", "contempt", "disgust", "fear",
|
| 1207 |
-
"happy", "neutral", "sad", "surprise",
|
| 1208 |
-
]
|
| 1209 |
-
|
| 1210 |
-
def __init__(self):
|
| 1211 |
-
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 1212 |
-
self.model = self._load_model()
|
| 1213 |
-
self.transform = transforms.Compose([
|
| 1214 |
-
transforms.Resize((260, 260)),
|
| 1215 |
-
transforms.CenterCrop(224),
|
| 1216 |
-
transforms.ToTensor(),
|
| 1217 |
-
transforms.Normalize(
|
| 1218 |
-
mean=[0.485, 0.456, 0.406],
|
| 1219 |
-
std=[0.229, 0.224, 0.225],
|
| 1220 |
-
),
|
| 1221 |
-
])
|
| 1222 |
-
|
| 1223 |
-
def _load_model(self):
|
| 1224 |
-
"""Load HSEmotion EfficientNet model."""
|
| 1225 |
-
model_path = "models/hsemotion_enet_b0_8.pt"
|
| 1226 |
-
|
| 1227 |
-
if not os.path.exists(model_path):
|
| 1228 |
-
os.makedirs("models", exist_ok=True)
|
| 1229 |
-
try:
|
| 1230 |
-
from huggingface_hub import hf_hub_download
|
| 1231 |
-
# HSEmotion models available at:
|
| 1232 |
-
# https://github.com/HSE-asavchenko/face-emotion-recognition
|
| 1233 |
-
hf_hub_download(
|
| 1234 |
-
repo_id="HSE-asavchenko/hsemotion",
|
| 1235 |
-
filename="enet_b0_8_best_afew.pt",
|
| 1236 |
-
local_dir="models",
|
| 1237 |
-
local_dir_use_symlinks=False,
|
| 1238 |
-
)
|
| 1239 |
-
os.rename("models/enet_b0_8_best_afew.pt", model_path)
|
| 1240 |
-
except Exception:
|
| 1241 |
-
raise FileNotFoundError(
|
| 1242 |
-
"Please download HSEmotion weights from "
|
| 1243 |
-
"https://github.com/HSE-asavchenko/face-emotion-recognition"
|
| 1244 |
-
)
|
| 1245 |
-
|
| 1246 |
-
import timm
|
| 1247 |
-
model = timm.create_model("efficientnet_b0", pretrained=False, num_classes=8)
|
| 1248 |
-
model.load_state_dict(torch.load(model_path, map_location=self.device))
|
| 1249 |
-
model.to(self.device)
|
| 1250 |
-
model.eval()
|
| 1251 |
-
return model
|
| 1252 |
-
|
| 1253 |
-
def analyze(self, img_rgb: np.ndarray) -> dict[str, Any]:
|
| 1254 |
-
pil_image = Image.fromarray(img_rgb)
|
| 1255 |
-
input_tensor = self.transform(pil_image).unsqueeze(0).to(self.device)
|
| 1256 |
-
|
| 1257 |
-
with torch.no_grad():
|
| 1258 |
-
logits = self.model(input_tensor)
|
| 1259 |
-
|
| 1260 |
-
probs = torch.softmax(logits, dim=1).cpu().numpy()[0]
|
| 1261 |
-
top_idx = int(np.argmax(probs))
|
| 1262 |
-
|
| 1263 |
-
return {
|
| 1264 |
-
"emotion": self.EMOTION_LABELS[top_idx],
|
| 1265 |
-
"emotion_confidence": round(float(probs[top_idx]), 3),
|
| 1266 |
-
"emotion_probabilities": {
|
| 1267 |
-
label: round(float(prob), 3)
|
| 1268 |
-
for label, prob in zip(self.EMOTION_LABELS, probs)
|
| 1269 |
-
},
|
| 1270 |
-
}
|
| 1271 |
-
```
|
| 1272 |
-
|
| 1273 |
-
#### face-service/analyzers/color_analyzer.py
|
| 1274 |
-
|
| 1275 |
-
```python
|
| 1276 |
-
"""
|
| 1277 |
-
Pixel-level color analysis using segmentation masks from BiSeNet
|
| 1278 |
-
and landmark positions from MediaPipe.
|
| 1279 |
-
"""
|
| 1280 |
-
|
| 1281 |
-
from typing import Any, Optional
|
| 1282 |
-
|
| 1283 |
-
import cv2
|
| 1284 |
-
import numpy as np
|
| 1285 |
-
from sklearn.cluster import KMeans
|
| 1286 |
-
|
| 1287 |
-
|
| 1288 |
-
class ColorAnalyzer:
|
| 1289 |
-
"""Analyzes skin tone, eye color, and hair color from pixel data."""
|
| 1290 |
-
|
| 1291 |
-
def analyze(
|
| 1292 |
-
self,
|
| 1293 |
-
img_rgb: np.ndarray,
|
| 1294 |
-
skin_mask: Optional[np.ndarray] = None,
|
| 1295 |
-
hair_mask: Optional[np.ndarray] = None,
|
| 1296 |
-
landmark_data: Optional[list[dict]] = None,
|
| 1297 |
-
) -> dict[str, Any]:
|
| 1298 |
-
h, w = img_rgb.shape[:2]
|
| 1299 |
-
results: dict[str, Any] = {}
|
| 1300 |
-
|
| 1301 |
-
# === Skin Tone ===
|
| 1302 |
-
if skin_mask is not None and skin_mask.sum() > 100:
|
| 1303 |
-
skin_pixels = img_rgb[skin_mask > 0]
|
| 1304 |
-
# Convert to LAB for perceptually uniform brightness
|
| 1305 |
-
skin_lab = cv2.cvtColor(
|
| 1306 |
-
skin_pixels.reshape(-1, 1, 3), cv2.COLOR_RGB2LAB
|
| 1307 |
-
).reshape(-1, 3)
|
| 1308 |
-
avg_l = float(skin_lab[:, 0].mean()) # L channel (brightness)
|
| 1309 |
-
|
| 1310 |
-
if avg_l > 180:
|
| 1311 |
-
results["skin_tone"] = "very_light"
|
| 1312 |
-
elif avg_l > 155:
|
| 1313 |
-
results["skin_tone"] = "light"
|
| 1314 |
-
elif avg_l > 130:
|
| 1315 |
-
results["skin_tone"] = "medium_light"
|
| 1316 |
-
elif avg_l > 105:
|
| 1317 |
-
results["skin_tone"] = "medium"
|
| 1318 |
-
elif avg_l > 80:
|
| 1319 |
-
results["skin_tone"] = "medium_dark"
|
| 1320 |
-
else:
|
| 1321 |
-
results["skin_tone"] = "dark"
|
| 1322 |
-
|
| 1323 |
-
results["skin_tone_score"] = round(avg_l / 255, 3)
|
| 1324 |
-
|
| 1325 |
-
# Fitzpatrick scale approximation
|
| 1326 |
-
if avg_l > 170:
|
| 1327 |
-
results["fitzpatrick_type"] = "I"
|
| 1328 |
-
elif avg_l > 145:
|
| 1329 |
-
results["fitzpatrick_type"] = "II"
|
| 1330 |
-
elif avg_l > 120:
|
| 1331 |
-
results["fitzpatrick_type"] = "III"
|
| 1332 |
-
elif avg_l > 95:
|
| 1333 |
-
results["fitzpatrick_type"] = "IV"
|
| 1334 |
-
elif avg_l > 70:
|
| 1335 |
-
results["fitzpatrick_type"] = "V"
|
| 1336 |
-
else:
|
| 1337 |
-
results["fitzpatrick_type"] = "VI"
|
| 1338 |
-
|
| 1339 |
-
# === Hair Color ===
|
| 1340 |
-
if hair_mask is not None and hair_mask.sum() > 500:
|
| 1341 |
-
hair_pixels = img_rgb[hair_mask > 0]
|
| 1342 |
-
|
| 1343 |
-
# K-means to find dominant hair color
|
| 1344 |
-
if len(hair_pixels) > 100:
|
| 1345 |
-
sample_size = min(5000, len(hair_pixels))
|
| 1346 |
-
indices = np.random.choice(len(hair_pixels), sample_size, replace=False)
|
| 1347 |
-
sampled = hair_pixels[indices].astype(np.float32)
|
| 1348 |
-
|
| 1349 |
-
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
|
| 1350 |
-
kmeans.fit(sampled)
|
| 1351 |
-
|
| 1352 |
-
# Pick the cluster with most members
|
| 1353 |
-
labels, counts = np.unique(kmeans.labels_, return_counts=True)
|
| 1354 |
-
dominant_idx = labels[np.argmax(counts)]
|
| 1355 |
-
dominant_color = kmeans.cluster_centers_[dominant_idx].astype(int)
|
| 1356 |
-
|
| 1357 |
-
r, g, b = dominant_color
|
| 1358 |
-
brightness = (int(r) + int(g) + int(b)) / 3
|
| 1359 |
-
|
| 1360 |
-
# Classify hair color
|
| 1361 |
-
hsv_color = cv2.cvtColor(
|
| 1362 |
-
np.array([[dominant_color]], dtype=np.uint8), cv2.COLOR_RGB2HSV
|
| 1363 |
-
)[0][0]
|
| 1364 |
-
hue, sat, val = int(hsv_color[0]), int(hsv_color[1]), int(hsv_color[2])
|
| 1365 |
-
|
| 1366 |
-
if brightness < 40:
|
| 1367 |
-
results["hair_color_detected"] = "black"
|
| 1368 |
-
elif brightness > 190:
|
| 1369 |
-
results["hair_color_detected"] = "platinum_blonde"
|
| 1370 |
-
elif brightness > 160 and sat < 50:
|
| 1371 |
-
results["hair_color_detected"] = "gray"
|
| 1372 |
-
elif brightness > 140 and (hue > 15 and hue < 35):
|
| 1373 |
-
results["hair_color_detected"] = "blonde"
|
| 1374 |
-
elif (hue < 15 or hue > 160) and sat > 80:
|
| 1375 |
-
results["hair_color_detected"] = "red"
|
| 1376 |
-
elif brightness > 60:
|
| 1377 |
-
results["hair_color_detected"] = "brown"
|
| 1378 |
-
else:
|
| 1379 |
-
results["hair_color_detected"] = "dark_brown"
|
| 1380 |
-
|
| 1381 |
-
results["hair_dominant_rgb"] = [int(r), int(g), int(b)]
|
| 1382 |
-
|
| 1383 |
-
# Hair texture analysis (FFT-based)
|
| 1384 |
-
hair_region = cv2.bitwise_and(
|
| 1385 |
-
img_rgb,
|
| 1386 |
-
img_rgb,
|
| 1387 |
-
mask=hair_mask,
|
| 1388 |
-
)
|
| 1389 |
-
gray_hair = cv2.cvtColor(hair_region, cv2.COLOR_RGB2GRAY)
|
| 1390 |
-
# Mask out non-hair regions
|
| 1391 |
-
gray_hair_masked = gray_hair[hair_mask > 0]
|
| 1392 |
-
|
| 1393 |
-
if len(gray_hair_masked) > 1000:
|
| 1394 |
-
# Use the FFT magnitude spectrum as a texture indicator
|
| 1395 |
-
# High frequency = curly, low frequency = straight
|
| 1396 |
-
hair_patch = gray_hair_masked[:1024].astype(np.float32)
|
| 1397 |
-
fft = np.fft.fft(hair_patch)
|
| 1398 |
-
magnitude = np.abs(fft)
|
| 1399 |
-
# Ratio of high freq to low freq energy
|
| 1400 |
-
low_freq = magnitude[:len(magnitude) // 4].sum()
|
| 1401 |
-
high_freq = magnitude[len(magnitude) // 4:].sum()
|
| 1402 |
-
freq_ratio = high_freq / (low_freq + 1e-6)
|
| 1403 |
-
|
| 1404 |
-
if freq_ratio > 0.8:
|
| 1405 |
-
results["hair_texture_detected"] = "curly"
|
| 1406 |
-
elif freq_ratio > 0.5:
|
| 1407 |
-
results["hair_texture_detected"] = "wavy"
|
| 1408 |
-
else:
|
| 1409 |
-
results["hair_texture_detected"] = "straight"
|
| 1410 |
-
|
| 1411 |
-
# === Eye Color ===
|
| 1412 |
-
if landmark_data is not None and len(landmark_data) > 473:
|
| 1413 |
-
for eye_name, iris_idx in [("left", 468), ("right", 473)]:
|
| 1414 |
-
ix = int(landmark_data[iris_idx]["x"] * w)
|
| 1415 |
-
iy = int(landmark_data[iris_idx]["y"] * h)
|
| 1416 |
-
|
| 1417 |
-
# Sample a small patch around iris
|
| 1418 |
-
pad = 3
|
| 1419 |
-
y1 = max(0, iy - pad)
|
| 1420 |
-
y2 = min(h, iy + pad)
|
| 1421 |
-
x1 = max(0, ix - pad)
|
| 1422 |
-
x2 = min(w, ix + pad)
|
| 1423 |
-
|
| 1424 |
-
iris_patch = img_rgb[y1:y2, x1:x2]
|
| 1425 |
-
if iris_patch.size == 0:
|
| 1426 |
-
continue
|
| 1427 |
-
|
| 1428 |
-
avg_color = iris_patch.mean(axis=(0, 1))
|
| 1429 |
-
r, g, b = avg_color
|
| 1430 |
-
|
| 1431 |
-
# Convert to HSV for better classification
|
| 1432 |
-
hsv = cv2.cvtColor(
|
| 1433 |
-
np.array([[avg_color]], dtype=np.uint8), cv2.COLOR_RGB2HSV
|
| 1434 |
-
)[0][0]
|
| 1435 |
-
hue_val, sat_val, val_val = int(hsv[0]), int(hsv[1]), int(hsv[2])
|
| 1436 |
-
|
| 1437 |
-
if val_val < 60:
|
| 1438 |
-
color = "dark_brown"
|
| 1439 |
-
elif sat_val < 30:
|
| 1440 |
-
color = "gray"
|
| 1441 |
-
elif hue_val > 100 and hue_val < 130 and sat_val > 50:
|
| 1442 |
-
color = "blue"
|
| 1443 |
-
elif hue_val > 35 and hue_val < 85 and sat_val > 40:
|
| 1444 |
-
color = "green"
|
| 1445 |
-
elif (hue_val > 15 and hue_val < 35) and sat_val > 40:
|
| 1446 |
-
color = "hazel"
|
| 1447 |
-
elif val_val > 120 and sat_val > 60:
|
| 1448 |
-
color = "amber"
|
| 1449 |
-
else:
|
| 1450 |
-
color = "brown"
|
| 1451 |
-
|
| 1452 |
-
results[f"{eye_name}_eye_color"] = color
|
| 1453 |
-
|
| 1454 |
-
# Consensus
|
| 1455 |
-
if "left_eye_color" in results and "right_eye_color" in results:
|
| 1456 |
-
if results["left_eye_color"] == results["right_eye_color"]:
|
| 1457 |
-
results["eye_color"] = results["left_eye_color"]
|
| 1458 |
-
else:
|
| 1459 |
-
results["eye_color"] = results["left_eye_color"] # Use left as primary
|
| 1460 |
-
results["heterochromia"] = True
|
| 1461 |
-
|
| 1462 |
-
return results
|
| 1463 |
-
```
|
| 1464 |
-
|
| 1465 |
-
#### face-service/Dockerfile
|
| 1466 |
-
|
| 1467 |
-
```dockerfile
|
| 1468 |
-
FROM python:3.11-slim
|
| 1469 |
-
|
| 1470 |
-
WORKDIR /app
|
| 1471 |
-
|
| 1472 |
-
# Install system dependencies for OpenCV
|
| 1473 |
-
RUN apt-get update && apt-get install -y \
|
| 1474 |
-
libgl1 \
|
| 1475 |
-
libglib2.0-0 \
|
| 1476 |
-
curl \
|
| 1477 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 1478 |
-
|
| 1479 |
-
COPY requirements.txt .
|
| 1480 |
-
RUN pip install --no-cache-dir -r requirements.txt
|
| 1481 |
-
|
| 1482 |
-
COPY . .
|
| 1483 |
-
|
| 1484 |
-
# Download MediaPipe model at build time
|
| 1485 |
-
RUN python -c "from analyzers.landmark_analyzer import LandmarkAnalyzer; LandmarkAnalyzer()"
|
| 1486 |
-
|
| 1487 |
-
EXPOSE 8000
|
| 1488 |
-
|
| 1489 |
-
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
|
| 1490 |
-
```
|
| 1491 |
-
|
| 1492 |
-
### Connect Your Next.js App to the Microservice
|
| 1493 |
-
|
| 1494 |
-
#### lib/face-analysis/api-client.ts
|
| 1495 |
-
|
| 1496 |
-
```typescript
|
| 1497 |
-
/**
|
| 1498 |
-
* Client for the Python face analysis microservice.
|
| 1499 |
-
* Replaces the Supabase Edge Function that called AWS Rekognition.
|
| 1500 |
-
*/
|
| 1501 |
-
|
| 1502 |
-
const FACE_SERVICE_URL = process.env.NEXT_PUBLIC_FACE_SERVICE_URL || "http://localhost:8000";
|
| 1503 |
-
|
| 1504 |
-
export interface FaceAnalysisResult {
|
| 1505 |
-
// Geometric (MediaPipe)
|
| 1506 |
-
face_shape: string;
|
| 1507 |
-
face_shape_metrics: Record<string, number>;
|
| 1508 |
-
forehead_width: string;
|
| 1509 |
-
jawline_type: string;
|
| 1510 |
-
jawline_angle: number;
|
| 1511 |
-
chin_type: string;
|
| 1512 |
-
cheekbone_prominence: string;
|
| 1513 |
-
cheek_fullness: string;
|
| 1514 |
-
eye_shape: string;
|
| 1515 |
-
eye_depth: string;
|
| 1516 |
-
eye_spacing: string;
|
| 1517 |
-
eye_size: string;
|
| 1518 |
-
eyebrow_shape: string;
|
| 1519 |
-
eyebrow_arch_height: string;
|
| 1520 |
-
eyebrow_thickness: string;
|
| 1521 |
-
possible_unibrow: boolean;
|
| 1522 |
-
nose_shape: string;
|
| 1523 |
-
nose_bridge: string;
|
| 1524 |
-
nose_tip_shape: string;
|
| 1525 |
-
nostril_width: string;
|
| 1526 |
-
lip_fullness: string;
|
| 1527 |
-
lip_balance: string;
|
| 1528 |
-
mouth_width: string;
|
| 1529 |
-
cupids_bow: string;
|
| 1530 |
-
smiling: boolean;
|
| 1531 |
-
smile_asymmetry: number;
|
| 1532 |
-
possible_dimples: boolean;
|
| 1533 |
-
facial_asymmetry_score: number;
|
| 1534 |
-
|
| 1535 |
-
// Demographics (FairFace)
|
| 1536 |
-
age_estimate: number;
|
| 1537 |
-
age_range: string;
|
| 1538 |
-
age_confidence: number;
|
| 1539 |
-
gender: string;
|
| 1540 |
-
gender_confidence: number;
|
| 1541 |
-
race: string;
|
| 1542 |
-
race_confidence: number;
|
| 1543 |
-
race_probabilities: Record<string, number>;
|
| 1544 |
-
|
| 1545 |
-
// CelebA Attributes
|
| 1546 |
-
facial_hair: string;
|
| 1547 |
-
wearing_glasses: boolean;
|
| 1548 |
-
bald: boolean;
|
| 1549 |
-
receding_hairline: boolean;
|
| 1550 |
-
hair_color_celeba: string;
|
| 1551 |
-
hair_type_celeba: string;
|
| 1552 |
-
bags_under_eyes: boolean;
|
| 1553 |
-
double_chin: boolean;
|
| 1554 |
-
bushy_eyebrows: boolean;
|
| 1555 |
-
high_cheekbones_celeba: boolean;
|
| 1556 |
-
|
| 1557 |
-
// Emotion (HSEmotion)
|
| 1558 |
-
emotion: string;
|
| 1559 |
-
emotion_confidence: number;
|
| 1560 |
-
emotion_probabilities: Record<string, number>;
|
| 1561 |
-
|
| 1562 |
-
// Color Analysis
|
| 1563 |
-
skin_tone: string;
|
| 1564 |
-
skin_tone_score: number;
|
| 1565 |
-
fitzpatrick_type: string;
|
| 1566 |
-
eye_color: string;
|
| 1567 |
-
hair_color_detected: string;
|
| 1568 |
-
hair_dominant_rgb: number[];
|
| 1569 |
-
hair_texture_detected: string;
|
| 1570 |
-
|
| 1571 |
-
// Parsing
|
| 1572 |
-
hair_length_estimate: string;
|
| 1573 |
-
forehead_wrinkles: string;
|
| 1574 |
-
possible_freckles_moles: boolean;
|
| 1575 |
-
dark_spot_count: number;
|
| 1576 |
-
|
| 1577 |
-
// Blendshapes
|
| 1578 |
-
blendshapes: Record<string, number>;
|
| 1579 |
-
}
|
| 1580 |
-
|
| 1581 |
-
export async function analyzeFace(imageFile: File): Promise<FaceAnalysisResult> {
|
| 1582 |
-
const formData = new FormData();
|
| 1583 |
-
formData.append("file", imageFile);
|
| 1584 |
-
|
| 1585 |
-
const response = await fetch(`${FACE_SERVICE_URL}/analyze`, {
|
| 1586 |
-
method: "POST",
|
| 1587 |
-
body: formData,
|
| 1588 |
-
});
|
| 1589 |
-
|
| 1590 |
-
if (!response.ok) {
|
| 1591 |
-
const error = await response.json().catch(() => ({ detail: "Unknown error" }));
|
| 1592 |
-
throw new Error(`Face analysis failed: ${error.detail}`);
|
| 1593 |
-
}
|
| 1594 |
-
|
| 1595 |
-
const result = await response.json();
|
| 1596 |
-
|
| 1597 |
-
if (!result.success) {
|
| 1598 |
-
throw new Error("Face analysis returned unsuccessful result");
|
| 1599 |
-
}
|
| 1600 |
-
|
| 1601 |
-
return result.data;
|
| 1602 |
-
}
|
| 1603 |
-
|
| 1604 |
-
export async function checkServiceHealth(): Promise<boolean> {
|
| 1605 |
-
try {
|
| 1606 |
-
const response = await fetch(`${FACE_SERVICE_URL}/health`);
|
| 1607 |
-
return response.ok;
|
| 1608 |
-
} catch {
|
| 1609 |
-
return false;
|
| 1610 |
-
}
|
| 1611 |
-
}
|
| 1612 |
-
```
|
| 1613 |
-
|
| 1614 |
-
### Deploy to Hugging Face Spaces (Free)
|
| 1615 |
-
|
| 1616 |
-
Create a `README.md` in the `face-service/` directory with the following frontmatter:
|
| 1617 |
-
|
| 1618 |
-
```yaml
|
| 1619 |
-
---
|
| 1620 |
-
title: HCP Face Analysis
|
| 1621 |
-
emoji: 🔍
|
| 1622 |
-
colorFrom: blue
|
| 1623 |
-
colorTo: purple
|
| 1624 |
-
sdk: docker
|
| 1625 |
-
app_port: 8000
|
| 1626 |
-
---
|
| 1627 |
-
```
|
| 1628 |
-
|
| 1629 |
-
---
|
| 1630 |
-
|
| 1631 |
-
## Final Architecture Summary
|
| 1632 |
-
|
| 1633 |
-
```
|
| 1634 |
-
Browser (Next.js)
|
| 1635 |
-
│
|
| 1636 |
-
│ POST /analyze (image file)
|
| 1637 |
-
▼
|
| 1638 |
-
Hugging Face Spaces (FREE, 2GB RAM)
|
| 1639 |
-
├── FastAPI Server
|
| 1640 |
-
├── MediaPipe (4MB) ──────► 478 landmarks → ~40 geometric features
|
| 1641 |
-
├── FairFace (90MB) ──────► age, gender, race
|
| 1642 |
-
├── CelebA ResNet (44MB) ─► 40 binary attributes (hair, beard, glasses...)
|
| 1643 |
-
├── BiSeNet (50MB) ───────► face parsing → hair/skin segmentation
|
| 1644 |
-
├── HSEmotion (20MB) ─────► 8 emotions
|
| 1645 |
-
└── Color Analysis ───────► skin tone, eye color, hair color
|
| 1646 |
-
│
|
| 1647 |
-
│ JSON response (~150 attributes)
|
| 1648 |
-
▼
|
| 1649 |
-
Supabase (existing)
|
| 1650 |
-
├── Store results in PostgreSQL
|
| 1651 |
-
└── Auth / Storage unchanged
|
| 1652 |
-
```
|
| 1653 |
-
|
| 1654 |
-
| Metric | Value |
|
| 1655 |
-
|--------|-------|
|
| 1656 |
-
| **Total models** | ~210MB |
|
| 1657 |
-
| **Features detected** | **~95% of the full feature list** |
|
| 1658 |
-
| **Hosting cost** | **$0** (HF Spaces free tier) |
|
| 1659 |
-
| **Latency** | ~2-4s per image (CPU) |
|
| 1660 |
-
| **Languages** | Python (microservice) + TypeScript (existing Next.js) |
|
| 1661 |
-
| **Only missing** | Teeth analysis, scar detection, Adam's apple (require specialized fine-tuned models) |
|
| 1662 |
-
|
| 1663 |
-
---
|
| 1664 |
-
|
| 1665 |
-
## Required Feature List
|
| 1666 |
-
|
| 1667 |
-
### Face shape
|
| 1668 |
-
- Oval face, Round face, Square face, Heart-shaped face, Diamond face, Long/oblong face, Triangle face
|
| 1669 |
-
- Jawline sharp, Jawline soft, Strong jaw, Receding chin, Pointed chin, Cleft chin, Wide chin
|
| 1670 |
-
- High cheekbones, Flat cheekbones, Full cheeks, Hollow cheeks
|
| 1671 |
-
- Broad forehead, Narrow forehead
|
| 1672 |
-
|
| 1673 |
-
### Eye shape
|
| 1674 |
-
- Almond, Round, Hooded, Monolid, Deep-set eyes, Protruding eyes
|
| 1675 |
-
- Upturned eyes, Downturned eyes, Wide-set eyes, Close-set eyes, Large eyes, Small eyes
|
| 1676 |
-
- Eye color: brown, blue, green, hazel
|
| 1677 |
-
- Dark under-eyes, Eye bags, Crow's feet
|
| 1678 |
-
|
| 1679 |
-
### Eyebrows
|
| 1680 |
-
- Thick, Thin, Arched, Straight, Bushy, Unibrow
|
| 1681 |
-
- High eyebrow arch, Low eyebrow arch
|
| 1682 |
-
|
| 1683 |
-
### Nose
|
| 1684 |
-
- Straight, Aquiline, Button, Upturned, Wide, Narrow
|
| 1685 |
-
- Flat bridge, High bridge, Wide nostrils, Narrow nostrils
|
| 1686 |
-
- Rounded tip, Pointed tip
|
| 1687 |
-
|
| 1688 |
-
### Lips & Mouth
|
| 1689 |
-
- Full, Thin, Wide mouth, Small mouth
|
| 1690 |
-
- Defined cupid's bow, Uneven lips
|
| 1691 |
-
- Gap teeth, Crooked teeth, Straight teeth, Overbite, Underbite
|
| 1692 |
-
- Dimples, Smile lines, Asymmetrical smile
|
| 1693 |
-
|
| 1694 |
-
### Hair
|
| 1695 |
-
- Straight, Wavy, Curly, Coily
|
| 1696 |
-
- Short, Long, Bald, Receding hairline, Widow's peak
|
| 1697 |
-
- Thick, Thin
|
| 1698 |
-
- Color: black, brown, blonde, red, gray, dyed
|
| 1699 |
-
|
| 1700 |
-
### Facial hair
|
| 1701 |
-
- Full beard, Stubble, Goatee, Mustache, Clean-shaven, Sideburns
|
| 1702 |
-
|
| 1703 |
-
### Skin & Other
|
| 1704 |
-
- Skin tone: light, medium, dark
|
| 1705 |
-
- Freckles, Moles, Birthmark, Scar, Acne
|
| 1706 |
-
- Wrinkles, Forehead lines, Smile lines
|
| 1707 |
-
- Facial asymmetry, Prominent Adam's apple
|
|
|
|
| 1 |
+
# HCP Face Analysis — Architecture
|
| 2 |
+
|
| 3 |
+
## Pipeline
|
| 4 |
+
|
| 5 |
+
A single photo is fed through seven analyzers. Their outputs are merged
|
| 6 |
+
into one dictionary; later analyzers overwrite any colliding keys from
|
| 7 |
+
earlier ones.
|
| 8 |
+
|
| 9 |
+
```
|
| 10 |
+
Photo (RGB ndarray)
|
| 11 |
+
│
|
| 12 |
+
├─► [1] MediaPipe Face Landmarker
|
| 13 |
+
│ 478 landmarks + 52 blendshapes
|
| 14 |
+
│ → all geometric features (face/eye/nose/eyebrow/lip/jaw shape),
|
| 15 |
+
│ smiling (mouthSmile blendshapes), eyes_open, possible_dimples,
|
| 16 |
+
│ possible_unibrow, facial_asymmetry_score, blendshapes dict
|
| 17 |
+
│
|
| 18 |
+
├─► [2] FairFace + Ethnicity ViT (DemographicAnalyzer)
|
| 19 |
+
│ → age_range, age_estimate (softmax-weighted continuous), age_confidence,
|
| 20 |
+
│ gender + confidence, ethnicity + confidence, full distributions
|
| 21 |
+
│
|
| 22 |
+
├─► [3] SegFormer-B5 human parsing (ParsingAnalyzer)
|
| 23 |
+
│ → per-class pixel masks (face, hair, hat, …)
|
| 24 |
+
│ → hair_length, hair_present, hat_detected,
|
| 25 |
+
│ wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles
|
| 26 |
+
│ (uses OpenCV stats over the SegFormer face mask for the skin rows)
|
| 27 |
+
│
|
| 28 |
+
├─► [4] HSEmotion EfficientNet-B0 (EmotionAnalyzer)
|
| 29 |
+
│ → primary/secondary emotion, emotion_scores (8 classes),
|
| 30 |
+
│ valence, arousal, mood
|
| 31 |
+
│
|
| 32 |
+
├─► [5] ColorAnalyzer (no ML — OpenCV LAB/HSV)
|
| 33 |
+
│ inputs: SegFormer skin/hair masks + MediaPipe landmarks
|
| 34 |
+
│ → skin_tone (Fitzpatrick + L*/a*/b* + hex), skin_undertone,
|
| 35 |
+
│ eye_color, hair_color (name + hex), hair_texture (pixel-Laplacian, coarse),
|
| 36 |
+
│ lip_color (shade + hex) ← lip mask built from MediaPipe outer-minus-inner lip
|
| 37 |
+
│
|
| 38 |
+
├─► [6] ObstructionViT — dima806/face_obstruction_image_detection
|
| 39 |
+
│ → wearing_glasses, wearing_sunglasses, wearing_mask,
|
| 40 |
+
│ obstruction_top, obstruction_scores
|
| 41 |
+
│
|
| 42 |
+
└─► [7] HairTypeViT — dima806/hair_type_image_detection
|
| 43 |
+
→ hair_type (curly/dreadlocks/kinky/straight/wavy),
|
| 44 |
+
hair_type_confidence, hair_type_scores
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
All masks and other internal fields use a leading underscore in the key
|
| 48 |
+
(e.g. `_skin_mask`). `app.py` strips those before returning JSON so the
|
| 49 |
+
client never sees them.
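
The underscore convention above can be sketched as a small filter. This is a minimal illustration, not the code in `app.py`: the helper name `strip_internal_fields` and the sample values are assumptions made for the example.

```python
from typing import Any, Dict


def strip_internal_fields(results: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `results` without keys that start with an underscore."""
    return {key: value for key, value in results.items() if not key.startswith("_")}


# Illustrative merged output: one internal mask plus two public fields.
merged = {"skin_tone": "type_iii", "_skin_mask": object(), "hair_length": "short"}
public = strip_internal_fields(merged)
print(sorted(public))  # ['hair_length', 'skin_tone']
```

Filtering once at the response boundary lets analyzers pass masks to each other through the shared dictionary without leaking non-serializable arrays into the JSON.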
|
| 50 |
+
|
| 51 |
+
## Attribute → source map
|
| 52 |
+
|
| 53 |
+
The EditProfileScreen renders only fields backed by one of these
|
| 54 |
+
analyzers. Anything previously fed by the FaRL zero-shot classifier
|
| 55 |
+
has been removed because its outputs were too noisy to trust.
|
| 56 |
+
|
| 57 |
+
| Section | Field(s) | Source |
|
| 58 |
+
|---|---|---|
|
| 59 |
+
| Demographics | gender, age (continuous), age_range, ethnicity, distributions | FairFace + Ethnicity ViT |
|
| 60 |
+
| Emotion | primary/secondary emotion, scores, valence, arousal, mood | HSEmotion |
|
| 61 |
+
| Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe |
|
| 62 |
+
| Hair | hair_length, hair_present | SegFormer |
|
| 63 |
+
| Hair | hair_type (+ confidence) | HairTypeViT |
|
| 64 |
+
| Hair | hair_color, hair hex | ColorAnalyzer |
|
| 65 |
+
| Eyes | eye_shape, eye_depth, eye_spacing, eye_size, eyes_open | MediaPipe |
|
| 66 |
+
| Eyes | eye_color | ColorAnalyzer |
|
| 67 |
+
| Eyebrows | eyebrow_shape, eyebrow_arch_height, eyebrow_thickness, possible_unibrow | MediaPipe |
|
| 68 |
+
| Nose | nose_shape, nose_bridge, nose_tip_shape, nostril_width | MediaPipe |
|
| 69 |
+
| Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
|
| 70 |
+
| Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
|
| 71 |
+
| Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
|
| 72 |
+
| Skin | wrinkle_level, skin_texture_score, skin_uniformity, freckles_or_moles | SegFormer mask + OpenCV stats |
|
| 73 |
+
| Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT |
|
| 74 |
+
| Accessories | wearing_hat | SegFormer (hat class coverage) |
|
| 75 |
+
|
| 76 |
+
## Deployment
|
| 77 |
+
|
| 78 |
+
The service is built as a Docker image targeting Hugging Face Spaces
|
| 79 |
+
free tier (2GB RAM, shared CPU). The MediaPipe `.task` is pulled at
|
| 80 |
+
build time; all Hugging Face models lazy-download on first inference
|
| 81 |
+
and cache under `/root/.cache/huggingface` inside the container.
|
| 82 |
+
|
| 83 |
+
The Node/Express server forwards `/analyze-face` requests to
|
| 84 |
+
`FACE_SERVICE_URL/analyze-base64`. The React Native client never talks
|
| 85 |
+
to this service directly.
|
| 86 |
+
|
| 87 |
+
## Adding a new analyzer
|
| 88 |
+
|
| 89 |
+
1. Drop a new module under `analyzers/` exposing a class with
|
| 90 |
+
`__init__()` and `analyze(img_rgb) -> dict`.
|
| 91 |
+
2. Import it in `app.py`, add a global slot and a lazy-load block in
|
| 92 |
+
`get_analyzers()`, and append a `results.update(...)` call to both
|
| 93 |
+
`/analyze` and `/analyze-base64`.
|
| 94 |
+
3. Surface the new keys in `client/src/screens/EditProfileScreen.js`
|
| 95 |
+
and add a legend row in the "Analysis Method Details" section.
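
Steps 1 and 2 might look roughly like the sketch below. Everything named here (`ImageStatsAnalyzer`, its output keys, `get_image_stats_analyzer`) is hypothetical; the real `get_analyzers()` in `app.py` is not shown in this document, so the lazy-load function is a simplified stand-in for it.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


class ImageStatsAnalyzer:
    """Hypothetical analyzer that reports basic image dimensions."""

    def __init__(self) -> None:
        # A real analyzer would load model weights here (the expensive step).
        self.min_side = 64

    def analyze(self, img_rgb: Any) -> Dict[str, Any]:
        # img_rgb is expected to be an H x W x 3 RGB ndarray; only .shape is used.
        height, width = img_rgb.shape[:2]
        return {
            "image_width": int(width),
            "image_height": int(height),
            "image_too_small": min(height, width) < self.min_side,
            "_image_shape": tuple(img_rgb.shape),  # internal: leading underscore
        }


# Global slot, filled on first use (assumed pattern, simplified).
_image_stats_analyzer: Optional[ImageStatsAnalyzer] = None


def get_image_stats_analyzer() -> ImageStatsAnalyzer:
    """Create the analyzer on first use and reuse it afterwards."""
    global _image_stats_analyzer
    if _image_stats_analyzer is None:
        _image_stats_analyzer = ImageStatsAnalyzer()
    return _image_stats_analyzer


# Stand-in for a 480x640 RGB ndarray so the sketch runs without NumPy.
fake_image = SimpleNamespace(shape=(480, 640, 3))
results: Dict[str, Any] = {}
results.update(get_image_stats_analyzer().analyze(fake_image))
```

Deferring construction to the first request keeps startup cheap, at the cost of a slower first call while weights download.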
|
| 96 |
+
|
| 97 |
+
Order matters: later analyzers overwrite earlier keys on collision.
|
| 98 |
+
The specialized ViT classifiers run last so they win over any coarser
|
| 99 |
+
signal.
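
The overwrite rule follows directly from merging with `dict.update` in pipeline order. The sketch below uses invented keys and values purely to show the collision behaviour; it does not claim that these particular keys collide in the real service.

```python
from typing import Any, Dict, List


def merge_results(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge analyzer outputs in order; later steps win on key collisions."""
    results: Dict[str, Any] = {}
    for partial in steps:
        results.update(partial)
    return results


# Hypothetical outputs: a coarse heuristic first, a specialized classifier last.
coarse = {"hair_texture": "wavy", "hair_color": "brown"}
specialized = {"hair_texture": "curly", "hair_type_confidence": 0.91}

merged = merge_results([coarse, specialized])
print(merged["hair_texture"])  # curly
```

Keys that appear in only one step (`hair_color`, `hair_type_confidence`) pass through unchanged, so reordering analyzers only matters for keys they share.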
|
requirements.txt
CHANGED
|
@@ -13,6 +13,3 @@ timm==1.0.3
|
|
| 13 |
safetensors>=0.6.0
|
| 14 |
transformers==4.45.2
|
| 15 |
hsemotion>=0.2.2
|
| 16 |
-
openai-clip==1.0.1
|
| 17 |
-
ftfy
|
| 18 |
-
regex
|
|
|
|
| 13 |
safetensors>=0.6.0
|
| 14 |
transformers==4.45.2
|
| 15 |
hsemotion>=0.2.2
|