Spaces: onitsche/recognizer
Oliver Nitsche and Claude Sonnet 4.6 committed 27 days ago

Commit 2fda523 (1 parent: 77c6ffa)

Switch to local ONNX face recognition (no compilation, no cloud)

Replaces AWS Rekognition with onnxruntime + OpenCV Haar cascade +
InsightFace MobileFaceNet (w600k_mbf.onnx, ~17 MB downloaded once on
first run). Both onnxruntime and opencv-python-headless ship pre-built
ARM64 wheels so pip install is fast with no compilation step. Face
embeddings are stored locally in face_db.json as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (5)

.gitignore +3 -1
CLAUDE.md +7 -14
pyproject.toml +2 -2
recognizer/face_db.py +122 -58
recognizer/main.py +5 -5

.gitignore CHANGED

@@ -1,3 +1,5 @@
 __pycache__/
 *.egg-info/
-build/

 __pycache__/
 *.egg-info/
+build/
+recognizer/models/
+recognizer/face_db.json

CLAUDE.md CHANGED

@@ -22,21 +22,14 @@ pip install -e .
 sudo apt-get install espeak-ng        # text-to-speech synthesis
 ```
-### AWS Rekognition credentials
-Face recognition is handled by AWS Rekognition (no local compilation required).
-Set credentials on the robot before running:
-```bash
-export AWS_ACCESS_KEY_ID=...
-export AWS_SECRET_ACCESS_KEY=...
-export AWS_DEFAULT_REGION=us-east-1   # or your preferred region
-```
-Or use `aws configure` if the AWS CLI is installed. The app auto-creates a
-Rekognition collection named `reachy-mini-recognizer` on first run.
-The IAM user/role needs: `rekognition:CreateCollection`,
-`rekognition:IndexFaces`, `rekognition:SearchFacesByImage`.
 ## Running the App
@@ -87,7 +80,7 @@ SLEEPING  →(speech detected × 3)→  WAKING  →  ACTIVE  →  SLEEPING
 - **ENROLLING**: robot has detected an unrecognised face; waits for name to be submitted via the web UI (`POST /set_name`). Stores encoding in `face_db.json`, says "Nice to meet you, <name>!", then sleeps.
 **Helper modules**:
-- `recognizer/face_db.py` — AWS Rekognition wrapper. `load()` creates/opens the collection and returns its ID. `find_match(frame_bgr, collection_id)` returns the name or None (raises `NoFaceDetected` if no face present). `add_face(name, frame_bgr, collection_id)` enrolls a face. Similarity threshold = 85 (0–100 scale).
 - `recognizer/tts.py` — synthesises text via `espeak-ng -s 140 -w <tmp.wav>`, plays via `media.play_sound()`, then sleeps to let playback finish.
 **Settings UI** (`recognizer/static/`):

 sudo apt-get install espeak-ng        # text-to-speech synthesis
 ```
+### Face recognition model (one-time download)
+Face recognition runs **locally** using ONNX Runtime (no cloud account needed).
+On first run the app downloads the InsightFace MobileFaceNet model (~17 MB)
+from GitHub and caches it at `recognizer/models/w600k_mbf.onnx`.
+Requires internet access the first time only; fully offline thereafter.
+Requires **64-bit Raspberry Pi OS** (onnxruntime ships pre-built aarch64 wheels).
 ## Running the App
 - **ENROLLING**: robot has detected an unrecognised face; waits for name to be submitted via the web UI (`POST /set_name`). Stores encoding in `face_db.json`, says "Nice to meet you, <name>!", then sleeps.
 **Helper modules**:
+- `recognizer/face_db.py` — local face recognition via ONNX Runtime. `load()` warms up the ONNX session (downloads model on first run) and returns the embedding DB dict. `find_match(frame_bgr, db)` detects with OpenCV Haar cascade, embeds with MobileFaceNet, matches by cosine similarity (threshold 0.35); raises `NoFaceDetected` if no face. `add_face(name, frame_bgr, db)` enrolls a face. DB stored in `recognizer/face_db.json`.
 - `recognizer/tts.py` — synthesises text via `espeak-ng -s 140 -w <tmp.wav>`, plays via `media.play_sound()`, then sleeps to let playback finish.
 **Settings UI** (`recognizer/static/`):
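
Note on the one-time model setup described in the new CLAUDE.md text: the pattern (fetch an archive, keep a single member, skip the work on later runs) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the commit's code. `extract_entry_once` and `make_demo_zip` are hypothetical helpers, the second archive member name is made up, and the archive is built in memory with placeholder bytes so the example needs no network access.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_entry_once(zip_bytes: bytes, entry: str, dest: Path) -> bool:
    """Extract one archive member to dest unless dest already exists.

    Returns True if the file was written, False if a cached copy was reused.
    """
    if dest.exists():
        return False  # cached by an earlier run, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(entry) as src:
            dest.write_bytes(src.read())
    return True


def make_demo_zip() -> bytes:
    """Build a tiny stand-in archive (placeholder bytes, not a real model)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("buffalo_sc/w600k_mbf.onnx", b"placeholder")
        zf.writestr("buffalo_sc/unused.bin", b"ignored")  # hypothetical extra member
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "models" / "w600k_mbf.onnx"
        archive = make_demo_zip()
        # First call writes the file, second call reuses the cached copy.
        print(extract_entry_once(archive, "buffalo_sc/w600k_mbf.onnx", target))
        print(extract_entry_once(archive, "buffalo_sc/w600k_mbf.onnx", target))
```

Checking for the extracted file rather than the archive means the archive can be deleted after extraction, which matches the `zip_path.unlink()` step in the new `face_db.py`.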

pyproject.toml CHANGED

@@ -11,8 +11,8 @@ readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
     "reachy-mini",
-    "boto3",
-    "Pillow",
     "scipy",
 ]
 keywords = ["reachy-mini-app", "reachy-mini"]

 requires-python = ">=3.10"
 dependencies = [
     "reachy-mini",
+    "onnxruntime",
+    "opencv-python-headless",
     "scipy",
 ]
 keywords = ["reachy-mini-app", "reachy-mini"]

recognizer/face_db.py CHANGED

@@ -1,86 +1,150 @@
-"""Face database: backed by AWS Rekognition.
-Requires boto3 and AWS credentials configured (e.g. via environment variables
-AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, or an IAM role).
 """
-import io
 import logging
 from typing import Optional
-import boto3
-from botocore.exceptions import BotoCoreError, ClientError
-from PIL import Image
 logger = logging.getLogger(__name__)
-COLLECTION_ID = "reachy-mini-recognizer"
 class NoFaceDetected(Exception):
     """Raised when no face is found in the provided image."""
-def _client():
-    return boto3.client("rekognition")
-def _to_jpeg(frame_bgr) -> bytes:
-    rgb = frame_bgr[:, :, ::-1]
-    buf = io.BytesIO()
-    Image.fromarray(rgb).save(buf, format="JPEG")
-    return buf.getvalue()
-def load() -> str:
-    """Ensure the Rekognition collection exists; return its ID."""
-    client = _client()
-    try:
-        client.create_collection(CollectionId=COLLECTION_ID)
-        logger.info("Created Rekognition collection '%s'", COLLECTION_ID)
-    except client.exceptions.ResourceAlreadyExistsException:
-        pass
-    return COLLECTION_ID
 def find_match(
-    frame_bgr,
-    collection_id: str,
-    threshold: float = 85.0,
 ) -> Optional[str]:
-    """Search for a face in frame_bgr against the collection.
-    Returns the matched name if recognised, None if a face is present but
-    unknown.  Raises NoFaceDetected if no face appears in the image at all.
     """
-    client = _client()
-    try:
-        resp = client.search_faces_by_image(
-            CollectionId=collection_id,
-            Image={"Bytes": _to_jpeg(frame_bgr)},
-            FaceMatchThreshold=threshold,
-            MaxFaces=1,
-        )
-        matches = resp.get("FaceMatches", [])
-        if matches:
-            return matches[0]["Face"]["ExternalImageId"]
-        return None  # face detected but not in collection
-    except client.exceptions.InvalidParameterException:
         raise NoFaceDetected()
-    except (BotoCoreError, ClientError) as exc:
-        logger.warning("Rekognition error: %s", exc)
-        raise NoFaceDetected()
-def add_face(name: str, frame_bgr, collection_id: str) -> None:
-    """Index the face in frame_bgr under name in the collection."""
-    client = _client()
-    resp = client.index_faces(
-        CollectionId=collection_id,
-        Image={"Bytes": _to_jpeg(frame_bgr)},
-        ExternalImageId=name,
-        MaxFaces=1,
-        DetectionAttributes=[],
-    )
-    if not resp.get("FaceRecords"):
         raise ValueError("No face detected in enrollment image")

+"""Face database: local face recognition via ONNX + OpenCV.
+Detection : OpenCV Haar cascade (built into opencv, no download).
+Embedding : InsightFace MobileFaceNet (w600k_mbf.onnx, ~17 MB, downloaded
+            once on first run from the InsightFace GitHub release).
+Matching  : cosine similarity on L2-normalised 512-D embeddings.
+Storage   : recognizer/face_db.json (gitignored).
+No compilation required — onnxruntime ships pre-built ARM64 wheels.
 """
+import json
 import logging
+import urllib.request
+import zipfile
+from pathlib import Path
 from typing import Optional
+import cv2
+import numpy as np
+import onnxruntime as ort
 logger = logging.getLogger(__name__)
+DB_PATH    = Path(__file__).parent / "face_db.json"
+MODEL_DIR  = Path(__file__).parent / "models"
+MODEL_FILE = MODEL_DIR / "w600k_mbf.onnx"
+MODEL_URL  = (
+    "https://github.com/deepinsight/insightface"
+    "/releases/download/v0.7/buffalo_sc.zip"
+)
+_REC_ENTRY = "buffalo_sc/w600k_mbf.onnx"   # path inside the zip
+_CASCADE = cv2.CascadeClassifier(
+    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
+)
+_session: Optional[ort.InferenceSession] = None
 class NoFaceDetected(Exception):
     """Raised when no face is found in the provided image."""
+# ---------------------------------------------------------------------------
+# Internal helpers
+# ---------------------------------------------------------------------------
+def _ensure_model() -> None:
+    if MODEL_FILE.exists():
+        return
+    MODEL_DIR.mkdir(exist_ok=True)
+    zip_path = MODEL_DIR / "buffalo_sc.zip"
+    logger.info("Downloading face recognition model (~17 MB) — one-time setup...")
+    urllib.request.urlretrieve(MODEL_URL, zip_path)
+    with zipfile.ZipFile(zip_path) as zf:
+        with zf.open(_REC_ENTRY) as src, open(MODEL_FILE, "wb") as dst:
+            dst.write(src.read())
+    zip_path.unlink()
+    logger.info("Model ready at %s", MODEL_FILE)
+def _get_session() -> ort.InferenceSession:
+    global _session
+    if _session is None:
+        _ensure_model()
+        _session = ort.InferenceSession(
+            str(MODEL_FILE), providers=["CPUExecutionProvider"]
+        )
+    return _session
+def _detect(frame_bgr: np.ndarray) -> list[tuple[int, int, int, int]]:
+    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
+    boxes = _CASCADE.detectMultiScale(
+        gray, scaleFactor=1.1, minNeighbors=4, minSize=(60, 60)
+    )
+    return [tuple(b) for b in boxes] if len(boxes) > 0 else []
+def _embed(face_bgr: np.ndarray) -> np.ndarray:
+    img = cv2.resize(face_bgr, (112, 112)).astype(np.float32)
+    img = (img - 127.5) / 127.5
+    inp = np.transpose(img, (2, 0, 1))[np.newaxis]          # NCHW
+    sess = _get_session()
+    emb = sess.run(None, {sess.get_inputs()[0].name: inp})[0][0]
+    return emb / np.linalg.norm(emb)                        # L2-normalise
+# ---------------------------------------------------------------------------
+# Public API  (called from main.py)
+# ---------------------------------------------------------------------------
+def load() -> dict[str, list[list[float]]]:
+    """Load face DB from disk and warm up the ONNX session."""
+    _get_session()                  # triggers one-time model download
+    if DB_PATH.exists():
+        return json.loads(DB_PATH.read_text())
+    return {}
+def save(db: dict[str, list[list[float]]]) -> None:
+    DB_PATH.write_text(json.dumps(db, indent=2))
 def find_match(
+    frame_bgr: np.ndarray,
+    db: dict[str, list[list[float]]],
+    threshold: float = 0.35,
 ) -> Optional[str]:
+    """Return matched name if recognised, None if face present but unknown.
+    Raises NoFaceDetected if no face appears in the image at all.
     """
+    boxes = _detect(frame_bgr)
+    if not boxes:
         raise NoFaceDetected()
+    x, y, w, h = boxes[0]
+    emb = _embed(frame_bgr[y : y + h, x : x + w])
+    best_name, best_sim = None, -1.0
+    for name, enc_list in db.items():
+        for enc in enc_list:
+            sim = float(np.dot(emb, np.array(enc)))
+            if sim > best_sim:
+                best_sim, best_name = sim, name
+    if best_name is not None and best_sim >= threshold:
+        return best_name
+    return None   # face present but not recognised (or DB is empty)
+def add_face(
+    name: str,
+    frame_bgr: np.ndarray,
+    db: dict[str, list[list[float]]],
+    max_per_person: int = 5,
+) -> None:
+    """Embed and store the face from frame_bgr under name."""
+    boxes = _detect(frame_bgr)
+    if not boxes:
         raise ValueError("No face detected in enrollment image")
+    x, y, w, h = boxes[0]
+    emb = _embed(frame_bgr[y : y + h, x : x + w])
+    db.setdefault(name, [])
+    if len(db[name]) < max_per_person:
+        db[name].append(emb.tolist())
+    save(db)
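
Note on the matching step in the new `find_match`: when every stored and query embedding is L2-normalised, the dot product of two embeddings is their cosine similarity, so the best match is simply the largest dot product that clears the threshold. The sketch below restates that idea in pure Python with toy 3-D vectors instead of 512-D embeddings. `l2_normalise` and `best_match` are hypothetical names used only for illustration, and the names "ada" and "bob" are made-up database keys.

```python
import math
from typing import Optional


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale vec to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vec]


def best_match(
    query: list[float],
    db: dict[str, list[list[float]]],
    threshold: float = 0.35,
) -> Optional[str]:
    """Return the name whose stored embedding is most similar to query.

    Returns None when the DB is empty or no similarity reaches threshold.
    Assumes query and all stored embeddings are already unit length.
    """
    best_name, best_sim = None, -1.0
    for name, encodings in db.items():
        for enc in encodings:
            sim = sum(q * e for q, e in zip(query, enc))  # cosine similarity
            if sim > best_sim:
                best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


if __name__ == "__main__":
    toy_db = {
        "ada": [l2_normalise([1.0, 0.0, 0.0])],
        "bob": [l2_normalise([0.0, 1.0, 0.0])],
    }
    print(best_match(l2_normalise([0.9, 0.1, 0.0]), toy_db))  # close to "ada"
    print(best_match(l2_normalise([0.0, 0.0, 1.0]), toy_db))  # orthogonal to both
```

Starting `best_sim` at -1.0 (the lowest possible cosine similarity) lets an empty database fall through to the "unknown face" result without a special case.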

recognizer/main.py CHANGED

@@ -66,7 +66,7 @@ class Recognizer(ReachyMiniApp):
                 return {"state": _shared["state"]}
         # --- Initialise ---
-        collection_id = load_face_db()
         state = State.SLEEPING
         doa_angle = math.pi / 2   # default: facing front
         speech_count = 0
@@ -112,7 +112,7 @@ class Recognizer(ReachyMiniApp):
                 active_start = time.time()
                 scan_t0 = active_start
                 last_face_check = 0.0
-                pending_enc = None
                 state = State.ACTIVE
             # ---------- ACTIVE ----------
@@ -129,13 +129,13 @@ class Recognizer(ReachyMiniApp):
                     head=_look_direction(1.0, y_scan, 0.0)
                 )
-                # Throttled face recognition via AWS Rekognition
                 if now - last_face_check >= FACE_INTERVAL:
                     last_face_check = now
                     frame = reachy_mini.media.get_frame()
                     if frame is not None:
                         try:
-                            name = find_match(frame, collection_id)
                             if name:
                                 speak(f"Hi {name}!", reachy_mini)
                                 reachy_mini.goto_sleep()
@@ -171,7 +171,7 @@ class Recognizer(ReachyMiniApp):
                         _shared["pending_name"] = None
                     if pending_frame is not None:
                         try:
-                            add_face(name, pending_frame, collection_id)
                         except ValueError as exc:
                             logger.warning("Enrollment failed: %s", exc)
                     speak(f"Nice to meet you, {name}!", reachy_mini)

                 return {"state": _shared["state"]}
         # --- Initialise ---
+        face_db = load_face_db()
         state = State.SLEEPING
         doa_angle = math.pi / 2   # default: facing front
         speech_count = 0
                 active_start = time.time()
                 scan_t0 = active_start
                 last_face_check = 0.0
+                pending_frame = None
                 state = State.ACTIVE
             # ---------- ACTIVE ----------
                     head=_look_direction(1.0, y_scan, 0.0)
                 )
+                # Throttled face recognition
                 if now - last_face_check >= FACE_INTERVAL:
                     last_face_check = now
                     frame = reachy_mini.media.get_frame()
                     if frame is not None:
                         try:
+                            name = find_match(frame, face_db)
                             if name:
                                 speak(f"Hi {name}!", reachy_mini)
                                 reachy_mini.goto_sleep()
                         _shared["pending_name"] = None
                     if pending_frame is not None:
                         try:
+                            add_face(name, pending_frame, face_db)
                         except ValueError as exc:
                             logger.warning("Enrollment failed: %s", exc)
                     speak(f"Nice to meet you, {name}!", reachy_mini)
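
Note on the throttled face check in the ACTIVE state: the `now - last_face_check >= FACE_INTERVAL` pattern runs an expensive step at most once per interval inside a fast loop. A minimal sketch of the same idea as a small reusable object follows. `Throttle` is a hypothetical class, not part of the app, and the 2.0 second interval is an arbitrary value chosen for the example because the diff does not show the value of `FACE_INTERVAL`.

```python
import time


class Throttle:
    """Allow an action at most once every `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.last = 0.0  # like last_face_check = 0.0: first check fires at once

    def ready(self, now: float) -> bool:
        """Return True and record `now` if the interval has elapsed."""
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False


if __name__ == "__main__":
    face_check = Throttle(interval=2.0)  # assumed interval, for illustration only
    start = time.time()
    while time.time() - start < 5.0:
        if face_check.ready(time.time()):
            print("running face check")  # stand-in for the recognition call
        time.sleep(0.1)  # the rest of the control loop keeps its own pace
```

Passing the timestamp in explicitly, instead of reading the clock inside `ready`, keeps the helper easy to exercise with fixed times.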