Oliver Nitsche

Switch to local ONNX face recognition (no compilation, no cloud)

2fda523 27 days ago

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Environment Setup

The virtual environment lives one level up at ../reachy_mini_env. Always activate it first:

source ../reachy_mini_env/bin/activate

Install the package in editable mode (required for entry-point registration):

pip install -e .

System dependency (Raspberry Pi / Reachy Mini wireless)

sudo apt-get install espeak-ng        # text-to-speech synthesis

Face recognition model (one-time download)

Face recognition runs locally using ONNX Runtime (no cloud account needed). On first run the app downloads the InsightFace MobileFaceNet model (~17 MB) from GitHub and caches it at recognizer/models/w600k_mbf.onnx. Requires internet access the first time only; fully offline thereafter.

Requires 64-bit Raspberry Pi OS (onnxruntime ships pre-built aarch64 wheels).
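The download-once behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in face_db.py: `ensure_model`, the `fetch` parameter, and `MODEL_URL` are hypothetical names, and the URL is a placeholder because the real download location is not given here. Only the cache path comes from the text.

```python
from pathlib import Path
from typing import Callable
import urllib.request

# Cache path taken from the text above; the URL is a placeholder, not the real one.
MODEL_PATH = Path("recognizer/models/w600k_mbf.onnx")
MODEL_URL = "https://example.invalid/w600k_mbf.onnx"  # hypothetical


def _http_download(url: str, dest: Path) -> None:
    """Default fetcher: save the file at url to dest using urllib."""
    urllib.request.urlretrieve(url, dest)


def ensure_model(
    path: Path = MODEL_PATH,
    url: str = MODEL_URL,
    fetch: Callable[[str, Path], None] = _http_download,
) -> Path:
    """Return path, downloading the model only if it is not cached yet."""
    if path.exists():
        return path  # already cached: no network access needed
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    fetch(url, tmp)    # download under a temporary name first
    tmp.replace(path)  # rename into place so a partial file is never loaded
    return path
```

Downloading to a `.part` file and renaming afterwards means an interrupted first run leaves no half-written model in the cache; the next run simply retries.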

Running the App

Run directly (connects to a live Reachy Mini robot):

python recognizer/main.py

Or via the daemon entry point (used when the robot's daemon manages app lifecycle):

reachy-mini-app run recognizer

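The control panel web UI is served on port 8042 while the app runs. The server binds to 0.0.0.0 (all interfaces), so open http://<robot-ip>:8042 from another machine or http://localhost:8042 on the robot itself.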

Publishing

reachy-mini-app check          # validate the app before publishing
reachy-mini-app publish        # publish to Hugging Face Spaces

Architecture

This is a Reachy Mini robot app — a Python package that plugs into the reachy_mini SDK.

Entry point: recognizer/main.py — Recognizer class inheriting from ReachyMiniApp (ABC from reachy_mini).

App lifecycle (handled by ReachyMiniApp.wrapped_run()):

1. Spawns a FastAPI/uvicorn server on custom_app_url (port 8042) in a background thread
2. Connects to the robot daemon (auto-detects localhost vs. network → LOCAL backend on wireless robot)
3. Calls Recognizer.run(reachy_mini, stop_event), the main state-machine loop
4. On stop: sets stop_event, shuts down the web server

State machine (recognizer/main.py):

SLEEPING  →(speech detected × 3)→  WAKING  →  ACTIVE  →  SLEEPING
                                                  ↓ (unknown face)
                                               ENROLLING  →  SLEEPING

SLEEPING: polls media.get_DoA() at 5 Hz; robot stays in sleep pose. Three consecutive speech_detected=True readings (debounced) trigger a wake-up.
WAKING: calls wake_up() (built-in animation + sound), then look_at_world() toward the DoA angle.
ACTIVE: captures camera frames every 0.5 s and passes each one to face_db.find_match() (OpenCV Haar cascade detection, MobileFaceNet embedding, cosine-similarity match). Gentle head-scan idle animation via set_target(). 15 s timeout → back to sleep.
ENROLLING: the robot has detected an unrecognised face and waits for a name to be submitted via the web UI (POST /set_name). It stores the encoding in face_db.json, says "Nice to meet you, <name>!", then sleeps.
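The transitions and the three-reading debounce described above might be modelled as in the sketch below. `State`, `SpeechDebouncer`, and `next_state` are illustrative names rather than the identifiers used in recognizer/main.py, and what happens after a known face is recognised is not covered because the text does not specify it.

```python
from enum import Enum


class State(Enum):
    SLEEPING = "sleeping"
    WAKING = "waking"
    ACTIVE = "active"
    ENROLLING = "enrolling"


class SpeechDebouncer:
    """Fires once `required` consecutive speech_detected=True readings arrive."""

    def __init__(self, required: int = 3) -> None:
        self.required = required
        self.streak = 0

    def update(self, speech_detected: bool) -> bool:
        # Any False reading resets the streak, so isolated blips are ignored.
        self.streak = self.streak + 1 if speech_detected else 0
        return self.streak >= self.required


def next_state(
    state: State,
    *,
    woke: bool = False,
    unknown_face: bool = False,
    timed_out: bool = False,
    name_submitted: bool = False,
) -> State:
    """Pure transition function mirroring the diagram above."""
    if state is State.SLEEPING and woke:
        return State.WAKING
    if state is State.WAKING:
        return State.ACTIVE  # once wake_up() and look_at_world() have finished
    if state is State.ACTIVE:
        if unknown_face:
            return State.ENROLLING
        if timed_out:
            return State.SLEEPING
    if state is State.ENROLLING and name_submitted:
        return State.SLEEPING
    return state  # no trigger: stay in the current state
```

Keeping the transition logic in a pure function makes it testable without a robot; the run loop would only need to gather the trigger flags (for example by feeding each `get_DoA()` reading into `SpeechDebouncer.update`) and act on the returned state.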

Helper modules:

recognizer/face_db.py — local face recognition via ONNX Runtime. load() warms up the ONNX session (downloads model on first run) and returns the embedding DB dict. find_match(frame_bgr, db) detects with OpenCV Haar cascade, embeds with MobileFaceNet, matches by cosine similarity (threshold 0.35); raises NoFaceDetected if no face. add_face(name, frame_bgr, db) enrolls a face. DB stored in recognizer/face_db.json.
recognizer/tts.py — synthesises text via espeak-ng -s 140 -w <tmp.wav>, plays via media.play_sound(), then sleeps to let playback finish.
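For reference, the cosine-similarity matching step mentioned for find_match() could look something like this. It is a dependency-free sketch under two assumptions that the text does not confirm: that the DB maps each name to a single embedding vector, and that a score at or above 0.35 counts as a match. `cosine_similarity` and `best_match` are hypothetical helper names.

```python
import math
from typing import Mapping, Optional, Sequence

THRESHOLD = 0.35  # value quoted above; assumed to be a minimum similarity


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    embedding: Sequence[float],
    db: Mapping[str, Sequence[float]],
    threshold: float = THRESHOLD,
) -> Optional[str]:
    """Return the best-scoring name, or None if nothing reaches the threshold."""
    best_name, best_score = None, -1.0
    for name, stored in db.items():
        score = cosine_similarity(embedding, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Returning None for a below-threshold score is what would let the caller treat the face as unknown and move to ENROLLING, as distinct from the NoFaceDetected case where no face was found at all.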

Settings UI (recognizer/static/):

index.html / main.js / style.css — polls GET /status every second to show current state; reveals a name-entry form when state is "enrolling".
REST endpoints defined in run() via self.settings_app (FastAPI): GET /status, POST /set_name.
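Because the web server runs in a background thread while run() drives the robot, the two endpoints need some thread-safe way to exchange state with the main loop. One possible shape for that is sketched below; `SharedStatus` and its methods are hypothetical, and the `{"state": ...}` payload is an assumption based only on the UI checking for the value "enrolling". The FastAPI handlers would be thin wrappers around `status()` and `submit_name()`.

```python
import threading
from typing import Optional


class SharedStatus:
    """Thread-safe state shared by the run loop and the web handlers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = "sleeping"
        self._pending_name: Optional[str] = None

    def set_state(self, state: str) -> None:
        """Called by the run loop on every state change."""
        with self._lock:
            self._state = state

    def status(self) -> dict:
        """Payload a GET /status handler could return."""
        with self._lock:
            return {"state": self._state}

    def submit_name(self, name: str) -> bool:
        """What a POST /set_name handler could call; rejected unless enrolling."""
        name = name.strip()
        with self._lock:
            if self._state != "enrolling" or not name:
                return False
            self._pending_name = name
            return True

    def take_name(self) -> Optional[str]:
        """Polled by the run loop during ENROLLING; the name is consumed once."""
        with self._lock:
            name, self._pending_name = self._pending_name, None
            return name
```

Rejecting names outside the enrolling state keeps a stale or duplicate form submission from attaching a name to the wrong face.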

Root-level index.html / style.css: HuggingFace Spaces landing page — separate from the in-app UI in recognizer/static/.

Entry-point registration in pyproject.toml:

[project.entry-points."reachy_mini_apps"]
recognizer = "recognizer.main:Recognizer"

Key APIs

# Direction of Arrival from the ReSpeaker mic array
# Returns (angle_radians, speech_detected) or None
# 0 rad = left, π/2 = front/back, π = right
doa = reachy_mini.media.get_DoA()

# Camera frame (BGR uint8 numpy array)
frame = reachy_mini.media.get_frame()

# Built-in animations (blocking)
reachy_mini.wake_up()
reachy_mini.goto_sleep()

# Smooth head movement (blocking)
reachy_mini.look_at_world(x, y, z, duration=0.5)  # forward=+x, right=+y

# Immediate head pose (non-blocking, use set_target for idle animation)
reachy_mini.set_target(head=pose_4x4)

# Audio
reachy_mini.media.play_sound("/abs/path/to/file.wav")  # async; sleep afterward
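Since play_sound() returns immediately, the caller has to decide how long to sleep. One way, sketched here with the standard library only, is to read the length of the synthesised WAV file. `espeak_command` and `wav_duration_seconds` are hypothetical helpers and may differ from what recognizer/tts.py does; the `-s` (words per minute) and `-w` (write WAV file) flags are the ones quoted in the tts.py description.

```python
import wave
from typing import List


def espeak_command(text: str, wav_path: str, speed_wpm: int = 140) -> List[str]:
    """Argument list for espeak-ng: -s sets the speed, -w writes a WAV file."""
    return ["espeak-ng", "-s", str(speed_wpm), "-w", wav_path, text]


def wav_duration_seconds(wav_path: str) -> float:
    """Length of a WAV file in seconds (frame count divided by frame rate)."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A caller would run the command with `subprocess.run(espeak_command(text, path), check=True)`, pass the same path to `reachy_mini.media.play_sound(path)`, and then `time.sleep(wav_duration_seconds(path))`, possibly with a small margin added.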