Oliver Nitsche
Switch to local ONNX face recognition (no compilation, no cloud)
2fda523

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Environment Setup

The virtual environment lives one level up at ../reachy_mini_env. Always activate it first:

source ../reachy_mini_env/bin/activate

Install the package in editable mode (required for entry-point registration):

pip install -e .

System dependency (Raspberry Pi / Reachy Mini wireless)

sudo apt-get install espeak-ng        # text-to-speech synthesis

Face recognition model (one-time download)

Face recognition runs locally using ONNX Runtime (no cloud account needed). On first run the app downloads the InsightFace MobileFaceNet model (~17 MB) from GitHub and caches it at recognizer/models/w600k_mbf.onnx. Requires internet access the first time only; fully offline thereafter.

Requires 64-bit Raspberry Pi OS (onnxruntime ships pre-built aarch64 wheels).
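
The download-once behaviour described above could look roughly like the sketch below. It is not the code in face_db.py: ensure_model and fetch_to are hypothetical names, and the model URL is deliberately left as a parameter because this file does not state it.

```python
from pathlib import Path
from typing import Callable
import urllib.request


def fetch_to(url: str, dest: Path) -> None:
    """Default downloader: write to a temporary file, then rename into place."""
    tmp = dest.with_suffix(dest.suffix + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)  # rename last, so a partial download is never cached


def ensure_model(url: str, dest: Path,
                 download: Callable[[str, Path], None] = fetch_to) -> Path:
    """Return dest, downloading it only when no cached copy exists (hypothetical helper)."""
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # cached: no network access needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    download(url, dest)
    return dest
```

Passing the downloader as an argument keeps the cache logic testable offline; only the first call for a given path touches the network.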

Running the App

Run directly (connects to a live Reachy Mini robot):

python recognizer/main.py

Or via the daemon entry point (used when the robot's daemon manages app lifecycle):

reachy-mini-app run recognizer

The control panel web UI listens on all interfaces at port 8042 (http://0.0.0.0:8042) while the app runs; from a browser, use the robot's hostname or IP address with that port.

Publishing

reachy-mini-app check          # validate the app before publishing
reachy-mini-app publish        # publish to Hugging Face Spaces

Architecture

This is a Reachy Mini robot app: a Python package that plugs into the reachy_mini SDK.

Entry point: recognizer/main.py, which defines the Recognizer class inheriting from ReachyMiniApp (ABC from reachy_mini).

App lifecycle (handled by ReachyMiniApp.wrapped_run()):

  1. Spawns a FastAPI/uvicorn server on custom_app_url (port 8042) in a background thread
  2. Connects to the robot daemon (auto-detects localhost vs. network → LOCAL backend on wireless robot)
  3. Calls Recognizer.run(reachy_mini, stop_event), the main state-machine loop
  4. On stop: sets stop_event, shuts down the web server
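
The ordering of these steps can be illustrated with plain threading. This is only a sketch of the pattern, not ReachyMiniApp.wrapped_run() itself: lifecycle_sketch, serve and run are hypothetical stand-ins, the daemon connection (step 2) is omitted, and a single event is shared by the server and the loop for brevity.

```python
import threading


def lifecycle_sketch(run, serve) -> bool:
    """Start serve in a background thread, run the app loop, then shut down.

    Both callables receive a threading.Event and must return once it is set.
    Returns True if the server thread stopped cleanly.
    """
    stop_event = threading.Event()
    server = threading.Thread(target=serve, args=(stop_event,), daemon=True)
    server.start()              # step 1: web server in the background
    try:
        run(stop_event)         # step 3: blocking main loop
    finally:
        stop_event.set()        # step 4: signal stop, then wait for the server
        server.join(timeout=2.0)
    return not server.is_alive()
```

The practical consequence for Recognizer.run() is that every wait inside the loop should check stop_event (for example via stop_event.wait(timeout) instead of time.sleep) so that the app stops promptly.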

State machine (recognizer/main.py):

SLEEPING  →(speech detected × 3)→  WAKING  →  ACTIVE  →  SLEEPING
                                                  ↓ (unknown face)
                                               ENROLLING  →  SLEEPING
  • SLEEPING: polls media.get_DoA() at 5 Hz; robot stays in sleep pose. Three consecutive speech_detected=True readings (debounced) trigger a wake-up.
  • WAKING: calls wake_up() (built-in animation + sound), then look_at_world() toward the DoA angle.
  • ACTIVE: captures camera frames every 0.5 s and passes each one to face_db.find_match() (OpenCV Haar cascade detection + MobileFaceNet embedding, as described under Helper modules). Gentle head-scan idle animation via set_target(). 15 s timeout → back to sleep.
  • ENROLLING: the robot has detected an unrecognised face and waits for a name to be submitted via the web UI (POST /set_name). It stores the face embedding in face_db.json, says "Nice to meet you, <name>!", then sleeps.
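
The transitions and the three-reading debounce can be sketched as pure functions, which is handy for unit-testing changes without a robot. State, WakeDebouncer and next_state are hypothetical names, not the identifiers used in recognizer/main.py. The text above does not say what happens after a known face is matched, so this sketch simply stays ACTIVE until the timeout.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    SLEEPING = auto()
    WAKING = auto()
    ACTIVE = auto()
    ENROLLING = auto()


class WakeDebouncer:
    """Fires once `needed` consecutive speech_detected=True readings arrive."""

    def __init__(self, needed: int = 3):
        self.needed = needed
        self.streak = 0

    def update(self, speech_detected: bool) -> bool:
        # Any False reading resets the streak, so isolated blips never wake the robot.
        self.streak = self.streak + 1 if speech_detected else 0
        if self.streak >= self.needed:
            self.streak = 0
            return True
        return False


def next_state(state: State, *, wake: bool = False, face: Optional[str] = None,
               idle_s: float = 0.0, name_submitted: bool = False,
               timeout_s: float = 15.0) -> State:
    """Return the following state; face is None, "known" or "unknown" (assumed encoding)."""
    if state is State.SLEEPING:
        return State.WAKING if wake else State.SLEEPING
    if state is State.WAKING:
        return State.ACTIVE          # wake-up animation finished
    if state is State.ACTIVE:
        if face == "unknown":
            return State.ENROLLING
        return State.SLEEPING if idle_s >= timeout_s else State.ACTIVE
    # ENROLLING: wait until the web UI has delivered a name
    return State.SLEEPING if name_submitted else State.ENROLLING
```

At the 5 Hz polling rate, three consecutive readings mean roughly 0.6 s of sustained speech before the robot wakes.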

Helper modules:

  • recognizer/face_db.py: local face recognition via ONNX Runtime. load() warms up the ONNX session (downloads model on first run) and returns the embedding DB dict. find_match(frame_bgr, db) detects with OpenCV Haar cascade, embeds with MobileFaceNet, matches by cosine similarity (threshold 0.35); raises NoFaceDetected if no face. add_face(name, frame_bgr, db) enrolls a face. DB stored in recognizer/face_db.json.
  • recognizer/tts.py: synthesises text via espeak-ng -s 140 -w <tmp.wav>, plays via media.play_sound(), then sleeps to let playback finish.
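
The matching step in find_match() amounts to a nearest-neighbour search under cosine similarity. The sketch below shows that idea only, in dependency-free Python. It assumes one stored embedding per name and treats a score equal to the threshold as a match; neither detail is confirmed by this file, and cosine_similarity and best_match are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding: Sequence[float], db: Dict[str, Sequence[float]],
               threshold: float = 0.35) -> Optional[str]:
    """Return the name with the highest similarity at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, stored in db.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Raising the threshold makes false matches less likely at the cost of more enrolment prompts for people who are already in the DB.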

Settings UI (recognizer/static/):

  • index.html / main.js / style.css: polls GET /status every second to show current state; reveals a name-entry form when state is "enrolling".
  • REST endpoints defined in run() via self.settings_app (FastAPI): GET /status, POST /set_name.
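
Because the web handlers run in the server thread while the state machine runs in the main loop, the two need a thread-safe hand-off. One possible shape is sketched below; EnrollmentBridge, its method names and the JSON payloads are assumptions for illustration, not the contents of run().

```python
import threading
from typing import Optional


class EnrollmentBridge:
    """Thread-safe hand-off between the web handlers and the robot loop (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = "sleeping"
        self._pending_name: Optional[str] = None

    def set_state(self, state: str) -> None:
        """Called by the robot loop on every state change."""
        with self._lock:
            self._state = state

    def status(self) -> dict:
        """Possible body for GET /status."""
        with self._lock:
            return {"state": self._state}

    def submit_name(self, name: str) -> dict:
        """Possible body for POST /set_name; rejected unless enrolling."""
        name = name.strip()
        with self._lock:
            if self._state != "enrolling" or not name:
                return {"ok": False}
            self._pending_name = name
            return {"ok": True}

    def take_name(self) -> Optional[str]:
        """Polled by the robot loop; returns a submitted name at most once."""
        with self._lock:
            name, self._pending_name = self._pending_name, None
            return name
```

Rejecting names outside the enrolling state prevents a stale browser tab from attaching a name to the wrong face.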

Root-level index.html / style.css: Hugging Face Spaces landing page, separate from the in-app UI in recognizer/static/.

Entry-point registration in pyproject.toml:

[project.entry-points."reachy_mini_apps"]
recognizer = "recognizer.main:Recognizer"

Key APIs

# Direction of Arrival from the ReSpeaker mic array
# Returns (angle_radians, speech_detected) or None
# 0 rad = left, π/2 = front/back, π = right
doa = reachy_mini.media.get_DoA()

# Camera frame (BGR uint8 numpy array)
frame = reachy_mini.media.get_frame()

# Built-in animations (blocking)
reachy_mini.wake_up()
reachy_mini.goto_sleep()

# Smooth head movement (blocking)
reachy_mini.look_at_world(x, y, z, duration=0.5)  # forward=+x, right=+y

# Immediate head pose (non-blocking, use set_target for idle animation)
reachy_mini.set_target(head=pose_4x4)

# Audio
reachy_mini.media.play_sound("/abs/path/to/file.wav")  # async; sleep afterward
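
Two small helpers that follow from the conventions above, as a sketch: converting a DoA angle into a look_at_world() target, and computing how long to sleep after the asynchronous play_sound(). Both names are hypothetical. The conversion resolves the front/back ambiguity by assuming the speaker is in front, and the default distance of 1.0 is in whatever unit look_at_world() expects, which this file does not state.

```python
import math
import wave


def doa_to_target(angle_rad: float, distance: float = 1.0, z: float = 0.0):
    """Map a DoA angle to (x, y, z) with forward = +x and right = +y.

    0 rad = left, pi/2 = straight ahead (assumed front), pi = right.
    """
    x = distance * math.sin(angle_rad)
    y = -distance * math.cos(angle_rad)
    return (x, y, z)


def wav_duration_s(path: str) -> float:
    """Length of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Typical use would be reachy_mini.look_at_world(*doa_to_target(angle), duration=0.5) after waking, and time.sleep(wav_duration_s(path)) right after play_sound(path).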