Oliver Nitsche

Switch to local ONNX face recognition (no compilation, no cloud)

2fda523 27 days ago


	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Environment Setup

	The virtual environment lives one level up at `../reachy_mini_env`. Always activate it first:

	```bash
	source ../reachy_mini_env/bin/activate
	```

	Install the package in editable mode (required for entry-point registration):

	```bash
	pip install -e .
	```

	### System dependency (Raspberry Pi / Reachy Mini wireless)

	```bash
	sudo apt-get install espeak-ng # text-to-speech synthesis
	```

	### Face recognition model (one-time download)

	Face recognition runs locally using ONNX Runtime (no cloud account needed).
	On first run the app downloads the InsightFace MobileFaceNet model (~17 MB)
	from GitHub and caches it at `recognizer/models/w600k_mbf.onnx`.
	Requires internet access the first time only; fully offline thereafter.

	Requires 64-bit Raspberry Pi OS (onnxruntime ships pre-built aarch64 wheels).
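The download-once, cache-forever behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `face_db.py`: `ensure_model`, the `.part` temporary name, and `MODEL_URL` are hypothetical, since this file does not give the real download location.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholder: the real GitHub URL is not stated in this file.
MODEL_URL = "https://example.com/w600k_mbf.onnx"


def ensure_model(path, url=MODEL_URL, fetch=urllib.request.urlretrieve):
    """Return `path`, downloading the model first if it is not cached yet."""
    path = Path(path)
    if path.exists() and path.stat().st_size > 0:
        return path  # already cached: no network access needed
    path.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name so an interrupted transfer never
    # leaves a truncated file at the final location.
    partial = path.with_name(path.name + ".part")
    fetch(url, str(partial))
    partial.replace(path)
    return path
```

The `fetch` parameter exists only so the function can be exercised without network access; a caller would normally leave it at its default.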

	## Running the App

	Run directly (connects to a live Reachy Mini robot):

	```bash
	python recognizer/main.py
	```

	Or via the daemon entry point (used when the robot's daemon manages app lifecycle):

	```bash
	reachy-mini-app run recognizer
	```

	The control panel web UI listens on `0.0.0.0:8042` (all network interfaces) while the app runs; open `http://<robot-address>:8042` in a browser to reach it.

	## Publishing

	```bash
	reachy-mini-app check # validate the app before publishing
	reachy-mini-app publish # publish to Hugging Face Spaces
	```

	## Architecture

	This is a Reachy Mini robot app — a Python package that plugs into the `reachy_mini` SDK.

	Entry point: `recognizer/main.py` — `Recognizer` class inheriting from `ReachyMiniApp` (ABC from `reachy_mini`).

	App lifecycle (handled by `ReachyMiniApp.wrapped_run()`):
	1. Spawns a FastAPI/uvicorn server on `custom_app_url` (port 8042) in a background thread
	2. Connects to the robot daemon (auto-detects localhost vs. network → LOCAL backend on wireless robot)
	3. Calls `Recognizer.run(reachy_mini, stop_event)` — the main state-machine loop
	4. On stop: sets `stop_event`, shuts down the web server
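The four lifecycle steps can be illustrated with plain threads. The sketch below is not the SDK's `wrapped_run()`: `FakeApp`, `wrapped_run_sketch`, and the timer that triggers the stop are all stand-ins, and the "server" thread only waits for shutdown rather than serving HTTP.

```python
import threading


class FakeApp:
    """Stand-in for a ReachyMiniApp subclass; not the SDK class."""

    def __init__(self):
        self.ticks = 0

    def run(self, reachy_mini, stop_event):
        # Main loop: do one unit of work, then wait up to 0.2 s (5 Hz).
        # Event.wait returns early once stop_event is set, so shutdown
        # does not have to wait for a full sleep interval.
        while not stop_event.is_set():
            self.ticks += 1
            stop_event.wait(0.2)


def wrapped_run_sketch(app, run_for):
    """Start a background thread, run the app, stop after `run_for` seconds."""
    stop_event = threading.Event()
    server_done = threading.Event()

    def fake_server():
        # Placeholder for the web server thread: idle until told to stop.
        stop_event.wait()
        server_done.set()

    server = threading.Thread(target=fake_server, daemon=True)
    server.start()
    # Simulate an external stop request arriving later.
    timer = threading.Timer(run_for, stop_event.set)
    timer.start()
    app.run(reachy_mini=None, stop_event=stop_event)
    server.join(timeout=1.0)
    return server_done.is_set()
```

The point of the pattern is that `run()` never sleeps unconditionally: every wait goes through `stop_event`, so a stop request ends the loop promptly.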

	State machine (`recognizer/main.py`):

	```
	SLEEPING →(speech detected × 3)→ WAKING → ACTIVE → SLEEPING
	                                            ↓ (unknown face)
	                                        ENROLLING → SLEEPING
	```

	- SLEEPING: polls `media.get_DoA()` at 5 Hz; robot stays in sleep pose. Three consecutive `speech_detected=True` readings (debounced) trigger a wake-up.
	- WAKING: calls `wake_up()` (built-in animation + sound), then `look_at_world()` toward the DoA angle.
	- ACTIVE: captures a camera frame every 0.5 s and passes it to `face_db.find_match()` (OpenCV Haar cascade detection + MobileFaceNet embedding). A gentle head-scan idle animation runs via `set_target()`. After a 15 s timeout the robot goes back to sleep.
	- ENROLLING: the robot has detected an unrecognised face and waits for a name to be submitted via the web UI (`POST /set_name`). It stores the encoding in `face_db.json`, says "Nice to meet you, <name>!", then sleeps.
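The states and the three-reading debounce can be sketched as a small pure-Python model. The names `State`, `SpeechDebouncer`, and `next_state` are illustrative and are not taken from `recognizer/main.py`. The sketch also assumes that a recognised face sends the robot back to sleep, which the diagram suggests but does not state.

```python
from enum import Enum


class State(Enum):
    # Only the "enrolling" string is confirmed by the status endpoint;
    # the other values are assumptions.
    SLEEPING = "sleeping"
    WAKING = "waking"
    ACTIVE = "active"
    ENROLLING = "enrolling"


class SpeechDebouncer:
    """Fires once `needed` consecutive speech readings have been seen."""

    def __init__(self, needed=3):
        self.needed = needed
        self.count = 0

    def update(self, speech_detected):
        # Any non-speech reading resets the streak.
        self.count = self.count + 1 if speech_detected else 0
        if self.count >= self.needed:
            self.count = 0
            return True
        return False


def next_state(state, *, wake=False, face=None, timed_out=False, name=None):
    """Pure transition function; `face` is None, "known" or "unknown"."""
    if state is State.SLEEPING:
        return State.WAKING if wake else State.SLEEPING
    if state is State.WAKING:
        return State.ACTIVE
    if state is State.ACTIVE:
        if face == "unknown":
            return State.ENROLLING
        if face == "known" or timed_out:  # "known" branch is an assumption
            return State.SLEEPING
        return State.ACTIVE
    if state is State.ENROLLING:
        return State.SLEEPING if name else State.ENROLLING
    raise ValueError(state)
```

Keeping the transition logic free of robot calls makes it straightforward to unit-test without hardware.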

	Helper modules:
	- `recognizer/face_db.py` — local face recognition via ONNX Runtime. `load()` warms up the ONNX session (downloads model on first run) and returns the embedding DB dict. `find_match(frame_bgr, db)` detects with OpenCV Haar cascade, embeds with MobileFaceNet, matches by cosine similarity (threshold 0.35); raises `NoFaceDetected` if no face. `add_face(name, frame_bgr, db)` enrolls a face. DB stored in `recognizer/face_db.json`.
	- `recognizer/tts.py` — synthesises text via `espeak-ng -s 140 -w <tmp.wav>`, plays via `media.play_sound()`, then sleeps to let playback finish.
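The cosine-similarity matching step in `face_db.py` can be illustrated without ONNX or OpenCV. This is a rough sketch that assumes the DB maps each name to a single embedding vector; the actual layout of `face_db.json` is not documented here, and `cosine_similarity` and `best_match` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, db, threshold=0.35):
    """Return (name, score) for the closest entry, or (None, score) if
    even the best score is below the threshold."""
    best_name, best_score = None, -1.0
    for name, stored in db.items():
        score = cosine_similarity(embedding, stored)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

A higher threshold reduces false matches at the cost of treating more known faces as unknown.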

	Settings UI (`recognizer/static/`):
	- `index.html` / `main.js` / `style.css` — polls `GET /status` every second to show current state; reveals a name-entry form when state is `"enrolling"`.
	- REST endpoints defined in `run()` via `self.settings_app` (FastAPI): `GET /status`, `POST /set_name`.
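Because the web handlers run in the server thread while the state machine runs in the main loop, the two sides need some thread-safe shared state. The sketch below shows one way that could look. `SharedState` and its methods are hypothetical, and wiring them to the `GET /status` and `POST /set_name` routes is left out.

```python
import threading


class SharedState:
    """Thread-safe state shared between the main loop and web handlers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = "sleeping"
        self._pending_name = None

    def set_state(self, state):
        # Called by the main loop on every transition.
        with self._lock:
            self._state = state

    def status(self):
        # Body a GET /status handler could return as JSON.
        with self._lock:
            return {"state": self._state}

    def submit_name(self, name):
        # Logic a POST /set_name handler could run: rejects blank names
        # and submissions made outside the enrolling state.
        name = name.strip()
        with self._lock:
            if self._state != "enrolling" or not name:
                return False
            self._pending_name = name
            return True

    def take_name(self):
        # Called by the main loop: returns the name once, then clears it.
        with self._lock:
            name, self._pending_name = self._pending_name, None
            return name
```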

	Root-level `index.html` / `style.css`: Hugging Face Spaces landing page, separate from the in-app UI in `recognizer/static/`.

	Entry-point registration in `pyproject.toml`:
	```toml
	[project.entry-points."reachy_mini_apps"]
	recognizer = "recognizer.main:Recognizer"
	```

	## Key APIs

	```python
	# Direction of Arrival from the ReSpeaker mic array
	# Returns (angle_radians, speech_detected) or None
	# 0 rad = left, π/2 = front/back, π = right
	doa = reachy_mini.media.get_DoA()

	# Camera frame (BGR uint8 numpy array)
	frame = reachy_mini.media.get_frame()

	# Built-in animations (blocking)
	reachy_mini.wake_up()
	reachy_mini.goto_sleep()

	# Smooth head movement (blocking)
	reachy_mini.look_at_world(x, y, z, duration=0.5) # forward=+x, right=+y

	# Immediate head pose (non-blocking; used for the idle animation)
	reachy_mini.set_target(head=pose_4x4)

	# Audio
	reachy_mini.media.play_sound("/abs/path/to/file.wav") # async; sleep afterward
	```
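Two of the calls above need a little glue: turning a DoA angle into a `look_at_world()` target, and knowing how long to sleep after `play_sound()`. The helpers below are a hedged sketch; `doa_to_target` and `wav_duration_seconds` are hypothetical names, and the `distance` and `height` defaults are arbitrary. The angle mapping simply follows the conventions in the comments above (0 rad = left, π/2 = front, π = right, +x forward, +y right).

```python
import math
import wave


def doa_to_target(angle_rad, distance=1.0, height=0.0):
    """Map a DoA angle to an (x, y, z) point in front of the robot.

    sin() gives the forward component, which is never negative for
    angles in [0, pi], so front/back ambiguity resolves to "front".
    """
    x = distance * math.sin(angle_rad)   # forward
    y = -distance * math.cos(angle_rad)  # right is positive
    return x, y, height


def wav_duration_seconds(path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

With these, waking could call `reachy_mini.look_at_world(*doa_to_target(angle), duration=0.5)`, and speech playback could follow `play_sound(path)` with `time.sleep(wav_duration_seconds(path))`.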