| # CLAUDE.md |
|
|
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
|
|
| ## Environment Setup |
|
|
| The shared virtual environment lives one level up at `../reachy_mini_env`. Always activate it first: |
|
|
| ```bash |
| source ../reachy_mini_env/bin/activate |
| ``` |
|
|
| Install the package in editable mode (required for entry-point registration): |
|
|
| ```bash |
| pip install -e . |
| ``` |
|
|
| ## Running the App |
|
|
| Run directly (connects to a live Reachy Mini robot): |
|
|
| ```bash |
| python talk/main.py |
| ``` |
|
|
| Or via the daemon entry point: |
|
|
| ```bash |
| reachy-mini-app run talk |
| ``` |
|
|
| The control panel web UI is served at `http://0.0.0.0:8042` while the app runs. |
|
|
| Conversation mode needs an Anthropic API key. Provide it either via the |
| `ANTHROPIC_API_KEY` environment variable, or by entering it in the app's web UI |
| (`http://0.0.0.0:8042` → *Einstellungen*, i.e. Settings). A key entered in the UI is stored at |
| `~/.config/talk/api_key` (chmod 600, outside the repo) and takes precedence over |
| the env var. |
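
The precedence rule above can be sketched roughly as follows. This is a minimal illustration, not the actual `talk/llm.py` code: the function name `resolve_api_key` and its parameters are hypothetical, and only the file path and the environment variable name come from this document.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Path taken from this document; everything else here is illustrative.
KEY_FILE = Path.home() / ".config" / "talk" / "api_key"


def resolve_api_key(
    key_file: Path = KEY_FILE,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the key stored by the web UI if present, else the env var."""
    env = os.environ if env is None else env
    try:
        stored = key_file.read_text(encoding="utf-8").strip()
    except OSError:
        # Missing or unreadable file: fall through to the environment.
        stored = ""
    if stored:
        return stored
    return env.get("ANTHROPIC_API_KEY") or None
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.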
|
|
| ## Publishing |
|
|
| ```bash |
| reachy-mini-app check # validate the app before publishing |
| reachy-mini-app publish # publish to Hugging Face Spaces |
| ``` |
|
|
| ## Architecture |
|
|
| This is a **Reachy Mini robot app**: a Python package that plugs into the `reachy_mini` SDK. |
|
|
| **App lifecycle** (handled by `ReachyMiniApp.wrapped_run()`): |
| 1. Spawns a FastAPI/uvicorn server on `custom_app_url` (port 8042) in a background thread |
| 2. Connects to the robot daemon |
| 3. Calls `run(reachy_mini, stop_event)`, the main loop |
| 4. On stop: sets `stop_event`, shuts down the web server |
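
As a rough sketch of step 3, a main loop that cooperates with `stop_event` might look like the following. The body is a placeholder and the `tick` parameter and return value are hypothetical additions for illustration; only the `run(reachy_mini, stop_event)` shape comes from this document.

```python
import threading


def run(reachy_mini, stop_event: threading.Event, tick: float = 0.2) -> int:
    """Loop until stop_event is set; return the number of iterations done."""
    iterations = 0
    while not stop_event.is_set():
        iterations += 1
        # Per-tick work (polling, state transitions) would go here.
        # Waiting on the event instead of sleeping lets a stop request
        # end the loop without waiting for the full tick to elapse.
        stop_event.wait(tick)
    return iterations
```

Using `stop_event.wait(tick)` rather than `time.sleep(tick)` is a design choice: the loop still runs at roughly `1 / tick` Hz but reacts promptly to shutdown.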
|
|
| **Entry-point registration** in `pyproject.toml`: |
| ```toml |
| [project.entry-points."reachy_mini_apps"] |
| talk = "talk.main:Talk" |
| ``` |
|
|
| ## State Machine |
|
|
| ``` |
| SLEEPING → (speech detected) → TIME → CONVERSING → (silence/antenna press) → SLEEPING |
| ``` |
|
|
| - **SLEEPING**: polls `get_DoA()` at 5 Hz; wakes after `DOA_DEBOUNCE` (3) consecutive speech-detected readings (same mechanism as the recognizer). Ignores audio for `DEBOUNCE_AFTER_SPEAK` (2 s) after the robot itself spoke so its own goodbye can't re-wake it. |
| - **TIME**: `wake_up()` → speak German datetime with gesture loop, facing the speaker via the captured DoA angle → `start_recording()` → enter CONVERSING. |
| - **CONVERSING** (inner loop): |
|   - **LISTENING**: `record_utterance()` uses RMS-energy VAD with a threshold auto-calibrated from the ambient noise floor; head tracking toward the speaker is **non-blocking** (`set_target`) so the audio loop is never frozen. Exits on antenna press, or returns empty after `IDLE_TIMEOUT` (25 s) of silence. |
|   - **PROCESSING**: `transcribe(chunks)` → Google STT; `get_response(messages)` → Claude API. |
|   - **RESPONDING**: `_speak_with_gestures()` → back to LISTENING. Recording runs continuously throughout the conversation; `record_utterance()` drains the echo captured during playback. |
|   - Exit: antenna press *or* idle timeout → `stop_recording()` → `goto_sleep()` → SLEEPING. |
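
The SLEEPING wake rule can be illustrated with a small debounce helper. This is a sketch under assumptions, not the app's code: the class `WakeDetector` and its methods are hypothetical, while the constants mirror the values quoted above and the `(angle, speech_detected)` tuple matches the documented `get_DoA()` return shape.

```python
from typing import Optional, Tuple

DOA_DEBOUNCE = 3            # consecutive speech-detected readings needed to wake
DEBOUNCE_AFTER_SPEAK = 2.0  # seconds to ignore audio after the robot spoke


class WakeDetector:
    """Counts consecutive speech-detected DoA readings (illustrative only)."""

    def __init__(self) -> None:
        self.streak = 0
        self.last_spoke_at = float("-inf")

    def mark_spoke(self, now: float) -> None:
        """Record that the robot just spoke, so its own voice is ignored."""
        self.last_spoke_at = now
        self.streak = 0

    def update(
        self, doa: Optional[Tuple[float, bool]], now: float
    ) -> Optional[float]:
        """Feed one get_DoA() reading; return the wake angle, or None."""
        if now - self.last_spoke_at < DEBOUNCE_AFTER_SPEAK:
            self.streak = 0
            return None
        if doa is None or not doa[1]:
            self.streak = 0  # any non-speech reading breaks the streak
            return None
        self.streak += 1
        if self.streak >= DOA_DEBOUNCE:
            self.streak = 0
            return doa[0]  # angle captured at wake time
        return None
```

In the polling loop this would be called once per reading (about every 0.2 s at 5 Hz), with `now` taken from a monotonic clock.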
|
|
| ## Helper Modules |
|
|
| - **`talk/tts.py`**: edge-tts (Microsoft neural voice, `de-DE-KatjaNeural`) → MP3 → `media.play_sound()`. Falls back to espeak-ng. Loudness is configurable (0-200, 100 = engine default) via the `tts_volume` setting, applied as edge-tts `volume="+X%"` and espeak `-a`. Blocks for estimated playback duration. |
| - **`talk/stt.py`**: records from ReSpeaker (16 kHz float32), loudest-channel RMS-energy VAD with a threshold auto-calibrated from ambient noise (logs ambient/threshold/max-RMS for tuning), converts to mono 16-bit WAV, transcribes via Google Speech Recognition. Non-blocking DoA head tracking. Accepts an `idle_timeout` so a silent conversation returns to sleep. |
| - **`talk/llm.py`**: stateless Claude API wrapper. Caller owns `messages` list. Resolves the API key via `get_api_key()`: the web-UI file (`~/.config/talk/api_key`) first, then `ANTHROPIC_API_KEY`. Also exposes `has_api_key()` and `save_api_key()` for the web UI. |
| - **`talk/config.py`**: JSON-backed non-secret settings store at `~/.config/talk/settings.json` (`get_setting`/`set_setting`). Holds `tts_volume`. Outside the repo so it is never committed/packaged. |
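
To make the "loudest-channel RMS-energy VAD with an auto-calibrated threshold" idea in `talk/stt.py` more concrete, here is a dependency-free sketch. The function names, the `factor` multiplier and the `floor` value are all assumptions chosen for illustration; the actual calibration rule used by the module is not specified in this document.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one channel."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_channel_rms(frames: Sequence[Sequence[float]]) -> float:
    """frames holds one tuple of per-channel samples per time step."""
    if not frames:
        return 0.0
    n_channels = len(frames[0])
    return max(rms([frame[c] for frame in frames]) for c in range(n_channels))


def calibrate_threshold(
    ambient_rms: Sequence[float], factor: float = 3.0, floor: float = 0.005
) -> float:
    """Hypothetical rule: a multiple of the mean ambient RMS, never below floor."""
    if not ambient_rms:
        return floor
    mean_ambient = sum(ambient_rms) / len(ambient_rms)
    return max(floor, factor * mean_ambient)


def is_speech(frames: Sequence[Sequence[float]], threshold: float) -> bool:
    """A chunk counts as speech when its loudest channel exceeds the threshold."""
    return loudest_channel_rms(frames) > threshold
```

Logging the ambient level, the resulting threshold and the per-chunk maximum, as the module is described as doing, makes it straightforward to tune `factor` for a given room.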
|
|
| ## Key SDK APIs |
|
|
| ```python |
| # Direction of Arrival: (angle_radians, speech_detected) or None |
| # 0 rad = left, π/2 = front, π = right |
| doa = reachy_mini.media.get_DoA() |
| |
| # Audio recording (chunks are (N, 2) float32 arrays at 16 kHz) |
| reachy_mini.media.start_recording() |
| chunk = reachy_mini.media.get_audio_sample() # None if no new data |
| reachy_mini.media.stop_recording() |
| |
| # Audio playback (async; sleep afterward for estimated duration) |
| reachy_mini.media.play_sound("/abs/path/to/file.mp3") |
| |
| # Head movement |
| reachy_mini.look_at_world(x, y, z, duration=0.5) # forward=+x, right=+y |
| head_pose = reachy_mini.look_at_world(1.0, y, z, perform_movement=False) |
| reachy_mini.set_target(head=head_pose, antennas=[left, right]) |
| |
| # Built-in animations (blocking) |
| reachy_mini.wake_up() |
| reachy_mini.goto_sleep() |
| ``` |
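
Combining the two conventions stated in the comments above (DoA: 0 rad = left, π/2 = front, π = right; `look_at_world`: forward = +x, right = +y), a DoA angle can be turned into a look target as sketched below. The helper `doa_to_look_target` is hypothetical, and the axis conventions should be checked against the SDK before relying on it.

```python
import math
from typing import Tuple


def doa_to_look_target(
    angle_rad: float, distance: float = 1.0, z: float = 0.0
) -> Tuple[float, float, float]:
    """Map a DoA angle to an (x, y, z) point for look_at_world."""
    x = distance * math.sin(angle_rad)   # forward component, maximal at pi/2
    y = -distance * math.cos(angle_rad)  # right component, negative on the left
    return (x, y, z)
```

With these assumptions, facing the speaker would be `reachy_mini.look_at_world(*doa_to_look_target(angle), duration=0.5)`.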
|
|
| ## Settings UI |
|
|
| The front end in `talk/static/` polls `GET /status` every second; the response is `{state, last_user, last_assistant, api_key_set, tts_volume}`. It shows a colour-coded status chip and, during CONVERSING, conversation bubbles (user on the right, assistant on the left). An *Einstellungen* (Settings) section lets the user (a) enter the Anthropic API key → `POST /set_api_key` (`{api_key}`) → `save_api_key()`, with the `api_key_set` flag driving a "key set?" indicator; and (b) set the voice loudness with a slider → `POST /set_config` (`{tts_volume}`) → `config.set_setting()`. |
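
For scripting against these endpoints outside the browser, a request can be built with the standard library as sketched below. It assumes the endpoints accept JSON bodies and that the app is reachable at `http://localhost:8042`; neither is confirmed here, and the helper names `build_post` and `set_tts_volume` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8042"  # assumption: app running on this machine


def build_post(
    path: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_tts_volume(
    volume: int, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Request for POST /set_config with the documented 0-200 volume range."""
    if not 0 <= volume <= 200:
        raise ValueError("tts_volume must be between 0 and 200")
    return build_post("/set_config", {"tts_volume": volume}, base_url)
```

Sending is then a matter of `urllib.request.urlopen(set_tts_volume(120), timeout=5)` while the app is running.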
|
|