onitsche / talk: `CLAUDE.md` (commit 6c690a7, "Fix dropped conversation + add configurable voice volume")

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Environment Setup

	The shared virtual environment lives one level up at `../reachy_mini_env`. Always activate it first:

	```bash
	source ../reachy_mini_env/bin/activate
	```

	Install the package in editable mode (required for entry-point registration):

	```bash
	pip install -e .
	```

	## Running the App

	Run directly (connects to a live Reachy Mini robot):

	```bash
	python talk/main.py
	```

	Or via the daemon entry point:

	```bash
	reachy-mini-app run talk
	```

	The control panel web UI is served at `http://0.0.0.0:8042` while the app runs.

	Conversation mode needs an Anthropic API key. Provide it either via the
	`ANTHROPIC_API_KEY` environment variable or by entering it in the app's web UI
	(`http://0.0.0.0:8042` → Einstellungen, i.e. Settings). A key entered in the UI is stored at
	`~/.config/talk/api_key` (chmod 600, outside the repo) and takes precedence over
	the environment variable.
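Here is a minimal sketch of that lookup order. It assumes the key file holds nothing but the key. The names mirror `get_api_key()`, `has_api_key()` and `save_api_key()` from `talk/llm.py`. The bodies, and the extra `path` parameter added so the sketch can be tested, are illustrative and are not the module's actual code.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed default location, taken from the description above.
DEFAULT_KEY_FILE = Path.home() / ".config" / "talk" / "api_key"


def save_api_key(key: str, path: Path = DEFAULT_KEY_FILE) -> None:
    """Store a key entered in the web UI so only the owner can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with mode 0o600 so the key is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key.strip())
    os.chmod(path, 0o600)  # tighten permissions if the file already existed


def get_api_key(path: Path = DEFAULT_KEY_FILE) -> Optional[str]:
    """UI-saved key first, then ANTHROPIC_API_KEY, else None."""
    try:
        key = path.read_text().strip()
        if key:
            return key
    except FileNotFoundError:
        pass
    return os.environ.get("ANTHROPIC_API_KEY") or None


def has_api_key(path: Path = DEFAULT_KEY_FILE) -> bool:
    """What a /status handler could report as `api_key_set`."""
    return get_api_key(path) is not None
```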

	## Publishing

	```bash
	reachy-mini-app check # validate the app before publishing
	reachy-mini-app publish # publish to Hugging Face Spaces
	```

	## Architecture

	This is a Reachy Mini robot app — a Python package that plugs into the `reachy_mini` SDK.

	App lifecycle (handled by `ReachyMiniApp.wrapped_run()`):
	1. Spawns a FastAPI/uvicorn server on `custom_app_url` (port 8042) in a background thread
	2. Connects to the robot daemon
	3. Calls `run(reachy_mini, stop_event)` — the main loop
	4. On stop: sets `stop_event`, shuts down the web server
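The four steps amount to a background server thread plus a main loop, both sharing one `threading.Event`. The sketch below mirrors only that control flow. `serve` and `connect` are hypothetical stand-ins for the uvicorn server and the daemon connection, and this `wrapped_run` is not the SDK's implementation.

```python
import threading
from typing import Any, Callable


def wrapped_run(
    serve: Callable[[threading.Event], None],
    connect: Callable[[], Any],
    run: Callable[[Any, threading.Event], None],
    join_timeout: float = 5.0,
) -> None:
    """Hypothetical stand-in mirroring the four lifecycle steps above."""
    stop_event = threading.Event()
    # 1. Web server in a background thread; `serve` should return once stop_event is set.
    server = threading.Thread(target=serve, args=(stop_event,), daemon=True)
    server.start()
    try:
        robot = connect()           # 2. connect to the robot daemon
        run(robot, stop_event)      # 3. main loop; should poll stop_event
    finally:
        stop_event.set()            # 4. on stop: signal shutdown...
        server.join(join_timeout)   # ...and give the server time to exit
```

`run()` should return promptly once `stop_event` is set. For that reason, waits inside the main loop are better written as `stop_event.wait(timeout)` than `time.sleep(timeout)`, because the former wakes immediately on shutdown.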

	Entry-point registration in `pyproject.toml`:
	```toml
	[project.entry-points."reachy_mini_apps"]
	talk = "talk.main:Talk"
	```

	## State Machine

	```
	SLEEPING → (speech detected) → TIME → CONVERSING → (silence/antenna press) → SLEEPING
	```
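The diagram can also be encoded as an explicit transition table. That way an unexpected jump fails loudly instead of silently desynchronising the `state` the UI displays. The `State` enum and `transition()` helper are hypothetical; `talk/main.py` may structure this differently.

```python
from enum import Enum, auto


class State(Enum):
    SLEEPING = auto()
    TIME = auto()
    CONVERSING = auto()


# Allowed transitions, read straight off the diagram above.
TRANSITIONS = {
    State.SLEEPING: {State.TIME},        # speech detected
    State.TIME: {State.CONVERSING},      # datetime spoken, recording started
    State.CONVERSING: {State.SLEEPING},  # silence (idle timeout) or antenna press
}


def transition(current: State, nxt: State) -> State:
    """Return `nxt` if the diagram allows current -> nxt, else raise."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```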

	- SLEEPING: polls `get_DoA()` at 5 Hz; wakes after `DOA_DEBOUNCE` (3) consecutive speech-detected readings (same mechanism as the recognizer). Ignores audio for `DEBOUNCE_AFTER_SPEAK` (2 s) after the robot itself spoke so its own goodbye can't re-wake it.
	- TIME: `wake_up()` → speak the current date and time in German with a gesture loop, facing the speaker via the captured DoA angle → `start_recording()` → enter CONVERSING.
	- CONVERSING (inner loop):
	  - LISTENING: `record_utterance()` uses RMS-energy VAD with a threshold auto-calibrated from the ambient noise floor; head tracking toward the speaker is non-blocking (`set_target`) so the audio loop is never frozen. Exits on antenna press, or returns empty after `IDLE_TIMEOUT` (25 s) of silence.
	  - PROCESSING: `transcribe(chunks)` → Google STT; `get_response(messages)` → Claude API.
	  - RESPONDING: `_speak_with_gestures()` → back to LISTENING. Recording runs continuously throughout the conversation; `record_utterance()` drains the echo captured during playback.
	  - Exit: antenna press or idle timeout → `stop_recording()` → `goto_sleep()` → SLEEPING.
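The SLEEPING wake rule (N consecutive speech readings, plus a cooldown after the robot speaks) can be sketched as a small stateful filter over `get_DoA()` results. The `WakeDetector` class, the injectable clock, and resetting the streak on a `None` reading are assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple

DOA_DEBOUNCE = 3            # consecutive speech readings required to wake
DEBOUNCE_AFTER_SPEAK = 2.0  # seconds to ignore audio after the robot spoke


class WakeDetector:
    """Hypothetical helper: feed it one get_DoA() reading per 5 Hz poll."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.streak = 0
        self.last_spoke = float("-inf")

    def robot_spoke(self) -> None:
        """Call right after the robot finishes speaking (e.g. its goodbye)."""
        self.last_spoke = self.clock()
        self.streak = 0

    def update(self, doa: Optional[Tuple[float, bool]]) -> Optional[float]:
        """Return the speaker angle once the wake condition is met, else None."""
        if self.clock() - self.last_spoke < DEBOUNCE_AFTER_SPEAK:
            self.streak = 0  # cooldown: the robot's own voice may still echo
            return None
        if doa is None or not doa[1]:
            self.streak = 0  # any non-speech reading breaks the streak
            return None
        self.streak += 1
        if self.streak >= DOA_DEBOUNCE:
            self.streak = 0
            return doa[0]    # captured angle, used to face the speaker
        return None
```

Polling at 5 Hz means calling `angle = detector.update(reachy_mini.media.get_DoA())` and then `time.sleep(0.2)`. At that rate, three consecutive readings span roughly 0.4 s (readings at 0, 0.2 and 0.4 s), which bounds how quickly the robot can wake.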

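The LISTENING step's energy VAD can be sketched in NumPy in three parts:

- Take the RMS of the loudest channel in each chunk.
- Calibrate the speech threshold from a few ambient chunks.
- Record from the first loud chunk until a run of quiet chunks.

Several tuning choices here are assumptions: the threshold rule (3× the median ambient RMS, floored at 0.005), the silence length, and counting the idle timeout in chunks rather than the documented 25 s. Antenna handling and head tracking are omitted.

```python
from typing import Iterable, List, Sequence

import numpy as np


def loudest_rms(chunk: np.ndarray) -> float:
    """RMS of the loudest channel of an (N, channels) float32 chunk."""
    rms_per_channel = np.sqrt(np.mean(np.square(chunk, dtype=np.float64), axis=0))
    return float(np.max(rms_per_channel))


def calibrate_threshold(ambient: Sequence[np.ndarray], factor: float = 3.0,
                        floor: float = 0.005) -> float:
    """Speech threshold as a multiple of the ambient noise floor (assumed rule)."""
    noise = float(np.median([loudest_rms(c) for c in ambient]))
    return max(noise * factor, floor)


def record_utterance(chunks: Iterable[np.ndarray], threshold: float,
                     end_silence_chunks: int = 10,
                     idle_chunks: int = 250) -> List[np.ndarray]:
    """Collect chunks from the first loud one until `end_silence_chunks` quiet ones.

    Returns [] if no speech starts within `idle_chunks` chunks (idle timeout).
    """
    utterance: List[np.ndarray] = []
    quiet = 0
    for i, chunk in enumerate(chunks):
        loud = loudest_rms(chunk) >= threshold
        if not utterance:
            if loud:
                utterance.append(chunk)  # speech onset
            elif i + 1 >= idle_chunks:
                return []                # idle timeout: nobody spoke
            continue
        utterance.append(chunk)
        quiet = 0 if loud else quiet + 1
        if quiet >= end_silence_chunks:
            break                        # trailing silence ends the utterance
    return utterance
```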
	## Helper Modules

	- `talk/tts.py`: edge-tts (Microsoft neural voice `de-DE-KatjaNeural`) → MP3 → `media.play_sound()`, falling back to espeak-ng. Loudness is configurable (0-200, 100 = engine default) via the `tts_volume` setting, applied as the edge-tts `volume="+X%"` offset and the espeak-ng `-a` amplitude. Blocks for the estimated playback duration.
	- `talk/stt.py`: records from ReSpeaker (16 kHz float32), loudest-channel RMS-energy VAD with a threshold auto-calibrated from ambient noise (logs ambient/threshold/max-RMS for tuning), converts to mono 16-bit WAV, transcribes via Google Speech Recognition. Non-blocking DoA head tracking. Accepts an `idle_timeout` so a silent conversation returns to sleep.
	- `talk/llm.py`: stateless Claude API wrapper; the caller owns the `messages` list. Resolves the API key via `get_api_key()`: the web-UI file (`~/.config/talk/api_key`) first, then `ANTHROPIC_API_KEY`. Also exposes `has_api_key()` and `save_api_key()` for the web UI.
	- `talk/config.py`: JSON-backed store for non-secret settings at `~/.config/talk/settings.json` (`get_setting`/`set_setting`). Holds `tts_volume`. It lives outside the repo, so it is never committed or packaged.
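`talk/stt.py` converts float32 chunks to a mono 16-bit WAV before transcription. That step could look like this with NumPy and the standard `wave` module. Two details are assumptions the notes above do not specify: choosing the loudest channel over the whole utterance, and scaling by 32767.

```python
import io
import wave
from typing import Sequence

import numpy as np

SAMPLE_RATE = 16_000  # ReSpeaker capture rate per the notes above


def chunks_to_wav(chunks: Sequence[np.ndarray], rate: int = SAMPLE_RATE) -> bytes:
    """Concatenate (N, C) float32 chunks, keep the loudest channel, encode 16-bit WAV."""
    audio = np.concatenate(chunks, axis=0)                  # (total_N, C)
    rms = np.sqrt(np.mean(np.square(audio, dtype=np.float64), axis=0))
    mono = audio[:, int(np.argmax(rms))]                    # loudest channel
    pcm = (np.clip(mono, -1.0, 1.0) * 32767).astype("<i2")  # float -> little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()
```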

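Both engine parameters can be derived from the single `tts_volume` setting. This sketch assumes the edge-tts offset is `tts_volume - 100` percent, so 100 maps to `"+0%"`, the engine default. It passes the value straight to espeak-ng's `-a` amplitude, whose range is 0 to 200 with a default of 100. The helper name is hypothetical.

```python
from typing import List, Tuple


def tts_volume_params(tts_volume: int) -> Tuple[str, List[str]]:
    """Map the 0-200 setting to (edge-tts volume string, espeak-ng args).

    Hypothetical helper; 100 means "engine default" for both engines.
    """
    v = max(0, min(200, int(tts_volume)))  # clamp to the documented range
    edge_volume = f"{v - 100:+d}%"         # 150 -> "+50%", 60 -> "-40%", 100 -> "+0%"
    espeak_args = ["-a", str(v)]           # espeak-ng amplitude, default 100
    return edge_volume, espeak_args
```

The edge-tts string would then be passed as the `volume` argument of `edge_tts.Communicate`, and the espeak-ng arguments appended to its command line.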
	## Key SDK APIs

	```python
	# `reachy_mini` is the ReachyMini instance passed to run(reachy_mini, stop_event);
	# x, y, z and the antenna values `left`/`right` are placeholders.

	# Direction of Arrival: (angle_radians, speech_detected) or None
	# 0 rad = left, π/2 = front, π = right
	doa = reachy_mini.media.get_DoA()

	# Audio recording (chunks are (N, 2) float32 arrays at 16 kHz)
	reachy_mini.media.start_recording()
	chunk = reachy_mini.media.get_audio_sample()  # None if no new data
	reachy_mini.media.stop_recording()

	# Audio playback (non-blocking: sleep afterward for the estimated duration)
	reachy_mini.media.play_sound("/abs/path/to/file.mp3")

	# Head movement
	reachy_mini.look_at_world(x, y, z, duration=0.5)  # forward=+x, right=+y
	head_pose = reachy_mini.look_at_world(1.0, y, z, perform_movement=False)
	reachy_mini.set_target(head=head_pose, antennas=[left, right])

	# Built-in animations (blocking)
	reachy_mini.wake_up()
	reachy_mini.goto_sleep()
	```
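Putting the DoA and head APIs together, a detected angle can be turned into a look-at point. Take the comment conventions above at face value: 0 rad = left, π/2 = front, π = right; forward = +x, right = +y. A point at distance r then lies at x = r·sin θ, y = −r·cos θ. The helper and its `distance`/`height` defaults are hypothetical, and the sign of y is worth confirming on the robot.

```python
import math
from typing import Tuple


def doa_to_world_point(angle: float, distance: float = 1.0,
                       height: float = 0.0) -> Tuple[float, float, float]:
    """Convert a get_DoA() angle (radians) into an (x, y, z) look-at target.

    Hypothetical helper following the conventions noted above:
    0 rad = left, pi/2 = front, pi = right; forward = +x, right = +y.
    """
    x = distance * math.sin(angle)   # forward component peaks at pi/2
    y = -distance * math.cos(angle)  # cos(0) = 1 -> speaker on the left -> negative y
    return x, y, height
```

For non-blocking tracking, the point can be fed to `reachy_mini.look_at_world(x, y, z, perform_movement=False)`. The returned pose is then passed to `reachy_mini.set_target(head=...)`, as in the snippet above.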

	## Settings UI

	The page in `talk/static/` polls `GET /status` every second; the response is `{state, last_user, last_assistant, api_key_set, tts_volume}`. It shows a colour-coded status chip and, during CONVERSING, conversation bubbles (user on the right, assistant on the left). An Einstellungen (Settings) section lets the user (a) enter the Anthropic API key via `POST /set_api_key` (`{api_key}`) → `save_api_key()`, with the `api_key_set` flag driving a "key set?" indicator, and (b) set the voice loudness with a slider via `POST /set_config` (`{tts_volume}`) → `config.set_setting()`.
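Here is a minimal sketch of the JSON store behind `POST /set_config`. It assumes `get_setting`/`set_setting` read and rewrite the whole `settings.json` on each call. The following are all assumptions: the `path` parameter, the default of 100, the atomic temp-file write, the clamp to 0-200, and `handle_set_config`. The actual route handler is not described here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any

SETTINGS_FILE = Path.home() / ".config" / "talk" / "settings.json"
DEFAULTS = {"tts_volume": 100}  # assumed default: engine loudness unchanged


def _load(path: Path) -> dict:
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}  # missing or corrupt file -> fall back to defaults


def get_setting(key: str, path: Path = SETTINGS_FILE) -> Any:
    return _load(path).get(key, DEFAULTS.get(key))


def set_setting(key: str, value: Any, path: Path = SETTINGS_FILE) -> None:
    data = _load(path)
    data[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename, so a crash never leaves half-written JSON.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, path)


def handle_set_config(body: dict, path: Path = SETTINGS_FILE) -> dict:
    """Hypothetical body of POST /set_config: validate, clamp, persist."""
    vol = max(0, min(200, int(body["tts_volume"])))
    set_setting("tts_volume", vol, path)
    return {"ok": True, "tts_volume": vol}
```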