
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Environment Setup

The shared virtual environment lives one level up at ../reachy_mini_env. Always activate it first:

source ../reachy_mini_env/bin/activate

Install the package in editable mode (required for entry-point registration):

pip install -e .

Running the App

Run directly (connects to a live Reachy Mini robot):

python talk/main.py

Or via the daemon entry point:

reachy-mini-app run talk

The control panel web UI is served at http://0.0.0.0:8042 while the app runs.

Conversation mode needs an Anthropic API key. Provide it either via the ANTHROPIC_API_KEY environment variable or by entering it in the app's web UI (http://0.0.0.0:8042 → Einstellungen, i.e. Settings). A key entered in the UI is stored at ~/.config/talk/api_key (chmod 600, outside the repo) and takes precedence over the env var.
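A minimal sketch of that resolution order, assuming the key file holds the bare key as plain text. The names mirror get_api_key(), has_api_key() and save_api_key() listed for talk/llm.py further down, but these bodies are illustrative and not the module's actual code; the key_file parameter is a hypothetical addition so the logic can be tested without touching the real file.

```python
import os
import stat
from pathlib import Path
from typing import Optional

# Location taken from the text above; injectable for testing (assumption).
DEFAULT_KEY_FILE = Path.home() / ".config" / "talk" / "api_key"


def get_api_key(key_file: Path = DEFAULT_KEY_FILE) -> Optional[str]:
    """Return the UI-saved key if present, else ANTHROPIC_API_KEY, else None."""
    try:
        saved = key_file.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        saved = ""
    if saved:
        return saved  # a key saved from the web UI wins over the env var
    return os.environ.get("ANTHROPIC_API_KEY") or None


def has_api_key(key_file: Path = DEFAULT_KEY_FILE) -> bool:
    return get_api_key(key_file) is not None


def save_api_key(key: str, key_file: Path = DEFAULT_KEY_FILE) -> None:
    """Persist the key with owner-only permissions (chmod 600)."""
    key_file.parent.mkdir(parents=True, exist_ok=True)
    # Create the file as 0o600 from the start so it is never world-readable.
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(key.strip() + "\n")
    # os.open's mode only applies on creation; enforce 600 for pre-existing files.
    os.chmod(key_file, stat.S_IRUSR | stat.S_IWUSR)
```

Opening with an explicit mode avoids a window in which the freshly written key is readable under the default umask.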

Publishing

reachy-mini-app check          # validate the app before publishing
reachy-mini-app publish        # publish to Hugging Face Spaces

Architecture

This is a Reachy Mini robot app: a Python package that plugs into the reachy_mini SDK.

App lifecycle (handled by ReachyMiniApp.wrapped_run()):

  1. Spawns a FastAPI/uvicorn server on custom_app_url (port 8042) in a background thread
  2. Connects to the robot daemon
  3. Calls run(reachy_mini, stop_event), which is the main loop
  4. On stop: sets stop_event, shuts down the web server
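Because step 4 only sets stop_event, run() has to check it regularly to shut down promptly. A stdlib-only sketch of that pattern, assuming a 5 Hz tick (the SLEEPING poll rate below); the robot argument is a placeholder and nothing here calls the reachy_mini SDK.

```python
import threading
import time


def run(robot: object, stop_event: threading.Event, poll_hz: float = 5.0) -> int:
    """Hypothetical main loop: one unit of work per tick until stop_event is set.

    `robot` stands in for the ReachyMini instance and is unused here.
    Returns the number of ticks executed, which makes the loop easy to test.
    """
    ticks = 0
    period = 1.0 / poll_hz
    while not stop_event.is_set():
        ticks += 1
        # ... poll DoA / audio and advance the state machine here ...
        # wait() returns early as soon as stop_event is set, unlike time.sleep()
        stop_event.wait(period)
    return ticks


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=run, args=(None, stop))
    worker.start()
    time.sleep(0.5)
    stop.set()  # roughly what the framework does on stop
    worker.join(timeout=1.0)
    print("stopped:", not worker.is_alive())
```

Using stop_event.wait() as the sleep means the loop notices a stop request within one tick instead of finishing a full sleep first.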

Entry-point registration in pyproject.toml:

[project.entry-points."reachy_mini_apps"]
talk = "talk.main:Talk"

State Machine

SLEEPING → (speech detected) → TIME → CONVERSING → (silence/antenna press) → SLEEPING
  • SLEEPING: polls get_DoA() at 5 Hz; wakes after DOA_DEBOUNCE (3) consecutive speech-detected readings (same mechanism as the recognizer). Ignores audio for DEBOUNCE_AFTER_SPEAK (2 s) after the robot itself spoke so its own goodbye can't re-wake it.
  • TIME: wake_up() → announce the current date and time in German with a gesture loop, facing the speaker via the captured DoA angle → start_recording() → enter CONVERSING.
  • CONVERSING (inner loop):
    • LISTENING: record_utterance() uses RMS-energy VAD with a threshold auto-calibrated from the ambient noise floor; head tracking toward the speaker is non-blocking (set_target) so the audio loop is never frozen. Exits on antenna press, or returns empty after IDLE_TIMEOUT (25 s) of silence.
    • PROCESSING: transcribe(chunks) → Google STT; get_response(messages) → Claude API.
    • RESPONDING: _speak_with_gestures() → back to LISTENING. Recording runs continuously throughout the conversation; record_utterance() drains (discards) the echo of the robot's own speech captured during playback.
  • Exit: antenna press or idle timeout → stop_recording() → goto_sleep() → SLEEPING.
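The SLEEPING wake rule can be sketched as a small counter, assuming one get_DoA() reading per 5 Hz poll and an injectable clock for testing. WakeDetector, feed() and robot_spoke() are hypothetical names; only DOA_DEBOUNCE, DEBOUNCE_AFTER_SPEAK and the reading format come from the description above.

```python
import time
from typing import Callable, Optional, Tuple

DOA_DEBOUNCE = 3            # consecutive speech-detected readings needed to wake
DEBOUNCE_AFTER_SPEAK = 2.0  # seconds to ignore audio after the robot spoke

DoaReading = Optional[Tuple[float, bool]]  # (angle_radians, speech_detected) or None


class WakeDetector:
    """Hypothetical SLEEPING-state logic fed one get_DoA() reading per poll."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._streak = 0
        self._muted_until = 0.0

    def robot_spoke(self) -> None:
        """Call when playback ends so the robot's own voice cannot re-wake it."""
        self._muted_until = self._clock() + DEBOUNCE_AFTER_SPEAK
        self._streak = 0

    def feed(self, reading: DoaReading) -> Optional[float]:
        """Return the speaker angle once DOA_DEBOUNCE readings in a row detect speech."""
        if self._clock() < self._muted_until:
            self._streak = 0  # still inside the post-speech quiet window
            return None
        if reading is None or not reading[1]:
            self._streak = 0  # any gap resets the run of consecutive detections
            return None
        self._streak += 1
        if self._streak >= DOA_DEBOUNCE:
            self._streak = 0
            return reading[0]  # angle the TIME state turns toward
        return None
```

Requiring consecutive hits filters out single-frame false positives, and the mute window keeps the robot's own goodbye from counting as speech.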

Helper Modules

  • talk/tts.py: edge-tts (MS neural voice de-DE-KatjaNeural) → MP3 → media.play_sound(). Falls back to espeak-ng. Loudness is configurable (0-200, 100 = engine default) via the tts_volume setting, applied as edge-tts volume="+X%" and espeak -a. Blocks for the estimated playback duration.
  • talk/stt.py: records from ReSpeaker (16 kHz float32), loudest-channel RMS-energy VAD with a threshold auto-calibrated from ambient noise (logs ambient/threshold/max-RMS for tuning), converts to mono 16-bit WAV, transcribes via Google Speech Recognition. Non-blocking DoA head tracking. Accepts an idle_timeout so a silent conversation returns to sleep.
  • talk/llm.py: stateless Claude API wrapper; the caller owns the messages list. Resolves the API key via get_api_key(): the web-UI file (~/.config/talk/api_key) first, then ANTHROPIC_API_KEY. Also exposes has_api_key() and save_api_key() for the web UI.
  • talk/config.py: JSON-backed non-secret settings store at ~/.config/talk/settings.json (get_setting/set_setting). Holds tts_volume. Outside the repo so it is never committed/packaged.
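The VAD and WAV steps of talk/stt.py can be sketched with NumPy and the standard wave module. The calibration rule (a multiple of the median ambient RMS, with a fixed floor), the factor 3.0, the 0.005 floor and averaging channels for the mono mix are all assumptions, not values taken from the module.

```python
import io
import wave
from typing import Iterable

import numpy as np

SAMPLE_RATE = 16_000  # ReSpeaker capture rate quoted above


def chunk_rms(chunk: np.ndarray) -> float:
    """RMS energy of the loudest channel in an (N, C) float32 chunk."""
    x = np.asarray(chunk, dtype=np.float64)
    per_channel = np.sqrt(np.mean(x * x, axis=0))
    return float(per_channel.max())


def calibrate_threshold(ambient_chunks: Iterable[np.ndarray],
                        factor: float = 3.0, floor: float = 0.005) -> float:
    """Threshold = factor x median ambient RMS, never below `floor` (both assumed)."""
    ambient = float(np.median([chunk_rms(c) for c in ambient_chunks]))
    return max(ambient * factor, floor)


def is_speech(chunk: np.ndarray, threshold: float) -> bool:
    return chunk_rms(chunk) > threshold


def to_mono_wav_bytes(chunks: Iterable[np.ndarray]) -> bytes:
    """Join (N, C) float32 chunks, average channels to mono, encode 16-bit PCM WAV."""
    audio = np.concatenate(list(chunks), axis=0).mean(axis=1)
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```

Taking the loudest channel for detection means a speaker close to one microphone still crosses the threshold, while the floor stops a near-silent room from producing a threshold so low that noise triggers it.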

Key SDK APIs

# Direction of Arrival: (angle_radians, speech_detected) or None
# 0 rad = left, π/2 = front, π = right
doa = reachy_mini.media.get_DoA()

# Audio recording (chunks are (N, 2) float32 arrays at 16 kHz)
reachy_mini.media.start_recording()
chunk = reachy_mini.media.get_audio_sample()  # None if no new data
reachy_mini.media.stop_recording()

# Audio playback (async; sleep afterward for the estimated duration)
reachy_mini.media.play_sound("/abs/path/to/file.mp3")

# Head movement
reachy_mini.look_at_world(x, y, z, duration=0.5)  # forward=+x, right=+y
head_pose = reachy_mini.look_at_world(1.0, y, z, perform_movement=False)
reachy_mini.set_target(head=head_pose, antennas=[left, right])

# Built-in animations (blocking)
reachy_mini.wake_up()
reachy_mini.goto_sleep()
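The DoA angle and look_at_world() use different conventions, so a reading has to be converted before it can drive the head. One way to do it, assuming the DoA angle sweeps the horizontal half-plane in front of the robot; doa_to_forward_right() is a hypothetical helper, not part of the SDK.

```python
import math
from typing import Tuple


def doa_to_forward_right(angle_rad: float) -> Tuple[float, float]:
    """Convert a DoA angle (0 = left, pi/2 = front, pi = right) to a unit (x, y)
    direction in the look_at_world frame (forward = +x, right = +y)."""
    x = math.sin(angle_rad)   # 1 straight ahead, 0 at either side
    y = -math.cos(angle_rad)  # -1 (left) at 0 rad, +1 (right) at pi rad
    return x, y
```

The result could then be passed as look_at_world(x, y, z, ...) or scaled to a target distance; the distance and the z height used for the target are left as choices for the caller.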

Settings UI

The web UI in talk/static/ polls GET /status every second; the response is {state, last_user, last_assistant, api_key_set, tts_volume}. It shows a colour-coded status chip and, during CONVERSING, conversation bubbles (user on the right, assistant on the left). An Einstellungen (Settings) section lets the user (a) enter the Anthropic API key via POST /set_api_key ({api_key}) → save_api_key(), with the api_key_set flag driving a "key set?" indicator; and (b) set the voice loudness with a slider via POST /set_config ({tts_volume}) → config.set_setting().
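The settings round trip can be sketched as a small JSON store plus the volume mapping. get_setting/set_setting are the names listed for talk/config.py, but these bodies, the path parameter, the atomic-rename write and the helpers edge_tts_volume/espeak_amplitude are hypothetical. The edge-tts mapping assumes X = tts_volume - 100, which fits "100 = engine default"; espeak-ng's -a amplitude already takes 0-200 with 100 as its default.

```python
import json
from pathlib import Path
from typing import Any

# Path from the text above; injectable so tests don't touch the real file.
DEFAULT_SETTINGS = Path.home() / ".config" / "talk" / "settings.json"
DEFAULTS = {"tts_volume": 100}  # 100 = engine default loudness


def _load(path: Path) -> dict:
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def get_setting(name: str, path: Path = DEFAULT_SETTINGS) -> Any:
    return _load(path).get(name, DEFAULTS.get(name))


def set_setting(name: str, value: Any, path: Path = DEFAULT_SETTINGS) -> None:
    data = _load(path)
    data[name] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so readers never see a partial write


def clamp_volume(value: Any) -> int:
    return max(0, min(200, int(value)))


def edge_tts_volume(tts_volume: Any) -> str:
    """0..200 -> "-100%".."+100%"; 100 maps to "+0%" (engine default)."""
    return f"{clamp_volume(tts_volume) - 100:+d}%"


def espeak_amplitude(tts_volume: Any) -> int:
    """espeak-ng's -a amplitude uses 0..200 with 100 as default, so pass through."""
    return clamp_volume(tts_volume)
```

Clamping on both the write and the read side keeps a malformed slider value from producing an out-of-range engine argument.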