Running

Conversation mode needs an Anthropic API key. Provide it either via the ANTHROPIC_API_KEY environment variable or through the app's web UI (http://0.0.0.0:8042 → Einstellungen, i.e. Settings). A key entered in the UI is stored at ~/.config/talk/api_key (mode 600, outside the repo) and takes precedence over the environment variable.
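
The lookup order described above (UI-saved file first, then the environment variable) could look roughly like the following. This is a minimal sketch, not the code in talk/llm.py: the function names mirror the ones this README mentions, but the bodies and the key_file parameter are assumptions made so the example can run against a temporary path.

```python
import os
from pathlib import Path
from typing import Optional

# Location taken from the README; everything else here is illustrative.
DEFAULT_KEY_FILE = Path.home() / ".config" / "talk" / "api_key"


def save_api_key(key: str, key_file: Path = DEFAULT_KEY_FILE) -> None:
    """Store the key in a file that only the owner can read or write."""
    key_file.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 600 from the start instead of fixing it later.
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(key.strip())
    # The mode passed to os.open is ignored for a pre-existing file.
    os.chmod(key_file, 0o600)


def get_api_key(key_file: Path = DEFAULT_KEY_FILE) -> Optional[str]:
    """Return the UI-saved key if present, else ANTHROPIC_API_KEY, else None."""
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    return os.environ.get("ANTHROPIC_API_KEY") or None


def has_api_key(key_file: Path = DEFAULT_KEY_FILE) -> bool:
    """True when either source provides a non-empty key."""
    return get_api_key(key_file) is not None
```

With this shape, deleting ~/.config/talk/api_key is enough to fall back to the environment variable again.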

Publishing

reachy-mini-app check          # validate the app before publishing
reachy-mini-app publish        # publish to Hugging Face Spaces

Architecture

This is a Reachy Mini robot app — a Python package that plugs into the reachy_mini SDK.

App lifecycle (handled by ReachyMiniApp.wrapped_run()):

- Spawns a FastAPI/uvicorn server on custom_app_url (port 8042) in a background thread
- Connects to the robot daemon
- Calls run(reachy_mini, stop_event), the main loop
- On stop: sets stop_event and shuts down the web server
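
The contract between the lifecycle and the main loop can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: FakeRobot and demo() are hypothetical stand-ins so the example runs without hardware, and only the run(reachy_mini, stop_event) signature comes from the list above.

```python
import threading
import time


class FakeRobot:
    """Hypothetical stand-in for the reachy_mini handle."""

    def __init__(self) -> None:
        self.ticks = 0


def run(reachy_mini, stop_event: threading.Event, period: float = 0.01) -> None:
    """Main loop: do one unit of work per tick until stop_event is set."""
    while not stop_event.is_set():
        reachy_mini.ticks += 1
        # Event.wait() acts as an interruptible sleep, so a stop request
        # is noticed within one period instead of after a full time.sleep().
        stop_event.wait(period)


def demo() -> int:
    """Drive run() from a worker thread and stop it from the outside."""
    robot = FakeRobot()
    stop_event = threading.Event()
    worker = threading.Thread(target=run, args=(robot, stop_event))
    worker.start()
    time.sleep(0.05)
    stop_event.set()          # what the lifecycle does on stop
    worker.join(timeout=1.0)
    return robot.ticks
```

Any blocking call inside the loop should likewise be bounded, otherwise setting stop_event cannot end the app promptly.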

Entry-point registration in pyproject.toml:

[project.entry-points."reachy_mini_apps"]
talk = "talk.main:Talk"

State Machine

SLEEPING → (speech detected) → TIME → CONVERSING → (silence/antenna press) → SLEEPING

- SLEEPING: polls get_DoA() at 5 Hz; wakes after DOA_DEBOUNCE (3) consecutive speech-detected readings (same mechanism as the recognizer). Ignores audio for DEBOUNCE_AFTER_SPEAK (2 s) after the robot itself spoke so its own goodbye can't re-wake it.
- TIME: wake_up() → speak German datetime with gesture loop, facing the speaker via the captured DoA angle → start_recording() → enter CONVERSING.
- CONVERSING (inner loop):
  - LISTENING: record_utterance() uses RMS-energy VAD with a threshold auto-calibrated from the ambient noise floor; head tracking toward the speaker is non-blocking (set_target) so the audio loop is never frozen. Exits on antenna press, or returns empty after IDLE_TIMEOUT (25 s) of silence.
  - PROCESSING: transcribe(chunks) → Google STT; get_response(messages) → Claude API.
  - RESPONDING: _speak_with_gestures() → back to LISTENING. Recording runs continuously throughout the conversation; record_utterance() drains the echo captured during playback.
  - Exit: antenna press or idle timeout → stop_recording() → goto_sleep() → SLEEPING.
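
The SLEEPING wake rule (three consecutive speech-detected readings, with a mute window after the robot's own speech) can be sketched as a small detector. The constants come from the description above; the WakeDetector class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without a robot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DOA_DEBOUNCE = 3            # consecutive speech-detected readings needed to wake
DEBOUNCE_AFTER_SPEAK = 2.0  # seconds to ignore audio after the robot spoke


@dataclass
class WakeDetector:
    streak: int = 0
    last_spoke_at: float = float("-inf")

    def robot_spoke(self, now: float) -> None:
        """Record that the robot just finished speaking."""
        self.last_spoke_at = now
        self.streak = 0

    def update(self, doa: Optional[Tuple[float, bool]], now: float) -> Optional[float]:
        """Feed one get_DoA() reading; return the wake angle, or None."""
        # Inside the mute window the robot may be hearing itself.
        if now - self.last_spoke_at < DEBOUNCE_AFTER_SPEAK:
            self.streak = 0
            return None
        # A missing reading or a non-speech reading breaks the streak.
        if doa is None or not doa[1]:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak >= DOA_DEBOUNCE:
            self.streak = 0
            return doa[0]  # angle used to face the speaker
        return None
```

At the 5 Hz polling rate this means roughly 0.6 s of continuous speech is needed before the robot wakes.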

Helper Modules

- talk/tts.py: synthesizes speech with edge-tts (Microsoft neural voice de-DE-KatjaNeural) to an MP3 and plays it via media.play_sound(); falls back to espeak-ng. Loudness is configurable (0-200, 100 = engine default) through the tts_volume setting, which is applied as edge-tts volume="+X%" and espeak -a. Blocks for the estimated playback duration.
- talk/stt.py: records from the ReSpeaker (16 kHz float32), runs a loudest-channel RMS-energy VAD whose threshold is auto-calibrated from ambient noise (logs ambient/threshold/max-RMS for tuning), converts to mono 16-bit WAV, and transcribes via Google Speech Recognition. DoA head tracking is non-blocking. Accepts an idle_timeout so a silent conversation returns to sleep.
- talk/llm.py: stateless Claude API wrapper; the caller owns the messages list. Resolves the API key via get_api_key(): the web-UI file (~/.config/talk/api_key) first, then ANTHROPIC_API_KEY. Also exposes has_api_key() and save_api_key() for the web UI.
- talk/config.py: JSON-backed store for non-secret settings at ~/.config/talk/settings.json (get_setting/set_setting). Holds tts_volume. Lives outside the repo, so it is never committed or packaged.
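
The loudest-channel RMS-energy VAD mentioned for talk/stt.py can be illustrated with a few lines of standard-library Python. This is a sketch under assumptions: the helper names, the factor of 3 over the ambient level, and the 0.005 floor are illustrative values, not the ones used in the module, and a chunk is modelled as a plain sequence of (left, right) sample pairs rather than the (N, 2) float32 arrays the SDK delivers.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Chunk = Sequence[Tuple[float, float]]  # (left, right) samples in [-1.0, 1.0]


def channel_rms(chunk: Chunk, channel: int) -> float:
    """Root-mean-square energy of one channel of a stereo chunk."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(frame[channel] ** 2 for frame in chunk) / len(chunk))


def loudest_rms(chunk: Chunk) -> float:
    """RMS of whichever channel is louder, so an off-axis speaker still counts."""
    return max(channel_rms(chunk, 0), channel_rms(chunk, 1))


def calibrate_threshold(ambient: Sequence[Chunk],
                        factor: float = 3.0,   # assumed margin over the noise floor
                        floor: float = 0.005   # assumed minimum threshold
                        ) -> float:
    """Derive a speech threshold from chunks recorded while nobody is talking."""
    noise = median([loudest_rms(c) for c in ambient])
    return max(noise * factor, floor)


def is_speech(chunk: Chunk, threshold: float) -> bool:
    """Classify a chunk as speech when its loudest channel exceeds the threshold."""
    return loudest_rms(chunk) > threshold
```

Using the median of the ambient chunks keeps a single door slam during calibration from inflating the threshold; logging the ambient level, the threshold, and the maximum RMS seen makes it practical to tune the two constants.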

Key SDK APIs

# Direction of Arrival: (angle_radians, speech_detected) or None
# 0 rad = left, π/2 = front, π = right
doa = reachy_mini.media.get_DoA()

# Audio recording (chunks are (N, 2) float32 arrays at 16 kHz)
reachy_mini.media.start_recording()
chunk = reachy_mini.media.get_audio_sample()  # None if no new data
reachy_mini.media.stop_recording()

# Audio playback (async — sleep afterward for estimated duration)
reachy_mini.media.play_sound("/abs/path/to/file.mp3")

# Head movement
reachy_mini.look_at_world(x, y, z, duration=0.5)  # forward=+x, right=+y
head_pose = reachy_mini.look_at_world(1.0, y, z, perform_movement=False)
reachy_mini.set_target(head=head_pose, antennas=[left, right])

# Built-in animations (blocking)
reachy_mini.wake_up()
reachy_mini.goto_sleep()
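
To face a speaker, a DoA angle has to be turned into a point for look_at_world. The helper below is a hypothetical sketch that relies only on the conventions stated in the comments above (0 rad = left, π/2 = front, π = right; forward = +x, right = +y); if the SDK's y axis points the other way, the sign of y must be flipped.

```python
import math
from typing import Tuple


def doa_to_target(angle_rad: float,
                  distance: float = 1.0,
                  height: float = 0.0) -> Tuple[float, float, float]:
    """Convert a DoA angle into an (x, y, z) point to look at.

    Assumes 0 rad = left, pi/2 = front, pi = right,
    with +x forward and +y to the right.
    """
    x = distance * math.sin(angle_rad)    # front (pi/2) gives maximum forward
    y = -distance * math.cos(angle_rad)   # left (0) gives -y, right (pi) gives +y
    return (x, y, height)
```

The resulting tuple could then be unpacked into look_at_world(x, y, z, perform_movement=False) to obtain a head pose for set_target, which keeps the tracking non-blocking.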

Settings UI

talk/static/ polls GET /status every second; the endpoint returns {state, last_user, last_assistant, api_key_set, tts_volume}. The page shows a colour-coded status chip and conversation bubbles (user on the right, assistant on the left) during CONVERSING. An Einstellungen (Settings) section lets the user (a) enter the Anthropic API key via POST /set_api_key ({api_key}) → save_api_key(), with the api_key_set flag driving a "key set?" indicator; and (b) set the voice loudness with a slider via POST /set_config ({tts_volume}) → config.set_setting().
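
How a slider value might travel from the /set_config payload to the two speech engines is sketched below. The helper names are hypothetical and the mapping is an assumption based on the stated scale (0-200, 100 = engine default): edge-tts takes a signed percentage relative to its default, while espeak's -a amplitude option uses the same 0-200 range directly.

```python
from typing import List


def clamp_volume(value) -> int:
    """Coerce a slider value from the request payload into the 0-200 range.

    Raises ValueError for non-numeric input, which a request handler
    could translate into an HTTP 400 response.
    """
    return max(0, min(200, int(round(float(value)))))


def edge_tts_volume(tts_volume: int) -> str:
    """Express the setting as a signed percentage relative to the default."""
    return f"{tts_volume - 100:+d}%"


def espeak_volume_args(tts_volume: int) -> List[str]:
    """Command-line arguments for the espeak-ng fallback."""
    return ["-a", str(tts_volume)]
```

For example, a slider position of 150 would become volume="+50%" for edge-tts and -a 150 for espeak-ng, and a position of 100 leaves both engines at their defaults.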