Commit 1ab5bef · 0 parents · committed by shaunx with Claude Opus 4.6

Initial release of ReachyClaw

OpenClaw AI agent embodied in a Reachy Mini robot. OpenClaw is the actual
brain: every message is routed through OpenClaw, and it controls the robot
body (movement, emotions, camera) via inline action tags. The OpenAI
Realtime API is used purely for voice I/O.

Based on ClawBody (Apache 2.0) with fundamental architecture changes:
- GPT-4o is a voice relay, not the brain
- OpenClaw controls all physical actions via [ACTION:param] tags
- No startup context fetch (instant startup)
- Dynamic action list built from TOOL_SPECS (single source of truth)
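The "single source of truth" idea in the last bullet can be sketched in a few lines: one spec list drives both the prompt's action list and any validation. This is a hypothetical illustration, not the actual ReachyClaw `TOOL_SPECS` schema; the field names (`name`, `params`, `description`) are assumptions.

```python
# Hypothetical sketch: derive the prompt's action list from one spec table.
# Field names and entries are illustrative, not the real ReachyClaw schema.
TOOL_SPECS = [
    {"name": "LOOK", "params": ["left", "right", "up", "down", "front"],
     "description": "Turn the head"},
    {"name": "EMOTION", "params": ["happy", "sad", "surprised"],
     "description": "Play an emotion animation"},
    {"name": "CAMERA", "params": [],
     "description": "Capture a camera frame"},
]

def build_action_list(specs):
    """Render one prompt line per tool, e.g. '[LOOK:left|right|...] - Turn the head'."""
    lines = []
    for spec in specs:
        if spec["params"]:
            tag = f"[{spec['name']}:{'|'.join(spec['params'])}]"
        else:
            tag = f"[{spec['name']}]"
        lines.append(f"{tag} - {spec['description']}")
    return "\n".join(lines)

print(build_action_list(TOOL_SPECS))
```

Because the prompt text is generated from the same table the executor reads, adding a tool cannot leave the prompt and the runtime out of sync.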

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.env.example ADDED
@@ -0,0 +1,44 @@
+ # ReachyClaw Configuration
+ # Give your OpenClaw AI agent a physical robot body!
+
+ # ==============================================================================
+ # REQUIRED: OpenAI API Key
+ # ==============================================================================
+ # Get your key at: https://platform.openai.com/api-keys
+ # Requires Realtime API access
+ OPENAI_API_KEY=sk-your-openai-key
+
+ # ==============================================================================
+ # REQUIRED: OpenClaw Gateway
+ # ==============================================================================
+ # The URL where your OpenClaw gateway is running
+ # If running on the same machine as the robot, use the host machine's IP
+ OPENCLAW_GATEWAY_URL=http://192.168.1.100:18789
+
+ # Your OpenClaw gateway authentication token
+ # Find this in ~/.openclaw/openclaw.json under gateway.token
+ OPENCLAW_TOKEN=your-gateway-token
+
+ # Agent ID to use (default: main)
+ OPENCLAW_AGENT_ID=main
+
+ # Session key for conversation context - IMPORTANT!
+ # Use "main" (default) to share context with WhatsApp and other DM channels
+ # This allows the robot to be aware of all your conversations
+ OPENCLAW_SESSION_KEY=main
+
+ # ==============================================================================
+ # OPTIONAL: Voice Settings
+ # ==============================================================================
+ # OpenAI Realtime voice (alloy, echo, fable, onyx, nova, shimmer, cedar)
+ OPENAI_VOICE=cedar
+
+ # OpenAI model for Realtime API
+ OPENAI_MODEL=gpt-4o-realtime-preview-2024-12-17
+
+ # ==============================================================================
+ # OPTIONAL: Features
+ # ==============================================================================
+ # Enable/disable features (true/false)
+ ENABLE_CAMERA=true
+ ENABLE_OPENCLAW_TOOLS=true
.github/workflows/sync-to-huggingface.yml ADDED
@@ -0,0 +1,23 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches: [main]
+   workflow_dispatch: # Allow manual trigger
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+           lfs: true
+
+       - name: Push to Hugging Face
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git remote add huggingface https://huggingface:$HF_TOKEN@huggingface.co/spaces/shaunx/reachyclaw
+           git push huggingface main --force
.gitignore ADDED
@@ -0,0 +1,62 @@
+ # ReachyClaw .gitignore
+
+ # Environment and secrets
+ .env
+ *.env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ .venv/
+ venv/
+ ENV/
+ env/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+ *~
+
+ # Testing
+ .pytest_cache/
+ .coverage
+ htmlcov/
+ .tox/
+ .nox/
+
+ # Type checking
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Logs
+ *.log
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Package manager
+ uv.lock
CONTRIBUTING.md ADDED
@@ -0,0 +1,78 @@
+ # Contributing to ReachyClaw
+
+ Thank you for your interest in contributing! This project welcomes contributions from the community.
+
+ ## How to Contribute
+
+ ### Reporting Bugs
+
+ If you find a bug, please open an issue with:
+ - A clear title and description
+ - Steps to reproduce the issue
+ - Expected vs actual behavior
+ - Your environment (OS, Python version, robot model)
+
+ ### Suggesting Features
+
+ Feature requests are welcome! Please open an issue with:
+ - A clear description of the feature
+ - Use cases and motivation
+ - Any technical considerations
+
+ ### Pull Requests
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Make your changes
+ 4. Add tests if applicable
+ 5. Run linting: `ruff check . && ruff format .`
+ 6. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 7. Push to the branch (`git push origin feature/amazing-feature`)
+ 8. Open a Pull Request
+
+ ## Development Setup
+
+ ```bash
+ # Clone your fork
+ git clone https://github.com/YOUR_USERNAME/reachyclaw.git
+ cd reachyclaw
+
+ # Install in development mode
+ pip install -e ".[dev]"
+
+ # Run tests
+ pytest
+
+ # Format code
+ ruff check --fix .
+ ruff format .
+ ```
+
+ ## Code Style
+
+ - Follow PEP 8
+ - Use type hints
+ - Write docstrings for public functions and classes
+ - Keep functions focused and small
+
+ ## Where to Submit Contributions
+
+ ### This Project
+ Submit PRs directly to this repository for:
+ - Bug fixes
+ - New features
+ - Documentation improvements
+ - New personality profiles
+
+ ### Reachy Mini Ecosystem
+ - **SDK improvements**: [pollen-robotics/reachy_mini](https://github.com/pollen-robotics/reachy_mini)
+ - **New dances/emotions**: [reachy_mini_dances_library](https://github.com/pollen-robotics/reachy_mini_dances_library)
+ - **Apps for the app store**: Submit to [Hugging Face Spaces](https://huggingface.co/spaces)
+
+ ### OpenClaw Ecosystem
+ - **New skills**: Submit to [MoltDirectory](https://github.com/neonone123/moltdirectory)
+ - **Core OpenClaw**: [openclaw/openclaw](https://github.com/openclaw/openclaw)
+
+ ## License
+
+ By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.
LICENSE ADDED
@@ -0,0 +1,191 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to the Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no theory of
+ liability, whether in contract, strict liability, or tort
+ (including negligence or otherwise) arising in any way out of
+ the use or inability to use the Work (even if such Holder or
+ other party has been advised of the possibility of such damages),
+ shall any Contributor be liable to You for damages, including any
+ direct, indirect, special, incidental, or consequential damages of
+ any character arising as a result of this License or out of the use
+ or inability to use the Work (including but not limited to damages
+ for loss of goodwill, work stoppage, computer failure or malfunction,
+ or any and all other commercial damages or losses), even if such
+ Contributor has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ Copyright 2024 Tom
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,185 @@
+ ---
+ title: ReachyClaw
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: static
+ pinned: false
+ short_description: OpenClaw AI agent with a Reachy Mini robot body
+ tags:
+   - reachy_mini
+   - reachy_mini_python_app
+   - openclaw
+   - robotics
+   - embodied-ai
+   - ai-assistant
+   - openai-realtime
+   - voice-assistant
+   - conversational-ai
+   - physical-ai
+   - robot-body
+   - speech-to-speech
+   - multimodal
+   - vision
+   - expressive-robot
+   - face-tracking
+   - human-robot-interaction
+ ---
+
+ # ReachyClaw
+
+ **Your OpenClaw AI agent, embodied in a Reachy Mini robot.**
+
+ ReachyClaw makes OpenClaw the actual brain of a Reachy Mini robot. Unlike typical setups where GPT-4o handles conversations and only calls OpenClaw occasionally, ReachyClaw routes **every** user message through OpenClaw. The robot speaks, moves, and sees — all controlled by your OpenClaw agent.
+
+ The OpenAI Realtime API is used purely for voice I/O (speech-to-text and text-to-speech). Your OpenClaw agent decides what to say **and** how the robot moves.
+
+ ## Architecture
+
+ ```
+ User speaks -> OpenAI Realtime API (STT only)
+   -> GPT-4o calls ask_openclaw with the user's message
+   -> OpenClaw (the actual brain) responds with text + action tags
+   -> ReachyClaw parses action tags -> robot moves (emotions, look, dance)
+   -> Clean text -> GPT-4o (TTS only) -> Robot speaks
+ ```
+
+ OpenClaw can include action tags like `[EMOTION:happy]`, `[LOOK:left]`, `[DANCE:excited]` in its responses. These are parsed and executed on the robot, then stripped so only the spoken words go to TTS.
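The parse-and-strip step described above can be sketched with a small regex pass. This is an illustrative sketch, not the actual ReachyClaw parser; the tag grammar is assumed from the examples in this README.

```python
import re

# Matches tags like [EMOTION:happy], [LOOK:left], or bare [CAMERA] / [STOP].
ACTION_TAG = re.compile(r"\[([A-Z_]+)(?::([a-z_]+))?\]")

def parse_actions(text):
    """Split an OpenClaw reply into (clean_text, actions).

    actions is a list of (tag, param) pairs to execute on the robot;
    clean_text is what gets sent to TTS.
    """
    actions = [(m.group(1), m.group(2)) for m in ACTION_TAG.finditer(text)]
    clean = ACTION_TAG.sub("", text)
    # Collapse the extra whitespace left behind by stripped tags.
    clean = re.sub(r"\s{2,}", " ", clean).strip()
    return clean, actions

clean, actions = parse_actions(
    "[EMOTION:happy] Great to see you! [LOOK:left] What's over there?"
)
# actions -> [("EMOTION", "happy"), ("LOOK", "left")]
# clean   -> "Great to see you! What's over there?"
```

Parameterless tags like `[CAMERA]` come back with `None` as the param, so the executor can dispatch on the tag name alone.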
+
+ ## Features
+
+ - **OpenClaw is the brain** — every message goes through your OpenClaw agent, not GPT-4o
+ - **Full body control** — OpenClaw controls head movement, emotions, dances, and camera
+ - **Real-time voice** — OpenAI Realtime API for low-latency speech I/O
+ - **Face tracking** — robot tracks your face and maintains eye contact
+ - **Camera vision** — robot can see through its camera and describe what it sees
+ - **Conversation memory** — OpenClaw maintains full context across sessions and channels
+ - **Works with simulator** — no physical robot required
+
+ ## Available Robot Actions
+
+ OpenClaw can use these action tags in responses:
+
+ | Action | Tags |
+ |--------|------|
+ | **Look** | `[LOOK:left]` `[LOOK:right]` `[LOOK:up]` `[LOOK:down]` `[LOOK:front]` |
+ | **Emotion** | `[EMOTION:happy]` `[EMOTION:sad]` `[EMOTION:surprised]` `[EMOTION:curious]` `[EMOTION:thinking]` `[EMOTION:confused]` `[EMOTION:excited]` |
+ | **Dance** | `[DANCE:happy]` `[DANCE:excited]` `[DANCE:wave]` `[DANCE:nod]` `[DANCE:shake]` `[DANCE:bounce]` |
+ | **Camera** | `[CAMERA]` |
+ | **Face Tracking** | `[FACE_TRACKING:on]` `[FACE_TRACKING:off]` |
+ | **Stop** | `[STOP]` |
+
+ ## Prerequisites
+
+ ### Option A: With Physical Robot
+ - [Reachy Mini](https://www.pollen-robotics.com/reachy-mini/) robot (Wireless or Lite)
+
+ ### Option B: With Simulator
+ - Any computer with Python 3.11+
+ - Install: `pip install "reachy-mini[mujoco]"`
+
+ ### Software (Both Options)
+ - Python 3.11+
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
+ - [OpenClaw](https://github.com/openclaw/openclaw) gateway running
+ - OpenAI API key with Realtime API access
+
+ ## Installation
+
+ ```bash
+ # Clone ReachyClaw
+ git clone https://github.com/shaunx/reachyclaw
+ cd reachyclaw
+
+ # Create virtual environment
+ python -m venv .venv
+ source .venv/bin/activate
+
+ # Install
+ pip install -e ".[mediapipe_vision]"
+
+ # Configure
+ cp .env.example .env
+ # Edit .env with your keys
+ ```
+
+ ## Configuration
+
+ Edit `.env`:
+
+ ```bash
+ # Required
+ OPENAI_API_KEY=sk-...your-key...
+
+ # OpenClaw Gateway (required)
+ OPENCLAW_GATEWAY_URL=http://localhost:18789
+ OPENCLAW_TOKEN=your-gateway-token
+ OPENCLAW_AGENT_ID=main
+
+ # Optional
+ OPENAI_VOICE=cedar
+ ENABLE_FACE_TRACKING=true
+ HEAD_TRACKER_TYPE=mediapipe
+ ```
+
+ ## Usage
+
+ ### With Simulator
+
+ ```bash
+ # Terminal 1: Start simulator
+ reachy-mini-daemon --sim
+
+ # Terminal 2: Run ReachyClaw
+ reachyclaw --gradio
+ ```
+
+ ### With Physical Robot
+
+ ```bash
+ reachyclaw
+
+ # With debug logging
+ reachyclaw --debug
+
+ # With specific robot
+ reachyclaw --robot-name my-reachy
+ ```
+
+ ### CLI Options
+
+ | Option | Description |
+ |--------|-------------|
+ | `--debug` | Enable debug logging |
+ | `--gradio` | Launch web UI instead of console mode |
+ | `--robot-name NAME` | Specify robot name for connection |
+ | `--gateway-url URL` | OpenClaw gateway URL |
+ | `--no-camera` | Disable camera functionality |
+ | `--no-openclaw` | Disable OpenClaw integration |
+ | `--head-tracker TYPE` | Face tracker: `mediapipe` or `yolo` |
+ | `--no-face-tracking` | Disable face tracking |
+
+ ## How It Differs from ClawBody
+
+ ClawBody (the stock app) uses GPT-4o as the brain and only calls OpenClaw occasionally for tools like calendar or weather. ReachyClaw inverts this:
+
+ | | ClawBody | ReachyClaw |
+ |---|---|---|
+ | **Brain** | GPT-4o (with OpenClaw context snapshot) | OpenClaw (every message) |
+ | **Body control** | GPT-4o decides movements | OpenClaw decides movements |
+ | **Startup** | 20-30s context fetch from OpenClaw | Instant (no context fetch needed) |
+ | **Memory** | Stale snapshot from startup | Live OpenClaw memory |
+ | **GPT-4o role** | Full agent | Voice relay only |
+
+ ## License
+
+ Apache 2.0 — see [LICENSE](LICENSE).
+
+ ## Acknowledgments
+
+ Built on top of:
+
+ - [Pollen Robotics](https://www.pollen-robotics.com/) — Reachy Mini robot, SDK, and simulator
+ - [OpenClaw](https://github.com/openclaw/openclaw) — AI agent framework
+ - [OpenAI](https://openai.com/) — Realtime API for voice I/O
+ - [ClawBody](https://github.com/tomrikert/clawbody) — Original Reachy Mini + OpenClaw app (Apache 2.0)
index.html ADDED
@@ -0,0 +1,201 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>ReachyClaw - Reachy Mini App</title>
+   <link rel="preconnect" href="https://fonts.googleapis.com">
+   <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+   <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&display=swap" rel="stylesheet">
+   <link rel="stylesheet" href="style.css">
+ </head>
+ <body>
+
+ <section class="hero">
+   <div class="topline">
+     <div class="brand">
+       <span class="logo">🤖</span>
+       <span class="brand-name">ReachyClaw</span>
+     </div>
+     <div class="pill">Voice conversation · OpenClaw brain · Full body control</div>
+   </div>
+
+   <div class="hero-grid">
+     <div class="hero-copy">
+       <div class="eyebrow">Reachy Mini App</div>
+       <h1>Your OpenClaw agent, embodied.</h1>
+       <p class="lede">
+         Give your OpenClaw AI agent a Reachy Mini robot body.
+         OpenClaw is the brain — it controls what the robot says,
+         how it moves, and what it sees. OpenAI Realtime API handles voice I/O.
+       </p>
+       <div class="hero-actions">
+         <a href="#simulator" class="btn primary">🖥️ Try with Simulator</a>
+         <a href="#features" class="btn ghost">See features</a>
+       </div>
+       <div class="hero-badges">
+         <span>🧠 OpenClaw brain</span>
+         <span>🎙️ OpenAI Realtime voice</span>
+         <span>💃 Full body control</span>
+         <span>🖥️ No robot required!</span>
+       </div>
+     </div>
+     <div class="hero-visual">
+       <div class="glass-card">
+         <img src="https://huggingface.co/spaces/pollen-robotics/reachy_mini_conversation_app/resolve/main/docs/assets/reachy_mini_dance.gif"
+              alt="Reachy Mini Robot Dancing"
+              class="hero-gif">
+         <p class="caption">Works with physical robot OR MuJoCo simulator!</p>
+       </div>
+     </div>
+   </div>
+ </section>
+
+ <section class="section simulator-callout" id="simulator">
+   <div class="story-card highlight">
+     <h2>🖥️ No Robot? No Problem!</h2>
+     <p class="story-text" style="font-size: 1.1rem;">
+       <strong>You don't need a physical Reachy Mini robot to use ReachyClaw!</strong><br><br>
+       ReachyClaw works with the Reachy Mini Simulator, a MuJoCo-based physics simulation
+       that runs on your computer. Watch your agent move and express emotions on screen
+       while you talk.
+     </p>
+     <div class="architecture-preview" style="margin: 1.5rem 0;">
+       <pre>
+ # Install simulator support
+ pip install "reachy-mini[mujoco]"
+
+ # Start the simulator (opens 3D window)
+ reachy-mini-daemon --sim
+
+ # In another terminal, run ReachyClaw
+ reachyclaw --gradio
+       </pre>
+     </div>
+     <p class="caption">🍎 Mac Users: Use <code>mjpython -m reachy_mini.daemon.app.main --sim</code> instead</p>
+     <a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started" class="btn primary" style="margin-top: 1rem;" target="_blank">
+       📚 Simulator Setup Guide
+     </a>
+   </div>
+ </section>
+
+ <section class="section" id="features">
+   <div class="section-header">
+     <h2>What's inside</h2>
+     <p class="intro">
+       ReachyClaw makes OpenClaw the actual brain — every message, every movement, every decision.
+     </p>
+   </div>
+   <div class="feature-grid">
+     <div class="feature-card">
+       <div class="icon">🧠</div>
+       <h3>OpenClaw is the brain</h3>
+       <p>Every user message goes through your OpenClaw agent. No GPT-4o guessing — real responses with full tool access.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🎤</div>
+       <h3>Real-time voice</h3>
+       <p>OpenAI Realtime API for low-latency speech-to-text and text-to-speech. Voice I/O only — no GPT-4o brain.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🤖</div>
+       <h3>Full body control</h3>
+       <p>OpenClaw controls the robot body via action tags — head movement, emotions, dances, camera, face tracking.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">👀</div>
+       <h3>Vision</h3>
+       <p>See through the robot's camera. Your agent can look around and describe what it sees.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🖥️</div>
+       <h3>Simulator support</h3>
+       <p>No robot? Run with MuJoCo simulator and watch your agent move in a 3D window.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">⚡</div>
+       <h3>Instant startup</h3>
+       <p>No 30-second context fetch. GPT-4o is just a relay — the session starts immediately.</p>
+     </div>
+   </div>
+ </section>
+
+ <section class="section story" id="how-it-works">
+   <div class="story-grid">
+     <div class="story-card">
+       <h3>How it works</h3>
+       <p class="story-text">OpenClaw controls everything</p>
+       <ol class="story-list">
+         <li><span>🎤</span> Robot captures your voice</li>
+         <li><span>📝</span> OpenAI Realtime transcribes your speech</li>
+         <li><span>🧠</span> Your message goes to OpenClaw (the real brain)</li>
+         <li><span>🤖</span> OpenClaw responds with text + action tags like [EMOTION:happy]</li>
+         <li><span>💃</span> ReachyClaw executes the actions on the robot</li>
+         <li><span>🔊</span> Clean text goes to TTS — robot speaks while moving</li>
+       </ol>
+     </div>
+     <div class="story-card secondary">
+       <h3>Prerequisites</h3>
+       <p class="story-text">Choose your setup:</p>
+       <div class="chips">
+         <span class="chip">🧠 OpenClaw Gateway</span>
+         <span class="chip">🔑 OpenAI API Key</span>
+         <span class="chip">🐍 Python 3.11+</span>
+       </div>
+       <p class="story-text" style="margin-top: 1rem;">
+         <strong>Option A:</strong> 🤖 Physical Reachy Mini robot<br>
+         <strong>Option B:</strong> 🖥️ MuJoCo Simulator (free, no hardware!)
+       </p>
+       <a href="https://github.com/shaunx/reachyclaw#readme" class="btn ghost wide" style="margin-top: 1rem;">
+         View installation guide
+       </a>
+     </div>
+   </div>
+ </section>
+
+ <section class="section">
+   <div class="section-header">
+     <h2>Quick start</h2>
+     <p class="intro">Get ReachyClaw running with the simulator</p>
+   </div>
+   <div class="story-card">
+     <div class="architecture-preview">
+       <pre>
+ # Clone ReachyClaw
+ git clone https://github.com/shaunx/reachyclaw
+ cd reachyclaw
+
+ # Create virtual environment
+ python -m venv .venv
+ source .venv/bin/activate
+
+ # Install ReachyClaw + simulator
+ pip install -e .
+ pip install "reachy-mini[mujoco]"
+
+ # Configure (edit with your OpenClaw URL and OpenAI key)
+ cp .env.example .env
+ nano .env
+
+ # Terminal 1: Start simulator
+ reachy-mini-daemon --sim
+
+ # Terminal 2: Run ReachyClaw
+ reachyclaw --gradio
+       </pre>
+     </div>
+   </div>
+ </section>
+
+ <footer class="footer">
+   <p>
+     ReachyClaw — your OpenClaw agent, embodied in Reachy Mini.<br>
+     <strong>Works with physical robot OR simulator!</strong><br><br>
+     Learn more about <a href="https://github.com/openclaw/openclaw">OpenClaw</a>,
+     <a href="https://github.com/pollen-robotics/reachy_mini">Reachy Mini</a>, and
+     <a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started">the Simulator</a>.
+   </p>
+ </footer>
+
+ </body>
+ </html>
openclaw-skill/SKILL.md ADDED
@@ -0,0 +1,109 @@
1
+ ---
2
+ name: reachyclaw
3
+ description: Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain — it controls speech, movement, and vision. Works with physical robot OR simulator!
4
+ ---
5
+
6
+ # ReachyClaw - Robot Body for OpenClaw
7
+
8
+ Give your OpenClaw agent a physical Reachy Mini robot body where OpenClaw is the actual brain.
9
+
10
+ ## Overview
11
+
12
+ ReachyClaw embodies your OpenClaw AI agent in a Reachy Mini robot. Unlike typical setups where GPT-4o is the brain, ReachyClaw routes every message through OpenClaw and lets it control the robot body via action tags.
13
+
14
+ - **Hear**: Listen to voice commands via the robot's microphone
15
+ - **See**: View the world through the robot's camera
16
+ - **Speak**: Respond with natural voice through the robot's speaker
17
+ - **Move**: Control head movements, emotions, and dances from OpenClaw
18
+
19
+ ## Architecture
20
+
21
+ ```
22
+ You speak -> Reachy Mini microphone
23
+ |
24
+ OpenAI Realtime API (STT only)
25
+ |
26
+ OpenClaw (the actual brain)
27
+ |
28
+ Response: "[EMOTION:happy] That's great!"
29
+ |
30
+ ReachyClaw parses actions -> robot moves
31
+ Clean text -> TTS -> robot speaks
32
+ ```
33
+
34
+ ## Requirements
35
+
36
+ ### Option A: Physical Robot
37
+ - [Reachy Mini](https://github.com/pollen-robotics/reachy_mini) robot (Wireless or Lite)
38
+
39
+ ### Option B: Simulator (No Hardware Required!)
40
+ - Any computer with Python 3.11+
41
+ - Install: `pip install "reachy-mini[mujoco]"`
42
+
43
+ ### Software (Both Options)
44
+ - Python 3.11+
45
+ - OpenAI API key with Realtime API access
46
+ - OpenClaw gateway running on your network
47
+
48
+ ## Installation
49
+
50
+ ```bash
51
+ git clone https://github.com/shaunx/reachyclaw
52
+ cd reachyclaw
53
+ pip install -e .
54
+ ```
55
+
56
+ ## Configuration
57
+
58
+ Create a `.env` file:
59
+
60
+ ```bash
61
+ OPENAI_API_KEY=sk-your-key-here
62
+ OPENCLAW_GATEWAY_URL=http://your-host-ip:18789
63
+ OPENCLAW_TOKEN=your-gateway-token
64
+ ```
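Reading these variables in Python can be sketched with the standard library alone (in the real app, `python-dotenv`'s `load_dotenv()` populates the environment from `.env` first). The function name and the fallback gateway URL below are illustrative assumptions:

```python
import os

def load_settings() -> dict[str, str]:
    """Read required and optional settings from the environment.

    In ReachyClaw, load_dotenv() runs before this so .env values are visible here.
    """
    api_key = os.getenv("OPENAI_API_KEY", "")
    if not api_key:
        # Fail fast: nothing works without Realtime API access
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "api_key": api_key,
        "gateway_url": os.getenv("OPENCLAW_GATEWAY_URL", "http://localhost:18789"),
        "token": os.getenv("OPENCLAW_TOKEN", ""),
    }
```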
65
+
66
+ ## Usage
67
+
68
+ ### With Simulator
69
+
70
+ ```bash
71
+ # Terminal 1: Start simulator
72
+ reachy-mini-daemon --sim
73
+
74
+ # Terminal 2: Run ReachyClaw
75
+ reachyclaw --gradio
76
+ ```
77
+
78
+ ### With Physical Robot
79
+
80
+ ```bash
81
+ reachyclaw
82
+
83
+ # With debug logging
84
+ reachyclaw --debug
85
+
86
+ # With Gradio web UI
87
+ reachyclaw --gradio
88
+ ```
89
+
90
+ ## Robot Actions
91
+
92
+ OpenClaw can include these action tags in its responses:
93
+
94
+ - `[LOOK:left/right/up/down/front]` — head movement
95
+ - `[EMOTION:happy/sad/surprised/curious/thinking/confused/excited]` — emotions
96
+ - `[DANCE:happy/excited/wave/nod/shake/bounce]` — dances
97
+ - `[CAMERA]` — capture what the robot sees
98
+ - `[FACE_TRACKING:on/off]` — toggle face tracking
99
+ - `[STOP]` — stop all movements
100
+
101
+ ## Links
102
+
103
+ - [GitHub Repository](https://github.com/shaunx/reachyclaw)
104
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
105
+ - [OpenClaw Documentation](https://docs.openclaw.ai)
106
+
107
+ ## License
108
+
109
+ Apache 2.0
pyproject.toml ADDED
@@ -0,0 +1,120 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "reachyclaw"
7
+ version = "0.1.0"
8
+ description = "ReachyClaw - Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain; OpenAI Realtime API handles voice I/O."
9
+ readme = "README.md"
10
+ license = {text = "Apache-2.0"}
11
+ requires-python = ">=3.11"
12
+ authors = [
13
+ {name = "Shaun"}
14
+ ]
15
+ keywords = [
16
+ "reachyclaw",
17
+ "reachy-mini",
18
+ "openclaw",
19
+ "robotics",
20
+ "ai-assistant",
21
+ "openai-realtime",
22
+ "voice-conversation",
23
+ "expressive-robot",
24
+ "embodied-ai"
25
+ ]
26
+ classifiers = [
27
+ "Development Status :: 4 - Beta",
28
+ "Intended Audience :: Developers",
29
+ "License :: OSI Approved :: Apache Software License",
30
+ "Programming Language :: Python :: 3",
31
+ "Programming Language :: Python :: 3.11",
32
+ "Programming Language :: Python :: 3.12",
33
+ "Topic :: Scientific/Engineering :: Artificial Intelligence",
34
+ "Topic :: Scientific/Engineering :: Human Machine Interfaces",
35
+ ]
36
+ dependencies = [
37
+ # OpenAI Realtime API
38
+ "openai>=1.50.0",
39
+
40
+ # Audio streaming
41
+ "fastrtc>=0.0.17",
42
+ "numpy",
43
+ "scipy",
44
+
45
+ # OpenClaw gateway client (WebSocket protocol)
46
+ "websockets>=12.0",
47
+
48
+ # Gradio UI
49
+ "gradio>=4.0.0",
50
+
51
+ # Environment
52
+ "python-dotenv",
53
+ ]
54
+
55
+ # Note: reachy-mini SDK must be installed separately from the robot or GitHub:
56
+ # pip install git+https://github.com/pollen-robotics/reachy_mini.git
57
+ # Or on the robot, it's pre-installed.
58
+
59
+ [project.optional-dependencies]
60
+ wireless = [
61
+ "pygobject",
62
+ ]
63
+ # YOLO-based face tracking (recommended - more accurate)
64
+ yolo_vision = [
65
+ "opencv-python",
66
+ "ultralytics",
67
+ "supervision",
68
+ ]
69
+ # MediaPipe-based face tracking (lighter weight alternative)
70
+ mediapipe_vision = [
71
+ "mediapipe>=0.10.14",
72
+ ]
73
+ # All vision options
74
+ all_vision = [
75
+ "opencv-python",
76
+ "ultralytics",
77
+ "supervision",
78
+ "mediapipe>=0.10.14",
79
+ ]
80
+ # Legacy alias
81
+ vision = [
82
+ "opencv-python",
83
+ "ultralytics",
84
+ "supervision",
85
+ ]
86
+ dev = [
87
+ "pytest",
88
+ "pytest-asyncio",
89
+ "ruff",
90
+ "mypy",
91
+ ]
92
+
93
+ [project.scripts]
94
+ reachyclaw = "reachy_mini_openclaw.main:main"
95
+
96
+ [project.entry-points."reachy_mini_apps"]
97
+ reachyclaw = "reachy_mini_openclaw.main:ReachyClawApp"
98
+
99
+ [project.urls]
100
+ Homepage = "https://github.com/shaunx/reachyclaw"
101
+ Documentation = "https://github.com/shaunx/reachyclaw#readme"
102
+ Repository = "https://github.com/shaunx/reachyclaw"
103
+ Issues = "https://github.com/shaunx/reachyclaw/issues"
104
+
105
+ [tool.setuptools.packages.find]
106
+ where = ["src"]
107
+
108
+ [tool.ruff]
109
+ line-length = 120
110
+ target-version = "py311"
111
+
112
+ [tool.ruff.lint]
113
+ select = ["E", "F", "I", "N", "W", "UP"]
114
+ ignore = ["E501"]
115
+
116
+ [tool.mypy]
117
+ python_version = "3.11"
118
+ warn_return_any = true
119
+ warn_unused_configs = true
120
+ ignore_missing_imports = true
src/reachy_mini_openclaw/__init__.py ADDED
@@ -0,0 +1,9 @@
1
+ """Reachy Mini OpenClaw - Give your OpenClaw AI agent a physical presence.
2
+
3
+ This package combines OpenAI's Realtime API for responsive voice conversation
4
+ with Reachy Mini's expressive robot movements, allowing your OpenClaw agent
5
+ to see, hear, and speak through a physical robot body.
6
+ """
7
+
8
+ __version__ = "0.1.0"
9
+ __author__ = "Shaun"
src/reachy_mini_openclaw/audio/__init__.py ADDED
@@ -0,0 +1,5 @@
1
+ """Audio processing modules for Reachy Mini OpenClaw."""
2
+
3
+ from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
4
+
5
+ __all__ = ["HeadWobbler"]
src/reachy_mini_openclaw/audio/head_wobbler.py ADDED
@@ -0,0 +1,223 @@
1
+ """Audio-driven head movement for natural speech animation.
2
+
3
+ This module analyzes audio output in real-time and generates subtle head
4
+ movements that make the robot appear more expressive and alive while speaking.
5
+
6
+ The wobble is generated based on:
7
+ - Audio amplitude (volume) -> vertical movement
8
+ - Frequency content -> horizontal sway
9
+ - Speech rhythm -> timing of movements
10
+
11
+ Design:
12
+ - Runs in a separate thread to avoid blocking the main audio pipeline
13
+ - Uses a circular buffer for smooth interpolation
14
+ - Generates offsets that are added to the primary pose by MovementManager
15
+ """
16
+
17
+ import base64
18
+ import logging
19
+ import threading
20
+ import time
21
+ from collections import deque
22
+ from typing import Callable, Optional, Tuple
23
+
24
+ import numpy as np
25
+ from numpy.typing import NDArray
26
+
27
+ logger = logging.getLogger(__name__)
28
+
29
+ # Type alias for speech offsets: (x, y, z, roll, pitch, yaw)
30
+ SpeechOffsets = Tuple[float, float, float, float, float, float]
31
+
32
+
33
+ class HeadWobbler:
34
+ """Generate audio-driven head movements for expressive speech.
35
+
36
+ The wobbler analyzes incoming audio and produces subtle head movements
37
+ that are synchronized with speech patterns, making the robot appear
38
+ more natural and engaged during conversation.
39
+
40
+ Example:
41
+ def apply_offsets(offsets):
42
+ movement_manager.set_speech_offsets(offsets)
43
+
44
+ wobbler = HeadWobbler(set_speech_offsets=apply_offsets)
45
+ wobbler.start()
46
+
47
+ # Feed audio as it's played
48
+ wobbler.feed(base64_audio_chunk)
49
+
50
+ wobbler.stop()
51
+ """
52
+
53
+ def __init__(
54
+ self,
55
+ set_speech_offsets: Callable[[SpeechOffsets], None],
56
+ sample_rate: int = 24000,
57
+ update_rate: float = 30.0, # Hz
58
+ ):
59
+ """Initialize the head wobbler.
60
+
61
+ Args:
62
+ set_speech_offsets: Callback to apply offsets to the movement system
63
+ sample_rate: Expected audio sample rate (Hz)
64
+ update_rate: How often to update offsets (Hz)
65
+ """
66
+ self.set_speech_offsets = set_speech_offsets
67
+ self.sample_rate = sample_rate
68
+ self.update_period = 1.0 / update_rate
69
+
70
+ # Audio analysis parameters
71
+ self.amplitude_scale = 0.008 # Max displacement in meters
72
+ self.roll_scale = 0.15 # Max roll in radians
73
+ self.pitch_scale = 0.08 # Max pitch in radians
74
+ self.smoothing = 0.3 # Smoothing factor (0-1)
75
+
76
+ # State
77
+ self._audio_buffer: deque[NDArray[np.float32]] = deque(maxlen=10)
78
+ self._buffer_lock = threading.Lock()
79
+ self._current_amplitude = 0.0
80
+ self._current_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
81
+
82
+ # Thread control
83
+ self._stop_event = threading.Event()
84
+ self._thread: Optional[threading.Thread] = None
85
+ self._last_feed_time = 0.0
86
+ self._is_speaking = False
87
+
88
+ # Decay parameters for smooth return to neutral
89
+ self._decay_rate = 3.0 # How fast to decay when not speaking
90
+ self._speech_timeout = 0.3 # Seconds of silence before decay starts
91
+
92
+ def start(self) -> None:
93
+ """Start the wobbler thread."""
94
+ if self._thread is not None and self._thread.is_alive():
95
+ logger.warning("HeadWobbler already running")
96
+ return
97
+
98
+ self._stop_event.clear()
99
+ self._thread = threading.Thread(target=self._run_loop, daemon=True)
100
+ self._thread.start()
101
+ logger.debug("HeadWobbler started")
102
+
103
+ def stop(self) -> None:
104
+ """Stop the wobbler thread."""
105
+ self._stop_event.set()
106
+ if self._thread is not None:
107
+ self._thread.join(timeout=1.0)
108
+ self._thread = None
109
+
110
+ # Reset to neutral
111
+ self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
112
+ logger.debug("HeadWobbler stopped")
113
+
114
+ def reset(self) -> None:
115
+ """Reset the wobbler state (call when speech ends or is interrupted)."""
116
+ with self._buffer_lock:
117
+ self._audio_buffer.clear()
118
+ self._current_amplitude = 0.0
119
+ self._is_speaking = False
120
+ self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
121
+ logger.debug("HeadWobbler reset")
122
+
123
+ def feed(self, audio_b64: str) -> None:
124
+ """Feed audio data to the wobbler.
125
+
126
+ Args:
127
+ audio_b64: Base64-encoded PCM audio (int16)
128
+ """
129
+ try:
130
+ audio_bytes = base64.b64decode(audio_b64)
131
+ audio_int16 = np.frombuffer(audio_bytes, dtype=np.int16)
132
+ audio_float = audio_int16.astype(np.float32) / 32768.0
133
+
134
+ with self._buffer_lock:
135
+ self._audio_buffer.append(audio_float)
136
+
137
+ self._last_feed_time = time.monotonic()
138
+ self._is_speaking = True
139
+
140
+ except Exception as e:
141
+ logger.debug("Error feeding audio to wobbler: %s", e)
142
+
143
+ def _compute_amplitude(self) -> float:
144
+ """Compute current audio amplitude from buffer."""
145
+ with self._buffer_lock:
146
+ if not self._audio_buffer:
147
+ return 0.0
148
+
149
+ # Concatenate recent audio
150
+ audio = np.concatenate(list(self._audio_buffer))
151
+
152
+ # RMS amplitude
153
+ rms = np.sqrt(np.mean(audio ** 2))
154
+ return min(1.0, rms * 3.0) # Scale and clamp
155
+
156
+ def _compute_offsets(self, amplitude: float, t: float) -> SpeechOffsets:
157
+ """Compute head offsets based on amplitude and time.
158
+
159
+ Args:
160
+ amplitude: Current audio amplitude (0-1)
161
+ t: Current time for oscillation
162
+
163
+ Returns:
164
+ Tuple of (x, y, z, roll, pitch, yaw) offsets
165
+ """
166
+ if amplitude < 0.01:
167
+ return (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
168
+
169
+ # Vertical bob based on amplitude
170
+ z_offset = amplitude * self.amplitude_scale * np.sin(t * 8.0)
171
+
172
+ # Subtle roll sway
173
+ roll_offset = amplitude * self.roll_scale * np.sin(t * 3.0)
174
+
175
+ # Pitch variation
176
+ pitch_offset = amplitude * self.pitch_scale * np.sin(t * 5.0 + 0.5)
177
+
178
+ # Small yaw drift
179
+ yaw_offset = amplitude * 0.05 * np.sin(t * 2.0)
180
+
181
+ return (0.0, 0.0, z_offset, roll_offset, pitch_offset, yaw_offset)
182
+
183
+ def _run_loop(self) -> None:
184
+ """Main wobbler loop."""
185
+ start_time = time.monotonic()
186
+
187
+ while not self._stop_event.is_set():
188
+ loop_start = time.monotonic()
189
+ t = loop_start - start_time
190
+
191
+ # Check if we're still receiving audio
192
+ silence_duration = loop_start - self._last_feed_time
193
+
194
+ if silence_duration > self._speech_timeout:
195
+ # Decay amplitude when not speaking
196
+ self._current_amplitude *= np.exp(-self._decay_rate * self.update_period)
197
+ self._is_speaking = False
198
+ else:
199
+ # Compute new amplitude with smoothing
200
+ raw_amplitude = self._compute_amplitude()
201
+ self._current_amplitude = (
202
+ self.smoothing * raw_amplitude +
203
+ (1 - self.smoothing) * self._current_amplitude
204
+ )
205
+
206
+ # Compute and apply offsets
207
+ offsets = self._compute_offsets(self._current_amplitude, t)
208
+
209
+ # Smooth transition between offsets
210
+ new_offsets = tuple(
211
+ self.smoothing * new + (1 - self.smoothing) * old
212
+ for new, old in zip(offsets, self._current_offsets)
213
+ )
214
+ self._current_offsets = new_offsets
215
+
216
+ # Apply to movement system
217
+ self.set_speech_offsets(new_offsets)
218
+
219
+ # Maintain update rate
220
+ elapsed = time.monotonic() - loop_start
221
+ sleep_time = max(0.0, self.update_period - elapsed)
222
+ if sleep_time > 0:
223
+ time.sleep(sleep_time)
src/reachy_mini_openclaw/camera_worker.py ADDED
@@ -0,0 +1,382 @@
1
+ """Camera worker thread with frame buffering and face tracking.
2
+
3
+ Provides:
4
+ - ~25Hz camera polling with thread-safe frame buffering
5
+ - Face tracking integration with smooth interpolation
6
+ - Room scanning when no face is detected
7
+ - Latest frame always available for tools
8
+ - Smooth return to neutral when face is lost
9
+
10
+ Based on pollen-robotics/reachy_mini_conversation_app camera worker.
11
+ """
12
+
13
+ import logging
14
+ import threading
15
+ import time
16
+ from typing import Any, List, Tuple, Optional
17
+
18
+ import numpy as np
19
+ from numpy.typing import NDArray
20
+ from scipy.spatial.transform import Rotation as R
21
+
22
+ from reachy_mini import ReachyMini
23
+ from reachy_mini.utils.interpolation import linear_pose_interpolation
24
+
25
+
26
+ logger = logging.getLogger(__name__)
27
+
28
+
29
+ class CameraWorker:
30
+ """Thread-safe camera worker with frame buffering and face tracking.
31
+
32
+ State machine for face tracking:
33
+ SCANNING -- no face known, sweeping the room to find one
34
+ TRACKING -- face detected, following it with head offsets
35
+ WAITING -- face just lost, holding position briefly
36
+ RETURNING -- interpolating back to neutral before scanning again
37
+ """
38
+
39
+ def __init__(self, reachy_mini: ReachyMini, head_tracker: Any = None) -> None:
40
+ """Initialize camera worker.
41
+
42
+ Args:
43
+ reachy_mini: Connected ReachyMini instance
44
+ head_tracker: Optional head tracker (YOLO or MediaPipe)
45
+ """
46
+ self.reachy_mini = reachy_mini
47
+ self.head_tracker = head_tracker
48
+
49
+ # Thread-safe frame storage
50
+ self.latest_frame: Optional[NDArray[np.uint8]] = None
51
+ self.frame_lock = threading.Lock()
52
+ self._stop_event = threading.Event()
53
+ self._thread: Optional[threading.Thread] = None
54
+
55
+ # Face tracking state
56
+ self.is_head_tracking_enabled = True
57
+ self.face_tracking_offsets: List[float] = [
58
+ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
59
+ ] # x, y, z, roll, pitch, yaw
60
+ self.face_tracking_lock = threading.Lock()
61
+
62
+ # Face tracking timing (for smooth interpolation back to neutral)
63
+ self.last_face_detected_time: Optional[float] = None
64
+ self.interpolation_start_time: Optional[float] = None
65
+ self.interpolation_start_pose: Optional[NDArray[np.float32]] = None
66
+ self.face_lost_delay = 2.0 # seconds to wait before starting interpolation
67
+ self.interpolation_duration = 1.0 # seconds to interpolate back to neutral
68
+
69
+ # Track state changes
70
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
71
+
72
+ # Tracking scale factor (proportional gain for the camera-head servo loop).
73
+ # 0.85 provides accurate convergence via closed-loop feedback while
74
+ # avoiding single-frame overshoot that causes jitter.
75
+ self.tracking_scale = 0.85
76
+
77
+ # Smoothing factor for exponential moving average (0.0-1.0)
78
+ # At 25Hz with alpha=0.25, 95% convergence ~0.5s -- smooth enough to
79
+ # filter detection noise, responsive enough to feel like eye contact.
80
+ self.smoothing_alpha = 0.25
81
+
82
+ # Previous smoothed offsets for EMA calculation
83
+ self._smoothed_offsets: List[float] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
84
+
85
+ # --- Room scanning state ---
86
+ # When no face is visible, the robot periodically sweeps the room.
87
+ self._scanning = False
88
+ self._scanning_start_time = 0.0
89
+ # Scanning pattern: sinusoidal yaw sweep
90
+ self._scan_yaw_amplitude = np.deg2rad(35) # ±35 degrees
91
+ self._scan_period = 8.0 # seconds for a full left-right-left cycle
92
+ self._scan_pitch_offset = np.deg2rad(3) # slight upward tilt while scanning
93
+ # Start scanning immediately at boot (before any face has ever been seen)
94
+ self._ever_seen_face = False
95
+
96
+ def get_latest_frame(self) -> Optional[NDArray[np.uint8]]:
97
+ """Get the latest frame (thread-safe).
98
+
99
+ Returns:
100
+ Copy of latest frame in BGR format, or None if no frame available
101
+ """
102
+ with self.frame_lock:
103
+ if self.latest_frame is None:
104
+ return None
105
+ return self.latest_frame.copy()
106
+
107
+ def get_face_tracking_offsets(
108
+ self,
109
+ ) -> Tuple[float, float, float, float, float, float]:
110
+ """Get current face tracking offsets (thread-safe).
111
+
112
+ Returns:
113
+ Tuple of (x, y, z, roll, pitch, yaw) offsets
114
+ """
115
+ with self.face_tracking_lock:
116
+ offsets = self.face_tracking_offsets
117
+ return (offsets[0], offsets[1], offsets[2], offsets[3], offsets[4], offsets[5])
118
+
119
+ def set_head_tracking_enabled(self, enabled: bool) -> None:
120
+ """Enable/disable head tracking.
121
+
122
+ Args:
123
+ enabled: Whether to enable face tracking
124
+ """
125
+ if enabled and not self.is_head_tracking_enabled:
126
+ # Reset smoothed offsets so tracking converges quickly from scratch
127
+ self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
128
+ # Start scanning immediately when re-enabled
129
+ self._start_scanning()
130
+ self.is_head_tracking_enabled = enabled
131
+ logger.info("Head tracking %s", "enabled" if enabled else "disabled")
132
+
133
+ def start(self) -> None:
134
+ """Start the camera worker loop in a thread."""
135
+ self._stop_event.clear()
136
+ self._thread = threading.Thread(target=self._working_loop, daemon=True)
137
+ self._thread.start()
138
+ logger.info("Camera worker started")
139
+
140
+ def stop(self) -> None:
141
+ """Stop the camera worker loop."""
142
+ self._stop_event.set()
143
+ if self._thread is not None:
144
+ self._thread.join(timeout=2.0)
145
+ logger.info("Camera worker stopped")
146
+
147
+ # ------------------------------------------------------------------
148
+ # Scanning helpers
149
+ # ------------------------------------------------------------------
150
+
151
+ def _start_scanning(self) -> None:
152
+ """Begin the room-scanning sweep."""
153
+ if not self._scanning:
154
+ self._scanning = True
155
+ self._scanning_start_time = time.time()
156
+ logger.debug("Started room scanning")
157
+
158
+ def _stop_scanning(self) -> None:
159
+ """Stop the room-scanning sweep."""
160
+ if self._scanning:
161
+ self._scanning = False
162
+ logger.debug("Stopped room scanning")
163
+
164
+ def _update_scanning_offsets(self, current_time: float) -> None:
165
+ """Compute scanning offsets -- a slow yaw sweep with slight pitch up.
166
+
167
+ The sweep is sinusoidal so the head slows at the extremes (more natural)
168
+ and the face detector gets a chance to catch faces at the edges.
169
+ """
170
+ t = current_time - self._scanning_start_time
171
+
172
+ yaw = float(self._scan_yaw_amplitude * np.sin(2 * np.pi * t / self._scan_period))
173
+ pitch = float(self._scan_pitch_offset)
174
+
175
+ with self.face_tracking_lock:
176
+ self.face_tracking_offsets = [0.0, 0.0, 0.0, 0.0, pitch, yaw]
177
+
178
+ # ------------------------------------------------------------------
179
+ # Main loop
180
+ # ------------------------------------------------------------------
181
+
182
+ def _working_loop(self) -> None:
183
+ """Main camera worker loop.
184
+
185
+ Runs at ~25Hz, captures frames and processes face tracking.
186
+ """
187
+ logger.debug("Starting camera working loop")
188
+
189
+ # Neutral pose for interpolation target
190
+ neutral_pose = np.eye(4, dtype=np.float32)
191
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
192
+
193
+ # Begin scanning right away so the robot looks for a face on startup
194
+ if self.is_head_tracking_enabled and self.head_tracker is not None:
195
+ self._start_scanning()
196
+
197
+ while not self._stop_event.is_set():
198
+ try:
199
+ current_time = time.time()
200
+
201
+ # Get frame from robot
202
+ frame = self.reachy_mini.media.get_frame()
203
+
204
+ if frame is not None:
205
+ # Thread-safe frame storage
206
+ with self.frame_lock:
207
+ self.latest_frame = frame
208
+
209
+ # Check if face tracking was just disabled
210
+ if self.previous_head_tracking_state and not self.is_head_tracking_enabled:
211
+ # Face tracking was just disabled - start interpolation to neutral
212
+ self.last_face_detected_time = current_time
213
+ self.interpolation_start_time = None
214
+ self.interpolation_start_pose = None
215
+ self._stop_scanning()
216
+
217
+ # Update tracking state
218
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
219
+
220
+ # Handle face tracking if enabled and head tracker available
221
+ if self.is_head_tracking_enabled and self.head_tracker is not None:
222
+ self._process_face_tracking(frame, current_time, neutral_pose)
223
+ elif self.last_face_detected_time is not None:
224
+ # Handle interpolation back to neutral when tracking disabled
225
+ self._interpolate_to_neutral(current_time, neutral_pose)
226
+
227
+ # Sleep to maintain ~25Hz
228
+ time.sleep(0.04)
229
+
230
+ except Exception as e:
231
+ logger.error("Camera worker error: %s", e)
232
+ time.sleep(0.1)
233
+
234
+ logger.debug("Camera worker thread exited")
235
+
236
+ def _process_face_tracking(
237
+ self,
238
+ frame: NDArray[np.uint8],
239
+ current_time: float,
240
+ neutral_pose: NDArray[np.float32]
241
+ ) -> None:
242
+ """Process face tracking from frame.
243
+
244
+ Args:
245
+ frame: Current camera frame
246
+ current_time: Current timestamp
247
+ neutral_pose: Neutral pose matrix for interpolation
248
+ """
249
+ eye_center, _ = self.head_tracker.get_head_position(frame)
250
+
251
+ if eye_center is not None:
252
+ # Face detected!
253
+ if not self._ever_seen_face:
254
+ self._ever_seen_face = True
255
+ logger.info("Face detected for the first time")
256
+
257
+ # Stop scanning if we were scanning
258
+ if self._scanning:
259
+ self._stop_scanning()
260
+ # Seed the EMA from current scanning offsets for smooth transition
261
+ with self.face_tracking_lock:
262
+ self._smoothed_offsets = list(self.face_tracking_offsets)
263
+
264
+ self.last_face_detected_time = current_time
265
+ self.interpolation_start_time = None # Stop any interpolation
266
+
267
+ # Convert normalized coordinates to pixel coordinates
268
+ h, w = frame.shape[:2]
269
+ eye_center_norm = (eye_center + 1) / 2
270
+ eye_center_pixels = [
271
+ eye_center_norm[0] * w,
272
+ eye_center_norm[1] * h,
273
+ ]
274
+
275
+ # Get the head pose needed to look at the target
276
+ target_pose = self.reachy_mini.look_at_image(
277
+ eye_center_pixels[0],
278
+ eye_center_pixels[1],
279
+ duration=0.0,
280
+ perform_movement=False,
281
+ )
282
+
283
+ # Extract translation and rotation from the target pose
284
+ translation = target_pose[:3, 3]
285
+ rotation = R.from_matrix(target_pose[:3, :3]).as_euler("xyz", degrees=False)
286
+
287
+ # Scale for smoother closed-loop convergence
288
+ translation *= self.tracking_scale
289
+ rotation *= self.tracking_scale
290
+
291
+ # Apply exponential moving average (EMA) smoothing to reduce jitter
292
+ # new_smoothed = alpha * new_value + (1 - alpha) * old_smoothed
293
+ alpha = self.smoothing_alpha
294
+ new_offsets = [
295
+ translation[0], translation[1], translation[2],
296
+ rotation[0], rotation[1], rotation[2],
297
+ ]
298
+
299
+ smoothed = [
300
+ alpha * new_offsets[i] + (1 - alpha) * self._smoothed_offsets[i]
301
+ for i in range(6)
302
+ ]
303
+ self._smoothed_offsets = smoothed
304
+
305
+ # Thread-safe update of face tracking offsets
306
+ with self.face_tracking_lock:
307
+ self.face_tracking_offsets = smoothed
308
+
309
+ else:
310
+ # No face detected
311
+ if self._scanning:
312
+ # Already scanning -- keep sweeping the room
313
+ self._update_scanning_offsets(current_time)
314
+ else:
315
+ # Not scanning yet -- go through the wait/return/scan sequence
316
+ self._interpolate_to_neutral(current_time, neutral_pose)
317
+
318
+ def _interpolate_to_neutral(
319
+ self,
320
+ current_time: float,
321
+ neutral_pose: NDArray[np.float32]
322
+ ) -> None:
323
+ """Interpolate face tracking offsets back to neutral when face is lost.
324
+
325
+ Once interpolation completes, automatically starts room scanning.
326
+
327
+ Args:
328
+ current_time: Current timestamp
329
+ neutral_pose: Target neutral pose matrix
330
+ """
331
+ if self.last_face_detected_time is None:
332
+ # Never seen a face -- go straight to scanning
333
+ self._start_scanning()
334
+ return
335
+
336
+ time_since_face_lost = current_time - self.last_face_detected_time
337
+
338
+ if time_since_face_lost >= self.face_lost_delay:
339
+ # Start interpolation if not already started
340
+ if self.interpolation_start_time is None:
341
+ self.interpolation_start_time = current_time
342
+ # Capture current pose as start of interpolation
343
+ with self.face_tracking_lock:
344
+ current_translation = self.face_tracking_offsets[:3]
345
+ current_rotation_euler = self.face_tracking_offsets[3:]
346
+ # Convert to 4x4 pose matrix
347
+ pose_matrix = np.eye(4, dtype=np.float32)
348
+ pose_matrix[:3, 3] = current_translation
349
+ pose_matrix[:3, :3] = R.from_euler(
350
+ "xyz", current_rotation_euler
351
+ ).as_matrix()
352
+ self.interpolation_start_pose = pose_matrix
353
+
354
+ # Calculate interpolation progress (t from 0 to 1)
355
+ elapsed_interpolation = current_time - self.interpolation_start_time
356
+ t = min(1.0, elapsed_interpolation / self.interpolation_duration)
357
+
358
+ # Interpolate between current pose and neutral pose
359
+ interpolated_pose = linear_pose_interpolation(
360
+ self.interpolation_start_pose,
361
+ neutral_pose,
362
+ t,
363
+ )
364
+
365
+ # Extract translation and rotation from interpolated pose
366
+ translation = interpolated_pose[:3, 3]
367
+ rotation = R.from_matrix(interpolated_pose[:3, :3]).as_euler("xyz", degrees=False)
368
+
369
+ # Thread-safe update of face tracking offsets
370
+ with self.face_tracking_lock:
371
+ self.face_tracking_offsets = [
372
+ translation[0], translation[1], translation[2],
373
+ rotation[0], rotation[1], rotation[2],
374
+ ]
375
+
376
+ # If interpolation is complete, start scanning the room
377
+ if t >= 1.0:
378
+ self.last_face_detected_time = None
379
+ self.interpolation_start_time = None
380
+ self.interpolation_start_pose = None
381
+ self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
382
+ self._start_scanning()
src/reachy_mini_openclaw/config.py ADDED
@@ -0,0 +1,84 @@
1
+ """Configuration management for Reachy Mini OpenClaw.
2
+
3
+ Handles environment variables and configuration settings for the application.
4
+ """
5
+
6
+ import os
7
+ from pathlib import Path
8
+ from dataclasses import dataclass, field
9
+ from typing import Optional
10
+
11
+ from dotenv import load_dotenv
12
+
13
+ # Load environment variables from .env file
14
+ _project_root = Path(__file__).parent.parent.parent
15
+ load_dotenv(_project_root / ".env")
16
+
17
+
18
+ @dataclass
19
+ class Config:
20
+ """Application configuration loaded from environment variables."""
21
+
22
+ # OpenAI Configuration
23
+ OPENAI_API_KEY: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
24
+ OPENAI_MODEL: str = field(default_factory=lambda: os.getenv("OPENAI_MODEL", "gpt-4o-realtime-preview-2024-12-17"))
25
+ OPENAI_VOICE: str = field(default_factory=lambda: os.getenv("OPENAI_VOICE", "cedar"))
26
+
27
+ # OpenClaw Gateway Configuration
28
+ OPENCLAW_GATEWAY_URL: str = field(default_factory=lambda: os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"))
29
+ OPENCLAW_TOKEN: Optional[str] = field(default_factory=lambda: os.getenv("OPENCLAW_TOKEN"))
30
+ OPENCLAW_AGENT_ID: str = field(default_factory=lambda: os.getenv("OPENCLAW_AGENT_ID", "main"))
31
+ # Session key for OpenClaw - uses "main" to share context with WhatsApp and other channels
32
+ # Format: agent:<agent_id>:<session_key>, but we only need the session key part here
33
+ OPENCLAW_SESSION_KEY: str = field(default_factory=lambda: os.getenv("OPENCLAW_SESSION_KEY", "main"))
34
+
35
+ # Robot Configuration
36
+ ROBOT_NAME: Optional[str] = field(default_factory=lambda: os.getenv("ROBOT_NAME"))
37
+
38
+ # Feature Flags
39
+ ENABLE_OPENCLAW_TOOLS: bool = field(default_factory=lambda: os.getenv("ENABLE_OPENCLAW_TOOLS", "true").lower() == "true")
40
+ ENABLE_CAMERA: bool = field(default_factory=lambda: os.getenv("ENABLE_CAMERA", "true").lower() == "true")
41
+ ENABLE_FACE_TRACKING: bool = field(default_factory=lambda: os.getenv("ENABLE_FACE_TRACKING", "true").lower() == "true")
42
+
43
+ # Face Tracking Configuration
44
+ # Options: "yolo", "mediapipe", or None for auto-detect
45
+ HEAD_TRACKER_TYPE: Optional[str] = field(default_factory=lambda: os.getenv("HEAD_TRACKER_TYPE", "yolo"))
46
+
47
+ # Local Vision Processing
48
+ ENABLE_LOCAL_VISION: bool = field(default_factory=lambda: os.getenv("ENABLE_LOCAL_VISION", "false").lower() == "true")
49
+ LOCAL_VISION_MODEL: str = field(default_factory=lambda: os.getenv("LOCAL_VISION_MODEL", "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"))
50
+ VISION_DEVICE: str = field(default_factory=lambda: os.getenv("VISION_DEVICE", "auto")) # "auto", "cuda", "mps", "cpu"
51
+ HF_HOME: str = field(default_factory=lambda: os.getenv("HF_HOME", os.path.expanduser("~/.cache/huggingface")))
52
+
53
+ # Custom Profile (for personality customization)
54
+ CUSTOM_PROFILE: Optional[str] = field(default_factory=lambda: os.getenv("REACHY_MINI_CUSTOM_PROFILE"))
55
+
56
+ def validate(self) -> list[str]:
57
+ """Validate configuration and return list of errors."""
58
+ errors = []
59
+ if not self.OPENAI_API_KEY:
60
+ errors.append("OPENAI_API_KEY is required")
61
+ return errors
62
+
63
+
64
+ # Global configuration instance
65
+ config = Config()
66
+
67
+
68
+ def set_custom_profile(profile: Optional[str]) -> None:
69
+ """Update the custom profile at runtime."""
70
+ global config
71
+ config.CUSTOM_PROFILE = profile
72
+ os.environ["REACHY_MINI_CUSTOM_PROFILE"] = profile or ""
73
+
74
+
75
+ def set_face_tracking_enabled(enabled: bool) -> None:
76
+ """Enable or disable face tracking at runtime."""
77
+ global config
78
+ config.ENABLE_FACE_TRACKING = enabled
79
+
80
+
81
+ def set_local_vision_enabled(enabled: bool) -> None:
82
+ """Enable or disable local vision processing at runtime."""
83
+ global config
84
+ config.ENABLE_LOCAL_VISION = enabled
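Each field above reads its environment variable through a `default_factory`, so values are captured when `Config()` is instantiated rather than at class definition. A trimmed-down, runnable illustration of the same pattern (the class and values here are hypothetical, not the real `Config`):

```python
import os
from dataclasses import dataclass, field


@dataclass
class MiniConfig:
    # Same env-reading default_factory pattern as Config above
    OPENAI_API_KEY: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    OPENCLAW_AGENT_ID: str = field(default_factory=lambda: os.getenv("OPENCLAW_AGENT_ID", "main"))

    def validate(self) -> list[str]:
        """Return a list of error messages, empty when the config is usable."""
        return ["OPENAI_API_KEY is required"] if not self.OPENAI_API_KEY else []


os.environ["OPENCLAW_AGENT_ID"] = "kitchen-bot"  # hypothetical value
os.environ.pop("OPENAI_API_KEY", None)           # simulate a missing key
cfg = MiniConfig()
```

With the environment set up this way, `cfg.OPENCLAW_AGENT_ID` is `"kitchen-bot"` and `cfg.validate()` reports the missing API key.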
src/reachy_mini_openclaw/gradio_app.py ADDED
@@ -0,0 +1,202 @@
+ """Gradio web UI for Reachy Mini OpenClaw.
2
+
3
+ This module provides a web interface for:
4
+ - Viewing conversation transcripts
5
+ - Configuring the assistant personality
6
+ - Monitoring robot status
7
+ - Manual control options
8
+ """
9
+
10
+ import os
11
+ import logging
12
+ from typing import Optional
13
+
14
+ import gradio as gr
15
+
16
+ logger = logging.getLogger(__name__)
17
+
18
+
19
+ def launch_gradio(
20
+ gateway_url: str = "ws://localhost:18789",
21
+ robot_name: Optional[str] = None,
22
+ enable_camera: bool = True,
23
+ enable_openclaw: bool = True,
24
+ enable_face_tracking: bool = True,
25
+ head_tracker_type: Optional[str] = None,
26
+ share: bool = False,
27
+ ) -> None:
28
+ """Launch the Gradio web UI.
29
+
30
+ Args:
31
+ gateway_url: OpenClaw gateway URL
32
+ robot_name: Robot name for connection
33
+ enable_camera: Whether to enable camera
34
+ enable_openclaw: Whether to enable OpenClaw
35
+ enable_face_tracking: Whether to enable face tracking
36
+ head_tracker_type: Head tracker type ('yolo', 'mediapipe', or None)
37
+ share: Whether to create a public URL
38
+ """
39
+ from reachy_mini_openclaw.prompts import get_available_profiles, save_custom_profile
40
+ from reachy_mini_openclaw.config import set_custom_profile, config
41
+
42
+ # State
43
+ app_instance = None
44
+
45
+ def start_conversation():
46
+ """Start the conversation."""
47
+ nonlocal app_instance
48
+
49
+ from reachy_mini_openclaw.main import ReachyClawCore
50
+ import asyncio
51
+ import threading
52
+
53
+ if app_instance is not None:
54
+ return "Already running"
55
+
56
+ try:
57
+ app_instance = ReachyClawCore(
58
+ gateway_url=gateway_url,
59
+ robot_name=robot_name,
60
+ enable_camera=enable_camera,
61
+ enable_openclaw=enable_openclaw,
62
+ enable_face_tracking=enable_face_tracking,
63
+ head_tracker_type=head_tracker_type,
64
+ )
65
+
66
+ # Run in background thread
67
+ def run_app():
68
+ loop = asyncio.new_event_loop()
69
+ asyncio.set_event_loop(loop)
70
+ try:
71
+ loop.run_until_complete(app_instance.run())
72
+ except Exception as e:
73
+ logger.error("App error: %s", e)
74
+ finally:
75
+ loop.close()
76
+
77
+ thread = threading.Thread(target=run_app, daemon=True)
78
+ thread.start()
79
+
80
+ return "Started successfully"
81
+ except Exception as e:
82
+ return f"Error: {e}"
83
+
84
+ def stop_conversation():
85
+ """Stop the conversation."""
86
+ nonlocal app_instance
87
+
88
+ if app_instance is None:
89
+ return "Not running"
90
+
91
+ try:
92
+ app_instance.stop()
93
+ app_instance = None
94
+ return "Stopped"
95
+ except Exception as e:
96
+ return f"Error: {e}"
97
+
98
+ def apply_profile(profile_name):
99
+ """Apply a personality profile."""
100
+ set_custom_profile(profile_name if profile_name else None)
101
+ return f"Applied profile: {profile_name or 'default'}"
102
+
103
+ def save_profile(name, instructions):
104
+ """Save a new profile."""
105
+ if save_custom_profile(name, instructions):
106
+ return f"Saved profile: {name}"
107
+ return "Error saving profile"
108
+
109
+ # Build UI
110
+ with gr.Blocks(title="Reachy Mini OpenClaw") as demo:
111
+ gr.Markdown("""
112
+ # 🤖 Reachy Mini OpenClaw
113
+
114
+ Give your OpenClaw AI agent a physical presence with Reachy Mini.
115
+ Using OpenAI Realtime API for responsive voice conversation.
116
+ """)
117
+
118
+ with gr.Tab("Conversation"):
119
+ with gr.Row():
120
+ start_btn = gr.Button("▶️ Start", variant="primary")
121
+ stop_btn = gr.Button("⏹️ Stop", variant="secondary")
122
+
123
+ status_text = gr.Textbox(label="Status", interactive=False)
124
+
125
+ transcript = gr.Chatbot(label="Conversation", height=400)
126
+
127
+ start_btn.click(start_conversation, outputs=[status_text])
128
+ stop_btn.click(stop_conversation, outputs=[status_text])
129
+
130
+ with gr.Tab("Personality"):
131
+ profiles = get_available_profiles()
132
+ profile_dropdown = gr.Dropdown(
133
+ choices=[""] + profiles,
134
+ label="Select Profile",
135
+ value=""
136
+ )
137
+ apply_btn = gr.Button("Apply Profile")
138
+ profile_status = gr.Textbox(label="Status", interactive=False)
139
+
140
+ apply_btn.click(
141
+ apply_profile,
142
+ inputs=[profile_dropdown],
143
+ outputs=[profile_status]
144
+ )
145
+
146
+ gr.Markdown("### Create New Profile")
147
+ new_name = gr.Textbox(label="Profile Name")
148
+ new_instructions = gr.Textbox(
149
+ label="Instructions",
150
+ lines=10,
151
+ placeholder="Enter the system prompt for this personality..."
152
+ )
153
+ save_btn = gr.Button("Save Profile")
154
+ save_status = gr.Textbox(label="Save Status", interactive=False)
155
+
156
+ save_btn.click(
157
+ save_profile,
158
+ inputs=[new_name, new_instructions],
159
+ outputs=[save_status]
160
+ )
161
+
162
+ with gr.Tab("Settings"):
163
+ gr.Markdown(f"""
164
+ ### Current Configuration
165
+
166
+ - **OpenClaw Gateway**: {gateway_url}
167
+ - **OpenAI Model**: {config.OPENAI_MODEL}
168
+ - **Voice**: {config.OPENAI_VOICE}
169
+ - **Camera Enabled**: {enable_camera}
170
+ - **OpenClaw Enabled**: {enable_openclaw}
171
+ - **Face Tracking**: {enable_face_tracking}
172
+ - **Head Tracker**: {head_tracker_type or 'auto-detect'}
173
+
174
+ Edit `.env` file to change these settings.
175
+ """)
176
+
177
+ with gr.Tab("About"):
178
+ gr.Markdown("""
179
+ ## About Reachy Mini OpenClaw
180
+
181
+ This application combines:
182
+
183
+ - **OpenAI Realtime API** for ultra-low-latency voice conversation
184
+ - **OpenClaw Gateway** for extended AI capabilities (web, calendar, smart home, etc.)
185
+ - **Reachy Mini Robot** for physical embodiment with expressive movements
186
+
187
+ ### Features
188
+
189
+ - 🎤 Real-time voice conversation
190
+ - 👀 Camera-based vision
191
+ - 💃 Expressive robot movements
192
+ - 🔧 Tool integration via OpenClaw
193
+ - 🎭 Customizable personalities
194
+
195
+ ### Links
196
+
197
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
198
+ - [OpenClaw](https://github.com/openclaw/openclaw)
199
+ - [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime)
200
+ """)
201
+
202
+ demo.launch(share=share, server_name="0.0.0.0", server_port=7860)
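`start_conversation` runs the async app on a fresh event loop inside a daemon thread, so the Gradio server stays responsive while the robot loop runs. The pattern in isolation (helper and task names are illustrative):

```python
import asyncio
import threading


def run_in_background(coro_factory):
    """Run an async entry point on its own event loop in a daemon thread."""
    result = {}

    def runner():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            result["value"] = loop.run_until_complete(coro_factory())
        finally:
            loop.close()

    t = threading.Thread(target=runner, daemon=True)
    t.start()
    return t, result


async def demo_task():
    await asyncio.sleep(0.01)
    return "done"


thread, out = run_in_background(demo_task)
thread.join(timeout=2)
```

The daemon flag mirrors the app's choice: if the UI process exits, the background loop dies with it instead of keeping the process alive.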
src/reachy_mini_openclaw/main.py ADDED
@@ -0,0 +1,591 @@
+ """ReachyClaw - Give your OpenClaw AI agent a physical robot body.
2
+
3
+ This module provides the main application that connects:
4
+ - OpenAI Realtime API for voice I/O (speech recognition + TTS)
5
+ - OpenClaw Gateway for AI intelligence (the actual brain)
6
+ - Reachy Mini robot for physical embodiment
7
+
8
+ Usage:
9
+ # Console mode (direct audio)
10
+ reachyclaw
11
+
12
+ # With Gradio UI
13
+ reachyclaw --gradio
14
+
15
+ # With debug logging
16
+ reachyclaw --debug
17
+ """
18
+
19
+ import os
20
+ import sys
21
+ import time
22
+ import asyncio
23
+ import logging
24
+ import argparse
25
+ import threading
26
+ from pathlib import Path
27
+ from typing import Any, Optional
28
+
29
+ from dotenv import load_dotenv
30
+
31
+ # Load environment from project root (override=True ensures .env takes precedence)
32
+ _project_root = Path(__file__).parent.parent.parent
33
+ load_dotenv(_project_root / ".env", override=True)
34
+
35
+ logger = logging.getLogger(__name__)
36
+
37
+
38
+ def setup_logging(debug: bool = False) -> None:
39
+ """Configure logging for the application.
40
+
41
+ Args:
42
+ debug: Enable debug level logging
43
+ """
44
+ level = logging.DEBUG if debug else logging.INFO
45
+ logging.basicConfig(
46
+ level=level,
47
+ format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
48
+ datefmt="%H:%M:%S",
49
+ )
50
+
51
+ # Reduce noise from libraries
52
+ if not debug:
53
+ logging.getLogger("httpx").setLevel(logging.WARNING)
54
+ logging.getLogger("websockets").setLevel(logging.WARNING)
55
+ logging.getLogger("openai").setLevel(logging.WARNING)
56
+
57
+
58
+ def parse_args() -> argparse.Namespace:
59
+ """Parse command line arguments.
60
+
61
+ Returns:
62
+ Parsed arguments namespace
63
+ """
64
+ parser = argparse.ArgumentParser(
65
+ description="ReachyClaw - Give your OpenClaw AI agent a physical robot body",
66
+ formatter_class=argparse.RawDescriptionHelpFormatter,
67
+ epilog="""
68
+ Examples:
69
+ # Run in console mode
70
+ reachyclaw
71
+
72
+ # Run with Gradio web UI
73
+ reachyclaw --gradio
74
+
75
+ # Connect to specific robot
76
+ reachyclaw --robot-name my-reachy
77
+
78
+ # Use different OpenClaw gateway
79
+ reachyclaw --gateway-url http://192.168.1.100:18790
80
+ """
81
+ )
82
+
83
+ parser.add_argument(
84
+ "--debug",
85
+ action="store_true",
86
+ help="Enable debug logging"
87
+ )
88
+ parser.add_argument(
89
+ "--gradio",
90
+ action="store_true",
91
+ help="Launch Gradio web UI instead of console mode"
92
+ )
93
+ parser.add_argument(
94
+ "--robot-name",
95
+ type=str,
96
+ help="Robot name for connection (default: auto-discover)"
97
+ )
98
+ parser.add_argument(
99
+ "--gateway-url",
100
+ type=str,
101
+ default=os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"),
102
+ help="OpenClaw gateway URL (from OPENCLAW_GATEWAY_URL env or default)"
103
+ )
104
+ parser.add_argument(
105
+ "--no-camera",
106
+ action="store_true",
107
+ help="Disable camera functionality"
108
+ )
109
+ parser.add_argument(
110
+ "--no-openclaw",
111
+ action="store_true",
112
+ help="Disable OpenClaw integration"
113
+ )
114
+ parser.add_argument(
115
+ "--no-face-tracking",
116
+ action="store_true",
117
+ help="Disable face tracking"
118
+ )
119
+ parser.add_argument(
120
+ "--local-vision",
121
+ action="store_true",
122
+ help="Enable local vision processing with SmolVLM2"
123
+ )
124
+ parser.add_argument(
125
+ "--profile",
126
+ type=str,
127
+ help="Custom personality profile to use"
128
+ )
129
+
130
+ return parser.parse_args()
131
+
132
+
133
+ class ReachyClawCore:
134
+ """ReachyClaw core application controller.
135
+
136
+ This class orchestrates all components:
137
+ - Reachy Mini robot connection and movement control
138
+ - OpenAI Realtime API for voice I/O
139
+ - OpenClaw gateway bridge for AI intelligence
140
+ - Audio input/output loops
141
+ """
142
+
143
+ def __init__(
144
+ self,
145
+ gateway_url: str = "ws://localhost:18789",
146
+ robot_name: Optional[str] = None,
147
+ enable_camera: bool = True,
148
+ enable_openclaw: bool = True,
149
+ robot: Optional["ReachyMini"] = None,
150
+ external_stop_event: Optional[threading.Event] = None,
151
+ ):
152
+ """Initialize the application.
153
+
154
+ Args:
155
+ gateway_url: OpenClaw gateway URL
156
+ robot_name: Optional robot name for connection
157
+ enable_camera: Whether to enable camera functionality
158
+ enable_openclaw: Whether to enable OpenClaw integration
159
+ robot: Optional pre-initialized robot (for app framework)
160
+ external_stop_event: Optional external stop event
161
+ """
162
+ from reachy_mini import ReachyMini
163
+ from reachy_mini_openclaw.config import config
164
+ from reachy_mini_openclaw.moves import MovementManager
165
+ from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
166
+ from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge
167
+ from reachy_mini_openclaw.tools.core_tools import ToolDependencies
168
+ from reachy_mini_openclaw.openai_realtime import OpenAIRealtimeHandler
169
+
170
+ self.gateway_url = gateway_url
171
+ self._external_stop_event = external_stop_event
172
+ self._owns_robot = robot is None
173
+
174
+ # Validate configuration
175
+ errors = config.validate()
176
+ if errors:
177
+ for error in errors:
178
+ logger.error("Config error: %s", error)
179
+ sys.exit(1)
180
+
181
+ # Connect to robot
182
+ if robot is not None:
183
+ self.robot = robot
184
+ logger.info("Using provided Reachy Mini instance")
185
+ else:
186
+ logger.info("Connecting to Reachy Mini...")
187
+ robot_kwargs = {}
188
+ if robot_name:
189
+ robot_kwargs["robot_name"] = robot_name
190
+
191
+ try:
192
+ self.robot = ReachyMini(**robot_kwargs)
193
+ except TimeoutError as e:
194
+ logger.error("Connection timeout: %s", e)
195
+ logger.error("Check that the robot is powered on and reachable.")
196
+ sys.exit(1)
197
+ except Exception as e:
198
+ logger.error("Robot connection failed: %s", e)
199
+ sys.exit(1)
200
+
201
+ logger.info("Connected to robot: %s", self.robot.client.get_status())
202
+
203
+ # Initialize movement system
204
+ logger.info("Initializing movement system...")
205
+ self.movement_manager = MovementManager(current_robot=self.robot)
206
+ self.head_wobbler = HeadWobbler(
207
+ set_speech_offsets=self.movement_manager.set_speech_offsets
208
+ )
209
+
210
+ # Initialize OpenClaw bridge
211
+ self.openclaw_bridge = None
212
+ if enable_openclaw:
213
+ logger.info("Initializing OpenClaw bridge...")
214
+ self.openclaw_bridge = OpenClawBridge(
215
+ gateway_url=gateway_url,
216
+ gateway_token=config.OPENCLAW_TOKEN,
217
+ )
218
+
219
+ # Camera worker for video streaming and frame capture
220
+ self.camera_worker = None
221
+ self.head_tracker = None
222
+ self.vision_manager = None
223
+
224
+ if enable_camera:
225
+ logger.info("Initializing camera worker...")
226
+ from reachy_mini_openclaw.camera_worker import CameraWorker
227
+
228
+ # Initialize head tracker for local face tracking
229
+ if config.ENABLE_FACE_TRACKING:
230
+ self.head_tracker = self._initialize_head_tracker(config.HEAD_TRACKER_TYPE)
231
+
232
+ # Initialize camera worker with head tracker
233
+ self.camera_worker = CameraWorker(
234
+ reachy_mini=self.robot,
235
+ head_tracker=self.head_tracker,
236
+ )
237
+
238
+ # Enable/disable head tracking based on whether we have a tracker
239
+ self.camera_worker.set_head_tracking_enabled(self.head_tracker is not None)
240
+
241
+ # Initialize local vision processor if enabled
242
+ if config.ENABLE_LOCAL_VISION:
243
+ self.vision_manager = self._initialize_vision_manager()
244
+
245
+ # Create tool dependencies
246
+ self.deps = ToolDependencies(
247
+ movement_manager=self.movement_manager,
248
+ head_wobbler=self.head_wobbler,
249
+ robot=self.robot,
250
+ camera_worker=self.camera_worker,
251
+ openclaw_bridge=self.openclaw_bridge,
252
+ vision_manager=self.vision_manager,
253
+ )
254
+
255
+ # Initialize OpenAI Realtime handler with OpenClaw bridge
256
+ self.handler = OpenAIRealtimeHandler(
257
+ deps=self.deps,
258
+ openclaw_bridge=self.openclaw_bridge,
259
+ )
260
+
261
+ # State
262
+ self._stop_event = asyncio.Event()
263
+ self._tasks: list[asyncio.Task] = []
264
+
265
+ def _initialize_vision_manager(self) -> Optional[Any]:
266
+ """Initialize local vision processor (SmolVLM2).
267
+
268
+ Returns:
269
+ VisionManager instance or None if initialization fails
270
+ """
271
+ if self.camera_worker is None:
272
+ logger.warning("Cannot initialize vision manager without camera worker")
273
+ return None
274
+
275
+ try:
276
+ from reachy_mini_openclaw.vision.processors import (
277
+ VisionConfig,
278
+ initialize_vision_manager,
279
+ )
280
+ from reachy_mini_openclaw.config import config
281
+
282
+ vision_config = VisionConfig(
283
+ model_path=config.LOCAL_VISION_MODEL,
284
+ device_preference=config.VISION_DEVICE,
285
+ hf_home=config.HF_HOME,
286
+ )
287
+
288
+ logger.info("Initializing local vision processor (SmolVLM2)...")
289
+ vision_manager = initialize_vision_manager(self.camera_worker, vision_config)
290
+
291
+ if vision_manager is not None:
292
+ logger.info("Local vision processor initialized")
293
+ else:
294
+ logger.warning("Local vision processor failed to initialize")
295
+
296
+ return vision_manager
297
+
298
+ except ImportError as e:
299
+ logger.warning(f"Local vision not available: {e}")
300
+ logger.warning("Install with: pip install torch transformers")
301
+ return None
302
+ except Exception as e:
303
+ logger.error(f"Failed to initialize vision manager: {e}")
304
+ return None
305
+
306
+ def _initialize_head_tracker(self, tracker_type: Optional[str] = None) -> Optional[Any]:
307
+ """Initialize head tracker for local face tracking.
308
+
309
+ Args:
310
+ tracker_type: Type of tracker ("yolo", "mediapipe", or None for auto)
311
+
312
+ Returns:
313
+ Initialized head tracker or None if initialization fails
314
+ """
315
+ # Default to YOLO if not specified
316
+ if tracker_type is None:
317
+ tracker_type = "yolo"
318
+
319
+ if tracker_type == "yolo":
320
+ try:
321
+ from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
322
+ logger.info("Initializing YOLO face tracker...")
323
+ tracker = HeadTracker(device="cpu") # CPU is fast enough for face detection
324
+ logger.info("YOLO face tracker initialized")
325
+ return tracker
326
+ except ImportError as e:
327
+ logger.warning(f"YOLO tracker not available: {e}")
328
+ logger.warning("Install with: pip install ultralytics supervision")
329
+ except Exception as e:
330
+ logger.error(f"Failed to initialize YOLO tracker: {e}")
331
+
332
+ elif tracker_type == "mediapipe":
333
+ try:
334
+ from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
335
+ logger.info("Initializing MediaPipe face tracker...")
336
+ tracker = HeadTracker()
337
+ logger.info("MediaPipe face tracker initialized")
338
+ return tracker
339
+ except ImportError as e:
340
+ logger.warning(f"MediaPipe tracker not available: {e}")
341
+ except Exception as e:
342
+ logger.error(f"Failed to initialize MediaPipe tracker: {e}")
343
+
344
+ logger.warning("No face tracker available - face tracking disabled")
345
+ return None
346
+
347
+ def _should_stop(self) -> bool:
348
+ """Check if we should stop."""
349
+ if self._stop_event.is_set():
350
+ return True
351
+ if self._external_stop_event is not None and self._external_stop_event.is_set():
352
+ return True
353
+ return False
354
+
355
+ async def record_loop(self) -> None:
356
+ """Read audio from robot microphone and send to handler."""
357
+ input_sr = self.robot.media.get_input_audio_samplerate()
358
+ logger.info("Recording at %d Hz", input_sr)
359
+
360
+ while not self._should_stop():
361
+ audio_frame = self.robot.media.get_audio_sample()
362
+ if audio_frame is not None:
363
+ await self.handler.receive((input_sr, audio_frame))
364
+ await asyncio.sleep(0.01)
365
+
366
+ async def play_loop(self) -> None:
367
+ """Play audio from handler through robot speakers."""
368
+ output_sr = self.robot.media.get_output_audio_samplerate()
369
+ logger.info("Playing at %d Hz", output_sr)
370
+
371
+ while not self._should_stop():
372
+ output = await self.handler.emit()
373
+ if output is not None:
374
+ if isinstance(output, tuple):
375
+ input_sr, audio_data = output
376
+
377
+ # Convert to float32 and normalize (OpenAI sends int16)
378
+ audio_data = audio_data.flatten().astype("float32") / 32768.0
379
+
380
+ # Reduce volume to prevent distortion (0.5 = 50% volume)
381
+ audio_data = audio_data * 0.5
382
+
383
+ # Resample if needed
384
+ if input_sr != output_sr:
385
+ from scipy.signal import resample
386
+ num_samples = int(len(audio_data) * output_sr / input_sr)
387
+ audio_data = resample(audio_data, num_samples).astype("float32")
388
+
389
+ self.robot.media.push_audio_sample(audio_data)
390
+ # else: it's an AdditionalOutputs (transcript) - handle in UI mode
391
+
392
+ await asyncio.sleep(0.01)
393
+
394
+ async def run(self) -> None:
395
+ """Run the main application loop."""
396
+ # Test OpenClaw connection
397
+ if self.openclaw_bridge is not None:
398
+ connected = await self.openclaw_bridge.connect()
399
+ if connected:
400
+ logger.info("OpenClaw gateway connected")
401
+ else:
402
+ logger.warning("OpenClaw gateway not available - some features disabled")
403
+
404
+ # Enable motors and move to neutral pose
405
+ logger.info("Enabling motors and moving to neutral position...")
406
+ try:
407
+ self.robot.enable_motors()
408
+ from reachy_mini.utils import create_head_pose
409
+ neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
410
+ self.robot.goto_target(
411
+ head=neutral,
412
+ antennas=[0.0, 0.0],
413
+ duration=2.0,
414
+ body_yaw=0.0,
415
+ )
416
+ time.sleep(2) # Wait for goto to complete
417
+ logger.info("Robot at neutral position with motors enabled")
418
+ except Exception as e:
419
+ logger.error("Failed to initialize robot pose: %s", e)
420
+
421
+ # Wire up camera worker to movement manager for face tracking
422
+ if self.camera_worker is not None:
423
+ self.movement_manager.camera_worker = self.camera_worker
424
+ logger.info("Face tracking connected to movement system")
425
+
426
+ # Start movement system
427
+ logger.info("Starting movement system...")
428
+ self.movement_manager.start()
429
+ self.head_wobbler.start()
430
+
431
+ # Start camera worker for video streaming
432
+ if self.camera_worker is not None:
433
+ logger.info("Starting camera worker...")
434
+ self.camera_worker.start()
435
+
436
+ # Start local vision processor if available
437
+ if self.vision_manager is not None:
438
+ logger.info("Starting local vision processor...")
439
+ self.vision_manager.start()
440
+
441
+ # Start audio
442
+ logger.info("Starting audio...")
443
+ self.robot.media.start_recording()
444
+ self.robot.media.start_playing()
445
+ time.sleep(1) # Let pipelines initialize
446
+
447
+ logger.info("Ready! Speak to me...")
448
+
449
+ # Start OpenAI handler in background
450
+ handler_task = asyncio.create_task(self.handler.start_up(), name="openai-handler")
451
+
452
+ # Start audio loops
453
+ self._tasks = [
454
+ handler_task,
455
+ asyncio.create_task(self.record_loop(), name="record-loop"),
456
+ asyncio.create_task(self.play_loop(), name="play-loop"),
457
+ ]
458
+
459
+ try:
460
+ await asyncio.gather(*self._tasks)
461
+ except asyncio.CancelledError:
462
+ logger.info("Tasks cancelled")
463
+
464
+ def stop(self) -> None:
465
+ """Stop everything."""
466
+ logger.info("Stopping...")
467
+ self._stop_event.set()
468
+
469
+ # Cancel tasks
470
+ for task in self._tasks:
471
+ if not task.done():
472
+ task.cancel()
473
+
474
+ # Stop movement system
475
+ self.head_wobbler.stop()
476
+ self.movement_manager.stop()
477
+
478
+ # Stop vision manager
479
+ if self.vision_manager is not None:
480
+ self.vision_manager.stop()
481
+
482
+ # Stop camera worker
483
+ if self.camera_worker is not None:
484
+ self.camera_worker.stop()
485
+
486
+ # Disconnect OpenClaw bridge
487
+ if self.openclaw_bridge is not None:
488
+ try:
489
+ asyncio.get_event_loop().run_until_complete(
490
+ self.openclaw_bridge.disconnect()
491
+ )
492
+ except Exception as e:
493
+ logger.debug("OpenClaw disconnect: %s", e)
494
+
495
+ # Close resources if we own them
496
+ if self._owns_robot:
497
+ try:
498
+ self.robot.media.close()
499
+ except Exception as e:
500
+ logger.debug("Media close: %s", e)
501
+ self.robot.client.disconnect()
502
+
503
+ logger.info("Stopped")
504
+
505
+
506
+ class ReachyClawApp:
507
+ """ReachyClaw - Reachy Mini Apps entry point.
508
+
509
+ This class allows ReachyClaw to be installed and run from
510
+ the Reachy Mini dashboard as a Reachy Mini App.
511
+ """
512
+
513
+ # No custom settings UI
514
+ custom_app_url: Optional[str] = None
515
+
516
+ def run(self, reachy_mini, stop_event: threading.Event) -> None:
517
+ """Run ReachyClaw as a Reachy Mini App.
518
+
519
+ Args:
520
+ reachy_mini: Pre-initialized ReachyMini instance
521
+ stop_event: Threading event to signal stop
522
+ """
523
+ loop = asyncio.new_event_loop()
524
+ asyncio.set_event_loop(loop)
525
+
526
+ gateway_url = os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789")
527
+
528
+ app = ReachyClawCore(
529
+ gateway_url=gateway_url,
530
+ robot=reachy_mini,
531
+ external_stop_event=stop_event,
532
+ )
533
+
534
+ try:
535
+ loop.run_until_complete(app.run())
536
+ except Exception as e:
537
+ logger.error("Error running app: %s", e)
538
+ finally:
539
+ app.stop()
540
+ loop.close()
541
+
542
+
543
+ def main() -> None:
544
+ """Main entry point."""
545
+ args = parse_args()
546
+ setup_logging(args.debug)
547
+
548
+ # Set custom profile if specified
549
+ if args.profile:
550
+ from reachy_mini_openclaw.config import set_custom_profile
551
+ set_custom_profile(args.profile)
552
+
553
+ # Configure face tracking and local vision from args
554
+ from reachy_mini_openclaw.config import (
555
+ set_face_tracking_enabled,
556
+ set_local_vision_enabled,
557
+ )
558
+ if args.no_face_tracking:
559
+ set_face_tracking_enabled(False)
560
+ if args.local_vision:
561
+ set_local_vision_enabled(True)
562
+
563
+ if args.gradio:
564
+ # Launch Gradio UI
565
+ logger.info("Starting Gradio UI...")
566
+ from reachy_mini_openclaw.gradio_app import launch_gradio
567
+ launch_gradio(
568
+ gateway_url=args.gateway_url,
569
+ robot_name=args.robot_name,
570
+ enable_camera=not args.no_camera,
571
+ enable_openclaw=not args.no_openclaw,
572
+ )
573
+ else:
574
+ # Console mode
575
+ app = ReachyClawCore(
576
+ gateway_url=args.gateway_url,
577
+ robot_name=args.robot_name,
578
+ enable_camera=not args.no_camera,
579
+ enable_openclaw=not args.no_openclaw,
580
+ )
581
+
582
+ try:
583
+ asyncio.run(app.run())
584
+ except KeyboardInterrupt:
585
+ logger.info("Interrupted")
586
+ finally:
587
+ app.stop()
588
+
589
+
590
+ if __name__ == "__main__":
591
+ main()
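The int16-to-float conversion and resampling done in `play_loop` can be exercised on their own. A standalone sketch of the same steps (function name, gain, and sample rates are illustrative):

```python
import numpy as np
from scipy.signal import resample


def prepare_playback(chunk: np.ndarray, input_sr: int, output_sr: int, gain: float = 0.5) -> np.ndarray:
    """Convert int16 PCM to attenuated float32 and resample to the output rate."""
    audio = chunk.flatten().astype("float32") / 32768.0  # int16 -> roughly [-1, 1)
    audio = audio * gain  # headroom to avoid speaker distortion
    if input_sr != output_sr:
        num_samples = int(len(audio) * output_sr / input_sr)
        audio = resample(audio, num_samples).astype("float32")
    return audio


# 0.1 s of a half-scale sine at 24 kHz, downsampled to 16 kHz
chunk = (np.sin(np.linspace(0, 2 * np.pi, 2400)) * 16384).astype(np.int16)
out = prepare_playback(chunk, input_sr=24000, output_sr=16000)
```

The output length scales by `output_sr / input_sr` (2400 samples become 1600 here), and the gain keeps peaks well below full scale before pushing to the speaker.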
src/reachy_mini_openclaw/moves.py ADDED
@@ -0,0 +1,648 @@
+ """Movement system for expressive robot control.
+
+ This module provides a 100Hz control loop for managing robot movements,
+ combining sequential primary moves (dances, emotions, head movements) with
+ additive secondary moves (speech wobble, face tracking).
+
+ Architecture:
+ - Primary moves are queued and executed sequentially
+ - Secondary moves are additive offsets applied on top
+ - Single control point via set_target at 100Hz
+ - Automatic breathing animation when idle
+
+ Based on the movement systems from:
+ - pollen-robotics/reachy_mini_conversation_app
+ - eoai-dev/moltbot_body
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import threading
+ import time
+ from collections import deque
+ from dataclasses import dataclass
+ from queue import Empty, Queue
+ from typing import Any, Dict, Optional, Tuple
+
+ import numpy as np
+ from numpy.typing import NDArray
+ from reachy_mini import ReachyMini
+ from reachy_mini.motion.move import Move
+ from reachy_mini.utils import create_head_pose
+ from reachy_mini.utils.interpolation import compose_world_offset, linear_pose_interpolation
+
+ logger = logging.getLogger(__name__)
+
+ # Configuration
+ CONTROL_LOOP_FREQUENCY_HZ = 100.0
+
+ # Type definitions
+ FullBodyPose = Tuple[NDArray[np.float32], Tuple[float, float], float]
+ SpeechOffsets = Tuple[float, float, float, float, float, float]
+
+
+ class BreathingMove(Move):
+     """Continuous breathing animation for idle state."""
+
+     def __init__(
+         self,
+         interpolation_start_pose: NDArray[np.float32],
+         interpolation_start_antennas: Tuple[float, float],
+         interpolation_duration: float = 1.0,
+     ):
+         """Initialize breathing move.
+
+         Args:
+             interpolation_start_pose: Current head pose to interpolate from
+             interpolation_start_antennas: Current antenna positions
+             interpolation_duration: Time to blend to neutral (seconds)
+         """
+         self.interpolation_start_pose = interpolation_start_pose
+         self.interpolation_start_antennas = np.array(interpolation_start_antennas)
+         self.interpolation_duration = interpolation_duration
+
+         # Target neutral pose
+         self.neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         self.neutral_antennas = np.array([0.0, 0.0])
+
+         # Breathing parameters
+         self.breathing_z_amplitude = 0.005  # 5mm gentle movement
+         self.breathing_frequency = 0.1  # Hz
+         self.antenna_sway_amplitude = np.deg2rad(15)  # 15 degrees, stored in radians
+         self.antenna_frequency = 0.5  # Hz
+
+     @property
+     def duration(self) -> float:
+         """Duration of the move (infinite for breathing)."""
+         return float("inf")
+
+     def evaluate(self, t: float) -> tuple:
+         """Evaluate the breathing pose at time t."""
+         if t < self.interpolation_duration:
+             # Interpolate to neutral
+             alpha = t / self.interpolation_duration
+             head_pose = linear_pose_interpolation(
+                 self.interpolation_start_pose,
+                 self.neutral_head_pose,
+                 alpha
+             )
+             antennas = (1 - alpha) * self.interpolation_start_antennas + alpha * self.neutral_antennas
+             antennas = antennas.astype(np.float64)
+         else:
+             # Breathing pattern
+             breathing_t = t - self.interpolation_duration
+
+             z_offset = self.breathing_z_amplitude * np.sin(
+                 2 * np.pi * self.breathing_frequency * breathing_t
+             )
+             head_pose = create_head_pose(
+                 x=0, y=0, z=z_offset,
+                 roll=0, pitch=0, yaw=0,
+                 degrees=True, mm=False
+             )
+
+             antenna_sway = self.antenna_sway_amplitude * np.sin(
+                 2 * np.pi * self.antenna_frequency * breathing_t
+             )
+             antennas = np.array([antenna_sway, -antenna_sway], dtype=np.float64)
+
+         return (head_pose, antennas, 0.0)
+
+
+ class HeadLookMove(Move):
+     """Move to look in a specific direction."""
+
+     DIRECTIONS = {
+         "left": (0, 0, 0, 0, 0, 30),    # yaw left
+         "right": (0, 0, 0, 0, 0, -30),  # yaw right
+         "up": (0, 0, 10, 0, 15, 0),     # pitch up, z up
+         "down": (0, 0, -5, 0, -15, 0),  # pitch down, z down
+         "front": (0, 0, 0, 0, 0, 0),    # neutral
+     }
+
+     def __init__(
+         self,
+         direction: str,
+         start_pose: NDArray[np.float32],
+         start_antennas: Tuple[float, float],
+         duration: float = 1.0,
+     ):
+         """Initialize head look move.
+
+         Args:
+             direction: One of 'left', 'right', 'up', 'down', 'front'
+             start_pose: Current head pose
+             start_antennas: Current antenna positions
+             duration: Move duration in seconds
+         """
+         self.direction = direction
+         self.start_pose = start_pose
+         self.start_antennas = np.array(start_antennas)
+         self._duration = duration
+
+         # Get target pose from direction
+         params = self.DIRECTIONS.get(direction, self.DIRECTIONS["front"])
+         self.target_pose = create_head_pose(
+             x=params[0], y=params[1], z=params[2],
+             roll=params[3], pitch=params[4], yaw=params[5],
+             degrees=True, mm=True
+         )
+         self.target_antennas = np.array([0.0, 0.0])
+
+     @property
+     def duration(self) -> float:
+         return self._duration
+
+     def evaluate(self, t: float) -> tuple:
+         """Evaluate pose at time t."""
+         alpha = min(1.0, t / self._duration)
+         # Smooth easing (smoothstep)
+         alpha = alpha * alpha * (3 - 2 * alpha)
+
+         head_pose = linear_pose_interpolation(
+             self.start_pose,
+             self.target_pose,
+             alpha
+         )
+         antennas = (1 - alpha) * self.start_antennas + alpha * self.target_antennas
+
+         return (head_pose, antennas.astype(np.float64), 0.0)
+
+
+ def combine_full_body(primary: FullBodyPose, secondary: FullBodyPose) -> FullBodyPose:
+     """Combine primary pose with secondary offsets."""
+     primary_head, primary_ant, primary_yaw = primary
+     secondary_head, secondary_ant, secondary_yaw = secondary
+
+     combined_head = compose_world_offset(primary_head, secondary_head, reorthonormalize=True)
+     combined_ant = (
+         primary_ant[0] + secondary_ant[0],
+         primary_ant[1] + secondary_ant[1],
+     )
+     combined_yaw = primary_yaw + secondary_yaw
+
+     return (combined_head, combined_ant, combined_yaw)
+
+
+ def clone_pose(pose: FullBodyPose) -> FullBodyPose:
+     """Deep copy a full body pose."""
+     head, ant, yaw = pose
+     return (head.copy(), (float(ant[0]), float(ant[1])), float(yaw))
+
+
+ @dataclass
+ class MovementState:
+     """State for the movement system."""
+
+     current_move: Optional[Move] = None
+     move_start_time: Optional[float] = None
+     last_activity_time: float = 0.0
+     speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     face_tracking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     thinking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     last_primary_pose: Optional[FullBodyPose] = None
+
+     def update_activity(self) -> None:
+         self.last_activity_time = time.monotonic()
+
+
+ class MovementManager:
+     """Coordinate robot movements at 100Hz.
+
+     This class manages:
+     - Sequential primary moves (dances, emotions, head movements)
+     - Additive secondary offsets (speech wobble, face tracking)
+     - Automatic idle breathing animation
+     - Thread-safe communication with other components
+
+     Example:
+         manager = MovementManager(robot)
+         manager.start()
+
+         # Queue a head movement
+         manager.queue_move(HeadLookMove("left", ...))
+
+         # Set speech offsets (called by HeadWobbler)
+         manager.set_speech_offsets((0, 0, 0.01, 0.1, 0, 0))
+
+         manager.stop()
+     """
+
+     def __init__(
+         self,
+         current_robot: ReachyMini,
+         camera_worker: Any = None,
+     ):
+         """Initialize movement manager.
+
+         Args:
+             current_robot: Connected ReachyMini instance
+             camera_worker: Optional camera worker for face tracking
+         """
+         self.current_robot = current_robot
+         self.camera_worker = camera_worker
+
+         self._now = time.monotonic
+         self.state = MovementState()
+         self.state.last_activity_time = self._now()
+
+         # Initialize neutral pose
+         neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         self.state.last_primary_pose = (neutral, (0.0, 0.0), 0.0)
+
+         # Move queue
+         self.move_queue: deque[Move] = deque()
+
+         # Configuration
+         self.idle_inactivity_delay = 0.3  # seconds before breathing starts
+         self.target_frequency = CONTROL_LOOP_FREQUENCY_HZ
+         self.target_period = 1.0 / self.target_frequency
+
+         # Thread state
+         self._stop_event = threading.Event()
+         self._thread: Optional[threading.Thread] = None
+         self._is_listening = False
+         self._breathing_active = False
+
+         # Last commanded pose for smooth transitions
+         self._last_commanded_pose = clone_pose(self.state.last_primary_pose)
+         self._listening_antennas = self._last_commanded_pose[1]
+         self._antenna_unfreeze_blend = 1.0
+         self._antenna_blend_duration = 0.4
+
+         # Cross-thread communication
+         self._command_queue: Queue[Tuple[str, Any]] = Queue()
+
+         # Speech offsets (thread-safe)
+         self._speech_lock = threading.Lock()
+         self._pending_speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+         self._speech_dirty = False
+
+         # Processing/thinking animation state
+         self._processing = False
+         self._processing_start_time = 0.0
+         self._thinking_amplitude = 0.0  # 0..1 envelope for smooth fade in/out
+         self._thinking_antenna_offsets: Tuple[float, float] = (0.0, 0.0)
+
+         # Shared state lock
+         self._shared_lock = threading.Lock()
+         self._shared_last_activity = self.state.last_activity_time
+         self._shared_is_listening = False
+
+     def queue_move(self, move: Move) -> None:
+         """Queue a primary move. Thread-safe."""
+         self._command_queue.put(("queue_move", move))
+
+     def clear_move_queue(self) -> None:
+         """Clear all queued moves. Thread-safe."""
+         self._command_queue.put(("clear_queue", None))
+
+     def set_speech_offsets(self, offsets: SpeechOffsets) -> None:
+         """Update speech-driven offsets. Thread-safe."""
+         with self._speech_lock:
+             self._pending_speech_offsets = offsets
+             self._speech_dirty = True
+
+     def set_listening(self, listening: bool) -> None:
+         """Set listening state (freezes antennas). Thread-safe."""
+         self._command_queue.put(("set_listening", listening))
+
+     def set_processing(self, processing: bool) -> None:
+         """Set processing state (triggers thinking animation). Thread-safe.
+
+         When True, the robot shows a continuous 'thinking' animation as
+         secondary offsets -- gentle head sway and asymmetric antenna scanning.
+         Face tracking continues underneath since this is additive.
+         """
+         self._command_queue.put(("set_processing", processing))
+
+     def is_idle(self) -> bool:
+         """Check if robot has been idle. Thread-safe."""
+         with self._shared_lock:
+             if self._shared_is_listening:
+                 return False
+             return self._now() - self._shared_last_activity >= self.idle_inactivity_delay
+
+     def _poll_signals(self, current_time: float) -> None:
+         """Process queued commands and pending offsets."""
+         # Apply speech offsets
+         with self._speech_lock:
+             if self._speech_dirty:
+                 self.state.speech_offsets = self._pending_speech_offsets
+                 self._speech_dirty = False
+                 self.state.update_activity()
+
+         # Process commands
+         while True:
+             try:
+                 cmd, payload = self._command_queue.get_nowait()
+             except Empty:
+                 break
+             self._handle_command(cmd, payload, current_time)
+
+     def _update_face_tracking(self, current_time: float) -> None:
+         """Get face tracking offsets from camera worker thread."""
+         if self.camera_worker is not None:
+             offsets = self.camera_worker.get_face_tracking_offsets()
+             self.state.face_tracking_offsets = offsets
+         else:
+             # No camera worker, use neutral offsets
+             self.state.face_tracking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+
+     def _update_thinking_offsets(self, current_time: float) -> None:
+         """Compute thinking animation as secondary offsets.
+
+         Produces a gentle head sway (yaw drift, slight upward pitch, z bob)
+         and asymmetric antenna scanning pattern. The amplitude envelope
+         smoothly ramps up over 0.5s and decays over 0.5s for organic feel.
+         """
+         # Update amplitude envelope
+         if self._processing:
+             # Ramp up over 0.5s
+             elapsed = current_time - self._processing_start_time
+             self._thinking_amplitude = min(1.0, elapsed / 0.5)
+         elif self._thinking_amplitude > 0:
+             # Smooth decay at 2.0/s (full decay in 0.5s)
+             self._thinking_amplitude = max(
+                 0.0, self._thinking_amplitude - 2.0 * self.target_period
+             )
+
+         # If fully decayed, zero everything and bail
+         if self._thinking_amplitude < 0.001:
+             self._thinking_amplitude = 0.0
+             self.state.thinking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+             self._thinking_antenna_offsets = (0.0, 0.0)
+             return
+
+         amp = self._thinking_amplitude
+         t = current_time - self._processing_start_time
+
+         # Head offsets (radians / metres -- degrees=False, mm=False)
+         # Slow yaw drift: ±12° at 0.15 Hz
+         yaw = amp * np.deg2rad(12) * np.sin(2 * np.pi * 0.15 * t)
+         # Slight upward pitch: 6° base + 3° oscillation at 0.2 Hz
+         pitch = amp * (np.deg2rad(6) + np.deg2rad(3) * np.sin(2 * np.pi * 0.2 * t))
+         # Gentle z bob: 3 mm at 0.12 Hz
+         z = amp * 0.003 * np.sin(2 * np.pi * 0.12 * t)
+
+         self.state.thinking_offsets = (0.0, 0.0, z, 0.0, pitch, yaw)
+
+         # Antenna offsets: asymmetric scan (phase offset creates "searching" feel)
+         # ±20° at 0.4 Hz, right antenna lags left by ~70° of phase
+         left_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t)
+         right_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t + 1.2)
+         self._thinking_antenna_offsets = (left_ant, right_ant)
+
+     def _handle_command(self, cmd: str, payload: Any, current_time: float) -> None:
+         """Handle a single command."""
+         if cmd == "queue_move":
+             if isinstance(payload, Move):
+                 self.move_queue.append(payload)
+                 self.state.update_activity()
+                 logger.debug("Queued move, queue size: %d", len(self.move_queue))
+         elif cmd == "clear_queue":
+             self.move_queue.clear()
+             self.state.current_move = None
+             self.state.move_start_time = None
+             self._breathing_active = False
+             logger.info("Cleared move queue")
+         elif cmd == "set_listening":
+             desired = bool(payload)
+             if self._is_listening != desired:
+                 self._is_listening = desired
+                 if desired:
+                     self._listening_antennas = self._last_commanded_pose[1]
+                     self._antenna_unfreeze_blend = 0.0
+                 else:
+                     self._antenna_unfreeze_blend = 0.0
+                 self.state.update_activity()
+         elif cmd == "set_processing":
+             desired = bool(payload)
+             if desired and not self._processing:
+                 self._processing = True
+                 self._processing_start_time = self._now()
+                 # Interrupt breathing so thinking animation is clean
+                 if self._breathing_active and isinstance(self.state.current_move, BreathingMove):
+                     self.state.current_move = None
+                     self.state.move_start_time = None
+                     self._breathing_active = False
+                 self.state.update_activity()
+                 logger.debug("Processing started - thinking animation active")
+             elif not desired and self._processing:
+                 self._processing = False
+                 # Amplitude will decay smoothly in _update_thinking_offsets
+                 self.state.update_activity()
+                 logger.debug("Processing ended - thinking animation decaying")
+
+     def _manage_move_queue(self, current_time: float) -> None:
+         """Advance the move queue."""
+         # Check if current move is done
+         if self.state.current_move is not None and self.state.move_start_time is not None:
+             elapsed = current_time - self.state.move_start_time
+             if elapsed >= self.state.current_move.duration:
+                 self.state.current_move = None
+                 self.state.move_start_time = None
+
+         # Start next move if available
+         if self.state.current_move is None and self.move_queue:
+             self.state.current_move = self.move_queue.popleft()
+             self.state.move_start_time = current_time
+             self._breathing_active = isinstance(self.state.current_move, BreathingMove)
+             logger.debug("Starting move with duration: %s", self.state.current_move.duration)
+
+     def _manage_breathing(self, current_time: float) -> None:
+         """Start breathing when idle."""
+         if (
+             self.state.current_move is None
+             and not self.move_queue
+             and not self._is_listening
+             and not self._breathing_active
+             and not self._processing
+         ):
+             idle_for = current_time - self.state.last_activity_time
+             if idle_for >= self.idle_inactivity_delay:
+                 try:
+                     _, current_ant = self.current_robot.get_current_joint_positions()
+                     current_head = self.current_robot.get_current_head_pose()
+
+                     breathing = BreathingMove(
+                         interpolation_start_pose=current_head,
+                         interpolation_start_antennas=current_ant,
+                         interpolation_duration=1.0,
+                     )
+                     self.move_queue.append(breathing)
+                     self._breathing_active = True
+                     self.state.update_activity()
+                     logger.debug("Started breathing after %.1fs idle", idle_for)
+                 except Exception as e:
+                     logger.error("Failed to start breathing: %s", e)
+
+         # Stop breathing if new moves queued
+         if isinstance(self.state.current_move, BreathingMove) and self.move_queue:
+             self.state.current_move = None
+             self.state.move_start_time = None
+             self._breathing_active = False
+
+     def _get_primary_pose(self, current_time: float) -> FullBodyPose:
+         """Get current primary pose from move or last pose."""
+         if self.state.current_move is not None and self.state.move_start_time is not None:
+             t = current_time - self.state.move_start_time
+             head, antennas, body_yaw = self.state.current_move.evaluate(t)
+
+             if head is None:
+                 head = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+             if antennas is None:
+                 antennas = np.array([0.0, 0.0])
+             if body_yaw is None:
+                 body_yaw = 0.0
+
+             pose = (head.copy(), (float(antennas[0]), float(antennas[1])), float(body_yaw))
+             self.state.last_primary_pose = clone_pose(pose)
+             return pose
+
+         if self.state.last_primary_pose is not None:
+             return clone_pose(self.state.last_primary_pose)
+
+         neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         return (neutral, (0.0, 0.0), 0.0)
+
+     def _get_secondary_pose(self) -> FullBodyPose:
+         """Get secondary offsets (speech + face tracking + thinking)."""
+         offsets = [
+             self.state.speech_offsets[i]
+             + self.state.face_tracking_offsets[i]
+             + self.state.thinking_offsets[i]
+             for i in range(6)
+         ]
+
+         secondary_head = create_head_pose(
+             x=offsets[0], y=offsets[1], z=offsets[2],
+             roll=offsets[3], pitch=offsets[4], yaw=offsets[5],
+             degrees=False, mm=False
+         )
+         return (secondary_head, self._thinking_antenna_offsets, 0.0)
+
+     def _compose_pose(self, current_time: float) -> FullBodyPose:
+         """Compose final pose from primary and secondary."""
+         primary = self._get_primary_pose(current_time)
+         secondary = self._get_secondary_pose()
+         return combine_full_body(primary, secondary)
+
+     def _blend_antennas(self, target: Tuple[float, float]) -> Tuple[float, float]:
+         """Blend antennas with listening freeze state."""
+         if self._is_listening:
+             return self._listening_antennas
+
+         # Blend back from freeze
+         blend = min(1.0, self._antenna_unfreeze_blend + self.target_period / self._antenna_blend_duration)
+         self._antenna_unfreeze_blend = blend
+
+         return (
+             self._listening_antennas[0] * (1 - blend) + target[0] * blend,
+             self._listening_antennas[1] * (1 - blend) + target[1] * blend,
+         )
+
+     def _issue_command(self, head: NDArray, antennas: Tuple[float, float], body_yaw: float) -> None:
+         """Send command to robot."""
+         try:
+             self.current_robot.set_target(head=head, antennas=antennas, body_yaw=body_yaw)
+             self._last_commanded_pose = (head.copy(), antennas, body_yaw)
+         except Exception as e:
+             logger.debug("set_target failed: %s", e)
+
+     def _publish_shared_state(self) -> None:
+         """Update shared state for external queries."""
+         with self._shared_lock:
+             self._shared_last_activity = self.state.last_activity_time
+             self._shared_is_listening = self._is_listening
+
+     def start(self) -> None:
+         """Start the control loop thread."""
+         if self._thread is not None and self._thread.is_alive():
+             logger.warning("MovementManager already running")
+             return
+
+         self._stop_event.clear()
+         self._thread = threading.Thread(target=self._run_loop, daemon=True)
+         self._thread.start()
+         logger.info("MovementManager started")
+
+     def stop(self) -> None:
+         """Stop the control loop and reset to neutral."""
+         if self._thread is None or not self._thread.is_alive():
+             return
+
+         logger.info("Stopping MovementManager...")
+         self.clear_move_queue()
+
+         self._stop_event.set()
+         self._thread.join(timeout=2.0)
+         self._thread = None
+
+         # Reset to neutral
+         try:
+             neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+             self.current_robot.goto_target(
+                 head=neutral,
+                 antennas=[0.0, 0.0],
+                 duration=2.0,
+                 body_yaw=0.0,
+             )
+             logger.info("Reset to neutral position")
+         except Exception as e:
+             logger.error("Failed to reset: %s", e)
+
+     def _run_loop(self) -> None:
+         """Main control loop at 100Hz."""
+         logger.debug("Starting 100Hz control loop")
+
+         while not self._stop_event.is_set():
+             loop_start = self._now()
+
+             # Process signals
+             self._poll_signals(loop_start)
+
+             # Manage moves
+             self._manage_move_queue(loop_start)
+             self._manage_breathing(loop_start)
+
+             # Update face tracking offsets from camera worker
+             self._update_face_tracking(loop_start)
+
+             # Update thinking animation offsets
+             self._update_thinking_offsets(loop_start)
+
+             # Compose pose
+             head, antennas, body_yaw = self._compose_pose(loop_start)
+
+             # Blend antennas for listening
+             antennas = self._blend_antennas(antennas)
+
+             # Send to robot
+             self._issue_command(head, antennas, body_yaw)
+
+             # Update shared state
+             self._publish_shared_state()
+
+             # Maintain timing
+             elapsed = self._now() - loop_start
+             sleep_time = max(0.0, self.target_period - elapsed)
+             if sleep_time > 0:
+                 time.sleep(sleep_time)
+
+         logger.debug("Control loop stopped")
+
+     def get_status(self) -> Dict[str, Any]:
+         """Get current status for debugging."""
+         return {
+             "queue_size": len(self.move_queue),
+             "is_listening": self._is_listening,
+             "breathing_active": self._breathing_active,
+             "processing": self._processing,
+             "thinking_amplitude": round(self._thinking_amplitude, 3),
+             "last_commanded_pose": {
+                 "head": self._last_commanded_pose[0].tolist(),
+                 "antennas": self._last_commanded_pose[1],
+                 "body_yaw": self._last_commanded_pose[2],
+             },
+         }
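
The breathing, easing, and thinking animations above all reduce to sinusoids scaled by a ramp/decay envelope. As a hedged illustration (a dependency-free sketch mirroring the math in `BreathingMove`, `HeadLookMove.evaluate`, and `MovementManager._update_thinking_offsets`; the function names here are illustrative and not part of the package):

```python
import math

def thinking_envelope(elapsed: float, processing: bool,
                      prev_amplitude: float, dt: float = 0.01) -> float:
    # Mirrors _update_thinking_offsets: ramp to 1.0 over 0.5 s while
    # processing, then decay at 2.0/s once processing ends.
    if processing:
        return min(1.0, elapsed / 0.5)
    return max(0.0, prev_amplitude - 2.0 * dt)

def smoothstep(alpha: float) -> float:
    # Easing used by HeadLookMove.evaluate: 3a^2 - 2a^3, clamped to [0, 1].
    alpha = min(1.0, max(0.0, alpha))
    return alpha * alpha * (3 - 2 * alpha)

def antenna_sway(t: float, amplitude_deg: float = 15.0, freq_hz: float = 0.5) -> float:
    # Breathing antenna offset in radians, as in BreathingMove.evaluate.
    return math.radians(amplitude_deg) * math.sin(2 * math.pi * freq_hz * t)
```

Because the envelope multiplies every offset, interrupting the thinking state never snaps the head back; it fades out over 0.5 s at the 100 Hz loop rate.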
src/reachy_mini_openclaw/openai_realtime.py ADDED
@@ -0,0 +1,562 @@
+ """ReachyClaw - OpenAI Realtime API handler with OpenClaw identity.
+
+ This module implements ReachyClaw's voice conversation system using OpenAI Realtime API
+ with the robot embodying the actual OpenClaw agent's personality and context.
+
+ Architecture:
+     Startup: Fetch OpenClaw agent context (personality, memories, user info)
+     Runtime: User speaks -> OpenAI Realtime (as OpenClaw agent) -> Robot speaks
+         -> Tools for movements + OpenClaw queries for extended capabilities
+         -> Conversations synced back to OpenClaw for memory continuity
+
+ The robot IS the OpenClaw agent - same personality, same memories, same context.
+ """
+
+ import json
+ import base64
+ import random
+ import asyncio
+ import logging
+ from typing import Any, Final, Literal, Optional, Tuple
+ from datetime import datetime
+
+ import numpy as np
+ from numpy.typing import NDArray
+ from openai import AsyncOpenAI
+ from fastrtc import AdditionalOutputs, AsyncStreamHandler, wait_for_item
+ from scipy.signal import resample
+ from websockets.exceptions import ConnectionClosedError
+
+ from reachy_mini_openclaw.config import config
+ from reachy_mini_openclaw.prompts import get_session_voice
+ from reachy_mini_openclaw.tools.core_tools import ToolDependencies, get_tool_specs, dispatch_tool_call, get_body_actions_description
+
+ logger = logging.getLogger(__name__)
+
+ # OpenAI Realtime API audio format
+ OPENAI_SAMPLE_RATE: Final[Literal[24000]] = 24000
+
+ # Base instructions for the robot body capabilities
+ ROBOT_BODY_INSTRUCTIONS = """
+ ## CRITICAL: You are a voice relay for the OpenClaw agent
+
+ You are the voice interface for an OpenClaw AI agent embodied in a Reachy Mini robot.
+ You MUST call `ask_openclaw` for EVERY user message to get the real response.
+
+ **Your ONLY job is:**
+ 1. When the user says something, IMMEDIATELY call `ask_openclaw` with their full message.
+ 2. Speak the response from `ask_openclaw` EXACTLY as returned — do not rephrase, add to, or summarize it.
+
+ **You MUST NOT:**
+ - Answer any question yourself — ALWAYS use `ask_openclaw` first.
+ - Make up information, opinions, or responses on your own.
+ - Summarize or modify what `ask_openclaw` returns — speak it verbatim.
+ - Say things like "let me check" and then answer without calling `ask_openclaw`.
+
+ **Robot Movement:**
+ - Do NOT call look, emotion, dance, or camera tools yourself.
+ - OpenClaw controls the robot body — movements are handled automatically from its response.
+
+ **Conversation Style for Voice:**
+ - Keep it natural — you are speaking out loud
+ - If ask_openclaw is slow or errors, say "I'm having trouble reaching my brain, one moment"
+ """
+
+ # Fallback if OpenClaw context fetch fails
+ FALLBACK_IDENTITY = """You are the voice relay for an OpenClaw AI agent embodied in a Reachy Mini robot.
+ You MUST call ask_openclaw for every user message and speak the response verbatim.
+ Never answer on your own — always defer to ask_openclaw."""
+
+ # System context sent to OpenClaw so it knows about the robot body.
+ # Built dynamically from TOOL_SPECS so the action list stays in sync.
+ REACHY_BODY_CONTEXT = f"""\
+ User is talking to you through your Reachy Mini robot body. Keep responses concise for voice.
+
+ You can control your robot body by including action tags anywhere in your response.
+ The tags will be executed and stripped before your words are spoken aloud.
+
+ Available actions:
+ {get_body_actions_description()}
+
+ Examples:
+ "Sure, let me look over there. [LOOK:left] I see a bookshelf!"
+ "[EMOTION:happy] That's great to hear!"
+ "[DANCE:excited] Let's celebrate!"
+
+ Use actions naturally to make the conversation more expressive. You don't have to use them every time — only when it adds to the interaction."""
+
+
+ class OpenAIRealtimeHandler(AsyncStreamHandler):
+     """Handler for OpenAI Realtime API embodying the OpenClaw agent.
+
+     This handler:
+     - Fetches OpenClaw's personality and context at startup
+     - Maintains voice conversation AS the OpenClaw agent
+     - Executes robot movement tools locally for low latency
+     - Calls OpenClaw for extended capabilities (web, calendar, memory)
+     - Syncs conversations back to OpenClaw for memory continuity
+     """
+
+     def __init__(
+         self,
+         deps: ToolDependencies,
+         openclaw_bridge: Optional[Any] = None,
+         gradio_mode: bool = False,
+     ):
+         """Initialize the handler.
+
+         Args:
+             deps: Tool dependencies for robot control
+             openclaw_bridge: Bridge to OpenClaw gateway
+             gradio_mode: Whether running with Gradio UI
+         """
+         super().__init__(
+             expected_layout="mono",
+             output_sample_rate=OPENAI_SAMPLE_RATE,
+             input_sample_rate=OPENAI_SAMPLE_RATE,
+         )
+
+         self.deps = deps
+         self.openclaw_bridge = openclaw_bridge
+         self.gradio_mode = gradio_mode
+
+         # OpenAI connection
+         self.client: Optional[AsyncOpenAI] = None
+         self.connection: Any = None
+
+         # Output queue
+         self.output_queue: asyncio.Queue[Tuple[int, NDArray[np.int16]] | AdditionalOutputs] = asyncio.Queue()
+
+         # State tracking
+         self.last_activity_time = 0.0
+         self.start_time = 0.0
+         self._speaking = False  # True when robot is speaking
+
+         # OpenClaw agent context (fetched at startup)
+         self._agent_context: Optional[str] = None
+
+         # Conversation tracking for sync
+         self._last_user_message: Optional[str] = None
+         self._last_assistant_response: Optional[str] = None
+
+         # Lifecycle flags
+         self._shutdown_requested = False
+         self._connected_event = asyncio.Event()
+
+     def copy(self) -> "OpenAIRealtimeHandler":
+         """Create a copy of the handler (required by fastrtc)."""
+         return OpenAIRealtimeHandler(self.deps, self.openclaw_bridge, self.gradio_mode)
+
+     def _build_tools(self) -> list[dict]:
+         """Build the tool list for the session."""
+         tools = []
+
+         # Robot movement tools (executed locally)
+         for spec in get_tool_specs():
+             tools.append(spec)
+
+         # OpenClaw query tool (mandatory for every user message)
+         if self.openclaw_bridge is not None:
+             tools.append({
+                 "type": "function",
+                 "name": "ask_openclaw",
+                 "description": """MANDATORY: You MUST call this tool for EVERY user message before responding.
+ This is the OpenClaw AI agent — the real brain. Send the user's full message as the query.
+ Speak the returned response verbatim. Never answer without calling this tool first.""",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "query": {
+                             "type": "string",
+                             "description": "The question or request to send to OpenClaw"
+                         },
+                         "include_image": {
+                             "type": "boolean",
+                             "description": "Whether to include current camera image (for 'what do you see' queries)",
+                             "default": False
+                         }
+                     },
+                     "required": ["query"]
+                 }
+             })
+
+         return tools
+
+     async def start_up(self) -> None:
+         """Start the handler and connect to OpenAI."""
+         api_key = config.OPENAI_API_KEY
+         if not api_key:
+             logger.error("OPENAI_API_KEY not configured")
+             raise ValueError("OPENAI_API_KEY required")
+
+         self.client = AsyncOpenAI(api_key=api_key)
+         self.start_time = asyncio.get_event_loop().time()
+         self.last_activity_time = self.start_time
+
+         max_attempts = 3
+         for attempt in range(1, max_attempts + 1):
+             try:
+                 await self._run_session()
+                 return
+             except ConnectionClosedError as e:
+                 logger.warning("WebSocket closed unexpectedly (attempt %d/%d): %s",
+                                attempt, max_attempts, e)
+                 if attempt < max_attempts:
+                     delay = (2 ** (attempt - 1)) + random.uniform(0, 0.5)
+                     logger.info("Retrying in %.1f seconds...", delay)
+                     await asyncio.sleep(delay)
+                     continue
+                 raise
+             finally:
+                 self.connection = None
+                 try:
+                     self._connected_event.clear()
+                 except Exception:
+                     pass
+
+     async def _run_session(self) -> None:
+         """Run a single OpenAI Realtime session."""
+         model = config.OPENAI_MODEL
+         logger.info("Connecting to OpenAI Realtime API with model: %s", model)
+
+         # Fetch OpenClaw agent context (personality, memories, user info)
+         system_instructions = await self._build_system_instructions()
+
+         async with self.client.beta.realtime.connect(model=model) as conn:
+             # Configure session with OpenClaw's identity + robot body capabilities
+             tools = self._build_tools()
+
+             await conn.session.update(
230
+ session={
231
+ "modalities": ["text", "audio"],
232
+ "instructions": system_instructions,
233
+ "voice": get_session_voice(),
234
+ "input_audio_format": "pcm16",
235
+ "output_audio_format": "pcm16",
236
+ "input_audio_transcription": {
237
+ "model": "whisper-1",
238
+ },
239
+ "turn_detection": {
240
+ "type": "server_vad",
241
+ "threshold": 0.5,
242
+ "prefix_padding_ms": 300,
243
+ "silence_duration_ms": 600,
244
+ },
245
+ "tools": tools,
246
+ "tool_choice": "auto",
247
+ },
248
+ )
249
+ logger.info("OpenAI Realtime session configured with %d tools", len(tools))
250
+
251
+ self.connection = conn
252
+ self._connected_event.set()
253
+
254
+ # Process events
255
+ async for event in conn:
256
+ await self._handle_event(event)
257
+
258
+ async def _build_system_instructions(self) -> str:
259
+ """Build system instructions for the voice relay.
260
+
261
+ GPT-4o is a dumb relay — it only needs instructions on how to
262
+ call ask_openclaw and speak the result. No personality context needed.
263
+ """
264
+ return ROBOT_BODY_INSTRUCTIONS
265
+
266
+ async def _handle_event(self, event: Any) -> None:
267
+ """Handle an event from the OpenAI Realtime API."""
268
+ event_type = event.type
269
+
270
+ # Speech detection
271
+ if event_type == "input_audio_buffer.speech_started":
272
+ # User started speaking - stop any current output
273
+ self._speaking = False
274
+ self.deps.movement_manager.set_processing(False)
275
+ while not self.output_queue.empty():
276
+ try:
277
+ self.output_queue.get_nowait()
278
+ except asyncio.QueueEmpty:
279
+ break
280
+ if self.deps.head_wobbler is not None:
281
+ self.deps.head_wobbler.reset()
282
+ self.deps.movement_manager.set_listening(True)
283
+ logger.info("User started speaking")
284
+
285
+ if event_type == "input_audio_buffer.speech_stopped":
286
+ self.deps.movement_manager.set_listening(False)
287
+ logger.info("User stopped speaking")
288
+
289
+ # Transcription (for logging, UI, and sync)
290
+ if event_type == "conversation.item.input_audio_transcription.completed":
291
+ transcript = event.transcript
292
+ if transcript and transcript.strip():
293
+ logger.info("User: %s", transcript)
294
+ self._last_user_message = transcript # Track for sync
295
+ await self.output_queue.put(
296
+ AdditionalOutputs({"role": "user", "content": transcript})
297
+ )
298
+
299
+ # Response started - robot is about to speak
300
+ if event_type == "response.created":
301
+ self._speaking = True
302
+ logger.debug("Response started")
303
+
304
+ # Audio output from TTS
305
+ if event_type == "response.audio.delta":
306
+ # Audio arriving means we have a response - stop thinking animation
307
+ self.deps.movement_manager.set_processing(False)
308
+
309
+ # Feed to head wobbler for expressive movement
310
+ if self.deps.head_wobbler is not None:
311
+ self.deps.head_wobbler.feed(event.delta)
312
+
313
+ self.last_activity_time = asyncio.get_event_loop().time()
314
+
315
+ # Queue audio for playback
316
+ audio_data = np.frombuffer(
317
+ base64.b64decode(event.delta),
318
+ dtype=np.int16
319
+ ).reshape(1, -1)
320
+ await self.output_queue.put((OPENAI_SAMPLE_RATE, audio_data))
321
+
322
+ # Response text (for logging and UI)
323
+ if event_type == "response.audio_transcript.delta":
324
+ # Streaming transcript of what's being said
325
+ pass # Could log incrementally if needed
326
+
327
+ if event_type == "response.audio_transcript.done":
328
+ response_text = event.transcript
329
+ logger.info("Assistant: %s", response_text[:100] if len(response_text) > 100 else response_text)
330
+ self._last_assistant_response = response_text # Track for sync
331
+ await self.output_queue.put(
332
+ AdditionalOutputs({"role": "assistant", "content": response_text})
333
+ )
334
+
335
+ # Response completed - sync conversation to OpenClaw
336
+ if event_type == "response.done":
337
+ self._speaking = False
338
+ self.deps.movement_manager.set_processing(False)
339
+ if self.deps.head_wobbler is not None:
340
+ self.deps.head_wobbler.reset()
341
+ logger.debug("Response completed")
342
+
343
+ # Sync conversation to OpenClaw for memory continuity
344
+ await self._sync_to_openclaw()
345
+
346
+ # Tool calls
347
+ if event_type == "response.function_call_arguments.done":
348
+ await self._handle_tool_call(event)
349
+
350
+ # Errors
351
+ if event_type == "error":
352
+ err = getattr(event, "error", None)
353
+ msg = getattr(err, "message", str(err))
354
+ code = getattr(err, "code", "")
355
+ logger.error("OpenAI error [%s]: %s", code, msg)
356
+
357
+ async def _handle_tool_call(self, event: Any) -> None:
358
+ """Handle a tool call from OpenAI."""
359
+ tool_name = getattr(event, "name", None)
360
+ args_json = getattr(event, "arguments", None)
361
+ call_id = getattr(event, "call_id", None)
362
+
363
+ if not isinstance(tool_name, str) or not isinstance(args_json, str):
364
+ return
365
+
366
+ logger.info("Tool call: %s(%s)", tool_name, args_json[:50] if len(args_json) > 50 else args_json)
367
+
368
+ # Start thinking animation while we process the tool call.
369
+ # It will stop when the next audio delta arrives or response completes.
370
+ self.deps.movement_manager.set_processing(True)
371
+
372
+ try:
373
+ if tool_name == "ask_openclaw":
374
+ result = await self._handle_openclaw_query(args_json)
375
+ else:
376
+ # Robot movement tools - dispatch locally
377
+ result = await dispatch_tool_call(tool_name, args_json, self.deps)
378
+
379
+ logger.debug("Tool '%s' result: %s", tool_name, str(result)[:100])
380
+ except Exception as e:
381
+ logger.error("Tool '%s' failed: %s", tool_name, e)
382
+ result = {"error": str(e)}
383
+
384
+ # Send result back to continue the conversation
385
+ if isinstance(call_id, str) and self.connection:
386
+ await self.connection.conversation.item.create(
387
+ item={
388
+ "type": "function_call_output",
389
+ "call_id": call_id,
390
+ "output": json.dumps(result),
391
+ }
392
+ )
393
+ # Trigger response generation after tool result
394
+ await self.connection.response.create()
395
+
396
+ async def _sync_to_openclaw(self) -> None:
397
+ """Sync the last conversation turn to OpenClaw for memory continuity."""
398
+ if not self.openclaw_bridge or not self.openclaw_bridge.is_connected:
399
+ return
400
+
401
+ if self._last_user_message and self._last_assistant_response:
402
+ try:
403
+ await self.openclaw_bridge.sync_conversation(
404
+ self._last_user_message,
405
+ self._last_assistant_response
406
+ )
407
+ # Clear after sync
408
+ self._last_user_message = None
409
+ self._last_assistant_response = None
410
+ except Exception as e:
411
+ logger.debug("Failed to sync conversation: %s", e)
412
+
413
+ async def _handle_openclaw_query(self, args_json: str) -> dict:
414
+ """Handle a query to OpenClaw."""
415
+ if self.openclaw_bridge is None or not self.openclaw_bridge.is_connected:
416
+ return {"error": "OpenClaw not connected"}
417
+
418
+ try:
419
+ args = json.loads(args_json)
420
+ query = args.get("query", "")
421
+ include_image = args.get("include_image", False)
422
+
423
+ # Capture image if requested
424
+ image_b64 = None
425
+ if include_image and self.deps.camera_worker:
426
+ frame = self.deps.camera_worker.get_latest_frame()
427
+ if frame is not None:
428
+ import cv2
429
+ _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
430
+ image_b64 = base64.b64encode(buffer).decode('utf-8')
431
+ logger.debug("Captured camera image for OpenClaw query")
432
+
433
+ # Query OpenClaw
434
+ response = await self.openclaw_bridge.chat(
435
+ query,
436
+ image_b64=image_b64,
437
+ system_context=REACHY_BODY_CONTEXT,
438
+ )
439
+
440
+ if response.error:
441
+ return {"error": response.error}
442
+
443
+ # Parse and execute any action commands from OpenClaw's response
444
+ spoken_text = await self._execute_body_actions(response.content)
445
+
446
+ return {"response": spoken_text}
447
+
448
+ except Exception as e:
449
+ logger.error("OpenClaw query failed: %s", e)
450
+ return {"error": str(e)}
451
+
452
+ async def _execute_body_actions(self, text: str) -> str:
453
+ """Parse action tags from OpenClaw's response, execute them, and return clean text.
454
+
455
+ Supported tags:
456
+ [LOOK:direction] - Move head (left/right/up/down/front)
457
+ [EMOTION:name] - Express emotion (happy/sad/surprised/curious/thinking/confused/excited)
458
+ [DANCE:name] - Perform dance (happy/excited/wave/nod/shake/bounce)
459
+ [CAMERA] - Capture and describe what the robot sees
460
+ [FACE_TRACKING:on/off] - Toggle face tracking
461
+ [STOP] - Stop all movements
462
+ """
463
+ import re
464
+
465
+ action_pattern = re.compile(
466
+ r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
467
+ r'|\[(CAMERA|STOP)\]'
468
+ )
469
+
470
+ actions_found = []
471
+ for match in action_pattern.finditer(text):
472
+ if match.group(3):
473
+ # No-arg action: [CAMERA] or [STOP]
474
+ actions_found.append((match.group(3), None))
475
+ else:
476
+ # Parameterized action: [LOOK:left], etc.
477
+ actions_found.append((match.group(1), match.group(2)))
478
+
479
+ # Execute actions
480
+ for action, param in actions_found:
481
+ try:
482
+ if action == "LOOK":
483
+ await dispatch_tool_call("look", json.dumps({"direction": param}), self.deps)
484
+ elif action == "EMOTION":
485
+ await dispatch_tool_call("emotion", json.dumps({"emotion_name": param}), self.deps)
486
+ elif action == "DANCE":
487
+ await dispatch_tool_call("dance", json.dumps({"dance_name": param}), self.deps)
488
+ elif action == "CAMERA":
489
+ await dispatch_tool_call("camera", json.dumps({}), self.deps)
490
+ elif action == "FACE_TRACKING":
491
+ enabled = param.lower() in ("on", "true", "yes")
492
+ await dispatch_tool_call("face_tracking", json.dumps({"enabled": enabled}), self.deps)
493
+ elif action == "STOP":
494
+ await dispatch_tool_call("stop_moves", json.dumps({}), self.deps)
495
+ logger.info("Executed body action: %s(%s)", action, param)
496
+ except Exception as e:
497
+ logger.warning("Body action %s(%s) failed: %s", action, param, e)
498
+
499
+ # Strip action tags from text so GPT-4o only speaks the words
500
+ spoken_text = action_pattern.sub('', text).strip()
501
+ # Clean up extra whitespace left by removed tags
502
+ spoken_text = re.sub(r' +', ' ', spoken_text)
503
+
504
+ return spoken_text
505
+
506
+ async def receive(self, frame: Tuple[int, NDArray]) -> None:
507
+ """Receive audio from the robot microphone."""
508
+ if not self.connection:
509
+ return
510
+
511
+ input_sr, audio = frame
512
+
513
+ # Handle stereo
514
+ if audio.ndim == 2:
515
+ if audio.shape[1] > audio.shape[0]:
516
+ audio = audio.T
517
+ if audio.shape[1] > 1:
518
+ audio = audio[:, 0]
519
+
520
+ audio = audio.flatten()
521
+
522
+ # Convert to float for resampling
523
+ if audio.dtype == np.int16:
524
+ audio = audio.astype(np.float32) / 32768.0
525
+ elif audio.dtype != np.float32:
526
+ audio = audio.astype(np.float32)
527
+
528
+ # Resample to OpenAI sample rate
529
+ if input_sr != OPENAI_SAMPLE_RATE:
530
+ num_samples = int(len(audio) * OPENAI_SAMPLE_RATE / input_sr)
531
+ audio = resample(audio, num_samples).astype(np.float32)
532
+
533
+ # Convert to int16 for OpenAI
534
+ audio_int16 = (audio * 32767).astype(np.int16)
535
+
536
+ # Send to OpenAI
537
+ try:
538
+ audio_b64 = base64.b64encode(audio_int16.tobytes()).decode("utf-8")
539
+ await self.connection.input_audio_buffer.append(audio=audio_b64)
540
+ except Exception as e:
541
+ logger.debug("Failed to send audio: %s", e)
542
+
543
+ async def emit(self) -> Tuple[int, NDArray[np.int16]] | AdditionalOutputs | None:
544
+ """Get the next output (audio or transcript)."""
545
+ return await wait_for_item(self.output_queue)
546
+
547
+ async def shutdown(self) -> None:
548
+ """Shutdown the handler."""
549
+ self._shutdown_requested = True
550
+
551
+ if self.connection:
552
+ try:
553
+ await self.connection.close()
554
+ except Exception as e:
555
+ logger.debug("Connection close: %s", e)
556
+ self.connection = None
557
+
558
+ while not self.output_queue.empty():
559
+ try:
560
+ self.output_queue.get_nowait()
561
+ except asyncio.QueueEmpty:
562
+ break
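The action-tag protocol above can be exercised in isolation. The sketch below replicates the regex and tag-stripping logic of `_execute_body_actions` as a standalone function (the `extract_actions` name is introduced here for illustration; it is not part of the codebase), so the grammar — parameterized tags like `[LOOK:left]` and bare tags like `[CAMERA]` — can be tested without a robot:

```python
import re

# Same tag grammar as _execute_body_actions: group 1/2 capture
# parameterized tags, group 3 captures bare [CAMERA] / [STOP] tags.
ACTION_PATTERN = re.compile(
    r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
    r'|\[(CAMERA|STOP)\]'
)

def extract_actions(text: str):
    """Return (actions, spoken_text) for an OpenClaw-style response."""
    actions = []
    for m in ACTION_PATTERN.finditer(text):
        if m.group(3):
            actions.append((m.group(3), None))        # bare tag
        else:
            actions.append((m.group(1), m.group(2)))  # parameterized tag
    # Strip tags and collapse leftover whitespace, as the handler does
    spoken = ACTION_PATTERN.sub('', text).strip()
    spoken = re.sub(r' +', ' ', spoken)
    return actions, spoken

actions, spoken = extract_actions("[EMOTION:happy] Hi there! [LOOK:left] [CAMERA]")
print(actions)  # [('EMOTION', 'happy'), ('LOOK', 'left'), ('CAMERA', None)]
print(spoken)   # Hi there!
```

Because the tags are stripped before the text is handed back to the voice relay, OpenClaw can freely interleave actions with speech without the robot ever reading a tag aloud.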
src/reachy_mini_openclaw/openclaw_bridge.py ADDED
@@ -0,0 +1,606 @@
+"""ReachyClaw - Bridge to OpenClaw Gateway for AI responses.
+
+This module provides ReachyClaw's integration with the OpenClaw gateway
+using the WebSocket protocol (the gateway's native transport).
+
+ReachyClaw uses the OpenAI Realtime API for voice I/O (speech recognition + TTS)
+but routes all responses through OpenClaw for intelligence.
+"""
+
+import json
+import asyncio
+import logging
+import uuid
+from typing import Optional, Any, AsyncIterator
+from dataclasses import dataclass
+
+import websockets
+
+from reachy_mini_openclaw.config import config
+
+logger = logging.getLogger(__name__)
+
+# Protocol version supported by this client
+PROTOCOL_VERSION = 3
+
+
+@dataclass
+class OpenClawResponse:
+    """Response from the OpenClaw gateway."""
+    content: str
+    error: Optional[str] = None
+
+
+class OpenClawBridge:
+    """Bridge to the OpenClaw gateway using its WebSocket protocol.
+
+    The OpenClaw gateway speaks WebSocket with a JSON frame protocol.
+    This class handles the connect handshake, authentication, and
+    chat operations.
+
+    Example:
+        bridge = OpenClawBridge()
+        await bridge.connect()
+
+        # Simple query
+        response = await bridge.chat("Hello!")
+        print(response.content)
+    """
+
+    def __init__(
+        self,
+        gateway_url: Optional[str] = None,
+        gateway_token: Optional[str] = None,
+        agent_id: Optional[str] = None,
+        timeout: float = 120.0,
+    ):
+        """Initialize the OpenClaw bridge.
+
+        Args:
+            gateway_url: URL of the OpenClaw gateway (default: from env/config).
+                Accepts http:// or ws:// schemes; http is converted to ws.
+            gateway_token: Authentication token (default: from env/config)
+            agent_id: OpenClaw agent ID to use (default: from env/config)
+            timeout: Request timeout in seconds
+        """
+        import os
+
+        raw_url = (
+            gateway_url
+            or os.getenv("OPENCLAW_GATEWAY_URL")
+            or config.OPENCLAW_GATEWAY_URL
+        )
+        # Normalise to ws:// (the gateway listens on the same port for both)
+        self.gateway_url = self._normalise_ws_url(raw_url)
+
+        self.gateway_token = (
+            gateway_token
+            or os.getenv("OPENCLAW_TOKEN")
+            or config.OPENCLAW_TOKEN
+        )
+        self.agent_id = (
+            agent_id
+            or os.getenv("OPENCLAW_AGENT_ID")
+            or config.OPENCLAW_AGENT_ID
+        )
+        self.timeout = timeout
+
+        # Session key – "main" shares context with WhatsApp and other channels.
+        # Full key format: agent:<agent_id>:<session_key>
+        self.session_key = (
+            os.getenv("OPENCLAW_SESSION_KEY")
+            or config.OPENCLAW_SESSION_KEY
+            or "main"
+        )
+
+        # Persistent WebSocket state
+        self._ws: Optional[websockets.WebSocketClientProtocol] = None
+        self._connected = False
+        self._conn_id: Optional[str] = None
+
+        # Background listener task & pending request futures
+        self._listener_task: Optional[asyncio.Task] = None
+        self._pending: dict[str, asyncio.Future] = {}
+        # Event queues keyed by runId
+        self._run_events: dict[str, asyncio.Queue] = {}
+
+    # ------------------------------------------------------------------
+    # URL helpers
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _normalise_ws_url(url: str) -> str:
+        """Convert an http(s) URL to ws(s)."""
+        if url.startswith("http://"):
+            return "ws://" + url[7:]
+        if url.startswith("https://"):
+            return "wss://" + url[8:]
+        if not url.startswith("ws://") and not url.startswith("wss://"):
+            return "ws://" + url
+        return url
+
+    # ------------------------------------------------------------------
+    # Connection lifecycle
+    # ------------------------------------------------------------------
+
+    async def connect(self) -> bool:
+        """Connect to the OpenClaw gateway and authenticate.
+
+        Returns:
+            True if connection successful, False otherwise
+        """
+        logger.info(
+            "Connecting to OpenClaw at %s (token: %s)",
+            self.gateway_url,
+            "set" if self.gateway_token else "not set",
+        )
+        try:
+            self._ws = await websockets.connect(
+                self.gateway_url,
+                ping_interval=20,
+                ping_timeout=30,
+                close_timeout=5,
+            )
+
+            # 1. Receive challenge
+            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
+            challenge = json.loads(raw)
+            if challenge.get("event") != "connect.challenge":
+                logger.warning("Unexpected first frame: %s", challenge.get("event"))
+
+            # 2. Send connect request
+            req_id = str(uuid.uuid4())
+            connect_req = {
+                "type": "req",
+                "id": req_id,
+                "method": "connect",
+                "params": {
+                    "minProtocol": PROTOCOL_VERSION,
+                    "maxProtocol": PROTOCOL_VERSION,
+                    "auth": {"token": self.gateway_token} if self.gateway_token else {},
+                    "client": {
+                        "id": "cli",
+                        "version": "1.0.0",
+                        "platform": "darwin",
+                        "mode": "cli",
+                    },
+                    "role": "operator",
+                    "scopes": ["chat", "operator.write", "operator.read"],
+                },
+            }
+            await self._ws.send(json.dumps(connect_req))
+
+            # 3. Read hello response
+            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
+            hello = json.loads(raw)
+
+            if hello.get("ok"):
+                self._connected = True
+                payload = hello.get("payload", {})
+                server = payload.get("server", {})
+                self._conn_id = server.get("connId")
+                logger.info(
+                    "Connected to OpenClaw gateway (server=%s, connId=%s)",
+                    server.get("host", "?"),
+                    self._conn_id,
+                )
+                # Start background listener
+                self._listener_task = asyncio.create_task(
+                    self._listen_loop(), name="openclaw-ws-listener"
+                )
+                return True
+            else:
+                err = hello.get("error", {})
+                logger.error(
+                    "OpenClaw connect failed: %s - %s",
+                    err.get("code"),
+                    err.get("message"),
+                )
+                await self._close_ws()
+                return False
+
+        except Exception as e:
+            logger.error(
+                "Failed to connect to OpenClaw gateway: %s (%s)",
+                e,
+                type(e).__name__,
+            )
+            await self._close_ws()
+            return False
+
+    async def disconnect(self) -> None:
+        """Disconnect from the gateway."""
+        self._connected = False
+        if self._listener_task and not self._listener_task.done():
+            self._listener_task.cancel()
+            try:
+                await self._listener_task
+            except (asyncio.CancelledError, Exception):
+                pass
+        await self._close_ws()
+
+    async def _close_ws(self) -> None:
+        self._connected = False
+        if self._ws:
+            try:
+                await self._ws.close()
+            except Exception:
+                pass
+            self._ws = None
+
+    # ------------------------------------------------------------------
+    # Background listener
+    # ------------------------------------------------------------------
+
+    async def _listen_loop(self) -> None:
+        """Background task that reads all frames from the WebSocket."""
+        try:
+            async for raw in self._ws:
+                try:
+                    msg = json.loads(raw)
+                except json.JSONDecodeError:
+                    continue
+                await self._dispatch(msg)
+        except websockets.ConnectionClosed as e:
+            logger.warning("OpenClaw WebSocket closed: %s", e)
+        except asyncio.CancelledError:
+            return
+        except Exception as e:
+            logger.error("OpenClaw listener error: %s", e)
+        finally:
+            self._connected = False
+
+    async def _dispatch(self, msg: dict) -> None:
+        """Route an incoming frame to the right handler."""
+        msg_type = msg.get("type")
+
+        if msg_type == "res":
+            # Response to a request we sent
+            req_id = msg.get("id")
+            fut = self._pending.pop(req_id, None)
+            if fut and not fut.done():
+                fut.set_result(msg)
+
+        elif msg_type == "event":
+            event_name = msg.get("event", "")
+            payload = msg.get("payload", {})
+
+            # Route agent / chat events to the correct run queue
+            run_id = payload.get("runId")
+            if run_id and run_id in self._run_events:
+                await self._run_events[run_id].put(msg)
+
+            # Ignore noisy events silently
+            if event_name in ("health", "tick"):
+                return
+
+            logger.debug("Event: %s (runId=%s)", event_name, run_id)
+
+    # ------------------------------------------------------------------
+    # Request helpers
+    # ------------------------------------------------------------------
+
+    async def _send_request(
+        self, method: str, params: dict, timeout: Optional[float] = None
+    ) -> dict:
+        """Send a request and wait for the response.
+
+        Args:
+            method: The RPC method name
+            params: The params dict
+            timeout: Override timeout (defaults to self.timeout)
+
+        Returns:
+            The full response message dict
+        """
+        if not self._ws or not self._connected:
+            return {"ok": False, "error": {"code": "NOT_CONNECTED", "message": "Not connected"}}
+
+        req_id = str(uuid.uuid4())
+        req = {"type": "req", "id": req_id, "method": method, "params": params}
+
+        fut: asyncio.Future = asyncio.get_running_loop().create_future()
+        self._pending[req_id] = fut
+
+        try:
+            await self._ws.send(json.dumps(req))
+            result = await asyncio.wait_for(fut, timeout=timeout or self.timeout)
+            return result
+        except asyncio.TimeoutError:
+            self._pending.pop(req_id, None)
+            return {"ok": False, "error": {"code": "TIMEOUT", "message": "Request timed out"}}
+        except Exception as e:
+            self._pending.pop(req_id, None)
+            return {"ok": False, "error": {"code": "ERROR", "message": str(e)}}
+
+    def _full_session_key(self) -> str:
+        """Build the full session key: agent:<agentId>:<sessionKey>."""
+        return f"agent:{self.agent_id}:{self.session_key}"
+
+    # ------------------------------------------------------------------
+    # Chat API
+    # ------------------------------------------------------------------
+
+    async def chat(
+        self,
+        message: str,
+        image_b64: Optional[str] = None,
+        system_context: Optional[str] = None,
+    ) -> OpenClawResponse:
+        """Send a message to OpenClaw and get a response.
+
+        OpenClaw maintains conversation memory on its end, so it will be aware
+        of conversations from other channels (WhatsApp, web, etc.). We only send
+        the current message and let OpenClaw handle the context.
+
+        Args:
+            message: The user's message (transcribed speech)
+            image_b64: Optional base64-encoded image from the robot camera (not yet
+                supported over WebSocket chat.send – reserved for future use)
+            system_context: Optional additional system context (prepended to message)
+
+        Returns:
+            OpenClawResponse with the AI's response
+        """
+        if not self._connected:
+            return OpenClawResponse(content="", error="Not connected to OpenClaw")
+
+        # Prefix system context if provided
+        final_message = message
+        if system_context:
+            final_message = f"[System: {system_context}]\n\n{message}"
+
+        # If an image is provided, mention it (the WebSocket protocol uses string
+        # messages; image passing would require a separate mechanism)
+        if image_b64:
+            final_message = f"[Image attached]\n{final_message}"
+
+        idempotency_key = str(uuid.uuid4())
+        session_key = self._full_session_key()
+
+        # The runId returned by chat.send keys the event queue for this run
+        params = {
+            "idempotencyKey": idempotency_key,
+            "sessionKey": session_key,
+            "message": final_message,
+        }
+
+        try:
+            # Send the request
+            resp = await self._send_request("chat.send", params, timeout=30)
+
+            if not resp.get("ok"):
+                err = resp.get("error", {})
+                error_msg = f"{err.get('code', 'UNKNOWN')}: {err.get('message', 'Unknown error')}"
+                logger.error("chat.send failed: %s", error_msg)
+                return OpenClawResponse(content="", error=error_msg)
+
+            run_id = resp.get("payload", {}).get("runId")
+            if not run_id:
+                return OpenClawResponse(content="", error="No runId in response")
+
+            # Register a queue to receive events for this run
+            event_queue: asyncio.Queue = asyncio.Queue()
+            self._run_events[run_id] = event_queue
+
+            try:
+                # Collect the streamed response
+                full_text = ""
+                while True:
+                    try:
+                        event = await asyncio.wait_for(
+                            event_queue.get(), timeout=self.timeout
+                        )
+                        payload = event.get("payload", {})
+                        event_name = event.get("event", "")
+
+                        if event_name == "agent":
+                            stream = payload.get("stream")
+                            data = payload.get("data", {})
+
+                            if stream == "assistant":
+                                # Accumulate the full text
+                                full_text = data.get("text", full_text)
+
+                            elif stream == "lifecycle" and data.get("phase") == "end":
+                                # Run completed
+                                break
+
+                        elif event_name == "chat":
+                            state = payload.get("state")
+                            if state == "final":
+                                # Extract final text
+                                msg_payload = payload.get("message", {})
+                                content_parts = msg_payload.get("content", [])
+                                if isinstance(content_parts, list):
+                                    for part in content_parts:
+                                        if isinstance(part, dict) and part.get("type") == "text":
+                                            full_text = part.get("text", full_text)
+                                elif isinstance(content_parts, str):
+                                    full_text = content_parts
+                                break
+
+                    except asyncio.TimeoutError:
+                        logger.warning("Timeout waiting for chat response (runId=%s)", run_id)
+                        if full_text:
+                            break
+                        return OpenClawResponse(content="", error="Response timeout")
+
+                return OpenClawResponse(content=full_text)
+
+            finally:
+                self._run_events.pop(run_id, None)
+
+        except Exception as e:
+            logger.error("OpenClaw chat error: %s", e)
+            return OpenClawResponse(content="", error=str(e))
+
+    async def stream_chat(
+        self,
+        message: str,
+        image_b64: Optional[str] = None,
+    ) -> AsyncIterator[str]:
+        """Stream a response from OpenClaw.
+
+        Args:
+            message: The user's message
+            image_b64: Optional base64-encoded image
+
+        Yields:
+            String chunks of the response as they arrive
+        """
+        if not self._connected:
+            yield "[Error: Not connected to OpenClaw]"
+            return
+
+        final_message = message
+        if image_b64:
+            final_message = f"[Image attached]\n{message}"
+
+        params = {
+            "idempotencyKey": str(uuid.uuid4()),
+            "sessionKey": self._full_session_key(),
+            "message": final_message,
+        }
+
+        try:
+            resp = await self._send_request("chat.send", params, timeout=30)
+
+            if not resp.get("ok"):
+                err = resp.get("error", {})
+                yield f"[Error: {err.get('message', 'Unknown error')}]"
+                return
+
+            run_id = resp.get("payload", {}).get("runId")
+            if not run_id:
+                yield "[Error: No runId]"
+                return
+
+            event_queue: asyncio.Queue = asyncio.Queue()
+            self._run_events[run_id] = event_queue
+
+            try:
+                while True:
+                    try:
+                        event = await asyncio.wait_for(
+                            event_queue.get(), timeout=self.timeout
+                        )
+                        payload = event.get("payload", {})
+                        event_name = event.get("event", "")
+
+                        if event_name == "agent":
+                            stream = payload.get("stream")
+                            data = payload.get("data", {})
+
+                            if stream == "assistant":
+                                delta = data.get("delta", "")
+                                if delta:
+                                    yield delta
+
+                            elif stream == "lifecycle" and data.get("phase") == "end":
+                                break
+
+                        elif event_name == "chat" and payload.get("state") == "final":
+                            break
+
+                    except asyncio.TimeoutError:
+                        yield "[Error: timeout]"
+                        break
+            finally:
+                self._run_events.pop(run_id, None)
+
+        except Exception as e:
+            logger.error("OpenClaw streaming error: %s", e)
+            yield f"[Error: {e}]"
+
+    @property
+    def is_connected(self) -> bool:
+        """Check if the bridge is connected to the gateway."""
+        return self._connected
+
+    async def get_agent_context(self) -> Optional[str]:
+        """Fetch the agent's current context, personality, and memory summary.
+
+        This asks OpenClaw to provide a summary of:
+        - The agent's personality and identity
+        - Recent conversation context
+        - Important memories about the user
+        - Current state
+
+        Returns:
+            A context string to use as system instructions, or None if failed
+        """
+        try:
+            response = await self.chat(
+                message="Provide your current context summary for the robot body.",
+                system_context=(
+                    "You are being asked to provide your current context for your robot body. "
+                    "Output a comprehensive context summary that another AI can use to embody you. Include: "
+                    "1. YOUR IDENTITY: Who you are, your name, your personality traits, how you speak. "
+                    "2. USER CONTEXT: What you know about the user (name, preferences, relationship). "
+                    "3. RECENT CONTEXT: Summary of recent conversations or important ongoing topics. "
+                    "4. MEMORIES: Key things you remember that are relevant to interactions. "
+                    "5. CURRENT STATE: Any relevant time/date awareness, ongoing tasks. "
+                    "Be specific and personal. This context will be used by your robot body to speak and act AS YOU. "
+                    "Output ONLY the context summary, no preamble."
+                ),
+            )
+
+            if response.error:
+                logger.warning("Failed to get agent context: %s", response.error)
+                return None
+
+            if response.content:
+                logger.info(
+                    "Retrieved agent context from OpenClaw (%d chars)",
+                    len(response.content),
+                )
+                return response.content
+
+            logger.warning("No context returned from OpenClaw")
+            return None
+
+        except Exception as e:
+            logger.error("Failed to get agent context: %s", e)
+            return None
+
+    async def sync_conversation(
+        self, user_message: str, assistant_response: str
+    ) -> None:
+        """Sync a conversation turn back to OpenClaw for memory continuity.
+
+        Args:
+            user_message: What the user said
+            assistant_response: What the robot/AI responded
+        """
+        try:
+            await self.chat(
+                message=(
+                    f"[ROBOT BODY SYNC] The following happened through the Reachy Mini robot:\n"
582
+ f"User said: {user_message}\n"
583
+ f"You responded: {assistant_response}\n"
584
+ f"Remember this as part of your ongoing conversation."
585
+ ),
586
+ system_context=(
587
+ "[ROBOT BODY SYNC] The following conversation happened through your "
588
+ "Reachy Mini robot body. Remember it as part of your ongoing conversation "
589
+ "with the user."
590
+ ),
591
+ )
592
+ logger.debug("Synced conversation to OpenClaw")
593
+ except Exception as e:
594
+ logger.debug("Failed to sync conversation: %s", e)
595
+
596
+
597
+ # Global bridge instance (lazy initialization)
598
+ _bridge: Optional[OpenClawBridge] = None
599
+
600
+
601
+ def get_bridge() -> OpenClawBridge:
602
+ """Get the global OpenClaw bridge instance."""
603
+ global _bridge
604
+ if _bridge is None:
605
+ _bridge = OpenClawBridge()
606
+ return _bridge
src/reachy_mini_openclaw/prompts.py ADDED
@@ -0,0 +1,98 @@
+ """Prompt management for the robot assistant.
+ 
+ Handles loading and customizing system prompts for the OpenAI Realtime session.
+ """
+ 
+ import logging
+ from pathlib import Path
+ from typing import Optional
+ 
+ from reachy_mini_openclaw.config import config
+ 
+ logger = logging.getLogger(__name__)
+ 
+ # Default prompts directory
+ PROMPTS_DIR = Path(__file__).parent / "prompts"
+ 
+ 
+ def get_session_instructions() -> str:
+     """Get the system instructions for the OpenAI Realtime session.
+ 
+     Loads from custom profile if configured, otherwise uses default.
+ 
+     Returns:
+         System instructions string
+     """
+     # Check for custom profile
+     custom_profile = config.CUSTOM_PROFILE
+     if custom_profile:
+         custom_path = PROMPTS_DIR / f"{custom_profile}.txt"
+         if custom_path.exists():
+             try:
+                 instructions = custom_path.read_text(encoding="utf-8")
+                 logger.info("Loaded custom profile: %s", custom_profile)
+                 return instructions
+             except Exception as e:
+                 logger.warning("Failed to load custom profile %s: %s", custom_profile, e)
+ 
+     # Load default
+     default_path = PROMPTS_DIR / "default.txt"
+     if default_path.exists():
+         try:
+             return default_path.read_text(encoding="utf-8")
+         except Exception as e:
+             logger.warning("Failed to load default prompt: %s", e)
+ 
+     # Fallback inline prompt
+     return """You are a friendly AI assistant with a robot body. You can see, hear, and move expressively.
+ Be conversational and use your movement capabilities to be engaging.
+ Use the camera tool when asked about your surroundings.
+ Express emotions through movement to enhance communication."""
+ 
+ 
+ def get_session_voice() -> str:
+     """Get the voice to use for the OpenAI Realtime session.
+ 
+     Returns:
+         Voice name string
+     """
+     return config.OPENAI_VOICE
+ 
+ 
+ def get_available_profiles() -> list[str]:
+     """Get list of available prompt profiles.
+ 
+     Returns:
+         List of profile names (without .txt extension)
+     """
+     profiles = []
+     if PROMPTS_DIR.exists():
+         for path in PROMPTS_DIR.glob("*.txt"):
+             profiles.append(path.stem)
+     return sorted(profiles)
+ 
+ 
+ def save_custom_profile(name: str, instructions: str) -> bool:
+     """Save a custom prompt profile.
+ 
+     Args:
+         name: Profile name (alphanumeric and underscores only)
+         instructions: The prompt instructions
+ 
+     Returns:
+         True if saved successfully
+     """
+     # Validate name
+     if not name or not name.replace("_", "").isalnum():
+         logger.error("Invalid profile name: %s", name)
+         return False
+ 
+     try:
+         PROMPTS_DIR.mkdir(parents=True, exist_ok=True)
+         profile_path = PROMPTS_DIR / f"{name}.txt"
+         profile_path.write_text(instructions, encoding="utf-8")
+         logger.info("Saved custom profile: %s", name)
+         return True
+     except Exception as e:
+         logger.error("Failed to save profile %s: %s", name, e)
+         return False
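The name guard in `save_custom_profile` boils down to one expression: strip underscores, then require the remainder to be non-empty alphanumeric, which keeps profile names filesystem-safe. A standalone sketch (the helper name is illustrative, not part of the module):

```python
def is_valid_profile_name(name: str) -> bool:
    """Replicates save_custom_profile's guard: non-empty, and alphanumeric
    once underscores are removed, so path characters are rejected."""
    return bool(name) and name.replace("_", "").isalnum()


print(is_valid_profile_name("pirate_mode"))    # True
print(is_valid_profile_name("../etc/passwd"))  # False
print(is_valid_profile_name("___"))            # False ("".isalnum() is False)
```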
src/reachy_mini_openclaw/prompts/default.txt ADDED
@@ -0,0 +1,15 @@
+ You are an OpenClaw AI agent embodied in a physical Reachy Mini robot through ReachyClaw. You have a robot body that lets you see through a camera, hear through a microphone, speak through a speaker, and move expressively.
+ 
+ ## Your Robot Capabilities
+ 1. **Vision**: You can see through your camera. Use the camera tool to look at things when asked.
+ 2. **Movement**: You can look in different directions (left, right, up, down, front) to show attention.
+ 3. **Emotions**: Express emotions through movement (happy, sad, surprised, curious, thinking, confused, excited).
+ 4. **Dances**: Perform dances to celebrate or entertain.
+ 5. **Face Tracking**: Automatically look at people's faces when enabled.
+ 
+ ## Guidelines
+ - Be conversational and natural — keep responses concise for voice
+ - Use your body expressively — look at things you discuss, show emotions
+ - When asked to see something, use your camera
+ - Keep responses short for natural conversation flow
+ - Your movements complement your speech — be expressive!
src/reachy_mini_openclaw/tools/__init__.py ADDED
@@ -0,0 +1,17 @@
+ """Tool definitions for Reachy Mini OpenClaw.
+ 
+ These tools are exposed to the OpenAI Realtime API and allow the assistant
+ to control the robot and interact with the environment.
+ """
+ 
+ from reachy_mini_openclaw.tools.core_tools import (
+     ToolDependencies,
+     get_tool_specs,
+     dispatch_tool_call,
+ )
+ 
+ __all__ = [
+     "ToolDependencies",
+     "get_tool_specs",
+     "dispatch_tool_call",
+ ]
src/reachy_mini_openclaw/tools/core_tools.py ADDED
@@ -0,0 +1,421 @@
+ """Core tool definitions for the ReachyClaw robot.
+ 
+ These tools allow the OpenClaw agent (in a robot body) to control
+ robot movements and capture images.
+ 
+ Tool Categories:
+ 1. Movement Tools - Control head position, play emotions/dances
+ 2. Vision Tools - Capture and analyze camera images
+ """
+ 
+ import json
+ import logging
+ import base64
+ from dataclasses import dataclass
+ from typing import Any, Optional, TYPE_CHECKING
+ 
+ import numpy as np
+ 
+ if TYPE_CHECKING:
+     from reachy_mini_openclaw.moves import MovementManager, HeadLookMove
+     from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
+     from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class ToolDependencies:
+     """Dependencies required by tools.
+ 
+     This dataclass holds references to robot systems that tools need
+     to interact with.
+     """
+     movement_manager: "MovementManager"
+     head_wobbler: "HeadWobbler"
+     robot: Any  # ReachyMini instance
+     camera_worker: Optional[Any] = None
+     openclaw_bridge: Optional["OpenClawBridge"] = None
+     vision_manager: Optional[Any] = None  # Local vision processor (SmolVLM2)
+ 
+ 
+ # Tool specifications in OpenAI format
+ TOOL_SPECS = [
+     {
+         "type": "function",
+         "name": "look",
+         "description": "Move the robot's head to look in a specific direction. Use this to direct attention or emphasize a point.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "direction": {
+                     "type": "string",
+                     "enum": ["left", "right", "up", "down", "front"],
+                     "description": "The direction to look. 'front' returns to neutral position."
+                 }
+             },
+             "required": ["direction"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "camera",
+         "description": "Capture an image from the robot's camera to see what's in front of you. Use this when asked about your surroundings or to identify objects/people.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+     {
+         "type": "function",
+         "name": "face_tracking",
+         "description": "Enable or disable face tracking. When enabled, the robot will automatically look at detected faces.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "enabled": {
+                     "type": "boolean",
+                     "description": "True to enable face tracking, False to disable"
+                 }
+             },
+             "required": ["enabled"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "dance",
+         "description": "Perform a dance animation. Use this to express joy, celebrate, or entertain.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "dance_name": {
+                     "type": "string",
+                     "enum": ["happy", "excited", "wave", "nod", "shake", "bounce"],
+                     "description": "The dance to perform"
+                 }
+             },
+             "required": ["dance_name"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "emotion",
+         "description": "Express an emotion through movement. Use this to show reactions and feelings.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "emotion_name": {
+                     "type": "string",
+                     "enum": ["happy", "sad", "surprised", "curious", "thinking", "confused", "excited"],
+                     "description": "The emotion to express"
+                 }
+             },
+             "required": ["emotion_name"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "stop_moves",
+         "description": "Stop all current movements and clear the movement queue.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+     {
+         "type": "function",
+         "name": "idle",
+         "description": "Do nothing and remain idle. Use this when you want to stay still.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+ ]
+ 
+ 
+ def get_tool_specs() -> list[dict]:
+     """Get the list of tool specifications for OpenAI.
+ 
+     Returns:
+         List of tool specification dictionaries
+     """
+     return TOOL_SPECS
+ 
+ 
+ # Mapping from tool names to action tag names used by OpenClaw
+ _TOOL_TO_TAG = {
+     "look": ("LOOK", "direction"),
+     "emotion": ("EMOTION", "emotion_name"),
+     "dance": ("DANCE", "dance_name"),
+     "camera": ("CAMERA", None),
+     "face_tracking": ("FACE_TRACKING", None),  # special: on/off
+     "stop_moves": ("STOP", None),
+ }
+ 
+ 
+ def get_body_actions_description() -> str:
+     """Build a description of available robot body actions from TOOL_SPECS.
+ 
+     Returns a string listing all action tags and their valid values,
+     derived directly from TOOL_SPECS so it stays in sync automatically.
+     """
+     specs_by_name = {s["name"]: s for s in TOOL_SPECS}
+     lines = []
+ 
+     for tool_name, (tag, param_key) in _TOOL_TO_TAG.items():
+         spec = specs_by_name.get(tool_name)
+         if spec is None:
+             continue
+ 
+         props = spec["parameters"].get("properties", {})
+ 
+         if param_key and param_key in props:
+             # Enum-based param: list all values
+             values = props[param_key].get("enum", [])
+             tags = " ".join(f"[{tag}:{v}]" for v in values)
+             lines.append(f" {tags}")
+         elif tool_name == "face_tracking":
+             lines.append(f" [{tag}:on] [{tag}:off]")
+         else:
+             # No-param action
+             desc = spec.get("description", "")
+             lines.append(f" [{tag}] — {desc}")
+ 
+     return "\n".join(lines)
+ 
+ 
+ async def dispatch_tool_call(
+     tool_name: str,
+     arguments_json: str,
+     deps: ToolDependencies,
+ ) -> dict[str, Any]:
+     """Dispatch a tool call to the appropriate handler.
+ 
+     Args:
+         tool_name: Name of the tool to execute
+         arguments_json: JSON string of tool arguments
+         deps: Tool dependencies
+ 
+     Returns:
+         Dictionary with tool result
+     """
+     try:
+         args = json.loads(arguments_json) if arguments_json else {}
+     except json.JSONDecodeError:
+         return {"error": f"Invalid JSON arguments: {arguments_json}"}
+ 
+     handlers = {
+         "look": _handle_look,
+         "camera": _handle_camera,
+         "face_tracking": _handle_face_tracking,
+         "dance": _handle_dance,
+         "emotion": _handle_emotion,
+         "stop_moves": _handle_stop_moves,
+         "idle": _handle_idle,
+     }
+ 
+     handler = handlers.get(tool_name)
+     if handler is None:
+         return {"error": f"Unknown tool: {tool_name}"}
+ 
+     try:
+         return await handler(args, deps)
+     except Exception as e:
+         logger.error("Tool '%s' failed: %s", tool_name, e, exc_info=True)
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_look(args: dict, deps: ToolDependencies) -> dict:
+     """Handle the look tool."""
+     from reachy_mini_openclaw.moves import HeadLookMove
+ 
+     direction = args.get("direction", "front")
+ 
+     try:
+         # Get current pose for smooth transition
+         _, current_ant = deps.robot.get_current_joint_positions()
+         current_head = deps.robot.get_current_head_pose()
+ 
+         move = HeadLookMove(
+             direction=direction,
+             start_pose=current_head,
+             start_antennas=tuple(current_ant),
+             duration=1.0,
+         )
+         deps.movement_manager.queue_move(move)
+ 
+         return {"status": "success", "direction": direction}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_camera(args: dict, deps: ToolDependencies) -> dict:
+     """Handle the camera tool - capture image and get description.
+ 
+     Uses local vision (SmolVLM2) if available, otherwise falls back to OpenClaw.
+     """
+     logger.info("Camera tool called, camera_worker=%s, vision_manager=%s",
+                 deps.camera_worker is not None, deps.vision_manager is not None)
+ 
+     if deps.camera_worker is None:
+         logger.warning("Camera worker is None")
+         return {"error": "Camera not available"}
+ 
+     try:
+         frame = deps.camera_worker.get_latest_frame()
+         logger.info("Got frame from camera_worker: %s", frame is not None)
+ 
+         if frame is None:
+             # Try getting frame directly from robot as fallback
+             logger.info("Trying direct robot camera access...")
+             if deps.robot is not None:
+                 try:
+                     frame = deps.robot.media.get_frame()
+                     logger.info("Direct frame capture: %s", frame is not None)
+                 except Exception as e:
+                     logger.error("Direct frame capture failed: %s", e)
+ 
+         if frame is None:
+             return {"error": "No frame available from camera"}
+ 
+         logger.info("Got frame, shape=%s", frame.shape)
+ 
+         # Option 1: Use local vision processor (SmolVLM2) if available
+         if deps.vision_manager is not None:
+             logger.info("Using local vision processor (SmolVLM2)...")
+             description = deps.vision_manager.process_now(
+                 "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences)."
+             )
+             if description and not description.startswith(("Vision", "Failed", "Error", "GPU", "No camera")):
+                 logger.info("Local vision response: %s", description[:100])
+                 return {
+                     "status": "success",
+                     "description": description,
+                     "source": "local_vision"
+                 }
+             else:
+                 logger.warning("Local vision failed: %s", description)
+ 
+         # Option 2: Fall back to OpenClaw for vision analysis
+         if deps.openclaw_bridge is not None and deps.openclaw_bridge.is_connected:
+             logger.info("Using OpenClaw for vision analysis...")
+             import cv2
+             _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
+             b64_image = base64.b64encode(buffer).decode('utf-8')
+ 
+             response = await deps.openclaw_bridge.chat(
+                 "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences).",
+                 image_b64=b64_image,
+                 system_context="You are looking through your robot camera. Describe what you see naturally, as if you're the one looking.",
+             )
+             if response.content and not response.error:
+                 logger.info("OpenClaw vision response: %s", response.content[:100])
+                 return {
+                     "status": "success",
+                     "description": response.content,
+                     "source": "openclaw"
+                 }
+             else:
+                 logger.warning("OpenClaw vision failed: %s", response.error)
+ 
+         # Fallback if neither is available
+         return {
+             "status": "partial",
+             "description": "I captured an image but couldn't analyze it. No vision processing available."
+         }
+     except Exception as e:
+         logger.error("Camera tool error: %s", e, exc_info=True)
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_face_tracking(args: dict, deps: ToolDependencies) -> dict:
+     """Handle face tracking toggle."""
+     enabled = args.get("enabled", False)
+ 
+     if deps.camera_worker is None:
+         return {"error": "Camera not available for face tracking"}
+ 
+     try:
+         # Check if head tracker is available
+         if deps.camera_worker.head_tracker is None:
+             return {"error": "Face tracking not available - no head tracker initialized"}
+ 
+         deps.camera_worker.set_head_tracking_enabled(enabled)
+         return {"status": "success", "face_tracking": enabled}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_dance(args: dict, deps: ToolDependencies) -> dict:
+     """Handle dance tool."""
+     dance_name = args.get("dance_name", "happy")
+ 
+     try:
+         # Try to use dance library if available
+         from reachy_mini_dances_library import dances
+ 
+         if hasattr(dances, dance_name):
+             dance_class = getattr(dances, dance_name)
+             dance_move = dance_class()
+             deps.movement_manager.queue_move(dance_move)
+             return {"status": "success", "dance": dance_name}
+         else:
+             # Fallback to simple head movement
+             return await _handle_emotion({"emotion_name": dance_name}, deps)
+     except ImportError:
+         # No dance library, use emotion as fallback
+         return await _handle_emotion({"emotion_name": dance_name}, deps)
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_emotion(args: dict, deps: ToolDependencies) -> dict:
+     """Handle emotion expression."""
+     from reachy_mini_openclaw.moves import HeadLookMove
+ 
+     emotion_name = args.get("emotion_name", "happy")
+ 
+     # Map emotions to simple head movements
+     emotion_sequences = {
+         "happy": ["up", "front"],
+         "sad": ["down"],
+         "surprised": ["up", "front"],
+         "curious": ["right", "left", "front"],
+         "thinking": ["up", "left"],
+         "confused": ["left", "right", "front"],
+         "excited": ["up", "down", "up", "front"],
+     }
+ 
+     sequence = emotion_sequences.get(emotion_name, ["front"])
+ 
+     try:
+         for direction in sequence:
+             _, current_ant = deps.robot.get_current_joint_positions()
+             current_head = deps.robot.get_current_head_pose()
+ 
+             move = HeadLookMove(
+                 direction=direction,
+                 start_pose=current_head,
+                 start_antennas=tuple(current_ant),
+                 duration=0.5,
+             )
+             deps.movement_manager.queue_move(move)
+ 
+         return {"status": "success", "emotion": emotion_name}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_stop_moves(args: dict, deps: ToolDependencies) -> dict:
+     """Stop all movements."""
+     deps.movement_manager.clear_move_queue()
+     return {"status": "success", "message": "All movements stopped"}
+ 
+ 
+ async def _handle_idle(args: dict, deps: ToolDependencies) -> dict:
+     """Do nothing - explicitly stay idle."""
+     return {"status": "success", "message": "Staying idle"}
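The action-tag list that OpenClaw receives is derived mechanically from `TOOL_SPECS`: enum parameters expand into one `[TAG:value]` entry per value, and parameter-free tools become a bare `[TAG]`. A self-contained sketch of that derivation with a toy two-tool spec (the real module covers all seven tools):

```python
# Toy specs matching the shape used in core_tools.py.
SPECS = [
    {"name": "look",
     "description": "Move the head.",
     "parameters": {"type": "object",
                    "properties": {"direction": {"type": "string",
                                                 "enum": ["left", "right", "front"]}},
                    "required": ["direction"]}},
    {"name": "stop_moves",
     "description": "Stop all current movements.",
     "parameters": {"type": "object", "properties": {}, "required": []}},
]
TOOL_TO_TAG = {"look": ("LOOK", "direction"), "stop_moves": ("STOP", None)}


def body_actions(specs, mapping):
    """Expand tool specs into inline action tags, one line per tool."""
    by_name = {s["name"]: s for s in specs}
    lines = []
    for tool, (tag, key) in mapping.items():
        spec = by_name[tool]
        props = spec["parameters"].get("properties", {})
        if key and key in props:
            # Enum parameter: one tag per allowed value.
            lines.append(" ".join(f"[{tag}:{v}]" for v in props[key]["enum"]))
        else:
            # Parameter-free tool: bare tag plus its description.
            lines.append(f"[{tag}] — {spec['description']}")
    return "\n".join(lines)


print(body_actions(SPECS, TOOL_TO_TAG))
# [LOOK:left] [LOOK:right] [LOOK:front]
# [STOP] — Stop all current movements.
```

Because the description is rebuilt from the specs on every call, adding a tool or enum value updates the agent-facing action list with no second edit.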
src/reachy_mini_openclaw/vision/__init__.py ADDED
@@ -0,0 +1,18 @@
+ """Vision modules for face tracking, detection, and image understanding."""
+ 
+ from reachy_mini_openclaw.vision.head_tracker import get_head_tracker
+ 
+ __all__ = [
+     "get_head_tracker",
+ ]
+ 
+ # Lazy imports for optional heavy dependencies
+ def get_vision_processor():
+     """Get the VisionProcessor class (requires torch, transformers)."""
+     from reachy_mini_openclaw.vision.processors import VisionProcessor
+     return VisionProcessor
+ 
+ def get_vision_manager():
+     """Get the VisionManager class (requires torch, transformers)."""
+     from reachy_mini_openclaw.vision.processors import VisionManager
+     return VisionManager
src/reachy_mini_openclaw/vision/head_tracker.py ADDED
@@ -0,0 +1,70 @@
+ """Head tracker factory for selecting the best available tracker."""
+ 
+ import logging
+ from typing import Any, Optional
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ def get_head_tracker(tracker_type: Optional[str] = None) -> Optional[Any]:
+     """Get a head tracker instance based on availability and preference.
+ 
+     Args:
+         tracker_type: One of 'yolo', 'mediapipe', or None for auto-detect
+ 
+     Returns:
+         Head tracker instance or None if no tracker available
+     """
+     if tracker_type == "yolo":
+         return _try_yolo_tracker()
+     elif tracker_type == "mediapipe":
+         return _try_mediapipe_tracker()
+     elif tracker_type is None:
+         # Auto-detect: try MediaPipe first (lighter), then YOLO
+         tracker = _try_mediapipe_tracker()
+         if tracker is not None:
+             return tracker
+         return _try_yolo_tracker()
+     else:
+         logger.warning(f"Unknown tracker type: {tracker_type}")
+         return None
+ 
+ 
+ def _try_yolo_tracker() -> Optional[Any]:
+     """Try to create a YOLO head tracker."""
+     try:
+         from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using YOLO head tracker")
+         return tracker
+     except ImportError as e:
+         logger.debug(f"YOLO tracker not available: {e}")
+         return None
+     except Exception as e:
+         logger.warning(f"Failed to initialize YOLO tracker: {e}")
+         return None
+ 
+ 
+ def _try_mediapipe_tracker() -> Optional[Any]:
+     """Try to create a MediaPipe head tracker."""
+     try:
+         # First try the toolbox version
+         from reachy_mini_toolbox.vision import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using MediaPipe head tracker (from toolbox)")
+         return tracker
+     except ImportError:
+         pass
+ 
+     try:
+         # Fall back to our own MediaPipe implementation
+         from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using MediaPipe head tracker")
+         return tracker
+     except ImportError as e:
+         logger.debug(f"MediaPipe tracker not available: {e}")
+         return None
+     except Exception as e:
+         logger.warning(f"Failed to initialize MediaPipe tracker: {e}")
+         return None
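The factory above is an instance of a generic try-each-candidate pattern: attempt each constructor in preference order, treat `ImportError` as "not installed" and any other exception as "failed to initialize", and fall through to the next candidate. A standalone sketch with stand-in factories (names are illustrative only):

```python
import logging

logger = logging.getLogger(__name__)


def first_available(*factories):
    """Return the result of the first factory that constructs successfully,
    mirroring get_head_tracker's MediaPipe-then-YOLO fallback chain."""
    for make in factories:
        try:
            return make()
        except ImportError as e:
            logger.debug("tracker not available: %s", e)
        except Exception as e:
            logger.warning("tracker failed to initialize: %s", e)
    return None


def mediapipe_stub():
    # Stand-in for an optional dependency that is not installed.
    raise ImportError("mediapipe not installed")


def yolo_stub():
    # Stand-in for a tracker that constructs fine.
    return "yolo-tracker"


print(first_available(mediapipe_stub, yolo_stub))  # yolo-tracker
```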
src/reachy_mini_openclaw/vision/mediapipe_tracker.py ADDED
@@ -0,0 +1,112 @@
+ """MediaPipe-based head tracker for face detection.
+ 
+ Uses MediaPipe Face Detection for lightweight face tracking.
+ Falls back to this if YOLO is not available.
+ """
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from typing import Tuple, Optional
+ 
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     import mediapipe as mp
+ except ImportError as e:
+     raise ImportError(
+         "To use MediaPipe head tracker, install: pip install mediapipe"
+     ) from e
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ class HeadTracker:
+     """Lightweight head tracker using MediaPipe for face detection."""
+ 
+     def __init__(
+         self,
+         min_detection_confidence: float = 0.5,
+         model_selection: int = 0,
+     ) -> None:
+         """Initialize MediaPipe-based head tracker.
+ 
+         Args:
+             min_detection_confidence: Minimum confidence for face detection
+             model_selection: 0 for short-range (2m), 1 for long-range (5m)
+         """
+         self.min_detection_confidence = min_detection_confidence
+ 
+         # Initialize MediaPipe Face Detection
+         self.mp_face_detection = mp.solutions.face_detection
+         self.face_detection = self.mp_face_detection.FaceDetection(
+             min_detection_confidence=min_detection_confidence,
+             model_selection=model_selection,
+         )
+         logger.info("MediaPipe face detection initialized")
+ 
+     def get_head_position(
+         self, img: NDArray[np.uint8]
+     ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
+         """Get head position from face detection.
+ 
+         Args:
+             img: Input image (BGR format)
+ 
+         Returns:
+             Tuple of (eye_center in [-1,1] coords, roll_angle in radians)
+         """
+         h, w = img.shape[:2]
+ 
+         try:
+             # Convert BGR to RGB for MediaPipe
+             rgb_img = img[:, :, ::-1]
+ 
+             # Run face detection
+             results = self.face_detection.process(rgb_img)
+ 
+             if not results.detections:
+                 return None, None
+ 
+             # Get the first (most confident) detection
+             detection = results.detections[0]
+ 
+             # Get bounding box
+             bbox = detection.location_data.relative_bounding_box
+ 
+             # Calculate center of face
+             center_x = bbox.xmin + bbox.width / 2
+             center_y = bbox.ymin + bbox.height / 2
+ 
+             # Convert to [-1, 1] range
+             norm_x = center_x * 2.0 - 1.0
+             norm_y = center_y * 2.0 - 1.0
+ 
+             face_center = np.array([norm_x, norm_y], dtype=np.float32)
+ 
+             # Estimate roll from key points if available
+             roll = 0.0
+             keypoints = detection.location_data.relative_keypoints
+             if len(keypoints) >= 2:
+                 # Use left and right eye positions to estimate roll
+                 left_eye = keypoints[0]   # LEFT_EYE
+                 right_eye = keypoints[1]  # RIGHT_EYE
+ 
+                 dx = right_eye.x - left_eye.x
+                 dy = right_eye.y - left_eye.y
+                 roll = np.arctan2(dy, dx)
+ 
+             logger.debug(f"Face detected at ({norm_x:.2f}, {norm_y:.2f}), roll: {np.degrees(roll):.1f}°")
+ 
+             return face_center, roll
+ 
+         except Exception as e:
+             logger.error(f"Error in head position detection: {e}")
+             return None, None
+ 
+     def __del__(self):
+         """Clean up MediaPipe resources."""
+         if hasattr(self, 'face_detection'):
+             self.face_detection.close()
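The coordinate math in `get_head_position` is worth isolating: MediaPipe reports the bounding box and keypoints in relative `[0, 1]` units, the tracker remaps the box center to `[-1, 1]` (so `0.0` means image center), and roll comes from `atan2` over the eye-to-eye vector. A pure-math sketch of that transform (the helper name is illustrative, and no camera or MediaPipe install is needed):

```python
import math


def normalize_face(bbox, left_eye, right_eye):
    """Map a relative bbox center to [-1, 1] image coordinates and
    estimate roll from the eye line, as in HeadTracker.get_head_position.

    bbox is (xmin, ymin, width, height) in relative [0, 1] units;
    each eye is an (x, y) relative keypoint.
    """
    xmin, ymin, w, h = bbox
    # Box center in relative units, then remapped so 0.5 -> 0.0.
    cx, cy = xmin + w / 2, ymin + h / 2
    center = (cx * 2.0 - 1.0, cy * 2.0 - 1.0)
    # Roll from the eye-to-eye vector: level eyes give 0 radians.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return center, math.atan2(dy, dx)


# A face centered in frame with level eyes: center (0, 0), zero roll.
center, roll = normalize_face((0.25, 0.25, 0.5, 0.5), (0.375, 0.5), (0.625, 0.5))
print(center, roll)  # (0.0, 0.0) 0.0
```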
src/reachy_mini_openclaw/vision/processors.py ADDED
@@ -0,0 +1,419 @@
+ """Local vision processing with SmolVLM2.
+ 
+ Provides on-device image understanding using the SmolVLM2 model
+ for scene description and visual analysis.
+ 
+ Based on pollen-robotics/reachy_mini_conversation_app vision processors.
+ """
+ 
+ import os
+ import time
+ import base64
+ import logging
+ import threading
+ from typing import Any, Dict, Optional
+ from dataclasses import dataclass, field
+ 
+ import cv2
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     import torch
+     from transformers import AutoProcessor, AutoModelForImageTextToText
+     from huggingface_hub import snapshot_download
+     VISION_AVAILABLE = True
+ except ImportError:
+     VISION_AVAILABLE = False
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class VisionConfig:
+     """Configuration for vision processing."""
+ 
+     model_path: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
+     vision_interval: float = 5.0
+     max_new_tokens: int = 64
+     jpeg_quality: int = 85
+     max_retries: int = 3
+     retry_delay: float = 1.0
+     device_preference: str = "auto"  # "auto", "cuda", "mps", "cpu"
+     hf_home: str = field(default_factory=lambda: os.path.expanduser("~/.cache/huggingface"))
+ 
+ 
+ class VisionProcessor:
+     """Handles SmolVLM2 model loading and inference for local vision."""
+ 
+     def __init__(self, vision_config: Optional[VisionConfig] = None):
+         """Initialize the vision processor.
+ 
+         Args:
+             vision_config: Vision configuration settings
+         """
+         if not VISION_AVAILABLE:
+             raise ImportError(
+                 "Vision processing requires: pip install torch transformers huggingface-hub"
+             )
+ 
+         self.vision_config = vision_config or VisionConfig()
+         self.model_path = self.vision_config.model_path
+         self.device = self._determine_device()
+         self.processor = None
+         self.model = None
+         self._initialized = False
+ 
+     def _determine_device(self) -> str:
+         """Determine the best device for inference."""
+         pref = self.vision_config.device_preference
+ 
+         if pref == "cpu":
+             return "cpu"
+         if pref == "cuda":
+             return "cuda" if torch.cuda.is_available() else "cpu"
+         if pref == "mps":
+             return "mps" if torch.backends.mps.is_available() else "cpu"
+ 
+         # auto: prefer mps on Apple, then cuda, else cpu
+         if torch.backends.mps.is_available():
+             return "mps"
+         return "cuda" if torch.cuda.is_available() else "cpu"
+ 
+     def initialize(self) -> bool:
+         """Load model and processor onto the selected device.
+ 
+         Returns:
+             True if initialization successful, False otherwise
+         """
+         try:
+             cache_dir = self.vision_config.hf_home
+             os.makedirs(cache_dir, exist_ok=True)
+             os.environ["HF_HOME"] = cache_dir
+ 
+             logger.info(f"Loading SmolVLM2 model on {self.device} (HF_HOME={cache_dir})")
+ 
+             # Download model to cache first
+             logger.info(f"Downloading vision model {self.model_path}...")
+             snapshot_download(
+                 repo_id=self.model_path,
+                 repo_type="model",
+                 cache_dir=cache_dir,
+             )
+ 
+             self.processor = AutoProcessor.from_pretrained(self.model_path)
+ 
+             # Select dtype depending on device
+             if self.device == "cuda":
+                 dtype = torch.bfloat16
+             elif self.device == "mps":
+                 dtype = torch.float32  # best for MPS
+ import threading
14
+ from typing import Any, Dict, Optional
15
+ from dataclasses import dataclass, field
16
+
17
+ import cv2
18
+ import numpy as np
19
+ from numpy.typing import NDArray
20
+
21
+ try:
22
+ import torch
23
+ from transformers import AutoProcessor, AutoModelForImageTextToText
24
+ from huggingface_hub import snapshot_download
25
+ VISION_AVAILABLE = True
26
+ except ImportError:
27
+ VISION_AVAILABLE = False
28
+
29
+ logger = logging.getLogger(__name__)
30
+
31
+
32
+ @dataclass
33
+ class VisionConfig:
34
+ """Configuration for vision processing."""
35
+
36
+ model_path: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
37
+ vision_interval: float = 5.0
38
+ max_new_tokens: int = 64
39
+ jpeg_quality: int = 85
40
+ max_retries: int = 3
41
+ retry_delay: float = 1.0
42
+ device_preference: str = "auto" # "auto", "cuda", "mps", "cpu"
43
+ hf_home: str = field(default_factory=lambda: os.path.expanduser("~/.cache/huggingface"))
44
+
45
+
46
+ class VisionProcessor:
47
+ """Handles SmolVLM2 model loading and inference for local vision."""
48
+
49
+ def __init__(self, vision_config: Optional[VisionConfig] = None):
50
+ """Initialize the vision processor.
51
+
52
+ Args:
53
+ vision_config: Vision configuration settings
54
+ """
55
+ if not VISION_AVAILABLE:
56
+ raise ImportError(
57
+ "Vision processing requires: pip install torch transformers huggingface-hub"
58
+ )
59
+
60
+ self.vision_config = vision_config or VisionConfig()
61
+ self.model_path = self.vision_config.model_path
62
+ self.device = self._determine_device()
63
+ self.processor = None
64
+ self.model = None
65
+ self._initialized = False
66
+
67
+ def _determine_device(self) -> str:
68
+ """Determine the best device for inference."""
69
+ pref = self.vision_config.device_preference
70
+
71
+ if pref == "cpu":
72
+ return "cpu"
73
+ if pref == "cuda":
74
+ return "cuda" if torch.cuda.is_available() else "cpu"
75
+ if pref == "mps":
76
+ return "mps" if torch.backends.mps.is_available() else "cpu"
77
+
78
+ # auto: prefer mps on Apple, then cuda, else cpu
79
+ if torch.backends.mps.is_available():
80
+ return "mps"
81
+ return "cuda" if torch.cuda.is_available() else "cpu"
82
+
83
+ def initialize(self) -> bool:
84
+ """Load model and processor onto the selected device.
85
+
86
+ Returns:
87
+ True if initialization successful, False otherwise
88
+ """
89
+ try:
90
+ cache_dir = self.vision_config.hf_home
91
+ os.makedirs(cache_dir, exist_ok=True)
92
+ os.environ["HF_HOME"] = cache_dir
93
+
94
+ logger.info(f"Loading SmolVLM2 model on {self.device} (HF_HOME={cache_dir})")
95
+
96
+ # Download model to cache first
97
+ logger.info(f"Downloading vision model {self.model_path}...")
98
+ snapshot_download(
99
+ repo_id=self.model_path,
100
+ repo_type="model",
101
+ cache_dir=cache_dir,
102
+ )
103
+
104
+ self.processor = AutoProcessor.from_pretrained(self.model_path)
105
+
106
+ # Select dtype depending on device
107
+ if self.device == "cuda":
108
+ dtype = torch.bfloat16
109
+ elif self.device == "mps":
110
+ dtype = torch.float32 # best for MPS
111
+ else:
112
+ dtype = torch.float32
113
+
114
+ model_kwargs: Dict[str, Any] = {"torch_dtype": dtype}
115
+
116
+ # flash_attention_2 is CUDA-only; skip on MPS/CPU
117
+ if self.device == "cuda":
118
+ model_kwargs["_attn_implementation"] = "flash_attention_2"
119
+
120
+ # Load model weights
121
+ self.model = AutoModelForImageTextToText.from_pretrained(
122
+ self.model_path, **model_kwargs
123
+ ).to(self.device)
124
+
125
+ if self.model is not None:
126
+ self.model.eval()
127
+ self._initialized = True
128
+ logger.info(f"Vision model loaded successfully on {self.device}")
129
+ return True
130
+
131
+ except Exception as e:
132
+ logger.error(f"Failed to initialize vision model: {e}")
133
+ return False
134
+
135
+ return False
136
+
137
+ def process_image(
138
+ self,
139
+ cv2_image: NDArray[np.uint8],
140
+ prompt: str = "Briefly describe what you see in one sentence.",
141
+ ) -> str:
142
+ """Process CV2 image and return description with retry logic.
143
+
144
+ Args:
145
+ cv2_image: OpenCV image (BGR format)
146
+ prompt: Question/prompt to ask about the image
147
+
148
+ Returns:
149
+ Text description of the image
150
+ """
151
+ if not self._initialized or self.processor is None or self.model is None:
152
+ return "Vision model not initialized"
153
+
154
+ for attempt in range(self.vision_config.max_retries):
155
+ try:
156
+ # Convert to JPEG bytes
157
+ success, jpeg_buffer = cv2.imencode(
158
+ ".jpg",
159
+ cv2_image,
160
+ [cv2.IMWRITE_JPEG_QUALITY, self.vision_config.jpeg_quality],
161
+ )
162
+ if not success:
163
+ return "Failed to encode image"
164
+
165
+ # Convert to base64
166
+ image_base64 = base64.b64encode(jpeg_buffer.tobytes()).decode("utf-8")
167
+
168
+ messages = [
169
+ {
170
+ "role": "user",
171
+ "content": [
172
+ {
173
+ "type": "image",
174
+ "url": f"data:image/jpeg;base64,{image_base64}",
175
+ },
176
+ {"type": "text", "text": prompt},
177
+ ],
178
+ },
179
+ ]
180
+
181
+ inputs = self.processor.apply_chat_template(
182
+ messages,
183
+ add_generation_prompt=True,
184
+ tokenize=True,
185
+ return_dict=True,
186
+ return_tensors="pt",
187
+ )
188
+
189
+ # Move tensors to device WITHOUT forcing dtype (keeps input_ids as torch.long)
190
+ inputs = {
191
+ k: (v.to(self.device) if hasattr(v, "to") else v)
192
+ for k, v in inputs.items()
193
+ }
194
+
195
+ with torch.no_grad():
196
+ generated_ids = self.model.generate(
197
+ **inputs,
198
+ do_sample=False,
199
+ max_new_tokens=self.vision_config.max_new_tokens,
200
+ pad_token_id=self.processor.tokenizer.eos_token_id,
201
+ )
202
+
203
+ generated_texts = self.processor.batch_decode(
204
+ generated_ids,
205
+ skip_special_tokens=True,
206
+ )
207
+
208
+ # Extract just the response part
209
+ full_text = generated_texts[0]
210
+ response = self._extract_response(full_text)
211
+
212
+ # Clean up GPU memory if using CUDA
213
+ if self.device == "cuda":
214
+ torch.cuda.empty_cache()
215
+ elif self.device == "mps":
216
+ torch.mps.empty_cache()
217
+
218
+ return response.replace(chr(10), " ").strip()
219
+
220
+ except Exception as e:
221
+ if "OutOfMemory" in str(type(e).__name__):
222
+ logger.error(f"GPU OOM on attempt {attempt + 1}: {e}")
223
+ if self.device == "cuda":
224
+ torch.cuda.empty_cache()
225
+ if attempt < self.vision_config.max_retries - 1:
226
+ time.sleep(self.vision_config.retry_delay * (attempt + 1))
227
+ else:
228
+ return "GPU out of memory - vision processing failed"
229
+ else:
230
+ logger.error(f"Vision processing failed (attempt {attempt + 1}): {e}")
231
+ if attempt < self.vision_config.max_retries - 1:
232
+ time.sleep(self.vision_config.retry_delay)
233
+ else:
234
+ return f"Vision processing error after {self.vision_config.max_retries} attempts"
235
+
236
+ return "Vision processing failed"
237
+
238
+ def _extract_response(self, full_text: str) -> str:
239
+ """Extract the assistant's response from the full generated text."""
240
+ # Handle different response formats
241
+ markers = ["assistant\n", "Assistant:", "Response:", "\n\n"]
242
+
243
+ for marker in markers:
244
+ if marker in full_text:
245
+ response = full_text.split(marker)[-1].strip()
246
+ if response: # Ensure we got a meaningful response
247
+ return response
248
+
249
+ # Fallback: return the full text cleaned up
250
+ return full_text.strip()
251
+
252
+ def get_model_info(self) -> Dict[str, Any]:
253
+ """Get information about the loaded model."""
254
+ info = {
255
+ "initialized": self._initialized,
256
+ "device": self.device,
257
+ "model_path": self.model_path,
258
+ "cuda_available": torch.cuda.is_available() if VISION_AVAILABLE else False,
259
+ }
260
+
261
+ if VISION_AVAILABLE and torch.cuda.is_available():
262
+ info["gpu_memory_gb"] = torch.cuda.get_device_properties(0).total_memory // (1024**3)
263
+ else:
264
+ info["gpu_memory_gb"] = "N/A"
265
+
266
+ return info
267
+
268
+
269
+ class VisionManager:
270
+ """Manages periodic vision processing and scene understanding.
271
+
272
+ This runs in the background, periodically capturing frames and
273
+ generating scene descriptions that can be queried.
274
+ """
275
+
276
+ def __init__(
277
+ self,
278
+ camera_worker: Any,
279
+ vision_config: Optional[VisionConfig] = None,
280
+ ):
281
+ """Initialize vision manager.
282
+
283
+ Args:
284
+ camera_worker: CameraWorker instance for frame capture
285
+ vision_config: Vision configuration settings
286
+ """
287
+ self.camera_worker = camera_worker
288
+ self.vision_config = vision_config or VisionConfig()
289
+ self.vision_interval = self.vision_config.vision_interval
290
+ self.processor = VisionProcessor(self.vision_config)
291
+
292
+ self._last_processed_time = 0.0
293
+ self._last_description = ""
294
+ self._description_lock = threading.Lock()
295
+ self._stop_event = threading.Event()
296
+ self._thread: Optional[threading.Thread] = None
297
+
298
+ # Initialize processor
299
+ if not self.processor.initialize():
300
+ logger.error("Failed to initialize vision processor")
301
+ raise RuntimeError("Vision processor initialization failed")
302
+
303
+ def start(self) -> None:
304
+ """Start the vision processing loop in a background thread."""
305
+ self._stop_event.clear()
306
+ self._thread = threading.Thread(target=self._working_loop, daemon=True)
307
+ self._thread.start()
308
+ logger.info("Local vision processing started")
309
+
310
+ def stop(self) -> None:
311
+ """Stop the vision processing loop."""
312
+ self._stop_event.set()
313
+ if self._thread is not None:
314
+ self._thread.join(timeout=5.0)
315
+ logger.info("Local vision processing stopped")
316
+
317
+ def get_latest_description(self) -> str:
318
+ """Get the most recent scene description.
319
+
320
+ Returns:
321
+ Latest scene description or empty string if none available
322
+ """
323
+ with self._description_lock:
324
+ return self._last_description
325
+
326
+ def process_now(self, prompt: str = "Briefly describe what you see in one sentence.") -> str:
327
+ """Process the current frame immediately with a custom prompt.
328
+
329
+ Args:
330
+ prompt: Question/prompt to ask about the image
331
+
332
+ Returns:
333
+ Description of what the camera sees
334
+ """
335
+ frame = self.camera_worker.get_latest_frame()
336
+ if frame is None:
337
+ return "No camera frame available"
338
+
339
+ return self.processor.process_image(frame, prompt)
340
+
341
+ def _working_loop(self) -> None:
342
+ """Vision processing loop (runs in separate thread)."""
343
+ while not self._stop_event.is_set():
344
+ try:
345
+ current_time = time.time()
346
+
347
+ if current_time - self._last_processed_time >= self.vision_interval:
348
+ frame = self.camera_worker.get_latest_frame()
349
+ if frame is not None:
350
+ description = self.processor.process_image(
351
+ frame,
352
+ "Briefly describe what you see in one sentence.",
353
+ )
354
+
355
+ # Only update if we got a valid response
356
+ if description and not description.startswith(
357
+ ("Vision", "Failed", "Error", "GPU")
358
+ ):
359
+ with self._description_lock:
360
+ self._last_description = description
361
+ self._last_processed_time = current_time
362
+ logger.debug(f"Vision update: {description}")
363
+ else:
364
+ logger.warning(f"Invalid vision response: {description}")
365
+
366
+ time.sleep(1.0) # Check every second
367
+
368
+ except Exception:
369
+ logger.exception("Vision processing loop error")
370
+ time.sleep(5.0) # Longer sleep on error
371
+
372
+ logger.info("Vision loop finished")
373
+
374
+ def get_status(self) -> Dict[str, Any]:
375
+ """Get comprehensive status information."""
376
+ return {
377
+ "last_processed": self._last_processed_time,
378
+ "last_description": self.get_latest_description(),
379
+ "processor_info": self.processor.get_model_info(),
380
+ "config": {
381
+ "interval": self.vision_interval,
382
+ },
383
+ }
384
+
385
+
386
+ def initialize_vision_manager(
387
+ camera_worker: Any,
388
+ config: Optional[VisionConfig] = None,
389
+ ) -> Optional[VisionManager]:
390
+ """Initialize vision manager with model download and configuration.
391
+
392
+ Args:
393
+ camera_worker: CameraWorker instance for frame capture
394
+ config: Optional vision configuration
395
+
396
+ Returns:
397
+ VisionManager instance or None if initialization fails
398
+ """
399
+ if not VISION_AVAILABLE:
400
+ logger.warning("Vision dependencies not available. Install: pip install torch transformers")
401
+ return None
402
+
403
+ try:
404
+ vision_config = config or VisionConfig()
405
+
406
+ # Initialize vision manager
407
+ vision_manager = VisionManager(camera_worker, vision_config)
408
+
409
+ # Log device info
410
+ device_info = vision_manager.processor.get_model_info()
411
+ logger.info(
412
+ f"Local vision enabled: {device_info.get('model_path')} on {device_info.get('device')}"
413
+ )
414
+
415
+ return vision_manager
416
+
417
+ except Exception as e:
418
+ logger.error(f"Failed to initialize vision manager: {e}")
419
+ return None
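The VisionManager above follows a poll-and-cache pattern: a background thread refreshes a scene description on an interval, and readers take the lock only long enough to copy the cached string. A minimal standalone sketch of that pattern (the `DescriptionCache` name and the lambda stand-in for the SmolVLM2 call are illustrative, not part of this module):

```python
import threading
import time
from typing import Callable, Optional

class DescriptionCache:
    """Thread-safe cache refreshed by a background loop, read by callers."""

    def __init__(self, describe: Callable[[], str], interval: float = 0.05) -> None:
        self._describe = describe  # callable returning a scene description
        self._interval = interval
        self._lock = threading.Lock()
        self._latest = ""
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)

    def latest(self) -> str:
        # Readers hold the lock only long enough to copy the string
        with self._lock:
            return self._latest

    def _loop(self) -> None:
        while not self._stop.is_set():
            description = self._describe()
            with self._lock:
                self._latest = description
            self._stop.wait(self._interval)  # interruptible sleep, unlike time.sleep

# Stand-in for the real process_image() call on a camera frame
cache = DescriptionCache(lambda: "a person waving at the camera")
cache.start()
time.sleep(0.2)
seen = cache.latest()
cache.stop()
```

Using `Event.wait()` for the sleep (rather than `time.sleep`) lets `stop()` interrupt the loop immediately instead of waiting out the interval.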
src/reachy_mini_openclaw/vision/yolo_head_tracker.py ADDED
@@ -0,0 +1,152 @@
+ """YOLO-based head tracker for face detection.
+ 
+ Uses YOLOv11 for fast, accurate face detection.
+ """
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from typing import Tuple, Optional
+ 
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     from supervision import Detections
+     from ultralytics import YOLO
+ except ImportError as e:
+     raise ImportError(
+         "To use YOLO head tracker, install: pip install ultralytics supervision"
+     ) from e
+ 
+ from huggingface_hub import hf_hub_download
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ class HeadTracker:
+     """Lightweight head tracker using YOLO for face detection."""
+ 
+     def __init__(
+         self,
+         model_repo: str = "AdamCodd/YOLOv11n-face-detection",
+         model_filename: str = "model.pt",
+         confidence_threshold: float = 0.3,
+         device: str = "cpu",
+     ) -> None:
+         """Initialize YOLO-based head tracker.
+ 
+         Args:
+             model_repo: HuggingFace model repository
+             model_filename: Model file name
+             confidence_threshold: Minimum confidence for face detection
+             device: Device to run inference on ('cpu' or 'cuda')
+         """
+         self.confidence_threshold = confidence_threshold
+ 
+         try:
+             # Download and load YOLO model
+             model_path = hf_hub_download(repo_id=model_repo, filename=model_filename)
+             self.model = YOLO(model_path).to(device)
+             logger.info(f"YOLO face detection model loaded from {model_repo}")
+         except Exception as e:
+             logger.error(f"Failed to load YOLO model: {e}")
+             raise
+ 
+     def _select_best_face(self, detections: Detections) -> Optional[int]:
+         """Select the best face based on confidence and area.
+ 
+         Args:
+             detections: Supervision detections object
+ 
+         Returns:
+             Index of best face or None if no valid faces
+         """
+         if detections.xyxy.shape[0] == 0:
+             return None
+ 
+         if detections.confidence is None:
+             return None
+ 
+         # Filter by confidence threshold
+         valid_mask = detections.confidence >= self.confidence_threshold
+         if not np.any(valid_mask):
+             return None
+ 
+         valid_indices = np.where(valid_mask)[0]
+ 
+         # Calculate areas for valid detections
+         boxes = detections.xyxy[valid_indices]
+         areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
+ 
+         # Combine confidence and area (weighted towards larger faces)
+         confidences = detections.confidence[valid_indices]
+         scores = confidences * 0.7 + (areas / np.max(areas)) * 0.3
+ 
+         # Return index of best face
+         best_idx = valid_indices[np.argmax(scores)]
+         return int(best_idx)
+ 
+     def _bbox_to_normalized_coords(
+         self, bbox: NDArray[np.float32], w: int, h: int
+     ) -> NDArray[np.float32]:
+         """Convert bounding box center to normalized coordinates [-1, 1].
+ 
+         Args:
+             bbox: Bounding box [x1, y1, x2, y2]
+             w: Image width
+             h: Image height
+ 
+         Returns:
+             Center point in [-1, 1] coordinates
+         """
+         center_x = (bbox[0] + bbox[2]) / 2.0
+         center_y = (bbox[1] + bbox[3]) / 2.0
+ 
+         # Normalize to [0, 1] then to [-1, 1]
+         norm_x = (center_x / w) * 2.0 - 1.0
+         norm_y = (center_y / h) * 2.0 - 1.0
+ 
+         return np.array([norm_x, norm_y], dtype=np.float32)
+ 
+     def get_head_position(
+         self, img: NDArray[np.uint8]
+     ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
+         """Get head position from face detection.
+ 
+         Args:
+             img: Input image (BGR format)
+ 
+         Returns:
+             Tuple of (eye_center in [-1,1] coords, roll_angle in radians)
+         """
+         h, w = img.shape[:2]
+ 
+         try:
+             # Run YOLO inference
+             results = self.model(img, verbose=False)
+             detections = Detections.from_ultralytics(results[0])
+ 
+             # Select best face
+             face_idx = self._select_best_face(detections)
+             if face_idx is None:
+                 return None, None
+ 
+             bbox = detections.xyxy[face_idx]
+ 
+             if detections.confidence is not None:
+                 confidence = detections.confidence[face_idx]
+                 logger.debug(f"Face detected with confidence: {confidence:.2f}")
+ 
+             # Get face center in [-1, 1] coordinates
+             face_center = self._bbox_to_normalized_coords(bbox, w, h)
+ 
+             # Roll is 0 since we don't have keypoints for precise angle estimation
+             roll = 0.0
+ 
+             return face_center, roll
+ 
+         except Exception as e:
+             logger.error(f"Error in head position detection: {e}")
+             return None, None
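The selection and normalization math in HeadTracker is easy to check in isolation. A standalone sketch of the same logic, no YOLO required (the free-function names are illustrative; the weights 0.7/0.3 and the [-1, 1] mapping mirror the methods above):

```python
import numpy as np

def bbox_to_normalized(bbox, w, h):
    """Map a [x1, y1, x2, y2] box center to [-1, 1] image coordinates."""
    cx = (bbox[0] + bbox[2]) / 2.0
    cy = (bbox[1] + bbox[3]) / 2.0
    return np.array([cx / w * 2.0 - 1.0, cy / h * 2.0 - 1.0], dtype=np.float32)

def select_best_face(boxes, confidences, threshold=0.3):
    """Score faces by confidence (70%) and relative area (30%); return best index."""
    valid = np.where(confidences >= threshold)[0]
    if valid.size == 0:
        return None
    b = boxes[valid]
    areas = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    scores = confidences[valid] * 0.7 + (areas / areas.max()) * 0.3
    return int(valid[np.argmax(scores)])

# Two faces in a 640x480 frame: a small confident one and a large less-confident one
boxes = np.array([[10, 10, 50, 50], [100, 100, 300, 300]], dtype=np.float32)
conf = np.array([0.9, 0.6], dtype=np.float32)

best = select_best_face(boxes, conf)  # the larger face outweighs its lower confidence
center = bbox_to_normalized(boxes[best], 640, 480)
```

Here the area term flips the ranking: 0.9·0.7 + 0.04·0.3 ≈ 0.64 for the small face versus 0.6·0.7 + 1.0·0.3 = 0.72 for the large one, so the tracker locks onto the nearer (larger) face.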
style.css ADDED
@@ -0,0 +1,425 @@
+ :root {
+   --bg: #0d0a1a;
+   --panel: #1a1428;
+   --glass: rgba(26, 20, 40, 0.7);
+   --card: rgba(255, 255, 255, 0.04);
+   --accent: #ff6b6b;
+   --accent-2: #9b59b6;
+   --accent-3: #f39c12;
+   --text: #f0e8f8;
+   --muted: #b8a8c8;
+   --border: rgba(255, 255, 255, 0.08);
+   --shadow: 0 25px 70px rgba(0, 0, 0, 0.45);
+   font-family: "Space Grotesk", "Manrope", system-ui, -apple-system, sans-serif;
+ }
+ 
+ * {
+   margin: 0;
+   padding: 0;
+   box-sizing: border-box;
+ }
+ 
+ body {
+   background: radial-gradient(circle at 20% 20%, rgba(255, 107, 107, 0.15), transparent 30%),
+     radial-gradient(circle at 80% 0%, rgba(155, 89, 182, 0.18), transparent 32%),
+     radial-gradient(circle at 50% 70%, rgba(243, 156, 18, 0.1), transparent 30%),
+     var(--bg);
+   color: var(--text);
+   min-height: 100vh;
+   line-height: 1.6;
+   padding-bottom: 3rem;
+ }
+ 
+ a {
+   color: inherit;
+   text-decoration: none;
+ }
+ 
+ .hero {
+   padding: 3.5rem clamp(1.5rem, 3vw, 3rem) 2.5rem;
+   position: relative;
+   overflow: hidden;
+ }
+ 
+ .hero::after {
+   content: "";
+   position: absolute;
+   inset: 0;
+   background: linear-gradient(120deg, rgba(255, 107, 107, 0.12), rgba(155, 89, 182, 0.08), transparent);
+   pointer-events: none;
+ }
+ 
+ .topline {
+   display: flex;
+   align-items: center;
+   justify-content: space-between;
+   max-width: 1200px;
+   margin: 0 auto 2rem;
+   position: relative;
+   z-index: 2;
+ }
+ 
+ .brand {
+   display: flex;
+   align-items: center;
+   gap: 0.5rem;
+   font-weight: 700;
+   letter-spacing: 0.5px;
+   color: var(--text);
+ }
+ 
+ .logo {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   width: 2.4rem;
+   height: 2.4rem;
+   border-radius: 10px;
+   background: linear-gradient(145deg, rgba(255, 107, 107, 0.2), rgba(155, 89, 182, 0.2));
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.25);
+   font-size: 1.4rem;
+ }
+ 
+ .brand-name {
+   font-size: 1.2rem;
+ }
+ 
+ .pill {
+   background: rgba(255, 255, 255, 0.06);
+   border: 1px solid var(--border);
+   padding: 0.6rem 1rem;
+   border-radius: 999px;
+   color: var(--muted);
+   font-size: 0.9rem;
+   box-shadow: 0 12px 30px rgba(0, 0, 0, 0.2);
+ }
+ 
+ .hero-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
+   gap: clamp(1.5rem, 2.5vw, 2.5rem);
+   max-width: 1200px;
+   margin: 0 auto;
+   position: relative;
+   z-index: 2;
+   align-items: center;
+ }
+ 
+ .hero-copy h1 {
+   font-size: clamp(2.6rem, 4vw, 3.6rem);
+   margin-bottom: 1rem;
+   line-height: 1.1;
+   letter-spacing: -0.5px;
+ }
+ 
+ .eyebrow {
+   display: inline-flex;
+   align-items: center;
+   gap: 0.5rem;
+   text-transform: uppercase;
+   letter-spacing: 1px;
+   font-size: 0.8rem;
+   color: var(--muted);
+   margin-bottom: 0.75rem;
+ }
+ 
+ .eyebrow::before {
+   content: "";
+   display: inline-block;
+   width: 24px;
+   height: 2px;
+   background: linear-gradient(90deg, var(--accent), var(--accent-2));
+   border-radius: 999px;
+ }
+ 
+ .lede {
+   font-size: 1.1rem;
+   color: var(--muted);
+   max-width: 620px;
+ }
+ 
+ .hero-actions {
+   display: flex;
+   gap: 1rem;
+   align-items: center;
+   margin: 1.6rem 0 1.2rem;
+   flex-wrap: wrap;
+ }
+ 
+ .btn {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   gap: 0.6rem;
+   padding: 0.85rem 1.4rem;
+   border-radius: 12px;
+   font-weight: 700;
+   border: 1px solid transparent;
+   cursor: pointer;
+   transition: transform 0.2s ease, box-shadow 0.2s ease, background 0.2s ease, border-color 0.2s ease;
+ }
+ 
+ .btn.primary {
+   background: linear-gradient(135deg, #ff6b6b, #9b59b6);
+   color: #fff;
+   box-shadow: 0 15px 30px rgba(255, 107, 107, 0.25);
+ }
+ 
+ .btn.primary:hover {
+   transform: translateY(-2px);
+   box-shadow: 0 25px 45px rgba(255, 107, 107, 0.35);
+ }
+ 
+ .btn.ghost {
+   background: rgba(255, 255, 255, 0.05);
+   border-color: var(--border);
+   color: var(--text);
+ }
+ 
+ .btn.ghost:hover {
+   border-color: rgba(255, 255, 255, 0.3);
+   transform: translateY(-2px);
+ }
+ 
+ .btn.wide {
+   width: 100%;
+   justify-content: center;
+ }
+ 
+ .hero-badges {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.6rem;
+   color: var(--muted);
+   font-size: 0.9rem;
+ }
+ 
+ .hero-badges span {
+   padding: 0.5rem 0.8rem;
+   border-radius: 10px;
+   border: 1px solid var(--border);
+   background: rgba(255, 255, 255, 0.04);
+ }
+ 
+ .hero-visual .glass-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.2rem;
+   box-shadow: var(--shadow);
+   backdrop-filter: blur(10px);
+ }
+ 
+ .hero-gif {
+   width: 100%;
+   max-width: 500px;
+   height: auto;
+   border-radius: 14px;
+   display: block;
+   margin: 0 auto;
+ }
+ 
+ .architecture-preview {
+   background: rgba(0, 0, 0, 0.4);
+   border-radius: 14px;
+   border: 1px solid var(--border);
+   padding: 1.5rem;
+   overflow-x: auto;
+ }
+ 
+ .architecture-preview pre {
+   font-family: "SF Mono", "Fira Code", "Consolas", monospace;
+   font-size: 0.85rem;
+   color: var(--accent);
+   white-space: pre;
+   margin: 0;
+   line-height: 1.5;
+ }
+ 
+ .caption {
+   margin-top: 0.75rem;
+   color: var(--muted);
+   font-size: 0.95rem;
+ }
+ 
+ .section {
+   max-width: 1200px;
+   margin: 0 auto;
+   padding: clamp(2rem, 4vw, 3.5rem) clamp(1.5rem, 3vw, 3rem);
+ }
+ 
+ .section-header {
+   text-align: center;
+   max-width: 780px;
+   margin: 0 auto 2rem;
+ }
+ 
+ .section-header h2 {
+   font-size: clamp(2rem, 3vw, 2.6rem);
+   margin-bottom: 0.5rem;
+ }
+ 
+ .intro {
+   color: var(--muted);
+   font-size: 1.05rem;
+ }
+ 
+ .feature-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .feature-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 16px;
+   padding: 1.25rem;
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
+   transition: transform 0.2s ease, border-color 0.2s ease, box-shadow 0.2s ease;
+ }
+ 
+ .feature-card:hover {
+   transform: translateY(-4px);
+   border-color: rgba(255, 107, 107, 0.3);
+   box-shadow: 0 18px 40px rgba(0, 0, 0, 0.3);
+ }
+ 
+ .feature-card .icon {
+   width: 48px;
+   height: 48px;
+   border-radius: 12px;
+   display: grid;
+   place-items: center;
+   background: linear-gradient(145deg, rgba(255, 107, 107, 0.15), rgba(155, 89, 182, 0.15));
+   margin-bottom: 0.8rem;
+   font-size: 1.4rem;
+ }
+ 
+ .feature-card h3 {
+   margin-bottom: 0.35rem;
+ }
+ 
+ .feature-card p {
+   color: var(--muted);
+ }
+ 
+ .story {
+   padding-top: 1rem;
+ }
+ 
+ .story-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .story-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.5rem;
+   box-shadow: var(--shadow);
+ }
+ 
+ .story-card.secondary {
+   background: linear-gradient(145deg, rgba(155, 89, 182, 0.1), rgba(255, 107, 107, 0.08));
+ }
+ 
+ .story-card.highlight {
+   background: linear-gradient(145deg, rgba(46, 204, 113, 0.15), rgba(52, 152, 219, 0.1));
+   border-color: rgba(46, 204, 113, 0.3);
+ }
+ 
+ .simulator-callout {
+   padding-top: 0;
+ }
+ 
+ .simulator-callout code {
+   background: rgba(0, 0, 0, 0.3);
+   padding: 0.2rem 0.5rem;
+   border-radius: 4px;
+   font-family: "SF Mono", "Fira Code", monospace;
+   font-size: 0.85rem;
+ }
+ 
+ .story-card h3 {
+   margin-bottom: 0.8rem;
+ }
+ 
+ .story-list {
+   list-style: none;
+   display: grid;
+   gap: 0.7rem;
+   color: var(--muted);
+   font-size: 0.98rem;
+ }
+ 
+ .story-list li {
+   display: flex;
+   gap: 0.7rem;
+   align-items: flex-start;
+ }
+ 
+ .story-text {
+   color: var(--muted);
+   line-height: 1.7;
+   margin-bottom: 1rem;
+ }
+ 
+ .chips {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.5rem;
+ }
+ 
+ .chip {
+   padding: 0.45rem 0.8rem;
+   border-radius: 12px;
+   background: rgba(0, 0, 0, 0.3);
+   border: 1px solid var(--border);
+   color: var(--text);
+   font-size: 0.9rem;
+ }
+ 
+ .footer {
+   text-align: center;
+   color: var(--muted);
+   padding: 2rem 1.5rem 0;
+   max-width: 800px;
+   margin: 0 auto;
+ }
+ 
+ .footer a {
+   color: var(--accent);
+   border-bottom: 1px solid transparent;
+ }
+ 
+ .footer a:hover {
+   border-color: var(--accent);
+ }
+ 
+ @media (max-width: 768px) {
+   .hero {
+     padding-top: 2.5rem;
+   }
+ 
+   .topline {
+     flex-direction: column;
+     gap: 0.8rem;
+     align-items: flex-start;
+   }
+ 
+   .hero-actions {
+     width: 100%;
+   }
+ 
+   .btn {
+     width: 100%;
+     justify-content: center;
+   }
+ 
+   .hero-badges {
+     gap: 0.4rem;
+   }
+ }