Spaces: Running
shaunx Claude Opus 4.6 committed on
Commit · 1ab5bef · 0 Parent(s)
Initial release of ReachyClaw
OpenClaw AI agent embodied in a Reachy Mini robot. OpenClaw is the actual
brain — every message is routed through OpenClaw, and it controls the robot
body (movement, emotions, camera) via inline action tags. The OpenAI Realtime
API is used purely for voice I/O.
Based on ClawBody (Apache 2.0) with fundamental architecture changes:
- GPT-4o is a voice relay, not the brain
- OpenClaw controls all physical actions via [ACTION:param] tags
- No startup context fetch (instant startup)
- Dynamic action list built from TOOL_SPECS (single source of truth)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .env.example +44 -0
- .github/workflows/sync-to-huggingface.yml +23 -0
- .gitignore +62 -0
- CONTRIBUTING.md +78 -0
- LICENSE +191 -0
- README.md +185 -0
- index.html +201 -0
- openclaw-skill/SKILL.md +109 -0
- pyproject.toml +120 -0
- src/reachy_mini_openclaw/__init__.py +9 -0
- src/reachy_mini_openclaw/audio/__init__.py +5 -0
- src/reachy_mini_openclaw/audio/head_wobbler.py +223 -0
- src/reachy_mini_openclaw/camera_worker.py +382 -0
- src/reachy_mini_openclaw/config.py +84 -0
- src/reachy_mini_openclaw/gradio_app.py +202 -0
- src/reachy_mini_openclaw/main.py +591 -0
- src/reachy_mini_openclaw/moves.py +648 -0
- src/reachy_mini_openclaw/openai_realtime.py +562 -0
- src/reachy_mini_openclaw/openclaw_bridge.py +606 -0
- src/reachy_mini_openclaw/prompts.py +98 -0
- src/reachy_mini_openclaw/prompts/default.txt +15 -0
- src/reachy_mini_openclaw/tools/__init__.py +17 -0
- src/reachy_mini_openclaw/tools/core_tools.py +421 -0
- src/reachy_mini_openclaw/vision/__init__.py +18 -0
- src/reachy_mini_openclaw/vision/head_tracker.py +70 -0
- src/reachy_mini_openclaw/vision/mediapipe_tracker.py +112 -0
- src/reachy_mini_openclaw/vision/processors.py +419 -0
- src/reachy_mini_openclaw/vision/yolo_head_tracker.py +152 -0
- style.css +425 -0
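The commit message above mentions a dynamic action list built from `TOOL_SPECS` as a single source of truth. A minimal sketch of that idea, assuming a simple dict-based spec; the actual `TOOL_SPECS` structure in `tools/core_tools.py` may differ:

```python
# Hypothetical single source of truth: each entry declares a tag name and
# its allowed parameters. The prompt fragment shown to the agent is rendered
# from this table, so documentation and runtime behavior cannot drift apart.
TOOL_SPECS: dict[str, dict] = {
    "LOOK": {"params": ["left", "right", "up", "down", "front"]},
    "EMOTION": {"params": ["happy", "sad", "surprised", "curious"]},
    "CAMERA": {"params": []},
}

def action_list_prompt(specs: dict[str, dict]) -> str:
    """Render the action-tag reference injected into the agent prompt."""
    lines = []
    for name, spec in specs.items():
        if spec["params"]:
            # Parameterized tags: [NAME:param] for each allowed value.
            lines.append(" ".join(f"[{name}:{p}]" for p in spec["params"]))
        else:
            # Bare tags like [CAMERA] take no parameter.
            lines.append(f"[{name}]")
    return "\n".join(lines)

print(action_list_prompt(TOOL_SPECS))
```

Adding a new action then only requires a new `TOOL_SPECS` entry; the prompt and the dispatcher both pick it up automatically.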
.env.example
ADDED
@@ -0,0 +1,44 @@
```bash
# ReachyClaw Configuration
# Give your OpenClaw AI agent a physical robot body!

# ==============================================================================
# REQUIRED: OpenAI API Key
# ==============================================================================
# Get your key at: https://platform.openai.com/api-keys
# Requires Realtime API access
OPENAI_API_KEY=sk-your-openai-key

# ==============================================================================
# REQUIRED: OpenClaw Gateway
# ==============================================================================
# The URL where your OpenClaw gateway is running
# If running on the same machine as the robot, use the host machine's IP
OPENCLAW_GATEWAY_URL=http://192.168.1.100:18789

# Your OpenClaw gateway authentication token
# Find this in ~/.openclaw/openclaw.json under gateway.token
OPENCLAW_TOKEN=your-gateway-token

# Agent ID to use (default: main)
OPENCLAW_AGENT_ID=main

# Session key for conversation context - IMPORTANT!
# Use "main" (default) to share context with WhatsApp and other DM channels
# This allows the robot to be aware of all your conversations
OPENCLAW_SESSION_KEY=main

# ==============================================================================
# OPTIONAL: Voice Settings
# ==============================================================================
# OpenAI Realtime voice (alloy, echo, fable, onyx, nova, shimmer, cedar)
OPENAI_VOICE=cedar

# OpenAI model for Realtime API
OPENAI_MODEL=gpt-4o-realtime-preview-2024-12-17

# ==============================================================================
# OPTIONAL: Features
# ==============================================================================
# Enable/disable features (true/false)
ENABLE_CAMERA=true
ENABLE_OPENCLAW_TOOLS=true
```
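A variable file in this `KEY=VALUE` format can be read with a few lines of stdlib Python. This is a minimal sketch of the format only; the project's real loading logic in `config.py` (or a library such as python-dotenv) may behave differently:

```python
import os

def load_env(path: str) -> dict[str, str]:
    """Parse a .env-style file: skip blank lines and # comments,
    split on the first '=' only so values may themselves contain '='."""
    values: dict[str, str] = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def apply_env(path: str) -> None:
    """Overlay file values onto the process environment without
    clobbering variables that are already set."""
    for key, value in load_env(path).items():
        os.environ.setdefault(key, value)
```

Using `setdefault` means an exported shell variable always wins over the file, which is the usual precedence for `.env` loaders.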
.github/workflows/sync-to-huggingface.yml
ADDED
@@ -0,0 +1,23 @@
```yaml
name: Sync to Hugging Face Space

on:
  push:
    branches: [main]
  workflow_dispatch: # Allow manual trigger

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: true

      - name: Push to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add huggingface https://huggingface:$HF_TOKEN@huggingface.co/spaces/shaunx/reachyclaw
          git push huggingface main --force
```
.gitignore
ADDED
@@ -0,0 +1,62 @@
```
# ReachyClaw .gitignore

# Environment and secrets
.env
*.env.local

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/
env/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/

# Type checking
.mypy_cache/
.dmypy.json
dmypy.json

# Logs
*.log

# OS
.DS_Store
Thumbs.db

# Package manager
uv.lock
```
CONTRIBUTING.md
ADDED
@@ -0,0 +1,78 @@
````markdown
# Contributing to ReachyClaw

Thank you for your interest in contributing! This project welcomes contributions from the community.

## How to Contribute

### Reporting Bugs

If you find a bug, please open an issue with:
- A clear title and description
- Steps to reproduce the issue
- Expected vs actual behavior
- Your environment (OS, Python version, robot model)

### Suggesting Features

Feature requests are welcome! Please open an issue with:
- A clear description of the feature
- Use cases and motivation
- Any technical considerations

### Pull Requests

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests if applicable
5. Run linting: `ruff check . && ruff format .`
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request

## Development Setup

```bash
# Clone your fork
git clone https://github.com/YOUR_USERNAME/reachyclaw.git
cd reachyclaw

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
ruff check --fix .
ruff format .
```

## Code Style

- Follow PEP 8
- Use type hints
- Write docstrings for public functions and classes
- Keep functions focused and small

## Where to Submit Contributions

### This Project
Submit PRs directly to this repository for:
- Bug fixes
- New features
- Documentation improvements
- New personality profiles

### Reachy Mini Ecosystem
- **SDK improvements**: [pollen-robotics/reachy_mini](https://github.com/pollen-robotics/reachy_mini)
- **New dances/emotions**: [reachy_mini_dances_library](https://github.com/pollen-robotics/reachy_mini_dances_library)
- **Apps for the app store**: Submit to [Hugging Face Spaces](https://huggingface.co/spaces)

### OpenClaw Ecosystem
- **New skills**: Submit to [MoltDirectory](https://github.com/neonone123/moltdirectory)
- **Core OpenClaw**: [openclaw/openclaw](https://github.com/openclaw/openclaw)

## License

By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.
````
LICENSE
ADDED
@@ -0,0 +1,191 @@
```
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to the Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no theory of
      liability, whether in contract, strict liability, or tort
      (including negligence or otherwise) arising in any way out of
      the use or inability to use the Work (even if such Holder or
      other party has been advised of the possibility of such damages),
      shall any Contributor be liable to You for damages, including any
      direct, indirect, special, incidental, or consequential damages of
      any character arising as a result of this License or out of the use
      or inability to use the Work (including but not limited to damages
      for loss of goodwill, work stoppage, computer failure or malfunction,
      or any and all other commercial damages or losses), even if such
      Contributor has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   Copyright 2024 Tom

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
```
README.md
ADDED
@@ -0,0 +1,185 @@
````markdown
---
title: ReachyClaw
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
short_description: OpenClaw AI agent with a Reachy Mini robot body
tags:
- reachy_mini
- reachy_mini_python_app
- openclaw
- robotics
- embodied-ai
- ai-assistant
- openai-realtime
- voice-assistant
- conversational-ai
- physical-ai
- robot-body
- speech-to-speech
- multimodal
- vision
- expressive-robot
- face-tracking
- human-robot-interaction
---

# ReachyClaw

**Your OpenClaw AI agent, embodied in a Reachy Mini robot.**

ReachyClaw makes OpenClaw the actual brain of a Reachy Mini robot. Unlike typical setups where GPT-4o handles conversations and only calls OpenClaw occasionally, ReachyClaw routes **every** user message through OpenClaw. The robot speaks, moves, and sees — all controlled by your OpenClaw agent.

OpenAI Realtime API is used purely for voice I/O (speech-to-text and text-to-speech). Your OpenClaw agent decides what to say **and** how the robot moves.

## Architecture

```
User speaks -> OpenAI Realtime API (STT only)
            -> GPT-4o calls ask_openclaw with the user's message
            -> OpenClaw (the actual brain) responds with text + action tags
            -> ReachyClaw parses action tags -> robot moves (emotions, look, dance)
            -> Clean text -> GPT-4o (TTS only) -> Robot speaks
```

OpenClaw can include action tags like `[EMOTION:happy]`, `[LOOK:left]`, `[DANCE:excited]` in its responses. These are parsed and executed on the robot, then stripped so only the spoken words go to TTS.

## Features

- **OpenClaw is the brain** — every message goes through your OpenClaw agent, not GPT-4o
- **Full body control** — OpenClaw controls head movement, emotions, dances, and camera
- **Real-time voice** — OpenAI Realtime API for low-latency speech I/O
- **Face tracking** — robot tracks your face and maintains eye contact
- **Camera vision** — robot can see through its camera and describe what it sees
- **Conversation memory** — OpenClaw maintains full context across sessions and channels
- **Works with simulator** — no physical robot required

## Available Robot Actions

OpenClaw can use these action tags in responses:

| Action | Tags |
|--------|------|
| **Look** | `[LOOK:left]` `[LOOK:right]` `[LOOK:up]` `[LOOK:down]` `[LOOK:front]` |
| **Emotion** | `[EMOTION:happy]` `[EMOTION:sad]` `[EMOTION:surprised]` `[EMOTION:curious]` `[EMOTION:thinking]` `[EMOTION:confused]` `[EMOTION:excited]` |
| **Dance** | `[DANCE:happy]` `[DANCE:excited]` `[DANCE:wave]` `[DANCE:nod]` `[DANCE:shake]` `[DANCE:bounce]` |
| **Camera** | `[CAMERA]` |
| **Face Tracking** | `[FACE_TRACKING:on]` `[FACE_TRACKING:off]` |
| **Stop** | `[STOP]` |

## Prerequisites

### Option A: With Physical Robot
- [Reachy Mini](https://www.pollen-robotics.com/reachy-mini/) robot (Wireless or Lite)

### Option B: With Simulator
- Any computer with Python 3.11+
- Install: `pip install "reachy-mini[mujoco]"`

### Software (Both Options)
- Python 3.11+
- [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
- [OpenClaw](https://github.com/openclaw/openclaw) gateway running
- OpenAI API key with Realtime API access

## Installation

```bash
# Clone ReachyClaw
git clone https://github.com/shaunx/reachyclaw
cd reachyclaw

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install
pip install -e ".[mediapipe_vision]"

# Configure
cp .env.example .env
# Edit .env with your keys
```

## Configuration

Edit `.env`:

```bash
# Required
OPENAI_API_KEY=sk-...your-key...

# OpenClaw Gateway (required)
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_TOKEN=your-gateway-token
OPENCLAW_AGENT_ID=main

# Optional
OPENAI_VOICE=cedar
ENABLE_FACE_TRACKING=true
HEAD_TRACKER_TYPE=mediapipe
```

## Usage

### With Simulator

```bash
# Terminal 1: Start simulator
reachy-mini-daemon --sim

# Terminal 2: Run ReachyClaw
reachyclaw --gradio
```

### With Physical Robot

```bash
reachyclaw

# With debug logging
reachyclaw --debug

# With specific robot
reachyclaw --robot-name my-reachy
```

### CLI Options

| Option | Description |
|--------|-------------|
| `--debug` | Enable debug logging |
| `--gradio` | Launch web UI instead of console mode |
| `--robot-name NAME` | Specify robot name for connection |
| `--gateway-url URL` | OpenClaw gateway URL |
| `--no-camera` | Disable camera functionality |
| `--no-openclaw` | Disable OpenClaw integration |
| `--head-tracker TYPE` | Face tracker: `mediapipe` or `yolo` |
| `--no-face-tracking` | Disable face tracking |

## How It Differs from ClawBody

ClawBody (the stock app) uses GPT-4o as the brain and only calls OpenClaw occasionally for tools like calendar or weather. ReachyClaw inverts this:

| | ClawBody | ReachyClaw |
|---|---|---|
| **Brain** | GPT-4o (with OpenClaw context snapshot) | OpenClaw (every message) |
| **Body control** | GPT-4o decides movements | OpenClaw decides movements |
| **Startup** | 20-30s context fetch from OpenClaw | Instant (no context fetch needed) |
| **Memory** | Stale snapshot from startup | Live OpenClaw memory |
| **GPT-4o role** | Full agent | Voice relay only |

## License

Apache 2.0 — see [LICENSE](LICENSE).

## Acknowledgments

Built on top of:

- [Pollen Robotics](https://www.pollen-robotics.com/) — Reachy Mini robot, SDK, and simulator
- [OpenClaw](https://github.com/openclaw/openclaw) — AI agent framework
- [OpenAI](https://openai.com/) — Realtime API for voice I/O
- [ClawBody](https://github.com/tomrikert/clawbody) — Original Reachy Mini + OpenClaw app (Apache 2.0)
````
index.html
ADDED
|
@@ -0,0 +1,201 @@
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>ReachyClaw - Reachy Mini App</title>
|
| 7 |
+
<link rel="preconnect" href="https://fonts.googleapis.com">
|
| 8 |
+
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
| 9 |
+
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&display=swap" rel="stylesheet">
|
| 10 |
+
<link rel="stylesheet" href="style.css">
|
| 11 |
+
</head>
|
| 12 |
+
<body>
|
| 13 |
+
|
| 14 |
+
<section class="hero">
|
| 15 |
+
<div class="topline">
|
| 16 |
+
<div class="brand">
|
| 17 |
+
<span class="logo">🤖</span>
|
| 18 |
+
<span class="brand-name">ReachyClaw</span>
|
| 19 |
+
</div>
|
| 20 |
+
<div class="pill">Voice conversation · OpenClaw brain · Full body control</div>
|
| 21 |
+
</div>
|
| 22 |
+
|
| 23 |
+
<div class="hero-grid">
|
| 24 |
+
<div class="hero-copy">
|
| 25 |
+
<div class="eyebrow">Reachy Mini App</div>
|
| 26 |
+
<h1>Your OpenClaw agent, embodied.</h1>
|
| 27 |
+
<p class="lede">
|
| 28 |
+
Give your OpenClaw AI agent a Reachy Mini robot body.
|
| 29 |
+
OpenClaw is the brain — it controls what the robot says,
|
| 30 |
+
how it moves, and what it sees. OpenAI Realtime API handles voice I/O.
|
| 31 |
+
</p>
|
| 32 |
+
<div class="hero-actions">
|
| 33 |
+
<a href="#simulator" class="btn primary">🖥️ Try with Simulator</a>
|
| 34 |
+
<a href="#features" class="btn ghost">See features</a>
|
| 35 |
+
</div>
|
| 36 |
+
<div class="hero-badges">
|
| 37 |
+
<span>🧠 OpenClaw brain</span>
|
| 38 |
+
<span>🎙️ OpenAI Realtime voice</span>
|
| 39 |
+
<span>💃 Full body control</span>
|
| 40 |
+
<span>🖥️ No robot required!</span>
|
| 41 |
+
</div>
|
| 42 |
+
</div>
|
| 43 |
+
<div class="hero-visual">
|
| 44 |
+
<div class="glass-card">
|
| 45 |
+
<img src="https://huggingface.co/spaces/pollen-robotics/reachy_mini_conversation_app/resolve/main/docs/assets/reachy_mini_dance.gif"
|
| 46 |
+
alt="Reachy Mini Robot Dancing"
|
| 47 |
+
class="hero-gif">
|
| 48 |
+
<p class="caption">Works with physical robot OR MuJoCo simulator!</p>
|
| 49 |
+
</div>
|
| 50 |
+
</div>
|
| 51 |
+
</div>
|
| 52 |
+
</section>
|
| 53 |
+
|
| 54 |
+
<section class="section simulator-callout" id="simulator">
|
| 55 |
+
<div class="story-card highlight">
|
| 56 |
+
<h2>🖥️ No Robot? No Problem!</h2>
|
| 57 |
+
<p class="story-text" style="font-size: 1.1rem;">
|
| 58 |
+
<strong>You don't need a physical Reachy Mini robot to use ReachyClaw!</strong><br><br>
|
| 59 |
+
ReachyClaw works with the Reachy Mini Simulator, a MuJoCo-based physics simulation
|
| 60 |
+
that runs on your computer. Watch your agent move and express emotions on screen
|
| 61 |
+
while you talk.
|
| 62 |
+
</p>
|
| 63 |
+
<div class="architecture-preview" style="margin: 1.5rem 0;">
|
| 64 |
+
<pre>
|
| 65 |
+
# Install simulator support
|
| 66 |
+
pip install "reachy-mini[mujoco]"
|
| 67 |
+
|
| 68 |
+
# Start the simulator (opens 3D window)
|
| 69 |
+
reachy-mini-daemon --sim
|
| 70 |
+
|
| 71 |
+
# In another terminal, run ReachyClaw
|
| 72 |
+
reachyclaw --gradio
|
| 73 |
+
</pre>
|
| 74 |
+
</div>
|
| 75 |
+
<p class="caption">🍎 Mac Users: Use <code>mjpython -m reachy_mini.daemon.app.main --sim</code> instead</p>
|
| 76 |
+
<a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started" class="btn primary" style="margin-top: 1rem;" target="_blank">
|
| 77 |
+
📚 Simulator Setup Guide
|
| 78 |
+
</a>
|
| 79 |
+
</div>
|
| 80 |
+
</section>
|
| 81 |
+
|
| 82 |
+
<section class="section" id="features">
|
| 83 |
+
<div class="section-header">
|
| 84 |
+
<h2>What's inside</h2>
|
| 85 |
+
<p class="intro">
|
| 86 |
+
ReachyClaw makes OpenClaw the actual brain — every message, every movement, every decision.
|
| 87 |
+
</p>
|
| 88 |
+
</div>
|
| 89 |
+
<div class="feature-grid">
|
| 90 |
+
<div class="feature-card">
|
| 91 |
+
<div class="icon">🧠</div>
|
| 92 |
+
<h3>OpenClaw is the brain</h3>
|
| 93 |
+
<p>Every user message goes through your OpenClaw agent. No GPT-4o guessing — real responses with full tool access.</p>
|
| 94 |
+
</div>
|
| 95 |
+
<div class="feature-card">
|
| 96 |
+
<div class="icon">🎤</div>
|
| 97 |
+
<h3>Real-time voice</h3>
|
| 98 |
+
<p>OpenAI Realtime API for low-latency speech-to-text and text-to-speech. Voice I/O only — no GPT-4o brain.</p>
|
| 99 |
+
</div>
|
| 100 |
+
<div class="feature-card">
|
| 101 |
+
<div class="icon">🤖</div>
|
| 102 |
+
<h3>Full body control</h3>
|
| 103 |
+
<p>OpenClaw controls the robot body via action tags — head movement, emotions, dances, camera, face tracking.</p>
|
| 104 |
+
</div>
|
| 105 |
+
<div class="feature-card">
|
| 106 |
+
<div class="icon">👀</div>
|
| 107 |
+
<h3>Vision</h3>
|
| 108 |
+
<p>See through the robot's camera. Your agent can look around and describe what it sees.</p>
|
| 109 |
+
</div>
|
| 110 |
+
<div class="feature-card">
|
| 111 |
+
<div class="icon">🖥️</div>
|
| 112 |
+
<h3>Simulator support</h3>
|
| 113 |
+
<p>No robot? Run with MuJoCo simulator and watch your agent move in a 3D window.</p>
|
| 114 |
+
</div>
|
| 115 |
+
<div class="feature-card">
|
| 116 |
+
<div class="icon">⚡</div>
|
| 117 |
+
<h3>Instant startup</h3>
|
| 118 |
+
<p>No 30-second context fetch. GPT-4o is just a relay — the session starts immediately.</p>
|
| 119 |
+
</div>
|
| 120 |
+
</div>
|
| 121 |
+
</section>
|
| 122 |
+
|
| 123 |
+
<section class="section story" id="how-it-works">
|
| 124 |
+
<div class="story-grid">
|
| 125 |
+
<div class="story-card">
|
| 126 |
+
<h3>How it works</h3>
|
| 127 |
+
<p class="story-text">OpenClaw controls everything</p>
|
| 128 |
+
<ol class="story-list">
|
| 129 |
+
<li><span>🎤</span> Robot captures your voice</li>
|
| 130 |
+
<li><span>📝</span> OpenAI Realtime transcribes your speech</li>
|
| 131 |
+
<li><span>🧠</span> Your message goes to OpenClaw (the real brain)</li>
|
| 132 |
+
<li><span>🤖</span> OpenClaw responds with text + action tags like [EMOTION:happy]</li>
|
| 133 |
+
<li><span>💃</span> ReachyClaw executes the actions on the robot</li>
|
| 134 |
+
<li><span>🔊</span> Clean text goes to TTS — robot speaks while moving</li>
|
| 135 |
+
</ol>
|
| 136 |
+
</div>
|
| 137 |
+
<div class="story-card secondary">
|
| 138 |
+
<h3>Prerequisites</h3>
|
| 139 |
+
<p class="story-text">Choose your setup:</p>
|
| 140 |
+
<div class="chips">
|
| 141 |
+
<span class="chip">🧠 OpenClaw Gateway</span>
|
| 142 |
+
<span class="chip">🔑 OpenAI API Key</span>
|
| 143 |
+
<span class="chip">🐍 Python 3.11+</span>
|
| 144 |
+
</div>
|
| 145 |
+
<p class="story-text" style="margin-top: 1rem;">
|
| 146 |
+
<strong>Option A:</strong> 🤖 Physical Reachy Mini robot<br>
|
| 147 |
+
<strong>Option B:</strong> 🖥️ MuJoCo Simulator (free, no hardware!)
|
| 148 |
+
</p>
|
| 149 |
+
<a href="https://github.com/shaunx/reachyclaw#readme" class="btn ghost wide" style="margin-top: 1rem;">
|
| 150 |
+
View installation guide
|
| 151 |
+
</a>
|
| 152 |
+
</div>
|
| 153 |
+
</div>
|
| 154 |
+
</section>
|
| 155 |
+
|
| 156 |
+
<section class="section">
|
| 157 |
+
<div class="section-header">
|
| 158 |
+
<h2>Quick start</h2>
|
| 159 |
+
<p class="intro">Get ReachyClaw running with the simulator</p>
|
| 160 |
+
</div>
|
| 161 |
+
<div class="story-card">
|
| 162 |
+
<div class="architecture-preview">
|
| 163 |
+
<pre>
|
| 164 |
+
# Clone ReachyClaw
|
| 165 |
+
git clone https://github.com/shaunx/reachyclaw
|
| 166 |
+
cd reachyclaw
|
| 167 |
+
|
| 168 |
+
# Create virtual environment
|
| 169 |
+
python -m venv .venv
|
| 170 |
+
source .venv/bin/activate
|
| 171 |
+
|
| 172 |
+
# Install ReachyClaw + simulator
|
| 173 |
+
pip install -e .
|
| 174 |
+
pip install "reachy-mini[mujoco]"
|
| 175 |
+
|
| 176 |
+
# Configure (edit with your OpenClaw URL and OpenAI key)
|
| 177 |
+
cp .env.example .env
|
| 178 |
+
nano .env
|
| 179 |
+
|
| 180 |
+
# Terminal 1: Start simulator
|
| 181 |
+
reachy-mini-daemon --sim
|
| 182 |
+
|
| 183 |
+
# Terminal 2: Run ReachyClaw
|
| 184 |
+
reachyclaw --gradio
|
| 185 |
+
</pre>
|
| 186 |
+
</div>
|
| 187 |
+
</div>
|
| 188 |
+
</section>
|
| 189 |
+
|
| 190 |
+
<footer class="footer">
|
| 191 |
+
<p>
|
| 192 |
+
ReachyClaw — your OpenClaw agent, embodied in Reachy Mini.<br>
|
| 193 |
+
<strong>Works with physical robot OR simulator!</strong><br><br>
|
| 194 |
+
Learn more about <a href="https://github.com/openclaw/openclaw">OpenClaw</a>,
|
| 195 |
+
<a href="https://github.com/pollen-robotics/reachy_mini">Reachy Mini</a>, and
|
| 196 |
+
<a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started">the Simulator</a>.
|
| 197 |
+
</p>
|
| 198 |
+
</footer>
|
| 199 |
+
|
| 200 |
+
</body>
|
| 201 |
+
</html>
|
openclaw-skill/SKILL.md
ADDED
|
@@ -0,0 +1,109 @@
|
|
| 1 |
+
---
|
| 2 |
+
name: reachyclaw
|
| 3 |
+
description: Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain — it controls speech, movement, and vision. Works with physical robot OR simulator!
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# ReachyClaw - Robot Body for OpenClaw
|
| 7 |
+
|
| 8 |
+
Give your OpenClaw agent a physical Reachy Mini robot body where OpenClaw is the actual brain.
|
| 9 |
+
|
| 10 |
+
## Overview
|
| 11 |
+
|
| 12 |
+
ReachyClaw embodies your OpenClaw AI agent in a Reachy Mini robot. Unlike typical setups where GPT-4o is the brain, ReachyClaw routes every message through OpenClaw and lets it control the robot body via action tags.
|
| 13 |
+
|
| 14 |
+
- **Hear**: Listen to voice commands via the robot's microphone
|
| 15 |
+
- **See**: View the world through the robot's camera
|
| 16 |
+
- **Speak**: Respond with natural voice through the robot's speaker
|
| 17 |
+
- **Move**: Control head movements, emotions, and dances from OpenClaw
|
| 18 |
+
|
| 19 |
+
## Architecture
|
| 20 |
+
|
| 21 |
+
```
|
| 22 |
+
You speak -> Reachy Mini microphone
|
| 23 |
+
|
|
| 24 |
+
OpenAI Realtime API (STT only)
|
| 25 |
+
|
|
| 26 |
+
OpenClaw (the actual brain)
|
| 27 |
+
|
|
| 28 |
+
Response: "[EMOTION:happy] That's great!"
|
| 29 |
+
|
|
| 30 |
+
ReachyClaw parses actions -> robot moves
|
| 31 |
+
Clean text -> TTS -> robot speaks
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
## Requirements
|
| 35 |
+
|
| 36 |
+
### Option A: Physical Robot
|
| 37 |
+
- [Reachy Mini](https://github.com/pollen-robotics/reachy_mini) robot (Wireless or Lite)
|
| 38 |
+
|
| 39 |
+
### Option B: Simulator (No Hardware Required!)
|
| 40 |
+
- Any computer with Python 3.11+
|
| 41 |
+
- Install: `pip install "reachy-mini[mujoco]"`
|
| 42 |
+
|
| 43 |
+
### Software (Both Options)
|
| 44 |
+
- Python 3.11+
|
| 45 |
+
- OpenAI API key with Realtime API access
|
| 46 |
+
- OpenClaw gateway running on your network
|
| 47 |
+
|
| 48 |
+
## Installation
|
| 49 |
+
|
| 50 |
+
```bash
|
| 51 |
+
git clone https://github.com/shaunx/reachyclaw
|
| 52 |
+
cd reachyclaw
|
| 53 |
+
pip install -e .
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
## Configuration
|
| 57 |
+
|
| 58 |
+
Create a `.env` file:
|
| 59 |
+
|
| 60 |
+
```bash
|
| 61 |
+
OPENAI_API_KEY=sk-your-key-here
|
| 62 |
+
OPENCLAW_GATEWAY_URL=http://your-host-ip:18789
|
| 63 |
+
OPENCLAW_TOKEN=your-gateway-token
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
## Usage
|
| 67 |
+
|
| 68 |
+
### With Simulator
|
| 69 |
+
|
| 70 |
+
```bash
|
| 71 |
+
# Terminal 1: Start simulator
|
| 72 |
+
reachy-mini-daemon --sim
|
| 73 |
+
|
| 74 |
+
# Terminal 2: Run ReachyClaw
|
| 75 |
+
reachyclaw --gradio
|
| 76 |
+
```
|
| 77 |
+
|
| 78 |
+
### With Physical Robot
|
| 79 |
+
|
| 80 |
+
```bash
|
| 81 |
+
reachyclaw
|
| 82 |
+
|
| 83 |
+
# With debug logging
|
| 84 |
+
reachyclaw --debug
|
| 85 |
+
|
| 86 |
+
# With Gradio web UI
|
| 87 |
+
reachyclaw --gradio
|
| 88 |
+
```
|
| 89 |
+
|
| 90 |
+
## Robot Actions
|
| 91 |
+
|
| 92 |
+
OpenClaw can include these action tags in its responses:
|
| 93 |
+
|
| 94 |
+
- `[LOOK:left/right/up/down/front]` — head movement
|
| 95 |
+
- `[EMOTION:happy/sad/surprised/curious/thinking/confused/excited]` — emotions
|
| 96 |
+
- `[DANCE:happy/excited/wave/nod/shake/bounce]` — dances
|
| 97 |
+
- `[CAMERA]` — capture what the robot sees
|
| 98 |
+
- `[FACE_TRACKING:on/off]` — toggle face tracking
|
| 99 |
+
- `[STOP]` — stop all movements
|
| 100 |
+
|
| 101 |
+
## Links
|
| 102 |
+
|
| 103 |
+
- [GitHub Repository](https://github.com/shaunx/reachyclaw)
|
| 104 |
+
- [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
|
| 105 |
+
- [OpenClaw Documentation](https://docs.openclaw.ai)
|
| 106 |
+
|
| 107 |
+
## License
|
| 108 |
+
|
| 109 |
+
Apache 2.0
|
pyproject.toml
ADDED
|
@@ -0,0 +1,120 @@
|
| 1 |
+
[build-system]
|
| 2 |
+
requires = ["setuptools>=61.0", "wheel"]
|
| 3 |
+
build-backend = "setuptools.build_meta"
|
| 4 |
+
|
| 5 |
+
[project]
|
| 6 |
+
name = "reachyclaw"
|
| 7 |
+
version = "0.1.0"
|
| 8 |
+
description = "ReachyClaw - Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain; OpenAI Realtime API handles voice I/O."
|
| 9 |
+
readme = "README.md"
|
| 10 |
+
license = {text = "Apache-2.0"}
|
| 11 |
+
requires-python = ">=3.11"
|
| 12 |
+
authors = [
|
| 13 |
+
{name = "Shaun"}
|
| 14 |
+
]
|
| 15 |
+
keywords = [
|
| 16 |
+
"reachyclaw",
|
| 17 |
+
"reachy-mini",
|
| 18 |
+
"openclaw",
|
| 19 |
+
"robotics",
|
| 20 |
+
"ai-assistant",
|
| 21 |
+
"openai-realtime",
|
| 22 |
+
"voice-conversation",
|
| 23 |
+
"expressive-robot",
|
| 24 |
+
"embodied-ai"
|
| 25 |
+
]
|
| 26 |
+
classifiers = [
|
| 27 |
+
"Development Status :: 4 - Beta",
|
| 28 |
+
"Intended Audience :: Developers",
|
| 29 |
+
"License :: OSI Approved :: Apache Software License",
|
| 30 |
+
"Programming Language :: Python :: 3",
|
| 31 |
+
"Programming Language :: Python :: 3.11",
|
| 32 |
+
"Programming Language :: Python :: 3.12",
|
| 33 |
+
"Topic :: Scientific/Engineering :: Artificial Intelligence",
|
| 34 |
+
"Topic :: Scientific/Engineering :: Human Machine Interfaces",
|
| 35 |
+
]
|
| 36 |
+
dependencies = [
|
| 37 |
+
# OpenAI Realtime API
|
| 38 |
+
"openai>=1.50.0",
|
| 39 |
+
|
| 40 |
+
# Audio streaming
|
| 41 |
+
"fastrtc>=0.0.17",
|
| 42 |
+
"numpy",
|
| 43 |
+
"scipy",
|
| 44 |
+
|
| 45 |
+
# OpenClaw gateway client (WebSocket protocol)
|
| 46 |
+
"websockets>=12.0",
|
| 47 |
+
|
| 48 |
+
# Gradio UI
|
| 49 |
+
"gradio>=4.0.0",
|
| 50 |
+
|
| 51 |
+
# Environment
|
| 52 |
+
"python-dotenv",
|
| 53 |
+
]
|
| 54 |
+
|
| 55 |
+
# Note: reachy-mini SDK must be installed separately from the robot or GitHub:
|
| 56 |
+
# pip install git+https://github.com/pollen-robotics/reachy_mini.git
|
| 57 |
+
# Or on the robot, it's pre-installed.
|
| 58 |
+
|
| 59 |
+
[project.optional-dependencies]
|
| 60 |
+
wireless = [
|
| 61 |
+
"pygobject",
|
| 62 |
+
]
|
| 63 |
+
# YOLO-based face tracking (recommended - more accurate)
|
| 64 |
+
yolo_vision = [
|
| 65 |
+
"opencv-python",
|
| 66 |
+
"ultralytics",
|
| 67 |
+
"supervision",
|
| 68 |
+
]
|
| 69 |
+
# MediaPipe-based face tracking (lighter weight alternative)
|
| 70 |
+
mediapipe_vision = [
|
| 71 |
+
"mediapipe>=0.10.14",
|
| 72 |
+
]
|
| 73 |
+
# All vision options
|
| 74 |
+
all_vision = [
|
| 75 |
+
"opencv-python",
|
| 76 |
+
"ultralytics",
|
| 77 |
+
"supervision",
|
| 78 |
+
"mediapipe>=0.10.14",
|
| 79 |
+
]
|
| 80 |
+
# Legacy alias
|
| 81 |
+
vision = [
|
| 82 |
+
"opencv-python",
|
| 83 |
+
"ultralytics",
|
| 84 |
+
"supervision",
|
| 85 |
+
]
|
| 86 |
+
dev = [
|
| 87 |
+
"pytest",
|
| 88 |
+
"pytest-asyncio",
|
| 89 |
+
"ruff",
|
| 90 |
+
"mypy",
|
| 91 |
+
]
|
| 92 |
+
|
| 93 |
+
[project.scripts]
|
| 94 |
+
reachyclaw = "reachy_mini_openclaw.main:main"
|
| 95 |
+
|
| 96 |
+
[project.entry-points."reachy_mini_apps"]
|
| 97 |
+
reachyclaw = "reachy_mini_openclaw.main:ReachyClawApp"
|
| 98 |
+
|
| 99 |
+
[project.urls]
|
| 100 |
+
Homepage = "https://github.com/shaunx/reachyclaw"
|
| 101 |
+
Documentation = "https://github.com/shaunx/reachyclaw#readme"
|
| 102 |
+
Repository = "https://github.com/shaunx/reachyclaw"
|
| 103 |
+
Issues = "https://github.com/shaunx/reachyclaw/issues"
|
| 104 |
+
|
| 105 |
+
[tool.setuptools.packages.find]
|
| 106 |
+
where = ["src"]
|
| 107 |
+
|
| 108 |
+
[tool.ruff]
|
| 109 |
+
line-length = 120
|
| 110 |
+
target-version = "py311"
|
| 111 |
+
|
| 112 |
+
[tool.ruff.lint]
|
| 113 |
+
select = ["E", "F", "I", "N", "W", "UP"]
|
| 114 |
+
ignore = ["E501"]
|
| 115 |
+
|
| 116 |
+
[tool.mypy]
|
| 117 |
+
python_version = "3.11"
|
| 118 |
+
warn_return_any = true
|
| 119 |
+
warn_unused_configs = true
|
| 120 |
+
ignore_missing_imports = true
|
src/reachy_mini_openclaw/__init__.py
ADDED
|
@@ -0,0 +1,9 @@
|
| 1 |
+
"""Reachy Mini OpenClaw - Give your OpenClaw AI agent a physical presence.
|
| 2 |
+
|
| 3 |
+
This package combines OpenAI's Realtime API for responsive voice conversation
|
| 4 |
+
with Reachy Mini's expressive robot movements, allowing your OpenClaw agent
|
| 5 |
+
to see, hear, and speak through a physical robot body.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
__version__ = "0.1.0"
|
| 9 |
+
__author__ = "Shaun"
|
src/reachy_mini_openclaw/audio/__init__.py
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
"""Audio processing modules for Reachy Mini OpenClaw."""
|
| 2 |
+
|
| 3 |
+
from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
|
| 4 |
+
|
| 5 |
+
__all__ = ["HeadWobbler"]
|
src/reachy_mini_openclaw/audio/head_wobbler.py
ADDED
|
@@ -0,0 +1,223 @@
|
| 1 |
+
"""Audio-driven head movement for natural speech animation.
|
| 2 |
+
|
| 3 |
+
This module analyzes audio output in real-time and generates subtle head
|
| 4 |
+
movements that make the robot appear more expressive and alive while speaking.
|
| 5 |
+
|
| 6 |
+
The wobble is generated based on:
|
| 7 |
+
- Audio amplitude (volume) -> vertical movement
|
| 8 |
+
- Frequency content -> horizontal sway
|
| 9 |
+
- Speech rhythm -> timing of movements
|
| 10 |
+
|
| 11 |
+
Design:
|
| 12 |
+
- Runs in a separate thread to avoid blocking the main audio pipeline
|
| 13 |
+
- Uses a circular buffer for smooth interpolation
|
| 14 |
+
- Generates offsets that are added to the primary pose by MovementManager
|
| 15 |
+
"""
|
| 16 |
+
|
| 17 |
+
import base64
|
| 18 |
+
import logging
|
| 19 |
+
import threading
|
| 20 |
+
import time
|
| 21 |
+
from collections import deque
|
| 22 |
+
from typing import Callable, Optional, Tuple
|
| 23 |
+
|
| 24 |
+
import numpy as np
|
| 25 |
+
from numpy.typing import NDArray
|
| 26 |
+
|
| 27 |
+
logger = logging.getLogger(__name__)
|
| 28 |
+
|
| 29 |
+
# Type alias for speech offsets: (x, y, z, roll, pitch, yaw)
|
| 30 |
+
SpeechOffsets = Tuple[float, float, float, float, float, float]
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
class HeadWobbler:
|
| 34 |
+
"""Generate audio-driven head movements for expressive speech.
|
| 35 |
+
|
| 36 |
+
The wobbler analyzes incoming audio and produces subtle head movements
|
| 37 |
+
that are synchronized with speech patterns, making the robot appear
|
| 38 |
+
more natural and engaged during conversation.
|
| 39 |
+
|
| 40 |
+
Example:
|
| 41 |
+
def apply_offsets(offsets):
|
| 42 |
+
movement_manager.set_speech_offsets(offsets)
|
| 43 |
+
|
| 44 |
+
wobbler = HeadWobbler(set_speech_offsets=apply_offsets)
|
| 45 |
+
wobbler.start()
|
| 46 |
+
|
| 47 |
+
# Feed audio as it's played
|
| 48 |
+
wobbler.feed(base64_audio_chunk)
|
| 49 |
+
|
| 50 |
+
wobbler.stop()
|
| 51 |
+
"""
|
| 52 |
+
|
| 53 |
+
def __init__(
|
| 54 |
+
self,
|
| 55 |
+
set_speech_offsets: Callable[[SpeechOffsets], None],
|
| 56 |
+
sample_rate: int = 24000,
|
| 57 |
+
update_rate: float = 30.0, # Hz
|
| 58 |
+
):
|
| 59 |
+
"""Initialize the head wobbler.
|
| 60 |
+
|
| 61 |
+
Args:
|
| 62 |
+
set_speech_offsets: Callback to apply offsets to the movement system
|
| 63 |
+
sample_rate: Expected audio sample rate (Hz)
|
| 64 |
+
update_rate: How often to update offsets (Hz)
|
| 65 |
+
"""
|
| 66 |
+
self.set_speech_offsets = set_speech_offsets
|
| 67 |
+
self.sample_rate = sample_rate
|
| 68 |
+
self.update_period = 1.0 / update_rate
|
| 69 |
+
|
| 70 |
+
# Audio analysis parameters
|
| 71 |
+
self.amplitude_scale = 0.008 # Max displacement in meters
|
| 72 |
+
self.roll_scale = 0.15 # Max roll in radians
|
| 73 |
+
self.pitch_scale = 0.08 # Max pitch in radians
|
| 74 |
+
self.smoothing = 0.3 # Smoothing factor (0-1)
|
| 75 |
+
|
| 76 |
+
# State
|
| 77 |
+
self._audio_buffer: deque[NDArray[np.float32]] = deque(maxlen=10)
|
| 78 |
+
self._buffer_lock = threading.Lock()
|
| 79 |
+
self._current_amplitude = 0.0
|
| 80 |
+
self._current_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
|
| 81 |
+
|
| 82 |
+
# Thread control
|
| 83 |
+
self._stop_event = threading.Event()
|
| 84 |
+
self._thread: Optional[threading.Thread] = None
|
| 85 |
+
self._last_feed_time = 0.0
|
| 86 |
+
self._is_speaking = False
|
| 87 |
+
|
| 88 |
+
# Decay parameters for smooth return to neutral
|
| 89 |
+
self._decay_rate = 3.0 # How fast to decay when not speaking
|
| 90 |
+
self._speech_timeout = 0.3 # Seconds of silence before decay starts
|
| 91 |
+
|
| 92 |
+
def start(self) -> None:
|
| 93 |
+
"""Start the wobbler thread."""
|
| 94 |
+
if self._thread is not None and self._thread.is_alive():
|
| 95 |
+
logger.warning("HeadWobbler already running")
|
| 96 |
+
return
|
| 97 |
+
|
| 98 |
+
self._stop_event.clear()
|
| 99 |
+
self._thread = threading.Thread(target=self._run_loop, daemon=True)
|
| 100 |
+
self._thread.start()
|
| 101 |
+
logger.debug("HeadWobbler started")
|
| 102 |
+
|
| 103 |
+
def stop(self) -> None:
|
| 104 |
+
"""Stop the wobbler thread."""
|
| 105 |
+
self._stop_event.set()
|
| 106 |
+
if self._thread is not None:
|
| 107 |
+
self._thread.join(timeout=1.0)
|
| 108 |
+
self._thread = None
|
| 109 |
+
|
| 110 |
+
# Reset to neutral
|
| 111 |
+
self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
|
| 112 |
+
logger.debug("HeadWobbler stopped")
|
| 113 |
+
|
| 114 |
+
def reset(self) -> None:
|
| 115 |
+
"""Reset the wobbler state (call when speech ends or is interrupted)."""
|
| 116 |
+
with self._buffer_lock:
|
| 117 |
+
self._audio_buffer.clear()
|
| 118 |
+
self._current_amplitude = 0.0
|
| 119 |
+
self._is_speaking = False
|
| 120 |
+
self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
|
| 121 |
+
logger.debug("HeadWobbler reset")
|
| 122 |
+
|
| 123 |
+
def feed(self, audio_b64: str) -> None:
|
| 124 |
+
"""Feed audio data to the wobbler.
|
| 125 |
+
|
| 126 |
+
Args:
|
| 127 |
+
audio_b64: Base64-encoded PCM audio (int16)
|
| 128 |
+
"""
|
| 129 |
+
try:
|
| 130 |
+
audio_bytes = base64.b64decode(audio_b64)
|
| 131 |
+
audio_int16 = np.frombuffer(audio_bytes, dtype=np.int16)
|
| 132 |
+
audio_float = audio_int16.astype(np.float32) / 32768.0
|
| 133 |
+
|
| 134 |
+
with self._buffer_lock:
|
| 135 |
+
self._audio_buffer.append(audio_float)
|
| 136 |
+
|
| 137 |
+
self._last_feed_time = time.monotonic()
|
| 138 |
+
self._is_speaking = True
|
| 139 |
+
|
| 140 |
+
except Exception as e:
|
| 141 |
+
logger.debug("Error feeding audio to wobbler: %s", e)
|
| 142 |
+
|
| 143 |
+
def _compute_amplitude(self) -> float:
|
| 144 |
+
"""Compute current audio amplitude from buffer."""
|
| 145 |
+
with self._buffer_lock:
|
| 146 |
+
if not self._audio_buffer:
|
| 147 |
+
return 0.0
|
| 148 |
+
|
| 149 |
+
# Concatenate recent audio
|
| 150 |
+
audio = np.concatenate(list(self._audio_buffer))
|
| 151 |
+
|
| 152 |
+
# RMS amplitude
|
| 153 |
+
rms = np.sqrt(np.mean(audio ** 2))
|
| 154 |
+
return min(1.0, rms * 3.0) # Scale and clamp
|
| 155 |
+
|
| 156 |
+
def _compute_offsets(self, amplitude: float, t: float) -> SpeechOffsets:
|
| 157 |
+
"""Compute head offsets based on amplitude and time.
|
| 158 |
+
|
| 159 |
+
Args:
|
| 160 |
+
amplitude: Current audio amplitude (0-1)
|
| 161 |
+
t: Current time for oscillation
|
| 162 |
+
|
| 163 |
+
Returns:
|
| 164 |
+
Tuple of (x, y, z, roll, pitch, yaw) offsets
|
| 165 |
+
"""
|
| 166 |
+
if amplitude < 0.01:
|
| 167 |
+
return (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
|
| 168 |
+
|
| 169 |
+
# Vertical bob based on amplitude
|
| 170 |
+
z_offset = amplitude * self.amplitude_scale * np.sin(t * 8.0)
|
| 171 |
+
|
| 172 |
+
# Subtle roll sway
|
| 173 |
+
roll_offset = amplitude * self.roll_scale * np.sin(t * 3.0)
|
| 174 |
+
|
| 175 |
+
# Pitch variation
|
| 176 |
+
pitch_offset = amplitude * self.pitch_scale * np.sin(t * 5.0 + 0.5)
|
| 177 |
+
|
| 178 |
+
# Small yaw drift
|
| 179 |
+
yaw_offset = amplitude * 0.05 * np.sin(t * 2.0)
|
| 180 |
+
|
| 181 |
+
return (0.0, 0.0, z_offset, roll_offset, pitch_offset, yaw_offset)
|
| 182 |
+
|
| 183 |
+
def _run_loop(self) -> None:
|
| 184 |
+
"""Main wobbler loop."""
|
| 185 |
+
start_time = time.monotonic()
|
| 186 |
+
|
| 187 |
+
while not self._stop_event.is_set():
|
| 188 |
+
loop_start = time.monotonic()
|
| 189 |
+
t = loop_start - start_time
|
| 190 |
+
|
| 191 |
+
# Check if we're still receiving audio
|
| 192 |
+
silence_duration = loop_start - self._last_feed_time
|
| 193 |
+
|
| 194 |
+
if silence_duration > self._speech_timeout:
|
| 195 |
+
# Decay amplitude when not speaking
|
| 196 |
+
self._current_amplitude *= np.exp(-self._decay_rate * self.update_period)
|
| 197 |
+
self._is_speaking = False
|
| 198 |
+
else:
|
| 199 |
+
# Compute new amplitude with smoothing
|
| 200 |
+
raw_amplitude = self._compute_amplitude()
|
| 201 |
+
self._current_amplitude = (
|
| 202 |
+
self.smoothing * raw_amplitude +
|
| 203 |
+
(1 - self.smoothing) * self._current_amplitude
|
| 204 |
+
)
|
| 205 |
+
|
| 206 |
+
# Compute and apply offsets
|
| 207 |
+
offsets = self._compute_offsets(self._current_amplitude, t)
|
| 208 |
+
|
| 209 |
+
# Smooth transition between offsets
|
| 210 |
+
new_offsets = tuple(
|
| 211 |
+
self.smoothing * new + (1 - self.smoothing) * old
|
| 212 |
+
for new, old in zip(offsets, self._current_offsets)
|
| 213 |
+
)
|
| 214 |
+
self._current_offsets = new_offsets
|
| 215 |
+
|
| 216 |
+
# Apply to movement system
|
| 217 |
+
self.set_speech_offsets(new_offsets)
|
| 218 |
+
|
| 219 |
+
# Maintain update rate
|
| 220 |
+
elapsed = time.monotonic() - loop_start
|
| 221 |
+
sleep_time = max(0.0, self.update_period - elapsed)
|
| 222 |
+
if sleep_time > 0:
|
| 223 |
+
time.sleep(sleep_time)
|
src/reachy_mini_openclaw/camera_worker.py
ADDED
|
@@ -0,0 +1,382 @@
|
"""Camera worker thread with frame buffering and face tracking.

Provides:
- 30Hz+ camera polling with thread-safe frame buffering
- Face tracking integration with smooth interpolation
- Room scanning when no face is detected
- Latest frame always available for tools
- Smooth return to neutral when face is lost

Based on pollen-robotics/reachy_mini_conversation_app camera worker.
"""

import time
import logging
import threading
from typing import Any, List, Tuple, Optional

import numpy as np
from numpy.typing import NDArray
from scipy.spatial.transform import Rotation as R

from reachy_mini import ReachyMini
from reachy_mini.utils.interpolation import linear_pose_interpolation


logger = logging.getLogger(__name__)


class CameraWorker:
    """Thread-safe camera worker with frame buffering and face tracking.

    State machine for face tracking:
        SCANNING  -- no face known, sweeping the room to find one
        TRACKING  -- face detected, following it with head offsets
        WAITING   -- face just lost, holding position briefly
        RETURNING -- interpolating back to neutral before scanning again
    """

    def __init__(self, reachy_mini: ReachyMini, head_tracker: Any = None) -> None:
        """Initialize camera worker.

        Args:
            reachy_mini: Connected ReachyMini instance
            head_tracker: Optional head tracker (YOLO or MediaPipe)
        """
        self.reachy_mini = reachy_mini
        self.head_tracker = head_tracker

        # Thread-safe frame storage
        self.latest_frame: Optional[NDArray[np.uint8]] = None
        self.frame_lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

        # Face tracking state
        self.is_head_tracking_enabled = True
        self.face_tracking_offsets: List[float] = [
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        ]  # x, y, z, roll, pitch, yaw
        self.face_tracking_lock = threading.Lock()

        # Face tracking timing (for smooth interpolation back to neutral)
        self.last_face_detected_time: Optional[float] = None
        self.interpolation_start_time: Optional[float] = None
        self.interpolation_start_pose: Optional[NDArray[np.float32]] = None
        self.face_lost_delay = 2.0  # seconds to wait before starting interpolation
        self.interpolation_duration = 1.0  # seconds to interpolate back to neutral

        # Track state changes
        self.previous_head_tracking_state = self.is_head_tracking_enabled

        # Tracking scale factor (proportional gain for the camera-head servo loop).
        # 0.85 provides accurate convergence via closed-loop feedback while
        # avoiding single-frame overshoot that causes jitter.
        self.tracking_scale = 0.85

        # Smoothing factor for exponential moving average (0.0-1.0)
        # At 25Hz with alpha=0.25, 95% convergence ~0.5s -- smooth enough to
        # filter detection noise, responsive enough to feel like eye contact.
        self.smoothing_alpha = 0.25

        # Previous smoothed offsets for EMA calculation
        self._smoothed_offsets: List[float] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

        # --- Room scanning state ---
        # When no face is visible, the robot periodically sweeps the room.
        self._scanning = False
        self._scanning_start_time = 0.0
        # Scanning pattern: sinusoidal yaw sweep
        self._scan_yaw_amplitude = np.deg2rad(35)  # ±35 degrees
        self._scan_period = 8.0  # seconds for a full left-right-left cycle
        self._scan_pitch_offset = np.deg2rad(3)  # slight upward tilt while scanning
        # Start scanning immediately at boot (before any face has ever been seen)
        self._ever_seen_face = False

    def get_latest_frame(self) -> Optional[NDArray[np.uint8]]:
        """Get the latest frame (thread-safe).

        Returns:
            Copy of latest frame in BGR format, or None if no frame available
        """
        with self.frame_lock:
            if self.latest_frame is None:
                return None
            return self.latest_frame.copy()

    def get_face_tracking_offsets(
        self,
    ) -> Tuple[float, float, float, float, float, float]:
        """Get current face tracking offsets (thread-safe).

        Returns:
            Tuple of (x, y, z, roll, pitch, yaw) offsets
        """
        with self.face_tracking_lock:
            offsets = self.face_tracking_offsets
            return (offsets[0], offsets[1], offsets[2], offsets[3], offsets[4], offsets[5])

    def set_head_tracking_enabled(self, enabled: bool) -> None:
        """Enable/disable head tracking.

        Args:
            enabled: Whether to enable face tracking
        """
        if enabled and not self.is_head_tracking_enabled:
            # Reset smoothed offsets so tracking converges quickly from scratch
            self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
            # Start scanning immediately when re-enabled
            self._start_scanning()
        self.is_head_tracking_enabled = enabled
        logger.info("Head tracking %s", "enabled" if enabled else "disabled")

    def start(self) -> None:
        """Start the camera worker loop in a thread."""
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._working_loop, daemon=True)
        self._thread.start()
        logger.info("Camera worker started")

    def stop(self) -> None:
        """Stop the camera worker loop."""
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout=2.0)
        logger.info("Camera worker stopped")

    # ------------------------------------------------------------------
    # Scanning helpers
    # ------------------------------------------------------------------

    def _start_scanning(self) -> None:
        """Begin the room-scanning sweep."""
        if not self._scanning:
            self._scanning = True
            self._scanning_start_time = time.time()
            logger.debug("Started room scanning")

    def _stop_scanning(self) -> None:
        """Stop the room-scanning sweep."""
        if self._scanning:
            self._scanning = False
            logger.debug("Stopped room scanning")

    def _update_scanning_offsets(self, current_time: float) -> None:
        """Compute scanning offsets -- a slow yaw sweep with slight pitch up.

        The sweep is sinusoidal so the head slows at the extremes (more natural)
        and the face detector gets a chance to catch faces at the edges.
        """
        t = current_time - self._scanning_start_time

        yaw = float(self._scan_yaw_amplitude * np.sin(2 * np.pi * t / self._scan_period))
        pitch = float(self._scan_pitch_offset)

        with self.face_tracking_lock:
            self.face_tracking_offsets = [0.0, 0.0, 0.0, 0.0, pitch, yaw]

    # ------------------------------------------------------------------
    # Main loop
    # ------------------------------------------------------------------

    def _working_loop(self) -> None:
        """Main camera worker loop.

        Runs at ~25Hz, captures frames and processes face tracking.
        """
        logger.debug("Starting camera working loop")

        # Neutral pose for interpolation target
        neutral_pose = np.eye(4, dtype=np.float32)
        self.previous_head_tracking_state = self.is_head_tracking_enabled

        # Begin scanning right away so the robot looks for a face on startup
        if self.is_head_tracking_enabled and self.head_tracker is not None:
            self._start_scanning()

        while not self._stop_event.is_set():
            try:
                current_time = time.time()

                # Get frame from robot
                frame = self.reachy_mini.media.get_frame()

                if frame is not None:
                    # Thread-safe frame storage
                    with self.frame_lock:
                        self.latest_frame = frame

                # Check if face tracking was just disabled
                if self.previous_head_tracking_state and not self.is_head_tracking_enabled:
                    # Face tracking was just disabled - start interpolation to neutral
                    self.last_face_detected_time = current_time
                    self.interpolation_start_time = None
                    self.interpolation_start_pose = None
                    self._stop_scanning()

                # Update tracking state
                self.previous_head_tracking_state = self.is_head_tracking_enabled

                # Handle face tracking if enabled and head tracker available
                if self.is_head_tracking_enabled and self.head_tracker is not None:
                    self._process_face_tracking(frame, current_time, neutral_pose)
                elif self.last_face_detected_time is not None:
                    # Handle interpolation back to neutral when tracking disabled
                    self._interpolate_to_neutral(current_time, neutral_pose)

                # Sleep to maintain ~25Hz
                time.sleep(0.04)

            except Exception as e:
                logger.error("Camera worker error: %s", e)
                time.sleep(0.1)

        logger.debug("Camera worker thread exited")

    def _process_face_tracking(
        self,
        frame: NDArray[np.uint8],
        current_time: float,
        neutral_pose: NDArray[np.float32]
    ) -> None:
        """Process face tracking from frame.

        Args:
            frame: Current camera frame
            current_time: Current timestamp
            neutral_pose: Neutral pose matrix for interpolation
        """
        eye_center, _ = self.head_tracker.get_head_position(frame)

        if eye_center is not None:
            # Face detected!
            if not self._ever_seen_face:
                self._ever_seen_face = True
                logger.info("Face detected for the first time")

            # Stop scanning if we were scanning
            if self._scanning:
                self._stop_scanning()
                # Seed the EMA from current scanning offsets for smooth transition
                with self.face_tracking_lock:
                    self._smoothed_offsets = list(self.face_tracking_offsets)

            self.last_face_detected_time = current_time
            self.interpolation_start_time = None  # Stop any interpolation

            # Convert normalized coordinates to pixel coordinates
            h, w = frame.shape[:2]
            eye_center_norm = (eye_center + 1) / 2
            eye_center_pixels = [
                eye_center_norm[0] * w,
                eye_center_norm[1] * h,
            ]

            # Get the head pose needed to look at the target
            target_pose = self.reachy_mini.look_at_image(
                eye_center_pixels[0],
                eye_center_pixels[1],
                duration=0.0,
                perform_movement=False,
            )

            # Extract translation and rotation from the target pose
            translation = target_pose[:3, 3]
            rotation = R.from_matrix(target_pose[:3, :3]).as_euler("xyz", degrees=False)

            # Scale for smoother closed-loop convergence
            translation *= self.tracking_scale
            rotation *= self.tracking_scale

            # Apply exponential moving average (EMA) smoothing to reduce jitter
            # new_smoothed = alpha * new_value + (1 - alpha) * old_smoothed
            alpha = self.smoothing_alpha
            new_offsets = [
                translation[0], translation[1], translation[2],
                rotation[0], rotation[1], rotation[2],
            ]

            smoothed = [
                alpha * new_offsets[i] + (1 - alpha) * self._smoothed_offsets[i]
                for i in range(6)
            ]
            self._smoothed_offsets = smoothed

            # Thread-safe update of face tracking offsets
            with self.face_tracking_lock:
                self.face_tracking_offsets = smoothed

        else:
            # No face detected
            if self._scanning:
                # Already scanning -- keep sweeping the room
                self._update_scanning_offsets(current_time)
            else:
                # Not scanning yet -- go through the wait/return/scan sequence
                self._interpolate_to_neutral(current_time, neutral_pose)

    def _interpolate_to_neutral(
        self,
        current_time: float,
        neutral_pose: NDArray[np.float32]
    ) -> None:
        """Interpolate face tracking offsets back to neutral when face is lost.

        Once interpolation completes, automatically starts room scanning.

        Args:
            current_time: Current timestamp
            neutral_pose: Target neutral pose matrix
        """
        if self.last_face_detected_time is None:
            # Never seen a face -- go straight to scanning
            self._start_scanning()
            return

        time_since_face_lost = current_time - self.last_face_detected_time

        if time_since_face_lost >= self.face_lost_delay:
            # Start interpolation if not already started
            if self.interpolation_start_time is None:
                self.interpolation_start_time = current_time
                # Capture current pose as start of interpolation
                with self.face_tracking_lock:
                    current_translation = self.face_tracking_offsets[:3]
                    current_rotation_euler = self.face_tracking_offsets[3:]
                # Convert to 4x4 pose matrix
                pose_matrix = np.eye(4, dtype=np.float32)
                pose_matrix[:3, 3] = current_translation
                pose_matrix[:3, :3] = R.from_euler(
                    "xyz", current_rotation_euler
                ).as_matrix()
                self.interpolation_start_pose = pose_matrix

            # Calculate interpolation progress (t from 0 to 1)
            elapsed_interpolation = current_time - self.interpolation_start_time
            t = min(1.0, elapsed_interpolation / self.interpolation_duration)

            # Interpolate between current pose and neutral pose
            interpolated_pose = linear_pose_interpolation(
                self.interpolation_start_pose,
                neutral_pose,
                t,
            )

            # Extract translation and rotation from interpolated pose
            translation = interpolated_pose[:3, 3]
            rotation = R.from_matrix(interpolated_pose[:3, :3]).as_euler("xyz", degrees=False)

            # Thread-safe update of face tracking offsets
            with self.face_tracking_lock:
                self.face_tracking_offsets = [
                    translation[0], translation[1], translation[2],
                    rotation[0], rotation[1], rotation[2],
                ]

            # If interpolation is complete, start scanning the room
            if t >= 1.0:
                self.last_face_detected_time = None
                self.interpolation_start_time = None
                self.interpolation_start_pose = None
                self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
                self._start_scanning()
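The `smoothing_alpha = 0.25` comment in the file above claims roughly 0.5 s to 95% convergence at the 25 Hz loop rate. That number falls out of the EMA's geometric error decay; a quick sanity check (pure math, no robot or SDK dependencies):

```python
import math

alpha = 0.25     # EMA smoothing factor used by the worker
rate_hz = 25.0   # worker loop frequency (~25Hz, one update every 0.04s)

# After n updates, (1 - alpha)**n of the original error remains.
# Solve (1 - alpha)**n <= 0.05 for the 95%-converged step count.
steps_to_95 = math.ceil(math.log(0.05) / math.log(1 - alpha))
seconds_to_95 = steps_to_95 / rate_hz  # 11 steps -> 0.44 s
```

So the head settles on a new face in about 11 loop iterations, matching the "~0.5s" figure in the comment.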
src/reachy_mini_openclaw/config.py
ADDED
@@ -0,0 +1,84 @@
"""Configuration management for Reachy Mini OpenClaw.

Handles environment variables and configuration settings for the application.
"""

import os
from pathlib import Path
from dataclasses import dataclass, field
from typing import Optional

from dotenv import load_dotenv

# Load environment variables from .env file
_project_root = Path(__file__).parent.parent.parent
load_dotenv(_project_root / ".env")


@dataclass
class Config:
    """Application configuration loaded from environment variables."""

    # OpenAI Configuration
    OPENAI_API_KEY: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    OPENAI_MODEL: str = field(default_factory=lambda: os.getenv("OPENAI_MODEL", "gpt-4o-realtime-preview-2024-12-17"))
    OPENAI_VOICE: str = field(default_factory=lambda: os.getenv("OPENAI_VOICE", "cedar"))

    # OpenClaw Gateway Configuration
    OPENCLAW_GATEWAY_URL: str = field(default_factory=lambda: os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"))
    OPENCLAW_TOKEN: Optional[str] = field(default_factory=lambda: os.getenv("OPENCLAW_TOKEN"))
    OPENCLAW_AGENT_ID: str = field(default_factory=lambda: os.getenv("OPENCLAW_AGENT_ID", "main"))
    # Session key for OpenClaw - uses "main" to share context with WhatsApp and other channels
    # Format: agent:<agent_id>:<session_key>, but we only need the session key part here
    OPENCLAW_SESSION_KEY: str = field(default_factory=lambda: os.getenv("OPENCLAW_SESSION_KEY", "main"))

    # Robot Configuration
    ROBOT_NAME: Optional[str] = field(default_factory=lambda: os.getenv("ROBOT_NAME"))

    # Feature Flags
    ENABLE_OPENCLAW_TOOLS: bool = field(default_factory=lambda: os.getenv("ENABLE_OPENCLAW_TOOLS", "true").lower() == "true")
    ENABLE_CAMERA: bool = field(default_factory=lambda: os.getenv("ENABLE_CAMERA", "true").lower() == "true")
    ENABLE_FACE_TRACKING: bool = field(default_factory=lambda: os.getenv("ENABLE_FACE_TRACKING", "true").lower() == "true")

    # Face Tracking Configuration
    # Options: "yolo", "mediapipe", or None for auto-detect
    HEAD_TRACKER_TYPE: Optional[str] = field(default_factory=lambda: os.getenv("HEAD_TRACKER_TYPE", "yolo"))

    # Local Vision Processing
    ENABLE_LOCAL_VISION: bool = field(default_factory=lambda: os.getenv("ENABLE_LOCAL_VISION", "false").lower() == "true")
    LOCAL_VISION_MODEL: str = field(default_factory=lambda: os.getenv("LOCAL_VISION_MODEL", "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"))
    VISION_DEVICE: str = field(default_factory=lambda: os.getenv("VISION_DEVICE", "auto"))  # "auto", "cuda", "mps", "cpu"
    HF_HOME: str = field(default_factory=lambda: os.getenv("HF_HOME", os.path.expanduser("~/.cache/huggingface")))

    # Custom Profile (for personality customization)
    CUSTOM_PROFILE: Optional[str] = field(default_factory=lambda: os.getenv("REACHY_MINI_CUSTOM_PROFILE"))

    def validate(self) -> list[str]:
        """Validate configuration and return list of errors."""
        errors = []
        if not self.OPENAI_API_KEY:
            errors.append("OPENAI_API_KEY is required")
        return errors


# Global configuration instance
config = Config()


def set_custom_profile(profile: Optional[str]) -> None:
    """Update the custom profile at runtime."""
    global config
    config.CUSTOM_PROFILE = profile
    os.environ["REACHY_MINI_CUSTOM_PROFILE"] = profile or ""


def set_face_tracking_enabled(enabled: bool) -> None:
    """Enable or disable face tracking at runtime."""
    global config
    config.ENABLE_FACE_TRACKING = enabled


def set_local_vision_enabled(enabled: bool) -> None:
    """Enable or disable local vision processing at runtime."""
    global config
    config.ENABLE_LOCAL_VISION = enabled
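Every boolean flag in `Config` uses the same `os.getenv(...).lower() == "true"` pattern, which treats any string other than exactly "true" (case-insensitive) as False -- including "1" and "yes". A small standalone illustration (the `env_flag` helper and variable names here are hypothetical, not part of the module):

```python
import os

def env_flag(name: str, default: str = "true") -> bool:
    """Parse an environment variable as a boolean, matching Config's pattern."""
    return os.getenv(name, default).lower() == "true"

os.environ["DEMO_FLAG_A"] = "True"  # mixed case still parses as True
os.environ["DEMO_FLAG_B"] = "1"     # "1" is NOT truthy under this scheme
a = env_flag("DEMO_FLAG_A")
b = env_flag("DEMO_FLAG_B")
c = env_flag("DEMO_FLAG_MISSING")   # unset -> falls back to the default "true"
```

Worth keeping in mind when writing `.env` files: `ENABLE_CAMERA=1` would silently disable the camera.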
src/reachy_mini_openclaw/gradio_app.py
ADDED
@@ -0,0 +1,202 @@
"""Gradio web UI for Reachy Mini OpenClaw.

This module provides a web interface for:
- Viewing conversation transcripts
- Configuring the assistant personality
- Monitoring robot status
- Manual control options
"""

import os
import logging
from typing import Optional

import gradio as gr

logger = logging.getLogger(__name__)


def launch_gradio(
    gateway_url: str = "ws://localhost:18789",
    robot_name: Optional[str] = None,
    enable_camera: bool = True,
    enable_openclaw: bool = True,
    enable_face_tracking: bool = True,
    head_tracker_type: Optional[str] = None,
    share: bool = False,
) -> None:
    """Launch the Gradio web UI.

    Args:
        gateway_url: OpenClaw gateway URL
        robot_name: Robot name for connection
        enable_camera: Whether to enable camera
        enable_openclaw: Whether to enable OpenClaw
        enable_face_tracking: Whether to enable face tracking
        head_tracker_type: Head tracker type ('yolo', 'mediapipe', or None)
        share: Whether to create a public URL
    """
    from reachy_mini_openclaw.prompts import get_available_profiles, save_custom_profile
    from reachy_mini_openclaw.config import set_custom_profile, config

    # State
    app_instance = None

    def start_conversation():
        """Start the conversation."""
        nonlocal app_instance

        from reachy_mini_openclaw.main import ReachyClawCore
        import asyncio
        import threading

        if app_instance is not None:
            return "Already running"

        try:
            app_instance = ReachyClawCore(
                gateway_url=gateway_url,
                robot_name=robot_name,
                enable_camera=enable_camera,
                enable_openclaw=enable_openclaw,
                enable_face_tracking=enable_face_tracking,
                head_tracker_type=head_tracker_type,
            )

            # Run in background thread
            def run_app():
                loop = asyncio.new_event_loop()
                asyncio.set_event_loop(loop)
                try:
                    loop.run_until_complete(app_instance.run())
                except Exception as e:
                    logger.error("App error: %s", e)
                finally:
                    loop.close()

            thread = threading.Thread(target=run_app, daemon=True)
            thread.start()

            return "Started successfully"
        except Exception as e:
            return f"Error: {e}"

    def stop_conversation():
        """Stop the conversation."""
        nonlocal app_instance

        if app_instance is None:
            return "Not running"

        try:
            app_instance.stop()
            app_instance = None
            return "Stopped"
        except Exception as e:
            return f"Error: {e}"

    def apply_profile(profile_name):
        """Apply a personality profile."""
        set_custom_profile(profile_name if profile_name else None)
        return f"Applied profile: {profile_name or 'default'}"

    def save_profile(name, instructions):
        """Save a new profile."""
        if save_custom_profile(name, instructions):
            return f"Saved profile: {name}"
        return "Error saving profile"

    # Build UI
    with gr.Blocks(title="Reachy Mini OpenClaw") as demo:
        gr.Markdown("""
        # 🤖 Reachy Mini OpenClaw

        Give your OpenClaw AI agent a physical presence with Reachy Mini.
        Using OpenAI Realtime API for responsive voice conversation.
        """)

        with gr.Tab("Conversation"):
            with gr.Row():
                start_btn = gr.Button("▶️ Start", variant="primary")
                stop_btn = gr.Button("⏹️ Stop", variant="secondary")

            status_text = gr.Textbox(label="Status", interactive=False)

            transcript = gr.Chatbot(label="Conversation", height=400)

            start_btn.click(start_conversation, outputs=[status_text])
            stop_btn.click(stop_conversation, outputs=[status_text])

        with gr.Tab("Personality"):
            profiles = get_available_profiles()
            profile_dropdown = gr.Dropdown(
                choices=[""] + profiles,
                label="Select Profile",
                value=""
            )
            apply_btn = gr.Button("Apply Profile")
            profile_status = gr.Textbox(label="Status", interactive=False)

            apply_btn.click(
                apply_profile,
                inputs=[profile_dropdown],
                outputs=[profile_status]
            )

            gr.Markdown("### Create New Profile")
            new_name = gr.Textbox(label="Profile Name")
            new_instructions = gr.Textbox(
                label="Instructions",
                lines=10,
                placeholder="Enter the system prompt for this personality..."
            )
            save_btn = gr.Button("Save Profile")
            save_status = gr.Textbox(label="Save Status", interactive=False)

            save_btn.click(
                save_profile,
                inputs=[new_name, new_instructions],
                outputs=[save_status]
            )

        with gr.Tab("Settings"):
            gr.Markdown(f"""
            ### Current Configuration

            - **OpenClaw Gateway**: {gateway_url}
            - **OpenAI Model**: {config.OPENAI_MODEL}
            - **Voice**: {config.OPENAI_VOICE}
            - **Camera Enabled**: {enable_camera}
            - **OpenClaw Enabled**: {enable_openclaw}
            - **Face Tracking**: {enable_face_tracking}
            - **Head Tracker**: {head_tracker_type or 'auto-detect'}

            Edit `.env` file to change these settings.
            """)

        with gr.Tab("About"):
            gr.Markdown("""
            ## About Reachy Mini OpenClaw

            This application combines:

            - **OpenAI Realtime API** for ultra-low-latency voice conversation
            - **OpenClaw Gateway** for extended AI capabilities (web, calendar, smart home, etc.)
            - **Reachy Mini Robot** for physical embodiment with expressive movements

            ### Features

            - 🎤 Real-time voice conversation
            - 👀 Camera-based vision
            - 💃 Expressive robot movements
            - 🔧 Tool integration via OpenClaw
            - 🎭 Customizable personalities

            ### Links

            - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
            - [OpenClaw](https://github.com/openclaw/openclaw)
            - [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime)
            """)

    demo.launch(share=share, server_name="0.0.0.0", server_port=7860)
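`start_conversation` above runs the app's asyncio loop on a daemon thread so the Gradio server thread stays free. The same pattern in isolation (the `fake_app` coroutine stands in for `ReachyClawCore.run()`; no Gradio or robot required):

```python
import asyncio
import threading

results = []

async def fake_app():
    # Stand-in for the long-running app coroutine; yields control once.
    await asyncio.sleep(0)
    results.append("ran")

def run_in_thread(coro_factory):
    """Run a coroutine to completion on a fresh event loop in a worker thread."""
    def runner():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            loop.run_until_complete(coro_factory())
        finally:
            loop.close()
    t = threading.Thread(target=runner, daemon=True)
    t.start()
    return t

thread = run_in_thread(fake_app)
thread.join(timeout=2.0)
```

Creating a dedicated loop per thread avoids fighting over the main thread's loop, which Gradio itself may be using.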
src/reachy_mini_openclaw/main.py
ADDED
@@ -0,0 +1,591 @@
"""ReachyClaw - Give your OpenClaw AI agent a physical robot body.

This module provides the main application that connects:
- OpenAI Realtime API for voice I/O (speech recognition + TTS)
- OpenClaw Gateway for AI intelligence (the actual brain)
- Reachy Mini robot for physical embodiment

Usage:
    # Console mode (direct audio)
    reachyclaw

    # With Gradio UI
    reachyclaw --gradio

    # With debug logging
    reachyclaw --debug
"""

import os
import sys
import time
import asyncio
import logging
import argparse
import threading
from pathlib import Path
from typing import Any, Optional

from dotenv import load_dotenv

# Load environment from project root (override=True ensures .env takes precedence)
_project_root = Path(__file__).parent.parent.parent
load_dotenv(_project_root / ".env", override=True)

logger = logging.getLogger(__name__)

def setup_logging(debug: bool = False) -> None:
    """Configure logging for the application.

    Args:
        debug: Enable debug level logging
    """
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        datefmt="%H:%M:%S",
    )

    # Reduce noise from libraries
    if not debug:
        logging.getLogger("httpx").setLevel(logging.WARNING)
        logging.getLogger("websockets").setLevel(logging.WARNING)
        logging.getLogger("openai").setLevel(logging.WARNING)


def parse_args() -> argparse.Namespace:
    """Parse command line arguments.

    Returns:
        Parsed arguments namespace
    """
    parser = argparse.ArgumentParser(
        description="ReachyClaw - Give your OpenClaw AI agent a physical robot body",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Run in console mode
  reachyclaw

  # Run with Gradio web UI
  reachyclaw --gradio

  # Connect to specific robot
  reachyclaw --robot-name my-reachy

  # Use different OpenClaw gateway
  reachyclaw --gateway-url http://192.168.1.100:18790
"""
    )

    parser.add_argument(
        "--debug",
        action="store_true",
        help="Enable debug logging"
    )
    parser.add_argument(
        "--gradio",
        action="store_true",
        help="Launch Gradio web UI instead of console mode"
    )
    parser.add_argument(
        "--robot-name",
        type=str,
        help="Robot name for connection (default: auto-discover)"
    )
    parser.add_argument(
        "--gateway-url",
        type=str,
        default=os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"),
        help="OpenClaw gateway URL (from OPENCLAW_GATEWAY_URL env or default)"
    )
    parser.add_argument(
        "--no-camera",
        action="store_true",
        help="Disable camera functionality"
    )
    parser.add_argument(
        "--no-openclaw",
        action="store_true",
        help="Disable OpenClaw integration"
    )
    parser.add_argument(
        "--no-face-tracking",
        action="store_true",
        help="Disable face tracking"
    )
    parser.add_argument(
        "--local-vision",
        action="store_true",
        help="Enable local vision processing with SmolVLM2"
    )
    parser.add_argument(
        "--profile",
        type=str,
        help="Custom personality profile to use"
    )

    return parser.parse_args()

class ReachyClawCore:
    """ReachyClaw core application controller.

    This class orchestrates all components:
    - Reachy Mini robot connection and movement control
    - OpenAI Realtime API for voice I/O
    - OpenClaw gateway bridge for AI intelligence
    - Audio input/output loops
    """

    def __init__(
        self,
        gateway_url: str = "ws://localhost:18789",
        robot_name: Optional[str] = None,
        enable_camera: bool = True,
        enable_openclaw: bool = True,
        robot: Optional["ReachyMini"] = None,
        external_stop_event: Optional[threading.Event] = None,
    ):
        """Initialize the application.

        Args:
            gateway_url: OpenClaw gateway URL
            robot_name: Optional robot name for connection
            enable_camera: Whether to enable camera functionality
            enable_openclaw: Whether to enable OpenClaw integration
            robot: Optional pre-initialized robot (for app framework)
            external_stop_event: Optional external stop event
        """
        from reachy_mini import ReachyMini
        from reachy_mini_openclaw.config import config
        from reachy_mini_openclaw.moves import MovementManager
        from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
        from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge
        from reachy_mini_openclaw.tools.core_tools import ToolDependencies
        from reachy_mini_openclaw.openai_realtime import OpenAIRealtimeHandler

        self.gateway_url = gateway_url
        self._external_stop_event = external_stop_event
        self._owns_robot = robot is None

        # Validate configuration
        errors = config.validate()
        if errors:
            for error in errors:
                logger.error("Config error: %s", error)
            sys.exit(1)

        # Connect to robot
        if robot is not None:
            self.robot = robot
            logger.info("Using provided Reachy Mini instance")
        else:
            logger.info("Connecting to Reachy Mini...")
            robot_kwargs = {}
            if robot_name:
                robot_kwargs["robot_name"] = robot_name

            try:
                self.robot = ReachyMini(**robot_kwargs)
            except TimeoutError as e:
                logger.error("Connection timeout: %s", e)
                logger.error("Check that the robot is powered on and reachable.")
                sys.exit(1)
            except Exception as e:
                logger.error("Robot connection failed: %s", e)
                sys.exit(1)

        logger.info("Connected to robot: %s", self.robot.client.get_status())

        # Initialize movement system
        logger.info("Initializing movement system...")
        self.movement_manager = MovementManager(current_robot=self.robot)
        self.head_wobbler = HeadWobbler(
            set_speech_offsets=self.movement_manager.set_speech_offsets
        )

        # Initialize OpenClaw bridge
        self.openclaw_bridge = None
        if enable_openclaw:
            logger.info("Initializing OpenClaw bridge...")
            self.openclaw_bridge = OpenClawBridge(
                gateway_url=gateway_url,
                gateway_token=config.OPENCLAW_TOKEN,
            )

        # Camera worker for video streaming and frame capture
        self.camera_worker = None
        self.head_tracker = None
        self.vision_manager = None

        if enable_camera:
            logger.info("Initializing camera worker...")
            from reachy_mini_openclaw.camera_worker import CameraWorker

            # Initialize head tracker for local face tracking
            if config.ENABLE_FACE_TRACKING:
                self.head_tracker = self._initialize_head_tracker(config.HEAD_TRACKER_TYPE)

            # Initialize camera worker with head tracker
            self.camera_worker = CameraWorker(
                reachy_mini=self.robot,
                head_tracker=self.head_tracker,
            )

            # Enable/disable head tracking based on whether we have a tracker
            self.camera_worker.set_head_tracking_enabled(self.head_tracker is not None)

        # Initialize local vision processor if enabled
        if config.ENABLE_LOCAL_VISION:
            self.vision_manager = self._initialize_vision_manager()

        # Create tool dependencies
        self.deps = ToolDependencies(
            movement_manager=self.movement_manager,
            head_wobbler=self.head_wobbler,
            robot=self.robot,
            camera_worker=self.camera_worker,
            openclaw_bridge=self.openclaw_bridge,
            vision_manager=self.vision_manager,
        )

        # Initialize OpenAI Realtime handler with OpenClaw bridge
        self.handler = OpenAIRealtimeHandler(
            deps=self.deps,
            openclaw_bridge=self.openclaw_bridge,
        )

        # State
        self._stop_event = asyncio.Event()
        self._tasks: list[asyncio.Task] = []

    def _initialize_vision_manager(self) -> Optional[Any]:
        """Initialize local vision processor (SmolVLM2).

        Returns:
            VisionManager instance or None if initialization fails
        """
        if self.camera_worker is None:
            logger.warning("Cannot initialize vision manager without camera worker")
            return None

        try:
            from reachy_mini_openclaw.vision.processors import (
                VisionConfig,
                initialize_vision_manager,
            )
            from reachy_mini_openclaw.config import config

            vision_config = VisionConfig(
                model_path=config.LOCAL_VISION_MODEL,
                device_preference=config.VISION_DEVICE,
                hf_home=config.HF_HOME,
            )

            logger.info("Initializing local vision processor (SmolVLM2)...")
            vision_manager = initialize_vision_manager(self.camera_worker, vision_config)

            if vision_manager is not None:
                logger.info("Local vision processor initialized")
            else:
                logger.warning("Local vision processor failed to initialize")

            return vision_manager

        except ImportError as e:
            logger.warning(f"Local vision not available: {e}")
            logger.warning("Install with: pip install torch transformers")
            return None
        except Exception as e:
            logger.error(f"Failed to initialize vision manager: {e}")
            return None

    def _initialize_head_tracker(self, tracker_type: Optional[str] = None) -> Optional[Any]:
        """Initialize head tracker for local face tracking.

        Args:
            tracker_type: Type of tracker ("yolo", "mediapipe", or None for auto)

        Returns:
            Initialized head tracker or None if initialization fails
        """
        # Default to YOLO if not specified
        if tracker_type is None:
            tracker_type = "yolo"

        if tracker_type == "yolo":
            try:
                from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
                logger.info("Initializing YOLO face tracker...")
                tracker = HeadTracker(device="cpu")  # CPU is fast enough for face detection
                logger.info("YOLO face tracker initialized")
                return tracker
            except ImportError as e:
                logger.warning(f"YOLO tracker not available: {e}")
                logger.warning("Install with: pip install ultralytics supervision")
            except Exception as e:
                logger.error(f"Failed to initialize YOLO tracker: {e}")

        elif tracker_type == "mediapipe":
            try:
                from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
                logger.info("Initializing MediaPipe face tracker...")
                tracker = HeadTracker()
                logger.info("MediaPipe face tracker initialized")
                return tracker
            except ImportError as e:
                logger.warning(f"MediaPipe tracker not available: {e}")
            except Exception as e:
                logger.error(f"Failed to initialize MediaPipe tracker: {e}")

        logger.warning("No face tracker available - face tracking disabled")
        return None

    def _should_stop(self) -> bool:
        """Check if we should stop."""
        if self._stop_event.is_set():
            return True
        if self._external_stop_event is not None and self._external_stop_event.is_set():
            return True
        return False

    async def record_loop(self) -> None:
        """Read audio from robot microphone and send to handler."""
        input_sr = self.robot.media.get_input_audio_samplerate()
        logger.info("Recording at %d Hz", input_sr)

        while not self._should_stop():
            audio_frame = self.robot.media.get_audio_sample()
            if audio_frame is not None:
                await self.handler.receive((input_sr, audio_frame))
            await asyncio.sleep(0.01)

    async def play_loop(self) -> None:
        """Play audio from handler through robot speakers."""
        output_sr = self.robot.media.get_output_audio_samplerate()
        logger.info("Playing at %d Hz", output_sr)

        while not self._should_stop():
            output = await self.handler.emit()
            if output is not None:
                if isinstance(output, tuple):
                    input_sr, audio_data = output

                    # Convert to float32 and normalize (OpenAI sends int16)
                    audio_data = audio_data.flatten().astype("float32") / 32768.0

                    # Reduce volume to prevent distortion (0.5 = 50% volume)
                    audio_data = audio_data * 0.5

                    # Resample if needed
                    if input_sr != output_sr:
                        from scipy.signal import resample
                        num_samples = int(len(audio_data) * output_sr / input_sr)
                        audio_data = resample(audio_data, num_samples).astype("float32")

                    self.robot.media.push_audio_sample(audio_data)
                # else: it's an AdditionalOutputs (transcript) - handle in UI mode

            await asyncio.sleep(0.01)

    async def run(self) -> None:
        """Run the main application loop."""
        # Test OpenClaw connection
        if self.openclaw_bridge is not None:
            connected = await self.openclaw_bridge.connect()
            if connected:
                logger.info("OpenClaw gateway connected")
            else:
                logger.warning("OpenClaw gateway not available - some features disabled")

        # Enable motors and move to neutral pose
        logger.info("Enabling motors and moving to neutral position...")
        try:
            self.robot.enable_motors()
            from reachy_mini.utils import create_head_pose
            neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
            self.robot.goto_target(
                head=neutral,
                antennas=[0.0, 0.0],
                duration=2.0,
                body_yaw=0.0,
            )
            time.sleep(2)  # Wait for goto to complete
            logger.info("Robot at neutral position with motors enabled")
        except Exception as e:
            logger.error("Failed to initialize robot pose: %s", e)

        # Wire up camera worker to movement manager for face tracking
        if self.camera_worker is not None:
            self.movement_manager.camera_worker = self.camera_worker
            logger.info("Face tracking connected to movement system")

        # Start movement system
        logger.info("Starting movement system...")
        self.movement_manager.start()
        self.head_wobbler.start()

        # Start camera worker for video streaming
        if self.camera_worker is not None:
            logger.info("Starting camera worker...")
            self.camera_worker.start()

        # Start local vision processor if available
        if self.vision_manager is not None:
            logger.info("Starting local vision processor...")
            self.vision_manager.start()

        # Start audio
        logger.info("Starting audio...")
        self.robot.media.start_recording()
        self.robot.media.start_playing()
        time.sleep(1)  # Let pipelines initialize

        logger.info("Ready! Speak to me...")

        # Start OpenAI handler in background
        handler_task = asyncio.create_task(self.handler.start_up(), name="openai-handler")

        # Start audio loops
        self._tasks = [
            handler_task,
            asyncio.create_task(self.record_loop(), name="record-loop"),
            asyncio.create_task(self.play_loop(), name="play-loop"),
        ]

        try:
            await asyncio.gather(*self._tasks)
        except asyncio.CancelledError:
            logger.info("Tasks cancelled")

    def stop(self) -> None:
        """Stop everything."""
        logger.info("Stopping...")
        self._stop_event.set()

        # Cancel tasks
        for task in self._tasks:
            if not task.done():
                task.cancel()

        # Stop movement system
        self.head_wobbler.stop()
        self.movement_manager.stop()

        # Stop vision manager
        if self.vision_manager is not None:
            self.vision_manager.stop()

        # Stop camera worker
        if self.camera_worker is not None:
            self.camera_worker.stop()

        # Disconnect OpenClaw bridge
        if self.openclaw_bridge is not None:
            try:
                asyncio.get_event_loop().run_until_complete(
                    self.openclaw_bridge.disconnect()
                )
            except Exception as e:
                logger.debug("OpenClaw disconnect: %s", e)

        # Close resources if we own them
        if self._owns_robot:
            try:
                self.robot.media.close()
            except Exception as e:
                logger.debug("Media close: %s", e)
            self.robot.client.disconnect()

        logger.info("Stopped")

class ReachyClawApp:
    """ReachyClaw - Reachy Mini Apps entry point.

    This class allows ReachyClaw to be installed and run from
    the Reachy Mini dashboard as a Reachy Mini App.
    """

    # No custom settings UI
    custom_app_url: Optional[str] = None

    def run(self, reachy_mini, stop_event: threading.Event) -> None:
        """Run ReachyClaw as a Reachy Mini App.

        Args:
            reachy_mini: Pre-initialized ReachyMini instance
            stop_event: Threading event to signal stop
        """
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

        gateway_url = os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789")

        app = ReachyClawCore(
            gateway_url=gateway_url,
            robot=reachy_mini,
            external_stop_event=stop_event,
        )

        try:
            loop.run_until_complete(app.run())
        except Exception as e:
            logger.error("Error running app: %s", e)
        finally:
            app.stop()
            loop.close()


def main() -> None:
    """Main entry point."""
    args = parse_args()
    setup_logging(args.debug)

    # Set custom profile if specified
    if args.profile:
        from reachy_mini_openclaw.config import set_custom_profile
        set_custom_profile(args.profile)

    # Configure face tracking and local vision from args
    from reachy_mini_openclaw.config import (
        set_face_tracking_enabled,
        set_local_vision_enabled,
    )
    if args.no_face_tracking:
        set_face_tracking_enabled(False)
    if args.local_vision:
        set_local_vision_enabled(True)

    if args.gradio:
        # Launch Gradio UI
        logger.info("Starting Gradio UI...")
        from reachy_mini_openclaw.gradio_app import launch_gradio
        launch_gradio(
            gateway_url=args.gateway_url,
            robot_name=args.robot_name,
            enable_camera=not args.no_camera,
            enable_openclaw=not args.no_openclaw,
        )
    else:
        # Console mode
        app = ReachyClawCore(
            gateway_url=args.gateway_url,
            robot_name=args.robot_name,
            enable_camera=not args.no_camera,
            enable_openclaw=not args.no_openclaw,
        )

        try:
            asyncio.run(app.run())
        except KeyboardInterrupt:
            logger.info("Interrupted")
        finally:
            app.stop()


if __name__ == "__main__":
    main()
src/reachy_mini_openclaw/moves.py ADDED
@@ -0,0 +1,648 @@
| 1 |
+
"""Movement system for expressive robot control.
|
| 2 |
+
|
| 3 |
+
This module provides a 100Hz control loop for managing robot movements,
|
| 4 |
+
combining sequential primary moves (dances, emotions, head movements) with
|
| 5 |
+
additive secondary moves (speech wobble, face tracking).
|
| 6 |
+
|
| 7 |
+
Architecture:
|
| 8 |
+
- Primary moves are queued and executed sequentially
|
| 9 |
+
- Secondary moves are additive offsets applied on top
|
| 10 |
+
- Single control point via set_target at 100Hz
|
| 11 |
+
- Automatic breathing animation when idle
|
| 12 |
+
|
| 13 |
+
Based on the movement systems from:
|
| 14 |
+
- pollen-robotics/reachy_mini_conversation_app
|
| 15 |
+
- eoai-dev/moltbot_body
|
| 16 |
+
"""
|
| 17 |
+
|
| 18 |
+
from __future__ import annotations
|
| 19 |
+
|
| 20 |
+
import logging
|
| 21 |
+
import threading
|
| 22 |
+
import time
|
| 23 |
+
from collections import deque
|
| 24 |
+
from dataclasses import dataclass
|
| 25 |
+
from queue import Empty, Queue
|
| 26 |
+
from typing import Any, Dict, Optional, Tuple
|
| 27 |
+
|
| 28 |
+
import numpy as np
|
| 29 |
+
from numpy.typing import NDArray
|
| 30 |
+
from reachy_mini import ReachyMini
|
| 31 |
+
from reachy_mini.motion.move import Move
|
| 32 |
+
from reachy_mini.utils import create_head_pose
|
| 33 |
+
from reachy_mini.utils.interpolation import compose_world_offset, linear_pose_interpolation
|
| 34 |
+
|
| 35 |
+
logger = logging.getLogger(__name__)
|
| 36 |
+
|
| 37 |
+
# Configuration
|
| 38 |
+
CONTROL_LOOP_FREQUENCY_HZ = 100.0
|
| 39 |
+
|
| 40 |
+
# Type definitions
|
| 41 |
+
FullBodyPose = Tuple[NDArray[np.float32], Tuple[float, float], float]
|
| 42 |
+
SpeechOffsets = Tuple[float, float, float, float, float, float]
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
class BreathingMove(Move):
    """Continuous breathing animation for idle state."""

    def __init__(
        self,
        interpolation_start_pose: NDArray[np.float32],
        interpolation_start_antennas: Tuple[float, float],
        interpolation_duration: float = 1.0,
    ):
        """Initialize breathing move.

        Args:
            interpolation_start_pose: Current head pose to interpolate from
            interpolation_start_antennas: Current antenna positions
            interpolation_duration: Time to blend to neutral (seconds)
        """
        self.interpolation_start_pose = interpolation_start_pose
        self.interpolation_start_antennas = np.array(interpolation_start_antennas)
        self.interpolation_duration = interpolation_duration

        # Target neutral pose
        self.neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
        self.neutral_antennas = np.array([0.0, 0.0])

        # Breathing parameters
        self.breathing_z_amplitude = 0.005  # metres (5 mm gentle movement)
        self.breathing_frequency = 0.1  # Hz
        self.antenna_sway_amplitude = np.deg2rad(15)  # radians (15 degrees)
        self.antenna_frequency = 0.5  # Hz

    @property
    def duration(self) -> float:
        """Duration of the move (infinite for breathing)."""
        return float("inf")

    def evaluate(self, t: float) -> tuple:
        """Evaluate the breathing pose at time t."""
        if t < self.interpolation_duration:
            # Interpolate to neutral
            alpha = t / self.interpolation_duration
            head_pose = linear_pose_interpolation(
                self.interpolation_start_pose,
                self.neutral_head_pose,
                alpha
            )
            antennas = (1 - alpha) * self.interpolation_start_antennas + alpha * self.neutral_antennas
            antennas = antennas.astype(np.float64)
        else:
            # Breathing pattern
            breathing_t = t - self.interpolation_duration

            z_offset = self.breathing_z_amplitude * np.sin(
                2 * np.pi * self.breathing_frequency * breathing_t
            )
            head_pose = create_head_pose(
                x=0, y=0, z=z_offset,
                roll=0, pitch=0, yaw=0,
                degrees=True, mm=False
            )

            antenna_sway = self.antenna_sway_amplitude * np.sin(
                2 * np.pi * self.antenna_frequency * breathing_t
            )
            antennas = np.array([antenna_sway, -antenna_sway], dtype=np.float64)

        return (head_pose, antennas, 0.0)


class HeadLookMove(Move):
    """Move to look in a specific direction."""

    DIRECTIONS = {
        "left": (0, 0, 0, 0, 0, 30),  # yaw left
        "right": (0, 0, 0, 0, 0, -30),  # yaw right
        "up": (0, 0, 10, 0, 15, 0),  # pitch up, z up
        "down": (0, 0, -5, 0, -15, 0),  # pitch down, z down
        "front": (0, 0, 0, 0, 0, 0),  # neutral
    }

    def __init__(
        self,
        direction: str,
        start_pose: NDArray[np.float32],
        start_antennas: Tuple[float, float],
        duration: float = 1.0,
    ):
        """Initialize head look move.

        Args:
            direction: One of 'left', 'right', 'up', 'down', 'front'
            start_pose: Current head pose
            start_antennas: Current antenna positions
            duration: Move duration in seconds
        """
        self.direction = direction
        self.start_pose = start_pose
        self.start_antennas = np.array(start_antennas)
        self._duration = duration

        # Get target pose from direction
        params = self.DIRECTIONS.get(direction, self.DIRECTIONS["front"])
        self.target_pose = create_head_pose(
            x=params[0], y=params[1], z=params[2],
            roll=params[3], pitch=params[4], yaw=params[5],
            degrees=True, mm=True
        )
        self.target_antennas = np.array([0.0, 0.0])

    @property
    def duration(self) -> float:
        return self._duration

    def evaluate(self, t: float) -> tuple:
        """Evaluate pose at time t."""
        alpha = min(1.0, t / self._duration)
        # Smooth easing (smoothstep)
        alpha = alpha * alpha * (3 - 2 * alpha)

        head_pose = linear_pose_interpolation(
            self.start_pose,
            self.target_pose,
            alpha
        )
        antennas = (1 - alpha) * self.start_antennas + alpha * self.target_antennas

        return (head_pose, antennas.astype(np.float64), 0.0)


def combine_full_body(primary: FullBodyPose, secondary: FullBodyPose) -> FullBodyPose:
    """Combine primary pose with secondary offsets."""
    primary_head, primary_ant, primary_yaw = primary
    secondary_head, secondary_ant, secondary_yaw = secondary

    combined_head = compose_world_offset(primary_head, secondary_head, reorthonormalize=True)
    combined_ant = (
        primary_ant[0] + secondary_ant[0],
        primary_ant[1] + secondary_ant[1],
    )
    combined_yaw = primary_yaw + secondary_yaw

    return (combined_head, combined_ant, combined_yaw)


def clone_pose(pose: FullBodyPose) -> FullBodyPose:
    """Deep copy a full body pose."""
    head, ant, yaw = pose
    return (head.copy(), (float(ant[0]), float(ant[1])), float(yaw))


@dataclass
class MovementState:
    """State for the movement system."""
    current_move: Optional[Move] = None
    move_start_time: Optional[float] = None
    last_activity_time: float = 0.0
    speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    face_tracking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    thinking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    last_primary_pose: Optional[FullBodyPose] = None

    def update_activity(self) -> None:
        self.last_activity_time = time.monotonic()


class MovementManager:
    """Coordinate robot movements at 100 Hz.

    This class manages:
    - Sequential primary moves (dances, emotions, head movements)
    - Additive secondary offsets (speech wobble, face tracking)
    - Automatic idle breathing animation
    - Thread-safe communication with other components

    Example:
        manager = MovementManager(robot)
        manager.start()

        # Queue a head movement
        manager.queue_move(HeadLookMove("left", ...))

        # Set speech offsets (called by HeadWobbler)
        manager.set_speech_offsets((0, 0, 0.01, 0.1, 0, 0))

        manager.stop()
    """

    def __init__(
        self,
        current_robot: ReachyMini,
        camera_worker: Any = None,
    ):
        """Initialize movement manager.

        Args:
            current_robot: Connected ReachyMini instance
            camera_worker: Optional camera worker for face tracking
        """
        self.current_robot = current_robot
        self.camera_worker = camera_worker

        self._now = time.monotonic
        self.state = MovementState()
        self.state.last_activity_time = self._now()

        # Initialize neutral pose
        neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
        self.state.last_primary_pose = (neutral, (0.0, 0.0), 0.0)

        # Move queue
        self.move_queue: deque[Move] = deque()

        # Configuration
        self.idle_inactivity_delay = 0.3  # seconds before breathing starts
        self.target_frequency = CONTROL_LOOP_FREQUENCY_HZ
        self.target_period = 1.0 / self.target_frequency

        # Thread state
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._is_listening = False
        self._breathing_active = False

        # Last commanded pose for smooth transitions
        self._last_commanded_pose = clone_pose(self.state.last_primary_pose)
        self._listening_antennas = self._last_commanded_pose[1]
        self._antenna_unfreeze_blend = 1.0
        self._antenna_blend_duration = 0.4

        # Cross-thread communication
        self._command_queue: Queue[Tuple[str, Any]] = Queue()

        # Speech offsets (thread-safe)
        self._speech_lock = threading.Lock()
        self._pending_speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
        self._speech_dirty = False

        # Processing/thinking animation state
        self._processing = False
        self._processing_start_time = 0.0
        self._thinking_amplitude = 0.0  # 0..1 envelope for smooth fade in/out
        self._thinking_antenna_offsets: Tuple[float, float] = (0.0, 0.0)

        # Shared state lock
        self._shared_lock = threading.Lock()
        self._shared_last_activity = self.state.last_activity_time
        self._shared_is_listening = False

    def queue_move(self, move: Move) -> None:
        """Queue a primary move. Thread-safe."""
        self._command_queue.put(("queue_move", move))

    def clear_move_queue(self) -> None:
        """Clear all queued moves. Thread-safe."""
        self._command_queue.put(("clear_queue", None))

    def set_speech_offsets(self, offsets: SpeechOffsets) -> None:
        """Update speech-driven offsets. Thread-safe."""
        with self._speech_lock:
            self._pending_speech_offsets = offsets
            self._speech_dirty = True

    def set_listening(self, listening: bool) -> None:
        """Set listening state (freezes antennas). Thread-safe."""
        self._command_queue.put(("set_listening", listening))

    def set_processing(self, processing: bool) -> None:
        """Set processing state (triggers thinking animation). Thread-safe.

        When True, the robot shows a continuous 'thinking' animation as
        secondary offsets -- gentle head sway and asymmetric antenna scanning.
        Face tracking continues underneath since this is additive.
        """
        self._command_queue.put(("set_processing", processing))

    def is_idle(self) -> bool:
        """Check if robot has been idle. Thread-safe."""
        with self._shared_lock:
            if self._shared_is_listening:
                return False
            return self._now() - self._shared_last_activity >= self.idle_inactivity_delay

    def _poll_signals(self, current_time: float) -> None:
        """Process queued commands and pending offsets."""
        # Apply speech offsets
        with self._speech_lock:
            if self._speech_dirty:
                self.state.speech_offsets = self._pending_speech_offsets
                self._speech_dirty = False
                self.state.update_activity()

        # Process commands
        while True:
            try:
                cmd, payload = self._command_queue.get_nowait()
            except Empty:
                break
            self._handle_command(cmd, payload, current_time)

    def _update_face_tracking(self, current_time: float) -> None:
        """Get face tracking offsets from camera worker thread."""
        if self.camera_worker is not None:
            offsets = self.camera_worker.get_face_tracking_offsets()
            self.state.face_tracking_offsets = offsets
        else:
            # No camera worker, use neutral offsets
            self.state.face_tracking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

    def _update_thinking_offsets(self, current_time: float) -> None:
        """Compute thinking animation as secondary offsets.

        Produces a gentle head sway (yaw drift, slight upward pitch, z bob)
        and an asymmetric antenna scanning pattern. The amplitude envelope
        smoothly ramps up over 0.5 s and decays over 0.5 s for an organic feel.
        """
        # Update amplitude envelope
        if self._processing:
            # Ramp up over 0.5 s
            elapsed = current_time - self._processing_start_time
            self._thinking_amplitude = min(1.0, elapsed / 0.5)
        elif self._thinking_amplitude > 0:
            # Smooth decay at 2.0/s (full decay in 0.5 s)
            self._thinking_amplitude = max(
                0.0, self._thinking_amplitude - 2.0 * self.target_period
            )

        # If fully decayed, zero everything and bail
        if self._thinking_amplitude < 0.001:
            self._thinking_amplitude = 0.0
            self.state.thinking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
            self._thinking_antenna_offsets = (0.0, 0.0)
            return

        amp = self._thinking_amplitude
        t = current_time - self._processing_start_time

        # Head offsets (radians / metres -- degrees=False, mm=False)
        # Slow yaw drift: ±12° at 0.15 Hz
        yaw = amp * np.deg2rad(12) * np.sin(2 * np.pi * 0.15 * t)
        # Slight upward pitch: 6° base + 3° oscillation at 0.2 Hz
        pitch = amp * (np.deg2rad(6) + np.deg2rad(3) * np.sin(2 * np.pi * 0.2 * t))
        # Gentle z bob: 3 mm at 0.12 Hz
        z = amp * 0.003 * np.sin(2 * np.pi * 0.12 * t)

        self.state.thinking_offsets = (0.0, 0.0, z, 0.0, pitch, yaw)

        # Antenna offsets: asymmetric scan (phase offset creates "searching" feel)
        # ±20° at 0.4 Hz, right antenna lags left by ~70° of phase
        left_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t)
        right_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t + 1.2)
        self._thinking_antenna_offsets = (left_ant, right_ant)

    def _handle_command(self, cmd: str, payload: Any, current_time: float) -> None:
        """Handle a single command."""
        if cmd == "queue_move":
            if isinstance(payload, Move):
                self.move_queue.append(payload)
                self.state.update_activity()
                logger.debug("Queued move, queue size: %d", len(self.move_queue))
        elif cmd == "clear_queue":
            self.move_queue.clear()
            self.state.current_move = None
            self.state.move_start_time = None
            self._breathing_active = False
            logger.info("Cleared move queue")
        elif cmd == "set_listening":
            desired = bool(payload)
            if self._is_listening != desired:
                self._is_listening = desired
                if desired:
                    self._listening_antennas = self._last_commanded_pose[1]
                    self._antenna_unfreeze_blend = 0.0
                else:
                    self._antenna_unfreeze_blend = 0.0
                self.state.update_activity()
        elif cmd == "set_processing":
            desired = bool(payload)
            if desired and not self._processing:
                self._processing = True
                self._processing_start_time = self._now()
                # Interrupt breathing so the thinking animation is clean
                if self._breathing_active and isinstance(self.state.current_move, BreathingMove):
                    self.state.current_move = None
                    self.state.move_start_time = None
                    self._breathing_active = False
                self.state.update_activity()
                logger.debug("Processing started - thinking animation active")
            elif not desired and self._processing:
                self._processing = False
                # Amplitude will decay smoothly in _update_thinking_offsets
                self.state.update_activity()
                logger.debug("Processing ended - thinking animation decaying")

    def _manage_move_queue(self, current_time: float) -> None:
        """Advance the move queue."""
        # Check if current move is done
        if self.state.current_move is not None and self.state.move_start_time is not None:
            elapsed = current_time - self.state.move_start_time
            if elapsed >= self.state.current_move.duration:
                self.state.current_move = None
                self.state.move_start_time = None

        # Start next move if available
        if self.state.current_move is None and self.move_queue:
            self.state.current_move = self.move_queue.popleft()
            self.state.move_start_time = current_time
            self._breathing_active = isinstance(self.state.current_move, BreathingMove)
            logger.debug("Starting move with duration: %s", self.state.current_move.duration)

    def _manage_breathing(self, current_time: float) -> None:
        """Start breathing when idle."""
        if (
            self.state.current_move is None
            and not self.move_queue
            and not self._is_listening
            and not self._breathing_active
            and not self._processing
        ):
            idle_for = current_time - self.state.last_activity_time
            if idle_for >= self.idle_inactivity_delay:
                try:
                    _, current_ant = self.current_robot.get_current_joint_positions()
                    current_head = self.current_robot.get_current_head_pose()

                    breathing = BreathingMove(
                        interpolation_start_pose=current_head,
                        interpolation_start_antennas=current_ant,
                        interpolation_duration=1.0,
                    )
                    self.move_queue.append(breathing)
                    self._breathing_active = True
                    self.state.update_activity()
                    logger.debug("Started breathing after %.1fs idle", idle_for)
                except Exception as e:
                    logger.error("Failed to start breathing: %s", e)

        # Stop breathing if new moves queued
        if isinstance(self.state.current_move, BreathingMove) and self.move_queue:
            self.state.current_move = None
            self.state.move_start_time = None
            self._breathing_active = False

    def _get_primary_pose(self, current_time: float) -> FullBodyPose:
        """Get current primary pose from move or last pose."""
        if self.state.current_move is not None and self.state.move_start_time is not None:
            t = current_time - self.state.move_start_time
            head, antennas, body_yaw = self.state.current_move.evaluate(t)

            if head is None:
                head = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
            if antennas is None:
                antennas = np.array([0.0, 0.0])
            if body_yaw is None:
                body_yaw = 0.0

            pose = (head.copy(), (float(antennas[0]), float(antennas[1])), float(body_yaw))
            self.state.last_primary_pose = clone_pose(pose)
            return pose

        if self.state.last_primary_pose is not None:
            return clone_pose(self.state.last_primary_pose)

        neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
        return (neutral, (0.0, 0.0), 0.0)

    def _get_secondary_pose(self) -> FullBodyPose:
        """Get secondary offsets (speech + face tracking + thinking)."""
        offsets = [
            self.state.speech_offsets[i]
            + self.state.face_tracking_offsets[i]
            + self.state.thinking_offsets[i]
            for i in range(6)
        ]

        secondary_head = create_head_pose(
            x=offsets[0], y=offsets[1], z=offsets[2],
            roll=offsets[3], pitch=offsets[4], yaw=offsets[5],
            degrees=False, mm=False
        )
        return (secondary_head, self._thinking_antenna_offsets, 0.0)

    def _compose_pose(self, current_time: float) -> FullBodyPose:
        """Compose final pose from primary and secondary."""
        primary = self._get_primary_pose(current_time)
        secondary = self._get_secondary_pose()
        return combine_full_body(primary, secondary)

def _issue_command(self, head: NDArray, antennas: Tuple[float, float], body_yaw: float) -> None:
|
| 546 |
+
"""Send command to robot."""
|
| 547 |
+
try:
|
| 548 |
+
self.current_robot.set_target(head=head, antennas=antennas, body_yaw=body_yaw)
|
| 549 |
+
self._last_commanded_pose = (head.copy(), antennas, body_yaw)
|
| 550 |
+
except Exception as e:
|
| 551 |
+
logger.debug("set_target failed: %s", e)
|
| 552 |
+
|
| 553 |
+
def _publish_shared_state(self) -> None:
|
| 554 |
+
"""Update shared state for external queries."""
|
| 555 |
+
with self._shared_lock:
|
| 556 |
+
self._shared_last_activity = self.state.last_activity_time
|
| 557 |
+
self._shared_is_listening = self._is_listening
|
| 558 |
+
|
| 559 |
+
def start(self) -> None:
|
| 560 |
+
"""Start the control loop thread."""
|
| 561 |
+
if self._thread is not None and self._thread.is_alive():
|
| 562 |
+
logger.warning("MovementManager already running")
|
| 563 |
+
return
|
| 564 |
+
|
| 565 |
+
self._stop_event.clear()
|
| 566 |
+
self._thread = threading.Thread(target=self._run_loop, daemon=True)
|
| 567 |
+
self._thread.start()
|
| 568 |
+
logger.info("MovementManager started")
|
| 569 |
+
|
| 570 |
+
def stop(self) -> None:
|
| 571 |
+
"""Stop the control loop and reset to neutral."""
|
| 572 |
+
if self._thread is None or not self._thread.is_alive():
|
| 573 |
+
return
|
| 574 |
+
|
| 575 |
+
logger.info("Stopping MovementManager...")
|
| 576 |
+
self.clear_move_queue()
|
| 577 |
+
|
| 578 |
+
self._stop_event.set()
|
| 579 |
+
self._thread.join(timeout=2.0)
|
| 580 |
+
self._thread = None
|
| 581 |
+
|
| 582 |
+
# Reset to neutral
|
| 583 |
+
try:
|
| 584 |
+
neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
|
| 585 |
+
self.current_robot.goto_target(
|
| 586 |
+
head=neutral,
|
| 587 |
+
antennas=[0.0, 0.0],
|
| 588 |
+
duration=2.0,
|
| 589 |
+
body_yaw=0.0,
|
| 590 |
+
)
|
| 591 |
+
logger.info("Reset to neutral position")
|
| 592 |
+
except Exception as e:
|
| 593 |
+
logger.error("Failed to reset: %s", e)
|
| 594 |
+
|
| 595 |
+
def _run_loop(self) -> None:
|
| 596 |
+
"""Main control loop at 100Hz."""
|
| 597 |
+
logger.debug("Starting 100Hz control loop")
|
| 598 |
+
|
| 599 |
+
while not self._stop_event.is_set():
|
| 600 |
+
loop_start = self._now()
|
| 601 |
+
|
| 602 |
+
# Process signals
|
| 603 |
+
self._poll_signals(loop_start)
|
| 604 |
+
|
| 605 |
+
# Manage moves
|
| 606 |
+
self._manage_move_queue(loop_start)
|
| 607 |
+
self._manage_breathing(loop_start)
|
| 608 |
+
|
| 609 |
+
# Update face tracking offsets from camera worker
|
| 610 |
+
self._update_face_tracking(loop_start)
|
| 611 |
+
|
| 612 |
+
# Update thinking animation offsets
|
| 613 |
+
self._update_thinking_offsets(loop_start)
|
| 614 |
+
|
| 615 |
+
# Compose pose
|
| 616 |
+
head, antennas, body_yaw = self._compose_pose(loop_start)
|
| 617 |
+
|
| 618 |
+
# Blend antennas for listening
|
| 619 |
+
antennas = self._blend_antennas(antennas)
|
| 620 |
+
|
| 621 |
+
# Send to robot
|
| 622 |
+
self._issue_command(head, antennas, body_yaw)
|
| 623 |
+
|
| 624 |
+
# Update shared state
|
| 625 |
+
self._publish_shared_state()
|
| 626 |
+
|
| 627 |
+
# Maintain timing
|
| 628 |
+
elapsed = self._now() - loop_start
|
| 629 |
+
sleep_time = max(0.0, self.target_period - elapsed)
|
| 630 |
+
if sleep_time > 0:
|
| 631 |
+
time.sleep(sleep_time)
|
| 632 |
+
|
| 633 |
+
logger.debug("Control loop stopped")
|
| 634 |
+
|
| 635 |
+
def get_status(self) -> Dict[str, Any]:
|
| 636 |
+
"""Get current status for debugging."""
|
| 637 |
+
return {
|
| 638 |
+
"queue_size": len(self.move_queue),
|
| 639 |
+
"is_listening": self._is_listening,
|
| 640 |
+
"breathing_active": self._breathing_active,
|
| 641 |
+
"processing": self._processing,
|
| 642 |
+
"thinking_amplitude": round(self._thinking_amplitude, 3),
|
| 643 |
+
"last_commanded_pose": {
|
| 644 |
+
"head": self._last_commanded_pose[0].tolist(),
|
| 645 |
+
"antennas": self._last_commanded_pose[1],
|
| 646 |
+
"body_yaw": self._last_commanded_pose[2],
|
| 647 |
+
},
|
| 648 |
+
}
|
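For reference, the ramp-and-decay envelope that `_update_thinking_offsets` uses can be sketched as a standalone function. This is a simplified model for illustration only; in the real loop the decay step is tied to the 100 Hz control period rather than an arbitrary `dt`:

```python
def thinking_envelope(amplitude: float, processing: bool,
                      elapsed: float, dt: float) -> float:
    """Simplified thinking-animation envelope: ramps 0 -> 1 over 0.5 s
    while processing, then decays at 2.0/s once processing ends."""
    if processing:
        # Linear ramp, clamped at full amplitude
        return min(1.0, elapsed / 0.5)
    # Linear decay per tick, clamped at zero
    return max(0.0, amplitude - 2.0 * dt)

# Halfway through the 0.5 s ramp the envelope sits at 0.5
print(thinking_envelope(0.0, True, elapsed=0.25, dt=0.01))  # 0.5
# One 10 ms tick of decay removes 0.02 of amplitude
print(thinking_envelope(1.0, False, elapsed=0.0, dt=0.01))  # 0.98
```

Because the envelope multiplies every sinusoidal offset, both the head sway and the antenna scan fade in and out together instead of snapping on and off.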
src/reachy_mini_openclaw/openai_realtime.py
ADDED
@@ -0,0 +1,562 @@
"""ReachyClaw - OpenAI Realtime API handler with OpenClaw identity.

This module implements ReachyClaw's voice conversation system using the OpenAI Realtime API,
with the robot embodying the actual OpenClaw agent's personality and context.

Architecture:
    Startup: Fetch OpenClaw agent context (personality, memories, user info)
    Runtime: User speaks -> OpenAI Realtime (as OpenClaw agent) -> Robot speaks
             -> Tools for movements + OpenClaw queries for extended capabilities
             -> Conversations synced back to OpenClaw for memory continuity

The robot IS the OpenClaw agent - same personality, same memories, same context.
"""

import json
import base64
import random
import asyncio
import logging
from typing import Any, Final, Literal, Optional, Tuple
from datetime import datetime

import numpy as np
from numpy.typing import NDArray
from openai import AsyncOpenAI
from fastrtc import AdditionalOutputs, AsyncStreamHandler, wait_for_item
from scipy.signal import resample
from websockets.exceptions import ConnectionClosedError

from reachy_mini_openclaw.config import config
from reachy_mini_openclaw.prompts import get_session_voice
from reachy_mini_openclaw.tools.core_tools import (
    ToolDependencies,
    get_tool_specs,
    dispatch_tool_call,
    get_body_actions_description,
)

logger = logging.getLogger(__name__)

# OpenAI Realtime API audio format
OPENAI_SAMPLE_RATE: Final[Literal[24000]] = 24000

# Base instructions for the robot body capabilities
ROBOT_BODY_INSTRUCTIONS = """
## CRITICAL: You are a voice relay for the OpenClaw agent

You are the voice interface for an OpenClaw AI agent embodied in a Reachy Mini robot.
You MUST call `ask_openclaw` for EVERY user message to get the real response.

**Your ONLY job is:**
1. When the user says something, IMMEDIATELY call `ask_openclaw` with their full message.
2. Speak the response from `ask_openclaw` EXACTLY as returned — do not rephrase, add to, or summarize it.

**You MUST NOT:**
- Answer any question yourself — ALWAYS use `ask_openclaw` first.
- Make up information, opinions, or responses on your own.
- Summarize or modify what `ask_openclaw` returns — speak it verbatim.
- Say things like "let me check" and then answer without calling `ask_openclaw`.

**Robot Movement:**
- Do NOT call look, emotion, dance, or camera tools yourself.
- OpenClaw controls the robot body — movements are handled automatically from its response.

**Conversation Style for Voice:**
- Keep it natural — you are speaking out loud
- If ask_openclaw is slow or errors, say "I'm having trouble reaching my brain, one moment"
"""

# Fallback if OpenClaw context fetch fails
FALLBACK_IDENTITY = """You are the voice relay for an OpenClaw AI agent embodied in a Reachy Mini robot.
You MUST call ask_openclaw for every user message and speak the response verbatim.
Never answer on your own — always defer to ask_openclaw."""

# System context sent to OpenClaw so it knows about the robot body.
# Built dynamically from TOOL_SPECS so the action list stays in sync.
REACHY_BODY_CONTEXT = f"""\
User is talking to you through your Reachy Mini robot body. Keep responses concise for voice.

You can control your robot body by including action tags anywhere in your response.
The tags will be executed and stripped before your words are spoken aloud.

Available actions:
{get_body_actions_description()}

Examples:
"Sure, let me look over there. [LOOK:left] I see a bookshelf!"
"[EMOTION:happy] That's great to hear!"
"[DANCE:excited] Let's celebrate!"

Use actions naturally to make the conversation more expressive. You don't have to use them every time — only when it adds to the interaction."""

class OpenAIRealtimeHandler(AsyncStreamHandler):
    """Handler for the OpenAI Realtime API, acting as the voice relay for OpenClaw.

    This handler:
    - Runs the voice conversation as a thin relay (OpenClaw is the brain)
    - Routes every user message to OpenClaw via the mandatory ask_openclaw tool
    - Executes robot movement tools and body action tags locally for low latency
    - Syncs conversation turns back to OpenClaw for memory continuity
    """

    def __init__(
        self,
        deps: ToolDependencies,
        openclaw_bridge: Optional[Any] = None,
        gradio_mode: bool = False,
    ):
        """Initialize the handler.

        Args:
            deps: Tool dependencies for robot control
            openclaw_bridge: Bridge to the OpenClaw gateway
            gradio_mode: Whether running with the Gradio UI
        """
        super().__init__(
            expected_layout="mono",
            output_sample_rate=OPENAI_SAMPLE_RATE,
            input_sample_rate=OPENAI_SAMPLE_RATE,
        )

        self.deps = deps
        self.openclaw_bridge = openclaw_bridge
        self.gradio_mode = gradio_mode

        # OpenAI connection
        self.client: Optional[AsyncOpenAI] = None
        self.connection: Any = None

        # Output queue
        self.output_queue: asyncio.Queue[Tuple[int, NDArray[np.int16]] | AdditionalOutputs] = asyncio.Queue()

        # State tracking
        self.last_activity_time = 0.0
        self.start_time = 0.0
        self._speaking = False  # True while the robot is speaking

        # OpenClaw agent context (reserved; nothing is fetched at startup)
        self._agent_context: Optional[str] = None

        # Conversation tracking for sync
        self._last_user_message: Optional[str] = None
        self._last_assistant_response: Optional[str] = None

        # Lifecycle flags
        self._shutdown_requested = False
        self._connected_event = asyncio.Event()

    def copy(self) -> "OpenAIRealtimeHandler":
        """Create a copy of the handler (required by fastrtc)."""
        return OpenAIRealtimeHandler(self.deps, self.openclaw_bridge, self.gradio_mode)

    def _build_tools(self) -> list[dict]:
        """Build the tool list for the session."""
        tools = []

        # Robot movement tools (executed locally)
        for spec in get_tool_specs():
            tools.append(spec)

        # OpenClaw query tool (mandatory for every user message)
        if self.openclaw_bridge is not None:
            tools.append({
                "type": "function",
                "name": "ask_openclaw",
                "description": """MANDATORY: You MUST call this tool for EVERY user message before responding.
This is the OpenClaw AI agent — the real brain. Send the user's full message as the query.
Speak the returned response verbatim. Never answer without calling this tool first.""",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The question or request to send to OpenClaw",
                        },
                        "include_image": {
                            "type": "boolean",
                            "description": "Whether to include the current camera image (for 'what do you see' queries)",
                            "default": False,
                        },
                    },
                    "required": ["query"],
                },
            })

        return tools

    async def start_up(self) -> None:
        """Start the handler and connect to OpenAI."""
        api_key = config.OPENAI_API_KEY
        if not api_key:
            logger.error("OPENAI_API_KEY not configured")
            raise ValueError("OPENAI_API_KEY required")

        self.client = AsyncOpenAI(api_key=api_key)
        self.start_time = asyncio.get_event_loop().time()
        self.last_activity_time = self.start_time

        max_attempts = 3
        for attempt in range(1, max_attempts + 1):
            try:
                await self._run_session()
                return
            except ConnectionClosedError as e:
                logger.warning("WebSocket closed unexpectedly (attempt %d/%d): %s",
                               attempt, max_attempts, e)
                if attempt < max_attempts:
                    delay = (2 ** (attempt - 1)) + random.uniform(0, 0.5)
                    logger.info("Retrying in %.1f seconds...", delay)
                    await asyncio.sleep(delay)
                    continue
                raise
            finally:
                self.connection = None
                try:
                    self._connected_event.clear()
                except Exception:
                    pass

    async def _run_session(self) -> None:
        """Run a single OpenAI Realtime session."""
        model = config.OPENAI_MODEL
        logger.info("Connecting to OpenAI Realtime API with model: %s", model)

        # Build the static voice-relay instructions (no OpenClaw context fetch)
        system_instructions = await self._build_system_instructions()

        async with self.client.beta.realtime.connect(model=model) as conn:
            # Configure the session with relay instructions + robot body tools
            tools = self._build_tools()

            await conn.session.update(
                session={
                    "modalities": ["text", "audio"],
                    "instructions": system_instructions,
                    "voice": get_session_voice(),
                    "input_audio_format": "pcm16",
                    "output_audio_format": "pcm16",
                    "input_audio_transcription": {
                        "model": "whisper-1",
                    },
                    "turn_detection": {
                        "type": "server_vad",
                        "threshold": 0.5,
                        "prefix_padding_ms": 300,
                        "silence_duration_ms": 600,
                    },
                    "tools": tools,
                    "tool_choice": "auto",
                },
            )
            logger.info("OpenAI Realtime session configured with %d tools", len(tools))

            self.connection = conn
            self._connected_event.set()

            # Process events
            async for event in conn:
                await self._handle_event(event)

    async def _build_system_instructions(self) -> str:
        """Build system instructions for the voice relay.

        GPT-4o is a thin relay — it only needs instructions on how to
        call ask_openclaw and speak the result. No personality context needed.
        """
        return ROBOT_BODY_INSTRUCTIONS

    async def _handle_event(self, event: Any) -> None:
        """Handle an event from the OpenAI Realtime API."""
        event_type = event.type

        # Speech detection
        if event_type == "input_audio_buffer.speech_started":
            # User started speaking - stop any current output
            self._speaking = False
            self.deps.movement_manager.set_processing(False)
            while not self.output_queue.empty():
                try:
                    self.output_queue.get_nowait()
                except asyncio.QueueEmpty:
                    break
            if self.deps.head_wobbler is not None:
                self.deps.head_wobbler.reset()
            self.deps.movement_manager.set_listening(True)
            logger.info("User started speaking")

        if event_type == "input_audio_buffer.speech_stopped":
            self.deps.movement_manager.set_listening(False)
            logger.info("User stopped speaking")

        # Transcription (for logging, UI, and sync)
        if event_type == "conversation.item.input_audio_transcription.completed":
            transcript = event.transcript
            if transcript and transcript.strip():
                logger.info("User: %s", transcript)
                self._last_user_message = transcript  # Track for sync
                await self.output_queue.put(
                    AdditionalOutputs({"role": "user", "content": transcript})
                )

        # Response started - robot is about to speak
        if event_type == "response.created":
            self._speaking = True
            logger.debug("Response started")

        # Audio output from TTS
        if event_type == "response.audio.delta":
            # Audio arriving means we have a response - stop the thinking animation
            self.deps.movement_manager.set_processing(False)

            # Feed to the head wobbler for expressive movement
            if self.deps.head_wobbler is not None:
                self.deps.head_wobbler.feed(event.delta)

            self.last_activity_time = asyncio.get_event_loop().time()

            # Queue audio for playback
            audio_data = np.frombuffer(
                base64.b64decode(event.delta),
                dtype=np.int16,
            ).reshape(1, -1)
            await self.output_queue.put((OPENAI_SAMPLE_RATE, audio_data))

        # Response text (for logging and UI)
        if event_type == "response.audio_transcript.delta":
            # Streaming transcript of what's being said
            pass  # Could log incrementally if needed

        if event_type == "response.audio_transcript.done":
            response_text = event.transcript
            logger.info("Assistant: %s", response_text[:100])
            self._last_assistant_response = response_text  # Track for sync
            await self.output_queue.put(
                AdditionalOutputs({"role": "assistant", "content": response_text})
            )

        # Response completed - sync conversation to OpenClaw
        if event_type == "response.done":
            self._speaking = False
            self.deps.movement_manager.set_processing(False)
            if self.deps.head_wobbler is not None:
                self.deps.head_wobbler.reset()
            logger.debug("Response completed")

            # Sync conversation to OpenClaw for memory continuity
            await self._sync_to_openclaw()

        # Tool calls
        if event_type == "response.function_call_arguments.done":
            await self._handle_tool_call(event)

        # Errors
        if event_type == "error":
            err = getattr(event, "error", None)
            msg = getattr(err, "message", str(err))
            code = getattr(err, "code", "")
            logger.error("OpenAI error [%s]: %s", code, msg)

    async def _handle_tool_call(self, event: Any) -> None:
        """Handle a tool call from OpenAI."""
        tool_name = getattr(event, "name", None)
        args_json = getattr(event, "arguments", None)
        call_id = getattr(event, "call_id", None)

        if not isinstance(tool_name, str) or not isinstance(args_json, str):
            return

        logger.info("Tool call: %s(%s)", tool_name, args_json[:50])

        # Start the thinking animation while we process the tool call.
        # It will stop when the next audio delta arrives or the response completes.
        self.deps.movement_manager.set_processing(True)

        try:
            if tool_name == "ask_openclaw":
                result = await self._handle_openclaw_query(args_json)
            else:
                # Robot movement tools - dispatch locally
                result = await dispatch_tool_call(tool_name, args_json, self.deps)

            logger.debug("Tool '%s' result: %s", tool_name, str(result)[:100])
        except Exception as e:
            logger.error("Tool '%s' failed: %s", tool_name, e)
            result = {"error": str(e)}

        # Send the result back to continue the conversation
        if isinstance(call_id, str) and self.connection:
            await self.connection.conversation.item.create(
                item={
                    "type": "function_call_output",
                    "call_id": call_id,
                    "output": json.dumps(result),
                }
            )
            # Trigger response generation after the tool result
            await self.connection.response.create()

    async def _sync_to_openclaw(self) -> None:
        """Sync the last conversation turn to OpenClaw for memory continuity."""
        if not self.openclaw_bridge or not self.openclaw_bridge.is_connected:
            return

        if self._last_user_message and self._last_assistant_response:
            try:
                await self.openclaw_bridge.sync_conversation(
                    self._last_user_message,
                    self._last_assistant_response,
                )
                # Clear after a successful sync
                self._last_user_message = None
                self._last_assistant_response = None
            except Exception as e:
                logger.debug("Failed to sync conversation: %s", e)

    async def _handle_openclaw_query(self, args_json: str) -> dict:
        """Handle a query to OpenClaw."""
        if self.openclaw_bridge is None or not self.openclaw_bridge.is_connected:
            return {"error": "OpenClaw not connected"}

        try:
            args = json.loads(args_json)
            query = args.get("query", "")
            include_image = args.get("include_image", False)

            # Capture an image if requested
            image_b64 = None
            if include_image and self.deps.camera_worker:
                frame = self.deps.camera_worker.get_latest_frame()
                if frame is not None:
                    import cv2
                    _, buffer = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
                    image_b64 = base64.b64encode(buffer).decode("utf-8")
                    logger.debug("Captured camera image for OpenClaw query")

            # Query OpenClaw
            response = await self.openclaw_bridge.chat(
                query,
                image_b64=image_b64,
                system_context=REACHY_BODY_CONTEXT,
            )

            if response.error:
                return {"error": response.error}

            # Parse and execute any action commands from OpenClaw's response
            spoken_text = await self._execute_body_actions(response.content)

            return {"response": spoken_text}

        except Exception as e:
            logger.error("OpenClaw query failed: %s", e)
            return {"error": str(e)}

    async def _execute_body_actions(self, text: str) -> str:
        """Parse action tags from OpenClaw's response, execute them, and return clean text.

        Supported tags:
            [LOOK:direction]       - Move head (left/right/up/down/front)
            [EMOTION:name]         - Express emotion (happy/sad/surprised/curious/thinking/confused/excited)
            [DANCE:name]           - Perform dance (happy/excited/wave/nod/shake/bounce)
            [CAMERA]               - Capture and describe what the robot sees
            [FACE_TRACKING:on/off] - Toggle face tracking
            [STOP]                 - Stop all movements
        """
        import re

        action_pattern = re.compile(
            r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
            r'|\[(CAMERA|STOP)\]'
        )

        actions_found = []
        for match in action_pattern.finditer(text):
            if match.group(3):
                # No-arg action: [CAMERA] or [STOP]
                actions_found.append((match.group(3), None))
            else:
                # Parameterized action: [LOOK:left], etc.
                actions_found.append((match.group(1), match.group(2)))

        # Execute actions
        for action, param in actions_found:
            try:
                if action == "LOOK":
                    await dispatch_tool_call("look", json.dumps({"direction": param}), self.deps)
                elif action == "EMOTION":
                    await dispatch_tool_call("emotion", json.dumps({"emotion_name": param}), self.deps)
                elif action == "DANCE":
                    await dispatch_tool_call("dance", json.dumps({"dance_name": param}), self.deps)
                elif action == "CAMERA":
                    await dispatch_tool_call("camera", json.dumps({}), self.deps)
                elif action == "FACE_TRACKING":
                    enabled = param.lower() in ("on", "true", "yes")
                    await dispatch_tool_call("face_tracking", json.dumps({"enabled": enabled}), self.deps)
                elif action == "STOP":
                    await dispatch_tool_call("stop_moves", json.dumps({}), self.deps)
                logger.info("Executed body action: %s(%s)", action, param)
            except Exception as e:
                logger.warning("Body action %s(%s) failed: %s", action, param, e)

        # Strip action tags from the text so GPT-4o only speaks the words
        spoken_text = action_pattern.sub('', text).strip()
        # Clean up extra whitespace left by removed tags
        spoken_text = re.sub(r' +', ' ', spoken_text)

        return spoken_text

    async def receive(self, frame: Tuple[int, NDArray]) -> None:
        """Receive audio from the robot microphone."""
        if not self.connection:
            return

        input_sr, audio = frame

        # Handle stereo: normalise to a 1-D mono signal
        if audio.ndim == 2:
            if audio.shape[1] > audio.shape[0]:
                audio = audio.T
            if audio.shape[1] > 1:
                audio = audio[:, 0]

        audio = audio.flatten()

        # Convert to float for resampling
        if audio.dtype == np.int16:
            audio = audio.astype(np.float32) / 32768.0
        elif audio.dtype != np.float32:
            audio = audio.astype(np.float32)

        # Resample to the OpenAI sample rate
        if input_sr != OPENAI_SAMPLE_RATE:
            num_samples = int(len(audio) * OPENAI_SAMPLE_RATE / input_sr)
            audio = resample(audio, num_samples).astype(np.float32)

        # Convert to int16 for OpenAI
        audio_int16 = (audio * 32767).astype(np.int16)

        # Send to OpenAI
        try:
            audio_b64 = base64.b64encode(audio_int16.tobytes()).decode("utf-8")
            await self.connection.input_audio_buffer.append(audio=audio_b64)
        except Exception as e:
            logger.debug("Failed to send audio: %s", e)

    async def emit(self) -> Tuple[int, NDArray[np.int16]] | AdditionalOutputs | None:
        """Get the next output (audio or transcript)."""
        return await wait_for_item(self.output_queue)

    async def shutdown(self) -> None:
        """Shutdown the handler."""
        self._shutdown_requested = True

        if self.connection:
            try:
                await self.connection.close()
            except Exception as e:
                logger.debug("Connection close: %s", e)
            self.connection = None

        while not self.output_queue.empty():
            try:
                self.output_queue.get_nowait()
            except asyncio.QueueEmpty:
                break
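The `[ACTION:param]` tag grammar that `_execute_body_actions` implements above can be exercised on its own. The sketch below uses the same regex as the handler; the `parse_body_actions` helper is illustrative and not part of the module:

```python
import re

# Same tag grammar as _execute_body_actions above.
ACTION_PATTERN = re.compile(
    r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
    r'|\[(CAMERA|STOP)\]'
)

def parse_body_actions(text):
    """Extract (action, param) pairs and return the cleaned spoken text."""
    actions = []
    for m in ACTION_PATTERN.finditer(text):
        if m.group(3):
            actions.append((m.group(3), None))        # [CAMERA] / [STOP]
        else:
            actions.append((m.group(1), m.group(2)))  # [LOOK:left] etc.
    spoken = ACTION_PATTERN.sub('', text).strip()
    spoken = re.sub(r' +', ' ', spoken)               # collapse gaps left by tags
    return actions, spoken

actions, spoken = parse_body_actions(
    "[EMOTION:happy] Sure, let me look. [LOOK:left] I see a bookshelf! [CAMERA]"
)
print(actions)  # [('EMOTION', 'happy'), ('LOOK', 'left'), ('CAMERA', None)]
print(spoken)   # Sure, let me look. I see a bookshelf!
```

Because the tags are stripped before the text reaches GPT-4o, OpenClaw can interleave movement freely with speech without the tags ever being spoken aloud.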
src/reachy_mini_openclaw/openclaw_bridge.py
ADDED
@@ -0,0 +1,606 @@
"""ReachyClaw - Bridge to OpenClaw Gateway for AI responses.
|
| 2 |
+
|
| 3 |
+
This module provides ReachyClaw's integration with the OpenClaw gateway
|
| 4 |
+
using the WebSocket protocol (the gateway's native transport).
|
| 5 |
+
|
| 6 |
+
ReachyClaw uses OpenAI Realtime API for voice I/O (speech recognition + TTS)
|
| 7 |
+
but routes all responses through OpenClaw for intelligence.
|
| 8 |
+
"""
|
| 9 |
+
|
| 10 |
+
import json
|
| 11 |
+
import asyncio
|
| 12 |
+
import logging
|
| 13 |
+
import uuid
|
| 14 |
+
from typing import Optional, Any, AsyncIterator
|
| 15 |
+
from dataclasses import dataclass
|
| 16 |
+
|
| 17 |
+
import websockets
|
| 18 |
+
|
| 19 |
+
from reachy_mini_openclaw.config import config
|
| 20 |
+
|
| 21 |
+
logger = logging.getLogger(__name__)
|
| 22 |
+
|
| 23 |
+
# Protocol version supported by this client
|
| 24 |
+
PROTOCOL_VERSION = 3
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
@dataclass
|
| 28 |
+
class OpenClawResponse:
|
| 29 |
+
"""Response from OpenClaw gateway."""
|
| 30 |
+
content: str
|
| 31 |
+
error: Optional[str] = None
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
class OpenClawBridge:
    """Bridge to the OpenClaw Gateway using the WebSocket protocol.

    The OpenClaw gateway speaks WebSocket with a JSON frame protocol.
    This class handles the connect handshake, authentication, and
    chat operations.

    Example:
        bridge = OpenClawBridge()
        await bridge.connect()

        # Simple query
        response = await bridge.chat("Hello!")
        print(response.content)
    """

    def __init__(
        self,
        gateway_url: Optional[str] = None,
        gateway_token: Optional[str] = None,
        agent_id: Optional[str] = None,
        timeout: float = 120.0,
    ):
        """Initialize the OpenClaw bridge.

        Args:
            gateway_url: URL of the OpenClaw gateway (default: from env/config).
                Accepts http:// or ws:// schemes; http is converted to ws.
            gateway_token: Authentication token (default: from env/config)
            agent_id: OpenClaw agent ID to use (default: from env/config)
            timeout: Request timeout in seconds
        """
        import os

        raw_url = (
            gateway_url
            or os.getenv("OPENCLAW_GATEWAY_URL")
            or config.OPENCLAW_GATEWAY_URL
        )
        # Normalise to ws:// (the gateway listens on the same port for both)
        self.gateway_url = self._normalise_ws_url(raw_url)

        self.gateway_token = (
            gateway_token
            or os.getenv("OPENCLAW_TOKEN")
            or config.OPENCLAW_TOKEN
        )
        self.agent_id = (
            agent_id
            or os.getenv("OPENCLAW_AGENT_ID")
            or config.OPENCLAW_AGENT_ID
        )
        self.timeout = timeout

        # Session key – "main" shares context with WhatsApp and other channels.
        # Full key format: agent:<agent_id>:<session_key>
        self.session_key = (
            os.getenv("OPENCLAW_SESSION_KEY")
            or config.OPENCLAW_SESSION_KEY
            or "main"
        )

        # Persistent WebSocket state
        self._ws: Optional[websockets.WebSocketClientProtocol] = None
        self._connected = False
        self._conn_id: Optional[str] = None

        # Background listener task & pending request futures
        self._listener_task: Optional[asyncio.Task] = None
        self._pending: dict[str, asyncio.Future] = {}
        # Events keyed by runId -> queue of event payloads
        self._run_events: dict[str, asyncio.Queue] = {}

    # ------------------------------------------------------------------
    # URL helpers
    # ------------------------------------------------------------------

    @staticmethod
    def _normalise_ws_url(url: str) -> str:
        """Convert an http(s) URL to ws(s)."""
        if url.startswith("http://"):
            return "ws://" + url[7:]
        if url.startswith("https://"):
            return "wss://" + url[8:]
        if not url.startswith("ws://") and not url.startswith("wss://"):
            return "ws://" + url
        return url

    # ------------------------------------------------------------------
    # Connection lifecycle
    # ------------------------------------------------------------------

    async def connect(self) -> bool:
        """Connect to the OpenClaw gateway and authenticate.

        Returns:
            True if the connection succeeded, False otherwise
        """
        logger.info(
            "Connecting to OpenClaw at %s (token: %s)",
            self.gateway_url,
            "set" if self.gateway_token else "not set",
        )
        try:
            self._ws = await websockets.connect(
                self.gateway_url,
                ping_interval=20,
                ping_timeout=30,
                close_timeout=5,
            )

            # 1. Receive the challenge
            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
            challenge = json.loads(raw)
            if challenge.get("event") != "connect.challenge":
                logger.warning("Unexpected first frame: %s", challenge.get("event"))

            # 2. Send the connect request
            req_id = str(uuid.uuid4())
            connect_req = {
                "type": "req",
                "id": req_id,
                "method": "connect",
                "params": {
                    "minProtocol": PROTOCOL_VERSION,
                    "maxProtocol": PROTOCOL_VERSION,
                    "auth": {"token": self.gateway_token} if self.gateway_token else {},
                    "client": {
                        "id": "cli",
                        "version": "1.0.0",
                        "platform": "darwin",
                        "mode": "cli",
                    },
                    "role": "operator",
                    "scopes": ["chat", "operator.write", "operator.read"],
                },
            }
            await self._ws.send(json.dumps(connect_req))

            # 3. Read the hello response
            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
            hello = json.loads(raw)

            if hello.get("ok"):
                self._connected = True
                payload = hello.get("payload", {})
                server = payload.get("server", {})
                self._conn_id = server.get("connId")
                logger.info(
                    "Connected to OpenClaw gateway (server=%s, connId=%s)",
                    server.get("host", "?"),
                    self._conn_id,
                )
                # Start the background listener
                self._listener_task = asyncio.create_task(
                    self._listen_loop(), name="openclaw-ws-listener"
                )
                return True
            else:
                err = hello.get("error", {})
                logger.error(
                    "OpenClaw connect failed: %s - %s",
                    err.get("code"),
                    err.get("message"),
                )
                await self._close_ws()
                return False

        except Exception as e:
            logger.error(
                "Failed to connect to OpenClaw gateway: %s (%s)",
                e,
                type(e).__name__,
            )
            await self._close_ws()
            return False

    async def disconnect(self) -> None:
        """Disconnect from the gateway."""
        self._connected = False
        if self._listener_task and not self._listener_task.done():
            self._listener_task.cancel()
            try:
                await self._listener_task
            except (asyncio.CancelledError, Exception):
                pass
        await self._close_ws()

    async def _close_ws(self) -> None:
        self._connected = False
        if self._ws:
            try:
                await self._ws.close()
            except Exception:
                pass
            self._ws = None

    # ------------------------------------------------------------------
    # Background listener
    # ------------------------------------------------------------------

async def _listen_loop(self) -> None:
|
| 236 |
+
"""Background task that reads all frames from the WebSocket."""
|
| 237 |
+
try:
|
| 238 |
+
async for raw in self._ws:
|
| 239 |
+
try:
|
| 240 |
+
msg = json.loads(raw)
|
| 241 |
+
except json.JSONDecodeError:
|
| 242 |
+
continue
|
| 243 |
+
await self._dispatch(msg)
|
| 244 |
+
except websockets.ConnectionClosed as e:
|
| 245 |
+
logger.warning("OpenClaw WebSocket closed: %s", e)
|
| 246 |
+
except asyncio.CancelledError:
|
| 247 |
+
return
|
| 248 |
+
except Exception as e:
|
| 249 |
+
logger.error("OpenClaw listener error: %s", e)
|
| 250 |
+
finally:
|
| 251 |
+
self._connected = False
|
| 252 |
+
|
| 253 |
+
async def _dispatch(self, msg: dict) -> None:
|
| 254 |
+
"""Route an incoming frame to the right handler."""
|
| 255 |
+
msg_type = msg.get("type")
|
| 256 |
+
|
| 257 |
+
if msg_type == "res":
|
| 258 |
+
# Response to a request we sent
|
| 259 |
+
req_id = msg.get("id")
|
| 260 |
+
fut = self._pending.pop(req_id, None)
|
| 261 |
+
if fut and not fut.done():
|
| 262 |
+
fut.set_result(msg)
|
| 263 |
+
|
| 264 |
+
elif msg_type == "event":
|
| 265 |
+
event_name = msg.get("event", "")
|
| 266 |
+
payload = msg.get("payload", {})
|
| 267 |
+
|
| 268 |
+
# Route agent / chat events to the correct run queue
|
| 269 |
+
run_id = payload.get("runId")
|
| 270 |
+
if run_id and run_id in self._run_events:
|
| 271 |
+
await self._run_events[run_id].put(msg)
|
| 272 |
+
|
| 273 |
+
# Ignore noisy events silently
|
| 274 |
+
if event_name in ("health", "tick"):
|
| 275 |
+
return
|
| 276 |
+
|
| 277 |
+
logger.debug("Event: %s (runId=%s)", event_name, run_id)
|
| 278 |
+
|
| 279 |
+
# ------------------------------------------------------------------
|
| 280 |
+
# Request helpers
|
| 281 |
+
# ------------------------------------------------------------------
|
| 282 |
+
|
| 283 |
+
async def _send_request(
|
| 284 |
+
self, method: str, params: dict, timeout: Optional[float] = None
|
| 285 |
+
) -> dict:
|
| 286 |
+
"""Send a request and wait for the response.
|
| 287 |
+
|
| 288 |
+
Args:
|
| 289 |
+
method: The RPC method name
|
| 290 |
+
params: The params dict
|
| 291 |
+
timeout: Override timeout (defaults to self.timeout)
|
| 292 |
+
|
| 293 |
+
Returns:
|
| 294 |
+
The full response message dict
|
| 295 |
+
"""
|
| 296 |
+
if not self._ws or not self._connected:
|
| 297 |
+
return {"ok": False, "error": {"code": "NOT_CONNECTED", "message": "Not connected"}}
|
| 298 |
+
|
| 299 |
+
req_id = str(uuid.uuid4())
|
| 300 |
+
req = {"type": "req", "id": req_id, "method": method, "params": params}
|
| 301 |
+
|
| 302 |
+
fut: asyncio.Future = asyncio.get_event_loop().create_future()
|
| 303 |
+
self._pending[req_id] = fut
|
| 304 |
+
|
| 305 |
+
try:
|
| 306 |
+
await self._ws.send(json.dumps(req))
|
| 307 |
+
result = await asyncio.wait_for(fut, timeout=timeout or self.timeout)
|
| 308 |
+
return result
|
| 309 |
+
except asyncio.TimeoutError:
|
| 310 |
+
self._pending.pop(req_id, None)
|
| 311 |
+
return {"ok": False, "error": {"code": "TIMEOUT", "message": "Request timed out"}}
|
| 312 |
+
except Exception as e:
|
| 313 |
+
self._pending.pop(req_id, None)
|
| 314 |
+
return {"ok": False, "error": {"code": "ERROR", "message": str(e)}}
|
| 315 |
+
|
| 316 |
+
def _full_session_key(self) -> str:
|
| 317 |
+
"""Build the full session key: agent:<agentId>:<sessionKey>."""
|
| 318 |
+
return f"agent:{self.agent_id}:{self.session_key}"
|
| 319 |
+
|
| 320 |
+
# ------------------------------------------------------------------
|
| 321 |
+
# Chat API
|
| 322 |
+
# ------------------------------------------------------------------
|
| 323 |
+
|
| 324 |
+
async def chat(
|
| 325 |
+
self,
|
| 326 |
+
message: str,
|
| 327 |
+
image_b64: Optional[str] = None,
|
| 328 |
+
system_context: Optional[str] = None,
|
| 329 |
+
) -> OpenClawResponse:
|
| 330 |
+
"""Send a message to OpenClaw and get a response.
|
| 331 |
+
|
| 332 |
+
OpenClaw maintains conversation memory on its end, so it will be aware
|
| 333 |
+
of conversations from other channels (WhatsApp, web, etc.). We only send
|
| 334 |
+
the current message and let OpenClaw handle the context.
|
| 335 |
+
|
| 336 |
+
Args:
|
| 337 |
+
message: The user's message (transcribed speech)
|
| 338 |
+
image_b64: Optional base64-encoded image from robot camera (not yet
|
| 339 |
+
supported over WebSocket chat.send – reserved for future)
|
| 340 |
+
system_context: Optional additional system context (prepended to message)
|
| 341 |
+
|
| 342 |
+
Returns:
|
| 343 |
+
OpenClawResponse with the AI's response
|
| 344 |
+
"""
|
| 345 |
+
if not self._connected:
|
| 346 |
+
return OpenClawResponse(content="", error="Not connected to OpenClaw")
|
| 347 |
+
|
| 348 |
+
# Prefix system context if provided
|
| 349 |
+
final_message = message
|
| 350 |
+
if system_context:
|
| 351 |
+
final_message = f"[System: {system_context}]\n\n{message}"
|
| 352 |
+
|
| 353 |
+
# If image provided, mention it (WebSocket protocol uses string messages;
|
| 354 |
+
# image passing would require a separate mechanism)
|
| 355 |
+
if image_b64:
|
| 356 |
+
final_message = f"[Image attached]\n{final_message}"
|
| 357 |
+
|
| 358 |
+
idempotency_key = str(uuid.uuid4())
|
| 359 |
+
session_key = self._full_session_key()
|
| 360 |
+
|
| 361 |
+
# Create a queue to collect events for this run
|
| 362 |
+
# We'll get the runId from the response
|
| 363 |
+
params = {
|
| 364 |
+
"idempotencyKey": idempotency_key,
|
| 365 |
+
"sessionKey": session_key,
|
| 366 |
+
"message": final_message,
|
| 367 |
+
}
|
| 368 |
+
|
| 369 |
+
try:
|
| 370 |
+
# Send the request
|
| 371 |
+
resp = await self._send_request("chat.send", params, timeout=30)
|
| 372 |
+
|
| 373 |
+
if not resp.get("ok"):
|
| 374 |
+
err = resp.get("error", {})
|
| 375 |
+
error_msg = f"{err.get('code', 'UNKNOWN')}: {err.get('message', 'Unknown error')}"
|
| 376 |
+
logger.error("chat.send failed: %s", error_msg)
|
| 377 |
+
return OpenClawResponse(content="", error=error_msg)
|
| 378 |
+
|
| 379 |
+
run_id = resp.get("payload", {}).get("runId")
|
| 380 |
+
if not run_id:
|
| 381 |
+
return OpenClawResponse(content="", error="No runId in response")
|
| 382 |
+
|
| 383 |
+
# Register a queue to receive events for this run
|
| 384 |
+
event_queue: asyncio.Queue = asyncio.Queue()
|
| 385 |
+
self._run_events[run_id] = event_queue
|
| 386 |
+
|
| 387 |
+
try:
|
| 388 |
+
# Collect the streamed response
|
| 389 |
+
full_text = ""
|
| 390 |
+
while True:
|
| 391 |
+
try:
|
| 392 |
+
event = await asyncio.wait_for(
|
| 393 |
+
event_queue.get(), timeout=self.timeout
|
| 394 |
+
)
|
| 395 |
+
payload = event.get("payload", {})
|
| 396 |
+
event_name = event.get("event", "")
|
| 397 |
+
|
| 398 |
+
if event_name == "agent":
|
| 399 |
+
stream = payload.get("stream")
|
| 400 |
+
data = payload.get("data", {})
|
| 401 |
+
|
| 402 |
+
if stream == "assistant":
|
| 403 |
+
# Accumulate the full text
|
| 404 |
+
full_text = data.get("text", full_text)
|
| 405 |
+
|
| 406 |
+
elif stream == "lifecycle" and data.get("phase") == "end":
|
| 407 |
+
# Run completed
|
| 408 |
+
break
|
| 409 |
+
|
| 410 |
+
elif event_name == "chat":
|
| 411 |
+
state = payload.get("state")
|
| 412 |
+
if state == "final":
|
| 413 |
+
# Extract final text
|
| 414 |
+
msg_payload = payload.get("message", {})
|
| 415 |
+
content_parts = msg_payload.get("content", [])
|
| 416 |
+
if isinstance(content_parts, list):
|
| 417 |
+
for part in content_parts:
|
| 418 |
+
if isinstance(part, dict) and part.get("type") == "text":
|
| 419 |
+
full_text = part.get("text", full_text)
|
| 420 |
+
elif isinstance(content_parts, str):
|
| 421 |
+
full_text = content_parts
|
| 422 |
+
break
|
| 423 |
+
|
| 424 |
+
except asyncio.TimeoutError:
|
| 425 |
+
logger.warning("Timeout waiting for chat response (runId=%s)", run_id)
|
| 426 |
+
if full_text:
|
| 427 |
+
break
|
| 428 |
+
return OpenClawResponse(content="", error="Response timeout")
|
| 429 |
+
|
| 430 |
+
return OpenClawResponse(content=full_text)
|
| 431 |
+
|
| 432 |
+
finally:
|
| 433 |
+
self._run_events.pop(run_id, None)
|
| 434 |
+
|
| 435 |
+
except Exception as e:
|
| 436 |
+
logger.error("OpenClaw chat error: %s", e)
|
| 437 |
+
return OpenClawResponse(content="", error=str(e))
|
| 438 |
+
|
| 439 |
+
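The "final" chat event handling in `chat()` accepts two content shapes: a list of typed parts or a bare string. A standalone restatement of that extraction logic, using the same event shape this file handles (the sample payload is illustrative):

```python
def extract_final_text(payload: dict) -> str:
    """Reduce a final chat-event payload to plain text.

    Mirrors the branch in chat(): message.content may be a list of
    typed parts (keep the last "text" part) or a bare string.
    """
    content = payload.get("message", {}).get("content", [])
    if isinstance(content, str):
        return content
    text = ""
    for part in content:
        if isinstance(part, dict) and part.get("type") == "text":
            text = part.get("text", text)
    return text


print(extract_final_text(
    {"message": {"content": [{"type": "text", "text": "Hello from OpenClaw"}]}}
))
```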
    async def stream_chat(
        self,
        message: str,
        image_b64: Optional[str] = None,
    ) -> AsyncIterator[str]:
        """Stream a response from OpenClaw.

        Args:
            message: The user's message
            image_b64: Optional base64-encoded image

        Yields:
            String chunks of the response as they arrive
        """
        if not self._connected:
            yield "[Error: Not connected to OpenClaw]"
            return

        final_message = message
        if image_b64:
            final_message = f"[Image attached]\n{message}"

        params = {
            "idempotencyKey": str(uuid.uuid4()),
            "sessionKey": self._full_session_key(),
            "message": final_message,
        }

        try:
            resp = await self._send_request("chat.send", params, timeout=30)

            if not resp.get("ok"):
                err = resp.get("error", {})
                yield f"[Error: {err.get('message', 'Unknown error')}]"
                return

            run_id = resp.get("payload", {}).get("runId")
            if not run_id:
                yield "[Error: No runId]"
                return

            event_queue: asyncio.Queue = asyncio.Queue()
            self._run_events[run_id] = event_queue

            try:
                prev_text = ""
                while True:
                    try:
                        event = await asyncio.wait_for(
                            event_queue.get(), timeout=self.timeout
                        )
                        payload = event.get("payload", {})
                        event_name = event.get("event", "")

                        if event_name == "agent":
                            stream = payload.get("stream")
                            data = payload.get("data", {})

                            if stream == "assistant":
                                delta = data.get("delta", "")
                                if delta:
                                    yield delta

                            elif stream == "lifecycle" and data.get("phase") == "end":
                                break

                        elif event_name == "chat" and payload.get("state") == "final":
                            break

                    except asyncio.TimeoutError:
                        yield "[Error: timeout]"
                        break
            finally:
                self._run_events.pop(run_id, None)

        except Exception as e:
            logger.error("OpenClaw streaming error: %s", e)
            yield f"[Error: {e}]"

    @property
    def is_connected(self) -> bool:
        """Check if bridge is connected to gateway."""
        return self._connected

    async def get_agent_context(self) -> Optional[str]:
        """Fetch the agent's current context, personality, and memory summary.

        This asks OpenClaw to provide a summary of:
        - The agent's personality and identity
        - Recent conversation context
        - Important memories about the user
        - Current state

        Returns:
            A context string to use as system instructions, or None if failed
        """
        try:
            response = await self.chat(
                message="Provide your current context summary for the robot body.",
                system_context=(
                    "You are being asked to provide your current context for your robot body. "
                    "Output a comprehensive context summary that another AI can use to embody you. Include: "
                    "1. YOUR IDENTITY: Who you are, your name, your personality traits, how you speak. "
                    "2. USER CONTEXT: What you know about the user (name, preferences, relationship). "
                    "3. RECENT CONTEXT: Summary of recent conversations or important ongoing topics. "
                    "4. MEMORIES: Key things you remember that are relevant to interactions. "
                    "5. CURRENT STATE: Any relevant time/date awareness, ongoing tasks. "
                    "Be specific and personal. This context will be used by your robot body to speak and act AS YOU. "
                    "Output ONLY the context summary, no preamble."
                ),
            )

            if response.error:
                logger.warning("Failed to get agent context: %s", response.error)
                return None

            if response.content:
                logger.info(
                    "Retrieved agent context from OpenClaw (%d chars)",
                    len(response.content),
                )
                return response.content

            logger.warning("No context returned from OpenClaw")
            return None

        except Exception as e:
            logger.error("Failed to get agent context: %s", e)
            return None

    async def sync_conversation(
        self, user_message: str, assistant_response: str
    ) -> None:
        """Sync a conversation turn back to OpenClaw for memory continuity.

        Args:
            user_message: What the user said
            assistant_response: What the robot/AI responded
        """
        try:
            await self.chat(
                message=(
                    f"[ROBOT BODY SYNC] The following happened through the Reachy Mini robot:\n"
                    f"User said: {user_message}\n"
                    f"You responded: {assistant_response}\n"
                    f"Remember this as part of your ongoing conversation."
                ),
                system_context=(
                    "[ROBOT BODY SYNC] The following conversation happened through your "
                    "Reachy Mini robot body. Remember it as part of your ongoing conversation "
                    "with the user."
                ),
            )
            logger.debug("Synced conversation to OpenClaw")
        except Exception as e:
            logger.debug("Failed to sync conversation: %s", e)


# Global bridge instance (lazy initialization)
_bridge: Optional[OpenClawBridge] = None


def get_bridge() -> OpenClawBridge:
    """Get the global OpenClaw bridge instance."""
    global _bridge
    if _bridge is None:
        _bridge = OpenClawBridge()
    return _bridge
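The bridge correlates responses to requests the way `_send_request` and `_dispatch` do: each outgoing request gets a uuid, and a Future is parked in `_pending` until a frame with `type == "res"` and the matching id arrives. A minimal standalone sketch of that pattern (the in-memory "server echo" here stands in for the real WebSocket, and the frame fields are illustrative only):

```python
import asyncio
import uuid


class MiniRpc:
    """Tiny sketch of uuid-keyed request/response correlation."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def send_request(self, method: str) -> dict:
        req_id = str(uuid.uuid4())
        loop = asyncio.get_running_loop()
        fut: asyncio.Future = loop.create_future()
        self._pending[req_id] = fut
        # In the real bridge this frame goes over the WebSocket; here we
        # fake the server echoing a response back on the next loop tick.
        loop.call_soon(
            self.on_frame,
            {"type": "res", "id": req_id, "ok": True, "method": method},
        )
        try:
            return await asyncio.wait_for(fut, timeout=1.0)
        finally:
            self._pending.pop(req_id, None)

    def on_frame(self, msg: dict) -> None:
        # The listener side: resolve whichever request this frame answers.
        fut = self._pending.get(msg.get("id"))
        if fut and not fut.done():
            fut.set_result(msg)


async def main() -> dict:
    return await MiniRpc().send_request("chat.send")


result = asyncio.run(main())
print(result)
```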
src/reachy_mini_openclaw/prompts.py
ADDED
@@ -0,0 +1,98 @@
"""Prompt management for the robot assistant.

Handles loading and customizing system prompts for the OpenAI Realtime session.
"""

import logging
from pathlib import Path
from typing import Optional

from reachy_mini_openclaw.config import config

logger = logging.getLogger(__name__)

# Default prompts directory
PROMPTS_DIR = Path(__file__).parent / "prompts"


def get_session_instructions() -> str:
    """Get the system instructions for the OpenAI Realtime session.

    Loads from custom profile if configured, otherwise uses default.

    Returns:
        System instructions string
    """
    # Check for custom profile
    custom_profile = config.CUSTOM_PROFILE
    if custom_profile:
        custom_path = PROMPTS_DIR / f"{custom_profile}.txt"
        if custom_path.exists():
            try:
                instructions = custom_path.read_text(encoding="utf-8")
                logger.info("Loaded custom profile: %s", custom_profile)
                return instructions
            except Exception as e:
                logger.warning("Failed to load custom profile %s: %s", custom_profile, e)

    # Load default
    default_path = PROMPTS_DIR / "default.txt"
    if default_path.exists():
        try:
            return default_path.read_text(encoding="utf-8")
        except Exception as e:
            logger.warning("Failed to load default prompt: %s", e)

    # Fallback inline prompt
    return """You are a friendly AI assistant with a robot body. You can see, hear, and move expressively.
Be conversational and use your movement capabilities to be engaging.
Use the camera tool when asked about your surroundings.
Express emotions through movement to enhance communication."""


def get_session_voice() -> str:
    """Get the voice to use for the OpenAI Realtime session.

    Returns:
        Voice name string
    """
    return config.OPENAI_VOICE


def get_available_profiles() -> list[str]:
    """Get list of available prompt profiles.

    Returns:
        List of profile names (without .txt extension)
    """
    profiles = []
    if PROMPTS_DIR.exists():
        for path in PROMPTS_DIR.glob("*.txt"):
            profiles.append(path.stem)
    return sorted(profiles)


def save_custom_profile(name: str, instructions: str) -> bool:
    """Save a custom prompt profile.

    Args:
        name: Profile name (alphanumeric and underscores only)
        instructions: The prompt instructions

    Returns:
        True if saved successfully
    """
    # Validate name
    if not name or not name.replace("_", "").isalnum():
        logger.error("Invalid profile name: %s", name)
        return False

    try:
        PROMPTS_DIR.mkdir(parents=True, exist_ok=True)
        profile_path = PROMPTS_DIR / f"{name}.txt"
        profile_path.write_text(instructions, encoding="utf-8")
        logger.info("Saved custom profile: %s", name)
        return True
    except Exception as e:
        logger.error("Failed to save profile %s: %s", name, e)
        return False
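The name check in `save_custom_profile` accepts only alphanumeric-plus-underscore names, which keeps profile files confined to `PROMPTS_DIR` (no dots, slashes, spaces, or path traversal). The rule can be exercised standalone:

```python
def is_valid_profile_name(name: str) -> bool:
    # Mirrors the validation in save_custom_profile: non-empty, and
    # alphanumeric once underscores are stripped, so inputs like
    # "../escape" or "my profile" are rejected.
    return bool(name) and name.replace("_", "").isalnum()


print(is_valid_profile_name("pirate_mode"))  # accepted
print(is_valid_profile_name("../escape"))    # rejected
```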
src/reachy_mini_openclaw/prompts/default.txt
ADDED
@@ -0,0 +1,15 @@
You are an OpenClaw AI agent embodied in a physical Reachy Mini robot through ReachyClaw. You have a robot body that lets you see through a camera, hear through a microphone, speak through a speaker, and move expressively.

## Your Robot Capabilities
1. **Vision**: You can see through your camera. Use the camera tool to look at things when asked.
2. **Movement**: You can look in different directions (left, right, up, down, front) to show attention.
3. **Emotions**: Express emotions through movement (happy, sad, surprised, curious, thinking, confused, excited).
4. **Dances**: Perform dances to celebrate or entertain.
5. **Face Tracking**: Automatically look at people's faces when enabled.

## Guidelines
- Be conversational and natural — keep responses concise for voice
- Use your body expressively — look at things you discuss, show emotions
- When asked to see something, use your camera
- Keep responses short for natural conversation flow
- Your movements complement your speech — be expressive!
src/reachy_mini_openclaw/tools/__init__.py
ADDED
@@ -0,0 +1,17 @@
"""Tool definitions for Reachy Mini OpenClaw.

These tools are exposed to the OpenAI Realtime API and allow the assistant
to control the robot and interact with the environment.
"""

from reachy_mini_openclaw.tools.core_tools import (
    ToolDependencies,
    get_tool_specs,
    dispatch_tool_call,
)

__all__ = [
    "ToolDependencies",
    "get_tool_specs",
    "dispatch_tool_call",
]
src/reachy_mini_openclaw/tools/core_tools.py
ADDED
@@ -0,0 +1,421 @@
"""Core tool definitions for the ReachyClaw robot.

These tools allow the OpenClaw agent (in a robot body) to control
robot movements and capture images.

Tool Categories:
1. Movement Tools - Control head position, play emotions/dances
2. Vision Tools - Capture and analyze camera images
"""

import json
import logging
import base64
from dataclasses import dataclass
from typing import Any, Optional, TYPE_CHECKING

import numpy as np

if TYPE_CHECKING:
    from reachy_mini_openclaw.moves import MovementManager, HeadLookMove
    from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
    from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge

logger = logging.getLogger(__name__)


@dataclass
class ToolDependencies:
    """Dependencies required by tools.

    This dataclass holds references to robot systems that tools need
    to interact with.
    """

    movement_manager: "MovementManager"
    head_wobbler: "HeadWobbler"
    robot: Any  # ReachyMini instance
    camera_worker: Optional[Any] = None
    openclaw_bridge: Optional["OpenClawBridge"] = None
    vision_manager: Optional[Any] = None  # Local vision processor (SmolVLM2)


# Tool specifications in OpenAI format
TOOL_SPECS = [
    {
        "type": "function",
        "name": "look",
        "description": "Move the robot's head to look in a specific direction. Use this to direct attention or emphasize a point.",
        "parameters": {
            "type": "object",
            "properties": {
                "direction": {
                    "type": "string",
                    "enum": ["left", "right", "up", "down", "front"],
                    "description": "The direction to look. 'front' returns to neutral position."
                }
            },
            "required": ["direction"]
        }
    },
    {
        "type": "function",
        "name": "camera",
        "description": "Capture an image from the robot's camera to see what's in front of you. Use this when asked about your surroundings or to identify objects/people.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "type": "function",
        "name": "face_tracking",
        "description": "Enable or disable face tracking. When enabled, the robot will automatically look at detected faces.",
        "parameters": {
            "type": "object",
            "properties": {
                "enabled": {
                    "type": "boolean",
                    "description": "True to enable face tracking, False to disable"
                }
            },
            "required": ["enabled"]
        }
    },
    {
        "type": "function",
        "name": "dance",
        "description": "Perform a dance animation. Use this to express joy, celebrate, or entertain.",
        "parameters": {
            "type": "object",
            "properties": {
                "dance_name": {
                    "type": "string",
                    "enum": ["happy", "excited", "wave", "nod", "shake", "bounce"],
                    "description": "The dance to perform"
                }
            },
            "required": ["dance_name"]
        }
    },
    {
        "type": "function",
        "name": "emotion",
        "description": "Express an emotion through movement. Use this to show reactions and feelings.",
        "parameters": {
            "type": "object",
            "properties": {
                "emotion_name": {
                    "type": "string",
                    "enum": ["happy", "sad", "surprised", "curious", "thinking", "confused", "excited"],
                    "description": "The emotion to express"
                }
            },
            "required": ["emotion_name"]
        }
    },
    {
        "type": "function",
        "name": "stop_moves",
        "description": "Stop all current movements and clear the movement queue.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "type": "function",
        "name": "idle",
        "description": "Do nothing and remain idle. Use this when you want to stay still.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
]


def get_tool_specs() -> list[dict]:
    """Get the list of tool specifications for OpenAI.

    Returns:
        List of tool specification dictionaries
    """
    return TOOL_SPECS


# Mapping from tool names to action tag names used by OpenClaw
_TOOL_TO_TAG = {
    "look": ("LOOK", "direction"),
    "emotion": ("EMOTION", "emotion_name"),
    "dance": ("DANCE", "dance_name"),
    "camera": ("CAMERA", None),
    "face_tracking": ("FACE_TRACKING", None),  # special: on/off
    "stop_moves": ("STOP", None),
}


def get_body_actions_description() -> str:
    """Build a description of available robot body actions from TOOL_SPECS.

    Returns a string listing all action tags and their valid values,
    derived directly from TOOL_SPECS so it stays in sync automatically.
    """
    specs_by_name = {s["name"]: s for s in TOOL_SPECS}
    lines = []

    for tool_name, (tag, param_key) in _TOOL_TO_TAG.items():
        spec = specs_by_name.get(tool_name)
        if spec is None:
            continue

        props = spec["parameters"].get("properties", {})

        if param_key and param_key in props:
            # Enum-based param: list all values
            values = props[param_key].get("enum", [])
            tags = " ".join(f"[{tag}:{v}]" for v in values)
            lines.append(f" {tags}")
        elif tool_name == "face_tracking":
            lines.append(f" [{tag}:on] [{tag}:off]")
        else:
            # No-param action
            desc = spec.get("description", "")
            lines.append(f" [{tag}] — {desc}")

    return "\n".join(lines)

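`get_body_actions_description` turns each enum parameter in TOOL_SPECS into one `[TAG:value]` entry per value, so the action list OpenClaw sees always matches the tool schema. A standalone sketch of that derivation for the `look` tool (the spec fragment is copied from TOOL_SPECS above):

```python
# Derive inline action tags from an OpenAI-style tool spec: each enum
# value becomes one [TAG:value] entry, keeping prompt text and schema
# in sync automatically.
spec = {
    "name": "look",
    "parameters": {
        "type": "object",
        "properties": {
            "direction": {
                "type": "string",
                "enum": ["left", "right", "up", "down", "front"],
            }
        },
    },
}

tag, param_key = "LOOK", "direction"
values = spec["parameters"]["properties"][param_key]["enum"]
tags = " ".join(f"[{tag}:{v}]" for v in values)
print(tags)  # → [LOOK:left] [LOOK:right] [LOOK:up] [LOOK:down] [LOOK:front]
```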
async def dispatch_tool_call(
    tool_name: str,
    arguments_json: str,
    deps: ToolDependencies,
) -> dict[str, Any]:
    """Dispatch a tool call to the appropriate handler.

    Args:
        tool_name: Name of the tool to execute
        arguments_json: JSON string of tool arguments
        deps: Tool dependencies

    Returns:
        Dictionary with tool result
    """
    try:
        args = json.loads(arguments_json) if arguments_json else {}
    except json.JSONDecodeError:
        return {"error": f"Invalid JSON arguments: {arguments_json}"}

    handlers = {
        "look": _handle_look,
        "camera": _handle_camera,
        "face_tracking": _handle_face_tracking,
        "dance": _handle_dance,
        "emotion": _handle_emotion,
        "stop_moves": _handle_stop_moves,
        "idle": _handle_idle,
    }

    handler = handlers.get(tool_name)
    if handler is None:
        return {"error": f"Unknown tool: {tool_name}"}

    try:
        return await handler(args, deps)
    except Exception as e:
        logger.error("Tool '%s' failed: %s", tool_name, e, exc_info=True)
        return {"error": str(e)}

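The dispatch pattern above — parse JSON defensively, look the handler up in a dict, and convert every failure into an error dict so the caller always receives a result envelope rather than an exception — can be exercised standalone. The `_echo` handler below is a stand-in, not one of the real tool handlers:

```python
import asyncio
import json


async def _echo(args, deps):
    # Stand-in handler: real handlers drive the robot via deps.
    return {"status": "success", "echo": args.get("text", "")}


HANDLERS = {"echo": _echo}


async def dispatch(tool_name: str, arguments_json: str, deps=None) -> dict:
    """Route a tool call; all failure modes come back as {'error': ...}."""
    try:
        args = json.loads(arguments_json) if arguments_json else {}
    except json.JSONDecodeError:
        return {"error": f"Invalid JSON arguments: {arguments_json}"}

    handler = HANDLERS.get(tool_name)
    if handler is None:
        return {"error": f"Unknown tool: {tool_name}"}

    try:
        return await handler(args, deps)
    except Exception as e:  # errors become results, never propagate
        return {"error": str(e)}
```

Keeping the error envelope uniform means the voice-relay side can always forward a JSON result back to the model, even when a tool misbehaves.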
async def _handle_look(args: dict, deps: ToolDependencies) -> dict:
    """Handle the look tool."""
    from reachy_mini_openclaw.moves import HeadLookMove

    direction = args.get("direction", "front")

    try:
        # Get current pose for smooth transition
        _, current_ant = deps.robot.get_current_joint_positions()
        current_head = deps.robot.get_current_head_pose()

        move = HeadLookMove(
            direction=direction,
            start_pose=current_head,
            start_antennas=tuple(current_ant),
            duration=1.0,
        )
        deps.movement_manager.queue_move(move)

        return {"status": "success", "direction": direction}
    except Exception as e:
        return {"error": str(e)}

async def _handle_camera(args: dict, deps: ToolDependencies) -> dict:
    """Handle the camera tool - capture image and get description.

    Uses local vision (SmolVLM2) if available, otherwise falls back to OpenClaw.
    """
    logger.info("Camera tool called, camera_worker=%s, vision_manager=%s",
                deps.camera_worker is not None, deps.vision_manager is not None)

    if deps.camera_worker is None:
        logger.warning("Camera worker is None")
        return {"error": "Camera not available"}

    try:
        frame = deps.camera_worker.get_latest_frame()
        logger.info("Got frame from camera_worker: %s", frame is not None)

        if frame is None:
            # Try getting frame directly from robot as fallback
            logger.info("Trying direct robot camera access...")
            if deps.robot is not None:
                try:
                    frame = deps.robot.media.get_frame()
                    logger.info("Direct frame capture: %s", frame is not None)
                except Exception as e:
                    logger.error("Direct frame capture failed: %s", e)

        if frame is None:
            return {"error": "No frame available from camera"}

        logger.info("Got frame, shape=%s", frame.shape)

        # Option 1: Use local vision processor (SmolVLM2) if available
        if deps.vision_manager is not None:
            logger.info("Using local vision processor (SmolVLM2)...")
            description = deps.vision_manager.process_now(
                "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences)."
            )
            if description and not description.startswith(("Vision", "Failed", "Error", "GPU", "No camera")):
                logger.info("Local vision response: %s", description[:100])
                return {
                    "status": "success",
                    "description": description,
                    "source": "local_vision",
                }
            else:
                logger.warning("Local vision failed: %s", description)

        # Option 2: Fall back to OpenClaw for vision analysis
        if deps.openclaw_bridge is not None and deps.openclaw_bridge.is_connected:
            logger.info("Using OpenClaw for vision analysis...")
            import cv2
            _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
            b64_image = base64.b64encode(buffer).decode('utf-8')

            response = await deps.openclaw_bridge.chat(
                "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences).",
                image_b64=b64_image,
                system_context="You are looking through your robot camera. Describe what you see naturally, as if you're the one looking.",
            )
            if response.content and not response.error:
                logger.info("OpenClaw vision response: %s", response.content[:100])
                return {
                    "status": "success",
                    "description": response.content,
                    "source": "openclaw",
                }
            else:
                logger.warning("OpenClaw vision failed: %s", response.error)

        # Fallback if neither is available
        return {
            "status": "partial",
            "description": "I captured an image but couldn't analyze it. No vision processing available.",
        }
    except Exception as e:
        logger.error("Camera tool error: %s", e, exc_info=True)
        return {"error": str(e)}

async def _handle_face_tracking(args: dict, deps: ToolDependencies) -> dict:
    """Handle face tracking toggle."""
    enabled = args.get("enabled", False)

    if deps.camera_worker is None:
        return {"error": "Camera not available for face tracking"}

    try:
        # Check if head tracker is available
        if deps.camera_worker.head_tracker is None:
            return {"error": "Face tracking not available - no head tracker initialized"}

        deps.camera_worker.set_head_tracking_enabled(enabled)
        return {"status": "success", "face_tracking": enabled}
    except Exception as e:
        return {"error": str(e)}

async def _handle_dance(args: dict, deps: ToolDependencies) -> dict:
    """Handle dance tool."""
    dance_name = args.get("dance_name", "happy")

    try:
        # Try to use dance library if available
        from reachy_mini_dances_library import dances

        if hasattr(dances, dance_name):
            dance_class = getattr(dances, dance_name)
            dance_move = dance_class()
            deps.movement_manager.queue_move(dance_move)
            return {"status": "success", "dance": dance_name}
        else:
            # Fallback to simple head movement
            return await _handle_emotion({"emotion_name": dance_name}, deps)
    except ImportError:
        # No dance library, use emotion as fallback
        return await _handle_emotion({"emotion_name": dance_name}, deps)
    except Exception as e:
        return {"error": str(e)}

async def _handle_emotion(args: dict, deps: ToolDependencies) -> dict:
    """Handle emotion expression."""
    from reachy_mini_openclaw.moves import HeadLookMove

    emotion_name = args.get("emotion_name", "happy")

    # Map emotions to simple head movements
    emotion_sequences = {
        "happy": ["up", "front"],
        "sad": ["down"],
        "surprised": ["up", "front"],
        "curious": ["right", "left", "front"],
        "thinking": ["up", "left"],
        "confused": ["left", "right", "front"],
        "excited": ["up", "down", "up", "front"],
    }

    sequence = emotion_sequences.get(emotion_name, ["front"])

    try:
        for direction in sequence:
            _, current_ant = deps.robot.get_current_joint_positions()
            current_head = deps.robot.get_current_head_pose()

            move = HeadLookMove(
                direction=direction,
                start_pose=current_head,
                start_antennas=tuple(current_ant),
                duration=0.5,
            )
            deps.movement_manager.queue_move(move)

        return {"status": "success", "emotion": emotion_name}
    except Exception as e:
        return {"error": str(e)}

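The emotion table is just a name-to-choreography lookup with a safe default. A minimal sketch of that lookup, using a subset of the table above (the fallback to `["front"]` for unknown emotion names mirrors what `_handle_emotion` does):

```python
# Subset of the emotion-to-head-direction table used by the emotion handler.
EMOTION_SEQUENCES = {
    "happy": ["up", "front"],
    "sad": ["down"],
    "curious": ["right", "left", "front"],
    "excited": ["up", "down", "up", "front"],
}


def sequence_for(emotion: str) -> list:
    """Return the look-direction choreography for an emotion.

    Unknown emotions degrade gracefully to a single neutral 'front' look
    instead of raising, so the robot never freezes on a bad emotion name.
    """
    return EMOTION_SEQUENCES.get(emotion, ["front"])
```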
async def _handle_stop_moves(args: dict, deps: ToolDependencies) -> dict:
    """Stop all movements."""
    deps.movement_manager.clear_move_queue()
    return {"status": "success", "message": "All movements stopped"}


async def _handle_idle(args: dict, deps: ToolDependencies) -> dict:
    """Do nothing - explicitly stay idle."""
    return {"status": "success", "message": "Staying idle"}
src/reachy_mini_openclaw/vision/__init__.py
ADDED
@@ -0,0 +1,18 @@
"""Vision modules for face tracking, detection, and image understanding."""

from reachy_mini_openclaw.vision.head_tracker import get_head_tracker

__all__ = [
    "get_head_tracker",
]

# Lazy imports for optional heavy dependencies
def get_vision_processor():
    """Get the VisionProcessor class (requires torch, transformers)."""
    from reachy_mini_openclaw.vision.processors import VisionProcessor
    return VisionProcessor

def get_vision_manager():
    """Get the VisionManager class (requires torch, transformers)."""
    from reachy_mini_openclaw.vision.processors import VisionManager
    return VisionManager
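The accessor functions above defer the `torch`/`transformers` import until vision is actually requested, so importing the package stays fast on machines without those libraries. The same pattern with a stdlib stand-in module, so it runs anywhere:

```python
def get_json_module():
    """Import the heavy dependency only when first asked for.

    The import statement lives inside the function body, so nothing is
    loaded at package-import time; callers pay the cost on first use.
    """
    import json  # deferred import, stands in for torch/transformers here
    return json
```

Callers then do `loads = get_json_module().loads` once and reuse the result, which amortizes the lookup.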
src/reachy_mini_openclaw/vision/head_tracker.py
ADDED
@@ -0,0 +1,70 @@
"""Head tracker factory for selecting the best available tracker."""

import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def get_head_tracker(tracker_type: Optional[str] = None) -> Optional[Any]:
    """Get a head tracker instance based on availability and preference.

    Args:
        tracker_type: One of 'yolo', 'mediapipe', or None for auto-detect

    Returns:
        Head tracker instance or None if no tracker available
    """
    if tracker_type == "yolo":
        return _try_yolo_tracker()
    elif tracker_type == "mediapipe":
        return _try_mediapipe_tracker()
    elif tracker_type is None:
        # Auto-detect: try MediaPipe first (lighter), then YOLO
        tracker = _try_mediapipe_tracker()
        if tracker is not None:
            return tracker
        return _try_yolo_tracker()
    else:
        logger.warning(f"Unknown tracker type: {tracker_type}")
        return None


def _try_yolo_tracker() -> Optional[Any]:
    """Try to create a YOLO head tracker."""
    try:
        from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
        tracker = HeadTracker()
        logger.info("Using YOLO head tracker")
        return tracker
    except ImportError as e:
        logger.debug(f"YOLO tracker not available: {e}")
        return None
    except Exception as e:
        logger.warning(f"Failed to initialize YOLO tracker: {e}")
        return None


def _try_mediapipe_tracker() -> Optional[Any]:
    """Try to create a MediaPipe head tracker."""
    try:
        # First try the toolbox version
        from reachy_mini_toolbox.vision import HeadTracker
        tracker = HeadTracker()
        logger.info("Using MediaPipe head tracker (from toolbox)")
        return tracker
    except ImportError:
        pass

    try:
        # Fall back to our own MediaPipe implementation
        from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
        tracker = HeadTracker()
        logger.info("Using MediaPipe head tracker")
        return tracker
    except ImportError as e:
        logger.debug(f"MediaPipe tracker not available: {e}")
        return None
    except Exception as e:
        logger.warning(f"Failed to initialize MediaPipe tracker: {e}")
        return None
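The factory above is a try-in-order fallback chain: attempt each backend constructor, treat any `ImportError` or initialization failure as "not available", and return the first one that succeeds. The pattern condenses to a single helper (the candidate callables below are stand-ins, not the real tracker constructors):

```python
from typing import Any, Callable, List, Optional


def first_available(candidates: List[Callable[[], Any]]) -> Optional[Any]:
    """Return the result of the first constructor that succeeds.

    Any exception (missing import, failed hardware init, ...) means
    'try the next backend'; exhausting the list yields None, which the
    caller treats as 'feature disabled' rather than an error.
    """
    for make in candidates:
        try:
            return make()
        except Exception:
            continue
    return None
```

Returning `None` instead of raising lets the camera worker start without face tracking when no tracker backend is installed.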
src/reachy_mini_openclaw/vision/mediapipe_tracker.py
ADDED
@@ -0,0 +1,112 @@
"""MediaPipe-based head tracker for face detection.

Uses MediaPipe Face Detection for lightweight face tracking.
Falls back to this if YOLO is not available.
"""

from __future__ import annotations

import logging
from typing import Tuple, Optional

import numpy as np
from numpy.typing import NDArray

try:
    import mediapipe as mp
except ImportError as e:
    raise ImportError(
        "To use the MediaPipe head tracker, install: pip install mediapipe"
    ) from e


logger = logging.getLogger(__name__)


class HeadTracker:
    """Lightweight head tracker using MediaPipe for face detection."""

    def __init__(
        self,
        min_detection_confidence: float = 0.5,
        model_selection: int = 0,
    ) -> None:
        """Initialize MediaPipe-based head tracker.

        Args:
            min_detection_confidence: Minimum confidence for face detection
            model_selection: 0 for short-range (2m), 1 for long-range (5m)
        """
        self.min_detection_confidence = min_detection_confidence

        # Initialize MediaPipe Face Detection
        self.mp_face_detection = mp.solutions.face_detection
        self.face_detection = self.mp_face_detection.FaceDetection(
            min_detection_confidence=min_detection_confidence,
            model_selection=model_selection,
        )
        logger.info("MediaPipe face detection initialized")

    def get_head_position(
        self, img: NDArray[np.uint8]
    ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
        """Get head position from face detection.

        Args:
            img: Input image (BGR format)

        Returns:
            Tuple of (face center in [-1, 1] coordinates, roll angle in radians)
        """
        h, w = img.shape[:2]

        try:
            # Convert BGR to RGB for MediaPipe
            rgb_img = img[:, :, ::-1]

            # Run face detection
            results = self.face_detection.process(rgb_img)

            if not results.detections:
                return None, None

            # Get the first (most confident) detection
            detection = results.detections[0]

            # Get bounding box
            bbox = detection.location_data.relative_bounding_box

            # Calculate center of face
            center_x = bbox.xmin + bbox.width / 2
            center_y = bbox.ymin + bbox.height / 2

            # Convert to [-1, 1] range
            norm_x = center_x * 2.0 - 1.0
            norm_y = center_y * 2.0 - 1.0

            face_center = np.array([norm_x, norm_y], dtype=np.float32)

            # Estimate roll from key points if available
            roll = 0.0
            keypoints = detection.location_data.relative_keypoints
            if len(keypoints) >= 2:
                # Use left and right eye positions to estimate roll
                left_eye = keypoints[0]  # LEFT_EYE
                right_eye = keypoints[1]  # RIGHT_EYE

                dx = right_eye.x - left_eye.x
                dy = right_eye.y - left_eye.y
                roll = np.arctan2(dy, dx)

            logger.debug(f"Face detected at ({norm_x:.2f}, {norm_y:.2f}), roll: {np.degrees(roll):.1f}°")

            return face_center, roll

        except Exception as e:
            logger.error(f"Error in head position detection: {e}")
            return None, None

    def __del__(self):
        """Clean up MediaPipe resources."""
        if hasattr(self, 'face_detection'):
            self.face_detection.close()
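The coordinate math in `get_head_position` is worth isolating: MediaPipe reports a relative bounding box (`xmin`, `ymin`, `width`, `height` in `[0, 1]`), the face center maps linearly to `[-1, 1]` (so `0.5` becomes `0`), and roll is the angle of the eye-to-eye vector. A pure-stdlib sketch of that math, runnable without MediaPipe or NumPy:

```python
import math


def face_center_and_roll(xmin, ymin, width, height, left_eye, right_eye):
    """Map a relative bbox and eye keypoints to ([-1,1] center, roll radians).

    xmin/ymin/width/height are fractions of the image; the eyes are
    (x, y) fractions too. Mirrors the normalization in get_head_position.
    """
    cx = (xmin + width / 2) * 2.0 - 1.0   # 0.5 in image coords -> 0.0
    cy = (ymin + height / 2) * 2.0 - 1.0
    # Roll: angle of the vector from left eye to right eye.
    roll = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    return (cx, cy), roll
```

A face filling the center of the frame with level eyes therefore yields `(0.0, 0.0)` and zero roll, which is the neutral input for the head-tracking controller.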
src/reachy_mini_openclaw/vision/processors.py
ADDED
@@ -0,0 +1,419 @@
"""Local vision processing with SmolVLM2.

Provides on-device image understanding using the SmolVLM2 model
for scene description and visual analysis.

Based on pollen-robotics/reachy_mini_conversation_app vision processors.
"""

import os
import time
import base64
import logging
import threading
from typing import Any, Dict, Optional
from dataclasses import dataclass, field

import cv2
import numpy as np
from numpy.typing import NDArray

try:
    import torch
    from transformers import AutoProcessor, AutoModelForImageTextToText
    from huggingface_hub import snapshot_download
    VISION_AVAILABLE = True
except ImportError:
    VISION_AVAILABLE = False

logger = logging.getLogger(__name__)


@dataclass
class VisionConfig:
    """Configuration for vision processing."""

    model_path: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
    vision_interval: float = 5.0
    max_new_tokens: int = 64
    jpeg_quality: int = 85
    max_retries: int = 3
    retry_delay: float = 1.0
    device_preference: str = "auto"  # "auto", "cuda", "mps", "cpu"
    hf_home: str = field(default_factory=lambda: os.path.expanduser("~/.cache/huggingface"))


class VisionProcessor:
    """Handles SmolVLM2 model loading and inference for local vision."""

    def __init__(self, vision_config: Optional[VisionConfig] = None):
        """Initialize the vision processor.

        Args:
            vision_config: Vision configuration settings
        """
        if not VISION_AVAILABLE:
            raise ImportError(
                "Vision processing requires: pip install torch transformers huggingface-hub"
            )

        self.vision_config = vision_config or VisionConfig()
        self.model_path = self.vision_config.model_path
        self.device = self._determine_device()
        self.processor = None
        self.model = None
        self._initialized = False

    def _determine_device(self) -> str:
        """Determine the best device for inference."""
        pref = self.vision_config.device_preference

        if pref == "cpu":
            return "cpu"
        if pref == "cuda":
            return "cuda" if torch.cuda.is_available() else "cpu"
        if pref == "mps":
            return "mps" if torch.backends.mps.is_available() else "cpu"

        # auto: prefer mps on Apple, then cuda, else cpu
        if torch.backends.mps.is_available():
            return "mps"
        return "cuda" if torch.cuda.is_available() else "cpu"

    def initialize(self) -> bool:
        """Load model and processor onto the selected device.

        Returns:
            True if initialization successful, False otherwise
        """
        try:
            cache_dir = self.vision_config.hf_home
            os.makedirs(cache_dir, exist_ok=True)
            os.environ["HF_HOME"] = cache_dir

            logger.info(f"Loading SmolVLM2 model on {self.device} (HF_HOME={cache_dir})")

            # Download model to cache first
            logger.info(f"Downloading vision model {self.model_path}...")
            snapshot_download(
                repo_id=self.model_path,
                repo_type="model",
                cache_dir=cache_dir,
            )

            self.processor = AutoProcessor.from_pretrained(self.model_path)

            # Select dtype depending on device
            if self.device == "cuda":
                dtype = torch.bfloat16
            elif self.device == "mps":
                dtype = torch.float32  # best for MPS
            else:
                dtype = torch.float32

            model_kwargs: Dict[str, Any] = {"torch_dtype": dtype}

            # flash_attention_2 is CUDA-only; skip on MPS/CPU
            if self.device == "cuda":
                model_kwargs["_attn_implementation"] = "flash_attention_2"

            # Load model weights
            self.model = AutoModelForImageTextToText.from_pretrained(
                self.model_path, **model_kwargs
            ).to(self.device)

            if self.model is not None:
                self.model.eval()
                self._initialized = True
                logger.info(f"Vision model loaded successfully on {self.device}")
                return True

        except Exception as e:
            logger.error(f"Failed to initialize vision model: {e}")
            return False

        return False

    def process_image(
        self,
        cv2_image: NDArray[np.uint8],
        prompt: str = "Briefly describe what you see in one sentence.",
    ) -> str:
        """Process CV2 image and return description with retry logic.

        Args:
            cv2_image: OpenCV image (BGR format)
            prompt: Question/prompt to ask about the image

        Returns:
            Text description of the image
        """
        if not self._initialized or self.processor is None or self.model is None:
            return "Vision model not initialized"

        for attempt in range(self.vision_config.max_retries):
            try:
                # Convert to JPEG bytes
                success, jpeg_buffer = cv2.imencode(
                    ".jpg",
                    cv2_image,
                    [cv2.IMWRITE_JPEG_QUALITY, self.vision_config.jpeg_quality],
                )
                if not success:
                    return "Failed to encode image"

                # Convert to base64
                image_base64 = base64.b64encode(jpeg_buffer.tobytes()).decode("utf-8")

                messages = [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "image",
                                "url": f"data:image/jpeg;base64,{image_base64}",
                            },
                            {"type": "text", "text": prompt},
                        ],
                    },
                ]

                inputs = self.processor.apply_chat_template(
                    messages,
                    add_generation_prompt=True,
                    tokenize=True,
                    return_dict=True,
                    return_tensors="pt",
                )

                # Move tensors to device WITHOUT forcing dtype (keeps input_ids as torch.long)
                inputs = {
                    k: (v.to(self.device) if hasattr(v, "to") else v)
                    for k, v in inputs.items()
                }

                with torch.no_grad():
                    generated_ids = self.model.generate(
                        **inputs,
                        do_sample=False,
                        max_new_tokens=self.vision_config.max_new_tokens,
                        pad_token_id=self.processor.tokenizer.eos_token_id,
                    )

                generated_texts = self.processor.batch_decode(
                    generated_ids,
                    skip_special_tokens=True,
                )

                # Extract just the response part
                full_text = generated_texts[0]
                response = self._extract_response(full_text)

                # Clean up GPU memory if using CUDA
                if self.device == "cuda":
                    torch.cuda.empty_cache()
                elif self.device == "mps":
                    torch.mps.empty_cache()

                return response.replace(chr(10), " ").strip()

            except Exception as e:
                if "OutOfMemory" in str(type(e).__name__):
                    logger.error(f"GPU OOM on attempt {attempt + 1}: {e}")
                    if self.device == "cuda":
                        torch.cuda.empty_cache()
                    if attempt < self.vision_config.max_retries - 1:
                        time.sleep(self.vision_config.retry_delay * (attempt + 1))
                    else:
                        return "GPU out of memory - vision processing failed"
                else:
                    logger.error(f"Vision processing failed (attempt {attempt + 1}): {e}")
                    if attempt < self.vision_config.max_retries - 1:
                        time.sleep(self.vision_config.retry_delay)
                    else:
                        return f"Vision processing error after {self.vision_config.max_retries} attempts"

        return "Vision processing failed"

    def _extract_response(self, full_text: str) -> str:
        """Extract the assistant's response from the full generated text."""
        # Handle different response formats
        markers = ["assistant\n", "Assistant:", "Response:", "\n\n"]

        for marker in markers:
            if marker in full_text:
                response = full_text.split(marker)[-1].strip()
                if response:  # Ensure we got a meaningful response
                    return response

        # Fallback: return the full text cleaned up
        return full_text.strip()

    def get_model_info(self) -> Dict[str, Any]:
        """Get information about the loaded model."""
        info = {
            "initialized": self._initialized,
            "device": self.device,
            "model_path": self.model_path,
            "cuda_available": torch.cuda.is_available() if VISION_AVAILABLE else False,
        }

        if VISION_AVAILABLE and torch.cuda.is_available():
            info["gpu_memory_gb"] = torch.cuda.get_device_properties(0).total_memory // (1024**3)
        else:
            info["gpu_memory_gb"] = "N/A"

        return info


class VisionManager:
    """Manages periodic vision processing and scene understanding.

    This runs in the background, periodically capturing frames and
    generating scene descriptions that can be queried.
    """

    def __init__(
        self,
        camera_worker: Any,
        vision_config: Optional[VisionConfig] = None,
    ):
        """Initialize vision manager.

        Args:
            camera_worker: CameraWorker instance for frame capture
            vision_config: Vision configuration settings
        """
        self.camera_worker = camera_worker
        self.vision_config = vision_config or VisionConfig()
        self.vision_interval = self.vision_config.vision_interval
        self.processor = VisionProcessor(self.vision_config)

        self._last_processed_time = 0.0
        self._last_description = ""
        self._description_lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

        # Initialize processor
        if not self.processor.initialize():
            logger.error("Failed to initialize vision processor")
            raise RuntimeError("Vision processor initialization failed")

    def start(self) -> None:
        """Start the vision processing loop in a background thread."""
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._working_loop, daemon=True)
        self._thread.start()
        logger.info("Local vision processing started")

    def stop(self) -> None:
        """Stop the vision processing loop."""
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout=5.0)
        logger.info("Local vision processing stopped")

    def get_latest_description(self) -> str:
        """Get the most recent scene description.

        Returns:
            Latest scene description or empty string if none available
        """
        with self._description_lock:
            return self._last_description

    def process_now(self, prompt: str = "Briefly describe what you see in one sentence.") -> str:
        """Process the current frame immediately with a custom prompt.

        Args:
            prompt: Question/prompt to ask about the image

        Returns:
            Description of what the camera sees
        """
        frame = self.camera_worker.get_latest_frame()
        if frame is None:
            return "No camera frame available"

        return self.processor.process_image(frame, prompt)

    def _working_loop(self) -> None:
        """Vision processing loop (runs in separate thread)."""
        while not self._stop_event.is_set():
            try:
                current_time = time.time()

                if current_time - self._last_processed_time >= self.vision_interval:
                    frame = self.camera_worker.get_latest_frame()
                    if frame is not None:
                        description = self.processor.process_image(
                            frame,
                            "Briefly describe what you see in one sentence.",
                        )

                        # Only update if we got a valid response
                        if description and not description.startswith(
                            ("Vision", "Failed", "Error", "GPU")
                        ):
|
| 359 |
+
with self._description_lock:
|
| 360 |
+
self._last_description = description
|
| 361 |
+
self._last_processed_time = current_time
|
| 362 |
+
logger.debug(f"Vision update: {description}")
|
| 363 |
+
else:
|
| 364 |
+
logger.warning(f"Invalid vision response: {description}")
|
| 365 |
+
|
| 366 |
+
time.sleep(1.0) # Check every second
|
| 367 |
+
|
| 368 |
+
except Exception:
|
| 369 |
+
logger.exception("Vision processing loop error")
|
| 370 |
+
time.sleep(5.0) # Longer sleep on error
|
| 371 |
+
|
| 372 |
+
logger.info("Vision loop finished")
|
| 373 |
+
|
| 374 |
+
def get_status(self) -> Dict[str, Any]:
|
| 375 |
+
"""Get comprehensive status information."""
|
| 376 |
+
return {
|
| 377 |
+
"last_processed": self._last_processed_time,
|
| 378 |
+
"last_description": self.get_latest_description(),
|
| 379 |
+
"processor_info": self.processor.get_model_info(),
|
| 380 |
+
"config": {
|
| 381 |
+
"interval": self.vision_interval,
|
| 382 |
+
},
|
| 383 |
+
}
|
| 384 |
+
|
| 385 |
+
|
| 386 |
+
def initialize_vision_manager(
|
| 387 |
+
camera_worker: Any,
|
| 388 |
+
config: Optional[VisionConfig] = None,
|
| 389 |
+
) -> Optional[VisionManager]:
|
| 390 |
+
"""Initialize vision manager with model download and configuration.
|
| 391 |
+
|
| 392 |
+
Args:
|
| 393 |
+
camera_worker: CameraWorker instance for frame capture
|
| 394 |
+
config: Optional vision configuration
|
| 395 |
+
|
| 396 |
+
Returns:
|
| 397 |
+
VisionManager instance or None if initialization fails
|
| 398 |
+
"""
|
| 399 |
+
if not VISION_AVAILABLE:
|
| 400 |
+
logger.warning("Vision dependencies not available. Install: pip install torch transformers")
|
| 401 |
+
return None
|
| 402 |
+
|
| 403 |
+
try:
|
| 404 |
+
vision_config = config or VisionConfig()
|
| 405 |
+
|
| 406 |
+
# Initialize vision manager
|
| 407 |
+
vision_manager = VisionManager(camera_worker, vision_config)
|
| 408 |
+
|
| 409 |
+
# Log device info
|
| 410 |
+
device_info = vision_manager.processor.get_model_info()
|
| 411 |
+
logger.info(
|
| 412 |
+
f"Local vision enabled: {device_info.get('model_path')} on {device_info.get('device')}"
|
| 413 |
+
)
|
| 414 |
+
|
| 415 |
+
return vision_manager
|
| 416 |
+
|
| 417 |
+
except Exception as e:
|
| 418 |
+
logger.error(f"Failed to initialize vision manager: {e}")
|
| 419 |
+
return None
|
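The background loop keeps a new scene description only when it is non-empty and does not start with one of the error-message prefixes the processor can return. That acceptance check can be exercised in isolation; the helper name below is illustrative, not part of the module:

```python
def is_valid_description(description: str) -> bool:
    """Mirror the _working_loop acceptance check: keep a description only if it
    is non-empty and is not an error string such as "GPU out of memory"."""
    return bool(description) and not description.startswith(
        ("Vision", "Failed", "Error", "GPU")
    )


print(is_valid_description("A person sitting at a desk."))  # True
print(is_valid_description("GPU out of memory"))            # False
print(is_valid_description(""))                             # False
```

One caveat of this prefix heuristic: a legitimate description that happens to begin with "Vision" or "Error" would also be discarded, which is an accepted trade-off here since the loop simply retries on the next interval.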
src/reachy_mini_openclaw/vision/yolo_head_tracker.py
ADDED
@@ -0,0 +1,152 @@
"""YOLO-based head tracker for face detection.

Uses YOLOv11 for fast, accurate face detection.
"""

from __future__ import annotations

import logging
from typing import Tuple, Optional

import numpy as np
from numpy.typing import NDArray

try:
    from supervision import Detections
    from ultralytics import YOLO
except ImportError as e:
    raise ImportError(
        "To use YOLO head tracker, install: pip install ultralytics supervision"
    ) from e

from huggingface_hub import hf_hub_download


logger = logging.getLogger(__name__)


class HeadTracker:
    """Lightweight head tracker using YOLO for face detection."""

    def __init__(
        self,
        model_repo: str = "AdamCodd/YOLOv11n-face-detection",
        model_filename: str = "model.pt",
        confidence_threshold: float = 0.3,
        device: str = "cpu",
    ) -> None:
        """Initialize YOLO-based head tracker.

        Args:
            model_repo: HuggingFace model repository
            model_filename: Model file name
            confidence_threshold: Minimum confidence for face detection
            device: Device to run inference on ('cpu' or 'cuda')
        """
        self.confidence_threshold = confidence_threshold

        try:
            # Download and load YOLO model
            model_path = hf_hub_download(repo_id=model_repo, filename=model_filename)
            self.model = YOLO(model_path).to(device)
            logger.info(f"YOLO face detection model loaded from {model_repo}")
        except Exception as e:
            logger.error(f"Failed to load YOLO model: {e}")
            raise

    def _select_best_face(self, detections: Detections) -> Optional[int]:
        """Select the best face based on confidence and area.

        Args:
            detections: Supervision detections object

        Returns:
            Index of best face or None if no valid faces
        """
        if detections.xyxy.shape[0] == 0:
            return None

        if detections.confidence is None:
            return None

        # Filter by confidence threshold
        valid_mask = detections.confidence >= self.confidence_threshold
        if not np.any(valid_mask):
            return None

        valid_indices = np.where(valid_mask)[0]

        # Calculate areas for valid detections
        boxes = detections.xyxy[valid_indices]
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

        # Combine confidence and area (weighted towards larger faces)
        confidences = detections.confidence[valid_indices]
        scores = confidences * 0.7 + (areas / np.max(areas)) * 0.3

        # Return index of best face
        best_idx = valid_indices[np.argmax(scores)]
        return int(best_idx)

    def _bbox_to_normalized_coords(
        self, bbox: NDArray[np.float32], w: int, h: int
    ) -> NDArray[np.float32]:
        """Convert bounding box center to normalized coordinates [-1, 1].

        Args:
            bbox: Bounding box [x1, y1, x2, y2]
            w: Image width
            h: Image height

        Returns:
            Center point in [-1, 1] coordinates
        """
        center_x = (bbox[0] + bbox[2]) / 2.0
        center_y = (bbox[1] + bbox[3]) / 2.0

        # Normalize to [0, 1] then to [-1, 1]
        norm_x = (center_x / w) * 2.0 - 1.0
        norm_y = (center_y / h) * 2.0 - 1.0

        return np.array([norm_x, norm_y], dtype=np.float32)

    def get_head_position(
        self, img: NDArray[np.uint8]
    ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
        """Get head position from face detection.

        Args:
            img: Input image (BGR format)

        Returns:
            Tuple of (eye_center in [-1,1] coords, roll_angle in radians)
        """
        h, w = img.shape[:2]

        try:
            # Run YOLO inference
            results = self.model(img, verbose=False)
            detections = Detections.from_ultralytics(results[0])

            # Select best face
            face_idx = self._select_best_face(detections)
            if face_idx is None:
                return None, None

            bbox = detections.xyxy[face_idx]

            if detections.confidence is not None:
                confidence = detections.confidence[face_idx]
                logger.debug(f"Face detected with confidence: {confidence:.2f}")

            # Get face center in [-1, 1] coordinates
            face_center = self._bbox_to_normalized_coords(bbox, w, h)

            # Roll is 0 since we don't have keypoints for precise angle estimation
            roll = 0.0

            return face_center, roll

        except Exception as e:
            logger.error(f"Error in head position detection: {e}")
            return None, None
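The selection and normalization logic above can be checked without loading a model. A pure-Python sketch of the same formulas (function names here are illustrative; the file's versions operate on numpy arrays from `supervision`):

```python
def bbox_center_normalized(bbox, w, h):
    """Map the center of an [x1, y1, x2, y2] box from pixels to [-1, 1]."""
    cx = (bbox[0] + bbox[2]) / 2.0
    cy = (bbox[1] + bbox[3]) / 2.0
    return ((cx / w) * 2.0 - 1.0, (cy / h) * 2.0 - 1.0)


def best_face(boxes, confidences, threshold=0.3):
    """score = 0.7 * confidence + 0.3 * (area / max_area), over boxes above threshold."""
    valid = [i for i, c in enumerate(confidences) if c >= threshold]
    if not valid:
        return None
    areas = {i: (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1]) for i in valid}
    max_area = max(areas.values())
    return max(valid, key=lambda i: 0.7 * confidences[i] + 0.3 * areas[i] / max_area)


# A face filling the frame centers at (0, 0); a large, slightly less
# confident face beats a tiny high-confidence one.
print(bbox_center_normalized((0, 0, 640, 480), 640, 480))             # (0.0, 0.0)
print(best_face([(0, 0, 10, 10), (100, 100, 400, 400)], [0.9, 0.8]))  # 1
```

The 0.7/0.3 weighting biases the tracker toward the nearest (largest) face, which matters for a desk robot: a confident detection of a distant face in the background should not steal focus from the person in front of it.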
style.css
ADDED
@@ -0,0 +1,425 @@
:root {
  --bg: #0d0a1a;
  --panel: #1a1428;
  --glass: rgba(26, 20, 40, 0.7);
  --card: rgba(255, 255, 255, 0.04);
  --accent: #ff6b6b;
  --accent-2: #9b59b6;
  --accent-3: #f39c12;
  --text: #f0e8f8;
  --muted: #b8a8c8;
  --border: rgba(255, 255, 255, 0.08);
  --shadow: 0 25px 70px rgba(0, 0, 0, 0.45);
  font-family: "Space Grotesk", "Manrope", system-ui, -apple-system, sans-serif;
}

* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  background: radial-gradient(circle at 20% 20%, rgba(255, 107, 107, 0.15), transparent 30%),
    radial-gradient(circle at 80% 0%, rgba(155, 89, 182, 0.18), transparent 32%),
    radial-gradient(circle at 50% 70%, rgba(243, 156, 18, 0.1), transparent 30%),
    var(--bg);
  color: var(--text);
  min-height: 100vh;
  line-height: 1.6;
  padding-bottom: 3rem;
}

a {
  color: inherit;
  text-decoration: none;
}

.hero {
  padding: 3.5rem clamp(1.5rem, 3vw, 3rem) 2.5rem;
  position: relative;
  overflow: hidden;
}

.hero::after {
  content: "";
  position: absolute;
  inset: 0;
  background: linear-gradient(120deg, rgba(255, 107, 107, 0.12), rgba(155, 89, 182, 0.08), transparent);
  pointer-events: none;
}

.topline {
  display: flex;
  align-items: center;
  justify-content: space-between;
  max-width: 1200px;
  margin: 0 auto 2rem;
  position: relative;
  z-index: 2;
}

.brand {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  font-weight: 700;
  letter-spacing: 0.5px;
  color: var(--text);
}

.logo {
  display: inline-flex;
  align-items: center;
  justify-content: center;
  width: 2.4rem;
  height: 2.4rem;
  border-radius: 10px;
  background: linear-gradient(145deg, rgba(255, 107, 107, 0.2), rgba(155, 89, 182, 0.2));
  box-shadow: 0 10px 30px rgba(0, 0, 0, 0.25);
  font-size: 1.4rem;
}

.brand-name {
  font-size: 1.2rem;
}

.pill {
  background: rgba(255, 255, 255, 0.06);
  border: 1px solid var(--border);
  padding: 0.6rem 1rem;
  border-radius: 999px;
  color: var(--muted);
  font-size: 0.9rem;
  box-shadow: 0 12px 30px rgba(0, 0, 0, 0.2);
}

.hero-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
  gap: clamp(1.5rem, 2.5vw, 2.5rem);
  max-width: 1200px;
  margin: 0 auto;
  position: relative;
  z-index: 2;
  align-items: center;
}

.hero-copy h1 {
  font-size: clamp(2.6rem, 4vw, 3.6rem);
  margin-bottom: 1rem;
  line-height: 1.1;
  letter-spacing: -0.5px;
}

.eyebrow {
  display: inline-flex;
  align-items: center;
  gap: 0.5rem;
  text-transform: uppercase;
  letter-spacing: 1px;
  font-size: 0.8rem;
  color: var(--muted);
  margin-bottom: 0.75rem;
}

.eyebrow::before {
  content: "";
  display: inline-block;
  width: 24px;
  height: 2px;
  background: linear-gradient(90deg, var(--accent), var(--accent-2));
  border-radius: 999px;
}

.lede {
  font-size: 1.1rem;
  color: var(--muted);
  max-width: 620px;
}

.hero-actions {
  display: flex;
  gap: 1rem;
  align-items: center;
  margin: 1.6rem 0 1.2rem;
  flex-wrap: wrap;
}

.btn {
  display: inline-flex;
  align-items: center;
  justify-content: center;
  gap: 0.6rem;
  padding: 0.85rem 1.4rem;
  border-radius: 12px;
  font-weight: 700;
  border: 1px solid transparent;
  cursor: pointer;
  transition: transform 0.2s ease, box-shadow 0.2s ease, background 0.2s ease, border-color 0.2s ease;
}

.btn.primary {
  background: linear-gradient(135deg, #ff6b6b, #9b59b6);
  color: #fff;
  box-shadow: 0 15px 30px rgba(255, 107, 107, 0.25);
}

.btn.primary:hover {
  transform: translateY(-2px);
  box-shadow: 0 25px 45px rgba(255, 107, 107, 0.35);
}

.btn.ghost {
  background: rgba(255, 255, 255, 0.05);
  border-color: var(--border);
  color: var(--text);
}

.btn.ghost:hover {
  border-color: rgba(255, 255, 255, 0.3);
  transform: translateY(-2px);
}

.btn.wide {
  width: 100%;
  justify-content: center;
}

.hero-badges {
  display: flex;
  flex-wrap: wrap;
  gap: 0.6rem;
  color: var(--muted);
  font-size: 0.9rem;
}

.hero-badges span {
  padding: 0.5rem 0.8rem;
  border-radius: 10px;
  border: 1px solid var(--border);
  background: rgba(255, 255, 255, 0.04);
}

.hero-visual .glass-card {
  background: rgba(255, 255, 255, 0.03);
  border: 1px solid var(--border);
  border-radius: 18px;
  padding: 1.2rem;
  box-shadow: var(--shadow);
  backdrop-filter: blur(10px);
}

.hero-gif {
  width: 100%;
  max-width: 500px;
  height: auto;
  border-radius: 14px;
  display: block;
  margin: 0 auto;
}

.architecture-preview {
  background: rgba(0, 0, 0, 0.4);
  border-radius: 14px;
  border: 1px solid var(--border);
  padding: 1.5rem;
  overflow-x: auto;
}

.architecture-preview pre {
  font-family: "SF Mono", "Fira Code", "Consolas", monospace;
  font-size: 0.85rem;
  color: var(--accent);
  white-space: pre;
  margin: 0;
  line-height: 1.5;
}

.caption {
  margin-top: 0.75rem;
  color: var(--muted);
  font-size: 0.95rem;
}

.section {
  max-width: 1200px;
  margin: 0 auto;
  padding: clamp(2rem, 4vw, 3.5rem) clamp(1.5rem, 3vw, 3rem);
}

.section-header {
  text-align: center;
  max-width: 780px;
  margin: 0 auto 2rem;
}

.section-header h2 {
  font-size: clamp(2rem, 3vw, 2.6rem);
  margin-bottom: 0.5rem;
}

.intro {
  color: var(--muted);
  font-size: 1.05rem;
}

.feature-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
  gap: 1rem;
}

.feature-card {
  background: rgba(255, 255, 255, 0.03);
  border: 1px solid var(--border);
  border-radius: 16px;
  padding: 1.25rem;
  box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
  transition: transform 0.2s ease, border-color 0.2s ease, box-shadow 0.2s ease;
}

.feature-card:hover {
  transform: translateY(-4px);
  border-color: rgba(255, 107, 107, 0.3);
  box-shadow: 0 18px 40px rgba(0, 0, 0, 0.3);
}

.feature-card .icon {
  width: 48px;
  height: 48px;
  border-radius: 12px;
  display: grid;
  place-items: center;
  background: linear-gradient(145deg, rgba(255, 107, 107, 0.15), rgba(155, 89, 182, 0.15));
  margin-bottom: 0.8rem;
  font-size: 1.4rem;
}

.feature-card h3 {
  margin-bottom: 0.35rem;
}

.feature-card p {
  color: var(--muted);
}

.story {
  padding-top: 1rem;
}

.story-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
  gap: 1rem;
}

.story-card {
  background: rgba(255, 255, 255, 0.03);
  border: 1px solid var(--border);
  border-radius: 18px;
  padding: 1.5rem;
  box-shadow: var(--shadow);
}

.story-card.secondary {
  background: linear-gradient(145deg, rgba(155, 89, 182, 0.1), rgba(255, 107, 107, 0.08));
}

.story-card.highlight {
  background: linear-gradient(145deg, rgba(46, 204, 113, 0.15), rgba(52, 152, 219, 0.1));
  border-color: rgba(46, 204, 113, 0.3);
}

.simulator-callout {
  padding-top: 0;
}

.simulator-callout code {
  background: rgba(0, 0, 0, 0.3);
  padding: 0.2rem 0.5rem;
  border-radius: 4px;
  font-family: "SF Mono", "Fira Code", monospace;
  font-size: 0.85rem;
}

.story-card h3 {
  margin-bottom: 0.8rem;
}

.story-list {
  list-style: none;
  display: grid;
  gap: 0.7rem;
  color: var(--muted);
  font-size: 0.98rem;
}

.story-list li {
  display: flex;
  gap: 0.7rem;
  align-items: flex-start;
}

.story-text {
  color: var(--muted);
  line-height: 1.7;
  margin-bottom: 1rem;
}

.chips {
  display: flex;
  flex-wrap: wrap;
  gap: 0.5rem;
}

.chip {
  padding: 0.45rem 0.8rem;
  border-radius: 12px;
  background: rgba(0, 0, 0, 0.3);
  border: 1px solid var(--border);
  color: var(--text);
  font-size: 0.9rem;
}

.footer {
  text-align: center;
  color: var(--muted);
  padding: 2rem 1.5rem 0;
  max-width: 800px;
  margin: 0 auto;
}

.footer a {
  color: var(--accent);
  border-bottom: 1px solid transparent;
}

.footer a:hover {
  border-color: var(--accent);
}

@media (max-width: 768px) {
  .hero {
    padding-top: 2.5rem;
  }

  .topline {
    flex-direction: column;
    gap: 0.8rem;
    align-items: flex-start;
  }

  .hero-actions {
    width: 100%;
  }

  .btn {
    width: 100%;
    justify-content: center;
  }

  .hero-badges {
    gap: 0.4rem;
  }
}