Commit 1ab5bef · 0 parents · committed by shaunx with Claude Opus 4.6

Initial release of ReachyClaw

OpenClaw AI agent embodied in a Reachy Mini robot. OpenClaw is the actual
brain: every message is routed through OpenClaw, and it controls the robot
body (movement, emotions, camera) via inline action tags. The OpenAI
Realtime API is used purely for voice I/O.

Based on ClawBody (Apache 2.0) with fundamental architecture changes:
- GPT-4o is a voice relay, not the brain
- OpenClaw controls all physical actions via [ACTION:param] tags
- No startup context fetch (instant startup)
- Dynamic action list built from TOOL_SPECS (single source of truth)
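The "single source of truth" idea in the last bullet can be sketched in a few lines: one spec list drives both the prompt's action list and any validation. This is a hypothetical illustration, not the actual ReachyClaw `TOOL_SPECS` schema; the field names (`name`, `params`, `description`) are assumptions.

```python
# Hypothetical sketch: derive the prompt's action list from one spec table.
# Field names and entries are illustrative, not the real ReachyClaw schema.
TOOL_SPECS = [
    {"name": "LOOK", "params": ["left", "right", "up", "down", "front"],
     "description": "Turn the head"},
    {"name": "EMOTION", "params": ["happy", "sad", "surprised"],
     "description": "Play an emotion animation"},
    {"name": "CAMERA", "params": [],
     "description": "Capture a camera frame"},
]

def build_action_list(specs):
    """Render one prompt line per tool, e.g. '[LOOK:left|right|...] - Turn the head'."""
    lines = []
    for spec in specs:
        if spec["params"]:
            tag = f"[{spec['name']}:{'|'.join(spec['params'])}]"
        else:
            tag = f"[{spec['name']}]"
        lines.append(f"{tag} - {spec['description']}")
    return "\n".join(lines)

print(build_action_list(TOOL_SPECS))
```

Because the prompt text is generated from the same table the executor reads, adding a tool cannot leave the prompt and the runtime out of sync.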

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.env.example ADDED
@@ -0,0 +1,44 @@
+ # ReachyClaw Configuration
+ # Give your OpenClaw AI agent a physical robot body!
+
+ # ==============================================================================
+ # REQUIRED: OpenAI API Key
+ # ==============================================================================
+ # Get your key at: https://platform.openai.com/api-keys
+ # Requires Realtime API access
+ OPENAI_API_KEY=sk-your-openai-key
+
+ # ==============================================================================
+ # REQUIRED: OpenClaw Gateway
+ # ==============================================================================
+ # The URL where your OpenClaw gateway is running
+ # If running on the same machine as the robot, use the host machine's IP
+ OPENCLAW_GATEWAY_URL=http://192.168.1.100:18789
+
+ # Your OpenClaw gateway authentication token
+ # Find this in ~/.openclaw/openclaw.json under gateway.token
+ OPENCLAW_TOKEN=your-gateway-token
+
+ # Agent ID to use (default: main)
+ OPENCLAW_AGENT_ID=main
+
+ # Session key for conversation context - IMPORTANT!
+ # Use "main" (default) to share context with WhatsApp and other DM channels
+ # This allows the robot to be aware of all your conversations
+ OPENCLAW_SESSION_KEY=main
+
+ # ==============================================================================
+ # OPTIONAL: Voice Settings
+ # ==============================================================================
+ # OpenAI Realtime voice (alloy, echo, fable, onyx, nova, shimmer, cedar)
+ OPENAI_VOICE=cedar
+
+ # OpenAI model for Realtime API
+ OPENAI_MODEL=gpt-4o-realtime-preview-2024-12-17
+
+ # ==============================================================================
+ # OPTIONAL: Features
+ # ==============================================================================
+ # Enable/disable features (true/false)
+ ENABLE_CAMERA=true
+ ENABLE_OPENCLAW_TOOLS=true
.github/workflows/sync-to-huggingface.yml ADDED
@@ -0,0 +1,23 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches: [main]
+   workflow_dispatch: # Allow manual trigger
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+           lfs: true
+
+       - name: Push to Hugging Face
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           git remote add huggingface https://huggingface:$HF_TOKEN@huggingface.co/spaces/shaunx/reachyclaw
+           git push huggingface main --force
.gitignore ADDED
@@ -0,0 +1,62 @@
+ # ReachyClaw .gitignore
+
+ # Environment and secrets
+ .env
+ *.env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ .venv/
+ venv/
+ ENV/
+ env/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+ *~
+
+ # Testing
+ .pytest_cache/
+ .coverage
+ htmlcov/
+ .tox/
+ .nox/
+
+ # Type checking
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Logs
+ *.log
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Package manager
+ uv.lock
CONTRIBUTING.md ADDED
@@ -0,0 +1,78 @@
+ # Contributing to ReachyClaw
+
+ Thank you for your interest in contributing! This project welcomes contributions from the community.
+
+ ## How to Contribute
+
+ ### Reporting Bugs
+
+ If you find a bug, please open an issue with:
+ - A clear title and description
+ - Steps to reproduce the issue
+ - Expected vs actual behavior
+ - Your environment (OS, Python version, robot model)
+
+ ### Suggesting Features
+
+ Feature requests are welcome! Please open an issue with:
+ - A clear description of the feature
+ - Use cases and motivation
+ - Any technical considerations
+
+ ### Pull Requests
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Make your changes
+ 4. Add tests if applicable
+ 5. Run linting: `ruff check . && ruff format .`
+ 6. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 7. Push to the branch (`git push origin feature/amazing-feature`)
+ 8. Open a Pull Request
+
+ ## Development Setup
+
+ ```bash
+ # Clone your fork
+ git clone https://github.com/YOUR_USERNAME/reachyclaw.git
+ cd reachyclaw
+
+ # Install in development mode
+ pip install -e ".[dev]"
+
+ # Run tests
+ pytest
+
+ # Format code
+ ruff check --fix .
+ ruff format .
+ ```
+
+ ## Code Style
+
+ - Follow PEP 8
+ - Use type hints
+ - Write docstrings for public functions and classes
+ - Keep functions focused and small
+
+ ## Where to Submit Contributions
+
+ ### This Project
+ Submit PRs directly to this repository for:
+ - Bug fixes
+ - New features
+ - Documentation improvements
+ - New personality profiles
+
+ ### Reachy Mini Ecosystem
+ - **SDK improvements**: [pollen-robotics/reachy_mini](https://github.com/pollen-robotics/reachy_mini)
+ - **New dances/emotions**: [reachy_mini_dances_library](https://github.com/pollen-robotics/reachy_mini_dances_library)
+ - **Apps for the app store**: Submit to [Hugging Face Spaces](https://huggingface.co/spaces)
+
+ ### OpenClaw Ecosystem
+ - **New skills**: Submit to [MoltDirectory](https://github.com/neonone123/moltdirectory)
+ - **Core OpenClaw**: [openclaw/openclaw](https://github.com/openclaw/openclaw)
+
+ ## License
+
+ By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.
LICENSE ADDED
@@ -0,0 +1,191 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to the Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no theory of
+ liability, whether in contract, strict liability, or tort
+ (including negligence or otherwise) arising in any way out of
+ the use or inability to use the Work (even if such Holder or
+ other party has been advised of the possibility of such damages),
+ shall any Contributor be liable to You for damages, including any
+ direct, indirect, special, incidental, or consequential damages of
+ any character arising as a result of this License or out of the use
+ or inability to use the Work (including but not limited to damages
+ for loss of goodwill, work stoppage, computer failure or malfunction,
+ or any and all other commercial damages or losses), even if such
+ Contributor has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ Copyright 2024 Tom
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,185 @@
+ ---
+ title: ReachyClaw
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: static
+ pinned: false
+ short_description: OpenClaw AI agent with a Reachy Mini robot body
+ tags:
+   - reachy_mini
+   - reachy_mini_python_app
+   - openclaw
+   - robotics
+   - embodied-ai
+   - ai-assistant
+   - openai-realtime
+   - voice-assistant
+   - conversational-ai
+   - physical-ai
+   - robot-body
+   - speech-to-speech
+   - multimodal
+   - vision
+   - expressive-robot
+   - face-tracking
+   - human-robot-interaction
+ ---
+
+ # ReachyClaw
+
+ **Your OpenClaw AI agent, embodied in a Reachy Mini robot.**
+
+ ReachyClaw makes OpenClaw the actual brain of a Reachy Mini robot. Unlike typical setups where GPT-4o handles conversations and only calls OpenClaw occasionally, ReachyClaw routes **every** user message through OpenClaw. The robot speaks, moves, and sees — all controlled by your OpenClaw agent.
+
+ The OpenAI Realtime API is used purely for voice I/O (speech-to-text and text-to-speech). Your OpenClaw agent decides what to say **and** how the robot moves.
+
+ ## Architecture
+
+ ```
+ User speaks -> OpenAI Realtime API (STT only)
+   -> GPT-4o calls ask_openclaw with the user's message
+   -> OpenClaw (the actual brain) responds with text + action tags
+   -> ReachyClaw parses action tags -> robot moves (emotions, look, dance)
+   -> Clean text -> GPT-4o (TTS only) -> Robot speaks
+ ```
+
+ OpenClaw can include action tags like `[EMOTION:happy]`, `[LOOK:left]`, `[DANCE:excited]` in its responses. These are parsed and executed on the robot, then stripped so only the spoken words go to TTS.
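The parse-and-strip step described above can be sketched with a small regex pass. This is an illustrative sketch, not the actual ReachyClaw parser; the tag grammar is assumed from the examples in this README.

```python
import re

# Matches tags like [EMOTION:happy], [LOOK:left], or bare [CAMERA] / [STOP].
ACTION_TAG = re.compile(r"\[([A-Z_]+)(?::([a-z_]+))?\]")

def parse_actions(text):
    """Split an OpenClaw reply into (clean_text, actions).

    actions is a list of (tag, param) pairs to execute on the robot;
    clean_text is what gets sent to TTS.
    """
    actions = [(m.group(1), m.group(2)) for m in ACTION_TAG.finditer(text)]
    clean = ACTION_TAG.sub("", text)
    # Collapse the extra whitespace left behind by stripped tags.
    clean = re.sub(r"\s{2,}", " ", clean).strip()
    return clean, actions

clean, actions = parse_actions(
    "[EMOTION:happy] Great to see you! [LOOK:left] What's over there?"
)
# actions -> [("EMOTION", "happy"), ("LOOK", "left")]
# clean   -> "Great to see you! What's over there?"
```

Parameterless tags like `[CAMERA]` come back with `None` as the param, so the executor can dispatch on the tag name alone.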
+
+ ## Features
+
+ - **OpenClaw is the brain** — every message goes through your OpenClaw agent, not GPT-4o
+ - **Full body control** — OpenClaw controls head movement, emotions, dances, and camera
+ - **Real-time voice** — OpenAI Realtime API for low-latency speech I/O
+ - **Face tracking** — robot tracks your face and maintains eye contact
+ - **Camera vision** — robot can see through its camera and describe what it sees
+ - **Conversation memory** — OpenClaw maintains full context across sessions and channels
+ - **Works with simulator** — no physical robot required
+
+ ## Available Robot Actions
+
+ OpenClaw can use these action tags in responses:
+
+ | Action | Tags |
+ |--------|------|
+ | **Look** | `[LOOK:left]` `[LOOK:right]` `[LOOK:up]` `[LOOK:down]` `[LOOK:front]` |
+ | **Emotion** | `[EMOTION:happy]` `[EMOTION:sad]` `[EMOTION:surprised]` `[EMOTION:curious]` `[EMOTION:thinking]` `[EMOTION:confused]` `[EMOTION:excited]` |
+ | **Dance** | `[DANCE:happy]` `[DANCE:excited]` `[DANCE:wave]` `[DANCE:nod]` `[DANCE:shake]` `[DANCE:bounce]` |
+ | **Camera** | `[CAMERA]` |
+ | **Face Tracking** | `[FACE_TRACKING:on]` `[FACE_TRACKING:off]` |
+ | **Stop** | `[STOP]` |
+
+ ## Prerequisites
+
+ ### Option A: With Physical Robot
+ - [Reachy Mini](https://www.pollen-robotics.com/reachy-mini/) robot (Wireless or Lite)
+
+ ### Option B: With Simulator
+ - Any computer with Python 3.11+
+ - Install: `pip install "reachy-mini[mujoco]"`
+
+ ### Software (Both Options)
+ - Python 3.11+
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
+ - [OpenClaw](https://github.com/openclaw/openclaw) gateway running
+ - OpenAI API key with Realtime API access
+
+ ## Installation
+
+ ```bash
+ # Clone ReachyClaw
+ git clone https://github.com/shaunx/reachyclaw
+ cd reachyclaw
+
+ # Create virtual environment
+ python -m venv .venv
+ source .venv/bin/activate
+
+ # Install
+ pip install -e ".[mediapipe_vision]"
+
+ # Configure
+ cp .env.example .env
+ # Edit .env with your keys
+ ```
+
+ ## Configuration
+
+ Edit `.env`:
+
+ ```bash
+ # Required
+ OPENAI_API_KEY=sk-...your-key...
+
+ # OpenClaw Gateway (required)
+ OPENCLAW_GATEWAY_URL=http://localhost:18789
+ OPENCLAW_TOKEN=your-gateway-token
+ OPENCLAW_AGENT_ID=main
+
+ # Optional
+ OPENAI_VOICE=cedar
+ ENABLE_FACE_TRACKING=true
+ HEAD_TRACKER_TYPE=mediapipe
+ ```
+
+ ## Usage
+
+ ### With Simulator
+
+ ```bash
+ # Terminal 1: Start simulator
+ reachy-mini-daemon --sim
+
+ # Terminal 2: Run ReachyClaw
+ reachyclaw --gradio
+ ```
+
+ ### With Physical Robot
+
+ ```bash
+ reachyclaw
+
+ # With debug logging
+ reachyclaw --debug
+
+ # With specific robot
+ reachyclaw --robot-name my-reachy
+ ```
+
+ ### CLI Options
+
+ | Option | Description |
+ |--------|-------------|
+ | `--debug` | Enable debug logging |
+ | `--gradio` | Launch web UI instead of console mode |
+ | `--robot-name NAME` | Specify robot name for connection |
+ | `--gateway-url URL` | OpenClaw gateway URL |
+ | `--no-camera` | Disable camera functionality |
+ | `--no-openclaw` | Disable OpenClaw integration |
+ | `--head-tracker TYPE` | Face tracker: `mediapipe` or `yolo` |
+ | `--no-face-tracking` | Disable face tracking |
+
+ ## How It Differs from ClawBody
+
+ ClawBody (the stock app) uses GPT-4o as the brain and only calls OpenClaw occasionally for tools like calendar or weather. ReachyClaw inverts this:
+
+ | | ClawBody | ReachyClaw |
+ |---|---|---|
+ | **Brain** | GPT-4o (with OpenClaw context snapshot) | OpenClaw (every message) |
+ | **Body control** | GPT-4o decides movements | OpenClaw decides movements |
+ | **Startup** | 20-30s context fetch from OpenClaw | Instant (no context fetch needed) |
+ | **Memory** | Stale snapshot from startup | Live OpenClaw memory |
+ | **GPT-4o role** | Full agent | Voice relay only |
+
+ ## License
+
+ Apache 2.0 — see [LICENSE](LICENSE).
+
+ ## Acknowledgments
+
+ Built on top of:
+
+ - [Pollen Robotics](https://www.pollen-robotics.com/) — Reachy Mini robot, SDK, and simulator
+ - [OpenClaw](https://github.com/openclaw/openclaw) — AI agent framework
+ - [OpenAI](https://openai.com/) — Realtime API for voice I/O
+ - [ClawBody](https://github.com/tomrikert/clawbody) — Original Reachy Mini + OpenClaw app (Apache 2.0)
index.html ADDED
@@ -0,0 +1,201 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>ReachyClaw - Reachy Mini App</title>
+   <link rel="preconnect" href="https://fonts.googleapis.com">
+   <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+   <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&display=swap" rel="stylesheet">
+   <link rel="stylesheet" href="style.css">
+ </head>
+ <body>
+
+ <section class="hero">
+   <div class="topline">
+     <div class="brand">
+       <span class="logo">🤖</span>
+       <span class="brand-name">ReachyClaw</span>
+     </div>
+     <div class="pill">Voice conversation · OpenClaw brain · Full body control</div>
+   </div>
+
+   <div class="hero-grid">
+     <div class="hero-copy">
+       <div class="eyebrow">Reachy Mini App</div>
+       <h1>Your OpenClaw agent, embodied.</h1>
+       <p class="lede">
+         Give your OpenClaw AI agent a Reachy Mini robot body.
+         OpenClaw is the brain — it controls what the robot says,
+         how it moves, and what it sees. OpenAI Realtime API handles voice I/O.
+       </p>
+       <div class="hero-actions">
+         <a href="#simulator" class="btn primary">🖥️ Try with Simulator</a>
+         <a href="#features" class="btn ghost">See features</a>
+       </div>
+       <div class="hero-badges">
+         <span>🧠 OpenClaw brain</span>
+         <span>🎙️ OpenAI Realtime voice</span>
+         <span>💃 Full body control</span>
+         <span>🖥️ No robot required!</span>
+       </div>
+     </div>
+     <div class="hero-visual">
+       <div class="glass-card">
+         <img src="https://huggingface.co/spaces/pollen-robotics/reachy_mini_conversation_app/resolve/main/docs/assets/reachy_mini_dance.gif"
+              alt="Reachy Mini Robot Dancing"
+              class="hero-gif">
+         <p class="caption">Works with physical robot OR MuJoCo simulator!</p>
+       </div>
+     </div>
+   </div>
+ </section>
+
+ <section class="section simulator-callout" id="simulator">
+   <div class="story-card highlight">
+     <h2>🖥️ No Robot? No Problem!</h2>
+     <p class="story-text" style="font-size: 1.1rem;">
+       <strong>You don't need a physical Reachy Mini robot to use ReachyClaw!</strong><br><br>
+       ReachyClaw works with the Reachy Mini Simulator, a MuJoCo-based physics simulation
+       that runs on your computer. Watch your agent move and express emotions on screen
+       while you talk.
+     </p>
+     <div class="architecture-preview" style="margin: 1.5rem 0;">
+       <pre>
+ # Install simulator support
+ pip install "reachy-mini[mujoco]"
+
+ # Start the simulator (opens 3D window)
+ reachy-mini-daemon --sim
+
+ # In another terminal, run ReachyClaw
+ reachyclaw --gradio
+       </pre>
+     </div>
+     <p class="caption">🍎 Mac Users: Use <code>mjpython -m reachy_mini.daemon.app.main --sim</code> instead</p>
+     <a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started" class="btn primary" style="margin-top: 1rem;" target="_blank">
+       📚 Simulator Setup Guide
+     </a>
+   </div>
+ </section>
+
+ <section class="section" id="features">
+   <div class="section-header">
+     <h2>What's inside</h2>
+     <p class="intro">
+       ReachyClaw makes OpenClaw the actual brain — every message, every movement, every decision.
+     </p>
+   </div>
+   <div class="feature-grid">
+     <div class="feature-card">
+       <div class="icon">🧠</div>
+       <h3>OpenClaw is the brain</h3>
+       <p>Every user message goes through your OpenClaw agent. No GPT-4o guessing — real responses with full tool access.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🎤</div>
+       <h3>Real-time voice</h3>
+       <p>OpenAI Realtime API for low-latency speech-to-text and text-to-speech. Voice I/O only — no GPT-4o brain.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🤖</div>
+       <h3>Full body control</h3>
+       <p>OpenClaw controls the robot body via action tags — head movement, emotions, dances, camera, face tracking.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">👀</div>
+       <h3>Vision</h3>
+       <p>See through the robot's camera. Your agent can look around and describe what it sees.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">🖥️</div>
+       <h3>Simulator support</h3>
+       <p>No robot? Run with MuJoCo simulator and watch your agent move in a 3D window.</p>
+     </div>
+     <div class="feature-card">
+       <div class="icon">⚡</div>
+       <h3>Instant startup</h3>
+       <p>No 30-second context fetch. GPT-4o is just a relay — the session starts immediately.</p>
+     </div>
+   </div>
+ </section>
+
+ <section class="section story" id="how-it-works">
+   <div class="story-grid">
+     <div class="story-card">
+       <h3>How it works</h3>
+       <p class="story-text">OpenClaw controls everything</p>
+       <ol class="story-list">
+         <li><span>🎤</span> Robot captures your voice</li>
+         <li><span>📝</span> OpenAI Realtime transcribes your speech</li>
+         <li><span>🧠</span> Your message goes to OpenClaw (the real brain)</li>
+         <li><span>🤖</span> OpenClaw responds with text + action tags like [EMOTION:happy]</li>
+         <li><span>💃</span> ReachyClaw executes the actions on the robot</li>
+         <li><span>🔊</span> Clean text goes to TTS — robot speaks while moving</li>
+       </ol>
+     </div>
+     <div class="story-card secondary">
+       <h3>Prerequisites</h3>
+       <p class="story-text">Choose your setup:</p>
+       <div class="chips">
+         <span class="chip">🧠 OpenClaw Gateway</span>
+         <span class="chip">🔑 OpenAI API Key</span>
+         <span class="chip">🐍 Python 3.11+</span>
+       </div>
+       <p class="story-text" style="margin-top: 1rem;">
+         <strong>Option A:</strong> 🤖 Physical Reachy Mini robot<br>
+         <strong>Option B:</strong> 🖥️ MuJoCo Simulator (free, no hardware!)
+       </p>
+       <a href="https://github.com/shaunx/reachyclaw#readme" class="btn ghost wide" style="margin-top: 1rem;">
+         View installation guide
+       </a>
+     </div>
+   </div>
+ </section>
+
+ <section class="section">
+   <div class="section-header">
+     <h2>Quick start</h2>
+     <p class="intro">Get ReachyClaw running with the simulator</p>
+   </div>
+   <div class="story-card">
+     <div class="architecture-preview">
+       <pre>
+ # Clone ReachyClaw
+ git clone https://github.com/shaunx/reachyclaw
+ cd reachyclaw
+
+ # Create virtual environment
+ python -m venv .venv
+ source .venv/bin/activate
+
+ # Install ReachyClaw + simulator
+ pip install -e .
+ pip install "reachy-mini[mujoco]"
+
+ # Configure (edit with your OpenClaw URL and OpenAI key)
+ cp .env.example .env
+ nano .env
+
+ # Terminal 1: Start simulator
+ reachy-mini-daemon --sim
+
+ # Terminal 2: Run ReachyClaw
+ reachyclaw --gradio
+       </pre>
+     </div>
+   </div>
+ </section>
+
+ <footer class="footer">
+   <p>
+     ReachyClaw — your OpenClaw agent, embodied in Reachy Mini.<br>
+     <strong>Works with physical robot OR simulator!</strong><br><br>
+     Learn more about <a href="https://github.com/openclaw/openclaw">OpenClaw</a>,
+     <a href="https://github.com/pollen-robotics/reachy_mini">Reachy Mini</a>, and
+     <a href="https://huggingface.co/docs/reachy_mini/platforms/simulation/get_started">the Simulator</a>.
+   </p>
+ </footer>
+
+ </body>
+ </html>
openclaw-skill/SKILL.md ADDED
@@ -0,0 +1,109 @@
1
+ ---
2
+ name: reachyclaw
3
+ description: Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain — it controls speech, movement, and vision. Works with physical robot OR simulator!
4
+ ---
5
+
6
+ # ReachyClaw - Robot Body for OpenClaw
7
+
8
+ Give your OpenClaw agent a physical Reachy Mini robot body where OpenClaw is the actual brain.
9
+
10
+ ## Overview
11
+
12
+ ReachyClaw embodies your OpenClaw AI agent in a Reachy Mini robot. Unlike typical setups where GPT-4o is the brain, ReachyClaw routes every message through OpenClaw and lets it control the robot body via action tags.
13
+
14
+ - **Hear**: Listen to voice commands via the robot's microphone
15
+ - **See**: View the world through the robot's camera
16
+ - **Speak**: Respond with natural voice through the robot's speaker
17
+ - **Move**: Control head movements, emotions, and dances from OpenClaw
18
+
19
+ ## Architecture
20
+
21
+ ```
22
+ You speak -> Reachy Mini microphone
23
+ |
24
+ OpenAI Realtime API (STT only)
25
+ |
26
+ OpenClaw (the actual brain)
27
+ |
28
+ Response: "[EMOTION:happy] That's great!"
29
+ |
30
+ ReachyClaw parses actions -> robot moves
31
+ Clean text -> TTS -> robot speaks
32
+ ```
33
+
34
+ ## Requirements
35
+
36
+ ### Option A: Physical Robot
37
+ - [Reachy Mini](https://github.com/pollen-robotics/reachy_mini) robot (Wireless or Lite)
38
+
39
+ ### Option B: Simulator (No Hardware Required!)
40
+ - Any computer with Python 3.11+
41
+ - Install: `pip install "reachy-mini[mujoco]"`
42
+
43
+ ### Software (Both Options)
44
+ - Python 3.11+
45
+ - OpenAI API key with Realtime API access
46
+ - OpenClaw gateway running on your network
47
+
48
+ ## Installation
49
+
50
+ ```bash
51
+ git clone https://github.com/shaunx/reachyclaw
52
+ cd reachyclaw
53
+ pip install -e .
54
+ ```
55
+
56
+ ## Configuration
57
+
58
+ Create a `.env` file:
59
+
60
+ ```bash
61
+ OPENAI_API_KEY=sk-your-key-here
62
+ OPENCLAW_GATEWAY_URL=http://your-host-ip:18789
63
+ OPENCLAW_TOKEN=your-gateway-token
64
+ ```
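Reading these variables in Python can be sketched with the standard library alone (in the real app, `python-dotenv`'s `load_dotenv()` populates the environment from `.env` first). The function name and the fallback gateway URL below are illustrative assumptions:

```python
import os

def load_settings() -> dict[str, str]:
    """Read required and optional settings from the environment.

    In ReachyClaw, load_dotenv() runs before this so .env values are visible here.
    """
    api_key = os.getenv("OPENAI_API_KEY", "")
    if not api_key:
        # Fail fast: nothing works without Realtime API access
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "api_key": api_key,
        "gateway_url": os.getenv("OPENCLAW_GATEWAY_URL", "http://localhost:18789"),
        "token": os.getenv("OPENCLAW_TOKEN", ""),
    }
```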
65
+
66
+ ## Usage
67
+
68
+ ### With Simulator
69
+
70
+ ```bash
71
+ # Terminal 1: Start simulator
72
+ reachy-mini-daemon --sim
73
+
74
+ # Terminal 2: Run ReachyClaw
75
+ reachyclaw --gradio
76
+ ```
77
+
78
+ ### With Physical Robot
79
+
80
+ ```bash
81
+ reachyclaw
82
+
83
+ # With debug logging
84
+ reachyclaw --debug
85
+
86
+ # With Gradio web UI
87
+ reachyclaw --gradio
88
+ ```
89
+
90
+ ## Robot Actions
91
+
92
+ OpenClaw can include these action tags in its responses:
93
+
94
+ - `[LOOK:left/right/up/down/front]` — head movement
95
+ - `[EMOTION:happy/sad/surprised/curious/thinking/confused/excited]` — emotions
96
+ - `[DANCE:happy/excited/wave/nod/shake/bounce]` — dances
97
+ - `[CAMERA]` — capture what the robot sees
98
+ - `[FACE_TRACKING:on/off]` — toggle face tracking
99
+ - `[STOP]` — stop all movements
100
+
101
+ ## Links
102
+
103
+ - [GitHub Repository](https://github.com/shaunx/reachyclaw)
104
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
105
+ - [OpenClaw Documentation](https://docs.openclaw.ai)
106
+
107
+ ## License
108
+
109
+ Apache 2.0
pyproject.toml ADDED
@@ -0,0 +1,120 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "reachyclaw"
7
+ version = "0.1.0"
8
+ description = "ReachyClaw - Give your OpenClaw AI agent a physical robot body with Reachy Mini. OpenClaw is the brain; OpenAI Realtime API handles voice I/O."
9
+ readme = "README.md"
10
+ license = {text = "Apache-2.0"}
11
+ requires-python = ">=3.11"
12
+ authors = [
13
+ {name = "Shaun"}
14
+ ]
15
+ keywords = [
16
+ "reachyclaw",
17
+ "reachy-mini",
18
+ "openclaw",
19
+ "robotics",
20
+ "ai-assistant",
21
+ "openai-realtime",
22
+ "voice-conversation",
23
+ "expressive-robot",
24
+ "embodied-ai"
25
+ ]
26
+ classifiers = [
27
+ "Development Status :: 4 - Beta",
28
+ "Intended Audience :: Developers",
29
+ "License :: OSI Approved :: Apache Software License",
30
+ "Programming Language :: Python :: 3",
31
+ "Programming Language :: Python :: 3.11",
32
+ "Programming Language :: Python :: 3.12",
33
+ "Topic :: Scientific/Engineering :: Artificial Intelligence",
34
+ "Topic :: Scientific/Engineering :: Human Machine Interfaces",
35
+ ]
36
+ dependencies = [
37
+ # OpenAI Realtime API
38
+ "openai>=1.50.0",
39
+
40
+ # Audio streaming
41
+ "fastrtc>=0.0.17",
42
+ "numpy",
43
+ "scipy",
44
+
45
+ # OpenClaw gateway client (WebSocket protocol)
46
+ "websockets>=12.0",
47
+
48
+ # Gradio UI
49
+ "gradio>=4.0.0",
50
+
51
+ # Environment
52
+ "python-dotenv",
53
+ ]
54
+
55
+ # Note: reachy-mini SDK must be installed separately from the robot or GitHub:
56
+ # pip install git+https://github.com/pollen-robotics/reachy_mini.git
57
+ # Or on the robot, it's pre-installed.
58
+
59
+ [project.optional-dependencies]
60
+ wireless = [
61
+ "pygobject",
62
+ ]
63
+ # YOLO-based face tracking (recommended - more accurate)
64
+ yolo_vision = [
65
+ "opencv-python",
66
+ "ultralytics",
67
+ "supervision",
68
+ ]
69
+ # MediaPipe-based face tracking (lighter weight alternative)
70
+ mediapipe_vision = [
71
+ "mediapipe>=0.10.14",
72
+ ]
73
+ # All vision options
74
+ all_vision = [
75
+ "opencv-python",
76
+ "ultralytics",
77
+ "supervision",
78
+ "mediapipe>=0.10.14",
79
+ ]
80
+ # Legacy alias
81
+ vision = [
82
+ "opencv-python",
83
+ "ultralytics",
84
+ "supervision",
85
+ ]
86
+ dev = [
87
+ "pytest",
88
+ "pytest-asyncio",
89
+ "ruff",
90
+ "mypy",
91
+ ]
92
+
93
+ [project.scripts]
94
+ reachyclaw = "reachy_mini_openclaw.main:main"
95
+
96
+ [project.entry-points."reachy_mini_apps"]
97
+ reachyclaw = "reachy_mini_openclaw.main:ReachyClawApp"
98
+
99
+ [project.urls]
100
+ Homepage = "https://github.com/shaunx/reachyclaw"
101
+ Documentation = "https://github.com/shaunx/reachyclaw#readme"
102
+ Repository = "https://github.com/shaunx/reachyclaw"
103
+ Issues = "https://github.com/shaunx/reachyclaw/issues"
104
+
105
+ [tool.setuptools.packages.find]
106
+ where = ["src"]
107
+
108
+ [tool.ruff]
109
+ line-length = 120
110
+ target-version = "py311"
111
+
112
+ [tool.ruff.lint]
113
+ select = ["E", "F", "I", "N", "W", "UP"]
114
+ ignore = ["E501"]
115
+
116
+ [tool.mypy]
117
+ python_version = "3.11"
118
+ warn_return_any = true
119
+ warn_unused_configs = true
120
+ ignore_missing_imports = true
src/reachy_mini_openclaw/__init__.py ADDED
@@ -0,0 +1,9 @@
1
+ """Reachy Mini OpenClaw - Give your OpenClaw AI agent a physical presence.
2
+
3
+ This package combines OpenAI's Realtime API for responsive voice conversation
4
+ with Reachy Mini's expressive robot movements, allowing your OpenClaw agent
5
+ to see, hear, and speak through a physical robot body.
6
+ """
7
+
8
+ __version__ = "0.1.0"
9
+ __author__ = "Shaun"
src/reachy_mini_openclaw/audio/__init__.py ADDED
@@ -0,0 +1,5 @@
1
+ """Audio processing modules for Reachy Mini OpenClaw."""
2
+
3
+ from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
4
+
5
+ __all__ = ["HeadWobbler"]
src/reachy_mini_openclaw/audio/head_wobbler.py ADDED
@@ -0,0 +1,223 @@
1
+ """Audio-driven head movement for natural speech animation.
2
+
3
+ This module analyzes audio output in real-time and generates subtle head
4
+ movements that make the robot appear more expressive and alive while speaking.
5
+
6
+ The wobble is generated based on:
7
+ - Audio amplitude (volume) -> vertical movement
8
+ - Frequency content -> horizontal sway
9
+ - Speech rhythm -> timing of movements
10
+
11
+ Design:
12
+ - Runs in a separate thread to avoid blocking the main audio pipeline
13
+ - Uses a circular buffer for smooth interpolation
14
+ - Generates offsets that are added to the primary pose by MovementManager
15
+ """
16
+
17
+ import base64
18
+ import logging
19
+ import threading
20
+ import time
21
+ from collections import deque
22
+ from typing import Callable, Optional, Tuple
23
+
24
+ import numpy as np
25
+ from numpy.typing import NDArray
26
+
27
+ logger = logging.getLogger(__name__)
28
+
29
+ # Type alias for speech offsets: (x, y, z, roll, pitch, yaw)
30
+ SpeechOffsets = Tuple[float, float, float, float, float, float]
31
+
32
+
33
+ class HeadWobbler:
34
+ """Generate audio-driven head movements for expressive speech.
35
+
36
+ The wobbler analyzes incoming audio and produces subtle head movements
37
+ that are synchronized with speech patterns, making the robot appear
38
+ more natural and engaged during conversation.
39
+
40
+ Example:
41
+ def apply_offsets(offsets):
42
+ movement_manager.set_speech_offsets(offsets)
43
+
44
+ wobbler = HeadWobbler(set_speech_offsets=apply_offsets)
45
+ wobbler.start()
46
+
47
+ # Feed audio as it's played
48
+ wobbler.feed(base64_audio_chunk)
49
+
50
+ wobbler.stop()
51
+ """
52
+
53
+ def __init__(
54
+ self,
55
+ set_speech_offsets: Callable[[SpeechOffsets], None],
56
+ sample_rate: int = 24000,
57
+ update_rate: float = 30.0, # Hz
58
+ ):
59
+ """Initialize the head wobbler.
60
+
61
+ Args:
62
+ set_speech_offsets: Callback to apply offsets to the movement system
63
+ sample_rate: Expected audio sample rate (Hz)
64
+ update_rate: How often to update offsets (Hz)
65
+ """
66
+ self.set_speech_offsets = set_speech_offsets
67
+ self.sample_rate = sample_rate
68
+ self.update_period = 1.0 / update_rate
69
+
70
+ # Audio analysis parameters
71
+ self.amplitude_scale = 0.008 # Max displacement in meters
72
+ self.roll_scale = 0.15 # Max roll in radians
73
+ self.pitch_scale = 0.08 # Max pitch in radians
74
+ self.smoothing = 0.3 # Smoothing factor (0-1)
75
+
76
+ # State
77
+ self._audio_buffer: deque[NDArray[np.float32]] = deque(maxlen=10)
78
+ self._buffer_lock = threading.Lock()
79
+ self._current_amplitude = 0.0
80
+ self._current_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
81
+
82
+ # Thread control
83
+ self._stop_event = threading.Event()
84
+ self._thread: Optional[threading.Thread] = None
85
+ self._last_feed_time = 0.0
86
+ self._is_speaking = False
87
+
88
+ # Decay parameters for smooth return to neutral
89
+ self._decay_rate = 3.0 # How fast to decay when not speaking
90
+ self._speech_timeout = 0.3 # Seconds of silence before decay starts
91
+
92
+ def start(self) -> None:
93
+ """Start the wobbler thread."""
94
+ if self._thread is not None and self._thread.is_alive():
95
+ logger.warning("HeadWobbler already running")
96
+ return
97
+
98
+ self._stop_event.clear()
99
+ self._thread = threading.Thread(target=self._run_loop, daemon=True)
100
+ self._thread.start()
101
+ logger.debug("HeadWobbler started")
102
+
103
+ def stop(self) -> None:
104
+ """Stop the wobbler thread."""
105
+ self._stop_event.set()
106
+ if self._thread is not None:
107
+ self._thread.join(timeout=1.0)
108
+ self._thread = None
109
+
110
+ # Reset to neutral
111
+ self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
112
+ logger.debug("HeadWobbler stopped")
113
+
114
+ def reset(self) -> None:
115
+ """Reset the wobbler state (call when speech ends or is interrupted)."""
116
+ with self._buffer_lock:
117
+ self._audio_buffer.clear()
118
+ self._current_amplitude = 0.0
119
+ self._is_speaking = False
120
+ self.set_speech_offsets((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
121
+ logger.debug("HeadWobbler reset")
122
+
123
+ def feed(self, audio_b64: str) -> None:
124
+ """Feed audio data to the wobbler.
125
+
126
+ Args:
127
+ audio_b64: Base64-encoded PCM audio (int16)
128
+ """
129
+ try:
130
+ audio_bytes = base64.b64decode(audio_b64)
131
+ audio_int16 = np.frombuffer(audio_bytes, dtype=np.int16)
132
+ audio_float = audio_int16.astype(np.float32) / 32768.0
133
+
134
+ with self._buffer_lock:
135
+ self._audio_buffer.append(audio_float)
136
+
137
+ self._last_feed_time = time.monotonic()
138
+ self._is_speaking = True
139
+
140
+ except Exception as e:
141
+ logger.debug("Error feeding audio to wobbler: %s", e)
142
+
143
+ def _compute_amplitude(self) -> float:
144
+ """Compute current audio amplitude from buffer."""
145
+ with self._buffer_lock:
146
+ if not self._audio_buffer:
147
+ return 0.0
148
+
149
+ # Concatenate recent audio
150
+ audio = np.concatenate(list(self._audio_buffer))
151
+
152
+ # RMS amplitude
153
+ rms = np.sqrt(np.mean(audio ** 2))
154
+ return min(1.0, rms * 3.0) # Scale and clamp
155
+
156
+ def _compute_offsets(self, amplitude: float, t: float) -> SpeechOffsets:
157
+ """Compute head offsets based on amplitude and time.
158
+
159
+ Args:
160
+ amplitude: Current audio amplitude (0-1)
161
+ t: Current time for oscillation
162
+
163
+ Returns:
164
+ Tuple of (x, y, z, roll, pitch, yaw) offsets
165
+ """
166
+ if amplitude < 0.01:
167
+ return (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
168
+
169
+ # Vertical bob based on amplitude
170
+ z_offset = amplitude * self.amplitude_scale * np.sin(t * 8.0)
171
+
172
+ # Subtle roll sway
173
+ roll_offset = amplitude * self.roll_scale * np.sin(t * 3.0)
174
+
175
+ # Pitch variation
176
+ pitch_offset = amplitude * self.pitch_scale * np.sin(t * 5.0 + 0.5)
177
+
178
+ # Small yaw drift
179
+ yaw_offset = amplitude * 0.05 * np.sin(t * 2.0)
180
+
181
+ return (0.0, 0.0, z_offset, roll_offset, pitch_offset, yaw_offset)
182
+
183
+ def _run_loop(self) -> None:
184
+ """Main wobbler loop."""
185
+ start_time = time.monotonic()
186
+
187
+ while not self._stop_event.is_set():
188
+ loop_start = time.monotonic()
189
+ t = loop_start - start_time
190
+
191
+ # Check if we're still receiving audio
192
+ silence_duration = loop_start - self._last_feed_time
193
+
194
+ if silence_duration > self._speech_timeout:
195
+ # Decay amplitude when not speaking
196
+ self._current_amplitude *= np.exp(-self._decay_rate * self.update_period)
197
+ self._is_speaking = False
198
+ else:
199
+ # Compute new amplitude with smoothing
200
+ raw_amplitude = self._compute_amplitude()
201
+ self._current_amplitude = (
202
+ self.smoothing * raw_amplitude +
203
+ (1 - self.smoothing) * self._current_amplitude
204
+ )
205
+
206
+ # Compute and apply offsets
207
+ offsets = self._compute_offsets(self._current_amplitude, t)
208
+
209
+ # Smooth transition between offsets
210
+ new_offsets = tuple(
211
+ self.smoothing * new + (1 - self.smoothing) * old
212
+ for new, old in zip(offsets, self._current_offsets)
213
+ )
214
+ self._current_offsets = new_offsets
215
+
216
+ # Apply to movement system
217
+ self.set_speech_offsets(new_offsets)
218
+
219
+ # Maintain update rate
220
+ elapsed = time.monotonic() - loop_start
221
+ sleep_time = max(0.0, self.update_period - elapsed)
222
+ if sleep_time > 0:
223
+ time.sleep(sleep_time)
src/reachy_mini_openclaw/camera_worker.py ADDED
@@ -0,0 +1,382 @@
1
+ """Camera worker thread with frame buffering and face tracking.
2
+
3
+ Provides:
4
+ - ~25Hz camera polling with thread-safe frame buffering
5
+ - Face tracking integration with smooth interpolation
6
+ - Room scanning when no face is detected
7
+ - Latest frame always available for tools
8
+ - Smooth return to neutral when face is lost
9
+
10
+ Based on pollen-robotics/reachy_mini_conversation_app camera worker.
11
+ """
12
+
13
+ import logging
14
+ import threading
15
+ import time
16
+ from typing import Any, List, Tuple, Optional
17
+
18
+ import numpy as np
19
+ from numpy.typing import NDArray
20
+ from scipy.spatial.transform import Rotation as R
21
+
22
+ from reachy_mini import ReachyMini
23
+ from reachy_mini.utils.interpolation import linear_pose_interpolation
24
+
25
+
26
+ logger = logging.getLogger(__name__)
27
+
28
+
29
+ class CameraWorker:
30
+ """Thread-safe camera worker with frame buffering and face tracking.
31
+
32
+ State machine for face tracking:
33
+ SCANNING -- no face known, sweeping the room to find one
34
+ TRACKING -- face detected, following it with head offsets
35
+ WAITING -- face just lost, holding position briefly
36
+ RETURNING -- interpolating back to neutral before scanning again
37
+ """
38
+
39
+ def __init__(self, reachy_mini: ReachyMini, head_tracker: Any = None) -> None:
40
+ """Initialize camera worker.
41
+
42
+ Args:
43
+ reachy_mini: Connected ReachyMini instance
44
+ head_tracker: Optional head tracker (YOLO or MediaPipe)
45
+ """
46
+ self.reachy_mini = reachy_mini
47
+ self.head_tracker = head_tracker
48
+
49
+ # Thread-safe frame storage
50
+ self.latest_frame: Optional[NDArray[np.uint8]] = None
51
+ self.frame_lock = threading.Lock()
52
+ self._stop_event = threading.Event()
53
+ self._thread: Optional[threading.Thread] = None
54
+
55
+ # Face tracking state
56
+ self.is_head_tracking_enabled = True
57
+ self.face_tracking_offsets: List[float] = [
58
+ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
59
+ ] # x, y, z, roll, pitch, yaw
60
+ self.face_tracking_lock = threading.Lock()
61
+
62
+ # Face tracking timing (for smooth interpolation back to neutral)
63
+ self.last_face_detected_time: Optional[float] = None
64
+ self.interpolation_start_time: Optional[float] = None
65
+ self.interpolation_start_pose: Optional[NDArray[np.float32]] = None
66
+ self.face_lost_delay = 2.0 # seconds to wait before starting interpolation
67
+ self.interpolation_duration = 1.0 # seconds to interpolate back to neutral
68
+
69
+ # Track state changes
70
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
71
+
72
+ # Tracking scale factor (proportional gain for the camera-head servo loop).
73
+ # 0.85 provides accurate convergence via closed-loop feedback while
74
+ # avoiding single-frame overshoot that causes jitter.
75
+ self.tracking_scale = 0.85
76
+
77
+ # Smoothing factor for exponential moving average (0.0-1.0)
78
+ # At 25Hz with alpha=0.25, 95% convergence ~0.5s -- smooth enough to
79
+ # filter detection noise, responsive enough to feel like eye contact.
80
+ self.smoothing_alpha = 0.25
81
+
82
+ # Previous smoothed offsets for EMA calculation
83
+ self._smoothed_offsets: List[float] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
84
+
85
+ # --- Room scanning state ---
86
+ # When no face is visible, the robot periodically sweeps the room.
87
+ self._scanning = False
88
+ self._scanning_start_time = 0.0
89
+ # Scanning pattern: sinusoidal yaw sweep
90
+ self._scan_yaw_amplitude = np.deg2rad(35) # ±35 degrees
91
+ self._scan_period = 8.0 # seconds for a full left-right-left cycle
92
+ self._scan_pitch_offset = np.deg2rad(3) # slight upward tilt while scanning
93
+ # Start scanning immediately at boot (before any face has ever been seen)
94
+ self._ever_seen_face = False
95
+
96
+ def get_latest_frame(self) -> Optional[NDArray[np.uint8]]:
97
+ """Get the latest frame (thread-safe).
98
+
99
+ Returns:
100
+ Copy of latest frame in BGR format, or None if no frame available
101
+ """
102
+ with self.frame_lock:
103
+ if self.latest_frame is None:
104
+ return None
105
+ return self.latest_frame.copy()
106
+
107
+ def get_face_tracking_offsets(
108
+ self,
109
+ ) -> Tuple[float, float, float, float, float, float]:
110
+ """Get current face tracking offsets (thread-safe).
111
+
112
+ Returns:
113
+ Tuple of (x, y, z, roll, pitch, yaw) offsets
114
+ """
115
+ with self.face_tracking_lock:
116
+ offsets = self.face_tracking_offsets
117
+ return (offsets[0], offsets[1], offsets[2], offsets[3], offsets[4], offsets[5])
118
+
119
+ def set_head_tracking_enabled(self, enabled: bool) -> None:
120
+ """Enable/disable head tracking.
121
+
122
+ Args:
123
+ enabled: Whether to enable face tracking
124
+ """
125
+ if enabled and not self.is_head_tracking_enabled:
126
+ # Reset smoothed offsets so tracking converges quickly from scratch
127
+ self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
128
+ # Start scanning immediately when re-enabled
129
+ self._start_scanning()
130
+ self.is_head_tracking_enabled = enabled
131
+ logger.info("Head tracking %s", "enabled" if enabled else "disabled")
132
+
133
+ def start(self) -> None:
134
+ """Start the camera worker loop in a thread."""
135
+ self._stop_event.clear()
136
+ self._thread = threading.Thread(target=self._working_loop, daemon=True)
137
+ self._thread.start()
138
+ logger.info("Camera worker started")
139
+
140
+ def stop(self) -> None:
141
+ """Stop the camera worker loop."""
142
+ self._stop_event.set()
143
+ if self._thread is not None:
144
+ self._thread.join(timeout=2.0)
145
+ logger.info("Camera worker stopped")
146
+
147
+ # ------------------------------------------------------------------
148
+ # Scanning helpers
149
+ # ------------------------------------------------------------------
150
+
151
+ def _start_scanning(self) -> None:
152
+ """Begin the room-scanning sweep."""
153
+ if not self._scanning:
154
+ self._scanning = True
155
+ self._scanning_start_time = time.time()
156
+ logger.debug("Started room scanning")
157
+
158
+ def _stop_scanning(self) -> None:
159
+ """Stop the room-scanning sweep."""
160
+ if self._scanning:
161
+ self._scanning = False
162
+ logger.debug("Stopped room scanning")
163
+
164
+ def _update_scanning_offsets(self, current_time: float) -> None:
165
+ """Compute scanning offsets -- a slow yaw sweep with slight pitch up.
166
+
167
+ The sweep is sinusoidal so the head slows at the extremes (more natural)
168
+ and the face detector gets a chance to catch faces at the edges.
169
+ """
170
+ t = current_time - self._scanning_start_time
171
+
172
+ yaw = float(self._scan_yaw_amplitude * np.sin(2 * np.pi * t / self._scan_period))
173
+ pitch = float(self._scan_pitch_offset)
174
+
175
+ with self.face_tracking_lock:
176
+ self.face_tracking_offsets = [0.0, 0.0, 0.0, 0.0, pitch, yaw]
177
+
178
+ # ------------------------------------------------------------------
179
+ # Main loop
180
+ # ------------------------------------------------------------------
181
+
182
+ def _working_loop(self) -> None:
183
+ """Main camera worker loop.
184
+
185
+ Runs at ~25Hz, captures frames and processes face tracking.
186
+ """
187
+ logger.debug("Starting camera working loop")
188
+
189
+ # Neutral pose for interpolation target
190
+ neutral_pose = np.eye(4, dtype=np.float32)
191
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
192
+
193
+ # Begin scanning right away so the robot looks for a face on startup
194
+ if self.is_head_tracking_enabled and self.head_tracker is not None:
195
+ self._start_scanning()
196
+
197
+ while not self._stop_event.is_set():
198
+ try:
199
+ current_time = time.time()
200
+
201
+ # Get frame from robot
202
+ frame = self.reachy_mini.media.get_frame()
203
+
204
+ if frame is not None:
205
+ # Thread-safe frame storage
206
+ with self.frame_lock:
207
+ self.latest_frame = frame
208
+
209
+ # Check if face tracking was just disabled
210
+ if self.previous_head_tracking_state and not self.is_head_tracking_enabled:
211
+ # Face tracking was just disabled - start interpolation to neutral
212
+ self.last_face_detected_time = current_time
213
+ self.interpolation_start_time = None
214
+ self.interpolation_start_pose = None
215
+ self._stop_scanning()
216
+
217
+ # Update tracking state
218
+ self.previous_head_tracking_state = self.is_head_tracking_enabled
219
+
220
+ # Handle face tracking if enabled and head tracker available
221
+ if self.is_head_tracking_enabled and self.head_tracker is not None:
222
+ self._process_face_tracking(frame, current_time, neutral_pose)
223
+ elif self.last_face_detected_time is not None:
224
+ # Handle interpolation back to neutral when tracking disabled
225
+ self._interpolate_to_neutral(current_time, neutral_pose)
226
+
227
+ # Sleep to maintain ~25Hz
228
+ time.sleep(0.04)
229
+
230
+ except Exception as e:
231
+ logger.error("Camera worker error: %s", e)
232
+ time.sleep(0.1)
233
+
234
+ logger.debug("Camera worker thread exited")
235
+
236
+ def _process_face_tracking(
237
+ self,
238
+ frame: NDArray[np.uint8],
239
+ current_time: float,
240
+ neutral_pose: NDArray[np.float32]
241
+ ) -> None:
242
+ """Process face tracking from frame.
243
+
244
+ Args:
245
+ frame: Current camera frame
246
+ current_time: Current timestamp
247
+ neutral_pose: Neutral pose matrix for interpolation
248
+ """
249
+ eye_center, _ = self.head_tracker.get_head_position(frame)
250
+
251
+ if eye_center is not None:
252
+ # Face detected!
253
+ if not self._ever_seen_face:
254
+ self._ever_seen_face = True
255
+ logger.info("Face detected for the first time")
256
+
257
+ # Stop scanning if we were scanning
258
+ if self._scanning:
259
+ self._stop_scanning()
260
+ # Seed the EMA from current scanning offsets for smooth transition
261
+ with self.face_tracking_lock:
262
+ self._smoothed_offsets = list(self.face_tracking_offsets)
263
+
264
+ self.last_face_detected_time = current_time
265
+ self.interpolation_start_time = None # Stop any interpolation
266
+
267
+ # Convert normalized coordinates to pixel coordinates
268
+ h, w = frame.shape[:2]
269
+ eye_center_norm = (eye_center + 1) / 2
270
+ eye_center_pixels = [
271
+ eye_center_norm[0] * w,
272
+ eye_center_norm[1] * h,
273
+ ]
274
+
275
+ # Get the head pose needed to look at the target
276
+ target_pose = self.reachy_mini.look_at_image(
277
+ eye_center_pixels[0],
278
+ eye_center_pixels[1],
279
+ duration=0.0,
280
+ perform_movement=False,
281
+ )
282
+
283
+ # Extract translation and rotation from the target pose
284
+ translation = target_pose[:3, 3]
285
+ rotation = R.from_matrix(target_pose[:3, :3]).as_euler("xyz", degrees=False)
286
+
287
+ # Scale for smoother closed-loop convergence
288
+ translation *= self.tracking_scale
289
+ rotation *= self.tracking_scale
290
+
291
+ # Apply exponential moving average (EMA) smoothing to reduce jitter
292
+ # new_smoothed = alpha * new_value + (1 - alpha) * old_smoothed
293
+ alpha = self.smoothing_alpha
294
+ new_offsets = [
295
+ translation[0], translation[1], translation[2],
296
+ rotation[0], rotation[1], rotation[2],
297
+ ]
298
+
299
+ smoothed = [
300
+ alpha * new_offsets[i] + (1 - alpha) * self._smoothed_offsets[i]
301
+ for i in range(6)
302
+ ]
303
+ self._smoothed_offsets = smoothed
304
+
305
+ # Thread-safe update of face tracking offsets
306
+ with self.face_tracking_lock:
307
+ self.face_tracking_offsets = smoothed
308
+
309
+ else:
310
+ # No face detected
311
+ if self._scanning:
312
+ # Already scanning -- keep sweeping the room
313
+ self._update_scanning_offsets(current_time)
314
+ else:
315
+ # Not scanning yet -- go through the wait/return/scan sequence
316
+ self._interpolate_to_neutral(current_time, neutral_pose)
317
+
318
+ def _interpolate_to_neutral(
319
+ self,
320
+ current_time: float,
321
+ neutral_pose: NDArray[np.float32]
322
+ ) -> None:
323
+ """Interpolate face tracking offsets back to neutral when face is lost.
324
+
325
+ Once interpolation completes, automatically starts room scanning.
326
+
327
+ Args:
328
+ current_time: Current timestamp
329
+ neutral_pose: Target neutral pose matrix
330
+ """
331
+ if self.last_face_detected_time is None:
332
+ # Never seen a face -- go straight to scanning
333
+ self._start_scanning()
334
+ return
335
+
336
+ time_since_face_lost = current_time - self.last_face_detected_time
337
+
338
+ if time_since_face_lost >= self.face_lost_delay:
339
+ # Start interpolation if not already started
340
+ if self.interpolation_start_time is None:
341
+ self.interpolation_start_time = current_time
342
+ # Capture current pose as start of interpolation
343
+ with self.face_tracking_lock:
344
+ current_translation = self.face_tracking_offsets[:3]
345
+ current_rotation_euler = self.face_tracking_offsets[3:]
346
+ # Convert to 4x4 pose matrix
347
+ pose_matrix = np.eye(4, dtype=np.float32)
348
+ pose_matrix[:3, 3] = current_translation
349
+ pose_matrix[:3, :3] = R.from_euler(
350
+ "xyz", current_rotation_euler
351
+ ).as_matrix()
352
+ self.interpolation_start_pose = pose_matrix
353
+
354
+ # Calculate interpolation progress (t from 0 to 1)
355
+ elapsed_interpolation = current_time - self.interpolation_start_time
356
+ t = min(1.0, elapsed_interpolation / self.interpolation_duration)
357
+
358
+ # Interpolate between current pose and neutral pose
359
+ interpolated_pose = linear_pose_interpolation(
360
+ self.interpolation_start_pose,
361
+ neutral_pose,
362
+ t,
363
+ )
364
+
365
+ # Extract translation and rotation from interpolated pose
366
+ translation = interpolated_pose[:3, 3]
367
+ rotation = R.from_matrix(interpolated_pose[:3, :3]).as_euler("xyz", degrees=False)
368
+
369
+ # Thread-safe update of face tracking offsets
370
+ with self.face_tracking_lock:
371
+ self.face_tracking_offsets = [
372
+ translation[0], translation[1], translation[2],
373
+ rotation[0], rotation[1], rotation[2],
374
+ ]
375
+
376
+ # If interpolation is complete, start scanning the room
377
+ if t >= 1.0:
378
+ self.last_face_detected_time = None
379
+ self.interpolation_start_time = None
380
+ self.interpolation_start_pose = None
381
+ self._smoothed_offsets = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
382
+ self._start_scanning()
src/reachy_mini_openclaw/config.py ADDED
@@ -0,0 +1,84 @@
1
+ """Configuration management for Reachy Mini OpenClaw.
2
+
3
+ Handles environment variables and configuration settings for the application.
4
+ """
5
+
6
+ import os
7
+ from pathlib import Path
8
+ from dataclasses import dataclass, field
9
+ from typing import Optional
10
+
11
+ from dotenv import load_dotenv
12
+
13
+ # Load environment variables from .env file
14
+ _project_root = Path(__file__).parent.parent.parent
15
+ load_dotenv(_project_root / ".env")
16
+
17
+
18
+ @dataclass
19
+ class Config:
20
+ """Application configuration loaded from environment variables."""
21
+
22
+ # OpenAI Configuration
23
+ OPENAI_API_KEY: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
24
+ OPENAI_MODEL: str = field(default_factory=lambda: os.getenv("OPENAI_MODEL", "gpt-4o-realtime-preview-2024-12-17"))
25
+ OPENAI_VOICE: str = field(default_factory=lambda: os.getenv("OPENAI_VOICE", "cedar"))
26
+
27
+ # OpenClaw Gateway Configuration
28
+ OPENCLAW_GATEWAY_URL: str = field(default_factory=lambda: os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"))
29
+ OPENCLAW_TOKEN: Optional[str] = field(default_factory=lambda: os.getenv("OPENCLAW_TOKEN"))
30
+ OPENCLAW_AGENT_ID: str = field(default_factory=lambda: os.getenv("OPENCLAW_AGENT_ID", "main"))
31
+ # Session key for OpenClaw - uses "main" to share context with WhatsApp and other channels
32
+ # Format: agent:<agent_id>:<session_key>, but we only need the session key part here
33
+ OPENCLAW_SESSION_KEY: str = field(default_factory=lambda: os.getenv("OPENCLAW_SESSION_KEY", "main"))
34
+
35
+ # Robot Configuration
36
+ ROBOT_NAME: Optional[str] = field(default_factory=lambda: os.getenv("ROBOT_NAME"))
37
+
38
+ # Feature Flags
39
+ ENABLE_OPENCLAW_TOOLS: bool = field(default_factory=lambda: os.getenv("ENABLE_OPENCLAW_TOOLS", "true").lower() == "true")
40
+ ENABLE_CAMERA: bool = field(default_factory=lambda: os.getenv("ENABLE_CAMERA", "true").lower() == "true")
41
+ ENABLE_FACE_TRACKING: bool = field(default_factory=lambda: os.getenv("ENABLE_FACE_TRACKING", "true").lower() == "true")
42
+
43
+ # Face Tracking Configuration
44
+ # Options: "yolo", "mediapipe", or None for auto-detect
45
+ HEAD_TRACKER_TYPE: Optional[str] = field(default_factory=lambda: os.getenv("HEAD_TRACKER_TYPE", "yolo"))
46
+
47
+ # Local Vision Processing
48
+ ENABLE_LOCAL_VISION: bool = field(default_factory=lambda: os.getenv("ENABLE_LOCAL_VISION", "false").lower() == "true")
49
+ LOCAL_VISION_MODEL: str = field(default_factory=lambda: os.getenv("LOCAL_VISION_MODEL", "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"))
50
+ VISION_DEVICE: str = field(default_factory=lambda: os.getenv("VISION_DEVICE", "auto")) # "auto", "cuda", "mps", "cpu"
51
+ HF_HOME: str = field(default_factory=lambda: os.getenv("HF_HOME", os.path.expanduser("~/.cache/huggingface")))
52
+
53
+ # Custom Profile (for personality customization)
54
+ CUSTOM_PROFILE: Optional[str] = field(default_factory=lambda: os.getenv("REACHY_MINI_CUSTOM_PROFILE"))
55
+
56
+ def validate(self) -> list[str]:
57
+ """Validate configuration and return list of errors."""
58
+ errors = []
59
+ if not self.OPENAI_API_KEY:
60
+ errors.append("OPENAI_API_KEY is required")
61
+ return errors
62
+
63
+
64
+ # Global configuration instance
65
+ config = Config()
66
+
67
+
68
+ def set_custom_profile(profile: Optional[str]) -> None:
69
+ """Update the custom profile at runtime."""
70
+ global config
71
+ config.CUSTOM_PROFILE = profile
72
+ os.environ["REACHY_MINI_CUSTOM_PROFILE"] = profile or ""
73
+
74
+
75
+ def set_face_tracking_enabled(enabled: bool) -> None:
76
+ """Enable or disable face tracking at runtime."""
77
+ global config
78
+ config.ENABLE_FACE_TRACKING = enabled
79
+
80
+
81
+ def set_local_vision_enabled(enabled: bool) -> None:
82
+ """Enable or disable local vision processing at runtime."""
83
+ global config
84
+ config.ENABLE_LOCAL_VISION = enabled
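Each field above reads its environment variable through a `default_factory`, so values are captured when `Config()` is instantiated rather than at class definition. A trimmed-down, runnable illustration of the same pattern (the class and values here are hypothetical, not the real `Config`):

```python
import os
from dataclasses import dataclass, field


@dataclass
class MiniConfig:
    # Same env-reading default_factory pattern as Config above
    OPENAI_API_KEY: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    OPENCLAW_AGENT_ID: str = field(default_factory=lambda: os.getenv("OPENCLAW_AGENT_ID", "main"))

    def validate(self) -> list[str]:
        """Return a list of error messages, empty when the config is usable."""
        return ["OPENAI_API_KEY is required"] if not self.OPENAI_API_KEY else []


os.environ["OPENCLAW_AGENT_ID"] = "kitchen-bot"  # hypothetical value
os.environ.pop("OPENAI_API_KEY", None)           # simulate a missing key
cfg = MiniConfig()
```

With the environment set up this way, `cfg.OPENCLAW_AGENT_ID` is `"kitchen-bot"` and `cfg.validate()` reports the missing API key.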
src/reachy_mini_openclaw/gradio_app.py ADDED
@@ -0,0 +1,202 @@
+ """Gradio web UI for Reachy Mini OpenClaw.
2
+
3
+ This module provides a web interface for:
4
+ - Viewing conversation transcripts
5
+ - Configuring the assistant personality
6
+ - Monitoring robot status
7
+ - Manual control options
8
+ """
9
+
10
+ import os
11
+ import logging
12
+ from typing import Optional
13
+
14
+ import gradio as gr
15
+
16
+ logger = logging.getLogger(__name__)
17
+
18
+
19
+ def launch_gradio(
20
+ gateway_url: str = "ws://localhost:18789",
21
+ robot_name: Optional[str] = None,
22
+ enable_camera: bool = True,
23
+ enable_openclaw: bool = True,
24
+ enable_face_tracking: bool = True,
25
+ head_tracker_type: Optional[str] = None,
26
+ share: bool = False,
27
+ ) -> None:
28
+ """Launch the Gradio web UI.
29
+
30
+ Args:
31
+ gateway_url: OpenClaw gateway URL
32
+ robot_name: Robot name for connection
33
+ enable_camera: Whether to enable camera
34
+ enable_openclaw: Whether to enable OpenClaw
35
+ enable_face_tracking: Whether to enable face tracking
36
+ head_tracker_type: Head tracker type ('yolo', 'mediapipe', or None)
37
+ share: Whether to create a public URL
38
+ """
39
+ from reachy_mini_openclaw.prompts import get_available_profiles, save_custom_profile
40
+ from reachy_mini_openclaw.config import set_custom_profile, config
41
+
42
+ # State
43
+ app_instance = None
44
+
45
+ def start_conversation():
46
+ """Start the conversation."""
47
+ nonlocal app_instance
48
+
49
+ from reachy_mini_openclaw.main import ReachyClawCore
50
+ import asyncio
51
+ import threading
52
+
53
+ if app_instance is not None:
54
+ return "Already running"
55
+
56
+ try:
57
+ app_instance = ReachyClawCore(
58
+ gateway_url=gateway_url,
59
+ robot_name=robot_name,
60
+ enable_camera=enable_camera,
61
+ enable_openclaw=enable_openclaw,
62
+ enable_face_tracking=enable_face_tracking,
63
+ head_tracker_type=head_tracker_type,
64
+ )
65
+
66
+ # Run in background thread
67
+ def run_app():
68
+ loop = asyncio.new_event_loop()
69
+ asyncio.set_event_loop(loop)
70
+ try:
71
+ loop.run_until_complete(app_instance.run())
72
+ except Exception as e:
73
+ logger.error("App error: %s", e)
74
+ finally:
75
+ loop.close()
76
+
77
+ thread = threading.Thread(target=run_app, daemon=True)
78
+ thread.start()
79
+
80
+ return "Started successfully"
81
+ except Exception as e:
82
+ return f"Error: {e}"
83
+
84
+ def stop_conversation():
85
+ """Stop the conversation."""
86
+ nonlocal app_instance
87
+
88
+ if app_instance is None:
89
+ return "Not running"
90
+
91
+ try:
92
+ app_instance.stop()
93
+ app_instance = None
94
+ return "Stopped"
95
+ except Exception as e:
96
+ return f"Error: {e}"
97
+
98
+ def apply_profile(profile_name):
99
+ """Apply a personality profile."""
100
+ set_custom_profile(profile_name if profile_name else None)
101
+ return f"Applied profile: {profile_name or 'default'}"
102
+
103
+ def save_profile(name, instructions):
104
+ """Save a new profile."""
105
+ if save_custom_profile(name, instructions):
106
+ return f"Saved profile: {name}"
107
+ return "Error saving profile"
108
+
109
+ # Build UI
110
+ with gr.Blocks(title="Reachy Mini OpenClaw") as demo:
111
+ gr.Markdown("""
112
+ # 🤖 Reachy Mini OpenClaw
113
+
114
+ Give your OpenClaw AI agent a physical presence with Reachy Mini.
115
+ Using OpenAI Realtime API for responsive voice conversation.
116
+ """)
117
+
118
+ with gr.Tab("Conversation"):
119
+ with gr.Row():
120
+ start_btn = gr.Button("▶️ Start", variant="primary")
121
+ stop_btn = gr.Button("⏹️ Stop", variant="secondary")
122
+
123
+ status_text = gr.Textbox(label="Status", interactive=False)
124
+
125
+ transcript = gr.Chatbot(label="Conversation", height=400)
126
+
127
+ start_btn.click(start_conversation, outputs=[status_text])
128
+ stop_btn.click(stop_conversation, outputs=[status_text])
129
+
130
+ with gr.Tab("Personality"):
131
+ profiles = get_available_profiles()
132
+ profile_dropdown = gr.Dropdown(
133
+ choices=[""] + profiles,
134
+ label="Select Profile",
135
+ value=""
136
+ )
137
+ apply_btn = gr.Button("Apply Profile")
138
+ profile_status = gr.Textbox(label="Status", interactive=False)
139
+
140
+ apply_btn.click(
141
+ apply_profile,
142
+ inputs=[profile_dropdown],
143
+ outputs=[profile_status]
144
+ )
145
+
146
+ gr.Markdown("### Create New Profile")
147
+ new_name = gr.Textbox(label="Profile Name")
148
+ new_instructions = gr.Textbox(
149
+ label="Instructions",
150
+ lines=10,
151
+ placeholder="Enter the system prompt for this personality..."
152
+ )
153
+ save_btn = gr.Button("Save Profile")
154
+ save_status = gr.Textbox(label="Save Status", interactive=False)
155
+
156
+ save_btn.click(
157
+ save_profile,
158
+ inputs=[new_name, new_instructions],
159
+ outputs=[save_status]
160
+ )
161
+
162
+ with gr.Tab("Settings"):
163
+ gr.Markdown(f"""
164
+ ### Current Configuration
165
+
166
+ - **OpenClaw Gateway**: {gateway_url}
167
+ - **OpenAI Model**: {config.OPENAI_MODEL}
168
+ - **Voice**: {config.OPENAI_VOICE}
169
+ - **Camera Enabled**: {enable_camera}
170
+ - **OpenClaw Enabled**: {enable_openclaw}
171
+ - **Face Tracking**: {enable_face_tracking}
172
+ - **Head Tracker**: {head_tracker_type or 'auto-detect'}
173
+
174
+ Edit `.env` file to change these settings.
175
+ """)
176
+
177
+ with gr.Tab("About"):
178
+ gr.Markdown("""
179
+ ## About Reachy Mini OpenClaw
180
+
181
+ This application combines:
182
+
183
+ - **OpenAI Realtime API** for ultra-low-latency voice conversation
184
+ - **OpenClaw Gateway** for extended AI capabilities (web, calendar, smart home, etc.)
185
+ - **Reachy Mini Robot** for physical embodiment with expressive movements
186
+
187
+ ### Features
188
+
189
+ - 🎤 Real-time voice conversation
190
+ - 👀 Camera-based vision
191
+ - 💃 Expressive robot movements
192
+ - 🔧 Tool integration via OpenClaw
193
+ - 🎭 Customizable personalities
194
+
195
+ ### Links
196
+
197
+ - [Reachy Mini SDK](https://github.com/pollen-robotics/reachy_mini)
198
+ - [OpenClaw](https://github.com/openclaw/openclaw)
199
+ - [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime)
200
+ """)
201
+
202
+ demo.launch(share=share, server_name="0.0.0.0", server_port=7860)
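`start_conversation` runs the async app on a fresh event loop inside a daemon thread, so the Gradio server stays responsive while the robot loop runs. The pattern in isolation (helper and task names are illustrative):

```python
import asyncio
import threading


def run_in_background(coro_factory):
    """Run an async entry point on its own event loop in a daemon thread."""
    result = {}

    def runner():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            result["value"] = loop.run_until_complete(coro_factory())
        finally:
            loop.close()

    t = threading.Thread(target=runner, daemon=True)
    t.start()
    return t, result


async def demo_task():
    await asyncio.sleep(0.01)
    return "done"


thread, out = run_in_background(demo_task)
thread.join(timeout=2)
```

The daemon flag mirrors the app's choice: if the UI process exits, the background loop dies with it instead of keeping the process alive.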
src/reachy_mini_openclaw/main.py ADDED
@@ -0,0 +1,591 @@
+ """ReachyClaw - Give your OpenClaw AI agent a physical robot body.
2
+
3
+ This module provides the main application that connects:
4
+ - OpenAI Realtime API for voice I/O (speech recognition + TTS)
5
+ - OpenClaw Gateway for AI intelligence (the actual brain)
6
+ - Reachy Mini robot for physical embodiment
7
+
8
+ Usage:
9
+ # Console mode (direct audio)
10
+ reachyclaw
11
+
12
+ # With Gradio UI
13
+ reachyclaw --gradio
14
+
15
+ # With debug logging
16
+ reachyclaw --debug
17
+ """
18
+
19
+ import os
20
+ import sys
21
+ import time
22
+ import asyncio
23
+ import logging
24
+ import argparse
25
+ import threading
26
+ from pathlib import Path
27
+ from typing import Any, Optional
28
+
29
+ from dotenv import load_dotenv
30
+
31
+ # Load environment from project root (override=True ensures .env takes precedence)
32
+ _project_root = Path(__file__).parent.parent.parent
33
+ load_dotenv(_project_root / ".env", override=True)
34
+
35
+ logger = logging.getLogger(__name__)
36
+
37
+
38
+ def setup_logging(debug: bool = False) -> None:
39
+ """Configure logging for the application.
40
+
41
+ Args:
42
+ debug: Enable debug level logging
43
+ """
44
+ level = logging.DEBUG if debug else logging.INFO
45
+ logging.basicConfig(
46
+ level=level,
47
+ format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
48
+ datefmt="%H:%M:%S",
49
+ )
50
+
51
+ # Reduce noise from libraries
52
+ if not debug:
53
+ logging.getLogger("httpx").setLevel(logging.WARNING)
54
+ logging.getLogger("websockets").setLevel(logging.WARNING)
55
+ logging.getLogger("openai").setLevel(logging.WARNING)
56
+
57
+
58
+ def parse_args() -> argparse.Namespace:
59
+ """Parse command line arguments.
60
+
61
+ Returns:
62
+ Parsed arguments namespace
63
+ """
64
+ parser = argparse.ArgumentParser(
65
+ description="ReachyClaw - Give your OpenClaw AI agent a physical robot body",
66
+ formatter_class=argparse.RawDescriptionHelpFormatter,
67
+ epilog="""
68
+ Examples:
69
+ # Run in console mode
70
+ reachyclaw
71
+
72
+ # Run with Gradio web UI
73
+ reachyclaw --gradio
74
+
75
+ # Connect to specific robot
76
+ reachyclaw --robot-name my-reachy
77
+
78
+ # Use different OpenClaw gateway
79
+ reachyclaw --gateway-url http://192.168.1.100:18790
80
+ """
81
+ )
82
+
83
+ parser.add_argument(
84
+ "--debug",
85
+ action="store_true",
86
+ help="Enable debug logging"
87
+ )
88
+ parser.add_argument(
89
+ "--gradio",
90
+ action="store_true",
91
+ help="Launch Gradio web UI instead of console mode"
92
+ )
93
+ parser.add_argument(
94
+ "--robot-name",
95
+ type=str,
96
+ help="Robot name for connection (default: auto-discover)"
97
+ )
98
+ parser.add_argument(
99
+ "--gateway-url",
100
+ type=str,
101
+ default=os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789"),
102
+ help="OpenClaw gateway URL (from OPENCLAW_GATEWAY_URL env or default)"
103
+ )
104
+ parser.add_argument(
105
+ "--no-camera",
106
+ action="store_true",
107
+ help="Disable camera functionality"
108
+ )
109
+ parser.add_argument(
110
+ "--no-openclaw",
111
+ action="store_true",
112
+ help="Disable OpenClaw integration"
113
+ )
114
+ parser.add_argument(
115
+ "--no-face-tracking",
116
+ action="store_true",
117
+ help="Disable face tracking"
118
+ )
119
+ parser.add_argument(
120
+ "--local-vision",
121
+ action="store_true",
122
+ help="Enable local vision processing with SmolVLM2"
123
+ )
124
+ parser.add_argument(
125
+ "--profile",
126
+ type=str,
127
+ help="Custom personality profile to use"
128
+ )
129
+
130
+ return parser.parse_args()
131
+
132
+
133
+ class ReachyClawCore:
134
+ """ReachyClaw core application controller.
135
+
136
+ This class orchestrates all components:
137
+ - Reachy Mini robot connection and movement control
138
+ - OpenAI Realtime API for voice I/O
139
+ - OpenClaw gateway bridge for AI intelligence
140
+ - Audio input/output loops
141
+ """
142
+
143
+ def __init__(
144
+ self,
145
+ gateway_url: str = "ws://localhost:18789",
146
+ robot_name: Optional[str] = None,
147
+ enable_camera: bool = True,
148
+ enable_openclaw: bool = True,
149
+ robot: Optional["ReachyMini"] = None,
150
+ external_stop_event: Optional[threading.Event] = None,
151
+ ):
152
+ """Initialize the application.
153
+
154
+ Args:
155
+ gateway_url: OpenClaw gateway URL
156
+ robot_name: Optional robot name for connection
157
+ enable_camera: Whether to enable camera functionality
158
+ enable_openclaw: Whether to enable OpenClaw integration
159
+ robot: Optional pre-initialized robot (for app framework)
160
+ external_stop_event: Optional external stop event
161
+ """
162
+ from reachy_mini import ReachyMini
163
+ from reachy_mini_openclaw.config import config
164
+ from reachy_mini_openclaw.moves import MovementManager
165
+ from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
166
+ from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge
167
+ from reachy_mini_openclaw.tools.core_tools import ToolDependencies
168
+ from reachy_mini_openclaw.openai_realtime import OpenAIRealtimeHandler
169
+
170
+ self.gateway_url = gateway_url
171
+ self._external_stop_event = external_stop_event
172
+ self._owns_robot = robot is None
173
+
174
+ # Validate configuration
175
+ errors = config.validate()
176
+ if errors:
177
+ for error in errors:
178
+ logger.error("Config error: %s", error)
179
+ sys.exit(1)
180
+
181
+ # Connect to robot
182
+ if robot is not None:
183
+ self.robot = robot
184
+ logger.info("Using provided Reachy Mini instance")
185
+ else:
186
+ logger.info("Connecting to Reachy Mini...")
187
+ robot_kwargs = {}
188
+ if robot_name:
189
+ robot_kwargs["robot_name"] = robot_name
190
+
191
+ try:
192
+ self.robot = ReachyMini(**robot_kwargs)
193
+ except TimeoutError as e:
194
+ logger.error("Connection timeout: %s", e)
195
+ logger.error("Check that the robot is powered on and reachable.")
196
+ sys.exit(1)
197
+ except Exception as e:
198
+ logger.error("Robot connection failed: %s", e)
199
+ sys.exit(1)
200
+
201
+ logger.info("Connected to robot: %s", self.robot.client.get_status())
202
+
203
+ # Initialize movement system
204
+ logger.info("Initializing movement system...")
205
+ self.movement_manager = MovementManager(current_robot=self.robot)
206
+ self.head_wobbler = HeadWobbler(
207
+ set_speech_offsets=self.movement_manager.set_speech_offsets
208
+ )
209
+
210
+ # Initialize OpenClaw bridge
211
+ self.openclaw_bridge = None
212
+ if enable_openclaw:
213
+ logger.info("Initializing OpenClaw bridge...")
214
+ self.openclaw_bridge = OpenClawBridge(
215
+ gateway_url=gateway_url,
216
+ gateway_token=config.OPENCLAW_TOKEN,
217
+ )
218
+
219
+ # Camera worker for video streaming and frame capture
220
+ self.camera_worker = None
221
+ self.head_tracker = None
222
+ self.vision_manager = None
223
+
224
+ if enable_camera:
225
+ logger.info("Initializing camera worker...")
226
+ from reachy_mini_openclaw.camera_worker import CameraWorker
227
+
228
+ # Initialize head tracker for local face tracking
229
+ if config.ENABLE_FACE_TRACKING:
230
+ self.head_tracker = self._initialize_head_tracker(config.HEAD_TRACKER_TYPE)
231
+
232
+ # Initialize camera worker with head tracker
233
+ self.camera_worker = CameraWorker(
234
+ reachy_mini=self.robot,
235
+ head_tracker=self.head_tracker,
236
+ )
237
+
238
+ # Enable/disable head tracking based on whether we have a tracker
239
+ self.camera_worker.set_head_tracking_enabled(self.head_tracker is not None)
240
+
241
+ # Initialize local vision processor if enabled
242
+ if config.ENABLE_LOCAL_VISION:
243
+ self.vision_manager = self._initialize_vision_manager()
244
+
245
+ # Create tool dependencies
246
+ self.deps = ToolDependencies(
247
+ movement_manager=self.movement_manager,
248
+ head_wobbler=self.head_wobbler,
249
+ robot=self.robot,
250
+ camera_worker=self.camera_worker,
251
+ openclaw_bridge=self.openclaw_bridge,
252
+ vision_manager=self.vision_manager,
253
+ )
254
+
255
+ # Initialize OpenAI Realtime handler with OpenClaw bridge
256
+ self.handler = OpenAIRealtimeHandler(
257
+ deps=self.deps,
258
+ openclaw_bridge=self.openclaw_bridge,
259
+ )
260
+
261
+ # State
262
+ self._stop_event = asyncio.Event()
263
+ self._tasks: list[asyncio.Task] = []
264
+
265
+ def _initialize_vision_manager(self) -> Optional[Any]:
266
+ """Initialize local vision processor (SmolVLM2).
267
+
268
+ Returns:
269
+ VisionManager instance or None if initialization fails
270
+ """
271
+ if self.camera_worker is None:
272
+ logger.warning("Cannot initialize vision manager without camera worker")
273
+ return None
274
+
275
+ try:
276
+ from reachy_mini_openclaw.vision.processors import (
277
+ VisionConfig,
278
+ initialize_vision_manager,
279
+ )
280
+ from reachy_mini_openclaw.config import config
281
+
282
+ vision_config = VisionConfig(
283
+ model_path=config.LOCAL_VISION_MODEL,
284
+ device_preference=config.VISION_DEVICE,
285
+ hf_home=config.HF_HOME,
286
+ )
287
+
288
+ logger.info("Initializing local vision processor (SmolVLM2)...")
289
+ vision_manager = initialize_vision_manager(self.camera_worker, vision_config)
290
+
291
+ if vision_manager is not None:
292
+ logger.info("Local vision processor initialized")
293
+ else:
294
+ logger.warning("Local vision processor failed to initialize")
295
+
296
+ return vision_manager
297
+
298
+ except ImportError as e:
299
+ logger.warning(f"Local vision not available: {e}")
300
+ logger.warning("Install with: pip install torch transformers")
301
+ return None
302
+ except Exception as e:
303
+ logger.error(f"Failed to initialize vision manager: {e}")
304
+ return None
305
+
306
+ def _initialize_head_tracker(self, tracker_type: Optional[str] = None) -> Optional[Any]:
307
+ """Initialize head tracker for local face tracking.
308
+
309
+ Args:
310
+ tracker_type: Type of tracker ("yolo", "mediapipe", or None for auto)
311
+
312
+ Returns:
313
+ Initialized head tracker or None if initialization fails
314
+ """
315
+ # Default to YOLO if not specified
316
+ if tracker_type is None:
317
+ tracker_type = "yolo"
318
+
319
+ if tracker_type == "yolo":
320
+ try:
321
+ from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
322
+ logger.info("Initializing YOLO face tracker...")
323
+ tracker = HeadTracker(device="cpu") # CPU is fast enough for face detection
324
+ logger.info("YOLO face tracker initialized")
325
+ return tracker
326
+ except ImportError as e:
327
+ logger.warning(f"YOLO tracker not available: {e}")
328
+ logger.warning("Install with: pip install ultralytics supervision")
329
+ except Exception as e:
330
+ logger.error(f"Failed to initialize YOLO tracker: {e}")
331
+
332
+ elif tracker_type == "mediapipe":
333
+ try:
334
+ from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
335
+ logger.info("Initializing MediaPipe face tracker...")
336
+ tracker = HeadTracker()
337
+ logger.info("MediaPipe face tracker initialized")
338
+ return tracker
339
+ except ImportError as e:
340
+ logger.warning(f"MediaPipe tracker not available: {e}")
341
+ except Exception as e:
342
+ logger.error(f"Failed to initialize MediaPipe tracker: {e}")
343
+
344
+ logger.warning("No face tracker available - face tracking disabled")
345
+ return None
346
+
347
+ def _should_stop(self) -> bool:
348
+ """Check if we should stop."""
349
+ if self._stop_event.is_set():
350
+ return True
351
+ if self._external_stop_event is not None and self._external_stop_event.is_set():
352
+ return True
353
+ return False
354
+
355
+ async def record_loop(self) -> None:
356
+ """Read audio from robot microphone and send to handler."""
357
+ input_sr = self.robot.media.get_input_audio_samplerate()
358
+ logger.info("Recording at %d Hz", input_sr)
359
+
360
+ while not self._should_stop():
361
+ audio_frame = self.robot.media.get_audio_sample()
362
+ if audio_frame is not None:
363
+ await self.handler.receive((input_sr, audio_frame))
364
+ await asyncio.sleep(0.01)
365
+
366
+ async def play_loop(self) -> None:
367
+ """Play audio from handler through robot speakers."""
368
+ output_sr = self.robot.media.get_output_audio_samplerate()
369
+ logger.info("Playing at %d Hz", output_sr)
370
+
371
+ while not self._should_stop():
372
+ output = await self.handler.emit()
373
+ if output is not None:
374
+ if isinstance(output, tuple):
375
+ input_sr, audio_data = output
376
+
377
+ # Convert to float32 and normalize (OpenAI sends int16)
378
+ audio_data = audio_data.flatten().astype("float32") / 32768.0
379
+
380
+ # Reduce volume to prevent distortion (0.5 = 50% volume)
381
+ audio_data = audio_data * 0.5
382
+
383
+ # Resample if needed
384
+ if input_sr != output_sr:
385
+ from scipy.signal import resample
386
+ num_samples = int(len(audio_data) * output_sr / input_sr)
387
+ audio_data = resample(audio_data, num_samples).astype("float32")
388
+
389
+ self.robot.media.push_audio_sample(audio_data)
390
+ # else: it's an AdditionalOutputs (transcript) - handle in UI mode
391
+
392
+ await asyncio.sleep(0.01)
393
+
394
+ async def run(self) -> None:
395
+ """Run the main application loop."""
396
+ # Test OpenClaw connection
397
+ if self.openclaw_bridge is not None:
398
+ connected = await self.openclaw_bridge.connect()
399
+ if connected:
400
+ logger.info("OpenClaw gateway connected")
401
+ else:
402
+ logger.warning("OpenClaw gateway not available - some features disabled")
403
+
404
+ # Enable motors and move to neutral pose
405
+ logger.info("Enabling motors and moving to neutral position...")
406
+ try:
407
+ self.robot.enable_motors()
408
+ from reachy_mini.utils import create_head_pose
409
+ neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
410
+ self.robot.goto_target(
411
+ head=neutral,
412
+ antennas=[0.0, 0.0],
413
+ duration=2.0,
414
+ body_yaw=0.0,
415
+ )
416
+ time.sleep(2) # Wait for goto to complete
417
+ logger.info("Robot at neutral position with motors enabled")
418
+ except Exception as e:
419
+ logger.error("Failed to initialize robot pose: %s", e)
420
+
421
+ # Wire up camera worker to movement manager for face tracking
422
+ if self.camera_worker is not None:
423
+ self.movement_manager.camera_worker = self.camera_worker
424
+ logger.info("Face tracking connected to movement system")
425
+
426
+ # Start movement system
427
+ logger.info("Starting movement system...")
428
+ self.movement_manager.start()
429
+ self.head_wobbler.start()
430
+
431
+ # Start camera worker for video streaming
432
+ if self.camera_worker is not None:
433
+ logger.info("Starting camera worker...")
434
+ self.camera_worker.start()
435
+
436
+ # Start local vision processor if available
437
+ if self.vision_manager is not None:
438
+ logger.info("Starting local vision processor...")
439
+ self.vision_manager.start()
440
+
441
+ # Start audio
442
+ logger.info("Starting audio...")
443
+ self.robot.media.start_recording()
444
+ self.robot.media.start_playing()
445
+ time.sleep(1) # Let pipelines initialize
446
+
447
+ logger.info("Ready! Speak to me...")
448
+
449
+ # Start OpenAI handler in background
450
+ handler_task = asyncio.create_task(self.handler.start_up(), name="openai-handler")
451
+
452
+ # Start audio loops
453
+ self._tasks = [
454
+ handler_task,
455
+ asyncio.create_task(self.record_loop(), name="record-loop"),
456
+ asyncio.create_task(self.play_loop(), name="play-loop"),
457
+ ]
458
+
459
+ try:
460
+ await asyncio.gather(*self._tasks)
461
+ except asyncio.CancelledError:
462
+ logger.info("Tasks cancelled")
463
+
464
+ def stop(self) -> None:
465
+ """Stop everything."""
466
+ logger.info("Stopping...")
467
+ self._stop_event.set()
468
+
469
+ # Cancel tasks
470
+ for task in self._tasks:
471
+ if not task.done():
472
+ task.cancel()
473
+
474
+ # Stop movement system
475
+ self.head_wobbler.stop()
476
+ self.movement_manager.stop()
477
+
478
+ # Stop vision manager
479
+ if self.vision_manager is not None:
480
+ self.vision_manager.stop()
481
+
482
+ # Stop camera worker
483
+ if self.camera_worker is not None:
484
+ self.camera_worker.stop()
485
+
486
+ # Disconnect OpenClaw bridge
487
+ if self.openclaw_bridge is not None:
488
+ try:
489
+ asyncio.get_event_loop().run_until_complete(
490
+ self.openclaw_bridge.disconnect()
491
+ )
492
+ except Exception as e:
493
+ logger.debug("OpenClaw disconnect: %s", e)
494
+
495
+ # Close resources if we own them
496
+ if self._owns_robot:
497
+ try:
498
+ self.robot.media.close()
499
+ except Exception as e:
500
+ logger.debug("Media close: %s", e)
501
+ self.robot.client.disconnect()
502
+
503
+ logger.info("Stopped")
504
+
505
+
506
+ class ReachyClawApp:
507
+ """ReachyClaw - Reachy Mini Apps entry point.
508
+
509
+ This class allows ReachyClaw to be installed and run from
510
+ the Reachy Mini dashboard as a Reachy Mini App.
511
+ """
512
+
513
+ # No custom settings UI
514
+ custom_app_url: Optional[str] = None
515
+
516
+ def run(self, reachy_mini, stop_event: threading.Event) -> None:
517
+ """Run ReachyClaw as a Reachy Mini App.
518
+
519
+ Args:
520
+ reachy_mini: Pre-initialized ReachyMini instance
521
+ stop_event: Threading event to signal stop
522
+ """
523
+ loop = asyncio.new_event_loop()
524
+ asyncio.set_event_loop(loop)
525
+
526
+ gateway_url = os.getenv("OPENCLAW_GATEWAY_URL", "ws://localhost:18789")
527
+
528
+ app = ReachyClawCore(
529
+ gateway_url=gateway_url,
530
+ robot=reachy_mini,
531
+ external_stop_event=stop_event,
532
+ )
533
+
534
+ try:
535
+ loop.run_until_complete(app.run())
536
+ except Exception as e:
537
+ logger.error("Error running app: %s", e)
538
+ finally:
539
+ app.stop()
540
+ loop.close()
541
+
542
+
543
+ def main() -> None:
544
+ """Main entry point."""
545
+ args = parse_args()
546
+ setup_logging(args.debug)
547
+
548
+ # Set custom profile if specified
549
+ if args.profile:
550
+ from reachy_mini_openclaw.config import set_custom_profile
551
+ set_custom_profile(args.profile)
552
+
553
+ # Configure face tracking and local vision from args
554
+ from reachy_mini_openclaw.config import (
555
+ set_face_tracking_enabled,
556
+ set_local_vision_enabled,
557
+ )
558
+ if args.no_face_tracking:
559
+ set_face_tracking_enabled(False)
560
+ if args.local_vision:
561
+ set_local_vision_enabled(True)
562
+
563
+ if args.gradio:
564
+ # Launch Gradio UI
565
+ logger.info("Starting Gradio UI...")
566
+ from reachy_mini_openclaw.gradio_app import launch_gradio
567
+ launch_gradio(
568
+ gateway_url=args.gateway_url,
569
+ robot_name=args.robot_name,
570
+ enable_camera=not args.no_camera,
571
+ enable_openclaw=not args.no_openclaw,
572
+ )
573
+ else:
574
+ # Console mode
575
+ app = ReachyClawCore(
576
+ gateway_url=args.gateway_url,
577
+ robot_name=args.robot_name,
578
+ enable_camera=not args.no_camera,
579
+ enable_openclaw=not args.no_openclaw,
580
+ )
581
+
582
+ try:
583
+ asyncio.run(app.run())
584
+ except KeyboardInterrupt:
585
+ logger.info("Interrupted")
586
+ finally:
587
+ app.stop()
588
+
589
+
590
+ if __name__ == "__main__":
591
+ main()
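The int16-to-float conversion and resampling done in `play_loop` can be exercised on their own. A standalone sketch of the same steps (function name, gain, and sample rates are illustrative):

```python
import numpy as np
from scipy.signal import resample


def prepare_playback(chunk: np.ndarray, input_sr: int, output_sr: int, gain: float = 0.5) -> np.ndarray:
    """Convert int16 PCM to attenuated float32 and resample to the output rate."""
    audio = chunk.flatten().astype("float32") / 32768.0  # int16 -> roughly [-1, 1)
    audio = audio * gain  # headroom to avoid speaker distortion
    if input_sr != output_sr:
        num_samples = int(len(audio) * output_sr / input_sr)
        audio = resample(audio, num_samples).astype("float32")
    return audio


# 0.1 s of a half-scale sine at 24 kHz, downsampled to 16 kHz
chunk = (np.sin(np.linspace(0, 2 * np.pi, 2400)) * 16384).astype(np.int16)
out = prepare_playback(chunk, input_sr=24000, output_sr=16000)
```

The output length scales by `output_sr / input_sr` (2400 samples become 1600 here), and the gain keeps peaks well below full scale before pushing to the speaker.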
src/reachy_mini_openclaw/moves.py ADDED
@@ -0,0 +1,648 @@
+ """Movement system for expressive robot control.
+
+ This module provides a 100Hz control loop for managing robot movements,
+ combining sequential primary moves (dances, emotions, head movements) with
+ additive secondary moves (speech wobble, face tracking).
+
+ Architecture:
+ - Primary moves are queued and executed sequentially
+ - Secondary moves are additive offsets applied on top
+ - Single control point via set_target at 100Hz
+ - Automatic breathing animation when idle
+
+ Based on the movement systems from:
+ - pollen-robotics/reachy_mini_conversation_app
+ - eoai-dev/moltbot_body
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import threading
+ import time
+ from collections import deque
+ from dataclasses import dataclass
+ from queue import Empty, Queue
+ from typing import Any, Dict, Optional, Tuple
+
+ import numpy as np
+ from numpy.typing import NDArray
+ from reachy_mini import ReachyMini
+ from reachy_mini.motion.move import Move
+ from reachy_mini.utils import create_head_pose
+ from reachy_mini.utils.interpolation import compose_world_offset, linear_pose_interpolation
+
+ logger = logging.getLogger(__name__)
+
+ # Configuration
+ CONTROL_LOOP_FREQUENCY_HZ = 100.0
+
+ # Type definitions
+ FullBodyPose = Tuple[NDArray[np.float32], Tuple[float, float], float]
+ SpeechOffsets = Tuple[float, float, float, float, float, float]
+
+
+ class BreathingMove(Move):
+     """Continuous breathing animation for idle state."""
+
+     def __init__(
+         self,
+         interpolation_start_pose: NDArray[np.float32],
+         interpolation_start_antennas: Tuple[float, float],
+         interpolation_duration: float = 1.0,
+     ):
+         """Initialize breathing move.
+
+         Args:
+             interpolation_start_pose: Current head pose to interpolate from
+             interpolation_start_antennas: Current antenna positions
+             interpolation_duration: Time to blend to neutral (seconds)
+         """
+         self.interpolation_start_pose = interpolation_start_pose
+         self.interpolation_start_antennas = np.array(interpolation_start_antennas)
+         self.interpolation_duration = interpolation_duration
+
+         # Target neutral pose
+         self.neutral_head_pose = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         self.neutral_antennas = np.array([0.0, 0.0])
+
+         # Breathing parameters
+         self.breathing_z_amplitude = 0.005  # 5mm gentle movement
+         self.breathing_frequency = 0.1  # Hz
+         self.antenna_sway_amplitude = np.deg2rad(15)  # 15 degrees, stored in radians
+         self.antenna_frequency = 0.5  # Hz
+
+     @property
+     def duration(self) -> float:
+         """Duration of the move (infinite for breathing)."""
+         return float("inf")
+
+     def evaluate(self, t: float) -> tuple:
+         """Evaluate the breathing pose at time t."""
+         if t < self.interpolation_duration:
+             # Interpolate to neutral
+             alpha = t / self.interpolation_duration
+             head_pose = linear_pose_interpolation(
+                 self.interpolation_start_pose,
+                 self.neutral_head_pose,
+                 alpha
+             )
+             antennas = (1 - alpha) * self.interpolation_start_antennas + alpha * self.neutral_antennas
+             antennas = antennas.astype(np.float64)
+         else:
+             # Breathing pattern
+             breathing_t = t - self.interpolation_duration
+
+             z_offset = self.breathing_z_amplitude * np.sin(
+                 2 * np.pi * self.breathing_frequency * breathing_t
+             )
+             head_pose = create_head_pose(
+                 x=0, y=0, z=z_offset,
+                 roll=0, pitch=0, yaw=0,
+                 degrees=True, mm=False
+             )
+
+             antenna_sway = self.antenna_sway_amplitude * np.sin(
+                 2 * np.pi * self.antenna_frequency * breathing_t
+             )
+             antennas = np.array([antenna_sway, -antenna_sway], dtype=np.float64)
+
+         return (head_pose, antennas, 0.0)
+
+
+ class HeadLookMove(Move):
+     """Move to look in a specific direction."""
+
+     DIRECTIONS = {
+         "left": (0, 0, 0, 0, 0, 30),    # yaw left
+         "right": (0, 0, 0, 0, 0, -30),  # yaw right
+         "up": (0, 0, 10, 0, 15, 0),     # pitch up, z up
+         "down": (0, 0, -5, 0, -15, 0),  # pitch down, z down
+         "front": (0, 0, 0, 0, 0, 0),    # neutral
+     }
+
+     def __init__(
+         self,
+         direction: str,
+         start_pose: NDArray[np.float32],
+         start_antennas: Tuple[float, float],
+         duration: float = 1.0,
+     ):
+         """Initialize head look move.
+
+         Args:
+             direction: One of 'left', 'right', 'up', 'down', 'front'
+             start_pose: Current head pose
+             start_antennas: Current antenna positions
+             duration: Move duration in seconds
+         """
+         self.direction = direction
+         self.start_pose = start_pose
+         self.start_antennas = np.array(start_antennas)
+         self._duration = duration
+
+         # Get target pose from direction
+         params = self.DIRECTIONS.get(direction, self.DIRECTIONS["front"])
+         self.target_pose = create_head_pose(
+             x=params[0], y=params[1], z=params[2],
+             roll=params[3], pitch=params[4], yaw=params[5],
+             degrees=True, mm=True
+         )
+         self.target_antennas = np.array([0.0, 0.0])
+
+     @property
+     def duration(self) -> float:
+         return self._duration
+
+     def evaluate(self, t: float) -> tuple:
+         """Evaluate pose at time t."""
+         alpha = min(1.0, t / self._duration)
+         # Smooth easing (smoothstep)
+         alpha = alpha * alpha * (3 - 2 * alpha)
+
+         head_pose = linear_pose_interpolation(
+             self.start_pose,
+             self.target_pose,
+             alpha
+         )
+         antennas = (1 - alpha) * self.start_antennas + alpha * self.target_antennas
+
+         return (head_pose, antennas.astype(np.float64), 0.0)
+
+
+ def combine_full_body(primary: FullBodyPose, secondary: FullBodyPose) -> FullBodyPose:
+     """Combine primary pose with secondary offsets."""
+     primary_head, primary_ant, primary_yaw = primary
+     secondary_head, secondary_ant, secondary_yaw = secondary
+
+     combined_head = compose_world_offset(primary_head, secondary_head, reorthonormalize=True)
+     combined_ant = (
+         primary_ant[0] + secondary_ant[0],
+         primary_ant[1] + secondary_ant[1],
+     )
+     combined_yaw = primary_yaw + secondary_yaw
+
+     return (combined_head, combined_ant, combined_yaw)
+
+
+ def clone_pose(pose: FullBodyPose) -> FullBodyPose:
+     """Deep copy a full body pose."""
+     head, ant, yaw = pose
+     return (head.copy(), (float(ant[0]), float(ant[1])), float(yaw))
+
+
+ @dataclass
+ class MovementState:
+     """State for the movement system."""
+
+     current_move: Optional[Move] = None
+     move_start_time: Optional[float] = None
+     last_activity_time: float = 0.0
+     speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     face_tracking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     thinking_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+     last_primary_pose: Optional[FullBodyPose] = None
+
+     def update_activity(self) -> None:
+         self.last_activity_time = time.monotonic()
+
+
+ class MovementManager:
+     """Coordinate robot movements at 100Hz.
+
+     This class manages:
+     - Sequential primary moves (dances, emotions, head movements)
+     - Additive secondary offsets (speech wobble, face tracking)
+     - Automatic idle breathing animation
+     - Thread-safe communication with other components
+
+     Example:
+         manager = MovementManager(robot)
+         manager.start()
+
+         # Queue a head movement
+         manager.queue_move(HeadLookMove("left", ...))
+
+         # Set speech offsets (called by HeadWobbler)
+         manager.set_speech_offsets((0, 0, 0.01, 0.1, 0, 0))
+
+         manager.stop()
+     """
+
+     def __init__(
+         self,
+         current_robot: ReachyMini,
+         camera_worker: Any = None,
+     ):
+         """Initialize movement manager.
+
+         Args:
+             current_robot: Connected ReachyMini instance
+             camera_worker: Optional camera worker for face tracking
+         """
+         self.current_robot = current_robot
+         self.camera_worker = camera_worker
+
+         self._now = time.monotonic
+         self.state = MovementState()
+         self.state.last_activity_time = self._now()
+
+         # Initialize neutral pose
+         neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         self.state.last_primary_pose = (neutral, (0.0, 0.0), 0.0)
+
+         # Move queue
+         self.move_queue: deque[Move] = deque()
+
+         # Configuration
+         self.idle_inactivity_delay = 0.3  # seconds before breathing starts
+         self.target_frequency = CONTROL_LOOP_FREQUENCY_HZ
+         self.target_period = 1.0 / self.target_frequency
+
+         # Thread state
+         self._stop_event = threading.Event()
+         self._thread: Optional[threading.Thread] = None
+         self._is_listening = False
+         self._breathing_active = False
+
+         # Last commanded pose for smooth transitions
+         self._last_commanded_pose = clone_pose(self.state.last_primary_pose)
+         self._listening_antennas = self._last_commanded_pose[1]
+         self._antenna_unfreeze_blend = 1.0
+         self._antenna_blend_duration = 0.4
+
+         # Cross-thread communication
+         self._command_queue: Queue[Tuple[str, Any]] = Queue()
+
+         # Speech offsets (thread-safe)
+         self._speech_lock = threading.Lock()
+         self._pending_speech_offsets: SpeechOffsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+         self._speech_dirty = False
+
+         # Processing/thinking animation state
+         self._processing = False
+         self._processing_start_time = 0.0
+         self._thinking_amplitude = 0.0  # 0..1 envelope for smooth fade in/out
+         self._thinking_antenna_offsets: Tuple[float, float] = (0.0, 0.0)
+
+         # Shared state lock
+         self._shared_lock = threading.Lock()
+         self._shared_last_activity = self.state.last_activity_time
+         self._shared_is_listening = False
+
+     def queue_move(self, move: Move) -> None:
+         """Queue a primary move. Thread-safe."""
+         self._command_queue.put(("queue_move", move))
+
+     def clear_move_queue(self) -> None:
+         """Clear all queued moves. Thread-safe."""
+         self._command_queue.put(("clear_queue", None))
+
+     def set_speech_offsets(self, offsets: SpeechOffsets) -> None:
+         """Update speech-driven offsets. Thread-safe."""
+         with self._speech_lock:
+             self._pending_speech_offsets = offsets
+             self._speech_dirty = True
+
+     def set_listening(self, listening: bool) -> None:
+         """Set listening state (freezes antennas). Thread-safe."""
+         self._command_queue.put(("set_listening", listening))
+
+     def set_processing(self, processing: bool) -> None:
+         """Set processing state (triggers thinking animation). Thread-safe.
+
+         When True, the robot shows a continuous 'thinking' animation as
+         secondary offsets -- gentle head sway and asymmetric antenna scanning.
+         Face tracking continues underneath since this is additive.
+         """
+         self._command_queue.put(("set_processing", processing))
+
+     def is_idle(self) -> bool:
+         """Check if robot has been idle. Thread-safe."""
+         with self._shared_lock:
+             if self._shared_is_listening:
+                 return False
+             return self._now() - self._shared_last_activity >= self.idle_inactivity_delay
+
+     def _poll_signals(self, current_time: float) -> None:
+         """Process queued commands and pending offsets."""
+         # Apply speech offsets
+         with self._speech_lock:
+             if self._speech_dirty:
+                 self.state.speech_offsets = self._pending_speech_offsets
+                 self._speech_dirty = False
+                 self.state.update_activity()
+
+         # Process commands
+         while True:
+             try:
+                 cmd, payload = self._command_queue.get_nowait()
+             except Empty:
+                 break
+             self._handle_command(cmd, payload, current_time)
+
+     def _update_face_tracking(self, current_time: float) -> None:
+         """Get face tracking offsets from camera worker thread."""
+         if self.camera_worker is not None:
+             offsets = self.camera_worker.get_face_tracking_offsets()
+             self.state.face_tracking_offsets = offsets
+         else:
+             # No camera worker, use neutral offsets
+             self.state.face_tracking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+
+     def _update_thinking_offsets(self, current_time: float) -> None:
+         """Compute thinking animation as secondary offsets.
+
+         Produces a gentle head sway (yaw drift, slight upward pitch, z bob)
+         and asymmetric antenna scanning pattern. The amplitude envelope
+         smoothly ramps up over 0.5s and decays over 0.5s for organic feel.
+         """
+         # Update amplitude envelope
+         if self._processing:
+             # Ramp up over 0.5s
+             elapsed = current_time - self._processing_start_time
+             self._thinking_amplitude = min(1.0, elapsed / 0.5)
+         elif self._thinking_amplitude > 0:
+             # Smooth decay at 2.0/s (full decay in 0.5s)
+             self._thinking_amplitude = max(
+                 0.0, self._thinking_amplitude - 2.0 * self.target_period
+             )
+
+         # If fully decayed, zero everything and bail
+         if self._thinking_amplitude < 0.001:
+             self._thinking_amplitude = 0.0
+             self.state.thinking_offsets = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+             self._thinking_antenna_offsets = (0.0, 0.0)
+             return
+
+         amp = self._thinking_amplitude
+         t = current_time - self._processing_start_time
+
+         # Head offsets (radians / metres -- degrees=False, mm=False)
+         # Slow yaw drift: ±12° at 0.15 Hz
+         yaw = amp * np.deg2rad(12) * np.sin(2 * np.pi * 0.15 * t)
+         # Slight upward pitch: 6° base + 3° oscillation at 0.2 Hz
+         pitch = amp * (np.deg2rad(6) + np.deg2rad(3) * np.sin(2 * np.pi * 0.2 * t))
+         # Gentle z bob: 3 mm at 0.12 Hz
+         z = amp * 0.003 * np.sin(2 * np.pi * 0.12 * t)
+
+         self.state.thinking_offsets = (0.0, 0.0, z, 0.0, pitch, yaw)
+
+         # Antenna offsets: asymmetric scan (phase offset creates "searching" feel)
+         # ±20° at 0.4 Hz, right antenna lags left by ~70° of phase
+         left_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t)
+         right_ant = amp * np.deg2rad(20) * np.sin(2 * np.pi * 0.4 * t + 1.2)
+         self._thinking_antenna_offsets = (left_ant, right_ant)
+
+     def _handle_command(self, cmd: str, payload: Any, current_time: float) -> None:
+         """Handle a single command."""
+         if cmd == "queue_move":
+             if isinstance(payload, Move):
+                 self.move_queue.append(payload)
+                 self.state.update_activity()
+                 logger.debug("Queued move, queue size: %d", len(self.move_queue))
+         elif cmd == "clear_queue":
+             self.move_queue.clear()
+             self.state.current_move = None
+             self.state.move_start_time = None
+             self._breathing_active = False
+             logger.info("Cleared move queue")
+         elif cmd == "set_listening":
+             desired = bool(payload)
+             if self._is_listening != desired:
+                 self._is_listening = desired
+                 if desired:
+                     self._listening_antennas = self._last_commanded_pose[1]
+                     self._antenna_unfreeze_blend = 0.0
+                 else:
+                     self._antenna_unfreeze_blend = 0.0
+                 self.state.update_activity()
+         elif cmd == "set_processing":
+             desired = bool(payload)
+             if desired and not self._processing:
+                 self._processing = True
+                 self._processing_start_time = self._now()
+                 # Interrupt breathing so thinking animation is clean
+                 if self._breathing_active and isinstance(self.state.current_move, BreathingMove):
+                     self.state.current_move = None
+                     self.state.move_start_time = None
+                     self._breathing_active = False
+                 self.state.update_activity()
+                 logger.debug("Processing started - thinking animation active")
+             elif not desired and self._processing:
+                 self._processing = False
+                 # Amplitude will decay smoothly in _update_thinking_offsets
+                 self.state.update_activity()
+                 logger.debug("Processing ended - thinking animation decaying")
+
+     def _manage_move_queue(self, current_time: float) -> None:
+         """Advance the move queue."""
+         # Check if current move is done
+         if self.state.current_move is not None and self.state.move_start_time is not None:
+             elapsed = current_time - self.state.move_start_time
+             if elapsed >= self.state.current_move.duration:
+                 self.state.current_move = None
+                 self.state.move_start_time = None
+
+         # Start next move if available
+         if self.state.current_move is None and self.move_queue:
+             self.state.current_move = self.move_queue.popleft()
+             self.state.move_start_time = current_time
+             self._breathing_active = isinstance(self.state.current_move, BreathingMove)
+             logger.debug("Starting move with duration: %s", self.state.current_move.duration)
+
+     def _manage_breathing(self, current_time: float) -> None:
+         """Start breathing when idle."""
+         if (
+             self.state.current_move is None
+             and not self.move_queue
+             and not self._is_listening
+             and not self._breathing_active
+             and not self._processing
+         ):
+             idle_for = current_time - self.state.last_activity_time
+             if idle_for >= self.idle_inactivity_delay:
+                 try:
+                     _, current_ant = self.current_robot.get_current_joint_positions()
+                     current_head = self.current_robot.get_current_head_pose()
+
+                     breathing = BreathingMove(
+                         interpolation_start_pose=current_head,
+                         interpolation_start_antennas=current_ant,
+                         interpolation_duration=1.0,
+                     )
+                     self.move_queue.append(breathing)
+                     self._breathing_active = True
+                     self.state.update_activity()
+                     logger.debug("Started breathing after %.1fs idle", idle_for)
+                 except Exception as e:
+                     logger.error("Failed to start breathing: %s", e)
+
+         # Stop breathing if new moves queued
+         if isinstance(self.state.current_move, BreathingMove) and self.move_queue:
+             self.state.current_move = None
+             self.state.move_start_time = None
+             self._breathing_active = False
+
+     def _get_primary_pose(self, current_time: float) -> FullBodyPose:
+         """Get current primary pose from move or last pose."""
+         if self.state.current_move is not None and self.state.move_start_time is not None:
+             t = current_time - self.state.move_start_time
+             head, antennas, body_yaw = self.state.current_move.evaluate(t)
+
+             if head is None:
+                 head = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+             if antennas is None:
+                 antennas = np.array([0.0, 0.0])
+             if body_yaw is None:
+                 body_yaw = 0.0
+
+             pose = (head.copy(), (float(antennas[0]), float(antennas[1])), float(body_yaw))
+             self.state.last_primary_pose = clone_pose(pose)
+             return pose
+
+         if self.state.last_primary_pose is not None:
+             return clone_pose(self.state.last_primary_pose)
+
+         neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+         return (neutral, (0.0, 0.0), 0.0)
+
+     def _get_secondary_pose(self) -> FullBodyPose:
+         """Get secondary offsets (speech + face tracking + thinking)."""
+         offsets = [
+             self.state.speech_offsets[i]
+             + self.state.face_tracking_offsets[i]
+             + self.state.thinking_offsets[i]
+             for i in range(6)
+         ]
+
+         secondary_head = create_head_pose(
+             x=offsets[0], y=offsets[1], z=offsets[2],
+             roll=offsets[3], pitch=offsets[4], yaw=offsets[5],
+             degrees=False, mm=False
+         )
+         return (secondary_head, self._thinking_antenna_offsets, 0.0)
+
+     def _compose_pose(self, current_time: float) -> FullBodyPose:
+         """Compose final pose from primary and secondary."""
+         primary = self._get_primary_pose(current_time)
+         secondary = self._get_secondary_pose()
+         return combine_full_body(primary, secondary)
+
+     def _blend_antennas(self, target: Tuple[float, float]) -> Tuple[float, float]:
+         """Blend antennas with listening freeze state."""
+         if self._is_listening:
+             return self._listening_antennas
+
+         # Blend back from freeze
+         blend = min(1.0, self._antenna_unfreeze_blend + self.target_period / self._antenna_blend_duration)
+         self._antenna_unfreeze_blend = blend
+
+         return (
+             self._listening_antennas[0] * (1 - blend) + target[0] * blend,
+             self._listening_antennas[1] * (1 - blend) + target[1] * blend,
+         )
+
+     def _issue_command(self, head: NDArray, antennas: Tuple[float, float], body_yaw: float) -> None:
+         """Send command to robot."""
+         try:
+             self.current_robot.set_target(head=head, antennas=antennas, body_yaw=body_yaw)
+             self._last_commanded_pose = (head.copy(), antennas, body_yaw)
+         except Exception as e:
+             logger.debug("set_target failed: %s", e)
+
+     def _publish_shared_state(self) -> None:
+         """Update shared state for external queries."""
+         with self._shared_lock:
+             self._shared_last_activity = self.state.last_activity_time
+             self._shared_is_listening = self._is_listening
+
+     def start(self) -> None:
+         """Start the control loop thread."""
+         if self._thread is not None and self._thread.is_alive():
+             logger.warning("MovementManager already running")
+             return
+
+         self._stop_event.clear()
+         self._thread = threading.Thread(target=self._run_loop, daemon=True)
+         self._thread.start()
+         logger.info("MovementManager started")
+
+     def stop(self) -> None:
+         """Stop the control loop and reset to neutral."""
+         if self._thread is None or not self._thread.is_alive():
+             return
+
+         logger.info("Stopping MovementManager...")
+         self.clear_move_queue()
+
+         self._stop_event.set()
+         self._thread.join(timeout=2.0)
+         self._thread = None
+
+         # Reset to neutral
+         try:
+             neutral = create_head_pose(0, 0, 0, 0, 0, 0, degrees=True)
+             self.current_robot.goto_target(
+                 head=neutral,
+                 antennas=[0.0, 0.0],
+                 duration=2.0,
+                 body_yaw=0.0,
+             )
+             logger.info("Reset to neutral position")
+         except Exception as e:
+             logger.error("Failed to reset: %s", e)
+
+     def _run_loop(self) -> None:
+         """Main control loop at 100Hz."""
+         logger.debug("Starting 100Hz control loop")
+
+         while not self._stop_event.is_set():
+             loop_start = self._now()
+
+             # Process signals
+             self._poll_signals(loop_start)
+
+             # Manage moves
+             self._manage_move_queue(loop_start)
+             self._manage_breathing(loop_start)
+
+             # Update face tracking offsets from camera worker
+             self._update_face_tracking(loop_start)
+
+             # Update thinking animation offsets
+             self._update_thinking_offsets(loop_start)
+
+             # Compose pose
+             head, antennas, body_yaw = self._compose_pose(loop_start)
+
+             # Blend antennas for listening
+             antennas = self._blend_antennas(antennas)
+
+             # Send to robot
+             self._issue_command(head, antennas, body_yaw)
+
+             # Update shared state
+             self._publish_shared_state()
+
+             # Maintain timing
+             elapsed = self._now() - loop_start
+             sleep_time = max(0.0, self.target_period - elapsed)
+             if sleep_time > 0:
+                 time.sleep(sleep_time)
+
+         logger.debug("Control loop stopped")
+
+     def get_status(self) -> Dict[str, Any]:
+         """Get current status for debugging."""
+         return {
+             "queue_size": len(self.move_queue),
+             "is_listening": self._is_listening,
+             "breathing_active": self._breathing_active,
+             "processing": self._processing,
+             "thinking_amplitude": round(self._thinking_amplitude, 3),
+             "last_commanded_pose": {
+                 "head": self._last_commanded_pose[0].tolist(),
+                 "antennas": self._last_commanded_pose[1],
+                 "body_yaw": self._last_commanded_pose[2],
+             },
+         }
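
The breathing, easing, and thinking animations above all reduce to sinusoids scaled by a ramp/decay envelope. As a hedged illustration (a dependency-free sketch mirroring the math in `BreathingMove`, `HeadLookMove.evaluate`, and `MovementManager._update_thinking_offsets`; the function names here are illustrative and not part of the package):

```python
import math

def thinking_envelope(elapsed: float, processing: bool,
                      prev_amplitude: float, dt: float = 0.01) -> float:
    # Mirrors _update_thinking_offsets: ramp to 1.0 over 0.5 s while
    # processing, then decay at 2.0/s once processing ends.
    if processing:
        return min(1.0, elapsed / 0.5)
    return max(0.0, prev_amplitude - 2.0 * dt)

def smoothstep(alpha: float) -> float:
    # Easing used by HeadLookMove.evaluate: 3a^2 - 2a^3, clamped to [0, 1].
    alpha = min(1.0, max(0.0, alpha))
    return alpha * alpha * (3 - 2 * alpha)

def antenna_sway(t: float, amplitude_deg: float = 15.0, freq_hz: float = 0.5) -> float:
    # Breathing antenna offset in radians, as in BreathingMove.evaluate.
    return math.radians(amplitude_deg) * math.sin(2 * math.pi * freq_hz * t)
```

Because the envelope multiplies every offset, interrupting the thinking state never snaps the head back; it fades out over 0.5 s at the 100 Hz loop rate.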
src/reachy_mini_openclaw/openai_realtime.py ADDED
@@ -0,0 +1,562 @@
+ """ReachyClaw - OpenAI Realtime API handler with OpenClaw identity.
+
+ This module implements ReachyClaw's voice conversation system using OpenAI Realtime API
+ with the robot embodying the actual OpenClaw agent's personality and context.
+
+ Architecture:
+     Startup: Fetch OpenClaw agent context (personality, memories, user info)
+     Runtime: User speaks -> OpenAI Realtime (as OpenClaw agent) -> Robot speaks
+         -> Tools for movements + OpenClaw queries for extended capabilities
+         -> Conversations synced back to OpenClaw for memory continuity
+
+ The robot IS the OpenClaw agent - same personality, same memories, same context.
+ """
+
+ import json
+ import base64
+ import random
+ import asyncio
+ import logging
+ from typing import Any, Final, Literal, Optional, Tuple
+ from datetime import datetime
+
+ import numpy as np
+ from numpy.typing import NDArray
+ from openai import AsyncOpenAI
+ from fastrtc import AdditionalOutputs, AsyncStreamHandler, wait_for_item
+ from scipy.signal import resample
+ from websockets.exceptions import ConnectionClosedError
+
+ from reachy_mini_openclaw.config import config
+ from reachy_mini_openclaw.prompts import get_session_voice
+ from reachy_mini_openclaw.tools.core_tools import ToolDependencies, get_tool_specs, dispatch_tool_call, get_body_actions_description
+
+ logger = logging.getLogger(__name__)
+
+ # OpenAI Realtime API audio format
+ OPENAI_SAMPLE_RATE: Final[Literal[24000]] = 24000
+
+ # Base instructions for the robot body capabilities
+ ROBOT_BODY_INSTRUCTIONS = """
+ ## CRITICAL: You are a voice relay for the OpenClaw agent
+
+ You are the voice interface for an OpenClaw AI agent embodied in a Reachy Mini robot.
+ You MUST call `ask_openclaw` for EVERY user message to get the real response.
+
+ **Your ONLY job is:**
+ 1. When the user says something, IMMEDIATELY call `ask_openclaw` with their full message.
+ 2. Speak the response from `ask_openclaw` EXACTLY as returned — do not rephrase, add to, or summarize it.
+
+ **You MUST NOT:**
+ - Answer any question yourself — ALWAYS use `ask_openclaw` first.
+ - Make up information, opinions, or responses on your own.
+ - Summarize or modify what `ask_openclaw` returns — speak it verbatim.
+ - Say things like "let me check" and then answer without calling `ask_openclaw`.
+
+ **Robot Movement:**
+ - Do NOT call look, emotion, dance, or camera tools yourself.
+ - OpenClaw controls the robot body — movements are handled automatically from its response.
+
+ **Conversation Style for Voice:**
+ - Keep it natural — you are speaking out loud
+ - If ask_openclaw is slow or errors, say "I'm having trouble reaching my brain, one moment"
+ """
+
+ # Fallback if OpenClaw context fetch fails
+ FALLBACK_IDENTITY = """You are the voice relay for an OpenClaw AI agent embodied in a Reachy Mini robot.
+ You MUST call ask_openclaw for every user message and speak the response verbatim.
+ Never answer on your own — always defer to ask_openclaw."""
+
+ # System context sent to OpenClaw so it knows about the robot body.
+ # Built dynamically from TOOL_SPECS so the action list stays in sync.
+ REACHY_BODY_CONTEXT = f"""\
+ User is talking to you through your Reachy Mini robot body. Keep responses concise for voice.
+
+ You can control your robot body by including action tags anywhere in your response.
+ The tags will be executed and stripped before your words are spoken aloud.
+
+ Available actions:
+ {get_body_actions_description()}
+
+ Examples:
+ "Sure, let me look over there. [LOOK:left] I see a bookshelf!"
+ "[EMOTION:happy] That's great to hear!"
+ "[DANCE:excited] Let's celebrate!"
+
+ Use actions naturally to make the conversation more expressive. You don't have to use them every time — only when it adds to the interaction."""
+
+
+ class OpenAIRealtimeHandler(AsyncStreamHandler):
+     """Handler for OpenAI Realtime API embodying the OpenClaw agent.
+
+     This handler:
+     - Fetches OpenClaw's personality and context at startup
+     - Maintains voice conversation AS the OpenClaw agent
+     - Executes robot movement tools locally for low latency
+     - Calls OpenClaw for extended capabilities (web, calendar, memory)
+     - Syncs conversations back to OpenClaw for memory continuity
+     """
+
+     def __init__(
+         self,
+         deps: ToolDependencies,
+         openclaw_bridge: Optional[Any] = None,
+         gradio_mode: bool = False,
+     ):
+         """Initialize the handler.
+
+         Args:
+             deps: Tool dependencies for robot control
+             openclaw_bridge: Bridge to OpenClaw gateway
+             gradio_mode: Whether running with Gradio UI
+         """
+         super().__init__(
+             expected_layout="mono",
+             output_sample_rate=OPENAI_SAMPLE_RATE,
+             input_sample_rate=OPENAI_SAMPLE_RATE,
+         )
+
+         self.deps = deps
+         self.openclaw_bridge = openclaw_bridge
+         self.gradio_mode = gradio_mode
+
+         # OpenAI connection
+         self.client: Optional[AsyncOpenAI] = None
+         self.connection: Any = None
+
+         # Output queue
+         self.output_queue: asyncio.Queue[Tuple[int, NDArray[np.int16]] | AdditionalOutputs] = asyncio.Queue()
+
+         # State tracking
+         self.last_activity_time = 0.0
+         self.start_time = 0.0
+         self._speaking = False  # True when robot is speaking
+
+         # OpenClaw agent context (fetched at startup)
+         self._agent_context: Optional[str] = None
+
+         # Conversation tracking for sync
+         self._last_user_message: Optional[str] = None
+         self._last_assistant_response: Optional[str] = None
+
+         # Lifecycle flags
+         self._shutdown_requested = False
+         self._connected_event = asyncio.Event()
+
+     def copy(self) -> "OpenAIRealtimeHandler":
+         """Create a copy of the handler (required by fastrtc)."""
+         return OpenAIRealtimeHandler(self.deps, self.openclaw_bridge, self.gradio_mode)
+
+     def _build_tools(self) -> list[dict]:
+         """Build the tool list for the session."""
+         tools = []
+
+         # Robot movement tools (executed locally)
+         for spec in get_tool_specs():
+             tools.append(spec)
+
+         # OpenClaw query tool (mandatory for every user message)
+         if self.openclaw_bridge is not None:
+             tools.append({
+                 "type": "function",
+                 "name": "ask_openclaw",
+                 "description": """MANDATORY: You MUST call this tool for EVERY user message before responding.
+ This is the OpenClaw AI agent — the real brain. Send the user's full message as the query.
+ Speak the returned response verbatim. Never answer without calling this tool first.""",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "query": {
+                             "type": "string",
+                             "description": "The question or request to send to OpenClaw"
+                         },
+                         "include_image": {
+                             "type": "boolean",
+                             "description": "Whether to include current camera image (for 'what do you see' queries)",
+                             "default": False
+                         }
+                     },
+                     "required": ["query"]
+                 }
+             })
+
+         return tools
+
+     async def start_up(self) -> None:
+         """Start the handler and connect to OpenAI."""
+         api_key = config.OPENAI_API_KEY
+         if not api_key:
+             logger.error("OPENAI_API_KEY not configured")
+             raise ValueError("OPENAI_API_KEY required")
+
+         self.client = AsyncOpenAI(api_key=api_key)
+         self.start_time = asyncio.get_event_loop().time()
+         self.last_activity_time = self.start_time
+
+         max_attempts = 3
+         for attempt in range(1, max_attempts + 1):
+             try:
+                 await self._run_session()
+                 return
+             except ConnectionClosedError as e:
+                 logger.warning("WebSocket closed unexpectedly (attempt %d/%d): %s",
+                                attempt, max_attempts, e)
+                 if attempt < max_attempts:
+                     delay = (2 ** (attempt - 1)) + random.uniform(0, 0.5)
+                     logger.info("Retrying in %.1f seconds...", delay)
+                     await asyncio.sleep(delay)
+                     continue
+                 raise
+             finally:
+                 self.connection = None
+                 try:
+                     self._connected_event.clear()
+                 except Exception:
+                     pass
+
+     async def _run_session(self) -> None:
+         """Run a single OpenAI Realtime session."""
+         model = config.OPENAI_MODEL
+         logger.info("Connecting to OpenAI Realtime API with model: %s", model)
+
+         # Fetch OpenClaw agent context (personality, memories, user info)
+         system_instructions = await self._build_system_instructions()
+
+         async with self.client.beta.realtime.connect(model=model) as conn:
+             # Configure session with OpenClaw's identity + robot body capabilities
+             tools = self._build_tools()
+
+             await conn.session.update(
230
+ session={
231
+ "modalities": ["text", "audio"],
232
+ "instructions": system_instructions,
233
+ "voice": get_session_voice(),
234
+ "input_audio_format": "pcm16",
235
+ "output_audio_format": "pcm16",
236
+ "input_audio_transcription": {
237
+ "model": "whisper-1",
238
+ },
239
+ "turn_detection": {
240
+ "type": "server_vad",
241
+ "threshold": 0.5,
242
+ "prefix_padding_ms": 300,
243
+ "silence_duration_ms": 600,
244
+ },
245
+ "tools": tools,
246
+ "tool_choice": "auto",
247
+ },
248
+ )
249
+ logger.info("OpenAI Realtime session configured with %d tools", len(tools))
250
+
251
+ self.connection = conn
252
+ self._connected_event.set()
253
+
254
+ # Process events
255
+ async for event in conn:
256
+ await self._handle_event(event)
257
+
258
+ async def _build_system_instructions(self) -> str:
259
+ """Build system instructions for the voice relay.
260
+
261
+ GPT-4o is a dumb relay — it only needs instructions on how to
262
+ call ask_openclaw and speak the result. No personality context needed.
263
+ """
264
+ return ROBOT_BODY_INSTRUCTIONS
265
+
266
+ async def _handle_event(self, event: Any) -> None:
267
+ """Handle an event from the OpenAI Realtime API."""
268
+ event_type = event.type
269
+
270
+ # Speech detection
271
+ if event_type == "input_audio_buffer.speech_started":
272
+ # User started speaking - stop any current output
273
+ self._speaking = False
274
+ self.deps.movement_manager.set_processing(False)
275
+ while not self.output_queue.empty():
276
+ try:
277
+ self.output_queue.get_nowait()
278
+ except asyncio.QueueEmpty:
279
+ break
280
+ if self.deps.head_wobbler is not None:
281
+ self.deps.head_wobbler.reset()
282
+ self.deps.movement_manager.set_listening(True)
283
+ logger.info("User started speaking")
284
+
285
+ if event_type == "input_audio_buffer.speech_stopped":
286
+ self.deps.movement_manager.set_listening(False)
287
+ logger.info("User stopped speaking")
288
+
289
+ # Transcription (for logging, UI, and sync)
290
+ if event_type == "conversation.item.input_audio_transcription.completed":
291
+ transcript = event.transcript
292
+ if transcript and transcript.strip():
293
+ logger.info("User: %s", transcript)
294
+ self._last_user_message = transcript # Track for sync
295
+ await self.output_queue.put(
296
+ AdditionalOutputs({"role": "user", "content": transcript})
297
+ )
298
+
299
+ # Response started - robot is about to speak
300
+ if event_type == "response.created":
301
+ self._speaking = True
302
+ logger.debug("Response started")
303
+
304
+ # Audio output from TTS
305
+ if event_type == "response.audio.delta":
306
+ # Audio arriving means we have a response - stop thinking animation
307
+ self.deps.movement_manager.set_processing(False)
308
+
309
+ # Feed to head wobbler for expressive movement
310
+ if self.deps.head_wobbler is not None:
311
+ self.deps.head_wobbler.feed(event.delta)
312
+
313
+ self.last_activity_time = asyncio.get_event_loop().time()
314
+
315
+ # Queue audio for playback
316
+ audio_data = np.frombuffer(
317
+ base64.b64decode(event.delta),
318
+ dtype=np.int16
319
+ ).reshape(1, -1)
320
+ await self.output_queue.put((OPENAI_SAMPLE_RATE, audio_data))
321
+
322
+ # Response text (for logging and UI)
323
+ if event_type == "response.audio_transcript.delta":
324
+ # Streaming transcript of what's being said
325
+ pass # Could log incrementally if needed
326
+
327
+ if event_type == "response.audio_transcript.done":
328
+ response_text = event.transcript
329
+ logger.info("Assistant: %s", response_text[:100] if len(response_text) > 100 else response_text)
330
+ self._last_assistant_response = response_text # Track for sync
331
+ await self.output_queue.put(
332
+ AdditionalOutputs({"role": "assistant", "content": response_text})
333
+ )
334
+
335
+ # Response completed - sync conversation to OpenClaw
336
+ if event_type == "response.done":
337
+ self._speaking = False
338
+ self.deps.movement_manager.set_processing(False)
339
+ if self.deps.head_wobbler is not None:
340
+ self.deps.head_wobbler.reset()
341
+ logger.debug("Response completed")
342
+
343
+ # Sync conversation to OpenClaw for memory continuity
344
+ await self._sync_to_openclaw()
345
+
346
+ # Tool calls
347
+ if event_type == "response.function_call_arguments.done":
348
+ await self._handle_tool_call(event)
349
+
350
+ # Errors
351
+ if event_type == "error":
352
+ err = getattr(event, "error", None)
353
+ msg = getattr(err, "message", str(err))
354
+ code = getattr(err, "code", "")
355
+ logger.error("OpenAI error [%s]: %s", code, msg)
356
+
357
+ async def _handle_tool_call(self, event: Any) -> None:
358
+ """Handle a tool call from OpenAI."""
359
+ tool_name = getattr(event, "name", None)
360
+ args_json = getattr(event, "arguments", None)
361
+ call_id = getattr(event, "call_id", None)
362
+
363
+ if not isinstance(tool_name, str) or not isinstance(args_json, str):
364
+ return
365
+
366
+ logger.info("Tool call: %s(%s)", tool_name, args_json[:50] if len(args_json) > 50 else args_json)
367
+
368
+ # Start thinking animation while we process the tool call.
369
+ # It will stop when the next audio delta arrives or response completes.
370
+ self.deps.movement_manager.set_processing(True)
371
+
372
+ try:
373
+ if tool_name == "ask_openclaw":
374
+ result = await self._handle_openclaw_query(args_json)
375
+ else:
376
+ # Robot movement tools - dispatch locally
377
+ result = await dispatch_tool_call(tool_name, args_json, self.deps)
378
+
379
+ logger.debug("Tool '%s' result: %s", tool_name, str(result)[:100])
380
+ except Exception as e:
381
+ logger.error("Tool '%s' failed: %s", tool_name, e)
382
+ result = {"error": str(e)}
383
+
384
+ # Send result back to continue the conversation
385
+ if isinstance(call_id, str) and self.connection:
386
+ await self.connection.conversation.item.create(
387
+ item={
388
+ "type": "function_call_output",
389
+ "call_id": call_id,
390
+ "output": json.dumps(result),
391
+ }
392
+ )
393
+ # Trigger response generation after tool result
394
+ await self.connection.response.create()
395
+
396
+ async def _sync_to_openclaw(self) -> None:
397
+ """Sync the last conversation turn to OpenClaw for memory continuity."""
398
+ if not self.openclaw_bridge or not self.openclaw_bridge.is_connected:
399
+ return
400
+
401
+ if self._last_user_message and self._last_assistant_response:
402
+ try:
403
+ await self.openclaw_bridge.sync_conversation(
404
+ self._last_user_message,
405
+ self._last_assistant_response
406
+ )
407
+ # Clear after sync
408
+ self._last_user_message = None
409
+ self._last_assistant_response = None
410
+ except Exception as e:
411
+ logger.debug("Failed to sync conversation: %s", e)
412
+
413
+ async def _handle_openclaw_query(self, args_json: str) -> dict:
414
+ """Handle a query to OpenClaw."""
415
+ if self.openclaw_bridge is None or not self.openclaw_bridge.is_connected:
416
+ return {"error": "OpenClaw not connected"}
417
+
418
+ try:
419
+ args = json.loads(args_json)
420
+ query = args.get("query", "")
421
+ include_image = args.get("include_image", False)
422
+
423
+ # Capture image if requested
424
+ image_b64 = None
425
+ if include_image and self.deps.camera_worker:
426
+ frame = self.deps.camera_worker.get_latest_frame()
427
+ if frame is not None:
428
+ import cv2
429
+ _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
430
+ image_b64 = base64.b64encode(buffer).decode('utf-8')
431
+ logger.debug("Captured camera image for OpenClaw query")
432
+
433
+ # Query OpenClaw
434
+ response = await self.openclaw_bridge.chat(
435
+ query,
436
+ image_b64=image_b64,
437
+ system_context=REACHY_BODY_CONTEXT,
438
+ )
439
+
440
+ if response.error:
441
+ return {"error": response.error}
442
+
443
+ # Parse and execute any action commands from OpenClaw's response
444
+ spoken_text = await self._execute_body_actions(response.content)
445
+
446
+ return {"response": spoken_text}
447
+
448
+ except Exception as e:
449
+ logger.error("OpenClaw query failed: %s", e)
450
+ return {"error": str(e)}
451
+
452
+ async def _execute_body_actions(self, text: str) -> str:
453
+ """Parse action tags from OpenClaw's response, execute them, and return clean text.
454
+
455
+ Supported tags:
456
+ [LOOK:direction] - Move head (left/right/up/down/front)
457
+ [EMOTION:name] - Express emotion (happy/sad/surprised/curious/thinking/confused/excited)
458
+ [DANCE:name] - Perform dance (happy/excited/wave/nod/shake/bounce)
459
+ [CAMERA] - Capture and describe what the robot sees
460
+ [FACE_TRACKING:on/off] - Toggle face tracking
461
+ [STOP] - Stop all movements
462
+ """
463
+ import re
464
+
465
+ action_pattern = re.compile(
466
+ r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
467
+ r'|\[(CAMERA|STOP)\]'
468
+ )
469
+
470
+ actions_found = []
471
+ for match in action_pattern.finditer(text):
472
+ if match.group(3):
473
+ # No-arg action: [CAMERA] or [STOP]
474
+ actions_found.append((match.group(3), None))
475
+ else:
476
+ # Parameterized action: [LOOK:left], etc.
477
+ actions_found.append((match.group(1), match.group(2)))
478
+
479
+ # Execute actions
480
+ for action, param in actions_found:
481
+ try:
482
+ if action == "LOOK":
483
+ await dispatch_tool_call("look", json.dumps({"direction": param}), self.deps)
484
+ elif action == "EMOTION":
485
+ await dispatch_tool_call("emotion", json.dumps({"emotion_name": param}), self.deps)
486
+ elif action == "DANCE":
487
+ await dispatch_tool_call("dance", json.dumps({"dance_name": param}), self.deps)
488
+ elif action == "CAMERA":
489
+ await dispatch_tool_call("camera", json.dumps({}), self.deps)
490
+ elif action == "FACE_TRACKING":
491
+ enabled = param.lower() in ("on", "true", "yes")
492
+ await dispatch_tool_call("face_tracking", json.dumps({"enabled": enabled}), self.deps)
493
+ elif action == "STOP":
494
+ await dispatch_tool_call("stop_moves", json.dumps({}), self.deps)
495
+ logger.info("Executed body action: %s(%s)", action, param)
496
+ except Exception as e:
497
+ logger.warning("Body action %s(%s) failed: %s", action, param, e)
498
+
499
+ # Strip action tags from text so GPT-4o only speaks the words
500
+ spoken_text = action_pattern.sub('', text).strip()
501
+ # Clean up extra whitespace left by removed tags
502
+ spoken_text = re.sub(r' +', ' ', spoken_text)
503
+
504
+ return spoken_text
505
+
506
+ async def receive(self, frame: Tuple[int, NDArray]) -> None:
507
+ """Receive audio from the robot microphone."""
508
+ if not self.connection:
509
+ return
510
+
511
+ input_sr, audio = frame
512
+
513
+ # Handle stereo
514
+ if audio.ndim == 2:
515
+ if audio.shape[1] > audio.shape[0]:
516
+ audio = audio.T
517
+ if audio.shape[1] > 1:
518
+ audio = audio[:, 0]
519
+
520
+ audio = audio.flatten()
521
+
522
+ # Convert to float for resampling
523
+ if audio.dtype == np.int16:
524
+ audio = audio.astype(np.float32) / 32768.0
525
+ elif audio.dtype != np.float32:
526
+ audio = audio.astype(np.float32)
527
+
528
+ # Resample to OpenAI sample rate
529
+ if input_sr != OPENAI_SAMPLE_RATE:
530
+ num_samples = int(len(audio) * OPENAI_SAMPLE_RATE / input_sr)
531
+ audio = resample(audio, num_samples).astype(np.float32)
532
+
533
+ # Convert to int16 for OpenAI
534
+ audio_int16 = (audio * 32767).astype(np.int16)
535
+
536
+ # Send to OpenAI
537
+ try:
538
+ audio_b64 = base64.b64encode(audio_int16.tobytes()).decode("utf-8")
539
+ await self.connection.input_audio_buffer.append(audio=audio_b64)
540
+ except Exception as e:
541
+ logger.debug("Failed to send audio: %s", e)
542
+
543
+ async def emit(self) -> Tuple[int, NDArray[np.int16]] | AdditionalOutputs | None:
544
+ """Get the next output (audio or transcript)."""
545
+ return await wait_for_item(self.output_queue)
546
+
547
+ async def shutdown(self) -> None:
548
+ """Shutdown the handler."""
549
+ self._shutdown_requested = True
550
+
551
+ if self.connection:
552
+ try:
553
+ await self.connection.close()
554
+ except Exception as e:
555
+ logger.debug("Connection close: %s", e)
556
+ self.connection = None
557
+
558
+ while not self.output_queue.empty():
559
+ try:
560
+ self.output_queue.get_nowait()
561
+ except asyncio.QueueEmpty:
562
+ break
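The action-tag protocol above can be exercised in isolation. The sketch below replicates the regex and tag-stripping logic of `_execute_body_actions` as a standalone function (the `extract_actions` name is introduced here for illustration; it is not part of the codebase), so the grammar — parameterized tags like `[LOOK:left]` and bare tags like `[CAMERA]` — can be tested without a robot:

```python
import re

# Same tag grammar as _execute_body_actions: group 1/2 capture
# parameterized tags, group 3 captures bare [CAMERA] / [STOP] tags.
ACTION_PATTERN = re.compile(
    r'\[(LOOK|EMOTION|DANCE|FACE_TRACKING):(\w+)\]'
    r'|\[(CAMERA|STOP)\]'
)

def extract_actions(text: str):
    """Return (actions, spoken_text) for an OpenClaw-style response."""
    actions = []
    for m in ACTION_PATTERN.finditer(text):
        if m.group(3):
            actions.append((m.group(3), None))        # bare tag
        else:
            actions.append((m.group(1), m.group(2)))  # parameterized tag
    # Strip tags and collapse leftover whitespace, as the handler does
    spoken = ACTION_PATTERN.sub('', text).strip()
    spoken = re.sub(r' +', ' ', spoken)
    return actions, spoken

actions, spoken = extract_actions("[EMOTION:happy] Hi there! [LOOK:left] [CAMERA]")
print(actions)  # [('EMOTION', 'happy'), ('LOOK', 'left'), ('CAMERA', None)]
print(spoken)   # Hi there!
```

Because the tags are stripped before the text is handed back to the voice relay, OpenClaw can freely interleave actions with speech without the robot ever reading a tag aloud.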
src/reachy_mini_openclaw/openclaw_bridge.py ADDED
@@ -0,0 +1,606 @@
+"""ReachyClaw - Bridge to OpenClaw Gateway for AI responses.
+
+This module provides ReachyClaw's integration with the OpenClaw gateway
+using the WebSocket protocol (the gateway's native transport).
+
+ReachyClaw uses the OpenAI Realtime API for voice I/O (speech recognition + TTS)
+but routes all responses through OpenClaw for intelligence.
+"""
+
+import json
+import asyncio
+import logging
+import uuid
+from typing import Optional, Any, AsyncIterator
+from dataclasses import dataclass
+
+import websockets
+
+from reachy_mini_openclaw.config import config
+
+logger = logging.getLogger(__name__)
+
+# Protocol version supported by this client
+PROTOCOL_VERSION = 3
+
+
+@dataclass
+class OpenClawResponse:
+    """Response from the OpenClaw gateway."""
+    content: str
+    error: Optional[str] = None
+
+
+class OpenClawBridge:
+    """Bridge to the OpenClaw gateway using its WebSocket protocol.
+
+    The OpenClaw gateway speaks WebSocket with a JSON frame protocol.
+    This class handles the connect handshake, authentication, and
+    chat operations.
+
+    Example:
+        bridge = OpenClawBridge()
+        await bridge.connect()
+
+        # Simple query
+        response = await bridge.chat("Hello!")
+        print(response.content)
+    """
+
+    def __init__(
+        self,
+        gateway_url: Optional[str] = None,
+        gateway_token: Optional[str] = None,
+        agent_id: Optional[str] = None,
+        timeout: float = 120.0,
+    ):
+        """Initialize the OpenClaw bridge.
+
+        Args:
+            gateway_url: URL of the OpenClaw gateway (default: from env/config).
+                Accepts http:// or ws:// schemes; http is converted to ws.
+            gateway_token: Authentication token (default: from env/config)
+            agent_id: OpenClaw agent ID to use (default: from env/config)
+            timeout: Request timeout in seconds
+        """
+        import os
+
+        raw_url = (
+            gateway_url
+            or os.getenv("OPENCLAW_GATEWAY_URL")
+            or config.OPENCLAW_GATEWAY_URL
+        )
+        # Normalise to ws:// (the gateway listens on the same port for both)
+        self.gateway_url = self._normalise_ws_url(raw_url)
+
+        self.gateway_token = (
+            gateway_token
+            or os.getenv("OPENCLAW_TOKEN")
+            or config.OPENCLAW_TOKEN
+        )
+        self.agent_id = (
+            agent_id
+            or os.getenv("OPENCLAW_AGENT_ID")
+            or config.OPENCLAW_AGENT_ID
+        )
+        self.timeout = timeout
+
+        # Session key – "main" shares context with WhatsApp and other channels.
+        # Full key format: agent:<agent_id>:<session_key>
+        self.session_key = (
+            os.getenv("OPENCLAW_SESSION_KEY")
+            or config.OPENCLAW_SESSION_KEY
+            or "main"
+        )
+
+        # Persistent WebSocket state
+        self._ws: Optional[websockets.WebSocketClientProtocol] = None
+        self._connected = False
+        self._conn_id: Optional[str] = None
+
+        # Background listener task & pending request futures
+        self._listener_task: Optional[asyncio.Task] = None
+        self._pending: dict[str, asyncio.Future] = {}
+        # Event queues keyed by runId
+        self._run_events: dict[str, asyncio.Queue] = {}
+
+    # ------------------------------------------------------------------
+    # URL helpers
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _normalise_ws_url(url: str) -> str:
+        """Convert an http(s) URL to ws(s)."""
+        if url.startswith("http://"):
+            return "ws://" + url[7:]
+        if url.startswith("https://"):
+            return "wss://" + url[8:]
+        if not url.startswith("ws://") and not url.startswith("wss://"):
+            return "ws://" + url
+        return url
+
+    # ------------------------------------------------------------------
+    # Connection lifecycle
+    # ------------------------------------------------------------------
+
+    async def connect(self) -> bool:
+        """Connect to the OpenClaw gateway and authenticate.
+
+        Returns:
+            True if connection successful, False otherwise
+        """
+        logger.info(
+            "Connecting to OpenClaw at %s (token: %s)",
+            self.gateway_url,
+            "set" if self.gateway_token else "not set",
+        )
+        try:
+            self._ws = await websockets.connect(
+                self.gateway_url,
+                ping_interval=20,
+                ping_timeout=30,
+                close_timeout=5,
+            )
+
+            # 1. Receive challenge
+            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
+            challenge = json.loads(raw)
+            if challenge.get("event") != "connect.challenge":
+                logger.warning("Unexpected first frame: %s", challenge.get("event"))
+
+            # 2. Send connect request
+            req_id = str(uuid.uuid4())
+            connect_req = {
+                "type": "req",
+                "id": req_id,
+                "method": "connect",
+                "params": {
+                    "minProtocol": PROTOCOL_VERSION,
+                    "maxProtocol": PROTOCOL_VERSION,
+                    "auth": {"token": self.gateway_token} if self.gateway_token else {},
+                    "client": {
+                        "id": "cli",
+                        "version": "1.0.0",
+                        "platform": "darwin",
+                        "mode": "cli",
+                    },
+                    "role": "operator",
+                    "scopes": ["chat", "operator.write", "operator.read"],
+                },
+            }
+            await self._ws.send(json.dumps(connect_req))
+
+            # 3. Read hello response
+            raw = await asyncio.wait_for(self._ws.recv(), timeout=10)
+            hello = json.loads(raw)
+
+            if hello.get("ok"):
+                self._connected = True
+                payload = hello.get("payload", {})
+                server = payload.get("server", {})
+                self._conn_id = server.get("connId")
+                logger.info(
+                    "Connected to OpenClaw gateway (server=%s, connId=%s)",
+                    server.get("host", "?"),
+                    self._conn_id,
+                )
+                # Start background listener
+                self._listener_task = asyncio.create_task(
+                    self._listen_loop(), name="openclaw-ws-listener"
+                )
+                return True
+            else:
+                err = hello.get("error", {})
+                logger.error(
+                    "OpenClaw connect failed: %s - %s",
+                    err.get("code"),
+                    err.get("message"),
+                )
+                await self._close_ws()
+                return False
+
+        except Exception as e:
+            logger.error(
+                "Failed to connect to OpenClaw gateway: %s (%s)",
+                e,
+                type(e).__name__,
+            )
+            await self._close_ws()
+            return False
+
+    async def disconnect(self) -> None:
+        """Disconnect from the gateway."""
+        self._connected = False
+        if self._listener_task and not self._listener_task.done():
+            self._listener_task.cancel()
+            try:
+                await self._listener_task
+            except (asyncio.CancelledError, Exception):
+                pass
+        await self._close_ws()
+
+    async def _close_ws(self) -> None:
+        self._connected = False
+        if self._ws:
+            try:
+                await self._ws.close()
+            except Exception:
+                pass
+            self._ws = None
+
+    # ------------------------------------------------------------------
+    # Background listener
+    # ------------------------------------------------------------------
+
+    async def _listen_loop(self) -> None:
+        """Background task that reads all frames from the WebSocket."""
+        try:
+            async for raw in self._ws:
+                try:
+                    msg = json.loads(raw)
+                except json.JSONDecodeError:
+                    continue
+                await self._dispatch(msg)
+        except websockets.ConnectionClosed as e:
+            logger.warning("OpenClaw WebSocket closed: %s", e)
+        except asyncio.CancelledError:
+            return
+        except Exception as e:
+            logger.error("OpenClaw listener error: %s", e)
+        finally:
+            self._connected = False
+
+    async def _dispatch(self, msg: dict) -> None:
+        """Route an incoming frame to the right handler."""
+        msg_type = msg.get("type")
+
+        if msg_type == "res":
+            # Response to a request we sent
+            req_id = msg.get("id")
+            fut = self._pending.pop(req_id, None)
+            if fut and not fut.done():
+                fut.set_result(msg)
+
+        elif msg_type == "event":
+            event_name = msg.get("event", "")
+            payload = msg.get("payload", {})
+
+            # Route agent / chat events to the correct run queue
+            run_id = payload.get("runId")
+            if run_id and run_id in self._run_events:
+                await self._run_events[run_id].put(msg)
+
+            # Ignore noisy events silently
+            if event_name in ("health", "tick"):
+                return
+
+            logger.debug("Event: %s (runId=%s)", event_name, run_id)
+
+    # ------------------------------------------------------------------
+    # Request helpers
+    # ------------------------------------------------------------------
+
+    async def _send_request(
+        self, method: str, params: dict, timeout: Optional[float] = None
+    ) -> dict:
+        """Send a request and wait for the response.
+
+        Args:
+            method: The RPC method name
+            params: The params dict
+            timeout: Override timeout (defaults to self.timeout)
+
+        Returns:
+            The full response message dict
+        """
+        if not self._ws or not self._connected:
+            return {"ok": False, "error": {"code": "NOT_CONNECTED", "message": "Not connected"}}
+
+        req_id = str(uuid.uuid4())
+        req = {"type": "req", "id": req_id, "method": method, "params": params}
+
+        fut: asyncio.Future = asyncio.get_running_loop().create_future()
+        self._pending[req_id] = fut
+
+        try:
+            await self._ws.send(json.dumps(req))
+            result = await asyncio.wait_for(fut, timeout=timeout or self.timeout)
+            return result
+        except asyncio.TimeoutError:
+            self._pending.pop(req_id, None)
+            return {"ok": False, "error": {"code": "TIMEOUT", "message": "Request timed out"}}
+        except Exception as e:
+            self._pending.pop(req_id, None)
+            return {"ok": False, "error": {"code": "ERROR", "message": str(e)}}
+
+    def _full_session_key(self) -> str:
+        """Build the full session key: agent:<agentId>:<sessionKey>."""
+        return f"agent:{self.agent_id}:{self.session_key}"
+
+    # ------------------------------------------------------------------
+    # Chat API
+    # ------------------------------------------------------------------
+
+    async def chat(
+        self,
+        message: str,
+        image_b64: Optional[str] = None,
+        system_context: Optional[str] = None,
+    ) -> OpenClawResponse:
+        """Send a message to OpenClaw and get a response.
+
+        OpenClaw maintains conversation memory on its end, so it will be aware
+        of conversations from other channels (WhatsApp, web, etc.). We only send
+        the current message and let OpenClaw handle the context.
+
+        Args:
+            message: The user's message (transcribed speech)
+            image_b64: Optional base64-encoded image from the robot camera (not yet
+                supported over WebSocket chat.send – reserved for future use)
+            system_context: Optional additional system context (prepended to message)
+
+        Returns:
+            OpenClawResponse with the AI's response
+        """
+        if not self._connected:
+            return OpenClawResponse(content="", error="Not connected to OpenClaw")
+
+        # Prefix system context if provided
+        final_message = message
+        if system_context:
+            final_message = f"[System: {system_context}]\n\n{message}"
+
+        # If an image is provided, mention it (the WebSocket protocol uses string
+        # messages; image passing would require a separate mechanism)
+        if image_b64:
+            final_message = f"[Image attached]\n{final_message}"
+
+        idempotency_key = str(uuid.uuid4())
+        session_key = self._full_session_key()
+
+        # The runId returned by chat.send keys the event queue for this run
+        params = {
+            "idempotencyKey": idempotency_key,
+            "sessionKey": session_key,
+            "message": final_message,
+        }
+
+        try:
+            # Send the request
+            resp = await self._send_request("chat.send", params, timeout=30)
+
+            if not resp.get("ok"):
+                err = resp.get("error", {})
+                error_msg = f"{err.get('code', 'UNKNOWN')}: {err.get('message', 'Unknown error')}"
+                logger.error("chat.send failed: %s", error_msg)
+                return OpenClawResponse(content="", error=error_msg)
+
+            run_id = resp.get("payload", {}).get("runId")
+            if not run_id:
+                return OpenClawResponse(content="", error="No runId in response")
+
+            # Register a queue to receive events for this run
+            event_queue: asyncio.Queue = asyncio.Queue()
+            self._run_events[run_id] = event_queue
+
+            try:
+                # Collect the streamed response
+                full_text = ""
+                while True:
+                    try:
+                        event = await asyncio.wait_for(
+                            event_queue.get(), timeout=self.timeout
+                        )
+                        payload = event.get("payload", {})
+                        event_name = event.get("event", "")
+
+                        if event_name == "agent":
+                            stream = payload.get("stream")
+                            data = payload.get("data", {})
+
+                            if stream == "assistant":
+                                # Accumulate the full text
+                                full_text = data.get("text", full_text)
+
+                            elif stream == "lifecycle" and data.get("phase") == "end":
+                                # Run completed
+                                break
+
+                        elif event_name == "chat":
+                            state = payload.get("state")
+                            if state == "final":
+                                # Extract final text
+                                msg_payload = payload.get("message", {})
+                                content_parts = msg_payload.get("content", [])
+                                if isinstance(content_parts, list):
+                                    for part in content_parts:
+                                        if isinstance(part, dict) and part.get("type") == "text":
+                                            full_text = part.get("text", full_text)
+                                elif isinstance(content_parts, str):
+                                    full_text = content_parts
+                                break
+
+                    except asyncio.TimeoutError:
+                        logger.warning("Timeout waiting for chat response (runId=%s)", run_id)
+                        if full_text:
+                            break
+                        return OpenClawResponse(content="", error="Response timeout")
+
+                return OpenClawResponse(content=full_text)
+
+            finally:
+                self._run_events.pop(run_id, None)
+
+        except Exception as e:
+            logger.error("OpenClaw chat error: %s", e)
+            return OpenClawResponse(content="", error=str(e))
+
+    async def stream_chat(
+        self,
+        message: str,
+        image_b64: Optional[str] = None,
+    ) -> AsyncIterator[str]:
+        """Stream a response from OpenClaw.
+
+        Args:
+            message: The user's message
+            image_b64: Optional base64-encoded image
+
+        Yields:
+            String chunks of the response as they arrive
+        """
+        if not self._connected:
+            yield "[Error: Not connected to OpenClaw]"
+            return
+
+        final_message = message
+        if image_b64:
+            final_message = f"[Image attached]\n{message}"
+
+        params = {
+            "idempotencyKey": str(uuid.uuid4()),
+            "sessionKey": self._full_session_key(),
+            "message": final_message,
+        }
+
+        try:
+            resp = await self._send_request("chat.send", params, timeout=30)
+
+            if not resp.get("ok"):
+                err = resp.get("error", {})
+                yield f"[Error: {err.get('message', 'Unknown error')}]"
+                return
+
+            run_id = resp.get("payload", {}).get("runId")
+            if not run_id:
+                yield "[Error: No runId]"
+                return
+
+            event_queue: asyncio.Queue = asyncio.Queue()
+            self._run_events[run_id] = event_queue
+
+            try:
+                while True:
+                    try:
+                        event = await asyncio.wait_for(
+                            event_queue.get(), timeout=self.timeout
+                        )
+                        payload = event.get("payload", {})
+                        event_name = event.get("event", "")
+
+                        if event_name == "agent":
+                            stream = payload.get("stream")
+                            data = payload.get("data", {})
+
+                            if stream == "assistant":
+                                delta = data.get("delta", "")
+                                if delta:
+                                    yield delta
+
+                            elif stream == "lifecycle" and data.get("phase") == "end":
+                                break
+
+                        elif event_name == "chat" and payload.get("state") == "final":
+                            break
+
+                    except asyncio.TimeoutError:
+                        yield "[Error: timeout]"
+                        break
+            finally:
+                self._run_events.pop(run_id, None)
+
+        except Exception as e:
+            logger.error("OpenClaw streaming error: %s", e)
+            yield f"[Error: {e}]"
+
+    @property
+    def is_connected(self) -> bool:
+        """Check if the bridge is connected to the gateway."""
+        return self._connected
+
+    async def get_agent_context(self) -> Optional[str]:
+        """Fetch the agent's current context, personality, and memory summary.
+
+        This asks OpenClaw to provide a summary of:
+        - The agent's personality and identity
+        - Recent conversation context
+        - Important memories about the user
+        - Current state
+
+        Returns:
+            A context string to use as system instructions, or None if failed
+        """
+        try:
+            response = await self.chat(
+                message="Provide your current context summary for the robot body.",
+                system_context=(
+                    "You are being asked to provide your current context for your robot body. "
+                    "Output a comprehensive context summary that another AI can use to embody you. Include: "
+                    "1. YOUR IDENTITY: Who you are, your name, your personality traits, how you speak. "
+                    "2. USER CONTEXT: What you know about the user (name, preferences, relationship). "
+                    "3. RECENT CONTEXT: Summary of recent conversations or important ongoing topics. "
+                    "4. MEMORIES: Key things you remember that are relevant to interactions. "
+                    "5. CURRENT STATE: Any relevant time/date awareness, ongoing tasks. "
+                    "Be specific and personal. This context will be used by your robot body to speak and act AS YOU. "
+                    "Output ONLY the context summary, no preamble."
+                ),
+            )
+
+            if response.error:
+                logger.warning("Failed to get agent context: %s", response.error)
+                return None
+
+            if response.content:
+                logger.info(
+                    "Retrieved agent context from OpenClaw (%d chars)",
+                    len(response.content),
+                )
+                return response.content
+
+            logger.warning("No context returned from OpenClaw")
+            return None
+
+        except Exception as e:
+            logger.error("Failed to get agent context: %s", e)
+            return None
+
+    async def sync_conversation(
+        self, user_message: str, assistant_response: str
+    ) -> None:
+        """Sync a conversation turn back to OpenClaw for memory continuity.
+
+        Args:
+            user_message: What the user said
+            assistant_response: What the robot/AI responded
+        """
+        try:
+            await self.chat(
+                message=(
+                    f"[ROBOT BODY SYNC] The following happened through the Reachy Mini robot:\n"
582
+ f"User said: {user_message}\n"
583
+ f"You responded: {assistant_response}\n"
584
+ f"Remember this as part of your ongoing conversation."
585
+ ),
586
+ system_context=(
587
+ "[ROBOT BODY SYNC] The following conversation happened through your "
588
+ "Reachy Mini robot body. Remember it as part of your ongoing conversation "
589
+ "with the user."
590
+ ),
591
+ )
592
+ logger.debug("Synced conversation to OpenClaw")
593
+ except Exception as e:
594
+ logger.debug("Failed to sync conversation: %s", e)
595
+
596
+
597
+ # Global bridge instance (lazy initialization)
598
+ _bridge: Optional[OpenClawBridge] = None
599
+
600
+
601
+ def get_bridge() -> OpenClawBridge:
602
+ """Get the global OpenClaw bridge instance."""
603
+ global _bridge
604
+ if _bridge is None:
605
+ _bridge = OpenClawBridge()
606
+ return _bridge
src/reachy_mini_openclaw/prompts.py ADDED
@@ -0,0 +1,98 @@
+ """Prompt management for the robot assistant.
+ 
+ Handles loading and customizing system prompts for the OpenAI Realtime session.
+ """
+ 
+ import logging
+ from pathlib import Path
+ from typing import Optional
+ 
+ from reachy_mini_openclaw.config import config
+ 
+ logger = logging.getLogger(__name__)
+ 
+ # Default prompts directory
+ PROMPTS_DIR = Path(__file__).parent / "prompts"
+ 
+ 
+ def get_session_instructions() -> str:
+     """Get the system instructions for the OpenAI Realtime session.
+ 
+     Loads from custom profile if configured, otherwise uses default.
+ 
+     Returns:
+         System instructions string
+     """
+     # Check for custom profile
+     custom_profile = config.CUSTOM_PROFILE
+     if custom_profile:
+         custom_path = PROMPTS_DIR / f"{custom_profile}.txt"
+         if custom_path.exists():
+             try:
+                 instructions = custom_path.read_text(encoding="utf-8")
+                 logger.info("Loaded custom profile: %s", custom_profile)
+                 return instructions
+             except Exception as e:
+                 logger.warning("Failed to load custom profile %s: %s", custom_profile, e)
+ 
+     # Load default
+     default_path = PROMPTS_DIR / "default.txt"
+     if default_path.exists():
+         try:
+             return default_path.read_text(encoding="utf-8")
+         except Exception as e:
+             logger.warning("Failed to load default prompt: %s", e)
+ 
+     # Fallback inline prompt
+     return """You are a friendly AI assistant with a robot body. You can see, hear, and move expressively.
+ Be conversational and use your movement capabilities to be engaging.
+ Use the camera tool when asked about your surroundings.
+ Express emotions through movement to enhance communication."""
+ 
+ 
+ def get_session_voice() -> str:
+     """Get the voice to use for the OpenAI Realtime session.
+ 
+     Returns:
+         Voice name string
+     """
+     return config.OPENAI_VOICE
+ 
+ 
+ def get_available_profiles() -> list[str]:
+     """Get list of available prompt profiles.
+ 
+     Returns:
+         List of profile names (without .txt extension)
+     """
+     profiles = []
+     if PROMPTS_DIR.exists():
+         for path in PROMPTS_DIR.glob("*.txt"):
+             profiles.append(path.stem)
+     return sorted(profiles)
+ 
+ 
+ def save_custom_profile(name: str, instructions: str) -> bool:
+     """Save a custom prompt profile.
+ 
+     Args:
+         name: Profile name (alphanumeric and underscores only)
+         instructions: The prompt instructions
+ 
+     Returns:
+         True if saved successfully
+     """
+     # Validate name
+     if not name or not name.replace("_", "").isalnum():
+         logger.error("Invalid profile name: %s", name)
+         return False
+ 
+     try:
+         PROMPTS_DIR.mkdir(parents=True, exist_ok=True)
+         profile_path = PROMPTS_DIR / f"{name}.txt"
+         profile_path.write_text(instructions, encoding="utf-8")
+         logger.info("Saved custom profile: %s", name)
+         return True
+     except Exception as e:
+         logger.error("Failed to save profile %s: %s", name, e)
+         return False
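The name guard in `save_custom_profile` boils down to one expression: strip underscores, then require the remainder to be non-empty alphanumeric, which keeps profile names filesystem-safe. A standalone sketch (the helper name is illustrative, not part of the module):

```python
def is_valid_profile_name(name: str) -> bool:
    """Replicates save_custom_profile's guard: non-empty, and alphanumeric
    once underscores are removed, so path characters are rejected."""
    return bool(name) and name.replace("_", "").isalnum()


print(is_valid_profile_name("pirate_mode"))    # True
print(is_valid_profile_name("../etc/passwd"))  # False
print(is_valid_profile_name("___"))            # False ("".isalnum() is False)
```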
src/reachy_mini_openclaw/prompts/default.txt ADDED
@@ -0,0 +1,15 @@
+ You are an OpenClaw AI agent embodied in a physical Reachy Mini robot through ReachyClaw. You have a robot body that lets you see through a camera, hear through a microphone, speak through a speaker, and move expressively.
+ 
+ ## Your Robot Capabilities
+ 1. **Vision**: You can see through your camera. Use the camera tool to look at things when asked.
+ 2. **Movement**: You can look in different directions (left, right, up, down, front) to show attention.
+ 3. **Emotions**: Express emotions through movement (happy, sad, surprised, curious, thinking, confused, excited).
+ 4. **Dances**: Perform dances to celebrate or entertain.
+ 5. **Face Tracking**: Automatically look at people's faces when enabled.
+ 
+ ## Guidelines
+ - Be conversational and natural — keep responses concise for voice
+ - Use your body expressively — look at things you discuss, show emotions
+ - When asked to see something, use your camera
+ - Keep responses short for natural conversation flow
+ - Your movements complement your speech — be expressive!
src/reachy_mini_openclaw/tools/__init__.py ADDED
@@ -0,0 +1,17 @@
+ """Tool definitions for Reachy Mini OpenClaw.
+ 
+ These tools are exposed to the OpenAI Realtime API and allow the assistant
+ to control the robot and interact with the environment.
+ """
+ 
+ from reachy_mini_openclaw.tools.core_tools import (
+     ToolDependencies,
+     get_tool_specs,
+     dispatch_tool_call,
+ )
+ 
+ __all__ = [
+     "ToolDependencies",
+     "get_tool_specs",
+     "dispatch_tool_call",
+ ]
src/reachy_mini_openclaw/tools/core_tools.py ADDED
@@ -0,0 +1,421 @@
+ """Core tool definitions for the ReachyClaw robot.
+ 
+ These tools allow the OpenClaw agent (in a robot body) to control
+ robot movements and capture images.
+ 
+ Tool Categories:
+ 1. Movement Tools - Control head position, play emotions/dances
+ 2. Vision Tools - Capture and analyze camera images
+ """
+ 
+ import json
+ import logging
+ import base64
+ from dataclasses import dataclass
+ from typing import Any, Optional, TYPE_CHECKING
+ 
+ import numpy as np
+ 
+ if TYPE_CHECKING:
+     from reachy_mini_openclaw.moves import MovementManager, HeadLookMove
+     from reachy_mini_openclaw.audio.head_wobbler import HeadWobbler
+     from reachy_mini_openclaw.openclaw_bridge import OpenClawBridge
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class ToolDependencies:
+     """Dependencies required by tools.
+ 
+     This dataclass holds references to robot systems that tools need
+     to interact with.
+     """
+     movement_manager: "MovementManager"
+     head_wobbler: "HeadWobbler"
+     robot: Any  # ReachyMini instance
+     camera_worker: Optional[Any] = None
+     openclaw_bridge: Optional["OpenClawBridge"] = None
+     vision_manager: Optional[Any] = None  # Local vision processor (SmolVLM2)
+ 
+ 
+ # Tool specifications in OpenAI format
+ TOOL_SPECS = [
+     {
+         "type": "function",
+         "name": "look",
+         "description": "Move the robot's head to look in a specific direction. Use this to direct attention or emphasize a point.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "direction": {
+                     "type": "string",
+                     "enum": ["left", "right", "up", "down", "front"],
+                     "description": "The direction to look. 'front' returns to neutral position."
+                 }
+             },
+             "required": ["direction"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "camera",
+         "description": "Capture an image from the robot's camera to see what's in front of you. Use this when asked about your surroundings or to identify objects/people.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+     {
+         "type": "function",
+         "name": "face_tracking",
+         "description": "Enable or disable face tracking. When enabled, the robot will automatically look at detected faces.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "enabled": {
+                     "type": "boolean",
+                     "description": "True to enable face tracking, False to disable"
+                 }
+             },
+             "required": ["enabled"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "dance",
+         "description": "Perform a dance animation. Use this to express joy, celebrate, or entertain.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "dance_name": {
+                     "type": "string",
+                     "enum": ["happy", "excited", "wave", "nod", "shake", "bounce"],
+                     "description": "The dance to perform"
+                 }
+             },
+             "required": ["dance_name"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "emotion",
+         "description": "Express an emotion through movement. Use this to show reactions and feelings.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "emotion_name": {
+                     "type": "string",
+                     "enum": ["happy", "sad", "surprised", "curious", "thinking", "confused", "excited"],
+                     "description": "The emotion to express"
+                 }
+             },
+             "required": ["emotion_name"]
+         }
+     },
+     {
+         "type": "function",
+         "name": "stop_moves",
+         "description": "Stop all current movements and clear the movement queue.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+     {
+         "type": "function",
+         "name": "idle",
+         "description": "Do nothing and remain idle. Use this when you want to stay still.",
+         "parameters": {
+             "type": "object",
+             "properties": {},
+             "required": []
+         }
+     },
+ ]
+ 
+ 
+ def get_tool_specs() -> list[dict]:
+     """Get the list of tool specifications for OpenAI.
+ 
+     Returns:
+         List of tool specification dictionaries
+     """
+     return TOOL_SPECS
+ 
+ 
+ # Mapping from tool names to action tag names used by OpenClaw
+ _TOOL_TO_TAG = {
+     "look": ("LOOK", "direction"),
+     "emotion": ("EMOTION", "emotion_name"),
+     "dance": ("DANCE", "dance_name"),
+     "camera": ("CAMERA", None),
+     "face_tracking": ("FACE_TRACKING", None),  # special: on/off
+     "stop_moves": ("STOP", None),
+ }
+ 
+ 
+ def get_body_actions_description() -> str:
+     """Build a description of available robot body actions from TOOL_SPECS.
+ 
+     Returns a string listing all action tags and their valid values,
+     derived directly from TOOL_SPECS so it stays in sync automatically.
+     """
+     specs_by_name = {s["name"]: s for s in TOOL_SPECS}
+     lines = []
+ 
+     for tool_name, (tag, param_key) in _TOOL_TO_TAG.items():
+         spec = specs_by_name.get(tool_name)
+         if spec is None:
+             continue
+ 
+         props = spec["parameters"].get("properties", {})
+ 
+         if param_key and param_key in props:
+             # Enum-based param: list all values
+             values = props[param_key].get("enum", [])
+             tags = " ".join(f"[{tag}:{v}]" for v in values)
+             lines.append(f" {tags}")
+         elif tool_name == "face_tracking":
+             lines.append(f" [{tag}:on] [{tag}:off]")
+         else:
+             # No-param action
+             desc = spec.get("description", "")
+             lines.append(f" [{tag}] — {desc}")
+ 
+     return "\n".join(lines)
+ 
+ 
+ async def dispatch_tool_call(
+     tool_name: str,
+     arguments_json: str,
+     deps: ToolDependencies,
+ ) -> dict[str, Any]:
+     """Dispatch a tool call to the appropriate handler.
+ 
+     Args:
+         tool_name: Name of the tool to execute
+         arguments_json: JSON string of tool arguments
+         deps: Tool dependencies
+ 
+     Returns:
+         Dictionary with tool result
+     """
+     try:
+         args = json.loads(arguments_json) if arguments_json else {}
+     except json.JSONDecodeError:
+         return {"error": f"Invalid JSON arguments: {arguments_json}"}
+ 
+     handlers = {
+         "look": _handle_look,
+         "camera": _handle_camera,
+         "face_tracking": _handle_face_tracking,
+         "dance": _handle_dance,
+         "emotion": _handle_emotion,
+         "stop_moves": _handle_stop_moves,
+         "idle": _handle_idle,
+     }
+ 
+     handler = handlers.get(tool_name)
+     if handler is None:
+         return {"error": f"Unknown tool: {tool_name}"}
+ 
+     try:
+         return await handler(args, deps)
+     except Exception as e:
+         logger.error("Tool '%s' failed: %s", tool_name, e, exc_info=True)
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_look(args: dict, deps: ToolDependencies) -> dict:
+     """Handle the look tool."""
+     from reachy_mini_openclaw.moves import HeadLookMove
+ 
+     direction = args.get("direction", "front")
+ 
+     try:
+         # Get current pose for smooth transition
+         _, current_ant = deps.robot.get_current_joint_positions()
+         current_head = deps.robot.get_current_head_pose()
+ 
+         move = HeadLookMove(
+             direction=direction,
+             start_pose=current_head,
+             start_antennas=tuple(current_ant),
+             duration=1.0,
+         )
+         deps.movement_manager.queue_move(move)
+ 
+         return {"status": "success", "direction": direction}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_camera(args: dict, deps: ToolDependencies) -> dict:
+     """Handle the camera tool - capture image and get description.
+ 
+     Uses local vision (SmolVLM2) if available, otherwise falls back to OpenClaw.
+     """
+     logger.info("Camera tool called, camera_worker=%s, vision_manager=%s",
+                 deps.camera_worker is not None, deps.vision_manager is not None)
+ 
+     if deps.camera_worker is None:
+         logger.warning("Camera worker is None")
+         return {"error": "Camera not available"}
+ 
+     try:
+         frame = deps.camera_worker.get_latest_frame()
+         logger.info("Got frame from camera_worker: %s", frame is not None)
+ 
+         if frame is None:
+             # Try getting frame directly from robot as fallback
+             logger.info("Trying direct robot camera access...")
+             if deps.robot is not None:
+                 try:
+                     frame = deps.robot.media.get_frame()
+                     logger.info("Direct frame capture: %s", frame is not None)
+                 except Exception as e:
+                     logger.error("Direct frame capture failed: %s", e)
+ 
+         if frame is None:
+             return {"error": "No frame available from camera"}
+ 
+         logger.info("Got frame, shape=%s", frame.shape)
+ 
+         # Option 1: Use local vision processor (SmolVLM2) if available
+         if deps.vision_manager is not None:
+             logger.info("Using local vision processor (SmolVLM2)...")
+             description = deps.vision_manager.process_now(
+                 "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences)."
+             )
+             if description and not description.startswith(("Vision", "Failed", "Error", "GPU", "No camera")):
+                 logger.info("Local vision response: %s", description[:100])
+                 return {
+                     "status": "success",
+                     "description": description,
+                     "source": "local_vision"
+                 }
+             else:
+                 logger.warning("Local vision failed: %s", description)
+ 
+         # Option 2: Fall back to OpenClaw for vision analysis
+         if deps.openclaw_bridge is not None and deps.openclaw_bridge.is_connected:
+             logger.info("Using OpenClaw for vision analysis...")
+             import cv2
+             _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
+             b64_image = base64.b64encode(buffer).decode('utf-8')
+ 
+             response = await deps.openclaw_bridge.chat(
+                 "Describe what you see in this image. Be specific about people, objects, and the environment. Keep it concise (2-3 sentences).",
+                 image_b64=b64_image,
+                 system_context="You are looking through your robot camera. Describe what you see naturally, as if you're the one looking.",
+             )
+             if response.content and not response.error:
+                 logger.info("OpenClaw vision response: %s", response.content[:100])
+                 return {
+                     "status": "success",
+                     "description": response.content,
+                     "source": "openclaw"
+                 }
+             else:
+                 logger.warning("OpenClaw vision failed: %s", response.error)
+ 
+         # Fallback if neither is available
+         return {
+             "status": "partial",
+             "description": "I captured an image but couldn't analyze it. No vision processing available."
+         }
+     except Exception as e:
+         logger.error("Camera tool error: %s", e, exc_info=True)
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_face_tracking(args: dict, deps: ToolDependencies) -> dict:
+     """Handle face tracking toggle."""
+     enabled = args.get("enabled", False)
+ 
+     if deps.camera_worker is None:
+         return {"error": "Camera not available for face tracking"}
+ 
+     try:
+         # Check if head tracker is available
+         if deps.camera_worker.head_tracker is None:
+             return {"error": "Face tracking not available - no head tracker initialized"}
+ 
+         deps.camera_worker.set_head_tracking_enabled(enabled)
+         return {"status": "success", "face_tracking": enabled}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_dance(args: dict, deps: ToolDependencies) -> dict:
+     """Handle dance tool."""
+     dance_name = args.get("dance_name", "happy")
+ 
+     try:
+         # Try to use dance library if available
+         from reachy_mini_dances_library import dances
+ 
+         if hasattr(dances, dance_name):
+             dance_class = getattr(dances, dance_name)
+             dance_move = dance_class()
+             deps.movement_manager.queue_move(dance_move)
+             return {"status": "success", "dance": dance_name}
+         else:
+             # Fallback to simple head movement
+             return await _handle_emotion({"emotion_name": dance_name}, deps)
+     except ImportError:
+         # No dance library, use emotion as fallback
+         return await _handle_emotion({"emotion_name": dance_name}, deps)
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_emotion(args: dict, deps: ToolDependencies) -> dict:
+     """Handle emotion expression."""
+     from reachy_mini_openclaw.moves import HeadLookMove
+ 
+     emotion_name = args.get("emotion_name", "happy")
+ 
+     # Map emotions to simple head movements
+     emotion_sequences = {
+         "happy": ["up", "front"],
+         "sad": ["down"],
+         "surprised": ["up", "front"],
+         "curious": ["right", "left", "front"],
+         "thinking": ["up", "left"],
+         "confused": ["left", "right", "front"],
+         "excited": ["up", "down", "up", "front"],
+     }
+ 
+     sequence = emotion_sequences.get(emotion_name, ["front"])
+ 
+     try:
+         for direction in sequence:
+             _, current_ant = deps.robot.get_current_joint_positions()
+             current_head = deps.robot.get_current_head_pose()
+ 
+             move = HeadLookMove(
+                 direction=direction,
+                 start_pose=current_head,
+                 start_antennas=tuple(current_ant),
+                 duration=0.5,
+             )
+             deps.movement_manager.queue_move(move)
+ 
+         return {"status": "success", "emotion": emotion_name}
+     except Exception as e:
+         return {"error": str(e)}
+ 
+ 
+ async def _handle_stop_moves(args: dict, deps: ToolDependencies) -> dict:
+     """Stop all movements."""
+     deps.movement_manager.clear_move_queue()
+     return {"status": "success", "message": "All movements stopped"}
+ 
+ 
+ async def _handle_idle(args: dict, deps: ToolDependencies) -> dict:
+     """Do nothing - explicitly stay idle."""
+     return {"status": "success", "message": "Staying idle"}
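The action-tag list that OpenClaw receives is derived mechanically from `TOOL_SPECS`: enum parameters expand into one `[TAG:value]` entry per value, and parameter-free tools become a bare `[TAG]`. A self-contained sketch of that derivation with a toy two-tool spec (the real module covers all seven tools):

```python
# Toy specs matching the shape used in core_tools.py.
SPECS = [
    {"name": "look",
     "description": "Move the head.",
     "parameters": {"type": "object",
                    "properties": {"direction": {"type": "string",
                                                 "enum": ["left", "right", "front"]}},
                    "required": ["direction"]}},
    {"name": "stop_moves",
     "description": "Stop all current movements.",
     "parameters": {"type": "object", "properties": {}, "required": []}},
]
TOOL_TO_TAG = {"look": ("LOOK", "direction"), "stop_moves": ("STOP", None)}


def body_actions(specs, mapping):
    """Expand tool specs into inline action tags, one line per tool."""
    by_name = {s["name"]: s for s in specs}
    lines = []
    for tool, (tag, key) in mapping.items():
        spec = by_name[tool]
        props = spec["parameters"].get("properties", {})
        if key and key in props:
            # Enum parameter: one tag per allowed value.
            lines.append(" ".join(f"[{tag}:{v}]" for v in props[key]["enum"]))
        else:
            # Parameter-free tool: bare tag plus its description.
            lines.append(f"[{tag}] — {spec['description']}")
    return "\n".join(lines)


print(body_actions(SPECS, TOOL_TO_TAG))
# [LOOK:left] [LOOK:right] [LOOK:front]
# [STOP] — Stop all current movements.
```

Because the description is rebuilt from the specs on every call, adding a tool or enum value updates the agent-facing action list with no second edit.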
src/reachy_mini_openclaw/vision/__init__.py ADDED
@@ -0,0 +1,18 @@
+ """Vision modules for face tracking, detection, and image understanding."""
+ 
+ from reachy_mini_openclaw.vision.head_tracker import get_head_tracker
+ 
+ __all__ = [
+     "get_head_tracker",
+ ]
+ 
+ # Lazy imports for optional heavy dependencies
+ def get_vision_processor():
+     """Get the VisionProcessor class (requires torch, transformers)."""
+     from reachy_mini_openclaw.vision.processors import VisionProcessor
+     return VisionProcessor
+ 
+ def get_vision_manager():
+     """Get the VisionManager class (requires torch, transformers)."""
+     from reachy_mini_openclaw.vision.processors import VisionManager
+     return VisionManager
src/reachy_mini_openclaw/vision/head_tracker.py ADDED
@@ -0,0 +1,70 @@
+ """Head tracker factory for selecting the best available tracker."""
+ 
+ import logging
+ from typing import Any, Optional
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ def get_head_tracker(tracker_type: Optional[str] = None) -> Optional[Any]:
+     """Get a head tracker instance based on availability and preference.
+ 
+     Args:
+         tracker_type: One of 'yolo', 'mediapipe', or None for auto-detect
+ 
+     Returns:
+         Head tracker instance or None if no tracker available
+     """
+     if tracker_type == "yolo":
+         return _try_yolo_tracker()
+     elif tracker_type == "mediapipe":
+         return _try_mediapipe_tracker()
+     elif tracker_type is None:
+         # Auto-detect: try MediaPipe first (lighter), then YOLO
+         tracker = _try_mediapipe_tracker()
+         if tracker is not None:
+             return tracker
+         return _try_yolo_tracker()
+     else:
+         logger.warning(f"Unknown tracker type: {tracker_type}")
+         return None
+ 
+ 
+ def _try_yolo_tracker() -> Optional[Any]:
+     """Try to create a YOLO head tracker."""
+     try:
+         from reachy_mini_openclaw.vision.yolo_head_tracker import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using YOLO head tracker")
+         return tracker
+     except ImportError as e:
+         logger.debug(f"YOLO tracker not available: {e}")
+         return None
+     except Exception as e:
+         logger.warning(f"Failed to initialize YOLO tracker: {e}")
+         return None
+ 
+ 
+ def _try_mediapipe_tracker() -> Optional[Any]:
+     """Try to create a MediaPipe head tracker."""
+     try:
+         # First try the toolbox version
+         from reachy_mini_toolbox.vision import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using MediaPipe head tracker (from toolbox)")
+         return tracker
+     except ImportError:
+         pass
+ 
+     try:
+         # Fall back to our own MediaPipe implementation
+         from reachy_mini_openclaw.vision.mediapipe_tracker import HeadTracker
+         tracker = HeadTracker()
+         logger.info("Using MediaPipe head tracker")
+         return tracker
+     except ImportError as e:
+         logger.debug(f"MediaPipe tracker not available: {e}")
+         return None
+     except Exception as e:
+         logger.warning(f"Failed to initialize MediaPipe tracker: {e}")
+         return None
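The factory above is an instance of a generic try-each-candidate pattern: attempt each constructor in preference order, treat `ImportError` as "not installed" and any other exception as "failed to initialize", and fall through to the next candidate. A standalone sketch with stand-in factories (names are illustrative only):

```python
import logging

logger = logging.getLogger(__name__)


def first_available(*factories):
    """Return the result of the first factory that constructs successfully,
    mirroring get_head_tracker's MediaPipe-then-YOLO fallback chain."""
    for make in factories:
        try:
            return make()
        except ImportError as e:
            logger.debug("tracker not available: %s", e)
        except Exception as e:
            logger.warning("tracker failed to initialize: %s", e)
    return None


def mediapipe_stub():
    # Stand-in for an optional dependency that is not installed.
    raise ImportError("mediapipe not installed")


def yolo_stub():
    # Stand-in for a tracker that constructs fine.
    return "yolo-tracker"


print(first_available(mediapipe_stub, yolo_stub))  # yolo-tracker
```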
src/reachy_mini_openclaw/vision/mediapipe_tracker.py ADDED
@@ -0,0 +1,112 @@
+ """MediaPipe-based head tracker for face detection.
+ 
+ Uses MediaPipe Face Detection for lightweight face tracking.
+ Falls back to this if YOLO is not available.
+ """
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from typing import Tuple, Optional
+ 
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     import mediapipe as mp
+ except ImportError as e:
+     raise ImportError(
+         "To use MediaPipe head tracker, install: pip install mediapipe"
+     ) from e
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ class HeadTracker:
+     """Lightweight head tracker using MediaPipe for face detection."""
+ 
+     def __init__(
+         self,
+         min_detection_confidence: float = 0.5,
+         model_selection: int = 0,
+     ) -> None:
+         """Initialize MediaPipe-based head tracker.
+ 
+         Args:
+             min_detection_confidence: Minimum confidence for face detection
+             model_selection: 0 for short-range (2m), 1 for long-range (5m)
+         """
+         self.min_detection_confidence = min_detection_confidence
+ 
+         # Initialize MediaPipe Face Detection
+         self.mp_face_detection = mp.solutions.face_detection
+         self.face_detection = self.mp_face_detection.FaceDetection(
+             min_detection_confidence=min_detection_confidence,
+             model_selection=model_selection,
+         )
+         logger.info("MediaPipe face detection initialized")
+ 
+     def get_head_position(
+         self, img: NDArray[np.uint8]
+     ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
+         """Get head position from face detection.
+ 
+         Args:
+             img: Input image (BGR format)
+ 
+         Returns:
+             Tuple of (eye_center in [-1,1] coords, roll_angle in radians)
+         """
+         h, w = img.shape[:2]
+ 
+         try:
+             # Convert BGR to RGB for MediaPipe
+             rgb_img = img[:, :, ::-1]
+ 
+             # Run face detection
+             results = self.face_detection.process(rgb_img)
+ 
+             if not results.detections:
+                 return None, None
+ 
+             # Get the first (most confident) detection
+             detection = results.detections[0]
+ 
+             # Get bounding box
+             bbox = detection.location_data.relative_bounding_box
+ 
+             # Calculate center of face
+             center_x = bbox.xmin + bbox.width / 2
+             center_y = bbox.ymin + bbox.height / 2
+ 
+             # Convert to [-1, 1] range
+             norm_x = center_x * 2.0 - 1.0
+             norm_y = center_y * 2.0 - 1.0
+ 
+             face_center = np.array([norm_x, norm_y], dtype=np.float32)
+ 
+             # Estimate roll from key points if available
+             roll = 0.0
+             keypoints = detection.location_data.relative_keypoints
+             if len(keypoints) >= 2:
+                 # Use left and right eye positions to estimate roll
+                 left_eye = keypoints[0]   # LEFT_EYE
+                 right_eye = keypoints[1]  # RIGHT_EYE
+ 
+                 dx = right_eye.x - left_eye.x
+                 dy = right_eye.y - left_eye.y
+                 roll = np.arctan2(dy, dx)
+ 
+             logger.debug(f"Face detected at ({norm_x:.2f}, {norm_y:.2f}), roll: {np.degrees(roll):.1f}°")
+ 
+             return face_center, roll
+ 
+         except Exception as e:
+             logger.error(f"Error in head position detection: {e}")
+             return None, None
+ 
+     def __del__(self):
+         """Clean up MediaPipe resources."""
+         if hasattr(self, 'face_detection'):
+             self.face_detection.close()
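The coordinate math in `get_head_position` is worth isolating: MediaPipe reports the bounding box and keypoints in relative `[0, 1]` units, the tracker remaps the box center to `[-1, 1]` (so `0.0` means image center), and roll comes from `atan2` over the eye-to-eye vector. A pure-math sketch of that transform (the helper name is illustrative, and no camera or MediaPipe install is needed):

```python
import math


def normalize_face(bbox, left_eye, right_eye):
    """Map a relative bbox center to [-1, 1] image coordinates and
    estimate roll from the eye line, as in HeadTracker.get_head_position.

    bbox is (xmin, ymin, width, height) in relative [0, 1] units;
    each eye is an (x, y) relative keypoint.
    """
    xmin, ymin, w, h = bbox
    # Box center in relative units, then remapped so 0.5 -> 0.0.
    cx, cy = xmin + w / 2, ymin + h / 2
    center = (cx * 2.0 - 1.0, cy * 2.0 - 1.0)
    # Roll from the eye-to-eye vector: level eyes give 0 radians.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return center, math.atan2(dy, dx)


# A face centered in frame with level eyes: center (0, 0), zero roll.
center, roll = normalize_face((0.25, 0.25, 0.5, 0.5), (0.375, 0.5), (0.625, 0.5))
print(center, roll)  # (0.0, 0.0) 0.0
```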
src/reachy_mini_openclaw/vision/processors.py ADDED
@@ -0,0 +1,419 @@
+ """Local vision processing with SmolVLM2.
+ 
+ Provides on-device image understanding using the SmolVLM2 model
+ for scene description and visual analysis.
+ 
+ Based on pollen-robotics/reachy_mini_conversation_app vision processors.
+ """
+ 
+ import os
+ import time
+ import base64
+ import logging
+ import threading
+ from typing import Any, Dict, Optional
+ from dataclasses import dataclass, field
+ 
+ import cv2
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     import torch
+     from transformers import AutoProcessor, AutoModelForImageTextToText
+     from huggingface_hub import snapshot_download
+     VISION_AVAILABLE = True
+ except ImportError:
+     VISION_AVAILABLE = False
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class VisionConfig:
+     """Configuration for vision processing."""
+ 
+     model_path: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
+     vision_interval: float = 5.0
+     max_new_tokens: int = 64
+     jpeg_quality: int = 85
+     max_retries: int = 3
+     retry_delay: float = 1.0
+     device_preference: str = "auto"  # "auto", "cuda", "mps", "cpu"
+     hf_home: str = field(default_factory=lambda: os.path.expanduser("~/.cache/huggingface"))
+ 
+ 
+ class VisionProcessor:
+     """Handles SmolVLM2 model loading and inference for local vision."""
+ 
+     def __init__(self, vision_config: Optional[VisionConfig] = None):
+         """Initialize the vision processor.
+ 
+         Args:
+             vision_config: Vision configuration settings
+         """
+         if not VISION_AVAILABLE:
+             raise ImportError(
+                 "Vision processing requires: pip install torch transformers huggingface-hub"
+             )
+ 
+         self.vision_config = vision_config or VisionConfig()
+         self.model_path = self.vision_config.model_path
+         self.device = self._determine_device()
+         self.processor = None
+         self.model = None
+         self._initialized = False
+ 
+     def _determine_device(self) -> str:
+         """Determine the best device for inference."""
+         pref = self.vision_config.device_preference
+ 
+         if pref == "cpu":
+             return "cpu"
+         if pref == "cuda":
+             return "cuda" if torch.cuda.is_available() else "cpu"
+         if pref == "mps":
+             return "mps" if torch.backends.mps.is_available() else "cpu"
+ 
+         # auto: prefer mps on Apple, then cuda, else cpu
+         if torch.backends.mps.is_available():
+             return "mps"
+         return "cuda" if torch.cuda.is_available() else "cpu"
+ 
+     def initialize(self) -> bool:
+         """Load model and processor onto the selected device.
+ 
+         Returns:
+             True if initialization successful, False otherwise
+         """
+         try:
+             cache_dir = self.vision_config.hf_home
+             os.makedirs(cache_dir, exist_ok=True)
+             os.environ["HF_HOME"] = cache_dir
+ 
+             logger.info(f"Loading SmolVLM2 model on {self.device} (HF_HOME={cache_dir})")
+ 
+             # Download model to cache first
+             logger.info(f"Downloading vision model {self.model_path}...")
+             snapshot_download(
+                 repo_id=self.model_path,
+                 repo_type="model",
+                 cache_dir=cache_dir,
+             )
+ 
+             self.processor = AutoProcessor.from_pretrained(self.model_path)
+ 
+             # Select dtype depending on device
+             if self.device == "cuda":
+                 dtype = torch.bfloat16
+             elif self.device == "mps":
+                 dtype = torch.float32  # best for MPS
+ import threading
14
+ from typing import Any, Dict, Optional
15
+ from dataclasses import dataclass, field
16
+
17
+ import cv2
18
+ import numpy as np
19
+ from numpy.typing import NDArray
20
+
21
+ try:
22
+ import torch
23
+ from transformers import AutoProcessor, AutoModelForImageTextToText
24
+ from huggingface_hub import snapshot_download
25
+ VISION_AVAILABLE = True
26
+ except ImportError:
27
+ VISION_AVAILABLE = False
28
+
29
+ logger = logging.getLogger(__name__)
30
+
31
+
32
+ @dataclass
33
+ class VisionConfig:
34
+ """Configuration for vision processing."""
35
+
36
+ model_path: str = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
37
+ vision_interval: float = 5.0
38
+ max_new_tokens: int = 64
39
+ jpeg_quality: int = 85
40
+ max_retries: int = 3
41
+ retry_delay: float = 1.0
42
+ device_preference: str = "auto" # "auto", "cuda", "mps", "cpu"
43
+ hf_home: str = field(default_factory=lambda: os.path.expanduser("~/.cache/huggingface"))
44
+
45
+
46
+ class VisionProcessor:
47
+ """Handles SmolVLM2 model loading and inference for local vision."""
48
+
49
+ def __init__(self, vision_config: Optional[VisionConfig] = None):
50
+ """Initialize the vision processor.
51
+
52
+ Args:
53
+ vision_config: Vision configuration settings
54
+ """
55
+ if not VISION_AVAILABLE:
56
+ raise ImportError(
57
+ "Vision processing requires: pip install torch transformers huggingface-hub"
58
+ )
59
+
60
+ self.vision_config = vision_config or VisionConfig()
61
+ self.model_path = self.vision_config.model_path
62
+ self.device = self._determine_device()
63
+ self.processor = None
64
+ self.model = None
65
+ self._initialized = False
66
+
67
+ def _determine_device(self) -> str:
68
+ """Determine the best device for inference."""
69
+ pref = self.vision_config.device_preference
70
+
71
+ if pref == "cpu":
72
+ return "cpu"
73
+ if pref == "cuda":
74
+ return "cuda" if torch.cuda.is_available() else "cpu"
75
+ if pref == "mps":
76
+ return "mps" if torch.backends.mps.is_available() else "cpu"
77
+
78
+ # auto: prefer mps on Apple, then cuda, else cpu
79
+ if torch.backends.mps.is_available():
80
+ return "mps"
81
+ return "cuda" if torch.cuda.is_available() else "cpu"
82
+
83
+ def initialize(self) -> bool:
84
+ """Load model and processor onto the selected device.
85
+
86
+ Returns:
87
+ True if initialization successful, False otherwise
88
+ """
89
+ try:
90
+ cache_dir = self.vision_config.hf_home
91
+ os.makedirs(cache_dir, exist_ok=True)
92
+ os.environ["HF_HOME"] = cache_dir
93
+
94
+ logger.info(f"Loading SmolVLM2 model on {self.device} (HF_HOME={cache_dir})")
95
+
96
+ # Download model to cache first
97
+ logger.info(f"Downloading vision model {self.model_path}...")
98
+ snapshot_download(
99
+ repo_id=self.model_path,
100
+ repo_type="model",
101
+ cache_dir=cache_dir,
102
+ )
103
+
104
+ self.processor = AutoProcessor.from_pretrained(self.model_path)
105
+
106
+ # Select dtype depending on device
107
+ if self.device == "cuda":
108
+ dtype = torch.bfloat16
109
+ elif self.device == "mps":
110
+ dtype = torch.float32 # best for MPS
111
+ else:
112
+ dtype = torch.float32
113
+
114
+ model_kwargs: Dict[str, Any] = {"torch_dtype": dtype}
115
+
116
+ # flash_attention_2 is CUDA-only; skip on MPS/CPU
117
+ if self.device == "cuda":
118
+ model_kwargs["_attn_implementation"] = "flash_attention_2"
119
+
120
+ # Load model weights
121
+ self.model = AutoModelForImageTextToText.from_pretrained(
122
+ self.model_path, **model_kwargs
123
+ ).to(self.device)
124
+
125
+ if self.model is not None:
126
+ self.model.eval()
127
+ self._initialized = True
128
+ logger.info(f"Vision model loaded successfully on {self.device}")
129
+ return True
130
+
131
+ except Exception as e:
132
+ logger.error(f"Failed to initialize vision model: {e}")
133
+ return False
134
+
135
+ return False
136
+
137
+ def process_image(
138
+ self,
139
+ cv2_image: NDArray[np.uint8],
140
+ prompt: str = "Briefly describe what you see in one sentence.",
141
+ ) -> str:
142
+ """Process CV2 image and return description with retry logic.
143
+
144
+ Args:
145
+ cv2_image: OpenCV image (BGR format)
146
+ prompt: Question/prompt to ask about the image
147
+
148
+ Returns:
149
+ Text description of the image
150
+ """
151
+ if not self._initialized or self.processor is None or self.model is None:
152
+ return "Vision model not initialized"
153
+
154
+ for attempt in range(self.vision_config.max_retries):
155
+ try:
156
+ # Convert to JPEG bytes
157
+ success, jpeg_buffer = cv2.imencode(
158
+ ".jpg",
159
+ cv2_image,
160
+ [cv2.IMWRITE_JPEG_QUALITY, self.vision_config.jpeg_quality],
161
+ )
162
+ if not success:
163
+ return "Failed to encode image"
164
+
165
+ # Convert to base64
166
+ image_base64 = base64.b64encode(jpeg_buffer.tobytes()).decode("utf-8")
167
+
168
+ messages = [
169
+ {
170
+ "role": "user",
171
+ "content": [
172
+ {
173
+ "type": "image",
174
+ "url": f"data:image/jpeg;base64,{image_base64}",
175
+ },
176
+ {"type": "text", "text": prompt},
177
+ ],
178
+ },
179
+ ]
180
+
181
+ inputs = self.processor.apply_chat_template(
182
+ messages,
183
+ add_generation_prompt=True,
184
+ tokenize=True,
185
+ return_dict=True,
186
+ return_tensors="pt",
187
+ )
188
+
189
+ # Move tensors to device WITHOUT forcing dtype (keeps input_ids as torch.long)
190
+ inputs = {
191
+ k: (v.to(self.device) if hasattr(v, "to") else v)
192
+ for k, v in inputs.items()
193
+ }
194
+
195
+ with torch.no_grad():
196
+ generated_ids = self.model.generate(
197
+ **inputs,
198
+ do_sample=False,
199
+ max_new_tokens=self.vision_config.max_new_tokens,
200
+ pad_token_id=self.processor.tokenizer.eos_token_id,
201
+ )
202
+
203
+ generated_texts = self.processor.batch_decode(
204
+ generated_ids,
205
+ skip_special_tokens=True,
206
+ )
207
+
208
+ # Extract just the response part
209
+ full_text = generated_texts[0]
210
+ response = self._extract_response(full_text)
211
+
212
+ # Clean up GPU memory if using CUDA
213
+ if self.device == "cuda":
214
+ torch.cuda.empty_cache()
215
+ elif self.device == "mps":
216
+ torch.mps.empty_cache()
217
+
218
+ return response.replace(chr(10), " ").strip()
219
+
220
+ except Exception as e:
221
+ if "OutOfMemory" in str(type(e).__name__):
222
+ logger.error(f"GPU OOM on attempt {attempt + 1}: {e}")
223
+ if self.device == "cuda":
224
+ torch.cuda.empty_cache()
225
+ if attempt < self.vision_config.max_retries - 1:
226
+ time.sleep(self.vision_config.retry_delay * (attempt + 1))
227
+ else:
228
+ return "GPU out of memory - vision processing failed"
229
+ else:
230
+ logger.error(f"Vision processing failed (attempt {attempt + 1}): {e}")
231
+ if attempt < self.vision_config.max_retries - 1:
232
+ time.sleep(self.vision_config.retry_delay)
233
+ else:
234
+ return f"Vision processing error after {self.vision_config.max_retries} attempts"
235
+
236
+ return "Vision processing failed"
237
+
238
+ def _extract_response(self, full_text: str) -> str:
239
+ """Extract the assistant's response from the full generated text."""
240
+ # Handle different response formats
241
+ markers = ["assistant\n", "Assistant:", "Response:", "\n\n"]
242
+
243
+ for marker in markers:
244
+ if marker in full_text:
245
+ response = full_text.split(marker)[-1].strip()
246
+ if response: # Ensure we got a meaningful response
247
+ return response
248
+
249
+ # Fallback: return the full text cleaned up
250
+ return full_text.strip()
251
+
252
+ def get_model_info(self) -> Dict[str, Any]:
253
+ """Get information about the loaded model."""
254
+ info = {
255
+ "initialized": self._initialized,
256
+ "device": self.device,
257
+ "model_path": self.model_path,
258
+ "cuda_available": torch.cuda.is_available() if VISION_AVAILABLE else False,
259
+ }
260
+
261
+ if VISION_AVAILABLE and torch.cuda.is_available():
262
+ info["gpu_memory_gb"] = torch.cuda.get_device_properties(0).total_memory // (1024**3)
263
+ else:
264
+ info["gpu_memory_gb"] = "N/A"
265
+
266
+ return info
267
+
268
+
269
+ class VisionManager:
270
+ """Manages periodic vision processing and scene understanding.
271
+
272
+ This runs in the background, periodically capturing frames and
273
+ generating scene descriptions that can be queried.
274
+ """
275
+
276
+ def __init__(
277
+ self,
278
+ camera_worker: Any,
279
+ vision_config: Optional[VisionConfig] = None,
280
+ ):
281
+ """Initialize vision manager.
282
+
283
+ Args:
284
+ camera_worker: CameraWorker instance for frame capture
285
+ vision_config: Vision configuration settings
286
+ """
287
+ self.camera_worker = camera_worker
288
+ self.vision_config = vision_config or VisionConfig()
289
+ self.vision_interval = self.vision_config.vision_interval
290
+ self.processor = VisionProcessor(self.vision_config)
291
+
292
+ self._last_processed_time = 0.0
293
+ self._last_description = ""
294
+ self._description_lock = threading.Lock()
295
+ self._stop_event = threading.Event()
296
+ self._thread: Optional[threading.Thread] = None
297
+
298
+ # Initialize processor
299
+ if not self.processor.initialize():
300
+ logger.error("Failed to initialize vision processor")
301
+ raise RuntimeError("Vision processor initialization failed")
302
+
303
+ def start(self) -> None:
304
+ """Start the vision processing loop in a background thread."""
305
+ self._stop_event.clear()
306
+ self._thread = threading.Thread(target=self._working_loop, daemon=True)
307
+ self._thread.start()
308
+ logger.info("Local vision processing started")
309
+
310
+ def stop(self) -> None:
311
+ """Stop the vision processing loop."""
312
+ self._stop_event.set()
313
+ if self._thread is not None:
314
+ self._thread.join(timeout=5.0)
315
+ logger.info("Local vision processing stopped")
316
+
317
+ def get_latest_description(self) -> str:
318
+ """Get the most recent scene description.
319
+
320
+ Returns:
321
+ Latest scene description or empty string if none available
322
+ """
323
+ with self._description_lock:
324
+ return self._last_description
325
+
326
+ def process_now(self, prompt: str = "Briefly describe what you see in one sentence.") -> str:
327
+ """Process the current frame immediately with a custom prompt.
328
+
329
+ Args:
330
+ prompt: Question/prompt to ask about the image
331
+
332
+ Returns:
333
+ Description of what the camera sees
334
+ """
335
+ frame = self.camera_worker.get_latest_frame()
336
+ if frame is None:
337
+ return "No camera frame available"
338
+
339
+ return self.processor.process_image(frame, prompt)
340
+
341
+ def _working_loop(self) -> None:
342
+ """Vision processing loop (runs in separate thread)."""
343
+ while not self._stop_event.is_set():
344
+ try:
345
+ current_time = time.time()
346
+
347
+ if current_time - self._last_processed_time >= self.vision_interval:
348
+ frame = self.camera_worker.get_latest_frame()
349
+ if frame is not None:
350
+ description = self.processor.process_image(
351
+ frame,
352
+ "Briefly describe what you see in one sentence.",
353
+ )
354
+
355
+ # Only update if we got a valid response
356
+ if description and not description.startswith(
357
+ ("Vision", "Failed", "Error", "GPU")
358
+ ):
359
+ with self._description_lock:
360
+ self._last_description = description
361
+ self._last_processed_time = current_time
362
+ logger.debug(f"Vision update: {description}")
363
+ else:
364
+ logger.warning(f"Invalid vision response: {description}")
365
+
366
+ time.sleep(1.0) # Check every second
367
+
368
+ except Exception:
369
+ logger.exception("Vision processing loop error")
370
+ time.sleep(5.0) # Longer sleep on error
371
+
372
+ logger.info("Vision loop finished")
373
+
374
+ def get_status(self) -> Dict[str, Any]:
375
+ """Get comprehensive status information."""
376
+ return {
377
+ "last_processed": self._last_processed_time,
378
+ "last_description": self.get_latest_description(),
379
+ "processor_info": self.processor.get_model_info(),
380
+ "config": {
381
+ "interval": self.vision_interval,
382
+ },
383
+ }
384
+
385
+
386
+ def initialize_vision_manager(
387
+ camera_worker: Any,
388
+ config: Optional[VisionConfig] = None,
389
+ ) -> Optional[VisionManager]:
390
+ """Initialize vision manager with model download and configuration.
391
+
392
+ Args:
393
+ camera_worker: CameraWorker instance for frame capture
394
+ config: Optional vision configuration
395
+
396
+ Returns:
397
+ VisionManager instance or None if initialization fails
398
+ """
399
+ if not VISION_AVAILABLE:
400
+ logger.warning("Vision dependencies not available. Install: pip install torch transformers")
401
+ return None
402
+
403
+ try:
404
+ vision_config = config or VisionConfig()
405
+
406
+ # Initialize vision manager
407
+ vision_manager = VisionManager(camera_worker, vision_config)
408
+
409
+ # Log device info
410
+ device_info = vision_manager.processor.get_model_info()
411
+ logger.info(
412
+ f"Local vision enabled: {device_info.get('model_path')} on {device_info.get('device')}"
413
+ )
414
+
415
+ return vision_manager
416
+
417
+ except Exception as e:
418
+ logger.error(f"Failed to initialize vision manager: {e}")
419
+ return None
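The VisionManager above follows a poll-and-cache pattern: a background thread refreshes a scene description on an interval, and readers take the lock only long enough to copy the cached string. A minimal standalone sketch of that pattern (the `DescriptionCache` name and the lambda stand-in for the SmolVLM2 call are illustrative, not part of this module):

```python
import threading
import time
from typing import Callable, Optional

class DescriptionCache:
    """Thread-safe cache refreshed by a background loop, read by callers."""

    def __init__(self, describe: Callable[[], str], interval: float = 0.05) -> None:
        self._describe = describe  # callable returning a scene description
        self._interval = interval
        self._lock = threading.Lock()
        self._latest = ""
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)

    def latest(self) -> str:
        # Readers hold the lock only long enough to copy the string
        with self._lock:
            return self._latest

    def _loop(self) -> None:
        while not self._stop.is_set():
            description = self._describe()
            with self._lock:
                self._latest = description
            self._stop.wait(self._interval)  # interruptible sleep, unlike time.sleep

# Stand-in for the real process_image() call on a camera frame
cache = DescriptionCache(lambda: "a person waving at the camera")
cache.start()
time.sleep(0.2)
seen = cache.latest()
cache.stop()
```

Using `Event.wait()` for the sleep (rather than `time.sleep`) lets `stop()` interrupt the loop immediately instead of waiting out the interval.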
src/reachy_mini_openclaw/vision/yolo_head_tracker.py ADDED
@@ -0,0 +1,152 @@
+ """YOLO-based head tracker for face detection.
+ 
+ Uses YOLOv11 for fast, accurate face detection.
+ """
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from typing import Tuple, Optional
+ 
+ import numpy as np
+ from numpy.typing import NDArray
+ 
+ try:
+     from supervision import Detections
+     from ultralytics import YOLO
+ except ImportError as e:
+     raise ImportError(
+         "To use YOLO head tracker, install: pip install ultralytics supervision"
+     ) from e
+ 
+ from huggingface_hub import hf_hub_download
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ class HeadTracker:
+     """Lightweight head tracker using YOLO for face detection."""
+ 
+     def __init__(
+         self,
+         model_repo: str = "AdamCodd/YOLOv11n-face-detection",
+         model_filename: str = "model.pt",
+         confidence_threshold: float = 0.3,
+         device: str = "cpu",
+     ) -> None:
+         """Initialize YOLO-based head tracker.
+ 
+         Args:
+             model_repo: HuggingFace model repository
+             model_filename: Model file name
+             confidence_threshold: Minimum confidence for face detection
+             device: Device to run inference on ('cpu' or 'cuda')
+         """
+         self.confidence_threshold = confidence_threshold
+ 
+         try:
+             # Download and load YOLO model
+             model_path = hf_hub_download(repo_id=model_repo, filename=model_filename)
+             self.model = YOLO(model_path).to(device)
+             logger.info(f"YOLO face detection model loaded from {model_repo}")
+         except Exception as e:
+             logger.error(f"Failed to load YOLO model: {e}")
+             raise
+ 
+     def _select_best_face(self, detections: Detections) -> Optional[int]:
+         """Select the best face based on confidence and area.
+ 
+         Args:
+             detections: Supervision detections object
+ 
+         Returns:
+             Index of best face or None if no valid faces
+         """
+         if detections.xyxy.shape[0] == 0:
+             return None
+ 
+         if detections.confidence is None:
+             return None
+ 
+         # Filter by confidence threshold
+         valid_mask = detections.confidence >= self.confidence_threshold
+         if not np.any(valid_mask):
+             return None
+ 
+         valid_indices = np.where(valid_mask)[0]
+ 
+         # Calculate areas for valid detections
+         boxes = detections.xyxy[valid_indices]
+         areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
+ 
+         # Combine confidence and area (weighted towards larger faces)
+         confidences = detections.confidence[valid_indices]
+         scores = confidences * 0.7 + (areas / np.max(areas)) * 0.3
+ 
+         # Return index of best face
+         best_idx = valid_indices[np.argmax(scores)]
+         return int(best_idx)
+ 
+     def _bbox_to_normalized_coords(
+         self, bbox: NDArray[np.float32], w: int, h: int
+     ) -> NDArray[np.float32]:
+         """Convert bounding box center to normalized coordinates [-1, 1].
+ 
+         Args:
+             bbox: Bounding box [x1, y1, x2, y2]
+             w: Image width
+             h: Image height
+ 
+         Returns:
+             Center point in [-1, 1] coordinates
+         """
+         center_x = (bbox[0] + bbox[2]) / 2.0
+         center_y = (bbox[1] + bbox[3]) / 2.0
+ 
+         # Normalize to [0, 1] then to [-1, 1]
+         norm_x = (center_x / w) * 2.0 - 1.0
+         norm_y = (center_y / h) * 2.0 - 1.0
+ 
+         return np.array([norm_x, norm_y], dtype=np.float32)
+ 
+     def get_head_position(
+         self, img: NDArray[np.uint8]
+     ) -> Tuple[Optional[NDArray[np.float32]], Optional[float]]:
+         """Get head position from face detection.
+ 
+         Args:
+             img: Input image (BGR format)
+ 
+         Returns:
+             Tuple of (eye_center in [-1,1] coords, roll_angle in radians)
+         """
+         h, w = img.shape[:2]
+ 
+         try:
+             # Run YOLO inference
+             results = self.model(img, verbose=False)
+             detections = Detections.from_ultralytics(results[0])
+ 
+             # Select best face
+             face_idx = self._select_best_face(detections)
+             if face_idx is None:
+                 return None, None
+ 
+             bbox = detections.xyxy[face_idx]
+ 
+             if detections.confidence is not None:
+                 confidence = detections.confidence[face_idx]
+                 logger.debug(f"Face detected with confidence: {confidence:.2f}")
+ 
+             # Get face center in [-1, 1] coordinates
+             face_center = self._bbox_to_normalized_coords(bbox, w, h)
+ 
+             # Roll is 0 since we don't have keypoints for precise angle estimation
+             roll = 0.0
+ 
+             return face_center, roll
+ 
+         except Exception as e:
+             logger.error(f"Error in head position detection: {e}")
+             return None, None
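The selection and normalization math in HeadTracker is easy to check in isolation. A standalone sketch of the same logic, no YOLO required (the free-function names are illustrative; the weights 0.7/0.3 and the [-1, 1] mapping mirror the methods above):

```python
import numpy as np

def bbox_to_normalized(bbox, w, h):
    """Map a [x1, y1, x2, y2] box center to [-1, 1] image coordinates."""
    cx = (bbox[0] + bbox[2]) / 2.0
    cy = (bbox[1] + bbox[3]) / 2.0
    return np.array([cx / w * 2.0 - 1.0, cy / h * 2.0 - 1.0], dtype=np.float32)

def select_best_face(boxes, confidences, threshold=0.3):
    """Score faces by confidence (70%) and relative area (30%); return best index."""
    valid = np.where(confidences >= threshold)[0]
    if valid.size == 0:
        return None
    b = boxes[valid]
    areas = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    scores = confidences[valid] * 0.7 + (areas / areas.max()) * 0.3
    return int(valid[np.argmax(scores)])

# Two faces in a 640x480 frame: a small confident one and a large less-confident one
boxes = np.array([[10, 10, 50, 50], [100, 100, 300, 300]], dtype=np.float32)
conf = np.array([0.9, 0.6], dtype=np.float32)

best = select_best_face(boxes, conf)  # the larger face outweighs its lower confidence
center = bbox_to_normalized(boxes[best], 640, 480)
```

Here the area term flips the ranking: 0.9·0.7 + 0.04·0.3 ≈ 0.64 for the small face versus 0.6·0.7 + 1.0·0.3 = 0.72 for the large one, so the tracker locks onto the nearer (larger) face.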
style.css ADDED
@@ -0,0 +1,425 @@
+ :root {
+   --bg: #0d0a1a;
+   --panel: #1a1428;
+   --glass: rgba(26, 20, 40, 0.7);
+   --card: rgba(255, 255, 255, 0.04);
+   --accent: #ff6b6b;
+   --accent-2: #9b59b6;
+   --accent-3: #f39c12;
+   --text: #f0e8f8;
+   --muted: #b8a8c8;
+   --border: rgba(255, 255, 255, 0.08);
+   --shadow: 0 25px 70px rgba(0, 0, 0, 0.45);
+   font-family: "Space Grotesk", "Manrope", system-ui, -apple-system, sans-serif;
+ }
+ 
+ * {
+   margin: 0;
+   padding: 0;
+   box-sizing: border-box;
+ }
+ 
+ body {
+   background: radial-gradient(circle at 20% 20%, rgba(255, 107, 107, 0.15), transparent 30%),
+     radial-gradient(circle at 80% 0%, rgba(155, 89, 182, 0.18), transparent 32%),
+     radial-gradient(circle at 50% 70%, rgba(243, 156, 18, 0.1), transparent 30%),
+     var(--bg);
+   color: var(--text);
+   min-height: 100vh;
+   line-height: 1.6;
+   padding-bottom: 3rem;
+ }
+ 
+ a {
+   color: inherit;
+   text-decoration: none;
+ }
+ 
+ .hero {
+   padding: 3.5rem clamp(1.5rem, 3vw, 3rem) 2.5rem;
+   position: relative;
+   overflow: hidden;
+ }
+ 
+ .hero::after {
+   content: "";
+   position: absolute;
+   inset: 0;
+   background: linear-gradient(120deg, rgba(255, 107, 107, 0.12), rgba(155, 89, 182, 0.08), transparent);
+   pointer-events: none;
+ }
+ 
+ .topline {
+   display: flex;
+   align-items: center;
+   justify-content: space-between;
+   max-width: 1200px;
+   margin: 0 auto 2rem;
+   position: relative;
+   z-index: 2;
+ }
+ 
+ .brand {
+   display: flex;
+   align-items: center;
+   gap: 0.5rem;
+   font-weight: 700;
+   letter-spacing: 0.5px;
+   color: var(--text);
+ }
+ 
+ .logo {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   width: 2.4rem;
+   height: 2.4rem;
+   border-radius: 10px;
+   background: linear-gradient(145deg, rgba(255, 107, 107, 0.2), rgba(155, 89, 182, 0.2));
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.25);
+   font-size: 1.4rem;
+ }
+ 
+ .brand-name {
+   font-size: 1.2rem;
+ }
+ 
+ .pill {
+   background: rgba(255, 255, 255, 0.06);
+   border: 1px solid var(--border);
+   padding: 0.6rem 1rem;
+   border-radius: 999px;
+   color: var(--muted);
+   font-size: 0.9rem;
+   box-shadow: 0 12px 30px rgba(0, 0, 0, 0.2);
+ }
+ 
+ .hero-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
+   gap: clamp(1.5rem, 2.5vw, 2.5rem);
+   max-width: 1200px;
+   margin: 0 auto;
+   position: relative;
+   z-index: 2;
+   align-items: center;
+ }
+ 
+ .hero-copy h1 {
+   font-size: clamp(2.6rem, 4vw, 3.6rem);
+   margin-bottom: 1rem;
+   line-height: 1.1;
+   letter-spacing: -0.5px;
+ }
+ 
+ .eyebrow {
+   display: inline-flex;
+   align-items: center;
+   gap: 0.5rem;
+   text-transform: uppercase;
+   letter-spacing: 1px;
+   font-size: 0.8rem;
+   color: var(--muted);
+   margin-bottom: 0.75rem;
+ }
+ 
+ .eyebrow::before {
+   content: "";
+   display: inline-block;
+   width: 24px;
+   height: 2px;
+   background: linear-gradient(90deg, var(--accent), var(--accent-2));
+   border-radius: 999px;
+ }
+ 
+ .lede {
+   font-size: 1.1rem;
+   color: var(--muted);
+   max-width: 620px;
+ }
+ 
+ .hero-actions {
+   display: flex;
+   gap: 1rem;
+   align-items: center;
+   margin: 1.6rem 0 1.2rem;
+   flex-wrap: wrap;
+ }
+ 
+ .btn {
+   display: inline-flex;
+   align-items: center;
+   justify-content: center;
+   gap: 0.6rem;
+   padding: 0.85rem 1.4rem;
+   border-radius: 12px;
+   font-weight: 700;
+   border: 1px solid transparent;
+   cursor: pointer;
+   transition: transform 0.2s ease, box-shadow 0.2s ease, background 0.2s ease, border-color 0.2s ease;
+ }
+ 
+ .btn.primary {
+   background: linear-gradient(135deg, #ff6b6b, #9b59b6);
+   color: #fff;
+   box-shadow: 0 15px 30px rgba(255, 107, 107, 0.25);
+ }
+ 
+ .btn.primary:hover {
+   transform: translateY(-2px);
+   box-shadow: 0 25px 45px rgba(255, 107, 107, 0.35);
+ }
+ 
+ .btn.ghost {
+   background: rgba(255, 255, 255, 0.05);
+   border-color: var(--border);
+   color: var(--text);
+ }
+ 
+ .btn.ghost:hover {
+   border-color: rgba(255, 255, 255, 0.3);
+   transform: translateY(-2px);
+ }
+ 
+ .btn.wide {
+   width: 100%;
+   justify-content: center;
+ }
+ 
+ .hero-badges {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.6rem;
+   color: var(--muted);
+   font-size: 0.9rem;
+ }
+ 
+ .hero-badges span {
+   padding: 0.5rem 0.8rem;
+   border-radius: 10px;
+   border: 1px solid var(--border);
+   background: rgba(255, 255, 255, 0.04);
+ }
+ 
+ .hero-visual .glass-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.2rem;
+   box-shadow: var(--shadow);
+   backdrop-filter: blur(10px);
+ }
+ 
+ .hero-gif {
+   width: 100%;
+   max-width: 500px;
+   height: auto;
+   border-radius: 14px;
+   display: block;
+   margin: 0 auto;
+ }
+ 
+ .architecture-preview {
+   background: rgba(0, 0, 0, 0.4);
+   border-radius: 14px;
+   border: 1px solid var(--border);
+   padding: 1.5rem;
+   overflow-x: auto;
+ }
+ 
+ .architecture-preview pre {
+   font-family: "SF Mono", "Fira Code", "Consolas", monospace;
+   font-size: 0.85rem;
+   color: var(--accent);
+   white-space: pre;
+   margin: 0;
+   line-height: 1.5;
+ }
+ 
+ .caption {
+   margin-top: 0.75rem;
+   color: var(--muted);
+   font-size: 0.95rem;
+ }
+ 
+ .section {
+   max-width: 1200px;
+   margin: 0 auto;
+   padding: clamp(2rem, 4vw, 3.5rem) clamp(1.5rem, 3vw, 3rem);
+ }
+ 
+ .section-header {
+   text-align: center;
+   max-width: 780px;
+   margin: 0 auto 2rem;
+ }
+ 
+ .section-header h2 {
+   font-size: clamp(2rem, 3vw, 2.6rem);
+   margin-bottom: 0.5rem;
+ }
+ 
+ .intro {
+   color: var(--muted);
+   font-size: 1.05rem;
+ }
+ 
+ .feature-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .feature-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 16px;
+   padding: 1.25rem;
+   box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
+   transition: transform 0.2s ease, border-color 0.2s ease, box-shadow 0.2s ease;
+ }
+ 
+ .feature-card:hover {
+   transform: translateY(-4px);
+   border-color: rgba(255, 107, 107, 0.3);
+   box-shadow: 0 18px 40px rgba(0, 0, 0, 0.3);
+ }
+ 
+ .feature-card .icon {
+   width: 48px;
+   height: 48px;
+   border-radius: 12px;
+   display: grid;
+   place-items: center;
+   background: linear-gradient(145deg, rgba(255, 107, 107, 0.15), rgba(155, 89, 182, 0.15));
+   margin-bottom: 0.8rem;
+   font-size: 1.4rem;
+ }
+ 
+ .feature-card h3 {
+   margin-bottom: 0.35rem;
+ }
+ 
+ .feature-card p {
+   color: var(--muted);
+ }
+ 
+ .story {
+   padding-top: 1rem;
+ }
+ 
+ .story-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
+   gap: 1rem;
+ }
+ 
+ .story-card {
+   background: rgba(255, 255, 255, 0.03);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 1.5rem;
+   box-shadow: var(--shadow);
+ }
+ 
+ .story-card.secondary {
+   background: linear-gradient(145deg, rgba(155, 89, 182, 0.1), rgba(255, 107, 107, 0.08));
+ }
+ 
+ .story-card.highlight {
+   background: linear-gradient(145deg, rgba(46, 204, 113, 0.15), rgba(52, 152, 219, 0.1));
+   border-color: rgba(46, 204, 113, 0.3);
+ }
+ 
+ .simulator-callout {
+   padding-top: 0;
+ }
+ 
+ .simulator-callout code {
+   background: rgba(0, 0, 0, 0.3);
+   padding: 0.2rem 0.5rem;
+   border-radius: 4px;
+   font-family: "SF Mono", "Fira Code", monospace;
+   font-size: 0.85rem;
+ }
+ 
+ .story-card h3 {
+   margin-bottom: 0.8rem;
+ }
+ 
+ .story-list {
+   list-style: none;
+   display: grid;
+   gap: 0.7rem;
+   color: var(--muted);
+   font-size: 0.98rem;
+ }
+ 
+ .story-list li {
+   display: flex;
+   gap: 0.7rem;
+   align-items: flex-start;
+ }
+ 
+ .story-text {
+   color: var(--muted);
+   line-height: 1.7;
+   margin-bottom: 1rem;
+ }
+ 
+ .chips {
+   display: flex;
+   flex-wrap: wrap;
+   gap: 0.5rem;
+ }
+ 
+ .chip {
+   padding: 0.45rem 0.8rem;
+   border-radius: 12px;
+   background: rgba(0, 0, 0, 0.3);
+   border: 1px solid var(--border);
+   color: var(--text);
+   font-size: 0.9rem;
+ }
+ 
+ .footer {
+   text-align: center;
+   color: var(--muted);
+   padding: 2rem 1.5rem 0;
+   max-width: 800px;
+   margin: 0 auto;
+ }
+ 
+ .footer a {
+   color: var(--accent);
+   border-bottom: 1px solid transparent;
+ }
+ 
+ .footer a:hover {
+   border-color: var(--accent);
+ }
+ 
+ @media (max-width: 768px) {
+   .hero {
+     padding-top: 2.5rem;
+   }
+ 
+   .topline {
+     flex-direction: column;
+     gap: 0.8rem;
+     align-items: flex-start;
+   }
+ 
+   .hero-actions {
+     width: 100%;
+   }
+ 
+   .btn {
+     width: 100%;
+     justify-content: center;
+   }
+ 
+   .hero-badges {
+     gap: 0.4rem;
+   }
+ }