---
title: Moltbot Body
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: static
pinned: false
short_description: Give Moltbot a physical presence with Reachy Mini
tags:
- reachy_mini
- reachy_mini_python_app
- clawdbot
- moltbot
---
# Moltbot's Body
> **Security Warning**: This project uses Moltbot, which runs AI-generated code with access to your system. Ensure you understand the security implications before installation. Only run Moltbot from trusted sources and review its permissions carefully. See the [Moltbot Security documentation](https://docs.molt.bot/gateway/security) for details.
Reachy Mini integration with Moltbot: giving Moltbot a physical presence.
## What is Moltbot?
[Moltbot](https://docs.molt.bot/start/getting-started) is an AI assistant platform that can connect to various chat surfaces (WhatsApp, Telegram, Discord, etc.) and execute tasks autonomously. This project extends Moltbot by giving it a physical robot body using [Reachy Mini](https://huggingface.co/spaces/pollen-robotics/Reachy_Mini), a small expressive robot from Pollen Robotics.
With this integration, Moltbot can:
- Listen to speech via the robot's microphone
- Transcribe speech locally using Whisper
- Generate responses through the Moltbot gateway
- Speak responses through ElevenLabs TTS
- Move its head expressively while speaking
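The capabilities above compose into a single conversation turn. A minimal Python sketch of that loop with stub stages (all function and type names here are illustrative, not the project's actual API):

```python
# Illustrative turn loop; stage names are hypothetical, not the real API.
from dataclasses import dataclass


@dataclass
class Turn:
    audio: bytes = b""
    text: str = ""
    reply: str = ""


def run_turn(record, transcribe, ask_gateway, speak) -> Turn:
    """One conversation turn: microphone -> STT -> gateway -> TTS."""
    turn = Turn()
    turn.audio = record()                # microphone + VAD: capture until silence
    turn.text = transcribe(turn.audio)   # local Whisper speech-to-text
    turn.reply = ask_gateway(turn.text)  # Moltbot gateway generates the response
    speak(turn.reply)                    # ElevenLabs TTS drives the speaker
    return turn


# Example wiring with stubs in place of the real stages:
spoken = []
turn = run_turn(
    record=lambda: b"\x00\x01",
    transcribe=lambda audio: "hello",
    ask_gateway=lambda text: f"you said: {text}",
    speak=spoken.append,
)
```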
## Architecture
```
Microphone → VAD → Whisper STT → Moltbot Gateway → ElevenLabs TTS → Speaker
                                                          ↓
                                                   MovementManager
                                                          ↓
                                 HeadWobbler (speech-driven head movement)
```
## Prerequisites
Before running this project, you need:
### 1. Moltbot Gateway (Required)
Moltbot must be installed and the gateway must be running. Follow the [Moltbot Getting Started guide](https://docs.molt.bot/start/getting-started) to:
1. Install the CLI: `curl -fsSL https://molt.bot/install.sh | bash`
2. Run the onboarding wizard: `moltbot onboard --install-daemon`
3. Start the gateway: `moltbot gateway --port 18789`
Verify it's running:
```bash
moltbot gateway status
```
### 2. Reachy Mini Robot (Required)
You need a [Reachy Mini](https://huggingface.co/spaces/pollen-robotics/Reachy_Mini) robot from Pollen Robotics with its daemon running.
Verify the daemon is running:
```bash
curl -s http://localhost:8000/api/daemon/status | jq .state
```
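The same check works from Python without `jq`. This sketch assumes the response shape implied by the curl example above (a top-level `state` field); the expected `"running"` value is also an assumption, so check your daemon's actual output:

```python
import json
import urllib.request

DAEMON_URL = "http://localhost:8000/api/daemon/status"


def parse_daemon_state(body: str) -> str:
    """Extract the `state` field from the daemon status JSON (assumed shape)."""
    return json.loads(body)["state"]


def daemon_is_running(url: str = DAEMON_URL) -> bool:
    """Fetch the daemon status endpoint and check its state."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return parse_daemon_state(resp.read().decode()) == "running"
```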
### 3. ElevenLabs Account (Required)
Sign up at [ElevenLabs](https://elevenlabs.io/) and get an API key for text-to-speech.
### 4. Python 3.12+ and uv
This project requires Python 3.12 or later and uses [uv](https://docs.astral.sh/uv/) for package management.
## Setup
```bash
git clone <this-repo>
cd reachy
uv sync
```
### Environment Variables
Create a `.env` file:
```bash
CLAWDBOT_TOKEN=your_gateway_token
ELEVENLABS_API_KEY=your_elevenlabs_key
```
Get your gateway token from the Moltbot configuration. If these variables are not set, they are pulled from the Moltbot config automatically.
## Running
```bash
# Make sure Reachy Mini daemon is running
curl -s http://localhost:8000/api/daemon/status | jq .state
# Make sure Moltbot gateway is running
moltbot gateway status
# Start Moltbot's body
uv run moltbot-body
```
## CLI Options
| Flag | Description |
|------|-------------|
| `--debug` | Enable debug logging (verbose output) |
| `--profile` | Enable the timing profiler; prints a detailed timing breakdown after each conversation turn |
| `--profile-once` | Profile one conversation turn then exit (useful for benchmarking) |
| `--robot-name NAME` | Specify robot name for connection (if you have multiple robots) |
| `--gateway-url URL` | Moltbot gateway URL (default: `http://localhost:18789`) |
### Examples
```bash
# Run with debug logging
uv run moltbot-body --debug
# Profile a single conversation turn
uv run moltbot-body --profile-once
# Connect to a specific robot and gateway
uv run moltbot-body --robot-name my-reachy --gateway-url http://192.168.1.100:18789
```
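The flags in the table above could be wired up with `argparse` roughly like this (a sketch; the project's actual CLI wiring may differ):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented moltbot-body CLI flags."""
    p = argparse.ArgumentParser(prog="moltbot-body")
    p.add_argument("--debug", action="store_true", help="enable debug logging")
    p.add_argument("--profile", action="store_true",
                   help="print a timing breakdown after each turn")
    p.add_argument("--profile-once", action="store_true",
                   help="profile one conversation turn, then exit")
    p.add_argument("--robot-name", help="robot to connect to")
    p.add_argument("--gateway-url", default="http://localhost:18789",
                   help="Moltbot gateway URL")
    return p


args = build_parser().parse_args(["--debug", "--robot-name", "my-reachy"])
```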
### Profiling Output
When using `--profile` or `--profile-once`, you'll see a detailed timing breakdown after each turn:
```
============================================================
CONVERSATION TIMING PROFILE
============================================================
🙂 User: "Hello, how are you?"
🤖 Assistant: "I'm doing well, thank you for asking!"
------------------------------------------------------------
TIMING BREAKDOWN
------------------------------------------------------------
🎤 Speech Detection:
Duration spoken: 1.23s
📝 Whisper Transcription:
Time: 0.45s
🧠 LLM (Moltbot):
Time to first token: 0.32s
Streaming time: 1.15s
Total time: 1.47s
Tokens: 42 (36.5 tok/s)
🔊 TTS (ElevenLabs):
Time to first audio: 0.28s
Total streaming: 1.82s
Audio chunks: 15
------------------------------------------------------------
END-TO-END LATENCY
------------------------------------------------------------
⏱️ Speech end → First audio: 1.05s
⏱️ Total turn time: 4.50s
============================================================
```
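The derived numbers in the report follow directly from the recorded timestamps: tokens per second is token count divided by streaming time, and end-to-end latency is the gap between speech end and first TTS audio. A small sketch using the sample values above:

```python
def tokens_per_second(tokens: int, streaming_time: float) -> float:
    """Throughput over the streaming window."""
    return tokens / streaming_time


def end_to_end_latency(speech_end: float, first_audio: float) -> float:
    """Wall-clock gap between end of user speech and first TTS audio."""
    return first_audio - speech_end


# 42 tokens over 1.15s of streaming ~ 36.5 tok/s, as in the sample report
tok_rate = tokens_per_second(42, 1.15)
# Hypothetical timestamps 1.05s apart, matching the sample latency line
latency = end_to_end_latency(10.00, 11.05)
```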
## Features
- **Voice Activation**: Listens for speech, processes when silence detected
- **Whisper STT**: Local speech-to-text transcription using faster-whisper
- **Moltbot Brain**: Claude-powered responses via the Moltbot gateway API
- **ElevenLabs TTS**: Natural voice output with streaming
- **Head Wobble**: Audio-driven head movement while speaking for natural expressiveness
- **Movement Manager**: 100Hz control loop for smooth robot motion
- **Breathing Animation**: Gentle idle breathing when not actively engaged
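The breathing and wobble behaviors can both be modeled as small offsets added to the head pose inside the 100 Hz control loop: a slow sine wave when idle, and a loudness-scaled offset while speaking. A hedged sketch (the gains, frequencies, and clamp here are illustrative, not the project's tuned values):

```python
import math

CONTROL_HZ = 100  # control loop rate from the features list


def breathing_offset(t: float, freq_hz: float = 0.25, amplitude: float = 0.01) -> float:
    """Gentle idle pitch offset (radians): a slow sine wave over time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t)


def wobble_offset(audio_rms: float, gain: float = 0.05, cap: float = 0.1) -> float:
    """Speech-driven head offset: scale audio loudness, clamped for safety."""
    return min(audio_rms * gain, cap)


# At t = 1.0s the 0.25 Hz breath is a quarter period in, i.e. at its peak
peak = breathing_offset(1.0)
```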
## Tips for a Better Experience
### Use a Low-Latency Inference Provider
For natural, conversational interactions, response latency is critical. The time from when you stop speaking to when the robot starts responding should ideally be under 1 second.
Consider using a fast inference provider like [Groq](https://groq.com/) which offers extremely low latency for supported models. You can configure this in your Moltbot settings. Use the `--profile` flag to measure your end-to-end latency and identify bottlenecks.
### Let Moltbot Help You Set Up
Since Moltbot is an AI coding assistant, you can chat with it to help configure and customize the robot body! Try asking Moltbot (via any of its chat surfaces) to:
- Help you tune the head movement parameters
- Adjust the voice activation sensitivity
- Add new expressions or gestures
- Debug connection issues
Moltbot can read and modify this codebase, so it's a great collaborator for extending the robot's capabilities.
## Roadmap
- [ ] Face tracking (look at the person speaking)
- [ ] DoA-based head tracking (direction of arrival for speaker localization)
- [ ] Wake word detection
- [ ] Expression gestures
## License
MIT License - see [LICENSE](LICENSE) for details.