---
title: Moltbot Body
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: static
pinned: false
short_description: Give Moltbot a physical presence with Reachy Mini
tags:
  - reachy_mini
  - reachy_mini_python_app
  - clawdbot
  - moltbot
---

# Moltbot's Body

> **Security Warning:** This project uses Moltbot, which runs AI-generated code with access to your system. Ensure you understand the security implications before installation. Only run Moltbot from trusted sources and review its permissions carefully. See the Moltbot Security documentation for details.

Reachy Mini integration with Moltbot: giving Moltbot a physical presence.

## What is Moltbot?

Moltbot is an AI assistant platform that can connect to various chat surfaces (WhatsApp, Telegram, Discord, etc.) and execute tasks autonomously. This project extends Moltbot by giving it a physical robot body using Reachy Mini, a small expressive robot from Pollen Robotics.

With this integration, Moltbot can:

  • Listen to speech via the robot's microphone
  • Transcribe speech locally using Whisper
  • Generate responses through the Moltbot gateway
  • Speak responses through ElevenLabs TTS
  • Move its head expressively while speaking

## Architecture

```
Microphone → VAD → Whisper STT → Moltbot Gateway → ElevenLabs TTS → Speaker
                                        ↓
                                   MovementManager
                                   HeadWobbler (speech-driven head movement)
```
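The pipeline above can be sketched as a single conversation turn that chains the stages together. This is a minimal illustration, not the project's actual code: the stage callables are hypothetical placeholders you would wire up to the real microphone, Whisper, gateway, and TTS clients.

```python
from typing import Callable

def conversation_turn(
    record: Callable[[], bytes],      # mic capture until VAD detects silence
    stt: Callable[[bytes], str],      # local Whisper transcription
    llm: Callable[[str], str],        # Moltbot gateway request
    tts: Callable[[str], bytes],      # ElevenLabs synthesis
    play: Callable[[bytes], None],    # speaker output (head wobble runs alongside)
) -> str:
    """One turn of the pipeline: mic -> STT -> LLM -> TTS -> speaker."""
    audio = record()
    text = stt(audio)
    reply = llm(text)
    speech = tts(reply)
    play(speech)
    return reply
```

In the real system the MovementManager and HeadWobbler run concurrently with playback, driven by the outgoing audio; they are omitted here for brevity.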

## Prerequisites

Before running this project, you need:

### 1. Moltbot Gateway (Required)

Moltbot must be installed and the gateway must be running. Follow the Moltbot Getting Started guide to:

1. Install the CLI: `curl -fsSL https://molt.bot/install.sh | bash`
2. Run the onboarding wizard: `moltbot onboard --install-daemon`
3. Start the gateway: `moltbot gateway --port 18789`

Verify it's running:

```shell
moltbot gateway status
```

### 2. Reachy Mini Robot (Required)

You need a Reachy Mini robot from Pollen Robotics with its daemon running.

Verify the daemon is running:

```shell
curl -s http://localhost:8000/api/daemon/status | jq .state
```
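If you'd rather check the daemon from Python than with curl and jq, a minimal sketch follows. It assumes only what the `jq .state` filter above implies: the endpoint returns JSON with a `state` field. The specific state values are not documented here, so treat them as an assumption.

```python
import json
from urllib.request import urlopen

DAEMON_STATUS_URL = "http://localhost:8000/api/daemon/status"

def parse_daemon_state(payload: str) -> str:
    """Extract the `state` field from a daemon status JSON payload."""
    return json.loads(payload).get("state", "unknown")

def daemon_state(url: str = DAEMON_STATUS_URL, timeout: float = 2.0) -> str:
    """Fetch and parse the Reachy Mini daemon state (needs the daemon running)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_daemon_state(resp.read().decode())
```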

### 3. ElevenLabs Account (Required)

Sign up at ElevenLabs and get an API key for text-to-speech.

### 4. Python 3.12+ and uv

This project requires Python 3.12 or later and uses uv for package management.

## Setup

```shell
git clone <this-repo>
cd reachy
uv sync
```

## Environment Variables

Create a `.env` file:

```shell
CLAWDBOT_TOKEN=your_gateway_token
ELEVENLABS_API_KEY=your_elevenlabs_key
```

Get your gateway token from the Moltbot configuration. If either variable is unset, its value is pulled from the Moltbot config automatically.

## Running

```shell
# Make sure Reachy Mini daemon is running
curl -s http://localhost:8000/api/daemon/status | jq .state

# Make sure Moltbot gateway is running
moltbot gateway status

# Start Moltbot's body
uv run moltbot-body
```

## CLI Options

| Flag | Description |
|------|-------------|
| `--debug` | Enable debug logging (verbose output) |
| `--profile` | Enable the timing profiler; prints a detailed timing breakdown after each conversation turn |
| `--profile-once` | Profile one conversation turn, then exit (useful for benchmarking) |
| `--robot-name NAME` | Specify robot name for connection (if you have multiple robots) |
| `--gateway-url URL` | Moltbot gateway URL (default: `http://localhost:18789`) |
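The flags above map naturally onto an `argparse` parser. This is a sketch of how such a CLI could be declared, not the project's actual entry point:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags listed above (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="moltbot-body")
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose debug logging")
    parser.add_argument("--profile", action="store_true",
                        help="print a timing breakdown after each turn")
    parser.add_argument("--profile-once", action="store_true",
                        help="profile one conversation turn, then exit")
    parser.add_argument("--robot-name", metavar="NAME",
                        help="robot name, if you have multiple robots")
    parser.add_argument("--gateway-url", metavar="URL",
                        default="http://localhost:18789",
                        help="Moltbot gateway URL")
    return parser
```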

## Examples

```shell
# Run with debug logging
uv run moltbot-body --debug

# Profile a single conversation turn
uv run moltbot-body --profile-once

# Connect to a specific robot and gateway
uv run moltbot-body --robot-name my-reachy --gateway-url http://192.168.1.100:18789
```

## Profiling Output

When using `--profile` or `--profile-once`, you'll see a detailed timing breakdown after each turn:

```
============================================================
CONVERSATION TIMING PROFILE
============================================================

📝 User: "Hello, how are you?"
🤖 Assistant: "I'm doing well, thank you for asking!"

------------------------------------------------------------
TIMING BREAKDOWN
------------------------------------------------------------

🎤 Speech Detection:
   Duration spoken:     1.23s

📜 Whisper Transcription:
   Time:                0.45s

🧠 LLM (Moltbot):
   Time to first token: 0.32s
   Streaming time:      1.15s
   Total time:          1.47s
   Tokens:              42 (36.5 tok/s)

🔊 TTS (ElevenLabs):
   Time to first audio: 0.28s
   Total streaming:     1.82s
   Audio chunks:        15

------------------------------------------------------------
END-TO-END LATENCY
------------------------------------------------------------

⏱️  Speech end → First audio: 1.05s
⏱️  Total turn time:          4.50s

============================================================
```
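A per-stage timer like the one behind this report can be sketched with `time.perf_counter` and a context manager. This is a minimal sketch, not the project's actual profiler:

```python
import time
from contextlib import contextmanager

class TurnProfiler:
    """Collect wall-clock durations for named pipeline stages."""

    def __init__(self) -> None:
        self.timings: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        """Time the enclosed block and record it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = time.perf_counter() - start

    def report(self) -> str:
        """Render one `name  duration` line per recorded stage."""
        return "\n".join(f"{name:<24}{seconds:.2f}s"
                         for name, seconds in self.timings.items())
```

Each pipeline stage (transcription, LLM call, TTS) would be wrapped in `with profiler.stage("..."):`, and `report()` printed once the turn completes.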

## Features

- **Voice Activation:** Listens for speech, processes when silence is detected
- **Whisper STT:** Local speech-to-text transcription using faster-whisper
- **Moltbot Brain:** Claude-powered responses via the Moltbot gateway API
- **ElevenLabs TTS:** Natural voice output with streaming
- **Head Wobble:** Audio-driven head movement while speaking for natural expressiveness
- **Movement Manager:** 100 Hz control loop for smooth robot motion
- **Breathing Animation:** Gentle idle breathing when not actively engaged

## Tips for a Better Experience

### Use a Low-Latency Inference Provider

For natural, conversational interactions, response latency is critical. The time from when you stop speaking to when the robot starts responding should ideally be under 1 second.

Consider using a fast inference provider like Groq, which offers extremely low latency for supported models. You can configure this in your Moltbot settings. Use the `--profile` flag to measure your end-to-end latency and identify bottlenecks.

### Let Moltbot Help You Set Up

Since Moltbot is an AI coding assistant, you can chat with it to help configure and customize the robot body! Try asking Moltbot (via any of its chat surfaces) to:

  • Help you tune the head movement parameters
  • Adjust the voice activation sensitivity
  • Add new expressions or gestures
  • Debug connection issues

Moltbot can read and modify this codebase, so it's a great collaborator for extending the robot's capabilities.

## Roadmap

- Face tracking (look at the person speaking)
- DoA-based head tracking (direction of arrival for speaker localization)
- Wake word detection
- Expression gestures

## License

MIT License - see LICENSE for details.