OpenClaw Embodiment SDK v1.2: bidirectional loop, local inference, multi-modal triggers (Apache 2.0)
Posted to: Hugging Face Discussion
Most embodied AI frameworks are built around the agent: the agent decides, the hardware executes. We think that's backwards -- and the SDK now proves it in both directions.
What is it: The OpenClaw Embodiment SDK is a HAL (hardware abstraction layer) that sits between physical hardware and the OpenClaw multi-agent runtime. The SDK wraps device-native SDKs to preserve hardware safety limits, then surfaces sensor capture, context delivery, and physical actuation as clean Python ABCs -- so agent logic never couples to device specifics.
v1.2 closes the bidirectional loop. Gate 4 is complete.
The original architecture was device → agent. The device noticed, the agent thought, the device acted -- but only if you wired the response path yourself. v1.2 builds the full return path:
IMU + Camera + Microphone
↓
TriggerDetector + AudioTriggerDetector
↓
TriggerArbiter (FIRST_WINS / AUDIO_PRIORITY / HIGHEST_CONFIDENCE)
↓
TransportHal (BLE · HTTP · LocalMLX · Attachment)
↓
OpenClaw Agent Runtime
↓
AgentResponseListener
↓
AudioOutputHal.speak() + DisplayHal.show_card() + ActuatorHal.execute()
Reachy hears something. Turns toward it. Looks. Thinks. Responds. All in Python.
What's new in v1.2:
Gate 4: AgentResponseListener (core/response.py). The agent's response now flows back to the device automatically. TEXT routes to AudioOutputHal (TTS) with DisplayHal fallback. DISPLAY routes to the face display. ACTION fires the actuator. HEARTBEAT is acknowledged. No polling; event-driven.
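The routing described above can be sketched as a small dispatcher. Everything below is illustrative: the enum values match the response types named in the post, but `route_response`, `SpeakerSim`, and their signatures are hypothetical stand-ins, not the SDK's actual API.

```python
from enum import Enum


class ResponseType(Enum):
    TEXT = "text"
    DISPLAY = "display"
    ACTION = "action"
    HEARTBEAT = "heartbeat"


def route_response(rtype, payload, audio=None, display=None, actuator=None):
    """Dispatch one agent response to the right HAL (hypothetical sketch)."""
    if rtype is ResponseType.TEXT:
        if audio is not None:
            audio.speak(payload)        # primary path: TTS
            return "audio"
        if display is not None:
            display.show_card(payload)  # fallback: face display
            return "display"
        raise RuntimeError("no output HAL available for TEXT")
    if rtype is ResponseType.DISPLAY:
        display.show_card(payload)
        return "display"
    if rtype is ResponseType.ACTION:
        actuator.execute(payload)
        return "actuator"
    return "ack"                        # HEARTBEAT: acknowledge only


class SpeakerSim:
    """Records speak() calls; a test double for an AudioOutputHal."""
    def __init__(self):
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)
```

The point of the sketch is the fallback order: TEXT prefers audio and degrades to the display, and HEARTBEAT never touches an output HAL.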
LocalMLXTransport (transport/mlx.py). On-device inference via mlx_lm on Apple Silicon. Default model: mlx-community/Qwen3-0.6B-4bit. Load time: 8.6s. Inference: 0.59s. Zero cloud dependency. Hybrid routing: local fast path, HTTP gateway fallback for complex reasoning. Install: pip install mlx-lm.
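The hybrid routing idea (local fast path, HTTP gateway fallback) can be shown without any MLX dependency. This is a minimal sketch under assumptions: `HybridTransport` and its word-count heuristic are invented for illustration; the real transport presumably makes a smarter local-vs-gateway decision and calls mlx_lm under the hood.

```python
class HybridTransport:
    """Illustrative hybrid router: try the on-device model for short prompts,
    fall back to the HTTP gateway for long prompts or local failures."""

    def __init__(self, local, gateway, max_local_tokens=256):
        self.local = local            # callable: prompt -> response (on-device)
        self.gateway = gateway        # callable: prompt -> response (HTTP)
        self.max_local_tokens = max_local_tokens

    def send(self, prompt):
        # Crude complexity heuristic: word count stands in for token count.
        if len(prompt.split()) <= self.max_local_tokens:
            try:
                return self.local(prompt)
            except Exception:
                pass  # any local failure falls through to the gateway
        return self.gateway(prompt)
```

Plugging in mlx_lm on Apple Silicon would mean wrapping its generate call as the `local` callable; the router itself stays backend-agnostic.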
AudioTriggerDetector (triggers/audio_trigger.py). RMS energy threshold detection using arecord (ALSA). State machine: IDLE → DETECTING → TRIGGERED → COOLDOWN. Configurable threshold, min duration, cooldown. Runs concurrently with visual TriggerDetector.
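The four-state machine is easy to picture in miniature. This sketch feeds per-frame RMS values instead of reading from arecord, and `RmsTrigger` with its frame-count parameters is a hypothetical simplification of the real detector's threshold/min-duration/cooldown configuration.

```python
from enum import Enum


class AudioState(Enum):
    IDLE = "idle"
    DETECTING = "detecting"
    TRIGGERED = "triggered"
    COOLDOWN = "cooldown"


class RmsTrigger:
    """Toy RMS-threshold state machine: IDLE -> DETECTING -> TRIGGERED -> COOLDOWN."""

    def __init__(self, threshold=0.1, min_frames=3, cooldown_frames=5):
        self.threshold = threshold
        self.min_frames = min_frames          # stands in for "min duration"
        self.cooldown_frames = cooldown_frames
        self.state = AudioState.IDLE
        self._count = 0

    def feed(self, rms):
        """Advance one audio frame; return True exactly when a trigger fires."""
        if self.state is AudioState.TRIGGERED:
            # One frame after firing, enter cooldown.
            self.state = AudioState.COOLDOWN
            self._count = self.cooldown_frames
        if self.state is AudioState.COOLDOWN:
            self._count -= 1
            if self._count <= 0:
                self.state = AudioState.IDLE
            return False
        if rms >= self.threshold:
            self._count += 1
            self.state = AudioState.DETECTING
            if self._count >= self.min_frames:
                self.state = AudioState.TRIGGERED
                return True
        else:
            # Energy dropped before min duration: reset.
            self.state = AudioState.IDLE
            self._count = 0
        return False
```

Three loud frames in a row fire the trigger once; the cooldown then swallows further loud frames until it expires.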
TriggerArbiter (triggers/arbiter.py). Fuses visual + audio trigger streams. Policies: FIRST_WINS (default, lowest latency), AUDIO_PRIORITY, VISUAL_PRIORITY, HIGHEST_CONFIDENCE (holds window, emits best confidence). 500ms deduplication window prevents double-firing.
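The deduplication window is the part easiest to get wrong, so here is a minimal sketch of just that plus the FIRST_WINS policy. The class name matches the post, but this `offer()` interface and millisecond clock are assumptions; the windowed AUDIO_PRIORITY / HIGHEST_CONFIDENCE policies are deliberately omitted.

```python
class TriggerArbiter:
    """Sketch of FIRST_WINS fusion with a dedup window (other policies omitted)."""

    def __init__(self, dedup_ms=500):
        self.dedup_ms = dedup_ms
        self._last_emit_ms = None

    def offer(self, source, confidence, now_ms):
        """Return (source, confidence) if the trigger is emitted, else None."""
        if self._last_emit_ms is not None and now_ms - self._last_emit_ms < self.dedup_ms:
            return None  # second trigger inside the window: suppressed
        self._last_emit_ms = now_ms
        return (source, confidence)
```

With FIRST_WINS, whichever modality arrives first claims the window; a visual trigger 300 ms after an audio one is dropped rather than double-firing the agent.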
StatusIndicatorHal (9th HAL ABC). LED/status feedback across all profiles. set_color(r, g, b), blink(interval_ms), pulse(pattern), off(). Patterns: heartbeat, alert, processing, idle. Simulator included for CI.
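An ABC plus CI simulator pair might look like the following. The method names come from the post; the signatures, the `SimulatedIndicator` class, and its call log are illustrative guesses at the shape, not the SDK's actual definitions.

```python
from abc import ABC, abstractmethod


class StatusIndicatorHal(ABC):
    """Sketch of a status-LED HAL ABC (method names per the post, bodies assumed)."""

    @abstractmethod
    def set_color(self, r: int, g: int, b: int) -> None: ...

    @abstractmethod
    def blink(self, interval_ms: int) -> None: ...

    @abstractmethod
    def pulse(self, pattern: str) -> None: ...

    @abstractmethod
    def off(self) -> None: ...


class SimulatedIndicator(StatusIndicatorHal):
    """CI-friendly simulator: records calls instead of driving hardware."""

    PATTERNS = {"heartbeat", "alert", "processing", "idle"}

    def __init__(self):
        self.log = []

    def set_color(self, r, g, b):
        self.log.append(("color", r, g, b))

    def blink(self, interval_ms):
        self.log.append(("blink", interval_ms))

    def pulse(self, pattern):
        if pattern not in self.PATTERNS:
            raise ValueError(f"unknown pattern: {pattern}")
        self.log.append(("pulse", pattern))

    def off(self):
        self.log.append(("off",))
```

Agent code targets the ABC; tests swap in the simulator and assert on the call log, which is the whole point of keeping the HAL abstract.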
TransportHal latency awareness. get_expected_latency_ms() is now abstract -- every transport declares its expected overhead. BLE: 50ms. HTTP: 10ms. LocalMLX: 5ms. AttachmentTransport: 100ms. Pipeline uses this to offset actuation timing for synchronized TTS + display delivery.
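The timing offset is a small calculation: delay the faster channels so every output lands with the slowest one. This helper is a hypothetical illustration of the idea, not the pipeline's actual scheduler.

```python
def actuation_offsets(latencies_ms):
    """Given channel -> expected latency (ms), return the delay each channel
    should add so all outputs arrive together (sketch of the idea)."""
    slowest = max(latencies_ms.values())
    return {name: slowest - ms for name, ms in latencies_ms.items()}
```

Using the post's numbers: BLE at 50 ms sets the pace, so the HTTP channel waits 40 ms and LocalMLX waits 45 ms before actuating, and TTS + display fire in sync.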
AttachmentTransport (transport/attachment.py). Attaches raw camera frames (base64) directly to openclaw sessions spawn turns. The agent sees the actual frame, not a compressed embedding. Useful for high-value visual context where BLE's 25KB limit isn't enough.
MicrophoneHal.transcribe() is now abstract. Default: delegates to openclaw stt transcribe via subprocess. All device profiles get STT without a per-device speech stack. Async variant included.
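The delegate-to-CLI default might look like this. The `openclaw stt transcribe` command is quoted from the post; the class shape, `wav_path` parameter, and the `EchoMic` test double are assumptions for illustration.

```python
import subprocess
from abc import ABC


class MicrophoneHal(ABC):
    """Sketch: base class ships a default transcribe() that shells out to the
    openclaw CLI, so device profiles get STT without a per-device speech stack."""

    def transcribe(self, wav_path: str) -> str:
        result = subprocess.run(
            ["openclaw", "stt", "transcribe", wav_path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


class EchoMic(MicrophoneHal):
    """Test double: overrides the default with canned text, no subprocess."""

    def transcribe(self, wav_path: str) -> str:
        return f"transcript-of:{wav_path}"
```

A profile with native on-device STT overrides `transcribe()`; everything else inherits the CLI path for free.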
HeartbeatWake. TriggerDetector calls POST /heartbeat/trigger on the OpenClaw gateway at every CAPTURE transition. Eliminates up to 30s agent polling latency. 5s cooldown prevents spam.
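The 5 s anti-spam cooldown is a one-field gate. This sketch separates the decision from the side effect; `should_fire` and its seconds-based clock are illustrative names, and the caller would do the actual POST /heartbeat/trigger.

```python
class HeartbeatWake:
    """Sketch of the cooldown gate: allow a heartbeat POST at most once per 5 s."""

    def __init__(self, cooldown_s: float = 5.0):
        self.cooldown_s = cooldown_s
        self._last: float | None = None

    def should_fire(self, now_s: float) -> bool:
        """True if a POST /heartbeat/trigger is allowed at time now_s."""
        if self._last is None or now_s - self._last >= self.cooldown_s:
            self._last = now_s
            return True
        return False
```

Every CAPTURE transition calls the gate; only the first one in each 5 s window reaches the gateway, which is what collapses the 30 s polling latency without flooding it.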
Reachy 2 profile (hal/reachy2_reference.py). Full humanoid: 7-DOF arms Γ 2, grippers, 3-DOF neck, stereo cameras, optional wheeled mobile base. Same HAL ABCs as Reachy Mini, richer actuator map. load_profile("reachy2", host="reachy.local").
Even G2 audio path complete. LC3 mic capture via BLE characteristic 0xF1. A2DP speaker routing via OS audio stack. Full bidirectional audio on the glasses.
Numbers:
| Metric | v1.1 (Gate 3) | v1.2 (Gate 4) |
|---|---|---|
| HAL ABCs | 8 | 9 |
| Device profiles | 6 | 7 |
| Tests passing | 237 | 270 |
| Bidirectional loop | No | ✅ Yes |
| Offline inference | No | ✅ Yes |
| Multi-modal triggers | No | ✅ Yes |
Status:
- Apache 2.0, fully open source
- Hardware validated: Raspberry Pi Compute Module 5 (Reachy Mini)
- SDK: github.com/mmartoccia/openclaw-embodiment
- v1.2.0 tagged
- Quick start:
pip install openclaw-embodiment && openclaw-embodiment doctor
On the roadmap (v2.0 -- design specs in docs/specs/):
- HalOrchestrator: explicit trigger → capture → classify → transport → actuate loop as a first-class async object, with per-stage hooks and middleware
- CrossEmbodimentOrchestrator: one agent, multiple simultaneous device types (glasses + robot + phone) with intent routing
Looking for collaborators:
Reachy Mini lead times are around 90 days. If you already have hardware and want to run multi-agent workflows on Reachy today -- open an issue on the repo and we will get you set up. Looking for 2-3 developers to validate edge cases on hardware we don't have on hand.
If you're building anything with Reachy Mini, Reachy 2, Even Realities G2, OAK-D, or Frame and OpenClaw might be a fit -- drop a reply here or open an issue.
The Embodiment SDK integrates with the OpenClaw agent framework (openclaw.ai). We built the HAL layer, not OpenClaw itself.