OpenClaw Embodiment SDK v1.2: bidirectional loop, local inference, multi-modal triggers (Apache 2.0)
Posted to: Hugging Face Discussion
Most embodied AI frameworks are built around the agent: the agent decides, the hardware executes. We think that's backwards -- and the SDK now proves it in both directions.
What is it: The OpenClaw Embodiment SDK is a HAL (hardware abstraction layer) that sits between physical hardware and the OpenClaw multi-agent runtime. The SDK wraps device-native SDKs to preserve hardware safety limits, then surfaces sensor capture, context delivery, and physical actuation as clean Python ABCs -- so agent logic never couples to device specifics.
v1.2 closes the bidirectional loop. Gate 4 is complete.
The original architecture was device → agent. The device noticed, the agent thought, the device acted -- but only if you wired the response path yourself. v1.2 builds the full return path:
IMU + Camera + Microphone
↓
TriggerDetector + AudioTriggerDetector
↓
TriggerArbiter (FIRST_WINS / AUDIO_PRIORITY / HIGHEST_CONFIDENCE)
↓
TransportHal (BLE · HTTP · LocalMLX · Attachment)
↓
OpenClaw Agent Runtime
↓
AgentResponseListener
↓
AudioOutputHal.speak() + DisplayHal.show_card() + ActuatorHal.execute()
Reachy hears something. Turns toward it. Looks. Thinks. Responds. All in Python.
What's new in v1.2:
Gate 4: AgentResponseListener (core/response.py). The agent's response now flows back to the device automatically. TEXT routes to AudioOutputHal (TTS) with DisplayHal fallback. DISPLAY routes to the face display. ACTION fires the actuator. HEARTBEAT is acknowledged. No polling; event-driven.
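The routing described above can be sketched as a small dispatcher. Everything below is illustrative: the enum values match the response types named in the post, but `route_response`, `SpeakerSim`, and their signatures are hypothetical stand-ins, not the SDK's actual API.

```python
from enum import Enum


class ResponseType(Enum):
    TEXT = "text"
    DISPLAY = "display"
    ACTION = "action"
    HEARTBEAT = "heartbeat"


def route_response(rtype, payload, audio=None, display=None, actuator=None):
    """Dispatch one agent response to the right HAL (hypothetical sketch)."""
    if rtype is ResponseType.TEXT:
        if audio is not None:
            audio.speak(payload)        # primary path: TTS
            return "audio"
        if display is not None:
            display.show_card(payload)  # fallback: face display
            return "display"
        raise RuntimeError("no output HAL available for TEXT")
    if rtype is ResponseType.DISPLAY:
        display.show_card(payload)
        return "display"
    if rtype is ResponseType.ACTION:
        actuator.execute(payload)
        return "actuator"
    return "ack"                        # HEARTBEAT: acknowledge only


class SpeakerSim:
    """Records speak() calls; a test double for an AudioOutputHal."""
    def __init__(self):
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)
```

The point of the sketch is the fallback order: TEXT prefers audio and degrades to the display, and HEARTBEAT never touches an output HAL.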
LocalMLXTransport (transport/mlx.py). On-device inference via mlx_lm on Apple Silicon. Default model: mlx-community/Qwen3-0.6B-4bit. Load time: 8.6s. Inference: 0.59s. Zero cloud dependency. Hybrid routing: local fast path, HTTP gateway fallback for complex reasoning. Install: pip install mlx-lm.
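The hybrid routing idea (local fast path, HTTP gateway fallback) can be shown without any MLX dependency. This is a minimal sketch under assumptions: `HybridTransport` and its word-count heuristic are invented for illustration; the real transport presumably makes a smarter local-vs-gateway decision and calls mlx_lm under the hood.

```python
class HybridTransport:
    """Illustrative hybrid router: try the on-device model for short prompts,
    fall back to the HTTP gateway for long prompts or local failures."""

    def __init__(self, local, gateway, max_local_tokens=256):
        self.local = local            # callable: prompt -> response (on-device)
        self.gateway = gateway        # callable: prompt -> response (HTTP)
        self.max_local_tokens = max_local_tokens

    def send(self, prompt):
        # Crude complexity heuristic: word count stands in for token count.
        if len(prompt.split()) <= self.max_local_tokens:
            try:
                return self.local(prompt)
            except Exception:
                pass  # any local failure falls through to the gateway
        return self.gateway(prompt)
```

Plugging in mlx_lm on Apple Silicon would mean wrapping its generate call as the `local` callable; the router itself stays backend-agnostic.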
AudioTriggerDetector (triggers/audio_trigger.py). RMS energy threshold detection using arecord (ALSA). State machine: IDLE → DETECTING → TRIGGERED → COOLDOWN. Configurable threshold, min duration, cooldown. Runs concurrently with visual TriggerDetector.
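The four-state machine is easy to picture in miniature. This sketch feeds per-frame RMS values instead of reading from arecord, and `RmsTrigger` with its frame-count parameters is a hypothetical simplification of the real detector's threshold/min-duration/cooldown configuration.

```python
from enum import Enum


class AudioState(Enum):
    IDLE = "idle"
    DETECTING = "detecting"
    TRIGGERED = "triggered"
    COOLDOWN = "cooldown"


class RmsTrigger:
    """Toy RMS-threshold state machine: IDLE -> DETECTING -> TRIGGERED -> COOLDOWN."""

    def __init__(self, threshold=0.1, min_frames=3, cooldown_frames=5):
        self.threshold = threshold
        self.min_frames = min_frames          # stands in for "min duration"
        self.cooldown_frames = cooldown_frames
        self.state = AudioState.IDLE
        self._count = 0

    def feed(self, rms):
        """Advance one audio frame; return True exactly when a trigger fires."""
        if self.state is AudioState.TRIGGERED:
            # One frame after firing, enter cooldown.
            self.state = AudioState.COOLDOWN
            self._count = self.cooldown_frames
        if self.state is AudioState.COOLDOWN:
            self._count -= 1
            if self._count <= 0:
                self.state = AudioState.IDLE
            return False
        if rms >= self.threshold:
            self._count += 1
            self.state = AudioState.DETECTING
            if self._count >= self.min_frames:
                self.state = AudioState.TRIGGERED
                return True
        else:
            # Energy dropped before min duration: reset.
            self.state = AudioState.IDLE
            self._count = 0
        return False
```

Three loud frames in a row fire the trigger once; the cooldown then swallows further loud frames until it expires.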
TriggerArbiter (triggers/arbiter.py). Fuses visual + audio trigger streams. Policies: FIRST_WINS (default, lowest latency), AUDIO_PRIORITY, VISUAL_PRIORITY, HIGHEST_CONFIDENCE (holds window, emits best confidence). 500ms deduplication window prevents double-firing.
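The deduplication window is the part easiest to get wrong, so here is a minimal sketch of just that plus the FIRST_WINS policy. The class name matches the post, but this `offer()` interface and millisecond clock are assumptions; the windowed AUDIO_PRIORITY / HIGHEST_CONFIDENCE policies are deliberately omitted.

```python
class TriggerArbiter:
    """Sketch of FIRST_WINS fusion with a dedup window (other policies omitted)."""

    def __init__(self, dedup_ms=500):
        self.dedup_ms = dedup_ms
        self._last_emit_ms = None

    def offer(self, source, confidence, now_ms):
        """Return (source, confidence) if the trigger is emitted, else None."""
        if self._last_emit_ms is not None and now_ms - self._last_emit_ms < self.dedup_ms:
            return None  # second trigger inside the window: suppressed
        self._last_emit_ms = now_ms
        return (source, confidence)
```

With FIRST_WINS, whichever modality arrives first claims the window; a visual trigger 300 ms after an audio one is dropped rather than double-firing the agent.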
StatusIndicatorHal (9th HAL ABC). LED/status feedback across all profiles. set_color(r, g, b), blink(interval_ms), pulse(pattern), off(). Patterns: heartbeat, alert, processing, idle. Simulator included for CI.
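An ABC plus CI simulator pair might look like the following. The method names come from the post; the signatures, the `SimulatedIndicator` class, and its call log are illustrative guesses at the shape, not the SDK's actual definitions.

```python
from abc import ABC, abstractmethod


class StatusIndicatorHal(ABC):
    """Sketch of a status-LED HAL ABC (method names per the post, bodies assumed)."""

    @abstractmethod
    def set_color(self, r: int, g: int, b: int) -> None: ...

    @abstractmethod
    def blink(self, interval_ms: int) -> None: ...

    @abstractmethod
    def pulse(self, pattern: str) -> None: ...

    @abstractmethod
    def off(self) -> None: ...


class SimulatedIndicator(StatusIndicatorHal):
    """CI-friendly simulator: records calls instead of driving hardware."""

    PATTERNS = {"heartbeat", "alert", "processing", "idle"}

    def __init__(self):
        self.log = []

    def set_color(self, r, g, b):
        self.log.append(("color", r, g, b))

    def blink(self, interval_ms):
        self.log.append(("blink", interval_ms))

    def pulse(self, pattern):
        if pattern not in self.PATTERNS:
            raise ValueError(f"unknown pattern: {pattern}")
        self.log.append(("pulse", pattern))

    def off(self):
        self.log.append(("off",))
```

Agent code targets the ABC; tests swap in the simulator and assert on the call log, which is the whole point of keeping the HAL abstract.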
TransportHal latency awareness. get_expected_latency_ms() is now abstract -- every transport declares its expected overhead. BLE: 50ms. HTTP: 10ms. LocalMLX: 5ms. AttachmentTransport: 100ms. Pipeline uses this to offset actuation timing for synchronized TTS + display delivery.
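The timing offset is a small calculation: delay the faster channels so every output lands with the slowest one. This helper is a hypothetical illustration of the idea, not the pipeline's actual scheduler.

```python
def actuation_offsets(latencies_ms):
    """Given channel -> expected latency (ms), return the delay each channel
    should add so all outputs arrive together (sketch of the idea)."""
    slowest = max(latencies_ms.values())
    return {name: slowest - ms for name, ms in latencies_ms.items()}
```

Using the post's numbers: BLE at 50 ms sets the pace, so the HTTP channel waits 40 ms and LocalMLX waits 45 ms before actuating, and TTS + display fire in sync.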
AttachmentTransport (transport/attachment.py). Attaches raw camera frames (base64) directly to openclaw sessions spawn turns. The agent sees the actual frame, not a compressed embedding. Useful for high-value visual context where BLE's 25KB limit isn't enough.
MicrophoneHal.transcribe() is now abstract. Default: delegates to openclaw stt transcribe via subprocess. All device profiles get STT without a per-device speech stack. Async variant included.
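The delegate-to-CLI default might look like this. The `openclaw stt transcribe` command is quoted from the post; the class shape, `wav_path` parameter, and the `EchoMic` test double are assumptions for illustration.

```python
import subprocess
from abc import ABC


class MicrophoneHal(ABC):
    """Sketch: base class ships a default transcribe() that shells out to the
    openclaw CLI, so device profiles get STT without a per-device speech stack."""

    def transcribe(self, wav_path: str) -> str:
        result = subprocess.run(
            ["openclaw", "stt", "transcribe", wav_path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


class EchoMic(MicrophoneHal):
    """Test double: overrides the default with canned text, no subprocess."""

    def transcribe(self, wav_path: str) -> str:
        return f"transcript-of:{wav_path}"
```

A profile with native on-device STT overrides `transcribe()`; everything else inherits the CLI path for free.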
HeartbeatWake. TriggerDetector calls POST /heartbeat/trigger on the OpenClaw gateway at every CAPTURE transition. Eliminates up to 30s agent polling latency. 5s cooldown prevents spam.
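The 5 s anti-spam cooldown is a one-field gate. This sketch separates the decision from the side effect; `should_fire` and its seconds-based clock are illustrative names, and the caller would do the actual POST /heartbeat/trigger.

```python
class HeartbeatWake:
    """Sketch of the cooldown gate: allow a heartbeat POST at most once per 5 s."""

    def __init__(self, cooldown_s: float = 5.0):
        self.cooldown_s = cooldown_s
        self._last: float | None = None

    def should_fire(self, now_s: float) -> bool:
        """True if a POST /heartbeat/trigger is allowed at time now_s."""
        if self._last is None or now_s - self._last >= self.cooldown_s:
            self._last = now_s
            return True
        return False
```

Every CAPTURE transition calls the gate; only the first one in each 5 s window reaches the gateway, which is what collapses the 30 s polling latency without flooding it.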
Reachy 2 profile (hal/reachy2_reference.py). Full humanoid: 7-DOF arms Γ 2, grippers, 3-DOF neck, stereo cameras, optional wheeled mobile base. Same HAL ABCs as Reachy Mini, richer actuator map. load_profile("reachy2", host="reachy.local").
Even G2 audio path complete. LC3 mic capture via BLE characteristic 0xF1. A2DP speaker routing via OS audio stack. Full bidirectional audio on the glasses.
Numbers:
| Metric | v1.1 (Gate 3) | v1.2 (Gate 4) |
|---|---|---|
| HAL ABCs | 8 | 9 |
| Device profiles | 6 | 7 |
| Tests passing | 237 | 270 |
| Bidirectional loop | No | ✅ Yes |
| Offline inference | No | ✅ Yes |
| Multi-modal triggers | No | ✅ Yes |
Status:
- Apache 2.0, fully open source
- Hardware validated: Raspberry Pi Compute Module 5 (Reachy Mini)
- SDK: github.com/mmartoccia/openclaw-embodiment
- v1.2.0 tagged
- Quick start:
pip install openclaw-embodiment && openclaw-embodiment doctor
On the roadmap (v2.0 -- design specs in docs/specs/):
- HalOrchestrator: explicit trigger → capture → classify → transport → actuate loop as a first-class async object, with per-stage hooks and middleware
- CrossEmbodimentOrchestrator: one agent, multiple simultaneous device types (glasses + robot + phone) with intent routing
Looking for collaborators:
Reachy Mini lead times are around 90 days. If you already have hardware and want to run multi-agent workflows on Reachy today -- open an issue on the repo and we will get you set up. Looking for 2-3 developers to validate edge cases on hardware we don't have on hand.
If you're building anything with Reachy Mini, Reachy 2, Even Realities G2, OAK-D, or Frame and OpenClaw might be a fit -- drop a reply here or open an issue.
The Embodiment SDK integrates with the OpenClaw agent framework (openclaw.ai). We built the HAL layer, not OpenClaw itself.