docs(story): move parts of the lib into web workers
docs/planning/009_web_worker.md (ADDED, +167 -0)
@@ -0,0 +1,167 @@
# User Story 009: Web Worker Architecture (Main-thread Safe Web Library)

## Story

**As a** user building robotics UIs that also render live camera previews and interactive controls
**I want** `@lerobot/web` to run heavy control/recording work off the main thread
**So that** my UI stays smooth (no flicker/jank) even when teleoperation and recording are active

## Background

The current browser implementation runs teleoperation control loops, dataset assembly, and export logic on the main thread. When keyboard teleoperation is activated while a camera stream is being previewed, the preview can flicker due to main-thread contention. This is a UX blocker for real-world apps that combine live video, UI interactions, and hardware control.

A worker-based architecture lets us move CPU-intensive, frequent, or bursty work off the main thread. The main thread remains responsible for the DOM, video rendering, and user interactions. The library must preserve the existing API (`calibrate()`, `teleoperate()`, `record()`) while transparently using workers when available and cleanly falling back to the current approach otherwise.

## Goals

- Identical public API to today's `@lerobot/web` (no breaking changes)
- Main-thread safe by default: heavy or frequent work executes in a Web Worker
- Graceful fallback when workers or specific APIs aren't available
- Type-safe, minimal-copy message protocol using Transferables when possible
- Strict library/demo separation: UI and storage remain in demos
- Maintain Python lerobot UX parity and behavior

## Non-Goals (for this story)

- Changing dataset formats or the camera acquisition approach
- Moving Web Serial API usage into a worker (browser support for Web Serial in workers is limited)
- Introducing new external dependencies

## Acceptance Criteria

- Smooth UI under load:
  - With at least one active camera preview and keyboard teleoperation at 60–120 Hz, the preview does not flicker and UI remains responsive at ~60 FPS
- API compatibility:
  - `calibrate()`, `teleoperate()`, `record()` signatures and return shapes are unchanged
  - Feature-detect workers; automatically use worker-backed runtime when available, otherwise use current main-thread runtime
- Clear separation of responsibilities:
  - Worker executes control loops, interpolation, dataset assembly, export packaging, and CPU-heavy transforms
  - Main thread owns DOM/UI and browser-only APIs that are unavailable in workers (e.g., Web Serial write calls)
- Type-safe protocol:
  - Strongly typed request/response messages with versioned `type` fields; Transferable payloads used for large data
- Reliability & fallback:
  - If the worker crashes or becomes unavailable, operations fail gracefully with descriptive errors and suggest retry
  - Fallback path (main-thread) is automatically used when worker creation fails
- Tests & docs:
  - Unit tests cover protocol routing and basic round-trips
  - Planning docs updated; README notes main-thread-safe architecture

## Architecture Overview

### Worker Boundaries

- Execute in Worker:
  - Control loop scheduling and target computation for teleoperation (keyboard/direct and future teleoperators)
  - Episode/frame buffering and interpolation (regularization) for recording
  - Dataset assembly (tables/metadata), packaging (ZIP writer), and background export streaming
  - Lightweight telemetry aggregation for UI
- Execute on Main Thread:
  - DOM, UI, and camera previews (`<video>` elements)
  - Web Serial API read/write bridge (if browser does not permit worker access)
  - MediaRecorder handling (browser-optimized implementation already off main CPU in many engines)

### Threading Model

- Main thread spawns one worker per "process" instance as needed:
  - TeleoperationProcess → TeleopWorker
  - RecordProcess → RecordWorker (can be shared or composed with teleop worker depending on lifecycle)
- The public process objects returned from `teleoperate()`/`record()` are proxies. Method calls post messages to the worker and return promises where appropriate.
- SerialBridge (main-thread): worker requests motor write/read; main thread performs Web Serial operations and returns results. This preserves worker advantages while respecting browser API constraints.

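To make the proxy idea concrete, here is a minimal TypeScript sketch of a process object whose methods do nothing except post typed messages. `createTeleopProxy`, `TeleopProxy`, and the `Post` transport type are hypothetical names used only for illustration; the message `type` strings follow the protocol examples in this story, and in a real build the transport would presumably be `worker.postMessage`.

```typescript
// Hypothetical command shapes; the `type` strings mirror this story's examples.
type TeleopCommand =
  | { type: "teleop/start" }
  | { type: "teleop/stop" }
  | { type: "teleop/update_key_state"; key: string; pressed: boolean };

// Assumption: any postMessage-like function can act as the transport.
type Post = (message: TeleopCommand) => void;

interface TeleopProxy {
  start(): void;
  stop(): void;
  updateKeyState(key: string, pressed: boolean): void;
}

// The proxy holds no control logic; each method only forwards a message.
function createTeleopProxy(post: Post): TeleopProxy {
  return {
    start: () => post({ type: "teleop/start" }),
    stop: () => post({ type: "teleop/stop" }),
    updateKeyState: (key, pressed) =>
      post({ type: "teleop/update_key_state", key, pressed }),
  };
}

// Usage with a recording transport standing in for a real Worker.
const sentCommands: TeleopCommand[] = [];
const teleopProxy = createTeleopProxy((message) => sentCommands.push(message));
teleopProxy.start();
teleopProxy.updateKeyState("ArrowUp", true);
teleopProxy.stop();
```

Injecting the transport keeps the proxy testable without spawning a worker, which also suits the unit tests for protocol routing listed in the acceptance criteria.
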
### Message Protocol (Typed)

All messages include a discriminant `type` and a `requestId` when a response is expected.

- Teleoperation (examples):
  - `teleop/start`, `teleop/stop`
  - `teleop/update_key_state` { key, pressed }
  - `teleop/move_motor` { motorName, position }
  - `teleop/state_update` { motorConfigs, keyStates, lastUpdate } (worker → main)
  - `serial/write_position` { id, position } (worker → main) → `serial/ack`
- Recording (examples):
  - `record/start`, `record/stop`, `record/next_episode`
  - `record/frame_append` { payload transferable }
  - `record/export_zip` { options } → streaming progress events
- Error & lifecycle:
  - `worker/error`, `worker/ready`, `worker/teardown`

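One way the message list above could be typed is as a discriminated union with a runtime guard. The sketch below covers only a subset of the messages. The `version` field, `PROTOCOL_VERSION`, `isWorkerMessage`, and `describeMessage` are assumptions made for illustration and are not defined by this story.

```typescript
// Assumption: a numeric protocol version carried on every message.
const PROTOCOL_VERSION = 1;

interface Envelope {
  version: number;
  requestId?: number; // present only when a response is expected
}

// A subset of the messages listed above, as a discriminated union on `type`.
type WorkerMessage =
  | (Envelope & { type: "teleop/move_motor"; motorName: string; position: number })
  | (Envelope & { type: "serial/write_position"; id: number; position: number })
  | (Envelope & { type: "serial/ack"; requestId: number })
  | (Envelope & { type: "worker/ready" })
  | (Envelope & { type: "worker/error"; message: string });

const KNOWN_TYPES = new Set<string>([
  "teleop/move_motor",
  "serial/write_position",
  "serial/ack",
  "worker/ready",
  "worker/error",
]);

// Shallow guard: checks the discriminant and version, not every payload field.
function isWorkerMessage(value: unknown): value is WorkerMessage {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { type?: unknown; version?: unknown };
  return (
    typeof candidate.type === "string" &&
    KNOWN_TYPES.has(candidate.type) &&
    candidate.version === PROTOCOL_VERSION
  );
}

// The compiler narrows `message` in each case, so payload access is type-safe.
function describeMessage(message: WorkerMessage): string {
  switch (message.type) {
    case "teleop/move_motor":
      return `move ${message.motorName} to ${message.position}`;
    case "serial/write_position":
      return `write ${message.position} to motor ${message.id}`;
    case "serial/ack":
      return `ack for request ${message.requestId}`;
    case "worker/ready":
      return "worker ready";
    case "worker/error":
      return `worker error: ${message.message}`;
  }
}
```

A guard like this would sit at the `onmessage` boundary on both sides, so that unknown or mismatched-version messages are rejected before they reach typed handlers.
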
Use Transferables (ArrayBuffer/MessagePort) for large payloads to avoid copies.

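As a rough illustration of the zero-copy idea, the sketch below packs motor positions into an `ArrayBuffer` and moves it with a transfer list. `packFrame` and the `FrameAppendMessage` shape are hypothetical. `structuredClone` with a `transfer` option is used here only because it runs without a worker; `postMessage(message, transfer)` accepts the same kind of transfer list.

```typescript
// Hypothetical payload shape for `record/frame_append`.
interface FrameAppendMessage {
  type: "record/frame_append";
  timestamp: number;
  payload: ArrayBuffer; // Float32 motor positions (an assumed encoding)
}

// Builds the message plus the list of buffers that should be moved, not copied.
function packFrame(
  timestamp: number,
  positions: number[],
): { message: FrameAppendMessage; transfer: ArrayBuffer[] } {
  const buffer = new ArrayBuffer(positions.length * Float32Array.BYTES_PER_ELEMENT);
  new Float32Array(buffer).set(positions);
  return {
    message: { type: "record/frame_append", timestamp, payload: buffer },
    transfer: [buffer],
  };
}

const packed = packFrame(1234, [0.5, 1.5, -2]);

// Stand-in for worker.postMessage(packed.message, packed.transfer):
// ownership of the buffer moves to the receiving side.
const receivedFrame = structuredClone(packed.message, { transfer: packed.transfer });
const receivedPositions = Array.from(new Float32Array(receivedFrame.payload));

// After the transfer the sender's buffer is detached and reports zero length.
const senderByteLength = packed.message.payload.byteLength;
```

The sender must not touch a buffer after transferring it, so any data still needed on the sending side has to be copied beforehand.
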
### File Structure (web package)

```
packages/web/src/
├── workers/
│   ├── teleop.worker.ts        # Teleoperation control loop
│   ├── record.worker.ts        # Recording assembly/export
│   ├── protocol.ts             # Message types & guards
│   └── utils.worker.ts         # Worker-side helpers (interpolation, zip)
├── bridges/
│   └── serial-bridge.ts        # Main-thread serial proxy for workers
├── teleoperate.ts              # Spawns worker, returns proxy process
├── record.ts                   # Spawns worker, returns proxy process
└── types/
    └── worker.ts               # Public worker-related types (narrow)
```

### Lifecycle & Fallback

- On `teleoperate()`/`record()` call:
  - Try to instantiate the corresponding worker via `new Worker(new URL(...), { type: 'module' })`
  - On success: wire protocol channels and return a proxy-backed process
  - On failure: fall back to the current main-thread implementation (no behavioral changes)
- On `process.stop()` or page unload: send `worker/teardown` and terminate the worker

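The selection logic above could be sketched as a small function that takes the two construction paths as parameters. `selectRuntime` and the `Runtime` type are hypothetical names; the real factories would wrap `new Worker(...)` and the existing main-thread implementation respectively.

```typescript
// Hypothetical result type: records which path was taken alongside the process.
type Runtime<T> =
  | { kind: "worker"; process: T }
  | { kind: "main-thread"; process: T };

function selectRuntime<T>(
  workersSupported: boolean,
  spawnWorkerBacked: () => T,
  createMainThread: () => T,
): Runtime<T> {
  if (workersSupported) {
    try {
      return { kind: "worker", process: spawnWorkerBacked() };
    } catch {
      // Worker construction failed (for example, blocked by policy);
      // fall through to the main-thread path.
    }
  }
  return { kind: "main-thread", process: createMainThread() };
}

// Feature detection that type-checks with or without DOM typings.
const workerGlobalPresent = "Worker" in globalThis;

// Simulated outcomes, independent of the environment running this sketch.
const workerPath = selectRuntime(true, () => "worker-backed", () => "main-thread");
const failedSpawn = selectRuntime<string>(
  true,
  () => {
    throw new Error("worker construction failed");
  },
  () => "main-thread",
);
const unsupported = selectRuntime(false, () => "worker-backed", () => "main-thread");
```

Note that a synchronous `try`/`catch` only covers errors thrown by the constructor itself. Script load failures surface later through the worker's `error` event, so a real implementation would probably also wait for `worker/ready` before committing to the worker path.
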
### Performance Notes

- Control loop cadence generated inside worker to avoid main-thread timers
- Batch serial commands from worker to main-thread bridge to minimize postMessage overhead
- Use coarse-to-fine update: high-rate calculations in worker; lower-rate UI state updates to main thread (e.g., 10–20 Hz) for rendering
- For export, stream chunks from worker; main thread triggers download or HF upload

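The batching note could be realized along the following lines. `SerialCommandBatcher` is a hypothetical helper, and the coalescing rule (keep only the newest target per motor between flushes) is an assumption made for this sketch rather than a decision recorded in this story.

```typescript
interface PositionCommand {
  id: number;
  position: number;
}

// Collects position targets inside the worker and emits them as one message.
class SerialCommandBatcher {
  private latest = new Map<number, number>();
  private readonly send: (batch: PositionCommand[]) => void;

  constructor(send: (batch: PositionCommand[]) => void) {
    this.send = send;
  }

  // Assumption: a newer target for the same motor replaces the older one.
  queue(id: number, position: number): void {
    this.latest.set(id, position);
  }

  // Sends everything queued since the last flush; returns the batch size.
  flush(): number {
    const batch: PositionCommand[] = [];
    this.latest.forEach((position, id) => batch.push({ id, position }));
    this.latest.clear();
    if (batch.length > 0) this.send(batch);
    return batch.length;
  }
}

// Usage: three queued writes for two motors become a single posted batch.
const postedBatches: PositionCommand[][] = [];
const batcher = new SerialCommandBatcher((batch) => postedBatches.push(batch));
batcher.queue(1, 100);
batcher.queue(2, 200);
batcher.queue(1, 150);
const flushedCount = batcher.flush();
const emptyFlushCount = batcher.flush();
```

Calling `flush()` once per control tick keeps the message count at one per tick regardless of how many motors changed.
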
### Error Handling

- All request/response messages enforce timeouts with descriptive errors
- Worker initialization guarded with feature detection and clear fallback
- Protocol version field enables future evolution without breaking older callers

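A possible shape for the timeout bookkeeping is sketched below. `PendingRequests`, `track`, `resolve`, and `sweep` are hypothetical names. The clock is passed in explicitly so the logic can be unit-tested without real timers; a production version might instead arm one `setTimeout` per request and wrap the callback in a promise.

```typescript
type Settled<T> = { ok: true; value: T } | { ok: false; error: Error };

interface PendingEntry<T> {
  type: string;
  deadlineMs: number;
  settle: (result: Settled<T>) => void;
}

// Correlates responses with requests by requestId and expires overdue ones.
class PendingRequests<T> {
  private nextId = 1;
  private pending = new Map<number, PendingEntry<T>>();

  track(type: string, nowMs: number, timeoutMs: number, settle: (result: Settled<T>) => void): number {
    const requestId = this.nextId++;
    this.pending.set(requestId, { type, deadlineMs: nowMs + timeoutMs, settle });
    return requestId;
  }

  // Returns false for unknown or already-expired requests (late responses).
  resolve(requestId: number, value: T): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    this.pending.delete(requestId);
    entry.settle({ ok: true, value });
    return true;
  }

  // Fails every request whose deadline has passed; returns how many expired.
  sweep(nowMs: number): number {
    let expired = 0;
    this.pending.forEach((entry, requestId) => {
      if (nowMs < entry.deadlineMs) return;
      this.pending.delete(requestId);
      const detail = `Request ${requestId} (${entry.type}) timed out`;
      entry.settle({ ok: false, error: new Error(`${detail}; the worker may be busy or stopped. Please retry.`) });
      expired++;
    });
    return expired;
  }
}

// Usage: one request is acknowledged in time, the other expires.
const outcomes: string[] = [];
const record = (result: Settled<string>) =>
  outcomes.push(result.ok ? `ok:${result.value}` : result.error.message);
const requests = new PendingRequests<string>();
const ackId = requests.track("serial/write_position", 0, 100, record);
const ackAccepted = requests.resolve(ackId, "ack");
const slowId = requests.track("record/export_zip", 0, 100, record);
const expiredEarly = requests.sweep(50);
const expiredLate = requests.sweep(100);
const lateAccepted = requests.resolve(slowId, "late");
```
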
## Phased Implementation Plan

### Phase 1: Dataset & Export Offload (Low Risk)

- Move episode interpolation, dataset assembly, and ZIP packaging to `record.worker.ts`
- Main thread keeps MediaRecorder and camera preview as-is
- Public API unchanged; verify ZIP download and HF upload via streamed messages

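To give a sense of the kind of work Phase 1 moves into the worker, here is a sketch of episode regularization by linear interpolation onto a fixed-rate grid. This story does not specify the interpolation method, so linear interpolation, the `Sample` shape, millisecond timestamps, and the `resampleLinear` name are all assumptions.

```typescript
// Assumed frame shape: a timestamp in milliseconds plus one value per motor.
interface Sample {
  t: number;
  values: number[];
}

// Resamples irregularly spaced samples (sorted by t) onto a regular grid.
function resampleLinear(samples: Sample[], fps: number): Sample[] {
  if (samples.length === 0) return [];
  const stepMs = 1000 / fps;
  const start = samples[0].t;
  const end = samples[samples.length - 1].t;
  const out: Sample[] = [];
  let j = 0; // index of the sample at or before the current grid time
  for (let k = 0; start + k * stepMs <= end; k++) {
    const t = start + k * stepMs;
    // Advance to the last segment whose start is not after t.
    while (j < samples.length - 2 && samples[j + 1].t < t) j++;
    const a = samples[j];
    const b = samples[Math.min(j + 1, samples.length - 1)];
    const span = b.t - a.t;
    const w = span > 0 ? (t - a.t) / span : 0;
    out.push({ t, values: a.values.map((v, i) => v + (b.values[i] - v) * w) });
  }
  return out;
}

// Usage: four irregular samples regularized to 10 frames per second.
const regular = resampleLinear(
  [
    { t: 0, values: [0] },
    { t: 50, values: [10] },
    { t: 150, values: [30] },
    { t: 200, values: [40] },
  ],
  10,
);
```

Because the function is pure and works on plain arrays, it can run unchanged in a worker or on the main-thread fallback path.
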
### Phase 2: Teleoperation Offload with SerialBridge

- Move control loop scheduling and target computation to `teleop.worker.ts`
- Implement SerialBridge on main thread for Web Serial commands
- Worker posts motor write requests; main thread executes and responds
- Throttle state updates to UI while maintaining high-rate control internally

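The throttling in the last bullet could be as simple as a rate gate consulted on every control tick. `createRateGate` is a hypothetical helper, and the 100 Hz loop and 20 Hz cap below are example values only; timestamps are passed in so the simulation is deterministic.

```typescript
// Returns a function that answers "may a UI update be sent at this time?".
function createRateGate(maxHz: number): (nowMs: number) => boolean {
  const minIntervalMs = 1000 / maxHz;
  let lastEmitMs = -Infinity;
  return (nowMs) => {
    if (nowMs - lastEmitMs < minIntervalMs) return false;
    lastEmitMs = nowMs;
    return true;
  };
}

// Simulate one second of a 100 Hz control loop with UI updates capped at 20 Hz.
const allowUiUpdate = createRateGate(20);
let controlTicks = 0;
let uiUpdates = 0;
for (let t = 0; t < 1000; t += 10) {
  controlTicks++; // the control computation would run here on every tick
  if (allowUiUpdate(t)) uiUpdates++; // a teleop/state_update would be posted here
}
```

In this simulation the control loop runs 100 times while only 20 state updates would cross the thread boundary.
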
### Phase 3: Fine-Grained Optimizations

- Introduce Transferables for large buffers
- Optional OffscreenCanvas pipelines for future video transforms (not required for current scope)
- Tune batching and message cadence under hardware testing

### Phase 4: Reliability & Observability

- Heartbeat messages and auto-restart policy for worker failures
- Dev diagnostics toggles; production minimal logging

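One possible heartbeat and restart policy is sketched below. `HeartbeatMonitor`, its timeout, and the bounded restart count are assumptions for illustration; this story does not yet fix the policy. Time is injected so the behavior can be tested without timers.

```typescript
type Health = "healthy" | "restart" | "give-up";

// Tracks the last heartbeat and decides what the main thread should do next.
class HeartbeatMonitor {
  private lastBeatMs: number;
  private restarts = 0;
  private readonly timeoutMs: number;
  private readonly maxRestarts: number;

  constructor(startMs: number, timeoutMs: number, maxRestarts: number) {
    this.lastBeatMs = startMs;
    this.timeoutMs = timeoutMs;
    this.maxRestarts = maxRestarts;
  }

  // Called whenever a heartbeat message arrives from the worker.
  beat(nowMs: number): void {
    this.lastBeatMs = nowMs;
  }

  // Called periodically by the main thread.
  check(nowMs: number): Health {
    if (nowMs - this.lastBeatMs <= this.timeoutMs) return "healthy";
    if (this.restarts >= this.maxRestarts) return "give-up";
    this.restarts++;
    this.lastBeatMs = nowMs; // give the restarted worker a fresh window
    return "restart";
  }
}

// Usage: one missed window triggers a restart, a second one gives up.
const monitor = new HeartbeatMonitor(0, 1000, 1);
monitor.beat(500);
const healthLog: Health[] = [
  monitor.check(1200), // 700 ms since the last beat
  monitor.check(1600), // 1100 ms since the last beat
  monitor.check(2000), // 400 ms since the restart
  monitor.check(2700), // 1100 ms since the restart, no restarts left
];
```

On `"give-up"`, the caller would surface a descriptive error, in line with the graceful-failure requirement in the acceptance criteria.
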
## Risks & Mitigations

- Web Serial availability in workers: use main-thread SerialBridge (design accounts for this)
- Message overhead at high Hz: batch commands and reduce UI state update frequency
- Browser differences: feature-detect and test on Chromium, Firefox (where supported), Safari Technology Preview

## Definition of Done

- UI remains smooth with active camera preview and keyboard teleoperation; no flicker observed in manual tests
- Worker-backed runtime enabled by default when available; fallback path verified
- `calibrate()`, `teleoperate()`, `record()` maintain identical signatures and behavior
- Typed protocol implemented with Transferables where applicable
- Unit tests for protocol routing and error timeouts
- Documentation updated (this user story + README note)