Marionette Architecture

This document explains the design of the Marionette app for developers who want to understand, modify, or debug it.

What Marionette Does

Marionette records and plays back head movements, with optional audio, for the Reachy Mini robot. A user moves the robot's head by hand while the app samples the pose at 100 Hz; the app can then replay the recorded trajectory as an animated movement. Audio can be recorded from the mic, uploaded as a file, downloaded from YouTube, or picked from the robot's filesystem.

System Overview

┌─────────────────┐         HTTP (polling)        ┌──────────────────┐
│   Browser (UI)  │ ◄──────────────────────────── │  FastAPI Server  │
│   main.js       │ ────────────────────────────► │  (Uvicorn)       │
│   index.html    │     POST /api/record          │                  │
│   style.css     │     POST /api/play            │  marionette/     │
└─────────────────┘     GET  /api/state           │  ├── app.py      │
                                                  │  ├── routes.py   │
                                                  │  ├── recording.py│
                                                  │  ├── datasets.py │
                                                  │  ├── audio.py    │
                                                  │  ├── state.py    │
                                                  │  └── models.py   │
                                                  └────────┬─────────┘
                                                           │
                                                  ┌────────▼─────────┐
                                                  │   Reachy Mini    │
                                                  │   (Robot SDK)    │
                                                  └──────────────────┘

Two-Machine vs Single-Machine

  • Reachy Mini: Backend runs on the robot (Linux ARM), browser runs on a laptop. Connected via WiFi.
  • Reachy Mini Light: Backend and browser run on the same laptop. The robot connects via USB.

Module Map

| Module | Purpose | Key classes/functions |
| --- | --- | --- |
| models.py | Data types, constants, Pydantic models | RecordingMetadata, RecordingRequest, DatasetEntry, all *Payload classes |
| audio.py | Stateless audio functions | play_wav_chunked(), preload_wav(), play_preloaded_wav() |
| state.py | Thread-safe state read/write | StateMixin._serialize_state(), _set_state(), _set_idle_state() |
| recording.py | Motion capture + playback | RecordingMixin._capture_motion(), _perform_recording(), _perform_playback() |
| datasets.py | Dataset filesystem + HF sync | DatasetMixin._load_dataset_registry(), _sync_dataset(), _check_hf_login() |
| routes.py | HTTP endpoint definitions | register_routes() (all FastAPI route closures) |
| app.py | Main class, run loop, robot helpers | Marionette, create_app() |
| main.py | Re-export hub | Imports and re-exports everything for backward compatibility |
| motion_models.py | Lead compensation model | MotionModelRegistry, shifts commands forward to counter mechanical lag |

Threading Model

The app has three types of threads:

  1. Uvicorn thread: Runs the FastAPI HTTP server. Handles all API requests. This is the thread that calls route handlers in routes.py.
  2. Main robot thread: Runs Marionette.run(). Polls for pending jobs (recording or playback) in a 50ms loop. Executes recording/playback synchronously.
  3. Audio threads: Spawned as daemon threads during playback. Push audio chunks to the robot's GStreamer pipeline.

Thread Safety

All shared state is protected by _state_lock (a threading.Lock):

  • The HTTP thread writes: _pending_recording, _pending_playback, _mode
  • The robot thread reads and clears: _pending_recording, _pending_playback
  • Both threads read: _mode, _recordings, _datasets

The lock is held only briefly, and never during I/O or network calls.
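The handoff between the HTTP thread and the robot thread can be sketched as follows. This is a minimal illustration, not the app's code: the class JobQueueSketch and its two methods are hypothetical, and rejecting a request while the app is busy is an assumption. Only the field names _state_lock, _pending_recording and _mode come from the description above.

```python
import threading
from typing import Optional


class JobQueueSketch:
    """Hypothetical stand-in for the pending-job fields on Marionette."""

    def __init__(self) -> None:
        self._state_lock = threading.Lock()
        self._pending_recording: Optional[dict] = None
        self._mode = "idle"

    def request_recording(self, request: dict) -> bool:
        """HTTP-thread side: queue a job, or return False if the app is busy (assumed policy)."""
        with self._state_lock:
            if self._mode != "idle":
                return False
            self._pending_recording = request
            self._mode = "queued"
            return True

    def take_pending_recording(self) -> Optional[dict]:
        """Robot-thread side: read and clear the pending job in one locked step."""
        with self._state_lock:
            job, self._pending_recording = self._pending_recording, None
            return job


jobs = JobQueueSketch()
accepted = jobs.request_recording({"duration": 5.0})
rejected = jobs.request_recording({"duration": 1.0})  # mode is already "queued"
job = jobs.take_pending_recording()  # the slow recording work would run after this returns
```

The lock only guards the swap of a few fields; the recording itself would run after take_pending_recording() has released it, which is what keeps GET /api/state responsive.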

Cancel Events

  • _recording_cancel_event: Set by the HTTP thread (POST /api/record/stop), checked by the robot thread's capture loop.
  • _playback_cancel_event: Set by the HTTP thread (POST /api/play/stop), checked by the robot thread's playback loop.
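A cancel event of this kind is typically a threading.Event that the long-running loop checks once per iteration. The sketch below assumes that shape; capture_loop and its parameters are hypothetical names, and a timer stands in for the HTTP thread handling POST /api/record/stop.

```python
import threading
import time


def capture_loop(cancel_event: threading.Event, duration_s: float, rate_hz: float = 100.0) -> int:
    """Sample until the duration elapses or the cancel event is set; return the sample count."""
    period = 1.0 / rate_hz
    samples = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        if cancel_event.is_set():
            break
        samples += 1  # a real loop would read the head pose here
        time.sleep(period)
    return samples


cancel = threading.Event()
# Simulate a stop request arriving 50 ms into a 2 s recording.
threading.Timer(0.05, cancel.set).start()
t0 = time.monotonic()
captured = capture_loop(cancel, duration_s=2.0)
elapsed = time.monotonic() - t0
```

Because the event is checked every period, the loop reacts to a stop request within roughly one sample interval instead of running for the full duration.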

State Machine

          POST /api/record
  idle ──────────────────── queued
   ▲                          │
   │                     (robot thread picks up)
   │                          ▼
   │                      countdown (3s)
   │                          │
   │                          ▼
   └──────────────────── recording
                              │
                         (duration elapsed or stop)
                              ▼
                            idle

          POST /api/play
  idle ──────────────────── queued
   ▲                          │
   │                     (robot thread picks up)
   │                          ▼
   └──────────────────── playing
                              │
                         (move ends or stop)
                              ▼
                            idle

Additional states: starting_up (during boot animation), error (transient, returns to idle).
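One way to read the diagrams is as a transition table. The sketch below encodes only the edges drawn above, plus starting_up and error returning to idle; the table and the helper can_transition are hypothetical, and since the text does not say which states can enter error, those edges are left out.

```python
# Hypothetical transition table derived from the two diagrams above.
ALLOWED_TRANSITIONS = {
    "starting_up": {"idle"},
    "idle": {"queued"},
    "queued": {"countdown", "playing"},
    "countdown": {"recording"},
    "recording": {"idle"},
    "playing": {"idle"},
    "error": {"idle"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagrams show an edge from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


record_path = ["idle", "queued", "countdown", "recording", "idle"]
record_path_ok = all(can_transition(a, b) for a, b in zip(record_path, record_path[1:]))
```

A table like this is mainly useful in tests or assertions, for example to catch a code path that jumps from idle straight to recording.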

Frontend Architecture

The frontend is vanilla JavaScript (no framework). Key patterns:

Polling Loop

The browser polls GET /api/state at regular intervals:

  • 1500ms when idle (nothing happening)
  • 200ms when active (recording, playing, countdown)

Each poll returns the full app state. The frontend updates the UI accordingly.

Dirty Flag Pattern (Flickering Fix)

The moves list and dataset dropdown are expensive to rebuild (full DOM replacement). Without optimization, they flicker on every poll. The fix:

  • movesListDirty and datasetsDirty flags start as true
  • renderMoves() / updateDatasetUI() only run when the flag is true
  • Flags are set true after user actions (record, delete, switch dataset, etc.)
  • During idle polling, only lightweight updates happen (mode badge, play/stop buttons)

NTP-Style Clock Sync

The backend sends server_time with each state response. The frontend computes clockOffset = serverTime - localTime and uses it to accurately display countdown timers and recording progress bars, even when the browser and robot are on different machines with unsynchronized clocks.
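The offset arithmetic is shown here in Python for consistency with the backend; the real frontend is JavaScript. The function names and the deadline_server parameter are assumptions made for illustration. Note that this simple form does not correct for request latency the way full NTP does.

```python
def clock_offset(server_time: float, local_time: float) -> float:
    """Offset to add to a local timestamp to express it in server time."""
    return server_time - local_time


def seconds_remaining(deadline_server: float, local_now: float, offset: float) -> float:
    """Time left until a server-side deadline, computed from the local clock."""
    server_now = local_now + offset
    return max(0.0, deadline_server - server_now)


# The server clock reads 1000.0 while the browser clock reads 400.0.
offset = clock_offset(server_time=1000.0, local_time=400.0)
# One local second later, a deadline at server time 1003.0 is 2 s away.
remaining = seconds_remaining(deadline_server=1003.0, local_now=401.0, offset=offset)
```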

Phase Overlay

During countdown and recording, a full-screen overlay shows progress. This uses requestAnimationFrame for smooth animation at the display's refresh rate (typically 60 fps), independent of the polling interval.

Dataset Layout

~/reachy_mini_datasets/           # dataset root (configurable)
├── local_dataset/                # default dataset
│   └── data/                     # all recordings live here
│       ├── happy-dance.json      # motion trajectory
│       └── happy-dance.wav       # optional audio
├── my-custom-set/
│   └── data/
│       └── ...
└── user-community-set/           # downloaded from HF
    └── data/
        └── ...

The dataset_registry.json file (next to the app) tracks which datasets exist, which is active, and per-dataset metadata (uploaded move IDs, origin, etc.).
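Given this layout, a recording is a .json file in data/ with an optional .wav of the same name next to it. The helper below is a hypothetical sketch of how such pairs could be enumerated; list_recordings is not a function from the app, and the example builds a throwaway directory tree instead of touching ~/reachy_mini_datasets.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def list_recordings(dataset_dir: Path) -> Dict[str, Optional[Path]]:
    """Map each move name to its audio file, or None when the move has no audio."""
    data_dir = dataset_dir / "data"
    result: Dict[str, Optional[Path]] = {}
    for json_path in sorted(data_dir.glob("*.json")):
        wav_path = json_path.with_suffix(".wav")
        result[json_path.stem] = wav_path if wav_path.exists() else None
    return result


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "local_dataset" / "data"
    data.mkdir(parents=True)
    (data / "happy-dance.json").write_text("{}")
    (data / "happy-dance.wav").write_bytes(b"")
    (data / "nod.json").write_text("{}")
    found = list_recordings(Path(tmp) / "local_dataset")
    has_audio = {name: wav is not None for name, wav in found.items()}
```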

Audio Playback Pipeline

Audio is played through the Reachy Mini's GStreamer pipeline using chunk-based pushing:

  1. Preload: Read WAV file, resample to output rate if needed
  2. Prime pipeline: Call start_playing() to initialize GStreamer
  3. Wait for sync: Audio thread waits for start_signal (set when first motion command is sent)
  4. Push chunks: Feed 20ms audio chunks at ~1.25x real-time
  5. Drain buffer: Wait for remaining audio to play out
  6. Cleanup: Call stop_playing() with timeout (GStreamer can hang)

This push-based approach allows stopping audio at any time, unlike play_sound() which creates an uninterruptible pipeline.
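The chunk-pushing step (step 4) can be sketched without any robot hardware. Everything named here is an assumption for illustration: push is a hypothetical callback standing in for whatever feeds the GStreamer pipeline, and push_chunks is not the app's play_wav_chunked(). The sketch only shows the 20 ms chunking, the 1.25x pacing, and the per-chunk stop check.

```python
import threading
import time
from typing import Callable, List


def push_chunks(
    samples: List[float],
    sample_rate: int,
    push: Callable[[List[float]], None],
    stop_event: threading.Event,
    chunk_ms: int = 20,
    speed: float = 1.25,
) -> int:
    """Push fixed-size chunks slightly faster than real time; return how many were pushed."""
    chunk_len = sample_rate * chunk_ms // 1000  # samples per 20 ms chunk
    interval = (chunk_ms / 1000.0) / speed      # 16 ms between pushes at 1.25x
    pushed = 0
    for start in range(0, len(samples), chunk_len):
        if stop_event.is_set():                 # stop requested: abandon the remaining audio
            break
        push(samples[start:start + chunk_len])
        pushed += 1
        time.sleep(interval)
    return pushed


received: List[List[float]] = []
stop = threading.Event()
# 0.1 s of silence at 16 kHz splits into five 320-sample chunks.
count = push_chunks([0.0] * 1600, sample_rate=16000, push=received.append, stop_event=stop)
```

Pushing somewhat faster than real time keeps a small buffer ahead of the playhead, while the per-chunk check bounds how much audio is still queued when a stop arrives.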

Lead Compensation

Mechanical lag means the robot's actual motion trails the commanded trajectory. The lead compensation model (in motion_models.py) shifts commands forward in time so the actual motion matches the original recording. Parameters (head lead, antenna lead) are tunable in the settings UI.
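The simplest form of this idea is a constant time shift: at playback time t, command the pose that was recorded at t + lead. The sketch below assumes that form, using linear interpolation on a single scalar channel. It is not the MotionModelRegistry implementation, which may use a different model, and sample_with_lead is a hypothetical name.

```python
from bisect import bisect_right
from typing import Sequence


def sample_with_lead(times: Sequence[float], values: Sequence[float], t: float, lead_s: float) -> float:
    """Return the recorded value at time t + lead_s, clamped to the recorded range."""
    query = min(max(t + lead_s, times[0]), times[-1])
    i = bisect_right(times, query)
    if i >= len(times):  # at or past the last sample
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    alpha = (query - t0) / (t1 - t0)
    return values[i - 1] + alpha * (values[i] - values[i - 1])


times = [0.0, 0.5, 1.0]    # seconds, strictly increasing
values = [0.0, 10.0, 20.0]  # e.g. a head yaw angle in degrees
no_lead = sample_with_lead(times, values, t=0.25, lead_s=0.0)
with_lead = sample_with_lead(times, values, t=0.25, lead_s=0.25)
```

With a lead of 0.25 s, the command sent at t = 0.25 s is the pose recorded at 0.5 s, so a mechanism that lags by about that amount reaches the pose closer to the originally recorded time.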

Testing

Tests live in tests/test_api.py and use FastAPI's TestClient for synchronous HTTP testing without a real robot. The conftest.py creates a Marionette instance with a temporary dataset directory.

Key test patterns:

  • HF functions are monkeypatched on marionette.datasets (the canonical location)
  • Audio functions are monkeypatched on marionette.recording (where they're imported)
  • Recording/playback tests use fake ReachyMini objects with stub methods
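The reason audio functions are patched on marionette.recording and not on marionette.audio is that a from-import copies the function reference into the importing module. The sketch below reproduces that effect with two throwaway modules and the standard library's unittest.mock, so it runs without the app installed; the module names fake_audio and fake_recording and the helper perform_playback are hypothetical.

```python
import types
from unittest import mock

# "fake_audio" defines the function, as audio.py would.
audio = types.ModuleType("fake_audio")
audio.play_wav_chunked = lambda path: f"real:{path}"

# "fake_recording" holds its own reference, like `from .audio import play_wav_chunked`.
recording = types.ModuleType("fake_recording")
recording.play_wav_chunked = audio.play_wav_chunked
recording.perform_playback = lambda path: recording.play_wav_chunked(path)

# Patching the defining module does not affect the copy held by the importer.
with mock.patch.object(audio, "play_wav_chunked", lambda path: "stub"):
    patched_at_source = recording.perform_playback("a.wav")

# Patching the importing module replaces the reference that is actually called.
with mock.patch.object(recording, "play_wav_chunked", lambda path: "stub"):
    patched_at_use = recording.perform_playback("a.wav")
```

The same rule applies with pytest's monkeypatch fixture: set the attribute on the module whose namespace the code under test reads from.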