---
title: Hello World
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
short_description: One App to Rule Them All – 146 APIs, 81 emotions
tags:
  - reachy-mini
  - reachy_mini
  - reachy_mini_python_app
models:
  - onnx-community/yolo26n-ONNX
  - onnx-community/yolo26n-pose-ONNX
  - onnx-community/yolo26s-ONNX
  - onnx-community/yolo26m-ONNX
  - onnx-community/yolo26m-pose-ONNX
  - onnx-community/yolo26s-pose-ONNX
datasets:
  - pollen-robotics/reachy-mini-emotions-library
  - pollen-robotics/reachy-mini-dances-library
thumbnail: >-
  https://huggingface.co/spaces/panny247/hello_world/resolve/main/screenshots/thumbnail.png
---
# Hello World – One App to Rule Them All
Unbox your Reachy Mini. Install Hello World. Everything works.

The app that gives every Reachy Mini owner a running start – and a platform for developers to build upon.

**Hit the ground running.** AI conversation, real-time YOLO vision, 81 emotions, system monitoring, browser terminal, and full robot control. One install, one dashboard, everything you need.

**Build on top of it.** 146 REST endpoints. OpenAPI spec. 23 modular Python API files. Fork it, extend it, make it yours.

**Lightweight by design.** Pure Python + vanilla JS. No React, no bundler, no node_modules. Runs on a Raspberry Pi CM4 with 4GB RAM.
## Why This Exists

## See It in Action

### AI Conversation with 31 Tools

Ask Reachy anything – it reasons with 31 tools, moves its head, plays emotions, takes photos, and controls music. All through natural voice conversation.

### Real-time YOLO Vision

Object detection, pose estimation, and segmentation running live on the camera – on the Pi's ARM CPU (ONNX Runtime) or your browser's GPU (WebGPU).

### 81 Emotions with 3D Preview

Browse the full library, preview any animation on the 3D model, then play it on the physical robot.
## What Makes This Different

## Technology Stack
| Category | Technologies |
|---|---|
| NVIDIA Ecosystem | ONNX Runtime (ARM64 CPU inference on Pi CM4) • ONNX Runtime Web (WebGPU + WASM browser inference) • MuJoCo (physics-grade URDF robot model) |
| Pollen Robotics | reachy-mini SDK (ReachyMiniApp base class, motor control, audio pipeline) • Reachy Mini (9-DOF expressive robot head, Pi CM4, camera, mic/speaker) |
| HuggingFace | HuggingFace Hub (on-demand YOLO model downloads with disk space checking) • HuggingFace Spaces (community distribution) |
| AI / ML | YOLO v8/11 (detection, pose, segmentation, open vocabulary) • LiteLLM (unified multi-provider LLM/TTS/STT) • webrtcvad (voice activity detection) |
| Backend | FastAPI (146 REST endpoints + 6 WebSocket channels) • Python 3.10+ • OpenCV (image processing, video recording) |
| Frontend | Vanilla JavaScript (zero framework, 24 modules) • Three.js (3D URDF rendering) • xterm.js (terminal emulator) • WebRTC (camera streaming) • WebGPU (browser-side ML inference) |
## Quick Start

```bash
# Clone and install (on your Reachy Mini, in the apps venv)
git clone https://huggingface.co/spaces/panny247/hello_world
cd hello_world
pip install -e .

# Restart the daemon – it discovers the app automatically
reachy-restart

# Open your browser
# http://reachy-mini.local:8042
```
That's it. No build step, no npm install, no configuration files to edit. The app works out of the box – all URLs default to localhost. Add your AI provider API keys in the UI when you're ready for conversation and vision features.
Want to add your own feature? Every module in `api/` follows the same pattern. Create a file, add a `register_routes(app)` function, wire it in `api/__init__.py`:

```python
# api/my_feature.py – that's all you need
def register_routes(app) -> None:
    @app.settings_app.get("/api/my-feature/status")
    async def get_status():
        return {"status": "ok"}
```

Full OpenAPI spec at `/openapi.json`. See Project Structure for the complete layout.
## Architecture Overview

```mermaid
graph LR
    subgraph Browser
        UI[Single-Page App<br>Vanilla JS]
        WG[WebGPU<br>ONNX Runtime]
        XT[xterm.js<br>Terminal]
    end
    subgraph Reachy Mini - Pi CM4
        FW[FastAPI :8042]
        WS[WebSocket Hub]
        INF[ONNX Runtime<br>YOLO26n]
        SDK[reachy-mini SDK]
        PTY[PTY + tmux]
        CAM[Camera]
        MOT[9x Motors]
        MIC[Mic / Speaker]
    end
    subgraph LiteLLM - Cloud
        STT[STT]
        LLM[LLM]
        TTS[TTS]
    end
    UI -->|REST| FW
    UI <-->|WS| WS
    UI -->|WebRTC| CAM
    WG -->|fetch frame| FW
    XT <-->|WS PTY| PTY
    FW --> SDK
    SDK --> MOT
    SDK --> CAM
    SDK --> MIC
    INF --> CAM
    INF -->|detections| WS
    FW -->|voice pipeline| STT
    STT --> LLM
    LLM --> TTS
    TTS -->|audio| MIC
```
## Feature Tour

### Floating Panels

Two persistent panels hover above every tab:

- Camera Panel – Live WebRTC camera feed with snapshot capture, video recording, mic recording, and listen/speak buttons. Supports the robot camera and any browser camera (laptop webcam, iPhone Continuity Camera, USB cameras) via a unified Camera Registry.
- Joystick Panel – 5-axis head control: look direction, z/roll, x/y translation, body rotation, and individual antenna control. Return-to-center toggle for smooth operation.

Both panels are draggable, resizable, and minimizable, and they remember their position across sessions. Bluetooth audio routing (A2DP at 48kHz) and 43 searchable help topics round out the feature set.
## NVIDIA Ecosystem Integration

### ONNX Runtime: Dual-Backend Vision Pipeline

This project demonstrates ONNX Runtime running on two fundamentally different backends simultaneously, connected through a shared WebSocket detection channel:
```mermaid
graph TB
    CAM[Camera Frame]
    subgraph CM4 Mode
        CAM --> PRE1[Preprocess<br>Letterbox 640px]
        PRE1 --> ONNX[ONNX Runtime<br>CPU ARM64]
        ONNX --> POST1[Postprocess<br>NMS + Filter]
        POST1 --> WS1[WebSocket<br>detections + frame]
    end
    subgraph WebGPU Mode
        CAM --> API["/api/vision/frame<br>JPEG endpoint"]
        API --> PRE2[Preprocess<br>Canvas letterbox]
        PRE2 --> WGPU[ONNX Runtime Web<br>WebGPU / WASM]
        WGPU --> POST2[Postprocess<br>JS NMS + Filter]
    end
    WS1 --> OVR[Canvas Overlay<br>Boxes / Keypoints / Labels]
    POST2 --> OVR
    style ONNX fill:#6366f1,color:#fff
    style WGPU fill:#22c55e,color:#fff
```
| | CM4 Backend | WebGPU Backend |
|---|---|---|
| Runtime | ONNX Runtime (ARM64 CPU) | ONNX Runtime Web (WebGPU/WASM) |
| Hardware | Raspberry Pi CM4 | Any modern browser GPU |
| Models | Nano (~5 MB) | Nano, Small, Medium |
| Speed | ~5-8 FPS | ~30-60 FPS |
| Model Source | HuggingFace Hub (on-demand download) | HuggingFace Hub (on-demand download) |
| Vision Tasks | Detection, Pose, Segmentation, Open Vocab | Detection, Pose, Segmentation, Open Vocab |
Models are downloaded on demand from HuggingFace Hub with automatic disk space checking (the Pi CM4 has limited eMMC storage). The dual-backend approach lets developers choose the right tradeoff: always-on lightweight inference on the edge device, or high-performance GPU-accelerated analysis when a browser is connected.
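The on-demand download flow reduces to a free-space guard before fetching. A minimal sketch of that idea (the function name, 1.5x safety margin, and structure are illustrative assumptions, not the app's actual code):

```python
import shutil

def has_room_for_model(model_bytes: int, cache_dir: str = ".", margin: float = 1.5) -> bool:
    """Refuse a download unless free space exceeds the model size plus a
    safety margin. Illustrative sketch; the real check may differ."""
    free = shutil.disk_usage(cache_dir).free
    return free >= int(model_bytes * margin)

# A ~5 MB nano model needs ~7.5 MB free with a 1.5x margin
print(has_room_for_model(5 * 1024 * 1024))
```

With a guard like this, a failed check can surface as an HTTP 507-style error in the UI instead of filling the eMMC mid-download.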
### MuJoCo: Physics-Grade Robot Model
The 3D simulation uses the robot's URDF model (the same format used by MuJoCo and other physics simulators) rendered with Three.js and post-processing effects. The simulation receives live pose data at 15Hz over WebSocket, creating a real-time digital twin. Pre-computed joint angle data lets users preview all 101 animations (81 emotions + 20 dances) in 3D before playing them on the physical robot.
### WebGPU: Browser-Side ML Inference

The WebGPU backend uses ONNX Runtime Web v1.20.0 to run YOLO models directly on the browser's GPU. This offloads compute-intensive inference from the resource-constrained Pi CM4, achieving 4-10x higher frame rates. The implementation includes letterbox preprocessing on canvas, JavaScript NMS post-processing, and a detection overlay renderer – all running client-side with zero server load.
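The NMS post-processing step mentioned above is the standard greedy algorithm. A Python sketch of the same logic for reference (illustrative; the app's actual implementation is in JavaScript and may differ in detail):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box that overlaps it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate boxes collapse to the higher-scoring one
print(nms([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], [0.9, 0.8, 0.7]))  # → [0, 2]
```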
## Voice Pipeline

```mermaid
graph LR
    RM[Robot Mic] --> VAD
    BM[Browser Mic] --> VAD
    VAD{VAD<br>webrtcvad} -->|speech| STT[STT<br>Whisper/Groq]
    STT -->|text + conf| FILT{Threshold<br>Filter}
    FILT -->|pass| LLM[LLM<br>+ 31 Tools]
    LLM -->|response| TTS[TTS]
    TTS --> RS[Robot Speaker]
    TTS --> BS[Browser Speaker]
    LLM -->|tool calls| TOOLS[Emotions / Dances<br>Camera / Music<br>Head Control]
    style VAD fill:#6366f1,color:#fff
    style LLM fill:#6366f1,color:#fff
```
The voice listener runs a headless pipeline:

- VAD: webrtcvad (aggressiveness 2), 30ms frames, 1s silence timeout, 300ms minimum speech
- Audio Input: Robot mic (SDK) or browser mic (via WebSocket)
- Audio Output: Robot speaker (SDK `push_audio_sample`) or browser (WebSocket base64 WAV)
- TTS Queue: Serialized via lock – one speaker at a time, no overlapping speech
- Antenna Wiggle: Physical feedback on speech detection (3-pattern rotation)
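The VAD parameters above imply a small segmentation state machine over per-frame speech decisions. A minimal sketch (the function name and structure are illustrative assumptions; webrtcvad itself supplies the per-frame flags):

```python
def segment_utterances(speech_flags, frame_ms=30, silence_timeout_ms=1000, min_speech_ms=300):
    """Turn per-frame VAD decisions into utterances: close a segment after
    1s of trailing silence, discard segments with under 300ms of speech.
    Illustrative sketch; the app's listener may differ in detail."""
    utterances = []
    speech_frames = silence_frames = 0
    active = False
    for is_speech in speech_flags:
        if is_speech:
            active = True
            speech_frames += 1
            silence_frames = 0
        elif active:
            silence_frames += 1
            if silence_frames * frame_ms >= silence_timeout_ms:
                if speech_frames * frame_ms >= min_speech_ms:
                    utterances.append(speech_frames * frame_ms)
                speech_frames = silence_frames = 0
                active = False
    return utterances  # durations (ms) of accepted utterances

# 15 speech frames (450ms) followed by 1.2s of silence → one 450ms utterance
print(segment_utterances([True] * 15 + [False] * 40))  # → [450]
```

The 300ms minimum is what filters out coughs and door slams before they ever reach STT.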
## AI Provider Support

All providers are accessed via LiteLLM – enter your API keys in the UI.
| Capability | Providers |
|---|---|
| STT | OpenAI Whisper, Groq |
| LLM | OpenAI, Anthropic, Groq, Gemini, DeepSeek |
| TTS | OpenAI, ElevenLabs, Groq Orpheus, Gemini |
| VLM | Vision-capable models auto-detected per provider |
| Web Search | Anthropic, Gemini (always); OpenAI, Groq (model-dependent) |
Provider capabilities and available models are discovered dynamically from live API calls (cached 10 minutes).
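The 10-minute cache can be sketched as a small TTL wrapper around the discovery calls (names and structure are illustrative, not the app's actual `discovery.py` code; the injectable clock just makes the sketch testable):

```python
import time

class TTLCache:
    """Cache discovery results for a fixed time-to-live (seconds)."""
    def __init__(self, ttl=600.0, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl, clock, {}

    def get(self, key, fetch):
        """Return a cached value, calling fetch() again once it is stale."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is None or now - hit[0] >= self.ttl:
            hit = (now, fetch())
            self._store[key] = hit
        return hit[1]

# Simulated clock: the second lookup inside the TTL reuses the cached value
t = [0.0]
cache = TTLCache(ttl=600, clock=lambda: t[0])
calls = []
probe = lambda: calls.append(1) or "models-v1"
print(cache.get("openai", probe), cache.get("openai", probe), len(calls))  # → models-v1 models-v1 1
```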
## What You Can Ask Reachy

The AI assistant has 31 tools it can call autonomously during conversation. It decides which tools to use based on your request – no manual selection needed.

View all 31 tools with example prompts:
| Tool | What it does | Example prompt |
|---|---|---|
| ignore | Skip background noise / non-directed speech | (called automatically on ~95% of ambient audio) |
| play_emotion | Play one of 81 emotion animations | "Show me you're happy" / "Are you scared?" |
| play_dance | Play one of 20 dance moves | "Dance for me" / "Do the chicken peck" |
| set_head_pose | Move head (yaw/pitch/roll) | "Look left" / "Nod your head" |
| take_snapshot | Capture camera image + VLM description | "Take a photo" / "What do you see?" |
| start_recording | Record video from the camera | "Record a 10 second video" |
| stop_recording | Stop video recording | "Stop recording" |
| start_sound_recording | Record audio from the mic | "Record what you hear" |
| stop_sound_recording | Stop audio recording | "Stop the audio recording" |
| play_music | Play a track on the robot speaker | "Play some music" |
| stop_music | Stop music playback | "Stop the music" |
| list_music | List available tracks | "What music do you have?" |
| get_system_status | Get CPU, RAM, uptime, etc. | "How are your systems?" |
| get_date_time | Get current date and time | "What time is it?" |
| see_objects | YOLO object detection through camera | "What objects do you see?" / "Is anyone there?" |
| set_timer | Set a countdown timer with name | "Set a timer for 5 minutes for pasta" |
| check_timers | Check status of active timers | "How much time is left on the pasta timer?" |
| cancel_timer | Cancel a running timer | "Cancel the pasta timer" |
| set_alarm | Set a recurring or one-shot alarm | "Set an alarm for 7 AM" |
| manage_alarm | List, cancel, snooze, or toggle alarms | "Snooze my morning alarm" |
| play_ambient | Play looping ambient sounds with sleep timer | "Play rain sounds for 30 minutes" |
| stop_ambient | Stop ambient sound playback | "Stop the rain sounds" |
| search_help | Search the 43 built-in help topics | "How do I use the joystick?" |
| create_scratchpad | Create rich HTML visualizations | "Show me a chart of CPU usage" |
| set_volume | Adjust master, speech, music, or effects volume | "Turn the volume up" / "Set music to 50%" |
| start_oscillation | Start a head movement pattern | "Sway your head gently" |
| stop_oscillation | Stop head oscillation | "Stop swaying" |
| set_motor_mode | Change motor mode (enabled/disabled/gravity) | "Turn off your motors" |
| set_vision_mode | Switch vision mode (off/cm4/webgpu) | "Start object detection" |
| control_listener | Start, stop, or mute the voice listener | "Stop listening for a bit" |
| bluetooth_manage | Scan, pair, connect Bluetooth devices | "Find Bluetooth speakers" |
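Tool calls flow through a name-to-handler dispatch of the kind `tool_executor.py` describes. A minimal sketch of the pattern (the handler bodies and return shapes here are illustrative assumptions, not the app's actual code):

```python
def execute_tool(name, args, handlers):
    """Look up a tool by name and invoke it with the LLM-supplied arguments,
    returning an error dict for unknown tools instead of raising."""
    handler = handlers.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**args)

# Illustrative handlers for two of the tools described above
handlers = {
    "set_timer": lambda minutes, label: {"status": "ok", "timer": label, "minutes": minutes},
    "get_date_time": lambda: {"status": "ok", "iso": "2025-01-01T00:00:00"},
}
print(execute_tool("set_timer", {"minutes": 5, "label": "pasta"}, handlers))
```

Returning an error dict rather than raising lets the LLM see the failure and recover in conversation.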
## WebSocket Architecture

Six WebSocket channels handle all real-time communication – no polling anywhere in the system:
```mermaid
graph LR
    subgraph Browser
        CH[Charts & 3D Sim]
        TR[Transcript UI]
        IC[Intercom]
        VIS[Vision Overlay]
        XT[xterm.js Terminal]
    end
    subgraph Server :8042
        L["/ws/live"]
        T["/ws/transcribe"]
        I["/ws/intercom"]
        B["/ws/browser-mic"]
        PT["/ws/terminal"]
    end
    L -->|robot state 15Hz| CH
    L -->|system stats 1Hz| CH
    L -->|vision detections| VIS
    T -->|transcriptions| TR
    T -->|LLM responses| TR
    T -->|TTS audio base64| TR
    IC <-->|PCM audio| I
    IC -->|mic only| B
    XT <-->|PTY I/O| PT
    style L fill:#6366f1,color:#fff
    style T fill:#6366f1,color:#fff
    style PT fill:#6366f1,color:#fff
```
| Endpoint | Direction | Rate | Purpose |
|---|---|---|---|
| `/ws/live` | server → client | Configurable (15Hz robot, 1Hz stats) | Robot state, system stats, vision detections, camera frames |
| `/ws/intercom` | bidirectional | Real-time | Browser mic PCM → robot speaker; robot mic → browser |
| `/ws/browser-mic` | client → server | Real-time | Browser mic only (feeds listener, no speaker feedback) |
| `/ws/transcribe` | server → client | Event-driven | Transcriptions, LLM responses, tool calls, TTS audio, errors |
| `/ws/camera` | client → server | Configurable (2 FPS default) | Browser camera JPEG frames for vision/snapshots/recording |
| `/ws/terminal` | bidirectional | Real-time | PTY terminal via shared tmux session (xterm.js client) |
### `/ws/live` Subscriptions

Clients send `{"subscribe": ["robot", "stats", "vision"]}` to control data flow:

- `robot`: head_pose (x/y/z/roll/pitch/yaw), joint angles, antennas, errors
- `stats`: CPU, RAM, disk, network, WiFi, load, fan, throttle, disk I/O, processes
- `vision`: detection results + base64 JPEG camera frames (when CM4 mode active)
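Server-side, the subscription message gates which channels reach each client. A sketch of that filtering step (illustrative; `websocket.py`'s actual implementation may differ):

```python
import json

def wants(client_subs, message):
    """True if the client subscribed to this message's channel."""
    return message.get("channel") in client_subs

# Parse a client's subscribe message, then filter two outgoing messages
subs = set(json.loads('{"subscribe": ["robot", "stats"]}')["subscribe"])
robot_msg = {"channel": "robot", "head_pose": {"yaw": 0.1}}
vision_msg = {"channel": "vision", "detections": []}
print(wants(subs, robot_msg), wants(subs, vision_msg))  # → True False
```

Filtering at the hub keeps unwanted 15Hz robot-state frames off the wire for clients that only chart system stats.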
## Project Structure

```
hello_world/
├── main.py              # Entry point (imports HelloWorld)
├── app.py               # ReachyMiniApp subclass, 50Hz main loop
├── config.py            # Centralized config (@dataclass, env var overrides)
├── settings.py          # JSON settings persistence (load/save/defaults)
├── stats.py             # 12 system telemetry functions
├── websocket.py         # 6 WebSocket endpoints
├── vision_inference.py  # CM4 ONNX Runtime inference engine (threaded)
├── api/
│   ├── __init__.py      # Auto-discovery route registration (23 modules)
│   ├── conversation/    # LLM chat, tool calling, provider discovery
│   │   ├── discovery.py     # Provider/model/voice discovery + caches
│   │   ├── prompts.py       # System prompt, 31 tool definitions
│   │   ├── tool_executor.py # Tool dispatch (31 branches)
│   │   ├── tts.py           # TTS playback (robot + browser routing)
│   │   └── chat.py          # Chat endpoint, history, Pydantic models
│   ├── listener.py      # Headless voice pipeline (VAD + STT + LLM + TTS)
│   ├── vision.py        # YOLO vision mode switching + model management
│   ├── oscillation.py   # Head oscillation patterns (6 patterns, 50Hz)
│   ├── moves.py         # 81 emotions + 20 dances gallery + playback
│   ├── timers.py        # Countdown timers with sound notifications
│   ├── alarms.py        # Recurring/one-shot alarms with scheduling
│   ├── ambient.py       # Ambient sound loops with sleep timers
│   ├── music.py         # Music library (upload, play, metadata)
│   ├── cameras.py       # Camera registry (robot + browser cameras)
│   ├── bluetooth.py     # Bluetooth device management + A2DP audio
│   ├── scratchpad.py    # Generative HTML visualizations
│   ├── help.py          # 43 auto-discovered help topics
│   └── ...              # 23 modules total
└── static/
    ├── index.html       # Single-page app (117KB)
    ├── css/styles.css   # Theme-aware styles (dark/light)
    └── js/              # 24 modules: core/ + features/ + controls/ + media/
```
## Configuration

All settings are managed through the web UI and persisted to `~/hello_world/settings.json`. API keys are entered in the UI under AI Provider Settings. Default URLs point to localhost, so the app works out of the box on any Reachy Mini.
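Settings persistence of this shape is simple to sketch: load JSON, merge over defaults, write back atomically so a crash mid-write cannot corrupt the file (illustrative; `settings.py`'s real behavior and key set may differ):

```python
import json
import os
import tempfile

DEFAULTS = {"robot_update_hz": 15, "stats_update_hz": 1, "motor_mode": "enabled"}

def load_settings(path):
    """Merge saved settings over defaults; a missing file means defaults."""
    merged = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            merged.update(json.load(f))
    return merged

def save_settings(path, settings):
    """Write atomically: temp file in the same directory, then rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(settings, f, indent=2)
    os.replace(tmp, path)

path = os.path.join(tempfile.mkdtemp(), "settings.json")
save_settings(path, {"robot_update_hz": 30})
print(load_settings(path)["robot_update_hz"], load_settings(path)["motor_mode"])  # → 30 enabled
```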
### Environment variables (optional)

| Variable | Default | Description |
|---|---|---|
| `REACHY_DAEMON_URL` | `http://localhost:8000` | Daemon API URL |
| `REACHY_MEDIA_DIR` | `~/hello_world/media` | Media storage directory |
| `REACHY_LOG_LEVEL` | `INFO` | Logging level |
| `REACHY_SETTINGS_FILE` | `~/hello_world/settings.json` | Settings file path |
### Settings reference (54 keys)
| Group | Key settings |
|---|---|
| Motor | motor_mode (enabled/disabled/gravity_compensation) |
| Update rates | robot_update_hz (15), stats_update_hz (1) |
| Video | video_view (off/camera/simulation/both) |
| Voice | audio_input, audio_output, stt_provider, stt_model, stt_language |
| LLM | llm_provider, llm_model, system_prompt, web_search |
| TTS | tts_provider, tts_model, tts_voice |
| VLM | vlm_provider, vlm_model |
| Thresholds | conf_threshold, vol_threshold, mic_gain |
| Volume | master_volume, speech_volume, music_volume, effects_volume, ambient_volume |
| Vision | vision_mode, vision_task, vision_model_size, vision_confidence, vision_classes, vision_prompt, vision_overlay, vision_reactions, vision_fps_target, vision_source, vision_pose_mode |
| Oscillation | oscillation_amplitude, oscillation_speed |
| Cameras | browser_camera_fps, browser_camera_quality, browser_camera_width, active_camera |
| Timers | alarms, alarm_sound, timer_sound, custom_timer_sounds, custom_alarm_sounds, custom_ambient_sounds |
| API Keys | api_keys dict (openai, anthropic, groq, deepseek, gemini, elevenlabs) |
| UI | last_active_tab, system_stats_order, tab_order, shell_mode |
## Dependencies

### Python packages

Required:

- `reachy-mini` – Robot SDK (ReachyMiniApp base class, media, motors)
- `litellm` – Unified LLM/TTS/STT provider interface
- `webrtcvad` – Voice activity detection for listener
- `soundfile` – Audio file I/O (WAV read/write)
- `mutagen` – Music metadata extraction (ID3, FLAC, M4A)
- `opencv-python` – Image processing, video recording, JPEG encoding
- `numpy` – Array operations (inference, audio processing)
- `scipy` – Rotation math (Euler angle conversions)
- `psutil` – System statistics (CPU, memory, disk, processes)
- `libtmux` – Tmux session management for shell terminal

Optional (for CM4 vision inference):

- `onnxruntime` – ONNX model inference on ARM64
- `huggingface-hub` – Model downloads from HuggingFace Hub
### Frontend libraries (CDN)

- Three.js v0.169.0 – 3D rendering (URDF viewer)
- urdf-loader v0.12.3 – URDF parsing
- ONNX Runtime Web v1.20.0 – Browser-side YOLO inference (WebGPU/WASM)
- xterm.js v5.3.0 + FitAddon v0.8.0 – Terminal emulator
- marked.js v14.1.0 – Markdown rendering
- GstWebRTC – Camera streaming (loaded from the daemon)
- Kinematics WASM – Passive joint forward kinematics
## API Reference

146 REST endpoints across 23 modules, plus 6 WebSocket channels. The full OpenAPI spec is available at `/openapi.json` when the app is running.

### About & Health

| Method | Path | Description |
|---|---|---|
| GET | `/api/about/readme` | README.md content for in-app docs |
| GET | `/api/health` | Overall system health (daemon + providers) |
| GET | `/api/health/daemon` | Daemon API status |
| GET | `/api/health/config` | Current config (URLs, timeouts) |
### System & Settings

| Method | Path | Description |
|---|---|---|
| GET | `/api/system/stats` | All 12 telemetry functions aggregated |
| GET | `/api/system/cpu` | CPU cores + temperature |
| GET | `/api/system/memory` | RAM breakdown |
| GET | `/api/system/disk` | Local + swap usage |
| GET | `/api/system/network` | TX/RX speeds + WiFi |
| GET | `/api/system/processes` | Top processes by CPU |
| GET | `/api/system/hardware` | Static hardware inventory |
| GET | `/api/system/health` | Service dependency health |
| GET | `/api/settings` | Return all settings |
| PUT | `/api/settings` | Update settings (whitelisted keys only) |
### Conversation & Listener

| Method | Path | Description |
|---|---|---|
| GET | `/api/conversation/known-providers` | All providers + static capabilities |
| GET | `/api/conversation/providers` | Available providers (have API keys) |
| GET | `/api/conversation/models` | Models for provider/capability (live discovery) |
| GET | `/api/conversation/voices` | TTS voices (dynamic probing or static list) |
| GET | `/api/conversation/default-prompt` | Built-in system prompt |
| GET | `/api/conversation/web-search-support` | Check if model supports web search |
| POST | `/api/conversation/chat` | LLM chat with tool calling |
| POST | `/api/conversation/reset` | Clear session history |
| POST | `/api/conversation/speak` | TTS-only (no LLM) |
| GET | `/api/listener/status` | Running state + mute status |
| POST | `/api/listener/start` | Start VAD + STT + chat pipeline |
| POST | `/api/listener/stop` | Stop listening |
| POST | `/api/listener/mute` | Mute (optional auto-unmute duration) |
### Vision & Cameras

| Method | Path | Description |
|---|---|---|
| GET | `/api/vision/status` | Current pipeline status |
| POST | `/api/vision/mode` | Switch mode: off, cm4, or webgpu |
| GET | `/api/vision/models` | Available models for current task/mode |
| GET | `/api/vision/tasks` | List vision tasks |
| GET | `/api/vision/health` | Backend health check |
| GET | `/api/vision/classes` | COCO 80 class list for filtering |
| GET | `/api/vision/model-status` | Cached models + disk free space |
| POST | `/api/vision/download-model` | Download model (with disk space check) |
| GET | `/api/vision/frame` | Current camera frame as JPEG |
| GET | `/api/vision/detections` | Latest detection results |
| POST | `/api/vision/detections` | Submit detections from WebGPU client |
| GET | `/api/cameras` | List all cameras with status |
| GET | `/api/cameras/active` | Get active camera ID |
| POST | `/api/cameras/active` | Set active camera |
| GET | `/api/cameras/{camera_id}/frame` | Get frame as JPEG |
### Moves & Oscillation

| Method | Path | Description |
|---|---|---|
| GET | `/api/moves/metadata` | All emotions/dances with descriptions |
| GET | `/api/moves/audio/{type}/{name}` | Emotion audio file |
| GET | `/api/moves/sim/{type}/{name}` | Pre-computed joint angles for 3D sim |
| POST | `/api/moves/play` | Play move with audio routing |
| POST | `/api/moves/stop` | Stop playback + reset head |
| GET | `/api/moves/status` | Current playback status |
| POST | `/api/oscillation/start` | Start pattern (amplitude, speed) |
| POST | `/api/oscillation/stop` | Stop + reset head to center |
| GET | `/api/oscillation/status` | Current state |
| PATCH | `/api/oscillation/update` | Update parameters while running |
### Media (Snapshots, Recordings, Sounds, Music)

| Method | Path | Description |
|---|---|---|
| POST | `/api/snapshots/capture` | Capture from camera (+ antenna wiggle) |
| POST | `/api/snapshots/upload` | Upload client-captured image |
| GET | `/api/snapshots/list` | List all snapshots |
| DELETE | `/api/snapshots/(unknown)` | Delete snapshot |
| POST | `/api/recordings/start` | Start video recording |
| POST | `/api/recordings/stop` | Stop + generate thumbnail |
| POST | `/api/recordings/upload` | Upload WebM, convert to MP4 |
| GET | `/api/recordings/list` | List MP4 files with duration |
| DELETE | `/api/recordings/(unknown)` | Delete recording + thumbnail |
| POST | `/api/sounds/start` | Start audio recording |
| POST | `/api/sounds/stop` | Stop + generate waveform thumbnail |
| GET | `/api/sounds/list` | List WAV files with duration |
| DELETE | `/api/sounds/(unknown)` | Delete sound |
| GET | `/api/music/list` | List with metadata |
| POST | `/api/music/upload` | Upload music file |
| POST | `/api/music/play/(unknown)` | Play via ffmpeg + SDK |
| POST | `/api/music/stop` | Stop playback |
### Timers, Alarms & Ambient

| Method | Path | Description |
|---|---|---|
| POST | `/api/timers/create` | Create countdown timer |
| GET | `/api/timers/list` | List timers with remaining time |
| POST | `/api/timers/{id}/pause` | Pause timer |
| POST | `/api/timers/{id}/resume` | Resume timer |
| POST | `/api/timers/{id}/cancel` | Cancel timer |
| POST | `/api/alarms/create` | Create alarm (time, name, days) |
| GET | `/api/alarms/list` | List all alarms |
| PUT | `/api/alarms/{id}` | Update alarm |
| DELETE | `/api/alarms/{id}` | Delete alarm |
| POST | `/api/alarms/{id}/toggle` | Enable/disable |
| POST | `/api/alarms/{id}/snooze` | Snooze triggered alarm |
| GET | `/api/ambient/sounds` | List ambient sounds |
| POST | `/api/ambient/play` | Start ambient sound with sleep timer |
| POST | `/api/ambient/stop` | Stop ambient playback |
### Shell, Scratchpad, Bluetooth & Help

| Method | Path | Description |
|---|---|---|
| GET | `/api/tmux/status` | Tmux session status + output |
| POST | `/api/tmux/send` | Send text to tmux session |
| POST | `/api/tmux/key/{key}` | Send special key |
| GET | `/api/scratchpad/list` | List all entries |
| GET | `/api/scratchpad/latest` | Most recent entry |
| POST | `/api/scratchpad/create` | Create HTML entry |
| DELETE | `/api/scratchpad/{id}` | Delete entry |
| GET | `/api/bluetooth/status` | Adapter and connection status |
| POST | `/api/bluetooth/scan` | Scan for devices |
| GET | `/api/bluetooth/devices` | Discovered/paired devices |
| POST | `/api/bluetooth/pair/{address}` | Pair device |
| POST | `/api/bluetooth/connect/{address}` | Connect device |
| GET | `/api/help/topics` | All help topics |
| GET | `/api/help/search` | Search topics |
### Transcript & Model

| Method | Path | Description |
|---|---|---|
| POST | `/api/transcript/claude` | Broadcast LLM response |
| POST | `/api/transcript/tool` | Broadcast tool activity |
| POST | `/api/transcript/error` | Broadcast error |
| POST | `/api/transcript/speaking` | Set speaking status |
| GET | `/api/model/mjcf` | Robot MJCF XML model |
| GET | `/api/model/meshes` | List mesh file paths |
| GET | `/api/model/mesh/{path}` | Individual mesh file |
| GET | `/api/model/urdf/{path}` | URDF files + meshes |
## Security

- Path validation: `validate_path_in_directory()` prevents directory traversal on all file-serving endpoints
- Settings whitelist: `ALLOWED_SETTINGS_KEYS` rejects unknown keys
- Settings validation: Numeric bounds enforced (volumes 0-100, FPS 1-30, etc.) – out-of-range values silently clamped
- File type checks: Extension verification before operations
- Upload limits: 10 MB images, 100 MB video, 50 MB music
- API keys: Stored in local `settings.json` (user's responsibility)
- No external telemetry: All data stays on the local network
## Developer Note

Built solo, with love. This is one developer's attempt to build the most complete Reachy Mini experience possible: 146 endpoints, 31 AI tools, 81 emotions, and counting. If you find a bug or have an idea, open a discussion – I read every one. Expect passion, not perfection.

Built with the Pollen Robotics reachy-mini SDK | ONNX Runtime | HuggingFace Hub | LiteLLM