# Reachy Mini DanceML Architecture
## System Architecture
```mermaid
flowchart TB
    subgraph Input["🎤 Input Layer"]
        USER["User Voice"]
        MIC["Browser Microphone<br/>(Laptop/Mobile)"]
    end
    subgraph Streaming["⚡ Streaming Layer"]
        GRADIO["Gradio UI<br/>:8042"]
    end
    subgraph AI["🧠 AI Layer (OpenAI Realtime)"]
        ASR["Speech-to-Text<br/>(Whisper)"]
        REASON["gpt-realtime<br/>+ SYSTEM_INSTRUCTIONS"]
        TTS["Text-to-Speech"]
    end
    subgraph Tools["🔧 11 Tools"]
        direction TB
        subgraph Core["Core Movement"]
            GOTO["goto_pose"]
            LOOK["look_at"]
            STOP["stop_movement"]
        end
        subgraph Library["Library Moves"]
            SEARCH["search_moves"]
            PLAY["play_move"]
        end
        subgraph Procedural["Procedural Motion"]
            GENMOTION["generate_motion"]
        end
        subgraph Sequences["Multi-Step"]
            EXECSEQ["execute_sequence"]
        end
        subgraph BuiltIn["Lifecycle & Control"]
            WAKE["wake_up"]
            SLEEP["goto_sleep"]
            MOTOR["motor_control"]
        end
        subgraph Reference["Reference"]
            GUIDE["get_choreography_guide"]
        end
    end
    subgraph Planner["🤖 Sequence Planner (GPT-4.1)"]
        PLAN["SequencePlanner<br/>+ PLANNER_SYSTEM_PROMPT"]
    end
    subgraph Backend["📦 Backend"]
        HANDLER["RealtimeHandler<br/>(tool dispatch)"]
        GENERATOR["MovementGenerator<br/>(50Hz motor thread)"]
        EXECUTOR["SequenceExecutor"]
        PROCMOVE["ProceduralMove"]
        MOVELIBRARY["MoveLibrary<br/>(101 moves)"]
    end
    subgraph Robot["🤖 Reachy Mini"]
        HEAD["Head<br/>roll/pitch/yaw"]
        BODY["Body<br/>yaw ±180°"]
        ANTENNAS["Antennas<br/>left/right"]
        SPEAKER["Speaker"]
    end
    %% Flow
    USER --> MIC --> GRADIO --> ASR --> REASON
    REASON --> TTS --> SPEAKER
    REASON -->|"function_call"| Tools
    Tools --> HANDLER
    %% Tool routing
    HANDLER --> GENERATOR
    HANDLER --> EXECUTOR
    HANDLER --> MOVELIBRARY
    HANDLER --> PROCMOVE
    %% Sequence planning
    EXECSEQ -.->|"plan request"| PLAN
    PLAN -.->|"SequencePlan"| EXECUTOR
    %% Execution to hardware
    GENERATOR --> HEAD
    GENERATOR --> BODY
    GENERATOR --> ANTENNAS
```
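The Realtime model never moves the robot directly: it emits `function_call` events, and `RealtimeHandler` routes each one to the matching backend component. Below is a minimal sketch of that dispatch step; the dispatch-table shape and all method names on the backend objects are assumptions, and the real signatures live in [realtime_handler.py](../reachy_mini_danceml/realtime_handler.py).

```python
import json

class RealtimeHandler:
    """Sketch of tool dispatch; the table shape and method names are assumed."""

    def __init__(self, generator, executor, library, procedural):
        # Map tool names exposed to the Realtime API onto backend callables.
        self.tools = {
            "goto_pose": generator.goto_pose,
            "look_at": generator.look_at,
            "stop_movement": generator.stop,
            "search_moves": library.search,
            "play_move": library.play,
            "generate_motion": procedural.start,
            "execute_sequence": executor.execute,
            # ... plus wake_up, goto_sleep, motor_control, get_choreography_guide
        }

    def on_function_call(self, name: str, arguments: str) -> str:
        """Handle one function_call event; arguments arrive as a JSON string."""
        tool = self.tools.get(name)
        if tool is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        return json.dumps({"ok": True, "result": tool(**json.loads(arguments))})
```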
---
## Tool Reference (11 Tools)
| Tool | Category | Description |
|------|----------|-------------|
| `goto_pose` | Core | Move to specific head/body angles over a given duration |
| `look_at` | Core | Look toward a named direction (up/down/left/right/floor/ceiling) or a 3D point |
| `stop_movement` | Core | Stop all movement and return to neutral |
| `search_moves` | Library | Semantic search of 101 pre-recorded moves |
| `play_move` | Library | Play a named library move |
| `generate_motion` | Procedural | Continuous procedural motion with waveforms, drifts, antenna control |
| `execute_sequence` | Sequences | Multi-step choreography with timing (uses GPT-4.1 planner) |
| `wake_up` | Lifecycle | Play built-in wake animation |
| `goto_sleep` | Lifecycle | Play built-in sleep animation |
| `motor_control` | Control | Enable/disable motors or gravity compensation |
| `get_choreography_guide` | Reference | Load choreography guide for custom movements |
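For a sense of what crosses the function-call boundary, a `goto_pose` call might carry arguments like the following. The parameter names and units here are illustrative guesses based on the head (roll/pitch/yaw) and body (yaw) axes in the diagram, not the tool's actual schema:

```python
# Hypothetical goto_pose arguments (names and units are assumptions).
goto_pose_args = {
    "head_roll": 0.0,     # degrees
    "head_pitch": -15.0,  # tilt head slightly up
    "head_yaw": 30.0,     # turn head to the left
    "body_yaw": 0.0,      # keep the body facing forward
    "duration": 1.5,      # seconds to interpolate to the pose
}
```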
---
## Tool Selection Flow
```mermaid
flowchart TD
    START(("🎤 User<br/>Request")) --> INTENT{"Classify<br/>Intent"}
    INTENT -->|"look left<br/>tilt head"| SIMPLE["🎯 SIMPLE"]
    INTENT -->|"stop<br/>freeze"| EMERGENCY["🛑 STOP"]
    INTENT -->|"show happy<br/>do a dance"| EMOTION["😊 EMOTION"]
    INTENT -->|"spiral motion<br/>wiggle antenna"| PROCEDURAL["🌀 PROCEDURAL"]
    INTENT -->|"peek-a-boo<br/>multi-step"| SEQUENCE["🎬 SEQUENCE"]
    SIMPLE --> GOTO_POSE["goto_pose()"]
    EMERGENCY --> STOP_MOVE["stop_movement()"]
    EMOTION --> SEARCH_LIB["search_moves()"]
    SEARCH_LIB --> FOUND{"Results?"}
    FOUND -->|"Yes"| PLAY_MOVE["play_move()"]
    FOUND -->|"No"| GEN_MOTION
    PROCEDURAL --> GEN_MOTION["generate_motion()"]
    SEQUENCE --> EXEC_SEQ["execute_sequence()"]
    GOTO_POSE --> EXECUTE["⚡ Execute"]
    STOP_MOVE --> EXECUTE
    PLAY_MOVE --> EXECUTE
    GEN_MOTION --> EXECUTE
    EXEC_SEQ --> EXECUTE
    EXECUTE --> ROBOT(("🤖 Robot<br/>Moves"))
```
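The EMOTION branch encodes a library-first policy: search the 101 pre-recorded moves semantically, and only fall back to procedural generation when nothing matches. A minimal sketch of that branch, with the three tool functions injected as plain callables since their real signatures are not shown here:

```python
from typing import Callable, Sequence

def handle_emotion_request(
    query: str,
    search_moves: Callable[[str], Sequence[str]],
    play_move: Callable[[str], None],
    generate_motion: Callable[[str], None],
) -> None:
    """Library-first fallback from the EMOTION branch (sketch)."""
    results = search_moves(query)  # semantic search over the move library
    if results:
        play_move(results[0])      # play the best library match
    else:
        generate_motion(query)     # no match: synthesize procedural motion
```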
---
## Component Summary
| Layer | Component | Purpose |
|-------|-----------|---------|
| **Input** | Gradio UI | Web interface + audio capture |
| **AI** | OpenAI Realtime API | Speech recognition, reasoning, TTS |
| **AI** | GPT-4.1 (Planner) | Sequence planning for multi-step actions |
| **Tools** | 11 functions | Intent execution via function calling |
| **Backend** | MoveLibrary | 101 pre-recorded HuggingFace moves |
| **Backend** | MovementGenerator | 50Hz motor control thread |
| **Backend** | ProceduralMove | Waveform-based motion generation |
| **Backend** | SequenceExecutor | Step-by-step sequence execution |
| **Output** | Reachy Mini SDK | Motor control, audio playback |
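MovementGenerator's 50Hz thread is the one place that touches the motors, so every tool ultimately reduces to updating the target pose it reads. A minimal sketch of such a fixed-rate loop, assuming hypothetical `apply_pose`/`get_target` callbacks rather than the project's actual internals:

```python
import threading
import time
from typing import Callable, Optional

def motor_loop(
    apply_pose: Callable[[dict], None],  # writes one pose to the motors
    get_target: Callable[[], dict],      # reads the current target pose
    rate_hz: float = 50.0,
    stop_event: Optional[threading.Event] = None,
) -> None:
    """Fixed-rate control loop (sketch of the 50Hz motor-thread pattern)."""
    stop_event = stop_event or threading.Event()
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    while not stop_event.is_set():
        apply_pose(get_target())
        next_tick += period  # schedule against absolute time to avoid drift
        time.sleep(max(0.0, next_tick - time.monotonic()))
```

Scheduling each tick against absolute monotonic time keeps the loop at 50Hz even when a single `apply_pose` call runs long.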
---
## System Prompts
The agent uses **two system prompts**:
1. **SYSTEM_INSTRUCTIONS** ([realtime_handler.py](../reachy_mini_danceml/realtime_handler.py#L19))
- Main conversational AI instructions
- Tool selection guide, physical conventions, physics envelope
- ~200 lines
2. **PLANNER_SYSTEM_PROMPT** ([sequence_planner.py](../reachy_mini_danceml/sequence_planner.py#L56))
- GPT-4.1 sequence planning instructions
- Step types: move, wait, speak, motion
- ~35 lines
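A minimal sketch of how the planner side might invoke GPT-4.1 with its prompt, assuming the standard OpenAI Python SDK; the real prompt text and the SequencePlan parsing live in [sequence_planner.py](../reachy_mini_danceml/sequence_planner.py):

```python
from openai import OpenAI

PLANNER_SYSTEM_PROMPT = "..."  # the real prompt lives in sequence_planner.py

def plan_sequence(request: str) -> str:
    """Ask GPT-4.1 to expand a request into move/wait/speak/motion steps."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": PLANNER_SYSTEM_PROMPT},
            {"role": "user", "content": request},
        ],
    )
    return response.choices[0].message.content  # parsed into a SequencePlan upstream
```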