# Reachy Mini DanceML Architecture

## System Architecture

```mermaid
flowchart TB
    subgraph Input["🎀 Input Layer"]
        USER["User Voice"]
        MIC["Browser Microphone<br/>(Laptop/Mobile)"]
    end

    subgraph Streaming["⚑ Streaming Layer"]
        GRADIO["Gradio UI<br/>:8042"]
    end

    subgraph AI["🧠 AI Layer (OpenAI Realtime)"]
        ASR["Speech-to-Text<br/>(Whisper)"]
        REASON["gpt-realtime<br/>+ SYSTEM_INSTRUCTIONS"]
        TTS["Text-to-Speech"]
    end

    subgraph Tools["🔧 11 Tools"]
        direction TB
        subgraph Core["Core Movement"]
            GOTO["goto_pose"]
            LOOK["look_at"]
            STOP["stop_movement"]
        end
        subgraph Library["Library Moves"]
            SEARCH["search_moves"]
            PLAY["play_move"]
        end
        subgraph Procedural["Procedural Motion"]
            GENMOTION["generate_motion"]
        end
        subgraph Sequences["Multi-Step"]
            EXECSEQ["execute_sequence"]
        end
        subgraph BuiltIn["Lifecycle & Control"]
            WAKE["wake_up"]
            SLEEP["goto_sleep"]
            MOTOR["motor_control"]
        end
        subgraph Reference["Reference"]
            GUIDE["get_choreography_guide"]
        end
    end

    subgraph Planner["🤖 Sequence Planner (GPT-4.1)"]
        PLAN["SequencePlanner<br/>+ PLANNER_SYSTEM_PROMPT"]
    end

    subgraph Backend["📦 Backend"]
        HANDLER["RealtimeHandler<br/>(tool dispatch)"]
        GENERATOR["MovementGenerator<br/>(50Hz motor thread)"]
        EXECUTOR["SequenceExecutor"]
        PROCMOVE["ProceduralMove"]
        MOVELIBRARY["MoveLibrary<br/>(101 moves)"]
    end

    subgraph Robot["🤖 Reachy Mini"]
        HEAD["Head<br/>roll/pitch/yaw"]
        BODY["Body<br/>yaw ±180°"]
        ANTENNAS["Antennas<br/>left/right"]
        SPEAKER["Speaker"]
    end

    %% Flow
    USER --> MIC --> GRADIO --> ASR --> REASON
    REASON --> TTS --> SPEAKER
    REASON -->|"function_call"| Tools
    Tools --> HANDLER

    %% Tool routing
    HANDLER --> GENERATOR
    HANDLER --> EXECUTOR
    HANDLER --> MOVELIBRARY
    HANDLER --> PROCMOVE
    
    %% Sequence planning
    EXECSEQ -.->|"plan request"| PLAN
    PLAN -.->|"SequencePlan"| EXECUTOR

    %% Execution to hardware
    GENERATOR --> HEAD
    GENERATOR --> BODY
    GENERATOR --> ANTENNAS
```
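
The tool-dispatch path in the diagram (Realtime API `function_call` → `RealtimeHandler` → backend component) can be sketched as follows. This is a minimal, hypothetical sketch: the method names, argument handling, and the two-tool subset are illustrative, not the project's actual API.

```python
# Hypothetical sketch of RealtimeHandler's tool dispatch: the Realtime API
# emits a function_call event carrying a tool name and JSON-encoded arguments,
# and the handler routes it to the matching backend callable.
import json


class RealtimeHandler:
    def __init__(self):
        # Map tool names to handlers (illustrative subset of the 11 tools).
        self._tools = {
            "goto_pose": self._goto_pose,
            "stop_movement": self._stop_movement,
        }

    def dispatch(self, name: str, arguments: str) -> str:
        """Route a function_call event to the matching tool handler."""
        handler = self._tools.get(name)
        if handler is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        return json.dumps(handler(**json.loads(arguments)))

    def _goto_pose(self, head_yaw=0.0, duration=1.0, **kwargs):
        # Real implementation would hand targets to the MovementGenerator.
        return {"status": "ok", "tool": "goto_pose"}

    def _stop_movement(self, **kwargs):
        # Real implementation would halt motion and return to neutral.
        return {"status": "stopped"}
```

The table-based dispatch keeps adding a tool to a single dictionary entry plus one handler method, which matches a design with a fixed, enumerable tool set.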

---

## Tool Reference (11 Tools)

| Tool | Category | Description |
|------|----------|-------------|
| `goto_pose` | Core | Move to specific head/body angles with duration |
| `look_at` | Core | Look in a named direction (up/down/left/right/floor/ceiling) or at a 3D point |
| `stop_movement` | Core | Stop all movement and return to neutral |
| `search_moves` | Library | Semantic search of 101 pre-recorded moves |
| `play_move` | Library | Play a named library move |
| `generate_motion` | Procedural | Continuous procedural motion with waveforms, drifts, antenna control |
| `execute_sequence` | Sequences | Multi-step choreography with timing (uses GPT-4.1 planner) |
| `wake_up` | Lifecycle | Play built-in wake animation |
| `goto_sleep` | Lifecycle | Play built-in sleep animation |
| `motor_control` | Control | Enable/disable motors or gravity compensation |
| `get_choreography_guide` | Reference | Load choreography guide for custom movements |
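
For function calling, each tool above is declared to the Realtime API as a schema. A hedged sketch of what the `goto_pose` declaration might look like (parameter names, ranges, and descriptions are illustrative assumptions, not the project's actual schema):

```python
# Hypothetical function-calling schema for goto_pose. The OpenAI Realtime API
# accepts JSON-Schema-style parameter definitions like this; the specific
# fields below are assumed from the tool's description, not copied from code.
GOTO_POSE_TOOL = {
    "type": "function",
    "name": "goto_pose",
    "description": "Move the head/body to specific angles over a duration.",
    "parameters": {
        "type": "object",
        "properties": {
            "head_roll": {"type": "number", "description": "degrees"},
            "head_pitch": {"type": "number", "description": "degrees"},
            "head_yaw": {"type": "number", "description": "degrees"},
            "body_yaw": {"type": "number", "description": "degrees, -180 to 180"},
            "duration": {"type": "number", "description": "seconds"},
        },
        "required": ["duration"],
    },
}
```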

---

## Tool Selection Flow

```mermaid
flowchart TD
    START(("🎀 User<br/>Request")) --> INTENT{"Classify<br/>Intent"}

    INTENT -->|"look left<br/>tilt head"| SIMPLE["🎯 SIMPLE"]
    INTENT -->|"stop<br/>freeze"| EMERGENCY["🛑 STOP"]
    INTENT -->|"show happy<br/>do a dance"| EMOTION["🎭 EMOTION"]
    INTENT -->|"spiral motion<br/>wiggle antenna"| PROCEDURAL["🌊 PROCEDURAL"]
    INTENT -->|"peek-a-boo<br/>multi-step"| SEQUENCE["🎬 SEQUENCE"]

    SIMPLE --> GOTO_POSE["goto_pose()"]
    EMERGENCY --> STOP_MOVE["stop_movement()"]
    
    EMOTION --> SEARCH_LIB["search_moves()"]
    SEARCH_LIB --> FOUND{"Results?"}
    FOUND -->|"Yes"| PLAY_MOVE["play_move()"]
    FOUND -->|"No"| GEN_MOTION
    
    PROCEDURAL --> GEN_MOTION["generate_motion()"]
    SEQUENCE --> EXEC_SEQ["execute_sequence()"]

    GOTO_POSE --> EXECUTE["⚑ Execute"]
    STOP_MOVE --> EXECUTE
    PLAY_MOVE --> EXECUTE
    GEN_MOTION --> EXECUTE
    EXEC_SEQ --> EXECUTE

    EXECUTE --> ROBOT(("🤖 Robot<br/>Moves"))
```
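
The EMOTION branch above (search the library first, fall back to procedural generation when nothing matches) can be sketched as a small function. The substring match is a deliberately trivial stand-in for the real semantic search, and `library` is just a dict of move names:

```python
# Hypothetical sketch of the search_moves -> play_move / generate_motion
# fallback from the flow chart. A substring match stands in for semantic
# search purely for illustration.
def perform_emotion(query: str, library: dict) -> str:
    """Return the tool call the agent would make for an emotion request."""
    matches = [name for name in library if query in name]
    if matches:
        # A library move matched: play the best (here: first) result.
        return f"play_move({matches[0]})"
    # No library match: fall back to procedural motion generation.
    return f"generate_motion({query})"
```

For example, `perform_emotion("happy", {"happy_wiggle": None, "sad_droop": None})` routes to `play_move`, while an unmatched query like `"spiral"` falls through to `generate_motion`.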

---

## Component Summary

| Layer | Component | Purpose |
|-------|-----------|---------|
| **Input** | Gradio UI | Web interface + audio capture |
| **AI** | OpenAI Realtime API | Speech recognition, reasoning, TTS |
| **AI** | GPT-4.1 (Planner) | Sequence planning for multi-step actions |
| **Tools** | 11 functions | Intent execution via function calling |
| **Backend** | MoveLibrary | 101 pre-recorded HuggingFace moves |
| **Backend** | MovementGenerator | 50Hz motor control thread |
| **Backend** | ProceduralMove | Waveform-based motion generation |
| **Backend** | SequenceExecutor | Step-by-step sequence execution |
| **Output** | Reachy Mini SDK | Motor control, audio playback |
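
The MovementGenerator's 50 Hz motor thread implies a fixed-rate control loop. A minimal sketch, assuming a deadline-based sleep (the project's actual loop may differ):

```python
# Hypothetical fixed-rate loop like MovementGenerator's 50 Hz motor thread:
# compute each tick's target pose, then sleep until the next 20 ms deadline
# so scheduling jitter does not accumulate into drift.
import time

PERIOD = 1.0 / 50.0  # 50 Hz -> 20 ms per tick


def run_loop(compute_pose, ticks: int) -> int:
    """Run `ticks` iterations at a fixed rate; return ticks executed."""
    next_deadline = time.monotonic()
    done = 0
    for _ in range(ticks):
        compute_pose(done)  # e.g. interpolate toward target joint angles
        done += 1
        next_deadline += PERIOD
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # absorb jitter; skip the sleep if we ran late
    return done
```

Advancing an absolute deadline (rather than sleeping a fixed 20 ms after each tick) keeps the average rate at 50 Hz even when individual ticks run long.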

---

## System Prompts

The agent uses **two system prompts**:

1. **SYSTEM_INSTRUCTIONS** ([realtime_handler.py](../reachy_mini_danceml/realtime_handler.py#L19))
   - Main conversational AI instructions
   - Tool selection guide, physical conventions, physics envelope
   - ~200 lines

2. **PLANNER_SYSTEM_PROMPT** ([sequence_planner.py](../reachy_mini_danceml/sequence_planner.py#L56))
   - GPT-4.1 sequence planning instructions
   - Step types: move, wait, speak, motion
   - ~35 lines
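
The planner's four step types (move, wait, speak, motion) suggest a small plan structure that the SequenceExecutor walks in order. A hedged sketch; the field names and validation are illustrative assumptions, not the project's actual `SequencePlan`:

```python
# Hypothetical SequencePlan structure implied by the planner's step types.
from dataclasses import dataclass, field

STEP_TYPES = {"move", "wait", "speak", "motion"}


@dataclass
class Step:
    type: str                              # one of STEP_TYPES
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.type not in STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type}")


@dataclass
class SequencePlan:
    steps: list  # ordered Steps, executed one by one by the SequenceExecutor
```

For example, a peek-a-boo plan might be `SequencePlan([Step("move", {"name": "hide"}), Step("wait", {"seconds": 1.0}), Step("speak", {"text": "peek-a-boo!"})])`.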