# Reachy Mini DanceML Architecture
## System Architecture
```mermaid
flowchart TB
subgraph Input["🎤 Input Layer"]
USER["User Voice"]
MIC["Browser Microphone<br/>(Laptop/Mobile)"]
end
subgraph Streaming["⚡ Streaming Layer"]
GRADIO["Gradio UI<br/>:8042"]
end
subgraph AI["🧠 AI Layer (OpenAI Realtime)"]
ASR["Speech-to-Text<br/>(Whisper)"]
REASON["gpt-realtime<br/>+ SYSTEM_INSTRUCTIONS"]
TTS["Text-to-Speech"]
end
subgraph Tools["🔧 11 Tools"]
direction TB
subgraph Core["Core Movement"]
GOTO["goto_pose"]
LOOK["look_at"]
STOP["stop_movement"]
end
subgraph Library["Library Moves"]
SEARCH["search_moves"]
PLAY["play_move"]
end
subgraph Procedural["Procedural Motion"]
GENMOTION["generate_motion"]
end
subgraph Sequences["Multi-Step"]
EXECSEQ["execute_sequence"]
end
subgraph BuiltIn["Lifecycle & Control"]
WAKE["wake_up"]
SLEEP["goto_sleep"]
MOTOR["motor_control"]
end
subgraph Reference["Reference"]
GUIDE["get_choreography_guide"]
end
end
subgraph Planner["🤖 Sequence Planner (GPT-4.1)"]
PLAN["SequencePlanner<br/>+ PLANNER_SYSTEM_PROMPT"]
end
subgraph Backend["📦 Backend"]
HANDLER["RealtimeHandler<br/>(tool dispatch)"]
GENERATOR["MovementGenerator<br/>(50Hz motor thread)"]
EXECUTOR["SequenceExecutor"]
PROCMOVE["ProceduralMove"]
MOVELIBRARY["MoveLibrary<br/>(101 moves)"]
end
subgraph Robot["🤖 Reachy Mini"]
HEAD["Head<br/>roll/pitch/yaw"]
BODY["Body<br/>yaw Β±180Β°"]
ANTENNAS["Antennas<br/>left/right"]
SPEAKER["Speaker"]
end
%% Flow
USER --> MIC --> GRADIO --> ASR --> REASON
REASON --> TTS --> SPEAKER
REASON -->|"function_call"| Tools
Tools --> HANDLER
%% Tool routing
HANDLER --> GENERATOR
HANDLER --> EXECUTOR
HANDLER --> MOVELIBRARY
HANDLER --> PROCMOVE
%% Sequence planning
EXECSEQ -.->|"plan request"| PLAN
PLAN -.->|"SequencePlan"| EXECUTOR
%% Execution to hardware
GENERATOR --> HEAD
GENERATOR --> BODY
GENERATOR --> ANTENNAS
```
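
The backend's `MovementGenerator` drives the motors from a dedicated 50Hz thread. The sketch below shows only the timing skeleton such a loop needs; the class name, helper methods, and pose format here are illustrative assumptions, not the project's actual API.

```python
import threading
import time

CONTROL_HZ = 50
DT = 1.0 / CONTROL_HZ  # 20 ms per tick

class FixedRateMotorLoop:
    """Timing skeleton for a 50 Hz motor thread (illustrative only)."""

    def __init__(self):
        self._running = False

    def start(self):
        self._running = True
        threading.Thread(target=self._loop, daemon=True).start()

    def stop(self):
        self._running = False

    def _compute_next_pose(self):
        # Placeholder: the real generator blends goto targets,
        # procedural waveforms, and library playback frames.
        return {"head": (0.0, 0.0, 0.0), "body_yaw": 0.0}

    def _send_to_motors(self, pose):
        # Placeholder for the SDK call that writes joint targets.
        pass

    def _loop(self):
        next_tick = time.monotonic()
        while self._running:
            self._send_to_motors(self._compute_next_pose())
            next_tick += DT
            # Sleep to the next 20 ms boundary so the rate holds at
            # 50 Hz even when pose computation takes variable time.
            time.sleep(max(0.0, next_tick - time.monotonic()))
```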
---
## Tool Reference (11 Tools)
| Tool | Category | Description |
|------|----------|-------------|
| `goto_pose` | Core | Move to specific head/body angles over a given duration |
| `look_at` | Core | Look toward a named direction (up/down/left/right/floor/ceiling) or a 3D point |
| `stop_movement` | Core | Stop all movement and return to neutral |
| `search_moves` | Library | Semantic search of 101 pre-recorded moves |
| `play_move` | Library | Play a named library move |
| `generate_motion` | Procedural | Continuous procedural motion with waveforms, drifts, and antenna control |
| `execute_sequence` | Sequences | Multi-step choreography with timing (uses GPT-4.1 planner) |
| `wake_up` | Lifecycle | Play built-in wake animation |
| `goto_sleep` | Lifecycle | Play built-in sleep animation |
| `motor_control` | Control | Enable/disable motors or gravity compensation |
| `get_choreography_guide` | Reference | Load choreography guide for custom movements |
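
Each tool is exposed to the model through function calling. For a sense of the payloads involved, here are two hypothetical calls; the argument names, units, and move name are assumptions, since the real schemas live in the tool definitions.

```python
# Hypothetical function-call payloads (argument names and units assumed).
goto_pose_call = {
    "name": "goto_pose",
    "arguments": {
        "head_pitch": -15.0,  # degrees (sign convention assumed)
        "head_yaw": 30.0,     # degrees
        "duration": 1.0,      # seconds
    },
}

play_move_call = {
    "name": "play_move",
    "arguments": {"move_name": "happy_wiggle"},  # hypothetical library move
}
```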
---
## Tool Selection Flow
```mermaid
flowchart TD
START(("🎤 User<br/>Request")) --> INTENT{"Classify<br/>Intent"}
INTENT -->|"look left<br/>tilt head"| SIMPLE["🎯 SIMPLE"]
INTENT -->|"stop<br/>freeze"| EMERGENCY["πŸ›‘ STOP"]
INTENT -->|"show happy<br/>do a dance"| EMOTION["🎭 EMOTION"]
INTENT -->|"spiral motion<br/>wiggle antenna"| PROCEDURAL["🌊 PROCEDURAL"]
INTENT -->|"peek-a-boo<br/>multi-step"| SEQUENCE["🎬 SEQUENCE"]
SIMPLE --> GOTO_POSE["goto_pose()"]
EMERGENCY --> STOP_MOVE["stop_movement()"]
EMOTION --> SEARCH_LIB["search_moves()"]
SEARCH_LIB --> FOUND{"Results?"}
FOUND -->|"Yes"| PLAY_MOVE["play_move()"]
FOUND -->|"No"| GEN_MOTION
PROCEDURAL --> GEN_MOTION["generate_motion()"]
SEQUENCE --> EXEC_SEQ["execute_sequence()"]
GOTO_POSE --> EXECUTE["⚡ Execute"]
STOP_MOVE --> EXECUTE
PLAY_MOVE --> EXECUTE
GEN_MOTION --> EXECUTE
EXEC_SEQ --> EXECUTE
EXECUTE --> ROBOT(("🤖 Robot<br/>Moves"))
```
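
On the backend side, `RealtimeHandler` routes each `function_call` to the matching action. Below is a minimal dispatch sketch assuming a name-to-handler table; the class, method names, and return shapes are illustrative, not the project's real interface.

```python
import json

class ToolDispatcher:
    """Illustrative tool-dispatch table (not the real RealtimeHandler)."""

    def __init__(self):
        self._tools = {
            "goto_pose": self._goto_pose,
            "stop_movement": self._stop_movement,
            # ...the remaining nine tools register the same way.
        }

    def dispatch(self, name: str, arguments_json: str) -> str:
        handler = self._tools.get(name)
        if handler is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        return handler(**json.loads(arguments_json))

    def _goto_pose(self, **kwargs) -> str:
        # Would hand targets to the 50 Hz motor loop.
        return json.dumps({"ok": True, "tool": "goto_pose"})

    def _stop_movement(self, **kwargs) -> str:
        # Would cancel active moves and return to neutral.
        return json.dumps({"ok": True, "tool": "stop_movement"})
```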
---
## Component Summary
| Layer | Component | Purpose |
|-------|-----------|---------|
| **Input** | Gradio UI | Web interface + audio capture |
| **AI** | OpenAI Realtime API | Speech recognition, reasoning, TTS |
| **AI** | GPT-4.1 (Planner) | Sequence planning for multi-step actions |
| **Tools** | 11 functions | Intent execution via function calling |
| **Backend** | MoveLibrary | 101 pre-recorded moves from HuggingFace |
| **Backend** | MovementGenerator | 50Hz motor control thread |
| **Backend** | ProceduralMove | Waveform-based motion generation |
| **Backend** | SequenceExecutor | Step-by-step sequence execution |
| **Output** | Reachy Mini SDK | Motor control, audio playback |
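
`ProceduralMove` synthesizes motion from waveforms rather than recordings. The sketch below samples a head pose from two sine channels to show the idea; the function, parameter names, and conventions are assumptions, and the real component also supports drifts and antenna channels.

```python
import math

def sample_pose(t: float, freq_hz: float = 0.5,
                yaw_amp_deg: float = 20.0,
                pitch_amp_deg: float = 8.0) -> dict:
    """Sample an illustrative head pose at time t (seconds)."""
    phase = 2.0 * math.pi * freq_hz * t
    return {
        "head_yaw": yaw_amp_deg * math.sin(phase),            # side-to-side sway
        "head_pitch": pitch_amp_deg * math.sin(2.0 * phase),  # double-rate nod
    }
```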
---
## System Prompts
The agent uses **two system prompts**:
1. **SYSTEM_INSTRUCTIONS** ([realtime_handler.py](../reachy_mini_danceml/realtime_handler.py#L19))
- Main conversational AI instructions
- Tool selection guide, physical conventions, physics envelope
- ~200 lines
2. **PLANNER_SYSTEM_PROMPT** ([sequence_planner.py](../reachy_mini_danceml/sequence_planner.py#L56))
- GPT-4.1 sequence planning instructions
- Step types: move, wait, speak, motion (see the sketch below)
- ~35 lines
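
To make the planner hand-off concrete, here is a hedged sketch of the kind of `SequencePlan` the GPT-4.1 planner might return, using the four step types listed above; the dataclass fields and step parameters are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SequenceStep:
    type: str  # "move" | "wait" | "speak" | "motion"
    params: dict = field(default_factory=dict)

@dataclass
class SequencePlan:
    steps: list = field(default_factory=list)

# What a planned "peek-a-boo" choreography might look like.
peekaboo = SequencePlan(steps=[
    SequenceStep("move", {"head_pitch": 40.0, "duration": 0.6}),   # duck down
    SequenceStep("wait", {"seconds": 0.8}),
    SequenceStep("move", {"head_pitch": -10.0, "duration": 0.3}),  # pop up
    SequenceStep("speak", {"text": "Peek-a-boo!"}),
])
```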