
# Reachy Mini DanceML Architecture

## System Architecture

```mermaid
flowchart TB
    subgraph Input["🎤 Input Layer"]
        USER["User Voice"]
        MIC["Browser Microphone<br/>(Laptop/Mobile)"]
    end

    subgraph Streaming["⚡ Streaming Layer"]
        GRADIO["Gradio UI<br/>:8042"]
    end

    subgraph AI["🧠 AI Layer (OpenAI Realtime)"]
        ASR["Speech-to-Text<br/>(Whisper)"]
        REASON["gpt-realtime<br/>+ SYSTEM_INSTRUCTIONS"]
        TTS["Text-to-Speech"]
    end

    subgraph Tools["🔧 11 Tools"]
        direction TB
        subgraph Core["Core Movement"]
            GOTO["goto_pose"]
            LOOK["look_at"]
            STOP["stop_movement"]
        end
        subgraph Library["Library Moves"]
            SEARCH["search_moves"]
            PLAY["play_move"]
        end
        subgraph Procedural["Procedural Motion"]
            GENMOTION["generate_motion"]
        end
        subgraph Sequences["Multi-Step"]
            EXECSEQ["execute_sequence"]
        end
        subgraph BuiltIn["Lifecycle & Control"]
            WAKE["wake_up"]
            SLEEP["goto_sleep"]
            MOTOR["motor_control"]
        end
        subgraph Reference["Reference"]
            GUIDE["get_choreography_guide"]
        end
    end

    subgraph Planner["🤖 Sequence Planner (GPT-4.1)"]
        PLAN["SequencePlanner<br/>+ PLANNER_SYSTEM_PROMPT"]
    end

    subgraph Backend["📦 Backend"]
        HANDLER["RealtimeHandler<br/>(tool dispatch)"]
        GENERATOR["MovementGenerator<br/>(50 Hz motor thread)"]
        EXECUTOR["SequenceExecutor"]
        PROCMOVE["ProceduralMove"]
        MOVELIBRARY["MoveLibrary<br/>(101 moves)"]
    end

    subgraph Robot["🤖 Reachy Mini"]
        HEAD["Head<br/>roll/pitch/yaw"]
        BODY["Body<br/>yaw ±180°"]
        ANTENNAS["Antennas<br/>left/right"]
        SPEAKER["Speaker"]
    end

    %% Flow
    USER --> MIC --> GRADIO --> ASR --> REASON
    REASON --> TTS --> SPEAKER
    REASON -->|"function_call"| Tools
    Tools --> HANDLER

    %% Tool routing
    HANDLER --> GENERATOR
    HANDLER --> EXECUTOR
    HANDLER --> MOVELIBRARY
    HANDLER --> PROCMOVE

    %% Sequence planning
    EXECSEQ -.->|"plan request"| PLAN
    PLAN -.->|"SequencePlan"| EXECUTOR

    %% Execution to hardware
    GENERATOR --> HEAD
    GENERATOR --> BODY
    GENERATOR --> ANTENNAS
```
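The `function_call` edge in the diagram can be sketched as a dispatch table inside `RealtimeHandler`: the model names a tool and passes JSON arguments, and the handler routes the call to the matching backend method. This is an illustrative sketch; the backend method names and result shape are assumptions, not the repository's actual API.

```python
import json


class RealtimeHandler:
    """Illustrative tool dispatch: routes model function calls to backend methods."""

    def __init__(self, backend):
        self.backend = backend
        # Tool name -> backend callable (method names are assumptions for illustration).
        self.tools = {
            "goto_pose": backend.goto_pose,
            "look_at": backend.look_at,
            "stop_movement": backend.stop_movement,
            "play_move": backend.play_move,
        }

    def on_function_call(self, name: str, arguments: str) -> str:
        """Decode the model's JSON arguments, run the tool, return a JSON result."""
        tool = self.tools.get(name)
        if tool is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        try:
            result = tool(**json.loads(arguments))
        except Exception as exc:  # surface failures back to the model
            return json.dumps({"error": str(exc)})
        return json.dumps({"ok": True, "result": result})
```

Returning errors as JSON (rather than raising) lets the Realtime model see the failure and recover in conversation.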

## Tool Reference (11 Tools)

| Tool | Category | Description |
|------|----------|-------------|
| `goto_pose` | Core | Move to specific head/body angles with a duration |
| `look_at` | Core | Look in a direction (up/down/left/right/floor/ceiling) or at a 3D point |
| `stop_movement` | Core | Stop all movement and return to neutral |
| `search_moves` | Library | Semantic search over 101 pre-recorded moves |
| `play_move` | Library | Play a named library move |
| `generate_motion` | Procedural | Continuous procedural motion with waveforms, drifts, and antenna control |
| `execute_sequence` | Sequences | Multi-step choreography with timing (uses the GPT-4.1 planner) |
| `wake_up` | Lifecycle | Play the built-in wake animation |
| `goto_sleep` | Lifecycle | Play the built-in sleep animation |
| `motor_control` | Control | Enable/disable motors or gravity compensation |
| `get_choreography_guide` | Reference | Load the choreography guide for custom movements |
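Each of these tools is exposed to the Realtime model as a function-calling schema. A minimal sketch for `goto_pose`, assuming the standard OpenAI tool-definition shape; the exact parameter names and ranges used in the repository may differ:

```python
# Hypothetical schema for goto_pose; parameter names and ranges are
# illustrative assumptions (body yaw limit taken from the diagram above).
GOTO_POSE_TOOL = {
    "type": "function",
    "name": "goto_pose",
    "description": "Move the head and body to specific angles over a duration.",
    "parameters": {
        "type": "object",
        "properties": {
            "head_roll": {"type": "number", "description": "Head roll in degrees"},
            "head_pitch": {"type": "number", "description": "Head pitch in degrees"},
            "head_yaw": {"type": "number", "description": "Head yaw in degrees"},
            "body_yaw": {
                "type": "number",
                "minimum": -180,
                "maximum": 180,
                "description": "Body yaw in degrees",
            },
            "duration": {"type": "number", "description": "Motion duration in seconds"},
        },
        "required": ["duration"],
    },
}
```

The model sees only these schemas; the mapping from tool name to backend code lives in `RealtimeHandler`.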

## Tool Selection Flow

```mermaid
flowchart TD
    START(("🎤 User<br/>Request")) --> INTENT{"Classify<br/>Intent"}

    INTENT -->|"look left<br/>tilt head"| SIMPLE["🎯 SIMPLE"]
    INTENT -->|"stop<br/>freeze"| EMERGENCY["🛑 STOP"]
    INTENT -->|"show happy<br/>do a dance"| EMOTION["🎭 EMOTION"]
    INTENT -->|"spiral motion<br/>wiggle antenna"| PROCEDURAL["🌊 PROCEDURAL"]
    INTENT -->|"peek-a-boo<br/>multi-step"| SEQUENCE["🎬 SEQUENCE"]

    SIMPLE --> GOTO_POSE["goto_pose()"]
    EMERGENCY --> STOP_MOVE["stop_movement()"]

    EMOTION --> SEARCH_LIB["search_moves()"]
    SEARCH_LIB --> FOUND{"Results?"}
    FOUND -->|"Yes"| PLAY_MOVE["play_move()"]
    FOUND -->|"No"| GEN_MOTION

    PROCEDURAL --> GEN_MOTION["generate_motion()"]
    SEQUENCE --> EXEC_SEQ["execute_sequence()"]

    GOTO_POSE --> EXECUTE["⚡ Execute"]
    STOP_MOVE --> EXECUTE
    PLAY_MOVE --> EXECUTE
    GEN_MOTION --> EXECUTE
    EXEC_SEQ --> EXECUTE

    EXECUTE --> ROBOT(("🤖 Robot<br/>Moves"))
```
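The EMOTION branch includes a fallback: when the library search returns nothing, the agent synthesizes the motion procedurally instead. A minimal sketch of that decision, with hypothetical helper callables standing in for the real tools:

```python
def handle_emotion_request(query, search_moves, play_move, generate_motion):
    """Try the move library first; fall back to procedural motion.

    The three callables are hypothetical stand-ins for the agent's
    search_moves / play_move / generate_motion tools.
    """
    results = search_moves(query)        # semantic search over the 101 library moves
    if results:
        return play_move(results[0])     # best match wins
    # No library match: synthesize a motion from the description instead.
    return generate_motion(description=query)
```

In practice this decision is made by the Realtime model following its system prompt, not by hard-coded logic; the sketch just makes the branch explicit.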

## Component Summary

| Layer | Component | Purpose |
|-------|-----------|---------|
| Input | Gradio UI | Web interface + audio capture |
| AI | OpenAI Realtime API | Speech recognition, reasoning, TTS |
| AI | GPT-4.1 (Planner) | Sequence planning for multi-step actions |
| Tools | 11 functions | Intent execution via function calling |
| Backend | MoveLibrary | 101 pre-recorded HuggingFace moves |
| Backend | MovementGenerator | 50 Hz motor control thread |
| Backend | ProceduralMove | Waveform-based motion generation |
| Backend | SequenceExecutor | Step-by-step sequence execution |
| Output | Reachy Mini SDK | Motor control, audio playback |
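MovementGenerator's 50 Hz thread amounts to a fixed-timestep control loop: every 20 ms it samples the active move for a target pose and writes it to the motors, while ProceduralMove supplies poses from a waveform. A minimal sketch, assuming a `sample(t)` interface on moves and a `set_pose` call on the SDK (both names are assumptions, not the repository's actual API):

```python
import math
import threading
import time


class ProceduralMove:
    """Waveform-based motion: head roll follows a sine (illustrative)."""

    def __init__(self, amplitude_deg=15.0, freq_hz=0.5):
        self.amplitude = amplitude_deg
        self.freq = freq_hz

    def sample(self, t):
        """Pose at time t seconds since the move started."""
        return {"head_roll": self.amplitude * math.sin(2 * math.pi * self.freq * t)}


class MovementGenerator:
    """Illustrative 50 Hz motor loop; the real class in the repo may differ."""

    PERIOD = 1.0 / 50.0  # 20 ms per tick

    def __init__(self, robot):
        self.robot = robot
        self.active_move = None  # object exposing sample(t) -> pose dict
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def run(self):
        start = time.monotonic()
        next_tick = start
        while not self._stop.is_set():
            if self.active_move is not None:
                pose = self.active_move.sample(time.monotonic() - start)
                self.robot.set_pose(pose)  # hypothetical SDK call
            # Fixed timestep: schedule the next tick relative to the start,
            # so timing error does not accumulate across iterations.
            next_tick += self.PERIOD
            time.sleep(max(0.0, next_tick - time.monotonic()))
```

Scheduling each tick against an absolute deadline (rather than sleeping a flat 20 ms) keeps the loop from drifting when a tick runs long.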

## System Prompts

The agent uses two system prompts:

1. **`SYSTEM_INSTRUCTIONS`** (`realtime_handler.py`)
   - Main conversational AI instructions
   - Tool selection guide, physical conventions, physics envelope
   - ~200 lines
2. **`PLANNER_SYSTEM_PROMPT`** (`sequence_planner.py`)
   - GPT-4.1 sequence-planning instructions
   - Step types: `move`, `wait`, `speak`, `motion`
   - ~35 lines