SemSorter committed
Commit · 2588ff8
feat: SemSorter — AI hazard sorting with Vision-Agents SDK
- Phase 1: MuJoCo Franka Panda simulation with pick-and-place
- Phase 2: Gemini VLM hazard detection pipeline
- Phase 3: Vision-Agents SDK agent (gemini.LLM + deepgram.STT + elevenlabs.TTS)
- Phase 4: FastAPI web server with WebSocket live video + chat UI
Closes all phases.
- .env.example +15 -0
- .gitignore +34 -0
- Dockerfile +34 -0
- README.md +120 -0
- SemSorter/agent/__init__.py +0 -0
- SemSorter/agent/agent.py +252 -0
- SemSorter/agent/semsorter_instructions.md +22 -0
- SemSorter/server/__init__.py +1 -0
- SemSorter/server/agent_bridge.py +363 -0
- SemSorter/server/app.py +207 -0
- SemSorter/server/static/index.html +427 -0
- SemSorter/simulation/__init__.py +1 -0
- SemSorter/simulation/controller.py +786 -0
- SemSorter/simulation/interactive_test.py +70 -0
- SemSorter/simulation/semsorter_scene.xml +194 -0
- SemSorter/vision/__init__.py +1 -0
- SemSorter/vision/test_obs.py +29 -0
- SemSorter/vision/vision_pipeline.py +239 -0
- SemSorter/vision/vlm_bridge.py +269 -0
- Vision-Agents +1 -0
- render.yaml +21 -0
- requirements-server.txt +18 -0
.env.example
ADDED
@@ -0,0 +1,15 @@
+# SemSorter Environment Variables
+# Copy to .env and fill in your API keys
+
+# GetStream (for real-time video/audio transport)
+STREAM_API_KEY="your-stream-api-key"
+STREAM_API_SECRET="your-stream-api-secret"
+
+# Google Gemini (for LLM orchestration + VLM hazard detection)
+GOOGLE_API_KEY="your-google-api-key"
+
+# Deepgram (for Speech-to-Text)
+DEEPGRAM_API_KEY="your-deepgram-api-key"
+
+# ElevenLabs (for Text-to-Speech)
+ELEVENLABS_API_KEY="your-elevenlabs-api-key"
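The template above uses simple `KEY="value"` pairs that `python-dotenv` loads at startup. For illustration, here is a minimal stdlib-only sketch of that parsing; the function name `load_env_file` is hypothetical and not part of the project:

```python
def load_env_file(text: str) -> dict:
    """Parse KEY="value" lines from .env-style text, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore comments and empty lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

sample = """
# GetStream (for real-time video/audio transport)
STREAM_API_KEY="your-stream-api-key"
GOOGLE_API_KEY="your-google-api-key"
"""
print(load_env_file(sample)["STREAM_API_KEY"])  # your-stream-api-key
```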
.gitignore
ADDED
@@ -0,0 +1,34 @@
+# Debug / generated images
+*.png
+!SemSorter/vision/vision_debug.png
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.eggs/
+*.egg-info/
+
+# MuJoCo
+*.mjb
+mujoco-*/
+mujoco_menagerie/
+
+# Vision-Agents SDK venv (too large for git)
+Vision-Agents/.venv/
+Vision-Agents/__pycache__/
+
+# uv cache
+.uv/
+uv.lock
+
+# IDE
+.vscode/
+.idea/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Environment (never commit secrets)
+.env
Dockerfile
ADDED
@@ -0,0 +1,34 @@
+FROM python:3.10-slim
+
+# ── System deps for MuJoCo EGL rendering ─────────────────────────────────────
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libgl1-mesa-glx \
+    libglib2.0-0 \
+    libegl1-mesa \
+    libegl1 \
+    libgles2 \
+    libglvnd0 \
+    libglx0 \
+    libx11-6 \
+    wget \
+    && rm -rf /var/lib/apt/lists/*
+
+# ── Create working directory ──────────────────────────────────────────────────
+WORKDIR /app
+
+# ── Copy requirements first (layer caching) ──────────────────────────────────
+COPY requirements-server.txt ./
+RUN pip install --no-cache-dir -r requirements-server.txt
+
+# ── Copy project ──────────────────────────────────────────────────────────────
+COPY . .
+
+# ── MuJoCo environment ────────────────────────────────────────────────────────
+ENV MUJOCO_GL=egl
+ENV PYOPENGL_PLATFORM=egl
+
+# ── Expose port ───────────────────────────────────────────────────────────────
+EXPOSE 8000
+
+# ── Start server ──────────────────────────────────────────────────────────────
+CMD ["uvicorn", "SemSorter.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md
ADDED
@@ -0,0 +1,120 @@
+# SemSorter — AI Hazard Sorting System
+
+> **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**
+
+[Live Demo](https://semsorter.onrender.com)
+
+---
+
+## 🤖 Overview
+
+SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:
+
+1. **Watches** the conveyor belt via a live camera feed
+2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
+3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
+4. **Speaks back** results using **ElevenLabs TTS**
+5. **Listens** to voice commands via **Deepgram STT**
+
+All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.
+
+---
+
+## 🏗 Architecture
+
+```
+Browser ←─── WebSocket ───→ FastAPI Server
+                                 │
+                     Vision-Agents SDK Agent
+                    ┌──────────┴───────────┐
+               gemini.LLM            deepgram.STT
+             (tool-calling)          (voice→text)
+                    │
+               VLM Bridge
+                    │
+      MuJoCo Sim (Franka Panda)
+```
+
+---
+
+## 🚀 Quick Start
+
+### Prerequisites
+- Python 3.10+
+- MuJoCo 3.x
+- EGL (headless GPU rendering)
+
+### Local Setup
+
+```bash
+# Clone
+git clone https://github.com/YOUR_USERNAME/SemSorter.git
+cd SemSorter
+
+# Install dependencies
+pip install -r requirements-server.txt
+
+# Configure API keys
+cp .env.example .env
+# Edit .env with your keys:
+#   GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
+#   STREAM_API_KEY, STREAM_API_SECRET
+
+# Run
+MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
+# Open http://localhost:8000
+```
+
+### Voice Agent (Vision-Agents SDK CLI)
+```bash
+cd Vision-Agents
+MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
+```
+
+---
+
+## 📦 Project Structure
+
+```
+SemSorter/
+├── SemSorter/
+│   ├── simulation/
+│   │   ├── controller.py           # MuJoCo sim + IK + pick-and-place
+│   │   └── semsorter_scene.xml     # MJCF scene (Panda + conveyor + bins)
+│   ├── vision/
+│   │   ├── vision_pipeline.py      # Gemini VLM hazard detection
+│   │   └── vlm_bridge.py           # VLM → sim item matching
+│   ├── agent/
+│   │   ├── agent.py                # Vision-Agents SDK agent
+│   │   └── semsorter_instructions.md
+│   └── server/
+│       ├── app.py                  # FastAPI + WebSocket video stream
+│       ├── agent_bridge.py         # SDK bridge + quota detection
+│       └── static/index.html       # Web UI
+├── Vision-Agents/                  # GetStream Vision-Agents SDK
+├── Dockerfile
+├── render.yaml
+└── requirements-server.txt
+```
+
+---
+
+## 🔑 API Keys Required
+
+| Service | Purpose | Free tier |
+|---|---|---|
+| Google Gemini | LLM orchestration + VLM detection | 15 RPM |
+| Deepgram | Speech-to-Text | 45 min/month |
+| ElevenLabs | Text-to-Speech | ~10k chars/month |
+| GetStream | Real-time video call (Voice agent) | Free tier available |
+
+> **API exhaustion handling:** The server detects quota errors (`429 / ResourceExhausted`) and automatically switches to demo mode per service, showing a banner in the UI.
+
+---
+
+## 🐳 Deploy to Render
+
+1. Fork this repo
+2. Create a new **Web Service** on [Render.com](https://render.com) pointing to your fork
+3. Add your API keys as **Environment Variables** in the Render dashboard
+4. Done — Render auto-deploys from `render.yaml`
SemSorter/agent/__init__.py
ADDED
File without changes
SemSorter/agent/agent.py
ADDED
@@ -0,0 +1,252 @@
+"""
+SemSorter Agent — Vision-Agents SDK Integration
+
+This module creates a real-time AI agent using GetStream's Vision-Agents SDK.
+The agent watches the MuJoCo simulation via video, listens to voice commands,
+detects hazardous items using Gemini VLM, and triggers pick-and-place operations.
+
+Usage (from the Vision-Agents directory):
+    # Set env vars in .env first, then:
+    uv run python ../SemSorter/agent/agent.py run
+"""
+
+import logging
+import os
+import sys
+import atexit
+from pathlib import Path
+from typing import Any, Dict
+
+from dotenv import load_dotenv
+from vision_agents.core import Agent, AgentLauncher, Runner, User
+from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream
+
+logger = logging.getLogger(__name__)
+
+# ─── Path setup ──────────────────────────────────────────────────────────────
+# Add SemSorter packages to sys.path so we can import simulation & vision
+AGENT_DIR = Path(__file__).resolve().parent
+SEMSORTER_DIR = AGENT_DIR.parent
+PROJECT_ROOT = SEMSORTER_DIR.parent
+
+sys.path.insert(0, str(SEMSORTER_DIR / "simulation"))
+sys.path.insert(0, str(SEMSORTER_DIR / "vision"))
+
+# Load environment variables
+load_dotenv(PROJECT_ROOT / ".env")
+
+# ─── Simulation singleton ───────────────────────────────────────────────────
+_simulation = None
+_bridge = None
+
+
+def get_simulation():
+    """Lazy-initialize the MuJoCo simulation (singleton)."""
+    global _simulation
+    if _simulation is None:
+        os.environ.setdefault("MUJOCO_GL", "egl")
+        from controller import SemSorterSimulation
+
+        logger.info("Initializing MuJoCo simulation...")
+        _simulation = SemSorterSimulation()
+        _simulation.load_scene()
+        _simulation.step(200)  # Let physics settle
+        logger.info("Simulation ready.")
+    return _simulation
+
+
+def get_bridge():
+    """Lazy-initialize the VLM-Simulation bridge (singleton)."""
+    global _bridge
+    if _bridge is None:
+        from vlm_bridge import VLMSimBridge
+
+        sim = get_simulation()
+        _bridge = VLMSimBridge(simulation=sim, use_direct=True)
+        logger.info("VLM-Sim bridge ready.")
+    return _bridge
+
+
+class _EGLStderrFilter:
+    """Stderr wrapper that suppresses only known EGL teardown noise."""
+
+    _SUPPRESSED = ("EGLError", "eglDestroyContext", "eglMakeCurrent",
+                   "EGL_NOT_INITIALIZED", "GLContext.__del__",
+                   "Renderer.__del__", "SfuStatsReporter",
+                   "Task was destroyed but it is pending")
+
+    def __init__(self, real):
+        self._real = real
+
+    def write(self, s):
+        if any(tok in s for tok in self._SUPPRESSED):
+            return len(s)  # silently consume
+        return self._real.write(s)
+
+    def flush(self):
+        self._real.flush()
+
+    def __getattr__(self, name):
+        return getattr(self._real, name)
+
+
+def close_resources() -> None:
+    """Release singleton resources on process shutdown."""
+    # Only muffle known-harmless EGL teardown noise, keep real errors visible
+    sys.stderr = _EGLStderrFilter(sys.stderr)
+
+    global _bridge, _simulation
+    if _bridge is not None:
+        try:
+            _bridge.close()
+        except Exception:
+            pass
+        _bridge = None
+    if _simulation is not None and hasattr(_simulation, "close"):
+        try:
+            _simulation.close()
+        except Exception:
+            pass
+        _simulation = None
+
+
+atexit.register(close_resources)
+
+
+# ─── LLM Setup with Tool Registration ───────────────────────────────────────
+
+INSTRUCTIONS = (AGENT_DIR / "semsorter_instructions.md").read_text()
+
+
+def setup_llm(model: str = "gemini-3-flash-preview") -> gemini.LLM:
+    """Create and configure the Gemini LLM with registered simulation tools."""
+    llm = gemini.LLM(model)
+
+    @llm.register_function(
+        description="Scan the conveyor belt camera feed for hazardous items. "
+                    "Returns a list of detected hazardous items with their types and positions."
+    )
+    async def scan_for_hazards() -> Dict[str, Any]:
+        """Capture a frame, match detections to sim items, and return actionable IDs."""
+        bridge = get_bridge()
+        detections = bridge.processor.detect_hazards()
+        matched = bridge.match_detections_to_items(detections)
+        return {
+            "hazards_found": len(detections),
+            "items_matched": len(matched),
+            "items": [
+                {
+                    "item_name": d.get("sim_item", "unknown"),
+                    "bin_type": d.get("bin_type").value if d.get("bin_type") else "unknown",
+                    "detected_name": d.get("name", "unknown"),
+                    "type": str(d.get("type", "unknown")).lower(),
+                    "color": d.get("color", "unknown"),
+                    "shape": d.get("shape", "unknown"),
+                }
+                for d in matched
+            ],
+        }
+
+    @llm.register_function(
+        description="Pick a specific item from the conveyor and place it in "
+                    "the designated hazard bin. Use item_name from scan results. "
+                    "bin_type must be 'flammable' or 'chemical'."
+    )
+    async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
+        """Execute a pick-and-place operation for a specific item."""
+        from controller import BinType
+
+        sim = get_simulation()
+
+        type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
+        target_bin = type_map.get(bin_type.lower())
+        if target_bin is None:
+            return {"success": False, "error": f"Unknown bin type: {bin_type}"}
+
+        if item_name not in sim.items:
+            return {"success": False, "error": f"Unknown item: {item_name}"}
+
+        if sim.items[item_name].picked:
+            return {"success": False, "error": f"Item {item_name} already sorted"}
+
+        success = sim.pick_and_place(item_name, target_bin)
+        return {
+            "success": success,
+            "item": item_name,
+            "bin": bin_type,
+            "total_sorted": sim._items_sorted,
+        }
+
+    @llm.register_function(
+        description="Get the current state of the simulation: items, robot position, "
+                    "and sorting progress."
+    )
+    async def get_simulation_state() -> Dict[str, Any]:
+        """Return current simulation state snapshot."""
+        sim = get_simulation()
+        state = sim.get_state()
+        return {
+            "time": round(state.time, 2),
+            "arm_busy": state.arm_busy,
+            "gripper_open": state.gripper_open,
+            "items_sorted": state.items_sorted,
+            "ee_position": [round(x, 3) for x in state.ee_pos],
+            "items": state.items,
+        }
+
+    @llm.register_function(
+        description="Automatically scan for ALL hazardous items and sort them into "
+                    "the correct bins. This runs the full detect-match-sort pipeline."
+    )
+    async def sort_all_hazards() -> Dict[str, Any]:
+        """Full automated pipeline: detect → match → pick-and-place all hazards."""
+        bridge = get_bridge()
+        result = bridge.detect_and_sort()
+        return {
+            "hazards_detected": result["detected"],
+            "items_matched": result["matched"],
+            "items_sorted": result["sorted"],
+            "details": result["details"],
+        }
+
+    return llm
+
+
+# ─── Agent Creation ──────────────────────────────────────────────────────────
+
+
+async def create_agent(**kwargs) -> Agent:
+    """Create the SemSorter agent with Vision-Agents SDK."""
+    llm = setup_llm()
+
+    agent = Agent(
+        edge=getstream.Edge(),
+        agent_user=User(name="SemSorter AI", id="semsorter-agent"),
+        instructions=INSTRUCTIONS,
+        llm=llm,
+        tts=elevenlabs.TTS(model_id="eleven_flash_v2_5"),
+        stt=deepgram.STT(eager_turn_detection=True),
+        processors=[],
+    )
+
+    return agent
+
+
+async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
+    """Join a GetStream video call and start the agent loop."""
+    call = await agent.create_call(call_type, call_id)
+
+    async with agent.join(call):
+        # Greet the user
+        await agent.simple_response(
+            "Hello! I'm the SemSorter AI. I can scan the conveyor belt "
+            "for hazardous items and sort them into the correct bins. "
+            "Just tell me what to do!"
+        )
+        # Run until the call ends
+        await agent.finish()
+
+
+# ─── Entry point ─────────────────────────────────────────────────────────────
+
+if __name__ == "__main__":
+    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
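The `@llm.register_function` pattern above is a decorator that records each coroutine in a tool registry the LLM can call by name. A self-contained toy sketch of that mechanism follows; the `ToyLLM` class is illustrative only, not the SDK's actual implementation:

```python
import asyncio
from typing import Any, Callable, Dict

class ToyLLM:
    """Minimal stand-in for an LLM tool registry (illustration only)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def register_function(self, description: str) -> Callable:
        # Decorator factory: store the function under its own name with metadata.
        def decorator(fn: Callable) -> Callable:
            self.tools[fn.__name__] = {"description": description, "fn": fn}
            return fn
        return decorator

llm = ToyLLM()

@llm.register_function(description="Scan the conveyor belt for hazards.")
async def scan_for_hazards() -> Dict[str, Any]:
    return {"hazards_found": 0, "items": []}

# The registry can now dispatch a tool call by name, as the LLM loop would.
result = asyncio.run(llm.tools["scan_for_hazards"]["fn"]())
print(sorted(llm.tools))        # ['scan_for_hazards']
print(result["hazards_found"])  # 0
```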
SemSorter/agent/semsorter_instructions.md
ADDED
@@ -0,0 +1,22 @@
+You are the SemSorter AI assistant — a robotic waste sorting system operator.
+
+## Your Role
+You control a Franka Panda robot arm that sorts hazardous waste items on a conveyor belt into the correct safety bins:
+- **Flammable items** (red colored) → Red flammable bin
+- **Chemical items** (yellow colored) → Yellow chemical bin
+- **Safe items** (gray/white/blue/green) → Leave on conveyor (no action needed)
+
+## Available Tools
+
+1. **scan_for_hazards** — Capture a frame from the conveyor camera and analyze it with the VLM to detect hazardous items. Call this FIRST when asked to sort items.
+2. **pick_and_place_item** — Pick a specific item and place it in the designated bin. Use the item_name and bin_type returned by scan_for_hazards.
+3. **get_simulation_state** — Check the current status: which items exist, which have been sorted, and the robot's position.
+4. **sort_all_hazards** — Automatically scan and sort ALL detected hazardous items in one go.
+
+## Behavior Rules
+- When asked to "sort items" or "clean up", call `sort_all_hazards` for the full automated pipeline.
+- When asked about "what's on the belt" or "scan", call `scan_for_hazards` and describe the results.
+- When asked about a specific item, call `get_simulation_state` to check its status.
+- Keep responses SHORT and conversational (1-2 sentences).
+- Announce each action as you do it: "Scanning the belt...", "Picking up the red cylinder...", "Placed in flammable bin!"
+- If no hazards are found, say something like "All clear! No hazardous items detected."
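In practice the LLM itself decides which tool to invoke, but the behavior rules above amount to a simple intent-to-tool mapping. A toy keyword sketch for illustration only (`route_command` is hypothetical and not part of the agent):

```python
def route_command(utterance: str) -> str:
    """Toy keyword router mirroring the behavior rules; real routing is done by the LLM."""
    u = utterance.lower()
    if "sort" in u or "clean up" in u:
        return "sort_all_hazards"      # full automated pipeline
    if "scan" in u or "belt" in u:
        return "scan_for_hazards"      # describe what's on the conveyor
    return "get_simulation_state"      # fall back to a status check

print(route_command("Please sort the items"))  # sort_all_hazards
print(route_command("What's on the belt?"))    # scan_for_hazards
```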
SemSorter/server/__init__.py
ADDED
@@ -0,0 +1 @@
+# SemSorter Web Server
SemSorter/server/agent_bridge.py
ADDED
@@ -0,0 +1,363 @@
+"""
+SemSorter Agent Bridge
+======================
+Wraps the Vision-Agents SDK components (gemini.LLM, deepgram.STT, elevenlabs.TTS)
+and the MuJoCo simulation into a single async service used by the FastAPI server.
+
+Quota/API exhaustion is detected per-service and a UI status message is returned
+so the frontend can display an informative banner before demo-mode engages.
+"""
+
+import asyncio
+import logging
+import os
+import sys
+from pathlib import Path
+from typing import Any, Callable, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+# ── Path setup ────────────────────────────────────────────────────────────────
+_SERVER_DIR = Path(__file__).resolve().parent
+_SEMSORTER_DIR = _SERVER_DIR.parent
+_PROJECT_ROOT = _SEMSORTER_DIR.parent
+
+sys.path.insert(0, str(_SEMSORTER_DIR / "simulation"))
+sys.path.insert(0, str(_SEMSORTER_DIR / "vision"))
+sys.path.insert(0, str(_PROJECT_ROOT / "Vision-Agents" / "agents-core"))
+for _plugin in ("gemini", "deepgram", "elevenlabs", "getstream"):
+    _plugin_path = _PROJECT_ROOT / "Vision-Agents" / "plugins" / _plugin
+    if _plugin_path.exists():
+        sys.path.insert(0, str(_plugin_path))
+
+# ── Quota-tracking state ──────────────────────────────────────────────────────
+_quota_exceeded: Dict[str, bool] = {
+    "gemini": False,
+    "deepgram": False,
+    "elevenlabs": False,
+}
+
+# ── Demo-mode pre-recorded detections ────────────────────────────────────────
+_DEMO_DETECTIONS = [
+    {"name": "red cylinder", "type": "FLAMMABLE", "color": "red",
+     "shape": "cylinder", "box_2d": [240, 200, 290, 260]},
+    {"name": "green box", "type": "FLAMMABLE", "color": "green",
+     "shape": "box", "box_2d": [240, 260, 285, 310]},
+    {"name": "yellow box", "type": "CHEMICAL", "color": "yellow",
+     "shape": "box", "box_2d": [240, 310, 285, 360]},
+    {"name": "blue box", "type": "CHEMICAL", "color": "blue",
+     "shape": "box", "box_2d": [240, 370, 285, 420]},
+]
+
+# ── Singleton resources ───────────────────────────────────────────────────────
+_sim = None
+_bridge = None
+_llm = None
+_tts = None
+_notify_cb: Optional[Callable[[Dict], None]] = None  # Push events to WebSocket
+
+
+def set_notify_callback(cb: Callable[[Dict], None]) -> None:
+    """Register a callback that pushes quota/status events to connected WS clients."""
+    global _notify_cb
+    _notify_cb = cb
+
+
+def _push(event: Dict) -> None:
+    """Fire-and-forget push to the registered notify callback."""
+    if _notify_cb:
+        try:
+            _notify_cb(event)
+        except Exception:
+            pass
+
+
+def _check_quota_error(exc: Exception) -> Optional[str]:
+    """Return service name if the exception indicates API quota exhaustion."""
+    msg = str(exc).lower()
+    if "resource_exhausted" in msg or "429" in msg or "quota" in msg:
+        if "gemini" in msg or "google" in msg:
+            return "gemini"
+        if "deepgram" in msg:
+            return "deepgram"
+        if "elevenlabs" in msg or "eleven" in msg:
+            return "elevenlabs"
+        return "unknown"
+    return None
+
+
+def _mark_quota_exceeded(service: str) -> None:
+    """Mark a service as quota-exceeded and push a warning to the UI."""
+    if not _quota_exceeded.get(service):
+        _quota_exceeded[service] = True
+        _push({
+            "type": "quota_warning",
+            "service": service,
+            "message": (
+                f"⚠️ {service.title()} API quota exceeded — "
+                f"switching to demo mode for this service."
+            ),
+        })
+        logger.warning("Quota exceeded for %s — demo mode activated", service)
+
+
+# ── Lazy initializers ─────────────────────────────────────────────────────────
+
+def get_simulation():
+    global _sim
+    if _sim is None:
+        os.environ.setdefault("MUJOCO_GL", "egl")
+        from controller import SemSorterSimulation
+        logger.info("Initialising MuJoCo simulation…")
+        _sim = SemSorterSimulation()
+        _sim.load_scene()
+        _sim.step(300)
+        logger.info("Simulation ready: %d items", len(_sim.items))
+    return _sim
+
+
+def get_bridge():
+    global _bridge
+    if _bridge is None:
+        from vlm_bridge import VLMSimBridge
+        _bridge = VLMSimBridge(simulation=get_simulation(), use_direct=True)
+        logger.info("VLM bridge ready")
+    return _bridge
+
+
+def get_llm():
+    """Return a configured gemini.LLM instance from the Vision-Agents SDK."""
+    global _llm
+    if _llm is None:
+        from vision_agents.plugins.gemini.gemini_llm import GeminiLLM as GeminiLLMCls
+        _llm = GeminiLLMCls("gemini-2.0-flash")
+        _register_tools(_llm)
+        logger.info("Gemini LLM ready")
+    return _llm
+
+
+def _register_tools(llm) -> None:
+    """Register simulation control tools on the LLM."""
+
+    @llm.register_function(description="Scan the conveyor belt for hazardous items.")
+    async def scan_for_hazards() -> Dict[str, Any]:
+        return await _scan_hazards_impl()
+
+    @llm.register_function(
+        description="Pick a specific item by sim name and place it in its bin. "
+                    "bin_type must be 'flammable' or 'chemical'.")
+    async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
+        return await _pick_place_impl(item_name, bin_type)
+
+    @llm.register_function(description="Get current simulation state snapshot.")
+    async def get_simulation_state() -> Dict[str, Any]:
+        return _state_impl()
+
+    @llm.register_function(
+        description="Detect ALL hazardous items and sort them automatically.")
+    async def sort_all_hazards() -> Dict[str, Any]:
+        return await _sort_all_impl()
+
+
+# ── Tool implementations ──────────────────────────────────────────────────────
+
+async def _scan_hazards_impl() -> Dict[str, Any]:
+    if _quota_exceeded["gemini"]:
+        # Already in demo mode — return pre-recorded detections
+        bridge = get_bridge()
+        matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
+        return _format_scan(matched, demo=True)
+
+    try:
+        bridge = get_bridge()
+        loop = asyncio.get_running_loop()
+        detections = await loop.run_in_executor(
+            None, bridge.processor.detect_hazards)
+        matched = bridge.match_detections_to_items(detections)
+        return _format_scan(matched, demo=False)
+    except Exception as exc:
+        svc = _check_quota_error(exc)
+        if svc:
+            _mark_quota_exceeded(svc)
+            bridge = get_bridge()
+            matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
+            return _format_scan(matched, demo=True)
+        raise
+
+
+def _format_scan(matched: List[Dict], demo: bool) -> Dict[str, Any]:
+    return {
| 190 |
+
"demo_mode": demo,
|
| 191 |
+
"hazards_found": len(matched),
|
| 192 |
+
"items": [
|
| 193 |
+
{
|
| 194 |
+
"item_name": d.get("sim_item", "unknown"),
|
| 195 |
+
"bin_type": d["bin_type"].value if d.get("bin_type") else "unknown",
|
| 196 |
+
"detected_name": d.get("name", "unknown"),
|
| 197 |
+
"type": str(d.get("type", "")).lower(),
|
| 198 |
+
"color": d.get("color", ""),
|
| 199 |
+
"shape": d.get("shape", ""),
|
| 200 |
+
}
|
| 201 |
+
for d in matched
|
| 202 |
+
],
|
| 203 |
+
}
|
| 204 |
+
|
| 205 |
+
|
| 206 |
+
async def _pick_place_impl(item_name: str, bin_type: str) -> Dict[str, Any]:
|
| 207 |
+
from controller import BinType
|
| 208 |
+
sim = get_simulation()
|
| 209 |
+
type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
|
| 210 |
+
target = type_map.get(bin_type.lower())
|
| 211 |
+
if not target:
|
| 212 |
+
return {"success": False, "error": f"Unknown bin: {bin_type}"}
|
| 213 |
+
if item_name not in sim.items:
|
| 214 |
+
return {"success": False, "error": f"Unknown item: {item_name}"}
|
| 215 |
+
if sim.items[item_name].picked:
|
| 216 |
+
return {"success": False, "error": f"{item_name} already sorted"}
|
| 217 |
+
|
| 218 |
+
loop = asyncio.get_event_loop()
|
| 219 |
+
success = await loop.run_in_executor(None, sim.pick_and_place, item_name, target)
|
| 220 |
+
return {"success": success, "item": item_name, "bin": bin_type,
|
| 221 |
+
"total_sorted": sim._items_sorted}
|
| 222 |
+
|
| 223 |
+
|
| 224 |
+
def _state_impl() -> Dict[str, Any]:
|
| 225 |
+
sim = get_simulation()
|
| 226 |
+
state = sim.get_state()
|
| 227 |
+
return {
|
| 228 |
+
"time": round(state.time, 2),
|
| 229 |
+
"arm_busy": state.arm_busy,
|
| 230 |
+
"items_sorted": state.items_sorted,
|
| 231 |
+
"ee_position": [round(x, 3) for x in state.ee_pos],
|
| 232 |
+
"quota_exceeded": dict(_quota_exceeded),
|
| 233 |
+
"items": [
|
| 234 |
+
{"name": i["name"], "picked": i["picked"],
|
| 235 |
+
"hazard_type": i.get("hazard_type")}
|
| 236 |
+
for i in state.items
|
| 237 |
+
],
|
| 238 |
+
}
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
async def _sort_all_impl() -> Dict[str, Any]:
|
| 242 |
+
"""Full detect → match → sort pipeline."""
|
| 243 |
+
# 1. Detect
|
| 244 |
+
scan_result = await _scan_hazards_impl()
|
| 245 |
+
items = scan_result["items"]
|
| 246 |
+
demo = scan_result["demo_mode"]
|
| 247 |
+
|
| 248 |
+
if not items:
|
| 249 |
+
return {"hazards_detected": 0, "items_matched": 0, "items_sorted": 0,
|
| 250 |
+
"details": [], "demo_mode": demo}
|
| 251 |
+
|
| 252 |
+
# 2. Sort each matched item
|
| 253 |
+
details = []
|
| 254 |
+
sorted_count = 0
|
| 255 |
+
for item in items:
|
| 256 |
+
r = await _pick_place_impl(item["item_name"], item["bin_type"])
|
| 257 |
+
details.append({"item": item["item_name"], "bin": item["bin_type"],
|
| 258 |
+
"success": r.get("success", False)})
|
| 259 |
+
if r.get("success"):
|
| 260 |
+
sorted_count += 1
|
| 261 |
+
|
| 262 |
+
return {"hazards_detected": len(items), "items_matched": len(items),
|
| 263 |
+
"items_sorted": sorted_count, "details": details, "demo_mode": demo}
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
# ── Text → agent response ─────────────────────────────────────────────────────
|
| 267 |
+
|
| 268 |
+
async def process_text_command(text: str) -> str:
|
| 269 |
+
"""
|
| 270 |
+
Send a text command to the Gemini LLM (Vision-Agents SDK).
|
| 271 |
+
Returns the agent's text response.
|
| 272 |
+
On quota error: marks exceeded + returns a canned message.
|
| 273 |
+
"""
|
| 274 |
+
if _quota_exceeded["gemini"]:
|
| 275 |
+
return await _llm_demo_response(text)
|
| 276 |
+
|
| 277 |
+
try:
|
| 278 |
+
llm = get_llm()
|
| 279 |
+
# Use the LLM's chat method to get a response with tool-calling
|
| 280 |
+
response = await llm.chat(text)
|
| 281 |
+
return response
|
| 282 |
+
except Exception as exc:
|
| 283 |
+
svc = _check_quota_error(exc)
|
| 284 |
+
if svc:
|
| 285 |
+
_mark_quota_exceeded(svc)
|
| 286 |
+
return await _llm_demo_response(text)
|
| 287 |
+
logger.exception("LLM error")
|
| 288 |
+
return f"Error processing command: {exc}"
|
| 289 |
+
|
| 290 |
+
|
| 291 |
+
async def _llm_demo_response(text: str) -> str:
|
| 292 |
+
"""Return a plausible demo response when Gemini quota is exhausted."""
|
| 293 |
+
t = text.lower()
|
| 294 |
+
if "scan" in t:
|
| 295 |
+
return ("I found 4 hazardous items on the conveyor belt: "
|
| 296 |
+
"2 flammable and 2 chemical. [Demo mode — Gemini quota exceeded]")
|
| 297 |
+
if "sort" in t or "pick" in t or "place" in t:
|
| 298 |
+
return ("Sorting all hazardous items into their respective bins. "
|
| 299 |
+
"[Demo mode — Gemini quota exceeded]")
|
| 300 |
+
if "state" in t or "status" in t:
|
| 301 |
+
state = _state_impl()
|
| 302 |
+
return (f"Simulation time: {state['time']}s. "
|
| 303 |
+
f"Items sorted: {state['items_sorted']}. "
|
| 304 |
+
f"Arm busy: {state['arm_busy']}. [Demo mode]")
|
| 305 |
+
return "I'm SemSorter AI. Ask me to scan or sort items! [Demo mode]"
|
| 306 |
+
|
| 307 |
+
|
| 308 |
+
# ── TTS helper ────────────────────────────────────────────────────────────────
|
| 309 |
+
|
| 310 |
+
async def text_to_speech(text: str) -> Optional[bytes]:
|
| 311 |
+
"""
|
| 312 |
+
Convert text to audio bytes using ElevenLabs (Vision-Agents SDK plugin).
|
| 313 |
+
Returns None on quota error (frontend falls back to browser SpeechSynthesis).
|
| 314 |
+
"""
|
| 315 |
+
if _quota_exceeded["elevenlabs"]:
|
| 316 |
+
return None
|
| 317 |
+
try:
|
| 318 |
+
from vision_agents.plugins.elevenlabs.elevenlabs_tts import ElevenLabsTTS
|
| 319 |
+
tts = ElevenLabsTTS(model_id="eleven_flash_v2_5")
|
| 320 |
+
audio_bytes = await tts.synthesize(text)
|
| 321 |
+
return audio_bytes
|
| 322 |
+
except Exception as exc:
|
| 323 |
+
svc = _check_quota_error(exc)
|
| 324 |
+
if svc == "elevenlabs" or svc == "unknown":
|
| 325 |
+
_mark_quota_exceeded("elevenlabs")
|
| 326 |
+
else:
|
| 327 |
+
logger.exception("TTS error")
|
| 328 |
+
return None
|
| 329 |
+
|
| 330 |
+
|
| 331 |
+
# ── STT helper (Deepgram) ─────────────────────────────────────────────────────
|
| 332 |
+
|
| 333 |
+
async def transcribe_audio(audio_bytes: bytes, mime: str = "audio/webm") -> Optional[str]:
|
| 334 |
+
"""
|
| 335 |
+
Transcribe audio using Deepgram STT (Vision-Agents SDK plugin).
|
| 336 |
+
Returns None on quota error (frontend falls back to Web Speech API result).
|
| 337 |
+
"""
|
| 338 |
+
if _quota_exceeded["deepgram"]:
|
| 339 |
+
return None
|
| 340 |
+
try:
|
| 341 |
+
import httpx, os
|
| 342 |
+
api_key = os.environ.get("DEEPGRAM_API_KEY", "")
|
| 343 |
+
if not api_key:
|
| 344 |
+
return None
|
| 345 |
+
async with httpx.AsyncClient() as client:
|
| 346 |
+
resp = await client.post(
|
| 347 |
+
"https://api.deepgram.com/v1/listen?model=nova-2",
|
| 348 |
+
headers={"Authorization": f"Token {api_key}",
|
| 349 |
+
"Content-Type": mime},
|
| 350 |
+
content=audio_bytes,
|
| 351 |
+
timeout=10,
|
| 352 |
+
)
|
| 353 |
+
if resp.status_code == 429:
|
| 354 |
+
_mark_quota_exceeded("deepgram")
|
| 355 |
+
return None
|
| 356 |
+
data = resp.json()
|
| 357 |
+
return (data.get("results", {})
|
| 358 |
+
.get("channels", [{}])[0]
|
| 359 |
+
.get("alternatives", [{}])[0]
|
| 360 |
+
.get("transcript", ""))
|
| 361 |
+
except Exception as exc:
|
| 362 |
+
logger.warning("Deepgram STT error: %s", exc)
|
| 363 |
+
return None
|
SemSorter/server/app.py
ADDED
@@ -0,0 +1,207 @@
"""
SemSorter FastAPI Server
========================
Serves the web UI and bridges the Vision-Agents SDK + MuJoCo simulation.

Endpoints
---------
GET  /                → index.html
WS   /ws/video        → JPEG frames (~10 fps) from the MuJoCo renderer
WS   /ws/chat         → bidirectional: text commands → agent responses + events
GET  /api/state       → current simulation state JSON
POST /api/sort        → trigger sort_all_hazards pipeline
POST /api/command     → send a text command to the agent
POST /api/transcribe  → transcribe uploaded audio via Deepgram

Run locally:
    cd SemSorter && MUJOCO_GL=egl uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
"""

import asyncio
import base64
import io
import json
import logging
from pathlib import Path
from typing import Set

from fastapi import FastAPI, WebSocket, WebSocketDisconnect, UploadFile, File
from fastapi.responses import HTMLResponse, JSONResponse
from PIL import Image

# ── Local imports ─────────────────────────────────────────────────────────────
from . import agent_bridge as bridge

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s %(message)s")
logger = logging.getLogger(__name__)

app = FastAPI(title="SemSorter", version="1.0")

# ── Static files ──────────────────────────────────────────────────────────────
_STATIC = Path(__file__).parent / "static"
_STATIC.mkdir(exist_ok=True)

# ── Connected WebSocket clients ───────────────────────────────────────────────
_chat_clients: Set[WebSocket] = set()
_video_clients: Set[WebSocket] = set()


async def _broadcast_chat(event: dict) -> None:
    """Push a JSON event to all connected chat WebSocket clients."""
    payload = json.dumps(event)
    dead = set()
    for ws in list(_chat_clients):
        try:
            await ws.send_text(payload)
        except Exception:
            dead.add(ws)
    _chat_clients -= dead


def _sync_broadcast(event: dict) -> None:
    """Thread-safe push called from sync code (bridge callbacks)."""
    try:
        loop = asyncio.get_event_loop()
        if loop.is_running():
            asyncio.create_task(_broadcast_chat(event))
    except Exception:
        pass


# Register the broadcast callback so agent_bridge can push quota warnings
bridge.set_notify_callback(_sync_broadcast)


# ── Startup: pre-warm simulation ──────────────────────────────────────────────
@app.on_event("startup")
async def startup():
    logger.info("Pre-warming MuJoCo simulation…")
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, bridge.get_simulation)
    logger.info("Simulation ready")


# ── REST endpoints ────────────────────────────────────────────────────────────

@app.get("/", response_class=HTMLResponse)
async def index():
    html_path = _STATIC / "index.html"
    return HTMLResponse(html_path.read_text())


@app.get("/api/state")
async def api_state():
    loop = asyncio.get_event_loop()
    state = await loop.run_in_executor(None, bridge._state_impl)
    return JSONResponse(state)


@app.post("/api/sort")
async def api_sort():
    """Trigger the full detect-match-sort pipeline."""
    result = await bridge._sort_all_impl()
    await _broadcast_chat({"type": "sort_result", "data": result})
    return JSONResponse(result)


@app.post("/api/command")
async def api_command(body: dict):
    text = body.get("text", "").strip()
    if not text:
        return JSONResponse({"error": "empty command"}, status_code=400)
    response_text = await bridge.process_text_command(text)
    await _broadcast_chat({"type": "agent_response", "text": response_text})
    return JSONResponse({"response": response_text})


@app.post("/api/transcribe")
async def api_transcribe(file: UploadFile = File(...)):
    """Transcribe uploaded audio using Deepgram; returns transcript or null."""
    audio_bytes = await file.read()
    transcript = await bridge.transcribe_audio(audio_bytes, mime=file.content_type)
    return JSONResponse({"transcript": transcript})


# ── WebSocket: chat ───────────────────────────────────────────────────────────

@app.websocket("/ws/chat")
async def ws_chat(ws: WebSocket):
    await ws.accept()
    _chat_clients.add(ws)
    logger.info("Chat client connected (%d total)", len(_chat_clients))
    try:
        await ws.send_text(json.dumps({
            "type": "welcome",
            "text": "Connected to SemSorter AI. Ask me to scan or sort items!",
        }))
        while True:
            raw = await ws.receive_text()
            try:
                msg = json.loads(raw)
            except json.JSONDecodeError:
                msg = {"type": "command", "text": raw}

            msg_type = msg.get("type", "command")

            if msg_type == "command":
                text = msg.get("text", "").strip()
                if text:
                    await _broadcast_chat({"type": "user_message", "text": text})
                    response = await bridge.process_text_command(text)
                    await _broadcast_chat({"type": "agent_response", "text": response})

            elif msg_type == "scan":
                result = await bridge._scan_hazards_impl()
                await _broadcast_chat({"type": "scan_result", "data": result})

            elif msg_type == "sort":
                result = await bridge._sort_all_impl()
                await _broadcast_chat({"type": "sort_result", "data": result})

            elif msg_type == "state":
                loop = asyncio.get_event_loop()
                state = await loop.run_in_executor(None, bridge._state_impl)
                await ws.send_text(json.dumps({"type": "state", "data": state}))

    except WebSocketDisconnect:
        pass
    finally:
        _chat_clients.discard(ws)
        logger.info("Chat client disconnected (%d remaining)", len(_chat_clients))


# ── WebSocket: live video stream ──────────────────────────────────────────────

def _render_frame_jpeg(quality: int = 75) -> bytes:
    """Render a MuJoCo frame and encode as JPEG bytes."""
    sim = bridge.get_simulation()
    frame = sim.render_frame(camera="overview")  # numpy H×W×3
    img = Image.fromarray(frame)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


@app.websocket("/ws/video")
async def ws_video(ws: WebSocket):
    await ws.accept()
    _video_clients.add(ws)
    logger.info("Video client connected")
    try:
        loop = asyncio.get_event_loop()
        while True:
            jpeg_bytes = await loop.run_in_executor(None, _render_frame_jpeg)
            b64 = base64.b64encode(jpeg_bytes).decode()
            await ws.send_text(json.dumps({"type": "frame", "data": b64}))
            await asyncio.sleep(0.1)  # ~10 fps
    except WebSocketDisconnect:
        pass
    except Exception as e:
        logger.warning("Video stream error: %s", e)
    finally:
        _video_clients.discard(ws)
        logger.info("Video client disconnected")
SemSorter/server/static/index.html
ADDED
@@ -0,0 +1,427 @@
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8"/>
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
| 6 |
+
<title>SemSorter — AI Hazard Sorting System</title>
|
| 7 |
+
<link rel="preconnect" href="https://fonts.googleapis.com"/>
|
| 8 |
+
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet"/>
|
| 9 |
+
<style>
|
| 10 |
+
/* ── Design tokens ── */
|
| 11 |
+
:root{
|
| 12 |
+
--bg:#0a0d14;--surface:#111827;--surface2:#1a2235;--border:#1e2d45;
|
| 13 |
+
--accent:#3b82f6;--accent-glow:rgba(59,130,246,.35);
|
| 14 |
+
--success:#22c55e;--warning:#f59e0b;--danger:#ef4444;--chemical:#a78bfa;
|
| 15 |
+
--text:#e2e8f0;--text-muted:#64748b;--text-dim:#94a3b8;
|
| 16 |
+
--radius:12px;--radius-sm:8px;
|
| 17 |
+
--font:'Inter',system-ui,sans-serif;--mono:'JetBrains Mono',monospace;
|
| 18 |
+
}
|
| 19 |
+
*{box-sizing:border-box;margin:0;padding:0}
|
| 20 |
+
body{background:var(--bg);color:var(--text);font-family:var(--font);min-height:100vh;
|
| 21 |
+
display:grid;grid-template-rows:auto 1fr;overflow:hidden;height:100vh}
|
| 22 |
+
|
| 23 |
+
/* ── Header ── */
|
| 24 |
+
header{display:flex;align-items:center;justify-content:space-between;
|
| 25 |
+
padding:14px 24px;background:var(--surface);border-bottom:1px solid var(--border);
|
| 26 |
+
backdrop-filter:blur(8px);}
|
| 27 |
+
.logo{display:flex;align-items:center;gap:10px}
|
| 28 |
+
.logo-icon{width:36px;height:36px;background:linear-gradient(135deg,var(--accent),#8b5cf6);
|
| 29 |
+
border-radius:9px;display:flex;align-items:center;justify-content:center;font-size:18px}
|
| 30 |
+
.logo-text{font-weight:700;font-size:18px;letter-spacing:-.3px}
|
| 31 |
+
.logo-sub{font-size:11px;color:var(--text-muted);font-weight:400;margin-top:1px}
|
| 32 |
+
.header-status{display:flex;align-items:center;gap:8px;font-size:13px}
|
| 33 |
+
.dot{width:8px;height:8px;border-radius:50%;background:var(--success);
|
| 34 |
+
box-shadow:0 0 8px var(--success);animation:pulse 2s infinite}
|
| 35 |
+
@keyframes pulse{0%,100%{opacity:1}50%{opacity:.5}}
|
| 36 |
+
|
| 37 |
+
/* ── Layout ── */
|
| 38 |
+
main{display:grid;grid-template-columns:1fr 380px;gap:0;overflow:hidden}
|
| 39 |
+
|
| 40 |
+
/* ── Left: Simulation panel ── */
|
| 41 |
+
.sim-panel{display:flex;flex-direction:column;padding:20px;gap:16px;overflow:hidden}
|
| 42 |
+
.sim-header{display:flex;align-items:center;justify-content:space-between}
|
| 43 |
+
.panel-title{font-size:13px;font-weight:600;color:var(--text-dim);text-transform:uppercase;letter-spacing:.8px}
|
| 44 |
+
.sim-container{flex:1;background:var(--surface);border:1px solid var(--border);
|
| 45 |
+
border-radius:var(--radius);overflow:hidden;position:relative;
|
| 46 |
+
display:flex;align-items:center;justify-content:center;min-height:300px}
|
| 47 |
+
#sim-video{width:100%;height:100%;object-fit:contain;display:block}
|
| 48 |
+
.sim-overlay{position:absolute;top:0;left:0;right:0;bottom:0;display:flex;
|
| 49 |
+
align-items:center;justify-content:center;background:rgba(10,13,20,.85);
|
| 50 |
+
flex-direction:column;gap:12px;transition:.3s}
|
| 51 |
+
.sim-overlay.hidden{opacity:0;pointer-events:none}
|
| 52 |
+
.spinner{width:40px;height:40px;border:3px solid var(--border);
|
| 53 |
+
border-top-color:var(--accent);border-radius:50%;animation:spin 1s linear infinite}
|
| 54 |
+
@keyframes spin{to{transform:rotate(360deg)}}
|
| 55 |
+
.sim-overlay p{color:var(--text-muted);font-size:14px}
|
| 56 |
+
|
| 57 |
+
/* ── Status cards ── */
|
| 58 |
+
.stats-row{display:grid;grid-template-columns:repeat(3,1fr);gap:12px}
|
| 59 |
+
.stat-card{background:var(--surface);border:1px solid var(--border);border-radius:var(--radius-sm);
|
| 60 |
+
padding:12px 16px}
|
| 61 |
+
.stat-label{font-size:11px;color:var(--text-muted);text-transform:uppercase;letter-spacing:.6px;margin-bottom:4px}
|
| 62 |
+
.stat-value{font-size:22px;font-weight:700;font-family:var(--mono)}
|
| 63 |
+
.stat-value.ok{color:var(--success)}
|
| 64 |
+
.stat-value.busy{color:var(--warning)}
|
| 65 |
+
|
| 66 |
+
/* ── Right: Agent panel ── */
|
| 67 |
+
.agent-panel{background:var(--surface);border-left:1px solid var(--border);
|
| 68 |
+
display:flex;flex-direction:column;overflow:hidden}
|
| 69 |
+
.agent-header{padding:16px 20px;border-bottom:1px solid var(--border);
|
| 70 |
+
display:flex;align-items:center;justify-content:space-between}
|
| 71 |
+
.agent-title{font-weight:600;font-size:15px}
|
| 72 |
+
.sdk-badge{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);
|
| 73 |
+
color:var(--accent);font-size:10px;font-weight:600;padding:2px 8px;
|
| 74 |
+
border-radius:20px;letter-spacing:.5px}
|
| 75 |
+
|
| 76 |
+
/* ── Transcript ── */
|
| 77 |
+
.transcript{flex:1;overflow-y:auto;padding:16px;display:flex;flex-direction:column;gap:10px;
|
| 78 |
+
scroll-behavior:smooth}
|
| 79 |
+
.transcript::-webkit-scrollbar{width:4px}
|
| 80 |
+
.transcript::-webkit-scrollbar-thumb{background:var(--border);border-radius:2px}
|
| 81 |
+
.msg{display:flex;flex-direction:column;gap:3px;animation:fadeIn .25s ease}
|
| 82 |
+
@keyframes fadeIn{from{opacity:0;transform:translateY(6px)}to{opacity:1;transform:none}}
|
| 83 |
+
.msg-role{font-size:10px;font-weight:600;text-transform:uppercase;letter-spacing:.6px;color:var(--text-muted)}
|
| 84 |
+
.msg-text{font-size:14px;line-height:1.55;padding:10px 13px;border-radius:10px;
|
| 85 |
+
background:var(--surface2);border:1px solid var(--border);max-width:100%}
|
| 86 |
+
.msg.user .msg-role{color:var(--accent)}
|
| 87 |
+
.msg.user .msg-text{background:rgba(59,130,246,.08);border-color:rgba(59,130,246,.2)}
|
| 88 |
+
.msg.agent .msg-role{color:var(--success)}
|
| 89 |
+
.msg.agent .msg-text{background:rgba(34,197,94,.06);border-color:rgba(34,197,94,.15)}
|
| 90 |
+
.msg.system .msg-role{color:var(--text-muted)}
|
| 91 |
+
.msg.system .msg-text{font-family:var(--mono);font-size:12px;background:var(--surface);
|
| 92 |
+
border-style:dashed;white-space:pre-wrap}
|
| 93 |
+
.msg.warning .msg-role{color:var(--warning)}
|
| 94 |
+
.msg.warning .msg-text{background:rgba(245,158,11,.08);border-color:rgba(245,158,11,.3)}
|
| 95 |
+
|
| 96 |
+
/* ── Quota warning banner ── */
|
| 97 |
+
#quota-banner{display:none;background:rgba(245,158,11,.1);border:1px solid rgba(245,158,11,.35);
|
| 98 |
+
border-radius:var(--radius-sm);margin:0 16px;padding:10px 14px;
|
| 99 |
+
font-size:12px;color:var(--warning);line-height:1.5}
|
| 100 |
+
#quota-banner.show{display:block}
|
| 101 |
+
|
| 102 |
+
/* ── Input area ── */
|
| 103 |
+
.input-area{padding:14px 16px;border-top:1px solid var(--border);display:flex;flex-direction:column;gap:10px}
|
| 104 |
+
.input-row{display:flex;gap:8px}
|
| 105 |
+
#cmd-input{flex:1;background:var(--surface2);border:1px solid var(--border);
|
| 106 |
+
border-radius:var(--radius-sm);padding:10px 14px;color:var(--text);
|
| 107 |
+
font-family:var(--font);font-size:14px;outline:none;transition:.2s}
|
| 108 |
+
#cmd-input:focus{border-color:var(--accent);box-shadow:0 0 0 3px var(--accent-glow)}
|
| 109 |
+
#cmd-input::placeholder{color:var(--text-muted)}
|
| 110 |
+
.btn{border:none;cursor:pointer;border-radius:var(--radius-sm);font-family:var(--font);
|
| 111 |
+
font-weight:600;font-size:13px;transition:.18s;display:flex;align-items:center;gap:6px;
|
| 112 |
+
white-space:nowrap}
|
| 113 |
+
.btn:active{transform:scale(.96)}
|
| 114 |
+
.btn-primary{background:var(--accent);color:#fff;padding:10px 18px}
|
| 115 |
+
.btn-primary:hover{background:#2563eb}
|
| 116 |
+
.btn-voice{background:var(--surface2);border:1px solid var(--border);color:var(--text);padding:10px 14px;font-size:16px}
|
| 117 |
+
.btn-voice.listening{background:rgba(239,68,68,.15);border-color:var(--danger);color:var(--danger);animation:pulse 1s infinite}
|
| 118 |
+
.btn-voice:hover{border-color:var(--accent)}
|
| 119 |
+
.action-btns{display:flex;gap:8px}
|
| 120 |
+
.btn-action{flex:1;padding:9px 12px;font-size:12px}
|
| 121 |
+
.btn-scan{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);color:var(--accent)}
|
| 122 |
+
.btn-scan:hover{background:rgba(59,130,246,.22)}
|
| 123 |
+
.btn-sort{background:rgba(34,197,94,.1);border:1px solid rgba(34,197,94,.3);color:var(--success)}
|
| 124 |
+
.btn-sort:hover{background:rgba(34,197,94,.2)}
|
| 125 |
+
.btn-state{background:rgba(167,139,250,.1);border:1px solid rgba(167,139,250,.3);color:var(--chemical)}
|
| 126 |
+
.btn-state:hover{background:rgba(167,139,250,.2)}
|
| 127 |
+
.stt-hint{font-size:11px;color:var(--text-muted);text-align:center}
|
| 128 |
+
|
| 129 |
+
/* ── Item list ── */
|
| 130 |
+
.items-section{padding:0 16px 10px;border-top:1px solid var(--border);padding-top:10px}
|
| 131 |
+
.items-title{font-size:11px;font-weight:600;color:var(--text-muted);text-transform:uppercase;
|
| 132 |
+
letter-spacing:.6px;margin-bottom:8px}
|
| 133 |
+
.item-list{display:flex;flex-direction:column;gap:5px;max-height:110px;overflow-y:auto}
|
| 134 |
+
.item-pill{display:flex;align-items:center;justify-content:space-between;
|
| 135 |
+
padding:5px 10px;border-radius:6px;font-size:12px;
|
| 136 |
+
background:var(--surface2);border:1px solid var(--border)}
|
| 137 |
+
.item-pill .name{font-family:var(--mono);color:var(--text-dim)}
|
| 138 |
+
.item-pill .badge{font-size:10px;font-weight:600;padding:2px 7px;border-radius:20px}
|
| 139 |
+
.badge.flammable{background:rgba(239,68,68,.15);color:#f87171;border:1px solid rgba(239,68,68,.25)}
|
| 140 |
+
.badge.chemical{background:rgba(167,139,250,.15);color:#c4b5fd;border:1px solid rgba(167,139,250,.25)}
|
| 141 |
+
.badge.safe{background:rgba(100,116,139,.15);color:#94a3b8;border:1px solid rgba(100,116,139,.25)}
|
| 142 |
+
.badge.sorted{background:rgba(34,197,94,.1);color:#4ade80;border:1px solid rgba(34,197,94,.2)}
|
| 143 |
+
</style>
|
| 144 |
+
</head>
|
| 145 |
+
<body>

<header>
  <div class="logo">
    <div class="logo-icon">🤖</div>
    <div>
      <div class="logo-text">SemSorter</div>
      <div class="logo-sub">AI Hazard Sorting — Vision-Agents SDK</div>
    </div>
  </div>
  <div class="header-status">
    <div class="dot" id="conn-dot"></div>
    <span id="conn-label">Connecting…</span>
  </div>
</header>

<main>
  <!-- Left: simulation video -->
  <div class="sim-panel">
    <div class="sim-header">
      <span class="panel-title">Live Simulation Feed</span>
      <span style="font-size:12px;color:var(--text-muted)" id="fps-label">— fps</span>
    </div>

    <div class="sim-container">
      <img id="sim-video" alt="MuJoCo simulation" src=""/>
      <div class="sim-overlay" id="sim-overlay">
        <div class="spinner"></div>
        <p>Warming up simulation…</p>
      </div>
    </div>

    <div class="stats-row">
      <div class="stat-card">
        <div class="stat-label">Items Sorted</div>
        <div class="stat-value ok" id="stat-sorted">0</div>
      </div>
      <div class="stat-card">
        <div class="stat-label">Arm Status</div>
        <div class="stat-value ok" id="stat-arm">Idle</div>
      </div>
      <div class="stat-card">
        <div class="stat-label">Sim Time</div>
        <div class="stat-value" id="stat-time" style="font-size:18px">0.0 s</div>
      </div>
    </div>
  </div>

  <!-- Right: agent chat panel -->
  <div class="agent-panel">
    <div class="agent-header">
      <div class="agent-title">SemSorter AI</div>
      <div class="sdk-badge">Vision-Agents SDK</div>
    </div>

    <div id="quota-banner"></div>

    <div class="transcript" id="transcript"></div>

    <div class="items-section">
      <div class="items-title">Conveyor Items</div>
      <div class="item-list" id="item-list"><span style="font-size:12px;color:var(--text-muted)">Loading…</span></div>
    </div>

    <div class="input-area">
      <div class="action-btns">
        <button class="btn btn-action btn-scan" onclick="sendWs('scan')">🔍 Scan</button>
        <button class="btn btn-action btn-sort" onclick="sendWs('sort')">⚡ Sort All</button>
        <button class="btn btn-action btn-state" onclick="sendWs('state')">📊 State</button>
      </div>

      <div class="input-row">
        <input id="cmd-input" placeholder="Type a command…" autocomplete="off"
               onkeydown="if(event.key==='Enter')sendCommand()"/>
        <button class="btn btn-voice" id="voice-btn" onclick="toggleVoice()" title="Voice input">🎤</button>
        <button class="btn btn-primary" onclick="sendCommand()">Send</button>
      </div>
      <div class="stt-hint" id="stt-hint">Using browser speech recognition</div>
    </div>
  </div>
</main>
<script>
// ─── WebSocket connections ────────────────────────────────────────────────────
const WS_BASE = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}`;
let chatWs = null;
let videoWs = null;
let reconnectDelay = 1000;

// ─── State ────────────────────────────────────────────────────────────────────
let frameCount = 0, lastFpsTime = Date.now();
let listening = false;
let recognition = null;

// ─── Chat WebSocket ───────────────────────────────────────────────────────────
function connectChat() {
  chatWs = new WebSocket(`${WS_BASE}/ws/chat`);
  chatWs.onopen = () => {
    setConnected(true);
    reconnectDelay = 1000;
    pollState();
  };
  chatWs.onmessage = ({data}) => handleChatMessage(JSON.parse(data));
  chatWs.onclose = () => {
    setConnected(false);
    setTimeout(connectChat, reconnectDelay = Math.min(reconnectDelay * 1.5, 10000));
  };
  chatWs.onerror = () => chatWs.close();
}

function handleChatMessage(msg) {
  switch (msg.type) {
    case 'welcome':        addMsg('agent', 'SemSorter AI', msg.text); break;
    case 'user_message':   addMsg('user', 'You', msg.text); break;
    case 'agent_response': addMsg('agent', 'SemSorter AI', msg.text); break;
    case 'scan_result':    renderScanResult(msg.data); break;
    case 'sort_result':    renderSortResult(msg.data); break;
    case 'state':          renderState(msg.data); break;
    case 'quota_warning':  showQuotaWarning(msg.service, msg.message); break;
    case 'system':         addMsg('system', 'System', msg.text); break;
  }
}

function sendWs(type, extra={}) {
  if (!chatWs || chatWs.readyState !== 1) return;  // 1 === WebSocket.OPEN
  chatWs.send(JSON.stringify({type, ...extra}));
}

function sendCommand() {
  const input = document.getElementById('cmd-input');
  const text = input.value.trim();
  if (!text) return;
  input.value = '';
  sendWs('command', {text});
}

// ─── Video WebSocket ──────────────────────────────────────────────────────────
function connectVideo() {
  videoWs = new WebSocket(`${WS_BASE}/ws/video`);
  videoWs.onmessage = ({data}) => {
    const {type, data: b64} = JSON.parse(data);
    if (type === 'frame') {
      const img = document.getElementById('sim-video');
      img.src = `data:image/jpeg;base64,${b64}`;
      document.getElementById('sim-overlay').classList.add('hidden');
      frameCount++;
    }
  };
  videoWs.onclose = () => setTimeout(connectVideo, 2000);
  videoWs.onerror = () => videoWs.close();
}

// FPS counter
setInterval(() => {
  const now = Date.now();
  const fps = Math.round(frameCount / ((now - lastFpsTime) / 1000));
  document.getElementById('fps-label').textContent = `${fps} fps`;
  frameCount = 0; lastFpsTime = now;
}, 2000);

// ─── State polling ────────────────────────────────────────────────────────────
let pollTimer = null;
function pollState() {
  clearTimeout(pollTimer);  // avoid stacking polling loops across reconnects
  fetch('/api/state').then(r => r.json()).then(renderState).catch(() => {});
  pollTimer = setTimeout(pollState, 3000);
}

function renderState(s) {
  document.getElementById('stat-sorted').textContent = s.items_sorted ?? 0;
  const armEl = document.getElementById('stat-arm');
  armEl.textContent = s.arm_busy ? 'Busy' : 'Idle';
  armEl.className = `stat-value ${s.arm_busy ? 'busy' : 'ok'}`;
  document.getElementById('stat-time').textContent = `${s.time ?? 0} s`;
  if (s.items) renderItems(s.items);
  if (s.quota_exceeded) Object.entries(s.quota_exceeded).forEach(([svc, exceeded]) => {
    if (exceeded) showQuotaWarning(svc, `⚠️ ${svc} quota exceeded — demo mode active`);
  });
}

function renderItems(items) {
  const list = document.getElementById('item-list');
  list.innerHTML = items.map(i => {
    const type = i.hazard_type || 'safe';
    const cls = i.picked ? 'sorted' : type.toLowerCase();
    const label = i.picked ? '✓ sorted' : type;
    return `<div class="item-pill">
      <span class="name">${i.name}</span>
      <span class="badge ${cls}">${label}</span>
    </div>`;
  }).join('');
}

// ─── Scan / sort renderers ────────────────────────────────────────────────────
function renderScanResult(d) {
  const demoNote = d.demo_mode ? ' [demo mode]' : '';
  const lines = [`Found ${d.hazards_found} hazardous item(s)${demoNote}:`];
  (d.items || []).forEach(i =>
    lines.push(`  • ${i.item_name} (${i.type}) → ${i.bin_type} bin`));
  addMsg('system', 'Scan Result', lines.join('\n'));
}

function renderSortResult(d) {
  const demoNote = d.demo_mode ? ' [demo mode]' : '';
  const lines = [
    `Sorted ${d.items_sorted}/${d.items_matched} item(s)${demoNote}:`,
    ...(d.details || []).map(x => `  ${x.success ? '✅' : '❌'} ${x.item} → ${x.bin}`)
  ];
  addMsg('system', 'Sort Result', lines.join('\n'));
}

// ─── Quota warning ────────────────────────────────────────────────────────────
const _shownWarnings = new Set();
function showQuotaWarning(service, message) {
  if (_shownWarnings.has(service)) return;
  _shownWarnings.add(service);
  const banner = document.getElementById('quota-banner');
  banner.textContent = message;
  banner.classList.add('show');
  addMsg('warning', 'API Status', message);
}

// ─── Transcript helpers ───────────────────────────────────────────────────────
function addMsg(cls, role, text) {
  const t = document.getElementById('transcript');
  const div = document.createElement('div');
  div.className = `msg ${cls}`;
  div.innerHTML = `<div class="msg-role">${role}</div><div class="msg-text">${escHtml(text)}</div>`;
  t.appendChild(div);
  t.scrollTop = t.scrollHeight;
}
function escHtml(s){ return s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/\n/g,'<br>'); }

// ─── Connection status ────────────────────────────────────────────────────────
function setConnected(ok) {
  const dot = document.getElementById('conn-dot');
  const lbl = document.getElementById('conn-label');
  dot.style.background = ok ? 'var(--success)' : 'var(--danger)';
  dot.style.boxShadow = ok ? '0 0 8px var(--success)' : '0 0 8px var(--danger)';
  dot.style.animation = ok ? 'pulse 2s infinite' : 'none';
  lbl.textContent = ok ? 'Connected' : 'Reconnecting…';
}

// ─── Voice input (Web Speech API) ────────────────────────────────────────────
function toggleVoice() {
  const SpeechRec = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!SpeechRec) {
    addMsg('system', 'System', 'Browser speech recognition not supported. Use the text input.');
    return;
  }
  if (listening) { recognition.stop(); return; }

  recognition = new SpeechRec();
  recognition.lang = 'en-US';
  recognition.interimResults = false;
  recognition.onresult = e => {
    const text = e.results[0][0].transcript;
    document.getElementById('cmd-input').value = text;
    document.getElementById('stt-hint').textContent = `Heard: "${text}" — sending…`;
    sendCommand();
  };
  recognition.onend = () => {
    listening = false;
    document.getElementById('voice-btn').classList.remove('listening');
    document.getElementById('stt-hint').textContent = 'Using browser speech recognition';
  };
  recognition.onerror = e => {
    addMsg('system', 'STT', `Speech recognition error: ${e.error}`);
    listening = false;
    document.getElementById('voice-btn').classList.remove('listening');
  };
  recognition.start();
  listening = true;
  document.getElementById('voice-btn').classList.add('listening');
  document.getElementById('stt-hint').textContent = '🎙️ Listening… speak now';
}

// ─── Boot ─────────────────────────────────────────────────────────────────────
setConnected(false);
connectChat();
connectVideo();
</script>
</body>
</html>
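The video socket handler above expects each message to be a JSON object of the form `{"type": "frame", "data": "<base64 JPEG>"}` before it assigns `img.src`. A minimal Python sketch of the matching server-side encoding (the real sender lives in `SemSorter/server/agent_bridge.py`, not shown here; `encode_frame_message` is a hypothetical helper name, not that module's API):

```python
import base64
import json

def encode_frame_message(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes in the JSON envelope the browser client consumes."""
    return json.dumps({
        "type": "frame",
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    })
```

The client then renders it directly as `data:image/jpeg;base64,${b64}`, so the payload must be plain base64 with no data-URI prefix.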
SemSorter/simulation/__init__.py ADDED
@@ -0,0 +1 @@
# SemSorter Simulation Module
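The chat socket in `index.html` dispatches on a fixed set of `type` values (`welcome`, `user_message`, `agent_response`, `scan_result`, `sort_result`, `state`, `quota_warning`, `system`). A hedged sketch of how a server might build those envelopes; `chat_message` is an illustrative helper, not the actual `agent_bridge.py` API:

```python
import json

# Message types handled by handleChatMessage() in index.html.
CHAT_TYPES = {
    "welcome", "user_message", "agent_response",
    "scan_result", "sort_result", "state", "quota_warning", "system",
}

def chat_message(msg_type: str, **payload) -> str:
    """Serialize one chat-socket envelope; reject unknown types early."""
    if msg_type not in CHAT_TYPES:
        raise ValueError(f"unknown chat message type: {msg_type}")
    return json.dumps({"type": msg_type, **payload})
```

Validating the `type` on the sending side keeps silent drops in the browser's `switch` from hiding protocol mistakes.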
SemSorter/simulation/controller.py ADDED
@@ -0,0 +1,786 @@
"""
SemSorter MuJoCo Simulation Controller

This module manages the Franka Panda robotic arm simulation for the SemSorter
project. It loads the Panda from mujoco_menagerie, adds conveyors, waste bins,
and hazardous items, then provides an async API for pick-and-place operations.

Usage:
    python controller.py            # Launch interactive viewer
    python controller.py --render   # Render a test frame to PNG
"""

import asyncio
import json
import logging
import math
import os
import time
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import mujoco
import mujoco.viewer
import numpy as np

logger = logging.getLogger(__name__)

# ─── Path configuration ─────────────────────────────────────────────────────
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent  # repo root (contains mujoco_menagerie/)
MENAGERIE_DIR = PROJECT_ROOT / "mujoco_menagerie"
PANDA_SCENE = MENAGERIE_DIR / "franka_emika_panda" / "scene.xml"


# ─── Data types ──────────────────────────────────────────────────────────────
class BinType(str, Enum):
    FLAMMABLE = "flammable"
    CHEMICAL = "chemical"
    OUTPUT = "output"  # safe items go to the output conveyor


@dataclass
class ItemInfo:
    """Metadata for a conveyor item."""
    name: str
    body_id: int
    geom_id: int
    is_hazardous: bool
    hazard_type: Optional[BinType] = None  # which bin it should go to
    picked: bool = False


@dataclass
class SimState:
    """Observable simulation state for the frontend."""
    time: float = 0.0
    ee_pos: Tuple[float, float, float] = (0, 0, 0)
    gripper_open: bool = True
    items: List[Dict] = field(default_factory=list)
    arm_busy: bool = False
    items_sorted: int = 0


# ─── Bin positions (world coordinates) ───────────────────────────────────────
BIN_POSITIONS = {
    BinType.FLAMMABLE: np.array([-0.25, -0.40, 0.35]),  # Above the red bin
    BinType.CHEMICAL: np.array([0.25, -0.40, 0.35]),    # Above the yellow bin
    BinType.OUTPUT: np.array([0.40, 0.0, 0.40]),        # Output conveyor
}

# ─── Panda joint configuration ──────────────────────────────────────────────
# Actuator indices (from panda.xml):
#   0-6: arm joints (actuator1-7)
#   7:   gripper (actuator8, ctrl 0=closed, 255=fully open)
GRIPPER_ACTUATOR_ID = 7
GRIPPER_OPEN = 255.0
GRIPPER_CLOSED = 0.0
NUM_ARM_JOINTS = 7
ENV_CONTACT_TYPE = 2  # Keep environment/item contacts separate from robot links.
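The constants above determine where the arm releases each item: hazardous items go to the hover point over their labeled bin, everything else to the output conveyor. The routing rule in isolation (types and positions re-declared locally so the snippet runs on its own; `drop_target` is an illustrative helper, not part of the controller):

```python
from enum import Enum
from typing import Optional, Tuple

class BinType(str, Enum):  # mirrors the module's BinType
    FLAMMABLE = "flammable"
    CHEMICAL = "chemical"
    OUTPUT = "output"

# Hover points above each drop target, copied from BIN_POSITIONS (x, y, z).
DROP_POINTS = {
    BinType.FLAMMABLE: (-0.25, -0.40, 0.35),
    BinType.CHEMICAL: (0.25, -0.40, 0.35),
    BinType.OUTPUT: (0.40, 0.0, 0.40),
}

def drop_target(is_hazardous: bool,
                hazard_type: Optional[BinType]) -> Tuple[float, float, float]:
    """Hazardous items go to their labeled bin; everything else to output."""
    if is_hazardous and hazard_type is not None:
        return DROP_POINTS[hazard_type]
    return DROP_POINTS[BinType.OUTPUT]
```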
class SemSorterSimulation:
    """
    MuJoCo simulation controller for the SemSorter pick-and-place task.

    Loads the Franka Panda from menagerie, adds the warehouse environment
    (conveyors, bins, items), and provides an async API for robot control.
    """

    def __init__(self):
        self.model: Optional[mujoco.MjModel] = None
        self.data: Optional[mujoco.MjData] = None
        self.renderer: Optional[mujoco.Renderer] = None
        self.items: Dict[str, ItemInfo] = {}
        self._arm_busy = False
        self._items_sorted = 0
        self._running = False

    # ─── Scene loading ───────────────────────────────────────────────────

    def load_scene(self) -> None:
        """Load the Panda scene from menagerie and add SemSorter objects."""
        # Load base Panda scene
        logger.info(f"Loading Panda from: {PANDA_SCENE}")
        spec = mujoco.MjSpec.from_file(str(PANDA_SCENE))

        # Modify the model name
        spec.modelname = "semsorter"

        # Set offscreen framebuffer size for rendering
        spec.visual.global_.offwidth = 1920
        spec.visual.global_.offheight = 1080

        # ─── Add additional lights ───────────────────────────────────────
        world = spec.worldbody
        light = world.add_light()
        light.pos = [0, -1, 2]
        light.dir = [0, 0.5, -0.8]
        light.diffuse = [0.4, 0.4, 0.4]

        light2 = world.add_light()
        light2.pos = [-1, -1, 2]
        light2.dir = [0.3, 0.3, -0.8]
        light2.diffuse = [0.3, 0.3, 0.3]

        # ─── Add cameras ────────────────────────────────────────────────
        cam_overview = world.add_camera()
        cam_overview.name = "overview"
        cam_overview.pos = [0, -1.4, 1.3]
        cam_overview.quat = [0.92, 0.38, 0, 0]  # Look slightly down
        cam_overview.fovy = 50

        cam_top = world.add_camera()
        cam_top.name = "topdown"
        cam_top.pos = [0, 0, 2.0]
        cam_top.quat = [0.0, 0.0, 0.0, 1.0]  # Look straight down
        cam_top.fovy = 60

        cam_side = world.add_camera()
        cam_side.name = "side"
        cam_side.pos = [1.5, 0, 0.8]
        cam_side.quat = [0.65, 0.27, 0.27, 0.65]  # Side view
        cam_side.fovy = 45

        # ─── Add conveyors ──────────────────────────────────────────────
        self._add_conveyor(spec, "input", pos=[-0.40, 0, 0])
        self._add_conveyor(spec, "output", pos=[0.40, 0, 0])

        # ─── Add waste bins ─────────────────────────────────────────────
        self._add_bin(spec, "flammable", pos=[-0.25, -0.40, 0],
                      color=[0.85, 0.15, 0.1, 0.9])
        self._add_bin(spec, "chemical", pos=[0.25, -0.40, 0],
                      color=[0.95, 0.75, 0.1, 0.9])

        # ─── Add hazardous items on input conveyor ──────────────────────
        items_spec = [
            ("item_flammable_1", [-0.50, 0.0, 0.40], "cylinder", [0.025, 0.03],
             [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
            ("item_chemical_1", [-0.40, 0.05, 0.40], "box", [0.025, 0.025, 0.025],
             [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
            ("item_chemical_2", [-0.30, -0.03, 0.40], "sphere", [0.025],
             [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
            ("item_safe_1", [-0.35, -0.05, 0.40], "box", [0.03, 0.025, 0.02],
             [0.6, 0.6, 0.6, 1], False, BinType.OUTPUT),
            ("item_safe_2", [-0.55, 0.04, 0.40], "cylinder", [0.022, 0.025],
             [0.9, 0.9, 0.9, 1], False, BinType.OUTPUT),
            ("item_flammable_2", [-0.45, 0.02, 0.40], "box", [0.022, 0.022, 0.022],
             [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
        ]

        for name, pos, shape, size, rgba, is_haz, haz_type in items_spec:
            self._add_item(spec, name, pos, shape, size, rgba)
            self.items[name] = ItemInfo(
                name=name, body_id=-1, geom_id=-1,
                is_hazardous=is_haz, hazard_type=haz_type if is_haz else None,
            )

        # Store desired spawn positions for post-keyframe initialization
        self._item_spawn_positions = {
            name: pos for name, pos, *_ in items_spec
        }

        # ─── Compile the model ──────────────────────────────────────────
        self.model = spec.compile()
        self.data = mujoco.MjData(self.model)

        # Keep floor contacts in the environment collision group (not robot group).
        floor_geom_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_GEOM, "floor")
        if floor_geom_id >= 0:
            self.model.geom_contype[floor_geom_id] = ENV_CONTACT_TYPE
            self.model.geom_conaffinity[floor_geom_id] = ENV_CONTACT_TYPE

        # Resolve body/geom IDs for items
        for name in self.items:
            self.items[name].body_id = mujoco.mj_name2id(
                self.model, mujoco.mjtObj.mjOBJ_BODY, name)
            geom_name = f"{name}_geom"
            self.items[name].geom_id = mujoco.mj_name2id(
                self.model, mujoco.mjtObj.mjOBJ_GEOM, geom_name)

        # ─── Reset to home pose ─────────────────────────────────────────
        key_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
        if key_id >= 0:
            mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)

        # ─── Set item initial positions (keyframe only has arm joints) ──
        for name, pos in self._item_spawn_positions.items():
            jnt_name = f"{name}_jnt"
            jnt_id = mujoco.mj_name2id(
                self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
            if jnt_id >= 0:
                qadr = self.model.jnt_qposadr[jnt_id]
                # freejoint qpos: [x, y, z, qw, qx, qy, qz]
                self.data.qpos[qadr:qadr+3] = pos
                self.data.qpos[qadr+3:qadr+7] = [1, 0, 0, 0]  # identity quat

        mujoco.mj_forward(self.model, self.data)

        logger.info(f"Scene compiled: {self.model.nbody} bodies, "
                    f"{self.model.njnt} joints, {self.model.nu} actuators")
        logger.info(f"Items registered: {list(self.items.keys())}")
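load_scene seeds each item by writing seven values into `data.qpos` starting at `model.jnt_qposadr[jnt_id]`, because the `home` keyframe only covers the arm joints. The free-joint layout can be shown without MuJoCo at all; a plain list stands in for `qpos`, and `write_freejoint_pose` is a hypothetical helper for illustration:

```python
# A free joint occupies 7 consecutive qpos slots: [x, y, z, qw, qx, qy, qz].
# `qpos` stands in for data.qpos; `qadr` plays the role of
# model.jnt_qposadr[jnt_id] for the item's free joint.
def write_freejoint_pose(qpos, qadr, pos, quat=(1.0, 0.0, 0.0, 0.0)):
    qpos[qadr:qadr + 3] = list(pos)        # world position
    qpos[qadr + 3:qadr + 7] = list(quat)   # orientation (identity = upright)
    return qpos

qpos = [0.0] * 16   # e.g. 9 arm/gripper slots followed by one free-jointed item
write_freejoint_pose(qpos, qadr=9, pos=(-0.50, 0.0, 0.40))
```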
    def _add_conveyor(self, spec: mujoco.MjSpec, name: str, pos: list) -> None:
        """Add a conveyor belt with frame and legs."""
        world = spec.worldbody
        body = world.add_body()
        body.name = f"conveyor_{name}"
        body.pos = pos

        # Belt surface
        belt = body.add_geom()
        belt.name = f"belt_{name}"
        belt.type = mujoco.mjtGeom.mjGEOM_BOX
        belt.size = [0.35, 0.12, 0.005]
        belt.pos = [0, 0, 0.35]
        belt.rgba = [0.15, 0.15, 0.15, 1]
        belt.friction = [0.8, 0.005, 0.0001]
        belt.contype = ENV_CONTACT_TYPE
        belt.conaffinity = ENV_CONTACT_TYPE

        # Side rails
        for side_name, y in [("L", 0.125), ("R", -0.125)]:
            rail = body.add_geom()
            rail.name = f"rail_{name}_{side_name}"
            rail.type = mujoco.mjtGeom.mjGEOM_BOX
            rail.size = [0.35, 0.005, 0.02]
            rail.pos = [0, y, 0.37]
            rail.rgba = [0.4, 0.4, 0.45, 1]
            rail.contype = ENV_CONTACT_TYPE
            rail.conaffinity = ENV_CONTACT_TYPE

        # Legs
        for lx, ly in [(-0.3, 0.1), (-0.3, -0.1), (0.3, 0.1), (0.3, -0.1)]:
            leg = body.add_geom()
            leg.type = mujoco.mjtGeom.mjGEOM_CYLINDER
            leg.size = [0.015, 0.175, 0]
            leg.pos = [lx, ly, 0.175]
            leg.rgba = [0.4, 0.4, 0.45, 1]
            leg.contype = ENV_CONTACT_TYPE
            leg.conaffinity = ENV_CONTACT_TYPE

    def _add_bin(self, spec: mujoco.MjSpec, name: str, pos: list,
                 color: list) -> None:
        """Add an open-top waste bin."""
        world = spec.worldbody
        body = world.add_body()
        body.name = f"bin_{name}"
        body.pos = pos

        # Walls
        wall_specs = [
            (f"bin_{name}_back", [0, -0.095, 0.12], [0.1, 0.005, 0.12]),
            (f"bin_{name}_front", [0, 0.095, 0.12], [0.1, 0.005, 0.12]),
            (f"bin_{name}_left", [-0.095, 0, 0.12], [0.005, 0.1, 0.12]),
            (f"bin_{name}_right", [0.095, 0, 0.12], [0.005, 0.1, 0.12]),
        ]
        for wname, wpos, wsize in wall_specs:
            wall = body.add_geom()
            wall.name = wname
            wall.type = mujoco.mjtGeom.mjGEOM_BOX
            wall.size = wsize
            wall.pos = wpos
            wall.rgba = color
            wall.contype = ENV_CONTACT_TYPE
            wall.conaffinity = ENV_CONTACT_TYPE

        # Bottom
        bottom = body.add_geom()
        bottom.name = f"bin_{name}_bottom"
        bottom.type = mujoco.mjtGeom.mjGEOM_BOX
        bottom.size = [0.1, 0.1, 0.005]
        bottom.pos = [0, 0, 0.005]
        bottom.rgba = [0.1, 0.1, 0.1, 1]
        bottom.contype = ENV_CONTACT_TYPE
        bottom.conaffinity = ENV_CONTACT_TYPE

    def _add_item(self, spec: mujoco.MjSpec, name: str, pos: list,
                  shape: str, size: list, rgba: list) -> None:
        """Add a free-jointed item to the world."""
        world = spec.worldbody
        body = world.add_body()
        body.name = name
        body.pos = pos

        # Free joint
        jnt = body.add_freejoint()
        jnt.name = f"{name}_jnt"

        # Geom
        geom = body.add_geom()
        geom.name = f"{name}_geom"
        shape_map = {
            "box": mujoco.mjtGeom.mjGEOM_BOX,
            "sphere": mujoco.mjtGeom.mjGEOM_SPHERE,
            "cylinder": mujoco.mjtGeom.mjGEOM_CYLINDER,
        }
        geom.type = shape_map[shape]
        geom.size = size + [0] * (3 - len(size))  # Pad to 3 elements
        geom.rgba = rgba
        geom.mass = 0.05
        geom.friction = [1.0, 0.005, 0.0001]
        geom.priority = 1
        geom.contype = ENV_CONTACT_TYPE
        geom.conaffinity = ENV_CONTACT_TYPE
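Farther down in this file, `solve_ik` performs damped-least-squares Jacobian iteration (see its `damping` and `step_size` parameters). The update rule, dq = step · Jᵀ(JJᵀ + λ²I)⁻¹ · err, can be demonstrated on a toy 2-link planar arm with a hand-inverted 2×2 system; all names and link lengths below are illustrative and unrelated to the Panda model:

```python
import math

L1, L2 = 0.4, 0.3  # toy link lengths (illustrative only)

def fk(q):
    """End-effector (x, y) of a 2-link planar arm."""
    return (L1 * math.cos(q[0]) + L2 * math.cos(q[0] + q[1]),
            L1 * math.sin(q[0]) + L2 * math.sin(q[0] + q[1]))

def err(q, target):
    x, y = fk(q)
    return math.hypot(target[0] - x, target[1] - y)

def dls_step(q, target, damping=0.05, step=0.5):
    """One damped-least-squares update: dq = step * J^T (J J^T + d^2 I)^-1 e."""
    x, y = fk(q)
    ex, ey = target[0] - x, target[1] - y
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    # Analytic Jacobian of fk
    j = [[-L1 * s1 - L2 * s12, -L2 * s12],
         [ L1 * c1 + L2 * c12,  L2 * c12]]
    # A = J J^T + d^2 I is 2x2 symmetric; invert it by hand.
    a = j[0][0] ** 2 + j[0][1] ** 2 + damping ** 2
    b = j[0][0] * j[1][0] + j[0][1] * j[1][1]
    d = j[1][0] ** 2 + j[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    vx = ( d * ex - b * ey) / det
    vy = (-b * ex + a * ey) / det
    # dq = step * J^T v
    return [q[0] + step * (j[0][0] * vx + j[1][0] * vy),
            q[1] + step * (j[0][1] * vx + j[1][1] * vy)]

q = [0.2, 0.4]
target = (0.3, 0.35)       # reachable: |target| is between |L1-L2| and L1+L2
q_next = dls_step(q, target)
```

The damping term keeps the step bounded near singular configurations, which is the same trade-off the `damping` parameter controls in `solve_ik`.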
    # ─── End-effector helpers ────────────────────────────────────────────

    def get_ee_pos(self) -> np.ndarray:
        """Get current end-effector (hand) position in world coords."""
        hand_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
        return self.data.xpos[hand_id].copy()

    def get_ee_site_pos(self) -> np.ndarray:
        """Get EE position — alias."""
        return self.get_ee_pos()

    def get_item_pos(self, item_name: str) -> Optional[np.ndarray]:
        """Get position of an item by name."""
        info = self.items.get(item_name)
        if info and info.body_id >= 0:
            return self.data.xpos[info.body_id].copy()
        return None

    def _set_item_pose(self, item_name: str, pos: np.ndarray,
                       quat: Tuple[float, float, float, float] = (1, 0, 0, 0)) -> bool:
        """Directly place an item free-joint at a world pose."""
        jnt_name = f"{item_name}_jnt"
        jnt_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
        if jnt_id < 0:
            return False
        qadr = self.model.jnt_qposadr[jnt_id]
        self.data.qpos[qadr:qadr+3] = pos
        self.data.qpos[qadr+3:qadr+7] = quat
        dadr = self.model.jnt_dofadr[jnt_id]
        self.data.qvel[dadr:dadr+6] = 0.0
        return True

    # ─── IK (Solver-based) ────────────────────────────────────────────

    def reset_arm_neutral(self) -> None:
        """
        Move arm to a neutral upright pose where IK works well in all directions.
        """
        neutral_qpos = [0.0, -0.3, 0.0, -2.0, 0.0, 1.8, 0.0]
        # Set qpos directly for arm joints (first 7)
        self.data.qpos[:NUM_ARM_JOINTS] = neutral_qpos
        self.data.ctrl[:NUM_ARM_JOINTS] = neutral_qpos
        mujoco.mj_forward(self.model, self.data)

    def solve_ik(self, target_pos: np.ndarray,
                 target_quat: Optional[np.ndarray] = None,
                 max_iter: int = 300,
                 tolerance: float = 0.015,
                 step_size: float = 0.5,
                 damping: float = 0.05) -> Optional[np.ndarray]:
        """
        Pure kinematic IK solver — iterates Jacobian on qpos WITHOUT physics.
        Returns joint angles (length 7) or None if failed.
|
| 385 |
+
hand_id = mujoco.mj_name2id(
|
| 386 |
+
self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
|
| 387 |
+
|
| 388 |
+
# Save original qpos to restore later (critical for not corrupting physics)
|
| 389 |
+
orig_qpos = self.data.qpos.copy()
|
| 390 |
+
|
| 391 |
+
# Work on a copy of qpos
|
| 392 |
+
qpos_arm = orig_qpos[:NUM_ARM_JOINTS].copy()
|
| 393 |
+
|
| 394 |
+
try:
|
| 395 |
+
for _ in range(max_iter):
|
| 396 |
+
# Temporarily set qpos, run forward kinematics
|
| 397 |
+
self.data.qpos[:NUM_ARM_JOINTS] = qpos_arm
|
| 398 |
+
mujoco.mj_forward(self.model, self.data)
|
| 399 |
+
|
| 400 |
+
current_pos = self.data.xpos[hand_id].copy()
|
| 401 |
+
err_pos = target_pos - current_pos
|
| 402 |
+
|
| 403 |
+
# Position Jacobian
|
| 404 |
+
jacp = np.zeros((3, self.model.nv))
|
| 405 |
+
mujoco.mj_jacBody(self.model, self.data, jacp, None, hand_id)
|
| 406 |
+
J = jacp[:, :NUM_ARM_JOINTS]
|
| 407 |
+
error = err_pos
|
| 408 |
+
|
| 409 |
+
if target_quat is not None:
|
| 410 |
+
current_quat = self.data.xquat[hand_id].copy()
|
| 411 |
+
err_rot = np.zeros(3)
|
| 412 |
+
mujoco.mju_subQuat(err_rot, target_quat, current_quat)
|
| 413 |
+
|
| 414 |
+
# Rotation Jacobian
|
| 415 |
+
jacr = np.zeros((3, self.model.nv))
|
| 416 |
+
mujoco.mj_jacBody(self.model, self.data, None, jacr, hand_id)
|
| 417 |
+
Jr = jacr[:, :NUM_ARM_JOINTS]
|
| 418 |
+
|
| 419 |
+
# Scale rotational error so position takes priority
|
| 420 |
+
J = np.vstack([J, Jr * 0.5])
|
| 421 |
+
error = np.concatenate([error, err_rot * 0.5])
|
| 422 |
+
|
| 423 |
+
if np.linalg.norm(error) < tolerance:
|
| 424 |
+
return qpos_arm.copy()
|
| 425 |
+
|
| 426 |
+
# Damped least squares
|
| 427 |
+
JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
|
| 428 |
+
dq = J.T @ np.linalg.solve(JJT, error)
|
| 429 |
+
|
| 430 |
+
# Update with step size and clamping
|
| 431 |
+
dq = np.clip(dq * step_size, -0.2, 0.2)
|
| 432 |
+
qpos_arm += dq
|
| 433 |
+
|
| 434 |
+
# Clamp to joint limits
|
| 435 |
+
for j in range(NUM_ARM_JOINTS):
|
| 436 |
+
jnt_id = j # arm joints are first 7
|
| 437 |
+
lo = self.model.jnt_range[jnt_id, 0]
|
| 438 |
+
hi = self.model.jnt_range[jnt_id, 1]
|
| 439 |
+
if lo < hi:
|
| 440 |
+
qpos_arm[j] = np.clip(qpos_arm[j], lo * 0.95, hi * 0.95)
|
| 441 |
+
|
| 442 |
+
return None # Did not converge
|
| 443 |
+
|
| 444 |
+
finally:
|
| 445 |
+
# Always restore original qpos and run forward to fix physics state
|
| 446 |
+
self.data.qpos[:] = orig_qpos
|
| 447 |
+
mujoco.mj_forward(self.model, self.data)
|
| 448 |
+
|
| 449 |
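As a sanity check, the damped least-squares update used in `solve_ik` can be run in isolation on a toy 2-link planar arm. This is a hypothetical stand-in, not the Panda: `fk` and `jacobian` below are illustrative closed forms rather than MuJoCo calls, but the iteration (error, `J @ J.T + damping**2 * I`, clipped step) matches the method above.

```python
import numpy as np

def fk(q, l1=0.5, l2=0.4):
    """Forward kinematics of a toy 2-link planar arm (illustrative)."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q, l1=0.5, l2=0.4):
    """Analytic position Jacobian of the toy arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def solve_ik_dls(target, q0, max_iter=300, tol=1e-4,
                 step_size=0.5, damping=0.05):
    """Same update rule as solve_ik: damped least squares with a clipped step."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        error = target - fk(q)
        if np.linalg.norm(error) < tol:
            return q
        J = jacobian(q)
        JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
        dq = J.T @ np.linalg.solve(JJT, error)
        q += np.clip(dq * step_size, -0.2, 0.2)
    return None  # did not converge

q = solve_ik_dls(np.array([0.6, 0.3]), q0=[0.1, 0.1])
```

The damping term keeps the solve well-conditioned near singular poses at the cost of slightly slower convergence, which is why the controller pairs it with a generous `max_iter`.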
    def move_to_position(self, target_pos: np.ndarray,
                         move_steps: int = 400,
                         settle_steps: int = 100,
                         position_tolerance: float = 0.05,
                         carry_item: Optional[str] = None,
                         carry_offset: Optional[np.ndarray] = None) -> bool:
        """
        Move end-effector to target position.
        1. Solve IK kinematically
        2. Interpolate joint targets smoothly (ease-in/ease-out)
        3. Step physics to let arm move
        Returns True only if IK succeeds AND the target is reached
        within position_tolerance.
        """
        solution = self.solve_ik(target_pos)
        if solution is None:
            logger.warning(f"IK failed for target {target_pos}")
            return False

        current_ctrl = self.data.ctrl[:NUM_ARM_JOINTS].copy()

        if carry_item is not None and carry_offset is None:
            carry_offset = np.array([0.0, 0.0, -0.06])

        # Smooth interpolation to target
        for i in range(move_steps):
            alpha = (i + 1) / move_steps
            t = alpha * alpha * (3 - 2 * alpha)  # Smoothstep
            self.data.ctrl[:NUM_ARM_JOINTS] = current_ctrl * (1 - t) + solution * t
            mujoco.mj_step(self.model, self.data)
            if carry_item is not None:
                ee = self.get_ee_pos()
                self._set_item_pose(carry_item, ee + carry_offset)

        # Settle
        for _ in range(settle_steps):
            mujoco.mj_step(self.model, self.data)
            if carry_item is not None:
                ee = self.get_ee_pos()
                self._set_item_pose(carry_item, ee + carry_offset)

        if carry_item is not None:
            ee = self.get_ee_pos()
            self._set_item_pose(carry_item, ee + carry_offset)
            mujoco.mj_forward(self.model, self.data)

        final_ee = self.get_ee_pos()
        err = np.linalg.norm(target_pos - final_ee)
        if err > position_tolerance:
            logger.warning(
                f"Move failed: target {target_pos}, reached {final_ee}, err={err:.4f}")
            return False
        return True

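The `t = alpha * alpha * (3 - 2 * alpha)` blend above is the standard smoothstep polynomial. Isolated from the controller, its key properties (fixed endpoints, monotone ramp) are easy to verify; `blend` below mirrors how the joint targets are mixed, with hypothetical scalar inputs standing in for the ctrl arrays.

```python
def smoothstep(alpha: float) -> float:
    """Cubic ease-in/ease-out: 0 -> 0, 1 -> 1, zero slope at both ends."""
    return alpha * alpha * (3 - 2 * alpha)

def blend(start: float, goal: float, alpha: float) -> float:
    """Mix two joint targets the same way move_to_position does."""
    t = smoothstep(alpha)
    return start * (1 - t) + goal * t

# Midpoint of the ramp lands exactly halfway between start and goal
mid = blend(0.0, 2.0, 0.5)
```

The zero slope at both ends is what makes the arm start and stop gently instead of jerking, without any explicit velocity limit.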
    def set_gripper(self, open_gripper: bool) -> None:
        """Open or close the gripper."""
        self.data.ctrl[GRIPPER_ACTUATOR_ID] = (
            GRIPPER_OPEN if open_gripper else GRIPPER_CLOSED
        )

    def step(self, n: int = 1) -> None:
        """Advance the simulation by n steps."""
        for _ in range(n):
            mujoco.mj_step(self.model, self.data)

    # ─── High-level pick-place operations ────────────────────────────────

    def _stabilize_unpicked_items(self, exclude: str = "") -> None:
        """Zero out velocities of all unpicked items to prevent physics drift.

        Called before/after each pick-and-place so that the arm doesn't
        knock neighboring items off the conveyor.
        """
        for name, info in self.items.items():
            if name == exclude or info.picked:
                continue
            jnt_name = f"{name}_jnt"
            jnt_id = mujoco.mj_name2id(
                self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
            if jnt_id < 0:
                continue
            dadr = self.model.jnt_dofadr[jnt_id]
            self.data.qvel[dadr:dadr + 6] = 0.0
        mujoco.mj_forward(self.model, self.data)

    def pick_and_place(self, item_name: str, target_bin: BinType) -> bool:
        """
        Execute a full pick-and-place sequence:
        1. Open gripper
        2. Move above item
        3. Move down to item
        4. Close gripper
        5. Move up
        6. Move above target bin
        7. Open gripper (drop)
        8. Return to neutral
        """
        info = self.items.get(item_name)
        if not info or info.picked:
            logger.warning(f"Item {item_name} not found or already picked")
            return False

        # Freeze all other items in place before we move the arm
        self._stabilize_unpicked_items(exclude=item_name)

        self._arm_busy = True
        try:
            item_pos = self.get_item_pos(item_name)
            if item_pos is None:
                logger.warning(f"Cannot get position for {item_name}")
                return False

            # Sanity check: item must be within reachable workspace
            if (abs(item_pos[0]) > 1.0 or abs(item_pos[1]) > 1.0
                    or item_pos[2] < 0.0 or item_pos[2] > 1.0):
                logger.warning(
                    f"Item {item_name} at {item_pos} is outside reachable "
                    f"workspace — it may have been displaced by physics")
                return False

            logger.info(f"Picking {item_name} at {item_pos} -> {target_bin.value}")

            # 1. Open gripper
            self.set_gripper(True)
            self.step(50)

            # 1.5 Move high to ensure we clear the scene
            safe_high = np.array([0.0, 0.0, 0.65])
            if not self.move_to_position(safe_high, move_steps=200, settle_steps=50):
                return False

            # Re-read item position after safe-high move (physics may shift items)
            item_pos = self.get_item_pos(item_name)
            if item_pos is None or item_pos[2] < 0.0 or item_pos[2] > 1.0:
                logger.warning(
                    f"Item {item_name} moved to invalid position {item_pos} "
                    f"during arm movement")
                return False

            # 2. Move above item (approach from above)
            approach_pos = item_pos.copy()
            approach_pos[2] += 0.10
            if not self.move_to_position(approach_pos):
                logger.warning(f"Failed to reach approach position for {item_name}")
                return False

            # 3. Move down to grasp
            grasp_pos = item_pos.copy()
            grasp_pos[2] += 0.03
            if not self.move_to_position(grasp_pos):
                logger.warning(f"Failed to reach grasp position for {item_name}")
                return False

            # 4. Close gripper
            self.set_gripper(False)
            self.step(120)  # allow gripper to close

            # Verify we are close enough to claim a grasp.
            ee_pos = self.get_ee_pos()
            item_now = self.get_item_pos(item_name)
            if item_now is None or np.linalg.norm(ee_pos - item_now) > 0.12:
                logger.warning(
                    f"Grasp verification failed for {item_name}: "
                    f"ee={ee_pos}, item={item_now}")
                return False

            # Kinematic carry of the item for deterministic phase testing.
            carry_offset = np.array([0.0, 0.0, -0.06])
            self._set_item_pose(item_name, ee_pos + carry_offset)
            mujoco.mj_forward(self.model, self.data)

            # 5. Lift up while carrying.
            lift_pos = grasp_pos.copy()
            lift_pos[2] += 0.22
            if not self.move_to_position(
                    lift_pos, carry_item=item_name, carry_offset=carry_offset):
                return False

            # 6. Move above target bin while carrying.
            bin_pos = BIN_POSITIONS[target_bin].copy()
            if not self.move_to_position(
                    bin_pos, carry_item=item_name, carry_offset=carry_offset):
                return False

            # 7. Place and release.
            drop_pos = bin_pos.copy()
            drop_pos[2] -= 0.12
            self._set_item_pose(item_name, drop_pos)
            mujoco.mj_forward(self.model, self.data)
            self.set_gripper(True)
            self.step(100)

            # Mark item as sorted only after successful place.
            info.picked = True
            self._items_sorted += 1

            # 8. Return to neutral.
            neutral = np.array([0.0, 0.0, 0.6])
            self.move_to_position(neutral)

            # Stabilize remaining items after arm movement
            self._stabilize_unpicked_items()

            logger.info(f"Successfully placed {item_name} in {target_bin.value}")
            return True
        finally:
            self._arm_busy = False

    # ─── State snapshot ──────────────────────────────────────────────────

    def get_state(self) -> SimState:
        """Get current simulation state for the frontend."""
        ee = self.get_ee_pos()
        items_info = []
        for name, info in self.items.items():
            pos = self.get_item_pos(name)
            items_info.append({
                "name": name,
                "pos": pos.tolist() if pos is not None else [0, 0, 0],
                "is_hazardous": info.is_hazardous,
                "hazard_type": info.hazard_type.value if info.hazard_type else None,
                "picked": info.picked,
            })

        return SimState(
            time=self.data.time,
            ee_pos=tuple(ee),
            gripper_open=self.data.ctrl[GRIPPER_ACTUATOR_ID] > 100,
            items=items_info,
            arm_busy=self._arm_busy,
            items_sorted=self._items_sorted,
        )

    # ─── Rendering ───────────────────────────────────────────────────────

    def render_frame(self, width: int = 1280, height: int = 720,
                     camera: str = "overview") -> np.ndarray:
        """Render a frame from the specified camera. Returns RGB array."""
        if self.renderer is None:
            self.renderer = mujoco.Renderer(self.model, height, width)

        cam_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_CAMERA, camera)

        self.renderer.update_scene(self.data, camera=cam_id)
        return self.renderer.render()

    def save_frame(self, path: str, camera: str = "overview") -> None:
        """Render a frame and save as PNG."""
        from PIL import Image
        frame = self.render_frame(camera=camera)
        Image.fromarray(frame).save(path)
        logger.info(f"Frame saved to {path}")

    def close(self) -> None:
        """Release renderer resources explicitly."""
        if self.renderer is not None:
            try:
                self.renderer.close()
            except Exception:
                pass  # EGL cleanup errors are harmless at shutdown
            self.renderer = None

    # ─── Interactive viewer ──────────────────────────────────────────────

    def launch_viewer(self) -> None:
        """Launch the interactive MuJoCo viewer."""
        key_id = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
        if key_id >= 0:
            mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)
        mujoco.viewer.launch(self.model, self.data)

    # ─── Async interface for agent integration ───────────────────────────

    async def async_pick_and_place(self, item_name: str,
                                   target_bin: BinType) -> Dict:
        """Async wrapper around pick_and_place for agent integration."""
        loop = asyncio.get_event_loop()
        success = await loop.run_in_executor(
            None, self.pick_and_place, item_name, target_bin
        )
        return {
            "success": success,
            "item": item_name,
            "target_bin": target_bin.value,
            "items_sorted": self._items_sorted,
        }

    async def async_get_state(self) -> Dict:
        """Async state snapshot."""
        state = self.get_state()
        return {
            "time": state.time,
            "ee_pos": list(state.ee_pos),
            "gripper_open": state.gripper_open,
            "items": state.items,
            "arm_busy": state.arm_busy,
            "items_sorted": state.items_sorted,
        }

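The async wrappers above use `loop.run_in_executor` so the blocking, physics-stepping `pick_and_place` call never stalls the event loop. The same pattern in isolation (a minimal sketch; `blocking_pick` is a hypothetical stand-in for the controller method):

```python
import asyncio
import time

def blocking_pick(item: str) -> bool:
    """Stand-in for a blocking call like pick_and_place."""
    time.sleep(0.05)  # simulate stepping physics
    return True

async def async_pick(item: str) -> dict:
    loop = asyncio.get_event_loop()
    # Runs in the default thread pool; the event loop stays responsive
    # and can keep serving WebSocket frames meanwhile.
    success = await loop.run_in_executor(None, blocking_pick, item)
    return {"success": success, "item": item}

result = asyncio.run(async_pick("item_flammable_1"))
```

One caveat with this design: MuJoCo's `mj_step` is not thread-safe on shared `MjData`, so only one such executor job should touch the simulation at a time (the `_arm_busy` flag above serves that purpose).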

# ─── CLI entry point ────────────────────────────────────────────────────────

def main():
    import argparse

    parser = argparse.ArgumentParser(description="SemSorter Simulation Controller")
    parser.add_argument("--render", action="store_true",
                        help="Render a test frame and save as PNG")
    parser.add_argument("--test-pick", action="store_true",
                        help="Test pick-and-place of first hazardous item")
    parser.add_argument("--output", default="test_frame.png",
                        help="Output path for rendered frame")
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO)

    sim = SemSorterSimulation()
    sim.load_scene()

    try:
        if args.render:
            sim.save_frame(args.output)
            print(f"Frame saved to {args.output}")
        elif args.test_pick:
            print("Testing pick-and-place...")
            sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)
            sim.save_frame("after_pick.png")
            print(f"Done! Items sorted: {sim._items_sorted}")
        else:
            print("Launching interactive viewer...")
            sim.launch_viewer()
    finally:
        sim.close()


if __name__ == "__main__":
    main()
SemSorter/simulation/interactive_test.py
ADDED
@@ -0,0 +1,70 @@
"""
Interactive viewer for SemSorter simulation.
Runs pick-and-place in real time with the MuJoCo viewer.

Usage:
    python3 interactive_test.py
"""
import os
import time
import mujoco
import mujoco.viewer
try:
    from .controller import SemSorterSimulation, BinType
except ImportError:
    from controller import SemSorterSimulation, BinType

# How often to sync the viewer (every N physics steps)
VIEWER_SYNC_INTERVAL = 10

def main():
    print("Initializing simulation...")
    # NOTE: Do NOT set MUJOCO_GL=egl when using the interactive viewer
    if 'MUJOCO_GL' in os.environ:
        del os.environ['MUJOCO_GL']

    sim = SemSorterSimulation()
    sim.load_scene()

    print("Launching interactive viewer. Watch the arm move!")

    with mujoco.viewer.launch_passive(sim.model, sim.data) as viewer:

        # Patch mj_step to sync viewer every N steps (much faster than every step)
        original_mj_step = mujoco.mj_step
        step_counter = [0]

        def patched_mj_step(model, data):
            original_mj_step(model, data)
            step_counter[0] += 1
            if step_counter[0] % VIEWER_SYNC_INTERVAL == 0:
                viewer.sync()
                # Sleep only on sync frames to maintain ~real-time playback
                time.sleep(model.opt.timestep * VIEWER_SYNC_INTERVAL)

        mujoco.mj_step = patched_mj_step

        try:
            # Let the scene settle
            sim.step(200)

            time.sleep(2)  # Give user time to see the initial state
            print("\nStarting pick-and-place operation...")
            success = sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)

            print(f"\nDone! success={success}, items sorted: {sim._items_sorted}")
            print("\nYou can close the viewer window now, or press Ctrl+C.")

            # Keep viewer open until user closes it
            while viewer.is_running():
                original_mj_step(sim.model, sim.data)
                viewer.sync()
                time.sleep(0.02)  # ~50 FPS idle

        except KeyboardInterrupt:
            print("\nViewer closed.")
        finally:
            mujoco.mj_step = original_mj_step

if __name__ == "__main__":
    main()
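The every-Nth-step throttle in `patched_mj_step` reduces to a modulo counter held in a one-element list so the inner function can mutate it. A minimal sketch of that closure pattern, stripped of MuJoCo (names here are illustrative only):

```python
def make_throttled(callback, interval: int):
    """Call `callback` once every `interval` invocations of `step`,
    like VIEWER_SYNC_INTERVAL gates viewer.sync()."""
    counter = [0]  # list so the nested function can mutate it

    def step():
        counter[0] += 1
        if counter[0] % interval == 0:
            callback()

    return step

synced = []
step = make_throttled(lambda: synced.append(1), interval=10)
for _ in range(25):
    step()
# 25 calls with interval 10 -> callback fired twice (at calls 10 and 20)
```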
SemSorter/simulation/semsorter_scene.xml
ADDED
@@ -0,0 +1,194 @@
<mujoco model="semsorter">
  <!-- Let panda.xml handle its own meshdir via its embedded compiler element -->
  <compiler angle="radian" autolimits="true"/>

  <option integrator="implicitfast" gravity="0 0 -9.81" timestep="0.002"/>

  <!-- ============================================================ -->
  <!-- Include the Franka Panda arm (with integrated gripper)       -->
  <!-- ============================================================ -->
  <include file="../../mujoco_menagerie/franka_emika_panda/panda.xml"/>

  <statistic center="0 0 0.5" extent="1.5"/>

  <!-- ============================================================ -->
  <!-- Visual settings                                              -->
  <!-- ============================================================ -->
  <visual>
    <headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
    <rgba haze="0.15 0.25 0.35 1"/>
    <global azimuth="150" elevation="-25"/>
  </visual>

  <!-- ============================================================ -->
  <!-- Textures & Materials                                         -->
  <!-- ============================================================ -->
  <asset>
    <texture type="skybox" builtin="gradient" rgb1="0.3 0.5 0.7" rgb2="0 0 0" width="512" height="3072"/>
    <texture type="2d" name="groundplane" builtin="checker" mark="edge"
             rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3" markrgb="0.8 0.8 0.8" width="300" height="300"/>
    <material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>

    <!-- Conveyor belt material -->
    <texture type="2d" name="belt_tex" builtin="checker" rgb1="0.15 0.15 0.15" rgb2="0.2 0.2 0.2"
             width="100" height="100"/>
    <material name="belt_mat" texture="belt_tex" texrepeat="10 2" specular="0.1" shininess="0.05"/>

    <!-- Conveyor frame material -->
    <material name="frame_mat" rgba="0.4 0.4 0.45 1" specular="0.3" shininess="0.2"/>

    <!-- Bin materials -->
    <material name="bin_flammable_mat" rgba="0.85 0.15 0.1 0.9" specular="0.2" shininess="0.1"/>
    <material name="bin_chemical_mat" rgba="0.95 0.75 0.1 0.9" specular="0.2" shininess="0.1"/>
    <material name="bin_inner_mat" rgba="0.1 0.1 0.1 1"/>

    <!-- Hazardous item materials -->
    <material name="hazard_red" rgba="0.9 0.1 0.1 1" specular="0.4" shininess="0.3"/>
    <material name="hazard_green" rgba="0.1 0.8 0.2 1" specular="0.4" shininess="0.3"/>
    <material name="hazard_blue" rgba="0.1 0.2 0.9 1" specular="0.4" shininess="0.3"/>
    <material name="hazard_yellow" rgba="0.95 0.85 0.1 1" specular="0.4" shininess="0.3"/>
    <material name="safe_gray" rgba="0.6 0.6 0.6 1" specular="0.3" shininess="0.2"/>
    <material name="safe_white" rgba="0.9 0.9 0.9 1" specular="0.3" shininess="0.2"/>
  </asset>

  <!-- ============================================================ -->
  <!-- World: floor, lights, conveyors, bins, items                 -->
  <!-- ============================================================ -->
  <worldbody>
    <!-- Lighting -->
    <light pos="0 0 3" dir="0 0 -1" directional="true" diffuse="0.5 0.5 0.5"/>
    <light pos="1 -1 2" dir="-0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>
    <light pos="-1 -1 2" dir="0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>

    <!-- Ground plane -->
    <geom name="floor" size="0 0 0.05" type="plane" material="groundplane"/>

    <!-- ======================================================== -->
    <!-- CONVEYOR A (Input) — items arrive here from the left     -->
    <!-- ======================================================== -->
    <body name="conveyor_input" pos="-0.55 0 0">
      <!-- Belt surface -->
      <geom name="belt_input" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
            material="belt_mat" friction="0.8 0.005 0.0001"/>
      <!-- Side rails -->
      <geom name="rail_input_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
            material="frame_mat"/>
      <geom name="rail_input_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
            material="frame_mat"/>
      <!-- Legs -->
      <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
    </body>

    <!-- ======================================================== -->
    <!-- CONVEYOR B (Output) — clean items continue here (right)  -->
    <!-- ======================================================== -->
    <body name="conveyor_output" pos="0.55 0 0">
      <!-- Belt surface -->
      <geom name="belt_output" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
            material="belt_mat" friction="0.8 0.005 0.0001"/>
      <!-- Side rails -->
      <geom name="rail_output_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
            material="frame_mat"/>
      <geom name="rail_output_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
            material="frame_mat"/>
      <!-- Legs -->
      <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
      <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
    </body>

    <!-- ======================================================== -->
    <!-- FLAMMABLE WASTE BIN (Red) — front-left of the arm        -->
    <!-- ======================================================== -->
    <body name="bin_flammable" pos="-0.25 0.45 0">
      <!-- Bin walls (open top box) -->
      <geom name="bin_fl_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_flammable_mat"/>
      <geom name="bin_fl_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_flammable_mat"/>
      <geom name="bin_fl_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_flammable_mat"/>
      <geom name="bin_fl_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_flammable_mat"/>
      <geom name="bin_fl_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
      <!-- Label area (slightly raised red panel on front) -->
      <site name="bin_flammable_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0 0 1"/>
    </body>

    <!-- ======================================================== -->
    <!-- CHEMICAL WASTE BIN (Yellow) — front-right of the arm     -->
    <!-- ======================================================== -->
    <body name="bin_chemical" pos="0.25 0.45 0">
      <!-- Bin walls (open top box) -->
      <geom name="bin_ch_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_chemical_mat"/>
      <geom name="bin_ch_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_chemical_mat"/>
      <geom name="bin_ch_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_chemical_mat"/>
      <geom name="bin_ch_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_chemical_mat"/>
      <geom name="bin_ch_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
      <!-- Label area -->
      <site name="bin_chemical_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0.8 0 1"/>
    </body>

    <!-- ======================================================== -->
    <!-- HAZARDOUS ITEMS (on input conveyor, with free joints)    -->
    <!-- ======================================================== -->

    <!-- Item 1: Red cylinder (flammable chemical) — leftmost -->
    <body name="item_flammable_1" pos="-0.82 0 0.39">
      <freejoint name="item_flammable_1_jnt"/>
      <geom name="item_flammable_1_geom" type="cylinder" size="0.02 0.025"
            material="hazard_red" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- Item 2: Safe white cylinder (goes to output conveyor) -->
    <body name="item_safe_2" pos="-0.70 0 0.39">
      <freejoint name="item_safe_2_jnt"/>
      <geom name="item_safe_2_geom" type="cylinder" size="0.018 0.02"
            material="safe_white" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- Item 3: Yellow box (chemical waste) -->
    <body name="item_chemical_1" pos="-0.58 0 0.385">
      <freejoint name="item_chemical_1_jnt"/>
      <geom name="item_chemical_1_geom" type="box" size="0.02 0.02 0.02"
            material="hazard_yellow" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- Item 4: Safe gray box (goes to output conveyor) -->
    <body name="item_safe_1" pos="-0.46 0 0.385">
      <freejoint name="item_safe_1_jnt"/>
      <geom name="item_safe_1_geom" type="box" size="0.025 0.02 0.015"
            material="safe_gray" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- Item 5: Blue box (chemical waste) -->
    <body name="item_chemical_2" pos="-0.34 0 0.385">
      <freejoint name="item_chemical_2_jnt"/>
      <geom name="item_chemical_2_geom" type="box" size="0.018 0.018 0.018"
            material="hazard_blue" mass="0.03" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- Item 6: Green box (flammable) — rightmost -->
    <body name="item_flammable_2" pos="-0.22 0 0.385">
      <freejoint name="item_flammable_2_jnt"/>
      <geom name="item_flammable_2_geom" type="box" size="0.018 0.018 0.018"
            material="hazard_green" mass="0.035" friction="1 0.005 0.0001" priority="1"/>
    </body>

    <!-- ======================================================== -->
    <!-- Camera for the overview shot (used by OBS or renderer)   -->
    <!-- ======================================================== -->
    <camera name="overview" pos="0 -1.2 1.2" xyaxes="1 0 0 0 0.7 0.7" fovy="50"/>
    <camera name="topdown" pos="0 0 2.0" xyaxes="1 0 0 0 1 0" fovy="60"/>
    <camera name="side" pos="1.5 0 0.8" xyaxes="0 1 0 -0.5 0 0.87" fovy="45"/>
<camera name="side" pos="1.5 0 0.8" xyaxes="0 1 0 -0.5 0 0.87" fovy="45"/>
|
| 184 |
+
|
| 185 |
+
</worldbody>
|
| 186 |
+
|
| 187 |
+
<!-- ============================================================ -->
|
| 188 |
+
<!-- Sensors for end-effector position tracking -->
|
| 189 |
+
<!-- ============================================================ -->
|
| 190 |
+
<sensor>
|
| 191 |
+
<framepos name="end_effector_pos" objtype="body" objname="hand"/>
|
| 192 |
+
</sensor>
|
| 193 |
+
|
| 194 |
+
</mujoco>
|
SemSorter/vision/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
# SemSorter Vision Module
|
SemSorter/vision/test_obs.py
ADDED
@@ -0,0 +1,29 @@
import cv2
import time

def main():
    print("Testing OBS Virtual Camera on /dev/video4...")
    # Open the virtual camera
    cap = cv2.VideoCapture(4)

    if not cap.isOpened():
        print("Error: Could not open video device /dev/video4.")
        print("Please ensure OBS Virtual Camera is running.")
        return

    print("Successfully opened camera. Waiting 2 seconds for it to warm up...")
    time.sleep(2)

    ret, frame = cap.read()

    if not ret:
        print("Error: Could not read frame from camera.")
    else:
        output_file = "obs_snapshot.png"
        cv2.imwrite(output_file, frame)
        print(f"Success! Captured frame with shape {frame.shape} and saved to {output_file}.")

    cap.release()

if __name__ == "__main__":
    main()
SemSorter/vision/vision_pipeline.py
ADDED
@@ -0,0 +1,239 @@
"""
SemSorter Vision Pipeline — Hazard Detection Processor

Captures frames from OBS Virtual Camera or directly from the simulation,
then sends them to Gemini VLM for hazardous item detection.

Usage:
    # From OBS Virtual Camera:
    GOOGLE_API_KEY=... python3 vision_pipeline.py

    # From simulation directly (no OBS needed):
    GOOGLE_API_KEY=... python3 vision_pipeline.py --direct
"""

import os
import sys
import cv2
import json
import time
import logging
import google.generativeai as genai
from PIL import Image
from typing import List, Dict, Optional

logger = logging.getLogger(__name__)


class HazardDetectionProcessor:
    """
    Detects hazardous items in the SemSorter simulation using Gemini VLM.

    Supports two input modes:
    - OBS Virtual Camera: reads from /dev/videoX
    - Direct simulation rendering: calls sim.render_frame()
    """

    def __init__(self, device_id: int = 4, simulation=None):
        """
        Args:
            device_id: Video device ID for OBS Virtual Camera (e.g., 4 for /dev/video4)
            simulation: Optional SemSorterSimulation instance for direct rendering
        """
        self.device_id = device_id
        self.simulation = simulation
        self._video_cap = None  # Reusable VideoCapture
        self._gemini_model = None  # Lazy-initialized

        # System instructions to enforce structured JSON output
        self.system_instruction = (
            "You are an AI vision system for a robotic waste sorting arm. "
            "You are given an image of a conveyor belt with a robotic arm and waste bins. "
            "Your task is to identify hazardous items on the conveyor belt. "
            "Hazardous items are categorized as:\n"
            "- FLAMMABLE: Red-colored items (cylinders, boxes)\n"
            "- CHEMICAL: Yellow-colored items (boxes, spheres)\n\n"
            "Safe items are gray, white, green, or blue — IGNORE these.\n\n"
            "For each hazardous item detected, return a JSON object with:\n"
            "- 'name': descriptive name like 'red_cylinder_1' or 'yellow_box_1'\n"
            "- 'type': either 'FLAMMABLE' or 'CHEMICAL'\n"
            "- 'color': the detected color (e.g., 'red', 'yellow')\n"
            "- 'shape': the detected shape (e.g., 'cylinder', 'box', 'sphere')\n"
            "- 'box_2d': bounding box as [ymin, xmin, ymax, xmax] normalized to 0-1000 scale\n\n"
            "Return ONLY a JSON array of detected hazardous items. "
            "If no hazardous items are visible, return an empty array []."
        )

    def _get_gemini_model(self):
        """Lazy-initialize Gemini model (only when analyze_frame is called)."""
        if self._gemini_model is None:
            api_key = os.environ.get("GOOGLE_API_KEY")
            if not api_key:
                raise ValueError(
                    "GOOGLE_API_KEY environment variable not set.\n"
                    "Get one at https://aistudio.google.com/apikey"
                )
            genai.configure(api_key=api_key)
            self._gemini_model = genai.GenerativeModel(
                model_name="gemini-3-flash-preview",
                system_instruction=self.system_instruction,
                generation_config={"response_mime_type": "application/json"}
            )
        return self._gemini_model

    def capture_frame(self) -> Image.Image:
        """
        Capture a single frame.
        Uses direct simulation rendering if available, otherwise OBS camera.
        """
        if self.simulation is not None:
            return self._capture_from_simulation()
        else:
            return self._capture_from_obs()

    def _capture_from_simulation(self) -> Image.Image:
        """Render a frame directly from the MuJoCo simulation."""
        frame = self.simulation.render_frame(camera="overview")
        return Image.fromarray(frame)

    def _capture_from_obs(self) -> Image.Image:
        """Capture a frame from the OBS Virtual Camera."""
        if self._video_cap is None or not self._video_cap.isOpened():
            self._video_cap = cv2.VideoCapture(self.device_id)
            if not self._video_cap.isOpened():
                raise RuntimeError(
                    f"Could not open video device /dev/video{self.device_id}. "
                    "Ensure OBS Virtual Camera is running."
                )
            # Warm up — discard stale frames
            for _ in range(5):
                self._video_cap.read()

        ret, frame = self._video_cap.read()
        if not ret:
            raise RuntimeError("Failed to read frame from OBS Virtual Camera")

        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        return Image.fromarray(frame_rgb)

    def analyze_frame(self, pil_image: Image.Image) -> List[Dict]:
        """
        Send the image to Gemini VLM and parse the structured JSON response.

        Returns:
            List of dicts, each with keys: name, type, color, shape, box_2d
        """
        prompt = (
            "Analyze this image of a robotic sorting station. "
            "Identify all FLAMMABLE (red) and CHEMICAL (yellow) items "
            "on the conveyor belt. Return their positions as bounding boxes."
        )

        logger.info("Sending frame to Gemini VLM...")
        model = self._get_gemini_model()
        response = model.generate_content([prompt, pil_image])

        raw_text = getattr(response, "text", None)
        if not isinstance(raw_text, str) or not raw_text.strip():
            logger.error("VLM response did not contain JSON text output")
            return []

        try:
            results = json.loads(raw_text)
            if isinstance(results, dict) and "items" in results:
                results = results["items"]
            if not isinstance(results, list):
                logger.error(f"Unexpected VLM JSON shape: {type(results).__name__}")
                return []
            logger.info(f"VLM detected {len(results)} hazardous items")
            return results
        except (json.JSONDecodeError, TypeError):
            logger.error(f"Failed to parse VLM response:\n{raw_text}")
            return []

    def detect_hazards(self) -> List[Dict]:
        """
        Full pipeline: capture frame → analyze → return results.
        Convenience method combining capture_frame() and analyze_frame().
        """
        image = self.capture_frame()
        return self.analyze_frame(image)

    def close(self):
        """Release video capture resources."""
        if self._video_cap is not None:
            self._video_cap.release()
            self._video_cap = None


# ─── CLI entry point ────────────────────────────────────────────────────────

def main():
    import argparse

    parser = argparse.ArgumentParser(description="SemSorter Hazard Detection")
    parser.add_argument("--direct", action="store_true",
                        help="Use direct simulation rendering instead of OBS")
    parser.add_argument("--device", type=int, default=4,
                        help="OBS Virtual Camera device ID (default: 4)")
    parser.add_argument("--output", default="vision_debug.png",
                        help="Save captured frame to this path")
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO)

    simulation = None
    if args.direct:
        # Must be set before importing MuJoCo/controller in this process.
        os.environ.setdefault("MUJOCO_GL", "egl")
        # Import and initialize simulation for direct rendering
        try:
            from ..simulation.controller import SemSorterSimulation
        except ImportError:
            sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
            from controller import SemSorterSimulation
        print("Initializing simulation for direct rendering...")
        simulation = SemSorterSimulation()
        simulation.load_scene()
        simulation.step(200)  # Let physics settle

    processor = HazardDetectionProcessor(
        device_id=args.device,
        simulation=simulation
    )

    try:
        print("Capturing frame...")
        image = processor.capture_frame()
        image.save(args.output)
        print(f"Saved frame to {args.output}")

        print("Analyzing frame with Gemini VLM...")
        results = processor.analyze_frame(image)

        print("\n" + "=" * 50)
        print(" HAZARD DETECTION RESULTS")
        print("=" * 50)

        if not results:
            print("  No hazardous items detected.")
        else:
            for i, item in enumerate(results, 1):
                print(f"\n  [{i}] {item.get('name', 'unknown')}")
                print(f"      Type:  {item.get('type', '?')}")
                print(f"      Color: {item.get('color', '?')}")
                print(f"      Shape: {item.get('shape', '?')}")
                print(f"      Box:   {item.get('box_2d', '?')}")

        print("\n" + "=" * 50)
        print(f" Total hazardous items: {len(results)}")
        print("=" * 50)

    finally:
        processor.close()
        if simulation is not None and hasattr(simulation, "close"):
            simulation.close()


if __name__ == "__main__":
    main()
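The system prompt asks Gemini for `box_2d` as `[ymin, xmin, ymax, xmax]` on a 0-1000 normalized scale. A small helper like this (hypothetical, not part of the pipeline) maps such a box back to pixel coordinates, e.g. for drawing debug overlays on the captured frame:

```python
def box_to_pixels(box_2d, width, height):
    """Convert a [ymin, xmin, ymax, xmax] box on the 0-1000 scale
    to (x1, y1, x2, y2) pixel coordinates for a width x height frame."""
    ymin, xmin, ymax, xmax = box_2d
    return (
        int(xmin / 1000 * width),
        int(ymin / 1000 * height),
        int(xmax / 1000 * width),
        int(ymax / 1000 * height),
    )

# A detection covering the middle half of a 640x480 frame:
print(box_to_pixels([250, 250, 750, 750], 640, 480))  # → (160, 120, 480, 360)
```

Note the axis order: Gemini's convention puts y before x, while OpenCV drawing calls such as `cv2.rectangle` expect (x, y) corner points, so the swap above matters.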
SemSorter/vision/vlm_bridge.py
ADDED
@@ -0,0 +1,269 @@
"""
SemSorter VLM-to-Simulation Bridge

Maps VLM hazard detections to simulation item names and orchestrates
the pick-and-place sequence. This is the glue between Phase 2 (Vision)
and Phase 1 (Simulation).

Usage:
    # End-to-end test (direct render, no OBS):
    MUJOCO_GL=egl GOOGLE_API_KEY=... python3 vlm_bridge.py --direct

    # With OBS Virtual Camera:
    GOOGLE_API_KEY=... python3 vlm_bridge.py
"""

import os
import sys
import logging
from typing import List, Dict, Optional, Tuple

try:
    from .vision_pipeline import HazardDetectionProcessor
except ImportError:
    from vision_pipeline import HazardDetectionProcessor

try:
    from ..simulation.controller import BinType, SemSorterSimulation
except ImportError:
    sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
    from controller import BinType, SemSorterSimulation

logger = logging.getLogger(__name__)


class VLMSimBridge:
    """
    Bridge between VLM hazard detections and the simulation controller.

    Matching strategy:
    1. VLM detects items by color/shape → returns type (FLAMMABLE/CHEMICAL)
    2. Simulation has named items with known hazard types
    3. We match VLM detections to unpicked simulation items of the same type
    4. For multiple items of the same type, we use spatial ordering (left-to-right
       on the conveyor) to assign matches
    """

    def __init__(self, simulation, device_id: int = 4, use_direct: bool = False):
        """
        Args:
            simulation: SemSorterSimulation instance
            device_id: OBS Virtual Camera device ID
            use_direct: If True, render frames from simulation instead of OBS
        """
        self.simulation = simulation
        self.processor = HazardDetectionProcessor(
            device_id=device_id,
            simulation=simulation if use_direct else None
        )

    def get_unpicked_items_by_type(self, hazard_type: str) -> List[Tuple[str, float]]:
        """
        Get unpicked simulation items of a given hazard type,
        sorted by X position (leftmost first = highest priority on conveyor).

        Returns:
            List of (item_name, x_position) tuples
        """
        type_map = {
            "FLAMMABLE": BinType.FLAMMABLE,
            "CHEMICAL": BinType.CHEMICAL,
        }

        target_type = type_map.get(hazard_type)
        if target_type is None:
            return []

        items = []
        for name, info in self.simulation.items.items():
            if info.hazard_type == target_type and not info.picked:
                pos = self.simulation.get_item_pos(name)
                if pos is not None:
                    items.append((name, pos[0]))  # x_position for sorting

        # Sort by X (most negative = leftmost on conveyor = first to pick)
        items.sort(key=lambda x: x[1])
        return items

    def match_detections_to_items(self, detections: List[Dict]) -> List[Dict]:
        """
        Match VLM detections to simulation item names.

        Each detection gets an additional 'sim_item' key with the matched
        simulation item name, and 'bin_type' with the target bin.

        Returns:
            List of matched detections with sim_item and bin_type fields added
        """
        # Track which items have already been matched
        matched_items = set()
        results = []

        def box_left_x(det: Dict) -> float:
            box = det.get("box_2d")
            if isinstance(box, (list, tuple)) and len(box) >= 2:
                try:
                    return float(box[1])
                except (TypeError, ValueError):
                    pass
            return 1000.0

        # Group detections by type
        for det_type in ["FLAMMABLE", "CHEMICAL"]:
            type_detections = []
            for d in detections:
                if not isinstance(d, dict):
                    continue
                dtype = str(d.get("type", "")).strip().upper()
                if dtype == det_type:
                    type_detections.append(d)
            available_items = self.get_unpicked_items_by_type(det_type)

            # Sort detections by x position of bounding box (leftmost first)
            type_detections.sort(key=box_left_x)

            bin_type = BinType.FLAMMABLE if det_type == "FLAMMABLE" else BinType.CHEMICAL

            for i, detection in enumerate(type_detections):
                # Find first available item not yet matched
                sim_item = None
                for item_name, _ in available_items:
                    if item_name not in matched_items:
                        sim_item = item_name
                        matched_items.add(item_name)
                        break

                if sim_item:
                    detection["sim_item"] = sim_item
                    detection["bin_type"] = bin_type
                    results.append(detection)
                    logger.info(f"Matched VLM '{detection.get('name')}' → "
                                f"sim '{sim_item}' → bin '{bin_type.value}'")
                else:
                    logger.warning(f"No unmatched sim item for VLM detection: "
                                   f"{detection.get('name')} ({det_type})")

        return results

    def detect_and_sort(self) -> Dict:
        """
        Full pipeline: detect hazards → match to sim items → pick and place all.

        Returns:
            Summary dict with detection count, sort count, and details
        """
        # Step 1: Detect hazards
        logger.info("Step 1: Detecting hazards with VLM...")
        detections = self.processor.detect_hazards()
        logger.info(f"VLM found {len(detections)} hazardous items")

        if not detections:
            return {"detected": 0, "matched": 0, "sorted": 0, "details": []}

        # Step 2: Match to simulation items
        logger.info("Step 2: Matching detections to simulation items...")
        matched = self.match_detections_to_items(detections)
        logger.info(f"Matched {len(matched)} items")

        # Step 3: Pick and place each matched item
        logger.info("Step 3: Executing pick-and-place sequence...")
        details = []
        sorted_count = 0

        for match in matched:
            item_name = match["sim_item"]
            bin_type = match["bin_type"]
            vlm_name = match.get("name", "unknown")

            logger.info(f"Sorting: {vlm_name} ({item_name}) → {bin_type.value}")
            success = self.simulation.pick_and_place(item_name, bin_type)

            # Let remaining items settle after the arm moves
            self.simulation.step(200)

            details.append({
                "vlm_name": vlm_name,
                "sim_item": item_name,
                "target_bin": bin_type.value,
                "success": success,
            })

            if success:
                sorted_count += 1

        return {
            "detected": len(detections),
            "matched": len(matched),
            "sorted": sorted_count,
            "details": details,
        }

    def close(self):
        """Release resources."""
        self.processor.close()


# ─── CLI entry point ────────────────────────────────────────────────────────

def main():
    import argparse

    parser = argparse.ArgumentParser(description="SemSorter VLM-Sim Bridge")
    parser.add_argument("--direct", action="store_true",
                        help="Use direct simulation rendering instead of OBS")
    parser.add_argument("--device", type=int, default=4,
                        help="OBS Virtual Camera device ID (default: 4)")
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO)

    # Initialize simulation
    print("Initializing simulation...")
    if args.direct:
        os.environ.setdefault("MUJOCO_GL", "egl")
    sim = SemSorterSimulation()
    sim.load_scene()
    sim.step(200)  # Let physics settle

    # Initialize bridge
    bridge = VLMSimBridge(
        simulation=sim,
        device_id=args.device,
        use_direct=args.direct,
    )

    try:
        # Run full detect → match → sort pipeline
        print("\n" + "=" * 60)
        print(" SemSorter: VLM-Driven Hazard Sorting")
        print("=" * 60)

        result = bridge.detect_and_sort()

        print("\n" + "=" * 60)
        print(" SORTING RESULTS")
        print("=" * 60)
        print(f"  Hazards detected by VLM: {result['detected']}")
        print(f"  Matched to sim items:    {result['matched']}")
        print(f"  Successfully sorted:     {result['sorted']}")

        if result['details']:
            print("\n  Details:")
            for d in result['details']:
                status = "✅" if d['success'] else "❌"
                print(f"    {status} {d['vlm_name']} ({d['sim_item']}) → {d['target_bin']}")

        print("=" * 60)

        # Save final state
        sim.save_frame("after_sort.png")
        print("\nFinal scene saved to after_sort.png")

    finally:
        bridge.close()
        if hasattr(sim, "close"):
            sim.close()


if __name__ == "__main__":
    main()
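The matching strategy described in the `VLMSimBridge` docstring (sort both sides spatially, then assign greedily) can be sketched in isolation. The function and inputs below are illustrative, not the bridge's actual API; it pairs the i-th leftmost detection with the i-th leftmost unpicked sim item of the same type:

```python
def greedy_match(detection_xs, item_xs):
    """Pair detections with sim items by left-to-right order.

    detection_xs: leftmost-x of each VLM bounding box (0-1000 scale)
    item_xs:      conveyor x-coordinate of each candidate sim item
    Returns a list of (detection_index, item_index) pairs.
    """
    det_order = sorted(range(len(detection_xs)), key=lambda i: detection_xs[i])
    item_order = sorted(range(len(item_xs)), key=lambda j: item_xs[j])
    # zip truncates, so surplus detections simply go unmatched,
    # mirroring the bridge's "no unmatched sim item" warning path
    return list(zip(det_order, item_order))

# Two detections at box-x 700 and 200; two sim items at x = -0.3 and -0.8.
# The leftmost detection (index 1) pairs with the leftmost item (index 1):
print(greedy_match([700, 200], [-0.3, -0.8]))  # → [(1, 1), (0, 0)]
```

This works because camera x and conveyor x increase in the same direction in the overview shot; if the camera were mirrored, one of the two sorts would need to be reversed.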
Vision-Agents
ADDED
@@ -0,0 +1 @@
Subproject commit f684ece6c3b6540b02de9c73431a5ffe0c576f29
render.yaml
ADDED
@@ -0,0 +1,21 @@
services:
  - type: web
    name: semsorter
    env: docker
    dockerfilePath: ./Dockerfile
    plan: free
    envVars:
      - key: MUJOCO_GL
        value: egl
      - key: GOOGLE_API_KEY
        sync: false  # Set in Render dashboard — not committed to git
      - key: DEEPGRAM_API_KEY
        sync: false
      - key: ELEVENLABS_API_KEY
        sync: false
      - key: STREAM_API_KEY
        sync: false
      - key: STREAM_API_SECRET
        sync: false
    healthCheckPath: /api/state
    autoDeploy: true
requirements-server.txt
ADDED
@@ -0,0 +1,18 @@
# SemSorter Web Server Dependencies
fastapi==0.115.0
uvicorn[standard]==0.30.6
websockets==13.1
python-multipart==0.0.12
httpx==0.27.2
pillow==10.4.0
numpy==1.26.4

# MuJoCo (headless, EGL)
mujoco==3.2.0

# Google Gemini (legacy + new SDK — both used for compatibility)
google-generativeai==0.8.3
google-genai==1.0.0

# dotenv for loading .env files
python-dotenv==1.0.1