SemSorter committed on
Commit
2588ff8
·
0 Parent(s):

feat: SemSorter — AI hazard sorting with Vision-Agents SDK


- Phase 1: MuJoCo Franka Panda simulation with pick-and-place
- Phase 2: Gemini VLM hazard detection pipeline
- Phase 3: Vision-Agents SDK agent (gemini.LLM + deepgram.STT + elevenlabs.TTS)
- Phase 4: FastAPI web server with WebSocket live video + chat UI

Closes all phases.

.env.example ADDED
@@ -0,0 +1,15 @@
+ # SemSorter Environment Variables
+ # Copy to .env and fill in your API keys
+
+ # GetStream (for real-time video/audio transport)
+ STREAM_API_KEY="your-stream-api-key"
+ STREAM_API_SECRET="your-stream-api-secret"
+
+ # Google Gemini (for LLM orchestration + VLM hazard detection)
+ GOOGLE_API_KEY="your-google-api-key"
+
+ # Deepgram (for Speech-to-Text)
+ DEEPGRAM_API_KEY="your-deepgram-api-key"
+
+ # ElevenLabs (for Text-to-Speech)
+ ELEVENLABS_API_KEY="your-elevenlabs-api-key"
.gitignore ADDED
@@ -0,0 +1,34 @@
+ # Debug / generated images
+ *.png
+ !SemSorter/vision/vision_debug.png
+
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ .eggs/
+ *.egg-info/
+
+ # MuJoCo
+ *.mjb
+ mujoco-*/
+ mujoco_menagerie/
+
+ # Vision-Agents SDK venv (too large for git)
+ Vision-Agents/.venv/
+ Vision-Agents/__pycache__/
+
+ # uv cache
+ .uv/
+ uv.lock
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Environment (never commit secrets)
+ .env
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ FROM python:3.10-slim
+
+ # ── System deps for MuJoCo EGL rendering ─────────────────────────────────────
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libegl1-mesa \
+     libegl1 \
+     libgles2 \
+     libglvnd0 \
+     libglx0 \
+     libx11-6 \
+     wget \
+     && rm -rf /var/lib/apt/lists/*
+
+ # ── Create working directory ──────────────────────────────────────────────────
+ WORKDIR /app
+
+ # ── Copy requirements first (layer caching) ──────────────────────────────────
+ COPY requirements-server.txt ./
+ RUN pip install --no-cache-dir -r requirements-server.txt
+
+ # ── Copy project ──────────────────────────────────────────────────────────────
+ COPY . .
+
+ # ── MuJoCo environment ────────────────────────────────────────────────────────
+ ENV MUJOCO_GL=egl
+ ENV PYOPENGL_PLATFORM=egl
+
+ # ── Expose port ───────────────────────────────────────────────────────────────
+ EXPOSE 8000
+
+ # ── Start server ──────────────────────────────────────────────────────────────
+ CMD ["uvicorn", "SemSorter.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md ADDED
@@ -0,0 +1,120 @@
+ # SemSorter — AI Hazard Sorting System
+
+ > **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**
+
+ [![Demo](https://img.shields.io/badge/Live%20Demo-Render.com-4f46e5)](https://semsorter.onrender.com)
+
+ ---
+
+ ## 🤖 Overview
+
+ SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:
+
+ 1. **Watches** the conveyor belt via a live camera feed
+ 2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
+ 3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
+ 4. **Speaks back** results using **ElevenLabs TTS**
+ 5. **Listens** to voice commands via **Deepgram STT**
+
+ All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.
+
+ ---
+
+ ## 🏗 Architecture
+
+ ```
+ Browser ←─── WebSocket ───→ FastAPI Server
+                                  │
+                      Vision-Agents SDK Agent
+                      ┌──────────┴──────────┐
+                 gemini.LLM           deepgram.STT
+               (tool-calling)         (voice→text)
+                      │
+                  VLM Bridge
+                      │
+          MuJoCo Sim (Franka Panda)
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+ - Python 3.10+
+ - MuJoCo 3.x
+ - EGL (headless GPU rendering)
+
+ ### Local Setup
+
+ ```bash
+ # Clone
+ git clone https://github.com/YOUR_USERNAME/SemSorter.git
+ cd SemSorter
+
+ # Install dependencies
+ pip install -r requirements-server.txt
+
+ # Configure API keys
+ cp .env.example .env
+ # Edit .env with your keys:
+ #   GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
+ #   STREAM_API_KEY, STREAM_API_SECRET
+
+ # Run
+ MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
+ # Open http://localhost:8000
+ ```
+
+ ### Voice Agent (Vision-Agents SDK CLI)
+ ```bash
+ cd Vision-Agents
+ MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
+ ```
+
+ ---
+
+ ## 📦 Project Structure
+
+ ```
+ SemSorter/
+ ├── SemSorter/
+ │   ├── simulation/
+ │   │   ├── controller.py            # MuJoCo sim + IK + pick-and-place
+ │   │   └── semsorter_scene.xml      # MJCF scene (Panda + conveyor + bins)
+ │   ├── vision/
+ │   │   ├── vision_pipeline.py       # Gemini VLM hazard detection
+ │   │   └── vlm_bridge.py            # VLM → sim item matching
+ │   ├── agent/
+ │   │   ├── agent.py                 # Vision-Agents SDK agent
+ │   │   └── semsorter_instructions.md
+ │   └── server/
+ │       ├── app.py                   # FastAPI + WebSocket video stream
+ │       ├── agent_bridge.py          # SDK bridge + quota detection
+ │       └── static/index.html        # Web UI
+ ├── Vision-Agents/                   # GetStream Vision-Agents SDK
+ ├── Dockerfile
+ ├── render.yaml
+ └── requirements-server.txt
+ ```
+
+ ---
+
+ ## 🔑 API Keys Required
+
+ | Service | Purpose | Free tier |
+ |---|---|---|
+ | Google Gemini | LLM orchestration + VLM detection | 15 RPM |
+ | Deepgram | Speech-to-Text | 45 min/month |
+ | ElevenLabs | Text-to-Speech | ~10k chars/month |
+ | GetStream | Real-time video call (voice agent) | Free tier available |
+
+ > **API exhaustion handling:** The server detects quota errors (`429 / ResourceExhausted`) and automatically switches to demo mode per service, showing a banner in the UI.
+
+ ---
+
+ ## 🐳 Deploy to Render
+
+ 1. Fork this repo
+ 2. Create a new **Web Service** on [Render.com](https://render.com) pointing to your fork
+ 3. Add your API keys as **Environment Variables** in the Render dashboard
+ 4. Done — Render auto-deploys from `render.yaml`
SemSorter/agent/__init__.py ADDED
File without changes
SemSorter/agent/agent.py ADDED
@@ -0,0 +1,252 @@
+ """
+ SemSorter Agent — Vision-Agents SDK Integration
+
+ This module creates a real-time AI agent using GetStream's Vision-Agents SDK.
+ The agent watches the MuJoCo simulation via video, listens to voice commands,
+ detects hazardous items using Gemini VLM, and triggers pick-and-place operations.
+
+ Usage (from the Vision-Agents directory):
+     # Set env vars in .env first, then:
+     uv run python ../SemSorter/agent/agent.py run
+ """
+
+ import atexit
+ import logging
+ import os
+ import sys
+ from pathlib import Path
+ from typing import Any, Dict
+
+ from dotenv import load_dotenv
+ from vision_agents.core import Agent, AgentLauncher, Runner, User
+ from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream
+
+ logger = logging.getLogger(__name__)
+
+ # ─── Path setup ──────────────────────────────────────────────────────────────
+ # Add SemSorter packages to sys.path so we can import simulation & vision
+ AGENT_DIR = Path(__file__).resolve().parent
+ SEMSORTER_DIR = AGENT_DIR.parent
+ PROJECT_ROOT = SEMSORTER_DIR.parent
+
+ sys.path.insert(0, str(SEMSORTER_DIR / "simulation"))
+ sys.path.insert(0, str(SEMSORTER_DIR / "vision"))
+
+ # Load environment variables
+ load_dotenv(PROJECT_ROOT / ".env")
+
+ # ─── Simulation singleton ───────────────────────────────────────────────────
+ _simulation = None
+ _bridge = None
+
+
+ def get_simulation():
+     """Lazy-initialize the MuJoCo simulation (singleton)."""
+     global _simulation
+     if _simulation is None:
+         os.environ.setdefault("MUJOCO_GL", "egl")
+         from controller import SemSorterSimulation
+
+         logger.info("Initializing MuJoCo simulation...")
+         _simulation = SemSorterSimulation()
+         _simulation.load_scene()
+         _simulation.step(200)  # Let physics settle
+         logger.info("Simulation ready.")
+     return _simulation
+
+
+ def get_bridge():
+     """Lazy-initialize the VLM-Simulation bridge (singleton)."""
+     global _bridge
+     if _bridge is None:
+         from vlm_bridge import VLMSimBridge
+
+         sim = get_simulation()
+         _bridge = VLMSimBridge(simulation=sim, use_direct=True)
+         logger.info("VLM-Sim bridge ready.")
+     return _bridge
+
+
+ class _EGLStderrFilter:
+     """Stderr wrapper that suppresses only known EGL teardown noise."""
+
+     _SUPPRESSED = ("EGLError", "eglDestroyContext", "eglMakeCurrent",
+                    "EGL_NOT_INITIALIZED", "GLContext.__del__",
+                    "Renderer.__del__", "SfuStatsReporter",
+                    "Task was destroyed but it is pending")
+
+     def __init__(self, real):
+         self._real = real
+
+     def write(self, s):
+         if any(tok in s for tok in self._SUPPRESSED):
+             return len(s)  # silently consume
+         return self._real.write(s)
+
+     def flush(self):
+         self._real.flush()
+
+     def __getattr__(self, name):
+         return getattr(self._real, name)
+
+
+ def close_resources() -> None:
+     """Release singleton resources on process shutdown."""
+     # Only muffle known-harmless EGL teardown noise, keep real errors visible
+     sys.stderr = _EGLStderrFilter(sys.stderr)
+
+     global _bridge, _simulation
+     if _bridge is not None:
+         try:
+             _bridge.close()
+         except Exception:
+             pass
+         _bridge = None
+     if _simulation is not None and hasattr(_simulation, "close"):
+         try:
+             _simulation.close()
+         except Exception:
+             pass
+         _simulation = None
+
+
+ atexit.register(close_resources)
+
+
+ # ─── LLM Setup with Tool Registration ───────────────────────────────────────
+
+ INSTRUCTIONS = (AGENT_DIR / "semsorter_instructions.md").read_text()
+
+
+ def setup_llm(model: str = "gemini-3-flash-preview") -> gemini.LLM:
+     """Create and configure the Gemini LLM with registered simulation tools."""
+     llm = gemini.LLM(model)
+
+     @llm.register_function(
+         description="Scan the conveyor belt camera feed for hazardous items. "
+                     "Returns a list of detected hazardous items with their types and positions."
+     )
+     async def scan_for_hazards() -> Dict[str, Any]:
+         """Capture a frame, match detections to sim items, and return actionable IDs."""
+         bridge = get_bridge()
+         detections = bridge.processor.detect_hazards()
+         matched = bridge.match_detections_to_items(detections)
+         return {
+             "hazards_found": len(detections),
+             "items_matched": len(matched),
+             "items": [
+                 {
+                     "item_name": d.get("sim_item", "unknown"),
+                     "bin_type": d.get("bin_type").value if d.get("bin_type") else "unknown",
+                     "detected_name": d.get("name", "unknown"),
+                     "type": str(d.get("type", "unknown")).lower(),
+                     "color": d.get("color", "unknown"),
+                     "shape": d.get("shape", "unknown"),
+                 }
+                 for d in matched
+             ],
+         }
+
+     @llm.register_function(
+         description="Pick a specific item from the conveyor and place it in "
+                     "the designated hazard bin. Use item_name from scan results. "
+                     "bin_type must be 'flammable' or 'chemical'."
+     )
+     async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
+         """Execute a pick-and-place operation for a specific item."""
+         from controller import BinType
+
+         sim = get_simulation()
+
+         type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
+         target_bin = type_map.get(bin_type.lower())
+         if target_bin is None:
+             return {"success": False, "error": f"Unknown bin type: {bin_type}"}
+
+         if item_name not in sim.items:
+             return {"success": False, "error": f"Unknown item: {item_name}"}
+
+         if sim.items[item_name].picked:
+             return {"success": False, "error": f"Item {item_name} already sorted"}
+
+         success = sim.pick_and_place(item_name, target_bin)
+         return {
+             "success": success,
+             "item": item_name,
+             "bin": bin_type,
+             "total_sorted": sim._items_sorted,
+         }
+
+     @llm.register_function(
+         description="Get the current state of the simulation: items, robot position, "
+                     "and sorting progress."
+     )
+     async def get_simulation_state() -> Dict[str, Any]:
+         """Return current simulation state snapshot."""
+         sim = get_simulation()
+         state = sim.get_state()
+         return {
+             "time": round(state.time, 2),
+             "arm_busy": state.arm_busy,
+             "gripper_open": state.gripper_open,
+             "items_sorted": state.items_sorted,
+             "ee_position": [round(x, 3) for x in state.ee_pos],
+             "items": state.items,
+         }
+
+     @llm.register_function(
+         description="Automatically scan for ALL hazardous items and sort them into "
+                     "the correct bins. This runs the full detect-match-sort pipeline."
+     )
+     async def sort_all_hazards() -> Dict[str, Any]:
+         """Full automated pipeline: detect → match → pick-and-place all hazards."""
+         bridge = get_bridge()
+         result = bridge.detect_and_sort()
+         return {
+             "hazards_detected": result["detected"],
+             "items_matched": result["matched"],
+             "items_sorted": result["sorted"],
+             "details": result["details"],
+         }
+
+     return llm
+
+
+ # ─── Agent Creation ──────────────────────────────────────────────────────────
+
+
+ async def create_agent(**kwargs) -> Agent:
+     """Create the SemSorter agent with Vision-Agents SDK."""
+     llm = setup_llm()
+
+     agent = Agent(
+         edge=getstream.Edge(),
+         agent_user=User(name="SemSorter AI", id="semsorter-agent"),
+         instructions=INSTRUCTIONS,
+         llm=llm,
+         tts=elevenlabs.TTS(model_id="eleven_flash_v2_5"),
+         stt=deepgram.STT(eager_turn_detection=True),
+         processors=[],
+     )
+
+     return agent
+
+
+ async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
+     """Join a GetStream video call and start the agent loop."""
+     call = await agent.create_call(call_type, call_id)
+
+     async with agent.join(call):
+         # Greet the user
+         await agent.simple_response(
+             "Hello! I'm the SemSorter AI. I can scan the conveyor belt "
+             "for hazardous items and sort them into the correct bins. "
+             "Just tell me what to do!"
+         )
+         # Run until the call ends
+         await agent.finish()
+
+
+ # ─── Entry point ─────────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
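The `_EGLStderrFilter` defined in `agent.py` can be exercised outside the agent. A minimal standalone sketch of the same suppression logic, with a trimmed token list and `io.StringIO` standing in for the real stderr:

```python
import io

# Trimmed token list for illustration; the agent suppresses more patterns
SUPPRESSED = ("EGLError", "eglDestroyContext", "EGL_NOT_INITIALIZED")


class StderrFilter:
    """Wrap a stream and swallow writes containing known-noise tokens."""

    def __init__(self, real):
        self._real = real

    def write(self, s):
        if any(tok in s for tok in SUPPRESSED):
            return len(s)  # pretend the write succeeded, but drop it
        return self._real.write(s)

    def flush(self):
        self._real.flush()


buf = io.StringIO()
err = StderrFilter(buf)
err.write("EGLError: eglMakeCurrent failed\n")  # suppressed
err.write("RuntimeError: something real\n")     # passes through
print(repr(buf.getvalue()))  # 'RuntimeError: something real\n'
```

Because `write` still returns the byte count for suppressed lines, callers that check the return value behave normally; only the output disappears.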
SemSorter/agent/semsorter_instructions.md ADDED
@@ -0,0 +1,22 @@
+ You are the SemSorter AI assistant — a robotic waste sorting system operator.
+
+ ## Your Role
+ You control a Franka Panda robot arm that sorts hazardous waste items on a conveyor belt into the correct safety bins:
+ - **Flammable items** (red colored) → Red flammable bin
+ - **Chemical items** (yellow colored) → Yellow chemical bin
+ - **Safe items** (gray/white/blue/green) → Leave on conveyor (no action needed)
+
+ ## Available Tools
+
+ 1. **scan_for_hazards** — Capture a frame from the conveyor camera and analyze it with the VLM to detect hazardous items. Call this FIRST when asked to sort items.
+ 2. **pick_and_place_item** — Pick a specific item and place it in the designated bin. Use the item_name and bin_type returned by scan_for_hazards.
+ 3. **get_simulation_state** — Check the current status: which items exist, which have been sorted, and the robot's position.
+ 4. **sort_all_hazards** — Automatically scan and sort ALL detected hazardous items in one go.
+
+ ## Behavior Rules
+ - When asked to "sort items" or "clean up", call `sort_all_hazards` for the full automated pipeline.
+ - When asked about "what's on the belt" or "scan", call `scan_for_hazards` and describe the results.
+ - When asked about a specific item, call `get_simulation_state` to check its status.
+ - Keep responses SHORT and conversational (1-2 sentences).
+ - Announce each action as you do it: "Scanning the belt...", "Picking up the red cylinder...", "Placed in flammable bin!"
+ - If no hazards are found, say something like "All clear! No hazardous items detected."
SemSorter/server/__init__.py ADDED
@@ -0,0 +1 @@
+ # SemSorter Web Server
SemSorter/server/agent_bridge.py ADDED
@@ -0,0 +1,363 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ SemSorter Agent Bridge
3
+ ======================
4
+ Wraps the Vision-Agents SDK components (gemini.LLM, deepgram.STT, elevenlabs.TTS)
5
+ and the MuJoCo simulation into a single async service used by the FastAPI server.
6
+
7
+ Quota/API exhaustion is detected per-service and a UIstatus message is returned
8
+ so the frontend can display an informative banner before demo-mode engages.
9
+ """
10
+
11
+ import asyncio
12
+ import logging
13
+ import os
14
+ import sys
15
+ from pathlib import Path
16
+ from typing import Any, Callable, Dict, List, Optional
17
+
18
+ logger = logging.getLogger(__name__)
19
+
20
+ # ── Path setup ────────────────────────────────────────────────────────────────
21
+ _SERVER_DIR = Path(__file__).resolve().parent
22
+ _SEMSORTER_DIR = _SERVER_DIR.parent
23
+ _PROJECT_ROOT = _SEMSORTER_DIR.parent
24
+
25
+ sys.path.insert(0, str(_SEMSORTER_DIR / "simulation"))
26
+ sys.path.insert(0, str(_SEMSORTER_DIR / "vision"))
27
+ sys.path.insert(0, str(_PROJECT_ROOT / "Vision-Agents" / "agents-core"))
28
+ for _plugin in ("gemini", "deepgram", "elevenlabs", "getstream"):
29
+ _plugin_path = _PROJECT_ROOT / "Vision-Agents" / "plugins" / _plugin
30
+ if _plugin_path.exists():
31
+ sys.path.insert(0, str(_plugin_path))
32
+
33
+ # ── Quota-tracking state ──────────────────────────────────────────────────────
34
+ _quota_exceeded: Dict[str, bool] = {
35
+ "gemini": False,
36
+ "deepgram": False,
37
+ "elevenlabs": False,
38
+ }
39
+
40
+ # ── Demo-mode pre-recorded detections ────────────────────────────────────────
41
+ _DEMO_DETECTIONS = [
42
+ {"name": "red cylinder", "type": "FLAMMABLE", "color": "red",
43
+ "shape": "cylinder", "box_2d": [240, 200, 290, 260]},
44
+ {"name": "green box", "type": "FLAMMABLE", "color": "green",
45
+ "shape": "box", "box_2d": [240, 260, 285, 310]},
46
+ {"name": "yellow box", "type": "CHEMICAL", "color": "yellow",
47
+ "shape": "box", "box_2d": [240, 310, 285, 360]},
48
+ {"name": "blue box", "type": "CHEMICAL", "color": "blue",
49
+ "shape": "box", "box_2d": [240, 370, 285, 420]},
50
+ ]
51
+
52
+ # ── Singleton resources ───────────────────────────────────────────────────────
53
+ _sim = None
54
+ _bridge = None
55
+ _llm = None
56
+ _tts = None
57
+ _notify_cb: Optional[Callable[[Dict], None]] = None # Push events to WebSocket
58
+
59
+
60
+ def set_notify_callback(cb: Callable[[Dict], None]) -> None:
61
+ """Register a callback that pushes quota/status events to connected WS clients."""
62
+ global _notify_cb
63
+ _notify_cb = cb
64
+
65
+
66
+ def _push(event: Dict) -> None:
67
+ """Fire-and-forget push to the registered notify callback."""
68
+ if _notify_cb:
69
+ try:
70
+ _notify_cb(event)
71
+ except Exception:
72
+ pass
73
+
74
+
75
+ def _check_quota_error(exc: Exception) -> Optional[str]:
76
+ """Return service name if the exception indicates API quota exhaustion."""
77
+ msg = str(exc).lower()
78
+ if "resource_exhausted" in msg or "429" in msg or "quota" in msg:
79
+ if "gemini" in msg or "google" in msg:
80
+ return "gemini"
81
+ if "deepgram" in msg:
82
+ return "deepgram"
83
+ if "elevenlabs" in msg or "eleven" in msg:
84
+ return "elevenlabs"
85
+ return "unknown"
86
+ return None
87
+
88
+
89
+ def _mark_quota_exceeded(service: str) -> None:
90
+ """Mark a service as quota-exceeded and push a warning to the UI."""
91
+ if not _quota_exceeded.get(service):
92
+ _quota_exceeded[service] = True
93
+ _push({
94
+ "type": "quota_warning",
95
+ "service": service,
96
+ "message": (
97
+ f"⚠️ {service.title()} API quota exceeded — "
98
+ f"switching to demo mode for this service."
99
+ ),
100
+ })
101
+ logger.warning("Quota exceeded for %s — demo mode activated", service)
102
+
103
+
104
+ # ── Lazy initializers ─────────────────────────────────────────────────────────
105
+
106
+ def get_simulation():
107
+ global _sim
108
+ if _sim is None:
109
+ os.environ.setdefault("MUJOCO_GL", "egl")
110
+ from controller import SemSorterSimulation
111
+ logger.info("Initialising MuJoCo simulation…")
112
+ _sim = SemSorterSimulation()
113
+ _sim.load_scene()
114
+ _sim.step(300)
115
+ logger.info("Simulation ready: %d items", len(_sim.items))
116
+ return _sim
117
+
118
+
119
+ def get_bridge():
120
+ global _bridge
121
+ if _bridge is None:
122
+ from vlm_bridge import VLMSimBridge
123
+ _bridge = VLMSimBridge(simulation=get_simulation(), use_direct=True)
124
+ logger.info("VLM bridge ready")
125
+ return _bridge
126
+
127
+
128
+ def get_llm():
129
+ """Return a configured gemini.LLM instance from the Vision-Agents SDK."""
130
+ global _llm
131
+ if _llm is None:
132
+ from vision_agents.plugins.gemini.gemini_llm import GeminiLLM as GeminiLLMCls
133
+ _llm = GeminiLLMCls("gemini-2.0-flash")
134
+ _register_tools(_llm)
135
+ logger.info("Gemini LLM ready")
136
+ return _llm
137
+
138
+
139
+ def _register_tools(llm) -> None:
140
+ """Register simulation control tools on the LLM."""
141
+
142
+ @llm.register_function(description="Scan the conveyor belt for hazardous items.")
143
+ async def scan_for_hazards() -> Dict[str, Any]:
144
+ return await _scan_hazards_impl()
145
+
146
+ @llm.register_function(
147
+ description="Pick a specific item by sim name and place it in its bin. "
148
+ "bin_type must be 'flammable' or 'chemical'.")
149
+ async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
150
+ return await _pick_place_impl(item_name, bin_type)
151
+
152
+ @llm.register_function(description="Get current simulation state snapshot.")
153
+ async def get_simulation_state() -> Dict[str, Any]:
154
+ return _state_impl()
155
+
156
+ @llm.register_function(
157
+ description="Detect ALL hazardous items and sort them automatically.")
158
+ async def sort_all_hazards() -> Dict[str, Any]:
159
+ return await _sort_all_impl()
160
+
161
+
162
+ # ── Tool implementations ──────────────────────────────────────────────────────
163
+
164
+ async def _scan_hazards_impl() -> Dict[str, Any]:
165
+ if _quota_exceeded["gemini"]:
166
+ # Already in demo mode — return pre-recorded detections
167
+ bridge = get_bridge()
168
+ matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
169
+ return _format_scan(matched, demo=True)
170
+
171
+ try:
172
+ bridge = get_bridge()
173
+ loop = asyncio.get_event_loop()
174
+ detections = await loop.run_in_executor(
175
+ None, bridge.processor.detect_hazards)
176
+ matched = bridge.match_detections_to_items(detections)
177
+ return _format_scan(matched, demo=False)
178
+ except Exception as exc:
179
+ svc = _check_quota_error(exc)
180
+ if svc:
181
+ _mark_quota_exceeded(svc)
182
+ bridge = get_bridge()
183
+ matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
184
+ return _format_scan(matched, demo=True)
185
+ raise
186
+
187
+
188
+ def _format_scan(matched: List[Dict], demo: bool) -> Dict[str, Any]:
189
+ return {
190
+ "demo_mode": demo,
191
+ "hazards_found": len(matched),
192
+ "items": [
193
+ {
194
+ "item_name": d.get("sim_item", "unknown"),
195
+ "bin_type": d["bin_type"].value if d.get("bin_type") else "unknown",
196
+ "detected_name": d.get("name", "unknown"),
197
+ "type": str(d.get("type", "")).lower(),
198
+ "color": d.get("color", ""),
199
+ "shape": d.get("shape", ""),
200
+ }
201
+ for d in matched
202
+ ],
203
+ }
204
+
205
+
206
+ async def _pick_place_impl(item_name: str, bin_type: str) -> Dict[str, Any]:
207
+ from controller import BinType
208
+ sim = get_simulation()
209
+ type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
210
+ target = type_map.get(bin_type.lower())
211
+ if not target:
212
+ return {"success": False, "error": f"Unknown bin: {bin_type}"}
213
+ if item_name not in sim.items:
214
+ return {"success": False, "error": f"Unknown item: {item_name}"}
215
+ if sim.items[item_name].picked:
216
+ return {"success": False, "error": f"{item_name} already sorted"}
217
+
218
+ loop = asyncio.get_event_loop()
219
+ success = await loop.run_in_executor(None, sim.pick_and_place, item_name, target)
220
+ return {"success": success, "item": item_name, "bin": bin_type,
221
+ "total_sorted": sim._items_sorted}
222
+
223
+
224
+ def _state_impl() -> Dict[str, Any]:
225
+ sim = get_simulation()
226
+ state = sim.get_state()
227
+ return {
228
+ "time": round(state.time, 2),
229
+ "arm_busy": state.arm_busy,
230
+ "items_sorted": state.items_sorted,
231
+ "ee_position": [round(x, 3) for x in state.ee_pos],
232
+ "quota_exceeded": dict(_quota_exceeded),
233
+ "items": [
234
+ {"name": i["name"], "picked": i["picked"],
235
+ "hazard_type": i.get("hazard_type")}
236
+ for i in state.items
237
+ ],
238
+ }
239
+
240
+
241
+ async def _sort_all_impl() -> Dict[str, Any]:
242
+ """Full detect → match → sort pipeline."""
243
+ # 1. Detect
244
+ scan_result = await _scan_hazards_impl()
245
+ items = scan_result["items"]
246
+ demo = scan_result["demo_mode"]
247
+
248
+ if not items:
249
+ return {"hazards_detected": 0, "items_matched": 0, "items_sorted": 0,
250
+ "details": [], "demo_mode": demo}
251
+
252
+ # 2. Sort each matched item
253
+ details = []
254
+ sorted_count = 0
255
+ for item in items:
256
+ r = await _pick_place_impl(item["item_name"], item["bin_type"])
257
+ details.append({"item": item["item_name"], "bin": item["bin_type"],
258
+ "success": r.get("success", False)})
259
+ if r.get("success"):
260
+ sorted_count += 1
261
+
262
+ return {"hazards_detected": len(items), "items_matched": len(items),
263
+ "items_sorted": sorted_count, "details": details, "demo_mode": demo}
264
+
265
+
266
+ # ── Text → agent response ─────────────────────────────────────────────────────
267
+
268
+ async def process_text_command(text: str) -> str:
269
+ """
270
+ Send a text command to the Gemini LLM (Vision-Agents SDK).
271
+ Returns the agent's text response.
272
+ On quota error: marks exceeded + returns a canned message.
273
+ """
274
+ if _quota_exceeded["gemini"]:
275
+ return await _llm_demo_response(text)
276
+
277
+ try:
278
+ llm = get_llm()
279
+ # Use the LLM's chat method to get a response with tool-calling
280
+ response = await llm.chat(text)
281
+ return response
282
+ except Exception as exc:
283
+ svc = _check_quota_error(exc)
284
+ if svc:
285
+ _mark_quota_exceeded(svc)
286
+ return await _llm_demo_response(text)
287
+ logger.exception("LLM error")
288
+ return f"Error processing command: {exc}"
289
+
290
+
291
+ async def _llm_demo_response(text: str) -> str:
292
+ """Return a plausible demo response when Gemini quota is exhausted."""
293
+ t = text.lower()
294
+ if "scan" in t:
295
+ return ("I found 4 hazardous items on the conveyor belt: "
296
+ "2 flammable and 2 chemical. [Demo mode — Gemini quota exceeded]")
297
+ if "sort" in t or "pick" in t or "place" in t:
298
+ return ("Sorting all hazardous items into their respective bins. "
299
+ "[Demo mode — Gemini quota exceeded]")
300
+ if "state" in t or "status" in t:
301
+ state = _state_impl()
302
+ return (f"Simulation time: {state['time']}s. "
303
+ f"Items sorted: {state['items_sorted']}. "
304
+ f"Arm busy: {state['arm_busy']}. [Demo mode]")
305
+ return "I'm SemSorter AI. Ask me to scan or sort items! [Demo mode]"
306
+
307
+
308
+ # ── TTS helper ────────────────────────────────────────────────────────────────
309
+
310
+ async def text_to_speech(text: str) -> Optional[bytes]:
311
+ """
312
+ Convert text to audio bytes using ElevenLabs (Vision-Agents SDK plugin).
313
+ Returns None on quota error (frontend falls back to browser SpeechSynthesis).
314
+ """
315
+ if _quota_exceeded["elevenlabs"]:
316
+ return None
317
+ try:
318
+ from vision_agents.plugins.elevenlabs.elevenlabs_tts import ElevenLabsTTS
319
+ tts = ElevenLabsTTS(model_id="eleven_flash_v2_5")
320
+ audio_bytes = await tts.synthesize(text)
321
+ return audio_bytes
322
+ except Exception as exc:
323
+ svc = _check_quota_error(exc)
324
+ if svc == "elevenlabs" or svc == "unknown":
325
+ _mark_quota_exceeded("elevenlabs")
326
+ else:
327
+ logger.exception("TTS error")
328
+ return None
329
+
330
+
331
+ # ── STT helper (Deepgram) ─────────────────────────────────────────────────────
332
+
333
+ async def transcribe_audio(audio_bytes: bytes, mime: str = "audio/webm") -> Optional[str]:
334
+ """
335
+ Transcribe audio using Deepgram STT (Vision-Agents SDK plugin).
336
+ Returns None on quota error (frontend falls back to Web Speech API result).
337
+ """
338
+ if _quota_exceeded["deepgram"]:
339
+ return None
340
+ try:
341
+ import httpx, os
342
+ api_key = os.environ.get("DEEPGRAM_API_KEY", "")
343
+ if not api_key:
344
+ return None
345
+ async with httpx.AsyncClient() as client:
346
+ resp = await client.post(
347
+ "https://api.deepgram.com/v1/listen?model=nova-2",
348
+ headers={"Authorization": f"Token {api_key}",
349
+ "Content-Type": mime},
350
+ content=audio_bytes,
351
+ timeout=10,
352
+ )
353
+ if resp.status_code == 429:
354
+ _mark_quota_exceeded("deepgram")
355
+ return None
356
+ data = resp.json()
357
+ return (data.get("results", {})
358
+ .get("channels", [{}])[0]
359
+ .get("alternatives", [{}])[0]
360
+ .get("transcript", ""))
361
+ except Exception as exc:
362
+ logger.warning("Deepgram STT error: %s", exc)
363
+ return None
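The nested `.get(...)` chain above returns an empty transcript when keys are missing, but a bare `[0]` index still raises `IndexError` if Deepgram returns an empty `channels` or `alternatives` list. A defensive sketch of the same extraction (`extract_transcript` is a hypothetical helper, not part of the SDK or this repo):

```python
def extract_transcript(data: dict) -> str:
    """Pull the first transcript out of a Deepgram /v1/listen response.

    Mirrors the .get() chain in transcribe_audio(), but also tolerates
    empty "channels"/"alternatives" lists, which a bare [0] would not.
    """
    channels = data.get("results", {}).get("channels") or [{}]
    alternatives = channels[0].get("alternatives") or [{}]
    return alternatives[0].get("transcript", "")
```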
SemSorter/server/app.py ADDED
@@ -0,0 +1,207 @@
1
+ """
2
+ SemSorter FastAPI Server
3
+ ========================
4
+ Serves the web UI and bridges the Vision-Agents SDK + MuJoCo simulation.
5
+
6
+ Endpoints
7
+ ---------
8
+ GET / → index.html
9
+ WS /ws/video → MJPEG frames (~10 fps) from MuJoCo renderer
10
+ WS /ws/chat → bidirectional: text commands → agent responses + events
11
+ GET /api/state → current simulation state JSON
12
+ POST /api/sort → trigger sort_all_hazards pipeline
13
+ POST /api/command → send a text command to the agent
14
+ POST /api/transcribe → transcribe uploaded audio via Deepgram
15
+
16
+ Run locally:
17
+ cd SemSorter && MUJOCO_GL=egl uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
18
+ """
19
+
20
+ import asyncio
21
+ import base64
22
+ import io
23
+ import json
24
+ import logging
25
+ from pathlib import Path
26
+ from typing import Set
27
+
28
+ from fastapi import FastAPI, WebSocket, WebSocketDisconnect, UploadFile, File
29
+ from fastapi.responses import HTMLResponse, JSONResponse
30
+ from PIL import Image
34
+
35
+ # ── Local imports ─────────────────────────────────────────────────────────────
36
+ from . import agent_bridge as bridge
37
+
38
+ logging.basicConfig(level=logging.INFO,
39
+ format="%(asctime)s %(levelname)s %(name)s %(message)s")
40
+ logger = logging.getLogger(__name__)
41
+
42
+ app = FastAPI(title="SemSorter", version="1.0")
43
+
44
+ # ── Static files ──────────────────────────────────────────────────────────────
45
+ _STATIC = Path(__file__).parent / "static"
46
+ _STATIC.mkdir(exist_ok=True)
47
+
48
+ # ── Connected WebSocket clients ───────────────────────────────────────────────
49
+ _chat_clients: Set[WebSocket] = set()
50
+ _video_clients: Set[WebSocket] = set()
51
+
52
+
53
+ async def _broadcast_chat(event: dict) -> None:
54
+ """Push a JSON event to all connected chat WebSocket clients."""
55
+ payload = json.dumps(event)
56
+ dead = set()
57
+ for ws in list(_chat_clients):
58
+ try:
59
+ await ws.send_text(payload)
60
+ except Exception:
61
+ dead.add(ws)
62
+ _chat_clients -= dead
63
+
64
+
65
+ def _sync_broadcast(event: dict) -> None:
66
+ """Push a chat event from sync code (bridge callbacks). The event is
67
+ dropped if no event loop is running in the calling thread."""
68
+ try:
69
+ loop = asyncio.get_running_loop()
70
+ except RuntimeError:
71
+ return  # called off the event-loop thread; drop the event
72
+ loop.create_task(_broadcast_chat(event))
73
+
74
+
75
+ # Register the broadcast callback so agent_bridge can push quota warnings
76
+ bridge.set_notify_callback(_sync_broadcast)
77
+
78
+
79
+ # ── Startup: pre-warm simulation ──────────────────────────────────────────────
80
+ @app.on_event("startup")
81
+ async def startup():
82
+ logger.info("Pre-warming MuJoCo simulation…")
83
+ loop = asyncio.get_running_loop()
84
+ await loop.run_in_executor(None, bridge.get_simulation)
85
+ logger.info("Simulation ready")
86
+
87
+
88
+ # ── REST endpoints ────────────────────────────────────────────────────────────
89
+
90
+ @app.get("/", response_class=HTMLResponse)
91
+ async def index():
92
+ html_path = _STATIC / "index.html"
93
+ return HTMLResponse(html_path.read_text())
94
+
95
+
96
+ @app.get("/api/state")
97
+ async def api_state():
98
+ loop = asyncio.get_running_loop()
99
+ state = await loop.run_in_executor(None, bridge._state_impl)
100
+ return JSONResponse(state)
101
+
102
+
103
+ @app.post("/api/sort")
104
+ async def api_sort():
105
+ """Trigger the full detect-match-sort pipeline."""
106
+ result = await bridge._sort_all_impl()
107
+ await _broadcast_chat({"type": "sort_result", "data": result})
108
+ return JSONResponse(result)
109
+
110
+
111
+ @app.post("/api/command")
112
+ async def api_command(body: dict):
113
+ text = body.get("text", "").strip()
114
+ if not text:
115
+ return JSONResponse({"error": "empty command"}, status_code=400)
116
+ response_text = await bridge.process_text_command(text)
117
+ await _broadcast_chat({"type": "agent_response", "text": response_text})
118
+ return JSONResponse({"response": response_text})
119
+
120
+
121
+ @app.post("/api/transcribe")
122
+ async def api_transcribe(file: UploadFile = File(...)):
123
+ """Transcribe uploaded audio using Deepgram; returns transcript or null."""
124
+ audio_bytes = await file.read()
125
+ transcript = await bridge.transcribe_audio(audio_bytes, mime=file.content_type)
126
+ return JSONResponse({"transcript": transcript})
127
+
128
+
129
+ # ── WebSocket: chat ───────────────────────────────────────────────────────────
130
+
131
+ @app.websocket("/ws/chat")
132
+ async def ws_chat(ws: WebSocket):
133
+ await ws.accept()
134
+ _chat_clients.add(ws)
135
+ logger.info("Chat client connected (%d total)", len(_chat_clients))
136
+ try:
137
+ await ws.send_text(json.dumps({
138
+ "type": "welcome",
139
+ "text": "Connected to SemSorter AI. Ask me to scan or sort items!",
140
+ }))
141
+ while True:
142
+ raw = await ws.receive_text()
143
+ try:
144
+ msg = json.loads(raw)
145
+ except json.JSONDecodeError:
146
+ msg = {"type": "command", "text": raw}
147
+
148
+ msg_type = msg.get("type", "command")
149
+
150
+ if msg_type == "command":
151
+ text = msg.get("text", "").strip()
152
+ if text:
153
+ await _broadcast_chat({"type": "user_message", "text": text})
154
+ response = await bridge.process_text_command(text)
155
+ await _broadcast_chat({"type": "agent_response", "text": response})
156
+
157
+ elif msg_type == "scan":
158
+ result = await bridge._scan_hazards_impl()
159
+ await _broadcast_chat({"type": "scan_result", "data": result})
160
+
161
+ elif msg_type == "sort":
162
+ result = await bridge._sort_all_impl()
163
+ await _broadcast_chat({"type": "sort_result", "data": result})
164
+
165
+ elif msg_type == "state":
166
+ loop = asyncio.get_running_loop()
167
+ state = await loop.run_in_executor(None, bridge._state_impl)
168
+ await ws.send_text(json.dumps({"type": "state", "data": state}))
169
+
170
+ except WebSocketDisconnect:
171
+ pass
172
+ finally:
173
+ _chat_clients.discard(ws)
174
+ logger.info("Chat client disconnected (%d remaining)", len(_chat_clients))
175
+
176
+
177
+ # ── WebSocket: live video stream ──────────────────────────────────────────────
178
+
179
+ def _render_frame_jpeg(quality: int = 75) -> bytes:
180
+ """Render a MuJoCo frame and encode as JPEG bytes."""
181
+ sim = bridge.get_simulation()
182
+ frame = sim.render_frame(camera="overview") # numpy H×W×3
183
+ img = Image.fromarray(frame)
184
+ buf = io.BytesIO()
185
+ img.save(buf, format="JPEG", quality=quality)
186
+ return buf.getvalue()
187
+
188
+
189
+ @app.websocket("/ws/video")
190
+ async def ws_video(ws: WebSocket):
191
+ await ws.accept()
192
+ _video_clients.add(ws)
193
+ logger.info("Video client connected")
194
+ try:
195
+ loop = asyncio.get_running_loop()
196
+ while True:
197
+ jpeg_bytes = await loop.run_in_executor(None, _render_frame_jpeg)
198
+ b64 = base64.b64encode(jpeg_bytes).decode()
199
+ await ws.send_text(json.dumps({"type": "frame", "data": b64}))
200
+ await asyncio.sleep(0.1) # ~10 fps
201
+ except WebSocketDisconnect:
202
+ pass
203
+ except Exception as e:
204
+ logger.warning("Video stream error: %s", e)
205
+ finally:
206
+ _video_clients.discard(ws)
207
+ logger.info("Video client disconnected")
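The `/ws/video` socket sends each frame as a JSON text message with a base64 payload, which the frontend turns into a `data:` URL. A minimal sketch of that envelope (function names are illustrative, not from `app.py`):

```python
import base64
import json

def encode_frame(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes the way ws_video does: {"type": "frame", "data": <b64>}."""
    return json.dumps({"type": "frame", "data": base64.b64encode(jpeg_bytes).decode()})

def decode_frame(message: str) -> bytes:
    """Inverse operation, as the browser performs before setting img.src."""
    payload = json.loads(message)
    if payload.get("type") != "frame":
        raise ValueError("not a frame message")
    return base64.b64decode(payload["data"])
```

Base64 inflates each frame by roughly a third; for a ~10 fps MJPEG stream that overhead is acceptable, but binary WebSocket frames would avoid it.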
SemSorter/server/static/index.html ADDED
@@ -0,0 +1,427 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8"/>
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
6
+ <title>SemSorter — AI Hazard Sorting System</title>
7
+ <link rel="preconnect" href="https://fonts.googleapis.com"/>
8
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet"/>
9
+ <style>
10
+ /* ── Design tokens ── */
11
+ :root{
12
+ --bg:#0a0d14;--surface:#111827;--surface2:#1a2235;--border:#1e2d45;
13
+ --accent:#3b82f6;--accent-glow:rgba(59,130,246,.35);
14
+ --success:#22c55e;--warning:#f59e0b;--danger:#ef4444;--chemical:#a78bfa;
15
+ --text:#e2e8f0;--text-muted:#64748b;--text-dim:#94a3b8;
16
+ --radius:12px;--radius-sm:8px;
17
+ --font:'Inter',system-ui,sans-serif;--mono:'JetBrains Mono',monospace;
18
+ }
19
+ *{box-sizing:border-box;margin:0;padding:0}
20
+ body{background:var(--bg);color:var(--text);font-family:var(--font);min-height:100vh;
21
+ display:grid;grid-template-rows:auto 1fr;overflow:hidden;height:100vh}
22
+
23
+ /* ── Header ── */
24
+ header{display:flex;align-items:center;justify-content:space-between;
25
+ padding:14px 24px;background:var(--surface);border-bottom:1px solid var(--border);
26
+ backdrop-filter:blur(8px);}
27
+ .logo{display:flex;align-items:center;gap:10px}
28
+ .logo-icon{width:36px;height:36px;background:linear-gradient(135deg,var(--accent),#8b5cf6);
29
+ border-radius:9px;display:flex;align-items:center;justify-content:center;font-size:18px}
30
+ .logo-text{font-weight:700;font-size:18px;letter-spacing:-.3px}
31
+ .logo-sub{font-size:11px;color:var(--text-muted);font-weight:400;margin-top:1px}
32
+ .header-status{display:flex;align-items:center;gap:8px;font-size:13px}
33
+ .dot{width:8px;height:8px;border-radius:50%;background:var(--success);
34
+ box-shadow:0 0 8px var(--success);animation:pulse 2s infinite}
35
+ @keyframes pulse{0%,100%{opacity:1}50%{opacity:.5}}
36
+
37
+ /* ── Layout ── */
38
+ main{display:grid;grid-template-columns:1fr 380px;gap:0;overflow:hidden}
39
+
40
+ /* ── Left: Simulation panel ── */
41
+ .sim-panel{display:flex;flex-direction:column;padding:20px;gap:16px;overflow:hidden}
42
+ .sim-header{display:flex;align-items:center;justify-content:space-between}
43
+ .panel-title{font-size:13px;font-weight:600;color:var(--text-dim);text-transform:uppercase;letter-spacing:.8px}
44
+ .sim-container{flex:1;background:var(--surface);border:1px solid var(--border);
45
+ border-radius:var(--radius);overflow:hidden;position:relative;
46
+ display:flex;align-items:center;justify-content:center;min-height:300px}
47
+ #sim-video{width:100%;height:100%;object-fit:contain;display:block}
48
+ .sim-overlay{position:absolute;top:0;left:0;right:0;bottom:0;display:flex;
49
+ align-items:center;justify-content:center;background:rgba(10,13,20,.85);
50
+ flex-direction:column;gap:12px;transition:.3s}
51
+ .sim-overlay.hidden{opacity:0;pointer-events:none}
52
+ .spinner{width:40px;height:40px;border:3px solid var(--border);
53
+ border-top-color:var(--accent);border-radius:50%;animation:spin 1s linear infinite}
54
+ @keyframes spin{to{transform:rotate(360deg)}}
55
+ .sim-overlay p{color:var(--text-muted);font-size:14px}
56
+
57
+ /* ── Status cards ── */
58
+ .stats-row{display:grid;grid-template-columns:repeat(3,1fr);gap:12px}
59
+ .stat-card{background:var(--surface);border:1px solid var(--border);border-radius:var(--radius-sm);
60
+ padding:12px 16px}
61
+ .stat-label{font-size:11px;color:var(--text-muted);text-transform:uppercase;letter-spacing:.6px;margin-bottom:4px}
62
+ .stat-value{font-size:22px;font-weight:700;font-family:var(--mono)}
63
+ .stat-value.ok{color:var(--success)}
64
+ .stat-value.busy{color:var(--warning)}
65
+
66
+ /* ── Right: Agent panel ── */
67
+ .agent-panel{background:var(--surface);border-left:1px solid var(--border);
68
+ display:flex;flex-direction:column;overflow:hidden}
69
+ .agent-header{padding:16px 20px;border-bottom:1px solid var(--border);
70
+ display:flex;align-items:center;justify-content:space-between}
71
+ .agent-title{font-weight:600;font-size:15px}
72
+ .sdk-badge{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);
73
+ color:var(--accent);font-size:10px;font-weight:600;padding:2px 8px;
74
+ border-radius:20px;letter-spacing:.5px}
75
+
76
+ /* ── Transcript ── */
77
+ .transcript{flex:1;overflow-y:auto;padding:16px;display:flex;flex-direction:column;gap:10px;
78
+ scroll-behavior:smooth}
79
+ .transcript::-webkit-scrollbar{width:4px}
80
+ .transcript::-webkit-scrollbar-thumb{background:var(--border);border-radius:2px}
81
+ .msg{display:flex;flex-direction:column;gap:3px;animation:fadeIn .25s ease}
82
+ @keyframes fadeIn{from{opacity:0;transform:translateY(6px)}to{opacity:1;transform:none}}
83
+ .msg-role{font-size:10px;font-weight:600;text-transform:uppercase;letter-spacing:.6px;color:var(--text-muted)}
84
+ .msg-text{font-size:14px;line-height:1.55;padding:10px 13px;border-radius:10px;
85
+ background:var(--surface2);border:1px solid var(--border);max-width:100%}
86
+ .msg.user .msg-role{color:var(--accent)}
87
+ .msg.user .msg-text{background:rgba(59,130,246,.08);border-color:rgba(59,130,246,.2)}
88
+ .msg.agent .msg-role{color:var(--success)}
89
+ .msg.agent .msg-text{background:rgba(34,197,94,.06);border-color:rgba(34,197,94,.15)}
90
+ .msg.system .msg-role{color:var(--text-muted)}
91
+ .msg.system .msg-text{font-family:var(--mono);font-size:12px;background:var(--surface);
92
+ border-style:dashed;white-space:pre-wrap}
93
+ .msg.warning .msg-role{color:var(--warning)}
94
+ .msg.warning .msg-text{background:rgba(245,158,11,.08);border-color:rgba(245,158,11,.3)}
95
+
96
+ /* ── Quota warning banner ── */
97
+ #quota-banner{display:none;background:rgba(245,158,11,.1);border:1px solid rgba(245,158,11,.35);
98
+ border-radius:var(--radius-sm);margin:0 16px;padding:10px 14px;
99
+ font-size:12px;color:var(--warning);line-height:1.5}
100
+ #quota-banner.show{display:block}
101
+
102
+ /* ── Input area ── */
103
+ .input-area{padding:14px 16px;border-top:1px solid var(--border);display:flex;flex-direction:column;gap:10px}
104
+ .input-row{display:flex;gap:8px}
105
+ #cmd-input{flex:1;background:var(--surface2);border:1px solid var(--border);
106
+ border-radius:var(--radius-sm);padding:10px 14px;color:var(--text);
107
+ font-family:var(--font);font-size:14px;outline:none;transition:.2s}
108
+ #cmd-input:focus{border-color:var(--accent);box-shadow:0 0 0 3px var(--accent-glow)}
109
+ #cmd-input::placeholder{color:var(--text-muted)}
110
+ .btn{border:none;cursor:pointer;border-radius:var(--radius-sm);font-family:var(--font);
111
+ font-weight:600;font-size:13px;transition:.18s;display:flex;align-items:center;gap:6px;
112
+ white-space:nowrap}
113
+ .btn:active{transform:scale(.96)}
114
+ .btn-primary{background:var(--accent);color:#fff;padding:10px 18px}
115
+ .btn-primary:hover{background:#2563eb}
116
+ .btn-voice{background:var(--surface2);border:1px solid var(--border);color:var(--text);padding:10px 14px;font-size:16px}
117
+ .btn-voice.listening{background:rgba(239,68,68,.15);border-color:var(--danger);color:var(--danger);animation:pulse 1s infinite}
118
+ .btn-voice:hover{border-color:var(--accent)}
119
+ .action-btns{display:flex;gap:8px}
120
+ .btn-action{flex:1;padding:9px 12px;font-size:12px}
121
+ .btn-scan{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);color:var(--accent)}
122
+ .btn-scan:hover{background:rgba(59,130,246,.22)}
123
+ .btn-sort{background:rgba(34,197,94,.1);border:1px solid rgba(34,197,94,.3);color:var(--success)}
124
+ .btn-sort:hover{background:rgba(34,197,94,.2)}
125
+ .btn-state{background:rgba(167,139,250,.1);border:1px solid rgba(167,139,250,.3);color:var(--chemical)}
126
+ .btn-state:hover{background:rgba(167,139,250,.2)}
127
+ .stt-hint{font-size:11px;color:var(--text-muted);text-align:center}
128
+
129
+ /* ── Item list ── */
130
+ .items-section{padding:0 16px 10px;border-top:1px solid var(--border);padding-top:10px}
131
+ .items-title{font-size:11px;font-weight:600;color:var(--text-muted);text-transform:uppercase;
132
+ letter-spacing:.6px;margin-bottom:8px}
133
+ .item-list{display:flex;flex-direction:column;gap:5px;max-height:110px;overflow-y:auto}
134
+ .item-pill{display:flex;align-items:center;justify-content:space-between;
135
+ padding:5px 10px;border-radius:6px;font-size:12px;
136
+ background:var(--surface2);border:1px solid var(--border)}
137
+ .item-pill .name{font-family:var(--mono);color:var(--text-dim)}
138
+ .item-pill .badge{font-size:10px;font-weight:600;padding:2px 7px;border-radius:20px}
139
+ .badge.flammable{background:rgba(239,68,68,.15);color:#f87171;border:1px solid rgba(239,68,68,.25)}
140
+ .badge.chemical{background:rgba(167,139,250,.15);color:#c4b5fd;border:1px solid rgba(167,139,250,.25)}
141
+ .badge.safe{background:rgba(100,116,139,.15);color:#94a3b8;border:1px solid rgba(100,116,139,.25)}
142
+ .badge.sorted{background:rgba(34,197,94,.1);color:#4ade80;border:1px solid rgba(34,197,94,.2)}
143
+ </style>
144
+ </head>
145
+ <body>
146
+
147
+ <header>
148
+ <div class="logo">
149
+ <div class="logo-icon">🤖</div>
150
+ <div>
151
+ <div class="logo-text">SemSorter</div>
152
+ <div class="logo-sub">AI Hazard Sorting — Vision-Agents SDK</div>
153
+ </div>
154
+ </div>
155
+ <div class="header-status">
156
+ <div class="dot" id="conn-dot"></div>
157
+ <span id="conn-label">Connecting…</span>
158
+ </div>
159
+ </header>
160
+
161
+ <main>
162
+ <!-- Left: simulation video -->
163
+ <div class="sim-panel">
164
+ <div class="sim-header">
165
+ <span class="panel-title">Live Simulation Feed</span>
166
+ <span style="font-size:12px;color:var(--text-muted)" id="fps-label">— fps</span>
167
+ </div>
168
+
169
+ <div class="sim-container">
170
+ <img id="sim-video" alt="MuJoCo simulation"/>
171
+ <div class="sim-overlay" id="sim-overlay">
172
+ <div class="spinner"></div>
173
+ <p>Warming up simulation…</p>
174
+ </div>
175
+ </div>
176
+
177
+ <div class="stats-row">
178
+ <div class="stat-card">
179
+ <div class="stat-label">Items Sorted</div>
180
+ <div class="stat-value ok" id="stat-sorted">0</div>
181
+ </div>
182
+ <div class="stat-card">
183
+ <div class="stat-label">Arm Status</div>
184
+ <div class="stat-value ok" id="stat-arm">Idle</div>
185
+ </div>
186
+ <div class="stat-card">
187
+ <div class="stat-label">Sim Time</div>
188
+ <div class="stat-value" id="stat-time" style="font-size:18px">0.0 s</div>
189
+ </div>
190
+ </div>
191
+ </div>
192
+
193
+ <!-- Right: agent chat panel -->
194
+ <div class="agent-panel">
195
+ <div class="agent-header">
196
+ <div class="agent-title">SemSorter AI</div>
197
+ <div class="sdk-badge">Vision-Agents SDK</div>
198
+ </div>
199
+
200
+ <div id="quota-banner"></div>
201
+
202
+ <div class="transcript" id="transcript"></div>
203
+
204
+ <div class="items-section">
205
+ <div class="items-title">Conveyor Items</div>
206
+ <div class="item-list" id="item-list"><span style="font-size:12px;color:var(--text-muted)">Loading…</span></div>
207
+ </div>
208
+
209
+ <div class="input-area">
210
+ <div class="action-btns">
211
+ <button class="btn btn-action btn-scan" onclick="sendWs('scan')">🔍 Scan</button>
212
+ <button class="btn btn-action btn-sort" onclick="sendWs('sort')">⚡ Sort All</button>
213
+ <button class="btn btn-action btn-state" onclick="sendWs('state')">📊 State</button>
214
+ </div>
215
+
216
+ <div class="input-row">
217
+ <input id="cmd-input" placeholder="Type a command…" autocomplete="off"
218
+ onkeydown="if(event.key==='Enter')sendCommand()"/>
219
+ <button class="btn btn-voice" id="voice-btn" onclick="toggleVoice()" title="Voice input">🎤</button>
220
+ <button class="btn btn-primary" onclick="sendCommand()">Send</button>
221
+ </div>
222
+ <div class="stt-hint" id="stt-hint">Using browser speech recognition</div>
223
+ </div>
224
+ </div>
225
+ </main>
226
+
227
+ <script>
228
+ // ─── WebSocket connections ────────────────────────────────────────────────────
229
+ const WS_BASE = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}`;
230
+ let chatWs = null;
231
+ let videoWs = null;
232
+ let reconnectDelay = 1000;
233
+
234
+ // ─── State ────────────────────────────────────────────────────────────────────
235
+ let frameCount = 0, lastFpsTime = Date.now();
236
+ let listening = false;
237
+ let recognition = null;
238
+
239
+ // ─── Chat WebSocket ───────────────────────────────────────────────────────────
240
+ function connectChat() {
241
+ chatWs = new WebSocket(`${WS_BASE}/ws/chat`);
242
+ chatWs.onopen = () => {
243
+ setConnected(true);
244
+ reconnectDelay = 1000;
245
+ pollState();
246
+ };
247
+ chatWs.onmessage = ({data}) => handleChatMessage(JSON.parse(data));
248
+ chatWs.onclose = () => {
249
+ setConnected(false);
250
+ reconnectDelay = Math.min(reconnectDelay * 1.5, 10000);
+ setTimeout(connectChat, reconnectDelay);
251
+ };
252
+ chatWs.onerror = () => chatWs.close();
253
+ }
254
+
255
+ function handleChatMessage(msg) {
256
+ switch(msg.type) {
257
+ case 'welcome': addMsg('agent', 'SemSorter AI', msg.text); break;
258
+ case 'user_message': addMsg('user', 'You', msg.text); break;
259
+ case 'agent_response': addMsg('agent','SemSorter AI', msg.text); break;
260
+ case 'scan_result': renderScanResult(msg.data); break;
261
+ case 'sort_result': renderSortResult(msg.data); break;
262
+ case 'state': renderState(msg.data); break;
263
+ case 'quota_warning': showQuotaWarning(msg.service, msg.message); break;
264
+ case 'system': addMsg('system', 'System', msg.text); break;
265
+ }
266
+ }
267
+
268
+ function sendWs(type, extra={}) {
269
+ if (!chatWs || chatWs.readyState !== 1) return;
270
+ chatWs.send(JSON.stringify({type, ...extra}));
271
+ }
272
+
273
+ function sendCommand() {
274
+ const input = document.getElementById('cmd-input');
275
+ const text = input.value.trim();
276
+ if (!text) return;
277
+ input.value = '';
278
+ sendWs('command', {text});
279
+ }
280
+
281
+ // ─── Video WebSocket ──────────────────────────────────────────────────────────
282
+ function connectVideo() {
283
+ videoWs = new WebSocket(`${WS_BASE}/ws/video`);
284
+ videoWs.onmessage = ({data}) => {
285
+ const {type, data: b64} = JSON.parse(data);
286
+ if (type === 'frame') {
287
+ const img = document.getElementById('sim-video');
288
+ img.src = `data:image/jpeg;base64,${b64}`;
289
+ document.getElementById('sim-overlay').classList.add('hidden');
290
+ frameCount++;
291
+ }
292
+ };
293
+ videoWs.onclose = () => setTimeout(connectVideo, 2000);
294
+ videoWs.onerror = () => videoWs.close();
295
+ }
296
+
297
+ // FPS counter
298
+ setInterval(() => {
299
+ const now = Date.now();
300
+ const fps = Math.round(frameCount / ((now - lastFpsTime) / 1000));
301
+ document.getElementById('fps-label').textContent = `${fps} fps`;
302
+ frameCount = 0; lastFpsTime = now;
303
+ }, 2000);
304
+
305
+ // ─── State polling ────────────────────────────────────────────────────────────
306
+ let stateTimer = null;
+ function pollState() {
307
+ clearTimeout(stateTimer); // avoid stacking poll loops across reconnects
+ fetch('/api/state').then(r => r.json()).then(renderState).catch(()=>{});
308
+ stateTimer = setTimeout(pollState, 3000);
309
310
+
311
+ function renderState(s) {
312
+ document.getElementById('stat-sorted').textContent = s.items_sorted ?? 0;
313
+ const armEl = document.getElementById('stat-arm');
314
+ armEl.textContent = s.arm_busy ? 'Busy' : 'Idle';
315
+ armEl.className = `stat-value ${s.arm_busy ? 'busy' : 'ok'}`;
316
+ document.getElementById('stat-time').textContent = `${s.time ?? 0} s`;
317
+ if (s.items) renderItems(s.items);
318
+ if (s.quota_exceeded) Object.entries(s.quota_exceeded).forEach(([svc, exceeded]) => {
319
+ if (exceeded) showQuotaWarning(svc, `⚠️ ${svc} quota exceeded — demo mode active`);
320
+ });
321
+ }
322
+
323
+ function renderItems(items) {
324
+ const list = document.getElementById('item-list');
325
+ list.innerHTML = items.map(i => {
326
+ const type = i.hazard_type || 'safe';
327
+ const cls = i.picked ? 'sorted' : type.toLowerCase();
328
+ const label = i.picked ? '✓ sorted' : type;
329
+ return `<div class="item-pill">
330
+ <span class="name">${i.name}</span>
331
+ <span class="badge ${cls}">${label}</span>
332
+ </div>`;
333
+ }).join('');
334
+ }
335
+
336
+ // ─── Scan / sort renderers ────────────────────────────────────────────────────
337
+ function renderScanResult(d) {
338
+ const demoNote = d.demo_mode ? ' [demo mode]' : '';
339
+ const lines = [`Found ${d.hazards_found} hazardous item(s)${demoNote}:`];
340
+ (d.items||[]).forEach(i =>
341
+ lines.push(` • ${i.item_name} (${i.type}) → ${i.bin_type} bin`));
342
+ addMsg('system', 'Scan Result', lines.join('\n'));
343
+ }
344
+
345
+ function renderSortResult(d) {
346
+ const demoNote = d.demo_mode ? ' [demo mode]' : '';
347
+ const lines = [
348
+ `Sorted ${d.items_sorted}/${d.items_matched} item(s)${demoNote}:`,
349
+ ...(d.details||[]).map(x => ` ${x.success ? '✅' : '❌'} ${x.item} → ${x.bin}`)
350
+ ];
351
+ addMsg('system', 'Sort Result', lines.join('\n'));
352
+ }
353
+
354
+ // ─── Quota warning ────────────────────────────────────────────────────────────
355
+ const _shownWarnings = new Set();
356
+ function showQuotaWarning(service, message) {
357
+ if (_shownWarnings.has(service)) return;
358
+ _shownWarnings.add(service);
359
+ const banner = document.getElementById('quota-banner');
360
+ banner.textContent = message;
361
+ banner.classList.add('show');
362
+ addMsg('warning', 'API Status', message);
363
+ }
364
+
365
+ // ─── Transcript helpers ───────────────────────────────────────────────────────
366
+ function addMsg(cls, role, text) {
367
+ const t = document.getElementById('transcript');
368
+ const div = document.createElement('div');
369
+ div.className = `msg ${cls}`;
370
+ div.innerHTML = `<div class="msg-role">${role}</div><div class="msg-text">${escHtml(text)}</div>`;
371
+ t.appendChild(div);
372
+ t.scrollTop = t.scrollHeight;
373
+ }
374
+ function escHtml(s){ return s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/\n/g,'<br>'); }
375
+
376
+ // ─── Connection status ────────────────────────────────────────────────────────
377
+ function setConnected(ok) {
378
+ const dot = document.getElementById('conn-dot');
379
+ const lbl = document.getElementById('conn-label');
380
+ dot.style.background = ok ? 'var(--success)' : 'var(--danger)';
381
+ dot.style.boxShadow = ok ? '0 0 8px var(--success)' : '0 0 8px var(--danger)';
382
+ dot.style.animation = ok ? 'pulse 2s infinite' : 'none';
383
+ lbl.textContent = ok ? 'Connected' : 'Reconnecting…';
384
+ }
385
+
386
+ // ─── Voice input (Web Speech API) ────────────────────────────────────────────
387
+ function toggleVoice() {
388
+ const SpeechRec = window.SpeechRecognition || window.webkitSpeechRecognition;
389
+ if (!SpeechRec) {
390
+ addMsg('system','System','Browser speech recognition not supported. Use the text input.');
391
+ return;
392
+ }
393
+ if (listening) { recognition.stop(); return; }
394
+
395
+ recognition = new SpeechRec();
396
+ recognition.lang = 'en-US';
397
+ recognition.interimResults = false;
398
+ recognition.onresult = e => {
399
+ const text = e.results[0][0].transcript;
400
+ document.getElementById('cmd-input').value = text;
401
+ document.getElementById('stt-hint').textContent =
402
+ `Heard: "${text}" — sending…`;
403
+ sendCommand();
404
+ };
405
+ recognition.onend = () => {
406
+ listening = false;
407
+ document.getElementById('voice-btn').classList.remove('listening');
408
+ document.getElementById('stt-hint').textContent = 'Using browser speech recognition';
409
+ };
410
+ recognition.onerror = e => {
411
+ addMsg('system','STT',`Speech recognition error: ${e.error}`);
412
+ listening = false;
413
+ document.getElementById('voice-btn').classList.remove('listening');
414
+ };
415
+ recognition.start();
416
+ listening = true;
417
+ document.getElementById('voice-btn').classList.add('listening');
418
+ document.getElementById('stt-hint').textContent = '🎙️ Listening… speak now';
419
+ }
420
+
421
+ // ─── Boot ─────────────────────────────────────────────────────────────────────
422
+ setConnected(false);
423
+ connectChat();
424
+ connectVideo();
425
+ </script>
426
+ </body>
427
+ </html>
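connectChat() above backs off geometrically on reconnect: each close multiplies the delay by 1.5 up to a 10 s cap, and a successful open resets it to 1 s. The resulting schedule, sketched in Python for clarity (the function name is illustrative):

```python
def backoff_schedule(attempts: int, initial: float = 1000.0,
                     factor: float = 1.5, cap: float = 10000.0) -> list:
    """Delays (ms) after each successive close, mirroring
    reconnectDelay = Math.min(reconnectDelay * 1.5, 10000)."""
    delays, delay = [], initial
    for _ in range(attempts):
        delay = min(delay * factor, cap)  # grow, then clamp at the cap
        delays.append(delay)
    return delays
```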
SemSorter/simulation/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # SemSorter Simulation Module
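controller.py (added below) declares BinType as a str-mixin Enum. One reason that pattern suits this project: members compare equal to their plain-string values and pass through json.dumps without a custom encoder, which keeps the /api/state payloads simple. A small illustration (the members match controller.py; the surrounding code is illustrative):

```python
import json
from enum import Enum

class BinType(str, Enum):
    # Same members as controller.py's BinType
    FLAMMABLE = "flammable"
    CHEMICAL = "chemical"
    OUTPUT = "output"

# str-mixin members compare equal to their values...
state = {"item": "spray_can", "bin": BinType.FLAMMABLE}
# ...and json.dumps emits them as bare strings, no custom encoder needed.
encoded = json.dumps(state)
```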
SemSorter/simulation/controller.py ADDED
@@ -0,0 +1,786 @@
1
+ """
2
+ SemSorter MuJoCo Simulation Controller
3
+
4
+ This module manages the Franka Panda robotic arm simulation for the SemSorter
5
+ project. It loads the Panda from mujoco_menagerie, adds conveyors, waste bins,
6
+ and hazardous items, then provides an async API for pick-and-place operations.
7
+
8
+ Usage:
9
+ python controller.py # Launch interactive viewer
10
+ python controller.py --render # Render a test frame to PNG
11
+ """
12
+
13
+ import asyncio
14
+ import json
15
+ import logging
16
+ import math
17
+ import os
18
+ import time
19
+ from dataclasses import dataclass, field
20
+ from enum import Enum
21
+ from pathlib import Path
22
+ from typing import Dict, List, Optional, Tuple
23
+
24
+ import mujoco
25
+ import mujoco.viewer
26
+ import numpy as np
27
+
28
+ logger = logging.getLogger(__name__)
29
+
30
+ # ─── Path configuration ─────────────────────────────────────────────────────
31
+ PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent # repo root (one level above SemSorter/)
32
+ MENAGERIE_DIR = PROJECT_ROOT / "mujoco_menagerie"
33
+ PANDA_SCENE = MENAGERIE_DIR / "franka_emika_panda" / "scene.xml"
34
+
35
+
36
+ # ─── Data types ──────────────────────────────────────────────────────────────
37
+ class BinType(str, Enum):
38
+ FLAMMABLE = "flammable"
39
+ CHEMICAL = "chemical"
40
+ OUTPUT = "output" # safe items go to output conveyor
41
+
42
+
43
+ @dataclass
44
+ class ItemInfo:
45
+ """Metadata for a conveyor item."""
46
+ name: str
47
+ body_id: int
48
+ geom_id: int
49
+ is_hazardous: bool
50
+ hazard_type: Optional[BinType] = None # which bin it should go to
51
+ picked: bool = False
52
+
53
+
54
+ @dataclass
55
+ class SimState:
56
+ """Observable simulation state for the frontend."""
57
+ time: float = 0.0
58
+ ee_pos: Tuple[float, float, float] = (0, 0, 0)
59
+ gripper_open: bool = True
60
+ items: List[Dict] = field(default_factory=list)
61
+ arm_busy: bool = False
62
+ items_sorted: int = 0
63
+
64
+
65
+ # ─── Bin positions (world coordinates) ───────────────────────────────────────
66
+ BIN_POSITIONS = {
67
+ BinType.FLAMMABLE: np.array([-0.25, -0.40, 0.35]), # Above the red bin
68
+ BinType.CHEMICAL: np.array([0.25, -0.40, 0.35]), # Above the yellow bin
69
+ BinType.OUTPUT: np.array([0.40, 0.0, 0.40]), # Output conveyor
70
+ }
71
+
72
+ # ─── Panda joint configuration ──────────────────────────────────────────────
73
+ # Actuator indices (from panda.xml):
74
+ # 0-6: arm joints (actuator1-7)
75
+ # 7: gripper (actuator8, ctrl 0=closed, 255=fully open)
76
+ GRIPPER_ACTUATOR_ID = 7
77
+ GRIPPER_OPEN = 255.0
78
+ GRIPPER_CLOSED = 0.0
79
+ NUM_ARM_JOINTS = 7
80
+ ENV_CONTACT_TYPE = 2 # Keep environment/item contacts separate from robot links.
81
+
82
+
83
+ class SemSorterSimulation:
84
+ """
85
+ MuJoCo simulation controller for the SemSorter pick-and-place task.
86
+
87
+ Loads the Franka Panda from menagerie, adds the warehouse environment
88
+ (conveyors, bins, items), and provides an async API for robot control.
89
+ """
90
+
91
+ def __init__(self):
92
+ self.model: Optional[mujoco.MjModel] = None
93
+ self.data: Optional[mujoco.MjData] = None
94
+ self.renderer: Optional[mujoco.Renderer] = None
95
+ self.items: Dict[str, ItemInfo] = {}
96
+ self._arm_busy = False
97
+ self._items_sorted = 0
98
+ self._running = False
99
+
100
+ # ─── Scene loading ───────────────────────────────────────────────────
101
+
102
+ def load_scene(self) -> None:
103
+ """Load the Panda scene from menagerie and add SemSorter objects."""
104
+ # Load base Panda scene
105
+ logger.info(f"Loading Panda from: {PANDA_SCENE}")
106
+ spec = mujoco.MjSpec.from_file(str(PANDA_SCENE))
107
+
108
+ # Modify the model name
109
+ spec.modelname = "semsorter"
110
+
111
+ # Set offscreen framebuffer size for rendering
112
+ spec.visual.global_.offwidth = 1920
113
+ spec.visual.global_.offheight = 1080
114
+
115
+ # ─── Add additional lights ───────────────────────────────────────
116
+ world = spec.worldbody
117
+ light = world.add_light()
118
+ light.pos = [0, -1, 2]
119
+ light.dir = [0, 0.5, -0.8]
120
+ light.diffuse = [0.4, 0.4, 0.4]
121
+
122
+ light2 = world.add_light()
123
+ light2.pos = [-1, -1, 2]
124
+ light2.dir = [0.3, 0.3, -0.8]
125
+ light2.diffuse = [0.3, 0.3, 0.3]
126
+
127
+ # ─── Add cameras ────────────────────────────────────────────────
128
+ cam_overview = world.add_camera()
129
+ cam_overview.name = "overview"
130
+ cam_overview.pos = [0, -1.4, 1.3]
131
+ cam_overview.quat = [0.92, 0.38, 0, 0] # Look slightly down
132
+ cam_overview.fovy = 50
133
+
134
+ cam_top = world.add_camera()
135
+ cam_top.name = "topdown"
136
+ cam_top.pos = [0, 0, 2.0]
137
+ cam_top.quat = [0.0, 0.0, 0.0, 1.0] # Look straight down
138
+ cam_top.fovy = 60
139
+
140
+ cam_side = world.add_camera()
141
+ cam_side.name = "side"
142
+ cam_side.pos = [1.5, 0, 0.8]
143
+ cam_side.quat = [0.65, 0.27, 0.27, 0.65] # Side view
144
+ cam_side.fovy = 45
145
+
146
+ # ─── Add conveyors ──────────────────────────────────────────────
147
+ self._add_conveyor(spec, "input", pos=[-0.40, 0, 0])
148
+ self._add_conveyor(spec, "output", pos=[0.40, 0, 0])
149
+
150
+ # ─── Add waste bins ─────────────────────────────────────────────
151
+ self._add_bin(spec, "flammable", pos=[-0.25, -0.40, 0],
152
+ color=[0.85, 0.15, 0.1, 0.9])
153
+ self._add_bin(spec, "chemical", pos=[0.25, -0.40, 0],
154
+ color=[0.95, 0.75, 0.1, 0.9])
155
+
156
+ # ─── Add hazardous items on input conveyor ──────────────────────
157
+ items_spec = [
158
+ ("item_flammable_1", [-0.50, 0.0, 0.40], "cylinder", [0.025, 0.03],
159
+ [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
160
+ ("item_chemical_1", [-0.40, 0.05, 0.40], "box", [0.025, 0.025, 0.025],
161
+ [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
162
+ ("item_chemical_2", [-0.30, -0.03, 0.40], "sphere", [0.025],
163
+ [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
164
+ ("item_safe_1", [-0.35, -0.05, 0.40], "box", [0.03, 0.025, 0.02],
165
+ [0.6, 0.6, 0.6, 1], False, BinType.OUTPUT),
166
+ ("item_safe_2", [-0.55, 0.04, 0.40], "cylinder", [0.022, 0.025],
167
+ [0.9, 0.9, 0.9, 1], False, BinType.OUTPUT),
168
+ ("item_flammable_2", [-0.45, 0.02, 0.40], "box", [0.022, 0.022, 0.022],
169
+ [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
170
+ ]
171
+
172
+ for name, pos, shape, size, rgba, is_haz, haz_type in items_spec:
173
+ self._add_item(spec, name, pos, shape, size, rgba)
174
+ self.items[name] = ItemInfo(
175
+ name=name, body_id=-1, geom_id=-1,
176
+ is_hazardous=is_haz, hazard_type=haz_type if is_haz else None,
177
+ )
178
+
179
+ # Store desired spawn positions for post-keyframe initialization
180
+ self._item_spawn_positions = {
181
+ name: pos for name, pos, *_ in items_spec
182
+ }
183
+
184
+ # ─── Compile the model ──────────────────────────────────────────
185
+ self.model = spec.compile()
186
+ self.data = mujoco.MjData(self.model)
187
+
188
+ # Keep floor contacts in the environment collision group (not robot group).
189
+ floor_geom_id = mujoco.mj_name2id(
190
+ self.model, mujoco.mjtObj.mjOBJ_GEOM, "floor")
191
+ if floor_geom_id >= 0:
192
+ self.model.geom_contype[floor_geom_id] = ENV_CONTACT_TYPE
193
+ self.model.geom_conaffinity[floor_geom_id] = ENV_CONTACT_TYPE
194
+
195
+ # Resolve body/geom IDs for items
196
+ for name in self.items:
197
+ self.items[name].body_id = mujoco.mj_name2id(
198
+ self.model, mujoco.mjtObj.mjOBJ_BODY, name)
199
+ geom_name = f"{name}_geom"
200
+ self.items[name].geom_id = mujoco.mj_name2id(
201
+ self.model, mujoco.mjtObj.mjOBJ_GEOM, geom_name)
202
+
203
+ # ─── Reset to home pose ─────────────────────────────────────────
204
+ key_id = mujoco.mj_name2id(
205
+ self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
206
+ if key_id >= 0:
207
+ mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)
208
+
209
+ # ─── Set item initial positions (keyframe only has arm joints) ──
210
+ for name, pos in self._item_spawn_positions.items():
211
+ jnt_name = f"{name}_jnt"
212
+ jnt_id = mujoco.mj_name2id(
213
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
214
+ if jnt_id >= 0:
215
+ qadr = self.model.jnt_qposadr[jnt_id]
216
+ # freejoint qpos: [x, y, z, qw, qx, qy, qz]
217
+ self.data.qpos[qadr:qadr+3] = pos
218
+ self.data.qpos[qadr+3:qadr+7] = [1, 0, 0, 0] # identity quat
219
+
220
+ mujoco.mj_forward(self.model, self.data)
221
+
222
+ logger.info(f"Scene compiled: {self.model.nbody} bodies, "
223
+ f"{self.model.njnt} joints, {self.model.nu} actuators")
224
+ logger.info(f"Items registered: {list(self.items.keys())}")
225
+
226
+ def _add_conveyor(self, spec: mujoco.MjSpec, name: str, pos: list) -> None:
227
+ """Add a conveyor belt with frame and legs."""
228
+ world = spec.worldbody
229
+ body = world.add_body()
230
+ body.name = f"conveyor_{name}"
231
+ body.pos = pos
232
+
233
+ # Belt surface
234
+ belt = body.add_geom()
235
+ belt.name = f"belt_{name}"
236
+ belt.type = mujoco.mjtGeom.mjGEOM_BOX
237
+ belt.size = [0.35, 0.12, 0.005]
238
+ belt.pos = [0, 0, 0.35]
239
+ belt.rgba = [0.15, 0.15, 0.15, 1]
240
+ belt.friction = [0.8, 0.005, 0.0001]
241
+ belt.contype = ENV_CONTACT_TYPE
242
+ belt.conaffinity = ENV_CONTACT_TYPE
243
+
244
+ # Side rails
245
+ for side_name, y in [("L", 0.125), ("R", -0.125)]:
246
+ rail = body.add_geom()
247
+ rail.name = f"rail_{name}_{side_name}"
248
+ rail.type = mujoco.mjtGeom.mjGEOM_BOX
249
+ rail.size = [0.35, 0.005, 0.02]
250
+ rail.pos = [0, y, 0.37]
251
+ rail.rgba = [0.4, 0.4, 0.45, 1]
252
+ rail.contype = ENV_CONTACT_TYPE
253
+ rail.conaffinity = ENV_CONTACT_TYPE
254
+
255
+ # Legs
256
+ for lx, ly in [(-0.3, 0.1), (-0.3, -0.1), (0.3, 0.1), (0.3, -0.1)]:
257
+ leg = body.add_geom()
258
+ leg.type = mujoco.mjtGeom.mjGEOM_CYLINDER
259
+ leg.size = [0.015, 0.175, 0]
260
+ leg.pos = [lx, ly, 0.175]
261
+ leg.rgba = [0.4, 0.4, 0.45, 1]
262
+ leg.contype = ENV_CONTACT_TYPE
263
+ leg.conaffinity = ENV_CONTACT_TYPE
264
+
265
+ def _add_bin(self, spec: mujoco.MjSpec, name: str, pos: list,
266
+ color: list) -> None:
267
+ """Add an open-top waste bin."""
268
+ world = spec.worldbody
269
+ body = world.add_body()
270
+ body.name = f"bin_{name}"
271
+ body.pos = pos
272
+
273
+ # Walls
274
+ wall_specs = [
275
+ (f"bin_{name}_back", [0, -0.095, 0.12], [0.1, 0.005, 0.12]),
276
+ (f"bin_{name}_front", [0, 0.095, 0.12], [0.1, 0.005, 0.12]),
277
+ (f"bin_{name}_left", [-0.095, 0, 0.12], [0.005, 0.1, 0.12]),
278
+ (f"bin_{name}_right", [0.095, 0, 0.12], [0.005, 0.1, 0.12]),
279
+ ]
280
+ for wname, wpos, wsize in wall_specs:
281
+ wall = body.add_geom()
282
+ wall.name = wname
283
+ wall.type = mujoco.mjtGeom.mjGEOM_BOX
284
+ wall.size = wsize
285
+ wall.pos = wpos
286
+ wall.rgba = color
287
+ wall.contype = ENV_CONTACT_TYPE
288
+ wall.conaffinity = ENV_CONTACT_TYPE
289
+
290
+ # Bottom
291
+ bottom = body.add_geom()
292
+ bottom.name = f"bin_{name}_bottom"
293
+ bottom.type = mujoco.mjtGeom.mjGEOM_BOX
294
+ bottom.size = [0.1, 0.1, 0.005]
295
+ bottom.pos = [0, 0, 0.005]
296
+ bottom.rgba = [0.1, 0.1, 0.1, 1]
297
+ bottom.contype = ENV_CONTACT_TYPE
298
+ bottom.conaffinity = ENV_CONTACT_TYPE
299
+
300
+ def _add_item(self, spec: mujoco.MjSpec, name: str, pos: list,
301
+ shape: str, size: list, rgba: list) -> None:
302
+ """Add a free-jointed item to the world."""
303
+ world = spec.worldbody
304
+ body = world.add_body()
305
+ body.name = name
306
+ body.pos = pos
307
+
308
+ # Free joint
309
+ jnt = body.add_freejoint()
310
+ jnt.name = f"{name}_jnt"
311
+
312
+ # Geom
313
+ geom = body.add_geom()
314
+ geom.name = f"{name}_geom"
315
+ shape_map = {
316
+ "box": mujoco.mjtGeom.mjGEOM_BOX,
317
+ "sphere": mujoco.mjtGeom.mjGEOM_SPHERE,
318
+ "cylinder": mujoco.mjtGeom.mjGEOM_CYLINDER,
319
+ }
320
+ geom.type = shape_map[shape]
321
+ geom.size = size + [0] * (3 - len(size)) # Pad to 3 elements
322
+ geom.rgba = rgba
323
+ geom.mass = 0.05
324
+ geom.friction = [1.0, 0.005, 0.0001]
325
+ geom.priority = 1
326
+ geom.contype = ENV_CONTACT_TYPE
327
+ geom.conaffinity = ENV_CONTACT_TYPE
328
+
329
+ # ─── End-effector helpers ────────────────────────────────────────────
330
+
331
+ def get_ee_pos(self) -> np.ndarray:
332
+ """Get current end-effector (hand) position in world coords."""
333
+ hand_id = mujoco.mj_name2id(
334
+ self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
335
+ return self.data.xpos[hand_id].copy()
336
+
337
+ def get_ee_site_pos(self) -> np.ndarray:
338
+ """Alias for get_ee_pos(), kept for API compatibility."""
339
+ return self.get_ee_pos()
340
+
341
+ def get_item_pos(self, item_name: str) -> Optional[np.ndarray]:
342
+ """Get position of an item by name."""
343
+ info = self.items.get(item_name)
344
+ if info and info.body_id >= 0:
345
+ return self.data.xpos[info.body_id].copy()
346
+ return None
347
+
348
+ def _set_item_pose(self, item_name: str, pos: np.ndarray,
349
+ quat: Tuple[float, float, float, float] = (1, 0, 0, 0)) -> bool:
350
+ """Directly place an item free-joint at a world pose."""
351
+ jnt_name = f"{item_name}_jnt"
352
+ jnt_id = mujoco.mj_name2id(
353
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
354
+ if jnt_id < 0:
355
+ return False
356
+ qadr = self.model.jnt_qposadr[jnt_id]
357
+ self.data.qpos[qadr:qadr+3] = pos
358
+ self.data.qpos[qadr+3:qadr+7] = quat
359
+ dadr = self.model.jnt_dofadr[jnt_id]
360
+ self.data.qvel[dadr:dadr+6] = 0.0
361
+ return True
362
+
363
+ # ─── IK (Solver-based) ────────────────────────────────────────────
364
+
365
+ def reset_arm_neutral(self) -> None:
366
+ """
367
+ Move arm to a neutral upright pose where IK works well in all directions.
368
+ """
369
+ neutral_qpos = [0.0, -0.3, 0.0, -2.0, 0.0, 1.8, 0.0]
370
+ # Set qpos directly for arm joints (first 7)
371
+ self.data.qpos[:NUM_ARM_JOINTS] = neutral_qpos
372
+ self.data.ctrl[:NUM_ARM_JOINTS] = neutral_qpos
373
+ mujoco.mj_forward(self.model, self.data)
374
+
375
+ def solve_ik(self, target_pos: np.ndarray,
376
+ target_quat: Optional[np.ndarray] = None,
377
+ max_iter: int = 300,
378
+ tolerance: float = 0.015,
379
+ step_size: float = 0.5,
380
+ damping: float = 0.05) -> Optional[np.ndarray]:
381
+ """
382
+ Pure kinematic IK solver — iterates Jacobian on qpos WITHOUT physics.
383
+ Returns joint angles (length 7) or None if failed.
384
+ """
385
+ hand_id = mujoco.mj_name2id(
386
+ self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
387
+
388
+ # Save original qpos to restore later (critical for not corrupting physics)
389
+ orig_qpos = self.data.qpos.copy()
390
+
391
+ # Work on a copy of qpos
392
+ qpos_arm = orig_qpos[:NUM_ARM_JOINTS].copy()
393
+
394
+ try:
395
+ for _ in range(max_iter):
396
+ # Temporarily set qpos, run forward kinematics
397
+ self.data.qpos[:NUM_ARM_JOINTS] = qpos_arm
398
+ mujoco.mj_forward(self.model, self.data)
399
+
400
+ current_pos = self.data.xpos[hand_id].copy()
401
+ err_pos = target_pos - current_pos
402
+
403
+ # Position Jacobian
404
+ jacp = np.zeros((3, self.model.nv))
405
+ mujoco.mj_jacBody(self.model, self.data, jacp, None, hand_id)
406
+ J = jacp[:, :NUM_ARM_JOINTS]
407
+ error = err_pos
408
+
409
+ if target_quat is not None:
410
+ current_quat = self.data.xquat[hand_id].copy()
411
+ err_rot = np.zeros(3)
412
+ mujoco.mju_subQuat(err_rot, target_quat, current_quat)
413
+
414
+ # Rotation Jacobian
415
+ jacr = np.zeros((3, self.model.nv))
416
+ mujoco.mj_jacBody(self.model, self.data, None, jacr, hand_id)
417
+ Jr = jacr[:, :NUM_ARM_JOINTS]
418
+
419
+ # Scale rotational error so position takes priority
420
+ J = np.vstack([J, Jr * 0.5])
421
+ error = np.concatenate([error, err_rot * 0.5])
422
+
423
+ if np.linalg.norm(error) < tolerance:
424
+ return qpos_arm.copy()
425
+
426
+ # Damped least squares
427
+ JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
428
+ dq = J.T @ np.linalg.solve(JJT, error)
429
+
430
+ # Update with step size and clamping
431
+ dq = np.clip(dq * step_size, -0.2, 0.2)
432
+ qpos_arm += dq
433
+
434
+ # Clamp to joint limits
435
+ for j in range(NUM_ARM_JOINTS):
436
+ jnt_id = j # arm joints are first 7
437
+ lo = self.model.jnt_range[jnt_id, 0]
438
+ hi = self.model.jnt_range[jnt_id, 1]
439
+ if lo < hi:
440
+ qpos_arm[j] = np.clip(qpos_arm[j], lo * 0.95, hi * 0.95)
441
+
442
+ return None # Did not converge
443
+
444
+ finally:
445
+ # Always restore original qpos and run forward to fix physics state
446
+ self.data.qpos[:] = orig_qpos
447
+ mujoco.mj_forward(self.model, self.data)
448
+
449
+ def move_to_position(self, target_pos: np.ndarray,
450
+ move_steps: int = 400,
451
+ settle_steps: int = 100,
452
+ position_tolerance: float = 0.05,
453
+ carry_item: Optional[str] = None,
454
+ carry_offset: Optional[np.ndarray] = None) -> bool:
455
+ """
456
+ Move end-effector to target position.
457
+ 1. Solve IK kinematically
458
+ 2. Interpolate joint targets smoothly (ease-in/ease-out)
459
+ 3. Step physics to let arm move
460
+ Returns True if IK solution found.
461
+ """
462
+ solution = self.solve_ik(target_pos)
463
+ if solution is None:
464
+ logger.warning(f"IK failed for target {target_pos}")
465
+ return False
466
+
467
+ current_ctrl = self.data.ctrl[:NUM_ARM_JOINTS].copy()
468
+
469
+ if carry_item is not None and carry_offset is None:
470
+ carry_offset = np.array([0.0, 0.0, -0.06])
471
+
472
+ # Smooth interpolation to target
473
+ for i in range(move_steps):
474
+ alpha = (i + 1) / move_steps
475
+ t = alpha * alpha * (3 - 2 * alpha) # Smoothstep
476
+ self.data.ctrl[:NUM_ARM_JOINTS] = current_ctrl * (1 - t) + solution * t
477
+ mujoco.mj_step(self.model, self.data)
478
+ if carry_item is not None:
479
+ ee = self.get_ee_pos()
480
+ self._set_item_pose(carry_item, ee + carry_offset)
481
+
482
+ # Settle
483
+ for _ in range(settle_steps):
484
+ mujoco.mj_step(self.model, self.data)
485
+ if carry_item is not None:
486
+ ee = self.get_ee_pos()
487
+ self._set_item_pose(carry_item, ee + carry_offset)
488
+
489
+ if carry_item is not None:
490
+ ee = self.get_ee_pos()
491
+ self._set_item_pose(carry_item, ee + carry_offset)
492
+ mujoco.mj_forward(self.model, self.data)
493
+
494
+ final_ee = self.get_ee_pos()
495
+ err = np.linalg.norm(target_pos - final_ee)
496
+ if err > position_tolerance:
497
+ logger.warning(
498
+ f"Move failed: target {target_pos}, reached {final_ee}, err={err:.4f}")
499
+ return False
500
+ return True
501
+
502
+ def set_gripper(self, open_gripper: bool) -> None:
503
+ """Open or close the gripper."""
504
+ self.data.ctrl[GRIPPER_ACTUATOR_ID] = (
505
+ GRIPPER_OPEN if open_gripper else GRIPPER_CLOSED
506
+ )
507
+
508
+ def step(self, n: int = 1) -> None:
509
+ """Advance the simulation by n steps."""
510
+ for _ in range(n):
511
+ mujoco.mj_step(self.model, self.data)
512
+
513
+ # ─── High-level pick-place operations ────────────────────────────────
514
+
515
+ def _stabilize_unpicked_items(self, exclude: str = "") -> None:
516
+ """Zero out velocities of all unpicked items to prevent physics drift.
517
+
518
+ Called before/after each pick-and-place so that the arm doesn't
519
+ knock neighboring items off the conveyor.
520
+ """
521
+ for name, info in self.items.items():
522
+ if name == exclude or info.picked:
523
+ continue
524
+ jnt_name = f"{name}_jnt"
525
+ jnt_id = mujoco.mj_name2id(
526
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
527
+ if jnt_id < 0:
528
+ continue
529
+ dadr = self.model.jnt_dofadr[jnt_id]
530
+ self.data.qvel[dadr:dadr + 6] = 0.0
531
+ mujoco.mj_forward(self.model, self.data)
532
+
533
+ def pick_and_place(self, item_name: str, target_bin: BinType) -> bool:
534
+ """
535
+ Execute a full pick-and-place sequence:
536
+ 1. Open gripper
537
+ 2. Move above item
538
+ 3. Move down to item
539
+ 4. Close gripper
540
+ 5. Move up
541
+ 6. Move above target bin
542
+ 7. Open gripper (drop)
543
+ 8. Return to neutral
544
+ """
545
+ info = self.items.get(item_name)
546
+ if not info or info.picked:
547
+ logger.warning(f"Item {item_name} not found or already picked")
548
+ return False
549
+
550
+ # Freeze all other items in place before we move the arm
551
+ self._stabilize_unpicked_items(exclude=item_name)
552
+
553
+ self._arm_busy = True
554
+ try:
555
+ item_pos = self.get_item_pos(item_name)
556
+ if item_pos is None:
557
+ logger.warning(f"Cannot get position for {item_name}")
558
+ return False
559
+
560
+ # Sanity check: item must be within reachable workspace
561
+ if (abs(item_pos[0]) > 1.0 or abs(item_pos[1]) > 1.0
562
+ or item_pos[2] < 0.0 or item_pos[2] > 1.0):
563
+ logger.warning(
564
+ f"Item {item_name} at {item_pos} is outside reachable "
565
+ f"workspace — it may have been displaced by physics")
566
+ return False
567
+
568
+ logger.info(f"Picking {item_name} at {item_pos} -> {target_bin.value}")
569
+
570
+ # 1. Open gripper
571
+ self.set_gripper(True)
572
+ self.step(50)
573
+
574
+ # 1.5 Move high to ensure we clear the scene
575
+ safe_high = np.array([0.0, 0.0, 0.65])
576
+ if not self.move_to_position(safe_high, move_steps=200, settle_steps=50):
577
+ return False
578
+
579
+ # Re-read item position after safe-high move (physics may shift items)
580
+ item_pos = self.get_item_pos(item_name)
581
+ if item_pos is None or item_pos[2] < 0.0 or item_pos[2] > 1.0:
582
+ logger.warning(
583
+ f"Item {item_name} moved to invalid position {item_pos} "
584
+ f"during arm movement")
585
+ return False
586
+
587
+ # 2. Move above item (approach from above)
588
+ approach_pos = item_pos.copy()
589
+ approach_pos[2] += 0.10
590
+ if not self.move_to_position(approach_pos):
591
+ logger.warning(f"Failed to reach approach position for {item_name}")
592
+ return False
593
+
594
+ # 3. Move down to grasp
595
+ grasp_pos = item_pos.copy()
596
+ grasp_pos[2] += 0.03
597
+ if not self.move_to_position(grasp_pos):
598
+ logger.warning(f"Failed to reach grasp position for {item_name}")
599
+ return False
600
+
601
+ # 4. Close gripper
602
+ self.set_gripper(False)
603
+ self.step(120) # allow gripper to close
604
+
605
+ # Verify we are close enough to claim a grasp.
606
+ ee_pos = self.get_ee_pos()
607
+ item_now = self.get_item_pos(item_name)
608
+ if item_now is None or np.linalg.norm(ee_pos - item_now) > 0.12:
609
+ logger.warning(
610
+ f"Grasp verification failed for {item_name}: "
611
+ f"ee={ee_pos}, item={item_now}")
612
+ return False
613
+
614
+ # Kinematic carry of the item for deterministic phase testing.
615
+ carry_offset = np.array([0.0, 0.0, -0.06])
616
+ self._set_item_pose(item_name, ee_pos + carry_offset)
617
+ mujoco.mj_forward(self.model, self.data)
618
+
619
+ # 5. Lift up while carrying.
620
+ lift_pos = grasp_pos.copy()
621
+ lift_pos[2] += 0.22
622
+ if not self.move_to_position(
623
+ lift_pos, carry_item=item_name, carry_offset=carry_offset):
624
+ return False
625
+
626
+ # 6. Move above target bin while carrying.
627
+ bin_pos = BIN_POSITIONS[target_bin].copy()
628
+ if not self.move_to_position(
629
+ bin_pos, carry_item=item_name, carry_offset=carry_offset):
630
+ return False
631
+
632
+ # 7. Place and release.
633
+ drop_pos = bin_pos.copy()
634
+ drop_pos[2] -= 0.12
635
+ self._set_item_pose(item_name, drop_pos)
636
+ mujoco.mj_forward(self.model, self.data)
637
+ self.set_gripper(True)
638
+ self.step(100)
639
+
640
+ # Mark item as sorted only after successful place.
641
+ info.picked = True
642
+ self._items_sorted += 1
643
+
644
+ # 8. Return to neutral.
645
+ neutral = np.array([0.0, 0.0, 0.6])
646
+ self.move_to_position(neutral)
647
+
648
+ # Stabilize remaining items after arm movement
649
+ self._stabilize_unpicked_items()
650
+
651
+ logger.info(f"Successfully placed {item_name} in {target_bin.value}")
652
+ return True
653
+ finally:
654
+ self._arm_busy = False
655
+
656
+ # ─── State snapshot ──────────────────────────────────────────────────
657
+
658
+ def get_state(self) -> SimState:
659
+ """Get current simulation state for the frontend."""
660
+ ee = self.get_ee_pos()
661
+ items_info = []
662
+ for name, info in self.items.items():
663
+ pos = self.get_item_pos(name)
664
+ items_info.append({
665
+ "name": name,
666
+ "pos": pos.tolist() if pos is not None else [0, 0, 0],
667
+ "is_hazardous": info.is_hazardous,
668
+ "hazard_type": info.hazard_type.value if info.hazard_type else None,
669
+ "picked": info.picked,
670
+ })
671
+
672
+ return SimState(
673
+ time=self.data.time,
674
+ ee_pos=tuple(ee),
675
+ gripper_open=self.data.ctrl[GRIPPER_ACTUATOR_ID] > 100,
676
+ items=items_info,
677
+ arm_busy=self._arm_busy,
678
+ items_sorted=self._items_sorted,
679
+ )
680
+
681
+ # ─── Rendering ───────────────────────────────────────────────────────
682
+
683
+ def render_frame(self, width: int = 1280, height: int = 720,
684
+ camera: str = "overview") -> np.ndarray:
685
+ """Render a frame from the specified camera. Returns RGB array."""
686
+ if self.renderer is None:
687
+ self.renderer = mujoco.Renderer(self.model, height, width)
688
+
689
+ cam_id = mujoco.mj_name2id(
690
+ self.model, mujoco.mjtObj.mjOBJ_CAMERA, camera)
691
+
692
+ self.renderer.update_scene(self.data, camera=cam_id)
693
+ return self.renderer.render()
694
+
695
+ def save_frame(self, path: str, camera: str = "overview") -> None:
696
+ """Render a frame and save as PNG."""
697
+ from PIL import Image
698
+ frame = self.render_frame(camera=camera)
699
+ Image.fromarray(frame).save(path)
700
+ logger.info(f"Frame saved to {path}")
701
+
702
+ def close(self) -> None:
703
+ """Release renderer resources explicitly."""
704
+ if self.renderer is not None:
705
+ try:
706
+ self.renderer.close()
707
+ except Exception:
708
+ pass # EGL cleanup errors are harmless at shutdown
709
+ self.renderer = None
710
+
711
+ # ─── Interactive viewer ──────────────────────────────────────────────
712
+
713
+ def launch_viewer(self) -> None:
714
+ """Launch the interactive MuJoCo viewer."""
715
+ key_id = mujoco.mj_name2id(
716
+ self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
717
+ if key_id >= 0:
718
+ mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)
719
+ mujoco.viewer.launch(self.model, self.data)
720
+
721
+ # ─── Async interface for agent integration ───────────────────────────
722
+
723
+ async def async_pick_and_place(self, item_name: str,
724
+ target_bin: BinType) -> Dict:
725
+ """Async wrapper around pick_and_place for agent integration."""
726
+ loop = asyncio.get_running_loop()
727
+ success = await loop.run_in_executor(
728
+ None, self.pick_and_place, item_name, target_bin
729
+ )
730
+ return {
731
+ "success": success,
732
+ "item": item_name,
733
+ "target_bin": target_bin.value,
734
+ "items_sorted": self._items_sorted,
735
+ }
736
+
737
+ async def async_get_state(self) -> Dict:
738
+ """Async state snapshot."""
739
+ state = self.get_state()
740
+ return {
741
+ "time": state.time,
742
+ "ee_pos": list(state.ee_pos),
743
+ "gripper_open": state.gripper_open,
744
+ "items": state.items,
745
+ "arm_busy": state.arm_busy,
746
+ "items_sorted": state.items_sorted,
747
+ }
748
+
749
+
750
+ # ─── CLI entry point ────────────────────────────────────────────────────────
751
+
752
+ def main():
753
+ import argparse
754
+
755
+ parser = argparse.ArgumentParser(description="SemSorter Simulation Controller")
756
+ parser.add_argument("--render", action="store_true",
757
+ help="Render a test frame and save as PNG")
758
+ parser.add_argument("--test-pick", action="store_true",
759
+ help="Test pick-and-place of first hazardous item")
760
+ parser.add_argument("--output", default="test_frame.png",
761
+ help="Output path for rendered frame")
762
+ args = parser.parse_args()
763
+
764
+ logging.basicConfig(level=logging.INFO)
765
+
766
+ sim = SemSorterSimulation()
767
+ sim.load_scene()
768
+
769
+ try:
770
+ if args.render:
771
+ sim.save_frame(args.output)
772
+ print(f"Frame saved to {args.output}")
773
+ elif args.test_pick:
774
+ print("Testing pick-and-place...")
775
+ sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)
776
+ sim.save_frame("after_pick.png")
777
+ print(f"Done! Items sorted: {sim._items_sorted}")
778
+ else:
779
+ print("Launching interactive viewer...")
780
+ sim.launch_viewer()
781
+ finally:
782
+ sim.close()
783
+
784
+
785
+ if __name__ == "__main__":
786
+ main()
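The `solve_ik` method above is a standard damped least-squares (Levenberg–Marquardt style) solver. Stripped of MuJoCo, the same update rule can be sketched on a toy two-link planar arm; the link lengths, gains, and function names here are illustrative, not the Panda's:

```python
import numpy as np

L1, L2 = 0.4, 0.3  # hypothetical link lengths

def fk(q):
    """Forward kinematics: end-effector (x, y) of a 2-link planar arm."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """Analytic 2x2 position Jacobian of fk."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

def solve_ik_dls(target, q0, damping=0.05, step_size=0.5,
                 tolerance=1e-4, max_iter=200):
    """Damped least squares: dq = J^T (J J^T + lambda^2 I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        err = target - fk(q)
        if np.linalg.norm(err) < tolerance:
            return q
        J = jacobian(q)
        JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
        dq = J.T @ np.linalg.solve(JJT, err)
        q += np.clip(dq * step_size, -0.2, 0.2)  # same clamp as the controller
    return None  # did not converge

q = solve_ik_dls(np.array([0.5, 0.2]), [0.3, 0.3])
```

The `damping**2 * np.eye(...)` term keeps the inverse well-conditioned near singular poses, which is why this form is preferred over a plain pseudo-inverse when the arm approaches workspace limits.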
SemSorter/simulation/interactive_test.py ADDED
@@ -0,0 +1,70 @@
1
+ """
2
+ Interactive viewer for SemSorter simulation.
3
+ Runs pick-and-place in real time with the MuJoCo viewer.
4
+
5
+ Usage:
6
+ python3 interactive_test.py
7
+ """
8
+ import os
9
+ import time
10
+ import mujoco
11
+ import mujoco.viewer
12
+ try:
13
+ from .controller import SemSorterSimulation, BinType
14
+ except ImportError:
15
+ from controller import SemSorterSimulation, BinType
16
+
17
+ # How often to sync the viewer (every N physics steps)
18
+ VIEWER_SYNC_INTERVAL = 10
19
+
20
+ def main():
21
+ print("Initializing simulation...")
22
+ # NOTE: Do NOT set MUJOCO_GL=egl when using the interactive viewer
23
+ if 'MUJOCO_GL' in os.environ:
24
+ del os.environ['MUJOCO_GL']
25
+
26
+ sim = SemSorterSimulation()
27
+ sim.load_scene()
28
+
29
+ print("Launching interactive viewer. Watch the arm move!")
30
+
31
+ with mujoco.viewer.launch_passive(sim.model, sim.data) as viewer:
32
+
33
+ # Patch mj_step to sync viewer every N steps (much faster than every step)
34
+ original_mj_step = mujoco.mj_step
35
+ step_counter = [0]
36
+
37
+ def patched_mj_step(model, data):
38
+ original_mj_step(model, data)
39
+ step_counter[0] += 1
40
+ if step_counter[0] % VIEWER_SYNC_INTERVAL == 0:
41
+ viewer.sync()
42
+ # Sleep only on sync frames to maintain ~real-time playback
43
+ time.sleep(model.opt.timestep * VIEWER_SYNC_INTERVAL)
44
+
45
+ mujoco.mj_step = patched_mj_step
46
+
47
+ try:
48
+ # Let the scene settle
49
+ sim.step(200)
50
+
51
+ time.sleep(2) # Give user time to see the initial state
52
+ print("\nStarting pick-and-place operation...")
53
+ success = sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)
54
+
55
+ print(f"\nDone! success={success}, items sorted: {sim._items_sorted}")
56
+ print("\nYou can close the viewer window now, or press Ctrl+C.")
57
+
58
+ # Keep viewer open until user closes it
59
+ while viewer.is_running():
60
+ original_mj_step(sim.model, sim.data)
61
+ viewer.sync()
62
+ time.sleep(0.02) # ~50 FPS idle
63
+
64
+ except KeyboardInterrupt:
65
+ print("\nViewer closed.")
66
+ finally:
67
+ mujoco.mj_step = original_mj_step
68
+
69
+ if __name__ == "__main__":
70
+ main()
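The `move_to_position` method in the controller eases joint targets with a smoothstep profile so the arm accelerates and decelerates gently instead of jerking toward the IK solution. A minimal standalone sketch of that interpolation (function names here are illustrative):

```python
def smoothstep(alpha: float) -> float:
    """Cubic ease-in/ease-out: 3a^2 - 2a^3, with zero slope at both ends."""
    return alpha * alpha * (3 - 2 * alpha)

def interpolate(start, goal, steps):
    """Yield joint targets eased from start to goal, as move_to_position does."""
    for i in range(steps):
        t = smoothstep((i + 1) / steps)
        yield [s * (1 - t) + g * t for s, g in zip(start, goal)]

# The final target always lands exactly on the goal, since smoothstep(1) == 1.
trajectory = list(interpolate([0.0, 1.0], [1.0, 3.0], 5))
```

Because the derivative of smoothstep vanishes at both endpoints, the commanded joint velocities start and end near zero, which reduces the impulse transferred to a carried item.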
SemSorter/simulation/semsorter_scene.xml ADDED
@@ -0,0 +1,194 @@
1
+ <mujoco model="semsorter">
2
+ <!-- Let panda.xml handle its own meshdir via its embedded compiler element -->
3
+ <compiler angle="radian" autolimits="true"/>
4
+
5
+ <option integrator="implicitfast" gravity="0 0 -9.81" timestep="0.002"/>
6
+
7
+ <!-- ============================================================ -->
8
+ <!-- Include the Franka Panda arm (with integrated gripper) -->
9
+ <!-- ============================================================ -->
10
+ <include file="../../mujoco_menagerie/franka_emika_panda/panda.xml"/>
11
+
12
+ <statistic center="0 0 0.5" extent="1.5"/>
13
+
14
+ <!-- ============================================================ -->
15
+ <!-- Visual settings -->
16
+ <!-- ============================================================ -->
17
+ <visual>
18
+ <headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
19
+ <rgba haze="0.15 0.25 0.35 1"/>
20
+ <global azimuth="150" elevation="-25"/>
21
+ </visual>
22
+
23
+ <!-- ============================================================ -->
24
+ <!-- Textures & Materials -->
25
+ <!-- ============================================================ -->
26
+ <asset>
27
+ <texture type="skybox" builtin="gradient" rgb1="0.3 0.5 0.7" rgb2="0 0 0" width="512" height="3072"/>
28
+ <texture type="2d" name="groundplane" builtin="checker" mark="edge"
29
+ rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3" markrgb="0.8 0.8 0.8" width="300" height="300"/>
30
+ <material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>
31
+
32
+ <!-- Conveyor belt material -->
33
+ <texture type="2d" name="belt_tex" builtin="checker" rgb1="0.15 0.15 0.15" rgb2="0.2 0.2 0.2"
34
+ width="100" height="100"/>
35
+ <material name="belt_mat" texture="belt_tex" texrepeat="10 2" specular="0.1" shininess="0.05"/>
36
+
37
+ <!-- Conveyor frame material -->
38
+ <material name="frame_mat" rgba="0.4 0.4 0.45 1" specular="0.3" shininess="0.2"/>
39
+
40
+ <!-- Bin materials -->
41
+ <material name="bin_flammable_mat" rgba="0.85 0.15 0.1 0.9" specular="0.2" shininess="0.1"/>
42
+ <material name="bin_chemical_mat" rgba="0.95 0.75 0.1 0.9" specular="0.2" shininess="0.1"/>
43
+ <material name="bin_inner_mat" rgba="0.1 0.1 0.1 1"/>
44
+
45
+ <!-- Hazardous item materials -->
46
+ <material name="hazard_red" rgba="0.9 0.1 0.1 1" specular="0.4" shininess="0.3"/>
47
+ <material name="hazard_green" rgba="0.1 0.8 0.2 1" specular="0.4" shininess="0.3"/>
48
+ <material name="hazard_blue" rgba="0.1 0.2 0.9 1" specular="0.4" shininess="0.3"/>
49
+ <material name="hazard_yellow" rgba="0.95 0.85 0.1 1" specular="0.4" shininess="0.3"/>
50
+ <material name="safe_gray" rgba="0.6 0.6 0.6 1" specular="0.3" shininess="0.2"/>
51
+ <material name="safe_white" rgba="0.9 0.9 0.9 1" specular="0.3" shininess="0.2"/>
52
+ </asset>
53
+
54
+ <!-- ============================================================ -->
55
+ <!-- World: floor, lights, conveyors, bins, items -->
56
+ <!-- ============================================================ -->
57
+ <worldbody>
58
+ <!-- Lighting -->
59
+ <light pos="0 0 3" dir="0 0 -1" directional="true" diffuse="0.5 0.5 0.5"/>
60
+ <light pos="1 -1 2" dir="-0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>
61
+ <light pos="-1 -1 2" dir="0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>
62
+
63
+ <!-- Ground plane -->
64
+ <geom name="floor" size="0 0 0.05" type="plane" material="groundplane"/>
65
+
66
+ <!-- ======================================================== -->
67
+ <!-- CONVEYOR A (Input) — items arrive here from the left -->
68
+ <!-- ======================================================== -->
69
+ <body name="conveyor_input" pos="-0.55 0 0">
70
+ <!-- Belt surface -->
71
+ <geom name="belt_input" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
72
+ material="belt_mat" friction="0.8 0.005 0.0001"/>
73
+ <!-- Side rails -->
74
+ <geom name="rail_input_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
75
+ material="frame_mat"/>
76
+ <geom name="rail_input_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
77
+ material="frame_mat"/>
78
+ <!-- Legs -->
79
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
80
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
81
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
82
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
83
+ </body>
84
+
85
+ <!-- ======================================================== -->
86
+ <!-- CONVEYOR B (Output) — clean items continue here (right) -->
87
+ <!-- ======================================================== -->
88
+ <body name="conveyor_output" pos="0.55 0 0">
89
+ <!-- Belt surface -->
90
+ <geom name="belt_output" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
91
+ material="belt_mat" friction="0.8 0.005 0.0001"/>
92
+ <!-- Side rails -->
93
+ <geom name="rail_output_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
94
+ material="frame_mat"/>
95
+ <geom name="rail_output_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
96
+ material="frame_mat"/>
97
+ <!-- Legs -->
98
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
99
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
100
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
101
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
102
+ </body>
103
+
104
+ <!-- ======================================================== -->
105
+ <!-- FLAMMABLE WASTE BIN (Red) — front-left of the arm -->
106
+ <!-- ======================================================== -->
107
+ <body name="bin_flammable" pos="-0.25 0.45 0">
108
+ <!-- Bin walls (open top box) -->
109
+ <geom name="bin_fl_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_flammable_mat"/>
110
+ <geom name="bin_fl_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_flammable_mat"/>
111
+ <geom name="bin_fl_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_flammable_mat"/>
112
+ <geom name="bin_fl_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_flammable_mat"/>
113
+ <geom name="bin_fl_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
114
+ <!-- Label area (slightly raised red panel on front) -->
115
+ <site name="bin_flammable_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0 0 1"/>
116
+ </body>
117
+
118
+ <!-- ======================================================== -->
119
+ <!-- CHEMICAL WASTE BIN (Yellow) — front-right of the arm -->
120
+ <!-- ======================================================== -->
121
+ <body name="bin_chemical" pos="0.25 0.45 0">
122
+ <!-- Bin walls (open top box) -->
123
+ <geom name="bin_ch_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_chemical_mat"/>
124
+ <geom name="bin_ch_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_chemical_mat"/>
125
+ <geom name="bin_ch_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_chemical_mat"/>
126
+ <geom name="bin_ch_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_chemical_mat"/>
127
+ <geom name="bin_ch_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
128
+ <!-- Label area -->
129
+ <site name="bin_chemical_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0.8 0 1"/>
130
+ </body>
131
+
132
+ <!-- ======================================================== -->
133
+ <!-- HAZARDOUS ITEMS (on input conveyor, with free joints) -->
134
+ <!-- ======================================================== -->
135
+
136
+ <!-- Item 1: Red cylinder (flammable chemical) — leftmost -->
137
+ <body name="item_flammable_1" pos="-0.82 0 0.39">
138
+ <freejoint name="item_flammable_1_jnt"/>
139
+ <geom name="item_flammable_1_geom" type="cylinder" size="0.02 0.025"
140
+ material="hazard_red" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
141
+ </body>
142
+
143
+ <!-- Item 2: Safe white cylinder (goes to output conveyor) -->
144
+ <body name="item_safe_2" pos="-0.70 0 0.39">
145
+ <freejoint name="item_safe_2_jnt"/>
146
+ <geom name="item_safe_2_geom" type="cylinder" size="0.018 0.02"
147
+ material="safe_white" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
148
+ </body>
149
+
150
+ <!-- Item 3: Yellow box (chemical waste) -->
151
+ <body name="item_chemical_1" pos="-0.58 0 0.385">
152
+ <freejoint name="item_chemical_1_jnt"/>
153
+ <geom name="item_chemical_1_geom" type="box" size="0.02 0.02 0.02"
154
+ material="hazard_yellow" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
155
+ </body>
156
+
157
+ <!-- Item 4: Safe gray box (goes to output conveyor) -->
158
+ <body name="item_safe_1" pos="-0.46 0 0.385">
159
+ <freejoint name="item_safe_1_jnt"/>
160
+ <geom name="item_safe_1_geom" type="box" size="0.025 0.02 0.015"
161
+ material="safe_gray" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
162
+ </body>
163
+
164
+ <!-- Item 5: Blue box (chemical waste) -->
165
+ <body name="item_chemical_2" pos="-0.34 0 0.385">
166
+ <freejoint name="item_chemical_2_jnt"/>
167
+ <geom name="item_chemical_2_geom" type="box" size="0.018 0.018 0.018"
168
+ material="hazard_blue" mass="0.03" friction="1 0.005 0.0001" priority="1"/>
169
+ </body>
170
+
171
+ <!-- Item 6: Green box (flammable) — rightmost -->
172
+ <body name="item_flammable_2" pos="-0.22 0 0.385">
173
+ <freejoint name="item_flammable_2_jnt"/>
174
+ <geom name="item_flammable_2_geom" type="box" size="0.018 0.018 0.018"
175
+ material="hazard_green" mass="0.035" friction="1 0.005 0.0001" priority="1"/>
176
+ </body>
177
+
178
+ <!-- ======================================================== -->
179
+ <!-- Camera for the overview shot (used by OBS or renderer) -->
180
+ <!-- ======================================================== -->
181
+ <camera name="overview" pos="0 -1.2 1.2" xyaxes="1 0 0 0 0.7 0.7" fovy="50"/>
182
+ <camera name="topdown" pos="0 0 2.0" xyaxes="1 0 0 0 1 0" fovy="60"/>
183
+ <camera name="side" pos="1.5 0 0.8" xyaxes="0 1 0 -0.5 0 0.87" fovy="45"/>
184
+
185
+ </worldbody>
186
+
187
+ <!-- ============================================================ -->
188
+ <!-- Sensors for end-effector position tracking -->
189
+ <!-- ============================================================ -->
190
+ <sensor>
191
+ <framepos name="end_effector_pos" objtype="body" objname="hand"/>
192
+ </sensor>
193
+
194
+ </mujoco>
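The scene above encodes each object's hazard class in its body name (`item_flammable_1`, `item_safe_2`, `item_chemical_1`, …). A minimal sketch of recovering that mapping from the XML, using a trimmed stand-in snippet and assuming the real `scene.xml` keeps the `item_<class>_<n>` naming convention:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the scene XML above (hypothetical subset).
SCENE_SNIPPET = """
<mujoco>
  <worldbody>
    <body name="item_flammable_1" pos="-0.82 0 0.39"/>
    <body name="item_safe_2" pos="-0.70 0 0.39"/>
    <body name="item_chemical_1" pos="-0.58 0 0.385"/>
  </worldbody>
</mujoco>
"""

def hazard_items(xml_text: str) -> dict:
    """Map item body names to their hazard class, skipping safe items."""
    root = ET.fromstring(xml_text)
    out = {}
    for body in root.iter("body"):
        name = body.get("name", "")
        if name.startswith("item_") and not name.startswith("item_safe"):
            # "item_flammable_1" -> "FLAMMABLE"
            out[name] = name.split("_")[1].upper()
    return out

print(hazard_items(SCENE_SNIPPET))
# {'item_flammable_1': 'FLAMMABLE', 'item_chemical_1': 'CHEMICAL'}
```

This is the convention the simulation controller and `vlm_bridge.py` rely on when pairing VLM detections with scene bodies.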
SemSorter/vision/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # SemSorter Vision Module
SemSorter/vision/test_obs.py ADDED
@@ -0,0 +1,29 @@
1
+ import cv2
2
+ import time
3
+
4
+ def main():
5
+ print("Testing OBS Virtual Camera on /dev/video4...")
6
+ # Open the virtual camera
7
+ cap = cv2.VideoCapture(4)
8
+
9
+ if not cap.isOpened():
10
+ print("Error: Could not open video device /dev/video4.")
11
+ print("Please ensure OBS Virtual Camera is running.")
12
+ return
13
+
14
+ print("Successfully opened camera. Waiting 2 seconds for it to warm up...")
15
+ time.sleep(2)
16
+
17
+ ret, frame = cap.read()
18
+
19
+ if not ret:
20
+ print("Error: Could not read frame from camera.")
21
+ else:
22
+ output_file = "obs_snapshot.png"
23
+ cv2.imwrite(output_file, frame)
24
+ print(f"Success! Captured frame with shape {frame.shape} and saved to {output_file}.")
25
+
26
+ cap.release()
27
+
28
+ if __name__ == "__main__":
29
+ main()
SemSorter/vision/vision_pipeline.py ADDED
@@ -0,0 +1,239 @@
1
+ """
2
+ SemSorter Vision Pipeline — Hazard Detection Processor
3
+
4
+ Captures frames from OBS Virtual Camera or directly from the simulation,
5
+ then sends them to Gemini VLM for hazardous item detection.
6
+
7
+ Usage:
8
+ # From OBS Virtual Camera:
9
+ GOOGLE_API_KEY=... python3 vision_pipeline.py
10
+
11
+ # From simulation directly (no OBS needed):
12
+ GOOGLE_API_KEY=... python3 vision_pipeline.py --direct
13
+ """
14
+
15
+ import os
16
+ import sys
17
+ import cv2
18
+ import json
19
+ import time
20
+ import logging
21
+ import google.generativeai as genai
22
+ from PIL import Image
23
+ from typing import List, Dict
24
+
25
+ logger = logging.getLogger(__name__)
26
+
27
+
28
+ class HazardDetectionProcessor:
29
+ """
30
+ Detects hazardous items in the SemSorter simulation using Gemini VLM.
31
+
32
+ Supports two input modes:
33
+ - OBS Virtual Camera: reads from /dev/videoX
34
+ - Direct simulation rendering: calls sim.render_frame()
35
+ """
36
+
37
+ def __init__(self, device_id: int = 4, simulation=None):
38
+ """
39
+ Args:
40
+ device_id: Video device ID for OBS Virtual Camera (e.g., 4 for /dev/video4)
41
+ simulation: Optional SemSorterSimulation instance for direct rendering
42
+ """
43
+ self.device_id = device_id
44
+ self.simulation = simulation
45
+ self._video_cap = None # Reusable VideoCapture
46
+ self._gemini_model = None # Lazy-initialized
47
+
48
+ # System instructions to enforce structured JSON output
49
+ self.system_instruction = (
50
+ "You are an AI vision system for a robotic waste sorting arm. "
51
+ "You are given an image of a conveyor belt with a robotic arm and waste bins. "
52
+ "Your task is to identify hazardous items on the conveyor belt. "
53
+ "Hazardous items are categorized as:\n"
54
+ "- FLAMMABLE: Red or green colored items (cylinders, boxes)\n"
55
+ "- CHEMICAL: Yellow or blue colored items (boxes, spheres)\n\n"
56
+ "Safe items are gray or white — IGNORE these.\n\n"
57
+ "For each hazardous item detected, return a JSON object with:\n"
58
+ "- 'name': descriptive name like 'red_cylinder_1' or 'yellow_box_1'\n"
59
+ "- 'type': either 'FLAMMABLE' or 'CHEMICAL'\n"
60
+ "- 'color': the detected color (e.g., 'red', 'yellow')\n"
61
+ "- 'shape': the detected shape (e.g., 'cylinder', 'box', 'sphere')\n"
62
+ "- 'box_2d': bounding box as [ymin, xmin, ymax, xmax] normalized to 0-1000 scale\n\n"
63
+ "Return ONLY a JSON array of detected hazardous items. "
64
+ "If no hazardous items are visible, return an empty array []."
65
+ )
66
+
67
+ def _get_gemini_model(self):
68
+ """Lazy-initialize Gemini model (only when analyze_frame is called)."""
69
+ if self._gemini_model is None:
70
+ api_key = os.environ.get("GOOGLE_API_KEY")
71
+ if not api_key:
72
+ raise ValueError(
73
+ "GOOGLE_API_KEY environment variable not set.\n"
74
+ "Get one at https://aistudio.google.com/apikey"
75
+ )
76
+ genai.configure(api_key=api_key)
77
+ self._gemini_model = genai.GenerativeModel(
78
+ model_name="gemini-3-flash-preview",
79
+ system_instruction=self.system_instruction,
80
+ generation_config={"response_mime_type": "application/json"}
81
+ )
82
+ return self._gemini_model
83
+
84
+ def capture_frame(self) -> Image.Image:
85
+ """
86
+ Capture a single frame.
87
+ Uses direct simulation rendering if available, otherwise OBS camera.
88
+ """
89
+ if self.simulation is not None:
90
+ return self._capture_from_simulation()
91
+ else:
92
+ return self._capture_from_obs()
93
+
94
+ def _capture_from_simulation(self) -> Image.Image:
95
+ """Render a frame directly from the MuJoCo simulation."""
96
+ frame = self.simulation.render_frame(camera="overview")
97
+ return Image.fromarray(frame)
98
+
99
+ def _capture_from_obs(self) -> Image.Image:
100
+ """Capture a frame from the OBS Virtual Camera."""
101
+ if self._video_cap is None or not self._video_cap.isOpened():
102
+ self._video_cap = cv2.VideoCapture(self.device_id)
103
+ if not self._video_cap.isOpened():
104
+ raise RuntimeError(
105
+ f"Could not open video device /dev/video{self.device_id}. "
106
+ "Ensure OBS Virtual Camera is running."
107
+ )
108
+ # Warm up — discard stale frames
109
+ for _ in range(5):
110
+ self._video_cap.read()
111
+
112
+ ret, frame = self._video_cap.read()
113
+ if not ret:
114
+ raise RuntimeError("Failed to read frame from OBS Virtual Camera")
115
+
116
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
117
+ return Image.fromarray(frame_rgb)
118
+
119
+ def analyze_frame(self, pil_image: Image.Image) -> List[Dict]:
120
+ """
121
+ Send the image to Gemini VLM and parse the structured JSON response.
122
+
123
+ Returns:
124
+ List of dicts, each with keys: name, type, color, shape, box_2d
125
+ """
126
+ prompt = (
127
+ "Analyze this image of a robotic sorting station. "
128
+ "Identify all FLAMMABLE (red/green) and CHEMICAL (yellow/blue) items "
129
+ "on the conveyor belt. Return their positions as bounding boxes."
130
+ )
131
+
132
+ logger.info("Sending frame to Gemini VLM...")
133
+ model = self._get_gemini_model()
134
+ response = model.generate_content([prompt, pil_image])
135
+
136
+ raw_text = getattr(response, "text", None)
137
+ if not isinstance(raw_text, str) or not raw_text.strip():
138
+ logger.error("VLM response did not contain JSON text output")
139
+ return []
140
+
141
+ try:
142
+ results = json.loads(raw_text)
143
+ if isinstance(results, dict) and "items" in results:
144
+ results = results["items"]
145
+ if not isinstance(results, list):
146
+ logger.error(f"Unexpected VLM JSON shape: {type(results).__name__}")
147
+ return []
148
+ logger.info(f"VLM detected {len(results)} hazardous items")
149
+ return results
150
+ except (json.JSONDecodeError, TypeError):
151
+ logger.error(f"Failed to parse VLM response:\n{raw_text}")
152
+ return []
153
+
154
+ def detect_hazards(self) -> List[Dict]:
155
+ """
156
+ Full pipeline: capture frame → analyze → return results.
157
+ Convenience method combining capture_frame() and analyze_frame().
158
+ """
159
+ image = self.capture_frame()
160
+ return self.analyze_frame(image)
161
+
162
+ def close(self):
163
+ """Release video capture resources."""
164
+ if self._video_cap is not None:
165
+ self._video_cap.release()
166
+ self._video_cap = None
167
+
168
+
169
+ # ─── CLI entry point ────────────────────────────────────────────────────────
170
+
171
+ def main():
172
+ import argparse
173
+
174
+ parser = argparse.ArgumentParser(description="SemSorter Hazard Detection")
175
+ parser.add_argument("--direct", action="store_true",
176
+ help="Use direct simulation rendering instead of OBS")
177
+ parser.add_argument("--device", type=int, default=4,
178
+ help="OBS Virtual Camera device ID (default: 4)")
179
+ parser.add_argument("--output", default="vision_debug.png",
180
+ help="Save captured frame to this path")
181
+ args = parser.parse_args()
182
+
183
+ logging.basicConfig(level=logging.INFO)
184
+
185
+ simulation = None
186
+ if args.direct:
187
+ # Must be set before importing MuJoCo/controller in this process.
188
+ os.environ.setdefault("MUJOCO_GL", "egl")
189
+ # Import and initialize simulation for direct rendering
190
+ try:
191
+ from ..simulation.controller import SemSorterSimulation
192
+ except ImportError:
193
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
194
+ from controller import SemSorterSimulation
195
+ print("Initializing simulation for direct rendering...")
196
+ simulation = SemSorterSimulation()
197
+ simulation.load_scene()
198
+ simulation.step(200) # Let physics settle
199
+
200
+ processor = HazardDetectionProcessor(
201
+ device_id=args.device,
202
+ simulation=simulation
203
+ )
204
+
205
+ try:
206
+ print("Capturing frame...")
207
+ image = processor.capture_frame()
208
+ image.save(args.output)
209
+ print(f"Saved frame to {args.output}")
210
+
211
+ print("Analyzing frame with Gemini VLM...")
212
+ results = processor.analyze_frame(image)
213
+
214
+ print("\n" + "=" * 50)
215
+ print(" HAZARD DETECTION RESULTS")
216
+ print("=" * 50)
217
+
218
+ if not results:
219
+ print(" No hazardous items detected.")
220
+ else:
221
+ for i, item in enumerate(results, 1):
222
+ print(f"\n [{i}] {item.get('name', 'unknown')}")
223
+ print(f" Type: {item.get('type', '?')}")
224
+ print(f" Color: {item.get('color', '?')}")
225
+ print(f" Shape: {item.get('shape', '?')}")
226
+ print(f" Box: {item.get('box_2d', '?')}")
227
+
228
+ print("\n" + "=" * 50)
229
+ print(f" Total hazardous items: {len(results)}")
230
+ print("=" * 50)
231
+
232
+ finally:
233
+ processor.close()
234
+ if simulation is not None and hasattr(simulation, "close"):
235
+ simulation.close()
236
+
237
+
238
+ if __name__ == "__main__":
239
+ main()
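The parsing contract that `analyze_frame()` enforces on the VLM reply can be shown standalone: accept either a bare JSON array or an object wrapping the array under an `"items"` key, and fall back to an empty list on anything else. A self-contained sketch (the sample reply below is illustrative, not real model output):

```python
import json

def parse_detections(raw_text: str) -> list:
    """Tolerantly parse a VLM reply into a list of detection dicts."""
    try:
        results = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return []
    # Some models wrap the array in {"items": [...]}.
    if isinstance(results, dict) and "items" in results:
        results = results["items"]
    return results if isinstance(results, list) else []

reply = (
    '{"items": [{"name": "red_cylinder_1", "type": "FLAMMABLE", '
    '"color": "red", "shape": "cylinder", "box_2d": [420, 110, 520, 180]}]}'
)
print(len(parse_detections(reply)))   # 1
print(parse_detections("not json"))   # []
```

Requesting `response_mime_type: "application/json"` makes the bare-array case the common path; the wrapper and error branches are defensive.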
SemSorter/vision/vlm_bridge.py ADDED
@@ -0,0 +1,269 @@
1
+ """
2
+ SemSorter VLM-to-Simulation Bridge
3
+
4
+ Maps VLM hazard detections to simulation item names and orchestrates
5
+ the pick-and-place sequence. This is the glue between Phase 2 (Vision)
6
+ and Phase 1 (Simulation).
7
+
8
+ Usage:
9
+ # End-to-end test (direct render, no OBS):
10
+ MUJOCO_GL=egl GOOGLE_API_KEY=... python3 vlm_bridge.py --direct
11
+
12
+ # With OBS Virtual Camera:
13
+ GOOGLE_API_KEY=... python3 vlm_bridge.py
14
+ """
15
+
16
+ import os
17
+ import sys
18
+ import logging
19
+ from typing import List, Dict, Tuple
20
+
21
+ try:
22
+ from .vision_pipeline import HazardDetectionProcessor
23
+ except ImportError:
24
+ from vision_pipeline import HazardDetectionProcessor
25
+
26
+ try:
27
+ from ..simulation.controller import BinType, SemSorterSimulation
28
+ except ImportError:
29
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
30
+ from controller import BinType, SemSorterSimulation
31
+
32
+ logger = logging.getLogger(__name__)
33
+
34
+
35
+ class VLMSimBridge:
36
+ """
37
+ Bridge between VLM hazard detections and the simulation controller.
38
+
39
+ Matching strategy:
40
+ 1. VLM detects items by color/shape → returns type (FLAMMABLE/CHEMICAL)
41
+ 2. Simulation has named items with known hazard types
42
+ 3. We match VLM detections to unpicked simulation items of the same type
43
+ 4. For multiple items of the same type, we use spatial ordering (left-to-right
44
+ on the conveyor) to assign matches
45
+ """
46
+
47
+ def __init__(self, simulation, device_id: int = 4, use_direct: bool = False):
48
+ """
49
+ Args:
50
+ simulation: SemSorterSimulation instance
51
+ device_id: OBS Virtual Camera device ID
52
+ use_direct: If True, render frames from simulation instead of OBS
53
+ """
54
+ self.simulation = simulation
55
+ self.processor = HazardDetectionProcessor(
56
+ device_id=device_id,
57
+ simulation=simulation if use_direct else None
58
+ )
59
+
60
+ def get_unpicked_items_by_type(self, hazard_type: str) -> List[Tuple[str, float]]:
61
+ """
62
+ Get unpicked simulation items of a given hazard type,
63
+ sorted by X position (leftmost first = highest priority on conveyor).
64
+
65
+ Returns:
66
+ List of (item_name, x_position) tuples
67
+ """
68
+ type_map = {
69
+ "FLAMMABLE": BinType.FLAMMABLE,
70
+ "CHEMICAL": BinType.CHEMICAL,
71
+ }
72
+
73
+ target_type = type_map.get(hazard_type)
74
+ if target_type is None:
75
+ return []
76
+
77
+ items = []
78
+ for name, info in self.simulation.items.items():
79
+ if info.hazard_type == target_type and not info.picked:
80
+ pos = self.simulation.get_item_pos(name)
81
+ if pos is not None:
82
+ items.append((name, pos[0])) # x_position for sorting
83
+
84
+ # Sort by X (most negative = leftmost on conveyor = first to pick)
85
+ items.sort(key=lambda x: x[1])
86
+ return items
87
+
88
+ def match_detections_to_items(self, detections: List[Dict]) -> List[Dict]:
89
+ """
90
+ Match VLM detections to simulation item names.
91
+
92
+ Each detection gets an additional 'sim_item' key with the matched
93
+ simulation item name, and 'bin_type' with the target bin.
94
+
95
+ Returns:
96
+ List of matched detections with sim_item and bin_type fields added
97
+ """
98
+ # Track which items have already been matched
99
+ matched_items = set()
100
+ results = []
101
+
102
+ def box_left_x(det: Dict) -> float:
103
+ box = det.get("box_2d")
104
+ if isinstance(box, (list, tuple)) and len(box) >= 2:
105
+ try:
106
+ return float(box[1])
107
+ except (TypeError, ValueError):
108
+ pass
109
+ return 1000.0
110
+
111
+ # Group detections by type
112
+ for det_type in ["FLAMMABLE", "CHEMICAL"]:
113
+ type_detections = []
114
+ for d in detections:
115
+ if not isinstance(d, dict):
116
+ continue
117
+ dtype = str(d.get("type", "")).strip().upper()
118
+ if dtype == det_type:
119
+ type_detections.append(d)
120
+ available_items = self.get_unpicked_items_by_type(det_type)
121
+
122
+ # Sort detections by x position of bounding box (leftmost first)
123
+ type_detections.sort(key=box_left_x)
124
+
125
+ bin_type = BinType.FLAMMABLE if det_type == "FLAMMABLE" else BinType.CHEMICAL
126
+
127
+ for detection in type_detections:
128
+ # Find first available item not yet matched
129
+ sim_item = None
130
+ for item_name, _ in available_items:
131
+ if item_name not in matched_items:
132
+ sim_item = item_name
133
+ matched_items.add(item_name)
134
+ break
135
+
136
+ if sim_item:
137
+ detection["sim_item"] = sim_item
138
+ detection["bin_type"] = bin_type
139
+ results.append(detection)
140
+ logger.info(f"Matched VLM '{detection.get('name')}' → "
141
+ f"sim '{sim_item}' → bin '{bin_type.value}'")
142
+ else:
143
+ logger.warning(f"No unmatched sim item for VLM detection: "
144
+ f"{detection.get('name')} ({det_type})")
145
+
146
+ return results
147
+
148
+ def detect_and_sort(self) -> Dict:
149
+ """
150
+ Full pipeline: detect hazards → match to sim items → pick and place all.
151
+
152
+ Returns:
153
+ Summary dict with detection count, sort count, and details
154
+ """
155
+ # Step 1: Detect hazards
156
+ logger.info("Step 1: Detecting hazards with VLM...")
157
+ detections = self.processor.detect_hazards()
158
+ logger.info(f"VLM found {len(detections)} hazardous items")
159
+
160
+ if not detections:
161
+ return {"detected": 0, "matched": 0, "sorted": 0, "details": []}
162
+
163
+ # Step 2: Match to simulation items
164
+ logger.info("Step 2: Matching detections to simulation items...")
165
+ matched = self.match_detections_to_items(detections)
166
+ logger.info(f"Matched {len(matched)} items")
167
+
168
+ # Step 3: Pick and place each matched item
169
+ logger.info("Step 3: Executing pick-and-place sequence...")
170
+ details = []
171
+ sorted_count = 0
172
+
173
+ for match in matched:
174
+ item_name = match["sim_item"]
175
+ bin_type = match["bin_type"]
176
+ vlm_name = match.get("name", "unknown")
177
+
178
+ logger.info(f"Sorting: {vlm_name} ({item_name}) → {bin_type.value}")
179
+ success = self.simulation.pick_and_place(item_name, bin_type)
180
+
181
+ # Let remaining items settle after the arm moves
182
+ self.simulation.step(200)
183
+
184
+ details.append({
185
+ "vlm_name": vlm_name,
186
+ "sim_item": item_name,
187
+ "target_bin": bin_type.value,
188
+ "success": success,
189
+ })
190
+
191
+ if success:
192
+ sorted_count += 1
193
+
194
+ return {
195
+ "detected": len(detections),
196
+ "matched": len(matched),
197
+ "sorted": sorted_count,
198
+ "details": details,
199
+ }
200
+
201
+ def close(self):
202
+ """Release resources."""
203
+ self.processor.close()
204
+
205
+
206
+ # ─── CLI entry point ────────────────────────────────────────────────────────
207
+
208
+ def main():
209
+ import argparse
210
+
211
+ parser = argparse.ArgumentParser(description="SemSorter VLM-Sim Bridge")
212
+ parser.add_argument("--direct", action="store_true",
213
+ help="Use direct simulation rendering instead of OBS")
214
+ parser.add_argument("--device", type=int, default=4,
215
+ help="OBS Virtual Camera device ID (default: 4)")
216
+ args = parser.parse_args()
217
+
218
+ logging.basicConfig(level=logging.INFO)
219
+
220
+ # Initialize simulation
221
+ print("Initializing simulation...")
222
+ if args.direct:
223
+ os.environ.setdefault("MUJOCO_GL", "egl")
224
+ sim = SemSorterSimulation()
225
+ sim.load_scene()
226
+ sim.step(200) # Let physics settle
227
+
228
+ # Initialize bridge
229
+ bridge = VLMSimBridge(
230
+ simulation=sim,
231
+ device_id=args.device,
232
+ use_direct=args.direct,
233
+ )
234
+
235
+ try:
236
+ # Run full detect → match → sort pipeline
237
+ print("\n" + "=" * 60)
238
+ print(" SemSorter: VLM-Driven Hazard Sorting")
239
+ print("=" * 60)
240
+
241
+ result = bridge.detect_and_sort()
242
+
243
+ print("\n" + "=" * 60)
244
+ print(" SORTING RESULTS")
245
+ print("=" * 60)
246
+ print(f" Hazards detected by VLM: {result['detected']}")
247
+ print(f" Matched to sim items: {result['matched']}")
248
+ print(f" Successfully sorted: {result['sorted']}")
249
+
250
+ if result['details']:
251
+ print("\n Details:")
252
+ for d in result['details']:
253
+ status = "✅" if d['success'] else "❌"
254
+ print(f" {status} {d['vlm_name']} ({d['sim_item']}) → {d['target_bin']}")
255
+
256
+ print("=" * 60)
257
+
258
+ # Save final state
259
+ sim.save_frame("after_sort.png")
260
+ print(f"\nFinal scene saved to after_sort.png")
261
+
262
+ finally:
263
+ bridge.close()
264
+ if hasattr(sim, "close"):
265
+ sim.close()
266
+
267
+
268
+ if __name__ == "__main__":
269
+ main()
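The matching strategy described in the `VLMSimBridge` docstring reduces to pairing two lists sorted left-to-right: detections ordered by the `xmin` of their `box_2d`, and unpicked sim items ordered by conveyor X. A minimal sketch of that core step (names and coordinates below are illustrative):

```python
def match(detections, sim_items):
    """Pair detections with sim items by left-to-right order.

    detections: list of (vlm_name, box_xmin) tuples
    sim_items:  list of (sim_name, x_position) tuples
    """
    detections = sorted(detections, key=lambda d: d[1])
    sim_items = sorted(sim_items, key=lambda s: s[1])
    # zip() truncates to the shorter list, mirroring the "no unmatched
    # sim item" warning path in match_detections_to_items().
    return [(d[0], s[0]) for d, s in zip(detections, sim_items)]

pairs = match(
    [("red_cylinder_2", 340), ("red_cylinder_1", 120)],
    [("item_flammable_1", -0.82), ("item_flammable_2", -0.22)],
)
print(pairs)
# [('red_cylinder_1', 'item_flammable_1'), ('red_cylinder_2', 'item_flammable_2')]
```

The real bridge runs this per hazard type and also skips items already marked `picked`, but the ordering logic is the same.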
Vision-Agents ADDED
@@ -0,0 +1 @@
1
+ Subproject commit f684ece6c3b6540b02de9c73431a5ffe0c576f29
render.yaml ADDED
@@ -0,0 +1,21 @@
1
+ services:
2
+ - type: web
3
+ name: semsorter
4
+ env: docker
5
+ dockerfilePath: ./Dockerfile
6
+ plan: free
7
+ envVars:
8
+ - key: MUJOCO_GL
9
+ value: egl
10
+ - key: GOOGLE_API_KEY
11
+ sync: false # Set in Render dashboard — not committed to git
12
+ - key: DEEPGRAM_API_KEY
13
+ sync: false
14
+ - key: ELEVENLABS_API_KEY
15
+ sync: false
16
+ - key: STREAM_API_KEY
17
+ sync: false
18
+ - key: STREAM_API_SECRET
19
+ sync: false
20
+ healthCheckPath: /api/state
21
+ autoDeploy: true
requirements-server.txt ADDED
@@ -0,0 +1,18 @@
1
+ # SemSorter Web Server Dependencies
2
+ fastapi==0.115.0
3
+ uvicorn[standard]==0.30.6
4
+ websockets==13.1
5
+ python-multipart==0.0.12
6
+ httpx==0.27.2
7
+ pillow==10.4.0
8
+ numpy==1.26.4
9
+
10
+ # MuJoCo (headless, EGL)
11
+ mujoco==3.2.0
12
+
13
+ # Google Gemini (legacy + new SDK — both used for compatibility)
14
+ google-generativeai==0.8.3
15
+ google-genai==1.0.0
16
+
17
+ # dotenv for loading .env files
18
+ python-dotenv==1.0.1