SemSorter committed on
Commit
2588ff8
·
0 Parent(s):

feat: SemSorter — AI hazard sorting with Vision-Agents SDK


- Phase 1: MuJoCo Franka Panda simulation with pick-and-place
- Phase 2: Gemini VLM hazard detection pipeline
- Phase 3: Vision-Agents SDK agent (gemini.LLM + deepgram.STT + elevenlabs.TTS)
- Phase 4: FastAPI web server with WebSocket live video + chat UI

Closes all phases.

.env.example ADDED
@@ -0,0 +1,15 @@
+ # SemSorter Environment Variables
+ # Copy to .env and fill in your API keys
+
+ # GetStream (for real-time video/audio transport)
+ STREAM_API_KEY="your-stream-api-key"
+ STREAM_API_SECRET="your-stream-api-secret"
+
+ # Google Gemini (for LLM orchestration + VLM hazard detection)
+ GOOGLE_API_KEY="your-google-api-key"
+
+ # Deepgram (for Speech-to-Text)
+ DEEPGRAM_API_KEY="your-deepgram-api-key"
+
+ # ElevenLabs (for Text-to-Speech)
+ ELEVENLABS_API_KEY="your-elevenlabs-api-key"
.gitignore ADDED
@@ -0,0 +1,34 @@
+ # Debug / generated images
+ *.png
+ !SemSorter/vision/vision_debug.png
+
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ .eggs/
+ *.egg-info/
+
+ # MuJoCo
+ *.mjb
+ mujoco-*/
+ mujoco_menagerie/
+
+ # Vision-Agents SDK venv (too large for git)
+ Vision-Agents/.venv/
+ Vision-Agents/__pycache__/
+
+ # uv cache
+ .uv/
+ uv.lock
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Environment (never commit secrets)
+ .env
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ FROM python:3.10-slim
+
+ # ── System deps for MuJoCo EGL rendering ─────────────────────────────────────
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libegl1-mesa \
+     libegl1 \
+     libgles2 \
+     libglvnd0 \
+     libglx0 \
+     libx11-6 \
+     wget \
+     && rm -rf /var/lib/apt/lists/*
+
+ # ── Create working directory ──────────────────────────────────────────────────
+ WORKDIR /app
+
+ # ── Copy requirements first (layer caching) ──────────────────────────────────
+ COPY requirements-server.txt ./
+ RUN pip install --no-cache-dir -r requirements-server.txt
+
+ # ── Copy project ──────────────────────────────────────────────────────────────
+ COPY . .
+
+ # ── MuJoCo environment ────────────────────────────────────────────────────────
+ ENV MUJOCO_GL=egl
+ ENV PYOPENGL_PLATFORM=egl
+
+ # ── Expose port ───────────────────────────────────────────────────────────────
+ EXPOSE 8000
+
+ # ── Start server ──────────────────────────────────────────────────────────────
+ CMD ["uvicorn", "SemSorter.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md ADDED
@@ -0,0 +1,120 @@
+ # SemSorter — AI Hazard Sorting System
+
+ > **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**
+
+ [![Demo](https://img.shields.io/badge/Live%20Demo-Render.com-4f46e5)](https://semsorter.onrender.com)
+
+ ---
+
+ ## 🤖 Overview
+
+ SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:
+
+ 1. **Watches** the conveyor belt via a live camera feed
+ 2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
+ 3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
+ 4. **Speaks back** results using **ElevenLabs TTS**
+ 5. **Listens** to voice commands via **Deepgram STT**
+
+ All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.
+
+ ---
+
+ ## 🏗 Architecture
+
+ ```
+ Browser ←─── WebSocket ───→ FastAPI Server
+                                  │
+                      Vision-Agents SDK Agent
+                      ┌──────────┴──────────┐
+                 gemini.LLM           deepgram.STT
+               (tool-calling)         (voice→text)
+                      │
+                  VLM Bridge
+                      │
+          MuJoCo Sim (Franka Panda)
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+ - Python 3.10+
+ - MuJoCo 3.x
+ - EGL (headless GPU rendering)
+
+ ### Local Setup
+
+ ```bash
+ # Clone
+ git clone https://github.com/YOUR_USERNAME/SemSorter.git
+ cd SemSorter
+
+ # Install dependencies
+ pip install -r requirements-server.txt
+
+ # Configure API keys
+ cp .env.example .env
+ # Edit .env with your keys:
+ #   GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
+ #   STREAM_API_KEY, STREAM_API_SECRET
+
+ # Run
+ MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
+ # Open http://localhost:8000
+ ```
+
+ ### Voice Agent (Vision-Agents SDK CLI)
+ ```bash
+ cd Vision-Agents
+ MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
+ ```
+
+ ---
+
+ ## 📦 Project Structure
+
+ ```
+ SemSorter/
+ ├── SemSorter/
+ │   ├── simulation/
+ │   │   ├── controller.py            # MuJoCo sim + IK + pick-and-place
+ │   │   └── semsorter_scene.xml      # MJCF scene (Panda + conveyor + bins)
+ │   ├── vision/
+ │   │   ├── vision_pipeline.py       # Gemini VLM hazard detection
+ │   │   └── vlm_bridge.py            # VLM → sim item matching
+ │   ├── agent/
+ │   │   ├── agent.py                 # Vision-Agents SDK agent
+ │   │   └── semsorter_instructions.md
+ │   └── server/
+ │       ├── app.py                   # FastAPI + WebSocket video stream
+ │       ├── agent_bridge.py          # SDK bridge + quota detection
+ │       └── static/index.html        # Web UI
+ ├── Vision-Agents/                   # GetStream Vision-Agents SDK
+ ├── Dockerfile
+ ├── render.yaml
+ └── requirements-server.txt
+ ```
+
+ ---
+
+ ## 🔑 API Keys Required
+
+ | Service | Purpose | Free tier |
+ |---|---|---|
+ | Google Gemini | LLM orchestration + VLM detection | 15 RPM |
+ | Deepgram | Speech-to-Text | 45 min/month |
+ | ElevenLabs | Text-to-Speech | ~10k chars/month |
+ | GetStream | Real-time video call (voice agent) | Free tier available |
+
+ > **API exhaustion handling:** The server detects quota errors (`429 / ResourceExhausted`) and automatically switches to demo mode per service, showing a banner in the UI.
+
+ ---
+
+ ## 🐳 Deploy to Render
+
+ 1. Fork this repo
+ 2. Create a new **Web Service** on [Render.com](https://render.com) pointing to your fork
+ 3. Add your API keys as **Environment Variables** in the Render dashboard
+ 4. Done — Render auto-deploys from `render.yaml`
SemSorter/agent/__init__.py ADDED
File without changes
SemSorter/agent/agent.py ADDED
@@ -0,0 +1,252 @@
+ """
+ SemSorter Agent — Vision-Agents SDK Integration
+
+ This module creates a real-time AI agent using GetStream's Vision-Agents SDK.
+ The agent watches the MuJoCo simulation via video, listens to voice commands,
+ detects hazardous items using Gemini VLM, and triggers pick-and-place operations.
+
+ Usage (from the Vision-Agents directory):
+     # Set env vars in .env first, then:
+     uv run python ../SemSorter/agent/agent.py run
+ """
+
+ import atexit
+ import logging
+ import os
+ import sys
+ from pathlib import Path
+ from typing import Any, Dict
+
+ from dotenv import load_dotenv
+ from vision_agents.core import Agent, AgentLauncher, Runner, User
+ from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream
+
+ logger = logging.getLogger(__name__)
+
+ # ─── Path setup ──────────────────────────────────────────────────────────────
+ # Add SemSorter packages to sys.path so we can import simulation & vision
+ AGENT_DIR = Path(__file__).resolve().parent
+ SEMSORTER_DIR = AGENT_DIR.parent
+ PROJECT_ROOT = SEMSORTER_DIR.parent
+
+ sys.path.insert(0, str(SEMSORTER_DIR / "simulation"))
+ sys.path.insert(0, str(SEMSORTER_DIR / "vision"))
+
+ # Load environment variables
+ load_dotenv(PROJECT_ROOT / ".env")
+
+ # ─── Simulation singleton ───────────────────────────────────────────────────
+ _simulation = None
+ _bridge = None
+
+
+ def get_simulation():
+     """Lazy-initialize the MuJoCo simulation (singleton)."""
+     global _simulation
+     if _simulation is None:
+         os.environ.setdefault("MUJOCO_GL", "egl")
+         from controller import SemSorterSimulation
+
+         logger.info("Initializing MuJoCo simulation...")
+         _simulation = SemSorterSimulation()
+         _simulation.load_scene()
+         _simulation.step(200)  # Let physics settle
+         logger.info("Simulation ready.")
+     return _simulation
+
+
+ def get_bridge():
+     """Lazy-initialize the VLM-Simulation bridge (singleton)."""
+     global _bridge
+     if _bridge is None:
+         from vlm_bridge import VLMSimBridge
+
+         sim = get_simulation()
+         _bridge = VLMSimBridge(simulation=sim, use_direct=True)
+         logger.info("VLM-Sim bridge ready.")
+     return _bridge
+
+
+ class _EGLStderrFilter:
+     """Stderr wrapper that suppresses only known EGL teardown noise."""
+
+     _SUPPRESSED = ("EGLError", "eglDestroyContext", "eglMakeCurrent",
+                    "EGL_NOT_INITIALIZED", "GLContext.__del__",
+                    "Renderer.__del__", "SfuStatsReporter",
+                    "Task was destroyed but it is pending")
+
+     def __init__(self, real):
+         self._real = real
+
+     def write(self, s):
+         if any(tok in s for tok in self._SUPPRESSED):
+             return len(s)  # silently consume
+         return self._real.write(s)
+
+     def flush(self):
+         self._real.flush()
+
+     def __getattr__(self, name):
+         return getattr(self._real, name)
+
+
+ def close_resources() -> None:
+     """Release singleton resources on process shutdown."""
+     # Only muffle known-harmless EGL teardown noise, keep real errors visible
+     sys.stderr = _EGLStderrFilter(sys.stderr)
+
+     global _bridge, _simulation
+     if _bridge is not None:
+         try:
+             _bridge.close()
+         except Exception:
+             pass
+         _bridge = None
+     if _simulation is not None and hasattr(_simulation, "close"):
+         try:
+             _simulation.close()
+         except Exception:
+             pass
+         _simulation = None
+
+
+ atexit.register(close_resources)
+
+
+ # ─── LLM Setup with Tool Registration ───────────────────────────────────────
+
+ INSTRUCTIONS = (AGENT_DIR / "semsorter_instructions.md").read_text()
+
+
+ def setup_llm(model: str = "gemini-3-flash-preview") -> gemini.LLM:
+     """Create and configure the Gemini LLM with registered simulation tools."""
+     llm = gemini.LLM(model)
+
+     @llm.register_function(
+         description="Scan the conveyor belt camera feed for hazardous items. "
+                     "Returns a list of detected hazardous items with their types and positions."
+     )
+     async def scan_for_hazards() -> Dict[str, Any]:
+         """Capture a frame, match detections to sim items, and return actionable IDs."""
+         bridge = get_bridge()
+         detections = bridge.processor.detect_hazards()
+         matched = bridge.match_detections_to_items(detections)
+         return {
+             "hazards_found": len(detections),
+             "items_matched": len(matched),
+             "items": [
+                 {
+                     "item_name": d.get("sim_item", "unknown"),
+                     "bin_type": d.get("bin_type").value if d.get("bin_type") else "unknown",
+                     "detected_name": d.get("name", "unknown"),
+                     "type": str(d.get("type", "unknown")).lower(),
+                     "color": d.get("color", "unknown"),
+                     "shape": d.get("shape", "unknown"),
+                 }
+                 for d in matched
+             ],
+         }
+
+     @llm.register_function(
+         description="Pick a specific item from the conveyor and place it in "
+                     "the designated hazard bin. Use item_name from scan results. "
+                     "bin_type must be 'flammable' or 'chemical'."
+     )
+     async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
+         """Execute a pick-and-place operation for a specific item."""
+         from controller import BinType
+
+         sim = get_simulation()
+
+         type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
+         target_bin = type_map.get(bin_type.lower())
+         if target_bin is None:
+             return {"success": False, "error": f"Unknown bin type: {bin_type}"}
+
+         if item_name not in sim.items:
+             return {"success": False, "error": f"Unknown item: {item_name}"}
+
+         if sim.items[item_name].picked:
+             return {"success": False, "error": f"Item {item_name} already sorted"}
+
+         success = sim.pick_and_place(item_name, target_bin)
+         return {
+             "success": success,
+             "item": item_name,
+             "bin": bin_type,
+             "total_sorted": sim._items_sorted,
+         }
+
+     @llm.register_function(
+         description="Get the current state of the simulation: items, robot position, "
+                     "and sorting progress."
+     )
+     async def get_simulation_state() -> Dict[str, Any]:
+         """Return current simulation state snapshot."""
+         sim = get_simulation()
+         state = sim.get_state()
+         return {
+             "time": round(state.time, 2),
+             "arm_busy": state.arm_busy,
+             "gripper_open": state.gripper_open,
+             "items_sorted": state.items_sorted,
+             "ee_position": [round(x, 3) for x in state.ee_pos],
+             "items": state.items,
+         }
+
+     @llm.register_function(
+         description="Automatically scan for ALL hazardous items and sort them into "
+                     "the correct bins. This runs the full detect-match-sort pipeline."
+     )
+     async def sort_all_hazards() -> Dict[str, Any]:
+         """Full automated pipeline: detect → match → pick-and-place all hazards."""
+         bridge = get_bridge()
+         result = bridge.detect_and_sort()
+         return {
+             "hazards_detected": result["detected"],
+             "items_matched": result["matched"],
+             "items_sorted": result["sorted"],
+             "details": result["details"],
+         }
+
+     return llm
+
+
+ # ─── Agent Creation ──────────────────────────────────────────────────────────
+
+
+ async def create_agent(**kwargs) -> Agent:
+     """Create the SemSorter agent with Vision-Agents SDK."""
+     llm = setup_llm()
+
+     agent = Agent(
+         edge=getstream.Edge(),
+         agent_user=User(name="SemSorter AI", id="semsorter-agent"),
+         instructions=INSTRUCTIONS,
+         llm=llm,
+         tts=elevenlabs.TTS(model_id="eleven_flash_v2_5"),
+         stt=deepgram.STT(eager_turn_detection=True),
+         processors=[],
+     )
+
+     return agent
+
+
+ async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
+     """Join a GetStream video call and start the agent loop."""
+     call = await agent.create_call(call_type, call_id)
+
+     async with agent.join(call):
+         # Greet the user
+         await agent.simple_response(
+             "Hello! I'm the SemSorter AI. I can scan the conveyor belt "
+             "for hazardous items and sort them into the correct bins. "
+             "Just tell me what to do!"
+         )
+         # Run until the call ends
+         await agent.finish()
+
+
+ # ─── Entry point ─────────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
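The `_EGLStderrFilter` defined in `agent.py` can be exercised outside the agent. A minimal standalone sketch of the same suppression logic, with a trimmed token list and `io.StringIO` standing in for the real stderr:

```python
import io

# Trimmed token list for illustration; the agent suppresses more patterns
SUPPRESSED = ("EGLError", "eglDestroyContext", "EGL_NOT_INITIALIZED")


class StderrFilter:
    """Wrap a stream and swallow writes containing known-noise tokens."""

    def __init__(self, real):
        self._real = real

    def write(self, s):
        if any(tok in s for tok in SUPPRESSED):
            return len(s)  # pretend the write succeeded, but drop it
        return self._real.write(s)

    def flush(self):
        self._real.flush()


buf = io.StringIO()
err = StderrFilter(buf)
err.write("EGLError: eglMakeCurrent failed\n")  # suppressed
err.write("RuntimeError: something real\n")     # passes through
print(repr(buf.getvalue()))  # 'RuntimeError: something real\n'
```

Because `write` still returns the byte count for suppressed lines, callers that check the return value behave normally; only the output disappears.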
SemSorter/agent/semsorter_instructions.md ADDED
@@ -0,0 +1,22 @@
+ You are the SemSorter AI assistant — a robotic waste sorting system operator.
+
+ ## Your Role
+ You control a Franka Panda robot arm that sorts hazardous waste items on a conveyor belt into the correct safety bins:
+ - **Flammable items** (red colored) → Red flammable bin
+ - **Chemical items** (yellow colored) → Yellow chemical bin
+ - **Safe items** (gray/white/blue/green) → Leave on conveyor (no action needed)
+
+ ## Available Tools
+
+ 1. **scan_for_hazards** — Capture a frame from the conveyor camera and analyze it with the VLM to detect hazardous items. Call this FIRST when asked to sort items.
+ 2. **pick_and_place_item** — Pick a specific item and place it in the designated bin. Use the item_name and bin_type returned by scan_for_hazards.
+ 3. **get_simulation_state** — Check the current status: which items exist, which have been sorted, and the robot's position.
+ 4. **sort_all_hazards** — Automatically scan and sort ALL detected hazardous items in one go.
+
+ ## Behavior Rules
+ - When asked to "sort items" or "clean up", call `sort_all_hazards` for the full automated pipeline.
+ - When asked about "what's on the belt" or "scan", call `scan_for_hazards` and describe the results.
+ - When asked about a specific item, call `get_simulation_state` to check its status.
+ - Keep responses SHORT and conversational (1-2 sentences).
+ - Announce each action as you do it: "Scanning the belt...", "Picking up the red cylinder...", "Placed in flammable bin!"
+ - If no hazards are found, say something like "All clear! No hazardous items detected."
SemSorter/server/__init__.py ADDED
@@ -0,0 +1 @@
+ # SemSorter Web Server
SemSorter/server/agent_bridge.py ADDED
@@ -0,0 +1,363 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ SemSorter Agent Bridge
3
+ ======================
4
+ Wraps the Vision-Agents SDK components (gemini.LLM, deepgram.STT, elevenlabs.TTS)
5
+ and the MuJoCo simulation into a single async service used by the FastAPI server.
6
+
7
+ Quota/API exhaustion is detected per-service and a UIstatus message is returned
8
+ so the frontend can display an informative banner before demo-mode engages.
9
+ """
10
+
11
+ import asyncio
12
+ import logging
13
+ import os
14
+ import sys
15
+ from pathlib import Path
16
+ from typing import Any, Callable, Dict, List, Optional
17
+
18
+ logger = logging.getLogger(__name__)
19
+
20
+ # ── Path setup ────────────────────────────────────────────────────────────────
21
+ _SERVER_DIR = Path(__file__).resolve().parent
22
+ _SEMSORTER_DIR = _SERVER_DIR.parent
23
+ _PROJECT_ROOT = _SEMSORTER_DIR.parent
24
+
25
+ sys.path.insert(0, str(_SEMSORTER_DIR / "simulation"))
26
+ sys.path.insert(0, str(_SEMSORTER_DIR / "vision"))
27
+ sys.path.insert(0, str(_PROJECT_ROOT / "Vision-Agents" / "agents-core"))
28
+ for _plugin in ("gemini", "deepgram", "elevenlabs", "getstream"):
29
+ _plugin_path = _PROJECT_ROOT / "Vision-Agents" / "plugins" / _plugin
30
+ if _plugin_path.exists():
31
+ sys.path.insert(0, str(_plugin_path))
32
+
33
+ # ── Quota-tracking state ──────────────────────────────────────────────────────
34
+ _quota_exceeded: Dict[str, bool] = {
35
+ "gemini": False,
36
+ "deepgram": False,
37
+ "elevenlabs": False,
38
+ }
39
+
40
+ # ── Demo-mode pre-recorded detections ────────────────────────────────────────
41
+ _DEMO_DETECTIONS = [
42
+ {"name": "red cylinder", "type": "FLAMMABLE", "color": "red",
43
+ "shape": "cylinder", "box_2d": [240, 200, 290, 260]},
44
+ {"name": "green box", "type": "FLAMMABLE", "color": "green",
45
+ "shape": "box", "box_2d": [240, 260, 285, 310]},
46
+ {"name": "yellow box", "type": "CHEMICAL", "color": "yellow",
47
+ "shape": "box", "box_2d": [240, 310, 285, 360]},
48
+ {"name": "blue box", "type": "CHEMICAL", "color": "blue",
49
+ "shape": "box", "box_2d": [240, 370, 285, 420]},
50
+ ]
51
+
52
+ # ── Singleton resources ───────────────────────────────────────────────────────
53
+ _sim = None
54
+ _bridge = None
55
+ _llm = None
56
+ _tts = None
57
+ _notify_cb: Optional[Callable[[Dict], None]] = None # Push events to WebSocket
58
+
59
+
60
+ def set_notify_callback(cb: Callable[[Dict], None]) -> None:
61
+ """Register a callback that pushes quota/status events to connected WS clients."""
62
+ global _notify_cb
63
+ _notify_cb = cb
64
+
65
+
66
+ def _push(event: Dict) -> None:
67
+ """Fire-and-forget push to the registered notify callback."""
68
+ if _notify_cb:
69
+ try:
70
+ _notify_cb(event)
71
+ except Exception:
72
+ pass
73
+
74
+
75
+ def _check_quota_error(exc: Exception) -> Optional[str]:
76
+ """Return service name if the exception indicates API quota exhaustion."""
77
+ msg = str(exc).lower()
78
+ if "resource_exhausted" in msg or "429" in msg or "quota" in msg:
79
+ if "gemini" in msg or "google" in msg:
80
+ return "gemini"
81
+ if "deepgram" in msg:
82
+ return "deepgram"
83
+ if "elevenlabs" in msg or "eleven" in msg:
84
+ return "elevenlabs"
85
+ return "unknown"
86
+ return None
87
+
88
+
89
+ def _mark_quota_exceeded(service: str) -> None:
90
+ """Mark a service as quota-exceeded and push a warning to the UI."""
91
+ if not _quota_exceeded.get(service):
92
+ _quota_exceeded[service] = True
93
+ _push({
94
+ "type": "quota_warning",
95
+ "service": service,
96
+ "message": (
97
+ f"⚠️ {service.title()} API quota exceeded — "
98
+ f"switching to demo mode for this service."
99
+ ),
100
+ })
101
+ logger.warning("Quota exceeded for %s — demo mode activated", service)
102
+
103
+
104
+ # ── Lazy initializers ─────────────────────────────────────────────────────────
105
+
106
+ def get_simulation():
107
+ global _sim
108
+ if _sim is None:
109
+ os.environ.setdefault("MUJOCO_GL", "egl")
110
+ from controller import SemSorterSimulation
111
+ logger.info("Initialising MuJoCo simulation…")
112
+ _sim = SemSorterSimulation()
113
+ _sim.load_scene()
114
+ _sim.step(300)
115
+ logger.info("Simulation ready: %d items", len(_sim.items))
116
+ return _sim
117
+
118
+
119
+ def get_bridge():
120
+ global _bridge
121
+ if _bridge is None:
122
+ from vlm_bridge import VLMSimBridge
123
+ _bridge = VLMSimBridge(simulation=get_simulation(), use_direct=True)
124
+ logger.info("VLM bridge ready")
125
+ return _bridge
126
+
127
+
128
+ def get_llm():
129
+ """Return a configured gemini.LLM instance from the Vision-Agents SDK."""
130
+ global _llm
131
+ if _llm is None:
132
+ from vision_agents.plugins.gemini.gemini_llm import GeminiLLM as GeminiLLMCls
133
+ _llm = GeminiLLMCls("gemini-2.0-flash")
134
+ _register_tools(_llm)
135
+ logger.info("Gemini LLM ready")
136
+ return _llm
137
+
138
+
139
+ def _register_tools(llm) -> None:
140
+ """Register simulation control tools on the LLM."""
141
+
142
+ @llm.register_function(description="Scan the conveyor belt for hazardous items.")
143
+ async def scan_for_hazards() -> Dict[str, Any]:
144
+ return await _scan_hazards_impl()
145
+
146
+ @llm.register_function(
147
+ description="Pick a specific item by sim name and place it in its bin. "
148
+ "bin_type must be 'flammable' or 'chemical'.")
149
+ async def pick_and_place_item(item_name: str, bin_type: str) -> Dict[str, Any]:
150
+ return await _pick_place_impl(item_name, bin_type)
151
+
152
+ @llm.register_function(description="Get current simulation state snapshot.")
153
+ async def get_simulation_state() -> Dict[str, Any]:
154
+ return _state_impl()
155
+
156
+ @llm.register_function(
157
+ description="Detect ALL hazardous items and sort them automatically.")
158
+ async def sort_all_hazards() -> Dict[str, Any]:
159
+ return await _sort_all_impl()
160
+
161
+
162
+ # ── Tool implementations ──────────────────────────────────────────────────────
163
+
164
+ async def _scan_hazards_impl() -> Dict[str, Any]:
165
+ if _quota_exceeded["gemini"]:
166
+ # Already in demo mode — return pre-recorded detections
167
+ bridge = get_bridge()
168
+ matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
169
+ return _format_scan(matched, demo=True)
170
+
171
+ try:
172
+ bridge = get_bridge()
173
+ loop = asyncio.get_event_loop()
174
+ detections = await loop.run_in_executor(
175
+ None, bridge.processor.detect_hazards)
176
+ matched = bridge.match_detections_to_items(detections)
177
+ return _format_scan(matched, demo=False)
178
+ except Exception as exc:
179
+ svc = _check_quota_error(exc)
180
+ if svc:
181
+ _mark_quota_exceeded(svc)
182
+ bridge = get_bridge()
183
+ matched = bridge.match_detections_to_items(_DEMO_DETECTIONS)
184
+ return _format_scan(matched, demo=True)
185
+ raise
186
+
187
+
188
+ def _format_scan(matched: List[Dict], demo: bool) -> Dict[str, Any]:
189
+ return {
190
+ "demo_mode": demo,
191
+ "hazards_found": len(matched),
192
+ "items": [
193
+ {
194
+ "item_name": d.get("sim_item", "unknown"),
195
+ "bin_type": d["bin_type"].value if d.get("bin_type") else "unknown",
196
+ "detected_name": d.get("name", "unknown"),
197
+ "type": str(d.get("type", "")).lower(),
198
+ "color": d.get("color", ""),
199
+ "shape": d.get("shape", ""),
200
+ }
201
+ for d in matched
202
+ ],
203
+ }
204
+
205
+
206
+ async def _pick_place_impl(item_name: str, bin_type: str) -> Dict[str, Any]:
207
+ from controller import BinType
208
+ sim = get_simulation()
209
+ type_map = {"flammable": BinType.FLAMMABLE, "chemical": BinType.CHEMICAL}
210
+ target = type_map.get(bin_type.lower())
211
+ if not target:
212
+ return {"success": False, "error": f"Unknown bin: {bin_type}"}
213
+ if item_name not in sim.items:
214
+ return {"success": False, "error": f"Unknown item: {item_name}"}
215
+ if sim.items[item_name].picked:
216
+ return {"success": False, "error": f"{item_name} already sorted"}
217
+
218
+ loop = asyncio.get_event_loop()
219
+ success = await loop.run_in_executor(None, sim.pick_and_place, item_name, target)
220
+ return {"success": success, "item": item_name, "bin": bin_type,
221
+ "total_sorted": sim._items_sorted}
222
+
223
+
224
+ def _state_impl() -> Dict[str, Any]:
225
+ sim = get_simulation()
226
+ state = sim.get_state()
227
+ return {
228
+ "time": round(state.time, 2),
229
+ "arm_busy": state.arm_busy,
230
+ "items_sorted": state.items_sorted,
231
+ "ee_position": [round(x, 3) for x in state.ee_pos],
232
+ "quota_exceeded": dict(_quota_exceeded),
233
+ "items": [
234
+ {"name": i["name"], "picked": i["picked"],
235
+ "hazard_type": i.get("hazard_type")}
236
+ for i in state.items
237
+ ],
238
+ }
239
+
240
+
241
+ async def _sort_all_impl() -> Dict[str, Any]:
242
+ """Full detect → match → sort pipeline."""
243
+ # 1. Detect
244
+ scan_result = await _scan_hazards_impl()
245
+ items = scan_result["items"]
246
+ demo = scan_result["demo_mode"]
247
+
248
+ if not items:
249
+ return {"hazards_detected": 0, "items_matched": 0, "items_sorted": 0,
250
+ "details": [], "demo_mode": demo}
251
+
252
+ # 2. Sort each matched item
253
+ details = []
254
+ sorted_count = 0
255
+ for item in items:
256
+ r = await _pick_place_impl(item["item_name"], item["bin_type"])
257
+ details.append({"item": item["item_name"], "bin": item["bin_type"],
258
+ "success": r.get("success", False)})
259
+ if r.get("success"):
260
+ sorted_count += 1
261
+
262
+ return {"hazards_detected": len(items), "items_matched": len(items),
263
+ "items_sorted": sorted_count, "details": details, "demo_mode": demo}
264
+
265
+
266
+ # ── Text → agent response ─────────────────────────────────────────────────────
267
+
268
+ async def process_text_command(text: str) -> str:
269
+ """
270
+ Send a text command to the Gemini LLM (Vision-Agents SDK).
271
+ Returns the agent's text response.
272
+ On quota error: marks exceeded + returns a canned message.
273
+ """
274
+ if _quota_exceeded["gemini"]:
275
+ return await _llm_demo_response(text)
276
+
277
+ try:
278
+ llm = get_llm()
279
+ # Use the LLM's chat method to get a response with tool-calling
280
+ response = await llm.chat(text)
281
+ return response
282
+ except Exception as exc:
283
+ svc = _check_quota_error(exc)
284
+ if svc:
285
+ _mark_quota_exceeded(svc)
286
+ return await _llm_demo_response(text)
287
+ logger.exception("LLM error")
288
+ return f"Error processing command: {exc}"
289
+
290
+
291
+ async def _llm_demo_response(text: str) -> str:
292
+ """Return a plausible demo response when Gemini quota is exhausted."""
293
+ t = text.lower()
294
+ if "scan" in t:
295
+ return ("I found 4 hazardous items on the conveyor belt: "
296
+ "2 flammable and 2 chemical. [Demo mode — Gemini quota exceeded]")
297
+ if "sort" in t or "pick" in t or "place" in t:
298
+ return ("Sorting all hazardous items into their respective bins. "
299
+ "[Demo mode — Gemini quota exceeded]")
300
+ if "state" in t or "status" in t:
301
+ state = _state_impl()
302
+ return (f"Simulation time: {state['time']}s. "
303
+ f"Items sorted: {state['items_sorted']}. "
304
+ f"Arm busy: {state['arm_busy']}. [Demo mode]")
305
+ return "I'm SemSorter AI. Ask me to scan or sort items! [Demo mode]"
306
+
307
+
308
+ # ── TTS helper ────────────────────────────────────────────────────────────────
309
+
310
+ async def text_to_speech(text: str) -> Optional[bytes]:
311
+ """
312
+ Convert text to audio bytes using ElevenLabs (Vision-Agents SDK plugin).
313
+ Returns None on quota error (frontend falls back to browser SpeechSynthesis).
314
+ """
315
+ if _quota_exceeded["elevenlabs"]:
316
+ return None
317
+ try:
318
+ from vision_agents.plugins.elevenlabs.elevenlabs_tts import ElevenLabsTTS
319
+ tts = ElevenLabsTTS(model_id="eleven_flash_v2_5")
320
+ audio_bytes = await tts.synthesize(text)
321
+ return audio_bytes
322
+ except Exception as exc:
323
+ svc = _check_quota_error(exc)
324
+ if svc == "elevenlabs" or svc == "unknown":
325
+ _mark_quota_exceeded("elevenlabs")
326
+ else:
327
+ logger.exception("TTS error")
328
+ return None
329
+
330
+
331
+ # ── STT helper (Deepgram) ─────────────────────────────────────────────────────
332
+
333
+ async def transcribe_audio(audio_bytes: bytes, mime: str = "audio/webm") -> Optional[str]:
334
+ """
335
+ Transcribe audio using Deepgram STT (Vision-Agents SDK plugin).
336
+ Returns None on quota error (frontend falls back to Web Speech API result).
337
+ """
338
+ if _quota_exceeded["deepgram"]:
339
+ return None
340
+ try:
341
+ import httpx, os
342
+ api_key = os.environ.get("DEEPGRAM_API_KEY", "")
343
+ if not api_key:
344
+ return None
345
+ async with httpx.AsyncClient() as client:
346
+ resp = await client.post(
347
+ "https://api.deepgram.com/v1/listen?model=nova-2",
348
+ headers={"Authorization": f"Token {api_key}",
349
+ "Content-Type": mime},
350
+ content=audio_bytes,
351
+ timeout=10,
352
+ )
353
+ if resp.status_code == 429:
354
+ _mark_quota_exceeded("deepgram")
355
+ return None
356
+ data = resp.json()
357
+ return (data.get("results", {})
358
+ .get("channels", [{}])[0]
359
+ .get("alternatives", [{}])[0]
360
+ .get("transcript", ""))
361
+ except Exception as exc:
362
+ logger.warning("Deepgram STT error: %s", exc)
363
+ return None
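The nested `.get(...)` chain above returns an empty transcript when keys are missing, but a bare `[0]` index still raises `IndexError` if Deepgram returns an empty `channels` or `alternatives` list. A defensive sketch of the same extraction (`extract_transcript` is a hypothetical helper, not part of the SDK or this repo):

```python
def extract_transcript(data: dict) -> str:
    """Pull the first transcript out of a Deepgram /v1/listen response.

    Mirrors the .get() chain in transcribe_audio(), but also tolerates
    empty "channels"/"alternatives" lists, which a bare [0] would not.
    """
    channels = data.get("results", {}).get("channels") or [{}]
    alternatives = channels[0].get("alternatives") or [{}]
    return alternatives[0].get("transcript", "")
```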
SemSorter/server/app.py ADDED
@@ -0,0 +1,207 @@
1
+ """
2
+ SemSorter FastAPI Server
3
+ ========================
4
+ Serves the web UI and bridges the Vision-Agents SDK + MuJoCo simulation.
5
+
6
+ Endpoints
7
+ ---------
8
+ GET / → index.html
9
+ WS /ws/video → MJPEG frames (~10 fps) from MuJoCo renderer
10
+ WS /ws/chat → bidirectional: text commands → agent responses + events
11
+ GET /api/state → current simulation state JSON
12
+ POST /api/sort → trigger sort_all_hazards pipeline
13
+ POST /api/command → send a text command to the agent
14
+ POST /api/transcribe → transcribe uploaded audio via Deepgram
15
+
16
+ Run locally:
17
+ cd SemSorter && MUJOCO_GL=egl uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
18
+ """
19
+
20
+ import asyncio
21
+ import base64
22
+ import io
23
+ import json
24
+ import logging
25
+ from pathlib import Path
26
+ from typing import Set
27
+
28
+ from fastapi import FastAPI, WebSocket, WebSocketDisconnect, UploadFile, File
29
+ from fastapi.responses import HTMLResponse, JSONResponse
30
+ from PIL import Image
34
+
35
+ # ── Local imports ─────────────────────────────────────────────────────────────
36
+ from . import agent_bridge as bridge
37
+
38
+ logging.basicConfig(level=logging.INFO,
39
+ format="%(asctime)s %(levelname)s %(name)s %(message)s")
40
+ logger = logging.getLogger(__name__)
41
+
42
+ app = FastAPI(title="SemSorter", version="1.0")
43
+
44
+ # ── Static files ──────────────────────────────────────────────────────────────
45
+ _STATIC = Path(__file__).parent / "static"
46
+ _STATIC.mkdir(exist_ok=True)
47
+
48
+ # ── Connected WebSocket clients ───────────────────────────────────────────────
49
+ _chat_clients: Set[WebSocket] = set()
50
+ _video_clients: Set[WebSocket] = set()
51
+
52
+
53
+ async def _broadcast_chat(event: dict) -> None:
54
+ """Push a JSON event to all connected chat WebSocket clients."""
55
+ payload = json.dumps(event)
56
+ dead = set()
57
+ for ws in list(_chat_clients):
58
+ try:
59
+ await ws.send_text(payload)
60
+ except Exception:
61
+ dead.add(ws)
62
+ _chat_clients -= dead
63
+
64
+
65
+ def _sync_broadcast(event: dict) -> None:
66
+ """Push a chat event from sync code (bridge callbacks). The event is
67
+ dropped if no event loop is running in the calling thread."""
68
+ try:
69
+ loop = asyncio.get_running_loop()
70
+ except RuntimeError:
71
+ return  # called off the event-loop thread; drop the event
72
+ loop.create_task(_broadcast_chat(event))
73
+
74
+
75
+ # Register the broadcast callback so agent_bridge can push quota warnings
76
+ bridge.set_notify_callback(_sync_broadcast)
77
+
78
+
79
+ # ── Startup: pre-warm simulation ──────────────────────────────────────────────
80
+ @app.on_event("startup")
81
+ async def startup():
82
+ logger.info("Pre-warming MuJoCo simulation…")
83
+ loop = asyncio.get_running_loop()
84
+ await loop.run_in_executor(None, bridge.get_simulation)
85
+ logger.info("Simulation ready")
86
+
87
+
88
+ # ── REST endpoints ────────────────────────────────────────────────────────────
89
+
90
+ @app.get("/", response_class=HTMLResponse)
91
+ async def index():
92
+ html_path = _STATIC / "index.html"
93
+ return HTMLResponse(html_path.read_text())
94
+
95
+
96
+ @app.get("/api/state")
97
+ async def api_state():
98
+ loop = asyncio.get_running_loop()
99
+ state = await loop.run_in_executor(None, bridge._state_impl)
100
+ return JSONResponse(state)
101
+
102
+
103
+ @app.post("/api/sort")
104
+ async def api_sort():
105
+ """Trigger the full detect-match-sort pipeline."""
106
+ result = await bridge._sort_all_impl()
107
+ await _broadcast_chat({"type": "sort_result", "data": result})
108
+ return JSONResponse(result)
109
+
110
+
111
+ @app.post("/api/command")
112
+ async def api_command(body: dict):
113
+ text = body.get("text", "").strip()
114
+ if not text:
115
+ return JSONResponse({"error": "empty command"}, status_code=400)
116
+ response_text = await bridge.process_text_command(text)
117
+ await _broadcast_chat({"type": "agent_response", "text": response_text})
118
+ return JSONResponse({"response": response_text})
119
+
120
+
121
+ @app.post("/api/transcribe")
122
+ async def api_transcribe(file: UploadFile = File(...)):
123
+ """Transcribe uploaded audio using Deepgram; returns transcript or null."""
124
+ audio_bytes = await file.read()
125
+ transcript = await bridge.transcribe_audio(audio_bytes, mime=file.content_type)
126
+ return JSONResponse({"transcript": transcript})
127
+
128
+
129
+ # ── WebSocket: chat ───────────────────────────────────────────────────────────
130
+
131
+ @app.websocket("/ws/chat")
132
+ async def ws_chat(ws: WebSocket):
133
+ await ws.accept()
134
+ _chat_clients.add(ws)
135
+ logger.info("Chat client connected (%d total)", len(_chat_clients))
136
+ try:
137
+ await ws.send_text(json.dumps({
138
+ "type": "welcome",
139
+ "text": "Connected to SemSorter AI. Ask me to scan or sort items!",
140
+ }))
141
+ while True:
142
+ raw = await ws.receive_text()
143
+ try:
144
+ msg = json.loads(raw)
145
+ except json.JSONDecodeError:
146
+ msg = {"type": "command", "text": raw}
147
+
148
+ msg_type = msg.get("type", "command")
149
+
150
+ if msg_type == "command":
151
+ text = msg.get("text", "").strip()
152
+ if text:
153
+ await _broadcast_chat({"type": "user_message", "text": text})
154
+ response = await bridge.process_text_command(text)
155
+ await _broadcast_chat({"type": "agent_response", "text": response})
156
+
157
+ elif msg_type == "scan":
158
+ result = await bridge._scan_hazards_impl()
159
+ await _broadcast_chat({"type": "scan_result", "data": result})
160
+
161
+ elif msg_type == "sort":
162
+ result = await bridge._sort_all_impl()
163
+ await _broadcast_chat({"type": "sort_result", "data": result})
164
+
165
+ elif msg_type == "state":
166
+ loop = asyncio.get_running_loop()
167
+ state = await loop.run_in_executor(None, bridge._state_impl)
168
+ await ws.send_text(json.dumps({"type": "state", "data": state}))
169
+
170
+ except WebSocketDisconnect:
171
+ pass
172
+ finally:
173
+ _chat_clients.discard(ws)
174
+ logger.info("Chat client disconnected (%d remaining)", len(_chat_clients))
175
+
176
+
177
+ # ── WebSocket: live video stream ──────────────────────────────────────────────
178
+
179
+ def _render_frame_jpeg(quality: int = 75) -> bytes:
180
+ """Render a MuJoCo frame and encode as JPEG bytes."""
181
+ sim = bridge.get_simulation()
182
+ frame = sim.render_frame(camera="overview") # numpy H×W×3
183
+ img = Image.fromarray(frame)
184
+ buf = io.BytesIO()
185
+ img.save(buf, format="JPEG", quality=quality)
186
+ return buf.getvalue()
187
+
188
+
189
+ @app.websocket("/ws/video")
190
+ async def ws_video(ws: WebSocket):
191
+ await ws.accept()
192
+ _video_clients.add(ws)
193
+ logger.info("Video client connected")
194
+ try:
195
+ loop = asyncio.get_running_loop()
196
+ while True:
197
+ jpeg_bytes = await loop.run_in_executor(None, _render_frame_jpeg)
198
+ b64 = base64.b64encode(jpeg_bytes).decode()
199
+ await ws.send_text(json.dumps({"type": "frame", "data": b64}))
200
+ await asyncio.sleep(0.1) # ~10 fps
201
+ except WebSocketDisconnect:
202
+ pass
203
+ except Exception as e:
204
+ logger.warning("Video stream error: %s", e)
205
+ finally:
206
+ _video_clients.discard(ws)
207
+ logger.info("Video client disconnected")
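The `/ws/video` socket sends each frame as a JSON text message with a base64 payload, which the frontend turns into a `data:` URL. A minimal sketch of that envelope (function names are illustrative, not from `app.py`):

```python
import base64
import json

def encode_frame(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes the way ws_video does: {"type": "frame", "data": <b64>}."""
    return json.dumps({"type": "frame", "data": base64.b64encode(jpeg_bytes).decode()})

def decode_frame(message: str) -> bytes:
    """Inverse operation, as the browser performs before setting img.src."""
    payload = json.loads(message)
    if payload.get("type") != "frame":
        raise ValueError("not a frame message")
    return base64.b64decode(payload["data"])
```

Base64 inflates each frame by roughly a third; for a ~10 fps MJPEG stream that overhead is acceptable, but binary WebSocket frames would avoid it.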
SemSorter/server/static/index.html ADDED
@@ -0,0 +1,427 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8"/>
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
6
+ <title>SemSorter — AI Hazard Sorting System</title>
7
+ <link rel="preconnect" href="https://fonts.googleapis.com"/>
8
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet"/>
9
+ <style>
10
+ /* ── Design tokens ── */
11
+ :root{
12
+ --bg:#0a0d14;--surface:#111827;--surface2:#1a2235;--border:#1e2d45;
13
+ --accent:#3b82f6;--accent-glow:rgba(59,130,246,.35);
14
+ --success:#22c55e;--warning:#f59e0b;--danger:#ef4444;--chemical:#a78bfa;
15
+ --text:#e2e8f0;--text-muted:#64748b;--text-dim:#94a3b8;
16
+ --radius:12px;--radius-sm:8px;
17
+ --font:'Inter',system-ui,sans-serif;--mono:'JetBrains Mono',monospace;
18
+ }
19
+ *{box-sizing:border-box;margin:0;padding:0}
20
+ body{background:var(--bg);color:var(--text);font-family:var(--font);min-height:100vh;
21
+ display:grid;grid-template-rows:auto 1fr;overflow:hidden;height:100vh}
22
+
23
+ /* ── Header ── */
24
+ header{display:flex;align-items:center;justify-content:space-between;
25
+ padding:14px 24px;background:var(--surface);border-bottom:1px solid var(--border);
26
+ backdrop-filter:blur(8px);}
27
+ .logo{display:flex;align-items:center;gap:10px}
28
+ .logo-icon{width:36px;height:36px;background:linear-gradient(135deg,var(--accent),#8b5cf6);
29
+ border-radius:9px;display:flex;align-items:center;justify-content:center;font-size:18px}
30
+ .logo-text{font-weight:700;font-size:18px;letter-spacing:-.3px}
31
+ .logo-sub{font-size:11px;color:var(--text-muted);font-weight:400;margin-top:1px}
32
+ .header-status{display:flex;align-items:center;gap:8px;font-size:13px}
33
+ .dot{width:8px;height:8px;border-radius:50%;background:var(--success);
34
+ box-shadow:0 0 8px var(--success);animation:pulse 2s infinite}
35
+ @keyframes pulse{0%,100%{opacity:1}50%{opacity:.5}}
36
+
37
+ /* ── Layout ── */
38
+ main{display:grid;grid-template-columns:1fr 380px;gap:0;overflow:hidden}
39
+
40
+ /* ── Left: Simulation panel ── */
41
+ .sim-panel{display:flex;flex-direction:column;padding:20px;gap:16px;overflow:hidden}
42
+ .sim-header{display:flex;align-items:center;justify-content:space-between}
43
+ .panel-title{font-size:13px;font-weight:600;color:var(--text-dim);text-transform:uppercase;letter-spacing:.8px}
44
+ .sim-container{flex:1;background:var(--surface);border:1px solid var(--border);
45
+ border-radius:var(--radius);overflow:hidden;position:relative;
46
+ display:flex;align-items:center;justify-content:center;min-height:300px}
47
+ #sim-video{width:100%;height:100%;object-fit:contain;display:block}
48
+ .sim-overlay{position:absolute;top:0;left:0;right:0;bottom:0;display:flex;
49
+ align-items:center;justify-content:center;background:rgba(10,13,20,.85);
50
+ flex-direction:column;gap:12px;transition:.3s}
51
+ .sim-overlay.hidden{opacity:0;pointer-events:none}
52
+ .spinner{width:40px;height:40px;border:3px solid var(--border);
53
+ border-top-color:var(--accent);border-radius:50%;animation:spin 1s linear infinite}
54
+ @keyframes spin{to{transform:rotate(360deg)}}
55
+ .sim-overlay p{color:var(--text-muted);font-size:14px}
56
+
57
+ /* ── Status cards ── */
58
+ .stats-row{display:grid;grid-template-columns:repeat(3,1fr);gap:12px}
59
+ .stat-card{background:var(--surface);border:1px solid var(--border);border-radius:var(--radius-sm);
60
+ padding:12px 16px}
61
+ .stat-label{font-size:11px;color:var(--text-muted);text-transform:uppercase;letter-spacing:.6px;margin-bottom:4px}
62
+ .stat-value{font-size:22px;font-weight:700;font-family:var(--mono)}
63
+ .stat-value.ok{color:var(--success)}
64
+ .stat-value.busy{color:var(--warning)}
65
+
66
+ /* ── Right: Agent panel ── */
67
+ .agent-panel{background:var(--surface);border-left:1px solid var(--border);
68
+ display:flex;flex-direction:column;overflow:hidden}
69
+ .agent-header{padding:16px 20px;border-bottom:1px solid var(--border);
70
+ display:flex;align-items:center;justify-content:space-between}
71
+ .agent-title{font-weight:600;font-size:15px}
72
+ .sdk-badge{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);
73
+ color:var(--accent);font-size:10px;font-weight:600;padding:2px 8px;
74
+ border-radius:20px;letter-spacing:.5px}
75
+
76
+ /* ── Transcript ── */
77
+ .transcript{flex:1;overflow-y:auto;padding:16px;display:flex;flex-direction:column;gap:10px;
78
+ scroll-behavior:smooth}
79
+ .transcript::-webkit-scrollbar{width:4px}
80
+ .transcript::-webkit-scrollbar-thumb{background:var(--border);border-radius:2px}
81
+ .msg{display:flex;flex-direction:column;gap:3px;animation:fadeIn .25s ease}
82
+ @keyframes fadeIn{from{opacity:0;transform:translateY(6px)}to{opacity:1;transform:none}}
83
+ .msg-role{font-size:10px;font-weight:600;text-transform:uppercase;letter-spacing:.6px;color:var(--text-muted)}
84
+ .msg-text{font-size:14px;line-height:1.55;padding:10px 13px;border-radius:10px;
85
+ background:var(--surface2);border:1px solid var(--border);max-width:100%}
86
+ .msg.user .msg-role{color:var(--accent)}
87
+ .msg.user .msg-text{background:rgba(59,130,246,.08);border-color:rgba(59,130,246,.2)}
88
+ .msg.agent .msg-role{color:var(--success)}
89
+ .msg.agent .msg-text{background:rgba(34,197,94,.06);border-color:rgba(34,197,94,.15)}
90
+ .msg.system .msg-role{color:var(--text-muted)}
91
+ .msg.system .msg-text{font-family:var(--mono);font-size:12px;background:var(--surface);
92
+ border-style:dashed;white-space:pre-wrap}
93
+ .msg.warning .msg-role{color:var(--warning)}
94
+ .msg.warning .msg-text{background:rgba(245,158,11,.08);border-color:rgba(245,158,11,.3)}
95
+
96
+ /* ── Quota warning banner ── */
97
+ #quota-banner{display:none;background:rgba(245,158,11,.1);border:1px solid rgba(245,158,11,.35);
98
+ border-radius:var(--radius-sm);margin:0 16px;padding:10px 14px;
99
+ font-size:12px;color:var(--warning);line-height:1.5}
100
+ #quota-banner.show{display:block}
101
+
102
+ /* ── Input area ── */
103
+ .input-area{padding:14px 16px;border-top:1px solid var(--border);display:flex;flex-direction:column;gap:10px}
104
+ .input-row{display:flex;gap:8px}
105
+ #cmd-input{flex:1;background:var(--surface2);border:1px solid var(--border);
106
+ border-radius:var(--radius-sm);padding:10px 14px;color:var(--text);
107
+ font-family:var(--font);font-size:14px;outline:none;transition:.2s}
108
+ #cmd-input:focus{border-color:var(--accent);box-shadow:0 0 0 3px var(--accent-glow)}
109
+ #cmd-input::placeholder{color:var(--text-muted)}
110
+ .btn{border:none;cursor:pointer;border-radius:var(--radius-sm);font-family:var(--font);
111
+ font-weight:600;font-size:13px;transition:.18s;display:flex;align-items:center;gap:6px;
112
+ white-space:nowrap}
113
+ .btn:active{transform:scale(.96)}
114
+ .btn-primary{background:var(--accent);color:#fff;padding:10px 18px}
115
+ .btn-primary:hover{background:#2563eb}
116
+ .btn-voice{background:var(--surface2);border:1px solid var(--border);color:var(--text);padding:10px 14px;font-size:16px}
117
+ .btn-voice.listening{background:rgba(239,68,68,.15);border-color:var(--danger);color:var(--danger);animation:pulse 1s infinite}
118
+ .btn-voice:hover{border-color:var(--accent)}
119
+ .action-btns{display:flex;gap:8px}
120
+ .btn-action{flex:1;padding:9px 12px;font-size:12px}
121
+ .btn-scan{background:rgba(59,130,246,.12);border:1px solid rgba(59,130,246,.3);color:var(--accent)}
122
+ .btn-scan:hover{background:rgba(59,130,246,.22)}
123
+ .btn-sort{background:rgba(34,197,94,.1);border:1px solid rgba(34,197,94,.3);color:var(--success)}
124
+ .btn-sort:hover{background:rgba(34,197,94,.2)}
125
+ .btn-state{background:rgba(167,139,250,.1);border:1px solid rgba(167,139,250,.3);color:var(--chemical)}
126
+ .btn-state:hover{background:rgba(167,139,250,.2)}
127
+ .stt-hint{font-size:11px;color:var(--text-muted);text-align:center}
128
+
129
+ /* ── Item list ── */
130
+ .items-section{padding:0 16px 10px;border-top:1px solid var(--border);padding-top:10px}
131
+ .items-title{font-size:11px;font-weight:600;color:var(--text-muted);text-transform:uppercase;
132
+ letter-spacing:.6px;margin-bottom:8px}
133
+ .item-list{display:flex;flex-direction:column;gap:5px;max-height:110px;overflow-y:auto}
134
+ .item-pill{display:flex;align-items:center;justify-content:space-between;
135
+ padding:5px 10px;border-radius:6px;font-size:12px;
136
+ background:var(--surface2);border:1px solid var(--border)}
137
+ .item-pill .name{font-family:var(--mono);color:var(--text-dim)}
138
+ .item-pill .badge{font-size:10px;font-weight:600;padding:2px 7px;border-radius:20px}
139
+ .badge.flammable{background:rgba(239,68,68,.15);color:#f87171;border:1px solid rgba(239,68,68,.25)}
140
+ .badge.chemical{background:rgba(167,139,250,.15);color:#c4b5fd;border:1px solid rgba(167,139,250,.25)}
141
+ .badge.safe{background:rgba(100,116,139,.15);color:#94a3b8;border:1px solid rgba(100,116,139,.25)}
142
+ .badge.sorted{background:rgba(34,197,94,.1);color:#4ade80;border:1px solid rgba(34,197,94,.2)}
143
+ </style>
144
+ </head>
145
+ <body>
146
+
147
+ <header>
148
+ <div class="logo">
149
+ <div class="logo-icon">🤖</div>
150
+ <div>
151
+ <div class="logo-text">SemSorter</div>
152
+ <div class="logo-sub">AI Hazard Sorting — Vision-Agents SDK</div>
153
+ </div>
154
+ </div>
155
+ <div class="header-status">
156
+ <div class="dot" id="conn-dot"></div>
157
+ <span id="conn-label">Connecting…</span>
158
+ </div>
159
+ </header>
160
+
161
+ <main>
162
+ <!-- Left: simulation video -->
163
+ <div class="sim-panel">
164
+ <div class="sim-header">
165
+ <span class="panel-title">Live Simulation Feed</span>
166
+ <span style="font-size:12px;color:var(--text-muted)" id="fps-label">— fps</span>
167
+ </div>
168
+
169
+ <div class="sim-container">
170
+ <img id="sim-video" alt="MuJoCo simulation"/>
171
+ <div class="sim-overlay" id="sim-overlay">
172
+ <div class="spinner"></div>
173
+ <p>Warming up simulation…</p>
174
+ </div>
175
+ </div>
176
+
177
+ <div class="stats-row">
178
+ <div class="stat-card">
179
+ <div class="stat-label">Items Sorted</div>
180
+ <div class="stat-value ok" id="stat-sorted">0</div>
181
+ </div>
182
+ <div class="stat-card">
183
+ <div class="stat-label">Arm Status</div>
184
+ <div class="stat-value ok" id="stat-arm">Idle</div>
185
+ </div>
186
+ <div class="stat-card">
187
+ <div class="stat-label">Sim Time</div>
188
+ <div class="stat-value" id="stat-time" style="font-size:18px">0.0 s</div>
189
+ </div>
190
+ </div>
191
+ </div>
192
+
193
+ <!-- Right: agent chat panel -->
194
+ <div class="agent-panel">
195
+ <div class="agent-header">
196
+ <div class="agent-title">SemSorter AI</div>
197
+ <div class="sdk-badge">Vision-Agents SDK</div>
198
+ </div>
199
+
200
+ <div id="quota-banner"></div>
201
+
202
+ <div class="transcript" id="transcript"></div>
203
+
204
+ <div class="items-section">
205
+ <div class="items-title">Conveyor Items</div>
206
+ <div class="item-list" id="item-list"><span style="font-size:12px;color:var(--text-muted)">Loading…</span></div>
207
+ </div>
208
+
209
+ <div class="input-area">
210
+ <div class="action-btns">
211
+ <button class="btn btn-action btn-scan" onclick="sendWs('scan')">🔍 Scan</button>
212
+ <button class="btn btn-action btn-sort" onclick="sendWs('sort')">⚡ Sort All</button>
213
+ <button class="btn btn-action btn-state" onclick="sendWs('state')">📊 State</button>
214
+ </div>
215
+
216
+ <div class="input-row">
217
+ <input id="cmd-input" placeholder="Type a command…" autocomplete="off"
218
+ onkeydown="if(event.key==='Enter')sendCommand()"/>
219
+ <button class="btn btn-voice" id="voice-btn" onclick="toggleVoice()" title="Voice input">🎤</button>
220
+ <button class="btn btn-primary" onclick="sendCommand()">Send</button>
221
+ </div>
222
+ <div class="stt-hint" id="stt-hint">Using browser speech recognition</div>
223
+ </div>
224
+ </div>
225
+ </main>
226
+
227
+ <script>
228
+ // ─── WebSocket connections ────────────────────────────────────────────────────
229
+ const WS_BASE = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}`;
230
+ let chatWs = null;
231
+ let videoWs = null;
232
+ let reconnectDelay = 1000;
233
+
234
+ // ─── State ────────────────────────────────────────────────────────────────────
235
+ let frameCount = 0, lastFpsTime = Date.now();
236
+ let listening = false;
237
+ let recognition = null;
238
+
239
+ // ─── Chat WebSocket ───────────────────────────────────────────────────────────
240
+ function connectChat() {
241
+ chatWs = new WebSocket(`${WS_BASE}/ws/chat`);
242
+ chatWs.onopen = () => {
243
+ setConnected(true);
244
+ reconnectDelay = 1000;
245
+ pollState();
246
+ };
247
+ chatWs.onmessage = ({data}) => handleChatMessage(JSON.parse(data));
248
+ chatWs.onclose = () => {
249
+ setConnected(false);
250
+ reconnectDelay = Math.min(reconnectDelay * 1.5, 10000);
+ setTimeout(connectChat, reconnectDelay);
251
+ };
252
+ chatWs.onerror = () => chatWs.close();
253
+ }
254
+
255
+ function handleChatMessage(msg) {
256
+ switch(msg.type) {
257
+ case 'welcome': addMsg('agent', 'SemSorter AI', msg.text); break;
258
+ case 'user_message': addMsg('user', 'You', msg.text); break;
259
+ case 'agent_response': addMsg('agent','SemSorter AI', msg.text); break;
260
+ case 'scan_result': renderScanResult(msg.data); break;
261
+ case 'sort_result': renderSortResult(msg.data); break;
262
+ case 'state': renderState(msg.data); break;
263
+ case 'quota_warning': showQuotaWarning(msg.service, msg.message); break;
264
+ case 'system': addMsg('system', 'System', msg.text); break;
265
+ }
266
+ }
267
+
268
+ function sendWs(type, extra={}) {
269
+ if (!chatWs || chatWs.readyState !== 1) return;
270
+ chatWs.send(JSON.stringify({type, ...extra}));
271
+ }
272
+
273
+ function sendCommand() {
274
+ const input = document.getElementById('cmd-input');
275
+ const text = input.value.trim();
276
+ if (!text) return;
277
+ input.value = '';
278
+ sendWs('command', {text});
279
+ }
280
+
281
+ // ─── Video WebSocket ──────────────────────────────────────────────────────────
282
+ function connectVideo() {
283
+ videoWs = new WebSocket(`${WS_BASE}/ws/video`);
284
+ videoWs.onmessage = ({data}) => {
285
+ const {type, data: b64} = JSON.parse(data);
286
+ if (type === 'frame') {
287
+ const img = document.getElementById('sim-video');
288
+ img.src = `data:image/jpeg;base64,${b64}`;
289
+ document.getElementById('sim-overlay').classList.add('hidden');
290
+ frameCount++;
291
+ }
292
+ };
293
+ videoWs.onclose = () => setTimeout(connectVideo, 2000);
294
+ videoWs.onerror = () => videoWs.close();
295
+ }
296
+
297
+ // FPS counter
298
+ setInterval(() => {
299
+ const now = Date.now();
300
+ const fps = Math.round(frameCount / ((now - lastFpsTime) / 1000));
301
+ document.getElementById('fps-label').textContent = `${fps} fps`;
302
+ frameCount = 0; lastFpsTime = now;
303
+ }, 2000);
304
+
305
+ // ─── State polling ────────────────────────────────────────────────────────────
306
+ let stateTimer = null;
+ function pollState() {
307
+ clearTimeout(stateTimer); // avoid stacking poll loops across reconnects
+ fetch('/api/state').then(r => r.json()).then(renderState).catch(()=>{});
308
+ stateTimer = setTimeout(pollState, 3000);
309
310
+
311
+ function renderState(s) {
312
+ document.getElementById('stat-sorted').textContent = s.items_sorted ?? 0;
313
+ const armEl = document.getElementById('stat-arm');
314
+ armEl.textContent = s.arm_busy ? 'Busy' : 'Idle';
315
+ armEl.className = `stat-value ${s.arm_busy ? 'busy' : 'ok'}`;
316
+ document.getElementById('stat-time').textContent = `${s.time ?? 0} s`;
317
+ if (s.items) renderItems(s.items);
318
+ if (s.quota_exceeded) Object.entries(s.quota_exceeded).forEach(([svc, exceeded]) => {
319
+ if (exceeded) showQuotaWarning(svc, `⚠️ ${svc} quota exceeded — demo mode active`);
320
+ });
321
+ }
322
+
323
+ function renderItems(items) {
324
+ const list = document.getElementById('item-list');
325
+ list.innerHTML = items.map(i => {
326
+ const type = i.hazard_type || 'safe';
327
+ const cls = i.picked ? 'sorted' : type.toLowerCase();
328
+ const label = i.picked ? '✓ sorted' : type;
329
+ return `<div class="item-pill">
330
+ <span class="name">${i.name}</span>
331
+ <span class="badge ${cls}">${label}</span>
332
+ </div>`;
333
+ }).join('');
334
+ }
335
+
336
+ // ─── Scan / sort renderers ────────────────────────────────────────────────────
337
+ function renderScanResult(d) {
338
+ const demoNote = d.demo_mode ? ' [demo mode]' : '';
339
+ const lines = [`Found ${d.hazards_found} hazardous item(s)${demoNote}:`];
340
+ (d.items||[]).forEach(i =>
341
+ lines.push(` • ${i.item_name} (${i.type}) → ${i.bin_type} bin`));
342
+ addMsg('system', 'Scan Result', lines.join('\n'));
343
+ }
344
+
345
+ function renderSortResult(d) {
346
+ const demoNote = d.demo_mode ? ' [demo mode]' : '';
347
+ const lines = [
348
+ `Sorted ${d.items_sorted}/${d.items_matched} item(s)${demoNote}:`,
349
+ ...(d.details||[]).map(x => ` ${x.success ? '✅' : '❌'} ${x.item} → ${x.bin}`)
350
+ ];
351
+ addMsg('system', 'Sort Result', lines.join('\n'));
352
+ }
353
+
354
+ // ─── Quota warning ────────────────────────────────────────────────────────────
355
+ const _shownWarnings = new Set();
356
+ function showQuotaWarning(service, message) {
357
+ if (_shownWarnings.has(service)) return;
358
+ _shownWarnings.add(service);
359
+ const banner = document.getElementById('quota-banner');
360
+ banner.textContent = message;
361
+ banner.classList.add('show');
362
+ addMsg('warning', 'API Status', message);
363
+ }
364
+
365
+ // ─── Transcript helpers ───────────────────────────────────────────────────────
366
+ function addMsg(cls, role, text) {
367
+ const t = document.getElementById('transcript');
368
+ const div = document.createElement('div');
369
+ div.className = `msg ${cls}`;
370
+ div.innerHTML = `<div class="msg-role">${role}</div><div class="msg-text">${escHtml(text)}</div>`;
371
+ t.appendChild(div);
372
+ t.scrollTop = t.scrollHeight;
373
+ }
374
+ function escHtml(s){ return s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/\n/g,'<br>'); }
375
+
376
+ // ─── Connection status ────────────────────────────────────────────────────────
377
+ function setConnected(ok) {
378
+ const dot = document.getElementById('conn-dot');
379
+ const lbl = document.getElementById('conn-label');
380
+ dot.style.background = ok ? 'var(--success)' : 'var(--danger)';
381
+ dot.style.boxShadow = ok ? '0 0 8px var(--success)' : '0 0 8px var(--danger)';
382
+ dot.style.animation = ok ? 'pulse 2s infinite' : 'none';
383
+ lbl.textContent = ok ? 'Connected' : 'Reconnecting…';
384
+ }
385
+
386
+ // ─── Voice input (Web Speech API) ────────────────────────────────────────────
387
+ function toggleVoice() {
388
+ const SpeechRec = window.SpeechRecognition || window.webkitSpeechRecognition;
389
+ if (!SpeechRec) {
390
+ addMsg('system','System','Browser speech recognition not supported. Use the text input.');
391
+ return;
392
+ }
393
+ if (listening) { recognition.stop(); return; }
394
+
395
+ recognition = new SpeechRec();
396
+ recognition.lang = 'en-US';
397
+ recognition.interimResults = false;
398
+ recognition.onresult = e => {
399
+ const text = e.results[0][0].transcript;
400
+ document.getElementById('cmd-input').value = text;
401
+ document.getElementById('stt-hint').textContent =
402
+ `Heard: "${text}" — sending…`;
403
+ sendCommand();
404
+ };
405
+ recognition.onend = () => {
406
+ listening = false;
407
+ document.getElementById('voice-btn').classList.remove('listening');
408
+ document.getElementById('stt-hint').textContent = 'Using browser speech recognition';
409
+ };
410
+ recognition.onerror = e => {
411
+ addMsg('system','STT',`Speech recognition error: ${e.error}`);
412
+ listening = false;
413
+ document.getElementById('voice-btn').classList.remove('listening');
414
+ };
415
+ recognition.start();
416
+ listening = true;
417
+ document.getElementById('voice-btn').classList.add('listening');
418
+ document.getElementById('stt-hint').textContent = '🎙️ Listening… speak now';
419
+ }
420
+
421
+ // ─── Boot ─────────────────────────────────────────────────────────────────────
422
+ setConnected(false);
423
+ connectChat();
424
+ connectVideo();
425
+ </script>
426
+ </body>
427
+ </html>
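connectChat() above backs off geometrically on reconnect: each close multiplies the delay by 1.5 up to a 10 s cap, and a successful open resets it to 1 s. The resulting schedule, sketched in Python for clarity (the function name is illustrative):

```python
def backoff_schedule(attempts: int, initial: float = 1000.0,
                     factor: float = 1.5, cap: float = 10000.0) -> list:
    """Delays (ms) after each successive close, mirroring
    reconnectDelay = Math.min(reconnectDelay * 1.5, 10000)."""
    delays, delay = [], initial
    for _ in range(attempts):
        delay = min(delay * factor, cap)  # grow, then clamp at the cap
        delays.append(delay)
    return delays
```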
SemSorter/simulation/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # SemSorter Simulation Module
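controller.py (added below) declares BinType as a str-mixin Enum. One reason that pattern suits this project: members compare equal to their plain-string values and pass through json.dumps without a custom encoder, which keeps the /api/state payloads simple. A small illustration (the members match controller.py; the surrounding code is illustrative):

```python
import json
from enum import Enum

class BinType(str, Enum):
    # Same members as controller.py's BinType
    FLAMMABLE = "flammable"
    CHEMICAL = "chemical"
    OUTPUT = "output"

# str-mixin members compare equal to their values...
state = {"item": "spray_can", "bin": BinType.FLAMMABLE}
# ...and json.dumps emits them as bare strings, no custom encoder needed.
encoded = json.dumps(state)
```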
SemSorter/simulation/controller.py ADDED
@@ -0,0 +1,786 @@
1
+ """
2
+ SemSorter MuJoCo Simulation Controller
3
+
4
+ This module manages the Franka Panda robotic arm simulation for the SemSorter
5
+ project. It loads the Panda from mujoco_menagerie, adds conveyors, waste bins,
6
+ and hazardous items, then provides an async API for pick-and-place operations.
7
+
8
+ Usage:
9
+ python controller.py # Launch interactive viewer
10
+ python controller.py --render # Render a test frame to PNG
11
+ """
12
+
13
+ import asyncio
14
+ import json
15
+ import logging
16
+ import math
17
+ import os
18
+ import time
19
+ from dataclasses import dataclass, field
20
+ from enum import Enum
21
+ from pathlib import Path
22
+ from typing import Dict, List, Optional, Tuple
23
+
24
+ import mujoco
25
+ import mujoco.viewer
26
+ import numpy as np
27
+
28
+ logger = logging.getLogger(__name__)
29
+
30
+ # ─── Path configuration ─────────────────────────────────────────────────────
31
+ PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent # repo root (one level above SemSorter/)
32
+ MENAGERIE_DIR = PROJECT_ROOT / "mujoco_menagerie"
33
+ PANDA_SCENE = MENAGERIE_DIR / "franka_emika_panda" / "scene.xml"
34
+
35
+
36
+ # ─── Data types ──────────────────────────────────────────────────────────────
37
+ class BinType(str, Enum):
38
+ FLAMMABLE = "flammable"
39
+ CHEMICAL = "chemical"
40
+ OUTPUT = "output" # safe items go to output conveyor
41
+
42
+
43
+ @dataclass
44
+ class ItemInfo:
45
+ """Metadata for a conveyor item."""
46
+ name: str
47
+ body_id: int
48
+ geom_id: int
49
+ is_hazardous: bool
50
+ hazard_type: Optional[BinType] = None # which bin it should go to
51
+ picked: bool = False
52
+
53
+
54
+ @dataclass
55
+ class SimState:
56
+ """Observable simulation state for the frontend."""
57
+ time: float = 0.0
58
+ ee_pos: Tuple[float, float, float] = (0, 0, 0)
59
+ gripper_open: bool = True
60
+ items: List[Dict] = field(default_factory=list)
61
+ arm_busy: bool = False
62
+ items_sorted: int = 0
63
+
64
+
65
+ # ─── Bin positions (world coordinates) ───────────────────────────────────────
66
+ BIN_POSITIONS = {
67
+ BinType.FLAMMABLE: np.array([-0.25, -0.40, 0.35]), # Above the red bin
68
+ BinType.CHEMICAL: np.array([0.25, -0.40, 0.35]), # Above the yellow bin
69
+ BinType.OUTPUT: np.array([0.40, 0.0, 0.40]), # Output conveyor
70
+ }
71
+
72
+ # ─── Panda joint configuration ──────────────────────────────────────────────
73
+ # Actuator indices (from panda.xml):
74
+ # 0-6: arm joints (actuator1-7)
75
+ # 7: gripper (actuator8, ctrl 0=closed, 255=fully open)
76
+ GRIPPER_ACTUATOR_ID = 7
77
+ GRIPPER_OPEN = 255.0
78
+ GRIPPER_CLOSED = 0.0
79
+ NUM_ARM_JOINTS = 7
80
+ ENV_CONTACT_TYPE = 2 # Keep environment/item contacts separate from robot links.
81
+
82
+
83
+ class SemSorterSimulation:
84
+ """
85
+ MuJoCo simulation controller for the SemSorter pick-and-place task.
86
+
87
+ Loads the Franka Panda from menagerie, adds the warehouse environment
88
+ (conveyors, bins, items), and provides an async API for robot control.
89
+ """
90
+
91
+ def __init__(self):
92
+ self.model: Optional[mujoco.MjModel] = None
93
+ self.data: Optional[mujoco.MjData] = None
94
+ self.renderer: Optional[mujoco.Renderer] = None
95
+ self.items: Dict[str, ItemInfo] = {}
96
+ self._arm_busy = False
97
+ self._items_sorted = 0
98
+ self._running = False
99
+
100
+ # ─── Scene loading ───────────────────────────────────────────────────
101
+
102
+ def load_scene(self) -> None:
103
+ """Load the Panda scene from menagerie and add SemSorter objects."""
104
+ # Load base Panda scene
105
+ logger.info(f"Loading Panda from: {PANDA_SCENE}")
106
+ spec = mujoco.MjSpec.from_file(str(PANDA_SCENE))
107
+
108
+ # Modify the model name
109
+ spec.modelname = "semsorter"
110
+
111
+ # Set offscreen framebuffer size for rendering
112
+ spec.visual.global_.offwidth = 1920
113
+ spec.visual.global_.offheight = 1080
114
+
115
+ # ─── Add additional lights ───────────────────────────────────────
116
+ world = spec.worldbody
117
+ light = world.add_light()
118
+ light.pos = [0, -1, 2]
119
+ light.dir = [0, 0.5, -0.8]
120
+ light.diffuse = [0.4, 0.4, 0.4]
121
+
122
+ light2 = world.add_light()
123
+ light2.pos = [-1, -1, 2]
124
+ light2.dir = [0.3, 0.3, -0.8]
125
+ light2.diffuse = [0.3, 0.3, 0.3]
126
+
127
+ # ─── Add cameras ────────────────────────────────────────────────
128
+ cam_overview = world.add_camera()
129
+ cam_overview.name = "overview"
130
+ cam_overview.pos = [0, -1.4, 1.3]
131
+ cam_overview.quat = [0.92, 0.38, 0, 0] # Look slightly down
132
+ cam_overview.fovy = 50
133
+
134
+ cam_top = world.add_camera()
135
+ cam_top.name = "topdown"
136
+ cam_top.pos = [0, 0, 2.0]
137
+ cam_top.quat = [0.0, 0.0, 0.0, 1.0] # Look straight down
138
+ cam_top.fovy = 60
139
+
140
+ cam_side = world.add_camera()
141
+ cam_side.name = "side"
142
+ cam_side.pos = [1.5, 0, 0.8]
143
+ cam_side.quat = [0.65, 0.27, 0.27, 0.65] # Side view
144
+ cam_side.fovy = 45
145
+
146
+ # ─── Add conveyors ──────────────────────────────────────────────
147
+ self._add_conveyor(spec, "input", pos=[-0.40, 0, 0])
148
+ self._add_conveyor(spec, "output", pos=[0.40, 0, 0])
149
+
150
+ # ─── Add waste bins ─────────────────────────────────────────────
151
+ self._add_bin(spec, "flammable", pos=[-0.25, -0.40, 0],
152
+ color=[0.85, 0.15, 0.1, 0.9])
153
+ self._add_bin(spec, "chemical", pos=[0.25, -0.40, 0],
154
+ color=[0.95, 0.75, 0.1, 0.9])
155
+
156
+ # ─── Add hazardous items on input conveyor ──────────────────────
157
+ items_spec = [
158
+ ("item_flammable_1", [-0.50, 0.0, 0.40], "cylinder", [0.025, 0.03],
159
+ [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
160
+ ("item_chemical_1", [-0.40, 0.05, 0.40], "box", [0.025, 0.025, 0.025],
161
+ [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
162
+ ("item_chemical_2", [-0.30, -0.03, 0.40], "sphere", [0.025],
163
+ [0.95, 0.85, 0.1, 1], True, BinType.CHEMICAL),
164
+ ("item_safe_1", [-0.35, -0.05, 0.40], "box", [0.03, 0.025, 0.02],
165
+ [0.6, 0.6, 0.6, 1], False, BinType.OUTPUT),
166
+ ("item_safe_2", [-0.55, 0.04, 0.40], "cylinder", [0.022, 0.025],
167
+ [0.9, 0.9, 0.9, 1], False, BinType.OUTPUT),
168
+ ("item_flammable_2", [-0.45, 0.02, 0.40], "box", [0.022, 0.022, 0.022],
169
+ [0.9, 0.1, 0.1, 1], True, BinType.FLAMMABLE),
170
+ ]
171
+
172
+ for name, pos, shape, size, rgba, is_haz, haz_type in items_spec:
173
+ self._add_item(spec, name, pos, shape, size, rgba)
174
+ self.items[name] = ItemInfo(
175
+ name=name, body_id=-1, geom_id=-1,
176
+ is_hazardous=is_haz, hazard_type=haz_type if is_haz else None,
177
+ )
178
+
179
+ # Store desired spawn positions for post-keyframe initialization
180
+ self._item_spawn_positions = {
181
+ name: pos for name, pos, *_ in items_spec
182
+ }
183
+
184
+ # ─── Compile the model ──────────────────────────────────────────
185
+ self.model = spec.compile()
186
+ self.data = mujoco.MjData(self.model)
187
+
188
+ # Keep floor contacts in the environment collision group (not robot group).
189
+ floor_geom_id = mujoco.mj_name2id(
190
+ self.model, mujoco.mjtObj.mjOBJ_GEOM, "floor")
191
+ if floor_geom_id >= 0:
192
+ self.model.geom_contype[floor_geom_id] = ENV_CONTACT_TYPE
193
+ self.model.geom_conaffinity[floor_geom_id] = ENV_CONTACT_TYPE
194
+
195
+ # Resolve body/geom IDs for items
196
+ for name in self.items:
197
+ self.items[name].body_id = mujoco.mj_name2id(
198
+ self.model, mujoco.mjtObj.mjOBJ_BODY, name)
199
+ geom_name = f"{name}_geom"
200
+ self.items[name].geom_id = mujoco.mj_name2id(
201
+ self.model, mujoco.mjtObj.mjOBJ_GEOM, geom_name)
202
+
203
+ # ─── Reset to home pose ─────────────────────────────────────────
204
+ key_id = mujoco.mj_name2id(
205
+ self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
206
+ if key_id >= 0:
207
+ mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)
208
+
209
+ # ─── Set item initial positions (keyframe only has arm joints) ──
210
+ for name, pos in self._item_spawn_positions.items():
211
+ jnt_name = f"{name}_jnt"
212
+ jnt_id = mujoco.mj_name2id(
213
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
214
+ if jnt_id >= 0:
215
+ qadr = self.model.jnt_qposadr[jnt_id]
216
+ # freejoint qpos: [x, y, z, qw, qx, qy, qz]
217
+ self.data.qpos[qadr:qadr+3] = pos
218
+ self.data.qpos[qadr+3:qadr+7] = [1, 0, 0, 0] # identity quat
219
+
220
+ mujoco.mj_forward(self.model, self.data)
221
+
222
+ logger.info(f"Scene compiled: {self.model.nbody} bodies, "
223
+ f"{self.model.njnt} joints, {self.model.nu} actuators")
224
+ logger.info(f"Items registered: {list(self.items.keys())}")
225
+
226
+ def _add_conveyor(self, spec: mujoco.MjSpec, name: str, pos: list) -> None:
227
+ """Add a conveyor belt with frame and legs."""
228
+ world = spec.worldbody
229
+ body = world.add_body()
230
+ body.name = f"conveyor_{name}"
231
+ body.pos = pos
232
+
233
+ # Belt surface
234
+ belt = body.add_geom()
235
+ belt.name = f"belt_{name}"
236
+ belt.type = mujoco.mjtGeom.mjGEOM_BOX
237
+ belt.size = [0.35, 0.12, 0.005]
238
+ belt.pos = [0, 0, 0.35]
239
+ belt.rgba = [0.15, 0.15, 0.15, 1]
240
+ belt.friction = [0.8, 0.005, 0.0001]
241
+ belt.contype = ENV_CONTACT_TYPE
242
+ belt.conaffinity = ENV_CONTACT_TYPE
243
+
244
+ # Side rails
245
+ for side_name, y in [("L", 0.125), ("R", -0.125)]:
246
+ rail = body.add_geom()
247
+ rail.name = f"rail_{name}_{side_name}"
248
+ rail.type = mujoco.mjtGeom.mjGEOM_BOX
249
+ rail.size = [0.35, 0.005, 0.02]
250
+ rail.pos = [0, y, 0.37]
251
+ rail.rgba = [0.4, 0.4, 0.45, 1]
252
+ rail.contype = ENV_CONTACT_TYPE
253
+ rail.conaffinity = ENV_CONTACT_TYPE
254
+
255
+ # Legs
256
+ for lx, ly in [(-0.3, 0.1), (-0.3, -0.1), (0.3, 0.1), (0.3, -0.1)]:
257
+ leg = body.add_geom()
258
+ leg.type = mujoco.mjtGeom.mjGEOM_CYLINDER
259
+ leg.size = [0.015, 0.175, 0]
260
+ leg.pos = [lx, ly, 0.175]
261
+ leg.rgba = [0.4, 0.4, 0.45, 1]
262
+ leg.contype = ENV_CONTACT_TYPE
263
+ leg.conaffinity = ENV_CONTACT_TYPE
264
+
265
+ def _add_bin(self, spec: mujoco.MjSpec, name: str, pos: list,
266
+ color: list) -> None:
267
+ """Add an open-top waste bin."""
268
+ world = spec.worldbody
269
+ body = world.add_body()
270
+ body.name = f"bin_{name}"
271
+ body.pos = pos
272
+
273
+ # Walls
274
+ wall_specs = [
275
+ (f"bin_{name}_back", [0, -0.095, 0.12], [0.1, 0.005, 0.12]),
276
+ (f"bin_{name}_front", [0, 0.095, 0.12], [0.1, 0.005, 0.12]),
277
+ (f"bin_{name}_left", [-0.095, 0, 0.12], [0.005, 0.1, 0.12]),
278
+ (f"bin_{name}_right", [0.095, 0, 0.12], [0.005, 0.1, 0.12]),
279
+ ]
280
+ for wname, wpos, wsize in wall_specs:
281
+ wall = body.add_geom()
282
+ wall.name = wname
283
+ wall.type = mujoco.mjtGeom.mjGEOM_BOX
284
+ wall.size = wsize
285
+ wall.pos = wpos
286
+ wall.rgba = color
287
+ wall.contype = ENV_CONTACT_TYPE
288
+ wall.conaffinity = ENV_CONTACT_TYPE
289
+
290
+ # Bottom
291
+ bottom = body.add_geom()
292
+ bottom.name = f"bin_{name}_bottom"
293
+ bottom.type = mujoco.mjtGeom.mjGEOM_BOX
294
+ bottom.size = [0.1, 0.1, 0.005]
295
+ bottom.pos = [0, 0, 0.005]
296
+ bottom.rgba = [0.1, 0.1, 0.1, 1]
297
+ bottom.contype = ENV_CONTACT_TYPE
298
+ bottom.conaffinity = ENV_CONTACT_TYPE
299
+
300
+ def _add_item(self, spec: mujoco.MjSpec, name: str, pos: list,
301
+ shape: str, size: list, rgba: list) -> None:
302
+ """Add a free-jointed item to the world."""
303
+ world = spec.worldbody
304
+ body = world.add_body()
305
+ body.name = name
306
+ body.pos = pos
307
+
308
+ # Free joint
309
+ jnt = body.add_freejoint()
310
+ jnt.name = f"{name}_jnt"
311
+
312
+ # Geom
313
+ geom = body.add_geom()
314
+ geom.name = f"{name}_geom"
315
+ shape_map = {
316
+ "box": mujoco.mjtGeom.mjGEOM_BOX,
317
+ "sphere": mujoco.mjtGeom.mjGEOM_SPHERE,
318
+ "cylinder": mujoco.mjtGeom.mjGEOM_CYLINDER,
319
+ }
320
+ geom.type = shape_map[shape]
321
+ geom.size = size + [0] * (3 - len(size)) # Pad to 3 elements
322
+ geom.rgba = rgba
323
+ geom.mass = 0.05
324
+ geom.friction = [1.0, 0.005, 0.0001]
325
+ geom.priority = 1
326
+ geom.contype = ENV_CONTACT_TYPE
327
+ geom.conaffinity = ENV_CONTACT_TYPE
328
+
329
+ # ─── End-effector helpers ────────────────────────────────────────────
330
+
331
+ def get_ee_pos(self) -> np.ndarray:
332
+ """Get current end-effector (hand) position in world coords."""
333
+ hand_id = mujoco.mj_name2id(
334
+ self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
335
+ return self.data.xpos[hand_id].copy()
336
+
337
+ def get_ee_site_pos(self) -> np.ndarray:
338
+ """Alias for get_ee_pos(), kept for API compatibility."""
339
+ return self.get_ee_pos()
340
+
341
+ def get_item_pos(self, item_name: str) -> Optional[np.ndarray]:
342
+ """Get position of an item by name."""
343
+ info = self.items.get(item_name)
344
+ if info and info.body_id >= 0:
345
+ return self.data.xpos[info.body_id].copy()
346
+ return None
347
+
348
+ def _set_item_pose(self, item_name: str, pos: np.ndarray,
349
+ quat: Tuple[float, float, float, float] = (1, 0, 0, 0)) -> bool:
350
+ """Directly place an item free-joint at a world pose."""
351
+ jnt_name = f"{item_name}_jnt"
352
+ jnt_id = mujoco.mj_name2id(
353
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
354
+ if jnt_id < 0:
355
+ return False
356
+ qadr = self.model.jnt_qposadr[jnt_id]
357
+ self.data.qpos[qadr:qadr+3] = pos
358
+ self.data.qpos[qadr+3:qadr+7] = quat
359
+ dadr = self.model.jnt_dofadr[jnt_id]
360
+ self.data.qvel[dadr:dadr+6] = 0.0
361
+ return True
362
+
363
+ # ─── IK (Solver-based) ────────────────────────────────────────────
364
+
365
+ def reset_arm_neutral(self) -> None:
366
+ """
367
+ Move arm to a neutral upright pose where IK works well in all directions.
368
+ """
369
+ neutral_qpos = [0.0, -0.3, 0.0, -2.0, 0.0, 1.8, 0.0]
370
+ # Set qpos directly for arm joints (first 7)
371
+ self.data.qpos[:NUM_ARM_JOINTS] = neutral_qpos
372
+ self.data.ctrl[:NUM_ARM_JOINTS] = neutral_qpos
373
+ mujoco.mj_forward(self.model, self.data)
374
+
375
+ def solve_ik(self, target_pos: np.ndarray,
376
+ target_quat: Optional[np.ndarray] = None,
377
+ max_iter: int = 300,
378
+ tolerance: float = 0.015,
379
+ step_size: float = 0.5,
380
+ damping: float = 0.05) -> Optional[np.ndarray]:
381
+ """
382
+ Pure kinematic IK solver — iterates Jacobian on qpos WITHOUT physics.
383
+ Returns joint angles (length 7) or None if failed.
384
+ """
385
+ hand_id = mujoco.mj_name2id(
386
+ self.model, mujoco.mjtObj.mjOBJ_BODY, "hand")
387
+
388
+ # Save original qpos to restore later (critical for not corrupting physics)
389
+ orig_qpos = self.data.qpos.copy()
390
+
391
+ # Work on a copy of qpos
392
+ qpos_arm = orig_qpos[:NUM_ARM_JOINTS].copy()
393
+
394
+ try:
395
+ for _ in range(max_iter):
396
+ # Temporarily set qpos, run forward kinematics
397
+ self.data.qpos[:NUM_ARM_JOINTS] = qpos_arm
398
+ mujoco.mj_forward(self.model, self.data)
399
+
400
+ current_pos = self.data.xpos[hand_id].copy()
401
+ err_pos = target_pos - current_pos
402
+
403
+ # Position Jacobian
404
+ jacp = np.zeros((3, self.model.nv))
405
+ mujoco.mj_jacBody(self.model, self.data, jacp, None, hand_id)
406
+ J = jacp[:, :NUM_ARM_JOINTS]
407
+ error = err_pos
408
+
409
+ if target_quat is not None:
410
+ current_quat = self.data.xquat[hand_id].copy()
411
+ err_rot = np.zeros(3)
412
+ mujoco.mju_subQuat(err_rot, target_quat, current_quat)
413
+
414
+ # Rotation Jacobian
415
+ jacr = np.zeros((3, self.model.nv))
416
+ mujoco.mj_jacBody(self.model, self.data, None, jacr, hand_id)
417
+ Jr = jacr[:, :NUM_ARM_JOINTS]
418
+
419
+ # Scale rotational error so position takes priority
420
+ J = np.vstack([J, Jr * 0.5])
421
+ error = np.concatenate([error, err_rot * 0.5])
422
+
423
+ if np.linalg.norm(error) < tolerance:
424
+ return qpos_arm.copy()
425
+
426
+ # Damped least squares
427
+ JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
428
+ dq = J.T @ np.linalg.solve(JJT, error)
429
+
430
+ # Update with step size and clamping
431
+ dq = np.clip(dq * step_size, -0.2, 0.2)
432
+ qpos_arm += dq
433
+
434
+ # Clamp to joint limits
435
+ for j in range(NUM_ARM_JOINTS):
436
+ jnt_id = j # arm joints are first 7
437
+ lo = self.model.jnt_range[jnt_id, 0]
438
+ hi = self.model.jnt_range[jnt_id, 1]
439
+ if lo < hi:
440
+ qpos_arm[j] = np.clip(qpos_arm[j], lo * 0.95, hi * 0.95)
441
+
442
+ return None # Did not converge
443
+
444
+ finally:
445
+ # Always restore original qpos and run forward to fix physics state
446
+ self.data.qpos[:] = orig_qpos
447
+ mujoco.mj_forward(self.model, self.data)
448
+
449
+ def move_to_position(self, target_pos: np.ndarray,
450
+ move_steps: int = 400,
451
+ settle_steps: int = 100,
452
+ position_tolerance: float = 0.05,
453
+ carry_item: Optional[str] = None,
454
+ carry_offset: Optional[np.ndarray] = None) -> bool:
455
+ """
456
+ Move end-effector to target position.
457
+ 1. Solve IK kinematically
458
+ 2. Interpolate joint targets smoothly (ease-in/ease-out)
459
+ 3. Step physics to let arm move
460
+ Returns True if IK solution found.
461
+ """
462
+ solution = self.solve_ik(target_pos)
463
+ if solution is None:
464
+ logger.warning(f"IK failed for target {target_pos}")
465
+ return False
466
+
467
+ current_ctrl = self.data.ctrl[:NUM_ARM_JOINTS].copy()
468
+
469
+ if carry_item is not None and carry_offset is None:
470
+ carry_offset = np.array([0.0, 0.0, -0.06])
471
+
472
+ # Smooth interpolation to target
473
+ for i in range(move_steps):
474
+ alpha = (i + 1) / move_steps
475
+ t = alpha * alpha * (3 - 2 * alpha) # Smoothstep
476
+ self.data.ctrl[:NUM_ARM_JOINTS] = current_ctrl * (1 - t) + solution * t
477
+ mujoco.mj_step(self.model, self.data)
478
+ if carry_item is not None:
479
+ ee = self.get_ee_pos()
480
+ self._set_item_pose(carry_item, ee + carry_offset)
481
+
482
+ # Settle
483
+ for _ in range(settle_steps):
484
+ mujoco.mj_step(self.model, self.data)
485
+ if carry_item is not None:
486
+ ee = self.get_ee_pos()
487
+ self._set_item_pose(carry_item, ee + carry_offset)
488
+
489
+ if carry_item is not None:
490
+ ee = self.get_ee_pos()
491
+ self._set_item_pose(carry_item, ee + carry_offset)
492
+ mujoco.mj_forward(self.model, self.data)
493
+
494
+ final_ee = self.get_ee_pos()
495
+ err = np.linalg.norm(target_pos - final_ee)
496
+ if err > position_tolerance:
497
+ logger.warning(
498
+ f"Move failed: target {target_pos}, reached {final_ee}, err={err:.4f}")
499
+ return False
500
+ return True
501
+
502
+ def set_gripper(self, open_gripper: bool) -> None:
503
+ """Open or close the gripper."""
504
+ self.data.ctrl[GRIPPER_ACTUATOR_ID] = (
505
+ GRIPPER_OPEN if open_gripper else GRIPPER_CLOSED
506
+ )
507
+
508
+ def step(self, n: int = 1) -> None:
509
+ """Advance the simulation by n steps."""
510
+ for _ in range(n):
511
+ mujoco.mj_step(self.model, self.data)
512
+
513
+ # ─── High-level pick-place operations ────────────────────────────────
514
+
515
+ def _stabilize_unpicked_items(self, exclude: str = "") -> None:
516
+ """Zero out velocities of all unpicked items to prevent physics drift.
517
+
518
+ Called before/after each pick-and-place so that the arm doesn't
519
+ knock neighboring items off the conveyor.
520
+ """
521
+ for name, info in self.items.items():
522
+ if name == exclude or info.picked:
523
+ continue
524
+ jnt_name = f"{name}_jnt"
525
+ jnt_id = mujoco.mj_name2id(
526
+ self.model, mujoco.mjtObj.mjOBJ_JOINT, jnt_name)
527
+ if jnt_id < 0:
528
+ continue
529
+ dadr = self.model.jnt_dofadr[jnt_id]
530
+ self.data.qvel[dadr:dadr + 6] = 0.0
531
+ mujoco.mj_forward(self.model, self.data)
532
+
533
+ def pick_and_place(self, item_name: str, target_bin: BinType) -> bool:
534
+ """
535
+ Execute a full pick-and-place sequence:
536
+ 1. Open gripper
537
+ 2. Move above item
538
+ 3. Move down to item
539
+ 4. Close gripper
540
+ 5. Move up
541
+ 6. Move above target bin
542
+ 7. Open gripper (drop)
543
+ 8. Return to neutral
544
+ """
545
+ info = self.items.get(item_name)
546
+ if not info or info.picked:
547
+ logger.warning(f"Item {item_name} not found or already picked")
548
+ return False
549
+
550
+ # Freeze all other items in place before we move the arm
551
+ self._stabilize_unpicked_items(exclude=item_name)
552
+
553
+ self._arm_busy = True
554
+ try:
555
+ item_pos = self.get_item_pos(item_name)
556
+ if item_pos is None:
557
+ logger.warning(f"Cannot get position for {item_name}")
558
+ return False
559
+
560
+ # Sanity check: item must be within reachable workspace
561
+ if (abs(item_pos[0]) > 1.0 or abs(item_pos[1]) > 1.0
562
+ or item_pos[2] < 0.0 or item_pos[2] > 1.0):
563
+ logger.warning(
564
+ f"Item {item_name} at {item_pos} is outside reachable "
565
+ f"workspace — it may have been displaced by physics")
566
+ return False
567
+
568
+ logger.info(f"Picking {item_name} at {item_pos} -> {target_bin.value}")
569
+
570
+ # 1. Open gripper
571
+ self.set_gripper(True)
572
+ self.step(50)
573
+
574
+ # 1.5 Move high to ensure we clear the scene
575
+ safe_high = np.array([0.0, 0.0, 0.65])
576
+ if not self.move_to_position(safe_high, move_steps=200, settle_steps=50):
577
+ return False
578
+
579
+ # Re-read item position after safe-high move (physics may shift items)
580
+ item_pos = self.get_item_pos(item_name)
581
+ if item_pos is None or item_pos[2] < 0.0 or item_pos[2] > 1.0:
582
+ logger.warning(
583
+ f"Item {item_name} moved to invalid position {item_pos} "
584
+ f"during arm movement")
585
+ return False
586
+
587
+ # 2. Move above item (approach from above)
588
+ approach_pos = item_pos.copy()
589
+ approach_pos[2] += 0.10
590
+ if not self.move_to_position(approach_pos):
591
+ logger.warning(f"Failed to reach approach position for {item_name}")
592
+ return False
593
+
594
+ # 3. Move down to grasp
595
+ grasp_pos = item_pos.copy()
596
+ grasp_pos[2] += 0.03
597
+ if not self.move_to_position(grasp_pos):
598
+ logger.warning(f"Failed to reach grasp position for {item_name}")
599
+ return False
600
+
601
+ # 4. Close gripper
602
+ self.set_gripper(False)
603
+ self.step(120) # allow gripper to close
604
+
605
+ # Verify we are close enough to claim a grasp.
606
+ ee_pos = self.get_ee_pos()
607
+ item_now = self.get_item_pos(item_name)
608
+ if item_now is None or np.linalg.norm(ee_pos - item_now) > 0.12:
609
+ logger.warning(
610
+ f"Grasp verification failed for {item_name}: "
611
+ f"ee={ee_pos}, item={item_now}")
612
+ return False
613
+
614
+ # Kinematic carry of the item for deterministic phase testing.
615
+ carry_offset = np.array([0.0, 0.0, -0.06])
616
+ self._set_item_pose(item_name, ee_pos + carry_offset)
617
+ mujoco.mj_forward(self.model, self.data)
618
+
619
+ # 5. Lift up while carrying.
620
+ lift_pos = grasp_pos.copy()
621
+ lift_pos[2] += 0.22
622
+ if not self.move_to_position(
623
+ lift_pos, carry_item=item_name, carry_offset=carry_offset):
624
+ return False
625
+
626
+ # 6. Move above target bin while carrying.
627
+ bin_pos = BIN_POSITIONS[target_bin].copy()
628
+ if not self.move_to_position(
629
+ bin_pos, carry_item=item_name, carry_offset=carry_offset):
630
+ return False
631
+
632
+ # 7. Place and release.
633
+ drop_pos = bin_pos.copy()
634
+ drop_pos[2] -= 0.12
635
+ self._set_item_pose(item_name, drop_pos)
636
+ mujoco.mj_forward(self.model, self.data)
637
+ self.set_gripper(True)
638
+ self.step(100)
639
+
640
+ # Mark item as sorted only after successful place.
641
+ info.picked = True
642
+ self._items_sorted += 1
643
+
644
+ # 8. Return to neutral.
645
+ neutral = np.array([0.0, 0.0, 0.6])
646
+ self.move_to_position(neutral)
647
+
648
+ # Stabilize remaining items after arm movement
649
+ self._stabilize_unpicked_items()
650
+
651
+ logger.info(f"Successfully placed {item_name} in {target_bin.value}")
652
+ return True
653
+ finally:
654
+ self._arm_busy = False
655
+
656
+ # ─── State snapshot ──────────────────────────────────────────────────
657
+
658
+ def get_state(self) -> SimState:
659
+ """Get current simulation state for the frontend."""
660
+ ee = self.get_ee_pos()
661
+ items_info = []
662
+ for name, info in self.items.items():
663
+ pos = self.get_item_pos(name)
664
+ items_info.append({
665
+ "name": name,
666
+ "pos": pos.tolist() if pos is not None else [0, 0, 0],
667
+ "is_hazardous": info.is_hazardous,
668
+ "hazard_type": info.hazard_type.value if info.hazard_type else None,
669
+ "picked": info.picked,
670
+ })
671
+
672
+ return SimState(
673
+ time=self.data.time,
674
+ ee_pos=tuple(ee),
675
+ gripper_open=self.data.ctrl[GRIPPER_ACTUATOR_ID] > 100,
676
+ items=items_info,
677
+ arm_busy=self._arm_busy,
678
+ items_sorted=self._items_sorted,
679
+ )
680
+
681
+ # ─── Rendering ───────────────────────────────────────────────────────
682
+
683
+ def render_frame(self, width: int = 1280, height: int = 720,
684
+ camera: str = "overview") -> np.ndarray:
685
+ """Render a frame from the specified camera. Returns RGB array."""
686
+ if self.renderer is None:
687
+ self.renderer = mujoco.Renderer(self.model, height, width)
688
+
689
+ cam_id = mujoco.mj_name2id(
690
+ self.model, mujoco.mjtObj.mjOBJ_CAMERA, camera)
691
+
692
+ self.renderer.update_scene(self.data, camera=cam_id)
693
+ return self.renderer.render()
694
+
695
+ def save_frame(self, path: str, camera: str = "overview") -> None:
696
+ """Render a frame and save as PNG."""
697
+ from PIL import Image
698
+ frame = self.render_frame(camera=camera)
699
+ Image.fromarray(frame).save(path)
700
+ logger.info(f"Frame saved to {path}")
701
+
702
+ def close(self) -> None:
703
+ """Release renderer resources explicitly."""
704
+ if self.renderer is not None:
705
+ try:
706
+ self.renderer.close()
707
+ except Exception:
708
+ pass # EGL cleanup errors are harmless at shutdown
709
+ self.renderer = None
710
+
711
+ # ─── Interactive viewer ──────────────────────────────────────────────
712
+
713
+ def launch_viewer(self) -> None:
714
+ """Launch the interactive MuJoCo viewer."""
715
+ key_id = mujoco.mj_name2id(
716
+ self.model, mujoco.mjtObj.mjOBJ_KEY, "home")
717
+ if key_id >= 0:
718
+ mujoco.mj_resetDataKeyframe(self.model, self.data, key_id)
719
+ mujoco.viewer.launch(self.model, self.data)
720
+
721
+ # ─── Async interface for agent integration ───────────────────────────
722
+
723
+ async def async_pick_and_place(self, item_name: str,
724
+ target_bin: BinType) -> Dict:
725
+ """Async wrapper around pick_and_place for agent integration."""
726
+ loop = asyncio.get_running_loop()
727
+ success = await loop.run_in_executor(
728
+ None, self.pick_and_place, item_name, target_bin
729
+ )
730
+ return {
731
+ "success": success,
732
+ "item": item_name,
733
+ "target_bin": target_bin.value,
734
+ "items_sorted": self._items_sorted,
735
+ }
736
+
737
+ async def async_get_state(self) -> Dict:
738
+ """Async state snapshot."""
739
+ state = self.get_state()
740
+ return {
741
+ "time": state.time,
742
+ "ee_pos": list(state.ee_pos),
743
+ "gripper_open": state.gripper_open,
744
+ "items": state.items,
745
+ "arm_busy": state.arm_busy,
746
+ "items_sorted": state.items_sorted,
747
+ }
748
+
749
+
750
+ # ─── CLI entry point ────────────────────────────────────────────────────────
751
+
752
+ def main():
753
+ import argparse
754
+
755
+ parser = argparse.ArgumentParser(description="SemSorter Simulation Controller")
756
+ parser.add_argument("--render", action="store_true",
757
+ help="Render a test frame and save as PNG")
758
+ parser.add_argument("--test-pick", action="store_true",
759
+ help="Test pick-and-place of first hazardous item")
760
+ parser.add_argument("--output", default="test_frame.png",
761
+ help="Output path for rendered frame")
762
+ args = parser.parse_args()
763
+
764
+ logging.basicConfig(level=logging.INFO)
765
+
766
+ sim = SemSorterSimulation()
767
+ sim.load_scene()
768
+
769
+ try:
770
+ if args.render:
771
+ sim.save_frame(args.output)
772
+ print(f"Frame saved to {args.output}")
773
+ elif args.test_pick:
774
+ print("Testing pick-and-place...")
775
+ sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)
776
+ sim.save_frame("after_pick.png")
777
+ print(f"Done! Items sorted: {sim._items_sorted}")
778
+ else:
779
+ print("Launching interactive viewer...")
780
+ sim.launch_viewer()
781
+ finally:
782
+ sim.close()
783
+
784
+
785
+ if __name__ == "__main__":
786
+ main()
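The `solve_ik` method above is a standard damped least-squares (Levenberg–Marquardt style) solver. Stripped of MuJoCo, the same update rule can be sketched on a toy two-link planar arm; the link lengths, gains, and function names here are illustrative, not the Panda's:

```python
import numpy as np

L1, L2 = 0.4, 0.3  # hypothetical link lengths

def fk(q):
    """Forward kinematics: end-effector (x, y) of a 2-link planar arm."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """Analytic 2x2 position Jacobian of fk."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

def solve_ik_dls(target, q0, damping=0.05, step_size=0.5,
                 tolerance=1e-4, max_iter=200):
    """Damped least squares: dq = J^T (J J^T + lambda^2 I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        err = target - fk(q)
        if np.linalg.norm(err) < tolerance:
            return q
        J = jacobian(q)
        JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
        dq = J.T @ np.linalg.solve(JJT, err)
        q += np.clip(dq * step_size, -0.2, 0.2)  # same clamp as the controller
    return None  # did not converge

q = solve_ik_dls(np.array([0.5, 0.2]), [0.3, 0.3])
```

The `damping**2 * np.eye(...)` term keeps the inverse well-conditioned near singular poses, which is why this form is preferred over a plain pseudo-inverse when the arm approaches workspace limits.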
SemSorter/simulation/interactive_test.py ADDED
@@ -0,0 +1,70 @@
1
+ """
2
+ Interactive viewer for SemSorter simulation.
3
+ Runs pick-and-place in real time with the MuJoCo viewer.
4
+
5
+ Usage:
6
+ python3 interactive_test.py
7
+ """
8
+ import os
9
+ import time
10
+ import mujoco
11
+ import mujoco.viewer
12
+ try:
13
+ from .controller import SemSorterSimulation, BinType
14
+ except ImportError:
15
+ from controller import SemSorterSimulation, BinType
16
+
17
+ # How often to sync the viewer (every N physics steps)
18
+ VIEWER_SYNC_INTERVAL = 10
19
+
20
+ def main():
21
+ print("Initializing simulation...")
22
+ # NOTE: Do NOT set MUJOCO_GL=egl when using the interactive viewer
23
+ if 'MUJOCO_GL' in os.environ:
24
+ del os.environ['MUJOCO_GL']
25
+
26
+ sim = SemSorterSimulation()
27
+ sim.load_scene()
28
+
29
+ print("Launching interactive viewer. Watch the arm move!")
30
+
31
+ with mujoco.viewer.launch_passive(sim.model, sim.data) as viewer:
32
+
33
+ # Patch mj_step to sync viewer every N steps (much faster than every step)
34
+ original_mj_step = mujoco.mj_step
35
+ step_counter = [0]
36
+
37
+ def patched_mj_step(model, data):
38
+ original_mj_step(model, data)
39
+ step_counter[0] += 1
40
+ if step_counter[0] % VIEWER_SYNC_INTERVAL == 0:
41
+ viewer.sync()
42
+ # Sleep only on sync frames to maintain ~real-time playback
43
+ time.sleep(model.opt.timestep * VIEWER_SYNC_INTERVAL)
44
+
45
+ mujoco.mj_step = patched_mj_step
46
+
47
+ try:
48
+ # Let the scene settle
49
+ sim.step(200)
50
+
51
+ time.sleep(2) # Give user time to see the initial state
52
+ print("\nStarting pick-and-place operation...")
53
+ success = sim.pick_and_place("item_flammable_1", BinType.FLAMMABLE)
54
+
55
+ print(f"\nDone! success={success}, items sorted: {sim._items_sorted}")
56
+ print("\nYou can close the viewer window now, or press Ctrl+C.")
57
+
58
+ # Keep viewer open until user closes it
59
+ while viewer.is_running():
60
+ original_mj_step(sim.model, sim.data)
61
+ viewer.sync()
62
+ time.sleep(0.02) # ~50 FPS idle
63
+
64
+ except KeyboardInterrupt:
65
+ print("\nViewer closed.")
66
+ finally:
67
+ mujoco.mj_step = original_mj_step
68
+
69
+ if __name__ == "__main__":
70
+ main()
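The `move_to_position` method in the controller eases joint targets with a smoothstep profile so the arm accelerates and decelerates gently instead of jerking toward the IK solution. A minimal standalone sketch of that interpolation (function names here are illustrative):

```python
def smoothstep(alpha: float) -> float:
    """Cubic ease-in/ease-out: 3a^2 - 2a^3, with zero slope at both ends."""
    return alpha * alpha * (3 - 2 * alpha)

def interpolate(start, goal, steps):
    """Yield joint targets eased from start to goal, as move_to_position does."""
    for i in range(steps):
        t = smoothstep((i + 1) / steps)
        yield [s * (1 - t) + g * t for s, g in zip(start, goal)]

# The final target always lands exactly on the goal, since smoothstep(1) == 1.
trajectory = list(interpolate([0.0, 1.0], [1.0, 3.0], 5))
```

Because the derivative of smoothstep vanishes at both endpoints, the commanded joint velocities start and end near zero, which reduces the impulse transferred to a carried item.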
SemSorter/simulation/semsorter_scene.xml ADDED
@@ -0,0 +1,194 @@
1
+ <mujoco model="semsorter">
2
+ <!-- Let panda.xml handle its own meshdir via its embedded compiler element -->
3
+ <compiler angle="radian" autolimits="true"/>
4
+
5
+ <option integrator="implicitfast" gravity="0 0 -9.81" timestep="0.002"/>
6
+
7
+ <!-- ============================================================ -->
8
+ <!-- Include the Franka Panda arm (with integrated gripper) -->
9
+ <!-- ============================================================ -->
10
+ <include file="../../mujoco_menagerie/franka_emika_panda/panda.xml"/>
11
+
12
+ <statistic center="0 0 0.5" extent="1.5"/>
13
+
14
+ <!-- ============================================================ -->
15
+ <!-- Visual settings -->
16
+ <!-- ============================================================ -->
17
+ <visual>
18
+ <headlight diffuse="0.6 0.6 0.6" ambient="0.3 0.3 0.3" specular="0 0 0"/>
19
+ <rgba haze="0.15 0.25 0.35 1"/>
20
+ <global azimuth="150" elevation="-25"/>
21
+ </visual>
22
+
23
+ <!-- ============================================================ -->
24
+ <!-- Textures & Materials -->
25
+ <!-- ============================================================ -->
26
+ <asset>
27
+ <texture type="skybox" builtin="gradient" rgb1="0.3 0.5 0.7" rgb2="0 0 0" width="512" height="3072"/>
28
+ <texture type="2d" name="groundplane" builtin="checker" mark="edge"
29
+ rgb1="0.2 0.3 0.4" rgb2="0.1 0.2 0.3" markrgb="0.8 0.8 0.8" width="300" height="300"/>
30
+ <material name="groundplane" texture="groundplane" texuniform="true" texrepeat="5 5" reflectance="0.2"/>
31
+
32
+ <!-- Conveyor belt material -->
33
+ <texture type="2d" name="belt_tex" builtin="checker" rgb1="0.15 0.15 0.15" rgb2="0.2 0.2 0.2"
34
+ width="100" height="100"/>
35
+ <material name="belt_mat" texture="belt_tex" texrepeat="10 2" specular="0.1" shininess="0.05"/>
36
+
37
+ <!-- Conveyor frame material -->
38
+ <material name="frame_mat" rgba="0.4 0.4 0.45 1" specular="0.3" shininess="0.2"/>
39
+
40
+ <!-- Bin materials -->
41
+ <material name="bin_flammable_mat" rgba="0.85 0.15 0.1 0.9" specular="0.2" shininess="0.1"/>
42
+ <material name="bin_chemical_mat" rgba="0.95 0.75 0.1 0.9" specular="0.2" shininess="0.1"/>
43
+ <material name="bin_inner_mat" rgba="0.1 0.1 0.1 1"/>
44
+
45
+ <!-- Hazardous item materials -->
46
+ <material name="hazard_red" rgba="0.9 0.1 0.1 1" specular="0.4" shininess="0.3"/>
47
+ <material name="hazard_green" rgba="0.1 0.8 0.2 1" specular="0.4" shininess="0.3"/>
48
+ <material name="hazard_blue" rgba="0.1 0.2 0.9 1" specular="0.4" shininess="0.3"/>
49
+ <material name="hazard_yellow" rgba="0.95 0.85 0.1 1" specular="0.4" shininess="0.3"/>
50
+ <material name="safe_gray" rgba="0.6 0.6 0.6 1" specular="0.3" shininess="0.2"/>
51
+ <material name="safe_white" rgba="0.9 0.9 0.9 1" specular="0.3" shininess="0.2"/>
52
+ </asset>
53
+
54
+ <!-- ============================================================ -->
55
+ <!-- World: floor, lights, conveyors, bins, items -->
56
+ <!-- ============================================================ -->
57
+ <worldbody>
58
+ <!-- Lighting -->
59
+ <light pos="0 0 3" dir="0 0 -1" directional="true" diffuse="0.5 0.5 0.5"/>
60
+ <light pos="1 -1 2" dir="-0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>
61
+ <light pos="-1 -1 2" dir="0.3 0.3 -0.8" diffuse="0.3 0.3 0.3"/>
62
+
63
+ <!-- Ground plane -->
64
+ <geom name="floor" size="0 0 0.05" type="plane" material="groundplane"/>
65
+
66
+ <!-- ======================================================== -->
67
+ <!-- CONVEYOR A (Input) — items arrive here from the left -->
68
+ <!-- ======================================================== -->
69
+ <body name="conveyor_input" pos="-0.55 0 0">
70
+ <!-- Belt surface -->
71
+ <geom name="belt_input" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
72
+ material="belt_mat" friction="0.8 0.005 0.0001"/>
73
+ <!-- Side rails -->
74
+ <geom name="rail_input_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
75
+ material="frame_mat"/>
76
+ <geom name="rail_input_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
77
+ material="frame_mat"/>
78
+ <!-- Legs -->
79
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
80
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
81
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
82
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
83
+ </body>
84
+
85
+ <!-- ======================================================== -->
86
+ <!-- CONVEYOR B (Output) — clean items continue here (right) -->
87
+ <!-- ======================================================== -->
88
+ <body name="conveyor_output" pos="0.55 0 0">
89
+ <!-- Belt surface -->
90
+ <geom name="belt_output" type="box" size="0.35 0.12 0.005" pos="0 0 0.35"
91
+ material="belt_mat" friction="0.8 0.005 0.0001"/>
92
+ <!-- Side rails -->
93
+ <geom name="rail_output_L" type="box" size="0.35 0.005 0.02" pos="0 0.125 0.37"
94
+ material="frame_mat"/>
95
+ <geom name="rail_output_R" type="box" size="0.35 0.005 0.02" pos="0 -0.125 0.37"
96
+ material="frame_mat"/>
97
+ <!-- Legs -->
98
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 0.1 0.175" material="frame_mat"/>
99
+ <geom type="cylinder" size="0.015 0.175" pos="-0.3 -0.1 0.175" material="frame_mat"/>
100
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 0.1 0.175" material="frame_mat"/>
101
+ <geom type="cylinder" size="0.015 0.175" pos="0.3 -0.1 0.175" material="frame_mat"/>
102
+ </body>
103
+
104
+ <!-- ======================================================== -->
105
+ <!-- FLAMMABLE WASTE BIN (Red) — front-left of the arm -->
106
+ <!-- ======================================================== -->
107
+ <body name="bin_flammable" pos="-0.25 0.45 0">
108
+ <!-- Bin walls (open top box) -->
109
+ <geom name="bin_fl_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_flammable_mat"/>
110
+ <geom name="bin_fl_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_flammable_mat"/>
111
+ <geom name="bin_fl_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_flammable_mat"/>
112
+ <geom name="bin_fl_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_flammable_mat"/>
113
+ <geom name="bin_fl_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
114
+ <!-- Label area (slightly raised red panel on front) -->
115
+ <site name="bin_flammable_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0 0 1"/>
116
+ </body>
117
+
118
+ <!-- ======================================================== -->
119
+ <!-- CHEMICAL WASTE BIN (Yellow) — front-right of the arm -->
120
+ <!-- ======================================================== -->
121
+ <body name="bin_chemical" pos="0.25 0.45 0">
122
+ <!-- Bin walls (open top box) -->
123
+ <geom name="bin_ch_back" type="box" size="0.1 0.005 0.12" pos="0 -0.095 0.12" material="bin_chemical_mat"/>
124
+ <geom name="bin_ch_front" type="box" size="0.1 0.005 0.12" pos="0 0.095 0.12" material="bin_chemical_mat"/>
125
+ <geom name="bin_ch_left" type="box" size="0.005 0.1 0.12" pos="-0.095 0 0.12" material="bin_chemical_mat"/>
126
+ <geom name="bin_ch_right" type="box" size="0.005 0.1 0.12" pos="0.095 0 0.12" material="bin_chemical_mat"/>
127
+ <geom name="bin_ch_bottom" type="box" size="0.1 0.1 0.005" pos="0 0 0.005" material="bin_inner_mat"/>
128
+ <!-- Label area -->
129
+ <site name="bin_chemical_label" pos="0 0.1 0.18" size="0.06 0.005 0.03" type="box" rgba="1 0.8 0 1"/>
130
+ </body>
131
+
132
+ <!-- ======================================================== -->
133
+ <!-- HAZARDOUS ITEMS (on input conveyor, with free joints) -->
134
+ <!-- ======================================================== -->
135
+
136
+ <!-- Item 1: Red cylinder (flammable chemical) — leftmost -->
137
+ <body name="item_flammable_1" pos="-0.82 0 0.39">
138
+ <freejoint name="item_flammable_1_jnt"/>
139
+ <geom name="item_flammable_1_geom" type="cylinder" size="0.02 0.025"
140
+ material="hazard_red" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
141
+ </body>
142
+
143
+ <!-- Item 2: Safe white cylinder (goes to output conveyor) -->
144
+ <body name="item_safe_2" pos="-0.70 0 0.39">
145
+ <freejoint name="item_safe_2_jnt"/>
146
+ <geom name="item_safe_2_geom" type="cylinder" size="0.018 0.02"
147
+ material="safe_white" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
148
+ </body>
149
+
150
+ <!-- Item 3: Yellow box (chemical waste) -->
151
+ <body name="item_chemical_1" pos="-0.58 0 0.385">
152
+ <freejoint name="item_chemical_1_jnt"/>
153
+ <geom name="item_chemical_1_geom" type="box" size="0.02 0.02 0.02"
154
+ material="hazard_yellow" mass="0.04" friction="1 0.005 0.0001" priority="1"/>
155
+ </body>
156
+
157
+ <!-- Item 4: Safe gray box (goes to output conveyor) -->
158
+ <body name="item_safe_1" pos="-0.46 0 0.385">
159
+ <freejoint name="item_safe_1_jnt"/>
160
+ <geom name="item_safe_1_geom" type="box" size="0.025 0.02 0.015"
161
+ material="safe_gray" mass="0.05" friction="1 0.005 0.0001" priority="1"/>
162
+ </body>
163
+
164
+ <!-- Item 5: Blue box (chemical waste) -->
165
+ <body name="item_chemical_2" pos="-0.34 0 0.385">
166
+ <freejoint name="item_chemical_2_jnt"/>
167
+ <geom name="item_chemical_2_geom" type="box" size="0.018 0.018 0.018"
168
+ material="hazard_blue" mass="0.03" friction="1 0.005 0.0001" priority="1"/>
169
+ </body>
170
+
171
+ <!-- Item 6: Green box (flammable) — rightmost -->
172
+ <body name="item_flammable_2" pos="-0.22 0 0.385">
173
+ <freejoint name="item_flammable_2_jnt"/>
174
+ <geom name="item_flammable_2_geom" type="box" size="0.018 0.018 0.018"
175
+ material="hazard_green" mass="0.035" friction="1 0.005 0.0001" priority="1"/>
176
+ </body>
177
+
178
+ <!-- ======================================================== -->
179
+ <!-- Camera for the overview shot (used by OBS or renderer) -->
180
+ <!-- ======================================================== -->
181
+ <camera name="overview" pos="0 -1.2 1.2" xyaxes="1 0 0 0 0.7 0.7" fovy="50"/>
182
+ <camera name="topdown" pos="0 0 2.0" xyaxes="1 0 0 0 1 0" fovy="60"/>
183
+ <camera name="side" pos="1.5 0 0.8" xyaxes="0 1 0 -0.5 0 0.87" fovy="45"/>
184
+
185
+ </worldbody>
186
+
187
+ <!-- ============================================================ -->
188
+ <!-- Sensors for end-effector position tracking -->
189
+ <!-- ============================================================ -->
190
+ <sensor>
191
+ <framepos name="end_effector_pos" objtype="body" objname="hand"/>
192
+ </sensor>
193
+
194
+ </mujoco>
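The scene above encodes each object's hazard class in its body name (`item_flammable_1`, `item_safe_2`, `item_chemical_1`, …). A minimal sketch of recovering that mapping from the XML, using a trimmed stand-in snippet and assuming the real `scene.xml` keeps the `item_<class>_<n>` naming convention:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the scene XML above (hypothetical subset).
SCENE_SNIPPET = """
<mujoco>
  <worldbody>
    <body name="item_flammable_1" pos="-0.82 0 0.39"/>
    <body name="item_safe_2" pos="-0.70 0 0.39"/>
    <body name="item_chemical_1" pos="-0.58 0 0.385"/>
  </worldbody>
</mujoco>
"""

def hazard_items(xml_text: str) -> dict:
    """Map item body names to their hazard class, skipping safe items."""
    root = ET.fromstring(xml_text)
    out = {}
    for body in root.iter("body"):
        name = body.get("name", "")
        if name.startswith("item_") and not name.startswith("item_safe"):
            # "item_flammable_1" -> "FLAMMABLE"
            out[name] = name.split("_")[1].upper()
    return out

print(hazard_items(SCENE_SNIPPET))
# {'item_flammable_1': 'FLAMMABLE', 'item_chemical_1': 'CHEMICAL'}
```

This is the convention the simulation controller and `vlm_bridge.py` rely on when pairing VLM detections with scene bodies.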
SemSorter/vision/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # SemSorter Vision Module
SemSorter/vision/test_obs.py ADDED
@@ -0,0 +1,29 @@
1
+ import cv2
2
+ import time
3
+
4
+ def main():
5
+ print("Testing OBS Virtual Camera on /dev/video4...")
6
+ # Open the virtual camera
7
+ cap = cv2.VideoCapture(4)
8
+
9
+ if not cap.isOpened():
10
+ print("Error: Could not open video device /dev/video4.")
11
+ print("Please ensure OBS Virtual Camera is running.")
12
+ return
13
+
14
+ print("Successfully opened camera. Waiting 2 seconds for it to warm up...")
15
+ time.sleep(2)
16
+
17
+ ret, frame = cap.read()
18
+
19
+ if not ret:
20
+ print("Error: Could not read frame from camera.")
21
+ else:
22
+ output_file = "obs_snapshot.png"
23
+ cv2.imwrite(output_file, frame)
24
+ print(f"Success! Captured frame with shape {frame.shape} and saved to {output_file}.")
25
+
26
+ cap.release()
27
+
28
+ if __name__ == "__main__":
29
+ main()
SemSorter/vision/vision_pipeline.py ADDED
@@ -0,0 +1,239 @@
1
+ """
2
+ SemSorter Vision Pipeline — Hazard Detection Processor
3
+
4
+ Captures frames from OBS Virtual Camera or directly from the simulation,
5
+ then sends them to Gemini VLM for hazardous item detection.
6
+
7
+ Usage:
8
+ # From OBS Virtual Camera:
9
+ GOOGLE_API_KEY=... python3 vision_pipeline.py
10
+
11
+ # From simulation directly (no OBS needed):
12
+ GOOGLE_API_KEY=... python3 vision_pipeline.py --direct
13
+ """
14
+
15
+ import os
16
+ import sys
17
+ import cv2
18
+ import json
19
+ import time
20
+ import logging
21
+ import google.generativeai as genai
22
+ from PIL import Image
23
+ from typing import List, Dict
24
+
25
+ logger = logging.getLogger(__name__)
26
+
27
+
28
+ class HazardDetectionProcessor:
29
+ """
30
+ Detects hazardous items in the SemSorter simulation using Gemini VLM.
31
+
32
+ Supports two input modes:
33
+ - OBS Virtual Camera: reads from /dev/videoX
34
+ - Direct simulation rendering: calls sim.render_frame()
35
+ """
36
+
37
+ def __init__(self, device_id: int = 4, simulation=None):
38
+ """
39
+ Args:
40
+ device_id: Video device ID for OBS Virtual Camera (e.g., 4 for /dev/video4)
41
+ simulation: Optional SemSorterSimulation instance for direct rendering
42
+ """
43
+ self.device_id = device_id
44
+ self.simulation = simulation
45
+ self._video_cap = None # Reusable VideoCapture
46
+ self._gemini_model = None # Lazy-initialized
47
+
48
+ # System instructions to enforce structured JSON output
49
+ self.system_instruction = (
50
+ "You are an AI vision system for a robotic waste sorting arm. "
51
+ "You are given an image of a conveyor belt with a robotic arm and waste bins. "
52
+ "Your task is to identify hazardous items on the conveyor belt. "
53
+ "Hazardous items are categorized as:\n"
54
+ "- FLAMMABLE: Red or green colored items (cylinders, boxes)\n"
55
+ "- CHEMICAL: Yellow or blue colored items (boxes, spheres)\n\n"
56
+ "Safe items are gray or white — IGNORE these.\n\n"
57
+ "For each hazardous item detected, return a JSON object with:\n"
58
+ "- 'name': descriptive name like 'red_cylinder_1' or 'yellow_box_1'\n"
59
+ "- 'type': either 'FLAMMABLE' or 'CHEMICAL'\n"
60
+ "- 'color': the detected color (e.g., 'red', 'yellow')\n"
61
+ "- 'shape': the detected shape (e.g., 'cylinder', 'box', 'sphere')\n"
62
+ "- 'box_2d': bounding box as [ymin, xmin, ymax, xmax] normalized to 0-1000 scale\n\n"
63
+ "Return ONLY a JSON array of detected hazardous items. "
64
+ "If no hazardous items are visible, return an empty array []."
65
+ )
66
+
67
+ def _get_gemini_model(self):
68
+ """Lazy-initialize Gemini model (only when analyze_frame is called)."""
69
+ if self._gemini_model is None:
70
+ api_key = os.environ.get("GOOGLE_API_KEY")
71
+ if not api_key:
72
+ raise ValueError(
73
+ "GOOGLE_API_KEY environment variable not set.\n"
74
+ "Get one at https://aistudio.google.com/apikey"
75
+ )
76
+ genai.configure(api_key=api_key)
77
+ self._gemini_model = genai.GenerativeModel(
78
+ model_name="gemini-3-flash-preview",
79
+ system_instruction=self.system_instruction,
80
+ generation_config={"response_mime_type": "application/json"}
81
+ )
82
+ return self._gemini_model
83
+
84
+ def capture_frame(self) -> Image.Image:
85
+ """
86
+ Capture a single frame.
87
+ Uses direct simulation rendering if available, otherwise OBS camera.
88
+ """
89
+ if self.simulation is not None:
90
+ return self._capture_from_simulation()
91
+ else:
92
+ return self._capture_from_obs()
93
+
94
+ def _capture_from_simulation(self) -> Image.Image:
95
+ """Render a frame directly from the MuJoCo simulation."""
96
+ frame = self.simulation.render_frame(camera="overview")
97
+ return Image.fromarray(frame)
98
+
99
+ def _capture_from_obs(self) -> Image.Image:
100
+ """Capture a frame from the OBS Virtual Camera."""
101
+ if self._video_cap is None or not self._video_cap.isOpened():
102
+ self._video_cap = cv2.VideoCapture(self.device_id)
103
+ if not self._video_cap.isOpened():
104
+ raise RuntimeError(
105
+ f"Could not open video device /dev/video{self.device_id}. "
106
+ "Ensure OBS Virtual Camera is running."
107
+ )
108
+ # Warm up — discard stale frames
109
+ for _ in range(5):
110
+ self._video_cap.read()
111
+
112
+ ret, frame = self._video_cap.read()
113
+ if not ret:
114
+ raise RuntimeError("Failed to read frame from OBS Virtual Camera")
115
+
116
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
117
+ return Image.fromarray(frame_rgb)
118
+
119
+ def analyze_frame(self, pil_image: Image.Image) -> List[Dict]:
120
+ """
121
+ Send the image to Gemini VLM and parse the structured JSON response.
122
+
123
+ Returns:
124
+ List of dicts, each with keys: name, type, color, shape, box_2d
125
+ """
126
+ prompt = (
127
+ "Analyze this image of a robotic sorting station. "
128
+ "Identify all FLAMMABLE (red/green) and CHEMICAL (yellow/blue) items "
129
+ "on the conveyor belt. Return their positions as bounding boxes."
130
+ )
131
+
132
+ logger.info("Sending frame to Gemini VLM...")
133
+ model = self._get_gemini_model()
134
+ response = model.generate_content([prompt, pil_image])
135
+
136
+ raw_text = getattr(response, "text", None)
137
+ if not isinstance(raw_text, str) or not raw_text.strip():
138
+ logger.error("VLM response did not contain JSON text output")
139
+ return []
140
+
141
+ try:
142
+ results = json.loads(raw_text)
143
+ if isinstance(results, dict) and "items" in results:
144
+ results = results["items"]
145
+ if not isinstance(results, list):
146
+ logger.error(f"Unexpected VLM JSON shape: {type(results).__name__}")
147
+ return []
148
+ logger.info(f"VLM detected {len(results)} hazardous items")
149
+ return results
150
+ except (json.JSONDecodeError, TypeError):
151
+ logger.error(f"Failed to parse VLM response:\n{raw_text}")
152
+ return []
153
+
154
+ def detect_hazards(self) -> List[Dict]:
155
+ """
156
+ Full pipeline: capture frame → analyze → return results.
157
+ Convenience method combining capture_frame() and analyze_frame().
158
+ """
159
+ image = self.capture_frame()
160
+ return self.analyze_frame(image)
161
+
162
+ def close(self):
163
+ """Release video capture resources."""
164
+ if self._video_cap is not None:
165
+ self._video_cap.release()
166
+ self._video_cap = None
167
+
168
+
169
+ # ─── CLI entry point ────────────────────────────────────────────────────────
170
+
171
+ def main():
172
+ import argparse
173
+
174
+ parser = argparse.ArgumentParser(description="SemSorter Hazard Detection")
175
+ parser.add_argument("--direct", action="store_true",
176
+ help="Use direct simulation rendering instead of OBS")
177
+ parser.add_argument("--device", type=int, default=4,
178
+ help="OBS Virtual Camera device ID (default: 4)")
179
+ parser.add_argument("--output", default="vision_debug.png",
180
+ help="Save captured frame to this path")
181
+ args = parser.parse_args()
182
+
183
+ logging.basicConfig(level=logging.INFO)
184
+
185
+ simulation = None
186
+ if args.direct:
187
+ # Must be set before importing MuJoCo/controller in this process.
188
+ os.environ.setdefault("MUJOCO_GL", "egl")
189
+ # Import and initialize simulation for direct rendering
190
+ try:
191
+ from ..simulation.controller import SemSorterSimulation
192
+ except ImportError:
193
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
194
+ from controller import SemSorterSimulation
195
+ print("Initializing simulation for direct rendering...")
196
+ simulation = SemSorterSimulation()
197
+ simulation.load_scene()
198
+ simulation.step(200) # Let physics settle
199
+
200
+ processor = HazardDetectionProcessor(
201
+ device_id=args.device,
202
+ simulation=simulation
203
+ )
204
+
205
+ try:
206
+ print("Capturing frame...")
207
+ image = processor.capture_frame()
208
+ image.save(args.output)
209
+ print(f"Saved frame to {args.output}")
210
+
211
+ print("Analyzing frame with Gemini VLM...")
212
+ results = processor.analyze_frame(image)
213
+
214
+ print("\n" + "=" * 50)
215
+ print(" HAZARD DETECTION RESULTS")
216
+ print("=" * 50)
217
+
218
+ if not results:
219
+ print(" No hazardous items detected.")
220
+ else:
221
+ for i, item in enumerate(results, 1):
222
+ print(f"\n [{i}] {item.get('name', 'unknown')}")
223
+ print(f" Type: {item.get('type', '?')}")
224
+ print(f" Color: {item.get('color', '?')}")
225
+ print(f" Shape: {item.get('shape', '?')}")
226
+ print(f" Box: {item.get('box_2d', '?')}")
227
+
228
+ print("\n" + "=" * 50)
229
+ print(f" Total hazardous items: {len(results)}")
230
+ print("=" * 50)
231
+
232
+ finally:
233
+ processor.close()
234
+ if simulation is not None and hasattr(simulation, "close"):
235
+ simulation.close()
236
+
237
+
238
+ if __name__ == "__main__":
239
+ main()
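The parsing contract that `analyze_frame()` enforces on the VLM reply can be shown standalone: accept either a bare JSON array or an object wrapping the array under an `"items"` key, and fall back to an empty list on anything else. A self-contained sketch (the sample reply below is illustrative, not real model output):

```python
import json

def parse_detections(raw_text: str) -> list:
    """Tolerantly parse a VLM reply into a list of detection dicts."""
    try:
        results = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return []
    # Some models wrap the array in {"items": [...]}.
    if isinstance(results, dict) and "items" in results:
        results = results["items"]
    return results if isinstance(results, list) else []

reply = (
    '{"items": [{"name": "red_cylinder_1", "type": "FLAMMABLE", '
    '"color": "red", "shape": "cylinder", "box_2d": [420, 110, 520, 180]}]}'
)
print(len(parse_detections(reply)))   # 1
print(parse_detections("not json"))   # []
```

Requesting `response_mime_type: "application/json"` makes the bare-array case the common path; the wrapper and error branches are defensive.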
SemSorter/vision/vlm_bridge.py ADDED
@@ -0,0 +1,269 @@
1
+ """
2
+ SemSorter VLM-to-Simulation Bridge
3
+
4
+ Maps VLM hazard detections to simulation item names and orchestrates
5
+ the pick-and-place sequence. This is the glue between Phase 2 (Vision)
6
+ and Phase 1 (Simulation).
7
+
8
+ Usage:
9
+ # End-to-end test (direct render, no OBS):
10
+ MUJOCO_GL=egl GOOGLE_API_KEY=... python3 vlm_bridge.py --direct
11
+
12
+ # With OBS Virtual Camera:
13
+ GOOGLE_API_KEY=... python3 vlm_bridge.py
14
+ """
15
+
16
+ import os
17
+ import sys
18
+ import logging
19
+ from typing import List, Dict, Tuple
20
+
21
+ try:
22
+ from .vision_pipeline import HazardDetectionProcessor
23
+ except ImportError:
24
+ from vision_pipeline import HazardDetectionProcessor
25
+
26
+ try:
27
+ from ..simulation.controller import BinType, SemSorterSimulation
28
+ except ImportError:
29
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'simulation'))
30
+ from controller import BinType, SemSorterSimulation
31
+
32
+ logger = logging.getLogger(__name__)
33
+
34
+
35
+ class VLMSimBridge:
36
+ """
37
+ Bridge between VLM hazard detections and the simulation controller.
38
+
39
+ Matching strategy:
40
+ 1. VLM detects items by color/shape → returns type (FLAMMABLE/CHEMICAL)
41
+ 2. Simulation has named items with known hazard types
42
+ 3. We match VLM detections to unpicked simulation items of the same type
43
+ 4. For multiple items of the same type, we use spatial ordering (left-to-right
44
+ on the conveyor) to assign matches
45
+ """
46
+
47
+ def __init__(self, simulation, device_id: int = 4, use_direct: bool = False):
48
+ """
49
+ Args:
50
+ simulation: SemSorterSimulation instance
51
+ device_id: OBS Virtual Camera device ID
52
+ use_direct: If True, render frames from simulation instead of OBS
53
+ """
54
+ self.simulation = simulation
55
+ self.processor = HazardDetectionProcessor(
56
+ device_id=device_id,
57
+ simulation=simulation if use_direct else None
58
+ )
59
+
60
+ def get_unpicked_items_by_type(self, hazard_type: str) -> List[Tuple[str, float]]:
61
+ """
62
+ Get unpicked simulation items of a given hazard type,
63
+ sorted by X position (leftmost first = highest priority on conveyor).
64
+
65
+ Returns:
66
+ List of (item_name, x_position) tuples
67
+ """
68
+ type_map = {
69
+ "FLAMMABLE": BinType.FLAMMABLE,
70
+ "CHEMICAL": BinType.CHEMICAL,
71
+ }
72
+
73
+ target_type = type_map.get(hazard_type)
74
+ if target_type is None:
75
+ return []
76
+
77
+ items = []
78
+ for name, info in self.simulation.items.items():
79
+ if info.hazard_type == target_type and not info.picked:
80
+ pos = self.simulation.get_item_pos(name)
81
+ if pos is not None:
82
+ items.append((name, pos[0])) # x_position for sorting
83
+
84
+ # Sort by X (most negative = leftmost on conveyor = first to pick)
85
+ items.sort(key=lambda x: x[1])
86
+ return items
87
+
88
+ def match_detections_to_items(self, detections: List[Dict]) -> List[Dict]:
89
+ """
90
+ Match VLM detections to simulation item names.
91
+
92
+ Each detection gets an additional 'sim_item' key with the matched
93
+ simulation item name, and 'bin_type' with the target bin.
94
+
95
+ Returns:
96
+ List of matched detections with sim_item and bin_type fields added
97
+ """
98
+ # Track which items have already been matched
99
+ matched_items = set()
100
+ results = []
101
+
102
+ def box_left_x(det: Dict) -> float:
103
+ box = det.get("box_2d")
104
+ if isinstance(box, (list, tuple)) and len(box) >= 2:
105
+ try:
106
+ return float(box[1])
107
+ except (TypeError, ValueError):
108
+ pass
109
+ return 1000.0
110
+
111
+ # Group detections by type
112
+ for det_type in ["FLAMMABLE", "CHEMICAL"]:
113
+ type_detections = []
114
+ for d in detections:
115
+ if not isinstance(d, dict):
116
+ continue
117
+ dtype = str(d.get("type", "")).strip().upper()
118
+ if dtype == det_type:
119
+ type_detections.append(d)
120
+ available_items = self.get_unpicked_items_by_type(det_type)
121
+
122
+ # Sort detections by x position of bounding box (leftmost first)
123
+ type_detections.sort(key=box_left_x)
124
+
125
+ bin_type = BinType.FLAMMABLE if det_type == "FLAMMABLE" else BinType.CHEMICAL
126
+
127
+ for detection in type_detections:
128
+ # Find first available item not yet matched
129
+ sim_item = None
130
+ for item_name, _ in available_items:
131
+ if item_name not in matched_items:
132
+ sim_item = item_name
133
+ matched_items.add(item_name)
134
+ break
135
+
136
+ if sim_item:
137
+ detection["sim_item"] = sim_item
138
+ detection["bin_type"] = bin_type
139
+ results.append(detection)
140
+ logger.info(f"Matched VLM '{detection.get('name')}' → "
141
+ f"sim '{sim_item}' → bin '{bin_type.value}'")
142
+ else:
143
+ logger.warning(f"No unmatched sim item for VLM detection: "
144
+ f"{detection.get('name')} ({det_type})")
145
+
146
+ return results
147
+
148
+ def detect_and_sort(self) -> Dict:
149
+ """
150
+ Full pipeline: detect hazards → match to sim items → pick and place all.
151
+
152
+ Returns:
153
+ Summary dict with detection count, sort count, and details
154
+ """
155
+ # Step 1: Detect hazards
156
+ logger.info("Step 1: Detecting hazards with VLM...")
157
+ detections = self.processor.detect_hazards()
158
+ logger.info(f"VLM found {len(detections)} hazardous items")
159
+
160
+ if not detections:
161
+ return {"detected": 0, "matched": 0, "sorted": 0, "details": []}
162
+
163
+ # Step 2: Match to simulation items
164
+ logger.info("Step 2: Matching detections to simulation items...")
165
+ matched = self.match_detections_to_items(detections)
166
+ logger.info(f"Matched {len(matched)} items")
167
+
168
+ # Step 3: Pick and place each matched item
169
+ logger.info("Step 3: Executing pick-and-place sequence...")
170
+ details = []
171
+ sorted_count = 0
172
+
173
+ for match in matched:
174
+ item_name = match["sim_item"]
175
+ bin_type = match["bin_type"]
176
+ vlm_name = match.get("name", "unknown")
177
+
178
+ logger.info(f"Sorting: {vlm_name} ({item_name}) → {bin_type.value}")
179
+ success = self.simulation.pick_and_place(item_name, bin_type)
180
+
181
+ # Let remaining items settle after the arm moves
182
+ self.simulation.step(200)
183
+
184
+ details.append({
185
+ "vlm_name": vlm_name,
186
+ "sim_item": item_name,
187
+ "target_bin": bin_type.value,
188
+ "success": success,
189
+ })
190
+
191
+ if success:
192
+ sorted_count += 1
193
+
194
+ return {
195
+ "detected": len(detections),
196
+ "matched": len(matched),
197
+ "sorted": sorted_count,
198
+ "details": details,
199
+ }
200
+
201
+ def close(self):
202
+ """Release resources."""
203
+ self.processor.close()
204
+
205
+
206
+ # ─── CLI entry point ────────────────────────────────────────────────────────
207
+
208
+ def main():
209
+ import argparse
210
+
211
+ parser = argparse.ArgumentParser(description="SemSorter VLM-Sim Bridge")
212
+ parser.add_argument("--direct", action="store_true",
213
+ help="Use direct simulation rendering instead of OBS")
214
+ parser.add_argument("--device", type=int, default=4,
215
+ help="OBS Virtual Camera device ID (default: 4)")
216
+ args = parser.parse_args()
217
+
218
+ logging.basicConfig(level=logging.INFO)
219
+
220
+ # Initialize simulation
221
+ print("Initializing simulation...")
222
+ if args.direct:
223
+ os.environ.setdefault("MUJOCO_GL", "egl")
224
+ sim = SemSorterSimulation()
225
+ sim.load_scene()
226
+ sim.step(200) # Let physics settle
227
+
228
+ # Initialize bridge
229
+ bridge = VLMSimBridge(
230
+ simulation=sim,
231
+ device_id=args.device,
232
+ use_direct=args.direct,
233
+ )
234
+
235
+ try:
236
+ # Run full detect → match → sort pipeline
237
+ print("\n" + "=" * 60)
238
+ print(" SemSorter: VLM-Driven Hazard Sorting")
239
+ print("=" * 60)
240
+
241
+ result = bridge.detect_and_sort()
242
+
243
+ print("\n" + "=" * 60)
244
+ print(" SORTING RESULTS")
245
+ print("=" * 60)
246
+ print(f" Hazards detected by VLM: {result['detected']}")
247
+ print(f" Matched to sim items: {result['matched']}")
248
+ print(f" Successfully sorted: {result['sorted']}")
249
+
250
+ if result['details']:
251
+ print("\n Details:")
252
+ for d in result['details']:
253
+ status = "✅" if d['success'] else "❌"
254
+ print(f" {status} {d['vlm_name']} ({d['sim_item']}) → {d['target_bin']}")
255
+
256
+ print("=" * 60)
257
+
258
+ # Save final state
259
+ sim.save_frame("after_sort.png")
260
+ print(f"\nFinal scene saved to after_sort.png")
261
+
262
+ finally:
263
+ bridge.close()
264
+ if hasattr(sim, "close"):
265
+ sim.close()
266
+
267
+
268
+ if __name__ == "__main__":
269
+ main()
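The matching strategy described in the `VLMSimBridge` docstring reduces to pairing two lists sorted left-to-right: detections ordered by the `xmin` of their `box_2d`, and unpicked sim items ordered by conveyor X. A minimal sketch of that core step (names and coordinates below are illustrative):

```python
def match(detections, sim_items):
    """Pair detections with sim items by left-to-right order.

    detections: list of (vlm_name, box_xmin) tuples
    sim_items:  list of (sim_name, x_position) tuples
    """
    detections = sorted(detections, key=lambda d: d[1])
    sim_items = sorted(sim_items, key=lambda s: s[1])
    # zip() truncates to the shorter list, mirroring the "no unmatched
    # sim item" warning path in match_detections_to_items().
    return [(d[0], s[0]) for d, s in zip(detections, sim_items)]

pairs = match(
    [("red_cylinder_2", 340), ("red_cylinder_1", 120)],
    [("item_flammable_1", -0.82), ("item_flammable_2", -0.22)],
)
print(pairs)
# [('red_cylinder_1', 'item_flammable_1'), ('red_cylinder_2', 'item_flammable_2')]
```

The real bridge runs this per hazard type and also skips items already marked `picked`, but the ordering logic is the same.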
Vision-Agents ADDED
@@ -0,0 +1 @@
1
+ Subproject commit f684ece6c3b6540b02de9c73431a5ffe0c576f29
render.yaml ADDED
@@ -0,0 +1,21 @@
1
+ services:
2
+ - type: web
3
+ name: semsorter
4
+ env: docker
5
+ dockerfilePath: ./Dockerfile
6
+ plan: free
7
+ envVars:
8
+ - key: MUJOCO_GL
9
+ value: egl
10
+ - key: GOOGLE_API_KEY
11
+ sync: false # Set in Render dashboard — not committed to git
12
+ - key: DEEPGRAM_API_KEY
13
+ sync: false
14
+ - key: ELEVENLABS_API_KEY
15
+ sync: false
16
+ - key: STREAM_API_KEY
17
+ sync: false
18
+ - key: STREAM_API_SECRET
19
+ sync: false
20
+ healthCheckPath: /api/state
21
+ autoDeploy: true
requirements-server.txt ADDED
@@ -0,0 +1,18 @@
1
+ # SemSorter Web Server Dependencies
2
+ fastapi==0.115.0
3
+ uvicorn[standard]==0.30.6
4
+ websockets==13.1
5
+ python-multipart==0.0.12
6
+ httpx==0.27.2
7
+ pillow==10.4.0
8
+ numpy==1.26.4
9
+
10
+ # MuJoCo (headless, EGL)
11
+ mujoco==3.2.0
12
+
13
+ # Google Gemini (legacy + new SDK — both used for compatibility)
14
+ google-generativeai==0.8.3
15
+ google-genai==1.0.0
16
+
17
+ # dotenv for loading .env files
18
+ python-dotenv==1.0.1