---
title: SemSorter
emoji: ♻️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# SemSorter: AI Hazard Sorting System

> **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**

[![Demo](https://img.shields.io/badge/Live%20Demo-Render.com-4f46e5)](https://semsorter.onrender.com)
[![GitHub](https://img.shields.io/badge/GitHub-KaustubhUp025-181717?logo=github)](https://github.com/KaustubhUp025/SemSorter)

---

## 🤖 Overview

SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:

1. **Watches** the conveyor belt via a live camera feed
2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
4. **Speaks back** results using **ElevenLabs TTS**
5. **Listens** to voice commands via **Deepgram STT**

All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.

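The detect, plan, execute, and speak steps listed above can be pictured as one sorting cycle. The following is a minimal sketch with stubbed components, not SemSorter's actual code: `Detection`, `BIN_FOR_HAZARD`, `plan_picks`, and `run_cycle` are hypothetical names, and the real system drives these steps through Gemini function-calling inside the Vision-Agents SDK rather than a plain loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detection:
    """One item reported by the vision step (hypothetical shape)."""
    item_id: str
    hazard: str  # "flammable", "chemical", or anything else for non-hazardous


# Hypothetical mapping from hazard class to target bin name.
BIN_FOR_HAZARD: Dict[str, str] = {
    "flammable": "bin_flammable",
    "chemical": "bin_chemical",
}


def plan_picks(detections: List[Detection]) -> List[Tuple[str, str]]:
    """Turn detections into (item_id, bin) steps, skipping non-hazardous items."""
    return [
        (d.item_id, BIN_FOR_HAZARD[d.hazard])
        for d in detections
        if d.hazard in BIN_FOR_HAZARD
    ]


def run_cycle(
    detect: Callable[[], List[Detection]],
    pick_and_place: Callable[[str, str], bool],
    speak: Callable[[str], None],
) -> int:
    """Run one detect -> plan -> execute -> speak cycle; return items sorted."""
    steps = plan_picks(detect())
    done = sum(1 for item_id, bin_name in steps if pick_and_place(item_id, bin_name))
    speak(f"Sorted {done} of {len(steps)} hazardous items.")
    return done
```

Passing the three stages in as callables keeps the sketch independent of any particular VLM, simulator, or TTS backend; in a test, each can be replaced by a lambda.
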
---

## 🏗 Architecture

```
Browser ←─── WebSocket ───→ FastAPI Server
                                  │
                       Vision-Agents SDK Agent
                      ┌─────────┴──────────┐
                 gemini.LLM           deepgram.STT
               (tool-calling)        (voice→text)
                      │
                 VLM Bridge
                      │
           MuJoCo Sim (Franka Panda)
```

---

## 🚀 Quick Start

### Prerequisites

- Python 3.10+
- MuJoCo 3.x
- EGL (headless GPU rendering)

### Local Setup

```bash
# Clone
git clone https://github.com/KaustubhUp025/SemSorter.git
cd SemSorter

# Install dependencies
pip install -r requirements-server.txt

# Configure API keys
cp .env.example .env
# Edit .env with your keys:
#   GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
#   STREAM_API_KEY, STREAM_API_SECRET

# Run
MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
# Open http://localhost:8000
```

### Voice Agent (Vision-Agents SDK CLI)

```bash
cd Vision-Agents
MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
```

---

## 📦 Project Structure

```
SemSorter/
├── SemSorter/
│   ├── simulation/
│   │   ├── controller.py          # MuJoCo sim + IK + pick-and-place
│   │   └── semsorter_scene.xml    # MJCF scene (Panda + conveyor + bins)
│   ├── vision/
│   │   ├── vision_pipeline.py     # Gemini VLM hazard detection
│   │   └── vlm_bridge.py          # VLM → sim item matching
│   ├── agent/
│   │   ├── agent.py               # Vision-Agents SDK agent
│   │   └── semsorter_instructions.md
│   └── server/
│       ├── app.py                 # FastAPI + WebSocket video stream
│       ├── agent_bridge.py        # SDK bridge + quota detection
│       └── static/index.html      # Web UI
├── Vision-Agents/                 # GetStream Vision-Agents SDK
├── Dockerfile
├── render.yaml
└── requirements-server.txt
```

---

## 🔑 API Keys Required

| Service | Purpose | Free tier |
|---|---|---|
| Google Gemini | LLM orchestration + VLM detection | 15 RPM |
| Deepgram | Speech-to-Text | 45 min/month |
| ElevenLabs | Text-to-Speech | ~10k chars/month |
| GetStream | Real-time video call (Voice agent) | Free tier available |

> **API exhaustion handling:** The server detects quota errors (`429 / ResourceExhausted`) and automatically switches to demo-mode per service, showing a banner in the UI.

---

## 🐳 Deploy to Render

1. Fork this repo
2. Create a new **Web Service** on [Render.com](https://render.com) pointing to your fork
3. Add your API keys as **Environment Variables** in the Render dashboard
4. Done. Render auto-deploys from `render.yaml`
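
The API exhaustion handling noted under API Keys Required could be sketched roughly as follows. This is an illustration under assumptions, not the contents of `agent_bridge.py`: `is_quota_error`, `ServiceModes`, and the banner text are hypothetical, and matching on the strings `429` and `ResourceExhausted` is a simplification of whatever the server actually inspects.

```python
from typing import Dict, Iterable

# Markers taken from the README note; substring matching is a simplification
# and may misfire on unrelated messages that happen to contain "429".
QUOTA_MARKERS = ("429", "resourceexhausted")


def is_quota_error(exc: Exception) -> bool:
    """Return True if the exception's type name or message looks like a quota error."""
    text = f"{type(exc).__name__} {exc}".lower()
    return any(marker in text for marker in QUOTA_MARKERS)


class ServiceModes:
    """Tracks, per service, whether it has fallen back to demo mode (hypothetical)."""

    def __init__(self, services: Iterable[str]) -> None:
        self._demo: Dict[str, bool] = {name: False for name in services}

    def report_error(self, service: str, exc: Exception) -> bool:
        """Record an error; switch the service to demo mode on quota errors.

        Returns True if the service is in demo mode after this call.
        """
        if is_quota_error(exc):
            self._demo[service] = True
        return self._demo[service]

    def banner(self) -> str:
        """Text for a UI banner, or an empty string if every service is live."""
        degraded = sorted(name for name, demo in self._demo.items() if demo)
        return f"Demo mode active for: {', '.join(degraded)}" if degraded else ""
```

Tracking the flag per service means a Gemini quota hit does not disable Deepgram or ElevenLabs, which matches the per-service fallback the note describes.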