---
title: SemSorter
emoji: ♻️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# SemSorter — AI Hazard Sorting System
> **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**

[Live demo](https://semsorter.onrender.com) · [GitHub repository](https://github.com/KaustubhUp025/SemSorter)

---
## 🤖 Overview
SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:
1. **Watches** the conveyor belt via a live camera feed
2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
4. **Speaks back** results using **ElevenLabs TTS**
5. **Listens** to voice commands via **Deepgram STT**

All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.

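
The five steps above form one sorting cycle. Steps 3 and 4 can be sketched without any API calls, assuming the VLM has already returned a list of detections. `Detection`, `plan_sort`, `summarize`, and the bin names are all hypothetical; the real bins are defined in the MJCF scene.

```python
from dataclasses import dataclass

# Hypothetical bin names; the actual scene (semsorter_scene.xml) defines its own.
BIN_FOR_HAZARD = {"flammable": "flammable_bin", "chemical": "chemical_bin"}


@dataclass
class Detection:
    item_name: str     # hypothetical sim item identifier
    hazard_class: str  # "flammable", "chemical", or anything else (treated as safe)


def plan_sort(detections: list[Detection]) -> list[tuple[str, str]]:
    """Step 3: turn detections into (item, bin) pick-and-place jobs, skipping safe items."""
    return [
        (d.item_name, BIN_FOR_HAZARD[d.hazard_class])
        for d in detections
        if d.hazard_class in BIN_FOR_HAZARD
    ]


def summarize(jobs: list[tuple[str, str]]) -> str:
    """Step 4: build the sentence that would be handed to the TTS service."""
    if not jobs:
        return "No hazardous items found on the belt."
    parts = [f"{item} to {bin_name}" for item, bin_name in jobs]
    return f"Sorted {len(jobs)} item(s): " + ", ".join(parts) + "."


dets = [
    Detection("red_can", "flammable"),
    Detection("box", "safe"),
    Detection("bottle", "chemical"),
]
print(summarize(plan_sort(dets)))
# → Sorted 2 item(s): red_can to flammable_bin, bottle to chemical_bin.
```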
---
## 🏗 Architecture
```
Browser ←─── WebSocket ───→ FastAPI Server
                                   │
                        Vision-Agents SDK Agent
                         ┌─────────┴──────────┐
                    gemini.LLM          deepgram.STT
                  (tool-calling)        (voice→text)
                         │
                    VLM Bridge
                         │
             MuJoCo Sim (Franka Panda)
```
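
In the diagram, `gemini.LLM` reaches the VLM bridge and the simulator through tool calls rather than directly. The sketch below shows that routing in plain Python, assuming each call arrives as a tool name plus an argument dict. It is not the Vision-Agents registration API, and `scan_belt` / `pick_and_place` are hypothetical tool names with stub bodies.

```python
from typing import Any, Callable

# Registry of callable tools, keyed by function name.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the LLM can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def scan_belt() -> list[str]:
    # Hypothetical stand-in for the VLM bridge: names of hazardous items on the belt.
    return ["red_can"]


@tool
def pick_and_place(item: str, bin_name: str) -> str:
    # Hypothetical stand-in for the MuJoCo controller's pick-and-place routine.
    return f"placed {item} in {bin_name}"


def dispatch(call: dict[str, Any]) -> Any:
    """Run one tool call of the form {"name": ..., "args": {...}}.

    Unknown tools return an error dict instead of raising, so the result
    can be sent back to the model as an ordinary tool response.
    """
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call.get("args", {}))


print(dispatch({"name": "pick_and_place", "args": {"item": "red_can", "bin_name": "flammable_bin"}}))
# → placed red_can in flammable_bin
```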
---
## 🚀 Quick Start
### Prerequisites
- Python 3.10+
- MuJoCo 3.x
- EGL (headless GPU rendering)
### Local Setup
```bash
# Clone
git clone https://github.com/KaustubhUp025/SemSorter.git
cd SemSorter
# Install dependencies
pip install -r requirements-server.txt
# Configure API keys
cp .env.example .env
# Edit .env with your keys:
# GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
# STREAM_API_KEY, STREAM_API_SECRET
# Run
MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
# Open http://localhost:8000
```
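
Before starting the server, it can help to fail fast if `.env` is incomplete. A minimal stdlib sketch that checks the five keys listed above, reading simple `KEY=value` lines from `.env` and falling back to the process environment. It does not handle quoting or `export` prefixes the way `python-dotenv` does, and the helper names are hypothetical.

```python
import os
from pathlib import Path

REQUIRED_KEYS = (
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
)


def parse_env_file(path: Path) -> dict[str, str]:
    """Read simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(path: Path = Path(".env")) -> list[str]:
    """Keys that are empty or absent in both the .env file and os.environ."""
    values = parse_env_file(path)
    return [k for k in REQUIRED_KEYS if not (values.get(k) or os.environ.get(k))]


missing = missing_keys()
print("Missing API keys:", ", ".join(missing) if missing else "none")
```

Run it from the repository root, next to `.env`, before launching `uvicorn`.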
### Voice Agent (Vision-Agents SDK CLI)
```bash
cd Vision-Agents
MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
```
---
## 📦 Project Structure
```
SemSorter/
├── SemSorter/
│   ├── simulation/
│   │   ├── controller.py          # MuJoCo sim + IK + pick-and-place
│   │   └── semsorter_scene.xml    # MJCF scene (Panda + conveyor + bins)
│   ├── vision/
│   │   ├── vision_pipeline.py     # Gemini VLM hazard detection
│   │   └── vlm_bridge.py          # VLM → sim item matching
│   ├── agent/
│   │   ├── agent.py               # Vision-Agents SDK agent
│   │   └── semsorter_instructions.md
│   └── server/
│       ├── app.py                 # FastAPI + WebSocket video stream
│       ├── agent_bridge.py        # SDK bridge + quota detection
│       └── static/index.html      # Web UI
├── Vision-Agents/                 # GetStream Vision-Agents SDK
├── Dockerfile
├── render.yaml
└── requirements-server.txt
```
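
`vlm_bridge.py` matches VLM detections to simulator items. One plausible way to do that, sketched here as an assumption rather than the project's actual logic: keep only sim items of the same kind, pick the one whose projected image position is closest to the detection, and reject it if it is farther than a pixel threshold (so a spurious detection does not grab a distant item). `SimItem`, `VlmDetection`, `match_detection`, and the coordinates are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SimItem:
    name: str                   # hypothetical MuJoCo body name
    kind: str                   # item type, e.g. "can" or "bottle"
    pixel: tuple[float, float]  # item position projected into the camera image


@dataclass
class VlmDetection:
    kind: str
    pixel: tuple[float, float]  # detection centre reported by the VLM


def match_detection(
    det: VlmDetection, items: list[SimItem], max_dist: float = 50.0
) -> SimItem | None:
    """Return the nearest same-kind sim item within max_dist pixels, else None."""
    candidates = [it for it in items if it.kind == det.kind]
    if not candidates:
        return None
    best = min(candidates, key=lambda it: math.dist(it.pixel, det.pixel))
    return best if math.dist(best.pixel, det.pixel) <= max_dist else None


items = [
    SimItem("can_0", "can", (100, 200)),
    SimItem("can_1", "can", (300, 210)),
    SimItem("bottle_0", "bottle", (105, 198)),
]
print(match_detection(VlmDetection("can", (110, 205)), items).name)  # → can_0
```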
---
## 🔑 API Keys Required
| Service | Purpose | Free tier |
|---|---|---|
| Google Gemini | LLM orchestration + VLM detection | 15 RPM |
| Deepgram | Speech-to-Text | 45 min/month |
| ElevenLabs | Text-to-Speech | ~10k chars/month |
| GetStream | Real-time video call (Voice agent) | Free tier available |

> **API exhaustion handling:** The server detects quota errors (`429` / `ResourceExhausted`) and switches only the affected service to demo mode, showing a banner in the UI.
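
A minimal sketch of that per-service fallback, assuming quota errors are recognized by the text `429` or `ResourceExhausted` in the exception's class name or message. This string matching is a crude stand-in; a real implementation would more likely catch the client libraries' own exception types. `QuotaGuard` and its methods are hypothetical and not taken from `agent_bridge.py`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

QUOTA_MARKERS = ("429", "ResourceExhausted")


class QuotaGuard:
    """Tracks which services hit their quota and serves demo output for them."""

    def __init__(self) -> None:
        self.exhausted: set[str] = set()

    @staticmethod
    def is_quota_error(exc: Exception) -> bool:
        # Include the class name so e.g. a ResourceExhausted exception matches.
        text = f"{type(exc).__name__}: {exc}"
        return any(marker in text for marker in QUOTA_MARKERS)

    def call(self, service: str, live: Callable[[], T], demo: Callable[[], T]) -> T:
        """Use the live API until it reports a quota error, then stay in demo mode."""
        if service in self.exhausted:
            return demo()
        try:
            return live()
        except Exception as exc:
            if self.is_quota_error(exc):
                self.exhausted.add(service)  # the UI could show a banner per entry
                return demo()
            raise  # non-quota errors are not hidden


guard = QuotaGuard()


def failing_tts() -> str:
    raise RuntimeError("429 Too Many Requests")


print(guard.call("elevenlabs", failing_tts, lambda: "[demo audio]"))  # → [demo audio]
print(guard.exhausted)  # → {'elevenlabs'}
```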
---
## 🐳 Deploy to Render
1. Fork this repo
2. On [Render.com](https://render.com), create a new **Blueprint** (which reads `render.yaml`) or a **Web Service**, pointing to your fork
3. Add your API keys as **Environment Variables** in the Render dashboard
4. Done. Render builds the app and auto-deploys on each push to your fork