---
title: SemSorter
emoji: ♻️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# SemSorter — AI Hazard Sorting System

> **Real-time robotic arm simulation controlled by a multimodal AI agent using the [Vision-Agents SDK](https://github.com/GetStream/vision-agents) by GetStream.**

[![Demo](https://img.shields.io/badge/Live%20Demo-Render.com-4f46e5)](https://semsorter.onrender.com)
[![GitHub](https://img.shields.io/badge/GitHub-KaustubhUp025-181717?logo=github)](https://github.com/KaustubhUp025/SemSorter)

---

## 🤖 Overview

SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:

1. **Watches** the conveyor belt via a live camera feed
2. **Detects** hazardous items (flammable / chemical) using **Gemini VLM**
3. **Plans and executes** pick-and-place operations via **Gemini LLM function-calling**
4. **Speaks back** results using **ElevenLabs TTS**
5. **Listens** to voice commands via **Deepgram STT**

All orchestration uses the **[Vision-Agents SDK](https://github.com/GetStream/vision-agents)** by GetStream.
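
The detect, plan, and report steps above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Detection` type, the bin names, and the `plan_sort` and `summarize` helpers are hypothetical and are not taken from the SemSorter code or the Vision-Agents SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bin names; the real scene's bin identifiers may differ.
BIN_FOR_HAZARD = {"flammable": "flammable_bin", "chemical": "chemical_bin"}


@dataclass(frozen=True)
class Detection:
    item_id: str           # hypothetical identifier of an item on the belt
    hazard: Optional[str]  # "flammable", "chemical", or None if harmless


def plan_sort(detections: list[Detection]) -> list[tuple[str, str]]:
    """Return (item_id, bin_name) pick-and-place steps for hazardous items only."""
    steps = []
    for det in detections:
        bin_name = BIN_FOR_HAZARD.get(det.hazard)
        if bin_name is not None:
            steps.append((det.item_id, bin_name))
    return steps


def summarize(steps: list[tuple[str, str]]) -> str:
    """Build the sentence that a text-to-speech stage could speak back."""
    if not steps:
        return "No hazardous items detected."
    return f"Sorted {len(steps)} hazardous item(s)."
```

In this sketch, harmless items are simply left on the belt; only items tagged as flammable or chemical produce a pick-and-place step.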

---

## 🏗 Architecture

```
Browser  ←─── WebSocket ───→  FastAPI Server
                                    │
                          Vision-Agents SDK Agent
                          ┌─────────┴──────────┐
                     gemini.LLM          deepgram.STT
                     (tool-calling)      (voice→text)
                          │
                     VLM Bridge
                          │
                     MuJoCo Sim (Franka Panda)
```
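
The tool-calling path in the diagram, from the LLM down to the simulator, can be sketched as a small name-to-function registry. This is a generic illustration and not the Vision-Agents SDK's own registration API: `tool`, `dispatch`, `pick_and_place`, and the `{"name": ..., "args": ...}` call shape are all assumptions.

```python
from typing import Any, Callable

# Registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so a model-issued call can reach it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def pick_and_place(item_id: str, bin_name: str) -> dict:
    # Placeholder: a real handler would drive the MuJoCo controller here.
    return {"ok": True, "item_id": item_id, "bin_name": bin_name}


def dispatch(call: dict) -> dict:
    """Execute one function call of the form {"name": ..., "args": {...}}."""
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {call.get('name')}"}
    try:
        return fn(**call.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments from the model
        return {"ok": False, "error": str(exc)}
```

Returning an error dictionary instead of raising lets the result go back to the model as a tool response, so it has a chance to correct the call.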

---

## 🚀 Quick Start

### Prerequisites
- Python 3.10+
- MuJoCo 3.x
- EGL (headless GPU rendering)

### Local Setup

```bash
# Clone
git clone https://github.com/KaustubhUp025/SemSorter.git
cd SemSorter

# Install dependencies
pip install -r requirements-server.txt

# Configure API keys
cp .env.example .env
# Edit .env with your keys:
# GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
# STREAM_API_KEY, STREAM_API_SECRET

# Run
MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
# Open http://localhost:8000
```
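
Before starting the server, it can help to confirm that all five keys are present. The helper below is a hypothetical sketch, not part of the repo. It only inspects the process environment, so values from `.env` need to be exported or loaded first.

```python
import os
from typing import Mapping

# Key names taken from the .env comment in the setup steps.
REQUIRED_KEYS = (
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no arguments checks the current environment and returns an empty list when everything is set.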

### Voice Agent (Vision-Agents SDK CLI)
```bash
cd Vision-Agents
MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run
```

---

## 📦 Project Structure

```
SemSorter/
├── SemSorter/
│   ├── simulation/
│   │   ├── controller.py          # MuJoCo sim + IK + pick-and-place
│   │   └── semsorter_scene.xml    # MJCF scene (Panda + conveyor + bins)
│   ├── vision/
│   │   ├── vision_pipeline.py     # Gemini VLM hazard detection
│   │   └── vlm_bridge.py         # VLM → sim item matching
│   ├── agent/
│   │   ├── agent.py               # Vision-Agents SDK agent
│   │   └── semsorter_instructions.md
│   └── server/
│       ├── app.py                 # FastAPI + WebSocket video stream
│       ├── agent_bridge.py        # SDK bridge + quota detection
│       └── static/index.html      # Web UI
├── Vision-Agents/                 # GetStream Vision-Agents SDK
├── Dockerfile
├── render.yaml
└── requirements-server.txt
```
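
As a rough idea of what "VLM → sim item matching" can involve, here is a greedy nearest-neighbour sketch. It is not the code in `vlm_bridge.py`: the function name, the `max_dist` threshold, and the assumption that detections and sim items share one (x, y) coordinate frame are all hypothetical.

```python
import math

Point = tuple[float, float]


def match_detections(
    detections: dict[str, Point],
    sim_items: dict[str, Point],
    max_dist: float = 0.1,
) -> dict[str, str]:
    """Greedily pair each detection with the nearest unused sim item.

    Both inputs map a name to an (x, y) position in the same coordinate frame.
    Pairs farther apart than `max_dist` are left unmatched.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(d_pos, s_pos), d_name, s_name)
        for d_name, d_pos in detections.items()
        for s_name, s_pos in sim_items.items()
    )
    matches: dict[str, str] = {}
    used: set[str] = set()
    for dist, d_name, s_name in pairs:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if d_name not in matches and s_name not in used:
            matches[d_name] = s_name
            used.add(s_name)
    return matches
```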

---

## 🔑 API Keys Required

| Service | Purpose | Free tier |
|---|---|---|
| Google Gemini | LLM orchestration + VLM detection | 15 RPM |
| Deepgram | Speech-to-Text | 45 min/month |
| ElevenLabs | Text-to-Speech | ~10k chars/month |
| GetStream | Real-time video call (voice agent) | Free tier available |

> **API exhaustion handling:** The server detects quota errors (`429` / `ResourceExhausted`) and automatically switches the affected service to demo mode, showing a banner in the UI.
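
The per-service fallback described above can be sketched as follows. This is an assumed design and not the code in `agent_bridge.py`: `QuotaGuard`, its string-matching heuristic, and the `live_fn` / `demo_fn` callables are hypothetical.

```python
class QuotaGuard:
    """Track which services have hit their quota and should run in demo mode."""

    def __init__(self) -> None:
        self.demo_services: set[str] = set()

    @staticmethod
    def is_quota_error(exc: Exception) -> bool:
        # Heuristic: match on the exception type name or message text.
        text = f"{type(exc).__name__} {exc}"
        return "429" in text or "ResourceExhausted" in text

    def call(self, service, live_fn, demo_fn):
        """Run live_fn unless `service` is in demo mode; fall back on quota errors."""
        if service in self.demo_services:
            return demo_fn()
        try:
            return live_fn()
        except Exception as exc:
            if not self.is_quota_error(exc):
                raise  # unrelated failures should still surface
            self.demo_services.add(service)
            return demo_fn()
```

Because the set is keyed by service name, exhausting one quota (for example Gemini) leaves the other services running live.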

---

## 🐳 Deploy to Render

1. Fork this repo
2. Create a new **Web Service** on [Render.com](https://render.com) pointing to your fork
3. Add your API keys as **Environment Variables** in the Render dashboard
4. Done — Render auto-deploys from `render.yaml`