title: SemSorter
emoji: ♻️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

SemSorter — AI Hazard Sorting System

Real-time robotic arm simulation controlled by a multimodal AI agent using the Vision-Agents SDK by GetStream.

🤖 Overview

SemSorter is an AI-powered hazardous waste sorting system where a Franka Panda robotic arm, simulated in MuJoCo, is controlled by a multimodal AI agent. The agent:

Watches the conveyor belt via a live camera feed
Detects hazardous items (flammable / chemical) using Gemini VLM
Plans and executes pick-and-place operations via Gemini LLM function-calling
Speaks back results using ElevenLabs TTS
Listens to voice commands via Deepgram STT

All orchestration uses the Vision-Agents SDK by GetStream.
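To make the function-calling step more concrete, here is a minimal sketch of how a model-issued tool call could be routed to a Python function. It does not use the Vision-Agents SDK or Gemini APIs: the `tool` decorator, the `dispatch` helper, the `pick_and_place` signature, and the `{"name": ..., "args": ...}` payload shape are all assumptions made for illustration, not SemSorter's actual code.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry; names here are illustrative, not SemSorter's real API.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function so the dispatcher can look it up by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def pick_and_place(item_id: str, bin_name: str) -> Dict[str, Any]:
    # A real implementation would drive the MuJoCo controller here.
    return {"status": "ok", "item_id": item_id, "bin": bin_name}


def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run one function call of the assumed form {"name": ..., "args": {...}}."""
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return {"status": "error", "reason": f"unknown tool: {name}"}
    try:
        return fn(**call.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments from the model
        return {"status": "error", "reason": str(exc)}


if __name__ == "__main__":
    call = {"name": "pick_and_place",
            "args": {"item_id": "item_3", "bin_name": "flammable"}}
    print(json.dumps(dispatch(call)))
```

Returning an error dictionary instead of raising keeps the loop alive when the model names a tool that does not exist or passes the wrong arguments, so the failure can be reported back to the model or the user.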

🏗 Architecture

Browser  ←─── WebSocket ───→  FastAPI Server
                                    │
                          Vision-Agents SDK Agent
                          ┌─────────┴──────────┐
                     gemini.LLM          deepgram.STT
                     (tool-calling)      (voice→text)
                          │
                     VLM Bridge
                          │
                     MuJoCo Sim (Franka Panda)

🚀 Quick Start

Prerequisites

Python 3.10+
MuJoCo 3.x
EGL (headless GPU rendering)

Local Setup

# Clone
git clone https://github.com/KaustubhUp025/SemSorter.git
cd SemSorter

# Install dependencies
pip install -r requirements-server.txt

# Configure API keys
cp .env.example .env
# Edit .env with your keys:
# GOOGLE_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY
# STREAM_API_KEY, STREAM_API_SECRET

# Run
MUJOCO_GL=egl uvicorn SemSorter.server.app:app --host 0.0.0.0 --port 8000
# Open http://localhost:8000
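
Before starting the server, it can help to confirm that the keys are actually visible to the process. The sketch below only checks the process environment for the five key names listed above; it does not read `.env` itself, so it assumes the variables have already been exported or loaded. The `missing_keys` helper is hypothetical and not part of SemSorter.

```python
import os
from typing import List, Mapping

# Key names taken from the .env comments in the setup steps above.
REQUIRED_KEYS = [
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
]


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```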

Voice Agent (Vision-Agents SDK CLI)

cd Vision-Agents
MUJOCO_GL=egl uv run python ../SemSorter/agent/agent.py run

📦 Project Structure

SemSorter/
├── SemSorter/
│   ├── simulation/
│   │   ├── controller.py          # MuJoCo sim + IK + pick-and-place
│   │   └── semsorter_scene.xml    # MJCF scene (Panda + conveyor + bins)
│   ├── vision/
│   │   ├── vision_pipeline.py     # Gemini VLM hazard detection
│   │   └── vlm_bridge.py          # VLM → sim item matching
│   ├── agent/
│   │   ├── agent.py               # Vision-Agents SDK agent
│   │   └── semsorter_instructions.md
│   └── server/
│       ├── app.py                 # FastAPI + WebSocket video stream
│       ├── agent_bridge.py        # SDK bridge + quota detection
│       └── static/index.html      # Web UI
├── Vision-Agents/                 # GetStream Vision-Agents SDK
├── Dockerfile
├── render.yaml
└── requirements-server.txt

🔑 API Keys Required

Service	Purpose	Free tier
Google Gemini	LLM orchestration + VLM detection	15 RPM
Deepgram	Speech-to-Text	45 min/month
ElevenLabs	Text-to-Speech	~10k chars/month
GetStream	Real-time video call (Voice agent)	Free tier available

API exhaustion handling: The server detects quota errors (HTTP 429 / ResourceExhausted) and automatically switches the affected service to demo mode, showing a banner in the UI.
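
The per-service fallback described above could be sketched roughly as follows. This is not the code in `agent_bridge.py`: the `DemoModeTracker` class, its method names, and the string-matching heuristic are assumptions chosen to illustrate the idea of classifying a failure as a quota error and flipping only that service into demo mode.

```python
from typing import Dict, Optional


class DemoModeTracker:
    """Tracks which services have fallen back to demo mode (hypothetical helper)."""

    def __init__(self) -> None:
        self.demo: Dict[str, bool] = {}

    @staticmethod
    def is_quota_error(exc: Exception, status_code: Optional[int] = None) -> bool:
        # Treat HTTP 429 or a "ResourceExhausted" marker as quota exhaustion.
        if status_code == 429:
            return True
        # Crude text match on the exception type and message; an assumption,
        # since each provider's client reports quota errors differently.
        text = f"{type(exc).__name__} {exc}".lower()
        return "resourceexhausted" in text or "429" in text

    def record_failure(self, service: str, exc: Exception,
                       status_code: Optional[int] = None) -> bool:
        """Switch the service to demo mode if the failure looks like a quota error.

        Returns True if the service is now in demo mode.
        """
        if self.is_quota_error(exc, status_code):
            self.demo[service] = True
        return self.demo.get(service, False)

    def banner(self) -> str:
        """Text for a UI banner, or an empty string if no service is in demo mode."""
        active = sorted(name for name, on in self.demo.items() if on)
        return f"Demo mode: {', '.join(active)}" if active else ""
```

Keeping one flag per service means that exhausting, say, the speech-to-text quota does not disable detection or speech output that still have quota left.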

🐳 Deploy to Render

1. Fork this repo.
2. Create a new Web Service on Render.com pointing to your fork.
3. Add your API keys as Environment Variables in the Render dashboard.
4. Done: Render auto-deploys from render.yaml.