Mihir Mithani
metadata
title: GarbageBot RL Control Center
emoji: 🗑️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
tags:
  - openenv
  - robotics
  - reinforcement-learning
  - llama-3.2

🤖 Garbage Collecting Robot — OpenEnv

An OpenEnv-compliant reinforcement learning environment for a garbage-collecting robot. The agent must navigate a grid-world room and pick up garbage while managing its battery and a limited storage capacity.

Why Garbage Collection?

Autonomous garbage collection is a classic robotics challenge that combines pathfinding, resource management (battery), and state management (storage capacity). This environment provides a compact, discrete training ground in which AI agents can learn:

  • Optimal Navigation — shortest paths via BFS and Q-Learning.
  • Resource Management — returning to base for charging before battery depletion.
  • Logistics — managing a 6-unit storage bin and prioritizing unload cycles.
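The navigation and battery goals above can be illustrated with a small sketch. It assumes a square grid indexed by (x, y), 4-connected moves, and a cost of one battery unit per move; `bfs_distance` and `should_return_home` are hypothetical helpers, not functions taken from environment.py.

```python
from collections import deque
from typing import Optional, Set, Tuple

Pos = Tuple[int, int]


def bfs_distance(start: Pos, goal: Pos, size: int, obstacles: Set[Pos]) -> Optional[int]:
    """Length of the shortest 4-connected path from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), dist = frontier.popleft()
        for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            nxt = (x + dx, y + dy)
            # Skip cells outside the grid, obstacles, and cells already queued.
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            if nxt in obstacles or nxt in seen:
                continue
            if nxt == goal:
                return dist + 1
            seen.add(nxt)
            frontier.append((nxt, dist + 1))
    return None


def should_return_home(pos: Pos, home: Pos, battery: int, size: int,
                       obstacles: Set[Pos], margin: int = 1) -> bool:
    """True when the remaining battery only just covers the trip home.

    Assumes each move costs one battery unit (hypothetical cost model).
    """
    dist = bfs_distance(pos, home, size, obstacles)
    if dist is None:
        return False  # home is unreachable, so heading there cannot help
    return battery <= dist + margin
```

The `margin` parameter makes the rule trigger slightly early, which leaves slack for a detour; its default of 1 is an arbitrary illustrative choice.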

Architecture

The environment is a discrete grid world where the robot interacts with garbage, obstacles, a charging station (Home), and an Unload Station.

┌──────────┐
│ Dashboard│ (FastAPI + Vanilla JS)
└─────┬────┘
      ▼
┌──────────┐
│ API      │ (app.py)
└─────┬────┘
      ▼
┌──────────┐
│ Env Logic│ (environment.py)
└──────────┘

Tasks

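| Task ID     | Difficulty | Description                                       | Grid Size |
|-------------|------------|---------------------------------------------------|-----------|
| task_easy   | 🟢 Easy    | Small 5x5 grid, 1 piece of garbage.               | 5x5       |
| task_medium | 🟡 Medium  | 7x7 grid with obstacles, 3 pieces of garbage.     | 7x7       |
| task_hard   | 🔴 Hard    | 10x10 maze, 5 pieces of garbage, strict battery.  | 10x10     |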
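One minimal way to encode the task table is a frozen dataclass per task. `TaskConfig` and `TASKS` are hypothetical names and may not match scenarios.py; battery limits are deliberately left out because the table does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    task_id: str
    grid_size: int       # side length of the square grid
    num_garbage: int     # pieces of garbage placed at reset
    has_obstacles: bool  # medium adds obstacles; hard is a maze


# Values copied from the task table; everything else is left to the environment.
TASKS = {
    "task_easy": TaskConfig("task_easy", grid_size=5, num_garbage=1, has_obstacles=False),
    "task_medium": TaskConfig("task_medium", grid_size=7, num_garbage=3, has_obstacles=True),
    "task_hard": TaskConfig("task_hard", grid_size=10, num_garbage=5, has_obstacles=True),
}
```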

Action Space

Movement and interaction commands:

  • UP, DOWN, LEFT, RIGHT: Move the robot one cell.
  • COLLECT: Pick up garbage if the robot is on its cell.
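A sketch of how these five actions might update the robot's state. The coordinate convention (UP decreases y), leaving the robot in place on a blocked move, and refusing COLLECT when the 6-unit bin is full are all assumptions; `Action` and `apply_action` are hypothetical names rather than the environment's actual API.

```python
from enum import Enum
from typing import Set, Tuple

Pos = Tuple[int, int]


class Action(str, Enum):
    UP = "UP"
    DOWN = "DOWN"
    LEFT = "LEFT"
    RIGHT = "RIGHT"
    COLLECT = "COLLECT"


# Assumed convention: y grows downward, so UP decreases y.
DELTAS = {
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}


def apply_action(pos: Pos, action: Action, size: int, obstacles: Set[Pos],
                 garbage: Set[Pos], load: int,
                 capacity: int = 6) -> Tuple[Pos, Set[Pos], int]:
    """Return (new_pos, remaining_garbage, new_load) under hypothetical rules."""
    if action is Action.COLLECT:
        if pos in garbage and load < capacity:
            return pos, garbage - {pos}, load + 1
        return pos, garbage, load  # nothing here, or the bin is full
    dx, dy = DELTAS[action]
    nxt = (pos[0] + dx, pos[1] + dy)
    if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in obstacles:
        return nxt, garbage, load
    return pos, garbage, load  # walls and obstacles leave the robot in place
```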

Observation Space

The environment returns an observation with the following fields:

  • robot_position: (x, y)
  • garbage_positions: List of (x, y)
  • battery_level: Current battery charge, relative to the maximum.
  • current_storage_load: Number of items currently held, out of a capacity of 6.
  • robot_mode: normal, recharging, or unloading.
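The observation can be modeled as a plain dataclass. This is a sketch under assumptions: the real schema lives in models.py and may differ, and `max_battery`, `storage_capacity`, `battery_fraction`, and `storage_full` are hypothetical additions that express the "vs max" and "vs capacity" comparisons.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple

Pos = Tuple[int, int]
RobotMode = Literal["normal", "recharging", "unloading"]


@dataclass
class Observation:
    robot_position: Pos
    garbage_positions: List[Pos] = field(default_factory=list)
    battery_level: int = 0
    max_battery: int = 0           # hypothetical field for the battery maximum
    current_storage_load: int = 0
    storage_capacity: int = 6      # capacity stated in this README
    robot_mode: RobotMode = "normal"

    def battery_fraction(self) -> float:
        # Guard against an unset maximum instead of dividing by zero.
        return self.battery_level / self.max_battery if self.max_battery else 0.0

    def storage_full(self) -> bool:
        return self.current_storage_load >= self.storage_capacity
```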

Policy Priority Chain

Decisions are resolved from the following sources, in priority order:

  1. Q-Learning Table: pre-trained tabular policy (qtable.json).
  2. Llama-3.2-3B-Instruct: fine-tuned LLM policy.
  3. BFS Heuristic: shortest-path fallback when the sources above do not produce an action.
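The priority chain can be sketched as a resolver that walks an ordered list of policies and returns the first action produced. This assumes each policy is a callable that returns an action name or `None` and may raise on failure; `resolve_action` and the three stand-in policies are hypothetical and are not taken from inference.py.

```python
from typing import Callable, Optional, Sequence, Tuple

# A policy maps an observation to an action name, or None if it cannot decide.
Policy = Callable[[dict], Optional[str]]


def resolve_action(obs: dict, chain: Sequence[Tuple[str, Policy]]) -> Tuple[str, str]:
    """Return (policy_name, action) from the first policy that yields an action."""
    for name, policy in chain:
        try:
            action = policy(obs)
        except Exception:
            action = None  # e.g. missing weights or an LLM timeout: fall through
        if action is not None:
            return name, action
    raise RuntimeError("no policy produced an action")


def qtable_policy(obs: dict) -> Optional[str]:
    table: dict = {}  # stand-in for a loaded qtable.json; empty here
    return table.get(str(obs["robot_position"]))


def llm_policy(obs: dict) -> Optional[str]:
    return None  # stand-in: pretend the LLM is unavailable


def bfs_policy(obs: dict) -> Optional[str]:
    return "RIGHT"  # stand-in for a real BFS step toward the nearest goal


CHAIN = [("qlearning", qtable_policy), ("llm", llm_policy), ("bfs", bfs_policy)]
```

With these stand-ins, `resolve_action({"robot_position": (0, 0)}, CHAIN)` falls through to the BFS entry. Catching broad exceptions keeps one failing source from stalling the robot, at the cost of hiding bugs; logging inside the `except` branch would be a reasonable addition.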

Local Development

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app:app --host 0.0.0.0 --port 7860

# 3. Train the Q-learning policy
python qlearning.py --train --episodes 10000
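For reference, the standard tabular Q-learning update with epsilon-greedy exploration is sketched below. It is not the project's training code: the learning rate `alpha`, discount `gamma`, the dictionary-based table, and the helper names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable

ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")
QTable = DefaultDict[Hashable, Dict[str, float]]


def make_qtable() -> QTable:
    # Unseen states start with every action valued at 0.0.
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def epsilon_greedy(q: QTable, state: Hashable, epsilon: float, rng: random.Random) -> str:
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(values, key=values.get)


def q_update(q: QTable, state: Hashable, action: str, reward: float,
             next_state: Hashable, done: bool,
             alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
```

A training run repeats `epsilon_greedy` and `q_update` over many episodes; the table can then be serialized, for example to JSON after converting state keys to strings.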

Project Structure

├── app.py              # FastAPI server
├── environment.py      # Core RL logic
├── models.py           # Data schemas
├── scenarios.py        # Task definitions
├── qlearning.py        # Tabular RL training
├── inference.py        # Policy resolver
├── frontend/           # Dashboard HTML/CSS/JS
├── qtable.json         # Trained policy weights
├── Dockerfile          # Deployment container
└── README.md           # This file