---
title: GarbageBot - RL Control Center
emoji: 🗑️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
tags:
  - openenv
  - robotics
  - reinforcement-learning
  - llama-3.2
---

# 🤖 Garbage Collecting Robot - OpenEnv

An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent must navigate a grid room to pick up garbage while managing battery constraints and storage capacity.

## Why Garbage Collection?

Autonomous garbage collection is a classic robotics challenge involving pathfinding, resource management (battery), and state management (storage capacity). This environment provides a realistic training ground for AI agents to learn:

- **Optimal Navigation**: shortest paths via BFS and Q-Learning.
- **Resource Management**: returning to base for charging before battery depletion.
- **Logistics**: managing a 6-unit storage bin and prioritizing unload cycles.

---

## Architecture

The environment is a discrete grid world where the robot interacts with garbage, obstacles, a charging station (Home), and an Unload Station.

```
┌──────────┐
│ Dashboard│  (FastAPI + Vanilla JS)
└─────┬────┘
      ▼
┌──────────┐
│   API    │  (app.py)
└─────┬────┘
      ▼
┌──────────┐
│ Env Logic│  (environment.py)
└──────────┘
```

---

## Tasks

| Task ID | Difficulty | Description | Grid Size |
|---------|-----------|-------------|-----------|
| `task_easy` | 🟢 Easy | Small 5x5 grid, 1 piece of garbage. | 5x5 |
| `task_medium` | 🟡 Medium | 7x7 grid with obstacles, 3 pieces of garbage. | 7x7 |
| `task_hard` | 🔴 Hard | 10x10 maze, 5 pieces of garbage, strict battery. | 10x10 |

---

## Action Space

Movement and interaction commands:

- `UP`, `DOWN`, `LEFT`, `RIGHT`: Move the robot one cell.
- `COLLECT`: Pick up garbage if the robot is on its cell.

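The action semantics above can be illustrated with a minimal sketch. This is not the code in `environment.py`: the function name `apply_action`, the coordinate convention (`UP` decreases `y`), and the rule that a blocked move leaves the robot in place are all assumptions made for illustration.

```python
from typing import FrozenSet, Tuple

Position = Tuple[int, int]

# Assumed convention: positions are (x, y) with y growing downward,
# so UP decreases y. The real environment may use a different one.
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def apply_action(
    pos: Position,
    action: str,
    grid_size: int,
    obstacles: FrozenSet[Position],
    garbage: FrozenSet[Position],
) -> Tuple[Position, FrozenSet[Position], bool]:
    """Return (new_position, remaining_garbage, collected_flag)."""
    if action == "COLLECT":
        # COLLECT only succeeds when the robot stands on a garbage cell.
        if pos in garbage:
            return pos, garbage - {pos}, True
        return pos, garbage, False

    if action not in MOVES:
        raise ValueError(f"unknown action: {action}")

    dx, dy = MOVES[action]
    nxt = (pos[0] + dx, pos[1] + dy)
    in_bounds = 0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size
    # Assumption: moving into a wall or obstacle is a no-op.
    if not in_bounds or nxt in obstacles:
        return pos, garbage, False
    return nxt, garbage, False
```

For example, `apply_action((0, 0), "RIGHT", 5, frozenset(), frozenset({(1, 0)}))` moves the robot onto the garbage cell, and a following `COLLECT` removes that cell from the garbage set.
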

---

## Observation Space

The environment returns a detailed state:

- `robot_position`: `(x, y)`
- `garbage_positions`: List of `(x, y)`
- `battery_level`: Current battery vs max.
- `current_storage_load`: Current items vs capacity (6).
- `robot_mode`: `normal`, `recharging`, or `unloading`.

---

## Policy Priority Chain

Decisions can be driven by:

1. **Q-Learning Table**: pre-trained optimal policy.
2. **Llama-3.2-3B-Instruct**: fine-tuned LLM policy.
3. **BFS Heuristic**: reliable fallback pathfinding.

---

## Local Development

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app:app --host 0.0.0.0 --port 7860

# 3. Training
python qlearning.py --train --episodes 10000
```

---

## Project Structure

```
├── app.py              # FastAPI server
├── environment.py      # Core RL logic
├── models.py           # Data schemas
├── scenarios.py        # Task definitions
├── qlearning.py        # Tabular RL training
├── inference.py        # Policy resolver
├── frontend/           # Dashboard HTML/CSS/JS
├── qtable.json         # Trained policy weights
├── Dockerfile          # Deployment container
└── README.md           # This file
```
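
As a rough illustration of the policy priority chain described earlier, the sketch below tries a list of policies in order and falls back to a BFS step when the earlier ones decline. It does not reflect the actual contents of `inference.py`: the names `resolve_action` and `bfs_first_action`, the zero-argument policy callables, and the `(x, y)` convention with `UP` decreasing `y` are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Set, Tuple

Position = Tuple[int, int]

# Assumed convention: (x, y) with y growing downward.
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def bfs_first_action(
    start: Position, goal: Position, grid_size: int, obstacles: Set[Position]
) -> Optional[str]:
    """Return the first move of a shortest path to goal, or None if unreachable."""
    if start == goal:
        # Assumption: the goal is a garbage cell, so collect on arrival.
        return "COLLECT"
    queue = deque([(start, None)])  # (cell, first action taken from start)
    seen = {start}
    while queue:
        pos, first = queue.popleft()
        for name, (dx, dy) in MOVES.items():
            nxt = (pos[0] + dx, pos[1] + dy)
            if not (0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size):
                continue
            if nxt in obstacles or nxt in seen:
                continue
            step = first or name  # remember only the very first move
            if nxt == goal:
                return step
            seen.add(nxt)
            queue.append((nxt, step))
    return None


def resolve_action(policies: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Return the first non-None action, trying policies in priority order."""
    for policy in policies:
        action = policy()
        if action is not None:
            return action
    return None
```

A caller might pass three callables in order, for instance a Q-table lookup that returns `None` for unseen states, an LLM call that returns `None` on failure, and `lambda: bfs_first_action(robot, target, size, obstacles)` as the final fallback.
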