Mihir Mithani committed
Commit · a8d4cdf
0 Parent(s)
Sync Hub-enabled code to Space (no weights)
Changed files:
- .dockerignore +7 -0
- .env.example +4 -0
- .gitattributes +1 -0
- .gitignore +12 -0
- .gitmodules +3 -0
- DockerFile +16 -0
- Dockerfile +12 -0
- OpenEnv +1 -0
- README.md +115 -0
- app.py +253 -0
- code.py +78 -0
- code2.py +34 -0
- environment.py +500 -0
- fixed_dataset.jsonl +0 -0
- fixer.py +28 -0
- frontend/index.html +166 -0
- frontend/script.js +515 -0
- frontend/style.css +634 -0
- garbage_robot_dataset.jsonl +35 -0
- inference.py +520 -0
- inference_output.log +240 -0
- models.py +79 -0
- openenv.yaml +20 -0
- pyproject.toml +23 -0
- qlearning.py +345 -0
- qtable.json +0 -0
- requirements.txt +19 -0
- rl_trajectories.jsonl +0 -0
- run_pipeline.sh +17 -0
- scenarios.py +50 -0
- server.log +0 -0
- server.pid +1 -0
- test_env.py +19 -0
- train.pid +1 -0
- train_output.log +4 -0
- train_unsloth.py +144 -0
.dockerignore
ADDED
@@ -0,0 +1,7 @@
+Meta/
+Robot/
+.venv/
+__pycache__/
+*.pyc
+.git/
+.env
.env.example
ADDED
@@ -0,0 +1,4 @@
+API_BASE_URL=https://api.openai.com/v1
+MODEL_NAME=gpt-4o-mini
+HF_TOKEN=your_hf_or_api_key_here
+ENV_URL=http://localhost:7860
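The four variables in `.env.example` could be read at startup roughly as follows. This is a minimal sketch, not code from this commit: the `Settings` dataclass and the `load_settings` helper are hypothetical names, and the fallback values simply mirror the example file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values listed in .env.example."""
    api_base_url: str
    model_name: str
    hf_token: str
    env_url: str


def load_settings(environ=os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to the example defaults."""
    return Settings(
        api_base_url=environ.get("API_BASE_URL", "https://api.openai.com/v1"),
        model_name=environ.get("MODEL_NAME", "gpt-4o-mini"),
        hf_token=environ.get("HF_TOKEN", ""),  # empty string means no token configured
        env_url=environ.get("ENV_URL", "http://localhost:7860"),
    )
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real process environment.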
.gitattributes
ADDED
@@ -0,0 +1 @@
+model/*.safetensors filter=lfs diff=lfs merge=lfs -text
.gitignore
ADDED
@@ -0,0 +1,12 @@
+.venv/
+venv/
+model/
+Meta/
+Robot/
+__pycache__/
+*.pyc
+.env
+uv.lock
+test_run.log
+hf_test.log
+hf_test2.log
.gitmodules
ADDED
@@ -0,0 +1,3 @@
+[submodule "OpenEnv"]
+	path = OpenEnv
+	url = "https://github.com/techavenger123/OpenEnv#"
DockerFile
ADDED
@@ -0,0 +1,16 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Install Python dependencies first (cached layer)
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy all project files
+COPY . .
+
+# HuggingFace Spaces requires port 7860
+EXPOSE 7860
+
+# Launch FastAPI server
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
Dockerfile
ADDED
@@ -0,0 +1,12 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY . .
+
+EXPOSE 7860
+
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
OpenEnv
ADDED
@@ -0,0 +1 @@
+Subproject commit c719decf2b19175d5ca35301d58a14c83e985480
README.md
ADDED
@@ -0,0 +1,115 @@
+---
+title: GarbageBot — RL Control Center
+emoji: 🗑️
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+pinned: false
+tags:
+  - openenv
+  - robotics
+  - reinforcement-learning
+  - llama-3.2
+---
+
+# 🤖 Garbage Collecting Robot — OpenEnv
+
+An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent must navigate a grid room to pick up garbage while managing battery constraints and storage capacity.
+
+## Why Garbage Collection?
+
+Autonomous garbage collection is a classic robotics challenge involving pathfinding, resource management (battery), and state management (storage capacity). This environment provides a realistic training ground for AI agents to learn:
+- **Optimal Navigation** — shortest paths via BFS and Q-Learning.
+- **Resource Management** — returning to base for charging before battery depletion.
+- **Logistics** — managing a 6-unit storage bin and prioritizing unload cycles.
+
+---
+
+## Architecture
+
+The environment is a discrete grid world where the robot interacts with garbage, obstacles, a charging station (Home), and an Unload Station.
+
+```
+┌──────────┐
+│ Dashboard│ (FastAPI + Vanilla JS)
+└─────┬────┘
+      ▼
+┌──────────┐
+│   API    │ (app.py)
+└─────┬────┘
+      ▼
+┌──────────┐
+│ Env Logic│ (environment.py)
+└──────────┘
+```
+
+---
+
+## Tasks
+
+| Task ID | Difficulty | Description | Grid Size |
+|---------|-----------|-------------|-----------|
+| `task_easy` | 🟢 Easy | Small 5x5 grid, 1 piece of garbage. | 5x5 |
+| `task_medium` | 🟡 Medium | 7x7 grid with obstacles, 3 pieces of garbage. | 7x7 |
+| `task_hard` | 🔴 Hard | 10x10 maze, 5 pieces of garbage, strict battery. | 10x10 |
+
+---
+
+## Action Space
+
+Movement and interaction commands:
+- `UP`, `DOWN`, `LEFT`, `RIGHT`: Move the robot one cell.
+- `COLLECT`: Pick up garbage if the robot is on its cell.
+
+---
+
+## Observation Space
+
+The environment returns a detailed state:
+- `robot_position`: `(x, y)`
+- `garbage_positions`: List of `(x, y)`
+- `battery_level`: Current battery vs max.
+- `current_storage_load`: Current items vs capacity (6).
+- `robot_mode`: `normal`, `recharging`, or `unloading`.
+
+---
+
+## Policy Priority Chain
+
+Decisions can be driven by:
+1. **Q-Learning Table** — pre-trained optimal policy.
+2. **Llama-3.2-3B-Instruct** — fine-tuned LLM policy.
+3. **BFS Heuristic** — reliable fallback pathfinding.
+
+---
+
+## Local Development
+
+```bash
+# 1. Install dependencies
+pip install -r requirements.txt
+
+# 2. Start the server
+uvicorn app:app --host 0.0.0.0 --port 7860
+
+# 3. Training
+python qlearning.py --train --episodes 10000
+```
+
+---
+
+## Project Structure
+
+```
+├── app.py              # FastAPI server
+├── environment.py      # Core RL logic
+├── models.py           # Data schemas
+├── scenarios.py        # Task definitions
+├── qlearning.py        # Tabular RL training
+├── inference.py        # Policy resolver
+├── frontend/           # Dashboard HTML/CSS/JS
+├── qtable.json         # Trained policy weights
+├── Dockerfile          # Deployment container
+└── README.md           # This file
+```
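The "Policy Priority Chain" section of the README lists three decision sources in order. A minimal sketch of how such a chain might be resolved is shown here. It is an illustration under assumptions, not the logic in `inference.py`: `resolve_action` and the three stand-in policies are hypothetical names, and each policy is assumed to return an action string or `None` to abstain.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# A policy takes an observation dict and returns an action name, or None to abstain.
Policy = Callable[[Dict], Optional[str]]


def resolve_action(obs: Dict, chain: Iterable[Tuple[str, Policy]]) -> Tuple[str, str]:
    """Return (action, source) from the first policy that yields a valid action."""
    for source, policy in chain:
        try:
            action = policy(obs)
        except Exception:
            # A failing policy is treated as abstaining so the chain can continue.
            action = None
        if action in VALID_ACTIONS:
            return action, source
    raise RuntimeError("no policy in the chain produced a valid action")


def q_table_policy(obs: Dict) -> Optional[str]:
    return None  # stand-in: pretend this state is missing from the Q-table


def llm_policy(obs: Dict) -> Optional[str]:
    raise TimeoutError("model unavailable")  # stand-in for a failed LLM call


def bfs_policy(obs: Dict) -> Optional[str]:
    return "RIGHT"  # stand-in for the pathfinding fallback


chain = [("qtable", q_table_policy), ("llm", llm_policy), ("bfs", bfs_policy)]
print(resolve_action({"robot_position": (0, 0)}, chain))
```

With these stand-ins the first two sources abstain or fail, so the call falls through to the BFS entry. Keeping the source label next to the action mirrors the `"source"` field that the `/policy` endpoint returns.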
app.py
ADDED
@@ -0,0 +1,253 @@
+"""
+FastAPI server for the Garbage Collecting Robot OpenEnv environment.
+Exposes reset / step / state / tasks / grade / policy / configure endpoints.
+
+Fix applied:
+  - /policy BFS fallback now uses env.get_observation().dict() instead of
+    a hand-built incomplete dict (which was missing robot_mode, home_position,
+    unload_station, current_storage_load, storage_capacity, distance_from_home).
+  - Static files and /ui route added so the HTML dashboard is served from the
+    same origin — required for HuggingFace Spaces deployment.
+"""
+
+import os
+import sys
+sys.path.insert(0, os.path.dirname(__file__))
+
+from typing import List
+from pydantic import BaseModel
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse
+
+from environment import GarbageRobotEnv
+from models import (
+    Action, StepOutput, ResetInput, ResetOutput, CustomResetInput, State, Task,
+)
+
+app = FastAPI(
+    title="Garbage Collecting Robot — OpenEnv",
+    description=(
+        "An OpenEnv-compliant robotics environment for garbage collection. "
+        "AI agents must navigate a grid room to pick up garbage while managing battery constraints."
+    ),
+    version="1.0.0",
+)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+env = GarbageRobotEnv()
+
+TASKS = [
+    Task(
+        id="task_easy",
+        name="Small Room Clean",
+        description="Navigate a small 5x5 grid to collect 1 piece of garbage.",
+        difficulty="easy",
+        reward_range=[0.0, 1.0],
+    ),
+    Task(
+        id="task_medium",
+        name="Medium Room with Obstacles",
+        description="Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery.",
+        difficulty="medium",
+        reward_range=[0.0, 1.0],
+    ),
+    Task(
+        id="task_hard",
+        name="Large Maze Cleanup",
+        description="Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage.",
+        difficulty="hard",
+        reward_range=[0.0, 1.0],
+    ),
+]
+
+VALID_IDS = {t.id for t in TASKS}
+
+@app.get("/", tags=["health"])
+def health():
+    return {"status": "ok", "env": "garbage-collecting-robot"}
+
+@app.post("/reset", response_model=ResetOutput, tags=["openenv"])
+def reset(body: ResetInput = ResetInput()):
+    if body.task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+    state = env.reset(task_id=body.task_id)
+    return {"observation": env.get_observation().dict()}
+
+@app.post("/reset_custom", response_model=ResetOutput, tags=["openenv"])
+def reset_custom(body: CustomResetInput):
+    """
+    Dynamic reset endpoint. Lets callers specify garbage positions,
+    obstacle positions, robot start, grid size and battery at runtime.
+    Any omitted field falls back to the base scenario's value.
+    """
+    env.reset_custom(
+        task_id=body.task_id,
+        grid_size=body.grid_size,
+        robot_start=body.robot_start,
+        garbage_positions=body.garbage_positions,
+        obstacle_positions=body.obstacle_positions,
+        max_battery=body.max_battery,
+        storage_capacity=body.storage_capacity,
+        home_position=body.home_position,
+        unload_station=body.unload_station,
+    )
+    return {"observation": env.get_observation().dict()}
+
+@app.post("/step", response_model=StepOutput, tags=["openenv"])
+def step(body: Action):
+    result = env.step(command=body.command)
+    return result
+
+@app.get("/state", response_model=State, tags=["openenv"])
+def state():
+    return env.state()
+
+@app.get("/tasks", response_model=list[Task], tags=["openenv"])
+def tasks():
+    return TASKS
+
+@app.get("/grade/{task_id}", tags=["grading"])
+def grade(task_id: str):
+    if task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+    score = env.grade(task_id)
+    return {"task_id": task_id, "score": score, "reward_range": [0.0, 1.0]}
+
+
+# ── Policy endpoint (fine-tuned LLM) ──────────────────────────────────────
+
+LOCAL_MODEL_PATH = os.environ.get(
+    "LOCAL_MODEL_PATH",
+    "TechAvenger/GarbageBot-Weights"
+)
+
+_policy_model = None
+_policy_tokenizer = None
+_policy_loaded = False
+
+def _load_policy():
+    global _policy_model, _policy_tokenizer, _policy_loaded
+    if _policy_loaded:
+        return
+    try:
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+        import torch
+        _policy_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
+        _policy_model = AutoModelForCausalLM.from_pretrained(
+            LOCAL_MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
+        )
+        _policy_model.eval()
+        print(f"[Policy] Fine-tuned model loaded from {LOCAL_MODEL_PATH}")
+    except Exception as e:
+        print(f"[Policy] Model unavailable: {e}")
+    _policy_loaded = True
+
+
+class PolicyInput(BaseModel):
+    message: str  # the obs.message string from the environment
+
+@app.post("/policy", tags=["openenv"])
+def policy(body: PolicyInput):
+    """
+    Ask the fine-tuned LLM for the next action.
+    Returns {"action": "UP|DOWN|LEFT|RIGHT|COLLECT", "source": "llm|bfs"}
+    """
+    _load_policy()
+    VALID = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+
+    if _policy_model is not None and _policy_tokenizer is not None:
+        try:
+            import torch
+            instruction = (
+                "You are an AI brain controlling a garbage collecting robot.\n"
+                "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+            )
+            prompt = (
+                f"### Instruction:\n{instruction}\n\n"
+                f"### Input:\nENVIRONMENT STATUS:\n{body.message}\n\n"
+                f"### Response:\n"
+            )
+            inputs = _policy_tokenizer(
+                prompt, return_tensors="pt", truncation=True, max_length=512
+            ).to(_policy_model.device)
+            with torch.no_grad():
+                outputs = _policy_model.generate(
+                    **inputs, max_new_tokens=6, do_sample=False,
+                    pad_token_id=_policy_tokenizer.eos_token_id
+                )
+            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+            raw = _policy_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
+            for v in VALID:
+                if v in raw:
+                    return {"action": v, "source": "llm", "raw": raw}
+        except Exception as e:
+            print(f"[Policy] Inference error: {e}")
+
+    # FIX: use env.get_observation().dict() so heuristic_action() receives
+    # all required fields (robot_mode, home_position, unload_station, etc.)
+    # instead of the previous hand-built incomplete dict.
+    from inference import heuristic_action
+    obs_dict = env.get_observation().dict()
+    obs_dict["message"] = body.message  # use the caller's message for context
+    return {"action": heuristic_action(obs_dict), "source": "bfs"}
+
+
+# ── Dynamic garbage placement ──────────────────────────────────────────────
+
+class ConfigureInput(BaseModel):
+    task_id: str = "task_easy"
+    garbage_positions: List[List[int]]  # [[x,y], ...]
+
+@app.post("/configure", tags=["openenv"])
+def configure(body: ConfigureInput):
+    """
+    Reset the environment for task_id, then override garbage positions
+    with whatever the caller supplies.
+    """
+    if body.task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+
+    env.reset(task_id=body.task_id)
+
+    validated = []
+    for pos in body.garbage_positions:
+        if len(pos) != 2:
+            raise HTTPException(400, f"Each position must be [x, y], got {pos}")
+        x, y = pos
+        gw, gh = env.grid_size
+        if not (0 <= x < gw and 0 <= y < gh):
+            raise HTTPException(400, f"Position {pos} out of bounds for grid {env.grid_size}")
+        if [x, y] in env.obstacle_positions:
+            raise HTTPException(400, f"Position {pos} is an obstacle")
+        validated.append([x, y])
+
+    env.garbage_positions = validated
+
+    return {"observation": env.get_observation().dict()}
+
+
+# ── Serve HTML dashboard ───────────────────────────────────────────────────
+# This makes the frontend accessible at /ui on the same origin as the API,
+# which is required for HuggingFace Spaces (no localhost cross-origin issues).
+
+@app.get("/ui", include_in_schema=False)
+def ui():
+    """Serve the dashboard HTML."""
+    return FileResponse("frontend/index.html")
+
+# Mount static assets (style.css, script.js) at /static
+if os.path.exists("frontend/style.css") or os.path.exists("frontend/script.js"):
+    app.mount("/static", StaticFiles(directory="frontend"), name="static")
+
+
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
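The `/policy` endpoint in `app.py` maps the decoded model output to an action with a substring check over a fixed list, so the result depends on list order and can match inside longer words. A possible alternative is sketched here for comparison only; `parse_action` is a hypothetical helper and is not part of this commit. It assumes the desired behaviour is "first whole-word action in the text wins".

```python
import re
from typing import Optional

VALID = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# \b anchors each alternative to word boundaries, so "UP" inside "SUPPORT" is ignored.
_ACTION_RE = re.compile(r"\b(" + "|".join(VALID) + r")\b")


def parse_action(raw: str) -> Optional[str]:
    """Return the first whole-word action found in the model output, or None."""
    match = _ACTION_RE.search(raw.upper())
    return match.group(1) if match else None
```

Returning `None` when nothing matches lets the caller decide whether to fall back to the BFS heuristic, which is what the endpoint already does when no action is found.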
code.py
ADDED
@@ -0,0 +1,78 @@
+"""
+code.py — Seed training data generator for the Garbage Collecting Robot.
+
+Fix applied:
+  - All trajectory entries now use the unified {"text": "..."} Alpaca format.
+  - Previously the first entry used {"text": ...} while all others used
+    {"obs": ..., "action": ...}, causing fixer.py to silently skip them
+    (KeyError on the missing "text" key).
+"""
+
+import json
+
+INSTRUCTION = (
+    "You control a garbage collecting robot. "
+    "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT"
+)
+
+def alpaca(obs: str, action: str) -> dict:
+    """Wrap an obs/action pair into the Alpaca fine-tuning format."""
+    return {
+        "text": (
+            f"### Instruction:\n{INSTRUCTION}\n\n"
+            f"### Input:\nENVIRONMENT STATUS:\n{obs}\n\n"
+            f"### Response:\n{action}"
+        )
+    }
+
+
+trajectories = [
+    # --- task_easy: straight-line approach ---
+    alpaca("You are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.", "COLLECT"),
+
+    # --- task_medium: obstacle avoidance ---
+    alpaca("You are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.", "UP"),
+    alpaca("You are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).", "LEFT"),
+    alpaca("You are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.", "UP"),
+    alpaca("You are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.", "LEFT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.", "COLLECT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.", "RIGHT"),
+    alpaca("You are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.", "COLLECT"),
+    alpaca("You are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.", "LEFT"),
+    alpaca("You are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).", "LEFT"),
+    alpaca("You are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+    alpaca("You are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.", "DOWN"),
+    alpaca("You are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.", "COLLECT"),
+
+    # --- low battery urgency ---
+    alpaca("You are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+    alpaca("You are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+    alpaca("You are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.", "UP"),
+    alpaca("You are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.", "UP"),
+    alpaca("You are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.", "COLLECT"),
+
+    # --- do not collect when not on garbage ---
+    alpaca("You are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.", "RIGHT"),
+    alpaca("You are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.", "RIGHT"),
+]
+
+with open("garbage_robot_dataset.jsonl", "w") as f:
+    for row in trajectories:
+        f.write(json.dumps(row) + "\n")
+
+print(f"Wrote {len(trajectories)} samples to garbage_robot_dataset.jsonl")
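The docstring of `code.py` describes rows being skipped because they lacked the `"text"` key. A small checker along the following lines could surface such rows before training. This is a sketch under assumptions: `check_rows` is a hypothetical helper, not part of the commit, and it assumes every row should end with `### Response:` followed by exactly one of the five actions.

```python
import json
import re

VALID = {"UP", "DOWN", "LEFT", "RIGHT", "COLLECT"}

# Capture everything after the first "### Response:" line up to the end of the text.
_RESPONSE_RE = re.compile(r"### Response:\n(.*)\Z", re.DOTALL)


def check_rows(lines):
    """Return a list of (line_number, problem) for rows that do not fit the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        text = row.get("text") if isinstance(row, dict) else None
        if not isinstance(text, str):
            problems.append((number, 'missing "text" key'))
            continue
        match = _RESPONSE_RE.search(text)
        if match is None or match.group(1).strip() not in VALID:
            problems.append((number, "response is not a single valid action"))
    return problems
```

Because the function accepts any iterable of lines, it can be pointed at an open file handle for `garbage_robot_dataset.jsonl` or at a list of strings in a test.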
code2.py
ADDED
@@ -0,0 +1,34 @@
+from qlearning import QTable, ACTIONS, encode_state
+from environment import GarbageRobotEnv
+from scenarios import SCENARIOS
+import json
+
+qt = QTable()
+qt.load('qtable.json')
+env = GarbageRobotEnv()
+
+instruction = '''You are an AI brain controlling a garbage collecting robot.
+Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT'''
+
+alpaca = '''### Instruction:\n{}\n\n### Input:\nENVIRONMENT STATUS:\n{}\n\n### Response:\n{}'''
+
+data = []
+for task_id in SCENARIOS:
+    for _ in range(10):  # 10 episodes per task
+        env.reset(task_id)
+        done = False
+        while not done:
+            obs_obj = env.get_observation()
+            obs = {'robot_position': obs_obj.robot_position,
+                   'garbage_positions': list(obs_obj.garbage_positions),
+                   'grid_size': obs_obj.grid_size}
+            state = encode_state(obs)
+            action = ACTIONS[qt.best_action(state)]
+            data.append({'text': alpaca.format(instruction, obs_obj.message, action)})
+            result = env.step(action)
+            done = result['done']
+
+with open('rl_trajectories.jsonl', 'w') as f:
+    for row in data:
+        f.write(json.dumps(row) + '\n')
+print(f'Generated {len(data)} samples')
environment.py
ADDED
@@ -0,0 +1,500 @@
+"""
+environment.py — Garbage Collecting Robot Core RL Environment.
+
+Fixes applied:
+  • Battery no longer drains during autonomous CHARGE / UNLOAD_HERE steps.
+  • Recharge guard now checks `not self.done` instead of `self.garbage_positions`
+    so it also fires correctly at episode boundaries.
+"""
+
+from typing import Any, Dict, Optional, List, Tuple
+from collections import deque
+from models import Observation, State
+from scenarios import SCENARIOS
+
+
+# ─────────────────────────────────────────────────────────────
+# BFS PATHFINDING HELPER
+# ─────────────────────────────────────────────────────────────
+
+def _bfs(
+    start,
+    goal,
+    obstacles,
+    grid_w: int,
+    grid_h: int,
+) -> Tuple[Optional[str], float]:
+    """
+    Breadth-First Search from *start* to *goal* on a rectangular grid.
+
+    Avoids all cells listed in *obstacles*. Returns:
+        (first_direction, path_length) — the single step that begins the
+                                         shortest path, and how many steps
+                                         the full path takes.
+        (None, 0)                      — start == goal (already there).
+        (None, inf)                    — goal is unreachable.
+
+    Directions: "UP" (+y), "DOWN" (−y), "LEFT" (−x), "RIGHT" (+x).
+    """
+    start = (int(start[0]), int(start[1]))
+    goal = (int(goal[0]), int(goal[1]))
+
+    if start == goal:
+        return (None, 0)
+
+    obstacle_set = frozenset((int(o[0]), int(o[1])) for o in obstacles)
+    dirs = [("RIGHT", (1, 0)), ("LEFT", (-1, 0)), ("UP", (0, 1)), ("DOWN", (0, -1))]
+
+    queue: deque = deque([(start, None, 0)])  # (pos, first_move, depth)
+    visited = {start}
+
+    while queue:
+        pos, first, depth = queue.popleft()
+        for name, (dx, dy) in dirs:
+            npos = (pos[0] + dx, pos[1] + dy)
+            if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
+                continue
+            if npos in obstacle_set or npos in visited:
+                continue
+            move = first if first else name
+            if npos == goal:
+                return (move, depth + 1)
+            visited.add(npos)
+            queue.append((npos, move, depth + 1))
+
+    return (None, float("inf"))
| 66 |
+
|
| 67 |
+
|
# ─────────────────────────────────────────────────────────────
# ENVIRONMENT
# ─────────────────────────────────────────────────────────────

class GarbageRobotEnv:
    """
    Core RL Environment for the Garbage Collecting Robot.

    Robot modes
    -----------
    MODE_NORMAL   — agent controls the robot normally.
    MODE_RECHARGE — battery critically low; robot auto-navigates home,
                    recharges, then switches back to NORMAL.
    MODE_UNLOAD   — storage full; robot auto-navigates to unload_station,
                    empties its bin, then switches back to NORMAL.

    Autonomous overrides happen *inside* step(): the command the caller
    sends is silently replaced when the robot is in a non-normal mode.
    This keeps the external API unchanged while giving the robot
    self-managing capabilities.

    FIX: Battery is only decremented for real movement/collection commands,
    NOT for internal CHARGE or UNLOAD_HERE commands.
    """

    MODE_NORMAL = "normal"
    MODE_RECHARGE = "recharging"
    MODE_UNLOAD = "unloading"

    # Safety margin added on top of BFS distance when deciding to recharge.
    RECHARGE_BUFFER = 4

    def __init__(self):
        self.current_task_id = None
        self.grid_size = (0, 0)
        self.robot_position = [0, 0]
        self.garbage_positions = []
        self.obstacle_positions = []
        self.battery_level = 0
        self.max_battery = 0
        self.inventory_count = 0

        # Resource management state
        self.home_position = [0, 0]
        self.unload_station = [0, 0]
        self.storage_capacity = 6
        self.current_storage_load = 0

        # Episode accounting
        self.total_reward = 0.0
        self.steps_taken = 0
        self.done = False

        # Autonomous navigation mode
        self._mode = self.MODE_NORMAL

    # ── Reset ─────────────────────────────────────────────────

    def reset(self, task_id: str) -> State:
        if task_id not in SCENARIOS:
            raise ValueError(f"Task ID '{task_id}' not found in scenarios.")

        s = SCENARIOS[task_id]
        self.current_task_id = task_id
        self.grid_size = tuple(s["grid_size"])
        self.robot_position = list(s["robot_start"])
        self.garbage_positions = [list(g) for g in s["garbage_starts"]]
        self.obstacle_positions = [list(o) for o in s["obstacle_starts"]]
        self.battery_level = s["max_battery"]
        self.max_battery = s["max_battery"]

        self.home_position = list(s.get("home_position", s["robot_start"]))
        self.unload_station = list(s.get("unload_station", [0, self.grid_size[1] - 1]))
        self.storage_capacity = s.get("storage_capacity", 6)
        self.current_storage_load = 0
        self.inventory_count = 0

        self.total_reward = 0.0
        self.steps_taken = 0
        self.done = False
        self._mode = self.MODE_NORMAL

        return self.state()

    def reset_custom(
        self,
        task_id: str = "task_easy",
        grid_size=None,
        robot_start=None,
        garbage_positions=None,
        obstacle_positions=None,
        max_battery=None,
        storage_capacity=None,
        home_position=None,
        unload_station=None,
    ) -> State:
        """
        Dynamic reset: start from a scenario baseline and override any fields.
        Pass task_id='custom' with all fields supplied to skip scenario lookup.
        """
        if task_id in SCENARIOS:
            s = SCENARIOS[task_id]
            base_grid = s["grid_size"]
            base_robot = s["robot_start"]
            base_garbage = s["garbage_starts"]
            base_obstacles = s["obstacle_starts"]
            base_battery = s["max_battery"]
            base_home = s.get("home_position", s["robot_start"])
            base_unload = s.get("unload_station", [0, s["grid_size"][1] - 1])
            # Same default capacity as reset() and the 'custom' branch below.
            base_capacity = s.get("storage_capacity", 6)
        else:
            base_grid = (10, 10)
            base_robot = (0, 0)
            base_garbage = []
            base_obstacles = []
            base_battery = 60
            base_home = (0, 0)
            base_unload = (9, 0)
            base_capacity = 6

        self.current_task_id = task_id
        self.grid_size = tuple(grid_size) if grid_size is not None else tuple(base_grid)
        self.robot_position = list(robot_start) if robot_start is not None else list(base_robot)
        self.garbage_positions = [list(g) for g in garbage_positions] if garbage_positions is not None else [list(g) for g in base_garbage]
        self.obstacle_positions = [list(o) for o in obstacle_positions] if obstacle_positions is not None else [list(o) for o in base_obstacles]
        self.battery_level = max_battery if max_battery is not None else base_battery
        self.max_battery = self.battery_level
        self.home_position = list(home_position) if home_position is not None else list(base_home)
        self.unload_station = list(unload_station) if unload_station is not None else list(base_unload)
        self.storage_capacity = storage_capacity if storage_capacity is not None else base_capacity

        self.current_storage_load = 0
        self.inventory_count = 0
        self.total_reward = 0.0
        self.steps_taken = 0
        self.done = False
        self._mode = self.MODE_NORMAL

        # Remove any garbage placed on top of an obstacle
        self.garbage_positions = [
            g for g in self.garbage_positions if g not in self.obstacle_positions
        ]
        return self.state()

    # ── Observation & State helpers ───────────────────────────

    def _bfs_distance(self, target) -> int:
        """Return BFS step-count from current robot position to *target*."""
        _, dist = _bfs(
            self.robot_position, target,
            self.obstacle_positions, self.grid_size[0], self.grid_size[1],
        )
        return int(dist) if dist != float("inf") else -1

    def _should_recharge(self) -> bool:
        """
        Return True when the robot must leave immediately to reach home
        before battery runs out.

        Threshold = BFS distance to home + RECHARGE_BUFFER.
        A buffer of 4 gives comfortable headroom for obstacle detours.
        """
        if self.battery_level <= 1:
            return True
        dist = self._bfs_distance(self.home_position)
        if dist < 0:
            # Home unreachable via BFS — fall back to Manhattan distance
            dist = (abs(self.robot_position[0] - self.home_position[0]) +
                    abs(self.robot_position[1] - self.home_position[1]))
        return self.battery_level <= (dist + self.RECHARGE_BUFFER)

    def _should_unload(self) -> bool:
        """Return True when the storage bin is at capacity."""
        return self.current_storage_load >= self.storage_capacity

    def get_observation(self, message: str = "") -> Observation:
        dist_home = self._bfs_distance(self.home_position)

        if not message:
            message = (
                f"You are at {tuple(self.robot_position)}. "
                f"Garbage remaining: {len(self.garbage_positions)}. "
                f"Battery: {self.battery_level}/{self.max_battery}. "
                f"Storage: {self.current_storage_load}/{self.storage_capacity}. "
                f"Home (charging): {tuple(self.home_position)} "
                f"[{dist_home if dist_home >= 0 else 'unreachable'} steps]. "
                f"Unload station: {tuple(self.unload_station)}. "
                f"Mode: {self._mode}."
            )

        return Observation(
            grid_size=self.grid_size,
            robot_position=tuple(self.robot_position),
            garbage_positions=[tuple(g) for g in self.garbage_positions],
            obstacle_positions=[tuple(o) for o in self.obstacle_positions],
            battery_level=self.battery_level,
            inventory_count=self.inventory_count,
            message=message,
            home_position=tuple(self.home_position),
            unload_station=tuple(self.unload_station),
            storage_capacity=self.storage_capacity,
            current_storage_load=self.current_storage_load,
            distance_from_home=dist_home,
            robot_mode=self._mode,
        )

    def state(self) -> State:
        return State(
            task_id=self.current_task_id,
            total_reward=self.total_reward,
            steps_taken=self.steps_taken,
            done=self.done,
            robot_mode=self._mode,
            current_storage_load=self.current_storage_load,
            battery_level=self.battery_level,
            distance_from_home=self._bfs_distance(self.home_position),
        )

    # ── Autonomous command resolver ────────────────────────────

    def _resolve_command(self, requested: str) -> Tuple[str, str]:
        """
        Determine the *effective* command for this step.

        When the robot is in MODE_RECHARGE or MODE_UNLOAD the caller's
        command is replaced by an autonomously-computed one.

        Returns
        -------
        (effective_command, mode_message)
        """

        # ── Trigger check (only when in normal mode) ───────────
        # FIX: use `not self.done` guard instead of `self.garbage_positions`
        # so recharge still fires even if all garbage is collected this step.
        if self._mode == self.MODE_NORMAL:
            if self._should_recharge() and not self.done:
                self._mode = self.MODE_RECHARGE
            elif self._should_unload():
                self._mode = self.MODE_UNLOAD

        # ── Recharging mode ────────────────────────────────────
        if self._mode == self.MODE_RECHARGE:
            if tuple(self.robot_position) == tuple(self.home_position):
                # Arrived — charge and return to normal
                self._mode = self.MODE_NORMAL
                return (
                    "CHARGE",
                    (f"Reached charging station {tuple(self.home_position)}. "
                     f"Battery fully restored to {self.max_battery}. "
                     f"Resuming garbage collection."),
                )
            else:
                move, dist = _bfs(
                    self.robot_position, self.home_position,
                    self.obstacle_positions, self.grid_size[0], self.grid_size[1],
                )
                dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
                return (
                    move or "UP",
                    (f"⚡ Battery critical ({self.battery_level}/{self.max_battery}). "
                     f"Auto-navigating to charging station {tuple(self.home_position)} "
                     f"[{dist_str}]."),
                )

        # ── Unloading mode ─────────────────────────────────────
        if self._mode == self.MODE_UNLOAD:
            if tuple(self.robot_position) == tuple(self.unload_station):
                # Arrived — empty the bin and return to normal
                freed = self.current_storage_load
                self._mode = self.MODE_NORMAL
                return (
                    "UNLOAD_HERE",
                    (f"Reached unload station {tuple(self.unload_station)}. "
                     f"Emptied {freed} item(s) from storage. "
                     f"Resuming garbage collection."),
                )
            else:
                move, dist = _bfs(
                    self.robot_position, self.unload_station,
                    self.obstacle_positions, self.grid_size[0], self.grid_size[1],
                )
                dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
                return (
                    move or "UP",
                    (f"📦 Storage full ({self.current_storage_load}/{self.storage_capacity}). "
                     f"Auto-navigating to unload station {tuple(self.unload_station)} "
                     f"[{dist_str}]."),
                )

        # ── Normal mode — use caller's command ─────────────────
        return (requested, "")

    # ── Step ──────────────────────────────────────────────────

    def step(self, command: str) -> Dict[str, Any]:
        if self.done:
            obs = self.get_observation("Episode already finished.")
            return {"observation": obs.dict(), "reward": 0.0, "done": True, "info": {}}

        self.steps_taken += 1

        # Resolve autonomous overrides BEFORE battery decrement so that
        # CHARGE / UNLOAD_HERE commands do NOT consume battery.
        effective_cmd, mode_message = self._resolve_command(command)

        # FIX: only drain battery for real movement / collection actions.
        # Autonomous internal commands (CHARGE, UNLOAD_HERE) are free.
        if effective_cmd in ("CHARGE", "UNLOAD_HERE"):
            reward = 0.0
        else:
            self.battery_level -= 1
            reward = -0.1

        message = mode_message  # may be overwritten below

        # ── CHARGE (internal — issued autonomously at home) ────
        if effective_cmd == "CHARGE":
            self.battery_level = self.max_battery
            reward += 5.0
            # message already set from resolver

        # ── UNLOAD_HERE (internal — issued autonomously at station) ──
        elif effective_cmd == "UNLOAD_HERE":
            freed = self.current_storage_load
            self.current_storage_load = 0
            reward += 2.0
            # message already set from resolver

        # ── COLLECT ───────────────────────────────────────────
        elif effective_cmd == "COLLECT":
            if self.robot_position in self.garbage_positions:
                self.garbage_positions.remove(self.robot_position)
                self.inventory_count += 1
                self.current_storage_load += 1
                reward += 10.0
                message = (
                    f"Collected garbage! "
                    f"Storage: {self.current_storage_load}/{self.storage_capacity}."
                )
                if self._should_unload() and self.garbage_positions:
                    self._mode = self.MODE_UNLOAD
                    message += (
                        f" Storage full — auto-routing to "
                        f"unload station {tuple(self.unload_station)}."
                    )
            else:
                reward -= 1.0
                message = "No garbage to collect here."

        # ── Movement commands ──────────────────────────────────
        elif effective_cmd in ("UP", "DOWN", "LEFT", "RIGHT"):
            new_pos = list(self.robot_position)
            if effective_cmd == "UP":
                new_pos[1] += 1
            elif effective_cmd == "DOWN":
                new_pos[1] -= 1
            elif effective_cmd == "LEFT":
                new_pos[0] -= 1
            elif effective_cmd == "RIGHT":
                new_pos[0] += 1

            gw, gh = self.grid_size
            if 0 <= new_pos[0] < gw and 0 <= new_pos[1] < gh:
                if new_pos in self.obstacle_positions:
                    reward -= 5.0
                    blocked = []
                    direction_map = {
                        "UP": [0, 1], "DOWN": [0, -1],
                        "LEFT": [-1, 0], "RIGHT": [1, 0],
                    }
                    for d, delta in direction_map.items():
                        nb = [self.robot_position[0] + delta[0],
                              self.robot_position[1] + delta[1]]
                        if nb in self.obstacle_positions:
                            blocked.append(d)
                    blocked_str = ", ".join(blocked) if blocked else "none"
                    message = (
                        f"BLOCKED! {effective_cmd} leads to an obstacle. "
                        f"Blocked directions from here: {blocked_str}. "
                        f"Choose a different direction."
                    )
                else:
                    self.robot_position = new_pos
                    if not message:
                        message = f"Moved {effective_cmd}."
            else:
                reward -= 1.0
                if not message:
                    message = (
                        f"Hit a wall trying to move {effective_cmd}. "
                        f"Do NOT try {effective_cmd} again from this position."
                    )

        # ── Unknown command ────────────────────────────────────
        else:
            reward -= 1.0
            message = f"Invalid command: '{effective_cmd}'."

        # ── Termination checks ─────────────────────────────────
        if len(self.garbage_positions) == 0:
            self.done = True
            reward += 50.0
            message += " All garbage collected! Task complete."
        elif self.battery_level <= 0:
            self.done = True
            message += " Battery depleted! Game over."

        self.total_reward += reward

        return {
            "observation": self.get_observation(message).dict(),
            "reward": reward,
            "done": self.done,
            "info": {
                "inventory_count": self.inventory_count,
                "steps": self.steps_taken,
                "current_storage_load": self.current_storage_load,
                "robot_mode": self._mode,
                "autonomous_override": effective_cmd != command,
                "original_command": command,
                "effective_command": effective_cmd,
            },
        }

    # ── Grading ───────────────────────────────────────────────

    def grade(self, task_id: str) -> float:
        """Normalised [0.0, 1.0] completion score for the leaderboard."""
        if task_id not in SCENARIOS:
            return 0.0
        total = len(SCENARIOS[task_id]["garbage_starts"])
        if total == 0:
            # A scenario with no garbage has no meaningful ratio; avoid dividing by zero.
            return 0.0
        return min(max(self.inventory_count / total, 0.0), 1.0)
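The first-move BFS used by `_bfs` can be tried in isolation. The following is a minimal standalone sketch, assuming the same coordinate convention as environment.py ("UP" is +y, "RIGHT" is +x); the name `first_step_bfs` and the 3×3 example grid are hypothetical and not part of the repository.

```python
from collections import deque
from math import inf


def first_step_bfs(start, goal, obstacles, grid_w, grid_h):
    """Return (first_move, path_length) of a shortest path, or (None, inf)."""
    start, goal = tuple(start), tuple(goal)
    if start == goal:
        return None, 0

    blocked = {tuple(o) for o in obstacles}
    moves = [("RIGHT", (1, 0)), ("LEFT", (-1, 0)), ("UP", (0, 1)), ("DOWN", (0, -1))]

    # Each queue entry remembers the first move of the path that reached it,
    # so no parent map or path reconstruction is needed.
    queue = deque([(start, None, 0)])
    visited = {start}

    while queue:
        (x, y), first, depth = queue.popleft()
        for name, (dx, dy) in moves:
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h):
                continue  # off the grid
            if nxt in blocked or nxt in visited:
                continue  # obstacle or already explored
            move = first or name
            if nxt == goal:
                return move, depth + 1
            visited.add(nxt)
            queue.append((nxt, move, depth + 1))

    return None, inf


# Hypothetical 3x3 grid: a wall at (1, 0) and (1, 1) forces a detour
# through the top row to get from (0, 0) to (2, 0).
move, length = first_step_bfs((0, 0), (2, 0), [(1, 0), (1, 1)], 3, 3)
print(move, length)
```

Carrying the first move along with each queue entry is what lets the environment ask only "which way do I step now?" on every call, at the cost of re-running the search each step.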
|
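The recharge rule in `_should_recharge` is a single comparison: battery level against distance-to-home plus `RECHARGE_BUFFER`. The sketch below restates it as a free function for illustration; `should_recharge` and its parameters are hypothetical names, and the distance is passed in directly rather than computed with BFS.

```python
RECHARGE_BUFFER = 4  # same safety margin as GarbageRobotEnv.RECHARGE_BUFFER


def should_recharge(battery_level, dist_home, buffer=RECHARGE_BUFFER):
    """True when the robot should head home now to avoid running flat."""
    if battery_level <= 1:
        return True
    return battery_level <= dist_home + buffer


# Robot is 6 steps from home, so the threshold is 6 + 4 = 10.
for battery in (12, 11, 10):
    print(battery, should_recharge(battery, dist_home=6))
```

Because every move costs one unit of battery, a robot that turns back at the threshold reaches the charger with at most `buffer` units left, which is the headroom the docstring describes for obstacle detours.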
fixed_dataset.jsonl
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
fixer.py
ADDED
|
@@ -0,0 +1,28 @@
import json

input_file = "rl_trajectories.jsonl"
output_file = "fixed_dataset.jsonl"

def extract_parts(text):
    """Split a training record into (prompt, response) around '### Response:'."""
    try:
        user_part = text.split("### Response:")[0].strip()
        assistant_part = text.split("### Response:")[1].strip()
        return user_part, assistant_part
    except (IndexError, AttributeError):
        # Marker missing, or `text` is not a string: skip this record.
        return None, None

with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
    for line in f_in:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        data = json.loads(line)
        text = data.get("text", "")

        user, assistant = extract_parts(text)

        if user and assistant:
            new_entry = {
                "user": user,
                "assistant": assistant
            }
            f_out.write(json.dumps(new_entry) + "\n")

print("Done. Fixed dataset saved.")
|
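The splitting step in fixer.py can also be written without exception handling. This is a sketch of one alternative using `str.partition`, not the repository's code; `split_record`, `MARKER`, and the sample record (including its `### Instruction:` header) are hypothetical. Note one behavioural difference: `partition` keeps everything after the first marker, whereas `split(...)[1]` stops at a second marker if one exists.

```python
import json

MARKER = "### Response:"


def split_record(text):
    """Return {'user': ..., 'assistant': ...} or None if the record is unusable."""
    prompt, marker, response = text.partition(MARKER)
    if not marker:
        return None  # marker not present
    prompt, response = prompt.strip(), response.strip()
    if not prompt or not response:
        return None  # one side is empty
    return {"user": prompt, "assistant": response}


# Hypothetical record in the same {"text": ...} shape that fixer.py reads.
sample = {"text": "### Instruction:\nBattery 3/20, home is 2 steps away.\n\n### Response:\nLEFT"}
print(json.dumps(split_record(sample["text"])))
```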
frontend/index.html
ADDED
|
@@ -0,0 +1,166 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>GarbageBot — RL Control Center</title>
  <meta name="description" content="Real-time dashboard for the fine-tuned Llama-3.2 garbage collecting robot. Watch Q-learning and LLM policy decisions live.">
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800&family=JetBrains+Mono:wght@400;600&display=swap" rel="stylesheet">
  <!-- Use /static/ prefix so FastAPI's StaticFiles middleware serves these correctly
       on HuggingFace Spaces and any other hosted environment. -->
  <link rel="stylesheet" href="/static/style.css">
</head>
<body>
  <!-- Animated background -->
  <div class="bg-orbs">
    <div class="orb orb-1"></div>
    <div class="orb orb-2"></div>
    <div class="orb orb-3"></div>
    <div class="orb orb-4"></div>
  </div>
  <div class="grid-bg"></div>

  <div class="dashboard">

    <!-- ── HEADER ── -->
    <header class="header-bar">
      <div class="brand">
        <div class="brand-icon">🤖</div>
        <div>
          <h1>GarbageBot <span class="version-tag">v2.0</span></h1>
          <p class="brand-sub">Llama-3.2-3B · Q-Table · BFS</p>
        </div>
      </div>

      <div class="status-strip">
        <div class="status-pill" id="server-pill">
          <span class="pulse-dot" id="status-dot"></span>
          <span id="status-label">Connecting…</span>
        </div>
        <div class="policy-badge" id="policy-badge">
          <span class="badge-icon">⚡</span>
          <span id="policy-label">–</span>
        </div>
        <div class="status-pill" id="mode-pill">
          <span id="mode-label">NORMAL</span>
        </div>
      </div>

      <div class="controls">
        <select id="task-select">
          <option value="task_easy">🟢 Easy — 5×5</option>
          <option value="task_medium">🟡 Medium — 7×7</option>
          <option value="task_hard">🔴 Hard — 10×10</option>
        </select>
        <div class="speed-group">
          <label class="speed-label">Speed</label>
          <input type="range" id="speed-slider" min="100" max="1500" value="500" step="100">
          <span id="speed-val">500ms</span>
        </div>
        <button id="reset-btn" class="btn secondary">↺ Reset</button>
        <button id="auto-btn" class="btn primary">▶ Run Policy</button>
        <button id="manual-btn" class="btn outline">⏭ Step</button>
      </div>
    </header>

    <!-- ── MAIN ── -->
    <main>

      <!-- Grid world -->
      <section class="grid-panel panel glass">
        <div class="grid-header">
          <span class="grid-title">Environment</span>
          <div class="grid-meta">
            <span id="step-counter" class="mono-chip">Step 0</span>
            <span id="episode-score-chip" class="mono-chip accent-chip">Score 0.00</span>
          </div>
        </div>
        <div class="grid-stage" id="grid-stage">
          <div id="env-grid" class="grid-world">
            <div id="particle-layer" class="particle-layer"></div>
          </div>
        </div>
        <p class="grid-hint">💡 Click any empty cell to place or remove garbage</p>
      </section>

      <!-- Side panel -->
      <aside class="side-panel">

        <!-- Telemetry -->
        <div class="panel glass tele-card">
          <h2 class="section-title">Telemetry</h2>

          <div class="stat-row">
            <div class="stat-icon">🔋</div>
            <div class="stat-body">
              <div class="stat-label-row">
                <span class="stat-label">Battery</span>
                <span id="battery-text" class="stat-num">–</span>
              </div>
              <div class="progress-track">
                <div class="progress-fill" id="battery-progress"></div>
              </div>
            </div>
          </div>

          <div class="stat-row">
            <div class="stat-icon">📦</div>
            <div class="stat-body">
              <div class="stat-label-row">
                <span class="stat-label">Storage</span>
                <span id="storage-text" class="stat-num">–</span>
              </div>
              <div class="progress-track">
                <div class="progress-fill" id="storage-progress" style="width: 0%; background: var(--warning);"></div>
              </div>
            </div>
          </div>

          <div class="stat-row">
            <div class="stat-icon">🗑️</div>
            <div class="stat-body">
              <div class="stat-label-row">
                <span class="stat-label">Total Collected</span>
                <span id="inventory-text" class="stat-num big-num">0</span>
              </div>
            </div>
          </div>

          <div class="stat-row">
            <div class="stat-icon">🏆</div>
            <div class="stat-body">
              <div class="stat-label-row">
                <span class="stat-label">Reward</span>
                <span id="score-text" class="stat-num big-num accent">0.00</span>
              </div>
            </div>
          </div>

          <!-- Mini reward chart -->
          <div class="chart-wrap">
            <canvas id="reward-chart" width="290" height="70"></canvas>
          </div>
        </div>

        <!-- Policy log -->
        <div class="panel glass log-card">
          <div class="log-header">
            <h2 class="section-title">Policy Observations</h2>
            <button class="clear-btn" id="clear-log">✕ Clear</button>
          </div>
          <div id="log-feed" class="log-feed">
            <p class="placeholder">Awaiting environment reset…</p>
          </div>
          <p class="log-footer">🤖 Driven by fine-tuned Llama-3.2-3B</p>
        </div>

      </aside>
    </main>

  </div><!-- /dashboard -->

  <script src="/static/script.js"></script>
</body>
</html>
|
frontend/script.js
ADDED
|
@@ -0,0 +1,515 @@
| 1 |
+
/* ═══════════════════════════════════════════════════════
|
| 2 |
+
GarbageBot — Continuous-World Dashboard Logic
|
| 3 |
+
Policy chain: Fine-tuned LLM → Q-table → BFS fallback
|
| 4 |
+
|
| 5 |
+
Fix applied:
|
| 6 |
+
- API_BASE was hardcoded to "http://localhost:7861" which breaks on any
|
| 7 |
+
hosted environment (HuggingFace Spaces, cloud VMs, etc.).
|
| 8 |
+
Now uses an empty string "" as the base, so every fetch is relative to the same
|
| 9 |
+
origin that served the page — works locally and in production without
|
| 10 |
+
any code change.
|
| 11 |
+
═══════════════════════════════════════════════════════ */
|
| 12 |
+
|
| 13 |
+
// FIX: use relative paths ("") instead of hardcoded "http://localhost:7861"
|
| 14 |
+
// so the dashboard works on HuggingFace Spaces and any other host automatically.
|
| 15 |
+
const API_BASE = "";
|
| 16 |
+
|
| 17 |
+
// ── DOM ───────────────────────────────────────────────────
|
| 18 |
+
const statusDot = document.getElementById("status-dot");
|
| 19 |
+
const statusLabel = document.getElementById("status-label");
|
| 20 |
+
const policyBadge = document.getElementById("policy-badge");
|
| 21 |
+
const policyLabel = document.getElementById("policy-label");
|
| 22 |
+
const taskSelect = document.getElementById("task-select");
|
| 23 |
+
const speedSlider = document.getElementById("speed-slider");
|
| 24 |
+
const speedVal = document.getElementById("speed-val");
|
| 25 |
+
const resetBtn = document.getElementById("reset-btn");
|
| 26 |
+
const autoBtn = document.getElementById("auto-btn");
|
| 27 |
+
const manualBtn = document.getElementById("manual-btn");
|
| 28 |
+
const clearLogBtn = document.getElementById("clear-log");
|
| 29 |
+
|
| 30 |
+
const envGrid = document.getElementById("env-grid");
|
| 31 |
+
const particleLayer = document.getElementById("particle-layer");
|
| 32 |
+
const batteryProgress = document.getElementById("battery-progress");
|
| 33 |
+
const batteryText = document.getElementById("battery-text");
|
| 34 |
+
const scoreText = document.getElementById("score-text");
|
| 35 |
+
const inventoryText = document.getElementById("inventory-text");
|
| 36 |
+
const stepCounter = document.getElementById("step-counter");
|
| 37 |
+
const episodeScoreChip = document.getElementById("episode-score-chip");
|
| 38 |
+
const logFeed = document.getElementById("log-feed");
|
| 39 |
+
const rewardCanvas = document.getElementById("reward-chart");
|
| 40 |
+
|
| 41 |
+
const modePill = document.getElementById("mode-pill");
|
| 42 |
+
const modeLabel = document.getElementById("mode-label");
|
| 43 |
+
const storageProgress = document.getElementById("storage-progress");
|
| 44 |
+
const storageText = document.getElementById("storage-text");
|
| 45 |
+
|
| 46 |
+
// ── State ─────────────────────────────────────────────────
|
| 47 |
+
let autoMode = false;
|
| 48 |
+
let autoTimer = null;
|
| 49 |
+
let currentState = null;
|
| 50 |
+
let robotEntity = null;
|
| 51 |
+
let stepCount = 0;
|
| 52 |
+
let totalReward = 0;
|
| 53 |
+
let rewardHistory = [];
|
| 54 |
+
let maxBattery = 30;
|
| 55 |
+
let stepDelay = 500;
|
| 56 |
+
let lastMode = "normal";
|
| 57 |
+
|
| 58 |
+
// World dimensions (set on reset)
|
| 59 |
+
let WORLD_W = 5, WORLD_H = 5;
|
| 60 |
+
const CELL = 52; // must match CSS --cell
|
| 61 |
+
|
| 62 |
+
// ── Speed slider ──────────────────────────────────────────
|
| 63 |
+
speedSlider.addEventListener("input", () => {
|
| 64 |
+
stepDelay = parseInt(speedSlider.value, 10);
|
| 65 |
+
speedVal.textContent = `${stepDelay}ms`;
|
| 66 |
+
const pct = ((stepDelay - 100) / 1400) * 100;
|
| 67 |
+
speedSlider.style.background = `linear-gradient(90deg, var(--blue) ${pct}%, rgba(255,255,255,.15) ${pct}%)`;
|
| 68 |
+
syncRobotTransition();
|
| 69 |
+
if (autoMode) { clearInterval(autoTimer); autoTimer = setInterval(stepEnv, stepDelay); }
|
| 70 |
+
});
|
| 71 |
+
|
| 72 |
+
function syncRobotTransition() {
|
| 73 |
+
if (!robotEntity) return;
|
| 74 |
+
envGrid.style.setProperty("--move-dur", `${stepDelay}ms`);
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
// ── Log helpers ───────────────────────────────────────────
|
| 78 |
+
function addLog(msg, source = "sys") {
|
| 79 |
+
const ph = logFeed.querySelector(".placeholder");
|
| 80 |
+
if (ph) ph.remove();
|
| 81 |
+
|
| 82 |
+
const entry = document.createElement("div");
|
| 83 |
+
entry.className = "log-entry";
|
| 84 |
+
|
| 85 |
+
const badge = document.createElement("span");
|
| 86 |
+
badge.className = `log-badge ${source === "q_table" ? "q-table" : source}`;
|
| 87 |
+
badge.textContent = source.replace("_","-").toUpperCase();
|
| 88 |
+
|
| 89 |
+
const text = document.createElement("span");
|
| 90 |
+
text.textContent = msg;
|
| 91 |
+
|
| 92 |
+
entry.append(badge, text);
|
| 93 |
+
logFeed.prepend(entry);
|
| 94 |
+
while (logFeed.children.length > 65) logFeed.removeChild(logFeed.lastChild);
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
clearLogBtn.addEventListener("click", () => {
|
| 98 |
+
logFeed.innerHTML = `<p class="placeholder">Log cleared…</p>`;
|
| 99 |
+
});
|
| 100 |
+
|
| 101 |
+
// ── Mini reward chart ─────────────────────────────────────
|
| 102 |
+
function drawChart() {
|
| 103 |
+
const ctx = rewardCanvas.getContext("2d");
|
| 104 |
+
const W = rewardCanvas.width, H = rewardCanvas.height;
|
| 105 |
+
ctx.clearRect(0, 0, W, H);
|
| 106 |
+
if (rewardHistory.length < 2) return;
|
| 107 |
+
|
| 108 |
+
const maxR = Math.max(...rewardHistory.map(Math.abs), .1);
|
| 109 |
+
const step = W / (rewardHistory.length - 1);
|
| 110 |
+
const pts = rewardHistory.map((v, i) => [i * step, H - ((v + maxR) / (2 * maxR)) * H]);
|
| 111 |
+
|
| 112 |
+
const grad = ctx.createLinearGradient(0, 0, 0, H);
|
| 113 |
+
grad.addColorStop(0, "rgba(59,158,255,.5)");
|
| 114 |
+
grad.addColorStop(1, "rgba(59,158,255,0)");
|
| 115 |
+
ctx.beginPath();
|
| 116 |
+
pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
|
| 117 |
+
ctx.lineTo(pts[pts.length-1][0], H);
|
| 118 |
+
ctx.lineTo(0, H); ctx.closePath();
|
| 119 |
+
ctx.fillStyle = grad; ctx.fill();
|
| 120 |
+
|
| 121 |
+
ctx.beginPath();
|
| 122 |
+
pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
|
| 123 |
+
ctx.strokeStyle = "#3b9eff"; ctx.lineWidth = 2;
|
| 124 |
+
ctx.lineJoin = "round"; ctx.stroke();
|
| 125 |
+
|
| 126 |
+
const [lx, ly] = pts[pts.length-1];
|
| 127 |
+
ctx.beginPath(); ctx.arc(lx, ly, 3.5, 0, Math.PI*2);
|
| 128 |
+
ctx.fillStyle = "#a5c8ff"; ctx.fill();
|
| 129 |
+
}
|
| 130 |
+
|
| 131 |
+
// ── Particles ─────────────────────────────────────────────
|
| 132 |
+
function spawnParticles(px, py) {
|
| 133 |
+
const colors = ["#c084fc","#818cf8","#3b9eff","#2dd4bf","#fbbf24"];
|
| 134 |
+
for (let i = 0; i < 14; i++) {
|
| 135 |
+
const p = document.createElement("div");
|
| 136 |
+
p.className = "particle";
|
| 137 |
+
const angle = (i / 14) * Math.PI * 2;
|
| 138 |
+
const dist = 28 + Math.random() * 42;
|
| 139 |
+
const size = 4 + Math.random() * 6;
|
| 140 |
+
p.style.cssText = `
|
| 141 |
+
left:${px}px; top:${py}px;
|
| 142 |
+
width:${size}px; height:${size}px;
|
| 143 |
+
background:${colors[i % colors.length]};
|
| 144 |
+
box-shadow:0 0 6px ${colors[i%colors.length]};
|
| 145 |
+
--tx:translate(${Math.cos(angle)*dist}px,${Math.sin(angle)*dist}px);
|
| 146 |
+
`;
|
| 147 |
+
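(document.getElementById("particle-layer") || particleLayer).appendChild(p); // renderGrid() recreates this layer on reset, so look it up live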
|
| 148 |
+
setTimeout(() => p.remove(), 780);
|
| 149 |
+
}
|
| 150 |
+
}
|
| 151 |
+
|
| 152 |
+
// ── Trail ghost ───────────────────────────────────────────
|
| 153 |
+
function addTrail(left, top) {
|
| 154 |
+
const g = document.createElement("div");
|
| 155 |
+
g.className = "trail-ghost";
|
| 156 |
+
g.style.left = `${left}px`;
|
| 157 |
+
g.style.top = `${top}px`;
|
| 158 |
+
envGrid.appendChild(g);
|
| 159 |
+
setTimeout(() => g.remove(), 1100);
|
| 160 |
+
}
|
| 161 |
+
|
| 162 |
+
// ── World coordinates ─────────────────────────────────────
|
| 163 |
+
function wx(x) { return x * CELL; }
|
| 164 |
+
function wy(y, H) { return (H - 1 - y) * CELL; }
|
| 165 |
+
|
| 166 |
+
// ── Direction → emoji ─────────────────────────────────────
|
| 167 |
+
const DIR_EMOJI = { UP:"🤖", DOWN:"🤖", LEFT:"🤖", RIGHT:"🤖", COLLECT:"🤖" };
|
| 168 |
+
|
| 169 |
+
// ── Grid render ───────────────────────────────────────────
|
| 170 |
+
function renderGrid(obs, isReset = false) {
|
| 171 |
+
const [W, H] = obs.grid_size;
|
| 172 |
+
WORLD_W = W; WORLD_H = H;
|
| 173 |
+
const worldPx = W * CELL;
|
| 174 |
+
const worldPy = H * CELL;
|
| 175 |
+
|
| 176 |
+
if (isReset) {
|
| 177 |
+
envGrid.innerHTML = "";
|
| 178 |
+
envGrid.style.width = `${worldPx}px`;
|
| 179 |
+
envGrid.style.height = `${worldPy}px`;
|
| 180 |
+
envGrid.style.gridTemplateColumns = `repeat(${W}, ${CELL}px)`;
|
| 181 |
+
envGrid.style.gridTemplateRows = `repeat(${H}, ${CELL}px)`;
|
| 182 |
+
envGrid.style.backgroundSize = `${CELL}px ${CELL}px, ${CELL}px ${CELL}px, 100% 100%`;
|
| 183 |
+
|
| 184 |
+
// Transparent click-target cells
|
| 185 |
+
for (let y = H - 1; y >= 0; y--) {
|
| 186 |
+
for (let x = 0; x < W; x++) {
|
| 187 |
+
const cell = document.createElement("div");
|
| 188 |
+
cell.className = "cell";
|
| 189 |
+
cell.dataset.x = x; cell.dataset.y = y;
|
| 190 |
+
cell.addEventListener("click", () => toggleGarbage(x, y));
|
| 191 |
+
envGrid.appendChild(cell);
|
| 192 |
+
}
|
| 193 |
+
}
|
| 194 |
+
|
| 195 |
+
// 3D obstacle walls
|
| 196 |
+
obs.obstacle_positions.forEach(([x, y]) => {
|
| 197 |
+
const el = document.createElement("div");
|
| 198 |
+
el.className = "world-obstacle";
|
| 199 |
+
el.style.left = `${wx(x)}px`;
|
| 200 |
+
el.style.top = `${wy(y, H)}px`;
|
| 201 |
+
el.style.width = `${CELL}px`;
|
| 202 |
+
el.style.height = `${CELL}px`;
|
| 203 |
+
envGrid.appendChild(el);
|
| 204 |
+
});
|
| 205 |
+
|
| 206 |
+
// Robot entity
|
| 207 |
+
robotEntity = document.createElement("div");
|
| 208 |
+
robotEntity.className = "robot-entity";
|
| 209 |
+
robotEntity.textContent = "🤖";
|
| 210 |
+
robotEntity.style.width = `${CELL}px`;
|
| 211 |
+
robotEntity.style.height = `${CELL}px`;
|
| 212 |
+
robotEntity.style.left = `${wx(obs.robot_position[0])}px`;
|
| 213 |
+
robotEntity.style.top = `${wy(obs.robot_position[1], H)}px`;
|
| 214 |
+
envGrid.appendChild(robotEntity);
|
| 215 |
+
|
| 216 |
+
// ⚡ Home Station
|
| 217 |
+
if (obs.home_position) {
|
| 218 |
+
const home = document.createElement("div");
|
| 219 |
+
home.className = "world-home";
|
| 220 |
+
home.style.left = `${wx(obs.home_position[0])}px`;
|
| 221 |
+
home.style.top = `${wy(obs.home_position[1], H)}px`;
|
| 222 |
+
envGrid.appendChild(home);
|
| 223 |
+
}
|
| 224 |
+
|
| 225 |
+
// 📦 Unload Station
|
| 226 |
+
if (obs.unload_station) {
|
| 227 |
+
const unload = document.createElement("div");
|
| 228 |
+
unload.className = "world-unload";
|
| 229 |
+
unload.style.left = `${wx(obs.unload_station[0])}px`;
|
| 230 |
+
unload.style.top = `${wy(obs.unload_station[1], H)}px`;
|
| 231 |
+
envGrid.appendChild(unload);
|
| 232 |
+
}
|
| 233 |
+
|
| 234 |
+
// Particle layer on top
|
| 235 |
+
const pl = document.createElement("div");
|
| 236 |
+
pl.id = "particle-layer";
|
| 237 |
+
pl.className = "particle-layer";
|
| 238 |
+
envGrid.appendChild(pl);
|
| 239 |
+
|
| 240 |
+
syncRobotTransition();
|
| 241 |
+
}
|
| 242 |
+
|
| 243 |
+
// Continuous robot move
|
| 244 |
+
if (robotEntity) {
|
| 245 |
+
const nl = wx(obs.robot_position[0]);
|
| 246 |
+
const nt = wy(obs.robot_position[1], H);
|
| 247 |
+
robotEntity.style.left = `${nl}px`;
|
| 248 |
+
robotEntity.style.top = `${nt}px`;
|
| 249 |
+
}
|
| 250 |
+
|
| 251 |
+
// Re-render garbage
|
| 252 |
+
document.querySelectorAll(".world-garbage").forEach(g => g.remove());
|
| 253 |
+
obs.garbage_positions.forEach(([x, y]) => {
|
| 254 |
+
const el = document.createElement("div");
|
| 255 |
+
el.className = "world-garbage";
|
| 256 |
+
el.style.left = `${wx(x)}px`;
|
| 257 |
+
el.style.top = `${wy(y, H)}px`;
|
| 258 |
+
el.style.width = `${CELL}px`;
|
| 259 |
+
el.style.height = `${CELL}px`;
|
| 260 |
+
el.innerHTML = `<span>🗑️</span>`;
|
| 261 |
+
el.addEventListener("click", () => toggleGarbage(x, y));
|
| 262 |
+
envGrid.appendChild(el);
|
| 263 |
+
});
|
| 264 |
+
|
| 265 |
+
addLog(obs.message, "sys");
|
| 266 |
+
}
|
| 267 |
+
|
| 268 |
+
// ── Telemetry ─────────────────────────────────────────────
|
| 269 |
+
function updateTelemetry(obs, reward, done) {
|
| 270 |
+
if (obs.battery_level > maxBattery) maxBattery = obs.battery_level;
|
| 271 |
+
const pct = Math.max(0, (obs.battery_level / maxBattery) * 100);
|
| 272 |
+
batteryProgress.style.width = `${pct}%`;
|
| 273 |
+
batteryText.textContent = `${obs.battery_level} / ${maxBattery}`;
|
| 274 |
+
|
| 275 |
+
if (pct > 55) batteryProgress.style.background = "#34d399";
|
| 276 |
+
else if (pct > 25) batteryProgress.style.background = "#fbbf24";
|
| 277 |
+
else batteryProgress.style.background = "#fb7185";
|
| 278 |
+
|
| 279 |
+
// Storage update
|
| 280 |
+
if (obs.storage_capacity) {
|
| 281 |
+
const sPct = (obs.current_storage_load / obs.storage_capacity) * 100;
|
| 282 |
+
storageProgress.style.width = `${sPct}%`;
|
| 283 |
+
storageProgress.style.background = sPct >= 100 ? "#f59e0b" : "#60a5fa";
|
| 284 |
+
storageText.textContent = `${obs.current_storage_load} / ${obs.storage_capacity}`;
|
| 285 |
+
}
|
| 286 |
+
|
| 287 |
+
// Inventory (total collected)
|
| 288 |
+
if (inventoryText) {
|
| 289 |
+
inventoryText.textContent = obs.inventory_count ?? 0;
|
| 290 |
+
}
|
| 291 |
+
|
| 292 |
+
// Mode updates
|
| 293 |
+
const mode = obs.robot_mode || "normal";
|
| 294 |
+
if (mode !== lastMode) {
|
| 295 |
+
addLog(`Robot mode changed to: ${mode.toUpperCase()}`, "sys");
|
| 296 |
+
lastMode = mode;
|
| 297 |
+
}
|
| 298 |
+
modeLabel.textContent = mode.toUpperCase();
|
| 299 |
+
|
| 300 |
+
modePill.classList.remove("normal", "recharging", "unloading");
|
| 301 |
+
modePill.classList.add(mode);
|
| 302 |
+
|
| 303 |
+
if (robotEntity) {
|
| 304 |
+
robotEntity.classList.remove("recharging", "unloading");
|
| 305 |
+
if (mode !== "normal") robotEntity.classList.add(mode);
|
| 306 |
+
}
|
| 307 |
+
|
| 308 |
+
if (reward !== undefined) {
|
| 309 |
+
totalReward += reward;
|
| 310 |
+
rewardHistory.push(totalReward);
|
| 311 |
+
if (rewardHistory.length > 80) rewardHistory.shift();
|
| 312 |
+
scoreText.textContent = totalReward.toFixed(2);
|
| 313 |
+
episodeScoreChip.textContent = `Score ${totalReward.toFixed(2)}`;
|
| 314 |
+
drawChart();
|
| 315 |
+
}
|
| 316 |
+
|
| 317 |
+
stepCounter.textContent = `Step ${stepCount}`;
|
| 318 |
+
}
|
| 319 |
+
|
| 320 |
+
// ── Policy badge ──────────────────────────────────────────
|
| 321 |
+
const POLICY_STYLES = {
|
| 322 |
+
llm: { color:"#3b9eff", border:"rgba(59,158,255,.6)" },
|
| 323 |
+
bfs: { color:"#2dd4bf", border:"rgba(45,212,191,.6)" },
|
| 324 |
+
q_table: { color:"#fbbf24", border:"rgba(251,191,36,.6)" },
|
| 325 |
+
sys: { color:"#7ea8d8", border:"rgba(126,168,216,.3)" },
|
| 326 |
+
};
|
| 327 |
+
function showPolicy(source, action) {
|
| 328 |
+
const s = POLICY_STYLES[source] || POLICY_STYLES.sys;
|
| 329 |
+
policyLabel.textContent = `${source.replace("_","-").toUpperCase()} → ${action}`;
|
| 330 |
+
policyBadge.style.borderColor = s.border;
|
| 331 |
+
policyBadge.style.color = s.color;
|
| 332 |
+
policyBadge.classList.add("active");
|
| 333 |
+
}
|
| 334 |
+
|
| 335 |
+
// ── BFS fallback ──────────────────────────────────────────
|
| 336 |
+
function bfsMove(rPos, target, obstacles, W, H) {
|
| 337 |
+
if (rPos[0]===target[0] && rPos[1]===target[1]) return "COLLECT";
|
| 338 |
+
const obs = new Set(obstacles.map(([x,y]) => `${x},${y}`));
|
| 339 |
+
const dirs = [["RIGHT",1,0],["LEFT",-1,0],["UP",0,1],["DOWN",0,-1]];
|
| 340 |
+
const q = [{pos:[...rPos], first:null}];
|
| 341 |
+
const vis = new Set([`${rPos[0]},${rPos[1]}`]);
|
| 342 |
+
|
| 343 |
+
while (q.length) {
|
| 344 |
+
const {pos, first} = q.shift();
|
| 345 |
+
for (const [name, dx, dy] of dirs) {
|
| 346 |
+
const nx = pos[0]+dx, ny = pos[1]+dy;
|
| 347 |
+
if (nx<0||nx>=W||ny<0||ny>=H) continue;
|
| 348 |
+
const key = `${nx},${ny}`;
|
| 349 |
+
if (obs.has(key)||vis.has(key)) continue;
|
| 350 |
+
const move = first||name;
|
| 351 |
+
if (nx===target[0]&&ny===target[1]) return move;
|
| 352 |
+
vis.add(key); q.push({pos:[nx,ny], first:move});
|
| 353 |
+
}
|
| 354 |
+
}
|
| 355 |
+
return null;
|
| 356 |
+
}
|
| 357 |
+
|
| 358 |
+
function nnOrder(start, targets, obstacles, W, H) {
|
| 359 |
+
function dist(a, b) {
|
| 360 |
+
if (a[0]===b[0]&&a[1]===b[1]) return 0;
|
| 361 |
+
const obs=new Set(obstacles.map(([x,y])=>`${x},${y}`));
|
| 362 |
+
const dirs=[[1,0],[-1,0],[0,1],[0,-1]];
|
| 363 |
+
const q=[{pos:[...a],d:0}];const vis=new Set([`${a[0]},${a[1]}`]);
|
| 364 |
+
while (q.length) {
  const { pos, d } = q.shift();
  for (const [dx, dy] of dirs) {
    const nx = pos[0] + dx, ny = pos[1] + dy;
    if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
    const k = `${nx},${ny}`;
    if (obs.has(k) || vis.has(k)) continue;
    if (nx === b[0] && ny === b[1]) return d + 1;
    vis.add(k);
    q.push({ pos: [nx, ny], d: d + 1 });
  }
}
|
| 365 |
+
return Infinity;
|
| 366 |
+
}
|
| 367 |
+
let rem=[...targets],cur=[...start],ord=[];
|
| 368 |
+
while(rem.length){
|
| 369 |
+
let best=rem[0],bD=dist(cur,best);
|
| 370 |
+
for(const t of rem){const d=dist(cur,t);if(d<bD){bD=d;best=t;}}
|
| 371 |
+
ord.push(best);
|
| 372 |
+
rem=rem.filter(t=>!(t[0]===best[0]&&t[1]===best[1]));
|
| 373 |
+
cur=[...best];
|
| 374 |
+
}
|
| 375 |
+
return ord;
|
| 376 |
+
}
|
| 377 |
+
|
| 378 |
+
function localFallback(obs) {
|
| 379 |
+
if (!obs.garbage_positions.length) return "UP";
|
| 380 |
+
const r = obs.robot_position;
|
| 381 |
+
if (obs.garbage_positions.some(([x,y]) => x===r[0]&&y===r[1])) return "COLLECT";
|
| 382 |
+
const ordered = nnOrder(r, obs.garbage_positions, obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]);
|
| 383 |
+
return bfsMove(r, ordered[0], obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]) || "RIGHT";
|
| 384 |
+
}
|
| 385 |
+
|
| 386 |
+
// ── Custom garbage toggle ─────────────────────────────────
|
| 387 |
+
async function toggleGarbage(x, y) {
|
| 388 |
+
if (!currentState || autoMode) return;
|
| 389 |
+
if (currentState.obstacle_positions.some(([ox,oy]) => ox===x&&oy===y)) return;
|
| 390 |
+
if (currentState.robot_position[0]===x && currentState.robot_position[1]===y) return;
|
| 391 |
+
|
| 392 |
+
const has = currentState.garbage_positions.some(([gx,gy]) => gx===x&&gy===y);
|
| 393 |
+
const next = has
|
| 394 |
+
? currentState.garbage_positions.filter(([gx,gy]) => !(gx===x&&gy===y))
|
| 395 |
+
: [...currentState.garbage_positions, [x, y]];
|
| 396 |
+
|
| 397 |
+
try {
|
| 398 |
+
const res = await fetch(`${API_BASE}/configure`, {
|
| 399 |
+
method: "POST", headers:{"Content-Type":"application/json"},
|
| 400 |
+
body: JSON.stringify({task_id: taskSelect.value, garbage_positions: next})
|
| 401 |
+
});
|
| 402 |
+
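if (!res.ok) throw new Error(`HTTP ${res.status}`); // keep currentState intact on a failed request
const data = await res.json();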
|
| 403 |
+
currentState = data.observation;
|
| 404 |
+
renderGrid(currentState);
|
| 405 |
+
addLog(`Garbage ${has?"removed":"placed"} at (${x},${y}) · ${next.length} remaining`, "sys");
|
| 406 |
+
} catch (e) { addLog(`Config error: ${e.message}`, "sys"); }
|
| 407 |
+
}
|
| 408 |
+
|
| 409 |
+
// ── Reset ─────────────────────────────────────────────────
|
| 410 |
+
async function resetEnv() {
|
| 411 |
+
if (autoMode) toggleAutoMode();
|
| 412 |
+
stepCount=0; totalReward=0; rewardHistory=[];
|
| 413 |
+
scoreText.textContent = "0.00";
|
| 414 |
+
episodeScoreChip.textContent = "Score 0.00";
|
| 415 |
+
stepCounter.textContent = "Step 0";
|
| 416 |
+
policyLabel.textContent = "–";
|
| 417 |
+
drawChart();
|
| 418 |
+
|
| 419 |
+
try {
|
| 420 |
+
const res = await fetch(`${API_BASE}/reset`, {
|
| 421 |
+
method:"POST", headers:{"Content-Type":"application/json"},
|
| 422 |
+
body: JSON.stringify({task_id: taskSelect.value})
|
| 423 |
+
});
|
| 424 |
+
const data = await res.json();
|
| 425 |
+
currentState = data.observation;
|
| 426 |
+
maxBattery = currentState.battery_level;
|
| 427 |
+
logFeed.innerHTML = "";
|
| 428 |
+
renderGrid(currentState, true);
|
| 429 |
+
updateTelemetry(currentState);
|
| 430 |
+
statusDot.className = "pulse-dot online";
|
| 431 |
+
statusLabel.textContent = "Connected";
|
| 432 |
+
} catch (e) {
|
| 433 |
+
statusDot.className = "pulse-dot";
|
| 434 |
+
statusLabel.textContent = "Offline";
|
| 435 |
+
addLog(`Cannot reach server — is app.py running?`, "sys");
|
| 436 |
+
}
|
| 437 |
+
}
|
| 438 |
+
|
| 439 |
+
// ── Single step ───────────────────────────────────────────
|
| 440 |
+
async function stepEnv() {
|
| 441 |
+
if (!currentState) return;
|
| 442 |
+
stepCount++;
|
| 443 |
+
|
| 444 |
+
// 1. Policy endpoint (LLM / Q-table on server)
|
| 445 |
+
let action = null, source = "bfs";
|
| 446 |
+
try {
|
| 447 |
+
const pr = await fetch(`${API_BASE}/policy`, {
|
| 448 |
+
method:"POST", headers:{"Content-Type":"application/json"},
|
| 449 |
+
body: JSON.stringify({message: currentState.message})
|
| 450 |
+
});
|
| 451 |
+
if (pr.ok) { const pd = await pr.json(); action=pd.action; source=pd.source||"llm"; }
|
| 452 |
+
} catch (_) {}
|
| 453 |
+
|
| 454 |
+
// 2. Local BFS fallback
|
| 455 |
+
if (!action) { action = localFallback(currentState); source = "bfs"; }
|
| 456 |
+
|
| 457 |
+
showPolicy(source, action);
|
| 458 |
+
|
| 459 |
+
// 3. Execute
|
| 460 |
+
try {
|
| 461 |
+
const res = await fetch(`${API_BASE}/step`, {
|
| 462 |
+
method:"POST", headers:{"Content-Type":"application/json"},
|
| 463 |
+
body: JSON.stringify({command: action})
|
| 464 |
+
});
|
| 465 |
+
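if (!res.ok) throw new Error(`HTTP ${res.status}`); // keep currentState intact on a failed request
const data = await res.json();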
|
| 466 |
+
|
| 467 |
+
const wasCollect = action === "COLLECT";
|
| 468 |
+
currentState = data.observation;
|
| 469 |
+
renderGrid(currentState);
|
| 470 |
+
updateTelemetry(currentState, data.reward, data.done);
|
| 471 |
+
|
| 472 |
+
// Collect animation + particles
|
| 473 |
+
if (wasCollect && robotEntity) {
|
| 474 |
+
robotEntity.classList.add("collecting");
|
| 475 |
+
setTimeout(() => robotEntity.classList.remove("collecting"), 440);
|
| 476 |
+
const cx = parseInt(robotEntity.style.left) + CELL/2;
|
| 477 |
+
const cy = parseInt(robotEntity.style.top) + CELL/2;
|
| 478 |
+
spawnParticles(cx, cy);
|
| 479 |
+
}
|
| 480 |
+
|
| 481 |
+
const sign = data.reward >= 0 ? "+" : "";
|
| 482 |
+
addLog(`${action} · ${sign}${data.reward.toFixed(2)}`, source);
|
| 483 |
+
|
| 484 |
+
if (data.done) {
|
| 485 |
+
addLog(`🏁 Episode complete · total ${totalReward.toFixed(2)}`, "sys");
|
| 486 |
+
if (autoMode) toggleAutoMode();
|
| 487 |
+
}
|
| 488 |
+
} catch (e) {
|
| 489 |
+
addLog(`Step error: ${e.message}`, "sys");
|
| 490 |
+
if (autoMode) toggleAutoMode();
|
| 491 |
+
}
|
| 492 |
+
}
|
| 493 |
+
|
| 494 |
+
// ── Auto mode ─────────────────────────────────────────────
|
| 495 |
+
function toggleAutoMode() {
|
| 496 |
+
autoMode = !autoMode;
|
| 497 |
+
if (autoMode) {
|
| 498 |
+
autoBtn.textContent = "⏹ Stop";
|
| 499 |
+
autoBtn.className = "btn stop";
|
| 500 |
+
autoTimer = setInterval(stepEnv, stepDelay);
|
| 501 |
+
} else {
|
| 502 |
+
autoBtn.textContent = "▶ Run Policy";
|
| 503 |
+
autoBtn.className = "btn primary";
|
| 504 |
+
clearInterval(autoTimer);
|
| 505 |
+
}
|
| 506 |
+
}
|
| 507 |
+
|
| 508 |
+
// ── Event listeners ───────────────────────────────────────
|
| 509 |
+
resetBtn .addEventListener("click", resetEnv);
|
| 510 |
+
autoBtn .addEventListener("click", toggleAutoMode);
|
| 511 |
+
manualBtn.addEventListener("click", stepEnv);
|
| 512 |
+
taskSelect.addEventListener("change", resetEnv);
|
| 513 |
+
|
| 514 |
+
// ── Boot ──────────────────────────────────────────────────
|
| 515 |
+
resetEnv();
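One thing to watch in the auto mode above: `setInterval(stepEnv, stepDelay)` fires on a fixed schedule, while `stepEnv` awaits up to two network requests, so a slow `/policy` response could let two steps overlap. Below is a minimal sketch of one way to guard against that. The wrapper name `makeNonOverlapping` is hypothetical and not part of the original script.js.

```javascript
// Hypothetical helper (an assumption, not in the original file): wraps an
// async step function so that calls arriving while a previous call is still
// in flight are skipped instead of running concurrently.
function makeNonOverlapping(stepFn) {
  let inFlight = false;
  return async function guardedStep(...args) {
    if (inFlight) return false;   // previous step still running: skip this tick
    inFlight = true;
    try {
      await stepFn(...args);
      return true;                // the step ran to completion
    } finally {
      inFlight = false;           // always release the guard, even on errors
    }
  };
}
```

With a wrapper like this, `setInterval(makeNonOverlapping(stepEnv), stepDelay)` would drop ticks that arrive mid-step. Whether skipping or queuing is preferable depends on how strictly the dashboard should follow the speed slider.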
|
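The `wx`/`wy` helpers in script.js place entities by converting grid coordinates to pixel offsets, flipping the y axis so that grid row 0 sits at the bottom of the board. The sketch below restates that mapping as a single function with two worked values. `gridToPixels` is a hypothetical name used only for illustration, and the default of 52 mirrors the `CELL` constant.

```javascript
// Hypothetical helper for illustration; the original code uses wx() and wy().
function gridToPixels(x, y, gridHeight, cell = 52) {
  return {
    left: x * cell,                    // same as wx(x)
    top: (gridHeight - 1 - y) * cell,  // same as wy(y, H): y = 0 is the bottom row
  };
}

console.log(gridToPixels(0, 0, 5)); // { left: 0, top: 208 }
console.log(gridToPixels(2, 4, 5)); // { left: 104, top: 0 }
```

On a 5×5 world the bottom-left cell therefore lands 208px from the top of `#env-grid`, which is why every absolutely positioned entity needs the grid height `H` as well as its own coordinates.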
frontend/style.css
ADDED
|
@@ -0,0 +1,634 @@
|
| 1 |
+
/* ═══════════════════════════════════════════════════════
|
| 2 |
+
GarbageBot — Bright Light Mode Dashboard CSS
|
| 3 |
+
═══════════════════════════════════════════════════════ */
|
| 4 |
+
|
| 5 |
+
/* ── Design tokens ────────────────────────────────────── */
|
| 6 |
+
:root {
|
| 7 |
+
/* Background — bright soft white/cyan */
|
| 8 |
+
--bg: #f1f5f9;
|
| 9 |
+
--surface: rgba(255, 255, 255, 0);
|
| 10 |
+
--surface-2: rgba(255, 255, 255, 0);
|
| 11 |
+
--surface-hi: rgba(255, 255, 255, 1);
|
| 12 |
+
|
| 13 |
+
--border: rgba(0, 0, 0, 0.08);
|
| 14 |
+
--border-hi: rgba(0, 0, 0, 0.16);
|
| 15 |
+
--border-glow: rgba(59, 130, 246, 0.3);
|
| 16 |
+
|
| 17 |
+
--text: #1e293b;
|
| 18 |
+
--text-muted: #64748b;
|
| 19 |
+
--text-dim: #94a3b8;
|
| 20 |
+
|
| 21 |
+
/* Vibrant accents (slightly darker for contrast on white) */
|
| 22 |
+
--blue: #3b82f6;
|
| 23 |
+
--blue-glow: rgba(59, 130, 246, 0.45);
|
| 24 |
+
--indigo: #6366f1;
|
| 25 |
+
--indigo-glow: rgba(99, 102, 241, 0.45);
|
| 26 |
+
--teal: #14b8a6;
|
| 27 |
+
--teal-glow: rgba(20, 184, 166, 0.4);
|
| 28 |
+
--purple: #a855f7;
|
| 29 |
+
--purple-glow: rgba(168, 85, 247, 0.45);
|
| 30 |
+
--success: #10b981;
|
| 31 |
+
--success-glow: rgba(16, 185, 129, 0.4);
|
| 32 |
+
--warning: #f59e0b;
|
| 33 |
+
--warning-glow: rgba(245, 158, 11, 0.4);
|
| 34 |
+
--danger: #ef4444;
|
| 35 |
+
--danger-glow: rgba(239, 68, 68, 0.5);
|
| 36 |
+
--neon: #0ea5e9;
|
| 37 |
+
--neon-glow: rgba(14, 165, 233, 0.35);
|
| 38 |
+
|
| 39 |
+
/* World */
|
| 40 |
+
--floor: #ffffff;
|
| 41 |
+
--floor-light: #f8fafc;
|
| 42 |
+
--wall: #e2e8f0;
|
| 43 |
+
--wall-top: #f1f5f9;
|
| 44 |
+
--wall-shadow: rgba(15, 23, 42, 0.15);
|
| 45 |
+
|
| 46 |
+
--radius: 16px;
|
| 47 |
+
--radius-sm: 10px;
|
| 48 |
+
--radius-xs: 6px;
|
| 49 |
+
|
| 50 |
+
--cell: 52px;
|
| 51 |
+
--gap: 3px;
|
| 52 |
+
--pad: 10px;
|
| 53 |
+
|
| 54 |
+
--font: 'Inter', sans-serif;
|
| 55 |
+
--mono: 'JetBrains Mono', monospace;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
/* ── Reset ─────────────────────────────────────────────── */
|
| 59 |
+
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
|
| 60 |
+
html { font-size: 15px; }
|
| 61 |
+
body {
|
| 62 |
+
font-family: var(--font);
|
| 63 |
+
background: var(--bg);
|
| 64 |
+
color: var(--text);
|
| 65 |
+
min-height: 100vh;
|
| 66 |
+
overflow-x: hidden;
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
/* ── Animated background orbs ───────────────────────────── */
|
| 70 |
+
.bg-orbs {
|
| 71 |
+
position: fixed; inset: 0;
|
| 72 |
+
pointer-events: none; z-index: 0; overflow: hidden;
|
| 73 |
+
}
|
| 74 |
+
.orb {
|
| 75 |
+
position: absolute; border-radius: 50%;
|
| 76 |
+
filter: blur(100px);
|
| 77 |
+
animation: drift 28s ease-in-out infinite alternate;
|
| 78 |
+
}
|
| 79 |
+
/* Opacities remain similar (since background is white, colors will act like pale washes) */
|
| 80 |
+
.orb-1 { width:700px;height:700px; background:radial-gradient(circle,var(--blue) 0%,transparent 70%); opacity:.15; left:-200px;top:-150px; animation-delay:0s; }
|
| 81 |
+
.orb-2 { width:600px;height:600px; background:radial-gradient(circle,var(--purple) 0%,transparent 70%); opacity:.15; right:-180px;bottom:-100px; animation-delay:-9s; }
|
| 82 |
+
.orb-3 { width:500px;height:500px; background:radial-gradient(circle,var(--teal) 0%,transparent 70%); opacity:.1; left:42%;top:25%; animation-delay:-18s; }
|
| 83 |
+
.orb-4 { width:350px;height:350px; background:radial-gradient(circle,var(--warning) 0%,transparent 70%); opacity:.1; right:30%;top:-80px; animation-delay:-5s; }
|
| 84 |
+
|
| 85 |
+
@keyframes drift {
|
| 86 |
+
from { transform: translate(0, 0) scale(1); }
|
| 87 |
+
to { transform: translate(50px, 35px) scale(1.12); }
|
| 88 |
+
}
|
| 89 |
+
|
| 90 |
+
/* Dot-grid background */
.grid-bg {
  position: fixed; inset: 0; z-index: 0; pointer-events: none;
  background-image: radial-gradient(circle, rgba(15,23,42,.06) 1px, transparent 1px);
  background-size: 28px 28px;
}

/* ── Layout ─────────────────────────────────────────────── */
.dashboard {
  position: relative; z-index: 1;
  max-width: 1360px;
  margin: 0 auto;
  padding: 1.25rem 1.5rem;
  display: flex; flex-direction: column; gap: 1.1rem;
  min-height: 100vh;
}

/* ── Glass panel ─────────────────────────────────────────── */
.panel {
  background: var(--surface);
  border: 1px solid var(--border);
  border-radius: var(--radius);
}
.glass {
  backdrop-filter: blur(24px) saturate(180%);
  -webkit-backdrop-filter: blur(24px) saturate(180%);
  box-shadow: 0 4px 48px rgba(15, 23, 42, 0.06), inset 0 1px 0 rgba(255,255,255,1);
}

/* ── HEADER ─────────────────────────────────────────────── */
.header-bar {
  display: flex; align-items: center; gap: 1.2rem;
  padding: .9rem 1.4rem;
  flex-wrap: wrap;
  background: linear-gradient(135deg, rgba(255,255,255,0.92) 0%, rgba(248,250,252,0.88) 100%);
  border: 1px solid var(--border);
  border-top: 1px solid rgba(255,255,255,1);
  border-radius: var(--radius);
  backdrop-filter: blur(28px);
  box-shadow: 0 2px 36px rgba(15,23,42,0.05), 0 0 60px rgba(59,130,246,0.06), inset 0 1px 0 rgba(255,255,255,1);
}

.brand { display: flex; align-items: center; gap: .9rem; }
.brand-icon {
  font-size: 2rem;
  filter: drop-shadow(0 0 12px rgba(99,102,241,.4));
  animation: bob 3s ease-in-out infinite;
}
@keyframes bob {
  0%,100%{ transform:translateY(0) rotate(-2deg); }
  50% { transform:translateY(-5px) rotate(2deg); }
}
h1 {
  font-size: 1.4rem; font-weight: 800;
  color: var(--text);
  background: linear-gradient(135deg, var(--blue) 0%, var(--purple) 100%);
  -webkit-background-clip: text; background-clip: text;
  -webkit-text-fill-color: transparent;
  letter-spacing: -.04em;
}
.version-tag {
  font-size: .62rem; font-weight: 700;
  background: rgba(99,102,241,.1); color: var(--indigo);
  padding: 2px 7px; border-radius: 20px;
  border: 1px solid rgba(99,102,241,.25);
  vertical-align: middle; margin-left: 6px;
  -webkit-text-fill-color: initial;
}
.brand-sub { font-size: .72rem; color: var(--text-muted); font-family: var(--mono); margin-top: 3px; }

/* Status strip */
.status-strip { display:flex; gap:.7rem; align-items:center; margin-left:auto; }
.status-pill {
  display:flex; align-items:center; gap:.45rem;
  font-size:.8rem; font-family:var(--mono);
  padding:.35rem .9rem;
  border-radius:20px;
  background:rgba(255,255,255,.6);
  border:1px solid var(--border);
  box-shadow: 0 2px 4px rgba(0,0,0,0.02);
}
.pulse-dot {
  width:9px;height:9px;border-radius:50%;
  background:var(--danger);
  box-shadow:0 0 6px var(--danger-glow);
  transition:all .4s;
}
.pulse-dot.online {
  background: var(--success);
  box-shadow: 0 0 6px var(--success-glow);
  animation: blink 2.2s ease-in-out infinite;
}
@keyframes blink { 0%,100%{opacity:1} 50%{opacity:.4} }

.policy-badge {
  display:flex; align-items:center; gap:.4rem;
  font-size:.8rem; font-family:var(--mono);
  padding:.35rem .9rem;
  border-radius:20px;
  background: linear-gradient(135deg,rgba(59,130,246,.08),rgba(168,85,247,.08));
  border:1px solid rgba(59,130,246,.25);
  transition:all .4s;
  color: var(--text);
}
.policy-badge.active { border-color:rgba(59,130,246,.5); box-shadow:0 0 12px rgba(59,130,246,.1); }
.badge-icon { font-size:.9rem; }

#mode-pill {
  background: rgba(255, 255, 255, 0.8);
  color: var(--text-muted);
  font-weight: 700;
  transition: all 0.3s ease;
}
#mode-pill.recharging { background: var(--blue); color: #fff; box-shadow: 0 0 12px var(--blue-glow); }
#mode-pill.unloading { background: var(--warning); color: #fff; box-shadow: 0 0 12px var(--warning-glow); }
#mode-pill.normal { background: rgba(0, 0, 0, 0.05); color: var(--text-muted); }

/* Controls */
.controls { display:flex; gap:.65rem; align-items:center; flex-wrap:wrap; }

select {
  background:rgba(255,255,255,.6);
  color:var(--text);
  border:1px solid var(--border);
  padding:.52rem .9rem;
  border-radius:var(--radius-sm);
  font-size:.85rem; font-family:var(--font);
  outline:none; cursor:pointer;
  transition:border-color .2s,box-shadow .2s;
  box-shadow: 0 1px 3px rgba(0,0,0,.02);
}
select:hover,select:focus { border-color:var(--blue); box-shadow:0 0 0 3px rgba(59,130,246,.15); }

/* Speed control */
.speed-group {
  display:flex; align-items:center; gap:.5rem;
  padding:.4rem .85rem;
  background:rgba(255,255,255,.6);
  border:1px solid var(--border);
  border-radius:var(--radius-sm);
}
.speed-label { font-size:.75rem; color:var(--text-muted); }
#speed-slider {
  -webkit-appearance:none; appearance:none;
  width:85px; height:4px;
  background:linear-gradient(90deg, var(--blue) 0%, rgba(0,0,0,.08) 0%);
  border-radius:2px; outline:none; cursor:pointer;
  transition:background .1s;
}
#speed-slider::-webkit-slider-thumb {
  -webkit-appearance:none;
  width:16px;height:16px;border-radius:50%;
  background:var(--blue);
  box-shadow:0 0 6px var(--blue-glow);
  cursor:pointer;
  transition:transform .15s;
}
#speed-slider::-webkit-slider-thumb:active { transform:scale(1.25); }
#speed-val { font-size:.75rem;color:var(--text-muted);font-family:var(--mono);min-width:40px;text-align:right; }

/* Buttons */
.btn {
  padding:.55rem 1.15rem;
  border-radius:var(--radius-sm);
  font-size:.875rem; font-weight:700;
  cursor:pointer; border:none;
  transition:all .18s ease;
  position:relative; overflow:hidden;
  white-space:nowrap; letter-spacing:.01em;
}
.btn::after {
  content:''; position:absolute; inset:0;
  background:rgba(0,0,0,0); transition:background .18s;
}
.btn:hover::after { background:rgba(0,0,0,.03); }
.btn:active { transform:scale(.96); }

.btn.primary {
  background:linear-gradient(135deg, var(--blue), var(--indigo));
  color:#fff;
  box-shadow:0 4px 18px var(--blue-glow);
}
.btn.primary:hover { box-shadow:0 6px 24px var(--blue-glow); transform:translateY(-1px); }

.btn.secondary {
  background:rgba(255,255,255,1);
  color:var(--text); border:1px solid var(--border);
  box-shadow: 0 1px 3px rgba(0,0,0,0.05);
}
.btn.secondary:hover { border-color:var(--border-hi); }

.btn.outline {
  background:transparent; color:var(--text);
  border:1px solid var(--border);
}
.btn.outline:hover { border-color:var(--border-hi); background:rgba(0,0,0,.02); }

.btn.stop {
  background:linear-gradient(135deg, var(--danger), #ef4444);
  color:#fff;
  box-shadow:0 4px 18px var(--danger-glow);
}
.btn.stop:hover { box-shadow:0 6px 24px var(--danger-glow); transform:translateY(-1px); }

/* ── MAIN LAYOUT ────────────────────────────────────────── */
main {
  display:grid;
  grid-template-columns:1fr 310px;
  gap:1.1rem;
  flex:1;
}

/* ── GRID PANEL ─────────────────────────────────────────── */
.grid-panel {
  display:flex; flex-direction:column;
  padding:1.25rem; gap:.75rem;
}
.grid-header {
  display:flex; align-items:center; justify-content:space-between;
}
.grid-title {
  font-size:.75rem;font-weight:700;
  text-transform:uppercase;letter-spacing:.1em;color:var(--text-muted);
}
.grid-meta { display:flex; gap:.5rem; }

.mono-chip {
  font-family:var(--mono); font-size:.75rem;
  padding:.22rem .65rem; border-radius:20px;
  background:rgba(0,0,0,.03);
  border:1px solid var(--border); color:var(--text-muted);
}
.accent-chip {
  color:var(--blue); border-color:rgba(59,130,246,.25);
  background:rgba(59,130,246,.08);
}

.grid-stage {
  flex:1; display:flex; align-items:center; justify-content:center;
  position:relative;
}

/* ─── The World ──────────────────────────────────────────── */
.grid-world {
  display:grid;
  gap:0; /* no cell gap — seamless floor */
  position:relative;
  border-radius:var(--radius);
  overflow:hidden;
  background: var(--floor);
  border:1px solid rgba(0,0,0,.06);
  box-shadow:
    0 8px 30px rgba(15,23,42,.06),
    inset 0 0 20px rgba(0,0,0,.02);

  /* Continuous floor: subtle tile lines */
  background-image:
    linear-gradient(rgba(0,0,0,.03) 1px, transparent 1px),
    linear-gradient(90deg, rgba(0,0,0,.03) 1px, transparent 1px);
  background-size:
    var(--cell) var(--cell),
    var(--cell) var(--cell);
}

/* Transparent click-target cells — world feels seamless */
.cell {
  width:var(--cell); height:var(--cell);
  background:transparent;
  position:relative; z-index:2;
  cursor:pointer;
  transition:background .18s;
}
.cell:hover { background:rgba(0,0,0,.025); }

/* ── OBSTACLES — 3D walls ────────────────────────────────── */
.world-obstacle {
  position:absolute; z-index:10;
  width:var(--cell); height:var(--cell);
  border-radius:4px;
  background:linear-gradient(160deg, var(--wall-top) 0%, var(--wall) 45%, #94a3b8 100%);
  border:1px solid rgba(0,0,0,.15);
  border-top-color:rgba(255,255,255,.8);
  border-left-color:rgba(255,255,255,.4);
  box-shadow:
    inset -1px 0 0 rgba(0,0,0,.08),
    0 6px 0 0 #94a3b8, /* 3D depth */
    0 8px 12px rgba(15,23,42,.25); /* floor shadow */
  overflow:hidden;
}
.world-obstacle::before {
  content:''; position:absolute; inset:0;
  background:linear-gradient(180deg,rgba(255,255,255,.2) 0%,transparent 40%);
}
/* stone texture lines */
.world-obstacle::after {
  content:''; position:absolute; inset:0;
  background:repeating-linear-gradient(
    0deg, transparent, transparent 14px,
    rgba(0,0,0,.03) 14px, rgba(0,0,0,.03) 15px
  ),
  repeating-linear-gradient(
    90deg, transparent, transparent 14px,
    rgba(0,0,0,.02) 14px, rgba(0,0,0,.02) 15px
  );
}

/* ── GARBAGE — glowing litter ────────────────────────────── */
.world-garbage {
  position:absolute; z-index:8;
  width:var(--cell); height:var(--cell);
  display:flex; align-items:center; justify-content:center;
  font-size:1.4rem;
  cursor:pointer;
  border-radius:var(--radius-xs);
  background:rgba(168,85,247,.1);
  border:1px solid rgba(168,85,247,.3);
  animation: garbo-spawn .35s cubic-bezier(.34,1.56,.64,1) both;
  transition:transform .15s;
}
.world-garbage:hover { transform:scale(1.12); }
@keyframes garbo-spawn {
  from { transform:scale(0) rotate(-25deg); opacity:0; }
  to { transform:scale(1) rotate(0); opacity:1; }
}
/* glow ring */
.world-garbage::before {
  content:''; position:absolute; inset:-3px; border-radius:inherit;
  border:1.5px solid rgba(168,85,247,.4);
  animation:garbo-ring 2.4s ease-out infinite;
}
@keyframes garbo-ring {
  0% { opacity:.6; transform:scale(1); }
  70% { opacity:0; transform:scale(1.45); }
  100% { opacity:0; transform:scale(1.45); }
}
/* bounce */
.world-garbage span { display:block; animation:garbo-bob 2.2s ease-in-out infinite; filter: drop-shadow(0 4px 6px rgba(168,85,247,.2)); }
@keyframes garbo-bob {
  0%,100%{ transform:translateY(0); }
  50% { transform:translateY(-4px); }
}

/* ── STATIONS ───────────────────────────────────────────── */
.world-home, .world-unload {
  position: absolute;
  width: var(--cell); height: var(--cell);
  display: flex; align-items: center; justify-content: center;
  font-size: 1.2rem;
  border-radius: 4px;
  z-index: 5;
  pointer-events: none;
}
.world-home {
  background: rgba(59, 130, 246, 0.08);
  border: 2px dashed rgba(59, 130, 246, 0.3);
}
.world-home::before {
  content: '⚡'; filter: drop-shadow(0 0 8px var(--blue));
}

.world-unload {
  background: rgba(245, 158, 11, 0.08);
  border: 2px dashed rgba(245, 158, 11, 0.3);
}
.world-unload::before {
  content: '📦'; filter: drop-shadow(0 0 8px var(--warning));
}

/* ── ROBOT ───────────────────────────────────────────────── */
.robot-entity {
  position:absolute; z-index:30;
  width:var(--cell); height:var(--cell);
  display:flex; align-items:center; justify-content:center;
  font-size:1.3rem;
  border-radius:var(--radius-sm);
  /* vivid gradient body */
  background:linear-gradient(145deg, #60a5fa, #818cf8 50%, #6366f1);
  border:1px solid rgba(255,255,255,.6);
  border-top-color:rgba(255,255,255,.9);
  /* multi-layer shadow */
  box-shadow:
    0 0 0 1px rgba(59,130,246,.2),
    0 4px 12px rgba(59,130,246,.3),
    0 8px 24px rgba(15,23,42,.2);

  transition:
    left var(--move-dur,350ms) linear,
    top var(--move-dur,350ms) linear,
    background 0.4s ease,
    box-shadow 0.4s ease;
  will-change: left, top;
}

.robot-entity.recharging {
  background: linear-gradient(145deg, #3b82f6, #60a5fa);
  box-shadow: 0 0 20px var(--blue-glow);
}
.robot-entity.unloading {
  background: linear-gradient(145deg, #f59e0b, #fbbf24);
  box-shadow: 0 0 20px var(--warning-glow);
}
/* top highlight */
.robot-entity::before {
  content:''; position:absolute; top:2px; left:3px; right:3px; height:40%;
  background:linear-gradient(180deg,rgba(255,255,255,.5) 0%,transparent 100%);
  border-radius: 4px 4px 50% 50%;
}
/* ground shadow */
.robot-entity::after {
  content:''; position:absolute; bottom:-12px; left:50%;
  transform:translateX(-50%);
  width:80%; height:10px;
  background:rgba(15,23,42,.25);
  border-radius:50%; filter:blur(4px);
}

/* Collect burst */
.robot-entity.collecting {
  animation:robot-collect .42s ease-out;
}
@keyframes robot-collect {
  0% { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
  35% { box-shadow:0 0 0 2px var(--purple),0 12px 35px var(--purple-glow),0 20px 45px rgba(15,23,42,.15); }
  100% { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
}

/* ── Trail ghost ────────────────────────────────────────── */
.trail-ghost {
  position:absolute; z-index:6;
  width:var(--cell); height:var(--cell);
  border-radius:var(--radius-sm);
  background:rgba(59,130,246,.08);
  border:1px solid rgba(59,130,246,.15);
  pointer-events:none;
  animation:trail-fade 1.1s ease-out forwards;
}
@keyframes trail-fade {
  from { opacity:1; transform:scale(1); }
  to { opacity:0; transform:scale(.82); }
}

/* ── Particle burst ─────────────────────────────────────── */
.particle-layer { position:absolute; inset:0; pointer-events:none; z-index:40; }
.particle {
  position:absolute;
  border-radius:50%; pointer-events:none;
  animation:pfx .75s ease-out forwards;
}
@keyframes pfx {
  0% { transform:translate(0,0) scale(1); opacity:1; }
  100% { transform:var(--tx) scale(0); opacity:0; }
}

/* ── Grid hint ──────────────────────────────────────────── */
.grid-hint { font-size:.72rem; color:var(--text-dim); text-align:center; }

/* ── SIDE PANEL ─────────────────────────────────────────── */
.side-panel { display:flex; flex-direction:column; gap:1.1rem; }

.section-title {
  font-size:.72rem; font-weight:700;
  text-transform:uppercase; letter-spacing:.12em; color:var(--text-muted);
  margin-bottom:.9rem;
}

/* Telemetry card */
.tele-card { padding:1.2rem; }
.stat-row { display:flex; align-items:center; gap:.75rem; margin-bottom:.95rem; }
.stat-icon { font-size:1.3rem; flex-shrink:0; width:34px; text-align:center; filter:drop-shadow(0 2px 4px rgba(0,0,0,.1)); }
.stat-body { flex:1; }
.stat-label-row { display:flex; justify-content:space-between; align-items:baseline; margin-bottom:.35rem; }
.stat-label { font-size:.78rem; color:var(--text-muted); }
.stat-num { font-family:var(--mono); font-size:.85rem; font-weight:600; color:var(--text); }
.stat-num.big-num { font-size:1.55rem; font-weight:800; }
.stat-num.accent { color:var(--blue); }

.progress-track {
  height:7px; background:rgba(0,0,0,.06);
  border-radius:4px; overflow:hidden;
}
.progress-fill {
  height:100%; border-radius:4px;
  background:var(--success);
  transition:width .45s ease, background .45s ease;
  position:relative; overflow:hidden;
}
.progress-fill::after {
  content:''; position:absolute; inset:0;
  background:linear-gradient(90deg,transparent,rgba(255,255,255,.4),transparent);
  animation:shimmer 1.8s linear infinite;
}
@keyframes shimmer { from{transform:translateX(-100%)} to{transform:translateX(100%)} }

/* Mini chart */
.chart-wrap { margin-top:.5rem; }
#reward-chart { width:100%; height:68px; border-radius:var(--radius-xs); }

/* Log card */
.log-card { padding:1.2rem; display:flex; flex-direction:column; flex:1; min-height:0; }
.log-header { display:flex; justify-content:space-between; align-items:center; margin-bottom:.7rem; }
.log-header .section-title { margin-bottom:0; }
.clear-btn { font-size:.7rem; color:var(--text-muted); background:none; border:none; cursor:pointer; transition:color .2s; }
.clear-btn:hover { color:var(--danger); }

.log-feed {
  flex:1; display:flex; flex-direction:column; gap:.45rem;
  overflow-y:auto; max-height:280px; padding-right:.2rem;
}
.placeholder { font-size:.8rem; color:var(--text-dim); text-align:center; padding:1rem; }

.log-entry {
  display:flex; gap:.6rem; align-items:flex-start;
  font-size:.77rem; line-height:1.45;
  padding:.55rem .7rem;
  border-radius:var(--radius-sm);
  background:rgba(255,255,255,1);
  border:1px solid rgba(0,0,0,.04);
  box-shadow: 0 1px 3px rgba(15,23,42,.03);
  color: var(--text);
  animation:slide-in .22s cubic-bezier(.22,1,.36,1);
  transition:border-color .2s, box-shadow .2s;
}
.log-entry:hover { border-color:var(--border-hi); box-shadow: 0 2px 6px rgba(15,23,42,.06); }
@keyframes slide-in { from{opacity:0;transform:translateY(5px)} to{opacity:1;transform:translateY(0)} }

.log-badge {
  font-family:var(--mono); font-size:.64rem; font-weight:700;
  padding:2px 6px; border-radius:4px; flex-shrink:0; margin-top:2px;
}
.log-badge.llm { background:rgba(59,130,246,.12); color:var(--blue); }
.log-badge.bfs { background:rgba(20,184,166,.12); color:var(--teal); }
.log-badge.q-table { background:rgba(245,158,11,.15); color:#d97706; }
.log-badge.sys { background:rgba(0,0,0,.05); color:var(--text-muted); }

.log-footer {
  font-size:.7rem; color:var(--text-dim);
  text-align:center; margin-top:.6rem; padding-top:.55rem;
  border-top:1px solid var(--border);
}

/* ── Scrollbar ──────────────────────────────────────────── */
::-webkit-scrollbar { width:5px; }
::-webkit-scrollbar-track { background:transparent; }
::-webkit-scrollbar-thumb { background:rgba(0,0,0,.15); border-radius:3px; }
::-webkit-scrollbar-thumb:hover { background:rgba(0,0,0,.25); }
garbage_robot_dataset.jsonl
ADDED
|
@@ -0,0 +1,35 @@
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.\n\n### Response:\nCOLLECT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.\n\n### Response:\nUP"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.\n\n### Response:\nDOWN"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.\n\n### Response:\nDOWN"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.\n\n### Response:\nDOWN"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.\n\n### Response:\nDOWN"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.\n\n### Response:\nCOLLECT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
|
| 31 |
+
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
|
| 32 |
+
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
|
| 33 |
+
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.\n\n### Response:\nCOLLECT"}
|
| 34 |
+
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.\n\n### Response:\nRIGHT"}
|
| 35 |
+
{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.\n\n### Response:\nRIGHT"}
|
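Each record in `garbage_robot_dataset.jsonl` packs the instruction, the environment status, and the expected action into a single `text` field. The following is a minimal sketch, assuming every record follows the three-header layout seen above, of splitting a record back into its parts and checking the label. `split_record` and `SECTION_RE` are hypothetical names for illustration and are not part of the repository.

```python
import json
import re

# Hypothetical pattern: the three Alpaca-style headers seen in the dataset.
SECTION_RE = re.compile(
    r"### Instruction:\n(?P<instruction>.*?)\n\n"
    r"### Input:\n(?P<input>.*?)\n\n"
    r"### Response:\n(?P<response>.*)",
    re.DOTALL,
)

VALID_ACTIONS = {"UP", "DOWN", "LEFT", "RIGHT", "COLLECT"}


def split_record(line: str) -> dict:
    """Split one JSONL line into instruction, input, and response."""
    text = json.loads(line)["text"]
    match = SECTION_RE.fullmatch(text)
    if match is None:
        raise ValueError("record does not follow the expected layout")
    parts = {key: value.strip() for key, value in match.groupdict().items()}
    if parts["response"] not in VALID_ACTIONS:
        raise ValueError(f"unexpected action: {parts['response']!r}")
    return parts


# One record from the dataset, rebuilt here so the sketch is self-contained.
sample = json.dumps({
    "text": (
        "### Instruction:\nYou control a garbage collecting robot. "
        "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n"
        "### Input:\nENVIRONMENT STATUS:\nYou are at (1, 1). "
        "Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.\n\n"
        "### Response:\nCOLLECT"
    )
})
record = split_record(sample)
```

A check like this could be run over the whole file before fine-tuning to catch records whose response is not one of the five commands.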
inference.py
ADDED
|
@@ -0,0 +1,520 @@
import os
import time
import requests
import json
from collections import deque
from openai import OpenAI

API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
HF_TOKEN = os.environ.get("HF_TOKEN", "")
ENV_URL = os.environ.get("ENV_URL", "http://localhost:7861")
LOCAL_MODEL_PATH = os.environ.get(
    "LOCAL_MODEL_PATH",
    "TechAvenger/GarbageBot-Weights"
)

MAX_STEPS = 200  # raised to account for recharge/unload detours

# Lazy-loaded local model — populated in main() if Unsloth is available
_local_model = None
_local_tokenizer = None

# Q-Learning agent — loaded once in main(), used as primary policy
_ql_agent = None
try:
    from qlearning import QLearningAgent
except ImportError:
    QLearningAgent = None

# ──────────────────────────────────────────────────────────
# BFS CORE
# ──────────────────────────────────────────────────────────

def bfs(start, goal, obstacles, grid_w, grid_h):
    """
    BFS from start to goal avoiding obstacles.
    Returns (first_direction, path_length) or (None, inf) if unreachable.
    Returns ("COLLECT", 0) when start already equals goal.
    """
    start, goal = tuple(start), tuple(goal)
    if start == goal:
        return ("COLLECT", 0)

    obstacle_set = frozenset(tuple(o) for o in obstacles)
    dirs = [("RIGHT",(1,0)), ("LEFT",(-1,0)), ("UP",(0,1)), ("DOWN",(0,-1))]
    queue = deque([(start, None, 0)])
    visited = {start}

    while queue:
        pos, first, depth = queue.popleft()
        for name, (dx, dy) in dirs:
            npos = (pos[0]+dx, pos[1]+dy)
            if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
                continue
            if npos in obstacle_set or npos in visited:
                continue
            # Carry the first step of the path through the queue.
            move = first if first else name
            if npos == goal:
                return (move, depth + 1)
            visited.add(npos)
            queue.append((npos, move, depth + 1))

    return (None, float('inf'))

|
| 66 |
+
def nearest_neighbour_order(start, targets, obstacles, grid_w, grid_h):
|
| 67 |
+
"""
|
| 68 |
+
Orders garbage by nearest-neighbour TSP using actual BFS cost.
|
| 69 |
+
Much better than Manhattan when obstacles split direct paths.
|
| 70 |
+
"""
|
| 71 |
+
remaining = list(targets)
|
| 72 |
+
ordered = []
|
| 73 |
+
current = tuple(start)
|
| 74 |
+
while remaining:
|
| 75 |
+
best = min(remaining, key=lambda t: bfs(current, t, obstacles, grid_w, grid_h)[1])
|
| 76 |
+
ordered.append(best)
|
| 77 |
+
remaining.remove(best)
|
| 78 |
+
current = tuple(best)
|
| 79 |
+
return ordered
|
| 80 |
+
|
| 81 |
+
|
# ──────────────────────────────────────────────────────────
# HEURISTIC — BFS-based, mode-aware
# ──────────────────────────────────────────────────────────

def heuristic_action(obs, _stuck_counter=None) -> str:
    """
    Pure-BFS heuristic that respects the robot's autonomous mode.

    When the environment reports robot_mode == 'recharging' or 'unloading',
    the action suggested here is overridden by the environment's own resolver
    anyway — but we still return a sensible direction so logs are readable.

    In normal mode the heuristic targets the nearest garbage via BFS with a
    nearest-neighbour tour order, plus a stuck-counter escape hatch.
    """
    if _stuck_counter is None:
        _stuck_counter = [0]

    robot_mode = obs.get("robot_mode", "normal")
    r_pos = list(obs["robot_position"])
    obstacles = [list(o) for o in obs["obstacle_positions"]]
    grid_w, grid_h = obs["grid_size"]

    # ── Recharging: head to home ───────────────────────────────
    if robot_mode == "recharging":
        home = obs.get("home_position", r_pos)
        move, _ = bfs(r_pos, home, obstacles, grid_w, grid_h)
        return move or "UP"

    # ── Unloading: head to unload station ─────────────────────
    if robot_mode == "unloading":
        station = obs.get("unload_station", r_pos)
        move, _ = bfs(r_pos, station, obstacles, grid_w, grid_h)
        return move or "UP"

    # ── Normal: collect nearest garbage ───────────────────────
    garbage = [tuple(g) for g in obs["garbage_positions"]]
    if not garbage:
        return "UP"  # nothing to do; env will mark episode done

    if tuple(r_pos) in garbage:
        _stuck_counter[0] = 0
        return "COLLECT"

    ordered = nearest_neighbour_order(r_pos, garbage, obstacles, grid_w, grid_h)

    # Stuck-counter escape: try alternate targets after repeated no-progress steps
    if _stuck_counter[0] >= 4 and len(ordered) > 1:
        ordered = [ordered[1], ordered[0]] + ordered[2:]
    if _stuck_counter[0] >= 8:
        ordered = ordered[1:] + ordered[:1]
        _stuck_counter[0] = 0

    target = ordered[0]
    if tuple(target) == tuple(r_pos):
        _stuck_counter[0] = 0
        return "COLLECT"

    move, _ = bfs(r_pos, target, obstacles, grid_w, grid_h)
    if move and move != "COLLECT":
        _stuck_counter[0] = 0
        return move

    # Primary target unreachable — try alternates
    for alt in ordered[1:]:
        move, _ = bfs(r_pos, alt, obstacles, grid_w, grid_h)
        if move and move != "COLLECT":
            _stuck_counter[0] = 0
            return move

    # Fully boxed in: take any open neighbouring cell to escape
    _stuck_counter[0] += 1
    obstacle_set = frozenset(tuple(o) for o in obstacles)
    for name, (dx, dy) in [("RIGHT",(1,0)),("LEFT",(-1,0)),("UP",(0,1)),("DOWN",(0,-1))]:
        npos = (r_pos[0]+dx, r_pos[1]+dy)
        if (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h
                and npos not in obstacle_set):
            return name

    return "RIGHT"

# ──────────────────────────────────────────────────────────
# ACTION RESOLVER (priority: Q-table → LLM → BFS heuristic)
# ──────────────────────────────────────────────────────────

def resolve_next_action(client, obs, context_history, stuck_counter=None) -> str:
    """
    Decide the next action using the priority chain:
      1. Q-table (trained, deterministic, fastest)
      2. Fine-tuned local LLM (Unsloth export)
      3. Remote OpenAI-compatible endpoint
      4. BFS heuristic (fallback, always works)

    The BFS heuristic is mode-aware and is passed as a hint to the LLM.
    Note: when the environment is in MODE_RECHARGE or MODE_UNLOAD it will
    override whatever action we return, so correctness in those modes is
    the heuristic's responsibility, not the LLM's.
    """
    heuristic = heuristic_action(obs, stuck_counter)

    # ── 1. Q-Learning policy (trained, deterministic) ──────────
    if _ql_agent is not None:
        q_action = _ql_agent.get_action(obs)
        if q_action is not None:
            return q_action

    # Build a mode-aware system prompt for the LLM
    robot_mode = obs.get("robot_mode", "normal")
    dist_home = obs.get("distance_from_home", -1)
    storage_load = obs.get("current_storage_load", 0)
    capacity = obs.get("storage_capacity", 6)
    home = obs.get("home_position", (0, 0))
    station = obs.get("unload_station", (0, 0))

    mode_note = ""
    if robot_mode == "recharging":
        mode_note = (
            f"\n⚠ ROBOT MODE: RECHARGING — navigate to home {home} "
            f"({dist_home} steps away). Do NOT collect garbage until recharged."
        )
    elif robot_mode == "unloading":
        mode_note = (
            f"\n⚠ ROBOT MODE: UNLOADING — navigate to unload station {station}. "
            f"Storage is full ({storage_load}/{capacity}). "
            f"Do NOT collect garbage until unloaded."
        )
    else:
        mode_note = (
            f"\nBattery distance to home: {dist_home} steps. "
            f"Storage: {storage_load}/{capacity}."
        )

    system_prompt = (
        "You control a garbage collecting robot on a grid.\n"
        "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT\n\n"
        "Rules:\n"
        "- COLLECT only when your position exactly matches a garbage position.\n"
        "- Never move into an obstacle tile.\n"
        "- The environment handles recharging and unloading automatically.\n"
        f"- Pathfinding suggests: {heuristic} (only override if clearly wrong)"
        f"{mode_note}"
    )

    # ── 2. Try local fine-tuned merged model (Alpaca prompt format) ─────
    if _local_model is not None and _local_tokenizer is not None:
        try:
            alpaca_instruction = (
                "You are an AI brain controlling a garbage collecting robot.\n"
                "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
            )
            prompt = (
                f"### Instruction:\n{alpaca_instruction}\n\n"
                f"### Input:\nENVIRONMENT STATUS:\n{obs['message']}\n\n"
                f"### Response:\n"
            )
            inputs = _local_tokenizer(
                prompt, return_tensors="pt", truncation=True, max_length=512
            ).to(_local_model.device)
            with __import__('torch').no_grad():
                outputs = _local_model.generate(
                    **inputs, max_new_tokens=6, do_sample=False,
                    pad_token_id=_local_tokenizer.eos_token_id
                )
            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
            token = _local_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
            for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
                if valid in token:
                    print(f"[LOCAL LLM] {token.split()[0] if token else '?'} (raw: {token!r})")
                    return valid
        except Exception as e:
            print(f"[LOCAL LLM ERROR] {e}")

    # ── 3. Try remote OpenAI-compatible endpoint ─────────────────
    if client is not None:
        try:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[
                    {"role": "system", "content": system_prompt},
                    *context_history,
                    {"role": "user", "content": f"STATUS:\n{obs['message']}\n\nCommand?"}
                ],
                temperature=0.0,
                max_tokens=6
            )
            action = response.choices[0].message.content.strip().upper()
            for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
                if valid in action:
                    return valid
        except Exception as e:
            print(f"[REMOTE LLM ERROR] {e}")

    # ── 4. Final fallback: pure BFS heuristic ─────────────────
    return heuristic

# ──────────────────────────────────────────────────────────
# INTERACTIVE GARBAGE PLACEMENT
# ──────────────────────────────────────────────────────────

def prompt_custom_garbage(grid_w, grid_h, obstacles):
    """
    Interactive CLI helper: prompts the user to enter garbage positions
    for a dynamic episode.
    """
    import random

    obstacle_set = set(tuple(o) for o in obstacles)
    print(f"\n Grid: {grid_w} x {grid_h} Obstacles: {sorted(obstacle_set)}")
    print(" Enter garbage positions:")
    print(" x,y place at column x, row y (e.g. '4,4')")
    print(" random N place N random pieces (e.g. 'random 5')")
    print(" done start the episode\n")

    garbage = []
    while True:
        raw = input(" Garbage > ").strip().lower()

        if raw == "done":
            if not garbage:
                print(" Need at least one garbage tile.")
                continue
            break

        if raw.startswith("random"):
            parts = raw.split()
            # Reject a non-numeric or non-positive count instead of crashing.
            try:
                n = int(parts[1]) if len(parts) > 1 else 3
            except ValueError:
                print(" Format: random N e.g. 'random 5'")
                continue
            if n < 1:
                print(" N must be at least 1.")
                continue
            candidates = [(x, y) for x in range(grid_w) for y in range(grid_h)
                          if (x, y) not in obstacle_set]
            garbage = random.sample(candidates, min(n, len(candidates)))
            print(f" Random garbage: {garbage}")
            break

        try:
            x, y = map(int, raw.split(","))
            if not (0 <= x < grid_w and 0 <= y < grid_h):
                print(f" Out of bounds — valid: 0-{grid_w-1}, 0-{grid_h-1}")
                continue
            if (x, y) in obstacle_set:
                print(f" ({x},{y}) is an obstacle.")
                continue
            if (x, y) in garbage:
                print(f" ({x},{y}) already added.")
                continue
            garbage.append((x, y))
            print(f" Added ({x},{y}) total: {garbage}")
        except ValueError:
            print(" Format: x,y e.g. '3,4'")

    return garbage

def reset_with_custom_garbage(task_id, garbage_positions):
    """
    Posts to /reset_custom to inject custom garbage positions at runtime.
    Falls back to standard /reset if something goes wrong.
    """
    try:
        res = requests.post(f"{ENV_URL}/reset_custom", json={
            "task_id": task_id,
            "garbage_positions": [list(g) for g in garbage_positions]
        })
        res.raise_for_status()
        return res.json()["observation"]
    except Exception as e:
        print(f"[WARN] /reset_custom failed ({e}), falling back to /reset")
        res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
        res.raise_for_status()
        return res.json()["observation"]

# ──────────────────────────────────────────────────────────
# EPISODE RUNNER
# ──────────────────────────────────────────────────────────

def print_log(log_dict):
    print(json.dumps(log_dict), flush=True)


def run_episode(client, task_id, obs):
    policy = (
        "q-table" if (_ql_agent and _ql_agent.loaded) else
        "local-llm" if _local_model else
        "remote-llm" if client else
        "bfs"
    )
    print_log({"type": "[START]", "task_id": task_id,
               "model": MODEL_NAME, "policy": policy, "max_steps": MAX_STEPS})

    total_reward = 0.0
    done = False
    context_history = []
    step_idx = 0
    stuck_counter = [0]  # per-episode; no cross-episode state leak

    for step_idx in range(1, MAX_STEPS + 1):
        action = resolve_next_action(client, obs, context_history, stuck_counter)

        try:
            res = requests.post(f"{ENV_URL}/step", json={"command": action})
            res.raise_for_status()
            step_data = res.json()
        except Exception as e:
            print(f"Step error: {e}")
            break

        obs = step_data["observation"]
        reward = step_data["reward"]
        done = step_data["done"]
        info = step_data.get("info", {})
        total_reward += reward

        # Log includes autonomous-override details for debugging
        log_entry = {
            "type": "[STEP]",
            "step": step_idx,
            "action": action,
            "effective": info.get("effective_command", action),
            "overridden": info.get("autonomous_override", False),
            "mode": obs.get("robot_mode", "normal"),
            "battery": obs.get("battery_level"),
            "storage": f"{obs.get('current_storage_load')}/{obs.get('storage_capacity')}",
            "dist_home": obs.get("distance_from_home"),
            "reward": round(reward, 2),
            "total_reward": round(total_reward, 2),
            "done": done,
        }
        print_log(log_entry)

        if done:
            break

        time.sleep(0.05)

    try:
        score = requests.get(f"{ENV_URL}/grade/{task_id}").json()["score"]
    except Exception:
        score = 0.0

    print_log({"type": "[END]", "task_id": task_id, "total_steps": step_idx,
               "final_reward": round(total_reward, 2), "score": score})
    return score

# ──────────────────────────────────────────────────────────
# MAIN
# ──────────────────────────────────────────────────────────

def main():
    global _local_model, _local_tokenizer, _ql_agent

    print("=" * 55)
    print(" Garbage Collecting Robot — Inference")
    print("=" * 55)

    # ── 1. Load Q-Learning policy (fastest, no GPU needed) ────
    if QLearningAgent is not None:
        _ql_agent = QLearningAgent()
        if _ql_agent.loaded:
            print(f"\n [INFO] Q-table loaded ({len(_ql_agent.qtable):,} states). "
                  "Q-learning is the primary policy.")
        else:
            print("\n [WARN] No Q-table found (qtable.json). "
                  "Run: python qlearning.py --train")
            print(" Falling through to LLM / BFS.")
    else:
        print("\n [WARN] qlearning.py not found — skipping Q-table.")

    # ── 2. Attempt to load the fine-tuned merged model ────────────
    try:
        from transformers import AutoModelForCausalLM, AutoTokenizer
        import torch
        print(f"\n [INFO] Loading fine-tuned model from:\n {LOCAL_MODEL_PATH}")
        _local_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
        _local_model = AutoModelForCausalLM.from_pretrained(
            LOCAL_MODEL_PATH,
            torch_dtype=torch.float16,
            device_map="auto",
        )
        _local_model.eval()
        print(" [INFO] Fine-tuned model loaded — used when Q-table misses a state.")
    except Exception as e:
        print(f" [WARN] Fine-tuned model unavailable ({e}).")
        print(" Falling back to remote API / BFS heuristic.")
        _local_model, _local_tokenizer = None, None

    import argparse
    parser = argparse.ArgumentParser(description="Run GarbageBot Inference")
    parser.add_argument("--dynamic", action="store_true",
                        help="Interactive dynamic garbage placement")
    parser.add_argument("--task",
                        choices=["1","2","3","4","easy","medium","hard","all"],
                        default="all",
                        help="Task to run: 'easy', 'medium', 'hard', or 'all'")
    args = parser.parse_args()

    if args.task in ["1", "easy"]:
        tasks = ["task_easy"]
    elif args.task in ["2", "medium"]:
        tasks = ["task_medium"]
    elif args.task in ["3", "hard"]:
        tasks = ["task_hard"]
    else:
        tasks = ["task_easy", "task_medium", "task_hard"]

    print(f"\n [INFO] Running tasks: {', '.join(tasks)}")

    client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL) if HF_TOKEN else None
    if not client and _local_model is None:
        print("\n [INFO] No HF_TOKEN and no local model — pure BFS heuristic mode.")
    elif not client:
        print("\n [INFO] No HF_TOKEN — using local Unsloth model + BFS fallback.")

    for task_id in tasks:
        print(f"\n{'─'*40}\n {task_id}\n{'─'*40}")

        try:
            res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
            res.raise_for_status()
            base_obs = res.json()["observation"]
        except Exception as e:
            print(f"Reset failed: {e}")
            continue

        if args.dynamic:
            garbage = prompt_custom_garbage(
                base_obs["grid_size"][0],
                base_obs["grid_size"][1],
                base_obs["obstacle_positions"]
            )
            obs = reset_with_custom_garbage(task_id, garbage)
        else:
            obs = base_obs

        run_episode(client, task_id, obs)


if __name__ == "__main__":
    main()
|
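The action resolver in `inference.py` accepts a model reply by testing `valid in token` for each command in a fixed order. Under that check a reply such as "LEFT, THEN UP" resolves to UP, and a word like "UPLOAD" also counts as UP. The sketch below shows one possible alternative, not the repository's method: it takes the earliest whole-word command in the reply and otherwise returns a caller-supplied fallback. `parse_action` and `VALID_ACTIONS` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical helper, not part of inference.py.
VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# \b anchors make "UPLOAD" or "DOWNTOWN" fail to match as commands.
_ACTION_RE = re.compile(r"\b(" + "|".join(VALID_ACTIONS) + r")\b")


def parse_action(reply: str, fallback: str) -> str:
    """Return the first whole-word command found in reply, else fallback."""
    match = _ACTION_RE.search(reply.upper())
    return match.group(1) if match else fallback


first = parse_action("left, then up", "RIGHT")
noise = parse_action("Upload complete", "RIGHT")
```

In the resolver, the fallback would naturally be the BFS heuristic's suggestion, so an unparseable reply still yields a legal move.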
inference_output.log
ADDED
|
@@ -0,0 +1,240 @@
| 1 |
+
============================================================
|
| 2 |
+
Garbage Collecting Robot — Inference Script
|
| 3 |
+
============================================================
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
────────────────────────────────────────
|
| 7 |
+
Running task: task_easy
|
| 8 |
+
────────────────────────────────────────
|
| 9 |
+
{"type": "[START]", "task_id": "task_easy", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
|
| 10 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 11 |
+
{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
|
| 12 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 13 |
+
{"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
|
| 14 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 15 |
+
{"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
|
| 16 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 17 |
+
{"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
|
| 18 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 19 |
+
{"type": "[STEP]", "step": 5, "action": "UP", "reward": -0.1, "total_reward": -0.5, "done": false}
|
| 20 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 21 |
+
{"type": "[STEP]", "step": 6, "action": "UP", "reward": -0.1, "total_reward": -0.6, "done": false}
|
| 22 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 7, "action": "UP", "reward": -0.1, "total_reward": -0.7, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 8, "action": "UP", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 9, "action": "COLLECT", "reward": 59.9, "total_reward": 59.1, "done": true}
{"type": "[END]", "task_id": "task_easy", "total_steps": 9, "final_reward": 59.1, "score": 1.0}
────────────────────────────────────────
Running task: task_medium
────────────────────────────────────────
{"type": "[START]", "task_id": "task_medium", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 1, "action": "LEFT", "reward": -5.1, "total_reward": -5.1, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 2, "action": "LEFT", "reward": -5.1, "total_reward": -10.2, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 3, "action": "LEFT", "reward": -5.1, "total_reward": -15.299999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 4, "action": "LEFT", "reward": -5.1, "total_reward": -20.4, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 5, "action": "LEFT", "reward": -5.1, "total_reward": -25.5, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 6, "action": "LEFT", "reward": -5.1, "total_reward": -30.6, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 7, "action": "LEFT", "reward": -5.1, "total_reward": -35.7, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 8, "action": "LEFT", "reward": -5.1, "total_reward": -40.800000000000004, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 9, "action": "LEFT", "reward": -5.1, "total_reward": -45.900000000000006, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 10, "action": "LEFT", "reward": -5.1, "total_reward": -51.00000000000001, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 11, "action": "LEFT", "reward": -5.1, "total_reward": -56.10000000000001, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 12, "action": "LEFT", "reward": -5.1, "total_reward": -61.20000000000001, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 13, "action": "LEFT", "reward": -5.1, "total_reward": -66.30000000000001, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 14, "action": "LEFT", "reward": -5.1, "total_reward": -71.4, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 15, "action": "LEFT", "reward": -5.1, "total_reward": -76.5, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 16, "action": "LEFT", "reward": -5.1, "total_reward": -81.6, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 17, "action": "LEFT", "reward": -5.1, "total_reward": -86.69999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 18, "action": "LEFT", "reward": -5.1, "total_reward": -91.79999999999998, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 19, "action": "LEFT", "reward": -5.1, "total_reward": -96.89999999999998, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 20, "action": "LEFT", "reward": -5.1, "total_reward": -101.99999999999997, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 21, "action": "LEFT", "reward": -5.1, "total_reward": -107.09999999999997, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 22, "action": "LEFT", "reward": -5.1, "total_reward": -112.19999999999996, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 23, "action": "LEFT", "reward": -5.1, "total_reward": -117.29999999999995, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 24, "action": "LEFT", "reward": -5.1, "total_reward": -122.39999999999995, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 25, "action": "LEFT", "reward": -5.1, "total_reward": -127.49999999999994, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 26, "action": "LEFT", "reward": -5.1, "total_reward": -132.59999999999994, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -5.1, "total_reward": -137.69999999999993, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -5.1, "total_reward": -142.79999999999993, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -5.1, "total_reward": -147.89999999999992, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -5.1, "total_reward": -152.99999999999991, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -5.1, "total_reward": -158.0999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": -163.1999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": -168.2999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": -173.3999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -178.4999999999999, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -183.59999999999988, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -188.69999999999987, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -193.79999999999987, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -198.89999999999986, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -203.99999999999986, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -209.09999999999985, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -214.19999999999985, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -219.29999999999984, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -224.39999999999984, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -229.49999999999983, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -234.59999999999982, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -239.69999999999982, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -244.7999999999998, "done": false}
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
{"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -249.8999999999998, "done": false}
|
| 132 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 133 |
+
{"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -254.9999999999998, "done": true}
|
| 134 |
+
{"type": "[END]", "task_id": "task_medium", "total_steps": 50, "final_reward": -254.9999999999998, "score": 0.0}
|
| 135 |
+
|
| 136 |
+
────────────────────────────────────────
|
| 137 |
+
Running task: task_hard
|
| 138 |
+
────────────────────────────────────────
|
| 139 |
+
{"type": "[START]", "task_id": "task_hard", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
|
| 140 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 141 |
+
{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
|
| 142 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 143 |
+
{"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
|
| 144 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 145 |
+
{"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
|
| 146 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 147 |
+
{"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
|
| 148 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 149 |
+
{"type": "[STEP]", "step": 5, "action": "RIGHT", "reward": -0.1, "total_reward": -0.5, "done": false}
|
| 150 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 151 |
+
{"type": "[STEP]", "step": 6, "action": "RIGHT", "reward": -0.1, "total_reward": -0.6, "done": false}
|
| 152 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 153 |
+
{"type": "[STEP]", "step": 7, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7, "done": false}
|
| 154 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 155 |
+
{"type": "[STEP]", "step": 8, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
|
| 156 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 157 |
+
{"type": "[STEP]", "step": 9, "action": "UP", "reward": -0.1, "total_reward": -0.8999999999999999, "done": false}
|
| 158 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 159 |
+
{"type": "[STEP]", "step": 10, "action": "UP", "reward": -0.1, "total_reward": -0.9999999999999999, "done": false}
|
| 160 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 161 |
+
{"type": "[STEP]", "step": 11, "action": "UP", "reward": -0.1, "total_reward": -1.0999999999999999, "done": false}
|
| 162 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 163 |
+
{"type": "[STEP]", "step": 12, "action": "UP", "reward": -0.1, "total_reward": -1.2, "done": false}
|
| 164 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 165 |
+
{"type": "[STEP]", "step": 13, "action": "UP", "reward": -0.1, "total_reward": -1.3, "done": false}
|
| 166 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 167 |
+
{"type": "[STEP]", "step": 14, "action": "UP", "reward": -0.1, "total_reward": -1.4000000000000001, "done": false}
|
| 168 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 169 |
+
{"type": "[STEP]", "step": 15, "action": "UP", "reward": -0.1, "total_reward": -1.5000000000000002, "done": false}
|
| 170 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 171 |
+
{"type": "[STEP]", "step": 16, "action": "UP", "reward": -0.1, "total_reward": -1.6000000000000003, "done": false}
|
| 172 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 173 |
+
{"type": "[STEP]", "step": 17, "action": "COLLECT", "reward": 9.9, "total_reward": 8.3, "done": false}
|
| 174 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 175 |
+
{"type": "[STEP]", "step": 18, "action": "RIGHT", "reward": -0.1, "total_reward": 8.200000000000001, "done": false}
|
| 176 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 177 |
+
{"type": "[STEP]", "step": 19, "action": "DOWN", "reward": -0.1, "total_reward": 8.100000000000001, "done": false}
|
| 178 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 179 |
+
{"type": "[STEP]", "step": 20, "action": "DOWN", "reward": -0.1, "total_reward": 8.000000000000002, "done": false}
|
| 180 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 181 |
+
{"type": "[STEP]", "step": 21, "action": "DOWN", "reward": -0.1, "total_reward": 7.900000000000002, "done": false}
|
| 182 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 183 |
+
{"type": "[STEP]", "step": 22, "action": "DOWN", "reward": -0.1, "total_reward": 7.8000000000000025, "done": false}
|
| 184 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 185 |
+
{"type": "[STEP]", "step": 23, "action": "DOWN", "reward": -0.1, "total_reward": 7.700000000000003, "done": false}
|
| 186 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 187 |
+
{"type": "[STEP]", "step": 24, "action": "DOWN", "reward": -0.1, "total_reward": 7.600000000000003, "done": false}
|
| 188 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 189 |
+
{"type": "[STEP]", "step": 25, "action": "DOWN", "reward": -0.1, "total_reward": 7.5000000000000036, "done": false}
|
| 190 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 191 |
+
{"type": "[STEP]", "step": 26, "action": "COLLECT", "reward": 9.9, "total_reward": 17.400000000000006, "done": false}
|
| 192 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 193 |
+
{"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -0.1, "total_reward": 17.300000000000004, "done": false}
|
| 194 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 195 |
+
{"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -0.1, "total_reward": 17.200000000000003, "done": false}
|
| 196 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 197 |
+
{"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -0.1, "total_reward": 17.1, "done": false}
|
| 198 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 199 |
+
{"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -0.1, "total_reward": 17.0, "done": false}
|
| 200 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 201 |
+
{"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -0.1, "total_reward": 16.9, "done": false}
|
| 202 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 203 |
+
{"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": 11.799999999999999, "done": false}
|
| 204 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 205 |
+
{"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": 6.699999999999999, "done": false}
|
| 206 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 207 |
+
{"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": 1.5999999999999996, "done": false}
|
| 208 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 209 |
+
{"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -3.5, "done": false}
|
| 210 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 211 |
+
{"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -8.6, "done": false}
|
| 212 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 213 |
+
{"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -13.7, "done": false}
|
| 214 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 215 |
+
{"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -18.799999999999997, "done": false}
|
| 216 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 217 |
+
{"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -23.9, "done": false}
|
| 218 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 219 |
+
{"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -29.0, "done": false}
|
| 220 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 221 |
+
{"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -34.1, "done": false}
|
| 222 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 223 |
+
{"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -39.2, "done": false}
|
| 224 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 225 |
+
{"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -44.300000000000004, "done": false}
|
| 226 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 227 |
+
{"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -49.400000000000006, "done": false}
|
| 228 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 229 |
+
{"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -54.50000000000001, "done": false}
|
| 230 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 231 |
+
{"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -59.60000000000001, "done": false}
|
| 232 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 233 |
+
{"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -64.7, "done": false}
|
| 234 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 235 |
+
{"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -69.8, "done": false}
|
| 236 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 237 |
+
{"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -74.89999999999999, "done": false}
|
| 238 |
+
[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
|
| 239 |
+
{"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -79.99999999999999, "done": false}
|
| 240 |
+
{"type": "[END]", "task_id": "task_hard", "total_steps": 50, "final_reward": -79.99999999999999, "score": 0.4}
|
models.py
ADDED
|
@@ -0,0 +1,79 @@
+from pydantic import BaseModel, ConfigDict
+from typing import List, Literal, Optional, Tuple
+
+# --- Custom observation and action logic ---
+
+class Observation(BaseModel):
+    model_config = ConfigDict(strict=True)
+    grid_size: Tuple[int, int]
+    robot_position: Tuple[int, int]
+    garbage_positions: List[Tuple[int, int]]
+    obstacle_positions: List[Tuple[int, int]]
+    battery_level: int
+    inventory_count: int
+    message: str  # Textual context for LLM
+
+    # ── Autonomous resource-management fields ──────────────────
+    home_position: Tuple[int, int]  # Charging station coordinates
+    unload_station: Tuple[int, int]  # Designated unload-corner coordinates
+    storage_capacity: int  # Max items robot can carry before unloading
+    current_storage_load: int  # Items currently held (resets after unload)
+    distance_from_home: int  # BFS steps to home (-1 if unreachable)
+    robot_mode: str  # 'normal' | 'recharging' | 'unloading'
+
+
+class Action(BaseModel):
+    model_config = ConfigDict(strict=True)
+    command: Literal["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+
+# --- OpenEnv Standard Spec Models ---
+
+class State(BaseModel):
+    model_config = ConfigDict(strict=True)
+    task_id: Optional[str]
+    total_reward: float
+    steps_taken: int
+    done: bool
+
+    # ── Extended state for resource management ─────────────────
+    robot_mode: str = "normal"
+    current_storage_load: int = 0
+    battery_level: int = 0
+    distance_from_home: int = 0
+
+
+class ResetInput(BaseModel):
+    task_id: str = "task_easy"
+
+class CustomResetInput(BaseModel):
+    """
+    Fully dynamic reset: the caller specifies the entire layout at runtime.
+    grid_size, robot_start, garbage positions, obstacles, battery, storage_capacity,
+    home_position and unload_station are all optional overrides on top of a base task_id.
+    Pass task_id='custom' to skip scenario defaults entirely.
+    """
+    task_id: str = "task_easy"
+    grid_size: Optional[Tuple[int, int]] = None
+    robot_start: Optional[Tuple[int, int]] = None
+    garbage_positions: Optional[List[Tuple[int, int]]] = None
+    obstacle_positions: Optional[List[Tuple[int, int]]] = None
+    max_battery: Optional[int] = None
+    storage_capacity: Optional[int] = None
+    home_position: Optional[Tuple[int, int]] = None
+    unload_station: Optional[Tuple[int, int]] = None
+
+class ResetOutput(BaseModel):
+    observation: Observation
+
+class StepOutput(BaseModel):
+    observation: Observation
+    reward: float
+    done: bool
+    info: dict = {}
+
+class Task(BaseModel):
+    id: str
+    name: str
+    description: str
+    difficulty: str
+    reward_range: List[float]
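The `distance_from_home` field is documented as a BFS step count with -1 for unreachable cells. A minimal sketch of such a computation follows, assuming 4-connected movement on a grid indexed from 0. The function name `bfs_distance` is hypothetical, and how `environment.py` actually computes the field is not shown in this section. The layout is copied from `task_medium` in `scenarios.py`.

```python
from collections import deque
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def bfs_distance(start: Cell, goal: Cell, grid_size: Cell,
                 obstacles: Iterable[Cell]) -> int:
    """Shortest 4-connected path length from start to goal, or -1 if unreachable."""
    width, height = grid_size
    blocked = set(obstacles)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, y), dist = queue.popleft()
        for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nxt = (x + dx, y + dy)
            # Skip cells outside the grid, inside an obstacle, or already visited.
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked or nxt in seen:
                continue
            if nxt == goal:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return -1


# Layout copied from task_medium in scenarios.py.
MEDIUM_OBSTACLES = [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)]
HOME = (3, 3)

# (1, 3) is only two cells from home in a straight line, but the obstacle
# wall at x=2 forces a detour around its end.
print(bfs_distance((1, 3), HOME, (7, 7), MEDIUM_OBSTACLES))  # prints 6
```

The detour case shows why a BFS count, rather than Manhattan distance, is the safer input for deciding when to return home with limited battery.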
|
openenv.yaml
ADDED
|
@@ -0,0 +1,20 @@
+name: garbage-collecting-robot
+version: "1.0.0"
+description: "An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent navigates a grid, picks up garbage, and manages its battery."
+type: logical-grid-world
+action_space:
+  type: discrete
+  description: "Movement and interaction commands: UP, DOWN, LEFT, RIGHT, COLLECT."
+observation_space:
+  type: object
+  description: "Grid state including robot position, garbage coordinates, inventory size, battery level, and a conversational text interpretation of the environment."
+tasks:
+  - id: task_easy
+    difficulty: easy
+    description: "Navigate a small 5x5 grid to collect 1 piece of garbage."
+  - id: task_medium
+    difficulty: medium
+    description: "Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery."
+  - id: task_hard
+    difficulty: hard
+    description: "Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage."
|
pyproject.toml
ADDED
|
@@ -0,0 +1,23 @@
+[project]
+name = "garbage-collecting-robot"
+version = "1.0.0"
+description = "OpenEnv-compliant reinforcement learning environment for a garbage collecting robot"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "fastapi>=0.110.0",
+    "uvicorn>=0.29.0",
+    "pydantic>=2.0.0",
+    "openai>=1.0.0",
+    "requests>=2.31.0",
+    "python-dotenv>=1.0.0",
+    "pyyaml>=6.0.0",
+    "openenv-core>=0.2.0",
+]
+
+[project.scripts]
+server = "server.app:main"
+
+[build-system]
+requires = ["setuptools>=68.0"]
+build-backend = "setuptools.build_meta"
|
qlearning.py
ADDED
|
@@ -0,0 +1,345 @@
| 1 |
+
"""
|
| 2 |
+
qlearning.py — Tabular Q-Learning for the Garbage Collecting Robot.
|
| 3 |
+
|
| 4 |
+
Training runs directly against GarbageRobotEnv (no HTTP server needed).
|
| 5 |
+
The Q-table is persisted to disk as JSON and loaded by inference.py at startup.
|
| 6 |
+
|
| 7 |
+
State representation:
|
| 8 |
+
(robot_x, robot_y, sorted_garbage_tuple)
|
| 9 |
+
e.g. (2, 3, ((1,1),(4,4))) — compact, hashable, fully describes the relevant world
|
| 10 |
+
|
| 11 |
+
Actions:
|
| 12 |
+
0=UP 1=DOWN 2=LEFT 3=RIGHT 4=COLLECT
|
| 13 |
+
|
| 14 |
+
Usage:
|
| 15 |
+
# Train all tasks and save
|
| 16 |
+
python3 qlearning.py --train --episodes 8000
|
| 17 |
+
|
| 18 |
+
# Evaluate the greedy policy (uses the saved Q-table)
|
| 19 |
+
python3 qlearning.py --eval
|
| 20 |
+
|
| 21 |
+
Fix applied:
|
| 22 |
+
- load() previously had two separate key-reconstruction passes, where the
|
| 23 |
+
first pass result (variable `k`) was computed but then immediately discarded.
|
| 24 |
+
The second pass also misidentified the garbage sub-list when it had exactly
|
| 25 |
+
2 integer elements (treating [gx, gy] pairs as flat coords instead of a
|
| 26 |
+
tuple-of-tuples). Replaced both passes with a single, unambiguous decode:
|
| 27 |
+
parsed = [rx, ry, [[gx1,gy1],[gx2,gy2],...]]
|
| 28 |
+
where the third element is always the nested garbage list.
|
| 29 |
+
"""
|
| 30 |
+
|
| 31 |
+
import os
|
| 32 |
+
import json
|
| 33 |
+
import random
|
| 34 |
+
import argparse
|
| 35 |
+
from collections import defaultdict
|
| 36 |
+
from environment import GarbageRobotEnv
|
| 37 |
+
from scenarios import SCENARIOS
|
| 38 |
+
|
| 39 |
+
# ── Constants ──────────────────────────────────────────────────────────────
|
| 40 |
+
|
| 41 |
+
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
|
| 42 |
+
ACTION_IDX = {a: i for i, a in enumerate(ACTIONS)}
|
| 43 |
+
Q_TABLE_PATH = os.environ.get("Q_TABLE_PATH", "qtable.json")
|
| 44 |
+
|
| 45 |
+
# ── Hyperparameters ─────────────────────────────────────────────────────────
|
| 46 |
+
|
| 47 |
+
ALPHA = 0.15
|
| 48 |
+
GAMMA = 0.97
|
| 49 |
+
EPSILON_START = 1.0
|
| 50 |
+
EPSILON_END = 0.05
|
| 51 |
+
EPSILON_DECAY = 0.9995
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
# ── State Encoding ──────────────────────────────────────────────────────────
|
| 55 |
+
|
| 56 |
+
def encode_state(obs: dict) -> tuple:
|
| 57 |
+
"""
|
| 58 |
+
Convert a raw observation dict into a hashable tuple suitable as a Q-table key.
|
| 59 |
+
|
| 60 |
+
Key structure: (robot_x, robot_y, ((gx1,gy1),(gx2,gy2),...))
|
| 61 |
+
Garbage positions are sorted so order doesn't create phantom new states.
|
| 62 |
+
"""
|
| 63 |
+
rx, ry = obs["robot_position"]
|
| 64 |
+
garbage = tuple(sorted((int(g[0]), int(g[1])) for g in obs["garbage_positions"]))
|
| 65 |
+
return (int(rx), int(ry), garbage)
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
# ── Q-Table ─────────────────────────────────────────────────────────────────
|
| 69 |
+
|
| 70 |
+
class QTable:
|
| 71 |
+
"""
|
| 72 |
+
Dictionary-backed Q-table with lazy per-state initialisation (see _ensure).
|
| 73 |
+
Values default to a small optimistic initial value to encourage exploration.
|
| 74 |
+
"""
|
| 75 |
+
|
| 76 |
+
def __init__(self, optimistic_init: float = 0.5):
|
| 77 |
+
self.optimistic_init = optimistic_init
|
| 78 |
+
self._q: dict = {}
|
| 79 |
+
|
| 80 |
+
def _ensure(self, state: tuple):
|
| 81 |
+
if state not in self._q:
|
| 82 |
+
self._q[state] = [self.optimistic_init] * len(ACTIONS)
|
| 83 |
+
|
| 84 |
+
def get(self, state: tuple, action_idx: int) -> float:
|
| 85 |
+
self._ensure(state)
|
| 86 |
+
return self._q[state][action_idx]
|
| 87 |
+
|
| 88 |
+
def update(self, state: tuple, action_idx: int, value: float):
|
| 89 |
+
self._ensure(state)
|
| 90 |
+
self._q[state][action_idx] = value
|
| 91 |
+
|
| 92 |
+
def best_action(self, state: tuple) -> int:
|
| 93 |
+
"""Return the index of the greedy best action."""
|
| 94 |
+
self._ensure(state)
|
| 95 |
+
return int(max(range(len(ACTIONS)), key=lambda i: self._q[state][i]))
|
| 96 |
+
|
| 97 |
+
def best_q(self, state: tuple) -> float:
|
| 98 |
+
self._ensure(state)
|
| 99 |
+
return max(self._q[state])
|
| 100 |
+
|
| 101 |
+
# ── Persistence ─────────────────────────────────────────────────────────
|
| 102 |
+
|
| 103 |
+
def save(self, path: str = Q_TABLE_PATH):
|
| 104 |
+
"""
|
| 105 |
+
Serialise Q-table to JSON.
|
| 106 |
+
|
| 107 |
+
Key format saved to disk:
|
| 108 |
+
[rx, ry, [[gx1,gy1], [gx2,gy2], ...]]
|
| 109 |
+
This is unambiguous: element 0 and 1 are ints, element 2 is always a
|
| 110 |
+
list-of-lists, even when there is only one garbage piece.
|
| 111 |
+
"""
|
| 112 |
+
serialisable = {}
|
| 113 |
+
for (rx, ry, garbage), v in self._q.items():
|
| 114 |
+
key = json.dumps([rx, ry, [list(g) for g in garbage]])
|
| 115 |
+
serialisable[key] = v
|
| 116 |
+
with open(path, "w") as f:
|
| 117 |
+
json.dump(serialisable, f)
|
| 118 |
+
print(f"[Q-Table] Saved {len(self._q):,} states → {path}")
|
| 119 |
+
|
| 120 |
+
def load(self, path: str = Q_TABLE_PATH) -> bool:
|
| 121 |
+
"""
|
| 122 |
+
Load Q-table from JSON.
|
| 123 |
+
|
| 124 |
+
FIX: The previous implementation had two redundant key-reconstruction
|
| 125 |
+
loops. The first built variable `k` which was immediately discarded;
|
| 126 |
+
the second pass misclassified [gx, gy] pairs (lists of 2 ints) as flat
|
| 127 |
+
coordinates rather than garbage-position tuples, corrupting multi-garbage
|
| 128 |
+
states.
|
| 129 |
+
|
| 130 |
+
New single-pass decode relies on the unambiguous 3-element structure:
|
| 131 |
+
parsed[0] = rx (int)
|
| 132 |
+
parsed[1] = ry (int)
|
| 133 |
+
parsed[2] = [[gx1,gy1], ...] (always a list-of-lists)
|
| 134 |
+
"""
|
| 135 |
+
if not os.path.exists(path):
|
| 136 |
+
return False
|
| 137 |
+
with open(path, "r") as f:
|
| 138 |
+
raw = json.load(f)
|
| 139 |
+
self._q = {}
|
| 140 |
+
for k_str, v in raw.items():
|
| 141 |
+
parsed = json.loads(k_str)
|
| 142 |
+
# Robustly handle both new format [rx, ry, [[gx,gy],...]]
|
| 143 |
+
# and old format [rx, ry, [gx, gy]] (single garbage, flat list).
|
| 144 |
+
rx, ry = int(parsed[0]), int(parsed[1])
|
| 145 |
+
raw_garbage = parsed[2]
|
| 146 |
+
if raw_garbage and isinstance(raw_garbage[0], list):
|
| 147 |
+
# New / multi-garbage format: [[gx1,gy1],[gx2,gy2],...]
|
| 148 |
+
garbage = tuple(tuple(p) for p in raw_garbage)
|
| 149 |
+
elif raw_garbage and isinstance(raw_garbage[0], int):
|
| 150 |
+
# Old single-garbage flat format: [gx, gy]
|
| 151 |
+
garbage = (tuple(raw_garbage),)
|
| 152 |
+
else:
|
| 153 |
+
garbage = ()
|
| 154 |
+
self._q[(rx, ry, garbage)] = v
|
| 155 |
+
print(f"[Q-Table] Loaded {len(self._q):,} states ← {path}")
|
| 156 |
+
return True
|
| 157 |
+
|
| 158 |
+
def __len__(self):
|
| 159 |
+
return len(self._q)
|
| 160 |
+
|
| 161 |
+
|
| 162 |
+
# ── Observation Helper ───────────────────────────────────────────────────────
|
| 163 |
+
|
| 164 |
+
def _obs_from_env(env) -> dict:
|
| 165 |
+
"""Build an obs dict directly from GarbageRobotEnv fields."""
|
| 166 |
+
obs_obj = env.get_observation()
|
| 167 |
+
return {
|
| 168 |
+
"robot_position": obs_obj.robot_position,
|
| 169 |
+
"garbage_positions": list(obs_obj.garbage_positions),
|
| 170 |
+
"obstacle_positions": list(obs_obj.obstacle_positions),
|
| 171 |
+
"grid_size": obs_obj.grid_size,
|
| 172 |
+
"battery_level": obs_obj.battery_level,
|
| 173 |
+
"inventory_count": obs_obj.inventory_count,
|
| 174 |
+
"message": obs_obj.message,
|
| 175 |
+
"robot_mode": obs_obj.robot_mode,
|
| 176 |
+
"home_position": obs_obj.home_position,
|
| 177 |
+
"unload_station": obs_obj.unload_station,
|
| 178 |
+
"current_storage_load": obs_obj.current_storage_load,
|
| 179 |
+
"storage_capacity": obs_obj.storage_capacity,
|
| 180 |
+
"distance_from_home": obs_obj.distance_from_home,
|
| 181 |
+
}
|
| 182 |
+
|
| 183 |
+
|
| 184 |
+
# ── Training ─────────────────────────────────────────────────────────────────
|
| 185 |
+
|
| 186 |
+
def train(
|
| 187 |
+
task_ids=None,
|
| 188 |
+
episodes: int = 8000,
|
| 189 |
+
qtable: QTable = None,
|
| 190 |
+
verbose: bool = True,
|
| 191 |
+
) -> QTable:
|
| 192 |
+
"""
|
| 193 |
+
Run Q-learning over the given task_ids for `episodes` total episodes.
|
| 194 |
+
Tasks are sampled uniformly so the agent generalises across difficulties.
|
| 195 |
+
"""
|
| 196 |
+
if task_ids is None:
|
| 197 |
+
task_ids = list(SCENARIOS.keys())
|
| 198 |
+
|
| 199 |
+
if qtable is None:
|
| 200 |
+
qtable = QTable()
|
| 201 |
+
|
| 202 |
+
env = GarbageRobotEnv()
|
| 203 |
+
epsilon = EPSILON_START
|
| 204 |
+
|
| 205 |
+
best_scores: dict = {t: 0.0 for t in task_ids}
|
| 206 |
+
|
| 207 |
+
for ep in range(1, episodes + 1):
|
| 208 |
+
task_id = random.choice(task_ids)
|
| 209 |
+
env.reset(task_id)
|
| 210 |
+
obs = _obs_from_env(env)
|
| 211 |
+
state = encode_state(obs)
|
| 212 |
+
|
| 213 |
+
total_reward = 0.0
|
| 214 |
+
done = False
|
| 215 |
+
|
| 216 |
+
while not done:
|
| 217 |
+
if random.random() < epsilon:
|
| 218 |
+
action_idx = random.randrange(len(ACTIONS))
|
| 219 |
+
else:
|
| 220 |
+
action_idx = qtable.best_action(state)
|
| 221 |
+
|
| 222 |
+
action = ACTIONS[action_idx]
|
| 223 |
+
result = env.step(action)
|
| 224 |
+
next_obs = result["observation"]
|
| 225 |
+
reward = result["reward"]
|
| 226 |
+
done = result["done"]
|
| 227 |
+
|
| 228 |
+
next_state = encode_state(next_obs)
|
| 229 |
+
|
| 230 |
+
# Bellman update
|
| 231 |
+
old_q = qtable.get(state, action_idx)
|
| 232 |
+
td_target = reward + (0.0 if done else GAMMA * qtable.best_q(next_state))
|
| 233 |
+
new_q = old_q + ALPHA * (td_target - old_q)
|
| 234 |
+
qtable.update(state, action_idx, new_q)
|
| 235 |
+
|
| 236 |
+
state = next_state
|
| 237 |
+
obs = next_obs
|
| 238 |
+
total_reward += reward
|
| 239 |
+
|
| 240 |
+
score = env.grade(task_id)
|
| 241 |
+
if score > best_scores[task_id]:
|
| 242 |
+
best_scores[task_id] = score
|
| 243 |
+
|
| 244 |
+
epsilon = max(EPSILON_END, epsilon * EPSILON_DECAY)
|
| 245 |
+
|
| 246 |
+
if verbose and ep % 500 == 0:
|
| 247 |
+
avg_best = sum(best_scores.values()) / len(best_scores)
|
| 248 |
+
print(
|
| 249 |
+
f" Ep {ep:5d}/{episodes} ε={epsilon:.4f} "
|
| 250 |
+
f"states={len(qtable):,} "
|
| 251 |
+
f"best_scores={best_scores} avg={avg_best:.2f}"
|
| 252 |
+
)
|
| 253 |
+
|
| 254 |
+
return qtable
|
| 255 |
+
|
| 256 |
+
|
| 257 |
+
# ── Inference Helper (used by inference.py) ──────────────────────────────────
|
| 258 |
+
|
| 259 |
+
class QLearningAgent:
|
| 260 |
+
"""
|
| 261 |
+
Thin wrapper around a loaded Q-table for use by inference.py.
|
| 262 |
+
Falls through (returns None) when the state has never been seen during training.
|
| 263 |
+
"""
|
| 264 |
+
|
| 265 |
+
def __init__(self, path: str = Q_TABLE_PATH):
|
| 266 |
+
self.qtable = QTable()
|
| 267 |
+
self.loaded = self.qtable.load(path)
|
| 268 |
+
|
| 269 |
+
def get_action(self, obs: dict) -> str | None:
|
| 270 |
+
if not self.loaded:
|
| 271 |
+
return None
|
| 272 |
+
state = encode_state(obs)
|
| 273 |
+
if state not in self.qtable._q:
|
| 274 |
+
return None
|
| 275 |
+
return ACTIONS[self.qtable.best_action(state)]
|
| 276 |
+
|
| 277 |
+
|
| 278 |
+
# ── Evaluation ───────────────────────────────────────────────────────────────
|
| 279 |
+
|
| 280 |
+
def evaluate(qtable: QTable, task_ids=None, runs: int = 5) -> dict:
|
| 281 |
+
"""Run `runs` greedy episodes per task and return average scores."""
|
| 282 |
+
if task_ids is None:
|
| 283 |
+
task_ids = list(SCENARIOS.keys())
|
| 284 |
+
|
| 285 |
+
env = GarbageRobotEnv()
|
| 286 |
+
results = {}
|
| 287 |
+
|
| 288 |
+
for task_id in task_ids:
|
| 289 |
+
scores = []
|
| 290 |
+
for _ in range(runs):
|
| 291 |
+
env.reset(task_id)
|
| 292 |
+
obs = _obs_from_env(env)
|
| 293 |
+
done = False
|
| 294 |
+
while not done:
|
| 295 |
+
state = encode_state(obs)
|
| 296 |
+
action_idx = qtable.best_action(state)
|
| 297 |
+
result = env.step(ACTIONS[action_idx])
|
| 298 |
+
obs = result["observation"]
|
| 299 |
+
done = result["done"]
|
| 300 |
+
scores.append(env.grade(task_id))
|
| 301 |
+
avg = sum(scores) / len(scores)
|
| 302 |
+
results[task_id] = round(avg, 3)
|
| 303 |
+
print(f" {task_id:12s} avg score = {avg:.3f} ({scores})")
|
| 304 |
+
|
| 305 |
+
return results
|
| 306 |
+
|
| 307 |
+
|
| 308 |
+
# ── CLI Entry Point ───────────────────────────────────────────────────────────
|
| 309 |
+
|
| 310 |
+
if __name__ == "__main__":
|
| 311 |
+
parser = argparse.ArgumentParser(description="Q-Learning for Garbage Robot")
|
| 312 |
+
parser.add_argument("--train", action="store_true", help="Run training")
|
| 313 |
+
parser.add_argument("--eval", action="store_true", help="Run evaluation only")
|
| 314 |
+
parser.add_argument("--episodes", type=int, default=8000)
|
| 315 |
+
parser.add_argument("--tasks", nargs="+", default=None)
|
| 316 |
+
parser.add_argument("--output", default=Q_TABLE_PATH)
|
| 317 |
+
args = parser.parse_args()
|
| 318 |
+
|
| 319 |
+
if args.train:
|
| 320 |
+
print("=" * 55)
|
| 321 |
+
print(" Q-Learning Training — Garbage Collecting Robot")
|
| 322 |
+
print("=" * 55)
|
| 323 |
+
task_ids = args.tasks or list(SCENARIOS.keys())
|
| 324 |
+
print(f" Tasks : {task_ids}")
|
| 325 |
+
print(f" Episodes : {args.episodes}")
|
| 326 |
+
print(f" α={ALPHA} γ={GAMMA} ε {EPSILON_START}→{EPSILON_END} decay={EPSILON_DECAY}")
|
| 327 |
+
print()
|
| 328 |
+
|
| 329 |
+
qt = train(task_ids=task_ids, episodes=args.episodes, verbose=True)
|
| 330 |
+
qt.save(args.output)
|
| 331 |
+
|
| 332 |
+
print("\n — Evaluation on greedy policy —")
|
| 333 |
+
evaluate(qt, task_ids)
|
| 334 |
+
|
| 335 |
+
elif args.eval:
|
| 336 |
+
print("=" * 55)
|
| 337 |
+
print(" Q-Learning Evaluation")
|
| 338 |
+
print("=" * 55)
|
| 339 |
+
qt = QTable()
|
| 340 |
+
if not qt.load(args.output):
|
| 341 |
+
print(f"[ERROR] No Q-table found at {args.output}. Run with --train first.")
|
| 342 |
+
else:
|
| 343 |
+
evaluate(qt)
|
| 344 |
+
else:
|
| 345 |
+
parser.print_help()
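The update rule and exploration schedule in `train()` can be checked by hand. The sketch below repeats the same arithmetic with the hyperparameters from `qlearning.py`. The helper names `td_update`, `epsilon_after` and `episodes_to_floor` are hypothetical, and the reward of -1.0 in the usage line is an illustrative value, not one taken from `environment.py`.

```python
import math

# Hyperparameters copied from qlearning.py.
ALPHA, GAMMA = 0.15, 0.97
EPSILON_START, EPSILON_END, EPSILON_DECAY = 1.0, 0.05, 0.9995


def td_update(old_q: float, reward: float, best_next_q: float, done: bool) -> float:
    """One tabular Q-learning update, same arithmetic as the loop in train()."""
    td_target = reward + (0.0 if done else GAMMA * best_next_q)
    return old_q + ALPHA * (td_target - old_q)


def epsilon_after(episodes: int) -> float:
    """Exploration rate after `episodes` multiplicative decays, clamped at the floor."""
    return max(EPSILON_END, EPSILON_START * EPSILON_DECAY ** episodes)


def episodes_to_floor() -> int:
    """Smallest episode count at which the decayed epsilon reaches EPSILON_END."""
    return math.ceil(math.log(EPSILON_END / EPSILON_START) / math.log(EPSILON_DECAY))


# A fresh state starts at the optimistic value 0.5; a hypothetical step cost
# of -1.0 pulls the estimate down toward the TD target.
new_q = td_update(old_q=0.5, reward=-1.0, best_next_q=0.5, done=False)
```

With these constants epsilon reaches its 0.05 floor after roughly 6,000 episodes, so about the last quarter of the default 8,000-episode run explores at the minimum rate.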
|
qtable.json
ADDED
|
The diff for this file is too large to render.
See raw diff
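Since the file itself is not rendered here, the key format can be illustrated from `QTable.save()` and `QTable.load()` in `qlearning.py`. A minimal sketch of the round trip for one key follows, including the legacy flat form that `load()` still accepts. The helper names `key_to_json` and `key_from_json` are hypothetical.

```python
import json
from typing import Tuple

State = Tuple[int, int, Tuple[Tuple[int, int], ...]]


def key_to_json(state: State) -> str:
    """Encode (rx, ry, garbage) as the JSON string used for Q-table keys."""
    rx, ry, garbage = state
    return json.dumps([rx, ry, [list(g) for g in garbage]])


def key_from_json(key: str) -> State:
    """Decode a key, accepting both the nested and the legacy flat garbage form."""
    rx, ry, raw_garbage = json.loads(key)
    if raw_garbage and isinstance(raw_garbage[0], int):
        # Legacy flat form [gx, gy]: a single garbage piece.
        garbage = ((int(raw_garbage[0]), int(raw_garbage[1])),)
    else:
        # Current form [[gx1, gy1], ...]; an empty list yields an empty tuple.
        garbage = tuple((int(g[0]), int(g[1])) for g in raw_garbage)
    return (int(rx), int(ry), garbage)


state = (2, 3, ((1, 1), (4, 4)))
encoded = key_to_json(state)  # '[2, 3, [[1, 1], [4, 4]]]'
```

Each value stored under such a key is the list of five Q-values, one per entry in `ACTIONS`.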
|
|
|
requirements.txt
ADDED
|
@@ -0,0 +1,19 @@
+# Core server dependencies
+fastapi>=0.110.0
+uvicorn[standard]>=0.29.0
+pydantic>=2.0.0
+
+# HTTP client (used by inference.py and test_env.py)
+requests>=2.31.0
+openai>=1.0.0
+
+# ── Optional: only needed if running the fine-tuned LLM locally ──────────
+# Comment these out if your Space has no GPU runtime.
+torch>=2.1.0
+transformers>=4.40.0
+accelerate>=0.27.0
+bitsandbytes>=0.43.0
+
+# ── Optional: only needed for Q-learning training ─────────────────────────
+# (training is done offline; the saved qtable.json is loaded at runtime)
+# No extra deps required; qlearning.py uses stdlib only.
|
rl_trajectories.jsonl
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
run_pipeline.sh
ADDED
|
@@ -0,0 +1,17 @@
+#!/bin/bash
+
+# Activate the existing virtual environment where dependencies are being installed
+source venv/bin/activate
+
+echo "Ensuring pip dependencies are installed and PyTorch is active..."
+# Install the remaining dependencies; pip skips packages that are already installed.
+pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" datasets trl peft transformers bitsandbytes --no-cache-dir
+
+echo "======================================"
+echo " Starting Unsloth LoRA Fine-Tuning... "
+echo "======================================"
+
+# Run the Python script, writing both stdout and stderr to train_output.log
+python train_unsloth.py > train_output.log 2>&1
+
+echo "Process completed. Check train_output.log for details."
|
scenarios.py
ADDED
|
@@ -0,0 +1,50 @@
+from typing import Tuple, List, Dict, Any
+
+SCENARIOS: Dict[str, Dict[str, Any]] = {
+    "task_easy": {
+        "grid_size": (5, 5),
+        "robot_start": (0, 0),
+        "garbage_starts": [(4, 4)],
+        "obstacle_starts": [],
+        "max_battery": 30,
+        # ── Resource management ────────────────────────────────
+        # Home (charging station) is the robot's spawn point.
+        "home_position": (0, 0),
+        # Unload station sits in the corner at the far end of home's edge.
+        "unload_station": (4, 0),
+        # 1 garbage piece; capacity 6 exceeds the garbage count, so a full
+        # load never forces an unload cycle on this task.
+        "storage_capacity": 6,
+    },
+    "task_medium": {
+        "grid_size": (7, 7),
+        "robot_start": (3, 3),
+        "garbage_starts": [(1, 1), (5, 5), (1, 5)],
+        "obstacle_starts": [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)],
+        "max_battery": 50,
+        # ── Resource management ────────────────────────────────
+        "home_position": (3, 3),
+        # Far corner from centre home; no obstacles there.
+        "unload_station": (6, 0),
+        # Capacity 6 exceeds the 3 garbage pieces, so a full load never forces an unload cycle.
+        "storage_capacity": 6,
+    },
+    "task_hard": {
+        "grid_size": (10, 10),
+        "robot_start": (0, 0),
+        "garbage_starts": [(8, 8), (9, 1), (1, 9), (5, 5), (8, 2)],
+        "obstacle_starts": [
+            (1, 1), (1, 2), (1, 3), (1, 4),
+            (3, 1), (3, 2), (3, 3), (3, 4),
+            (6, 5), (6, 6), (6, 7), (6, 8),  # shifted so (5,5) stays clear for garbage
+            (7, 7), (7, 8), (7, 9),
+        ],
+        "max_battery": 80,
+        # ── Resource management ────────────────────────────────
+        "home_position": (0, 0),
+        # Bottom-right corner, clear of all obstacles.
+        "unload_station": (9, 0),
+        # Capacity 6 exceeds the 5 garbage pieces, so a full load never forces an unload cycle.
+        "storage_capacity": 6,
+    },
+}
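How many unload cycles a given `storage_capacity` forces is simple arithmetic. A minimal sketch follows, assuming the robot must unload each time its load reaches `storage_capacity`. The rule in `environment.py` is not shown in this section, so that behaviour is an assumption, and `forced_unloads` is a hypothetical helper. The garbage counts and capacities are copied from `SCENARIOS`.

```python
# Garbage counts and storage_capacity values copied from SCENARIOS.
TASKS = {
    "task_easy": {"garbage": 1, "storage_capacity": 6},
    "task_medium": {"garbage": 3, "storage_capacity": 6},
    "task_hard": {"garbage": 5, "storage_capacity": 6},
}


def forced_unloads(n_garbage: int, capacity: int) -> int:
    """Number of times the load hits capacity while collecting n_garbage pieces.

    Assumes (hypothetically) that the robot has to unload whenever
    current_storage_load reaches storage_capacity.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return n_garbage // capacity


cycles = {name: forced_unloads(t["garbage"], t["storage_capacity"])
          for name, t in TASKS.items()}
```

Under that assumption a capacity of 6 never fills on any of the three tasks, whereas a capacity of 2 would force one cycle on `task_medium` and two on `task_hard`.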
|
server.log
ADDED
|
Binary file (92.3 kB). View file
|
|
|
server.pid
ADDED
|
@@ -0,0 +1 @@
+61938
|
test_env.py
ADDED
|
@@ -0,0 +1,19 @@
+import requests
+
+ENV_URL = "http://localhost:7860"
+
+print("Resetting task_easy...")
+res = requests.post(f"{ENV_URL}/reset", json={"task_id": "task_easy"})
+print("Observation:", res.json()["observation"])
+
+print("\nStepping UP...")
+res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+print("Result:", res.json())
+
+print("\nStepping UP...")
+res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+print("Result:", res.json())
+
+print("\nGrading...")
+res = requests.get(f"{ENV_URL}/grade/task_easy")
+print("Grade:", res.json())
|
train.pid
ADDED
|
@@ -0,0 +1 @@
+382219
|
train_output.log
ADDED
|
@@ -0,0 +1,4 @@
+Traceback (most recent call last):
+  File "/home/robotics-mu/Downloads/Meta Hackathon/train_unsloth.py", line 13, in <module>
+    from datasets import Dataset
+ModuleNotFoundError: No module named 'datasets'
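The run stopped at the first import because `datasets` was not installed in the interpreter that executed the script. A minimal sketch of a pre-flight check that reports every missing package at once is shown below. `missing_modules` is a hypothetical helper, and the `REQUIRED` list is taken from the imports visible in `train_unsloth.py`.

```python
import importlib.util
from typing import Iterable, List

# Top-level packages imported by train_unsloth.py.
REQUIRED = ["datasets", "unsloth", "trl", "transformers"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight_message(names: Iterable[str]) -> str:
    """Build a one-line report; an empty string means everything is importable."""
    missing = missing_modules(names)
    if not missing:
        return ""
    return "Missing packages: " + ", ".join(missing)
```

Printing `preflight_message(REQUIRED)` before the heavy imports would list all absent packages in one pass, instead of failing on them one at a time.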
|
train_unsloth.py
ADDED
|
@@ -0,0 +1,144 @@
| 1 |
+
"""
|
| 2 |
+
Fine-tuning Llama-3.2-3B-Instruct with Unsloth for the Garbage Collecting Robot.
|
| 3 |
+
|
| 4 |
+
Training data: fixed_dataset.jsonl (generated by code2.py + fixer.py)
|
| 5 |
+
Format: {"user": "### Instruction:\n...\n\n### Input:\nENVIRONMENT STATUS:\n...", "assistant": "UP|DOWN|LEFT|RIGHT|COLLECT"}
|
| 6 |
+
|
| 7 |
+
Base model: unsloth/llama-3.2-3b-instruct-bnb-4bit (same as Unsloth Studio run)
|
| 8 |
+
Export: lora_garbage_robot/ (LoRA adapter)
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
import os
|
| 12 |
+
import json
|
| 13 |
+
from datasets import Dataset
|
| 14 |
+
|
| 15 |
+
max_seq_length = 512 # Prompts are short; 512 is well above the longest sample
|
| 16 |
+
dtype = None # Auto-detect (float16 on T4, bfloat16 on Ampere+)
|
| 17 |
+
load_in_4bit = True
|
| 18 |
+
|
| 19 |
+
# ── Alpaca prompt — MUST match fixed_dataset.jsonl / code2.py / app.py ──────
|
| 20 |
+
ALPACA_TEMPLATE = (
|
| 21 |
+
"### Instruction:\n{instruction}\n\n"
|
| 22 |
+
"### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
|
| 23 |
+
"### Response:\n{response}"
|
| 24 |
+
)
|
| 25 |
+
|
| 26 |
+
INSTRUCTION = (
|
| 27 |
+
"You are an AI brain controlling a garbage collecting robot.\n"
|
| 28 |
+
"Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
|
| 29 |
+
)
|
| 30 |
+
|
| 31 |
+
EOS_TOKEN = None # filled in after tokenizer loads
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
def load_fixed_dataset(path: str = "fixed_dataset.jsonl") -> Dataset:
|
| 35 |
+
"""
|
| 36 |
+
Load fixed_dataset.jsonl produced by fixer.py.
|
| 37 |
+
Each row: {"user": "<### Instruction:...### Input:...>", "assistant": "<ACTION>"}
|
| 38 |
+
We re-format into the full Alpaca text so the model sees input + target in one string.
|
| 39 |
+
"""
|
| 40 |
+
rows = []
|
| 41 |
+
with open(path, "r") as f:
|
| 42 |
+
for line in f:
|
| 43 |
+
row = json.loads(line)
|
| 44 |
+
user_text = row["user"] # already contains ### Instruction + ### Input
|
| 45 |
+
assistant = row["assistant"] # e.g. "RIGHT"
|
| 46 |
+
|
| 47 |
+
# Extract the environment status message from the user field
|
| 48 |
+
try:
|
| 49 |
+
env_status = user_text.split("ENVIRONMENT STATUS:\n")[1].strip()
|
| 50 |
+
except IndexError:
|
| 51 |
+
continue # skip malformed rows
|
| 52 |
+
|
| 53 |
+
text = ALPACA_TEMPLATE.format(
|
| 54 |
+
instruction=INSTRUCTION,
|
| 55 |
+
input=env_status,
|
| 56 |
+
response=assistant,
|
| 57 |
+
) + (EOS_TOKEN or "")
|
| 58 |
+
rows.append({"text": text})
|
| 59 |
+
|
| 60 |
+
print(f"[Dataset] Loaded {len(rows):,} samples from {path}")
|
| 61 |
+
return Dataset.from_list(rows)
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
def main():
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments

    global EOS_TOKEN

    print("=" * 60)
    print(" Fine-tuning Llama-3.2-3B-Instruct — Garbage Robot")
    print("=" * 60)

    # ── 1. Load base model (same as Unsloth Studio session) ──────────────────
    print("\n[1/4] Loading base model …")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/llama-3.2-3b-instruct-bnb-4bit",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    EOS_TOKEN = tokenizer.eos_token  # fill in for dataset formatting

    # ── 2. Add LoRA adapters ─────────────────────────────────────────────────
    print("[2/4] Attaching LoRA adapters …")
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 16,
        lora_dropout = 0,
        bias = "none",
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        use_rslora = False,
        loftq_config = None,
    )

    # ── 3. Load dataset ──────────────────────────────────────────────────────
    print("[3/4] Loading fixed_dataset.jsonl …")
    dataset = load_fixed_dataset("fixed_dataset.jsonl")

    # ── 4. Train ─────────────────────────────────────────────────────────────
    print("[4/4] Starting fine-tuning …")
    # is_bfloat16_supported is exported at the top level of unsloth;
    # it is not an attribute of FastLanguageModel.
    from unsloth import is_bfloat16_supported

    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = dataset,
        dataset_text_field = "text",
        max_seq_length = max_seq_length,
        dataset_num_proc = 2,
        packing = True,  # efficient for short sequences
        args = TrainingArguments(
            per_device_train_batch_size = 4,
            gradient_accumulation_steps = 4,
            warmup_ratio = 0.03,
            num_train_epochs = 1,
            learning_rate = 2e-4,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            logging_steps = 10,
            optim = "adamw_8bit",
            weight_decay = 0.01,
            lr_scheduler_type = "cosine",
            seed = 3407,
            output_dir = "outputs",
            save_strategy = "epoch",
        ),
    )

    trainer_stats = trainer.train()
    print(f"\nTraining complete. Loss: {trainer_stats.training_loss:.4f}")

    # ── Save LoRA adapter ────────────────────────────────────────────────────
    model.save_pretrained("lora_garbage_robot")
    tokenizer.save_pretrained("lora_garbage_robot")
    print("\nLoRA adapter saved to: lora_garbage_robot/")
    print("To export a merged model, use Unsloth Studio → Export → Merged Model.")


if __name__ == "__main__":
    main()
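The row handling in `load_fixed_dataset` can be tried in isolation, without Unsloth or a tokenizer. The sketch below copies the template and instruction strings from the script; the `format_row` helper, the sample row, and the `"</s>"` end-of-sequence string are illustrative assumptions and are not taken from the repository.

```python
import json

ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
    "### Response:\n{response}"
)
INSTRUCTION = (
    "You are an AI brain controlling a garbage collecting robot.\n"
    "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
)


def format_row(line, eos_token=""):
    """Return the training text for one JSONL line, or None if malformed."""
    row = json.loads(line)
    parts = row["user"].split("ENVIRONMENT STATUS:\n")
    if len(parts) < 2:
        return None  # no status marker: skipped, like the IndexError branch
    env_status = parts[1].strip()
    return ALPACA_TEMPLATE.format(
        instruction=INSTRUCTION, input=env_status, response=row["assistant"]
    ) + eos_token


# Hypothetical row; the real contents of fixed_dataset.jsonl are not shown here.
sample = json.dumps({
    "user": (
        "### Instruction:\nMove the robot.\n\n"
        "### Input:\nENVIRONMENT STATUS:\nRobot at (0, 0). Garbage at (0, 1)."
    ),
    "assistant": "RIGHT",
})

text = format_row(sample, eos_token="</s>")  # "</s>" is a placeholder EOS
malformed = format_row(json.dumps({"user": "no marker", "assistant": "UP"}))

print(text)
print(malformed)
```

Everything before the `ENVIRONMENT STATUS:` marker in the row, including the row's own instruction text, is discarded and replaced by `INSTRUCTION`, so every training sample carries the same instruction regardless of what `fixer.py` wrote.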
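The values passed to `TrainingArguments` and `get_peft_model` imply an effective batch size and a LoRA scaling factor that are easy to work out by hand. A minimal sketch, assuming a single-GPU run and a hypothetical count of 1,000 packed sequences (the real number depends on the dataset and on how packing groups the samples):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1              # assumption: single-GPU run
num_packed_sequences = 1000  # hypothetical; not taken from the repository

# Sequences consumed per optimizer step.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate optimizer steps in one epoch; the Trainer rounds this value,
# and how it treats a trailing partial accumulation varies by version.
approx_steps = num_packed_sequences / effective_batch

# Standard LoRA (use_rslora=False) scales the adapter update by lora_alpha / r.
lora_alpha, r = 16, 16
lora_scale = lora_alpha / r

print(effective_batch, approx_steps, lora_scale)
```

With `lora_alpha` equal to `r`, the adapter update is applied at a scale of 1; setting `use_rslora = True` would instead divide by the square root of `r`.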