Commit a8d4cdf by Mihir Mithani · 0 parents

Sync Hub-enabled code to Space (no weights)

.dockerignore ADDED
@@ -0,0 +1,7 @@
+ Meta/
+ Robot/
+ .venv/
+ __pycache__/
+ *.pyc
+ .git/
+ .env
.env.example ADDED
@@ -0,0 +1,4 @@
+ API_BASE_URL=https://api.openai.com/v1
+ MODEL_NAME=gpt-4o-mini
+ HF_TOKEN=your_hf_or_api_key_here
+ ENV_URL=http://localhost:7860
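`.env.example` documents four settings; the code that reads them is not part of this diff. A minimal sketch, assuming plain `KEY=VALUE` lines with `#` comments and no quoting or variable expansion, of loading them so that real environment variables win over the file and the file wins over the example defaults. `parse_env_text` and `load_settings` are hypothetical helpers, not python-dotenv.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from .env.example; the HF_TOKEN value is only a placeholder.
DEFAULTS: Dict[str, str] = {
    "API_BASE_URL": "https://api.openai.com/v1",
    "MODEL_NAME": "gpt-4o-mini",
    "HF_TOKEN": "your_hf_or_api_key_here",
    "ENV_URL": "http://localhost:7860",
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper, not a full dotenv parser)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and lines without '='
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str = "", environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Precedence: process environment > .env text > DEFAULTS."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    settings.update(parse_env_text(text))
    for key in DEFAULTS:
        if key in environ:
            settings[key] = environ[key]
    return settings
```

For example, `load_settings(open(".env").read())["ENV_URL"]` would give the server address a client should talk to, unless `ENV_URL` is already exported in the shell.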
.gitattributes ADDED
@@ -0,0 +1 @@
+ model/*.safetensors filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,12 @@
+ .venv/
+ venv/
+ model/
+ Meta/
+ Robot/
+ __pycache__/
+ *.pyc
+ .env
+ uv.lock
+ test_run.log
+ hf_test.log
+ hf_test2.log
.gitmodules ADDED
@@ -0,0 +1,3 @@
+ [submodule "OpenEnv"]
+ 	path = OpenEnv
+ 	url = "https://github.com/techavenger123/OpenEnv#"
DockerFile ADDED
@@ -0,0 +1,16 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install Python dependencies first (cached layer)
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy all project files
+ COPY . .
+
+ # HuggingFace Spaces requires port 7860
+ EXPOSE 7860
+
+ # Launch FastAPI server
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
OpenEnv ADDED
@@ -0,0 +1 @@
+ Subproject commit c719decf2b19175d5ca35301d58a14c83e985480
README.md ADDED
@@ -0,0 +1,115 @@
+ ---
+ title: GarbageBot — RL Control Center
+ emoji: 🗑️
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ tags:
+   - openenv
+   - robotics
+   - reinforcement-learning
+   - llama-3.2
+ ---
+
+ # 🤖 Garbage Collecting Robot — OpenEnv
+
+ An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent must navigate a grid room to pick up garbage while managing battery constraints and storage capacity.
+
+ ## Why Garbage Collection?
+
+ Autonomous garbage collection is a classic robotics challenge that combines pathfinding, resource management (battery), and state management (storage capacity). This environment provides a realistic training ground for AI agents to learn:
+ - **Optimal Navigation**: shortest paths via BFS and Q-Learning.
+ - **Resource Management**: returning to base to recharge before the battery is depleted.
+ - **Logistics**: managing a 6-unit storage bin and prioritizing unload cycles.
+
+ ---
+
+ ## Architecture
+
+ The environment is a discrete grid world in which the robot interacts with garbage, obstacles, a charging station (Home), and an Unload Station.
+
+ ```
+ ┌─────────────┐
+ │  Dashboard  │ (FastAPI + Vanilla JS)
+ └──────┬──────┘
+        │
+ ┌──────┴──────┐
+ │     API     │ (app.py)
+ └──────┬──────┘
+        │
+ ┌──────┴──────┐
+ │  Env Logic  │ (environment.py)
+ └─────────────┘
+ ```
+
+ ---
+
+ ## Tasks
+
+ | Task ID | Difficulty | Description | Grid Size |
+ |---------|------------|-------------|-----------|
+ | `task_easy` | 🟢 Easy | Small grid, 1 piece of garbage. | 5x5 |
+ | `task_medium` | 🟡 Medium | Grid with obstacles, 3 pieces of garbage, limited battery. | 7x7 |
+ | `task_hard` | 🔴 Hard | Maze, 5 pieces of garbage, strict battery budget. | 10x10 |
+
+ ---
+
+ ## Action Space
+
+ Movement and interaction commands:
+ - `UP`, `DOWN`, `LEFT`, `RIGHT`: move the robot one cell (`UP` is +y, `RIGHT` is +x).
+ - `COLLECT`: pick up the garbage on the robot's current cell.
+
+ ---
+
+ ## Observation Space
+
+ Each observation includes the fields below, plus `grid_size`, `obstacle_positions`, `inventory_count`, `home_position`, `unload_station`, `distance_from_home`, and a natural-language `message`:
+ - `robot_position`: `(x, y)`
+ - `garbage_positions`: list of `(x, y)`
+ - `battery_level`: remaining battery charge (the `message` shows it against the maximum).
+ - `current_storage_load`: items currently in the bin, out of `storage_capacity` (6 by default).
+ - `robot_mode`: `normal`, `recharging`, or `unloading`.
+
+ ---
+
+ ## Policy Priority Chain
+
+ Decisions can be driven by (in priority order):
+ 1. **Q-Learning Table**: pre-trained tabular policy (`qtable.json`).
+ 2. **Llama-3.2-3B-Instruct**: fine-tuned LLM policy.
+ 3. **BFS Heuristic**: reliable fallback pathfinding.
+
+ ---
+
+ ## Local Development
+
+ ```bash
+ # 1. Install dependencies
+ pip install -r requirements.txt
+
+ # 2. Start the server
+ uvicorn app:app --host 0.0.0.0 --port 7860
+
+ # 3. Train the Q-learning policy
+ python qlearning.py --train --episodes 10000
+ ```
+
+ ---
+
+ ## Project Structure
+
+ ```
+ ├── app.py           # FastAPI server
+ ├── environment.py   # Core RL logic
+ ├── models.py        # Data schemas
+ ├── scenarios.py     # Task definitions
+ ├── qlearning.py     # Tabular RL training
+ ├── inference.py     # Policy resolver
+ ├── frontend/        # Dashboard HTML/CSS/JS
+ ├── qtable.json      # Trained Q-table
+ ├── Dockerfile       # Deployment container
+ └── README.md        # This file
+ ```
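The README's policy priority chain can be sketched as follows, assuming each decision source is a callable that returns an action name or `None` to defer to the next one. `resolve_action`, `bfs_policy`, and the lambda stand-ins are hypothetical; this is not the code in `inference.py`, which is not part of this diff. The BFS step follows the same direction convention as `_bfs` in `environment.py` (`UP` is +y) but searches toward the nearest of several garbage cells at once.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")
# Same direction convention as environment.py: UP is +y, RIGHT is +x.
MOVES = {"RIGHT": (1, 0), "LEFT": (-1, 0), "UP": (0, 1), "DOWN": (0, -1)}

# A policy maps an observation dict to an action name, or None to defer.
Policy = Callable[[dict], Optional[str]]


def resolve_action(obs: dict, chain: List[Tuple[str, Policy]]) -> Tuple[str, str]:
    """Return (action, source) from the first policy that yields a valid action."""
    for source, policy in chain:
        try:
            action = policy(obs)
        except Exception:
            action = None  # a failing source is treated like one that deferred
        if action in VALID_ACTIONS:
            return action, source
    raise RuntimeError("no policy produced a valid action")


def bfs_policy(obs: dict) -> Optional[str]:
    """COLLECT on garbage, else the first step of a shortest path to the nearest garbage."""
    start = tuple(obs["robot_position"])
    targets = {tuple(g) for g in obs["garbage_positions"]}
    if start in targets:
        return "COLLECT"
    width, height = obs["grid_size"]
    blocked = {tuple(o) for o in obs.get("obstacle_positions", [])}
    queue = deque([(start, None)])  # (cell, first move on the path to it)
    seen = {start}
    while queue:
        pos, first = queue.popleft()
        for name, (dx, dy) in MOVES.items():
            nxt = (pos[0] + dx, pos[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked or nxt in seen:
                continue
            step = first or name
            if nxt in targets:
                return step
            seen.add(nxt)
            queue.append((nxt, step))
    return None  # no reachable garbage: defer to the next source in the chain


# Usage: the Q-table defers and the LLM output is malformed, so BFS decides.
obs = {"robot_position": (0, 0), "garbage_positions": [(2, 0)],
       "grid_size": (3, 3), "obstacle_positions": [(1, 0)]}
chain = [
    ("qtable", lambda o: None),  # e.g. state missing from the table
    ("llm", lambda o: "JUMP"),   # rejected: not a valid action
    ("bfs", bfs_policy),
]
print(resolve_action(obs, chain))  # ('UP', 'bfs'): RIGHT is blocked by the obstacle
```

Putting the cheapest, most reliable source last means the robot always gets some action, while the earlier sources are only trusted when they return something valid.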
app.py ADDED
@@ -0,0 +1,253 @@
+ """
+ FastAPI server for the Garbage Collecting Robot OpenEnv environment.
+ Exposes reset / step / state / tasks / grade / policy / configure endpoints.
+
+ Fix applied:
+ - /policy BFS fallback now uses env.get_observation().dict() instead of
+   a hand-built incomplete dict (which was missing robot_mode, home_position,
+   unload_station, current_storage_load, storage_capacity, distance_from_home).
+ - Static files and /ui route added so the HTML dashboard is served from the
+   same origin — required for HuggingFace Spaces deployment.
+ """
+
+ import os
+ import sys
+ sys.path.insert(0, os.path.dirname(__file__))
+
+ from typing import List
+ from pydantic import BaseModel
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.staticfiles import StaticFiles
+ from fastapi.responses import FileResponse
+
+ from environment import GarbageRobotEnv
+ from models import (
+     Action, StepOutput, ResetInput, ResetOutput, CustomResetInput, State, Task,
+ )
+
+ app = FastAPI(
+     title="Garbage Collecting Robot — OpenEnv",
+     description=(
+         "An OpenEnv-compliant robotics environment for garbage collection. "
+         "AI agents must navigate a grid room to pick up garbage while managing battery constraints."
+     ),
+     version="1.0.0",
+ )
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ env = GarbageRobotEnv()
+
+ TASKS = [
+     Task(
+         id="task_easy",
+         name="Small Room Clean",
+         description="Navigate a small 5x5 grid to collect 1 piece of garbage.",
+         difficulty="easy",
+         reward_range=[0.0, 1.0],
+     ),
+     Task(
+         id="task_medium",
+         name="Medium Room with Obstacles",
+         description="Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery.",
+         difficulty="medium",
+         reward_range=[0.0, 1.0],
+     ),
+     Task(
+         id="task_hard",
+         name="Large Maze Cleanup",
+         description="Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage.",
+         difficulty="hard",
+         reward_range=[0.0, 1.0],
+     ),
+ ]
+
+ VALID_IDS = {t.id for t in TASKS}
+
+ @app.get("/", tags=["health"])
+ def health():
+     return {"status": "ok", "env": "garbage-collecting-robot"}
+
+ @app.post("/reset", response_model=ResetOutput, tags=["openenv"])
+ def reset(body: ResetInput = ResetInput()):
+     if body.task_id not in VALID_IDS:
+         raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+     env.reset(task_id=body.task_id)
+     return {"observation": env.get_observation().dict()}
+
+ @app.post("/reset_custom", response_model=ResetOutput, tags=["openenv"])
+ def reset_custom(body: CustomResetInput):
+     """
+     Dynamic reset endpoint. Lets callers specify garbage positions,
+     obstacle positions, robot start, grid size and battery at runtime.
+     Any omitted field falls back to the base scenario's value.
+     """
+     env.reset_custom(
+         task_id=body.task_id,
+         grid_size=body.grid_size,
+         robot_start=body.robot_start,
+         garbage_positions=body.garbage_positions,
+         obstacle_positions=body.obstacle_positions,
+         max_battery=body.max_battery,
+         storage_capacity=body.storage_capacity,
+         home_position=body.home_position,
+         unload_station=body.unload_station,
+     )
+     return {"observation": env.get_observation().dict()}
+
+ @app.post("/step", response_model=StepOutput, tags=["openenv"])
+ def step(body: Action):
+     result = env.step(command=body.command)
+     return result
+
+ @app.get("/state", response_model=State, tags=["openenv"])
+ def state():
+     return env.state()
+
+ @app.get("/tasks", response_model=list[Task], tags=["openenv"])
+ def tasks():
+     return TASKS
+
+ @app.get("/grade/{task_id}", tags=["grading"])
+ def grade(task_id: str):
+     if task_id not in VALID_IDS:
+         raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+     score = env.grade(task_id)
+     return {"task_id": task_id, "score": score, "reward_range": [0.0, 1.0]}
+
+
+ # ── Policy endpoint (fine-tuned LLM) ──────────────────────────────────────
+
+ LOCAL_MODEL_PATH = os.environ.get(
+     "LOCAL_MODEL_PATH",
+     "TechAvenger/GarbageBot-Weights"
+ )
+
+ _policy_model = None
+ _policy_tokenizer = None
+ _policy_loaded = False
+
+ def _load_policy():
+     global _policy_model, _policy_tokenizer, _policy_loaded
+     if _policy_loaded:
+         return
+     try:
+         from transformers import AutoModelForCausalLM, AutoTokenizer
+         import torch
+         _policy_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
+         _policy_model = AutoModelForCausalLM.from_pretrained(
+             LOCAL_MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
+         )
+         _policy_model.eval()
+         print(f"[Policy] Fine-tuned model loaded from {LOCAL_MODEL_PATH}")
+     except Exception as e:
+         print(f"[Policy] Model unavailable: {e}")
+     _policy_loaded = True
+
+
+ class PolicyInput(BaseModel):
+     message: str  # the obs.message string from the environment
+
+ @app.post("/policy", tags=["openenv"])
+ def policy(body: PolicyInput):
+     """
+     Ask the fine-tuned LLM for the next action.
+     Returns {"action": "UP|DOWN|LEFT|RIGHT|COLLECT", "source": "llm|bfs"}
+     """
+     _load_policy()
+     VALID = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+
+     if _policy_model is not None and _policy_tokenizer is not None:
+         try:
+             import torch
+             instruction = (
+                 "You are an AI brain controlling a garbage collecting robot.\n"
+                 "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+             )
+             prompt = (
+                 f"### Instruction:\n{instruction}\n\n"
+                 f"### Input:\nENVIRONMENT STATUS:\n{body.message}\n\n"
+                 f"### Response:\n"
+             )
+             inputs = _policy_tokenizer(
+                 prompt, return_tensors="pt", truncation=True, max_length=512
+             ).to(_policy_model.device)
+             with torch.no_grad():
+                 outputs = _policy_model.generate(
+                     **inputs, max_new_tokens=6, do_sample=False,
+                     pad_token_id=_policy_tokenizer.eos_token_id
+                 )
+             new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+             raw = _policy_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
+             for v in VALID:
+                 if v in raw:
+                     return {"action": v, "source": "llm", "raw": raw}
+         except Exception as e:
+             print(f"[Policy] Inference error: {e}")
+
+     # FIX: use env.get_observation().dict() so heuristic_action() receives
+     # all required fields (robot_mode, home_position, unload_station, etc.)
+     # instead of the previous hand-built incomplete dict.
+     from inference import heuristic_action
+     obs_dict = env.get_observation().dict()
+     obs_dict["message"] = body.message  # use the caller's message for context
+     return {"action": heuristic_action(obs_dict), "source": "bfs"}
+
+
+ # ── Dynamic garbage placement ──────────────────────────────────────────────
+
+ class ConfigureInput(BaseModel):
+     task_id: str = "task_easy"
+     garbage_positions: List[List[int]]  # [[x,y], ...]
+
+ @app.post("/configure", tags=["openenv"])
+ def configure(body: ConfigureInput):
+     """
+     Reset the environment for task_id, then override garbage positions
+     with whatever the caller supplies.
+     """
+     if body.task_id not in VALID_IDS:
+         raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+
+     env.reset(task_id=body.task_id)
+
+     validated = []
+     for pos in body.garbage_positions:
+         if len(pos) != 2:
+             raise HTTPException(400, f"Each position must be [x, y], got {pos}")
+         x, y = pos
+         gw, gh = env.grid_size
+         if not (0 <= x < gw and 0 <= y < gh):
+             raise HTTPException(400, f"Position {pos} out of bounds for grid {env.grid_size}")
+         if [x, y] in env.obstacle_positions:
+             raise HTTPException(400, f"Position {pos} is an obstacle")
+         validated.append([x, y])
+
+     env.garbage_positions = validated
+
+     return {"observation": env.get_observation().dict()}
+
+
+ # ── Serve HTML dashboard ───────────────────────────────────────────────────
+ # This makes the frontend accessible at /ui on the same origin as the API,
+ # which is required for HuggingFace Spaces (no localhost cross-origin issues).
+
+ @app.get("/ui", include_in_schema=False)
+ def ui():
+     """Serve the dashboard HTML."""
+     return FileResponse("frontend/index.html")
+
+ # Mount static assets (style.css, script.js) at /static
+ if os.path.exists("frontend/style.css") or os.path.exists("frontend/script.js"):
+     app.mount("/static", StaticFiles(directory="frontend"), name="static")
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
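In `/policy`, the loop `for v in VALID: if v in raw` returns the first entry of `VALID` that occurs anywhere in the decoded text, so an output such as `RIGHT, NOT UP` yields `UP`, and `UP` also matches inside a word like `UPDATE`. A minimal sketch of a stricter parser, assuming the earliest whole-word action in the model output is the intended one; `parse_action` is a hypothetical helper, not part of `app.py`.

```python
import re
from typing import Optional

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# \b anchors give whole-word matches, so "UPDATE" does not count as "UP".
_ACTION_RE = re.compile(r"\b(" + "|".join(VALID_ACTIONS) + r")\b")


def parse_action(raw: str) -> Optional[str]:
    """Return the earliest valid action word in the model output, or None."""
    match = _ACTION_RE.search(raw.upper())  # search() finds the leftmost match
    return match.group(1) if match else None
```

Inside `policy()`, `action = parse_action(raw)` could replace the loop: return the LLM result only when `action` is not `None`, and otherwise fall through to the BFS heuristic as before.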
code.py ADDED
@@ -0,0 +1,78 @@
+ """
+ code.py — Seed training data generator for the Garbage Collecting Robot.
+
+ Fix applied:
+   - All trajectory entries now use the unified {"text": "..."} Alpaca format.
+   - Previously the first entry used {"text": ...} while all others used
+     {"obs": ..., "action": ...}, causing fixer.py to silently skip them
+     (KeyError on the missing "text" key).
+ """
+
+ import json
+
+ INSTRUCTION = (
+     "You control a garbage collecting robot. "
+     "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT"
+ )
+
+ def alpaca(obs: str, action: str) -> dict:
+     """Wrap an obs/action pair into the Alpaca fine-tuning format."""
+     return {
+         "text": (
+             f"### Instruction:\n{INSTRUCTION}\n\n"
+             f"### Input:\nENVIRONMENT STATUS:\n{obs}\n\n"
+             f"### Response:\n{action}"
+         )
+     }
+
+
+ trajectories = [
+     # --- task_easy: straight-line approach ---
+     alpaca("You are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.", "RIGHT"),
+     alpaca("You are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.", "RIGHT"),
+     alpaca("You are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.", "RIGHT"),
+     alpaca("You are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.", "RIGHT"),
+     alpaca("You are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.", "UP"),
+     alpaca("You are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.", "UP"),
+     alpaca("You are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.", "UP"),
+     alpaca("You are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.", "UP"),
+     alpaca("You are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.", "COLLECT"),
+
+     # --- task_medium: obstacle avoidance ---
+     alpaca("You are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.", "UP"),
+     alpaca("You are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).", "LEFT"),
+     alpaca("You are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.", "UP"),
+     alpaca("You are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.", "LEFT"),
+     alpaca("You are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.", "COLLECT"),
+     alpaca("You are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.", "RIGHT"),
+     alpaca("You are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.", "RIGHT"),
+     alpaca("You are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.", "RIGHT"),
+     alpaca("You are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.", "RIGHT"),
+     alpaca("You are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.", "COLLECT"),
+     alpaca("You are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.", "LEFT"),
+     alpaca("You are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).", "LEFT"),
+     alpaca("You are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+     alpaca("You are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+     alpaca("You are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.", "DOWN"),
+     alpaca("You are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.", "DOWN"),
+     alpaca("You are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.", "DOWN"),
+     alpaca("You are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.", "DOWN"),
+     alpaca("You are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.", "COLLECT"),
+
+     # --- low battery urgency ---
+     alpaca("You are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+     alpaca("You are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+     alpaca("You are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.", "UP"),
+     alpaca("You are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.", "UP"),
+     alpaca("You are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.", "COLLECT"),
+
+     # --- do not collect when not on garbage ---
+     alpaca("You are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.", "RIGHT"),
+     alpaca("You are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.", "RIGHT"),
+ ]
+
+ with open("garbage_robot_dataset.jsonl", "w") as f:
+     for row in trajectories:
+         f.write(json.dumps(row) + "\n")
+
+ print(f"Wrote {len(trajectories)} samples to garbage_robot_dataset.jsonl")
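Because the docstring of `code.py` notes that `fixer.py` silently skipped rows without a `"text"` key, a quick check of the generated JSONL can catch format drift before fine-tuning. A minimal sketch, assuming every row follows the `### Instruction` / `### Input` / `### Response` layout produced by `alpaca()` and ends with a single action word; `check_rows` is a hypothetical helper, not part of this repository.

```python
import json
from typing import Iterable, List, Tuple

VALID_ACTIONS = {"UP", "DOWN", "LEFT", "RIGHT", "COLLECT"}
RESPONSE_MARKER = "### Response:\n"


def check_rows(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for rows that would break training."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        text = row.get("text") if isinstance(row, dict) else None
        if not isinstance(text, str):
            problems.append((lineno, 'missing "text" key'))  # the old {"obs", "action"} shape
            continue
        if RESPONSE_MARKER not in text:
            problems.append((lineno, "no response section"))
            continue
        action = text.rsplit(RESPONSE_MARKER, 1)[1].strip()
        if action not in VALID_ACTIONS:
            problems.append((lineno, f"unexpected action {action!r}"))
    return problems
```

Run against the file written above, e.g. `with open("garbage_robot_dataset.jsonl") as f: print(check_rows(f))`, an empty list means every row is in the unified format.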
code2.py ADDED
@@ -0,0 +1,34 @@
+ from qlearning import QTable, ACTIONS, encode_state
+ from environment import GarbageRobotEnv
+ from scenarios import SCENARIOS
+ import json
+
+ qt = QTable()
+ qt.load('qtable.json')
+ env = GarbageRobotEnv()
+
+ instruction = '''You are an AI brain controlling a garbage collecting robot.
+ Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT'''
+
+ alpaca = '''### Instruction:\n{}\n\n### Input:\nENVIRONMENT STATUS:\n{}\n\n### Response:\n{}'''
+
+ data = []
+ for task_id in SCENARIOS:
+     for _ in range(10):  # 10 episodes per task
+         env.reset(task_id)
+         done = False
+         while not done:
+             obs_obj = env.get_observation()
+             obs = {'robot_position': obs_obj.robot_position,
+                    'garbage_positions': list(obs_obj.garbage_positions),
+                    'grid_size': obs_obj.grid_size}
+             state = encode_state(obs)
+             action = ACTIONS[qt.best_action(state)]
+             data.append({'text': alpaca.format(instruction, obs_obj.message, action)})
+             result = env.step(action)
+             done = result['done']
+
+ with open('rl_trajectories.jsonl', 'w') as f:
+     for row in data:
+         f.write(json.dumps(row) + '\n')
+ print(f'Generated {len(data)} samples')
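The `while not done` loop in `code2.py` depends entirely on `env.step()` eventually reporting `done`; how `step()` ends an episode is not shown in this part of the diff, so a greedy Q-table policy that oscillates between two cells could keep an episode running indefinitely. A minimal sketch of a rollout with a hard step cap, using a hypothetical one-dimensional stand-in environment (`LineEnv`) rather than `GarbageRobotEnv`:

```python
from typing import Callable, Dict, List, Tuple


class LineEnv:
    """Hypothetical stand-in: a 1-D corridor with one piece of garbage."""

    def __init__(self, length: int, garbage: int):
        self.length, self.garbage = length, garbage
        self.pos, self.done = 0, False

    def reset(self) -> None:
        self.pos, self.done = 0, False

    def step(self, action: str) -> Dict[str, bool]:
        if action == "RIGHT":
            self.pos = min(self.pos + 1, self.length - 1)
        elif action == "LEFT":
            self.pos = max(self.pos - 1, 0)
        elif action == "COLLECT" and self.pos == self.garbage:
            self.done = True  # episode ends once the only garbage is collected
        return {"done": self.done}


def rollout(env: LineEnv, policy: Callable[[int], str], max_steps: int) -> List[Tuple[int, str]]:
    """Run one episode, stopping at `done` or after `max_steps` actions."""
    env.reset()
    trajectory: List[Tuple[int, str]] = []
    for _ in range(max_steps):
        action = policy(env.pos)
        trajectory.append((env.pos, action))
        if env.step(action)["done"]:
            break
    return trajectory


env = LineEnv(length=5, garbage=3)
good = rollout(env, lambda p: "COLLECT" if p == 3 else "RIGHT", max_steps=50)
stuck = rollout(env, lambda p: "LEFT" if p % 2 else "RIGHT", max_steps=10)  # oscillates 0 <-> 1
print(len(good), len(stuck))
```

In `code2.py` the same idea would mean replacing `while not done:` with a bounded `for` loop that breaks on `result['done']`, so a misbehaving policy only costs `max_steps` samples per episode.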
environment.py ADDED
@@ -0,0 +1,500 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ environment.py — Garbage Collecting Robot Core RL Environment.
3
+
4
+ Fixes applied:
5
+ • Battery no longer drains during autonomous CHARGE / UNLOAD_HERE steps.
6
+ • Recharge guard now checks `not self.done` instead of `self.garbage_positions`
7
+ so it also fires correctly at episode boundaries.
8
+ """
9
+
10
+ from typing import Any, Dict, Optional, List, Tuple
11
+ from collections import deque
12
+ from models import Observation, State
13
+ from scenarios import SCENARIOS
14
+
15
+
16
+ # ─────────────────────────────────────────────────────────────
17
+ # BFS PATHFINDING HELPER
18
+ # ─────────────────────────────────────────────────────────────
19
+
20
+ def _bfs(
21
+ start,
22
+ goal,
23
+ obstacles,
24
+ grid_w: int,
25
+ grid_h: int,
26
+ ) -> Tuple[Optional[str], float]:
27
+ """
28
+ Breadth-First Search from *start* to *goal* on a rectangular grid.
29
+
30
+ Avoids all cells listed in *obstacles*. Returns:
31
+ (first_direction, path_length) — the single step that begins the
32
+ shortest path, and how many steps
33
+ the full path takes.
34
+ (None, 0) — start == goal (already there).
35
+ (None, inf) — goal is unreachable.
36
+
37
+ Directions: "UP" (+y), "DOWN" (−y), "LEFT" (−x), "RIGHT" (+x).
38
+ """
39
+ start = (int(start[0]), int(start[1]))
40
+ goal = (int(goal[0]), int(goal[1]))
41
+
42
+ if start == goal:
43
+ return (None, 0)
44
+
45
+ obstacle_set = frozenset((int(o[0]), int(o[1])) for o in obstacles)
46
+ dirs = [("RIGHT", (1, 0)), ("LEFT", (-1, 0)), ("UP", (0, 1)), ("DOWN", (0, -1))]
47
+
48
+ queue: deque = deque([(start, None, 0)]) # (pos, first_move, depth)
49
+ visited = {start}
50
+
51
+ while queue:
52
+ pos, first, depth = queue.popleft()
53
+ for name, (dx, dy) in dirs:
54
+ npos = (pos[0] + dx, pos[1] + dy)
55
+ if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
56
+ continue
57
+ if npos in obstacle_set or npos in visited:
58
+ continue
59
+ move = first if first else name
60
+ if npos == goal:
61
+ return (move, depth + 1)
62
+ visited.add(npos)
63
+ queue.append((npos, move, depth + 1))
64
+
65
+ return (None, float("inf"))
66
+
67
+
68
+ # ─────────────────────────────────────────────────────────────
69
+ # ENVIRONMENT
70
+ # ─────────────────────────────────────────────────────────────
71
+
72
+ class GarbageRobotEnv:
73
+ """
74
+ Core RL Environment for the Garbage Collecting Robot.
75
+
76
+ Robot modes
77
+ -----------
78
+ MODE_NORMAL — agent controls the robot normally.
79
+ MODE_RECHARGE — battery critically low; robot auto-navigates home,
80
+ recharges, then switches back to NORMAL.
81
+ MODE_UNLOAD — storage full; robot auto-navigates to unload_station,
82
+ empties its bin, then switches back to NORMAL.
83
+
84
+ Autonomous overrides happen *inside* step(): the command the caller
85
+ sends is silently replaced when the robot is in a non-normal mode.
86
+ This keeps the external API unchanged while giving the robot
87
+ self-managing capabilities.
88
+
89
+ FIX: Battery is only decremented for real movement/collection commands,
90
+ NOT for internal CHARGE or UNLOAD_HERE commands.
91
+ """
92
+
93
+ MODE_NORMAL = "normal"
94
+ MODE_RECHARGE = "recharging"
95
+ MODE_UNLOAD = "unloading"
96
+
97
+ # Safety margin added on top of BFS distance when deciding to recharge.
98
+ RECHARGE_BUFFER = 4
99
+
100
+ def __init__(self):
101
+ self.current_task_id = None
102
+ self.grid_size = (0, 0)
103
+ self.robot_position = [0, 0]
104
+ self.garbage_positions = []
105
+ self.obstacle_positions = []
106
+ self.battery_level = 0
107
+ self.max_battery = 0
108
+ self.inventory_count = 0
109
+
110
+ # Resource management state
111
+ self.home_position = [0, 0]
112
+ self.unload_station = [0, 0]
113
+ self.storage_capacity = 6
114
+ self.current_storage_load = 0
115
+
116
+ # Episode accounting
117
+ self.total_reward = 0.0
118
+ self.steps_taken = 0
119
+ self.done = False
120
+
121
+ # Autonomous navigation mode
122
+ self._mode = self.MODE_NORMAL
123
+
124
+ # ── Reset ─────────────────────────────────────────────────
125
+
126
+ def reset(self, task_id: str) -> State:
127
+ if task_id not in SCENARIOS:
128
+ raise ValueError(f"Task ID '{task_id}' not found in scenarios.")
129
+
130
+ s = SCENARIOS[task_id]
131
+ self.current_task_id = task_id
132
+ self.grid_size = tuple(s["grid_size"])
133
+ self.robot_position = list(s["robot_start"])
134
+ self.garbage_positions = [list(g) for g in s["garbage_starts"]]
135
+ self.obstacle_positions = [list(o) for o in s["obstacle_starts"]]
136
+ self.battery_level = s["max_battery"]
137
+ self.max_battery = s["max_battery"]
138
+
139
+ self.home_position = list(s.get("home_position", s["robot_start"]))
140
+ self.unload_station = list(s.get("unload_station", [0, self.grid_size[1] - 1]))
141
+ self.storage_capacity = s.get("storage_capacity", 6)
142
+ self.current_storage_load = 0
143
+ self.inventory_count = 0
144
+
145
+ self.total_reward = 0.0
146
+ self.steps_taken = 0
147
+ self.done = False
148
+ self._mode = self.MODE_NORMAL
149
+
150
+ return self.state()
151
+
152
+ def reset_custom(
153
+ self,
154
+ task_id: str = "task_easy",
155
+ grid_size=None,
156
+ robot_start=None,
157
+ garbage_positions=None,
158
+ obstacle_positions=None,
159
+ max_battery=None,
160
+ storage_capacity=None,
161
+ home_position=None,
162
+ unload_station=None,
163
+ ) -> State:
164
+ """
165
+ Dynamic reset: start from a scenario baseline and override any fields.
166
+ Pass task_id='custom' with all fields supplied to skip scenario lookup.
167
+ """
168
+ if task_id in SCENARIOS:
169
+ s = SCENARIOS[task_id]
170
+ base_grid = s["grid_size"]
171
+ base_robot = s["robot_start"]
172
+ base_garbage = s["garbage_starts"]
173
+ base_obstacles = s["obstacle_starts"]
174
+ base_battery = s["max_battery"]
175
+ base_home = s.get("home_position", s["robot_start"])
176
+ base_unload = s.get("unload_station", [0, s["grid_size"][1] - 1])
177
+ base_capacity = s.get("storage_capacity", 5)
178
+ else:
179
+ base_grid = (10, 10)
180
+ base_robot = (0, 0)
181
+ base_garbage = []
182
+ base_obstacles = []
183
+ base_battery = 60
184
+ base_home = (0, 0)
185
+ base_unload = (9, 0)
186
+ base_capacity = 6
187
+
188
+ self.current_task_id = task_id
189
+ self.grid_size = tuple(grid_size) if grid_size is not None else tuple(base_grid)
190
+ self.robot_position = list(robot_start) if robot_start is not None else list(base_robot)
191
+ self.garbage_positions = [list(g) for g in garbage_positions] if garbage_positions is not None else [list(g) for g in base_garbage]
192
+ self.obstacle_positions = [list(o) for o in obstacle_positions] if obstacle_positions is not None else [list(o) for o in base_obstacles]
193
+ self.battery_level = max_battery if max_battery is not None else base_battery
194
+ self.max_battery = self.battery_level
195
+ self.home_position = list(home_position) if home_position is not None else list(base_home)
196
+ self.unload_station = list(unload_station) if unload_station is not None else list(base_unload)
197
+ self.storage_capacity = storage_capacity if storage_capacity is not None else base_capacity
198
+
199
+ self.current_storage_load = 0
200
+ self.inventory_count = 0
201
+ self.total_reward = 0.0
202
+ self.steps_taken = 0
203
+ self.done = False
204
+ self._mode = self.MODE_NORMAL
205
+
206
+ # Remove any garbage placed on top of an obstacle
207
+ self.garbage_positions = [
208
+ g for g in self.garbage_positions if g not in self.obstacle_positions
209
+ ]
210
+ return self.state()
+
+     # ── Observation & State helpers ───────────────────────────
+
+     def _bfs_distance(self, target) -> int:
+         """Return BFS step-count from current robot position to *target*."""
+         _, dist = _bfs(
+             self.robot_position, target,
+             self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+         )
+         return int(dist) if dist != float("inf") else -1
+
+     def _should_recharge(self) -> bool:
+         """
+         Return True when the robot must leave immediately to reach home
+         before battery runs out.
+
+         Threshold = BFS distance to home + RECHARGE_BUFFER.
+         A buffer of 4 gives comfortable headroom for obstacle detours.
+         """
+         if self.battery_level <= 1:
+             return True
+         dist = self._bfs_distance(self.home_position)
+         if dist < 0:
+             # Home unreachable via BFS — fall back to Manhattan distance
+             dist = (abs(self.robot_position[0] - self.home_position[0]) +
+                     abs(self.robot_position[1] - self.home_position[1]))
+         return self.battery_level <= (dist + self.RECHARGE_BUFFER)
+
+     def _should_unload(self) -> bool:
+         """Return True when the storage bin is at capacity."""
+         return self.current_storage_load >= self.storage_capacity
+
+     def get_observation(self, message: str = "") -> Observation:
+         dist_home = self._bfs_distance(self.home_position)
+
+         if not message:
+             message = (
+                 f"You are at {tuple(self.robot_position)}. "
+                 f"Garbage remaining: {len(self.garbage_positions)}. "
+                 f"Battery: {self.battery_level}/{self.max_battery}. "
+                 f"Storage: {self.current_storage_load}/{self.storage_capacity}. "
+                 f"Home (charging): {tuple(self.home_position)} "
+                 f"[{dist_home if dist_home >= 0 else 'unreachable'} steps]. "
+                 f"Unload station: {tuple(self.unload_station)}. "
+                 f"Mode: {self._mode}."
+             )
+
+         return Observation(
+             grid_size = self.grid_size,
+             robot_position = tuple(self.robot_position),
+             garbage_positions = [tuple(g) for g in self.garbage_positions],
+             obstacle_positions = [tuple(o) for o in self.obstacle_positions],
+             battery_level = self.battery_level,
+             inventory_count = self.inventory_count,
+             message = message,
+             home_position = tuple(self.home_position),
+             unload_station = tuple(self.unload_station),
+             storage_capacity = self.storage_capacity,
+             current_storage_load = self.current_storage_load,
+             distance_from_home = dist_home,
+             robot_mode = self._mode,
+         )
+
+     def state(self) -> State:
+         return State(
+             task_id = self.current_task_id,
+             total_reward = self.total_reward,
+             steps_taken = self.steps_taken,
+             done = self.done,
+             robot_mode = self._mode,
+             current_storage_load = self.current_storage_load,
+             battery_level = self.battery_level,
+             distance_from_home = self._bfs_distance(self.home_position),
+         )
+
+     # ── Autonomous command resolver ────────────────────────────
+
+     def _resolve_command(self, requested: str) -> Tuple[str, str]:
+         """
+         Determine the *effective* command for this step.
+
+         When the robot is in MODE_RECHARGE or MODE_UNLOAD the caller's
+         command is replaced by an autonomously-computed one.
+
+         Returns
+         -------
+         (effective_command, mode_message)
+         """
+
+         # ── Trigger check (only when in normal mode) ───────────
+         # FIX: use `not self.done` guard instead of `self.garbage_positions`
+         # so recharge still fires even if all garbage is collected this step.
+         if self._mode == self.MODE_NORMAL:
+             if self._should_recharge() and not self.done:
+                 self._mode = self.MODE_RECHARGE
+             elif self._should_unload():
+                 self._mode = self.MODE_UNLOAD
+
+         # ── Recharging mode ────────────────────────────────────
+         if self._mode == self.MODE_RECHARGE:
+             if tuple(self.robot_position) == tuple(self.home_position):
+                 # Arrived — charge and return to normal
+                 self._mode = self.MODE_NORMAL
+                 return (
+                     "CHARGE",
+                     (f"Reached charging station {tuple(self.home_position)}. "
+                      f"Battery fully restored to {self.max_battery}. "
+                      f"Resuming garbage collection."),
+                 )
+             else:
+                 move, dist = _bfs(
+                     self.robot_position, self.home_position,
+                     self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+                 )
+                 dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
+                 return (
+                     move or "UP",
+                     (f"⚡ Battery critical ({self.battery_level}/{self.max_battery}). "
+                      f"Auto-navigating to charging station {tuple(self.home_position)} "
+                      f"[{dist_str}]."),
+                 )
+
+         # ── Unloading mode ─────────────────────────────────────
+         if self._mode == self.MODE_UNLOAD:
+             if tuple(self.robot_position) == tuple(self.unload_station):
+                 # Arrived — empty the bin and return to normal
+                 freed = self.current_storage_load
+                 self._mode = self.MODE_NORMAL
+                 return (
+                     "UNLOAD_HERE",
+                     (f"Reached unload station {tuple(self.unload_station)}. "
+                      f"Emptied {freed} item(s) from storage. "
+                      f"Resuming garbage collection."),
+                 )
+             else:
+                 move, dist = _bfs(
+                     self.robot_position, self.unload_station,
+                     self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+                 )
+                 dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
+                 return (
+                     move or "UP",
+                     (f"📦 Storage full ({self.current_storage_load}/{self.storage_capacity}). "
+                      f"Auto-navigating to unload station {tuple(self.unload_station)} "
+                      f"[{dist_str}]."),
+                 )
+
+         # ── Normal mode — use caller's command ─────────────────
+         return (requested, "")
+
+     # ── Step ──────────────────────────────────────────────────
+
+     def step(self, command: str) -> Dict[str, Any]:
+         if self.done:
+             obs = self.get_observation("Episode already finished.")
+             return {"observation": obs.dict(), "reward": 0.0, "done": True, "info": {}}
+
+         self.steps_taken += 1
+
+         # Resolve autonomous overrides BEFORE battery decrement so that
+         # CHARGE / UNLOAD_HERE commands do NOT consume battery.
+         effective_cmd, mode_message = self._resolve_command(command)
+
+         # FIX: only drain battery for real movement / collection actions.
+         # Autonomous internal commands (CHARGE, UNLOAD_HERE) are free.
+         if effective_cmd in ("CHARGE", "UNLOAD_HERE"):
+             reward = 0.0
+         else:
+             self.battery_level -= 1
+             reward = -0.1
+
+         message = mode_message  # may be overwritten below
+
+         # ── CHARGE (internal — issued autonomously at home) ────
+         if effective_cmd == "CHARGE":
+             self.battery_level = self.max_battery
+             reward += 5.0
+             # message already set from resolver
+
+         # ── UNLOAD_HERE (internal — issued autonomously at station) ──
+         elif effective_cmd == "UNLOAD_HERE":
+             freed = self.current_storage_load
+             self.current_storage_load = 0
+             reward += 2.0
+             # message already set from resolver
+
+         # ── COLLECT ───────────────────────────────────────────
+         elif effective_cmd == "COLLECT":
+             if self.robot_position in self.garbage_positions:
+                 self.garbage_positions.remove(self.robot_position)
+                 self.inventory_count += 1
+                 self.current_storage_load += 1
+                 reward += 10.0
+                 message = (
+                     f"Collected garbage! "
+                     f"Storage: {self.current_storage_load}/{self.storage_capacity}."
+                 )
+                 if self._should_unload() and self.garbage_positions:
+                     self._mode = self.MODE_UNLOAD
+                     message += (
+                         f" Storage full — auto-routing to "
+                         f"unload station {tuple(self.unload_station)}."
+                     )
+             else:
+                 reward -= 1.0
+                 message = "No garbage to collect here."
+
+         # ── Movement commands ──────────────────────────────────
+         elif effective_cmd in ("UP", "DOWN", "LEFT", "RIGHT"):
+             new_pos = list(self.robot_position)
+             if effective_cmd == "UP":
+                 new_pos[1] += 1
+             elif effective_cmd == "DOWN":
+                 new_pos[1] -= 1
+             elif effective_cmd == "LEFT":
+                 new_pos[0] -= 1
+             elif effective_cmd == "RIGHT":
+                 new_pos[0] += 1
+
+             gw, gh = self.grid_size
+             if 0 <= new_pos[0] < gw and 0 <= new_pos[1] < gh:
+                 if new_pos in self.obstacle_positions:
+                     reward -= 5.0
+                     blocked = []
+                     direction_map = {
+                         "UP": [0, 1], "DOWN": [0, -1],
+                         "LEFT": [-1, 0], "RIGHT": [1, 0],
+                     }
+                     for d, delta in direction_map.items():
+                         nb = [self.robot_position[0] + delta[0],
+                               self.robot_position[1] + delta[1]]
+                         if nb in self.obstacle_positions:
+                             blocked.append(d)
+                     blocked_str = ", ".join(blocked) if blocked else "none"
+                     message = (
+                         f"BLOCKED! {effective_cmd} leads to an obstacle. "
+                         f"Blocked directions from here: {blocked_str}. "
+                         f"Choose a different direction."
+                     )
+                 else:
+                     self.robot_position = new_pos
+                     if not message:
+                         message = f"Moved {effective_cmd}."
+             else:
+                 reward -= 1.0
+                 if not message:
+                     message = (
+                         f"Hit a wall trying to move {effective_cmd}. "
+                         f"Do NOT try {effective_cmd} again from this position."
+                     )
+
+         # ── Unknown command ────────────────────────────────────
+         else:
+             reward -= 1.0
+             message = f"Invalid command: '{effective_cmd}'."
+
+         # ── Termination checks ─────────────────────────────────
+         if len(self.garbage_positions) == 0:
+             self.done = True
+             reward += 50.0
+             message += " All garbage collected! Task complete."
+         elif self.battery_level <= 0:
+             self.done = True
+             message += " Battery depleted! Game over."
+
+         self.total_reward += reward
+
+         return {
+             "observation": self.get_observation(message).dict(),
+             "reward": reward,
+             "done": self.done,
+             "info": {
+                 "inventory_count": self.inventory_count,
+                 "steps": self.steps_taken,
+                 "current_storage_load": self.current_storage_load,
+                 "robot_mode": self._mode,
+                 "autonomous_override": effective_cmd != command,
+                 "original_command": command,
+                 "effective_command": effective_cmd,
+             },
+         }
+
+     # ── Grading ───────────────────────────────────────────────
+
+     def grade(self, task_id: str) -> float:
+         """Normalised [0.0, 1.0] completion score for the leaderboard."""
+         if task_id not in SCENARIOS:
+             return 0.0
+         total = len(SCENARIOS[task_id]["garbage_starts"])
+         return min(max(self.inventory_count / total, 0.0), 1.0)
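The autonomous overrides above depend on a `_bfs` helper that this part of the diff does not show. Judging only from its call sites, it takes `(start, target, obstacles, width, height)` and returns `(first_move, distance)`, with `distance == float("inf")` when no route exists and a possibly falsy move (hence `move or "UP"`). The sketch below is a standalone re-implementation under those assumptions, not the project's code: `bfs`, `should_recharge` and `resolve` are hypothetical names, `RECHARGE_BUFFER = 4` is taken from the `_should_recharge` docstring, and the mode strings `"normal"`, `"recharging"` and `"unloading"` are assumed from the CSS classes the dashboard toggles. Ties between equally short routes are broken by the `MOVES` order, which may differ from the real helper.

```python
from collections import deque
from typing import Optional, Tuple

# Hypothetical stand-ins for values that live on the env class.
RECHARGE_BUFFER = 4  # per the _should_recharge docstring
NORMAL, RECHARGE, UNLOAD = "normal", "recharging", "unloading"  # assumed mode strings

# Same axis convention as step(): UP increases y, RIGHT increases x.
MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def bfs(start, target, obstacles, width, height) -> Tuple[Optional[str], float]:
    """Return (first move on a shortest path, path length).

    (None, 0) if already at the target, (None, inf) if it is unreachable.
    """
    start, target = tuple(start), tuple(target)
    if start == target:
        return None, 0
    blocked = {tuple(o) for o in obstacles}
    queue = deque([(start, None, 0)])
    seen = {start}
    while queue:
        (x, y), first, dist = queue.popleft()
        for name, (dx, dy) in MOVES.items():
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the grid
            if nxt in blocked or nxt in seen:
                continue
            move = first or name  # remember the move that left the start cell
            if nxt == target:
                return move, dist + 1
            seen.add(nxt)
            queue.append((nxt, move, dist + 1))
    return None, float("inf")


def should_recharge(battery, robot, home, obstacles, width, height) -> bool:
    """Leave for home once battery <= distance home + buffer."""
    if battery <= 1:
        return True
    _, dist = bfs(robot, home, obstacles, width, height)
    if dist == float("inf"):
        # Unreachable by BFS: fall back to Manhattan distance, as the env does
        dist = abs(robot[0] - home[0]) + abs(robot[1] - home[1])
    return battery <= dist + RECHARGE_BUFFER


def resolve(mode, requested, robot, home, station, need_charge, bin_full,
            obstacles, width, height) -> Tuple[str, str]:
    """Return (new_mode, effective_command), following _resolve_command."""
    if mode == NORMAL:  # triggers are only checked in normal mode
        if need_charge:
            mode = RECHARGE  # recharging takes priority over unloading
        elif bin_full:
            mode = UNLOAD
    if mode in (RECHARGE, UNLOAD):
        goal = home if mode == RECHARGE else station
        if tuple(robot) == tuple(goal):
            return NORMAL, ("CHARGE" if mode == RECHARGE else "UNLOAD_HERE")
        move, _ = bfs(robot, goal, obstacles, width, height)
        return mode, move or "UP"  # same "UP" fallback as the env
    return NORMAL, requested
```

Because the trigger check runs only in normal mode, a robot that is already unloading is not diverted home by a low battery (and vice versa) until it finishes the current errand and returns to normal mode.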
fixed_dataset.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
fixer.py ADDED
@@ -0,0 +1,28 @@
+ import json
+
+ input_file = "rl_trajectories.jsonl"
+ output_file = "fixed_dataset.jsonl"
+
+ def extract_parts(text):
+     try:
+         user_part = text.split("### Response:")[0].strip()
+         assistant_part = text.split("### Response:")[1].strip()
+         return user_part, assistant_part
+     except (AttributeError, IndexError):  # no "### Response:" marker, or text is not a str
+         return None, None
+
+ with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
+     for line in f_in:
+         data = json.loads(line)
+         text = data.get("text", "")
+
+         user, assistant = extract_parts(text)
+
+         if user and assistant:
+             new_entry = {
+                 "user": user,
+                 "assistant": assistant
+             }
+             f_out.write(json.dumps(new_entry) + "\n")
+
+ print("Done. Fixed dataset saved.")
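`fixer.py` splits each record's `text` on the `### Response:` marker twice and relies on an exception to skip records that lack it. A sketch of the same conversion using `str.partition`, which splits once and reports whether the marker was found, is shown below. It assumes, as the script does, one JSON object per line with the sample in a `text` field; `convert_lines` is a hypothetical helper name. One behavioural difference: if a sample contains the marker more than once, `partition` keeps everything after the first marker, whereas `split(...)[1]` keeps only the text up to the second one.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

SEPARATOR = "### Response:"  # the marker fixer.py splits on


def extract_parts(text) -> Tuple[Optional[str], Optional[str]]:
    """Split a flat prompt/response sample into (user, assistant)."""
    if not isinstance(text, str):
        return None, None
    user, sep, assistant = text.partition(SEPARATOR)
    if not sep:  # marker missing: the sample cannot be split
        return None, None
    return user.strip(), assistant.strip()


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one {"user", "assistant"} JSON string per usable input record."""
    for line in lines:
        if not line.strip():  # tolerate blank lines in the JSONL file
            continue
        user, assistant = extract_parts(json.loads(line).get("text", ""))
        if user and assistant:  # skip records with an empty side
            yield json.dumps({"user": user, "assistant": assistant})
```

To reproduce the script, pass the open `rl_trajectories.jsonl` handle to `convert_lines` and write each yielded string to `fixed_dataset.jsonl` followed by a newline.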
frontend/index.html ADDED
@@ -0,0 +1,166 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>GarbageBot — RL Control Center</title>
+   <meta name="description" content="Real-time dashboard for the fine-tuned Llama-3.2 garbage collecting robot. Watch Q-learning and LLM policy decisions live.">
+   <link rel="preconnect" href="https://fonts.googleapis.com">
+   <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+   <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800&family=JetBrains+Mono:wght@400;600&display=swap" rel="stylesheet">
+   <!-- Use /static/ prefix so FastAPI's StaticFiles middleware serves these correctly
+        on HuggingFace Spaces and any other hosted environment. -->
+   <link rel="stylesheet" href="/static/style.css">
+ </head>
+ <body>
+   <!-- Animated background -->
+   <div class="bg-orbs">
+     <div class="orb orb-1"></div>
+     <div class="orb orb-2"></div>
+     <div class="orb orb-3"></div>
+     <div class="orb orb-4"></div>
+   </div>
+   <div class="grid-bg"></div>
+
+   <div class="dashboard">
+
+     <!-- ── HEADER ── -->
+     <header class="header-bar">
+       <div class="brand">
+         <div class="brand-icon">🤖</div>
+         <div>
+           <h1>GarbageBot <span class="version-tag">v2.0</span></h1>
+           <p class="brand-sub">Llama-3.2-3B · Q-Table · BFS</p>
+         </div>
+       </div>
+
+       <div class="status-strip">
+         <div class="status-pill" id="server-pill">
+           <span class="pulse-dot" id="status-dot"></span>
+           <span id="status-label">Connecting…</span>
+         </div>
+         <div class="policy-badge" id="policy-badge">
+           <span class="badge-icon">⚡</span>
+           <span id="policy-label">–</span>
+         </div>
+         <div class="status-pill" id="mode-pill">
+           <span id="mode-label">NORMAL</span>
+         </div>
+       </div>
+
+       <div class="controls">
+         <select id="task-select">
+           <option value="task_easy">🟢 Easy — 5×5</option>
+           <option value="task_medium">🟡 Medium — 7×7</option>
+           <option value="task_hard">🔴 Hard — 10×10</option>
+         </select>
+         <div class="speed-group">
+           <label class="speed-label">Speed</label>
+           <input type="range" id="speed-slider" min="100" max="1500" value="500" step="100">
+           <span id="speed-val">500ms</span>
+         </div>
+         <button id="reset-btn" class="btn secondary">↺ Reset</button>
+         <button id="auto-btn" class="btn primary">▶ Run Policy</button>
+         <button id="manual-btn" class="btn outline">⏭ Step</button>
+       </div>
+     </header>
+
+     <!-- ── MAIN ── -->
+     <main>
+
+       <!-- Grid world -->
+       <section class="grid-panel panel glass">
+         <div class="grid-header">
+           <span class="grid-title">Environment</span>
+           <div class="grid-meta">
+             <span id="step-counter" class="mono-chip">Step 0</span>
+             <span id="episode-score-chip" class="mono-chip accent-chip">Score 0.00</span>
+           </div>
+         </div>
+         <div class="grid-stage" id="grid-stage">
+           <div id="env-grid" class="grid-world">
+             <div id="particle-layer" class="particle-layer"></div>
+           </div>
+         </div>
+         <p class="grid-hint">💡 Click any empty cell to place or remove garbage</p>
+       </section>
+
+       <!-- Side panel -->
+       <aside class="side-panel">
+
+         <!-- Telemetry -->
+         <div class="panel glass tele-card">
+           <h2 class="section-title">Telemetry</h2>
+
+           <div class="stat-row">
+             <div class="stat-icon">🔋</div>
+             <div class="stat-body">
+               <div class="stat-label-row">
+                 <span class="stat-label">Battery</span>
+                 <span id="battery-text" class="stat-num">–</span>
+               </div>
+               <div class="progress-track">
+                 <div class="progress-fill" id="battery-progress"></div>
+               </div>
+             </div>
+           </div>
+
+           <div class="stat-row">
+             <div class="stat-icon">📦</div>
+             <div class="stat-body">
+               <div class="stat-label-row">
+                 <span class="stat-label">Storage</span>
+                 <span id="storage-text" class="stat-num">–</span>
+               </div>
+               <div class="progress-track">
+                 <div class="progress-fill" id="storage-progress" style="width: 0%; background: var(--warning);"></div>
+               </div>
+             </div>
+           </div>
+
+           <div class="stat-row">
+             <div class="stat-icon">🗑️</div>
+             <div class="stat-body">
+               <div class="stat-label-row">
+                 <span class="stat-label">Total Collected</span>
+                 <span id="inventory-text" class="stat-num big-num">0</span>
+               </div>
+             </div>
+           </div>
+
+           <div class="stat-row">
+             <div class="stat-icon">🏆</div>
+             <div class="stat-body">
+               <div class="stat-label-row">
+                 <span class="stat-label">Reward</span>
+                 <span id="score-text" class="stat-num big-num accent">0.00</span>
+               </div>
+             </div>
+           </div>
+
+           <!-- Mini reward chart -->
+           <div class="chart-wrap">
+             <canvas id="reward-chart" width="290" height="70"></canvas>
+           </div>
+         </div>
+
+         <!-- Policy log -->
+         <div class="panel glass log-card">
+           <div class="log-header">
+             <h2 class="section-title">Policy Observations</h2>
+             <button class="clear-btn" id="clear-log">✕ Clear</button>
+           </div>
+           <div id="log-feed" class="log-feed">
+             <p class="placeholder">Awaiting environment reset…</p>
+           </div>
+           <p class="log-footer">🤖 Driven by fine-tuned Llama-3.2-3B</p>
+         </div>
+
+       </aside>
+     </main>
+
+   </div><!-- /dashboard -->
+
+   <script src="/static/script.js"></script>
+ </body>
+ </html>
frontend/script.js ADDED
@@ -0,0 +1,515 @@
+ /* ═══════════════════════════════════════════════════════
+    GarbageBot — Continuous-World Dashboard Logic
+    Policy chain: Fine-tuned LLM → Q-table → BFS fallback
+
+    Fix applied:
+    - API_BASE was hardcoded to "http://localhost:7861" which breaks on any
+      hosted environment (HuggingFace Spaces, cloud VMs, etc.).
+      Now uses a relative empty string "" so every fetch goes to the same
+      origin that served the page — works locally and in production without
+      any code change.
+ ═══════════════════════════════════════════════════════ */
+
+ // FIX: use relative paths ("") instead of hardcoded "http://localhost:7861"
+ // so the dashboard works on HuggingFace Spaces and any other host automatically.
+ const API_BASE = "";
+
+ // ── DOM ───────────────────────────────────────────────────
+ const statusDot = document.getElementById("status-dot");
+ const statusLabel = document.getElementById("status-label");
+ const policyBadge = document.getElementById("policy-badge");
+ const policyLabel = document.getElementById("policy-label");
+ const taskSelect = document.getElementById("task-select");
+ const speedSlider = document.getElementById("speed-slider");
+ const speedVal = document.getElementById("speed-val");
+ const resetBtn = document.getElementById("reset-btn");
+ const autoBtn = document.getElementById("auto-btn");
+ const manualBtn = document.getElementById("manual-btn");
+ const clearLogBtn = document.getElementById("clear-log");
+
+ const envGrid = document.getElementById("env-grid");
+ const particleLayer = document.getElementById("particle-layer");
+ const batteryProgress = document.getElementById("battery-progress");
+ const batteryText = document.getElementById("battery-text");
+ const scoreText = document.getElementById("score-text");
+ const inventoryText = document.getElementById("inventory-text");
+ const stepCounter = document.getElementById("step-counter");
+ const episodeScoreChip = document.getElementById("episode-score-chip");
+ const logFeed = document.getElementById("log-feed");
+ const rewardCanvas = document.getElementById("reward-chart");
+
+ const modePill = document.getElementById("mode-pill");
+ const modeLabel = document.getElementById("mode-label");
+ const storageProgress = document.getElementById("storage-progress");
+ const storageText = document.getElementById("storage-text");
+
+ // ── State ─────────────────────────────────────────────────
+ let autoMode = false;
+ let autoTimer = null;
+ let currentState = null;
+ let robotEntity = null;
+ let stepCount = 0;
+ let totalReward = 0;
+ let rewardHistory = [];
+ let maxBattery = 30;
+ let stepDelay = 500;
+ let lastMode = "normal";
+
+ // World dimensions (set on reset)
+ let WORLD_W = 5, WORLD_H = 5;
+ const CELL = 52; // must match CSS --cell
+
+ // ── Speed slider ──────────────────────────────────────────
+ speedSlider.addEventListener("input", () => {
+   stepDelay = parseInt(speedSlider.value);
+   speedVal.textContent = `${stepDelay}ms`;
+   const pct = ((stepDelay - 100) / 1400) * 100;
+   speedSlider.style.background = `linear-gradient(90deg, var(--blue) ${pct}%, rgba(255,255,255,.15) ${pct}%)`;
+   syncRobotTransition();
+   if (autoMode) { clearInterval(autoTimer); autoTimer = setInterval(stepEnv, stepDelay); }
+ });
+
+ function syncRobotTransition() {
+   if (!robotEntity) return;
+   envGrid.style.setProperty("--move-dur", `${stepDelay}ms`);
+ }
+
+ // ── Log helpers ───────────────────────────────────────────
+ function addLog(msg, source = "sys") {
+   const ph = logFeed.querySelector(".placeholder");
+   if (ph) ph.remove();
+
+   const entry = document.createElement("div");
+   entry.className = "log-entry";
+
+   const badge = document.createElement("span");
+   badge.className = `log-badge ${source === "q_table" ? "q-table" : source}`;
+   badge.textContent = source.replace("_","-").toUpperCase();
+
+   const text = document.createElement("span");
+   text.textContent = msg;
+
+   entry.append(badge, text);
+   logFeed.prepend(entry);
+   while (logFeed.children.length > 65) logFeed.removeChild(logFeed.lastChild);
+ }
+
+ clearLogBtn.addEventListener("click", () => {
+   logFeed.innerHTML = `<p class="placeholder">Log cleared…</p>`;
+ });
+
+ // ── Mini reward chart ─────────────────────────────────────
+ function drawChart() {
+   const ctx = rewardCanvas.getContext("2d");
+   const W = rewardCanvas.width, H = rewardCanvas.height;
+   ctx.clearRect(0, 0, W, H);
+   if (rewardHistory.length < 2) return;
+
+   const maxR = Math.max(...rewardHistory.map(Math.abs), .1);
+   const step = W / (rewardHistory.length - 1);
+   const pts = rewardHistory.map((v, i) => [i * step, H - ((v + maxR) / (2 * maxR)) * H]);
+
+   const grad = ctx.createLinearGradient(0, 0, 0, H);
+   grad.addColorStop(0, "rgba(59,158,255,.5)");
+   grad.addColorStop(1, "rgba(59,158,255,0)");
+   ctx.beginPath();
+   pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
+   ctx.lineTo(pts[pts.length-1][0], H);
+   ctx.lineTo(0, H); ctx.closePath();
+   ctx.fillStyle = grad; ctx.fill();
+
+   ctx.beginPath();
+   pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
+   ctx.strokeStyle = "#3b9eff"; ctx.lineWidth = 2;
+   ctx.lineJoin = "round"; ctx.stroke();
+
+   const [lx, ly] = pts[pts.length-1];
+   ctx.beginPath(); ctx.arc(lx, ly, 3.5, 0, Math.PI*2);
+   ctx.fillStyle = "#a5c8ff"; ctx.fill();
+ }
+
+ // ── Particles ─────────────────────────────────────────────
+ function spawnParticles(px, py) {
+   const colors = ["#c084fc","#818cf8","#3b9eff","#2dd4bf","#fbbf24"];
+   for (let i = 0; i < 14; i++) {
+     const p = document.createElement("div");
+     p.className = "particle";
+     const angle = (i / 14) * Math.PI * 2;
+     const dist = 28 + Math.random() * 42;
+     const size = 4 + Math.random() * 6;
+     p.style.cssText = `
+       left:${px}px; top:${py}px;
+       width:${size}px; height:${size}px;
+       background:${colors[i % colors.length]};
+       box-shadow:0 0 6px ${colors[i%colors.length]};
+       --tx:translate(${Math.cos(angle)*dist}px,${Math.sin(angle)*dist}px);
+     `;
+     (document.getElementById("particle-layer") || particleLayer).appendChild(p); // renderGrid() rebuilds #particle-layer on reset
+     setTimeout(() => p.remove(), 780);
+   }
+ }
+
+ // ── Trail ghost ───────────────────────────────────────────
+ function addTrail(left, top) {
+   const g = document.createElement("div");
+   g.className = "trail-ghost";
+   g.style.left = `${left}px`;
+   g.style.top = `${top}px`;
+   envGrid.appendChild(g);
+   setTimeout(() => g.remove(), 1100);
+ }
+
+ // ── World coordinates ─────────────────────────────────────
+ function wx(x) { return x * CELL; }
+ function wy(y, H) { return (H - 1 - y) * CELL; }
+
+ // ── Direction → emoji ─────────────────────────────────────
+ const DIR_EMOJI = { UP:"🤖", DOWN:"🤖", LEFT:"🤖", RIGHT:"🤖", COLLECT:"🤖" };
+
+ // ── Grid render ───────────────────────────────────────────
+ function renderGrid(obs, isReset = false) {
+   const [W, H] = obs.grid_size;
+   WORLD_W = W; WORLD_H = H;
+   const worldPx = W * CELL;
+   const worldPy = H * CELL;
+
+   if (isReset) {
+     envGrid.innerHTML = "";
+     envGrid.style.width = `${worldPx}px`;
+     envGrid.style.height = `${worldPy}px`;
+     envGrid.style.gridTemplateColumns = `repeat(${W}, ${CELL}px)`;
+     envGrid.style.gridTemplateRows = `repeat(${H}, ${CELL}px)`;
+     envGrid.style.backgroundSize = `${CELL}px ${CELL}px, ${CELL}px ${CELL}px, 100% 100%`;
+
+     // Transparent click-target cells
+     for (let y = H - 1; y >= 0; y--) {
+       for (let x = 0; x < W; x++) {
+         const cell = document.createElement("div");
+         cell.className = "cell";
+         cell.dataset.x = x; cell.dataset.y = y;
+         cell.addEventListener("click", () => toggleGarbage(x, y));
+         envGrid.appendChild(cell);
+       }
+     }
+
+     // 3D obstacle walls
+     obs.obstacle_positions.forEach(([x, y]) => {
+       const el = document.createElement("div");
+       el.className = "world-obstacle";
+       el.style.left = `${wx(x)}px`;
+       el.style.top = `${wy(y, H)}px`;
+       el.style.width = `${CELL}px`;
+       el.style.height = `${CELL}px`;
+       envGrid.appendChild(el);
+     });
+
+     // Robot entity
+     robotEntity = document.createElement("div");
+     robotEntity.className = "robot-entity";
+     robotEntity.textContent = "🤖";
+     robotEntity.style.width = `${CELL}px`;
+     robotEntity.style.height = `${CELL}px`;
+     robotEntity.style.left = `${wx(obs.robot_position[0])}px`;
+     robotEntity.style.top = `${wy(obs.robot_position[1], H)}px`;
+     envGrid.appendChild(robotEntity);
+
+     // ⚡ Home Station
+     if (obs.home_position) {
+       const home = document.createElement("div");
+       home.className = "world-home";
+       home.style.left = `${wx(obs.home_position[0])}px`;
+       home.style.top = `${wy(obs.home_position[1], H)}px`;
+       envGrid.appendChild(home);
+     }
+
+     // 📦 Unload Station
+     if (obs.unload_station) {
+       const unload = document.createElement("div");
+       unload.className = "world-unload";
+       unload.style.left = `${wx(obs.unload_station[0])}px`;
+       unload.style.top = `${wy(obs.unload_station[1], H)}px`;
+       envGrid.appendChild(unload);
+     }
+
+     // Particle layer on top
+     const pl = document.createElement("div");
+     pl.id = "particle-layer";
+     pl.className = "particle-layer";
+     envGrid.appendChild(pl);
+
+     syncRobotTransition();
+   }
+
+   // Continuous robot move
+   if (robotEntity) {
+     const nl = wx(obs.robot_position[0]);
+     const nt = wy(obs.robot_position[1], H);
+     robotEntity.style.left = `${nl}px`;
+     robotEntity.style.top = `${nt}px`;
+   }
+
+   // Re-render garbage
+   document.querySelectorAll(".world-garbage").forEach(g => g.remove());
+   obs.garbage_positions.forEach(([x, y]) => {
+     const el = document.createElement("div");
+     el.className = "world-garbage";
+     el.style.left = `${wx(x)}px`;
+     el.style.top = `${wy(y, H)}px`;
+     el.style.width = `${CELL}px`;
+     el.style.height = `${CELL}px`;
+     el.innerHTML = `<span>🗑️</span>`;
+     el.addEventListener("click", () => toggleGarbage(x, y));
+     envGrid.appendChild(el);
+   });
+
+   addLog(obs.message, "sys");
+ }
+
+ // ── Telemetry ─────────────────────────────────────────────
+ function updateTelemetry(obs, reward, done) {
+   if (obs.battery_level > maxBattery) maxBattery = obs.battery_level;
+   const pct = Math.max(0, (obs.battery_level / maxBattery) * 100);
+   batteryProgress.style.width = `${pct}%`;
+   batteryText.textContent = `${obs.battery_level} / ${maxBattery}`;
+
+   if (pct > 55) batteryProgress.style.background = "#34d399";
+   else if (pct > 25) batteryProgress.style.background = "#fbbf24";
+   else batteryProgress.style.background = "#fb7185";
+
+   // Storage update
+   if (obs.storage_capacity) {
+     const sPct = (obs.current_storage_load / obs.storage_capacity) * 100;
+     storageProgress.style.width = `${sPct}%`;
+     storageProgress.style.background = sPct >= 100 ? "#f59e0b" : "#60a5fa";
+     storageText.textContent = `${obs.current_storage_load} / ${obs.storage_capacity}`;
+   }
+
+   // Inventory (total collected)
+   if (inventoryText) {
+     inventoryText.textContent = obs.inventory_count ?? 0;
+   }
+
+   // Mode updates
+   const mode = obs.robot_mode || "normal";
+   if (mode !== lastMode) {
+     addLog(`Robot mode changed to: ${mode.toUpperCase()}`, "sys");
+     lastMode = mode;
+   }
+   modeLabel.textContent = mode.toUpperCase();
+
+   modePill.classList.remove("normal", "recharging", "unloading");
+   modePill.classList.add(mode);
+
+   if (robotEntity) {
+     robotEntity.classList.remove("recharging", "unloading");
+     if (mode !== "normal") robotEntity.classList.add(mode);
+   }
+
+   if (reward !== undefined) {
+     totalReward += reward;
+     rewardHistory.push(totalReward);
+     if (rewardHistory.length > 80) rewardHistory.shift();
+     scoreText.textContent = totalReward.toFixed(2);
+     episodeScoreChip.textContent = `Score ${totalReward.toFixed(2)}`;
+     drawChart();
+   }
+
+   stepCounter.textContent = `Step ${stepCount}`;
+ }
+
+ // ── Policy badge ──────────────────────────────────────────
+ const POLICY_STYLES = {
+   llm: { color:"#3b9eff", border:"rgba(59,158,255,.6)" },
+   bfs: { color:"#2dd4bf", border:"rgba(45,212,191,.6)" },
+   q_table: { color:"#fbbf24", border:"rgba(251,191,36,.6)" },
+   sys: { color:"#7ea8d8", border:"rgba(126,168,216,.3)" },
+ };
+ function showPolicy(source, action) {
+   const s = POLICY_STYLES[source] || POLICY_STYLES.sys;
+   policyLabel.textContent = `${source.replace("_","-").toUpperCase()} → ${action}`;
+   policyBadge.style.borderColor = s.border;
+   policyBadge.style.color = s.color;
+   policyBadge.classList.add("active");
+ }
+
335
+ // ── BFS fallback ──────────────────────────────────────────
336
+ function bfsMove(rPos, target, obstacles, W, H) {
337
+ if (rPos[0]===target[0] && rPos[1]===target[1]) return "COLLECT";
338
+ const obs = new Set(obstacles.map(([x,y]) => `${x},${y}`));
339
+ const dirs = [["RIGHT",1,0],["LEFT",-1,0],["UP",0,1],["DOWN",0,-1]];
340
+ const q = [{pos:[...rPos], first:null}];
341
+ const vis = new Set([`${rPos[0]},${rPos[1]}`]);
342
+
343
+ while (q.length) {
344
+ const {pos, first} = q.shift();
345
+ for (const [name, dx, dy] of dirs) {
346
+ const nx = pos[0]+dx, ny = pos[1]+dy;
347
+ if (nx<0||nx>=W||ny<0||ny>=H) continue;
348
+ const key = `${nx},${ny}`;
349
+ if (obs.has(key)||vis.has(key)) continue;
350
+ const move = first||name;
351
+ if (nx===target[0]&&ny===target[1]) return move;
352
+ vis.add(key); q.push({pos:[nx,ny], first:move});
353
+ }
354
+ }
355
+ return null;
356
+ }
357
+
+ function nnOrder(start, targets, obstacles, W, H) {
+   // Built once per call instead of once per dist() call.
+   const obs = new Set(obstacles.map(([x,y]) => `${x},${y}`));
+   const dirs = [[1,0],[-1,0],[0,1],[0,-1]];
+   // Shortest path length from a to b (4-neighbour BFS); Infinity if unreachable.
+   function dist(a, b) {
+     if (a[0]===b[0] && a[1]===b[1]) return 0;
+     const q = [{pos:[...a], d:0}];
+     const vis = new Set([`${a[0]},${a[1]}`]);
+     while (q.length) {
+       const {pos, d} = q.shift();
+       for (const [dx, dy] of dirs) {
+         const nx = pos[0]+dx, ny = pos[1]+dy;
+         if (nx<0||nx>=W||ny<0||ny>=H) continue;
+         const k = `${nx},${ny}`;
+         if (obs.has(k)||vis.has(k)) continue;
+         if (nx===b[0] && ny===b[1]) return d+1;
+         vis.add(k); q.push({pos:[nx,ny], d:d+1});
+       }
+     }
+     return Infinity;
+   }
+   // Greedy nearest-neighbour ordering of the remaining targets.
+   let rem = [...targets], cur = [...start], ord = [];
+   while (rem.length) {
+     let best = rem[0], bD = dist(cur, best);
+     for (const t of rem) { const d = dist(cur, t); if (d < bD) { bD = d; best = t; } }
+     ord.push(best);
+     rem = rem.filter(t => !(t[0]===best[0] && t[1]===best[1]));
+     cur = [...best];
+   }
+   return ord;
+ }
+
+ function localFallback(obs) {
+   if (!obs.garbage_positions.length) return "UP";
+   const r = obs.robot_position;
+   if (obs.garbage_positions.some(([x,y]) => x===r[0]&&y===r[1])) return "COLLECT";
+   const ordered = nnOrder(r, obs.garbage_positions, obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]);
+   return bfsMove(r, ordered[0], obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]) || "RIGHT";
+ }
+
+ // ── Custom garbage toggle ─────────────────────────────────
+ async function toggleGarbage(x, y) {
+   if (!currentState || autoMode) return;
+   if (currentState.obstacle_positions.some(([ox,oy]) => ox===x&&oy===y)) return;
+   if (currentState.robot_position[0]===x && currentState.robot_position[1]===y) return;
+
+   const has = currentState.garbage_positions.some(([gx,gy]) => gx===x&&gy===y);
+   const next = has
+     ? currentState.garbage_positions.filter(([gx,gy]) => !(gx===x&&gy===y))
+     : [...currentState.garbage_positions, [x, y]];
+
+   try {
+     const res = await fetch(`${API_BASE}/configure`, {
+       method: "POST", headers:{"Content-Type":"application/json"},
+       body: JSON.stringify({task_id: taskSelect.value, garbage_positions: next})
+     });
+     // Report HTTP errors instead of rendering a missing observation.
+     if (!res.ok) throw new Error(`HTTP ${res.status}`);
+     const data = await res.json();
+     currentState = data.observation;
+     renderGrid(currentState);
+     addLog(`Garbage ${has?"removed":"placed"} at (${x},${y}) · ${next.length} remaining`, "sys");
+   } catch (e) { addLog(`Config error: ${e.message}`, "sys"); }
+ }
+
+ // ── Reset ─────────────────────────────────────────────────
+ async function resetEnv() {
+   if (autoMode) toggleAutoMode();
+   stepCount=0; totalReward=0; rewardHistory=[];
+   scoreText.textContent = "0.00";
+   episodeScoreChip.textContent = "Score 0.00";
+   stepCounter.textContent = "Step 0";
+   policyLabel.textContent = "–";
+   drawChart();
+
+   try {
+     const res = await fetch(`${API_BASE}/reset`, {
+       method:"POST", headers:{"Content-Type":"application/json"},
+       body: JSON.stringify({task_id: taskSelect.value})
+     });
+     if (!res.ok) throw new Error(`HTTP ${res.status}`);
+     const data = await res.json();
+     currentState = data.observation;
+     maxBattery = currentState.battery_level;
+     logFeed.innerHTML = "";
+     renderGrid(currentState, true);
+     updateTelemetry(currentState);
+     statusDot.className = "pulse-dot online";
+     statusLabel.textContent = "Connected";
+   } catch (e) {
+     statusDot.className = "pulse-dot";
+     statusLabel.textContent = "Offline";
+     addLog(`Cannot reach server (${e.message}). Is app.py running?`, "sys");
+   }
+ }
+
+ // ── Single step ───────────────────────────────────────────
+ // Prevents overlapping steps: setInterval keeps firing even while a
+ // previous step is still waiting on /policy or /step.
+ let stepInFlight = false;
+ async function stepEnv() {
+   if (!currentState || stepInFlight) return;
+   stepInFlight = true;
+   stepCount++;
+   try {
+     // 1. Policy endpoint (LLM / Q-table on server)
+     let action = null, source = "bfs";
+     try {
+       const pr = await fetch(`${API_BASE}/policy`, {
+         method:"POST", headers:{"Content-Type":"application/json"},
+         body: JSON.stringify({message: currentState.message})
+       });
+       if (pr.ok) { const pd = await pr.json(); action=pd.action; source=pd.source||"llm"; }
+     } catch (_) {}
+
+     // 2. Local BFS fallback
+     if (!action) { action = localFallback(currentState); source = "bfs"; }
+
+     showPolicy(source, action);
+
+     // 3. Execute
+     try {
+       const res = await fetch(`${API_BASE}/step`, {
+         method:"POST", headers:{"Content-Type":"application/json"},
+         body: JSON.stringify({command: action})
+       });
+       if (!res.ok) throw new Error(`HTTP ${res.status}`);
+       const data = await res.json();
+
+       const wasCollect = action === "COLLECT";
+       currentState = data.observation;
+       renderGrid(currentState);
+       updateTelemetry(currentState, data.reward, data.done);
+
+       // Collect animation + particles
+       if (wasCollect && robotEntity) {
+         robotEntity.classList.add("collecting");
+         setTimeout(() => robotEntity.classList.remove("collecting"), 440);
+         const cx = parseInt(robotEntity.style.left) + CELL/2;
+         const cy = parseInt(robotEntity.style.top) + CELL/2;
+         spawnParticles(cx, cy);
+       }
+
+       const sign = data.reward >= 0 ? "+" : "";
+       addLog(`${action} · ${sign}${data.reward.toFixed(2)}`, source);
+
+       if (data.done) {
+         addLog(`🏁 Episode complete · total ${totalReward.toFixed(2)}`, "sys");
+         if (autoMode) toggleAutoMode();
+       }
+     } catch (e) {
+       addLog(`Step error: ${e.message}`, "sys");
+       if (autoMode) toggleAutoMode();
+     }
+   } finally {
+     stepInFlight = false;
+   }
+ }
+
+ // ── Auto mode ─────────────────────────────────────────────
+ function toggleAutoMode() {
+   autoMode = !autoMode;
+   if (autoMode) {
+     autoBtn.textContent = "⏹ Stop";
+     autoBtn.className = "btn stop";
+     autoTimer = setInterval(stepEnv, stepDelay);
+   } else {
+     autoBtn.textContent = "▶ Run Policy";
+     autoBtn.className = "btn primary";
+     clearInterval(autoTimer);
+   }
+ }
+
+ // ── Event listeners ───────────────────────────────────────
+ resetBtn.addEventListener("click", resetEnv);
+ autoBtn.addEventListener("click", toggleAutoMode);
+ manualBtn.addEventListener("click", stepEnv);
+ taskSelect.addEventListener("change", resetEnv);
+
+ // ── Boot ──────────────────────────────────────────────────
+ resetEnv();
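`nnOrder` above starts a fresh BFS for every (current position, candidate target) pair while it picks each next stop. A minimal alternative, sketched here under the same conventions as `bfsMove` and `nnOrder` (positions are `[x, y]` arrays, obstacles are `[x, y]` pairs, 4-neighbour moves on a `W × H` grid), runs one BFS per stop and reads every candidate's distance from the resulting map. `bfsDistances` and `nearestNeighbourOrder` are hypothetical names, not functions in this repository; the greedy choice and tie-breaking (the earliest target wins on equal distance) are meant to mirror `nnOrder`.

```javascript
// Hypothetical helpers, not part of app.js. Same conventions as bfsMove/nnOrder:
// positions are [x, y] arrays, obstacles is an array of [x, y], grid is W x H,
// and each step moves one cell in one of four directions.

// Breadth-first search from `start`; returns a Map "x,y" -> step count.
// Cells absent from the map are unreachable (walled off).
function bfsDistances(start, obstacles, W, H) {
  const blocked = new Set(obstacles.map(([x, y]) => `${x},${y}`));
  const dist = new Map([[`${start[0]},${start[1]}`, 0]]);
  const queue = [start];
  // Index-based queue: avoids the O(n) cost of Array.prototype.shift().
  for (let head = 0; head < queue.length; head++) {
    const [x, y] = queue[head];
    const d = dist.get(`${x},${y}`);
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = x + dx, ny = y + dy;
      if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
      const key = `${nx},${ny}`;
      if (blocked.has(key) || dist.has(key)) continue;
      dist.set(key, d + 1);
      queue.push([nx, ny]);
    }
  }
  return dist;
}

// Greedy nearest-neighbour visiting order: one BFS per stop, then pick the
// closest remaining target (the first one wins ties, as in nnOrder).
function nearestNeighbourOrder(start, targets, obstacles, W, H) {
  let remaining = targets.map(t => [...t]);
  let current = [...start];
  const order = [];
  while (remaining.length) {
    const dist = bfsDistances(current, obstacles, W, H);
    const distTo = t => dist.get(`${t[0]},${t[1]}`) ?? Infinity;
    let best = remaining[0];
    for (const t of remaining) {
      if (distTo(t) < distTo(best)) best = t;
    }
    order.push(best);
    // Drop every copy of the chosen cell, matching nnOrder's filter.
    remaining = remaining.filter(t => !(t[0] === best[0] && t[1] === best[1]));
    current = best;
  }
  return order;
}

// Example: 5 x 5 grid, no obstacles.
// nearestNeighbourOrder([0, 0], [[4, 4], [1, 0], [0, 2]], [], 5, 5)
//   returns [[1, 0], [0, 2], [4, 4]]
```

This keeps the same greedy ordering but does one BFS per stop instead of one per remaining target, which matters mostly as grids and garbage counts grow. Like `nnOrder`, it is still a greedy heuristic rather than an optimal tour.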
frontend/style.css ADDED
@@ -0,0 +1,634 @@
+ /* ═══════════════════════════════════════════════════════
+    GarbageBot — Bright Light Mode Dashboard CSS
+    ═══════════════════════════════════════════════════════ */
+
+ /* ── Design tokens ────────────────────────────────────── */
+ :root {
+   /* Background — bright soft white/cyan */
+   --bg: #f1f5f9;
+   --surface: rgba(255, 255, 255, 0);
+   --surface-2: rgba(255, 255, 255, 0);
+   --surface-hi: rgba(255, 255, 255, 1);
+
+   --border: rgba(0, 0, 0, 0.08);
+   --border-hi: rgba(0, 0, 0, 0.16);
+   --border-glow: rgba(59, 130, 246, 0.3);
+
+   --text: #1e293b;
+   --text-muted: #64748b;
+   --text-dim: #94a3b8;
+
+   /* Vibrant accents (slightly darker for contrast on white) */
+   --blue: #3b82f6;
+   --blue-glow: rgba(59, 130, 246, 0.45);
+   --indigo: #6366f1;
+   --indigo-glow: rgba(99, 102, 241, 0.45);
+   --teal: #14b8a6;
+   --teal-glow: rgba(20, 184, 166, 0.4);
+   --purple: #a855f7;
+   --purple-glow: rgba(168, 85, 247, 0.45);
+   --success: #10b981;
+   --success-glow: rgba(16, 185, 129, 0.4);
+   --warning: #f59e0b;
+   --warning-glow: rgba(245, 158, 11, 0.4);
+   --danger: #ef4444;
+   --danger-glow: rgba(239, 68, 68, 0.5);
+   --neon: #0ea5e9;
+   --neon-glow: rgba(14, 165, 233, 0.35);
+
+   /* World */
+   --floor: #ffffff;
+   --floor-light: #f8fafc;
+   --wall: #e2e8f0;
+   --wall-top: #f1f5f9;
+   --wall-shadow: rgba(15, 23, 42, 0.15);
+
+   --radius: 16px;
+   --radius-sm: 10px;
+   --radius-xs: 6px;
+
+   --cell: 52px;
+   --gap: 3px;
+   --pad: 10px;
+
+   --font: 'Inter', sans-serif;
+   --mono: 'JetBrains Mono', monospace;
+ }
+
+ /* ── Reset ─────────────────────────────────────────────── */
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+ html { font-size: 15px; }
+ body {
+   font-family: var(--font);
+   background: var(--bg);
+   color: var(--text);
+   min-height: 100vh;
+   overflow-x: hidden;
+ }
+
+ /* ── Animated background orbs ───────────────────────────── */
+ .bg-orbs {
+   position: fixed; inset: 0;
+   pointer-events: none; z-index: 0; overflow: hidden;
+ }
+ .orb {
+   position: absolute; border-radius: 50%;
+   filter: blur(100px);
+   animation: drift 28s ease-in-out infinite alternate;
+ }
+ /* Low opacities: on the light background the orbs read as pale colour washes */
+ .orb-1 { width:700px;height:700px; background:radial-gradient(circle,var(--blue) 0%,transparent 70%); opacity:.15; left:-200px;top:-150px; animation-delay:0s; }
+ .orb-2 { width:600px;height:600px; background:radial-gradient(circle,var(--purple) 0%,transparent 70%); opacity:.15; right:-180px;bottom:-100px; animation-delay:-9s; }
+ .orb-3 { width:500px;height:500px; background:radial-gradient(circle,var(--teal) 0%,transparent 70%); opacity:.1; left:42%;top:25%; animation-delay:-18s; }
+ .orb-4 { width:350px;height:350px; background:radial-gradient(circle,var(--warning) 0%,transparent 70%); opacity:.1; right:30%;top:-80px; animation-delay:-5s; }
+
+ @keyframes drift {
+   from { transform: translate(0, 0) scale(1); }
+   to   { transform: translate(50px, 35px) scale(1.12); }
+ }
+
+ /* Dot-grid background */
+ .grid-bg {
+   position: fixed; inset: 0; z-index: 0; pointer-events: none;
+   background-image: radial-gradient(circle, rgba(15,23,42,.06) 1px, transparent 1px);
+   background-size: 28px 28px;
+ }
+
+ /* ── Layout ─────────────────────────────────────────────── */
+ .dashboard {
+   position: relative; z-index: 1;
+   max-width: 1360px;
+   margin: 0 auto;
+   padding: 1.25rem 1.5rem;
+   display: flex; flex-direction: column; gap: 1.1rem;
+   min-height: 100vh;
+ }
+
+ /* ── Glass panel ─────────────────────────────────────────── */
+ .panel {
+   background: var(--surface);
+   border: 1px solid var(--border);
+   border-radius: var(--radius);
+ }
+ .glass {
+   backdrop-filter: blur(24px) saturate(180%);
+   -webkit-backdrop-filter: blur(24px) saturate(180%);
+   box-shadow: 0 4px 48px rgba(15, 23, 42, 0.06), inset 0 1px 0 rgba(255,255,255,1);
+ }
+
+ /* ── HEADER ─────────────────────────────────────────────── */
+ .header-bar {
+   display: flex; align-items: center; gap: 1.2rem;
+   padding: .9rem 1.4rem;
+   flex-wrap: wrap;
+   background: linear-gradient(135deg, rgba(255,255,255,0.92) 0%, rgba(248,250,252,0.88) 100%);
+   border: 1px solid var(--border);
+   border-top: 1px solid rgba(255,255,255,1);
+   border-radius: var(--radius);
+   backdrop-filter: blur(28px);
+   box-shadow: 0 2px 36px rgba(15,23,42,0.05), 0 0 60px rgba(59,130,246,0.06), inset 0 1px 0 rgba(255,255,255,1);
+ }
+
+ .brand { display: flex; align-items: center; gap: .9rem; }
+ .brand-icon {
+   font-size: 2rem;
+   filter: drop-shadow(0 0 12px rgba(99,102,241,.4));
+   animation: bob 3s ease-in-out infinite;
+ }
+ @keyframes bob {
+   0%,100%{ transform:translateY(0) rotate(-2deg); }
+   50%    { transform:translateY(-5px) rotate(2deg); }
+ }
+ h1 {
+   font-size: 1.4rem; font-weight: 800;
+   color: var(--text);
+   background: linear-gradient(135deg, var(--blue) 0%, var(--purple) 100%);
+   -webkit-background-clip: text; background-clip: text;
+   -webkit-text-fill-color: transparent;
+   letter-spacing: -.04em;
+ }
+ .version-tag {
+   font-size: .62rem; font-weight: 700;
+   background: rgba(99,102,241,.1); color: var(--indigo);
+   padding: 2px 7px; border-radius: 20px;
+   border: 1px solid rgba(99,102,241,.25);
+   vertical-align: middle; margin-left: 6px;
+   -webkit-text-fill-color: initial;
+ }
+ .brand-sub { font-size: .72rem; color: var(--text-muted); font-family: var(--mono); margin-top: 3px; }
+
+ /* Status strip */
+ .status-strip { display:flex; gap:.7rem; align-items:center; margin-left:auto; }
+ .status-pill {
+   display:flex; align-items:center; gap:.45rem;
+   font-size:.8rem; font-family:var(--mono);
+   padding:.35rem .9rem;
+   border-radius:20px;
+   background:rgba(255,255,255,.6);
+   border:1px solid var(--border);
+   box-shadow: 0 2px 4px rgba(0,0,0,0.02);
+ }
+ .pulse-dot {
+   width:9px;height:9px;border-radius:50%;
+   background:var(--danger);
+   box-shadow:0 0 6px var(--danger-glow);
+   transition:all .4s;
+ }
+ .pulse-dot.online {
+   background: var(--success);
+   box-shadow: 0 0 6px var(--success-glow);
+   animation: blink 2.2s ease-in-out infinite;
+ }
+ @keyframes blink { 0%,100%{opacity:1} 50%{opacity:.4} }
+
+ .policy-badge {
+   display:flex; align-items:center; gap:.4rem;
+   font-size:.8rem; font-family:var(--mono);
+   padding:.35rem .9rem;
+   border-radius:20px;
+   background: linear-gradient(135deg,rgba(59,130,246,.08),rgba(168,85,247,.08));
+   border:1px solid rgba(59,130,246,.25);
+   transition:all .4s;
+   color: var(--text);
+ }
+ .policy-badge.active { border-color:rgba(59,130,246,.5); box-shadow:0 0 12px rgba(59,130,246,.1); }
+ .badge-icon { font-size:.9rem; }
+
+ #mode-pill {
+   background: rgba(255, 255, 255, 0.8);
+   color: var(--text-muted);
+   font-weight: 700;
+   transition: all 0.3s ease;
+ }
+ #mode-pill.recharging { background: var(--blue); color: #fff; box-shadow: 0 0 12px var(--blue-glow); }
+ #mode-pill.unloading  { background: var(--warning); color: #fff; box-shadow: 0 0 12px var(--warning-glow); }
+ #mode-pill.normal     { background: rgba(0, 0, 0, 0.05); color: var(--text-muted); }
+
+ /* Controls */
+ .controls { display:flex; gap:.65rem; align-items:center; flex-wrap:wrap; }
+
+ select {
+   background:rgba(255,255,255,.6);
+   color:var(--text);
+   border:1px solid var(--border);
+   padding:.52rem .9rem;
+   border-radius:var(--radius-sm);
+   font-size:.85rem; font-family:var(--font);
+   outline:none; cursor:pointer;
+   transition:border-color .2s,box-shadow .2s;
+   box-shadow: 0 1px 3px rgba(0,0,0,.02);
+ }
+ select:hover,select:focus { border-color:var(--blue); box-shadow:0 0 0 3px rgba(59,130,246,.15); }
+
+ /* Speed control */
+ .speed-group {
+   display:flex; align-items:center; gap:.5rem;
+   padding:.4rem .85rem;
+   background:rgba(255,255,255,.6);
+   border:1px solid var(--border);
+   border-radius:var(--radius-sm);
+ }
+ .speed-label { font-size:.75rem; color:var(--text-muted); }
+ #speed-slider {
+   -webkit-appearance:none; appearance:none;
+   width:85px; height:4px;
+   background:linear-gradient(90deg, var(--blue) 0%, rgba(0,0,0,.08) 0%);
+   border-radius:2px; outline:none; cursor:pointer;
+   transition:background .1s;
+ }
+ #speed-slider::-webkit-slider-thumb {
+   -webkit-appearance:none;
+   width:16px;height:16px;border-radius:50%;
+   background:var(--blue);
+   box-shadow:0 0 6px var(--blue-glow);
+   cursor:pointer;
+   transition:transform .15s;
+ }
+ #speed-slider::-webkit-slider-thumb:active { transform:scale(1.25); }
+ #speed-val { font-size:.75rem;color:var(--text-muted);font-family:var(--mono);min-width:40px;text-align:right; }
+
+ /* Buttons */
+ .btn {
+   padding:.55rem 1.15rem;
+   border-radius:var(--radius-sm);
+   font-size:.875rem; font-weight:700;
+   cursor:pointer; border:none;
+   transition:all .18s ease;
+   position:relative; overflow:hidden;
+   white-space:nowrap; letter-spacing:.01em;
+ }
+ .btn::after {
+   content:''; position:absolute; inset:0;
+   background:rgba(0,0,0,0); transition:background .18s;
+ }
+ .btn:hover::after { background:rgba(0,0,0,.03); }
+ .btn:active { transform:scale(.96); }
+
+ .btn.primary {
+   background:linear-gradient(135deg, var(--blue), var(--indigo));
+   color:#fff;
+   box-shadow:0 4px 18px var(--blue-glow);
+ }
+ .btn.primary:hover { box-shadow:0 6px 24px var(--blue-glow); transform:translateY(-1px); }
+
+ .btn.secondary {
+   background:rgba(255,255,255,1);
+   color:var(--text); border:1px solid var(--border);
+   box-shadow: 0 1px 3px rgba(0,0,0,0.05);
+ }
+ .btn.secondary:hover { border-color:var(--border-hi); }
+
+ .btn.outline {
+   background:transparent; color:var(--text);
+   border:1px solid var(--border);
+ }
+ .btn.outline:hover { border-color:var(--border-hi); background:rgba(0,0,0,.02); }
+
+ .btn.stop {
+   background:linear-gradient(135deg, var(--danger), #ef4444);
+   color:#fff;
+   box-shadow:0 4px 18px var(--danger-glow);
+ }
+ .btn.stop:hover { box-shadow:0 6px 24px var(--danger-glow); transform:translateY(-1px); }
+
+ /* ── MAIN LAYOUT ────────────────────────────────────────── */
+ main {
+   display:grid;
+   grid-template-columns:1fr 310px;
+   gap:1.1rem;
+   flex:1;
+ }
+
+ /* ── GRID PANEL ─────────────────────────────────────────── */
+ .grid-panel {
+   display:flex; flex-direction:column;
+   padding:1.25rem; gap:.75rem;
+ }
+ .grid-header {
+   display:flex; align-items:center; justify-content:space-between;
+ }
+ .grid-title {
+   font-size:.75rem;font-weight:700;
+   text-transform:uppercase;letter-spacing:.1em;color:var(--text-muted);
+ }
+ .grid-meta { display:flex; gap:.5rem; }
+
+ .mono-chip {
+   font-family:var(--mono); font-size:.75rem;
+   padding:.22rem .65rem; border-radius:20px;
+   background:rgba(0,0,0,.03);
+   border:1px solid var(--border); color:var(--text-muted);
+ }
+ .accent-chip {
+   color:var(--blue); border-color:rgba(59,130,246,.25);
+   background:rgba(59,130,246,.08);
+ }
+
+ .grid-stage {
+   flex:1; display:flex; align-items:center; justify-content:center;
+   position:relative;
+ }
+
+ /* ─── The World ──────────────────────────────────────────── */
+ .grid-world {
+   display:grid;
+   gap:0; /* no cell gap — seamless floor */
+   position:relative;
+   border-radius:var(--radius);
+   overflow:hidden;
+   background: var(--floor);
+   border:1px solid rgba(0,0,0,.06);
+   box-shadow:
+     0 8px 30px rgba(15,23,42,.06),
+     inset 0 0 20px rgba(0,0,0,.02);
+
+   /* Continuous floor: subtle tile lines */
+   background-image:
+     linear-gradient(rgba(0,0,0,.03) 1px, transparent 1px),
+     linear-gradient(90deg, rgba(0,0,0,.03) 1px, transparent 1px);
+   background-size:
+     var(--cell) var(--cell),
+     var(--cell) var(--cell);
+ }
+
+ /* Transparent click-target cells — world feels seamless */
+ .cell {
+   width:var(--cell); height:var(--cell);
+   background:transparent;
+   position:relative; z-index:2;
+   cursor:pointer;
+   transition:background .18s;
+ }
+ .cell:hover { background:rgba(0,0,0,.025); }
+
+ /* ── OBSTACLES — 3D walls ────────────────────────────────── */
+ .world-obstacle {
+   position:absolute; z-index:10;
+   width:var(--cell); height:var(--cell);
+   border-radius:4px;
+   background:linear-gradient(160deg, var(--wall-top) 0%, var(--wall) 45%, #94a3b8 100%);
+   border:1px solid rgba(0,0,0,.15);
+   border-top-color:rgba(255,255,255,.8);
+   border-left-color:rgba(255,255,255,.4);
+   box-shadow:
+     inset -1px 0 0 rgba(0,0,0,.08),
+     0 6px 0 0 #94a3b8,              /* 3D depth */
+     0 8px 12px rgba(15,23,42,.25);  /* floor shadow */
+   overflow:hidden;
+ }
+ .world-obstacle::before {
+   content:''; position:absolute; inset:0;
+   background:linear-gradient(180deg,rgba(255,255,255,.2) 0%,transparent 40%);
+ }
+ /* stone texture lines */
+ .world-obstacle::after {
+   content:''; position:absolute; inset:0;
+   background:repeating-linear-gradient(
+       0deg, transparent, transparent 14px,
+       rgba(0,0,0,.03) 14px, rgba(0,0,0,.03) 15px
+     ),
+     repeating-linear-gradient(
+       90deg, transparent, transparent 14px,
+       rgba(0,0,0,.02) 14px, rgba(0,0,0,.02) 15px
+     );
+ }
+
+ /* ── GARBAGE — glowing litter ────────────────────────────── */
+ .world-garbage {
+   position:absolute; z-index:8;
+   width:var(--cell); height:var(--cell);
+   display:flex; align-items:center; justify-content:center;
+   font-size:1.4rem;
+   cursor:pointer;
+   border-radius:var(--radius-xs);
+   background:rgba(168,85,247,.1);
+   border:1px solid rgba(168,85,247,.3);
+   animation: garbo-spawn .35s cubic-bezier(.34,1.56,.64,1) backwards; /* not "both": a held end keyframe would override the :hover scale */
+   transition:transform .15s;
+ }
+ .world-garbage:hover { transform:scale(1.12); }
+ @keyframes garbo-spawn {
+   from { transform:scale(0) rotate(-25deg); opacity:0; }
+   to   { transform:scale(1) rotate(0); opacity:1; }
+ }
+ /* glow ring */
+ .world-garbage::before {
+   content:''; position:absolute; inset:-3px; border-radius:inherit;
+   border:1.5px solid rgba(168,85,247,.4);
+   animation:garbo-ring 2.4s ease-out infinite;
+ }
+ @keyframes garbo-ring {
+   0%   { opacity:.6; transform:scale(1); }
+   70%  { opacity:0;  transform:scale(1.45); }
+   100% { opacity:0;  transform:scale(1.45); }
+ }
+ /* bounce */
+ .world-garbage span { display:block; animation:garbo-bob 2.2s ease-in-out infinite; filter: drop-shadow(0 4px 6px rgba(168,85,247,.2)); }
+ @keyframes garbo-bob {
+   0%,100%{ transform:translateY(0); }
+   50%    { transform:translateY(-4px); }
+ }
+
+ /* ── STATIONS ───────────────────────────────────────────── */
+ .world-home, .world-unload {
+   position: absolute;
+   width: var(--cell); height: var(--cell);
+   display: flex; align-items: center; justify-content: center;
+   font-size: 1.2rem;
+   border-radius: 4px;
+   z-index: 5;
+   pointer-events: none;
+ }
+ .world-home {
+   background: rgba(59, 130, 246, 0.08);
+   border: 2px dashed rgba(59, 130, 246, 0.3);
+ }
+ .world-home::before {
+   content: '⚡'; filter: drop-shadow(0 0 8px var(--blue));
+ }
+
+ .world-unload {
+   background: rgba(245, 158, 11, 0.08);
+   border: 2px dashed rgba(245, 158, 11, 0.3);
+ }
+ .world-unload::before {
+   content: '📦'; filter: drop-shadow(0 0 8px var(--warning));
+ }
+
+ /* ── ROBOT ───────────────────────────────────────────────── */
+ .robot-entity {
+   position:absolute; z-index:30;
+   width:var(--cell); height:var(--cell);
+   display:flex; align-items:center; justify-content:center;
+   font-size:1.3rem;
+   border-radius:var(--radius-sm);
+   /* vivid gradient body */
+   background:linear-gradient(145deg, #60a5fa, #818cf8 50%, #6366f1);
+   border:1px solid rgba(255,255,255,.6);
+   border-top-color:rgba(255,255,255,.9);
+   /* multi-layer shadow */
+   box-shadow:
+     0 0 0 1px rgba(59,130,246,.2),
+     0 4px 12px rgba(59,130,246,.3),
+     0 8px 24px rgba(15,23,42,.2);
+
+   transition:
+     left var(--move-dur,350ms) linear,
+     top  var(--move-dur,350ms) linear,
+     background 0.4s ease,
+     box-shadow 0.4s ease;
+   will-change: left, top;
+ }
+
+ .robot-entity.recharging {
+   background: linear-gradient(145deg, #3b82f6, #60a5fa);
+   box-shadow: 0 0 20px var(--blue-glow);
+ }
+ .robot-entity.unloading {
+   background: linear-gradient(145deg, #f59e0b, #fbbf24);
+   box-shadow: 0 0 20px var(--warning-glow);
+ }
+ /* top highlight */
+ .robot-entity::before {
+   content:''; position:absolute; top:2px; left:3px; right:3px; height:40%;
+   background:linear-gradient(180deg,rgba(255,255,255,.5) 0%,transparent 100%);
+   border-radius: 4px 4px 50% 50%;
+ }
+ /* ground shadow */
+ .robot-entity::after {
+   content:''; position:absolute; bottom:-12px; left:50%;
+   transform:translateX(-50%);
+   width:80%; height:10px;
+   background:rgba(15,23,42,.25);
+   border-radius:50%; filter:blur(4px);
+ }
+
+ /* Collect burst */
+ .robot-entity.collecting {
+   animation:robot-collect .42s ease-out;
+ }
+ @keyframes robot-collect {
+   0%   { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
+   35%  { box-shadow:0 0 0 2px var(--purple),0 12px 35px var(--purple-glow),0 20px 45px rgba(15,23,42,.15); }
+   100% { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
+ }
+
+ /* ── Trail ghost ────────────────────────────────────────── */
+ .trail-ghost {
+   position:absolute; z-index:6;
+   width:var(--cell); height:var(--cell);
+   border-radius:var(--radius-sm);
+   background:rgba(59,130,246,.08);
+   border:1px solid rgba(59,130,246,.15);
+   pointer-events:none;
+   animation:trail-fade 1.1s ease-out forwards;
+ }
+ @keyframes trail-fade {
+   from { opacity:1; transform:scale(1); }
+   to   { opacity:0; transform:scale(.82); }
+ }
+
+ /* ── Particle burst ─────────────────────────────────────── */
+ .particle-layer { position:absolute; inset:0; pointer-events:none; z-index:40; }
+ .particle {
+   position:absolute;
+   border-radius:50%; pointer-events:none;
+   animation:pfx .75s ease-out forwards;
+ }
+ @keyframes pfx {
+   0%   { transform:translate(0,0) scale(1); opacity:1; }
+   100% { transform:var(--tx) scale(0); opacity:0; }
+ }
+
+ /* ── Grid hint ──────────────────────────────────────────── */
+ .grid-hint { font-size:.72rem; color:var(--text-dim); text-align:center; }
+
+ /* ── SIDE PANEL ─────────────────────────────────────────── */
+ .side-panel { display:flex; flex-direction:column; gap:1.1rem; }
+
+ .section-title {
+   font-size:.72rem; font-weight:700;
+   text-transform:uppercase; letter-spacing:.12em; color:var(--text-muted);
+   margin-bottom:.9rem;
+ }
+
+ /* Telemetry card */
+ .tele-card { padding:1.2rem; }
+ .stat-row { display:flex; align-items:center; gap:.75rem; margin-bottom:.95rem; }
+ .stat-icon { font-size:1.3rem; flex-shrink:0; width:34px; text-align:center; filter:drop-shadow(0 2px 4px rgba(0,0,0,.1)); }
+ .stat-body { flex:1; }
+ .stat-label-row { display:flex; justify-content:space-between; align-items:baseline; margin-bottom:.35rem; }
+ .stat-label { font-size:.78rem; color:var(--text-muted); }
+ .stat-num { font-family:var(--mono); font-size:.85rem; font-weight:600; color:var(--text); }
+ .stat-num.big-num { font-size:1.55rem; font-weight:800; }
+ .stat-num.accent { color:var(--blue); }
+
+ .progress-track {
+   height:7px; background:rgba(0,0,0,.06);
+   border-radius:4px; overflow:hidden;
+ }
+ .progress-fill {
+   height:100%; border-radius:4px;
+   background:var(--success);
+   transition:width .45s ease, background .45s ease;
+   position:relative; overflow:hidden;
+ }
+ .progress-fill::after {
+   content:''; position:absolute; inset:0;
+   background:linear-gradient(90deg,transparent,rgba(255,255,255,.4),transparent);
+   animation:shimmer 1.8s linear infinite;
+ }
+ @keyframes shimmer { from{transform:translateX(-100%)} to{transform:translateX(100%)} }
+
+ /* Mini chart */
+ .chart-wrap { margin-top:.5rem; }
+ #reward-chart { width:100%; height:68px; border-radius:var(--radius-xs); }
+
+ /* Log card */
+ .log-card { padding:1.2rem; display:flex; flex-direction:column; flex:1; min-height:0; }
+ .log-header { display:flex; justify-content:space-between; align-items:center; margin-bottom:.7rem; }
+ .log-header .section-title { margin-bottom:0; }
+ .clear-btn { font-size:.7rem; color:var(--text-muted); background:none; border:none; cursor:pointer; transition:color .2s; }
+ .clear-btn:hover { color:var(--danger); }
+
+ .log-feed {
+   flex:1; display:flex; flex-direction:column; gap:.45rem;
+   overflow-y:auto; max-height:280px; padding-right:.2rem;
+ }
+ .placeholder { font-size:.8rem; color:var(--text-dim); text-align:center; padding:1rem; }
+
+ .log-entry {
+   display:flex; gap:.6rem; align-items:flex-start;
+   font-size:.77rem; line-height:1.45;
+   padding:.55rem .7rem;
+   border-radius:var(--radius-sm);
+   background:rgba(255,255,255,1);
+   border:1px solid rgba(0,0,0,.04);
+   box-shadow: 0 1px 3px rgba(15,23,42,.03);
+   color: var(--text);
+   animation:slide-in .22s cubic-bezier(.22,1,.36,1);
+   transition:border-color .2s, box-shadow .2s;
+ }
+ .log-entry:hover { border-color:var(--border-hi); box-shadow: 0 2px 6px rgba(15,23,42,.06); }
+ @keyframes slide-in { from{opacity:0;transform:translateY(5px)} to{opacity:1;transform:translateY(0)} }
+
+ .log-badge {
+   font-family:var(--mono); font-size:.64rem; font-weight:700;
+   padding:2px 6px; border-radius:4px; flex-shrink:0; margin-top:2px;
+ }
+ .log-badge.llm     { background:rgba(59,130,246,.12); color:var(--blue); }
+ .log-badge.bfs     { background:rgba(20,184,166,.12); color:var(--teal); }
+ .log-badge.q-table { background:rgba(245,158,11,.15); color:#d97706; }
+ .log-badge.sys     { background:rgba(0,0,0,.05); color:var(--text-muted); }
+
+ .log-footer {
+   font-size:.7rem; color:var(--text-dim);
+   text-align:center; margin-top:.6rem; padding-top:.55rem;
+   border-top:1px solid var(--border);
+ }
+
+ /* ── Scrollbar ──────────────────────────────────────────── */
+ ::-webkit-scrollbar { width:5px; }
+ ::-webkit-scrollbar-track { background:transparent; }
+ ::-webkit-scrollbar-thumb { background:rgba(0,0,0,.15); border-radius:3px; }
+ ::-webkit-scrollbar-thumb:hover { background:rgba(0,0,0,.25); }
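The `pfx` keyframes used by `.particle` end at `transform: var(--tx) scale(0)`, so whatever creates a particle has to set `--tx` on that element to a transform such as `translate(12px, -8px)`. `spawnParticles` itself is not part of this excerpt, so the following is only a hypothetical helper showing one way to compute evenly spaced burst offsets; the name `particleOffsets` and the circular layout are assumptions, not the project's code.

```javascript
// Hypothetical helper (not from app.js): returns one CSS transform string per
// particle, spread evenly around a circle of the given radius in pixels.
// Each string is meant to be used as the value of the --tx custom property.
function particleOffsets(count, radius) {
  const offsets = [];
  for (let i = 0; i < count; i++) {
    const angle = (2 * Math.PI * i) / count; // evenly spaced directions
    // Round to whole pixels; tiny floating-point residues (e.g. cos(pi/2)) become 0.
    const x = Math.round(Math.cos(angle) * radius);
    const y = Math.round(Math.sin(angle) * radius);
    offsets.push(`translate(${x}px, ${y}px)`);
  }
  return offsets;
}
```

In the browser, each value would be applied per particle element with `el.style.setProperty("--tx", offsets[i])` before the element is appended to `.particle-layer`; the keyframes then animate from the cell centre out to that offset while shrinking to `scale(0)`.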
garbage_robot_dataset.jsonl ADDED
@@ -0,0 +1,35 @@
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.\n\n### Response:\nUP"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.\n\n### Response:\nUP"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.\n\n### Response:\nUP"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.\n\n### Response:\nUP"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.\n\n### Response:\nCOLLECT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.\n\n### Response:\nUP"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).\n\n### Response:\nLEFT"}
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.\n\n### Response:\nUP"}
13
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.\n\n### Response:\nLEFT"}
14
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
15
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.\n\n### Response:\nRIGHT"}
16
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.\n\n### Response:\nRIGHT"}
17
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.\n\n### Response:\nRIGHT"}
18
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.\n\n### Response:\nRIGHT"}
19
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
20
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.\n\n### Response:\nLEFT"}
21
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).\n\n### Response:\nLEFT"}
22
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
23
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
24
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.\n\n### Response:\nDOWN"}
25
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.\n\n### Response:\nDOWN"}
26
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.\n\n### Response:\nDOWN"}
27
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.\n\n### Response:\nDOWN"}
28
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.\n\n### Response:\nCOLLECT"}
29
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
30
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
31
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
32
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
33
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.\n\n### Response:\nCOLLECT"}
34
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.\n\n### Response:\nRIGHT"}
35
+ {"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.\n\n### Response:\nRIGHT"}
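The rows above pack one decision each into an Alpaca-style `### Instruction` / `### Input` / `### Response` template. Positions are written as (x, y), and UP increases y, the same convention as the `dirs` table in `bfs` in inference.py. Because a single mislabeled row can teach the model to COLLECT off-target, a quick consistency check before fine-tuning is cheap insurance. The following is a minimal sketch and not part of this commit: `make_row`, `check_row` and `check_file` are hypothetical helpers, and the regular expressions assume every row uses the exact `You are at (x, y). Garbage at [...]` wording seen here.

```python
import ast
import json
import re

ALLOWED = {"UP", "DOWN", "LEFT", "RIGHT", "COLLECT"}

# Assumed status wording, e.g. "You are at (4, 4). Garbage at [(4, 4)]."
POS_RE = re.compile(r"You are at \((\d+),\s*(\d+)\)")
GARBAGE_RE = re.compile(r"Garbage at (\[.*?\])")


def make_row(status: str, response: str) -> str:
    """Build one JSONL row in the same template as garbage_robot_dataset.jsonl."""
    text = (
        "### Instruction:\nYou control a garbage collecting robot. "
        "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n"
        f"### Input:\nENVIRONMENT STATUS:\n{status}\n\n"
        f"### Response:\n{response}"
    )
    return json.dumps({"text": text})


def check_row(line: str) -> list[str]:
    """Return the problems found in one JSONL row (an empty list means it looks consistent)."""
    text = json.loads(line)["text"]
    problems = []

    # The label is whatever follows the final "### Response:" header
    response = text.split("### Response:\n", 1)[-1].strip()
    if response not in ALLOWED:
        problems.append(f"invalid response {response!r}")

    pos_match = POS_RE.search(text)
    garbage_match = GARBAGE_RE.search(text)
    if not pos_match or not garbage_match:
        problems.append("could not parse position or garbage list")
        return problems

    pos = (int(pos_match.group(1)), int(pos_match.group(2)))
    # literal_eval safely turns "[(1,1),(5,5)]" into [(1, 1), (5, 5)]
    garbage = [tuple(g) for g in ast.literal_eval(garbage_match.group(1))]

    on_garbage = pos in garbage
    if response == "COLLECT" and not on_garbage:
        problems.append(f"COLLECT at {pos}, but garbage is at {garbage}")
    if response != "COLLECT" and on_garbage:
        problems.append(f"{response} issued while standing on garbage at {pos}")
    return problems


def check_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems, skipping blank lines."""
    report = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.strip():
                problems = check_row(line)
                if problems:
                    report[lineno] = problems
    return report
```

Under those assumptions, `check_file("garbage_robot_dataset.jsonl")` should return `{}` for the 35 rows shown above; a non-empty result maps line numbers to the rows worth re-reading. The check does not validate move directions, because the obstacle layout is only partly described in the status text.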
inference.py ADDED
@@ -0,0 +1,520 @@
+ import os
+ import time
+ import requests
+ import json
+ from collections import deque
+ from openai import OpenAI
+
+ API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
+ MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
+ HF_TOKEN = os.environ.get("HF_TOKEN", "")
+ ENV_URL = os.environ.get("ENV_URL", "http://localhost:7861")
+ LOCAL_MODEL_PATH = os.environ.get(
+     "LOCAL_MODEL_PATH",
+     "TechAvenger/GarbageBot-Weights"
+ )
+
+ MAX_STEPS = 200  # raised to account for recharge/unload detours
+
+ # Lazy-loaded local model: populated in main() if transformers can load LOCAL_MODEL_PATH
+ _local_model = None
+ _local_tokenizer = None
+
+ # Q-Learning agent — loaded once in main(), used as primary policy
+ _ql_agent = None
+ try:
+     from qlearning import QLearningAgent
+ except ImportError:
+     QLearningAgent = None
+
+
+ # ──────────────────────────────────────────────────────────
+ # BFS CORE
+ # ──────────────────────────────────────────────────────────
+
+ def bfs(start, goal, obstacles, grid_w, grid_h):
+     """
+     BFS from start to goal avoiding obstacles; returns (first_direction, path_length).
+     ("COLLECT", 0) if start == goal; (None, inf) if the goal is unreachable.
+     """
+     start, goal = tuple(start), tuple(goal)
+     if start == goal:
+         return ("COLLECT", 0)
+
+     obstacle_set = frozenset(tuple(o) for o in obstacles)
+     dirs = [("RIGHT",(1,0)), ("LEFT",(-1,0)), ("UP",(0,1)), ("DOWN",(0,-1))]
+     queue = deque([(start, None, 0)])
+     visited = {start}
+
+     while queue:
+         pos, first, depth = queue.popleft()
+         for name, (dx, dy) in dirs:
+             npos = (pos[0]+dx, pos[1]+dy)
+             if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
+                 continue
+             if npos in obstacle_set or npos in visited:
+                 continue
+             move = first if first else name
+             if npos == goal:
+                 return (move, depth + 1)
+             visited.add(npos)
+             queue.append((npos, move, depth + 1))
+
+     return (None, float('inf'))
+
+
+ def nearest_neighbour_order(start, targets, obstacles, grid_w, grid_h):
+     """
+     Orders garbage as a greedy nearest-neighbour tour, using BFS path length as cost.
+     Unlike Manhattan distance, this accounts for detours around obstacles.
+     """
+     remaining = list(targets)
+     ordered = []
+     current = tuple(start)
+     while remaining:
+         best = min(remaining, key=lambda t: bfs(current, t, obstacles, grid_w, grid_h)[1])
+         ordered.append(best)
+         remaining.remove(best)
+         current = tuple(best)
+     return ordered
+
+
+ # ──────────────────────────────────────────────────────────
+ # HEURISTIC — BFS-based, mode-aware
+ # ──────────────────────────────────────────────────────────
+
+ def heuristic_action(obs, _stuck_counter=None) -> str:
+     """
+     Pure-BFS heuristic that respects the robot's autonomous mode.
+
+     When the environment reports robot_mode == 'recharging' or 'unloading',
+     the action suggested here is overridden by the environment's own resolver
+     anyway — but we still return a sensible direction so logs are readable.
+
+     In normal mode the heuristic targets the nearest garbage via BFS with a
+     nearest-neighbour tour order, plus a stuck-counter escape hatch.
+     """
+     if _stuck_counter is None:
+         _stuck_counter = [0]
+
+     robot_mode = obs.get("robot_mode", "normal")
+     r_pos = list(obs["robot_position"])
+     obstacles = [list(o) for o in obs["obstacle_positions"]]
+     grid_w, grid_h = obs["grid_size"]
+
+     # ── Recharging: head to home ───────────────────────────────
+     if robot_mode == "recharging":
+         home = obs.get("home_position", r_pos)
+         move, _ = bfs(r_pos, home, obstacles, grid_w, grid_h)
+         return move or "UP"
+
+     # ── Unloading: head to unload station ─────────────────────
+     if robot_mode == "unloading":
+         station = obs.get("unload_station", r_pos)
+         move, _ = bfs(r_pos, station, obstacles, grid_w, grid_h)
+         return move or "UP"
+
+     # ── Normal: collect nearest garbage ───────────────────────
+     garbage = [tuple(g) for g in obs["garbage_positions"]]
+     if not garbage:
+         return "UP"  # nothing to do; env will mark episode done
+
+     if tuple(r_pos) in garbage:
+         _stuck_counter[0] = 0
+         return "COLLECT"
+
+     ordered = nearest_neighbour_order(r_pos, garbage, obstacles, grid_w, grid_h)
+
+     # Stuck-counter escape: try alternate targets after repeated no-progress steps
+     if _stuck_counter[0] >= 4 and len(ordered) > 1:
+         ordered = [ordered[1], ordered[0]] + ordered[2:]
+     if _stuck_counter[0] >= 8:
+         ordered = ordered[1:] + ordered[:1]
+         _stuck_counter[0] = 0
+
+     target = ordered[0]
+     if tuple(target) == tuple(r_pos):
+         _stuck_counter[0] = 0
+         return "COLLECT"
+
+     move, _ = bfs(r_pos, target, obstacles, grid_w, grid_h)
+     if move and move != "COLLECT":
+         _stuck_counter[0] = 0
+         return move
+
+     # Primary target unreachable — try alternates
+     for alt in ordered[1:]:
+         move, _ = bfs(r_pos, alt, obstacles, grid_w, grid_h)
+         if move and move != "COLLECT":
+             _stuck_counter[0] = 0
+             return move
+
+     # Fully boxed in: take any open neighbouring cell to escape
+     _stuck_counter[0] += 1
+     obstacle_set = frozenset(tuple(o) for o in obstacles)
+     for name, (dx, dy) in [("RIGHT",(1,0)),("LEFT",(-1,0)),("UP",(0,1)),("DOWN",(0,-1))]:
+         npos = (r_pos[0]+dx, r_pos[1]+dy)
+         if (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h
+                 and npos not in obstacle_set):
+             return name
+
+     return "RIGHT"
+
+
+ # ──────────────────────────────────────────────────────────
+ # ACTION RESOLVER (priority: Q-table → LLM → BFS heuristic)
+ # ──────────────────────────────────────────────────────────
+
+ def resolve_next_action(client, obs, context_history, stuck_counter=None) -> str:
+     """
+     Decide the next action using the priority chain:
+       1. Q-table (trained, deterministic, fastest)
+       2. Fine-tuned local LLM (Unsloth export)
+       3. Remote OpenAI-compatible endpoint
+       4. BFS heuristic (fallback, always works)
+
+     The mode-aware BFS suggestion is passed as a hint to the remote LLM only.
+     Note: when the environment is in MODE_RECHARGE or MODE_UNLOAD it will
+     override whatever action we return, so correctness in those modes is
+     enforced by the environment, not by this resolver.
+     """
+     heuristic = heuristic_action(obs, stuck_counter)
+
+     # ── 1. Q-Learning policy (trained, deterministic) ──────────
+     if _ql_agent is not None:
+         q_action = _ql_agent.get_action(obs)
+         if q_action is not None:
+             return q_action
+
+     # Build a mode-aware system prompt for the LLM
+     robot_mode = obs.get("robot_mode", "normal")
+     dist_home = obs.get("distance_from_home", -1)
+     storage_load = obs.get("current_storage_load", 0)
+     capacity = obs.get("storage_capacity", 6)
+     home = obs.get("home_position", (0, 0))
+     station = obs.get("unload_station", (0, 0))
+
+     if robot_mode == "recharging":
+         mode_note = (
+             f"\n⚠ ROBOT MODE: RECHARGING — navigate to home {home} "
+             f"({dist_home} steps away). Do NOT collect garbage until recharged."
+         )
+     elif robot_mode == "unloading":
+         mode_note = (
+             f"\n⚠ ROBOT MODE: UNLOADING — navigate to unload station {station}. "
+             f"Storage is full ({storage_load}/{capacity}). "
+             f"Do NOT collect garbage until unloaded."
+         )
+     else:
+         mode_note = (
+             f"\nBattery distance to home: {dist_home} steps. "
+             f"Storage: {storage_load}/{capacity}."
+         )
+
+     system_prompt = (
+         "You control a garbage collecting robot on a grid.\n"
+         "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT\n\n"
+         "Rules:\n"
+         "- COLLECT only when your position exactly matches a garbage position.\n"
+         "- Never move into an obstacle tile.\n"
+         "- The environment handles recharging and unloading automatically.\n"
+         f"- Pathfinding suggests: {heuristic} (only override if clearly wrong)"
+         f"{mode_note}"
+     )
+
+     # ── 2. Try local fine-tuned merged model (Alpaca prompt format) ─────
+     if _local_model is not None and _local_tokenizer is not None:
+         try:
+             import torch  # already imported by main() when the model was loaded
+             alpaca_instruction = (
+                 "You are an AI brain controlling a garbage collecting robot.\n"
+                 "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+             )
+             prompt = (
+                 f"### Instruction:\n{alpaca_instruction}\n\n"
+                 f"### Input:\nENVIRONMENT STATUS:\n{obs['message']}\n\n"
+                 "### Response:\n"
+             )
+             inputs = _local_tokenizer(
+                 prompt, return_tensors="pt", truncation=True, max_length=512
+             ).to(_local_model.device)
+             with torch.no_grad():
+                 outputs = _local_model.generate(
+                     **inputs, max_new_tokens=6, do_sample=False,
+                     pad_token_id=_local_tokenizer.eos_token_id
+                 )
+             new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+             token = _local_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
+             for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
+                 if valid in token:
+                     print(f"[LOCAL LLM] {valid} (raw: {token!r})")
+                     return valid
+         except Exception as e:
+             print(f"[LOCAL LLM ERROR] {e}")
+
+     # ── 3. Try remote OpenAI-compatible endpoint ─────────────────
+     if client is not None:
+         try:
+             response = client.chat.completions.create(
+                 model=MODEL_NAME,
+                 messages=[
+                     {"role": "system", "content": system_prompt},
+                     *context_history,
+                     {"role": "user", "content": f"STATUS:\n{obs['message']}\n\nCommand?"}
+                 ],
+                 temperature=0.0,
+                 max_tokens=6
+             )
+             action = response.choices[0].message.content.strip().upper()
+             for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
+                 if valid in action:
+                     return valid
+         except Exception as e:
+             print(f"[REMOTE LLM ERROR] {e}")
+
+     # ── 4. Final fallback: pure BFS heuristic ─────────────────
+     return heuristic
+
+
+ # ──────────────────────────────────────────────────────────
+ # INTERACTIVE GARBAGE PLACEMENT
+ # ──────────────────────────────────────────────────────────
+
+ def prompt_custom_garbage(grid_w, grid_h, obstacles):
+     """
+     Interactive CLI helper: prompts the user to enter garbage positions
+     for a dynamic episode.
+     """
+     obstacle_set = set(tuple(o) for o in obstacles)
+     print(f"\n Grid: {grid_w} x {grid_h} Obstacles: {sorted(obstacle_set)}")
+     print(" Enter garbage positions:")
+     print(" x,y place at column x, row y (e.g. '4,4')")
+     print(" random N place N random pieces (e.g. 'random 5')")
+     print(" done start the episode\n")
+
+     garbage = []
+     while True:
+         raw = input(" Garbage > ").strip().lower()
+
+         if raw == "done":
+             if not garbage:
+                 print(" Need at least one garbage tile.")
+                 continue
+             break
+
+         if raw.startswith("random"):
+             import random
+             parts = raw.split()
+             n = max(1, int(parts[1])) if len(parts) > 1 and parts[1].isdigit() else 3  # non-numeric N falls back to 3
+             candidates = [(x, y) for x in range(grid_w) for y in range(grid_h)
+                           if (x, y) not in obstacle_set]
+             garbage = random.sample(candidates, min(n, len(candidates)))
+             print(f" Random garbage: {garbage}")
+             break
+
+         try:
+             x, y = map(int, raw.split(","))
+             if not (0 <= x < grid_w and 0 <= y < grid_h):
+                 print(f" Out of bounds — valid: 0-{grid_w-1}, 0-{grid_h-1}")
+                 continue
+             if (x, y) in obstacle_set:
+                 print(f" ({x},{y}) is an obstacle.")
+                 continue
+             if (x, y) in garbage:
+                 print(f" ({x},{y}) already added.")
+                 continue
+             garbage.append((x, y))
+             print(f" Added ({x},{y}) total: {garbage}")
+         except ValueError:
+             print(" Format: x,y e.g. '3,4'")
+
+     return garbage
+
+
+ def reset_with_custom_garbage(task_id, garbage_positions):
+     """
+     Posts to /reset_custom to inject custom garbage positions at runtime.
+     Falls back to standard /reset if something goes wrong.
+     """
+     try:
+         res = requests.post(f"{ENV_URL}/reset_custom", json={
+             "task_id": task_id,
+             "garbage_positions": [list(g) for g in garbage_positions]
+         })
+         res.raise_for_status()
+         return res.json()["observation"]
+     except Exception as e:
+         print(f"[WARN] /reset_custom failed ({e}), falling back to /reset")
+         res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
+         res.raise_for_status()
+         return res.json()["observation"]
+
+
+ # ──────────────────────────────────────────────────────────
+ # EPISODE RUNNER
+ # ──────────────────────────────────────────────────────────
+
+ def print_log(log_dict):
+     print(json.dumps(log_dict), flush=True)
+
+
+ def run_episode(client, task_id, obs):
+     policy = (
+         "q-table" if (_ql_agent and _ql_agent.loaded) else
+         "local-llm" if _local_model else
+         "remote-llm" if client else
+         "bfs"
+     )
+     print_log({"type": "[START]", "task_id": task_id,
+                "model": MODEL_NAME, "policy": policy, "max_steps": MAX_STEPS})
+
+     total_reward = 0.0
+     done = False
+     context_history = []
+     step_idx = 0
+     stuck_counter = [0]  # per-episode; no cross-episode state leak
+
+     for step_idx in range(1, MAX_STEPS + 1):
+         action = resolve_next_action(client, obs, context_history, stuck_counter)
+
+         try:
+             res = requests.post(f"{ENV_URL}/step", json={"command": action})
+             res.raise_for_status()
+             step_data = res.json()
+         except Exception as e:
+             print(f"Step error: {e}")
+             break
+
+         obs = step_data["observation"]
+         reward = step_data["reward"]
+         done = step_data["done"]
+         info = step_data.get("info", {})
+         total_reward += reward
+
+         # Log includes autonomous-override details for debugging
+         log_entry = {
+             "type": "[STEP]",
+             "step": step_idx,
+             "action": action,
+             "effective": info.get("effective_command", action),
+             "overridden": info.get("autonomous_override", False),
+             "mode": obs.get("robot_mode", "normal"),
+             "battery": obs.get("battery_level"),
+             "storage": f"{obs.get('current_storage_load')}/{obs.get('storage_capacity')}",
+             "dist_home": obs.get("distance_from_home"),
+             "reward": round(reward, 2),
+             "total_reward": round(total_reward, 2),
+             "done": done,
+         }
+         print_log(log_entry)
+
+         if done:
+             break
+
+         time.sleep(0.05)
+
+     try:
+         score = requests.get(f"{ENV_URL}/grade/{task_id}").json()["score"]
+     except Exception:
+         score = 0.0
+
+     print_log({"type": "[END]", "task_id": task_id, "total_steps": step_idx,
+                "final_reward": round(total_reward, 2), "score": score})
+     return score
+
+
+ # ──────────────────────────────────────────────────────────
+ # MAIN
+ # ──────────────────────────────────────────────────────────
+
+ def main():
+     global _local_model, _local_tokenizer, _ql_agent
+
+     import argparse  # parse args before loading models so --help or a bad --task fails fast
+     parser = argparse.ArgumentParser(description="Run GarbageBot Inference")
+     parser.add_argument("--dynamic", action="store_true",
+                         help="Interactive dynamic garbage placement")
+     parser.add_argument("--task",
+                         choices=["1","2","3","4","easy","medium","hard","all"],
+                         default="all",
+                         help="Task to run: 'easy' (1), 'medium' (2), 'hard' (3), or 'all' (4)")
+     args = parser.parse_args()
+
+     print("=" * 55)
+     print(" Garbage Collecting Robot — Inference")
+     print("=" * 55)
+
+     # ── 1. Load Q-Learning policy (fastest, no GPU needed) ────
+     if QLearningAgent is not None:
+         _ql_agent = QLearningAgent()
+         if _ql_agent.loaded:
+             print(f"\n [INFO] Q-table loaded ({len(_ql_agent.qtable):,} states). "
+                   "Q-learning is the primary policy.")
+         else:
+             print("\n [WARN] No Q-table found (qtable.json). "
+                   "Run: python qlearning.py --train")
+             print(" Falling through to LLM / BFS.")
+     else:
+         print("\n [WARN] qlearning.py not found — skipping Q-table.")
+
+     # ── 2. Attempt to load the fine-tuned merged model ────────────
+     try:
+         from transformers import AutoModelForCausalLM, AutoTokenizer
+         import torch
+         print(f"\n [INFO] Loading fine-tuned model from:\n {LOCAL_MODEL_PATH}")
+         _local_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
+         _local_model = AutoModelForCausalLM.from_pretrained(
+             LOCAL_MODEL_PATH,
+             torch_dtype=torch.float16,
+             device_map="auto",
+         )
+         _local_model.eval()
+         print(" [INFO] Fine-tuned model loaded — used when Q-table misses a state.")
+     except Exception as e:
+         print(f" [WARN] Fine-tuned model unavailable ({e}).")
+         print(" Falling back to remote API / BFS heuristic.")
+         _local_model, _local_tokenizer = None, None
+
+     if args.task in ["1", "easy"]:
+         tasks = ["task_easy"]
+     elif args.task in ["2", "medium"]:
+         tasks = ["task_medium"]
+     elif args.task in ["3", "hard"]:
+         tasks = ["task_hard"]
+     else:
+         tasks = ["task_easy", "task_medium", "task_hard"]
+
+     print(f"\n [INFO] Running tasks: {', '.join(tasks)}")
+
+     client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL) if HF_TOKEN else None
+     if not client and _local_model is None:
+         print("\n [INFO] No HF_TOKEN and no local model — pure BFS heuristic mode.")
+     elif not client:
+         print("\n [INFO] No HF_TOKEN — using local fine-tuned model + BFS fallback.")
+
+     for task_id in tasks:
+         print(f"\n{'─'*40}\n {task_id}\n{'─'*40}")
+
+         try:
+             res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
+             res.raise_for_status()
+             base_obs = res.json()["observation"]
+         except Exception as e:
+             print(f"Reset failed: {e}")
+             continue
+
+         if args.dynamic:
+             garbage = prompt_custom_garbage(
+                 base_obs["grid_size"][0],
+                 base_obs["grid_size"][1],
+                 base_obs["obstacle_positions"]
+             )
+             obs = reset_with_custom_garbage(task_id, garbage)
+         else:
+             obs = base_obs
+
+         run_episode(client, task_id, obs)
+
+
+ if __name__ == "__main__":
+     main()
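Both LLM branches in `resolve_next_action` map free-form model text to a command with `if valid in token`, checked in the fixed order UP, DOWN, LEFT, RIGHT, COLLECT. That is a substring test, so a reply such as `LEFT, NOT UP` resolves to `UP`, and a word like `SETUP` would also count as `UP`. A minimal sketch of a stricter alternative, assuming the rest of the resolver stays unchanged: the hypothetical `parse_action` helper returns the first whole-word command in reading order, or `None` so the caller can fall through to the next policy in the chain.

```python
import re
from typing import Optional

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# \b word boundaries stop "SETUP" from counting as "UP". re.search returns the
# leftmost match, so the first command the model wrote wins, not the first
# entry in VALID_ACTIONS.
_ACTION_RE = re.compile(r"\b(" + "|".join(VALID_ACTIONS) + r")\b")


def parse_action(raw: Optional[str]) -> Optional[str]:
    """Return the first whole-word command in a model reply, or None if there is none."""
    if not raw:
        return None
    match = _ACTION_RE.search(raw.upper())
    return match.group(1) if match else None
```

In the resolver, each `for valid in [...]` loop could be replaced by a single `parse_action(...)` call on the decoded text, returning its result only when it is not `None`. The fall-through order (Q-table, local LLM, remote LLM, BFS) is unaffected.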
inference_output.log ADDED
@@ -0,0 +1,240 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ============================================================
2
+ Garbage Collecting Robot — Inference Script
3
+ ============================================================
4
+
5
+
6
+ ────────────────────────────────────────
7
+ Running task: task_easy
8
+ ────────────────────────────────────────
9
+ {"type": "[START]", "task_id": "task_easy", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
10
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
11
+ {"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
12
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
13
+ {"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
14
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
15
+ {"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
16
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
17
+ {"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
18
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
19
+ {"type": "[STEP]", "step": 5, "action": "UP", "reward": -0.1, "total_reward": -0.5, "done": false}
20
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 6, "action": "UP", "reward": -0.1, "total_reward": -0.6, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 7, "action": "UP", "reward": -0.1, "total_reward": -0.7, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 8, "action": "UP", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 9, "action": "COLLECT", "reward": 59.9, "total_reward": 59.1, "done": true}
+ {"type": "[END]", "task_id": "task_easy", "total_steps": 9, "final_reward": 59.1, "score": 1.0}
+
+ ────────────────────────────────────────
+ Running task: task_medium
+ ────────────────────────────────────────
+ {"type": "[START]", "task_id": "task_medium", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 1, "action": "LEFT", "reward": -5.1, "total_reward": -5.1, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 2, "action": "LEFT", "reward": -5.1, "total_reward": -10.2, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 3, "action": "LEFT", "reward": -5.1, "total_reward": -15.299999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 4, "action": "LEFT", "reward": -5.1, "total_reward": -20.4, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 5, "action": "LEFT", "reward": -5.1, "total_reward": -25.5, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 6, "action": "LEFT", "reward": -5.1, "total_reward": -30.6, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 7, "action": "LEFT", "reward": -5.1, "total_reward": -35.7, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 8, "action": "LEFT", "reward": -5.1, "total_reward": -40.800000000000004, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 9, "action": "LEFT", "reward": -5.1, "total_reward": -45.900000000000006, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 10, "action": "LEFT", "reward": -5.1, "total_reward": -51.00000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 11, "action": "LEFT", "reward": -5.1, "total_reward": -56.10000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 12, "action": "LEFT", "reward": -5.1, "total_reward": -61.20000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 13, "action": "LEFT", "reward": -5.1, "total_reward": -66.30000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 14, "action": "LEFT", "reward": -5.1, "total_reward": -71.4, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 15, "action": "LEFT", "reward": -5.1, "total_reward": -76.5, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 16, "action": "LEFT", "reward": -5.1, "total_reward": -81.6, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 17, "action": "LEFT", "reward": -5.1, "total_reward": -86.69999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 18, "action": "LEFT", "reward": -5.1, "total_reward": -91.79999999999998, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 19, "action": "LEFT", "reward": -5.1, "total_reward": -96.89999999999998, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 20, "action": "LEFT", "reward": -5.1, "total_reward": -101.99999999999997, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 21, "action": "LEFT", "reward": -5.1, "total_reward": -107.09999999999997, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 22, "action": "LEFT", "reward": -5.1, "total_reward": -112.19999999999996, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 23, "action": "LEFT", "reward": -5.1, "total_reward": -117.29999999999995, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 24, "action": "LEFT", "reward": -5.1, "total_reward": -122.39999999999995, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 25, "action": "LEFT", "reward": -5.1, "total_reward": -127.49999999999994, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 26, "action": "LEFT", "reward": -5.1, "total_reward": -132.59999999999994, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -5.1, "total_reward": -137.69999999999993, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -5.1, "total_reward": -142.79999999999993, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -5.1, "total_reward": -147.89999999999992, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -5.1, "total_reward": -152.99999999999991, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -5.1, "total_reward": -158.0999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": -163.1999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": -168.2999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": -173.3999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -178.4999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -183.59999999999988, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -188.69999999999987, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -193.79999999999987, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -198.89999999999986, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -203.99999999999986, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -209.09999999999985, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -214.19999999999985, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -219.29999999999984, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -224.39999999999984, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -229.49999999999983, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -234.59999999999982, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -239.69999999999982, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -244.7999999999998, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -249.8999999999998, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -254.9999999999998, "done": true}
+ {"type": "[END]", "task_id": "task_medium", "total_steps": 50, "final_reward": -254.9999999999998, "score": 0.0}
+
+ ────────────────────────────────────────
+ Running task: task_hard
+ ────────────────────────────────────────
+ {"type": "[START]", "task_id": "task_hard", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 5, "action": "RIGHT", "reward": -0.1, "total_reward": -0.5, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 6, "action": "RIGHT", "reward": -0.1, "total_reward": -0.6, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 7, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 8, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 9, "action": "UP", "reward": -0.1, "total_reward": -0.8999999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 10, "action": "UP", "reward": -0.1, "total_reward": -0.9999999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 11, "action": "UP", "reward": -0.1, "total_reward": -1.0999999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 12, "action": "UP", "reward": -0.1, "total_reward": -1.2, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 13, "action": "UP", "reward": -0.1, "total_reward": -1.3, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 14, "action": "UP", "reward": -0.1, "total_reward": -1.4000000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 15, "action": "UP", "reward": -0.1, "total_reward": -1.5000000000000002, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 16, "action": "UP", "reward": -0.1, "total_reward": -1.6000000000000003, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 17, "action": "COLLECT", "reward": 9.9, "total_reward": 8.3, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 18, "action": "RIGHT", "reward": -0.1, "total_reward": 8.200000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 19, "action": "DOWN", "reward": -0.1, "total_reward": 8.100000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 20, "action": "DOWN", "reward": -0.1, "total_reward": 8.000000000000002, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 21, "action": "DOWN", "reward": -0.1, "total_reward": 7.900000000000002, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 22, "action": "DOWN", "reward": -0.1, "total_reward": 7.8000000000000025, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 23, "action": "DOWN", "reward": -0.1, "total_reward": 7.700000000000003, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 24, "action": "DOWN", "reward": -0.1, "total_reward": 7.600000000000003, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 25, "action": "DOWN", "reward": -0.1, "total_reward": 7.5000000000000036, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 26, "action": "COLLECT", "reward": 9.9, "total_reward": 17.400000000000006, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -0.1, "total_reward": 17.300000000000004, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -0.1, "total_reward": 17.200000000000003, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -0.1, "total_reward": 17.1, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -0.1, "total_reward": 17.0, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -0.1, "total_reward": 16.9, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": 11.799999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": 6.699999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": 1.5999999999999996, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -3.5, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -8.6, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -13.7, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -18.799999999999997, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -23.9, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -29.0, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -34.1, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -39.2, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -44.300000000000004, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -49.400000000000006, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -54.50000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -59.60000000000001, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -64.7, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -69.8, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -74.89999999999999, "done": false}
+ [LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+ {"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -79.99999999999999, "done": false}
+ {"type": "[END]", "task_id": "task_hard", "total_steps": 50, "final_reward": -79.99999999999999, "score": 0.4}
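In the log above, every non-error line is a single-line JSON record tagged `[START]`, `[STEP]` or `[END]`. These records are interleaved with plain-text `[LLM ERROR]` lines and banners. The per-step rewards take only a few values (-0.1, -5.1, 9.9), so summing them gives a quick consistency check against `final_reward`.

The sketch below makes a few assumptions:

* It reads the raw log without the diff's leading `+` markers.
* It treats each record as fitting on one line.
* `TaskSummary`, `summarize` and `reconciles` are hypothetical helpers, not part of this repository.

```python
import json
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class TaskSummary:
    task_id: str
    step_rewards: List[float] = field(default_factory=list)
    final_reward: Optional[float] = None
    score: Optional[float] = None
    llm_errors: int = 0


def summarize(lines: Iterable[str]) -> Dict[str, TaskSummary]:
    """Group [START]/[STEP]/[END] records by task and count [LLM ERROR] lines."""
    summaries: Dict[str, TaskSummary] = {}
    current: Optional[TaskSummary] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[LLM ERROR]"):
            if current is not None:
                current.llm_errors += 1
            continue
        if not line.startswith("{"):
            continue  # skip banners such as "Running task: task_hard"
        record = json.loads(line)
        kind = record.get("type")
        if kind == "[START]":
            current = TaskSummary(task_id=record["task_id"])
            summaries[current.task_id] = current
        elif kind == "[STEP]" and current is not None:
            current.step_rewards.append(record["reward"])
        elif kind == "[END]" and current is not None:
            current.final_reward = record["final_reward"]
            current.score = record["score"]
            current = None
    return summaries


def reconciles(summary: TaskSummary, tol: float = 1e-9) -> bool:
    """True if the per-step rewards add up to the logged final_reward."""
    if summary.final_reward is None:
        return False
    # fsum sidesteps the drift visible in logged totals like -0.30000000000000004
    return math.isclose(math.fsum(summary.step_rewards), summary.final_reward, abs_tol=tol)


if __name__ == "__main__":
    # Synthetic excerpt in the same shape as the log above (not real run data).
    sample = [
        '{"type": "[START]", "task_id": "demo", "max_steps": 2}',
        "[LLM ERROR] Error code: 401 - ...",
        '{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}',
        "[LLM ERROR] Error code: 401 - ...",
        '{"type": "[STEP]", "step": 2, "action": "COLLECT", "reward": 9.9, "total_reward": 9.8, "done": true}',
        '{"type": "[END]", "task_id": "demo", "total_steps": 2, "final_reward": 9.8, "score": 1.0}',
    ]
    for task_id, s in summarize(sample).items():
        print(task_id, len(s.step_rewards), s.llm_errors, s.score, reconciles(s))
```

The `[LLM ERROR]` lines are counted rather than dropped. A run in which every step logs a 401, as here, is worth flagging separately from the reward numbers.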
models.py ADDED
@@ -0,0 +1,79 @@
+ from pydantic import BaseModel, ConfigDict
+ from typing import List, Literal, Optional, Tuple
+
+ # --- Custom observation and action logic ---
+
+ class Observation(BaseModel):
+     model_config = ConfigDict(strict=True)
+     grid_size: Tuple[int, int]
+     robot_position: Tuple[int, int]
+     garbage_positions: List[Tuple[int, int]]
+     obstacle_positions: List[Tuple[int, int]]
+     battery_level: int
+     inventory_count: int
+     message: str # Textual context for LLM
+
+     # ── Autonomous resource-management fields ──────────────────
+     home_position: Tuple[int, int] # Charging station coordinates
+     unload_station: Tuple[int, int] # Designated unload-corner coordinates
+     storage_capacity: int # Max items robot can carry before unloading
+     current_storage_load: int # Items currently held (resets after unload)
+     distance_from_home: int # BFS steps to home (-1 if unreachable)
+     robot_mode: str # 'normal' | 'recharging' | 'unloading'
+
+
+ class Action(BaseModel):
+     model_config = ConfigDict(strict=True)
+     command: Literal["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+
+ # --- OpenEnv Standard Spec Models ---
+
+ class State(BaseModel):
+     model_config = ConfigDict(strict=True)
+     task_id: Optional[str]
+     total_reward: float
+     steps_taken: int
+     done: bool
+
+     # ── Extended state for resource management ─────────────────
+     robot_mode: str = "normal"
+     current_storage_load: int = 0
+     battery_level: int = 0
+     distance_from_home: int = 0
+
+
+ class ResetInput(BaseModel):
+     task_id: str = "task_easy"
+
+ class CustomResetInput(BaseModel):
+     """
+     Fully dynamic reset — caller specifies the entire layout at runtime.
+     grid_size, robot_start, garbage positions, obstacles, battery, storage_capacity,
+     home_position and unload_station are all optional overrides on top of a base task_id.
+     Pass task_id='custom' to skip scenario defaults entirely.
+     """
+     task_id: str = "task_easy"
+     grid_size: Optional[Tuple[int, int]] = None
+     robot_start: Optional[Tuple[int, int]] = None
+     garbage_positions: Optional[List[Tuple[int, int]]] = None
+     obstacle_positions: Optional[List[Tuple[int, int]]] = None
+     max_battery: Optional[int] = None
+     storage_capacity: Optional[int] = None
+     home_position: Optional[Tuple[int, int]] = None
+     unload_station: Optional[Tuple[int, int]] = None
+
+ class ResetOutput(BaseModel):
+     observation: Observation
+
+ class StepOutput(BaseModel):
+     observation: Observation
+     reward: float
+     done: bool
+     info: dict = {}
+
+ class Task(BaseModel):
+     id: str
+     name: str
+     description: str
+     difficulty: str
+     reward_range: List[float]
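The `distance_from_home` field above is documented as a BFS step count to the charging station, with -1 when home is unreachable. The sketch below shows one way such a value could be computed. It assumes 4-connected moves matching UP/DOWN/LEFT/RIGHT, `grid_size` read as `(width, height)`, and obstacle cells that cannot be entered. The helper name `bfs_distance` is hypothetical; the environment's own implementation is not part of this diff.

```python
from collections import deque
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def bfs_distance(
    grid_size: Tuple[int, int],
    start: Cell,
    goal: Cell,
    obstacles: Iterable[Cell] = (),
) -> int:
    """Shortest 4-connected path length from start to goal, or -1 if unreachable.

    Hypothetical helper: assumes grid_size is (width, height) and that cells
    are (x, y) with 0 <= x < width and 0 <= y < height.
    """
    width, height = grid_size
    blocked = set(obstacles)
    if start == goal:
        return 0
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (x, y), dist = queue.popleft()
        for dx, dy in ((0, 1), (0, -1), (-1, 0), (1, 0)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the grid
            if nxt in blocked or nxt in seen:
                continue  # wall or already queued
            if nxt == goal:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return -1  # goal cannot be reached from start


# Obstacles of the task_medium layout from scenarios.py.
MEDIUM_OBSTACLES = [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)]
print(bfs_distance((7, 7), (1, 1), (3, 3), MEDIUM_OBSTACLES))  # 4
```

Every move has the same cost, so plain BFS already returns the shortest path. A grid with per-cell costs would need Dijkstra's algorithm instead.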
openenv.yaml ADDED
@@ -0,0 +1,20 @@
+ name: garbage-collecting-robot
+ version: "1.0.0"
+ description: "An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent navigates a grid, picks up garbage, and manages its battery."
+ type: logical-grid-world
+ action_space:
+   type: discrete
+   description: "Movement and interaction commands: UP, DOWN, LEFT, RIGHT, COLLECT."
+ observation_space:
+   type: object
+   description: "Grid state including robot position, garbage coordinates, inventory size, battery level, and a conversational text interpretation of the environment."
+ tasks:
+   - id: task_easy
+     difficulty: easy
+     description: "Navigate a small 5x5 grid to collect 1 piece of garbage."
+   - id: task_medium
+     difficulty: medium
+     description: "Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery."
+   - id: task_hard
+     difficulty: hard
+     description: "Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage."
pyproject.toml ADDED
@@ -0,0 +1,23 @@
+ [project]
+ name = "incident-response-triage"
+ version = "1.0.0"
+ description = "OpenEnv environment simulating production incident response and triage for SRE agents"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "fastapi>=0.110.0",
+     "uvicorn>=0.29.0",
+     "pydantic>=2.0.0",
+     "openai>=1.0.0",
+     "requests>=2.31.0",
+     "python-dotenv>=1.0.0",
+     "pyyaml>=6.0.0",
+     "openenv-core>=0.2.0",
+ ]
+
+ [project.scripts]
+ server = "server.app:main"
+
+ [build-system]
+ requires = ["setuptools>=68.0"]
+ build-backend = "setuptools.build_meta"
qlearning.py ADDED
@@ -0,0 +1,345 @@
+ """
+ qlearning.py — Tabular Q-Learning for the Garbage Collecting Robot.
+
+ Training runs directly against GarbageRobotEnv (no HTTP server needed).
+ The Q-table is persisted to disk as JSON and loaded by inference.py at startup.
+
+ State representation:
+     (robot_x, robot_y, sorted_garbage_tuple)
+     e.g. (2, 3, ((1,1),(4,4))): compact and hashable; battery and storage load are not encoded
+
+ Actions:
+     0=UP 1=DOWN 2=LEFT 3=RIGHT 4=COLLECT
+
+ Usage:
+     # Train all tasks and save
+     python3 qlearning.py --train --episodes 8000
+
+     # Evaluate the greedy policy (uses saved Q-table)
+     python3 qlearning.py --eval
+
+ Fix applied:
+     - load() previously had two separate key-reconstruction passes, where the
+       first pass result (variable `k`) was computed but then immediately discarded.
+       The second pass also misidentified the garbage sub-list when it had exactly
+       2 integer elements (treating [gx, gy] pairs as flat coords instead of a
+       tuple-of-tuples). Replaced both passes with a single, unambiguous decode:
+           parsed = [rx, ry, [[gx1,gy1],[gx2,gy2],...]]
+       where the third element is always the nested garbage list.
+ """
+
+ import os
+ import json
+ import random
+ import argparse
+ from collections import defaultdict
+ from environment import GarbageRobotEnv
+ from scenarios import SCENARIOS
+
+ # ── Constants ──────────────────────────────────────────────────────────────
+
+ ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+ ACTION_IDX = {a: i for i, a in enumerate(ACTIONS)}
+ Q_TABLE_PATH = os.environ.get("Q_TABLE_PATH", "qtable.json")
+
+ # ── Hyperparameters ─────────────────────────────────────────────────────────
+
+ ALPHA = 0.15
+ GAMMA = 0.97
+ EPSILON_START = 1.0
+ EPSILON_END = 0.05
+ EPSILON_DECAY = 0.9995
+
+
+ # ── State Encoding ──────────────────────────────────────────────────────────
+
+ def encode_state(obs: dict) -> tuple:
+     """
+     Convert a raw observation dict into a hashable tuple suitable as a Q-table key.
+
+     Key structure: (robot_x, robot_y, ((gx1,gy1),(gx2,gy2),...))
+     Garbage positions are sorted so order doesn't create phantom new states.
+     """
+     rx, ry = obs["robot_position"]
+     garbage = tuple(sorted((int(g[0]), int(g[1])) for g in obs["garbage_positions"]))
+     return (int(rx), int(ry), garbage)
+
+
+ # ── Q-Table ─────────────────────────────────────────────────────────────────
+
+ class QTable:
+     """
+     Dictionary-backed Q-table; rows are created lazily by _ensure() on first access.
+     Values default to a small optimistic initial value to encourage exploration.
+     """
+
+     def __init__(self, optimistic_init: float = 0.5):
+         self.optimistic_init = optimistic_init
+         self._q: dict = {}
+
+     def _ensure(self, state: tuple):
+         if state not in self._q:
+             self._q[state] = [self.optimistic_init] * len(ACTIONS)
+
+     def get(self, state: tuple, action_idx: int) -> float:
+         self._ensure(state)
+         return self._q[state][action_idx]
+
+     def update(self, state: tuple, action_idx: int, value: float):
+         self._ensure(state)
+         self._q[state][action_idx] = value
+
+     def best_action(self, state: tuple) -> int:
+         """Return the index of the greedy best action."""
+         self._ensure(state)
+         return int(max(range(len(ACTIONS)), key=lambda i: self._q[state][i]))
+
+     def best_q(self, state: tuple) -> float:
+         self._ensure(state)
+         return max(self._q[state])
+
+     # ── Persistence ─────────────────────────────────────────────────────────
+
+     def save(self, path: str = Q_TABLE_PATH):
+         """
+         Serialise Q-table to JSON.
+
+         Key format saved to disk:
+             [rx, ry, [[gx1,gy1], [gx2,gy2], ...]]
+         This is unambiguous: element 0 and 1 are ints, element 2 is always a
+         list-of-lists, even when there is only one garbage piece.
+         """
+         serialisable = {}
+         for (rx, ry, garbage), v in self._q.items():
+             key = json.dumps([rx, ry, [list(g) for g in garbage]])
+             serialisable[key] = v
+         with open(path, "w") as f:
+             json.dump(serialisable, f)
+         print(f"[Q-Table] Saved {len(self._q):,} states → {path}")
+
+     def load(self, path: str = Q_TABLE_PATH) -> bool:
+         """
+         Load Q-table from JSON.
+
+         FIX: The previous implementation had two redundant key-reconstruction
+         loops. The first built variable `k` which was immediately discarded;
+         the second pass misclassified [gx, gy] pairs (lists of 2 ints) as flat
+         coordinates rather than garbage-position tuples, corrupting multi-garbage
+         states.
+
+         New single-pass decode relies on the unambiguous 3-element structure:
+             parsed[0] = rx (int)
+             parsed[1] = ry (int)
+             parsed[2] = [[gx1,gy1], ...] (always a list-of-lists)
+         """
+         if not os.path.exists(path):
+             return False
+         with open(path, "r") as f:
+             raw = json.load(f)
+         self._q = {}
+         for k_str, v in raw.items():
+             parsed = json.loads(k_str)
+             # Robustly handle both new format [rx, ry, [[gx,gy],...]]
+             # and old format [rx, ry, [gx, gy]] (single garbage, flat list).
+             rx, ry = int(parsed[0]), int(parsed[1])
+             raw_garbage = parsed[2]
+             if raw_garbage and isinstance(raw_garbage[0], list):
+                 # New / multi-garbage format: [[gx1,gy1],[gx2,gy2],...]
+                 garbage = tuple(tuple(p) for p in raw_garbage)
+             elif raw_garbage and isinstance(raw_garbage[0], int):
+                 # Old single-garbage flat format: [gx, gy]
+                 garbage = (tuple(raw_garbage),)
+             else:
+                 garbage = ()
+             self._q[(rx, ry, garbage)] = v
+         print(f"[Q-Table] Loaded {len(self._q):,} states ← {path}")
+         return True
+
+     def __len__(self):
+         return len(self._q)
+
+
+ # ── Observation Helper ───────────────────────────────────────────────────────
+
+ def _obs_from_env(env) -> dict:
+     """Build an obs dict directly from GarbageRobotEnv fields."""
+     obs_obj = env.get_observation()
+     return {
+         "robot_position": obs_obj.robot_position,
+         "garbage_positions": list(obs_obj.garbage_positions),
+         "obstacle_positions": list(obs_obj.obstacle_positions),
+         "grid_size": obs_obj.grid_size,
+         "battery_level": obs_obj.battery_level,
+         "inventory_count": obs_obj.inventory_count,
+         "message": obs_obj.message,
+         "robot_mode": obs_obj.robot_mode,
+         "home_position": obs_obj.home_position,
+         "unload_station": obs_obj.unload_station,
+         "current_storage_load": obs_obj.current_storage_load,
+         "storage_capacity": obs_obj.storage_capacity,
+         "distance_from_home": obs_obj.distance_from_home,
+     }
+
+
+ # ── Training ─────────────────────────────────────────────────────────────────
+
+ def train(
+     task_ids=None,
+     episodes: int = 8000,
+     qtable: QTable = None,
+     verbose: bool = True,
+ ) -> QTable:
+     """
+     Run Q-learning over the given task_ids for `episodes` total episodes.
+     Tasks are sampled uniformly so the agent generalises across difficulties.
+     """
+     if task_ids is None:
+         task_ids = list(SCENARIOS.keys())
+
+     if qtable is None:
+         qtable = QTable()
+
+     env = GarbageRobotEnv()
+     epsilon = EPSILON_START
+
+     best_scores: dict = {t: 0.0 for t in task_ids}
+
+     for ep in range(1, episodes + 1):
+         task_id = random.choice(task_ids)
+         env.reset(task_id)
+         obs = _obs_from_env(env)
+         state = encode_state(obs)
+
+         total_reward = 0.0
+         done = False
+
+         while not done:
+             if random.random() < epsilon:
+                 action_idx = random.randrange(len(ACTIONS))
+             else:
+                 action_idx = qtable.best_action(state)
+
+             action = ACTIONS[action_idx]
+             result = env.step(action)
+             next_obs = result["observation"]
+             reward = result["reward"]
+             done = result["done"]
+
+             next_state = encode_state(next_obs)
+
+             # Bellman update
+             old_q = qtable.get(state, action_idx)
+             td_target = reward + (0.0 if done else GAMMA * qtable.best_q(next_state))
+             new_q = old_q + ALPHA * (td_target - old_q)
+             qtable.update(state, action_idx, new_q)
+
+             state = next_state
+             obs = next_obs
+             total_reward += reward
+
+         score = env.grade(task_id)
+         if score > best_scores[task_id]:
+             best_scores[task_id] = score
+
+         epsilon = max(EPSILON_END, epsilon * EPSILON_DECAY)
+
+         if verbose and ep % 500 == 0:
+             avg_best = sum(best_scores.values()) / len(best_scores)
+             print(
+                 f" Ep {ep:5d}/{episodes} ε={epsilon:.4f} "
+                 f"states={len(qtable):,} "
+                 f"best_scores={best_scores} avg={avg_best:.2f}"
+             )
+
+     return qtable
+
+
+ # ── Inference Helper (used by inference.py) ──────────────────────────────────
+
+ class QLearningAgent:
+     """
+     Thin wrapper around a loaded Q-table for use by inference.py.
+     Falls through (returns None) when the state has never been seen during training.
+     """
+
+     def __init__(self, path: str = Q_TABLE_PATH):
+         self.qtable = QTable()
+         self.loaded = self.qtable.load(path)
+
+     def get_action(self, obs: dict) -> str | None:
+         if not self.loaded:
+             return None
+         state = encode_state(obs)
+         if state not in self.qtable._q:
+             return None
+         return ACTIONS[self.qtable.best_action(state)]
+
+
+ # ── Evaluation ───────────────────────────────────────────────────────────────
+
+ def evaluate(qtable: QTable, task_ids=None, runs: int = 5) -> dict:
+     """Run `runs` greedy episodes per task and return average scores."""
+     if task_ids is None:
+         task_ids = list(SCENARIOS.keys())
+
+     env = GarbageRobotEnv()
+     results = {}
+
+     for task_id in task_ids:
+         scores = []
+         for _ in range(runs):
+             env.reset(task_id)
+             obs = _obs_from_env(env)
+             done = False
+             while not done:
+                 state = encode_state(obs)
+                 action_idx = qtable.best_action(state)
+                 result = env.step(ACTIONS[action_idx])
+                 obs = result["observation"]
+                 done = result["done"]
+             scores.append(env.grade(task_id))
+         avg = sum(scores) / len(scores)
+         results[task_id] = round(avg, 3)
+         print(f" {task_id:12s} avg score = {avg:.3f} ({scores})")
+
+     return results
+
+
+ # ── CLI Entry Point ───────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="Q-Learning for Garbage Robot")
+     parser.add_argument("--train", action="store_true", help="Run training")
+     parser.add_argument("--eval", action="store_true", help="Run evaluation only")
+     parser.add_argument("--episodes", type=int, default=8000)
+     parser.add_argument("--tasks", nargs="+", default=None)
+     parser.add_argument("--output", default=Q_TABLE_PATH)
+     args = parser.parse_args()
+
+     if args.train:
+         print("=" * 55)
+         print(" Q-Learning Training — Garbage Collecting Robot")
+         print("=" * 55)
+         task_ids = args.tasks or list(SCENARIOS.keys())
+         print(f" Tasks : {task_ids}")
+         print(f" Episodes : {args.episodes}")
+         print(f" α={ALPHA} γ={GAMMA} ε {EPSILON_START}→{EPSILON_END} decay={EPSILON_DECAY}")
+         print()
+
+         qt = train(task_ids=task_ids, episodes=args.episodes, verbose=True)
+         qt.save(args.output)
+
+         print("\n — Evaluation on greedy policy —")
+         evaluate(qt, task_ids)
+
+     elif args.eval:
+         print("=" * 55)
+         print(" Q-Learning Evaluation")
+         print("=" * 55)
+         qt = QTable()
+         if not qt.load(args.output):
+             print(f"[ERROR] No Q-table found at {args.output}. Run with --train first.")
+         else:
+             evaluate(qt)
+     else:
+         parser.print_help()
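The Bellman update in `train()` is the standard one-step tabular Q-learning rule, Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) - Q(s,a)), with the bootstrap term dropped on terminal steps. The sketch below does two things. It works one update by hand using the file's `ALPHA`, `GAMMA` and the 0.5 optimistic initial value. It also counts how many episodes the per-episode ε decay needs to reach `EPSILON_END`. The step reward of -1 is an illustrative value, not one taken from the environment.

```python
ALPHA = 0.15
GAMMA = 0.97
EPSILON_START = 1.0
EPSILON_END = 0.05
EPSILON_DECAY = 0.9995


def td_update(old_q: float, reward: float, best_next_q: float, done: bool) -> float:
    """One tabular Q-learning step, mirroring the update inside train()."""
    td_target = reward + (0.0 if done else GAMMA * best_next_q)
    return old_q + ALPHA * (td_target - old_q)


def episodes_until_floor() -> int:
    """Count per-episode decays until epsilon first reaches EPSILON_END."""
    epsilon, episodes = EPSILON_START, 0
    while epsilon > EPSILON_END:
        epsilon = max(EPSILON_END, epsilon * EPSILON_DECAY)
        episodes += 1
    return episodes


# Unseen states start at the optimistic 0.5; assumed step reward of -1.
# target = -1 + 0.97 * 0.5 = -0.515
# new Q  = 0.5 + 0.15 * (-0.515 - 0.5) = 0.34775
print(round(td_update(0.5, -1.0, 0.5, done=False), 5))  # prints 0.34775
print(episodes_until_floor())  # prints 5990
```

With these hyperparameters ε reaches the 0.05 floor after 5,990 episodes. The last quarter or so of the default 8,000-episode run therefore explores at the floor rate.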
qtable.json ADDED
The diff for this file is too large to render.
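`qtable.json` stores each state key as a JSON string of the form `[rx, ry, [[gx1, gy1], ...]]`, as documented in `QTable.save()`. The stand-alone sketch below mirrors that encoding and the single-pass decode in `QTable.load()`, including the legacy flat `[gx, gy]` case. This lets you check the round trip without the environment. The names `encode_key` and `decode_key` are illustrative and do not exist in `qlearning.py`.

```python
import json
from typing import Tuple

State = Tuple[int, int, Tuple[Tuple[int, int], ...]]


def encode_key(state: State) -> str:
    """Same JSON layout as QTable.save(): [rx, ry, [[gx, gy], ...]]."""
    rx, ry, garbage = state
    return json.dumps([rx, ry, [list(g) for g in garbage]])


def decode_key(key: str) -> State:
    """Same decision rules as QTable.load(), including the legacy flat format."""
    parsed = json.loads(key)
    rx, ry = int(parsed[0]), int(parsed[1])
    raw_garbage = parsed[2]
    if raw_garbage and isinstance(raw_garbage[0], list):
        garbage = tuple(tuple(p) for p in raw_garbage)  # [[gx1, gy1], ...]
    elif raw_garbage and isinstance(raw_garbage[0], int):
        garbage = (tuple(raw_garbage),)  # legacy single-garbage [gx, gy]
    else:
        garbage = ()  # no garbage left on the grid
    return (rx, ry, garbage)


example = (2, 3, ((1, 1), (4, 4)))
print(encode_key(example))                       # [2, 3, [[1, 1], [4, 4]]]
print(decode_key(encode_key(example)) == example)  # True
print(decode_key("[0, 0, [4, 4]]"))              # (0, 0, ((4, 4),))
```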
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # Core server dependencies
+ fastapi>=0.110.0
+ uvicorn[standard]>=0.29.0
+ pydantic>=2.0.0
+
+ # HTTP client (used by inference.py and test_env.py)
+ requests>=2.31.0
+ openai>=1.0.0
+
+ # ── Optional: only needed if running the fine-tuned LLM locally ──────────
+ # These lines are active; comment them out if your Space has no GPU runtime.
+ torch>=2.1.0
+ transformers>=4.40.0
+ accelerate>=0.27.0
+ bitsandbytes>=0.43.0
+
+ # ── Optional: only needed for Q-learning training ─────────────────────────
+ # (training is done offline; the saved qtable.json is loaded at runtime)
+ # No extra deps required — qlearning.py uses stdlib only.
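To see which packages `pip install -r requirements.txt` will actually pick up, strip comments and blank lines from the file. The sketch below does that for simple `name[extras]>=version` lines like the ones here. It is not a full PEP 508 parser: it does not handle environment markers, URLs or `-r` includes. `active_requirements` is a hypothetical helper.

```python
import re
from typing import List

# Package name at the start of a requirement line (simplified PEP 508 name rule).
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def active_requirements(text: str) -> List[str]:
    """Return lower-cased package names from non-comment, non-blank lines."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line:
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1).lower())
    return names


sample = """
# Core server dependencies
fastapi>=0.110.0
uvicorn[standard]>=0.29.0
# torch>=2.1.0
transformers>=4.40.0
"""
print(active_requirements(sample))  # ['fastapi', 'uvicorn', 'transformers']
```

Run against this file, the helper lists torch, transformers, accelerate and bitsandbytes as active, because those lines are not commented out.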
rl_trajectories.jsonl ADDED
The diff for this file is too large to render.
run_pipeline.sh ADDED
@@ -0,0 +1,17 @@
+ #!/bin/bash
+
+ # Activate the existing virtual environment where dependencies are being installed
+ source venv/bin/activate
+
+ echo "Ensuring pip dependencies are installed and PyTorch is active..."
+ # Install the fine-tuning dependencies into the activated virtual environment.
+ pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" datasets trl peft transformers bitsandbytes --no-cache-dir
+
+ echo "======================================"
+ echo " Starting Unsloth LoRA Fine-Tuning... "
+ echo "======================================"
+
+ # Run the training script, redirecting both stdout and stderr to train_output.log
+ python train_unsloth.py > train_output.log 2>&1
+
+ echo "Process completed. Check train_output.log for details."
scenarios.py ADDED
@@ -0,0 +1,50 @@
+ from typing import Tuple, List, Dict, Any
+
+ SCENARIOS: Dict[str, Dict[str, Any]] = {
+     "task_easy": {
+         "grid_size": (5, 5),
+         "robot_start": (0, 0),
+         "garbage_starts": [(4, 4)],
+         "obstacle_starts": [],
+         "max_battery": 30,
+         # ── Resource management ────────────────────────────────
+         # Home (charging station) is the robot's spawn point.
+         "home_position": (0, 0),
+         # Unload corner (4, 0) shares the y=0 edge with home (0, 0).
+         "unload_station": (4, 0),
+         # 1 garbage piece; with capacity 6 no unload cycle is forced.
+         # (capacity=1 would force an unload cycle before finishing.)
+         "storage_capacity": 6,
+     },
+     "task_medium": {
+         "grid_size": (7, 7),
+         "robot_start": (3, 3),
+         "garbage_starts": [(1, 1), (5, 5), (1, 5)],
+         "obstacle_starts": [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)],
+         "max_battery": 50,
+         # ── Resource management ────────────────────────────────
+         "home_position": (3, 3),
+         # Far corner from centre home — no obstacles there.
+         "unload_station": (6, 0),
+         # Capacity 6 holds all 3 pieces; capacity 2 would force one unload cycle.
+         "storage_capacity": 6,
+     },
+     "task_hard": {
+         "grid_size": (10, 10),
+         "robot_start": (0, 0),
+         "garbage_starts": [(8, 8), (9, 1), (1, 9), (5, 5), (8, 2)],
+         "obstacle_starts": [
+             (1, 1), (1, 2), (1, 3), (1, 4),
+             (3, 1), (3, 2), (3, 3), (3, 4),
+             (6, 5), (6, 6), (6, 7), (6, 8), # shifted so (5,5) stays clear for garbage
+             (7, 7), (7, 8), (7, 9),
+         ],
+         "max_battery": 80,
+         # ── Resource management ────────────────────────────────
+         "home_position": (0, 0),
+         # Bottom-right corner — clear of all obstacles.
+         "unload_station": (9, 0),
+         # Capacity 6 holds all 5 pieces; capacity 2 would force two unload cycles.
+         "storage_capacity": 6,
+     },
+ }
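The scenario layouts are hand-edited; for example, the hard-task obstacles were shifted to keep (5, 5) free. A small consistency check can catch a garbage piece placed on an obstacle or a coordinate outside the grid. The sketch below uses a hypothetical `validate_scenario` helper to check those conditions. It also counts the unload trips a given capacity forces, assuming one trip each time storage fills up (`n // capacity`). That rule is an assumption chosen to agree with the capacity examples in the comments above, not something read from the environment's code.

```python
from typing import Any, Dict, List, Tuple


def validate_scenario(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the layout is consistent."""
    width, height = cfg["grid_size"]
    obstacles = set(cfg["obstacle_starts"])
    problems = []

    def in_grid(cell: Tuple[int, int]) -> bool:
        x, y = cell
        return 0 <= x < width and 0 <= y < height

    named = {
        "robot_start": cfg["robot_start"],
        "home_position": cfg["home_position"],
        "unload_station": cfg["unload_station"],
    }
    for i, g in enumerate(cfg["garbage_starts"]):
        named[f"garbage_starts[{i}]"] = g

    for label, cell in named.items():
        if not in_grid(cell):
            problems.append(f"{label} {cell} is outside the {width}x{height} grid")
        elif cell in obstacles:
            problems.append(f"{label} {cell} sits on an obstacle")
    return problems


def forced_unloads(cfg: Dict[str, Any]) -> int:
    """Assumed rule: one unload trip each time storage reaches capacity."""
    return len(cfg["garbage_starts"]) // cfg["storage_capacity"]


# Copy of SCENARIOS["task_hard"] from scenarios.py.
TASK_HARD = {
    "grid_size": (10, 10),
    "robot_start": (0, 0),
    "garbage_starts": [(8, 8), (9, 1), (1, 9), (5, 5), (8, 2)],
    "obstacle_starts": [
        (1, 1), (1, 2), (1, 3), (1, 4),
        (3, 1), (3, 2), (3, 3), (3, 4),
        (6, 5), (6, 6), (6, 7), (6, 8),
        (7, 7), (7, 8), (7, 9),
    ],
    "max_battery": 80,
    "home_position": (0, 0),
    "unload_station": (9, 0),
    "storage_capacity": 6,
}

print(validate_scenario(TASK_HARD))                          # []
print(forced_unloads(TASK_HARD))                             # 0
print(forced_unloads({**TASK_HARD, "storage_capacity": 2}))  # 2
```

In the repository itself, the same check could loop over `SCENARIOS.items()` imported from `scenarios.py` instead of a copied dict.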
server.log ADDED
Binary file (92.3 kB).
server.pid ADDED
@@ -0,0 +1 @@
+ 61938
test_env.py ADDED
@@ -0,0 +1,19 @@
+ import requests
+
+ ENV_URL = "http://localhost:7860"
+
+ print("Resetting task_easy...")
+ res = requests.post(f"{ENV_URL}/reset", json={"task_id": "task_easy"})
+ print("Observation:", res.json()["observation"])
+
+ print("\nStepping UP...")
+ res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+ print("Result:", res.json())
+
+ print("\nStepping UP...")
+ res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+ print("Result:", res.json())
+
+ print("\nGrading...")
+ res = requests.get(f"{ENV_URL}/grade/task_easy")
+ print("Grade:", res.json())
train.pid ADDED
@@ -0,0 +1 @@
+ 382219
train_output.log ADDED
@@ -0,0 +1,4 @@
+ Traceback (most recent call last):
+   File "/home/robotics-mu/Downloads/Meta Hackathon/train_unsloth.py", line 13, in <module>
+     from datasets import Dataset
+ ModuleNotFoundError: No module named 'datasets'
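The traceback shows `train_unsloth.py` stopping at import time because `datasets` could not be imported, even though `run_pipeline.sh` lists `datasets` on its `pip install` line. A preflight check run with the same interpreter reports every missing module at once, instead of surfacing them one traceback at a time. This is a sketch: the module list is taken from the script's imports, and the names `missing_modules` and `preflight` are hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List

# Top-level modules imported by train_unsloth.py.
REQUIRED = ("datasets", "unsloth", "trl", "transformers")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(names: Iterable[str] = REQUIRED) -> str:
    """Summarise the check, naming the interpreter that was inspected."""
    missing = missing_modules(names)
    if missing:
        return f"Missing in {sys.executable}: {', '.join(missing)}"
    return "All training dependencies are importable."
```

Printing `preflight()` from the activated venv before the training step, and aborting when it reports missing modules, would stop the pipeline before a long run fails at its first import.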
train_unsloth.py ADDED
@@ -0,0 +1,144 @@
+ """
+ Fine-tuning Llama-3.2-3B-Instruct with Unsloth for the Garbage Collecting Robot.
+
+ Training data: fixed_dataset.jsonl (generated by code2.py + fixer.py)
+ Format: {"user": "### Instruction:\n...\n\n### Input:\nENVIRONMENT STATUS:\n...", "assistant": "UP|DOWN|LEFT|RIGHT|COLLECT"}
+
+ Base model: unsloth/llama-3.2-3b-instruct-bnb-4bit (same as Unsloth Studio run)
+ Export: lora_garbage_robot/ (LoRA adapter)
+ """
+
+ import os
+ import json
+ from datasets import Dataset
+
+ max_seq_length = 512 # Prompts are short; 512 is well above the longest sample
+ dtype = None # Auto-detect (float16 on T4, bfloat16 on Ampere+)
+ load_in_4bit = True
+
+ # ── Alpaca prompt — MUST match fixed_dataset.jsonl / code2.py / app.py ──────
+ ALPACA_TEMPLATE = (
+     "### Instruction:\n{instruction}\n\n"
+     "### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
+     "### Response:\n{response}"
+ )
+
+ INSTRUCTION = (
+     "You are an AI brain controlling a garbage collecting robot.\n"
+     "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+ )
+
+ EOS_TOKEN = None # filled in after tokenizer loads
+
+
+ def load_fixed_dataset(path: str = "fixed_dataset.jsonl") -> Dataset:
+     """
+     Load fixed_dataset.jsonl produced by fixer.py.
+     Each row: {"user": "<### Instruction:...### Input:...>", "assistant": "<ACTION>"}
+     We re-format into the full Alpaca text so the model sees input + target in one string.
+     """
+     rows = []
+     with open(path, "r") as f:
+         for line in f:
+             row = json.loads(line)
+             user_text = row["user"] # already contains ### Instruction + ### Input
+             assistant = row["assistant"] # e.g. "RIGHT"
+
+             # Extract the environment status message from the user field
+             try:
+                 env_status = user_text.split("ENVIRONMENT STATUS:\n")[1].strip()
+             except IndexError:
+                 continue # skip malformed rows
+
+             text = ALPACA_TEMPLATE.format(
+                 instruction=INSTRUCTION,
+                 input=env_status,
+                 response=assistant,
+             ) + (EOS_TOKEN or "")
+             rows.append({"text": text})
+
+     print(f"[Dataset] Loaded {len(rows):,} samples from {path}")
+     return Dataset.from_list(rows)
+
+
+ def main():
+     from unsloth import FastLanguageModel
+     from trl import SFTTrainer
+     from transformers import TrainingArguments
+
+     global EOS_TOKEN
+
+     print("=" * 60)
+     print(" Fine-tuning Llama-3.2-3B-Instruct — Garbage Robot")
+     print("=" * 60)
+
+     # ── 1. Load base model (same as Unsloth Studio session) ──────────────────
+     print("\n[1/4] Loading base model …")
+     model, tokenizer = FastLanguageModel.from_pretrained(
+         model_name = "unsloth/llama-3.2-3b-instruct-bnb-4bit",
+         max_seq_length = max_seq_length,
+         dtype = dtype,
+         load_in_4bit = load_in_4bit,
+     )
+     EOS_TOKEN = tokenizer.eos_token # fill in for dataset formatting
+
+     # ── 2. Add LoRA adapters ─────────────────────────────────────────────────
+     print("[2/4] Attaching LoRA adapters …")
+     model = FastLanguageModel.get_peft_model(
+         model,
+         r = 16,
+         target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
+                           "gate_proj", "up_proj", "down_proj"],
+         lora_alpha = 16,
+         lora_dropout = 0,
+         bias = "none",
+         use_gradient_checkpointing = "unsloth",
+         random_state = 3407,
+         use_rslora = False,
+         loftq_config = None,
+     )
+
+     # ── 3. Load dataset ──────────────────────────────────────────────────────
+     print("[3/4] Loading fixed_dataset.jsonl …")
+     dataset = load_fixed_dataset("fixed_dataset.jsonl")
+
+     # ── 4. Train ─────────────────────────────────────────────────────────────
+     print("[4/4] Starting fine-tuning …")
+     trainer = SFTTrainer(
+         model = model,
+         tokenizer = tokenizer,
+         train_dataset = dataset,
+         dataset_text_field = "text",
+         max_seq_length = max_seq_length,
+         dataset_num_proc = 2,
+         packing = True, # efficient for short sequences
+         args = TrainingArguments(
+             per_device_train_batch_size = 4,
+             gradient_accumulation_steps = 4,
+             warmup_ratio = 0.03,
+             num_train_epochs = 1,
+             learning_rate = 2e-4,
+             fp16 = not FastLanguageModel.is_bfloat16_supported(),
+             bf16 = FastLanguageModel.is_bfloat16_supported(),
+             logging_steps = 10,
+             optim = "adamw_8bit",
+             weight_decay = 0.01,
+             lr_scheduler_type = "cosine",
+             seed = 3407,
+             output_dir = "outputs",
+             save_strategy = "epoch",
+         ),
+     )
+
+     trainer_stats = trainer.train()
+     print(f"\nTraining complete. Loss: {trainer_stats.training_loss:.4f}")
+
+     # ── Save LoRA adapter ────────────────────────────────────────────────────
+     model.save_pretrained("lora_garbage_robot")
+     tokenizer.save_pretrained("lora_garbage_robot")
+     print("\nLoRA adapter saved to: lora_garbage_robot/")
+     print("To export a merged model, use Unsloth Studio → Export → Merged Model.")
+
+
+ if __name__ == "__main__":
+     main()
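The script's comment says `ALPACA_TEMPLATE` must match the prompt used at inference time (`app.py`). Below is a minimal sketch of that inference side. It assumes the same template with an empty `{response}` slot, the observation's `message` text as the environment status, and a model reply that may carry trailing tokens. `build_prompt` and `parse_action` are hypothetical names. Returning `None` for unparseable output is a design choice here, not something the source specifies.

```python
import re
from typing import Optional

ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
    "### Response:\n{response}"
)
INSTRUCTION = (
    "You are an AI brain controlling a garbage collecting robot.\n"
    "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
)
# Whole-word match so "UPWARD" is not read as "UP".
_COMMAND = re.compile(r"\b(UP|DOWN|LEFT|RIGHT|COLLECT)\b")


def build_prompt(env_status: str) -> str:
    """Training-format prompt with the response slot left empty for generation."""
    return ALPACA_TEMPLATE.format(
        instruction=INSTRUCTION, input=env_status, response=""
    )


def parse_action(generated: str) -> Optional[str]:
    """First valid command in the model output (case-insensitive), else None."""
    match = _COMMAND.search(generated.upper())
    return match.group(1) if match else None
```

When `parse_action` returns `None`, a caller could fall back to another policy, such as the Q-table agent, rather than sending an invalid command.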