Spaces: TechAvenger / TrashCollector

Build error

Mihir Mithani committed on Apr 8

Commit a8d4cdf (0 parents)

Sync Hub-enabled code to Space (no weights)

Files changed (36)

.dockerignore +7 -0
.env.example +4 -0
.gitattributes +1 -0
.gitignore +12 -0
.gitmodules +3 -0
DockerFile +16 -0
Dockerfile +12 -0
OpenEnv +1 -0
README.md +115 -0
app.py +253 -0
code.py +78 -0
code2.py +34 -0
environment.py +500 -0
fixed_dataset.jsonl +0 -0
fixer.py +28 -0
frontend/index.html +166 -0
frontend/script.js +515 -0
frontend/style.css +634 -0
garbage_robot_dataset.jsonl +35 -0
inference.py +520 -0
inference_output.log +240 -0
models.py +79 -0
openenv.yaml +20 -0
pyproject.toml +23 -0
qlearning.py +345 -0
qtable.json +0 -0
requirements.txt +19 -0
rl_trajectories.jsonl +0 -0
run_pipeline.sh +17 -0
scenarios.py +50 -0
server.log +0 -0
server.pid +1 -0
test_env.py +19 -0
train.pid +1 -0
train_output.log +4 -0
train_unsloth.py +144 -0

.dockerignore ADDED

	@@ -0,0 +1,7 @@

+Meta/
+Robot/
+.venv/
+__pycache__/
+*.pyc
+.git/
+.env

.env.example ADDED

	@@ -0,0 +1,4 @@

+API_BASE_URL=https://api.openai.com/v1
+MODEL_NAME=gpt-4o-mini
+HF_TOKEN=your_hf_or_api_key_here
+ENV_URL=http://localhost:7860

.gitattributes ADDED

	@@ -0,0 +1 @@


+model/*.safetensors filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,12 @@

+.venv/
+venv/
+model/
+Meta/
+Robot/
+__pycache__/
+*.pyc
+.env
+uv.lock
+test_run.log
+hf_test.log
+hf_test2.log

.gitmodules ADDED

	@@ -0,0 +1,3 @@

+[submodule "OpenEnv"]
+	path = OpenEnv
+	url = "https://github.com/techavenger123/OpenEnv#"

DockerFile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.11-slim
+WORKDIR /app
+# Install Python dependencies first (cached layer)
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy all project files
+COPY . .
+# HuggingFace Spaces requires port 7860
+EXPOSE 7860
+# Launch FastAPI server
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]

Dockerfile ADDED

	@@ -0,0 +1,12 @@

+FROM python:3.11-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 7860
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]

OpenEnv ADDED

	@@ -0,0 +1 @@


+Subproject commit c719decf2b19175d5ca35301d58a14c83e985480

README.md ADDED

	@@ -0,0 +1,115 @@

+---
+title: GarbageBot — RL Control Center
+emoji: 🗑️
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+pinned: false
+tags:
+  - openenv
+  - robotics
+  - reinforcement-learning
+  - llama-3.2
+---
+# 🤖 Garbage Collecting Robot — OpenEnv
+An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent must navigate a grid room to pick up garbage while managing battery constraints and storage capacity.
+## Why Garbage Collection?
+Autonomous garbage collection is a classic robotics challenge involving pathfinding, resource management (battery), and state management (storage capacity). This environment provides a realistic training ground for AI agents to learn:
+- **Optimal Navigation** — shortest paths via BFS and Q-Learning.
+- **Resource Management** — returning to base for charging before battery depletion.
+- **Logistics** — managing a 6-unit storage bin and prioritizing unload cycles.
+---
+## Architecture
+The environment is a discrete grid world where the robot interacts with garbage, obstacles, a charging station (Home), and an Unload Station.
+```
+┌──────────┐
+│ Dashboard│ (FastAPI + Vanilla JS)
+└─────┬────┘
+      ▼
+┌──────────┐
+│ API      │ (app.py)
+└─────┬────┘
+      ▼
+┌──────────┐
+│ Env Logic│ (environment.py)
+└──────────┘
+```
+---
+## Tasks
+| Task ID | Difficulty | Description | Grid Size |
+|---------|-----------|-------------|-----------|
+| `task_easy` | 🟢 Easy | Small 5x5 grid, 1 piece of garbage. | 5x5 |
+| `task_medium` | 🟡 Medium | 7x7 grid with obstacles, 3 pieces of garbage. | 7x7 |
+| `task_hard` | 🔴 Hard | 10x10 maze, 5 pieces of garbage, strict battery. | 10x10 |
+---
+## Action Space
+Movement and interaction commands:
+- `UP`, `DOWN`, `LEFT`, `RIGHT`: Move the robot one cell.
+- `COLLECT`: Pick up garbage if the robot is on its cell.
+---
+## Observation Space
+The environment returns a detailed state:
+- `robot_position`: `(x, y)`
+- `garbage_positions`: List of `(x, y)`
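+- `battery_level`: Remaining battery charge; the maximum appears in `message` as `Battery: current/max`.
+- `current_storage_load`: Items currently in the bin; capacity is reported separately as `storage_capacity` (default 6).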
+- `robot_mode`: `normal`, `recharging`, or `unloading`.
+---
+## Policy Priority Chain
+Decisions can be driven by:
+1. **Q-Learning Table** — pre-trained optimal policy.
+2. **Llama-3.2-3B-Instruct** — fine-tuned LLM policy.
+3. **BFS Heuristic** — reliable fallback pathfinding.
+---
+## Local Development
+```bash
+# 1. Install dependencies
+pip install -r requirements.txt
+# 2. Start the server
+uvicorn app:app --host 0.0.0.0 --port 7860
+# 3. Training
+python qlearning.py --train --episodes 10000
+```
+---
+## Project Structure
+```
+├── app.py              # FastAPI server
+├── environment.py      # Core RL logic
+├── models.py           # Data schemas
+├── scenarios.py        # Task definitions
+├── qlearning.py        # Tabular RL training
+├── inference.py        # Policy resolver
+├── frontend/           # Dashboard HTML/CSS/JS
+├── qtable.json         # Trained policy weights
+├── Dockerfile          # Deployment container
+└── README.md           # This file
+```
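
The "Policy Priority Chain" in the README can be read as an ordered fallback: try each policy in turn and take the first valid action. The sketch below is one way to express that idea, not the repository's `inference.py`. The names `resolve_action`, `q_table_policy`, `llm_policy`, and `bfs_policy` are hypothetical stand-ins, and the observation dict is a placeholder.

```python
from typing import Callable, Optional, Sequence

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# A policy maps an observation dict to an action name, or None if it has no answer.
Policy = Callable[[dict], Optional[str]]


def resolve_action(obs: dict, policies: Sequence[tuple[str, Policy]]) -> tuple[str, str]:
    """Return (action, source) from the first policy that yields a valid action."""
    for source, policy in policies:
        try:
            action = policy(obs)
        except Exception:
            # A failing policy (e.g. model not loaded) falls through to the next one.
            continue
        if action in VALID_ACTIONS:
            return action, source
    raise RuntimeError("no policy produced a valid action")


# Hypothetical stand-ins for the three policies named in the README.
def q_table_policy(obs: dict) -> Optional[str]:
    return None  # assumption: the state is missing from the Q-table


def llm_policy(obs: dict) -> Optional[str]:
    raise RuntimeError("model not loaded")  # assumption: weights unavailable


def bfs_policy(obs: dict) -> Optional[str]:
    return "RIGHT"  # assumption: the heuristic always has an answer


chain = [("qtable", q_table_policy), ("llm", llm_policy), ("bfs", bfs_policy)]
action, source = resolve_action({"robot_position": (0, 0)}, chain)
print(action, source)
```

Returning the source alongside the action mirrors the `"source"` field that the `/policy` endpoint reports, which makes it easy to see which tier actually answered.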

app.py ADDED

	@@ -0,0 +1,253 @@

+"""
+FastAPI server for the Garbage Collecting Robot OpenEnv environment.
+Exposes reset / step / state / tasks / grade / policy / configure endpoints.
+Fix applied:
+  - /policy BFS fallback now uses env.get_observation().dict() instead of
+    a hand-built incomplete dict (which was missing robot_mode, home_position,
+    unload_station, current_storage_load, storage_capacity, distance_from_home).
+  - Static files and /ui route added so the HTML dashboard is served from the
+    same origin — required for HuggingFace Spaces deployment.
+"""
+import os
+import sys
+sys.path.insert(0, os.path.dirname(__file__))
+from typing import List
+from pydantic import BaseModel
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse
+from environment import GarbageRobotEnv
+from models import (
+    Action, StepOutput, ResetInput, ResetOutput, CustomResetInput, State, Task,
+)
+app = FastAPI(
+    title="Garbage Collecting Robot — OpenEnv",
+    description=(
+        "An OpenEnv-compliant robotics environment for garbage collection. "
+        "AI agents must navigate a grid room to pick up garbage while managing battery constraints."
+    ),
+    version="1.0.0",
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+env = GarbageRobotEnv()
+TASKS = [
+    Task(
+        id="task_easy",
+        name="Small Room Clean",
+        description="Navigate a small 5x5 grid to collect 1 piece of garbage.",
+        difficulty="easy",
+        reward_range=[0.0, 1.0],
+    ),
+    Task(
+        id="task_medium",
+        name="Medium Room with Obstacles",
+        description="Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery.",
+        difficulty="medium",
+        reward_range=[0.0, 1.0],
+    ),
+    Task(
+        id="task_hard",
+        name="Large Maze Cleanup",
+        description="Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage.",
+        difficulty="hard",
+        reward_range=[0.0, 1.0],
+    ),
+]
+VALID_IDS = {t.id for t in TASKS}
+@app.get("/", tags=["health"])
+def health():
+    return {"status": "ok", "env": "garbage-collecting-robot"}
+@app.post("/reset", response_model=ResetOutput, tags=["openenv"])
+def reset(body: ResetInput = ResetInput()):
+    if body.task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+    env.reset(task_id=body.task_id)
+    return {"observation": env.get_observation().dict()}
+@app.post("/reset_custom", response_model=ResetOutput, tags=["openenv"])
+def reset_custom(body: CustomResetInput):
+    """
+    Dynamic reset endpoint. Lets callers specify garbage positions,
+    obstacle positions, robot start, grid size and battery at runtime.
+    Any omitted field falls back to the base scenario's value.
+    """
+    env.reset_custom(
+        task_id=body.task_id,
+        grid_size=body.grid_size,
+        robot_start=body.robot_start,
+        garbage_positions=body.garbage_positions,
+        obstacle_positions=body.obstacle_positions,
+        max_battery=body.max_battery,
+        storage_capacity=body.storage_capacity,
+        home_position=body.home_position,
+        unload_station=body.unload_station,
+    )
+    return {"observation": env.get_observation().dict()}
+@app.post("/step", response_model=StepOutput, tags=["openenv"])
+def step(body: Action):
+    result = env.step(command=body.command)
+    return result
+@app.get("/state", response_model=State, tags=["openenv"])
+def state():
+    return env.state()
+@app.get("/tasks", response_model=list[Task], tags=["openenv"])
+def tasks():
+    return TASKS
+@app.get("/grade/{task_id}", tags=["grading"])
+def grade(task_id: str):
+    if task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+    score = env.grade(task_id)
+    return {"task_id": task_id, "score": score, "reward_range": [0.0, 1.0]}
+# ── Policy endpoint (fine-tuned LLM) ──────────────────────────────────────
+LOCAL_MODEL_PATH = os.environ.get(
+    "LOCAL_MODEL_PATH",
+    "TechAvenger/GarbageBot-Weights"
+)
+_policy_model     = None
+_policy_tokenizer = None
+_policy_loaded    = False
+def _load_policy():
+    global _policy_model, _policy_tokenizer, _policy_loaded
+    if _policy_loaded:
+        return
+    try:
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+        import torch
+        _policy_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
+        _policy_model = AutoModelForCausalLM.from_pretrained(
+            LOCAL_MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
+        )
+        _policy_model.eval()
+        print(f"[Policy] Fine-tuned model loaded from {LOCAL_MODEL_PATH}")
+    except Exception as e:
+        print(f"[Policy] Model unavailable: {e}")
+    _policy_loaded = True
+class PolicyInput(BaseModel):
+    message: str   # the obs.message string from the environment
+@app.post("/policy", tags=["openenv"])
+def policy(body: PolicyInput):
+    """
+    Ask the fine-tuned LLM for the next action.
+    Returns {"action": "UP|DOWN|LEFT|RIGHT|COLLECT", "source": "llm|bfs"};
+    LLM answers also include the decoded model output under "raw".
+    """
+    _load_policy()
+    VALID = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+    if _policy_model is not None and _policy_tokenizer is not None:
+        try:
+            import torch
+            instruction = (
+                "You are an AI brain controlling a garbage collecting robot.\n"
+                "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+            )
+            prompt = (
+                f"### Instruction:\n{instruction}\n\n"
+                f"### Input:\nENVIRONMENT STATUS:\n{body.message}\n\n"
+                f"### Response:\n"
+            )
+            inputs = _policy_tokenizer(
+                prompt, return_tensors="pt", truncation=True, max_length=512
+            ).to(_policy_model.device)
+            with torch.no_grad():
+                outputs = _policy_model.generate(
+                    **inputs, max_new_tokens=6, do_sample=False,
+                    pad_token_id=_policy_tokenizer.eos_token_id
+                )
+            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+            raw = _policy_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
+            for v in VALID:
+                if v in raw:
+                    return {"action": v, "source": "llm", "raw": raw}
+        except Exception as e:
+            print(f"[Policy] Inference error: {e}")
+    # FIX: use env.get_observation().dict() so heuristic_action() receives
+    # all required fields (robot_mode, home_position, unload_station, etc.)
+    # instead of the previous hand-built incomplete dict.
+    from inference import heuristic_action
+    obs_dict = env.get_observation().dict()
+    obs_dict["message"] = body.message   # use the caller's message for context
+    return {"action": heuristic_action(obs_dict), "source": "bfs"}
+# ── Dynamic garbage placement ──────────────────────────────────────────────
+class ConfigureInput(BaseModel):
+    task_id: str = "task_easy"
+    garbage_positions: List[List[int]]  # [[x,y], ...]
+@app.post("/configure", tags=["openenv"])
+def configure(body: ConfigureInput):
+    """
+    Reset the environment for task_id, then override garbage positions
+    with whatever the caller supplies.
+    """
+    if body.task_id not in VALID_IDS:
+        raise HTTPException(400, f"task_id must be one of {sorted(VALID_IDS)}")
+    env.reset(task_id=body.task_id)
+    validated = []
+    for pos in body.garbage_positions:
+        if len(pos) != 2:
+            raise HTTPException(400, f"Each position must be [x, y], got {pos}")
+        x, y = pos
+        gw, gh = env.grid_size
+        if not (0 <= x < gw and 0 <= y < gh):
+            raise HTTPException(400, f"Position {pos} out of bounds for grid {env.grid_size}")
+        if [x, y] in env.obstacle_positions:
+            raise HTTPException(400, f"Position {pos} is an obstacle")
+        validated.append([x, y])
+    env.garbage_positions = validated
+    return {"observation": env.get_observation().dict()}
+# ── Serve HTML dashboard ───────────────────────────────────────────────────
+# This makes the frontend accessible at /ui on the same origin as the API,
+# which is required for HuggingFace Spaces (no localhost cross-origin issues).
+@app.get("/ui", include_in_schema=False)
+def ui():
+    """Serve the dashboard HTML."""
+    return FileResponse("frontend/index.html")
+# Mount static assets (style.css, script.js) at /static
+if os.path.exists("frontend/style.css") or os.path.exists("frontend/script.js"):
+    app.mount("/static", StaticFiles(directory="frontend"), name="static")
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
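
The `/policy` handler picks the action by checking `v in raw` for each entry of `VALID` in list order. That is a substring test, so an output such as "UPDATE" would match `UP`, and "COLLECT, then go UP" would return `UP` because `UP` is checked first. A possible alternative, sketched here under the assumption that the first whole-word action in the text is the intended one, uses a word-boundary regex. `parse_action` is a hypothetical helper, not part of the commit.

```python
import re
from typing import Optional

VALID_ACTIONS = ("UP", "DOWN", "LEFT", "RIGHT", "COLLECT")

# Match any valid action as a whole word; search() returns the leftmost match.
_ACTION_RE = re.compile(r"\b(" + "|".join(VALID_ACTIONS) + r")\b")


def parse_action(raw: str) -> Optional[str]:
    """Return the first whole-word action found in the model output, or None."""
    match = _ACTION_RE.search(raw.upper())
    return match.group(1) if match else None


print(parse_action("right"))
print(parse_action("I will COLLECT, then go UP"))
print(parse_action("UPDATE"))
```

When `parse_action` returns `None`, the caller can fall through to the BFS heuristic exactly as the handler already does when no action is found.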

code.py ADDED

	@@ -0,0 +1,78 @@

+"""
+code.py — Seed training data generator for the Garbage Collecting Robot.
+Fix applied:
+  - All trajectory entries now use the unified {"text": "..."} Alpaca format.
+  - Previously the first entry used {"text": ...} while all others used
+    {"obs": ..., "action": ...}, causing fixer.py to silently skip them
+    (KeyError on the missing "text" key).
+"""
+import json
+INSTRUCTION = (
+    "You control a garbage collecting robot. "
+    "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT"
+)
+def alpaca(obs: str, action: str) -> dict:
+    """Wrap an obs/action pair into the Alpaca fine-tuning format."""
+    return {
+        "text": (
+            f"### Instruction:\n{INSTRUCTION}\n\n"
+            f"### Input:\nENVIRONMENT STATUS:\n{obs}\n\n"
+            f"### Response:\n{action}"
+        )
+    }
+trajectories = [
+    # --- task_easy: straight-line approach ---
+    alpaca("You are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.", "RIGHT"),
+    alpaca("You are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.", "UP"),
+    alpaca("You are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.", "COLLECT"),
+    # --- task_medium: obstacle avoidance ---
+    alpaca("You are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.", "UP"),
+    alpaca("You are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).", "LEFT"),
+    alpaca("You are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.", "UP"),
+    alpaca("You are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.", "LEFT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.", "COLLECT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.", "RIGHT"),
+    alpaca("You are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.", "RIGHT"),
+    alpaca("You are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.", "COLLECT"),
+    alpaca("You are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.", "LEFT"),
+    alpaca("You are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).", "LEFT"),
+    alpaca("You are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+    alpaca("You are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.", "LEFT"),
+    alpaca("You are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.", "DOWN"),
+    alpaca("You are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.", "DOWN"),
+    alpaca("You are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.", "COLLECT"),
+    # --- low battery urgency ---
+    alpaca("You are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+    alpaca("You are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.", "RIGHT"),
+    alpaca("You are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.", "UP"),
+    alpaca("You are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.", "UP"),
+    alpaca("You are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.", "COLLECT"),
+    # --- do not collect when not on garbage ---
+    alpaca("You are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.", "RIGHT"),
+    alpaca("You are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.", "RIGHT"),
+]
+with open("garbage_robot_dataset.jsonl", "w") as f:
+    for row in trajectories:
+        f.write(json.dumps(row) + "\n")
+print(f"Wrote {len(trajectories)} samples to garbage_robot_dataset.jsonl")
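
To make the unified `{"text": ...}` format concrete, the sketch below builds one record with the same template as `alpaca()` above, serializes it as a JSONL line, and recovers the observation and action from it. `split_record` is a hypothetical helper added for illustration; it assumes every record follows this exact template.

```python
import json

INSTRUCTION = (
    "You control a garbage collecting robot. "
    "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT"
)


def alpaca(obs: str, action: str) -> dict:
    """Wrap an obs/action pair in the Alpaca-style template used by code.py."""
    return {
        "text": (
            f"### Instruction:\n{INSTRUCTION}\n\n"
            f"### Input:\nENVIRONMENT STATUS:\n{obs}\n\n"
            f"### Response:\n{action}"
        )
    }


def split_record(record: dict) -> tuple[str, str]:
    """Recover (obs, action) from a record; assumes the template above."""
    text = record["text"]
    after_status = text.split("ENVIRONMENT STATUS:\n", 1)[1]
    obs = after_status.split("\n\n### Response:\n", 1)[0]
    action = text.rsplit("### Response:\n", 1)[1].strip()
    return obs, action


sample_obs = "You are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage."
line = json.dumps(alpaca(sample_obs, "COLLECT"))  # one JSONL line
obs, action = split_record(json.loads(line))
print(action)
```

A round-trip check like this is a cheap way to catch records that drifted from the template before they reach fine-tuning.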

code2.py ADDED

	@@ -0,0 +1,34 @@

+from qlearning import QTable, ACTIONS, encode_state
+from environment import GarbageRobotEnv
+from scenarios import SCENARIOS
+import json
+qt = QTable()
+qt.load('qtable.json')
+env = GarbageRobotEnv()
+instruction = '''You are an AI brain controlling a garbage collecting robot.
+Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT'''
+alpaca = '''### Instruction:\n{}\n\n### Input:\nENVIRONMENT STATUS:\n{}\n\n### Response:\n{}'''
+data = []
+for task_id in SCENARIOS:
+    for _ in range(10):  # 10 episodes per task
+        env.reset(task_id)
+        done = False
+        while not done:
+            obs_obj = env.get_observation()
+            obs = {'robot_position': obs_obj.robot_position,
+                   'garbage_positions': list(obs_obj.garbage_positions),
+                   'grid_size': obs_obj.grid_size}
+            state = encode_state(obs)
+            action = ACTIONS[qt.best_action(state)]
+            data.append({'text': alpaca.format(instruction, obs_obj.message, action)})
+            result = env.step(action)
+            done = result['done']
+with open('rl_trajectories.jsonl', 'w') as f:
+    for row in data:
+        f.write(json.dumps(row) + '\n')
+print(f'Generated {len(data)} samples')
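
The `while not done` loop in `code2.py` has no step limit, so it relies on the environment ending every episode. If a greedy policy ever repeats a move that makes no progress, the loop may not terminate. A minimal sketch of a capped rollout follows; `rollout`, `StuckEnv`, `get_message`, and the boolean return of `step` are hypothetical simplifications (the real `env.step` returns a result with a `done` key).

```python
from typing import Callable


def rollout(env, choose_action: Callable[[str], str], max_steps: int = 200) -> list:
    """Collect (message, action) pairs until the episode ends or the cap is hit."""
    samples = []
    done = False
    while not done and len(samples) < max_steps:
        message = env.get_message()
        action = choose_action(message)
        samples.append((message, action))
        done = env.step(action)  # assumption: step() returns the done flag
    return samples


class StuckEnv:
    """Hypothetical environment that never reports done."""

    def get_message(self) -> str:
        return "You are at (0, 0)."

    def step(self, action: str) -> bool:
        return False


samples = rollout(StuckEnv(), lambda message: "LEFT", max_steps=5)
print(len(samples))
```

The cap bounds both runtime and dataset size per episode; a value tied to the scenario's battery budget would be a natural choice.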

environment.py ADDED

	@@ -0,0 +1,500 @@

+"""
+environment.py — Garbage Collecting Robot Core RL Environment.
+Fixes applied:
+  • Battery no longer drains during autonomous CHARGE / UNLOAD_HERE steps.
+  • Recharge guard now checks `not self.done` instead of `self.garbage_positions`
+    so it also fires correctly at episode boundaries.
+"""
+from typing import Any, Dict, Optional, List, Tuple
+from collections import deque
+from models import Observation, State
+from scenarios import SCENARIOS
+# ─────────────────────────────────────────────────────────────
+# BFS PATHFINDING HELPER
+# ─────────────────────────────────────────────────────────────
+def _bfs(
+    start,
+    goal,
+    obstacles,
+    grid_w: int,
+    grid_h: int,
+) -> Tuple[Optional[str], float]:
+    """
+    Breadth-First Search from *start* to *goal* on a rectangular grid.
+    Avoids all cells listed in *obstacles*.  Returns:
+        (first_direction, path_length)  — the single step that begins the
+                                          shortest path, and how many steps
+                                          the full path takes.
+        (None, 0)                        — start == goal (already there).
+        (None, inf)                      — goal is unreachable.
+    Directions: "UP" (+y), "DOWN" (−y), "LEFT" (−x), "RIGHT" (+x).
+    """
+    start = (int(start[0]), int(start[1]))
+    goal  = (int(goal[0]),  int(goal[1]))
+    if start == goal:
+        return (None, 0)
+    obstacle_set = frozenset((int(o[0]), int(o[1])) for o in obstacles)
+    dirs = [("RIGHT", (1, 0)), ("LEFT", (-1, 0)), ("UP", (0, 1)), ("DOWN", (0, -1))]
+    queue: deque = deque([(start, None, 0)])   # (pos, first_move, depth)
+    visited = {start}
+    while queue:
+        pos, first, depth = queue.popleft()
+        for name, (dx, dy) in dirs:
+            npos = (pos[0] + dx, pos[1] + dy)
+            if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
+                continue
+            if npos in obstacle_set or npos in visited:
+                continue
+            move = first if first else name
+            if npos == goal:
+                return (move, depth + 1)
+            visited.add(npos)
+            queue.append((npos, move, depth + 1))
+    return (None, float("inf"))
+# ─────────────────────────────────────────────────────────────
+# ENVIRONMENT
+# ─────────────────────────────────────────────────────────────
+class GarbageRobotEnv:
+    """
+    Core RL Environment for the Garbage Collecting Robot.
+    Robot modes
+    -----------
+    MODE_NORMAL   — agent controls the robot normally.
+    MODE_RECHARGE — battery critically low; robot auto-navigates home,
+                    recharges, then switches back to NORMAL.
+    MODE_UNLOAD   — storage full; robot auto-navigates to unload_station,
+                    empties its bin, then switches back to NORMAL.
+    Autonomous overrides happen *inside* step(): the command the caller
+    sends is silently replaced when the robot is in a non-normal mode.
+    This keeps the external API unchanged while giving the robot
+    self-managing capabilities.
+    FIX: Battery is only decremented for real movement/collection commands,
+         NOT for internal CHARGE or UNLOAD_HERE commands.
+    """
+    MODE_NORMAL   = "normal"
+    MODE_RECHARGE = "recharging"
+    MODE_UNLOAD   = "unloading"
+    # Safety margin added on top of BFS distance when deciding to recharge.
+    RECHARGE_BUFFER = 4
+    def __init__(self):
+        self.current_task_id    = None
+        self.grid_size          = (0, 0)
+        self.robot_position     = [0, 0]
+        self.garbage_positions  = []
+        self.obstacle_positions = []
+        self.battery_level      = 0
+        self.max_battery        = 0
+        self.inventory_count    = 0
+        # Resource management state
+        self.home_position        = [0, 0]
+        self.unload_station       = [0, 0]
+        self.storage_capacity     = 6
+        self.current_storage_load = 0
+        # Episode accounting
+        self.total_reward = 0.0
+        self.steps_taken  = 0
+        self.done         = False
+        # Autonomous navigation mode
+        self._mode = self.MODE_NORMAL
+    # ── Reset ─────────────────────────────────────────────────
+    def reset(self, task_id: str) -> State:
+        if task_id not in SCENARIOS:
+            raise ValueError(f"Task ID '{task_id}' not found in scenarios.")
+        s = SCENARIOS[task_id]
+        self.current_task_id    = task_id
+        self.grid_size          = tuple(s["grid_size"])
+        self.robot_position     = list(s["robot_start"])
+        self.garbage_positions  = [list(g) for g in s["garbage_starts"]]
+        self.obstacle_positions = [list(o) for o in s["obstacle_starts"]]
+        self.battery_level      = s["max_battery"]
+        self.max_battery        = s["max_battery"]
+        self.home_position        = list(s.get("home_position", s["robot_start"]))
+        self.unload_station       = list(s.get("unload_station", [0, self.grid_size[1] - 1]))
+        self.storage_capacity     = s.get("storage_capacity", 6)
+        self.current_storage_load = 0
+        self.inventory_count      = 0
+        self.total_reward = 0.0
+        self.steps_taken  = 0
+        self.done         = False
+        self._mode        = self.MODE_NORMAL
+        return self.state()
+    def reset_custom(
+        self,
+        task_id: str = "task_easy",
+        grid_size=None,
+        robot_start=None,
+        garbage_positions=None,
+        obstacle_positions=None,
+        max_battery=None,
+        storage_capacity=None,
+        home_position=None,
+        unload_station=None,
+    ) -> State:
+        """
+        Dynamic reset: start from a scenario baseline and override any fields.
+        Pass task_id='custom' with all fields supplied to skip scenario lookup.
+        """
+        if task_id in SCENARIOS:
+            s = SCENARIOS[task_id]
+            base_grid       = s["grid_size"]
+            base_robot      = s["robot_start"]
+            base_garbage    = s["garbage_starts"]
+            base_obstacles  = s["obstacle_starts"]
+            base_battery    = s["max_battery"]
+            base_home       = s.get("home_position", s["robot_start"])
+            base_unload     = s.get("unload_station", [0, s["grid_size"][1] - 1])
+            base_capacity   = s.get("storage_capacity", 6)
+        else:
+            base_grid      = (10, 10)
+            base_robot     = (0, 0)
+            base_garbage   = []
+            base_obstacles = []
+            base_battery   = 60
+            base_home      = (0, 0)
+            base_unload    = (9, 0)
+            base_capacity  = 6
+        self.current_task_id    = task_id
+        self.grid_size          = tuple(grid_size)        if grid_size        is not None else tuple(base_grid)
+        self.robot_position     = list(robot_start)       if robot_start      is not None else list(base_robot)
+        self.garbage_positions  = [list(g) for g in garbage_positions]  if garbage_positions  is not None else [list(g) for g in base_garbage]
+        self.obstacle_positions = [list(o) for o in obstacle_positions] if obstacle_positions is not None else [list(o) for o in base_obstacles]
+        self.battery_level      = max_battery             if max_battery      is not None else base_battery
+        self.max_battery        = self.battery_level
+        self.home_position      = list(home_position)     if home_position    is not None else list(base_home)
+        self.unload_station     = list(unload_station)    if unload_station   is not None else list(base_unload)
+        self.storage_capacity   = storage_capacity        if storage_capacity is not None else base_capacity
+        self.current_storage_load = 0
+        self.inventory_count      = 0
+        self.total_reward         = 0.0
+        self.steps_taken          = 0
+        self.done                 = False
+        self._mode                = self.MODE_NORMAL
+        # Remove any garbage placed on top of an obstacle
+        self.garbage_positions = [
+            g for g in self.garbage_positions if g not in self.obstacle_positions
+        ]
+        return self.state()
+    # ── Observation & State helpers ───────────────────────────
+    def _bfs_distance(self, target) -> int:
+        """Return BFS step-count from current robot position to *target*."""
+        _, dist = _bfs(
+            self.robot_position, target,
+            self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+        )
+        return int(dist) if dist != float("inf") else -1
+    def _should_recharge(self) -> bool:
+        """
+        Return True when the robot must leave immediately to reach home
+        before battery runs out.
+        Threshold = BFS distance to home + RECHARGE_BUFFER.
+        A buffer of 4 gives comfortable headroom for obstacle detours.
+        """
+        if self.battery_level <= 1:
+            return True
+        dist = self._bfs_distance(self.home_position)
+        if dist < 0:
+            # Home unreachable via BFS — fall back to Manhattan distance
+            dist = (abs(self.robot_position[0] - self.home_position[0]) +
+                    abs(self.robot_position[1] - self.home_position[1]))
+        return self.battery_level <= (dist + self.RECHARGE_BUFFER)
+    def _should_unload(self) -> bool:
+        """Return True when the storage bin is at capacity."""
+        return self.current_storage_load >= self.storage_capacity
+    def get_observation(self, message: str = "") -> Observation:
+        dist_home = self._bfs_distance(self.home_position)
+        if not message:
+            message = (
+                f"You are at {tuple(self.robot_position)}. "
+                f"Garbage remaining: {len(self.garbage_positions)}. "
+                f"Battery: {self.battery_level}/{self.max_battery}. "
+                f"Storage: {self.current_storage_load}/{self.storage_capacity}. "
+                f"Home (charging): {tuple(self.home_position)} "
+                f"[{dist_home if dist_home >= 0 else 'unreachable'} steps]. "
+                f"Unload station: {tuple(self.unload_station)}. "
+                f"Mode: {self._mode}."
+            )
+        return Observation(
+            grid_size          = self.grid_size,
+            robot_position     = tuple(self.robot_position),
+            garbage_positions  = [tuple(g) for g in self.garbage_positions],
+            obstacle_positions = [tuple(o) for o in self.obstacle_positions],
+            battery_level      = self.battery_level,
+            inventory_count    = self.inventory_count,
+            message            = message,
+            home_position        = tuple(self.home_position),
+            unload_station       = tuple(self.unload_station),
+            storage_capacity     = self.storage_capacity,
+            current_storage_load = self.current_storage_load,
+            distance_from_home   = dist_home,
+            robot_mode           = self._mode,
+        )
+    def state(self) -> State:
+        return State(
+            task_id              = self.current_task_id,
+            total_reward         = self.total_reward,
+            steps_taken          = self.steps_taken,
+            done                 = self.done,
+            robot_mode           = self._mode,
+            current_storage_load = self.current_storage_load,
+            battery_level        = self.battery_level,
+            distance_from_home   = self._bfs_distance(self.home_position),
+        )
+    # ── Autonomous command resolver ────────────────────────────
+    def _resolve_command(self, requested: str) -> Tuple[str, str]:
+        """
+        Determine the *effective* command for this step.
+        When the robot is in MODE_RECHARGE or MODE_UNLOAD the caller's
+        command is replaced by an autonomously-computed one.
+        Returns
+        -------
+        (effective_command, mode_message)
+        """
+        # ── Trigger check (only when in normal mode) ───────────
+        # FIX: use `not self.done` guard instead of `self.garbage_positions`
+        # so recharge still fires even if all garbage is collected this step.
+        if self._mode == self.MODE_NORMAL:
+            if self._should_recharge() and not self.done:
+                self._mode = self.MODE_RECHARGE
+            elif self._should_unload():
+                self._mode = self.MODE_UNLOAD
+        # ── Recharging mode ────────────────────────────────────
+        if self._mode == self.MODE_RECHARGE:
+            if tuple(self.robot_position) == tuple(self.home_position):
+                # Arrived — charge and return to normal
+                self._mode = self.MODE_NORMAL
+                return (
+                    "CHARGE",
+                    (f"Reached charging station {tuple(self.home_position)}. "
+                     f"Battery fully restored to {self.max_battery}. "
+                     f"Resuming garbage collection."),
+                )
+            else:
+                move, dist = _bfs(
+                    self.robot_position, self.home_position,
+                    self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+                )
+                dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
+                return (
+                    move or "UP",
+                    (f"⚡ Battery critical ({self.battery_level}/{self.max_battery}). "
+                     f"Auto-navigating to charging station {tuple(self.home_position)} "
+                     f"[{dist_str}]."),
+                )
+        # ── Unloading mode ─────────────────────────────────────
+        if self._mode == self.MODE_UNLOAD:
+            if tuple(self.robot_position) == tuple(self.unload_station):
+                # Arrived — empty the bin and return to normal
+                freed      = self.current_storage_load
+                self._mode = self.MODE_NORMAL
+                return (
+                    "UNLOAD_HERE",
+                    (f"Reached unload station {tuple(self.unload_station)}. "
+                     f"Emptied {freed} item(s) from storage. "
+                     f"Resuming garbage collection."),
+                )
+            else:
+                move, dist = _bfs(
+                    self.robot_position, self.unload_station,
+                    self.obstacle_positions, self.grid_size[0], self.grid_size[1],
+                )
+                dist_str = f"{int(dist)} steps" if dist != float("inf") else "route blocked"
+                return (
+                    move or "UP",
+                    (f"📦 Storage full ({self.current_storage_load}/{self.storage_capacity}). "
+                     f"Auto-navigating to unload station {tuple(self.unload_station)} "
+                     f"[{dist_str}]."),
+                )
+        # ── Normal mode — use caller's command ─────────────────
+        return (requested, "")
+    # ── Step ──────────────────────────────────────────────────
+    def step(self, command: str) -> Dict[str, Any]:
+        if self.done:
+            obs = self.get_observation("Episode already finished.")
+            return {"observation": obs.dict(), "reward": 0.0, "done": True, "info": {}}
+        self.steps_taken += 1
+        # Resolve autonomous overrides BEFORE battery decrement so that
+        # CHARGE / UNLOAD_HERE commands do NOT consume battery.
+        effective_cmd, mode_message = self._resolve_command(command)
+        # FIX: only drain battery for real movement / collection actions.
+        # Autonomous internal commands (CHARGE, UNLOAD_HERE) are free.
+        if effective_cmd in ("CHARGE", "UNLOAD_HERE"):
+            reward = 0.0
+        else:
+            self.battery_level -= 1
+            reward = -0.1
+        message = mode_message  # may be overwritten below
+        # ── CHARGE (internal — issued autonomously at home) ────
+        if effective_cmd == "CHARGE":
+            self.battery_level = self.max_battery
+            reward += 5.0
+            # message already set from resolver
+        # ── UNLOAD_HERE (internal — issued autonomously at station) ──
+        elif effective_cmd == "UNLOAD_HERE":
+            # The freed-item count is already reported in the resolver's message
+            self.current_storage_load = 0
+            reward += 2.0
+            # message already set from resolver
+        # ── COLLECT ───────────────────────────────────────────
+        elif effective_cmd == "COLLECT":
+            if self.robot_position in self.garbage_positions:
+                self.garbage_positions.remove(self.robot_position)
+                self.inventory_count      += 1
+                self.current_storage_load += 1
+                reward += 10.0
+                message = (
+                    f"Collected garbage! "
+                    f"Storage: {self.current_storage_load}/{self.storage_capacity}."
+                )
+                if self._should_unload() and self.garbage_positions:
+                    self._mode = self.MODE_UNLOAD
+                    message += (
+                        f" Storage full — auto-routing to "
+                        f"unload station {tuple(self.unload_station)}."
+                    )
+            else:
+                reward  -= 1.0
+                message  = "No garbage to collect here."
+        # ── Movement commands ──────────────────────────────────
+        elif effective_cmd in ("UP", "DOWN", "LEFT", "RIGHT"):
+            new_pos = list(self.robot_position)
+            if effective_cmd == "UP":
+                new_pos[1] += 1
+            elif effective_cmd == "DOWN":
+                new_pos[1] -= 1
+            elif effective_cmd == "LEFT":
+                new_pos[0] -= 1
+            elif effective_cmd == "RIGHT":
+                new_pos[0] += 1
+            gw, gh = self.grid_size
+            if 0 <= new_pos[0] < gw and 0 <= new_pos[1] < gh:
+                if new_pos in self.obstacle_positions:
+                    reward -= 5.0
+                    blocked = []
+                    direction_map = {
+                        "UP":    [0,  1], "DOWN":  [0, -1],
+                        "LEFT": [-1,  0], "RIGHT": [1,  0],
+                    }
+                    for d, delta in direction_map.items():
+                        nb = [self.robot_position[0] + delta[0],
+                              self.robot_position[1] + delta[1]]
+                        if nb in self.obstacle_positions:
+                            blocked.append(d)
+                    blocked_str = ", ".join(blocked) if blocked else "none"
+                    message = (
+                        f"BLOCKED! {effective_cmd} leads to an obstacle. "
+                        f"Blocked directions from here: {blocked_str}. "
+                        f"Choose a different direction."
+                    )
+                else:
+                    self.robot_position = new_pos
+                    if not message:
+                        message = f"Moved {effective_cmd}."
+            else:
+                reward -= 1.0
+                if not message:
+                    message = (
+                        f"Hit a wall trying to move {effective_cmd}. "
+                        f"Do NOT try {effective_cmd} again from this position."
+                    )
+        # ── Unknown command ────────────────────────────────────
+        else:
+            reward  -= 1.0
+            message  = f"Invalid command: '{effective_cmd}'."
+        # ── Termination checks ─────────────────────────────────
+        if len(self.garbage_positions) == 0:
+            self.done  = True
+            reward    += 50.0
+            message   += " All garbage collected! Task complete."
+        elif self.battery_level <= 0:
+            self.done  = True
+            message   += " Battery depleted! Game over."
+        self.total_reward += reward
+        return {
+            "observation": self.get_observation(message).dict(),
+            "reward":      reward,
+            "done":        self.done,
+            "info": {
+                "inventory_count":      self.inventory_count,
+                "steps":                self.steps_taken,
+                "current_storage_load": self.current_storage_load,
+                "robot_mode":           self._mode,
+                "autonomous_override":  effective_cmd != command,
+                "original_command":     command,
+                "effective_command":    effective_cmd,
+            },
+        }
+    # ── Grading ───────────────────────────────────────────────
+    def grade(self, task_id: str) -> float:
+        """Normalised [0.0, 1.0] completion score for the leaderboard."""
+        if task_id not in SCENARIOS:
+            return 0.0
+        total = len(SCENARIOS[task_id]["garbage_starts"])
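+        if total == 0:
+            # Guard against ZeroDivisionError for a scenario that defines no garbage
+            return 0.0
+        return min(max(self.inventory_count / total, 0.0), 1.0)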
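
The recharge rule documented in `_should_recharge` (threshold = BFS distance to home + `RECHARGE_BUFFER`, with a Manhattan fallback when BFS finds no route) can be sketched in isolation. This is a minimal illustration, not the code from environment.py: the free functions `bfs_distance` and `should_recharge` are hypothetical stand-ins for the class methods, the -1 "unreachable" return value is inferred from the `dist < 0` check, and `RECHARGE_BUFFER = 4` is the value quoted in the docstring.

```python
from collections import deque
from typing import Iterable, Tuple

Pos = Tuple[int, int]

RECHARGE_BUFFER = 4  # assumption: value quoted in the _should_recharge docstring


def bfs_distance(start: Pos, goal: Pos, obstacles: Iterable[Pos],
                 width: int, height: int) -> int:
    """Shortest 4-connected path length, or -1 if the goal is unreachable."""
    if start == goal:
        return 0
    blocked = set(obstacles)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, y), d = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # outside the grid
            if nxt in blocked or nxt in seen:
                continue
            if nxt == goal:
                return d + 1
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return -1


def should_recharge(battery: int, pos: Pos, home: Pos,
                    obstacles: Iterable[Pos], width: int, height: int) -> bool:
    """Return True when the battery no longer covers the trip home plus the buffer."""
    if battery <= 1:
        return True
    dist = bfs_distance(pos, home, obstacles, width, height)
    if dist < 0:
        # No BFS route: fall back to Manhattan distance, as the original does
        dist = abs(pos[0] - home[0]) + abs(pos[1] - home[1])
    return battery <= dist + RECHARGE_BUFFER
```

On an empty 5×5 grid with the robot at (4, 4) and home at (0, 0), the BFS distance is 8, so under these assumptions the robot turns back once the battery reaches 12 or less.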

fixed_dataset.jsonl ADDED Viewed

The diff for this file is too large to render. See raw diff

fixer.py ADDED Viewed

	@@ -0,0 +1,28 @@

+import json
+input_file = "rl_trajectories.jsonl"
+output_file = "fixed_dataset.jsonl"
+def extract_parts(text):
+    try:
+        # Split once and reuse the pieces instead of splitting twice
+        parts = text.split("### Response:")
+        return parts[0].strip(), parts[1].strip()
+    except (IndexError, AttributeError):
+        # Marker missing, or "text" was not a string: skip this record
+        return None, None
+with open(input_file, "r", encoding="utf-8") as f_in, open(output_file, "w", encoding="utf-8") as f_out:
+    for line in f_in:
+        data = json.loads(line)
+        text = data.get("text", "")
+        user, assistant = extract_parts(text)
+        if user and assistant:
+            new_entry = {
+                "user": user,
+                "assistant": assistant
+            }
+            f_out.write(json.dumps(new_entry) + "\n")
+print("Done. Fixed dataset saved.")
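
To illustrate what the conversion in fixer.py does to a single record, here is a small sketch. The sample record and its `### Instruction:` header are hypothetical, since the real contents of rl_trajectories.jsonl are not rendered in this diff. The sketch also uses `str.partition` rather than the `split` call in fixer.py, so a record with more than one marker keeps everything after the first marker.

```python
import json

MARKER = "### Response:"


def extract_parts(text):
    """Split a prompt/response string at the first marker; (None, None) if absent."""
    user, sep, assistant = text.partition(MARKER)
    if not sep:
        return None, None
    return user.strip(), assistant.strip()


# Hypothetical input record, shaped like one line of a JSONL trajectory file
sample = {"text": "### Instruction:\nRobot at (0, 0).\n### Response:\nUP"}

user, assistant = extract_parts(sample["text"])
converted = json.dumps({"user": user, "assistant": assistant})
print(converted)
```

A record without the marker returns `(None, None)`, which the loop in fixer.py treats as "skip this line".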

frontend/index.html ADDED Viewed

	@@ -0,0 +1,166 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>GarbageBot — RL Control Center</title>
+    <meta name="description" content="Real-time dashboard for the fine-tuned Llama-3.2 garbage collecting robot. Watch Q-learning and LLM policy decisions live.">
+    <link rel="preconnect" href="https://fonts.googleapis.com">
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800&family=JetBrains+Mono:wght@400;600&display=swap" rel="stylesheet">
+    <!-- Use /static/ prefix so FastAPI's StaticFiles middleware serves these correctly
+         on HuggingFace Spaces and any other hosted environment. -->
+    <link rel="stylesheet" href="/static/style.css">
+</head>
+<body>
+    <!-- Animated background -->
+    <div class="bg-orbs">
+        <div class="orb orb-1"></div>
+        <div class="orb orb-2"></div>
+        <div class="orb orb-3"></div>
+        <div class="orb orb-4"></div>
+    </div>
+    <div class="grid-bg"></div>
+    <div class="dashboard">
+        <!-- ── HEADER ── -->
+        <header class="header-bar">
+            <div class="brand">
+                <div class="brand-icon">🤖</div>
+                <div>
+                    <h1>GarbageBot <span class="version-tag">v2.0</span></h1>
+                    <p class="brand-sub">Llama-3.2-3B · Q-Table · BFS</p>
+                </div>
+            </div>
+            <div class="status-strip">
+                <div class="status-pill" id="server-pill">
+                    <span class="pulse-dot" id="status-dot"></span>
+                    <span id="status-label">Connecting…</span>
+                </div>
+                <div class="policy-badge" id="policy-badge">
+                    <span class="badge-icon">⚡</span>
+                    <span id="policy-label">–</span>
+                </div>
+                <div class="status-pill" id="mode-pill">
+                    <span id="mode-label">NORMAL</span>
+                </div>
+            </div>
+            <div class="controls">
+                <select id="task-select">
+                    <option value="task_easy">🟢 Easy  — 5×5</option>
+                    <option value="task_medium">🟡 Medium — 7×7</option>
+                    <option value="task_hard">🔴 Hard  — 10×10</option>
+                </select>
+                <div class="speed-group">
+                    <label class="speed-label">Speed</label>
+                    <input type="range" id="speed-slider" min="100" max="1500" value="500" step="100">
+                    <span id="speed-val">500ms</span>
+                </div>
+                <button id="reset-btn"  class="btn secondary">↺ Reset</button>
+                <button id="auto-btn"   class="btn primary">▶  Run Policy</button>
+                <button id="manual-btn" class="btn outline">⏭ Step</button>
+            </div>
+        </header>
+        <!-- ── MAIN ── -->
+        <main>
+            <!-- Grid world -->
+            <section class="grid-panel panel glass">
+                <div class="grid-header">
+                    <span class="grid-title">Environment</span>
+                    <div class="grid-meta">
+                        <span id="step-counter" class="mono-chip">Step 0</span>
+                        <span id="episode-score-chip" class="mono-chip accent-chip">Score 0.00</span>
+                    </div>
+                </div>
+                <div class="grid-stage" id="grid-stage">
+                    <div id="env-grid" class="grid-world">
+                        <div id="particle-layer" class="particle-layer"></div>
+                    </div>
+                </div>
+                <p class="grid-hint">💡 Click any empty cell to place or remove garbage</p>
+            </section>
+            <!-- Side panel -->
+            <aside class="side-panel">
+                <!-- Telemetry -->
+                <div class="panel glass tele-card">
+                    <h2 class="section-title">Telemetry</h2>
+                    <div class="stat-row">
+                        <div class="stat-icon">🔋</div>
+                        <div class="stat-body">
+                            <div class="stat-label-row">
+                                <span class="stat-label">Battery</span>
+                                <span id="battery-text" class="stat-num">–</span>
+                            </div>
+                            <div class="progress-track">
+                                <div class="progress-fill" id="battery-progress"></div>
+                            </div>
+                        </div>
+                    </div>
+                    <div class="stat-row">
+                        <div class="stat-icon">📦</div>
+                        <div class="stat-body">
+                            <div class="stat-label-row">
+                                <span class="stat-label">Storage</span>
+                                <span id="storage-text" class="stat-num">–</span>
+                            </div>
+                            <div class="progress-track">
+                                <div class="progress-fill" id="storage-progress" style="width: 0%; background: var(--warning);"></div>
+                            </div>
+                        </div>
+                    </div>
+                    <div class="stat-row">
+                        <div class="stat-icon">🗑️</div>
+                        <div class="stat-body">
+                            <div class="stat-label-row">
+                                <span class="stat-label">Total Collected</span>
+                                <span id="inventory-text" class="stat-num big-num">0</span>
+                            </div>
+                        </div>
+                    </div>
+                    <div class="stat-row">
+                        <div class="stat-icon">🏆</div>
+                        <div class="stat-body">
+                            <div class="stat-label-row">
+                                <span class="stat-label">Reward</span>
+                                <span id="score-text" class="stat-num big-num accent">0.00</span>
+                            </div>
+                        </div>
+                    </div>
+                    <!-- Mini reward chart -->
+                    <div class="chart-wrap">
+                        <canvas id="reward-chart" width="290" height="70"></canvas>
+                    </div>
+                </div>
+                <!-- Policy log -->
+                <div class="panel glass log-card">
+                    <div class="log-header">
+                        <h2 class="section-title">Policy Observations</h2>
+                        <button class="clear-btn" id="clear-log">✕ Clear</button>
+                    </div>
+                    <div id="log-feed" class="log-feed">
+                        <p class="placeholder">Awaiting environment reset…</p>
+                    </div>
+                    <p class="log-footer">🤖 Driven by fine-tuned Llama-3.2-3B</p>
+                </div>
+            </aside>
+        </main>
+    </div><!-- /dashboard -->
+    <script src="/static/script.js"></script>
+</body>
+</html>

frontend/script.js ADDED Viewed

	@@ -0,0 +1,515 @@

+/* ═══════════════════════════════════════════════════════
+   GarbageBot — Continuous-World Dashboard Logic
+   Policy chain: Fine-tuned LLM → Q-table → BFS fallback
+   Fix applied:
+     - API_BASE was hardcoded to "http://localhost:7861" which breaks on any
+       hosted environment (HuggingFace Spaces, cloud VMs, etc.).
+       Now uses a relative empty string "" so every fetch goes to the same
+       origin that served the page — works locally and in production without
+       any code change.
+   ═══════════════════════════════════════════════════════ */
+// FIX: use relative paths ("") instead of hardcoded "http://localhost:7861"
+// so the dashboard works on HuggingFace Spaces and any other host automatically.
+const API_BASE = "";
+// ── DOM ───────────────────────────────────────────────────
+const statusDot        = document.getElementById("status-dot");
+const statusLabel      = document.getElementById("status-label");
+const policyBadge      = document.getElementById("policy-badge");
+const policyLabel      = document.getElementById("policy-label");
+const taskSelect       = document.getElementById("task-select");
+const speedSlider      = document.getElementById("speed-slider");
+const speedVal         = document.getElementById("speed-val");
+const resetBtn         = document.getElementById("reset-btn");
+const autoBtn          = document.getElementById("auto-btn");
+const manualBtn        = document.getElementById("manual-btn");
+const clearLogBtn      = document.getElementById("clear-log");
+const envGrid          = document.getElementById("env-grid");
+const particleLayer    = document.getElementById("particle-layer");
+const batteryProgress  = document.getElementById("battery-progress");
+const batteryText      = document.getElementById("battery-text");
+const scoreText        = document.getElementById("score-text");
+const inventoryText    = document.getElementById("inventory-text");
+const stepCounter      = document.getElementById("step-counter");
+const episodeScoreChip = document.getElementById("episode-score-chip");
+const logFeed          = document.getElementById("log-feed");
+const rewardCanvas     = document.getElementById("reward-chart");
+const modePill         = document.getElementById("mode-pill");
+const modeLabel        = document.getElementById("mode-label");
+const storageProgress  = document.getElementById("storage-progress");
+const storageText      = document.getElementById("storage-text");
+// ── State ─────────────────────────────────────────────────
+let autoMode      = false;
+let autoTimer     = null;
+let currentState  = null;
+let robotEntity   = null;
+let stepCount     = 0;
+let totalReward   = 0;
+let rewardHistory = [];
+let maxBattery    = 30;
+let stepDelay     = 500;
+let lastMode      = "normal";
+// World dimensions (set on reset)
+let WORLD_W = 5, WORLD_H = 5;
+const CELL = 52;   // must match CSS --cell
+// ── Speed slider ──────────────────────────────────────────
+speedSlider.addEventListener("input", () => {
+    stepDelay = parseInt(speedSlider.value, 10);
+    speedVal.textContent = `${stepDelay}ms`;
+    const pct = ((stepDelay - 100) / 1400) * 100;
+    speedSlider.style.background = `linear-gradient(90deg, var(--blue) ${pct}%, rgba(255,255,255,.15) ${pct}%)`;
+    syncRobotTransition();
+    if (autoMode) { clearInterval(autoTimer); autoTimer = setInterval(stepEnv, stepDelay); }
+});
+function syncRobotTransition() {
+    if (!robotEntity) return;
+    envGrid.style.setProperty("--move-dur", `${stepDelay}ms`);
+}
+// ── Log helpers ───────────────────────────────────────────
+function addLog(msg, source = "sys") {
+    const ph = logFeed.querySelector(".placeholder");
+    if (ph) ph.remove();
+    const entry  = document.createElement("div");
+    entry.className = "log-entry";
+    const badge = document.createElement("span");
+    badge.className = `log-badge ${source === "q_table" ? "q-table" : source}`;
+    badge.textContent = source.replace("_","-").toUpperCase();
+    const text = document.createElement("span");
+    text.textContent = msg;
+    entry.append(badge, text);
+    logFeed.prepend(entry);
+    while (logFeed.children.length > 65) logFeed.removeChild(logFeed.lastChild);
+}
+clearLogBtn.addEventListener("click", () => {
+    logFeed.innerHTML = `<p class="placeholder">Log cleared…</p>`;
+});
+// ── Mini reward chart ─────────────────────────────────────
+function drawChart() {
+    const ctx = rewardCanvas.getContext("2d");
+    const W = rewardCanvas.width, H = rewardCanvas.height;
+    ctx.clearRect(0, 0, W, H);
+    if (rewardHistory.length < 2) return;
+    const maxR = Math.max(...rewardHistory.map(Math.abs), .1);
+    const step = W / (rewardHistory.length - 1);
+    const pts  = rewardHistory.map((v, i) => [i * step, H - ((v + maxR) / (2 * maxR)) * H]);
+    const grad = ctx.createLinearGradient(0, 0, 0, H);
+    grad.addColorStop(0, "rgba(59,158,255,.5)");
+    grad.addColorStop(1, "rgba(59,158,255,0)");
+    ctx.beginPath();
+    pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
+    ctx.lineTo(pts[pts.length-1][0], H);
+    ctx.lineTo(0, H); ctx.closePath();
+    ctx.fillStyle = grad; ctx.fill();
+    ctx.beginPath();
+    pts.forEach(([x, y], i) => i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y));
+    ctx.strokeStyle = "#3b9eff"; ctx.lineWidth = 2;
+    ctx.lineJoin = "round"; ctx.stroke();
+    const [lx, ly] = pts[pts.length-1];
+    ctx.beginPath(); ctx.arc(lx, ly, 3.5, 0, Math.PI*2);
+    ctx.fillStyle = "#a5c8ff"; ctx.fill();
+}
+// ── Particles ─────────────────────────────────────────────
+function spawnParticles(px, py) {
+    const colors = ["#c084fc","#818cf8","#3b9eff","#2dd4bf","#fbbf24"];
+    for (let i = 0; i < 14; i++) {
+        const p = document.createElement("div");
+        p.className = "particle";
+        const angle = (i / 14) * Math.PI * 2;
+        const dist  = 28 + Math.random() * 42;
+        const size  = 4 + Math.random() * 6;
+        p.style.cssText = `
+            left:${px}px; top:${py}px;
+            width:${size}px; height:${size}px;
+            background:${colors[i % colors.length]};
+            box-shadow:0 0 6px ${colors[i%colors.length]};
+            --tx:translate(${Math.cos(angle)*dist}px,${Math.sin(angle)*dist}px);
+        `;
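+        // renderGrid() rebuilds #particle-layer on reset, so look it up at spawn time
+        (document.getElementById("particle-layer") || particleLayer).appendChild(p);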
+        setTimeout(() => p.remove(), 780);
+    }
+}
+// ── Trail ghost ───────────────────────────────────────────
+function addTrail(left, top) {
+    const g = document.createElement("div");
+    g.className = "trail-ghost";
+    g.style.left = `${left}px`;
+    g.style.top  = `${top}px`;
+    envGrid.appendChild(g);
+    setTimeout(() => g.remove(), 1100);
+}
+// ── World coordinates ─────────────────────────────────────
+function wx(x)    { return x * CELL; }
+function wy(y, H) { return (H - 1 - y) * CELL; }
+// ── Direction → emoji ─────────────────────────────────────
+const DIR_EMOJI = { UP:"🤖", DOWN:"🤖", LEFT:"🤖", RIGHT:"🤖", COLLECT:"🤖" };
+// ── Grid render ───────────────────────────────────────────
+function renderGrid(obs, isReset = false) {
+    const [W, H] = obs.grid_size;
+    WORLD_W = W; WORLD_H = H;
+    const worldPx = W * CELL;
+    const worldPy = H * CELL;
+    if (isReset) {
+        envGrid.innerHTML = "";
+        envGrid.style.width  = `${worldPx}px`;
+        envGrid.style.height = `${worldPy}px`;
+        envGrid.style.gridTemplateColumns = `repeat(${W}, ${CELL}px)`;
+        envGrid.style.gridTemplateRows    = `repeat(${H}, ${CELL}px)`;
+        envGrid.style.backgroundSize = `${CELL}px ${CELL}px, ${CELL}px ${CELL}px, 100% 100%`;
+        // Transparent click-target cells
+        for (let y = H - 1; y >= 0; y--) {
+            for (let x = 0; x < W; x++) {
+                const cell = document.createElement("div");
+                cell.className = "cell";
+                cell.dataset.x = x; cell.dataset.y = y;
+                cell.addEventListener("click", () => toggleGarbage(x, y));
+                envGrid.appendChild(cell);
+            }
+        }
+        // 3D obstacle walls
+        obs.obstacle_positions.forEach(([x, y]) => {
+            const el = document.createElement("div");
+            el.className = "world-obstacle";
+            el.style.left   = `${wx(x)}px`;
+            el.style.top    = `${wy(y, H)}px`;
+            el.style.width  = `${CELL}px`;
+            el.style.height = `${CELL}px`;
+            envGrid.appendChild(el);
+        });
+        // Robot entity
+        robotEntity = document.createElement("div");
+        robotEntity.className   = "robot-entity";
+        robotEntity.textContent = "🤖";
+        robotEntity.style.width  = `${CELL}px`;
+        robotEntity.style.height = `${CELL}px`;
+        robotEntity.style.left   = `${wx(obs.robot_position[0])}px`;
+        robotEntity.style.top    = `${wy(obs.robot_position[1], H)}px`;
+        envGrid.appendChild(robotEntity);
+        // ⚡ Home Station
+        if (obs.home_position) {
+            const home = document.createElement("div");
+            home.className = "world-home";
+            home.style.left = `${wx(obs.home_position[0])}px`;
+            home.style.top  = `${wy(obs.home_position[1], H)}px`;
+            envGrid.appendChild(home);
+        }
+        // 📦 Unload Station
+        if (obs.unload_station) {
+            const unload = document.createElement("div");
+            unload.className = "world-unload";
+            unload.style.left = `${wx(obs.unload_station[0])}px`;
+            unload.style.top  = `${wy(obs.unload_station[1], H)}px`;
+            envGrid.appendChild(unload);
+        }
+        // Particle layer on top
+        const pl = document.createElement("div");
+        pl.id = "particle-layer";
+        pl.className = "particle-layer";
+        envGrid.appendChild(pl);
+        syncRobotTransition();
+    }
+    // Continuous robot move
+    if (robotEntity) {
+        const nl = wx(obs.robot_position[0]);
+        const nt = wy(obs.robot_position[1], H);
+        robotEntity.style.left = `${nl}px`;
+        robotEntity.style.top  = `${nt}px`;
+    }
+    // Re-render garbage
+    document.querySelectorAll(".world-garbage").forEach(g => g.remove());
+    obs.garbage_positions.forEach(([x, y]) => {
+        const el = document.createElement("div");
+        el.className = "world-garbage";
+        el.style.left   = `${wx(x)}px`;
+        el.style.top    = `${wy(y, H)}px`;
+        el.style.width  = `${CELL}px`;
+        el.style.height = `${CELL}px`;
+        el.innerHTML    = `<span>🗑️</span>`;
+        el.addEventListener("click", () => toggleGarbage(x, y));
+        envGrid.appendChild(el);
+    });
+    addLog(obs.message, "sys");
+}
+// ── Telemetry ─────────────────────────────────────────────
+function updateTelemetry(obs, reward, done) {
+    if (obs.battery_level > maxBattery) maxBattery = obs.battery_level;
+    const pct = Math.max(0, (obs.battery_level / maxBattery) * 100);
+    batteryProgress.style.width = `${pct}%`;
+    batteryText.textContent     = `${obs.battery_level} / ${maxBattery}`;
+    if      (pct > 55) batteryProgress.style.background = "#34d399";
+    else if (pct > 25) batteryProgress.style.background = "#fbbf24";
+    else               batteryProgress.style.background = "#fb7185";
+    // Storage update
+    if (obs.storage_capacity) {
+        const sPct = (obs.current_storage_load / obs.storage_capacity) * 100;
+        storageProgress.style.width      = `${sPct}%`;
+        storageProgress.style.background = sPct >= 100 ? "#f59e0b" : "#60a5fa";
+        storageText.textContent          = `${obs.current_storage_load} / ${obs.storage_capacity}`;
+    }
+    // Inventory (total collected)
+    if (inventoryText) {
+        inventoryText.textContent = obs.inventory_count ?? 0;
+    }
+    // Mode updates
+    const mode = obs.robot_mode || "normal";
+    if (mode !== lastMode) {
+        addLog(`Robot mode changed to: ${mode.toUpperCase()}`, "sys");
+        lastMode = mode;
+    }
+    modeLabel.textContent = mode.toUpperCase();
+    modePill.classList.remove("normal", "recharging", "unloading");
+    modePill.classList.add(mode);
+    if (robotEntity) {
+        robotEntity.classList.remove("recharging", "unloading");
+        if (mode !== "normal") robotEntity.classList.add(mode);
+    }
+    if (reward !== undefined) {
+        totalReward += reward;
+        rewardHistory.push(totalReward);
+        if (rewardHistory.length > 80) rewardHistory.shift();
+        scoreText.textContent        = totalReward.toFixed(2);
+        episodeScoreChip.textContent = `Score ${totalReward.toFixed(2)}`;
+        drawChart();
+    }
+    stepCounter.textContent = `Step ${stepCount}`;
+}
+// ── Policy badge ──────────────────────────────────────────
+const POLICY_STYLES = {
+    llm:     { color:"#3b9eff", border:"rgba(59,158,255,.6)" },
+    bfs:     { color:"#2dd4bf", border:"rgba(45,212,191,.6)" },
+    q_table: { color:"#fbbf24", border:"rgba(251,191,36,.6)" },
+    sys:     { color:"#7ea8d8", border:"rgba(126,168,216,.3)" },
+};
+function showPolicy(source, action) {
+    const s = POLICY_STYLES[source] || POLICY_STYLES.sys;
+    policyLabel.textContent       = `${source.replace("_","-").toUpperCase()} → ${action}`;
+    policyBadge.style.borderColor = s.border;
+    policyBadge.style.color       = s.color;
+    policyBadge.classList.add("active");
+}
+// ── BFS fallback ──────────────────────────────────────────
+function bfsMove(rPos, target, obstacles, W, H) {
+    if (rPos[0]===target[0] && rPos[1]===target[1]) return "COLLECT";
+    const obs  = new Set(obstacles.map(([x,y]) => `${x},${y}`));
+    const dirs = [["RIGHT",1,0],["LEFT",-1,0],["UP",0,1],["DOWN",0,-1]];
+    const q    = [{pos:[...rPos], first:null}];
+    const vis  = new Set([`${rPos[0]},${rPos[1]}`]);
+    while (q.length) {
+        const {pos, first} = q.shift();
+        for (const [name, dx, dy] of dirs) {
+            const nx = pos[0]+dx, ny = pos[1]+dy;
+            if (nx<0||nx>=W||ny<0||ny>=H) continue;
+            const key = `${nx},${ny}`;
+            if (obs.has(key)||vis.has(key)) continue;
+            const move = first||name;
+            if (nx===target[0]&&ny===target[1]) return move;
+            vis.add(key); q.push({pos:[nx,ny], first:move});
+        }
+    }
+    return null;
+}
+function nnOrder(start, targets, obstacles, W, H) {
+    // Build the obstacle lookup once instead of on every dist() call
+    const obs  = new Set(obstacles.map(([x,y]) => `${x},${y}`));
+    const dirs = [[1,0],[-1,0],[0,1],[0,-1]];
+    // BFS path length from a to b; Infinity when b is unreachable
+    function dist(a, b) {
+        if (a[0]===b[0] && a[1]===b[1]) return 0;
+        const q   = [{pos:[...a], d:0}];
+        const vis = new Set([`${a[0]},${a[1]}`]);
+        while (q.length) {
+            const {pos, d} = q.shift();
+            for (const [dx, dy] of dirs) {
+                const nx = pos[0]+dx, ny = pos[1]+dy;
+                if (nx<0||nx>=W||ny<0||ny>=H) continue;
+                const k = `${nx},${ny}`;
+                if (obs.has(k)||vis.has(k)) continue;
+                if (nx===b[0]&&ny===b[1]) return d+1;
+                vis.add(k); q.push({pos:[nx,ny], d:d+1});
+            }
+        }
+        return Infinity;
+    }
+    // Greedy nearest-neighbour ordering: repeatedly visit the closest remaining target
+    let rem = [...targets], cur = [...start], ord = [];
+    while (rem.length) {
+        let best = rem[0], bD = dist(cur, best);
+        for (const t of rem) { const d = dist(cur, t); if (d < bD) { bD = d; best = t; } }
+        ord.push(best);
+        rem = rem.filter(t => !(t[0]===best[0]&&t[1]===best[1]));
+        cur = [...best];
+    }
+    return ord;
+}
+function localFallback(obs) {
+    if (!obs.garbage_positions.length) return "UP";
+    const r = obs.robot_position;
+    if (obs.garbage_positions.some(([x,y]) => x===r[0]&&y===r[1])) return "COLLECT";
+    const ordered = nnOrder(r, obs.garbage_positions, obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]);
+    return bfsMove(r, ordered[0], obs.obstacle_positions, obs.grid_size[0], obs.grid_size[1]) || "RIGHT";
+}
+// ── Custom garbage toggle ─────────────────────────────────
+async function toggleGarbage(x, y) {
+    if (!currentState || autoMode) return;
+    if (currentState.obstacle_positions.some(([ox,oy]) => ox===x&&oy===y)) return;
+    if (currentState.robot_position[0]===x && currentState.robot_position[1]===y) return;
+    const has  = currentState.garbage_positions.some(([gx,gy]) => gx===x&&gy===y);
+    const next = has
+        ? currentState.garbage_positions.filter(([gx,gy]) => !(gx===x&&gy===y))
+        : [...currentState.garbage_positions, [x, y]];
+    try {
+        const res = await fetch(`${API_BASE}/configure`, {
+            method: "POST", headers:{"Content-Type":"application/json"},
+            body: JSON.stringify({task_id: taskSelect.value, garbage_positions: next})
+        });
+        if (!res.ok) throw new Error(`HTTP ${res.status}`);
+        const data = await res.json();
+        currentState = data.observation;
+        renderGrid(currentState);
+        addLog(`Garbage ${has?"removed":"placed"} at (${x},${y})  ·  ${next.length} remaining`, "sys");
+    } catch (e) { addLog(`Config error: ${e.message}`, "sys"); }
+}
+// ── Reset ─────────────────────────────────────────────────
+async function resetEnv() {
+    if (autoMode) toggleAutoMode();
+    stepCount=0; totalReward=0; rewardHistory=[];
+    scoreText.textContent        = "0.00";
+    episodeScoreChip.textContent = "Score 0.00";
+    stepCounter.textContent      = "Step 0";
+    policyLabel.textContent      = "–";
+    drawChart();
+    try {
+        const res  = await fetch(`${API_BASE}/reset`, {
+            method:"POST", headers:{"Content-Type":"application/json"},
+            body: JSON.stringify({task_id: taskSelect.value})
+        });
+        if (!res.ok) throw new Error(`HTTP ${res.status}`);
+        const data = await res.json();
+        currentState = data.observation;
+        maxBattery   = currentState.battery_level;
+        logFeed.innerHTML = "";
+        renderGrid(currentState, true);
+        updateTelemetry(currentState);
+        statusDot.className     = "pulse-dot online";
+        statusLabel.textContent = "Connected";
+    } catch (e) {
+        statusDot.className     = "pulse-dot";
+        statusLabel.textContent = "Offline";
+        addLog(`Cannot reach server — is app.py running?`, "sys");
+    }
+}
+// ── Single step ───────────────────────────────────────────
+let stepInFlight = false;   // true while a step request is still pending
+async function stepEnv() {
+    // setInterval does not wait for async callbacks, so skip overlapping ticks
+    if (!currentState || stepInFlight) return;
+    stepInFlight = true;
+    try {
+        // 1. Policy endpoint (LLM / Q-table on server)
+        let action = null, source = "bfs";
+        try {
+            const pr = await fetch(`${API_BASE}/policy`, {
+                method:"POST", headers:{"Content-Type":"application/json"},
+                body: JSON.stringify({message: currentState.message})
+            });
+            if (pr.ok) { const pd = await pr.json(); action=pd.action; source=pd.source||"llm"; }
+        } catch (_) { /* policy endpoint unreachable: use the local fallback below */ }
+        // 2. Local BFS fallback
+        if (!action) { action = localFallback(currentState); source = "bfs"; }
+        showPolicy(source, action);
+        // 3. Execute
+        try {
+            const res  = await fetch(`${API_BASE}/step`, {
+                method:"POST", headers:{"Content-Type":"application/json"},
+                body: JSON.stringify({command: action})
+            });
+            if (!res.ok) throw new Error(`HTTP ${res.status}`);
+            const data = await res.json();
+            const wasCollect = action === "COLLECT";
+            stepCount++;   // count only steps the server accepted
+            currentState = data.observation;
+            renderGrid(currentState);
+            updateTelemetry(currentState, data.reward, data.done);
+            // Collect animation + particles
+            if (wasCollect && robotEntity) {
+                robotEntity.classList.add("collecting");
+                setTimeout(() => robotEntity.classList.remove("collecting"), 440);
+                // parseInt yields NaN when style.left/top is unset, so fall back to 0
+                const cx = (parseInt(robotEntity.style.left, 10) || 0) + CELL/2;
+                const cy = (parseInt(robotEntity.style.top, 10)  || 0) + CELL/2;
+                spawnParticles(cx, cy);
+            }
+            const reward = Number(data.reward) || 0;   // tolerate a missing reward in the log line
+            const sign = reward >= 0 ? "+" : "";
+            addLog(`${action}  ·  ${sign}${reward.toFixed(2)}`, source);
+            if (data.done) {
+                addLog(`🏁 Episode complete  ·  total ${totalReward.toFixed(2)}`, "sys");
+                if (autoMode) toggleAutoMode();
+            }
+        } catch (e) {
+            addLog(`Step error: ${e.message}`, "sys");
+            if (autoMode) toggleAutoMode();
+        }
+    } finally {
+        stepInFlight = false;
+    }
+}
+// ── Auto mode ─────────────────────────────────────────────
+function toggleAutoMode() {
+    autoMode = !autoMode;
+    if (autoMode) {
+        autoBtn.textContent = "⏹  Stop";
+        autoBtn.className   = "btn stop";
+        autoTimer = setInterval(stepEnv, stepDelay);
+    } else {
+        autoBtn.textContent = "▶  Run Policy";
+        autoBtn.className   = "btn primary";
+        clearInterval(autoTimer);
+    }
+}
+// ── Event listeners ───────────────────────────────────────
+resetBtn .addEventListener("click",  resetEnv);
+autoBtn  .addEventListener("click",  toggleAutoMode);
+manualBtn.addEventListener("click",  stepEnv);
+taskSelect.addEventListener("change", resetEnv);
+// ── Boot ──────────────────────────────────────────────────
+resetEnv();
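
The BFS fallback in `frontend/script.js` (`bfsMove`) returns only the first move toward a target, which makes it hard to check in isolation. The sketch below is a hypothetical variant, `bfsPath`, that is not part of this commit. It assumes the same grid convention as `bfsMove` (positions are `[x, y]`, `UP` increases `y`, obstacles are a list of `[x, y]` pairs) and returns the whole move list, so the detour logic can be tested outside the browser.

```javascript
// Hypothetical helper, not part of the commit: returns the full list of moves
// from `start` to `target`, or null when the target is unreachable.
function bfsPath(start, target, obstacles, W, H) {
    const blocked = new Set(obstacles.map(([x, y]) => `${x},${y}`));
    // Same direction order and axis convention as bfsMove (UP increases y).
    const dirs = [["RIGHT", 1, 0], ["LEFT", -1, 0], ["UP", 0, 1], ["DOWN", 0, -1]];
    const queue = [{ pos: [...start], moves: [] }];
    const seen = new Set([`${start[0]},${start[1]}`]);
    let head = 0; // index-based dequeue avoids the O(n) cost of Array.shift()
    while (head < queue.length) {
        const { pos, moves } = queue[head++];
        if (pos[0] === target[0] && pos[1] === target[1]) return moves;
        for (const [name, dx, dy] of dirs) {
            const nx = pos[0] + dx, ny = pos[1] + dy;
            if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue; // off the grid
            const key = `${nx},${ny}`;
            if (blocked.has(key) || seen.has(key)) continue;      // wall or already visited
            seen.add(key);
            queue.push({ pos: [nx, ny], moves: [...moves, name] });
        }
    }
    return null;
}

// 3x3 grid with a wall at (1,0): the robot has to detour through row y = 1.
const path = bfsPath([0, 0], [2, 0], [[1, 0]], 3, 3);
console.log(path);
```

Unlike `bfsMove`, this sketch returns an empty list when the robot already stands on the target; the caller would decide whether that means `COLLECT`.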

frontend/style.css ADDED

	@@ -0,0 +1,634 @@

+/* ═══════════════════════════════════════════════════════
+   GarbageBot — Bright Light Mode Dashboard CSS
+   ═══════════════════════════════════════════════════════ */
+/* ── Design tokens ────────────────────────────────────── */
+:root {
+    /* Background — bright soft white/cyan */
+    --bg:           #f1f5f9;
+    --surface:      rgba(255, 255, 255, 0);
+    --surface-2:    rgba(255, 255, 255, 0);
+    --surface-hi:   rgba(255, 255, 255, 1);
+    --border:       rgba(0, 0, 0, 0.08);
+    --border-hi:    rgba(0, 0, 0, 0.16);
+    --border-glow:  rgba(59, 130, 246, 0.3);
+    --text:         #1e293b;
+    --text-muted:   #64748b;
+    --text-dim:     #94a3b8;
+    /* Vibrant accents (slightly darker for contrast on white) */
+    --blue:         #3b82f6;
+    --blue-glow:    rgba(59, 130, 246, 0.45);
+    --indigo:       #6366f1;
+    --indigo-glow:  rgba(99, 102, 241, 0.45);
+    --teal:         #14b8a6;
+    --teal-glow:    rgba(20, 184, 166, 0.4);
+    --purple:       #a855f7;
+    --purple-glow:  rgba(168, 85, 247, 0.45);
+    --success:      #10b981;
+    --success-glow: rgba(16, 185, 129, 0.4);
+    --warning:      #f59e0b;
+    --warning-glow: rgba(245, 158, 11, 0.4);
+    --danger:       #ef4444;
+    --danger-glow:  rgba(239, 68, 68, 0.5);
+    --neon:         #0ea5e9;
+    --neon-glow:    rgba(14, 165, 233, 0.35);
+    /* World */
+    --floor:        #ffffff;
+    --floor-light:  #f8fafc;
+    --wall:         #e2e8f0;
+    --wall-top:     #f1f5f9;
+    --wall-shadow:  rgba(15, 23, 42, 0.15);
+    --radius:       16px;
+    --radius-sm:    10px;
+    --radius-xs:    6px;
+    --cell:         52px;
+    --gap:          3px;
+    --pad:          10px;
+    --font:  'Inter', sans-serif;
+    --mono:  'JetBrains Mono', monospace;
+}
+/* ── Reset ─────────────────────────────────────────────── */
+*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+html { font-size: 15px; }
+body {
+    font-family: var(--font);
+    background: var(--bg);
+    color: var(--text);
+    min-height: 100vh;
+    overflow-x: hidden;
+}
+/* ── Animated background orbs ───────────────────────────── */
+.bg-orbs {
+    position: fixed; inset: 0;
+    pointer-events: none; z-index: 0; overflow: hidden;
+}
+.orb {
+    position: absolute; border-radius: 50%;
+    filter: blur(100px);
+    animation: drift 28s ease-in-out infinite alternate;
+}
+/* Opacities remain similar (since background is white, colors will act like pale washes) */
+.orb-1 { width:700px;height:700px; background:radial-gradient(circle,var(--blue) 0%,transparent 70%); opacity:.15; left:-200px;top:-150px; animation-delay:0s; }
+.orb-2 { width:600px;height:600px; background:radial-gradient(circle,var(--purple) 0%,transparent 70%); opacity:.15; right:-180px;bottom:-100px; animation-delay:-9s; }
+.orb-3 { width:500px;height:500px; background:radial-gradient(circle,var(--teal) 0%,transparent 70%); opacity:.1; left:42%;top:25%; animation-delay:-18s; }
+.orb-4 { width:350px;height:350px; background:radial-gradient(circle,var(--warning) 0%,transparent 70%); opacity:.1; right:30%;top:-80px; animation-delay:-5s; }
+@keyframes drift {
+    from { transform: translate(0, 0) scale(1); }
+    to   { transform: translate(50px, 35px) scale(1.12); }
+}
+/* Dot-grid background */
+.grid-bg {
+    position: fixed; inset: 0; z-index: 0; pointer-events: none;
+    background-image: radial-gradient(circle, rgba(15,23,42,.06) 1px, transparent 1px);
+    background-size: 28px 28px;
+}
+/* ── Layout ─────────────────────────────────────────────── */
+.dashboard {
+    position: relative; z-index: 1;
+    max-width: 1360px;
+    margin: 0 auto;
+    padding: 1.25rem 1.5rem;
+    display: flex; flex-direction: column; gap: 1.1rem;
+    min-height: 100vh;
+}
+/* ── Glass panel ─────────────────────────────────────────── */
+.panel {
+    background: var(--surface);
+    border: 1px solid var(--border);
+    border-radius: var(--radius);
+}
+.glass {
+    backdrop-filter: blur(24px) saturate(180%);
+    -webkit-backdrop-filter: blur(24px) saturate(180%);
+    box-shadow: 0 4px 48px rgba(15, 23, 42, 0.06), inset 0 1px 0 rgba(255,255,255,1);
+}
+/* ── HEADER ─────────────────────────────────────────────── */
+.header-bar {
+    display: flex; align-items: center; gap: 1.2rem;
+    padding: .9rem 1.4rem;
+    flex-wrap: wrap;
+    background: linear-gradient(135deg, rgba(255,255,255,0.92) 0%, rgba(248,250,252,0.88) 100%);
+    border: 1px solid var(--border);
+    border-top: 1px solid rgba(255,255,255,1);
+    border-radius: var(--radius);
+    backdrop-filter: blur(28px);
+    box-shadow: 0 2px 36px rgba(15,23,42,0.05), 0 0 60px rgba(59,130,246,0.06), inset 0 1px 0 rgba(255,255,255,1);
+}
+.brand { display: flex; align-items: center; gap: .9rem; }
+.brand-icon {
+    font-size: 2rem;
+    filter: drop-shadow(0 0 12px rgba(99,102,241,.4));
+    animation: bob 3s ease-in-out infinite;
+}
+@keyframes bob {
+    0%,100%{ transform:translateY(0) rotate(-2deg); }
+    50%     { transform:translateY(-5px) rotate(2deg); }
+}
+h1 {
+    font-size: 1.4rem; font-weight: 800;
+    color: var(--text);
+    background: linear-gradient(135deg, var(--blue) 0%, var(--purple) 100%);
+    -webkit-background-clip: text; background-clip: text;
+    -webkit-text-fill-color: transparent;
+    letter-spacing: -.04em;
+}
+.version-tag {
+    font-size: .62rem; font-weight: 700;
+    background: rgba(99,102,241,.1); color: var(--indigo);
+    padding: 2px 7px; border-radius: 20px;
+    border: 1px solid rgba(99,102,241,.25);
+    vertical-align: middle; margin-left: 6px;
+    -webkit-text-fill-color: initial;
+}
+.brand-sub { font-size: .72rem; color: var(--text-muted); font-family: var(--mono); margin-top: 3px; }
+/* Status strip */
+.status-strip { display:flex; gap:.7rem; align-items:center; margin-left:auto; }
+.status-pill {
+    display:flex; align-items:center; gap:.45rem;
+    font-size:.8rem; font-family:var(--mono);
+    padding:.35rem .9rem;
+    border-radius:20px;
+    background:rgba(255,255,255,.6);
+    border:1px solid var(--border);
+    box-shadow: 0 2px 4px rgba(0,0,0,0.02);
+}
+.pulse-dot {
+    width:9px;height:9px;border-radius:50%;
+    background:var(--danger);
+    box-shadow:0 0 6px var(--danger-glow);
+    transition:all .4s;
+}
+.pulse-dot.online {
+    background: var(--success);
+    box-shadow: 0 0 6px var(--success-glow);
+    animation: blink 2.2s ease-in-out infinite;
+}
+@keyframes blink { 0%,100%{opacity:1} 50%{opacity:.4} }
+.policy-badge {
+    display:flex; align-items:center; gap:.4rem;
+    font-size:.8rem; font-family:var(--mono);
+    padding:.35rem .9rem;
+    border-radius:20px;
+    background: linear-gradient(135deg,rgba(59,130,246,.08),rgba(168,85,247,.08));
+    border:1px solid rgba(59,130,246,.25);
+    transition:all .4s;
+    color: var(--text);
+}
+.policy-badge.active { border-color:rgba(59,130,246,.5); box-shadow:0 0 12px rgba(59,130,246,.1); }
+.badge-icon { font-size:.9rem; }
+#mode-pill {
+    background: rgba(255, 255, 255, 0.8);
+    color: var(--text-muted);
+    font-weight: 700;
+    transition: all 0.3s ease;
+}
+#mode-pill.recharging { background: var(--blue); color: #fff; box-shadow: 0 0 12px var(--blue-glow); }
+#mode-pill.unloading { background: var(--warning); color: #fff; box-shadow: 0 0 12px var(--warning-glow); }
+#mode-pill.normal { background: rgba(0, 0, 0, 0.05); color: var(--text-muted); }
+/* Controls */
+.controls { display:flex; gap:.65rem; align-items:center; flex-wrap:wrap; }
+select {
+    background:rgba(255,255,255,.6);
+    color:var(--text);
+    border:1px solid var(--border);
+    padding:.52rem .9rem;
+    border-radius:var(--radius-sm);
+    font-size:.85rem; font-family:var(--font);
+    outline:none; cursor:pointer;
+    transition:border-color .2s,box-shadow .2s;
+    box-shadow: 0 1px 3px rgba(0,0,0,.02);
+}
+select:hover,select:focus { border-color:var(--blue); box-shadow:0 0 0 3px rgba(59,130,246,.15); }
+/* Speed control */
+.speed-group {
+    display:flex; align-items:center; gap:.5rem;
+    padding:.4rem .85rem;
+    background:rgba(255,255,255,.6);
+    border:1px solid var(--border);
+    border-radius:var(--radius-sm);
+}
+.speed-label { font-size:.75rem; color:var(--text-muted); }
+#speed-slider {
+    -webkit-appearance:none; appearance:none;
+    width:85px; height:4px;
+    background:linear-gradient(90deg, var(--blue) 0%, rgba(0,0,0,.08) 0%);
+    border-radius:2px; outline:none; cursor:pointer;
+    transition:background .1s;
+}
+#speed-slider::-webkit-slider-thumb {
+    -webkit-appearance:none;
+    width:16px;height:16px;border-radius:50%;
+    background:var(--blue);
+    box-shadow:0 0 6px var(--blue-glow);
+    cursor:pointer;
+    transition:transform .15s;
+}
+#speed-slider::-webkit-slider-thumb:active { transform:scale(1.25); }
+#speed-val { font-size:.75rem;color:var(--text-muted);font-family:var(--mono);min-width:40px;text-align:right; }
+/* Buttons */
+.btn {
+    padding:.55rem 1.15rem;
+    border-radius:var(--radius-sm);
+    font-size:.875rem; font-weight:700;
+    cursor:pointer; border:none;
+    transition:all .18s ease;
+    position:relative; overflow:hidden;
+    white-space:nowrap; letter-spacing:.01em;
+}
+.btn::after {
+    content:''; position:absolute; inset:0;
+    background:rgba(0,0,0,0); transition:background .18s;
+}
+.btn:hover::after { background:rgba(0,0,0,.03); }
+.btn:active { transform:scale(.96); }
+.btn.primary {
+    background:linear-gradient(135deg, var(--blue), var(--indigo));
+    color:#fff;
+    box-shadow:0 4px 18px var(--blue-glow);
+}
+.btn.primary:hover { box-shadow:0 6px 24px var(--blue-glow); transform:translateY(-1px); }
+.btn.secondary {
+    background:rgba(255,255,255,1);
+    color:var(--text); border:1px solid var(--border);
+    box-shadow: 0 1px 3px rgba(0,0,0,0.05);
+}
+.btn.secondary:hover { border-color:var(--border-hi); }
+.btn.outline {
+    background:transparent; color:var(--text);
+    border:1px solid var(--border);
+}
+.btn.outline:hover { border-color:var(--border-hi); background:rgba(0,0,0,.02); }
+.btn.stop {
+    background:var(--danger);   /* both gradient stops were #ef4444, so a flat fill renders the same */
+    color:#fff;
+    box-shadow:0 4px 18px var(--danger-glow);
+}
+.btn.stop:hover { box-shadow:0 6px 24px var(--danger-glow); transform:translateY(-1px); }
+/* ── MAIN LAYOUT ────────────────────────────────────────── */
+main {
+    display:grid;
+    grid-template-columns:1fr 310px;
+    gap:1.1rem;
+    flex:1;
+}
+/* ── GRID PANEL ─────────────────────────────────────────── */
+.grid-panel {
+    display:flex; flex-direction:column;
+    padding:1.25rem; gap:.75rem;
+}
+.grid-header {
+    display:flex; align-items:center; justify-content:space-between;
+}
+.grid-title {
+    font-size:.75rem;font-weight:700;
+    text-transform:uppercase;letter-spacing:.1em;color:var(--text-muted);
+}
+.grid-meta { display:flex; gap:.5rem; }
+.mono-chip {
+    font-family:var(--mono); font-size:.75rem;
+    padding:.22rem .65rem; border-radius:20px;
+    background:rgba(0,0,0,.03);
+    border:1px solid var(--border); color:var(--text-muted);
+}
+.accent-chip {
+    color:var(--blue); border-color:rgba(59,130,246,.25);
+    background:rgba(59,130,246,.08);
+}
+.grid-stage {
+    flex:1; display:flex; align-items:center; justify-content:center;
+    position:relative;
+}
+/* ─── The World ──────────────────────────────────────────── */
+.grid-world {
+    display:grid;
+    gap:0;                        /* no cell gap — seamless floor */
+    position:relative;
+    border-radius:var(--radius);
+    overflow:hidden;
+    background: var(--floor);
+    border:1px solid rgba(0,0,0,.06);
+    box-shadow:
+        0 8px 30px rgba(15,23,42,.06),
+        inset 0 0 20px rgba(0,0,0,.02);
+    /* Continuous floor: subtle tile lines */
+    background-image:
+        linear-gradient(rgba(0,0,0,.03) 1px, transparent 1px),
+        linear-gradient(90deg, rgba(0,0,0,.03) 1px, transparent 1px);
+    background-size:
+        var(--cell) var(--cell),
+        var(--cell) var(--cell);
+}
+/* Transparent click-target cells — world feels seamless */
+.cell {
+    width:var(--cell); height:var(--cell);
+    background:transparent;
+    position:relative; z-index:2;
+    cursor:pointer;
+    transition:background .18s;
+}
+.cell:hover { background:rgba(0,0,0,.025); }
+/* ── OBSTACLES — 3D walls ────────────────────────────────── */
+.world-obstacle {
+    position:absolute; z-index:10;
+    width:var(--cell); height:var(--cell);
+    border-radius:4px;
+    background:linear-gradient(160deg, var(--wall-top) 0%, var(--wall) 45%, #94a3b8 100%);
+    border:1px solid rgba(0,0,0,.15);
+    border-top-color:rgba(255,255,255,.8);
+    border-left-color:rgba(255,255,255,.4);
+    box-shadow:
+        inset -1px 0 0 rgba(0,0,0,.08),
+        0 6px 0 0 #94a3b8,                      /* 3D depth */
+        0 8px 12px rgba(15,23,42,.25);          /* floor shadow */
+    overflow:hidden;
+}
+.world-obstacle::before {
+    content:''; position:absolute; inset:0;
+    background:linear-gradient(180deg,rgba(255,255,255,.2) 0%,transparent 40%);
+}
+/* stone texture lines */
+.world-obstacle::after {
+    content:''; position:absolute; inset:0;
+    background:repeating-linear-gradient(
+        0deg, transparent, transparent 14px,
+        rgba(0,0,0,.03) 14px, rgba(0,0,0,.03) 15px
+    ),
+    repeating-linear-gradient(
+        90deg, transparent, transparent 14px,
+        rgba(0,0,0,.02) 14px, rgba(0,0,0,.02) 15px
+    );
+}
+/* ── GARBAGE — glowing litter ────────────────────────────── */
+.world-garbage {
+    position:absolute; z-index:8;
+    width:var(--cell); height:var(--cell);
+    display:flex; align-items:center; justify-content:center;
+    font-size:1.4rem;
+    cursor:pointer;
+    border-radius:var(--radius-xs);
+    background:rgba(168,85,247,.1);
+    border:1px solid rgba(168,85,247,.3);
+    animation: garbo-spawn .35s cubic-bezier(.34,1.56,.64,1) both;
+    transition:transform .15s;
+}
+.world-garbage:hover { transform:scale(1.12); }
+@keyframes garbo-spawn {
+    from { transform:scale(0) rotate(-25deg); opacity:0; }
+    to   { transform:scale(1) rotate(0);      opacity:1; }
+}
+/* glow ring */
+.world-garbage::before {
+    content:''; position:absolute; inset:-3px; border-radius:inherit;
+    border:1.5px solid rgba(168,85,247,.4);
+    animation:garbo-ring 2.4s ease-out infinite;
+}
+@keyframes garbo-ring {
+    0%   { opacity:.6; transform:scale(1); }
+    70%  { opacity:0;  transform:scale(1.45); }
+    100% { opacity:0;  transform:scale(1.45); }
+}
+/* bounce */
+.world-garbage span { display:block; animation:garbo-bob 2.2s ease-in-out infinite; filter: drop-shadow(0 4px 6px rgba(168,85,247,.2)); }
+@keyframes garbo-bob {
+    0%,100%{ transform:translateY(0); }
+    50%    { transform:translateY(-4px); }
+}
+/* ── STATIONS ───────────────────────────────────────────── */
+.world-home, .world-unload {
+    position: absolute;
+    width: var(--cell); height: var(--cell);
+    display: flex; align-items: center; justify-content: center;
+    font-size: 1.2rem;
+    border-radius: 4px;
+    z-index: 5;
+    pointer-events: none;
+}
+.world-home {
+    background: rgba(59, 130, 246, 0.08);
+    border: 2px dashed rgba(59, 130, 246, 0.3);
+}
+.world-home::before {
+    content: '⚡'; filter: drop-shadow(0 0 8px var(--blue));
+}
+.world-unload {
+    background: rgba(245, 158, 11, 0.08);
+    border: 2px dashed rgba(245, 158, 11, 0.3);
+}
+.world-unload::before {
+    content: '📦'; filter: drop-shadow(0 0 8px var(--warning));
+}
+/* ── ROBOT ───────────────────────────────────────────────── */
+.robot-entity {
+    position:absolute; z-index:30;
+    width:var(--cell); height:var(--cell);
+    display:flex; align-items:center; justify-content:center;
+    font-size:1.3rem;
+    border-radius:var(--radius-sm);
+    /* vivid gradient body */
+    background:linear-gradient(145deg, #60a5fa, #818cf8 50%, #6366f1);
+    border:1px solid rgba(255,255,255,.6);
+    border-top-color:rgba(255,255,255,.9);
+    /* multi-layer shadow */
+    box-shadow:
+        0 0 0 1px rgba(59,130,246,.2),
+        0 4px 12px rgba(59,130,246,.3),
+        0 8px 24px rgba(15,23,42,.2);
+    transition:
+        left   var(--move-dur,350ms) linear,
+        top    var(--move-dur,350ms) linear,
+        background 0.4s ease,
+        box-shadow 0.4s ease;
+    will-change: left, top;
+}
+.robot-entity.recharging {
+    background: linear-gradient(145deg, #3b82f6, #60a5fa);
+    box-shadow: 0 0 20px var(--blue-glow);
+}
+.robot-entity.unloading {
+    background: linear-gradient(145deg, #f59e0b, #fbbf24);
+    box-shadow: 0 0 20px var(--warning-glow);
+}
+/* top highlight */
+.robot-entity::before {
+    content:''; position:absolute; top:2px; left:3px; right:3px; height:40%;
+    background:linear-gradient(180deg,rgba(255,255,255,.5) 0%,transparent 100%);
+    border-radius: 4px 4px 50% 50%;
+}
+/* ground shadow */
+.robot-entity::after {
+    content:''; position:absolute; bottom:-12px; left:50%;
+    transform:translateX(-50%);
+    width:80%; height:10px;
+    background:rgba(15,23,42,.25);
+    border-radius:50%; filter:blur(4px);
+}
+/* Collect burst */
+.robot-entity.collecting {
+    animation:robot-collect .42s ease-out;
+}
+@keyframes robot-collect {
+    0%   { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
+    35%  { box-shadow:0 0 0 2px var(--purple),0 12px 35px var(--purple-glow),0 20px 45px rgba(15,23,42,.15); }
+    100% { box-shadow:0 0 0 1px rgba(59,130,246,.2),0 4px 12px rgba(59,130,246,.3),0 8px 24px rgba(15,23,42,.2); }
+}
+/* ── Trail ghost ────────────────────────────────────────── */
+.trail-ghost {
+    position:absolute; z-index:6;
+    width:var(--cell); height:var(--cell);
+    border-radius:var(--radius-sm);
+    background:rgba(59,130,246,.08);
+    border:1px solid rgba(59,130,246,.15);
+    pointer-events:none;
+    animation:trail-fade 1.1s ease-out forwards;
+}
+@keyframes trail-fade {
+    from { opacity:1; transform:scale(1); }
+    to   { opacity:0;  transform:scale(.82); }
+}
+/* ── Particle burst ─────────────────────────────────────── */
+.particle-layer { position:absolute; inset:0; pointer-events:none; z-index:40; }
+.particle {
+    position:absolute;
+    border-radius:50%; pointer-events:none;
+    animation:pfx .75s ease-out forwards;
+}
+@keyframes pfx {
+    0%   { transform:translate(0,0) scale(1); opacity:1; }
+    100% { transform:var(--tx) scale(0);       opacity:0; }
+}
+/* ── Grid hint ──────────────────────────────────────────── */
+.grid-hint { font-size:.72rem; color:var(--text-dim); text-align:center; }
+/* ── SIDE PANEL ─────────────────────────────────────────── */
+.side-panel { display:flex; flex-direction:column; gap:1.1rem; }
+.section-title {
+    font-size:.72rem; font-weight:700;
+    text-transform:uppercase; letter-spacing:.12em; color:var(--text-muted);
+    margin-bottom:.9rem;
+}
+/* Telemetry card */
+.tele-card { padding:1.2rem; }
+.stat-row { display:flex; align-items:center; gap:.75rem; margin-bottom:.95rem; }
+.stat-icon { font-size:1.3rem; flex-shrink:0; width:34px; text-align:center; filter:drop-shadow(0 2px 4px rgba(0,0,0,.1)); }
+.stat-body { flex:1; }
+.stat-label-row { display:flex; justify-content:space-between; align-items:baseline; margin-bottom:.35rem; }
+.stat-label { font-size:.78rem; color:var(--text-muted); }
+.stat-num { font-family:var(--mono); font-size:.85rem; font-weight:600; color:var(--text); }
+.stat-num.big-num { font-size:1.55rem; font-weight:800; }
+.stat-num.accent { color:var(--blue); }
+.progress-track {
+    height:7px; background:rgba(0,0,0,.06);
+    border-radius:4px; overflow:hidden;
+}
+.progress-fill {
+    height:100%; border-radius:4px;
+    background:var(--success);
+    transition:width .45s ease, background .45s ease;
+    position:relative; overflow:hidden;
+}
+.progress-fill::after {
+    content:''; position:absolute; inset:0;
+    background:linear-gradient(90deg,transparent,rgba(255,255,255,.4),transparent);
+    animation:shimmer 1.8s linear infinite;
+}
+@keyframes shimmer { from{transform:translateX(-100%)} to{transform:translateX(100%)} }
+/* Mini chart */
+.chart-wrap { margin-top:.5rem; }
+#reward-chart { width:100%; height:68px; border-radius:var(--radius-xs); }
+/* Log card */
+.log-card { padding:1.2rem; display:flex; flex-direction:column; flex:1; min-height:0; }
+.log-header { display:flex; justify-content:space-between; align-items:center; margin-bottom:.7rem; }
+.log-header .section-title { margin-bottom:0; }
+.clear-btn { font-size:.7rem; color:var(--text-muted); background:none; border:none; cursor:pointer; transition:color .2s; }
+.clear-btn:hover { color:var(--danger); }
+.log-feed {
+    flex:1; display:flex; flex-direction:column; gap:.45rem;
+    overflow-y:auto; max-height:280px; padding-right:.2rem;
+}
+.placeholder { font-size:.8rem; color:var(--text-dim); text-align:center; padding:1rem; }
+.log-entry {
+    display:flex; gap:.6rem; align-items:flex-start;
+    font-size:.77rem; line-height:1.45;
+    padding:.55rem .7rem;
+    border-radius:var(--radius-sm);
+    background:rgba(255,255,255,1);
+    border:1px solid rgba(0,0,0,.04);
+    box-shadow: 0 1px 3px rgba(15,23,42,.03);
+    color: var(--text);
+    animation:slide-in .22s cubic-bezier(.22,1,.36,1);
+    transition:border-color .2s, box-shadow .2s;
+}
+.log-entry:hover { border-color:var(--border-hi); box-shadow: 0 2px 6px rgba(15,23,42,.06); }
+@keyframes slide-in { from{opacity:0;transform:translateY(5px)} to{opacity:1;transform:translateY(0)} }
+.log-badge {
+    font-family:var(--mono); font-size:.64rem; font-weight:700;
+    padding:2px 6px; border-radius:4px; flex-shrink:0; margin-top:2px;
+}
+.log-badge.llm     { background:rgba(59,130,246,.12); color:var(--blue); }
+.log-badge.bfs     { background:rgba(20,184,166,.12); color:var(--teal); }
+.log-badge.q-table { background:rgba(245,158,11,.15); color:#d97706; }
+.log-badge.sys     { background:rgba(0,0,0,.05);      color:var(--text-muted); }
+.log-footer {
+    font-size:.7rem; color:var(--text-dim);
+    text-align:center; margin-top:.6rem; padding-top:.55rem;
+    border-top:1px solid var(--border);
+}
+/* ── Scrollbar ──────────────────────────────────────────── */
+::-webkit-scrollbar          { width:5px; }
+::-webkit-scrollbar-track    { background:transparent; }
+::-webkit-scrollbar-thumb    { background:rgba(0,0,0,.15); border-radius:3px; }
+::-webkit-scrollbar-thumb:hover { background:rgba(0,0,0,.25); }
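
The `pfx` keyframes in `frontend/style.css` end at `transform:var(--tx) scale(0)`, so each `.particle` element needs a `--tx` custom property that holds a complete `translate(...)` function. How `spawnParticles` in `script.js` sets that property is not shown in this part of the diff; the sketch below is an assumption about one way to generate the values. `particleTransforms`, `count`, and `radius` are hypothetical names.

```javascript
// Hypothetical helper, not part of the commit: builds one `--tx` value per
// particle, spreading the particles evenly around a circle of `radius` pixels.
function particleTransforms(count, radius) {
    const out = [];
    for (let i = 0; i < count; i++) {
        const angle = (2 * Math.PI * i) / count;
        // Round to whole pixels so the generated CSS stays short and stable.
        const x = Math.round(Math.cos(angle) * radius);
        const y = Math.round(Math.sin(angle) * radius);
        out.push(`translate(${x}px, ${y}px)`);
    }
    return out;
}

const offsets = particleTransforms(4, 40);
console.log(offsets);
```

In the browser, each string could be applied with `el.style.setProperty("--tx", value)` on a `.particle` element before it is appended to `.particle-layer`.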

garbage_robot_dataset.jsonl ADDED

	@@ -0,0 +1,35 @@

+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(4, 4)]. Battery: 30/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 0). Garbage at [(4, 4)]. Battery: 29/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 0). Garbage at [(4, 4)]. Battery: 28/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 0). Garbage at [(4, 4)]. Battery: 27/30. No obstacles nearby.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 0). Garbage at [(4, 4)]. Battery: 26/30. No obstacles nearby.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 1). Garbage at [(4, 4)]. Battery: 25/30. No obstacles nearby.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4, 4)]. Battery: 24/30. No obstacles nearby.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4, 4)]. Battery: 23/30. No obstacles nearby.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4, 4)]. Battery: 22/30. You are ON the garbage.\n\n### Response:\nCOLLECT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 3). Garbage at [(1,1),(5,5),(1,5)]. Battery: 50/50. BLOCKED! DOWN is an obstacle. Blocked directions: DOWN, LEFT. Choose a different direction.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 49/50. Moving toward (1,5).\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 4). Garbage at [(1,1),(5,5),(1,5)]. Battery: 48/50. BLOCKED! LEFT is an obstacle. Blocked directions: LEFT. Choose RIGHT or UP.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 47/50. Clear path left.\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5),(1,5)]. Battery: 46/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1),(5,5)]. Battery: 45/50. Next target (5,5), moving right.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1),(5,5)]. Battery: 44/50. Continuing right.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1),(5,5)]. Battery: 43/50. Continuing right.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1),(5,5)]. Battery: 42/50. Continuing right.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1),(5,5)]. Battery: 41/50. You are ON the garbage.\n\n### Response:\nCOLLECT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (5, 5). Garbage at [(1,1)]. Battery: 40/50. Last garbage at (1,1), heading left+down.\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 5). Garbage at [(1,1)]. Battery: 39/50. Continuing toward (1,1).\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 5). Garbage at [(1,1)]. Battery: 38/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 5). Garbage at [(1,1)]. Battery: 37/50. BLOCKED! DOWN is an obstacle. Go LEFT.\n\n### Response:\nLEFT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 5). Garbage at [(1,1)]. Battery: 36/50. Path down is clear now.\n\n### Response:\nDOWN"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 4). Garbage at [(1,1)]. Battery: 35/50. Continuing down.\n\n### Response:\nDOWN"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 3). Garbage at [(1,1)]. Battery: 34/50. Continuing down.\n\n### Response:\nDOWN"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 2). Garbage at [(1,1)]. Battery: 33/50. Continuing down.\n\n### Response:\nDOWN"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (1, 1). Garbage at [(1,1)]. Battery: 32/50. You are ON the last garbage.\n\n### Response:\nCOLLECT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 2). Garbage at [(4,4)]. Battery: 5/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (3, 2). Garbage at [(4,4)]. Battery: 4/30. CRITICAL battery! Move directly: RIGHT.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 2). Garbage at [(4,4)]. Battery: 3/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 3). Garbage at [(4,4)]. Battery: 2/30. CRITICAL battery! Move directly: UP.\n\n### Response:\nUP"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (4, 4). Garbage at [(4,4)]. Battery: 1/30. You are ON the garbage. COLLECT NOW.\n\n### Response:\nCOLLECT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (2, 3). Garbage at [(4,4)]. Battery: 20/30. You are NOT on garbage. Move toward it.\n\n### Response:\nRIGHT"}
+{"text": "### Instruction:\nYou control a garbage collecting robot. Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n### Input:\nENVIRONMENT STATUS:\nYou are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. You are NOT on garbage. Do not COLLECT.\n\n### Response:\nRIGHT"}
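
Each record above stores an Alpaca-style prompt in a single `text` field, with `### Instruction:`, `### Input:` and `### Response:` sections. The following is a minimal sketch, not part of the commit, of how one such line could be split back into its parts and checked against the five allowed actions. The helper name `parse_alpaca_record` and the constant `VALID_ACTIONS` are hypothetical, and the sketch assumes the leading `+` diff marker has already been stripped from the line.

```python
import json

# Hypothetical constant: the five commands named in every instruction string.
VALID_ACTIONS = {"UP", "DOWN", "LEFT", "RIGHT", "COLLECT"}


def parse_alpaca_record(line: str) -> dict:
    """Split one JSONL line into instruction, input and response parts."""
    text = json.loads(line)["text"]
    # The response section comes last, so split it off first.
    head, response = text.split("\n\n### Response:\n", 1)
    instruction, observation = head.split("\n\n### Input:\n", 1)
    instruction = instruction.removeprefix("### Instruction:\n")  # Python 3.9+
    return {
        "instruction": instruction,
        "input": observation,
        "response": response.strip(),
    }


# One record in the same shape as the dataset lines above.
sample_line = json.dumps({
    "text": (
        "### Instruction:\n"
        "You control a garbage collecting robot. "
        "Reply with ONE of: UP DOWN LEFT RIGHT COLLECT\n\n"
        "### Input:\nENVIRONMENT STATUS:\n"
        "You are at (0, 0). Garbage at [(3,3)]. Battery: 15/30. "
        "You are NOT on garbage. Do not COLLECT.\n\n"
        "### Response:\nRIGHT"
    )
})

record = parse_alpaca_record(sample_line)
is_valid = record["response"] in VALID_ACTIONS
```

A check like this could be run over the whole file before fine-tuning to catch records whose response is not one of the five commands.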

inference.py ADDED

	@@ -0,0 +1,520 @@

+import os
+import time
+import requests
+import json
+from collections import deque
+from openai import OpenAI
+API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME   = os.environ.get("MODEL_NAME",   "gpt-4o-mini")
+HF_TOKEN     = os.environ.get("HF_TOKEN",     "")
+ENV_URL      = os.environ.get("ENV_URL",      "http://localhost:7860")
+LOCAL_MODEL_PATH = os.environ.get(
+    "LOCAL_MODEL_PATH",
+    "TechAvenger/GarbageBot-Weights"
+)
+MAX_STEPS = 200   # raised to account for recharge/unload detours
+# Lazy-loaded local model: populated in main() if the fine-tuned weights load via transformers
+_local_model     = None
+_local_tokenizer = None
+# Q-Learning agent — loaded once in main(), used as primary policy
+_ql_agent = None
+try:
+    from qlearning import QLearningAgent
+except ImportError:
+    QLearningAgent = None
+# ──────────────────────────────────────────────────────────
+# BFS CORE
+# ──────────────────────────────────────────────────────────
+def bfs(start, goal, obstacles, grid_w, grid_h):
+    """
+    BFS from start to goal avoiding obstacles.
+    Returns (first_direction, path_length) or (None, inf) if unreachable.
+    """
+    start, goal = tuple(start), tuple(goal)
+    if start == goal:
+        return ("COLLECT", 0)
+    obstacle_set = frozenset(tuple(o) for o in obstacles)
+    dirs = [("RIGHT",(1,0)), ("LEFT",(-1,0)), ("UP",(0,1)), ("DOWN",(0,-1))]
+    queue   = deque([(start, None, 0)])
+    visited = {start}
+    while queue:
+        pos, first, depth = queue.popleft()
+        for name, (dx, dy) in dirs:
+            npos = (pos[0]+dx, pos[1]+dy)
+            if not (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h):
+                continue
+            if npos in obstacle_set or npos in visited:
+                continue
+            move = first if first else name
+            if npos == goal:
+                return (move, depth + 1)
+            visited.add(npos)
+            queue.append((npos, move, depth + 1))
+    return (None, float('inf'))
+def nearest_neighbour_order(start, targets, obstacles, grid_w, grid_h):
+    """
+    Orders garbage by nearest-neighbour TSP using actual BFS cost.
+    Much better than Manhattan when obstacles split direct paths.
+    """
+    remaining = list(targets)
+    ordered   = []
+    current   = tuple(start)
+    while remaining:
+        best = min(remaining, key=lambda t: bfs(current, t, obstacles, grid_w, grid_h)[1])
+        ordered.append(best)
+        remaining.remove(best)
+        current = tuple(best)
+    return ordered
+# ──────────────────────────────────────────────────────────
+# HEURISTIC — BFS-based, mode-aware
+# ──────────────────────────────────────────────────────────
+def heuristic_action(obs, _stuck_counter=None) -> str:
+    """
+    Pure-BFS heuristic that respects the robot's autonomous mode.
+    When the environment reports robot_mode == 'recharging' or 'unloading',
+    the action suggested here is overridden by the environment's own resolver
+    anyway — but we still return a sensible direction so logs are readable.
+    In normal mode the heuristic targets the nearest garbage via BFS with a
+    nearest-neighbour tour order, plus a stuck-counter escape hatch.
+    """
+    if _stuck_counter is None:
+        _stuck_counter = [0]
+    robot_mode     = obs.get("robot_mode", "normal")
+    r_pos          = list(obs["robot_position"])
+    obstacles      = [list(o) for o in obs["obstacle_positions"]]
+    grid_w, grid_h = obs["grid_size"]
+    # ── Recharging: head to home ───────────────────────────────
+    if robot_mode == "recharging":
+        home = obs.get("home_position", r_pos)
+        move, _ = bfs(r_pos, home, obstacles, grid_w, grid_h)
+        return move or "UP"
+    # ── Unloading: head to unload station ─────────────────────
+    if robot_mode == "unloading":
+        station = obs.get("unload_station", r_pos)
+        move, _ = bfs(r_pos, station, obstacles, grid_w, grid_h)
+        return move or "UP"
+    # ── Normal: collect nearest garbage ───────────────────────
+    garbage = [tuple(g) for g in obs["garbage_positions"]]
+    if not garbage:
+        return "UP"   # nothing to do; env will mark episode done
+    if tuple(r_pos) in garbage:
+        _stuck_counter[0] = 0
+        return "COLLECT"
+    ordered = nearest_neighbour_order(r_pos, garbage, obstacles, grid_w, grid_h)
+    # Stuck-counter escape: try alternate targets after repeated no-progress steps
+    if _stuck_counter[0] >= 4 and len(ordered) > 1:
+        ordered = [ordered[1], ordered[0]] + ordered[2:]
+    if _stuck_counter[0] >= 8:
+        ordered = ordered[1:] + ordered[:1]
+        _stuck_counter[0] = 0
+    target = ordered[0]
+    if tuple(target) == tuple(r_pos):
+        _stuck_counter[0] = 0
+        return "COLLECT"
+    move, _ = bfs(r_pos, target, obstacles, grid_w, grid_h)
+    if move and move != "COLLECT":
+        _stuck_counter[0] = 0
+        return move
+    # Primary target unreachable — try alternates
+    for alt in ordered[1:]:
+        move, _ = bfs(r_pos, alt, obstacles, grid_w, grid_h)
+        if move and move != "COLLECT":
+            _stuck_counter[0] = 0
+            return move
+    # Fully boxed in: take any open neighbouring cell to escape
+    _stuck_counter[0] += 1
+    obstacle_set = frozenset(tuple(o) for o in obstacles)
+    for name, (dx, dy) in [("RIGHT",(1,0)),("LEFT",(-1,0)),("UP",(0,1)),("DOWN",(0,-1))]:
+        npos = (r_pos[0]+dx, r_pos[1]+dy)
+        if (0 <= npos[0] < grid_w and 0 <= npos[1] < grid_h
+                and npos not in obstacle_set):
+            return name
+    return "RIGHT"
+# ──────────────────────────────────────────────────────────
+# ACTION RESOLVER  (priority: Q-table → LLM → BFS heuristic)
+# ──────────────────────────────────────────────────────────
+def resolve_next_action(client, obs, context_history, stuck_counter=None) -> str:
+    """
+    Decide the next action using the priority chain:
+      1. Q-table (trained, deterministic, fastest)
+      2. Fine-tuned local LLM (Unsloth export)
+      3. Remote OpenAI-compatible endpoint
+      4. BFS heuristic (fallback, always works)
+    The BFS heuristic is mode-aware and is passed as a hint to the LLM.
+    Note: when the environment is in MODE_RECHARGE or MODE_UNLOAD it will
+    override whatever action we return, so correctness in those modes is
+    the environment's responsibility, not the LLM's or the heuristic's.
+    """
+    heuristic = heuristic_action(obs, stuck_counter)
+    # ── 1. Q-Learning policy (trained, deterministic) ──────────
+    if _ql_agent is not None:
+        q_action = _ql_agent.get_action(obs)
+        if q_action is not None:
+            return q_action
+    # Build a mode-aware system prompt for the LLM
+    robot_mode   = obs.get("robot_mode", "normal")
+    dist_home    = obs.get("distance_from_home", -1)
+    storage_load = obs.get("current_storage_load", 0)
+    capacity     = obs.get("storage_capacity", 6)
+    home         = obs.get("home_position", (0, 0))
+    station      = obs.get("unload_station", (0, 0))
+    mode_note = ""
+    if robot_mode == "recharging":
+        mode_note = (
+            f"\n⚠ ROBOT MODE: RECHARGING — navigate to home {home} "
+            f"({dist_home} steps away). Do NOT collect garbage until recharged."
+        )
+    elif robot_mode == "unloading":
+        mode_note = (
+            f"\n⚠ ROBOT MODE: UNLOADING — navigate to unload station {station}. "
+            f"Storage is full ({storage_load}/{capacity}). "
+            f"Do NOT collect garbage until unloaded."
+        )
+    else:
+        mode_note = (
+            f"\nBattery distance to home: {dist_home} steps. "
+            f"Storage: {storage_load}/{capacity}."
+        )
+    system_prompt = (
+        "You control a garbage collecting robot on a grid.\n"
+        "Reply with EXACTLY ONE of: UP  DOWN  LEFT  RIGHT  COLLECT\n\n"
+        "Rules:\n"
+        "- COLLECT only when your position exactly matches a garbage position.\n"
+        "- Never move into an obstacle tile.\n"
+        "- The environment handles recharging and unloading automatically.\n"
+        f"- Pathfinding suggests: {heuristic}  (only override if clearly wrong)"
+        f"{mode_note}"
+    )
+    # ── 2. Try local fine-tuned merged model (Alpaca prompt format) ─────
+    if _local_model is not None and _local_tokenizer is not None:
+        try:
+            alpaca_instruction = (
+                "You are an AI brain controlling a garbage collecting robot.\n"
+                "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+            )
+            prompt = (
+                f"### Instruction:\n{alpaca_instruction}\n\n"
+                f"### Input:\nENVIRONMENT STATUS:\n{obs['message']}\n\n"
+                f"### Response:\n"
+            )
+            inputs = _local_tokenizer(
+                prompt, return_tensors="pt", truncation=True, max_length=512
+            ).to(_local_model.device)
+            import torch  # lazy import: only needed when a local model is loaded
+            with torch.no_grad():
+                outputs = _local_model.generate(
+                    **inputs, max_new_tokens=6, do_sample=False,
+                    pad_token_id=_local_tokenizer.eos_token_id
+                )
+            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+            token = _local_tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()
+            for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
+                if valid in token:
+                    print(f"[LOCAL LLM] {valid} (raw: {token!r})")
+                    return valid
+        except Exception as e:
+            print(f"[LOCAL LLM ERROR] {e}")
+    # ── 3. Try remote OpenAI-compatible endpoint ─────────────────
+    if client is not None:
+        try:
+            response = client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=[
+                    {"role": "system", "content": system_prompt},
+                    *context_history,
+                    {"role": "user", "content": f"STATUS:\n{obs['message']}\n\nCommand?"}
+                ],
+                temperature=0.0,
+                max_tokens=6
+            )
+            # message.content can be None (e.g. an empty completion), so guard before .strip()
+            action = (response.choices[0].message.content or "").strip().upper()
+            for valid in ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]:
+                if valid in action:
+                    return valid
+        except Exception as e:
+            print(f"[REMOTE LLM ERROR] {e}")
+    # ── 4. Final fallback: pure BFS heuristic ─────────────────
+    return heuristic
+# ──────────────────────────────────────────────────────────
+# INTERACTIVE GARBAGE PLACEMENT
+# ──────────────────────────────────────────────────────────
+def prompt_custom_garbage(grid_w, grid_h, obstacles):
+    """
+    Interactive CLI helper: prompts the user to enter garbage positions
+    for a dynamic episode.
+    """
+    obstacle_set = set(tuple(o) for o in obstacles)
+    print(f"\n  Grid: {grid_w} x {grid_h}   Obstacles: {sorted(obstacle_set)}")
+    print("  Enter garbage positions:")
+    print("    x,y       place at column x, row y  (e.g. '4,4')")
+    print("    random N  place N random pieces      (e.g. 'random 5')")
+    print("    done      start the episode\n")
+    garbage = []
+    while True:
+        raw = input("  Garbage > ").strip().lower()
+        if raw == "done":
+            if not garbage:
+                print("  Need at least one garbage tile.")
+                continue
+            break
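+        if raw.startswith("random"):
+            import random
+            parts = raw.split()
+            # Reject non-numeric or non-positive counts instead of crashing
+            try:
+                n = int(parts[1]) if len(parts) > 1 else 3
+            except ValueError:
+                n = 0
+            if n < 1:
+                print("  Format: random N with N >= 1  e.g. 'random 5'")
+                continue
+            candidates = [(x, y) for x in range(grid_w) for y in range(grid_h)
+                          if (x, y) not in obstacle_set]
+            garbage = random.sample(candidates, min(n, len(candidates)))
+            print(f"  Random garbage: {garbage}")
+            break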
+        try:
+            x, y = map(int, raw.split(","))
+            if not (0 <= x < grid_w and 0 <= y < grid_h):
+                print(f"  Out of bounds — valid: 0-{grid_w-1}, 0-{grid_h-1}")
+                continue
+            if (x, y) in obstacle_set:
+                print(f"  ({x},{y}) is an obstacle.")
+                continue
+            if (x, y) in garbage:
+                print(f"  ({x},{y}) already added.")
+                continue
+            garbage.append((x, y))
+            print(f"  Added ({x},{y})  total: {garbage}")
+        except ValueError:
+            print("  Format: x,y  e.g. '3,4'")
+    return garbage
+def reset_with_custom_garbage(task_id, garbage_positions):
+    """
+    Posts to /reset_custom to inject custom garbage positions at runtime.
+    Falls back to standard /reset if something goes wrong.
+    """
+    try:
+        res = requests.post(f"{ENV_URL}/reset_custom", json={
+            "task_id": task_id,
+            "garbage_positions": [list(g) for g in garbage_positions]
+        })
+        res.raise_for_status()
+        return res.json()["observation"]
+    except Exception as e:
+        print(f"[WARN] /reset_custom failed ({e}), falling back to /reset")
+        res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
+        res.raise_for_status()
+        return res.json()["observation"]
+# ──────────────────────────────────────────────────────────
+# EPISODE RUNNER
+# ──────────────────────────────────────────────────────────
+def print_log(log_dict):
+    print(json.dumps(log_dict), flush=True)
+def run_episode(client, task_id, obs):
+    policy = (
+        "q-table"    if (_ql_agent and _ql_agent.loaded)  else
+        "local-llm"  if _local_model                      else
+        "remote-llm" if client                            else
+        "bfs"
+    )
+    print_log({"type": "[START]", "task_id": task_id,
+               "model": MODEL_NAME, "policy": policy, "max_steps": MAX_STEPS})
+    total_reward    = 0.0
+    done            = False
+    context_history = []
+    step_idx        = 0
+    stuck_counter   = [0]   # per-episode; no cross-episode state leak
+    for step_idx in range(1, MAX_STEPS + 1):
+        action = resolve_next_action(client, obs, context_history, stuck_counter)
+        try:
+            res = requests.post(f"{ENV_URL}/step", json={"command": action})
+            res.raise_for_status()
+            step_data = res.json()
+        except Exception as e:
+            print(f"Step error: {e}")
+            break
+        obs          = step_data["observation"]
+        reward       = step_data["reward"]
+        done         = step_data["done"]
+        info         = step_data.get("info", {})
+        total_reward += reward
+        # Log includes autonomous-override details for debugging
+        log_entry = {
+            "type":           "[STEP]",
+            "step":           step_idx,
+            "action":         action,
+            "effective":      info.get("effective_command", action),
+            "overridden":     info.get("autonomous_override", False),
+            "mode":           obs.get("robot_mode", "normal"),
+            "battery":        obs.get("battery_level"),
+            "storage":        f"{obs.get('current_storage_load')}/{obs.get('storage_capacity')}",
+            "dist_home":      obs.get("distance_from_home"),
+            "reward":         round(reward, 2),
+            "total_reward":   round(total_reward, 2),
+            "done":           done,
+        }
+        print_log(log_entry)
+        if done:
+            break
+        time.sleep(0.05)
+    try:
+        score = requests.get(f"{ENV_URL}/grade/{task_id}").json()["score"]
+    except Exception:
+        score = 0.0
+    print_log({"type": "[END]", "task_id": task_id, "total_steps": step_idx,
+               "final_reward": round(total_reward, 2), "score": score})
+    return score
+# ──────────────────────────────────────────────────────────
+# MAIN
+# ──────────────────────────────────────────────────────────
+def main():
+    global _local_model, _local_tokenizer, _ql_agent
+    print("=" * 55)
+    print("  Garbage Collecting Robot — Inference")
+    print("=" * 55)
+    # ── 1. Load Q-Learning policy (fastest, no GPU needed) ────
+    if QLearningAgent is not None:
+        _ql_agent = QLearningAgent()
+        if _ql_agent.loaded:
+            print(f"\n  [INFO] Q-table loaded ({len(_ql_agent.qtable):,} states). "
+                  "Q-learning is the primary policy.")
+        else:
+            print("\n  [WARN] No Q-table found (qtable.json). "
+                  "Run: python qlearning.py --train")
+            print("          Falling through to LLM / BFS.")
+    else:
+        print("\n  [WARN] qlearning.py not found — skipping Q-table.")
+    # ── 2. Attempt to load the fine-tuned merged model ────────────
+    try:
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+        import torch
+        print(f"\n  [INFO] Loading fine-tuned model from:\n         {LOCAL_MODEL_PATH}")
+        _local_tokenizer = AutoTokenizer.from_pretrained(LOCAL_MODEL_PATH)
+        _local_model = AutoModelForCausalLM.from_pretrained(
+            LOCAL_MODEL_PATH,
+            torch_dtype=torch.float16,
+            device_map="auto",
+        )
+        _local_model.eval()
+        print("  [INFO] Fine-tuned model loaded — used when Q-table misses a state.")
+    except Exception as e:
+        print(f"  [WARN] Fine-tuned model unavailable ({e}).")
+        print("          Falling back to remote API / BFS heuristic.")
+        _local_model, _local_tokenizer = None, None
+    import argparse
+    parser = argparse.ArgumentParser(description="Run GarbageBot Inference")
+    parser.add_argument("--dynamic", action="store_true",
+                        help="Interactive dynamic garbage placement")
+    parser.add_argument("--task",
+                        choices=["1","2","3","4","easy","medium","hard","all"],
+                        default="all",
+                        help="Task to run: 1/easy, 2/medium, 3/hard, or 4/all (default: all)")
+    args = parser.parse_args()
+    if args.task in ["1", "easy"]:
+        tasks = ["task_easy"]
+    elif args.task in ["2", "medium"]:
+        tasks = ["task_medium"]
+    elif args.task in ["3", "hard"]:
+        tasks = ["task_hard"]
+    else:
+        tasks = ["task_easy", "task_medium", "task_hard"]
+    print(f"\n  [INFO] Running tasks: {', '.join(tasks)}")
+    client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL) if HF_TOKEN else None
+    if not client and _local_model is None:
+        print("\n  [INFO] No HF_TOKEN and no local model — pure BFS heuristic mode.")
+    elif not client:
+        print("\n  [INFO] No HF_TOKEN — using local Unsloth model + BFS fallback.")
+    for task_id in tasks:
+        print(f"\n{'─'*40}\n  {task_id}\n{'─'*40}")
+        try:
+            res = requests.post(f"{ENV_URL}/reset", json={"task_id": task_id})
+            res.raise_for_status()
+            base_obs = res.json()["observation"]
+        except Exception as e:
+            print(f"Reset failed: {e}")
+            continue
+        if args.dynamic:
+            garbage = prompt_custom_garbage(
+                base_obs["grid_size"][0],
+                base_obs["grid_size"][1],
+                base_obs["obstacle_positions"]
+            )
+            obs = reset_with_custom_garbage(task_id, garbage)
+        else:
+            obs = base_obs
+        run_episode(client, task_id, obs)
+if __name__ == "__main__":
+    main()
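
The `bfs` helper in `inference.py` carries the first move along with each queue entry, so the search returns the direction to take now rather than the whole path. The block below is a standalone sketch of that technique for experimenting outside the Space. The function name `first_move` and the small 3x3 grids are illustrative assumptions, not part of the commit; the direction convention (`UP` increases `y`) and the `("COLLECT", 0)` result for `start == goal` follow the source.

```python
from collections import deque

# Direction names and (dx, dy) offsets, in the same order as the source.
DIRS = [("RIGHT", (1, 0)), ("LEFT", (-1, 0)), ("UP", (0, 1)), ("DOWN", (0, -1))]


def first_move(start, goal, obstacles, grid_w, grid_h):
    """Return (first_direction, path_length), or (None, inf) if unreachable."""
    start, goal = tuple(start), tuple(goal)
    if start == goal:
        return ("COLLECT", 0)
    blocked = {tuple(o) for o in obstacles}
    # Each queue entry remembers the first step taken from `start`.
    queue = deque([(start, None, 0)])
    visited = {start}
    while queue:
        pos, first, depth = queue.popleft()
        for name, (dx, dy) in DIRS:
            nxt = (pos[0] + dx, pos[1] + dy)
            if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h):
                continue  # off the grid
            if nxt in blocked or nxt in visited:
                continue
            move = first or name  # inherit the first step, or set it at depth 1
            if nxt == goal:
                return (move, depth + 1)
            visited.add(nxt)
            queue.append((nxt, move, depth + 1))
    return (None, float("inf"))


# Obstacle at (1, 0) forces a detour over the top row of cells.
detour = first_move((0, 0), (2, 0), [(1, 0)], 3, 3)
# A full wall at x == 1 makes the goal unreachable.
walled = first_move((0, 0), (2, 0), [(1, 0), (1, 1), (1, 2)], 3, 3)
# Already standing on the goal.
on_goal = first_move((1, 1), (1, 1), [], 3, 3)
```

Because the path length comes from the actual search, `nearest_neighbour_order` can rank targets by reachable distance instead of Manhattan distance, at the cost of one BFS per candidate target on every call.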

inference_output.log ADDED

	@@ -0,0 +1,240 @@

+============================================================
+Garbage Collecting Robot — Inference Script
+============================================================
+────────────────────────────────────────
+Running task: task_easy
+────────────────────────────────────────
+{"type": "[START]", "task_id": "task_easy", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 5, "action": "UP", "reward": -0.1, "total_reward": -0.5, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 6, "action": "UP", "reward": -0.1, "total_reward": -0.6, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 7, "action": "UP", "reward": -0.1, "total_reward": -0.7, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 8, "action": "UP", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 9, "action": "COLLECT", "reward": 59.9, "total_reward": 59.1, "done": true}
+{"type": "[END]", "task_id": "task_easy", "total_steps": 9, "final_reward": 59.1, "score": 1.0}
+────────────────────────────────────────
+Running task: task_medium
+────────────────────────────────────────
+{"type": "[START]", "task_id": "task_medium", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 1, "action": "LEFT", "reward": -5.1, "total_reward": -5.1, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 2, "action": "LEFT", "reward": -5.1, "total_reward": -10.2, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 3, "action": "LEFT", "reward": -5.1, "total_reward": -15.299999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 4, "action": "LEFT", "reward": -5.1, "total_reward": -20.4, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 5, "action": "LEFT", "reward": -5.1, "total_reward": -25.5, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 6, "action": "LEFT", "reward": -5.1, "total_reward": -30.6, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 7, "action": "LEFT", "reward": -5.1, "total_reward": -35.7, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 8, "action": "LEFT", "reward": -5.1, "total_reward": -40.800000000000004, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 9, "action": "LEFT", "reward": -5.1, "total_reward": -45.900000000000006, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 10, "action": "LEFT", "reward": -5.1, "total_reward": -51.00000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 11, "action": "LEFT", "reward": -5.1, "total_reward": -56.10000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 12, "action": "LEFT", "reward": -5.1, "total_reward": -61.20000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 13, "action": "LEFT", "reward": -5.1, "total_reward": -66.30000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 14, "action": "LEFT", "reward": -5.1, "total_reward": -71.4, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 15, "action": "LEFT", "reward": -5.1, "total_reward": -76.5, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 16, "action": "LEFT", "reward": -5.1, "total_reward": -81.6, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 17, "action": "LEFT", "reward": -5.1, "total_reward": -86.69999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 18, "action": "LEFT", "reward": -5.1, "total_reward": -91.79999999999998, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 19, "action": "LEFT", "reward": -5.1, "total_reward": -96.89999999999998, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 20, "action": "LEFT", "reward": -5.1, "total_reward": -101.99999999999997, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 21, "action": "LEFT", "reward": -5.1, "total_reward": -107.09999999999997, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 22, "action": "LEFT", "reward": -5.1, "total_reward": -112.19999999999996, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 23, "action": "LEFT", "reward": -5.1, "total_reward": -117.29999999999995, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 24, "action": "LEFT", "reward": -5.1, "total_reward": -122.39999999999995, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 25, "action": "LEFT", "reward": -5.1, "total_reward": -127.49999999999994, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 26, "action": "LEFT", "reward": -5.1, "total_reward": -132.59999999999994, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -5.1, "total_reward": -137.69999999999993, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -5.1, "total_reward": -142.79999999999993, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -5.1, "total_reward": -147.89999999999992, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -5.1, "total_reward": -152.99999999999991, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -5.1, "total_reward": -158.0999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": -163.1999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": -168.2999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": -173.3999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -178.4999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -183.59999999999988, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -188.69999999999987, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -193.79999999999987, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -198.89999999999986, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -203.99999999999986, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -209.09999999999985, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -214.19999999999985, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -219.29999999999984, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -224.39999999999984, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -229.49999999999983, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -234.59999999999982, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -239.69999999999982, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -244.7999999999998, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -249.8999999999998, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -254.9999999999998, "done": true}
+{"type": "[END]", "task_id": "task_medium", "total_steps": 50, "final_reward": -254.9999999999998, "score": 0.0}
+────────────────────────────────────────
+Running task: task_hard
+────────────────────────────────────────
+{"type": "[START]", "task_id": "task_hard", "env": "garbage-collecting-robot", "model": "gpt-4o-mini", "max_steps": 50}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 4, "action": "RIGHT", "reward": -0.1, "total_reward": -0.4, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 5, "action": "RIGHT", "reward": -0.1, "total_reward": -0.5, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 6, "action": "RIGHT", "reward": -0.1, "total_reward": -0.6, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 7, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 8, "action": "RIGHT", "reward": -0.1, "total_reward": -0.7999999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 9, "action": "UP", "reward": -0.1, "total_reward": -0.8999999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 10, "action": "UP", "reward": -0.1, "total_reward": -0.9999999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 11, "action": "UP", "reward": -0.1, "total_reward": -1.0999999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 12, "action": "UP", "reward": -0.1, "total_reward": -1.2, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 13, "action": "UP", "reward": -0.1, "total_reward": -1.3, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 14, "action": "UP", "reward": -0.1, "total_reward": -1.4000000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 15, "action": "UP", "reward": -0.1, "total_reward": -1.5000000000000002, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 16, "action": "UP", "reward": -0.1, "total_reward": -1.6000000000000003, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 17, "action": "COLLECT", "reward": 9.9, "total_reward": 8.3, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 18, "action": "RIGHT", "reward": -0.1, "total_reward": 8.200000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 19, "action": "DOWN", "reward": -0.1, "total_reward": 8.100000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 20, "action": "DOWN", "reward": -0.1, "total_reward": 8.000000000000002, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 21, "action": "DOWN", "reward": -0.1, "total_reward": 7.900000000000002, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 22, "action": "DOWN", "reward": -0.1, "total_reward": 7.8000000000000025, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 23, "action": "DOWN", "reward": -0.1, "total_reward": 7.700000000000003, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 24, "action": "DOWN", "reward": -0.1, "total_reward": 7.600000000000003, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 25, "action": "DOWN", "reward": -0.1, "total_reward": 7.5000000000000036, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 26, "action": "COLLECT", "reward": 9.9, "total_reward": 17.400000000000006, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 27, "action": "LEFT", "reward": -0.1, "total_reward": 17.300000000000004, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 28, "action": "LEFT", "reward": -0.1, "total_reward": 17.200000000000003, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 29, "action": "LEFT", "reward": -0.1, "total_reward": 17.1, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 30, "action": "LEFT", "reward": -0.1, "total_reward": 17.0, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 31, "action": "LEFT", "reward": -0.1, "total_reward": 16.9, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 32, "action": "LEFT", "reward": -5.1, "total_reward": 11.799999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 33, "action": "LEFT", "reward": -5.1, "total_reward": 6.699999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 34, "action": "LEFT", "reward": -5.1, "total_reward": 1.5999999999999996, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 35, "action": "LEFT", "reward": -5.1, "total_reward": -3.5, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 36, "action": "LEFT", "reward": -5.1, "total_reward": -8.6, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 37, "action": "LEFT", "reward": -5.1, "total_reward": -13.7, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 38, "action": "LEFT", "reward": -5.1, "total_reward": -18.799999999999997, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 39, "action": "LEFT", "reward": -5.1, "total_reward": -23.9, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 40, "action": "LEFT", "reward": -5.1, "total_reward": -29.0, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 41, "action": "LEFT", "reward": -5.1, "total_reward": -34.1, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 42, "action": "LEFT", "reward": -5.1, "total_reward": -39.2, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 43, "action": "LEFT", "reward": -5.1, "total_reward": -44.300000000000004, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 44, "action": "LEFT", "reward": -5.1, "total_reward": -49.400000000000006, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 45, "action": "LEFT", "reward": -5.1, "total_reward": -54.50000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 46, "action": "LEFT", "reward": -5.1, "total_reward": -59.60000000000001, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 47, "action": "LEFT", "reward": -5.1, "total_reward": -64.7, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 48, "action": "LEFT", "reward": -5.1, "total_reward": -69.8, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 49, "action": "LEFT", "reward": -5.1, "total_reward": -74.89999999999999, "done": false}
+[LLM ERROR] Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
+{"type": "[STEP]", "step": 50, "action": "LEFT", "reward": -5.1, "total_reward": -79.99999999999999, "done": false}
+{"type": "[END]", "task_id": "task_hard", "total_steps": 50, "final_reward": -79.99999999999999, "score": 0.4}
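The log above interleaves plain-text `[LLM ERROR]` lines with JSON `[STEP]` and `[END]` records, so it cannot be read as pure JSONL. A minimal sketch of a summarizer follows, assuming the record fields seen in this log (`type`, `reward`) and that each line may carry a leading `+` from the diff view. The function name `summarize_log` is hypothetical and is not part of the repository, and the error messages in the sample are shortened for brevity.

```python
import json
from typing import Iterable, Optional


def summarize_log(lines: Iterable[str]) -> dict:
    """Count LLM errors and re-add per-step rewards from a mixed text/JSON log."""
    llm_errors = 0
    steps = 0
    total = 0.0
    end_record: Optional[dict] = None
    for raw in lines:
        line = raw.lstrip("+").strip()  # drop the diff marker, if present
        if line.startswith("[LLM ERROR]"):
            llm_errors += 1
            continue
        if not line.startswith("{"):
            continue  # separators and "Running task: ..." banners
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that only looks like JSON
        if record.get("type") == "[STEP]":
            steps += 1
            total += record["reward"]
        elif record.get("type") == "[END]":
            end_record = record
    return {
        "llm_errors": llm_errors,
        "steps": steps,
        # Round to one decimal to hide float drift such as -0.30000000000000004.
        "total_reward": round(total, 1),
        "end": end_record,
    }


# First three steps of task_hard; error text shortened for this example.
sample = [
    "+[LLM ERROR] Error code: 401 - (message shortened for this example)",
    '+{"type": "[STEP]", "step": 1, "action": "RIGHT", "reward": -0.1, "total_reward": -0.1, "done": false}',
    "+[LLM ERROR] Error code: 401 - (message shortened for this example)",
    '+{"type": "[STEP]", "step": 2, "action": "RIGHT", "reward": -0.1, "total_reward": -0.2, "done": false}',
    "+[LLM ERROR] Error code: 401 - (message shortened for this example)",
    '+{"type": "[STEP]", "step": 3, "action": "RIGHT", "reward": -0.1, "total_reward": -0.30000000000000004, "done": false}',
]
summary = summarize_log(sample)
print(summary)
```

If every step is preceded by an `[LLM ERROR]` line, as in this run, `llm_errors` equals `steps`, which suggests that none of the logged actions came from the model.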

models.py ADDED

	@@ -0,0 +1,79 @@

+from pydantic import BaseModel, ConfigDict
+from typing import List, Literal, Optional, Tuple
+# --- Custom observation and action logic ---
+class Observation(BaseModel):
+    model_config = ConfigDict(strict=True)
+    grid_size: Tuple[int, int]
+    robot_position: Tuple[int, int]
+    garbage_positions: List[Tuple[int, int]]
+    obstacle_positions: List[Tuple[int, int]]
+    battery_level: int
+    inventory_count: int
+    message: str                        # Textual context for LLM
+    # ── Autonomous resource-management fields ──────────────────
+    home_position: Tuple[int, int]      # Charging station coordinates
+    unload_station: Tuple[int, int]     # Designated unload-corner coordinates
+    storage_capacity: int               # Max items robot can carry before unloading
+    current_storage_load: int           # Items currently held (resets after unload)
+    distance_from_home: int             # BFS steps to home (-1 if unreachable)
+    robot_mode: str                     # 'normal' | 'recharging' | 'unloading'
+class Action(BaseModel):
+    model_config = ConfigDict(strict=True)
+    command: Literal["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+# --- OpenEnv Standard Spec Models ---
+class State(BaseModel):
+    model_config = ConfigDict(strict=True)
+    task_id: Optional[str]
+    total_reward: float
+    steps_taken: int
+    done: bool
+    # ── Extended state for resource management ─────────────────
+    robot_mode: str = "normal"
+    current_storage_load: int = 0
+    battery_level: int = 0
+    distance_from_home: int = 0
+class ResetInput(BaseModel):
+    task_id: str = "task_easy"
+class CustomResetInput(BaseModel):
+    """
+    Fully dynamic reset — caller specifies the entire layout at runtime.
+    grid_size, robot_start, garbage positions, obstacles, battery, storage_capacity,
+    home_position and unload_station are all optional overrides on top of a base task_id.
+    Pass task_id='custom' to skip scenario defaults entirely.
+    """
+    task_id: str = "task_easy"
+    grid_size: Optional[Tuple[int, int]] = None
+    robot_start: Optional[Tuple[int, int]] = None
+    garbage_positions: Optional[List[Tuple[int, int]]] = None
+    obstacle_positions: Optional[List[Tuple[int, int]]] = None
+    max_battery: Optional[int] = None
+    storage_capacity: Optional[int] = None
+    home_position: Optional[Tuple[int, int]] = None
+    unload_station: Optional[Tuple[int, int]] = None
+class ResetOutput(BaseModel):
+    observation: Observation
+class StepOutput(BaseModel):
+    observation: Observation
+    reward: float
+    done: bool
+    info: dict = {}
+class Task(BaseModel):
+    id: str
+    name: str
+    description: str
+    difficulty: str
+    reward_range: List[float]

openenv.yaml ADDED

	@@ -0,0 +1,20 @@

+name: garbage-collecting-robot
+version: "1.0.0"
+description: "An OpenEnv-compliant reinforcement learning environment for a garbage collecting robot. The agent navigates a grid, picks up garbage, and manages its battery."
+type: logical-grid-world
+action_space:
+  type: discrete
+  description: "Movement and interaction commands: UP, DOWN, LEFT, RIGHT, COLLECT."
+observation_space:
+  type: object
+  description: "Grid state including robot position, garbage coordinates, inventory size, battery level, and a conversational text interpretation of the environment."
+tasks:
+  - id: task_easy
+    difficulty: easy
+    description: "Navigate a small 5x5 grid to collect 1 piece of garbage."
+  - id: task_medium
+    difficulty: medium
+    description: "Navigate a 7x7 grid to collect 3 pieces of garbage with limited battery."
+  - id: task_hard
+    difficulty: hard
+    description: "Navigate a 10x10 maze avoiding obstacles to collect 5 pieces of garbage with strict battery usage."

pyproject.toml ADDED Viewed

	@@ -0,0 +1,23 @@

+[project]
+name = "garbage-collecting-robot"
+version = "1.0.0"
+description = "OpenEnv environment for a garbage collecting robot that navigates a grid, picks up garbage, and manages its battery"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "fastapi>=0.110.0",
+    "uvicorn>=0.29.0",
+    "pydantic>=2.0.0",
+    "openai>=1.0.0",
+    "requests>=2.31.0",
+    "python-dotenv>=1.0.0",
+    "pyyaml>=6.0.0",
+    "openenv-core>=0.2.0",
+]
+[project.scripts]
+server = "server.app:main"
+[build-system]
+requires = ["setuptools>=68.0"]
+build-backend = "setuptools.build_meta"

qlearning.py ADDED Viewed

	@@ -0,0 +1,345 @@

+"""
+qlearning.py — Tabular Q-Learning for the Garbage Collecting Robot.
+Training runs directly against GarbageRobotEnv (no HTTP server needed).
+The Q-table is persisted to disk as JSON and loaded by inference.py at startup.
+State representation:
+    (robot_x, robot_y, sorted_garbage_tuple)
+    e.g. (2, 3, ((1,1),(4,4)))   — compact, hashable, fully describes the relevant world
+Actions:
+    0=UP  1=DOWN  2=LEFT  3=RIGHT  4=COLLECT
+Usage:
+    # Train all tasks and save
+    python3 qlearning.py --train --episodes 8000
+    # Evaluate silently (uses saved Q-table)
+    python3 qlearning.py --eval
+Fix applied:
+    - load() previously had two separate key-reconstruction passes, where the
+      first pass result (variable `k`) was computed but then immediately discarded.
+      The second pass also misidentified the garbage sub-list when it had exactly
+      2 integer elements (treating [gx, gy] pairs as flat coords instead of a
+      tuple-of-tuples). Replaced both passes with a single, unambiguous decode:
+        parsed = [rx, ry, [[gx1,gy1],[gx2,gy2],...]]
+      where the third element is always the nested garbage list.
+"""
+import os
+import json
+import random
+import argparse
+from collections import defaultdict
+from environment import GarbageRobotEnv
+from scenarios import SCENARIOS
+# ── Constants ──────────────────────────────────────────────────────────────
+ACTIONS      = ["UP", "DOWN", "LEFT", "RIGHT", "COLLECT"]
+ACTION_IDX   = {a: i for i, a in enumerate(ACTIONS)}
+Q_TABLE_PATH = os.environ.get("Q_TABLE_PATH", "qtable.json")
+# ── Hyperparameters ─────────────────────────────────────────────────────────
+ALPHA         = 0.15
+GAMMA         = 0.97
+EPSILON_START = 1.0
+EPSILON_END   = 0.05
+EPSILON_DECAY = 0.9995
+# ── State Encoding ──────────────────────────────────────────────────────────
+def encode_state(obs: dict) -> tuple:
+    """
+    Convert a raw observation dict into a hashable tuple suitable as a Q-table key.
+    Key structure: (robot_x, robot_y, ((gx1,gy1),(gx2,gy2),...))
+    Garbage positions are sorted so order doesn't create phantom new states.
+    """
+    rx, ry = obs["robot_position"]
+    garbage = tuple(sorted((int(g[0]), int(g[1])) for g in obs["garbage_positions"]))
+    return (int(rx), int(ry), garbage)
+# ── Q-Table ─────────────────────────────────────────────────────────────────
+class QTable:
+    """
+    Dictionary-backed Q-table with lazy per-state initialisation (see _ensure).
+    Unseen states default to a small optimistic initial value to encourage exploration.
+    """
+    def __init__(self, optimistic_init: float = 0.5):
+        self.optimistic_init = optimistic_init
+        self._q: dict = {}
+    def _ensure(self, state: tuple):
+        if state not in self._q:
+            self._q[state] = [self.optimistic_init] * len(ACTIONS)
+    def get(self, state: tuple, action_idx: int) -> float:
+        self._ensure(state)
+        return self._q[state][action_idx]
+    def update(self, state: tuple, action_idx: int, value: float):
+        self._ensure(state)
+        self._q[state][action_idx] = value
+    def best_action(self, state: tuple) -> int:
+        """Return the index of the greedy best action."""
+        self._ensure(state)
+        return int(max(range(len(ACTIONS)), key=lambda i: self._q[state][i]))
+    def best_q(self, state: tuple) -> float:
+        self._ensure(state)
+        return max(self._q[state])
+    # ── Persistence ─────────────────────────────────────────────────────────
+    def save(self, path: str = Q_TABLE_PATH):
+        """
+        Serialise Q-table to JSON.
+        Key format saved to disk:
+            [rx, ry, [[gx1,gy1], [gx2,gy2], ...]]
+        This is unambiguous: element 0 and 1 are ints, element 2 is always a
+        list-of-lists, even when there is only one garbage piece.
+        """
+        serialisable = {}
+        for (rx, ry, garbage), v in self._q.items():
+            key = json.dumps([rx, ry, [list(g) for g in garbage]])
+            serialisable[key] = v
+        with open(path, "w") as f:
+            json.dump(serialisable, f)
+        print(f"[Q-Table] Saved {len(self._q):,} states → {path}")
+    def load(self, path: str = Q_TABLE_PATH) -> bool:
+        """
+        Load Q-table from JSON.
+        FIX: The previous implementation had two redundant key-reconstruction
+        loops. The first built variable `k`, which was immediately discarded;
+        the second misclassified [gx, gy] pairs (lists of 2 ints) as flat
+        coordinates rather than garbage-position tuples, corrupting multi-garbage
+        states.
+        The single-pass decode expects the 3-element structure written by save():
+            parsed[0] = rx  (int)
+            parsed[1] = ry  (int)
+            parsed[2] = [[gx1,gy1], ...]  (list-of-lists)
+        Keys in the old flat single-garbage form [rx, ry, [gx, gy]] are still
+        accepted for backward compatibility.
+        """
+        if not os.path.exists(path):
+            return False
+        with open(path, "r") as f:
+            raw = json.load(f)
+        self._q = {}
+        for k_str, v in raw.items():
+            parsed = json.loads(k_str)
+            # Robustly handle both new format [rx, ry, [[gx,gy],...]]
+            # and old format [rx, ry, [gx, gy]] (single garbage, flat list).
+            rx, ry = int(parsed[0]), int(parsed[1])
+            raw_garbage = parsed[2]
+            if raw_garbage and isinstance(raw_garbage[0], list):
+                # New / multi-garbage format: [[gx1,gy1],[gx2,gy2],...]
+                garbage = tuple(tuple(p) for p in raw_garbage)
+            elif raw_garbage and isinstance(raw_garbage[0], int):
+                # Old single-garbage flat format: [gx, gy]
+                garbage = (tuple(raw_garbage),)
+            else:
+                garbage = ()
+            self._q[(rx, ry, garbage)] = v
+        print(f"[Q-Table] Loaded {len(self._q):,} states ← {path}")
+        return True
+    def __len__(self):
+        return len(self._q)
+# ── Observation Helper ───────────────────────────────────────────────────────
+def _obs_from_env(env) -> dict:
+    """Build an obs dict directly from GarbageRobotEnv fields."""
+    obs_obj = env.get_observation()
+    return {
+        "robot_position":     obs_obj.robot_position,
+        "garbage_positions":  list(obs_obj.garbage_positions),
+        "obstacle_positions": list(obs_obj.obstacle_positions),
+        "grid_size":          obs_obj.grid_size,
+        "battery_level":      obs_obj.battery_level,
+        "inventory_count":    obs_obj.inventory_count,
+        "message":            obs_obj.message,
+        "robot_mode":         obs_obj.robot_mode,
+        "home_position":      obs_obj.home_position,
+        "unload_station":     obs_obj.unload_station,
+        "current_storage_load": obs_obj.current_storage_load,
+        "storage_capacity":   obs_obj.storage_capacity,
+        "distance_from_home": obs_obj.distance_from_home,
+    }
+# ── Training ─────────────────────────────────────────────────────────────────
+def train(
+    task_ids=None,
+    episodes: int = 8000,
+    qtable: QTable = None,
+    verbose: bool = True,
+) -> QTable:
+    """
+    Run Q-learning over the given task_ids for `episodes` total episodes.
+    Tasks are sampled uniformly so the agent generalises across difficulties.
+    """
+    if task_ids is None:
+        task_ids = list(SCENARIOS.keys())
+    if qtable is None:
+        qtable = QTable()
+    env     = GarbageRobotEnv()
+    epsilon = EPSILON_START
+    best_scores: dict = {t: 0.0 for t in task_ids}
+    for ep in range(1, episodes + 1):
+        task_id = random.choice(task_ids)
+        env.reset(task_id)
+        obs       = _obs_from_env(env)
+        state     = encode_state(obs)
+        total_reward = 0.0
+        done         = False
+        while not done:
+            if random.random() < epsilon:
+                action_idx = random.randrange(len(ACTIONS))
+            else:
+                action_idx = qtable.best_action(state)
+            action     = ACTIONS[action_idx]
+            result     = env.step(action)
+            next_obs   = result["observation"]
+            reward     = result["reward"]
+            done       = result["done"]
+            next_state = encode_state(next_obs)
+            # Bellman update
+            old_q     = qtable.get(state, action_idx)
+            td_target = reward + (0.0 if done else GAMMA * qtable.best_q(next_state))
+            new_q     = old_q + ALPHA * (td_target - old_q)
+            qtable.update(state, action_idx, new_q)
+            state = next_state
+            obs   = next_obs
+            total_reward += reward
+        score = env.grade(task_id)
+        if score > best_scores[task_id]:
+            best_scores[task_id] = score
+        epsilon = max(EPSILON_END, epsilon * EPSILON_DECAY)
+        if verbose and ep % 500 == 0:
+            avg_best = sum(best_scores.values()) / len(best_scores)
+            print(
+                f"  Ep {ep:5d}/{episodes}  ε={epsilon:.4f}  "
+                f"states={len(qtable):,}  "
+                f"best_scores={best_scores}  avg={avg_best:.2f}"
+            )
+    return qtable
+# ── Inference Helper (used by inference.py) ──────────────────────────────────
+class QLearningAgent:
+    """
+    Thin wrapper around a loaded Q-table for use by inference.py.
+    Falls through (returns None) when the state has never been seen during training.
+    """
+    def __init__(self, path: str = Q_TABLE_PATH):
+        self.qtable = QTable()
+        self.loaded = self.qtable.load(path)
+    def get_action(self, obs: dict) -> str | None:
+        if not self.loaded:
+            return None
+        state = encode_state(obs)
+        if state not in self.qtable._q:
+            return None
+        return ACTIONS[self.qtable.best_action(state)]
+# ── Evaluation ───────────────────────────────────────────────────────────────
+def evaluate(qtable: QTable, task_ids=None, runs: int = 5) -> dict:
+    """Run `runs` greedy episodes per task and return average scores."""
+    if task_ids is None:
+        task_ids = list(SCENARIOS.keys())
+    env     = GarbageRobotEnv()
+    results = {}
+    for task_id in task_ids:
+        scores = []
+        for _ in range(runs):
+            env.reset(task_id)
+            obs  = _obs_from_env(env)
+            done = False
+            while not done:
+                state      = encode_state(obs)
+                action_idx = qtable.best_action(state)
+                result     = env.step(ACTIONS[action_idx])
+                obs        = result["observation"]
+                done       = result["done"]
+            scores.append(env.grade(task_id))
+        avg = sum(scores) / len(scores)
+        results[task_id] = round(avg, 3)
+        print(f"  {task_id:12s}  avg score = {avg:.3f}  ({scores})")
+    return results
+# ── CLI Entry Point ───────────────────────────────────────────────────────────
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Q-Learning for Garbage Robot")
+    parser.add_argument("--train",    action="store_true", help="Run training")
+    parser.add_argument("--eval",     action="store_true", help="Run evaluation only")
+    parser.add_argument("--episodes", type=int, default=8000)
+    parser.add_argument("--tasks",    nargs="+", default=None)
+    parser.add_argument("--output",   default=Q_TABLE_PATH)
+    args = parser.parse_args()
+    if args.train:
+        print("=" * 55)
+        print("  Q-Learning Training — Garbage Collecting Robot")
+        print("=" * 55)
+        task_ids = args.tasks or list(SCENARIOS.keys())
+        print(f"  Tasks    : {task_ids}")
+        print(f"  Episodes : {args.episodes}")
+        print(f"  α={ALPHA}  γ={GAMMA}  ε {EPSILON_START}→{EPSILON_END}  decay={EPSILON_DECAY}")
+        print()
+        qt = train(task_ids=task_ids, episodes=args.episodes, verbose=True)
+        qt.save(args.output)
+        print("\n  — Evaluation on greedy policy —")
+        evaluate(qt, task_ids)
+    elif args.eval:
+        print("=" * 55)
+        print("  Q-Learning Evaluation")
+        print("=" * 55)
+        qt = QTable()
+        if not qt.load(args.output):
+            print(f"[ERROR] No Q-table found at {args.output}. Run with --train first.")
+        else:
+            evaluate(qt)
+    else:
+        parser.print_help()
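The update rule and exploration schedule in qlearning.py can be checked by hand. The sketch below copies the hyperparameters from the file and isolates the two formulas from `train()`. The helper names `td_update` and `episodes_until_floor` and the sample reward of 1.0 are illustrative assumptions, not part of the repository. Under these settings epsilon reaches its floor after roughly 6,000 episodes, so with the default 8,000 episodes the last quarter of training runs at the minimum exploration rate.

```python
import math

# Hyperparameters copied from qlearning.py
ALPHA = 0.15
GAMMA = 0.97
EPSILON_START = 1.0
EPSILON_END = 0.05
EPSILON_DECAY = 0.9995


def td_update(old_q: float, reward: float, best_next_q: float, done: bool) -> float:
    """One tabular Q-learning update, mirroring the loop body in train()."""
    # Terminal transitions do not bootstrap from the next state.
    td_target = reward + (0.0 if done else GAMMA * best_next_q)
    return old_q + ALPHA * (td_target - old_q)


def episodes_until_floor() -> int:
    """Smallest n with EPSILON_START * EPSILON_DECAY**n <= EPSILON_END."""
    return math.ceil(math.log(EPSILON_END / EPSILON_START) / math.log(EPSILON_DECAY))


# Hypothetical transition: optimistic initial values (0.5) and a reward of 1.0.
print("updated Q:", td_update(old_q=0.5, reward=1.0, best_next_q=0.5, done=False))
print("episodes until epsilon floor:", episodes_until_floor())
```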

qtable.json ADDED Viewed

The diff for this file is too large to render. See raw diff
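Although the file itself is not rendered, its key format follows from `QTable.save()` in qlearning.py: each key is a JSON string of `[rx, ry, [[gx1,gy1], ...]]`. A minimal sketch of that round trip is shown here; `encode_key` and `decode_key` are hypothetical helper names that only mirror the logic of `save()` and the new-format branch of `load()`.

```python
import json


def encode_key(state: tuple) -> str:
    """Serialise (rx, ry, ((gx, gy), ...)) the way QTable.save() does."""
    rx, ry, garbage = state
    return json.dumps([rx, ry, [list(g) for g in garbage]])


def decode_key(key: str) -> tuple:
    """Rebuild the hashable state tuple from a stored key (new format only)."""
    rx, ry, raw_garbage = json.loads(key)
    return (int(rx), int(ry), tuple(tuple(p) for p in raw_garbage))


state = (2, 3, ((1, 1), (4, 4)))
key = encode_key(state)
print(key)
print(decode_key(key) == state)
```

Because JSON has no tuple type, the decode step has to convert the nested lists back to tuples; otherwise the reconstructed key would be unhashable and could not index the Q-table.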

requirements.txt ADDED Viewed

	@@ -0,0 +1,19 @@

+# Core server dependencies
+fastapi>=0.110.0
+uvicorn[standard]>=0.29.0
+pydantic>=2.0.0
+# HTTP client (used by inference.py and test_env.py)
+requests>=2.31.0
+openai>=1.0.0
+# ── Optional: only needed if running the fine-tuned LLM locally ──────────
+# Uncomment these if your Space has a GPU runtime.
+# torch>=2.1.0
+# transformers>=4.40.0
+# accelerate>=0.27.0
+# bitsandbytes>=0.43.0
+# ── Optional: only needed for Q-learning training ─────────────────────────
+# (training is done offline; the saved qtable.json is loaded at runtime)
+# No extra deps required — qlearning.py uses stdlib only.

rl_trajectories.jsonl ADDED Viewed

The diff for this file is too large to render. See raw diff

run_pipeline.sh ADDED Viewed

	@@ -0,0 +1,17 @@

+#!/bin/bash
+# Activate the existing virtual environment where dependencies are being installed
+source venv/bin/activate
+echo "Ensuring pip dependencies are installed and PyTorch is active..."
+# Install the remaining dependencies. pip skips PyPI packages whose requirements are already satisfied.
+pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" datasets trl peft transformers bitsandbytes --no-cache-dir
+echo "======================================"
+echo " Starting Unsloth LoRA Fine-Tuning... "
+echo "======================================"
+# Run the training script, capturing both stdout and stderr in train_output.log
+python train_unsloth.py > train_output.log 2>&1
+echo "Process completed. Check train_output.log for details."

scenarios.py ADDED Viewed

	@@ -0,0 +1,50 @@

+from typing import Tuple, List, Dict, Any
+SCENARIOS: Dict[str, Dict[str, Any]] = {
+    "task_easy": {
+        "grid_size": (5, 5),
+        "robot_start": (0, 0),
+        "garbage_starts": [(4, 4)],
+        "obstacle_starts": [],
+        "max_battery": 30,
+        # ── Resource management ────────────────────────────────
+        # Home (charging station) is the robot's spawn point.
+        "home_position": (0, 0),
+        # Unload station is the corner cell (4, 0), away from home.
+        "unload_station": (4, 0),
+        # 1 garbage piece; capacity 6 exceeds it, so storage cannot fill up
+        # before the task's garbage is collected.
+        "storage_capacity": 6,
+    },
+    "task_medium": {
+        "grid_size": (7, 7),
+        "robot_start": (3, 3),
+        "garbage_starts": [(1, 1), (5, 5), (1, 5)],
+        "obstacle_starts": [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)],
+        "max_battery": 50,
+        # ── Resource management ────────────────────────────────
+        "home_position": (3, 3),
+        # Far corner from centre home — no obstacles there.
+        "unload_station": (6, 0),
+        # Capacity 6 exceeds the 3 garbage pieces, so storage cannot fill up mid-task.
+        "storage_capacity": 6,
+    },
+    "task_hard": {
+        "grid_size": (10, 10),
+        "robot_start": (0, 0),
+        "garbage_starts": [(8, 8), (9, 1), (1, 9), (5, 5), (8, 2)],
+        "obstacle_starts": [
+            (1, 1), (1, 2), (1, 3), (1, 4),
+            (3, 1), (3, 2), (3, 3), (3, 4),
+            (6, 5), (6, 6), (6, 7), (6, 8),   # shifted so (5,5) stays clear for garbage
+            (7, 7), (7, 8), (7, 9),
+        ],
+        "max_battery": 80,
+        # ── Resource management ────────────────────────────────
+        "home_position": (0, 0),
+        # Bottom-right corner — clear of all obstacles.
+        "unload_station": (9, 0),
+        # Capacity 6 exceeds the 5 garbage pieces, so storage cannot fill up mid-task.
+        "storage_capacity": 6,
+    },
+}
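Since the scenario comments make claims about cells being clear of obstacles, a small consistency check can confirm them. The sketch below is an assumption-laden illustration: `validate_scenario` is a hypothetical helper, it treats `grid_size` as `(width, height)` with 0-indexed `(x, y)` cells, and the sample dict copies the `task_medium` values from scenarios.py.

```python
def validate_scenario(name: str, sc: dict) -> list:
    """Return human-readable problems found in one scenario dict."""
    problems = []
    width, height = sc["grid_size"]  # assumed order: (width, height)
    obstacles = set(sc["obstacle_starts"])

    # Every cell the robot must be able to stand on.
    cells = {
        "robot_start": sc["robot_start"],
        "home_position": sc["home_position"],
        "unload_station": sc["unload_station"],
    }
    for i, pos in enumerate(sc["garbage_starts"]):
        cells[f"garbage_starts[{i}]"] = pos

    for label, (x, y) in cells.items():
        if not (0 <= x < width and 0 <= y < height):
            problems.append(f"{name}: {label} {(x, y)} is outside the grid")
        elif (x, y) in obstacles:
            problems.append(f"{name}: {label} {(x, y)} overlaps an obstacle")
    return problems


# Values copied from the task_medium entry in scenarios.py.
TASK_MEDIUM = {
    "grid_size": (7, 7),
    "robot_start": (3, 3),
    "garbage_starts": [(1, 1), (5, 5), (1, 5)],
    "obstacle_starts": [(2, 2), (2, 3), (2, 4), (4, 2), (4, 3), (4, 4)],
    "home_position": (3, 3),
    "unload_station": (6, 0),
}

problems = validate_scenario("task_medium", TASK_MEDIUM)
print(problems)
```

In the repository the same function could be looped over `SCENARIOS.items()`; it only checks placement and says nothing about whether the battery budget is sufficient.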

server.log ADDED Viewed

Binary file (92.3 kB). View file

server.pid ADDED Viewed

	@@ -0,0 +1 @@


+61938

test_env.py ADDED Viewed

	@@ -0,0 +1,19 @@

+import requests
+ENV_URL = "http://localhost:7860"
+print("Resetting task_easy...")
+res = requests.post(f"{ENV_URL}/reset", json={"task_id": "task_easy"})
+print("Observation:", res.json()["observation"])
+print("\nStepping UP...")
+res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+print("Result:", res.json())
+print("\nStepping UP...")
+res = requests.post(f"{ENV_URL}/step", json={"command": "UP"})
+print("Result:", res.json())
+print("\nGrading...")
+res = requests.get(f"{ENV_URL}/grade/task_easy")
+print("Grade:", res.json())

train.pid ADDED Viewed

	@@ -0,0 +1 @@


+382219

train_output.log ADDED Viewed

	@@ -0,0 +1,4 @@

+Traceback (most recent call last):
+  File "/home/robotics-mu/Downloads/Meta Hackathon/train_unsloth.py", line 13, in <module>
+    from datasets import Dataset
+ModuleNotFoundError: No module named 'datasets'

train_unsloth.py ADDED Viewed

	@@ -0,0 +1,144 @@

+"""
+Fine-tuning Llama-3.2-3B-Instruct with Unsloth for the Garbage Collecting Robot.
+Training data: fixed_dataset.jsonl  (generated by code2.py + fixer.py)
+  Format: {"user": "### Instruction:\n...\n\n### Input:\nENVIRONMENT STATUS:\n...", "assistant": "UP|DOWN|LEFT|RIGHT|COLLECT"}
+Base model: unsloth/llama-3.2-3b-instruct-bnb-4bit  (same as Unsloth Studio run)
+Export:     lora_garbage_robot/  (LoRA adapter)
+"""
+import os
+import json
+from datasets import Dataset
+max_seq_length = 512   # Prompts are short; 512 is well above the longest sample
+dtype          = None  # Auto-detect (float16 on T4, bfloat16 on Ampere+)
+load_in_4bit   = True
+# ── Alpaca prompt — MUST match fixed_dataset.jsonl / code2.py / app.py ──────
+ALPACA_TEMPLATE = (
+    "### Instruction:\n{instruction}\n\n"
+    "### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
+    "### Response:\n{response}"
+)
+INSTRUCTION = (
+    "You are an AI brain controlling a garbage collecting robot.\n"
+    "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
+)
+EOS_TOKEN = None   # filled in after tokenizer loads
+def load_fixed_dataset(path: str = "fixed_dataset.jsonl") -> Dataset:
+    """
+    Load fixed_dataset.jsonl produced by fixer.py.
+    Each row: {"user": "<### Instruction:...### Input:...>", "assistant": "<ACTION>"}
+    We re-format into the full Alpaca text so the model sees input + target in one string.
+    """
+    rows = []
+    with open(path, "r") as f:
+        for line in f:
+            row = json.loads(line)
+            user_text  = row["user"]      # already contains ### Instruction + ### Input
+            assistant  = row["assistant"] # e.g. "RIGHT"
+            # Extract the environment status message from the user field
+            try:
+                env_status = user_text.split("ENVIRONMENT STATUS:\n")[1].strip()
+            except IndexError:
+                continue   # skip malformed rows
+            text = ALPACA_TEMPLATE.format(
+                instruction=INSTRUCTION,
+                input=env_status,
+                response=assistant,
+            ) + (EOS_TOKEN or "")
+            rows.append({"text": text})
+    print(f"[Dataset] Loaded {len(rows):,} samples from {path}")
+    return Dataset.from_list(rows)
+def main():
+    from unsloth import FastLanguageModel
+    from trl import SFTTrainer
+    from transformers import TrainingArguments
+    global EOS_TOKEN
+    print("=" * 60)
+    print("  Fine-tuning Llama-3.2-3B-Instruct — Garbage Robot")
+    print("=" * 60)
+    # ── 1. Load base model (same as Unsloth Studio session) ──────────────────
+    print("\n[1/4] Loading base model …")
+    model, tokenizer = FastLanguageModel.from_pretrained(
+        model_name    = "unsloth/llama-3.2-3b-instruct-bnb-4bit",
+        max_seq_length = max_seq_length,
+        dtype          = dtype,
+        load_in_4bit   = load_in_4bit,
+    )
+    EOS_TOKEN = tokenizer.eos_token   # fill in for dataset formatting
+    # ── 2. Add LoRA adapters ─────────────────────────────────────────────────
+    print("[2/4] Attaching LoRA adapters …")
+    model = FastLanguageModel.get_peft_model(
+        model,
+        r                     = 16,
+        target_modules        = ["q_proj", "k_proj", "v_proj", "o_proj",
+                                  "gate_proj", "up_proj", "down_proj"],
+        lora_alpha            = 16,
+        lora_dropout          = 0,
+        bias                  = "none",
+        use_gradient_checkpointing = "unsloth",
+        random_state          = 3407,
+        use_rslora            = False,
+        loftq_config          = None,
+    )
+    # ── 3. Load dataset ──────────────────────────────────────────────────────
+    print("[3/4] Loading fixed_dataset.jsonl …")
+    dataset = load_fixed_dataset("fixed_dataset.jsonl")
+    # ── 4. Train ─────────────────────────────────────────────────────────────
+    print("[4/4] Starting fine-tuning …")
+    trainer = SFTTrainer(
+        model              = model,
+        tokenizer          = tokenizer,
+        train_dataset      = dataset,
+        dataset_text_field = "text",
+        max_seq_length     = max_seq_length,
+        dataset_num_proc   = 2,
+        packing            = True,   # efficient for short sequences
+        args = TrainingArguments(
+            per_device_train_batch_size  = 4,
+            gradient_accumulation_steps  = 4,
+            warmup_ratio                 = 0.03,
+            num_train_epochs             = 1,
+            learning_rate                = 2e-4,
+            fp16  = not FastLanguageModel.is_bfloat16_supported(),
+            bf16  = FastLanguageModel.is_bfloat16_supported(),
+            logging_steps   = 10,
+            optim           = "adamw_8bit",
+            weight_decay    = 0.01,
+            lr_scheduler_type = "cosine",
+            seed            = 3407,
+            output_dir      = "outputs",
+            save_strategy   = "epoch",
+        ),
+    )
+    trainer_stats = trainer.train()
+    print(f"\nTraining complete. Loss: {trainer_stats.training_loss:.4f}")
+    # ── Save LoRA adapter ────────────────────────────────────────────────────
+    model.save_pretrained("lora_garbage_robot")
+    tokenizer.save_pretrained("lora_garbage_robot")
+    print("\nLoRA adapter saved to: lora_garbage_robot/")
+    print("To export a merged model, use Unsloth Studio → Export → Merged Model.")
+if __name__ == "__main__":
+    main()
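The row reformatting done by `load_fixed_dataset()` can be tried without a GPU or the `datasets` package. The sketch below reuses the template and instruction strings from train_unsloth.py; the `format_row` helper, the sample row, and the `</s>` end-of-sequence string are illustrative assumptions (the real script takes the token from the loaded tokenizer).

```python
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\nENVIRONMENT STATUS:\n{input}\n\n"
    "### Response:\n{response}"
)
INSTRUCTION = (
    "You are an AI brain controlling a garbage collecting robot.\n"
    "Reply with EXACTLY ONE of: UP DOWN LEFT RIGHT COLLECT"
)


def format_row(row: dict, eos_token: str = ""):
    """Rebuild one training string from a {"user", "assistant"} row.

    Returns None for rows without an ENVIRONMENT STATUS section, which the
    original loader skips as malformed.
    """
    parts = row["user"].split("ENVIRONMENT STATUS:\n")
    if len(parts) < 2:
        return None
    env_status = parts[1].strip()
    text = ALPACA_TEMPLATE.format(
        instruction=INSTRUCTION, input=env_status, response=row["assistant"]
    )
    return text + eos_token


# Made-up sample row in the documented format, not taken from the dataset.
sample = {
    "user": "### Instruction:\n...\n\n### Input:\nENVIRONMENT STATUS:\n"
            "Robot at (0, 0). Garbage at (4, 4).",
    "assistant": "RIGHT",
}
formatted = format_row(sample, eos_token="</s>")
print(formatted)
```

Appending the end-of-sequence token matters because the target is a single short action word; without it the fine-tuned model has no signal for where the answer stops.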