Spaces: gpue / nova-sim

Georg committed on Jan 19

Commit c525614 (1 parent: fc65a06)

Enhance mujoco_server.py and UR5 environment for improved teleoperation and episode control

- Introduced a lock-protected snapshot of the most recent UI teleoperation command so it can be included in state broadcasts to trainers.
- Updated the reward calculation in the UR5 environment to use the task reward when available, falling back to a distance-based reward otherwise.
- Added gym-style control message handling (`reset`, `step`, `configure`, `get_spaces`, `close`) over the unified WebSocket, so trainers, RL clients, and the UI share one connection and one state stream.
- Implemented a robust episode control flag system to manage episode termination and truncation without interfering with the gym client's lifecycle.
- Enhanced documentation in README.md to clarify server startup options and the unified WebSocket API for trainers and clients.
- Added unit tests for the episode control flag mechanism, covering repeated episodes and concurrent signaling from a second thread.

Files changed (5)

README.md +50 -23
mujoco_server.py +180 -55
robots/ur5/ur5_env.py +12 -4
tests/test_api.py +2 -2
tests/test_episode_control_fix.py +167 -0

README.md CHANGED

@@ -115,12 +115,20 @@ git clone --recurse-submodules https://github.com/iit-DLSLab/Quadruped-PyMPC
 cd Quadruped-PyMPC
 pip install -e .
-# Start the server
 python mujoco_server.py
 # Open browser at http://localhost:3004/nova-sim/api/v1
 ```
 ## Docker Deployment
 ### Getting Started
@@ -381,12 +389,14 @@ ws.send(JSON.stringify({type: 'command', data: {vx: 0.5, vy: 0, vyaw: 0}}));
 // Receive messages
 ws.onmessage = (event) => {
-  const msg = JSON.parse(event.data);
-  if (msg.type === 'state') {
-    console.log(msg.data);
-  }
-};
-```
 #### Client → Server Messages
@@ -397,10 +407,23 @@ All messages are JSON with `{type, data}` structure:
 | `command` | `{vx, vy, vyaw}` | Set velocity command |
 | `reset` | `{}` | Reset robot to standing pose |
 | `switch_robot` | `{robot, scene?}` | Switch active robot and optional scene |
 | `camera` | `{action, ...}` | Camera control |
 | `camera_follow` | `{follow}` | Toggle camera follow mode |
 | `teleop_command` | `{dx, dy, dz}` | Apply incremental cartesian jog command (UI teleop) |
 **`command`:**
 ```json
 {"type": "command", "data": {"vx": 0.5, "vy": 0.0, "vyaw": 0.0}}
@@ -493,9 +516,10 @@ For locomotion robots (G1, Spot):
     "base_height": 0.46,
     "upright": 0.98,
     "steps": 1234,
-    "vx": 0.5,
-    "vy": 0.0,
-    "vyaw": 0.0
   }
 }
 ```
@@ -515,6 +539,10 @@ For robot arm (UR5):
     "control_mode": "ik",
     "use_orientation": true,
     "steps": 1234,
     "nova_api": {
       "connected": true,
       "state_streaming": true,
@@ -530,21 +558,21 @@ For robot arm (UR5):
   - `connected`: Whether Nova API client is connected
   - `state_streaming`: Whether using Nova API for robot state streaming (vs. internal)
   - `ik`: Whether using Nova API for inverse kinematics (vs. internal)
-### Gym WebSocket API (RL/IL)
-The gym-style API is exposed at `ws://localhost:3004/nova-sim/api/v1/gym/ws`.
-It supports `reset`, `step`, `configure`, and `get_spaces`.
-Example request payloads:
-```json
-{"type": "configure", "data": {"robot": "ur5_t_push"}}
-{"type": "reset"}
-{"type": "step", "data": {"action": [0,0,0,0,0,0,0], "render": false}}
-```
-The server responds with `gym_reset`, `gym_step`, `gym_spaces`, or `gym_configured` messages.
 ### HTTP Endpoints
@@ -803,8 +831,7 @@ The Nova API integration is implemented in:
 2. Keep it running at `http://localhost:3004` so the HTTP/websocket endpoints stay reachable.
 3. Run `pytest nova-sim/tests` to exercise:
    - API endpoints (`/metadata`, `/camera/<name>/video_feed`, `/video_feed`)
-   - WebSocket control (`/ws`)
-   - Gym-style websocket (`/gym/ws`)
    - Auxiliary MJPEG overlays after switching to the T-push UR5 scene
 The tests assume the server is accessible via `http://localhost:3004/nova-sim/api/v1` and will skip automatically if the API is unreachable.

 cd Quadruped-PyMPC
 pip install -e .
+# Start the server (default reward threshold: -0.1)
 python mujoco_server.py
+# Or with custom reward threshold for auto episode termination
+python mujoco_server.py --reward-threshold -0.05  # Stricter (5cm from target)
+python mujoco_server.py --reward-threshold -0.2   # Lenient (20cm from target)
 # Open browser at http://localhost:3004/nova-sim/api/v1
 ```
+**Reward Threshold**: Episodes terminate automatically once the robot comes within the distance implied by the threshold (for example, `-0.05` corresponds to 5 cm from the target). See [REWARD_THRESHOLD.md](REWARD_THRESHOLD.md) for details.
 ## Docker Deployment
 ### Getting Started
 // Receive messages
 ws.onmessage = (event) => {
+    const msg = JSON.parse(event.data);
+    if (msg.type === 'state') {
+      console.log(msg.data);
+    }
+  };
+ ```
+Nova-Sim uses `/ws` as the shared control channel for the browser UI, trainers, and any RL clients. Every UI interaction (teleop, camera controls, robot switching) and every trainer handshake and notification flows through this single socket; the UI `state` messages shown below now also carry the action deltas, integrated reward, and trainer connection status that RL agents need. The legacy `/nova-sim/api/v1/gym/ws` endpoint remains for backwards compatibility, but new clients should use `/ws` instead.
 #### Client → Server Messages
 | `command` | `{vx, vy, vyaw}` | Set velocity command |
 | `reset` | `{}` | Reset robot to standing pose |
 | `switch_robot` | `{robot, scene?}` | Switch active robot and optional scene |
+| `configure` | `{robot, scene?}` | Reconfigure the active robot/scene (reuses the UI configure flow) |
 | `camera` | `{action, ...}` | Camera control |
 | `camera_follow` | `{follow}` | Toggle camera follow mode |
 | `teleop_command` | `{dx, dy, dz}` | Apply incremental cartesian jog command (UI teleop) |
+RL agents, trainers, and CLI tools connect to `/ws` as well by sending gym-style control packets that include a request `id`. The server replies with `gym_*` responses so your existing clients keep working while also benefiting from the richer state stream.
+**Gym-style RL/IL control messages** (include an `id` and expect a `gym_*` response):
+```
+{"type": "reset", "id": 42}
+{"type": "step", "data": {"action": [0,0,0,0,0,0,0], "render": false}, "id": 43}
+{"type": "configure", "data": {"robot": "ur5_t_push"}, "id": 44}
+{"type": "get_spaces", "id": 45}
+{"type": "close", "id": 46}
+```
+The server replies with `gym_reset`, `gym_step`, `gym_configured`, `gym_spaces`, or `gym_closed` (matching the `id` you supplied). Every message still flows over `/ws`, so UI teleop commands, state updates, and gym responses share the same connection.
 **`command`:**
 ```json
 {"type": "command", "data": {"vx": 0.5, "vy": 0.0, "vyaw": 0.0}}
     "base_height": 0.46,
     "upright": 0.98,
     "steps": 1234,
+    "command": {"vx": 0.5, "vy": 0.0, "vyaw": 0.0},
+    "reward": 0.0,
+    "teleop_command": {"dx": 0.05, "dy": 0.0, "dz": 0.0},
+    "trainer_connected": true
   }
 }
 ```
     "control_mode": "ik",
     "use_orientation": true,
     "steps": 1234,
+    "reward": -0.25,
+    "command": {"vx": 0.1, "vy": 0.0, "vyaw": 0.0},
+    "teleop_command": {"dx": 0.02, "dy": 0.0, "dz": 0.0},
+    "trainer_connected": true,
     "nova_api": {
       "connected": true,
       "state_streaming": true,
   - `connected`: Whether Nova API client is connected
   - `state_streaming`: Whether using Nova API for robot state streaming (vs. internal)
   - `ik`: Whether using Nova API for inverse kinematics (vs. internal)
+- `command`: The latest velocity command (`vx`, `vy`, `vyaw`) that drives locomotion or arm movement.
+- `teleop_command`: The most recent UI teleop delta (`dx`, `dy`, `dz`) so trainers know how the UI nudged the robot.
+- `reward`: The integrated task reward from the simulator that remote trainers can consume.
+- `trainer_connected`: Whether a trainer handshake is active on `/ws` (useful for status LEDs).
+### State broadcasts and trainer notifications
+Every `/ws` client receives a `state` message roughly every 100 ms. The examples above show the locomotion (`spot`) and arm (`ur5`) payloads; the payload also now includes:
+- `command`: The last velocity command that drives locomotion or arm motion (`vx`, `vy`, `vyaw`).
+- `teleop_command`: The latest UI teleop delta (`dx`, `dy`, `dz`) so trainers know how the browser nudged the robot.
+- `reward`: The integrated task reward that trainers can consume without sending a separate `step`.
+- `trainer_connected`: Whether a trainer handshake is active on `/ws` (used to update the UI indicator).
+Trainers announce themselves by sending a `trainer_identity` payload when the socket opens. The server mirrors that identity into `trainer_status` broadcasts, which go to every UI client, and forwards trainer `notification` payloads to the UI as `trainer_notification` events.
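A trainer that only needs the new `state` fields can pull them out of the shared stream and ignore every other frame type. The sketch below is a minimal, hypothetical helper (`extract_trainer_fields` is not part of Nova-Sim); it assumes frames arrive as JSON text shaped like the examples above and treats `reward` and `teleop_command` as possibly `null`, since the server sends `None` for the reward when the environment has no task reward and for the teleop snapshot before any UI jog has happened.

```python
import json
from typing import Any, Optional


def extract_trainer_fields(raw: str) -> Optional[dict]:
    """Return trainer-relevant fields from a /ws frame, or None for non-state frames."""
    msg = json.loads(raw)
    if msg.get("type") != "state":
        return None  # gym_* replies, trainer_status, notifications, ...
    data: dict[str, Any] = msg.get("data") or {}
    command = data.get("command") or {}
    return {
        # May be None when the environment exposes no task reward.
        "reward": data.get("reward"),
        # Missing velocity components default to 0.0.
        "command": (
            float(command.get("vx", 0.0)),
            float(command.get("vy", 0.0)),
            float(command.get("vyaw", 0.0)),
        ),
        # None until the UI has sent its first teleop jog.
        "teleop": data.get("teleop_command"),
        "trainer_connected": bool(data.get("trainer_connected", False)),
    }
```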
 ### HTTP Endpoints
 2. Keep it running at `http://localhost:3004` so the HTTP/websocket endpoints stay reachable.
 3. Run `pytest nova-sim/tests` to exercise:
    - API endpoints (`/metadata`, `/camera/<name>/video_feed`, `/video_feed`)
+   - Unified WebSocket control (`/ws`)
    - Auxiliary MJPEG overlays after switching to the T-push UR5 scene
 The tests assume the server is accessible via `http://localhost:3004/nova-sim/api/v1` and will skip automatically if the API is unreachable.
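Because gym replies, `state` broadcasts, and trainer/UI events all share `/ws`, a gym client has to match each reply to its request by `id` and set aside everything else. A minimal sketch under those assumptions: `GymOverWs` and its `send`/`recv` callables are hypothetical stand-ins for whatever WebSocket client library you use, and the expected reply types are taken from the README changes above.

```python
import itertools
import json
from typing import Callable, Optional

# Reply type the server sends for each gym-style request (per the README).
EXPECTED_REPLY = {
    "reset": "gym_reset",
    "step": "gym_step",
    "configure": "gym_configured",
    "get_spaces": "gym_spaces",
    "close": "gym_closed",
}


class GymOverWs:
    """Correlate gym-style requests with replies on a shared /ws socket.

    `send` and `recv` are hypothetical transport callables that exchange
    JSON text frames (e.g. thin wrappers around a websocket client).
    """

    def __init__(self, send: Callable[[str], None], recv: Callable[[], str]) -> None:
        self._send = send
        self._recv = recv
        self._ids = itertools.count(1)
        self.skipped: list = []  # frames that were not our reply (e.g. `state`)

    def request(self, msg_type: str, data: Optional[dict] = None) -> dict:
        if msg_type not in EXPECTED_REPLY:
            raise ValueError(f"not a gym-style request: {msg_type}")
        msg_id = next(self._ids)
        msg: dict = {"type": msg_type, "id": msg_id}
        if data is not None:
            msg["data"] = data
        self._send(json.dumps(msg))
        while True:
            reply = json.loads(self._recv())
            if reply.get("id") != msg_id:
                # State broadcasts and UI/trainer traffic share the socket.
                self.skipped.append(reply)
                continue
            if reply.get("type") == "gym_error":
                raise RuntimeError(reply.get("message", "gym_error"))
            if reply.get("type") != EXPECTED_REPLY[msg_type]:
                raise RuntimeError(f"unexpected reply type: {reply.get('type')}")
            return reply
```

For an offline check, `send` can enqueue a `state` frame followed by the matching `gym_*` reply; `request` then returns the reply and leaves the `state` frame in `skipped`.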

mujoco_server.py CHANGED

@@ -9,7 +9,7 @@ import cv2
 import numpy as np
 import mujoco
 from pathlib import Path
-from typing import Any
 from flask import Flask, Response, render_template_string, request, jsonify
 from flask_sock import Sock
@@ -98,6 +98,10 @@ episode_control_state = {
 }
 episode_control_lock = threading.Lock()
 # WebSocket clients
 ws_clients = set()
 ws_clients_lock = threading.Lock()
@@ -105,6 +109,8 @@ trainer_ws_clients = set()
 trainer_ws_clients_lock = threading.Lock()
 trainer_client_metadata: dict = {}
 trainer_client_metadata_lock = threading.Lock()
 # Camera state for orbit controls
 cam = mujoco.MjvCamera()
@@ -199,7 +205,6 @@ UR5_T_PUSH_OVERLAY_PRESETS = [
 OVERLAY_CAMERA_PRESETS = {
     "ur5_t_push": UR5_T_PUSH_OVERLAY_PRESETS,
-    "scene_t_push": UR5_T_PUSH_OVERLAY_PRESETS,
 }
 CAMERA_FEEDS = [
@@ -484,6 +489,16 @@ def broadcast_state():
         obs = env._get_obs()
         cmd = env.get_command()
         steps = env.steps
         with trainer_ws_clients_lock:
             trainer_connected = len(trainer_ws_clients) > 0
@@ -533,7 +548,9 @@ def broadcast_state():
                     'control_mode': control_mode,
                     'use_orientation': use_orientation,
                     'steps': int(steps),
-                    'reward': env.get_task_reward(),
                     'nova_api': {
                         'connected': nova_connected,
                         'state_streaming': nova_state_streaming,
@@ -557,9 +574,12 @@ def broadcast_state():
                     'base_height': base_height,
                     'upright': upright,
                     'steps': int(steps),
                     'vx': float(cmd[0]),
                     'vy': float(cmd[1]),
                     'vyaw': float(cmd[2]),
                     'trainer_connected': trainer_connected
                 }
             })
@@ -576,6 +596,25 @@ def broadcast_state():
         ws_clients.difference_update(dead_clients)
 def _build_trainer_status_payload():
     """Build a summary payload describing connected trainer clients."""
     with trainer_ws_clients_lock:
@@ -650,6 +689,106 @@ def broadcast_notification_to_ui(payload: dict):
         ws_clients.difference_update(dead_clients)
 def _signal_episode_control(action: str):
     """Set episode control flags and notify trainer/UI clients."""
     action = (action or "").lower()
@@ -828,12 +967,19 @@ def generate_overlay_frames(name: str):
         time.sleep(0.04)
-def handle_ws_message(data):
     """Handle incoming WebSocket message."""
-    global needs_robot_switch, camera_follow
     msg_type = data.get('type')
     if msg_type == 'command':
         payload = data.get('data', {})
         vx = payload.get('vx', 0.0)
@@ -868,6 +1014,14 @@ def handle_ws_message(data):
         print(f"Robot switch requested: {robot} / scene: {scene}")
         needs_robot_switch = {"robot": robot, "scene": scene}
     elif msg_type == 'camera':
         payload = data.get('data', {})
         a = cam.azimuth * np.pi / 180.0
@@ -913,6 +1067,17 @@ def handle_ws_message(data):
             else:
                 updated_target = None
         broadcast_to_trainer(
             "teleop_command",
             {
@@ -1057,6 +1222,7 @@ def _create_env(robot, scene):
         sys.path.insert(0, ur5_dir)
         from ur5_env import UR5Env
         sys.path.pop(0)
         if scene:
             return UR5Env(render_mode="rgb_array", width=RENDER_WIDTH, height=RENDER_HEIGHT, scene_name=scene)
         return UR5Env(render_mode="rgb_array", width=RENDER_WIDTH, height=RENDER_HEIGHT)
@@ -1134,7 +1300,7 @@ def websocket_handler(ws):
                 break
             try:
                 data = json.loads(message)
-                handle_ws_message(data)
             except json.JSONDecodeError:
                 print(f"Invalid JSON received: {message}")
             except Exception as e:
@@ -1145,53 +1311,17 @@ def websocket_handler(ws):
         # Unregister client
         with ws_clients_lock:
             ws_clients.discard(ws)
-        print('WebSocket client disconnected')
-@sock.route(f'{API_PREFIX}/trainer/ws')
-def trainer_websocket_handler(ws):
-    """Handle trainer WebSocket clients (teleop commands + notifications)."""
-    print('Trainer client connected')
-    with trainer_ws_clients_lock:
-        trainer_ws_clients.add(ws)
-    _register_trainer_client(ws)
-    broadcast_trainer_connection_status()
-    broadcast_state()
-    try:
-        while True:
-            message = ws.receive()
-            if message is None:
-                break
-            try:
-                data = json.loads(message)
-                _handle_trainer_message(ws, data)
-            except json.JSONDecodeError:
-                print(f"Trainer sent invalid JSON: {message}")
-            except Exception as exc:
-                print(f"Error handling trainer message: {exc}")
-    finally:
         with trainer_ws_clients_lock:
             trainer_ws_clients.discard(ws)
-        _unregister_trainer_client(ws)
-    print('Trainer client disconnected')
-    broadcast_state()
-    broadcast_trainer_connection_status()
-def _handle_trainer_message(ws, data):
-    """Process messages sent from the training client."""
-    msg_type = data.get("type")
-    if msg_type == "trainer_identity":
-        payload = data.get("data", {}) or {}
-        identity = payload.get("trainer_id") or payload.get("trainer_name") or payload.get("name") or "trainer"
-        _set_trainer_identity(ws, identity)
-        broadcast_trainer_connection_status()
-        return
-    if msg_type == "notification":
-        payload = data.get("data", {})
-        payload.setdefault("timestamp", time.time())
-        broadcast_notification_to_ui(payload)
 @sock.route(f'{API_PREFIX}/gym/ws')
@@ -1227,6 +1357,7 @@ def gym_websocket_handler(ws):
                 elif msg_type == "step":
                     action = payload.get("action", [])
                     obs, reward, terminated, truncated, info = session.step(action)
                     response = {
                         "type": "gym_step",
                         "data": {
@@ -1801,12 +1932,6 @@ def index():
                                     <span>Z nudge</span>
                                 </span>
                             </li>
-                            <li>
-                                <span class="hint-key">
-                                    <kbd>Enter</kbd>
-                                    <span>End episode</span>
-                                </span>
-                            </li>
                         </ul>
                     </div>
                     <div class="robot-info" id="robot_info">

 import numpy as np
 import mujoco
 from pathlib import Path
+from typing import Any, Optional
 from flask import Flask, Response, render_template_string, request, jsonify
 from flask_sock import Sock
 }
 episode_control_lock = threading.Lock()
+# Latest teleoperation command (for trainer state)
+last_teleop_command: Optional[dict[str, Any]] = None
+teleop_lock = threading.Lock()
 # WebSocket clients
 ws_clients = set()
 ws_clients_lock = threading.Lock()
 trainer_ws_clients_lock = threading.Lock()
 trainer_client_metadata: dict = {}
 trainer_client_metadata_lock = threading.Lock()
+gym_sessions: dict = {}
+gym_sessions_lock = threading.Lock()
 # Camera state for orbit controls
 cam = mujoco.MjvCamera()
 OVERLAY_CAMERA_PRESETS = {
     "ur5_t_push": UR5_T_PUSH_OVERLAY_PRESETS,
 }
 CAMERA_FEEDS = [
         obs = env._get_obs()
         cmd = env.get_command()
         steps = env.steps
+        command = {
+            "vx": float(cmd[0]) if len(cmd) > 0 else 0.0,
+            "vy": float(cmd[1]) if len(cmd) > 1 else 0.0,
+            "vyaw": float(cmd[2]) if len(cmd) > 2 else 0.0,
+        }
+        with teleop_lock:
+            teleop_snapshot = last_teleop_command.copy() if last_teleop_command else None
+        reward_value = None
+        if hasattr(env, "get_task_reward"):
+            reward_value = env.get_task_reward()
         with trainer_ws_clients_lock:
             trainer_connected = len(trainer_ws_clients) > 0
                     'control_mode': control_mode,
                     'use_orientation': use_orientation,
                     'steps': int(steps),
+                    'reward': reward_value,
+                    'command': command,
+                    'teleop_command': teleop_snapshot,
                     'nova_api': {
                         'connected': nova_connected,
                         'state_streaming': nova_state_streaming,
                     'base_height': base_height,
                     'upright': upright,
                     'steps': int(steps),
+                    'command': command,
+                    'teleop_command': teleop_snapshot,
                     'vx': float(cmd[0]),
                     'vy': float(cmd[1]),
                     'vyaw': float(cmd[2]),
+                    'reward': reward_value,
                     'trainer_connected': trainer_connected
                 }
             })
         ws_clients.difference_update(dead_clients)
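`broadcast_state` now reads the last teleop command through `teleop_lock` and copies it, and pads a short `cmd` vector with zeros before building the `command` dict. The same pattern reduced to a standalone sketch; the names `record_teleop`, `teleop_snapshot`, and `command_dict` are hypothetical, and the real handler stores extra fields such as `robot`, `target`, and `timestamp`.

```python
import threading
from typing import Optional, Sequence

_teleop_lock = threading.Lock()
_last_teleop: Optional[dict] = None


def record_teleop(dx: float, dy: float, dz: float) -> None:
    """Writer side (UI teleop handler): replace the whole dict under the lock."""
    global _last_teleop
    with _teleop_lock:
        _last_teleop = {"dx": dx, "dy": dy, "dz": dz}


def teleop_snapshot() -> Optional[dict]:
    """Reader side (state broadcaster): copy under the lock, use it outside."""
    with _teleop_lock:
        return _last_teleop.copy() if _last_teleop else None


def command_dict(cmd: Sequence[float]) -> dict:
    """Map a possibly short command vector to vx/vy/vyaw, padding with 0.0."""
    keys = ("vx", "vy", "vyaw")
    return {k: float(cmd[i]) if len(cmd) > i else 0.0 for i, k in enumerate(keys)}
```

The copy is shallow, so nested values such as the `target` list are still shared; that is safe here because the writer replaces the whole dict on every jog instead of mutating it in place.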
+def _handle_trainer_message(ws, data):
+    """Process message payloads originating from trainers."""
+    msg_type = data.get("type")
+    if msg_type == "trainer_identity":
+        payload = data.get("data", {}) or {}
+        identity = payload.get("trainer_id") or payload.get("trainer_name") or payload.get("name") or "trainer"
+        with trainer_ws_clients_lock:
+            trainer_ws_clients.add(ws)
+        _register_trainer_client(ws)
+        _set_trainer_identity(ws, identity)
+        broadcast_state()
+        broadcast_trainer_connection_status()
+        return
+    if msg_type == "notification":
+        payload = data.get("data", {})
+        payload.setdefault("timestamp", time.time())
+        broadcast_notification_to_ui(payload)
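With the dedicated trainer socket removed, `_handle_trainer_message` only registers a `/ws` client as a trainer once it sends `trainer_identity`, and picks the display name from `trainer_id`, then `trainer_name`, then `name`, defaulting to `"trainer"`. A hypothetical sketch of the handshake a trainer might send and of the same fallback chain (empty strings fall through because the server chains the lookups with `or`):

```python
import json
from typing import Optional


def trainer_identity_message(trainer_id: str) -> str:
    """Handshake a trainer could send right after opening /ws (hypothetical helper)."""
    return json.dumps({"type": "trainer_identity", "data": {"trainer_id": trainer_id}})


def resolve_trainer_identity(payload: Optional[dict]) -> str:
    """Mirror the server's fallback chain: trainer_id, trainer_name, name, 'trainer'."""
    payload = payload or {}
    return (
        payload.get("trainer_id")
        or payload.get("trainer_name")
        or payload.get("name")
        or "trainer"
    )
```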
 def _build_trainer_status_payload():
     """Build a summary payload describing connected trainer clients."""
     with trainer_ws_clients_lock:
         ws_clients.difference_update(dead_clients)
+def _safe_ws_send(ws, message: dict):
+    """Send JSON message over WebSocket without raising."""
+    try:
+        ws.send(json.dumps(message))
+    except Exception:
+        pass
+def _get_or_create_gym_session(ws):
+    with gym_sessions_lock:
+        session = gym_sessions.get(ws)
+        if session is None:
+            session = GymSession()
+            gym_sessions[ws] = session
+        return session
+def _remove_gym_session(ws):
+    with gym_sessions_lock:
+        session = gym_sessions.pop(ws, None)
+    if session:
+        session.close()
+def _handle_gym_ws_message(ws, data):
+    """Handle gym-style control messages routed over `/ws`."""
+    msg_type = data.get("type")
+    if msg_type not in {"reset", "step", "configure", "get_spaces", "close"}:
+        return False
+    msg_id = data.get("id")
+    if msg_id is None:
+        return False
+    payload = data.get("data", {}) or {}
+    session = _get_or_create_gym_session(ws)
+    try:
+        if msg_type == "reset":
+            seed = payload.get("seed")
+            obs, info = session.reset(seed=seed)
+            response = {
+                "type": "gym_reset",
+                "data": {
+                    "obs": obs.tolist(),
+                    "info": _serialize_value(info),
+                },
+            }
+        elif msg_type == "step":
+            action = payload.get("action", [])
+            render = bool(payload.get("render", False))
+            obs, reward, terminated, truncated, info = session.step(action)
+            response = {
+                "type": "gym_step",
+                "data": {
+                    "obs": obs.tolist(),
+                    "reward": float(reward),
+                    "terminated": bool(terminated),
+                    "truncated": bool(truncated),
+                    "info": _serialize_value(info),
+                },
+            }
+            if render:
+                frame_jpeg = session.render_jpeg()
+                if frame_jpeg:
+                    response["data"]["frame_jpeg"] = frame_jpeg
+        elif msg_type == "configure":
+            robot = payload.get("robot", "ur5")
+            scene = payload.get("scene")
+            session.configure(robot, scene)
+            response = {
+                "type": "gym_configured",
+                "data": {"robot": session.robot, "scene": session.scene},
+            }
+        elif msg_type == "get_spaces":
+            response = {
+                "type": "gym_spaces",
+                "data": {
+                    "action_space": _serialize_space(session.env.action_space),
+                    "observation_space": _serialize_space(session.env.observation_space),
+                },
+            }
+        elif msg_type == "close":
+            response = {"type": "gym_closed"}
+            _remove_gym_session(ws)
+        else:
+            response = {
+                "type": "gym_error",
+                "message": f"Unknown message type: {msg_type}",
+            }
+        if msg_id is not None:
+            response["id"] = msg_id
+        _safe_ws_send(ws, response)
+    except Exception as exc:
+        error_response = {"type": "gym_error", "message": str(exc)}
+        if msg_id is not None:
+            error_response["id"] = msg_id
+        _safe_ws_send(ws, error_response)
+    return True
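`_handle_gym_ws_message` claims a message only when its type is one of the five gym verbs and it also carries an `id`; otherwise `handle_ws_message` hands trainer messages to `_handle_trainer_message` and everything else to the UI branches. The `id` is therefore what separates a gym `reset` or `configure` from the UI's own `reset` and `configure` messages. A hypothetical pure-function sketch of that decision (`route_ws_message` does not exist in the server):

```python
GYM_TYPES = frozenset({"reset", "step", "configure", "get_spaces", "close"})
TRAINER_TYPES = frozenset({"trainer_identity", "notification"})


def route_ws_message(data: dict) -> str:
    """Return which handler would own a /ws message: 'gym', 'trainer', or 'ui'."""
    msg_type = data.get("type")
    # Gym handling requires both a gym verb and a request id (0 counts as an id).
    if msg_type in GYM_TYPES and data.get("id") is not None:
        return "gym"
    if msg_type in TRAINER_TYPES:
        return "trainer"
    # Everything else falls through to the UI branches (command, reset, camera, ...).
    return "ui"
```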
 def _signal_episode_control(action: str):
     """Set episode control flags and notify trainer/UI clients."""
     action = (action or "").lower()
         time.sleep(0.04)
+def handle_ws_message(ws, data):
     """Handle incoming WebSocket message."""
+    global needs_robot_switch, camera_follow, last_teleop_command
     msg_type = data.get('type')
+    if _handle_gym_ws_message(ws, data):
+        return
+    if msg_type in ("trainer_identity", "notification"):
+        _handle_trainer_message(ws, data)
+        return
     if msg_type == 'command':
         payload = data.get('data', {})
         vx = payload.get('vx', 0.0)
         print(f"Robot switch requested: {robot} / scene: {scene}")
         needs_robot_switch = {"robot": robot, "scene": scene}
+    elif msg_type == 'configure':
+        payload = data.get('data', {})
+        robot = payload.get('robot')
+        scene = payload.get('scene')
+        if robot:
+            print(f"Configure requested: {robot} / scene: {scene}")
+            needs_robot_switch = {"robot": robot, "scene": scene}
     elif msg_type == 'camera':
         payload = data.get('data', {})
         a = cam.azimuth * np.pi / 180.0
             else:
                 updated_target = None
+        with teleop_lock:
+            last_teleop_command = {
+                "dx": dx,
+                "dy": dy,
+                "dz": dz,
+                "robot": current_robot,
+                "scene": getattr(env, "scene_name", None) if env is not None else None,
+                "target": updated_target.tolist() if updated_target is not None else None,
+                "timestamp": timestamp,
+            }
         broadcast_to_trainer(
             "teleop_command",
             {
         sys.path.insert(0, ur5_dir)
         from ur5_env import UR5Env
         sys.path.pop(0)
+        # scene is already resolved by _resolve_robot_scene (e.g., "scene_t_push")
         if scene:
             return UR5Env(render_mode="rgb_array", width=RENDER_WIDTH, height=RENDER_HEIGHT, scene_name=scene)
         return UR5Env(render_mode="rgb_array", width=RENDER_WIDTH, height=RENDER_HEIGHT)
                 break
             try:
                 data = json.loads(message)
+                handle_ws_message(ws, data)
             except json.JSONDecodeError:
                 print(f"Invalid JSON received: {message}")
             except Exception as e:
         # Unregister client
         with ws_clients_lock:
             ws_clients.discard(ws)
         with trainer_ws_clients_lock:
+            was_trainer = ws in trainer_ws_clients
             trainer_ws_clients.discard(ws)
+        if was_trainer:
+            _unregister_trainer_client(ws)
+            broadcast_state()
+            broadcast_trainer_connection_status()
+        _remove_gym_session(ws)
+        print('WebSocket client disconnected')
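Each `/ws` connection now gets its own `GymSession` on first use (`_get_or_create_gym_session`) and loses it on disconnect (`_remove_gym_session`), which calls `session.close()` after releasing the registry lock. A generic, hypothetical sketch of that per-socket lifecycle; `SessionRegistry` is not part of the commit, and it checks `is not None` where the commit uses `if session:`.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class SessionRegistry:
    """Lazily create one session per connection key and close it on removal."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._sessions: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()

    def get_or_create(self, key: Hashable) -> Any:
        with self._lock:
            session = self._sessions.get(key)
            if session is None:
                session = self._factory()
                self._sessions[key] = session
            return session

    def remove(self, key: Hashable) -> None:
        with self._lock:
            session = self._sessions.pop(key, None)
        # Close outside the lock so a slow teardown does not block other sockets.
        if session is not None:
            session.close()
```

Removal is idempotent, which matters here because a gym `close` request and the later socket disconnect both try to drop the same session.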
 @sock.route(f'{API_PREFIX}/gym/ws')
                 elif msg_type == "step":
                     action = payload.get("action", [])
                     obs, reward, terminated, truncated, info = session.step(action)
+                    print(f"[GYM WS] step reward={reward:.4f}, terminated={terminated}, truncated={truncated}", flush=True)
                     response = {
                         "type": "gym_step",
                         "data": {
                                     <span>Z nudge</span>
                                 </span>
                             </li>
                         </ul>
                     </div>
                     <div class="robot-info" id="robot_info">

robots/ur5/ur5_env.py CHANGED

@@ -800,14 +800,22 @@ class UR5Env(gym.Env):
         observation = self._get_obs()
-        # Reward: distance to target
-        ee_pos = self.get_end_effector_pos()
-        dist = np.linalg.norm(ee_pos - self._target_pos)
-        reward = -dist
         terminated = False
         truncated = self.steps >= self.max_steps
         info = {
             "ee_pos": ee_pos,
             "target_pos": self._target_pos,

         observation = self._get_obs()
+        # Reward: Use task reward if available, otherwise distance to target
+        task_reward = self.get_task_reward()
+        if task_reward is not None:
+            reward = task_reward
+        else:
+            # Fallback: distance to target (for non-task scenes)
+            ee_pos = self.get_end_effector_pos()
+            dist = np.linalg.norm(ee_pos - self._target_pos)
+            reward = -dist
         terminated = False
         truncated = self.steps >= self.max_steps
+        # Compute info with distance for debugging
+        ee_pos = self.get_end_effector_pos()
+        dist = np.linalg.norm(ee_pos - self._target_pos)
         info = {
             "ee_pos": ee_pos,
             "target_pos": self._target_pos,
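`UR5Env.step` now prefers the task reward and falls back to the negative end-effector distance only when `get_task_reward()` returns `None`. For scenes that use the fallback, this is what lets the README's `--reward-threshold -0.05` read as "within 5 cm". A minimal sketch; `step_reward` and `reached` are hypothetical names, `math.dist` stands in for `np.linalg.norm(ee_pos - target_pos)`, and the `reward >= threshold` termination rule is an assumption inferred from the README, since the server-side check is not part of this diff.

```python
import math
from typing import Optional, Sequence


def step_reward(task_reward: Optional[float],
                ee_pos: Sequence[float],
                target_pos: Sequence[float]) -> float:
    """Prefer the task reward; fall back to negative Euclidean distance."""
    # `is not None` matters: a task reward of 0.0 must not trigger the fallback.
    if task_reward is not None:
        return float(task_reward)
    return -math.dist(ee_pos, target_pos)


def reached(reward: float, threshold: float = -0.1) -> bool:
    """ASSUMPTION: an episode ends once the reward rises to the threshold."""
    return reward >= threshold
```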

tests/test_api.py CHANGED

@@ -28,8 +28,8 @@ def test_overlay_camera_presets(api_base: str):
     resp = requests.get(f"{api_base}/metadata", timeout=5)
     data = resp.json()
     presets = data.get('overlay_camera_presets', {})
-    assert 'ur5_t_push' in presets or 'scene_t_push' in presets
-    target_presets = presets.get('ur5_t_push') or presets.get('scene_t_push') or []
     names = {item.get('name') for item in target_presets}
     assert {'aux_top', 'aux_side', 'aux_flange'}.issubset(names)

     resp = requests.get(f"{api_base}/metadata", timeout=5)
     data = resp.json()
     presets = data.get('overlay_camera_presets', {})
+    assert 'ur5_t_push' in presets
+    target_presets = presets.get('ur5_t_push', [])
     names = {item.get('name') for item in target_presets}
     assert {'aux_top', 'aux_side', 'aux_flange'}.issubset(names)

tests/test_episode_control_fix.py ADDED

@@ -0,0 +1,167 @@

+"""
+Unit test for episode control flag mechanism.
+This test verifies that episode control flags work correctly across multiple episodes
+without the problematic env.reset() call.
+"""
+import threading
+import time
+class MockEpisodeControlSystem:
+    """
+    Minimal reproduction of the episode control system from mujoco_server.py.
+    This tests the core flag-based signaling mechanism without needing the full
+    MuJoCo environment or WebSocket infrastructure.
+    """
+    def __init__(self):
+        self.episode_control_state = {
+            "terminate": False,
+            "truncate": False,
+        }
+        self.episode_control_lock = threading.Lock()
+    def signal_episode_control(self, action: str):
+        """
+        Simulates _signal_episode_control() from mujoco_server.py.
+        This is the FIXED version without the env.reset() call.
+        """
+        action = (action or "").lower()
+        if action not in ("terminate", "truncate"):
+            return
+        with self.episode_control_lock:
+            self.episode_control_state[action] = True
+        # NOTE: The bug was here - the original code called env.reset()
+        # The fix is to NOT reset the environment, as it interferes with
+        # the gym client's episode lifecycle management.
+    def consume_episode_control_flags(self):
+        """Simulates _consume_episode_control_flags() from mujoco_server.py."""
+        with self.episode_control_lock:
+            terminate = self.episode_control_state.get("terminate", False)
+            truncate = self.episode_control_state.get("truncate", False)
+            self.episode_control_state["terminate"] = False
+            self.episode_control_state["truncate"] = False
+        return terminate, truncate
+def test_episode_control_multiple_episodes():
+    """
+    Test that episode control works correctly for multiple consecutive episodes.
+    This simulates the scenario where a user presses Enter in the Nova-Sim UI
+    during multiple episodes to terminate them early.
+    """
+    system = MockEpisodeControlSystem()
+    num_episodes = 5
+    steps_per_episode = 10
+    step_to_terminate = 3
+    print(f"Testing {num_episodes} episodes with episode control...")
+    all_episodes_succeeded = True
+    for ep in range(num_episodes):
+        print(f"\n=== Episode {ep + 1}/{num_episodes} ===")
+        # Simulate episode reset
+        episode_terminated = False
+        terminated_at_step = None
+        for step in range(steps_per_episode):
+            # Simulate user pressing Enter at a specific step
+            if step == step_to_terminate:
+                print(f"  Step {step + 1}: User presses Enter (UI episode control)")
+                system.signal_episode_control("terminate")
+            # Simulate gym client calling step()
+            term, trunc = system.consume_episode_control_flags()
+            if term or trunc:
+                print(f"  Step {step + 1}: Episode ended (terminated={term}, truncated={trunc})")
+                episode_terminated = True
+                terminated_at_step = step
+                break
+            else:
+                print(f"  Step {step + 1}: Continuing...")
+        if not episode_terminated:
+            print(f"  ✗ FAILED: Episode {ep + 1} did not terminate!")
+            all_episodes_succeeded = False
+        elif terminated_at_step < step_to_terminate:
+            # A flag leaking over from the previous episode would end this one early.
+            print(f"  ✗ FAILED: Episode {ep + 1} ended too early!")
+            all_episodes_succeeded = False
+        else:
+            print(f"  ✓ SUCCESS: Episode {ep + 1} terminated correctly")
+    print("\n" + "=" * 50)
+    if all_episodes_succeeded:
+        print("✓ ALL TESTS PASSED")
+        print(f"Episode control worked correctly for all {num_episodes} episodes")
+    else:
+        print("✗ SOME TESTS FAILED")
+    # Assert instead of returning a bool so pytest actually reports failures.
+    assert all_episodes_succeeded, "Episode control failed in at least one episode"
+def test_episode_control_threading():
+    """
+    Test that episode control is thread-safe.
+    This simulates the scenario where the UI thread signals episode control
+    while the gym client thread is consuming flags.
+    """
+    system = MockEpisodeControlSystem()
+    print("\nTesting thread safety...")
+    # threading.Thread discards return values, so collect the count here.
+    terminate_counts = []
+    # Simulate concurrent UI and gym client activity
+    def ui_thread():
+        for i in range(10):
+            time.sleep(0.01)
+            system.signal_episode_control("terminate")
+    def gym_thread():
+        terminate_count = 0
+        for i in range(100):
+            time.sleep(0.001)
+            term, _ = system.consume_episode_control_flags()
+            if term:
+                terminate_count += 1
+        terminate_counts.append(terminate_count)
+    ui = threading.Thread(target=ui_thread)
+    gym = threading.Thread(target=gym_thread)
+    ui.start()
+    gym.start()
+    ui.join()
+    gym.join()
+    # Drain a flag the gym thread may not have consumed before it finished.
+    leftover, _ = system.consume_episode_control_flags()
+    total = terminate_counts[0] + int(leftover)
+    # Signals can coalesce between consumes, but every observed flag must come
+    # from a real signal, and the last signal is always observed.
+    assert 1 <= total <= 10, f"unexpected terminate count: {total}"
+    print("✓ Thread safety test completed without crashes")
+if __name__ == "__main__":
+    print("=" * 50)
+    print("Episode Control Fix - Unit Tests")
+    print("=" * 50)
+    try:
+        test_episode_control_multiple_episodes()
+        test_episode_control_threading()
+    except AssertionError as exc:
+        print(f"\n✗ SOME UNIT TESTS FAILED: {exc}")
+        raise SystemExit(1)
+    print("\n" + "=" * 50)
+    print("✓ ALL UNIT TESTS PASSED")