Georg committed on
Commit 2425670 · 1 parent: 84b67c7

Implement dynamic camera management and enhance overlay functionality in mujoco_server.py


- Introduced dynamic camera configuration capabilities, allowing users to register auxiliary cameras via a new API endpoint.
- Updated overlay camera state management to support dynamic camera states, improving the responsiveness of the UI to camera events.
- Enhanced the handling of camera events in the frontend, ensuring that the UI refreshes automatically when new cameras are added.
- Added comprehensive tests for the dynamic camera functionality, verifying state announcements and environment updates.
- Updated README.md to document the new dynamic camera API and its integration with the existing system.
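
From the client side, the registration flow described above can be sketched as follows. This is a minimal sketch, not part of the server code: it assumes Nova-Sim is reachable at the default `http://localhost:3004` base URL documented in the README, and the helper names (`build_camera_payload`, `register_camera`) are illustrative. Only the request fields shown in this commit are used.

```python
# Hypothetical client sketch for the new dynamic camera API (helper names
# are illustrative; the payload fields come from the README example below).
import json
from urllib import request

BASE_URL = "http://localhost:3004/nova-sim/api/v1"  # default from the README


def build_camera_payload(name, label, lookat, distance, azimuth, elevation,
                         replace=True):
    """Assemble the JSON body accepted by POST /cameras."""
    return {
        "name": name,
        "label": label,
        "lookat": lookat,
        "distance": distance,
        "azimuth": azimuth,
        "elevation": elevation,
        "replace": replace,
    }


def register_camera(payload, base_url=BASE_URL):
    """POST the payload to /cameras and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/cameras",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_camera_payload("aux_side", "Aux Side View",
                                   [0.55, -0.1, 0.42], 0.9, -45, -30)
    print(register_camera(payload))
```

On success the server announces the new camera on the `/ws` state stream, so a dashboard polling `/env` afterwards will see the feed listed.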

Files changed (4)
  1. README.md +24 -7
  2. frontend/index.html +25 -1
  3. mujoco_server.py +260 -78
  4. tests/test_dynamic_camera.py +78 -0
README.md CHANGED
@@ -285,7 +285,7 @@ docker run --gpus all -p 3004:5000 \
 
 ### UR5 Scene & Camera Hints
 - The UI now selects between exactly two UR5 options: the gripper-ready scene and the T-push scene (both enumerated via `/nova-sim/api/v1/metadata`). Scene-specific camera feeds are available via the `/env` endpoint, so trainers can build dashboards based on the available streams.
-- UR5 T-push streams three auxiliary MJPEG tiles (`aux_top`, `aux_side`, and the new `aux_flange` first-person camera) via `/nova-sim/api/v1/camera/<name>/video_feed`. Those overlays appear below the state panel only when the scene actually provides them; the flange camera now sits below the tool, offset slightly along the flange +X axis, and always looks toward the stick tip so the trainer knows what the robot is actually pressing even if the stick partially hides the table view.
+- Auxiliary camera tiles now appear only when you add cameras via the dynamic camera API (see API section below). The UI refreshes when a new camera is announced.
 - The T-shape target stays anchored at its configured pose across resets, which keeps the training objective consistent even when you hit Reset from the UI.
 
 ## Architecture
@@ -420,7 +420,8 @@ Nova-Sim provides a minimal HTTP API for static information:
 | `/env` | `GET` | Returns static environment information: robot, scene, has_gripper, action_space, observation_space, camera_feeds |
 | `/metadata` | `GET` | Returns available robots, scenes, actions, and system configuration |
 | `/video_feed` | `GET` | MJPEG video stream of the main camera |
-| `/camera/<name>/video_feed` | `GET` | MJPEG video stream of auxiliary cameras (e.g., aux_top, aux_side) |
+| `/camera/<name>/video_feed` | `GET` | MJPEG video stream of auxiliary cameras (added via `/cameras`) |
+| `/cameras` | `POST` | Register a new auxiliary camera for the current robot/scene |
 
 **Example `/env` response:**
 ```json
@@ -443,9 +444,9 @@ Nova-Sim provides a minimal HTTP API for static information:
 "intrinsics": {"fx": 869.1, "fy": 869.1, "cx": 640.0, "cy": 360.0, "width": 1280, "height": 720, "fovy_degrees": 45.0}
 },
 {
-"name": "aux_top",
-"label": "Top View",
-"url": "/nova-sim/api/v1/camera/aux_top/video_feed",
+"name": "aux_side",
+"label": "Aux Side View",
+"url": "/nova-sim/api/v1/camera/aux_side/video_feed",
 "pose": {
 "position": {"x": 0.5, "y": 0.2, "z": 0.9},
 "orientation": {"w": 0.99, "x": -0.08, "y": 0.02, "z": 0.02}
@@ -457,10 +458,26 @@ Nova-Sim provides a minimal HTTP API for static information:
 }
 ```
 
-The `/env` endpoint returns scene-specific information including camera feeds available for the current robot/scene configuration.
+The `/env` endpoint returns scene-specific information including camera feeds available for the current robot/scene configuration. Dynamic cameras added via `POST /cameras` appear here immediately.
 
 All dynamic operations (reset, switching robots, sending actions) are performed via WebSocket messages. Training data (observations, rewards, etc.) come from the `/ws` state stream.
 
+**Add a camera (`POST /cameras`)**
+```json
+{
+  "name": "aux_side",
+  "label": "Aux Side View",
+  "lookat": [0.55, -0.1, 0.42],
+  "distance": 0.9,
+  "azimuth": -45,
+  "elevation": -30,
+  "replace": true
+}
+```
+Notes:
+- Cameras are scoped to the current robot + scene and show up in `/env` under `camera_feeds`.
+- The UI listens for a `state` message containing `camera_event` and refreshes tiles automatically.
+
 ### Client → Server WebSocket Messages
 
 **`action`** - Send velocity actions to all robots:
@@ -745,7 +762,7 @@ External clients (trainers, monitors, etc.) announce themselves by sending a `cl
 ### Metadata & Camera Feeds
 
 - `GET /nova-sim/api/v1/metadata` returns JSON describing every available robot/scene pair and the supported actions
-- `GET /nova-sim/api/v1/env` returns scene-specific camera feeds - the `camera_feeds` array lists all available video streams for the current robot/scene configuration including the main camera and any auxiliary cameras (e.g., `aux_top`, `aux_side`, `aux_flange`)
+- `GET /nova-sim/api/v1/env` returns scene-specific camera feeds - the `camera_feeds` array lists all available video streams for the current robot/scene configuration including the main camera and any auxiliary cameras you registered via `POST /nova-sim/api/v1/cameras`
 - `GET /nova-sim/api/v1/camera/<name>/video_feed` streams MJPEG for a specific camera feed
 - `pytest tests/` exercises the HTTP metadata/video endpoints, the `/ws` control socket, and every camera feed. Keep Nova-Sim running at `http://localhost:3004` when you run it so the suite can talk to the live server.
frontend/index.html CHANGED
@@ -1256,6 +1256,26 @@
         setupOverlayTiles();
     }
 
+    function handleCameraEvent(eventPayload) {
+        if (!eventPayload || !eventPayload.camera) {
+            return;
+        }
+        const scope = eventPayload.scope || {};
+        if (scope.robot && currentRobot && scope.robot !== currentRobot) {
+            return;
+        }
+        if (scope.scene && currentScene && scope.scene !== currentScene) {
+            return;
+        }
+        fetch(envUrl)
+            .then(r => r.json())
+            .then(envData => {
+                envCache = envData;
+                refreshOverlayTiles();
+                refreshVideoStreams();
+            });
+    }
+
     const robotInfoText = {
         'g1': '29 DOF humanoid with RL walking policy',
         'spot': '12 DOF quadruped with trot gait controller',
@@ -1534,7 +1554,11 @@
         try {
             const msg = JSON.parse(event.data);
             if (msg.type === 'state') {
-                const data = msg.data;
+                const data = msg.data || {};
+                if (data.camera_event) {
+                    handleCameraEvent(data.camera_event);
+                    return;
+                }
 
                 // Check if robot or scene changed in state stream
                 if (data.robot && data.robot !== currentRobot) {
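
Per `broadcast_camera_event` in this commit's server diff, the `state` message that `handleCameraEvent` consumes is shaped as below (abridged; the `camera` object also carries `pose` and `intrinsics`, and the robot/scene/timestamp values here are illustrative):

```json
{
  "type": "state",
  "data": {
    "camera_event": {
      "action": "added",
      "camera": {
        "name": "aux_side",
        "label": "Aux Side View",
        "url": "/nova-sim/api/v1/camera/aux_side/video_feed"
      },
      "scope": {"robot": "ur5", "scene": "ur5_t_push"},
      "timestamp": 1700000000.0
    }
  }
}
```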
mujoco_server.py CHANGED
@@ -5,6 +5,7 @@ import threading
 import json
 import base64
 import math
+import re
 import traceback
 import io
 import cv2
@@ -246,51 +247,13 @@ AVAILABLE_ACTIONS = [
     "use_orientation",
 ]
 
-UR5_T_PUSH_OVERLAY_PRESETS = [
-    {
-        "name": "aux_top",
-        "label": "Aux Top View",
-        "lookat": [0.5, 0.0, 0.42],
-        "distance": 0.75,
-        "azimuth": 30,
-        "elevation": -65,
-    },
-    {
-        "name": "aux_side",
-        "label": "Aux Side View",
-        "lookat": [0.55, -0.1, 0.42],
-        "distance": 0.9,
-        "azimuth": -45,
-        "elevation": -30,
-    },
-    {
-        "name": "aux_flange",
-        "label": "Flange View",
-        "follow_site": "ee_site",
-        "look_target_site": "stick_tip",
-        "track_orientation": True,
-        "offset": [0.09, 0.0, 0.06],
-        "forward_offset": 0.03,
-        "distance": 0.12,
-        "azimuth": 12,
-        "elevation": 32,
-    },
-]
-
-OVERLAY_CAMERA_PRESETS = {
-    "ur5_t_push": UR5_T_PUSH_OVERLAY_PRESETS,
-}
-
-CAMERA_FEEDS = [
-    {"name": "main", "label": "Main", "description": "Primary viewport"},
-    {"name": "aux_top", "label": "Aux Top", "description": "Additional top perspective"},
-    {"name": "aux_side", "label": "Aux Side", "description": "Side perspective"},
-    {"name": "aux_flange", "label": "Flange", "description": "Tool view mounted below the flange"},
-]
+OVERLAY_CAMERA_PRESETS: dict[str, list[dict[str, Any]]] = {}
 
 overlay_camera_states: dict[str, dict[str, Any]] = {}
 overlay_frames: dict[str, bytes | None] = {}
 overlay_frame_lock = threading.Lock()
+overlay_camera_lock = threading.Lock()
+dynamic_camera_configs: dict[str, dict[str, dict[str, Any]]] = {}
 
 def _make_mjv_camera(config: dict[str, float]) -> mujoco.MjvCamera:
     cam_obj = mujoco.MjvCamera()
@@ -301,6 +264,40 @@ def _make_mjv_camera(config: dict[str, float]) -> mujoco.MjvCamera:
     return cam_obj
 
 
+def _build_overlay_camera_state(config: dict[str, Any], create_renderer: bool = True) -> dict[str, Any]:
+    """Build overlay camera state payload from config."""
+    cam_obj = _make_mjv_camera(config)
+    renderer_obj = None
+    if create_renderer and env is not None:
+        renderer_obj = mujoco.Renderer(
+            env.model,
+            height=OVERLAY_RENDER_HEIGHT,
+            width=OVERLAY_RENDER_WIDTH
+        )
+    follow_site = config.get("follow_site")
+    site_id = -1
+    if follow_site and env is not None:
+        site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
+    offset = np.array(config.get("offset", [0.0, 0.0, 0.0]), dtype=np.float32)
+    state_payload = {
+        "camera": cam_obj,
+        "renderer": renderer_obj,
+        "label": config.get("label", config["name"]),
+        "follow_site": follow_site,
+        "site_id": site_id,
+        "offset": offset,
+        "forward_offset": float(config.get("forward_offset", 0.0)),
+        "track_orientation": bool(config.get("track_orientation")),
+        "look_target_site": config.get("look_target_site"),
+        "look_target_id": -1,
+    }
+    look_target_site = config.get("look_target_site")
+    if look_target_site and env is not None:
+        look_target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
+        state_payload["look_target_id"] = look_target_id
+    return state_payload
+
+
 def _normalize_vec(vec: np.ndarray) -> np.ndarray:
     norm = np.linalg.norm(vec)
     if norm < 1e-8:
@@ -424,15 +421,33 @@ def _get_site_forward(env_obj, site_id: int) -> np.ndarray:
 
 def _close_overlay_renderers():
     global overlay_camera_states
-    for state in overlay_camera_states.values():
-        renderer_obj = state.get("renderer")
-        if renderer_obj:
-            renderer_obj.close()
-    overlay_camera_states = {}
+    with overlay_camera_lock:
+        for state in overlay_camera_states.values():
+            renderer_obj = state.get("renderer")
+            if renderer_obj:
+                renderer_obj.close()
+        overlay_camera_states = {}
     with overlay_frame_lock:
         overlay_frames.clear()
 
 
+def _camera_scope_key(robot_type: str | None, scene_name: str | None) -> str:
+    return f"{robot_type or ''}:{scene_name or ''}"
+
+
+def _get_dynamic_camera_configs(robot_type: str | None, scene_name: str | None) -> list[dict[str, Any]]:
+    key = _camera_scope_key(robot_type, scene_name)
+    configs = dynamic_camera_configs.get(key, {})
+    return list(configs.values())
+
+
+def _store_dynamic_camera_config(robot_type: str | None, scene_name: str | None, config: dict[str, Any]) -> None:
+    key = _camera_scope_key(robot_type, scene_name)
+    if key not in dynamic_camera_configs:
+        dynamic_camera_configs[key] = {}
+    dynamic_camera_configs[key][config["name"]] = config
+
+
 def prepare_overlay_renderers(robot_type: str, scene_name: str | None):
     """Create overlay renderers for the active robot/scene (if configured)."""
     _close_overlay_renderers()
@@ -442,37 +457,21 @@ def prepare_overlay_renderers(robot_type: str, scene_name: str | None):
         configs = OVERLAY_CAMERA_PRESETS[robot_type]
     elif scene_name and scene_name in OVERLAY_CAMERA_PRESETS:
         configs = OVERLAY_CAMERA_PRESETS[scene_name]
+    dynamic_configs = _get_dynamic_camera_configs(robot_type, scene_name)
+    if dynamic_configs:
+        merged: dict[str, dict[str, Any]] = {}
+        for config in configs:
+            merged[config["name"]] = config
+        for config in dynamic_configs:
+            merged[config["name"]] = config
+        configs = list(merged.values())
     if not configs:
         return
 
     for config in configs:
-        cam_obj = _make_mjv_camera(config)
-        renderer_obj = mujoco.Renderer(
-            env.model,
-            height=OVERLAY_RENDER_HEIGHT,
-            width=OVERLAY_RENDER_WIDTH
-        ) if env else None
-        follow_site = config.get("follow_site")
-        site_id = -1
-        if follow_site and env is not None:
-            site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
-        offset = np.array(config.get("offset", [0.0, 0.0, 0.0]), dtype=np.float32)
-        overlay_camera_states[config["name"]] = {
-            "camera": cam_obj,
-            "renderer": renderer_obj,
-            "label": config.get("label", config["name"]),
-            "follow_site": follow_site,
-            "site_id": site_id,
-            "offset": offset,
-            "forward_offset": float(config.get("forward_offset", 0.0)),
-            "track_orientation": bool(config.get("track_orientation")),
-            "look_target_site": config.get("look_target_site"),
-            "look_target_id": -1,
-        }
-        look_target_site = config.get("look_target_site")
-        if look_target_site and env is not None:
-            look_target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
-            overlay_camera_states[config["name"]]["look_target_id"] = look_target_id
+        state_payload = _build_overlay_camera_state(config)
+        with overlay_camera_lock:
+            overlay_camera_states[config["name"]] = state_payload
         with overlay_frame_lock:
             overlay_frames[config["name"]] = None
 
@@ -483,6 +482,62 @@ def _parse_bool(value):
     return str(value).strip().lower() in ("1", "true", "yes", "on")
 
 
+def _coerce_vec3(value: Any, default: list[float]) -> list[float]:
+    if value is None:
+        return default
+    if isinstance(value, dict):
+        value = [value.get("x"), value.get("y"), value.get("z")]
+    if not isinstance(value, (list, tuple)) or len(value) != 3:
+        raise ValueError("Expected a 3-element vector")
+    return [float(value[0]), float(value[1]), float(value[2])]
+
+
+def _valid_camera_name(name: str) -> bool:
+    if not name:
+        return False
+    if name == "main":
+        return False
+    return re.match(r"^[A-Za-z0-9_-]+$", name) is not None
+
+
+def _build_dynamic_camera_config(payload: dict[str, Any]) -> dict[str, Any]:
+    name = str(payload.get("name", "")).strip()
+    if not _valid_camera_name(name):
+        raise ValueError("Invalid camera name. Use letters, numbers, hyphens, or underscores.")
+
+    label = str(payload.get("label") or name).strip()
+    default_lookat = [float(cam.lookat[0]), float(cam.lookat[1]), float(cam.lookat[2])]
+    default_distance = float(cam.distance)
+    default_azimuth = float(cam.azimuth)
+    default_elevation = float(cam.elevation)
+
+    lookat = _coerce_vec3(payload.get("lookat"), default_lookat)
+    distance = float(payload.get("distance", default_distance))
+    azimuth = float(payload.get("azimuth", default_azimuth))
+    elevation = float(payload.get("elevation", default_elevation))
+
+    config: dict[str, Any] = {
+        "name": name,
+        "label": label,
+        "lookat": lookat,
+        "distance": distance,
+        "azimuth": azimuth,
+        "elevation": elevation,
+    }
+
+    for key in ("follow_site", "look_target_site"):
+        if payload.get(key) is not None:
+            config[key] = str(payload.get(key))
+    if payload.get("offset") is not None:
+        config["offset"] = _coerce_vec3(payload.get("offset"), [0.0, 0.0, 0.0])
+    if payload.get("forward_offset") is not None:
+        config["forward_offset"] = float(payload.get("forward_offset", 0.0))
+    if payload.get("track_orientation") is not None:
+        config["track_orientation"] = bool(payload.get("track_orientation"))
+
+    return config
+
+
 def _load_nova_ur5_config_from_env():
     use_state = _parse_bool(os.environ.get("NOVA_UR5_USE_STATE_STREAM") or os.environ.get("NOVA_USE_STATE_STREAM"))
     use_ik = _parse_bool(os.environ.get("NOVA_UR5_USE_IK") or os.environ.get("NOVA_USE_IK"))
@@ -855,6 +910,29 @@ def broadcast_state():
         ws_clients.difference_update(dead_clients)
 
 
+def broadcast_camera_event(action: str, camera_payload: dict[str, Any], scope: dict[str, Any]) -> None:
+    """Announce camera changes via the state stream."""
+    message = {
+        "type": "state",
+        "data": {
+            "camera_event": {
+                "action": action,
+                "camera": camera_payload,
+                "scope": scope,
+                "timestamp": time.time(),
+            }
+        }
+    }
+    with ws_clients_lock:
+        dead_clients = set()
+        for ws in ws_clients:
+            try:
+                ws.send(json.dumps(message))
+            except Exception:
+                dead_clients.add(ws)
+        ws_clients.difference_update(dead_clients)
+
+
 def _handle_external_client_message(ws, data):
     """Process message payloads originating from external clients."""
     msg_type = data.get("type")
@@ -1205,11 +1283,30 @@ def simulation_loop():
                 latest_frame = buffer.tobytes()
 
             # Render overlay camera frames (if configured)
-            for name, state in overlay_camera_states.items():
+            with overlay_camera_lock:
+                overlay_snapshot = list(overlay_camera_states.items())
+            for name, state in overlay_snapshot:
                 renderer_obj = state.get("renderer")
                 cam_obj = state.get("camera")
                 if renderer_obj is None or cam_obj is None:
-                    continue
+                    if cam_obj is None:
+                        continue
+                    with mujoco_lock:
+                        if env is None:
+                            continue
+                        try:
+                            renderer_obj = mujoco.Renderer(
+                                env.model,
+                                height=OVERLAY_RENDER_HEIGHT,
+                                width=OVERLAY_RENDER_WIDTH
+                            )
+                        except Exception:
+                            continue
+                    with overlay_camera_lock:
+                        if name in overlay_camera_states:
+                            overlay_camera_states[name]["renderer"] = renderer_obj
+                    if renderer_obj is None:
+                        continue
                 if state.get("follow_site") and env is not None:
                     site_id = state.get("site_id", -1)
                     if site_id >= 0 and site_id < getattr(env.model, "nsite", 0):
@@ -1893,6 +1990,80 @@ def video_feed():
     return Response(generate_frames(),
                     mimetype='multipart/x-mixed-replace; boundary=frame')
 
+
+@app.route(f'{API_PREFIX}/cameras', methods=['POST'])
+def add_camera():
+    """Add an overlay camera for the current robot/scene."""
+    payload = request.get_json(silent=True) or {}
+    replace_existing = bool(payload.get("replace", False))
+
+    try:
+        config = _build_dynamic_camera_config(payload)
+    except ValueError as exc:
+        return jsonify({"error": str(exc)}), 400
+
+    with mujoco_lock:
+        if env is None:
+            return jsonify({"error": "no environment initialized"}), 503
+
+        scene_name = getattr(env, "scene_name", None) or current_scene
+        robot_type = current_robot
+
+        # Validate site references if provided
+        follow_site = config.get("follow_site")
+        look_target_site = config.get("look_target_site")
+        if follow_site:
+            site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
+            if site_id < 0:
+                return jsonify({"error": f"follow_site '{follow_site}' not found"}), 400
+        if look_target_site:
+            target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
+            if target_id < 0:
+                return jsonify({"error": f"look_target_site '{look_target_site}' not found"}), 400
+
+        temp_cam = _make_mjv_camera(config)
+        feed_intrinsics = _camera_intrinsics(
+            OVERLAY_RENDER_WIDTH,
+            OVERLAY_RENDER_HEIGHT,
+            _camera_fovy_degrees(temp_cam or cam, env),
+        )
+        camera_payload = {
+            "name": config["name"],
+            "label": config.get("label", config["name"]),
+            "url": f"{API_PREFIX}/camera/{config['name']}/video_feed",
+            "pose": _camera_pose(temp_cam),
+            "intrinsics": feed_intrinsics,
+        }
+
+    with overlay_camera_lock:
+        exists = config["name"] in overlay_camera_states
+        if exists and not replace_existing:
+            return jsonify({"error": "camera already exists", "name": config["name"]}), 409
+
+    _store_dynamic_camera_config(robot_type, scene_name, config)
+
+    state_payload = _build_overlay_camera_state(config, create_renderer=False)
+    with overlay_camera_lock:
+        if config["name"] in overlay_camera_states:
+            old_renderer = overlay_camera_states[config["name"]].get("renderer")
+            if old_renderer:
+                old_renderer.close()
+        overlay_camera_states[config["name"]] = state_payload
+    with overlay_frame_lock:
+        overlay_frames[config["name"]] = None
+
+    broadcast_camera_event(
+        "added",
+        camera_payload,
+        {"robot": robot_type, "scene": scene_name},
+    )
+    return jsonify({
+        "status": "ok",
+        "camera": camera_payload,
+        "scope": {"robot": robot_type, "scene": scene_name},
+    })
+
+
 @app.route(f'{API_PREFIX}/camera/<name>/video_feed')
 def camera_feed(name):
     if name == "main":
@@ -1925,8 +2096,10 @@ def capture_depth_snapshot(camera_name: str) -> Optional[bytes]:
     camera_obj = None
    if camera_name == "main":
         camera_obj = cam
-    elif camera_name in overlay_camera_states:
-        camera_obj = overlay_camera_states[camera_name].get("camera")
+    else:
+        with overlay_camera_lock:
+            if camera_name in overlay_camera_states:
+                camera_obj = overlay_camera_states[camera_name].get("camera")
 
     if camera_obj is None:
         return None
@@ -2078,9 +2251,18 @@ def get_env_info():
 
     # Check for overlay cameras based on scene
     configs = (scene_name and OVERLAY_CAMERA_PRESETS.get(scene_name)) or OVERLAY_CAMERA_PRESETS.get(current_robot) or []
+    dynamic_configs = _get_dynamic_camera_configs(current_robot, scene_name)
+    if dynamic_configs:
+        merged_configs: dict[str, dict[str, Any]] = {}
+        for feed in configs:
+            merged_configs[feed.get("name", "aux")] = feed
+        for feed in dynamic_configs:
+            merged_configs[feed.get("name", "aux")] = feed
+        configs = list(merged_configs.values())
     for feed in configs:
         feed_name = feed.get('name', 'aux')
-        cam_state = overlay_camera_states.get(feed_name, {})
+        with overlay_camera_lock:
+            cam_state = overlay_camera_states.get(feed_name, {}).copy()
        cam_obj = cam_state.get("camera")
         feed_intrinsics = _camera_intrinsics(
             OVERLAY_RENDER_WIDTH,
tests/test_dynamic_camera.py ADDED
@@ -0,0 +1,78 @@
+"""Tests for dynamic camera endpoint and state stream announcements.
+
+Run with server running:
+    python nova-sim/mujoco_server.py
+
+Then in another terminal:
+    pytest nova-sim/tests/test_dynamic_camera.py -v
+"""
+
+import json
+import time
+import pytest
+import requests
+from websockets.sync.client import connect
+
+
+BASE_URL = "http://localhost:3004/nova-sim/api/v1"
+WS_URL = "ws://localhost:3004/nova-sim/api/v1/ws"
+
+
+@pytest.fixture(scope="module")
+def check_server():
+    """Check if the server is running before tests."""
+    try:
+        response = requests.get(f"{BASE_URL}/metadata", timeout=2)
+        response.raise_for_status()
+    except requests.RequestException:
+        pytest.skip("Nova-Sim server is not running at localhost:3004")
+
+
+def test_add_dynamic_camera_announces_state(check_server):
+    """Add a camera and verify state stream announcement + env listing."""
+    camera_name = "aux_side"
+    payload = {
+        "name": camera_name,
+        "label": "Aux Side View",
+        "lookat": [0.55, -0.1, 0.42],
+        "distance": 0.9,
+        "azimuth": -45,
+        "elevation": -30,
+        "replace": True,
+    }
+
+    with connect(WS_URL, timeout=10) as ws:
+        ws.send(json.dumps({
+            "type": "client_identity",
+            "data": {"client_id": "test-dynamic-camera"}
+        }))
+
+        resp = requests.post(f"{BASE_URL}/cameras", json=payload, timeout=5)
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["camera"]["name"] == camera_name
+
+        found_event = False
+        deadline = time.time() + 5
+        while time.time() < deadline:
+            try:
+                msg = ws.recv(timeout=1)
+            except TimeoutError:
+                continue
+            parsed = json.loads(msg)
+            if parsed.get("type") != "state":
+                continue
+            event = (parsed.get("data") or {}).get("camera_event")
+            if not event:
+                continue
+            if event.get("action") == "added" and (event.get("camera") or {}).get("name") == camera_name:
+                found_event = True
+                break
+
+        assert found_event, "Did not receive camera_event state message"
+
+    env_resp = requests.get(f"{BASE_URL}/env", timeout=5)
+    env_resp.raise_for_status()
+    env_data = env_resp.json()
+    feed_names = [feed.get("name") for feed in env_data.get("camera_feeds", [])]
+    assert camera_name in feed_names
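
For reference, a successful `POST /cameras` call (per the `add_camera` handler in this commit) returns JSON shaped like the following; the pose, intrinsics, robot, and scene values here are illustrative:

```json
{
  "status": "ok",
  "camera": {
    "name": "aux_side",
    "label": "Aux Side View",
    "url": "/nova-sim/api/v1/camera/aux_side/video_feed",
    "pose": {
      "position": {"x": 0.5, "y": 0.2, "z": 0.9},
      "orientation": {"w": 0.99, "x": -0.08, "y": 0.02, "z": 0.02}
    },
    "intrinsics": {"fx": 869.1, "fy": 869.1, "cx": 640.0, "cy": 360.0,
                   "width": 1280, "height": 720, "fovy_degrees": 45.0}
  },
  "scope": {"robot": "ur5", "scene": "ur5_t_push"}
}
```

Duplicate names without `"replace": true` return HTTP 409, invalid payloads return 400, and a missing environment returns 503.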