Georg committed on
Commit 2425670 · 1 parent: 84b67c7

Implement dynamic camera management and enhance overlay functionality in mujoco_server.py


- Introduced dynamic camera configuration capabilities, allowing users to register auxiliary cameras via a new API endpoint.
- Updated overlay camera state management to support dynamic camera states, improving the responsiveness of the UI to camera events.
- Enhanced the handling of camera events in the frontend, ensuring that the UI refreshes automatically when new cameras are added.
- Added comprehensive tests for the dynamic camera functionality, verifying state announcements and environment updates.
- Updated README.md to document the new dynamic camera API and its integration with the existing system.
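
From the client side, the registration flow described above can be sketched as follows. This is a minimal sketch, not part of the server code: it assumes Nova-Sim is reachable at the default `http://localhost:3004` base URL documented in the README, and the helper names (`build_camera_payload`, `register_camera`) are illustrative. Only the request fields shown in this commit are used.

```python
# Hypothetical client sketch for the new dynamic camera API (helper names
# are illustrative; the payload fields come from the README example below).
import json
from urllib import request

BASE_URL = "http://localhost:3004/nova-sim/api/v1"  # default from the README


def build_camera_payload(name, label, lookat, distance, azimuth, elevation,
                         replace=True):
    """Assemble the JSON body accepted by POST /cameras."""
    return {
        "name": name,
        "label": label,
        "lookat": lookat,
        "distance": distance,
        "azimuth": azimuth,
        "elevation": elevation,
        "replace": replace,
    }


def register_camera(payload, base_url=BASE_URL):
    """POST the payload to /cameras and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/cameras",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_camera_payload("aux_side", "Aux Side View",
                                   [0.55, -0.1, 0.42], 0.9, -45, -30)
    print(register_camera(payload))
```

On success the server announces the new camera on the `/ws` state stream, so a dashboard polling `/env` afterwards will see the feed listed.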

Files changed (4)
  1. README.md +24 -7
  2. frontend/index.html +25 -1
  3. mujoco_server.py +260 -78
  4. tests/test_dynamic_camera.py +78 -0
README.md CHANGED
@@ -285,7 +285,7 @@ docker run --gpus all -p 3004:5000 \
 
 ### UR5 Scene & Camera Hints
 - The UI now selects between exactly two UR5 options: the gripper-ready scene and the T-push scene (both enumerated via `/nova-sim/api/v1/metadata`). Scene-specific camera feeds are available via the `/env` endpoint, so trainers can build dashboards based on the available streams.
-- UR5 T-push streams three auxiliary MJPEG tiles (`aux_top`, `aux_side`, and the new `aux_flange` first-person camera) via `/nova-sim/api/v1/camera/<name>/video_feed`. Those overlays appear below the state panel only when the scene actually provides them; the flange camera now sits below the tool, offset slightly along the flange +X axis, and always looks toward the stick tip so the trainer knows what the robot is actually pressing even if the stick partially hides the table view.
+- Auxiliary camera tiles now appear only when you add cameras via the dynamic camera API (see API section below). The UI refreshes when a new camera is announced.
 - The T-shape target stays anchored at its configured pose across resets, which keeps the training objective consistent even when you hit Reset from the UI.
 
 ## Architecture
@@ -420,7 +420,8 @@ Nova-Sim provides a minimal HTTP API for static information:
 | `/env` | `GET` | Returns static environment information: robot, scene, has_gripper, action_space, observation_space, camera_feeds |
 | `/metadata` | `GET` | Returns available robots, scenes, actions, and system configuration |
 | `/video_feed` | `GET` | MJPEG video stream of the main camera |
-| `/camera/<name>/video_feed` | `GET` | MJPEG video stream of auxiliary cameras (e.g., aux_top, aux_side) |
+| `/camera/<name>/video_feed` | `GET` | MJPEG video stream of auxiliary cameras (added via `/cameras`) |
+| `/cameras` | `POST` | Register a new auxiliary camera for the current robot/scene |
 
 **Example `/env` response:**
 ```json
@@ -443,9 +444,9 @@ Nova-Sim provides a minimal HTTP API for static information:
 "intrinsics": {"fx": 869.1, "fy": 869.1, "cx": 640.0, "cy": 360.0, "width": 1280, "height": 720, "fovy_degrees": 45.0}
 },
 {
-"name": "aux_top",
-"label": "Top View",
-"url": "/nova-sim/api/v1/camera/aux_top/video_feed",
+"name": "aux_side",
+"label": "Aux Side View",
+"url": "/nova-sim/api/v1/camera/aux_side/video_feed",
 "pose": {
 "position": {"x": 0.5, "y": 0.2, "z": 0.9},
 "orientation": {"w": 0.99, "x": -0.08, "y": 0.02, "z": 0.02}
@@ -457,10 +458,26 @@ Nova-Sim provides a minimal HTTP API for static information:
 }
 ```
 
-The `/env` endpoint returns scene-specific information including camera feeds available for the current robot/scene configuration.
+The `/env` endpoint returns scene-specific information including camera feeds available for the current robot/scene configuration. Dynamic cameras added via `POST /cameras` appear here immediately.
 
 All dynamic operations (reset, switching robots, sending actions) are performed via WebSocket messages. Training data (observations, rewards, etc.) come from the `/ws` state stream.
 
+**Add a camera (`POST /cameras`)**
+```json
+{
+  "name": "aux_side",
+  "label": "Aux Side View",
+  "lookat": [0.55, -0.1, 0.42],
+  "distance": 0.9,
+  "azimuth": -45,
+  "elevation": -30,
+  "replace": true
+}
+```
+Notes:
+- Cameras are scoped to the current robot + scene and show up in `/env` under `camera_feeds`.
+- The UI listens for a `state` message containing `camera_event` and refreshes tiles automatically.
+
 ### Client → Server WebSocket Messages
 
 **`action`** - Send velocity actions to all robots:
@@ -745,7 +762,7 @@ External clients (trainers, monitors, etc.) announce themselves by sending a `cl
 ### Metadata & Camera Feeds
 
 - `GET /nova-sim/api/v1/metadata` returns JSON describing every available robot/scene pair and the supported actions
-- `GET /nova-sim/api/v1/env` returns scene-specific camera feeds - the `camera_feeds` array lists all available video streams for the current robot/scene configuration including the main camera and any auxiliary cameras (e.g., `aux_top`, `aux_side`, `aux_flange`)
+- `GET /nova-sim/api/v1/env` returns scene-specific camera feeds - the `camera_feeds` array lists all available video streams for the current robot/scene configuration including the main camera and any auxiliary cameras you registered via `POST /nova-sim/api/v1/cameras`
 - `GET /nova-sim/api/v1/camera/<name>/video_feed` streams MJPEG for a specific camera feed
 - `pytest tests/` exercises the HTTP metadata/video endpoints, the `/ws` control socket, and every camera feed. Keep Nova-Sim running at `http://localhost:3004` when you run it so the suite can talk to the live server.
frontend/index.html CHANGED
@@ -1256,6 +1256,26 @@
         setupOverlayTiles();
     }
 
+    function handleCameraEvent(eventPayload) {
+        if (!eventPayload || !eventPayload.camera) {
+            return;
+        }
+        const scope = eventPayload.scope || {};
+        if (scope.robot && currentRobot && scope.robot !== currentRobot) {
+            return;
+        }
+        if (scope.scene && currentScene && scope.scene !== currentScene) {
+            return;
+        }
+        fetch(envUrl)
+            .then(r => r.json())
+            .then(envData => {
+                envCache = envData;
+                refreshOverlayTiles();
+                refreshVideoStreams();
+            });
+    }
+
     const robotInfoText = {
         'g1': '29 DOF humanoid with RL walking policy',
         'spot': '12 DOF quadruped with trot gait controller',
@@ -1534,7 +1554,11 @@
         try {
             const msg = JSON.parse(event.data);
             if (msg.type === 'state') {
-                const data = msg.data;
+                const data = msg.data || {};
+                if (data.camera_event) {
+                    handleCameraEvent(data.camera_event);
+                    return;
+                }
 
                 // Check if robot or scene changed in state stream
                 if (data.robot && data.robot !== currentRobot) {
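
Per `broadcast_camera_event` in this commit's server diff, the `state` message that `handleCameraEvent` consumes is shaped as below (abridged; the `camera` object also carries `pose` and `intrinsics`, and the robot/scene/timestamp values here are illustrative):

```json
{
  "type": "state",
  "data": {
    "camera_event": {
      "action": "added",
      "camera": {
        "name": "aux_side",
        "label": "Aux Side View",
        "url": "/nova-sim/api/v1/camera/aux_side/video_feed"
      },
      "scope": {"robot": "ur5", "scene": "ur5_t_push"},
      "timestamp": 1700000000.0
    }
  }
}
```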
mujoco_server.py CHANGED
@@ -5,6 +5,7 @@ import threading
 import json
 import base64
 import math
+import re
 import traceback
 import io
 import cv2
@@ -246,51 +247,13 @@ AVAILABLE_ACTIONS = [
     "use_orientation",
 ]
 
-UR5_T_PUSH_OVERLAY_PRESETS = [
-    {
-        "name": "aux_top",
-        "label": "Aux Top View",
-        "lookat": [0.5, 0.0, 0.42],
-        "distance": 0.75,
-        "azimuth": 30,
-        "elevation": -65,
-    },
-    {
-        "name": "aux_side",
-        "label": "Aux Side View",
-        "lookat": [0.55, -0.1, 0.42],
-        "distance": 0.9,
-        "azimuth": -45,
-        "elevation": -30,
-    },
-    {
-        "name": "aux_flange",
-        "label": "Flange View",
-        "follow_site": "ee_site",
-        "look_target_site": "stick_tip",
-        "track_orientation": True,
-        "offset": [0.09, 0.0, 0.06],
-        "forward_offset": 0.03,
-        "distance": 0.12,
-        "azimuth": 12,
-        "elevation": 32,
-    },
-]
-
-OVERLAY_CAMERA_PRESETS = {
-    "ur5_t_push": UR5_T_PUSH_OVERLAY_PRESETS,
-}
-
-CAMERA_FEEDS = [
-    {"name": "main", "label": "Main", "description": "Primary viewport"},
-    {"name": "aux_top", "label": "Aux Top", "description": "Additional top perspective"},
-    {"name": "aux_side", "label": "Aux Side", "description": "Side perspective"},
-    {"name": "aux_flange", "label": "Flange", "description": "Tool view mounted below the flange"},
-]
+OVERLAY_CAMERA_PRESETS: dict[str, list[dict[str, Any]]] = {}
 
 overlay_camera_states: dict[str, dict[str, Any]] = {}
 overlay_frames: dict[str, bytes | None] = {}
 overlay_frame_lock = threading.Lock()
+overlay_camera_lock = threading.Lock()
+dynamic_camera_configs: dict[str, dict[str, dict[str, Any]]] = {}
 
 def _make_mjv_camera(config: dict[str, float]) -> mujoco.MjvCamera:
     cam_obj = mujoco.MjvCamera()
@@ -301,6 +264,40 @@ def _make_mjv_camera(config: dict[str, float]) -> mujoco.MjvCamera:
     return cam_obj
 
 
+def _build_overlay_camera_state(config: dict[str, Any], create_renderer: bool = True) -> dict[str, Any]:
+    """Build overlay camera state payload from config."""
+    cam_obj = _make_mjv_camera(config)
+    renderer_obj = None
+    if create_renderer and env is not None:
+        renderer_obj = mujoco.Renderer(
+            env.model,
+            height=OVERLAY_RENDER_HEIGHT,
+            width=OVERLAY_RENDER_WIDTH
+        )
+    follow_site = config.get("follow_site")
+    site_id = -1
+    if follow_site and env is not None:
+        site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
+    offset = np.array(config.get("offset", [0.0, 0.0, 0.0]), dtype=np.float32)
+    state_payload = {
+        "camera": cam_obj,
+        "renderer": renderer_obj,
+        "label": config.get("label", config["name"]),
+        "follow_site": follow_site,
+        "site_id": site_id,
+        "offset": offset,
+        "forward_offset": float(config.get("forward_offset", 0.0)),
+        "track_orientation": bool(config.get("track_orientation")),
+        "look_target_site": config.get("look_target_site"),
+        "look_target_id": -1,
+    }
+    look_target_site = config.get("look_target_site")
+    if look_target_site and env is not None:
+        look_target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
+        state_payload["look_target_id"] = look_target_id
+    return state_payload
+
+
 def _normalize_vec(vec: np.ndarray) -> np.ndarray:
     norm = np.linalg.norm(vec)
     if norm < 1e-8:
@@ -424,15 +421,33 @@ def _get_site_forward(env_obj, site_id: int) -> np.ndarray:
 
 def _close_overlay_renderers():
     global overlay_camera_states
-    for state in overlay_camera_states.values():
-        renderer_obj = state.get("renderer")
-        if renderer_obj:
-            renderer_obj.close()
-    overlay_camera_states = {}
+    with overlay_camera_lock:
+        for state in overlay_camera_states.values():
+            renderer_obj = state.get("renderer")
+            if renderer_obj:
+                renderer_obj.close()
+        overlay_camera_states = {}
     with overlay_frame_lock:
         overlay_frames.clear()
 
 
+def _camera_scope_key(robot_type: str | None, scene_name: str | None) -> str:
+    return f"{robot_type or ''}:{scene_name or ''}"
+
+
+def _get_dynamic_camera_configs(robot_type: str | None, scene_name: str | None) -> list[dict[str, Any]]:
+    key = _camera_scope_key(robot_type, scene_name)
+    configs = dynamic_camera_configs.get(key, {})
+    return list(configs.values())
+
+
+def _store_dynamic_camera_config(robot_type: str | None, scene_name: str | None, config: dict[str, Any]) -> None:
+    key = _camera_scope_key(robot_type, scene_name)
+    if key not in dynamic_camera_configs:
+        dynamic_camera_configs[key] = {}
+    dynamic_camera_configs[key][config["name"]] = config
+
+
 def prepare_overlay_renderers(robot_type: str, scene_name: str | None):
     """Create overlay renderers for the active robot/scene (if configured)."""
     _close_overlay_renderers()
@@ -442,37 +457,21 @@ def prepare_overlay_renderers(robot_type: str, scene_name: str | None):
         configs = OVERLAY_CAMERA_PRESETS[robot_type]
     elif scene_name and scene_name in OVERLAY_CAMERA_PRESETS:
         configs = OVERLAY_CAMERA_PRESETS[scene_name]
+    dynamic_configs = _get_dynamic_camera_configs(robot_type, scene_name)
+    if dynamic_configs:
+        merged: dict[str, dict[str, Any]] = {}
+        for config in configs:
+            merged[config["name"]] = config
+        for config in dynamic_configs:
+            merged[config["name"]] = config
+        configs = list(merged.values())
     if not configs:
         return
 
     for config in configs:
-        cam_obj = _make_mjv_camera(config)
-        renderer_obj = mujoco.Renderer(
-            env.model,
-            height=OVERLAY_RENDER_HEIGHT,
-            width=OVERLAY_RENDER_WIDTH
-        ) if env else None
-        follow_site = config.get("follow_site")
-        site_id = -1
-        if follow_site and env is not None:
-            site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
-        offset = np.array(config.get("offset", [0.0, 0.0, 0.0]), dtype=np.float32)
-        overlay_camera_states[config["name"]] = {
-            "camera": cam_obj,
-            "renderer": renderer_obj,
-            "label": config.get("label", config["name"]),
-            "follow_site": follow_site,
-            "site_id": site_id,
-            "offset": offset,
-            "forward_offset": float(config.get("forward_offset", 0.0)),
-            "track_orientation": bool(config.get("track_orientation")),
-            "look_target_site": config.get("look_target_site"),
-            "look_target_id": -1,
-        }
-        look_target_site = config.get("look_target_site")
-        if look_target_site and env is not None:
-            look_target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
-            overlay_camera_states[config["name"]]["look_target_id"] = look_target_id
+        state_payload = _build_overlay_camera_state(config)
+        with overlay_camera_lock:
+            overlay_camera_states[config["name"]] = state_payload
         with overlay_frame_lock:
             overlay_frames[config["name"]] = None
 
@@ -483,6 +482,62 @@ def _parse_bool(value):
     return str(value).strip().lower() in ("1", "true", "yes", "on")
 
 
+def _coerce_vec3(value: Any, default: list[float]) -> list[float]:
+    if value is None:
+        return default
+    if isinstance(value, dict):
+        value = [value.get("x"), value.get("y"), value.get("z")]
+    if not isinstance(value, (list, tuple)) or len(value) != 3:
+        raise ValueError("Expected a 3-element vector")
+    return [float(value[0]), float(value[1]), float(value[2])]
+
+
+def _valid_camera_name(name: str) -> bool:
+    if not name:
+        return False
+    if name == "main":
+        return False
+    return re.match(r"^[A-Za-z0-9_-]+$", name) is not None
+
+
+def _build_dynamic_camera_config(payload: dict[str, Any]) -> dict[str, Any]:
+    name = str(payload.get("name", "")).strip()
+    if not _valid_camera_name(name):
+        raise ValueError("Invalid camera name. Use letters, numbers, hyphens, or underscores.")
+
+    label = str(payload.get("label") or name).strip()
+    default_lookat = [float(cam.lookat[0]), float(cam.lookat[1]), float(cam.lookat[2])]
+    default_distance = float(cam.distance)
+    default_azimuth = float(cam.azimuth)
+    default_elevation = float(cam.elevation)
+
+    lookat = _coerce_vec3(payload.get("lookat"), default_lookat)
+    distance = float(payload.get("distance", default_distance))
+    azimuth = float(payload.get("azimuth", default_azimuth))
+    elevation = float(payload.get("elevation", default_elevation))
+
+    config: dict[str, Any] = {
+        "name": name,
+        "label": label,
+        "lookat": lookat,
+        "distance": distance,
+        "azimuth": azimuth,
+        "elevation": elevation,
+    }
+
+    for key in ("follow_site", "look_target_site"):
+        if payload.get(key) is not None:
+            config[key] = str(payload.get(key))
+    if payload.get("offset") is not None:
+        config["offset"] = _coerce_vec3(payload.get("offset"), [0.0, 0.0, 0.0])
+    if payload.get("forward_offset") is not None:
+        config["forward_offset"] = float(payload.get("forward_offset", 0.0))
+    if payload.get("track_orientation") is not None:
+        config["track_orientation"] = bool(payload.get("track_orientation"))
+
+    return config
+
+
 def _load_nova_ur5_config_from_env():
     use_state = _parse_bool(os.environ.get("NOVA_UR5_USE_STATE_STREAM") or os.environ.get("NOVA_USE_STATE_STREAM"))
     use_ik = _parse_bool(os.environ.get("NOVA_UR5_USE_IK") or os.environ.get("NOVA_USE_IK"))
@@ -855,6 +910,29 @@ def broadcast_state():
         ws_clients.difference_update(dead_clients)
 
 
+def broadcast_camera_event(action: str, camera_payload: dict[str, Any], scope: dict[str, Any]) -> None:
+    """Announce camera changes via the state stream."""
+    message = {
+        "type": "state",
+        "data": {
+            "camera_event": {
+                "action": action,
+                "camera": camera_payload,
+                "scope": scope,
+                "timestamp": time.time(),
+            }
+        }
+    }
+    with ws_clients_lock:
+        dead_clients = set()
+        for ws in ws_clients:
+            try:
+                ws.send(json.dumps(message))
+            except Exception:
+                dead_clients.add(ws)
+        ws_clients.difference_update(dead_clients)
+
+
 def _handle_external_client_message(ws, data):
     """Process message payloads originating from external clients."""
     msg_type = data.get("type")
@@ -1205,11 +1283,30 @@ def simulation_loop():
                 latest_frame = buffer.tobytes()
 
             # Render overlay camera frames (if configured)
-            for name, state in overlay_camera_states.items():
+            with overlay_camera_lock:
+                overlay_snapshot = list(overlay_camera_states.items())
+            for name, state in overlay_snapshot:
                 renderer_obj = state.get("renderer")
                 cam_obj = state.get("camera")
                 if renderer_obj is None or cam_obj is None:
-                    continue
+                    if cam_obj is None:
+                        continue
+                    with mujoco_lock:
+                        if env is None:
+                            continue
+                        try:
+                            renderer_obj = mujoco.Renderer(
+                                env.model,
+                                height=OVERLAY_RENDER_HEIGHT,
+                                width=OVERLAY_RENDER_WIDTH
+                            )
+                        except Exception:
+                            continue
+                    with overlay_camera_lock:
+                        if name in overlay_camera_states:
+                            overlay_camera_states[name]["renderer"] = renderer_obj
+                    if renderer_obj is None:
+                        continue
                 if state.get("follow_site") and env is not None:
                     site_id = state.get("site_id", -1)
                     if site_id >= 0 and site_id < getattr(env.model, "nsite", 0):
@@ -1893,6 +1990,80 @@ def video_feed():
     return Response(generate_frames(),
                     mimetype='multipart/x-mixed-replace; boundary=frame')
 
+
+@app.route(f'{API_PREFIX}/cameras', methods=['POST'])
+def add_camera():
+    """Add an overlay camera for the current robot/scene."""
+    payload = request.get_json(silent=True) or {}
+    replace_existing = bool(payload.get("replace", False))
+
+    try:
+        config = _build_dynamic_camera_config(payload)
+    except ValueError as exc:
+        return jsonify({"error": str(exc)}), 400
+
+    with mujoco_lock:
+        if env is None:
+            return jsonify({"error": "no environment initialized"}), 503
+
+        scene_name = getattr(env, "scene_name", None) or current_scene
+        robot_type = current_robot
+
+        # Validate site references if provided
+        follow_site = config.get("follow_site")
+        look_target_site = config.get("look_target_site")
+        if follow_site:
+            site_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, follow_site)
+            if site_id < 0:
+                return jsonify({"error": f"follow_site '{follow_site}' not found"}), 400
+        if look_target_site:
+            target_id = mujoco.mj_name2id(env.model, mujoco.mjtObj.mjOBJ_SITE, look_target_site)
+            if target_id < 0:
+                return jsonify({"error": f"look_target_site '{look_target_site}' not found"}), 400
+
+        temp_cam = _make_mjv_camera(config)
+        feed_intrinsics = _camera_intrinsics(
+            OVERLAY_RENDER_WIDTH,
+            OVERLAY_RENDER_HEIGHT,
+            _camera_fovy_degrees(temp_cam or cam, env),
+        )
+        camera_payload = {
+            "name": config["name"],
+            "label": config.get("label", config["name"]),
+            "url": f"{API_PREFIX}/camera/{config['name']}/video_feed",
+            "pose": _camera_pose(temp_cam),
+            "intrinsics": feed_intrinsics,
+        }
+
+    with overlay_camera_lock:
+        exists = config["name"] in overlay_camera_states
+        if exists and not replace_existing:
+            return jsonify({"error": "camera already exists", "name": config["name"]}), 409
+
+    _store_dynamic_camera_config(robot_type, scene_name, config)
+
+    state_payload = _build_overlay_camera_state(config, create_renderer=False)
+    with overlay_camera_lock:
+        if config["name"] in overlay_camera_states:
+            old_renderer = overlay_camera_states[config["name"]].get("renderer")
+            if old_renderer:
+                old_renderer.close()
+        overlay_camera_states[config["name"]] = state_payload
+    with overlay_frame_lock:
+        overlay_frames[config["name"]] = None
+
+    broadcast_camera_event(
+        "added",
+        camera_payload,
+        {"robot": robot_type, "scene": scene_name},
+    )
+    return jsonify({
+        "status": "ok",
+        "camera": camera_payload,
+        "scope": {"robot": robot_type, "scene": scene_name},
+    })
+
+
 @app.route(f'{API_PREFIX}/camera/<name>/video_feed')
 def camera_feed(name):
     if name == "main":
@@ -1925,8 +2096,10 @@ def capture_depth_snapshot(camera_name: str) -> Optional[bytes]:
     camera_obj = None
    if camera_name == "main":
         camera_obj = cam
-    elif camera_name in overlay_camera_states:
-        camera_obj = overlay_camera_states[camera_name].get("camera")
+    else:
+        with overlay_camera_lock:
+            if camera_name in overlay_camera_states:
+                camera_obj = overlay_camera_states[camera_name].get("camera")
 
     if camera_obj is None:
         return None
@@ -2078,9 +2251,18 @@ def get_env_info():
 
     # Check for overlay cameras based on scene
     configs = (scene_name and OVERLAY_CAMERA_PRESETS.get(scene_name)) or OVERLAY_CAMERA_PRESETS.get(current_robot) or []
+    dynamic_configs = _get_dynamic_camera_configs(current_robot, scene_name)
+    if dynamic_configs:
+        merged_configs: dict[str, dict[str, Any]] = {}
+        for feed in configs:
+            merged_configs[feed.get("name", "aux")] = feed
+        for feed in dynamic_configs:
+            merged_configs[feed.get("name", "aux")] = feed
+        configs = list(merged_configs.values())
     for feed in configs:
         feed_name = feed.get('name', 'aux')
-        cam_state = overlay_camera_states.get(feed_name, {})
+        with overlay_camera_lock:
+            cam_state = overlay_camera_states.get(feed_name, {}).copy()
        cam_obj = cam_state.get("camera")
         feed_intrinsics = _camera_intrinsics(
             OVERLAY_RENDER_WIDTH,
tests/test_dynamic_camera.py ADDED
@@ -0,0 +1,78 @@
+"""Tests for dynamic camera endpoint and state stream announcements.
+
+Run with server running:
+    python nova-sim/mujoco_server.py
+
+Then in another terminal:
+    pytest nova-sim/tests/test_dynamic_camera.py -v
+"""
+
+import json
+import time
+import pytest
+import requests
+from websockets.sync.client import connect
+
+
+BASE_URL = "http://localhost:3004/nova-sim/api/v1"
+WS_URL = "ws://localhost:3004/nova-sim/api/v1/ws"
+
+
+@pytest.fixture(scope="module")
+def check_server():
+    """Check if the server is running before tests."""
+    try:
+        response = requests.get(f"{BASE_URL}/metadata", timeout=2)
+        response.raise_for_status()
+    except requests.RequestException:
+        pytest.skip("Nova-Sim server is not running at localhost:3004")
+
+
+def test_add_dynamic_camera_announces_state(check_server):
+    """Add a camera and verify state stream announcement + env listing."""
+    camera_name = "aux_side"
+    payload = {
+        "name": camera_name,
+        "label": "Aux Side View",
+        "lookat": [0.55, -0.1, 0.42],
+        "distance": 0.9,
+        "azimuth": -45,
+        "elevation": -30,
+        "replace": True,
+    }
+
+    with connect(WS_URL, timeout=10) as ws:
+        ws.send(json.dumps({
+            "type": "client_identity",
+            "data": {"client_id": "test-dynamic-camera"}
+        }))
+
+        resp = requests.post(f"{BASE_URL}/cameras", json=payload, timeout=5)
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["camera"]["name"] == camera_name
+
+        found_event = False
+        deadline = time.time() + 5
+        while time.time() < deadline:
+            try:
+                msg = ws.recv(timeout=1)
+            except TimeoutError:
+                continue
+            parsed = json.loads(msg)
+            if parsed.get("type") != "state":
+                continue
+            event = (parsed.get("data") or {}).get("camera_event")
+            if not event:
+                continue
+            if event.get("action") == "added" and (event.get("camera") or {}).get("name") == camera_name:
+                found_event = True
+                break
+
+        assert found_event, "Did not receive camera_event state message"
+
+    env_resp = requests.get(f"{BASE_URL}/env", timeout=5)
+    env_resp.raise_for_status()
+    env_data = env_resp.json()
+    feed_names = [feed.get("name") for feed in env_data.get("camera_feeds", [])]
+    assert camera_name in feed_names
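
For reference, a successful `POST /cameras` call (per the `add_camera` handler in this commit) returns JSON shaped like the following; the pose, intrinsics, robot, and scene values here are illustrative:

```json
{
  "status": "ok",
  "camera": {
    "name": "aux_side",
    "label": "Aux Side View",
    "url": "/nova-sim/api/v1/camera/aux_side/video_feed",
    "pose": {
      "position": {"x": 0.5, "y": 0.2, "z": 0.9},
      "orientation": {"w": 0.99, "x": -0.08, "y": 0.02, "z": 0.02}
    },
    "intrinsics": {"fx": 869.1, "fy": 869.1, "cx": 640.0, "cy": 360.0,
                   "width": 1280, "height": 720, "fovy_degrees": 45.0}
  },
  "scope": {"robot": "ur5", "scene": "ur5_t_push"}
}
```

Duplicate names without `"replace": true` return HTTP 409, invalid payloads return 400, and a missing environment returns 503.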