---
license: cc-by-4.0
task_categories:
- robotics
- image-segmentation
- graph-ml
language:
- en
tags:
- robotics
- manipulation
- disassembly
- constraint-graph
- gnn
- world-model
- sam2
- segmentation
- ur5e
size_categories:
- 1K<n<10K
pretty_name: GNN Disassembly World Model Dataset (v3)
---

# GNN Disassembly World Model Dataset (v3)

Real robot disassembly episodes with side-view **per-frame constraint graphs**, SAM2 segmentation masks, 256D feature embeddings, **full 3D depth information (point clouds)**, and synchronized robot states. The robot is labeled as a separate **agent node** with its own mask, embedding, and depth bundle.

**Project:** CoRL 2026 — GNN world model for constraint-aware video generation
**Author:** Chang Liu (Texas A&M University)
**Hardware:** UR5e + Robotiq 2F-85 gripper, OAK-D Pro (side view)
**Format version:** v3 (2026-04-10)

## Dataset Structure

```
episode_XX/
├── metadata.json              # Episode metadata, component counts, labeled frame count
├── robot_states.npy           # (T, 13) float32 — joint angles + TCP pose + gripper
├── robot_actions.npy          # (T-1, 13) float32 — frame-to-frame state deltas
├── timestamps.npy             # (T, 3) float64
├── side/
│   ├── rgb/frame_XXXXXX.png   # 1280×720 RGB (side camera)
│   └── depth/frame_XXXXXX.npy # 1280×720 uint16 depth (mm)
├── wrist/                     # Raw wrist camera data (not used in v3)
│   ├── rgb/...
│   └── depth/...
└── annotations/
    ├── side_graph.json              # Constraint graph (products only, NO robot)
    ├── side_masks/
    │   └── frame_XXXXXX.npz         # {component_id: (H,W) uint8} — products only
    ├── side_embeddings/
    │   └── frame_XXXXXX.npz         # {component_id: (256,) float32} — products only
    ├── side_depth_info/
    │   └── frame_XXXXXX.npz         # Per-product depth bundle (flat keys)
    ├── side_robot/
    │   └── frame_XXXXXX.npz         # Robot bundle — ALWAYS written per labeled frame
    └── dataset_card.json            # Format description
```

**Alignment guarantee:** every labeled frame has a file in each of the 4 per-frame annotation directories (`side_masks/`, `side_embeddings/`, `side_depth_info/`, `side_robot/`). Files are aligned by frame index.
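
One way to check this guarantee on a downloaded episode is to compare the frame indices present in each directory. The sketch below is illustrative only: `find_misaligned_frames` and `frame_indices` are hypothetical helpers, not part of the dataset tooling, and they assume the `frame_XXXXXX.npz` naming shown in the tree above.

```python
from pathlib import Path
from typing import Dict, Set

# The four per-frame annotation directories listed in the tree above.
ANNOTATION_DIRS = ("side_masks", "side_embeddings", "side_depth_info", "side_robot")


def frame_indices(directory: Path) -> Set[int]:
    """Collect integer frame indices from frame_XXXXXX.npz files in a directory."""
    indices = set()
    for path in directory.glob("frame_*.npz"):
        suffix = path.stem.split("_", 1)[1]
        if suffix.isdigit():
            indices.add(int(suffix))
    return indices


def find_misaligned_frames(episode_dir: Path) -> Dict[str, Set[int]]:
    """Map each annotation directory to the frame indices it is missing.

    A frame counts as missing if it appears in at least one of the four
    directories but not in this one. An empty dict means all are aligned.
    """
    anno = Path(episode_dir) / "annotations"
    per_dir = {name: frame_indices(anno / name) for name in ANNOTATION_DIRS}
    all_frames = set().union(*per_dir.values())
    missing = {name: all_frames - found for name, found in per_dir.items()}
    return {name: frames for name, frames in missing.items() if frames}
```

An empty result for an episode directory means all four directories cover the same set of frames.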

## Component Types (9 types)

**8 product types** (constraint nodes):

| Index | Type | Color | Notes |
|-------|------|-------|-------|
| 0 | `cpu_fan` | #FF6B6B | Always visible at start |
| 1 | `cpu_bracket` | #4ECDC4 | Hidden at start (under fan) |
| 2 | `cpu` | #45B7D1 | Hidden at start |
| 3 | `ram_clip` | #96CEB4 | Multiple instances: ram_clip_1, ram_clip_2, ... |
| 4 | `ram` | #FFEAA7 | Multiple instances: ram_1, ram_2, ... |
| 5 | `connector` | #DDA0DD | Multiple instances: connector_1, connector_2, ... |
| 6 | `graphic_card` | #FF8C42 | Always visible |
| 7 | `motherboard` | #8B5CF6 | Always visible (base) |

**1 agent type** (NOT in constraint graph):

| Index | Type | Color | Notes |
|-------|------|-------|-------|
| 8 | `robot` | #F5F5F5 | Labeled but stored separately. Added as agent node at training time. |

## Graph Semantics

### Constraint Graph (Sparse, Stored)

`side_graph.json` defines the **physical constraint relationships** between products. Directed edges: `A -> B` means "A must be removed before B can be removed" (A blocks B).

```
cpu_fan      -> cpu_bracket         (fan covers bracket)
cpu_fan      -> motherboard         (fan attached to board)
cpu_bracket  -> cpu                 (bracket holds CPU)
cpu_bracket  -> motherboard
cpu          -> motherboard
ram_N        -> motherboard
ram_clip_N   -> motherboard
ram_clip_N   -> ram_M               (user manually pairs)
connector_N  -> motherboard
graphic_card -> motherboard
```

**Edge states** are delta-encoded in `frame_states`:
- `locked: true` (1) — constraint active, component cannot be removed
- `locked: false` (0) — constraint released, component is free
- Monotonic: once unlocked, stays unlocked
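
The monotonicity rule can be checked directly on the deltas. The following is a minimal sketch, assuming `frame_states` has the shape shown in the `side_graph.json` schema in this card; `find_relock_violations` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def find_relock_violations(frame_states: Dict[str, dict]) -> List[Tuple[int, str]]:
    """Return (frame_idx, edge_key) pairs where a released edge is locked again."""
    released = set()
    violations = []
    # Keys are frame indices stored as strings, so sort them numerically.
    for key in sorted(frame_states, key=int):
        deltas = frame_states[key].get("constraints", {})
        for edge_key, locked in deltas.items():
            if locked and edge_key in released:
                violations.append((int(key), edge_key))
            elif not locked:
                released.add(edge_key)
    return violations


# Deltas taken from the side_graph.json example in this card.
frame_states = {
    "0": {"constraints": {"cpu_fan->cpu_bracket": True}},
    "152": {"constraints": {"cpu_fan->cpu_bracket": False}},
}
print(find_relock_violations(frame_states))  # → []
```

A non-empty result would point to an annotation that locks an edge again after releasing it.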

### Fully Connected Graph (Built at Training Time)

For GNN message passing, the sparse constraint graph is expanded to a **fully connected directed** graph. Every ordered pair `(i, j)` where `i != j` gets an edge. Self-loops are excluded.

**Edge count:** For a graph with N nodes, there are **N × (N - 1)** directed edges (both directions for every pair).

**Edge features (3D):** `[has_constraint, is_locked, src_blocks_dst]`, listed here for an edge `(i, j)`.

| `has_constraint` | `is_locked` | `src_blocks_dst` | Meaning |
|---|---|---|---|
| 1 | 1 | 1 | Physical constraint on this pair, active (locked); `i` is the blocker |
| 1 | 1 | 0 | Physical constraint on this pair, active (locked); `i` is the blocked component |
| 1 | 0 | 1 | Physical constraint on this pair, released (unlocked); `i` is the blocker |
| 1 | 0 | 0 | Physical constraint on this pair, released (unlocked); `i` is the blocked component |
| 0 | 0 | 0 | No physical constraint between `i` and `j`: message passing only |

**Direction is carried by `src_blocks_dst`.** The physical constraint `A → B` (A blocks B's removal) is a one-way relationship, but `has_constraint` and `is_locked` describe the pair and are identical on both edges:
- Edge `(A, B)` → `src_blocks_dst = 1`
- Edge `(B, A)` → `src_blocks_dst = 0` (same constraint, seen from the blocked side)

For example, if `cpu_fan → cpu_bracket` is a constraint that is still locked:
```
(cpu_fan, cpu_bracket)  →  has_constraint=1, is_locked=1, src_blocks_dst=1  (fan is the blocker)
(cpu_bracket, cpu_fan)  →  has_constraint=1, is_locked=1, src_blocks_dst=0  (bracket is blocked)
```

This ensures every node pair communicates during GNN layers while still encoding the directionality of the prerequisite relationship.

**Robot (agent node)** has NO physical constraints. All edges involving the robot (`robot ↔ any_product`) have features `[0, 0, 0]`, so they carry context only.

**Node ordering:** Node indices in `edge_index` match the order of `components` in `side_graph.json`. When the robot is added (with `load_pyg_frame_with_robot`), it is appended at index `N_products` (the last position).
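
As an illustration of this ordering rule, the sketch below builds an id-to-index lookup. `build_node_index` is a hypothetical helper, and the `"robot"` key is an assumed label for the agent node (the robot has no entry in `components`).

```python
from typing import Dict, List


def build_node_index(components: List[dict], with_robot: bool = False) -> Dict[str, int]:
    """Map component ids to node indices in `components` order; the robot goes last."""
    index = {comp["id"]: i for i, comp in enumerate(components)}
    if with_robot:
        # The robot is appended after all products, at index N_products.
        index["robot"] = len(components)
    return index


components = [{"id": "cpu_fan", "type": "cpu_fan"}, {"id": "ram_1", "type": "ram"}]
print(build_node_index(components, with_robot=True))
# → {'cpu_fan': 0, 'ram_1': 1, 'robot': 2}
```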

## Data File Schemas

### `side_graph.json`

```json
{
  "view": "side",
  "episode_id": "episode_00",
  "goal_component": "connector_1",
  "components": [
    {"id": "cpu_fan", "type": "cpu_fan", "color": "#FF6B6B"},
    {"id": "ram_1", "type": "ram", "color": "#FFEAA7"}
  ],
  "edges": [
    {"src": "cpu_fan", "dst": "cpu_bracket", "directed": true},
    {"src": "ram_clip_1", "dst": "ram_1", "directed": true}
  ],
  "frame_states": {
    "0": {
      "constraints": {"cpu_fan->cpu_bracket": true},
      "visibility": {"cpu_bracket": false, "cpu": false, "robot": true}
    },
    "152": {
      "constraints": {"cpu_fan->cpu_bracket": false},
      "visibility": {"cpu_fan": false, "cpu_bracket": true, "cpu": true}
    }
  },
  "node_positions": {"cpu_fan": [120, 80]},
  "embedding_dim": 256,
  "feature_extractor": "sam2.1_hiera_base_plus",
  "type_vocab": ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram", "connector", "graphic_card", "motherboard", "robot"]
}
```

**Robot is NOT in components.** Robot is stored in `side_robot/`.

### `side_depth_info/frame_XXXXXX.npz`

**Always contains all 7 keys per component in `graph.components`.** Flat keys prefixed by component_id.

| Key | Shape | Dtype | Description |
|-----|-------|-------|-------------|
| `{cid}_point_cloud` | (N, 3) | float32 | 3D points in camera frame (meters). Empty (0, 3) if no valid depth. |
| `{cid}_pixel_coords` | (N, 2) | int32 | (u, v) pixel coords of valid points |
| `{cid}_raw_depths_mm` | (N,) | uint16 | Raw depth values in mm, filtered to [50, 2000] |
| `{cid}_centroid` | (3,) | float32 | Mean of point_cloud; [0,0,0] if no valid depth |
| `{cid}_bbox_2d` | (4,) | int32 | [x1, y1, x2, y2] from mask |
| `{cid}_area` | (1,) | int32 | Mask pixel count |
| `{cid}_depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 |
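
Because the keys are flat, it can be convenient to regroup them per component after loading. The sketch below is one possible approach, not part of the dataset tooling; `group_depth_info` is a hypothetical helper and `DEPTH_FIELDS` simply repeats the seven field names from the table above.

```python
from typing import Any, Dict, Mapping

DEPTH_FIELDS = (
    "point_cloud", "pixel_coords", "raw_depths_mm",
    "centroid", "bbox_2d", "area", "depth_valid",
)


def group_depth_info(flat: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Regroup `{cid}_{field}` keys into `{cid: {field: value}}`.

    Matching on the field suffix rather than the id prefix avoids mixing up
    ids that share a prefix, such as `cpu` and `cpu_fan`.
    """
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in flat.items():
        for field in DEPTH_FIELDS:
            suffix = "_" + field
            if key.endswith(suffix):
                cid = key[: -len(suffix)]
                grouped.setdefault(cid, {})[field] = value
                break
    return grouped


# Toy values stand in for the real arrays.
flat = {"cpu_centroid": [0, 0, 0], "cpu_fan_centroid": [1, 1, 1], "cpu_fan_area": [10]}
print(sorted(group_depth_info(flat)))  # → ['cpu', 'cpu_fan']
```

Keys that do not end in one of the seven field names are ignored.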

### `side_robot/frame_XXXXXX.npz`

**Always written per labeled frame** (with `visible=[0]` if robot not in this frame).

| Key | Shape | Dtype | Description |
|-----|-------|-------|-------------|
| `visible` | (1,) | uint8 | 1 if robot labeled, 0 otherwise |
| `mask` | (H, W) | uint8 | Binary mask |
| `embedding` | (256,) | float32 | SAM2 256D feature |
| `point_cloud` | (N, 3) | float32 | 3D points (meters) |
| `pixel_coords` | (N, 2) | int32 | (u, v) pixel coords |
| `raw_depths_mm` | (N,) | uint16 | Raw depths in mm |
| `centroid` | (3,) | float32 | Mean of point_cloud |
| `bbox_2d` | (4,) | int32 | From mask |
| `area` | (1,) | int32 | Pixel count |
| `depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 |

### `metadata.json`

```json
{
  "episode_id": "episode_00",
  "goal_component": "connector_1",
  "num_frames": 604,
  "labeled_frame_count": 246,
  "annotation_complete": false,
  "component_counts": {
    "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1,
    "ram": 2, "ram_clip": 4, "connector": 4,
    "graphic_card": 1, "motherboard": 1
  },
  "format_version": "3.0",
  "sam2_model": "sam2.1_hiera_b+",
  "embedding_dim": 256,
  "fps": 30,
  "cameras": ["side"],
  "robot": "UR5e",
  "gripper": "Robotiq 2F-85"
}
```
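
A simple consistency check is to compare `component_counts` against the `components` list in `side_graph.json`. That the two should agree is an assumption made for this sketch; the card does not state it explicitly. `count_by_type` and `counts_match` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List


def count_by_type(components: List[dict]) -> Dict[str, int]:
    """Count graph components per type, using each entry's "type" field."""
    return dict(Counter(comp["type"] for comp in components))


def counts_match(component_counts: Dict[str, int], components: List[dict]) -> bool:
    """True if the metadata counts equal the per-type counts found in the graph."""
    return dict(component_counts) == count_by_type(components)


# Counts copied from the metadata.json example above.
component_counts = {
    "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1,
    "ram": 2, "ram_clip": 4, "connector": 4,
    "graphic_card": 1, "motherboard": 1,
}
print(sum(component_counts.values()))  # → 15

# Toy graph fragment with two components only, so the counts do not match.
components = [{"id": "cpu_fan", "type": "cpu_fan"}, {"id": "ram_1", "type": "ram"}]
print(counts_match(component_counts, components))  # → False
```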

## Test Data Available

One episode is fully labeled and validated — you can use it to test the loader:

**Labeled episode:** `session_0408_162129/episode_00`

| Stat | Value |
|------|-------|
| Total frames in episode | 604 |
| Labeled frames | **346** (range 0–351, 6 gaps) |
| Product components | 15 (cpu_fan, cpu_bracket, cpu, graphic_card, motherboard, connector_1..4, ram_1..2, ram_clip_1..4) |
| Physical constraints (edges) | 14 |
| Robot visibility | Visible in 216 / 346 frames |
| Goal component | `connector_1` |
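
The labeled-frame row is consistent with six unlabeled indices inside the range: frames 0 to 351 span 352 indices, and 352 − 346 = 6. A minimal sketch for listing such gaps from a list of labeled indices is shown below; `find_gaps` is a hypothetical helper and the input is a toy example, not the real episode.

```python
from typing import List


def find_gaps(frames: List[int]) -> List[int]:
    """Return indices missing between the first and last labeled frame."""
    if not frames:
        return []
    present = set(frames)
    return [i for i in range(min(frames), max(frames) + 1) if i not in present]


# Toy example: frames 0..9 with 3 and 7 unlabeled.
print(find_gaps([0, 1, 2, 4, 5, 6, 8, 9]))  # → [3, 7]
```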

### Download and Test (3 steps)

**Step 1: Download just one episode (lightweight)**

```bash
pip install huggingface_hub
```

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ChangChrisLiu/GNN_Disassembly_WorldModel",
    repo_type="dataset",
    allow_patterns=[
        "session_0408_162129/episode_00/metadata.json",
        "session_0408_162129/episode_00/robot_states.npy",
        "session_0408_162129/episode_00/robot_actions.npy",
        "session_0408_162129/episode_00/side/rgb/frame_000042.png",
        "session_0408_162129/episode_00/side/depth/frame_000042.npy",
        "session_0408_162129/episode_00/annotations/*",
    ],
)
print("Downloaded to:", local_dir)
```

**Step 2: Save the loader code** (copy the self-contained `gnn_disassembly_loader.py` block below into a file)

**Step 3: Run this test script** — it loads frame 42, prints the full graph anatomy, and verifies everything:

```python
from pathlib import Path
from gnn_disassembly_loader import (
    load_pyg_frame_products_only,
    load_pyg_frame_with_robot,
    list_labeled_frames,
    load_frame_data,
)

# After snapshot_download above:
episode = Path(local_dir) / "session_0408_162129" / "episode_00"

# 1. Enumerate labeled frames
frames = list_labeled_frames(episode)
assert len(frames) == 346, f"Expected 346 labeled frames, got {len(frames)}"
print(f"✓ Labeled frames: {len(frames)} (range {frames[0]}..{frames[-1]})")

# 2. Load frame 42 — products only
data1 = load_pyg_frame_products_only(episode, frame_idx=42)
assert data1.num_nodes == 15, f"Expected 15 products, got {data1.num_nodes}"
assert data1.edge_index.shape[1] == 15 * 14  # fully connected
assert data1.edge_attr.shape == (210, 3)     # 3D edge features
print(f"✓ Products-only: {data1}")

# 3. Load frame 42 — with robot agent
data2 = load_pyg_frame_with_robot(episode, frame_idx=42)
assert data2.num_nodes == 16, f"Expected 15 products + 1 robot = 16, got {data2.num_nodes}"
assert data2.edge_index.shape[1] == 16 * 15
assert hasattr(data2, "robot_point_cloud")
print(f"✓ With robot: {data2}")
print(f"  Robot point cloud: {tuple(data2.robot_point_cloud.shape)}")
print(f"  Robot mask:        {tuple(data2.robot_mask.shape)}")

# 4. Verify robot edges are all [0, 0, 0]
robot_idx = data2.num_nodes - 1
robot_edges = (data2.edge_index[0] == robot_idx) | (data2.edge_index[1] == robot_idx)
assert (data2.edge_attr[robot_edges] == 0).all()
print(f"✓ Robot edges: {robot_edges.sum().item()} — all [0,0,0]")

# 5. Verify edge feature semantics
has_c = (data1.edge_attr[:, 0] == 1).sum().item()
locked = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 1] == 1)).sum().item()
src_blocks = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 2] == 1)).sum().item()
assert has_c == 28   # 14 constraints × 2 directions
assert locked == 28  # all locked at frame 42
assert src_blocks == 14  # half the constraint edges have src as blocker
print(f"✓ Edge features: {has_c} constraint edges, {locked} locked, {src_blocks} forward-direction")

# 6. Verify fully-connected + symmetric structure
from collections import Counter
pairs = Counter()
for i in range(data1.edge_index.shape[1]):
    src = data1.edge_index[0, i].item()
    dst = data1.edge_index[1, i].item()
    pairs[frozenset([src, dst])] += 1
# Every unordered pair should appear exactly twice: (i, j) AND (j, i)
assert all(count == 2 for count in pairs.values())
print(f"✓ Structurally symmetric: every pair has both directions")

# 7. Raw data access
fd = load_frame_data(episode, frame_idx=42)
print(f"✓ Raw data: {len(fd.masks)} product masks, robot {'visible' if fd.robot else 'hidden'}")

print("\nAll tests passed! The dataset is ready for training.")
```

Expected output:
```
✓ Labeled frames: 346 (range 0..351)
✓ Products-only: Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15)
✓ With robot: Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16, robot_point_cloud=[5729, 3], robot_pixel_coords=[5729, 2], robot_mask=[720, 1280])
  Robot point cloud: (5729, 3)
  Robot mask:        (720, 1280)
✓ Robot edges: 30 — all [0,0,0]
✓ Edge features: 28 constraint edges, 28 locked, 14 forward-direction
✓ Structurally symmetric: every pair has both directions
✓ Raw data: 13 product masks, robot visible

All tests passed! The dataset is ready for training.
```

## Graph Structure — What You Get Per Frame

Every labeled frame is converted to **one PyTorch Geometric `Data` object**. Here's exactly what it contains:

### Node Features (269D per node)

```
┌───────────────────────────────────────────────────────────────────────┐
│ x[i] = 269D feature vector for node i                                  │
├───────────────────────────────────────────────────────────────────────┤
│ [0   : 256]   SAM2 embedding (256D)                                    │
│               Masked average pool over SAM2 encoder's vision_features. │
│               Captures visual appearance of the component.              │
├───────────────────────────────────────────────────────────────────────┤
│ [256 : 259]   3D position (3D)                                         │
│               Centroid in camera frame, meters. Mean of the valid      │
│               depth-backprojected points within the mask.              │
│               Zero vector if no valid depth (check depth_valid flag).  │
├───────────────────────────────────────────────────────────────────────┤
│ [259 : 268]   Type one-hot (9D)                                        │
│               Index order: cpu_fan, cpu_bracket, cpu, ram_clip, ram,   │
│               connector, graphic_card, motherboard, robot.             │
│               Multiple instances (e.g. ram_1, ram_2) share the same    │
│               one-hot — distinguished by their SAM2 embedding + 3D pos.│
├───────────────────────────────────────────────────────────────────────┤
│ [268]         Visibility (1D)                                          │
│               Binary flag — 1 if visible this frame, 0 if hidden.      │
│               Delta-encoded through frame_states in side_graph.json.   │
└───────────────────────────────────────────────────────────────────────┘
```

### Graph Topology — Fully Connected, Structurally Symmetric

For N nodes, the PyG graph has:
- `edge_index` shape: **(2, N × (N − 1))**
- Every ordered pair `(i, j)` with `i ≠ j` has an edge
- Both `(i, j)` AND `(j, i)` exist — the graph is **not structurally directed**
- Self-loops are excluded

**Why fully connected?** With only the sparse constraint edges (just physical prerequisites), components that are not directly constrained could exchange information only over several message-passing layers, and unconnected components not at all. Full connectivity lets every node pair communicate in a single layer.
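
The edge count can be reproduced with a few lines of standard-library Python. This is only a sketch of the topology, with `fully_connected_pairs` as a hypothetical helper name; it does not compute edge features.

```python
from itertools import permutations
from typing import List, Tuple


def fully_connected_pairs(num_nodes: int) -> List[Tuple[int, int]]:
    """All ordered pairs (i, j) with i != j, i.e. no self-loops."""
    return list(permutations(range(num_nodes), 2))


# 15 products, then 15 products plus the robot.
for n in (15, 16):
    print(n, len(fully_connected_pairs(n)))
# → 15 210
# → 16 240
```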

### Edge Features (3D per edge)

```
┌─────────────────┬──────────┬────────────────┬─────────────────────────┐
│ has_constraint  │ is_locked│ src_blocks_dst │ Meaning                 │
├─────────────────┼──────────┼────────────────┼─────────────────────────┤
│ 0               │ 0        │ 0              │ No physical constraint  │
│                 │          │                │ (message passing only)  │
├─────────────────┼──────────┼────────────────┼─────────────────────────┤
│ 1               │ 1        │ 1              │ Physical constraint     │
│                 │          │                │ LOCKED, src is blocker  │
├─────────────────┼──────────┼────────────────┼─────────────────────────┤
│ 1               │ 1        │ 0              │ Physical constraint     │
│                 │          │                │ LOCKED, src is blocked  │
│                 │          │                │ (reverse direction)     │
├─────────────────┼──────────┼────────────────┼─────────────────────────┤
│ 1               │ 0        │ 1              │ Physical constraint     │
│                 │          │                │ RELEASED (unlocked)     │
│                 │          │                │ src is the blocker      │
├─────────────────┼──────────┼────────────────┼─────────────────────────┤
│ 1               │ 0        │ 0              │ Physical constraint     │
│                 │          │                │ RELEASED, src is blocked│
└─────────────────┴──────────┴────────────────┴─────────────────────────┘
```

**Direction is a feature, not structure.**
- `has_constraint` and `is_locked` describe the PAIR — they're the same for both `(i,j)` and `(j,i)`.
- `src_blocks_dst` is asymmetric: it flips depending on which direction the edge goes.

**Example:** `cpu_fan` blocks `cpu_bracket` (fan covers bracket). At frame 0 (locked):

```
edge (cpu_fan, cpu_bracket)  →  [1, 1, 1]   cpu_fan is the blocker
edge (cpu_bracket, cpu_fan)  →  [1, 1, 0]   cpu_bracket is the blocked
```

At frame 152, after the fan has been removed (unlocked):

```
edge (cpu_fan, cpu_bracket)  →  [1, 0, 1]
edge (cpu_bracket, cpu_fan)  →  [1, 0, 0]
```
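
Since direction lives in the features, the sparse directed constraint list can be recovered from the dense edge attributes by keeping only edges whose source is the blocker. The sketch below works on plain nested lists; `directed_constraints` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def directed_constraints(
    edge_index: Sequence[Sequence[int]],
    edge_attr: Sequence[Sequence[float]],
) -> List[Tuple[int, int, bool]]:
    """Return (blocker, blocked, is_locked) for every physical constraint.

    Keeps only edges with has_constraint == 1 and src_blocks_dst == 1, so each
    constraint is reported once, in its blocker -> blocked direction.
    """
    sources, targets = edge_index
    result = []
    for src, dst, attr in zip(sources, targets, edge_attr):
        has_constraint, is_locked, src_blocks_dst = attr
        if has_constraint == 1 and src_blocks_dst == 1:
            result.append((src, dst, bool(is_locked)))
    return result


# Two nodes: 0 = cpu_fan, 1 = cpu_bracket, with the constraint released.
edge_index = [[0, 1], [1, 0]]
edge_attr = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(directed_constraints(edge_index, edge_attr))  # → [(0, 1, False)]
```

With a PyG `Data` object, the same function can be called as `directed_constraints(data.edge_index.tolist(), data.edge_attr.tolist())`.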

### Robot Agent Node (Optional)

When loaded with `load_pyg_frame_with_robot()`, the robot is appended as the **last node** (index `N_products`). All edges involving the robot have features `[0, 0, 0]`: the robot has no physical constraints and acts only as a context-providing agent node. If the robot is not visible in a frame, the loader returns the products-only graph instead.

The raw robot data (point cloud, pixel coords, full mask) is attached as extra tensors on the `Data` object for optional PointNet-style encoding.

### Matching a Frame to Its RGB Image

Frame indices in the loader directly map to image files:

```python
frame_idx = 42
rgb_path   = episode / "side" / "rgb"   / f"frame_{frame_idx:06d}.png"
depth_path = episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy"
```

Example — load PyG frame + matching image + depth:

```python
from pathlib import Path
import numpy as np
from PIL import Image
from gnn_disassembly_loader import load_pyg_frame_with_robot

episode = Path("episode_00")
frame_idx = 42

# PyG graph for this frame
data = load_pyg_frame_with_robot(episode, frame_idx)

# Matching RGB image (1280x720 PNG)
rgb = np.array(Image.open(episode / "side" / "rgb" / f"frame_{frame_idx:06d}.png"))
print("RGB shape:", rgb.shape)  # (720, 1280, 3)

# Matching depth (1280x720 uint16 mm)
depth = np.load(episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy")
print("Depth shape:", depth.shape, depth.dtype)  # (720, 1280) uint16

# Robot mask is in the PyG data if robot is visible
if hasattr(data, "robot_mask"):
    robot_mask = data.robot_mask.numpy()  # (720, 1280) uint8
    print("Robot mask area:", robot_mask.sum(), "pixels")
```

## Loading the Data — PyTorch Geometric

This section contains **self-contained** code you can copy-paste directly. No need to clone any repo.

### Prerequisites

```bash
pip install torch numpy torch_geometric pillow
```

### Self-contained PyG loader

Copy this into a file called `gnn_disassembly_loader.py`:

```python
"""Self-contained PyG loader for the GNN Disassembly dataset.

Two loader variants:
  - load_pyg_frame_products_only(ep, frame)  → constraint graph only, no robot
  - load_pyg_frame_with_robot(ep, frame)     → constraint graph + robot agent node

Both return torch_geometric.data.Data with:
  x            (N, 269)      node features
  edge_index   (2, N*(N-1))  fully connected directed message-passing edges
  edge_attr    (N*(N-1), 3)  [has_constraint, is_locked, src_blocks_dst]
  num_nodes    N

Notes on the edge feature design:
- The graph is FULLY CONNECTED and structurally symmetric.
  Both (i, j) and (j, i) exist in edge_index for every node pair i != j.
- Direction is NOT encoded in the graph structure. It is encoded as
  a feature: `src_blocks_dst`.
- `has_constraint` and `is_locked` are symmetric per pair (same value
  for both (i, j) and (j, i)).
- `src_blocks_dst` is asymmetric: it is 1 if the edge's src node
  physically blocks its dst node, 0 otherwise.
"""

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import numpy as np
import torch
from torch_geometric.data import Data


# ─────────────────────────────────────────────────────────────────────────────
# Helpers
# ─────────────────────────────────────────────────────────────────────────────

def list_labeled_frames(episode_dir: Path) -> List[int]:
    """Return sorted list of frame indices that have saved annotations."""
    mask_dir = episode_dir / "annotations" / "side_masks"
    if not mask_dir.exists():
        return []
    frames = []
    for p in mask_dir.glob("frame_*.npz"):
        try:
            frames.append(int(p.stem.split("_")[1]))
        except (ValueError, IndexError):
            continue
    return sorted(frames)


def resolve_frame_state(graph_json: dict, frame_idx: int) -> Tuple[Dict[str, bool], Dict[str, bool]]:
    """Resolve delta-encoded constraints + visibility at a frame.

    Walks frame_states from frame 0 to frame_idx, accumulating deltas.
    Returns (constraints_dict, visibility_dict).
    """
    constraints: Dict[str, bool] = {}
    visibility: Dict[str, bool] = {}
    # Defaults: every component visible, every edge locked
    for c in graph_json["components"]:
        visibility[c["id"]] = True
    for e in graph_json["edges"]:
        constraints[f"{e['src']}->{e['dst']}"] = True
    # Walk deltas up to frame_idx
    fs_dict = graph_json.get("frame_states", {})
    for f in sorted([int(k) for k in fs_dict]):
        if f > frame_idx:
            break
        fs = fs_dict[str(f)]
        for k, v in fs.get("constraints", {}).items():
            constraints[k] = v
        for k, v in fs.get("visibility", {}).items():
            visibility[k] = v
    return constraints, visibility


def type_one_hot(comp_type: str, type_vocab: List[str]) -> List[float]:
    """9-dim one-hot encoding of component type based on type_vocab."""
    return [1.0 if t == comp_type else 0.0 for t in type_vocab]


# ─────────────────────────────────────────────────────────────────────────────
# Raw data loader (NumPy only, no torch)
# ─────────────────────────────────────────────────────────────────────────────

@dataclass
class FrameData:
    graph: dict
    masks: Dict[str, np.ndarray]
    embeddings: Dict[str, np.ndarray]
    depth_info: dict
    robot: Optional[dict]
    constraints: Dict[str, bool]
    visibility: Dict[str, bool]


def load_frame_data(episode_dir: Path, frame_idx: int) -> FrameData:
    """Load all v3 annotation files for one frame."""
    anno = episode_dir / "annotations"

    with open(anno / "side_graph.json") as f:
        graph = json.load(f)

    def _load_npz_dict(path: Path) -> Dict[str, np.ndarray]:
        if not path.exists():
            return {}
        # Read every array eagerly so the archive is closed right away.
        with np.load(path) as d:
            return {k: d[k] for k in d.files}

    masks = _load_npz_dict(anno / "side_masks" / f"frame_{frame_idx:06d}.npz")
    embeddings = _load_npz_dict(anno / "side_embeddings" / f"frame_{frame_idx:06d}.npz")
    depth_info = _load_npz_dict(anno / "side_depth_info" / f"frame_{frame_idx:06d}.npz")

    robot: Optional[dict] = None
    robot_path = anno / "side_robot" / f"frame_{frame_idx:06d}.npz"
    if robot_path.exists():
        # Keep the bundle only when the robot is labeled in this frame.
        with np.load(robot_path) as r:
            if int(r["visible"][0]) == 1:
                robot = {k: r[k] for k in r.files}

    constraints, visibility = resolve_frame_state(graph, frame_idx)
    return FrameData(graph, masks, embeddings, depth_info, robot, constraints, visibility)


# ─────────────────────────────────────────────────────────────────────────────
# PyG loader — products only
# ─────────────────────────────────────────────────────────────────────────────

def load_pyg_frame_products_only(episode_dir: Path, frame_idx: int) -> Data:
    """Fully connected PyG graph WITHOUT robot.

    Returns Data(
        x=[N, 269],
        edge_index=[2, N*(N-1)],
        edge_attr=[N*(N-1), 3],   # [has_constraint, is_locked, src_blocks_dst]
        num_nodes=N,
    )
    where N = number of product components (robot excluded).
    """
    fd = load_frame_data(episode_dir, frame_idx)
    graph = fd.graph
    type_vocab = graph["type_vocab"]  # 9 entries incl. robot
    nodes = graph["components"]       # robot already excluded per spec
    N = len(nodes)

    # ── Node features ──
    # [256D SAM2 embedding, 3D position, 9D type one-hot, 1D visibility]
    # Total: 256 + 3 + 9 + 1 = 269 features per node.
    x_list = []
    for node in nodes:
        cid = node["id"]
        emb = fd.embeddings.get(cid, np.zeros(256, dtype=np.float32))

        depth_valid_key = f"{cid}_depth_valid"
        centroid_key = f"{cid}_centroid"
        if (depth_valid_key in fd.depth_info
                and int(fd.depth_info[depth_valid_key][0]) == 1):
            pos = fd.depth_info[centroid_key].astype(np.float32)
        else:
            pos = np.zeros(3, dtype=np.float32)

        type_oh = type_one_hot(node["type"], type_vocab)  # 9D
        vis = 1.0 if fd.visibility.get(cid, True) else 0.0

        feat = np.concatenate([
            emb.astype(np.float32),
            pos,
            np.array(type_oh, dtype=np.float32),
            np.array([vis], dtype=np.float32),
        ])
        x_list.append(feat)
    x = torch.tensor(np.stack(x_list), dtype=torch.float32) if x_list else torch.empty((0, 269))

    # ── Fully connected edges with 3D features ──
    # Edge feature: [has_constraint, is_locked, src_blocks_dst]
    # - has_constraint & is_locked are SYMMETRIC for the pair (A, B)
    # - src_blocks_dst is ASYMMETRIC: 1 if edge's src physically blocks dst
    constraint_set = {(e["src"], e["dst"]) for e in graph["edges"]}
    pair_forward = {}  # frozenset({a, b}) -> (blocker, blocked)
    for (s, d) in constraint_set:
        pair_forward[frozenset([s, d])] = (s, d)

    src_idx, dst_idx, edge_attr = [], [], []
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            src_id = nodes[i]["id"]
            dst_id = nodes[j]["id"]
            src_idx.append(i)
            dst_idx.append(j)

            pair_key = frozenset([src_id, dst_id])
            if pair_key in pair_forward:
                forward = pair_forward[pair_key]
                constraint_key = f"{forward[0]}->{forward[1]}"
                is_locked = fd.constraints.get(constraint_key, True)
                src_blocks_dst = 1.0 if src_id == forward[0] else 0.0
                edge_attr.append([
                    1.0,
                    1.0 if is_locked else 0.0,
                    src_blocks_dst,
                ])
            else:
                edge_attr.append([0.0, 0.0, 0.0])  # message passing only

    return Data(
        x=x,
        edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long),
        edge_attr=torch.tensor(edge_attr, dtype=torch.float32),
        y=torch.tensor([frame_idx], dtype=torch.long),
        num_nodes=N,
    )


# ─────────────────────────────────────────────────────────────────────────────
# PyG loader — with robot agent node
# ─────────────────────────────────────────────────────────────────────────────

def load_pyg_frame_with_robot(episode_dir: Path, frame_idx: int) -> Data:
    """Fully connected PyG graph WITH robot appended as agent node.

    Robot is node N (the last node). All edges involving the robot have
    features [0, 0, 0] because the robot has no physical constraints.

    If the robot is not visible at this frame, returns the products-only graph.
    Additional attached tensors when robot is visible:
        data.robot_point_cloud  (M, 3) float32
        data.robot_pixel_coords (M, 2) int32
        data.robot_mask         (H, W) uint8
    """
    data = load_pyg_frame_products_only(episode_dir, frame_idx)
    fd = load_frame_data(episode_dir, frame_idx)
    if fd.robot is None:
        return data

    graph = fd.graph
    type_vocab = graph["type_vocab"]
    products = graph["components"]
    N_prod = len(products)
    N = N_prod + 1

    # ── Build robot node features ──
    robot_emb = fd.robot["embedding"].astype(np.float32)
    robot_pos = (fd.robot["centroid"].astype(np.float32)
                 if int(fd.robot["depth_valid"][0]) == 1
                 else np.zeros(3, dtype=np.float32))
    robot_type_oh = type_one_hot("robot", type_vocab)
    robot_feat = np.concatenate([
        robot_emb, robot_pos,
        np.array(robot_type_oh, dtype=np.float32),
        np.array([1.0], dtype=np.float32),
    ])
    x = torch.cat([data.x, torch.tensor(robot_feat, dtype=torch.float32).unsqueeze(0)], dim=0)

    # ── Rebuild edges with 3D features ──
    constraint_set = {(e["src"], e["dst"]) for e in graph["edges"]}
    pair_forward = {}
    for (s, d) in constraint_set:
        pair_forward[frozenset([s, d])] = (s, d)

    src_idx, dst_idx, edge_attr = [], [], []

    # Products × Products
    for i in range(N_prod):
        for j in range(N_prod):
            if i == j:
                continue
            src_id = products[i]["id"]
            dst_id = products[j]["id"]
            src_idx.append(i)
            dst_idx.append(j)
            pair_key = frozenset([src_id, dst_id])
            if pair_key in pair_forward:
                forward = pair_forward[pair_key]
                is_locked = fd.constraints.get(f"{forward[0]}->{forward[1]}", True)
                src_blocks_dst = 1.0 if src_id == forward[0] else 0.0
                edge_attr.append([1.0, 1.0 if is_locked else 0.0, src_blocks_dst])
            else:
                edge_attr.append([0.0, 0.0, 0.0])

    # Robot ↔ Products (both directions, message-passing only)
    robot_idx = N_prod
    for i in range(N_prod):
        src_idx.append(robot_idx); dst_idx.append(i); edge_attr.append([0.0, 0.0, 0.0])
        src_idx.append(i); dst_idx.append(robot_idx); edge_attr.append([0.0, 0.0, 0.0])

    data = Data(
        x=x,
        edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long),
        edge_attr=torch.tensor(edge_attr, dtype=torch.float32),
        y=torch.tensor([frame_idx], dtype=torch.long),
        num_nodes=N,
    )
    data.robot_point_cloud = torch.tensor(fd.robot["point_cloud"], dtype=torch.float32)
    data.robot_pixel_coords = torch.tensor(fd.robot["pixel_coords"], dtype=torch.int32)
    data.robot_mask = torch.tensor(fd.robot["mask"], dtype=torch.uint8)
    return data


# ─────────────────────────────────────────────────────────────────────────────
# Episode iterator
# ─────────────────────────────────────────────────────────────────────────────

def iterate_episode(episode_dir: Path, with_robot: bool = True):
    """Yield (frame_idx, Data) pairs for all labeled frames in an episode."""
    loader = load_pyg_frame_with_robot if with_robot else load_pyg_frame_products_only
    for frame_idx in list_labeled_frames(episode_dir):
        yield frame_idx, loader(episode_dir, frame_idx)
```

### Usage Examples

#### Variant 1: Constraint Graph Only (No Robot)

```python
from pathlib import Path
from gnn_disassembly_loader import load_pyg_frame_products_only, list_labeled_frames

episode = Path("episode_00")  # downloaded from HF

# Enumerate labeled frames
frames = list_labeled_frames(episode)
print(f"Episode has {len(frames)} labeled frames")
# → Episode has 246 labeled frames

# Load one frame as a fully connected PyG graph (products only)
data = load_pyg_frame_products_only(episode, frame_idx=42)
print(data)
# → Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15)

# For N=15 products: edges = 15 * 14 = 210 (fully connected)
print("Node features:", data.x.shape)         # (15, 269)
print("Edges:", data.edge_index.shape)        # (2, 210)
print("Edge attrs:", data.edge_attr.shape)    # (210, 3) = [has_constraint, is_locked, src_blocks_dst]

# Count edge feature breakdown
has_c = (data.edge_attr[:, 0] == 1).sum().item()
locked = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 1] == 1)).sum().item()
src_blocks = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 2] == 1)).sum().item()
print(f"Edges with physical constraint: {has_c}")
print(f"  currently locked:            {locked}")
print(f"  where src is the blocker:    {src_blocks}")
print(f"Message-passing-only edges:    {(data.edge_attr[:, 0] == 0).sum().item()}")
```

#### Variant 2: Constraint Graph + Robot Agent Node

```python
from pathlib import Path
from gnn_disassembly_loader import load_pyg_frame_with_robot

episode = Path("episode_00")  # same episode directory as in Variant 1

data = load_pyg_frame_with_robot(episode, frame_idx=42)
print(data)
# → Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16)
# Robot is the last node (index 15 for a 15-product graph).
# Robot edges: 15 products * 2 directions = 30 extra edges → 210 + 30 = 240

# Verify robot edges are all message-passing (no constraint)
robot_idx = data.num_nodes - 1
robot_edges = (data.edge_index[0] == robot_idx) | (data.edge_index[1] == robot_idx)
assert (data.edge_attr[robot_edges] == 0).all(), "Robot edges must be [0, 0, 0]"
print(f"Robot edges: {robot_edges.sum().item()} — all [0, 0, 0]")

# Raw robot data (optional, for PointNet-style encoding)
print("Robot point cloud:", data.robot_point_cloud.shape)  # (M, 3) — M varies per frame
print("Robot mask:", data.robot_mask.shape)                # (720, 1280)
```
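
The edge counts printed in both variants follow directly from the fully connected layout. As a quick sanity check, the sketch below recomputes them. `expected_edge_count` is a hypothetical helper written for this illustration and is not part of the loader.

```python
def expected_edge_count(n_products: int, with_robot: bool = False) -> int:
    """Directed edge count for a fully connected product graph, optionally with a robot node."""
    # Every ordered pair of distinct products gets one directed edge.
    product_edges = n_products * (n_products - 1)
    # The robot node connects to each product in both directions.
    robot_edges = 2 * n_products if with_robot else 0
    return product_edges + robot_edges


print(expected_edge_count(15))                   # 210
print(expected_edge_count(15, with_robot=True))  # 240
```

Comparing this value against `data.edge_index.shape[1]` is a cheap way to catch a frame whose graph was built with a missing or duplicated component.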

#### Edge Feature Semantics

Each row of `data.edge_attr` is 3-dimensional: `[has_constraint, is_locked, src_blocks_dst]`.

```
┌──────────────────┬────────────┬────────────────┬─────────────────────────────────┐
│ has_constraint   │ is_locked  │ src_blocks_dst │ Meaning                         │
├──────────────────┼────────────┼────────────────┼─────────────────────────────────┤
│ 0                │ 0          │ 0              │ No physical constraint          │
│                  │            │                │ Message passing only            │
├──────────────────┼────────────┼────────────────┼─────────────────────────────────┤
│ 1                │ 1          │ 1              │ Edge src physically blocks dst  │
│                  │            │                │ Constraint currently LOCKED     │
├──────────────────┼────────────┼────────────────┼─────────────────────────────────┤
│ 1                │ 1          │ 0              │ Edge dst physically blocks src  │
│                  │            │                │ (the reverse direction of the   │
│                  │            │                │ physical constraint)            │
│                  │            │                │ Constraint currently LOCKED     │
├──────────────────┼────────────┼────────────────┼─────────────────────────────────┤
│ 1                │ 0          │ 1              │ Edge src physically blocks dst  │
│                  │            │                │ Constraint RELEASED             │
├──────────────────┼────────────┼────────────────┼─────────────────────────────────┤
│ 1                │ 0          │ 0              │ Edge dst physically blocks src  │
│                  │            │                │ Constraint RELEASED             │
└──────────────────┴────────────┴────────────────┴─────────────────────────────────┘
```

**Important:** The graph is **fully connected and structurally symmetric** — both `(A, B)` and `(B, A)` edges exist for every pair. `has_constraint` and `is_locked` are the same for both directions (they describe the unordered pair). `src_blocks_dst` flips between the two directions — it tells you whether the edge's source is the one doing the blocking.

**Example: CPU bracket blocks CPU removal**

If `cpu_bracket → cpu` is an active constraint, the loader produces:

```
Edge (cpu_bracket, cpu):  [1, 1, 1]   # cpu_bracket blocks cpu, locked, src=blocker
Edge (cpu, cpu_bracket):  [1, 1, 0]   # same physical pair, src=blocked
```

Once the constraint is unlocked (e.g., after the bracket is released), `is_locked` drops to 0 while `src_blocks_dst` stays unchanged:
```
Edge (cpu_bracket, cpu):  [1, 0, 1]   # constraint released, but bracket still named as blocker
Edge (cpu, cpu_bracket):  [1, 0, 0]
```
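
To make the table and the CPU bracket example concrete, here is a minimal sketch that turns one `edge_attr` row into a readable sentence. `describe_edge` is a hypothetical helper, not part of the loader. It assumes the row holds the three values in the order `[has_constraint, is_locked, src_blocks_dst]` and that the component names are supplied by the caller.

```python
def describe_edge(src: str, dst: str, attr) -> str:
    """Turn one [has_constraint, is_locked, src_blocks_dst] row into a sentence."""
    # Accept ints, floats, or 0-d tensor elements by converting through float.
    has_constraint, is_locked, src_blocks_dst = (int(round(float(v))) for v in attr)
    if not has_constraint:
        return f"{src} -> {dst}: message passing only"
    # src_blocks_dst tells which endpoint of this directed edge is the blocker.
    blocker, blocked = (src, dst) if src_blocks_dst else (dst, src)
    state = "LOCKED" if is_locked else "RELEASED"
    return f"{src} -> {dst}: {blocker} blocks {blocked} ({state})"


print(describe_edge("cpu_bracket", "cpu", [1, 1, 1]))
# cpu_bracket -> cpu: cpu_bracket blocks cpu (LOCKED)
print(describe_edge("cpu", "cpu_bracket", [1, 0, 0]))
# cpu -> cpu_bracket: cpu_bracket blocks cpu (RELEASED)
```

Both directions of a pair name the same blocker, which is the symmetry property described above.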

### Iterating the Full Episode

```python
from pathlib import Path
from torch_geometric.loader import DataLoader
from gnn_disassembly_loader import iterate_episode

episode = Path("episode_00")

# Build a dataset list
data_list = [data for _, data in iterate_episode(episode, with_robot=True)]
print(f"Loaded {len(data_list)} frames")

# Batch them for training
loader = DataLoader(data_list, batch_size=8, shuffle=True)
for batch in loader:
    print(batch.x.shape, batch.edge_index.shape, batch.edge_attr.shape)
    break
```

### Adding Robot State as Node Features (Graph B)

For the perception + robot state variant, concatenate the 13D robot state to every node:

```python
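from pathlib import Path

import numpy as np
import torch
from gnn_disassembly_loader import load_pyg_frame_with_robot

episode = Path("episode_00")
robot_states = np.load(episode / "robot_states.npy")  # (T, 13)

def add_robot_state_to_graph(data, frame_idx, robot_states):
    """Append the frame's 13D robot state to every node (modifies `data` in place)."""
    robot_state_t = torch.tensor(robot_states[frame_idx], dtype=torch.float32)  # (13,)
    broadcast = robot_state_t.unsqueeze(0).expand(data.num_nodes, -1)           # (N, 13)
    data.x = torch.cat([data.x, broadcast], dim=1)                              # (N, 282)
    return data

# Call this once per graph: a second call would append another 13 columns.
data = load_pyg_frame_with_robot(episode, frame_idx=42)
data_b = add_robot_state_to_graph(data, frame_idx=42, robot_states=robot_states)
print("Graph B node features:", data_b.x.shape)  # (16, 282) for with_robot variant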
```

## Node Feature Layout (269D)

```
[0   : 256]   SAM2 embedding (256D)  — masked avg pool over vision_features
[256 : 259]   3D position (3D)        — centroid in camera frame (meters)
[259 : 268]   type one-hot (9D)       — index by type_vocab (incl. "robot")
[268]         visibility (1D)         — binary flag
```

Total: **269D per node**.

For Graph B (with robot state broadcast):
```
[0   : 269]   Graph A features (269D)
[269 : 275]   joint positions (6D)     — UR5e joint angles (radians)
[275 : 281]   TCP pose (6D)            — [x, y, z, rx, ry, rz]
[281]         gripper position (1D)    — Robotiq 2F-85 (0-255)
```

Total: **282D per node**.
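
The two layouts above can be written down as a slice table so that downstream code does not hard-code offsets. This is a minimal sketch. `NODE_FEATURE_SLICES` and `split_node_features` are illustrative names, not part of the loader, and the boundaries are copied from the layouts above.

```python
# Slice table mirroring the documented layout (names are illustrative).
NODE_FEATURE_SLICES = {
    "embedding": slice(0, 256),
    "position": slice(256, 259),
    "type_one_hot": slice(259, 268),
    "visibility": slice(268, 269),
    # Present only in Graph B (robot state broadcast to every node)
    "joint_positions": slice(269, 275),
    "tcp_pose": slice(275, 281),
    "gripper": slice(281, 282),
}


def split_node_features(row):
    """Split one node's feature row into named fields, skipping fields beyond its length."""
    return {
        name: row[s]
        for name, s in NODE_FEATURE_SLICES.items()
        if s.stop <= len(row)
    }


row_a = list(range(269))              # stand-in for a single Graph A node row
fields = split_node_features(row_a)
print(fields["position"])             # [256, 257, 258]
print("gripper" in fields)            # False
```

For a full node matrix, the same slices can be applied along the feature axis, e.g. `data.x[:, NODE_FEATURE_SLICES["position"]]`.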

## Raw Data Access (No PyG)

If you prefer raw NumPy without PyTorch Geometric:

```python
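from pathlib import Path
from gnn_disassembly_loader import load_frame_data  # same loader module as the examples above

episode = Path("episode_00")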

fd = load_frame_data(episode, frame_idx=42)

print("Graph:", fd.graph["components"])
print("Masks:", list(fd.masks.keys()))
print("Resolved visibility:", fd.visibility)
print("Robot present:", fd.robot is not None)

if fd.robot is not None:
    print("Robot mask shape:", fd.robot["mask"].shape)
    print("Robot point cloud:", fd.robot["point_cloud"].shape)
    print("Robot centroid (m):", fd.robot["centroid"])

# Access a specific component's depth info
for key in ["point_cloud", "pixel_coords", "centroid", "area", "depth_valid"]:
    full_key = f"cpu_fan_{key}"
    if full_key in fd.depth_info:
        print(f"cpu_fan {key}: {fd.depth_info[full_key]}")
```

## Recording Hardware

- **Robot:** UR5e + Robotiq 2F-85 gripper
- **Side camera:** Luxonis OAK-D Pro (static viewpoint)
  - Intrinsics: fx=1033.8, fy=1033.7, cx=632.9, cy=359.9
- **Recording rate:** 30 Hz
- **Image size:** 1280 × 720
- **Depth format:** uint16, millimeters
- **Teleoperation:** Thrustmaster SOL-R2 HOSAS controllers
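
The intrinsics and depth format above are enough to back-project a depth pixel into a 3D point in the camera frame. The sketch below uses the standard pinhole model and rests on several assumptions: no lens distortion, depth aligned to the RGB pixel grid, and a depth value of 0 meaning "no reading". Whether the stored point clouds were produced with exactly this procedure is not stated here, so treat it as an illustration. `backproject` is a hypothetical helper.

```python
# Intrinsics copied from the hardware list above (side camera, 1280 x 720).
FX, FY, CX, CY = 1033.8, 1033.7, 632.9, 359.9


def backproject(u: float, v: float, depth_mm: float):
    """Pinhole back-projection of pixel (u, v) with depth in mm to camera-frame meters."""
    if depth_mm <= 0:
        return None  # assumption: 0 marks a missing depth reading
    z = depth_mm / 1000.0      # millimeters to meters
    x = (u - CX) * z / FX      # horizontal offset from the principal point
    y = (v - CY) * z / FY      # vertical offset from the principal point
    return (x, y, z)


print(backproject(632.9, 359.9, 800))  # (0.0, 0.0, 0.8)
```

When reading from a depth frame, note that the array is indexed as `depth[v, u]` (row first), while the projection takes `(u, v)` as (column, row).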

## Annotation Tool

Annotations were created with a custom SAM2-based labeling tool:

- **Repository:** https://github.com/ChangChrisLiu/gnn-world-model
- **Backend:** FastAPI + SAM2 (`sam2.1_hiera_base_plus`)
- **Frontend:** Vanilla HTML/JS, side-only interactive view
- **Tools:** BBox, Point, Polygon, Brush, Eraser (all mask-editing operations)
- **Features:** Dynamic component instances, AGENT badge for robot, scroll-to-zoom, undo/redo, per-frame delta-encoded visibility

## License

Released under **CC BY 4.0**. Use, share, and adapt freely with attribution.

## Acknowledgements

Built using:
- [Segment Anything Model 2 (SAM2)](https://github.com/facebookresearch/sam2) by Meta AI
- [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/)
- [Hugging Face Datasets](https://huggingface.co/docs/datasets)