| license: cc-by-4.0 | |
| task_categories: | |
| - robotics | |
| - image-segmentation | |
| - graph-ml | |
| language: | |
| - en | |
| tags: | |
| - robotics | |
| - manipulation | |
| - disassembly | |
| - constraint-graph | |
| - gnn | |
| - world-model | |
| - sam2 | |
| - segmentation | |
| - ur5e | |
| size_categories: | |
| - 1K<n<10K | |
| pretty_name: GNN Disassembly World Model Dataset (v3) | |
| # GNN Disassembly World Model Dataset (v3) | |
| Real robot disassembly episodes with side-view **per-frame constraint graphs**, SAM2 segmentation masks, 256D feature embeddings, **full 3D depth information (point clouds)**, and synchronized robot states. The robot is labeled as a separate **agent node** with its own mask, embedding, and depth bundle. | |
| **Project:** CoRL 2026 - GNN world model for constraint-aware video generation | |
| **Author:** Chang Liu (Texas A&M University) | |
| **Hardware:** UR5e + Robotiq 2F-85 gripper, OAK-D Pro (side view) | |
| **Format version:** v3 (2026-04-10) | |
| ## Dataset Structure | |
| ``` | |
| episode_XX/ | |
| ├── metadata.json                # Episode metadata, component counts, labeled frame count | |
| ├── robot_states.npy             # (T, 13) float32: joint angles + TCP pose + gripper | |
| ├── robot_actions.npy            # (T-1, 13) float32: frame-to-frame state deltas | |
| ├── timestamps.npy               # (T, 3) float64 | |
| ├── side/ | |
| │   ├── rgb/frame_XXXXXX.png     # 1280×720 RGB (side camera) | |
| │   └── depth/frame_XXXXXX.npy   # 1280×720 uint16 depth (mm) | |
| ├── wrist/                       # Raw wrist camera data (not used in v3) | |
| │   ├── rgb/... | |
| │   └── depth/... | |
| └── annotations/ | |
|     ├── side_graph.json          # Constraint graph (products only, NO robot) | |
|     ├── side_masks/ | |
|     │   └── frame_XXXXXX.npz     # {component_id: (H,W) uint8}, products only | |
|     ├── side_embeddings/ | |
|     │   └── frame_XXXXXX.npz     # {component_id: (256,) float32}, products only | |
|     ├── side_depth_info/ | |
|     │   └── frame_XXXXXX.npz     # Per-product depth bundle (flat keys) | |
|     ├── side_robot/ | |
|     │   └── frame_XXXXXX.npz     # Robot bundle, ALWAYS written per labeled frame | |
|     └── dataset_card.json        # Format description | |
| ``` | |
| **Alignment guarantee:** every labeled frame has files in all 4 annotation directories. Files are aligned by frame index. | |
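The alignment guarantee can be checked mechanically before training. The following is a minimal sketch, assuming only the directory names shown in the tree above; the function name `find_misaligned_frames` is hypothetical and not part of the dataset tooling. The demo runs on a throwaway directory with empty placeholder files rather than real annotations.

```python
import tempfile
from pathlib import Path

# The four per-frame annotation directories named in the tree above.
ANNOTATION_DIRS = ("side_masks", "side_embeddings", "side_depth_info", "side_robot")


def find_misaligned_frames(episode_dir: Path) -> dict:
    """Map each frame index missing from some directory to the directories lacking it."""
    anno = episode_dir / "annotations"
    per_dir = {
        name: {int(p.stem.split("_")[1]) for p in (anno / name).glob("frame_*.npz")}
        for name in ANNOTATION_DIRS
    }
    all_frames = set().union(*per_dir.values())
    missing = {}
    for frame in sorted(all_frames):
        absent = [name for name in ANNOTATION_DIRS if frame not in per_dir[name]]
        if absent:
            missing[frame] = absent
    return missing


# Demo on a temporary tree of empty placeholder files (not real annotations).
with tempfile.TemporaryDirectory() as tmp:
    episode = Path(tmp) / "episode_00"
    for name in ANNOTATION_DIRS:
        (episode / "annotations" / name).mkdir(parents=True)
        for frame in (0, 1, 2):
            (episode / "annotations" / name / f"frame_{frame:06d}.npz").touch()
    aligned_report = find_misaligned_frames(episode)
    # Break the guarantee on purpose to show what a failure looks like.
    (episode / "annotations" / "side_robot" / "frame_000001.npz").unlink()
    broken_report = find_misaligned_frames(episode)

print(aligned_report)  # {}
print(broken_report)   # {1: ['side_robot']}
```

An empty report means every labeled frame is present in all four directories.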
| ## Component Types (9 types) | |
| **8 product types** (constraint nodes): | |
| | Index | Type | Color | Notes | | |
| |-------|------|-------|-------| | |
| | 0 | `cpu_fan` | #FF6B6B | Always visible at start | | |
| | 1 | `cpu_bracket` | #4ECDC4 | Hidden at start (under fan) | | |
| | 2 | `cpu` | #45B7D1 | Hidden at start | | |
| | 3 | `ram_clip` | #96CEB4 | Multiple instances: ram_clip_1, ram_clip_2, ... | | |
| | 4 | `ram` | #FFEAA7 | Multiple instances: ram_1, ram_2, ... | | |
| | 5 | `connector` | #DDA0DD | Multiple instances: connector_1, connector_2, ... | | |
| | 6 | `graphic_card` | #FF8C42 | Always visible | | |
| | 7 | `motherboard` | #8B5CF6 | Always visible (base) | | |
| **1 agent type** (NOT in constraint graph): | |
| | Index | Type | Color | Notes | | |
| |-------|------|-------|-------| | |
| | 8 | `robot` | #F5F5F5 | Labeled but stored separately. Added as agent node at training time. | | |
| ## Graph Semantics | |
| ### Constraint Graph (Sparse, Stored) | |
| `side_graph.json` defines the **physical constraint relationships** between products. Directed edges: `A -> B` means "A must be removed before B can be removed" (A blocks B). | |
| ``` | |
| cpu_fan -> cpu_bracket (fan covers bracket) | |
| cpu_fan -> motherboard (fan attached to board) | |
| cpu_bracket -> cpu (bracket holds CPU) | |
| cpu_bracket -> motherboard | |
| cpu -> motherboard | |
| ram_N -> motherboard | |
| ram_clip_N -> motherboard | |
| ram_clip_N -> ram_M (user manually pairs) | |
| connector_N -> motherboard | |
| graphic_card -> motherboard | |
| ``` | |
| **Edge states** are delta-encoded in `frame_states`: | |
| - `locked: true` (1): constraint active, component cannot be removed | |
| - `locked: false` (0): constraint released, component is free | |
| - Monotonic: once unlocked, stays unlocked | |
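The monotonicity rule lends itself to a quick sanity check over `frame_states`. The sketch below assumes the `frame_states` layout shown in the `side_graph.json` schema (string frame keys, a `constraints` dict of `"src->dst": bool`); `find_relock_violations` is a hypothetical helper name, and the two small dictionaries are made-up inputs.

```python
def find_relock_violations(frame_states: dict) -> list:
    """Return (frame, edge) pairs where an already released constraint is locked again."""
    released = set()
    violations = []
    # Keys are strings in the JSON file, so sort them numerically.
    for frame in sorted(frame_states, key=int):
        for edge, locked in frame_states[frame].get("constraints", {}).items():
            if locked and edge in released:
                violations.append((int(frame), edge))
            elif not locked:
                released.add(edge)
    return violations


# Well-formed deltas: locked at frame 0, released at frame 152.
ok_states = {
    "0": {"constraints": {"cpu_fan->cpu_bracket": True}},
    "152": {"constraints": {"cpu_fan->cpu_bracket": False}},
}
# Hypothetical malformed deltas: the same edge is locked again at frame 200.
bad_states = {**ok_states, "200": {"constraints": {"cpu_fan->cpu_bracket": True}}}

print(find_relock_violations(ok_states))   # []
print(find_relock_violations(bad_states))  # [(200, 'cpu_fan->cpu_bracket')]
```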
| ### Fully Connected Graph (Built at Training Time) | |
| For GNN message passing, the sparse constraint graph is expanded to a **fully connected directed** graph. Every ordered pair `(i, j)` where `i != j` gets an edge. Self-loops are excluded. | |
| **Edge count:** For a graph with N nodes, there are **N × (N - 1)** directed edges (both directions for every pair). | |
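The edge count is easy to confirm by enumerating ordered pairs. This is a small illustration in plain Python; `fully_connected_edges` is a hypothetical name used only for this sketch.

```python
from itertools import permutations


def fully_connected_edges(num_nodes: int) -> list:
    """All ordered pairs (i, j) with i != j, i.e. no self-loops."""
    return list(permutations(range(num_nodes), 2))


edges = fully_connected_edges(15)
print(len(edges))                      # 210 = 15 * 14 (products only)
print(len(fully_connected_edges(16)))  # 240 = 16 * 15 (products + robot)
```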
| **Edge features (3D):** `[has_constraint, is_locked, src_blocks_dst]` | |
| | `has_constraint` | `is_locked` | `src_blocks_dst` | Meaning | | |
| |---|---|---|---| | |
| | 1 | 1 | 1 | Physical constraint, currently active (locked); the edge's source is the blocker | | |
| | 1 | 1 | 0 | Same locked constraint, seen from the blocked component | | |
| | 1 | 0 | 1 | Physical constraint, released (unlocked); the edge's source is the blocker | | |
| | 1 | 0 | 0 | Same released constraint, seen from the blocked component | | |
| | 0 | 0 | 0 | No physical constraint between the pair: message passing only | | |
| **Direction is carried by `src_blocks_dst`, not by edge existence.** For a physical constraint `A → B` (A blocks B's removal): | |
| - Edge `(A, B)` → `src_blocks_dst = 1` | |
| - Edge `(B, A)` → `src_blocks_dst = 0`; `has_constraint` and `is_locked` are identical on both edges of the pair | |
| For example, if `cpu_fan → cpu_bracket` is a locked constraint: | |
| ``` | |
| (cpu_fan, cpu_bracket) → has_constraint=1, is_locked=1, src_blocks_dst=1 (cpu_fan is the blocker) | |
| (cpu_bracket, cpu_fan) → has_constraint=1, is_locked=1, src_blocks_dst=0 (cpu_bracket is blocked) | |
| ``` | |
| This ensures every node pair communicates during GNN layers while still encoding the directionality of the prerequisite relationship. | |
| **Robot (agent node)** has NO physical constraints. All edges involving the robot (`robot ↔ any_product`) have features `[0, 0, 0]` and carry context only. | |
| **Node ordering:** Node indices in `edge_index` match the order of `components` in `side_graph.json`. When the robot is added (with `load_pyg_frame_with_robot`), it is appended at index `N_products` (the last position). | |
| ## Data File Schemas | |
| ### `side_graph.json` | |
| ```json | |
| { | |
| "view": "side", | |
| "episode_id": "episode_00", | |
| "goal_component": "connector_1", | |
| "components": [ | |
| {"id": "cpu_fan", "type": "cpu_fan", "color": "#FF6B6B"}, | |
| {"id": "ram_1", "type": "ram", "color": "#FFEAA7"} | |
| ], | |
| "edges": [ | |
| {"src": "cpu_fan", "dst": "cpu_bracket", "directed": true}, | |
| {"src": "ram_clip_1", "dst": "ram_1", "directed": true} | |
| ], | |
| "frame_states": { | |
| "0": { | |
| "constraints": {"cpu_fan->cpu_bracket": true}, | |
| "visibility": {"cpu_bracket": false, "cpu": false, "robot": true} | |
| }, | |
| "152": { | |
| "constraints": {"cpu_fan->cpu_bracket": false}, | |
| "visibility": {"cpu_fan": false, "cpu_bracket": true, "cpu": true} | |
| } | |
| }, | |
| "node_positions": {"cpu_fan": [120, 80]}, | |
| "embedding_dim": 256, | |
| "feature_extractor": "sam2.1_hiera_base_plus", | |
| "type_vocab": ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram", "connector", "graphic_card", "motherboard", "robot"] | |
| } | |
| ``` | |
| **Robot is NOT in components.** Robot is stored in `side_robot/`. | |
| ### `side_depth_info/frame_XXXXXX.npz` | |
| **Always contains all 7 keys per component in `graph.components`.** Flat keys prefixed by component_id. | |
| | Key | Shape | Dtype | Description | | |
| |-----|-------|-------|-------------| | |
| | `{cid}_point_cloud` | (N, 3) | float32 | 3D points in camera frame (meters). Empty (0, 3) if no valid depth. | | |
| | `{cid}_pixel_coords` | (N, 2) | int32 | (u, v) pixel coords of valid points | | |
| | `{cid}_raw_depths_mm` | (N,) | uint16 | Raw depth values in mm, filtered to [50, 2000] | | |
| | `{cid}_centroid` | (3,) | float32 | Mean of point_cloud; [0,0,0] if no valid depth | | |
| | `{cid}_bbox_2d` | (4,) | int32 | [x1, y1, x2, y2] from mask | | |
| | `{cid}_area` | (1,) | int32 | Mask pixel count | | |
| | `{cid}_depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 | | |
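Because component ids themselves contain underscores (`ram_clip_1`, `cpu_fan`), splitting a flat key on `_` is ambiguous. One way to regroup the bundle per component is to match against the seven known suffixes from the table above. This is a sketch with a hypothetical helper name (`group_depth_bundle`) and made-up array contents.

```python
import numpy as np

# The seven per-component key suffixes listed in the table above.
DEPTH_SUFFIXES = (
    "point_cloud", "pixel_coords", "raw_depths_mm",
    "centroid", "bbox_2d", "area", "depth_valid",
)


def group_depth_bundle(flat: dict) -> dict:
    """Turn {'<cid>_<suffix>': array} into {cid: {suffix: array}}."""
    grouped = {}
    for key, value in flat.items():
        for suffix in DEPTH_SUFFIXES:
            if key.endswith("_" + suffix):
                cid = key[: -(len(suffix) + 1)]  # strip '_<suffix>'
                grouped.setdefault(cid, {})[suffix] = value
                break
    return grouped


# Synthetic stand-in for a dict loaded from a side_depth_info .npz file.
flat = {
    "ram_clip_1_point_cloud": np.zeros((2, 3), dtype=np.float32),
    "ram_clip_1_centroid": np.zeros(3, dtype=np.float32),
    "ram_clip_1_depth_valid": np.array([1], dtype=np.uint8),
    "cpu_fan_area": np.array([420], dtype=np.int32),
}
grouped = group_depth_bundle(flat)
print(sorted(grouped))                # ['cpu_fan', 'ram_clip_1']
print(sorted(grouped["ram_clip_1"]))  # ['centroid', 'depth_valid', 'point_cloud']
```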
| ### `side_robot/frame_XXXXXX.npz` | |
| **Always written per labeled frame** (with `visible=[0]` if robot not in this frame). | |
| | Key | Shape | Dtype | Description | | |
| |-----|-------|-------|-------------| | |
| | `visible` | (1,) | uint8 | 1 if robot labeled, 0 otherwise | | |
| | `mask` | (H, W) | uint8 | Binary mask | | |
| | `embedding` | (256,) | float32 | SAM2 256D feature | | |
| | `point_cloud` | (N, 3) | float32 | 3D points (meters) | | |
| | `pixel_coords` | (N, 2) | int32 | (u, v) pixel coords | | |
| | `raw_depths_mm` | (N,) | uint16 | Raw depths in mm | | |
| | `centroid` | (3,) | float32 | Mean of point_cloud | | |
| | `bbox_2d` | (4,) | int32 | From mask | | |
| | `area` | (1,) | int32 | Pixel count | | |
| | `depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 | | |
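The depth-related fields in this bundle and in the per-product bundles (`point_cloud`, `pixel_coords`, `raw_depths_mm`, `centroid`, `depth_valid`) can be related to a mask and a depth image by a standard pinhole back-projection. The sketch below is an illustration under stated assumptions, not the code used to build the dataset: the pinhole model itself, the function name `backproject_mask`, and the intrinsics `fx, fy, cx, cy` are all assumptions (the card does not list camera intrinsics). Only the [50, 2000] mm filter, the meter units, and the zero centroid for empty clouds come from the tables.

```python
import numpy as np


def backproject_mask(depth_mm, mask, fx, fy, cx, cy, min_mm=50, max_mm=2000):
    """Back-project masked depth pixels (mm) into camera-frame points (meters)."""
    valid = (mask > 0) & (depth_mm >= min_mm) & (depth_mm <= max_mm)
    v, u = np.nonzero(valid)              # v = row index, u = column index
    raw = depth_mm[v, u]
    z = raw.astype(np.float32) / 1000.0   # mm -> m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1).astype(np.float32)  # (N, 3), N may be 0
    has_points = len(points) > 0
    centroid = points.mean(axis=0) if has_points else np.zeros(3, dtype=np.float32)
    return {
        "point_cloud": points,
        "pixel_coords": np.stack([u, v], axis=1).astype(np.int32),
        "raw_depths_mm": raw.astype(np.uint16),
        "centroid": centroid.astype(np.float32),
        "depth_valid": np.array([1 if has_points else 0], dtype=np.uint8),
    }


# Tiny synthetic example: 4x4 depth image at 1 m with one invalid (0 mm) pixel.
depth = np.full((4, 4), 1000, dtype=np.uint16)
depth[1, 1] = 0
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1

# Hypothetical intrinsics, chosen only to keep the numbers readable.
bundle = backproject_mask(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
empty = backproject_mask(depth, np.zeros_like(mask), fx=2.0, fy=2.0, cx=1.5, cy=1.5)

print(bundle["point_cloud"].shape, bundle["depth_valid"])  # (3, 3) [1]
print(empty["point_cloud"].shape, empty["depth_valid"])    # (0, 3) [0]
```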
| ### `metadata.json` | |
| ```json | |
| { | |
| "episode_id": "episode_00", | |
| "goal_component": "connector_1", | |
| "num_frames": 604, | |
| "labeled_frame_count": 246, | |
| "annotation_complete": false, | |
| "component_counts": { | |
| "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1, | |
| "ram": 2, "ram_clip": 4, "connector": 4, | |
| "graphic_card": 1, "motherboard": 1 | |
| }, | |
| "format_version": "3.0", | |
| "sam2_model": "sam2.1_hiera_b+", | |
| "embedding_dim": 256, | |
| "fps": 30, | |
| "cameras": ["side"], | |
| "robot": "UR5e", | |
| "gripper": "Robotiq 2F-85" | |
| } | |
| ``` | |
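`component_counts` is enough to predict the size of the per-frame graph before loading any annotation. A small sketch using the counts from the example above (`expected_graph_size` is a hypothetical helper name):

```python
import json

# Subset of the metadata.json example shown above.
metadata = json.loads("""
{
  "num_frames": 604,
  "component_counts": {
    "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1,
    "ram": 2, "ram_clip": 4, "connector": 4,
    "graphic_card": 1, "motherboard": 1
  }
}
""")


def expected_graph_size(component_counts: dict, with_robot: bool) -> tuple:
    """Return (num_nodes, num_directed_edges) for the fully connected graph."""
    n = sum(component_counts.values()) + (1 if with_robot else 0)
    return n, n * (n - 1)


print(expected_graph_size(metadata["component_counts"], with_robot=False))  # (15, 210)
print(expected_graph_size(metadata["component_counts"], with_robot=True))   # (16, 240)
```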
| ## Test Data Available | |
| One episode is fully labeled and validated; you can use it to test the loader: | |
| **Labeled episode:** `session_0408_162129/episode_00` | |
| | Stat | Value | | |
| |------|-------| | |
| | Total frames in episode | 604 | | |
| | Labeled frames | **346** (range 0 to 351, 6 gaps) | | |
| | Product components | 15 (cpu_fan, cpu_bracket, cpu, graphic_card, motherboard, connector_1..4, ram_1..2, ram_clip_1..4) | | |
| | Physical constraints (edges) | 14 | | |
| | Robot visibility | Visible in 216 / 346 frames | | |
| | Goal component | `connector_1` | | |
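The labeled-frame figures are consistent if "6 gaps" is read as six unlabeled indices inside the range: 0 to 351 spans 352 indices, and 352 - 6 = 346. That reading is an assumption. The sketch below shows how the missing indices could be listed from any sorted frame list; `missing_frames` is a hypothetical helper and the demo list is made up.

```python
def missing_frames(labeled: list) -> list:
    """Frame indices between the first and last labeled frame that have no label."""
    if not labeled:
        return []
    present = set(labeled)
    return [f for f in range(min(labeled), max(labeled) + 1) if f not in present]


demo = [0, 1, 2, 4, 5, 8]    # made-up labeled frame indices
print(missing_frames(demo))  # [3, 6, 7]

# Arithmetic behind the table row, under the reading described above.
span = 351 - 0 + 1
print(span, span - 346)      # 352 6
```

With the real episode, passing the output of `list_labeled_frames(episode)` would list the actual unlabeled indices.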
| ### Download and Test (3 steps) | |
| **Step 1: Download just one episode (lightweight)** | |
| ```bash | |
| pip install huggingface_hub | |
| ``` | |
| ```python | |
| from huggingface_hub import snapshot_download | |
| local_dir = snapshot_download( | |
| repo_id="ChangChrisLiu/GNN_Disassembly_WorldModel", | |
| repo_type="dataset", | |
| allow_patterns=[ | |
| "session_0408_162129/episode_00/metadata.json", | |
| "session_0408_162129/episode_00/robot_states.npy", | |
| "session_0408_162129/episode_00/robot_actions.npy", | |
| "session_0408_162129/episode_00/side/rgb/frame_000042.png", | |
| "session_0408_162129/episode_00/side/depth/frame_000042.npy", | |
| "session_0408_162129/episode_00/annotations/*", | |
| ], | |
| ) | |
| print("Downloaded to:", local_dir) | |
| ``` | |
| **Step 2: Save the loader code** (copy the self-contained `gnn_disassembly_loader.py` block below into a file) | |
| **Step 3: Run this test script.** It loads frame 42, prints the full graph anatomy, and verifies everything: | |
| ```python | |
| from collections import Counter | |
| from pathlib import Path | |
| from gnn_disassembly_loader import ( | |
|     load_pyg_frame_products_only, | |
|     load_pyg_frame_with_robot, | |
|     list_labeled_frames, | |
|     load_frame_data, | |
| ) | |
| # After snapshot_download above: | |
| episode = Path(local_dir) / "session_0408_162129" / "episode_00" | |
| # 1. Enumerate labeled frames | |
| frames = list_labeled_frames(episode) | |
| assert len(frames) == 346, f"Expected 346 labeled frames, got {len(frames)}" | |
| print(f"✓ Labeled frames: {len(frames)} (range {frames[0]}..{frames[-1]})") | |
| # 2. Load frame 42, products only | |
| data1 = load_pyg_frame_products_only(episode, frame_idx=42) | |
| assert data1.num_nodes == 15, f"Expected 15 products, got {data1.num_nodes}" | |
| assert data1.edge_index.shape[1] == 15 * 14 # fully connected | |
| assert data1.edge_attr.shape == (210, 3) # 3D edge features | |
| print(f"✓ Products-only: {data1}") | |
| # 3. Load frame 42, with robot agent | |
| data2 = load_pyg_frame_with_robot(episode, frame_idx=42) | |
| assert data2.num_nodes == 16, f"Expected 15 products + 1 robot = 16, got {data2.num_nodes}" | |
| assert data2.edge_index.shape[1] == 16 * 15 | |
| assert hasattr(data2, "robot_point_cloud") | |
| print(f"✓ With robot: {data2}") | |
| print(f" Robot point cloud: {tuple(data2.robot_point_cloud.shape)}") | |
| print(f" Robot mask: {tuple(data2.robot_mask.shape)}") | |
| # 4. Verify robot edges are all [0, 0, 0] | |
| robot_idx = data2.num_nodes - 1 | |
| robot_edges = (data2.edge_index[0] == robot_idx) | (data2.edge_index[1] == robot_idx) | |
| assert (data2.edge_attr[robot_edges] == 0).all() | |
| print(f"✓ Robot edges: {robot_edges.sum().item()}, all [0,0,0]") | |
| # 5. Verify edge feature semantics | |
| has_c = (data1.edge_attr[:, 0] == 1).sum().item() | |
| locked = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 1] == 1)).sum().item() | |
| src_blocks = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 2] == 1)).sum().item() | |
| assert has_c == 28 # 14 constraints × 2 directions | |
| assert locked == 28 # all locked at frame 42 | |
| assert src_blocks == 14 # half the constraint edges have src as blocker | |
| print(f"✓ Edge features: {has_c} constraint edges, {locked} locked, {src_blocks} forward-direction") | |
| # 6. Verify fully-connected + symmetric structure | |
| pairs = Counter() | |
| for i in range(data1.edge_index.shape[1]): | |
|     src = data1.edge_index[0, i].item() | |
|     dst = data1.edge_index[1, i].item() | |
|     pairs[frozenset([src, dst])] += 1 | |
| # Every unordered pair should appear exactly twice: (i, j) AND (j, i) | |
| assert all(count == 2 for count in pairs.values()) | |
| print("✓ Structurally symmetric: every pair has both directions") | |
| # 7. Raw data access | |
| fd = load_frame_data(episode, frame_idx=42) | |
| print(f"✓ Raw data: {len(fd.masks)} product masks, robot {'visible' if fd.robot is not None else 'hidden'}") | |
| print("\nAll tests passed! The dataset is ready for training.") | |
| ``` | |
| Expected output: | |
| ``` | |
| ✓ Labeled frames: 346 (range 0..351) | |
| ✓ Products-only: Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15) | |
| ✓ With robot: Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16, robot_point_cloud=[5729, 3], robot_pixel_coords=[5729, 2], robot_mask=[720, 1280]) | |
| Robot point cloud: (5729, 3) | |
| Robot mask: (720, 1280) | |
| ✓ Robot edges: 30, all [0,0,0] | |
| ✓ Edge features: 28 constraint edges, 28 locked, 14 forward-direction | |
| ✓ Structurally symmetric: every pair has both directions | |
| ✓ Raw data: 13 product masks, robot visible | |
| All tests passed! The dataset is ready for training. | |
| ``` | |
| ## Graph Structure: What You Get Per Frame | |
| Every labeled frame is converted to **one PyTorch Geometric `Data` object**. Here's exactly what it contains: | |
| ### Node Features (269D per node) | |
| ``` | |
| x[i] = 269D feature vector for node i | |
| [0 : 256]    SAM2 embedding (256D) | |
|              Masked average pool over SAM2 encoder's vision_features. | |
|              Captures visual appearance of the component. | |
| [256 : 259]  3D position (3D) | |
|              Centroid in camera frame, meters. Mean of the valid | |
|              depth-backprojected points within the mask. | |
|              Zero vector if no valid depth (check depth_valid flag). | |
| [259 : 268]  Type one-hot (9D) | |
|              Index order: cpu_fan, cpu_bracket, cpu, ram_clip, ram, | |
|              connector, graphic_card, motherboard, robot. | |
|              Multiple instances (e.g. ram_1, ram_2) share the same | |
|              one-hot and are distinguished by their SAM2 embedding + 3D pos. | |
| [268]        Visibility (1D) | |
|              Binary flag: 1 if visible this frame, 0 if hidden. | |
|              Delta-encoded through frame_states in side_graph.json. | |
| ``` | |
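The layout above translates directly into array slices. The following sketch splits a feature matrix back into its four parts; it uses NumPy on synthetic data, and `split_node_features` is a hypothetical helper name. A tensor from the loader could presumably be passed as `data.x.numpy()`.

```python
import numpy as np

# Type vocabulary in the index order given above.
TYPE_VOCAB = ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram",
              "connector", "graphic_card", "motherboard", "robot"]


def split_node_features(x: np.ndarray) -> dict:
    """Split an (N, 269) node feature matrix into its named parts."""
    assert x.shape[1] == 269, "expected 256 + 3 + 9 + 1 = 269 features per node"
    type_one_hot = x[:, 259:268]
    return {
        "embedding": x[:, 0:256],      # SAM2 embedding
        "position": x[:, 256:259],     # centroid in camera frame (m)
        "type_name": [TYPE_VOCAB[i] for i in type_one_hot.argmax(axis=1)],
        "visible": x[:, 268] > 0.5,    # visibility flag
    }


# Two synthetic nodes: a visible 'ram' and a hidden 'robot'.
x = np.zeros((2, 269), dtype=np.float32)
x[0, 259 + TYPE_VOCAB.index("ram")] = 1.0
x[0, 268] = 1.0
x[1, 259 + TYPE_VOCAB.index("robot")] = 1.0

parts = split_node_features(x)
print(parts["type_name"])         # ['ram', 'robot']
print(parts["visible"].tolist())  # [True, False]
```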
| ### Graph Topology: Fully Connected, Structurally Symmetric | |
| For N nodes, the PyG graph has: | |
| - `edge_index` shape: **(2, N × (N − 1))** | |
| - Every ordered pair `(i, j)` with `i ≠ j` has an edge | |
| - Both `(i, j)` AND `(j, i)` exist, so the graph is **not structurally directed** | |
| - Self-loops are excluded | |
| **Why fully connected?** Sparse constraint graphs (just physical prerequisites) would prevent distant nodes from exchanging information through GNN message passing. Making it fully connected ensures every node pair communicates in one layer. | |
| ### Edge Features (3D per edge) | |
| | `has_constraint` | `is_locked` | `src_blocks_dst` | Meaning | | |
| |---|---|---|---| | |
| | 0 | 0 | 0 | No physical constraint (message passing only) | | |
| | 1 | 1 | 1 | Physical constraint LOCKED, src is blocker | | |
| | 1 | 1 | 0 | Physical constraint LOCKED, src is blocked (reverse direction) | | |
| | 1 | 0 | 1 | Physical constraint RELEASED (unlocked), src is the blocker | | |
| | 1 | 0 | 0 | Physical constraint RELEASED, src is blocked | | |
| **Direction is a feature, not structure.** | |
| - `has_constraint` and `is_locked` describe the PAIR β they're the same for both `(i,j)` and `(j,i)`. | |
| - `src_blocks_dst` is asymmetric: it flips depending on which direction the edge goes. | |
| **Example:** `cpu_fan` blocks `cpu_bracket` (fan covers bracket). At frame 0 (locked): | |
| ``` | |
| edge (cpu_fan, cpu_bracket) → [1, 1, 1] cpu_fan is the blocker | |
| edge (cpu_bracket, cpu_fan) → [1, 1, 0] cpu_bracket is the blocked | |
| ``` | |
| At frame 152 after the user removes the fan (unlocked): | |
| ``` | |
| edge (cpu_fan, cpu_bracket) → [1, 0, 1] | |
| edge (cpu_bracket, cpu_fan) → [1, 0, 0] | |
| ``` | |
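Since direction lives in the third feature, the sparse directed constraint list can be recovered from the dense graph by keeping only edges with `has_constraint = 1` and `src_blocks_dst = 1`. A minimal NumPy sketch on a three-node toy graph follows; the helper name `directed_constraints` and the toy constraint set are illustrative, and loader tensors could presumably be converted with `.numpy()` first.

```python
from itertools import permutations

import numpy as np


def directed_constraints(edge_index: np.ndarray, edge_attr: np.ndarray) -> list:
    """Return (blocker, blocked, is_locked) triples from dense 3D edge features."""
    forward = (edge_attr[:, 0] == 1) & (edge_attr[:, 2] == 1)
    return [
        (int(s), int(d), bool(locked))
        for s, d, locked in zip(edge_index[0, forward],
                                edge_index[1, forward],
                                edge_attr[forward, 1])
    ]


# Toy graph: 0 = cpu_fan, 1 = cpu_bracket, 2 = cpu.
pairs = list(permutations(range(3), 2))
edge_index = np.array(pairs).T          # shape (2, 6)
# blocker -> blocked, mapped to is_locked (fan already removed, bracket still on).
constraints = {(0, 1): 0.0, (1, 2): 1.0}

rows = []
for s, d in pairs:
    if (s, d) in constraints:           # edge runs blocker -> blocked
        rows.append([1.0, constraints[(s, d)], 1.0])
    elif (d, s) in constraints:         # reverse view of the same pair
        rows.append([1.0, constraints[(d, s)], 0.0])
    else:                               # message passing only
        rows.append([0.0, 0.0, 0.0])
edge_attr = np.array(rows, dtype=np.float32)

print(directed_constraints(edge_index, edge_attr))
# [(0, 1, False), (1, 2, True)]
```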
| ### Robot Agent Node (Optional) | |
| When loaded with `load_pyg_frame_with_robot()`, the robot is appended as the **last node** (index `N_products`). All edges involving the robot have features `[0, 0, 0]`: the robot has no physical constraints and acts only as a context-providing agent node. | |
| The raw robot data (point cloud, pixel coords, full mask) is attached as extra tensors on the `Data` object for optional PointNet-style encoding. | |
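Point-cloud encoders in the PointNet family generally expect a fixed number of points per sample, while the robot cloud size varies per frame. One common preprocessing step, shown here as a sketch rather than as part of this dataset's pipeline, is to sample a fixed number of points, center them, and scale to the unit sphere. The function name, the 1024-point budget, and the random stand-in cloud are assumptions.

```python
import numpy as np


def prepare_point_cloud(points: np.ndarray, num_points: int = 1024, seed: int = 0) -> np.ndarray:
    """Sample to a fixed size, center at the mean, and scale to unit radius."""
    if len(points) == 0:
        # No valid depth for this frame: return an all-zero placeholder cloud.
        return np.zeros((num_points, 3), dtype=np.float32)
    rng = np.random.default_rng(seed)
    # Sample with replacement only when the cloud is smaller than the budget.
    idx = rng.choice(len(points), size=num_points, replace=len(points) < num_points)
    sampled = points[idx].astype(np.float32)
    sampled -= sampled.mean(axis=0)
    scale = np.linalg.norm(sampled, axis=1).max()
    return sampled / scale if scale > 0 else sampled


# Random stand-in for data.robot_point_cloud.numpy(), with the size from the example output.
cloud = np.random.default_rng(1).normal(size=(5729, 3)).astype(np.float32)
prepared = prepare_point_cloud(cloud)
print(prepared.shape, prepared.dtype)  # (1024, 3) float32
```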
| ### Matching a Frame to Its RGB Image | |
| Frame indices in the loader directly map to image files: | |
| ```python | |
| frame_idx = 42 | |
| rgb_path = episode / "side" / "rgb" / f"frame_{frame_idx:06d}.png" | |
| depth_path = episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy" | |
| ``` | |
| Example that loads a PyG frame together with its matching image and depth: | |
| ```python | |
| from pathlib import Path | |
| import numpy as np | |
| from PIL import Image | |
| from gnn_disassembly_loader import load_pyg_frame_with_robot | |
| episode = Path("episode_00") | |
| frame_idx = 42 | |
| # PyG graph for this frame | |
| data = load_pyg_frame_with_robot(episode, frame_idx) | |
| # Matching RGB image (1280x720 PNG) | |
| rgb = np.array(Image.open(episode / "side" / "rgb" / f"frame_{frame_idx:06d}.png")) | |
| print("RGB shape:", rgb.shape) # (720, 1280, 3) | |
| # Matching depth (1280x720 uint16 mm) | |
| depth = np.load(episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy") | |
| print("Depth shape:", depth.shape, depth.dtype) # (720, 1280) uint16 | |
| # Robot mask is in the PyG data if robot is visible | |
| if hasattr(data, "robot_mask"): | |
|     robot_mask = data.robot_mask.numpy() # (720, 1280) uint8 | |
|     print("Robot mask area:", robot_mask.sum(), "pixels") | |
| ``` | |
| ## Loading the Data: PyTorch Geometric | |
| This section contains **self-contained** code you can copy-paste directly. No need to clone any repo. | |
| ### Prerequisites | |
| ```bash | |
| pip install torch numpy torch_geometric pillow | |
| ``` | |
| ### Self-contained PyG loader | |
| Copy this into a file called `gnn_disassembly_loader.py`: | |
| ```python | |
| """Self-contained PyG loader for the GNN Disassembly dataset. | |
| Two loader variants: | |
|   - load_pyg_frame_products_only(ep, frame): constraint graph only, no robot | |
|   - load_pyg_frame_with_robot(ep, frame): constraint graph + robot agent node | |
| Both return torch_geometric.data.Data with: | |
|   x          (N, 269)      node features (256 + 3 + 9 + 1) | |
|   edge_index (2, N*(N-1))  fully connected directed message-passing edges | |
|   edge_attr  (N*(N-1), 3)  [has_constraint, is_locked, src_blocks_dst] | |
|   y          (1,)          frame index | |
|   num_nodes  N | |
| Notes on the edge feature design: | |
|   - The graph is FULLY CONNECTED and structurally symmetric. | |
|     Both (i, j) and (j, i) exist in edge_index for every node pair i != j. | |
|   - Direction is NOT encoded in the graph structure. It is encoded as | |
|     a feature: `src_blocks_dst`. | |
|   - `has_constraint` and `is_locked` are symmetric per pair (same value | |
|     for both (i, j) and (j, i)). | |
|   - `src_blocks_dst` is asymmetric: it is 1 if the edge's src node | |
|     physically blocks its dst node, 0 otherwise. | |
| """ | |
| import json | |
| from dataclasses import dataclass | |
| from pathlib import Path | |
| from typing import Dict, List, Optional, Tuple | |
| import numpy as np | |
| import torch | |
| from torch_geometric.data import Data | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| # Helpers | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| def list_labeled_frames(episode_dir: Path) -> List[int]: | |
|     """Return sorted list of frame indices that have saved annotations.""" | |
|     mask_dir = episode_dir / "annotations" / "side_masks" | |
|     if not mask_dir.exists(): | |
|         return [] | |
|     frames = [] | |
|     for p in mask_dir.glob("frame_*.npz"): | |
|         try: | |
|             frames.append(int(p.stem.split("_")[1])) | |
|         except (ValueError, IndexError): | |
|             continue | |
|     return sorted(frames) | |
| def resolve_frame_state(graph_json: dict, frame_idx: int) -> Tuple[Dict[str, bool], Dict[str, bool]]: | |
|     """Resolve delta-encoded constraints + visibility at a frame. | |
|     Walks frame_states from frame 0 to frame_idx, accumulating deltas. | |
|     Returns (constraints_dict, visibility_dict). | |
|     """ | |
|     constraints: Dict[str, bool] = {} | |
|     visibility: Dict[str, bool] = {} | |
|     # Defaults: every component visible, every edge locked | |
|     for c in graph_json["components"]: | |
|         visibility[c["id"]] = True | |
|     for e in graph_json["edges"]: | |
|         constraints[f"{e['src']}->{e['dst']}"] = True | |
|     # Walk deltas up to frame_idx | |
|     fs_dict = graph_json.get("frame_states", {}) | |
|     for f in sorted(int(k) for k in fs_dict): | |
|         if f > frame_idx: | |
|             break | |
|         fs = fs_dict[str(f)] | |
|         for k, v in fs.get("constraints", {}).items(): | |
|             constraints[k] = v | |
|         for k, v in fs.get("visibility", {}).items(): | |
|             visibility[k] = v | |
|     return constraints, visibility | |
| def type_one_hot(comp_type: str, type_vocab: List[str]) -> List[float]: | |
|     """9-dim one-hot encoding of component type based on type_vocab.""" | |
|     return [1.0 if t == comp_type else 0.0 for t in type_vocab] | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| # Raw data loader (NumPy only, no torch) | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| @dataclass | |
| class FrameData: | |
|     graph: dict | |
|     masks: Dict[str, np.ndarray] | |
|     embeddings: Dict[str, np.ndarray] | |
|     depth_info: dict | |
|     robot: Optional[dict] | |
|     constraints: Dict[str, bool] | |
|     visibility: Dict[str, bool] | |
| def load_frame_data(episode_dir: Path, frame_idx: int) -> FrameData: | |
|     """Load all v3 annotation files for one frame.""" | |
|     anno = episode_dir / "annotations" | |
|     with open(anno / "side_graph.json") as f: | |
|         graph = json.load(f) | |
|     def _load_npz_dict(path: Path) -> Dict[str, np.ndarray]: | |
|         # Missing files yield an empty dict instead of raising | |
|         if not path.exists(): | |
|             return {} | |
|         d = np.load(path) | |
|         return {k: d[k] for k in d.files} | |
|     masks = _load_npz_dict(anno / "side_masks" / f"frame_{frame_idx:06d}.npz") | |
|     embeddings = _load_npz_dict(anno / "side_embeddings" / f"frame_{frame_idx:06d}.npz") | |
|     depth_info = _load_npz_dict(anno / "side_depth_info" / f"frame_{frame_idx:06d}.npz") | |
|     # Robot bundle is kept only when the robot is labeled in this frame | |
|     robot: Optional[dict] = None | |
|     robot_path = anno / "side_robot" / f"frame_{frame_idx:06d}.npz" | |
|     if robot_path.exists(): | |
|         r = np.load(robot_path) | |
|         if r["visible"][0] == 1: | |
|             robot = {k: r[k] for k in r.files} | |
|     constraints, visibility = resolve_frame_state(graph, frame_idx) | |
|     return FrameData(graph, masks, embeddings, depth_info, robot, constraints, visibility) | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| # PyG loader: products only | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| def load_pyg_frame_products_only(episode_dir: Path, frame_idx: int) -> Data: | |
|     """Fully connected PyG graph WITHOUT robot. | |
|     Returns Data( | |
|         x=[N, 269], | |
|         edge_index=[2, N*(N-1)], | |
|         edge_attr=[N*(N-1), 3],  # [has_constraint, is_locked, src_blocks_dst] | |
|         y=[1],                   # frame index | |
|         num_nodes=N, | |
|     ) | |
|     where N = number of product components (robot excluded). | |
|     """ | |
|     fd = load_frame_data(episode_dir, frame_idx) | |
|     graph = fd.graph | |
|     type_vocab = graph["type_vocab"]  # 9 entries incl. robot | |
|     nodes = graph["components"]  # robot already excluded per spec | |
|     N = len(nodes) | |
|     # ── Node features ── | |
|     # [256D SAM2 embedding, 3D position, 9D type one-hot, 1D visibility] | |
|     # 256 + 3 + 9 + 1 = 269. Adjust if you need a different layout. | |
|     x_list = [] | |
|     for node in nodes: | |
|         cid = node["id"] | |
|         emb = fd.embeddings.get(cid, np.zeros(256, dtype=np.float32)) | |
|         depth_valid_key = f"{cid}_depth_valid" | |
|         centroid_key = f"{cid}_centroid" | |
|         if (depth_valid_key in fd.depth_info | |
|                 and int(fd.depth_info[depth_valid_key][0]) == 1): | |
|             pos = fd.depth_info[centroid_key].astype(np.float32) | |
|         else: | |
|             pos = np.zeros(3, dtype=np.float32) | |
|         type_oh = type_one_hot(node["type"], type_vocab)  # 9D | |
|         vis = 1.0 if fd.visibility.get(cid, True) else 0.0 | |
|         feat = np.concatenate([ | |
|             emb.astype(np.float32), | |
|             pos, | |
|             np.array(type_oh, dtype=np.float32), | |
|             np.array([vis], dtype=np.float32), | |
|         ]) | |
|         x_list.append(feat) | |
|     x = torch.tensor(np.stack(x_list), dtype=torch.float32) if x_list else torch.empty((0, 269)) | |
|     # ── Fully connected edges with 3D features ── | |
|     # Edge feature: [has_constraint, is_locked, src_blocks_dst] | |
|     # - has_constraint & is_locked are SYMMETRIC for the pair (A, B) | |
|     # - src_blocks_dst is ASYMMETRIC: 1 if edge's src physically blocks dst | |
|     constraint_set = {(e["src"], e["dst"]) for e in graph["edges"]} | |
|     pair_forward = {}  # frozenset({a, b}) -> (blocker, blocked) | |
|     for (s, d) in constraint_set: | |
|         pair_forward[frozenset([s, d])] = (s, d) | |
|     src_idx, dst_idx, edge_attr = [], [], [] | |
|     for i in range(N): | |
|         for j in range(N): | |
|             if i == j: | |
|                 continue | |
|             src_id = nodes[i]["id"] | |
|             dst_id = nodes[j]["id"] | |
|             src_idx.append(i) | |
|             dst_idx.append(j) | |
|             pair_key = frozenset([src_id, dst_id]) | |
|             if pair_key in pair_forward: | |
|                 forward = pair_forward[pair_key] | |
|                 constraint_key = f"{forward[0]}->{forward[1]}" | |
|                 is_locked = fd.constraints.get(constraint_key, True) | |
|                 src_blocks_dst = 1.0 if src_id == forward[0] else 0.0 | |
|                 edge_attr.append([ | |
|                     1.0, | |
|                     1.0 if is_locked else 0.0, | |
|                     src_blocks_dst, | |
|                 ]) | |
|             else: | |
|                 edge_attr.append([0.0, 0.0, 0.0])  # message passing only | |
|     return Data( | |
|         x=x, | |
|         edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long), | |
|         edge_attr=torch.tensor(edge_attr, dtype=torch.float32), | |
|         y=torch.tensor([frame_idx], dtype=torch.long), | |
|         num_nodes=N, | |
|     ) | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| # PyG loader: with robot agent node | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| def load_pyg_frame_with_robot(episode_dir: Path, frame_idx: int) -> Data: | |
|     """Fully connected PyG graph WITH robot appended as agent node. | |
|     Robot is node N (the last node). All edges involving the robot have | |
|     features [0, 0, 0] because the robot has no physical constraints. | |
|     If the robot is not visible at this frame, returns the products-only graph. | |
|     Additional attached tensors when robot is visible: | |
|         data.robot_point_cloud  (M, 3) float32 | |
|         data.robot_pixel_coords (M, 2) int32 | |
|         data.robot_mask         (H, W) uint8 | |
|     """ | |
|     data = load_pyg_frame_products_only(episode_dir, frame_idx) | |
|     fd = load_frame_data(episode_dir, frame_idx) | |
|     if fd.robot is None: | |
|         return data | |
|     graph = fd.graph | |
|     type_vocab = graph["type_vocab"] | |
|     products = graph["components"] | |
|     N_prod = len(products) | |
|     N = N_prod + 1 | |
|     # ── Build robot node features ── | |
|     robot_emb = fd.robot["embedding"].astype(np.float32) | |
|     robot_pos = (fd.robot["centroid"].astype(np.float32) | |
|                  if int(fd.robot["depth_valid"][0]) == 1 | |
|                  else np.zeros(3, dtype=np.float32)) | |
|     robot_type_oh = type_one_hot("robot", type_vocab) | |
|     robot_feat = np.concatenate([ | |
|         robot_emb, robot_pos, | |
|         np.array(robot_type_oh, dtype=np.float32), | |
|         np.array([1.0], dtype=np.float32),  # robot is visible in this branch | |
|     ]) | |
|     x = torch.cat([data.x, torch.tensor(robot_feat, dtype=torch.float32).unsqueeze(0)], dim=0) | |
|     # ── Rebuild edges with 3D features ── | |
|     constraint_set = {(e["src"], e["dst"]) for e in graph["edges"]} | |
|     pair_forward = {} | |
|     for (s, d) in constraint_set: | |
|         pair_forward[frozenset([s, d])] = (s, d) | |
|     src_idx, dst_idx, edge_attr = [], [], [] | |
|     # Products × Products | |
|     for i in range(N_prod): | |
|         for j in range(N_prod): | |
|             if i == j: | |
|                 continue | |
|             src_id = products[i]["id"] | |
|             dst_id = products[j]["id"] | |
|             src_idx.append(i) | |
|             dst_idx.append(j) | |
|             pair_key = frozenset([src_id, dst_id]) | |
|             if pair_key in pair_forward: | |
|                 forward = pair_forward[pair_key] | |
|                 is_locked = fd.constraints.get(f"{forward[0]}->{forward[1]}", True) | |
|                 src_blocks_dst = 1.0 if src_id == forward[0] else 0.0 | |
|                 edge_attr.append([1.0, 1.0 if is_locked else 0.0, src_blocks_dst]) | |
|             else: | |
|                 edge_attr.append([0.0, 0.0, 0.0]) | |
|     # Robot ↔ Products (both directions, message-passing only) | |
|     robot_idx = N_prod | |
|     for i in range(N_prod): | |
|         src_idx.append(robot_idx) | |
|         dst_idx.append(i) | |
|         edge_attr.append([0.0, 0.0, 0.0]) | |
|         src_idx.append(i) | |
|         dst_idx.append(robot_idx) | |
|         edge_attr.append([0.0, 0.0, 0.0]) | |
|     data = Data( | |
|         x=x, | |
|         edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long), | |
|         edge_attr=torch.tensor(edge_attr, dtype=torch.float32), | |
|         y=torch.tensor([frame_idx], dtype=torch.long), | |
|         num_nodes=N, | |
|     ) | |
|     data.robot_point_cloud = torch.tensor(fd.robot["point_cloud"], dtype=torch.float32) | |
|     data.robot_pixel_coords = torch.tensor(fd.robot["pixel_coords"], dtype=torch.int32) | |
|     data.robot_mask = torch.tensor(fd.robot["mask"], dtype=torch.uint8) | |
|     return data | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| # Episode iterator | |
| # ───────────────────────────────────────────────────────────────────────────── | |
| def iterate_episode(episode_dir: Path, with_robot: bool = True): | |
|     """Yield (frame_idx, Data) pairs for all labeled frames in an episode.""" | |
|     loader = load_pyg_frame_with_robot if with_robot else load_pyg_frame_products_only | |
|     for frame_idx in list_labeled_frames(episode_dir): | |
|         yield frame_idx, loader(episode_dir, frame_idx) | |
| ``` | |
| ### Usage Examples | |
| #### Variant 1: Constraint Graph Only (No Robot) | |
| ```python | |
| from pathlib import Path | |
| from gnn_disassembly_loader import load_pyg_frame_products_only, list_labeled_frames | |
| episode = Path("episode_00") # downloaded from HF | |
| # Enumerate labeled frames | |
| frames = list_labeled_frames(episode) | |
| print(f"Episode has {len(frames)} labeled frames") | |
| # → Episode has 246 labeled frames | |
| # Load one frame as a fully connected PyG graph (products only) | |
| data = load_pyg_frame_products_only(episode, frame_idx=42) | |
| print(data) | |
| # → Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15) | |
| # For N=15 products: edges = 15 * 14 = 210 (fully connected) | |
| print("Node features:", data.x.shape) # (15, 269) | |
| print("Edges:", data.edge_index.shape) # (2, 210) | |
| print("Edge attrs:", data.edge_attr.shape) # (210, 3) = [has_constraint, is_locked, src_blocks_dst] | |
| # Count edge feature breakdown | |
| has_c = (data.edge_attr[:, 0] == 1).sum().item() | |
| locked = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 1] == 1)).sum().item() | |
| src_blocks = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 2] == 1)).sum().item() | |
| print(f"Edges with physical constraint: {has_c}") | |
| print(f" currently locked: {locked}") | |
| print(f" where src is the blocker: {src_blocks}") | |
| print(f"Message-passing-only edges: {(data.edge_attr[:, 0] == 0).sum().item()}") | |
| ``` | |
| #### Variant 2: Constraint Graph + Robot Agent Node | |
| ```python | |
| from gnn_disassembly_loader import load_pyg_frame_with_robot | |
| data = load_pyg_frame_with_robot(episode, frame_idx=42) | |
| print(data) | |
| # → Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16) | |
| # Robot is the last node (index 15 for a 15-product graph). | |
| # Robot edges: 15 products * 2 directions = 30 extra edges → 210 + 30 = 240 | |
| # Verify robot edges are all message-passing (no constraint) | |
| robot_idx = data.num_nodes - 1 | |
| robot_edges = (data.edge_index[0] == robot_idx) | (data.edge_index[1] == robot_idx) | |
| assert (data.edge_attr[robot_edges] == 0).all(), "Robot edges must be [0, 0, 0]" | |
| print(f"Robot edges: {robot_edges.sum().item()}, all [0, 0, 0]") | |
| # Raw robot data (optional, for PointNet-style encoding) | |
| print("Robot point cloud:", data.robot_point_cloud.shape) # (M, 3); M varies per frame | |
| print("Robot mask:", data.robot_mask.shape) # (720, 1280) | |
| ``` | |
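The edge counts quoted in the two variants follow from simple arithmetic: every ordered pair of distinct products gets an edge, and the robot adds two edges per product. A minimal sketch of that check is below; `expected_edge_count` is a hypothetical helper written for illustration and is not part of the loader.

```python
def expected_edge_count(n_products: int, with_robot: bool = False) -> int:
    """Directed edge count for the fully connected graph (hypothetical helper)."""
    # Every ordered pair of distinct products, no self-loops.
    product_edges = n_products * (n_products - 1)
    # One edge in each direction between the robot node and each product.
    robot_edges = 2 * n_products if with_robot else 0
    return product_edges + robot_edges


print(expected_edge_count(15))                   # products only
print(expected_edge_count(15, with_robot=True))  # products + robot node
```

Comparing this value against `data.edge_index.shape[1]` is a cheap sanity check after loading a frame.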
| #### Edge Feature Semantics | |
| Each row of `data.edge_attr` is 3-dimensional: `[has_constraint, is_locked, src_blocks_dst]`. | |
| ``` | |
| has_constraint  is_locked  src_blocks_dst  Meaning | |
| --------------  ---------  --------------  ------------------------------------------------------------ | |
| 0               0          0               No physical constraint; message passing only | |
| 1               1          1               Edge src physically blocks dst; constraint currently LOCKED | |
| 1               1          0               Edge dst physically blocks src (the reverse direction of the | |
|                                            physical constraint); constraint currently LOCKED | |
| 1               0          1               Edge src physically blocks dst; constraint RELEASED | |
| 1               0          0               Edge dst physically blocks src; constraint RELEASED | |
| ``` | |
| **Important:** The graph is **fully connected and structurally symmetric**: both `(A, B)` and `(B, A)` edges exist for every pair. `has_constraint` and `is_locked` are the same for both directions (they describe the unordered pair). `src_blocks_dst` flips between the two directions: it tells you whether the edge's source is the one doing the blocking. | |
| **Example: CPU bracket blocks CPU removal** | |
| If `cpu_bracket → cpu` is an active constraint, the loader produces: | |
| ``` | |
| Edge (cpu_bracket, cpu): [1, 1, 1] # cpu_bracket blocks cpu, locked, src=blocker | |
| Edge (cpu, cpu_bracket): [1, 1, 0] # same physical pair, src=blocked | |
| ``` | |
| When the user unlocks the constraint (e.g., after releasing the bracket): | |
| ``` | |
| Edge (cpu_bracket, cpu): [1, 0, 1] # constraint released, but bracket still named as blocker | |
| Edge (cpu, cpu_bracket): [1, 0, 0] | |
| ``` | |
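To make the semantics above concrete, here is a small sketch that turns one `edge_attr` row back into a readable description and rebuilds the two directed rows for a constrained pair. Both `describe_edge` and `directed_pair_attrs` are hypothetical helpers for illustration, not functions shipped with the loader; they work on plain lists, so a tensor row would need `.tolist()` first.

```python
def describe_edge(attr):
    """Describe one [has_constraint, is_locked, src_blocks_dst] row (hypothetical helper)."""
    has_constraint, is_locked, src_blocks_dst = (int(round(float(v))) for v in attr)
    if not has_constraint:
        return "no physical constraint (message passing only)"
    direction = "src blocks dst" if src_blocks_dst else "dst blocks src"
    state = "LOCKED" if is_locked else "RELEASED"
    return f"{direction}, {state}"


def directed_pair_attrs(is_locked):
    """Rows for the (blocker, blocked) edge and its reverse (blocked, blocker)."""
    lock = 1.0 if is_locked else 0.0
    return [1.0, lock, 1.0], [1.0, lock, 0.0]


forward, reverse = directed_pair_attrs(is_locked=True)
print("(cpu_bracket, cpu):", describe_edge(forward))
print("(cpu, cpu_bracket):", describe_edge(reverse))
print("unconstrained pair:", describe_edge([0.0, 0.0, 0.0]))
```

Only the third component differs between the two directions, which matches the symmetry rule stated above.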
| ### Iterating the Full Episode | |
| ```python | |
| from torch_geometric.loader import DataLoader | |
| from gnn_disassembly_loader import iterate_episode | |
| # Build a dataset list | |
| data_list = [data for _, data in iterate_episode(episode, with_robot=True)] | |
| print(f"Loaded {len(data_list)} frames") | |
| # Batch them for training | |
| loader = DataLoader(data_list, batch_size=8, shuffle=True) | |
| for batch in loader: | |
|     print(batch.x.shape, batch.edge_index.shape, batch.edge_attr.shape) | |
|     break | |
| ``` | |
| ### Adding Robot State as Node Features (Graph B) | |
| For the perception + robot state variant, concatenate the 13D robot state to every node: | |
| ```python | |
| import numpy as np | |
| import torch | |
| robot_states = np.load(episode / "robot_states.npy") # (T, 13) | |
| def add_robot_state_to_graph(data, frame_idx, robot_states): | |
|     # Note: modifies data.x in place and returns the same Data object. | |
|     robot_state_t = torch.tensor(robot_states[frame_idx], dtype=torch.float32) # (13,) | |
|     broadcast = robot_state_t.unsqueeze(0).expand(data.num_nodes, -1) # (N, 13) | |
|     data.x = torch.cat([data.x, broadcast], dim=1) # (N, 282) | |
|     return data | |
| data_b = add_robot_state_to_graph(data, frame_idx=42, robot_states=robot_states) | |
| print("Graph B node features:", data_b.x.shape) # (16, 282) for with_robot variant | |
| ``` | |
| ## Node Feature Layout (269D) | |
| ``` | |
| [0 : 256] SAM2 embedding (256D) - masked avg pool over vision_features | |
| [256 : 259] 3D position (3D) - centroid in camera frame (meters) | |
| [259 : 268] type one-hot (9D) - index by type_vocab (incl. "robot") | |
| [268] visibility (1D) - binary flag | |
| ``` | |
| Total: **269D per node**. | |
| For Graph B (with robot state broadcast): | |
| ``` | |
| [0 : 269] Graph A features (269D) | |
| [269 : 275] joint positions (6D) - UR5e joint angles (radians) | |
| [275 : 281] TCP pose (6D) - [x, y, z, rx, ry, rz] | |
| [281] gripper position (1D) - Robotiq 2F-85 (0-255) | |
| ``` | |
| Total: **282D per node**. | |
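The two layouts above can be written down as named slices, which avoids hard-coding offsets in model code. The sketch below is illustrative only: `NODE_LAYOUT`, `GRAPH_B_LAYOUT`, and `split_features` are hypothetical names, and the slice bounds are copied from the listings above. It uses a plain list as the feature row, but the same slices apply to a NumPy array or a row of `data.x`.

```python
# Slice bounds copied from the Graph A layout (269D per node).
NODE_LAYOUT = {
    "embedding": slice(0, 256),
    "position": slice(256, 259),
    "type_one_hot": slice(259, 268),
    "visibility": slice(268, 269),
}

# Graph B appends the 13D robot state to every node (282D per node).
GRAPH_B_LAYOUT = {
    **NODE_LAYOUT,
    "joint_positions": slice(269, 275),
    "tcp_pose": slice(275, 281),
    "gripper": slice(281, 282),
}


def split_features(row, layout):
    """Split one node feature row into named parts (hypothetical helper)."""
    return {name: row[sl] for name, sl in layout.items()}


row = [0.0] * 282  # placeholder Graph B feature row
parts = split_features(row, GRAPH_B_LAYOUT)
print({name: len(values) for name, values in parts.items()})
```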
| ## Raw Data Access (No PyG) | |
| If you prefer raw NumPy without PyTorch Geometric: | |
| ```python | |
| from scripts.pyg_loader import load_frame_data | |
| fd = load_frame_data(episode, frame_idx=42) | |
| print("Graph:", fd.graph["components"]) | |
| print("Masks:", list(fd.masks.keys())) | |
| print("Resolved visibility:", fd.visibility) | |
| print("Robot present:", fd.robot is not None) | |
| if fd.robot is not None: | |
|     print("Robot mask shape:", fd.robot["mask"].shape) | |
|     print("Robot point cloud:", fd.robot["point_cloud"].shape) | |
|     print("Robot centroid (m):", fd.robot["centroid"]) | |
| # Access a specific component's depth info | |
| for key in ["point_cloud", "pixel_coords", "centroid", "area", "depth_valid"]: | |
|     full_key = f"cpu_fan_{key}" | |
|     if full_key in fd.depth_info: | |
|         print(f"cpu_fan {key}: {fd.depth_info[full_key]}") | |
| ``` | |
| ## Recording Hardware | |
| - **Robot:** UR5e + Robotiq 2F-85 gripper | |
| - **Side camera:** Luxonis OAK-D Pro (static viewpoint) | |
|   - Intrinsics: fx=1033.8, fy=1033.7, cx=632.9, cy=359.9 | |
| - **Recording rate:** 30 Hz | |
| - **Image size:** 1280 × 720 | |
| - **Depth format:** uint16, millimeters | |
| - **Teleoperation:** Thrustmaster SOL-R2 HOSAS controllers | |
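The intrinsics and depth format listed above are enough to back-project a depth pixel into a 3D point in the camera frame. The sketch below assumes an ideal pinhole model with no lens distortion, a depth image aligned to the RGB pixel grid, and a depth value of 0 meaning "no measurement"; none of these assumptions is confirmed by this card, and `backproject` is a hypothetical helper rather than the method used to produce the shipped point clouds.

```python
# Side-camera intrinsics as listed in this card (pixels).
FX, FY, CX, CY = 1033.8, 1033.7, 632.9, 359.9


def backproject(u, v, depth_mm):
    """Map pixel (u, v) with depth in millimeters to (x, y, z) in meters.

    Assumes an undistorted pinhole camera; returns None when the depth
    value is 0 (treated here as an invalid measurement).
    """
    if depth_mm <= 0:
        return None
    z = depth_mm / 1000.0          # uint16 millimeters -> meters
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return (x, y, z)


# A pixel at the principal point lies on the optical axis.
print(backproject(CX, CY, 500))
```

If the stored per-component centroids disagree with values computed this way, the assumptions above (alignment, distortion) are the first things to check.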
| ## Annotation Tool | |
| Annotations created with a custom SAM2-based labeling tool: | |
| - **Repository:** https://github.com/ChangChrisLiu/gnn-world-model | |
| - **Backend:** FastAPI + SAM2 (`sam2.1_hiera_base_plus`) | |
| - **Frontend:** Vanilla HTML/JS, side-only interactive view | |
| - **Tools:** BBox, Point, Polygon, Brush, Eraser (all mask-editing operations) | |
| - **Features:** Dynamic component instances, AGENT badge for robot, scroll-to-zoom, undo/redo, per-frame delta-encoded visibility | |
| ## License | |
| Released under **CC BY 4.0**. Use, share, and adapt freely with attribution. | |
| ## Acknowledgements | |
| Built using: | |
| - [Segment Anything Model 2 (SAM2)](https://github.com/facebookresearch/sam2) by Meta AI | |
| - [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/) | |
| - [Hugging Face Datasets](https://huggingface.co/docs/datasets) | |