---
license: cc-by-4.0
task_categories:
- robotics
- image-segmentation
- graph-ml
language:
- en
tags:
- robotics
- manipulation
- disassembly
- constraint-graph
- gnn
- world-model
- sam2
- segmentation
- ur5e
size_categories:
- 1K<n<10K
pretty_name: GNN Disassembly World Model Dataset (v3)
---
# GNN Disassembly World Model Dataset (v3)
Real robot disassembly episodes with side-view **per-frame constraint graphs**, SAM2 segmentation masks, 256D feature embeddings, **full 3D depth information (point clouds)**, and synchronized robot states. The robot is labeled as a separate **agent node** with its own mask, embedding, and depth bundle.
**Project:** CoRL 2026: GNN world model for constraint-aware video generation
**Author:** Chang Liu (Texas A&M University)
**Hardware:** UR5e + Robotiq 2F-85 gripper, OAK-D Pro (side view)
**Format version:** v3 (2026-04-10)
## Dataset Structure
```
episode_XX/
├── metadata.json               # Episode metadata, component counts, labeled frame count
├── robot_states.npy            # (T, 13) float32: joint angles + TCP pose + gripper
├── robot_actions.npy           # (T-1, 13) float32: frame-to-frame state deltas
├── timestamps.npy              # (T, 3) float64
├── side/
│   ├── rgb/frame_XXXXXX.png    # 1280×720 RGB (side camera)
│   └── depth/frame_XXXXXX.npy  # 1280×720 uint16 depth (mm)
├── wrist/                      # Raw wrist camera data (not used in v3)
│   ├── rgb/...
│   └── depth/...
└── annotations/
    ├── side_graph.json         # Constraint graph (products only, NO robot)
    ├── side_masks/
    │   └── frame_XXXXXX.npz    # {component_id: (H,W) uint8}, products only
    ├── side_embeddings/
    │   └── frame_XXXXXX.npz    # {component_id: (256,) float32}, products only
    ├── side_depth_info/
    │   └── frame_XXXXXX.npz    # Per-product depth bundle (flat keys)
    ├── side_robot/
    │   └── frame_XXXXXX.npz    # Robot bundle, ALWAYS written per labeled frame
    └── dataset_card.json       # Format description
```
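The tree describes `robot_actions.npy` as frame-to-frame deltas of `robot_states.npy`. A minimal sketch for checking that relationship, assuming row `t` of the actions equals `states[t + 1] - states[t]` with no angle wrapping (this exact convention is an assumption, not stated on this card); the helper names are hypothetical:

```python
import numpy as np


def actions_from_states(states: np.ndarray) -> np.ndarray:
    """Recompute (T-1, 13) frame-to-frame deltas from a (T, 13) state array.

    Assumed convention: actions[t] = states[t + 1] - states[t].
    """
    if states.ndim != 2 or states.shape[1] != 13:
        raise ValueError(f"expected (T, 13) states, got {states.shape}")
    return np.diff(states, axis=0).astype(np.float32)


def check_actions(states: np.ndarray, actions: np.ndarray, atol: float = 1e-5) -> bool:
    """True if the stored actions match the assumed delta convention."""
    expected = actions_from_states(states)
    return actions.shape == expected.shape and bool(np.allclose(actions, expected, atol=atol))


# Usage on a downloaded episode (episode is a pathlib.Path):
#   states = np.load(episode / "robot_states.npy")
#   actions = np.load(episode / "robot_actions.npy")
#   print(check_actions(states, actions))
```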
**Alignment guarantee:** every labeled frame has files in all 4 annotation directories. Files are aligned by frame index.
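A minimal sketch for verifying this guarantee on a downloaded episode. It only assumes the `frame_XXXXXX.npz` naming shown in the tree; the helper names are hypothetical:

```python
from pathlib import Path
from typing import Dict, Optional, Set

ANNOTATION_DIRS = ("side_masks", "side_embeddings", "side_depth_info", "side_robot")


def parse_frame_index(file_name: str) -> Optional[int]:
    """'frame_000042.npz' -> 42; None for names outside the frame_XXXXXX scheme."""
    stem = file_name.split(".")[0]
    prefix, _, digits = stem.partition("_")
    if prefix != "frame" or not digits.isdigit():
        return None
    return int(digits)


def missing_frames(per_dir: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each directory, the frames present in some other directory but not here."""
    all_frames: Set[int] = set().union(*per_dir.values())
    return {name: all_frames - found for name, found in per_dir.items()}


def alignment_report(episode_dir: Path) -> Dict[str, Set[int]]:
    """Missing frame indices per annotation directory; all-empty sets mean aligned."""
    anno = episode_dir / "annotations"
    per_dir: Dict[str, Set[int]] = {}
    for name in ANNOTATION_DIRS:
        indices = (parse_frame_index(p.name) for p in (anno / name).glob("frame_*.npz"))
        per_dir[name] = {i for i in indices if i is not None}
    return missing_frames(per_dir)
```

If every value returned by `alignment_report(episode)` is an empty set, all four directories cover the same labeled frames.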
## Component Types (9 types)
**8 product types** (constraint nodes):
| Index | Type | Color | Notes |
|-------|------|-------|-------|
| 0 | `cpu_fan` | #FF6B6B | Always visible at start |
| 1 | `cpu_bracket` | #4ECDC4 | Hidden at start (under fan) |
| 2 | `cpu` | #45B7D1 | Hidden at start |
| 3 | `ram_clip` | #96CEB4 | Multiple instances: ram_clip_1, ram_clip_2, ... |
| 4 | `ram` | #FFEAA7 | Multiple instances: ram_1, ram_2, ... |
| 5 | `connector` | #DDA0DD | Multiple instances: connector_1, connector_2, ... |
| 6 | `graphic_card` | #FF8C42 | Always visible |
| 7 | `motherboard` | #8B5CF6 | Always visible (base) |
**1 agent type** (NOT in constraint graph):
| Index | Type | Color | Notes |
|-------|------|-------|-------|
| 8 | `robot` | #F5F5F5 | Labeled but stored separately. Added as agent node at training time. |
## Graph Semantics
### Constraint Graph (Sparse, Stored)
`side_graph.json` defines the **physical constraint relationships** between products. Directed edges: `A -> B` means "A must be removed before B can be removed" (A blocks B).
```
cpu_fan -> cpu_bracket (fan covers bracket)
cpu_fan -> motherboard (fan attached to board)
cpu_bracket -> cpu (bracket holds CPU)
cpu_bracket -> motherboard
cpu -> motherboard
ram_N -> motherboard
ram_clip_N -> motherboard
ram_clip_N -> ram_M (user manually pairs)
connector_N -> motherboard
graphic_card -> motherboard
```
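**Edge states** are delta-encoded in `frame_states`: each entry lists only the constraints (and visibility flags) that change at that frame.
- `true` (locked, 1): constraint active; the blocked component cannot be removed yet
- `false` (unlocked, 0): constraint released; the blocked component is free
- Monotonic: once unlocked, a constraint stays unlocked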
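Because only changes are stored, the state at frame `t` is obtained by starting from the defaults (every constraint locked, as in the loader's `resolve_frame_state` below) and applying every delta with key ≤ `t` in frame order. A minimal sketch that replays the deltas this way and reports any violation of the monotonic rule; `unlock_violations` is a hypothetical helper, not part of the dataset tooling:

```python
from typing import Dict, List, Tuple


def unlock_violations(graph_json: dict) -> List[Tuple[int, str]]:
    """Return (frame, edge_key) pairs where a released constraint is locked again.

    Replays the delta-encoded frame_states in ascending frame order, starting
    from the default state in which every edge is locked.
    """
    state: Dict[str, bool] = {f"{e['src']}->{e['dst']}": True for e in graph_json["edges"]}
    frame_states = graph_json.get("frame_states", {})
    violations: List[Tuple[int, str]] = []
    for frame in sorted(int(k) for k in frame_states):
        for key, locked in frame_states[str(frame)].get("constraints", {}).items():
            if locked and state.get(key) is False:
                violations.append((frame, key))  # unlocked earlier, re-locked now
            state[key] = locked
    return violations
```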
### Fully Connected Graph (Built at Training Time)
For GNN message passing, the sparse constraint graph is expanded to a **fully connected directed** graph. Every ordered pair `(i, j)` where `i != j` gets an edge. Self-loops are excluded.
**Edge count:** For a graph with N nodes, there are **N × (N − 1)** directed edges (both directions for every pair).
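The N × (N − 1) count is simply the number of off-diagonal entries of an N × N adjacency matrix. A vectorized NumPy sketch (an alternative to the loader's nested loops, not the loader itself) that yields the same source-major ordering as the products-only loader; the robot-aware loader appends its robot edges after the product block in its own interleaved order, so only the edge count carries over there:

```python
import numpy as np


def fully_connected_edge_index(num_nodes: int) -> np.ndarray:
    """(2, N*(N-1)) int64 array of all ordered pairs (i, j) with i != j.

    np.nonzero walks the off-diagonal mask in row-major order, which matches
    `for i in range(N): for j in range(N): if i != j`.
    """
    off_diagonal = ~np.eye(num_nodes, dtype=bool)
    src, dst = np.nonzero(off_diagonal)
    return np.stack([src, dst]).astype(np.int64)


# 15 products -> 210 edges; with the robot as a 16th node -> 240 edges.
print(fully_connected_edge_index(15).shape)  # (2, 210)
```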
**Edge features (3D):** `[has_constraint, is_locked, src_blocks_dst]`. `has_constraint` and `is_locked` describe the node pair and are identical on `(i, j)` and `(j, i)`; pairs without a physical constraint get `[0, 0, 0]` (message passing only).
**Direction is encoded as a feature, not in the structure.** The physical constraint `A → B` (A blocks B's removal) is one-way, so only `src_blocks_dst` differs between the two edges of the pair:
- Edge `(A, B)` → `[1, is_locked, 1]`
- Edge `(B, A)` → `[1, is_locked, 0]`
For example, if `cpu_fan → cpu_bracket` is a locked constraint:
```
(cpu_fan, cpu_bracket) → [1, 1, 1]   physical, active; cpu_fan is the blocker
(cpu_bracket, cpu_fan) → [1, 1, 0]   same constraint; cpu_bracket is the blocked
```
This ensures every node pair communicates during GNN layers while still encoding the directionality of the prerequisite relationship.
**Robot (agent node)** has NO physical constraints. All edges involving the robot (`robot ↔ any_product`) have features `[0, 0, 0]`: context passing only.
**Node ordering:** Node indices in `edge_index` match the order of `components` in `side_graph.json`. When the robot is added (with `load_pyg_frame_with_robot`), it is appended at index `N_products` (the last position).
## Data File Schemas
### `side_graph.json`
```json
{
  "view": "side",
  "episode_id": "episode_00",
  "goal_component": "connector_1",
  "components": [
    {"id": "cpu_fan", "type": "cpu_fan", "color": "#FF6B6B"},
    {"id": "ram_1", "type": "ram", "color": "#FFEAA7"}
  ],
  "edges": [
    {"src": "cpu_fan", "dst": "cpu_bracket", "directed": true},
    {"src": "ram_clip_1", "dst": "ram_1", "directed": true}
  ],
  "frame_states": {
    "0": {
      "constraints": {"cpu_fan->cpu_bracket": true},
      "visibility": {"cpu_bracket": false, "cpu": false, "robot": true}
    },
    "152": {
      "constraints": {"cpu_fan->cpu_bracket": false},
      "visibility": {"cpu_fan": false, "cpu_bracket": true, "cpu": true}
    }
  },
  "node_positions": {"cpu_fan": [120, 80]},
  "embedding_dim": 256,
  "feature_extractor": "sam2.1_hiera_base_plus",
  "type_vocab": ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram", "connector", "graphic_card", "motherboard", "robot"]
}
```
**Robot is NOT in components.** Robot is stored in `side_robot/`.
### `side_depth_info/frame_XXXXXX.npz`
**Always contains all 7 keys per component in `graph.components`.** Flat keys prefixed by component_id.
| Key | Shape | Dtype | Description |
|-----|-------|-------|-------------|
| `{cid}_point_cloud` | (N, 3) | float32 | 3D points in camera frame (meters). Empty (0, 3) if no valid depth. |
| `{cid}_pixel_coords` | (N, 2) | int32 | (u, v) pixel coords of valid points |
| `{cid}_raw_depths_mm` | (N,) | uint16 | Raw depth values in mm, filtered to [50, 2000] |
| `{cid}_centroid` | (3,) | float32 | Mean of point_cloud; [0,0,0] if no valid depth |
| `{cid}_bbox_2d` | (4,) | int32 | [x1, y1, x2, y2] from mask |
| `{cid}_area` | (1,) | int32 | Mask pixel count |
| `{cid}_depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 |
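Component ids themselves contain underscores (`ram_clip_1`, `cpu_fan`), so a flat key cannot be split on its first `_`; matching against the seven known field suffixes works instead. A minimal sketch that regroups the flat keys per component (`unflatten_depth_info` is a hypothetical helper):

```python
from typing import Dict

import numpy as np

DEPTH_FIELDS = (
    "point_cloud", "pixel_coords", "raw_depths_mm",
    "centroid", "bbox_2d", "area", "depth_valid",
)


def unflatten_depth_info(flat: Dict[str, np.ndarray]) -> Dict[str, Dict[str, np.ndarray]]:
    """Group flat '{cid}_{field}' keys into {cid: {field: array}}.

    Keys that end in none of the known field names are ignored.
    """
    nested: Dict[str, Dict[str, np.ndarray]] = {}
    for key, value in flat.items():
        for field in DEPTH_FIELDS:
            suffix = "_" + field
            if key.endswith(suffix):
                nested.setdefault(key[: -len(suffix)], {})[field] = value
                break
    return nested
```

For example, `unflatten_depth_info(dict(np.load(path)))` on a `side_depth_info/frame_XXXXXX.npz` file would give one sub-dict per component with all seven fields.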
### `side_robot/frame_XXXXXX.npz`
**Always written per labeled frame** (with `visible=[0]` if robot not in this frame).
| Key | Shape | Dtype | Description |
|-----|-------|-------|-------------|
| `visible` | (1,) | uint8 | 1 if robot labeled, 0 otherwise |
| `mask` | (H, W) | uint8 | Binary mask |
| `embedding` | (256,) | float32 | SAM2 256D feature |
| `point_cloud` | (N, 3) | float32 | 3D points (meters) |
| `pixel_coords` | (N, 2) | int32 | (u, v) pixel coords |
| `raw_depths_mm` | (N,) | uint16 | Raw depths in mm |
| `centroid` | (3,) | float32 | Mean of point_cloud |
| `bbox_2d` | (4,) | int32 | From mask |
| `area` | (1,) | int32 | Pixel count |
| `depth_valid` | (1,) | uint8 | 1 if N > 0 else 0 |
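The card does not list the OAK-D Pro intrinsics, so the following is only a sketch of how `point_cloud` plausibly relates to `pixel_coords` and `raw_depths_mm` under a standard pinhole model. `fx`, `fy`, `cx`, `cy` are placeholders for values from the camera calibration, and the [50, 2000] mm window mirrors the filter stated for `raw_depths_mm`:

```python
import numpy as np


def backproject(pixel_coords: np.ndarray, depths_mm: np.ndarray,
                fx: float, fy: float, cx: float, cy: float,
                min_mm: int = 50, max_mm: int = 2000) -> np.ndarray:
    """Pinhole back-projection of (u, v) pixels with depth in mm to meters.

    fx, fy, cx, cy are assumed intrinsics (not provided on this card).
    Returns an (M, 3) float32 array in the camera frame, where M counts the
    pixels whose depth lies in [min_mm, max_mm].
    """
    depths = depths_mm.astype(np.float64)
    keep = (depths >= min_mm) & (depths <= max_mm)
    u = pixel_coords[keep, 0].astype(np.float64)
    v = pixel_coords[keep, 1].astype(np.float64)
    z = depths[keep] / 1000.0  # mm -> m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1).astype(np.float32)
```

Under this model, `centroid` corresponds to `pts.mean(axis=0)` whenever `depth_valid` is 1.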
### `metadata.json`
```json
{
  "episode_id": "episode_00",
  "goal_component": "connector_1",
  "num_frames": 604,
  "labeled_frame_count": 246,
  "annotation_complete": false,
  "component_counts": {
    "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1,
    "ram": 2, "ram_clip": 4, "connector": 4,
    "graphic_card": 1, "motherboard": 1
  },
  "format_version": "3.0",
  "sam2_model": "sam2.1_hiera_b+",
  "embedding_dim": 256,
  "fps": 30,
  "cameras": ["side"],
  "robot": "UR5e",
  "gripper": "Robotiq 2F-85"
}
```
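`component_counts` already fixes the per-frame graph size, assuming it matches the `components` list in `side_graph.json`: the number of product nodes is the sum of the counts, one robot node is added when the robot is visible, and the fully connected graph has N × (N − 1) edges. A small sketch using the example above (`graph_sizes` is a hypothetical helper):

```python
from typing import Dict, Tuple


def graph_sizes(component_counts: Dict[str, int], with_robot: bool) -> Tuple[int, int]:
    """(num_nodes, num_edges) of the fully connected per-frame graph."""
    n = sum(component_counts.values()) + (1 if with_robot else 0)
    return n, n * (n - 1)


counts = {
    "cpu_fan": 1, "cpu_bracket": 1, "cpu": 1,
    "ram": 2, "ram_clip": 4, "connector": 4,
    "graphic_card": 1, "motherboard": 1,
}
print(graph_sizes(counts, with_robot=False))  # (15, 210)
print(graph_sizes(counts, with_robot=True))   # (16, 240)
```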
## Test Data Available
One episode is fully labeled and validated; you can use it to test the loader:
**Labeled episode:** `session_0408_162129/episode_00`
| Stat | Value |
|------|-------|
| Total frames in episode | 604 |
| Labeled frames | **346** (range 0 to 351, 6 gaps) |
| Product components | 15 (cpu_fan, cpu_bracket, cpu, graphic_card, motherboard, connector_1..4, ram_1..2, ram_clip_1..4) |
| Physical constraints (edges) | 14 |
| Robot visibility | Visible in 216 / 346 frames |
| Goal component | `connector_1` |
### Download and Test (3 steps)
**Step 1: Download just one episode (lightweight)**
```bash
pip install huggingface_hub
```
```python
from huggingface_hub import snapshot_download
local_dir = snapshot_download(
    repo_id="ChangChrisLiu/GNN_Disassembly_WorldModel",
    repo_type="dataset",
    allow_patterns=[
        "session_0408_162129/episode_00/metadata.json",
        "session_0408_162129/episode_00/robot_states.npy",
        "session_0408_162129/episode_00/robot_actions.npy",
        "session_0408_162129/episode_00/side/rgb/frame_000042.png",
        "session_0408_162129/episode_00/side/depth/frame_000042.npy",
        "session_0408_162129/episode_00/annotations/*",
    ],
)
print("Downloaded to:", local_dir)
```
**Step 2: Save the loader code** (copy the self-contained `gnn_disassembly_loader.py` block below into a file)
**Step 3: Run this test script.** It loads frame 42, prints the full graph anatomy, and verifies everything:
```python
from pathlib import Path
from gnn_disassembly_loader import (
load_pyg_frame_products_only,
load_pyg_frame_with_robot,
list_labeled_frames,
load_frame_data,
)
# After snapshot_download above:
episode = Path(local_dir) / "session_0408_162129" / "episode_00"
# 1. Enumerate labeled frames
frames = list_labeled_frames(episode)
assert len(frames) == 346, f"Expected 346 labeled frames, got {len(frames)}"
print(f"✓ Labeled frames: {len(frames)} (range {frames[0]}..{frames[-1]})")
# 2. Load frame 42, products only
data1 = load_pyg_frame_products_only(episode, frame_idx=42)
assert data1.num_nodes == 15, f"Expected 15 products, got {data1.num_nodes}"
assert data1.edge_index.shape[1] == 15 * 14  # fully connected
assert data1.edge_attr.shape == (210, 3)     # 3D edge features
print(f"✓ Products-only: {data1}")
# 3. Load frame 42, with robot agent
data2 = load_pyg_frame_with_robot(episode, frame_idx=42)
assert data2.num_nodes == 16, f"Expected 15 products + 1 robot = 16, got {data2.num_nodes}"
assert data2.edge_index.shape[1] == 16 * 15
assert hasattr(data2, "robot_point_cloud")
print(f"✓ With robot: {data2}")
print(f"  Robot point cloud: {tuple(data2.robot_point_cloud.shape)}")
print(f"  Robot mask: {tuple(data2.robot_mask.shape)}")
# 4. Verify robot edges are all [0, 0, 0]
robot_idx = data2.num_nodes - 1
robot_edges = (data2.edge_index[0] == robot_idx) | (data2.edge_index[1] == robot_idx)
assert (data2.edge_attr[robot_edges] == 0).all()
print(f"✓ Robot edges: {robot_edges.sum().item()} → all [0,0,0]")
# 5. Verify edge feature semantics
has_c = (data1.edge_attr[:, 0] == 1).sum().item()
locked = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 1] == 1)).sum().item()
src_blocks = ((data1.edge_attr[:, 0] == 1) & (data1.edge_attr[:, 2] == 1)).sum().item()
assert has_c == 28       # 14 constraints × 2 directions
assert locked == 28      # all locked at frame 42
assert src_blocks == 14  # half the constraint edges have src as blocker
print(f"✓ Edge features: {has_c} constraint edges, {locked} locked, {src_blocks} forward-direction")
# 6. Verify fully connected + structurally symmetric structure
from collections import Counter
pairs = Counter()
for i in range(data1.edge_index.shape[1]):
    src = data1.edge_index[0, i].item()
    dst = data1.edge_index[1, i].item()
    pairs[frozenset([src, dst])] += 1
# Every unordered pair should appear exactly twice: (i, j) AND (j, i)
assert all(count == 2 for count in pairs.values())
assert len(pairs) == 15 * 14 // 2  # and every unordered pair is present
print("✓ Structurally symmetric: every pair has both directions")
# 7. Raw data access
fd = load_frame_data(episode, frame_idx=42)
print(f"✓ Raw data: {len(fd.masks)} product masks, robot {'visible' if fd.robot is not None else 'hidden'}")
print("\nAll tests passed! The dataset is ready for training.")
```
Expected output:
```
✓ Labeled frames: 346 (range 0..351)
✓ Products-only: Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15)
✓ With robot: Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16, robot_point_cloud=[5729, 3], robot_pixel_coords=[5729, 2], robot_mask=[720, 1280])
  Robot point cloud: (5729, 3)
  Robot mask: (720, 1280)
✓ Robot edges: 30 → all [0,0,0]
✓ Edge features: 28 constraint edges, 28 locked, 14 forward-direction
✓ Structurally symmetric: every pair has both directions
✓ Raw data: 13 product masks, robot visible

All tests passed! The dataset is ready for training.
```
## Graph Structure: What You Get Per Frame
Every labeled frame is converted to **one PyTorch Geometric `Data` object**. Here's exactly what it contains:
### Node Features (269D per node)
`x[i]` is a 269D feature vector for node `i`:
| Slice | Feature | Dim | Description |
|---|---|---|---|
| `x[i, 0:256]` | SAM2 embedding | 256 | Masked average pool over the SAM2 encoder's `vision_features`; captures the visual appearance of the component. |
| `x[i, 256:259]` | 3D position | 3 | Centroid in the camera frame (meters): mean of the valid depth-backprojected points within the mask. Zero vector if there is no valid depth (check the `depth_valid` flag). |
| `x[i, 259:268]` | Type one-hot | 9 | Index order: `cpu_fan`, `cpu_bracket`, `cpu`, `ram_clip`, `ram`, `connector`, `graphic_card`, `motherboard`, `robot`. Multiple instances (e.g. `ram_1`, `ram_2`) share the same one-hot and are distinguished by their SAM2 embedding and 3D position. |
| `x[i, 268]` | Visibility | 1 | Binary flag: 1 if visible in this frame, 0 if hidden. Resolved from the delta-encoded `frame_states` in `side_graph.json`. |
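Keeping the slice boundaries in one place avoids hard-coding offsets in downstream code. A minimal sketch, assuming the 269D layout produced by the loader; `FEATURE_SLICES` and `decode_node` are hypothetical names:

```python
from typing import Dict

import numpy as np

TYPE_VOCAB = ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram",
              "connector", "graphic_card", "motherboard", "robot"]

FEATURE_SLICES = {
    "embedding": slice(0, 256),
    "position": slice(256, 259),
    "type_one_hot": slice(259, 268),
    "visibility": slice(268, 269),
}
NODE_DIM = 269


def decode_node(x_i: np.ndarray) -> Dict[str, object]:
    """Split one 269D node feature vector into its named parts."""
    if x_i.shape != (NODE_DIM,):
        raise ValueError(f"expected ({NODE_DIM},), got {x_i.shape}")
    parts = {name: x_i[s] for name, s in FEATURE_SLICES.items()}
    return {
        "embedding": parts["embedding"],
        "position": parts["position"],  # zeros when depth was invalid
        "type": TYPE_VOCAB[int(np.argmax(parts["type_one_hot"]))],
        "visible": bool(parts["visibility"][0] > 0.5),
    }
```

The same slices apply column-wise to a whole graph, e.g. `data.x[:, FEATURE_SLICES["position"]]` for all node positions.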
### Graph Topology: Fully Connected, Structurally Symmetric
For N nodes, the PyG graph has:
- `edge_index` shape: **(2, N × (N − 1))**
- Every ordered pair `(i, j)` with `i ≠ j` has an edge
- Both `(i, j)` AND `(j, i)` exist, so the graph is **not structurally directed**
- Self-loops are excluded
**Why fully connected?** With only the sparse constraint edges (the physical prerequisites), information could flow only along prerequisite chains: components linked through intermediate nodes (e.g. via `motherboard`) would need several GNN layers to exchange information, and the robot, which has no constraint edges, would be isolated. Making the graph fully connected lets every node pair communicate in a single layer.
### Edge Features (3D per edge)
| `has_constraint` | `is_locked` | `src_blocks_dst` | Meaning |
|---|---|---|---|
| 0 | 0 | 0 | No physical constraint (message passing only) |
| 1 | 1 | 1 | Physical constraint, LOCKED, src is the blocker |
| 1 | 1 | 0 | Physical constraint, LOCKED, src is the blocked (reverse direction) |
| 1 | 0 | 1 | Physical constraint, RELEASED (unlocked), src is the blocker |
| 1 | 0 | 0 | Physical constraint, RELEASED, src is the blocked |
**Direction is a feature, not structure.**
- `has_constraint` and `is_locked` describe the PAIR: they are the same for both `(i, j)` and `(j, i)`.
- `src_blocks_dst` is asymmetric: it flips depending on which direction the edge goes.
**Example:** `cpu_fan` blocks `cpu_bracket` (fan covers bracket). At frame 0 (locked):
```
edge (cpu_fan, cpu_bracket) → [1, 1, 1]   cpu_fan is the blocker
edge (cpu_bracket, cpu_fan) → [1, 1, 0]   cpu_bracket is the blocked
```
At frame 152, after the user removes the fan (unlocked):
```
edge (cpu_fan, cpu_bracket) → [1, 0, 1]
edge (cpu_bracket, cpu_fan) → [1, 0, 0]
```
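Because direction lives in `src_blocks_dst`, the directed constraint list can be read back from any frame's edge tensors by keeping edges with `has_constraint == 1` and `src_blocks_dst == 1`. A NumPy sketch (pass `data.edge_index.numpy()` and `data.edge_attr.numpy()`; `directed_constraints` is a hypothetical helper):

```python
from typing import List, Tuple

import numpy as np


def directed_constraints(edge_index: np.ndarray,
                         edge_attr: np.ndarray) -> List[Tuple[int, int, bool]]:
    """(blocker, blocked, is_locked) for every physical constraint.

    Each constraint appears twice in the fully connected graph; only the copy
    whose source is the blocker (src_blocks_dst == 1) is kept.
    """
    forward = (edge_attr[:, 0] == 1) & (edge_attr[:, 2] == 1)
    return [
        (int(s), int(d), bool(locked == 1))
        for s, d, locked in zip(edge_index[0, forward],
                                edge_index[1, forward],
                                edge_attr[forward, 1])
    ]
```

On frame 42 of the test episode this should return 14 entries, consistent with `src_blocks == 14` in the test script.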
### Robot Agent Node (Optional)
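When loaded with `load_pyg_frame_with_robot()`, the robot is appended as the **last node** (index `N_products`). All edges involving the robot have features `[0, 0, 0]`: the robot has no physical constraints and serves as a context-providing agent node. If the robot is not visible in a frame, the loader returns the products-only graph for that frame, so the node count can differ between frames.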
The raw robot data (point cloud, pixel coords, full mask) is attached as extra tensors on the `Data` object for optional PointNet-style encoding.
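PointNet-style encoders usually expect a fixed number of points in a normalized frame. One common preprocessing choice, not something this dataset prescribes, is to center the cloud on its mean, scale it into the unit sphere, and resample to a fixed count. A NumPy sketch (`num_points` and `seed` are arbitrary choices):

```python
import numpy as np


def normalize_and_sample(points: np.ndarray, num_points: int = 1024, seed: int = 0) -> np.ndarray:
    """Center, scale to the unit sphere, and resample an (M, 3) cloud to (num_points, 3).

    Samples without replacement when M >= num_points, otherwise with replacement.
    An empty cloud (depth_valid == 0) yields all zeros.
    """
    if points.shape[0] == 0:
        return np.zeros((num_points, 3), dtype=np.float32)
    rng = np.random.default_rng(seed)
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    if radius > 0:
        centered = centered / radius
    replace = points.shape[0] < num_points
    idx = rng.choice(points.shape[0], size=num_points, replace=replace)
    return centered[idx].astype(np.float32)
```

For example, `normalize_and_sample(data.robot_point_cloud.numpy())` turns the variable-size robot cloud into a fixed (1024, 3) input.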
### Matching a Frame to Its RGB Image
Frame indices in the loader directly map to image files:
```python
from pathlib import Path
episode = Path("episode_00")  # path to a downloaded episode
frame_idx = 42
rgb_path = episode / "side" / "rgb" / f"frame_{frame_idx:06d}.png"
depth_path = episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy"
```
Example: load PyG frame + matching image + depth:
```python
from pathlib import Path
import numpy as np
from PIL import Image
from gnn_disassembly_loader import load_pyg_frame_with_robot
episode = Path("episode_00")
frame_idx = 42
# PyG graph for this frame
data = load_pyg_frame_with_robot(episode, frame_idx)
# Matching RGB image (1280x720 PNG)
rgb = np.array(Image.open(episode / "side" / "rgb" / f"frame_{frame_idx:06d}.png"))
print("RGB shape:", rgb.shape) # (720, 1280, 3)
# Matching depth (1280x720 uint16 mm)
depth = np.load(episode / "side" / "depth" / f"frame_{frame_idx:06d}.npy")
print("Depth shape:", depth.shape, depth.dtype) # (720, 1280) uint16
# Robot mask is attached to the PyG data only if the robot is visible
if hasattr(data, "robot_mask"):
    robot_mask = data.robot_mask.numpy()  # (720, 1280) uint8, binary
    print("Robot mask area:", int(robot_mask.sum()), "pixels")
```
## Loading the Data: PyTorch Geometric
This section contains **self-contained** code you can copy-paste directly. No need to clone any repo.
### Prerequisites
```bash
pip install torch numpy torch_geometric pillow
```
### Self-contained PyG loader
Copy this into a file called `gnn_disassembly_loader.py`:
```python
"""Self-contained PyG loader for the GNN Disassembly dataset.

Two loader variants:
  - load_pyg_frame_products_only(ep, frame) -> constraint graph only, no robot
  - load_pyg_frame_with_robot(ep, frame)    -> constraint graph + robot agent node

Both return torch_geometric.data.Data with:
  x           (N, 269)        node features
  edge_index  (2, N*(N-1))    fully connected directed message-passing edges
  edge_attr   (N*(N-1), 3)    [has_constraint, is_locked, src_blocks_dst]
  y           (1,)            frame index
  num_nodes   N

Notes on the edge feature design:
  - The graph is FULLY CONNECTED and structurally symmetric.
    Both (i, j) and (j, i) exist in edge_index for every node pair i != j.
  - Direction is NOT encoded in the graph structure. It is encoded as
    a feature: `src_blocks_dst`.
  - `has_constraint` and `is_locked` are symmetric per pair (same value
    for both (i, j) and (j, i)).
  - `src_blocks_dst` is asymmetric: it is 1 if the edge's src node
    physically blocks its dst node, 0 otherwise.
"""
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import numpy as np
import torch
from torch_geometric.data import Data
# ─────────────────────────────────────────────────────────────────────────────
# Helpers
# ─────────────────────────────────────────────────────────────────────────────
def list_labeled_frames(episode_dir: Path) -> List[int]:
    """Return sorted list of frame indices that have saved annotations."""
    mask_dir = episode_dir / "annotations" / "side_masks"
    if not mask_dir.exists():
        return []
    frames = []
    for p in mask_dir.glob("frame_*.npz"):
        try:
            frames.append(int(p.stem.split("_")[1]))
        except (ValueError, IndexError):
            continue
    return sorted(frames)


def resolve_frame_state(graph_json: dict, frame_idx: int) -> Tuple[Dict[str, bool], Dict[str, bool]]:
    """Resolve delta-encoded constraints + visibility at a frame.

    Walks frame_states from frame 0 to frame_idx, accumulating deltas.
    Returns (constraints_dict, visibility_dict).
    """
    constraints: Dict[str, bool] = {}
    visibility: Dict[str, bool] = {}
    # Defaults: every component visible, every edge locked
    for c in graph_json["components"]:
        visibility[c["id"]] = True
    for e in graph_json["edges"]:
        constraints[f"{e['src']}->{e['dst']}"] = True
    # Walk deltas up to frame_idx
    fs_dict = graph_json.get("frame_states", {})
    for f in sorted(int(k) for k in fs_dict):
        if f > frame_idx:
            break
        fs = fs_dict[str(f)]
        for k, v in fs.get("constraints", {}).items():
            constraints[k] = v
        for k, v in fs.get("visibility", {}).items():
            visibility[k] = v
    return constraints, visibility


def type_one_hot(comp_type: str, type_vocab: List[str]) -> List[float]:
    """9-dim one-hot encoding of component type based on type_vocab."""
    return [1.0 if t == comp_type else 0.0 for t in type_vocab]
# ─────────────────────────────────────────────────────────────────────────────
# Raw data loader (NumPy only, no torch)
# ─────────────────────────────────────────────────────────────────────────────
@dataclass
class FrameData:
    graph: dict
    masks: Dict[str, np.ndarray]
    embeddings: Dict[str, np.ndarray]
    depth_info: Dict[str, np.ndarray]
    robot: Optional[Dict[str, np.ndarray]]  # None if the robot is not visible
    constraints: Dict[str, bool]
    visibility: Dict[str, bool]


def load_frame_data(episode_dir: Path, frame_idx: int) -> FrameData:
    """Load all v3 annotation files for one frame."""
    anno = episode_dir / "annotations"
    with open(anno / "side_graph.json") as f:
        graph = json.load(f)

    def _load_npz_dict(path: Path) -> Dict[str, np.ndarray]:
        # Missing file -> empty dict; close the NpzFile handle after reading.
        if not path.exists():
            return {}
        with np.load(path) as d:
            return {k: d[k] for k in d.files}

    name = f"frame_{frame_idx:06d}.npz"
    masks = _load_npz_dict(anno / "side_masks" / name)
    embeddings = _load_npz_dict(anno / "side_embeddings" / name)
    depth_info = _load_npz_dict(anno / "side_depth_info" / name)

    # The robot bundle is always written; keep it only if the robot is visible.
    robot: Optional[Dict[str, np.ndarray]] = None
    robot_bundle = _load_npz_dict(anno / "side_robot" / name)
    if robot_bundle and int(robot_bundle["visible"][0]) == 1:
        robot = robot_bundle

    constraints, visibility = resolve_frame_state(graph, frame_idx)
    return FrameData(graph, masks, embeddings, depth_info, robot, constraints, visibility)
# ─────────────────────────────────────────────────────────────────────────────
# PyG loader: products only
# ─────────────────────────────────────────────────────────────────────────────
def load_pyg_frame_products_only(episode_dir: Path, frame_idx: int) -> Data:
    """Fully connected PyG graph WITHOUT robot.

    Returns Data(
        x=[N, 269],
        edge_index=[2, N*(N-1)],
        edge_attr=[N*(N-1), 3],  # [has_constraint, is_locked, src_blocks_dst]
        y=[1],                   # frame index
        num_nodes=N,
    )
    where N = number of product components (robot excluded).
    """
    fd = load_frame_data(episode_dir, frame_idx)
    graph = fd.graph
    type_vocab = graph["type_vocab"]  # 9 entries incl. robot
    nodes = graph["components"]       # robot already excluded per spec
    N = len(nodes)

    # ── Node features ──
    # [256D SAM2 embedding, 3D position, 9D type one-hot, 1D visibility] = 269D
    x_list = []
    for node in nodes:
        cid = node["id"]
        emb = fd.embeddings.get(cid, np.zeros(256, dtype=np.float32))
        depth_valid_key = f"{cid}_depth_valid"
        centroid_key = f"{cid}_centroid"
        if (depth_valid_key in fd.depth_info
                and int(fd.depth_info[depth_valid_key][0]) == 1):
            pos = fd.depth_info[centroid_key].astype(np.float32)
        else:
            pos = np.zeros(3, dtype=np.float32)  # no valid depth
        type_oh = type_one_hot(node["type"], type_vocab)  # 9D
        vis = 1.0 if fd.visibility.get(cid, True) else 0.0
        feat = np.concatenate([
            emb.astype(np.float32),
            pos,
            np.array(type_oh, dtype=np.float32),
            np.array([vis], dtype=np.float32),
        ])
        x_list.append(feat)
    x = (torch.tensor(np.stack(x_list), dtype=torch.float32)
         if x_list else torch.empty((0, 269)))

    # ── Fully connected edges with 3D features ──
    # Edge feature: [has_constraint, is_locked, src_blocks_dst]
    #   - has_constraint & is_locked are SYMMETRIC for the pair (A, B)
    #   - src_blocks_dst is ASYMMETRIC: 1 if edge's src physically blocks dst
    pair_forward = {}  # frozenset({a, b}) -> (blocker, blocked)
    for e in graph["edges"]:
        pair_forward[frozenset([e["src"], e["dst"]])] = (e["src"], e["dst"])

    src_idx, dst_idx, edge_attr = [], [], []
    for i in range(N):
        for j in range(N):
            if i == j:
                continue  # no self-loops
            src_id = nodes[i]["id"]
            dst_id = nodes[j]["id"]
            src_idx.append(i)
            dst_idx.append(j)
            pair_key = frozenset([src_id, dst_id])
            if pair_key in pair_forward:
                blocker, blocked = pair_forward[pair_key]
                is_locked = fd.constraints.get(f"{blocker}->{blocked}", True)
                edge_attr.append([
                    1.0,
                    1.0 if is_locked else 0.0,
                    1.0 if src_id == blocker else 0.0,
                ])
            else:
                edge_attr.append([0.0, 0.0, 0.0])  # message passing only

    return Data(
        x=x,
        edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long),
        # reshape keeps shape (0, 3) instead of (0,) when N < 2
        edge_attr=torch.tensor(edge_attr, dtype=torch.float32).reshape(-1, 3),
        y=torch.tensor([frame_idx], dtype=torch.long),
        num_nodes=N,
    )
# ─────────────────────────────────────────────────────────────────────────────
# PyG loader: with robot agent node
# ─────────────────────────────────────────────────────────────────────────────
def load_pyg_frame_with_robot(episode_dir: Path, frame_idx: int) -> Data:
    """Fully connected PyG graph WITH robot appended as agent node.

    Robot is node N_prod (the last node). All edges involving the robot have
    features [0, 0, 0] because the robot has no physical constraints.
    If the robot is not visible at this frame, returns the products-only graph.

    Additional attached tensors when robot is visible:
        data.robot_point_cloud   (M, 3) float32
        data.robot_pixel_coords  (M, 2) int32
        data.robot_mask          (H, W) uint8
    """
    data = load_pyg_frame_products_only(episode_dir, frame_idx)
    fd = load_frame_data(episode_dir, frame_idx)
    if fd.robot is None:
        return data  # robot not visible: products-only graph

    type_vocab = fd.graph["type_vocab"]
    N_prod = data.num_nodes
    robot_idx = N_prod  # robot is appended as the last node

    # ── Robot node features (visibility flag is always 1 here) ──
    robot_pos = (fd.robot["centroid"].astype(np.float32)
                 if int(fd.robot["depth_valid"][0]) == 1
                 else np.zeros(3, dtype=np.float32))
    robot_feat = np.concatenate([
        fd.robot["embedding"].astype(np.float32),
        robot_pos,
        np.array(type_one_hot("robot", type_vocab), dtype=np.float32),
        np.array([1.0], dtype=np.float32),
    ])
    x = torch.cat([data.x, torch.tensor(robot_feat, dtype=torch.float32).unsqueeze(0)], dim=0)

    # ── Edges: the products × products block is identical to the products-only graph ──
    src_idx = data.edge_index[0].tolist()
    dst_idx = data.edge_index[1].tolist()
    # Robot ↔ products, both directions, message passing only
    for i in range(N_prod):
        src_idx += [robot_idx, i]
        dst_idx += [i, robot_idx]
    robot_attr = torch.zeros((2 * N_prod, 3), dtype=torch.float32)
    edge_attr = torch.cat([data.edge_attr.reshape(-1, 3), robot_attr], dim=0)

    data = Data(
        x=x,
        edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long),
        edge_attr=edge_attr,
        y=torch.tensor([frame_idx], dtype=torch.long),
        num_nodes=N_prod + 1,
    )
    # Raw robot geometry for optional PointNet-style encoding
    data.robot_point_cloud = torch.tensor(fd.robot["point_cloud"], dtype=torch.float32)
    data.robot_pixel_coords = torch.tensor(fd.robot["pixel_coords"], dtype=torch.int32)
    data.robot_mask = torch.tensor(fd.robot["mask"], dtype=torch.uint8)
    return data

# ─────────────────────────────────────────────────────────────────────────────
# Episode iterator
# ─────────────────────────────────────────────────────────────────────────────
def iterate_episode(episode_dir: Path, with_robot: bool = True):
    """Yield (frame_idx, Data) pairs for all labeled frames in an episode."""
    loader = load_pyg_frame_with_robot if with_robot else load_pyg_frame_products_only
    for frame_idx in list_labeled_frames(episode_dir):
        yield frame_idx, loader(episode_dir, frame_idx)
```
### Usage Examples
#### Variant 1: Constraint Graph Only (No Robot)
```python
from pathlib import Path
from gnn_disassembly_loader import load_pyg_frame_products_only, list_labeled_frames
episode = Path("episode_00") # downloaded from HF
# Enumerate labeled frames
frames = list_labeled_frames(episode)
print(f"Episode has {len(frames)} labeled frames")
# → Episode has 246 labeled frames
# Load one frame as a fully connected PyG graph (products only)
data = load_pyg_frame_products_only(episode, frame_idx=42)
print(data)
# → Data(x=[15, 269], edge_index=[2, 210], edge_attr=[210, 3], y=[1], num_nodes=15)
# For N=15 products: edges = 15 * 14 = 210 (fully connected)
print("Node features:", data.x.shape) # (15, 269)
print("Edges:", data.edge_index.shape) # (2, 210)
print("Edge attrs:", data.edge_attr.shape) # (210, 3) = [has_constraint, is_locked, src_blocks_dst]
# Count edge feature breakdown
has_c = (data.edge_attr[:, 0] == 1).sum().item()
locked = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 1] == 1)).sum().item()
src_blocks = ((data.edge_attr[:, 0] == 1) & (data.edge_attr[:, 2] == 1)).sum().item()
print(f"Edges with physical constraint: {has_c}")
print(f" currently locked: {locked}")
print(f" where src is the blocker: {src_blocks}")
print(f"Message-passing-only edges: {(data.edge_attr[:, 0] == 0).sum().item()}")
```
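Because every physical constraint appears on two directed edges, `has_c` above counts each constrained pair twice. A minimal sketch for listing each constraint once, in blocker → blocked order, by keeping only the edges whose `src_blocks_dst` is 1. It assumes plain Python lists as input (for example `data.edge_index.tolist()` and `data.edge_attr.tolist()`); `list_constraints` is a hypothetical helper, not part of the loader.

```python
from typing import List, Tuple


def list_constraints(edge_index: List[List[int]],
                     edge_attr: List[List[float]]) -> List[Tuple[int, int, bool]]:
    """Return (blocker, blocked, is_locked) once per physical constraint.

    Hypothetical helper: relies on the [has_constraint, is_locked,
    src_blocks_dst] edge layout produced by the loaders above.
    """
    src, dst = edge_index
    out = []
    for s, d, (has_c, locked, src_blocks) in zip(src, dst, edge_attr):
        # Only the direction whose source is the blocker has src_blocks_dst == 1,
        # so each constrained pair is reported exactly once.
        if has_c == 1 and src_blocks == 1:
            out.append((s, d, locked == 1))
    return out
```

The returned values are node indices. In `load_pyg_frame_with_robot`, product node `i` is built from `graph["components"][i]`, so (assuming the products-only loader uses the same order) `fd.graph["components"][i]["id"]` recovers the component name.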
#### Variant 2: Constraint Graph + Robot Agent Node
```python
from pathlib import Path
from gnn_disassembly_loader import load_pyg_frame_with_robot

episode = Path("episode_00")  # same episode as in Variant 1
data = load_pyg_frame_with_robot(episode, frame_idx=42)
print(data)
# → Data(x=[16, 269], edge_index=[2, 240], edge_attr=[240, 3], y=[1], num_nodes=16)
# The robot is the last node (index 15 for a 15-product graph).
# Robot edges: 15 products * 2 directions = 30 extra edges, so 210 + 30 = 240.
# Verify that all robot edges are message-passing only (no constraint).
# This assumes the robot is visible in this frame; otherwise the last node is a product.
robot_idx = data.num_nodes - 1
robot_edges = (data.edge_index[0] == robot_idx) | (data.edge_index[1] == robot_idx)
assert (data.edge_attr[robot_edges] == 0).all(), "Robot edges must be [0, 0, 0]"
print(f"Robot edges: {robot_edges.sum().item()}, all [0, 0, 0]")
# Raw robot data (optional, e.g. for PointNet-style encoding)
print("Robot point cloud:", data.robot_point_cloud.shape)  # (M, 3); M varies per frame
print("Robot mask:", data.robot_mask.shape)  # (720, 1280)
```
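The edge counts quoted in both variants follow directly from the construction: `N_prod * (N_prod - 1)` directed product-product edges, plus `2 * N_prod` robot edges when the robot node is present. A small sketch of that arithmetic, usable as a sanity check against `data.edge_index.size(1)` (the function name is illustrative, not part of the loader):

```python
def expected_edge_count(n_products: int, with_robot: bool) -> int:
    """Number of directed edges in the fully connected graph built by the loaders."""
    product_edges = n_products * (n_products - 1)      # every ordered pair, no self-loops
    robot_edges = 2 * n_products if with_robot else 0  # robot -> product and product -> robot
    return product_edges + robot_edges


print(expected_edge_count(15, with_robot=False))  # 210
print(expected_edge_count(15, with_robot=True))   # 240
```

Note that `load_pyg_frame_with_robot` falls back to the products-only graph when the robot is not visible, so for such frames the count matches `with_robot=False`.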
#### Edge Feature Semantics
Each row of `data.edge_attr` is 3-dimensional: `[has_constraint, is_locked, src_blocks_dst]`.
```
┌────────────────┬───────────┬────────────────┬────────────────────────────────┐
│ has_constraint │ is_locked │ src_blocks_dst │ Meaning                        │
├────────────────┼───────────┼────────────────┼────────────────────────────────┤
│       0        │     0     │       0        │ No physical constraint         │
│                │           │                │ Message passing only           │
├────────────────┼───────────┼────────────────┼────────────────────────────────┤
│       1        │     1     │       1        │ Edge src physically blocks dst │
│                │           │                │ Constraint currently LOCKED    │
├────────────────┼───────────┼────────────────┼────────────────────────────────┤
│       1        │     1     │       0        │ Edge dst physically blocks src │
│                │           │                │ (the reverse direction of the  │
│                │           │                │ physical constraint)           │
│                │           │                │ Constraint currently LOCKED    │
├────────────────┼───────────┼────────────────┼────────────────────────────────┤
│       1        │     0     │       1        │ Edge src physically blocks dst │
│                │           │                │ Constraint RELEASED            │
├────────────────┼───────────┼────────────────┼────────────────────────────────┤
│       1        │     0     │       0        │ Edge dst physically blocks src │
│                │           │                │ Constraint RELEASED            │
└────────────────┴───────────┴────────────────┴────────────────────────────────┘
```
**Important:** The graph is **fully connected and structurally symmetric**: both `(A, B)` and `(B, A)` edges exist for every pair. `has_constraint` and `is_locked` are identical in both directions because they describe the unordered pair. `src_blocks_dst` is direction-dependent: on a constrained pair it is 1 only on the edge whose source is the blocker, and on an unconstrained pair it is 0 in both directions.
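These symmetry invariants can be checked mechanically. A minimal sketch, again assuming plain lists from `edge_index.tolist()` and `edge_attr.tolist()`; `check_edge_symmetry` is a hypothetical helper, not part of the loader.

```python
from typing import Dict, List, Tuple


def check_edge_symmetry(edge_index: List[List[int]], edge_attr: List[List[float]]) -> bool:
    """Return True if the edge features satisfy the symmetry rules described above.

    - every directed edge (a, b) has a reverse edge (b, a);
    - has_constraint and is_locked agree across the two directions;
    - on a constrained pair exactly one direction has src_blocks_dst == 1,
      on an unconstrained pair both directions have src_blocks_dst == 0.
    """
    attrs: Dict[Tuple[int, int], List[float]] = {
        (s, d): a for s, d, a in zip(edge_index[0], edge_index[1], edge_attr)
    }
    for (s, d), (has_c, locked, blocks) in attrs.items():
        rev = attrs.get((d, s))
        if rev is None:
            return False  # missing reverse edge
        r_has_c, r_locked, r_blocks = rev
        if has_c != r_has_c or locked != r_locked:
            return False  # pair-level features must match
        expected = 1 if has_c == 1 else 0
        if blocks + r_blocks != expected:
            return False  # blocker flag must appear on exactly one side (or none)
    return True
```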
**Example: CPU bracket blocks CPU removal**
If `cpu_bracket → cpu` is an active (locked) constraint, the loader produces:
```
Edge (cpu_bracket, cpu): [1, 1, 1] # cpu_bracket blocks cpu, locked, src=blocker
Edge (cpu, cpu_bracket): [1, 1, 0] # same physical pair, src=blocked
```
When the user unlocks the constraint (e.g., after releasing the bracket):
```
Edge (cpu_bracket, cpu): [1, 0, 1] # constraint released, but bracket still named as blocker
Edge (cpu, cpu_bracket): [1, 0, 0]
```
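The per-edge rule behind this example can be reproduced without PyG. The sketch below mirrors the product × product loop in `load_pyg_frame_with_robot` for named components, assuming constraints are given as `(blocker, blocked)` pairs and lock states as a dict keyed `"blocker->blocked"` (the key format the loader reads from `fd.constraints`, where a missing entry counts as locked). `edge_features` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def edge_features(src: str, dst: str,
                  constraints: Set[Tuple[str, str]],
                  lock_state: Dict[str, bool]) -> List[float]:
    """[has_constraint, is_locked, src_blocks_dst] for the directed edge src -> dst."""
    for blocker, blocked in constraints:
        if {blocker, blocked} == {src, dst}:
            # Missing lock entries default to locked, as in the loader.
            is_locked = lock_state.get(f"{blocker}->{blocked}", True)
            return [1.0, 1.0 if is_locked else 0.0, 1.0 if src == blocker else 0.0]
    return [0.0, 0.0, 0.0]  # no physical constraint: message passing only


constraints = {("cpu_bracket", "cpu")}
print(edge_features("cpu_bracket", "cpu", constraints, {}))  # [1.0, 1.0, 1.0]
print(edge_features("cpu", "cpu_bracket", constraints, {}))  # [1.0, 1.0, 0.0]
released = {"cpu_bracket->cpu": False}
print(edge_features("cpu_bracket", "cpu", constraints, released))  # [1.0, 0.0, 1.0]
print(edge_features("cpu", "cpu_bracket", constraints, released))  # [1.0, 0.0, 0.0]
```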
### Iterating the Full Episode
```python
from torch_geometric.loader import DataLoader
from gnn_disassembly_loader import iterate_episode
# Build a dataset list
data_list = [data for _, data in iterate_episode(episode, with_robot=True)]
print(f"Loaded {len(data_list)} frames")
# Batch them for training
loader = DataLoader(data_list, batch_size=8, shuffle=True)
for batch in loader:
print(batch.x.shape, batch.edge_index.shape, batch.edge_attr.shape)
break
```
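When the robot is not visible in a frame, `load_pyg_frame_with_robot` returns the products-only graph, which lacks the `robot_*` attributes. PyG's batching typically expects every `Data` object in a batch to carry the same attributes, so mixing both kinds of frames in one `DataLoader` may fail to collate. One option, sketched below as a hypothetical helper (not part of the loader), is to drop the raw robot tensors before batching:

```python
# Attributes attached by load_pyg_frame_with_robot only when the robot is visible.
RAW_ROBOT_ATTRS = ("robot_point_cloud", "robot_pixel_coords", "robot_mask")


def strip_raw_robot_tensors(data):
    """Delete the raw robot tensors (if present) so all graphs share one attribute set.

    Hypothetical helper: works on torch_geometric.data.Data or any object
    that supports hasattr/delattr. Modifies and returns the same object.
    """
    for name in RAW_ROBOT_ATTRS:
        if hasattr(data, name):
            delattr(data, name)
    return data
```

For example, `data_list = [strip_raw_robot_tensors(d) for _, d in iterate_episode(episode, with_robot=True)]`. Keep the tensors instead if you plan to encode the robot point cloud per frame.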
### Adding Robot State as Node Features (Graph B)
For the perception + robot state variant (Graph B), broadcast the 13D robot state (6 joint angles, 6D TCP pose, 1 gripper position) to every node and concatenate it to the node features:
```python
import numpy as np
import torch
robot_states = np.load(episode / "robot_states.npy")  # (T, 13)

def add_robot_state_to_graph(data, frame_idx, robot_states):
    data = data.clone()  # avoid modifying the caller's graph in place
    robot_state_t = torch.tensor(robot_states[frame_idx], dtype=torch.float32)  # (13,)
    broadcast = robot_state_t.unsqueeze(0).expand(data.num_nodes, -1)  # (N, 13)
    data.x = torch.cat([data.x, broadcast], dim=1)  # (N, 282)
    return data

data_b = add_robot_state_to_graph(data, frame_idx=42, robot_states=robot_states)
print("Graph B node features:", data_b.x.shape)  # (16, 282) for the with_robot variant
```
## Node Feature Layout (269D)
```
[0   : 256]  SAM2 embedding (256D)   - masked average pool over vision_features
[256 : 259]  3D position (3D)        - centroid in camera frame (meters)
[259 : 268]  type one-hot (9D)       - indexed by type_vocab (incl. "robot")
[268]        visibility (1D)         - binary flag
```
Total: **269D per node**.
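The offsets above can be used to slice a node feature vector back into its parts. A NumPy sketch, assuming `type_vocab` is the list stored in `graph["type_vocab"]`; the vocabulary in the example call is made up for illustration, and `decode_node_features` is a hypothetical helper.

```python
import numpy as np

# Offsets from the 269D layout above.
EMB = slice(0, 256)
POS = slice(256, 259)
TYPE = slice(259, 268)
VIS = 268


def decode_node_features(x: np.ndarray, type_vocab: list) -> dict:
    """Split one 269D (or longer, e.g. Graph B) node feature vector into named parts."""
    type_oh = x[TYPE]
    return {
        "embedding": x[EMB],   # (256,) pooled SAM2 embedding
        "position_m": x[POS],  # (3,) camera-frame centroid in meters
        "type": type_vocab[int(np.argmax(type_oh))] if type_oh.any() else None,
        "visible": bool(x[VIS] > 0.5),
    }


vocab = [f"type_{i}" for i in range(8)] + ["robot"]  # hypothetical 9-entry vocabulary
x = np.zeros(269, dtype=np.float32)
x[256:259] = [0.1, -0.2, 0.8]
x[259 + 8] = 1.0  # one-hot for "robot"
x[268] = 1.0
parts = decode_node_features(x, vocab)
print(parts["type"], parts["visible"], parts["position_m"])
```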
For Graph B (with robot state broadcast):
```
[0   : 269]  Graph A features (269D)
[269 : 275]  joint positions (6D)    - UR5e joint angles (radians)
[275 : 281]  TCP pose (6D)           - [x, y, z, rx, ry, rz]
[281]        gripper position (1D)   - Robotiq 2F-85 (0-255)
```
Total: **282D per node**.
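The robot-state tail of a Graph B vector can be sliced the same way; since the same state is broadcast to every node, decoding any row gives the same values. A sketch with a hypothetical helper name; the division by 255 to put the gripper value in [0, 1] is an optional normalization assumed here, not something the dataset applies.

```python
import numpy as np

JOINTS = slice(269, 275)  # UR5e joint angles (radians)
TCP = slice(275, 281)     # [x, y, z, rx, ry, rz]
GRIPPER = 281             # Robotiq 2F-85 position, 0-255


def decode_robot_state(x: np.ndarray) -> dict:
    """Extract the 13D robot state from one 282D Graph B node feature vector."""
    if x.shape[-1] != 282:
        raise ValueError(f"expected 282D Graph B features, got {x.shape[-1]}")
    gripper = float(x[GRIPPER])
    return {
        "joints_rad": x[JOINTS],
        "tcp_pose": x[TCP],
        "gripper_raw": gripper,
        "gripper_norm": gripper / 255.0,  # assumed normalization to [0, 1]
    }
```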
## Raw Data Access (No PyG)
If you prefer raw NumPy without PyTorch Geometric:
```python
from scripts.pyg_loader import load_frame_data
fd = load_frame_data(episode, frame_idx=42)
print("Graph:", fd.graph["components"])
print("Masks:", list(fd.masks.keys()))
print("Resolved visibility:", fd.visibility)
print("Robot present:", fd.robot is not None)
if fd.robot is not None:
    print("Robot mask shape:", fd.robot["mask"].shape)
    print("Robot point cloud:", fd.robot["point_cloud"].shape)
    print("Robot centroid (m):", fd.robot["centroid"])
# Access a specific component's depth info
for key in ["point_cloud", "pixel_coords", "centroid", "area", "depth_valid"]:
    full_key = f"cpu_fan_{key}"
    if full_key in fd.depth_info:
        print(f"cpu_fan {key}: {fd.depth_info[full_key]}")
```
## Recording Hardware
- **Robot:** UR5e + Robotiq 2F-85 gripper
- **Side camera:** Luxonis OAK-D Pro (static viewpoint)
  - Intrinsics (pixels): fx=1033.8, fy=1033.7, cx=632.9, cy=359.9
- **Recording rate:** 30 Hz
- **Image size:** 1280 × 720 (width × height)
- **Depth format:** uint16, millimeters
- **Teleoperation:** Thrustmaster SOL-R2 HOSAS controllers
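The intrinsics and depth format above are enough to lift a pixel to a 3D point in the camera frame. A pinhole-model sketch, assuming the depth image is aligned to the 1280 × 720 color image, lens distortion is negligible, and a depth of 0 marks an invalid reading (none of these is stated above). It illustrates how a camera-frame centroid in meters could be computed from a mask; it is not necessarily the annotation tool's exact method, and both function names are hypothetical.

```python
import numpy as np

# OAK-D Pro side-camera intrinsics listed above (pixels).
FX, FY, CX, CY = 1033.8, 1033.7, 632.9, 359.9


def backproject(u: np.ndarray, v: np.ndarray, depth_mm: np.ndarray) -> np.ndarray:
    """Pinhole back-projection of pixels (u, v) with depth in mm -> (K, 3) points in meters."""
    z = depth_mm.astype(np.float64) / 1000.0  # millimeters -> meters
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x, y, z], axis=-1)


def mask_centroid(mask: np.ndarray, depth_mm: np.ndarray) -> np.ndarray:
    """Mean 3D point over mask pixels with nonzero depth; zeros if none are valid."""
    v, u = np.nonzero((mask > 0) & (depth_mm > 0))  # rows are v, columns are u
    if u.size == 0:
        return np.zeros(3)
    return backproject(u, v, depth_mm[v, u]).mean(axis=0)
```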
## Annotation Tool
Annotations created with a custom SAM2-based labeling tool:
- **Repository:** https://github.com/ChangChrisLiu/gnn-world-model
- **Backend:** FastAPI + SAM2 (`sam2.1_hiera_base_plus`)
- **Frontend:** Vanilla HTML/JS, side-only interactive view
- **Tools:** BBox, Point, Polygon, Brush, Eraser (all mask-editing operations)
- **Features:** Dynamic component instances, AGENT badge for robot, scroll-to-zoom, undo/redo, per-frame delta-encoded visibility
## License
Released under **CC BY 4.0**. Use, share, and adapt freely with attribution.
## Acknowledgements
Built using:
- [Segment Anything Model 2 (SAM2)](https://github.com/facebookresearch/sam2) by Meta AI
- [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/)
- [Hugging Face Datasets](https://huggingface.co/docs/datasets)