---
license: mit
library_name: pytorch
pipeline_tag: other
tags:
- path-planning
- 3d
- voxels
- cnn
- transformer
- robotics
- pytorch
- inference
- Blender
---

### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer)

This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder.

- **Task**: 3D voxel path planning (generate action steps from start to goal)
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN]
- **Framework**: PyTorch
- **License**: MIT

### Model architecture (high level)
- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature
- Planner: Transformer decoder over action tokens with START/END special tokens
- Output: action token sequence; special tokens are excluded from final path

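To make the high-level description concrete, here is a schematic of how such an encoder/decoder could be wired in PyTorch. This is an illustrative sketch only: layer counts, channel widths, and the fusion step are assumptions, and the actual `PathfindingNetwork` in `pathfinding_nn.py` will differ in detail.

```python
import torch
import torch.nn as nn

class VoxelPlannerSketch(nn.Module):
    """Illustrative CNN+Transformer planner; NOT the repo's actual PathfindingNetwork."""
    def __init__(self, n_actions=6, n_special=2, d_model=128):
        super().__init__()
        # Voxel encoder: 3 conv blocks over the (3, 32, 32, 32) input -> 512-d feature
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 32 -> 16
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.Conv3d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(128, 512),
        )
        # Position encoder: learned embeddings over coordinates in [0, 31] -> 64-d each
        self.coord_emb = nn.Embedding(32, 64)
        # Fuse environment feature with start/goal coordinate features
        self.fuse = nn.Linear(512 + 2 * 3 * 64, d_model)
        # Transformer decoder over action tokens plus START/END specials
        self.tok_emb = nn.Embedding(n_actions + n_special, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions + n_special)

    def forward(self, voxel, positions, tokens):
        env = self.encoder(voxel)                                   # (B, 512)
        pos = self.coord_emb(positions).flatten(1)                  # (B, 2*3*64)
        memory = self.fuse(torch.cat([env, pos], dim=1))[:, None]   # (B, 1, d_model)
        tgt = self.tok_emb(tokens)                                  # (B, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.decoder(tgt, memory, tgt_mask=mask))  # (B, T, vocab)
```

At inference time a model like this would be unrolled autoregressively, feeding its own predicted tokens back in until END; the repo's forward pass handles that loop internally.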
### Inputs and outputs
- **Input tensors**
  - `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]`
    Channels: [obstacles, start_mask, goal_mask]
  - `positions`: long tensor of shape `[1, 2, 3]`
    Format: `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]`
- **Output**
  - Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed

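If you want to assemble these tensors by hand rather than via the repo's `create_voxel_input` helper, the layout described above can be built directly. The helper function name `make_inputs` below is hypothetical; the channel order follows the description in this section.

```python
import numpy as np
import torch

def make_inputs(obstacles, start, goal):
    """Assemble the model's input tensors by hand (hypothetical helper).

    obstacles: (32, 32, 32) array of 0/1 occupancy.
    start, goal: (x, y, z) integer tuples with each coordinate in [0, 31].
    """
    voxel = np.zeros((3, 32, 32, 32), dtype=np.float32)
    voxel[0] = obstacles               # channel 0: obstacle occupancy
    voxel[1][start] = 1.0              # channel 1: one-hot start mask
    voxel[2][goal] = 1.0               # channel 2: one-hot goal mask
    voxel_t = torch.from_numpy(voxel).unsqueeze(0)           # [1, 3, 32, 32, 32]
    pos_t = torch.tensor([[start, goal]], dtype=torch.long)  # [1, 2, 3]
    return voxel_t, pos_t

v, p = make_inputs(np.zeros((32, 32, 32), np.float32), (0, 0, 0), (5, 6, 7))
```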
### Quickstart (inference)
Make sure this repo includes both `final_model.pth` (or `model_state_dict`) and `pathfinding_nn.py`.

```python
import torch, numpy as np
from huggingface_hub import hf_hub_download
import importlib.util

REPO_ID = "c1tr0n75/VoxelPathFinder"
# GitHub: https://github.com/c1tr0n75/VoxelPathFinder

# Download files from the Hub
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth")
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py")

# Dynamically import the model code
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PathfindingNetwork = mod.PathfindingNetwork
create_voxel_input = mod.create_voxel_input

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PathfindingNetwork().to(device).eval()

# Load weights (supports either a plain state_dict or {'model_state_dict': ...})
ckpt = torch.load(pth_path, map_location=device)
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt
model.load_state_dict(state)

# Build a random test environment
voxel_dim = model.voxel_dim  # (32, 32, 32)
D, H, W = voxel_dim
obstacle_prob = 0.2
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32)
free = np.argwhere(obstacles == 0)
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob"
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False)
start = tuple(free[s_idx])
goal = tuple(free[g_idx])

voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim)  # (3,32,32,32)
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device)          # (1,3,32,32,32)
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device)        # (1,2,3)

with torch.no_grad():
    actions = model(voxel, pos)[0].tolist()

ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN']
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6]
print(f"Start: {start} | Goal: {goal}")
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}")
```

### Intended uses and limitations
- **Intended**: Research and demo of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments.
- **Not intended**: Safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments.
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended.

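One way to do that downstream validation is to replay the generated actions on the obstacle grid and check bounds, collisions, and goal arrival. A minimal sketch follows; note that the action-to-offset convention below is an assumption (the card does not fix which axis FORWARD moves along), so match it to the model's actual semantics before relying on it.

```python
import numpy as np

# Assumed (dx, dy, dz) offsets for actions 0..5 = FORWARD, BACK, LEFT, RIGHT, UP, DOWN.
DELTAS = [(1, 0, 0), (-1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, 1), (0, 0, -1)]

def validate_path(obstacles, start, goal, actions):
    """Replay an action sequence; return (ok, reason).

    ok is True only if the walk stays inside the grid, never enters an
    obstacle cell, and ends exactly at the goal.
    """
    pos = np.array(start)
    for i, a in enumerate(actions):
        pos = pos + DELTAS[a]
        if not all(0 <= c < d for c, d in zip(pos, obstacles.shape)):
            return False, f"out of bounds at step {i}"
        if obstacles[tuple(pos)]:
            return False, f"collision at step {i}"
    if tuple(pos) != tuple(goal):
        return False, "did not reach goal"
    return True, "ok"

grid = np.zeros((32, 32, 32), dtype=np.float32)
print(validate_path(grid, (0, 0, 0), (2, 0, 0), [0, 0]))  # (True, 'ok')
```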
### Training data and procedure
- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization).
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision components.

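The cross-entropy part of that objective, computed over padded action-token sequences, could look like the sketch below. The padding-token ID is hypothetical, and the auxiliary turn/collision terms are model-specific and omitted here.

```python
import torch
import torch.nn.functional as F

PAD_ID = 7  # hypothetical padding-token ID; the repo's token vocabulary may differ

def action_ce_loss(logits, targets):
    """Token-level cross-entropy over action sequences, ignoring padded positions.

    logits: (B, T, vocab) decoder outputs; targets: (B, T) token IDs.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (B*T, vocab)
        targets.reshape(-1),                  # flatten to (B*T,)
        ignore_index=PAD_ID,                  # padded positions contribute no loss
    )

logits = torch.randn(2, 10, 8)
targets = torch.randint(0, 6, (2, 10))
loss = action_ce_loss(logits, targets)
```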
### Ethical considerations
- This is a research model for toy 3D grids. It is not validated for real-world navigation where safety, environment dynamics, and constraints apply.

### Citation
If you use this model, please cite this repository: