---
license: mit
datasets:
- heig-vd-geo/GridNet-HD
language:
- en
metrics:
- mean_iou
base_model:
- openmmlab/upernet-swin-tiny
---

# GridNet-HD Baseline: Image Semantic Segmentation and LiDAR Projection Framework

## Overview

This repository provides a reproducible implementation of the semantic segmentation pipeline and 3D projection baseline used in our paper introducing the **GridNet-HD** dataset.

The framework includes:

* A transformer-based semantic segmentation pipeline built on `UperNetForSemanticSegmentation` (via HuggingFace Transformers).
* Support for high-resolution aerial imagery, using random crops during training and sliding-window inference at test time.
* Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
* JAX-accelerated operations for efficient 3D projection.
* Logging and experiment tracking with Weights & Biases.

This implementation serves as one of the official baselines for GridNet-HD.

---

## Table of Contents

* [Project Structure](#project-structure)
* [Configuration](#configuration)
* [Environment](#environment)
* [Dataset Structure](#dataset-structure)
* [Installation](#setup--installation)
* [Supported Modes](#supported-modes)
* [Results](#results)
* [Pretrained Weights](#pretrained-weights)
* [Usage Examples](#usage-examples)
* [Weights & Biases Integration](#weights--biases-integration)
* [License](#license)
* [Contact](#contact)
* [Citation](#citation)

---

## Project Structure

```
project_root/
├── main.py                    # Pipeline entry point
├── config.yaml                # Main configuration file
├── datasets/
│   └── semantic_dataset.py    # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py     # Model loading utility
├── train/
│   ├── train.py               # Training loop
│   └── eval.py                # Evaluation loop
├── inference/
│   ├── inference.py           # Sliding-window inference and output saving
│   ├── sliding_window.py      # Core logic for windowed inference
│   └── export_logits.py       # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py    # Projection of predictions to LiDAR space
│   └── fast_proj.py           # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py       # Logging setup
│   ├── metrics.py             # Evaluation metrics (IoU, F1)
│   └── seed.py                # Reproducibility utilities
├── best_model.pth             # Weights of the best model
└── requirements.txt           # Python dependencies
```

---

## Configuration

All parameters are managed in **config.yaml**. Key sections include:

* `data`: paths, input dimensions, normalization statistics, class remapping.
* `training`: optimizer settings, learning-rate schedule, checkpoint directory.
* `val`: batch sizes, projection parameters.
* `model`: pretrained backbone, number of classes, ignore index.
* `wandb`: project and entity names for Weights & Biases tracking.

Adjust these settings to match your dataset and compute environment. A minimal sketch of how the `class_map` remapping can be applied follows the example below.

**Example `config.yaml`:**

```yaml
data:
  root_dir: "/path/to/GridNet-HD"              # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                    # Resize image and mask, PIL style (width, height)
  crop_size: [512, 512]                        # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]   # original values
      value: 0                # new value (remap value)
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255

model:
  pretrained_model: "openmmlab/upernet-swin-tiny" # Small and base versions are also available (HuggingFace)
  num_classes: 11                                 # Target classes
  ignore_index: 255                               # 'ignore' in loss & metrics

training:
  output_dir: "./outputs/run" # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8              # Parallel workers for the DataLoader
  lr: 0.0001                  # Initial learning rate
  sched_step: 10              # Scheduler: step every N epochs
  sched_gamma: 0.5            # Multiply LR by this gamma
  epochs: 60
  eval_every: 5               # Evaluate every N epochs

val:
  batch_size: 8               # Number of images per batch during validation and test
  num_workers: 8              # Parallel workers for the DataLoader
  batch_size_proj: 5000000    # Number of points per batch to project onto images

wandb:
  project: "GridNet-HD-ImageOnly" # Only used for training and validation
  entity: "your-team"
```
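The `class_map` entries remap raw label values onto the 11 training classes plus the `ignore_index`. The snippet below is a minimal sketch of one way to apply such a remapping with a lookup table; it assumes PyYAML and NumPy, and the helper name `build_lut` is illustrative, not part of this codebase.

```python
# Minimal sketch (not the repository's implementation): apply the
# `data.class_map` remapping from config.yaml to a label mask using a
# 256-entry NumPy lookup table. Assumes labels fit in uint8 (0..255).
import numpy as np
import yaml

def build_lut(class_map, ignore_index=255):
    """Build a lookup table from the config's class_map entries."""
    lut = np.full(256, ignore_index, dtype=np.uint8)  # unmapped values -> ignore
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

lut = build_lut(cfg["data"]["class_map"], cfg["model"]["ignore_index"])

mask = np.array([[0, 5, 14], [21, 13, 255]], dtype=np.uint8)
print(lut[mask])  # [[  0   1   4]
                  #  [ 10 255 255]]
```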
---

## Environment

The following environment was used to train and evaluate the baseline model:

| Component       | Details                      |
| --------------- | ---------------------------- |
| GPU             | NVIDIA A40 (48 GB VRAM)      |
| CUDA Version    | 12.x                         |
| OS              | Ubuntu 22.04 LTS             |
| Python Version  | 3.12                         |
| PyTorch Version | 2.7+cu126                    |
| Transformers    | 🤗 Transformers 4.51         |
| JAX             | jax==0.6.0                   |
| laspy           | >= 2.0                       |
| RAM             | 256 GB (≥ 64 GB recommended) |

⚠️ Batch sliding-window inference and JAX-based 3D projection on large scenes benefit from high VRAM. If you hit a CUDA OOM error, decrease:

```
val:
  batch_size
  batch_size_proj
```

---

## Dataset Structure

The input data is organized by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. The layout is identical to that of the GridNet-HD dataset (see the [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) page for more information).

---

## Setup & Installation

1. **Clone the repository**:

```bash
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
cd ImageVote_GridNet-HD_baseline
```

2. **Create a conda virtual environment**:

```bash
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
```

3. **Install dependencies**:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

---

## Supported Modes

Each mode is selected via the `--mode` argument in `main.py`.

| Mode               | Description                                                                                        |
| ------------------ | -------------------------------------------------------------------------------------------------- |
| `train`            | Train the image segmentation model                                                                 |
| `val`              | Evaluate the model on the validation set (2D) and report metrics at image level                    |
| `test`             | Run inference on the test set (saves predicted masks)                                              |
| `test3d`           | Run inference and reproject predictions onto the LiDAR cloud (3D), saved in the `classif` field of the LAS file |
| `val3d`            | Evaluate predictions projected onto the LiDAR cloud (3D) and report metrics at 3D level             |
| `export_probs`     | Export softmax probabilities for each input image                                                  |
| `project_probs_3d` | Project per-image softmax probabilities onto each LiDAR point cloud (used to train the third baseline) |

The two sketches below illustrate the ideas behind sliding-window inference (`val`/`test`) and the 3D projection with visibility filtering (`val3d`/`test3d`).
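For `val` and `test`, full-resolution images are processed in overlapping `crop_size` windows whose logits are accumulated and averaged per pixel. The snippet below is a self-contained sketch of that idea, not the code in `inference/sliding_window.py`; the `model` callable and its output shape are illustrative assumptions.

```python
# Minimal sketch of sliding-window inference (illustrative; the actual
# logic lives in inference/sliding_window.py). Overlapping windows are
# scored independently and their logits averaged per pixel.
import torch

def window_starts(size: int, window: int, stride: int) -> list[int]:
    """Start offsets covering [0, size), last window flush to the border."""
    if size <= window:
        return [0]
    starts = list(range(0, size - window, stride))
    starts.append(size - window)
    return starts

@torch.no_grad()
def sliding_window_logits(model, image: torch.Tensor, num_classes: int,
                          window: int = 512, stride: int = 256) -> torch.Tensor:
    """image: (C, H, W) with H, W >= window; returns (num_classes, H, W)."""
    _, h, w = image.shape
    logits = torch.zeros(num_classes, h, w)
    counts = torch.zeros(1, h, w)
    for y in window_starts(h, window, stride):
        for x in window_starts(w, window, stride):
            tile = image[:, y:y + window, x:x + window].unsqueeze(0)
            out = model(tile)  # assumed: (1, num_classes, window, window) logits
            logits[:, y:y + window, x:x + window] += out[0]
            counts[:, y:y + window, x:x + window] += 1
    return logits / counts  # every pixel is covered at least once
```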
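For `test3d`/`val3d`, each LiDAR point is projected into every camera, and a point only receives a label from images in which it is actually visible, decided by a depth-buffer (z-buffer) test. Below is a heavily simplified JAX sketch of pinhole projection plus depth-based visibility filtering; the camera model (`R`, `t`, `K`), function names, and tolerance are illustrative assumptions, not the Agisoft conventions implemented in `projection/fast_proj.py`.

```python
# Simplified sketch of depth-based visibility filtering in JAX
# (illustrative only; projection/fast_proj.py implements the actual
# Agisoft camera conventions).
import jax.numpy as jnp

def project_points(points, R, t, K):
    """World points (N, 3) -> pixel coords (N, 2) and camera depths (N,)."""
    cam = points @ R.T + t                    # world -> camera frame
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]  # perspective division
    return uv, depth

def visibility(uv, depth, h, w, tol=0.1):
    """Boolean mask of points inside the image and not occluded."""
    u = jnp.clip(jnp.round(uv[:, 0]).astype(jnp.int32), 0, w - 1)
    v = jnp.clip(jnp.round(uv[:, 1]).astype(jnp.int32), 0, h - 1)
    in_view = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                          & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    # z-buffer: nearest depth observed at each pixel (scatter-min).
    zbuf = jnp.full((h, w), jnp.inf).at[v, u].min(
        jnp.where(in_view, depth, jnp.inf))
    # A point is visible if its depth matches the nearest depth at its pixel.
    return in_view & (jnp.abs(depth - zbuf[v, u]) < tol)
```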
---

## Results

The following table summarizes the per-class Intersection over Union (IoU) scores on the test set, evaluated at the 3D (point) level. The model was trained using the configuration specified in `config.yaml`.

| Class                  | IoU (Test set) (%) |
| ---------------------- | ------------------ |
| Pylon                  | 85.09              |
| Conductor cable        | 64.82              |
| Structural cable       | 45.06              |
| Insulator              | 71.07              |
| High vegetation        | 83.86              |
| Low vegetation         | 63.43              |
| Herbaceous vegetation  | 84.45              |
| Rock, gravel, soil     | 38.62              |
| Impervious soil (Road) | 80.69              |
| Water                  | 74.87              |
| Building               | 68.09              |
| **Mean IoU (mIoU)**    | **69.10**          |

## Pretrained Weights

🔗 **Pretrained weights** for the best-performing model can be downloaded directly from this repository.

> This checkpoint corresponds to the model trained using the configuration in `config.yaml`, achieving a mean IoU of **69.10%** on the test set.

---

## Usage Examples

### Training

```bash
python main.py --mode train --config config.yaml
```

### 2D Validation

```bash
python main.py --mode val --weights_path best_model.pth
```

### 2D Inference

```bash
python main.py --mode test --weights_path best_model.pth
```

### 3D Inference (with LiDAR projection)

```bash
python main.py --mode test3d --weights_path best_model.pth
```

### 3D Validation

```bash
python main.py --mode val3d --weights_path best_model.pth
```

### Export Softmax Probabilities

```bash
python main.py --mode export_probs --weights_path best_model.pth
```

### Project Softmax Probabilities onto LiDAR

```bash
python main.py --mode project_probs_3d --weights_path best_model.pth
```

---

## Weights & Biases Integration

To log training and evaluation to Weights & Biases:

```bash
wandb login
```

Then set the `project` and `entity` fields in your `config.yaml` file.

---

## License

This project is open-sourced under the MIT License.

---

## Contact

For questions, issues, or contributions, please open an issue on the repository.

---

## Citation

If you use this repository in your research, please cite:

> GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure.
> Masked Authors. Submitted to CVPR 2026.