---
license: mit
datasets:
- heig-vd-geo/GridNet-HD
language:
- en
metrics:
- mean_iou
base_model:
- openmmlab/upernet-swin-tiny
---

# GridNet-HD Baseline: Image Semantic Segmentation and LiDAR Projection Framework

## Overview

This repository provides a reproducible implementation of the semantic segmentation pipeline and 3D projection baseline used in our paper introducing the **GridNet-HD** dataset.

The framework includes:

* A transformer-based semantic segmentation pipeline built on `UperNetForSemanticSegmentation` (via HuggingFace Transformers).
* Support for high-resolution aerial imagery, using random crops during training and sliding-window inference at test time.
* Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
* JAX-accelerated operations for efficient 3D projection.
* Logging and experiment tracking with Weights & Biases.

This implementation serves as one of the official baselines for GridNet-HD.

---

## Table of Contents

* [Project Structure](#project-structure)
* [Configuration](#configuration)
* [Environment](#environment)
* [Dataset Structure](#dataset-structure)
* [Installation](#setup--installation)
* [Supported Modes](#supported-modes)
* [Results](#results)
* [Pretrained Weights](#pretrained-weights)
* [Usage Examples](#usage-examples)
* [Weights & Biases Integration](#weights--biases-integration)
* [License](#license)
* [Contact](#contact)
* [Citation](#citation)

---

## Project Structure

```
project_root/
├── main.py                    # Pipeline entry point
├── config.yaml                # Main configuration file
├── datasets/
│   └── semantic_dataset.py    # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py     # Model loading utility
├── train/
│   ├── train.py               # Training loop
│   └── eval.py                # Evaluation loop
├── inference/
│   ├── inference.py           # Sliding-window inference and output saving
│   ├── sliding_window.py      # Core logic for windowed inference
│   └── export_logits.py       # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py    # Projection of predictions to LiDAR space
│   └── fast_proj.py           # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py       # Logging setup
│   ├── metrics.py             # Evaluation metrics (IoU, F1)
│   └── seed.py                # Reproducibility utilities
├── best_model.pth             # Weights of the best model
└── requirements.txt           # Python dependencies
```

---

## Configuration

All parameters are managed in **config.yaml**. Key sections include:

* `data`: paths, input dimensions, normalization statistics, class remapping.
* `training`: optimizer settings, learning-rate schedule, checkpoint directory.
* `val`: batch sizes, projection parameters.
* `model`: pretrained backbone, number of classes, ignore index.
* `wandb`: project and entity names for Weights & Biases tracking.

Adjust these settings to match your dataset and compute environment. A minimal sketch of how the `class_map` remapping can be applied follows the example below.

**Example `config.yaml`:**

```yaml
data:
  root_dir: "/path/to/GridNet-HD"              # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                    # Resize image and mask, PIL style (width, height)
  crop_size: [512, 512]                        # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]   # original values
      value: 0                # new value (remap value)
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255

model:
  pretrained_model: "openmmlab/upernet-swin-tiny" # Small and base versions are also available (HuggingFace)
  num_classes: 11                                 # Target classes
  ignore_index: 255                               # 'ignore' in loss & metrics

training:
  output_dir: "./outputs/run" # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8              # Parallel workers for the DataLoader
  lr: 0.0001                  # Initial learning rate
  sched_step: 10              # Scheduler: step every N epochs
  sched_gamma: 0.5            # Multiply LR by this gamma
  epochs: 60
  eval_every: 5               # Evaluate every N epochs

val:
  batch_size: 8               # Number of images per batch during validation and test
  num_workers: 8              # Parallel workers for the DataLoader
  batch_size_proj: 5000000    # Number of points per batch to project onto images

wandb:
  project: "GridNet-HD-ImageOnly" # Only used for training and validation
  entity: "your-team"
```
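The `class_map` entries remap raw label values onto the 11 training classes plus the `ignore_index`. The snippet below is a minimal sketch of one way to apply such a remapping with a lookup table; it assumes PyYAML and NumPy, and the helper name `build_lut` is illustrative, not part of this codebase.

```python
# Minimal sketch (not the repository's implementation): apply the
# `data.class_map` remapping from config.yaml to a label mask using a
# 256-entry NumPy lookup table. Assumes labels fit in uint8 (0..255).
import numpy as np
import yaml

def build_lut(class_map, ignore_index=255):
    """Build a lookup table from the config's class_map entries."""
    lut = np.full(256, ignore_index, dtype=np.uint8)  # unmapped values -> ignore
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

lut = build_lut(cfg["data"]["class_map"], cfg["model"]["ignore_index"])

mask = np.array([[0, 5, 14], [21, 13, 255]], dtype=np.uint8)
print(lut[mask])  # [[  0   1   4]
                  #  [ 10 255 255]]
```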
---

## Environment

The following environment was used to train and evaluate the baseline model:

| Component       | Details                      |
| --------------- | ---------------------------- |
| GPU             | NVIDIA A40 (48 GB VRAM)      |
| CUDA Version    | 12.x                         |
| OS              | Ubuntu 22.04 LTS             |
| Python Version  | 3.12                         |
| PyTorch Version | 2.7+cu126                    |
| Transformers    | 🤗 Transformers 4.51         |
| JAX             | jax==0.6.0                   |
| laspy           | >= 2.0                       |
| RAM             | 256 GB (≥ 64 GB recommended) |

⚠️ Batch sliding-window inference and JAX-based 3D projection on large scenes benefit from high VRAM. If you hit a CUDA OOM error, decrease:

```
val:
  batch_size
  batch_size_proj
```

---

## Dataset Structure

The input data is organized by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. The layout is identical to that of the GridNet-HD dataset (see the [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) page for more information).

---

## Setup & Installation

1. **Clone the repository**:

```bash
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
cd ImageVote_GridNet-HD_baseline
```

2. **Create a conda virtual environment**:

```bash
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
```

3. **Install dependencies**:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

---

## Supported Modes

Each mode is selected via the `--mode` argument in `main.py`.

| Mode               | Description                                                                                        |
| ------------------ | -------------------------------------------------------------------------------------------------- |
| `train`            | Train the image segmentation model                                                                 |
| `val`              | Evaluate the model on the validation set (2D) and report metrics at image level                    |
| `test`             | Run inference on the test set (saves predicted masks)                                              |
| `test3d`           | Run inference and reproject predictions onto the LiDAR cloud (3D), saved in the `classif` field of the LAS file |
| `val3d`            | Evaluate predictions projected onto the LiDAR cloud (3D) and report metrics at 3D level             |
| `export_probs`     | Export softmax probabilities for each input image                                                  |
| `project_probs_3d` | Project per-image softmax probabilities onto each LiDAR point cloud (used to train the third baseline) |

The two sketches below illustrate the ideas behind sliding-window inference (`val`/`test`) and the 3D projection with visibility filtering (`val3d`/`test3d`).
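For `val` and `test`, full-resolution images are processed in overlapping `crop_size` windows whose logits are accumulated and averaged per pixel. The snippet below is a self-contained sketch of that idea, not the code in `inference/sliding_window.py`; the `model` callable and its output shape are illustrative assumptions.

```python
# Minimal sketch of sliding-window inference (illustrative; the actual
# logic lives in inference/sliding_window.py). Overlapping windows are
# scored independently and their logits averaged per pixel.
import torch

def window_starts(size: int, window: int, stride: int) -> list[int]:
    """Start offsets covering [0, size), last window flush to the border."""
    if size <= window:
        return [0]
    starts = list(range(0, size - window, stride))
    starts.append(size - window)
    return starts

@torch.no_grad()
def sliding_window_logits(model, image: torch.Tensor, num_classes: int,
                          window: int = 512, stride: int = 256) -> torch.Tensor:
    """image: (C, H, W) with H, W >= window; returns (num_classes, H, W)."""
    _, h, w = image.shape
    logits = torch.zeros(num_classes, h, w)
    counts = torch.zeros(1, h, w)
    for y in window_starts(h, window, stride):
        for x in window_starts(w, window, stride):
            tile = image[:, y:y + window, x:x + window].unsqueeze(0)
            out = model(tile)  # assumed: (1, num_classes, window, window) logits
            logits[:, y:y + window, x:x + window] += out[0]
            counts[:, y:y + window, x:x + window] += 1
    return logits / counts  # every pixel is covered at least once
```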
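For `test3d`/`val3d`, each LiDAR point is projected into every camera, and a point only receives a label from images in which it is actually visible, decided by a depth-buffer (z-buffer) test. Below is a heavily simplified JAX sketch of pinhole projection plus depth-based visibility filtering; the camera model (`R`, `t`, `K`), function names, and tolerance are illustrative assumptions, not the Agisoft conventions implemented in `projection/fast_proj.py`.

```python
# Simplified sketch of depth-based visibility filtering in JAX
# (illustrative only; projection/fast_proj.py implements the actual
# Agisoft camera conventions).
import jax.numpy as jnp

def project_points(points, R, t, K):
    """World points (N, 3) -> pixel coords (N, 2) and camera depths (N,)."""
    cam = points @ R.T + t                    # world -> camera frame
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]  # perspective division
    return uv, depth

def visibility(uv, depth, h, w, tol=0.1):
    """Boolean mask of points inside the image and not occluded."""
    u = jnp.clip(jnp.round(uv[:, 0]).astype(jnp.int32), 0, w - 1)
    v = jnp.clip(jnp.round(uv[:, 1]).astype(jnp.int32), 0, h - 1)
    in_view = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                          & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    # z-buffer: nearest depth observed at each pixel (scatter-min).
    zbuf = jnp.full((h, w), jnp.inf).at[v, u].min(
        jnp.where(in_view, depth, jnp.inf))
    # A point is visible if its depth matches the nearest depth at its pixel.
    return in_view & (jnp.abs(depth - zbuf[v, u]) < tol)
```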
---

## Results

The following table summarizes the per-class Intersection over Union (IoU) scores on the test set, evaluated at the 3D (point) level. The model was trained using the configuration specified in `config.yaml`.

| Class                  | IoU (Test set) (%) |
| ---------------------- | ------------------ |
| Pylon                  | 85.09              |
| Conductor cable        | 64.82              |
| Structural cable       | 45.06              |
| Insulator              | 71.07              |
| High vegetation        | 83.86              |
| Low vegetation         | 63.43              |
| Herbaceous vegetation  | 84.45              |
| Rock, gravel, soil     | 38.62              |
| Impervious soil (Road) | 80.69              |
| Water                  | 74.87              |
| Building               | 68.09              |
| **Mean IoU (mIoU)**    | **69.10**          |

## Pretrained Weights

🔗 **Pretrained weights** for the best-performing model can be downloaded directly from this repository.

> This checkpoint corresponds to the model trained using the configuration in `config.yaml`, achieving a mean IoU of **69.10%** on the test set.

---

## Usage Examples

### Training

```bash
python main.py --mode train --config config.yaml
```

### 2D Validation

```bash
python main.py --mode val --weights_path best_model.pth
```

### 2D Inference

```bash
python main.py --mode test --weights_path best_model.pth
```

### 3D Inference (with LiDAR projection)

```bash
python main.py --mode test3d --weights_path best_model.pth
```

### 3D Validation

```bash
python main.py --mode val3d --weights_path best_model.pth
```

### Export Softmax Probabilities

```bash
python main.py --mode export_probs --weights_path best_model.pth
```

### Project Softmax Probabilities onto LiDAR

```bash
python main.py --mode project_probs_3d --weights_path best_model.pth
```

---

## Weights & Biases Integration

To log training and evaluation to Weights & Biases:

```bash
wandb login
```

Then set the `project` and `entity` fields in your `config.yaml` file.

---

## License

This project is open-sourced under the MIT License.

---

## Contact

For questions, issues, or contributions, please open an issue on the repository.

---

## Citation

If you use this repository in your research, please cite:

> GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure.
> Masked Authors. Submitted to CVPR 2026.