---
license: mit
datasets:
- heig-vd-geo/GridNet-HD
language:
- en
metrics:
- mean_iou
base_model:
- openmmlab/upernet-swin-tiny
---
# GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework
## Overview
This repository provides a reproducible implementation of the semantic segmentation pipeline and 3D projection baseline used in the paper introducing the **GridNet-HD** dataset. The framework includes:
* A transformer-based semantic segmentation pipeline built on `UperNetForSemanticSegmentation` (via Hugging Face Transformers).
* Support for high-resolution aerial imagery, using random cropping during training and sliding-window inference at test time (see the sketch below).
* Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
* JAX-accelerated operations for efficient 3D projection.
* Logging and experiment tracking with Weights & Biases.
This implementation serves as one of the official baselines for GridNet-HD.
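As a rough illustration of the sliding-window strategy, the sketch below averages overlapping window logits over a full-resolution image. It is a minimal sketch, not the repository's implementation (which lives in `inference/sliding_window.py`), and assumes a Hugging Face `UperNetForSemanticSegmentation`-style model returning `.logits`:
```python
import torch
import torch.nn.functional as F

def _positions(size, crop, stride):
    # Window start offsets; make sure the last window touches the border.
    last = max(size - crop, 0)
    pos = list(range(0, last + 1, stride))
    if pos[-1] != last:
        pos.append(last)
    return pos

@torch.no_grad()
def sliding_window_logits(model, image, crop=512, stride=256, num_classes=11):
    """Average overlapping window logits over a (1, 3, H, W) image tensor."""
    _, _, h, w = image.shape
    logits = torch.zeros(1, num_classes, h, w, device=image.device)
    counts = torch.zeros(1, 1, h, w, device=image.device)
    for y in _positions(h, crop, stride):
        for x in _positions(w, crop, stride):
            tile = image[:, :, y:y + crop, x:x + crop]
            out = model(pixel_values=tile).logits
            # Resize to the tile size (a no-op if the model already upsamples).
            out = F.interpolate(out, size=tile.shape[-2:],
                                mode="bilinear", align_corners=False)
            logits[:, :, y:y + crop, x:x + crop] += out
            counts[:, :, y:y + crop, x:x + crop] += 1
    return logits / counts  # per-pixel averaged logits; argmax(dim=1) for classes
```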
---
## Table of Contents
* [Project Structure](#project-structure)
* [Configuration](#configuration)
* [Environment](#environment)
* [Dataset Structure](#dataset-structure)
* [Setup & Installation](#setup--installation)
* [Supported Modes](#supported-modes)
* [Results](#results)
* [Pretrained Weights](#pretrained-weights)
* [Usage Examples](#usage-examples)
* [Weights & Biases Integration](#weights--biases-integration)
* [License](#license)
* [Contact](#contact)
* [Citation](#citation)
---
## Project Structure
```
project_root/
β”œβ”€β”€ main.py # Pipeline entry point
β”œβ”€β”€ config.yaml # Main configuration file
β”œβ”€β”€ datasets/
β”‚ └── semantic_dataset.py # Semantic segmentation dataset class
β”œβ”€β”€ models/
β”‚ └── upernet_wrapper.py # Model loading utility
β”œβ”€β”€ train/
β”‚ β”œβ”€β”€ train.py # Training loop
β”‚ └── eval.py # Evaluation loop
β”œβ”€β”€ inference/
β”‚ β”œβ”€β”€ inference.py # Sliding window inference and output saving
β”‚ β”œβ”€β”€ sliding_window.py # Core logic for windowed inference
β”‚ └── export_logits.py # Export of softmax probabilities
β”œβ”€β”€ projection/
β”‚ β”œβ”€β”€ lidar_projection.py # Projection of predictions to LiDAR space
β”‚ └── fast_proj.py # Utilities for projection (Agisoft conventions), accelerated with JAX
β”œβ”€β”€ utils/
β”‚ β”œβ”€β”€ logging_utils.py # Logging setup
β”‚ β”œβ”€β”€ metrics.py # Evaluation metrics (IoU, F1)
β”‚ └── seed.py # Reproducibility utilities
β”œβ”€β”€ best_model.pth # Weights for best model
└── requirements.txt # Python dependencies
```
---
## Configuration
All parameters are managed in **config.yaml**. Key sections include:
* `data`: paths, input dimensions, normalization statistics, class remapping.
* `training`: optimizer settings, learning rate schedule, checkpoint directory.
* `validation`: batch sizes, projection parameters.
* `model`: pretrained backbone, number of classes, ignore index.
* `wandb`: project and entity names for Weights & Biases tracking.
Adjust these settings to match your dataset and compute environment.
**Example `config.yaml`:**
```yaml
data:
root_dir: "/path/to/GridNet-HD" # Root folder containing t1z4, t2z5, etc.
split_file: "/path/to/GridNet-HD/split.json" # JSON split file listing train/val/test folders
resize_size: [1760, 1318] # resize image and mask, PIL style (width, height)
crop_size: [512, 512] # random-crop (train) or sliding-window (val/test) to this size
# Image normalization
mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]
class_map:
- keys: [0, 1, 2, 3, 4] # original values
value: 0 # new value (remap value)
- keys: [5]
value: 1
- keys: [6, 7]
value: 2
- keys: [8, 9, 10, 11]
value: 3
- keys: [14]
value: 4
- keys: [15]
value: 5
- keys: [16]
value: 6
- keys: [17, 18]
value: 7
- keys: [19]
value: 8
- keys: [20]
value: 9
- keys: [21]
value: 10
- keys: [12, 13, 255]
value: 255
model:
pretrained_model: "openmmlab/upernet-swin-tiny" # swin-small and swin-base variants are also available (HuggingFace)
num_classes: 11 # target classes
ignore_index: 255 # 'ignore' in loss & metrics
training:
output_dir: "./outputs/run" # Where to save checkpoints & logs
seed: 42
batch_size: 32
num_workers: 8 # parallel workers for DataLoader
lr: 0.0001 # Initial learning rate
sched_step: 10 # Scheduler: step every N epochs
sched_gamma: 0.5 # multiply LR by this gamma
epochs: 60
eval_every: 5 # eval every n epochs
val:
batch_size: 8 # number of images per batch during validation and test
num_workers: 8 # parallel workers for DataLoader
batch_size_proj: 5000000 # number of points per batch to project on images
wandb:
project: "GridNet-HD-ImageOnly" # only used for training and validation
entity: "your-team"
```
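For reference, the sketch below shows one way such a `class_map` can be applied to a label mask, via a 256-entry lookup table. This is illustrative only; the actual remapping is handled inside `datasets/semantic_dataset.py`, and `build_lut` is a hypothetical helper, not part of the repository:
```python
import numpy as np

def build_lut(class_map, fill=255):
    """Turn config-style class_map entries into a 256-entry lookup table."""
    lut = np.full(256, fill, dtype=np.uint8)
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut

# Abbreviated class_map mirroring config.yaml (first, second and last entries).
class_map = [
    {"keys": [0, 1, 2, 3, 4], "value": 0},
    {"keys": [5], "value": 1},
    # ... remaining entries as in config.yaml ...
    {"keys": [12, 13, 255], "value": 255},
]
lut = build_lut(class_map)

mask = np.random.randint(0, 6, size=(4, 4), dtype=np.uint8)  # dummy raw mask
remapped = lut[mask]  # vectorized remap: HxW array of training labels
```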
---
## Environment
The following environment was used to train and evaluate the baseline model:
| Component | Details |
| --------------- | -------------------------------- |
| GPU | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x |
| OS | Ubuntu 22.04 LTS |
| Python Version | 3.12 |
| PyTorch Version | 2.7+cu126 |
| Transformers | πŸ€— Transformers 4.51 |
| JAX | jax==0.6.0 |
| laspy | >= 2.0 |
| RAM | 256 GB (β‰₯ 64 GB recommended) |
⚠️ Batched sliding-window inference and JAX-based 3D projection on large scenes benefit from high VRAM. If you hit a CUDA out-of-memory error, decrease the following values in `config.yaml`:
```yaml
val:
  batch_size: ...
  batch_size_proj: ...
```
---
## Dataset Structure
The input data is structured by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files.
The GridNet-HD dataset structure is used unchanged; see the [GridNet-HD dataset card](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) for details.
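The exact schema of `split.json` is defined by the dataset; assuming it maps each split name to a list of zone folders (an assumption to verify against your copy of the file), consuming it could look like:
```python
import json
from pathlib import Path

root = Path("/path/to/GridNet-HD")
with open(root / "split.json") as f:
    # Assumed schema: {"train": [...], "val": [...], "test": [...]}
    split = json.load(f)

for zone in split["train"]:  # zone folders such as t1z4, t2z5, ...
    zone_dir = root / zone   # images, masks and LiDAR live under this folder
```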
---
## Setup & Installation
1. **Clone the repository**:
```bash
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline
cd ImageVote_GridNet-HD_baseline
```
2. **Create a conda virtual environment**:
```bash
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
```
3. **Install dependencies**:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
---
## Supported Modes
Each mode is selected via the `--mode` argument in `main.py`.
| Mode | Description |
| -------------- | --------------------------------------------------- |
| `train` | Train the image segmentation model |
| `val` | Evaluate the model on the validation set (2D) and report image-level metrics |
| `test` | Run inference on the test set (saves predicted masks) |
| `test3d` | Run inference and project predictions onto the LiDAR point cloud (3D); results are written to the classification field of the LAS file |
| `val3d` | Evaluate predictions projected onto the LiDAR point cloud (3D) and report point-level metrics |
| `export_probs` | Export softmax probabilities for each input image |
| `project_probs_3d` | Project softmax probabilities for each input image onto the LiDAR point clouds (used to train the third baseline) |
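For intuition, the 3D modes project each LiDAR point into the camera frame and keep only points visible from that view. Below is a heavily simplified JAX sketch of a pinhole projection with depth-based visibility filtering; the real implementation (Agisoft camera conventions, lens distortion, batching via `batch_size_proj`) is in `projection/fast_proj.py`, and the function names here are hypothetical:
```python
import jax
import jax.numpy as jnp

@jax.jit
def project_points(points, R, t, f, cx, cy, width, height):
    """Pinhole projection of (N, 3) world points (no lens distortion).

    R: (3, 3) world-to-camera rotation, t: (3,) translation,
    f/cx/cy: intrinsics in pixels. Returns (N, 2) pixel coords,
    (N,) camera-space depths and a boolean in-image mask.
    """
    cam = points @ R.T + t                    # world -> camera frame
    depth = cam[:, 2]
    u = f * cam[:, 0] / depth + cx
    v = f * cam[:, 1] / depth + cy
    ok = (depth > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return jnp.stack([u, v], axis=-1), depth, ok

def visible(uv, depth, depth_map, tol=0.1):
    """Keep points no farther than the per-pixel z-buffer value (+ tolerance)."""
    px = uv.astype(jnp.int32)
    px = jnp.clip(px, 0, jnp.array([depth_map.shape[1] - 1,
                                    depth_map.shape[0] - 1]))
    zbuf = depth_map[px[:, 1], px[:, 0]]      # nearest depth seen at that pixel
    return depth <= zbuf + tol
```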
---
## Results
The following table summarizes per-class Intersection over Union (IoU) on the test set, evaluated at the 3D (point cloud) level. The model was trained with the configuration in `config.yaml`.
| Class | IoU (Test set) (%)|
|---------------------------|------------|
| Pylon | 85.09 |
| Conductor cable | 64.82 |
| Structural cable | 45.06 |
| Insulator | 71.07 |
| High vegetation | 83.86 |
| Low vegetation | 63.43 |
| Herbaceous vegetation | 84.45 |
| Rock, gravel, soil | 38.62 |
| Impervious soil (Road) | 80.69 |
| Water | 74.87 |
| Building | 68.09 |
| **Mean IoU (mIoU)** | **69.10** |
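For reference, per-class IoU is TP / (TP + FP + FN) accumulated over the whole split, and mIoU is its unweighted mean over the 11 classes. A minimal sketch of this computation (the repository's own metrics live in `utils/metrics.py`):
```python
import numpy as np

def per_class_iou(pred, target, num_classes=11, ignore_index=255):
    """IoU per class from a confusion matrix; skips ignore_index pixels/points."""
    valid = target != ignore_index
    t = target[valid].astype(np.int64)
    p = pred[valid].astype(np.int64)
    cm = np.bincount(num_classes * t + p,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp     # predicted as class c but labelled otherwise
    fn = cm.sum(axis=1) - tp     # labelled class c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou, iou.mean()       # per-class IoU and mIoU
```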
## Pretrained Weights
πŸ”— **Pretrained weights** for the best-performing model are available for download directly in this repo (`best_model.pth`).
> This checkpoint corresponds to the model trained with the configuration in `config.yaml`, achieving a mean IoU of **69.10%** on the test set.
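Assuming `best_model.pth` is a plain PyTorch `state_dict` (an assumption; adapt if it stores a full training checkpoint), loading it could look like:
```python
import torch
from transformers import UperNetForSemanticSegmentation

model = UperNetForSemanticSegmentation.from_pretrained(
    "openmmlab/upernet-swin-tiny",
    num_labels=11,                 # matches model.num_classes in config.yaml
    ignore_mismatched_sizes=True,  # head is re-initialized for 11 classes
)
state = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
```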
---
## Usage Examples
### Training
```bash
python main.py --mode train --config config.yaml
```
### 2D Validation
```bash
python main.py --mode val --weights_path best_model.pth
```
### 2D Inference
```bash
python main.py --mode test --weights_path best_model.pth
```
### 3D Inference (with LiDAR projection)
```bash
python main.py --mode test3d --weights_path best_model.pth
```
### 3D Validation
```bash
python main.py --mode val3d --weights_path best_model.pth
```
### Export Softmax Probabilities
```bash
python main.py --mode export_probs --weights_path best_model.pth
```
### Project Softmax Probabilities onto LiDAR
```bash
python main.py --mode project_probs_3d --weights_path best_model.pth
```
---
## Weights & Biases Integration
To log training and evaluation to Weights & Biases:
```bash
wandb login
```
Set the project and entity fields in your `config.yaml` file.
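Under the hood, the training code initializes a run from these fields; a minimal equivalent sketch (the metric names here are illustrative, not the exact keys the code logs):
```python
import wandb

run = wandb.init(project="GridNet-HD-ImageOnly",  # wandb.project in config.yaml
                 entity="your-team")              # wandb.entity in config.yaml
run.log({"train/loss": 0.42, "val/mean_iou": 0.691})  # called each step/epoch
run.finish()
```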
---
## License
This project is open-sourced under the MIT License.
---
## Contact
For questions, issues, or contributions, please open an issue on the repository.
---
## Citation
If you use this repository in research, please cite:

> **GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure**
> Masked Authors. Submitted to CVPR 2026.