|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- heig-vd-geo/GridNet-HD |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- mean_iou |
|
|
base_model: |
|
|
- openmmlab/upernet-swin-tiny |
|
|
--- |
|
|
# GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework |
|
|
|
|
|
## Overview |
|
|
|
|
|
This repository provides a reproducible implementation of the semantic segmentation pipeline and 3D projection baseline used in the paper introducing the **GridNet-HD** dataset. The framework includes:
|
|
|
|
|
* A transformer-based semantic segmentation pipeline built on `UperNetForSemanticSegmentation` (via HuggingFace Transformers).
|
|
* Support for high-resolution aerial imagery, using random crops during training and sliding-window inference at test time (sketched below).
|
|
* Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
|
|
* JAX-accelerated operations for efficient 3D projection. |
|
|
* Logging and experiment tracking with Weights & Biases. |
|
|
|
|
|
This implementation serves as one of the official baselines for GridNet-HD. |
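For reference, the sliding-window inference mentioned above can be sketched as follows. This is an illustrative re-implementation, not the code in `inference/sliding_window.py`; the stride, the averaging scheme, and the helper names are assumptions.

```python
# Illustrative sliding-window inference (NOT the repo's actual code).
# crop-sized windows slide over the full image; logits from overlapping
# windows are summed and averaged by coverage count.
import torch

def _positions(size: int, crop: int, stride: int) -> list[int]:
    # Window start offsets along one axis; the last window is placed flush
    # with the border when the extent is not a multiple of the stride.
    pos = list(range(0, max(size - crop, 0) + 1, stride))
    if pos[-1] + crop < size:
        pos.append(size - crop)
    return pos

@torch.no_grad()
def sliding_window_logits(model, image, crop=512, stride=256, num_classes=11):
    """image: (1, 3, H, W) -> averaged logits of shape (1, num_classes, H, W)."""
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for top in _positions(H, crop, stride):
        for left in _positions(W, crop, stride):
            window = image[:, :, top:top + crop, left:left + crop]
            out = model(pixel_values=window).logits  # UperNet returns window-sized logits
            logits[:, :, top:top + crop, left:left + crop] += out
            counts[:, :, top:top + crop, left:left + crop] += 1
    return logits / counts
```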
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
* [Project Structure](#project-structure) |
|
|
* [Configuration](#configuration) |
|
|
* [Environment](#environment) |
|
|
* [Dataset Structure](#dataset-structure) |
|
|
* [Setup & Installation](#setup--installation)
|
|
* [Supported Modes](#supported-modes) |
|
|
* [Results](#results) |
|
|
* [Pretrained Weights](#pretrained-weights) |
|
|
* [Usage Examples](#usage-examples) |
|
|
* [Weights & Biases Integration](#weights--biases-integration) |
|
|
* [License](#license) |
|
|
* [Contact](#contact) |
|
|
* [Citation](#citation) |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
```
project_root/
├── main.py                     # Pipeline entry point
├── config.yaml                 # Main configuration file
├── datasets/
│   └── semantic_dataset.py     # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py      # Model loading utility
├── train/
│   ├── train.py                # Training loop
│   └── eval.py                 # Evaluation loop
├── inference/
│   ├── inference.py            # Sliding window inference and output saving
│   ├── sliding_window.py       # Core logic for windowed inference
│   └── export_logits.py        # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py     # Projection of predictions to LiDAR space
│   └── fast_proj.py            # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py        # Logging setup
│   ├── metrics.py              # Evaluation metrics (IoU, F1)
│   └── seed.py                 # Reproducibility utilities
├── best_model.pth              # Weights for the best model
└── requirements.txt            # Python dependencies
```
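As a rough illustration of what `projection/fast_proj.py` does, below is a minimal JAX sketch of projecting 3D points into an image with a plain pinhole model. The actual code follows Agisoft camera conventions and adds depth-based visibility filtering, so the camera model and every name here are assumptions.

```python
# Minimal pinhole-projection sketch in JAX (illustrative only; fast_proj.py
# uses Agisoft conventions and adds depth-based visibility filtering).
import jax
import jax.numpy as jnp

@jax.jit
def project_points(points, R, t, fx, fy, cx, cy):
    """points: (N, 3) world coordinates; R: (3, 3) rotation; t: (3,) translation.
    Returns (N, 2) pixel coordinates and (N,) camera-frame depths."""
    cam = points @ R.T + t                      # world -> camera frame
    depth = cam[:, 2]
    uv = jnp.stack([fx * cam[:, 0] / depth + cx,  # perspective divide
                    fy * cam[:, 1] / depth + cy], axis=1)
    return uv, depth                            # keep only points with depth > 0
```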
|
|
|
|
|
--- |
|
|
|
|
|
## Configuration |
|
|
|
|
|
All parameters are managed in **config.yaml**. Key sections include: |
|
|
|
|
|
* `data`: paths, input dimensions, normalization statistics, class remapping. |
|
|
* `training`: optimizer settings, learning rate schedule, checkpoint directory. |
|
|
* `val`: batch sizes, projection parameters.
|
|
* `model`: pretrained backbone, number of classes, ignore index. |
|
|
* `wandb`: project and entity names for Weights & Biases tracking. |
|
|
|
|
|
Adjust these settings to match your dataset and compute environment. |
|
|
|
|
|
**Example `config.yaml`:** |
|
|
|
|
|
```yaml
data:
  root_dir: "/path/to/GridNet-HD"               # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json"  # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                     # Resize image and mask, PIL-style (width, height)
  crop_size: [512, 512]                         # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]   # original values
      value: 0                # new (remapped) value
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255

model:
  pretrained_model: "openmmlab/upernet-swin-tiny"  # swin-small and swin-base variants also work (HuggingFace)
  num_classes: 11                                  # target classes
  ignore_index: 255                                # 'ignore' in loss & metrics

training:
  output_dir: "./outputs/run"  # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8               # parallel workers for DataLoader
  lr: 0.0001                   # initial learning rate
  sched_step: 10               # scheduler: step every N epochs
  sched_gamma: 0.5             # multiply LR by this gamma
  epochs: 60
  eval_every: 5                # evaluate every N epochs

val:
  batch_size: 8                # images per batch during validation and test
  num_workers: 8               # parallel workers for DataLoader
  batch_size_proj: 5000000     # points per batch to project onto images

wandb:
  project: "GridNet-HD-ImageOnly"  # only used for training and validation
  entity: "your-team"
```
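The `class_map` above collapses the raw GridNet-HD labels into 11 training classes plus an ignore value (255). Below is a minimal sketch of how such a remap can be applied to a label mask; the helper is hypothetical, not the repo's dataset code.

```python
# Hypothetical sketch of applying a class_map to a uint8 label mask via a
# 256-entry lookup table (NOT the repo's actual dataset code).
import numpy as np

def remap_mask(mask: np.ndarray, class_map: list[dict]) -> np.ndarray:
    """Remap raw label values to training ids; unmapped values -> 255 (ignore)."""
    lut = np.full(256, 255, dtype=np.uint8)
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut[mask]

# Example: raw values 0-4 collapse to class 0; 12, 13 and 255 become ignore.
raw = np.array([[0, 5, 12], [14, 21, 255]], dtype=np.uint8)
print(remap_mask(raw, [
    {"keys": [0, 1, 2, 3, 4], "value": 0},
    {"keys": [5], "value": 1},
    {"keys": [14], "value": 4},
    {"keys": [21], "value": 10},
    {"keys": [12, 13, 255], "value": 255},
]))
```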
|
|
|
|
|
--- |
|
|
|
|
|
## Environment |
|
|
|
|
|
The following environment was used to train and evaluate the baseline model: |
|
|
|
|
|
| Component | Details |
| --------------- | -------------------------------- |
| GPU | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x |
| OS | Ubuntu 22.04 LTS |
| Python Version | 3.12 |
| PyTorch Version | 2.7+cu126 |
| Transformers | 🤗 Transformers 4.51 |
| JAX | jax==0.6.0 |
| laspy | >= 2.0 |
| RAM | 256 GB (≥ 64 GB recommended) |
|
|
|
|
|
⚠️ Batch sliding-window inference and JAX-based 3D projection on large scenes benefit from high VRAM. If you hit a CUDA out-of-memory error, decrease the following values in the `val` section of `config.yaml`:

```
val.batch_size
val.batch_size_proj
```
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset Structure |
|
|
|
|
|
The input data is structured by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. |
|
|
The structure of the GridNet-HD dataset remains unchanged (see the [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) page for more information).
|
|
|
|
|
--- |
|
|
|
|
|
## Setup & Installation |
|
|
|
|
|
1. **Clone the repository**: |
|
|
|
|
|
```bash |
|
|
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline |
|
|
cd ImageVote_GridNet-HD_baseline |
|
|
``` |
|
|
|
|
|
2. **Create a conda virtual environment**: |
|
|
|
|
|
```bash
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
```
|
|
|
|
|
3. **Install dependencies**: |
|
|
|
|
|
```bash |
|
|
pip install --upgrade pip |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Supported Modes |
|
|
|
|
|
Each mode is selected via the `--mode` argument in `main.py`. |
|
|
|
|
|
| Mode | Description |
| ------------------ | --------------------------------------------------- |
| `train` | Train the image segmentation model |
| `val` | Evaluate the model on the validation set (2D) and report image-level metrics |
| `test` | Run inference on the test set (saves predicted masks) |
| `test3d` | Run inference and reproject predictions onto the LiDAR point cloud (3D), saved in the LAS `classification` field |
| `val3d` | Evaluate predictions projected onto the LiDAR point cloud (3D) and report 3D-level metrics |
| `export_probs` | Export softmax probabilities for each input image |
| `project_probs_3d` | Project softmax probabilities from each image onto the LiDAR point cloud (used to train the third baseline) |
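For `test3d`, per-point predictions end up in the LAS classification field. Below is a minimal `laspy` (>= 2.0) sketch of that final write step; the file names and the label array are placeholders.

```python
# Illustrative laspy sketch: write predicted class ids into the
# classification field of a LAS file. Paths and labels are placeholders.
import laspy
import numpy as np

las = laspy.read("scene.las")
labels = np.zeros(len(las.points), dtype=np.uint8)  # one predicted class per point
las.classification = labels
las.write("scene_classified.las")
```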
|
|
|
|
|
--- |
|
|
|
|
|
## Results
|
|
|
|
|
The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level. The model was trained with the configuration specified in `config.yaml`.
|
|
|
|
|
| Class | IoU (Test set) (%) |
|------------------------|--------------------|
| Pylon | 85.09 |
| Conductor cable | 64.82 |
| Structural cable | 45.06 |
| Insulator | 71.07 |
| High vegetation | 83.86 |
| Low vegetation | 63.43 |
| Herbaceous vegetation | 84.45 |
| Rock, gravel, soil | 38.62 |
| Impervious soil (Road) | 80.69 |
| Water | 74.87 |
| Building | 68.09 |
| **Mean IoU (mIoU)** | **69.10** |
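For reference, the per-class IoU reported above can be computed as follows. This is an illustrative re-implementation, not the exact code in `utils/metrics.py`.

```python
# Illustrative per-class IoU over flat prediction/target arrays; classes
# absent from both arrays yield NaN and are skipped by nanmean.
import numpy as np

def per_class_iou(pred, target, num_classes=11, ignore_index=255):
    valid = target != ignore_index          # drop ignored pixels/points
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        ious.append(inter / union if union else np.nan)
    return ious                             # mIoU = np.nanmean(ious)
```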
|
|
|
|
|
## Pretrained Weights
|
|
|
|
|
**Pretrained weights** for the best-performing model are available for download directly in this repository (`best_model.pth`).
|
|
|
|
|
> This checkpoint corresponds to the model trained with the configuration in `config.yaml`, achieving a mean IoU of **69.10%** on the test set.
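A minimal sketch of restoring the checkpoint outside the pipeline, assuming `best_model.pth` is a plain PyTorch state dict (adapt the `torch.load` call if the repo wraps it differently):

```python
# Minimal restore sketch; assumes best_model.pth was saved with
# torch.save(model.state_dict(), ...). Adapt if the checkpoint is wrapped.
import torch
from transformers import UperNetForSemanticSegmentation

model = UperNetForSemanticSegmentation.from_pretrained(
    "openmmlab/upernet-swin-tiny",
    num_labels=11,                 # matches model.num_classes in config.yaml
    ignore_mismatched_sizes=True,  # re-initialize the head for 11 classes
)
state = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
```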
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Examples |
|
|
|
|
|
### Training |
|
|
|
|
|
```bash |
|
|
python main.py --mode train --config config.yaml |
|
|
``` |
|
|
|
|
|
### 2D Validation |
|
|
|
|
|
```bash |
|
|
python main.py --mode val --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 2D Inference |
|
|
|
|
|
```bash |
|
|
python main.py --mode test --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 3D Inference (with LiDAR projection) |
|
|
|
|
|
```bash |
|
|
python main.py --mode test3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 3D Validation |
|
|
|
|
|
```bash |
|
|
python main.py --mode val3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### Export Softmax Probabilities
|
|
|
|
|
```bash |
|
|
python main.py --mode export_probs --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### Project Softmax Probabilities onto LiDAR
|
|
|
|
|
```bash |
|
|
python main.py --mode project_probs_3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Weights & Biases Integration |
|
|
|
|
|
To log training and evaluation to Weights & Biases: |
|
|
|
|
|
```bash |
|
|
wandb login |
|
|
``` |
|
|
|
|
|
Set the project and entity fields in your `config.yaml` file. |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This project is open-sourced under the MIT License. |
|
|
|
|
|
--- |
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions, issues, or contributions, please open an issue on the repository. |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this repository in your research, please cite:
|
|
|
|
|
*GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure*

Masked Authors

Submitted to CVPR 2026.
|
|
|
|
|
|
|
|
|
|
|
|