|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- heig-vd-geo/GridNet-HD |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- mean_iou |
|
|
base_model: |
|
|
- openmmlab/upernet-swin-tiny |
|
|
--- |
|
|
# GridNet-HD Baseline: Image semantic segmentation and LiDAR projection framework |
|
|
|
|
|
## Overview |
|
|
|
|
|
This repository provides a reproducible implementation of the semantic segmentation pipeline and 3D projection baseline used in the paper introducing the **GridNet-HD** dataset. The framework includes:
|
|
|
|
|
* A transformer-based semantic segmentation pipeline built on `UperNetForSemanticSegmentation` (via HuggingFace Transformers).
|
|
* Support for high-resolution aerial imagery, using random crops during training and sliding-window inference at test time (sketched below).
|
|
* Post-processing that projects 2D semantic predictions onto LiDAR point clouds, with depth-based visibility filtering.
|
|
* JAX-accelerated operations for efficient 3D projection. |
|
|
* Logging and experiment tracking with Weights & Biases. |
|
|
|
|
|
This implementation serves as one of the official baselines for GridNet-HD. |
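For reference, the sliding-window inference mentioned above can be sketched as follows. This is an illustrative re-implementation, not the code in `inference/sliding_window.py`; the stride, the averaging scheme, and the helper names are assumptions.

```python
# Illustrative sliding-window inference (NOT the repo's actual code).
# crop-sized windows slide over the full image; logits from overlapping
# windows are summed and averaged by coverage count.
import torch

def _positions(size: int, crop: int, stride: int) -> list[int]:
    # Window start offsets along one axis; the last window is placed flush
    # with the border when the extent is not a multiple of the stride.
    pos = list(range(0, max(size - crop, 0) + 1, stride))
    if pos[-1] + crop < size:
        pos.append(size - crop)
    return pos

@torch.no_grad()
def sliding_window_logits(model, image, crop=512, stride=256, num_classes=11):
    """image: (1, 3, H, W) -> averaged logits of shape (1, num_classes, H, W)."""
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for top in _positions(H, crop, stride):
        for left in _positions(W, crop, stride):
            window = image[:, :, top:top + crop, left:left + crop]
            out = model(pixel_values=window).logits  # UperNet returns window-sized logits
            logits[:, :, top:top + crop, left:left + crop] += out
            counts[:, :, top:top + crop, left:left + crop] += 1
    return logits / counts
```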
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
* [Project Structure](#project-structure) |
|
|
* [Configuration](#configuration) |
|
|
* [Environment](#environment) |
|
|
* [Dataset Structure](#dataset-structure) |
|
|
* [Setup & Installation](#setup--installation)
|
|
* [Supported Modes](#supported-modes) |
|
|
* [Results](#results) |
|
|
* [Pretrained Weights](#pretrained-weights) |
|
|
* [Usage Examples](#usage-examples) |
|
|
* [Weights & Biases Integration](#weights--biases-integration) |
|
|
* [License](#license) |
|
|
* [Contact](#contact) |
|
|
* [Citation](#citation) |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
```
project_root/
├── main.py                     # Pipeline entry point
├── config.yaml                 # Main configuration file
├── datasets/
│   └── semantic_dataset.py     # Semantic segmentation dataset class
├── models/
│   └── upernet_wrapper.py      # Model loading utility
├── train/
│   ├── train.py                # Training loop
│   └── eval.py                 # Evaluation loop
├── inference/
│   ├── inference.py            # Sliding window inference and output saving
│   ├── sliding_window.py       # Core logic for windowed inference
│   └── export_logits.py        # Export of softmax probabilities
├── projection/
│   ├── lidar_projection.py     # Projection of predictions to LiDAR space
│   └── fast_proj.py            # Projection utilities (Agisoft conventions), accelerated with JAX
├── utils/
│   ├── logging_utils.py        # Logging setup
│   ├── metrics.py              # Evaluation metrics (IoU, F1)
│   └── seed.py                 # Reproducibility utilities
├── best_model.pth              # Weights for the best model
└── requirements.txt            # Python dependencies
```
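As a rough illustration of what `projection/fast_proj.py` does, below is a minimal JAX sketch of projecting 3D points into an image with a plain pinhole model. The actual code follows Agisoft camera conventions and adds depth-based visibility filtering, so the camera model and every name here are assumptions.

```python
# Minimal pinhole-projection sketch in JAX (illustrative only; fast_proj.py
# uses Agisoft conventions and adds depth-based visibility filtering).
import jax
import jax.numpy as jnp

@jax.jit
def project_points(points, R, t, fx, fy, cx, cy):
    """points: (N, 3) world coordinates; R: (3, 3) rotation; t: (3,) translation.
    Returns (N, 2) pixel coordinates and (N,) camera-frame depths."""
    cam = points @ R.T + t                      # world -> camera frame
    depth = cam[:, 2]
    uv = jnp.stack([fx * cam[:, 0] / depth + cx,  # perspective divide
                    fy * cam[:, 1] / depth + cy], axis=1)
    return uv, depth                            # keep only points with depth > 0
```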
|
|
|
|
|
--- |
|
|
|
|
|
## Configuration |
|
|
|
|
|
All parameters are managed in **config.yaml**. Key sections include: |
|
|
|
|
|
* `data`: paths, input dimensions, normalization statistics, class remapping. |
|
|
* `training`: optimizer settings, learning rate schedule, checkpoint directory. |
|
|
* `val`: batch sizes, projection parameters.
|
|
* `model`: pretrained backbone, number of classes, ignore index. |
|
|
* `wandb`: project and entity names for Weights & Biases tracking. |
|
|
|
|
|
Adjust these settings to match your dataset and compute environment. |
|
|
|
|
|
**Example `config.yaml`:** |
|
|
|
|
|
```yaml
data:
  root_dir: "/path/to/GridNet-HD"               # Root folder containing t1z4, t2z5, etc.
  split_file: "/path/to/GridNet-HD/split.json"  # JSON split file listing train/val/test folders
  resize_size: [1760, 1318]                     # Resize image and mask, PIL-style (width, height)
  crop_size: [512, 512]                         # Random crop (train) or sliding window (val/test) of this size
  # Image normalization
  mean: [0.5, 0.5, 0.5]
  std: [0.5, 0.5, 0.5]
  class_map:
    - keys: [0, 1, 2, 3, 4]   # original values
      value: 0                # new (remapped) value
    - keys: [5]
      value: 1
    - keys: [6, 7]
      value: 2
    - keys: [8, 9, 10, 11]
      value: 3
    - keys: [14]
      value: 4
    - keys: [15]
      value: 5
    - keys: [16]
      value: 6
    - keys: [17, 18]
      value: 7
    - keys: [19]
      value: 8
    - keys: [20]
      value: 9
    - keys: [21]
      value: 10
    - keys: [12, 13, 255]
      value: 255

model:
  pretrained_model: "openmmlab/upernet-swin-tiny"  # swin-small and swin-base variants also work (HuggingFace)
  num_classes: 11                                  # target classes
  ignore_index: 255                                # 'ignore' in loss & metrics

training:
  output_dir: "./outputs/run"  # Where to save checkpoints & logs
  seed: 42
  batch_size: 32
  num_workers: 8               # parallel workers for DataLoader
  lr: 0.0001                   # initial learning rate
  sched_step: 10               # scheduler: step every N epochs
  sched_gamma: 0.5             # multiply LR by this gamma
  epochs: 60
  eval_every: 5                # evaluate every N epochs

val:
  batch_size: 8                # images per batch during validation and test
  num_workers: 8               # parallel workers for DataLoader
  batch_size_proj: 5000000     # points per batch to project onto images

wandb:
  project: "GridNet-HD-ImageOnly"  # only used for training and validation
  entity: "your-team"
```
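The `class_map` above collapses the raw GridNet-HD labels into 11 training classes plus an ignore value (255). Below is a minimal sketch of how such a remap can be applied to a label mask; the helper is hypothetical, not the repo's dataset code.

```python
# Hypothetical sketch of applying a class_map to a uint8 label mask via a
# 256-entry lookup table (NOT the repo's actual dataset code).
import numpy as np

def remap_mask(mask: np.ndarray, class_map: list[dict]) -> np.ndarray:
    """Remap raw label values to training ids; unmapped values -> 255 (ignore)."""
    lut = np.full(256, 255, dtype=np.uint8)
    for entry in class_map:
        for key in entry["keys"]:
            lut[key] = entry["value"]
    return lut[mask]

# Example: raw values 0-4 collapse to class 0; 12, 13 and 255 become ignore.
raw = np.array([[0, 5, 12], [14, 21, 255]], dtype=np.uint8)
print(remap_mask(raw, [
    {"keys": [0, 1, 2, 3, 4], "value": 0},
    {"keys": [5], "value": 1},
    {"keys": [14], "value": 4},
    {"keys": [21], "value": 10},
    {"keys": [12, 13, 255], "value": 255},
]))
```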
|
|
|
|
|
--- |
|
|
|
|
|
## Environment |
|
|
|
|
|
The following environment was used to train and evaluate the baseline model: |
|
|
|
|
|
| Component | Details |
| --------------- | -------------------------------- |
| GPU | NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x |
| OS | Ubuntu 22.04 LTS |
| Python Version | 3.12 |
| PyTorch Version | 2.7+cu126 |
| Transformers | 🤗 Transformers 4.51 |
| JAX | jax==0.6.0 |
| laspy | >= 2.0 |
| RAM | 256 GB (≥ 64 GB recommended) |
|
|
|
|
|
⚠️ Batch sliding-window inference and JAX-based 3D projection on large scenes benefit from high VRAM. If you hit a CUDA out-of-memory error, decrease the following values in the `val` section of `config.yaml`:

```
val.batch_size
val.batch_size_proj
```
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset Structure |
|
|
|
|
|
The input data is structured by geographic zone, with RGB images, semantic masks, LiDAR scans, and camera pose files. |
|
|
The structure of the GridNet-HD dataset remains unchanged (see the [GridNet-HD dataset](https://huggingface.co/datasets/heig-vd-geo/GridNet-HD) page for more information).
|
|
|
|
|
--- |
|
|
|
|
|
## Setup & Installation |
|
|
|
|
|
1. **Clone the repository**: |
|
|
|
|
|
```bash |
|
|
git clone https://huggingface.co/heig-vd-geo/ImageVote_GridNet-HD_baseline |
|
|
cd ImageVote_GridNet-HD_baseline |
|
|
``` |
|
|
|
|
|
2. **Create a conda virtual environment**: |
|
|
|
|
|
```bash
conda create -n gridnet_hd_image python=3.12
conda activate gridnet_hd_image
```
|
|
|
|
|
3. **Install dependencies**: |
|
|
|
|
|
```bash |
|
|
pip install --upgrade pip |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Supported Modes |
|
|
|
|
|
Each mode is selected via the `--mode` argument in `main.py`. |
|
|
|
|
|
| Mode | Description |
| ------------------ | --------------------------------------------------- |
| `train` | Train the image segmentation model |
| `val` | Evaluate the model on the validation set (2D) and report image-level metrics |
| `test` | Run inference on the test set (saves predicted masks) |
| `test3d` | Run inference and reproject predictions onto the LiDAR point cloud (3D), saved in the LAS `classification` field |
| `val3d` | Evaluate predictions projected onto the LiDAR point cloud (3D) and report 3D-level metrics |
| `export_probs` | Export softmax probabilities for each input image |
| `project_probs_3d` | Project softmax probabilities from each image onto the LiDAR point cloud (used to train the third baseline) |
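For `test3d`, per-point predictions end up in the LAS classification field. Below is a minimal `laspy` (>= 2.0) sketch of that final write step; the file names and the label array are placeholders.

```python
# Illustrative laspy sketch: write predicted class ids into the
# classification field of a LAS file. Paths and labels are placeholders.
import laspy
import numpy as np

las = laspy.read("scene.las")
labels = np.zeros(len(las.points), dtype=np.uint8)  # one predicted class per point
las.classification = labels
las.write("scene_classified.las")
```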
|
|
|
|
|
--- |
|
|
|
|
|
## Results
|
|
|
|
|
The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level. The model was trained with the configuration specified in `config.yaml`.
|
|
|
|
|
| Class | IoU (Test set) (%) |
|------------------------|--------------------|
| Pylon | 85.09 |
| Conductor cable | 64.82 |
| Structural cable | 45.06 |
| Insulator | 71.07 |
| High vegetation | 83.86 |
| Low vegetation | 63.43 |
| Herbaceous vegetation | 84.45 |
| Rock, gravel, soil | 38.62 |
| Impervious soil (Road) | 80.69 |
| Water | 74.87 |
| Building | 68.09 |
| **Mean IoU (mIoU)** | **69.10** |
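For reference, the per-class IoU reported above can be computed as follows. This is an illustrative re-implementation, not the exact code in `utils/metrics.py`.

```python
# Illustrative per-class IoU over flat prediction/target arrays; classes
# absent from both arrays yield NaN and are skipped by nanmean.
import numpy as np

def per_class_iou(pred, target, num_classes=11, ignore_index=255):
    valid = target != ignore_index          # drop ignored pixels/points
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        ious.append(inter / union if union else np.nan)
    return ious                             # mIoU = np.nanmean(ious)
```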
|
|
|
|
|
## Pretrained Weights
|
|
|
|
|
**Pretrained weights** for the best-performing model are available for download directly in this repository (`best_model.pth`).
|
|
|
|
|
> This checkpoint corresponds to the model trained with the configuration in `config.yaml`, achieving a mean IoU of **69.10%** on the test set.
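A minimal sketch of restoring the checkpoint outside the pipeline, assuming `best_model.pth` is a plain PyTorch state dict (adapt the `torch.load` call if the repo wraps it differently):

```python
# Minimal restore sketch; assumes best_model.pth was saved with
# torch.save(model.state_dict(), ...). Adapt if the checkpoint is wrapped.
import torch
from transformers import UperNetForSemanticSegmentation

model = UperNetForSemanticSegmentation.from_pretrained(
    "openmmlab/upernet-swin-tiny",
    num_labels=11,                 # matches model.num_classes in config.yaml
    ignore_mismatched_sizes=True,  # re-initialize the head for 11 classes
)
state = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
```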
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Examples |
|
|
|
|
|
### Training |
|
|
|
|
|
```bash |
|
|
python main.py --mode train --config config.yaml |
|
|
``` |
|
|
|
|
|
### 2D Validation |
|
|
|
|
|
```bash |
|
|
python main.py --mode val --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 2D Inference |
|
|
|
|
|
```bash |
|
|
python main.py --mode test --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 3D Inference (with LiDAR projection) |
|
|
|
|
|
```bash |
|
|
python main.py --mode test3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### 3D Validation |
|
|
|
|
|
```bash |
|
|
python main.py --mode val3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### Export Softmax Probabilities
|
|
|
|
|
```bash |
|
|
python main.py --mode export_probs --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
### Project Softmax Probabilities onto LiDAR
|
|
|
|
|
```bash |
|
|
python main.py --mode project_probs_3d --weights_path best_model.pth |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Weights & Biases Integration |
|
|
|
|
|
To log training and evaluation to Weights & Biases: |
|
|
|
|
|
```bash |
|
|
wandb login |
|
|
``` |
|
|
|
|
|
Set the project and entity fields in your `config.yaml` file. |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This project is open-sourced under the MIT License. |
|
|
|
|
|
--- |
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions, issues, or contributions, please open an issue on the repository. |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this repository in your research, please cite:
|
|
|
|
|
*GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure*

Masked Authors

Submitted to CVPR 2026.
|
|
|
|
|
|
|
|
|
|
|
|