🧠 Point Transformer v3 & Dino-In-The-Room (DITR) on GridNet-HD

This repository provides the implementation and training pipelines for two models applied to the GridNet-HD dataset:

  1. Point Transformer v3 (PTv3): the baseline 3D semantic segmentation model.
  2. Dino-In-The-Room (DITR): a fusion architecture combining Point Transformer v3 with DINOv2 image features, following the methodology proposed in the DITR paper. This approach represents the current state of the art in multimodal 3D–2D fusion on multiple datasets.

πŸ“‚ Dataset Structure

The GridNet-HD dataset must follow the original structure:

dataset-root/
β”œβ”€β”€ t1z5b/
β”‚   β”œβ”€β”€ images/           # RGB images (.JPG)
β”‚   β”œβ”€β”€ masks/            # Semantic segmentation masks (.png, single-channel label)
β”‚   β”œβ”€β”€ lidar/            # LiDAR point cloud (.las format with field "ground_truth")
β”‚   └── pose/             # Camera poses and intrinsics (text files)
β”œβ”€β”€ t1z6a/
β”‚   β”œβ”€β”€ ...
β”œβ”€β”€ ...
β”œβ”€β”€ split.json            # JSON file specifying the train/test split
└── README.md
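Before running the preparation scripts, it can be useful to verify that the dataset folders match this layout. Below is a minimal sketch of such a check; the split.json schema assumed here (split names mapping to lists of zone folder names) is an assumption, so adapt it to the actual file:

```python
import json
import pathlib

def check_gridnethd_layout(root):
    """Return zones that are missing expected sub-directories.

    Assumes split.json maps split names to lists of zone folder names,
    e.g. {"train": ["t1z5b"], "test": ["t1z6a"]} -- this schema is an
    assumption; check the dataset README for the real one.
    """
    root = pathlib.Path(root)
    split = json.loads((root / "split.json").read_text())
    expected = {"images", "masks", "lidar", "pose"}
    missing = {}
    for zones in split.values():
        for zone in zones:
            present = {p.name for p in (root / zone).iterdir() if p.is_dir()}
            if not expected <= present:
                missing[zone] = sorted(expected - present)
    return missing
```

An empty return value means every zone listed in split.json has the four expected sub-directories.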

Environment

The following environment was used to train and evaluate both models.

Component Details
GPU 4 x NVIDIA A40 (48 GB VRAM)
CUDA Version 12.x (provided by the Docker container)
OS Ubuntu 22.04 LTS
RAM 512 GB

🧩 1. Point Transformer v3 (PTv3)

Start by cloning the repository:

git clone https://huggingface.co/heig-vd-geo/PTv3_GridNet-HD_baseline

Data Preparation

python prepare_gridnethd.py \
  --gridnethd_root $path_to_GridNet-HD-dataset_public$ \
  --split_json $path_to_split.json$ \
  --out_root $path_to_PTv3_GridNet-HD_baseline$/data/gridnethd/pc \
  --pointcept_root $path_to_PTv3_GridNet-HD_baseline$ \
  --temporary_root $path_to_temp_directory$ \
  --dino_projection False

Training

This repository follows the same structure as Pointcept, enabling seamless integration. Launch the container and train as follows:

docker run --gpus all -it --rm --shm-size=240g \
  -v $path_to_PTv3_GridNet-HD_baseline$:/workspace/Pointcept \
  pointcept/pointcept:v1.6.0-pytorch2.5.0-cuda12.4-cudnn9-devel bash

cd Pointcept
export PYTHONPATH=./

python tools/train.py \
  --config-file configs/gridnethd/PTv3_gridnethd_color.py \
  --options save_path=exp/gridnethd/ptv3_color/ \
  --num-gpus 4

Evaluation

python tools/test.py \
  --config-file configs/gridnethd/PTv3_gridnethd_color.py \
  --options save_path=exp/gridnethd/ptv3_color/ \
  weight=model_best_PTv3.pth

🧠 2. Dino-In-The-Room (DITR)

Data Preparation

First, precompute DINOv2 image features:

python dinov2/compute_dinov2_features.py \
  --gridnethd_root $path_to_gridnet_hd$ \
  --split_json $path_to_split.json$

Then, prepare the dataset with feature projection enabled:

python prepare_gridnethd.py \
  --gridnethd_root $path_to_GridNet-HD-dataset_public$ \
  --split_json $path_to_split.json$ \
  --out_root $path_to_PTv3_GridNet-HD_baseline$/data/gridnethd/pc \
  --pointcept_root $path_to_PTv3_GridNet-HD_baseline$ \
  --temporary_root $path_to_temp_directory$ \
  --dino_projection True
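With --dino_projection True, the preparation step lifts the precomputed 2D DINOv2 features onto the point cloud using the camera poses and intrinsics. A minimal sketch of the underlying idea, pinhole projection followed by nearest-patch feature lookup, is shown below; function names and the feature-map layout are illustrative, not taken from this repository:

```python
import numpy as np

def project_points(points_w, R, t, K, img_hw):
    """Project world-frame points into an image with a pinhole camera.

    R, t: world-to-camera rotation (3x3) and translation (3,).
    K: intrinsics matrix (3x3). Returns (u, v) pixel coordinates and a
    validity mask (point in front of the camera and inside the image).
    """
    p_cam = points_w @ R.T + t           # world -> camera frame
    z = p_cam[:, 2]
    uvw = p_cam @ K.T                    # camera frame -> homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]
    h, w = img_hw
    valid = (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                    & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, valid

def sample_features(uv, valid, feat_map, patch):
    """Nearest-patch lookup in a DINO feature map of shape (H_p, W_p, C).

    Valid points receive the feature of the patch their projection falls
    into; invalid points receive zeros.
    """
    out = np.zeros((uv.shape[0], feat_map.shape[2]), dtype=feat_map.dtype)
    ij = (uv[valid] // patch).astype(int)[:, ::-1]   # (u, v) -> (row, col)
    ij[:, 0] = np.clip(ij[:, 0], 0, feat_map.shape[0] - 1)
    ij[:, 1] = np.clip(ij[:, 1], 0, feat_map.shape[1] - 1)
    out[valid] = feat_map[ij[:, 0], ij[:, 1]]
    return out
```

In practice a point is seen from several images, so the per-image features are typically aggregated (e.g. averaged) over all valid views.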

Training

As with PTv3, integrate this repository within Pointcept:

docker run --gpus all -it --rm --shm-size=240g \
  -v $path_to_PTv3_GridNet-HD_baseline$:/workspace/Pointcept \
  pointcept/pointcept:v1.6.0-pytorch2.5.0-cuda12.4-cudnn9-devel bash

cd Pointcept
export PYTHONPATH=./
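The training invocation itself is analogous to the PTv3 one. A sketch, assuming the config file name matches the one used for evaluation below:

python tools/train.py \
  --config-file configs/gridnethd/DITR_gridnethd_color_dinov2.py \
  --options save_path=exp/gridnethd/ditr/ \
  --num-gpus 4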

Evaluation

python tools/test.py \
  --config-file configs/gridnethd/DITR_gridnethd_color_dinov2.py \
  --options save_path=exp/gridnethd/ditr/ \
  weight=model_best_DITR.pth

πŸ“Š Quantitative Results

Per-class IoU on the test set for PTv3 (XYZ + color) and DITR, both evaluated with overlapping tiles and test-time augmentation (TTA).

Class PTv3 IoU (%) DITR IoU (%)
Pylon 97.12 96.81
Conductor cable 85.88 89.07
Structural cable 53.22 57.80
Insulator 90.63 93.20
High vegetation 88.30 88.81
Low vegetation 33.93 41.99
Herbaceous vegetation 91.72 90.05
Rock, gravel, soil 51.88 44.26
Impervious soil (Road) 79.63 79.49
Water 29.68 71.86
Building 60.49 70.26
Mean IoU (mIoU) 69.32 74.87
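The reported mIoU is the unweighted mean of the per-class IoU values above. As a reminder of how these figures are typically computed from flat label arrays (the function below is an illustrative sketch, not code from this repository):

```python
def per_class_iou(gt, pred, num_classes):
    """Per-class IoU = TP / (TP + FP + FN) from flat label sequences."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for g, p in zip(gt, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p where ground truth differs
            fn[g] += 1          # missed an instance of class g
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else float("nan"))
    return ious
```

mIoU is then sum(ious) / num_classes, usually skipping classes absent from the ground truth.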

🧾 References

Point Transformer v3 β€” PTv3 paper

@misc{wu2024pointtransformerv3simpler,
      title={Point Transformer V3: Simpler, Faster, Stronger}, 
      author={Xiaoyang Wu and Li Jiang and Peng-Shuai Wang and Zhijian Liu and Xihui Liu and Yu Qiao and Wanli Ouyang and Tong He and Hengshuang Zhao},
      year={2024},
      eprint={2312.10035},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2312.10035}, 
}

Dino-In-The-Room (DITR) β€” DITR paper

@misc{zeid2025dinoroomleveraging2d,
      title={DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation}, 
      author={Karim Abou Zeid and Kadir Yilmaz and Daan de Geus and Alexander Hermans and David Adrian and Timm Linder and Bastian Leibe},
      year={2025},
      eprint={2503.18944},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18944}, 
}

DINOv2 β€” DINOv2 paper

@misc{oquab2024dinov2learningrobustvisual,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and TimothΓ©e Darcet and ThΓ©o Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and HervΓ© Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2024},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2304.07193}, 
}

GridNet-HD Dataset β€” GridNet-HD paper

@misc{gridnet-hd-dataset,
      title={GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure}, 
      author={Antoine Carreaud and Shanci Li and Malo De Lacour and Digre Frinde and Jan Skaloud and Adrien Gressin},
      year={2026},
      eprint={2601.13052},
      url={https://arxiv.org/abs/2601.13052}, 
}

πŸ§‘β€πŸ’» Authors & Contact

For questions, please open an issue or contact us directly.
