Point Transformer v3 & Dino-In-The-Room (DITR) on GridNet-HD
This repository provides the implementation and training pipelines for two models applied to the GridNet-HD dataset:
- Point Transformer v3 (PTv3): the baseline 3D model.
- Dino-In-The-Room (DITR): a fusion architecture combining Point Transformer v3 with DINOv2 image features, following the methodology proposed in the DITR paper. This model represents the current state of the art in multimodal 3D-2D fusion on multiple datasets.
Dataset Structure
The GridNet-HD dataset must follow the original structure:
dataset-root/
├── t1z5b/
│   ├── images/    # RGB images (.JPG)
│   ├── masks/     # Semantic segmentation masks (.png, single-channel labels)
│   ├── lidar/     # LiDAR point cloud (.las format with field "ground_truth")
│   └── pose/      # Camera poses and intrinsics (text files)
├── t1z6a/
│   └── ...
├── ...
├── split.json     # JSON file specifying the train/test split
└── README.md
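A quick way to sanity-check a local copy against this layout is to verify that every zone directory contains the four expected subfolders. This is an illustrative sketch, not part of the repository's tooling; the contents of split.json depend on the dataset release, so it is only loaded here, not interpreted:

```python
import json
from pathlib import Path

REQUIRED_SUBDIRS = ("images", "masks", "lidar", "pose")

def check_gridnethd_root(root: str) -> list:
    """Return the names of zone directories that match the expected layout."""
    root_path = Path(root)
    valid = []
    for zone in sorted(p for p in root_path.iterdir() if p.is_dir()):
        if all((zone / sub).is_dir() for sub in REQUIRED_SUBDIRS):
            valid.append(zone.name)
    return valid

def load_split(root: str) -> dict:
    """Load the train/test split; its exact schema depends on the release."""
    with open(Path(root) / "split.json") as f:
        return json.load(f)
```

Zones missing one of the four subfolders are simply skipped, which makes partial downloads easy to spot.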
Environment
The following environment was used to train and evaluate the baseline model.
| Component | Details |
|---|---|
| GPU | 4 x NVIDIA A40 (48 GB VRAM) |
| CUDA Version | 12.x (installed inside the Docker container) |
| OS | Ubuntu 22.04 LTS |
| RAM | 512 GB |
1. Point Transformer v3 (PTv3)
Start by cloning the repository:
git clone https://huggingface.co/heig-vd-geo/PTv3_GridNet-HD_baseline
Data Preparation
python prepare_gridnethd.py \
--gridnethd_root $path_to_GridNet-HD-dataset_public$ \
--split_json $path_to_split.json$ \
--out_root $path_to_PTv3_GridNet-HD_baseline$/data/gridnethd/pc \
--pointcept_root $path_to_PTv3_GridNet-HD_baseline$ \
--temporary_root $path_to_temp_directory$ \
--dino_projection False
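The preparation script converts each zone's .las point cloud into per-scene arrays consumable by Pointcept's data loaders. A minimal sketch of that conversion step, assuming Pointcept's usual coord/color/segment .npy convention (the file names and the exact dtypes used by prepare_gridnethd.py may differ):

```python
import numpy as np
from pathlib import Path

def save_pointcept_scene(out_dir, xyz, rgb, labels):
    """Write one scene in the coord/color/segment layout Pointcept loaders expect."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.save(out / "coord.npy", xyz.astype(np.float32))      # (N, 3) point coordinates
    np.save(out / "color.npy", rgb.astype(np.uint8))        # (N, 3) RGB in [0, 255]
    np.save(out / "segment.npy", labels.astype(np.int64))   # (N,) "ground_truth" labels
```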
Training
This repository follows the same structure as Pointcept, enabling seamless integration. Launch the container and train as follows:
docker run --gpus all -it --rm --shm-size=240g \
-v $path_to_PTv3_GridNet-HD_baseline$:/workspace/Pointcept \
pointcept/pointcept:v1.6.0-pytorch2.5.0-cuda12.4-cudnn9-devel bash
cd Pointcept
export PYTHONPATH=./
python tools/train.py \
--config-file configs/gridnethd/PTv3_gridnethd_color.py \
--options save_path=exp/gridnethd/ptv3_color/ \
--num-gpus 4
Evaluation
python tools/test.py \
--config-file configs/gridnethd/PTv3_gridnethd_color.py \
--options save_path=exp/gridnethd/ptv3_color/ \
weight=model_best_PTv3.pth
2. Dino-In-The-Room (DITR)
Data Preparation
First, precompute DINOv2 image features:
python dinov2/compute_dinov2_features.py \
--gridnethd_root $path_to_gridnet_hd$ \
--split_json $path_to_split.json$
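DINOv2 ViTs operate on 14×14-pixel patches, so an H×W image yields an (H//14)×(W//14) grid of patch tokens. A sketch of mapping a pixel coordinate to its flat patch-token index (the patch size is DINOv2's; how compute_dinov2_features.py resizes images before extraction is not shown here):

```python
PATCH = 14  # DINOv2 ViT patch size

def pixel_to_patch_index(u: int, v: int, width: int) -> int:
    """Map pixel (u, v) = (column, row) to the flat index of its DINOv2 patch token.

    Assumes the image height and width are multiples of PATCH, as DINOv2
    requires after resizing/cropping.
    """
    patches_per_row = width // PATCH
    return (v // PATCH) * patches_per_row + (u // PATCH)
```

For a 224×224 image this gives a 16×16 token grid, matching DINOv2's default input size.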
Then, prepare the dataset with feature projection enabled:
python prepare_gridnethd.py \
--gridnethd_root $path_to_GridNet-HD-dataset_public$ \
--split_json $path_to_split.json$ \
--out_root $path_to_PTv3_GridNet-HD_baseline$/data/gridnethd/pc \
--pointcept_root $path_to_PTv3_GridNet-HD_baseline$ \
--temporary_root $path_to_temp_directory$ \
--dino_projection True
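Feature projection relies on the camera poses and intrinsics under pose/: each LiDAR point is projected into the images with a pinhole model and picks up the DINOv2 feature of the patch it lands in. A minimal pinhole projection sketch (the matrix conventions here are assumptions for illustration, not necessarily those of prepare_gridnethd.py):

```python
import numpy as np

def project_points(xyz_world, K, R, t):
    """Project world-frame points into pixel coordinates with a pinhole camera.

    xyz_world: (N, 3) points, K: (3, 3) intrinsics,
    R, t: world-to-camera rotation (3, 3) and translation (3,).
    Returns (N, 2) pixel coordinates and an (N,) mask of points in front of the camera.
    """
    xyz_cam = xyz_world @ R.T + t      # world -> camera frame
    in_front = xyz_cam[:, 2] > 0       # keep only points with positive depth
    uvw = xyz_cam @ K.T                # camera frame -> homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide
    return uv, in_front
```

Points outside the image bounds or behind the camera would additionally be masked out before sampling features.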
Training
As with PTv3, integrate this repository within Pointcept:
docker run --gpus all -it --rm --shm-size=240g \
-v $path_to_PTv3_GridNet-HD_baseline$:/workspace/Pointcept \
pointcept/pointcept:v1.6.0-pytorch2.5.0-cuda12.4-cudnn9-devel bash
cd Pointcept
export PYTHONPATH=./
python tools/train.py \
--config-file configs/gridnethd/DITR_gridnethd_color_dinov2.py \
--options save_path=exp/gridnethd/ditr/ \
--num-gpus 4
Evaluation
python tools/test.py \
--config-file configs/gridnethd/DITR_gridnethd_color_dinov2.py \
--options save_path=exp/gridnethd/ditr/ \
weight=model_best_DITR.pth
Quantitative Results
Both PTv3 (XYZ + color) and DITR (Dino-In-The-Room) were evaluated on the test set with tile overlap and TTA (test-time augmentation).
| Class | PTv3 IoU (%) | DITR IoU (%) |
|---|---|---|
| Pylon | 97.12 | 96.81 |
| Conductor cable | 85.88 | 89.07 |
| Structural cable | 53.22 | 57.80 |
| Insulator | 90.63 | 93.20 |
| High vegetation | 88.30 | 88.81 |
| Low vegetation | 33.93 | 41.99 |
| Herbaceous vegetation | 91.72 | 90.05 |
| Rock, gravel, soil | 51.88 | 44.26 |
| Impervious soil (Road) | 79.63 | 79.49 |
| Water | 29.68 | 71.86 |
| Building | 60.49 | 70.26 |
| Mean IoU (mIoU) | 69.32 | 74.87 |
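Per-class IoU above follows the standard definition IoU = TP / (TP + FP + FN), and mIoU is the unweighted mean over the 11 classes. A sketch of computing both from a confusion matrix (the class ordering is assumed to match the table):

```python
import numpy as np

def iou_from_confusion(conf):
    """Per-class IoU from a (C, C) confusion matrix with rows = ground truth."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class c but actually another class
    fn = conf.sum(axis=1) - tp   # class c points predicted as something else
    return tp / (tp + fp + fn)

def mean_iou(conf):
    """Unweighted mean of the per-class IoUs."""
    return iou_from_confusion(conf).mean()
```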
References
Point Transformer v3 - PTv3 paper
@misc{wu2024pointtransformerv3simpler,
title={Point Transformer V3: Simpler, Faster, Stronger},
author={Xiaoyang Wu and Li Jiang and Peng-Shuai Wang and Zhijian Liu and Xihui Liu and Yu Qiao and Wanli Ouyang and Tong He and Hengshuang Zhao},
year={2024},
eprint={2312.10035},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2312.10035},
}
Dino-In-The-Room (DITR) - DITR paper
@misc{zeid2025dinoroomleveraging2d,
title={DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation},
author={Karim Abou Zeid and Kadir Yilmaz and Daan de Geus and Alexander Hermans and David Adrian and Timm Linder and Bastian Leibe},
year={2025},
eprint={2503.18944},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.18944},
}
DINOv2 - DINOv2 paper
@misc{oquab2024dinov2learningrobustvisual,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Maxime Oquab and TimothΓ©e Darcet and ThΓ©o Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and HervΓ© Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
year={2024},
eprint={2304.07193},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2304.07193},
}
GridNet-HD Dataset - GridNet-HD paper
@misc{gridnet-hd-dataset,
title={GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure},
author={Antoine Carreaud and Shanci Li and Malo De Lacour and Digre Frinde and Jan Skaloud and Adrien Gressin},
year={2026},
eprint={2601.13052},
url={https://arxiv.org/abs/2601.13052},
}
Authors & Contact
For questions, please open an issue or contact us directly.