
Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation

GitHub · Paper


Model Overview

TopoPT is a family of lightweight 3D point cloud segmentation models obtained by compressing LitePT-S through a compact student architecture (TrimPT) trained with Stage-wise Relational Feature Distillation (SRFD).

TrimPT reduces LitePT-S's channel widths from (36, 72, 144, 252, 504) to (36, 54, 108, 180, 360) and cuts stage-3 attention depth from 6 to 4 blocks, while preserving the full 1024-token attention window. This yields 5.84 M parameters and 12.95 GFLOPs: 2.18× fewer parameters and 1.96× fewer FLOPs than LitePT-S.
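The quoted reduction factors follow directly from the parameter and FLOP counts; a quick arithmetic check:

```python
# Sanity check of the compression ratios quoted above
teacher_params, student_params = 12.71, 5.84    # millions of parameters
teacher_gflops, student_gflops = 25.42, 12.95   # GFLOPs per forward pass

param_ratio = round(teacher_params / student_params, 2)   # 2.18x fewer parameters
flop_ratio = round(teacher_gflops / student_gflops, 2)    # 1.96x fewer FLOPs
print(param_ratio, flop_ratio)
```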

SRFD is a training-only objective applied at the compressed attention stages (stages 3–4). It matches pairwise cosine-similarity matrices between teacher (frozen LitePT-S) and student (TrimPT) features, explicitly regularizing their local affinity structure. Stage-specific linear projectors align teacher and student channel dimensions before computing both a pointwise cosine loss (L_pw) and a relational Frobenius loss (L_rel). After training, the teacher and all projectors are discarded — TopoPT has the same deployed architecture, checkpoint size, FLOPs, and latency as TrimPT.
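The two loss terms can be sketched in a few lines of PyTorch. This is a minimal illustration of the description above, not the repository's code: the function name, the choice of projecting student features into the teacher's channel dimension, and the Frobenius normalization are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def srfd_losses(f_t, f_s, projector):
    """Sketch of the SRFD objective at one attention stage.

    f_t: teacher features, shape (N, C_t)  -- frozen LitePT-S
    f_s: student features, shape (N, C_s)  -- TrimPT
    projector: stage-specific linear layer mapping C_s -> C_t
               (assumed direction; discarded after training)
    """
    z_s = projector(f_s)                 # align channel dimensions
    t = F.normalize(f_t, dim=-1)
    s = F.normalize(z_s, dim=-1)

    # Pointwise cosine loss L_pw: per-point feature alignment.
    l_pw = (1.0 - (t * s).sum(dim=-1)).mean()

    # Relational Frobenius loss L_rel: match pairwise cosine-similarity
    # matrices, regularizing the local affinity structure.
    r_t = t @ t.T                        # (N, N) teacher affinities
    r_s = s @ s.T                        # (N, N) student affinities
    l_rel = torch.norm(r_t - r_s, p="fro") / r_t.numel() ** 0.5  # assumed scaling

    return l_pw, l_rel
```

With identical teacher and student features (and an identity projector) both terms vanish, which is the intended fixed point of the distillation.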

On ScanNet semantic segmentation, TopoPT achieves 76.6% mIoU at 5.84 M parameters — matching the official LitePT-S result (76.5%) with substantially fewer resources.


Available Checkpoints

All checkpoints follow the naming convention {dataset}-{task}-{model}-{epochs}.
lw-c = TrimPT (no distillation) · lw-c-kd = TopoPT (with SRFD)

Semantic Segmentation

| Model | Benchmark | Epochs | val mIoU | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 74.9 | scannet-semseg-lw-c-100epoch |
| TrimPT | ScanNet | 1200 | 75.6 | scannet-semseg-lw-c-1200epoch |
| TopoPT | ScanNet | 100 | 76.6 | scannet-semseg-lw-c-kd-100epoch |
| TrimPT | NuScenes | 50 | 81.2 | nuscenes-semseg-lw-c-50epoch |
| TopoPT | NuScenes | 50 | 81.4 | nuscenes-semseg-lw-c-kd-50epoch |
| TrimPT | Structured3D | 50 | 69.4 | structured3d-semseg-lw-c-50epoch |
| TopoPT | Structured3D | 50 | 70.3 | structured3d-semseg-lw-c-kd-tl-50epoch |

Instance Segmentation

| Model | Benchmark | Epochs | mAP₅₀ | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 63.1 | scannet-insseg-lw-c-100epoch |
| TrimPT | ScanNet | 800 | 65.1 | scannet-insseg-lw-c-800epoch |
| TopoPT | ScanNet | 100 | 63.9 | scannet-insseg-lw-c-kd-100epoch |
| TrimPT | ScanNet200 | 100 | 26.3 | scannet200-insseg-lw-c-100epoch |
| TrimPT | ScanNet200 | 800 | 31.7 | scannet200-insseg-lw-c-800epoch |
| TopoPT | ScanNet200 | 100 | 33.0 | scannet200-insseg-lw-c-kd-100epoch |

Ablation Checkpoints (ScanNet, 100 epochs)

These correspond to the compression ablation study (Table 2 in the paper).

| Model | Description | Params (M) | mIoU | Checkpoint |
|---|---|---|---|---|
| LitePT-S (repro) | Official arch, reproduced | 12.7 | 75.3 | scannet-semseg-litept-reruun-100epoch |
| LW-A | Depth only | 11.2 | 75.4 | scannet-semseg-lw-a-100epoch |
| LW-B | Width only | 6.6 | 74.9 | scannet-semseg-lw-b-100epoch |
| LW-C (= TrimPT) | Depth + width | 5.8 | 74.9 | scannet-semseg-lw-c-100epoch |
| LW-D | Patch size only | 12.7 | 74.2 | scannet-semseg-lw-d-100epoch |
| LW-E | Depth + width + patch | 5.8 | 74.1 | scannet-semseg-lw-e-100epoch |

Inference Efficiency

Profiled on NVIDIA RTX 3090 (batch size 1, 300 forward passes). Since the teacher and projectors are discarded after training, TopoPT and TrimPT have identical inference cost.

| Dataset | Model | Params (M) | GFLOPs | Latency (ms) | FPS | Mem (GB) | Size (MB) |
|---|---|---|---|---|---|---|---|
| ScanNet | LitePT-S | 12.71 | 25.42 | 34.08 | 29.34 | 1.332 | 145.8 |
| ScanNet | TrimPT / TopoPT | 5.84 | 12.95 | 30.78 | 32.49 | 1.211 | 67.1 |
| NuScenes | LitePT-S | 12.71 | 25.42 | 35.81 | 27.93 | 0.717 | 145.7 |
| NuScenes | TrimPT / TopoPT | 5.84 | 12.95 | 28.84 | 34.68 | 0.432 | 67.0 |
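A generic PyTorch timing loop in the spirit of this protocol (a sketch only, not the authors' profiling script; the warm-up count is an assumption):

```python
import time
import torch

def profile_latency(model, sample, n_warmup=20, n_runs=300):
    """Average forward latency in ms, mirroring the protocol above
    (batch size 1, 300 forward passes after warm-up)."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):          # warm up kernels / caches
            model(sample)
        if torch.cuda.is_available():
            torch.cuda.synchronize()       # don't time queued GPU work
        start = time.perf_counter()
        for _ in range(n_runs):
            model(sample)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000.0       # ms per forward pass
```

FPS is then simply `1000 / latency_ms`.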

How to Use

Please refer to the GitHub repository for full setup, training, and evaluation instructions. The codebase follows the same interface as LitePT and Pointcept.

Quick start — loading a TopoPT checkpoint for inference:

```bash
# Install and set up environment following the GitHub README, then:
export PYTHONPATH=./
python tools/test.py \
    --config-file configs/scannet/semseg-lw-c-kd-100epoch.py \
    --num-gpus 4 \
    --options save_path=exp/topopt_scannet \
              weight=/path/to/model_best.pth
```

Architecture Details

| Property | LitePT-S (teacher) | TrimPT / TopoPT (student) |
|---|---|---|
| Channels | (36, 72, 144, 252, 504) | (36, 54, 108, 180, 360) |
| Stage depths | (2, 2, 2, 6, 2) | (2, 2, 2, 4, 2) |
| Attention stages | 3, 4 | 3, 4 |
| Attention window | 1024 tokens | 1024 tokens |
| Parameters | 12.71 M | 5.84 M |
| GFLOPs | 25.42 | 12.95 |

Stages 1–2 use sparse convolution; stages 3–4 use windowed multi-head self-attention with PointROPE positional encoding (same as LitePT). SRFD distillation is applied at stages 3–4 during training only.
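For orientation, the student's shape can be summarized in a config-style dict. Field names here are illustrative assumptions, not the repository's actual config schema:

```python
# Illustrative summary of the TrimPT / TopoPT student (field names assumed)
trimpt_config = dict(
    channels=(36, 54, 108, 180, 360),   # per-stage widths
    depths=(2, 2, 2, 4, 2),             # blocks per stage
    conv_stages=(1, 2),                 # sparse convolution
    attention_stages=(3, 4),            # windowed MHSA with PointROPE
    window_size=1024,                   # tokens per attention window
    distill_stages=(3, 4),              # SRFD applied here, training only
)
```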


Citation

```bibtex
@inproceedings{topopt2026,
    title={{Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation}},
    author={Anonymous},
    booktitle={...},
    year={2026}
}
@inproceedings{yuelitept2026,
    title={{LitePT: Lighter Yet Stronger Point Transformer}},
    author={Yue, Yuanwen and Robert, Damien and Wang, Jianyuan and Hong, Sunghwan and Wegner, Jan Dirk and Rupprecht, Christian and Schindler, Konrad},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}
```