# Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation

## Model Overview
TopoPT is a family of lightweight 3D point cloud segmentation models obtained by compressing LitePT-S through a compact student architecture (TrimPT) trained with Stage-wise Relational Feature Distillation (SRFD).
TrimPT reduces LitePT-S's channel widths from (36, 72, 144, 252, 504) to (36, 54, 108, 180, 360) and cuts stage-3 attention depth from 6 to 4 blocks, preserving the full 1024-token attention window. This yields 5.84 M parameters and 12.95 GFLOPs — 2.18× fewer parameters and 1.96× fewer FLOPs than LitePT-S.
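The compression ratios quoted above follow directly from the parameter and FLOP counts; a quick sanity check:

```python
# Verify the 2.18x / 1.96x compression ratios from the stated model sizes.
teacher_params, student_params = 12.71, 5.84   # millions of parameters
teacher_flops, student_flops = 25.42, 12.95    # GFLOPs

param_ratio = round(teacher_params / student_params, 2)  # → 2.18
flop_ratio = round(teacher_flops / student_flops, 2)     # → 1.96
print(param_ratio, flop_ratio)
```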
SRFD is a training-only objective applied at the compressed attention stages (stages 3–4). It matches pairwise cosine-similarity matrices between teacher (frozen LitePT-S) and student (TrimPT) features, explicitly regularizing their local affinity structure. Stage-specific linear projectors align teacher and student channel dimensions before computing both a pointwise cosine loss (L_pw) and a relational Frobenius loss (L_rel). After training, the teacher and all projectors are discarded — TopoPT has the same deployed architecture, checkpoint size, FLOPs, and latency as TrimPT.
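A minimal NumPy sketch of the two SRFD loss terms at a single stage. The function names, the eps value, and the normalization of the relational term by N² are illustrative assumptions; see the paper and repository for the exact formulation and loss weights.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize feature vectors to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def srfd_losses(f_teacher, f_student, w_proj):
    """Sketch of the SRFD objective for one compressed attention stage.

    f_teacher: (N, C_t) frozen teacher features
    f_student: (N, C_s) student features
    w_proj:    (C_s, C_t) stage-specific linear projector
               (trained jointly, discarded after training)
    """
    f_s = f_student @ w_proj                 # align channel dimensions
    t = l2_normalize(f_teacher)
    s = l2_normalize(f_s)
    # Pointwise cosine loss L_pw: 1 - cos(t_i, s_i), averaged over points.
    l_pw = np.mean(1.0 - np.sum(t * s, axis=-1))
    # Relational loss L_rel: Frobenius distance between the N x N pairwise
    # cosine-similarity (affinity) matrices of teacher and student.
    n = t.shape[0]
    l_rel = np.linalg.norm(t @ t.T - s @ s.T, ord="fro") ** 2 / n ** 2
    return l_pw, l_rel
```

With identical teacher and student features and an identity projector, both terms vanish, which is the intended fixed point of the objective.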
On ScanNet semantic segmentation, TopoPT achieves 76.6% mIoU at 5.84 M parameters — matching the official LitePT-S result (76.5%) with substantially fewer resources.
## Available Checkpoints
All checkpoints follow the naming convention `{dataset}-{task}-{model}-{epochs}`. `lw-c` = TrimPT (no distillation) · `lw-c-kd` = TopoPT (with SRFD).
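As an illustration of the convention (this helper is not part of the repository, and a few checkpoints carry extra suffixes such as `-tl`):

```python
def checkpoint_name(dataset: str, task: str, distilled: bool, epochs: int) -> str:
    """Build a checkpoint name following {dataset}-{task}-{model}-{epochs}."""
    model = "lw-c-kd" if distilled else "lw-c"  # TopoPT vs. TrimPT
    return f"{dataset}-{task}-{model}-{epochs}epoch"

print(checkpoint_name("scannet", "semseg", True, 100))
# scannet-semseg-lw-c-kd-100epoch
```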
### Semantic Segmentation
| Model | Benchmark | Epochs | val mIoU | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 74.9 | scannet-semseg-lw-c-100epoch |
| TrimPT | ScanNet | 1200 | 75.6 | scannet-semseg-lw-c-1200epoch |
| TopoPT | ScanNet | 100 | 76.6 | scannet-semseg-lw-c-kd-100epoch |
| TrimPT | NuScenes | 50 | 81.2 | nuscenes-semseg-lw-c-50epoch |
| TopoPT | NuScenes | 50 | 81.4 | nuscenes-semseg-lw-c-kd-50epoch |
| TrimPT | Structured3D | 50 | 69.4 | structured3d-semseg-lw-c-50epoch |
| TopoPT | Structured3D | 50 | 70.3 | structured3d-semseg-lw-c-kd-tl-50epoch |
### Instance Segmentation
| Model | Benchmark | Epochs | mAP₅₀ | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 63.1 | scannet-insseg-lw-c-100epoch |
| TrimPT | ScanNet | 800 | 65.1 | scannet-insseg-lw-c-800epoch |
| TopoPT | ScanNet | 100 | 63.9 | scannet-insseg-lw-c-kd-100epoch |
| TrimPT | ScanNet200 | 100 | 26.3 | scannet200-insseg-lw-c-100epoch |
| TrimPT | ScanNet200 | 800 | 31.7 | scannet200-insseg-lw-c-800epoch |
| TopoPT | ScanNet200 | 100 | 33.0 | scannet200-insseg-lw-c-kd-100epoch |
### Ablation Checkpoints (ScanNet, 100 epochs)
These correspond to the compression ablation study (Table 2 in the paper).
| Model | Description | Params (M) | mIoU | Checkpoint |
|---|---|---|---|---|
| LitePT-S (repro) | Official arch, reproduced | 12.7 | 75.3 | scannet-semseg-litept-reruun-100epoch |
| LW-A | Depth only | 11.2 | 75.4 | scannet-semseg-lw-a-100epoch |
| LW-B | Width only | 6.6 | 74.9 | scannet-semseg-lw-b-100epoch |
| LW-C (= TrimPT) | Depth + width | 5.8 | 74.9 | scannet-semseg-lw-c-100epoch |
| LW-D | Patch size only | 12.7 | 74.2 | scannet-semseg-lw-d-100epoch |
| LW-E | Depth + width + patch | 5.8 | 74.1 | scannet-semseg-lw-e-100epoch |
## Inference Efficiency
Profiled on NVIDIA RTX 3090 (batch size 1, 300 forward passes). Since the teacher and projectors are discarded after training, TopoPT and TrimPT have identical inference cost.
| Dataset | Model | Params (M) | GFLOPs | Latency (ms) | FPS | Mem (GB) | Size (MB) |
|---|---|---|---|---|---|---|---|
| ScanNet | LitePT-S | 12.71 | 25.42 | 34.08 | 29.34 | 1.332 | 145.8 |
| ScanNet | TrimPT / TopoPT | 5.84 | 12.95 | 30.78 | 32.49 | 1.211 | 67.1 |
| NuScenes | LitePT-S | 12.71 | 25.42 | 35.81 | 27.93 | 0.717 | 145.7 |
| NuScenes | TrimPT / TopoPT | 5.84 | 12.95 | 28.84 | 34.68 | 0.432 | 67.0 |
## How to Use
Please refer to the GitHub repository for full setup, training, and evaluation instructions. The codebase follows the same interface as LitePT and Pointcept.
Quick start — loading a TopoPT checkpoint for inference:
```bash
# Install and set up environment following the GitHub README, then:
export PYTHONPATH=./
python tools/test.py \
  --config-file configs/scannet/semseg-lw-c-kd-100epoch.py \
  --num-gpus 4 \
  --options save_path=exp/topopt_scannet \
            weight=/path/to/model_best.pth
```
## Architecture Details
| Property | LitePT-S (teacher) | TrimPT / TopoPT (student) |
|---|---|---|
| Channels | (36, 72, 144, 252, 504) | (36, 54, 108, 180, 360) |
| Stage depths | (2, 2, 2, 6, 2) | (2, 2, 2, 4, 2) |
| Attention stages | 3, 4 | 3, 4 |
| Attention window | 1024 tokens | 1024 tokens |
| Parameters | 12.71 M | 5.84 M |
| GFLOPs | 25.42 | 12.95 |
Stages 1–2 use sparse convolution; stages 3–4 use windowed multi-head self-attention with PointROPE positional encoding (same as LitePT). SRFD distillation is applied at stages 3–4 during training only.
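The hyperparameters from the table above, collected as plain dicts for reference (the field names here are illustrative, not the repository's actual config keys):

```python
# Teacher vs. student hyperparameters; only channels and stage-3 depth differ.
LITEPT_S = dict(channels=(36, 72, 144, 252, 504), depths=(2, 2, 2, 6, 2),
                attn_stages=(3, 4), attn_window=1024)
TRIMPT = dict(channels=(36, 54, 108, 180, 360), depths=(2, 2, 2, 4, 2),
              attn_stages=(3, 4), attn_window=1024)
```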
## Citation
```bibtex
@inproceedings{topopt2026,
  title     = {{Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation}},
  author    = {Anonymous},
  booktitle = {...},
  year      = {2026}
}

@inproceedings{yuelitept2026,
  title     = {{LitePT: Lighter Yet Stronger Point Transformer}},
  author    = {Yue, Yuanwen and Robert, Damien and Wang, Jianyuan and Hong, Sunghwan and Wegner, Jan Dirk and Rupprecht, Christian and Schindler, Konrad},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```