
Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation

GitHub · Paper


Model Overview

TopoPT is a family of lightweight 3D point cloud segmentation models obtained by compressing LitePT-S through a compact student architecture (TrimPT) trained with Stage-wise Relational Feature Distillation (SRFD).

TrimPT reduces LitePT-S's channel widths from (36, 72, 144, 252, 504) to (36, 54, 108, 180, 360) and cuts stage-3 attention depth from 6 to 4 blocks, while preserving the full 1024-token attention window. This yields 5.84 M parameters and 12.95 GFLOPs: 2.18× fewer parameters and 1.96× fewer FLOPs than LitePT-S.
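The quoted reduction factors follow directly from the parameter and FLOP counts; a quick arithmetic check:

```python
# Sanity check of the compression ratios quoted above
teacher_params, student_params = 12.71, 5.84    # millions of parameters
teacher_gflops, student_gflops = 25.42, 12.95   # GFLOPs per forward pass

param_ratio = round(teacher_params / student_params, 2)   # 2.18x fewer parameters
flop_ratio = round(teacher_gflops / student_gflops, 2)    # 1.96x fewer FLOPs
print(param_ratio, flop_ratio)
```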

SRFD is a training-only objective applied at the compressed attention stages (stages 3–4). It matches pairwise cosine-similarity matrices between teacher (frozen LitePT-S) and student (TrimPT) features, explicitly regularizing their local affinity structure. Stage-specific linear projectors align teacher and student channel dimensions before computing both a pointwise cosine loss (L_pw) and a relational Frobenius loss (L_rel). After training, the teacher and all projectors are discarded — TopoPT has the same deployed architecture, checkpoint size, FLOPs, and latency as TrimPT.
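The two loss terms can be sketched in a few lines of PyTorch. This is a minimal illustration of the description above, not the repository's code: the function name, the choice of projecting student features into the teacher's channel dimension, and the Frobenius normalization are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def srfd_losses(f_t, f_s, projector):
    """Sketch of the SRFD objective at one attention stage.

    f_t: teacher features, shape (N, C_t)  -- frozen LitePT-S
    f_s: student features, shape (N, C_s)  -- TrimPT
    projector: stage-specific linear layer mapping C_s -> C_t
               (assumed direction; discarded after training)
    """
    z_s = projector(f_s)                 # align channel dimensions
    t = F.normalize(f_t, dim=-1)
    s = F.normalize(z_s, dim=-1)

    # Pointwise cosine loss L_pw: per-point feature alignment.
    l_pw = (1.0 - (t * s).sum(dim=-1)).mean()

    # Relational Frobenius loss L_rel: match pairwise cosine-similarity
    # matrices, regularizing the local affinity structure.
    r_t = t @ t.T                        # (N, N) teacher affinities
    r_s = s @ s.T                        # (N, N) student affinities
    l_rel = torch.norm(r_t - r_s, p="fro") / r_t.numel() ** 0.5  # assumed scaling

    return l_pw, l_rel
```

With identical teacher and student features (and an identity projector) both terms vanish, which is the intended fixed point of the distillation.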

On ScanNet semantic segmentation, TopoPT achieves 76.6% mIoU at 5.84 M parameters — matching the official LitePT-S result (76.5%) with substantially fewer resources.


Available Checkpoints

All checkpoints follow the naming convention {dataset}-{task}-{model}-{epochs}.
lw-c = TrimPT (no distillation) · lw-c-kd = TopoPT (with SRFD)

Semantic Segmentation

| Model | Benchmark | Epochs | val mIoU | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 74.9 | scannet-semseg-lw-c-100epoch |
| TrimPT | ScanNet | 1200 | 75.6 | scannet-semseg-lw-c-1200epoch |
| TopoPT | ScanNet | 100 | 76.6 | scannet-semseg-lw-c-kd-100epoch |
| TrimPT | NuScenes | 50 | 81.2 | nuscenes-semseg-lw-c-50epoch |
| TopoPT | NuScenes | 50 | 81.4 | nuscenes-semseg-lw-c-kd-50epoch |
| TrimPT | Structured3D | 50 | 69.4 | structured3d-semseg-lw-c-50epoch |
| TopoPT | Structured3D | 50 | 70.3 | structured3d-semseg-lw-c-kd-tl-50epoch |

Instance Segmentation

| Model | Benchmark | Epochs | mAP₅₀ | Checkpoint |
|---|---|---|---|---|
| TrimPT | ScanNet | 100 | 63.1 | scannet-insseg-lw-c-100epoch |
| TrimPT | ScanNet | 800 | 65.1 | scannet-insseg-lw-c-800epoch |
| TopoPT | ScanNet | 100 | 63.9 | scannet-insseg-lw-c-kd-100epoch |
| TrimPT | ScanNet200 | 100 | 26.3 | scannet200-insseg-lw-c-100epoch |
| TrimPT | ScanNet200 | 800 | 31.7 | scannet200-insseg-lw-c-800epoch |
| TopoPT | ScanNet200 | 100 | 33.0 | scannet200-insseg-lw-c-kd-100epoch |

Ablation Checkpoints (ScanNet, 100 epochs)

These correspond to the compression ablation study (Table 2 in the paper).

| Model | Description | Params (M) | mIoU | Checkpoint |
|---|---|---|---|---|
| LitePT-S (repro) | Official arch, reproduced | 12.7 | 75.3 | scannet-semseg-litept-reruun-100epoch |
| LW-A | Depth only | 11.2 | 75.4 | scannet-semseg-lw-a-100epoch |
| LW-B | Width only | 6.6 | 74.9 | scannet-semseg-lw-b-100epoch |
| LW-C (= TrimPT) | Depth + width | 5.8 | 74.9 | scannet-semseg-lw-c-100epoch |
| LW-D | Patch size only | 12.7 | 74.2 | scannet-semseg-lw-d-100epoch |
| LW-E | Depth + width + patch | 5.8 | 74.1 | scannet-semseg-lw-e-100epoch |

Inference Efficiency

Profiled on NVIDIA RTX 3090 (batch size 1, 300 forward passes). Since the teacher and projectors are discarded after training, TopoPT and TrimPT have identical inference cost.

| Dataset | Model | Params (M) | GFLOPs | Latency (ms) | FPS | Mem (GB) | Size (MB) |
|---|---|---|---|---|---|---|---|
| ScanNet | LitePT-S | 12.71 | 25.42 | 34.08 | 29.34 | 1.332 | 145.8 |
| ScanNet | TrimPT / TopoPT | 5.84 | 12.95 | 30.78 | 32.49 | 1.211 | 67.1 |
| NuScenes | LitePT-S | 12.71 | 25.42 | 35.81 | 27.93 | 0.717 | 145.7 |
| NuScenes | TrimPT / TopoPT | 5.84 | 12.95 | 28.84 | 34.68 | 0.432 | 67.0 |
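A generic PyTorch timing loop in the spirit of this protocol (a sketch only, not the authors' profiling script; the warm-up count is an assumption):

```python
import time
import torch

def profile_latency(model, sample, n_warmup=20, n_runs=300):
    """Average forward latency in ms, mirroring the protocol above
    (batch size 1, 300 forward passes after warm-up)."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):          # warm up kernels / caches
            model(sample)
        if torch.cuda.is_available():
            torch.cuda.synchronize()       # don't time queued GPU work
        start = time.perf_counter()
        for _ in range(n_runs):
            model(sample)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000.0       # ms per forward pass
```

FPS is then simply `1000 / latency_ms`.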

How to Use

Please refer to the GitHub repository for full setup, training, and evaluation instructions. The codebase follows the same interface as LitePT and Pointcept.

Quick start — loading a TopoPT checkpoint for inference:

```bash
# Install and set up environment following the GitHub README, then:
export PYTHONPATH=./
python tools/test.py \
    --config-file configs/scannet/semseg-lw-c-kd-100epoch.py \
    --num-gpus 4 \
    --options save_path=exp/topopt_scannet \
              weight=/path/to/model_best.pth
```

Architecture Details

| Property | LitePT-S (teacher) | TrimPT / TopoPT (student) |
|---|---|---|
| Channels | (36, 72, 144, 252, 504) | (36, 54, 108, 180, 360) |
| Stage depths | (2, 2, 2, 6, 2) | (2, 2, 2, 4, 2) |
| Attention stages | 3, 4 | 3, 4 |
| Attention window | 1024 tokens | 1024 tokens |
| Parameters | 12.71 M | 5.84 M |
| GFLOPs | 25.42 | 12.95 |

Stages 1–2 use sparse convolution; stages 3–4 use windowed multi-head self-attention with PointROPE positional encoding (same as LitePT). SRFD distillation is applied at stages 3–4 during training only.
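For orientation, the student's shape can be summarized in a config-style dict. Field names here are illustrative assumptions, not the repository's actual config schema:

```python
# Illustrative summary of the TrimPT / TopoPT student (field names assumed)
trimpt_config = dict(
    channels=(36, 54, 108, 180, 360),   # per-stage widths
    depths=(2, 2, 2, 4, 2),             # blocks per stage
    conv_stages=(1, 2),                 # sparse convolution
    attention_stages=(3, 4),            # windowed MHSA with PointROPE
    window_size=1024,                   # tokens per attention window
    distill_stages=(3, 4),              # SRFD applied here, training only
)
```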


Citation

```bibtex
@inproceedings{topopt2026,
    title={{Relational Feature Distillation for Lightweight 3D Point Cloud Segmentation}},
    author={Anonymous},
    booktitle={...},
    year={2026}
}
@inproceedings{yuelitept2026,
    title={{LitePT: Lighter Yet Stronger Point Transformer}},
    author={Yue, Yuanwen and Robert, Damien and Wang, Jianyuan and Hong, Sunghwan and Wegner, Jan Dirk and Rupprecht, Christian and Schindler, Konrad},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}
```