RAN (OpenUrban3D): ANIMA Module

Part of the ANIMA Perception Suite by Robot Flow Labs.

Open-vocabulary 3D semantic segmentation for large-scale urban point clouds, without manual annotations, aligned multi-view images, or pre-trained segmentation networks.

Paper

OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds
Chongyu Wang, Kunlei Jing, Jihua Zhu, Di Wang
arXiv:2509.10842 (Sep 2025)

Architecture

RAN implements a knowledge distillation pipeline:

  1. Multi-view rendering: Render 3D point clouds from 8 hemispherical camera viewpoints
  2. SLIC mask generation: Unsupervised superpixel segmentation on rendered views
  3. CLIP ViT-L/14 feature extraction: Extract 768-dim vision-language features per mask
  4. Sample-balanced fusion: Aggregate mask-level features to per-point embeddings
  5. MinkUNet distillation: Train a 3D backbone to predict CLIP features from raw point coordinates
  6. Zero-shot segmentation: At inference, compare point features with text queries via cosine similarity
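
Step 1 does not say how the 8 viewpoints are laid out. As a minimal sketch, assuming evenly spaced azimuths at a single fixed elevation on the upper hemisphere, the camera positions could be generated as follows. The function name `hemisphere_cameras` and the `elevation_deg`, `radius`, and `center` parameters are hypothetical and not taken from the RAN code.

```python
import math

def hemisphere_cameras(n_views=8, elevation_deg=45.0, radius=1.0,
                       center=(0.0, 0.0, 0.0)):
    """Return n_views camera positions on the upper hemisphere around center.

    Assumption: evenly spaced azimuths at one elevation. The layout that
    RAN actually uses is not specified in this README.
    """
    elev = math.radians(elevation_deg)
    cams = []
    for i in range(n_views):
        azim = 2.0 * math.pi * i / n_views
        x = center[0] + radius * math.cos(elev) * math.cos(azim)
        y = center[1] + radius * math.cos(elev) * math.sin(azim)
        z = center[2] + radius * math.sin(elev)
        cams.append((x, y, z))
    return cams

cameras = hemisphere_cameras()
for cam in cameras:
    print(tuple(round(c, 3) for c in cam))
```

Each camera would then look toward `center`; in practice `radius` would be scaled to the extent of the point cloud block being rendered.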

Model Details

Parameter       Value
3D Backbone     MinkUNet (dense fallback)
Feature dim     768 (CLIP ViT-L/14 aligned)
Parameters      0.97M
VL Teacher      CLIP ViT-L/14 (frozen)
Voxel size      0.2 m
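
The voxel size sets the resolution at which points are grouped before the 3D backbone sees them. The sketch below illustrates that quantization in plain Python, assuming simple floor division of the coordinates by 0.2 m. The helper `voxelize` is hypothetical; this README does not show RAN's own voxelizer.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size=0.2):
    """Group point indices by the integer voxel cell that contains them."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        cells[key].append(idx)
    return dict(cells)

# Four toy points in metres; the first two fall into the same 0.2 m cell.
points = [(0.05, 0.05, 0.0), (0.15, 0.1, 0.1), (0.25, 0.0, 0.0), (-0.05, 0.0, 0.0)]
cells = voxelize(points)
for key, indices in sorted(cells.items()):
    print(key, indices)
```

Using `math.floor` rather than `int()` keeps negative coordinates in their own cells instead of folding them into cell 0.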

Training

Setting            Value
Dataset            SensatUrban (24 blocks, 29.9M points)
Optimizer          Adam
Learning rate      1e-4 (cosine annealing + warmup)
Batch size         4
Epochs             43/60 (early stopped, patience=10)
Best val_loss      13.04
Final train_loss   8.03
Precision          bf16 mixed
Hardware           NVIDIA L4 (22 GB)
Training time      61 min
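
Two of the settings above, the warmup plus cosine-annealing learning rate and early stopping with patience 10, can be sketched in a few lines. This is an illustration under assumptions rather than the training code from this repository: the warmup length, the step counts, the linear warmup shape, and the names `lr_at` and `EarlyStopping` are all hypothetical, and the validation losses are synthetic.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=1e-4, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Synthetic validation losses: improve for 33 epochs, then plateau.
val_losses = [50.0 - e for e in range(33)] + [18.5] * 27
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print("stopped at epoch", stopped_at)
```

Under this counting convention, a run that stops at epoch 43 with patience 10 saw its last improvement at epoch 33.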

Exported Formats

Format           File                         Size     Use Case
PyTorch (.pth)   pytorch/ran_v1.pth           3.9 MB   Training, fine-tuning
SafeTensors      pytorch/ran_v1.safetensors   3.9 MB   Fast loading, safe
ONNX             onnx/ran_v1.onnx             3.9 MB   Cross-platform inference
Checkpoint       checkpoints/best.pth         11 MB    Resume training (includes optimizer)

TensorRT exports are deferred to the target hardware (Jetson/L4).
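
The file sizes listed above are roughly what a 0.97M-parameter model predicts. The back-of-the-envelope check below assumes the weights are stored as 32-bit floats and that the full checkpoint adds Adam's two moment buffers per parameter. Both are assumptions, since this README does not state the storage dtype or the checkpoint layout.

```python
def fp32_megabytes(n_params, copies=1):
    """Approximate size in MB (10**6 bytes) of `copies` fp32 tensors per parameter."""
    return n_params * 4 * copies / 1e6

n_params = 0.97e6  # parameter count from the Model Details section

# Weights only: one fp32 value per parameter.
weights_mb = fp32_megabytes(n_params)

# Full checkpoint: weights plus two Adam moment buffers (assumed layout).
checkpoint_mb = fp32_megabytes(n_params, copies=3)

print(f"weights    ~ {weights_mb:.2f} MB")
print(f"checkpoint ~ {checkpoint_mb:.2f} MB")
```

The estimates land close to the listed 3.9 MB and 11 MB; the remaining gap could come from the rounding of the parameter count, file metadata, or scheduler state.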

Usage

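import clip  # assumption: the OpenAI CLIP package
import torch
import torch.nn.functional as F
from safetensors.torch import load_file

# Load model
weights = load_file("pytorch/ran_v1.safetensors")
# ... build model and load weights

# Encode the text queries with the CLIP ViT-L/14 teacher
clip_model, _ = clip.load("ViT-L/14", device="cpu")
tokens = clip.tokenize(["building", "tree", "road"])

# Zero-shot segmentation
with torch.no_grad():
    point_features = model(point_cloud)  # (N, 768)
    text_features = clip_model.encode_text(tokens).float()  # (C, 768)

# L2-normalize both sides so the dot product is a cosine similarity
point_features = F.normalize(point_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)
similarity = point_features @ text_features.T  # (N, C)
labels = similarity.argmax(dim=-1)  # (N,)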
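
Cosine similarity divides by both vector norms, so a plain dot product only matches it when the point and text features are unit length. The toy example below, in plain Python with made-up 2-D vectors standing in for the 768-dim features, shows how an unnormalized text vector with a large norm can win the argmax even though it points in a worse direction. The helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def argmax_label(point, texts, use_cosine):
    """Return the index of the best-matching text vector for one point."""
    if use_cosine:
        point = normalize(point)
        texts = [normalize(t) for t in texts]
    scores = [dot(point, t) for t in texts]
    return scores.index(max(scores))

point = [1.0, 0.2]                 # stand-in for one point feature
texts = [[1.0, 0.0], [3.0, 3.0]]   # class 0 is better aligned, class 1 has a larger norm

raw_label = argmax_label(point, texts, use_cosine=False)
cos_label = argmax_label(point, texts, use_cosine=True)
print("raw dot product picks class", raw_label)
print("cosine similarity picks class", cos_label)
```

Here the raw dot product picks class 1 because of its larger norm, while cosine similarity picks the better-aligned class 0.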

Files

pytorch/ran_v1.pth              PyTorch weights
pytorch/ran_v1.safetensors      SafeTensors weights
onnx/ran_v1.onnx                ONNX export (opset 17)
checkpoints/best.pth            Full checkpoint (model + optimizer + scheduler)
configs/training.yaml           Training configuration
logs/training_history.json      Loss curves
paper.pdf                       OpenUrban3D paper (arXiv:2509.10842)

License

Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED
