CLFT-Sparse-AKS

Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection

Model Description

CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.

Key Features

  • Sparse Adaptive Kernel Selection - selects the attention kernel size from {3, 5, 7, 9} based on LiDAR distance
  • Semantic-Guided Depth Supervision - Direct supervision for kernel prediction
  • SS2D (State Space 2D) - Mamba-based global context aggregation
  • CUDA Graph Optimization - Efficient sparse attention processing
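The distance-based kernel selection above can be sketched as a simple depth-to-size mapping. This is a minimal illustration, not the model's actual mechanism (CLFT-Sparse-AKS predicts kernel sizes with a supervised head); the depth thresholds and the bucket-to-size assignment here are hypothetical:

```python
import torch

def select_kernel_sizes(depth, thresholds=(10.0, 25.0, 50.0),
                        sizes=(9, 7, 5, 3)):
    """Map per-pixel LiDAR depth (metres) to a kernel size.

    Near objects occupy more pixels, so they get the largest kernel;
    distant objects get the smallest. Thresholds are illustrative only.
    """
    ks = torch.full_like(depth, float(sizes[-1]))  # default: farthest bucket
    # Assign from the broadest bucket inward, so nearer pixels overwrite.
    ks[depth < thresholds[2]] = sizes[2]
    ks[depth < thresholds[1]] = sizes[1]
    ks[depth < thresholds[0]] = sizes[0]
    return ks.long()

# 5 m -> 9x9, 15 m -> 7x7, 30 m -> 5x5, 80 m -> 3x3
print(select_kernel_sizes(torch.tensor([5.0, 15.0, 30.0, 80.0])).tolist())
```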

Performance (Waymo Dataset)

Condition    Vehicle IoU   Human IoU
Day-Clear    93.01%        71.95%
Day-Rain     93.84%        70.45%
Night-Clear  92.80%        71.47%
Night-Rain   91.99%        67.54%
Average      92.91%        70.35%
  • Best Human IoU: 73.09% (Epoch 269)
  • Inference Time: 28.95ms (34.5 FPS)
  • Parameters: 120.03M
  • VRAM: 3.46GB
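The reported frame rate is simply the reciprocal of the per-frame latency:

```python
latency_ms = 28.95            # measured inference time per frame
fps = 1000.0 / latency_ms     # frames per second
print(round(fps, 1))          # 34.5
```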

Usage

import torch
from models.clft_sparse import CLFT_Sparse

# Load model (constructor arguments omitted; see the repository for defaults)
model = CLFT_Sparse(...)
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference
with torch.no_grad():
    output = model(rgb_input, lidar_input)
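To turn the network output into a segmentation map, a per-pixel argmax over the class dimension is the usual final step. This sketch assumes `output` is a `[B, num_classes, H, W]` logit tensor, which is typical for segmentation heads; the actual head and resolution of CLFT_Sparse may differ:

```python
import torch

# Dummy logits standing in for the model output: batch 1, 3 classes
# (e.g. background / vehicle / human), 160x480 resolution - all assumed.
output = torch.randn(1, 3, 160, 480)
pred = output.argmax(dim=1)   # [B, H, W] per-pixel class ids
print(pred.shape)             # torch.Size([1, 160, 480])
```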

Requirements

  • Python 3.10
  • PyTorch 2.9.0+cu128
  • NATTEN 0.21.1
  • Mamba-SSM 2.3.0
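An illustrative install of the pinned versions above. The PyPI package names (`natten`, `mamba-ssm`) and the CUDA 12.8 wheel index are assumptions; prebuilt NATTEN and Mamba-SSM wheels depend on your exact PyTorch/CUDA combination, so check each project's install instructions:

```shell
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu128
pip install natten==0.21.1 mamba-ssm==2.3.0
```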

Citation

@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
