CLFT-Sparse-AKS

Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection

Model Description

CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.

Key Features

  • Sparse Adaptive Kernel Selection - selects the attention kernel size from {3, 5, 7, 9} based on LiDAR distance
  • Semantic-Guided Depth Supervision - Direct supervision for kernel prediction
  • SS2D (State Space 2D) - Mamba-based global context aggregation
  • CUDA Graph Optimization - Efficient sparse attention processing
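The distance-based kernel selection above can be sketched as a simple depth-to-size mapping. This is a minimal illustration, not the model's actual mechanism (CLFT-Sparse-AKS predicts kernel sizes with a supervised head); the depth thresholds and the bucket-to-size assignment here are hypothetical:

```python
import torch

def select_kernel_sizes(depth, thresholds=(10.0, 25.0, 50.0),
                        sizes=(9, 7, 5, 3)):
    """Map per-pixel LiDAR depth (metres) to a kernel size.

    Near objects occupy more pixels, so they get the largest kernel;
    distant objects get the smallest. Thresholds are illustrative only.
    """
    ks = torch.full_like(depth, float(sizes[-1]))  # default: farthest bucket
    # Assign from the broadest bucket inward, so nearer pixels overwrite.
    ks[depth < thresholds[2]] = sizes[2]
    ks[depth < thresholds[1]] = sizes[1]
    ks[depth < thresholds[0]] = sizes[0]
    return ks.long()

# 5 m -> 9x9, 15 m -> 7x7, 30 m -> 5x5, 80 m -> 3x3
print(select_kernel_sizes(torch.tensor([5.0, 15.0, 30.0, 80.0])).tolist())
```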

Performance (Waymo Dataset)

Condition    Vehicle IoU   Human IoU
Day-Clear    93.01%        71.95%
Day-Rain     93.84%        70.45%
Night-Clear  92.80%        71.47%
Night-Rain   91.99%        67.54%
Average      92.91%        70.35%
  • Best Human IoU: 73.09% (Epoch 269)
  • Inference Time: 28.95ms (34.5 FPS)
  • Parameters: 120.03M
  • VRAM: 3.46GB
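The reported frame rate is simply the reciprocal of the per-frame latency:

```python
latency_ms = 28.95            # measured inference time per frame
fps = 1000.0 / latency_ms     # frames per second
print(round(fps, 1))          # 34.5
```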

Usage

import torch
from models.clft_sparse import CLFT_Sparse

# Load model (constructor arguments omitted; see the repository for defaults)
model = CLFT_Sparse(...)
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference
with torch.no_grad():
    output = model(rgb_input, lidar_input)
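To turn the network output into a segmentation map, a per-pixel argmax over the class dimension is the usual final step. This sketch assumes `output` is a `[B, num_classes, H, W]` logit tensor, which is typical for segmentation heads; the actual head and resolution of CLFT_Sparse may differ:

```python
import torch

# Dummy logits standing in for the model output: batch 1, 3 classes
# (e.g. background / vehicle / human), 160x480 resolution - all assumed.
output = torch.randn(1, 3, 160, 480)
pred = output.argmax(dim=1)   # [B, H, W] per-pixel class ids
print(pred.shape)             # torch.Size([1, 160, 480])
```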

Requirements

  • Python 3.10
  • PyTorch 2.9.0+cu128
  • NATTEN 0.21.1
  • Mamba-SSM 2.3.0
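An illustrative install of the pinned versions above. The PyPI package names (`natten`, `mamba-ssm`) and the CUDA 12.8 wheel index are assumptions; prebuilt NATTEN and Mamba-SSM wheels depend on your exact PyTorch/CUDA combination, so check each project's install instructions:

```shell
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu128
pip install natten==0.21.1 mamba-ssm==2.3.0
```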

Citation

@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
