CLFT-Sparse-AKS
Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection
Model Description
CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.
Key Features
- Sparse Adaptive Kernel Selection (candidate kernel sizes 3, 5, 7, 9) - Distance-based kernel size selection
- Semantic-Guided Depth Supervision - Direct supervision for kernel prediction
- SS2D (State Space 2D) - Mamba-based global context aggregation
- CUDA Graph Optimization - Efficient sparse attention processing
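The distance-based selection in the first feature can be pictured with a small sketch. It assumes, purely for illustration, that each pixel's LiDAR range is bucketed by fixed thresholds and that nearer points receive larger kernels. The function names, the bin edges, and the near-to-large mapping are all hypothetical and are not taken from the repository, where the kernel choice is predicted by the network under depth supervision.

```python
from bisect import bisect_right

# Candidate sizes from the feature list, ordered near -> far (ordering is an assumption).
KERNEL_SIZES = (9, 7, 5, 3)
# Hypothetical range thresholds in metres; not values from the repository.
BIN_EDGES_M = (10.0, 25.0, 50.0)


def select_kernel_size(distance_m, edges=BIN_EDGES_M, sizes=KERNEL_SIZES):
    """Return the kernel size for one point given its range in metres."""
    if len(sizes) != len(edges) + 1:
        raise ValueError("need exactly one more kernel size than bin edges")
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts how many edges are <= distance_m, which is the bin index.
    return sizes[bisect_right(edges, distance_m)]


def kernel_map(depth_rows):
    """Apply the selection to a 2D list of ranges (one entry per pixel)."""
    return [[select_kernel_size(d) for d in row] for row in depth_rows]
```

For example, `kernel_map([[5.0, 30.0], [80.0, 12.0]])` yields one kernel size per pixel under these assumed thresholds. A learned selector would replace the fixed thresholds with a per-pixel prediction.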
Performance (Waymo Dataset)
| Condition | Vehicle IoU | Human IoU |
|---|---|---|
| Day-Clear | 93.01% | 71.95% |
| Day-Rain | 93.84% | 70.45% |
| Night-Clear | 92.80% | 71.47% |
| Night-Rain | 91.99% | 67.54% |
| Average | 92.91% | 70.35% |
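For readers unfamiliar with the metric, per-class IoU is the ratio TP / (TP + FP + FN) for that class. The sketch below computes it over flat label sequences. The class ids (0 = background, 1 = vehicle, 2 = human) and the function name are assumptions for illustration, and the repository's evaluation code may differ, for example in how unlabeled pixels are handled.

```python
def class_iou(pred, target, class_id):
    """IoU = TP / (TP + FP + FN) for one class over flat label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p == class_id and t == class_id:
            tp += 1          # predicted the class and it is the class
        elif p == class_id:
            fp += 1          # predicted the class but it is something else
        elif t == class_id:
            fn += 1          # missed a pixel of the class
    union = tp + fp + fn
    # The class is absent from both prediction and target: IoU is undefined.
    return tp / union if union else float("nan")


# Toy labels with hypothetical ids: 0 = background, 1 = vehicle, 2 = human.
pred = [1, 1, 0, 2, 2, 0]
target = [1, 0, 0, 2, 1, 0]
vehicle_iou = class_iou(pred, target, 1)  # 1 TP, 1 FP, 1 FN -> 1/3
human_iou = class_iou(pred, target, 2)    # 1 TP, 1 FP, 0 FN -> 0.5
```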
- Best Human IoU: 73.09% (Epoch 269)
- Inference Time: 28.95 ms (34.5 FPS)
- Parameters: 120.03M
- VRAM: 3.46 GB
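The Average row of the table and the FPS figure follow from simple arithmetic on the reported numbers: the average is the unweighted mean over the four conditions, and FPS is 1000 divided by the latency in milliseconds. A short check, using only values from this card (the variable and function names are illustrative):

```python
from statistics import mean

# Per-condition IoU values (percent) copied from the table above.
vehicle_iou = {"Day-Clear": 93.01, "Day-Rain": 93.84, "Night-Clear": 92.80, "Night-Rain": 91.99}
human_iou = {"Day-Clear": 71.95, "Day-Rain": 70.45, "Night-Clear": 71.47, "Night-Rain": 67.54}
latency_ms = 28.95


def fps_from_latency(ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / ms


avg_vehicle = round(mean(list(vehicle_iou.values())), 2)  # 92.91
avg_human = round(mean(list(human_iou.values())), 2)      # 70.35
fps = round(fps_from_latency(latency_ms), 1)              # 34.5
```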
Usage
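```python
import torch
from models.clft_sparse import CLFT_Sparse

# Load the model and restore the weights from the best-human-IoU checkpoint
model = CLFT_Sparse(...)
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference: rgb_input and lidar_input are preprocessed tensors supplied by the caller
with torch.no_grad():
    output = model(rgb_input, lidar_input)
```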
Requirements
- Python 3.10
- PyTorch 2.9.0+cu128
- NATTEN 0.21.1
- Mamba-SSM 2.3.0
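Because the versions above are pinned, it can help to verify an environment before loading the checkpoint. The following is a minimal sketch using `importlib.metadata`. It assumes the packages are installed under the distribution names `torch`, `natten`, and `mamba-ssm`; the `check_versions` helper is hypothetical and not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the list above; distribution names are assumed.
PINNED = {"torch": "2.9.0", "natten": "0.21.1", "mamba-ssm": "2.3.0"}


def check_versions(pinned=PINNED):
    """Map each distribution to (installed_version_or_None, matches_pin)."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu128" before comparing.
        base = installed.split("+")[0] if installed else None
        report[dist] = (installed, base == wanted)
    return report


python_ok = sys.version_info[:2] == (3, 10)

if __name__ == "__main__":
    print("Python 3.10:", python_ok)
    for dist, (installed, ok) in check_versions().items():
        print(f"{dist}: installed={installed} matches_pin={ok}")
```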
Citation
```bibtex
@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
```
Links
- GitHub: https://github.com/mw701/CLFT_AKS
- Technical Report: see docs/CLFT_Sparse_AKS_Technical_Report.md