---
license: mit
tags:
- semantic-segmentation
- camera-lidar-fusion
- autonomous-driving
- waymo
- pytorch
datasets:
- waymo
language:
- en
---

# CLFT-Sparse-AKS

**Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection**

## Model Description

CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.

### Key Features

- **Sparse Adaptive Kernel Selection** [3, 5, 7, 9] - Distance-based kernel size selection
- **Semantic-Guided Depth Supervision** - Direct supervision for kernel prediction
- **SS2D (State Space 2D)** - Mamba-based global context aggregation
- **CUDA Graph Optimization** - Efficient sparse attention processing

## Performance (Waymo Dataset)

| Condition | Vehicle IoU | Human IoU |
|-----------|-------------|-----------|
| Day-Clear | 93.01% | 71.95% |
| Day-Rain | 93.84% | 70.45% |
| Night-Clear | 92.80% | 71.47% |
| Night-Rain | 91.99% | 67.54% |
| **Average** | **92.91%** | **70.35%** |

- **Best Human IoU**: 73.09% (Epoch 269)
- **Inference Time**: 28.95 ms (34.5 FPS)
- **Parameters**: 120.03M
- **VRAM**: 3.46 GB

## Usage

```python
import torch
from models.clft_sparse import CLFT_Sparse

# Load model (constructor arguments are elided here; see the GitHub repository)
model = CLFT_Sparse(...)

# map_location="cpu" lets the checkpoint load even when its original GPU is unavailable;
# load_state_dict then copies the weights onto whatever device the model lives on.
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference: rgb_input and lidar_input are preprocessed tensors on the model's device
with torch.no_grad():
    output = model(rgb_input, lidar_input)
```

## Requirements

- Python 3.10
- PyTorch 2.9.0+cu128
- NATTEN 0.21.1
- Mamba-SSM 2.3.0

## Citation

```bibtex
@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
```

## Links

- **GitHub**: [https://github.com/mw701/CLFT_AKS](https://github.com/mw701/CLFT_AKS)
- **Technical Report**: See `docs/CLFT_Sparse_AKS_Technical_Report.md`
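
## Reading the Reported Metrics

The sketch below illustrates how the summary numbers in the performance section relate to each other. It assumes the standard intersection-over-union definition for a single class; the repository's evaluation code may differ (for example, in how unlabeled pixels are handled). The helper `class_iou`, the class ids, and the toy label lists are hypothetical and exist only for illustration. The per-condition values are copied from the table above.

```python
from statistics import mean


def class_iou(pred, target, class_id):
    """IoU of one class over two flat sequences of integer labels (hypothetical helper)."""
    pairs = list(zip(pred, target))
    intersection = sum(1 for p, t in pairs if p == class_id and t == class_id)
    union = sum(1 for p, t in pairs if p == class_id or t == class_id)
    # The class is absent from both prediction and ground truth: IoU is undefined.
    return intersection / union if union else float("nan")


# Toy labels with assumed ids: 0 = background, 1 = vehicle, 2 = human
pred = [0, 1, 1, 2, 2, 0, 1, 0]
target = [0, 1, 1, 2, 0, 0, 1, 2]
vehicle_iou = class_iou(pred, target, 1)
human_iou = class_iou(pred, target, 2)

# Per-condition IoU values (percent) from the table: Day-Clear, Day-Rain, Night-Clear, Night-Rain
vehicle_by_condition = [93.01, 93.84, 92.80, 91.99]
human_by_condition = [71.95, 70.45, 71.47, 67.54]

# The Average row matches the unweighted mean over the four conditions.
avg_vehicle = round(mean(vehicle_by_condition), 2)
avg_human = round(mean(human_by_condition), 2)

# Frames per second follows from the per-frame latency in milliseconds.
latency_ms = 28.95
fps = 1000.0 / latency_ms

print(f"toy vehicle IoU: {vehicle_iou:.2f}, toy human IoU: {human_iou:.2f}")
print(f"average vehicle IoU: {avg_vehicle}%, average human IoU: {avg_human}%")
print(f"throughput: {fps:.1f} FPS")
```

Because the average is unweighted, each weather and lighting condition counts equally regardless of how many frames it contains. The best Human IoU of 73.09% at epoch 269 is listed separately from the table and is higher than its 70.35% average.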