---
license: mit
tags:
  - semantic-segmentation
  - camera-lidar-fusion
  - autonomous-driving
  - waymo
  - pytorch
datasets:
  - waymo
language:
  - en
---

# CLFT-Sparse-AKS

**Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection**

## Model Description

CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.

### Key Features
- **Sparse Adaptive Kernel Selection** - Distance-based selection of the kernel size from the candidate set [3, 5, 7, 9]
- **Semantic-Guided Depth Supervision** - Direct supervision for kernel prediction
- **SS2D (State Space 2D)** - Mamba-based global context aggregation
- **CUDA Graph Optimization** - Efficient sparse attention processing
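
To make the distance-based kernel selection idea concrete, here is a minimal sketch in plain Python. It is not the model's implementation: the distance thresholds (`DISTANCE_EDGES_M`), the function name `select_kernel_size`, and the assumption that nearer points receive larger kernels are all hypothetical and chosen only for illustration. Only the candidate sizes [3, 5, 7, 9] come from the feature list above.

```python
from bisect import bisect_right

# Candidate kernel sizes from the feature list, ordered near -> far.
# ASSUMPTION: nearer points get larger kernels; the model may do otherwise.
KERNEL_SIZES = (9, 7, 5, 3)

# HYPOTHETICAL distance bucket edges in metres; not taken from the model.
DISTANCE_EDGES_M = (10.0, 25.0, 50.0)


def select_kernel_size(distance_m: float) -> int:
    """Pick a kernel size by bucketing a LiDAR distance (in metres)."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right returns the index of the bucket the distance falls into.
    return KERNEL_SIZES[bisect_right(DISTANCE_EDGES_M, distance_m)]


if __name__ == "__main__":
    for d in (5.0, 18.0, 30.0, 80.0):
        print(f"{d:5.1f} m -> kernel {select_kernel_size(d)}")
```

In the actual model the kernel choice is predicted and supervised (see Semantic-Guided Depth Supervision), so a fixed lookup like this should be read only as an intuition aid.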

## Performance (Waymo Dataset)

| Condition | Vehicle IoU | Human IoU |
|-----------|-------------|-----------|
| Day-Clear | 93.01% | 71.95% |
| Day-Rain | 93.84% | 70.45% |
| Night-Clear | 92.80% | 71.47% |
| Night-Rain | 91.99% | 67.54% |
| **Average** | **92.91%** | **70.35%** |
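
For readers unfamiliar with the metric, the sketch below shows the standard per-class intersection-over-union computation on flattened label sequences. The label ids (0 = background, 1 = vehicle, 2 = human) and the helper name `class_iou` are assumptions for illustration; this card does not describe the exact evaluation protocol used to produce the table.

```python
def class_iou(pred, target, class_id):
    """IoU for one class over two equal-length sequences of integer labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixels where both prediction and ground truth are the class.
    intersection = sum(p == class_id and t == class_id for p, t in zip(pred, target))
    # Pixels where either prediction or ground truth is the class.
    union = sum(p == class_id or t == class_id for p, t in zip(pred, target))
    # IoU is undefined when the class is absent from both sequences.
    return intersection / union if union else float("nan")


# HYPOTHETICAL label ids: 0 = background, 1 = vehicle, 2 = human
pred = [1, 1, 0, 2, 2, 0]
target = [1, 0, 0, 2, 1, 0]

if __name__ == "__main__":
    print("vehicle IoU:", class_iou(pred, target, 1))
    print("human IoU:", class_iou(pred, target, 2))
```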

- **Best Human IoU**: 73.09% (Epoch 269)
- **Inference Time**: 28.95 ms (34.5 FPS)
- **Parameters**: 120.03 M
- **VRAM**: 3.46 GB
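
The averages and the FPS figure follow directly from the numbers above. The short script below reproduces that arithmetic as a sanity check; it assumes the averages are unweighted means over the four conditions, which is consistent with the reported values.

```python
# (Vehicle IoU %, Human IoU %) per condition, copied from the table above.
results = {
    "Day-Clear": (93.01, 71.95),
    "Day-Rain": (93.84, 70.45),
    "Night-Clear": (92.80, 71.47),
    "Night-Rain": (91.99, 67.54),
}

# Unweighted mean over the four conditions (assumed averaging scheme).
vehicle_avg = sum(v for v, _ in results.values()) / len(results)
human_avg = sum(h for _, h in results.values()) / len(results)

# Frames per second is the reciprocal of the per-frame latency.
inference_ms = 28.95
fps = 1000.0 / inference_ms

if __name__ == "__main__":
    print(f"Vehicle IoU average: {vehicle_avg:.2f}%")
    print(f"Human IoU average:   {human_avg:.2f}%")
    print(f"Throughput:          {fps:.1f} FPS")
```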

## Usage

```python
import torch
from models.clft_sparse import CLFT_Sparse

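# Build the model; constructor arguments are elided in this example
model = CLFT_Sparse(...)

# Load the best-Human-IoU checkpoint (epoch 269).
# map_location='cpu' avoids device mismatches at load time.
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference: rgb_input and lidar_input are preprocessed input tensors you must supply
with torch.no_grad():
    output = model(rgb_input, lidar_input)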
```

## Requirements

- Python 3.10
- PyTorch 2.9.0+cu128
- NATTEN 0.21.1
- Mamba-SSM 2.3.0
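
A quick way to compare an environment against the versions listed above is sketched below using only the standard library. The distribution names (`torch`, `natten`, `mamba-ssm`) are assumed to match the installed package metadata, and the helper `check_environment` is hypothetical. Only the public part of each version is compared, since wheels may carry a local suffix such as `+cu128`.

```python
import sys
from importlib import metadata

# Versions from the Requirements list; distribution names are ASSUMED.
EXPECTED = {"torch": "2.9.0+cu128", "natten": "0.21.1", "mamba-ssm": "2.3.0"}


def check_environment(expected=EXPECTED):
    """Map each distribution name to (installed_version_or_None, expected_version)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


def public_version(version):
    """Drop any local suffix such as '+cu128' before comparing."""
    return version.split("+")[0]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:2])), "(expected 3.10)")
    for name, (installed, wanted) in check_environment().items():
        ok = installed is not None and public_version(installed) == public_version(wanted)
        print(f"{name}: installed={installed} expected={wanted} [{'ok' if ok else 'MISMATCH'}]")
```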

## Citation

```bibtex
@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
```

## Links

- **GitHub**: [https://github.com/mw701/CLFT_AKS](https://github.com/mw701/CLFT_AKS)
- **Technical Report**: See `docs/CLFT_Sparse_AKS_Technical_Report.md`