---
license: mit
tags:
- semantic-segmentation
- camera-lidar-fusion
- autonomous-driving
- waymo
- pytorch
datasets:
- waymo
language:
- en
---
# CLFT-Sparse-AKS
**Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection**
## Model Description
CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.
### Key Features
- **Sparse Adaptive Kernel Selection**: selects the kernel size from {3, 5, 7, 9} based on distance
- **Semantic-Guided Depth Supervision**: direct supervision signal for kernel-size prediction
- **SS2D (State Space 2D)**: Mamba-based global context aggregation
- **CUDA Graph Optimization**: faster sparse attention processing by reducing kernel-launch overhead
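The card does not say how distance maps to a kernel size. The sketch below is a minimal illustration under stated assumptions: nearer LiDAR returns, which cover more image pixels, get larger kernels, and the depth breakpoints are made up. "Sparse" is read here as processing only pixels that have a LiDAR return, bucketed by kernel size so that each size can be run as one batch. `select_kernel_size`, `group_by_kernel` and `DEPTH_BREAKPOINTS` are hypothetical names. None of this is taken from the CLFT_AKS code.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

KERNEL_SIZES = (3, 5, 7, 9)  # candidate sizes listed in Key Features
# Hypothetical depth breakpoints in metres; the model card gives no thresholds
DEPTH_BREAKPOINTS = (10.0, 20.0, 40.0)


def select_kernel_size(depth_m: float) -> int:
    """Map a LiDAR depth to a kernel size; nearer points get larger kernels (assumed)."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    # 0 for depth < 10, 1 for [10, 20), 2 for [20, 40), 3 for >= 40
    bin_index = bisect_right(DEPTH_BREAKPOINTS, depth_m)
    # Reverse the order so the nearest bin uses the largest kernel
    return KERNEL_SIZES[len(KERNEL_SIZES) - 1 - bin_index]


def group_by_kernel(depths: Sequence[Optional[float]]) -> Dict[int, List[int]]:
    """Bucket pixel indices by kernel size, skipping pixels with no LiDAR return."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, depth in enumerate(depths):
        if depth is None:  # no LiDAR return at this pixel: left out of the sparse set
            continue
        groups[select_kernel_size(depth)].append(idx)
    return dict(groups)


print(group_by_kernel([5.0, None, 15.0, 45.0, 8.0]))
# {9: [0, 4], 7: [2], 3: [3]}
```

Bucketing by kernel size is one way to avoid a per-pixel branch. Because NATTEN appears in the requirements, the real kernel sizes may be neighborhood-attention window sizes, but that is an inference rather than something the card states.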
## Performance (Waymo Dataset)
| Condition | Vehicle IoU | Human IoU |
|-----------|-------------|-----------|
| Day-Clear | 93.01% | 71.95% |
| Day-Rain | 93.84% | 70.45% |
| Night-Clear | 92.80% | 71.47% |
| Night-Rain | 91.99% | 67.54% |
| **Average** | **92.91%** | **70.35%** |
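IoU for a class is the number of pixels labeled as that class in both the prediction and the ground truth, divided by the number labeled as that class in either one. The sketch below computes per-class IoU over flattened label maps and checks that the **Average** row equals the unweighted mean of the four condition rows. It assumes integer class IDs, and `per_class_iou` is a hypothetical helper. The card does not say whether its IoU is accumulated over the whole split or averaged per frame.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  classes: Sequence[int]) -> Dict[int, float]:
    """IoU = intersection / union per class, over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious: Dict[int, float] = {}
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious[c] = inter / union if union else float("nan")  # class absent in both
    return ious


# The Average row matches the unweighted mean of the four weather/lighting rows
vehicle = [93.01, 93.84, 92.80, 91.99]
human = [71.95, 70.45, 71.47, 67.54]
print(round(sum(vehicle) / 4, 2), round(sum(human) / 4, 2))  # 92.91 70.35
```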
- **Best Human IoU**: 73.09% (Epoch 269)
- **Inference Time**: 28.95 ms (34.5 FPS)
- **Parameters**: 120.03M
- **VRAM**: 3.46 GB
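The card does not state the hardware, batch size, or timing method behind the latency figure. The following is a minimal, generic timing harness under assumed choices: warm-up runs first, then the median of repeated wall-clock measurements. It also converts latency to FPS. `measure_latency_ms` and `fps_from_latency` are hypothetical names. For a GPU model, the timed callable must call `torch.cuda.synchronize()` before returning, because CUDA kernels run asynchronously. Without that call, the harness would time only the kernel launches.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds, after warm-up runs."""
    for _ in range(warmup):  # let caches, allocators and lazy init settle
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)  # median is robust to occasional stalls


def fps_from_latency(latency_ms: float) -> float:
    """Frames per second for one frame processed every latency_ms milliseconds."""
    return 1000.0 / latency_ms


print(round(fps_from_latency(28.95), 1))  # 34.5
```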
## Usage
```python
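import torch
from models.clft_sparse import CLFT_Sparse

# Build the model (constructor arguments elided here)
model = CLFT_Sparse(...)

# Load the checkpoint; map_location='cpu' also works on machines without a GPU
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference: rgb_input and lidar_input are the preprocessed camera and LiDAR tensors
with torch.no_grad():
    output = model(rgb_input, lidar_input)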
```
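The usage snippet leaves `rgb_input` and `lidar_input` undefined, and the card does not describe the LiDAR input format. One common approach in camera-LiDAR fusion is to project the point cloud into the camera image so that both modalities share a pixel grid. The sketch below shows that projection in pure Python under several assumptions: points are already in the camera frame (x right, y down, z forward), the camera is an ideal pinhole with intrinsics `fx, fy, cx, cy`, and only depth is kept. `project_to_depth_image` is a hypothetical helper. The model may expect different channels or normalization.

```python
from typing import List, Sequence, Tuple


def project_to_depth_image(
    points_cam: Sequence[Tuple[float, float, float]],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[float]]:
    """Project camera-frame points into a sparse depth map of shape (height, width).

    Pixels without a LiDAR return stay 0.0; if several points hit the same
    pixel, the nearest one is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0:  # behind (or on) the image plane: not visible
            continue
        u = int(round(fx * x / z + cx))  # pinhole projection, column index
        v = int(round(fy * y / z + cy))  # pinhole projection, row index
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

Keeping the nearest return per pixel is a simple way to handle occlusion between points that project to the same location.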
## Requirements
- Python 3.10
- PyTorch 2.9.0+cu128
- NATTEN 0.21.1
- Mamba-SSM 2.3.0
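Before running the model, it can help to compare the installed packages against the versions listed above. The sketch below uses the standard-library `importlib.metadata` module and reports whether each package is missing, an exact match, or a mismatch. The PyPI distribution names `torch`, `natten` and `mamba-ssm` are assumptions about how the packages were installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions from the Requirements section; distribution names are assumed
REQUIRED = {
    "torch": "2.9.0+cu128",
    "natten": "0.21.1",
    "mamba-ssm": "2.3.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_environment(required: Dict[str, str]) -> Dict[str, str]:
    """Map each requirement (plus the Python version) to a status string."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for dist, expected in required.items():
        found = installed_version(dist)
        if found is None:
            report[dist] = f"missing (expected {expected})"
        elif found == expected:
            report[dist] = f"ok ({found})"
        else:
            report[dist] = f"mismatch (found {found}, expected {expected})"
    return report


if __name__ == "__main__":
    for name, status in check_environment(REQUIRED).items():
        print(f"{name}: {status}")
```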
## Citation
```bibtex
@misc{clft_sparse_aks_2026,
title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
author={Young},
year={2026},
url={https://github.com/mw701/CLFT_AKS}
}
```
## Links
- **GitHub**: [https://github.com/mw701/CLFT_AKS](https://github.com/mw701/CLFT_AKS)
- **Technical Report**: See `docs/CLFT_Sparse_AKS_Technical_Report.md`