Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,82 @@
---
license: mit
tags:
- semantic-segmentation
- camera-lidar-fusion
- autonomous-driving
- waymo
- pytorch
datasets:
- waymo
language:
- en
---

# CLFT-Sparse-AKS

**Camera-LiDAR Fusion Transformer with Sparse Adaptive Kernel Selection**

## Model Description

CLFT-Sparse-AKS is a multi-modal semantic segmentation model that fuses camera (RGB) and LiDAR data for autonomous driving applications.

### Key Features
- **Sparse Adaptive Kernel Selection** - Distance-based selection among kernel sizes [3, 5, 7, 9]
- **Semantic-Guided Depth Supervision** - Direct supervision for kernel prediction
- **SS2D (State Space 2D)** - Mamba-based global context aggregation
- **CUDA Graph Optimization** - Efficient sparse attention processing

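The distance-based kernel selection listed above can be illustrated with a minimal sketch. Only the candidate sizes [3, 5, 7, 9] come from this README. The distance thresholds, the near-to-large mapping direction, and the name `select_kernel_size` are hypothetical and chosen for illustration; the model itself may predict kernel sizes with a learned head rather than fixed bins.

```python
from bisect import bisect_right

# Candidate kernel sizes listed in the README.
KERNEL_SIZES = [3, 5, 7, 9]

# Hypothetical distance thresholds in metres (assumption, not from the model).
DISTANCE_EDGES_M = [10.0, 25.0, 50.0]


def select_kernel_size(distance_m: float) -> int:
    """Pick a kernel size from KERNEL_SIZES for a point at distance_m.

    Assumption: nearer points get larger kernels and farther points get
    smaller ones, because nearby objects cover more pixels.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts how many thresholds are <= distance_m (0 to 3).
    bin_index = bisect_right(DISTANCE_EDGES_M, distance_m)
    # Reverse the lookup so bin 0 (nearest) maps to the largest kernel.
    return KERNEL_SIZES[len(KERNEL_SIZES) - 1 - bin_index]
```

Under these assumed thresholds, a point 5 m away would use a 9x9 kernel and a point 100 m away a 3x3 kernel.
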
## Performance (Waymo Dataset)

| Condition | Vehicle IoU | Human IoU |
|-----------|-------------|-----------|
| Day-Clear | 93.01% | 71.95% |
| Day-Rain | 93.84% | 70.45% |
| Night-Clear | 92.80% | 71.47% |
| Night-Rain | 91.99% | 67.54% |
| **Average** | **92.91%** | **70.35%** |

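As a quick check on the table, the sketch below uses the standard per-class IoU definition and recomputes the Average row as an unweighted mean of the four conditions. The README does not state how the IoU values or the average were computed, so both the `iou` helper and the unweighted mean are assumptions; the reported averages are consistent with the latter.

```python
def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union for one class from pixel counts.

    tp: pixels correctly predicted as the class
    fp: pixels wrongly predicted as the class
    fn: pixels of the class that were missed
    """
    union = tp + fp + fn
    return tp / union if union else 0.0


# Per-condition IoU values (percent) copied from the table.
VEHICLE_IOU = {"Day-Clear": 93.01, "Day-Rain": 93.84,
               "Night-Clear": 92.80, "Night-Rain": 91.99}
HUMAN_IOU = {"Day-Clear": 71.95, "Day-Rain": 70.45,
             "Night-Clear": 71.47, "Night-Rain": 67.54}


def mean_percent(values) -> float:
    """Unweighted mean of the given percentages, rounded to two decimals."""
    values = list(values)
    return round(sum(values) / len(values), 2)


vehicle_avg = mean_percent(VEHICLE_IOU.values())
human_avg = mean_percent(HUMAN_IOU.values())
```
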
- **Best Human IoU**: 73.09% (Epoch 269)
- **Inference Time**: 28.95 ms (34.5 FPS)
- **Parameters**: 120.03M
- **VRAM**: 3.46 GB

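The throughput figure follows directly from the per-frame latency, and the parameter count gives a rough lower bound on weight storage. The sketch below shows both conversions. The 4 bytes per parameter (fp32) is an assumption, since the README does not state the precision used, and the function names are illustrative.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second for a given per-frame latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def weight_size_mb(params_millions: float, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in decimal megabytes.

    bytes_per_param=4 assumes fp32 weights (assumption, not from the README).
    """
    return params_millions * bytes_per_param


fps = round(fps_from_latency_ms(28.95), 1)  # latency taken from the list above
weights_mb = weight_size_mb(120.03)         # parameter count taken from the list above
```

Under the fp32 assumption the weights alone account for roughly 480 MB, so most of the reported VRAM would come from activations and runtime buffers rather than parameters.
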
## Usage

```python
import torch
from models.clft_sparse import CLFT_Sparse

# Load model (constructor arguments are elided in this README)
model = CLFT_Sparse(...)
# map_location='cpu' lets the checkpoint load on machines without a GPU
checkpoint = torch.load('clft_sparse_epoch_269_best_human.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch layers such as dropout and batch norm to inference mode

# Inference (rgb_input and lidar_input are the preprocessed input tensors)
with torch.no_grad():
    output = model(rgb_input, lidar_input)
```

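If a checkpoint was saved from a model wrapped in `torch.nn.DataParallel` or `DistributedDataParallel`, its parameter names carry a `module.` prefix and `load_state_dict` reports missing keys. The sketch below is a hypothetical helper (`extract_state_dict` is not part of the repository) that handles both that case and checkpoints stored without the `model_state_dict` wrapper. Whether the published checkpoint needs either adjustment is not stated in this README.

```python
from typing import Any, Dict, Mapping


def extract_state_dict(checkpoint: Mapping[str, Any],
                       key: str = "model_state_dict") -> Dict[str, Any]:
    """Return the weight mapping from a checkpoint (hypothetical helper).

    Uses checkpoint[key] when present, otherwise treats the checkpoint
    itself as the state dict, then strips a leading 'module.' prefix.
    """
    state = checkpoint[key] if key in checkpoint else checkpoint
    prefix = "module."
    cleaned = {}
    for name, value in state.items():
        # Drop the wrapper prefix only when it is at the start of the name.
        if name.startswith(prefix):
            name = name[len(prefix):]
        cleaned[name] = value
    return cleaned
```

With such a helper, the loading line in the example above would become `model.load_state_dict(extract_state_dict(checkpoint))`.
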
## Requirements

- Python 3.10
- PyTorch 2.9.0+cu128
- NATTEN 0.21.1
- Mamba-SSM 2.3.0

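A short script can confirm that an environment matches the versions listed above before loading the model. The sketch below uses the standard-library `importlib.metadata`. The distribution names `torch`, `natten`, and `mamba-ssm` are assumptions about how these packages are named on the package index, and `check_requirements` is an illustrative name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions from the requirements list; distribution names are assumptions.
EXPECTED = {"torch": "2.9.0", "natten": "0.21.1", "mamba-ssm": "2.3.0"}


def check_requirements(expected=EXPECTED):
    """Map each distribution name to (installed_version_or_None, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu128" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    print("Python 3.10:", sys.version_info[:2] == (3, 10))
    for name, (installed, matches) in check_requirements().items():
        print(f"{name}: installed={installed} matches={matches}")
```
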
## Citation

```bibtex
@misc{clft_sparse_aks_2026,
  title={CLFT-Sparse-AKS: Camera-LiDAR Fusion with Sparse Adaptive Kernel Selection},
  author={Young},
  year={2026},
  url={https://github.com/mw701/CLFT_AKS}
}
```

## Links

- **GitHub**: [https://github.com/mw701/CLFT_AKS](https://github.com/mw701/CLFT_AKS)
- **Technical Report**: See `docs/CLFT_Sparse_AKS_Technical_Report.md`