# SiMO: Single-Modal-Operable Multimodal Collaborative Perception
This repository contains pretrained checkpoints for SiMO (Single-Modal-Operable Multimodal Collaborative Perception), a novel framework for robust multimodal collaborative 3D object detection in autonomous driving.
## Paper
- **Title:** Single-Modal-Operable Multimodal Collaborative Perception
- **Conference:** ICLR 2026
- **OpenReview:** Link
## Key Features
- Single-Modal Operability: Maintains functional performance when one modality fails
- LAMMA Fusion: Length-Adaptive Multi-Modal Fusion module
- PAFR Training: Pretrain-Align-Fuse-Random Drop training strategy
- Graceful Degradation: >80% AP@30 with camera-only operation
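The Random-Drop step of PAFR can be illustrated with a minimal sketch. The function name, drop probability, and dict-of-lists feature layout below are illustrative assumptions, not the repository's actual API; the idea is that during training one modality's features are occasionally masked to zero so the fusion module learns to operate on whatever remains.

```python
import random

def random_modality_drop(features, p_drop=0.3, rng=random):
    """Illustrative Random-Drop step (assumed behaviour, not SiMO's API):
    with probability p_drop, zero out one modality's features so the
    fusion module must rely on the remaining modality."""
    kept = dict(features)
    if rng.random() < p_drop and len(kept) > 1:
        victim = rng.choice(sorted(kept))       # pick one modality to mask
        kept[victim] = [0.0] * len(kept[victim])  # masked, not removed
    return kept

feats = {"lidar": [1.0, 2.0], "camera": [3.0, 4.0]}
dropped = random_modality_drop(feats, p_drop=1.0)
```

Masking rather than deleting the entry keeps the fused feature length fixed, which is what lets a length-adaptive fusion module such as LAMMA see a consistent input layout whether or not a modality is present.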
## Available Models
| Model | Dataset | Architecture | Checkpoint |
|---|---|---|---|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | Download |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | Download |
## Performance
### OPV2V-H (with Random Drop)
| Modality | AP@30 | AP@50 | AP@70 |
|---|---|---|---|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |
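The graceful-degradation claim can be read off the table: camera-only operation retains roughly 82% of the full LiDAR+Camera AP@30. A small script (values hard-coded from the table above) computes the retention ratios:

```python
# AP scores from the OPV2V-H table above
full = {"AP@30": 98.30, "AP@50": 97.94, "AP@70": 94.64}
camera_only = {"AP@30": 80.81, "AP@50": 69.63, "AP@70": 44.82}

# Camera-only score as a percentage of the full multimodal score
retention = {k: round(100 * camera_only[k] / full[k], 1) for k in full}
print(retention)  # AP@30 retention is ~82.2%
```

As expected, the gap widens at stricter IoU thresholds (about 47% retention at AP@70), so camera-only fallback is most reliable for coarse localization.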
## Usage
### Installation

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```
### Download Checkpoint

```bash
# Install huggingface-hub
pip install huggingface-hub

# Download a specific checkpoint (replace ***.pth with the checkpoint filename)
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DempseyWen/SiMO', filename='***.pth')"
```
## Full Documentation
For complete documentation, training scripts, and data preparation instructions, please visit our GitHub repository.
## Acknowledgements
This work builds upon:
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
## License
MIT License - see `LICENSE` for details.