---
license: mit
tags:
- computer-vision
- 3d-object-detection
- autonomous-driving
- multimodal-fusion
- collaborative-perception
- lidar
- camera
- opv2v
- v2xset
- dair-v2x
language:
- en
pipeline_tag: object-detection
---

# SiMO: Single-Modal-Operable Multimodal Collaborative Perception

This repository contains pretrained checkpoints for **SiMO** (Single-Modal-Operable Multimodal Collaborative Perception), a framework for robust multimodal collaborative 3D object detection in autonomous driving.

## Paper

- **Title**: Single-Modal-Operable Multimodal Collaborative Perception
- **Conference**: ICLR 2026
- **OpenReview**: [Link](https://openreview.net/forum?id=h0iRgjTmVs)
- **arXiv**: [Link](https://arxiv.org/abs/2603.08240)

## Key Features

- **Single-Modal Operability**: Maintains functional performance when one modality fails
- **LAMMA Fusion**: Length-Adaptive Multi-Modal Fusion module
- **PAFR Training**: Pretrain-Align-Fuse-Random Drop training strategy
- **Graceful Degradation**: AP@30 above 80 even in camera-only operation

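The Random-Drop step of the PAFR strategy can be sketched in a few lines. This is a minimal illustration under assumed interfaces (a per-modality feature dict, a zero-fill on drop, and an arbitrary `p_drop` value), not the repository's actual implementation:

```python
import random

def random_modality_drop(features, p_drop=0.5, rng=None):
    """With probability p_drop, zero out one randomly chosen modality.

    Training with such drops forces the fusion model to stay operable
    when a single modality (LiDAR or camera) is missing at test time.
    """
    rng = rng or random.Random()
    # Copy so the caller's features are never mutated.
    out = {name: list(feat) for name, feat in features.items()}
    if len(out) > 1 and rng.random() < p_drop:
        victim = rng.choice(sorted(out))        # pick one modality to drop
        out[victim] = [0.0] * len(out[victim])  # replace its features with zeros
    return out

# Example: force a drop (p_drop=1.0) to see the effect.
feats = {"lidar": [1.0, 2.0], "camera": [3.0, 4.0]}
print(random_modality_drop(feats, p_drop=1.0, rng=random.Random(0)))
```

In the real pipeline the drop would act on batched feature tensors before the fusion module rather than on plain lists.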
## Available Models

| Model | Dataset | Architecture | Checkpoint |
|-------|---------|--------------|------------|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_PF/net_epoch27.pth) |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_AF/net_epoch21.pth) |

## Performance

### OPV2V-H (with Random Drop)

| Modality | AP@30 | AP@50 | AP@70 |
|----------|-------|-------|-------|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |

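As a quick check of the graceful-degradation claim, the fraction of full-fusion AP@30 retained under each modality drop can be computed directly from the numbers in the table above (plain arithmetic on the reported values):

```python
# AP@30 on OPV2V-H with Random Drop, from the table above.
full_fusion = 98.30   # LiDAR + Camera
lidar_only = 97.32
camera_only = 80.81

retention = {
    "LiDAR-only": lidar_only / full_fusion,
    "Camera-only": camera_only / full_fusion,
}
for name, frac in retention.items():
    print(f"{name}: retains {100 * frac:.1f}% of full-fusion AP@30")
```

Camera-only operation retains roughly 82% of full-fusion AP@30 (and 80.81 AP@30 in absolute terms), consistent with the headline figure above; LiDAR-only loses almost nothing.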
## Usage

### Installation

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```

### Download Checkpoint

```bash
# Install huggingface-hub
pip install huggingface-hub

# Download a specific checkpoint (here the SiMO-PF weights from the table above)
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DempseyWen/SiMO', filename='SiMO_PF/net_epoch27.pth')"
```

## Full Documentation

For complete documentation, training scripts, and data preparation instructions, please visit our [GitHub repository](https://github.com/dempsey-wen/SiMO).

## Acknowledgements

This work builds upon:

- [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD)
- [HEAL](https://github.com/yifanlu0227/HEAL)

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```

## License

MIT License - see [LICENSE](https://github.com/dempsey-wen/SiMO/blob/main/LICENSE) for details.