---
license: mit
tags:
- computer-vision
- 3d-object-detection
- autonomous-driving
- multimodal-fusion
- collaborative-perception
- lidar
- camera
- opv2v
- v2xset
- dair-v2x
language:
- en
pipeline_tag: object-detection
---

# SiMO: Single-Modal-Operable Multimodal Collaborative Perception

This repository contains pretrained checkpoints for **SiMO** (Single-Modal-Operable Multimodal Collaborative Perception), a novel framework for robust multimodal collaborative 3D object detection in autonomous driving.

## 📜 Paper

- **Title**: Single-Modal-Operable Multimodal Collaborative Perception
- **Conference**: ICLR 2026
- **OpenReview**: [Link](https://openreview.net/forum?id=h0iRgjTmVs)

## 🚀 Key Features

- **Single-Modal Operability**: Detection remains functional when either the LiDAR or the camera modality fails
- **LAMMA Fusion**: Length-Adaptive Multi-Modal Fusion module
- **PAFR Training**: Pretrain-Align-Fuse-Random Drop training strategy
- **Graceful Degradation**: >80% AP@30 with camera-only operation on OPV2V-H

## 📦 Available Models

| Model | Dataset | Architecture | Checkpoint |
|-------|---------|--------------|------------|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_PF/net_epoch27.pth) |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_AF/net_epoch21.pth) |

## 📊 Performance

### OPV2V-H (with Random Drop)

| Modality | AP@30 | AP@50 | AP@70 |
|----------|-------|-------|-------|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |

## 💻 Usage

### Installation

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```

### Download Checkpoint

```bash
# Install huggingface-hub
pip install huggingface-hub

# Download a specific checkpoint and print its local path
# (replace ***.pth with a file path from the Available Models table)
python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download(repo_id='DempseyWen/SiMO', filename='***.pth'))"
```

## 📖 Full Documentation

For complete documentation, training scripts, and data preparation instructions, please visit our [GitHub repository](https://github.com/dempsey-wen/SiMO).

## 🏢 Acknowledgements

This work builds upon:

- [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD)
- [HEAL](https://github.com/yifanlu0227/HEAL)

## 📄 Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```

## 📄 License

MIT License - see [LICENSE](https://github.com/dempsey-wen/SiMO/blob/main/LICENSE) for details.
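
## 💻 Example: Inspecting a Checkpoint in Python

This is a minimal sketch and is not part of the SiMO codebase. It fetches one of the checkpoints listed under Available Models and lets you check its contents before you pass it to the repository's scripts. The sketch assumes that `huggingface_hub` and `torch` are installed. It also assumes that each `.pth` file stores a plain PyTorch state dict, possibly wrapped under a `"state_dict"` key. That layout is not confirmed here. The helper names `download_checkpoint`, `load_state_dict`, and `summarize_state_dict` are hypothetical and were chosen for illustration. The repository ID and file paths are copied from the table's download links.

```python
import math
from typing import Any, Mapping

REPO_ID = "DempseyWen/SiMO"
# File paths copied from the "Available Models" table download links.
CHECKPOINTS = {
    "SiMO-PF": "SiMO_PF/net_epoch27.pth",
    "SiMO-AttFuse": "SiMO_AF/net_epoch21.pth",
}


def download_checkpoint(model_name: str) -> str:
    """Fetch a checkpoint into the local Hugging Face cache and return its path."""
    from huggingface_hub import hf_hub_download  # pip install huggingface-hub

    return hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[model_name])


def load_state_dict(path: str) -> Mapping[str, Any]:
    """Load a .pth file on CPU (hypothetical helper; layout is an assumption)."""
    import torch

    # Newer PyTorch releases default to weights_only=True, which is enough
    # for a plain state dict of tensors.
    obj = torch.load(path, map_location="cpu")
    # Assumption: some training scripts wrap weights as {"state_dict": ...};
    # if SiMO checkpoints do not, obj is returned unchanged.
    if isinstance(obj, dict) and "state_dict" in obj:
        obj = obj["state_dict"]
    return obj


def summarize_state_dict(state: Mapping[str, Any]) -> tuple[int, int]:
    """Return (number of array-like entries, total element count).

    Only the `.shape` attribute is used, so this works for torch tensors and
    NumPy arrays alike; non-array entries (e.g. an epoch counter) are skipped.
    """
    shapes = [tuple(v.shape) for v in state.values() if hasattr(v, "shape")]
    # math.prod(()) == 1, so scalar tensors count as one element.
    return len(shapes), sum(math.prod(s) for s in shapes)
```

For example, `summarize_state_dict(load_state_dict(download_checkpoint("SiMO-PF")))` downloads the SiMO-PF weights and returns the tensor count and total parameter count. This call requires network access. The imports of `huggingface_hub` and `torch` are deferred into the functions, so `summarize_state_dict` can be used without either package installed.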