---
license: mit
tags:
- computer-vision
- 3d-object-detection
- autonomous-driving
- multimodal-fusion
- collaborative-perception
- lidar
- camera
- opv2v
- v2xset
- dair-v2x
language:
- en
pipeline_tag: object-detection
---

# SiMO: Single-Modal-Operable Multimodal Collaborative Perception

This repository contains pretrained checkpoints for **SiMO** (Single-Modal-Operable Multimodal Collaborative Perception), a novel framework for robust multimodal collaborative 3D object detection in autonomous driving.

## πŸ“œ Paper

**Title**: Single-Modal-Operable Multimodal Collaborative Perception  
**Conference**: ICLR 2026  
**OpenReview**: [Link](https://openreview.net/forum?id=h0iRgjTmVs)  
**ArXiv**: [Link](https://arxiv.org/abs/2603.08240)

## πŸš€ Key Features

- **Single-Modal Operability**: Remains functional when one input modality (LiDAR or camera) fails
- **LAMMA Fusion**: Length-Adaptive Multi-Modal Fusion module
- **PAFR Training**: Pretrain-Align-Fuse-Random-Drop training strategy
- **Graceful Degradation**: Retains over 80% AP@30 when operating with camera only

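The random-drop component of PAFR can be illustrated with a minimal sketch. This is illustrative only, not the actual implementation: the real drop schedule, probabilities, and tensor handling live in the GitHub repository, and the function name `random_modality_drop` and the `drop_prob` parameter are assumptions made here.

```python
import random

def random_modality_drop(features, drop_prob=0.3):
    """With probability `drop_prob`, zero out one randomly chosen modality's
    features so the fusion module learns to cope with a missing input.
    `features` maps modality name -> feature vector (plain lists for brevity)."""
    modalities = list(features)
    if len(modalities) > 1 and random.random() < drop_prob:
        dropped = random.choice(modalities)
        features = dict(features)  # copy so the caller's dict is not mutated
        features[dropped] = [0.0] * len(features[dropped])  # mask the modality
    return features
```

Applied during training, this kind of masking is what lets the fused model degrade gracefully to LiDAR-only or camera-only operation at test time.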
## πŸ“¦ Available Models

| Model | Dataset | Architecture | Checkpoint |
|-------|---------|--------------|------------|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_PF/net_epoch27.pth) |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_AF/net_epoch21.pth) |

## πŸ“Š Performance

### OPV2V-H (with Random Drop)

| Modality | AP@30 | AP@50 | AP@70 |
|----------|-------|-------|-------|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |

## πŸ’» Usage

### Installation

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```

### Download Checkpoint

```bash
# Install huggingface_hub
pip install huggingface_hub

# Download a specific checkpoint (replace ***.pth with a file from the table above,
# e.g. SiMO_PF/net_epoch27.pth)
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DempseyWen/SiMO', filename='***.pth')"
```


## πŸ“– Full Documentation

For complete documentation, training scripts, and data preparation instructions, please visit our [GitHub repository](https://github.com/dempsey-wen/SiMO).

## 🏒 Acknowledgements

This work builds upon:
- [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD)
- [HEAL](https://github.com/yifanlu0227/HEAL)

## πŸ“„ Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```

## πŸ“„ License

MIT License - see [LICENSE](https://github.com/dempsey-wen/SiMO/blob/main/LICENSE) for details.