---
license: mit
tags:
- computer-vision
- 3d-object-detection
- autonomous-driving
- multimodal-fusion
- collaborative-perception
- lidar
- camera
- opv2v
- v2xset
- dair-v2x
language:
- en
pipeline_tag: object-detection
---

# SiMO: Single-Modal-Operable Multimodal Collaborative Perception

This repository contains pretrained checkpoints for **SiMO** (Single-Modal-Operable Multimodal Collaborative Perception), a novel framework for robust multimodal collaborative 3D object detection in autonomous driving.

## πŸ“œ Paper

**Title**: Single-Modal-Operable Multimodal Collaborative Perception  
**Conference**: ICLR 2026  
**OpenReview**: [Link](https://openreview.net/forum?id=h0iRgjTmVs)  
**ArXiv**: [Link](https://arxiv.org/abs/2603.08240)

## πŸš€ Key Features

- **Single-Modal Operability**: Remains functional when one input modality (LiDAR or camera) fails
- **LAMMA Fusion**: Length-Adaptive Multi-Modal Fusion module
- **PAFR Training**: Pretrain-Align-Fuse-Random-Drop training strategy
- **Graceful Degradation**: Retains over 80% AP@30 when operating with camera only

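The random-drop component of PAFR can be illustrated with a minimal sketch. This is illustrative only, not the actual implementation: the real drop schedule, probabilities, and tensor handling live in the GitHub repository, and the function name `random_modality_drop` and the `drop_prob` parameter are assumptions made here.

```python
import random

def random_modality_drop(features, drop_prob=0.3):
    """With probability `drop_prob`, zero out one randomly chosen modality's
    features so the fusion module learns to cope with a missing input.
    `features` maps modality name -> feature vector (plain lists for brevity)."""
    modalities = list(features)
    if len(modalities) > 1 and random.random() < drop_prob:
        dropped = random.choice(modalities)
        features = dict(features)  # copy so the caller's dict is not mutated
        features[dropped] = [0.0] * len(features[dropped])  # mask the modality
    return features
```

Applied during training, this kind of masking is what lets the fused model degrade gracefully to LiDAR-only or camera-only operation at test time.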
## πŸ“¦ Available Models

| Model | Dataset | Architecture | Checkpoint |
|-------|---------|--------------|------------|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_PF/net_epoch27.pth) |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | [Download](https://huggingface.co/DempseyWen/SiMO/blob/main/SiMO_AF/net_epoch21.pth) |

## πŸ“Š Performance

### OPV2V-H (with Random Drop)

| Modality | AP@30 | AP@50 | AP@70 |
|----------|-------|-------|-------|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |

## πŸ’» Usage

### Installation

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```

### Download Checkpoint

```bash
# Install huggingface_hub
pip install huggingface_hub

# Download a specific checkpoint (replace ***.pth with a file from the table above,
# e.g. SiMO_PF/net_epoch27.pth)
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DempseyWen/SiMO', filename='***.pth')"
```


## πŸ“– Full Documentation

For complete documentation, training scripts, and data preparation instructions, please visit our [GitHub repository](https://github.com/dempsey-wen/SiMO).

## 🏒 Acknowledgements

This work builds upon:
- [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD)
- [HEAL](https://github.com/yifanlu0227/HEAL)

## πŸ“„ Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```

## πŸ“„ License

MIT License - see [LICENSE](https://github.com/dempsey-wen/SiMO/blob/main/LICENSE) for details.