---
license: mit
language:
- en
tags:
- 3d-object-detection
- autonomous-driving
- multi-camera
- bev
- sparse
- nuscenes
- pytorch
- onnx
datasets:
- nuscenes
library_name: mmdet3d
pipeline_tag: object-detection
model-index:
- name: SparseBEV (vit_eva02_1600x640_trainval_future)
results:
- task:
type: object-detection
name: 3D Object Detection
dataset:
type: nuscenes
name: nuScenes
split: test
metrics:
- type: nds
value: 70.2
name: NDS (test)
- task:
type: object-detection
name: 3D Object Detection
dataset:
type: nuscenes
name: nuScenes
split: validation
metrics:
- type: nds
value: 85.3
name: NDS (val)
- name: SparseBEV (vov99_dd3d_1600x640_trainval_future)
results:
- task:
type: object-detection
name: 3D Object Detection
dataset:
type: nuscenes
name: nuScenes
split: test
metrics:
- type: nds
value: 67.5
name: NDS (test)
- task:
type: object-detection
name: 3D Object Detection
dataset:
type: nuscenes
name: nuScenes
split: validation
metrics:
- type: nds
value: 84.9
name: NDS (val)
- name: SparseBEV (r50_nuimg_704x256)
results:
- task:
type: object-detection
name: 3D Object Detection
dataset:
type: nuscenes
name: nuScenes
split: validation
metrics:
- type: nds
value: 55.6
name: NDS (val)
---
# SparseBEV
This is a fork of [MCG-NJU/SparseBEV](https://github.com/MCG-NJU/SparseBEV) with modifications to add ONNX export support targeting Apple Silicon via ONNX Runtime's CoreML Execution Provider. [See the full description of changes below.](#changes-from-upstream)
The upstream repository is the official PyTorch implementation of the ICCV 2023 paper:
> [**SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos**](https://arxiv.org/abs/2308.09244)<br>
> [Haisong Liu](https://scholar.google.com/citations?user=Z9yWFA0AAAAJ&hl=en&oi=sra), [Yao Teng](https://scholar.google.com/citations?user=eLIsViIAAAAJ&hl=en&oi=sra), [Tao Lu](https://scholar.google.com/citations?user=Ch28NiIAAAAJ&hl=en&oi=sra), [Haiguang Wang](https://miraclesinwang.github.io/), [Limin Wang](https://scholar.google.com/citations?user=HEuN8PcAAAAJ&hl=en&oi=sra)<br>Nanjing University, Shanghai AI Lab
Article in Chinese: [https://zhuanlan.zhihu.com/p/654821380](https://zhuanlan.zhihu.com/p/654821380)

## News
* 2024-03-31: The code of SparseOcc is released at [https://github.com/MCG-NJU/SparseOcc](https://github.com/MCG-NJU/SparseOcc).
* 2023-12-29: Check out our new paper ([https://arxiv.org/abs/2312.17118](https://arxiv.org/abs/2312.17118)) to learn about SparseOcc, a fully sparse architecture for panoptic occupancy!
* 2023-10-20: We provide code for visualizing the predictions and the sampling points, as requested in [#25](https://github.com/MCG-NJU/SparseBEV/issues/25).
* 2023-09-23: We release [the native PyTorch implementation of sparse sampling](https://github.com/MCG-NJU/SparseBEV/blob/97c8c798284555accedd0625395dd397fa4511d2/models/csrc/wrapper.py#L14). You can use this version if you encounter problems when compiling CUDA operators. It's only about 15% slower.
* 2023-08-21: We release the paper, code and pretrained weights.
* 2023-07-14: SparseBEV is accepted to ICCV 2023.
* 2023-02-09: SparseBEV-Beta achieves 65.6 NDS on [the nuScenes leaderboard](https://eval.ai/web/challenges/challenge-page/356/leaderboard/1012).
## Model Zoo
| Setting | Pretrain | Training Cost | NDS<sub>val</sub> | NDS<sub>test</sub> | FPS | Weights |
|----------|:--------:|:-------------:|:-----------------:|:------------------:|:---:|:-------:|
| [r50_nuimg_704x256](configs/r50_nuimg_704x256.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 21h (8x2080Ti) | 55.6 | - | 15.8 | [gdrive](https://drive.google.com/file/d/1ft34-pxLpHGo2Aw-jowEtCxyXcqszHNn/view) |
| [r50_nuimg_704x256_400q_36ep](configs/r50_nuimg_704x256_400q_36ep.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 28h (8x2080Ti) | 55.8 | - | 23.5 | [gdrive](https://drive.google.com/file/d/1C_Vn3iiSnSW1Dw1r0DkjJMwvHC5Y3zTN/view) |
| [r101_nuimg_1408x512](configs/r101_nuimg_1408x512.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth) | 2d8h (8xV100) | 59.2 | - | 6.5 | [gdrive](https://drive.google.com/file/d/1dKu5cR1fuo-O0ynyBh-RCPtHrgut29mN/view) |
| [vov99_dd3d_1600x640_trainval_future](configs/vov99_dd3d_1600x640_trainval_future.py) | [DD3D](https://drive.google.com/file/d/1gQkhWERCzAosBwG5bh2BKkt1k0TJZt-A/view) | 4d1h (8xA100) | 84.9 | 67.5 | - | [gdrive](https://drive.google.com/file/d/1TL0QoCiWD5uq8PCAWWE3A-g73ibK1R0S/view) |
| [vit_eva02_1600x640_trainval_future](configs/vit_eva02_1600x640_trainval_future.py) | [EVA02](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_seg_sys_o365.pth) | 11d (8xA100) | 85.3 | 70.2 | - | [gdrive](https://drive.google.com/file/d/1cx7h6PUqiaVWPixpcuB9AhsX3Sx4n0q_/view) |
* We use `r50_nuimg_704x256` for ablation studies and `r50_nuimg_704x256_400q_36ep` for comparison with others.
* We recommend using `r50_nuimg_704x256` to validate new ideas since it trains faster and the result is more stable.
* FPS is measured with an AMD 5800X CPU and an RTX 3090 GPU (without `fp16`).
* Run-to-run noise is around 0.3 NDS.
## Environment
Install PyTorch 2.0 + CUDA 11.8:
```
conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```
or PyTorch 1.10.2 + CUDA 10.2 for older GPUs:
```
conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch
```
Install other dependencies:
```
pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
```
Install turbojpeg and pillow-simd to speed up data loading (optional but recommended):
```
sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1
```
Compile CUDA extensions:
```
cd models/csrc
python setup.py build_ext --inplace
```
## Prepare Dataset
1. Download nuScenes from [https://www.nuscenes.org/nuscenes](https://www.nuscenes.org/nuscenes) and put it in `data/nuscenes`.
2. Download the generated info file from [Google Drive](https://drive.google.com/file/d/1uyoUuSRIVScrm_CUpge6V_UzwDT61ODO/view?usp=sharing) and unzip it.
3. Folder structure:
```
data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_train_mini_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── nuscenes_infos_val_mini_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
└── v1.0-trainval
```
These `*.pkl` files can also be generated with our script: `gen_sweep_info.py`.
## Training
Download the pretrained weights and put them in the `pretrain/` directory:
```
pretrain
├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
└── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth
```
Train SparseBEV with 8 GPUs:
```
torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256.py
```
Train SparseBEV with 4 GPUs (e.g., the last four GPUs):
```
export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256.py
```
The batch size for each GPU is scaled automatically, so there is no need to modify `batch_size` in the config files.
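A minimal sketch of the behaviour described above (the values are hypothetical, not taken from any actual config): the per-GPU batch size is derived from the config's total batch size and the number of processes launched by `torchrun`, so the config file needs no edits when the GPU count changes.

```python
# Hypothetical values for illustration only.
total_batch_size = 16          # batch_size as written in the config file
world_size = 4                 # processes started by torchrun --nproc_per_node 4

# Each rank processes an equal slice of the global batch.
per_gpu_batch_size = total_batch_size // world_size
assert per_gpu_batch_size * world_size == total_batch_size
print(per_gpu_batch_size)      # 4
```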
## Evaluation
Single-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
```
Multi-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
```
## Timing
FPS is measured with a single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
```
## Visualization
Visualize the predicted bbox:
```
python viz_bbox_predictions.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
```
Visualize the sampling points (like Fig. 6 in the paper):
```
python viz_sample_points.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
```
## Changes from upstream
This fork adds ONNX export support targeting [ONNX Runtime's CoreML Execution Provider](https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html) for inference on Apple Silicon (Mac Studio).
### Dependency management
- `pyproject.toml` / `uv.lock`: project dependencies managed with [uv](https://docs.astral.sh/uv/)
- `justfile`: task runner for common operations
### ONNX export
Three code changes were required to make the model traceable with `torch.onnx.export`:
**`models/sparsebev_sampling.py`**: `sampling_4d()`
- Replaced 6-dimensional advanced tensor indexing (not supported by the ONNX tracer) with `torch.gather` for best-view selection
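As a rough illustration of this kind of change (all shapes and variable names here are hypothetical, not the model's actual ones), advanced indexing over a view dimension can be replaced by an equivalent `torch.gather`, which the ONNX exporter handles cleanly:

```python
import torch

# Hypothetical shapes: B batch, Q queries, P points, V views, C channels.
B, Q, P, V, C = 1, 4, 2, 3, 8
feats = torch.randn(B, Q, P, V, C)            # per-view sampled features
best_view = torch.randint(0, V, (B, Q, P))    # best view per (batch, query, point)

# Advanced-indexing version (problematic for torch.onnx.export):
b = torch.arange(B)[:, None, None].expand(B, Q, P)
q = torch.arange(Q)[None, :, None].expand(B, Q, P)
p = torch.arange(P)[None, None, :].expand(B, Q, P)
picked_adv = feats[b, q, p, best_view]        # [B, Q, P, C]

# gather version (ONNX-friendly): broadcast the index across the channel dim,
# gather along the view axis, then drop the singleton view dim.
idx = best_view[..., None, None].expand(B, Q, P, 1, C)
picked_gather = feats.gather(dim=3, index=idx).squeeze(3)   # [B, Q, P, C]

assert torch.equal(picked_adv, picked_gather)
```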
**`models/csrc/wrapper.py`**: new `msmv_sampling_onnx()`
- Added an ONNX-compatible sampling path that uses 4D `F.grid_sample` (ONNX opset 16+) and `torch.gather` for view selection, replacing the original 5D volumetric `grid_sample` which is not in the ONNX spec
- The existing CUDA kernel path (`msmv_sampling` / `msmv_sampling_pytorch`) is preserved and used when CUDA is available
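A simplified sketch of the idea (shapes and names are hypothetical and much smaller than the real model's): fold the camera views into the batch dimension so that the 4D bilinear `F.grid_sample`, which is in the ONNX spec from opset 16, can be used instead of the unsupported 5D volumetric variant, then select the best view with `torch.gather`:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: V views, C channels, HxW feature maps, N sample points.
V, C, H, W, N = 3, 8, 16, 16, 5
feat = torch.randn(V, C, H, W)            # per-view feature maps, views in batch dim
grid = torch.rand(V, N, 1, 2) * 2 - 1     # normalised xy sample coords in [-1, 1]

# 4D bilinear grid_sample is exportable from ONNX opset 16; output [V, C, N, 1].
sampled = F.grid_sample(feat, grid, mode='bilinear', align_corners=False)
sampled = sampled.squeeze(-1).permute(0, 2, 1)   # [V, N, C]

# Select one view per sample point with gather (also ONNX-friendly).
best_view = torch.randint(0, V, (N,))
idx = best_view[None, :, None].expand(1, N, C)
picked = sampled.gather(0, idx).squeeze(0)       # [N, C]
```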
**`models/sparsebev_transformer.py`**
- `SparseBEVTransformerDecoder.forward()`: added a fast path that accepts pre-computed `time_diff` and `lidar2img` tensors directly, bypassing the NumPy preprocessing that is not traceable
- `SparseBEVTransformerDecoderLayer.forward()`: replaced a masked in-place assignment (`tensor[mask] = value`) with `torch.where`, which is ONNX-compatible
- `SparseBEVSelfAttention.calc_bbox_dists()`: replaced a Python loop over the batch dimension with a vectorised `torch.norm` using broadcasting
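The two transformer-side rewrites can be sketched in miniature as follows (toy shapes and names, not the model's actual code): `torch.where` expresses a masked overwrite without in-place mutation, and broadcasting computes all pairwise centre distances without a Python loop over the batch:

```python
import torch

# Masked in-place assignment (awkward for the ONNX tracer):
#     dist[mask] = -1e4
# ONNX-compatible equivalent with torch.where:
dist = torch.randn(2, 5, 5)
mask = dist < 0
dist_safe = torch.where(mask, torch.full_like(dist, -1e4), dist)

# Batched pairwise centre distances via broadcasting instead of a per-batch loop:
centers = torch.randn(2, 5, 2)       # [B, Q, 2] box centres (toy values)
dists = torch.norm(centers[:, :, None] - centers[:, None, :], dim=-1)  # [B, Q, Q]
```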
### New files
| File | Purpose |
|------|---------|
| `export_onnx.py` | Exports the model to ONNX, runs ORT CPU + CoreML EP validation |
| `models/onnx_wrapper.py` | Thin `nn.Module` wrapper that accepts pre-computed tensors instead of `img_metas` dicts |
| `justfile` | `just onnx_export` / `just onnx_export_validate` |
| `exports/` | ONNX model files tracked via Git LFS |
### Running the export
```bash
just onnx_export
# or with validation against PyTorch and CoreML EP:
just onnx_export_validate
```
Exported models land in `exports/` as `sparsebev_{config}_opset{N}.onnx` (+ `.onnx.data` for weights).
**Inference with ONNX Runtime:**
```python
import onnxruntime as ort
sess = ort.InferenceSession(
'exports/sparsebev_r50_nuimg_704x256_400q_36ep_opset18.onnx',
providers=[('CoreMLExecutionProvider', {'MLComputeUnits': 'ALL'}),
'CPUExecutionProvider'],
)
cls_scores, bbox_preds = sess.run(None, {
'img': img_np, # [1, 48, 3, 256, 704] float32 BGR
'lidar2img': lidar2img_np, # [1, 48, 4, 4] float32
'time_diff': time_diff_np, # [1, 8] float32, seconds since frame 0
})
# cls_scores: [6, 1, 400, 10] raw logits per decoder layer
# bbox_preds: [6, 1, 400, 10] raw box params, decode with NMSFreeCoder
```
The `MLComputeUnits` option must be passed explicitly; without it ONNX Runtime discards the CoreML EP on the first unsupported partition instead of falling back per-node.
---
## Acknowledgements
Many thanks to these excellent open-source projects:
* 3D Detection: [DETR3D](https://github.com/WangYueFt/detr3d), [PETR](https://github.com/megvii-research/PETR), [BEVFormer](https://github.com/fundamentalvision/BEVFormer), [BEVDet](https://github.com/HuangJunJie2017/BEVDet), [StreamPETR](https://github.com/exiawsh/StreamPETR)
* 2D Detection: [AdaMixer](https://github.com/MCG-NJU/AdaMixer), [DN-DETR](https://github.com/IDEA-Research/DN-DETR)
* Codebase: [MMDetection3D](https://github.com/open-mmlab/mmdetection3d), [CamLiFlow](https://github.com/MCG-NJU/CamLiFlow)