BEVal Model Zoo: BEV Segmentation Models for Autonomous Driving
This repository contains the pre-trained model weights from the paper:
BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving
Manuel Diaz-Zapata, Wenqian Liu, Robin Baruffa, Christian Laugier
2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 704–709, IEEE.
Overview
This collection contains 60 pre-trained models for semantic Bird's-Eye View (BEV) segmentation. The models span:
- 3 training datasets: nuScenes, Woven Planet (Lyft), and a combined cross-dataset split
- 5 model architectures: from a camera-only baseline to multi-modal camera+LiDAR architectures
- 4 semantic segmentation tasks: vehicle, pedestrian, drivable area, and a joint vehicle+drivable area task
Each model weights file is located at {dataset}/{architecture}/{task}/model_best.pt and ships alongside its config.yaml.
Model Zoo
File Organization
{dataset}/
  {architecture}/
    {task}/
      model_best.pt   # trained weights
      config.yaml     # training configuration
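As an illustration of this layout, the sketch below enumerates the expected weight and config paths for every dataset/architecture/task combination. The dataset and architecture directory names are taken from the Available Models table. Of the task directory names, only `vehicle` appears in this README, so the other three (`human`, `drivable_area`, `vehicle_drivable_area`) are guesses that should be checked against the actual folders. `model_zoo_paths` is a hypothetical helper, not part of the BEVal code.

```python
from itertools import product
from pathlib import Path

DATASETS = ["lyft", "nuscenes", "nusc-lyft"]
ARCHITECTURES = ["lss", "lapt", "lapt_fpn", "lapt_pp", "lapt_pp_fpn"]
# Only "vehicle" is confirmed by this README; the other names are assumptions.
TASKS = ["vehicle", "human", "drivable_area", "vehicle_drivable_area"]


def model_zoo_paths(root="."):
    """Yield (weights, config) path pairs for every model in the layout."""
    for dataset, arch, task in product(DATASETS, ARCHITECTURES, TASKS):
        model_dir = Path(root) / dataset / arch / task
        yield model_dir / "model_best.pt", model_dir / "config.yaml"


if __name__ == "__main__":
    pairs = list(model_zoo_paths("path/to/model_zoo"))
    print(len(pairs))  # 3 datasets x 5 architectures x 4 tasks = 60
    missing = [w for w, _ in pairs if not w.exists()]
    print(f"{len(missing)} weight files not found locally")
```

Listing the missing files this way is a quick check that a download completed before starting an evaluation run.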
Available Models
| Dataset | Architecture | Vehicle | Human | Drivable Area | Vehicle + Drivable Area |
|---|---|---|---|---|---|
| lyft | lss |
β | β | β | β |
| lyft | lapt |
β | β | β | β |
| lyft | lapt_fpn |
β | β | β | β |
| lyft | lapt_pp |
β | β | β | β |
| lyft | lapt_pp_fpn |
β | β | β | β |
| nuscenes | lss |
β | β | β | β |
| nuscenes | lapt |
β | β | β | β |
| nuscenes | lapt_fpn |
β | β | β | β |
| nuscenes | lapt_pp |
β | β | β | β |
| nuscenes | lapt_pp_fpn |
β | β | β | β |
| nusc-lyft | lss |
β | β | β | β |
| nusc-lyft | lapt |
β | β | β | β |
| nusc-lyft | lapt_fpn |
β | β | β | β |
| nusc-lyft | lapt_pp |
β | β | β | β |
| nusc-lyft | lapt_pp_fpn |
β | β | β | β |
Architectures
All models output a semantic segmentation map in the Bird's-Eye View (BEV) frame over a 100 m × 100 m area centered on the ego-vehicle at a 0.5 m/cell resolution (200 × 200 grid).
LSS: Lift-Splat-Shoot (camera only)
Baseline camera-only architecture. Projects multi-camera features into a voxel representation using depth prediction, then encodes the voxels in BEV with a ResNet18-based decoder.
- Backbone: EfficientNet-B0 with learned depth bins
- Input: surround-view camera images (5–6 cameras)
- Reference: Philion & Fidler, ECCV 2020
LAPT: Lidar-Aided Projection Transformer
Extends LSS by replacing the learned depth distribution with LiDAR-guided depth, improving geometric accuracy of the image-to-BEV projection.
- Backbone: EfficientNet-B0 + LiDAR depth projection module
- Input: surround-view cameras + LiDAR point cloud
LAPT-FPN: LAPT with Feature Pyramid Network
Adds a Feature Pyramid Network to the camera encoder for richer multi-scale image features before lifting into BEV.
- Extras: dual-scale FPN (use_fpn: True)
LAPT-PP: LAPT with PointPillars
Fuses camera features (LAPT) with LiDAR features extracted by a PointPillars network. Features are combined via average-pooling fusion.
- LiDAR encoder: Pillar Feature Network (PFN) with BN + ReLU
- Fusion: avg_pool over camera + PointPillars feature maps
LAPT-PP-FPN: LAPT + PointPillars + FPN
Combines all modules: LAPT depth guidance, FPN multi-scale camera features, and PointPillars LiDAR features.
Usage
Installation
conda create -n beval python=3.10
conda activate beval
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install efficientnet-pytorch==0.7.0 numba==0.57.1 nuscenes-devkit==1.1.9 \
lyft-dataset-sdk==0.0.8 yacs==0.1.8 tensorboardx==2.6.2.2 \
matplotlib==3.5.3 shapely==1.8.2
Clone the code repository:
git clone https://github.com/manueldiaz96/beval.git
cd beval
Set environment variables pointing to your dataset directories:
export NUSCENES=/path/to/nuscenes
export LYFT=/path/to/woven_planet
Note: The Woven Planet (Lyft) models require sub-sampled LiDAR point clouds.
Download them from manutheeng/subsampled-lyft and place them in $LYFT/subsampled_lidar/.
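Before running any script, it can help to confirm that both variables are set and that the sub-sampled LiDAR folder is where the note above expects it. The following is a minimal sketch; `check_dataset_env` is a hypothetical helper and is not part of the BEVal repository.

```python
import os
from pathlib import Path


def check_dataset_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for var in ("NUSCENES", "LYFT"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing directory: {value}")
    # The Lyft models additionally expect $LYFT/subsampled_lidar/.
    lyft = env.get("LYFT")
    if lyft and Path(lyft).is_dir() and not (Path(lyft) / "subsampled_lidar").is_dir():
        problems.append("subsampled_lidar/ not found under $LYFT")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_env():
        print("WARNING:", problem)
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching the shell configuration.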
Running evaluation
Use the test script that matches the architecture:
# LSS (camera only)
python test_lift_splat.py \
--cfg path/to/model_zoo/nuscenes/lss/vehicle/config.yaml \
--weights path/to/model_zoo/nuscenes/lss/vehicle/model_best.pt
# LAPT / LAPT-FPN
python test_LAPT.py \
--cfg path/to/model_zoo/nuscenes/lapt/vehicle/config.yaml \
--weights path/to/model_zoo/nuscenes/lapt/vehicle/model_best.pt
# LAPT-PP / LAPT-PP-FPN
python test_LAPT_PP.py \
--cfg path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/config.yaml \
--weights path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/model_best.pt
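The mapping between architectures and test scripts shown above can also be expressed programmatically. The sketch below builds the evaluation command for any model folder. The script names and the `--cfg` / `--weights` flags come from the commands above, while `TEST_SCRIPTS` and `build_eval_command` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Architecture folder name -> test script, as listed in the commands above.
TEST_SCRIPTS = {
    "lss": "test_lift_splat.py",
    "lapt": "test_LAPT.py",
    "lapt_fpn": "test_LAPT.py",
    "lapt_pp": "test_LAPT_PP.py",
    "lapt_pp_fpn": "test_LAPT_PP.py",
}


def build_eval_command(zoo_root, dataset, architecture, task):
    """Return the evaluation command as an argument list for subprocess.run."""
    model_dir = Path(zoo_root) / dataset / architecture / task
    return [
        "python", TEST_SCRIPTS[architecture],
        "--cfg", str(model_dir / "config.yaml"),
        "--weights", str(model_dir / "model_best.pt"),
    ]


if __name__ == "__main__":
    cmd = build_eval_command("path/to/model_zoo", "nuscenes", "lapt_pp_fpn", "vehicle")
    print(" ".join(cmd))
    # To actually run it from inside the cloned beval/ directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

An unknown architecture name raises a `KeyError`, which surfaces typos before any model is loaded.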
Each test script prints the per-class IoU (%) on the validation split.
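For reference, the IoU of a binary BEV mask is the number of cells where prediction and ground truth are both positive, divided by the number of cells where either is positive. The sketch below computes it on plain nested lists. It illustrates the metric only and is not the evaluation code used by the test scripts, which may differ in thresholding or in how results are accumulated over the dataset.

```python
def binary_iou(pred, target):
    """IoU (in %) between two binary masks given as equally sized 2D lists."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 100.0 if union == 0 else 100.0 * intersection / union


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(f"IoU = {binary_iou(pred, target):.1f}%")  # 2 shared / 4 in union = 50.0%
```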
Loading weights directly in Python
import torch
from models.LAPTNet import compile_model  # or lift_splat / LAPTNet_PP

# BEV grid: [min, max, step] in metres for each axis
grid_conf = {"xbound": [-50., 50., 0.5], "ybound": [-50., 50., 0.5]}
data_aug_conf = {"ncams": 6, "rand_flip": False, "pc_rot": (-20, 20),
                 "cams": ["CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
                          "CAM_BACK_RIGHT", "CAM_BACK", "CAM_BACK_LEFT"]}

model = compile_model(grid_conf, data_aug_conf, outC=1, use_fpn=False)
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load("nuscenes/lapt/vehicle/model_best.pt", map_location="cpu"))
model.eval()  # switch dropout and batch-norm layers to inference behaviour
Training Data
| Split | Dataset | Cameras | LiDAR |
|---|---|---|---|
| lyft | Woven Planet Perception Dataset (trainval) | 5 surround cameras | ✓ (sub-sampled) |
| nuscenes | nuScenes (trainval) | 6 surround cameras | ✓ |
| nusc-lyft | nuScenes + Woven Planet (joint) | 5–6 cameras | ✓ |
BEV grid configuration (all models)
| Parameter | Value |
|---|---|
| X / Y range | ±50 m |
| Resolution | 0.5 m/cell |
| Grid size | 200 × 200 |
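The grid size follows directly from the range and resolution: (50 − (−50)) / 0.5 = 200 cells per axis. The sketch below derives it from bounds in the `[min, max, step]` form used by `grid_conf` in the loading example, and maps a metric position in the ego frame to a cell index. Which axis corresponds to image rows, and which direction counts as forward, are assumptions here; `grid_size` and `point_to_cell` are hypothetical helpers.

```python
import math


def grid_size(bound):
    """Number of cells along one axis for a [min, max, step] bound."""
    lo, hi, step = bound
    return round((hi - lo) / step)


def point_to_cell(x, y, xbound, ybound):
    """Map ego-frame metres (x, y) to integer cell indices (ix, iy).

    Returns None when the point falls outside the grid.
    """
    ix = math.floor((x - xbound[0]) / xbound[2])
    iy = math.floor((y - ybound[0]) / ybound[2])
    if 0 <= ix < grid_size(xbound) and 0 <= iy < grid_size(ybound):
        return ix, iy
    return None


if __name__ == "__main__":
    xbound = ybound = [-50.0, 50.0, 0.5]
    print(grid_size(xbound), grid_size(ybound))      # 200 200
    print(point_to_cell(0.0, 0.0, xbound, ybound))   # (100, 100)
    print(point_to_cell(50.0, 0.0, xbound, ybound))  # None (upper bound is exclusive)
```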
Results
Please refer to the paper for full quantitative results. The figures below are reproduced from the paper.
Cross-dataset evaluation (models trained on one dataset and evaluated on the other):
Cross-dataset training (models trained on the combined nusc-lyft split):
Citation
If this model zoo is useful in your research, please cite:
@inproceedings{beval,
title = {BEVal: A Cross-dataset Evaluation Study of {BEV} Segmentation Models for Autonomous Driving},
author = {Diaz-Zapata, Manuel and Liu, Wenqian and Baruffa, Robin and Laugier, Christian},
booktitle = {2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV)},
pages = {704--709},
year = {2024},
organization = {IEEE}
}
License
This model zoo is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Commercial use is not permitted.
Parts of the codebase are derived from Lift-Splat-Shoot (NVIDIA, MIT-licensed for the original code).

