BEVal Model Zoo — BEV Segmentation Models for Autonomous Driving

This repository contains the pre-trained model weights from the paper:

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving
Manuel Diaz-Zapata, Wenqian Liu, Robin Baruffa, Christian Laugier
2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 704–709, IEEE.

Overview

This collection contains 60 pre-trained models for semantic Bird's-Eye View (BEV) segmentation. The models span:

3 training datasets — nuScenes, Woven Planet (Lyft), and a combined cross-dataset split
5 model architectures — from a camera-only baseline to multi-modal camera+LiDAR architectures
4 semantic segmentation tasks — vehicle, pedestrian (the Human column in the table below), drivable area, and a joint vehicle+drivable area task

Each model's weights are stored at {dataset}/{architecture}/{task}/model_best.pt, alongside the config.yaml used to train it.

Model Zoo

File Organization

{dataset}/
  {architecture}/
    {task}/
      model_best.pt     ← trained weights
      config.yaml       ← training configuration
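
Because every model follows the same three-level layout, the checkpoints in a local copy of the zoo can be discovered by globbing rather than by hard-coding names. The sketch below is illustrative only: `find_checkpoints` is a hypothetical helper, not part of the beval code base, and it assumes the zoo has been downloaded to a local directory.

```python
from pathlib import Path


def find_checkpoints(zoo_root):
    """Map (dataset, architecture, task) to its weights and config paths.

    Hypothetical helper: it only relies on the
    {dataset}/{architecture}/{task}/model_best.pt layout described above.
    """
    root = Path(zoo_root)
    found = {}
    for weights in sorted(root.glob("*/*/*/model_best.pt")):
        dataset, architecture, task = weights.relative_to(root).parts[:3]
        found[(dataset, architecture, task)] = {
            "weights": weights,
            "config": weights.parent / "config.yaml",
        }
    return found
```

On a complete copy of the zoo, the returned dictionary should hold one entry per model; the task directory names are read from disk rather than assumed.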

Available Models

Dataset	Architecture	Vehicle	Human	Drivable Area	Vehicle + Drivable Area
lyft	`lss`	✓	✓	✓	✓
lyft	`lapt`	✓	✓	✓	✓
lyft	`lapt_fpn`	✓	✓	✓	✓
lyft	`lapt_pp`	✓	✓	✓	✓
lyft	`lapt_pp_fpn`	✓	✓	✓	✓
nuscenes	`lss`	✓	✓	✓	✓
nuscenes	`lapt`	✓	✓	✓	✓
nuscenes	`lapt_fpn`	✓	✓	✓	✓
nuscenes	`lapt_pp`	✓	✓	✓	✓
nuscenes	`lapt_pp_fpn`	✓	✓	✓	✓
nusc-lyft	`lss`	✓	✓	✓	✓
nusc-lyft	`lapt`	✓	✓	✓	✓
nusc-lyft	`lapt_fpn`	✓	✓	✓	✓
nusc-lyft	`lapt_pp`	✓	✓	✓	✓
nusc-lyft	`lapt_pp_fpn`	✓	✓	✓	✓

Architectures

All models output a semantic segmentation map in the Bird's-Eye View (BEV) frame over a 100 m × 100 m area centered on the ego-vehicle at a 0.5 m/cell resolution (200 × 200 grid).

LSS — Lift-Splat-Shoot (camera only)

Baseline camera-only architecture. It lifts multi-camera image features into a voxel representation using a predicted depth distribution, then processes the resulting BEV features with a ResNet18-based decoder.

Backbone: EfficientNet-B0 with learned depth bins
Input: surround-view camera images (5–6 cameras)
Reference: Philion & Fidler, ECCV 2020
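
To make the "lift" step concrete: for each pixel, the network predicts a distribution over depth bins, and the pixel's feature vector is copied to every bin, weighted by that bin's probability. The pure-Python sketch below shows this for a single pixel. It is a simplified illustration of the idea from Philion & Fidler, not the code used in this repository, and the function names are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(depth_logits, feature):
    """Outer product of a depth distribution (D bins) and a feature (C channels).

    Returns a D x C nested list: one probability-weighted copy of the
    feature vector per depth bin.
    """
    weights = softmax(depth_logits)
    return [[w * f for f in feature] for w in weights]


# Two equally likely depth bins split the feature evenly between them.
lifted = lift_pixel([0.0, 0.0], [2.0, 4.0])
```

Summing the lifted features over the depth axis recovers the original feature vector, since the depth weights sum to one.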

LAPT — LiDAR-Aided Perspective Transform

Extends LSS by replacing the learned depth distribution with LiDAR-guided depth, improving geometric accuracy of the image-to-BEV projection.

Backbone: EfficientNet-B0 + LiDAR depth projection module
Input: surround-view cameras + LiDAR point cloud
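
The geometric core of LiDAR-guided depth is projecting LiDAR points into each camera image to obtain a sparse depth value per pixel. The sketch below shows only that pinhole projection, assuming the points are already expressed in the camera frame with z pointing forward. The function name and the intrinsics parameters (`fx`, `fy`, `cx`, `cy`) are illustrative, and how LAPT consumes the resulting depth inside the network is not shown here.

```python
import math


def project_points(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame, z forward) to pixels with depth.

    Returns {(u, v): depth}, keeping the nearest point when several
    points fall on the same pixel.
    """
    depth_map = {}
    for x, y, z in points_cam:
        if z <= 0:  # point is behind the camera
            continue
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth_map or z < depth_map[(u, v)]:
                depth_map[(u, v)] = z
    return depth_map
```

In practice the LiDAR points would first be transformed from the sensor frame to each camera frame using the calibration supplied with the dataset.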

LAPT-FPN — LAPT with Feature Pyramid Network

Adds a Feature Pyramid Network to the camera encoder for richer multi-scale image features before lifting into BEV.

Extras: dual-scale FPN (use_fpn: True)
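
As a rough illustration of what combining two scales means, the sketch below performs the top-down merge step of a feature pyramid on single-channel maps: the coarse map is upsampled by 2 and added to the fine map. This is a deliberate simplification; the lateral and smoothing convolutions of a real FPN are omitted, and the function names are hypothetical rather than taken from the repository.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list of numbers."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge_two_scales(fine, coarse):
    """Top-down merge: upsample the coarse map and add it to the fine map."""
    upsampled = upsample2x(coarse)
    return [
        [f + u for f, u in zip(fine_row, up_row)]
        for fine_row, up_row in zip(fine, upsampled)
    ]
```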

LAPT-PP — LAPT with PointPillars

Fuses camera features (LAPT) with LiDAR features extracted by a PointPillars network. Features are combined via average-pooling fusion.

LiDAR encoder: Pillar Feature Network (PFN) with BN + ReLU
Fusion: avg_pool over camera + PointPillars feature maps
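
One plausible reading of average fusion is an element-wise mean of the camera and LiDAR feature maps once both live on the same BEV grid. The sketch below assumes that interpretation and uses plain nested lists for a single channel; the exact operation behind `avg_pool` in the repository may differ.

```python
def average_fuse(camera_bev, lidar_bev):
    """Element-wise mean of two equally sized 2D feature maps (lists of rows)."""
    if len(camera_bev) != len(lidar_bev) or any(
        len(c_row) != len(l_row) for c_row, l_row in zip(camera_bev, lidar_bev)
    ):
        raise ValueError("feature maps must have the same shape")
    return [
        [(c + l) / 2.0 for c, l in zip(c_row, l_row)]
        for c_row, l_row in zip(camera_bev, lidar_bev)
    ]
```

Averaging keeps the fused map on the same scale as its inputs, which is one reason it is a common alternative to channel concatenation.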

LAPT-PP-FPN — LAPT + PointPillars + FPN

Combines all modules: LAPT depth guidance, FPN multi-scale camera features, and PointPillars LiDAR features.

Usage

Installation

conda create -n beval python=3.10
conda activate beval
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install efficientnet-pytorch==0.7.0 numba==0.57.1 nuscenes-devkit==1.1.9 \
            lyft-dataset-sdk==0.0.8 yacs==0.1.8 tensorboardx==2.6.2.2 \
            matplotlib==3.5.3 shapely==1.8.2

Clone the code repository:

git clone https://github.com/manueldiaz96/beval.git
cd beval

Set environment variables pointing to your dataset directories:

export NUSCENES=/path/to/nuscenes
export LYFT=/path/to/woven_planet

Note: The Woven Planet (Lyft) models require sub-sampled LiDAR point clouds.
Download them from manutheeng/subsampled-lyft and place them in $LYFT/subsampled_lidar/.
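
Before launching an evaluation, it can help to verify that both variables are set and that the sub-sampled LiDAR folder is in place. The check below is a small sketch based only on the paths named above; `check_dataset_env` is a hypothetical helper and not part of the repository.

```python
import os
from pathlib import Path


def check_dataset_env(env=None):
    """Return a list of problems with the dataset setup; empty means OK."""
    env = os.environ if env is None else env
    problems = []
    for var in ("NUSCENES", "LYFT"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing directory: {value}")
    lyft = env.get("LYFT")
    if lyft and Path(lyft).is_dir():
        if not (Path(lyft) / "subsampled_lidar").is_dir():
            problems.append("subsampled_lidar/ not found under $LYFT")
    return problems
```

Calling `check_dataset_env()` with no arguments inspects the current process environment.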

Running evaluation

Use the test script that matches the architecture:

# LSS (camera only)
python test_lift_splat.py \
    --cfg path/to/model_zoo/nuscenes/lss/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lss/vehicle/model_best.pt

# LAPT / LAPT-FPN
python test_LAPT.py \
    --cfg path/to/model_zoo/nuscenes/lapt/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lapt/vehicle/model_best.pt

# LAPT-PP / LAPT-PP-FPN
python test_LAPT_PP.py \
    --cfg path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/model_best.pt
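
For batch evaluation, the architecture-to-script pairing listed above can be captured in a lookup table. The sketch below builds the argument list for one model; `build_eval_command` is a hypothetical helper, and only the script names and the `--cfg` / `--weights` flags come from the commands above.

```python
# Test script for each architecture directory, as listed above.
TEST_SCRIPTS = {
    "lss": "test_lift_splat.py",
    "lapt": "test_LAPT.py",
    "lapt_fpn": "test_LAPT.py",
    "lapt_pp": "test_LAPT_PP.py",
    "lapt_pp_fpn": "test_LAPT_PP.py",
}


def build_eval_command(zoo_root, dataset, architecture, task):
    """Return the evaluation command for one model as an argument list."""
    model_dir = f"{zoo_root}/{dataset}/{architecture}/{task}"
    return [
        "python", TEST_SCRIPTS[architecture],
        "--cfg", f"{model_dir}/config.yaml",
        "--weights", f"{model_dir}/model_best.pt",
    ]
```

The returned list can be passed directly to `subprocess.run` from inside the cloned beval directory.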

Each test script prints the per-class IoU (%) on the validation split.
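
For reference, IoU (intersection over union) for one class is the number of cells where prediction and ground truth are both positive, divided by the number of cells where either is positive. The sketch below computes it for a single pair of binary masks; how the test scripts accumulate counts across the validation split is not specified here, so treat this as an illustration of the metric only.

```python
def binary_iou(pred, target):
    """IoU (%) between two binary masks given as 2D lists of 0/1 values."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # IoU is undefined when neither mask has any positive cell.
    return 100.0 * intersection / union if union else float("nan")
```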

Loading weights directly in Python

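import torch
from models.LAPTNet import compile_model  # or lift_splat / LAPTNet_PP

# BEV grid as [min, max, step] in metres: ±50 m at 0.5 m/cell gives 200 × 200 cells
grid_conf     = {"xbound": [-50., 50., 0.5], "ybound": [-50., 50., 0.5]}
data_aug_conf = {"ncams": 6, "rand_flip": False, "pc_rot": (-20, 20),
                 "cams": ["CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
                          "CAM_BACK_RIGHT", "CAM_BACK", "CAM_BACK_LEFT"]}

# outC=1 gives a single output channel (vehicle task); use_fpn=False selects plain LAPT
model = compile_model(grid_conf, data_aug_conf, outC=1, use_fpn=False)
# map_location="cpu" lets the checkpoint load on machines without a GPU
state_dict = torch.load("nuscenes/lapt/vehicle/model_best.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch dropout and batch norm to inference behaviour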

Training Data

Split	Dataset	Cameras	LiDAR
`lyft`	Woven Planet Perception Dataset (trainval)	5 surround cameras	✓ (sub-sampled)
`nuscenes`	nuScenes (trainval)	6 surround cameras	✓
`nusc-lyft`	nuScenes + Woven Planet (joint)	5–6 cameras	✓

BEV grid configuration (all models)

Parameter	Value
X / Y range	±50 m
Resolution	0.5 m/cell
Grid size	200 × 200
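
The grid size follows directly from the range and resolution: (50 − (−50)) / 0.5 = 200 cells per axis. The sketch below reproduces that arithmetic and maps an ego-frame position to a cell index, using the same `[min, max, step]` bound format as `grid_conf`. The helper names are hypothetical, and the axis order and orientation of the output tensor are assumptions that should be checked against the code.

```python
def grid_shape(xbound, ybound):
    """Number of cells along x and y for [min, max, step] bounds."""
    nx = round((xbound[1] - xbound[0]) / xbound[2])
    ny = round((ybound[1] - ybound[0]) / ybound[2])
    return nx, ny


def world_to_cell(x, y, xbound, ybound):
    """Index of the cell containing ego-frame point (x, y), or None if outside."""
    if not (xbound[0] <= x < xbound[1] and ybound[0] <= y < ybound[1]):
        return None
    ix = int((x - xbound[0]) // xbound[2])
    iy = int((y - ybound[0]) // ybound[2])
    return ix, iy


bounds = [-50.0, 50.0, 0.5]
shape = grid_shape(bounds, bounds)              # (200, 200)
ego_cell = world_to_cell(0.0, 0.0, bounds, bounds)  # (100, 100)
```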

Results

Please refer to the paper for full quantitative results. The figures below are reproduced from the paper.

[Figure: cross-dataset evaluation, with models trained on one dataset and evaluated on the other]

[Figure: cross-dataset training, with models trained on the combined nusc-lyft split]

Citation

If this model zoo is useful in your research, please cite:

@inproceedings{beval,
  title     = {BEVal: A Cross-dataset Evaluation Study of {BEV} Segmentation Models for Autonomous Driving},
  author    = {Diaz-Zapata, Manuel and Liu, Wenqian and Baruffa, Robin and Laugier, Christian},
  booktitle = {2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV)},
  pages     = {704--709},
  year      = {2024},
  organization = {IEEE}
}

License

This model zoo is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Commercial use is not permitted.

Parts of the codebase are derived from Lift-Splat-Shoot (NVIDIA, MIT-licensed for the original code).

Papers for manutheeng/beval_model_zoo

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving (arXiv:2408.16322, published Aug 29, 2024)
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (arXiv:2008.05711, published Aug 13, 2020)