BEVal Model Zoo: BEV Segmentation Models for Autonomous Driving

This repository contains the pre-trained model weights from the paper:

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving
Manuel Diaz-Zapata, Wenqian Liu, Robin Baruffa, Christian Laugier
2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 704–709, IEEE.
arXiv GitHub


Overview

This collection contains 60 pre-trained models for semantic Bird's-Eye View (BEV) segmentation. The models span:

  • 3 training datasets: nuScenes, Woven Planet (Lyft), and a combined nuScenes + Woven Planet split (nusc-lyft)
  • 5 model architectures: from a camera-only baseline to multi-modal camera + LiDAR architectures
  • 4 semantic segmentation tasks: vehicle, human (pedestrian), drivable area, and a joint vehicle + drivable area task

Each model's weights are stored at {dataset}/{architecture}/{task}/model_best.pt, next to the config.yaml used to train it.


Model Zoo

File Organization

{dataset}/
  {architecture}/
    {task}/
      model_best.pt     ← trained weights
      config.yaml       ← training configuration
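A small helper can build these paths and discover which checkpoints are present on disk. This is a sketch, not part of the BEVal code: `model_files` and `iter_model_dirs` are hypothetical names, and since only `vehicle` is confirmed as a task folder name in this card, the helper discovers task folders from disk instead of hard-coding them.

```python
from itertools import product
from pathlib import Path

# Dataset and architecture folder names, taken from the "Available Models" table.
DATASETS = ("lyft", "nuscenes", "nusc-lyft")
ARCHITECTURES = ("lss", "lapt", "lapt_fpn", "lapt_pp", "lapt_pp_fpn")


def model_files(root, dataset, architecture, task):
    """Return (weights_path, config_path) for one model in the zoo."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if architecture not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {architecture!r}")
    folder = Path(root) / dataset / architecture / task
    return folder / "model_best.pt", folder / "config.yaml"


def iter_model_dirs(root):
    """Yield (dataset, architecture, task) for every folder that holds a checkpoint."""
    for dataset, architecture in product(DATASETS, ARCHITECTURES):
        arch_dir = Path(root) / dataset / architecture
        if not arch_dir.is_dir():
            continue  # this combination was not downloaded
        for task_dir in sorted(p for p in arch_dir.iterdir() if p.is_dir()):
            if (task_dir / "model_best.pt").is_file():
                yield dataset, architecture, task_dir.name
```

For example, `model_files("model_zoo", "nuscenes", "lapt", "vehicle")` returns the two paths used in the evaluation commands further down.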

Available Models

Dataset Architecture Vehicle Human Drivable Area Vehicle + Drivable Area
lyft lss βœ“ βœ“ βœ“ βœ“
lyft lapt βœ“ βœ“ βœ“ βœ“
lyft lapt_fpn βœ“ βœ“ βœ“ βœ“
lyft lapt_pp βœ“ βœ“ βœ“ βœ“
lyft lapt_pp_fpn βœ“ βœ“ βœ“ βœ“
nuscenes lss βœ“ βœ“ βœ“ βœ“
nuscenes lapt βœ“ βœ“ βœ“ βœ“
nuscenes lapt_fpn βœ“ βœ“ βœ“ βœ“
nuscenes lapt_pp βœ“ βœ“ βœ“ βœ“
nuscenes lapt_pp_fpn βœ“ βœ“ βœ“ βœ“
nusc-lyft lss βœ“ βœ“ βœ“ βœ“
nusc-lyft lapt βœ“ βœ“ βœ“ βœ“
nusc-lyft lapt_fpn βœ“ βœ“ βœ“ βœ“
nusc-lyft lapt_pp βœ“ βœ“ βœ“ βœ“
nusc-lyft lapt_pp_fpn βœ“ βœ“ βœ“ βœ“

Architectures

All models output a semantic segmentation map in the Bird's-Eye View (BEV) frame over a 100 m × 100 m area centered on the ego-vehicle at a 0.5 m/cell resolution (200 × 200 grid).
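The relation between metric coordinates and grid cells follows directly from the bounds: (50 - (-50)) / 0.5 = 200 cells per axis. The sketch below assumes the `[lower, upper, step]` convention of the `xbound`/`ybound` entries in `grid_conf` used in the Python loading example; the function names are illustrative, and which index corresponds to rows or columns of the output tensor is not specified in this card.

```python
import math

# BEV bounds as (lower, upper, step) in metres: 100 m x 100 m at 0.5 m/cell.
XBOUND = (-50.0, 50.0, 0.5)
YBOUND = (-50.0, 50.0, 0.5)


def num_cells(bound):
    """Number of grid cells along one axis."""
    lower, upper, step = bound
    return round((upper - lower) / step)


def metric_to_cell(x, y, xbound=XBOUND, ybound=YBOUND):
    """Map an ego-frame point (x, y) in metres to (ix, iy), or None if outside the grid."""
    ix = math.floor((x - xbound[0]) / xbound[2])
    iy = math.floor((y - ybound[0]) / ybound[2])
    if 0 <= ix < num_cells(xbound) and 0 <= iy < num_cells(ybound):
        return ix, iy
    return None


def cell_center(ix, iy, xbound=XBOUND, ybound=YBOUND):
    """Metric coordinates of the centre of cell (ix, iy)."""
    return (xbound[0] + (ix + 0.5) * xbound[2],
            ybound[0] + (iy + 0.5) * ybound[2])
```

The ego-vehicle origin falls at cell (100, 100), and the upper bound of 50 m is excluded, so a point exactly at x = 50 m lies outside the grid.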

LSS: Lift-Splat-Shoot (camera only)

Baseline camera-only architecture. It lifts multi-camera image features into 3D using a predicted per-pixel depth distribution, splats them onto the BEV grid (voxel pooling), and processes the resulting BEV features with a ResNet18-based encoder.

  • Backbone: EfficientNet-B0 with learned depth bins
  • Input: surround-view camera images (5–6 cameras)
  • Reference: Philion & Fidler, ECCV 2020
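The "lift" step of LSS can be illustrated as an outer product between a per-pixel softmax over depth bins and a per-pixel context vector. The NumPy sketch below assumes a head output with D depth-logit channels followed by C feature channels; D = 41 matches the original LSS depth range of 4 to 45 m in 1 m bins, which is an assumption about these checkpoints. The splat step (sum-pooling the lifted points into BEV cells) is omitted.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lift(head_out, num_depth_bins):
    """Lift per-pixel image features into a camera frustum (LSS-style sketch).

    head_out: (D + C, H, W) array; the first D channels are depth logits,
              the remaining C channels are context features.
    Returns:  (C, D, H, W) array: each feature vector weighted by the
              probability of every depth bin.
    """
    depth = softmax(head_out[:num_depth_bins], axis=0)  # (D, H, W), sums to 1 over D
    context = head_out[num_depth_bins:]                 # (C, H, W)
    return context[:, None] * depth[None]               # outer product over C and D
```

Because the depth probabilities sum to one, summing the lifted features over the depth axis recovers the original context vector of each pixel.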

LAPT: Lidar-Aided Projection Transformer

Extends LSS by replacing the learned depth distribution with LiDAR-guided depth, improving geometric accuracy of the image-to-BEV projection.

  • Backbone: EfficientNet-B0 + LiDAR depth projection module
  • Input: surround-view cameras + LiDAR point cloud
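One common way to obtain LiDAR-guided depth is to project the point cloud into each image with a pinhole camera model and convert the point depths into depth-bin indices. This is a generic sketch, not the LAPT projection module: it assumes the points are already transformed into the camera frame, and it reuses the LSS-style depth range `dbound = (4, 45, 1)` as a hypothetical default.

```python
import numpy as np


def lidar_depth_bins(points_cam, K, image_hw, dbound=(4.0, 45.0, 1.0)):
    """Project camera-frame LiDAR points into the image and bin their depth.

    points_cam: (N, 3) array of x, y, z in metres, z pointing forward.
    K:          (3, 3) camera intrinsic matrix.
    image_hw:   (height, width) of the image in pixels.
    Returns integer arrays (u, v, bin_idx) for points that land inside the
    image and whose depth lies in [dbound[0], dbound[1]).
    """
    lower, upper, step = dbound
    z = points_cam[:, 2]
    valid = (z >= lower) & (z < upper)        # in front of the camera and in range
    pts = points_cam[valid]
    uvw = pts @ K.T                           # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = image_hw
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    bins = np.floor((pts[inside, 2] - lower) / step).astype(int)
    return u[inside], v[inside], bins
```

The resulting (u, v, bin) triplets can be turned into a sparse one-hot depth target per pixel, which is where a LiDAR-guided approach differs from the purely learned depth of LSS.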

LAPT-FPN: LAPT with Feature Pyramid Network

Adds a Feature Pyramid Network to the camera encoder for richer multi-scale image features before lifting into BEV.

  • Extras: dual-scale FPN (use_fpn: True)
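A Feature Pyramid Network combines scales by upsampling the coarser feature map and adding it to the finer one. The sketch below shows a single top-down merge between two scales with nearest-neighbour upsampling; it assumes both maps already share the same channel count (a real FPN uses a 1x1 lateral convolution for that), and it is not the exact layer layout of LAPT-FPN.

```python
import numpy as np


def upsample2x_nearest(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_merge(fine, coarse):
    """One top-down FPN step between two scales.

    fine:   (C, 2H, 2W) features from the higher-resolution encoder stage.
    coarse: (C, H, W) features from the lower-resolution stage.
    Both are assumed to already have C channels (lateral 1x1 conv omitted).
    """
    return fine + upsample2x_nearest(coarse)
```

The merged map keeps the spatial resolution of the finer stage while carrying the larger receptive field of the coarser one.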

LAPT-PP: LAPT with PointPillars

Fuses camera features (LAPT) with LiDAR features extracted by a PointPillars network. Features are combined via average-pooling fusion.

  • LiDAR encoder: Pillar Feature Network (PFN) with BN + ReLU
  • Fusion: avg_pool over camera + PointPillars feature maps
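The PointPillars idea of grouping points into vertical columns, encoding each point, and max-pooling within every pillar can be sketched on the same 200 × 200 BEV grid. This is a simplified illustration, not the network used here: a single linear layer plus ReLU stands in for the PFN (BatchNorm is assumed folded into the linear weights), the usual point-feature augmentation and per-pillar point cap are omitted, and the camera/LiDAR fusion step is not shown.

```python
import numpy as np


def pillar_scatter(points, weight, bias,
                   xbound=(-50.0, 50.0, 0.5), ybound=(-50.0, 50.0, 0.5)):
    """Encode LiDAR points into a (C, nx, ny) BEV pseudo-image, PointPillars style.

    points: (N, F) array whose first two columns are x, y in metres (ego frame).
    weight: (F, C) and bias: (C,) parameters of the per-point linear layer.
    """
    nx = round((xbound[1] - xbound[0]) / xbound[2])
    ny = round((ybound[1] - ybound[0]) / ybound[2])
    ix = np.floor((points[:, 0] - xbound[0]) / xbound[2]).astype(int)
    iy = np.floor((points[:, 1] - ybound[0]) / ybound[2]).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop points outside the grid

    feats = np.maximum(points[keep] @ weight + bias, 0.0)  # linear + ReLU -> (M, C)
    pillar_id = ix[keep] * ny + iy[keep]                    # flat index of each point's pillar

    num_channels = weight.shape[1]
    canvas = np.zeros((num_channels, nx * ny))              # empty pillars stay at 0
    for c in range(num_channels):
        # Max over all points sharing a pillar; ReLU outputs are >= 0,
        # so starting from 0 does not change occupied pillars.
        np.maximum.at(canvas[c], pillar_id, feats[:, c])
    return canvas.reshape(num_channels, nx, ny)
```

The output has the same spatial layout as a camera BEV feature map, which is what makes a cell-by-cell fusion of the two branches possible.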

LAPT-PP-FPN: LAPT + PointPillars + FPN

Combines all modules: LAPT depth guidance, FPN multi-scale camera features, and PointPillars LiDAR features.
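The five architectures differ only in which of these modules they enable, so they can be summarised as a small configuration table. The `ArchSpec` class below is a hypothetical illustration derived from the descriptions above, not a structure from the BEVal code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSpec:
    lidar_depth: bool   # LAPT: LiDAR-guided depth instead of purely learned depth
    fpn: bool           # multi-scale camera features (use_fpn: True)
    pointpillars: bool  # extra LiDAR branch fused with the camera BEV features


ARCHS = {
    "lss":         ArchSpec(lidar_depth=False, fpn=False, pointpillars=False),
    "lapt":        ArchSpec(lidar_depth=True,  fpn=False, pointpillars=False),
    "lapt_fpn":    ArchSpec(lidar_depth=True,  fpn=True,  pointpillars=False),
    "lapt_pp":     ArchSpec(lidar_depth=True,  fpn=False, pointpillars=True),
    "lapt_pp_fpn": ArchSpec(lidar_depth=True,  fpn=True,  pointpillars=True),
}


def uses_lidar(arch):
    """True if the architecture needs LiDAR input at inference time."""
    spec = ARCHS[arch]
    return spec.lidar_depth or spec.pointpillars
```

Only `lss` runs on cameras alone; every other model in the zoo needs a LiDAR sweep alongside the images.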


Usage

Installation

conda create -n beval python=3.10
conda activate beval
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install efficientnet-pytorch==0.7.0 numba==0.57.1 nuscenes-devkit==1.1.9 \
            lyft-dataset-sdk==0.0.8 yacs==0.1.8 tensorboardx==2.6.2.2 \
            matplotlib==3.5.3 shapely==1.8.2

Clone the code repository:

git clone https://github.com/manueldiaz96/beval.git
cd beval

Set environment variables pointing to your dataset directories:

export NUSCENES=/path/to/nuscenes
export LYFT=/path/to/woven_planet

Note: The Woven Planet (Lyft) models require sub-sampled LiDAR point clouds.
Download them from manutheeng/subsampled-lyft and place them in $LYFT/subsampled_lidar/.
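Before running the test scripts it can help to check these paths. The helper below is hypothetical (it is not part of the repository) and only mirrors the requirements stated above: `$NUSCENES` and `$LYFT` should point to existing directories, and `$LYFT/subsampled_lidar/` should exist for the Lyft models.

```python
import os
from pathlib import Path


def check_dataset_env(env=None):
    """Return a list of problems with the NUSCENES / LYFT setup (empty if OK)."""
    env = os.environ if env is None else env
    problems = []
    for var in ("NUSCENES", "LYFT"):
        path = env.get(var)
        if not path:
            problems.append(f"${var} is not set")
        elif not Path(path).is_dir():
            problems.append(f"${var} points to a missing directory: {path}")
    lyft = env.get("LYFT")
    if lyft and Path(lyft).is_dir() and not (Path(lyft) / "subsampled_lidar").is_dir():
        problems.append("sub-sampled LiDAR not found in $LYFT/subsampled_lidar/")
    return problems
```

If you only evaluate nuScenes models, the `$LYFT` messages can be ignored.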

Running evaluation

Use the test script that matches the architecture:

# LSS (camera only)
python test_lift_splat.py \
    --cfg  path/to/model_zoo/nuscenes/lss/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lss/vehicle/model_best.pt

# LAPT / LAPT-FPN
python test_LAPT.py \
    --cfg  path/to/model_zoo/nuscenes/lapt/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lapt/vehicle/model_best.pt

# LAPT-PP / LAPT-PP-FPN
python test_LAPT_PP.py \
    --cfg  path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/config.yaml \
    --weights path/to/model_zoo/nuscenes/lapt_pp_fpn/vehicle/model_best.pt
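Because the test script is determined by the architecture, the evaluation command for any checkpoint can be generated programmatically. The sketch below is a hypothetical helper; the script names and the `--cfg`/`--weights` flags are the ones shown in the commands above.

```python
from pathlib import Path

# Test script per architecture, as listed in the commands above.
TEST_SCRIPTS = {
    "lss": "test_lift_splat.py",
    "lapt": "test_LAPT.py",
    "lapt_fpn": "test_LAPT.py",
    "lapt_pp": "test_LAPT_PP.py",
    "lapt_pp_fpn": "test_LAPT_PP.py",
}


def eval_command(zoo_root, dataset, architecture, task):
    """Build the argv list that evaluates one checkpoint from the zoo."""
    model_dir = Path(zoo_root) / dataset / architecture / task
    return ["python", TEST_SCRIPTS[architecture],
            "--cfg", str(model_dir / "config.yaml"),
            "--weights", str(model_dir / "model_best.pt")]
```

From the `beval` directory, `subprocess.run(eval_command("path/to/model_zoo", "lyft", "lapt_pp", "vehicle"), check=True)` runs one evaluation; looping over datasets, architectures and tasks covers the whole zoo.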

The script prints per-class IoU (%) for the validation split.
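For reference, the intersection-over-union of a single binary class is |pred ∩ gt| / |pred ∪ gt|. The sketch below computes it from raw logits; thresholding logits at 0 is equivalent to a sigmoid probability of 0.5, which is an assumption about how the scripts binarize predictions. The scripts may also accumulate intersection and union over the whole split before dividing, rather than averaging per-sample scores.

```python
import numpy as np


def binary_iou(logits, target, threshold=0.0):
    """IoU (%) of one class from raw logits and a {0, 1} ground-truth mask.

    A logit threshold of 0.0 corresponds to a sigmoid probability of 0.5.
    Returns NaN when both prediction and ground truth are empty.
    """
    pred = logits > threshold
    gt = target.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 100.0 * intersection / union if union > 0 else float("nan")
```

To accumulate over a split, sum `intersection` and `union` across samples first and divide once at the end.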

Loading weights directly in Python

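import torch
from models.LAPTNet import compile_model  # or lift_splat / LAPTNet_PP

# Use the values stored in the model's config.yaml. The zbound, dbound and
# final_dim entries below are the original Lift-Splat-Shoot defaults (assumed);
# LSS-derived models read them when building the camera frustum.
grid_conf    = {"xbound": [-50., 50., 0.5], "ybound": [-50., 50., 0.5],
                "zbound": [-10., 10., 20.], "dbound": [4., 45., 1.]}
data_aug_conf = {"ncams": 6, "rand_flip": False, "pc_rot": (-20, 20),
                 "final_dim": (128, 352),
                 "cams": ["CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
                          "CAM_BACK_RIGHT", "CAM_BACK", "CAM_BACK_LEFT"]}

model = compile_model(grid_conf, data_aug_conf, outC=1, use_fpn=False)
# map_location="cpu" lets the checkpoint load on machines without a GPU
state_dict = torch.load("nuscenes/lapt/vehicle/model_best.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout and use BatchNorm running statistics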
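Once the model is loaded, a single-class model (`outC=1`) produces one channel of logits per BEV cell. The post-processing sketch below assumes outputs of shape (B, 1, 200, 200) converted to NumPy (for example with `.detach().cpu().numpy()`) and a 0.5 probability threshold; neither the output layout nor the threshold is specified in this card. Each cell covers 0.5 m × 0.5 m = 0.25 m², so counting positive cells gives an area estimate.

```python
import numpy as np

CELL_AREA_M2 = 0.5 * 0.5  # each BEV cell covers 0.5 m x 0.5 m


def logits_to_mask(logits, prob_threshold=0.5):
    """Convert (B, 1, H, W) logits into a boolean (B, H, W) mask."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return probs[:, 0] > prob_threshold


def occupied_area_m2(mask):
    """Area in square metres covered by positive cells, per batch element."""
    return mask.reshape(mask.shape[0], -1).sum(axis=1) * CELL_AREA_M2
```

For example, a predicted strip 4 cells deep across the full 200-cell width covers 800 cells, i.e. 200 m².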

Training Data

| Split | Dataset | Cameras | LiDAR |
|---|---|---|---|
| lyft | Woven Planet Perception Dataset (trainval) | 5 surround cameras | ✓ (sub-sampled) |
| nuscenes | nuScenes (trainval) | 6 surround cameras | ✓ |
| nusc-lyft | nuScenes + Woven Planet (joint) | 5–6 cameras | ✓ |

BEV grid configuration (all models)

| Parameter | Value |
|---|---|
| X / Y range | ±50 m |
| Resolution | 0.5 m/cell |
| Grid size | 200 × 200 |

Results

Please refer to the paper for full quantitative results. The figures below are reproduced from the paper.

Cross-dataset evaluation (models trained on one dataset and evaluated on the other):

[Figure: Cross-dataset evaluation results]

Cross-dataset training (models trained on the combined nusc-lyft split):

[Figure: Cross-dataset training results]


Citation

If this model zoo is useful in your research, please cite:

@inproceedings{beval,
  title     = {BEVal: A Cross-dataset Evaluation Study of {BEV} Segmentation Models for Autonomous Driving},
  author    = {Diaz-Zapata, Manuel and Liu, Wenqian and Baruffa, Robin and Laugier, Christian},
  booktitle = {2024 18th International Conference on Control, Automation, Robotics and Vision (ICARCV)},
  pages     = {704--709},
  year      = {2024},
  organization = {IEEE}
}

License

This model zoo is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Commercial use is not permitted.

Parts of the codebase are derived from Lift-Splat-Shoot (NVIDIA, MIT-licensed for the original code).
