---
language:
- en
tags:
- autonomous-vehicles
- driving
- representation-learning
- multi-task-learning
- computer-vision
- safety
license: mit
---
# DriveBench: General-Purpose Driving Scene Encoder
**Author:** Nikhil Upadhyay | MSc Business Analytics | Dublin Business School
**Project:** [PRECOG-AV](https://github.com/TrazeMaG/PRECOG-AV)
## Overview
DriveBench is the first general-purpose driving scene encoder trained with
safety-focused multi-task supervision across **25 countries and 298,326 real
driving clips**, the largest geographic scale in driving representation learning.
Each clip is encoded into a **256-dimensional DriveBench embedding** that
simultaneously captures danger context, geographic driving patterns,
time-of-day risk, radar sensor health, and traffic density.
Use these embeddings like ImageNet features, but for driving scenes.
## Results
| Task | Metric | Score | Random Baseline |
|------|--------|-------|-----------------|
| Danger Anticipation | AUC | **0.8385** | 0.500 |
| Geographic Region | Accuracy | **0.4438** | 0.167 (6 classes) |
| Time of Day | Accuracy | **0.5168** | 0.250 (4 classes) |
| Radar Health | AUC | **1.0000** | 0.500 |
| TTC Regression | Pearson r | **0.3009** | 0.000 |
Tested on Greece and Bulgaria, two countries never seen during training.
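
To make the Random Baseline column concrete: AUC is the probability that a randomly chosen positive clip is scored above a randomly chosen negative one, so uninformative scores give 0.5, and uniform guessing over k classes gives an accuracy of 1/k. The sketch below computes both with the standard library only. The function names and toy scores are illustrative assumptions and are not the evaluation code behind the table.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


labels = [1, 1, 0, 0]  # toy labels, not DriveBench data
print(pairwise_auc([0.9, 0.4, 0.6, 0.1], labels))  # 0.75
print(pairwise_auc([0.5, 0.5, 0.5, 0.5], labels))  # 0.5
print(round(chance_accuracy(6), 3))  # 0.167
print(chance_accuracy(4))  # 0.25
```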
## What makes this different
All existing driving pre-training (DriveWorld, DriveTok, GASP) uses geometric
proxy tasks (depth prediction, occupancy, reconstruction) on 1 to 3 cities.
DriveBench uses **safety-relevant supervision signals** across **25 countries**:
- Danger labels from physics-based TTC analysis (not manual annotation)
- Radar sensor health as a training signal
- Geographic region (6 regions, 25 countries)
- Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
- Traffic density
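
The first bullet derives danger labels from time-to-collision (TTC) rather than manual annotation. A minimal sketch of that idea follows, assuming a constant closing speed and a hypothetical 2.0-second danger threshold. The function names and the threshold are illustrative assumptions; the actual DriveBench labelling pipeline is not described in this README.

```python
import math

DANGER_TTC_SECONDS = 2.0  # hypothetical threshold, not taken from DriveBench


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Constant-velocity TTC: gap divided by closing speed.

    Returns infinity when the gap is not shrinking (closing speed <= 0).
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def danger_label(gap_m: float, closing_speed_mps: float,
                 threshold_s: float = DANGER_TTC_SECONDS) -> int:
    """Label a moment as dangerous (1) when TTC falls below the threshold."""
    return int(time_to_collision(gap_m, closing_speed_mps) < threshold_s)


# 12 m gap closing at 8 m/s gives a TTC of 1.5 s, below the 2.0 s threshold
print(danger_label(12.0, 8.0))   # 1
# 40 m gap closing at 5 m/s gives a TTC of 8.0 s
print(danger_label(40.0, 5.0))   # 0
# Lead vehicle pulling away: TTC is infinite
print(danger_label(10.0, -3.0))  # 0
```

Because the label is a deterministic function of measured gap and speed, it can be produced for every clip without human annotators, which is what makes this kind of supervision practical at the scale of hundreds of thousands of clips.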
## Architecture
    ViT-B/16 features (5 frames × 768-dim)
        ↓
    TransformerEncoder (3 layers, 8 heads, 2048 FFN)
        ↓
    DriveBench Embedding (256-dim) ← use this downstream
        ↓
    5 multi-task heads:
        Danger head    → AUC 0.84
        Region head    → Acc 0.44 (6 regions)
        Time-of-day    → Acc 0.52 (4 buckets)
        Radar head     → AUC 1.00
        TTC regression → r = 0.30
## Usage
```python
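import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class DriveBenchModel(nn.Module):
    """Encodes 5 frames of ViT-B/16 features into one 256-dim embedding."""

    # n_regions is accepted but unused in this encoder-only snippet
    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
        super().__init__()
        # Learnable [CLS] token and positions for [CLS] + n_frames tokens
        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
        self.pos_embed = nn.Embedding(n_frames + 1, 768)
        layer = nn.TransformerEncoderLayer(
            d_model=768, nhead=8, dim_feedforward=2048,
            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.norm = nn.LayerNorm(768)
        # Project the [CLS] output down to the embedding size
        self.projector = nn.Sequential(
            nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))

    def encode(self, x):
        # x: (batch, n_frames, 768)
        B = x.shape[0]
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1)
        pos = torch.arange(x.shape[1], device=x.device)
        x = x + self.pos_embed(pos)
        x = self.norm(self.transformer(x))
        return self.projector(x[:, 0])


path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
model = DriveBenchModel()
# weights_only=False unpickles arbitrary objects; only load checkpoints you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# If the checkpoint also contains task-head weights, this strict load raises on
# unexpected keys; pass strict=False to load the encoder only
model.load_state_dict(ckpt["model_state"])
model.eval()

# Input: (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
# Output: (batch, 256) DriveBench embedding
# Use as features for any downstream driving task
features = torch.randn(2, 5, 768)  # placeholder input; substitute real ViT-B/16 features
with torch.no_grad():
    embedding = model.encode(features)  # (2, 256)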
```
## Pre-computed Embeddings
All 298,326 embeddings are already computed; download and use them directly:
```python
import numpy as np
from huggingface_hub import hf_hub_download
path = hf_hub_download(
    "Trazemag/DriveBench-Embeddings",
    "drivebench_embeddings.npz",
    repo_type="dataset")
data = np.load(path)
embeddings = data["embeddings"] # (298326, 256)
```
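
One direct use of the pre-computed array is nearest-neighbour retrieval over clips. The sketch below ranks clips by cosine similarity. It runs on random stand-in vectors of the same width (256) so that it works without the download, and the helper name `top_k_similar` is hypothetical rather than part of any DriveBench tooling.

```python
import numpy as np


def top_k_similar(embeddings: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Return indices of the k clips most similar to the query clip.

    Similarity is cosine similarity; the query itself is excluded.
    """
    # L2-normalise rows so a dot product equals cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # exclude the query from its own results
    return np.argsort(-sims)[:k]


# Stand-in for data["embeddings"]: 1000 random 256-dim vectors
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256)).astype(np.float32)
# Make clip 42 a near-duplicate of clip 7 so the expected neighbour is known
embeddings[42] = embeddings[7] + 0.01 * rng.normal(size=256).astype(np.float32)

neighbours = top_k_similar(embeddings, query_index=7, k=5)
print(neighbours[0])  # 42
```

With the real file, replace the stand-in array with `data["embeddings"]`; the full 298,326 × 256 matrix is small enough for a single matrix-vector product per query.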
## Training Data
Built on the [NVIDIA PhysicalAI-AV](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles)
dataset (gated; request access on Hugging Face).
Danger labels available at [Trazemag/PRECOG-Labels](https://huggingface.co/datasets/Trazemag/PRECOG-Labels).
## Related Models
| Model | Task | Link |
|-------|------|------|
| PRECOG-SENSE | Radar health from camera | [Trazemag/PRECOG-SENSE](https://huggingface.co/Trazemag/PRECOG-SENSE) |
| PRECOG-HERALD | Danger anticipation | [Trazemag/PRECOG-HERALD](https://huggingface.co/Trazemag/PRECOG-HERALD) |
| DriveBench | General scene encoder | This model |
## Citation
```bibtex
@misc{upadhyay2026drivebench,
  title  = {DriveBench: General-Purpose Driving Scene Encoder
            via Multi-Task Safety-Focused Pre-training across 25 Countries},
  author = {Upadhyay, Nikhil},
  year   = {2026},
  url    = {https://github.com/TrazeMaG/PRECOG-AV}
}
```