---
language:
- en
tags:
- autonomous-vehicles
- driving
- representation-learning
- multi-task-learning
- computer-vision
- safety
license: mit
---

# DriveBench: General-Purpose Driving Scene Encoder

**Author:** Nikhil Upadhyay | MSc Business Analytics | Dublin Business School

**Project:** [PRECOG-AV](https://github.com/TrazeMaG/PRECOG-AV)

## Overview

DriveBench is the first general-purpose driving scene encoder trained with safety-focused multi-task supervision across **25 countries and 298,326 real driving clips**, the largest geographic scale in driving representation learning.

Each clip is encoded into a **256-dimensional DriveBench embedding** that simultaneously captures danger context, geographic driving patterns, time-of-day risk, radar sensor health, and traffic density.

Use these embeddings like ImageNet features, but for driving scenes.

## Results

| Task | Metric | Score | Random Baseline |
|------|--------|-------|-----------------|
| Danger Anticipation | AUC | **0.8385** | 0.500 |
| Geographic Region | Accuracy | **0.4438** | 0.167 (6 classes) |
| Time of Day | Accuracy | **0.5168** | 0.250 (4 classes) |
| Radar Health | AUC | **1.0000** | 0.500 |
| TTC Regression | Pearson r | **0.3009** | 0.000 |

Tested on Greece and Bulgaria, countries never seen during training.

## What makes this different

All existing driving pre-training (DriveWorld, DriveTok, GASP) uses geometric proxy tasks (depth prediction, occupancy, reconstruction) on 1 to 3 cities.
DriveBench uses **safety-relevant supervision signals** across **25 countries**:

- Danger labels from physics-based TTC analysis (not manual annotation)
- Radar sensor health as a training signal
- Geographic region (6 regions, 25 countries)
- Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
- Traffic density

## Architecture

ViT-B/16 features (5 frames × 768-dim)

↓

TransformerEncoder (3 layers, 8 heads, 2048 FFN)

↓

DriveBench Embedding (256-dim) ← use this downstream

↓

5 multi-task heads:

- Danger head → AUC 0.84
- Region head → Acc 0.44 (6 regions)
- Time-of-day → Acc 0.52 (4 buckets)
- Radar head → AUC 1.00
- TTC regression → r = 0.30

## Usage

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class DriveBenchModel(nn.Module):
    """Encoder-only view of DriveBench: frame features in, 256-dim embedding out."""

    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
        # n_regions is accepted but not used by this encoder-only class.
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
        self.pos_embed = nn.Embedding(n_frames + 1, 768)  # +1 for the CLS slot
        layer = nn.TransformerEncoderLayer(
            d_model=768, nhead=8, dim_feedforward=2048,
            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.norm = nn.LayerNorm(768)
        self.projector = nn.Sequential(
            nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))

    def encode(self, x):
        # x: (batch, n_frames, 768)
        B = x.shape[0]
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1)                   # prepend CLS token
        pos = torch.arange(x.shape[1], device=x.device)
        x = x + self.pos_embed(pos)                      # learned positions
        x = self.norm(self.transformer(x))
        return self.projector(x[:, 0])                   # project the CLS output


path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
model = DriveBenchModel()
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# If the checkpoint also stores task-head weights not defined above, pass strict=False.
model.load_state_dict(ckpt["model_state"])
model.eval()

# Input: (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
# Output: (batch, 256) DriveBench embedding
# Use as features for any downstream driving task
```

## Pre-computed Embeddings

298,326 embeddings are already computed;
download and use directly:

```python
import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    "Trazemag/DriveBench-Embeddings",
    "drivebench_embeddings.npz",
    repo_type="dataset")
data = np.load(path)
embeddings = data["embeddings"]  # (298326, 256)
```

## Training Data

Built on the [NVIDIA PhysicalAI-AV](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles) dataset (gated; request access at HuggingFace).

Danger labels available at [Trazemag/PRECOG-Labels](https://huggingface.co/datasets/Trazemag/PRECOG-Labels).

## Related Models

| Model | Task | Link |
|-------|------|------|
| PRECOG-SENSE | Radar health from camera | [Trazemag/PRECOG-SENSE](https://huggingface.co/Trazemag/PRECOG-SENSE) |
| PRECOG-HERALD | Danger anticipation | [Trazemag/PRECOG-HERALD](https://huggingface.co/Trazemag/PRECOG-HERALD) |
| DriveBench | General scene encoder | This model |

## Citation

```bibtex
@misc{upadhyay2026drivebench,
  title  = {DriveBench: General-Purpose Driving Scene Encoder via Multi-Task Safety-Focused Pre-training across 25 Countries},
  author = {Upadhyay, Nikhil},
  year   = {2026},
  url    = {https://github.com/TrazeMaG/PRECOG-AV}
}
```
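
## Example: Similarity Search over Embeddings

One way to use the pre-computed embeddings as features is cosine-similarity retrieval, for example to find clips that resemble a given scene. The sketch below is an illustration and not part of the DriveBench release: `top_k_similar` is a hypothetical helper, and the random array is a synthetic stand-in with the documented embedding width of 256, so the snippet runs without downloading anything.

```python
import numpy as np


def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int = 5) -> np.ndarray:
    """Return indices of the k rows most cosine-similar to row `query_idx`.

    Hypothetical helper for illustration; requires k < len(embeddings).
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # exclude the query clip itself
    # argpartition selects the top-k candidates without sorting every row.
    top = np.argpartition(-sims, k)[:k]
    # Order the k candidates from most to least similar.
    return top[np.argsort(-sims[top])]


# Synthetic stand-in (assumption): replace with data["embeddings"] for real clips.
rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 256)).astype(np.float32)
demo[42] = demo[7] + 0.01 * rng.normal(size=256)  # plant a near-duplicate of row 7

neighbours = top_k_similar(demo, query_idx=7, k=5)
print(neighbours)
```

On the synthetic data, the planted near-duplicate (row 42) should come back as the closest neighbour of row 7. For the full set of 298,326 embeddings, the normalised copy needs roughly 305 MB if the array is stored as float32 (the dtype is an assumption), so a dedicated nearest-neighbour index may be preferable for repeated queries.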