Trazemag/DriveBench

+---
+language:
+- en
+tags:
+- autonomous-vehicles
+- driving
+- representation-learning
+- multi-task-learning
+- computer-vision
+- safety
+license: mit
+---
+# DriveBench: General-Purpose Driving Scene Encoder
+**Author:** Nikhil Upadhyay | MSc Business Analytics | Dublin Business School
+**Project:** [PRECOG-AV](https://github.com/TrazeMaG/PRECOG-AV)
+## Overview
+DriveBench is, to the author's knowledge, the first general-purpose driving scene
+encoder trained with safety-focused multi-task supervision across **25 countries
+and 298,326 real driving clips**, which the author reports as the largest
+geographic scale in driving representation learning.
+Each clip is encoded into a **256-dimensional DriveBench embedding** that
+simultaneously captures danger context, geographic driving patterns,
+time-of-day risk, radar sensor health, and traffic density.
+Use these embeddings the way you would use ImageNet features, but for driving scenes.
+## Results
+| Task | Metric | Score | Random Baseline |
+|------|--------|-------|-----------------|
+| Danger Anticipation | AUC | **0.8385** | 0.500 |
+| Geographic Region | Accuracy | **0.4438** | 0.167 (6 classes) |
+| Time of Day | Accuracy | **0.5168** | 0.250 (4 classes) |
+| Radar Health | AUC | **1.0000** | 0.500 |
+| TTC Regression | Pearson r | **0.3009** | 0.000 |
+Tested on Greece and Bulgaria, two countries never seen during training.
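For readers who want to sanity-check the metrics in the table, AUC and Pearson r can be computed without extra dependencies. The sketch below is an illustration only, not the project's evaluation code; the function names `roc_auc` and `pearson_r` are introduced here. AUC is computed as the probability that a random positive outscores a random negative (ties count half), which is why a random scorer lands near 0.5.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as P(positive score > negative score); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

The pairwise loop is quadratic in the number of examples, so it suits small checks; a rank-based implementation is the usual choice at the scale of the full dataset.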
+## What makes this different
+Existing driving pre-training methods such as DriveWorld, DriveTok, and GASP rely
+on geometric proxy tasks (depth prediction, occupancy, reconstruction) trained on
+data from 1 to 3 cities.
+DriveBench uses **safety-relevant supervision signals** across **25 countries**:
+- Danger labels from physics-based TTC analysis (not manual annotation)
+- Radar sensor health as a training signal
+- Geographic region (6 regions, 25 countries)
+- Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
+- Traffic density
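To make "physics-based TTC analysis" concrete: time-to-collision (TTC) is the gap to the object ahead divided by the closing speed. The sketch below shows one way such a label could be derived under a constant-velocity assumption. It is not the PRECOG-AV labeling pipeline; the function names and the 2.0-second threshold are hypothetical.

```python
import math


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-velocity TTC in seconds; infinite when the gap is not closing."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def danger_label(gap_m, ego_speed_mps, lead_speed_mps, threshold_s=2.0):
    """Return 1 when TTC is below the (hypothetical) threshold, else 0."""
    ttc = time_to_collision(gap_m, ego_speed_mps, lead_speed_mps)
    return int(ttc < threshold_s)
```

For example, a 10 m gap closing at 10 m/s gives a TTC of 1 second, which this sketch would label as dangerous.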
+## Architecture
+ViT-B/16 features (5 frames × 768-dim)
+↓
+TransformerEncoder (3 layers, 8 heads, 2048 FFN)
+↓
+DriveBench Embedding (256-dim)  ← use this downstream
+↓
+5 multi-task heads:
+Danger head     → AUC 0.84
+Region head     → Acc 0.44 (6 regions)
+Time-of-day     → Acc 0.52 (4 buckets)
+Radar head      → AUC 1.00
+TTC regression  → r = 0.30
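The usage code in this card defines only the encoder, so the head shapes are not shown. The following is a guess at what the five heads could look like as single linear layers on the 256-dim embedding. The class name `MultiTaskHeads`, the attribute names, and the choice of plain linear layers are assumptions, not the released architecture.

```python
import torch
import torch.nn as nn


class MultiTaskHeads(nn.Module):
    """Hypothetical linear heads on top of a 256-dim scene embedding."""

    def __init__(self, embed_dim=256, n_regions=6, n_time_buckets=4):
        super().__init__()
        self.danger = nn.Linear(embed_dim, 1)                    # binary logit
        self.region = nn.Linear(embed_dim, n_regions)            # 6-way logits
        self.time_of_day = nn.Linear(embed_dim, n_time_buckets)  # 4-way logits
        self.radar = nn.Linear(embed_dim, 1)                     # binary logit
        self.ttc = nn.Linear(embed_dim, 1)                       # regression output

    def forward(self, emb):
        # emb: (batch, embed_dim) -> dict of per-task outputs
        return {
            "danger": self.danger(emb).squeeze(-1),
            "region": self.region(emb),
            "time_of_day": self.time_of_day(emb),
            "radar": self.radar(emb).squeeze(-1),
            "ttc": self.ttc(emb).squeeze(-1),
        }


heads = MultiTaskHeads()
outputs = heads(torch.randn(2, 256))  # random stand-in for real embeddings
```

Returning a dict keyed by task keeps each loss term easy to compute and weight separately during multi-task training.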
+## Usage
+```python
+import torch
+import torch.nn as nn
+from huggingface_hub import hf_hub_download
+
+
+class DriveBenchModel(nn.Module):
+    """Temporal encoder over per-frame ViT-B/16 features."""
+
+    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
+        # n_regions is unused by the encoder shown here.
+        super().__init__()
+        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
+        self.pos_embed = nn.Embedding(n_frames + 1, 768)  # +1 for the CLS slot
+        layer = nn.TransformerEncoderLayer(
+            d_model=768, nhead=8, dim_feedforward=2048,
+            dropout=0.1, batch_first=True, norm_first=True)
+        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
+        self.norm = nn.LayerNorm(768)
+        self.projector = nn.Sequential(
+            nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
+            nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))
+
+    def encode(self, x):
+        # x: (batch, n_frames, 768)
+        B = x.shape[0]
+        cls = self.cls_token.expand(B, -1, -1)
+        x = torch.cat([cls, x], dim=1)  # prepend CLS -> (batch, n_frames + 1, 768)
+        pos = torch.arange(x.shape[1], device=x.device)
+        x = x + self.pos_embed(pos)
+        x = self.norm(self.transformer(x))
+        return self.projector(x[:, 0])  # project the CLS token to embed_dim
+
+
+path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
+model = DriveBenchModel()
+# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
+ckpt = torch.load(path, map_location="cpu", weights_only=False)
+model.load_state_dict(ckpt["model_state"])
+model.eval()
+
+# Input:  (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
+# Output: (batch, 256) DriveBench embedding
+features = torch.randn(1, 5, 768)  # placeholder; replace with real ViT-B/16 features
+with torch.no_grad():
+    embedding = model.encode(features)  # (1, 256); use as features downstream
+```
+## Pre-computed Embeddings
+Embeddings for all 298,326 clips are precomputed and can be downloaded and used directly:
+```python
+import numpy as np
+from huggingface_hub import hf_hub_download
+path = hf_hub_download(
+    "Trazemag/DriveBench-Embeddings",
+    "drivebench_embeddings.npz",
+    repo_type="dataset")
+data = np.load(path)
+embeddings = data["embeddings"]  # (298326, 256)
+```
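One simple way to use the precomputed embeddings is nearest-neighbour retrieval of similar scenes. The sketch below assumes only an `(N, 256)` array like the one loaded above. The function name `top_k_similar` is introduced here for illustration, and random data stands in for the real embeddings.

```python
import numpy as np


def top_k_similar(embeddings, query_idx, k=5):
    """Indices of the k rows most cosine-similar to row query_idx, excluding itself."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # never return the query itself
    return np.argsort(-sims)[:k]


rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 256)).astype(np.float32)  # stand-in for real embeddings
demo[1] = demo[0] + 0.01 * rng.normal(size=256)        # make row 1 a near-duplicate of row 0
neighbours = top_k_similar(demo, query_idx=0, k=3)
```

With the real file, passing `embeddings` in place of `demo` returns the clips closest to the chosen query clip in embedding space.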
+## Training Data
+Built on the [NVIDIA PhysicalAI-AV](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles)
+dataset (gated; request access on Hugging Face).
+Danger labels are available at [Trazemag/PRECOG-Labels](https://huggingface.co/datasets/Trazemag/PRECOG-Labels).
+## Related Models
+| Model | Task | Link |
+|-------|------|------|
+| PRECOG-SENSE | Radar health from camera | [Trazemag/PRECOG-SENSE](https://huggingface.co/Trazemag/PRECOG-SENSE) |
+| PRECOG-HERALD | Danger anticipation | [Trazemag/PRECOG-HERALD](https://huggingface.co/Trazemag/PRECOG-HERALD) |
+| DriveBench | General scene encoder | This model |
+## Citation
+```bibtex
+@misc{upadhyay2026drivebench,
+  title  = {DriveBench: General-Purpose Driving Scene Encoder
+            via Multi-Task Safety-Focused Pre-training across 25 Countries},
+  author = {Upadhyay, Nikhil},
+  year   = {2026},
+  url    = {https://github.com/TrazeMaG/PRECOG-AV}
+}
+```