	---
	language:
	- en
	tags:
	- autonomous-vehicles
	- driving
	- representation-learning
	- multi-task-learning
	- computer-vision
	- safety
	license: mit
	---

	# DriveBench: General-Purpose Driving Scene Encoder

Author: Nikhil Upadhyay | MSc Business Analytics | Dublin Business School

Project: [PRECOG-AV](https://github.com/TrazeMaG/PRECOG-AV)

	## Overview

DriveBench is a general-purpose driving scene encoder trained with
safety-focused multi-task supervision across **25 countries and 298,326 real
driving clips**. To the author's knowledge, it is the first encoder of this kind
and covers the largest geographic scale in driving representation learning.

Each clip is encoded into a 256-dimensional DriveBench embedding that
simultaneously captures danger context, geographic driving patterns,
time-of-day risk, radar sensor health, and traffic density.
Use these embeddings the way you would use ImageNet features, but for driving scenes.
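A common way to use frozen features "like ImageNet features" is a linear probe: keep the embeddings fixed and train only a small linear classifier on top. The sketch below is an illustration under stated assumptions, not the project's evaluation code. It uses synthetic 256-dimensional vectors and synthetic binary labels in place of real DriveBench embeddings, and the helper names `fit_linear_probe` and `probe_accuracy` are hypothetical.

```python
import numpy as np

def fit_linear_probe(X, y, lr=0.1, epochs=200):
    """Logistic-regression probe trained with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        error = probs - y                            # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of samples whose predicted class matches the label."""
    preds = (X @ w + b) > 0
    return float((preds == (y > 0.5)).mean())

# Synthetic stand-in for (n_clips, 256) embeddings and a binary downstream label.
rng = np.random.default_rng(0)
X = rng.standard_normal((2500, 256))
hidden_direction = rng.standard_normal(256)
y = (X @ hidden_direction > 0).astype(float)

w, b = fit_linear_probe(X[:2000], y[:2000])          # train split
accuracy = probe_accuracy(X[2000:], y[2000:], w, b)  # held-out split
print(f"held-out probe accuracy: {accuracy:.3f}")
```

With real data, `X` would be the DriveBench embeddings and `y` the labels of the downstream task.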

	## Results

| Task | Metric | Score | Random Baseline |
|------|--------|-------|-----------------|
| Danger Anticipation | AUC | 0.8385 | 0.500 |
| Geographic Region | Accuracy | 0.4438 | 0.167 (6 classes) |
| Time of Day | Accuracy | 0.5168 | 0.250 (4 classes) |
| Radar Health | AUC | 1.0000 | 0.500 |
| TTC Regression | Pearson r | 0.3009 | 0.000 |

Evaluated on Greece and Bulgaria, two countries never seen during training.
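For readers less familiar with these metrics, the sketch below computes each one from scratch and shows where the random baselines come from: a uniform guess over k balanced classes scores 1/k, an uninformative ranker has AUC 0.5, and uncorrelated predictions have Pearson r near 0. This is not the project's evaluation code; the function names and toy inputs are illustrative assumptions.

```python
import math

def chance_accuracy(num_classes):
    """Expected accuracy of a uniform random guess over balanced classes."""
    return 1.0 / num_classes

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(round(chance_accuracy(6), 3))                       # 0.167 (region baseline)
print(chance_accuracy(4))                                 # 0.25  (time-of-day baseline)
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))       # 0.75
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))        # 1.0
```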

	## What makes this different

Existing driving pre-training methods such as DriveWorld, DriveTok, and GASP use
geometric proxy tasks (depth prediction, occupancy, reconstruction) on data from 1 to 3 cities.

	DriveBench uses safety-relevant supervision signals across 25 countries:
	- Danger labels from physics-based TTC analysis (not manual annotation)
	- Radar sensor health as a training signal
	- Geographic region (6 regions, 25 countries)
- Time-of-day risk patterns (danger peaks between 13:00 and 15:00)
	- Traffic density
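This card does not spell out how the physics-based TTC labels are derived. As a minimal sketch, assuming the standard constant-velocity definition (time-to-collision = gap / closing speed) and a hypothetical 2-second danger threshold, a label could be produced as follows. The function names, the inputs, and the threshold are assumptions for illustration, not the PRECOG-AV pipeline.

```python
import math

def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-velocity TTC in seconds; infinite when the gap is not closing."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed

def danger_label(ttc_s, threshold_s=2.0):
    """Return 1 if TTC is below the (assumed) threshold, else 0."""
    return int(ttc_s < threshold_s)

# Ego at 25 m/s closing on a lead vehicle at 10 m/s, 20 m ahead.
ttc = time_to_collision(gap_m=20.0, ego_speed_mps=25.0, lead_speed_mps=10.0)
label = danger_label(ttc)
print(f"TTC = {ttc:.2f} s, danger = {label}")  # TTC = 1.33 s, danger = 1
```

Labels derived this way need no manual annotation, which matches the motivation given in the list above.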

## Architecture

ViT-B/16 features (5 frames × 768-dim)

↓

TransformerEncoder (3 layers, 8 heads, 2048 FFN)

↓

DriveBench Embedding (256-dim) ← use this downstream

↓

5 multi-task heads:
- Danger head → AUC 0.84
- Region head → Acc 0.44 (6 regions)
- Time-of-day head → Acc 0.52 (4 buckets)
- Radar head → AUC 1.00
- TTC regression → r = 0.30
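The task heads themselves are not defined anywhere in this card. One plausible layout, assuming each head is a single linear layer on the 256-dim embedding and that the per-task losses are summed without weights, is sketched below. The class name, attribute names, and loss combination are all hypothetical; the heads stored in the checkpoint may be deeper, named differently, or weighted differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Hypothetical heads on top of the 256-dim embedding (illustrative only)."""

    def __init__(self, embed_dim=256, n_regions=6, n_time_buckets=4):
        super().__init__()
        self.danger = nn.Linear(embed_dim, 1)                    # binary logit
        self.region = nn.Linear(embed_dim, n_regions)            # 6-way logits
        self.time_of_day = nn.Linear(embed_dim, n_time_buckets)  # 4-way logits
        self.radar = nn.Linear(embed_dim, 1)                     # binary logit
        self.ttc = nn.Linear(embed_dim, 1)                       # scalar regression

    def forward(self, emb):
        return {
            "danger": self.danger(emb).squeeze(-1),
            "region": self.region(emb),
            "time_of_day": self.time_of_day(emb),
            "radar": self.radar(emb).squeeze(-1),
            "ttc": self.ttc(emb).squeeze(-1),
        }

def multitask_loss(out, target):
    """Unweighted sum of per-task losses (the real weighting is not documented)."""
    return (
        F.binary_cross_entropy_with_logits(out["danger"], target["danger"])
        + F.cross_entropy(out["region"], target["region"])
        + F.cross_entropy(out["time_of_day"], target["time_of_day"])
        + F.binary_cross_entropy_with_logits(out["radar"], target["radar"])
        + F.mse_loss(out["ttc"], target["ttc"])
    )

heads = MultiTaskHeads()
emb = torch.randn(4, 256)  # stand-in for a batch of DriveBench embeddings
out = heads(emb)
target = {
    "danger": torch.tensor([0.0, 1.0, 0.0, 1.0]),
    "region": torch.tensor([0, 3, 5, 2]),
    "time_of_day": torch.tensor([1, 0, 3, 2]),
    "radar": torch.tensor([1.0, 1.0, 0.0, 1.0]),
    "ttc": torch.tensor([3.2, 0.9, 5.0, 1.4]),
}
loss = multitask_loss(out, target)
print({name: tuple(t.shape) for name, t in out.items()}, float(loss))
```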

	## Usage

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


class DriveBenchModel(nn.Module):
    """Encoder path of DriveBench: 5 ViT-B/16 frame features -> 256-dim embedding."""

    def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
        super().__init__()
        # Note: n_regions is not used by the encoder path shown here.
        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))  # learnable [CLS] token
        self.pos_embed = nn.Embedding(n_frames + 1, 768)       # positions for [CLS] + frames
        layer = nn.TransformerEncoderLayer(
            d_model=768, nhead=8, dim_feedforward=2048,
            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.norm = nn.LayerNorm(768)
        # Project the [CLS] output from 768 dims down to embed_dim.
        self.projector = nn.Sequential(
            nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))

    def encode(self, x):
        """Map (batch, n_frames, 768) frame features to (batch, embed_dim)."""
        B = x.shape[0]
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1)                    # prepend [CLS]
        pos = torch.arange(x.shape[1], device=x.device)
        x = x + self.pos_embed(pos)
        x = self.norm(self.transformer(x))
        return self.projector(x[:, 0])                    # read out the [CLS] position


path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
model = DriveBenchModel()
# weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state"])
model.eval()

# Input: (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
# Output: (batch, 256) DriveBench embedding
# Use as features for any downstream driving task
with torch.no_grad():
    features = torch.randn(2, 5, 768)   # placeholder; replace with real ViT-B/16 features
    embedding = model.encode(features)  # shape: (2, 256)
```
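The encoder expects ViT-B/16 features that are already grouped into 5-frame clips. If you start from a longer sequence of per-frame features, one way to build the `(batch, 5, 768)` input is to slice it into consecutive windows. The helper below is a hypothetical sketch: its name and the choice of non-overlapping windows are assumptions, and it uses random tensors in place of real features.

```python
import torch

def frames_to_clips(frame_feats, n_frames=5):
    """Split (T, D) per-frame features into (T // n_frames, n_frames, D) clips.

    Trailing frames that do not fill a whole clip are dropped.
    """
    if frame_feats.dim() != 2:
        raise ValueError("expected a (T, D) tensor of per-frame features")
    n_clips = frame_feats.shape[0] // n_frames
    usable = frame_feats[: n_clips * n_frames]
    return usable.reshape(n_clips, n_frames, frame_feats.shape[1])

frame_feats = torch.randn(23, 768)    # stand-in for 23 frames of ViT-B/16 features
clips = frames_to_clips(frame_feats)  # 23 frames -> 4 full clips, 3 frames dropped
print(clips.shape)                    # torch.Size([4, 5, 768])
```

The resulting `clips` tensor has the shape that `model.encode` expects.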

	## Pre-computed Embeddings

Embeddings for all 298,326 clips are precomputed and can be downloaded and used directly:

```python
import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    "Trazemag/DriveBench-Embeddings",
    "drivebench_embeddings.npz",
    repo_type="dataset",
)
data = np.load(path)
embeddings = data["embeddings"]  # (298326, 256)
```
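Once loaded, the embeddings can be used without the model at all. As one illustration, the sketch below performs cosine-similarity nearest-neighbour retrieval to find clips that resemble a query clip. It runs on a small random array so that it is self-contained; substitute the real `embeddings` array in practice. The function name `top_k_similar` is hypothetical.

```python
import numpy as np

def top_k_similar(embeddings, query_index, k=5):
    """Indices of the k rows most cosine-similar to the query row."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf                      # exclude the query itself
    return np.argsort(-sims)[:k]

# Synthetic stand-in for the (n_clips, 256) embedding matrix.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 256)).astype(np.float32)
# Plant a near-duplicate of row 7 at row 42 so the expected top hit is known.
demo[42] = demo[7] + 0.01 * rng.standard_normal(256).astype(np.float32)

neighbours = top_k_similar(demo, query_index=7, k=5)
print(neighbours[0])  # 42
```

For the full 298,326 × 256 matrix this brute-force search is a single matrix-vector product per query, which is usually fast enough on a CPU.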

	## Training Data

Built on the [NVIDIA PhysicalAI-AV](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles)
dataset (gated; request access on Hugging Face).

Danger labels are available at [Trazemag/PRECOG-Labels](https://huggingface.co/datasets/Trazemag/PRECOG-Labels).

	## Related Models

| Model | Task | Link |
|-------|------|------|
| PRECOG-SENSE | Radar health from camera | [Trazemag/PRECOG-SENSE](https://huggingface.co/Trazemag/PRECOG-SENSE) |
| PRECOG-HERALD | Danger anticipation | [Trazemag/PRECOG-HERALD](https://huggingface.co/Trazemag/PRECOG-HERALD) |
| DriveBench | General scene encoder | This model |

	## Citation

```bibtex
@misc{upadhyay2026drivebench,
  title  = {DriveBench: General-Purpose Driving Scene Encoder
            via Multi-Task Safety-Focused Pre-training across 25 Countries},
  author = {Upadhyay, Nikhil},
  year   = {2026},
  url    = {https://github.com/TrazeMaG/PRECOG-AV}
}
```