Trazemag committed on
Commit 2fe35b3 · verified · 1 Parent(s): 296bbea

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +161 -0
README.md ADDED
@@ -0,0 +1,161 @@
+ ---
+ language:
+ - en
+ tags:
+ - autonomous-vehicles
+ - driving
+ - representation-learning
+ - multi-task-learning
+ - computer-vision
+ - safety
+ license: mit
+ ---
+
+ # DriveBench: General-Purpose Driving Scene Encoder
+
+ **Author:** Nikhil Upadhyay | MSc Business Analytics | Dublin Business School
+ **Project:** [PRECOG-AV](https://github.com/TrazeMaG/PRECOG-AV)
+
+ ## Overview
+
+ DriveBench is the first general-purpose driving scene encoder trained with
+ safety-focused multi-task supervision across **25 countries and 298,326 real
+ driving clips**, the largest geographic scale in driving representation learning.
+
+ Each clip is encoded into a **256-dimensional DriveBench embedding** that
+ simultaneously captures danger context, geographic driving patterns,
+ time-of-day risk, radar sensor health, and traffic density.
+ Use these embeddings as you would ImageNet features, but for driving scenes.
+
+ ## Results
+
+ | Task | Metric | Score | Random Baseline |
+ |------|--------|-------|-----------------|
+ | Danger Anticipation | AUC | **0.8385** | 0.500 |
+ | Geographic Region | Accuracy | **0.4438** | 0.167 (6 classes) |
+ | Time of Day | Accuracy | **0.5168** | 0.250 (4 classes) |
+ | Radar Health | AUC | **1.0000** | 0.500 |
+ | TTC Regression | Pearson r | **0.3009** | 0.000 |
+
+ Tested on Greece and Bulgaria, countries never seen during training.
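
For readers less familiar with the metrics in the table, the sketch below shows one common way to compute AUC (the probability that a positive example is scored above a negative one) and the chance accuracy of uniform guessing over k classes. The function names and toy scores are illustrative assumptions, not the evaluation code used for DriveBench.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as P(score_pos > score_neg); tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


# Toy data (hypothetical): three dangerous clips, three safe clips
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = pairwise_auc(toy_scores, toy_labels)
```

A scorer that ranks at random gives an AUC near 0.5, and `chance_accuracy(6)` and `chance_accuracy(4)` reproduce the 0.167 and 0.250 baselines listed for the region and time-of-day tasks.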
+
+ ## What makes this different
+
+ All existing driving pre-training (DriveWorld, DriveTok, GASP) uses geometric
+ proxy tasks (depth prediction, occupancy, reconstruction) on 1 to 3 cities.
+
+ DriveBench uses **safety-relevant supervision signals** across **25 countries**:
+ - Danger labels from physics-based time-to-collision (TTC) analysis (not manual annotation)
+ - Radar sensor health as a training signal
+ - Geographic region (6 regions, 25 countries)
+ - Time-of-day risk patterns (peak danger 13:00-15:00 confirmed)
+ - Traffic density
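
The README does not spell out how the TTC-based danger labels are derived. Purely as an illustration, the sketch below uses the standard constant-velocity definition of time-to-collision (gap divided by closing speed) and marks a clip as dangerous when TTC falls below a threshold. The function names and the 2.0 s threshold are hypothetical and are not taken from the PRECOG-AV labelling pipeline.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """Constant-velocity TTC in seconds; infinite if the gap is not closing."""
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


def danger_label(gap_m, closing_speed_mps, threshold_s=2.0):
    """Return 1 if TTC is below the (assumed) threshold, else 0."""
    return int(time_to_collision(gap_m, closing_speed_mps) < threshold_s)


# 15 m gap closing at 10 m/s gives a TTC of 1.5 s, below the assumed 2.0 s threshold
example_label = danger_label(15.0, 10.0)
```

Labels of this kind come directly from measured range and relative speed, which is what lets them scale without manual annotation.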
+
+ ## Architecture
+
+ ViT-B/16 features (5 frames × 768-dim)
+
+ ↓
+
+ TransformerEncoder (3 layers, 8 heads, 2048 FFN)
+
+ ↓
+
+ DriveBench Embedding (256-dim) ← use this downstream
+
+ ↓
+
+ 5 multi-task heads:
+
+ Danger head → AUC 0.84
+
+ Region head → Acc 0.44 (6 regions)
+
+ Time-of-day → Acc 0.52 (4 buckets)
+
+ Radar head → AUC 1.00
+
+ TTC regression → r = 0.30
+
+ ## Usage
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from huggingface_hub import hf_hub_download
+
+ class DriveBenchModel(nn.Module):
+     """Encodes 5 ViT-B/16 frame features into a 256-dim embedding."""
+
+     def __init__(self, embed_dim=256, n_frames=5, n_regions=6):
+         super().__init__()
+         # Learnable [CLS] token plus one position slot per frame
+         self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
+         self.pos_embed = nn.Embedding(n_frames + 1, 768)
+         layer = nn.TransformerEncoderLayer(
+             d_model=768, nhead=8, dim_feedforward=2048,
+             dropout=0.1, batch_first=True, norm_first=True)
+         self.transformer = nn.TransformerEncoder(layer, num_layers=3)
+         self.norm = nn.LayerNorm(768)
+         # Project the [CLS] output down to the embedding size
+         self.projector = nn.Sequential(
+             nn.Linear(768, 512), nn.GELU(), nn.Dropout(0.15),
+             nn.Linear(512, embed_dim), nn.LayerNorm(embed_dim))
+
+     def encode(self, x):
+         # x: (batch, n_frames, 768)
+         B = x.shape[0]
+         cls = self.cls_token.expand(B, -1, -1)
+         x = torch.cat([cls, x], dim=1)
+         pos = torch.arange(x.shape[1], device=x.device)
+         x = x + self.pos_embed(pos)
+         x = self.norm(self.transformer(x))
+         return self.projector(x[:, 0])
+
+ path = hf_hub_download("Trazemag/DriveBench", "drivebench_best.pt")
+ model = DriveBenchModel()
+ # weights_only=False unpickles arbitrary objects; only load checkpoints you trust
+ ckpt = torch.load(path, map_location="cpu", weights_only=False)
+ # If the checkpoint also stores task-head weights, this call may need strict=False
+ model.load_state_dict(ckpt["model_state"])
+ model.eval()
+
+ # Input:  (batch, 5, 768) ViT-B/16 features from 5 consecutive frames
+ # Output: (batch, 256) DriveBench embedding
+ # Use as features for any downstream driving task
+ ```
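
One simple way to use the embedding "as features for any downstream driving task" is a linear probe: freeze the encoder and fit only a linear classifier on top of the 256-dim vectors. The sketch below is a minimal NumPy logistic-regression probe. The data is synthetic stand-in data (random vectors with an injected signal in one dimension), and `fit_linear_probe`, `predict`, and all hyperparameters are hypothetical choices, not part of DriveBench.

```python
import numpy as np


def fit_linear_probe(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Binary logistic regression trained with full-batch gradient descent."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)  # gradient step with L2 penalty
        b -= lr * float(np.mean(p - y))
    return w, b


def predict(X, w, b):
    """Hard 0/1 predictions from the linear probe."""
    return (np.asarray(X) @ w + b > 0).astype(int)


# Synthetic stand-in for (n_clips, 256) embeddings with binary labels
rng = np.random.default_rng(0)
y_all = rng.integers(0, 2, size=400)
X_all = rng.normal(size=(400, 256))
X_all[:, 0] += 3.0 * (2 * y_all - 1)  # inject a label-dependent signal

X_train, y_train = X_all[:300], y_all[:300]
X_test, y_test = X_all[300:], y_all[300:]

w, b = fit_linear_probe(X_train, y_train)
test_acc = float(np.mean(predict(X_test, w, b) == y_test))
```

With real data, `X_all` would be the output of `model.encode(...)` (converted to NumPy) and `y_all` the labels of the downstream task.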
+
+ ## Pre-computed Embeddings
+
+ All 298,326 embeddings are pre-computed; download and use them directly:
+
+ ```python
+ import numpy as np
+ from huggingface_hub import hf_hub_download
+
+ path = hf_hub_download(
+     "Trazemag/DriveBench-Embeddings",
+     "drivebench_embeddings.npz",
+     repo_type="dataset")
+ data = np.load(path)
+ embeddings = data["embeddings"]  # (298326, 256)
+ ```
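
Once the embedding matrix is in memory, a common first experiment is nearest-neighbour retrieval: given one clip, find the clips whose embeddings are most similar. The sketch below uses cosine similarity on random stand-in data so that it runs without downloading anything; `top_k_similar` is a hypothetical helper, and whether cosine similarity is the best metric for these embeddings is an assumption.

```python
import numpy as np


def top_k_similar(embeddings, query_index, k=5):
    """Return indices and cosine similarities of the k rows closest to the query row."""
    emb = np.asarray(embeddings, dtype=np.float32)
    norms = np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    normed = emb / norms
    sims = normed @ normed[query_index]
    sims[query_index] = -np.inf  # exclude the query itself
    top = np.argsort(-sims)[:k]
    return top, sims[top]


# Stand-in for the real (298326, 256) matrix
rng = np.random.default_rng(0)
fake = rng.normal(size=(1000, 256)).astype(np.float32)
fake[42] = fake[7] + 0.01 * rng.normal(size=256)  # plant a near-duplicate of row 7

idx, scores = top_k_similar(fake, query_index=7, k=3)
```

On the real data, replace `fake` with `embeddings`. For the full 298,326 rows, one similarity vector is small, but an all-pairs similarity matrix would not fit in memory, so query one clip (or a small batch) at a time.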
+
+ ## Training Data
+
+ Built on the [NVIDIA PhysicalAI-AV](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles)
+ dataset (gated; request access on Hugging Face).
+
+ Danger labels are available at [Trazemag/PRECOG-Labels](https://huggingface.co/datasets/Trazemag/PRECOG-Labels).
+
+ ## Related Models
+
+ | Model | Task | Link |
+ |-------|------|------|
+ | PRECOG-SENSE | Radar health from camera | [Trazemag/PRECOG-SENSE](https://huggingface.co/Trazemag/PRECOG-SENSE) |
+ | PRECOG-HERALD | Danger anticipation | [Trazemag/PRECOG-HERALD](https://huggingface.co/Trazemag/PRECOG-HERALD) |
+ | DriveBench | General scene encoder | This model |
+
+ ## Citation
+
+ ```bibtex
+ @misc{upadhyay2026drivebench,
+   title = {DriveBench: General-Purpose Driving Scene Encoder
+            via Multi-Task Safety-Focused Pre-training across 25 Countries},
+   author = {Upadhyay, Nikhil},
+   year = {2026},
+   url = {https://github.com/TrazeMaG/PRECOG-AV}
+ }
+ ```