---
license: mit
language:
- en
tags:
- object-detection
- re-identification
- construction
- aerial-vision
- rf-detr
- yolo
- yolo26
- dinov3
- osnet
- real-time
- tracking
pipeline_tag: object-detection
library_name: pytorch
datasets:
- roboflow
---

# 🏗️ SiteSense: Model Weights

**Real-Time Construction Equipment Monitoring via Aerial Computer Vision**

[![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/SiteSense)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
[![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)

---

## Overview

This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense), a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.

The system processes each frame through a multi-phase pipeline:

```
Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
```

Two interchangeable detectors are provided. Switch between them at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`); no rebuild is required.
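
A minimal sketch of how such a runtime switch could be wired up. The helper `select_detector_weights` is hypothetical and the actual SiteSense code may differ; the file names come from the Model Weights table, and the fallback to `rfdetr` is an assumption based on that table marking RF-DETR as the default.

```python
import os

# File names are taken from the Model Weights table; the mapping itself is illustrative.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def select_detector_weights(env=None, default="rfdetr"):
    """Return (detector_type, weights_filename) for the configured detector.

    Hypothetical helper, not part of SiteSense. `default` is an assumption.
    """
    env = os.environ if env is None else env
    detector_type = env.get("DETECTOR_TYPE", default).strip().lower()
    if detector_type not in DETECTOR_WEIGHTS:
        raise ValueError(
            f"DETECTOR_TYPE must be one of {sorted(DETECTOR_WEIGHTS)}, "
            f"got {detector_type!r}"
        )
    return detector_type, DETECTOR_WEIGHTS[detector_type]


if __name__ == "__main__":
    # Reads DETECTOR_TYPE from the real environment when run as a script.
    print(select_detector_weights())
```

Rejecting unknown values early keeps a typo in `DETECTOR_TYPE` from silently loading the wrong detector.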

---

## Model Weights

| File | Size | Architecture | Task | Notes |
|:---|:---:|:---|:---|:---|
| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default**: best accuracy, NMS-free set prediction |
| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
| `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |

> **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.

---

## Detection Classes

Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:

| ID | Class | ID | Class |
|:---:|:---|:---:|:---|
| 0 | Excavator | 4 | Mobile Crane |
| 1 | Dump Truck | 5 | Tower Crane |
| 2 | Bulldozer | 6 | Roller Compactor |
| 3 | Wheel Loader | 7 | Cement Mixer |

---

## Training Results

Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.

### Detector Comparison (val split)

| Metric | RF-DETR (default) | YOLO26-L | Δ (RF − YOLO) |
|:---|:---:|:---:|:---:|
| **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
| **mAP@50** | **0.910** | 0.905 | +0.5 pts |
| **F1 Score** | **0.886** | 0.876 | +1.0 pts |
| **Precision** | **0.929** | 0.924 | +0.5 pts |
| **Recall** | **0.847** | 0.834 | +1.3 pts |
| **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO faster |

RF-DETR wins on **7 of 8** classes in per-class AP@50:95; only bulldozer goes to YOLO26-L (0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, **mobile_crane (+4.7 pts)** and **tower_crane (+6.0 pts)**, where set-based prediction appears to handle long boom shapes and heavy occlusion better than YOLO26-L's dense prediction head.

<details>
<summary><strong>Per-class AP@50:95</strong></summary>

| Class | RF-DETR | YOLO26-L |
|:---|:---:|:---:|
| Excavator | **0.811** | 0.806 |
| Dump Truck | **0.675** | 0.661 |
| Bulldozer | 0.785 | **0.796** |
| Wheel Loader | **0.810** | 0.792 |
| Mobile Crane | **0.675** | 0.628 |
| Tower Crane | **0.692** | 0.632 |
| Roller Compactor | **0.838** | 0.825 |
| Cement Mixer | **0.800** | 0.779 |

</details>
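
The per-class margins quoted above can be reproduced from the table with a few lines of arithmetic. This is only a worked check: the values are copied from the per-class table, and the helper name `margins_in_points` is illustrative.

```python
# AP@50:95 values copied from the per-class table: (RF-DETR, YOLO26-L).
PER_CLASS_AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}


def margins_in_points(per_class_ap):
    """Return {class: RF-DETR minus YOLO26-L}, in percentage points."""
    return {
        name: round((rfdetr - yolo) * 100, 1)
        for name, (rfdetr, yolo) in per_class_ap.items()
    }


margins = margins_in_points(PER_CLASS_AP)
rfdetr_wins = [name for name, delta in margins.items() if delta > 0]

# Print classes from the largest RF-DETR margin to the smallest.
for name, delta in sorted(margins.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<17}{delta:+.1f} pts")
print(f"RF-DETR leads on {len(rfdetr_wins)} of {len(margins)} classes")
```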

### DINOv3 Re-ID Projection Head

| Metric | Value |
|:---|:---:|
| **Contrastive Loss** | 0.0482 |
| **Accuracy** | 96.8% |
| Embedding Dim | 128-d L2-normalized |
| Training Pairs | ~12,000 positive pairs |
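
Because the embeddings are L2-normalized, cosine similarity between two of them reduces to a plain dot product. The sketch below shows how a gallery lookup over such embeddings could work. It is not the SiteSense implementation: `match_identity`, the 0.7 threshold, the identity names, and the toy 3-d vectors (standing in for 128-d embeddings) are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """For unit-length inputs, the dot product equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def match_identity(query, gallery, threshold=0.7):
    """Return the gallery ID most similar to `query`, or None if nothing
    reaches `threshold` (an assumed value, to be tuned on real data)."""
    best_id, best_score = None, threshold
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id


# Toy 3-d vectors stand in for the 128-d embeddings.
gallery = {
    "excavator_01": l2_normalize([0.9, 0.1, 0.0]),
    "dump_truck_03": l2_normalize([0.0, 0.2, 0.9]),
}
query = l2_normalize([0.8, 0.2, 0.1])
print(match_identity(query, gallery))
```

Returning `None` below the threshold lets a tracker treat the crop as a new piece of equipment instead of forcing a match.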

---

## Quick Start

### Option A: Download All Weights (Recommended)

```bash
pip install huggingface_hub
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
```

This pulls all four weight files into your `models/` directory at once: both detectors, the DINOv3 Re-ID projection head, and the OSNet Re-ID model used by BoT-SORT.

### Option B: Python API

```python
from huggingface_hub import hf_hub_download

# Detectors (pick one or both)
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth",     local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt",  local_dir="models/")

# Re-ID
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth",        local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt",       local_dir="models/")
```

### Option C: Auto-Download (Zero Setup)

The SiteSense pipeline automatically downloads missing weights on first run:

```python
# In services/cv-inference/main.py, resolve_weights() handles this transparently.
# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
weights_path = resolve_weights('yolo26l_construction_v1.pt')  # local first, HF fallback
```
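
For readers who want the same "local first, Hugging Face fallback" behaviour outside the pipeline, here is a minimal sketch. It is an assumption about how such a helper might look, not a copy of the real `resolve_weights()`; the `models_dir` and `downloader` parameters are hypothetical additions, while `hf_hub_download` is the `huggingface_hub` function already used in Option B.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def resolve_weights(filename, models_dir="models", downloader=None):
    """Return a local path to `filename`, downloading it only if missing.

    Sketch only: the real resolve_weights() in SiteSense may differ.
    `downloader` is a hypothetical hook that makes the fallback testable offline.
    """
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        return local_path  # a local copy wins; no network access needed

    if downloader is None:
        # Imported lazily so the local-only path works without huggingface_hub.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download

    downloaded = downloader(
        repo_id=REPO_ID, filename=filename, local_dir=str(models_dir)
    )
    return Path(downloaded)
```

A call such as `resolve_weights("rfdetr_construction.pth")` would then return `models/rfdetr_construction.pth`, fetching it from the Hub only when the file is not already present.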

---

## Usage with SiteSense Pipeline

```bash
# 1. Clone the repository
git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
cd SiteSense

# 2. Download weights
huggingface-cli download Zaafan/sitesense-weights --local-dir models/

# 3. Configure environment
cp .env.example .env

# 4. Launch infrastructure
docker compose up --build -d

# 5a. Run the pipeline with the default detector
docker compose --profile pipeline up cv-inference

# 5b. Or select a detector explicitly at runtime (rfdetr or yolo); no rebuild needed
DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
```

---

## Citation

If you use these weights in your research or projects, please cite:

```bibtex
@misc{sitesense2025,
  author = {Mahmoud Zaafan},
  title  = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
  year   = {2025},
  url    = {https://github.com/Mahmoud-Zaafan/SiteSense}
}
```

---

## License

All weights are released under the [MIT License](LICENSE).