---
license: mit
language:
- en
tags:
- object-detection
- re-identification
- construction
- aerial-vision
- rf-detr
- yolo
- yolo26
- dinov3
- osnet
- real-time
- tracking
pipeline_tag: object-detection
library_name: pytorch
datasets:
- roboflow
---
# SiteSense: Model Weights
**Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
[GitHub](https://github.com/Mahmoud-Zaafan/SiteSense) · [License](LICENSE) · [Python](https://python.org) · [PyTorch](https://pytorch.org)
---
## Overview
This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense), a real-time pipeline that **detects, tracks, and re-identifies** heavy construction equipment in drone/aerial video footage and **classifies its activity**.
The system processes each frame through a multi-phase pipeline:
```
Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
```
Two interchangeable detectors are provided. Switch between them at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`); no rebuild is required.
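
A minimal sketch of how the `DETECTOR_TYPE` switch might map to a weight file. The function name `select_detector_weights`, the mapping dictionary, and the `rfdetr` fallback are illustrative assumptions, not the pipeline's actual code; the file names are the ones listed under Model Weights.

```python
import os

# Hypothetical mapping from DETECTOR_TYPE values to the weight files in this repo.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def select_detector_weights(env=None, default="rfdetr"):
    """Return (detector_type, weight_filename) for the configured detector.

    `env` is any mapping of environment variables (os.environ when omitted).
    The `default` value is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    detector_type = env.get("DETECTOR_TYPE", default).strip().lower()
    if detector_type not in DETECTOR_WEIGHTS:
        valid = ", ".join(sorted(DETECTOR_WEIGHTS))
        raise ValueError(f"DETECTOR_TYPE must be one of: {valid}")
    return detector_type, DETECTOR_WEIGHTS[detector_type]


if __name__ == "__main__":
    # Reads the real environment; e.g. run with DETECTOR_TYPE=yolo set.
    print(select_detector_weights())
```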
---
## Model Weights
| File | Size | Architecture | Task | Notes |
|:---|:---:|:---|:---|:---|
| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default**: best accuracy, NMS-free set prediction |
| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
| `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
> **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
---
## Detection Classes
Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:
| ID | Class | ID | Class |
|:---:|:---|:---:|:---|
| 0 | Excavator | 4 | Mobile Crane |
| 1 | Dump Truck | 5 | Tower Crane |
| 2 | Bulldozer | 6 | Roller Compactor |
| 3 | Wheel Loader | 7 | Cement Mixer |
---
## Training Results
Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.
### Detector Comparison (val split)
| Metric | RF-DETR (default) | YOLO26-L | Δ (RF-DETR − YOLO26-L) |
|:---|:---:|:---:|:---:|
| **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
| **mAP@50** | **0.910** | 0.905 | +0.5 pts |
| **F1 Score** | **0.886** | 0.876 | +1.0 pts |
| **Precision** | **0.929** | 0.924 | +0.5 pts |
| **Recall** | **0.847** | 0.834 | +1.3 pts |
| **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO26-L faster |
RF-DETR wins on **7 of 8** classes in per-class AP@50:95; only bulldozer goes to YOLO26-L (0.796 vs 0.785). The largest RF-DETR margins are on **mobile_crane (+4.7 pts)** and **tower_crane (+6.0 pts)**, the most under-represented classes. A plausible explanation is that RF-DETR's set-based prediction copes better with long boom shapes and heavy occlusion than YOLO26-L's dense detection head.
<details>
<summary><strong>Per-class AP@50:95</strong></summary>
| Class | RF-DETR | YOLO26-L |
|:---|:---:|:---:|
| Excavator | **0.811** | 0.806 |
| Dump Truck | **0.675** | 0.661 |
| Bulldozer | 0.785 | **0.796** |
| Wheel Loader | **0.810** | 0.792 |
| Mobile Crane | **0.675** | 0.628 |
| Tower Crane | **0.692** | 0.632 |
| Roller Compactor | **0.838** | 0.825 |
| Cement Mixer | **0.800** | 0.779 |
</details>
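
The per-class margins and the "7 of 8" count can be re-derived from the table above. The snippet below is only a sanity check on the published numbers; the variable and function names are made up for illustration.

```python
# Per-class AP@50:95 copied from the table above: (RF-DETR, YOLO26-L).
PER_CLASS_AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}


def per_class_deltas(table):
    """Return {class: RF-DETR minus YOLO26-L} in points (AP x 100), rounded to 0.1."""
    return {name: round((rf - yolo) * 100, 1) for name, (rf, yolo) in table.items()}


deltas = per_class_deltas(PER_CLASS_AP)
rf_wins = [name for name, delta in deltas.items() if delta > 0]

# Averaging the per-class values should reproduce the headline mAP@50:95.
rf_mean = sum(rf for rf, _ in PER_CLASS_AP.values()) / len(PER_CLASS_AP)
yolo_mean = sum(yolo for _, yolo in PER_CLASS_AP.values()) / len(PER_CLASS_AP)

for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17} {delta:+.1f} pts")
print(f"RF-DETR leads on {len(rf_wins)} of {len(deltas)} classes")
print(f"mean AP@50:95: RF-DETR {rf_mean:.3f}, YOLO26-L {yolo_mean:.3f}")
```

The class means (0.761 and 0.740) agree with the mAP@50:95 row of the comparison table.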
### DINOv3 Re-ID Projection Head
| Metric | Value |
|:---|:---:|
| **Contrastive Loss** | 0.0482 |
| **Accuracy** | 96.8% |
| Embedding Dim | 128-d L2-normalized |
| Training Pairs | ~12,000 positive pairs |
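
Because the head emits L2-normalized embeddings, two crops can be compared with a plain dot product (cosine similarity). The sketch below shows one way a gallery lookup could work; `match_identity`, the gallery layout, and the 0.7 threshold are hypothetical and not taken from the SiteSense code. Toy 3-d vectors stand in for the real 128-d embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def match_identity(query, gallery, threshold=0.7):
    """Return (identity, similarity) for the closest gallery entry.

    Returns (None, similarity) when the best cosine similarity is below
    `threshold`. The threshold value is an assumption for this sketch.
    """
    q = l2_normalize(query)
    best_id, best_sim = None, -1.0
    for identity, embedding in gallery.items():
        sim = sum(a * b for a, b in zip(q, l2_normalize(embedding)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    if best_sim < threshold:
        return None, best_sim
    return best_id, best_sim


if __name__ == "__main__":
    gallery = {"excavator_1": [1.0, 0.0, 0.0], "dump_truck_2": [0.0, 1.0, 0.0]}
    print(match_identity([0.9, 0.1, 0.0], gallery))
```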
---
## Quick Start
### Option A: Download All Weights (Recommended)
```bash
pip install huggingface_hub
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
```
This pulls all four weight files into your `models/` directory at once: both detectors, the DINOv3 Re-ID head, and the OSNet Re-ID model.
### Option B: Python API
```python
from huggingface_hub import hf_hub_download
# Detectors (pick one or both)
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt", local_dir="models/")
# Re-ID
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
```
### Option C: Auto-Download (Zero Setup)
The SiteSense pipeline automatically downloads missing weights on first run:
```python
# In services/cv-inference/main.py, resolve_weights() handles this transparently.
# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
weights_path = resolve_weights('yolo26l_construction_v1.pt') # local first, HF fallback
```
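
The real `resolve_weights()` lives in the SiteSense repository and is not reproduced here. As a rough sketch of the "local first, HF fallback" behaviour, assuming the weights sit in a `models/` directory, something like the following would work. The name `resolve_weights_sketch`, its parameters, and the injectable `download` callable are assumptions for illustration; `hf_hub_download` is the same `huggingface_hub` call used in Option B.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def _hf_download(filename, local_dir):
    # Imported lazily so the local-only path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)


def resolve_weights_sketch(filename, local_dir="models", download=_hf_download):
    """Return a path to `filename`, downloading it only if it is missing locally."""
    local_path = Path(local_dir) / filename
    if local_path.is_file():
        # Local copy wins; no network access is attempted.
        return str(local_path)
    # Fallback: fetch from the Hub (or from whatever `download` callable is injected).
    return str(download(filename, local_dir))
```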
---
## Usage with SiteSense Pipeline
```bash
# 1. Clone the repository
git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
cd SiteSense
# 2. Download weights
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
# 3. Configure environment
cp .env.example .env
# 4. Launch infrastructure
docker compose up --build -d
# 5a. Run the pipeline with the default detector
docker compose --profile pipeline up cv-inference
# 5b. Or set the detector explicitly at runtime (rfdetr or yolo); no rebuild needed
DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
```
---
## Citation
If you use these weights in your research or projects, please cite:
```bibtex
@misc{sitesense2025,
author = {Mahmoud Zaafan},
title = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
year = {2025},
url = {https://github.com/Mahmoud-Zaafan/SiteSense}
}
```
---
## License
All weights are released under the [MIT License](LICENSE).