---
license: mit
language:
- en
tags:
- object-detection
- re-identification
- construction
- aerial-vision
- rf-detr
- yolo
- yolo26
- dinov3
- osnet
- real-time
- tracking
pipeline_tag: object-detection
library_name: pytorch
datasets:
- roboflow
---
# πŸ—οΈ SiteSense β€” Model Weights
**Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
[![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/SiteSense)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
[![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)
---
## Overview
This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense), a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.
The system processes each frame through a multi-phase pipeline:
```
Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
```
Two interchangeable detectors are provided. Switch between them at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`); no rebuild is required.
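A minimal sketch of how such a runtime switch could map `DETECTOR_TYPE` to a weight file. The helper name `select_detector_weights`, the `models/` directory, and the fallback to `rfdetr` when the variable is unset are assumptions for illustration, not the pipeline's actual code; the file names are the ones hosted in this repository.

```python
import os
from pathlib import Path

# File names as hosted in this repository.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def select_detector_weights(models_dir="models", env=None):
    """Return the weight path for the detector named by DETECTOR_TYPE.

    Hypothetical helper: falling back to "rfdetr" when the variable is
    unset is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    detector = env.get("DETECTOR_TYPE", "rfdetr").strip().lower()
    if detector not in DETECTOR_WEIGHTS:
        raise ValueError(
            f"DETECTOR_TYPE must be one of {sorted(DETECTOR_WEIGHTS)}, got {detector!r}"
        )
    return Path(models_dir) / DETECTOR_WEIGHTS[detector]


if __name__ == "__main__":
    print(select_detector_weights())
```

Rejecting unknown values early keeps a typo in the environment from silently loading the wrong detector.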
---
## Model Weights
| File | Size | Architecture | Task | Notes |
|:---|:---:|:---|:---|:---|
| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default**: best accuracy, NMS-free set prediction |
| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
| `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
> **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
---
## Detection Classes
Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:
| ID | Class | ID | Class |
|:---:|:---|:---:|:---|
| 0 | Excavator | 4 | Mobile Crane |
| 1 | Dump Truck | 5 | Tower Crane |
| 2 | Bulldozer | 6 | Roller Compactor |
| 3 | Wheel Loader | 7 | Cement Mixer |
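When post-processing detector output, the class table above can be carried as a plain lookup. This is a small sketch: the `(class_id, confidence)` tuple format and the `label_detections` helper are hypothetical and not the output format of either detector.

```python
# Class IDs and names exactly as listed in the Detection Classes table.
CLASS_NAMES = {
    0: "Excavator",
    1: "Dump Truck",
    2: "Bulldozer",
    3: "Wheel Loader",
    4: "Mobile Crane",
    5: "Tower Crane",
    6: "Roller Compactor",
    7: "Cement Mixer",
}


def label_detections(detections):
    """Replace class IDs with names in (class_id, confidence) pairs.

    Hypothetical helper; raises KeyError for IDs outside 0-7.
    """
    return [(CLASS_NAMES[class_id], conf) for class_id, conf in detections]


if __name__ == "__main__":
    print(label_detections([(0, 0.93), (5, 0.71)]))
```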
---
## Training Results
Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.
### Detector Comparison (val split)
| Metric | RF-DETR (default) | YOLO26-L | Δ (RF − YOLO) |
|:---|:---:|:---:|:---:|
| **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
| **mAP@50** | **0.910** | 0.905 | +0.5 pts |
| **F1 Score** | **0.886** | 0.876 | +1.0 pts |
| **Precision** | **0.929** | 0.924 | +0.5 pts |
| **Recall** | **0.847** | 0.834 | +1.3 pts |
| **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO faster |
RF-DETR leads on **7 of 8** classes by AP@50:95; only bulldozer goes to YOLO26-L (0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, **mobile_crane (+4.7 pts)** and **tower_crane (+6.0 pts)**, where set-based prediction appears to handle long boom shapes and heavy occlusion better than YOLO26-L's dense detection head.
<details>
<summary><strong>Per-class AP@50:95</strong></summary>
| Class | RF-DETR | YOLO26-L |
|:---|:---:|:---:|
| Excavator | **0.811** | 0.806 |
| Dump Truck | **0.675** | 0.661 |
| Bulldozer | 0.785 | **0.796** |
| Wheel Loader | **0.810** | 0.792 |
| Mobile Crane | **0.675** | 0.628 |
| Tower Crane | **0.692** | 0.632 |
| Roller Compactor | **0.838** | 0.825 |
| Cement Mixer | **0.800** | 0.779 |
</details>
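The per-class margins quoted above can be reproduced from the table with a few lines of arithmetic. The sketch below copies the AP@50:95 values verbatim; the helper name `margins_in_points` is illustrative.

```python
# (RF-DETR, YOLO26-L) AP@50:95 per class, copied from the table above.
PER_CLASS_AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}


def margins_in_points(ap_table):
    """Return RF-DETR minus YOLO26-L per class, in percentage points."""
    return {
        name: round((rfdetr - yolo) * 100, 1)
        for name, (rfdetr, yolo) in ap_table.items()
    }


margins = margins_in_points(PER_CLASS_AP)
rfdetr_wins = [name for name, delta in margins.items() if delta > 0]

if __name__ == "__main__":
    for name, delta in sorted(margins.items(), key=lambda kv: -kv[1]):
        print(f"{name:<17} {delta:+.1f} pts")
    print(f"RF-DETR leads on {len(rfdetr_wins)} of {len(margins)} classes")
```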
### DINOv3 Re-ID Projection Head
| Metric | Value |
|:---|:---:|
| **Contrastive Loss** | 0.0482 |
| **Accuracy** | 96.8% |
| Embedding Dim | 128-d L2-normalized |
| Training Pairs | ~12,000 positive pairs |
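Because the head emits L2-normalized embeddings, the dot product of two embeddings equals their cosine similarity. The sketch below shows one way a query crop could be matched against a gallery of known equipment IDs. The `best_match` helper, the gallery layout, and the 0.7 threshold are assumptions for illustration, not the pipeline's matching logic, and short toy vectors stand in for the 128-d embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query, gallery, threshold=0.7):
    """Return the gallery ID most similar to `query`, or None if no
    similarity reaches `threshold` (a hypothetical value)."""
    best_id, best_score = None, threshold
    for equipment_id, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = equipment_id, score
    return best_id


if __name__ == "__main__":
    gallery = {
        "excavator-01": l2_normalize([1.0, 0.0, 0.0]),
        "crane-02": l2_normalize([0.0, 1.0, 0.0]),
    }
    print(best_match(l2_normalize([0.9, 0.1, 0.0]), gallery))
```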
---
## Quick Start
### Option A: Download All Weights (Recommended)
```bash
pip install huggingface_hub
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
```
This pulls all four weight files into your `models/` directory at once: both detectors, the DINOv3 Re-ID head, and the OSNet ReID model.
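After downloading, a quick check can confirm that nothing is missing. A minimal sketch using only the standard library; the `missing_weights` helper is hypothetical, and the file names are the four hosted in this repository.

```python
from pathlib import Path

# The four weight files hosted in this repository.
EXPECTED_WEIGHTS = (
    "rfdetr_construction.pth",
    "yolo26l_construction_v1.pt",
    "dinov3_reid_head.pth",
    "osnet_x0_25_msmt17.pt",
)


def missing_weights(models_dir="models"):
    """Return the expected weight files not present in `models_dir`."""
    root = Path(models_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    print("All weights present" if not missing else f"Missing: {missing}")
```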
### Option B: Python API
```python
from huggingface_hub import hf_hub_download
# Detectors (pick one or both)
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt", local_dir="models/")
# Re-ID
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
```
### Option C: Auto-Download (Zero Setup)
The SiteSense pipeline automatically downloads missing weights on first run:
```python
# In services/cv-inference/main.py: resolve_weights() handles this transparently.
# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
weights_path = resolve_weights('yolo26l_construction_v1.pt') # local first, HF fallback
```
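The "local first, HF fallback" behaviour could look roughly like the sketch below. It is not the code in `services/cv-inference/main.py`: the name `resolve_weights_sketch` and the `models_dir` parameter are hypothetical, and only `hf_hub_download` is a real `huggingface_hub` call.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def resolve_weights_sketch(filename, models_dir="models"):
    """Return a local path to `filename`, downloading it only if absent.

    Hypothetical stand-in for the pipeline's resolve_weights().
    """
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        return str(local_path)

    # Imported lazily so the local-only path works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=models_dir)


if __name__ == "__main__":
    print(resolve_weights_sketch("yolo26l_construction_v1.pt"))
```

Checking the local directory first avoids a network round trip on every start and lets the pipeline run offline once the weights are cached.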
---
## Usage with SiteSense Pipeline
```bash
# 1. Clone the repository
git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
cd SiteSense
# 2. Download weights
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
# 3. Configure environment
cp .env.example .env
# 4. Launch infrastructure
docker compose up --build -d
# 5a. Run the pipeline with YOLO26-L selected explicitly
DETECTOR_TYPE=yolo docker compose --profile pipeline up cv-inference
# 5b. Or switch to RF-DETR at runtime; no rebuild needed
DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
```
---
## Citation
If you use these weights in your research or projects, please cite:
```bibtex
@misc{sitesense2025,
author = {Mahmoud Zaafan},
title = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
year = {2025},
url = {https://github.com/Mahmoud-Zaafan/SiteSense}
}
```
---
## License
All weights are released under the [MIT License](LICENSE).