	---
	license: mit
	language:
	- en
	tags:
	- object-detection
	- re-identification
	- construction
	- aerial-vision
	- rf-detr
	- yolo
	- yolo26
	- dinov3
	- osnet
	- real-time
	- tracking
	pipeline_tag: object-detection
	library_name: pytorch
	datasets:
	- roboflow
	---

	# 🏗️ SiteSense — Model Weights

	Real-Time Construction Equipment Monitoring via Aerial Computer Vision

	[![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/SiteSense)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
	[![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
	[![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)

	---

	## Overview

	This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense) — a real-time pipeline that detects, tracks, identifies, and classifies the activity of heavy construction equipment from drone/aerial video footage.

	The system processes each frame through a multi-phase pipeline:

	```
	Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
	```

	Two interchangeable detectors are provided. Switch at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`) — no rebuild required.
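
A minimal sketch of how such a switch could map the variable to a weight file. The function name `detector_weights_for` and the fallback to `rfdetr` when the variable is unset are assumptions for illustration, not the pipeline's actual code; only the two accepted values and the filenames come from this card.

```python
import os

# Filenames are taken from the Model Weights table; the mapping itself is illustrative.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def detector_weights_for(env=None, fallback="rfdetr"):
    """Return the weight filename for the detector named by DETECTOR_TYPE.

    `fallback` is an assumed default, used only when the variable is unset.
    """
    env = os.environ if env is None else env
    detector = env.get("DETECTOR_TYPE", fallback).strip().lower()
    if detector not in DETECTOR_WEIGHTS:
        raise ValueError(
            f"DETECTOR_TYPE must be one of {sorted(DETECTOR_WEIGHTS)}, got {detector!r}"
        )
    return DETECTOR_WEIGHTS[detector]
```

Rejecting unknown values early keeps a typo in the variable from silently loading the wrong checkpoint.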

	---

	## Model Weights

	| File | Size | Architecture | Task | Notes |
	|:---|:---:|:---|:---|:---|
	| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | Default: best accuracy, NMS-free set prediction |
	| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
	| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
	| `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |

	> Note: The DINOv3 ViT-B/16 backbone (~327 MB) is not included here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.

	---

	## Detection Classes

	Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize 8 classes of construction equipment from aerial perspectives:

	| ID | Class | ID | Class |
	|:---:|:---|:---:|:---|
	| 0 | Excavator | 4 | Mobile Crane |
	| 1 | Dump Truck | 5 | Tower Crane |
	| 2 | Bulldozer | 6 | Roller Compactor |
	| 3 | Wheel Loader | 7 | Cement Mixer |
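
When post-processing detections, the table above can be written as a lookup. This is a sketch: the constant name and the handling of unknown IDs are assumptions, and the label strings stored inside the checkpoints may be spelled differently (for example `mobile_crane`).

```python
# Class IDs and display names copied from the Detection Classes table.
CLASS_NAMES = {
    0: "Excavator",
    1: "Dump Truck",
    2: "Bulldozer",
    3: "Wheel Loader",
    4: "Mobile Crane",
    5: "Tower Crane",
    6: "Roller Compactor",
    7: "Cement Mixer",
}


def class_name(class_id):
    """Map a detector class ID to its display name, tolerating unknown IDs."""
    return CLASS_NAMES.get(int(class_id), f"unknown({class_id})")


print(class_name(5))
```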

	---

	## Training Results

	Both detectors were trained on the identical train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.

	### Detector Comparison (val split)

	| Metric | RF-DETR (default) | YOLO26-L | Δ (RF − YOLO) |
	|:---|:---:|:---:|:---:|
	| mAP@50:95 | 0.761 | 0.740 | +2.1 pts |
	| mAP@50 | 0.910 | 0.905 | +0.5 pts |
	| F1 Score | 0.886 | 0.876 | +1.0 pts |
	| Precision | 0.929 | 0.924 | +0.5 pts |
	| Recall | 0.847 | 0.834 | +1.3 pts |
	| FPS (RTX 3050 Ti) | 9–10 | 11–13 | YOLO faster |
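
As a sanity check, F1 is the harmonic mean of precision and recall, so the F1 row can be approximately reproduced from the Precision and Recall rows. The snippet is illustrative; because the table's inputs are already rounded to three decimals, the last digit may differ from the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
REPORTED = {
    "RF-DETR": (0.929, 0.847, 0.886),
    "YOLO26-L": (0.924, 0.834, 0.876),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported:.3f}")
```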

	RF-DETR wins 7 of 8 classes on per-class AP@50:95; only Bulldozer goes to YOLO26-L (0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, Mobile Crane (+4.7 pts) and Tower Crane (+6.0 pts), where set-based prediction appears to handle long boom shapes and heavy occlusion better than YOLO26-L's dense, grid-based head.

	<details>
	<summary><strong>Per-class AP@50:95</strong></summary>

	| Class | RF-DETR | YOLO26-L |
	|:---|:---:|:---:|
	| Excavator | 0.811 | 0.806 |
	| Dump Truck | 0.675 | 0.661 |
	| Bulldozer | 0.785 | 0.796 |
	| Wheel Loader | 0.810 | 0.792 |
	| Mobile Crane | 0.675 | 0.628 |
	| Tower Crane | 0.692 | 0.632 |
	| Roller Compactor | 0.838 | 0.825 |
	| Cement Mixer | 0.800 | 0.779 |
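
The per-class numbers can be tabulated in a few lines to reproduce the margins quoted above. This is a sketch only: the variable names are illustrative, and the values are copied from the table.

```python
# (RF-DETR, YOLO26-L) AP@50:95 per class, copied from the table.
PER_CLASS_AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}

# Margin in points (RF-DETR minus YOLO26-L), rounded to one decimal.
margins = {
    name: round((rf - yolo) * 100, 1) for name, (rf, yolo) in PER_CLASS_AP.items()
}
rf_wins = [name for name, pts in margins.items() if pts > 0]
largest = sorted(margins, key=margins.get, reverse=True)[:2]

print(f"RF-DETR ahead on {len(rf_wins)} of {len(margins)} classes")
for name in largest:
    print(f"{name}: {margins[name]:+.1f} pts")
```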

	</details>

	### DINOv3 Re-ID Projection Head

	| Metric | Value |
	|:---|:---:|
	| Contrastive Loss | 0.0482 |
	| Accuracy | 96.8% |
	| Embedding Dim | 128-d L2-normalized |
	| Training Pairs | ~12,000 positive pairs |
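
Because the embeddings are L2-normalized, cosine similarity between two crops reduces to a dot product. The sketch below shows how a gallery lookup could work in plain Python, using 3-d toy vectors in place of the 128-d embeddings. The function names and the 0.7 threshold are assumptions for illustration, not values used by SiteSense.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def match_identity(query, gallery, threshold=0.7):
    """Return the best-matching gallery ID, or None if nothing reaches the threshold."""
    best_id, best_score = None, threshold
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id


gallery = {
    "excavator_01": l2_normalize([1.0, 0.0, 0.0]),
    "dump_truck_02": l2_normalize([0.0, 1.0, 0.0]),
}
print(match_identity(l2_normalize([0.9, 0.1, 0.0]), gallery))
```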

	---

	## Quick Start

	### Option A: Download All Weights (Recommended)

	```bash
	pip install huggingface_hub
	huggingface-cli download Zaafan/sitesense-weights --local-dir models/
	```

	This pulls all four weight files into your `models/` directory in one step: both detectors, the DINOv3 Re-ID projection head, and the OSNet Re-ID model.

	### Option B: Python API

	```python
	from huggingface_hub import hf_hub_download

	# Detectors (pick one or both)
	hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
	hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt", local_dir="models/")

	# Re-ID
	hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
	hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
	```

	### Option C: Auto-Download (Zero Setup)

	The SiteSense pipeline automatically downloads missing weights on first run:

	```python
	# In services/cv-inference/main.py — resolve_weights() handles this transparently.
	# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
	weights_path = resolve_weights('yolo26l_construction_v1.pt') # local first, HF fallback
	```
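
For readers who want the same "local first, Hub fallback" behavior outside the pipeline, here is a hedged sketch. The name `resolve_weights_sketch` and its `models_dir` parameter are hypothetical, and the real `resolve_weights()` in the repository may be implemented differently; only `hf_hub_download` and the repo ID are taken from this card.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def resolve_weights_sketch(filename, models_dir="models"):
    """Return a local path to `filename`, downloading it only if it is missing."""
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        return local_path  # local copy wins; no network access needed

    # Imported lazily so the local-only path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    downloaded = hf_hub_download(
        repo_id=REPO_ID, filename=filename, local_dir=models_dir
    )
    return Path(downloaded)
```

Checking the local path before touching the network lets the same call work in offline or air-gapped deployments once the weights are in place.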

	---

	## Usage with SiteSense Pipeline

	```bash
	# 1. Clone the repository
	git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
	cd SiteSense

	# 2. Download weights
	huggingface-cli download Zaafan/sitesense-weights --local-dir models/

	# 3. Configure environment
	cp .env.example .env

	# 4. Launch infrastructure
	docker compose up --build -d

	# 5a. Run pipeline with the currently configured detector
	docker compose --profile pipeline up cv-inference

	# 5b. Or override the detector at runtime (here RF-DETR), no rebuild needed
	DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
	```

	---

	## Citation

	If you use these weights in your research or projects, please cite:

	```bibtex
	@misc{sitesense2025,
	author = {Mahmoud Zaafan},
	title = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
	year = {2025},
	url = {https://github.com/Mahmoud-Zaafan/SiteSense}
	}
	```

	---

	## License

	All weights are released under the [MIT License](LICENSE).