---
license: mit
language:
- en
tags:
- object-detection
- re-identification
- construction
- aerial-vision
- rf-detr
- yolo
- yolo26
- dinov3
- osnet
- real-time
- tracking
pipeline_tag: object-detection
library_name: pytorch
datasets:
- roboflow
---

# SiteSense: Model Weights

**Real-Time Construction Equipment Monitoring via Aerial Computer Vision**

[GitHub](https://github.com/Mahmoud-Zaafan/SiteSense) · [License](LICENSE) · [Python](https://python.org) · [PyTorch](https://pytorch.org)

---

## Overview

This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense), a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment in drone/aerial video footage.

The system processes each frame through a multi-phase pipeline:

```
Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
```

Two interchangeable detectors are provided. Switch between them at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`); no rebuild is required.

---

## Model Weights

| File | Size | Architecture | Task | Notes |
|:---|:---:|:---|:---|:---|
| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default**: best accuracy, NMS-free set prediction |
| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
| `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | Pretrained on MSMT17 |

> **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is downloaded automatically from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.

---

## Detection Classes

Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:

| ID | Class | ID | Class |
|:---:|:---|:---:|:---|
| 0 | Excavator | 4 | Mobile Crane |
| 1 | Dump Truck | 5 | Tower Crane |
| 2 | Bulldozer | 6 | Roller Compactor |
| 3 | Wheel Loader | 7 | Cement Mixer |

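For post-processing detector output, the class table can be written as a simple lookup. This is a minimal sketch: `CLASS_NAMES` and `class_name` are illustrative names that are not part of SiteSense, and the label strings stored inside the weight files may be spelled differently (for example `mobile_crane`).

```python
# Hypothetical lookup mirroring the class table in this card (IDs 0-7).
# The exact label strings embedded in the checkpoints are an assumption.
CLASS_NAMES = {
    0: "Excavator",
    1: "Dump Truck",
    2: "Bulldozer",
    3: "Wheel Loader",
    4: "Mobile Crane",
    5: "Tower Crane",
    6: "Roller Compactor",
    7: "Cement Mixer",
}


def class_name(class_id: int) -> str:
    """Return the display name for a class ID, or an 'unknown(<id>)' placeholder."""
    return CLASS_NAMES.get(class_id, f"unknown({class_id})")


print(class_name(5))
```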
---

## Training Results

Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.

### Detector Comparison (val split)

| Metric | RF-DETR (default) | YOLO26-L | Δ (RF-DETR − YOLO26-L) |
|:---|:---:|:---:|:---:|
| **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
| **mAP@50** | **0.910** | 0.905 | +0.5 pts |
| **F1 Score** | **0.886** | 0.876 | +1.0 pts |
| **Precision** | **0.929** | 0.924 | +0.5 pts |
| **Recall** | **0.847** | 0.834 | +1.3 pts |
| **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO26-L faster |

RF-DETR wins **7 of 8** classes on per-class AP@50:95 (only bulldozer goes to YOLO26-L: 0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, **mobile_crane (+4.7 pts)** and **tower_crane (+6.0 pts)**, where its set-based prediction appears to handle long boom shapes and heavy occlusion better than YOLO26-L's dense convolutional head.

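As a sanity check, the F1 row of the comparison table can be approximately reproduced from the precision and recall rows using the harmonic mean, F1 = 2PR / (P + R). The sketch below assumes the reported F1 is derived from the aggregate precision and recall; if the evaluation averages per-class F1 instead, small differences in the third decimal are expected. The function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 copied from the comparison table (val split).
reported = {
    "RF-DETR": {"precision": 0.929, "recall": 0.847, "f1": 0.886},
    "YOLO26-L": {"precision": 0.924, "recall": 0.834, "f1": 0.876},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {m['f1']:.3f}")
```

Both computed values land within 0.001 of the figures in the table.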
<details>
<summary><strong>Per-class AP@50:95</strong></summary>

| Class | RF-DETR | YOLO26-L |
|:---|:---:|:---:|
| Excavator | **0.811** | 0.806 |
| Dump Truck | **0.675** | 0.661 |
| Bulldozer | 0.785 | **0.796** |
| Wheel Loader | **0.810** | 0.792 |
| Mobile Crane | **0.675** | 0.628 |
| Tower Crane | **0.692** | 0.632 |
| Roller Compactor | **0.838** | 0.825 |
| Cement Mixer | **0.800** | 0.779 |

</details>

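The per-class summary (7 of 8 wins, largest margins on the two crane classes) follows directly from the per-class table. A short sketch that recomputes it; the dictionary layout and variable names are illustrative only.

```python
# Per-class AP@50:95 copied from the per-class table: (RF-DETR, YOLO26-L).
per_class_ap = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}

# Margin in AP points (difference x 100), positive when RF-DETR is ahead.
margins = {
    name: round((rf - yolo) * 100, 1) for name, (rf, yolo) in per_class_ap.items()
}
rfdetr_wins = [name for name, delta in margins.items() if delta > 0]
largest = max(margins, key=margins.get)

print(f"RF-DETR ahead on {len(rfdetr_wins)} of {len(margins)} classes")
print(f"Largest margin: {largest} (+{margins[largest]} pts)")
```

Rounding to one decimal keeps floating-point noise out of the comparison with the margins quoted in the text.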
### DINOv3 Re-ID Projection Head

| Metric | Value |
|:---|:---:|
| **Contrastive Loss** | 0.0482 |
| **Accuracy** | 96.8% |
| Embedding Dim | 128-d L2-normalized |
| Training Pairs | ~12,000 positive pairs |

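Because the head emits 128-d L2-normalized embeddings, comparing two equipment crops reduces to a dot product (cosine similarity). The sketch below illustrates that matching step in plain Python. The function names, the toy 4-d vectors, and the 0.7 threshold are assumptions for illustration, not values taken from SiteSense.

```python
import math
from typing import Dict, List, Optional


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def match_identity(
    query: List[float], gallery: Dict[str, List[float]], threshold: float = 0.7
) -> Optional[str]:
    """Return the most similar gallery ID, or None if nothing reaches the threshold."""
    best_id, best_sim = None, threshold
    for identity, emb in gallery.items():
        sim = cosine_similarity(query, emb)
        if sim >= best_sim:
            best_id, best_sim = identity, sim
    return best_id


# Toy 4-d embeddings stand in for the real 128-d vectors.
gallery = {
    "excavator_01": l2_normalize([0.9, 0.1, 0.0, 0.1]),
    "dump_truck_02": l2_normalize([0.0, 0.2, 0.9, 0.3]),
}
query = l2_normalize([0.8, 0.2, 0.1, 0.1])
print(match_identity(query, gallery))
```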
---

## Quick Start

### Option A: Download All Weights (Recommended)

```bash
pip install huggingface_hub
huggingface-cli download Zaafan/sitesense-weights --local-dir models/
```

This pulls all four weight files into your `models/` directory at once: both detectors plus the two Re-ID weight files.

### Option B: Python API

```python
from huggingface_hub import hf_hub_download

# Detectors (pick one or both)
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt", local_dir="models/")

# Re-ID
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
```

### Option C: Auto-Download (Zero Setup)

The SiteSense pipeline automatically downloads missing weights on first run:

```python
# In services/cv-inference/main.py, resolve_weights() handles this transparently.
# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
weights_path = resolve_weights('yolo26l_construction_v1.pt')  # local first, HF fallback
```
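The actual `resolve_weights()` lives in the SiteSense repository and is not reproduced here. The sketch below shows one way a local-first, Hub-fallback lookup keyed on `DETECTOR_TYPE` could be written. The function names, the `DETECTOR_WEIGHTS` mapping, the `download` parameter, and the `rfdetr` fallback default are all assumptions; only `hf_hub_download`, the repo ID, and the file names come from this card.

```python
import os
from pathlib import Path
from typing import Callable, Optional

REPO_ID = "Zaafan/sitesense-weights"

# Assumed mapping from DETECTOR_TYPE values to the weight files in this repo.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def detector_weights_name(default: str = "rfdetr") -> str:
    """Pick the detector weight file from the DETECTOR_TYPE environment variable."""
    detector = os.environ.get("DETECTOR_TYPE", default).strip().lower()
    if detector not in DETECTOR_WEIGHTS:
        raise ValueError(f"Unsupported DETECTOR_TYPE: {detector!r}")
    return DETECTOR_WEIGHTS[detector]


def resolve_weights_sketch(
    filename: str,
    models_dir: str = "models",
    download: Optional[Callable[[str, str], str]] = None,
) -> Path:
    """Return a local path to `filename`, downloading it only when missing."""
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        return local_path  # local copy wins, no network access

    if download is None:
        # Imported lazily so the local-only path works without huggingface_hub.
        from huggingface_hub import hf_hub_download

        def download(name: str, target_dir: str) -> str:
            return hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir)

    return Path(download(filename, models_dir))
```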

---

## Usage with SiteSense Pipeline

```bash
# 1. Clone the repository
git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
cd SiteSense

# 2. Download weights
huggingface-cli download Zaafan/sitesense-weights --local-dir models/

# 3. Configure environment
cp .env.example .env

# 4. Launch infrastructure
docker compose up --build -d

# 5a. Run the pipeline with the currently configured detector
docker compose --profile pipeline up cv-inference

# 5b. Or select RF-DETR for this run via DETECTOR_TYPE (no rebuild needed)
DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
```

---

## Citation

If you use these weights in your research or projects, please cite:

```bibtex
@misc{sitesense2025,
  author = {Mahmoud Zaafan},
  title = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
  year = {2025},
  url = {https://github.com/Mahmoud-Zaafan/SiteSense}
}
```

---

## License

All weights are released under the [MIT License](LICENSE).