
@@ -8,6 +8,8 @@ tags:
 - construction
 - aerial-vision
 - rf-detr
 - dinov3
 - osnet
 - real-time
@@ -22,7 +24,7 @@ datasets:
 **Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
-[![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/asdfqer)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
 [![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)
@@ -31,22 +33,25 @@ datasets:
 ## Overview
-This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/asdfqer) — a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.
 The system processes each frame through a multi-phase pipeline:
 ```
-Video Frame → RF-DETR Detection → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
 ```
 ---
 ## Model Weights
-| File | Size | Architecture | Task | Training Data |
 |:---|:---:|:---|:---|:---|
-| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | Custom aerial construction dataset (Roboflow) |
-| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Contrastive pairs from tracked equipment |
 | `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
 > **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
@@ -55,7 +60,7 @@ Video Frame → RF-DETR Detection → BoT-SORT Tracking → DINOv3 Re-ID → Act
 ## Detection Classes
-The RF-DETR detector is fine-tuned to recognize **8 classes** of construction equipment from aerial perspectives:
 | ID | Class | ID | Class |
 |:---:|:---|:---:|:---|
@@ -68,17 +73,36 @@ The RF-DETR detector is fine-tuned to recognize **8 classes** of construction eq
 ## Training Results
-### RF-DETR Detector
-| Metric | Value |
-|:---|:---:|
-| **mAP@50** | 0.8340 |
-| **mAP@50:95** | 0.7607 |
-| **F1 Score** | 0.8859 |
-| **Precision** | 0.8666 |
-| **Recall** | 0.9061 |
-| Resolution | 560×560 |
-| Epochs | 70 |
 ### DINOv3 Re-ID Projection Head
@@ -100,15 +124,20 @@ pip install huggingface_hub
 huggingface-cli download Zaafan/sitesense-weights --local-dir models/
 ```
 ### Option B: Python API
 ```python
 from huggingface_hub import hf_hub_download
-# Download individual weights
-hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
-hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
-hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
 ```
 ### Option C: Auto-Download (Zero Setup)
@@ -116,8 +145,9 @@ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17
 The SiteSense pipeline automatically downloads missing weights on first run:
 ```python
-# In services/cv-inference/main.py — resolve_weights() handles this transparently
-weights_path = resolve_weights('rfdetr_construction.pth')  # local first, HF fallback
 ```
 ---
@@ -126,8 +156,8 @@ weights_path = resolve_weights('rfdetr_construction.pth')  # local first, HF fal
 ```bash
 # 1. Clone the repository
-git clone https://github.com/Mahmoud-Zaafan/asdfqer.git
-cd asdfqer
 # 2. Download weights
 huggingface-cli download Zaafan/sitesense-weights --local-dir models/
@@ -135,9 +165,14 @@ huggingface-cli download Zaafan/sitesense-weights --local-dir models/
 # 3. Configure environment
 cp .env.example .env
-# 4. Launch infrastructure + run pipeline
-docker compose up --build
 docker compose --profile pipeline up cv-inference
 ```
 ---

 - construction
 - aerial-vision
 - rf-detr
+- yolo
+- yolo26
 - dinov3
 - osnet
 - real-time
 **Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
+[![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/SiteSense)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
 [![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)
 ## Overview
+This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense) — a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.
 The system processes each frame through a multi-phase pipeline:
 ```
+Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
 ```
+Two interchangeable detectors are provided. Switch at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`) — no rebuild required.
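
As a rough illustration of the runtime switch, the sketch below maps `DETECTOR_TYPE` to one of the two detector weight files listed in this card. The function name, the mapping dictionary, and the `rfdetr` fallback are assumptions made for the example; the actual selection logic in the SiteSense code may differ.

```python
import os

# Filenames come from the Model Weights table; the mapping itself is an assumption.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def detector_weights_filename(env=None, default="rfdetr"):
    """Return the weight filename for the detector named in DETECTOR_TYPE.

    `env` is any mapping of environment variables (defaults to os.environ).
    `default` is a hypothetical fallback used when DETECTOR_TYPE is unset.
    """
    env = os.environ if env is None else env
    detector = env.get("DETECTOR_TYPE", default).strip().lower()
    if detector not in DETECTOR_WEIGHTS:
        allowed = ", ".join(sorted(DETECTOR_WEIGHTS))
        raise ValueError(f"DETECTOR_TYPE must be one of: {allowed} (got {detector!r})")
    return DETECTOR_WEIGHTS[detector]


if __name__ == "__main__":
    print(detector_weights_filename())
```

Rejecting unknown values up front keeps a typo in `DETECTOR_TYPE` from silently falling back to a different detector.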
 ---
 ## Model Weights
+| File | Size | Architecture | Task | Notes |
 |:---|:---:|:---|:---|:---|
+| `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default** — best accuracy, NMS-free set prediction |
+| `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative — STAL, NMS-free, ProgLoss |
+| `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
 | `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
 > **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
 ## Detection Classes
+Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:
 | ID | Class | ID | Class |
 |:---:|:---|:---:|:---|
 ## Training Results
+Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.
+### Detector Comparison (val split)
+| Metric | RF-DETR (default) | YOLO26-L | Δ (RF − YOLO) |
+|:---|:---:|:---:|:---:|
+| **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
+| **mAP@50** | **0.910** | 0.905 | +0.5 pts |
+| **F1 Score** | **0.886** | 0.876 | +1.0 pts |
+| **Precision** | **0.929** | 0.924 | +0.5 pts |
+| **Recall** | **0.847** | 0.834 | +1.3 pts |
+| **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO faster |
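
The F1 row can be sanity-checked against the precision and recall rows using the standard harmonic-mean formula. The snippet below is only a consistency check on the rounded table values, not the evaluation code used for training, so small differences in the last digit are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, F1 as reported) copied from the comparison table.
reported = {
    "RF-DETR": (0.929, 0.847, 0.886),
    "YOLO26-L": (0.924, 0.834, 0.876),
}

for name, (precision, recall, f1_table) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: computed F1 = {f1:.4f}, table F1 = {f1_table:.3f}")
```
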
+RF-DETR wins on **7 of 8** classes by AP@50:95 (only bulldozer goes to YOLO26-L: 0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, **tower_crane (+6.0 pts)** and **mobile_crane (+4.7 pts)**, where its transformer-based set prediction appears to cope better with long boom shapes and heavy occlusion than YOLO26-L's convolutional detection head.
+<details>
+<summary><strong>Per-class AP@50:95</strong></summary>
+| Class | RF-DETR | YOLO26-L |
+|:---|:---:|:---:|
+| Excavator | **0.811** | 0.806 |
+| Dump Truck | **0.675** | 0.661 |
+| Bulldozer | 0.785 | **0.796** |
+| Wheel Loader | **0.810** | 0.792 |
+| Mobile Crane | **0.675** | 0.628 |
+| Tower Crane | **0.692** | 0.632 |
+| Roller Compactor | **0.838** | 0.825 |
+| Cement Mixer | **0.800** | 0.779 |
+</details>
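
The per-class margins and the win count quoted above can be recomputed directly from the table. This is a small arithmetic sketch over the published numbers; the dictionary layout and variable names are illustrative only.

```python
# Per-class AP@50:95 as (RF-DETR, YOLO26-L), copied from the table above.
AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}

# Margin in percentage points: positive means RF-DETR is ahead.
margins = {cls: round((rf - yolo) * 100, 1) for cls, (rf, yolo) in AP.items()}
rfdetr_wins = sum(1 for margin in margins.values() if margin > 0)

# List classes from the largest RF-DETR lead to the largest YOLO26-L lead.
for cls, margin in sorted(margins.items(), key=lambda item: item[1], reverse=True):
    print(f"{cls:<17} {margin:+.1f} pts")
print(f"RF-DETR ahead on {rfdetr_wins} of {len(AP)} classes")
```
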
 ### DINOv3 Re-ID Projection Head
 huggingface-cli download Zaafan/sitesense-weights --local-dir models/
 ```
+This pulls all four weight files at once into your `models/` directory: both detectors plus the two Re-ID files (the DINOv3 projection head and the OSNet model used by BoT-SORT).
 ### Option B: Python API
 ```python
 from huggingface_hub import hf_hub_download
+# Detectors (pick one or both)
+hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth",     local_dir="models/")
+hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt",  local_dir="models/")
+# Re-ID
+hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth",        local_dir="models/")
+hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt",       local_dir="models/")
 ```
 ### Option C: Auto-Download (Zero Setup)
 The SiteSense pipeline automatically downloads missing weights on first run:
 ```python
+# In services/cv-inference/main.py — resolve_weights() handles this transparently.
+# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
+weights_path = resolve_weights('yolo26l_construction_v1.pt')  # local first, HF fallback
 ```
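
The "local first, HF fallback" behaviour can be pictured roughly as follows. This is a minimal sketch, not the actual `resolve_weights()` from `services/cv-inference/main.py`: the name `resolve_weights_sketch`, the `models_dir` and `fetch` parameters, and the control flow are all assumptions. Only `hf_hub_download` and the repository ID come from this card.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def resolve_weights_sketch(filename, models_dir="models", fetch=None):
    """Return a path to `filename`, downloading it only if it is not on disk.

    `fetch(name, target_dir)` is a hypothetical hook that must return the path
    of the downloaded file; by default it wraps huggingface_hub.hf_hub_download.
    """
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        # Local copy found: no network access needed.
        return local_path

    if fetch is None:
        # Imported lazily so the local-only path works without huggingface_hub.
        from huggingface_hub import hf_hub_download

        def fetch(name, target_dir):
            return hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir)

    return Path(fetch(filename, str(models_dir)))
```

Passing a custom `fetch` makes the fallback easy to stub out in tests or to point at a mirror.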
 ---
 ```bash
 # 1. Clone the repository
+git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
+cd SiteSense
 # 2. Download weights
 huggingface-cli download Zaafan/sitesense-weights --local-dir models/
 # 3. Configure environment
 cp .env.example .env
+# 4. Launch infrastructure
+docker compose up --build -d
+# 5a. Run the pipeline with the default detector
 docker compose --profile pipeline up cv-inference
+# 5b. Or select a detector explicitly at runtime (rfdetr or yolo), no rebuild needed
+DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
 ```
 ---