Zaafan committed · Commit 0595b87 · verified · 1 Parent(s): dd8653b

Update README.md

Files changed (1)
  1. README.md +62 -27
README.md CHANGED
@@ -8,6 +8,8 @@ tags:
8
  - construction
9
  - aerial-vision
10
  - rf-detr
 
 
11
  - dinov3
12
  - osnet
13
  - real-time
@@ -22,7 +24,7 @@ datasets:
22
 
23
  **Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
24
 
25
- [![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/asdfqer)
26
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
27
  [![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
28
  [![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)
@@ -31,22 +33,25 @@ datasets:
31
 
32
  ## Overview
33
 
34
- This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/asdfqer), a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.
35
 
36
  The system processes each frame through a multi-phase pipeline:
37
 
38
  ```
39
- Video Frame → RF-DETR Detection → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
40
  ```
41
 
 
 
42
  ---
43
 
44
  ## Model Weights
45
 
46
- | File | Size | Architecture | Task | Training Data |
47
  |:---|:---:|:---|:---|:---|
48
- | `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | Custom aerial construction dataset (Roboflow) |
49
- | `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Contrastive pairs from tracked equipment |
 
50
  | `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
51
 
52
  > **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
@@ -55,7 +60,7 @@ Video Frame → RF-DETR Detection → BoT-SORT Tracking → DINOv3 Re-ID → Act
55
 
56
  ## Detection Classes
57
 
58
- The RF-DETR detector is fine-tuned to recognize **8 classes** of construction equipment from aerial perspectives:
59
 
60
  | ID | Class | ID | Class |
61
  |:---:|:---|:---:|:---|
@@ -68,17 +73,36 @@ The RF-DETR detector is fine-tuned to recognize **8 classes** of construction eq
68
 
69
  ## Training Results
70
 
71
- ### RF-DETR Detector
72
 
73
- | Metric | Value |
74
- |:---|:---:|
75
- | **mAP@50** | 0.8340 |
76
- | **mAP@50:95** | 0.7607 |
77
- | **F1 Score** | 0.8859 |
78
- | **Precision** | 0.8666 |
79
- | **Recall** | 0.9061 |
80
- | Resolution | 560×560 |
81
- | Epochs | 70 |
82
 
83
  ### DINOv3 Re-ID Projection Head
84
 
@@ -100,15 +124,20 @@ pip install huggingface_hub
100
  huggingface-cli download Zaafan/sitesense-weights --local-dir models/
101
  ```
102
 
 
 
103
  ### Option B: Python API
104
 
105
  ```python
106
  from huggingface_hub import hf_hub_download
107
 
108
- # Download individual weights
109
- hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
110
- hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
111
- hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
 
 
 
112
  ```
113
 
114
  ### Option C: Auto-Download (Zero Setup)
@@ -116,8 +145,9 @@ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17
116
  The SiteSense pipeline automatically downloads missing weights on first run:
117
 
118
  ```python
119
- # In services/cv-inference/main.py: resolve_weights() handles this transparently
120
- weights_path = resolve_weights('rfdetr_construction.pth') # local first, HF fallback
 
121
  ```
122
 
123
  ---
@@ -126,8 +156,8 @@ weights_path = resolve_weights('rfdetr_construction.pth') # local first, HF fal
126
 
127
  ```bash
128
  # 1. Clone the repository
129
- git clone https://github.com/Mahmoud-Zaafan/asdfqer.git
130
- cd asdfqer
131
 
132
  # 2. Download weights
133
  huggingface-cli download Zaafan/sitesense-weights --local-dir models/
@@ -135,9 +165,14 @@ huggingface-cli download Zaafan/sitesense-weights --local-dir models/
135
  # 3. Configure environment
136
  cp .env.example .env
137
 
138
- # 4. Launch infrastructure + run pipeline
139
- docker compose up --build
 
 
140
  docker compose --profile pipeline up cv-inference
 
 
 
141
  ```
142
 
143
  ---
 
8
  - construction
9
  - aerial-vision
10
  - rf-detr
11
+ - yolo
12
+ - yolo26
13
  - dinov3
14
  - osnet
15
  - real-time
 
24
 
25
  **Real-Time Construction Equipment Monitoring via Aerial Computer Vision**
26
 
27
+ [![GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white)](https://github.com/Mahmoud-Zaafan/SiteSense)
28
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
29
  [![Python 3.11](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white)](https://python.org)
30
  [![PyTorch 2.2+](https://img.shields.io/badge/PyTorch-2.2+-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org)
 
33
 
34
  ## Overview
35
 
36
+ This repository hosts the trained model weights for [SiteSense](https://github.com/Mahmoud-Zaafan/SiteSense), a real-time pipeline that **detects, tracks, identifies, and classifies the activity** of heavy construction equipment from drone/aerial video footage.
37
 
38
  The system processes each frame through a multi-phase pipeline:
39
 
40
  ```
41
+ Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events
42
  ```
43
 
44
+ Two interchangeable detectors are provided. Switch between them at runtime via the `DETECTOR_TYPE` environment variable (`rfdetr` or `yolo`); no rebuild is required.
45
+
46
  ---
47
 
48
  ## Model Weights
49
 
50
+ | File | Size | Architecture | Task | Notes |
51
  |:---|:---:|:---|:---|:---|
52
+ | `rfdetr_construction.pth` | 122 MB | RF-DETR (Real-time Foundation DETR) | 8-class object detection | **Default**: best accuracy, NMS-free set prediction |
53
+ | `yolo26l_construction_v1.pt` | 51 MB | YOLO26-L (Ultralytics, 24.8 M params) | 8-class object detection | Faster alternative: STAL, NMS-free, ProgLoss |
54
+ | `dinov3_reid_head.pth` | 5.4 MB | Linear projection head (1536→256→128) | Equipment re-identification | Trained contrastively on tracked equipment crops |
55
  | `osnet_x0_25_msmt17.pt` | 2.9 MB | OSNet x0.25 | Appearance-based ReID for BoT-SORT | MSMT17 (pretrained) |
56
 
57
  > **Note:** The DINOv3 ViT-B/16 backbone (~327 MB) is **not included** here. It is auto-downloaded from [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m) on first run using your `HF_TOKEN`.
 
60
 
61
  ## Detection Classes
62
 
63
+ Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize **8 classes** of construction equipment from aerial perspectives:
64
 
65
  | ID | Class | ID | Class |
66
  |:---:|:---|:---:|:---|
 
73
 
74
  ## Training Results
75
 
76
+ Both detectors were trained on the **identical** train/val/test split (42,733 / 4,615 / 990 images) for direct comparison. Numbers below are on the held-out val split.
77
 
78
+ ### Detector Comparison (val split)
79
+
80
+ | Metric | RF-DETR (default) | YOLO26-L | Δ (RF − YOLO) |
81
+ |:---|:---:|:---:|:---:|
82
+ | **mAP@50:95** | **0.761** | 0.740 | +2.1 pts |
83
+ | **mAP@50** | **0.910** | 0.905 | +0.5 pts |
84
+ | **F1 Score** | **0.886** | 0.876 | +1.0 pts |
85
+ | **Precision** | **0.929** | 0.924 | +0.5 pts |
86
+ | **Recall** | **0.847** | 0.834 | +1.3 pts |
87
+ | **FPS** (RTX 3050 Ti) | 9–10 | 11–13 | YOLO faster |
88
+
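As a quick consistency check on the comparison table, F1 can be recomputed from the reported precision and recall as their harmonic mean. The sketch below uses only the rounded table values, so it should agree with the reported F1 to within rounding; the `f1` helper is an illustrative name, not part of SiteSense.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the val-split comparison table
reported = {
    "RF-DETR": (0.929, 0.847, 0.886),
    "YOLO26-L": (0.924, 0.834, 0.876),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.3f}, reported = {f1_reported:.3f}")
```

Small differences in the third decimal are expected, since the table's precision and recall are themselves rounded.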
89
+ RF-DETR wins on **7 of 8** classes in per-class AP@50:95 (only bulldozer goes to YOLO26-L: 0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, **mobile_crane (+4.7 pts)** and **tower_crane (+6.0 pts)**, where set-based prediction handles long boom shapes and heavy occlusion better than YOLO's dense prediction head.
90
+
91
+ <details>
92
+ <summary><strong>Per-class AP@50:95</strong></summary>
93
+
94
+ | Class | RF-DETR | YOLO26-L |
95
+ |:---|:---:|:---:|
96
+ | Excavator | **0.811** | 0.806 |
97
+ | Dump Truck | **0.675** | 0.661 |
98
+ | Bulldozer | 0.785 | **0.796** |
99
+ | Wheel Loader | **0.810** | 0.792 |
100
+ | Mobile Crane | **0.675** | 0.628 |
101
+ | Tower Crane | **0.692** | 0.632 |
102
+ | Roller Compactor | **0.838** | 0.825 |
103
+ | Cement Mixer | **0.800** | 0.779 |
104
+
105
+ </details>
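The per-class claims (7 of 8 classes won by RF-DETR, largest margins on the two crane classes) can be re-derived from the table with a few lines of Python. This is only an arithmetic sketch over the published numbers; the variable names are illustrative.

```python
# AP@50:95 per class, copied from the table: (RF-DETR, YOLO26-L)
per_class_ap = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}

# Margin in points (RF-DETR minus YOLO26-L), rounded to one decimal
margins = {
    name: round((rf - yolo) * 100, 1) for name, (rf, yolo) in per_class_ap.items()
}

rfdetr_wins = sum(margin > 0 for margin in margins.values())
largest = sorted(margins.items(), key=lambda kv: kv[1], reverse=True)[:2]

print(f"RF-DETR wins {rfdetr_wins} of {len(margins)} classes")
for name, margin in largest:
    print(f"{name}: +{margin} pts")
```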
106
 
107
  ### DINOv3 Re-ID Projection Head
108
 
 
124
  huggingface-cli download Zaafan/sitesense-weights --local-dir models/
125
  ```
126
 
127
+ This pulls all four weight files at once into your `models/` directory: both detectors plus both Re-ID weight files.
128
+
129
  ### Option B: Python API
130
 
131
  ```python
132
  from huggingface_hub import hf_hub_download
133
 
134
+ # Detectors (pick one or both)
135
+ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth", local_dir="models/")
136
+ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt", local_dir="models/")
137
+
138
+ # Re-ID
139
+ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth", local_dir="models/")
140
+ hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt", local_dir="models/")
141
  ```
142
 
143
  ### Option C: Auto-Download (Zero Setup)
 
145
  The SiteSense pipeline automatically downloads missing weights on first run:
146
 
147
  ```python
148
+ # In services/cv-inference/main.py: resolve_weights() handles this transparently.
149
+ # It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
150
+ weights_path = resolve_weights('yolo26l_construction_v1.pt') # local first, HF fallback
151
  ```
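To make the "local first, HF fallback" behavior concrete, here is a minimal sketch of how such a resolver could work. It is not the actual `resolve_weights()` from the SiteSense repository: the names `WEIGHT_FILES`, `resolve_detector_weights`, and the `download` parameter are hypothetical, and the `rfdetr` fallback when `DETECTOR_TYPE` is unset is an assumption. Only `hf_hub_download` (with `repo_id`, `filename`, `local_dir`) is a real `huggingface_hub` call, as used in Option B.

```python
import os
from pathlib import Path
from typing import Callable, Optional

REPO_ID = "Zaafan/sitesense-weights"

# Detector type -> weight file, using the file names from the weights table.
WEIGHT_FILES = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def _hf_download(filename: str, models_dir: Path) -> str:
    # Imported lazily so the local-only path works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID, filename=filename, local_dir=str(models_dir)
    )


def resolve_detector_weights(
    detector_type: Optional[str] = None,
    models_dir: str = "models",
    download: Callable[[str, Path], str] = _hf_download,
) -> Path:
    """Return the weight path for the chosen detector: local first, Hub fallback."""
    # Assumption: fall back to "rfdetr" when DETECTOR_TYPE is not set.
    detector_type = (detector_type or os.environ.get("DETECTOR_TYPE", "rfdetr")).lower()
    if detector_type not in WEIGHT_FILES:
        raise ValueError(f"DETECTOR_TYPE must be one of {sorted(WEIGHT_FILES)}")

    filename = WEIGHT_FILES[detector_type]
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        return local_path  # already downloaded, no network access needed
    return Path(download(filename, Path(models_dir)))
```

Passing the downloader as a parameter keeps the lookup logic testable offline; in normal use, calling `resolve_detector_weights()` with no arguments would read `DETECTOR_TYPE` from the environment.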
152
 
153
  ---
 
156
 
157
  ```bash
158
  # 1. Clone the repository
159
+ git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
160
+ cd SiteSense
161
 
162
  # 2. Download weights
163
  huggingface-cli download Zaafan/sitesense-weights --local-dir models/
 
165
  # 3. Configure environment
166
  cp .env.example .env
167
 
168
+ # 4. Launch infrastructure
169
+ docker compose up --build -d
170
+
171
+ # 5a. Run pipeline with the default detector (YOLO26-L)
172
  docker compose --profile pipeline up cv-inference
173
+
174
+ # 5b. Or switch to RF-DETR at runtime (no rebuild needed)
175
+ DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference
176
  ```
177
 
178
  ---