meaculpitt committed · verified
Commit 1345dac · Parent(s): 7ac8eba

v3.28: remove petrol (element closed), dual-model only

Files changed (1):
sv_gpu.py +2203 -0
sv_gpu.py ADDED
@@ -0,0 +1,2203 @@
"""
Score Vision SN44 — Unified miner v3.28 (2026-04-08). R9c vehicle FP16 (mAP50=0.929). Person: TTA consensus.
Dual-model: vehicle (YOLO11m INT8 1280) + person (YOLO12s FP16 960 TRT).
Pose model: YOLOv8n-pose FP16 640 for false-positive filtering + keypoint box refinement.
Vehicle weights loaded from secondary HF repo (meaculpitt/ScoreVision-Vehicle).
Person weights loaded from primary HF repo (template downloads automatically).

Vehicle model (vehicle_weights.onnx):
Trained classes: 0=car, 1=bus, 2=truck, 3=motorcycle
Output: 0=bus, 1=car, 2=truck, 3=motorcycle. All classes scored (v3.20 bus fix).
Per-class confidence thresholds: car 0.60, truck 0.45, motorcycle 0.50, bus 0.45 (VEH_CLASS_CONF).
Per-class aspect ratio bounds for FP filtering.
Primary pass + horizontal-flip TTA pass (ENABLE_TTA; see _infer_vehicle_core).

Person model (person_weights.onnx):
YOLO12s FP16 960px end2end [1,300,6]. Single class: 0=person.
Background TRT build: starts on CUDA immediately, builds TRT FP16 engine in background
thread (~18min on fresh node), swaps to TRT atomically when ready. Cached thereafter.
SAHI-style tiling: full + 2 adaptive tiles + flip TTA, max-conf NMS merge.

Pose model (pose_weights.onnx):
YOLOv8n-pose FP16 640px [1,56,8400]. 17 COCO keypoints.
Runs once on full image after person detection.
Anatomical keypoint scoring: weighted per-keypoint sum (head 0.38, upper 0.32, lower 0.25).
1. Head keypoints visible → never suppress, always refine box.
2. Score >= 0.15 → keep + refine. Score > 0 → keep as-is. Score == 0 + large + low-conf → suppress.
3. Box refinement: blend detected box with tight keypoint bbox for better fit.
Face detector (optional): if face_session loaded, face inside box → never suppress.

Vehicle + person models run on every image when hint='both'. All detections merged.
Vehicle eval uses cls_id 1-3. Person eval uses cls_id 0 only.
"""

import os
import ctypes
import glob as _glob
import logging as _logging

_cuda_log = _logging.getLogger(__name__)

def _preload_cuda_libs():
    """Pre-load CUDA + TensorRT libs from pip packages so ORT GPU/TRT providers work.

    Search order for TRT libs (libnvinfer.so, libnvonnxparser.so):
    1. sys.path entries containing tensorrt_libs/ subdirectory
    2. site.getsitepackages() + user site-packages for tensorrt_libs/ or tensorrt/
    3. ctypes.util.find_library('nvinfer') as system-wide fallback
    If not found, logs clearly and skips TRT — never attempts pip operations.
    """
    try:
        import ctypes.util as _ctypes_util
        lib_dirs = []
        loaded = set()

        # ── CUDA libs from nvidia pip packages ──
        for mod_name in ['nvidia.cudnn', 'nvidia.cublas', 'nvidia.cuda_runtime',
                         'nvidia.cufft', 'nvidia.curand', 'nvidia.cusolver',
                         'nvidia.cusparse', 'nvidia.nvjitlink']:
            try:
                mod = __import__(mod_name, fromlist=['__file__'])
                lib_dir = os.path.join(os.path.dirname(mod.__file__), 'lib')
                if os.path.isdir(lib_dir) and lib_dir not in lib_dirs:
                    lib_dirs.append(lib_dir)
            except ImportError:
                pass

        # ── TensorRT libs — multi-strategy search ──
        import sys as _sys
        _trt_dir = None

        # Strategy 1: sys.path (covers standard pip installs)
        for p in _sys.path:
            for subdir in ('tensorrt_libs', 'tensorrt'):
                candidate = os.path.join(p, subdir)
                if os.path.isdir(candidate) and _glob.glob(os.path.join(candidate, 'libnvinfer*')):
                    _trt_dir = candidate
                    break
            if _trt_dir:
                break

        # Strategy 2: site-packages directories (covers user installs, venvs)
        if not _trt_dir:
            import site
            search_dirs = list(site.getsitepackages()) if hasattr(site, 'getsitepackages') else []
            user_site = getattr(site, 'getusersitepackages', lambda: None)()
            if user_site:
                search_dirs.append(user_site)
            # Also check common paths not always in site
            search_dirs.extend([
                '/usr/local/lib/python3.12/dist-packages',
                os.path.expanduser('~/.local/lib/python3.12/site-packages'),
                '/home/miner/.local/lib/python3.12/site-packages',
            ])
            for sp in search_dirs:
                for subdir in ('tensorrt_libs', 'tensorrt'):
                    candidate = os.path.join(sp, subdir)
                    if os.path.isdir(candidate) and _glob.glob(os.path.join(candidate, 'libnvinfer*')):
                        _trt_dir = candidate
                        break
                if _trt_dir:
                    break

        # Strategy 3: ctypes.util.find_library (system-wide LD search)
        if not _trt_dir:
            nvinfer_path = _ctypes_util.find_library('nvinfer')
            if nvinfer_path:
                _cuda_log.info('TRT found via system library: %s', nvinfer_path)
                try:
                    ctypes.CDLL(nvinfer_path, mode=ctypes.RTLD_GLOBAL)
                    loaded.add('nvinfer')
                except OSError as e:
                    _cuda_log.warning('Failed to load system nvinfer: %s', e)

        if _trt_dir:
            if _trt_dir not in lib_dirs:
                lib_dirs.append(_trt_dir)
            _cuda_log.info('TRT libs directory: %s', _trt_dir)
        elif 'nvinfer' not in loaded:
            _cuda_log.info('TensorRT libs not found — TRT EP will be unavailable (CUDA EP still works)')

        if not lib_dirs and not loaded:
            _cuda_log.warning('No CUDA or TRT libs found to preload')
            return

        # Set LD_LIBRARY_PATH for any child processes / dlopen fallbacks
        existing = os.environ.get('LD_LIBRARY_PATH', '')
        os.environ['LD_LIBRARY_PATH'] = ':'.join(lib_dirs + ([existing] if existing else []))

        # Load CUDA libs (glob all .so in nvidia dirs)
        for lib_dir in lib_dirs:
            if 'tensorrt' in lib_dir:
                continue  # TRT libs loaded selectively below
            for so in sorted(_glob.glob(os.path.join(lib_dir, 'lib*.so*'))):
                try:
                    ctypes.CDLL(so, mode=ctypes.RTLD_GLOBAL)
                except OSError:
                    pass

        # Load TRT libs selectively (only the essentials, not builder resources)
        if _trt_dir:
            for lib_name in ['libnvinfer.so', 'libnvinfer_plugin.so', 'libnvonnxparser.so']:
                matches = _glob.glob(os.path.join(_trt_dir, lib_name + '*'))
                if matches:
                    try:
                        ctypes.CDLL(matches[0], mode=ctypes.RTLD_GLOBAL)
                        loaded.add(lib_name.split('.')[0])
                    except OSError as e:
                        _cuda_log.warning('Failed to load %s: %s', lib_name, e)
                else:
                    _cuda_log.info('%s not found in %s', lib_name, _trt_dir)

        if loaded:
            _cuda_log.info('Preloaded libs: %s', ', '.join(sorted(loaded)))
    except Exception as e:
        _cuda_log.warning('CUDA/TRT preload error: %s', e)

_preload_cuda_libs()
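
# Optional sanity check (illustrative only, not part of the pipeline): after the
# preload, onnxruntime should report the GPU execution providers it can use.
#   import onnxruntime as ort
#   print(ort.get_available_providers())
#   # e.g. ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']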


from pathlib import Path
import math
import time
import logging

import cv2
import numpy as np
import onnxruntime as ort
from numpy import ndarray
from pydantic import BaseModel

import json
import threading
from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor, as_completed
import inspect

# ── Latency logger (per-request timing) ─────────────────────────────────
import logging as _lat_logging
_lat_logger = _lat_logging.getLogger("sv_latency")
_lat_logger.setLevel(_lat_logging.INFO)
_lat_logger.propagate = False
if not _lat_logger.handlers:
    try:
        import tempfile as _lat_tempfile
        # Try /home/miner first (Lium), fall back to /tmp (Chutes cloud)
        for _lat_path in ["/home/miner/latency.log", _lat_tempfile.gettempdir() + "/latency.log"]:
            try:
                _lat_fh = _lat_logging.FileHandler(_lat_path)
                _lat_fh.setFormatter(_lat_logging.Formatter(
                    "%(asctime)s.%(msecs)03d %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
                _lat_logger.addHandler(_lat_fh)
                break
            except (OSError, PermissionError):
                continue
    except Exception:
        pass  # No file logging — latency still logged via main logger

logger = logging.getLogger(__name__)

# ── Vehicle config ──────────────────────────────────────────────────────────
VEH_MODEL_TO_OUT: dict[int, int] = {0: 1, 1: 0, 2: 2, 3: 3}  # bus→0 (validator expects bus at idx 0)
VEH_SKIP_CLS = set()  # v3.20: bus now scored (cls_id=0). Element detection prevents collision.
VEH_NUM_CLASSES = 4
VEH_CONF_THRES = 0.30  # Low decode threshold for TTA (final filter is per-class)
VEH_TTA_CONF = 0.20  # TTA flip pass decode threshold
VEH_NMS_IOU = 0.50

# ── Per-class vehicle confidence thresholds (output cls_id) ────────────────
# Raising from uniform 0.35: reduces FP (avg 4.1 FFPI → target <2.0)
VEH_CLASS_CONF: dict[int, float] = {
    1: 0.60,  # car — raised from 0.50, most FP-prone class (75% of training data)
    2: 0.45,  # truck — keep
    3: 0.50,  # motorcycle — raised from 0.45, small targets prone to FP
    0: 0.45,  # bus — keep
}

# ── Per-class vehicle aspect ratio bound (max ratio) ───────────────────────
# ratio = max(w,h) / min(w,h). Generous bounds to avoid suppressing valid detections.
VEH_CLASS_ASPECT: dict[int, float] = {
    1: 5.0,  # car — rarely > 5:1 from any angle
    2: 6.0,  # truck — can be elongated
    3: 4.5,  # motorcycle — compact, rarely very elongated
    0: 8.0,  # bus — elongated body
}

# ── Per-class minimum area (pixels) ───────────────────────────────────────
VEH_CLASS_MIN_AREA: dict[int, int] = {
    1: 196,  # car — 14x14 min
    2: 256,  # truck — 16x16 min (should be at least medium-sized)
    3: 100,  # motorcycle — 10x10 min (can be very small in distance)
    0: 400,  # bus — 20x20 min
}

# ── Vehicle box sanity filters (global fallbacks) ─────────────────────────
VEH_MIN_WH = 20  # was 8. Kills tiny horizon artifacts (confirmed: h<25 extras on block 7900800)
VEH_MIN_AREA = 100
VEH_MAX_ASPECT = 8.0
VEH_MAX_AREA_RATIO = 0.95
VEH_MAX_DET = 40

# ── Vehicle parts confirmation config ────────────────────────────────────
# Cross-validates vehicle detections using person detections, OpenCV analysis,
# and optional license plate detector. Small/distant vehicles exempt.
VEH_PARTS_ENABLED = True  # Master switch for parts confirmation
VEH_PARTS_SMALL_AREA = 0.004  # Below this area ratio: exempt from suppression
VEH_PARTS_FP_CONF = 0.50  # Below this conf + large + unconfirmed → suppress
VEH_PARTS_FP_CONF_STRICT = 0.55  # Stricter threshold when plate model loaded but no plate
VEH_PARTS_FP_AREA = 0.03  # Above this area ratio → eligible for FP suppression
# Confidence boosts for confirmed parts (additive)
VEH_PARTS_BOOST_DRIVER = 0.08  # Person in driver/passenger region
VEH_PARTS_BOOST_RIDER = 0.10  # Person on motorcycle (overlap + optional lean)
VEH_PARTS_BOOST_HL = 0.05  # Headlight pair detected
VEH_PARTS_BOOST_PLATE = 0.12  # License plate detected (Phase 2)
VEH_PARTS_BOOST_WINDOW = 0.06  # Bus window pattern on truck
# Headlight detection thresholds
VEH_PARTS_HL_MIN_PX = 60  # Min vehicle width (px) for headlight check
VEH_PARTS_HL_BRIGHT = 200  # Grayscale threshold for bright spots
VEH_PARTS_HL_MIN_BLOB = 15  # Min contour area for headlight candidate
# Window pattern detection (bus/coach)
VEH_PARTS_WINDOW_MIN_PX = 100  # Min vehicle width for window pattern check
VEH_PARTS_WINDOW_MIN_PEAKS = 3  # Min periodic edge peaks for window confirmation
# Motorcycle rider pose
VEH_PARTS_RIDER_LEAN_DEG = 15.0  # Min torso lean from vertical (degrees) for rider pose
# Plate detection thresholds
VEH_PARTS_PLATE_MIN_PX = 80  # plates visible at ~80px vehicle width (was 120)
VEH_PARTS_PLATE_CONF = 0.35  # Min plate detection confidence
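
# Worked example of the additive boosts (illustrative numbers; the logic lives in
# _vehicle_parts_confirm below): a car (cls_id=1) detected at conf 0.48 with a
# plate hit (+0.12) and a person in the driver region (+0.08) is reported at
# min(1.0, 0.48 + 0.20) = 0.68 and, being confirmed, is never FP-suppressed.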

# ── Person config (TTA consensus) ───────────────────────────────────────────
PER_CONF_LOW = 0.60  # Was 0.55. Raised 2026-04-05 to match top peer precision floor after
                     # observing the 3-way tied 52-box group (conf_min=0.585, composite=0.280) was
                     # beaten by top peer's 44-box response (conf_min=0.716, composite=0.377).
                     # 0.60 targets the precision/recall inflection point without the full 0.65+
                     # aggression that might cost recall on sparse scenes.
PER_CONF_HIGH = 0.58  # NOTE: dead code, not referenced anywhere. Kept for reference only.
PER_CONSENSUS_IOU = 0.50
PER_RTF_BUDGET = 8.0

# ── Person box sanity filters ──────────────────────────────────────────────
PER_MIN_WH = 8
PER_MIN_AREA = 14 * 14
PER_MAX_ASPECT = 6.0
PER_MAX_AREA_RATIO = 0.80

# ── Person tiling config (SAHI-inspired) ────────────────────────────────────
PER_TILE_OVERLAP = 0.20  # 20% overlap between tiles
PER_TILE_MIN_DIM_RATIO = 1.15  # tile when image dim > model_dim * this (~1104px for 960 model)
PER_TILE_CONF = 0.55  # raised from 0.40; sits just below PER_CONF_LOW (0.60)
PER_NMS_IOU = 0.50  # NMS IoU for merging across passes (max-conf wins)
PER_MAX_DET = 100  # Loose safety ceiling ONLY — not a count cap. Strategy is confidence-floor:
                   # PER_CONF_LOW=0.60 is the real filter; any box above threshold passes.
                   # Raised from 50 after 2026-04-05 investigation: top peers emit 77+ boxes on
                   # crowd eval images, and the currently-running chute (rev 6b9d0d6) caps at 30
                   # which is demonstrably hitting mAP50 0.39 on person crowd blocks. 50 would
                   # still clip. 100 gives real headroom — only triggers on pathological runaway
                   # FP cases where NMS has already failed. Previous values (10 spec'd, 50 first
                   # fix) were too tight. See FAILURE_ANALYSIS.md (2026-04-05).

# ── TTA consensus thresholds (DMSC19-inspired graduated approach) ────────────
# Cross-view confirmation eliminates the soft-NMS confidence decay bug.
# Instead of concatenate+soft-NMS (which decayed confs below floor), we match
# boxes across original+flip views and apply graduated confidence thresholds,
# as sketched below.
PER_TTA_MATCH_IOU = 0.50  # IoU threshold for cross-view box matching
PER_TTA_CONF_BOTH = 0.50  # Confirmed by both views: lower threshold (high confidence)
PER_TTA_CONF_ORIG = 0.60  # Original-only: standard threshold (PER_CONF_LOW)
PER_TTA_CONF_FLIP = 0.75  # Flip-only: strict (flip-only detections are likely FP)
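
# Minimal sketch of the graduated rule these three thresholds encode. This helper
# is illustrative only; the pipeline's actual cross-view matcher (IoU-based,
# further down in this file) is what runs in production.
def _tta_keep_sketch(conf: float, in_orig: bool, in_flip: bool) -> bool:
    if in_orig and in_flip:
        return conf >= PER_TTA_CONF_BOTH  # confirmed by both views: lenient floor
    if in_orig:
        return conf >= PER_TTA_CONF_ORIG  # original view only: standard floor
    return conf >= PER_TTA_CONF_FLIP      # flip view only: strict (likely FP)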

# ── Frame quality gating (Laplacian variance) ───────────────────────────────
PER_BLUR_THRESHOLD = 50.0  # Laplacian variance below this = severely blurry
PER_BLUR_CONF_PENALTY = 0.85  # multiply confs by this for blurry frames (reduce FP)

# ── Adaptive CLAHE config ───────────────────────────────────────────────────
PER_CLAHE_CLIP = 2.0  # mild CLAHE (was 12.0, too aggressive)
PER_CLAHE_CONTRAST_THRESH = 40.0  # only apply CLAHE when L-channel std < this

# ── Perspective scaling confidence penalty ─────────────────────────────────
PERSP_DEVIATION_THRESH = 3.0  # ratio >3x or <1/3x triggers penalty
PERSP_CONF_PENALTY = 0.85  # multiply conf by this for perspective violations
PERSP_MIN_DETECTIONS = 3  # need ≥3 detections to estimate model
PERSP_MIN_Y_SPREAD = 0.15  # min y-spread as fraction of image height
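
# Worked example of the linear model behind these knobs (see _perspective_penalty
# below; numbers are illustrative): with image_h=1080 the vanishing line sits at
# y_vp=360. If the median alpha = height/(y_foot - y_vp) across detections is 0.33,
# a box with feet at y_foot=460 is expected to be ~0.33*(460-360) = 33 px tall; a
# 150 px box there deviates 4.5x (> PERSP_DEVIATION_THRESH), so its confidence is
# multiplied by PERSP_CONF_PENALTY (0.85).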

# ── Pose FP filter + box refinement config ──────────────────────────────────
POSE_CONF_THRESH = 0.25  # Minimum confidence for pose detection
POSE_NMS_IOU = 0.65  # NMS IoU threshold for pose detections
POSE_MATCH_IOU = 0.30  # IoU threshold to match pose to person box
POSE_KP_CONF = 0.3  # Keypoint visibility threshold
POSE_FP_MAX_CONF = 0.65  # Max conf below which unmatched large boxes are suppressed
POSE_FP_MIN_AREA = 0.04  # Min area ratio (of image) for FP suppression to apply
POSE_REFINE_BLEND = 0.25  # Blend factor for keypoint box refinement (0=original, 1=keypoint)
POSE_KP_PAD = 0.10  # Padding around keypoint tight bbox

# ── Anatomical keypoint scoring ─────────────────────────────────────────────
# COCO keypoints: 0=nose 1=l_eye 2=r_eye 3=l_ear 4=r_ear
# 5=l_shoulder 6=r_shoulder 7=l_elbow 8=r_elbow 9=l_wrist 10=r_wrist
# 11=l_hip 12=r_hip 13=l_knee 14=r_knee 15=l_ankle 16=r_ankle
POSE_HEAD_KP = [0, 1, 2, 3, 4]  # nose + eyes + ears
POSE_UPPER_KP = [5, 6, 7, 8, 9, 10]  # shoulders + elbows + wrists
POSE_LOWER_KP = [11, 12, 13, 14, 15, 16]  # hips + knees + ankles
# Per-keypoint weights (head > upper > lower): head group sums to 0.38,
# upper to 0.32, lower to 0.25 (0.95 total, slightly under a strict 1.0).
POSE_KP_WEIGHTS = np.array([
    0.12,  # 0 nose — strongest single indicator
    0.08,  # 1 left_eye
    0.08,  # 2 right_eye
    0.05,  # 3 left_ear
    0.05,  # 4 right_ear
    0.07,  # 5 left_shoulder
    0.07,  # 6 right_shoulder
    0.05,  # 7 left_elbow
    0.05,  # 8 right_elbow
    0.04,  # 9 left_wrist
    0.04,  # 10 right_wrist
    0.05,  # 11 left_hip
    0.05,  # 12 right_hip
    0.04,  # 13 left_knee
    0.04,  # 14 right_knee
    0.03,  # 15 left_ankle
    0.04,  # 16 right_ankle
], dtype=np.float32)  # head 0.38 + upper 0.32 + lower 0.25 = 0.95 total
POSE_ANAT_REFINE_THRESH = 0.15  # Score above which we refine box with keypoints
POSE_ANAT_SUPPRESS_THRESH = 0.0  # Score at or below which suppression is considered
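
# Worked example of the weighted sum (illustrative): a pose match with nose, both
# eyes, and both shoulders above POSE_KP_CONF scores 0.12+0.08+0.08+0.07+0.07 =
# 0.42 >= POSE_ANAT_REFINE_THRESH, so the box is kept and keypoint-refined; a
# match showing only one wrist scores 0.04 > 0 and is kept as-is; a score of
# exactly 0 on a large, low-confidence box makes it a suppression candidate.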

# ── TensorRT engine cache config ────────────────────────────────────────────
TRT_CACHE_PATH = "/tmp/trt_engine_cache"
TRT_FP16 = True
TRT_WORKSPACE_GB = 4

# ── Shared ──────────────────────────────────────────────────────────────────
WBF_SKIP_THR = 0.0001

# ── Speed config ────────────────────────────────────────────────────────────
ENABLE_TTA = True
ENABLE_PARALLEL = True

# ── Secondary HF repo for vehicle weights ───────────────────────────────────
VEHICLE_HF_REPO = "meaculpitt/ScoreVision-Vehicle"


def _wbf_multi(boxes_list, scores_list, labels_list, iou_thr=0.55, skip_thr=0.0001):
    """Weighted Boxes Fusion (multi-class). Boxes in [0,1] normalized coords."""
    if not boxes_list:
        return np.empty((0, 4)), np.empty(0), np.empty(0)

    all_b, all_s, all_l = [], [], []
    for bx, sc, lb in zip(boxes_list, scores_list, labels_list):
        for i in range(len(bx)):
            if sc[i] < skip_thr:
                continue
            all_b.append(bx[i])
            all_s.append(sc[i])
            all_l.append(int(lb[i]))

    if not all_b:
        return np.empty((0, 4)), np.empty(0), np.empty(0)

    all_b = np.array(all_b)
    all_s = np.array(all_s)
    all_l = np.array(all_l, dtype=int)

    fused_b, fused_s, fused_l = [], [], []
    for cls in np.unique(all_l):
        m = all_l == cls
        cb, cs = all_b[m], all_s[m]
        order = cs.argsort()[::-1]
        cb, cs = cb[order], cs[order]

        clusters, cboxes = [], []
        for i in range(len(cb)):
            matched, best_iou = -1, iou_thr
            for ci, cbox in enumerate(cboxes):
                xx1 = max(cb[i, 0], cbox[0])
                yy1 = max(cb[i, 1], cbox[1])
                xx2 = min(cb[i, 2], cbox[2])
                yy2 = min(cb[i, 3], cbox[3])
                inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
                a1 = (cb[i, 2] - cb[i, 0]) * (cb[i, 3] - cb[i, 1])
                a2 = (cbox[2] - cbox[0]) * (cbox[3] - cbox[1])
                iou = inter / (a1 + a2 - inter + 1e-9)
                if iou > best_iou:
                    best_iou = iou
                    matched = ci
            if matched >= 0:
                clusters[matched].append(i)
                idxs = clusters[matched]
                w = cs[idxs]
                cboxes[matched] = (cb[idxs] * w[:, None]).sum(0) / w.sum()
            else:
                clusters.append([i])
                cboxes.append(cb[i].copy())

        for ci, idxs in enumerate(clusters):
            fused_b.append(cboxes[ci])
            fused_s.append(cs[idxs].mean())
            fused_l.append(cls)

    if not fused_b:
        return np.empty((0, 4)), np.empty(0), np.empty(0)
    return np.array(fused_b), np.array(fused_s), np.array(fused_l)


def _wbf_single(boxes_list, scores_list, iou_thr=0.45, skip_thr=0.0001):
    """Weighted Boxes Fusion (single-class). Boxes in [0,1] normalized coords."""
    if not boxes_list:
        return np.empty((0, 4)), np.empty(0)

    all_b, all_s = [], []
    for bx, sc in zip(boxes_list, scores_list):
        for i in range(len(bx)):
            if sc[i] < skip_thr:
                continue
            all_b.append(bx[i])
            all_s.append(sc[i])

    if not all_b:
        return np.empty((0, 4)), np.empty(0)

    all_b = np.array(all_b)
    all_s = np.array(all_s)
    order = all_s.argsort()[::-1]
    all_b, all_s = all_b[order], all_s[order]

    clusters, cboxes = [], []
    for i in range(len(all_b)):
        matched, best_iou = -1, iou_thr
        for ci, cbox in enumerate(cboxes):
            xx1 = max(all_b[i, 0], cbox[0])
            yy1 = max(all_b[i, 1], cbox[1])
            xx2 = min(all_b[i, 2], cbox[2])
            yy2 = min(all_b[i, 3], cbox[3])
            inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
            a1 = (all_b[i, 2] - all_b[i, 0]) * (all_b[i, 3] - all_b[i, 1])
            a2 = (cbox[2] - cbox[0]) * (cbox[3] - cbox[1])
            iou = inter / (a1 + a2 - inter + 1e-9)
            if iou > best_iou:
                best_iou = iou
                matched = ci
        if matched >= 0:
            clusters[matched].append(i)
            idxs = clusters[matched]
            w = all_s[idxs]
            cboxes[matched] = (all_b[idxs] * w[:, None]).sum(0) / w.sum()
        else:
            clusters.append([i])
            cboxes.append(all_b[i].copy())

    fused_b, fused_s = [], []
    for ci, idxs in enumerate(clusters):
        fused_b.append(cboxes[ci])
        fused_s.append(all_s[idxs].mean())

    if not fused_b:
        return np.empty((0, 4)), np.empty(0)
    return np.array(fused_b), np.array(fused_s)
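

# Illustrative call (not part of the pipeline): two heavily-overlapping boxes
# fuse into a confidence-weighted average with the mean score.
#   b, s = _wbf_single([np.array([[0.10, 0.10, 0.50, 0.50]]),
#                       np.array([[0.14, 0.10, 0.54, 0.50]])],
#                      [np.array([0.9]), np.array([0.6])])
#   # IoU ~0.82 > iou_thr, so they cluster:
#   # b ~= [[0.116, 0.10, 0.516, 0.50]] (0.9/0.6 weighting), s == [0.75]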


def _nms_per_class_boost(boxes, scores, labels, iou_thr=0.50):
    """Per-class hard NMS with max-score cluster boosting.

    The surviving box keeps its coordinates but takes the max confidence among
    all boxes in its overlap cluster. (Candidates are visited in descending
    score order, so the survivor's own score is already that cluster max.)
    """
    if len(boxes) == 0:
        return np.empty((0, 4)), np.empty(0), np.empty(0, dtype=int)

    out_b, out_s, out_l = [], [], []
    for cls in np.unique(labels):
        m = labels == cls
        cb, cs = boxes[m], scores[m]
        order = cs.argsort()[::-1]
        cb, cs = cb[order], cs[order]

        suppressed = set()
        for i in range(len(cb)):
            if i in suppressed:
                continue
            max_score = float(cs[i])
            for j in range(i + 1, len(cb)):
                if j in suppressed:
                    continue
                xx1 = max(cb[i, 0], cb[j, 0])
                yy1 = max(cb[i, 1], cb[j, 1])
                xx2 = min(cb[i, 2], cb[j, 2])
                yy2 = min(cb[i, 3], cb[j, 3])
                inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
                a1 = (cb[i, 2] - cb[i, 0]) * (cb[i, 3] - cb[i, 1])
                a2 = (cb[j, 2] - cb[j, 0]) * (cb[j, 3] - cb[j, 1])
                iou = inter / (a1 + a2 - inter + 1e-9)
                if iou >= iou_thr:
                    max_score = max(max_score, float(cs[j]))
                    suppressed.add(j)
            out_b.append(cb[i])
            out_s.append(max_score)
            out_l.append(cls)

    if not out_b:
        return np.empty((0, 4)), np.empty(0), np.empty(0, dtype=int)
    return np.array(out_b), np.array(out_s), np.array(out_l, dtype=int)


class BoundingBox(BaseModel):
    x1: int
    y1: int
    x2: int
    y2: int
    cls_id: int
    conf: float


class TVFrameResult(BaseModel):
    frame_id: int
    boxes: list[BoundingBox]
    keypoints: list[tuple[int, int]]


class Miner:
    def __init__(self, path_hf_repo: Path) -> None:
        self.path_hf_repo = path_hf_repo

        # Vehicle model — download from secondary HF repo with safety guard
        t0 = time.monotonic()
        veh_path = None  # Path to secondary repo snapshot (also used for plate model)
        try:
            from huggingface_hub import snapshot_download as _sd
            veh_path = Path(_sd(VEHICLE_HF_REPO))
            veh_weights = str(veh_path / "vehicle_weights.onnx")
            logger.info(f"[init] Vehicle weights from {VEHICLE_HF_REPO} in {time.monotonic()-t0:.1f}s")
        except Exception as e:
            # Fallback: try loading from primary repo (backward compat)
            logger.warning(f"[init] Vehicle secondary repo failed ({e}), trying primary repo")
            veh_weights = str(path_hf_repo / "vehicle_weights.onnx")
            if not Path(veh_weights).exists():
                raise FileNotFoundError("vehicle_weights.onnx not found in primary or secondary repo") from e

        self.veh_session = ort.InferenceSession(
            veh_weights,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        veh_actual = self.veh_session.get_providers()
        logger.warning(f"[init] Vehicle session ACTIVE providers: {veh_actual}")
        if "CUDAExecutionProvider" not in veh_actual:
            logger.error("[init] ⚠ VEHICLE IS ON CPU — CUDA EP NOT ACTIVE")
        self.veh_input_name = self.veh_session.get_inputs()[0].name
        veh_shape = self.veh_session.get_inputs()[0].shape
        self.veh_h = int(veh_shape[2])
        self.veh_w = int(veh_shape[3])

        # FP32 fallback — lazy-loaded on first trigger to save ~300MB VRAM at startup
        self.veh_session_fp32 = None
        self._veh_fp32_path = None
        try:
            veh_fp32 = str(veh_path / "vehicle_weights_fp32.onnx") if veh_path else None
            if veh_fp32 and Path(veh_fp32).exists():
                self._veh_fp32_path = veh_fp32
                logger.info("[init] Vehicle FP32 fallback available (lazy-load)")
            else:
                logger.info("[init] Vehicle FP32 fallback not available")
        except Exception as e:
            logger.warning(f"[init] Vehicle FP32 fallback path check failed: {e}")

        # Person model — CUDA immediately, TRT engine builds in background
        per_onnx = str(path_hf_repo / "person_weights.onnx")
        self.per_session = ort.InferenceSession(
            per_onnx,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        self.per_input_name = self.per_session.get_inputs()[0].name
        per_shape = self.per_session.get_inputs()[0].shape
        self.per_h = int(per_shape[2])
        self.per_w = int(per_shape[3])
        self._trt_ready = False
        logger.info("[init] Person model: CUDA (TRT build starting in background)")

        # Launch background TRT engine build
        os.makedirs(TRT_CACHE_PATH, exist_ok=True)
        threading.Thread(
            target=self._build_trt_engine,
            args=(per_onnx,),
            daemon=True,
            name="trt-builder",
        ).start()

        # Pose model — for FP filtering + box refinement
        pose_path = path_hf_repo / "pose_weights.onnx"
        if pose_path.exists():
            self.pose_session = ort.InferenceSession(
                str(pose_path),
                providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
            )
            self.pose_input_name = self.pose_session.get_inputs()[0].name
            pose_shape = self.pose_session.get_inputs()[0].shape
            self.pose_h = int(pose_shape[2])
            self.pose_w = int(pose_shape[3])
            logger.info(f"[init] Pose model loaded: {self.pose_h}x{self.pose_w}")
        else:
            self.pose_session = None
            logger.info("[init] No pose model found, FP filter disabled")

        # Face detector (SCRFD-500M) — confirms person boxes, prevents FP suppression
        face_path = path_hf_repo / "face_weights.onnx"
        if face_path.exists():
            self.face_session = ort.InferenceSession(
                str(face_path),
                providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
            )
            self.face_input_name = self.face_session.get_inputs()[0].name
            logger.info("[init] Face model (SCRFD-500M) loaded")
        else:
            self.face_session = None
            logger.info("[init] No face model found")

        # License plate detector — loaded from secondary HF repo alongside vehicle weights
        plate_path = veh_path / "plate_weights.onnx" if veh_path else None
        if plate_path and plate_path.exists():
            self.plate_session = ort.InferenceSession(
                str(plate_path),
                providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
            )
            self.plate_input_name = self.plate_session.get_inputs()[0].name
            plate_shape = self.plate_session.get_inputs()[0].shape
            self.plate_h = int(plate_shape[2]) if isinstance(plate_shape[2], int) else 640
            self.plate_w = int(plate_shape[3]) if isinstance(plate_shape[3], int) else 640
            logger.info(f"[init] Plate model loaded: {self.plate_h}x{self.plate_w}")
        else:
            self.plate_session = None
            logger.info("[init] No plate model found, plate confirmation disabled")

        # Pose cache — populated by _pose_filter_refine, read by vehicle parts
        self._cached_pose_data = None

        # Thread pool for parallel inference
        self._executor = ThreadPoolExecutor(max_workers=2)

        # Log provider info
        veh_prov = self.veh_session.get_providers()
        per_prov = self.per_session.get_providers()
        logger.info(f"Vehicle ORT providers: {veh_prov}")
        logger.info(f"Person ORT providers: {per_prov} (TRT building in background)")
        logger.info(f"TTA={ENABLE_TTA} PARALLEL={ENABLE_PARALLEL}")

    def _build_trt_engine(self, per_onnx):
        """Build TRT FP16 engine in background, swap person session when ready.

        On fresh nodes: ~18 min to compile. Cached engine loads in <1s.
        During build, inference uses CUDAExecutionProvider (passes RTF at ~78ms).
        After build, atomically swaps to TRT session (~29ms pipeline).
        """
        try:
            trt_opts = {
                "trt_fp16_enable": str(TRT_FP16).lower(),
                "trt_max_workspace_size": str(TRT_WORKSPACE_GB << 30),
                "trt_engine_cache_enable": "true",
                "trt_engine_cache_path": TRT_CACHE_PATH,
            }
            t0 = time.monotonic()
            logger.info("[trt-build] Creating TRT session (may take ~18min on fresh node)...")
            trt_session = ort.InferenceSession(
                per_onnx,
                providers=[
                    ("TensorrtExecutionProvider", trt_opts),
                    "CUDAExecutionProvider",
                    "CPUExecutionProvider",
                ],
            )

            provs = trt_session.get_providers()
            if "TensorrtExecutionProvider" not in provs:
                logger.warning("[trt-build] TRT provider not active (%s), keeping CUDA", provs)
                return

            # Run dummy inference to fully materialize the engine
            inp_name = trt_session.get_inputs()[0].name
            inp_shape = trt_session.get_inputs()[0].shape
            dummy = np.zeros((1, 3, int(inp_shape[2]), int(inp_shape[3])), dtype=np.float32)
            trt_session.run(None, {inp_name: dummy})

            dt = time.monotonic() - t0
            logger.info("[trt-build] TRT engine ready in %.1fs — swapping person session", dt)

            # Atomic swap — Python GIL makes single attribute assignment safe.
            # Any in-flight inference holds a reference to the old session, which
            # stays alive until that inference completes.
            self.per_session = trt_session
            self._trt_ready = True

            logger.info("[trt-build] Person model now using TensorRT FP16")
        except Exception as e:
            logger.warning("[trt-build] TRT build failed (%s), keeping CUDA", e)

    def __repr__(self) -> str:
        trt_status = "TRT" if self._trt_ready else "CUDA (TRT building)"
        return f"Unified Miner v3.28 — person={trt_status}, background TRT engine build"

    # ── Vehicle preprocessing (letterbox) ───────────────────────────────────

    def _veh_letterbox(self, img):
        h, w = img.shape[:2]
        r = min(self.veh_h / h, self.veh_w / w)
        nw, nh = int(round(w * r)), int(round(h * r))
        img_r = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
        dw, dh = self.veh_w - nw, self.veh_h - nh
        pl, pt = dw // 2, dh // 2
        img_p = cv2.copyMakeBorder(
            img_r, pt, dh - pt, pl, dw - pl,
            cv2.BORDER_CONSTANT, value=(114, 114, 114),
        )
        return img_p, r, pl, pt
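
    # Worked letterbox example (illustrative, assuming the 1280x1280 vehicle input
    # from the header): a 1920x1080 frame gives r = min(1280/1080, 1280/1920) =
    # 0.6667, so it is resized to 1280x720 and padded with 280 gray rows above and
    # below (pl=0, pt=280); _veh_decode later inverts exactly this mapping.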

    def _veh_preprocess(self, image_bgr):
        img_p, ratio, pl, pt = self._veh_letterbox(image_bgr)
        rgb = cv2.cvtColor(img_p, cv2.COLOR_BGR2RGB)
        inp = rgb.astype(np.float32) / 255.0
        inp = np.ascontiguousarray(inp.transpose(2, 0, 1)[np.newaxis])
        return inp, ratio, pl, pt

    def _veh_decode(self, raw, ratio, pl, pt, ow, oh, conf_thresh):
        pred = raw[0]
        if pred.shape[0] < pred.shape[1]:
            pred = pred.T
        cls_scores = pred[:, 4:]
        cls_ids = np.argmax(cls_scores, axis=1)
        confs = np.max(cls_scores, axis=1)
        mask = confs >= conf_thresh
        if not mask.any():
            return np.empty((0, 4)), np.empty(0), np.empty(0, dtype=int)
        bx, confs, cls_ids = pred[mask, :4], confs[mask], cls_ids[mask]
        cx, cy, bw, bh = bx[:, 0], bx[:, 1], bx[:, 2], bx[:, 3]
        x1 = np.clip((cx - bw / 2 - pl) / ratio, 0, ow)
        y1 = np.clip((cy - bh / 2 - pt) / ratio, 0, oh)
        x2 = np.clip((cx + bw / 2 - pl) / ratio, 0, ow)
        y2 = np.clip((cy + bh / 2 - pt) / ratio, 0, oh)
        return np.stack([x1, y1, x2, y2], axis=1), confs, cls_ids

    def _veh_run_pass(self, image_bgr, conf_thresh, session=None):
        if session is None:
            session = self.veh_session
        oh, ow = image_bgr.shape[:2]
        inp, ratio, pl, pt = self._veh_preprocess(image_bgr)
        raw = session.run(None, {self.veh_input_name: inp})[0]
        return self._veh_decode(raw, ratio, pl, pt, ow, oh, conf_thresh)

    def _infer_vehicle_core(self, image_bgr, session=None):
        """Core vehicle detection pipeline. session param allows FP32 fallback."""
        oh, ow = image_bgr.shape[:2]

        # Primary pass
        boxes, confs, cls_ids = self._veh_run_pass(image_bgr, VEH_CONF_THRES, session)

        # Flip TTA pass — horizontal flip, mirror boxes back
        if ENABLE_TTA:
            flipped = cv2.flip(image_bgr, 1)
            f_boxes, f_confs, f_cls = self._veh_run_pass(flipped, VEH_TTA_CONF, session)
            if len(f_boxes) > 0:
                # Mirror x-coords: x1'=ow-x2, x2'=ow-x1
                f_boxes[:, 0], f_boxes[:, 2] = ow - f_boxes[:, 2], ow - f_boxes[:, 0]
                if len(boxes) > 0:
                    boxes = np.concatenate([boxes, f_boxes])
                    confs = np.concatenate([confs, f_confs])
                    cls_ids = np.concatenate([cls_ids, f_cls])
                else:
                    boxes, confs, cls_ids = f_boxes, f_confs, f_cls

        if len(boxes) == 0:
            return []

        # Remap model classes to output classes
        out_cls = np.array([VEH_MODEL_TO_OUT[int(c)] for c in cls_ids])

        # Per-class hard NMS with max-score cluster boosting
        boxes, confs, out_cls = _nms_per_class_boost(
            boxes, confs, out_cls, iou_thr=VEH_NMS_IOU)

        if len(boxes) == 0:
            return []

        # Per-class confidence filter + aspect ratio filter + skip-class guard
        img_area = float(oh * ow)
        sane = []
        for i in range(len(boxes)):
            cls = int(out_cls[i])

            # Skip any class in VEH_SKIP_CLS (empty since v3.20: bus is now scored)
            if cls in VEH_SKIP_CLS:
                continue

            # Per-class confidence threshold
            min_conf = VEH_CLASS_CONF.get(cls, VEH_CONF_THRES)
            if confs[i] < min_conf:
                continue

            bw = boxes[i, 2] - boxes[i, 0]
            bh = boxes[i, 3] - boxes[i, 1]

            # Minimum dimension
            if bw < VEH_MIN_WH or bh < VEH_MIN_WH:
                continue

            area = bw * bh

            # Per-class minimum area
            min_area = VEH_CLASS_MIN_AREA.get(cls, VEH_MIN_AREA)
            if area < min_area:
                continue

            # Per-class aspect ratio filter
            aspect = max(bw, bh) / max(min(bw, bh), 1e-6)
            max_aspect = VEH_CLASS_ASPECT.get(cls, VEH_MAX_ASPECT)
            if aspect > max_aspect:
                continue

            # Max area ratio (covers entire image — likely FP)
            if area / img_area > VEH_MAX_AREA_RATIO:
                continue

            sane.append(i)

        if not sane:
            return []
        boxes, confs, out_cls = boxes[sane], confs[sane], out_cls[sane]

        # Limit max detections
        if len(boxes) > VEH_MAX_DET:
            top_k = np.argsort(confs)[::-1][:VEH_MAX_DET]
            boxes, confs, out_cls = boxes[top_k], confs[top_k], out_cls[top_k]

        out = []
        for i in range(len(boxes)):
            b = boxes[i]
            out.append(BoundingBox(
                x1=max(0, min(ow, math.floor(b[0]))),
                y1=max(0, min(oh, math.floor(b[1]))),
                x2=max(0, min(ow, math.ceil(b[2]))),
                y2=max(0, min(oh, math.ceil(b[3]))),
                cls_id=int(out_cls[i]),
                conf=max(0.0, min(1.0, float(confs[i]))),
            ))
        return out

    def _infer_vehicle(self, image_bgr):
        """Vehicle detection with FP32 fallback on catastrophic INT8 failure.

        Runs INT8 model first. If it returns 0 boxes (true catastrophic failure,
        see block 7905900), retries with FP32 model. Single-box results are
        kept as-is — likely real sparse scenes, not INT8 degradation.
        """
        if not hasattr(self, '_veh_providers_logged'):
            provs = self.veh_session.get_providers()
            logger.warning(f"[vehicle] First inference — active providers: {provs}")
            self._veh_providers_logged = True
        boxes = self._infer_vehicle_core(image_bgr, self.veh_session)

        if len(boxes) == 0 and (self.veh_session_fp32 or self._veh_fp32_path):
            # Lazy-load FP32 session on first trigger
            if self.veh_session_fp32 is None and self._veh_fp32_path:
                try:
                    self.veh_session_fp32 = ort.InferenceSession(
                        self._veh_fp32_path,
                        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
                    )
                    logger.info("[vehicle] FP32 fallback lazy-loaded")
                except Exception as e:
                    logger.warning(f"[vehicle] FP32 lazy-load failed: {e}")
                    self._veh_fp32_path = None
            if self.veh_session_fp32:
                boxes_fp32 = self._infer_vehicle_core(image_bgr, self.veh_session_fp32)
                if len(boxes_fp32) > len(boxes):
                    logger.warning(
                        f"[vehicle] INT8 degraded ({len(boxes)} boxes), "
                        f"FP32 fallback recovered ({len(boxes_fp32)} boxes)"
                    )
                    return boxes_fp32

        return boxes

    # ── Vehicle parts confirmation ───────────────────────────────────────

    @staticmethod
    def _veh_check_driver(vb, person_boxes):
        """Check if any person detection overlaps the driver/passenger region.

        Driver region: upper 55% height, center 70% width of vehicle box.
        A person's center inside this region → vehicle confirmed.
        """
        if not person_boxes:
            return False
        vw = vb.x2 - vb.x1
        vh = vb.y2 - vb.y1
        dr_x1 = vb.x1 + vw * 0.15
        dr_y1 = vb.y1
        dr_x2 = vb.x2 - vw * 0.15
        dr_y2 = vb.y1 + vh * 0.55
        for pb in person_boxes:
            pcx = (pb.x1 + pb.x2) / 2
            pcy = (pb.y1 + pb.y2) / 2
            if dr_x1 <= pcx <= dr_x2 and dr_y1 <= pcy <= dr_y2:
                return True
        return False

    def _veh_check_rider(self, moto_box, person_boxes):
        """Check if motorcycle has a rider, optionally with forward-lean pose.

        Returns (has_overlap, has_lean_pose).
        Uses cached pose keypoints from person pipeline to check torso angle.
        Motorcycle riders lean forward (torso > 15° from vertical).
        """
        if not person_boxes:
            return False, False
        mw = moto_box.x2 - moto_box.x1
        mh = moto_box.y2 - moto_box.y1
        mx = mw * 0.1
        my = mh * 0.1
        has_overlap = False
        for pb in person_boxes:
            pcx = (pb.x1 + pb.x2) / 2
            pcy = (pb.y1 + pb.y2) / 2
            if (moto_box.x1 - mx <= pcx <= moto_box.x2 + mx and
                    moto_box.y1 - my <= pcy <= moto_box.y2 + my):
                has_overlap = True
                break
        if not has_overlap:
            return False, False

        # Check forward-lean pose using cached pose data
        if self._cached_pose_data is None:
            return True, False
        pose_boxes, pose_kps = self._cached_pose_data
        if len(pose_boxes) == 0:
            return True, False

        for j in range(len(pose_boxes)):
            pb = pose_boxes[j]
            pcx = (pb[0] + pb[2]) / 2
            pcy = (pb[1] + pb[3]) / 2
            if not (moto_box.x1 - mx <= pcx <= moto_box.x2 + mx and
                    moto_box.y1 - my <= pcy <= moto_box.y2 + my):
                continue
            kps = pose_kps[j]
            # Need at least one shoulder + one hip visible
            l_sh, r_sh = kps[5], kps[6]
            l_hip, r_hip = kps[11], kps[12]
            sh_vis = [k[:2] for k in [l_sh, r_sh] if k[2] >= POSE_KP_CONF]
            hip_vis = [k[:2] for k in [l_hip, r_hip] if k[2] >= POSE_KP_CONF]
            if not sh_vis or not hip_vis:
                continue
            sh_mid = np.mean(sh_vis, axis=0)
            hip_mid = np.mean(hip_vis, axis=0)
            dx = sh_mid[0] - hip_mid[0]
            dy = hip_mid[1] - sh_mid[1]  # positive = shoulder above hip
            if dy <= 0:
                continue
            angle = math.degrees(math.atan2(abs(dx), dy))
            if angle >= VEH_PARTS_RIDER_LEAN_DEG:
                return True, True
        return True, False
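
    # Worked lean-angle example (illustrative): a shoulders midpoint of (100, 50)
    # and hips midpoint of (80, 120) give dx=20, dy=70, so the torso tilts
    # atan2(20, 70) ~= 15.9 degrees from vertical, just past
    # VEH_PARTS_RIDER_LEAN_DEG (15.0) -> returns (True, True).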

    def _veh_check_headlights(self, vb, image_bgr):
        """Detect bright symmetric pair in lower portion of vehicle box.

        Requires two bright blobs at similar y, on opposite sides of center,
        with similar area. Only checks vehicles wider than VEH_PARTS_HL_MIN_PX.
        """
        bw = vb.x2 - vb.x1
        bh = vb.y2 - vb.y1
        if bw < VEH_PARTS_HL_MIN_PX or bh < 30:
            return False

        oh, ow = image_bgr.shape[:2]
        y1 = max(0, min(oh, int(vb.y1 + bh * 0.65)))
        y2 = max(0, min(oh, int(vb.y2)))
        x1 = max(0, min(ow, int(vb.x1)))
        x2 = max(0, min(ow, int(vb.x2)))
        if y2 - y1 < 5 or x2 - x1 < 10:
            return False

        roi = image_bgr[y1:y2, x1:x2]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        _, bright = cv2.threshold(gray, VEH_PARTS_HL_BRIGHT, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        blobs = []
        for c in contours:
            area = cv2.contourArea(c)
            if area < VEH_PARTS_HL_MIN_BLOB:
                continue
            M = cv2.moments(c)
            if M["m00"] < 1:
                continue
            blobs.append((M["m10"] / M["m00"], M["m01"] / M["m00"], area))

        if len(blobs) < 2:
            return False

        roi_mid = (x2 - x1) / 2.0
        roi_h = y2 - y1
        for i in range(len(blobs)):
            for j in range(i + 1, len(blobs)):
                b1, b2 = blobs[i], blobs[j]
                if abs(b1[1] - b2[1]) > roi_h * 0.4:
                    continue
                if max(b1[2], b2[2]) / max(min(b1[2], b2[2]), 1) > 3.0:
                    continue
                if (b1[0] - roi_mid) * (b2[0] - roi_mid) < 0:
                    return True
        return False

    def _veh_check_windows(self, vb, image_bgr):
        """Detect repeated window pattern (bus/coach signature) using vertical edge periodicity.

        Extracts middle horizontal band, applies vertical Sobel, projects vertically,
        and checks for 3+ regularly-spaced peaks (window frame edges).
        Only for large vehicles (truck cls_id=2).
        """
        bw = vb.x2 - vb.x1
        bh = vb.y2 - vb.y1
        if bw < VEH_PARTS_WINDOW_MIN_PX or bh < 40:
            return False

        oh, ow = image_bgr.shape[:2]
        # Middle 40% of height (window band on a bus/coach)
        y1 = max(0, min(oh, int(vb.y1 + bh * 0.30)))
        y2 = max(0, min(oh, int(vb.y1 + bh * 0.70)))
        x1 = max(0, min(ow, int(vb.x1)))
        x2 = max(0, min(ow, int(vb.x2)))
        if y2 - y1 < 10 or x2 - x1 < 30:
            return False

        roi = image_bgr[y1:y2, x1:x2]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)

        # Vertical edge detection (window frames are vertical edges)
        sobel_v = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        abs_sobel = np.abs(sobel_v)

        # Project vertically: mean per column
        projection = abs_sobel.mean(axis=0)
        if len(projection) < 10:
            return False

        # Smooth projection
        ks = max(3, int(len(projection) * 0.02) | 1)
        projection = np.convolve(projection, np.ones(ks) / ks, mode='same')

        # Find peaks above mean + 1 std
        thresh = projection.mean() + projection.std()
        peaks = []
        in_peak = False
        pk_start = 0
        for i in range(len(projection)):
            if projection[i] > thresh:
                if not in_peak:
                    pk_start = i
                    in_peak = True
            else:
                if in_peak:
                    peaks.append((pk_start + i) // 2)
                    in_peak = False
        if in_peak:
            peaks.append((pk_start + len(projection) - 1) // 2)

        if len(peaks) < VEH_PARTS_WINDOW_MIN_PEAKS:
            return False

        # Check regular spacing: gaps within 40% of median
        gaps = [peaks[i + 1] - peaks[i] for i in range(len(peaks) - 1)]
        if not gaps:
            return False
        med = sorted(gaps)[len(gaps) // 2]
        if med < 5:
            return False
        regular = sum(1 for g in gaps if abs(g - med) / max(med, 1) < 0.4)
        return regular >= len(gaps) * 0.6

    def _veh_check_plate(self, vb, image_bgr):
        """Run license plate detector on a vehicle crop. Returns True if plate found."""
        if self.plate_session is None:
            return False
        bw = vb.x2 - vb.x1
        if bw < VEH_PARTS_PLATE_MIN_PX:
            return False

        oh, ow = image_bgr.shape[:2]
        # Crop vehicle region with 5% padding
        pad_x = int(bw * 0.05)
        pad_y = int((vb.y2 - vb.y1) * 0.05)
        cx1 = max(0, int(vb.x1) - pad_x)
        cy1 = max(0, int(vb.y1) - pad_y)
        cx2 = min(ow, int(vb.x2) + pad_x)
        cy2 = min(oh, int(vb.y2) + pad_y)
        crop = image_bgr[cy1:cy2, cx1:cx2]
        if crop.size == 0:
            return False

        # Letterbox to plate model input
        ch, cw = crop.shape[:2]
        r = min(self.plate_h / ch, self.plate_w / cw)
        nw, nh = int(round(cw * r)), int(round(ch * r))
        img_r = cv2.resize(crop, (nw, nh), interpolation=cv2.INTER_LINEAR)
        dw, dh = self.plate_w - nw, self.plate_h - nh
        pl, pt = dw // 2, dh // 2
        img_p = cv2.copyMakeBorder(
            img_r, pt, dh - pt, pl, dw - pl,
            cv2.BORDER_CONSTANT, value=(114, 114, 114),
        )
        rgb = cv2.cvtColor(img_p, cv2.COLOR_BGR2RGB)
        inp = rgb.astype(np.float32) / 255.0
        inp = np.ascontiguousarray(inp.transpose(2, 0, 1)[np.newaxis])

        raw = self.plate_session.run(None, {self.plate_input_name: inp})[0]
        pred = raw[0] if raw.ndim == 3 else raw

        # Handle both [N,6] end2end (post-NMS) and [N, 5+nc] raw formats
        if pred.shape[0] < pred.shape[1]:
            pred = pred.T  # transpose [5+nc, N] -> [N, 5+nc]
        if pred.shape[1] < 5:
            return False
        # End2end post-NMS: few detections (< 500), col4=conf already final
        if pred.shape[0] < 500 and pred.shape[1] == 6:
            confs = pred[:, 4]
        elif pred.shape[1] == 5:
            confs = pred[:, 4]  # single objectness score
        else:
            # Raw: x,y,w,h,objectness,cls_scores... → conf = obj * max(cls)
            confs = pred[:, 4] * np.max(pred[:, 5:], axis=1)
        return bool((confs >= VEH_PARTS_PLATE_CONF).any())

    def _vehicle_parts_confirm(self, vehicle_boxes, person_boxes, image_bgr):
        """Parts-based confidence scoring for vehicle detections.

        Scoring hierarchy (confidence boosts are additive):
        1. License plate detected → +0.12 (strong, never suppress)
        2. Person (driver/rider) inside vehicle → +0.08-0.10
        3. Headlight pair detected → +0.05
        4. Bus window pattern on truck → +0.06
        5. No parts but small/distant or high-conf → keep original
        6. Large + low-conf + no parts → suppress as FP

        Small/distant vehicles (area < 0.4% of image) are always exempt.
        Bus (cls_id=0) is scored since v3.20; the window check applies to trucks (cls_id=2).
        """
        if not vehicle_boxes or not VEH_PARTS_ENABLED:
            return vehicle_boxes

        oh, ow = image_bgr.shape[:2]
        img_area = float(oh * ow)
        has_plate_model = self.plate_session is not None
        # Skip plate checks on crowded scenes (aerial/drone, plates invisible)
        skip_plate = len(vehicle_boxes) > 20

        result = []
        n_driver = 0
        n_rider = 0
        n_rider_lean = 0
        n_headlight = 0
        n_window = 0
        n_plate = 0
        n_suppressed = 0

        for vb in vehicle_boxes:
            bw = vb.x2 - vb.x1
            bh = vb.y2 - vb.y1
            area_ratio = (bw * bh) / img_area

            # Small/distant: exempt from parts check
            if area_ratio < VEH_PARTS_SMALL_AREA:
                result.append(vb)
                continue

            boost = 0.0
            confirmed = False

            # Check 1: License plate (strongest signal)
            if has_plate_model and not skip_plate and bw >= VEH_PARTS_PLATE_MIN_PX:
                try:
                    if self._veh_check_plate(vb, image_bgr):
                        boost += VEH_PARTS_BOOST_PLATE
                        confirmed = True
                        n_plate += 1
                except Exception:
                    pass

            # Check 2: Driver/passenger inside car or truck
            if vb.cls_id in (1, 2):
                if self._veh_check_driver(vb, person_boxes):
                    boost += VEH_PARTS_BOOST_DRIVER
                    confirmed = True
                    n_driver += 1

            # Check 3: Motorcycle rider (overlap + optional lean pose)
            if vb.cls_id == 3:
                has_overlap, has_lean = self._veh_check_rider(vb, person_boxes)
                if has_overlap:
                    boost += VEH_PARTS_BOOST_RIDER
                    if has_lean:
                        boost += 0.05  # Extra for confirmed lean pose
                        n_rider_lean += 1
                    confirmed = True
                    n_rider += 1

            # Check 4: Headlight pair
            if bw >= VEH_PARTS_HL_MIN_PX:
                try:
                    if self._veh_check_headlights(vb, image_bgr):
                        boost += VEH_PARTS_BOOST_HL
                        confirmed = True
                        n_headlight += 1
                except Exception:
                    pass

            # Check 5: Window pattern (large trucks that might be buses)
            if vb.cls_id == 2 and bw >= VEH_PARTS_WINDOW_MIN_PX:
                try:
                    if self._veh_check_windows(vb, image_bgr):
                        boost += VEH_PARTS_BOOST_WINDOW
                        n_window += 1
                except Exception:
                    pass

            # Apply boost and decide
            new_conf = min(1.0, vb.conf + boost)

            if confirmed:
                result.append(BoundingBox(
                    x1=vb.x1, y1=vb.y1, x2=vb.x2, y2=vb.y2,
                    cls_id=vb.cls_id, conf=new_conf,
                ))
            elif area_ratio > VEH_PARTS_FP_AREA:
                # Large vehicle — use stricter threshold if plate model loaded
                fp_thresh = VEH_PARTS_FP_CONF_STRICT if (has_plate_model and not skip_plate) else VEH_PARTS_FP_CONF
                if vb.conf < fp_thresh:
                    n_suppressed += 1
                else:
                    result.append(vb)
            else:
                result.append(vb)

        if n_driver or n_rider or n_headlight or n_window or n_plate or n_suppressed:
            logger.info(f"[veh-parts] plate={n_plate} driver={n_driver} rider={n_rider}"
                        f"(lean={n_rider_lean}) hl={n_headlight} win={n_window} "
                        f"suppress={n_suppressed}, kept {len(result)}/{len(vehicle_boxes)}")
        return result
1279
+
1280
+ # ── Person preprocessing (letterbox) ──────────────────────────────────
1281
+
1282
+ def _per_letterbox(self, img):
1283
+ h, w = img.shape[:2]
1284
+ r = min(self.per_h / h, self.per_w / w)
1285
+ nw, nh = int(round(w * r)), int(round(h * r))
1286
+ interp = cv2.INTER_CUBIC if r > 1.0 else cv2.INTER_LINEAR
1287
+ img_r = cv2.resize(img, (nw, nh), interpolation=interp)
1288
+ dw, dh = self.per_w - nw, self.per_h - nh
1289
+ pl, pt = dw // 2, dh // 2
1290
+ img_p = cv2.copyMakeBorder(
1291
+ img_r, pt, dh - pt, pl, dw - pl,
1292
+ cv2.BORDER_CONSTANT, value=(114, 114, 114),
1293
+ )
1294
+ return img_p, r, pl, pt
1295
+
+    def _per_preprocess(self, image_bgr):
+        img_p, ratio, pl, pt = self._per_letterbox(image_bgr)
+        rgb = cv2.cvtColor(img_p, cv2.COLOR_BGR2RGB)
+        inp = rgb.astype(np.float32) / 255.0
+        inp = np.ascontiguousarray(inp.transpose(2, 0, 1)[np.newaxis])
+        return inp, ratio, pl, pt
+
+    def _per_enhance(self, img_bgr):
+        """Adaptive CLAHE: only apply to low-contrast frames, mild clip=2.0."""
+        lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
+        l, a, b = cv2.split(lab)
+        if float(l.std()) < PER_CLAHE_CONTRAST_THRESH:
+            clahe = cv2.createCLAHE(clipLimit=PER_CLAHE_CLIP, tileGridSize=(8, 8))
+            l = clahe.apply(l)
+            return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)
+        return img_bgr  # skip CLAHE on normal-contrast images
+
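+    # Gating intuition for _per_enhance (the exact threshold is whatever
+    # PER_CLAHE_CONTRAST_THRESH is set to earlier in this file): a dim night
+    # frame with a flat L-channel histogram (low std) receives mild CLAHE,
+    # while a well-exposed daylight frame passes through untouched, so
+    # equalisation noise is never added to frames that did not need it.
+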
+    @staticmethod
+    def _frame_blur_score(img_bgr):
+        """Laplacian variance blur metric. Lower = blurrier."""
+        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
+        return cv2.Laplacian(gray, cv2.CV_64F).var()
+
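+    # Interpretation: the Laplacian responds to second-order intensity
+    # change, so crisp edges inflate the variance while defocus or motion
+    # blur collapses it toward zero. Frames scoring below PER_BLUR_THRESHOLD
+    # are not dropped; _infer_person instead multiplies their detection
+    # confidences by PER_BLUR_CONF_PENALTY.
+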
+    @staticmethod
+    def _perspective_penalty(boxes, confs, image_h):
+        """Apply confidence penalty to perspective-anomalous person detections.
+
+        Model: expected_height(y) = alpha * (y_foot - y_vp), where y_vp = image_h / 3.
+        Alpha is estimated from the median height/distance ratio across detections.
+        Detections deviating >3x from expected get conf *= 0.85.
+        Fails open (returns confs unchanged) when model can't be estimated.
+        """
+        n = len(boxes)
+        if n < PERSP_MIN_DETECTIONS:
+            return confs
+
+        y_vp = image_h / 3.0
+        y_feet = boxes[:, 3]
+        heights = boxes[:, 3] - boxes[:, 1]
+
+        valid = y_feet > (y_vp + 10)
+        if valid.sum() < PERSP_MIN_DETECTIONS:
+            return confs
+
+        valid_y = y_feet[valid]
+        valid_h = heights[valid]
+
+        y_spread = (valid_y.max() - valid_y.min()) / image_h
+        if y_spread < PERSP_MIN_Y_SPREAD:
+            return confs
+
+        alpha = float(np.median(valid_h / (valid_y - y_vp)))
+        if alpha <= 0.01:
+            return confs
+
+        new_confs = confs.copy()
+        for i in range(n):
+            if y_feet[i] <= y_vp:
+                continue
+            expected_h = alpha * (y_feet[i] - y_vp)
+            if expected_h <= 0:
+                continue
+            ratio = heights[i] / expected_h
+            if ratio > PERSP_DEVIATION_THRESH or ratio < (1.0 / PERSP_DEVIATION_THRESH):
+                new_confs[i] *= PERSP_CONF_PENALTY
+
+        return new_confs
+
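+    # Worked example (illustrative numbers): in a 900px-tall frame,
+    # y_vp = 300. Reference people with feet at y = 500/700/860 and heights
+    # 100/200/280 all give height/distance = 0.50, so the median alpha = 0.50.
+    # A detection with feet at y = 400 but height 350 has expected_h =
+    # 0.50 * 100 = 50 and ratio = 7.0, beyond the 3x deviation bound from the
+    # docstring, so its confidence is multiplied by the 0.85 penalty.
+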
+    def _per_decode(self, raw, ratio, pl, pt, oh, ow, conf_thresh):
+        pred = raw[0]
+        if pred.ndim != 2:
+            return np.empty((0, 4)), np.empty(0)
+
+        # Auto-detect output format
+        if pred.shape[-1] == 6 and pred.shape[0] > pred.shape[1]:
+            # YOLO26 end2end: [N, 6] = [x1, y1, x2, y2, conf, class_id]
+            confs = pred[:, 4]
+            keep = confs >= conf_thresh
+            boxes, confs = pred[keep, :4], confs[keep]
+            if len(boxes) == 0:
+                return np.empty((0, 4)), np.empty(0)
+            boxes[:, 0] = np.floor((boxes[:, 0] - pl) / ratio)
+            boxes[:, 1] = np.floor((boxes[:, 1] - pt) / ratio)
+            boxes[:, 2] = np.ceil((boxes[:, 2] - pl) / ratio)
+            boxes[:, 3] = np.ceil((boxes[:, 3] - pt) / ratio)
+            boxes = np.clip(boxes, 0, [[ow, oh, ow, oh]])
+            return boxes, confs
+
+        # YOLO11 raw format: [5+nc, N] or [N, 5+nc]
+        if pred.shape[0] < pred.shape[1]:
+            pred = pred.T
+        if pred.shape[1] < 5:
+            return np.empty((0, 4)), np.empty(0)
+        cls_scores = pred[:, 4:]
+        confs = np.max(cls_scores, axis=1)
+        keep = confs >= conf_thresh
+        boxes, confs = pred[keep, :4], confs[keep]
+        if len(boxes) == 0:
+            return np.empty((0, 4)), np.empty(0)
+        cx, cy, bw, bh = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
+        x1 = np.clip(np.floor((cx - bw / 2 - pl) / ratio), 0, ow)
+        y1 = np.clip(np.floor((cy - bh / 2 - pt) / ratio), 0, oh)
+        x2 = np.clip(np.ceil((cx + bw / 2 - pl) / ratio), 0, ow)
+        y2 = np.clip(np.ceil((cy + bh / 2 - pt) / ratio), 0, oh)
+        return np.stack([x1, y1, x2, y2], axis=1), confs
+
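+    # Shape-detection example (dimensions are typical, assumed values): an
+    # end2end YOLO26 export emits [300, 6] (rows >> cols, last dim 6), which
+    # takes the first branch; a raw YOLO11 head emits [5+nc, 8400], fails
+    # that test, is transposed to [8400, 5+nc], and is decoded through the
+    # xywh -> corner-coordinate path with per-class max scoring.
+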
+    def _per_run_pass(self, image_bgr, conf_thresh):
+        oh, ow = image_bgr.shape[:2]
+        inp, ratio, pl, pt = self._per_preprocess(image_bgr)
+        raw = self.per_session.run(None, {self.per_input_name: inp})[0]
+        return self._per_decode(raw, ratio, pl, pt, oh, ow, conf_thresh)
+
+    def _generate_tiles(self, h, w):
+        """SAHI-inspired tile generation.
+
+        Smart 2-tile split: horizontal for landscape, vertical for portrait.
+        Edge-aware: for portrait, the split is biased into the upper portion
+        to avoid cutting through people standing in the bottom third.
+        Returns: [(x1, y1, x2, y2), ...] — always starts with full image.
+        """
+        tiles = [(0, 0, w, h)]  # full image always first
+
+        # Only tile if image significantly exceeds model input
+        if max(h, w) <= max(self.per_h, self.per_w) * PER_TILE_MIN_DIM_RATIO:
+            return tiles
+
+        overlap_px_x = int(w * PER_TILE_OVERLAP)
+        overlap_px_y = int(h * PER_TILE_OVERLAP)
+
+        if w >= h:
+            # Landscape: 2 horizontal tiles (left + right)
+            mid = w // 2
+            tiles.append((0, 0, mid + overlap_px_x, h))
+            tiles.append((mid - overlap_px_x, 0, w, h))
+        else:
+            # Portrait: 2 vertical tiles (top + bottom)
+            # Edge-aware: bias split toward upper portion (people stand at bottom)
+            mid = int(h * 0.45)  # split at 45% height, not 50%
+            tiles.append((0, 0, w, mid + overlap_px_y))
+            tiles.append((0, mid - overlap_px_y, w, h))
+
+        return tiles
+
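+    # Worked example (assuming PER_TILE_OVERLAP = 0.10, a 960px model input,
+    # and a PER_TILE_MIN_DIM_RATIO below 2.0): a 1920x1080 landscape frame
+    # yields three regions: the full frame (0, 0, 1920, 1080), left tile
+    # (0, 0, 1152, 1080) and right tile (768, 0, 1920, 1080), with a 384px
+    # shared overlap band centred on x = 960 so people on the seam appear
+    # whole in at least one tile.
+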
+    def _per_run_tile(self, image_bgr, tile_region, conf_thresh):
+        """Run person model on a tile crop, return boxes in original coords."""
+        x1t, y1t, x2t, y2t = tile_region
+        crop = image_bgr[y1t:y2t, x1t:x2t]
+        boxes, confs = self._per_run_pass(crop, conf_thresh)
+        if len(boxes) == 0:
+            return np.empty((0, 4)), np.empty(0)
+        # Shift back to original image coordinates
+        boxes[:, 0] += x1t
+        boxes[:, 1] += y1t
+        boxes[:, 2] += x1t
+        boxes[:, 3] += y1t
+        return boxes, confs
+
+    @staticmethod
+    def _nms_max_conf(boxes, scores, iou_thr, sigma=0.5, min_conf=0.20):
+        """Soft-NMS with Gaussian decay (replaces hard NMS).
+
+        Instead of suppressing overlapping boxes entirely, decays their
+        confidence: score_j *= exp(-(iou^2) / sigma). This preserves
+        partially-occluded detections in crowds while still penalising
+        duplicates. Boxes whose confidence decays below min_conf are
+        removed.
+        """
+        if len(boxes) == 0:
+            return np.empty((0, 4)), np.empty(0)
+
+        b = boxes.copy().astype(np.float64)
+        s = scores.copy().astype(np.float64)
+        n = len(s)
+        indices = list(range(n))
+
+        for i in range(n):
+            # Find current max-confidence box
+            max_idx = i
+            for j in range(i + 1, n):
+                if s[indices[j]] > s[indices[max_idx]]:
+                    max_idx = j
+            # Swap to front
+            indices[i], indices[max_idx] = indices[max_idx], indices[i]
+
+            ix = indices[i]
+            # Decay overlapping boxes
+            for j in range(i + 1, n):
+                jx = indices[j]
+                xx1 = max(b[ix, 0], b[jx, 0])
+                yy1 = max(b[ix, 1], b[jx, 1])
+                xx2 = min(b[ix, 2], b[jx, 2])
+                yy2 = min(b[ix, 3], b[jx, 3])
+                inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
+                a1 = (b[ix, 2] - b[ix, 0]) * (b[ix, 3] - b[ix, 1])
+                a2 = (b[jx, 2] - b[jx, 0]) * (b[jx, 3] - b[jx, 1])
+                iou = inter / (a1 + a2 - inter + 1e-9)
+                if iou > 0:
+                    s[jx] *= np.exp(-(iou * iou) / sigma)
+
+        # Keep boxes above min_conf
+        keep = [indices[i] for i in range(n) if s[indices[i]] >= min_conf]
+        if not keep:
+            return np.empty((0, 4)), np.empty(0)
+        return b[keep], s[keep]
+
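+    # Decay example (sigma = 0.5 as defaulted above): a box overlapping the
+    # current winner at IoU = 0.6 is scaled by exp(-(0.6**2) / 0.5)
+    # = exp(-0.72) ~ 0.487, so a 0.55-conf duplicate decays to ~0.27 and is
+    # kept, while a 0.35-conf one decays to ~0.17 and is dropped at
+    # min_conf = 0.20.
+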
+    # ── Pose FP filter + box refinement ──────────────────────────────────
+
+    def _pose_run(self, image_bgr):
+        """Run pose model on full image, return (boxes [N,4], confs [N], keypoints [N,17,3]) in original coords."""
+        if self.pose_session is None:
+            return np.empty((0, 4)), np.empty(0), np.empty((0, 17, 3))
+
+        oh, ow = image_bgr.shape[:2]
+
+        # Letterbox to pose model input size
+        r = min(self.pose_h / oh, self.pose_w / ow)
+        nw, nh = int(round(ow * r)), int(round(oh * r))
+        img_r = cv2.resize(image_bgr, (nw, nh), interpolation=cv2.INTER_LINEAR)
+        dw, dh = self.pose_w - nw, self.pose_h - nh
+        pl, pt = dw // 2, dh // 2
+        img_p = cv2.copyMakeBorder(
+            img_r, pt, dh - pt, pl, dw - pl,
+            cv2.BORDER_CONSTANT, value=(114, 114, 114),
+        )
+
+        rgb = cv2.cvtColor(img_p, cv2.COLOR_BGR2RGB)
+        inp = rgb.astype(np.float32) / 255.0
+        inp = np.ascontiguousarray(inp.transpose(2, 0, 1)[np.newaxis])
+
+        raw = self.pose_session.run(None, {self.pose_input_name: inp})[0]
+
+        # raw shape: [1, 56, 8400] -> transpose to [8400, 56]
+        pred = raw[0] if raw.ndim == 3 else raw
+        if pred.shape[0] < pred.shape[1]:
+            pred = pred.T
+
+        # Decode: cols 0-3=xywh, col 4=conf, cols 5-55=17*3 keypoints
+        confs = pred[:, 4]
+        keep = confs >= POSE_CONF_THRESH
+        if not keep.any():
+            return np.empty((0, 4)), np.empty(0), np.empty((0, 17, 3))
+
+        pred = pred[keep]
+        confs = pred[:, 4]
+
+        # Convert xywh to x1y1x2y2 in original coords
+        cx, cy, bw, bh = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3]
+        x1 = np.clip((cx - bw / 2 - pl) / r, 0, ow)
+        y1 = np.clip((cy - bh / 2 - pt) / r, 0, oh)
+        x2 = np.clip((cx + bw / 2 - pl) / r, 0, ow)
+        y2 = np.clip((cy + bh / 2 - pt) / r, 0, oh)
+        boxes = np.stack([x1, y1, x2, y2], axis=1)
+
+        # Decode keypoints: [N, 51] -> [N, 17, 3]
+        kp_raw = pred[:, 5:].reshape(-1, 17, 3).copy()
+        kp_raw[:, :, 0] = (kp_raw[:, :, 0] - pl) / r  # x
+        kp_raw[:, :, 1] = (kp_raw[:, :, 1] - pt) / r  # y
+        kp_raw[:, :, 0] = np.clip(kp_raw[:, :, 0], 0, ow)
+        kp_raw[:, :, 1] = np.clip(kp_raw[:, :, 1], 0, oh)
+
+        # NMS on pose detections
+        order = np.argsort(-confs)
+        boxes = boxes[order]
+        confs = confs[order]
+        kp_raw = kp_raw[order]
+
+        keep_idx = []
+        suppressed = set()
+        for i in range(len(boxes)):
+            if i in suppressed:
+                continue
+            keep_idx.append(i)
+            for j in range(i + 1, len(boxes)):
+                if j in suppressed:
+                    continue
+                xx1 = max(boxes[i, 0], boxes[j, 0])
+                yy1 = max(boxes[i, 1], boxes[j, 1])
+                xx2 = min(boxes[i, 2], boxes[j, 2])
+                yy2 = min(boxes[i, 3], boxes[j, 3])
+                inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
+                a1 = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
+                a2 = (boxes[j, 2] - boxes[j, 0]) * (boxes[j, 3] - boxes[j, 1])
+                iou_val = inter / (a1 + a2 - inter + 1e-9)
+                if iou_val >= POSE_NMS_IOU:
+                    suppressed.add(j)
+
+        if not keep_idx:
+            return np.empty((0, 4)), np.empty(0), np.empty((0, 17, 3))
+        keep_idx = np.array(keep_idx)
+        return boxes[keep_idx], confs[keep_idx], kp_raw[keep_idx]
+
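+    # Keypoint layout: the 17 keypoints follow the standard COCO ordering
+    # used by YOLO pose heads (0=nose, 1-2=eyes, 3-4=ears, 5-6=shoulders,
+    # 7-8=elbows, 9-10=wrists, 11-12=hips, 13-14=knees, 15-16=ankles), each
+    # as (x, y, visibility confidence); POSE_HEAD_KP and POSE_KP_WEIGHTS,
+    # defined elsewhere in this file, index into this ordering.
+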
+    _FACE_SIZE = 640
+    _FACE_STRIDES = (8, 16, 32)
+    _FACE_NUM_ANCHORS = 2
+    _FACE_THRESH = 0.5
+    _FACE_NMS_THRESH = 0.4
+
+    def _face_run(self, image_bgr):
+        """Run SCRFD-500M face detector. Returns (face_boxes [N,4], face_confs [N])."""
+        if self.face_session is None:
+            return np.empty((0, 4)), np.empty(0)
+
+        oh, ow = image_bgr.shape[:2]
+        sz = self._FACE_SIZE
+
+        # Letterbox resize preserving aspect ratio (top-left aligned)
+        scale = min(sz / oh, sz / ow)
+        nw, nh = int(round(ow * scale)), int(round(oh * scale))
+        resized = cv2.resize(image_bgr, (nw, nh), interpolation=cv2.INTER_LINEAR)
+        det_img = np.zeros((sz, sz, 3), dtype=np.uint8)
+        det_img[:nh, :nw, :] = resized
+
+        # Preprocess: BGR→RGB, (pixel - 127.5) / 128.0
+        blob = cv2.dnn.blobFromImage(
+            det_img, 1.0 / 128.0, (sz, sz), (127.5, 127.5, 127.5), swapRB=True,
+        )
+
+        outputs = self.face_session.run(None, {self.face_input_name: blob})
+
+        # Decode 3 stride levels: outputs[0:3]=scores, [3:6]=bboxes, [6:9]=kps
+        all_scores, all_boxes = [], []
+        for idx, stride in enumerate(self._FACE_STRIDES):
+            scores = outputs[idx][:, 0]  # (N,)
+            bbox_d = outputs[idx + 3]  # (N, 4) distances
+            keep = scores >= self._FACE_THRESH
+            if not keep.any():
+                continue
+            scores = scores[keep]
+            bbox_d = bbox_d[keep]
+
+            # Generate anchor centers for kept positions
+            fh, fw = sz // stride, sz // stride
+            grid_y, grid_x = np.mgrid[:fh, :fw]
+            centers = np.stack([grid_x, grid_y], axis=-1).astype(np.float32).reshape(-1, 2)
+            centers = np.tile(centers, (1, self._FACE_NUM_ANCHORS)).reshape(-1, 2) * stride
+            centers = centers[keep]
+
+            # distance → bbox: [x1, y1, x2, y2]
+            x1 = centers[:, 0] - bbox_d[:, 0] * stride
+            y1 = centers[:, 1] - bbox_d[:, 1] * stride
+            x2 = centers[:, 0] + bbox_d[:, 2] * stride
+            y2 = centers[:, 1] + bbox_d[:, 3] * stride
+            boxes = np.stack([x1, y1, x2, y2], axis=-1) / scale
+
+            all_scores.append(scores)
+            all_boxes.append(boxes)
+
+        if not all_scores:
+            return np.empty((0, 4)), np.empty(0)
+
+        scores = np.concatenate(all_scores)
+        boxes = np.concatenate(all_boxes)
+
+        # NMS
+        order = scores.argsort()[::-1]
+        scores, boxes = scores[order], boxes[order]
+        keep = []
+        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
+        areas = (x2 - x1) * (y2 - y1)
+        suppressed = np.zeros(len(scores), dtype=bool)
+        for i in range(len(scores)):
+            if suppressed[i]:
+                continue
+            keep.append(i)
+            xx1 = np.maximum(x1[i], x1[i + 1:])
+            yy1 = np.maximum(y1[i], y1[i + 1:])
+            xx2 = np.minimum(x2[i], x2[i + 1:])
+            yy2 = np.minimum(y2[i], y2[i + 1:])
+            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
+            ovr = inter / (areas[i] + areas[i + 1:] - inter + 1e-6)
+            suppressed[i + 1:] |= ovr > self._FACE_NMS_THRESH
+
+        return boxes[keep], scores[keep]
+
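+    # Decode example (illustrative numbers): at stride 8, a kept position
+    # with anchor center (160, 80) and predicted distances (2.0, 1.5, 3.0,
+    # 4.5) becomes x1 = 160 - 2.0*8 = 144, y1 = 80 - 1.5*8 = 68,
+    # x2 = 160 + 3.0*8 = 184, y2 = 80 + 4.5*8 = 116 in letterboxed coords,
+    # then is divided by `scale` to return to original-image coordinates.
+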
+    @staticmethod
+    def _anatomical_score(kps, kp_conf_thresh=POSE_KP_CONF):
+        """Compute weighted anatomical score from keypoints [17, 3].
+
+        Returns (score, has_head, n_visible):
+            score: weighted sum of visible keypoints (0.0-1.0)
+            has_head: True if any head keypoint (nose/eyes/ears) is visible
+            n_visible: number of visible keypoints
+        """
+        visible = kps[:, 2] >= kp_conf_thresh
+        n_visible = int(visible.sum())
+        score = float((visible.astype(np.float32) * POSE_KP_WEIGHTS).sum())
+        has_head = bool(visible[POSE_HEAD_KP].any())
+        return score, has_head, n_visible
+
+    def _refine_box_with_keypoints(self, pb, kps, ow, oh):
+        """Blend person box with tight keypoint bbox."""
+        visible = kps[:, 2] >= POSE_KP_CONF
+        if not visible.any():
+            return pb
+        vis_kps = kps[visible]
+        kp_x1 = float(vis_kps[:, 0].min())
+        kp_y1 = float(vis_kps[:, 1].min())
+        kp_x2 = float(vis_kps[:, 0].max())
+        kp_y2 = float(vis_kps[:, 1].max())
+
+        # Pad around keypoint bbox
+        kp_w = kp_x2 - kp_x1
+        kp_h = kp_y2 - kp_y1
+        pad_x = kp_w * POSE_KP_PAD
+        pad_y = kp_h * POSE_KP_PAD
+        kp_x1 = max(0, kp_x1 - pad_x)
+        kp_y1 = max(0, kp_y1 - pad_y)
+        kp_x2 = min(ow, kp_x2 + pad_x)
+        kp_y2 = min(oh, kp_y2 + pad_y)
+
+        a = POSE_REFINE_BLEND
+        return BoundingBox(
+            x1=max(0, min(ow, int(pb.x1 * (1 - a) + kp_x1 * a))),
+            y1=max(0, min(oh, int(pb.y1 * (1 - a) + kp_y1 * a))),
+            x2=max(0, min(ow, int(pb.x2 * (1 - a) + kp_x2 * a))),
+            y2=max(0, min(oh, int(pb.y2 * (1 - a) + kp_y2 * a))),
+            cls_id=0,
+            conf=pb.conf,
+        )
+
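+    # Blend example (assuming POSE_REFINE_BLEND = 0.5 and POSE_KP_PAD = 0.05):
+    # a detector box edge at x1 = 100 and a padded keypoint-bbox edge at
+    # x1 = 120 yield a refined edge int(100 * 0.5 + 120 * 0.5) = 110, pulling
+    # loose detector boxes toward the anatomical extent without trusting the
+    # keypoints alone.
+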
+    def _pose_filter_refine(self, person_boxes, image_bgr):
+        """Filter FP detections and refine boxes using anatomical keypoint scoring.
+
+        Anatomical scoring: weighted sum of visible keypoints where head/face
+        keypoints (nose, eyes, ears) contribute most, upper body (shoulders,
+        elbows, wrists) next, lower body (hips, knees, ankles) least.
+
+        Decision logic:
+        1. Run pose model once on full image.
+        2. Run face detector (if available) for additional confirmation.
+        3. Match each person detection to best-overlapping pose detection.
+        4. For matched boxes:
+           a. Head keypoints visible OR face detected → KEEP + refine (never suppress)
+           b. Anatomical score >= REFINE threshold → KEEP + refine
+           c. Anatomical score > 0 → KEEP as-is (partially visible person)
+           d. Anatomical score == 0 + large + low-conf → SUPPRESS (FP candidate)
+        5. For unmatched boxes:
+           a. Face detected inside box → KEEP
+           b. Large + low-conf → SUPPRESS
+           c. Small or high-conf → KEEP (SAHI-detected or confident)
+        """
+        if not person_boxes or self.pose_session is None:
+            return person_boxes
+
+        oh, ow = image_bgr.shape[:2]
+        img_area = float(oh * ow)
+
+        # Run pose model
+        t_pose = time.monotonic()
+        pose_boxes, pose_confs, pose_kps = self._pose_run(image_bgr)
+        dt_pose = (time.monotonic() - t_pose) * 1000
+
+        # Cache pose data for motorcycle rider check in vehicle parts confirmation
+        self._cached_pose_data = (pose_boxes, pose_kps)
+
+        # Run face detector if available
+        face_boxes = np.empty((0, 4))
+        if self.face_session is not None:
+            t_face = time.monotonic()
+            face_boxes, _ = self._face_run(image_bgr)
+            dt_face = (time.monotonic() - t_face) * 1000
+            logger.info(f"[pose] {len(pose_boxes)} pose, {len(face_boxes)} faces "
+                        f"in {dt_pose:.0f}+{dt_face:.0f}ms")
+        else:
+            logger.info(f"[pose] {len(pose_boxes)} pose detections in {dt_pose:.0f}ms")
+
+        # Helper: check if any face detection is inside a person box
+        def has_face_inside(pb):
+            if len(face_boxes) == 0:
+                return False
+            for fb in face_boxes:
+                # Face center must be inside person box
+                fcx = (fb[0] + fb[2]) / 2
+                fcy = (fb[1] + fb[3]) / 2
+                if pb.x1 <= fcx <= pb.x2 and pb.y1 <= fcy <= pb.y2:
+                    return True
+            return False
+
+        if len(pose_boxes) == 0:
+            # No pose detections — use face detector or size/conf heuristic
+            result = []
+            n_suppressed = 0
+            for pb in person_boxes:
+                if has_face_inside(pb):
+                    result.append(pb)
+                    continue
+                bw = pb.x2 - pb.x1
+                bh = pb.y2 - pb.y1
+                area_ratio = (bw * bh) / img_area
+                if area_ratio > POSE_FP_MIN_AREA and pb.conf < POSE_FP_MAX_CONF:
+                    n_suppressed += 1
+                    continue
+                result.append(pb)
+            if n_suppressed:
+                logger.info(f"[pose] Suppressed {n_suppressed} FP (no pose detections)")
+            return result
+
+        # Match person detections to pose detections via IoU
+        result = []
+        n_refined = 0
+        n_suppressed = 0
+        n_face_saved = 0
+
+        for pb in person_boxes:
+            pb_arr = np.array([pb.x1, pb.y1, pb.x2, pb.y2], dtype=float)
+            best_iou = 0.0
+            best_idx = -1
+
+            for j in range(len(pose_boxes)):
+                xx1 = max(pb_arr[0], pose_boxes[j, 0])
+                yy1 = max(pb_arr[1], pose_boxes[j, 1])
+                xx2 = min(pb_arr[2], pose_boxes[j, 2])
+                yy2 = min(pb_arr[3], pose_boxes[j, 3])
+                inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
+                a1 = (pb_arr[2] - pb_arr[0]) * (pb_arr[3] - pb_arr[1])
+                a2 = (pose_boxes[j, 2] - pose_boxes[j, 0]) * (pose_boxes[j, 3] - pose_boxes[j, 1])
+                iou_val = inter / (a1 + a2 - inter + 1e-9)
+                if iou_val > best_iou:
+                    best_iou = iou_val
+                    best_idx = j
+
+            if best_iou >= POSE_MATCH_IOU and best_idx >= 0:
+                # Matched to a pose detection — compute anatomical score
+                kps = pose_kps[best_idx]  # [17, 3]
+                anat_score, has_head, n_vis = self._anatomical_score(kps)
+
+                if has_head or has_face_inside(pb):
+                    # Head/face visible → definitely a person, refine box
+                    result.append(self._refine_box_with_keypoints(pb, kps, ow, oh))
+                    n_refined += 1
+                elif anat_score >= POSE_ANAT_REFINE_THRESH:
+                    # Good anatomical score → person confirmed, refine
+                    result.append(self._refine_box_with_keypoints(pb, kps, ow, oh))
+                    n_refined += 1
+                elif anat_score > POSE_ANAT_SUPPRESS_THRESH:
+                    # Some keypoints visible but low score — keep as-is
+                    result.append(pb)
+                else:
+                    # Matched to pose bbox but ZERO keypoints visible.
+                    # Only suppress if also large and low confidence
+                    bw = pb.x2 - pb.x1
+                    bh = pb.y2 - pb.y1
+                    area_ratio = (bw * bh) / img_area
+                    if area_ratio > POSE_FP_MIN_AREA and pb.conf < POSE_FP_MAX_CONF:
+                        n_suppressed += 1
+                        continue
+                    result.append(pb)
+            else:
+                # Not matched to any pose detection
+                if has_face_inside(pb):
+                    # Face detector confirms a person
+                    result.append(pb)
+                    n_face_saved += 1
+                    continue
+
+                bw = pb.x2 - pb.x1
+                bh = pb.y2 - pb.y1
+                area_ratio = (bw * bh) / img_area
+
+                if area_ratio > POSE_FP_MIN_AREA and pb.conf < POSE_FP_MAX_CONF:
+                    # Large unmatched low-conf box — likely FP
+                    n_suppressed += 1
+                    continue
+                else:
+                    # Small box or high conf — keep
+                    result.append(pb)
+
+        if n_refined or n_suppressed or n_face_saved:
+            logger.info(f"[pose] Refined {n_refined}, suppressed {n_suppressed} FP, "
+                        f"face-saved {n_face_saved}, "
+                        f"kept {len(result)}/{len(person_boxes)}")
+        return result
+
+    # ── Person inference with SAHI tiling ────────────────────────────────
+
+    @staticmethod
+    def _match_boxes_iou(boxes_a, boxes_b, iou_thr):
+        """Match boxes from two sets by IoU. Returns (matched_pairs, unmatched_a, unmatched_b).
+
+        matched_pairs: list of (idx_a, idx_b, iou) tuples
+        unmatched_a: list of indices in boxes_a with no match
+        unmatched_b: list of indices in boxes_b with no match
+        """
+        if len(boxes_a) == 0:
+            return [], [], list(range(len(boxes_b)))
+        if len(boxes_b) == 0:
+            return [], list(range(len(boxes_a))), []
+
+        matched_pairs = []
+        used_b = set()
+
+        for i in range(len(boxes_a)):
+            best_iou = 0
+            best_j = -1
+            for j in range(len(boxes_b)):
+                if j in used_b:
+                    continue
+                xx1 = max(boxes_a[i, 0], boxes_b[j, 0])
+                yy1 = max(boxes_a[i, 1], boxes_b[j, 1])
+                xx2 = min(boxes_a[i, 2], boxes_b[j, 2])
+                yy2 = min(boxes_a[i, 3], boxes_b[j, 3])
+                inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
+                a1 = (boxes_a[i, 2] - boxes_a[i, 0]) * (boxes_a[i, 3] - boxes_a[i, 1])
+                a2 = (boxes_b[j, 2] - boxes_b[j, 0]) * (boxes_b[j, 3] - boxes_b[j, 1])
+                iou = inter / (a1 + a2 - inter + 1e-9)
+                if iou > best_iou:
+                    best_iou = iou
+                    best_j = j
+            if best_iou >= iou_thr:
+                matched_pairs.append((i, best_j, best_iou))
+                used_b.add(best_j)
+
+        matched_a = {p[0] for p in matched_pairs}
+        unmatched_a = [i for i in range(len(boxes_a)) if i not in matched_a]
+        unmatched_b = [j for j in range(len(boxes_b)) if j not in used_b]
+
+        return matched_pairs, unmatched_a, unmatched_b
+
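+    # Matching note: this is greedy first-come matching, not Hungarian
+    # assignment; each box in A claims its best still-free partner in B.
+    # That is adequate here because the two TTA views of one person overlap
+    # each other far more (IoU near 1.0) than they overlap neighbours, so the
+    # globally optimal assignment and the greedy one almost always agree.
+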
+    def _infer_person(self, image_bgr):
+        """Person detection with TTA consensus merging.
+
+        Pipeline (v3.23 — replaces concatenate+soft-NMS with consensus merging):
+        1. Original pass at native 960px
+        2. Flip TTA pass
+        3. Match boxes across views (IoU >= PER_TTA_MATCH_IOU)
+        4. Graduated confidence thresholds:
+           - Confirmed by both views: keep at PER_TTA_CONF_BOTH (0.50)
+           - Original-only: keep at PER_TTA_CONF_ORIG (0.60)
+           - Flip-only: keep at PER_TTA_CONF_FLIP (0.75)
+        5. Hard NMS on merged result
+        6. Sanity filters + safety ceiling
+        7. Pose FP filter + box refinement (if time allows)
+        """
+        oh, ow = image_bgr.shape[:2]
+        t_start = time.monotonic()
+
+        # Frame quality gating
+        blur_score = self._frame_blur_score(image_bgr)
+        is_blurry = blur_score < PER_BLUR_THRESHOLD
+
+        # Pass 1: original image
+        boxes_orig, confs_orig = self._per_run_pass(image_bgr, PER_TTA_CONF_BOTH)
+
+        # Pass 2: horizontal flip
+        flipped = cv2.flip(image_bgr, 1)
+        boxes_flip, confs_flip = self._per_run_pass(flipped, PER_TTA_CONF_BOTH)
+        if len(boxes_flip) > 0:
+            boxes_flip[:, 0], boxes_flip[:, 2] = (
+                ow - boxes_flip[:, 2], ow - boxes_flip[:, 0])
+
+        if len(boxes_orig) == 0 and len(boxes_flip) == 0:
+            return []
+
+        # TTA consensus: match boxes across views
+        matched, unmatched_o, unmatched_f = self._match_boxes_iou(
+            boxes_orig, boxes_flip, PER_TTA_MATCH_IOU)
+
+        # Build merged result with graduated thresholds
+        merged_b = []
+        merged_s = []
+
+        # Confirmed by both views: keep original box, use max confidence, threshold=0.50
+        for i_o, i_f, iou in matched:
+            conf = max(float(confs_orig[i_o]), float(confs_flip[i_f]))
+            if conf >= PER_TTA_CONF_BOTH:
+                merged_b.append(boxes_orig[i_o])
+                merged_s.append(conf)
+
+        # Original-only: need higher confidence (0.60)
+        for i_o in unmatched_o:
+            if confs_orig[i_o] >= PER_TTA_CONF_ORIG:
+                merged_b.append(boxes_orig[i_o])
+                merged_s.append(float(confs_orig[i_o]))
+
+        # Flip-only: strict threshold (0.75) — flip-only detections are likely FP
+        for i_f in unmatched_f:
+            if confs_flip[i_f] >= PER_TTA_CONF_FLIP:
+                merged_b.append(boxes_flip[i_f])
+                merged_s.append(float(confs_flip[i_f]))
+
+        if not merged_b:
+            return []
+
+        merged_b = np.array(merged_b)
+        merged_s = np.array(merged_s)
+
+        # Hard NMS on merged result (no soft-NMS — no confidence decay)
+        keep = _nms_per_class_boost(
+            merged_b, merged_s,
+            np.zeros(len(merged_s), dtype=int),  # single class
+            iou_thr=PER_NMS_IOU)
+        merged_b, merged_s = keep[0], keep[1]
+
+        # Safety ceiling
+        if len(merged_s) > PER_MAX_DET:
+            top_idx = np.argsort(merged_s)[-PER_MAX_DET:]
+            merged_b = merged_b[top_idx]
+            merged_s = merged_s[top_idx]
+
+        if len(merged_b) == 0:
+            return []
+
+        # Blur confidence penalty
+        if is_blurry:
+            merged_s = merged_s * PER_BLUR_CONF_PENALTY
+
+        # Perspective scaling penalty
+        merged_s = self._perspective_penalty(merged_b, merged_s, oh)
+
+        # Final confidence floor (catches blur/perspective decay edge cases)
+        keep_mask = merged_s >= PER_TTA_CONF_BOTH
+        merged_b = merged_b[keep_mask]
+        merged_s = merged_s[keep_mask]
+
+        # Sanity filters
+        img_area = float(oh * ow)
+        out = []
+        for i in range(len(merged_b)):
+            bw = merged_b[i, 2] - merged_b[i, 0]
+            bh = merged_b[i, 3] - merged_b[i, 1]
+            if bw < PER_MIN_WH or bh < PER_MIN_WH:
+                continue
+            area = bw * bh
+            if area < PER_MIN_AREA:
+                continue
+            if max(bw, bh) / max(min(bw, bh), 1e-6) > PER_MAX_ASPECT:
+                continue
+            if area / img_area > PER_MAX_AREA_RATIO:
+                continue
+            b = merged_b[i]
+            out.append(BoundingBox(
+                x1=max(0, min(ow, int(b[0]))),
+                y1=max(0, min(oh, int(b[1]))),
+                x2=max(0, min(ow, int(b[2]))),
+                y2=max(0, min(oh, int(b[3]))),
+                cls_id=0,
+                conf=max(0.0, min(1.0, float(merged_s[i]))),
+            ))
+
+        # Pose FP filter + box refinement (only if time budget allows)
+        if time.monotonic() - t_start < PER_RTF_BUDGET * 0.85:
+            out = self._pose_filter_refine(out, image_bgr)
+
+        return out
+
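+    # Graduated-threshold example (values from the docstring above): a person
+    # scored 0.52 in both views is kept (0.52 >= PER_TTA_CONF_BOTH = 0.50);
+    # the same 0.52 seen only in the original pass is dropped (< 0.60), and a
+    # flip-only detection must reach 0.75, since mirror-only hits are the
+    # least trustworthy of the three cases.
+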
+    # ── Element detection (stack frame inspection) ──────────────────────────
+    _CHALLENGE_TYPE_MAP = {2: 'person', 12: 'vehicle'}
+
+    def _detect_element_hint(self) -> str:
+        """Detect whether this request is for person or vehicle.
+
+        Reads challenge_type_id from the chute template predict() metadata
+        via stack frame inspection. Returns 'person', 'vehicle', or 'both'.
+        """
+        frame = None
+        try:
+            frame = inspect.currentframe()
+            for _ in range(10):
+                frame = frame.f_back
+                if frame is None:
+                    break
+                meta = frame.f_locals.get('metadata')
+                if isinstance(meta, dict) and 'challenge_type_id' in meta:
+                    ct_id = meta['challenge_type_id']
+                    hint = self._CHALLENGE_TYPE_MAP.get(ct_id)
+                    if hint:
+                        return hint
+            return 'both'
+        except Exception:
+            pass
+        finally:
+            del frame
+        return 'both'
+
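+    # Fragility note: this only works while the chute template keeps a local
+    # variable literally named 'metadata' within 10 frames of this call and
+    # keeps the 2=person / 12=vehicle ids in _CHALLENGE_TYPE_MAP. Any other
+    # shape falls through to 'both', which is safe: _infer_single then simply
+    # runs both detectors instead of one.
+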
+    # ── Unified inference ───────────────────────────────────────────────────
+
+    def _infer_single(self, image_bgr: ndarray, element_hint: str = 'both') -> list[BoundingBox]:
+        self._cached_pose_data = None  # reset before each frame
+
+        if element_hint == 'person':
+            return self._infer_person(image_bgr)
+
+        if element_hint == 'vehicle':
+            # Run vehicle detection + parts confirmation with empty person_boxes.
+            # Plate/headlight/window checks fire normally; driver/rider overlap
+            # check finds no matches (boost=0) but doesn't suppress.
+            vehicle_boxes = self._infer_vehicle(image_bgr)
+            return self._vehicle_parts_confirm(vehicle_boxes, [], image_bgr)
+
+        # Fallback: run both (original behavior)
+        if ENABLE_PARALLEL:
+            veh_future = self._executor.submit(self._infer_vehicle, image_bgr)
+            per_future = self._executor.submit(self._infer_person, image_bgr)
+            vehicle_boxes = veh_future.result()
+            person_boxes = per_future.result()
+        else:
+            vehicle_boxes = self._infer_vehicle(image_bgr)
+            person_boxes = self._infer_person(image_bgr)
+
+        # Vehicle parts confirmation: cross-reference with person detections
+        vehicle_boxes = self._vehicle_parts_confirm(
+            vehicle_boxes, person_boxes, image_bgr)
+
+        return vehicle_boxes + person_boxes
+
+    # ── Replay buffer ─────────────────────────────────────────────────────
+    REPLAY_DIR = Path("/home/miner/replay_buffer")
+    REPLAY_MAX = 100
+
+    def _replay_save(self, batch_images, results):
+        try:
+            ts = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S_%f")
+            query_dir = self.REPLAY_DIR / ts
+            query_dir.mkdir(parents=True, exist_ok=True)
+
+            for i, img in enumerate(batch_images):
+                cv2.imwrite(str(query_dir / f"img_{i:03d}.jpg"), img,
+                            [cv2.IMWRITE_JPEG_QUALITY, 95])
+
+            preds = []
+            for r in results:
+                preds.append({
+                    "frame_id": r.frame_id,
+                    "boxes": [b.model_dump() for b in r.boxes],
+                })
+            meta = {
+                "timestamp": ts,
+                "num_images": len(batch_images),
+                "image_shapes": [list(img.shape) for img in batch_images],
+                "predictions": preds,
+            }
+            (query_dir / "meta.json").write_text(json.dumps(meta, indent=2))
+            self._replay_prune()
+        except Exception:
+            pass
+
+    def _replay_prune(self):
+        try:
+            dirs = sorted(
+                [d for d in self.REPLAY_DIR.iterdir() if d.is_dir()],
+                key=lambda d: d.name,
+            )
+            if len(dirs) > self.REPLAY_MAX:
+                import shutil
+                for old in dirs[: len(dirs) - self.REPLAY_MAX]:
+                    shutil.rmtree(old, ignore_errors=True)
+        except Exception:
+            pass
+
+    def predict_batch(
+        self,
+        batch_images: list[ndarray],
+        offset: int,
+        n_keypoints: int,
+    ) -> list[TVFrameResult]:
+        t_start = time.perf_counter()
+
+        # Detect element type from caller metadata
+        element_hint = self._detect_element_hint()
+        t_setup = time.perf_counter()
+        dt_setup = (t_setup - t_start) * 1000
+
+        _lat_logger.info(
+            "REQUEST batch=%d hint=%s setup=%.1fms",
+            len(batch_images), element_hint, dt_setup,
+        )
+
+        results: list[TVFrameResult] = []
+        for idx, image in enumerate(batch_images):
+            t_img = time.perf_counter()
+            boxes = self._infer_single(image, element_hint=element_hint)
+            t_post = time.perf_counter()
+            dt_infer = (t_post - t_img) * 1000
+
+            keypoints = [(0, 0) for _ in range(max(0, int(n_keypoints)))]
+            results.append(TVFrameResult(
+                frame_id=offset + idx, boxes=boxes, keypoints=keypoints,
+            ))
+            dt_post = (time.perf_counter() - t_post) * 1000
+
+            if idx < 3 or idx == len(batch_images) - 1:
+                _lat_logger.info(
+                    "  IMG %d/%d boxes=%d infer=%.1fms post=%.1fms shape=%s",
+                    idx, len(batch_images), len(boxes), dt_infer, dt_post,
+                    image.shape,
+                )
+
+        t_done = time.perf_counter()
+        dt_total = (t_done - t_start) * 1000
+        total_boxes = sum(len(r.boxes) for r in results)
+
+        _lat_logger.info(
+            "DONE batch=%d boxes=%d total=%.1fms setup=%.1fms hint=%s",
+            len(batch_images), total_boxes, dt_total, dt_setup, element_hint,
+        )
+        logger.info(f"[miner] predict_batch: {len(batch_images)} images, "
+                    f"{total_boxes} total boxes, {dt_total:.0f}ms (hint={element_hint})")
+
+        threading.Thread(
+            target=self._replay_save,
+            args=(batch_images, results),
+            daemon=True,
+        ).start()
+
+        return results
+# Miner v3.19 — 1-pass vehicle + CLAHE pass + parts_confirm fix — element detection + per-step timing — background TRT engine build + CUDA-first fallback 20260402