DariusGiannoli committed
Commit 8ac50b6 · 1 Parent(s): 565e61c

feat: central model registry, real-time detection, stereo geometry & home page

app.py CHANGED
@@ -1,17 +1,192 @@
  import streamlit as st
- import cv2
- import numpy as np
- from src.detectors.yolo import YOLODetector
-
- st.set_page_config(page_title="Perception Benchmark", layout="wide")
-
- st.title("🦅 Bird Perception Stack")
- st.write("Current Status: Recognition Engine Online. Stereo Depth Engine Pending.")
-
- # Simple test of your existing YOLO class
- if st.button("Initialize YOLOv8n"):
-     try:
-         detector = YOLODetector()
-         st.success("YOLOv8n Loaded Successfully from weights!")
-     except Exception as e:
-         st.error(f"Error loading weights: {e}")
+
+ st.set_page_config(page_title="Perception Benchmark", layout="wide", page_icon="🦅")
+
+ # ===================================================================
+ # Header
+ # ===================================================================
+ st.title("🦅 Recognition Benchmark")
+ st.subheader("A stereo-vision pipeline for object recognition & depth estimation")
+ st.caption("Compare classical feature engineering (RCE) against modern deep learning backbones — end-to-end, in your browser.")
+
+ st.divider()
+
+ # ===================================================================
+ # Pipeline Overview
+ # ===================================================================
+ st.header("🗺️ Pipeline Overview")
+ st.markdown("""
+ The app is structured as a **5-stage sequential pipeline**.
+ Complete each page in order — every stage feeds the next.
+ """)
+
+ stages = [
+     ("🧪", "1 · Data Lab", "Upload a stereo image pair, camera calibration file, and two PFM ground-truth depth maps. "
+                            "Define an object ROI (bounding box), then apply live data augmentation "
+                            "(brightness, contrast, rotation, noise, blur, shift, flip). "
+                            "All assets are locked into session state — nothing is written to disk."),
+     ("🔬", "2 · Feature Lab", "Toggle RCE physics modules (Intensity · Sobel · Spectral) to build a modular "
+                               "feature vector. Compare it live against CNN activation maps extracted from a "
+                               "frozen backbone via forward hooks. Lock your active module configuration."),
+     ("⚙️", "3 · Model Tuning", "Train lightweight **heads** on your session data (augmented crop = positives, "
+                                "random non-overlapping patches from the scene = negatives). "
+                                "Both RCE and CNN heads are trained identically with LogisticRegression "
+                                "and stored in session state only — no disk writes."),
+     ("🎯", "4 · Real-Time Detection", "Run a **sliding window** across the right image using both the RCE head and "
+                                       "your chosen CNN head simultaneously. Watch the scan live, then compare "
+                                       "bounding boxes, confidence heatmaps, and latency."),
+     ("📐", "5 · Stereo Geometry", "Compute a disparity map with **StereoSGBM**, convert it to metric depth "
+                                   "using the stereo formula $Z = fB/(d+d_{\\text{offs}})$, then read depth "
+                                   "directly at every detected bounding box. Compare against PFM ground truth."),
+ ]
+
+ for icon, title, desc in stages:
+     with st.container(border=True):
+         c1, c2 = st.columns([1, 12])
+         c1.markdown(f"## {icon}")
+         c2.markdown(f"**{title}**  \n{desc}")
+
+ st.divider()
+
+ # ===================================================================
+ # Models
+ # ===================================================================
+ st.header("🧠 Models Used")
+
+ tab_rce, tab_resnet, tab_mobilenet, tab_mobilevit = st.tabs(
+     ["RCE Engine", "ResNet-18", "MobileNetV3-Small", "MobileViT-XXS"])
+
+ with tab_rce:
+     st.markdown("### 🧬 RCE — Relative Contextual Encoding")
+     st.markdown("""
+ **Type:** Modular hand-crafted feature extractor
+ **Architecture:** Three physics-inspired modules, each producing a 10-bin histogram:
+
+ | Module | Input | Operation |
+ |--------|-------|-----------|
+ | **Intensity** | Grayscale | Pixel-value histogram (global appearance) |
+ | **Sobel** | Gradient magnitude | Edge strength distribution (texture) |
+ | **Spectral** | FFT log-magnitude | Frequency content (pattern / structure) |
+
+ **Strengths:**
+ - Fully explainable — every dimension has a physical meaning
+ - Extremely fast (µs per patch, no GPU needed)
+ - Modular: disable any module and immediately see the effect on the vector
+ - Zero pre-training needed
+
+ **Weakness:** Less discriminative than deep features for complex visual scenes.
+ """)
+
+ with tab_resnet:
+     st.markdown("### 🏗️ ResNet-18")
+     st.markdown("""
+ **Source:** PyTorch Hub (`torchvision.models.ResNet18_Weights.DEFAULT`)
+ **Pre-training:** ImageNet-1k (1.28 M images, 1 000 classes)
+ **Backbone output:** 512-dimensional embedding (after `avgpool`)
+ **Head:** LogisticRegression trained on your session data
+
+ **Architecture highlights:**
+ - 18 layers with residual (skip) connections
+ - Residual blocks prevent vanishing gradients in deeper networks
+ - `layer4` is hooked for activation map visualisation
+
+ **In this app:** The entire backbone is **frozen** (`requires_grad=False`).
+ Only the lightweight head adapts to your specific object.
+ """)
+
+ with tab_mobilenet:
+     st.markdown("### 📱 MobileNetV3-Small")
+     st.markdown("""
+ **Source:** PyTorch Hub (`torchvision.models.MobileNet_V3_Small_Weights.DEFAULT`)
+ **Pre-training:** ImageNet-1k
+ **Backbone output:** 576-dimensional embedding (classifier replaced with `Identity`)
+ **Head:** LogisticRegression trained on your session data
+
+ **Architecture highlights:**
+ - Inverted residuals + linear bottlenecks (MobileNetV2 heritage)
+ - Hard-Swish / Hard-Sigmoid activations (hardware-friendly)
+ - Squeeze-and-Excitation (SE) blocks for channel attention
+ - Designed for **edge / mobile inference** — ~2.5 M parameters
+
+ **In this app:** Typically 3–5× faster than ResNet-18.
+ `features[-1]` is hooked for activation maps.
+ """)
+
+ with tab_mobilevit:
+     st.markdown("### 🤖 MobileViT-XXS")
+     st.markdown("""
+ **Source:** timm — `mobilevit_xxs.cvnets_in1k` (Apple Research, 2022)
+ **Pre-training:** ImageNet-1k
+ **Backbone output:** 320-dimensional embedding (`num_classes=0`)
+ **Head:** LogisticRegression trained on your session data
+
+ **Architecture highlights:**
+ - **Hybrid CNN + Vision Transformer** — local convolutions for spatial features,
+   global self-attention for long-range context
+ - MobileNetV2 stem + MobileViT blocks (attention on non-overlapping patches)
+ - Only ~1.3 M parameters — smallest of the three
+
+ **In this app:** The final transformer stage `stages[-1]` is hooked.
+ Slower than MobileNetV3 but captures global structure.
+ """)
+
+ st.divider()
+
+ # ===================================================================
+ # Depth Estimation
+ # ===================================================================
+ st.header("📐 Stereo Depth Estimation")
+
+ col_d1, col_d2 = st.columns(2)
+ with col_d1:
+     st.markdown("""
+ **Algorithm:** `cv2.StereoSGBM` (Semi-Global Block Matching)
+
+ SGBM minimises a global energy function combining:
+ - Data cost (pixel intensity difference)
+ - Smoothness penalty (P1, P2 regularisation)
+
+ It processes multiple horizontal and diagonal scan-line passes,
+ making it significantly more accurate than basic block matching.
+ """)
+ with col_d2:
+     st.markdown("""
+ **Depth formula (Middlebury convention):**
+ """)
+     st.latex(r"Z = \frac{f \times B}{d + d_{\text{offs}}}")
+     st.markdown("""
+ - $f$ — focal length (pixels)
+ - $B$ — baseline (mm, from calibration file)
+ - $d$ — disparity (pixels)
+ - $d_\\text{offs}$ — optical-center offset between cameras
+ """)
+
+ st.divider()
+
+ # ===================================================================
+ # Session Status
+ # ===================================================================
+ st.header("📋 Session Status")
+
+ pipe = st.session_state.get("pipeline_data", {})
+
+ checks = {
+     "Data Lab locked": "left" in pipe,
+     "Crop defined": "crop" in pipe,
+     "Augmentation applied": "crop_aug" in pipe,
+     "Active modules locked": "active_modules" in st.session_state,
+     "RCE head trained": "rce_head" in st.session_state,
+     "CNN head trained": any(f"cnn_head_{n}" in st.session_state
+                             for n in ["ResNet-18", "MobileNetV3", "MobileViT-XXS"]),
+     "RCE detections ready": "rce_dets" in st.session_state,
+     "CNN detections ready": "cnn_dets" in st.session_state,
+ }
+
+ cols = st.columns(4)
+ for i, (label, done) in enumerate(checks.items()):
+     cols[i % 4].markdown(
+         f"{'✅' if done else '⬜'} {'~~' if not done else ''}{label}{'~~' if not done else ''}"
+     )
+
+ st.divider()
+ st.caption("Navigate using the sidebar → Start with **🧪 Data Lab**")
pages/3_Feature_Lab.py CHANGED
@@ -5,25 +5,7 @@ import plotly.graph_objects as go
  import sys, os
  sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  from src.detectors.rce.features import REGISTRY
-
-
- # ---------------------------------------------------------------------------
- # Cached model loaders — instantiated once, reused across reruns
- # ---------------------------------------------------------------------------
- @st.cache_resource
- def load_resnet():
-     from src.detectors.resnet import ResNetDetector
-     return ResNetDetector()
-
- @st.cache_resource
- def load_mobilenet():
-     from src.detectors.mobilenet import MobileNetDetector
-     return MobileNetDetector()
-
- @st.cache_resource
- def load_mobilevit():
-     from src.detectors.mobilevit import MobileViTDetector
-     return MobileViTDetector()
+ from src.models import BACKBONES

  st.set_page_config(page_title="Feature Lab", layout="wide")

@@ -86,22 +68,16 @@ with col_rce:
  # ---------------------------------------------------------------------------
  with col_cnn:
      st.header("🧠 CNN: Static Architecture")
-     selected_cnn = st.selectbox("Compare against Model", ["ResNet-18", "MobileViT-XXS", "MobileNetV3"])
+     selected_cnn = st.selectbox("Compare against Model", list(BACKBONES.keys()))
      st.info("CNN features are fixed by pre-trained weights. You cannot toggle them like the RCE.")

      with st.spinner(f"Loading {selected_cnn} and extracting activations..."):
          try:
-             if selected_cnn == "ResNet-18":
-                 detector = load_resnet()
-                 layer_name = "layer4 (last conv block)"
-             elif selected_cnn == "MobileViT-XXS":
-                 detector = load_mobilevit()
-                 layer_name = "stages[-1] (last transformer stage)"
-             else:
-                 detector = load_mobilenet()
-                 layer_name = "features[-1] (last features block)"
-
-             act_maps = detector.get_activation_maps(obj, n_maps=6)
+             bmeta = BACKBONES[selected_cnn]
+             backbone = bmeta["loader"]()   # cached frozen backbone
+             layer_name = bmeta["hook_layer"]
+
+             act_maps = backbone.get_activation_maps(obj, n_maps=6)
              st.caption(f"Hooked layer: `{layer_name}` — showing 6 of {len(act_maps)} channels")
              act_cols = st.columns(3)
              for i, amap in enumerate(act_maps):
pages/4_Model_Tuning.py CHANGED
@@ -7,6 +7,7 @@ import sys, os
  sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

  from src.detectors.rce.features import REGISTRY
+ from src.models import BACKBONES, RecognitionHead

  st.set_page_config(page_title="Model Tuning", layout="wide")
  st.title("⚙️ Model Tuning: Train & Compare")
@@ -26,31 +27,6 @@ bbox = assets.get("crop_bbox", (0, 0, crop.shape[1], crop.shape[0]))
  active_modules = st.session_state.get("active_modules", {k: True for k in REGISTRY})


- # ---------------------------------------------------------------------------
- # Cached model loaders
- # ---------------------------------------------------------------------------
- @st.cache_resource
- def load_resnet():
-     from src.detectors.resnet import ResNetDetector
-     return ResNetDetector()
-
- @st.cache_resource
- def load_mobilenet():
-     from src.detectors.mobilenet import MobileNetDetector
-     return MobileNetDetector()
-
- @st.cache_resource
- def load_mobilevit():
-     from src.detectors.mobilevit import MobileViTDetector
-     return MobileViTDetector()
-
- CNN_MODELS = {
-     "ResNet-18": {"loader": load_resnet, "dim": 512},
-     "MobileNetV3": {"loader": load_mobilenet, "dim": 576},
-     "MobileViT-XXS": {"loader": load_mobilevit, "dim": 320},
- }
-
-
  # ---------------------------------------------------------------------------
  # Build training set from session data (no disk reads)
  # ---------------------------------------------------------------------------
@@ -137,7 +113,6 @@ with col_rce:

      if st.button("🚀 Train RCE Head"):
          images, labels = build_training_set()
-         from sklearn.linear_model import LogisticRegression
          from sklearn.metrics import accuracy_score

          progress = st.progress(0, text="Extracting RCE features...")
@@ -151,12 +126,11 @@ with col_rce:
          progress.progress(1.0, text="Fitting Logistic Regression...")

          t0 = time.perf_counter()
-         head = LogisticRegression(max_iter=rce_max_iter, C=rce_C)
-         head.fit(X, labels)
+         head = RecognitionHead(C=rce_C, max_iter=rce_max_iter).fit(X, labels)
          train_time = time.perf_counter() - t0
          progress.progress(1.0, text="✅ Training complete!")

-         preds = head.predict(X)
+         preds = head.model.predict(X)
          train_acc = accuracy_score(labels, preds)

          st.success(f"Trained in **{train_time:.2f}s**")
@@ -184,10 +158,9 @@ with col_rce:
          head = st.session_state["rce_head"]
          t0 = time.perf_counter()
          vec = build_rce_vector(crop_aug)
-         probs = head.predict_proba([vec])[0]
+         label, conf = head.predict(vec)
          dt = (time.perf_counter() - t0) * 1000
-         idx = np.argmax(probs)
-         st.write(f"**{head.classes_[idx]}** — {probs[idx]:.1%} confidence — {dt:.1f} ms")
+         st.write(f"**{label}** {conf:.1%} confidence — {dt:.1f} ms")


  # ---------------------------------------------------------------------------
@@ -196,8 +169,8 @@ with col_rce:
  with col_cnn:
      st.header("🧠 CNN Fine-Tuning")

-     selected = st.selectbox("Select Model", list(CNN_MODELS.keys()))
-     meta = CNN_MODELS[selected]
+     selected = st.selectbox("Select Model", list(BACKBONES.keys()))
+     meta = BACKBONES[selected]
      st.caption(f"Backbone embedding: **{meta['dim']}D** → Logistic Regression head")

      st.subheader("Training Parameters")
@@ -208,28 +181,26 @@ with col_cnn:

      if st.button(f"🚀 Train {selected} Head"):
          images, labels = build_training_set()
-         detector = meta["loader"]()
+         backbone = meta["loader"]()   # cached frozen backbone

-         from sklearn.linear_model import LogisticRegression
          from sklearn.metrics import accuracy_score

          progress = st.progress(0, text=f"Extracting {selected} features...")
          n = len(images)
          X = []
          for i, img in enumerate(images):
-             X.append(detector._get_features(img))
+             X.append(backbone.get_features(img))
              progress.progress((i + 1) / n, text=f"Feature extraction: {i+1}/{n}")

          X = np.array(X)
          progress.progress(1.0, text="Fitting Logistic Regression...")

          t0 = time.perf_counter()
-         head = LogisticRegression(max_iter=cnn_max_iter, C=cnn_C)
-         head.fit(X, labels)
+         head = RecognitionHead(C=cnn_C, max_iter=cnn_max_iter).fit(X, labels)
          train_time = time.perf_counter() - t0
          progress.progress(1.0, text="✅ Training complete!")

-         preds = head.predict(X)
+         preds = head.model.predict(X)
          train_acc = accuracy_score(labels, preds)

          st.success(f"Trained in **{train_time:.2f}s**")
@@ -254,14 +225,13 @@ with col_cnn:
      if f"cnn_head_{selected}" in st.session_state:
          st.divider()
          st.subheader("Quick Predict (Crop)")
-         detector = meta["loader"]()
+         backbone = meta["loader"]()   # cached frozen backbone
          head = st.session_state[f"cnn_head_{selected}"]
          t0 = time.perf_counter()
-         feats = detector._get_features(crop_aug)
-         probs = head.predict_proba([feats])[0]
+         feats = backbone.get_features(crop_aug)
+         label, conf = head.predict(feats)
          dt = (time.perf_counter() - t0) * 1000
-         idx = np.argmax(probs)
-         st.write(f"**{head.classes_[idx]}** — {probs[idx]:.1%} confidence — {dt:.1f} ms")
+         st.write(f"**{label}** {conf:.1%} confidence — {dt:.1f} ms")


  # ===========================================================================
@@ -275,11 +245,11 @@ rows = []
  if rce_acc is not None:
      rows.append({"Model": "RCE", "Train Accuracy": f"{rce_acc:.1%}",
                   "Vector Size": str(sum(10 for k in active_modules if active_modules[k]))})
- for name in CNN_MODELS:
+ for name in BACKBONES:
      acc = st.session_state.get(f"cnn_acc_{name}")
      if acc is not None:
          rows.append({"Model": name, "Train Accuracy": f"{acc:.1%}",
-                      "Vector Size": f"{CNN_MODELS[name]['dim']}D"})
+                      "Vector Size": f"{BACKBONES[name]['dim']}D"})

  if rows:
      import pandas as pd
pages/5_RealTime_Detection.py CHANGED
@@ -0,0 +1,338 @@
+ import streamlit as st
+ import cv2
+ import numpy as np
+ import time
+ import plotly.graph_objects as go
+ import sys, os
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+ from src.detectors.rce.features import REGISTRY
+ from src.models import BACKBONES, RecognitionHead
+
+ st.set_page_config(page_title="Real-Time Detection", layout="wide")
+ st.title("🎯 Real-Time Detection")
+
+ # ---------------------------------------------------------------------------
+ # Guard
+ # ---------------------------------------------------------------------------
+ if "pipeline_data" not in st.session_state or "crop" not in st.session_state.get("pipeline_data", {}):
+     st.error("Complete **Data Lab** first (upload assets & define a crop).")
+     st.stop()
+
+ assets = st.session_state["pipeline_data"]
+ right_img = assets["right"]
+ crop = assets["crop"]
+ crop_aug = assets.get("crop_aug", crop)
+ bbox = assets.get("crop_bbox", (0, 0, crop.shape[1], crop.shape[0]))
+ active_mods = st.session_state.get("active_modules", {k: True for k in REGISTRY})
+
+ x0, y0, x1, y1 = bbox
+ win_h, win_w = y1 - y0, x1 - x0   # window = same size as crop
+
+ rce_head = st.session_state.get("rce_head")
+ has_any_cnn = any(f"cnn_head_{n}" in st.session_state for n in BACKBONES)
+
+ if rce_head is None and not has_any_cnn:
+     st.warning("No trained heads found. Go to **Model Tuning** and train at least one head.")
+     st.stop()
+
+
+ # ===================================================================
+ # Sliding Window Engine (shared by both sides)
+ # ===================================================================
+ def sliding_window_detect(
+     image: np.ndarray,
+     feature_fn,                # callable(patch_bgr) -> 1-D np.ndarray
+     head: RecognitionHead,
+     stride: int,
+     conf_thresh: float,
+     nms_iou: float,
+     progress_placeholder=None,
+     live_image_placeholder=None,
+ ):
+     """
+     Slide a window of size (win_h, win_w) across *image* with *stride*.
+     At each position call *feature_fn* → *head.predict*.
+     Returns (detections, heatmap, total_time_ms, n_windows).
+
+     Each detection is (x, y, x+win_w, y+win_h, label, confidence).
+     heatmap is a float32 array same size as image (object confidence).
+     """
+     H, W = image.shape[:2]
+     heatmap = np.zeros((H, W), dtype=np.float32)
+     detections = []
+     t0 = time.perf_counter()
+
+     positions = []
+     for y in range(0, H - win_h + 1, stride):
+         for x in range(0, W - win_w + 1, stride):
+             positions.append((x, y))
+
+     n_total = len(positions)
+     if n_total == 0:
+         return [], heatmap, 0.0, 0
+
+     for idx, (x, y) in enumerate(positions):
+         patch = image[y:y+win_h, x:x+win_w]
+         feats = feature_fn(patch)
+         label, conf = head.predict(feats)
+
+         # Fill heatmap with object confidence
+         if label == "object":
+             heatmap[y:y+win_h, x:x+win_w] = np.maximum(
+                 heatmap[y:y+win_h, x:x+win_w], conf)
+             if conf >= conf_thresh:
+                 detections.append((x, y, x+win_w, y+win_h, label, conf))
+
+         # Live updates (every 5th window or last)
+         if live_image_placeholder is not None and (idx % 5 == 0 or idx == n_total - 1):
+             vis = image.copy()
+             # Draw current scan position
+             cv2.rectangle(vis, (x, y), (x+win_w, y+win_h), (255, 255, 0), 1)
+             # Draw current detections
+             for dx, dy, dx2, dy2, dl, dc in detections:
+                 cv2.rectangle(vis, (dx, dy), (dx2, dy2), (0, 255, 0), 2)
+                 cv2.putText(vis, f"{dc:.0%}", (dx, dy - 4),
+                             cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
+             live_image_placeholder.image(
+                 cv2.cvtColor(vis, cv2.COLOR_BGR2RGB),
+                 caption=f"Scanning… {idx+1}/{n_total}",
+                 use_container_width=True)
+
+         if progress_placeholder is not None:
+             progress_placeholder.progress(
+                 (idx + 1) / n_total,
+                 text=f"Window {idx+1}/{n_total}")
+
+     total_ms = (time.perf_counter() - t0) * 1000
+
+     # --- Non-Maximum Suppression ---
+     if detections:
+         detections = _nms(detections, nms_iou)
+
+     return detections, heatmap, total_ms, n_total
+
+
+ def _nms(dets, iou_thresh):
+     """Greedy NMS on list of (x1,y1,x2,y2,label,conf)."""
+     dets = sorted(dets, key=lambda d: d[5], reverse=True)
+     keep = []
+     while dets:
+         best = dets.pop(0)
+         keep.append(best)
+         dets = [d for d in dets if _iou(best, d) < iou_thresh]
+     return keep
+
+
+ def _iou(a, b):
+     """IoU between two (x1,y1,x2,y2,…) tuples."""
+     xi1 = max(a[0], b[0]); yi1 = max(a[1], b[1])
+     xi2 = min(a[2], b[2]); yi2 = min(a[3], b[3])
+     inter = max(0, xi2-xi1) * max(0, yi2-yi1)
+     aa = (a[2]-a[0])*(a[3]-a[1])
+     ab = (b[2]-b[0])*(b[3]-b[1])
+     return inter / (aa + ab - inter + 1e-6)
+
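+ # Worked IoU example: two 10x10 boxes offset by 5 px in x overlap in a
+ # 5x10 region, so IoU = 50 / (100 + 100 - 50) = 1/3, which exceeds the
+ # default NMS threshold of 0.3 below (the weaker box would be suppressed).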
+
+ # ===================================================================
+ # RCE feature function
+ # ===================================================================
+ def rce_feature_fn(patch_bgr):
+     gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
+     vec = []
+     for key, meta in REGISTRY.items():
+         if active_mods.get(key, False):
+             v, _ = meta["fn"](gray)
+             vec.extend(v)
+     return np.array(vec, dtype=np.float32)
+
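+ # Each enabled module's "fn" maps the grayscale patch to a 10-bin histogram
+ # plus a second value this page discards. A hypothetical sketch of the
+ # Intensity module (the real ones live in src/detectors/rce/features.py):
+ #     def intensity(gray):
+ #         hist, _ = np.histogram(gray, bins=10, range=(0, 255), density=True)
+ #         return hist, None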
+
+ # ===================================================================
+ # Controls
+ # ===================================================================
+ st.subheader("Sliding Window Parameters")
+ p1, p2, p3 = st.columns(3)
+ stride = p1.slider("Stride (px)", 4, max(win_w, win_h),
+                    max(win_w // 4, 4), step=2,
+                    help="Lower = more windows = slower but finer")
+ conf_thresh = p2.slider("Confidence Threshold", 0.5, 1.0, 0.7, 0.05)
+ nms_iou = p3.slider("NMS IoU Threshold", 0.1, 0.9, 0.3, 0.05)
+
+ st.caption(f"Window size: **{win_w}×{win_h} px** | "
+            f"Right image: **{right_img.shape[1]}×{right_img.shape[0]} px** | "
+            f"≈ {((right_img.shape[0]-win_h)//stride + 1) * ((right_img.shape[1]-win_w)//stride + 1)} windows")
+
+ st.divider()
+
+ # ===================================================================
+ # Side-by-side layout
+ # ===================================================================
+ col_rce, col_cnn = st.columns(2)
+
+ # -------------------------------------------------------------------
+ # LEFT — RCE Detection
+ # -------------------------------------------------------------------
+ with col_rce:
+     st.header("🧬 RCE Detection")
+     if rce_head is None:
+         st.info("No RCE head trained. Train one in **Model Tuning**.")
+     else:
+         st.caption(f"Modules: {', '.join(REGISTRY[k]['label'] for k in active_mods if active_mods[k])}")
+         rce_run = st.button("▶ Run RCE Scan", key="rce_run")
+
+         rce_progress = st.empty()
+         rce_live = st.empty()
+         rce_results = st.container()
+
+         if rce_run:
+             dets, hmap, ms, nw = sliding_window_detect(
+                 right_img, rce_feature_fn, rce_head,
+                 stride, conf_thresh, nms_iou,
+                 progress_placeholder=rce_progress,
+                 live_image_placeholder=rce_live,
+             )
+
+             # Final image with boxes
+             final = right_img.copy()
+             for x1d, y1d, x2d, y2d, lbl, cf in dets:
+                 cv2.rectangle(final, (x1d, y1d), (x2d, y2d), (0, 255, 0), 2)
+                 cv2.putText(final, f"{cf:.0%}", (x1d, y1d - 6),
+                             cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
+             rce_live.image(cv2.cvtColor(final, cv2.COLOR_BGR2RGB),
+                            caption="RCE — Final Detections",
+                            use_container_width=True)
+             rce_progress.empty()
+
+             with rce_results:
+                 # Metrics
+                 rm1, rm2, rm3, rm4 = st.columns(4)
+                 rm1.metric("Detections", len(dets))
+                 rm2.metric("Windows", nw)
+                 rm3.metric("Total Time", f"{ms:.0f} ms")
+                 rm4.metric("Per Window", f"{ms/max(nw,1):.2f} ms")
+
+                 # Confidence heatmap
+                 if hmap.max() > 0:
+                     hmap_color = cv2.applyColorMap(
+                         (hmap / hmap.max() * 255).astype(np.uint8),
+                         cv2.COLORMAP_JET)
+                     blend = cv2.addWeighted(right_img, 0.5, hmap_color, 0.5, 0)
+                     st.image(cv2.cvtColor(blend, cv2.COLOR_BGR2RGB),
+                              caption="RCE — Confidence Heatmap",
+                              use_container_width=True)
+
+                 # Detection table
+                 if dets:
+                     import pandas as pd
+                     df = pd.DataFrame(dets, columns=["x1","y1","x2","y2","label","conf"])
+                     st.dataframe(df, use_container_width=True, hide_index=True)
+
+             st.session_state["rce_dets"] = dets
+             st.session_state["rce_det_ms"] = ms
+
+ # -------------------------------------------------------------------
+ # RIGHT — CNN Detection
+ # -------------------------------------------------------------------
+ with col_cnn:
+     st.header("🧠 CNN Detection")
+
+     # Find which CNN heads are trained
+     trained_cnns = [n for n in BACKBONES if f"cnn_head_{n}" in st.session_state]
+     if not trained_cnns:
+         st.info("No CNN head trained. Train one in **Model Tuning**.")
+     else:
+         selected = st.selectbox("Select Model", trained_cnns, key="det_cnn_sel")
+         bmeta = BACKBONES[selected]
+         backbone = bmeta["loader"]()
+         head = st.session_state[f"cnn_head_{selected}"]
+
+         st.caption(f"Backbone: **{selected}** ({bmeta['dim']}D) — Head in session state")
+         cnn_run = st.button(f"▶ Run {selected} Scan", key="cnn_run")
+
+         cnn_progress = st.empty()
+         cnn_live = st.empty()
+         cnn_results = st.container()
+
+         if cnn_run:
+             dets, hmap, ms, nw = sliding_window_detect(
+                 right_img, backbone.get_features, head,
+                 stride, conf_thresh, nms_iou,
+                 progress_placeholder=cnn_progress,
+                 live_image_placeholder=cnn_live,
+             )
+
+             # Final image
+             final = right_img.copy()
+             for x1d, y1d, x2d, y2d, lbl, cf in dets:
+                 cv2.rectangle(final, (x1d, y1d), (x2d, y2d), (0, 0, 255), 2)
+                 cv2.putText(final, f"{cf:.0%}", (x1d, y1d - 6),
+                             cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
+             cnn_live.image(cv2.cvtColor(final, cv2.COLOR_BGR2RGB),
+                            caption=f"{selected} — Final Detections",
+                            use_container_width=True)
+             cnn_progress.empty()
+
+             with cnn_results:
+                 cm1, cm2, cm3, cm4 = st.columns(4)
+                 cm1.metric("Detections", len(dets))
+                 cm2.metric("Windows", nw)
+                 cm3.metric("Total Time", f"{ms:.0f} ms")
+                 cm4.metric("Per Window", f"{ms/max(nw,1):.2f} ms")
+
+                 if hmap.max() > 0:
+                     hmap_color = cv2.applyColorMap(
+                         (hmap / hmap.max() * 255).astype(np.uint8),
+                         cv2.COLORMAP_JET)
+                     blend = cv2.addWeighted(right_img, 0.5, hmap_color, 0.5, 0)
+                     st.image(cv2.cvtColor(blend, cv2.COLOR_BGR2RGB),
+                              caption=f"{selected} — Confidence Heatmap",
+                              use_container_width=True)
+
+                 if dets:
+                     import pandas as pd
+                     df = pd.DataFrame(dets, columns=["x1","y1","x2","y2","label","conf"])
+                     st.dataframe(df, use_container_width=True, hide_index=True)
+
+             st.session_state["cnn_dets"] = dets
+             st.session_state["cnn_det_ms"] = ms
+
+
+ # ===================================================================
+ # Bottom — Comparison (if both have run)
+ # ===================================================================
+ rce_dets = st.session_state.get("rce_dets")
+ cnn_dets = st.session_state.get("cnn_dets")
+
+ if rce_dets is not None and cnn_dets is not None:
+     st.divider()
+     st.subheader("📊 Side-by-Side Comparison")
+
+     import pandas as pd
+     comp = pd.DataFrame({
+         "Metric": ["Detections", "Best Confidence", "Total Time (ms)"],
+         "RCE": [
+             len(rce_dets),
+             f"{max((d[5] for d in rce_dets), default=0):.1%}",
+             f"{st.session_state.get('rce_det_ms', 0):.0f}",
+         ],
+         "CNN": [
+             len(cnn_dets),
+             f"{max((d[5] for d in cnn_dets), default=0):.1%}",
+             f"{st.session_state.get('cnn_det_ms', 0):.0f}",
+         ],
+     })
+     st.dataframe(comp, use_container_width=True, hide_index=True)
+
+     # Overlay both on one image
+     overlay = right_img.copy()
+     for x1d, y1d, x2d, y2d, _, cf in rce_dets:
+         cv2.rectangle(overlay, (x1d, y1d), (x2d, y2d), (0, 255, 0), 2)
+         cv2.putText(overlay, f"RCE {cf:.0%}", (x1d, y1d - 6),
+                     cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
+     for x1d, y1d, x2d, y2d, _, cf in cnn_dets:
+         cv2.rectangle(overlay, (x1d, y1d), (x2d, y2d), (0, 0, 255), 2)
+         cv2.putText(overlay, f"CNN {cf:.0%}", (x1d, y2d + 12),
+                     cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)
+     st.image(cv2.cvtColor(overlay, cv2.COLOR_BGR2RGB),
+              caption="Green = RCE | Blue = CNN",
+              use_container_width=True)
pages/6_Stereo_Geometry.py CHANGED
@@ -0,0 +1,327 @@
+ import streamlit as st
+ import cv2
+ import numpy as np
+ import re
+ import plotly.graph_objects as go
+ import sys, os
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+ st.set_page_config(page_title="Stereo Geometry", layout="wide")
+ st.title("📐 Stereo Geometry: Distance Estimation")
+
+ # ---------------------------------------------------------------------------
+ # Guard
+ # ---------------------------------------------------------------------------
+ if "pipeline_data" not in st.session_state or "left" not in st.session_state.get("pipeline_data", {}):
+     st.error("Complete **Data Lab** first.")
+     st.stop()
+
+ assets = st.session_state["pipeline_data"]
+ img_l = assets["left"]
+ img_r = assets["right"]
+ gt_left = assets.get("gt_left")        # float32 depth map from PFM
+ gt_right = assets.get("gt_right")
+ conf_raw = assets.get("conf_raw", "")
+ crop_bbox = assets.get("crop_bbox")    # (x0, y0, x1, y1) on LEFT image
+
+ rce_dets = st.session_state.get("rce_dets", [])   # list of (x1,y1,x2,y2,label,conf)
+ cnn_dets = st.session_state.get("cnn_dets", [])
+
+
+ # ===================================================================
+ # Parse Middlebury-style camera config
+ # ===================================================================
+ def parse_config(text: str) -> dict:
+     """
+     Parse a Middlebury .txt / .conf calibration file.
+     Expected keys: cam0, cam1, doffs, baseline, width, height, ndisp, vmin, vmax
+     cam0 / cam1 are 3×3 matrices in bracket notation: [f 0 cx; 0 f cy; 0 0 1]
+     """
+     params = {}
+     for line in text.strip().splitlines():
+         line = line.strip()
+         if "=" not in line:
+             continue
+         key, val = line.split("=", 1)
+         key = key.strip()
+         val = val.strip()
+         # Matrix?
+         if "[" in val:
+             nums = list(map(float, re.findall(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", val)))
+             if len(nums) == 9:
+                 params[key] = np.array(nums).reshape(3, 3)
+             else:
+                 params[key] = nums
+         else:
+             try:
+                 params[key] = float(val)
+             except ValueError:
+                 params[key] = val
+     return params
+
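+ # Example input in the format parse_config expects (values are hypothetical,
+ # not from a real calibration file):
+ #     cam0=[3997.684 0 1176.728; 0 3997.684 1011.728; 0 0 1]
+ #     cam1=[3997.684 0 1307.839; 0 3997.684 1011.728; 0 0 1]
+ #     doffs=131.111
+ #     baseline=193.001
+ #     width=2964
+ #     height=2000
+ #     ndisp=280
+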
+ calib = parse_config(conf_raw)
+
+ # Extract intrinsics
+ focal = calib.get("cam0", np.eye(3))[0, 0] if isinstance(calib.get("cam0"), np.ndarray) else 0.0
+ doffs = float(calib.get("doffs", 0.0))
+ baseline = float(calib.get("baseline", 1.0))
+ ndisp = int(calib.get("ndisp", 128))
+
+ st.subheader("Camera Calibration")
+ cc1, cc2, cc3, cc4 = st.columns(4)
+ cc1.metric("Focal Length (px)", f"{focal:.1f}")
+ cc2.metric("Baseline (mm)", f"{baseline:.1f}")
+ cc3.metric("Doffs (px)", f"{doffs:.2f}")
+ cc4.metric("ndisp", str(ndisp))
+
+ with st.expander("Full Calibration"):
+     st.json({k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in calib.items()})
+
+ st.divider()
+
+
+ # ===================================================================
+ # Step 1 — Compute Disparity Map
+ # ===================================================================
+ st.subheader("Step 1: Disparity Map (StereoSGBM)")
+
+ sc1, sc2, sc3 = st.columns(3)
+ block_size = sc1.slider("Block Size", 3, 21, 5, step=2)
+ p1_mult = sc2.slider("P1 multiplier", 1, 32, 8)
+ p2_mult = sc3.slider("P2 multiplier", 1, 128, 32)
+
+ @st.cache_data   # params kept hashable so slider changes invalidate the cache
+ def compute_disparity(left, right, ndisp, block_size, p1m, p2m):
+     gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
+     gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
+
+     # Align ndisp to a multiple of 16 (StereoSGBM requirement)
+     nd = max(16, (ndisp // 16) * 16)
+     channels = 1
+     sgbm = cv2.StereoSGBM_create(
+         minDisparity=0,
+         numDisparities=nd,
+         blockSize=block_size,
+         P1=p1m * channels * block_size ** 2,
+         P2=p2m * channels * block_size ** 2,
+         disp12MaxDiff=1,
+         uniquenessRatio=10,
+         speckleWindowSize=100,
+         speckleRange=32,
+         mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY,
+     )
+     disp = sgbm.compute(gray_l, gray_r).astype(np.float32) / 16.0   # SGBM outputs fixed-point (4 fractional bits)
+     return disp
+
+
+ with st.spinner("Computing disparity..."):
+     disp = compute_disparity(img_l, img_r, ndisp, block_size, p1_mult, p2_mult)
+
+ # Visualize disparity
+ disp_vis = disp.copy()
+ disp_vis[disp_vis <= 0] = 0
+ disp_max = disp_vis.max() if disp_vis.max() > 0 else 1.0
+ disp_norm = (disp_vis / disp_max * 255).astype(np.uint8)
+ disp_color = cv2.applyColorMap(disp_norm, cv2.COLORMAP_INFERNO)
+
+ dc1, dc2 = st.columns(2)
+ dc1.image(cv2.cvtColor(img_l, cv2.COLOR_BGR2RGB), caption="Left Image", use_container_width=True)
+ dc2.image(cv2.cvtColor(disp_color, cv2.COLOR_BGR2RGB), caption="Disparity Map (SGBM)", use_container_width=True)
+
+
+ # ===================================================================
+ # Step 2 — Depth Map from Disparity
+ # ===================================================================
+ st.divider()
+ st.subheader("Step 2: Depth Map from Disparity")
+
+ st.latex(r"Z = \frac{f \times B}{d + d_{\text{offs}}}")
+ st.caption("Z = depth (mm), f = focal length (px), B = baseline (mm), d = disparity (px), d_offs = optical center offset (px)")
+
+ # Compute depth from disparity
+ valid = (disp + doffs) > 0
+ depth_map = np.zeros_like(disp)
+ depth_map[valid] = (focal * baseline) / (disp[valid] + doffs)
+ depth_map[~valid] = 0
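+ # Sanity check with hypothetical Middlebury-scale numbers:
+ #     f = 3980 px, B = 193 mm, d = 124 px, doffs = 124 px
+ #     Z = 3980 * 193 / (124 + 124) ≈ 3097 mm ≈ 3.1 m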
+
+ # Visualize
+ depth_vis = depth_map.copy()
+ finite = depth_vis[depth_vis > 0]
+ if len(finite) > 0:
+     clip_max = np.percentile(finite, 98)
+     depth_vis = np.clip(depth_vis, 0, clip_max)
+     depth_norm = (depth_vis / clip_max * 255).astype(np.uint8)
+ else:
+     depth_norm = np.zeros_like(depth_map, dtype=np.uint8)
+
+ depth_color = cv2.applyColorMap(depth_norm, cv2.COLORMAP_TURBO)
+
+ zc1, zc2 = st.columns(2)
+ zc1.image(cv2.cvtColor(depth_color, cv2.COLOR_BGR2RGB),
+           caption="Estimated Depth (SGBM)", use_container_width=True)
+
+ # Ground truth comparison
+ if gt_left is not None:
+     gt_vis = gt_left.copy()
+     gt_finite = gt_vis[np.isfinite(gt_vis) & (gt_vis > 0)]
+     if len(gt_finite) > 0:
+         gt_clip = np.percentile(gt_finite, 98)
+         gt_vis = np.clip(np.nan_to_num(gt_vis, nan=0), 0, gt_clip)
+         gt_norm = (gt_vis / gt_clip * 255).astype(np.uint8)
+     else:
+         gt_norm = np.zeros_like(gt_vis, dtype=np.uint8)
+     gt_color = cv2.applyColorMap(gt_norm, cv2.COLORMAP_TURBO)
+     zc2.image(cv2.cvtColor(gt_color, cv2.COLOR_BGR2RGB),
+               caption="Ground Truth Depth", use_container_width=True)
+
+
+ # ===================================================================
+ # Step 3 — Error Map (SGBM vs Ground Truth)
+ # ===================================================================
+ if gt_left is not None:
+     st.divider()
+     st.subheader("Step 3: Error Analysis (SGBM vs Ground Truth)")
+
+     # Middlebury PFM ground truth stores DISPARITY, not depth;
+     # convert it to metric depth for a like-for-like comparison
+     gt_disp = gt_left   # Middlebury standard: PFM = disparity map
+     gt_depth_from_disp = np.zeros_like(gt_disp)
+     gt_valid = np.isfinite(gt_disp) & (gt_disp + doffs > 0)   # isfinite already excludes inf
+     gt_depth_from_disp[gt_valid] = (focal * baseline) / (gt_disp[gt_valid] + doffs)
+
+     # Crop to common valid region
+     both_valid = valid & gt_valid
+     if both_valid.any():
+         # Disparity error
+         disp_err = np.abs(disp - gt_disp)
+         disp_err[~both_valid] = 0
+
+         # Stats
+         err_vals = disp_err[both_valid]
+         mae = float(np.mean(err_vals))
+         rmse = float(np.sqrt(np.mean(err_vals ** 2)))
+         bad_2 = float(np.mean(err_vals > 2.0)) * 100   # % of pixels with error > 2 px
+
+         em1, em2, em3 = st.columns(3)
+         em1.metric("MAE (px)", f"{mae:.2f}")
+         em2.metric("RMSE (px)", f"{rmse:.2f}")
+         em3.metric("Bad-2.0 (%)", f"{bad_2:.1f}%")
+
+         # Error heatmap
+         err_clip = np.clip(disp_err, 0, 10)
+         err_norm = (err_clip / 10 * 255).astype(np.uint8)
+         err_color = cv2.applyColorMap(err_norm, cv2.COLORMAP_HOT)
+         st.image(cv2.cvtColor(err_color, cv2.COLOR_BGR2RGB),
+                  caption="Disparity Error Map (red = high error, clipped at 10 px)",
+                  use_container_width=True)
+
+         # Histogram
+         fig = go.Figure(data=[go.Histogram(x=err_vals, nbinsx=50,
+                                            marker_color="#ff6361")])
+         fig.update_layout(title="Disparity Error Distribution",
+                           xaxis_title="Absolute Error (px)",
+                           yaxis_title="Pixel Count",
+                           template="plotly_dark", height=300)
+         st.plotly_chart(fig, use_container_width=True)
+     else:
+         st.warning("No overlapping valid pixels between SGBM disparity and ground truth.")
+
+
+ # ===================================================================
+ # Step 4 — Object Distance from Detections
+ # ===================================================================
+ st.divider()
+ st.subheader("Step 4: Object Distance Estimation")
+
+ all_dets = []
+ if rce_dets:
+     for d in rce_dets:
+         all_dets.append(("RCE", *d))
+ if cnn_dets:
+     for d in cnn_dets:
+         all_dets.append(("CNN", *d))
+
+ if not all_dets and crop_bbox is not None:
+     st.info("No detections from the Real-Time Detection page. Using the **crop bounding box on the left image** as a fallback.")
+     x0, y0, x1, y1 = crop_bbox
+     all_dets.append(("Crop (left)", x0, y0, x1, y1, "object", 1.0))
+ elif not all_dets:
+     st.warning("No detections found. Run **Real-Time Detection** first, or define a crop in **Data Lab**.")
+     st.stop()
+
+
+ # For each detection, compute median depth inside the bounding box
+ import pandas as pd
+
+ rows = []
+ det_overlay = img_l.copy() if all_dets and all_dets[0][0] == "Crop (left)" else img_r.copy()
+
+ for source, dx1, dy1, dx2, dy2, lbl, conf in all_dets:
+     dx1, dy1, dx2, dy2 = int(dx1), int(dy1), int(dx2), int(dy2)
+
+     # Clamp to image bounds
+     H, W = depth_map.shape[:2]
+     dx1c = max(0, min(dx1, W-1))
+     dy1c = max(0, min(dy1, H-1))
+     dx2c = max(0, min(dx2, W))
+     dy2c = max(0, min(dy2, H))
+
+     roi_depth = depth_map[dy1c:dy2c, dx1c:dx2c]
+     roi_disp = disp[dy1c:dy2c, dx1c:dx2c]
+     roi_valid = roi_depth[roi_depth > 0]
+
+     if len(roi_valid) > 0:
+         med_depth = float(np.median(roi_valid))
+         mean_depth = float(np.mean(roi_valid))
+         med_disp = float(np.median(roi_disp[roi_disp > 0])) if (roi_disp > 0).any() else 0
+     else:
+         med_depth = mean_depth = med_disp = 0.0
+
+     # Ground truth depth at this region (for comparison)
+     gt_depth_val = 0.0
+     if gt_left is not None:
+         gt_roi = gt_left[dy1c:dy2c, dx1c:dx2c]
+         gt_roi_valid = gt_roi[np.isfinite(gt_roi) & (gt_roi > 0)]
+         if len(gt_roi_valid) > 0:
+             # Convert GT disparity → depth
+             gt_med_disp = float(np.median(gt_roi_valid))
+             gt_depth_val = (focal * baseline) / (gt_med_disp + doffs) if (gt_med_disp + doffs) > 0 else 0
+
+     error_mm = abs(med_depth - gt_depth_val) if gt_depth_val > 0 else float('nan')
+
+     rows.append({
+         "Source": source,
+         "Box": f"({dx1},{dy1})→({dx2},{dy2})",
+         "Confidence": f"{conf:.1%}" if isinstance(conf, float) else str(conf),
+         "Med Disparity": f"{med_disp:.1f} px",
+         "Med Depth": f"{med_depth:.0f} mm",
+         "Mean Depth": f"{mean_depth:.0f} mm",
+         "GT Depth": f"{gt_depth_val:.0f} mm" if gt_depth_val > 0 else "N/A",
+         "Error": f"{error_mm:.0f} mm" if not np.isnan(error_mm) else "N/A",
+     })
+
+     # Draw on overlay
+     color = (0, 255, 0) if "RCE" in source else (0, 0, 255) if "CNN" in source else (255, 255, 0)
+     cv2.rectangle(det_overlay, (dx1c, dy1c), (dx2c, dy2c), color, 2)
+     depth_str = f"{med_depth/1000:.2f}m" if med_depth > 0 else "?"
+     cv2.putText(det_overlay, f"{source} {depth_str}",
+                 (dx1c, dy1c - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
+
+ # Show overlay
+ st.image(cv2.cvtColor(det_overlay, cv2.COLOR_BGR2RGB),
+          caption="Detections with Estimated Distance",
+          use_container_width=True)
+
+ # Table
+ st.dataframe(pd.DataFrame(rows), use_container_width=True, hide_index=True)
+
+ # Big metric cards for the best detection
+ if rows:
+     best = rows[0]
+     st.divider()
+     st.subheader("🎯 Primary Detection — Distance")
+     bc1, bc2, bc3 = st.columns(3)
+     bc1.metric("Estimated Depth", best["Med Depth"])
+     bc2.metric("Ground Truth", best["GT Depth"])
+     bc3.metric("Absolute Error", best["Error"])
src/models.py ADDED
@@ -0,0 +1,250 @@
+ """
+ src/models.py — Central Model Registry
+ ======================================
+ Downloads backbone weights **once** from the internet (PyTorch Hub / timm),
+ freezes every feature-extraction layer, and caches the result in RAM with
+ Streamlit's ``@st.cache_resource``.
+
+ Strategy
+ --------
+ 1. **Freeze the Backbone** → ``requires_grad = False`` on every parameter.
+    The backbone is a pure feature extractor — no gradient updates, ever.
+ 2. **Cache the Resource** → ``@st.cache_resource`` keeps the heavy model
+    in RAM even when you switch pages.
+ 3. **Define the Head** → ``RecognitionHead``: a tiny sklearn
+    LogisticRegression that takes the backbone's feature vector and
+    produces a recognition score. Lives only in ``st.session_state``.
+ """
+
+ import streamlit as st
+ import torch
+ import torch.nn as nn
+ import torchvision.models as models
+ import torchvision.transforms as transforms
+ import timm
+ import cv2
+ import numpy as np
+
+ # ---------------------------------------------------------------------------
+ # Device selection (MPS > CUDA > CPU)
+ # ---------------------------------------------------------------------------
+ DEVICE = (
+     "mps" if torch.backends.mps.is_available() else
+     "cuda" if torch.cuda.is_available() else
+     "cpu"
+ )
+
+ # ---------------------------------------------------------------------------
+ # Shared ImageNet preprocessing
+ # ---------------------------------------------------------------------------
+ _IMAGENET_TRANSFORM = transforms.Compose([
+     transforms.ToPILImage(),
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                          std=[0.229, 0.224, 0.225]),
+ ])
+
+
+ # ===================================================================
+ # Base class
+ # ===================================================================
+ class _FrozenBackbone:
+     """Shared helpers: freeze, normalise activation maps."""
+
+     DIM: int = 0   # overridden by subclasses
+
+     # --- freeze every parameter ---
+     def _freeze(self, model: nn.Module) -> nn.Module:
+         model.eval()
+         for p in model.parameters():
+             p.requires_grad = False
+         return model.to(DEVICE)
+
+     # --- public interface ---
+     def get_features(self, img_bgr: np.ndarray) -> np.ndarray:
+         """Return a 1-D float32 feature vector for *img_bgr* (BGR uint8)."""
+         raise NotImplementedError
+
+     def get_activation_maps(self, img_bgr: np.ndarray,
+                             n_maps: int = 6) -> list[np.ndarray]:
+         """Return *n_maps* normalised float32 spatial activation maps."""
+         raise NotImplementedError
+
+     @staticmethod
+     def _norm(m: np.ndarray) -> np.ndarray:
+         lo, hi = m.min(), m.max()
+         return ((m - lo) / (hi - lo + 1e-5)).astype(np.float32)
+
+
+ # ===================================================================
+ # ResNet-18
+ # ===================================================================
+ class ResNet18Backbone(_FrozenBackbone):
+     """ResNet-18 downloaded from PyTorch Hub, frozen, classifier removed."""
+
+     DIM = 512
+
+     def __init__(self):
+         full = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
+         self.backbone = self._freeze(full)
+         self.extractor = nn.Sequential(*list(full.children())[:-1]).to(DEVICE)
+         self.transform = _IMAGENET_TRANSFORM
+
+     def get_features(self, img_bgr):
+         t = self.transform(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
+         with torch.no_grad():
+             return self.extractor(t.unsqueeze(0).to(DEVICE)).cpu().numpy().flatten()
+
+     def get_activation_maps(self, img_bgr, n_maps=6):
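+         # A forward hook on layer4 captures its output feature maps
+         # (for a 224×224 input, a (1, 512, 7, 7) tensor) into `cap`
+         # during a single no-grad pass; the hook is removed right after.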
+         cap = {}
+         hook = self.backbone.layer4.register_forward_hook(
+             lambda _m, _i, o: cap.update(feat=o))
+         t = self.transform(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
+         with torch.no_grad():
+             self.backbone(t.unsqueeze(0).to(DEVICE))
+         hook.remove()
+         acts = cap["feat"][0].cpu().numpy()
+         return [self._norm(acts[i]) for i in range(min(n_maps, acts.shape[0]))]
+
+
+ # ===================================================================
+ # MobileNetV3-Small
+ # ===================================================================
+ class MobileNetV3Backbone(_FrozenBackbone):
+     """MobileNetV3-Small from PyTorch Hub, frozen, classifier = Identity."""
+
+     DIM = 576
+
+     def __init__(self):
+         self.backbone = models.mobilenet_v3_small(
+             weights=models.MobileNet_V3_Small_Weights.DEFAULT)
+         self.backbone.classifier = nn.Identity()
+         self._freeze(self.backbone)
+         self.transform = _IMAGENET_TRANSFORM
+
+     def get_features(self, img_bgr):
+         t = self.transform(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
+         with torch.no_grad():
+             return self.backbone(t.unsqueeze(0).to(DEVICE)).cpu().numpy().flatten()
+
+     def get_activation_maps(self, img_bgr, n_maps=6):
+         cap = {}
+         hook = self.backbone.features[-1].register_forward_hook(
+             lambda _m, _i, o: cap.update(feat=o))
+         t = self.transform(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
+         with torch.no_grad():
+             self.backbone(t.unsqueeze(0).to(DEVICE))
+         hook.remove()
+         acts = cap["feat"][0].cpu().numpy()
+         return [self._norm(acts[i]) for i in range(min(n_maps, acts.shape[0]))]
+
+
+ # ===================================================================
+ # MobileViT-XXS
+ # ===================================================================
+ class MobileViTBackbone(_FrozenBackbone):
+     """MobileViT-XXS from timm (Apple Research), frozen."""
+
+     DIM = 320
+
+     def __init__(self):
+         self.backbone = timm.create_model(
+             "mobilevit_xxs.cvnets_in1k", pretrained=True, num_classes=0)
+         self._freeze(self.backbone)
+         cfg = timm.data.resolve_model_data_config(self.backbone)
+         self.transform = timm.data.create_transform(**cfg, is_training=False)
+
+     def _to_tensor(self, img_bgr):
+         from PIL import Image
+         pil = Image.fromarray(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
+         return self.transform(pil).unsqueeze(0).to(DEVICE)
+
+     def get_features(self, img_bgr):
+         with torch.no_grad():
+             return self.backbone(self._to_tensor(img_bgr)).cpu().numpy().flatten()
+
+     def get_activation_maps(self, img_bgr, n_maps=6):
+         cap = {}
+         hook = self.backbone.stages[-1].register_forward_hook(
+             lambda _m, _i, o: cap.update(feat=o))
+         with torch.no_grad():
+             self.backbone(self._to_tensor(img_bgr))
+         hook.remove()
+         acts = cap["feat"][0].cpu().numpy()
+         return [self._norm(acts[i]) for i in range(min(n_maps, acts.shape[0]))]
+
+
+ # ===================================================================
+ # Lightweight Head (lives in session state, never on disk)
+ # ===================================================================
+ class RecognitionHead:
+     """
+     A tiny trainable layer on top of a frozen backbone.
+     Wraps sklearn ``LogisticRegression`` for binary classification.
+     Stored in ``st.session_state`` — never saved to disk.
+     """
+
+     def __init__(self, C: float = 1.0, max_iter: int = 1000):
+         from sklearn.linear_model import LogisticRegression
+         self.model = LogisticRegression(C=C, max_iter=max_iter)
+         self.is_trained = False
+
+     def fit(self, X, y):
+         self.model.fit(X, y)
+         self.is_trained = True
+         return self
+
+     def predict(self, features: np.ndarray):
+         """Return *(label, confidence)* for a single feature vector."""
+         probs = self.model.predict_proba([features])[0]
+         idx = int(np.argmax(probs))
+         return self.model.classes_[idx], probs[idx]
+
+     def predict_proba(self, X):
+         return self.model.predict_proba(X)
+
+     @property
+     def classes_(self):
+         return self.model.classes_
+
+
+ # ===================================================================
+ # Cached loaders — @st.cache_resource keeps models in RAM
+ # ===================================================================
+ @st.cache_resource
+ def get_resnet() -> ResNet18Backbone:
+     """Download & freeze ResNet-18. Stays in RAM across page switches."""
+     return ResNet18Backbone()
+
+ @st.cache_resource
+ def get_mobilenet() -> MobileNetV3Backbone:
+     """Download & freeze MobileNetV3-Small. Stays in RAM."""
+     return MobileNetV3Backbone()
+
+ @st.cache_resource
+ def get_mobilevit() -> MobileViTBackbone:
+     """Download & freeze MobileViT-XXS. Stays in RAM."""
+     return MobileViTBackbone()
+
+
+ # ===================================================================
+ # BACKBONES — The Registry Dict
+ # ===================================================================
+ BACKBONES = {
+     "ResNet-18": {
+         "loader": get_resnet,
+         "dim": ResNet18Backbone.DIM,
+         "hook_layer": "layer4 (last conv block)",
+     },
+     "MobileNetV3": {
+         "loader": get_mobilenet,
+         "dim": MobileNetV3Backbone.DIM,
+         "hook_layer": "features[-1] (last features block)",
+     },
+     "MobileViT-XXS": {
+         "loader": get_mobilevit,
+         "dim": MobileViTBackbone.DIM,
+         "hook_layer": "stages[-1] (last transformer stage)",
+     },
+ }
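+
+ # Typical page-side usage (see pages/4_Model_Tuning.py):
+ #     backbone = BACKBONES["ResNet-18"]["loader"]()   # cached, frozen
+ #     feats = backbone.get_features(img_bgr)          # 1-D vector of length "dim"
+ #     head = RecognitionHead(C=1.0).fit(X, y)         # lives in st.session_state only
+ #     label, conf = head.predict(feats)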