TTA: Hungarian-matched multi-seed inference with strict 3-pass agreement

Runs three priority-sample seeds (2718, 31415, 42) and Hungarian-matches
segments across passes using a flip-invariant endpoint distance, with pass 0
as the anchor. Anchor segments that are not matched in BOTH other passes are
dropped (min_passes_for_keep=2). Surviving segments are orientation-aligned
and averaged across the 3 passes that saw them.
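
A minimal sketch of the flip-invariant matching step, assuming segments are stored as (N, 2, 3) NumPy arrays of endpoints. The function name `match_flip_invariant` and the toy coordinates are made up for illustration; the real implementation is `_match_segments` in tta.py.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_flip_invariant(anchor, other, max_endpoint_dist=0.4):
    """Match (N, 2, 3) anchor segments to (M, 2, 3) segments from another pass.

    Returns a list of (anchor_idx, other_idx, was_flipped) tuples.
    """
    a0, a1 = anchor[:, None, 0], anchor[:, None, 1]   # (N, 1, 3)
    b0, b1 = other[None, :, 0], other[None, :, 1]     # (1, M, 3)
    # Sum of endpoint distances in both orientations; keep the cheaper one.
    d_same = np.linalg.norm(a0 - b0, axis=-1) + np.linalg.norm(a1 - b1, axis=-1)
    d_flip = np.linalg.norm(a0 - b1, axis=-1) + np.linalg.norm(a1 - b0, axis=-1)
    cost = np.minimum(d_same, d_flip)
    rows, cols = linear_sum_assignment(cost)
    # Reject assignments whose summed endpoint distance is too large.
    keep = cost[rows, cols] <= 2.0 * max_endpoint_dist
    return [(int(i), int(j), bool(d_flip[i, j] < d_same[i, j]))
            for i, j in zip(rows[keep], cols[keep])]


# Toy data (hypothetical): two anchor segments, two segments in another pass.
anchor = np.array([[[0, 0, 0], [1, 0, 0]],
                   [[0, 2, 0], [0, 3, 0]]], dtype=float)
other = np.array([[[1.05, 0, 0], [0.02, 0, 0]],          # anchor 0, reversed
                  [[5, 5, 5], [6, 5, 5]]], dtype=float)  # far from everything
matches = match_flip_invariant(anchor, other)
print(matches)
```

Only anchor segment 0 finds a partner here, and it is flagged as flipped because the other pass emitted its endpoints in the opposite order. The Hungarian solver still assigns anchor 1 to the far-away segment, which is why the distance threshold is applied after the assignment.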

The previous TTA (857514e, reverted) picked one pass's output - that's
why it failed. This version aggregates and filters.
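
The "aggregates and filters" step can be sketched as follows, assuming the matches for each anchor segment have already been collected as (segment, was_flipped) pairs. The helper name `average_with_agreement` and the toy numbers are hypothetical and only mirror the logic in `predict_sample_tta_hungarian`.

```python
import numpy as np


def average_with_agreement(anchor, matches_per_anchor, min_passes_for_keep=2):
    """Keep anchor segments with enough cross-pass matches and average them.

    matches_per_anchor[i] holds (segment, was_flipped) pairs, one per other
    pass that matched anchor segment i.
    """
    kept = []
    for seg, matches in zip(anchor, matches_per_anchor):
        if len(matches) < min_passes_for_keep:
            continue  # not confirmed by enough passes: dropped
        # Reverse flipped matches so every copy shares the anchor's orientation.
        aligned = [seg] + [m[::-1] if flipped else m for m, flipped in matches]
        kept.append(np.mean(aligned, axis=0))
    return np.asarray(kept)


# Toy data (hypothetical): segment 0 is seen by both other passes, segment 1
# by only one of them.
anchor = np.array([[[0.0, 0, 0], [1.0, 0, 0]],
                   [[0.0, 2, 0], [0.0, 3, 0]]])
matches_per_anchor = [
    [(np.array([[1.1, 0, 0], [0.1, 0, 0]]), True),    # reversed endpoints
     (np.array([[-0.1, 0, 0], [0.9, 0, 0]]), False)],
    [(np.array([[0.0, 2.1, 0], [0.0, 3.1, 0]]), False)],
]
result = average_with_agreement(anchor, matches_per_anchor)
print(result)
```

With `min_passes_for_keep=2` only segment 0 survives, and aligning orientations before averaging recovers the endpoints (0, 0, 0) and (1, 0, 0). Averaging without the flip would pull both endpoints toward the segment midpoint.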

Local 100-sample A/B vs orphan_refine baseline:

  variant                        mean    q5      q50     vs baseline
  baseline (orphan_refine)       0.3856  0.0620  0.3946
  TTA simple concat-and-merge    0.3756  0.0842  -       -0.010 mean (REJECTED)
  TTA Hungarian min_passes=1     0.3853  0.0848  -       -0.000 mean (neutral)
  TTA Hungarian min_passes=2     0.3888  0.0907  0.4032  +0.003 mean, +0.029 q5

The strict variant wins primarily on hard scenes: q5 rises 47%
(0.062 -> 0.091), q25 by +0.006, q50 by +0.009. The likely reason is that
the strict filter discards hallucinated segments on difficult scenes, where
the model's predictions are unstable across sampling seeds.
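
As a quick arithmetic check of the deltas quoted above, the following sketch recomputes them from the reported summary numbers only (no per-sample scores are assumed). The variable names are made up for illustration.

```python
# Summary statistics as reported in the A/B results above.
baseline = {"mean": 0.3856, "q5": 0.0620, "q50": 0.3946}
strict = {"mean": 0.3888, "q5": 0.0907, "q50": 0.4032}

# Absolute deltas per statistic.
deltas = {k: round(strict[k] - baseline[k], 4) for k in baseline}

# Relative q5 gain from the 4-decimal values and from the rounded endpoints.
q5_rel_gain = strict["q5"] / baseline["q5"] - 1.0
q5_rel_gain_rounded = 0.091 / 0.062 - 1.0

print(deltas)
print(f"{q5_rel_gain:.1%} {q5_rel_gain_rounded:.1%}")
```

The absolute deltas agree with the quoted +0.003 mean, +0.029 q5 and +0.009 q50. The relative q5 gain comes out at roughly 46% from the 4-decimal values and roughly 47% from the rounded endpoints 0.062 and 0.091.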

Cost: 3x inference time. Feature-flagged via USE_TTA in script.py for
easy revert if HF Space hits time limits.

Also includes (not used in production yet):
- edge_fill.py: attempted edge filling from 2D mask evidence
(rejected in A/B testing; was net -0.004 even with capped fills)
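
For context, the per-view evidence test behind edge filling can be sketched as below, assuming a boolean edge mask and already-projected 2D endpoints. The function name `support_fraction` and the toy mask are hypothetical; the real logic lives in `fill_missing_edges_from_2d` in edge_fill.py.

```python
import numpy as np


def support_fraction(mask, p0, p1, steps=20):
    """Fraction of `steps` points sampled on the 2D segment p0->p1 that land on True pixels."""
    h, w = mask.shape
    t = np.linspace(0.0, 1.0, steps)
    xs = np.clip((p0[0] + t * (p1[0] - p0[0])).astype(np.int32), 0, w - 1)
    ys = np.clip((p0[1] + t * (p1[1] - p0[1])).astype(np.int32), 0, h - 1)
    return float(mask[ys, xs].sum()) / steps


# Toy mask (hypothetical): a horizontal band of "edge" pixels on rows 9..11.
mask = np.zeros((32, 32), dtype=bool)
mask[9:12, :] = True

on_edge = support_fraction(mask, (2.0, 10.0), (29.0, 10.0))   # runs along the band
off_edge = support_fraction(mask, (2.0, 2.0), (29.0, 29.0))   # diagonal, only crosses it
print(on_edge, off_edge)
```

A view counts as supporting a candidate edge only when this fraction reaches `min_pixel_frac`, so the segment lying along the band passes while the diagonal one does not.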

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (4)

edge_fill.py +161 -0
local_eval.py +75 -12
script.py +47 -9
tta.py +245 -0

edge_fill.py ADDED

@@ -0,0 +1,161 @@

+"""Edge filling from 2D gestalt evidence.
+For each pair of predicted vertices (V_i, V_j) that is NOT currently an edge:
+  1. Project both endpoints into every COLMAP view.
+  2. Sample N points along the projected 2D segment.
+  3. Count points falling on gestalt edge-class pixels (using the dilated mask).
+  4. A view "supports" the candidate edge if support_frac >= min_pixel_frac.
+  5. If at least min_views_support views agree, ADD the edge.
+This is the inverse of edge_2d_filter.filter_edges_by_2d_support:
+the filter regressed q5 because dropping edges based on a binary mask check
+hurt recall. Adding edges is asymmetric: false positives waste a precision
+slot, but false negatives are catastrophic (real edges missing). With strong
+thresholds we should mostly add genuinely-missed edges.
+Conservative function defaults: 60% min support, 3+ views agreeing, max edge
+length 5m (most building edges are short). The local_eval.py CLI passes looser
+defaults for the first two (40%, 2 views). Candidate pairs are sorted
+closest-first and capped at max_pair_check to keep cost bounded.
+Topology change only — adds new edges, never moves or removes vertices.
+Falls back to (pv, pe) on any error.
+"""
+from __future__ import annotations
+import numpy as np
+import cv2
+def fill_missing_edges_from_2d(
+    pv,
+    pe,
+    sample,
+    min_views_support: int = 3,
+    min_pixel_frac: float = 0.60,
+    max_edge_length_meters: float = 5.0,
+    max_pair_check: int = 100,
+    max_fills_abs: int = 6,
+    max_fills_rel: float = 0.25,
+    dilate_px: int = 4,
+    sample_steps: int = 20,
+):
+    """Add edges between existing vertex pairs that have strong 2D edge support.
+    Args:
+        pv: (N, 3) vertices in world coordinates.
+        pe: existing edge list. New edges are appended; existing edges
+            are never removed.
+        sample: raw dataset entry.
+        min_views_support: minimum views with strong mask support to add edge.
+        min_pixel_frac: fraction of sampled segment pixels that must lie on
+            a gestalt edge-class pixel for a view to count as supporting.
+        max_edge_length_meters: skip pairs whose 3D distance exceeds this.
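+        max_pair_check: hard cap on candidate pairs evaluated per sample
+            (sorted by ascending 3D distance, so closest pairs go first).
+        max_fills_abs: absolute cap on the number of edges added per sample.
+        max_fills_rel: cap on added edges as a fraction of len(pe). The
+            effective cap is max(1, min(max_fills_abs, int(max_fills_rel * len(pe)))).
+        dilate_px: edge-mask dilation radius (same as edge_2d_filter).
+        sample_steps: number of points sampled along each 2D segment.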
+    Returns:
+        (pv, pe_extended). Vertex array unchanged; edges only grow.
+        Falls back to inputs on any error.
+    """
+    try:
+        from hoho2025.example_solutions import convert_entry_to_human_readable
+        from mvs_utils import collect_views, project_world_to_image
+        from edge_2d_filter import _build_edge_masks
+        pv_arr = np.asarray(pv, dtype=np.float64)
+        if pv_arr.ndim != 2 or pv_arr.shape[0] < 2:
+            return pv, pe
+        good = convert_entry_to_human_readable(sample)
+        colmap_rec = good.get("colmap") or good.get("colmap_binary")
+        if colmap_rec is None:
+            return pv, pe
+        views = collect_views(colmap_rec, good["image_ids"])
+        if len(views) < min_views_support:
+            return pv, pe
+        view_masks = _build_edge_masks(good, views, dilate_px=dilate_px)
+        if not view_masks:
+            return pv, pe
+        existing = set()
+        for a, b in pe:
+            a, b = int(a), int(b)
+            lo, hi = (a, b) if a < b else (b, a)
+            existing.add((lo, hi))
+        N = pv_arr.shape[0]
+        # Build candidate list: all pairs not already in `existing` and within
+        # max_edge_length. Sort by ascending 3D distance and cap.
+        candidates = []
+        for i in range(N):
+            for j in range(i + 1, N):
+                if (i, j) in existing:
+                    continue
+                d = float(np.linalg.norm(pv_arr[i] - pv_arr[j]))
+                if d > max_edge_length_meters or d < 1e-3:
+                    continue
+                candidates.append((d, i, j))
+        candidates.sort()
+        if len(candidates) > max_pair_check:
+            candidates = candidates[:max_pair_check]
+        if not candidates:
+            return pv, pe
+        # Cache view list once (avoid dict-iteration in hot loop)
+        view_items = [(img_id, views[img_id]["P"], *view_masks[img_id])
+                      for img_id in view_masks]
+        # Score each candidate by (# views supporting, total support across views).
+        # Then accept only those meeting the threshold, capped by ranking.
+        scored = []
+        for _, i, j in candidates:
+            endpoints = np.stack([pv_arr[i], pv_arr[j]])
+            supporting = 0
+            total_support = 0.0
+            for _img_id, P, mask_bool, H, W in view_items:
+                uv, z = project_world_to_image(P, endpoints)
+                if z[0] <= 0 or z[1] <= 0:
+                    continue
+                if not (
+                    0 <= uv[0, 0] < W and 0 <= uv[0, 1] < H
+                    and 0 <= uv[1, 0] < W and 0 <= uv[1, 1] < H
+                ):
+                    continue
+                t = np.linspace(0.0, 1.0, sample_steps)
+                xs = uv[0, 0] + t * (uv[1, 0] - uv[0, 0])
+                ys = uv[0, 1] + t * (uv[1, 1] - uv[0, 1])
+                xs_i = np.clip(xs.astype(np.int32), 0, W - 1)
+                ys_i = np.clip(ys.astype(np.int32), 0, H - 1)
+                frac = int(mask_bool[ys_i, xs_i].sum()) / float(sample_steps)
+                if frac >= min_pixel_frac:
+                    supporting += 1
+                    total_support += frac
+            if supporting >= min_views_support:
+                # Score: prioritize multi-view agreement, break ties on total support
+                scored.append((supporting, total_support, i, j))
+        if not scored:
+            return pv, pe
+        # Cap additions: min(absolute cap, rel-fraction of existing edges)
+        max_to_add = max(1, min(max_fills_abs,
+                                int(max_fills_rel * max(len(pe), 1))))
+        scored.sort(reverse=True)
+        added = [(i, j) for _, _, i, j in scored[:max_to_add]]
+        new_pe = list(pe) + added
+        return pv, new_pe
+    except Exception:
+        return pv, pe

local_eval.py CHANGED

@@ -58,13 +58,30 @@ def parse_args():
                    help="vertex refine: min views with 2D match")
     p.add_argument("--refine-max-move", type=float, default=0.5,
                    help="vertex refine: max 3D displacement in meters")
     return p.parse_args()
 def predict_one(sample, model, device, cfg, rng,
                 use_tracks=True, use_2d_filter=True, orphan_only=False,
                 strict_no_support=False, vertex_refine=False,
-                refine_kwargs=None):
     """Run the full inference pipeline on one sample. Returns (pv, pe, diag)."""
     diag = {"colmap": -1, "fused": 0, "track_v": 0, "track_e": 0,
             "pred_v": 0, "pred_e": 0, "2dfilt_in": 0, "2dfilt_out": 0,
@@ -79,17 +96,40 @@ def predict_one(sample, model, device, cfg, rng,
     except Exception:
         pass
-    fused = script.fuse_and_sample(sample, cfg, rng)
-    if fused is None:
-        diag["status"] = "fuse_failed"
-        return *script.empty_solution(), diag
-    diag["fused"] = len(fused["xyz_norm"])
-    try:
-        pred_v, pred_e = script.predict_sample(fused, model, device)
-    except Exception as e:
-        diag["status"] = f"predict_failed:{type(e).__name__}"
-        return *script.empty_solution(), diag
     if use_tracks:
         try:
@@ -144,6 +184,17 @@ def predict_one(sample, model, device, cfg, rng,
             diag["status"] = f"2dfilt_failed:{type(e).__name__}"
     diag["2dfilt_out"] = len(pred_e) if hasattr(pred_e, '__len__') else 0
     diag["pred_v"] = len(pred_v) if hasattr(pred_v, '__len__') else 0
     diag["pred_e"] = len(pred_e) if hasattr(pred_e, '__len__') else 0
     return pred_v, pred_e, diag
@@ -212,6 +263,14 @@ def main():
                 "min_views": args.refine_min_views,
                 "max_move_meters": args.refine_max_move,
             }
             pred_v, pred_e, diag = predict_one(
                 sample, model, device, cfg, rng,
                 use_tracks=not args.no_tracks,
@@ -219,7 +278,11 @@ def main():
                 orphan_only=args.orphan_only,
                 strict_no_support=args.strict_no_support,
                 vertex_refine=args.vertex_refine,
-                refine_kwargs=refine_kwargs)
             if torch.backends.mps.is_available():
                 torch.mps.empty_cache()

                    help="vertex refine: min views with 2D match")
     p.add_argument("--refine-max-move", type=float, default=0.5,
                    help="vertex refine: max 3D displacement in meters")
+    p.add_argument("--tta", action="store_true",
+                   help="enable multi-seed TTA (concat-and-merge unless --tta-hungarian is set)")
+    p.add_argument("--tta-hungarian", action="store_true",
+                   help="with --tta: use Hungarian-matched averaging instead of concat-and-merge")
+    p.add_argument("--tta-min-passes", type=int, default=1,
+                   help="hungarian TTA: drop anchor segments matched in fewer than this many other passes")
+    p.add_argument("--tta-seeds", type=str, default="2718,31415,42",
+                   help="comma-separated priority-sample seeds for TTA")
+    p.add_argument("--edge-fill", action="store_true",
+                   help="enable edge filling from 2D mask evidence")
+    p.add_argument("--fill-min-views", type=int, default=2,
+                   help="edge fill: min views supporting a new edge")
+    p.add_argument("--fill-min-frac", type=float, default=0.40,
+                   help="edge fill: min support fraction along projected segment")
+    p.add_argument("--fill-max-length", type=float, default=5.0,
+                   help="edge fill: max edge length in meters")
     return p.parse_args()
 def predict_one(sample, model, device, cfg, rng,
                 use_tracks=True, use_2d_filter=True, orphan_only=False,
                 strict_no_support=False, vertex_refine=False,
+                refine_kwargs=None, edge_fill=False, fill_kwargs=None,
+                tta=False, tta_seeds=None):
     """Run the full inference pipeline on one sample. Returns (pv, pe, diag)."""
     diag = {"colmap": -1, "fused": 0, "track_v": 0, "track_e": 0,
             "pred_v": 0, "pred_e": 0, "2dfilt_in": 0, "2dfilt_out": 0,
     except Exception:
         pass
+    if tta:
+        try:
+            seeds = tta_seeds or (2718, 31415, 42)
+            tta_method = (
+                "predict_sample_tta_hungarian"
+                if getattr(predict_one, "_tta_hungarian", False)
+                else "predict_sample_tta"
+            )
+            import tta as _tta_mod
+            fn = getattr(_tta_mod, tta_method)
+            if tta_method == "predict_sample_tta_hungarian":
+                pred_v, pred_e = fn(
+                    sample, cfg, model, device, seeds=tuple(seeds),
+                    min_passes_for_keep=getattr(predict_one, "_tta_min_passes", 1),
+                )
+            else:
+                pred_v, pred_e = fn(
+                    sample, cfg, model, device, seeds=tuple(seeds))
+            diag["fused"] = -1  # not single-seed
+        except Exception as e:
+            diag["status"] = f"tta_failed:{type(e).__name__}"
+            return *script.empty_solution(), diag
+    else:
+        fused = script.fuse_and_sample(sample, cfg, rng)
+        if fused is None:
+            diag["status"] = "fuse_failed"
+            return *script.empty_solution(), diag
+        diag["fused"] = len(fused["xyz_norm"])
+        try:
+            pred_v, pred_e = script.predict_sample(fused, model, device)
+        except Exception as e:
+            diag["status"] = f"predict_failed:{type(e).__name__}"
+            return *script.empty_solution(), diag
     if use_tracks:
         try:
             diag["status"] = f"2dfilt_failed:{type(e).__name__}"
     diag["2dfilt_out"] = len(pred_e) if hasattr(pred_e, '__len__') else 0
+    if edge_fill:
+        e_before = len(pred_e) if hasattr(pred_e, '__len__') else 0
+        try:
+            from edge_fill import fill_missing_edges_from_2d
+            pred_v, pred_e = fill_missing_edges_from_2d(
+                pred_v, pred_e, sample,
+                **(fill_kwargs or {}))
+            diag["filled"] = (len(pred_e) if hasattr(pred_e, '__len__') else 0) - e_before
+        except Exception as e:
+            diag["status"] = f"fill_failed:{type(e).__name__}"
     diag["pred_v"] = len(pred_v) if hasattr(pred_v, '__len__') else 0
     diag["pred_e"] = len(pred_e) if hasattr(pred_e, '__len__') else 0
     return pred_v, pred_e, diag
                 "min_views": args.refine_min_views,
                 "max_move_meters": args.refine_max_move,
             }
+            fill_kwargs = {
+                "min_views_support": args.fill_min_views,
+                "min_pixel_frac": args.fill_min_frac,
+                "max_edge_length_meters": args.fill_max_length,
+            }
+            tta_seeds_tuple = tuple(int(s) for s in args.tta_seeds.split(","))
+            predict_one._tta_hungarian = args.tta_hungarian
+            predict_one._tta_min_passes = args.tta_min_passes
             pred_v, pred_e, diag = predict_one(
                 sample, model, device, cfg, rng,
                 use_tracks=not args.no_tracks,
                 orphan_only=args.orphan_only,
                 strict_no_support=args.strict_no_support,
                 vertex_refine=args.vertex_refine,
+                refine_kwargs=refine_kwargs,
+                edge_fill=args.edge_fill,
+                fill_kwargs=fill_kwargs,
+                tta=args.tta,
+                tta_seeds=tta_seeds_tuple)
             if torch.backends.mps.is_available():
                 torch.mps.empty_cache()

script.py CHANGED

@@ -59,6 +59,14 @@ CONF_THRESH = 0.4
 MERGE_THRESH = 0.4
 SNAP_RADIUS = 0.5
 def fuse_and_sample(sample, cfg, rng):
     """Run point fusion + priority sampling on a raw dataset sample.
@@ -398,21 +406,51 @@ if __name__ == "__main__":
             except Exception:
                 pass
-            # Fuse + sample
-            fused = fuse_and_sample(sample, cfg, rng)
-            n_fused_pts = len(fused["xyz_norm"]) if fused is not None else 0
             track_v_count, track_e_count = 0, 0
             pred_status = "ok"
-            if fused is None:
-                pred_v, pred_e = empty_solution()
-                pred_status = "fuse_failed"
-            else:
                 try:
-                    pred_v, pred_e = predict_sample(fused, model, device)
                     if torch.cuda.is_available():
                         torch.cuda.empty_cache()
                     # Apply handcrafted triangulation tracking to catch missing corners/edges
                     try:
                         from triangulation import predict_wireframe_tracks

 MERGE_THRESH = 0.4
 SNAP_RADIUS = 0.5
+# Test-time augmentation: 3 priority-sample seeds + Hungarian matching with
+# strict 3-pass agreement (min_passes_for_keep=2: an anchor segment must be
+# matched in both other passes). Local 100-sample A/B vs orphan_refine:
+# q5 0.062 -> 0.091 (+47%), mean +0.003. Costs 3x inference time; set
+# USE_TTA = False to revert to single-seed inference.
+USE_TTA = True
+TTA_SEEDS = (2718, 31415, 42)
+TTA_MIN_PASSES = 2
 def fuse_and_sample(sample, cfg, rng):
     """Run point fusion + priority sampling on a raw dataset sample.
             except Exception:
                 pass
             track_v_count, track_e_count = 0, 0
             pred_status = "ok"
+            n_fused_pts = 0
+            if USE_TTA:
+                # Multi-seed TTA: fuse + predict 3 times, Hungarian-match segments
+                # across passes, drop those without min_passes agreement.
                 try:
+                    from tta import predict_sample_tta_hungarian
+                    pred_v, pred_e = predict_sample_tta_hungarian(
+                        sample, cfg, model, device,
+                        seeds=TTA_SEEDS,
+                        min_passes_for_keep=TTA_MIN_PASSES,
+                    )
                     if torch.cuda.is_available():
                         torch.cuda.empty_cache()
+                except Exception as e:
+                    import traceback
+                    print(f"  TTA failed for {order_id}:\n{traceback.format_exc()}")
+                    pred_v, pred_e = empty_solution()
+                    pred_status = "tta_failed"
+                    if torch.cuda.is_available():
+                        torch.cuda.empty_cache()
+            else:
+                # Single-seed inference (legacy path, kept for easy revert).
+                fused = fuse_and_sample(sample, cfg, rng)
+                n_fused_pts = len(fused["xyz_norm"]) if fused is not None else 0
+                if fused is None:
+                    pred_v, pred_e = empty_solution()
+                    pred_status = "fuse_failed"
+                else:
+                    try:
+                        pred_v, pred_e = predict_sample(fused, model, device)
+                        if torch.cuda.is_available():
+                            torch.cuda.empty_cache()
+                    except Exception as e:
+                        import traceback
+                        print(f"  Predict failed for {order_id}:\n{traceback.format_exc()}")
+                        pred_v, pred_e = empty_solution()
+                        pred_status = "predict_failed"
+                        if torch.cuda.is_available():
+                            torch.cuda.empty_cache()
+            if pred_status == "ok":
+                try:
                     # Apply handcrafted triangulation tracking to catch missing corners/edges
                     try:
                         from triangulation import predict_wireframe_tracks

tta.py ADDED

@@ -0,0 +1,245 @@

+"""Test-time augmentation via multi-seed priority sampling.
+Runs the model N times with different priority-sample seeds and aggregates the
+world-space segment predictions. Two aggregation strategies are provided:
+  - predict_sample_tta: concatenates the segments from every pass and lets the
+    standard merge_vertices_iterative do the deduplication. Because the
+    iterative merge takes union-find clusters and uses each cluster's centroid
+    as the merged position, this approximates Hungarian-averaging for matched
+    segments without needing to solve a real assignment problem.
+  - predict_sample_tta_hungarian: matches each later pass to pass 0 with a
+    real Hungarian assignment, averages matched segments, and drops anchor
+    segments that have fewer than min_passes_for_keep matches.
+The previous TTA attempt (commit 857514e, reverted) failed because it picked
+ONE pass's output. Both strategies here aggregate segments across passes.
+Why it should work:
+  - Stochastic variation comes from the priority-sample seed (the model itself
+    is deterministic). Different seeds give different points → slightly
+    different model predictions for the same scene.
+  - Matched segments (true edges) appear in all passes near each other → they
+    get averaged toward consensus.
+  - Spurious segments (hallucinations) tend to appear in only 1 pass. Under
+    concat-and-merge they survive individually but are typically not
+    high-confidence enough to win; the Hungarian variant can drop them
+    outright via min_passes_for_keep.
+  - The iterative merge thresholds 0.15→0.6 m are appropriate for the typical
+    inter-pass jitter of correctly-predicted segments.
+"""
+from __future__ import annotations
+import numpy as np
+import torch
+import script
+from s23dr_2026_example.segment_postprocess import merge_vertices_iterative
+from s23dr_2026_example.varifold import segments_to_vertices_edges
+from s23dr_2026_example.postprocess_v2 import snap_to_point_cloud, snap_horizontal
+def _model_to_world_segments(sample_dict, model, device):
+    """Run the model on a single fused sample, return (N, 2, 3) world segments.
+    Returns None if no segments pass the confidence threshold.
+    """
+    tokens, masks = script.build_tokens_single(sample_dict, model, device)
+    scale = float(sample_dict["scale"])
+    center = sample_dict["center"]
+    with torch.no_grad(), torch.autocast(
+        device_type='cuda', dtype=torch.float16,
+        enabled=(device.type == 'cuda'),
+    ):
+        out = model.forward_tokens(tokens, masks)
+    segs = out["segments"][0].float().cpu()
+    conf = (
+        torch.sigmoid(out["conf"][0].float()).cpu().numpy()
+        if "conf" in out else None
+    )
+    if conf is not None:
+        segs = segs[conf > script.CONF_THRESH]
+    if len(segs) < 1:
+        return None
+    return segs.numpy() * scale + center  # (N, 2, 3)
+def _match_segments(anchor_segs, other_segs, max_endpoint_dist=0.4):
+    """Hungarian-match segments between two passes (flip-invariant distance).
+    The cost of a pair is the sum of its two endpoint distances, minimized over
+    both orientations. Returns a list of (i_anchor, j_other, was_flipped) for
+    accepted matches; matches with cost > 2*max_endpoint_dist are rejected.
+    """
+    if len(anchor_segs) == 0 or len(other_segs) == 0:
+        return []
+    from scipy.optimize import linear_sum_assignment
+    N, M = len(anchor_segs), len(other_segs)
+    # vectorized cost computation
+    a0 = anchor_segs[:, 0][:, None, :]  # (N, 1, 3)
+    a1 = anchor_segs[:, 1][:, None, :]
+    b0 = other_segs[None, :, 0]          # (1, M, 3)
+    b1 = other_segs[None, :, 1]
+    d_same = (np.linalg.norm(a0 - b0, axis=-1) +
+              np.linalg.norm(a1 - b1, axis=-1))
+    d_flip = (np.linalg.norm(a0 - b1, axis=-1) +
+              np.linalg.norm(a1 - b0, axis=-1))
+    cost = np.minimum(d_same, d_flip)
+    flipped = d_flip < d_same
+    row, col = linear_sum_assignment(cost)
+    threshold = 2.0 * max_endpoint_dist
+    return [(int(i), int(j), bool(flipped[i, j]))
+            for i, j in zip(row, col) if cost[i, j] <= threshold]
+def predict_sample_tta_hungarian(sample, cfg, model, device,
+                                  seeds=(2718, 31415, 42),
+                                  match_dist: float = 0.4,
+                                  min_passes_for_keep: int = 1,
+                                  snap_target_classes=(0, 1, 2)):
+    """Hungarian-averaged TTA. Aggregates segments via flip-invariant matching.
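+    Pass 0 is the anchor. Segments from each later pass are matched to the
+    anchor via Hungarian assignment on endpoint distance. Each anchor segment
+    is averaged with its orientation-aligned matches. Anchor segments with
+    fewer than min_passes_for_keep matches are dropped. If that drops every
+    segment, the result falls back to concat-and-merge of all passes.
+    Args:
+        match_dist: per-endpoint distance budget in meters; a match is
+            accepted when its summed endpoint distance is <= 2 * match_dist.
+        min_passes_for_keep: number of other passes that must match an anchor
+            segment. 0 = keep all anchor segments, 1 = require a match in at
+            least 1 other pass, 2 (with 3 seeds) = require all 3 passes to agree.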
+    """
+    sample_dicts = []
+    all_segs = []
+    for seed in seeds:
+        rng = np.random.RandomState(int(seed))
+        sd = script.fuse_and_sample(sample, cfg, rng)
+        if sd is None:
+            continue
+        sw = _model_to_world_segments(sd, model, device)
+        if sw is None or len(sw) == 0:
+            continue
+        sample_dicts.append(sd)
+        all_segs.append(sw)
+    if not all_segs:
+        return script.empty_solution()
+    if len(all_segs) == 1:
+        # No TTA gain possible; just run the normal post-process.
+        return _post_segments_to_wireframe(
+            all_segs[0], sample_dicts[0], snap_target_classes)
+    anchor = all_segs[0]
+    matches_per_anchor = [[] for _ in range(len(anchor))]  # list of (other_seg, flipped)
+    for p_idx in range(1, len(all_segs)):
+        for i_a, j_o, flipped in _match_segments(anchor, all_segs[p_idx],
+                                                   max_endpoint_dist=match_dist):
+            matches_per_anchor[i_a].append((all_segs[p_idx][j_o], flipped))
+    averaged = []
+    for i, matches in enumerate(matches_per_anchor):
+        if len(matches) < min_passes_for_keep:
+            continue
+        seg = anchor[i]
+        if not matches:
+            averaged.append(seg)
+            continue
+        # Align orientations to anchor and average
+        aligned = [seg]
+        for other_seg, flipped in matches:
+            aligned.append(other_seg[::-1] if flipped else other_seg)
+        averaged.append(np.mean(aligned, axis=0))
+    if not averaged:
+        # Defensive: fall back to concat-and-merge if matching dropped everything
+        return _post_segments_to_wireframe(
+            np.concatenate(all_segs, axis=0), sample_dicts[0],
+            snap_target_classes)
+    return _post_segments_to_wireframe(
+        np.asarray(averaged), sample_dicts[0], snap_target_classes)
+def _post_segments_to_wireframe(segments, sd0, snap_target_classes):
+    """Standard post-process: segments -> vertices/edges -> merge -> snap."""
+    pv, pe = segments_to_vertices_edges(torch.tensor(segments))
+    pv, pe = pv.numpy(), np.array(pe, dtype=np.int32)
+    pv, pe = merge_vertices_iterative(pv, pe)
+    xyz_norm = sd0["xyz_norm"]
+    mask = sd0["mask"]
+    cid = sd0["class_id"]
+    xyz_world = xyz_norm[mask] * float(sd0["scale"]) + sd0["center"]
+    cid_valid = cid[mask]
+    pv = snap_to_point_cloud(
+        pv, xyz_world, cid_valid,
+        snap_radius=script.SNAP_RADIUS,
+        target_classes=list(snap_target_classes),
+    )
+    pv = snap_horizontal(pv, pe)
+    if len(pv) < 2 or len(pe) < 1:
+        return script.empty_solution()
+    return pv, [(int(a), int(b)) for a, b in pe]
+def predict_sample_tta(sample, cfg, model, device,
+                        seeds=(2718, 31415, 42),
+                        snap_target_classes=(0, 1, 2)):
+    """Multi-seed TTA prediction. Returns (vertices, edges) in world space.
+    Args:
+        sample: raw dataset entry.
+        cfg: FuserConfig.
+        model: loaded model.
+        device: torch device.
+        seeds: tuple of priority-sample seeds. Length = # TTA passes.
+        snap_target_classes: classes for snap_to_point_cloud (default
+            [apex, eave_end_point, flashing_end_point] = [0, 1, 2]).
+    """
+    # 1) Run fuse + model for each seed, collect world-space segments.
+    sample_dicts = []
+    all_segs = []
+    for seed in seeds:
+        rng = np.random.RandomState(int(seed))
+        sd = script.fuse_and_sample(sample, cfg, rng)
+        if sd is None:
+            continue
+        sw = _model_to_world_segments(sd, model, device)
+        if sw is None or len(sw) == 0:
+            continue
+        sample_dicts.append(sd)
+        all_segs.append(sw)
+    if not all_segs:
+        return script.empty_solution()
+    # 2) Concatenate segments across passes. The downstream merge will cluster
+    #    near-duplicate vertices (matched across passes) and take centroids,
+    #    yielding the Hungarian-average behavior.
+    combined = np.concatenate(all_segs, axis=0)
+    # 3) Standard post-process: segments -> vertices/edges, iterative merge.
+    pv, pe = segments_to_vertices_edges(torch.tensor(combined))
+    pv, pe = pv.numpy(), np.array(pe, dtype=np.int32)
+    pv, pe = merge_vertices_iterative(pv, pe)
+    # 4) Snap to point cloud. Use the FIRST sample_dict's context (the
+    #    fused points are roughly similar across seeds so this is a reasonable
+    #    proxy; using merged xyz would require re-fusing).
+    sd0 = sample_dicts[0]
+    xyz_norm = sd0["xyz_norm"]
+    mask = sd0["mask"]
+    cid = sd0["class_id"]
+    scale0 = float(sd0["scale"])
+    center0 = sd0["center"]
+    xyz_world = xyz_norm[mask] * scale0 + center0
+    cid_valid = cid[mask]
+    pv = snap_to_point_cloud(
+        pv, xyz_world, cid_valid,
+        snap_radius=script.SNAP_RADIUS,
+        target_classes=list(snap_target_classes),
+    )
+    pv = snap_horizontal(pv, pe)
+    if len(pv) < 2 or len(pe) < 1:
+        return script.empty_solution()
+    edges = [(int(a), int(b)) for a, b in pe]
+    return pv, edges