Pete Dunn
Fix masters eval latency regression
7dcd66d

Open Tasks

Single source of truth for active work.

Priority Legend

  • P0 = blocking production/demo
  • P1 = high impact
  • P2 = nice to have

Active

ID Priority Task Owner Status Next Step Verification
T-137 P2 Keep repo hygiene bounded by pruning non-canonical eval artifacts while preserving canonical eval baselines and runner assets Engineering DONE Reuse backend/scripts/cleanup_repo_artifacts.py --dry-run --no-backup periodically instead of letting timestamped eval reruns accumulate in git python3 backend/scripts/cleanup_repo_artifacts.py --no-backup -> removed_dirs=75, removed_files=62; follow-up python3 backend/scripts/cleanup_repo_artifacts.py --dry-run --no-backup -> 0 pending removals; canonical probes for latest_eval25_guarded_gpt_check, latest_eval50_guarded_gpt_check, latest_eval6_concept_check, release_gate, shards10, and shards5_eval75 all returned OK
T-146 P1 Add a real Hugging Face canary target so the gated deploy workflow exercises both canary and production lanes instead of production-only Engineering DONE Keep the canary Space config aligned with production except for host-specific APP_BASE_URL / VITE_APP_BASE_URL, and keep HF_SPACE_ID_CANARY populated in GitHub Actions GitHub Actions secret HF_SPACE_ID_CANARY now points at crazycrazypete/Masters-four-Tab-OpenAI-Canary; Actions run 22813479490 finished with both deploy-canary and deploy-production green after the canary host-specific base URL was corrected
T-147 P1 Run the first authenticated hosted smoke pass against the refreshed production build and use that result to close the stale hosted sign-off loop Engineering DONE Reuse the same minimal hosted smoke set after future deploys: auth full-flow, one assistant-family provider query, and one POTS workspace shell check on both canary and production when relevant Credentialed hosted smoke passed on both production and canary: cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed; cd frontend && E2E_BASE_URL=https://crazycrazypete-masters-four-tab-openai-canary.hf.space npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed; cd frontend && npx playwright test e2e/pots.provider-coverage.spec.ts --reporter=line -> 1 passed; cd frontend && E2E_BASE_URL=https://crazycrazypete-masters-four-tab-openai-canary.hf.space npx playwright test e2e/pots.provider-coverage.spec.ts --reporter=line -> 1 passed; one-off headless POTS workspace smoke passed on both hosts and confirmed the POTS Project Workspace shell is live instead of the old stacked page
D-232 Removed the duplicate assistant-tab security/CAPTCHA checks from the shared Help + Assist launcher, Unified Knowledgebase, and POTS assistant flows while keeping the Rapid Router order-submit CAPTCHA intact 2026-03-07 backend/app/main.py, backend/app/test_knowledgebase_api.py, backend/app/test_chat_guidance_api.py, frontend/src/components/FloatingRouterHelper.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/PotsAssistant.tsx, frontend/src/pages/RapidRouter.tsx; cd backend && .venv/bin/python -m pytest -q app/test_knowledgebase_api.py app/test_chat_guidance_api.py app/test_rapid_router_api_shell.py -> 37 passed; cd frontend && npm run build -> success; helper Vitest file remains blocked by local worker hang after startup
T-144 P1 Remove duplicate per-tab assistant security checks from the shared help/assistant tabs while preserving the Rapid Router order-submit CAPTCHA Engineering DONE If assistant abuse appears later, add rate-limit/abuse controls at the backend instead of restoring per-tab CAPTCHA friction cd backend && .venv/bin/python -m pytest -q app/test_knowledgebase_api.py app/test_chat_guidance_api.py app/test_rapid_router_api_shell.py -> 37 passed; cd frontend && npm run build -> success; cd frontend && npx vitest run src/components/FloatingRouterHelper.test.tsx --reporter=dot still stalls after startup in the current local Vitest worker environment
T-143 P1 Make Rapid Router advanced configuration notes optional whenever at least one advanced checkbox option is selected, while keeping notes required for freeform advanced requests with no selected task Engineering DONE Keep backend/frontend validation aligned if new advanced task checkboxes are added later cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 53 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 6 passed; cd frontend && npm run build -> success
T-142 P1 Add the four new required Rapid Router approval attestations and enforce them server-side in the order submit path Engineering DONE Keep any future approval-copy changes mirrored in both frontend validation and backend approvals schema so order-submit rules cannot drift cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 53 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
D-231 Changed Rapid Router BoBo bill-to phone to a full 10-digit US phone field with (111) 222-2222 formatting, matching frontend validation, backend normalization, and rendered PDF/email output 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 52 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
T-141 P1 Align Rapid Router BoBo bill-to phone UX and validation with the requested full-phone example (111) 222-2222 instead of the legacy 7-digit local-number rule Engineering DONE Keep BoBo bill-to phone on the same full 10-digit validation/rendering path as other US phone fields unless the business later confirms a local-only requirement cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 52 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
D-230 Rapid Router split shipping now clamps per-location assignment to ordered quantity, disables adding more locations once all units are assigned, and persists the optional Configure IP passthrough advanced task through backend normalization and output rendering 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 52 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
T-140 P1 Keep Rapid Router split-shipping allocations bounded to total cart quantity and add the optional advanced Configure IP passthrough task through submit/output paths Engineering DONE If shipping allocation rules expand later, keep clamping at edit-time instead of allowing invalid temporary over-assignment states cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 52 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
T-138 P1 Close the last local continuity gap on the former guarded-GPT Masters mention-lookup tail (31, 32, 35, 37) and then validate that improvement in the next broader rerun Engineering DONE Keep _should_skip_masters_concept_preflight(...) aligned with the explicit doc-lookup vocabulary so plural forms like documents, docs, files, and sources never reactivate concept fallback for file-title lookups Focused slice stayed fast: python3 backend/scripts/unified_kb_eval150.py --cases /tmp/mtk_focused_eval_slices/masters_slice_cases.json ... -> 4/4 passed, avg 7.13ms, p95 26.42ms; exact .env.codex repros dropped from multi-second latency to fast-path timings (31: ~2461ms -> 28.83ms, 32: ~2635ms -> 26.95ms); broader 31-40 rerun -> 10/10 passed, avg 5.12ms, Masters avg 4.19ms, p95 26.91ms
T-139 P1 Finish trimming the remaining reusable router compare/render tail so the broader guarded suites can be rerun against a materially cleaner router latency baseline Engineering DONE Keep the shared-query compare/antenna pattern in place unless a future rerun shows router regressions again; router is no longer the first broad-suite blocker Focused slice after the shared-query pass: python3 backend/scripts/unified_kb_eval150.py --cases /tmp/mtk_focused_eval_slices/router_slice_cases.json ... -> 7/7 passed, avg 145.25ms, router-doc avg 328.76ms, p95 627.21ms; broader 75 rerun in docs/evals/20260308_guarded75_after_masters_fix/ kept router compare prompts materially lower (42 ~327ms, 114 ~661ms) while overall suite finished 75/75, avg_latency_ms=28.81, p95_ms=55.15
T-145 P1 Re-run the focused Rapid Router frontend validation chain once the local Codex unified-exec saturation is cleared so the new browse-first + BoBo/authorization flow has a clean frontend automated verification summary Engineering DONE Keep future focused frontend reruns isolated so local exec saturation does not masquerade as product regressions cd backend && .venv/bin/python -m pytest -x -vv app/rapid_router/test_rapid_router_core.py -> 28 passed; cd backend && .venv/bin/python -m pytest -q app/test_rapid_router_api_shell.py -> 24 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 3 passed; cd frontend && npm run build -> success
T-137 P1 Eliminate the remaining guarded-GPT delegate latency tails now that the former Masters mention bucket and the main router compare bucket are no longer the broad-suite blocker Engineering IN PROGRESS Chase the new dominant tails next: POTS playbook/provider prompts (79/82/86) first, then the smaller longer-form Masters content-pack cluster (97/99/101, 106/111/134) plus the lingering BuSS-SKU doc prompt (30) docs/evals/20260308_guarded75_after_masters_fix/unified_kb_eval150_shards10_summary.json -> 75/75, avg_latency_ms=28.81, p95_ms=55.15, p99_ms=327.53, stage_budget_exits=0; docs/evals/20260308_guarded150_after_masters_fix/unified_kb_eval150_shards10_summary.json -> 150/150, avg_latency_ms=151.14, p95_ms=661.36, p99_ms=2969.52, stage_budget_exits=0; remaining dominant outliers are 79/82/86 at 2.88s-4.36s, with secondary longer-form Masters prompts 97/99/101, 106/111/134 and case 30 (725ms)
T-133 P1 Trim the residual deterministic/content-assembly latency that remains after the Masters mention and router compare fixes Engineering OPEN Start with the old POTS playbook/provider bucket (79/82/86), then profile the longer-form Masters content-pack prompts before paying for another promotion-quality baseline rerun docs/evals/20260308_guarded150_after_masters_fix/unified_kb_eval150_shards10_summary.json -> 150/150, p95_ms=661.36, p99_ms=2969.52; dominant remaining tails: 79 2879.88ms, 82 2969.52ms, 86 4359.05ms; secondary longer-form Masters/content prompts: 97 334.73ms, 99 639.50ms, 101 661.69ms, 106 669.12ms, 111 315.54ms, 134 773.01ms
T-132 P1 Rerun broader guarded-GPT 75 and 150 evals against the new deterministic fast answers, exact/current blocked-case net, and the now-green 50-case concept pack Engineering DONE Use the rerun artifacts as comparison evidence, but keep 25/50 as the stable lightweight gates until the new 75/150 regressions in T-133 are fixed docs/evals/20260307_010031_eval75_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 75 / 75 passed; docs/evals/20260307_010031_eval150_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 149 / 150 passed; bash backend/scripts/test_backend.sh --full remained previously green at 501 passed
T-131 P1 Confirm hosted deployment env does not override the new gpt-5-mini repo defaults with stale OPENAI_MODEL or assistant-specific model pins Engineering OPEN Inspect Hugging Face Space secrets/variables and any production deployment env to ensure OPENAI_MODEL, UNIFIED_KB_OPENAI_MODEL, and ROUTER_RAG_OPENAI_MODEL are unset or explicitly gpt-5-mini, then rerun one live assistant smoke check Local repo/runtime verification is complete, but hosted env values are external to git and can still override repo defaults
T-130 P1 Standardize all active LLM-assisted backend/runtime defaults, env examples, and local repo env pins on gpt-5-mini Engineering DONE Treat gpt-5-mini as the current default baseline and only revisit if a future model migration is deliberate and fully tested across backend + eval runners rg -n 'gpt-5\\.2' README.md backend/app backend/scripts backend/.env.test.example .env.codex backend/.env.codex ... -> no active runtime/config hits; cd backend && .venv/bin/python -m pytest -q app/test_pots_conversation_regression.py -k 'concept_fallback_for_generic_pots_question or llm_synthesis_omits_temperature_for_gpt5_models' -> 2 passed; bash backend/scripts/test_backend.sh --full -> 478 passed; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npm run build -> success; cd frontend && npm run test -> 31 files / 111 passed; docs/evals/20260306_230403_eval25_gpt5mini_default/unified_kb_eval150_shards10_summary.json -> 25 / 25 passed
T-129 P2 Stabilize the residual guarded-GPT dual-pathway explainer and rerun the reusable 25-case pack Engineering DONE Keep the dual-pathway phrasing under regression watch as future concept-fallback work lands; no additional action is required for the current acceptance gate docs/evals/20260306_230403_eval25_gpt5mini_default/unified_kb_eval150_shards10_summary.json -> 25 / 25 passed, failed_ids=[]
T-128 P1 Create a reusable guarded-GPT acceptance pack with 25 questions split into 5-question shards and a dedicated shard runner Engineering DONE Use this pack as the lightweight regression gate for future guarded-GPT changes; only expand it after stabilizing or replacing residual case 13 python3 - <<'PY' ... len(rows)==25 ... PY -> success; bash -n backend/scripts/run_unified_kb_eval25_guarded_gpt_chunks.sh -> success; docs/evals/20260307_001201_eval25_phase12/unified_kb_eval150_shards10_summary.json -> 25 / 25 passed (100.0%)
T-127 P1 Roll out the shared assistant-family concept fallback chain with allow/deny gates, gpt-5-mini, provenance labels, and GPT+web only when the model-only concept answer still needs refinement Engineering DONE Expand deterministic internal concept fast answers for the highest-frequency telecom/router/POTS explainers so the new fallback is used less often for questions that can be answered cheaply from curated internal patterns cd backend && .venv/bin/python -m pytest -q app/test_assistant_fallback.py app/test_unified_kb_core.py app/test_router_rag_module.py app/test_masters_conversation_regression.py app/test_pots_conversation_regression.py app/test_chat_guidance_api.py app/test_knowledgebase_api.py -> 202 passed; bash backend/scripts/test_backend.sh --full -> 477 passed; Router RAG smoke -> 10 passed; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npm run build -> success; cd frontend && npm run test -> 31 files / 111 passed; docs/evals/latest_eval6_concept_check/unified_kb_eval150_shards10_summary.json -> 6 / 6 passed
T-126 P1 Redeploy the hosted Hugging Face app and rerun the live POTS provider-coverage Playwright spec so the local MetTel provider-card backfill is validated against the actual live site Engineering OPEN Ship the current backend provider-card patch, wait for Hugging Face to rebuild, then rerun cd frontend && npx playwright test e2e/pots.provider-coverage.spec.ts --config=playwright.config.ts against the hosted base URL Local fix coverage is green: cd backend && .venv/bin/python -m pytest -q app/test_unified_kb_core.py -k 'provider_inventory_supplements_missing_pots_provider_cards_from_router_corpus or provider_inventory_backfills_missing_router_hint_paths_from_index_hits' -> 2 passed; cd backend && .venv/bin/python -m pytest -q app/test_pots_provider_recall.py -> 2 passed; live suite remains 9 passed / 1 failed / 4 skipped until redeploy
T-125 P1 Enforce the current UI-lock scan rules by removing dead collapsed banners, hiding default status/debug entry points, and eliminating duplicate primary CTAs where they still leak into the active viewport Engineering DONE Keep future cleanup focused on dense admin-only surfaces and message-detail consistency, not reopening already-compliant shell patterns cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/AssistantWorkspace.test.tsx src/components/PromptCoach.test.tsx src/components/BrandHeader.test.tsx src/pages/RapidRouter.test.tsx --reporter=dot -> 11 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 30 files / 105 passed; git diff --check -> success
T-124 P1 Lock the knowledge/chat family to one shared assistant shell, auto-collapse setup after the first user turn, and restyle the legacy assistant pages onto that pattern Engineering DONE If assistant cleanup continues, unify the deeper response-detail treatments (Why, Next action, Sources, file panels) so all assistant answers share one internal message pattern as well cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/AssistantWorkspace.test.tsx src/components/PageArchetypes.test.tsx --reporter=dot -> 4 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 30 files / 105 passed
T-123 P1 Rebuild RapidRouter as a staged scan-and-build commerce flow (Filter, Browse, Quantity, Customer info, Review) with a sticky cart and collapsed secondary tools Engineering DONE Collapse any remaining late-stage advanced/admin helper clusters behind one secondary control so the new commerce sequence stays clean under heavy use cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 2 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 29 files / 103 passed
T-122 P1 Collapse Telco assumptions, what-if mode, diagnostics, quote helpers, scenario JSON/CSV, and assistant coaching into one shared Advanced drawer so the default calculator surface stays on the business flow Engineering DONE Apply the same single-secondary-control rule to RapidRouter, which still exposes too many business and support surfaces in parallel cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/TelcoCalculator.test.tsx --reporter=dot -> 2 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 28 files / 101 passed
T-121 P1 Rebuild TelcoCalculator as a single-path step sequence with Locations, Pricing, Results, and Export instead of a simultaneous tri-column calculator surface Engineering DONE Apply the same step-led simplification to RapidRouter, which still mixes catalog, helper, and order-prep surfaces in one view cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/TelcoCalculator.test.tsx --reporter=dot -> 1 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 28 files / 100 passed
T-120 P1 Replace paragraph-style POTS instructions with a stable three-line step guide so each step only says what it does, what is needed now, and what happens next Engineering DONE Reuse the same guide pattern in the POTS project drawer and any later summary/export surfaces that still read as prose-heavy cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsEstimateIntake.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 23 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
T-119 P1 Flatten the embedded PotsEstimateIntake wrapper so the estimator and intake inherit a lighter host shell instead of stacking full cards inside cards Engineering DONE If more simplification is still needed, flatten the later review/export sections inside PotsIntake rather than adding more wrapper-level chrome cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsEstimateIntake.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 23 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
T-118 P1 Convert PotsWorkspace routing questions into a one-question-at-a-time conversation with answer cards and compact Why this matters disclosure Engineering DONE Reuse the same guided-question pattern in other dense decision forms if later UI lock passes show similar cognitive overload cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 10 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
T-117 P1 Move active-project creation and saved-project switching behind the Project tools drawer so PotsWorkspace stops showing setup UI in the main wizard by default Engineering DONE Apply the same hide-setup-behind-drawer rule in other dense workflows such as RapidRouter and TelcoCalculator where setup/admin surfaces still compete with the active task cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 9 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 98 passed
T-116 P1 Convert PotsWorkspace into a true wizard shell with one active step and one optional utility drawer Engineering DONE Apply the same step-led shell discipline to RapidRouter and TelcoCalculator if the UI lock continues beyond POTS cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 8 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; git diff --check -> success
T-115 P1 Enforce one obvious primary action per screen so setup, reset, export, and support controls stop competing with the current forward move Engineering DONE Continue the CTA-hierarchy pass in RapidRouter, TelcoCalculator, and the assistant-family export/help clusters where multiple strong actions still share one viewport cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsEstimateIntake.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 16 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; git diff --check -> success
T-114 P1 Tighten and standardize the radius system so major shells use 20px, secondary surfaces use 16px, controls use 12px, and full-pill styling is reserved for chips Engineering DONE Continue the same radius cleanup through RapidRouter, TelcoCalculator, CommandPalette, and any remaining legacy modal/helper surfaces that still overuse rounded-2xl cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/PrimaryNavigation.test.tsx src/components/FloatingRouterHelper.test.tsx src/components/PromptCoach.test.tsx src/components/chat/ChatTranscript.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 34 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; git diff --check -> success
T-113 P1 Replace border-heavy card stacking with the locked three-surface whitespace hierarchy in the shared shell and active POTS flow Engineering DONE Continue the same whitespace-hierarchy cleanup in TelcoCalculator, RapidRouter, and the still-denser late-step surfaces in PotsIntake if the UI lock continues cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/PageArchetypes.test.tsx src/pages/PotsWorkspace.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsIntake.test.tsx --reporter=dot -> 23 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; git diff --check -> success
T-112 P1 Reduce badge and label noise in the shared shell, POTS flow, and assistant-family pages so metadata stops competing with primary actions Engineering DONE If the badge-noise pass continues, target the remaining denser local-state surfaces next: TelcoCalculator, RapidRouter, and the still-busy parts of PotsIntake cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/FloatingRouterHelper.test.tsx src/components/PageArchetypes.test.tsx src/pages/PotsWorkspace.test.tsx src/pages/PotsSavingsEstimator.test.tsx --reporter=dot -> 19 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed
T-111 P1 Lock the shared typography system so the shell uses Public Sans, a slightly larger reading scale, and uppercase only for true metadata Engineering DONE Continue the UI lock by applying the same shared typography utilities opportunistically to any remaining dense admin/reporting surfaces as later layout passes touch them cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx src/components/PrimaryNavigation.test.tsx src/components/PageArchetypes.test.tsx src/components/FloatingRouterHelper.test.tsx src/pages/PotsWorkspace.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsIntake.test.tsx --reporter=dot -> 36 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; typography-scope grep confirmed no remaining uppercase classes in the shared shell + active assistant/POTS lock scope
T-110 P1 Lock the shared UI color system so navy is primary, slate is structural, green is success/live, amber is caution, and red is reserved for destructive/error emphasis Engineering DONE Continue the UI lock by converting any remaining page-local legacy color classes onto the shared token system as the next visual recommendations are implemented cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx src/components/PrimaryNavigation.test.tsx src/components/PageArchetypes.test.tsx src/components/FloatingRouterHelper.test.tsx --reporter=dot -> 15 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; shared shell grep confirmed no remaining #EE0000, #1a2b56, #243869, or old blue helper classes in frontend/src/components / frontend/src/index.css
T-109 P1 Define and apply four shared page archetype shells (Workspace, Calculator, Catalog, Assistant) so the main tabs stop mixing layout patterns Engineering DONE Extend the same shared shells to the remaining assistant-class tabs (RouterKnowledgebase, RoutersAssistant, MastersAI, PotsAssistant) and then decide whether any shell-specific cleanup is still needed per page cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/PageArchetypes.test.tsx src/components/BrandHeader.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 15 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; local desktop/mobile browser spot-check confirmed POTS, Telco, Rapid Router, and Knowledgebase render the expected archetype shells
T-108 P1 Consolidate floating support and helper controls into one shared help launcher with internal tabs Engineering DONE Continue the UI lock by reviewing any remaining duplicated top-level utility affordances and deciding whether command-palette/status visibility should stay as-is or be simplified further cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/FloatingRouterHelper.test.tsx src/components/PrimaryNavigation.test.tsx src/components/BrandHeader.test.tsx --reporter=dot -> 12 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 26 files / 94 passed; local desktop/mobile browser spot-check confirmed one floating Help launcher with Assist and Support tabs
T-107 P1 Remove emoji-style workspace cues and standardize the shell on restrained enterprise navigation icons Engineering DONE Continue the UI lock by consolidating the duplicate floating global launchers so the cleaner shell is not undercut by competing bottom-of-screen controls cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/PrimaryNavigation.test.tsx src/components/BrandHeader.test.tsx --reporter=dot -> 8 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 25 files / 90 passed; local desktop/mobile browser spot-check confirmed the workspace shell no longer exposes 🧠 📚 🧮 📉 📡 ⚡ text
T-106 P1 Replace the old toolbox interaction with a true primary navigation system: visible desktop workspace rail, mobile workspace sheet, and integrated workspace search Engineering DONE Continue the UI lock by simplifying the remaining persistent global controls, starting with the duplicate bottom launchers (Get support, Open router helper) cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/PrimaryNavigation.test.tsx src/components/BrandHeader.test.tsx --reporter=dot -> 7 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 25 files / 89 passed; local desktop/mobile browser spot-check confirmed the old Open toolbox / Toolbox is collapsed copy is gone
T-105 P1 Collapse the global shell into one compact utility header and remove always-visible toolbox chrome Engineering DONE Treat the compact header as the locked baseline for the new primary navigation shell; no separate follow-up remains beyond the broader UI lock work cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx --reporter=dot -> 4 passed; cd frontend && npm run build -> success
T-104 P1 Redeploy the Hugging Face hosted frontend so hosted POTS QA runs against the latest simplified workspace/intake implementation instead of the stale stacked build Engineering IN PROGRESS Trigger/reconfirm the Space rebuild, then rerun the hosted POTS desktop/mobile sign-off pass against the refreshed deployment Hosted/Auth0 check on 2026-03-06: cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --config=playwright.config.ts --reporter=line -> 1 passed; hosted POTS desktop/mobile inspection -> 0/2 sign-off passes because both viewports still rendered the older stacked POTS workspace with POTS Project Workspace, POTS Estimates + Intake, and POTS Savings Estimator on one page
T-103 P1 Sweep remaining frontend destructive actions so saved drafts, chat resets, and scoped removals all require confirmation before data loss Engineering DONE If final sign-off needs it, spot-check representative destructive flows in the hosted/authenticated runtime; local/frontend verification is complete cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/utils/chatCommands.test.ts src/utils/confirmAction.test.ts src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 27 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 24 files / 86 passed
T-102 P1 Simplify PotsWorkspace into a progressive, one-step-at-a-time surface with collapsed secondary tools Engineering DONE If final deploy sign-off needs it, repeat the same flow against the hosted/authenticated runtime; no more default-open changes are pending from the local browser pass cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 7 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 23 files / 79 passed; local browser QA at 1440x1024 + 390x844 confirmed the support panels now behave as a true accordion
T-101 P1 Simplify the active POTS estimate/intake UX with progressive disclosure and fewer always-open support panels Engineering DONE Treat hosted/authenticated browser QA as optional final sign-off only; local browser QA did not justify opening any additional intake disclosures by default cd frontend && npx vitest run src/pages/PotsIntake.test.tsx src/pages/PotsEstimateIntake.test.tsx --reporter=dot -> 6 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 23 files / 79 passed; local browser QA at 1440x1024 + 390x844 confirmed See all sites, optional notes, and helper disclosures can stay closed by default
T-100 P1 Clarify POTS estimator start-path choices and seed intake according to the chosen entry mode Engineering DONE Run hosted/manual QA on all three chooser paths ("quick estimate"; "totals now, site details next"; "site-by-site now") and confirm the seeded intake drafts feel obvious under real auth/runtime conditions cd frontend && npx vitest run src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 22 files / 72 passed
T-099 P1 Add clear project deletion flow to POTS workspace with confirmation pop-up Engineering DONE Use the selector delete action during hosted/manual QA and note any copy/layout polish or additional destructive-action confirmation gaps elsewhere in the app cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 20 files / 67 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 46 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 51 passed
T-098 P1 Expose phase-9-24 POTS workspace workflow controls in the SPA for manual/hosted verification Engineering DONE Use the new workflow panel for credentialed hosted/browser QA, export review, and responsive checks; continue tracking any remaining phase-25+ surface gaps separately cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 3 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 20 files / 65 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 45 passed
T-097 P1 Deep-dive Phase 9-40 workflow logic for gotchas and add detailed edge-case regression coverage Engineering DONE Continue hosted manual UX verification for phase-9+ workflow controls and export review with pilot users python3 -m pytest -q backend/app/test_pots_workspace_api.py -k "remove_last_location_resets_project_counts or excel_export_has_required_tabs" -> 2 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 50 passed; cd backend && python3 -m pytest -q -> 435 passed; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 62 passed
T-096 P1 Execute Phases 9-40 of POTS workspace roadmap in strict order (guided discovery through launch optimization) Engineering DONE Core backend roadmap and initial SPA workflow surface are complete; next step is hosted/manual UX pass plus credentialed browser journeys using the new PotsWorkspace controls Per-phase gate: for p in $(seq 9 40); do python3 -m pytest -q backend/app/test_pots_workspace_api.py -k "phase${p}"; done -> each selector 1 passed; consolidated: python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 43 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 48 passed; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 62 passed
T-095 P1 Execute Phase 8 of 40-phase POTS roadmap: audit log v1 Engineering DONE Start Phase 9 guided-discovery question tree implementation python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 11 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 16 passed
T-094 P1 Execute Phase 7 of 40-phase POTS roadmap: delegation skeleton (internal section assignment) Engineering DONE Start Phase 8 audit log v1 for immutable activity timeline python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 10 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 15 passed
T-093 P1 Execute Phase 6 of 40-phase POTS roadmap: intake progress model and completion scoring Engineering DONE Start Phase 7 delegation skeleton (internal assignment ownership) python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 9 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 14 passed
T-092 P1 Execute Phase 5 of 40-phase POTS roadmap: workspace-home UX (mode-first start + next action guidance) Engineering IN_REVIEW Complete manual desktop/tablet/mobile QA checklist in docs/dev/pots_workspace_phase5_home_ux.md and close remaining layout nits npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 62 passed
T-091 P1 Execute Phase 4 of 40-phase POTS roadmap: tenant/user isolation hardening Engineering DONE Start Phase 5 workspace-home UX refinement and next-action card design python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 8 passed; python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 13 passed
T-090 P1 Execute Phase 3 of 40-phase POTS roadmap: lifecycle state machine with guarded transitions Engineering DONE Start Phase 4 tenant/user isolation hardening and fallback handling checks python3 -m pytest -q backend/app/test_pots_workspace_api.py -> 7 passed; python3 -m pytest -q backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 5 passed
T-089 P1 Execute Phase 2 of 40-phase POTS roadmap: role/collaboration model (internal-first) Engineering DONE Use published role matrix to drive delegation/audit implementation phases; external customer contribution remains deferred `rg -n "Role Matrix
T-088 P1 Execute Phase 1 of the new 40-phase POTS workspace roadmap (scoped project API + triage + workspace shell) Engineering DONE Begin Phase 2 role/collaboration model and external-contribution boundary decisioning (kept deferred from Phase 1 per user direction) python3 -m pytest -q backend/app/test_pots_workspace_api.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 9 passed; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 62 passed
T-087 P1 Publish detailed 40-phase project map for enterprise POTS workspace execution Engineering DONE Use this map as execution baseline; track phase-by-phase completion status in this task table and session handoff docs/dev/pots_workspace_40_phase_project_map.md created with phase definitions, verification gates, and exit criteria
T-086 P1 Execute saved cross-workstream gameplan for remaining fixes/enhancements (phased next-thread plan) Engineering IN_PROGRESS Keep parser backlog item deferred; auth blocker is cleared; next focus is broader hosted/manual sign-off items plus optional 150-case stability push from 94.7% to >=95% Phase 0 auth refresh: cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts src/components/HealthStatusModal.test.tsx -> 16 passed; python3 -m pytest -q backend/app/test_auth.py backend/app/test_startup_rate_limit.py -> 31 passed; cd frontend && npx playwright test e2e/auth.spec.ts --reporter=line -> 6 passed; cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed; Phase 1 gate run: frontend build success + frontend tests 59 passed + Rapid Router/API pytest 49 passed; Phase 2 gate re-run: frontend build success + frontend tests 59 passed + consolidation pytest 68 passed; Phase 3 gate run: 150-case suite 142/150 (94.7%), failed IDs [24,36,88,98,99,104,112,129] (docs/evals/20260305T013817_phase3_gate150_final/unified_kb_eval150_shards10_summary.json), 75-case suite 74/75 (98.7%), failed IDs [3] (docs/evals/20260305T015614_phase3_gate75_final/unified_kb_eval150_shards10_summary.json), 50-case suite 50/50 (100.0%), failed IDs [] (docs/evals/20260305T020530_phase3_gate50_final/unified_kb_eval150_shards10_summary.json); additional 150-case target attempt 141/150 (94.0%), failed IDs [48,55,78,89,99,107,110,112,118] (docs/evals/20260305T021154_phase3_gate150_rerun2_final/unified_kb_eval150_shards10_summary.json); Phase 4 gate run: python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 151 passed; Phase 5 targeted runs: cd backend && python3 -m pytest -q app/test_unified_kb_core.py app/test_pots_conversation_regression.py app/test_unified_kb_eval150_script.py -> 102 passed
T-085 P1 Validate new Smart Profile + Customer Memory + resume/repeat flows across KB, POTS, and Rapid Router in hosted runtime Engineering IN_REVIEW Seed frontend/.env.e2e or frontend/.env.e2e.local with two Auth0 test users, then run cd frontend && npx playwright test e2e/rapid-router.memory-isolation.spec.ts plus the manual same-browser two-user swap: (1) save/apply Rapid Router profile as user A, (2) log out/in as user B and confirm no customer-profile carryover appears, (3) switch back to user A and confirm scoped memory is still present, (4) repeat KB/POTS handoff checks npm --prefix frontend run build -> success; cd frontend && npx vitest run src/utils/customerMemory.test.ts --pool=threads --maxWorkers=1 -> 4 passed; cd frontend && npx playwright test e2e/rapid-router.memory-isolation.spec.ts --list -> 1 test listed; cd frontend && npx vitest run src/components/BrandHeader.test.tsx --pool=threads --maxWorkers=1 -> 4 passed
T-084 P1 Validate header Slack-chip responsiveness and spacing with command/status toggles on narrow widths Engineering IN_REVIEW Manually check header controls at mobile/tablet/desktop and with command-palette/system-status hidden; ensure Slack chip remains accessible without wrapping collisions npm --prefix frontend run build -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx --pool=threads --maxWorkers=1 -> 4 passed
T-083 P1 Validate global floating support launcher UX in hosted runtime (desktop + mobile overlap with router helper) Engineering IN_REVIEW Capture hosted screenshots and confirm Slack/email/phone links open correctly from each tab; tune spacing/z-index if mobile overlaps with bottom-page controls npm --prefix frontend run build -> success; manual hosted check from all enabled tabs
T-082 P1 Validate hosted UX for new Rapid Router split-shipping flow (single-model orders only) across desktop/tablet/mobile Engineering IN_REVIEW Run manual hosted pass covering: default single-address flow, enabling split shipping on one selected model, cap enforcement (locations <= qty), and mixed-model disabled state; capture screenshots and any copy/layout nits npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 25 passed; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 24 passed
T-080 P0 Remove legacy masters-toolkit-api audience assumptions from auth-required deployments and verify hosted login without a custom API audience Engineering DONE Legacy audience placeholder is now ignored in active auth code and hosted login passed with credentialed Playwright runs; keep deployment env clean by leaving VITE_AUTH0_AUDIENCE / AUTH0_AUDIENCE unset unless a real Auth0 API Identifier is introduced later cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts src/components/HealthStatusModal.test.tsx -> 16 passed; python3 -m pytest -q backend/app/test_auth.py backend/app/test_startup_rate_limit.py -> 31 passed; npm --prefix frontend run build -> success; cd frontend && npx playwright test e2e/auth.spec.ts --reporter=line -> 6 passed; cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed
T-081 P1 Fill missing Crown (ASKNCM1100E) WAN/LAN detail fields in deterministic router fact CSV for cleaner Dragon-vs-Crown compares Engineering DONE Added source-backed Crown interface counts to deterministic CSV and covered fast-path behavior with regression assertions python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 151 passed
T-079 P1 Recover mixed-domain shard regressions from OpenAI 2026-02-27 run (150 suite failed IDs) Engineering IN_REVIEW Latest best Phase-3 150 gate remains 142/150 (94.7%) with failed IDs [24,36,88,98,99,104,112,129]; additional rerun showed semantic variance (141/150, failed [48,55,78,89,99,107,110,112,118]); focus next on masters/pots long-form semantic stability to hold >=95% target Re-run cd backend && CHUNK_SIZE=15 START_ID=1 END_ID=150 SEMANTIC_POLICY=all OUT_DIR=../docs/evals/<stamp> CASES_PATH=../docs/evals/unified_kb_eval150_cases.json ./scripts/run_unified_kb_eval150_chunks.sh; artifacts: docs/evals/20260305T013817_phase3_gate150_final/ and docs/evals/20260305T021154_phase3_gate150_rerun2_final/; maintain >=92%, target >=95%
T-078 P1 Raise router-helper quality for generated 50-question conceptual set (currently 23/50) Engineering DONE Completed targeted conceptual-intent/routing fixes in backend/app/knowledgebase/core.py; raised generated-50 suite from 23/50 to 47/50 (94.0%) with no stage-budget exits in latest run cd backend && CHUNK_SIZE=5 START_ID=1 END_ID=50 SEMANTIC_POLICY=all OUT_DIR=../docs/evals/shards10_eval50_openai_all_20260227_fix7_full CASES_PATH=../docs/evals/unified_kb_eval50_new_questions_router_helper_cases.json ./scripts/run_unified_kb_eval150_chunks.sh
T-077 P1 Consolidate Routers tab capabilities into Master’s Telecom AI Knowledgebase as single source tab (no duplicate tool surfaces) Engineering IN_REVIEW Hosted parity sign-off remains: capture credentialed runtime proof for KB-first router journey and final tab-retirement readiness notes npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 59 passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py backend/app/routers/router_tab_smoke_test.py backend/app/test_tab_final_pass_matrix.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 68 passed
T-076 P1 Merge POTS Savings Estimator + POTS Replacement Intake into one guided tab with estimator-to-intake handoff Engineering IN_REVIEW Hosted guided-flow sign-off remains: run credentialed journey for estimator->intake carryover and confirm user-facing prefill clarity npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 59 passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py backend/app/routers/router_tab_smoke_test.py backend/app/test_tab_final_pass_matrix.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 68 passed
T-075 P1 Run credentialed hosted browser E2E for full tab journeys (real page-to-page progression) Engineering IN_REVIEW Auth-only hosted runs are green; next expand into the new POTS workspace workflow panel plus existing KB/Rapid Router flows and capture screenshots/artifacts cd frontend && npx playwright test e2e/auth.spec.ts --reporter=line -> 6 passed; cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed
T-074 P1 Implement non-Rapid tab UI polish pack from cross-tab advisory review Engineering IN_REVIEW Phase-1 quick wins and automated deep-dive visual QA are complete (21 viewport-tab runs, 0 visual issues); execute remaining phase-2/phase-3 structural interactions npm --prefix frontend run build -> success; npm --prefix frontend run test -> 18 files / 54 tests passed; visual audit frontend/frontend/tmp/visual_audit/visual_audit_results.json shows failedRuns=0, totalVisualIssues=0
T-073 P1 Simplify helper comparison-table UX to table-first output with clearer CTA Engineering DONE Added table-detection/simplification in global helper and bypassed long-answer preview/details for table responses; aligned CTA wording across helper table renderers and published checkpoint commit npm --prefix frontend run build -> success; commit 1014b78; git push origin main + git push hf-fourtab main
T-072 P1 Publish router-ingestion checkpoint to required remotes Engineering DONE Committed and pushed current router RAG mapping/report/doc updates to both required remotes commit 8050c76; git push origin main -> 21c3962..8050c76; git push hf-fourtab main -> 21c3962..8050c76
T-071 P1 Ingest new router knowledgebase corpus batch (EX400, RX400, ER815, IR624, Balance 310X) Engineering DONE Added deterministic intake mappings and executed full intake pipeline on staged batch source; verified manifest/chunk inclusion and recall smoke pass bash backend/scripts/router_rag_intake_pipeline.sh ../tmp/router_rag_intake_2026-02-27_batch -> included=7, skipped=0; python3 backend/scripts/router_rag_smoke.py --query ... -> 5 queries, 0 failures
T-068 P1 Add basic CAPTCHA gate before Rapid Router order submit and first Knowledgebase/POTS/helper request Engineering DONE Completed backend challenge/verify APIs + scoped token enforcement + frontend one-time gate cards + regression coverage python3 -m pytest -q backend/app/test_rapid_router_api_shell.py backend/app/test_knowledgebase_api.py backend/app/test_chat_guidance_api.py backend/app/rapid_router/test_rapid_router_core.py -> 57 passed; npm --prefix frontend run build -> success
T-059 P1 Add Rapid Router CSV ingestion validator + dry-run import path (schema/lint + duplicate/SKU checks + preview) Engineering DONE Completed core CSV validator + duplicate checks + dry-run/apply path + admin API endpoint + regression tests python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 39 passed
T-067 P1 Execute Rapid Router 10-point readability/simplicity cleanup pass Engineering IN_REVIEW Core 3-phase refactor is implemented in RapidRouter.tsx; run desktop/mobile visual QA and capture any spacing/copy nits before marking done npm --prefix frontend run build -> success; visual QA checklist pass for step header, staged actions, fix-list-only validation, helper readability, and admin modal flow
T-069 P1 Implement user-requested 12-point Rapid Router + global UI visibility overhaul Engineering IN_REVIEW Deep-dive compliance pass applied final cleanup patches (remove leftover column-focus/copy controls and unify helper compare label to Device details); perform hosted browser QA/deploy smoke npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 45 passed, 9 warnings; python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py -> 88 passed, 9 warnings
T-070 P1 Run a targeted visual polish sprint for Rapid Router and shared rails/cards Engineering IN_REVIEW Complete hosted desktop/tablet/mobile screenshot QA for the newly shipped polish pass and capture any residual spacing/copy nits before publish npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 45 passed, 9 warnings
T-066 P1 Profile HF Space startup/wake latency with real runtime timings and recommend env tuning Engineering IN_REVIEW Capture startup stage timings from runtime logs/health (bootstrap, csv_sanity, preload, integrity) and decide preload policy (light vs none) for production HF boot logs include per-stage timing and restart median improves without regressions
T-063 P2 Clean up third-party deprecation warning noise in Rapid Router test runs (reportlab + SWIG/PyMuPDF) Engineering DONE Added narrowly scoped warning filters/containment around vetted external noise while preserving real exception visibility python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 151 passed with warning noise contained
T-065 P2 Contain known benign MuPDF startup font-warning noise from seed-doc setup-note extraction Engineering DONE Wrapped setup-note extraction in targeted stderr containment to suppress known benign font spam only Startup probe python3 - <<'PY' ... RapidRouterCore(...) ... PY now prints clean startup_ok 12 without repeated MuPDF font warning
T-060 P1 Add Rapid Router <-> Knowledgebase catalog sync contract checks Engineering DONE Added contract test asserting seeded Rapid Router catalog remains queryable via KB fast paths and provider wiring python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 151 passed
T-061 P1 Add per-stage latency instrumentation and SLO guardrails for KB helper paths Engineering DONE Added per-stage timing fields + stage SLO evaluation in eval script and shard aggregate summary output cd backend && python3 -m pytest -q app/test_unified_kb_eval150_script.py -> 6 passed
T-062 P1 Strengthen store schema-version migration tests and strict validation for Rapid Router store JSON Engineering DONE Hardened migration/load paths for malformed versions/products/prices and added regression coverage python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py (included in Phase 4 gate 151 passed)
T-058 P1 Integrate Rapid Router store data into Unified Knowledgebase router-docs answers Engineering DONE Completed provider injection + deterministic Rapid Router fast paths + fallback coverage tests cd backend && python3 -m pytest -q app/test_unified_kb_core.py app/test_knowledgebase_api.py app/rapid_router/test_rapid_router_core.py -> 92 passed; manual API check of /api/knowledgebase/message with mode=router_docs returned deterministic_rapid_router_catalog_list_fast with rapid_router_store.json sources
T-057 P1 Validate first-login/re-login with real Auth0 credentials in auth-required runtime Engineering DONE Credentialed hosted login/logout verification passed after audience-optional + legacy-placeholder-ignore auth fixes cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed; cd frontend && npx playwright test e2e/auth.spec.ts --reporter=line -> 6 passed
T-056 P1 Run a focused UX cleanup pass for Rapid Router/toolbox (progressive disclosure + clearer hierarchy) Engineering DONE Completed full 10-item UX pass in one batch (summary rail, completion chips, jump links, table view, review modal, mobile sticky CTA, and helper readability controls) cd frontend && npm run build; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py
T-055 P0 Implement MSRP + Masters contact dropdown + configuration-options pricing in Rapid Router Engineering DONE Completed and pushed in commit 176ff8f cd backend && python3 -m pytest app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py app/test_tab_final_pass_matrix.py -q -> 31 passed; cd frontend && npm run build passed
T-054 P2 Track Rapid Router file-size growth during helper rollout Engineering IN_REVIEW Monitor RapidRouter.tsx line growth and decide if helper should be split into dedicated component/module frontend/src/pages/RapidRouter.tsx line count trend captured across releases
T-053 P1 Add Rapid Router in-page helper chatbot for router selection + rep Q&A Engineering IN_REVIEW Run targeted rep-prompt QA, add feature flag, and add focused frontend tests for helper interactions Helper responses remain within timeout budget and do not regress Rapid Router submit flow
T-043 P1 Recover local access to backend/app/test_unified_kb_core.py in Dropbox workspace and commit pending local delta Engineering DONE Confirmed file readability and successful targeted pytest execution from Dropbox workspace wc -l backend/app/test_unified_kb_core.py read succeeds; cd backend && python3 -m pytest -q app/test_unified_kb_core.py -> 93 passed
T-042 P1 Reduce long-tail latency on top 10 slow cases while preserving 150/150 pass rate Engineering TODO Profile and trim delegate/web-fallback on 66,111,86,91,88,85,92,82,93,99 with per-phase budgets and cache hits Re-run CHUNK_SIZE=10 START_ID=1 END_ID=150 and target p95 < 7000ms, pass=150
T-037 P1 Post-commit stabilization for residual 75-case failure (ID 75) Engineering TODO Reproduce case 75 with profiler traces and patch mixed Verizon gateway + POTS end-to-end response path docs/evals/shards5_eval75/unified_kb_eval150_shards10_summary.json shows no failed IDs
T-064 P2 Stabilize Rapid Router 25-case suite residual semantic miss (ID 3) Engineering TODO Inspect shards5_rapidrouter25 case 3 response wording and tighten quote-clarification template for W1850 ambiguity without relaxing guardrails Re-run CHUNK_SIZE=5 START_ID=1 END_ID=25 CASES_PATH=../docs/evals/unified_kb_eval25_rapid_router_cases.json OUT_DIR=../docs/evals/shards5_rapidrouter25 and target 25/25
T-029 P1 Eliminate 75-case p95 regression versus legacy baseline (318.1ms) while preserving pass rate Engineering TODO Profile slow 75 shards (58-64 cluster) and reduce tail in POTS compare/assumption paths p95 <= 318.1ms target vs docs/evals/shards5_eval75/unified_kb_eval75_shards5_summary.json
T-030 P1 Finalize commit policy for root docs/faq/FAQ_ongoing_candidates.csv churn Engineering DONE Adopted pytest-time isolation policy via backend conftest.py so local regressions default to temp FAQ candidate path unless explicitly overridden Repeated test runs preserve root FAQ hash (sha256 unchanged before/after targeted pytest)
T-031 P2 Add focused tests for _parallel_index_search budget behavior under slow index calls Engineering DONE Added deterministic slow-stub tests for bounded in-flight submission and shared executor reuse cd backend && python3 -m pytest -q app/test_unified_kb_core.py -> 93 passed
T-034 P2 Add dedicated latency guard tests for long-form POTS rewrite path in conversation regression suite Engineering DONE Added long-form single-turn and cumulative-turn latency guard tests in POTS regression suite cd backend && python3 -m pytest -q app/test_pots_conversation_regression.py -> 3 passed
T-036 P1 Clear remaining 75-case failure (ID 75) without degrading other MSRP/Verizon intents Engineering TODO Reproduce case 75, adjust mixed Verizon gateway + POTS synthesis response to satisfy semantic scorer while preserving guardrails docs/evals/shards5_eval75/unified_kb_eval150_shards10_summary.json shows no failed IDs
T-038 P2 Prevent test-run churn in root docs/faq/FAQ_ongoing_candidates.csv during local regressions Engineering DONE Added session-level fixture that routes FAQ candidate writes to temp path by default in test runs Root FAQ candidate file hash remains stable after repeat pytest runs under default test env
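
The eval gates tracked in T-079 and T-086 compare a suite's pass count against a 92% floor and a 95% target. The arithmetic can be sketched as below. This is a minimal illustration only: `GateResult`, `gate_status`, `summarize`, and the status labels are hypothetical names, not part of the eval runner, and the thresholds are copied from the T-079 row.

```python
from dataclasses import dataclass

FLOOR = 0.92   # "maintain >=92%" (T-079)
TARGET = 0.95  # "target >=95%" (T-079)


@dataclass(frozen=True)
class GateResult:
    """Pass count for one eval suite run (hypothetical container)."""
    passed: int
    total: int

    @property
    def rate(self) -> float:
        return self.passed / self.total


def gate_status(result: GateResult) -> str:
    """Classify a run against the floor and target; labels are illustrative."""
    if result.rate >= TARGET:
        return "target_met"
    if result.rate >= FLOOR:
        return "above_floor"
    return "below_floor"


def summarize(failed_ids: list[int], total: int) -> str:
    """Format a run the way the Verification column reports it."""
    result = GateResult(passed=total - len(failed_ids), total=total)
    return f"{result.passed}/{result.total} ({result.rate:.1%}) {gate_status(result)}"


if __name__ == "__main__":
    # Failed-ID lists taken from the Phase 3 gate entries in T-086.
    print(summarize([24, 36, 88, 98, 99, 104, 112, 129], 150))
    print(summarize([3], 75))
    print(summarize([], 50))
```

Under these assumptions the best 150-case run (142/150) clears the floor but not the target, which is why T-079 stays in review while the 75- and 50-case gates already meet the target.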

Backlog

ID Priority Task Owner Notes
D-240 Added four new required Rapid Router approval attestations (180-day commitment, quote approval before IMEI release, active MDN before shipment, and truth/correctness) with matching frontend + backend validation and persistence 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 53 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success
B-007 P2 Deferred by instruction: add Paste order lines parser (5 CR602, 2 RX60) to auto-fill quantities/models Engineering Explicitly excluded from current execution cycle; revisit only on direct user re-approval
B-005 P2 Add optional cleanup hook for shared search executor in long-lived workers Engineering Defensive hardening for unusual shutdown environments
B-006 P2 Add script-level tests for shard runner TREND_FILE/FAQ out-dir isolation defaults Engineering Guard against regressions in eval tooling
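
For B-005, one possible shape of an optional cleanup hook is sketched below. It assumes the shared search executor is a lazily created `concurrent.futures.ThreadPoolExecutor`. The function names, the `kb-search` thread prefix, and the worker count are hypothetical and do not reflect the actual code in `backend/app/knowledgebase/core.py`.

```python
import atexit
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

_lock = threading.Lock()
_executor: Optional[ThreadPoolExecutor] = None


def get_search_executor(max_workers: int = 4) -> ThreadPoolExecutor:
    """Return the shared executor, creating it lazily on first use."""
    global _executor
    with _lock:
        if _executor is None:
            _executor = ThreadPoolExecutor(
                max_workers=max_workers, thread_name_prefix="kb-search"
            )
        return _executor


def shutdown_search_executor(wait: bool = True) -> bool:
    """Shut down the shared executor if one exists; safe to call repeatedly.

    Returns True when an executor was actually shut down.
    """
    global _executor
    with _lock:
        executor, _executor = _executor, None
    if executor is None:
        return False
    # cancel_futures requires Python 3.9+; queued work is dropped, running work finishes.
    executor.shutdown(wait=wait, cancel_futures=True)
    return True


# Defensive default: long-lived workers can also call shutdown_search_executor()
# from their own lifecycle callbacks instead of relying on interpreter exit.
atexit.register(shutdown_search_executor)
```

Because the hook resets the module-level reference before shutting down, a later `get_search_executor()` call simply creates a fresh pool, so the cleanup stays optional.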

Done (Recent)

ID Task Completed On Evidence
D-226 Rapid Router review validation links now open the target accordion chain and focus the exact invalid field, fixing closed-accordion jumps in customer/order sections 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 8 passed; cd frontend && npm run build -> success
D-251 Pruned non-canonical timestamped eval history and local workspace clutter while preserving the canonical eval baselines, runner assets, and cleanup policy docs 2026-03-07 backend/scripts/cleanup_repo_artifacts.py, docs/evals/README.md; python3 backend/scripts/cleanup_repo_artifacts.py --no-backup -> removed_dirs=75, removed_files=62; python3 backend/scripts/cleanup_repo_artifacts.py --dry-run --no-backup -> 0 pending removals; canonical probes for latest_eval25_guarded_gpt_check, latest_eval50_guarded_gpt_check, latest_eval6_concept_check, release_gate, shards10, and shards5_eval75 all returned OK
D-233 Split the remaining dirty worktree into auditable cleanup batches, refreshed the stale Rapid Router final-pass matrix fixture, reran the full backend suite clean, normalized visible-copy casing on remaining active frontend surfaces, and archived timestamped eval reruns outside the repo 2026-03-07 backend/app/knowledgebase/core.py, backend/app/router_rag/core.py, backend/app/test_router_rag_module.py, backend/app/test_tab_final_pass_matrix.py, backend/app/test_unified_kb_core.py, frontend/src/components/PromptCoach.tsx, frontend/src/pages/MastersAI.tsx, frontend/src/pages/PotsIntake.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/RoutersAssistant.tsx, frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/TelcoCalculator.test.tsx; cd backend && .venv/bin/python -m pytest -q app/test_tab_final_pass_matrix.py -k rapid_router_final_pass_30_case_matrix -> 1 passed; cd backend && bash scripts/test_backend.sh --full -> 523 passed; cd frontend && npx vitest run src/pages/TelcoCalculator.test.tsx --reporter=dot -> 2 passed; cd frontend && npm run build -> success; timestamped eval reruns archived at /Users/petedunn/Desktop/codex_eval_archives/cleanup_eval_artifacts_20260307.tar.gz
D-241 Fixed Rapid Router order-options completion so advanced notes are optional when at least one advanced checkbox is selected, matching backend validation and removing the false review blocker 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 7 passed; cd frontend && npm run build -> success; cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 54 passed
D-242 Normalized visible capitalization across active frontend surfaces so form labels/actions use sentence case by default and title case is reserved for structural headings/proper nouns 2026-03-07 frontend/src/components/PromptCoach.tsx, frontend/src/pages/RapidRouter.tsx, frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsIntake.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/MastersAI.tsx, frontend/src/pages/PotsAssistant.tsx, frontend/src/pages/RoutersAssistant.tsx; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx src/pages/TelcoCalculator.test.tsx src/components/FloatingRouterHelper.test.tsx --reporter=dot -> 13 passed; cd frontend && npm run build -> success
D-225 Rapid Router review validation links now open the target accordion and focus the exact invalid field, fixing the closed-accordion navigation bug in customer/order sections 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 7 passed; cd frontend && npm run build -> success
D-243 Rapid Router advanced configuration notes now become optional when any advanced task checkbox is selected; notes remain required only for advanced requests with no selected checkbox option 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && .venv/bin/python -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 53 passed; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 6 passed; cd frontend && npm run build -> success
D-250 Updated Rapid Router so the flow starts on Browse, defaults payment to BoBo, requires a 7-digit BoBo Bill-to phone, and requires customer-information authorization/communication consent plus an authorization-provider name before submit; synced backend order normalization/PDF/email output to persist those fields 2026-03-07 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && .venv/bin/python -m pytest -x -vv app/rapid_router/test_rapid_router_core.py -> 28 passed; cd backend && .venv/bin/python -m pytest -q app/test_rapid_router_api_shell.py -> 24 passed; frontend tsc passed but frontend Vitest/build stalled after startup under the current unified-exec saturation
D-223 Added shared preferred-public-source guidance to every active server-side web-assisted assistant path so web fallback now explicitly prefers opendevelopment.verizonwireless.com, masterstelecom.com, and 5gstore.com when relevant 2026-03-07 backend/app/assistant_fallback.py, backend/app/knowledgebase/core.py, backend/app/router_rag/core.py, backend/app/masters_ai/core.py, backend/app/pots_ai/core.py; python3 -m py_compile backend/app/assistant_fallback.py backend/app/router_rag/core.py backend/app/masters_ai/core.py backend/app/pots_ai/core.py backend/app/knowledgebase/core.py backend/app/test_router_rag_module.py backend/app/test_unified_kb_core.py backend/app/test_masters_conversation_regression.py backend/app/test_pots_conversation_regression.py -> success; direct smoke -> SMOKE_OK
D-238 Optimized the three dominant broad-suite latency buckets enough to restore 75/75 and 150/150 accuracy with zero stage-budget exits, while leaving a smaller deterministic tail-latency cleanup open for a follow-up pass 2026-03-07 backend/app/assistant_fallback.py, backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; cd backend && .venv/bin/python -m pytest -q app/test_unified_kb_core.py -k 'masters_contact_center_doc_lookup_prefers_filename_match_without_search or masters_pots_materials_overview_prefers_doc_fast_without_search or pots_use_case_compare_prefers_cached_provider_fast or pots_provider_emphasis_summary_routes_fast or pots_generic_objection_prompt_skips_deep_search or pots_discovery_first_routes_to_concept_fast or router_inventory_audit_skips_concept_preflight or router_gateway_device_type_skips_concept_preflight' -> 8 passed; bash backend/scripts/test_backend.sh --full -> 510 passed; docs/evals/20260307_020040_eval75_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 75 / 75 passed, stage_budget_exit_rate_pct=0.0; docs/evals/20260307_020040_eval150_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 150 / 150 passed, stage_budget_exit_rate_pct=0.0
D-239 Added a TTL-backed keyed title cache for Masters mention lookups, proved the cache works under refresh suppression, and confirmed that the remaining 31/32/35/37 latency tail still lives in the delegate path rather than in title rescans 2026-03-07 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; cd backend && .venv/bin/python -m pytest -q app/test_unified_kb_core.py -k 'masters_securefax_doc_lookup_prefers_file_discovery or masters_securefax_doc_lookup_uses_cached_title_rows_within_refresh_ttl or masters_contact_center_doc_lookup_prefers_filename_match_without_search or masters_pots_materials_overview_prefers_doc_fast_without_search' -> 4 passed; bash backend/scripts/test_backend.sh --full -> 511 passed; docs/evals/20260307_023133_eval150_masters_lookup_slice/unified_kb_eval150_31_37.json -> 7 / 7 passed, avg_latency_ms=2499.04, p95_ms=4383.97
D-237 Fixed the 150 case-133 overblock by narrowing the code-adjudication regex to require code/inspection/AHJ context around approved/approval, and profiled the remaining broad-suite latency clusters before the next 75/150 rerun 2026-03-07 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; cd backend && .venv/bin/python -m pytest -q app/test_unified_kb_core.py -k 'allows_approved_masters_references_outline_request or blocks_code_adjudication_globally or blocks_exact_current_lead_times_globally or blocks_exact_current_band_support_globally or blocks_exact_current_certification_status_globally or blocks_exact_current_lifecycle_date_globally or blocks_exact_current_availability_globally' -> 7 passed; direct case-133 spot-check returned domain='masters', retrieval_mode='masters_outline_fast', timing_ms.total=4.37; bash backend/scripts/test_backend.sh --full -> 502 passed; Router RAG smoke -> 10 queries / 0 failures
D-236 Reran guarded-GPT 25, 50, 75, and 150 against the current baselines, confirming 25 and 50 are stable while exposing new broad-suite latency tails and one 150 overblock (ID 133) that now define the next cleanup target 2026-03-07 docs/evals/20260307_010031_eval25_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 25 / 25 passed, p95=499.50ms; docs/evals/20260307_010031_eval50_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 50 / 50 passed, p95=381.53ms; docs/evals/20260307_010031_eval75_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 75 / 75 passed, p95=3645.73ms, ab_gate.p95_non_regression=False; docs/evals/20260307_010031_eval150_guarded_gpt_rerun/unified_kb_eval150_shards10_summary.json -> 149 / 150 passed, failed_ids=[133], stage_budget_exit_rate_pct=1.33
D-235 Expanded the guarded-GPT concept pack to 50 reusable questions in 5-question shards, added global early refusals for exact/current risky asks, and validated the broader pack at 50 / 50 passed without latency regression 2026-03-07 backend/scripts/run_unified_kb_eval50_guarded_gpt_chunks.sh, docs/evals/unified_kb_eval50_guarded_gpt_cases.json, backend/app/knowledgebase/core.py, backend/app/assistant_fallback.py, backend/app/test_assistant_fallback.py, backend/app/test_unified_kb_core.py, backend/app/test_masters_conversation_regression.py; cd backend && .venv/bin/python -m pytest -q app/test_assistant_fallback.py app/test_unified_kb_core.py -k 'contact_center or exact_current or code_adjudication or high_risk_code_compliance or lead_times_globally or band_support_globally or certification_status_globally or lifecycle_date_globally or availability_globally' -> 11 passed; set -a && source .env.codex && set +a && cd backend && OUT_DIR=../docs/evals/latest_eval50_guarded_gpt_check ./scripts/run_unified_kb_eval50_guarded_gpt_chunks.sh -> 50 / 50 passed; bash backend/scripts/test_backend.sh --full -> 501 passed
D-234 Implemented Phase 1 and Phase 2 together by hardening blocked-case tests, fixing false-positive regulatory matching, narrowing strict-citation gating for safe concept explainers, and expanding deterministic concept preflight so the reusable 25-case guarded-GPT pack now runs 25 / 25 passed with the POTS concept shard in low-millisecond latency 2026-03-06 backend/app/assistant_fallback.py, backend/app/knowledgebase/core.py, backend/app/pots_ai/core.py, backend/app/test_assistant_fallback.py, backend/app/test_pots_conversation_regression.py, backend/app/test_unified_kb_core.py; cd backend && .venv/bin/python -m pytest -q app/test_assistant_fallback.py app/test_pots_conversation_regression.py app/test_unified_kb_core.py -k 'ul_substring or real_ul_compliance or replacement_plain_english or multisite_stays_internal_fast or dual_pathway or copper_sunset' -> 7 passed; cd backend && .venv/bin/python -m pytest -q app/test_assistant_fallback.py app/test_router_rag_module.py app/test_masters_conversation_regression.py app/test_pots_conversation_regression.py app/test_unified_kb_core.py -> 205 passed; set -a && source .env.codex && set +a && backend/scripts/run_unified_kb_eval25_guarded_gpt_chunks.sh -> 25 / 25 passed; bash backend/scripts/test_backend.sh --full -> 493 passed
D-244 Standardized all active backend/runtime LLM defaults, current env examples, and local repo env pins on gpt-5-mini, fixed the POTS GPT-5 temperature incompatibility, and revalidated the guarded-GPT acceptance pack at 25 / 25 passed 2026-03-06 README.md, backend/.env.test.example, .env.codex, backend/.env.codex, backend/app/main.py, backend/app/chat_nlu.py, backend/app/knowledgebase/core.py, backend/app/router_rag/core.py, backend/app/masters_ai/core.py, backend/app/pots_ai/core.py, backend/app/routers/router_core.py, backend/app/test_pots_conversation_regression.py, backend/scripts/unified_kb_eval150.py, backend/scripts/router_rag_eval50.py, backend/scripts/router_rag_smoke.py, backend/scripts/run_unified_kb_eval150_chunks.sh; cd backend && .venv/bin/python -m pytest -q app/test_pots_conversation_regression.py -k 'concept_fallback_for_generic_pots_question or llm_synthesis_omits_temperature_for_gpt5_models' -> 2 passed; bash backend/scripts/test_backend.sh --full -> 478 passed; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npm run build -> success; cd frontend && npm run test -> 31 files / 111 passed; docs/evals/20260306_230403_eval25_gpt5mini_default/unified_kb_eval150_shards10_summary.json -> 25 / 25 passed
D-245 Added a canonical reusable 25-question guarded-GPT eval pack (5 shards of 5) plus a dedicated shard runner, then stabilized the suite to 24 / 25 passed (96.0%) with only residual case 13 left open 2026-03-06 docs/evals/unified_kb_eval25_guarded_gpt_cases.json, backend/scripts/run_unified_kb_eval25_guarded_gpt_chunks.sh, docs/evals/README.md, docs/evals/latest_eval25_guarded_gpt_check/unified_kb_eval150_shards10_summary.json; targeted reruns for IDs 6,7,8,11,15; final aggregate 24 / 25 passed, failed_ids=[13]
D-246 Added a shared assistant-family concept-fallback module with allow/deny gates, gpt-5-mini model-only fallback, explicit provenance labels, fallback-only +4s budget extension, and GPT+web refinement only when the concept answer still needed current/public information 2026-03-06 backend/app/assistant_fallback.py, backend/app/knowledgebase/core.py, backend/app/router_rag/core.py, backend/app/masters_ai/core.py, backend/app/pots_ai/core.py, backend/app/main.py, frontend/src/utils/chatProvenance.ts, frontend/src/components/chat/ConversationHeader.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/MastersAI.tsx, frontend/src/pages/PotsAssistant.tsx, frontend/src/pages/RoutersAssistant.tsx; cd backend && .venv/bin/python -m pytest -q app/test_assistant_fallback.py app/test_unified_kb_core.py app/test_router_rag_module.py app/test_masters_conversation_regression.py app/test_pots_conversation_regression.py app/test_chat_guidance_api.py app/test_knowledgebase_api.py -> 202 passed; bash backend/scripts/test_backend.sh --full -> 477 passed; cd frontend && npm run test -> 31 files / 111 passed; docs/evals/latest_eval6_concept_check/unified_kb_eval150_shards10_summary.json -> 6 / 6 passed
D-247 Completed the requested full validation sweep: backend full suite green, frontend full suite green, live Playwright reduced to one hosted POTS provider-coverage miss, and OpenAI shard suites landed at 146/150 (97.3%), 73/75 (97.3%), and 50/50 (100%); also patched local provider-card building to backfill missing providers such as MetTel from indexed evidence mapped back to known files 2026-03-06 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; bash backend/scripts/test_backend.sh --full -> 459 passed + Router RAG smoke 10/10; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npm run build -> success; cd frontend && npm run test -> 30 files / 106 passed; cd frontend && npx playwright test --config=playwright.config.ts -> 9 passed / 1 failed / 4 skipped; docs/evals/20260306_190557_eval150_rerun/unified_kb_eval150_shards10_summary.json; docs/evals/20260306_192259_eval75_rerun/unified_kb_eval150_shards10_summary.json; docs/evals/20260306_193023_eval50_rerun/unified_kb_eval150_shards10_summary.json; focused regressions 2 passed + 2 passed
D-229 Enforced the current UI-lock scan rules by removing collapsed-state banners, hiding the default header status button, demoting coach/browse actions that competed with the page primary CTA, and consolidating Rapid Router stage progression under the sticky cart 2026-03-06 frontend/src/components/AssistantWorkspace.tsx, frontend/src/components/ConversationalSidePanel.tsx, frontend/src/components/PromptCoach.tsx, frontend/src/components/BrandHeader.tsx, frontend/src/pages/RapidRouter.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/AssistantWorkspace.test.tsx src/components/PromptCoach.test.tsx src/components/BrandHeader.test.tsx src/pages/RapidRouter.test.tsx --reporter=dot -> 11 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 30 files / 105 passed; git diff --check -> success
D-228 Standardized UnifiedKnowledgebase, RouterKnowledgebase, RoutersAssistant, MastersAI, and PotsAssistant on one assistant shell with shared auto-collapsing setup, then added focused setup-panel regression coverage 2026-03-06 frontend/src/components/AssistantWorkspace.tsx, frontend/src/components/AssistantWorkspace.test.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/RoutersAssistant.tsx, frontend/src/pages/MastersAI.tsx, frontend/src/pages/PotsAssistant.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/components/AssistantWorkspace.test.tsx src/components/PageArchetypes.test.tsx --reporter=dot -> 4 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 30 files / 105 passed
D-227 Rebuilt RapidRouter into a staged commerce flow with one active step at a time, a sticky cart rail, and collapsed Commerce tools, then added focused regression coverage for the new behavior 2026-03-06 frontend/src/pages/RapidRouter.tsx, frontend/src/pages/RapidRouter.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/RapidRouter.test.tsx --reporter=dot -> 2 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 29 files / 103 passed
D-226 Collapsed Telco assumptions, what-if mode, diagnostics, quote helpers, scenario JSON/CSV, and assistant coaching into one shared Advanced drawer so the default calculator surface stays on the business flow 2026-03-06 frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/TelcoCalculator.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/TelcoCalculator.test.tsx --reporter=dot -> 2 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 28 files / 101 passed
D-248 Rebuilt TelcoCalculator as a four-step sequence (Locations, Pricing, Results, Export) and added regression coverage for the new step flow 2026-03-06 frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/TelcoCalculator.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/TelcoCalculator.test.tsx --reporter=dot -> 1 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 28 files / 100 passed
D-224 Replaced paragraph-style POTS instructions with a shared three-line StepGuide pattern across the merged estimate/intake flow so each step now states what it does, what is needed now, and what happens next 2026-03-06 frontend/src/components/ui.tsx, frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsIntake.tsx, frontend/src/pages/PotsEstimateIntake.test.tsx, frontend/src/pages/PotsSavingsEstimator.test.tsx, frontend/src/pages/PotsIntake.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsEstimateIntake.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 23 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
D-249 Flattened the embedded PotsEstimateIntake shell by adding explicit embedded-mode rendering to the merged wrapper, estimator, and intake so the combined flow no longer feels like nested full-page cards 2026-03-06 frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsIntake.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsEstimateIntake.test.tsx src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 23 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
D-222 Converted PotsWorkspace routing into a one-question-at-a-time conversation with answer cards, compact Why this matters disclosure, and a final review/edit step while preserving the existing triage payload 2026-03-06 frontend/src/pages/PotsWorkspace.tsx, frontend/src/pages/PotsWorkspace.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 10 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 99 passed
D-221 Moved active-project creation and saved-project switching into the Project tools drawer so PotsWorkspace no longer shows setup UI in the main wizard by default 2026-03-06 frontend/src/pages/PotsWorkspace.tsx, frontend/src/pages/PotsWorkspace.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 9 passed; cd frontend && npm run build -> success
D-220 Converted PotsWorkspace from a stacked dashboard into a true wizard shell with one active step card plus a focused utilities drawer for routing/intake 2026-03-06 frontend/src/pages/PotsWorkspace.tsx, frontend/src/pages/PotsWorkspace.test.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/pages/PotsWorkspace.test.tsx --reporter=dot -> 8 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 27 files / 97 passed; git diff --check -> success
D-219 Completed the app-wide destructive-action confirmation sweep, added shared confirmation helper + cancel-aware slash resets, and covered the highest-risk cancel paths with focused frontend regression tests 2026-03-06 frontend/src/utils/confirmAction.ts, frontend/src/utils/chatCommands.ts, frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsIntake.tsx, frontend/src/pages/PotsWorkspace.tsx, frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/RapidRouter.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/MastersAI.tsx, frontend/src/pages/PotsAssistant.tsx, frontend/src/pages/RoutersAssistant.tsx, frontend/src/components/FloatingRouterHelper.tsx; cd frontend && npx tsc -p tsconfig.json --noEmit -> success; cd frontend && npx vitest run src/utils/chatCommands.test.ts src/utils/confirmAction.test.ts src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 27 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 24 files / 86 passed
D-218 Finished the remaining POTS intake density pass, added focused intake/workspace regression coverage, and used desktop/mobile browser QA to justify a true single-open workspace accordion plus closed-by-default intake scope disclosures 2026-03-06 frontend/src/pages/PotsIntake.tsx, frontend/src/pages/PotsIntake.test.tsx, frontend/src/pages/PotsWorkspace.tsx, frontend/src/pages/PotsWorkspace.test.tsx; cd frontend && npx vitest run src/pages/PotsIntake.test.tsx src/pages/PotsEstimateIntake.test.tsx src/pages/PotsWorkspace.test.tsx --reporter=dot -> 13 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 23 files / 79 passed; local Playwright/browser QA at 1440x1024 and 390x844 confirmed the final disclosure defaults
D-217 Simplified the active POTS estimate/intake experience by hiding support chrome behind disclosures, gating estimate inputs behind customer basics, and collapsing the full estimate math until results are requested 2026-03-06 frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsIntake.tsx, frontend/src/pages/PotsSavingsEstimator.test.tsx; cd frontend && npx vitest run src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx --reporter=dot -> 6 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 22 files / 73 passed
D-216 Clarified POTS estimator start paths with an explicit three-mode chooser and made intake seeding follow the selected mode (quick estimate, totals now, site-by-site now) 2026-03-06 frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsSavingsEstimator.test.tsx, frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsEstimateIntake.test.tsx; cd frontend && npx vitest run src/pages/PotsSavingsEstimator.test.tsx src/pages/PotsEstimateIntake.test.tsx --reporter=dot -> 5 passed; cd frontend && npm run build -> success; cd frontend && npm run test -> 22 files / 72 passed
D-215 Verified hosted auth-required runtime after removing legacy masters-toolkit-api audience dependency: smoke suite and full credentialed login/logout flow both passed 2026-03-06 cd frontend && npx playwright test e2e/auth.spec.ts --reporter=line -> 6 passed; cd frontend && npx playwright test e2e/auth.full-flow.spec.ts --reporter=line -> 1 passed
D-213 Added dedicated Playwright coverage for Rapid Router two-user saved-profile isolation and enabled ignored local frontend/.env.e2e(.local) loading for repeatable credentialed hosted runs 2026-03-05 frontend/e2e/rapid-router.memory-isolation.spec.ts, frontend/playwright.config.ts, frontend/e2e.env.template, frontend/package.json; npm --prefix frontend run build -> success; cd frontend && npx playwright test e2e/rapid-router.memory-isolation.spec.ts --list -> 1 test listed
D-214 Removed active reliance on legacy Auth0 audience https://masters-toolkit-api by treating it as invalid/ignored in frontend and backend auth config, and added user-facing callback guidance for the exact Service not found error 2026-03-06 frontend/src/auth/config.ts, frontend/src/auth/errorUtils.ts, frontend/src/auth/config.test.ts, frontend/src/auth/errorUtils.test.ts, backend/app/auth.py, backend/app/test_auth.py; cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts src/components/HealthStatusModal.test.tsx -> 16 passed; python3 -m pytest -q backend/app/test_auth.py backend/app/test_startup_rate_limit.py -> 31 passed; npm --prefix frontend run build -> success
D-212 Scoped shared Smart Profile, resume cards, POTS carryover, and Rapid Router repeat-draft memory per authenticated user so customer data is no longer browser-global across logins 2026-03-05 frontend/src/utils/customerMemory.ts, frontend/src/utils/customerMemory.test.ts, frontend/src/auth/AuthGate.tsx, frontend/src/main.tsx, frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/utils/customerMemory.test.ts --pool=threads --maxWorkers=1 -> 4 passed
D-211 Fixed the battery-router shortlist omission so the removable option (CR202-Lite) is preserved for "best routers with batteries" queries, and added regression coverage 2026-03-05 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; PYTHONPATH=backend python3 -m pytest -q backend/app/test_unified_kb_core.py -k "battery_best_list_keeps_removable_option" -> 1 passed; runtime harness probe now returns CR202-Lite in the battery options table
D-210 Executed an additional 150 rerun in an attempt to reach the >=95% target; the run fell short at 94.0%, and the stochastic-variance outcome was logged for follow-up (T-079) 2026-03-05 cd backend && CHUNK_SIZE=15 START_ID=1 END_ID=150 SEMANTIC_POLICY=all OUT_DIR=../docs/evals/20260305T021154_phase3_gate150_rerun2_final CASES_PATH=../docs/evals/unified_kb_eval150_cases.json ./scripts/run_unified_kb_eval150_chunks.sh -> 141/150 (94.0%), failed IDs [48,55,78,89,99,107,110,112,118], stage_budget_exits=0
D-209 Completed gameplan Phase 3 evaluation verification gate command set (150/75/50) with quality floor maintained and docs/eval artifacts published 2026-03-05 150: 142/150 (94.7%), failed [24,36,88,98,99,104,112,129] (docs/evals/20260305T013817_phase3_gate150_final/unified_kb_eval150_shards10_summary.json); 75: 74/75 (98.7%), failed [3] (docs/evals/20260305T015614_phase3_gate75_final/unified_kb_eval150_shards10_summary.json); 50: 50/50 (100.0%), failed [] (docs/evals/20260305T020530_phase3_gate50_final/unified_kb_eval150_shards10_summary.json)
D-208 Completed gameplan Phase 2 consolidation verification gate and moved T-076/T-077 to hosted sign-off (IN_REVIEW) 2026-03-05 npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 59 tests passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py backend/app/routers/router_tab_smoke_test.py backend/app/test_tab_final_pass_matrix.py backend/app/test_pots_response_contract.py backend/app/test_pots_conversation_regression.py -> 68 passed
D-207 Completed gameplan Phase 5 repo/tooling hygiene hardening (T-030, T-038, T-031, T-034, T-043) with verification reruns 2026-03-05 Added backend/app/conftest.py fixture isolation + warning filters; added _parallel_index_search budget tests and POTS long-form latency guard tests; verified cd backend && python3 -m pytest -q app/test_unified_kb_core.py app/test_pots_conversation_regression.py app/test_unified_kb_eval150_script.py -> 102 passed; root FAQ hash unchanged across repeat run
D-206 Completed gameplan Phase 4 data/contract/migration hardening (T-081, T-060, T-061, T-062, T-063, T-065) 2026-03-05 Updated Crown deterministic facts in feb2026routers.csv; added Rapid Router/KB contract tests and store migration hardening/tests; added stage-timing SLO outputs in eval scripts; contained MuPDF/reportlab noise; phase gate python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 151 passed
D-205 Completed Phase 1 verification gate from the saved gameplan (frontend build/test + Rapid Router backend regression command set) 2026-03-05 npm --prefix frontend run build -> success; npm --prefix frontend run test -> 19 files / 59 tests passed; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 49 passed, 9 warnings
D-204 Completed Phase 0 auth verification gate from the saved gameplan with hosted URL substitution and explicit credential-blocker capture 2026-03-05 cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts -> 13 passed; python3 -m pytest -q backend/app/test_auth.py -> 21 passed; cd frontend && E2E_DISABLE_WEBSERVER=true E2E_BASE_URL=https://crazycrazypete-masters-four-tab-openai.hf.space npx playwright test e2e/auth.full-flow.spec.ts -> 1 skipped
D-203 Saved next-thread phased execution gameplan for remaining fixes/enhancements and recorded explicit parser deferral scope 2026-03-04 docs/dev/next_thread_remaining_fixes_enhancements_gameplan.md; T-086 added; B-007 parser deferred
D-202 Re-ran post-edit verification gate for Smart Profile/customer-memory + resume/carryover + KB action-chip rollout before handoff 2026-03-04 git status --short; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/utils/customerMemory.test.ts --pool=threads --maxWorkers=1 -> 3 passed
D-201 Implemented shared Smart Profile/Customer Memory utility + resume/repeat work cards, hardened estimate->intake carryover replay, and Knowledgebase action chips to Router Helper / Rapid Router order draft 2026-03-04 frontend/src/utils/customerMemory.ts, frontend/src/utils/customerMemory.test.ts, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RapidRouter.tsx, frontend/src/App.tsx; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/utils/customerMemory.test.ts --pool=threads --maxWorkers=1 -> 3 passed
D-200 Published consolidated checkpoint commit to both required remotes (origin, hf-fourtab) 2026-03-04 commit fcd2934; git push origin main -> e1ec24c..fcd2934; git push hf-fourtab main -> e1ec24c..fcd2934
D-199 Added a persistent one-click Slack support chip to the shared BrandHeader (global across tabs) 2026-03-04 frontend/src/components/BrandHeader.tsx, frontend/src/components/BrandHeader.test.tsx; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx --pool=threads --maxWorkers=1 -> 4 passed
D-198 Added a global floating support launcher with a Slack-first CTA, one-click email/phone fallback, and a command-palette open action 2026-03-04 frontend/src/components/FloatingSupportLauncher.tsx, frontend/src/App.tsx; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/components/BrandHeader.test.tsx src/components/PromptCoach.test.tsx --pool=threads --maxWorkers=1 -> 5 passed
D-197 Implemented Rapid Router split-shipping locations for single-model orders with frontend + backend validation and order PDF/email location breakdown 2026-03-04 frontend/src/pages/RapidRouter.tsx; backend/app/rapid_router/core.py; backend/app/rapid_router/test_rapid_router_core.py; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 25 passed, 6 warnings; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 24 passed, 9 warnings; npm --prefix frontend run build -> success
D-196 Imported Dragon quick guide + Spark/Kadet docs into canonical Router RAG corpus and added deterministic Dragon/M106/M519/K500A/K300NB router-fact coverage with alias normalization 2026-03-04 Intake run backend/scripts/router_rag_intake_pipeline.sh /tmp/router_rag_intake_2026-03-04_dragon_spark_kadet -> included=6; canonical files under _RAG_Ready_KB_Organized/01_documents/routers/connect_csg/ and /routers/verizon/; updated feb2026routers.csv; alias/routing updates in backend/app/knowledgebase/core.py; regression python3 -m pytest -q backend/app/test_unified_kb_core.py -k dragon_and_katalyst_phrase_aliases -> 1 passed
D-195 Extracted and reported exact failed-question lists (ID + query text) for recovered 150/75/50 suites from shard artifacts 2026-02-28 Parsed results[] where pass=false from docs/evals/shards15_eval150_openai_all_20260227_fix12/, docs/evals/shards10_eval75_openai_all_20260227_fix8/, docs/evals/shards10_eval50_openai_all_20260227_fix7_full/; counts 8, 2, 3 respectively
D-194 Restored eval quality gate above 92% across all requested OpenAI suites and validated with targeted KB regressions 2026-02-27 python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py -> 96 passed, 9 warnings; docs/evals/shards15_eval150_openai_all_20260227_fix12/unified_kb_eval150_shards10_summary.json -> 142/150 (94.7%); docs/evals/shards10_eval75_openai_all_20260227_fix8/unified_kb_eval150_shards10_summary.json -> 73/75 (97.3%); docs/evals/shards10_eval50_openai_all_20260227_fix7_full/unified_kb_eval150_shards10_summary.json -> 47/50 (94.0%)
D-193 Patched Auth0 audience normalization to prefer non-trailing-slash identifier first, fixing login callback failures caused by https://masters-toolkit-api/ audience selection 2026-02-27 frontend/src/auth/config.ts, frontend/src/auth/config.test.ts, backend/app/auth.py, backend/app/test_auth.py; cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts -> 13 passed; python3 -m pytest -q backend/app/test_auth.py -> 21 passed; npm --prefix frontend run build -> success
D-192 Executed requested OpenAI shard evaluation batch (150 + 75 + new 50) in 10 groups each and published aggregate summaries + failed-ID lists 2026-02-27 docs/evals/shards10_eval150_openai_all_20260227/unified_kb_eval150_shards10_summary.json -> 119/150; docs/evals/shards10_eval75_openai_all_20260227/unified_kb_eval150_shards10_summary.json -> 73/75; docs/evals/shards10_eval50_openai_all_20260227/unified_kb_eval150_shards10_summary.json -> 23/50; generated case pack docs/evals/unified_kb_eval50_new_questions_router_helper_cases.json
D-191 Implemented first-pass single-workspace convergence: added unified POTS Estimates + Intake tab flow with estimator handoff + fresh-start session behavior, plus Knowledgebase action/global command to launch floating router helper from any page 2026-02-27 frontend/src/pages/PotsEstimateIntake.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/App.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/components/FloatingRouterHelper.tsx; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 18 files / 54 tests passed; python3 -m pytest -q backend/app/test_tab_final_pass_matrix.py backend/app/test_knowledgebase_api.py backend/app/routers/router_tab_smoke_test.py -> 63 passed, 9 warnings
D-190 Completed local cross-tab validation sweep and fixed two test roadblocks (router compare fallback scenario stability + fast E2E skip on non-frontend base URLs) 2026-02-27 python3 -m pytest -q backend/app -> 357 passed; python3 -m pytest -q backend/app/test_tab_final_pass_matrix.py -> 4 passed; npm --prefix frontend run test -> 18 files / 54 tests; BASE_URL=http://127.0.0.1:4173/ node frontend/tmp/visual_audit/run_visual_audit.mjs -> 21 runs, 0 issues; backend/app/routers/router_tab_smoke_test.py; frontend/e2e/upload.features.spec.ts
D-189 Removed recommended wording from Knowledgebase Mode options copy (Auto line) while preserving mode behavior text 2026-02-27 frontend/src/pages/UnifiedKnowledgebase.tsx; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 18 files / 54 tests passed
D-188 Consolidated Knowledgebase answer metadata (Why, Next action, Files, Sources) into a single collapsed Response details accordion 2026-02-27 frontend/src/pages/UnifiedKnowledgebase.tsx; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 18 files / 54 tests passed
D-187 Imported IR302 doc batch (quick guide/user manual/spec), rebuilt chunks, and added deterministic IR302 fact row with MSRP $179.00 2026-02-27 backend/scripts/router_rag_import_corpus.py mapping additions; bash backend/scripts/router_rag_intake_pipeline.sh tmp/router_rag_intake_2026-02-27_ir302 -> included=3; reports docs/reports/router_rag_intake_ir302_20260227TIR302.csv/.md; feb2026routers.csv IR302 row; API probe (router_docs) returned deterministic_router_fact_index with MSRP $179.00
D-186 Verified RV50X Feb2022 -F upload is already in canonical corpus via duplicate-hash mapping and added deterministic RV50X host-interface fact coverage (single Ethernet + RS-232 serial) with regression test 2026-02-27 python3 backend/scripts/router_rag_import_corpus.py --source-dir /tmp/rv50x-intake-* --data-dir _RAG_Ready_KB_Organized ... -> skipped=1 (duplicate_hash -> Semtech-RV50X-Data Sheet-Feb2022.pdf); feb2026routers.csv RV50X row added; backend/app/test_unified_kb_core.py new RV50X host-interface test; python3 -m pytest -q backend/app/test_unified_kb_core.py -k "router_fact_fast_path_from_csv or rv50x_host_interfaces_include_single_ethernet_and_serial" -> 2 passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py -> 7 passed, 9 warnings
D-185 Generated and executed an ungraded 50-question Knowledgebase batch and captured full raw responses for manual review/scoring 2026-02-27 docs/evals/kb_50_new_questions_results_2026-02-27.json, docs/evals/kb_50_new_questions_results_2026-02-27.md; python3 - <<'PY' ... FastAPI TestClient batch ... PY -> 50/50 HTTP 200 in ~16.0s
D-184 Replaced Rapid Router primary logo asset with user-provided arrow-logo variant (asset-only swap; no layout logic changes) 2026-02-27 frontend/public/rapid-router-primary-logo.png; source /Users/petedunn/Library/Containers/com.apple.Preview/Data/tmp/PreviewTemp-QpJOdK/Untitled Image 3.png; npm --prefix frontend run build -> success
D-183 Completed deep-dive multi-viewport render verification and patched residual overflow hotspots (BrandHeader mobile wrapping, Rapid setup-note URL wrapping, Rapid signature block overflow containment) 2026-02-27 frontend/src/components/BrandHeader.tsx, frontend/src/pages/RapidRouter.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/RoutersAssistant.tsx; npm --prefix frontend run build -> success; npm --prefix frontend run test -> 54 passed; visual audit frontend/frontend/tmp/visual_audit/visual_audit_results.json -> 21 runs, 0 failures, 0 visual issues
D-182 Executed phase-1 non-Rapid cross-tab UI polish pass (shared chat table renderer + sticky composers + Telco table readability + POTS side-rail/flow quick wins) 2026-02-27 frontend/src/components/chat/markdownTableComponents.tsx, frontend/src/components/chat/ChatComposer.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/RouterKnowledgebase.tsx, frontend/src/pages/RoutersAssistant.tsx, frontend/src/pages/TelcoCalculator.tsx, frontend/src/pages/PotsSavingsEstimator.tsx, frontend/src/pages/PotsIntake.tsx; npm --prefix frontend run build -> success
D-181 Added centered Rapid Router header logo block using new public asset and responsive framed hero treatment 2026-02-27 frontend/src/pages/RapidRouter.tsx, frontend/public/rapid-router-primary-logo.png; npm --prefix frontend run build -> success
D-180 Completed non-Rapid tab UI/visual advisory sweep and produced per-tab advanced suggestion pack (no-code) 2026-02-27 Reviewed frontend/src/App.tsx + non-Rapid page components; recommendations delivered in chat; no runtime code changes
D-179 Published helper comparison-table UX simplification checkpoint to both required remotes 2026-02-27 commit 1014b78; git push origin main -> 087d265..1014b78; git push hf-fourtab main -> 087d265..1014b78
D-178 Simplified helper comparison output to table-first UI and made CTA consistently explicit (Click here for comparison table) 2026-02-27 frontend/src/components/FloatingRouterHelper.tsx, frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-177 Published router-ingestion checkpoint commit to both required remotes 2026-02-27 commit 8050c76; git push origin main -> 21c3962..8050c76; git push hf-fourtab main -> 21c3962..8050c76
D-176 Processed and ingested 7 new router PDFs into canonical Router RAG corpus with deterministic mapping, parse/chunk rebuild, and recall verification 2026-02-27 backend/scripts/router_rag_import_corpus.py; docs/reports/router_rag_intake_2026-02-27_batch_import_report_20260227T005515Z.csv (included=7, skipped=0); python3 backend/scripts/router_rag_smoke.py --query ... -> 5/5 pass
D-175 Published Rapid Router UI polish/readability checkpoint to both required remotes 2026-02-27 commit ac92a10; git push origin main -> 9897015..ac92a10; git push hf-fourtab main -> 9897015..ac92a10
D-174 Executed full Rapid Router/floating-helper visual polish batch from advisory list (density toggle, staged submit CTA hierarchy, compact right rail, expandable fix list, helper long-answer details) 2026-02-27 frontend/src/pages/RapidRouter.tsx, frontend/src/components/FloatingRouterHelper.tsx; npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 45 passed, 9 warnings
D-173 Published generalized Ericsson/CradlePoint ...50 non-WiFi alias mapping checkpoint to both required remotes 2026-02-26 commit b3420ef; git push origin main -> aa0ddb8..b3420ef; git push hf-fourtab main -> aa0ddb8..b3420ef
D-172 Generalized Ericsson/CradlePoint ...50 model alias rule so non-WiFi variants map to ...00 base models (for example AER2250 -> AER2200) with matching variant notes/Wi-Fi override behavior 2026-02-26 backend/app/routers/router_core.py, backend/app/routers/router_tab_smoke_test.py; python3 -m pytest -q backend/app/routers/router_tab_smoke_test.py -> 52 passed, 9 warnings
D-171 Published Rapid Router rail-width + currency-alignment patch checkpoint to both required remotes 2026-02-26 commit 00ea9d8; git push origin main -> 70f3a5c..00ea9d8; git push hf-fourtab main -> 70f3a5c..00ea9d8
D-170 Tightened Rapid Router right-rail width and hardened per-card currency rendering with split $ + amount columns for stable symbol alignment in both pricing and unit/subtotal blocks 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-169 Unblocked POTS Intake line-inventory spreadsheet flow by enabling Keep number / port needed? toggles while keeping requirement enforcement 2026-02-26 frontend/src/pages/PotsIntake.tsx; npm --prefix frontend run build -> success
D-168 Fixed Routers typo/parse reliability for inventory pastes: 12 RX60 no longer misparses as 12 R x60, and likely transposed typo models now trigger confirmation (RX60 -> XR60) before snapshot execution 2026-02-26 backend/app/routers/router_core.py, backend/app/routers/router_tab_smoke_test.py; python3 -m pytest -q backend/app/routers/router_tab_smoke_test.py -> 50 passed, 9 warnings
D-167 Fixed Routers inventory customer ownership carry-forward for Customer has qty model, qty model... syntax and added regression coverage for Hoover has 200 IBR650, 12 AER2200, 16 MG51 2026-02-26 backend/app/routers/router_core.py, backend/app/routers/router_tab_smoke_test.py; python3 -m pytest -q backend/app/routers/router_tab_smoke_test.py -> 47 passed, 9 warnings
D-166 Rebalanced Rapid Router column layout (narrower right rail, wider left router cards) and aligned dollar signs in both top pricing and Unit/Subtotal blocks 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-165 Committed and pushed dollar-sign alignment fix for Rapid Router pricing rows to both required remotes 2026-02-26 commit ae70744; git push origin main -> 8584959..ae70744; git push hf-fourtab main -> 8584959..ae70744
D-164 Implemented explicit dollar-sign vertical alignment in product-card pricing by using a shared fixed-width value column and left-aligned currency strings 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-163 Committed and pushed follow-up laptop-width pricing-readability hardening to both required remotes 2026-02-26 commit 6312e7d; git push origin main -> fa21c6f..6312e7d; git push hf-fourtab main -> fa21c6f..6312e7d
D-162 Added follow-up pricing readability hardening: xl card density reduced to 3 columns and price rows use fixed value-column width for cleaner label/value separation 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-161 Committed and pushed pricing-overlap readability hotfix to both required remotes 2026-02-26 commit dfd9f34; git push origin main -> 07fc56e..dfd9f34; git push hf-fourtab main -> 07fc56e..dfd9f34
D-160 Fixed Rapid Router product-card pricing readability by replacing overlapping compact pricing grid with row-based flex layout (MSRP, Standard FWA, Backup / Pooled) 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success
D-159 Committed and pushed deep-dive Rapid Router/helper compliance checkpoint to both required remotes 2026-02-26 commit 2f4082e; git push origin main -> a957b5c..2f4082e; git push hf-fourtab main -> a957b5c..2f4082e
D-158 Ran deep-dive visual-compliance code audit for prior Rapid Router/helper requests and patched two remaining inconsistencies: removed legacy Column focus/Copy CSV controls from remaining table-reader path and unified generic compare label to Device details 2026-02-26 frontend/src/pages/RapidRouter.tsx, backend/app/knowledgebase/core.py; npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 45 passed; python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py -> 88 passed
D-157 Added global floating Router helper across all tabs, moved/kept Rapid Router Find+Filter in right rail above Order status, enforced runtime HF visibility flags for admin/command-palette/system-status, and preserved Activation verification as top/default configuration option 2026-02-26 frontend/src/components/FloatingRouterHelper.tsx, frontend/src/App.tsx, frontend/src/components/BrandHeader.tsx, frontend/src/pages/RapidRouter.tsx, backend/app/main.py; npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 45 passed; python3 -m pytest -q backend/app/test_unified_kb_core.py backend/app/test_knowledgebase_api.py -> 88 passed
D-156 Hardened Auth0 token finalization flow to prevent premature timeout/config errors during active token setup; added preferred-audience persistence/rotation and timeout-specific guidance 2026-02-26 frontend/src/auth/AuthGate.tsx; npm --prefix frontend run build -> success; cd frontend && npx vitest run src/auth/config.test.ts src/auth/errorUtils.test.ts -> 12 passed; python3 -m pytest -q backend/app/test_auth.py -> 20 passed
D-155 Published requested checkpoint commit/push for current CAPTCHA + Rapid Router UX simplification workspace state to both remotes 2026-02-26 Commit/push executed on main after green npm --prefix frontend run build and targeted python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py
D-154 Executed Rapid Router 10-point UX refactor in 3 phases: compact 5-step header, staged review/submit actions, selection-first cards, reduced action noise, single persistent fix list, helper readability upgrades, admin moved to modal, and UX acceptance targets 2026-02-26 frontend/src/pages/RapidRouter.tsx; npm --prefix frontend run build -> success; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 44 passed, 9 warnings
D-153 Implemented scoped CAPTCHA challenge/verify flow and enforced it on Rapid Router submit + Knowledgebase/POTS message endpoints, with frontend one-time gate UI and token-expiry recovery 2026-02-26 backend/app/main.py, frontend/src/utils/captchaGate.ts, frontend/src/components/CaptchaGateCard.tsx, frontend/src/pages/UnifiedKnowledgebase.tsx, frontend/src/pages/PotsAssistant.tsx, frontend/src/pages/RapidRouter.tsx; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py backend/app/test_knowledgebase_api.py backend/app/test_chat_guidance_api.py backend/app/rapid_router/test_rapid_router_core.py -> 57 passed, 9 warnings; npm --prefix frontend run build -> success
D-152 Produced critical UX/readability audit for Rapid Router and defined a prioritized 10-point simplification game plan 2026-02-26 Audit reviewed key dense zones in frontend/src/pages/RapidRouter.tsx (catalog controls, product cards, right rail, order/options/validation); plan captured in current turn response and tracked as T-067
D-151 Aligned Rapid Router product-card quantity and subtotal controls to a shared bottom baseline using flex anchoring + stabilized placeholder spacing 2026-02-26 frontend/src/pages/RapidRouter.tsx (card h-full flex-column, mt-auto quantity/pricing region, backup plan-code placeholder, shipping-note min-h); cd frontend && npm run build -> success
D-150 Investigated Space boot/wake latency and removed avoidable Rapid Router restart overhead by skipping seed-product rebuild when no seeded-ID backfill is needed 2026-02-25 backend/app/rapid_router/core.py (missing-ID gate around _seed_products()), backend/app/rapid_router/test_rapid_router_core.py (new guard test); python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 20 passed; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 23 passed; startup probes show second-run startup 2882ms vs first-run 6367ms in persisted-storage scenario
D-149 Prepared/published FAQ helper routing fix checkpoint to both required remotes on request 2026-02-25 Publish scope: backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py, docs/dev/*, docs/faq/FAQ_ongoing_candidates.csv; pre-publish validation: python3 -m pytest -q backend/app/test_unified_kb_core.py (81 passed), python3 -m pytest -q backend/app/test_knowledgebase_api.py (7 passed, 9 warnings)
D-148 Fixed Rapid Router helper FAQ access so generic concept asks (e.g., network slicing) prioritize FAQ fast-lane with FAQ citation while preserving selected-model compare flows 2026-02-25 backend/app/knowledgebase/core.py (FAQ query context stripping + router-doc FAQ-first branch for RR helper generic asks), backend/app/test_unified_kb_core.py (stronger regression assertions for FAQ route/citation); python3 -m pytest -q backend/app/test_unified_kb_core.py -> 81 passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py -> 7 passed, 9 warnings
D-147 Converted Shipping / Configuration / Payment order-options columns into separate bubble cards for consistent visual grouping 2026-02-25 frontend/src/pages/RapidRouter.tsx (rr-order-options three-column wrappers updated to rounded bordered panels); cd frontend && npm run build -> success
D-146 Normalized Rapid Router card alignment by reserving fixed document and setup-note spacing when optional docs are missing 2026-02-25 frontend/src/pages/RapidRouter.tsx (two fixed doc slots with invisible placeholders; setup-note placeholder block for absent notes); cd frontend && npm run build -> success
D-145 Fixed Rapid Router address-validation suggestion truncation by using full Census matched street line with structured fallback 2026-02-25 backend/app/rapid_router/core.py (_street_from_census_match, validate_us_address mapping), backend/app/rapid_router/test_rapid_router_core.py (2 new regression tests); python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 42 passed, 9 warnings
D-144 Removed duplicate build timestamp display from header (kept single title-area build label) 2026-02-25 frontend/src/components/BrandHeader.tsx (deleted sticky-toolbar build badge); cd frontend && npm run build -> success
D-143 Expanded Rapid Router helper readability (wider rail, larger typography, fuller message/table rendering) while preserving helper logic 2026-02-25 frontend/src/pages/RapidRouter.tsx (grid rail width, helper card/chat sizing, assistant full-width bubbles, larger inline comparison-table preview and CTA); cd frontend && npm run build -> success
D-142 Added PRM lead mode radios (enter_now vs masters_reverse) with conditional validation and mode-aware order outputs 2026-02-25 frontend/src/pages/RapidRouter.tsx (radio controls, conditional PRM input/validation, payload + draft updates), backend/app/rapid_router/core.py (mode normalization/validation + PDF/email label rendering), backend/app/rapid_router/test_rapid_router_core.py (new reverse-PRM success test); python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 17 passed; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 23 passed; cd frontend && npm run build -> success
D-141 Simplified helper comparison-table controls to a single prominent Open table reader CTA for better visibility/selection 2026-02-25 frontend/src/pages/RapidRouter.tsx (HelperMarkdownTable: removed inline expand/copy strip actions, upgraded single CTA styling, retained modal CSV copy); cd frontend && npm run build -> success
D-140 Collapsed Rapid Router catalog search/filter toolbar under a default-closed accordion and kept command-focus compatibility 2026-02-25 frontend/src/pages/RapidRouter.tsx (catalogFiltersOpen state, details/summary wrapper, rapid_router:focus_search auto-open); cd frontend && npm run build -> success
D-139 Fixed Rapid Router helper context-intent regression so generic FAQ/concept asks are not forced into catalog fast-path tables 2026-02-25 backend/app/knowledgebase/core.py (intent detection now uses primary message; selected-context matching remains explicit), backend/app/test_unified_kb_core.py (new regression tests); python3 -m pytest -q backend/app/test_unified_kb_core.py -k "rapid_router_helper_context" -> 4 passed; python3 -m pytest -q backend/app/test_unified_kb_core.py -> 81 passed; python3 -m pytest -q backend/app/test_knowledgebase_api.py -> 7 passed
D-138 Triaged restart-time MuPDF FT_New_Memory_Face warning as non-blocking and traced it to a single source PDF (atel_re600_manual.pdf) 2026-02-25 Repro command: python3 seed-PDF scan with PyMuPDF over backend/app/rapid_router/seed/assets/*.pdf; warning only on atel_re600_manual.pdf; extraction remained successful (ok pages=5 chars=4261)
D-137 Committed and pushed all outstanding workspace changes to both required remotes on user request 2026-02-25 Modified-set publish including frontend/src/App.tsx, backend/app/rapid_router/seed/assets/atel_w01_u.png, docs/dev/*, docs/faq/FAQ_ongoing_candidates.csv; remotes origin/main, hf-fourtab/main
D-136 Triaged HF env Missing badges and confirmed listed variables are largely optional defaults/presence diagnostics (not immediate runtime failures) 2026-02-25 Code review: frontend/src/components/HealthStatusModal.tsx, backend/app/main.py (/api/health, tab/env defaults), backend/app/router_rag/core.py; guidance delivered with must-set vs optional list
D-135 Set Rapid Router as default landing tab by updating initial tab state, storage-key version, and default flag visibility 2026-02-25 frontend/src/App.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 (18 files, 54 tests, all passed)
D-134 Hid Master’s AI and POTS Replacement Q&A toolbox tabs from UI while keeping underlying code paths intact 2026-02-25 frontend/src/App.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 (18 files, 54 tests, all passed)
D-133 Replaced incorrect ATEL W01-U seed photo with corrected device image and verified Rapid Router core regression suite 2026-02-25 backend/app/rapid_router/seed/assets/atel_w01_u.png; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 16 passed, 6 warnings
D-132 Prepared and published Rapid Router helper accessibility/table-reader fix bundle checkpoint to both required remotes 2026-02-25 frontend/src/pages/RapidRouter.tsx, docs/dev/*; push targets: origin/main, hf-fourtab/main
D-131 Fixed helper rail accessibility by activating two-column/sticky behavior at lg and ordering rail before main form on single-column layouts 2026-02-25 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 (18 files, 54 tests, all passed)
D-130 Added helper table-reader Column focus dropdown with per-column show/hide (first column pinned) for wide comparison-table readability 2026-02-25 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 (18 files, 54 tests, all passed)
D-129 Fixed Rapid Router helper comparison-table usability: always-visible Open table reader, stronger inline expand behavior, and sticky first-column/header context 2026-02-25 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 (18 files, 54 tests, all passed)
D-128 Reordered Rapid Router right rail (Router helper above Order status) and de-cluttered both cards with shorter copy/chips/actions 2026-02-25 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build -> success
D-127 Committed and pushed startup integrity FAQ/router CSV fix bundle to both required remotes 2026-02-25 commit 914699f; git push origin main -> 13886dc..914699f; git push hf-fourtab main -> 13886dc..914699f
D-126 Published operator runbook for Docker rebuild/redeploy and post-deploy browser cache reset after hashed-asset 404 2026-02-25 Guidance delivered for commit/push to origin + hf-fourtab, wait for HF rebuild complete, then hard refresh/private window to clear stale bundle references
D-125 Fixed startup integrity false alarms in Docker by hardening repo/app path resolution and packaging FAQ corpus in image 2026-02-25 backend/app/knowledgebase/core.py (root/app resolver), Dockerfile (COPY docs/faq /app/docs/faq), backend/app/test_unified_kb_core.py (new resolver tests); python3 -m pytest -q backend/app/test_unified_kb_core.py -> 79 passed; startup integrity probe: faq_entries=551, router_fact_csv_count=3, warnings=[]
D-124 Hardened Auth0 login finalization against silent token timeout and added explicit offline_access scopes 2026-02-25 frontend/src/main.tsx, frontend/src/auth/AuthGate.tsx; cd frontend && npm run build -> success; cd frontend && npx vitest run --pool=threads --maxWorkers=1 -> 18 passed
D-123 Delivered transfer-oriented one-to-two-page architecture/stack summary for incoming project owner 2026-02-25 Summary prepared from current repo state and ops docs (README.md, backend/app/main.py, docs/dev/open_tasks.md, workflow files)
D-122 Committed and pushed Rapid Router eval25 suite + run-record docs checkpoint to both required remotes 2026-02-25 commit ce1860a; git push origin main -> 7cbce22..ce1860a; git push hf-fourtab main -> 7cbce22..ce1860a
D-121 Diagnosed shard 1-5 failure in Rapid Router eval25 as MSRP-omission semantic miss on W1850 clarify prompt 2026-02-25 `jq '.results[]
D-120 Created Rapid Router-focused 25-case eval suite and executed shard-5 run 2026-02-25 Added docs/evals/unified_kb_eval25_rapid_router_cases.json; cd backend && CHUNK_SIZE=5 START_ID=1 END_ID=25 CASE_TIMEOUT_S=30 OPENAI_MODEL=gpt-5.2 CASES_PATH=../docs/evals/unified_kb_eval25_rapid_router_cases.json OUT_DIR=../docs/evals/shards5_rapidrouter25 TREND_FILE=../docs/evals/shards5_rapidrouter25/unified_kb_eval25_rapidrouter_trend.json ./scripts/run_unified_kb_eval150_chunks.sh -> 24/25, failed IDs [3], avg 23.31ms, p95 30.33ms
D-119 Re-ran full sharded suites on demand and refreshed live metrics for current baseline reporting 2026-02-25 cd backend && CHUNK_SIZE=10 START_ID=1 END_ID=150 OPENAI_MODEL=gpt-5.2 ./scripts/run_unified_kb_eval150_chunks.sh -> 150/150, avg 900.47ms, p95 6316.81ms; cd backend && CHUNK_SIZE=5 START_ID=1 END_ID=75 CASE_TIMEOUT_S=30 OPENAI_MODEL=gpt-5.2 CASES_PATH=../docs/evals/unified_kb_eval75_msrp_verizon_cases.json OUT_DIR=../docs/evals/shards5_eval75 TREND_FILE=../docs/evals/shards5_eval75/unified_kb_eval75_trend.json ./scripts/run_unified_kb_eval150_chunks.sh -> 74/75 (failed IDs [75]), avg 200.59ms, p95 465.47ms
D-118 Re-ran all sharded Unified KB suites (150 + 75) and captured updated aggregate baselines 2026-02-25 cd backend && CHUNK_SIZE=10 START_ID=1 END_ID=150 OPENAI_MODEL=gpt-5.2 ./scripts/run_unified_kb_eval150_chunks.sh -> 150/150; cd backend && CHUNK_SIZE=5 START_ID=1 END_ID=75 CASE_TIMEOUT_S=30 OPENAI_MODEL=gpt-5.2 CASES_PATH=../docs/evals/unified_kb_eval75_msrp_verizon_cases.json OUT_DIR=../docs/evals/shards5_eval75 TREND_FILE=../docs/evals/shards5_eval75/unified_kb_eval75_trend.json ./scripts/run_unified_kb_eval150_chunks.sh -> 74/75 (failed IDs [75])
D-117 Triaged Rapid Router test warnings as non-blocking and captured warning-hygiene follow-up task 2026-02-25 python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 39 passed, 9 warnings; sources: reportlab deprecation + SWIG/PyMuPDF import warnings
D-116 Committed and pushed current CR602 + T-059 + router alias-normalization batch to both required remotes 2026-02-25 commit b87d5d7; git push origin main -> 8d77217..b87d5d7; git push hf-fourtab main -> 8d77217..b87d5d7
D-115 Added deterministic router model alias normalization for hyphen/punctuation variants (MAX-BR1-PRO-5G, XR_60) and regression coverage 2026-02-25 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; python3 -m pytest -q backend/app/test_unified_kb_core.py -> 77 passed
D-114 Implemented T-059 Rapid Router CSV ingestion validator + dry-run preview/apply admin path with schema/lint checks and duplicate ID/SKU protection 2026-02-25 backend/app/rapid_router/core.py, backend/app/main.py, backend/app/rapid_router/test_rapid_router_core.py, backend/app/test_rapid_router_api_shell.py; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py backend/app/test_rapid_router_api_shell.py -> 39 passed
D-113 Prepared detailed new-thread bootstrap prompt aligned to AGENTS + required dev docs + live working tree state 2026-02-25 Prompt package delivered in chat; continuity anchors: docs/dev/session_handoff.md, docs/dev/decisions.md, docs/dev/open_tasks.md
D-112 Produced ranked 20-item update backlog with complexity/value/risk scoring and identified top 5 implementation targets 2026-02-25 Planning output delivered in chat; promoted top-5 into T-057, T-059, T-060, T-061, T-062
D-111 Added InHand Networks CR602 to Rapid Router seeded catalog with bundled datasheet/manual/image assets and regression assertions 2026-02-25 backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py, backend/app/rapid_router/seed/assets/inhand_cr602.png, backend/app/rapid_router/seed/assets/inhand_cr602_datasheet.pdf, backend/app/rapid_router/seed/assets/inhand_cr602_user_manual.pdf; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 13 passed
D-110 Committed and pushed helper non-store fallback fix checkpoint to both remotes 2026-02-24 commit df60837; git push origin main -> 8f805fb..df60837; git push hf-fourtab main -> 8f805fb..df60837
D-109 Fixed Rapid Router helper model-compare fallback: explicit non-store model asks now bypass store fast path and fall back to standard router-doc compare/spec logic with non-orderable notice 2026-02-24 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py; cd backend && python3 -m pytest -q app/test_unified_kb_core.py app/test_knowledgebase_api.py app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py -> 117 passed
D-108 Committed and pushed T-058 + Rapid Router BoBo/PRM validation hardening checkpoint to both remotes 2026-02-24 commit 7a884c8; git push origin main -> 7215527..7a884c8; git push hf-fourtab main -> 7215527..7a884c8
D-107 Enforced strict PRM Lead format (EL- + exactly 7 digits) with fixed-prefix UI control, backend validation, admin-config validation, and store migration 2026-02-24 frontend/src/pages/RapidRouter.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py, backend/app/test_rapid_router_api_shell.py, backend/app/test_tab_final_pass_matrix.py; python3 -m py_compile backend/app/rapid_router/core.py; cd frontend && npm run build; cd backend && python3 -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py app/test_tab_final_pass_matrix.py -> 38 passed
D-106 Added BoBo-conditional required payment fields (Company Name, SPOC, ECPD/VZ Account Number) across Rapid Router UI + submit validation + persisted order/email/PDF outputs 2026-02-24 frontend/src/pages/RapidRouter.tsx, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py, backend/app/test_tab_final_pass_matrix.py; cd backend && python3 -m pytest -q app/rapid_router/test_rapid_router_core.py app/test_tab_final_pass_matrix.py -> 16 passed; cd backend && python3 -m pytest -q app/test_rapid_router_api_shell.py -> 21 passed; cd frontend && npm run build passed
D-105 Implemented T-058 Rapid Router catalog integration into Unified Knowledgebase (router_docs) with provider injection, deterministic catalog fast paths (list/price/feature/compare), cache-fingerprint wiring, and fallback-to-router-fact behavior 2026-02-24 backend/app/knowledgebase/core.py, backend/app/main.py, backend/app/test_unified_kb_core.py, backend/app/test_knowledgebase_api.py; cd backend && python3 -m pytest -q app/test_unified_kb_core.py app/test_knowledgebase_api.py app/rapid_router/test_rapid_router_core.py -> 92 passed; manual API check via TestClient confirmed retrieval mode deterministic_rapid_router_catalog_list_fast
D-104 Added full-screen comparison-table reader for Rapid Router helper messages (with sticky table headers, improved cell readability, and CSV copy retained) 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-103 Committed and pushed current Rapid Router + auth stabilization working tree to both remotes 2026-02-24 commit 44c021b; git push origin main and git push hf-fourtab main succeeded
D-102 Updated auth smoke Playwright helper to skip quickly in non-auth local runtime (eliminated false 60s timeouts) 2026-02-24 frontend/e2e/auth.spec.ts; cd frontend && E2E_DISABLE_WEBSERVER=true E2E_BASE_URL=http://127.0.0.1:7860 npx playwright test e2e/auth.spec.ts -> 6 skipped
D-101 Fixed AuthGate refresh-token recovery-flag lifecycle for deterministic re-login/consent recovery behavior 2026-02-24 frontend/src/auth/AuthGate.tsx; cd frontend && npm run build; cd frontend && npx vitest run --pool=threads --maxWorkers=1 -> 18 passed
D-100 Hardened AuthGate timeout env parsing (VITE_AUTH_FINALIZING_WATCHDOG_MS, VITE_AUTH_SILENT_TIMEOUT_MS) against quoted/malformed values 2026-02-24 frontend/src/auth/AuthGate.tsx; cd frontend && npm run build; python3 -m pytest -q backend/app/test_auth.py backend/app/test_rapid_router_api_shell.py backend/app/rapid_router/test_rapid_router_core.py -> 52 passed
D-099 Implemented full Rapid Router UX cleanup bundle (compact order rail, completion chips, jump-to-error links, card/table toggle, shipping indicators, collapsed sections, review modal, helper CSV copy/spacing, session-draft badges, mobile sticky CTA) 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 11 passed; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 21 passed
D-098 Prepared prioritized UI/UX improvement recommendations for Rapid Router/toolbox cleanup 2026-02-24 Recommendation package delivered in chat; implementation task tracked as T-056
D-097 Added search-driven auto-expand behavior for collapsed Support Toolbox 2026-02-24 frontend/src/App.tsx; cd frontend && npm run build passed
D-096 Collapsed Support Toolbox cards behind a closed-by-default accordion toggle (Open toolbox / Hide toolbox) 2026-02-24 frontend/src/App.tsx; cd frontend && npm run build passed
D-095 Made Ordering assistant and Router selection helper follow together as a single sticky right-column block 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-094 Updated ground shipping policy to $9.99 with automatic waiver for Standard FWA items; added migration and order/PDF/email shipping breakdown fields 2026-02-24 backend/app/rapid_router/core.py, frontend/src/pages/RapidRouter.tsx; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 11 passed; python3 -m pytest -q backend/app/test_rapid_router_api_shell.py -> 21 passed; cd frontend && npm run build passed
D-093 Updated Peplink MAX BR1 Pro 5G MSRP to $999.00 and added startup migration to correct stale/null persisted MSRP values 2026-02-24 backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 10 passed
D-092 Fixed Router selection helper table rendering by switching assistant bubbles to markdown + added per-table expand/collapse UI 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-091 Sorted routers by price_primary ascending within each 4G and 5G section on Rapid Router 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-090 Grouped Rapid Router product catalog into distinct 4G then 5G sections with visual differentiation 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-089 Committed and pushed Rapid Router reload-reset behavior to both required remotes 2026-02-24 commit a469363; pushed origin/main and hf-fourtab/main
D-088 Changed Rapid Router draft persistence to in-memory only so full reload clears quantities/details while in-app tab switches retain state 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-087 Verified ATEL RE600 (Black) image was already correct (no-op fix) 2026-02-24 backend/app/rapid_router/seed/assets/atel_re600_black.png; SHA-256 matched Screenshot 2026-02-24 at 11.13.41 AM.png
D-086 Corrected Inseego Wavemaker FX4210 card image (replaced mismatched seed asset with proper Inseego device art) 2026-02-24 backend/app/rapid_router/seed/assets/inseego_wavemaker_fx4210.png; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 9 passed
D-085 Corrected swapped ATEL image assignments: V810AD now uses single-antenna tabletop image and RE600 uses multi-antenna image 2026-02-24 backend/app/rapid_router/seed/assets/atel_v810ad.png, backend/app/rapid_router/seed/assets/atel_re600_black.png; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 9 passed
D-084 Applied ATEL W01-U image hotfix as explicit seed-asset rewrite to corrected user-provided image 2026-02-24 backend/app/rapid_router/seed/assets/atel_w01_u.png; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 9 passed
D-083 Prepared single-commit deployment package for Rapid Router new-device expansion (catalog + assets + migration + tests + template) and executed commit/push workflow 2026-02-24 backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py, backend/app/rapid_router/seed/assets/*, docs/templates/rapid_router_new_devices_upload_template.csv
D-082 Replaced new Rapid Router device photos with exact user-supplied attachment images and enabled startup refresh for existing runtime copies 2026-02-24 backend/app/rapid_router/seed/assets/peplink_b_one_5g.png, backend/app/rapid_router/seed/assets/atel_w01_u.png, backend/app/rapid_router/seed/assets/atel_pw550.png, backend/app/rapid_router/seed/assets/atel_re600_black.png, backend/app/rapid_router/seed/assets/atel_v810ad.png, backend/app/rapid_router/seed/assets/atel_v810vd_bp.png, backend/app/rapid_router/seed/assets/inseego_wavemaker_fx4210.png, backend/app/rapid_router/core.py; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 9 passed
D-081 Added 7 new Rapid Router devices (Peplink B One 5G, ATEL W01-U/PW550/RE600/V810AD/V810VD-BP, Inseego Wavemaker FX4210) with seeded assets and automatic backfill for existing stores 2026-02-24 backend/app/rapid_router/core.py, backend/app/rapid_router/seed/assets/*, backend/app/rapid_router/test_rapid_router_core.py; python3 -m pytest -q backend/app/rapid_router/test_rapid_router_core.py -> 9 passed
D-073 Simplified Rapid Router Ordering Assistant to compact status card with fewer actions and reduced visual complexity 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-074 Added Rapid Router MSRP support (store schema + product cards + admin add-product MSRP input) 2026-02-24 backend/app/rapid_router/core.py, backend/app/main.py, frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-075 Added workbook-backed required Masters contact dropdown and routed selected contact into order-email To recipients 2026-02-24 backend/app/rapid_router/seed/masters_contacts.xlsx, backend/app/rapid_router/core.py, frontend/src/pages/RapidRouter.tsx; cd backend && python3 -m pytest app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py app/test_tab_final_pass_matrix.py -q -> 29 passed
D-076 Added required Configuration Options with advanced-task validation and per-router pricing rolled into totals/PDF/email 2026-02-24 backend/app/rapid_router/core.py, frontend/src/pages/RapidRouter.tsx; cd backend && python3 -m pytest app/rapid_router/test_rapid_router_core.py app/test_rapid_router_api_shell.py app/test_tab_final_pass_matrix.py -q -> 29 passed
D-077 Committed and pushed Rapid Router MSRP/contact/configuration expansion to both required remotes 2026-02-24 commit 176ff8f; pushed origin/main and hf-fourtab/main
D-078 Remapped Peplink MAX BR1 Pro 5G to use the current MAX BR1 Mini (Wi-Fi) photo and added startup migration for persisted stores 2026-02-24 backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && python3 -m pytest app/rapid_router/test_rapid_router_core.py -q -> 7 passed
D-079 Replaced MAX BR1 Mini (Wi-Fi) product image with requested photo and enforced startup refresh for existing runtime assets 2026-02-24 backend/app/rapid_router/seed/assets/peplink_br1_mini_5g_wifi.png, backend/app/rapid_router/core.py, backend/app/rapid_router/test_rapid_router_core.py; cd backend && python3 -m pytest app/rapid_router/test_rapid_router_core.py -q -> 8 passed
D-080 Added reusable CSV template for Rapid Router new-device intake with MSRP and pricing columns 2026-02-24 docs/templates/rapid_router_new_devices_upload_template.csv
D-072 Fixed mobile overlap where Ordering Assistant covered Router selection helper by making side panel sticky only at lg+ breakpoints 2026-02-24 frontend/src/components/ConversationalSidePanel.tsx; cd frontend && npm run build passed
D-071 Committed and pushed Rapid Router helper chatbot fast-path to both required remotes 2026-02-24 commit 6c6f7dc; pushed origin/main and hf-fourtab/main
D-070 Implemented Rapid Router in-page helper chatbot using existing knowledgebase endpoint in router_docs mode 2026-02-24 frontend/src/pages/RapidRouter.tsx; cd frontend && npm run build passed
D-063 Committed and pushed the changes for items 1-5 plus the eval150 rerun results to both required remotes 2026-02-24 commit 54a654c; pushed origin/main and hf-fourtab/main
D-062 Re-ran full unified eval150 (shards10, OpenAI semantic) after implementing items 1-5; achieved zero failures 2026-02-24 docs/evals/shards10/unified_kb_eval150_shards10_summary.json (150/150, 100.0%, failed IDs [])
D-061 Added resilient local /tmp corpus staging in shard runner with manifest fallback generation from chunks 2026-02-24 backend/scripts/run_unified_kb_eval150_chunks.sh (ROUTER_RAG_DATA_DIR=/tmp/router_rag_eval_stage/... confirmed in run logs)
D-060 Added Router RAG fingerprint modes (strict/hybrid/metadata) with timeout-safe hash fallback 2026-02-24 backend/app/router_rag/index.py, backend/app/test_router_rag_module.py (47 passed)
D-059 Added safe .env.codex loading fallback and optional single-process shard execution mode 2026-02-24 backend/scripts/run_unified_kb_eval150_chunks.sh (load_env_file_safe, SINGLE_PROCESS_SHARDS)
D-058 Re-ran unified eval150 in 10-question shards with semantic grading and published updated summary 2026-02-24 docs/evals/shards10/unified_kb_eval150_shards10_summary.json (126/150, 84.0%, failed IDs include 2,3,39-58,116,118)
D-055 Fixed CBA850 token-only routing from weak router-docs path to deterministic lifecycle output 2026-02-20 backend/app/knowledgebase/core.py (_single_lifecycle_only_model_token, auto-mode routing + router-docs bridge), backend/app/test_unified_kb_core.py
D-056 Added regression tests for lifecycle-only single-token routing in router_docs and auto 2026-02-20 cd backend && python3 -m pytest -q app/test_unified_kb_core.py -> 70 passed
D-057 Full backend regression passed after CBA850 routing fix 2026-02-20 cd backend && python3 -m pytest -q -> 316 passed, 9 warnings
D-054 Committed and pushed deep-analysis hardening patch to GitHub and HF four-tab remote 2026-02-20 commit f1e0811; pushed origin/main and hf-fourtab/main
D-051 Patched web fallback timeout budgeting to respect remaining request budget 2026-02-20 backend/app/knowledgebase/core.py (_web_fallback remaining budget guard + timeout cap)
D-052 Hardened parallel index search against stale/shutdown shared executor and added recovery test 2026-02-20 backend/app/knowledgebase/core.py, backend/app/test_unified_kb_core.py
D-053 Deep-analysis verification cycle passed after hardening patches 2026-02-20 cd backend && python3 -m pytest -q -> 314 passed, 9 warnings; cd backend && python3 -m pytest -q app/test_unified_kb_core.py -> 68 passed
D-050 Finalized and pushed enhancement batch commit to both remotes 2026-02-20 commit 925b963; pushed origin/main and hf-fourtab/main
D-044 Implemented targeted fail-ID fixes for masters FAQ clarify over-trigger (102,108) and POTS top-10 objection parsing (63) 2026-02-20 backend/app/knowledgebase/core.py; targeted reruns in docs/evals/shards1_target_102_108/ and docs/evals/shards1_target_75_id63/
D-045 Added stage-budget-exit telemetry and retrieval-mode tracking to eval payloads/summaries 2026-02-20 backend/scripts/unified_kb_eval150.py
D-046 Added runner profile toggle + explicit commit-gate fields (no_new_failed_ids, p95_non_regression) and non-persistent FAQ churn policy by default 2026-02-20 backend/scripts/run_unified_kb_eval150_chunks.sh
D-047 Added regression tests for FAQ medium-confidence bypass and hyphenated top-10 objection handling 2026-02-20 backend/app/test_unified_kb_core.py, backend/app/test_unified_kb_eval150_script.py
D-048 Full backend regression passed after enhancement batch 2026-02-20 cd backend && python3 -m pytest -q -> 312 passed, 9 warnings
D-049 Full OpenAI shard reruns completed (v3) 2026-02-20 docs/evals/shards5_150_balanced_v3/unified_kb_eval150_shards10_summary.json (150/150), docs/evals/shards5_75_balanced_v3/unified_kb_eval150_shards10_summary.json (74/75, fails 3)
D-043 Logged pre-commit low-risk enhancement shortlist and translated into active tasks 2026-02-20 docs/dev/decisions.md, docs/dev/open_tasks.md
D-037 Implemented balanced-profile token/perf caps in router web fallback + POTS synthesis + semantic grader defaults 2026-02-20 backend/app/router_rag/core.py, backend/app/pots_ai/core.py, backend/scripts/unified_kb_eval150.py, backend/scripts/run_unified_kb_eval150_chunks.sh
D-038 Applied OpenAI compatibility fix for POTS completions cap (max_completion_tokens) 2026-02-20 backend/app/pots_ai/core.py
D-039 Clean 150-case rerun (balanced-v2) completed 2026-02-20 docs/evals/shards5_150_balanced_v2/unified_kb_eval150_shards10_summary.json (148/150, fails 102,108)
D-040 Clean 75-case rerun (balanced-v2) completed 2026-02-20 docs/evals/shards5_75_balanced_v2/unified_kb_eval150_shards10_summary.json (74/75, fails 63)
D-041 Full backend regression passed after balanced-v2 changes 2026-02-20 cd backend && python3 -m pytest -q -> 308 passed, 9 warnings
D-042 Before/after comparison package prepared for commit-gate decision 2026-02-20 docs/dev/session_handoff.md, docs/dev/decisions.md
D-019 Full backend deep-dive regression run passed 2026-02-20 python3 -m pytest -q -> 299 passed
D-020 Patched shared bounded retrieval executor path in unified KB 2026-02-20 backend/app/knowledgebase/core.py
D-021 Added runtime health assertion for parallel-search executor flags 2026-02-20 backend/app/test_unified_kb_core.py
D-022 Shard runner now defaults trend and FAQ ongoing paths to OUT_DIR 2026-02-20 backend/scripts/run_unified_kb_eval150_chunks.sh
D-023 Post-patch shard smoke run passed and wrote trend in smoke out-dir 2026-02-20 docs/evals/shards10_deepdive_smoke/unified_kb_eval150_shards10_summary.json
D-024 150-case semantic rerun (shard-5, 30s timeout) completed 2026-02-20 docs/evals/shards5_150_rerun/unified_kb_eval150_shards10_summary.json (146/150)
D-025 75-case MSRP/Verizon semantic rerun (shard-5, 30s timeout) completed 2026-02-20 docs/evals/shards5_75_rerun/unified_kb_eval150_shards10_summary.json (74/75)
D-026 Post-rerun full backend regression passed 2026-02-20 cd backend && python3 -m pytest -q -> 299 passed
D-027 Device comparison table schema updated to user-locked format with hidden evidence column 2026-02-20 backend/app/knowledgebase/core.py
D-028 Comparison schema regression tests added and passing 2026-02-20 cd backend && python3 -m pytest -q app/test_unified_kb_core.py -> 56 passed
D-029 Implemented full guarded 10-suggestion patch set (core + eval tooling + tests) 2026-02-20 backend/app/knowledgebase/core.py, backend/scripts/unified_kb_eval150.py, backend/scripts/run_unified_kb_eval150_chunks.sh, backend/app/test_unified_kb_core.py
D-030 Full backend regression passed after patch set 2026-02-20 cd backend && python3 -m pytest -q -> 308 passed
D-031 150-case shard-5 semantic rerun completed post-patch 2026-02-20 docs/evals/shards10/unified_kb_eval150_shards10_summary.json (144/150, fails 7,86,90,102,108,129)
D-032 75-case MSRP/Verizon shard-5 semantic rerun completed post-patch 2026-02-20 docs/evals/shards5_eval75/unified_kb_eval75_shards5_summary.json (74/75, fails 3)
D-033 Current batch committed and pushed to GitHub + HF four-tab remote 2026-02-20 commit 9e5a3bd; pushed origin/main and hf-fourtab/main
D-034 OpenAI token-usage hotspot analysis completed (no-code step) 2026-02-20 Reviewed backend/scripts/unified_kb_eval150.py, backend/app/pots_ai/core.py, backend/app/router_rag/core.py
D-035 Token-optimization actions ranked by difficulty/perf/token impact and rollout priority 2026-02-20 User-facing ranked matrix prepared; order captured in docs/dev/decisions.md
D-036 Balanced profile recommendation published (performance vs quality) 2026-02-20 Decision logged in docs/dev/decisions.md; task added as T-026
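
As an illustration of the strict PRM Lead rule recorded in D-107 (EL- plus exactly 7 digits), the check can be sketched as below. This is a minimal sketch, not the code in backend/app/rapid_router/core.py: the names PRM_LEAD_RE and is_valid_prm_lead are hypothetical, and treating the prefix as case-sensitive with ASCII-only digits is an assumption.

```python
import re

# Hypothetical pattern for the D-107 rule: the literal prefix "EL-" followed by
# exactly seven ASCII digits. Case sensitivity is an assumption.
PRM_LEAD_RE = re.compile(r"EL-[0-9]{7}")


def is_valid_prm_lead(value: str) -> bool:
    """Return True only when the entire string is EL- plus exactly seven digits."""
    # fullmatch rejects leading/trailing characters, including a trailing newline.
    return PRM_LEAD_RE.fullmatch(value) is not None
```

Using fullmatch rather than match with a trailing `$` avoids accepting a value that ends in a newline, which matters if the same rule is applied to raw form input.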

Standard Verification Commands

# Full backend regression (run from the repo root)
cd backend
python3 -m pytest -q

# Deep-dive runner smoke (run from the repo root)
cd backend
CHUNK_SIZE=1 START_ID=1 END_ID=1 OUT_DIR=../docs/evals/shards10_deepdive_smoke \
SHARD_WORKERS=1 OPENAI_MODEL=gpt-5.2 ./scripts/run_unified_kb_eval150_chunks.sh
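
The commit-gate fields named in D-046 (no_new_failed_ids, p95_non_regression) can be illustrated with a small comparison between a baseline run and a rerun. This is a hedged sketch only: the function commit_gate, its parameters, and the tolerance handling are hypothetical, and the actual logic in backend/scripts/run_unified_kb_eval150_chunks.sh may differ.

```python
def commit_gate(baseline_failed, current_failed, baseline_p95, current_p95,
                p95_tolerance=0.0):
    """Compare a rerun against a baseline and return the two gate flags.

    All parameter names are illustrative assumptions. p95 values are latencies
    in any consistent unit; p95_tolerance is a fractional allowance (0.05 = 5%).
    """
    # IDs that fail now but did not fail in the baseline.
    new_failed = sorted(set(current_failed) - set(baseline_failed))
    return {
        "no_new_failed_ids": not new_failed,
        "p95_non_regression": current_p95 <= baseline_p95 * (1 + p95_tolerance),
        "new_failed_ids": new_failed,
    }


# Failed IDs taken from the Done table: balanced-v2 (D-039) failed 102 and 108,
# while the v3 rerun (D-049) had no failures. The p95 numbers are made up.
gate = commit_gate([102, 108], [], baseline_p95=12.0, current_p95=11.5)
```

Under these assumptions a rerun passes the gate only when both flags are true; reporting new_failed_ids alongside the flags makes a blocked commit easier to triage.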