| # Search-UI — Aggressive Refactor Plan |
|
|
| This document tracks the autonomous refactor begun on `claude/refactor-aggressive`. |
| It is the canonical source for what the refactor is doing, the order of work, |
| and the gates between phases. |
|
|
| ## Goal |
|
|
| Cut LOC by ~50%, separate ingestion from serving, replace hand-rolled infra |
| with proven libraries, modernize the frontend. App must stay functional at |
| every phase boundary. |
|
|
| ## North-star metrics |
|
|
| | Metric | Before | Target | |
| |---|---|---| |
| | Backend Python LOC | ~35,850 | ~10,000 | |
| | Frontend JS/JSX LOC | ~11,820 | ~5,000 | |
| | `*_routes.py` files | 13 | ≤ 5 | |
| | SQLite DB files | 5 | 1 | |
| | Web-process cold-start RSS | 450 MB → **76 MB** (Session 4) | < 500 MB ✓ (cold) | |
| | Tests collecting cleanly | 85/443 → 443/443 after Phase 0 | 443/443 | |
| | Tests passing (sandbox, no model network) | 440/443 | 443/443 in user env | |
|
|
| ## Phases & status |
|
|
| | Phase | Title | Status | |
| |---|---|---| |
| | 0 | Safety net (conftest, golden tests, smoke script) | DONE | |
| | 1 | Subtraction: dead code, redundant routers, rerankers, dedup | DONE | |
| | 2 | Single database + merge tool + Alembic | MERGE TOOL VALIDATED on real DBs (Session 3): found+fixed a vec0 INTEGER-PRIMARY-KEY-alias copy bug; merged 5→1 (3.2 GB), all 15 key row counts exact, golden diff identical except one benign tie-swap of two equal-score (0.71191) semantic hits. Flipped to the single DB (Session 3). Alembic SCOPED FOUNDATION added (opt-in); full cutover deliberately deferred — see "Phase 2 Alembic foundation" below. | |
| | 3a | ML out of web request path: CLI extraction | **Cutover step 1 (lazy torch imports) DONE & runtime-verified (Session 4): cold-start web RSS 450→76 MB, torch absent at boot, models load lazily on first query, results byte-identical (golden + eval). CLI scaffold from Session 3 stands. Step 3 is partly done (single-video ingestion now shells out to the CLI, Session 4); the streaming bulk endpoints, composite workflows, and step 4 (relocate modules) are still open.** | |
| | 3b | Query-time embedding sidecar (optional but recommended) | **DROPPED for current deployments (Session 4 decision). On the single-container HF Docker Space, web+sidecar share one container's RAM, so a sidecar does NOT lower container memory — its only real benefit is a true multi-machine split (web on HF, model on a GPU box), which isn't the deployment and would add per-query network latency + an always-on paid service. Revisit only if/when web and ML run on separate machines.** | |
| | 3c | Video lifecycle: prune source MP4s after ingestion | DONE | |
| | 4 | Route consolidation: 13 → 6 coherent routers | DONE at 6 (Session 3 decision). 13→6 via cloud PRs #34–39 (search←content+analysis, face←speaker, catalog←status, workflow←AD+settings+sync). Stopped at 6 deliberately — the last two merges would chase a counter at the cost of multi-concern files; see "Phase 4 progress" below. | |
| | 5 | Frontend: TypeScript + React Router + TanStack Query + Tailwind | DEFERRED — needs runtime verification | |
| | 6 | Face: DeepFace → insightface (ONNX) | DEFERRED — needs runtime verification | |
| | 7 | Desktop story (drop Electron or → Tauri) | DEFERRED — needs decision | |
| | 8 | Smart-search reckoning (instrument & decide) | DEFERRED — needs production data | |
|
|
| ## Phase 0 — Safety net (1–2 days) |
|
|
| Build the regression detector before changing anything. |
|
|
| **Deliverables**: |
| - `conftest.py` at repo root that fixes the two-import-conventions test mess |
| - `backend/__init__.py` so tests can import either flat or namespaced |
| - `tests/golden/` query fixture set (run against user's real DB to populate) |
| - `scripts/snapshot_golden.py` — capture top-N results per query |
| - `scripts/diff_golden.py` — re-run and diff |
| - `scripts/smoke_test.sh` — boot backend, hit 20 critical endpoints |
| - All collection errors fixed (target: 131/131 tests collect, even if some skip) |
| - Tag pre-refactor state for rollback |
|
|
| **Gate**: all tests collect cleanly. Golden snapshot scripts run end-to-end |
| against a dummy DB. Smoke script returns exit 0. |
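
A minimal sketch of the comparison a golden diff performs, assuming a snapshot is a mapping from each query to an ordered list of `(natural_key, score)` pairs. That layout and the name `diff_snapshots` are assumptions for illustration; they are not the actual format written by `scripts/snapshot_golden.py`.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout: query -> ordered (natural_key, score) hits.
Snapshot = Dict[str, List[Tuple[str, float]]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, str]:
    """Return {query: reason} for every query whose results changed."""
    problems: Dict[str, str] = {}
    for query in sorted(set(before) | set(after)):
        old, new = before.get(query), after.get(query)
        if old is None or new is None:
            problems[query] = "query missing from one snapshot"
        elif sorted(old) != sorted(new):
            problems[query] = "result set or scores changed"
        elif old != new:
            problems[query] = "same hits, different order"
    return problems


baseline = {"faith": [("vid_a", 0.9), ("vid_b", 0.7)]}
reordered = {"faith": [("vid_b", 0.7), ("vid_a", 0.9)]}
print(diff_snapshots(baseline, baseline))   # {}
print(diff_snapshots(baseline, reordered))  # {'faith': 'same hits, different order'}
```

Separating "different order" from "different set" keeps a harmless reorder distinguishable from a real ranking regression.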
|
|
| ## Phase 1 — Subtraction (2–3 days) |
|
|
| Pure deletion. No new deps, no architecture change. |
|
|
| **Delete outright**: |
| - `frontend/src/designMockups/` (13 files, ~485 LOC) — DONE |
| - `MockupReview.jsx` + `?view=mockups` routing — DONE |
| - `main.py:/api/hello`, `/api/data` placeholders — DONE |
| - `llm_router.py` + `llm_client.py` (orphaned, ~896 LOC) — DONE |
| - `MiniLM` fallback in `ai_features.py` — DONE |
| - Mixedbread legacy embedding code — DEFERRED to Phase 2 |
| (still used by `search_semantic.py` for the whole-document index; |
| removing it now breaks search against legacy DB rows. It will be |
| retired via an Alembic backfill migration during DB consolidation.) |
| - `pywebview` — DEFERRED to Phase 7 (desktop decision) |
| |
| **Consolidate**: |
| - 3 rerankers → `search_visual_rerank_rules.py` — DONE (common+event+rules |
| merged; bug found and fixed in _best_label_score case-sensitivity) |
| - 8 face files → defer to Phase 4. On inspection, the small ones |
| (face_search_common 40 LOC, face_person_index 87 LOC, face_route_common |
| 48 LOC) sit at the bottom of the dep tree and can't be merged upward |
| without circularity. The 5 mixin files map to real concerns (db, storage, |
| people, recognition, review) — collapsing them touches a 2000+ line class |
| and isn't worth the regression risk in this phase. |
| - Duplicate `_get_vod_categories()` → one helper in `media_metadata.py` — TODO |
| |
| **Policy change**: drop the 600-line hard limit from `CLAUDE.md`. Replace with |
| guidance: "files should do one thing; never split a coherent concept just to |
| satisfy a counter." |
| |
| **Gate**: golden tests pass. Tests still collect. LOC down ≥ 5k. |
| |
| ## Phase 2 — Single DB + Alembic (4–5 days) |
| |
| **Deliverables**: |
| - `backend/schema_version.py` — one connection factory, one DB file |
| - `backend/migrations/` — Alembic with baseline migration |
| - `scripts/migrate_to_single_db.py` — merge 5 source DBs into 1, verify row counts |
| - Remove every `CREATE TABLE IF NOT EXISTS` from app boot code |
| - Replace `_schema_metadata` per-db with single `alembic_version` |
|
|
| **Gate**: migration script runs cleanly on a copy of real DBs. |
| Golden tests pass against single DB. Row counts match. |
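
The row-count part of this gate can be sketched as follows. This is an illustration under stated assumptions, not the logic in `scripts/migrate_to_single_db.py`: the helper names and the `videos` stand-in table are hypothetical, and in-memory databases replace the real source and merged files.

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple


def table_counts(conn: sqlite3.Connection, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows per table. Table names are trusted operator input."""
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


def count_mismatches(expected: Dict[str, int],
                     merged: Dict[str, int]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for every table whose count differs."""
    return {t: (n, merged.get(t)) for t, n in expected.items() if merged.get(t) != n}


# Stand-in databases; a real check would open the source and merged files.
source = sqlite3.connect(":memory:")
merged = sqlite3.connect(":memory:")
for conn in (source, merged):
    conn.execute("CREATE TABLE videos (natural_key TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO videos VALUES (?)", [("a",), ("b",)])

expected = table_counts(source, ["videos"])
print(count_mismatches(expected, table_counts(merged, ["videos"])))  # {}
```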
|
|
| **Risk mitigation**: original 5 DBs untouched until 2 weeks of normal use. |
|
|
| ## Phase 3 — ML out of web process |
|
|
| ### 3a: CLI extraction |
|
|
| **Deliverables**: |
| - `backend/cli.py` with subcommands: `jws ingest vod`, `jws ingest subtitles`, |
| `jws ingest video`, `jws ingest faces`, `jws ingest images`, `jws reindex embeddings` |
| - All ML imports (`torch`, `deepface`, `transformers`, `whisper`, `transnetv2`) |
| moved into `backend/jwsearch/ingest/` |
| - Web process imports only `sentence-transformers` for query embeddings |
| (or none if 3b ships) |
| - Endpoints that previously kicked off processing now enqueue jobs |
|
|
| ### 3b: Query-time embedding sidecar (optional) |
|
|
| **Deliverables**: |
| - `backend/jwsearch/embed_service.py` — 100-line FastAPI process holding Qwen3 |
| - Main web process makes HTTP calls to it |
| - Web process imports zero ML libraries |
|
|
| ### 3c: Video lifecycle |
|
|
| **Deliverables**: |
| - `jws prune videos --keep-thumbnails --keep-embeddings` CLI command |
| - New column `source_deleted_at` on the videos table |
| - New env flag `SEARCH_UI_KEEP_SOURCE_VIDEOS` (default `false`) |
| - Ingestion workflow deletes MP4 after extraction if flag is unset |
| - `content_status` treats videos with `source_deleted_at` set as "complete" |
|
|
| **Rationale**: thumbnails (~50 MB/video) are about 7× smaller than source MP4s |
| (~350 MB/video). JW.org streams playback via `progressiveDownloadURL` already. |
| Re-extraction only needs re-download (bandwidth, not storage). |
|
|
| **Gate**: web process cold-start under 5s, RSS under 500 MB. Background |
| ingest produces identical indexed data (golden tests pass). |
| A pruned video still plays via `ClipPlayer` (poster + streaming URL). |
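
One way the keep/delete decision could be resolved is sketched below: an explicit argument wins, and `None` defers to `SEARCH_UI_KEEP_SOURCE_VIDEOS`. The function names and the set of accepted truthy spellings are assumptions for illustration, not the backend's actual parsing.

```python
import os
from typing import Mapping, Optional

# Assumed truthy spellings; the real parser may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}


def keep_source_videos(env: Mapping[str, str] = os.environ) -> bool:
    """True when SEARCH_UI_KEEP_SOURCE_VIDEOS is set to a truthy value."""
    return env.get("SEARCH_UI_KEEP_SOURCE_VIDEOS", "false").strip().lower() in TRUTHY


def should_delete_source(delete_video_after: Optional[bool] = None,
                         env: Mapping[str, str] = os.environ) -> bool:
    """An explicit argument wins; None defers to the env flag."""
    if delete_video_after is not None:
        return delete_video_after
    return not keep_source_videos(env)


print(should_delete_source(env={}))                                        # True
print(should_delete_source(env={"SEARCH_UI_KEEP_SOURCE_VIDEOS": "true"}))  # False
```

Using `None` as the default, rather than a hard-coded `True`, is what lets a batch caller honor the flag without passing anything.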
|
|
| ## Phase 4 — Route consolidation (3–5 days) |
|
|
| 13 routers → 4: |
|
|
| | New router | Replaces | |
| |---|---| |
| | `search.py` | search_routes, content_routes, analysis_routes (scripture) | |
| | `catalog.py` | catalog_routes, status_routes, publication_routes (read) | |
| | `people.py` | face_routes, face_route_persons, speaker_routes | |
| | `jobs.py` | workflow_routes, processing_routes, sync_routes, audio_description_routes, settings_routes | |
|
|
| Plus: `services.py` (centralized service factory), `errors.py` (global handlers), |
| `schemas.py` (Pydantic DTOs). |
|
|
| **Gate**: golden tests pass. Frontend works without changes (URLs preserved). |
|
|
| ## Phases 5–8 — Deferred (need runtime verification or production data) |
|
|
| 5: Frontend modernization, 6: Face re-platform, 7: Desktop story, 8: Smart-search reckoning. |
| Documented in chat; not started in this autonomous session. |
|
|
| ## Session 1 actual outcome (2026-05-26) |
|
|
| Phases 0, 1, 3c shipped. Phase 2 scaffolded (merge tool only). |
| Phases 3a, 3b, 4, 5, 6, 7, 8 deferred — they need runtime verification, |
| production data, or are best sequenced after the user flips to the |
| single DB. |
|
|
| **Net change**: ~5,000 LOC removed. 18 new tests added. Test suite: |
| 455/458 passing (3 pre-existing HuggingFace-network failures unchanged). |
| Branch: `claude/refactor-aggressive`. |
|
|
| **What the user needs to do next** to continue the refactor: |
|
|
| 1. **Run the golden snapshot** against the current backend with real data: |
| ``` |
| python scripts/snapshot_golden.py --base-url http://localhost:8001 \ |
| --output tests/golden/snapshot.json |
| git add tests/golden/snapshot.json && git commit |
| ``` |
| Without a baseline snapshot, Phase 4+ can't detect ranking regressions. |
|
|
| 2. **Merge the databases.** The merge tool now reconstructs regular |
| tables, vec0 (sqlite-vec) embeddings, AND FTS5 full-text indices, |
| preserving rowids so embedding→metadata joins survive. Steps: |
| ``` |
| # Dry run first — reports per-source row counts, writes nothing |
| python scripts/merge_databases.py --output ~/searchui-merged.db --dry-run |
| # Then for real (needs: pip install sqlite-vec) |
| python scripts/merge_databases.py --output ~/searchui-merged.db |
| ``` |
| Verify the merged DB serves searches correctly, THEN flip the app by |
| pointing all DB env vars at it: |
| ``` |
| export SEARCH_UI_SEARCH_DB_PATH=~/searchui-merged.db |
| export SEARCH_UI_IMAGE_DB_PATH=~/searchui-merged.db |
| export SEARCH_UI_FACE_DB_PATH=~/searchui-merged.db |
| export SEARCH_UI_SPEAKER_DB_PATH=~/searchui-merged.db |
| export SEARCH_UI_PUBLICATIONS_DB_PATH=~/searchui-merged.db |
| ``` |
| Run the golden diff (`scripts/diff_golden.py`) against the flipped |
| app to confirm no ranking drift. Keep the original 5 DBs untouched |
| for two weeks as the rollback path before archiving. |
|
|
| Still TODO in Phase 2: replace the per-DB `CREATE TABLE IF NOT EXISTS` |
| bootstrap + `_schema_metadata` with Alembic migrations so the single |
| DB has real schema versioning. The merge tool keeps only the first |
| source's `_schema_metadata` row — Alembic's `alembic_version` will |
| supersede it. |
|
|
| 3. **Try the video prune** in dry-run mode first: |
| ``` |
| python scripts/prune_source_videos.py --dry-run |
| ``` |
| Expect ~1 TB of disk reclaimed across 3,713 videos. |
|
|
| ## Session 2 actual outcome (2026-05-28) |
|
|
| Completed the Phase 2 merge tool (vec0 + FTS5 reconstruction with rowid |
| preservation), then ran full pre-merge QC on the whole `claude/refactor-aggressive` |
| branch (18 commits, net −2,952 LOC): |
|
|
| - **Safety review** (subagent): zero dangling references — every deleted module |
| (`llm_client`, `llm_router`, the two visual rerankers) and removed endpoint |
| (`/api/hello`, `/api/data`, `designMockups`) has zero remaining referents. |
| - **Standards review** (subagent): zero MUST-FIX. Fixed SHOULD-FIX items — |
| `SEARCH_UI_KEEP_SOURCE_VIDEOS` was silently ignored on the batch path |
| (`process_all_local_videos` hard-coded `delete_video_after=True`); defaulted |
| to the `None` sentinel + added a regression test. Removed a phantom |
| `--keep-largest` doc line and two dead imports. |
| - **Security review** (skill): no vulnerabilities. The SQL-building merge tool |
| and file-deleting prune script are operator CLIs whose only external inputs |
| are trusted env/CLI values and the app's own schema. |
|
|
| Test suite: **460 passed**, 3 pre-existing HuggingFace-network failures. Branch |
| merged to `main` via PR. Continuation handoff for the remaining phases lives in |
| `CONTINUATION_PROMPT.md`. |
|
|
| **Next (see CONTINUATION_PROMPT.md):** Phase 4 route consolidation → Phase 2 |
| Alembic → Phase 3a CLI-ingestion scaffold, each as an atomic-commit branch with |
| subagent QC and a PR for Glenn to merge after an app smoke-test. |
| |
| ## Phase 4 progress (Session 2, branch `claude/phase4-route-consolidation`) |
| |
| **Done & verified (13 → 10 route modules):** |
| - Added `backend/tests/test_app_boot.py` — assembles the real `create_app()` |
| (startup checks off via `SEARCH_UI_STARTUP_CHECKS=false`) and asserts the full |
| /api surface + no duplicate (path, method) registrations. This is the |
| regression guard that makes consolidation verifiable without the live app. |
| - `search_routes.py` ← absorbed `content_routes.py` + `analysis_routes.py` |
| (deduped the byte-identical `_get_default_*_service` helpers; dropped a dead |
| `import json`). |
| - `face_routes.py` ← absorbed `speaker_routes.py` (kept the module-level speaker |
| `router` singleton). |
| - Each step verified by a **byte-identical 128-route manifest** + full suite |
| green (462 passed). Old files deleted; importing tests use module aliases. |
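
The two checks described above (no duplicate registrations, identical manifest before and after a merge) reduce to simple set logic once the routes are flattened to `(path, method)` pairs. The sketch below assumes that flattening has already happened; the paths are made up, and this is not the code in `test_app_boot.py`.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (path, HTTP method)


def duplicate_routes(routes: Iterable[Route]) -> List[Route]:
    """Return every (path, method) pair registered more than once."""
    counts = Counter((path, method.upper()) for path, method in routes)
    return sorted(pair for pair, n in counts.items() if n > 1)


def manifest(routes: Iterable[Route]) -> List[Route]:
    """Order-independent manifest for before/after comparison."""
    return sorted((path, method.upper()) for path, method in routes)


# Hypothetical routes: the same surface, registered in a different order.
before = [("/api/search", "GET"), ("/api/speakers", "GET")]
after = [("/api/speakers", "get"), ("/api/search", "GET")]
assert duplicate_routes(after) == []
assert manifest(before) == manifest(after)
```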
|
|
| **Naming note:** `search.py` is the search ENGINE module, so the consolidated |
| router keeps the `*_routes.py` convention rather than the plan's `search.py`. |
| Targets are now: `search_routes`, `catalog_routes`, `face_routes`, |
| `workflow_routes` (4 ≤ 5). |
|
|
| **Final state (Session 3): 13 → 6 routers. Phase 4 closed here.** |
| Cloud PRs #34–39 carried it past the Session-2 "13→10" note: `catalog_routes` |
| absorbed `status_routes`, and `workflow_routes` absorbed `audio_description` + |
| `settings` + `sync`. Current routers (6): `search_routes`, `catalog_routes` |
| (+status), `face_routes` (+speaker), `workflow_routes` (+AD/settings/sync), |
| `processing_routes`, `publication_routes`. |
|
|
| **Decision: stop at 6. Do NOT force the last two merges.** A deep-dive review |
| (Session 3) found both remaining merges trade maintainability for a smaller |
| count — exactly what CLAUDE.md's file-size guidance warns against: |
| - **`processing → workflow`** → a ~1,790-LOC file with **five** unrelated |
| concerns (orchestration, AD, settings, sync, batch-processing) wired by two |
| independent runtimes (`WorkflowRuntime`, `ProcessingRuntime`) that share no |
| code. Strictly worse. Rejected. |
| - **`publication → catalog`** → a ~1,158-LOC, three-concern `catalog_routes`, |
| and `test_publication_routes.py`'s `importlib.reload` would pollute sibling |
| catalog/status tests. Publications is its own concern (separate JW |
| publications API, DB, and image search). Marginal count gain, real risk. |
| Rejected. |
|
|
| The `≤5` North-star was a ceiling, not a quota. 6 coherent routers is the |
| maintainable resting place; the churn budget is better spent on search quality. |
| If a future session wants 5, the publication→catalog merge is the only |
| defensible one and must first replace that test's reload with attribute-patching. |
|
|
| ## Session 3 — DB consolidation flipped to single DB (2026-05-29) |
|
|
| Validated and **flipped the live app onto the merged single DB**. Steps taken, |
| and what's needed to keep it or roll it back: |
|
|
| - **Merged DB:** `/Users/avsadmin/searchui-merged.db` (3.2 GB; built with |
| `scripts/merge_databases.py` after fixing the vec0 PK-alias bug). All 15 key |
| row counts match the per-source dry-run exactly. |
| - **Durable launcher:** `scripts/run-backend-merged.sh` sets the five env vars |
| (override location via `SEARCH_UI_MERGED_DB`); see CLAUDE.md Development |
| Workflow. Merged to `main` 2026-05-29. |
| - **Flip is LIVE in the running backends** (`:8001` and `:8002`) via these env |
| vars on the uvicorn launch (settings.db stays separate — it is NOT merged): |
| ``` |
| SEARCH_UI_SEARCH_DB_PATH=/Users/avsadmin/searchui-merged.db |
| SEARCH_UI_IMAGE_DB_PATH=/Users/avsadmin/searchui-merged.db |
| SEARCH_UI_FACE_DB_PATH=/Users/avsadmin/searchui-merged.db |
| SEARCH_UI_SPEAKER_DB_PATH=/Users/avsadmin/searchui-merged.db |
| SEARCH_UI_PUBLICATIONS_DB_PATH=/Users/avsadmin/searchui-merged.db |
| ``` |
| - **To make it durable** (survive a manual restart): start the backend with |
| `scripts/run-backend-merged.sh` (it sets those env vars for you), per CLAUDE.md's |
| Development Workflow. The app reads `os.environ` directly — there is no `.env` |
| loader — so starting it the *plain* way (without the launcher) reverts to the |
| 5 source DBs. |
| - **Rollback:** unset the five env vars and restart → back to the original 5 |
| DBs, which are left untouched. **RETENTION: do NOT archive/delete the original |
| 5 source DBs (`database.db`, `images_database.db`, `faces_database.db`, |
| `speakers_database.db`, `publications_database.db`) before ~2026-06-12** (≈2 |
| weeks of normal single-DB use). They are the rollback path until then. |
| - **Golden baseline re-captured on the merged DB** (`tests/golden/snapshot.json`). |
| Smoke-tested keyword/semantic/hybrid/image-content/title/scripture/ |
| publication-image on the flipped `:8001` — all return sensible results. |
| - **Known cosmetic diff vs the 5-DB world:** one semantic query's two |
| equal-score (0.71191) hits swap tie-order (vec0 index rebuild). Same set, |
| same scores. A deterministic tie-break (ORDER BY distance, natural_key) is a |
| good follow-up under search-quality. |
| |
| ## Phase 2 Alembic foundation (Session 3) — SCOPED, opt-in |
| |
| Added a safe foundation for schema versioning without the risky full cutover |
| (the Plan agent showed a full cutover is currently unsafe — see "deferred"). |
| |
| **Delivered:** |
| - `backend/schema_version.py` — `ensure_alembic_version()` idempotently brings a DB under |
| Alembic: existing DB → `stamp` baseline; empty file → `upgrade`; already |
| versioned → no-op. Plus `get_primary_db_path()`. |
| - Wired into `app_runtime.create_app_runtime()` **behind `SEARCH_UI_ALEMBIC_MANAGE` |
| (OFF by default)** — so merging changes nothing until opted in. When enabled, |
| the live merged DB gets stamped at baseline (`alembic_version` row; safe, |
| reversible, no data change). `init_db()` still owns the schema; failure is |
| logged, not fatal. |
| - Tests: stamp/upgrade/idempotent detection + flag-gated boot wiring. |
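
The three-way decision described for `ensure_alembic_version()` can be sketched as a pure classification step. This is an assumption-laden illustration: `classify_db` is a hypothetical name, it only inspects the table list, and it does not call Alembic itself.

```python
import sqlite3


def classify_db(conn: sqlite3.Connection) -> str:
    """Return the action a schema bootstrap would take: 'noop', 'stamp', or 'upgrade'."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if "alembic_version" in tables:
        return "noop"      # already versioned
    if tables:
        return "stamp"     # existing schema: record the baseline, change nothing
    return "upgrade"       # empty file: run migrations from scratch


conn = sqlite3.connect(":memory:")
print(classify_db(conn))  # upgrade
conn.execute("CREATE TABLE videos (natural_key TEXT)")
print(classify_db(conn))  # stamp
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
print(classify_db(conn))  # noop
```

Because each state maps to exactly one action, calling the bootstrap repeatedly is idempotent: after the first run every later call lands in the `noop` branch.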
| |
| **Deferred (do NOT do without a separate, carefully-verified PR):** |
| - Replacing the no-op baseline with a verbatim live-schema migration, and |
| removing the per-subsystem `init_db()` `CREATE TABLE IF NOT EXISTS`. Blockers: |
| Alembic autogenerate can't model the 12 vec0/FTS5 virtual tables (baseline |
| must be hand-authored raw DDL with sqlite-vec loaded at migrate time), and |
| `_schema_metadata` carries embedding model/recipe info `alembic_version` does |
| not replace. |
| |
| **Finding — orphan tables not in source.** `video_concepts`, `video_concepts_fts`, |
| `video_concept_embeddings` are READ by `search_semantic.py` but have **no |
| `CREATE` statement in `backend/`** — they exist only because some ingestion path |
| (outside the web backend) created them in the live DB. A fresh web-only install |
| would lack them. Their live DDL (captured for the future baseline): |
| ```sql |
| CREATE TABLE video_concepts (natural_key TEXT NOT NULL, language TEXT NOT NULL, |
| summary TEXT NOT NULL, topics_json TEXT NOT NULL, keywords_json TEXT NOT NULL, |
| concept_text TEXT NOT NULL, content_hash TEXT, recipe_version INTEGER NOT NULL, |
| recipe_payload TEXT NOT NULL, indexed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, |
| visual_cues_json TEXT NOT NULL DEFAULT '[]', PRIMARY KEY (natural_key, language)); |
| CREATE VIRTUAL TABLE video_concepts_fts USING fts5(natural_key, language, concept_text); |
| CREATE VIRTUAL TABLE video_concept_embeddings USING vec0(natural_key TEXT, language TEXT, embedding float[1024]); |
| ``` |
| ## Phase 3a CLI scaffold (Session 3) — additive, ML untouched |
| |
| Added `backend/cli.py`: a stdlib-argparse ingestion entrypoint so ingestion can |
| run OUTSIDE the web process. **Purely additive** — it calls the existing |
| processing functions unchanged and does NOT move ML off the web request path |
| (that flip needs Glenn's cold-start RSS verification; see cutover below). |
| |
| **Subcommands wired (each calls an existing function; heavy imports are |
| function-local so `cli.py --help` and the parser load no torch):** |
| `ingest vod` · `ingest subtitles` · `ingest video [--all]` · `ingest images |
| --source {publications|web}` · `process subtitles` · `reindex embeddings` · |
| `reindex subtitles`. Tested via parser-routing + dispatch (faked modules) + a |
| subprocess guard asserting `torch`/`process_video` are absent after import. |
| |
| **Deferred:** `ingest faces` — faces have no standalone ingestion function; they |
| run inside `process_video`'s thumbnail step. A standalone face re-index is |
| net-new orchestration (pair it with Phase 6). |
|
|
| **ML import boundary (what the cutover must move):** the web process imports |
| torch eagerly today via `search.py` (`from sentence_transformers import ...`, |
| top-level) and `search_images.py` (`import torch` / `from transformers import |
| ...`, top-level), both pulled in at boot through `app_runtime`. DeepFace/TF |
| (face_search), Whisper (transcription), TransNetV2 (scene detect), CLAP/VLM |
| (scene_processing) are lazy *with respect to web boot* — the boot path never |
| imports them, so they stay out of web RSS. (Note: `video_scene_detect` itself |
| imports `search_images` at module level, so importing it directly still pulls |
| torch; only the TransNet model load is lazy. Relevant for step 4's relocation.) |
|
|
| **Cutover plan (separate PR, gated on Glenn's runtime check — do NOT do blind):** |
| 1. Make torch import-lazy in `search.py` + `search_images.py` (move the imports |
| into `get_embedding_model()` / `get_siglip_model()`; `TYPE_CHECKING` for |
| annotations). Web boot then imports no torch. |
| 2. Decide query-time embeddings (the real fork): keyword/title/scripture/ |
| image-category need NO model; semantic/hybrid + image-content text→embedding |
| DO. Either ship the **3b embedding sidecar** (web imports zero ML — only way |
| to hold <500 MB steady-state) or keep the model lazy in-process (cold-start |
| is lean but the first semantic query loads torch into web RSS). |
| 3. Make heavy-ingestion endpoints **enqueue a job / shell out to `cli.py`** |
| instead of running ML in the request worker. |
| 4. (optional) relocate ML modules under `backend/jwsearch/ingest/`. |
|
|
| **Gate (Glenn verifies in his runtime, not a sandbox):** boot `uvicorn main:app` |
| with the single-DB env vars, no warm queries; `ps -o rss= -p <pid>` after |
| `/api/health` 200 → target < 512000 KB; assert `torch` absent from the web |
| process; re-run `scripts/diff_golden.py` (no ranking drift) and confirm |
| background-ingest output is byte-identical to in-process. |
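
The RSS part of this gate could be scripted roughly as below. It is a sketch only: the helper names are hypothetical, and it assumes `ps` reports RSS in kilobytes, which is the case on Linux and macOS.

```python
import subprocess

RSS_BUDGET_KB = 500 * 1024  # 512000 KB, the gate's target


def parse_rss_kb(ps_output: str) -> int:
    """Parse the single number printed by `ps -o rss= -p <pid>`."""
    return int(ps_output.strip())


def read_rss_kb(pid: int) -> int:
    """Ask ps for a process's resident set size in KB."""
    out = subprocess.run(["ps", "-o", "rss=", "-p", str(pid)],
                         capture_output=True, text=True, check=True).stdout
    return parse_rss_kb(out)


def within_budget(rss_kb: int, budget_kb: int = RSS_BUDGET_KB) -> bool:
    """True when the measured RSS is strictly under the budget."""
    return rss_kb < budget_kb


# 77824 KB is 76 MB, used here as a sample reading.
print(within_budget(parse_rss_kb("  77824\n")))  # True
```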
|
|
| ## Session 4 — Tie-break + Phase 3a cold-start cutover (2026-06-01) |
|
|
| Baseline confirmed green first: `main`==`origin/main` (50ef139), 628 tests |
| passing, merged-DB flip live, all six search families sane, golden diff clean. |
|
|
| **1. Deterministic semantic tie-break (search-quality follow-up from Session 3) — |
| committed.** sqlite-vec rejects a secondary `ORDER BY` on KNN queries, so |
| equal-distance hits arrived in index-dependent order (the 0.71191 tie-swap noted |
| in Session 3). Fixed in the three Python ranking sites (`search_semantic`, |
| `search_video_concepts`, `search_hybrid`) by breaking score ties on |
| `natural_key`. Verified on the real merged DB: the two previously-drifting golden |
| queries now have identical key sets + identical (key,score) multisets — only |
| equal-score hits reorder, now deterministically. search-eval (n=150): |
| title/keyword unchanged; hybrid recall@1 24.67→24.00% (1 sample, tie-ambiguity |
| noise, now *stable* vs rebuild-dependent). The hybrid wobble traces to the |
| semantic tie-break propagating into hybrid RRF ranks (hybrid reuses |
| search_semantic's order), not the concept sort. Golden re-baselined deterministic. |
| Added `test_search_tie_break.py`. |
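
A minimal sketch of a Python-side tie-break, assuming hits are `(natural_key, score)` pairs ranked by descending score. The tuple layout and the name `rank_hits` are illustrative; the actual ranking sites may sort on distance or carry richer records.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (natural_key, score)


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Sort by score descending, breaking ties on natural_key ascending."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# The same hits arriving in two different index-dependent orders.
index_order_a = [("vid_b", 0.71191), ("vid_a", 0.71191), ("vid_c", 0.9)]
index_order_b = [("vid_a", 0.71191), ("vid_c", 0.9), ("vid_b", 0.71191)]
print(rank_hits(index_order_a))
# [('vid_c', 0.9), ('vid_a', 0.71191), ('vid_b', 0.71191)]
assert rank_hits(index_order_a) == rank_hits(index_order_b)
```

Sorting on the composite key makes the output a function of the hit set alone, so an index rebuild can no longer change the order of equal-score results.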
|
|
| **2. Phase 3a cutover step 1 (lazy torch imports) — committed & runtime-verified.** |
| Moved torch/transformers/sentence-transformers out of module scope into |
| function-local imports in `search.py`, `search_images.py`, |
| `image_siglip_inference.py` (the three boot-path torch importers). **Cold-start |
| web RSS 450 MB → 76 MB; torch absent at boot** (new subprocess guard |
| `test_web_boot_ml_free.py`). Models still load lazily on first query (first |
| semantic → 828 MB, +visual → 1071 MB); results byte-identical (golden: all 12 |
| match; eval unchanged). 632 tests pass. |
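
The lazy-import pattern looks roughly like this. It is a sketch under assumptions, not the code in `search.py`: the model name is a placeholder, `loaded_ml_modules` is a hypothetical helper, and the heavy import only happens if `get_embedding_model()` is actually called.

```python
import importlib
import sys
from functools import lru_cache
from typing import TYPE_CHECKING, List, Sequence

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    from sentence_transformers import SentenceTransformer


@lru_cache(maxsize=1)
def get_embedding_model(model_name: str = "placeholder/model-name") -> "SentenceTransformer":
    """Import the heavy library on first use, then reuse the cached model."""
    st = importlib.import_module("sentence_transformers")
    return st.SentenceTransformer(model_name)


def loaded_ml_modules(
    names: Sequence[str] = ("torch", "transformers", "sentence_transformers"),
) -> List[str]:
    """List which of the named modules are already imported in this process."""
    return [name for name in names if name in sys.modules]


# A boot guard would assert this list is empty before the first query.
print(loaded_ml_modules())
```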
|
|
| **Decision (Glenn): keep the model lazy in-process (the second option in cutover step 2), NOT the 3b sidecar.** Rationale |
| is the HF deployment shape: the Space is a *single* Docker container (free tier, |
| one uvicorn). A sidecar in the same container doesn't lower container RAM (the |
| model sits in one process either way); it only helps a true multi-machine split, |
| which isn't the deployment and would add per-query network latency + an always-on |
| paid service. Steady-state ~1.07 GB fits the 16 GB Space fine. The cold-start win |
| (faster Space wake; word/title/scripture searches ready instantly without loading |
| ML) is the real benefit and is now banked. Adjacent known issue, NOT addressed |
| here: CPU semantic/visual latency on the free Space (inherent to no-GPU; would |
| need caching / lighter model / precompute — a separate conversation). |
|
|
| **3. Single-video ingestion moved off the web worker — committed & runtime-verified.** |
| `POST /api/process-video` now shells out to `python -m cli ingest video` (new |
| `--result-json` gives a clean JSON artifact; the helper inherits env so the |
| subprocess targets the same DBs). Verified: endpoint returns HTTP 200 with the |
| identical result dict while the **web worker stays 76 MB / 0 torch** and a |
| separate subprocess (~1.3 GB) does all the ML. Response contract + delete |
| semantics unchanged. 640 tests pass. |
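
The shell-out pattern can be sketched as below. The helper `run_job` and the stand-in child are hypothetical; the child merely writes a small JSON file so the example runs anywhere, and how the real CLI receives its arguments is not shown here.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import Any, Dict, List


def run_job(argv: List[str]) -> Dict[str, Any]:
    """Run argv + [result_path] in a child process and return its JSON result."""
    fd, result_path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        # env=None means the child inherits the parent's environment,
        # so it sees the same DB path variables as the web process.
        subprocess.run(argv + [result_path], check=True, env=None)
        with open(result_path, encoding="utf-8") as fh:
            return json.load(fh)
    finally:
        os.unlink(result_path)


# Stand-in child: writes a result dict to the path given as its last argument.
CHILD = ("import json, pathlib, sys; "
         "pathlib.Path(sys.argv[-1]).write_text(json.dumps({'status': 'ok'}))")
print(run_job([sys.executable, "-c", CHILD]))  # {'status': 'ok'}
```

Passing the result through a file rather than stdout keeps the artifact clean even when the child logs progress to its output streams.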
|
|
| **Still open in 3a (not started):** |
| - The **streaming bulk endpoints** (`process-all-videos`, `-v2`, `retry-failed`) |
| still run `process_video` inline — they stream live SSE progress, so moving |
| them out-of-process needs a progress-bridge design (subprocess → file/queue → |
| SSE). Deliberately deferred as a separate, designed piece. |
| - Composite workflows (`/api/update-content`, `reprocess-existing`, |
| `nuclear-rebuild`) — no single CLI subcommand yet. |
| - `index-publication-images` / `crawl-web-images` → `cli ingest images`. |
| - Optionally relocate ML modules under `backend/jwsearch/ingest/`. |
|
|
| ## Guardrails |
|
|
| 1. Golden tests pass at every commit. |
| 2. No phase ships without a rollback path. |
| 3. CLAUDE.md updated as part of the phase that invalidates a rule. |
| 4. Atomic commits. Subagent code review at each phase boundary. |
| 5. No model swaps and dependency cleanup in the same commit. |
|
|