
	# Search-UI — Aggressive Refactor Plan

	This document tracks the autonomous refactor begun on `claude/refactor-aggressive`.
	It is the canonical source for what the refactor is doing, the order of work,
	and the gates between phases.

	## Goal

	Cut LOC by ~50%, separate ingestion from serving, replace hand-rolled infra
	with proven libraries, modernize the frontend. App must stay functional at
	every phase boundary.

	## North-star metrics

	| Metric | Before | Target |
	|---|---|---|
	| Backend Python LOC | ~35,850 | ~10,000 |
	| Frontend JS/JSX LOC | ~11,820 | ~5,000 |
	| `*_routes.py` files | 13 | ≤ 5 |
	| SQLite DB files | 5 | 1 |
	| Web-process cold-start RSS | 450 MB → 76 MB (Session 4) | < 500 MB ✓ (cold) |
	| Tests collecting cleanly | 85/443 → 443/443 after Phase 0 | 443/443 |
	| Tests passing (sandbox, no model network) | 440/443 | 443/443 in user env |

	## Phases & status

	| Phase | Title | Status |
	|---|---|---|
	| 0 | Safety net (conftest, golden tests, smoke script) | DONE |
	| 1 | Subtraction: dead code, redundant routers, rerankers, dedup | DONE |
	| 2 | Single database + merge tool + Alembic | MERGE TOOL VALIDATED on real DBs (Session 3): found+fixed a vec0 INTEGER-PRIMARY-KEY-alias copy bug; merged 5→1 (3.2 GB), all 15 key row counts exact, golden diff identical except one benign tie-swap of two equal-score (0.71191) semantic hits. Flipped to the single DB (Session 3). Alembic SCOPED FOUNDATION added (opt-in); full cutover deliberately deferred, see "Phase 2 Alembic foundation" below. |
	| 3a | ML out of web request path: CLI extraction | Cutover step 1 (lazy torch imports) DONE & runtime-verified (Session 4): cold-start web RSS 450→76 MB, torch absent at boot, models load lazily on first query, results byte-identical (golden + eval). CLI scaffold from Session 3 stands. Steps 3–4 (enqueue ingestion / relocate modules) still open. |
	| 3b | Query-time embedding sidecar (optional but recommended) | DROPPED for current deployments (Session 4 decision). On the single-container HF Docker Space, web+sidecar share one container's RAM, so a sidecar does NOT lower container memory; its only real benefit is a true multi-machine split (web on HF, model on a GPU box), which isn't the deployment and would add per-query network latency + an always-on paid service. Revisit only if/when web and ML run on separate machines. |
	| 3c | Video lifecycle: prune source MP4s after ingestion | DONE |
	| 4 | Route consolidation: 13 → 6 coherent routers | DONE at 6 (Session 3 decision). 13→6 via cloud PRs #34–39 (search←content+analysis, face←speaker, catalog←status, workflow←AD+settings+sync). Stopped at 6 deliberately: the last two merges would chase a counter at the cost of multi-concern files; see "Phase 4 progress" below. |
	| 5 | Frontend: TypeScript + React Router + TanStack Query + Tailwind | DEFERRED (needs runtime verification) |
	| 6 | Face: DeepFace → insightface (ONNX) | DEFERRED (needs runtime verification) |
	| 7 | Desktop story (drop Electron or → Tauri) | DEFERRED (needs decision) |
	| 8 | Smart-search reckoning (instrument & decide) | DEFERRED (needs production data) |

	## Phase 0 — Safety net (1–2 days)

	Build the regression detector before changing anything.

	Deliverables:
	- `conftest.py` at repo root that fixes the two-import-conventions test mess
	- `backend/__init__.py` so tests can import either flat or namespaced
	- `tests/golden/` query fixture set (run against user's real DB to populate)
	- `scripts/snapshot_golden.py` — capture top-N results per query
	- `scripts/diff_golden.py` — re-run and diff
	- `scripts/smoke_test.sh` — boot backend, hit 20 critical endpoints
	- All collection errors fixed (target: 131/131 tests collect, even if some skip)
	- Tag pre-refactor state for rollback

	Gate: all tests collect cleanly. Golden snapshot scripts run end-to-end
	against a dummy DB. Smoke script returns exit 0.
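The snapshot/diff pair can be pictured as follows. This is a minimal sketch, not the contents of `scripts/diff_golden.py`: the snapshot shape (query mapped to an ordered list of hits with `key` and `score`) and the function name `diff_snapshots` are assumptions made for illustration.

```python
import json
from typing import Dict, List

# Assumed shape: query string -> ordered hits, each a dict with "key" and "score".
Snapshot = Dict[str, List[dict]]


def diff_snapshots(baseline: Snapshot, current: Snapshot) -> Dict[str, str]:
    """Return {query: reason} for every baseline query whose results changed."""
    problems: Dict[str, str] = {}
    for query, expected in baseline.items():
        actual = current.get(query)
        if actual is None:
            problems[query] = "missing from current run"
            continue
        expected_keys = [hit["key"] for hit in expected]
        actual_keys = [hit["key"] for hit in actual]
        if expected_keys == actual_keys:
            continue  # identical ranking, nothing to report
        if sorted(expected_keys) == sorted(actual_keys):
            problems[query] = "same hits, different order"
        else:
            problems[query] = "hit set changed"
    return problems


if __name__ == "__main__":
    baseline = {"faith": [{"key": "a", "score": 0.9}, {"key": "b", "score": 0.7}]}
    current = {"faith": [{"key": "b", "score": 0.7}, {"key": "a", "score": 0.9}]}
    print(json.dumps(diff_snapshots(baseline, current), indent=2))
```

Separating "same hits, different order" from "hit set changed" keeps a benign reorder distinguishable from a real regression when reading the diff.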

	## Phase 1 — Subtraction (2–3 days)

	Pure deletion. No new deps, no architecture change.

	Delete outright:
	- `frontend/src/designMockups/` (13 files, ~485 LOC) — DONE
	- `MockupReview.jsx` + `?view=mockups` routing — DONE
	- `main.py:/api/hello`, `/api/data` placeholders — DONE
	- `llm_router.py` + `llm_client.py` (orphaned, ~896 LOC) — DONE
	- `MiniLM` fallback in `ai_features.py` — DONE
	- Mixedbread legacy embedding code — DEFERRED to Phase 2
	(still used by search_semantic.py for whole-document index;
	removing it now breaks search against legacy DB rows. Will be
	retired via Alembic backfill migration during DB consolidation.)
	- `pywebview` — DEFERRED to Phase 7 (desktop decision)

	Consolidate:
	- 3 rerankers → `search_visual_rerank_rules.py` — DONE (common+event+rules
	merged; bug found and fixed in _best_label_score case-sensitivity)
	- 8 face files → defer to Phase 4. On inspection, the small ones
	(face_search_common 40 LOC, face_person_index 87 LOC, face_route_common
	48 LOC) sit at the bottom of the dep tree and can't be merged upward
	without circularity. The 5 mixin files map to real concerns (db, storage,
	people, recognition, review) — collapsing them touches a 2000+ line class
	and isn't worth the regression risk in this phase.
	- Duplicate `_get_vod_categories()` → one helper in `media_metadata.py` — TODO

	Policy change: drop the 600-line hard limit from `CLAUDE.md`. Replace with
	guidance: "files should do one thing; never split a coherent concept just to
	satisfy a counter."

	Gate: golden tests pass. Tests still collect. LOC down ≥ 5k.

	## Phase 2 — Single DB + Alembic (4–5 days)

	Deliverables:
	- `backend/schema_version.py` — one connection factory, one DB file
	- `backend/migrations/` — Alembic with baseline migration
	- `scripts/migrate_to_single_db.py` — merge 5 source DBs into 1, verify row counts
	- Remove every `CREATE TABLE IF NOT EXISTS` from app boot code
	- Replace `_schema_metadata` per-db with single `alembic_version`

	Gate: migration script runs cleanly on a copy of real DBs.
	Golden tests pass against single DB. Row counts match.

	Risk mitigation: original 5 DBs untouched until 2 weeks of normal use.
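The "row counts match" gate could be checked along these lines. This is a sketch under assumptions: `table_counts` and `count_mismatches` are hypothetical helpers, not functions from `scripts/migrate_to_single_db.py`, and counting a vec0 virtual table would additionally require the sqlite-vec extension to be loaded on the connection.

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple


def table_counts(db_path: str, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows per table. Table names must come from a trusted list."""
    conn = sqlite3.connect(db_path)
    try:
        counts: Dict[str, int] = {}
        for table in tables:
            # Identifiers cannot be bound as SQL parameters, so quote them.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


def count_mismatches(
    source_counts: Dict[str, int], merged_counts: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for every table whose count differs."""
    return {
        table: (expected, merged_counts.get(table))
        for table, expected in source_counts.items()
        if merged_counts.get(table) != expected
    }
```

An empty result from `count_mismatches` is the pass condition; a table missing from the merged DB shows up with `None` as its actual count.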

	## Phase 3 — ML out of web process

	### 3a: CLI extraction

	Deliverables:
	- `backend/cli.py` with subcommands: `jws ingest vod`, `jws ingest subtitles`,
	`jws ingest video`, `jws ingest faces`, `jws ingest images`, `jws reindex embeddings`
	- All ML imports (`torch`, `deepface`, `transformers`, `whisper`, `transnetv2`)
	moved into `backend/jwsearch/ingest/`
	- Web process imports only `sentence-transformers` for query embeddings
	(or none if 3b ships)
	- Endpoints that previously kicked off processing now enqueue jobs

	### 3b: Query-time embedding sidecar (optional)

	Deliverables:
	- `backend/jwsearch/embed_service.py` — 100-line FastAPI process holding Qwen3
	- Main web process makes HTTP calls to it
	- Web process imports zero ML libraries

	### 3c: Video lifecycle

	Deliverables:
	- `jws prune videos --keep-thumbnails --keep-embeddings` CLI command
	- New column `source_deleted_at` on the videos table
	- New env flag `SEARCH_UI_KEEP_SOURCE_VIDEOS` (default `false`)
	- Ingestion workflow deletes MP4 after extraction if flag is unset
	- `content_status` treats videos with `source_deleted_at` set as "complete"

	Rationale: thumbnails (~50 MB/video) are roughly 7× smaller than source MP4s
	(~350 MB/video). JW.org streams playback via `progressiveDownloadURL` already.
	Re-extraction only needs re-download (bandwidth, not storage).
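A back-of-envelope check of the storage argument, using the per-video averages quoted above and the 3,713-video library size mentioned later in this document. The per-video sizes are rough averages, so treat the output as an order-of-magnitude estimate; `prune_estimate` is a hypothetical helper.

```python
MB_PER_TB = 1_000_000  # decimal units


def prune_estimate(video_count: int, mp4_mb: float, thumbs_mb: float) -> dict:
    """Estimate disk reclaimed by deleting MP4s while keeping thumbnails."""
    reclaimed_mb = video_count * mp4_mb    # source MP4s that get deleted
    retained_mb = video_count * thumbs_mb  # thumbnails that stay on disk
    return {
        "reclaimed_tb": reclaimed_mb / MB_PER_TB,
        "retained_tb": retained_mb / MB_PER_TB,
        "size_ratio": mp4_mb / thumbs_mb,
    }


estimate = prune_estimate(video_count=3713, mp4_mb=350, thumbs_mb=50)
print(estimate)
```

With these inputs the reclaimed space comes out at about 1.3 TB against roughly 0.19 TB of retained thumbnails.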

	Gate: web process cold-start under 5s, RSS under 500 MB. Background
	ingest produces identical indexed data (golden tests pass).
	A pruned video still plays via `ClipPlayer` (poster + streaming URL).

	## Phase 4 — Route consolidation (3–5 days)

	13 routers → 4:

	| New router | Replaces |
	|---|---|
	| `search.py` | search_routes, content_routes, analysis_routes (scripture) |
	| `catalog.py` | catalog_routes, status_routes, publication_routes (read) |
	| `people.py` | face_routes, face_route_persons, speaker_routes |
	| `jobs.py` | workflow_routes, processing_routes, sync_routes, audio_description_routes, settings_routes |

	Plus: `services.py` (centralized service factory), `errors.py` (global handlers),
	`schemas.py` (Pydantic DTOs).

	Gate: golden tests pass. Frontend works without changes (URLs preserved).

	## Phases 5–8 — Deferred (need runtime verification or production data)

	5: Frontend modernization, 6: Face re-platform, 7: Desktop story, 8: Smart-search reckoning.
	Documented in chat; not started in this autonomous session.

	## Session 1 actual outcome (2026-05-26)

	Phases 0, 1, 3c shipped. Phase 2 scaffolded (merge tool only).
	Phases 3a, 3b, 4, 5, 6, 7, 8 deferred — they need runtime verification,
	production data, or are best sequenced after the user flips to the
	single DB.

	Net change: ~5,000 LOC removed. 18 new tests added. Test suite:
	455/458 passing (3 pre-existing HuggingFace-network failures unchanged).
	Branch: `claude/refactor-aggressive`.

	What the user needs to do next to continue the refactor:

	1. Run the golden snapshot against current backend with real data:
	```
	python scripts/snapshot_golden.py --base-url http://localhost:8001 \
	--output tests/golden/snapshot.json
	git add tests/golden/snapshot.json && git commit
	```
	Without a baseline snapshot, Phase 4+ can't detect ranking regressions.

	2. Merge the databases. The merge tool now reconstructs regular
	tables, vec0 (sqlite-vec) embeddings, AND FTS5 full-text indices,
	preserving rowids so embedding→metadata joins survive. Steps:
	```
	# Dry run first — reports per-source row counts, writes nothing
	python scripts/merge_databases.py --output ~/searchui-merged.db --dry-run
	# Then for real (needs: pip install sqlite-vec)
	python scripts/merge_databases.py --output ~/searchui-merged.db
	```
	Verify the merged DB serves searches correctly, THEN flip the app by
	pointing all DB env vars at it:
	```
	export SEARCH_UI_SEARCH_DB_PATH=~/searchui-merged.db
	export SEARCH_UI_IMAGE_DB_PATH=~/searchui-merged.db
	export SEARCH_UI_FACE_DB_PATH=~/searchui-merged.db
	export SEARCH_UI_SPEAKER_DB_PATH=~/searchui-merged.db
	export SEARCH_UI_PUBLICATIONS_DB_PATH=~/searchui-merged.db
	```
	Run the golden diff (`scripts/diff_golden.py`) against the flipped
	app to confirm no ranking drift. Keep the original 5 DBs untouched
	for two weeks as the rollback path before archiving.

	Still TODO in Phase 2: replace the per-DB `CREATE TABLE IF NOT EXISTS`
	bootstrap + `_schema_metadata` with Alembic migrations so the single
	DB has real schema versioning. The merge tool keeps only the first
	source's `_schema_metadata` row — Alembic's `alembic_version` will
	supersede it.

	3. Try the video prune in dry-run mode first:
	```
	python scripts/prune_source_videos.py --dry-run
	```
	Expect ~1 TB of disk reclaimed across 3,713 videos.
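The flip in step 2 amounts to pointing five environment variables at one file. A sketch of the same thing from Python, for anyone launching the backend programmatically; `merged_db_env` and `rollback_env` are hypothetical helpers, and only the variable names come from this document.

```python
import os
from typing import Dict, Mapping

DB_ENV_VARS = (
    "SEARCH_UI_SEARCH_DB_PATH",
    "SEARCH_UI_IMAGE_DB_PATH",
    "SEARCH_UI_FACE_DB_PATH",
    "SEARCH_UI_SPEAKER_DB_PATH",
    "SEARCH_UI_PUBLICATIONS_DB_PATH",
)


def merged_db_env(merged_db: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy base_env and point all five DB variables at one merged file."""
    env = dict(base_env)
    # The shell expands "~" in `export X=~/file`; Python has to do it explicitly.
    resolved = os.path.abspath(os.path.expanduser(merged_db))
    for name in DB_ENV_VARS:
        env[name] = resolved
    return env


def rollback_env(base_env: Mapping[str, str]) -> Dict[str, str]:
    """Drop the five variables so the app falls back to the original DBs."""
    return {k: v for k, v in base_env.items() if k not in DB_ENV_VARS}
```

The resulting dict can be passed as `env=` to `subprocess.Popen` when starting the server. Expanding `~` up front matters because a literal `~/searchui-merged.db` in the environment is not expanded by the receiving process.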

	## Session 2 actual outcome (2026-05-28)

	Completed the Phase 2 merge tool (vec0 + FTS5 reconstruction with rowid
	preservation), then ran full pre-merge QC on the whole `claude/refactor-aggressive`
	branch (18 commits, net −2,952 LOC):

	- Safety review (subagent): zero dangling references — every deleted module
	(`llm_client`, `llm_router`, the two visual rerankers) and removed endpoint
	(`/api/hello`, `/api/data`, `designMockups`) has zero remaining referents.
	- Standards review (subagent): zero MUST-FIX. Fixed SHOULD-FIX items —
	`SEARCH_UI_KEEP_SOURCE_VIDEOS` was silently ignored on the batch path
	(`process_all_local_videos` hard-coded `delete_video_after=True`); defaulted
	to the `None` sentinel + added a regression test. Removed a phantom
	`--keep-largest` doc line and two dead imports.
	- Security review (skill): no vulnerabilities. The SQL-building merge tool
	and file-deleting prune script are operator CLIs whose only external inputs
	are trusted env/CLI values and the app's own schema.
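The `None`-sentinel fix from the standards review follows a common pattern: an explicit argument wins, and only an undecided caller falls back to the env flag. A sketch under assumptions: `resolve_delete_after` and `keep_source_videos` are hypothetical names, and the set of truthy strings is a guess at how the flag is parsed.

```python
import os
from typing import Mapping, Optional

# Assumption: which strings count as "true" is not specified in this plan.
_TRUTHY = {"1", "true", "yes", "on"}


def keep_source_videos(environ: Mapping[str, str]) -> bool:
    """SEARCH_UI_KEEP_SOURCE_VIDEOS defaults to false when unset."""
    value = environ.get("SEARCH_UI_KEEP_SOURCE_VIDEOS", "false")
    return value.strip().lower() in _TRUTHY


def resolve_delete_after(
    delete_video_after: Optional[bool] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """None means the caller did not decide, so the env flag decides."""
    if delete_video_after is not None:
        return delete_video_after
    return not keep_source_videos(os.environ if environ is None else environ)
```

A hard-coded `True` default is the bug described above: the flag is never consulted because the argument always looks like an explicit choice.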

	Test suite: 460 passed, 3 pre-existing HuggingFace-network failures. Branch
	merged to `main` via PR. Continuation handoff for the remaining phases lives in
	`CONTINUATION_PROMPT.md`.

	Next (see CONTINUATION_PROMPT.md): Phase 4 route consolidation → Phase 2
	Alembic → Phase 3a CLI-ingestion scaffold, each as an atomic-commit branch with
	subagent QC and a PR for Glenn to merge after an app smoke-test.

	## Phase 4 progress (Session 2, branch claude/phase4-route-consolidation)

	Done & verified (13 → 10 route modules):
	- Added `backend/tests/test_app_boot.py` — assembles the real `create_app()`
	(startup checks off via `SEARCH_UI_STARTUP_CHECKS=false`) and asserts the full
	/api surface + no duplicate (path, method) registrations. This is the
	regression guard that makes consolidation verifiable without the live app.
	- `search_routes.py` ← absorbed `content_routes.py` + `analysis_routes.py`
	(deduped the byte-identical `_get_default_*_service` helpers; dropped a dead
	`import json`).
	- `face_routes.py` ← absorbed `speaker_routes.py` (kept the module-level speaker
	`router` singleton).
	- Each step verified by a byte-identical 128-route manifest + full suite
	green (462 passed). Old files deleted; importing tests use module aliases.
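The two checks described above (no duplicate registrations, identical route manifest) reduce to simple set logic over `(path, method)` pairs. This sketch works on plain tuples so it runs without the app; it is not the code in `test_app_boot.py`, and both function names are hypothetical. In a FastAPI app the pairs could plausibly be gathered from each route's `path` and `methods` attributes.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (path, HTTP method)


def duplicate_routes(routes: Iterable[Route]) -> List[Route]:
    """Return every (path, METHOD) pair that is registered more than once."""
    counts = Counter((path, method.upper()) for path, method in routes)
    return sorted(route for route, n in counts.items() if n > 1)


def manifest(routes: Iterable[Route]) -> List[str]:
    """Sorted 'METHOD path' lines; equal manifests mean the URL surface is unchanged."""
    return sorted(f"{method.upper()} {path}" for path, method in routes)
```

Capturing `manifest(...)` before a merge and comparing it afterwards gives the byte-identical check; `duplicate_routes(...)` should stay empty throughout.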

	Naming note: `search.py` is the search ENGINE module, so the consolidated
	router keeps the `*_routes.py` convention rather than the plan's `search.py`.
	Targets are now: `search_routes`, `catalog_routes`, `face_routes`,
	`workflow_routes` (4 ≤ 5).

	Final state (Session 3): 13 → 6 routers. Phase 4 closed here.
	Cloud PRs #34–39 carried it past the Session-2 "13→10" note: `catalog_routes`
	absorbed `status_routes`, and `workflow_routes` absorbed `audio_description` +
	`settings` + `sync`. Current routers (6): `search_routes`, `catalog_routes`
	(+status), `face_routes` (+speaker), `workflow_routes` (+AD/settings/sync),
	`processing_routes`, `publication_routes`.

	Decision: stop at 6. Do NOT force the last two merges. A deep-dive review
	(Session 3) found both remaining merges trade maintainability for a smaller
	count — exactly what CLAUDE.md's file-size guidance warns against:
	- `processing → workflow` → a ~1,790-LOC file with five unrelated
	concerns (orchestration, AD, settings, sync, batch-processing) wired by two
	independent runtimes (`WorkflowRuntime`, `ProcessingRuntime`) that share no
	code. Strictly worse. Rejected.
	- `publication → catalog` → a ~1,158-LOC, three-concern `catalog_routes`,
	and `test_publication_routes.py`'s `importlib.reload` would pollute sibling
	catalog/status tests. Publications is its own concern (separate JW
	publications API, DB, and image search). Marginal count gain, real risk.
	Rejected.

	The `≤5` North-star was a ceiling, not a quota. 6 coherent routers is the
	maintainable resting place; the churn budget is better spent on search quality.
	If a future session wants 5, the publication→catalog merge is the only
	defensible one and must first replace that test's reload with attribute-patching.

	## Session 3 — DB consolidation flipped to single DB (2026-05-29)

	Validated and flipped the live app onto the merged single DB. Steps taken
	and what's needed to keep/rollback it:

	- Merged DB: `/Users/avsadmin/searchui-merged.db` (3.2 GB; built with
	`scripts/merge_databases.py` after fixing the vec0 PK-alias bug). All 15 key
	row counts match the per-source dry-run exactly.
	- Durable launcher: `scripts/run-backend-merged.sh` sets the five env vars
	(override location via `SEARCH_UI_MERGED_DB`); see CLAUDE.md Development
	Workflow. Merged to `main` 2026-05-29.
	- Flip is LIVE in the running backends (`:8001` and `:8002`) via these env
	vars on the uvicorn launch (settings.db stays separate — it is NOT merged):
	```
	SEARCH_UI_SEARCH_DB_PATH=/Users/avsadmin/searchui-merged.db
	SEARCH_UI_IMAGE_DB_PATH=/Users/avsadmin/searchui-merged.db
	SEARCH_UI_FACE_DB_PATH=/Users/avsadmin/searchui-merged.db
	SEARCH_UI_SPEAKER_DB_PATH=/Users/avsadmin/searchui-merged.db
	SEARCH_UI_PUBLICATIONS_DB_PATH=/Users/avsadmin/searchui-merged.db
	```
	- To make it durable (survive a manual restart): start the backend with
	`scripts/run-backend-merged.sh` (it sets those env vars for you), per CLAUDE.md's
	Development Workflow. The app reads `os.environ` directly — there is no `.env`
	loader — so starting it the plain way (without the launcher) reverts to the
	5 source DBs.
	- Rollback: unset the five env vars and restart → back to the original 5
	DBs, which are left untouched. **RETENTION: do NOT archive/delete the original
	5 source DBs (`database.db`, `images_database.db`, `faces_database.db`,
	`speakers_database.db`, `publications_database.db`) before ~2026-06-12** (≈2
	weeks of normal single-DB use). They are the rollback path until then.
	- Golden baseline re-captured on the merged DB (`tests/golden/snapshot.json`).
	Smoke-tested keyword/semantic/hybrid/image-content/title/scripture/
	publication-image on the flipped `:8001` — all return sensible results.
	- Known cosmetic diff vs the 5-DB world: one semantic query's two
	equal-score (0.71191) hits swap tie-order (vec0 index rebuild). Same set,
	same scores. A deterministic tie-break (ORDER BY distance, natural_key) is a
	good follow-up under search-quality.
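The tie-break mentioned in the last bullet can also be applied in application code after the hits come back. A minimal sketch, assuming hits are `(natural_key, score)` pairs with higher scores ranked first; `rank_hits` and the keys in the example are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (natural_key, score); higher score ranks first


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Sort by descending score, breaking equal scores on natural_key."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Two hits share the score 0.71191; their input order no longer matters.
one_order = [("vid_b", 0.71191), ("vid_a", 0.71191), ("vid_c", 0.9)]
other_order = list(reversed(one_order))
print(rank_hits(one_order) == rank_hits(other_order))
```

Because the sort key is a total order over `(score, natural_key)`, the result no longer depends on the order in which the index returned the rows.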

	## Phase 2 Alembic foundation (Session 3) — SCOPED, opt-in

	Added a safe foundation for schema versioning without the risky full cutover
	(the Plan agent showed a full cutover is currently unsafe — see "deferred").

	Delivered:
	- `backend/schema_version.py` — `ensure_alembic_version()` idempotently brings a DB under
	Alembic: existing DB → `stamp` baseline; empty file → `upgrade`; already
	versioned → no-op. Plus `get_primary_db_path()`.
	- Wired into `app_runtime.create_app_runtime()` **behind `SEARCH_UI_ALEMBIC_MANAGE`
	(OFF by default)** — so merging changes nothing until opted in. When enabled,
	the live merged DB gets stamped at baseline (`alembic_version` row; safe,
	reversible, no data change). `init_db()` still owns the schema; failure is
	logged, not fatal.
	- Tests: stamp/upgrade/idempotent detection + flag-gated boot wiring.
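The three-way decision inside `ensure_alembic_version()` can be illustrated without Alembic itself. This is a sketch of the detection step only, under the assumption that "already versioned" means an `alembic_version` table exists; `classify_db` is a hypothetical name. The returned action would then map onto Alembic's `command.stamp` or `command.upgrade`.

```python
import os
import sqlite3


def classify_db(db_path: str) -> str:
    """Return 'upgrade', 'stamp', or 'noop' for the three cases above."""
    if not os.path.exists(db_path) or os.path.getsize(db_path) == 0:
        return "upgrade"  # empty file: build the schema from migrations
    conn = sqlite3.connect(db_path)
    try:
        tables = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
    finally:
        conn.close()
    if "alembic_version" in tables:
        return "noop"  # already under Alembic
    # Existing tables without a version row: stamp the baseline, change no data.
    return "stamp" if tables else "upgrade"
```

Checking for the file before connecting matters because `sqlite3.connect` creates a missing file as a side effect.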

	Deferred (do NOT do without a separate, carefully-verified PR):
	- Replacing the no-op baseline with a verbatim live-schema migration, and
	removing the per-subsystem `init_db()` `CREATE TABLE IF NOT EXISTS`. Blockers:
	Alembic autogenerate can't model the 12 vec0/FTS5 virtual tables (baseline
	must be hand-authored raw DDL with sqlite-vec loaded at migrate time), and
	`_schema_metadata` carries embedding model/recipe info `alembic_version` does
	not replace.

	Finding — orphan tables not in source. `video_concepts`, `video_concepts_fts`,
	`video_concept_embeddings` are READ by `search_semantic.py` but have **no
	`CREATE` statement in `backend/`** — they exist only because some ingestion path
	(outside the web backend) created them in the live DB. A fresh web-only install
	would lack them. Their live DDL (captured for the future baseline):
	```sql
	CREATE TABLE video_concepts (natural_key TEXT NOT NULL, language TEXT NOT NULL,
	summary TEXT NOT NULL, topics_json TEXT NOT NULL, keywords_json TEXT NOT NULL,
	concept_text TEXT NOT NULL, content_hash TEXT, recipe_version INTEGER NOT NULL,
	recipe_payload TEXT NOT NULL, indexed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
	visual_cues_json TEXT NOT NULL DEFAULT '[]', PRIMARY KEY (natural_key, language));
	CREATE VIRTUAL TABLE video_concepts_fts USING fts5(natural_key, language, concept_text);
	CREATE VIRTUAL TABLE video_concept_embeddings USING vec0(natural_key TEXT, language TEXT, embedding float[1024]);
	```
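Until the baseline migration creates these tables, a startup or CI check could at least detect their absence. A sketch, assuming a plain `sqlite3` connection; `missing_tables` is a hypothetical helper. Listing names from `sqlite_master` does not require the sqlite-vec extension, although creating or querying the vec0 table does.

```python
import sqlite3
from typing import Iterable, List

REQUIRED_TABLES = (
    "video_concepts",
    "video_concepts_fts",
    "video_concept_embeddings",
)


def missing_tables(
    conn: sqlite3.Connection, required: Iterable[str] = REQUIRED_TABLES
) -> List[str]:
    """Return the required tables that sqlite_master does not list."""
    present = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    return [name for name in required if name not in present]
```

On a fresh web-only install this would report all three names, which is the gap the finding describes.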
	## Phase 3a CLI scaffold (Session 3) — additive, ML untouched

	Added `backend/cli.py`: a stdlib-argparse ingestion entrypoint so ingestion can
	run OUTSIDE the web process. Purely additive — it calls the existing
	processing functions unchanged and does NOT move ML off the web request path
	(that flip needs Glenn's cold-start RSS verification; see cutover below).

	**Subcommands wired (each calls an existing function; heavy imports are
	function-local so `cli.py --help` and the parser load no torch):**
	`ingest vod` · `ingest subtitles` · `ingest video [--all]` · `ingest images
	--source {publications|web}` · `process subtitles` · `reindex embeddings` ·
	`reindex subtitles`. Tested via parser-routing + dispatch (faked modules) + a
	subprocess guard asserting `torch`/`process_video` are absent after import.

	Deferred: `ingest faces` — faces have no standalone ingestion function; they
	run inside `process_video`'s thumbnail step. A standalone face re-index is
	net-new orchestration (pair it with Phase 6).

	ML import boundary (what the cutover must move): the web process imports
	torch eagerly today via `search.py` (`from sentence_transformers import ...`,
	top-level) and `search_images.py` (`import torch` / `from transformers import
	...`, top-level), both pulled in at boot through `app_runtime`. DeepFace/TF
	(face_search), Whisper (transcription), TransNetV2 (scene detect), CLAP/VLM
	(scene_processing) are lazy with respect to web boot — the boot path never
	imports them, so they stay out of web RSS. (Note: `video_scene_detect` itself
	imports `search_images` at module level, so importing it directly still pulls
	torch; only the TransNet model load is lazy. Relevant for step 4's relocation.)

	Cutover plan (separate PR, gated on Glenn's runtime check — do NOT do blind):
	1. Make torch import-lazy in `search.py` + `search_images.py` (move the imports
	into `get_embedding_model()` / `get_siglip_model()`; `TYPE_CHECKING` for
	annotations). Web boot then imports no torch.
	2. Decide query-time embeddings (the real fork): keyword/title/scripture/
	image-category need NO model; semantic/hybrid + image-content text→embedding
	DO. Either ship the 3b embedding sidecar (web imports zero ML — only way
	to hold <500 MB steady-state) or keep the model lazy in-process (cold-start
	is lean but the first semantic query loads torch into web RSS).
	3. Make heavy-ingestion endpoints enqueue a job / shell out to `cli.py`
	instead of running ML in the request worker.
	4. (optional) relocate ML modules under `backend/jwsearch/ingest/`.
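Step 1 of the cutover is the standard deferred-import pattern. A sketch, with the stdlib `fractions` module standing in for torch and `Fraction` standing in for the model type; the function name mirrors `get_embedding_model()` from this plan, but the body and `heavy_dependency_loaded` are purely illustrative.

```python
import sys
from functools import lru_cache
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, so it costs nothing at boot.
    from fractions import Fraction


@lru_cache(maxsize=1)
def get_embedding_model() -> "Fraction":
    """Import the heavy dependency on first call, then reuse the instance."""
    from fractions import Fraction  # deferred: kept out of module scope

    return Fraction(1, 3)  # stand-in for constructing the real model


def heavy_dependency_loaded() -> bool:
    """True once the stand-in module has been imported into this process."""
    return "fractions" in sys.modules
```

The quoted return annotation plus the `TYPE_CHECKING` block keeps type information available without executing the import at module load. `lru_cache(maxsize=1)` ensures the load cost is paid once per process.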

	Gate (Glenn verifies in his runtime, not a sandbox): boot `uvicorn main:app`
	with the single-DB env vars, no warm queries; `ps -o rss= -p <pid>` after
	`/api/health` 200 → target < 512000 KB; assert `torch` absent from the web
	process; re-run `scripts/diff_golden.py` (no ranking drift) and confirm
	background-ingest output is byte-identical to in-process.
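The "assert `torch` absent" part of the gate can be automated with a fresh interpreter, so modules already loaded by the test runner do not mask a regression. A sketch under assumptions: `module_loaded_after_import` is a hypothetical helper, and the entry module would need to be importable from the working directory.

```python
import subprocess
import sys


def module_loaded_after_import(entry_module: str, forbidden: str) -> bool:
    """Import entry_module in a fresh interpreter; report whether forbidden got loaded."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({entry_module!r})\n"
        f"print({forbidden!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # a failing import surfaces as CalledProcessError
    )
    # Take the last line in case the imported module prints during import.
    return result.stdout.strip().splitlines()[-1] == "True"
```

In a test this would read `assert not module_loaded_after_import("main", "torch")`, where `main` is the module named in the `uvicorn main:app` command above.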

	## Session 4 — Tie-break + Phase 3a cold-start cutover (2026-06-01)

	Baseline confirmed green first: `main`==`origin/main` (50ef139), 628 tests
	passing, merged-DB flip live, all six search families sane, golden diff clean.

	**1. Deterministic semantic tie-break (search-quality follow-up from Session 3) —
	committed.** sqlite-vec rejects a secondary `ORDER BY` on KNN queries, so
	equal-distance hits arrived in index-dependent order (the 0.71191 tie-swap noted
	in Session 3). Fixed in the three Python ranking sites (`search_semantic`,
	`search_video_concepts`, `search_hybrid`) by breaking score ties on
	`natural_key`. Verified on the real merged DB: the two previously-drifting golden
	queries now have identical key sets + identical (key,score) multisets — only
	equal-score hits reorder, now deterministically. search-eval (n=150):
	title/keyword unchanged; hybrid recall@1 24.67→24.00% (1 sample, tie-ambiguity
	noise, now stable vs rebuild-dependent). The hybrid wobble traces to the
	semantic tie-break propagating into hybrid RRF ranks (hybrid reuses
	search_semantic's order), not the concept sort. Golden re-baselined deterministic.
	Added `test_search_tie_break.py`.

	2. Phase 3a cutover step 1 (lazy torch imports) — committed & runtime-verified.
	Moved torch/transformers/sentence-transformers out of module scope into
	function-local imports in `search.py`, `search_images.py`,
	`image_siglip_inference.py` (the three boot-path torch importers). **Cold-start
	web RSS 450 MB → 76 MB; torch absent at boot** (new subprocess guard
	`test_web_boot_ml_free.py`). Models still load lazily on first query (first
	semantic → 828 MB, +visual → 1071 MB); results byte-identical (golden: all 12
	match; eval unchanged). 632 tests pass.

	Decision (Glenn): do Option A (lazy in-process), NOT the 3b sidecar. Rationale
	is the HF deployment shape: the Space is a single Docker container (free tier,
	one uvicorn). A sidecar in the same container doesn't lower container RAM (the
	model sits in one process either way); it only helps a true multi-machine split,
	which isn't the deployment and would add per-query network latency + an always-on
	paid service. Steady-state ~1.07 GB fits the 16 GB Space fine. The cold-start win
	(faster Space wake; word/title/scripture searches ready instantly without loading
	ML) is the real benefit and is now banked. Adjacent known issue, NOT addressed
	here: CPU semantic/visual latency on the free Space (inherent to no-GPU; would
	need caching / lighter model / precompute — a separate conversation).

	3. Single-video ingestion moved off the web worker — committed & runtime-verified.
	`POST /api/process-video` now shells out to `python -m cli ingest video` (new
	`--result-json` gives a clean JSON artifact; the helper inherits env so the
	subprocess targets the same DBs). Verified: endpoint returns HTTP 200 with the
	identical result dict while the web worker stays 76 MB / 0 torch and a
	separate subprocess (~1.3 GB) does all the ML. Response contract + delete
	semantics unchanged. 640 tests pass.
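The shell-out in item 3 can be sketched as a helper that runs a child process and reads back a JSON artifact. Only the `--result-json` flag name and the inherited environment come from this document; `run_with_result_json` is a hypothetical name and the real helper may differ.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import List


def run_with_result_json(command: List[str]) -> dict:
    """Run command + ['--result-json', path] and return the parsed artifact."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)  # the child process writes the file; only the path is needed
    try:
        # env is left at its default, so the child inherits this process's
        # environment and therefore targets the same DB paths.
        subprocess.run(command + ["--result-json", path], check=True)
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    finally:
        os.unlink(path)


# Stand-in child that writes {"ok": true} to the path after --result-json.
child = (
    "import json, sys; "
    "path = sys.argv[sys.argv.index('--result-json') + 1]; "
    "json.dump({'ok': True}, open(path, 'w'))"
)
print(run_with_result_json([sys.executable, "-c", child]))
```

Passing the result through a file rather than stdout keeps log output from the ML libraries from corrupting the response payload.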

	Still open in 3a (not started):
	- The streaming bulk endpoints (`process-all-videos`, `-v2`, `retry-failed`)
	still run `process_video` inline — they stream live SSE progress, so moving
	them out-of-process needs a progress-bridge design (subprocess → file/queue →
	SSE). Deliberately deferred as a separate, designed piece.
	- Composite workflows (`/api/update-content`, `reprocess-existing`,
	`nuclear-rebuild`) — no single CLI subcommand yet.
	- `index-publication-images` / `crawl-web-images` → `cli ingest images`.
	- Optionally relocate ML modules under `backend/jwsearch/ingest/`.

	## Guardrails

	1. Golden tests pass at every commit.
	2. No phase ships without a rollback path.
	3. CLAUDE.md updated as part of the phase that invalidates a rule.
	4. Atomic commits. Subagent code review at each phase boundary.
	5. No model swaps and dependency cleanup in the same commit.