
Search-UI: Aggressive Refactor Plan

This document tracks the autonomous refactor begun on claude/refactor-aggressive. It is the canonical source for what the refactor is doing, the order of work, and the gates between phases.

Goal

Cut LOC by ~50%, separate ingestion from serving, replace hand-rolled infra with proven libraries, modernize the frontend. App must stay functional at every phase boundary.

North-star metrics

| Metric | Before | Target |
|---|---|---|
| Backend Python LOC | ~35,850 | ~10,000 |
| Frontend JS/JSX LOC | ~11,820 | ~5,000 |
| *_routes.py files | 13 | ≤ 5 |
| SQLite DB files | 5 | 1 |
| Web-process cold-start RSS | 450 MB → 76 MB (Session 4) | < 500 MB ✓ (cold) |
| Tests collecting cleanly | 85/443 → 443/443 after Phase 0 | 443/443 |
| Tests passing (sandbox, no model network) | 440/443 | 443/443 in user env |

Phases & status

| Phase | Title | Status |
|---|---|---|
| 0 | Safety net (conftest, golden tests, smoke script) | DONE |
| 1 | Subtraction: dead code, redundant routers, rerankers, dedup | DONE |
| 2 | Single database + merge tool + Alembic | MERGE TOOL VALIDATED on real DBs (Session 3): found+fixed a vec0 INTEGER-PRIMARY-KEY-alias copy bug; merged 5→1 (3.2 GB), all 15 key row counts exact, golden diff identical except one benign tie-swap of two equal-score (0.71191) semantic hits. Flipped to the single DB (Session 3). Alembic SCOPED FOUNDATION added (opt-in); full cutover deliberately deferred; see "Phase 2 Alembic foundation" below. |
| 3a | ML out of web request path: CLI extraction | Cutover step 1 (lazy torch imports) DONE & runtime-verified (Session 4): cold-start web RSS 450→76 MB, torch absent at boot, models load lazily on first query, results byte-identical (golden + eval). CLI scaffold from Session 3 stands. Steps 3–4 (enqueue ingestion / relocate modules) still open. |
| 3b | Query-time embedding sidecar (optional but recommended) | DROPPED for current deployments (Session 4 decision). On the single-container HF Docker Space, web+sidecar share one container's RAM, so a sidecar does NOT lower container memory. Its only real benefit is a true multi-machine split (web on HF, model on a GPU box), which isn't the deployment and would add per-query network latency + an always-on paid service. Revisit only if/when web and ML run on separate machines. |
| 3c | Video lifecycle: prune source MP4s after ingestion | DONE |
| 4 | Route consolidation: 13 → 6 coherent routers | DONE at 6 (Session 3 decision). 13→6 via cloud PRs #34–39 (search←content+analysis, face←speaker, catalog←status, workflow←AD+settings+sync). Stopped at 6 deliberately: the last two merges would chase a counter at the cost of multi-concern files; see "Phase 4 progress" below. |
| 5 | Frontend: TypeScript + React Router + TanStack Query + Tailwind | DEFERRED: needs runtime verification |
| 6 | Face: DeepFace → insightface (ONNX) | DEFERRED: needs runtime verification |
| 7 | Desktop story (drop Electron or → Tauri) | DEFERRED: needs decision |
| 8 | Smart-search reckoning (instrument & decide) | DEFERRED: needs production data |

Phase 0: Safety net (1–2 days)

Build the regression detector before changing anything.

Deliverables:

  • conftest.py at repo root that fixes the two-import-conventions test mess
  • backend/__init__.py so tests can import either flat or namespaced
  • tests/golden/ query fixture set (run against user's real DB to populate)
  • scripts/snapshot_golden.py: capture top-N results per query
  • scripts/diff_golden.py: re-run and diff
  • scripts/smoke_test.sh: boot backend, hit 20 critical endpoints
  • All collection errors fixed (target: 131/131 tests collect, even if some skip)
  • Tag pre-refactor state for rollback

Gate: all tests collect cleanly. Golden snapshot scripts run end-to-end against a dummy DB. Smoke script returns exit 0.
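The snapshot-and-diff idea behind the golden gate can be sketched as follows. This is a minimal illustration, not the contents of scripts/diff_golden.py: the snapshot layout (a mapping from query text to an ordered list of result keys) and the function name `diff_snapshots` are assumptions made for the example.

```python
import json


def diff_snapshots(baseline: dict, current: dict) -> dict:
    """Compare two {query: [result_key, ...]} snapshots.

    Returns an entry for every query whose ordered top-N list differs,
    including queries that exist in only one of the two snapshots.
    """
    drift = {}
    for query in sorted(set(baseline) | set(current)):
        before = baseline.get(query, [])
        after = current.get(query, [])
        if before != after:
            drift[query] = {"baseline": before, "current": after}
    return drift


if __name__ == "__main__":
    # Hypothetical data: "faith" has two results swapped, "hope" is unchanged.
    baseline = {"faith": ["v1", "v2", "v3"], "hope": ["v9", "v4"]}
    current = {"faith": ["v1", "v3", "v2"], "hope": ["v9", "v4"]}
    print(json.dumps(diff_snapshots(baseline, current), indent=2))
```

An empty result means no ranking drift; a gate script would typically exit non-zero on anything else.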

Phase 1: Subtraction (2–3 days)

Pure deletion. No new deps, no architecture change.

Delete outright:

  • frontend/src/designMockups/ (13 files, ~485 LOC): DONE
  • MockupReview.jsx + ?view=mockups routing: DONE
  • main.py:/api/hello, /api/data placeholders: DONE
  • llm_router.py + llm_client.py (orphaned, ~896 LOC): DONE
  • MiniLM fallback in ai_features.py: DONE
  • Mixedbread legacy embedding code: DEFERRED to Phase 2 (still used by search_semantic.py for the whole-document index; removing it now would break search against legacy DB rows. It will be retired via an Alembic backfill migration during DB consolidation.)
  • pywebview: DEFERRED to Phase 7 (desktop decision)

Consolidate:

  • 3 rerankers → search_visual_rerank_rules.py: DONE (common+event+rules merged; a case-sensitivity bug in _best_label_score was found and fixed)
  • 8 face files → deferred to Phase 4. On inspection, the small ones (face_search_common 40 LOC, face_person_index 87 LOC, face_route_common 48 LOC) sit at the bottom of the dep tree and can't be merged upward without circularity. The 5 mixin files map to real concerns (db, storage, people, recognition, review); collapsing them touches a 2000+ line class and isn't worth the regression risk in this phase.
  • Duplicate _get_vod_categories() → one helper in media_metadata.py: TODO

Policy change: drop the 600-line hard limit from CLAUDE.md. Replace with guidance: "files should do one thing; never split a coherent concept just to satisfy a counter."

Gate: golden tests pass. Tests still collect. LOC down ≥ 5k.

Phase 2: Single DB + Alembic (4–5 days)

Deliverables:

  • backend/schema_version.py: one connection factory, one DB file
  • backend/migrations/: Alembic with baseline migration
  • scripts/migrate_to_single_db.py: merge 5 source DBs into 1, verify row counts
  • Remove every CREATE TABLE IF NOT EXISTS from app boot code
  • Replace _schema_metadata per-db with single alembic_version

Gate: migration script runs cleanly on a copy of real DBs. Golden tests pass against single DB. Row counts match.

Risk mitigation: original 5 DBs untouched until 2 weeks of normal use.
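The "row counts match" gate can be illustrated with a small standard-library check. This is a sketch under assumptions, not the merge tool itself: the helper names are hypothetical, it counts only tables that plain sqlite3 can read (vec0 virtual tables would need the sqlite-vec extension loaded first), and it assumes the merge concatenates rows without de-duplicating.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every non-internal table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Names come from sqlite_master, not user input; quote them anyway.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def count_mismatches(sources: list, merged: sqlite3.Connection) -> dict:
    """Sum per-table counts over all sources and compare with the merged DB.

    Returns {table: (expected, actual)} for every table that differs.
    """
    expected = {}
    for src in sources:
        for name, n in table_row_counts(src).items():
            expected[name] = expected.get(name, 0) + n
    actual = table_row_counts(merged)
    return {
        name: (want, actual.get(name, 0))
        for name, want in expected.items()
        if actual.get(name, 0) != want
    }
```

An empty dict from `count_mismatches` corresponds to the gate passing. Tables that the merge intentionally collapses would need to be excluded from such a check.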

Phase 3 β€” ML out of web process

3a: CLI extraction

Deliverables:

  • backend/cli.py with subcommands: jws ingest vod, jws ingest subtitles, jws ingest video, jws ingest faces, jws ingest images, jws reindex embeddings
  • All ML imports (torch, deepface, transformers, whisper, transnetv2) moved into backend/jwsearch/ingest/
  • Web process imports only sentence-transformers for query embeddings (or none if 3b ships)
  • Endpoints that previously kicked off processing now enqueue jobs

3b: Query-time embedding sidecar (optional)

Deliverables:

  • backend/jwsearch/embed_service.py: 100-line FastAPI process holding Qwen3
  • Main web process makes HTTP calls to it
  • Web process imports zero ML libraries

3c: Video lifecycle

Deliverables:

  • jws prune videos --keep-thumbnails --keep-embeddings CLI command
  • New column source_deleted_at on the videos table
  • New env flag SEARCH_UI_KEEP_SOURCE_VIDEOS (default false)
  • Ingestion workflow deletes MP4 after extraction if flag is unset
  • content_status treats videos with source_deleted_at set as "complete"

Rationale: thumbnails (50 MB/video) are about 7× smaller than source MP4s (350 MB/video). JW.org already streams playback via progressiveDownloadURL. Re-extraction only needs a re-download (bandwidth, not storage).
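The flag semantics for 3c (delete by default, keep only when SEARCH_UI_KEEP_SOURCE_VIDEOS is set, with an explicit argument taking precedence) can be sketched like this. The function names and the set of accepted truthy strings are assumptions for illustration; only the environment variable name and the `delete_video_after` parameter name appear in this plan.

```python
import os
from typing import Mapping, Optional

# Assumed set of strings treated as "true"; the real parser may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def keep_source_videos(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read SEARCH_UI_KEEP_SOURCE_VIDEOS; unset or empty means False."""
    env = os.environ if environ is None else environ
    return env.get("SEARCH_UI_KEEP_SOURCE_VIDEOS", "").strip().lower() in _TRUTHY


def should_delete_source(
    delete_video_after: Optional[bool] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve the delete decision for one video.

    None is the sentinel meaning "caller did not decide", so the env flag
    applies; an explicit True or False always wins.
    """
    if delete_video_after is not None:
        return delete_video_after
    return not keep_source_videos(environ)
```

Using None rather than a hard-coded True as the default is what lets a batch path honor the environment flag instead of silently overriding it.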

Gate: web process cold-start under 5s, RSS under 500 MB. Background ingest produces identical indexed data (golden tests pass). A pruned video still plays via ClipPlayer (poster + streaming URL).

Phase 4: Route consolidation (3–5 days)

13 routers → 4:

| New router | Replaces |
|---|---|
| search.py | search_routes, content_routes, analysis_routes (scripture) |
| catalog.py | catalog_routes, status_routes, publication_routes (read) |
| people.py | face_routes, face_route_persons, speaker_routes |
| jobs.py | workflow_routes, processing_routes, sync_routes, audio_description_routes, settings_routes |

Plus: services.py (centralized service factory), errors.py (global handlers), schemas.py (Pydantic DTOs).

Gate: golden tests pass. Frontend works without changes (URLs preserved).

Phases 5–8: Deferred (need runtime verification or production data)

5: Frontend modernization, 6: Face re-platform, 7: Desktop story, 8: Router decision. Documented in chat; not started in this autonomous session.

Session 1 actual outcome (2026-05-26)

Phases 0, 1, 3c shipped. Phase 2 scaffolded (merge tool only). Phases 3a, 3b, 4, 5, 6, 7, 8 deferred: they need runtime verification or production data, or are best sequenced after the user flips to the single DB.

Net change: ~5,000 LOC removed. 18 new tests added. Test suite: 455/458 passing (3 pre-existing HuggingFace-network failures unchanged). Branch: claude/refactor-aggressive.

What the user needs to do next to continue the refactor:

  1. Run the golden snapshot against current backend with real data:

    python scripts/snapshot_golden.py --base-url http://localhost:8001 \
        --output tests/golden/snapshot.json
    git add tests/golden/snapshot.json && git commit
    

    Without a baseline snapshot, Phase 4+ can't detect ranking regressions.

  2. Merge the databases. The merge tool now reconstructs regular tables, vec0 (sqlite-vec) embeddings, AND FTS5 full-text indices, preserving rowids so embedding→metadata joins survive. Steps:

    # Dry run first: reports per-source row counts, writes nothing
    python scripts/merge_databases.py --output ~/searchui-merged.db --dry-run
    # Then for real (needs: pip install sqlite-vec)
    python scripts/merge_databases.py --output ~/searchui-merged.db
    

    Verify the merged DB serves searches correctly, THEN flip the app by pointing all DB env vars at it:

    export SEARCH_UI_SEARCH_DB_PATH=~/searchui-merged.db
    export SEARCH_UI_IMAGE_DB_PATH=~/searchui-merged.db
    export SEARCH_UI_FACE_DB_PATH=~/searchui-merged.db
    export SEARCH_UI_SPEAKER_DB_PATH=~/searchui-merged.db
    export SEARCH_UI_PUBLICATIONS_DB_PATH=~/searchui-merged.db
    

    Run the golden diff (scripts/diff_golden.py) against the flipped app to confirm no ranking drift. Keep the original 5 DBs untouched for two weeks as the rollback path before archiving.

    Still TODO in Phase 2: replace the per-DB CREATE TABLE IF NOT EXISTS bootstrap + _schema_metadata with Alembic migrations so the single DB has real schema versioning. The merge tool keeps only the first source's _schema_metadata row; Alembic's alembic_version will supersede it.

  3. Try the video prune in dry-run mode first:

    python scripts/prune_source_videos.py --dry-run
    

    Expect ~1 TB of disk reclaimed across 3,713 videos.

Session 2 actual outcome (2026-05-28)

Completed the Phase 2 merge tool (vec0 + FTS5 reconstruction with rowid preservation), then ran full pre-merge QC on the whole claude/refactor-aggressive branch (18 commits, net −2,952 LOC):

  • Safety review (subagent): zero dangling references. Every deleted module (llm_client, llm_router, the two visual rerankers) and removed endpoint (/api/hello, /api/data, designMockups) has zero remaining referents.
  • Standards review (subagent): zero MUST-FIX. Fixed the SHOULD-FIX items: SEARCH_UI_KEEP_SOURCE_VIDEOS was silently ignored on the batch path (process_all_local_videos hard-coded delete_video_after=True); it now defaults to the None sentinel, with a regression test added. Removed a phantom --keep-largest doc line and two dead imports.
  • Security review (skill): no vulnerabilities. The SQL-building merge tool and file-deleting prune script are operator CLIs whose only external inputs are trusted env/CLI values and the app's own schema.

Test suite: 460 passed, 3 pre-existing HuggingFace-network failures. Branch merged to main via PR. Continuation handoff for the remaining phases lives in CONTINUATION_PROMPT.md.

Next (see CONTINUATION_PROMPT.md): Phase 4 route consolidation → Phase 2 Alembic → Phase 3a CLI-ingestion scaffold, each as an atomic-commit branch with subagent QC and a PR for Glenn to merge after an app smoke-test.

Phase 4 progress (Session 2, branch claude/phase4-route-consolidation)

Done & verified (13 → 10 route modules):

  • Added backend/tests/test_app_boot.py, which assembles the real create_app() (startup checks off via SEARCH_UI_STARTUP_CHECKS=false) and asserts the full /api surface + no duplicate (path, method) registrations. This is the regression guard that makes consolidation verifiable without the live app.
  • search_routes.py ← absorbed content_routes.py + analysis_routes.py (deduped the byte-identical _get_default_*_service helpers; dropped a dead import json).
  • face_routes.py ← absorbed speaker_routes.py (kept the module-level speaker router singleton).
  • Each step verified by a byte-identical 128-route manifest + full suite green (462 passed). Old files deleted; importing tests use module aliases.
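The two checks described above (an order-independent route manifest and a duplicate-registration guard) reduce to a few lines once the routes are available as (path, method) pairs. The sketch below is illustrative only: the function names are hypothetical, and how the pairs are collected is left out (in a FastAPI app they could be read from each route's `path` and `methods` attributes).

```python
from collections import Counter


def manifest(routes):
    """Canonical, order-independent listing used to compare before and after."""
    return sorted({(path, method.upper()) for path, method in routes})


def duplicate_registrations(routes):
    """Return the sorted (path, method) pairs registered more than once."""
    counts = Counter((path, method.upper()) for path, method in routes)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical routes: same surface, different registration order.
    before = [("/api/search", "GET"), ("/api/faces", "POST")]
    after = [("/api/faces", "POST"), ("/api/search", "GET")]
    print(manifest(before) == manifest(after))
    print(duplicate_registrations(after + [("/api/search", "get")]))
```

Comparing manifests before and after a merge catches dropped or renamed routes, while the duplicate check catches a router that was included twice.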

Naming note: search.py is the search ENGINE module, so the consolidated router keeps the *_routes.py convention rather than the plan's search.py. Targets are now: search_routes, catalog_routes, face_routes, workflow_routes (4 ≤ 5).

Final state (Session 3): 13 → 6 routers. Phase 4 closed here. Cloud PRs #34–39 carried it past the Session-2 "13→10" note: catalog_routes absorbed status_routes, and workflow_routes absorbed audio_description + settings + sync. Current routers (6): search_routes, catalog_routes (+status), face_routes (+speaker), workflow_routes (+AD/settings/sync), processing_routes, publication_routes.

Decision: stop at 6. Do NOT force the last two merges. A deep-dive review (Session 3) found both remaining merges trade maintainability for a smaller count, exactly what CLAUDE.md's file-size guidance warns against:

  • processing → workflow would produce a ~1,790-LOC file with five unrelated concerns (orchestration, AD, settings, sync, batch-processing) wired by two independent runtimes (WorkflowRuntime, ProcessingRuntime) that share no code. Strictly worse. Rejected.
  • publication → catalog would produce a ~1,158-LOC, three-concern catalog_routes, and test_publication_routes.py's importlib.reload would pollute sibling catalog/status tests. Publications is its own concern (separate JW publications API, DB, and image search). Marginal count gain, real risk. Rejected.

The ≤5 North-star was a ceiling, not a quota. 6 coherent routers is the maintainable resting place; the churn budget is better spent on search quality. If a future session wants 5, the publication→catalog merge is the only defensible one and must first replace that test's reload with attribute-patching.

Session 3: DB consolidation flipped to single DB (2026-05-29)

Validated and flipped the live app onto the merged single DB. Steps taken and what's needed to keep/rollback it:

  • Merged DB: /Users/avsadmin/searchui-merged.db (3.2 GB; built with scripts/merge_databases.py after fixing the vec0 PK-alias bug). All 15 key row counts match the per-source dry-run exactly.
  • Durable launcher: scripts/run-backend-merged.sh sets the five env vars (override location via SEARCH_UI_MERGED_DB); see CLAUDE.md Development Workflow. Merged to main 2026-05-29.
  • Flip is LIVE in the running backends (:8001 and :8002) via these env vars on the uvicorn launch (settings.db stays separate; it is NOT merged):
    SEARCH_UI_SEARCH_DB_PATH=/Users/avsadmin/searchui-merged.db
    SEARCH_UI_IMAGE_DB_PATH=/Users/avsadmin/searchui-merged.db
    SEARCH_UI_FACE_DB_PATH=/Users/avsadmin/searchui-merged.db
    SEARCH_UI_SPEAKER_DB_PATH=/Users/avsadmin/searchui-merged.db
    SEARCH_UI_PUBLICATIONS_DB_PATH=/Users/avsadmin/searchui-merged.db
    
  • To make it durable (survive a manual restart): start the backend with scripts/run-backend-merged.sh (it sets those env vars for you), per CLAUDE.md's Development Workflow. The app reads os.environ directly (there is no .env loader), so starting it the plain way (without the launcher) reverts to the 5 source DBs.
  • Rollback: unset the five env vars and restart → back to the original 5 DBs, which are left untouched. RETENTION: do NOT archive/delete the original 5 source DBs (database.db, images_database.db, faces_database.db, speakers_database.db, publications_database.db) before ~2026-06-12 (≈2 weeks of normal single-DB use). They are the rollback path until then.
  • Golden baseline re-captured on the merged DB (tests/golden/snapshot.json). Smoke-tested keyword/semantic/hybrid/image-content/title/scripture/publication-image on the flipped :8001; all return sensible results.
  • Known cosmetic diff vs the 5-DB world: one semantic query's two equal-score (0.71191) hits swap tie-order (vec0 index rebuild). Same set, same scores. A deterministic tie-break (ORDER BY distance, natural_key) is a good follow-up under search-quality.

Phase 2 Alembic foundation (Session 3): SCOPED, opt-in

Added a safe foundation for schema versioning without the risky full cutover (the Plan agent showed a full cutover is currently unsafe; see "Deferred" below).

Delivered:

  • backend/schema_version.py: ensure_alembic_version() idempotently brings a DB under Alembic: existing DB → stamp baseline; empty file → upgrade; already versioned → no-op. Plus get_primary_db_path().
  • Wired into app_runtime.create_app_runtime() behind SEARCH_UI_ALEMBIC_MANAGE (OFF by default), so merging changes nothing until opted in. When enabled, the live merged DB gets stamped at baseline (alembic_version row; safe, reversible, no data change). init_db() still owns the schema; failure is logged, not fatal.
  • Tests: stamp/upgrade/idempotent detection + flag-gated boot wiring.
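The three-way decision described for ensure_alembic_version() (stamp, upgrade, or no-op) can be sketched with the standard library alone. This is not the code in backend/schema_version.py: `plan_alembic_action` is a hypothetical helper that only classifies the database, and the actual stamping or upgrading (for example through Alembic's `command.stamp` and `command.upgrade`) is left to the caller.

```python
import sqlite3


def plan_alembic_action(conn: sqlite3.Connection) -> str:
    """Classify a SQLite DB as needing 'noop', 'stamp', or 'upgrade'.

    - 'noop':    alembic_version exists and holds a revision row
    - 'stamp':   application tables exist but the DB is not yet versioned
    - 'upgrade': the DB has no application tables (a fresh, empty file)
    """
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if "alembic_version" in tables:
        if conn.execute("SELECT 1 FROM alembic_version LIMIT 1").fetchone():
            return "noop"
    app_tables = {
        name
        for name in tables
        if name != "alembic_version" and not name.startswith("sqlite_")
    }
    return "stamp" if app_tables else "upgrade"
```

Because the function inspects state instead of recording that it ran, calling it repeatedly is safe, which is the property that makes the boot-time wiring idempotent.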

Deferred (do NOT do without a separate, carefully-verified PR):

  • Replacing the no-op baseline with a verbatim live-schema migration, and removing the per-subsystem init_db() CREATE TABLE IF NOT EXISTS. Blockers: Alembic autogenerate can't model the 12 vec0/FTS5 virtual tables (baseline must be hand-authored raw DDL with sqlite-vec loaded at migrate time), and _schema_metadata carries embedding model/recipe info that alembic_version does not replace.

Finding: orphan tables not in source. video_concepts, video_concepts_fts, video_concept_embeddings are READ by search_semantic.py but have no CREATE statement in backend/; they exist only because some ingestion path (outside the web backend) created them in the live DB. A fresh web-only install would lack them. Their live DDL (captured for the future baseline):

CREATE TABLE video_concepts (natural_key TEXT NOT NULL, language TEXT NOT NULL,
  summary TEXT NOT NULL, topics_json TEXT NOT NULL, keywords_json TEXT NOT NULL,
  concept_text TEXT NOT NULL, content_hash TEXT, recipe_version INTEGER NOT NULL,
  recipe_payload TEXT NOT NULL, indexed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  visual_cues_json TEXT NOT NULL DEFAULT '[]', PRIMARY KEY (natural_key, language));
CREATE VIRTUAL TABLE video_concepts_fts USING fts5(natural_key, language, concept_text);
CREATE VIRTUAL TABLE video_concept_embeddings USING vec0(natural_key TEXT, language TEXT, embedding float[1024]);

Phase 3a CLI scaffold (Session 3): additive, ML untouched

Added backend/cli.py: a stdlib-argparse ingestion entrypoint so ingestion can run OUTSIDE the web process. Purely additive: it calls the existing processing functions unchanged and does NOT move ML off the web request path (that flip needs Glenn's cold-start RSS verification; see cutover below).

Subcommands wired (each calls an existing function; heavy imports are function-local so cli.py --help and the parser load no torch): ingest vod · ingest subtitles · ingest video [--all] · ingest images --source {publications|web} · process subtitles · reindex embeddings · reindex subtitles. Tested via parser-routing + dispatch (faked modules) + a subprocess guard asserting torch/process_video are absent after import.

Deferred: ingest faces. Faces have no standalone ingestion function; they run inside process_video's thumbnail step. A standalone face re-index is net-new orchestration (pair it with Phase 6).

ML import boundary (what the cutover must move): the web process imports torch eagerly today via search.py (from sentence_transformers import ..., top-level) and search_images.py (import torch / from transformers import ..., top-level), both pulled in at boot through app_runtime. DeepFace/TF (face_search), Whisper (transcription), TransNetV2 (scene detect), CLAP/VLM (scene_processing) are lazy with respect to web boot: the boot path never imports them, so they stay out of web RSS. (Note: video_scene_detect itself imports search_images at module level, so importing it directly still pulls torch; only the TransNet model load is lazy. Relevant for step 4's relocation.)

Cutover plan (separate PR, gated on Glenn's runtime check; do NOT do blind):

  1. Make torch import-lazy in search.py + search_images.py (move the imports into get_embedding_model() / get_siglip_model(); TYPE_CHECKING for annotations). Web boot then imports no torch.
  2. Decide query-time embeddings (the real fork): keyword/title/scripture/image-category need NO model; semantic/hybrid + image-content text→embedding DO. Either ship the 3b embedding sidecar (web imports zero ML; only way to hold <500 MB steady-state) or keep the model lazy in-process (cold-start is lean but the first semantic query loads torch into web RSS).
  3. Make heavy-ingestion endpoints enqueue a job / shell out to cli.py instead of running ML in the request worker.
  4. (optional) relocate ML modules under backend/jwsearch/ingest/.
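Step 1's import-lazy pattern can be sketched as a cached accessor that performs the import on first use. To keep the example runnable anywhere, the standard-library module `wave` stands in for a heavy dependency such as torch; `HEAVY_MODULE` and `get_backend` are hypothetical names, not the project's get_embedding_model().

```python
import importlib
import sys
from functools import lru_cache
from types import ModuleType

# Stand-in for a heavy ML package; chosen only because it is always available.
HEAVY_MODULE = "wave"


@lru_cache(maxsize=1)
def get_backend() -> ModuleType:
    """Import the heavy dependency on first call and reuse it afterwards."""
    return importlib.import_module(HEAVY_MODULE)


if __name__ == "__main__":
    # Importing this file does not call get_backend(), so the cost of the
    # dependency is paid by the first caller instead of at process boot.
    backend = get_backend()
    print(backend is get_backend())     # same cached module object
    print(HEAVY_MODULE in sys.modules)  # loaded only after first use
```

The trade-off matches option two in step 2: boot stays lean, but the first request that needs the model pays the import and load cost.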

Gate (Glenn verifies in his runtime, not a sandbox): boot uvicorn main:app with the single-DB env vars, no warm queries; ps -o rss= -p <pid> after /api/health returns 200 → target < 512000 KB; assert torch absent from the web process; re-run scripts/diff_golden.py (no ranking drift) and confirm background-ingest output is byte-identical to in-process.
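Two parts of this gate lend themselves to automation: checking that importing a module does not drag in a heavy dependency, and comparing a `ps -o rss=` reading against the budget. The sketch below shows one way to do both; the helper names are hypothetical, and the import probe runs in a fresh interpreter so that modules already loaded by the test runner cannot mask a regression.

```python
import json
import subprocess
import sys

# Runs in the child interpreter: import one module, then report which of the
# watched module names ended up in sys.modules.
_PROBE = (
    "import importlib, json, sys\n"
    "module, watched = json.loads(sys.argv[1])\n"
    "importlib.import_module(module)\n"
    "print(json.dumps({name: name in sys.modules for name in watched}))\n"
)


def modules_loaded_after_import(module: str, watched: list) -> dict:
    """Return {watched_name: bool} after importing `module` in a clean process."""
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, json.dumps([module, list(watched)])],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def rss_within_budget(ps_output: str, limit_kb: int = 512000) -> bool:
    """Interpret the output of `ps -o rss= -p <pid>` (kilobytes)."""
    return int(ps_output.strip()) < limit_kb
```

For the real check, the first argument would be the web app's entry module and the watched list would contain torch; the RSS helper would be fed the `ps` output captured after the health endpoint responds.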

Session 4: Tie-break + Phase 3a cold-start cutover (2026-06-01)

Baseline confirmed green first: main==origin/main (50ef139), 628 tests passing, merged-DB flip live, all six search families sane, golden diff clean.

1. Deterministic semantic tie-break (search-quality follow-up from Session 3): committed. sqlite-vec rejects a secondary ORDER BY on KNN queries, so equal-distance hits arrived in index-dependent order (the 0.71191 tie-swap noted in Session 3). Fixed in the three Python ranking sites (search_semantic, search_video_concepts, search_hybrid) by breaking score ties on natural_key. Verified on the real merged DB: the two previously-drifting golden queries now have identical key sets + identical (key,score) multisets; only equal-score hits reorder, now deterministically. search-eval (n=150): title/keyword unchanged; hybrid recall@1 24.67→24.00% (1 sample, tie-ambiguity noise, now stable vs rebuild-dependent). The hybrid wobble traces to the semantic tie-break propagating into hybrid RRF ranks (hybrid reuses search_semantic's order), not the concept sort. Golden re-baselined deterministic. Added test_search_tie_break.py.
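Breaking score ties on natural_key in Python amounts to sorting on a compound key. The following is a minimal sketch that assumes hits are (natural_key, score) tuples with higher scores ranking first; the function name and tuple layout are illustrative, not taken from the ranking modules.

```python
def rank_hits(hits):
    """Order hits by score descending, then natural_key ascending.

    The secondary key makes the order of equal-score hits independent of the
    order in which the index happened to return them.
    """
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical hits: two share the same score and arrive in either order.
    one_order = [("pub-b", 0.71191), ("pub-a", 0.71191), ("pub-c", 0.9)]
    other_order = list(reversed(one_order))
    print(rank_hits(one_order) == rank_hits(other_order))
```

This only stabilizes exact ties. Scores that differ in their last bits still sort by score, so the approach assumes equal distances come back as identical float values.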

2. Phase 3a cutover step 1 (lazy torch imports): committed & runtime-verified. Moved torch/transformers/sentence-transformers out of module scope into function-local imports in search.py, search_images.py, image_siglip_inference.py (the three boot-path torch importers). Cold-start web RSS 450 MB → 76 MB; torch absent at boot (new subprocess guard test_web_boot_ml_free.py). Models still load lazily on first query (first semantic → 828 MB, +visual → 1071 MB); results byte-identical (golden: all 12 match; eval unchanged). 632 tests pass.

Decision (Glenn): do Option A (lazy in-process), NOT the 3b sidecar. Rationale is the HF deployment shape: the Space is a single Docker container (free tier, one uvicorn). A sidecar in the same container doesn't lower container RAM (the model sits in one process either way); it only helps a true multi-machine split, which isn't the deployment and would add per-query network latency + an always-on paid service. Steady-state ~1.07 GB fits the 16 GB Space fine. The cold-start win (faster Space wake; word/title/scripture searches ready instantly without loading ML) is the real benefit and is now banked. Adjacent known issue, NOT addressed here: CPU semantic/visual latency on the free Space (inherent to no-GPU; would need caching / lighter model / precompute, which is a separate conversation).

3. Single-video ingestion moved off the web worker: committed & runtime-verified. POST /api/process-video now shells out to python -m cli ingest video (new --result-json gives a clean JSON artifact; the helper inherits env so the subprocess targets the same DBs). Verified: endpoint returns HTTP 200 with the identical result dict while the web worker stays 76 MB / 0 torch and a separate subprocess (~1.3 GB) does all the ML. Response contract + delete semantics unchanged. 640 tests pass.

Still open in 3a (not started):

  • The streaming bulk endpoints (process-all-videos, -v2, retry-failed) still run process_video inline. They stream live SSE progress, so moving them out-of-process needs a progress-bridge design (subprocess → file/queue → SSE). Deliberately deferred as a separate, designed piece.
  • Composite workflows (/api/update-content, reprocess-existing, nuclear-rebuild): no single CLI subcommand yet.
  • index-publication-images / crawl-web-images → cli ingest images.
  • Optionally relocate ML modules under backend/jwsearch/ingest/.

Guardrails

  1. Golden tests pass at every commit.
  2. No phase ships without a rollback path.
  3. CLAUDE.md updated as part of the phase that invalidates a rule.
  4. Atomic commits. Subagent code review at each phase boundary.
  5. No model swaps and dependency cleanup in the same commit.