Agentic-Service-Data-Eyond-Catalog

feat/Analysis State & Report Rework

by sofhiaazzhr - opened 10 days ago

base: refs/heads/main ← from: refs/pr/4

+1385

-2254

Files changed (31)

.gitignore +2 -3
API_ENDPOINTS.md +373 -0
ARCHITECTURE.md +0 -353
CHECKPOINT_PLAN_2026-06-17.md +0 -147
DEV_PLAN.md +160 -0
PHASE1_TO_PHASE2_REPORT.md +0 -273
PROGRESS.md +0 -692
PROJECT_BRD.md +150 -0
REPO_CONTEXT.md +0 -494
REPO_STATUS.md +306 -0
src/agents/chat_handler.py +48 -41
src/agents/gate.py +16 -13
src/agents/handlers/help.py +28 -22
src/agents/handlers/problem_statement.py +4 -0
src/agents/orchestration.py +7 -5
src/agents/report/generator.py +38 -26
src/agents/report/readiness.py +17 -13
src/agents/report/schemas.py +10 -8
src/agents/report/store.py +1 -1
src/agents/slow_path/schemas.py +1 -1
src/agents/slow_path/store.py +14 -14
src/agents/state_store.py +6 -6
src/api/v1/analysis.py +3 -3
src/api/v1/report.py +40 -4
src/api/v1/tools.py +80 -64
src/config/prompts/help.md +27 -33
src/config/prompts/intent_router.md +5 -17
src/config/prompts/report_summary.md +4 -3
src/config/settings.py +4 -5
src/db/postgres/init_db.py +1 -1
src/db/postgres/models.py +40 -12

.gitignore CHANGED

@@ -53,6 +53,5 @@ migratego/
 docs/specs/tabular_parquet_contract.md
 docs/specs/tabular_parquet.md
-# Personal / local working docs (not for the shared repo)
-AGENT_ARCHITECTURE_CONTEXT_new.md
-PROJECT_SUMMARY.md
+# Personal / local working docs (not for the shared repo) — archived out of root
+docs/_archive/

API_ENDPOINTS.md ADDED

@@ -0,0 +1,373 @@
+# Data Eyond — Python Agentic Service: FE-Callable API (for Go integration)
+**Audience:** Harry (Go gateway) wiring the FE → Go → Python surface.
+**Scope:** the **4 FE-callable surfaces** the Python service exposes after the 2026-06-24 pivot
+(DEV_PLAN decision #6). Everything else under `/api/v1` is internal / Phase-1 legacy / Go-owned —
+see [§7](#7-not-fe-facing) and the full inventory in [§9](#9-appendix--complete-endpoint-inventory-all-registered-routes).
+**Branch:** `pr/4` · **Snapshot:** 2026-06-25 · **Companion:** [REPO_STATUS.md](REPO_STATUS.md).
+> Request flow is **FE → Go → Python**. The FE never calls Python directly except for chat
+> streaming. Auth/JWT is terminated at the Go gateway; Python receives `user_id` / `room_id` as
+> **trusted inputs** and does no auth of its own.
+---
+## 1. The 4 FE-callable surfaces
+| # | Logical name | HTTP | How it's invoked |
+|---|---|---|---|
+| 1 | **`call_agent`** | `POST /api/v1/chat/stream` | The one streaming chat call. Router classifies + dispatches. |
+| 2 | **`list_skills`** | `GET /api/v1/tools` | Static slash-command catalog for the FE "/" menu. Cacheable. |
+| 3 | **skill: `help`** | *(via `call_agent`)* | **No dedicated endpoint** — the router resolves it to the `help` intent inside `/chat/stream`. |
+| 4 | **skill: `report`** | `POST /api/v1/report` (+ 2 `GET`s) | Dedicated REST API. **Not** through `/chat/stream`. |
+**Key consequence for Go:** the two catalog skills are invoked **differently**. `/help` goes through
+`/chat/stream`; `/report` is a direct REST call to the Report API. The catalog's `name` field is the
+internal route key (`help` = router intent; `report` = the Report API), not a uniform dispatch key.
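Reviewer sketch (not part of the PR): the asymmetric dispatch described above could look roughly like this on the caller side. It assumes option (a) from §4, i.e. the literal slash text is forwarded and the router is trusted to classify it; `plan_skill_call` and the returned dict shape are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

BASE = "/api/v1"


def plan_skill_call(name: str, user_id: str, room_id: str) -> dict:
    """Describe the HTTP call for a catalog skill (illustrative helper, not a real API)."""
    if name == "help":
        # Assumption: forward the literal slash text and let the LLM router pick `help`.
        return {
            "method": "POST",
            "path": f"{BASE}/chat/stream",
            "json": {"user_id": user_id, "room_id": room_id, "message": "/help"},
        }
    if name == "report":
        # room_id doubles as analysis_id; the Report API takes query params and no body.
        query = urlencode({"analysis_id": room_id, "user_id": user_id})
        return {"method": "POST", "path": f"{BASE}/report?{query}", "json": None}
    raise ValueError(f"unknown skill: {name!r}")
```

The point of the sketch is that the catalog `name` selects a code path rather than a URL template, which is why a uniform dispatch key would not work here.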
+**Conventions:**
+- Base path: `/api/v1`.
+- **`room_id == analysis_id`** — one chat room == one analysis session (#9). Callers pass `room_id`
+  to chat; it *is* the `analysis_id` used by the report API.
+- Streaming uses **SSE** (`text/event-stream`, `sse-starlette`).
+---
+## 2. `call_agent` — `POST /api/v1/chat/stream`
+The only FE→Python call in normal operation. Source: [chat.py:169](src/api/v1/chat.py:169).
+**Request body** (`application/json`) — `ChatRequest`:
+```json
+{
+  "user_id": "u_1a2b3c",
+  "room_id": "room_42",
+  "message": "What were total sales by region last quarter?"
+}
+```
+`room_id` is the analysis session id. No auth header (handled by Go).
+**Response:** `text/event-stream`. Events arrive in this order:
+| `event:` | `data:` payload | Notes |
+|---|---|---|
+| `sources` | JSON array of source refs | `{document_id, filename, page_label}`. Structured: one per executed table (`document_id = "{user_id}_{table}"`, `page_label = null`). Unstructured: deduped doc/page. `chat`/`help`/`error`: `[]`. |
+| `status` | text | **Slow-path only** — progress pings ("Planning…", "Running N steps…"). Keeps the SSE alive; safe to surface or ignore. |
+| `chunk` | text fragment | Concatenate in order to form the answer. |
+| `done` | *(empty)* | End of stream. |
+| `error` | text | Terminal error; stream stops after this. |
+> The handler also emits an internal `intent` event — it is **consumed inside Python** (gates
+> caching) and **not forwarded** to the client. Go/FE will never see it.
+**Example — `structured_flow` answer** (raw SSE wire; blank line separates events). Source shape:
+[chat_handler.py:607](src/agents/chat_handler.py:607).
+```
+event: sources
+data: [{"document_id":"u_1a2b3c_orders","filename":"orders","page_label":null}]
+event: status
+data: Planning analysis…
+event: status
+data: Running 3 steps…
+event: chunk
+data: Total sales by region last quarter:
+event: chunk
+data: Central led at $1.21M (38%), East $0.74M, West $0.55M (down 12% QoQ).
+event: done
+data:
+```
+**Example — simple `chat` reply** (no status pings, empty sources):
+```
+event: sources
+data: []
+event: chunk
+data: I'm your AI data analyst — connect a source or ask a question to get started.
+event: done
+data:
+```
+**Behavior worth knowing for integration:**
+- **Redis response cache** (1h TTL) is applied to the stateless `chat` intent only; cached replies
+  replay as `sources`/`chunk`/`done`.
+- **Greeting/farewell fast-path** returns a canned reply with no LLM call.
+- The LLM **router** classifies every message into one of **5 intents** —
+  `chat` · `help` · `check` · `unstructured_flow` · `structured_flow` — and dispatches. Messages
+  persist (user + assistant) on `done`.
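Reviewer sketch (not part of the PR): a minimal consumer for the event sequence in this section, assuming the standard SSE framing in which a blank line terminates each event. `parse_sse` and `assemble_answer` are hypothetical helper names; a production client would read the stream incrementally rather than from one string.

```python
def parse_sse(raw: str) -> list:
    """Split a raw SSE payload into (event, data) pairs; a blank line ends each event."""
    events = []
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        name, data_lines = "message", []
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Per the SSE format, a single leading space after the colon is dropped.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines or name != "message":
            events.append((name, "\n".join(data_lines)))
    return events


def assemble_answer(events) -> str:
    """Concatenate `chunk` fragments in order; stop at `done`, raise on `error`."""
    parts = []
    for name, data in events:
        if name == "chunk":
            parts.append(data)
        elif name == "error":
            raise RuntimeError(data)
        elif name == "done":
            break
        # `sources` and `status` are ignored here; a UI may choose to surface them.
    return "".join(parts)
```

Since `status` pings are documented as safe to ignore, the assembler skips them and only `chunk` events contribute to the answer text.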
+---
+## 3. `list_skills` — `GET /api/v1/tools`
+Static, deterministic, **safe for Go to cache**. Source: [tools.py:133](src/api/v1/tools.py:133).
+**Request:** none (no params, no body).
+**Response** `200` (`ListToolsResponse`):
+```json
+{
+  "count": 2,
+  "tools": [
+    { "command": "/help",   "name": "help",   "type": "skill",
+      "description": "Show what the assistant can do and guide your next step." },
+    { "command": "/report", "name": "report", "type": "skill",
+      "description": "Generate a versioned analysis report (background, EDA, key findings, insights)." }
+  ]
+}
+```
+`CommandResponse` = `{ command, name, type, description }`, `type ∈ {skill, analytics, data_access}`.
+Post-KM-678 the catalog is **`/help` + `/report` only**; the `analyze_*`, `check_*`, `retrieve_*`
+and retired `/problem-statement` entries are commented out (kept for restorability), not deleted.
+---
+## 4. skill: `help` — via `call_agent`
+**There is no `/help` endpoint.** The FE "/" menu surfaces `/help`; to invoke it, call
+`POST /api/v1/chat/stream` and let the router classify the message as the `help` intent
+([chat_handler.py:363](src/agents/chat_handler.py:363)). Help streams `chunk` events (same SSE
+shape as §2, with `sources: []` and no `status` pings) — a state-aware, next-step guidance reply.
+```
+event: sources
+data: []
+event: chunk
+data: Your goal is set — you can start exploring now. Try a question like "average order value by month", then I can generate a report.
+event: done
+data:
+```
+> **Open integration question (for Harry):** the Python `/chat/stream` contract has **no
+> forced-intent / slash-bypass param** — `handle()` always routes via the LLM classifier. So
+> deterministic `/help` dispatch depends on either (a) Go forwarding the literal slash text and
+> trusting the router to classify it as `help`, or (b) adding a forced-intent input to the chat
+> contract. The `tools.py` docstring's "slash invocation bypasses the router to the tool directly"
+> is **not yet true on the Python side.** Needs a decision. (DEV_PLAN #8/#18.)
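Reviewer sketch (not part of the PR): option (b) might be as small as the following. The `forced_intent` parameter and the `resolve_intent` helper are hypothetical; neither exists in the current chat contract, and `classify` stands in for whatever the LLM router call actually is.

```python
from typing import Callable, Optional

# The five router intents listed in section 2.
KNOWN_INTENTS = {"chat", "help", "check", "unstructured_flow", "structured_flow"}


def resolve_intent(
    message: str,
    classify: Callable[[str], str],
    forced_intent: Optional[str] = None,
) -> str:
    """Return the forced intent when supplied and valid, else fall back to the classifier."""
    if forced_intent is not None:
        if forced_intent not in KNOWN_INTENTS:
            raise ValueError(f"unknown forced intent: {forced_intent!r}")
        return forced_intent  # deterministic path: no LLM call is made
    return classify(message)
```

With this shape, Go would send the forced intent only for slash invocations, and ordinary messages would keep going through the classifier unchanged.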
+---
+## 5. skill: `report` — Report API
+Dedicated REST surface (the "Generate Report" button), **not** a chat route.
+Source: [report.py](src/api/v1/report.py).
+### `POST /api/v1/report`
+Generate, persist, and return a new report **version**.
+**Query params:** `analysis_id` (required), `user_id` (required). No request body.
+```
+POST /api/v1/report?analysis_id=room_42&user_id=u_1a2b3c
+```
+| Status | Meaning |
+|---|---|
+| `201` | New version generated → `AnalysisReport` body. |
+| `409` | Floor not met — **no recorded analyses yet** for this session, nothing to report. |
+| `500` | Generation or persistence failed. |
+**`201` response** (`AnalysisReport`):
+```json
+{
+  "report_id": "8f3a2b1c9d4e4f6a8b0c1d2e3f4a5b6c",
+  "analysis_id": "room_42",
+  "user_id": "u_1a2b3c",
+  "version": 2,
+  "generated_at": "2026-06-25T09:14:33.512Z",
+  "problem_statement": {
+    "objective": "Understand which regions drive revenue and why Q1 dipped.",
+    "business_questions": [
+      "Which regions contribute most to total revenue?",
+      "Did any region decline quarter-over-quarter?"
+    ]
+  },
+  "record_ids": ["rec_a1", "rec_b2"],
+  "executive_summary": "Revenue is concentrated in the Central region (38% of total). The West was the only region to contract, down 12% QoQ — the main driver of the Q1 dip.",
+  "findings": [
+    { "text": "Central region contributed 38% of total revenue, the largest share.",
+      "record_ids": ["rec_a1"], "supporting_data": null },
+    { "text": "West region revenue fell 12% quarter-over-quarter.",
+      "record_ids": ["rec_b2"], "supporting_data": null }
+  ],
+  "caveats": [
+    { "text": "March data for the East region was partially missing (~6% of rows).",
+      "record_ids": ["rec_b2"] }
+  ],
+  "open_questions": [
+    { "text": "What drove the West region's QoQ decline?", "record_ids": ["rec_b2"] }
+  ],
+  "data_sources": [
+    { "source_id": "src_sales_db", "name": "orders", "source_type": "postgres",
+      "detail": { "tables": ["orders"], "row_count": 48213,
+                  "columns": ["region", "amount", "ordered_at"] } }
+  ],
+  "method_steps": [
+    { "task_id": "t1", "stage": "data_understanding", "objective": "Inventory the sales source",
+      "status": "success", "tools_used": ["check_data"] },
+    { "task_id": "t2", "stage": "modeling", "objective": "Aggregate revenue by region",
+      "status": "success", "tools_used": ["analyze_aggregate"] }
+  ],
+  "rendered_markdown": "# Analysis Report\n\n*Generated 2026-06-25 by u_1a2b3c · 2 analyses · 1 source(s)*\n\n## Objective\nUnderstand which regions drive revenue…\n\n## Key Findings\n1. Central region contributed 38%…"
+}
+```
+**`409` response** (floor not met — the demo's most common error):
+```json
+{ "detail": "Not ready to generate a report — still needs at least one completed analysis." }
+```
+> ⚠️ **Demo/integration precondition:** `AnalysisRecord`s persist **only on the slow path**, so
+> reports require **`enable_slow_path=true`** on the Python deployment *and* ≥1 prior
+> `structured_flow` question in the session. With slow path off, `POST /report` **409s by design**;
+> this is expected behavior, not a bug. (DEV_PLAN #15/#16.)
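Reviewer sketch (not part of the PR): one way a caller could map the three documented statuses onto UI outcomes. The `interpret_report_response` name and the `retryable` flag are assumptions for illustration; the source only defines the status codes and bodies.

```python
def interpret_report_response(status: int, body: dict) -> dict:
    """Map a POST /api/v1/report response onto a simple outcome dict (illustrative)."""
    if status == 201:
        return {
            "ok": True,
            "version": body["version"],
            "markdown": body["rendered_markdown"],
        }
    if status == 409:
        # Floor not met: retrying will not help until an analysis has been recorded.
        return {"ok": False, "retryable": False, "reason": body.get("detail", "floor not met")}
    if status == 500:
        # Assumption: generation/persistence failures are treated as retryable.
        return {"ok": False, "retryable": True, "reason": "generation or persistence failed"}
    raise ValueError(f"unexpected status: {status}")
```

Treating `409` as a non-error state ("ask a data question first") rather than a failure matches the "by design" framing in the note above.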
+### `GET /api/v1/report/{analysis_id}`
+List a session's report versions (oldest-first). Returns `[ReportVersionEntry]`; `[]` if none.
+```json
+[
+  { "report_id": "1b2c3d4e…", "version": 1, "generated_at": "2026-06-24T15:02:11Z", "record_count": 1 },
+  { "report_id": "8f3a2b1c…", "version": 2, "generated_at": "2026-06-25T09:14:33Z", "record_count": 2 }
+]
+```
+### `GET /api/v1/report/{analysis_id}/{version}`
+Fetch one version → `AnalysisReport` (same shape as the `POST` 201 body above); `404` if that
+version doesn't exist.
+```json
+{ "detail": "No report v3 for analysis 'room_42'." }
+```
+---
+## 6. Schemas
+**`AnalysisReport`** (POST + GET-version body):
+| Field | Type | Notes |
+|---|---|---|
+| `report_id` | str | |
+| `analysis_id` | str | == `room_id` |
+| `user_id` | str \| null | |
+| `version` | int | monotonic V1, V2, … |
+| `generated_at` | datetime | ISO 8601, UTC |
+| `problem_statement` | `{ objective: str, business_questions: string[] }` | the frozen goal snapshot (new pivot shape) |
+| `record_ids` | string[] | records the version was built from |
+| `executive_summary` | str | the **only** LLM-authored field |
+| `findings` | `ReportFinding[]` | `{ text, record_ids[], supporting_data? }` |
+| `caveats` | `AttributedNote[]` | `{ text, record_ids[] }` |
+| `open_questions` | `AttributedNote[]` | `{ text, record_ids[] }` |
+| `data_sources` | `DataSourceRef[]` | `{ source_id, name, source_type, detail }` |
+| `method_steps` | `TaskSummary[]` | `{ task_id, stage, objective, status, tools_used[] }`; `stage` ∈ CRISP-DM phases |
+| `rendered_markdown` | str | the full rendered report |
+> **Persistence caveat:** dedorch `reports` stores **markdown only**. On read-back via the `GET`
+> endpoints, the structured fields above come back **empty** and `rendered_markdown` is the source of
+> truth. (REPO_STATUS §5.)
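Reviewer sketch (not part of the PR): given the read-back caveat, a consumer may want to detect markdown-only reports and render accordingly. Which fields come back empty is an assumption here (the list-valued ones); `report_view` is a hypothetical helper.

```python
# Assumption: these are the structured fields that come back empty on read-back.
STRUCTURED_FIELDS = ("findings", "caveats", "open_questions", "data_sources", "method_steps")


def report_view(report: dict) -> dict:
    """Pick a rendering mode: structured when any structured field is populated."""
    has_structure = any(report.get(field) for field in STRUCTURED_FIELDS)
    return {
        "mode": "structured" if has_structure else "markdown_only",
        "markdown": report.get("rendered_markdown", ""),
    }
```

Either way `rendered_markdown` is carried through, so a UI that only ever renders the markdown behaves the same for fresh and read-back reports.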
+**`ReportVersionEntry`** (GET-list rows): `{ report_id, version, generated_at, record_count }`.
+---
+## 7. Not FE-facing
+Registered under `/api/v1` but **not** part of the FE→Python surface — do not wire these from the FE:
+- **Analysis CRUD** — `POST /analysis/create`, `GET /analysis`, `GET /analysis/{id}`. Intended to
+  move behind Go (state writes via Go, per decision #5/#18). Router still **mounted** (Go may use it);
+  the FE should not call it.
+- **`check_data` / `check_knowledge`** — served by **Go**, not surfaced as Python FE endpoints.
+- **Chat cache management** — `DELETE /chat/cache`, `/chat/cache/room/{id}`, `/retrieval/cache/{user_id}`
+  (ops/internal).
+- **Phase-1 legacy routers** — `users`, `room`, `document`, `db_client`, `data_catalog`
+  (functionally migrated to Go; mostly dormant).
+- **Health/root** — `GET /`, `GET /health` (liveness only).
+---
+## 8. Open items affecting this contract
+1. **`/help` dispatch mechanism** — router-classify vs. forced-intent param (§4). *(DEV_PLAN #8/#18)*
+2. **`/report` needs `enable_slow_path=true`** + a prior `structured_flow` question, else 409.
+   *(DEV_PLAN #15)*
+3. **`analysis_records` home** post-`SKIP_INIT_DB` cutover — the report API depends on this table
+   existing. *(DEV_PLAN #14/#16)*
+4. **Analysis-state writes** — once Go owns creation + state writes, Python's per-turn state
+   `ensure` becomes a read-only get (Go must guarantee the row exists before any chat turn).
+   *(DEV_PLAN #18)*
+---
+## 9. Appendix — complete endpoint inventory (all registered routes)
+Every route mounted in [main.py](main.py), so task #8 can be decided against the full picture.
+**32 routes** in total: 30 across 9 routers, plus 2 app-level. Status legend:
+**✅ FE-callable** (one of the 4 surfaces — keep) · **✂️ comment out** (task #8 target) ·
+**🟦 legacy → Go** (Phase-1, functionally migrated; not FE→Python; mostly dormant) ·
+**⚙️ internal/ops**.
+| Method | Path | Purpose | Router | Status |
+|---|---|---|---|---|
+| POST | `/api/v1/chat/stream` | Main chat SSE — **`call_agent`**; carries chat/help/check/structured/unstructured intents | Chat | ✅ FE-callable (#1, +help #3) |
+| GET | `/api/v1/tools` | Slash-command catalog — **`list_skills`** (Go caches) | Tools | ✅ FE-callable (#2) |
+| POST | `/api/v1/report` | Generate a report version | Report | ✅ FE-callable (#4) |
+| GET | `/api/v1/report/{analysis_id}` | List report versions | Report | ✅ FE-callable (#4) |
+| GET | `/api/v1/report/{analysis_id}/{version}` | Fetch one report version | Report | ✅ FE-callable (#4) |
+| POST | `/api/v1/analysis/create` | Create session (state + room + bindings) | Analysis | ✂️ comment (#8 → Go) |
+| GET | `/api/v1/analysis` | List a user's analyses | Analysis | ✂️ comment (#8) |
+| GET | `/api/v1/analysis/{analysis_id}` | Get one session's state + sources | Analysis | ✂️ comment (#8) |
+| DELETE | `/api/v1/chat/cache` | Clear one cached reply | Chat | ⚙️ internal/ops |
+| DELETE | `/api/v1/chat/cache/room/{room_id}` | Clear a room's cache | Chat | ⚙️ internal/ops |
+| DELETE | `/api/v1/retrieval/cache/{user_id}` | Clear a user's retrieval cache | Chat | ⚙️ internal/ops |
+| GET | `/` | Service status | (app) | ⚙️ internal/ops |
+| GET | `/health` | Liveness probe | (app) | ⚙️ internal/ops |
+| POST | `/api/login` | Login by email + password ⚠️ mounted at `/api`, **not** `/api/v1` | Users | 🟦 legacy → Go |
+| GET | `/api/v1/documents/doctypes` | Supported document types | Documents | 🟦 legacy → Go |
+| GET | `/api/v1/documents/{user_id}` | List a user's documents | Documents | 🟦 legacy → Go |
+| POST | `/api/v1/document/upload` | Upload a document (10/min) | Documents | 🟦 legacy → Go |
+| DELETE | `/api/v1/document/delete` | Delete a document | Documents | 🟦 legacy → Go |
+| POST | `/api/v1/document/process` | Process / ingest a document | Documents | 🟦 legacy → Go |
+| GET | `/api/v1/rooms/{user_id}` | List a user's rooms | Rooms | 🟦 legacy → Go |
+| GET | `/api/v1/room/{room_id}` | Get one room | Rooms | 🟦 legacy → Go |
+| DELETE | `/api/v1/room/{room_id}` | Delete a room | Rooms | 🟦 legacy → Go |
+| POST | `/api/v1/room/create` | Create a room | Rooms | 🟦 legacy → Go |
+| GET | `/api/v1/data-catalog/{user_id}` | List catalog index | Data Catalog | 🟦 legacy → Go |
+| POST | `/api/v1/data-catalog/rebuild` | Rebuild a user's catalog | Data Catalog | 🟦 legacy → Go |
+| GET | `/api/v1/database-clients/dbtypes` | Supported DB types | Database Clients | 🟦 legacy → Go |
+| POST | `/api/v1/database-clients` | Create a DB connection | Database Clients | 🟦 legacy → Go |
+| GET | `/api/v1/database-clients/{user_id}` | List a user's DB connections | Database Clients | 🟦 legacy → Go |
+| GET | `/api/v1/database-clients/{user_id}/{client_id}` | Get one DB connection | Database Clients | 🟦 legacy → Go |
+| PUT | `/api/v1/database-clients/{client_id}` | Update a DB connection | Database Clients | 🟦 legacy → Go |
+| DELETE | `/api/v1/database-clients/{client_id}` | Delete a DB connection | Database Clients | 🟦 legacy → Go |
+| POST | `/api/v1/database-clients/{client_id}/ingest` | Build the catalog for a DB connection | Database Clients | 🟦 legacy → Go |
+**Tally:** 5 ✅ FE-callable · 3 ✂️ to comment (#8) · 19 🟦 legacy→Go · 5 ⚙️ internal/ops.
+**Task #8 reading:**
+- **Keep exposed:** the 5 ✅ rows (`chat/stream`, `/tools`, the 3 `report` routes). `help` rides on
+  `chat/stream` — no route of its own.
+- **Comment out (the #8 to-do):** the 3 `analysis` routes — analysis CRUD moves behind Go (#5/#18).
+- **`check_data` is not an HTTP endpoint** — it's the `check` router intent (runs inside
+  `chat/stream`) plus its now-commented slash-catalog entry (KM-678); Go serves it to the FE. So
+  "comment check_data" = the catalog line (done) + don't expose a Python route (there isn't one).
+- The 19 🟦 routes (in the `users`, `document`, `room`, `data_catalog`, and `db_client` routers) are
+  Phase-1 legacy, already functionally in Go (REPO_STATUS §7). They're out of the FE→Python path but
+  **still mounted**; removing them is a separate cleanup from #8's analysis-CRUD scope.

ARCHITECTURE.md DELETED

@@ -1,353 +0,0 @@
-# Architecture — Data Eyond Agentic Service
-**Last updated**: 2026-05-20
-**Status**: Phase 2 catalog path shipped; document ingestion has moved to a separate Go service. The long-term split is **Python = agent/ML layer, Go = data plane**; this document covers the Python side only.
----
-## Product vision (north star)
-Data Eyond is an *AI data scientist* for business analytics, structured around **CRISP-DM** (Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment). Targets executives doing self-serve deep-dives and data analysts/scientists offloading routine work.
-Envisioned user flow: **interview agent** captures goal → user connects data sources → asks natural-language question → CRISP-DM-structured analytical response, exportable as a **presentation** or **notebook-style report**.
-The catalog-driven, IR-based architecture documented below is the *foundation*. The next architectural evolution is an agentic layer (analytical planner, per-stage CRISP-DM agents, evaluator, reporter) that consumes the existing IntentRouter → QueryPlanner → Executor → ChatbotAgent spine as its tool layer. See `REPO_CONTEXT.md` → *Roadmap — agentic evolution* for the target agent topology.
----
-## TL;DR
-A catalog-driven AI service for data analysis. Users upload documents and register databases or tabular files; they ask natural-language questions and get answers grounded in their data.
-The architecture has two paths:
-- **Unstructured** (PDF, DOCX, TXT) — dense similarity over prose chunks (the right primitive for free-form text). **Ingestion is handled by a separate Go service**; this Python service reads embeddings from PGVector at query time.
-- **Structured** (databases, XLSX, CSV, Parquet) — a per-user **data catalog** describes what tables/columns exist; an LLM produces a structured **JSON intermediate representation (IR)** of the user's intent; a deterministic compiler turns the IR into SQL or pandas operations.
-The LLM produces *intent*, not query syntax. Deterministic code does the rest.
----
-## 1. Why catalog-driven design
-For a database or spreadsheet, a user's question maps to *known tables and columns* — not to *similar text fragments*. Treating structured data with the same retrieval primitive as prose (chunk + embed + rank top-K) makes the right column survive a probabilistic ranking lottery. Catalog-based **lookup** is the right primitive instead.
-A central per-user catalog also means:
-- One place to keep table/column descriptions (AI-generated, refreshed when the source changes).
-- The query planner sees the user's full data landscape in a single prompt.
-- Schema stays stable across user sessions without hitting the source DB on every query.
-- New sources auto-update the catalog without re-embedding chunks.
----
-## 2. Source taxonomy
-```
-Sources
-├── Unstructured (pdf, docx, txt)        →  Cu  (prose chunks via DocumentRetriever)
-└── Structured
-    ├── Schema (DB)                       →  Cs  (DB tables + columns)
-    └── Tabular (xlsx, csv, parquet)      →  Ct  (sheets + columns)
-                                           Cs ∪ Ct = Data Catalog Context
-```
-- **Cu** = unstructured prose context. Retrieval primitive: dense similarity over chunks.
-- **Cs** = DB schema context (tables, columns, descriptions, sample values).
-- **Ct** = tabular file context (sheets, columns, descriptions, sample values).
-- **Data Catalog Context** = `Cs ∪ Ct`. Passed to the query planner as a single unified view.
-DB vs tabular is **not** a routing concern — it's a per-source attribute (`source_type`) on each catalog entry. The split only matters at execution time (SQL vs pandas).
----
-## 3. Routing model
-> **Superseded 2026-06-18** — the 3-way `source_hint` below was reworked into a flat **6-intent** handler router (`chat`, `help`, `problem_statement`, `check`, `unstructured_flow`, `structured_flow`). Modality (structured vs unstructured *data*) is now the Planner's job, not the router's. See `ORCHESTRATOR_REWORK_PLAN.md`.
-```
-source_hint ∈ { chat, unstructured, structured }
-```
-- `chat` — no search, conversational reply only
-- `unstructured` — DocumentRetriever path (Cu)
-- `structured` — catalog-driven path (Cs ∪ Ct → planner → compiler → executor)
-The router commits to one path. Cross-source questions ("compare DB sales vs uploaded customer file") are handled inside the structured path because the planner sees both Cs and Ct in one prompt.
----
-## 4. Core architectural decisions
-### 4.1 Catalog as primary context, not retrieval
-For most users (≤50 tables), the entire catalog fits in ~3-5k tokens and is passed verbatim to the planner. No vector search, no BM25, no chunk retrieval. The LLM reads the whole catalog and picks the right table.
-When a user has hundreds of tables, **catalog-level retrieval** (BM25 + table-level vectors with RRF) can be added as a slicer between `CatalogReader` and `Planner`. Deferred until measurably needed.
-### 4.2 JSON IR over raw SQL
-The planner LLM emits a structured JSON IR describing query intent — not a SQL string. A deterministic compiler turns the IR into SQL (per dialect) or pandas/polars operations.
-Benefits:
-- Validatable with Pydantic before execution
-- Compiler whitelists allowed operations (no DROP, DELETE, etc.)
-- Portable: same IR → SQL (any dialect) / pandas / polars
-- Cheaper tokens, easier to debug, trivially testable without an LLM
-- LLM cannot emit valid-but-wrong SQL syntax
-### 4.3 Deterministic compiler, not LLM SQL writer
-The LLM produces *intent* (the IR). All actual query construction is deterministic Python. Compiler bugs are reproducible and fixable. Same IR always produces the same query.
-### 4.4 Pipeline stage isolation
-Each stage is its own module with typed input and typed output. No god classes. Stages: `IntentRouter`, `CatalogReader`, `QueryPlanner`, `IRValidator`, `QueryCompiler`, `QueryExecutor`, `ChatbotAgent`. Each is testable in isolation.
-### 4.5 Minimal LLM surface
-LLM calls happen in exactly three places (KM-557 removed `CatalogEnricher`; ingestion is now LLM-free — the planner reads column names, stats, and sample rows directly):
-1. **`IntentRouter`** — once per user message
-2. **`QueryPlanner`** — once per structured query (produces the IR)
-3. **`ChatbotAgent`** — once per answer (formats the response)
-Compiler and executors are pure code. No LLM in the hot path of query construction.
----
-## 5. End-to-end flow
-### Ingestion (when user uploads a file or connects a DB)
-```
-Structured sources (DB connect / XLSX / CSV / Parquet upload) — Python:
-source upload / DB connect
-    ↓
-introspect schema (DB: information_schema; tabular: file headers + sample rows)
-    ↓
-validate (Pydantic)
-    ↓
-write to catalog store (Postgres jsonb in `data_catalog`, keyed by user_id)
-```
-**Unstructured ingestion (PDF / DOCX / TXT) is handled by a separate Go service**, which writes chunks + embeddings into the `documents` collection in PGVector. The Python service does not own this path — it reads only.
-### Query (per user message)
-```
-User message
-    ↓
-Chat cache check (Redis, 24h TTL)
-    ↓ miss
-Load chat history
-    ↓
-IntentRouter LLM   →  needs_search?  source_hint?
-    ↓
-    ├── chat        → ChatbotAgent → SSE stream
-    ├── unstructured → DocumentRetriever (raw SQL: pgvector `<=>` cosine or `<+>` manhattan) → answerer
-    └── structured  →
-            CatalogReader (load full Cs ∪ Ct for user)
-                ↓
-            QueryPlanner LLM  →  JSON IR
-                ↓
-            IRValidator  (Pydantic + columns-exist + ops whitelist)
-                ↓
-            QueryCompiler  →  SQL (schema source) or pandas (tabular source)
-                ↓
-            QueryExecutor  (DbExecutor or TabularExecutor)
-                ↓
-            QueryResult
-                ↓
-            ChatbotAgent → SSE stream
-```
----
-## 6. Data catalog
-### Storage
-Per-user JSON document, stored as a `jsonb` row in Postgres keyed by `user_id`.
-### Schema (initial scope)
-```
-Catalog
-├── user_id, schema_version, generated_at
-└── sources[]
-    └── Source
-        ├── source_id, source_type, name, description, location_ref, updated_at
-        └── tables[]
-            └── Table
-                ├── table_id, name, description, row_count
-                └── columns[]
-                    └── Column
-                        ├── column_id, name, data_type, description
-                        ├── nullable
-                        ├── pii_flag
-                        ├── sample_values[]
-                        └── stats: { min, max, distinct_count } | null
-```
-### Best-practice fields deferred
-`description_human`, `synonyms[]`, `tags[]`, `primary_key`, `foreign_keys`, `unit`, `semantic_type`, `example_questions[]`, `schema_hash`, `enrichment_status`. Add when justified by user need.
-### Stable IDs
-`source_id`, `table_id`, `column_id` are stable internal references. `name` fields can change (e.g. column rename in source DB) without invalidating cached IRs.
-### PII handling
-Columns with `pii_flag: true` have `sample_values: null` — real values never enter LLM prompts. Auto-detected at ingestion via name patterns + value regex.
----
-## 7. JSON IR
-### Schema (initial scope)
-```
-QueryIR
-├── ir_version          : "1.0"
-├── source_id           : str   (references catalog)
-├── table_id            : str   (references catalog)
-├── select[]            : SelectItem
-│   ├── { kind: "column", column_id, alias? }
-│   └── { kind: "agg",    fn, column_id?, alias? }
-├── filters[]           : { column_id, op, value, value_type }
-├── group_by[]          : column_id
-├── order_by[]          : { column_id | alias, dir }
-└── limit               : int | null
-```
-### Whitelisted operators
-```
-Filter ops:  = != < <= > >= in not_in is_null is_not_null like between
-Agg fns:     count count_distinct sum avg min max
-```
-### Validation rules (enforced before execution)
-- `source_id` exists in catalog for this user
-- `table_id` belongs to that source
-- Every `column_id` exists in that table
-- Every `agg.fn` and `filter.op` is whitelisted
-- `value_type` consistent with column's `data_type`
-- `limit` positive int, ≤ hard cap (e.g. 10000)
-If any rule fails → reject IR → re-prompt planner with error context (max 3 retries).
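Reviewer sketch (not part of the PR): the validation rules above, expressed over plain dicts. The catalog shape (`{source_id: {table_id: set_of_column_ids}}`) and the `validate_ir` name are assumptions for illustration; the `value_type` consistency rule and the retry loop are omitted.

```python
FILTER_OPS = {"=", "!=", "<", "<=", ">", ">=", "in", "not_in",
              "is_null", "is_not_null", "like", "between"}
AGG_FNS = {"count", "count_distinct", "sum", "avg", "min", "max"}
HARD_LIMIT = 10_000  # the example hard cap mentioned in the rules


def validate_ir(ir: dict, catalog: dict) -> list:
    """Return a list of rule violations; an empty list means the IR passes."""
    source = catalog.get(ir.get("source_id"))
    if source is None:
        return [f"unknown source_id: {ir.get('source_id')!r}"]
    columns = source.get(ir.get("table_id"))
    if columns is None:
        return [f"table_id {ir.get('table_id')!r} does not belong to the source"]

    errors = []
    for item in ir.get("select", []):
        if item.get("kind") == "agg" and item.get("fn") not in AGG_FNS:
            errors.append(f"agg fn not whitelisted: {item.get('fn')!r}")
        column_id = item.get("column_id")  # optional for aggregates
        if column_id is not None and column_id not in columns:
            errors.append(f"unknown column_id: {column_id!r}")
    for flt in ir.get("filters", []):
        if flt.get("op") not in FILTER_OPS:
            errors.append(f"filter op not whitelisted: {flt.get('op')!r}")
        if flt.get("column_id") not in columns:
            errors.append(f"unknown column_id: {flt.get('column_id')!r}")
    for column_id in ir.get("group_by", []):
        if column_id not in columns:
            errors.append(f"unknown column_id: {column_id!r}")
    limit = ir.get("limit")
    if limit is not None and not (isinstance(limit, int) and 0 < limit <= HARD_LIMIT):
        errors.append(f"limit must be a positive int <= {HARD_LIMIT}")
    return errors
```

Returning the violations as strings keeps them easy to feed back into the planner prompt as error context on a retry.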
-### Deferred features
-`having`, `offset`, boolean tree filters (OR/NOT), `distinct`, joins, window functions. Add as user demand proves the limitation.
----
-## 8. Executors
-Same input (validated IR), same output (`QueryResult`), different backends.
-### DbExecutor (schema sources)
-```
-IR → SqlCompiler → SQL string + params
-     ↓
-sqlglot validation (SELECT-only, whitelist tables/columns, LIMIT enforced)
-     ↓
-asyncpg / pymysql in read-only transaction with timeout (30s)
-     ↓
-QueryResult
-```
-Identifiers come from catalog (verified at validation time, safe to inline as quoted identifiers). Values are always parameterized — never inlined as strings.
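Reviewer sketch (not part of the PR): the identifier/value split described above, reduced to a single-table `SELECT`. It assumes PostgreSQL-style `$n` placeholders (as asyncpg uses) and works on column names rather than catalog `column_id`s; `quote_ident` and `compile_select` are hypothetical names, not the actual `SqlCompiler`.

```python
COMPARISON_OPS = {"=", "!=", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Quote an identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def compile_select(table: str, columns: list, filters: list, limit: int):
    """Build a parameterized SELECT; values never appear in the SQL string."""
    params = []
    clauses = []
    for flt in filters:
        if flt["op"] not in COMPARISON_OPS:
            raise ValueError(f"operator not allowed: {flt['op']!r}")
        params.append(flt["value"])
        # Identifier is inlined (quoted); the value becomes a numbered placeholder.
        clauses.append(f"{quote_ident(flt['column'])} {flt['op']} ${len(params)}")
    select_list = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {select_list} FROM {quote_ident(table)}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    params.append(limit)
    sql += f" LIMIT ${len(params)}"
    return sql, params
```

Because the function is pure, the same input always yields the same SQL and parameter list, which is the reproducibility property the document relies on.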
-### TabularExecutor (tabular sources)
-```
-IR → PandasCompiler → operation chain
-     ↓
-choose strategy by file size:
-  ≤ 100 MB    → eager pandas
-  100 MB-1 GB → pyarrow with predicate pushdown
-  > 1 GB      → polars lazy scan
-     ↓
-execute in asyncio.to_thread (CPU work off the event loop)
-     ↓
-QueryResult
-```
-Initially eager pandas is sufficient. Add the others when a real file is too big.
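Reviewer sketch (not part of the PR): the size-based strategy choice and the off-loop execution step, using only the standard library. The strategy labels, the binary-megabyte threshold interpretation, and the helper names are assumptions; the actual pandas/pyarrow/polars calls are left out.

```python
import asyncio

MB = 1024 * 1024  # assumption: thresholds are interpreted as binary megabytes


def choose_strategy(size_bytes: int) -> str:
    """Pick an execution strategy label from the file size thresholds above."""
    if size_bytes <= 100 * MB:
        return "pandas_eager"
    if size_bytes <= 1024 * MB:
        return "pyarrow_pushdown"
    return "polars_lazy"


async def run_off_loop(fn, *args):
    """Run blocking, CPU-bound work in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fn, *args)
```

A caller would pass the chosen executor function to `run_off_loop`, for example `await run_off_loop(execute_plan, plan)` where `execute_plan` is whatever blocking function runs the operation chain.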
-### Shared safety guarantees
-1. IR validated before reaching compiler
-2. Compiler is deterministic (no LLM)
-3. Identifiers from catalog (trusted)
-4. Values parameterized
-5. sqlglot second-line defence for SQL
-6. Read-only at every layer
-7. Timeouts and row caps
----
-## 9. Implementation scope
-### Initial PR — what ships first
-| Item | Folder |
-|---|---|
-| Data catalog Pydantic models | `src/catalog/models.py` |
-| Catalog ingestion (introspect → enrich → validate → store) | `src/catalog/`, `src/pipeline/` |
-| `IntentRouter` with 3-way source_hint | `src/agents/` |
-| `CatalogReader` (loads full catalog) | `src/catalog/reader.py` |
-| `QueryPlanner` LLM call | `src/query/planner/` |
-| JSON IR Pydantic models | `src/query/ir/models.py` |
-| IR validator | `src/query/ir/validator.py` |
-**Output**: a validated JSON IR object. Execution lands in a follow-up PR.
-### Follow-up PRs
-| PR | Scope |
-|---|---|
-| 2 | `QueryCompiler` (IR → SQL / pandas) |
-| 3 | `QueryExecutor` split: `DbExecutor` + `TabularExecutor` |
-| 4 | Retry / self-correction loop on execution failure |
-| 5 | Eval harness (golden question→IR→result examples) |
-| 6 | Auto PII tagging in catalog |
-| Later | Joins in IR, schema drift detection, hybrid catalog search |
----
-## 10. Open questions
-| # | Question | Why it matters |
-|---|---|---|
-| 1 | Catalog storage: JSON file per user vs Postgres `jsonb` row? | Affects ingestion + read performance |
-| 2 | Should the catalog also list unstructured files (with descriptions only)? | Gives router unified view of all user sources |
-| 3 | Catalog refresh trigger: explicit "rebuild" button, on every upload, or background TTL? | Staleness vs latency tradeoff |
-| 4 | Confirm joins are out of initial IR scope? | Limits what user questions can be answered |
-| 5 | PII handling for sample_values: mask, synthesize, or skip? | Affects what gets sent to LLM prompts |
----
-## 11. References
-- `docs/flowchart.html` — interactive end-to-end diagram (open in browser)
-- `docs/flowchart.mmd` — mermaid source for the diagram
----
-## Glossary
-- **Cu** — unstructured context (prose chunks)
-- **Cs** — schema context (DB tables/columns from catalog)
-- **Ct** — tabular context (file sheets/columns from catalog)
-- **IR** — intermediate representation (the JSON query shape)
-- **PR** — pull request (a unit of code change)
-- **PII** — personally identifiable information (names, emails, etc.)
-- **ABC** — abstract base class (Python contract for subclasses)

CHECKPOINT_PLAN_2026-06-17.md DELETED

@@ -1,147 +0,0 @@
-# Checkpoint Plan — Wednesday, 17 June 2026
-Working plan for Sofhia & Rifqi based on the checkpoint with mas Harry on **Thursday, 11 June 2026**.
-Goal: everything below is **merged and demo-able before the next sync on Wednesday, 17 June (afternoon)**.
-**Updated at: Friday, 12 June 2026** (Sofhia + Rifqi)
-> Source of truth for decisions is the meeting itself. Note: the NotebookLM summary is **stale on two points** — Data Availability Check was *eliminated* as a tool, and Success Metrics was *folded into* the Problem Statement template. Do not build either as a standalone skill.
----
-## 0. Progress (as of Fri 12 Jun, Sofhia)
-Dated snapshot of what landed this session. Live task status (incl. what's left) lives in §2 Ownership — this section only records the deltas + traceability.
-- ✅ **Tool matrix** built (xlsx, all ~10 tools + status colours) — presentation material ready.
-- ✅ **Registry trimmed to 4 active analytics** (`KM-641`, commit `66e2e4d`): `ACTIVE_ANALYTICS_TOOLS` (descriptive, aggregate, correlation, trend) vs `DEFERRED_ANALYTICS_TOOLS` (comparison, contribution, profile, segment) — specs + compute fns kept, only registry exposure withheld. Tests 206 pass, ruff/mypy clean.
-- ✅ **Planner few-shot synced**: Example A `analyze_contribution` → `analyze_aggregate` (so few-shots don't reference a deferred tool).
-- ✅ **Data-access tools renamed** (`KM-642`, commit `c38c0c2`): `query_structured` → `data_retrieve`, `retrieve_documents` → `knowledge_retrieve` across the tool layer + planner stub/prompt/validator/few-shots. Mechanical, no behavior change.
-- ✅ **`data_check` merge + `knowledge_check`** (`KM-643`, commit `4bd5f1e`): `list_sources` + `describe_source` → one parameterized `data_check` (no arg = list structured sources; `source_id` = schema) + new `knowledge_check` (unstructured). Tests 206 pass.
-- ✅ **Redis Cloud live** (free tier, TTL = 1 h), env vars shared in the group (Rifqi).
-- ✅ **Planner tool list verified** against the trimmed registry — no references to old tool names or deferred analytics anywhere in `src/` (Rifqi).
-- 📌 **Decision:** `tests/` stays gitignored — team decided not to push tests to origin (closes PROGRESS.md R3 as won't-do).
-- 📌 **Ownership:** Rifqi owns `generate_report` development + the `analysis_records` table / real `AnalysisStore` (contract still co-designed with Sofhia).
-- ✅ **R5 cache fix** (Rifqi, `b701e95`): chat cache scoped by `user_id`, TTL 24h→1h.
-- ✅ **AnalysisRecord persistence landed** (Rifqi): `stage` now flows to the record (CRISP-DM grouping for the report) + identity fields (`record_id`/`analysis_id`/`user_id`); `PostgresAnalysisStore` + `analysis_records` table replace `NullAnalysisStore`, wired into `ChatHandler`. Unblocks the `generate_report` renderer and the DoD "record persisted" step. Open: `analysis_id` handoff from Harry's Analysis State.
-- ✅ **Verb-first tool naming** (Sofhia, commit `2d6406d`): the 4 data/knowledge tools renamed to lead with a verb — `data_check`→`check_data`, `knowledge_check`→`check_knowledge`, `data_retrieve`→`retrieve_data`, `knowledge_retrieve`→`retrieve_knowledge` (the `analyze_*` tools already lead with a verb). These verb-first names are now canonical; the tool-set table + §3 below use them. Dated log entries above keep the old names as historical record.
----
-## 1. Locked decisions (from the 2026-06-11 checkpoint)
-1. **Single chat page.** The separate interview/survey page is killed. Sidebar = Knowledge menu (connect/manage data) + Analysis menu (sessions).
-2. **Data-first hard gate.** Creating a new analysis requires **≥ 1 bound data source** (server-side rejection, no empty sessions). User provides title + optional short description.
-3. **Analysis State lives in the DB.** Per-analysis row: `user_id`, `data_source_ids[]`, `interview_status` (default `not_pass`), `report_status` (default `no_report` → `V1`, `V2`, …). Explicitly **NOT cached, NOT in Redis** — the Orchestrator reads it from Postgres every turn.
-4. **Skills, not agents.** No separate interview agent. The Orchestrator routes per user turn using the Analysis State; an analytical request still executes through the existing Planner → TaskRunner → Assembler spine (static plan, no mid-run LLM).
-5. **Interview = one skill: Problem Statement.** Success metrics become fields inside the PS template (what to increase/decrease + target). Data availability check is handled by the data-first creation gate + PS validation cross-checking fields against the bound catalog — not a separate tool.
-6. **Analytics focus = 4 tools:** descriptive, aggregate, correlation, trend. The other four composites (comparison, contribution, profile, segment) are **deprioritized, not deleted** — keep the code, just don't register them. If "comparison" returns later it should be a proper statistical **test**, not a generic compare.
-7. **`describe_source` merges into the listing tool** — one call returns sources *with* their schema/metadata, fewer tools for the planner.
-8. **Report = on-demand, button-triggered (not a chat skill).** A dedicated "Generate Report" button in the Analysis menu calls a **report API** (not the chat route): trigger generation for a session, list its versions, fetch a version. Renders from accumulated **AnalysisRecords + the Problem Statement** — never from chat history. Each report is a **persisted, versioned artifact**: generation snapshots the record IDs it used and bumps `report_status` to `V<n>`. (Owner: Rifqi, KM-644.)
-9. **Help = deterministic guide.** No LLM: read Analysis State → tell the user the next required step. Callable in any state.
-10. **Redis Cloud free tier, TTL = 1 hour**, env shared in the team group — for retrieval/query caching only, never for state.
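Decisions 3, 8 and 9 together describe a small deterministic state machine. The sketch below is one possible reading, not the project's code: the field names and default values follow decision 3, while the `AnalysisState` dataclass shape, the step labels returned by `next_step`, and the `bump_report_status` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisState:
    user_id: str
    data_source_ids: list[str] = field(default_factory=list)
    interview_status: str = "not_pass"  # default per decision 3
    report_status: str = "no_report"    # then "V1", "V2", ...


def next_step(state: AnalysisState) -> str:
    """Deterministic help: no LLM, just state -> next required step."""
    if not state.data_source_ids:
        return "bind_data_source"
    if state.interview_status != "pass":
        return "complete_problem_statement"
    if state.report_status == "no_report":
        return "run_analysis_or_generate_report"
    return "review_report_or_continue"


def bump_report_status(current: str) -> str:
    """Advance the report version: no_report -> V1 -> V2 -> ..."""
    if current == "no_report":
        return "V1"
    return f"V{int(current[1:]) + 1}"
```

Calling `next_step` twice on the same state returns the same answer, which is what lets the help skill be repeated safely in any state.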
-### Final tool set (~10)
-| Tool (canonical, verb-first) | Maps to (lineage) | Status |
-|---|---|---|
-| `check_knowledge` | new — list user's documents + metadata | done |
-| `check_data` | `list_sources` + `describe_source` merged (catalog-backed) | done |
-| `retrieve_knowledge` | `retrieve_documents` → `knowledge_retrieve` | done |
-| `retrieve_data` | `query_structured` → `data_retrieve` (tabular: file + DB, both working) | done |
-| `analyze_descriptive` | `src/tools/analytics/descriptive.py` | done |
-| `analyze_aggregate` | `src/tools/analytics/aggregation.py` | done |
-| `analyze_correlation` | `src/tools/analytics/relationship.py` | done |
-| `analyze_trend` | `src/tools/analytics/temporal.py` | done |
-| `problem_statement` | new — interview skill (**Harry**) | Harry |
-| `generate_report` | new — on-demand, versioned | to design |
-| `help` | new — deterministic state guide | to build |
-(`problem_statement` + `help` live at the orchestrator level; `generate_report` is **button-triggered via a dedicated report API**, not chat-routed (decision #8). The TaskRunner registry holds the 4 analytics + 4 data/knowledge tools. Unregister `analyze_comparison`, `analyze_contribution`, `analyze_profile`, `analyze_segment` from the planner-visible registry — keep the modules.)
----
-## 2. Ownership
-### Sofhia
-- [x] 4 analytics tools: trim registry to 4 active, tests still pass after deprioritizing the other four. (`KM-641`, commit `66e2e4d`)
-- [x] Data/knowledge tools: merge `describe_source` into `data_check`, rename `retrieve_documents` → `knowledge_retrieve`, `query_structured` → `data_retrieve`, build `knowledge_check`. (`KM-642` `c38c0c2`, `KM-643` `4bd5f1e`)
-- [ ] Co-design `generate_report` contract with Rifqi (Rifqi owns development, see §3).
-- [x] Tool matrix (see §4).
-### Rifqi
-- [x] **Redis Cloud free tier** (~30–50 MB): create instance, set TTL = 1 h, share env vars in the group. (done 12 Jun)
-- [x] **R5 cache fix**: chat cache key scoped by `user_id`, TTL 24h→1h (urgent on shared Redis). (12 Jun, commit `b701e95`)
-- [x] **AnalysisRecord contract gaps closed**: `stage` (CRISP-DM) now flows Task→TaskResult→TaskSummary so the report can group the method appendix; `AnalysisRecord` gained `record_id`/`analysis_id`/`user_id` identity fields. (12 Jun)
-- [x] **`analysis_records` table + real `AnalysisStore`**: `PostgresAnalysisStore` (save + `list_for_analysis`, never-throw) replaces `NullAnalysisStore`; wired into `ChatHandler`, `user_id` stamped at save. Satisfies the DoD "record persisted" step. (12 Jun)
-- [ ] **Own `generate_report` development — KM-644 "Report Generator"** (contract co-designed with Sofhia, see §3). Button-triggered via a dedicated **report API** (trigger / list versions / fetch); reads `analysis_records` + Problem Statement; persists a versioned report artifact, bumps `report_status`. *(record persistence done above; report API + persistence + renderer + contract doc next)*
-- [x] Verify planner tool list matches the trimmed registry (4 analytics + 4 data/knowledge) and few-shots don't reference removed tools. (verified 12 Jun — no stale tool names in `src/`)
-- ⚠️ **Blocked-on-Harry**: `analysis_id` is `NULL` on persisted records until the Analysis State reaches the slow path — need the session-ID handoff so `generate_report` can group records per analysis.
-### Shared (Sofhia + Rifqi)
-- [ ] `generate_report` design + skeleton: input = AnalysisRecords for the session + Problem Statement from Analysis State; output = versioned artifact; bumps `report_status`. Agree on the contract even if rendering is stubbed for Wednesday. (Development: Rifqi.)
-- [ ] `help` skill: deterministic — read Analysis State, return the next required step. Small, do it together or whoever finishes first.
-- [ ] Tool behavior smoke test end-to-end on an easy case (descriptive/aggregate path), per Harry's ask: "robust tools before agents."
-### Harry (dependencies — not ours, but we block on them)
-- `problem_statement` skill + PS template (incl. increase/decrease target fields).
-- Analysis State class + DB table, frontend analysis-builder step.
-- Merging our PRs (he auto-merges; he clones from latest after).
----
-## 3. Per-tool behavior contract (how to build each one)
-Harry's framing: for every tool, define **goal / trigger / input / process / output**, and behave like a Claude-style skill — if a required argument is missing, respond with a polite feedback message asking for it (e.g. table/column name), never guess silently.
-- **`check_knowledge`** — "what documents do I have?" → list documents with name, type, uploaded-at.
-- **`check_data`** — "what data do I have?" → sources (file + DB) with schema/metadata from the data catalog, created/uploaded timestamps.
-- **`retrieve_knowledge`** — RAG over uploaded documents; returns passages with source attribution.
-- **`retrieve_data`** — query tabular data (file + DB) via QueryIR; output consumable by the `analyze_*` tools.
-- **`analyze_*` (4)** — require valid table/column references; if missing or wrong, return actionable feedback instead of guessing.
-- **`generate_report`** — button-triggered via a dedicated report API (not chat-routed); on-demand only (never auto); post-pass gated; renders from AnalysisRecords + PS; persists a versioned artifact, snapshots record IDs, bumps version. (KM-644, Rifqi.)
-- **`help`** — no LLM; state → next step. Repeating it is fine, that's its job.
----
-## 4. Tool matrix (deliverable for the sync)
-Harry explicitly asked for a matrix covering every tool. Produce one sheet/markdown table with columns:
-`tool | goal | trigger (when the orchestrator calls it) | input | process | output | gated by interview_status? | status (done / in progress / planned)`
-Use the tool set table in §1 as the row list. This doubles as the presentation material on Wednesday.
----
-## 5. Day-by-day
-| Day | Target |
-|---|---|
-| **Thu 11** | Checkpoint meeting + task split with Harry. |
-| **Fri 12 (today)** | ✅ Registry trimmed to 4 analytics + few-shot synced (Sofhia, KM-641). ✅ Tool matrix built. ✅ Redis Cloud + env share (Rifqi). |
-| **Mon 15** | Data/knowledge tools done (`data_check` merge, renames, `knowledge_check`). `generate_report` contract agreed. |
-| **Tue 16** | `help` skill done. `generate_report` skeleton wired to AnalysisRecord. Tool matrix drafted. End-to-end smoke test on the easy path. |
-| **Wed 17 (AM)** | Buffer: fix fallout, finalize matrix, rehearse the demo flow. |
-| **Wed 17 (PM)** | **Sync with Harry.** |
----
-## 6. Open questions to confirm with Harry on Wednesday
-1. **Gate scope.** Proposal: keep the fast path + exploration tools (`check_knowledge`, `check_data`, retrieves, `help`, arguably `descriptive`) available **pre-pass**; gate only the insight tools (correlation, trend, report). Hard-gating everything risks frustrating users who just want to look at their data.
-2. **Who flips `interview_status` to `pass`?** Proposal: a deterministic validator (PS template slots complete + fields cross-checked against the bound catalog) makes the call — the LLM conducts the conversation but never decides the pass. ("Conversational skin, deterministic skeleton.")
-3. **Skills vs spine — one sentence to lock in writing:** *"Skills are registry tools executed by the existing Planner → TaskRunner → Assembler spine; the Analysis State gate is a pre-check in the Orchestrator."* This keeps the new flow and the locked architecture fully compatible.
-4. `generate_report` invocation goes through the same gate (post-pass only) — confirm.
----
-## 7. Definition of done for Wednesday
-- [ ] All team PRs merged; Harry unblocked on the Analysis State class.
-- [ ] Registry exposes exactly 4 analytics + 4 data/knowledge tools, all passing local tests.
-- [ ] Redis Cloud shared and working locally for all three of us (TTL 1 h).
-- [ ] `help` works against a (possibly stubbed) Analysis State.
-- [ ] `generate_report` contract written; skeleton callable.
-- [ ] Tool matrix ready to present.
-- [ ] One end-to-end happy path runs: create analysis (with data) → blocked pre-pass → interview stub passes → descriptive/aggregate answer → record persisted.

DEV_PLAN.md ADDED

	@@ -0,0 +1,160 @@

+# Data Eyond — Current Development Plan (post 2026-06-24 meeting + 2026-06-25 checkpoint)
+**Purpose:** context file for Claude Code sessions working on the current sprint.
+**Branch:** `pr/4` · **Snapshot:** 2026-06-25.
+**Companion:** [REPO_STATUS.md](REPO_STATUS.md) describes the repo's *current built state*; this file
+describes the *in-flight plan* that changes it. New decisions from the 2026-06-25 checkpoint are in
+[§1.5](#15-2026-06-25-checkpoint-deltas).
+---
+## 1. The direction change (locked decisions from 2026-06-24)
+1. **"Problem statement" is replaced by two user-entered fields: `objective` + `business_questions`.**
+   User fills them at onboarding; **both mandatory to submit; NO agent validation.**
+2. The **gate (`problem_validated`) and the `problem_statement` skill/intent are removed** (comment out, don't delete).
+3. **Report is records-based** (reads persisted `AnalysisRecord`s) — **decided and pushed** (KM-674).
+   It is formal markdown: title, date, "generated by {user}", objective, business questions,
+   findings, insights. **NOT gated** on whether business questions were answered.
+4. **`owner_id` → `user_id`** everywhere (Harry mirrors in dedorch/Go).
+5. **State writes go through a request to Go**, not direct Python DB writes.
+6. **FE-callable surface = 4 endpoints:** `call_agent` (chat/stream), `list_skills` (`GET /tools`),
+   **skill: help**, **skill: report**. `problem_statement` removed; `check_data` not FE-facing from
+   Python (Go provides it); analysis CRUD not needed from FE (comment, don't delete).
+7. Deliverables for Harry: (a) API endpoint doc (MD); (b) full Python project doc (MD → PDF/Word BRD).
+8. Integration tested via Swagger `/docs` on the HF Python build (simulating FE manually). Target ~Wed.
+## 1.5. 2026-06-25 checkpoint deltas
+Confirms the 2026-06-24 direction and adds these concrete changes (folded into §4 as tasks 21–28):
+1. **Rename `analysis_records` → `report_inputs`** (DONE #21) — names the table by purpose (the rows
+   report generation reads); avoids clashing with Go's `analyses_messages` and with Langfuse
+   observability. **Stays Python-owned**; finalized schema handed to Harry so his dedorch migration
+   creates it post-`SKIP_INIT_DB` (#22, resolves #16). Write scope = **one row per slow-path analysis
+   run** (decided — not per-agent-call telemetry; that stays Langfuse).
+2. **`analyses` table (Go) — `status`, `data_bind` + `data_bind_version`, `report_collection`** (id+version).
+   **Verified 2026-06-25: these + `user_id` are ALREADY present in dedorch `analyses`.** Plus Harry drops
+   the duplicate/wrong singular `analysis` table. (→ #3)
+3. **`analyses_messages` (Go) = the analysis chat room** (user Q + agent A) — replaces the now-**deprecated**
+   `chat_messages`/`rooms`; Python's chat read/write must migrate here before cutover. (→ #25)
+4. **Reports: Go owns ALL writes.** Report stays a **skill** (no router intent): FE → Go → Python;
+   Python only returns content. Input = the records table (now `report_inputs`); edit-mode may
+   also need the last report. (→ #7/#18/#24)
+5. **Markdown minimum now:** tables, **bold**, *italic*, horizontal separators — optimize that before
+   anything fancier. (→ #23)
+6. **Deferred:** charts (prefer **Plotly→JSON** in a future `chart` table over matplotlib PNGs) and
+   images (image table keyed by analysis/message/report + originals in a bucket). (→ #26/#27)
+7. **Near-term:** the remove-`problem_statement` work isn't on HF yet → **PR + deploy + test in the
+   playground** (#13). Harry stabilizes Go ~Fri; FE manual testing ~Mon. **Keep it playground-able.**
+8. **UI research** (no dedicated UI person): new-analysis form (title/objective/business_questions),
+   knowledge menu (user-level vs analysis-level binding), report artifacts panel + version selector;
+   interview + old analysis UI removed. (→ #28)
+## 2. What is already done (KM-674, pushed on `pr/4`)
+Report layer adapted to the new goal shape:
+- `report/schemas.py::ProblemStatement` → `objective: str` + `business_questions: list[str]`
+  (old `target_value`/`scope`/`metric_direction`/`target_metric` dropped). Class name kept for now
+  (rename to `ReportGoal` once the upstream AnalysisState rename lands).
+- `report/generator.py` renders **Objective** + numbered **Business Questions** + a
+  **"generated by {user}"** line.
+- `api/v1/report.py::_problem_statement_from` is **tolerant**: prefers new `objective` /
+  `business_questions` from state, falls back to legacy `problem_statement` — works before AND after
+  Harry's migration.
+- `config/prompts/report_summary.md` updated to objective + business questions.
+- Report stays **records-based**; the floor gate (`problem_validated`) was deliberately left for task #2.
+**This tolerant-migration pattern (getattr fallback) is the model for tasks #2 and #4.**
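A minimal sketch of that getattr-fallback pattern, assuming the state object may expose either the new fields or the legacy one. `ReportGoal` borrows the planned class name, but `goal_from_state` is a hypothetical helper, and the legacy `problem_statement` being a dict or a plain string is an assumption rather than the actual `_problem_statement_from` implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ReportGoal:
    objective: str = ""
    business_questions: list[str] = field(default_factory=list)


def goal_from_state(state) -> ReportGoal:
    """Prefer the new fields; fall back to the legacy one; never raise."""
    objective = getattr(state, "objective", None)
    questions = getattr(state, "business_questions", None)
    if objective or questions:
        return ReportGoal(
            objective=objective or "",
            business_questions=list(questions or []),
        )

    # Legacy shape (assumed): a dict with the same keys, or free text.
    legacy = getattr(state, "problem_statement", None)
    if isinstance(legacy, dict):
        return ReportGoal(
            objective=str(legacy.get("objective", "")),
            business_questions=list(legacy.get("business_questions", [])),
        )
    if legacy:
        return ReportGoal(objective=str(legacy))
    return ReportGoal()
```

Because every attribute read goes through `getattr` with a default, the same function keeps working whether the upstream migration has landed or not.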
+## 3. Assessment — gaps & contradictions to resolve before building
+These came out of reviewing the plan against the actual code. They are folded into the task table (§4) as tasks 15–19.
+- **G1 (→ task 15). Records-based reports need the slow path ON.** `AnalysisRecord`s persist only in
+  `chat_handler._run_slow_path`, which runs only when `ENABLE_SLOW_PATH=true`. Default is off → no
+  records → `POST /report` 409s. The Swagger demo can't show a non-empty report unless slow path is
+  flipped on and a `structured_flow` question is run first. `BusinessContext` is still a stub but the
+  slow path runs fine on it.
+- **G2 (→ task 16). `analysis_records` ownership is now required and collides with `SKIP_INIT_DB`.**
+  It's created today by Python `create_all` (`db/postgres/init_db.py`), is in no dedorch/Go migration,
+  and after the dedorch cutover (`SKIP_INIT_DB=true`) Python stops running `create_all` → the table
+  won't exist → reports break. Decide: dedorch migration (Harry) OR a Python carve-out that creates
+  just this one table even under `SKIP_INIT_DB`. *(Resolved 2026-06-25 — see §1.5.1 / #16 / #22.)*
+- **G3 (→ task 17). `chat_history` in the report contract is vestigial.** Records-based generation
+  reads records by `analysis_id`; it never uses chat history. Drop `chat_history` from the report
+  skill contract, or mark it reserved/unused.
+- **G4 (→ task 4 note). Make #2/#4 tolerant of both state shapes.** If Harry drops
+  `problem_validated`/`owner_id` from dedorch before Python stops reading them, Python's gate +
+  state_store break. Use the same `getattr` tolerance KM-674 used. The `owner_id`→`user_id` rename
+  also touches `api/v1/analysis.py` (`_serialize_state`, `list_analyses`, `get_analysis`), not just
+  the model + state_store.
+- **G5 (→ task 18). "State writes via Go" is bigger than `report_id`.** Python still writes state in
+  `/analysis/create` (state + room + bindings, plus the data-first gate and soon the mandatory-field
+  check) and in `state_store.ensure` per turn. If creation moves to Go (consistent with commenting
+  analysis CRUD), then Go owns ALL state writes + both creation gates, and Python's `ensure` must
+  become a **read-only get** (Go must guarantee the row exists before any chat turn).
+- **G6 (smaller).**
+  - Removing `problem_statement` (task 1) means neutering it in 4 places: the `Intent` literal
+    (`agents/orchestration.py`), the router prompt (`config/prompts/intent_router.md`), the handler,
+    and the gate's redirect *target*. Do it with task 2.
+  - "generated by {user}" currently prints the raw `user_id`; a formal report wants a name — source
+    from `users.fullname` or have Go pass a display name (task 19).
+  - The meeting's outline (background / EDA / insights) isn't fully in the renderer; map those
+    sections onto the record fields deliberately (task 5 follow-up).
+  - The full project doc (task 11) should reuse [REPO_STATUS.md](REPO_STATUS.md), not restart.
+## 4. Task table
+Status legend: ⬜ not started · 🔄 in progress · ✅ done · ⛔ blocked · 🔎 verify · ⏸️ deferred.
+| # | Task | Owner | Status | Note |
+|---|---|---|---|---|
+| 1 | Comment out `problem_statement` skill **+ `Intent` literal + router prompt + gate redirect target**; remove `/problem-statement` from `list_tools` | Rifqi | ✅ | Done 2026-06-25 (one commit w/ #2). Unwired in `orchestration.py`, `intent_router.md`, `chat_handler.py`, `tools.py`; `problem_statement.py` kept intact |
+| 2 | Drop `problem_validated`: gate neutered; `is_report_ready`/`report_floor` → **≥1 completed analysis** only, no-LLM | Rifqi | ✅ | Done 2026-06-25. `gate.py` no-op, gate call site commented in `chat_handler.py`, `report_floor` drops the goal check. Tests updated (`test_gate`/`test_chat_handler`/`test_readiness`). Suite: **284 passed, 7 skipped**; ruff clean |
+| 3 | dedorch `analyses` migration: drop `problem_statement`/`problem_validated`, add `objective` + `business_questions` | Harry | 🔄 | **Verified dedorch 2026-06-25:** `analyses` (plural) ALREADY has `user_id` + `status` + `data_bind` + `data_bind_version` + `report_collection` → those parts done. **Remaining:** drop `problem_statement`/`problem_validated` + add `objective`/`business_questions`. Singular `analysis` = deprecated duplicate to drop |
+| 4 | Update Python `analyses` model + `state_store` + `analysis.py` to match dedorch; `owner_id`→`user_id` | Rifqi/Sofhia | ✅ | Done 2026-06-26. `owner_id`→`user_id` + added `status`/`data_bind`/`data_bind_version`/`report_collection` (DB-only, not in the `AnalysisState` pydantic) across `models.py`/`gate.py`/`state_store.py`/`analysis.py` + 3 local tests; also `report_inputs` `id`/`analysis_id` → `uuid`. Kept `problem_statement`/`problem_validated`; `objective`/`business_questions` wait on Harry's #3. Suite **284 passed** |
+| 5 | Report generator → `objective`+`business_questions`, "generated by {user}", formal outline | Sofhia | ✅ | Goal-shape (KM-674) + author name (#19) + outline (KM-680): Objective → Business Questions → Executive Summary → Key Findings → EDA → Notes & Limitations → How This Was Analyzed |
+| 6 | Report skill input contract: `analysis_id` + `user_id` (no `chat_history`) | Sofhia/Rifqi | ✅ | No-op: `POST /report` already takes only analysis_id + user_id (records-based). Documented in API_ENDPOINTS.md §5. *(Edit-mode input revisited in #24.)* |
+| 7 | `report_id` state update via request to Go, not direct DB | Sofhia + Harry | ⬜ | Needs Go endpoint. **Checkpoint:** Go owns ALL `reports` writes; Python stops any direct insert/update and only returns content; report stays a **skill** (no intent). See #18 |
+| 8 | Expose/confirm 4 FE endpoints; comment `check_data` + analysis CRUD | Sofhia | ✅ | KM-678: `list_tools` trimmed to `/help` + `/report` (analytics/check/retrieve commented in the **menu**). `help` confirmed as a `call_agent` intent — no own endpoint. Analysis CRUD endpoint left **registered**: "comment the rest" was about the FE slash menu, not killing HTTP routes Go needs |
+| 9 | Verify `analysis_id` in `call_agent` contract | Sofhia | ✅ | Verified: no separate field — carried as `room_id` (`analysis_id == room_id`), per REPO_STATUS §4/§11. Action for Go: send the id as `room_id` |
+| 10 | API endpoint doc (MD), 4 endpoints, for Go integration | Rifqi + Sofhia | ✅ | Done 2026-06-25 — `API_ENDPOINTS.md` (repo root). 4 FE surfaces with request/response **examples** (chat SSE transcript, report 201/409 JSON, version list), schemas, §9 full 32-route inventory + task-8 reading |
+| 11 | Full Python project doc (MD → PDF/Word BRD) | Rifqi | ✅ | Done 2026-06-26 — `PROJECT_BRD.md` (repo root): purpose/context, FR-1..9 capabilities, lifecycle, architecture, data model, API (→ API_ENDPOINTS), NFRs, integrations, open items. Reuses REPO_STATUS/API_ENDPOINTS; convert to PDF/Word for distribution |
+| 12 | Reconcile/open the `list_tools` PR cleanly (stacked commits) | Rifqi | ✅ | N/A — we develop directly on the single active branch `pr/4` (KM-652 + KM-678 already stacked there); no separate PR to reconcile |
+| 13 | Deploy HF Python build (remove-`problem_statement` work) → test 4 endpoints via Swagger / playground | Sofhia + Harry | 🔄 | **Unblocked (#15 ✅).** Remove-PS work is on `pr/4` but **not on HF `main` yet** → PR + deploy, then manual test. Harry stabilizes Go ~Fri; FE testing ~Mon |
+| 14 | `analysis_records` home | Rifqi + Sofhia + lead | ✅ | **Resolved 2026-06-25:** stays Python-owned, **renamed** (→ #21); schema handed to Harry so the dedorch migration creates it post-cutover (→ #22). Not moved to Go |
+| 15 | Flip `ENABLE_SLOW_PATH=true` + verify an `AnalysisRecord` persists from a `structured_flow` question | Rifqi | ✅ | Verified locally 2026-06-25 (in-process). structured_flow on Titanic.csv → 3-task plan `check_data→retrieve_data→analyze_aggregate` (all success) → AnalysisRecord persisted (substantive) → `report_floor` pass → report generates (201). HF env-flip + Swagger run folds into #13 |
+| 16 | Decide `analysis_records` creation under `SKIP_INIT_DB` | Rifqi + Harry | ✅ | **Resolved 2026-06-25:** Python defines it; **Harry's dedorch migration creates it** on env-move (Python still creates locally meanwhile) → exists post-cutover. Execution = #22 |
+| 17 | Reconcile report contract with records-based: remove/flag `chat_history` | Sofhia/Rifqi | ✅ | Nothing to remove — `chat_history` was never in the report contract/code (only in help.md). Confirmed via grep; API_ENDPOINTS.md §5 documents the clean contract |
+| 18 | Confirm Go owns ALL analysis-state writes + both creation gates; make Python `state_store.ensure` read-only | Rifqi + Harry | ⬜ | **Confirmed by 2026-06-25 checkpoint** (Python read-only; Go owns writes + new tables). Execution pending Go endpoints |
+| 19 | Decide report author display-name source (`users.fullname` vs Go-passed name) | Sofhia | ✅ | Done 2026-06-25. `AnalysisReport.user_name`; `generator` renders `user_name or user_id`; `api/v1/report.py::_resolve_user_name` reads `users.fullname` never-throw (fallback `user_id`). Decided: resolve in Python (unblocked); swap to Go-passed name later if preferred |
+| 20 | **Help handoff:** update `handlers/help.py` + `help.md` — drop the `problem_validated` tier + `define_problem_statement` action (the skill it points at is gone as of #1) | Sofhia | ✅ | Done 2026-06-25. `help.py`: actions = `ask_analysis_question` (always) + `generate_report` (if ready); renders objective/business_questions (getattr-tolerant). `help.md` v1→v2: 3 tiers, no `/problem_statement`, `/generate report`→`/report`. Local test_help updated → 11 pass |
+| 21 | Rename `analysis_records` → **`report_inputs`** (table, ORM `ReportInputRow`, store `*ReportInputStore`) | Rifqi | ✅ | Done 2026-06-26. `sed` rename across 9 files; Pydantic `AnalysisRecord` kept; columns stay String (pure rename — uuid+FK is the #22 Harry schema). Name `report_inputs` (purpose; avoids Langfuse/`analyses_messages` clash). Write scope = one row per slow-path run. Suite **284 passed** |
+| 22 | Finalize `report_inputs` schema → hand to Harry for the dedorch migration | Rifqi → Harry | 🔄 | **DDL ready** (uuid `id`/`analysis_id` + FK→`analyses(id)`; `user_id`/`plan_id` text; `data` jsonb = serialized `AnalysisRecord`, shape documented). dedorch has empty `analysis_records` → rename. Resolves #16. **Action: send Harry the DDL + `data` shape** |
+| 23 | Report markdown formatting: tables, **bold**, *italic*, horizontal separators | Sofhia | ✅ | Done 2026-06-25. Added `---` separators between header + each section in `_render_markdown`. Tables (EDA) / bold (method labels) / italic (meta + citations) already emitted. Relaxed `report_summary.md` to allow inline `**bold**`/`*italic*` for emphasis (kept no-headings/no-bullets so it doesn't duplicate the section structure / Key Findings). Compile + ruff clean |
+| 24 | Clarify report input contract: records table (+ `last_report` for edit mode?) | Rifqi/Sofhia ↔ Harry | ⬜ new | Edit-mode input left open at the checkpoint |
+| 25 | Migrate Python chat path to Go `analyses_messages` (+ `analyses`) | Rifqi ↔ Harry | ⬜ | **Bigger than "confirm" (verified 2026-06-25):** dedorch `rooms` + `chat_messages` are **deprecated** (`zdeprecated_*`). Python's `Room`/`ChatMessage` models + `chat.py` `load_history`/`save_messages` target them → **break post-cutover**. Move history read/write to `analyses_messages` before the conn-string cutover |
+| 26 | **Charts (DEFERRED):** store Plotly JSON in a future `chart` table (not matplotlib PNG) | — | ⏸️ | After the markdown path is done end-to-end |
+| 27 | **Images (DEFERRED):** image table (id, analysis_id, msg/report ref, order) + originals in a bucket | — | ⏸️ | Maintenance-heavy; parked |
+| 28 | **UI research** (FE): new-analysis form, knowledge menu (user vs analysis level), report artifacts + version selector | Team | ⬜ new | No dedicated UI person; interview + old analysis UI removed |
+## 5. Critical path & sequencing
+- **Critical path:** #22 (send Harry the `report_inputs` schema). HF deploy (#13) for the playground. (#4 ✅, #21 ✅; Harry's #3 no longer blocks us — Python is getattr-tolerant.)
+- **Parallelizable now:** #22 (handoff). (#4 ✅, #11 ✅ done.)
+- **Harry-blocked / coordinated:** #3 (now 🔄; no longer blocks #4, which is ✅ because Python is getattr-tolerant), #7 (Go endpoint), #18 (Go state ownership), #24 (contract). **#25 = chat-path migration to `analyses_messages` — a cutover blocker.**
+- **Demo gate (playground, #13):** deploy the remove-`problem_statement` work to HF — slow path (#15 ✅)
+  and the report path are verified locally, and #16 is resolved (#22 hands Harry the schema). **Keep it
+  playground-able.**
+## 6. Decisions still open (need the team / Harry / lead)
+- ~~`analysis_records`: dedorch-owned vs Python-owned (#16/#14).~~ RESOLVED: Python-owned + renamed **`report_inputs`** (#21 done); Harry's migration creates it (#22).
+- ~~Whether `help` is its own endpoint or via `call_agent` (#8).~~ RESOLVED: `help` is a `call_agent` intent (no own endpoint).
+- ~~Author display-name source for the report (#19).~~ RESOLVED: Python resolves `users.fullname` (fallback `user_id`); swap to a Go-passed name later if preferred.
+- ~~Keep vs drop `chat_history` in the report contract (#17).~~ RESOLVED: never in the contract; report is records-based (analysis_id + user_id only).
+- Confirm Go takes over analysis creation + both creation gates (data-first + mandatory fields) (#18).
+- **Report input for edit mode** — does Python need the last report content? (#24)
+- ~~`report_inputs` write scope — every agent call vs slow-path-only? (#21)~~ RESOLVED: one row per slow-path run (telemetry stays Langfuse).
+- **Python history source** — confirm Go's `analyses_messages` (#25).
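
Item #23 in the backlog above describes placing `---` separators between the report header and each section. A minimal sketch of that joining step follows. It assumes a hypothetical `render_markdown(header, sections)` helper; the real `_render_markdown` signature is not shown in this diff, so every name here is illustrative only.

```python
def render_markdown(header: str, sections: list[tuple[str, str]]) -> str:
    """Join a report header and its sections with `---` horizontal rules.

    Hypothetical helper for illustration; not the repo's `_render_markdown`.
    """
    blocks = [header.strip()]
    for title, body in sections:
        blocks.append(f"## {title}\n\n{body.strip()}")
    # Blank lines around `---` keep it a thematic break; placed directly
    # under a paragraph line it would be read as a setext heading underline.
    return "\n\n---\n\n".join(blocks) + "\n"


report = render_markdown(
    "# Sales analysis\n\n*Author: Jane Doe*",
    [("Key Findings", "**Revenue** grew."), ("Method", "*Linear regression.*")],
)
print(report)
```

Joining pre-rendered blocks keeps the separator logic in one place, so adding or removing a section cannot leave a dangling rule at the top or bottom of the report.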

PHASE1_TO_PHASE2_REPORT.md DELETED

@@ -1,273 +0,0 @@
-# Phase 1 → Phase 2 Migration Report
-A walkthrough of what changed between the original retrieval-style backend (Phase 1) and the current catalog-driven backend (Phase 2). Intended as a hand-off for the lead.
----
-## 1. The conceptual change
-**Phase 1** was a single retrieval-style RAG pipeline. Every question — whether it pointed at a database, a spreadsheet, or a PDF — went through the same primitive: **chunk + embed + top-K** over PGVector. Schema and tabular columns were embedded as chunks and ranked alongside prose. When the question needed SQL, the LLM **wrote the SQL string directly** (via `query_executor`).
-**Phase 2** splits the system into two paths governed by an LLM router:
-| Path | Primitive | Why |
-|---|---|---|
-| Unstructured (PDF / DOCX / TXT) | Dense similarity over prose chunks (PGVector) | Right primitive for free text |
-| Structured (DB / CSV / XLSX / Parquet) | **Per-user data catalog** → LLM emits a **JSON IR** of intent → deterministic **compiler** → **executor** (SQL or pandas) | A column lookup shouldn't go through a similarity ranking lottery; the LLM emits intent, never SQL syntax |
-Three explicit LLM call sites only:
-1. **Intent router** (classifies the user message into `chat` / `unstructured` / `structured`)
-2. **Query planner** (turns the question + catalog into a Pydantic-validated `QueryIR`)
-3. **Chatbot agent** (formats the final answer, streamed over SSE)
-Everything else — IR validation, SQL/pandas compilation, execution — is deterministic Python.
----
-## 2. File-by-file changes
-### 2.1 Deleted (Phase 1 only)
-| Phase 1 path | Reason it was removed |
-|---|---|
-| `src/rag/base.py`, `src/rag/retriever.py`, `src/rag/router.py` | Replaced by `src/retrieval/` |
-| `src/rag/retrievers/baseline.py`, `schema.py`, `document.py` | Schema retrieval gone (catalog replaces it); document retriever rewritten in `src/retrieval/document.py` |
-| `src/tools/search.py` (whole `tools/` folder) | Only consumer was `rag/router.py` |
-| `src/query/base.py` | Duplicate of `query/executor/base.py` |
-| `src/query/query_executor.py` | Replaced by `src/query/service.py` |
-| `src/query/executors/db_executor.py` | Replaced by `src/query/executor/db.py` |
-| `src/query/executors/tabular.py` | Replaced by `src/query/executor/tabular.py` |
-| `src/agents/chatbot.py` (Phase 1 LangChain chatbot) | Phase 2 `ChatbotAgent` lives at the same path now — see §2.2 |
-| `src/api/v1/knowledge.py` | Fake `/knowledge/rebuild` endpoint, never wired |
-| `src/config/agents/system_prompt.md`, `guardrails_prompt.md` | Replaced by `src/config/prompts/{chatbot_system,guardrails}.md` |
-| `src/models/structured_output.py` (`IntentClassification`) | Replaced by `IntentRouterDecision` Pydantic model inside `agents/orchestration.py` |
-| `src/models/sql_query.py` | LLM no longer emits SQL; IR replaces it |
-| `src/pipeline/orchestrator.py` (empty stub) | Redundant — `StructuredPipeline` takes the introspector at `run()` time |
-### 2.2 Renamed / moved (same role, new home)
-| Phase 1 location | Phase 2 location | Notes |
-|---|---|---|
-| `src/agents/chatbot.py` (Phase 1) → deleted, then `src/agents/answer_agent.py` (`AnswerAgent`) → renamed | `src/agents/chatbot.py::ChatbotAgent` | Final answer formation; streams via `astream` |
-| `src/knowledge/parquet_service.py` | `src/storage/parquet.py` | Parquet upload/download helper |
-| `src/pipeline/document_pipeline/document_pipeline.py` (folder) | `src/pipeline/document_pipeline.py` (flat) | Single module |
-| `src/rag/retrievers/document.py` | `src/retrieval/document.py` | `DocumentRetriever` migrated; tabular file types filtered out of results. **Post-report update (mentor commit 61c746f, 2026-05-20):** rewritten to raw SQL (pgvector `<=>` cosine, `<+>` manhattan only) to dodge asyncpg type-mapping issues with the Go-ingested schema. MMR / euclidean / inner_product dropped. |
-| `src/rag/router.py` | `src/retrieval/router.py` | `RetrievalRouter`, Redis-cached, unstructured-only; dead `db: AsyncSession` + `source_hint` params removed |
-| `src/rag/base.py` (`RetrievalResult`, `BaseRetriever`) | `src/retrieval/base.py` | Same dataclass + ABC |
-> **Heads-up on the intent router**: the Phase 1 file `src/agents/orchestration.py` and its class `OrchestratorAgent` were **kept in place** for Phase 2 — but the body was fully rewritten. The class now emits `IntentRouterDecision(needs_search, source_hint ∈ {chat, unstructured, structured}, rewritten_query)`. The prompt file and test file use the `intent_router` name (`config/prompts/intent_router.md`, `tests/agents/test_intent_router.py`), but **the source module is still `orchestration.py` and the class is still `OrchestratorAgent`**. Existing imports continue to work; only the behavior changed.
-### 2.3 Added (Phase 2 new)
-**Catalog subsystem (whole new concept)**
-| Path | Role |
-|---|---|
-| `src/catalog/models.py` | Pydantic: `Catalog → Source[] → Table[] → Column[]`, `ForeignKey`, `ColumnStats.top_values` |
-| `src/catalog/introspect/base.py` | `BaseIntrospector` ABC |
-| `src/catalog/introspect/database.py` | DB introspector — wraps Phase 1 `db_pipeline/extractor.py` (`get_schema`, `profile_column`, `get_row_count`) |
-| `src/catalog/introspect/tabular.py` | CSV / XLSX / Parquet introspector — one `Table` per XLSX sheet |
-| `src/catalog/render.py` | Renders a `Source` for the planner prompt |
-| `src/catalog/validator.py` | Unique-ID + foreign-key-ref invariants |
-| `src/catalog/store.py` | Postgres `jsonb` upsert keyed by `user_id` (table `data_catalog`) |
-| `src/catalog/reader.py` | Loads + filters catalog by `source_hint` |
-| `src/catalog/pii_detector.py` | Flags PII columns at ingestion → suppresses `sample_values` |
-| `src/security/pii_patterns.py` | Name patterns + value regex used by the detector |
-**JSON IR + query subsystem**
-| Path | Role |
-|---|---|
-| `src/query/ir/models.py` | `QueryIR` Pydantic schema |
-| `src/query/ir/operators.py` | `ALLOWED_FILTER_OPS`, `ALLOWED_AGG_FNS`, `LIMIT_HARD_CAP`, `TYPE_COMPATIBILITY` |
-| `src/query/ir/validator.py` | Catalog-aware IR validation (rejects unknown column ids, bad ops, type mismatches, oversize limits) |
-| `src/query/planner/service.py` | `QueryPlannerService.plan(question, catalog, previous_error)` — Azure OpenAI structured output → `QueryIR` |
-| `src/query/planner/prompt.py` | Builds the planner prompt from catalog text |
-| `src/query/compiler/base.py` | Compiler ABC |
-| `src/query/compiler/sql.py` | `SqlCompiler` (Postgres) — all 12 filter ops, params as a dict |
-| `src/query/compiler/pandas.py` | `PandasCompiler` — returns `CompiledPandas(apply, output_columns)` |
-| `src/query/executor/base.py` | `BaseExecutor` + `QueryResult` |
-| `src/query/executor/db.py` | `DbExecutor` — sqlglot SELECT-only guard, RO txn, 30 s `statement_timeout`, 10 k row cap |
-| `src/query/executor/tabular.py` | `TabularExecutor` — Parquet via blob, `asyncio.to_thread`, 10 k cap |
-| `src/query/executor/dispatcher.py` | `ExecutorDispatcher.pick(ir)` — picks by `source.source_type` |
-| `src/query/service.py` | `QueryService.run(user_id, question, catalog)` — plan → validate → retry (max 3) → dispatch → execute |
-**Agents**
-| Path | Role |
-|---|---|
-| `src/agents/orchestration.py` | `OrchestratorAgent` — Phase 1 file/class name preserved; Phase 2 body. Emits `IntentRouterDecision` |
-| `src/agents/chatbot.py` | `ChatbotAgent` — formerly `AnswerAgent` in `agents/answer_agent.py`; renamed in Cleanup PR |
-| `src/agents/chat_handler.py` | `ChatHandler.handle(...)` — top-level orchestrator; yields `intent` / `chunk` / `done` / `error` SSE events |
-**Pipelines & API**
-| Path | Role |
-|---|---|
-| `src/pipeline/structured_pipeline.py` | DB / tabular ingestion: introspect → merge → validate → upsert |
-| `src/pipeline/triggers.py` | `on_db_registered`, `on_tabular_uploaded`, `on_document_uploaded`, `on_catalog_rebuild_requested` |
-| `src/api/v1/data_catalog.py` | `GET /api/v1/data-catalog/{user_id}` + `POST /api/v1/data-catalog/rebuild` |
-| `src/models/api/catalog.py` | Catalog request/response models |
-| `src/config/prompts/intent_router.md`, `query_planner.md`, `chatbot_system.md`, `guardrails.md` | New prompts. `guardrails.md` is appended to `chatbot_system.md` at load time |
-| `src/db/postgres/models.py` (added `Catalog` SQLAlchemy class) | Stores the per-user jsonb document in `data_catalog` |
-### 2.4 Rewired API endpoints
-| Endpoint | Phase 1 wiring | Phase 2 wiring |
-|---|---|---|
-| `POST /api/v1/chat/stream` | Inline in `chat.py`: `OrchestratorAgent` → `retriever` → `query_executor` → `chatbot` | Delegates to `ChatHandler.handle()`. Redis cache, fast intent, history load, and message persistence stay in the endpoint |
-| `POST /api/v1/database-clients/{id}/ingest` | Called `db_pipeline_service.run()` and dual-wrote vectors | Calls **only** `on_db_registered` (catalog build). Failure → HTTP 500 |
-| `POST /api/v1/document/process` | Always pushed to vector store | PDF/DOCX/TXT → `knowledge_processor` (vectors); CSV/XLSX → `on_tabular_uploaded` (catalog only, **no vector embedding**) |
-| `POST /api/v1/document/upload` | Storage + DB row | Same, plus `on_document_uploaded` trigger |
-| `POST /api/v1/data-catalog/rebuild` | — | New: iterates all sources, re-runs per-source trigger |
-| `GET /api/v1/data-catalog/{user_id}` | — | New: returns `list[CatalogIndexEntry]` |
-### 2.5 Phase 1 files still in production use
-These were **not rewritten** — Phase 2 imports them directly:
-- `src/database_client/database_client_service.py`
-- `src/utils/db_credential_encryption.py` (`decrypt_credentials_dict`) — `src/security/credentials.py` is still a stub
-- `src/pipeline/db_pipeline/db_pipeline_service.py` (`engine_scope` context manager — used by both the introspector and `DbExecutor`)
-- `src/pipeline/db_pipeline/extractor.py` (`get_schema`, `profile_column`, `get_row_count`)
-- `src/knowledge/processing_service.py` (PDF / DOCX / TXT extraction + embedding)
-- `src/db/postgres/{connection,init_db,vector_store}.py`, `src/storage/az_blob/`, `src/middlewares/`, `src/security/auth.py`
----
-## 3. End-to-end flow (current state)
-### 3.1 Ingestion
-```
-User action                Pipeline                                Storage
-──────────────             ────────────────────────────             ─────────────────
-upload PDF/DOCX/TXT  →   DocumentPipeline                       →  Azure Blob + PGVector
-                          (extract → chunk → embed)                 (table: langchain_pg_embedding)
-                          + on_document_uploaded                    + retrieval cache invalidate
-upload CSV/XLSX     →    TabularIntrospector                   →  Azure Blob (Parquet)
-                          (sheets / columns + sample + stats)       + data_catalog jsonb row
-                          → CatalogValidator → CatalogStore         (NO vector store — catalog only)
-                          via on_tabular_uploaded
-register DB         →    DatabaseIntrospector                  →  data_catalog jsonb row
-                          (information_schema + sample + FKs)
-                          → validate → store
-                          via on_db_registered
-```
-### 3.2 Query (per user message → SSE stream)
-```
-POST /api/v1/chat/stream
-        │
-        ├── Redis cache check (24h TTL) — hit returns cached stream
-        ├── _fast_intent (greetings / goodbyes) — bypass LLM
-        ├── load history from chat_messages
-        │
-        └── ChatHandler.handle(message, user_id, history)         [src/agents/chat_handler.py]
-                │
-                ├─ OrchestratorAgent.classify()                    [agents/orchestration.py]
-                │     → needs_search, source_hint, rewritten_query
-                │
-                ├── source_hint == "chat"
-                │     → ChatbotAgent.astream() → yield chunk events
-                │
-                ├── source_hint == "unstructured"
-                │     → RetrievalRouter.retrieve()                 [retrieval/router.py, Redis-cached]
-                │         → DocumentRetriever (raw SQL: pgvector `<=>` cosine or `<+>` manhattan)
-                │     → ChatbotAgent.astream(chunks=...)
-                │
-                └── source_hint == "structured"
-                      → CatalogReader.read(user_id, "structured")  [catalog/reader.py]
-                      → QueryService.run(user_id, question, catalog)   [query/service.py]
-                           │
-                           ├─ QueryPlannerService.plan(...)        [query/planner/service.py]
-                           │    LLM(catalog, question, prev_error?) → QueryIR
-                           │
-                           ├─ IRValidator.validate(ir, catalog)    [query/ir/validator.py]
-                           │    fail → loop back to planner with error context (max 3)
-                           │
-                           ├─ ExecutorDispatcher.pick(ir)          [query/executor/dispatcher.py]
-                           │    schema source  → DbExecutor
-                           │    tabular source → TabularExecutor
-                           │
-                           ├─ DbExecutor.run(ir):                  [query/executor/db.py]
-                           │    SqlCompiler → (sql, params)
-                           │      → sqlglot SELECT-only guard
-                           │      → engine_scope (Phase 1 utility) in asyncio.to_thread
-                           │      → RO txn + statement_timeout=30s + 10k cap
-                           │
-                           ├─ TabularExecutor.run(ir):             [query/executor/tabular.py]
-                           │    resolve Parquet blob path
-                           │      → download → PandasCompiler.apply(df)
-                           │      → asyncio.to_thread → 10k cap
-                           │
-                           └─ QueryResult { rows, columns, row_count,
-                                            truncated, source_id, error?, elapsed_ms }
-                      →
-                      ChatbotAgent.astream(query_result=...)
-                            → yield chunk events
-                │
-                └── final events: done / error
-        │
-        └── persist user + assistant messages to chat_messages
-        └── populate Redis cache
-```
-**Safety invariants for the structured path** (read-only at every layer):
-1. IR validated against the catalog before reaching the compiler
-2. Identifiers come from the catalog (trusted; inlined as quoted identifiers)
-3. Values from `IR.filters` are always parameterized
-4. Compiler is deterministic — no LLM in the hot path
-5. sqlglot rejects anything that isn't a pure SELECT
-6. DB connection is read-only with a 30 s `statement_timeout`
-7. Hard 10 000 row cap on both executors; neither raises — errors go in `QueryResult.error`
----
-## 4. Summary table for review
-| Concern | Phase 1 — where it lived | Phase 2 — where it lives | Change type |
-|---|---|---|---|
-| Intent classification | `agents/orchestration.py::OrchestratorAgent` (free-text intent) | **Same path + same class name** — body rewritten to emit `IntentRouterDecision` | Body rewrite only |
-| Top-level chat orchestration | Inline in `api/v1/chat.py` | `agents/chat_handler.py::ChatHandler` | Extracted to a reusable module |
-| Final answer formation | `agents/chatbot.py` (Phase 1 LangChain) | `agents/chatbot.py::ChatbotAgent` (was `AnswerAgent` in `answer_agent.py` mid-cycle) | Rewritten + renamed |
-| Schema retrieval (DB / tabular) | `rag/retrievers/schema.py` + PGVector chunks | **Removed**. Replaced by catalog (`catalog/store.py` jsonb) loaded verbatim into planner prompt | Whole concept replaced |
-| Doc retrieval (PDF / DOCX / TXT) | `rag/retrievers/document.py`, `rag/router.py` | `retrieval/document.py`, `retrieval/router.py` | Moved; Redis cache restored; tabular files filtered. **Post-report update:** rewritten to raw SQL (cosine / manhattan only); collection renamed `document_embeddings` → `documents` to match the Go ingestion service. |
-| Query writing | `query/query_executor.py` + `models/sql_query.py` (LLM writes SQL) | `query/planner/service.py` (LLM writes IR) + `query/compiler/sql.py` (deterministic) | LLM emits intent, not SQL |
-| DB execution | `query/executors/db_executor.py` | `query/executor/db.py::DbExecutor` | Folder renamed (`executors` → `executor`); sqlglot guard + RO txn + 30 s timeout kept |
-| Tabular execution | `query/executors/tabular.py` | `query/executor/tabular.py::TabularExecutor` | Parquet-only; pandas compiler split out |
-| Executor selection | Hard-coded in `query_executor.py` | `query/executor/dispatcher.py::ExecutorDispatcher` | New; routes by `source.source_type` |
-| Catalog (NEW) | — | `catalog/` (models, introspect/, validator, store, reader, pii_detector, render) | New subsystem |
-| Catalog persistence | (data was embedded in PGVector) | Postgres jsonb table `data_catalog`, keyed by `user_id` | New table |
-| Ingestion triggers | Inline in API endpoints | `pipeline/triggers.py` (`on_db_registered`, `on_tabular_uploaded`, `on_document_uploaded`, `on_catalog_rebuild_requested`) | Centralized event entry points |
-| Structured pipeline | `pipeline/db_pipeline/db_pipeline_service.py` (still present for `engine_scope` + extractor reuse) | `pipeline/structured_pipeline.py` (orchestrator) — reuses Phase 1 extractor | New orchestrator wraps Phase 1 introspection helpers |
-| Document pipeline | `pipeline/document_pipeline/document_pipeline.py` (folder) | `pipeline/document_pipeline.py` (file) | Flattened; CSV / XLSX now skip the vector store |
-| Parquet helper | `knowledge/parquet_service.py` | `storage/parquet.py` | Moved into `storage/` |
-| Prompts | `config/agents/system_prompt.md`, `guardrails_prompt.md` | `config/prompts/{intent_router,query_planner,chatbot_system,guardrails}.md` | Folder renamed; split into four files; guardrails appended to `chatbot_system` at load |
-| PII detection | — | `catalog/pii_detector.py` + `security/pii_patterns.py` | New. Columns flagged `pii_flag=true` get `sample_values: null` so PII never enters prompts |
-| Chat endpoint | `api/v1/chat.py` (does everything inline) | `api/v1/chat.py` (cache + history + persistence) → delegates to `ChatHandler` | Slimmed; SSE event shape is `intent` / `chunk` / `done` / `error` |
-| DB ingest endpoint | `api/v1/db_client.py::ingest` (Phase 1 `db_pipeline_service.run()`) | `api/v1/db_client.py::ingest` (calls `on_db_registered` only) | Phase 1 dual-write removed |
-| Document process endpoint | `api/v1/document.py::process` (always vectorize) | `api/v1/document.py::process` (PDF/DOCX/TXT → vectors; CSV/XLSX → catalog via `on_tabular_uploaded`) | Routing by file type |
-| Catalog management API | — | `api/v1/data_catalog.py` (GET index + POST rebuild) | New |
-**Bottom line.** Every Phase 1 file under `src/rag/`, `src/tools/`, `src/query/executors/`, `src/query/query_executor.py`, `src/query/base.py`, `src/api/v1/knowledge.py`, and `src/config/agents/` is gone. Phase 1 introspection helpers under `src/pipeline/db_pipeline/` and `src/database_client/` are still imported by Phase 2 — they were not rewritten, just wrapped. The three LLM call sites are now explicit and the SQL-writing one no longer exists; the planner emits a Pydantic-validated `QueryIR` instead.
-The one filename gotcha to remember: the **intent router** still lives at `src/agents/orchestration.py` as class `OrchestratorAgent` (Phase 1 name kept for import-site compatibility, Phase 2 body). The matching prompt and tests use the `intent_router` name, but the source module does not.
----
-## 5. Addendum — post-report changes (2026-05-20, mentor commit `61c746f`)
-This report was originally written as a snapshot at Phase 2 completion. The Phase 2 architecture itself hasn't changed, but a few implementation details have shifted as the Go migration progresses. Captured here so the report stays trustworthy:
-- **Doc ingestion is now a Go service.** PDF/DOCX/TXT chunking + embedding + writes into PGVector are no longer done by Python. The Python service reads only.
-- **PGVector collection renamed:** `document_embeddings` → `documents` (to match the Go service's writes). Touched files: `db/postgres/vector_store.py`, `retrieval/document.py`.
-- **`DocumentRetriever` rewritten to raw SQL.** Uses pgvector operators directly (`<=>` cosine, `<+>` manhattan). The LangChain ORM path couldn't cope with the schema written by the Go service (asyncpg type-mapping issues — id String vs UUID, jsonb_path_match binding quirks). MMR / euclidean / inner_product were dropped as part of the rewrite.
-- **Intent router defaults flipped.** Ambiguous topical/knowledge questions now prefer `unstructured` (was `structured`). Indonesian few-shot examples added to the prompt.
-- **Cache management endpoints added:** `DELETE /api/v1/chat/cache`, `DELETE /api/v1/chat/cache/room/{id}`, `DELETE /api/v1/retrieval/cache/{user_id}`. Redis chat cache now stores `{response, sources}` (was just `response`) so cached replies repopulate `message_sources`.
-- **Direction.** The long-term split is **Python = agent/ML layer, Go = data plane**. More pieces are expected to follow doc ingestion out of Python.
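
The deleted report above describes the structured path as plan → validate → retry (max 3), with the validator's error fed back to the planner as `previous_error`. That loop can be sketched roughly as below. This is a simplified, synchronous illustration under stated assumptions: `plan_with_retry`, `PlanOutcome`, and the dict-shaped IR and catalog are hypothetical stand-ins, not the actual `QueryService`, `QueryPlannerService`, or `IRValidator` code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # the report states "max 3"


@dataclass
class PlanOutcome:
    ir: Optional[dict]    # validated IR, or None if every attempt failed
    attempts: int         # planner calls actually made
    error: Optional[str]  # last validation error, if any


def plan_with_retry(
    question: str,
    catalog: dict,
    plan: Callable[[str, dict, Optional[str]], dict],
    validate: Callable[[dict, dict], Optional[str]],
) -> PlanOutcome:
    """Call the planner until the IR validates or attempts run out."""
    previous_error: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        ir = plan(question, catalog, previous_error)
        previous_error = validate(ir, catalog)  # None means the IR is valid
        if previous_error is None:
            return PlanOutcome(ir=ir, attempts=attempt, error=None)
    # Report the failure as data instead of raising, mirroring the
    # "errors go in QueryResult.error" convention described above.
    return PlanOutcome(ir=None, attempts=MAX_ATTEMPTS, error=previous_error)


# Hypothetical stand-ins: the fake planner misspells a column on its first
# try and corrects it once it receives error context.
def fake_plan(question: str, catalog: dict, previous_error: Optional[str]) -> dict:
    return {"select": ["revenue" if previous_error else "revenu"]}


def fake_validate(ir: dict, catalog: dict) -> Optional[str]:
    unknown = [c for c in ir["select"] if c not in catalog["columns"]]
    return f"unknown column ids: {unknown}" if unknown else None


outcome = plan_with_retry(
    "total revenue?", {"columns": {"revenue"}}, fake_plan, fake_validate
)
print(outcome)
```

Passing the planner and validator in as callables keeps the loop testable without an LLM; in the service described above, the planner call would be the only non-deterministic step.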

PROGRESS.md DELETED

@@ -1,692 +0,0 @@
-# Progress — Phase 2 catalog-driven build
-Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "Team — division of work". Update as PRs land. Future Claude Code sessions read this to know what's already done.
-**Last updated**: 2026-06-12 (Redis Cloud live; R3 closed as won't-do; R5 cache fix; AnalysisRecord persistence landed — `PostgresAnalysisStore` + `analysis_records` table)
-**Current open PR**: `pr/3` — active.
----
-## What just shipped (2026-06-12 — AnalysisRecord persistence, Rifqi)
-Groundwork for `generate_report`. The slow path now persists a real, citable
-record; the report (next) renders from it.
-- **Contract gaps closed** (`agents/slow_path/schemas.py`): `stage: CrispStage`
-  added to `TaskResult` + `TaskSummary` and populated at all 3 `TaskResult` build
-  sites in `task_runner.py` + copied in `assembler._build_record` — so the report
-  can group its method appendix by CRISP-DM phase. `AnalysisRecord` gained identity:
-  `record_id` (auto uuid), `analysis_id`/`user_id` (optional; stamped at persist).
-- **Real store** (`agents/slow_path/store.py`): `PostgresAnalysisStore` —
-  `save()` (never-throw, idempotent upsert) + `list_for_analysis()` (oldest-first,
-  the report's render order). `NullAnalysisStore` kept (tests / disabled persistence).
-  `AnalysisStore` Protocol gained `list_for_analysis`.
-- **Table** (`db/postgres/models.py`): `analysis_records` jsonb table (one row per
-  run, indexed by `analysis_id` + `user_id`); registered in `init_db.py`, created by
-  `create_all` on startup (no migration — `data_catalog` precedent).
-- **Wired** (`agents/chat_handler.py`): default store flipped to `PostgresAnalysisStore`;
-  `user_id` stamped onto the record at the save site (in scope there).
-- **Open**: `analysis_id` is `NULL` until Harry's Analysis State reaches the slow
-  path (session-ID handoff needed to group records per analysis).
----
-## Principal architecture review (2026-06-10) — findings + fix tracker
-A full external review (read the context docs + the slow path, tool layer, query
-spine, catalog plumbing, chat endpoint, config/connection layers) landed. It confirmed
-the DB-latency diagnosis and surfaced several gaps **not previously tracked here**.
-Verified against code before logging. Severity: **critical** / important / nice-to-have.
-**Runtime / latency (the original problem):**
-- DB connection handling is the anomaly, NOT cold start. `DbExecutor._run_sync`
-  (`db.py:192`) → `engine_scope` does `create_engine → connect (TCP+TLS+SCRAM) → 2×SET
-  → dispose` on EVERY query. Measured ~6–8s for 60 rows; a 2nd warm-session query was
-  still ~6.6s → per-call handshake, never amortized. `engine_scope`'s connect-once-dispose
-  semantics were designed for the ingestion pipeline and wrongly inherited by the query path.
-- `describe_source` ~3.5s is **planner-induced waste**: every few-shot (`examples.py`)
-  opens with a `describe_source` task, so the LLM always plans a tool that re-reads from
-  the catalog DB the same catalog already rendered into its prompt. Its impl does 2
-  sequential full-catalog reads (`data_access.py:127-128`). Total catalog reads/request ~5×.
-- Azure LLM clients rebuilt per request: `ChatHandler(enable_tracing=True)` is constructed
-  per request (`chat.py:172`) → fresh Orchestrator/Chatbot → fresh AzureChatOpenAI → fresh
-  TLS to Azure each call. Planner/Assembler correctly use module singletons; the other two don't.
-- Tokens (~13k/request) are NORMAL for this design — do not optimize for $.
-- **Reject the scheduled DB-warmer idea**: targets cold start (~1.8s slice) not the per-call
-  handshake, keeps serverless user DBs awake 24/7 (their compute bill), and decrypts every
-  tenant's creds on a cron (attack surface). Strictly dominated by an engine cache +
-  request-scoped pre-connect.
-**Fix tracker (new):**
-| # | Fix | Severity | Owner | Status |
-|---|---|---|---|---|
-| R1 | **AuthN/AuthZ** on data endpoints — reject body-supplied `user_id`/`room_id`, derive identity from a verified token. `/chat/stream` has none (`chat.py:40,128`); tenant isolation is client honesty. **CORRECTION to the review:** `security/auth.py` is a STUB (all `NotImplementedError`); the real JWT impl lives in `src/users/users.py` (`encode_jwt`/`decode_jwt`, HS, env-keyed) **but is unused** — `/login` (`api/v1/users.py`) returns the user profile as plain JSON and mints NO token. So R1 is cross-team: (1) `/login` must issue a JWT, (2) frontend must send it as `Bearer`, (3) data endpoints validate it. **Gates the engine-cache work (DB2).** | **critical** | DB/B + frontend | `[ ]` |
-| R2 | **Always compile a LIMIT** — `sql.py` now emits a bound for every query: explicit limit honored (clamped to `MAX_RESULT_ROWS=10000`), unbounded queries get `LIMIT cap+1` so an unbounded SELECT can't stream a whole table into memory. `CompiledSql.row_cap` carries the cap; `DbExecutor` caps + flags truncation from it (dropped its own `_ROW_HARD_CAP`). Tests updated (`test_sql.py`, +3 cases); `S608` restored to `tests/**` ruff ignore (was dropped). | **critical** | DB | `[x]` |
-| R3 | **Commit `tests/` + minimal CI** — `tests/` is gitignored; the 200+ tests cited as done exist only on laptops (already caused rename rot). ~~GitHub origin carries tests; HF Space gets the Docker build.~~ **2026-06-12: team decided tests stay gitignored/local — closed as won't-do.** | **critical (process)** | shared | `[won't do]` |
-| DB1 | **In-memory `describe_source`** (request-scoped `MemoizingCatalogReader`, `reader.py`) + **LLM-client hoist** (shared module-level `ChatHandler` in `chat.py`). Measured live: `describe_source` 3.5s→~2.0s (structured read now served from the planner's cached snapshot; only the unstructured read remains a round-trip), catalog reads/request ~5→~2. External `query_structured` handshake unchanged (DB2's job) so total slow path is ~flat until DB2. Tests: `tests/catalog/test_reader.py`. | important | agent | `[x]` |
-| DB2 | **Keyed engine cache** — `src/database_client/engine.py::UserEngineCache` (process singleton): pooled engines keyed by `client_id + creds-hash` (rotation auto-invalidates), bounded LRU (50) + 600s idle TTL, `pool_pre_ping` + `pool_recycle=300`. `DbExecutor._run_sync` reuses the warm connection instead of `create_engine→connect→dispose` per query (postgres/supabase only; other db_types keep the legacy path — no regression). **Live-measured: warm `query_structured` 6.6–9.4s → ~2.5s** (the residual is the per-call catalog-DB client fetch + pre-ping, not the external handshake). **Finding:** Neon's transaction pooler REJECTS `default_transaction_read_only` as a libpq startup `option` — caught live; moved read-only + statement_timeout to a per-connection `connect` event (best-effort; authoritative read-only is the SELECT-only compiler + sqlglot guard, see R10). Per-request ownership/active check kept. Proceeded ahead of R1 per owner decision (marginal security delta over the existing no-auth state; auth tracked separately). Tests: `tests/database_client/test_engine.py`. First query/process still cold → DB3. | important | DB | `[x]` |
-| DB3 | **Speculative pre-connect** — `DbExecutor.prewarm(catalog, user_id)` warms the pooled engine for schema sources (fire-and-forget at slow-path entry) so the cold first-query handshake overlaps the ~4s Planner call. Best-effort, never raises; gated to the default path (skipped when a coordinator factory is injected). Verified live through `ChatHandler.handle`. | nice-to-have | DB | `[x]` |
-| R4 | **Per-stage progress events** — `SlowPathCoordinator.run` gained an optional `progress` callback; `ChatHandler` bridges it to SSE `status` events (`chat.py` forwards them). Live: stream now shows `Planning…`→`Running N steps…`→`Composing…` (max wire gap ~4.6s, was ~13s of silence) → fixes proxy idle-timeout + UX. **Deferred:** token-streaming the Assembler answer needs splitting it into a streamed prose call + a structured-record call — that doubles the Assembler LLM calls (cost/latency), so it's a separate decision; the answer is still emitted as one chunk after the (fast ~2.5s) Assembler. Test: `test_chat_handler_wiring.py`. | important | agent | `[~]` |
-| R5 | **Response cache**: key on `user_id` + catalog version; invalidate on ingest. Was `chat:{room_id}:{message}`, 24h TTL, no user → cross-user replay + stale answers. **2026-06-12 (Rifqi):** key now `chat:{room_id}:{user_id}:{message}` via `_chat_cache_key()`, TTL 24h→1h (checkpoint decision) — urgent now that Redis is a shared Cloud instance. `DELETE /chat/cache` gained a required `user_id` param (frontend heads-up); room-wide clear pattern unchanged. **Still open:** catalog-version in key / invalidate-on-ingest. | important | B | `[~]` |
-| R6 | **Hard time budget** — wrap `coordinator.run()` in `asyncio.wait_for` (60–90s). `Constraints.time_budget_seconds` is rendered but not enforced. | important | agent | `[ ]` |
-| R7 | **Root-task-failure short-circuit** before the Assembler (templated/fast-path fallback, NOT replanning) — stops paying ~2k tok to narrate an empty RunState. | important | agent | `[ ]` |
-| R8 | **Catalog upsert race** — per-user advisory lock around read-merge-upsert (`store.py`); concurrent uploads can drop a source. | important | DB | `[ ]` |
-| R9 | **`extra="ignore"`** in `settings.py:15` (currently `allow` → typo'd env vars silently swallowed); require Azure keys in prod. | nice-to-have | B | `[ ]` |
-| R10 | **Read-only enforcement is session-state, not a server role.** `REPO_CONTEXT.md` counts "read-only DB credentials" as a defense layer but nothing requests/verifies a read-only role. Either request read-only creds at registration (verify via `SELECT current_setting(...)`) or drop the claim. | important | DB | `[ ]` |
-| R11 | **De-duplicate** `_PLACEHOLDER_RE` (`task_runner.py:31` vs validator) and `_DATA_ACCESS_TOOLS` (invoker vs planner registry) — import one from the other; comments aren't a sync mechanism. **TAB slice done (90e80f9):** canonical `DATA_ACCESS_TOOLS` now lives once in `tools/data_access.py`; `invoker.py` imports it (was a duplicated frozenset synced by comment). **Agent slice done (2026-06-10):** `PLACEHOLDER_RE` single-sourced in `planner/schemas.py` (part of the ToolCall placeholder convention); validator + task_runner import it. `planner/registry.py` keeps local spec *bodies* (stub pending KM-465 #4) but name-checks them against `DATA_ACCESS_TOOLS` in `_data_access_slice()` — upstream rename/add now raises at `default_registry()` instead of drifting silently. Registry output unchanged (same 12 tools, same order). | nice-to-have | agent/tool | `[x]` |
-| R12 | **Doc/process hygiene** — some code docstrings cite internal design specs that are not committed to the repo (design docs are kept out of version control), so the references dangle for anyone but the author; `CLAUDE.md` lists deleted modules (enricher, `pipeline/orchestrator.py`); `main` is 38 commits behind on a dead architecture. | nice-to-have | agent | `[ ]` |
-| R13 | **Pre-existing test failure** (found during R2, NOT caused by it): `tests/query/planner/test_prompt.py::test_render_catalog_with_sources` fails — `query/planner/prompt.py::render_catalog` now renders stable IDs (`src_test_db`) the test asserts are absent. Old query-planner path; confirmed failing on a clean tree. | nice-to-have | DB | `[ ]` |
-| T1 | **`input_schema` is presence-only, not type-checked** — `ToolSpec.input_schema` comment said "validates ToolCall.args", but `TaskRunner._validate_args` only enforces `required` presence; the `properties` types are documentation, never validated at runtime. Clarified the contract in `tools/contracts.py` so nobody assumes type-safety (a wrong-typed arg passes validation, surfaces only inside the compute fn). Doc-only, no behavior change (90e80f9). | nice-to-have | TAB | `[x]` |
-| T2 | **Dead Python embed path?** — `document_pipeline.process()` → `knowledge_processor` → `vector_store.aadd_documents()` still writes PDF/DOCX/TXT embeddings to `langchain_pg_embedding`, contradicting CLAUDE.md's "Go is sole writer, Python reads only". Verified the Go service (`Orchestrator-Agent-Service/internal/documents`) IS a complete ingestion writer to the same tables for all 5 file types (OCR + chunk + embed) → the Python embed branch is very likely redundant. **Blocked on one operational fact:** does the frontend still upload to `/document/process` (Python) or to Go? Park until confirmed — deleting a live ingestion path would break unstructured RAG. The csv/xlsx parquet branch stays regardless (feeds the catalog/tabular path). | nice-to-have | TAB | `[blocked]` |
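Item R6 in the table above proposes wrapping `coordinator.run()` in `asyncio.wait_for`. The following is a minimal sketch of that idea, not the repository's code: `run_with_budget` and `fake_coordinator_run` are hypothetical names, and the fallback string stands in for whatever templated answer the team chooses.

```python
import asyncio


async def run_with_budget(coro_factory, budget_seconds: float, fallback: str) -> str:
    """Run one slow-path call under a hard wall-clock budget."""
    try:
        # wait_for cancels the inner coroutine when the budget is exceeded.
        return await asyncio.wait_for(coro_factory(), timeout=budget_seconds)
    except asyncio.TimeoutError:
        # Budget exceeded: return the fallback instead of raising.
        return fallback


async def fake_coordinator_run(delay: float) -> str:
    # Hypothetical stand-in for coordinator.run(); it only sleeps.
    await asyncio.sleep(delay)
    return "answer"


fast = asyncio.run(run_with_budget(lambda: fake_coordinator_run(0.01), 1.0, "timed out"))
slow = asyncio.run(run_with_budget(lambda: fake_coordinator_run(1.0), 0.05, "timed out"))
```

Here `fast` finishes inside its budget while `slow` is cancelled and replaced by the fallback. A real implementation would presumably read the budget from `Constraints.time_budget_seconds` (60 to 90 s per the note above).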
-**Slow-path endpoint wiring (2026-06-10):** the Orchestrator→slow-path is now wired
-into the live endpoint behind an **env flag**. `settings.enable_slow_path` (env
-`ENABLE_SLOW_PATH`, default **off**) is passed to the shared `ChatHandler` in
-`api/v1/chat.py`. Flip `ENABLE_SLOW_PATH=true` to route `structured` intents through
-Planner→TaskRunner→Assembler and test end-to-end from `/chat/stream` (status progress
-events + answer stream). Stays opt-in because `BusinessContext` is still the stub;
-fast/unstructured paths unchanged. Verified live via `ChatHandler.handle`.
-**Architecture verdict:** fundamentally sound (catalog-driven IR + deterministic compiler
-+ static plan is the right call). Debt is transitional duplication (two planners/registries/
-contract modules — documented, owned) and `ChatHandler` drifting toward a god object
-(extract the slow-path composition root + the SSE `_build_sources`/`_normalize_chunks`
-mappers when convenient).
----
-## What just shipped (2026-06-09/10 — tool layer, tracing, slow-path wiring)
-Big stretch since the slow-path workers landed. The tool layer (teammate-owned) is
-now **complete and real**, the slow path is **wired into `ChatHandler` behind a gate**,
-and the whole chat pipeline is **traced**. Fast path still untouched; live behavior
-unchanged (flags default off).
-**Tool layer — COMPLETE (teammate, KM-624→630).** `src/tools/` was re-created (the
-2026-05-11 note about deleting it is superseded). Now teammate-owned:
-- `src/tools/analytics/` — the 8 **composite** `analyze_*` computes (descriptive,
-  aggregate, comparison, contribution, profile, correlation, segment, trend) +
-  prompt-style DESCRIPTIONs (KM-624/625).
-- `src/tools/contracts.py` — canonical `ToolSpec`/`ToolRegistry`/`ToolOutput` (KM-627).
-  `agents/planner/contracts.py` now just re-exports them + keeps the `BusinessContext`
-  stub (lead's).
-- `src/tools/registry.py::analytics_registry()` (KM-628); `src/tools/invoker.py` +
-  `src/tools/data_access.py` — `AnalyticsToolInvoker` (KM-629), `DataAccessToolInvoker`
-  + `CompositeToolInvoker` (KM-630). All never-throw. **Pattern A confirmed** (`analyze_*`
-  take a `data` `${t<id>}` placeholder from an upstream `query_structured`).
-- **Verified live E2E (2026-06-09):** real `query_structured` against a user's Neon
-  Postgres → `analyze_trend` → Assembler. `analyze_contribution` surfaced a real tool
-  bug (Decimal vs float in `decomposition.py`) — degrade-and-continue held; **now fixed
-  by the tool owner** (`_coerce_decimals` in `invoker._materialize`, KM-630 / commit
-  1195870), so the whole `analyze_*` family is covered in one place. **Directive:** agent
-  side does NOT modify `src/tools/` without confirmation.
-**Planner — realigned to the real tools (KM-626).** `registry.py::default_registry()`
-composes the real `analytics_registry()` + a local stub for the 4 data-access tools.
-Few-shots grown to **A–D**: A `analyze_contribution`, B `analyze_trend`, C mixed
-structured+unstructured (`retrieve_documents`, independent branch), D `analyze_aggregate`.
-`parallelizable_with` **removed** from `Task` (schema/validator/examples/prompt) —
-TaskRunner derives parallelism from `depends_on` alone.
-**Slow-path wiring — built, GATED OFF (KM-626).** `agents/chat_handler.py` gains a
-`structured→slow` branch behind `ChatHandler(enable_slow_path=False)`: when on it builds
-a per-request `CompositeToolInvoker` (composition root) + `SlowPathCoordinator`, streams
-`chat_answer`, persists the `analysis_record`. Two seams isolate the remaining blockers:
-- `agents/planner/business_context.py::get_business_context(user_id)` — async stub
-  `BusinessContext`; TODO(lead) swap for the real read.
-- `agents/slow_path/store.py` — `AnalysisStore` Protocol + `NullAnalysisStore` (logs
-  only). Real store = `analysis_records` table in the catalog DB (Neon `dataeyond`) —
-  **table not created yet**. `chat_answer` still emitted as one chunk (not token-streamed).
-**Observability — Langfuse tracing wired (KM-631).** `src/observability/langfuse/
-tracing.py` — `RequestTracer`/`NullTracer`/`TracingToolInvoker` + `_redact`. One trace
-per request groups Orchestrator.classify, Planner.plan (each retry = its own generation),
-Assembler.assemble, Chatbot.astream + tool spans (latency/metadata only). Gated:
-`ChatHandler(enable_tracing=False)`; `api/v1/chat.py` opts in (`=True`). PII policy:
-Orchestrator+Planner unmasked (question + PII-safe summary); Assembler+Chatbot masked
-(see real rows/chunks); tool spans carry name + arg keys + row count only. Zero added
-LLM tokens; verified live to US Cloud.
-**Live evals green (2026-06-09, real Azure 4o):** `RUN_PLANNER_EVAL=1` and
-`RUN_SLOW_PATH_EVAL=1` both pass — Planner emits valid catalog-consistent `QueryIR` and
-wires Pattern A correctly; self-corrects via retry.
-**Open follow-ups:** real `BusinessContext` (lead); create `analysis_records` table +
-real `AnalysisStore` (**Rifqi owns, 2026-06-12** — folded into `generate_report` work,
-see `CHECKPOINT_PLAN_2026-06-17.md`); register data-access `ToolSpec`s upstream (`data_access_registry()`)
-or keep the planner stub; 4o → GPT-mini deployment swap; flip `enable_slow_path` on once
-`BusinessContext` is real. NOTE: 3 test files pre-existing broken from rename rot
-(`test_chat_handler.py`, `test_intent_router.py`, `test_answer_agent.py` import the old
-`answer_agent`/`intent_router` module names).
----
-## What just shipped (2026-06-10 — TAB: tool-layer hardening + DRY)
-Owner-side companion to the agent block above. After the live E2E surfaced real-data
-edge cases, the tool layer got a round of correctness hardening. All in TAB-owned paths
-(`src/tools/`, `src/catalog/`); no agent-side or API change.
-**JSON-safety across the `analyze_*` family.** Real DB rows carry scalar types that
-don't survive the jsonb / SSE round-trip:
-- `[KM-630] coerce DB Decimal → float` (commit 1195870) — `_coerce_decimals` in
-  `invoker._materialize` converts object-columns holding `decimal.Decimal` (asyncpg
-  returns NUMERIC as `Decimal`) to `float64` before any compute runs. Fixes the
-  `float + Decimal` TypeError in `decomposition.analyze_contribution` **and** the whole
-  family in one seam — only touches columns that actually contain a `Decimal`.
-- `[KM-624] non-JSON-safe scalars in mode & top_value` (commit 6981ed3) — normalize
-  numpy / non-native scalars so descriptive + top-value outputs serialize cleanly.
-**Planner↔Tools registry alignment + Timestamp keys** (commit 4bb7623, `fix(tools)`):
-- `registry.py` — `analyze_descriptive.required` corrected `["data"]` → `["data",
-  "column_ids"]` to match the compute signature (`column_ids` has no default). Prevents
-  the Planner from emitting a call that's missing a required arg. `analyze_profile` stays
-  `["data"]` (its `column_ids` defaults to `None`).
-- `aggregation._clean` — group-by over a datetime column produced `pd.Timestamp` group
-  keys that aren't JSON-safe; now normalized to `.isoformat()` alongside the existing
-  numpy `.item()` branch.
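The JSON-safety fixes above (Decimal to float, Timestamp group keys to `.isoformat()`) can be illustrated on plain Python values. This is a sketch under assumptions: `to_json_safe` is a hypothetical helper, and it works on dicts and lists rather than on DataFrame columns as `_coerce_decimals` and `aggregation._clean` are described as doing. `datetime` stands in for `pd.Timestamp`, which subclasses it.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Recursively convert values that json.dumps would reject."""
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, dict):
        # Keys are normalized too, since group-by keys may be timestamps.
        return {str(to_json_safe(k)): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    return value


row = {"total": Decimal("12.50"), datetime(2026, 6, 1): [Decimal("1"), None]}
safe = to_json_safe(row)
payload = json.dumps(safe)  # would raise TypeError on the raw row
```

Converting once at a single seam, as the log describes, means each compute function does not need its own serialization guard.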
-**DRY: single `SAMPLE_LIMIT` constant** (commit 6d46ba5, `[NOTICKET] refactor(catalog)`):
-- One source of truth in `catalog/introspect/base.py` (`SAMPLE_LIMIT = 3`, down from 5 —
-  token cost: sample values feed the planner prompt). Both introspection paths import it:
-  `catalog/introspect/tabular.py` and `pipeline/db_pipeline/extractor.py` (which dropped
-  its own local `= 3`). Dependency direction is pipeline→catalog (no circular import).
-  Stale test `test_sample_values_capped_at_five` updated to assert the real cap (3).
-**Audit result:** Planner↔Tools arg alignment swept end-to-end — 7/8 `analyze_*` tools
-already matched; the 1 mismatch (`analyze_descriptive`) is the fix above. Pattern A holds
-across all of them.
----
-## What just shipped (2026-06-08 — KM-626: slow-path agent layer)
-The rest of the slow path after the Planner (KM-567): TaskRunner, Assembler, and
-the coordinator. Built and tested against mocks; **not yet wired into the live
-`ChatHandler`** (waits on the tool team's real `ToolInvoker` + a real
-`BusinessContext`). Fast path untouched.
-**Naming:** "Orchestrator" = the entry dispatcher only (`agents/orchestration.py`).
-The slow-path **workers** live in **`agents/slow_path/`** — deliberately NOT named
-"orchestrator".
-**Files added** (`src/agents/slow_path/`):
-- `schemas.py` — `TaskResult`, `RunState`; `TaskSummary`, `AnalysisRecord`,
-  `AssembledOutput`, `AssemblerNarrative`. Reuses `ToolOutput`.
-- `invoker.py` — `ToolInvoker` Protocol only; the tool team owns the impl (KM-418).
-- `errors.py` — `SlowPathError`, `AssemblerError`.
-- `task_runner.py` — deterministic, 0 LLM: wave-based execution, `${t<id>}` placeholder
-  resolution, internal `validate_args`, never-throw invoke, status labeling,
-  degrade-and-continue → `RunState`.
-- `assembler.py` + `prompt.py` + `config/prompts/assembler.md` — single LLM call →
-  `AssemblerNarrative`; code merges with `RunState` to build the `AnalysisRecord`
-  (structured fields copied, never re-authored).
-- `coordinator.py` — `SlowPathCoordinator`: Planner → TaskRunner → Assembler.
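The `task_runner.py` entry above mentions wave-based execution and `${t<id>}` placeholder resolution. A rough sketch of both ideas follows. It assumes task ids of the form `t1`, `t2` and whole-value placeholders such as `"${t1}"`; `plan_waves` and `resolve_args` are hypothetical names, and unlike the described runner (never-throw, degrade-and-continue) this sketch simply raises on a bad graph.

```python
import re
from typing import Any

PLACEHOLDER = re.compile(r"^\$\{(t\w+)\}$")  # assumed shape, e.g. "${t1}"


def plan_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group task ids into waves; a task is ready once all its dependencies are done."""
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(depends_on):
        ready = sorted(
            t for t, deps in depends_on.items() if t not in done and set(deps) <= done
        )
        if not ready:
            raise ValueError("cycle or unknown dependency in task list")
        waves.append(ready)
        done.update(ready)
    return waves


def resolve_args(args: dict[str, Any], results: dict[str, Any]) -> dict[str, Any]:
    """Swap whole-value placeholders for the upstream task's output."""
    resolved = {}
    for key, value in args.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        resolved[key] = results[match.group(1)] if match else value
    return resolved


deps = {"t1": [], "t2": ["t1"], "t3": [], "t4": ["t2", "t3"]}
waves = plan_waves(deps)
args = resolve_args({"data": "${t1}", "column_ids": ["revenue"]}, {"t1": [{"revenue": 10}]})
```

Tasks within one wave have no dependencies on each other, so they could run concurrently. This is consistent with the note elsewhere in this log that parallelism is derived from `depends_on` alone.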
-**Tests added** (`tests/agents/slow_path/`, 12 passing; gitignored): schema round-trips
-+ chat_answer-first; runner happy/placeholder/parallel/degrade/arg-miss; assembler
-narrative-vs-snapshot + question threading; coordinator end-to-end. `ruff` clean;
-tool-agnostic (no `src/tools/*` import).
-**Open follow-ups (not blockers):** wire `SlowPathCoordinator` into the expanded
-Orchestrator/`ChatHandler` once the real invoker + `BusinessContext` exist; swap the
-test `MockToolInvoker` for the tool team's real one (zero agent change, INV-7); 4o →
-GPT-mini deployment swap.
----
-## What just shipped (2026-06-08 — tool taxonomy + ownership revision)
-Team decisions after the teammate pushed KM-624 (`src/tools/analytics/`):
-- **Composite tools, not atomic.** v1 uses **composite "family" tools** (`analyze_*`),
-  not the atomic `compute_*` set the earlier draft assumed. One `analyze_*` call does a
-  whole analytical job (e.g. `analyze_descriptive` subsumes median/mode/stddev/percentile;
-  `analyze_trend` subsumes `date_trunc`). Tool-taxonomy decision recorded.
-- **Tool team owns ALL tools** — compute, data-access (`query_structured`,
-  `retrieve_documents`, `list_sources`, `describe_source`), the wrapper/invoker layer
-  (KM-418), and **all tool tests**. The agent team owns nothing below the registry contract.
-- **Planner stub realigned to the real tools.** `registry.py` rewritten from the 9 atomic
-  entries to **12 composite entries** (4 data-access + 8 `analyze_*`); `examples.py`
-  rewritten (Example A → `analyze_contribution`, Example B → `analyze_trend`); `planner.md`
-  bullet updated; planner tests updated. 32 passing + 1 gated, `ruff` clean.
-- **Open (tool team's call):** Pattern A (analyze_* take a `${t<id>}` `data` placeholder
-  from an upstream `query_structured`) vs Pattern B (self-fetch by `source_id`). Stub
-  assumes A; reshaped to match once decided (agent code unaffected, INV-7).
-- **New coupling:** the tool team's `query_structured`/`retrieve_documents` are expected
-  to call our existing `QueryService`/`RetrievalRouter`; `query_structured` stays
-  inline-`QueryIR` so `IRValidator` still applies. Interface to coordinate.
-**Next (our scope, all mock-able now):** TaskRunner + Assembler against a `MockToolInvoker`,
-then Orchestrator slow-path wiring. Stubs still to retire on integration: `contracts.py`
-(BusinessContext from lead; ToolSpec/ToolRegistry/ToolOutput from tool team) and `registry.py`
-(real registry from tool team). Infra: swap the 4o stand-in for a GPT-mini deployment.
----
-## What just shipped (2026-06-05 — Phase 3: Planner agent)
-First slow-path agent (the Planner). A single LLM
-call turns BusinessContext + Catalog + ToolRegistry + question + Constraints into a
-validated, **static** `TaskList` (DAG of fully-specified tool-call chains). No
-replanning (INV-6); tool-agnostic against a registry contract (INV-7). Fast path
-(`agents/orchestration.py`, `agents/chatbot.py`, `query/`) untouched.
-**Files added** (`src/agents/planner/`):
-- `contracts.py` — **STUB** Pydantic contracts pending reconciliation: `BusinessContext`
-  (+KeyTerm/DataTableNote/DataColumnNote, lead's), `ToolSpec`/`ToolRegistry` (tool
-  team KM-608), `ToolOutput` envelope.
-- `schemas.py` — `CrispStage`, `ToolCall`, `Task`, `TaskList`. No replan schemas.
-- `inputs.py` — `CatalogSummary` (condensed, PII `sample_values` nulled, `from_catalog`
-  builder + `render`) and `Constraints` (max_tasks=5, modeling_allowed=False).
-- `registry.py` — **STUB** v1 P0 registry: query_structured, retrieve_documents,
-  list_sources, describe_source, compute_median/stddev/percentile/mode, date_trunc.
-- `errors.py` — `PlannerError`, `PlannerValidationError`.
-- `prompt.py` + `config/prompts/planner.md` — system prompt (INV-1/6/7 + principles) +
-  per-call human content (context + catalog + tools + constraints + few-shots + question).
-- `examples.py` — two few-shots (A exploratory revenue-by-category; B descriptive
-  monthly-trend-by-region with date_trunc), built from the real `TaskList` schema.
-- `validator.py` — `PlannerValidator` running the 8 checks; reuses the existing
-  `IRValidator` for inline `query_structured` IRs.
-- `service.py` — `PlannerService` + `plan_analysis(...)`: chain (mirrors
-  `query/planner/service.py`) + validate-and-retry loop (max 3, mirrors `QueryService`).
-**Tests added** (`tests/agents/planner/`, 30 passing + 1 gated): `test_schemas.py`,
-`test_inputs.py`, `test_validator.py` (one failure per check + happy paths),
-`test_service.py` (`_FakeChain` + retry), `test_golden_questions.py` (live eval gated on
-`RUN_PLANNER_EVAL=1`). `ruff check` clean on planner paths.
-**Open follow-ups (not blockers):** reconcile `BusinessContext` with the lead and
-`ToolRegistry`/`ToolSpec` + real tools with teammate (KM-608); "GPT mini" currently uses
-the configured 4o deployment (swap `azure_deployment` when a mini deployment exists). Next:
-Orchestrator slow-path expansion + TaskRunner + Assembler.
----
-## Legend
-- `[x]` done and merged
-- `[~]` in progress (open PR or active branch)
-- `[ ]` not started
-- **DB** / **TAB** / **B** — ownership (from REPO_CONTEXT.md)
----
-## PR sequence
-| PR | Status | Owner(s) | Scope |
-|---|---|---|---|
-| PR1 | `[x]` merged | DB | Contract locks + catalog plumbing + DB introspector + IR validator + tests |
-| PR1-tab | `[x]` shipped | TAB | Tabular introspector + on_tabular_uploaded trigger + 31 unit tests |
-| PR2a | `[x]` merged | DB | CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension on Table (enricher later removed in KM-557) |
-| KM-557 | `[x]` shipped | DB | Drop CatalogEnricher entirely (cost cut — planner uses stats + sample rows directly); rename jsonb table `catalogs` → `data_catalog`; add `GET /api/v1/data-catalog/{user_id}` index endpoint for catalog refresher |
-| PR2b | `[x]` shipped | DB-solo (B-review) | IntentRouter + planner prompt + planner LLM service |
-| PR3-DB | `[x]` shipped | DB | SqlCompiler (Postgres) + DbExecutor (sqlglot guard, RO + statement_timeout, asyncio.to_thread) + 36 golden IR→SQL tests |
-| PR3-TAB | `[x]` shipped | TAB | PandasCompiler + TabularExecutor + 43+12 golden IR→DataFrame tests |
-| PR4 | `[x]` | DB-solo (B-review) | ExecutorDispatcher + QueryService + ChatHandler module. **API rewired in Cleanup PR.** |
-| PR5 | `[x]` shipped | DB-solo (B-review) | Retry/self-correction loop on validation failure (lives in QueryService, max 3 attempts, planner re-prompted with prior error) |
-| PR6 | `[~]` scaffold | DB-solo (B-review) | Eval harness scaffold + 3 DB-targeting golden cases. Skipped without `RUN_PLANNER_EVAL=1` env. TAB extends with tabular cases. |
-| PR7 | `[x]` | DB-solo (B-review) | `ChatbotAgent` (renamed from `AnswerAgent`) + chatbot_system + guardrails prompts. `answer_agent.py` → `chatbot.py`, `AnswerAgent` → `ChatbotAgent`. API rewired in Cleanup PR. |
-| Cleanup | `[x]` | B | ChatHandler wired to chat.py; Phase 1 dual-write dropped from /ingest; on_catalog_rebuild_requested + POST /data-catalog/rebuild; dead modules deleted (chatbot Phase 1, orchestrator, query/base, knowledge.py, config/agents/); retrieval cache restored via RetrievalRouter; top_values added to ColumnStats; lifespan migration; knowledge_router removed. |
----
-## All items
-### Contracts (B — shared)
-| # | Item | Status | Notes |
-|---|---|---|---|
-| 1 | Catalog Pydantic models (`catalog/models.py`) | `[x]` | PR1 added `location_ref` URI-scheme docstring; PR2a added `ForeignKey` model + `Table.foreign_keys` field |
-| 2 | IR Pydantic models (`query/ir/models.py`) | `[x]` | Pre-existing scaffold |
-| 3 | IR operator whitelists (`query/ir/operators.py`) | `[x]` | PR1 filled `TYPE_COMPATIBILITY` matrix |
-| 4 | PII patterns / regex (`security/pii_patterns.py`) | `[x]` | Pre-existing |
-| — | `data_catalog` Postgres jsonb table (`db/postgres/models.py`) | `[x]` | PR1 added `Catalog` SQLAlchemy class + `init_db.py` import. KM-557 renamed `__tablename__` from `catalogs` → `data_catalog`; created fresh (no migration) |
-| — | `QueryResult` shape (`query/executor/base.py`) | `[x]` | Pre-existing scaffold; `columns: list[str]` added (TAB owner, PR1-tab) — DbExecutor updated to populate it. |
-| — | `Source.location_ref` URI scheme | `[x]` | PR1 documented in `catalog/models.py` docstring |
-### Ingestion — introspection
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 5 | DB introspector (`catalog/introspect/database.py`) | DB | `[x]` | PR1 — reuses Phase 1 `database_client_service`, `db_credential_encryption`, `db_pipeline_service.engine_scope`, `extractor.get_schema/profile_column/get_row_count`. PR2a wired FK extraction (was discarded before). |
-| 6 | Tabular introspector (`catalog/introspect/tabular.py`) | TAB | `[x]` | PR1-tab — downloads original blob (CSV/XLSX/Parquet), one Table per sheet (XLSX) or one Table (CSV/Parquet). `source_id = document_id`. `fetch_doc`/`fetch_blob` injectable for unit tests (no Settings). **2026-06-10**: sample cap now imports the shared `SAMPLE_LIMIT` (=3) from `catalog/introspect/base.py` — single source of truth across the tabular + DB introspection paths (commit 6d46ba5). |
-| 7 | `BaseIntrospector` ABC (`catalog/introspect/base.py`) | B | `[x]` | Pre-existing; signature locked |
-### Ingestion — shared catalog plumbing
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 8 | ~~Catalog enricher + prompt~~ | B | **REMOVED in KM-557** | Cost optimization — planner reads stats + sample rows + column names directly. `catalog/enricher.py` + `config/prompts/catalog_enricher.md` deleted. `render_source` (the only piece still needed) moved to `src/catalog/render.py`. Tests moved to `tests/catalog/test_render.py`. |
-| 9 | Catalog validator (`catalog/validator.py`) | B | `[x]` | PR1 (DB owner picked up) — uniqueness invariants |
-| 10 | Catalog store — Postgres jsonb (`catalog/store.py`) | B | `[x]` | PR1 (DB owner picked up) — `INSERT ... ON CONFLICT` |
-| 11 | Catalog reader (`catalog/reader.py`) | B | `[x]` | PR1 (DB owner picked up) — filters by source_hint, empty on miss |
-| 12 | PII detector (`catalog/pii_detector.py`) | B | `[x]` | PR1 (DB owner picked up) — name + value matching, bias toward over-flag |
-### Ingestion — pipelines
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 13 | Structured pipeline (`pipeline/structured_pipeline.py`) | B | `[x]` | PR2a (DB owner) — Source-type-agnostic: caller supplies the introspector. `default_structured_pipeline()` factory wires production deps lazily so tests can inject mocks without `Settings()` construction. **KM-557**: enrich step removed; pipeline is now `introspect → merge with existing → validate → upsert`. Constructor no longer takes `enricher`. |
-| 14 | Triggers (`pipeline/triggers.py`) | B | `[x]` | PR2a — `on_db_registered` implemented (DB owner). PR1-tab — `on_tabular_uploaded` implemented (TAB owner). **2026-05-11** — `on_document_uploaded` implemented. **2026-05-12** — `on_catalog_rebuild_requested` implemented: iterates all Sources in current catalog, re-runs `on_db_registered` (schema) or `on_tabular_uploaded` (tabular) per source; per-source errors logged but don't abort. |
-| 15 | Ingestion orchestrator (`pipeline/orchestrator.py`) | B | **DELETED** | Redundant stub — `StructuredPipeline` already takes introspector at run() time. Deleted in Cleanup PR. |
-| 16 | Document pipeline (`pipeline/document_pipeline.py`) | TAB | `[x]` | Flattened `pipeline/document_pipeline/document_pipeline.py` (folder) → `pipeline/document_pipeline.py` (file). Updated import in `api/v1/document.py`. |
-### Query — shared spine
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 17 | IR validator (`query/ir/validator.py`) | B | `[x]` | PR1 (DB owner) — full rule set; descriptive errors for planner retry |
-| 18 | Planner LLM service (`query/planner/service.py`) | B | `[x]` | PR2b — Azure OpenAI structured output → `QueryIR`. Injectable chain. Supports retry via `previous_error` argument. |
-| 19 | Planner prompt (`query/planner/prompt.py`, `config/prompts/query_planner.md`) | B | `[x]` | PR2b — system prompt with hard constraints + few-shot for DB and tabular sources. `build_planner_prompt(question, catalog, previous_error)` calls `catalog.render.render_source` (renamed from `catalog.enricher.render_source` in KM-557). |
-| 20 | Intent router (`agents/orchestration.py` — class `OrchestratorAgent`; `config/prompts/intent_router.md`) | B | `[x]` | PR2b — single LLM call → `IntentRouterDecision(needs_search, source_hint, rewritten_query)`. Supports conversation history. **NOTE**: source filename + class name were kept from Phase 1 for import-site compatibility; only the body is Phase 2. Prompt file and test file use the `intent_router` name. |
-| 21 | Executor base + `QueryResult` (`query/executor/base.py`) | B | `[x]` | Pre-existing scaffold |
-| 22 | Executor dispatcher (`query/executor/dispatcher.py`) | B | `[x]` | PR4 — picks DbExecutor / TabularExecutor by `source.source_type`. Lazy imports of production executors keep import side-effect-free for tests. Caches per source_type. |
-| 23 | Compiler base ABC (`query/compiler/base.py`) | B | `[x]` | Pre-existing scaffold |
-| 24 | Top-level QueryService (`query/service.py`) | B | `[x]` | PR4+5 — `plan → validate → dispatch → execute → QueryResult`. Retry loop on validation failure (max 3, planner re-prompted with prior error). Catches NotImplementedError from TabularExecutor placeholder gracefully. Never raises. |
-### Query — DB path
-| # | Item | Status | Notes |
-|---|---|---|---|
-| 25 | SQL compiler (`query/compiler/sql.py`) | `[x]` | PR3-DB — Postgres dialect (Supabase reuses); deterministic IR → (sql, named-params dict); double-quoted identifiers from catalog; all whitelisted ops (=, !=, <, <=, >, >=, in, not_in, is_null, is_not_null, like, between); alias-aware order_by; `CompiledSql.params: dict[str, Any]` (changed from `list`). MySQL/BigQuery/Snowflake compilers later. |
-| 26 | DB executor (`query/executor/db.py`) | `[x]` | PR3-DB — sync engine via `db_pipeline_service.engine_scope` inside `asyncio.to_thread`. sqlglot SELECT-only / no-DML guard. Postgres-only session settings: `default_transaction_read_only=on` + `statement_timeout=30000`. asyncio.wait_for backstop. Never raises — populates `QueryResult.error`. 10k row hard cap. |
-| 27 | Credential encryption (`security/credentials.py`) | `[ ]` | Stub exists; PR1 reused Phase 1 `utils/db_credential_encryption.py` instead. Move in cleanup PR |
-| 28 | User-DB connection management | `[x]` | PR3-DB reused Phase 1 `db_pipeline_service.engine_scope` (same as PR1 introspector); no new helper needed |
-### Query — Tabular path
-| # | Item | Status | Notes |
-|---|---|---|---|
-| 29 | Pandas compiler (`query/compiler/pandas.py`) | `[x]` | PR3-TAB — `CompiledPandas` dataclass; all 12 filter ops; all 6 aggs; group_by via `pd.concat` of Series; alias-aware order_by; `_like_to_regex` (`%`→`.*`, `_`→`.`); pure module-level helpers. (`polars` for large files still deferred — see Planned dependencies.) |
-| 30 | Tabular executor (`query/executor/tabular.py`) | `[x]` | PR3-TAB — `fetch_blob` injectable for tests; blob path: single-table → `{uid}/{did}.parquet`, multi-table → `{uid}/{did}__{table.name}.parquet`; `asyncio.to_thread`; 10k row hard cap; errors → `QueryResult.error`. Dispatcher routes to it by `source_type`. |
-| 31 | Parquet upload/download wrapper | `[x]` | Moved `knowledge/parquet_service.py` → `storage/parquet.py`. Updated 4 import sites: `pipeline/document_pipeline.py`, `knowledge/processing_service.py`, `query/executor/tabular.py`, `query/executors/tabular.py`. |
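The `_like_to_regex` mapping noted for the Pandas compiler (`%` to `.*`, `_` to `.`) can be sketched as below. The function name `like_to_regex`, the escaping of other characters, and the `^...$` anchoring are assumptions for illustration; the log does not say how the real helper handles them.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal, so "." does not act as a wildcard
    return "^" + "".join(parts) + "$"


rx = re.compile(like_to_regex("a.b%_z"), flags=re.DOTALL)
```

Escaping the literal characters matters because values such as file names or decimals contain regex metacharacters that would otherwise match more than a LIKE pattern intends.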
-### Agents + chat
-| # | Item | Status | Notes |
-|---|---|---|---|
-| 32 | Chatbot agent + prompt (`agents/chatbot.py`, `config/prompts/chatbot_system.md`) | `[x]` | PR7-bundle — `ChatbotAgent` (was `AnswerAgent`) streams tokens, accepts `QueryResult` or list[`DocumentChunk`] or neither. **Cleanup PR**: renamed `answer_agent.py` → `chatbot.py`, `AnswerAgent` → `ChatbotAgent`; Phase 1 `agents/chatbot.py` deleted. |
-| 33 | Guardrails prompt (`config/prompts/guardrails.md`) | `[x]` | PR7-bundle — appended to `chatbot_system.md` so guardrails take precedence in conflict. |
-| — | Chat handler / orchestrator (`agents/chat_handler.py`) | `[x]` | PR4-bundle — top-level Phase 2 orchestrator. Routes by `source_hint`: chat → AnswerAgent direct; structured → CatalogReader + QueryService; unstructured → DocumentRetriever placeholder + AnswerAgent. Yields `intent` / `chunk` / `done` / `error` SSE-style events. Phase 1 chat.py NOT touched — cleanup PR rewires the API to call this. **2026-06-09**: gained the gated `structured→slow` branch (`enable_slow_path=False`) + `enable_tracing` (KM-626/631). |
-### Tools — slow-path "Tools" component (TAB)
-New scope after the original 42-item table; added as the tool layer landed (KM-608/624–631). All TAB-owned (`src/tools/`), all never-throw.
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| — | Analytics compute fns (`tools/analytics/`) | TAB | `[x]` | KM-608/624/625 — 8 **composite** `analyze_*` fns (descriptive, aggregate, comparison, contribution, profile, correlation, segment, trend) + prompt-style DESCRIPTIONs. Pure pandas, no I/O. JSON-safe outputs (numpy/Decimal/Timestamp normalized — KM-624 + commit 4bb7623). |
-| — | Tool contracts (`tools/contracts.py`) | TAB | `[x]` | KM-627 — canonical `ToolSpec` / `ToolRegistry` / `ToolOutput`. `agents/planner/contracts.py` re-exports them (+ keeps the lead's `BusinessContext` stub). |
-| — | Analytics registry (`tools/registry.py`) | TAB | `[x]` | KM-628 — `analytics_registry()`. `analyze_descriptive.required` = `["data","column_ids"]` (aligned to compute signature, commit 4bb7623). |
-| — | Invoker layer (`tools/invoker.py`) | TAB | `[x]` | KM-629 — `AnalyticsToolInvoker` (Pattern A: `analyze_*` take a `data` `${t<id>}` placeholder from upstream `query_structured`; `_materialize` → DataFrame, `_coerce_decimals` covers the whole family) + `CompositeToolInvoker` (routes data-access vs analytics by name). |
-| — | Data-access tools (`tools/data_access.py`) | TAB | `[x]` | KM-630 — `DataAccessToolInvoker`: `list_sources` / `describe_source` / `query_structured` / `retrieve_documents`. Per-request DI (`user_id` + `CatalogReader`). `query_structured` calls `IRValidator` + `ExecutorDispatcher` (planner skipped — IR pre-built by the agent Planner). **Superseded by KM-642/643** — renamed `data_retrieve`/`knowledge_retrieve` and `list_sources`+`describe_source` merged into `data_check` + new `knowledge_check`; see row below. |
-| — | Tool tests (`tests/unit/tools/`) | TAB | `[x]` | analytics + data-access + invoker tests (gitignored). Incl. regression `test_decimal_columns_coerced_for_analyze_contribution`. |
-| — | Data/knowledge tool taxonomy (`tools/data_access.py`) | TAB | `[x]` | KM-642/643 (commits c38c0c2, 4bd5f1e) — renamed `query_structured`→`data_retrieve`, `retrieve_documents`→`knowledge_retrieve`; merged `list_sources`+`describe_source` → parameterized `data_check` (no arg = list structured sources; `source_id` = that source's schema) + new `knowledge_check` (unstructured/documents). Split mirrors the catalog's structured/unstructured slices. Planner stub/prompt/validator/few-shots synced; `DATA_ACCESS_TOOLS` guard kept in lockstep. Note: dated log entries above (e.g. the 2026-06-09 E2E) keep the old names as historical record. |
-### API surface
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 34 | DB client endpoints (`api/v1/db_client.py`) | DB | `[x]` | **Cleanup PR** — `/ingest` now calls only `on_db_registered`. Phase 1 `db_pipeline_service.run()` + `decrypt_credentials_dict` removed. Error from catalog build now raises HTTP 500 (was silent log). Response simplified to `{"status": "success", "client_id": ...}`. |
-| 35 | Document/tabular upload endpoints (`api/v1/document.py`) | TAB | `[x]` | Rewired `/document/process` — after processing CSV/XLSX, calls `on_tabular_uploaded(document_id, user_id)`. Catalog ingestion failure is logged but does not fail the request. **2026-05-11** — CSV/XLSX no longer ingested to vector store (`knowledge_processor` skipped for tabular types in `document_pipeline.py`); they go to catalog only. |
-| 36 | Chat stream endpoint (`api/v1/chat.py`) | B | `[x]` | Rewired `/chat/stream` — replaced `query_executor.execute()` (Phase 1) with `CatalogReader + QueryService` (Phase 2). **Cleanup PR**: fully rewired to `ChatHandler.handle()`. Inline intent routing, retrieval, and answer generation removed. Redis cache, fast intent, history loading, and message persistence remain in chat.py. Sources event emits `[]` (retrieval not yet exposed by ChatHandler). |
-| 37 | Room / users endpoints (`api/v1/room.py`, `api/v1/users.py`) | B | `[ ]` | No catalog work; only touch if auth flow changes |
-| — | Data catalog index endpoint (`api/v1/data_catalog.py`) | DB | `[x]` | **KM-557** — `GET /api/v1/data-catalog/{user_id}` → `list[CatalogIndexEntry]`. **Cleanup PR** — added `POST /api/v1/data-catalog/rebuild?user_id=` → calls `on_catalog_rebuild_requested`; per-source errors logged but don't fail the request. |
-### Tests + eval
-| # | Item | Owner | Status | Notes |
-|---|---|---|---|---|
-| 38 | DB compiler golden tests (`tests/query/compiler/test_sql.py`) | DB | `[x]` | PR3-DB — 36 tests across all whitelisted ops, identifier quoting, agg / count_distinct / count(*), order_by alias resolution, parameter sequencing, error paths. Pure-Python, no LLM, no DB. |
-| 39 | Pandas compiler golden tests (`tests/unit/query/compiler/test_pandas_compiler.py`) | TAB | `[x]` | PR3-TAB — 43 tests: all 12 filter ops, all 6 aggs, group_by, order_by, limit, aliases, empty DataFrame, error paths. `test_tabular_executor.py` adds 12 more (blob name resolution + happy path + error paths). |
-| 40 | IR validator tests (`tests/query/ir/test_validator.py`) | B | `[x]` | PR1 — 19 tests, all rules covered |
-| — | PII detector tests (`tests/catalog/test_pii_detector.py`) | B | `[x]` | PR1 — 26 tests (parametrized) |
-| — | Catalog validator tests (`tests/catalog/test_validator.py`) | B | `[x]` | PR1 — 5 tests |
-| — | Catalog render tests (`tests/catalog/test_render.py`) | B | `[x]` | **KM-557** — 5 tests (renamed from `test_enricher.py`; LLM enrichment tests dropped, render-only tests kept). |
-| — | Catalog store integration test (`tests/catalog/test_store.py`) | DB | `[x]` | PR1 — module-level skip without `RUN_INTEGRATION_TESTS=1` |
-| — | DB introspector test | DB | `[ ]` | Deferred to PR2 — needs Postgres testcontainer or fixture infra |
-| — | Tabular introspector test | TAB | `[x]` | PR1-tab — 31 unit tests (CSV/XLSX/Parquet, stats, PII, error paths). No DB/blob I/O — mocks injected via constructor. |
-| 41 | Planner eval (`tests/query/planner/`) | B | `[x]` | PR6-scaffold — `test_golden_questions.py` with 3 DB-targeting cases. TAB added `test_golden_tabular.py` with 4 tabular cases (group_by+sum, top-N+limit, date range filter, XLSX sheet selection). All 4 passed against real Azure OpenAI. Fix shipped alongside: `query/planner/service.py` replaced `("system", text)` tuple with `SystemMessage` — without this, `{...}` in `query_planner.md` was parsed as f-string variables and crashed on every real invocation. |
-| 42 | E2E smoke tests (`tests/e2e/`) | B | `[ ]` | Defer until Phase 2 endpoints are wired (cleanup PR). Component-level orchestration is already covered by `test_chat_handler.py` + `test_service.py`. |
-| — | Golden IR fixtures (`tests/fixtures/golden_irs.json`) | B | `[~]` | PR1 seeded with 5 DB-targeting examples; TAB extends in PR1-tab |
-| — | Shared `sample_catalog` fixture (`tests/conftest.py`) | B | `[x]` | PR1 — DB-shaped; TAB may add tabular sibling |
----
-## What just shipped (2026-05-12 — Cleanup PR)
-**Phase 1 removal + Phase 2 API rewiring:**
-- `src/api/v1/chat.py` — fully rewired to `ChatHandler.handle()`. Removed inline IntentRouter, retrieval, and ChatbotAgent calls. Redis cache, fast intent, load_history, save_messages stay in chat.py.
-- `src/api/v1/db_client.py` — `/ingest` now calls only `on_db_registered`. Phase 1 `db_pipeline_service.run()` block removed. Catalog build failure now raises HTTP 500.
-- `src/api/v1/data_catalog.py` — added `POST /api/v1/data-catalog/rebuild` endpoint.
-- `src/pipeline/triggers.py` — `on_catalog_rebuild_requested` implemented: iterates catalog sources, re-runs the appropriate trigger per source type, per-source errors logged.
-**Dead modules deleted:**
-- `src/agents/chatbot.py` (Phase 1 LangChain chatbot)
-- `src/pipeline/orchestrator.py` (empty stub)
-- `src/query/base.py` (old duplicate of `executor/base.py`)
-- `src/api/v1/knowledge.py` (fake `/knowledge/rebuild` endpoint)
-- `src/config/agents/` (folder — prompts only used by deleted Phase 1 chatbot)
-**Renames:**
-- `src/agents/answer_agent.py` → `src/agents/chatbot.py`; `AnswerAgent` → `ChatbotAgent`; updated all import sites (`chat_handler.py`, `chat.py`)
-**Fixes + improvements:**
-- `src/agents/chat_handler.py` — `_get_document_retriever()` now returns `RetrievalRouter` (Redis-cached) instead of `DocumentRetriever` directly; retrieval-level cache restored.
-- `src/retrieval/router.py` — removed dead `db: AsyncSession` and `source_hint` parameters + `_UNSTRUCTURED_HINTS` constant from `retrieve()`. Cache key simplified.
-- `src/knowledge/processing_service.py` — removed dead `_build_csv_documents`, `_build_excel_documents`, `_profile_dataframe`, `_to_sheet_document` methods + `pandas` and `upload_parquet` imports.
-- `src/catalog/models.py` — added `top_values: list[Any] | None` to `ColumnStats`.
-- `src/catalog/introspect/tabular.py` — `_to_column` now populates `top_values` for columns with ≤10 distinct values; useful for query planner WHERE clause generation.
-- `main.py` — replaced deprecated `@app.on_event("startup")` with `lifespan` context manager; removed `knowledge_router`.
----
-## What just shipped (KM-557 — DB owner)
-After lead review of the catalog ingestion cost: dropped LLM enrichment,
-renamed the storage table, and exposed a lightweight index endpoint for
-the upcoming catalog refresher.
-**Files deleted**:
-- `src/catalog/enricher.py` — entire CatalogEnricher + EnrichmentResponse + apply_descriptions removed
-- `src/config/prompts/catalog_enricher.md` — dead prompt
-- `tests/catalog/test_enricher.py` — replaced by `test_render.py`
-**Files added**:
-- `src/catalog/render.py` — new home for `render_source` (the only piece of the old enricher still needed; consumed by `query/planner/prompt.py`)
-- `src/api/v1/data_catalog.py` — `GET /api/v1/data-catalog/{user_id}` returns `list[CatalogIndexEntry]`
-- `tests/catalog/test_render.py` — 5 tests (same coverage as the old render block)
-**Files modified**:
-- `src/db/postgres/models.py` — `__tablename__ = "data_catalog"` (was `"catalogs"`). Class name unchanged
-- `src/pipeline/structured_pipeline.py` — `StructuredPipeline(validator, store)` (was `(enricher, validator, store)`); pipeline is now `introspect → merge → validate → upsert`; `default_structured_pipeline()` no longer constructs an enricher
-- `src/pipeline/triggers.py` — docstrings updated; `on_catalog_rebuild_requested` docstring rewritten for the refresher use case
-- `src/query/planner/prompt.py` — import now `from ...catalog.render import render_source`
-- `src/catalog/introspect/{base,database,tabular}.py` — docstring scrubs (no behavior changes)
-- `src/models/api/catalog.py` — added `CatalogIndexEntry`; simplified `CatalogRebuildResponse` to `sources_rebuilt`
-- `main.py` — registered `data_catalog_router`
-- `src/security/README.md` — one stale wording fix
-**No migration**: the `data_catalog` table is created from scratch on first `init_db()`. The old `catalogs` table was never deployed against production data, so no rename SQL is needed.
-**Tests**: all 4 `test_structured_pipeline.py` tests reworked to construct `StructuredPipeline(validator=, store=)` without `enricher`. 5 `test_render.py` tests cover render_source standalone.
-**Lint**: `ruff check` clean on modified Phase 2 paths.
-**Open follow-ups left for the lead**:
-- `on_catalog_rebuild_requested` body — the refresher will iterate the index endpoint and call this trigger per source
-- `api/v1/db_client.py` `/ingest` still doesn't call `on_db_registered` — same blocker as before, untouched by KM-557
----
-## What just shipped (2026-05-11 — retrieval migration + bug fixes)
-**Files implemented / migrated**:
-- `src/retrieval/base.py` — `RetrievalResult` dataclass + `BaseRetriever` ABC (was in `src/rag/base.py`)
-- `src/retrieval/document.py` — full `DocumentRetriever` migrated from `src/rag/retrievers/document.py`; all retrieval methods (MMR/cosine/euclidean/inner_product/manhattan). Tabular file types filtered out from results.
-- `src/retrieval/router.py` — `RetrievalRouter` (Redis-cached, unstructured-only). `invalidate_cache(user_id)` clears all `retrieval:{user_id}:*` keys.
-**Deleted** (no longer used):
-- `src/rag/` — entire folder (base.py, retriever.py, router.py, retrievers/)
-- `src/tools/` — entire folder (search.py was the only real file; only called by deleted rag/ router)
-**Bug fixes**:
-- `src/pipeline/document_pipeline.py` — `retrieval_router.invalidate_cache(user_id)` called after `process()` and `delete()`. Redis failure is caught and logged (does not fail the document op).
-- `src/pipeline/document_pipeline.py` — CSV/XLSX now skips `knowledge_processor` (vector store). Tabular files go to catalog only; no duplicate embeddings.
-- `src/pipeline/triggers.py` — `on_document_uploaded` implemented (was `raise NotImplementedError`).
-- `src/agents/chat_handler.py` — `_normalize_chunks` now handles `RetrievalResult` objects. Previously they were silently dropped, causing empty context for unstructured queries through ChatHandler.
-**Import updates** (all changed from `src.rag.*` → `src.retrieval.*`):
-- `src/api/v1/chat.py`, `src/query/base.py`, `src/query/query_executor.py`, `src/query/executors/db_executor.py`, `src/query/executors/tabular.py`
----
-## What shipped previously (PR2b/4/5/6/7-bundle — DB owner solo, teammate reviews)
-**Files implemented**:
-- `src/agents/orchestration.py` — `OrchestratorAgent.classify(message, history) → IntentRouterDecision`. Pydantic model for structured output. History-aware query rewriting. Phase 1 filename + class name preserved; body fully rewritten for Phase 2.
-- `src/agents/answer_agent.py` — `AnswerAgent.astream(...)` streams answer tokens; accepts `QueryResult` and/or `list[DocumentChunk]`. Renames to `chatbot.py` in cleanup PR.
-- `src/agents/chat_handler.py` — `ChatHandler.handle(message, user_id, history)` returns `AsyncIterator[dict]` of `intent` / `chunk` / `done` / `error` SSE events. All deps injectable; lazy default builders.
-- `src/query/planner/prompt.py` — `render_catalog(catalog)` + `build_planner_prompt(question, catalog, previous_error)`. Reuses `catalog.enricher.render_source` for consistency across LLM call sites.
-- `src/query/planner/service.py` — `QueryPlannerService.plan(question, catalog, previous_error)` Azure OpenAI structured output → `QueryIR`.
-- `src/query/executor/dispatcher.py` — `ExecutorDispatcher.pick(ir) → BaseExecutor` by `source.source_type`. Lazy executor imports + per-source-type cache.
-- `src/query/service.py` — `QueryService.run(user_id, question, catalog) → QueryResult`. Plan→validate→retry-on-failure (max 3)→dispatch→execute. Catches NotImplementedError from TabularExecutor placeholder gracefully.
-**Prompts written** (filled in placeholders):
-- `src/config/prompts/intent_router.md`
-- `src/config/prompts/query_planner.md`
-- `src/config/prompts/chatbot_system.md`
-- `src/config/prompts/guardrails.md`
-**Tests added** (46 new — total now 146 + 2 skipped):
-- `tests/agents/test_intent_router.py` (4)
-- `tests/agents/test_answer_agent.py` (12)
-- `tests/agents/test_chat_handler.py` (6)
-- `tests/query/planner/test_prompt.py` (7)
-- `tests/query/planner/test_service.py` (3)
-- `tests/query/executor/test_dispatcher.py` (5)
-- `tests/query/test_service.py` (8)
-- `tests/query/planner/test_golden_questions.py` (3 — skipped by default; eval harness scaffold)
-**Lint**: `ruff check` clean on all Phase 2 paths. Phase 1 files have pre-existing E501/S608 issues — out of scope for this PR.
-**Placeholders / blockers for teammate** (status as of DB owner's commit, before merge):
-- `src/query/executor/tabular.py` (TAB) — DB owner's note: "still raises NotImplementedError". **Post-merge**: TAB shipped this in PR3-TAB; dispatcher now routes to the real `TabularExecutor`. The `NotImplementedError` catch in `QueryService` stays as a safety net.
-- `src/retrieval/document.py` — **implemented** (2026-05-11). Full `DocumentRetriever` migrated from `src/rag/retrievers/document.py`; supports MMR/cosine/euclidean/manhattan/inner_product. `_normalize_chunks` in `chat_handler.py` now handles `RetrievalResult` → `DocumentChunk` conversion correctly.
-- `src/api/v1/chat.py` (Phase 1) — NOT touched. Cleanup PR rewires the SSE endpoint to call `ChatHandler.handle(...)`.
-- `src/api/v1/db_client.py` (Phase 1) — NOT touched. Cleanup PR rewires `/database-clients/{id}/ingest` to call `pipeline.triggers.on_db_registered`.
----
-## What shipped previously (PR3-TAB — TAB owner)
-**Files implemented**:
-- `src/query/compiler/pandas.py` — `PandasCompiler` + `CompiledPandas(apply, output_columns)` dataclass. Pure helper functions (easier to test in isolation): `_apply_filters` (all 12 ops, `_like_to_regex` for LIKE), `_apply_select` (column pick + rename), `_apply_agg` (scalar + group_by via `pd.concat` of Series → `reset_index`), `_apply_orderby` (alias-aware via `_resolve_order_col`). Closure captures all IR fields explicitly so `apply(df)` is self-contained.
-- `src/query/executor/tabular.py` — `TabularExecutor` with injectable `fetch_blob` (same testability pattern as `TabularIntrospector`). Resolves Parquet blob path from `az_blob://{uid}/{did}` + table: single-table → `{uid}/{did}.parquet`, multi-table → `{uid}/{did}__{table.name}.parquet`. Runs compile → download → `asyncio.to_thread(_load_and_apply)` → 10k hard cap. Never raises; errors populate `QueryResult.error`. Uses `compiled.output_columns` for column labels (safe on empty DataFrame).
-**Tests added** (55 new — total suite was 86 all passing at PR3-TAB time):
-- `tests/unit/query/compiler/test_pandas_compiler.py` — 43 tests across all 12 filter ops (including `is_null`, `not_in`, `like`, `between`), all 6 agg fns, group_by, order_by asc/desc, limit-after-order, alias round-trip, empty DataFrame, error paths.
-- `tests/unit/query/executor/test_tabular_executor.py` — 12 tests: `_resolve_blob_name` (single/multi-table, bad prefix), happy-path `QueryResult` shape (columns, rows, backend, truncated, source_id), wrong source_type → error, blob fetch failure → error, unknown source → error.
-**Lint**: `ruff check` clean on both files.
----
-## What shipped previously (PR1-tab — TAB owner)
-**Files implemented**:
-- `src/catalog/introspect/tabular.py` — `TabularIntrospector` reads original blob (CSV/XLSX/Parquet), profiles each column (dtype, stats, sample values), runs PIIDetector. For XLSX: one `Table` per sheet (`Table.name = sheet_name`); for CSV/Parquet: one `Table` (`Table.name = filename stem`). `fetch_doc`/`fetch_blob` are constructor-injectable for unit tests — no `Settings` or DB required at import time.
-- `src/pipeline/triggers.py` — `on_tabular_uploaded` wired (mirrors `on_db_registered` pattern).
-**Tests added** (31 new):
-- `tests/unit/catalog/test_introspect_tabular.py` — CSV / XLSX / Parquet shapes, per-column stats, nullable detection, PII name + value matching, sample capping, all error paths. Pure Python, no network I/O.
-**Executor contract note**: introspector downloads the *original* blob for schema reading. The tabular executor (PR3-TAB) downloads *Parquet* blobs for query execution. For CSV/Parquet sources (single table), the executor must call `parquet_blob_name(uid, did, sheet_name=None)`; for XLSX (multi-table), `parquet_blob_name(uid, did, table.name)`.
----
-## What shipped previously (PR3-DB — DB owner)
-**Files implemented**:
-- `src/query/compiler/sql.py` — `SqlCompiler` for Postgres dialect; `CompiledSql(sql, params)` dataclass with `params: dict[str, Any]` (changed from `list`); supports all 12 whitelisted filter ops, all 6 aggs, alias-aware order_by; `_qident` escapes embedded double-quotes
-- `src/query/executor/db.py` — `DbExecutor` with sqlglot SELECT-only guard, Postgres session-level read-only + 30s `statement_timeout`, `asyncio.wait_for` backstop, 10k row hard cap; rejects non-`schema` source_type and `dbclient://` URI mismatch; never raises (populates `QueryResult.error`)
-**Files extended**:
-- `src/query/compiler/pandas.py` — fixed pre-existing UP035 (Callable import)
-- `pyproject.toml` — added `S608` to `tests/**` ruff ignore (false positive: tests assert literal SQL strings)
-**Tests added** (36 new, all passing — total now 100):
-- `tests/query/compiler/test_sql.py` — every filter op, every agg, count(*), count_distinct, order_by alias vs column, multi-filter AND, identifier quoting escape, error paths
-**Lint**: `ruff check` clean on Phase 2 paths.
-**Hand-off note for teammate**: `CompiledSql.params` is now `dict[str, Any]` not `list`. The pandas compiler will follow the same convention (or document its own) — coordinate when PR3-TAB lands.
----
-## What shipped previously (PR2a — DB owner)
-**Files implemented**:
-- `src/catalog/enricher.py` — Azure OpenAI GPT-4o + structured output (`EnrichmentResponse`), `render_source` (reusable by planner prompt later), `apply_descriptions` merger, injectable `structured_chain` for tests
-- `src/pipeline/structured_pipeline.py` — `StructuredPipeline` orchestrator + `default_structured_pipeline()` factory with lazy production-dep imports
-- `src/pipeline/triggers.py` — `on_db_registered` wired; tabular/document/rebuild stubs preserved with implementation notes
-**Files extended**:
-- `src/catalog/models.py` — added `ForeignKey` model, `Table.foreign_keys: list[ForeignKey] = []`
-- `src/catalog/introspect/database.py` — `_extract_foreign_keys` populates `Table.foreign_keys` from extractor data
-- `src/config/prompts/catalog_enricher.md` — full system prompt with style rules and one few-shot example
-**Tests added** (14 new, all passing — total now 64):
-- `tests/catalog/test_enricher.py` — render / apply / end-to-end with fake chain (10 tests)
-- `tests/pipeline/test_structured_pipeline.py` — orchestration with stub deps (4 tests)
-**Lint**: `ruff check` clean on all Phase 2 paths. Phase 1 files (`pipeline/db_pipeline/`, `pipeline/document_pipeline/`) have pre-existing ruff issues — out of scope for this PR.
----
-## What shipped previously (PR1 — DB owner's first chunk)
-**Files implemented** (was `NotImplementedError`):
-- `src/catalog/pii_detector.py`, `src/catalog/validator.py`, `src/catalog/store.py`, `src/catalog/reader.py`
-- `src/catalog/introspect/database.py` (FK extraction added in PR2a)
-- `src/query/ir/validator.py`
-**Files extended**:
-- `src/query/ir/operators.py` — `TYPE_COMPATIBILITY` matrix
-- `src/catalog/models.py` — `location_ref` URI-scheme docstring
-- `src/db/postgres/models.py` — `Catalog` SQLAlchemy table; `init_db.py` imports it
-**Tests**: 50 unit tests + 1 integration (gated on `RUN_INTEGRATION_TESTS=1`).
-**Reused Phase 1 utilities** (cleanup deferred):
-- `src/database_client/database_client_service.py:get`
-- `src/utils/db_credential_encryption.py:decrypt_credentials_dict`
-- `src/pipeline/db_pipeline/db_pipeline_service.py:engine_scope`
-- `src/pipeline/db_pipeline/extractor.py:get_schema/profile_column/get_row_count`
----
-## Open contract items (not yet locked)
-- **Joins in IR** — currently single-table only (ARCHITECTURE.md §7); DB owner accepted the constraint for v1, will revisit in PR3 if it's blocking real queries
-- **`updated_at` on Source vs `generated_at` on Catalog** — Pydantic models have both; introspector sets per-Source; CatalogStore preserves both
-- **Catalog refresh trigger** (open question §3) — default policy is rebuild-on-upload-or-connect; auto-refresh deferred
-- **Unstructured catalog entries** (open question §2) — currently empty filter for `source_hint="unstructured"`; revisit when adding doc descriptions
-- **PII handling for `sample_values`** (open question §5) — currently nulls them out (skip); mask/synthesize deferred
-- **Dialect priority for SQL compiler** — PR3 will land Postgres first, MySQL second; BigQuery/Snowflake/SQL Server later
----
-## How to update this file
-When a PR lands:
-1. Flip status from `[ ]` or `[~]` to `[x]`
-2. Add a short note (file paths, scope cuts, surprises)
-3. Bump "Last updated" at the top
-4. If a new contract decision lands, move it from "Open contract items" to the relevant inline note
-When opening a PR:
-1. Flip status to `[~]` and add yourself as the active owner in the PR row
-2. Don't promise items in the PR description that aren't in the table
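
Reviewer sketch (not part of the diff): the PR3-TAB notes in the removed PROGRESS.md describe how the tabular executor maps an `az_blob://{uid}/{did}` reference to a Parquet blob name. A minimal illustration of that rule follows. The function name `resolve_blob_name`, the `multi_table` flag, and the use of `ValueError` are assumptions made for illustration, not the deleted module's actual code.

```python
from __future__ import annotations

# Hypothetical constant: the URI scheme quoted in the PR3-TAB notes.
AZ_BLOB_PREFIX = "az_blob://"


def resolve_blob_name(location_ref: str, table_name: str, multi_table: bool) -> str:
    """Map ``az_blob://{uid}/{did}`` plus a table to a Parquet blob name.

    Single-table sources (CSV/Parquet) -> ``{uid}/{did}.parquet``
    Multi-table sources (XLSX sheets)  -> ``{uid}/{did}__{table_name}.parquet``
    """
    if not location_ref.startswith(AZ_BLOB_PREFIX):
        # Assumption: a bad prefix is reported as ValueError; the notes only
        # say the executor turns such failures into QueryResult.error.
        raise ValueError(f"unsupported location_ref: {location_ref!r}")

    parts = location_ref[len(AZ_BLOB_PREFIX):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected az_blob://<uid>/<did>, got {location_ref!r}")

    uid, did = parts
    if multi_table:
        return f"{uid}/{did}__{table_name}.parquet"
    return f"{uid}/{did}.parquet"
```

For example, `resolve_blob_name("az_blob://u1/d9", "Sheet1", multi_table=True)` yields `u1/d9__Sheet1.parquet`. A caller that must never raise, as the executor notes require, would wrap this call and copy the message into its error field.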

PROJECT_BRD.md ADDED

	@@ -0,0 +1,150 @@

+# Data Eyond — Python Agentic Service: Business Requirements & Design (BRD)
+**Status:** draft for review · **Date:** 2026-06-26 · **Branch:** `pr/4`
+**Audience:** Harry (Go gateway) + leads/stakeholders.
+**Scope:** the Python **agentic LLM service** (`Agentic-Service-Data-Eyond-Catalog`) only — its
+requirements, capabilities, architecture, data, and integration contract.
+**Companions (source of truth, not duplicated here):** [REPO_STATUS.md](REPO_STATUS.md) (current
+built state) · [API_ENDPOINTS.md](API_ENDPOINTS.md) (FE-callable API) · [DEV_PLAN.md](DEV_PLAN.md)
+(in-flight plan). This BRD synthesizes those into a stakeholder-facing document; convert to PDF/Word
+for distribution.
+---
+## 1. Purpose & scope
+Data Eyond is an **"AI data scientist"** for business analytics, modelled on **CRISP-DM** (Business
+Understanding → Data Understanding → Preparation → Modeling → Evaluation → Deployment). A user sets a
+goal, connects data (databases or files), asks natural-language analytical questions, and receives
+CRISP-DM-structured answers that can be exported as a versioned **report**. The aim is a *"junior data
+scientist that hands back a decision-ready deliverable,"* not a *"chatbot over a database."*
+This document covers the **Python service** — the agentic reasoning layer. It does **not** specify the
+Go gateway or the React frontend except at their integration boundaries (§9, §11).
+## 2. Business context & objectives
+- **Target users:** executives doing self-serve deep-dives; analysts offloading routine work.
+- **Value:** turn a business question + connected data into auditable, CRISP-DM-structured findings
+  and a formal report, without the user writing SQL or code.
+- **Objectives:** (a) accurate, grounded analysis over the user's own data; (b) a decision-ready,
+  versioned report artifact; (c) safe, read-only access to user data; (d) a clean service contract the
+  Go gateway can integrate against.
+## 3. Stakeholders & actors
+| Actor | Role |
+|---|---|
+| End user (exec/analyst) | Defines the analysis goal, asks questions, generates reports (via the FE) |
+| Frontend (React/Vite) | Talks to Go for everything; to Python only for chat streaming |
+| Go gateway (`Orchestrator-Agent-Service`) | Auth/JWT, rooms, documents, DB-credential storage, catalog ingestion, **all DB migrations**, and now all analysis-state writes |
+| Python agentic service (this repo) | Router, skills, slow analytical path, structured query engine, RAG, report generation |
+| Harry | Owns the Go gateway + dedorch DB migrations |
+## 4. Solution overview
+Request flow is **FE → Go → Python**; the FE calls Python directly only for chat streaming. The Python
+service is a **FastAPI** app that classifies each user message and dispatches it to the right
+capability, streaming results back over SSE. Heavy analysis runs through a deterministic **slow path**
+(plan → execute → assemble) whose structured output is persisted and later rendered into reports.
+## 5. Functional requirements (capabilities)
+| ID | Capability | Description |
+|---|---|---|
+| FR-1 | **Intent routing** | One GPT-4o call classifies each message into one of 5 intents — `chat`, `help`, `check`, `unstructured_flow`, `structured_flow` — with history-aware query rewriting (EN/ID). |
+| FR-2 | **Help skill** | State-aware, next-step guidance (LLM, streamed); only offers actions the current state allows (e.g., a report only when one is generatable). |
+| FR-3 | **Check skill** | No-LLM inventory of available structured data + uploaded documents. |
+| FR-4 | **Structured analysis (slow path)** | Planner → TaskRunner → Assembler: a static DAG of tool-call chains, degrade-and-continue execution, narrative authored by one LLM call; produces a structured run record. |
+| FR-5 | **Structured query engine** | Catalog-driven JSON IR → deterministic SQL/pandas compiler → read-only executor, with single-level FK joins (DB sources). |
+| FR-6 | **Unstructured RAG** | Retrieval over PGVector document chunks, answered by the chatbot. |
+| FR-7 | **Analytics tools** | Composite `analyze_*` (descriptive, aggregate, correlation, trend) over data-access tools (`check_*`, `retrieve_*`). |
+| FR-8 | **Report generation** | Deterministic assembly of findings/EDA/limitations/method from persisted run records + one LLM call for the executive summary; **versioned**, formal markdown. |
+| FR-9 | **Analysis sessions** | One session = one analysis = one chat room (`analysis_id == room_id`); per-analysis data-source binding. |
+**Goal capture (post-2026-06-24 pivot):** the analysis goal is **two user-entered fields** —
+`objective` + `business_questions` — captured at onboarding, **both mandatory, no agent validation**.
+The former agent-validated "problem statement" + its gate are removed.
+## 6. Analysis & report lifecycle
+1. **Create analysis** (via Go) — session row + chat room + chosen data-source bindings; goal =
+   `objective` + `business_questions`.
+2. **Ask questions** — `POST /chat/stream`; the router dispatches; `structured_flow` questions run the
+   slow path and **persist one `report_inputs` row per run** (the report's source of truth).
+3. **Generate report** — the report skill reads the session's `report_inputs`, assembles the structured
+   sections + an executive summary, and persists an immutable **versioned** report (markdown).
+4. **Read reports** — list versions / fetch a version.
+> Reports are **records-based** (never from chat history) and require the slow path to have run
+> (`enable_slow_path=true`) so records exist.
+## 7. System architecture (subsystems)
+FastAPI + async SQLAlchemy + LangChain (Azure GPT-4o) + Redis + Azure Blob + PGVector. Key subsystems
+(detail in REPO_STATUS §9):
+- **Router** (`agents/orchestration.py`) — 5-intent classifier.
+- **Skills** (`agents/handlers/`) — `help` (LLM), `check` (no-LLM).
+- **Slow path** (`agents/slow_path/` + `agents/planner/`) — Planner, TaskRunner, Assembler.
+- **Structured query engine** (`query/`) — IR validate → compile → read-only execute (never raises).
+- **Report** (`agents/report/`) — generator, store (advisory-locked versioning), readiness floor.
+- **Observability** — Langfuse tracing (PII-masked); Redis caching; pooled DB engines.
+## 8. Data model
+SQLAlchemy models in `src/db/postgres/models.py` (detail in REPO_STATUS §8). The service is moving to
+the shared **dedorch** DB (Go owns migrations; Python is consumer-only — §11).
+| Table | Purpose | Owner |
+|---|---|---|
+| `users` | accounts (incl. `fullname` for report authorship) | Go |
+| `analyses` *(plural)* | per-analysis session state: `objective`/`business_questions` (pivot), `user_id`, `status`, `data_bind`(+version), `report_collection`, `report_id` | Go (dedorch) |
+| `analyses_messages` | the analysis chat room (user Q + agent A) — replaces deprecated `chat_messages`/`rooms` | Go (dedorch) |
+| `report_inputs` | one jsonb row per slow-path run — the report's source of truth (was `analysis_records`) | **Python** (schema handed to Go) |
+| `reports` | versioned report artifacts (markdown) | Go (dedorch) |
+| `data_sources` | per-analysis source bindings | Go (dedorch) |
+| `documents`, `databases`, `data_catalog` | uploads, DB credentials (Fernet), per-user catalog | Go ingestion |
+| `langchain_pg_embedding` | PGVector document chunks | Go ingestion |
+## 9. API surface (FE-callable)
+Full contract + request/response examples in [API_ENDPOINTS.md](API_ENDPOINTS.md). The FE-callable
+surface is **4 things**:
+1. **`call_agent`** — `POST /api/v1/chat/stream` (SSE).
+2. **`list_skills`** — `GET /api/v1/tools` (slash-command catalog; cacheable).
+3. **skill: `help`** — via `call_agent` (router intent; no dedicated endpoint).
+4. **skill: `report`** — `POST /api/v1/report` + `GET` list/version.
+`analysis_id == room_id`. Auth is terminated at Go; Python trusts `user_id`/`room_id`.
+## 10. Non-functional requirements
+| Area | Requirement / mechanism |
+|---|---|
+| **Security — data access** | All structured queries are read-only: IR validation + SQL compiler whitelist + sqlglot SELECT-only guard + read-only session + LIMIT/timeout. DB credentials are Fernet-encrypted with an owner check. |
+| **Security — PII** | PII columns carry no sample values into prompts; Langfuse masks PII on assembler/chatbot spans. |
+| **Reliability** | Never-throw seams across tools/query/executors/state/report — failures degrade to soft output rather than crashing a turn. |
+| **Performance** | Redis response cache (stateless `chat` only) + retrieval cache; pooled DB engines + speculative prewarm; warm Azure clients per process. |
+| **Observability** | Langfuse: one trace per request (router/planner/assembler/chatbot + tool spans), tokens + latency. |
+| **Portability** | Runs on HuggingFace Spaces (Linux) and Windows (`run.py` sets the selector event-loop policy for psycopg3 async). |
+## 11. Integrations & dependencies
+- **Two-repo boundary:** Python is edited independently; Go + FE are reference-only. Python reads/writes
+  shared Postgres, reads Azure Blob (Parquet for tabular sources), uses Redis.
+- **dedorch migration:** Python is moving from the `dataeyond` DB to **dedorch**. **Go owns all
+  migrations; Python is consumer-only** — if Python needs a table, it hands Go the schema. Table names
+  are **plural** (`analyses`, `analyses_messages`); `rooms`/`chat_messages` are deprecated there.
+- **State writes via Go:** all analysis-state writes move behind Go; Python's per-turn state access
+  becomes a read-only get (in progress).
+- **External services:** Azure OpenAI (GPT-4o + embeddings), Azure Blob, Postgres (+ PGVector), Redis,
+  Langfuse.
+## 12. Constraints & assumptions
+- The slow path must be enabled (`enable_slow_path=true`) for reports to have content.
+- `report_inputs` is Python-owned but its schema is provided to Go so the dedorch migration creates it
+  (so it survives the `SKIP_INIT_DB` cutover).
+- Charts and images are **out of scope for now** — reports are markdown (tables/bold/italic/separators);
+  charts (Plotly JSON) and images (table + bucket) are deferred.
+- The frontend has no dedicated UI designer; UI is being researched in parallel.
+## 13. Open items & roadmap
+Tracked in [DEV_PLAN.md](DEV_PLAN.md) §4. Headlines: finish Go-side state ownership (#7/#18), the
+dedorch `analyses` migration (#3, mostly done), HF deploy + playground test (#13), chat-path migration
+to `analyses_messages` (#25), and the deferred charts/images/UI work (#26/#27/#28).
+## 14. Glossary
+- **Slow path** — the deterministic Planner→TaskRunner→Assembler analytical pipeline.
+- **`report_inputs`** — the jsonb table of slow-path run records the report reads (formerly `analysis_records`).
+- **dedorch** — the shared Postgres DB the service is migrating to; Go owns its migrations.
+- **CRISP-DM** — the cross-industry standard data-mining process the analysis is structured around.
+- **`analysis_id == room_id`** — one analysis session is one chat room, identified by the same id.

REPO_CONTEXT.md DELETED

@@ -1,494 +0,0 @@
-# Repo Context — Agentic Service Data Eyond Catalog
-Orientation file for future Claude Code sessions. Cross-reference `ARCHITECTURE.md` for the full design rationale and decision log.
----
-## Product vision — Data Eyond, your AI data scientist
-Data Eyond is positioned as an *AI data scientist* that supports business analytics. It is built around the **CRISP-DM** framework (Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment) — the agent works through data problems the way a real analyst would, not as a one-shot Q&A bot.
-**Target users:**
-- **Executives** — deep-dive into their own data and extract insight to drive business decisions without needing a data team in the loop.
-- **Data analysts / scientists** — offload routine analysis so they can focus on heavier work.
-**Envisioned user flow:**
-1. **Discovery interview** — a short conversation with a Data Eyond *interview agent* that draws out goal, business context, and what the user is actually trying to learn (CRISP-DM Business Understanding).
-2. **Connect data** — DB connection or file upload (DB, CSV, XLSX, Parquet, documents).
-3. **Ask Data Eyond** — natural-language analytical question.
-4. **CRISP-DM-structured analytical response** — exportable as a **presentation deliverable** or a **notebook-style report**.
-North star: less "chatbot over a database", more "junior data scientist that hands back a polished, decision-ready deliverable."
-The current repo (Phase 2, below) is the *foundation* — IntentRouter → QueryPlanner → Executor → ChatbotAgent gives us a reliable structured-query spine. The next evolution is the agentic layer that turns this into an end-to-end CRISP-DM workflow (see *Roadmap — agentic evolution* further down).
----
-## TL;DR
-FastAPI multi-agent backend for data analysis. Users upload documents and register databases / tabular files; they ask natural-language questions and get answers grounded in their data, streamed via SSE.
-The architecture has two paths:
-- **Unstructured** (PDF, DOCX, TXT) — dense similarity over prose chunks (PGVector).
-- **Structured** (databases, XLSX, CSV, Parquet) — a per-user **data catalog** describes what tables/columns exist; an LLM produces a **JSON IR** of intent; a deterministic Python compiler turns the IR into SQL or pandas; the executor runs it.
-The LLM produces *intent*, not query syntax. Deterministic code does the rest.
-The Phase 2 end-to-end flow is **wired and runnable** as of 2026-05-12. See *Implementation status* below for the per-file matrix. `PROGRESS.md` is the authoritative line-by-line tracker; this file is the orientation.
----
-## Stack
-- Python 3.12, FastAPI 0.115, uvicorn, sse-starlette
-- Async SQLAlchemy 2.0 + asyncpg (Postgres), psycopg3 (PGVector multi-statement workaround)
-- LangChain 0.3 + langchain-postgres (PGVector) + langchain-openai (Azure OpenAI GPT-4o + embeddings)
-- LangGraph 0.2 + langgraph-checkpoint-postgres
-- Redis 5 (response + retrieval cache)
-- Azure Blob Storage (uploads + Parquet)
-- pandas, pyarrow, polars-ready (deferred), sqlglot, pydantic v2, structlog, slowapi, langfuse
-- presidio-analyzer + spaCy `en_core_web_lg` (PII), pytesseract + pdf2image (PDF OCR)
-- DB connectors: psycopg2, pymysql, pymssql, sqlalchemy-bigquery, snowflake-sqlalchemy
-Run: `uv run --no-sync uvicorn main:app --host 0.0.0.0 --port 7860`. On Windows use `uv run --no-sync python run.py` (sets `WindowsSelectorEventLoopPolicy` for psycopg3 async).
----
-## Top-level layout
-```
-main.py                — FastAPI app + middleware + router wiring + init_db() on startup
-run.py                 — Windows-safe local entry point
-ARCHITECTURE.md        — design intent (source of truth for shape + invariants)
-README.md
-Dockerfile             — python:3.12-slim, installs spaCy en_core_web_lg, tesseract, poppler
-pyproject.toml / uv.lock
-scripts/               — backfill scripts (build_initial_catalogs, enrich_all_sources)
-src/                   — all application code
-```
----
-## src/ map
-### Core data shapes (only files with real content)
-| Path | Role |
-|---|---|
-| `catalog/models.py` | Pydantic: `Catalog → Source[] → Table[] → Column[]` |
-| `query/ir/models.py` | `QueryIR` (select / filters / group_by / order_by / limit) |
-| `query/ir/operators.py` | `ALLOWED_FILTER_OPS`, `ALLOWED_AGG_FNS`, `LIMIT_HARD_CAP=10000` |
-| `security/pii_patterns.py` | name patterns + email/phone regex for PII detection |
-### Catalog — identity layer for structured sources (Cs ∪ Ct)
-| Path | Role |
-|---|---|
-| `catalog/introspect/base.py` | `BaseIntrospector.introspect(location_ref) -> Source` |
-| `catalog/introspect/database.py` | `information_schema` + ~100 row sample → draft Source |
-| `catalog/introspect/tabular.py` | Parquet/CSV/XLSX header reader + sample (one Table per sheet for XLSX) |
-| `catalog/render.py` | renders a `Source` as the canonical text block consumed by the planner (KM-557; LLM enrichment removed — planner reads stats + samples directly) |
-| `catalog/validator.py` | invariants beyond Pydantic shape (unique IDs, FK refs) |
-| `catalog/store.py` | persist as Postgres `jsonb` row keyed by user_id (`get/upsert/delete`) — table `data_catalog` |
-| `catalog/reader.py` | load + filter catalog by source_hint (returns full catalog for ≤50 tables) |
-| `catalog/pii_detector.py` | flag PII columns at ingestion → suppresses `sample_values` |
-### Query — catalog-driven structured path
-| Path | Role |
-|---|---|
-| `query/service.py` | `QueryService.run(user_id, question, catalog) -> QueryResult` (top-level) |
-| `query/planner/service.py` | LLM call: question + catalog → QueryIR (structured output) |
-| `query/planner/prompt.py` | renders catalog into the planner prompt |
-| `query/ir/validator.py` | catalog-aware IR validation: column_ids exist, ops whitelisted, value_type matches data_type, limit ≤ cap |
-| `query/compiler/base.py` | `BaseCompiler.compile(ir) -> object` |
-| `query/compiler/sql.py` | IR → `(sql, params)`; identifiers from catalog, values parameterized |
-| `query/compiler/pandas.py` | IR → callable that runs against a DataFrame |
-| `query/executor/base.py` | `BaseExecutor.run(ir) -> QueryResult` (uniform across backends) |
-| `query/executor/db.py` | runs compiled SQL via asyncpg/pymysql in read-only txn (sqlglot second-line defence) |
-| `query/executor/tabular.py` | runs pandas/polars chain on a Parquet file (eager pandas → pyarrow pushdown → polars lazy by file size) |
-| `query/executor/dispatcher.py` | picks DB vs Tabular executor based on `source.source_type` of the IR's source |
-### Retrieval — unstructured path (Cu)
-| Path | Role |
-|---|---|
-| `retrieval/document.py` | `DocumentRetriever` over PGVector chunks |
-| `retrieval/router.py` | dispatches the `unstructured` route (the `chat` and `structured` routes do not pass through here) |
-### Agents — the three LLM call sites
-| Path | Role |
-|---|---|
-| `agents/orchestration.py` | `OrchestratorAgent` — classifies message → `needs_search`, `source_hint ∈ {chat, unstructured, structured}`, `rewritten_query`. Filename + class name kept from Phase 1; body replaced with Phase 2 logic. Output model is `IntentRouterDecision` |
-| `agents/chatbot.py` | `ChatbotAgent` — final answer formation (receives Cu chunks or QueryResult); SSE-streamed via `astream` |
-| `agents/chat_handler.py` | `ChatHandler` — top-level orchestrator; routes to chat / unstructured / structured and yields SSE-style `intent`/`chunk`/`done`/`error` events |
-(`QueryPlanner` is the third LLM call site, under `query/planner/`. The
-fourth — `CatalogEnricher` — was removed in KM-557; ingestion no longer
-makes any LLM calls.)
-### Pipelines — ingestion coordinators
-| Path | Role |
-|---|---|
-| `pipeline/structured_pipeline.py` | DB / tabular: introspect → merge → validate → store (no enrich step since KM-557) |
-| `pipeline/document_pipeline.py` | unstructured: extract → chunk → embed → PGVector. CSV/XLSX skip vector store (catalog only). Invalidates retrieval cache on process/delete. |
-| `pipeline/triggers.py` | event entry points called by API routes: `on_db_registered`, `on_tabular_uploaded`, `on_document_uploaded`, `on_catalog_rebuild_requested` |
-(`pipeline/orchestrator.py` was deleted in the Cleanup PR — it was a redundant stub; `StructuredPipeline` already takes the introspector at `run()` time.)
-### Security — cross-cutting
-| Path | Role |
-|---|---|
-| `security/auth.py` | bcrypt password hash/verify, JWT encode/decode, get_user |
-| `security/credentials.py` | Fernet encrypt/decrypt for stored DB credentials |
-| `security/pii_patterns.py` | (already listed) |
-### API + infra + config
-| Path | Role |
-|---|---|
-| `api/v1/*.py` | FastAPI routers — thin endpoints delegating to `pipeline/triggers` and `query/service` |
-| `models/api/{catalog,chat,document}.py` | request/response Pydantic models |
-| `db/postgres/connection.py` | two async engines: `engine` (app) and `_pgvector_engine` (PGVector) |
-| `db/postgres/init_db.py` | startup: creates `vector` extension, all tables, HNSW + GIN indexes |
-| `db/postgres/models.py` | SQLAlchemy app tables (users, rooms, chat messages, …) |
-| `db/postgres/vector_store.py` | shared PGVector instance (collection `documents` — written by Go ingestion service) |
-| `db/redis/connection.py` | async Redis client |
-| `storage/az_blob/az_blob.py` | Azure Blob async wrapper (uploads + Parquet) |
-| `middlewares/{cors,logging,rate_limit}.py` | CORS allow-all (POC), structlog JSON, slowapi |
-| `observability/langfuse/langfuse.py` | trace helper |
-| `config/settings.py` | pydantic-settings; `.env` uses double-underscore aliases |
-| `config/env_constant.py` | env file path constant |
-| `config/prompts/*.md` | prompt templates: `intent_router`, `query_planner`, `chatbot_system`, `guardrails` (KM-557 removed `catalog_enricher`) |
----
-## Core architectural decisions
-1. **Catalog as primary context, not retrieval.** For ≤50 tables (typical), the entire catalog is rendered into the planner prompt verbatim (~3–5k tokens). No vector search, no BM25, no top-k for structured data. Catalog-level retrieval (BM25 + table-level vectors with RRF) is the *deferred* upgrade for users with hundreds of tables.
-2. **JSON IR over raw SQL.** The planner LLM emits a Pydantic-validated intent, never a SQL string. The compiler is deterministic Python. Benefits: validatable before execution, dialect-portable (one IR → SQL of any dialect / pandas / polars), cheaper tokens, trivially testable without an LLM, and the LLM literally cannot emit invalid SQL syntax.
-3. **Deterministic compiler, not LLM SQL writer.** All actual query construction happens in pure code. Compiler bugs are reproducible and fixable. Same IR → same query.
-4. **Pipeline stage isolation.** Each stage (`IntentRouter`, `CatalogReader`, `QueryPlanner`, `IRValidator`, `QueryCompiler`, `QueryExecutor`, `ChatbotAgent`) is its own module with typed input and typed output. No god classes.
-5. **Minimal LLM surface.** Only three LLM call sites in the system (KM-557 dropped `CatalogEnricher` — ingestion is now LLM-free; the planner reads stats + sample rows + column names directly):
-   - `IntentRouter` — once per user message
-   - `QueryPlanner` — once per structured query
-   - `ChatbotAgent` — once per answer (formatting)
-6. **Three-way routing**: `chat` / `unstructured` / `structured`. The router commits to one path. Cross-source questions ("compare DB sales vs uploaded customer file") are handled inside the structured path because the planner sees Cs ∪ Ct in one prompt. **DB vs tabular is not a routing concern** — it's a per-source attribute (`source_type`) that only matters at execution time.
-7. **Stable IDs.** `source_id`, `table_id`, `column_id` are stable internal references. Renaming a column in the source DB does not invalidate cached IRs.
-8. **PII suppression at the boundary.** Columns flagged with `pii_flag=true` have `sample_values: null` — real PII never enters LLM prompts. Auto-detected at ingestion via name patterns + value regex (`security/pii_patterns.py`). When in doubt, flag — false positives cost nothing; false negatives leak data.
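The PII rule in decision 8 can be sketched as follows. This is a minimal illustration, not the contents of `security/pii_patterns.py`: the regexes, the `ColumnDraft` type, and the helper names are all hypothetical, and only the behavior (flag on name pattern or value regex, then null out `sample_values`) comes from the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration; the real ones live in security/pii_patterns.py.
PII_NAME_PATTERN = re.compile(r"(email|phone|ssn|passport|password|birth)", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_VALUE = re.compile(r"^\+?[\d\s().-]{7,20}$")


@dataclass
class ColumnDraft:
    """Hypothetical stand-in for a catalog Column during ingestion."""
    name: str
    sample_values: Optional[list]
    pii_flag: bool = False


def looks_like_pii(name: str, samples: list) -> bool:
    """Flag on a column-name match OR on any sample matching a value regex."""
    if PII_NAME_PATTERN.search(name):
        return True
    for value in samples:
        text = str(value)
        if EMAIL_VALUE.match(text) or PHONE_VALUE.match(text):
            return True
    return False


def suppress_pii(column: ColumnDraft) -> ColumnDraft:
    """Return a copy with pii_flag set and sample_values nulled when flagged."""
    samples = column.sample_values or []
    if looks_like_pii(column.name, samples):
        return ColumnDraft(name=column.name, sample_values=None, pii_flag=True)
    return ColumnDraft(name=column.name, sample_values=column.sample_values, pii_flag=False)
```

The phone regex is deliberately loose, in line with the "when in doubt, flag" stance: a false positive only removes sample values from the prompt.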
----
-## End-to-end flows
-### Ingestion (when user uploads a file or connects a DB)
-```
-source upload / DB connect
-    │
-    ├── unstructured (pdf/docx/txt)
-    │     → DocumentPipeline: extract → chunk → embed → PGVector
-    │
-    └── structured (DB schema or tabular file)
-          → introspect (information_schema or file headers + sample rows)
-          → CatalogValidator (Pydantic + unique-IDs + FK refs)
-          → CatalogStore.upsert(user_id jsonb row in `data_catalog`)
-```
-### Query (per user message)
-```
-user message
-    │
-    → Redis cache check (24h TTL)  ── miss ─→ continue
-    │
-    → IntentRouter LLM   →  needs_search? source_hint?
-    │
-    ├── chat          → ChatbotAgent → SSE stream
-    ├── unstructured  → DocumentRetriever (Cu) → ChatbotAgent → SSE stream
-    └── structured    →
-          CatalogReader.read(user_id, "structured")          # full Cs ∪ Ct
-              ↓
-          QueryPlanner LLM(question, catalog) → QueryIR
-              ↓
-          IRValidator.validate(ir, catalog)
-              (source_id ∈ catalog, table_id ∈ source, column_ids ∈ table,
-               ops/aggs whitelisted, value_type matches data_type, limit ≤ 10000)
-              fail → re-prompt planner with error context (max 3 retries)
-              ↓
-          ExecutorDispatcher.pick(ir)              # by source.source_type
-              ├─ DbExecutor       → SqlCompiler → sqlglot guard → asyncpg/pymysql
-              │                     (read-only txn, 30s timeout)
-              └─ TabularExecutor  → PandasCompiler → eager pandas (≤100 MB)
-                                    or pyarrow pushdown (100 MB–1 GB)
-                                    or polars lazy scan (>1 GB)
-              ↓
-          QueryResult
-              ↓
-          ChatbotAgent → SSE stream
-```
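The size-based branch under `TabularExecutor` in the diagram above could look roughly like this. The thresholds come from the diagram; the enum, the function name, and the use of decimal megabytes are assumptions, and elsewhere this file marks the pyarrow and polars branches as deferred.

```python
from enum import Enum

MB = 1_000_000  # assumption: decimal megabytes; the document does not say which unit is used
GB = 1_000 * MB


class TabularStrategy(Enum):
    """Hypothetical labels for the three execution modes named in the diagram."""
    EAGER_PANDAS = "eager_pandas"
    PYARROW_PUSHDOWN = "pyarrow_pushdown"
    POLARS_LAZY = "polars_lazy"


def pick_strategy(file_size_bytes: int) -> TabularStrategy:
    """Map a Parquet file size to an execution strategy."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if file_size_bytes <= 100 * MB:
        return TabularStrategy.EAGER_PANDAS
    if file_size_bytes <= GB:
        return TabularStrategy.PYARROW_PUSHDOWN
    return TabularStrategy.POLARS_LAZY
```

Keeping the choice in one pure function means the boundaries can be unit-tested without touching blob storage.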
----
-## Catalog schema (per-user `jsonb` row)
-```
-Catalog
-├── user_id, schema_version, generated_at
-└── sources[]
-    └── Source { source_id, source_type, name, description, location_ref, updated_at }
-        └── tables[]
-            └── Table { table_id, name, description, row_count, foreign_keys[] }
-                ├── columns[]
-                │   └── Column { column_id, name, data_type, description,
-                │                 nullable, pii_flag, sample_values[]|null, stats|null }
-                └── foreign_keys[]
-                    └── ForeignKey { column_id, target_table_id, target_column_id }
-```
-`source_type ∈ {schema, tabular, unstructured}`.
-`data_type ∈ {int, decimal, string, datetime, date, bool, json}`.
-`ForeignKey` references are within the SAME `Source` only; cross-source FKs are not modeled.
-Deferred Column fields (add when justified): `description_human`, `synonyms[]`, `tags[]`, `primary_key`, `unit`, `semantic_type`, `example_questions[]`, `schema_hash`, `enrichment_status`.
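The invariants that go beyond Pydantic shape (unique IDs, foreign keys resolving inside the same `Source`) can be sketched with plain dataclasses. The types below are simplified stand-ins for the schema above, `validate_source` is a hypothetical name, and treating `column_id` as unique across the whole source is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignKey:
    column_id: str
    target_table_id: str
    target_column_id: str


@dataclass
class Column:
    column_id: str
    name: str


@dataclass
class Table:
    table_id: str
    name: str
    columns: list = field(default_factory=list)
    foreign_keys: list = field(default_factory=list)


@dataclass
class Source:
    source_id: str
    tables: list = field(default_factory=list)


def validate_source(source: Source) -> list:
    """Return a list of error strings; an empty list means the source is valid."""
    errors = []
    columns_by_table = {}
    seen_columns = set()
    # Pass 1: collect IDs and report duplicates.
    for table in source.tables:
        if table.table_id in columns_by_table:
            errors.append(f"duplicate table_id: {table.table_id}")
        ids = columns_by_table.setdefault(table.table_id, set())
        for column in table.columns:
            if column.column_id in seen_columns:
                errors.append(f"duplicate column_id: {column.column_id}")
            seen_columns.add(column.column_id)
            ids.add(column.column_id)
    # Pass 2: every FK must start in its own table and land inside this source.
    for table in source.tables:
        own = columns_by_table[table.table_id]
        for fk in table.foreign_keys:
            if fk.column_id not in own:
                errors.append(f"{table.table_id}: FK column {fk.column_id} not in table")
            target = columns_by_table.get(fk.target_table_id)
            if target is None:
                errors.append(f"{table.table_id}: FK target table {fk.target_table_id} not in source")
            elif fk.target_column_id not in target:
                errors.append(f"{table.table_id}: FK target column {fk.target_column_id} not in {fk.target_table_id}")
    return errors
```

Returning a list of errors rather than raising on the first one lets an ingestion log show every problem in a single pass.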
----
-## JSON IR schema
-```jsonc
-{
-  "ir_version": "1.0",
-  "source_id":  "...",
-  "table_id":   "...",
-  "select": [
-    {"kind": "column", "column_id": "...", "alias": "..."},
-    {"kind": "agg",    "fn": "count|count_distinct|sum|avg|min|max",
-                       "column_id": "...?", "alias": "..."}
-  ],
-  "filters": [
-    {"column_id": "...",
-     "op":    "= | != | < | <= | > | >= | in | not_in | is_null | is_not_null | like | between",
-     "value": ...,
-     "value_type": "int|decimal|string|datetime|date|bool"}
-  ],
-  "group_by": ["column_id", ...],
-  "order_by": [{"column_id": "...", "dir": "asc|desc"}],
-  "limit": 100
-}
-```
-Single-table only in v1. `having`, `offset`, boolean filter trees, `distinct`, joins, window functions are deferred until user demand proves the limitation.
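To make the "IR in, parameterized SQL out" idea concrete, here is a small sketch of a deterministic compiler for a subset of the schema above. It is not the code in `query/compiler/sql.py`: the function names, the `$n` placeholder style, and the `column_names` lookup dict are assumptions, and `order_by`, `count_distinct`, and several filter ops are left out for brevity.

```python
COMPARISON_OPS = {"=", "!=", "<", "<=", ">", ">="}
AGG_SQL = {"count": "COUNT", "sum": "SUM", "avg": "AVG", "min": "MIN", "max": "MAX"}
LIMIT_HARD_CAP = 10000  # same cap the document lists for the IR


def quote_ident(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def compile_ir(ir: dict, table_name: str, column_names: dict) -> tuple:
    """Compile a single-table IR dict into (sql, params)."""
    params = []

    def col(column_id):
        # Identifiers come only from the catalog lookup; unknown IDs raise KeyError.
        return quote_ident(column_names[column_id])

    def bind(value):
        # Values are never inlined; each one becomes a numbered placeholder.
        params.append(value)
        return f"${len(params)}"

    select_parts = []
    for item in ir["select"]:
        if item["kind"] == "column":
            expr = col(item["column_id"])
        else:
            fn = AGG_SQL[item["fn"]]
            arg = col(item["column_id"]) if item.get("column_id") else "*"
            expr = f"{fn}({arg})"
        if item.get("alias"):
            expr += f" AS {quote_ident(item['alias'])}"
        select_parts.append(expr)

    where_parts = []
    for flt in ir.get("filters", []):
        left, op = col(flt["column_id"]), flt["op"]
        if op in COMPARISON_OPS:
            where_parts.append(f"{left} {op} {bind(flt['value'])}")
        elif op == "in":
            marks = ", ".join(bind(v) for v in flt["value"])
            where_parts.append(f"{left} IN ({marks})")
        elif op == "between":
            low, high = flt["value"]
            where_parts.append(f"{left} BETWEEN {bind(low)} AND {bind(high)}")
        elif op == "is_null":
            where_parts.append(f"{left} IS NULL")
        else:
            raise ValueError(f"unsupported op in this sketch: {op}")

    sql = f"SELECT {', '.join(select_parts)} FROM {quote_ident(table_name)}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    if ir.get("group_by"):
        sql += " GROUP BY " + ", ".join(col(c) for c in ir["group_by"])
    sql += f" LIMIT {min(int(ir.get('limit', LIMIT_HARD_CAP)), LIMIT_HARD_CAP)}"
    return sql, params
```

Because the function is pure, the same IR always yields the same SQL string and parameter list, which is what makes golden-fixture tests possible without an LLM in the loop.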
----
-## Implementation status
-**As of 2026-05-12 — Phase 2 end-to-end flow is wired.** `PROGRESS.md` has the per-PR line-item table; this section is the high-level snapshot. Stub files (`raise NotImplementedError`) are now the exception, not the rule.
-| Area | Status | Notes |
-|---|---|---|
-| Catalog Pydantic models | ✅ | `catalog/models.py` — incl. `ForeignKey`, `ColumnStats.top_values` |
-| JSON IR Pydantic models | ✅ | `query/ir/models.py` + `operators.py` (TYPE_COMPATIBILITY filled) |
-| Catalog ingestion — DB | ✅ | introspect → validate → upsert. `on_db_registered` wired; `/api/v1/db-clients/{id}/ingest` calls it |
-| Catalog ingestion — tabular | ✅ | CSV/XLSX/Parquet; `on_tabular_uploaded` wired into `/api/v1/document/process`. XLSX → one Table per sheet. CSV/XLSX skip vector store |
-| Catalog ingestion — unstructured | ✅ | `on_document_uploaded` implemented; full DocumentPipeline (extract → chunk → embed → PGVector) |
-| Catalog store / reader / validator / PII detector | ✅ | `data_catalog` jsonb table (renamed from `catalogs` in KM-557) |
-| LLM enrichment | ❌ removed (KM-557) | Cost cut — planner reads `column.stats` + `sample_values` + `top_values` + `column.name` directly. `catalog/render.py` keeps the source-rendering helper |
-| `IntentRouter` (lives as `OrchestratorAgent` in `agents/orchestration.py`) | ✅ | 3-way `source_hint`, history-aware query rewriting. Filename + class name kept from Phase 1; Phase 2 body |
-| `CatalogReader` | ✅ | Loads full catalog; filters by `source_hint` |
-| `QueryPlanner` LLM call | ✅ | Azure OpenAI structured output → `QueryIR`; supports retry with `previous_error` |
-| IR validator | ✅ | Catalog-aware; full rule set; descriptive errors |
-| SQL compiler (Postgres) | ✅ | All 12 filter ops, all 6 aggs, alias-aware order_by, parameterized values, quoted identifiers |
-| DbExecutor | ✅ | sqlglot SELECT-only guard, RO txn, `statement_timeout=30000`, 10k row cap, never raises |
-| Pandas compiler | ✅ | Same op coverage as SQL; pure module-level helpers |
-| TabularExecutor | ✅ | Parquet blob path resolution, `asyncio.to_thread`, 10k cap, never raises |
-| ExecutorDispatcher | ✅ | Routes by `source.source_type`; lazy imports + cache |
-| QueryService | ✅ | plan → validate → retry-on-fail (max 3) → dispatch → execute → `QueryResult` |
-| `ChatbotAgent` + prompt + guardrails | ✅ | Renamed from `AnswerAgent` in Cleanup PR. Guardrails appended to `chatbot_system.md` |
-| `ChatHandler` (top-level chat orchestrator) | ✅ | SSE events: `intent` / `chunk` / `done` / `error` |
-| `DocumentRetriever` + `RetrievalRouter` (Redis-cached) | ✅ | Migrated from `src/rag/` (now deleted). Mentor commit `61c746f` rewrote to raw SQL (pgvector `<=>` cosine, `<+>` manhattan) to dodge asyncpg type-mapping issues with Go-ingested schema. Methods reduced to `cosine` and `manhattan`. Collection: `documents`. |
-| `/api/v1/chat/stream` | ✅ | Rewired to `ChatHandler`; Redis cache + fast intent + history + message persistence remain in chat.py |
-| `/api/v1/db-clients/{id}/ingest` | ✅ | Calls only `on_db_registered`; Phase 1 dual-write removed |
-| `/api/v1/document/{upload,process,delete}` | ✅ | `/process` triggers `on_tabular_uploaded` for CSV/XLSX |
-| `GET /api/v1/data-catalog/{user_id}` | ✅ | Index endpoint (KM-557) |
-| `POST /api/v1/data-catalog/rebuild` | ✅ | Iterates sources, re-runs per-source trigger |
-| Credential encryption | ⚠️ stub | `security/credentials.py` not migrated; runtime reuses Phase 1 `utils/db_credential_encryption.py` |
-| Tests | ✅ 146+ unit | Compilers (DB 36, Pandas 43), validators, introspectors, agents, chat handler, dispatcher, planner |
-| Planner eval harness | 🟡 scaffold | 3 DB + 4 tabular golden cases. Gated on `RUN_PLANNER_EVAL=1`. Real Azure OpenAI passing |
-| E2E smoke tests | ❌ not started | Component-level orchestration is covered |
-| DB introspector unit test | ❌ deferred | Needs Postgres testcontainer |
-| Sources event in `/chat/stream` | ⚠️ emits `[]` | `ChatHandler` doesn't surface retrieval sources yet; same gap reflected in `save_messages` |
-**Deferred to later phases**: joins in IR, schema drift detection, hybrid catalog search (BM25 + RRF for 100+ table users), polars lazy scan for >1GB tabular files, MySQL/BigQuery/Snowflake SQL dialects, mask/synthesize PII strategies.
----
-## Team — division of work
-The service is built by two engineers; many modules are source-type-agnostic and shared.
-- **DB** owns SQL paths: introspection, SQL compiler, DB executor, credential storage.
-- **TAB** owns tabular paths: CSV/XLSX/Parquet introspection, pandas compiler, tabular executor, blob/Parquet plumbing.
-- **B** = both — shared contracts and source-type-agnostic plumbing. Pair-program or split with explicit hand-off.
-### Step-by-step ownership
-| # | Step | File / area | Owner | Notes |
-|---|---|---|---|---|
-| 0 | **Lock contracts before coding** | — | B | See "Decisions to lock" below; block until aligned |
-| 1 | Catalog Pydantic models | `catalog/models.py` | B | Already done; only touch if both agree |
-| 2 | IR Pydantic models | `query/ir/models.py` | B | Already done; joins/window fns require joint sign-off |
-| 3 | IR operator whitelists | `query/ir/operators.py` | B | Already done; both compilers rely on these |
-| 4 | PII patterns / regex | `security/pii_patterns.py` | B | Already done; extend together as gaps appear |
-| **Ingestion — introspection** | | | | |
-| 5 | DB introspector (information_schema, sample, FKs) | `catalog/introspect/database.py` | DB | Use SQLAlchemy `inspect()`; dialect-aware quoting |
-| 6 | Tabular introspector (CSV/XLSX/Parquet headers + sample) | `catalog/introspect/tabular.py` | TAB | Each XLSX sheet → one Table |
-| 7 | `BaseIntrospector` ABC | `catalog/introspect/base.py` | B | Confirm signature returns the same `Source` shape |
-| **Ingestion — shared catalog plumbing** | | | | |
-| 8 | ~~Catalog enricher + prompt~~ | — | n/a | **REMOVED in KM-557.** Cost optimization: planner reads stats + sample rows directly. `catalog/render.py` keeps the source-rendering helper. |
-| 9 | Catalog validator | `catalog/validator.py` | B | Type-agnostic |
-| 10 | Catalog store (Postgres jsonb) | `catalog/store.py` | B | Recommend DB (Postgres expertise) |
-| 11 | Catalog reader | `catalog/reader.py` | B | Type-agnostic |
-| 12 | PII detector | `catalog/pii_detector.py` | B | Either; uses `pii_patterns.py` |
-| **Ingestion — pipelines** | | | | |
-| 13 | Structured pipeline (introspect → merge → validate → store; enrich step removed in KM-557) | `pipeline/structured_pipeline.py` | B | Pair on this; it calls both introspectors via dispatcher |
-| 14 | Triggers (`on_db_registered`, `on_tabular_uploaded`) | `pipeline/triggers.py` | B | Each owns their trigger function |
-| 15 | Ingestion orchestrator | `pipeline/orchestrator.py` | B | Routes by source_type; pair |
-| 16 | Document pipeline (PDF/DOCX/TXT) | `pipeline/document_pipeline.py` | TAB | Tabular-adjacent (file uploads) |
-| **Query — shared spine** | | | | |
-| 17 | IR validator (catalog-aware) | `query/ir/validator.py` | B | Recommend DB; both must agree on exact error messages so retry-prompt is consistent |
-| 18 | Planner LLM service | `query/planner/service.py` | B | Type-agnostic |
-| 19 | Planner prompt (catalog → text) | `query/planner/prompt.py`, `config/prompts/query_planner.md` | B | **Pair-program**. Must describe DB tables and tabular files in one consistent format |
-| 20 | Intent router (chat/unstructured/structured) | `agents/orchestration.py` (class `OrchestratorAgent` — Phase 1 filename + class name preserved; Phase 2 body), `config/prompts/intent_router.md` | B | Type-agnostic. The prompt file uses `intent_router.md`, but the source module is still `orchestration.py` |
-| 21 | Executor base + `QueryResult` | `query/executor/base.py` | B | Lock the shape before either implements an executor |
-| 22 | Executor dispatcher | `query/executor/dispatcher.py` | B | Reads `source.source_type` from catalog; pair |
-| 23 | Compiler base ABC | `query/compiler/base.py` | B | Already done |
-| 24 | Top-level QueryService | `query/service.py` | B | Wires planner → validator → compiler → executor; pair |
-| **Query — DB path** | | | | |
-| 25 | SQL compiler (IR → SQL + params, per dialect) | `query/compiler/sql.py` | DB | Identifiers from catalog (quoted), values parameterized |
-| 26 | DB executor (asyncpg/pymysql, sqlglot guard, RO txn, 30s timeout) | `query/executor/db.py` | DB | |
-| 27 | Credential encryption (Fernet) | `security/credentials.py` | DB | Needed for stored user DB creds |
-| 28 | User-DB connection management | helper in pipelines | DB | engine_scope context manager pattern |
-| **Query — Tabular path** | | | | |
-| 29 | Pandas compiler (IR → callable on DataFrame) | `query/compiler/pandas.py` | TAB | Same IR, different backend |
-| 30 | Tabular executor (eager pandas first; pyarrow / polars later) | `query/executor/tabular.py` | TAB | Initial scope: eager pandas only |
-| 31 | Parquet upload/download + Azure Blob wrapper | `storage/az_blob/az_blob.py` (+ helper) | TAB | XLSX sheet → one Parquet per sheet (deterministic blob name) |
-| **Agents + chat** | | | | |
-| 32 | Chatbot agent + prompt | `agents/chatbot.py`, `config/prompts/chatbot_system.md` | B | Receives QueryResult or Cu chunks |
-| 33 | Guardrails prompt | `config/prompts/guardrails.md` | B | |
-| **API surface** | | | | |
-| 34 | DB client endpoints (register/ingest/list/delete) | `api/v1/db_client.py` | DB | |
-| 35 | Document/tabular upload endpoints | `api/v1/document.py` | TAB | |
-| 36 | Chat stream endpoint (SSE) | `api/v1/chat.py` | B | Dispatches both paths; pair |
-| 37 | Room / users endpoints | `api/v1/room.py`, `api/v1/users.py` | B | Whoever has bandwidth |
-| **Tests + eval** | | | | |
-| 38 | DB compiler golden tests (IR → SQL fixtures) | `tests/query/compiler/test_sql.py` | DB | Pure-Python, no LLM |
-| 39 | Pandas compiler golden tests (IR → expected DataFrame) | `tests/query/compiler/test_pandas.py` | TAB | Pure-Python, no LLM |
-| 40 | IR validator tests (catalog × IR error matrix) | `tests/query/ir/test_validator.py` | B | Each contributes test cases for their source type |
-| 41 | Planner eval (golden question → IR examples) | `tests/query/planner/` | B | Each contributes ~10 question→IR examples |
-| 42 | E2E smoke tests | `tests/e2e/` | B | Pair |
-### Decisions to lock before coding
-If made unilaterally these create silent contract drift. Lock them in a 30-min sync first.
-| Decision | Why it matters | Recommended call |
-|---|---|---|
-| `QueryResult` shape (current scaffold: `source_id, backend, rows, row_count, truncated, elapsed_ms, error`) | Both executors return this; chatbot consumes it | Lock as-is unless either side needs more (e.g. `column_types` for formatting) |
-| `Source.location_ref` format (`az_blob://...` vs `dbclient://{id}` etc.) | Dispatcher and executors both parse this | Pick a convention now; document in `catalog/models.py` docstring |
-| Where do user DB credentials live? | DB executor needs creds to run queries; Source has `location_ref` but creds are encrypted separately | Recommend: `location_ref="dbclient://{client_id}"`; executor looks up creds by ID |
-| How does dispatcher pick the executor? | Routes by `source.source_type` — but where does dispatcher get it (catalog reload, or IR carries it)? | Recommend: dispatcher takes `(Catalog, IR)`, looks up source by `IR.source_id` |
-| Joins in v1 IR? | Excluded per ARCHITECTURE.md §7. DB path is most affected — real DB use often needs joins. | Recommend: ship single-table; revisit in PR 2. **DB owner must accept the constraint or push back early** |
-| Planner prompt — render tabular vs DB sources uniformly | If described differently, planner gets confused | Pair-program. Render both as `Table: name (n rows) — Columns: ...` regardless of source_type |
-| Error contract — raise or return `QueryResult.error`? | Both executors must behave the same so chatbot branches consistently | Recommend: never raise from `executor.run()`; populate `QueryResult.error` |
-| PII handling for tabular `sample_values` | DB samples come from `information_schema`; tabular from file reads. Same `pii_flag` rule must apply both sides | Confirm tabular introspector calls `pii_detector` |
-| Catalog refresh trigger (open question §3) | Affects both pipelines symmetrically | Default: rebuild on every upload/connect; defer auto-refresh |
-| `updated_at` semantics — per-Source vs per-Catalog | Affects how each pipeline writes | Recommend: per-Source `updated_at` + Catalog-level `generated_at` |
-| Dialect support scope for v1 | DB compiler must implement at least one dialect well | Recommend: Postgres first (matches app DB); MySQL second |
-| Test-fixture format for golden IRs | Both compilers test against golden IR → expected output | Recommend: shared `tests/fixtures/golden_irs.json`; each side adds expected SQL or DataFrame |
-| Logging conventions | structlog is already in place; both should log the same fields | Quick agreement: log `source_id`, `table_id`, `ir_version`, `elapsed_ms` |
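The recommended error contract (never raise from `executor.run()`, populate `QueryResult.error`) can be sketched like this. The field list follows the `QueryResult` scaffold named in the table; the `run_safely` wrapper, the `fetch` callable, and the error string format are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

ROW_CAP = 10_000  # the 10k row cap mentioned for both executors


@dataclass
class QueryResult:
    source_id: str
    backend: str
    rows: list = field(default_factory=list)
    row_count: int = 0
    truncated: bool = False
    elapsed_ms: float = 0.0
    error: Optional[str] = None


def run_safely(source_id: str, backend: str, fetch: Callable[[], list]) -> QueryResult:
    """Run `fetch` and always return a QueryResult, even when it fails."""
    started = time.perf_counter()
    try:
        rows = list(fetch())
        truncated = len(rows) > ROW_CAP
        rows = rows[:ROW_CAP]
        result = QueryResult(source_id, backend, rows, len(rows), truncated)
    except Exception as exc:
        # Failures become data: the caller branches on `error` instead of catching.
        result = QueryResult(source_id, backend, error=f"{type(exc).__name__}: {exc}")
    result.elapsed_ms = (time.perf_counter() - started) * 1000
    return result
```

With this shape, the chatbot layer needs one branch (`if result.error`) regardless of which backend produced the result.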
-### Working rhythm (suggested)
-1. **Day 1** — 30-min sync to lock the decisions table. PR any contract/docstring changes that fall out.
-2. **Week 1** — both build introspectors + agree on the planner prompt format. PR in parallel; review each other's.
-3. **Week 2** — DB builds SQL compiler + DB executor; TAB builds pandas compiler + tabular executor. Both write golden tests against shared IR fixtures.
-4. **Week 3** — pair on dispatcher, QueryService, and chat endpoint integration. End-to-end smoke test.
-5. **Ongoing** — short daily standup, mostly to flag IR-shape questions and catalog-field additions *before* either side implements against an unconfirmed contract.
-Biggest risk: **silent contract drift** — one side adds a `QueryResult` field or assumes a new IR op exists, the other ships without it, and integration breaks at the dispatcher. The §0 lock + shared golden-IR fixtures are what prevent that.
-### Onboarding to Claude Code
-If you're new to Claude Code, before you start:
-1. Read `ARCHITECTURE.md` end-to-end (~10 min) — this is the source of truth.
-2. Skim this file (`REPO_CONTEXT.md`) — find your section in the ownership table.
-3. Read your owned files' docstrings — every stub explains its contract.
-4. Open Claude Code in this repo. When you ask Claude to implement a stub:
-   - Reference the file path + the contract it should follow
-   - Point it at `ARCHITECTURE.md` section if relevant (e.g. §7 for IR validation)
-   - Ask it to write the test first (golden IR fixtures), then the implementation
-   - Always review the diff — don't auto-accept
-Useful slash commands while working: `/review` (PR review), `/security-review` (audit pending changes).
----
-## Conventions & gotchas
-- **Async event loop on Windows**: `run.py` sets `WindowsSelectorEventLoopPolicy` because psycopg3 async needs it. Don't call `uvicorn` directly on Windows.
-- **Two Postgres engines**: `engine` (app tables) and `_pgvector_engine` (asyncpg with `prepared_statement_cache_size=0`) — the latter is required because PGVector emits `advisory_lock + CREATE EXTENSION` as a multi-statement string and asyncpg rejects multi-statement prepared queries. `init_db.py` creates the extension explicitly so `PGVector(create_extension=False)` skips that path.
-- **Read-only at every layer for user DBs**: IR validation + compiler whitelists + sqlglot SELECT-only check + read-only DB credentials + LIMIT enforcement + 30s timeout. Five layers; no single point of failure.
-- **Identifiers vs values**: identifiers (table/column names) come from the catalog and are inlined as quoted identifiers — they were verified at validation time so this is safe. Values from `IR.filters` are *always* parameterized, never inlined as strings.
-- **Credential encryption**: Fernet via `dataeyond__db__credential__key` env var; lives in `security/credentials.py`. Sensitive fields = `{"password", "service_account_json"}`.
-- **Settings env-var aliases**: `.env` uses double-underscore names (`azureai__api_key__4o`); `Settings` exposes them as `azureai_api_key_4o` via `Field(alias=...)`. Mind both forms when adding settings.
-- **Prompts**: `src/config/prompts/*.md` — `intent_router`, `query_planner`, `chatbot_system`, `guardrails` are all written. `chatbot_system` has `guardrails` appended so guardrails take precedence in conflict. `catalog_enricher.md` was deleted in KM-557. `config/agents/` folder deleted in Cleanup PR.
-- **Planner prompt parsing gotcha**: `query/planner/service.py` uses `SystemMessage(content=...)` not `("system", text)`. The tuple form causes LangChain to interpret `{...}` in `query_planner.md` as f-string variables and crash on every real invocation. Don't refactor back to tuples.
-- **Tests**: 146+ unit tests in place. Run with `uv run pytest`. Planner eval gated on `RUN_PLANNER_EVAL=1`; catalog store integration test gated on `RUN_INTEGRATION_TESTS=1`.
----
-## Recommended reading order
-1. `ARCHITECTURE.md` — design intent (the source of truth)
-2. `src/catalog/models.py` + `src/query/ir/models.py` — the two data shapes everything else moves between
-3. `src/query/ir/operators.py` + `src/security/pii_patterns.py` — the explicit whitelists / patterns
-4. Skim every `__init__.py`-level docstring under `src/catalog/`, `src/query/`, `src/agents/`, `src/pipeline/` — each describes the contract its module enforces
-5. `main.py` + `src/db/postgres/{connection,init_db}.py` — runtime bootstrap
-6. `ARCHITECTURE.md §10` — five open questions that haven't been decided yet
----
-## Open questions
-Resolved as Phase 2 landed:
-1. ✅ Catalog storage shape — Postgres `jsonb` row in `data_catalog` table, keyed by `user_id`.
-2. ❌ Unstructured files in catalog — still not modeled; router uses `source_hint` from the LLM instead.
-3. 🟡 Catalog refresh trigger — rebuild-on-upload-or-connect is the default. Explicit endpoint `POST /api/v1/data-catalog/rebuild` exists. Background TTL deferred.
-4. ✅ Joins out of v1 IR — confirmed; single-table only. Revisit when real queries need it.
-5. 🟡 PII `sample_values` — currently nulled out (skip). Mask/synthesize deferred.
----
-## Glossary
-- **Cu** — unstructured context (prose chunks)
-- **Cs** — schema context (DB tables/columns from catalog)
-- **Ct** — tabular context (file sheets/columns from catalog)
-- **IR** — intermediate representation (the JSON query shape)
-- **PII** — personally identifiable information
-- **ABC** — abstract base class

REPO_STATUS.md ADDED

	@@ -0,0 +1,306 @@

+# Data Eyond — Python Agentic Service: Current Status
+**Audience:** teammates onboarding onto the Python repo (`Agentic-Service-Data-Eyond-Catalog`).
+**Scope:** what the code does **right now** (branch `pr/4`, ticket KM-652). Describes current state only — no roadmap or to-dos.
+**Snapshot date:** 2026-06-25.
+> This file is grounded in the source, not the older design docs. Where the two
+> disagree, the code wins — see [§11 Doc-vs-code](#11-where-the-older-docs-are-stale).
+> `REPO_CONTEXT.md` / `ARCHITECTURE.md` are the original Phase-2 design docs and are
+> stale on the router, joins, and the analysis/report stack.
+---
+## 1. The product in one paragraph
+Data Eyond is an **"AI data scientist"** for business analytics, modelled on **CRISP-DM**
+(Business Understanding → Data Understanding → Preparation → Modeling → Evaluation →
+Deployment). It targets executives doing self-serve deep-dives and analysts offloading
+routine work. A user defines a goal, connects data (DB or files), asks natural-language
+analytical questions, and gets CRISP-DM-structured answers that can be exported as a
+versioned **report**. The aim is "junior data scientist that hands back a decision-ready
+deliverable," not "chatbot over a database."
+---
+## 2. Three repos, one hard ownership rule
+Request flow is **FE → Go → Python**. The FE never calls Python directly except for chat
+streaming.
+| Repo | Role | We edit? |
+|---|---|---|
+| **Python** — `Agentic-Service-Data-Eyond-Catalog` (this repo) | The agentic LLM service: router, gate, skills, slow analytical path, structured query engine, unstructured RAG, report generation, analysis-session state. FastAPI + async SQLAlchemy + LangChain + Azure GPT-4o. | **Yes — the only repo we edit.** |
+| **Go** — `Orchestrator-Agent-Service` | Gateway / data plane: interview agent, auth/JWT, rooms, documents (Azure Blob + CSV/XLSX→Parquet + embeddings), database_clients (Fernet creds), catalog ingestion, **all DB migrations**. | Reference only. |
+| **FE** — `E2E-Frontend-Data-Eyond` | React/Vite SPA. Talks to Go for everything and to Python only for chat streaming. | Reference only. |
+Shared infra: **Postgres** (app tables + `data_catalog` jsonb + PGVector `langchain_pg_embedding`), **Azure Blob**, and (Python-only) **Redis**.
+---
+## 3. Tech stack & how to run
+- Python 3.12, FastAPI, uvicorn, sse-starlette
+- Async SQLAlchemy 2.0 + asyncpg (Postgres); psycopg3 for the PGVector engine
+- LangChain + langchain-openai (Azure OpenAI GPT-4o) + langchain-postgres (PGVector)
+- Redis (response + retrieval cache), Azure Blob (uploads + Parquet)
+- pandas / pyarrow, sqlglot, pydantic v2, structlog, slowapi, langfuse
+- DB connectors: psycopg2, pymysql, pymssql, sqlalchemy-bigquery, snowflake-sqlalchemy
+Run (Linux/Docker): `uv run --no-sync uvicorn main:app --host 0.0.0.0 --port 7860`
+Run (Windows): `uv run --no-sync python run.py` (sets `WindowsSelectorEventLoopPolicy` for psycopg3 async — don't call uvicorn directly on Windows).
+Tests live locally and are gitignored. Run with `./.venv/Scripts/python.exe -m pytest` (Windows venv layout; on Linux/macOS the interpreter is `./.venv/bin/python`).
+---
+## 4. Chat request lifecycle
+Entry: `POST /api/v1/chat/stream` (`src/api/v1/chat.py`) → `ChatHandler.handle(...)`
+(`src/agents/chat_handler.py`). One shared `ChatHandler` per process keeps the Azure clients warm.
+```
+POST /chat/stream { user_id, room_id, message }
+  │  (analysis_id == room_id — one session = one analysis = one chat room)
+  ├─ Redis response-cache check (1h TTL, key chat:{room}:{user}:{message})  ── hit → replay
+  ├─ greeting/farewell short-circuit (_fast_intent, EN+ID)                  ── hit → canned reply
+  ├─ load last-10 history
+  └─ ChatHandler.handle:
+       1. classify → RouterDecision               [1 GPT-4o call]
+       2. ensure analysis-state row (get-or-create, idempotent)
+       3. emit `intent` (internal; gates caching), then dispatch:
+            chat              → ChatbotAgent → SSE
+            help              → HelpAgent (state + history + readiness) → SSE
+            check             → check_data/check_knowledge tool → rendered table  [no LLM]
+            unstructured_flow → DocumentRetriever (PGVector RAG) → ChatbotAgent → SSE
+            structured_flow   → CatalogReader → (slow path | QueryService) → SSE
+       4. SSE events: intent (internal), sources, chunk, status, done | error
+```
+Only the `chat` intent is cached (stateless). Messages persist on `done`.
+> The router emits **5 intents** now. The `problem_statement` skill and the `problem_validated`
+> gate were removed 2026-06-25 (KM-652) — the analysis goal is two user-entered fields
+> (`objective` + `business_questions`) captured at onboarding, with no agent validation.
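The caching rule in the lifecycle above can be sketched as follows. This is a minimal illustration, assuming the key is a plain string join of the three fields (the real `chat.py` may normalize or hash the message). `response_cache_key`, `should_cache`, `CACHEABLE_INTENTS`, and `CACHE_TTL_SECONDS` are hypothetical names, not identifiers taken from the repo.

```python
# Hypothetical helpers: only the key layout, the 1h TTL, and the
# "cache the chat intent only" rule come from the lifecycle description.
CACHE_TTL_SECONDS = 3600  # 1h TTL
CACHEABLE_INTENTS = frozenset({"chat"})  # the only stateless intent


def response_cache_key(room_id: str, user_id: str, message: str) -> str:
    """Build the key in the documented chat:{room}:{user}:{message} layout."""
    return f"chat:{room_id}:{user_id}:{message}"


def should_cache(intent: str) -> bool:
    """Gate caching on the effective intent emitted by the handler."""
    return intent in CACHEABLE_INTENTS
```

Keeping the rule keyed on the emitted `intent` event is what lets stateful routes (`help`, `check`, both flows) bypass the cache without the endpoint knowing anything else about them.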
+---
+## 5. Report lifecycle
+The report is a **dedicated API, not a chat route** (`src/api/v1/report.py`):
+```
+POST /report?analysis_id&user_id
+  ├─ load analysis state; enforce the report FLOOR
+  │     (≥1 substantive analyze_* success) → else 409
+  ├─ ReportGenerator.generate (src/agents/report/generator.py):
+  │     read persisted AnalysisRecords (list_for_analysis)
+  │     deterministically assemble findings / caveats / open-questions /
+  │       data-source appendix / CRISP-DM method appendix  (copied verbatim)
+  │     ONE LLM call → executive summary only (deterministic fallback on failure)
+  │     render markdown
+  ├─ ReportStore.save: advisory-locked version assignment → dedorch `reports`
+  └─ write report_id back onto analysis state
+GET /report/{analysis_id}        → list versions (oldest-first)
+GET /report/{analysis_id}/{ver}  → fetch one version
+```
+Two facts to internalise:
+- **Records only exist on the slow path.** With `ENABLE_SLOW_PATH=false` (the default) no
+  records accumulate, so generation 409s — by design, not a bug.
+- **dedorch `reports` stores markdown only.** Structured report fields are computed at
+  generation, rendered into `rendered_markdown`, and only the markdown is persisted; on
+  read-back the structured fields come back empty.
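The report floor described above might look roughly like this. It is a sketch under assumptions: the `Record` and `TaskOutcome` shapes are hypothetical stand-ins for `AnalysisRecord`, and "substantive" is simplified to "any successful `analyze_*` task", which may be looser than the check in `readiness.py`.

```python
from dataclasses import dataclass, field


@dataclass
class TaskOutcome:
    """Hypothetical per-task entry; the real record shape may differ."""
    tool: str  # e.g. "analyze_trend" or "retrieve_data"
    ok: bool   # True when the tool call succeeded


@dataclass
class Record:
    """Hypothetical stand-in for a persisted AnalysisRecord."""
    tasks_run: list[TaskOutcome] = field(default_factory=list)


def meets_report_floor(records: list[Record]) -> bool:
    """True when at least one analyze_* task succeeded across all records."""
    return any(
        task.ok and task.tool.startswith("analyze_")
        for record in records
        for task in record.tasks_run
    )


def report_status_code(records: list[Record]) -> int:
    """Mirror the documented behaviour: generate on success, 409 below the floor."""
    return 200 if meets_report_floor(records) else 409
```

With the slow path disabled no records exist, so `report_status_code([])` yields 409, which matches the "by design" note above.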
+---
+## 6. Feature list (what's built)
+- **5-intent handler router** (`chat`/`help`/`check`/`unstructured_flow`/`structured_flow`) with history-aware query rewriting (EN/ID).
+- **Skills:** `help` (LLM, state-aware next-step guidance), `check` (no-LLM data/document inventory). *(The `problem_statement` skill and the `problem_validated` gate were removed 2026-06-25 — KM-652; `gate.py` kept as a no-op seam, `problem_statement.py` kept but unwired.)*
+- **Slow analytical path:** Planner → TaskRunner → Assembler (static plan, degrade-and-continue, 3 LLM calls fixed).
+- **Structured query engine:** catalog-driven JSON IR → deterministic SQL/pandas compiler → read-only executor, with **single-level FK joins** (DB sources only).
+- **Unstructured RAG** over PGVector.
+- **Analytics tools:** 4 registered composite `analyze_*` (descriptive, aggregate, correlation, trend) + 4 data-access tools (check_data, check_knowledge, retrieve_data, retrieve_knowledge). Four further composites (comparison, contribution, profile, segment) exist in code but are **not registered** with the Planner.
+- **Versioned report generation** from persisted records.
+- **Analysis sessions:** data-first creation gate (≥1 bound source), per-analysis data-source binding (#10).
+- **Langfuse tracing** (PII-masked), **Redis caching**, **pooled DB engines** + speculative prewarm.
+---
+## 7. API surface (this repo, all under `/api/v1`)
+| Endpoint | Purpose | Caller |
+|---|---|---|
+| `POST /chat/stream` | Main chat SSE (router → dispatch) | FE → Go → Python (the only FE→Python call today) |
+| `DELETE /chat/cache` · `/chat/cache/room/{id}` · `/retrieval/cache/{user_id}` | Cache management | internal / ops |
+| `POST /analysis/create` · `GET /analysis` · `GET /analysis/{id}` | Analysis-session CRUD (state + room + bindings created atomically) | intended FE → Go |
+| `POST /report` · `GET /report/{id}` · `GET /report/{id}/{ver}` | Report generate / list / fetch | FE → Go (report button) |
+| `GET /tools` | Slash-command catalog (static, cacheable) | Go caches it for the FE "/" menu |
+| `users` · `room` · `document` · `db_client` · `data_catalog` routers | Phase-1 legacy; functionally migrated to Go | mostly dormant |
+---
+## 8. Data model
+SQLAlchemy models in `src/db/postgres/models.py`. Created on startup by `init_db()`
+unless `SKIP_INIT_DB=true`.
+| Table | Shape | Written by | Read by |
+|---|---|---|---|
+| `users`, `rooms`, `chat_messages`, `message_sources` | base app | chat endpoint, Go | chat history |
+| `documents`, `databases` | uploads + DB creds (Fernet-encrypted) | Go ingestion | executor cred resolution |
+| `data_catalog` | per-user jsonb `Catalog` (Source → Table → Column) | Go ingestion / Python pipeline | CatalogReader, planner, tools |
+| `langchain_pg_embedding` | PGVector document chunks | Go ingestion | DocumentRetriever |
+| `analysis_records` | jsonb `AnalysisRecord`, one per slow-path run | slow path | ReportGenerator, report readiness |
+| `analysis` *(dedorch)* | uuid id, `owner_id`, `problem_statement`, `problem_validated`, `report_id` | `/analysis/create`, state store | gate, Help, report |
+| `reports` *(dedorch)* | uuid, `title` + markdown `content` + `version` | ReportStore | report API |
+| `data_sources` *(dedorch)* | per-analysis binding; `reference_id` = catalog source_id | `/analysis/create` | structured-flow scoping, report appendix |
+**Catalog shape** (the jsonb in `data_catalog`):
+`Catalog → Source[ {source_id, source_type ∈ schema|tabular|unstructured, name, location_ref} → Table[ {table_id, name, row_count, foreign_keys[]} → Column[ {column_id, name, data_type, nullable, pii_flag, sample_values|null, stats} ] ] ]`. PII columns have `sample_values: null` so real values never enter prompts.
+**QueryIR shape** (`src/query/ir/models.py`):
+`{ source_id, table_id, joins[], select[], filters[], group_by[], order_by[], limit }`.
+Joins are single-level equi-joins to a related table **in the same source**, FK-backed,
+**DB sources only**.
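The join rule above can be illustrated with a small check. This is a sketch, not the `IRValidator` code: `ForeignKey`, `Join`, and `join_allowed` are hypothetical names, the field layout of `foreign_keys[]` is assumed, and it assumes `source_type == "schema"` is what marks a DB source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Assumed shape of one entry in a table's foreign_keys[] list."""
    column: str      # column on the base table
    ref_table: str   # related table in the same source
    ref_column: str  # column on the related table


@dataclass(frozen=True)
class Join:
    """Assumed shape of one single-level equi-join in the IR."""
    table_id: str
    left_column: str   # column on the base table
    right_column: str  # column on the joined table


def join_allowed(source_type: str, base_fks: list[ForeignKey], join: Join) -> bool:
    """Accept a join only on DB sources and only when a declared FK backs it."""
    if source_type != "schema":  # assumption: "schema" denotes DB sources
        return False
    return any(
        fk.column == join.left_column
        and fk.ref_table == join.table_id
        and fk.ref_column == join.right_column
        for fk in base_fks
    )
```

Because the join target is matched against the base table's own foreign keys, a join into a different source can never validate: the catalog would not list it.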
+---
+## 9. Subsystems (where the code lives)
+### Router — `src/agents/orchestration.py`
+One GPT-4o structured-output call → `RouterDecision{intent, rewritten_query, confidence}`,
+`intent ∈ {chat, help, check, unstructured_flow, structured_flow}` (`problem_statement` removed
+2026-06-25). It's a *handler* classifier: `structured_flow` = slow path (or single-query
+`QueryService` while `ENABLE_SLOW_PATH` is off), `unstructured_flow` = fast RAG; the data-modality
+mix on the slow path is the Planner's job. Prompt: `src/config/prompts/intent_router.md`.
+### Gate — `src/agents/gate.py`
+**Neutered 2026-06-25 (KM-652):** `gate()` now passes every intent through unchanged — the
+`problem_validated` redirect was removed (the goal is user-entered, no agent validation). The
+function + `AnalysisState` contract are kept as a no-op seam; the call site in
+`chat_handler.handle` is commented out. `AnalysisState` still carries (id, analysis_title,
+problem_statement, problem_validated, owner_id, report_id, created_at, updated_at) until the
+dedorch state migration (#3/#4) renames it.
+### Skills — `src/agents/handlers/`
+- `help.py` — LLM (streamed). A consistency guard derives the *allowed* actions from state
+  (mirrors the gate) and feeds them to the prompt, so Help can't suggest a report when the goal
+  isn't validated or there's nothing to report. Consumes a deterministic readiness signal.
+- `check.py` — **no LLM.** Keyword cues route to `check_data`, `check_knowledge`, or both
+  (helicopter view, concurrent). Renders tool tables to markdown.
+- `problem_statement.py` — **unwired 2026-06-25** (no longer routed to; file kept intact). Was an
+  LLM drafter that validated a goal and wrote `problem_validated`.
+### Slow path — `src/agents/slow_path/` + `src/agents/planner/`
+- **Planner** (`planner/service.py`) — 1 LLM call → `TaskList` (DAG of tool-call chains). 8-check
+  validator with re-prompt retry (max 3). `BusinessContext` is a **stub** (`planner/business_context.py`),
+  which is why the slow path stays opt-in.
+- **TaskRunner** (`slow_path/task_runner.py`) — deterministic, 0 LLM. Wave-based execution,
+  `${t<id>}` placeholder resolution (Pattern A), never-throw invocation, **degrade-and-continue**
+  (failed task → dependents skipped, independent branches run). No replanning.
+- **Assembler** (`slow_path/assembler.py`) — 1 LLM call authoring only the narrative; code copies
+  the structured `results_snapshot` / `tasks_run` from the run state into the `AnalysisRecord`
+  (the report's source of truth).
+Streaming + persistence: `chat_handler._run_slow_path` bridges per-stage progress to SSE `status`
+events, prewarms the DB engine in parallel with planning, emits the answer, then persists the
+record stamped with `user_id` + `analysis_id`.
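The TaskRunner behaviour described above (waves, `${t<id>}` placeholders, never-throw invocation, degrade-and-continue) can be sketched as below. This is an illustrative reduction, not the code in `task_runner.py`: the task dict layout, the `run_waves` and `resolve` names, and the status strings are all assumptions.

```python
import re
from typing import Any, Callable

# Matches a whole-string placeholder such as "${t1}" and captures the task id.
PLACEHOLDER = re.compile(r"\$\{t(\w+)\}")


def resolve(value: Any, results: dict[str, dict]) -> Any:
    """Swap a whole-string `${t<id>}` placeholder for that task's output."""
    if isinstance(value, str):
        match = PLACEHOLDER.fullmatch(value)
        if match:
            return results[match.group(1)]["value"]
    return value


def run_waves(tasks: dict[str, dict], invoke: Callable[[str, dict], Any]) -> dict[str, dict]:
    """Run tasks wave by wave; `tasks` maps id -> {"tool", "args", "deps"}."""
    results: dict[str, dict] = {}
    pending = dict(tasks)
    while pending:
        # A task is ready once every dependency has some recorded outcome.
        ready = [tid for tid, t in pending.items() if all(d in results for d in t["deps"])]
        if not ready:
            # Cycle or unknown dependency: skip what is left instead of raising.
            for tid in pending:
                results[tid] = {"status": "skipped", "value": None}
            break
        for tid in ready:
            task = pending.pop(tid)
            if any(results[d]["status"] != "ok" for d in task["deps"]):
                # Degrade-and-continue: dependents of a failed task are skipped.
                results[tid] = {"status": "skipped", "value": None}
                continue
            try:
                args = {k: resolve(v, results) for k, v in task["args"].items()}
                results[tid] = {"status": "ok", "value": invoke(task["tool"], args)}
            except Exception as exc:  # never-throw seam
                results[tid] = {"status": "failed", "value": None, "error": str(exc)}
    return results
```

Independent branches keep running after a failure because readiness only requires that dependencies have an outcome, not that they succeeded; the success check happens per task.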
+### Structured query engine — `src/query/`
+`QueryService.run` (`query/service.py`): plan → validate → retry(3) → dispatch → execute; **never
+raises** (errors land in `QueryResult.error`). `IRValidator` (`query/ir/validator.py`) checks
+source/table/column existence, op/agg whitelists, type compatibility, limit cap, and **FK-backed
+joins** (DB only). `DbExecutor` (`query/executor/db.py`): SqlCompiler → sqlglot SELECT-only guard →
+Fernet-decrypt creds (with owner check) → `asyncio.to_thread` (30s timeout) → pooled engine
+(read-only + statement_timeout) → 10k row cap. Defense-in-depth: IR validation, compiler whitelist,
+sqlglot guard, read-only session, and LIMIT/timeout.
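The "never raises" contract of `QueryService.run` can be pictured with the sketch below. It assumes `retry(3)` means up to three plan-and-validate attempts with the previous error fed back to the planner, and that an executor failure is treated like any other attempt failure; both are guesses. `run_query`, `MAX_ATTEMPTS`, and the callable signatures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

MAX_ATTEMPTS = 3  # assumed meaning of "retry(3)"


@dataclass
class QueryResult:
    """Simplified result: errors travel in `error` instead of being raised."""
    rows: list[dict[str, Any]]
    error: str | None = None


def run_query(
    plan: Callable[[str | None], dict],
    validate: Callable[[dict], str | None],
    execute: Callable[[dict], list[dict[str, Any]]],
) -> QueryResult:
    """Plan, validate, and execute; any failure lands in QueryResult.error."""
    feedback: str | None = None
    for _ in range(MAX_ATTEMPTS):
        try:
            ir = plan(feedback)        # the planner sees the previous problem
            problem = validate(ir)     # None means the IR is valid
            if problem is not None:
                feedback = problem
                continue
            return QueryResult(rows=execute(ir))
        except Exception as exc:       # never-raise seam
            feedback = str(exc)
    return QueryResult(rows=[], error=feedback)
```

Callers then branch on `result.error` rather than wrapping the call in `try`/`except`, which is the trade-off the conventions section flags: good for UX, but easy to overlook when something is actually broken.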
+### Data-source binding (#10) — `src/agents/binding_store.py`
+At `/analysis/create`, chosen `data_source_ids` become `data_sources` rows. On a `structured_flow`
+turn the catalog reader is wrapped so the Planner and the tools' re-reads see the same scoped
+catalog. **Fail-open**: empty/disjoint binding → whole catalog.
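The fail-open scoping rule can be sketched in a few lines. This is an illustration only: `scope_sources` is a hypothetical name, and sources are shown as plain dicts keyed by `source_id` rather than the repo's catalog models.

```python
def scope_sources(catalog_sources: list[dict], bound_ids: set[str]) -> list[dict]:
    """Keep only bound sources; fall back to the whole catalog when none match."""
    scoped = [s for s in catalog_sources if s["source_id"] in bound_ids]
    # Fail-open: an empty or disjoint binding yields the full catalog.
    return scoped or list(catalog_sources)
```

The fallback is what keeps legacy rooms without bindings working, at the cost that a misconfigured binding silently widens scope instead of failing.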
+### Tool layer — `src/tools/data_access.py`, `src/agents/planner/registry.py`
+`DataAccessToolInvoker` implements the never-throw tool seam for the 4 data-access tools.
+`retrieve_data` runs a pre-built IR (validate → dispatch → execute, skipping the planner) and
+coerces `Decimal`→`float` — the Pattern A handoff the `analyze_*` tools consume. The planner
+registry composes a local data-access spec stub (name-checked against `DATA_ACCESS_TOOLS`) with the
+real `analytics_registry()`.
+### Report — `src/agents/report/`
+`generator.py` reads records, deterministically assembles structured fields, 1 LLM call for the
+executive summary; `store.py` versions under an advisory lock and persists markdown to dedorch
+`reports`; `readiness.py` defines the **report floor** (≥1 successful `analyze_*`; the
+`problem_validated` precondition was dropped 2026-06-25) shared by the report API and the Help
+readiness signal so the two can't disagree.
+### Observability — Langfuse
+The endpoint's `ChatHandler` runs with `enable_tracing=True`. One trace per request groups
+router/planner/assembler/chatbot + tool spans. PII policy: router/planner unmasked (PII-safe
+summaries); assembler/chatbot masked (see real rows); tool spans carry name + arg keys + row counts
+only.
+---
+## 10. Feature flags
+| Flag | Where | Default | Effect |
+|---|---|---|---|
+| `ENABLE_SLOW_PATH` | `settings.enable_slow_path` | **off** | Route `structured_flow` through Planner/TaskRunner/Assembler (vs single-query `QueryService`). Records persist only on the slow path → reports require this on. |
+| `ENABLE_GATE` | `settings.enable_gate` | **off** | **Deprecated 2026-06-25** — gate neutered; the flag has no effect. Kept to avoid `.env` churn. |
+| `SKIP_INIT_DB` | env, `main.py` | off | Skip `create_all` on startup — the dedorch cutover switch (Go owns dedorch migrations). |
+| `enable_tracing` | hardcoded `True` in `chat.py` | on (endpoint) | Langfuse tracing. |
+---
+## 11. Where the older docs are stale
+Trust the code. The original Phase-2 docs (`ARCHITECTURE.md`, `REPO_CONTEXT.md`) and the Go repo's
+copies disagree with the current code on:
+| Topic | Old docs | Current code |
+|---|---|---|
+| Router | 3-way `source_hint` (chat/unstructured/structured) | Flat **5-intent** `RouterDecision` (was 6; `problem_statement` removed 2026-06-25) |
+| Joins in IR | "single-table only; deferred" | **Single-level FK-backed joins** (DB sources only) |
+| Analysis / report / gate / slow path | "Phase 2 spine only" | All built and present |
+| `analysis_id` | open question | resolved: **`analysis_id == room_id`** |
+| Report source | (newer invariant) "from records, never chat history" | confirmed: generator reads `AnalysisRecord`s |
+---
+## 12. dedorch migration — current state
+The Python DB is moving from `dataeyond` → **dedorch** (Go owns dedorch migrations; Python is
+consumer-only). Current state:
+- Base tables already match dedorch.
+- The analysis-family models have been **renamed to dedorch** on `pr/3`: `analysis` (was
+  `analysis_states`, uuid ids), `data_sources` (was `analysis_data_sources`), `reports` (was
+  `analysis_reports`, flattened to title + markdown content + version).
+- `analysis_records` (the slow-path structured output) has **no dedorch home** — it remains a
+  Python-owned jsonb table.
+- The connection-string cutover (paired with `SKIP_INIT_DB`) is a coordinated step that has not
+  happened yet; Python still creates tables on startup until then.
+The dedorch migrations themselves live outside the three checked-out repos (Harry owns them), so the
+dedorch table shapes are asserted by the Python model docstrings, not visible in the Go repo here.
+---
+## 13. Conventions & gotchas
+- **Two Postgres engines:** app engine + a separate PGVector engine (`prepared_statement_cache_size=0`)
+  because PGVector emits multi-statement strings asyncpg rejects.
+- **Identifiers vs values:** identifiers come from the catalog and are inlined as quoted; filter
+  values are always parameterized.
+- **Settings aliases:** `.env` uses double-underscore names (`azureai__api_key__4o`); `Settings`
+  exposes them as `azureai_api_key_4o`.
+- **Never-throw seams** are pervasive (tool invoker, query service, executors, state/binding reads,
+  record persistence, report summary). Failures degrade into soft output rather than raising — good
+  for UX, but they can mask real breakage (e.g. a binding silently fail-opening to the full catalog).
+- **Prompts** live in `src/config/prompts/*.md`. `chatbot_system.md` has `guardrails.md` appended so
+  guardrails win on conflict.
+- **Tests** are gitignored (team decision) — run them locally.
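The identifiers-versus-values convention in the list above can be shown with a toy compiler. This is a sketch, not the repo's `SqlCompiler`: `quote_ident` and `compile_filter` are hypothetical names, the `:p0` named-parameter style is an assumption, and it presumes the table and column names were already validated against the catalog.

```python
def quote_ident(name: str) -> str:
    """Inline a catalog-verified identifier as a double-quoted SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def compile_filter(table: str, column: str, value: object) -> tuple[str, dict]:
    """Build a SELECT with quoted identifiers and a bound parameter for the value."""
    sql = f"SELECT * FROM {quote_ident(table)} WHERE {quote_ident(column)} = :p0"
    # The value never appears in the SQL text; it travels as a bind parameter.
    return sql, {"p0": value}
```

For example, `compile_filter("orders", "status", "x'; DROP TABLE orders")` leaves the hostile string in the parameter dict while the SQL text contains only the quoted identifiers and the `:p0` placeholder.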

src/agents/chat_handler.py CHANGED

@@ -9,8 +9,10 @@ End-to-end flow per user message:
        - `unstructured_flow` → DocumentRetriever (RAG over PGVector) →
                                list[DocumentChunk].
        - `check`             → check_data / check_knowledge tool → rendered table.
-       - `problem_statement` → PS skill: draft + validate → write analysis state.
        - `help`              → Help skill: analysis state + history → streamed guidance.
   3. `ChatbotAgent.astream` → yield text tokens.
   4. Wrap each step into an SSE-style event dict so the API endpoint can
      stream them as Server-Sent Events.
@@ -39,7 +41,9 @@ from src.retrieval.base import RetrievalResult
 from .chatbot import ChatbotAgent, DocumentChunk
 from .handlers.check import run_check
 from .handlers.help import HelpAgent
-from .handlers.problem_statement import ProblemStatementAgent, run_problem_statement
 from .orchestration import OrchestratorAgent
 if TYPE_CHECKING:
@@ -48,7 +52,7 @@ if TYPE_CHECKING:
     from ..retrieval.router import RetrievalRouter
     from .gate import AnalysisState
     from .slow_path.coordinator import SlowPathCoordinator
-    from .slow_path.store import AnalysisStore
 logger = get_logger("chat_handler")
@@ -78,7 +82,7 @@ class ChatHandler:
         slow_path_coordinator_factory: (
             Callable[[str], SlowPathCoordinator] | None
         ) = None,
-        analysis_store: AnalysisStore | None = None,
         check_invoker_factory: Callable[[str], Any] | None = None,
         ps_agent: ProblemStatementAgent | None = None,
         help_agent: HelpAgent | None = None,
@@ -114,8 +118,8 @@ class ChatHandler:
         # `#10` data-source binding: scopes structured_flow's catalog to the sources
         # the analysis is bound to. Injectable for tests; fail-open when absent.
         self._binding_store = binding_store
-        # Deterministic gate: redirect structured_flow -> problem_statement until the
-        # analysis is validated. OFF by default (legacy rooms have no state row).
         self._enable_gate = enable_gate
     # ------------------------------------------------------------------
@@ -244,9 +248,8 @@ class ChatHandler:
         intent = decision.intent
         # ---- 1a. Ensure session state row (T-A) ----------------------
-        # Rooms created via /room/create have no `analysis_states` row. Without one
-        # the gate redirect-loops and problem_statement / report_id writes silently
-        # no-op. Lazily get-or-create it (idempotent) so any session is gate-ready.
         analysis_state: AnalysisState | None = None
         if analysis_id:
             try:
@@ -256,18 +259,20 @@ class ChatHandler:
                     "analysis state ensure failed", analysis_id=analysis_id, error=str(e)
                 )
-        # ---- 1b. Gate (deterministic, post-router) -------------------
-        # Redirect structured_flow -> problem_statement until the analysis is
-        # validated. Fails closed (not-validated) when the state row is unavailable.
-        if self._enable_gate and analysis_id:
-            from .gate import gate, stub_analysis_state
-            intent = gate(
-                intent,
-                analysis_state
-                if analysis_state is not None
-                else stub_analysis_state(problem_validated=False),
-            )
         # The `intent` event is consumed by the endpoint (it gates response caching
         # on the effective intent) and is NOT forwarded to the frontend. We emit the
@@ -337,22 +342,24 @@ class ChatHandler:
             yield {"event": "chunk", "data": text}
             yield {"event": "done", "data": ""}
             return
-        elif intent == "problem_statement":
-            try:
-                text = await run_problem_statement(
-                    message,
-                    analysis_id,
-                    agent=self._get_ps_agent(),
-                    store=self._get_state_store(),
-                    history=history,
-                )
-            except Exception as e:
-                logger.error("problem_statement route failed", user_id=user_id, error=str(e))
-                yield {"event": "error", "data": f"Problem statement failed: {e}"}
-                return
-            yield {"event": "chunk", "data": text}
-            yield {"event": "done", "data": ""}
-            return
         elif intent == "help":
             try:
                 state = analysis_state or await self._load_analysis_state(analysis_id)
@@ -468,11 +475,11 @@ class ChatHandler:
             PlannerService(), TaskRunner(invoker, registry), Assembler(), registry
         )
-    def _get_analysis_store(self) -> AnalysisStore:
         if self._analysis_store is None:
-            from .slow_path.store import PostgresAnalysisStore
-            self._analysis_store = PostgresAnalysisStore()
         return self._analysis_store
     async def _run_slow_path(
@@ -487,7 +494,7 @@ class ChatHandler:
         """Run the slow path and stream its assembled answer as SSE events.
         Context comes from the `get_business_context` seam (a stub today); the
-        `analysis_record` is persisted via the `AnalysisStore` seam (PostgresAnalysisStore),
         stamped with the request's user_id + analysis_id so the report can group it.
         `chat_answer` is emitted as a single `chunk` (the Assembler returns the whole
         object — true token streaming is a later step).

        - `unstructured_flow` → DocumentRetriever (RAG over PGVector) →
                                list[DocumentChunk].
        - `check`             → check_data / check_knowledge tool → rendered table.
        - `help`              → Help skill: analysis state + history → streamed guidance.
+  (`problem_statement` was removed 2026-06-24 — the goal is now user-entered
+  `objective` + `business_questions` captured at onboarding, with no agent skill.)
   3. `ChatbotAgent.astream` → yield text tokens.
   4. Wrap each step into an SSE-style event dict so the API endpoint can
      stream them as Server-Sent Events.
 from .chatbot import ChatbotAgent, DocumentChunk
 from .handlers.check import run_check
 from .handlers.help import HelpAgent
+# `run_problem_statement` unwired 2026-06-24 (problem_statement removed from the router).
+# `ProblemStatementAgent` kept — still referenced by the constructor + _get_ps_agent.
+from .handlers.problem_statement import ProblemStatementAgent
 from .orchestration import OrchestratorAgent
 if TYPE_CHECKING:
     from ..retrieval.router import RetrievalRouter
     from .gate import AnalysisState
     from .slow_path.coordinator import SlowPathCoordinator
+    from .slow_path.store import ReportInputStore
 logger = get_logger("chat_handler")
         slow_path_coordinator_factory: (
             Callable[[str], SlowPathCoordinator] | None
         ) = None,
+        analysis_store: ReportInputStore | None = None,
         check_invoker_factory: Callable[[str], Any] | None = None,
         ps_agent: ProblemStatementAgent | None = None,
         help_agent: HelpAgent | None = None,
         # `#10` data-source binding: scopes structured_flow's catalog to the sources
         # the analysis is bound to. Injectable for tests; fail-open when absent.
         self._binding_store = binding_store
+        # Deterministic gate — DEPRECATED 2026-06-24 (problem_validated gate removed).
+        # Unused flag; the gate call site in handle() is commented out.
         self._enable_gate = enable_gate
     # ------------------------------------------------------------------
         intent = decision.intent
         # ---- 1a. Ensure session state row (T-A) ----------------------
+        # Rooms created via /room/create have no `analysis` row. Without one, Help and
+        # the report_id write-back silently no-op. Lazily get-or-create it (idempotent).
         analysis_state: AnalysisState | None = None
         if analysis_id:
             try:
                     "analysis state ensure failed", analysis_id=analysis_id, error=str(e)
                 )
+        # ---- 1b. Gate (REMOVED 2026-06-24) ---------------------------
+        # The problem_validated gate was dropped: structured_flow is no longer
+        # redirected to problem_statement (the goal is now user-entered objective +
+        # business_questions, no agent validation). `gate()` is neutered to a no-op; the
+        # call site is left commented for restorability.
+        # if self._enable_gate and analysis_id:
+        #     from .gate import gate, stub_analysis_state
+        #
+        #     intent = gate(
+        #         intent,
+        #         analysis_state
+        #         if analysis_state is not None
+        #         else stub_analysis_state(problem_validated=False),
+        #     )
         # The `intent` event is consumed by the endpoint (it gates response caching
         # on the effective intent) and is NOT forwarded to the frontend. We emit the
             yield {"event": "chunk", "data": text}
             yield {"event": "done", "data": ""}
             return
+        # problem_statement dispatch removed 2026-06-24 (skill unwired; intent no longer
+        # emitted by the router). Branch kept commented for restorability.
+        # elif intent == "problem_statement":
+        #     try:
+        #         text = await run_problem_statement(
+        #             message,
+        #             analysis_id,
+        #             agent=self._get_ps_agent(),
+        #             store=self._get_state_store(),
+        #             history=history,
+        #         )
+        #     except Exception as e:
+        #         logger.error("problem_statement route failed", user_id=user_id, error=str(e))
+        #         yield {"event": "error", "data": f"Problem statement failed: {e}"}
+        #         return
+        #     yield {"event": "chunk", "data": text}
+        #     yield {"event": "done", "data": ""}
+        #     return
         elif intent == "help":
             try:
                 state = analysis_state or await self._load_analysis_state(analysis_id)
             PlannerService(), TaskRunner(invoker, registry), Assembler(), registry
         )
+    def _get_analysis_store(self) -> ReportInputStore:
         if self._analysis_store is None:
+            from .slow_path.store import PostgresReportInputStore
+            self._analysis_store = PostgresReportInputStore()
         return self._analysis_store
     async def _run_slow_path(
         """Run the slow path and stream its assembled answer as SSE events.
         Context comes from the `get_business_context` seam (a stub today); the
+        `analysis_record` is persisted via the `ReportInputStore` seam (PostgresReportInputStore),
         stamped with the request's user_id + analysis_id so the report can group it.
         `chat_answer` is emitted as a single `chunk` (the Assembler returns the whole
         object — true token streaming is a later step).

src/agents/gate.py CHANGED

@@ -40,26 +40,29 @@ class AnalysisState(BaseModel):
     analysis_title: str
     problem_statement: str
     problem_validated: bool = False
-    owner_id: str
     report_id: str | None = None
     created_at: datetime
     updated_at: datetime
 def gate(intent: Intent, state: AnalysisState) -> Intent:
-    """Return the effective intent after applying the deterministic gate policy.
-    `structured_flow` requires `problem_validated is True`; otherwise redirect to
-    `problem_statement`. All other intents pass through unchanged.
     """
-    if intent == "structured_flow" and not state.problem_validated:
-        logger.info(
-            "gate redirect",
-            requested=intent,
-            effective="problem_statement",
-            reason="problem_not_validated",
-        )
-        return "problem_statement"
     return intent
@@ -75,7 +78,7 @@ def stub_analysis_state(*, problem_validated: bool = False) -> AnalysisState:
         analysis_title="Stub analysis",
         problem_statement="Stub problem statement" if problem_validated else "",
         problem_validated=problem_validated,
-        owner_id="stub-user",
         report_id=None,
         created_at=now,
         updated_at=now,

     analysis_title: str
     problem_statement: str
     problem_validated: bool = False
+    user_id: str
     report_id: str | None = None
     created_at: datetime
     updated_at: datetime
 def gate(intent: Intent, state: AnalysisState) -> Intent:
+    """Return the effective intent (NEUTERED 2026-06-24 — passes everything through).
+    The `problem_validated` gate was removed: analysis is no longer gated on a validated
+    problem statement (the goal is now two user-entered fields, `objective` +
+    `business_questions`, captured at onboarding with no agent validation). Kept as a
+    no-op seam so gating can be restored without re-threading call sites.
     """
+    # Pre-2026-06-24 policy: redirect analytical requests until the goal was validated.
+    # if intent == "structured_flow" and not state.problem_validated:
+    #     logger.info(
+    #         "gate redirect",
+    #         requested=intent,
+    #         effective="problem_statement",
+    #         reason="problem_not_validated",
+    #     )
+    #     return "problem_statement"
     return intent
         analysis_title="Stub analysis",
         problem_statement="Stub problem statement" if problem_validated else "",
         problem_validated=problem_validated,
+        user_id="stub-user",
         report_id=None,
         created_at=now,
         updated_at=now,
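
Reviewer note: a minimal sketch of the no-op seam that the new `gate` docstring describes. `State` is a hypothetical stand-in for `AnalysisState` (the real model is a pydantic `BaseModel` with more fields), and the `Intent` members listed are only the ones visible in this diff. Only the pass-through behaviour is taken from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

# Intent values visible in the diff after `problem_statement` was removed.
Intent = Literal["chat", "help", "check", "unstructured_flow", "structured_flow"]


@dataclass
class State:
    """Hypothetical stand-in for AnalysisState; not the real contract."""

    problem_validated: bool = False


def gate(intent: Intent, state: State) -> Intent:
    """No-op seam: every intent passes through, whatever the state says."""
    return intent


# An unvalidated goal no longer redirects an analytical request.
effective = gate("structured_flow", State(problem_validated=False))
print(effective)
```

Keeping the function and its signature means a future policy can be restored inside `gate` without touching any caller.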

src/agents/handlers/help.py CHANGED

@@ -7,18 +7,24 @@ it never runs analysis or produces data answers.
 The prompt lives in `config/prompts/help.md` (the playbook); this module composes
 the context and streams the LLM answer, mirroring `ChatbotAgent`. The **consistency
 guard** has teeth here, not just in the prompt: `_derive_available_actions` computes
-the actions actually allowed from the state (the same policy as `gate.py`), and that
-list is fed into the prompt — the LLM is told to suggest *only* those, so it can't
-tell the user to generate a report when the goal isn't validated or the analysis
-isn't ready.
 SEAMS:
-  - `AnalysisState` is the locked 8-field contract from `gate.py` (KM-652). The gate,
-    this skill, and tests share `gate.stub_analysis_state(...)` so they exercise the
-    same shape.
-  - `ReportReadiness` is the return shape of `is_report_ready(chat_history)` (seam #5,
-    Rifqi — not built yet). Help *consumes* it; it does not compute it. Until it lands,
-    the caller passes a stub (default: not ready).
 """
 from __future__ import annotations
@@ -59,17 +65,13 @@ class ReportReadiness:
 def _derive_available_actions(state: AnalysisState, report_ready: ReportReadiness) -> list[str]:
-    """Actions Help is allowed to suggest, derived from state (mirrors `gate.py`).
-    This is the consistency guard's teeth: analysis is gated behind a validated goal
-    (same rule the gate applies to `structured_flow`), and a report is only offered
-    when the readiness signal says so. Keep this policy in sync with `gate.gate`.
     """
-    if not state.problem_validated:
-        # Goal not set → the only useful move is defining the problem statement.
-        return ["define_problem_statement"]
-    actions = ["ask_analysis_question", "refine_problem_statement"]
     if report_ready.ready:
         actions.append("generate_report")
     return actions
@@ -78,11 +80,16 @@ def _derive_available_actions(state: AnalysisState, report_ready: ReportReadines
 def _format_state(state: AnalysisState) -> str:
     """Render the analysis state as a compact context block for the LLM."""
     has_report = "yes" if state.report_id else "no"
     return (
         "[Analysis state]\n"
         f"analysis_title: {state.analysis_title or '(none)'}\n"
-        f"problem_statement: {state.problem_statement or '(empty)'}\n"
-        f"problem_validated: {str(state.problem_validated).lower()}\n"
         f"has_report: {has_report}"
     )
@@ -173,7 +180,6 @@ class HelpAgent:
         actions = available_actions or _derive_available_actions(state, readiness)
         logger.info(
             "help guidance",
-            problem_validated=state.problem_validated,
             report_ready=readiness.ready,
             available_actions=actions,
         )

 The prompt lives in `config/prompts/help.md` (the playbook); this module composes
 the context and streams the LLM answer, mirroring `ChatbotAgent`. The **consistency
 guard** has teeth here, not just in the prompt: `_derive_available_actions` computes
+the actions actually allowed from the readiness signal, and that list is fed into the
+prompt — the LLM is told to suggest *only* those, so it can't tell the user to
+generate a report before the analysis is ready.
+NOTE (KM-652, 2026-06-24): the `problem_statement` skill + the `problem_validated`
+gate were removed — the goal is now two user-entered fields (`objective` +
+`business_questions`) captured at onboarding, with no agent validation. So Help no
+longer steers users to define/validate a goal in chat; it just orients them to
+analysis and (when ready) the report.
 SEAMS:
+  - `AnalysisState` is the contract from `gate.py`. The gate, this skill, and tests
+    share `gate.stub_analysis_state(...)` so they exercise the same shape. (The
+    `objective`/`business_questions` rename is in-flight — task #4 — so this module
+    reads those getattr-tolerantly, falling back to legacy `problem_statement`.)
+  - `ReportReadiness` is the return shape of `is_report_ready` (seam #5, Rifqi — built
+    in `report/readiness.py`). Help *consumes* it; it does not compute it. A missing
+    signal degrades to a not-ready stub.
 """
 from __future__ import annotations
 def _derive_available_actions(state: AnalysisState, report_ready: ReportReadiness) -> list[str]:
+    """Actions Help is allowed to suggest, derived from the readiness signal.
+    Since KM-652 there is no goal-validation gate: the goal (objective +
+    business_questions) is set in the onboarding form, so asking analysis questions is
+    always available. A report is only offered when the readiness signal says so.
     """
+    actions = ["ask_analysis_question"]
     if report_ready.ready:
         actions.append("generate_report")
     return actions
 def _format_state(state: AnalysisState) -> str:
     """Render the analysis state as a compact context block for the LLM."""
     has_report = "yes" if state.report_id else "no"
+    # Tolerant of the in-flight AnalysisState rename (#4): prefer objective +
+    # business_questions, fall back to the legacy free-text problem_statement.
+    objective = getattr(state, "objective", "") or getattr(state, "problem_statement", "") or ""
+    questions = getattr(state, "business_questions", None) or []
+    business_questions = "; ".join(questions) if questions else "(none)"
     return (
         "[Analysis state]\n"
         f"analysis_title: {state.analysis_title or '(none)'}\n"
+        f"objective: {objective or '(empty)'}\n"
+        f"business_questions: {business_questions}\n"
         f"has_report: {has_report}"
     )
         actions = available_actions or _derive_available_actions(state, readiness)
         logger.info(
             "help guidance",
             report_ready=readiness.ready,
             available_actions=actions,
         )
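
Reviewer note: the tolerant read in `_format_state` and the simplified action policy can be exercised in isolation. The sketch below assumes nothing about the real `AnalysisState`; `read_goal`, `derive_available_actions`, and the `SimpleNamespace` objects are hypothetical names used only to illustrate the fallback order.

```python
from __future__ import annotations

from types import SimpleNamespace


def read_goal(state: object) -> tuple[str, list[str]]:
    """Prefer objective + business_questions; fall back to legacy problem_statement."""
    objective = getattr(state, "objective", "") or getattr(state, "problem_statement", "") or ""
    questions = list(getattr(state, "business_questions", None) or [])
    return objective, questions


def derive_available_actions(report_ready: bool) -> list[str]:
    """Analysis questions are always allowed; the report only when ready."""
    actions = ["ask_analysis_question"]
    if report_ready:
        actions.append("generate_report")
    return actions


# A state object in the legacy shape and one in the renamed shape.
legacy = SimpleNamespace(problem_statement="Reduce churn")
renamed = SimpleNamespace(
    objective="Reduce churn",
    business_questions=["Which segment churns most?"],
)

print(read_goal(legacy))
print(read_goal(renamed))
print(derive_available_actions(report_ready=False))
```

Because both shapes resolve to the same `(objective, questions)` pair, the Help prompt context stays stable while the rename is in flight.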

src/agents/handlers/problem_statement.py CHANGED

@@ -1,3 +1,7 @@
 """Problem Statement skill — guide the user to a usable problem statement.
 Routed by the orchestrator (intent `problem_statement`) and callable as a skill.

+# UNWIRED 2026-06-24: the problem_statement skill is no longer routed to — it was removed
+# from the 6-intent router and the gate (the goal is now user-entered objective +
+# business_questions, no agent validation). File kept intact (comment, don't delete) so
+# the skill can be restored if needed. See DEV_PLAN.md #1.
 """Problem Statement skill — guide the user to a usable problem statement.
 Routed by the orchestrator (intent `problem_statement`) and callable as a skill.

src/agents/orchestration.py CHANGED

@@ -32,7 +32,9 @@ logger = get_logger("orchestrator")
 Intent = Literal[
     "chat",
     "help",
-    "problem_statement",
     "check",
     "unstructured_flow",
     "structured_flow",
@@ -53,10 +55,10 @@ class RouterDecision(BaseModel):
         ...,
         description=(
             "Handler route for this message: 'chat' (conversational, no data), "
-            "'help' (what-to-do-next guidance), 'problem_statement' (define or "
-            "refine the analysis goal), 'check' (inventory: what data/documents "
-            "exist), 'unstructured_flow' (answer from documents, fast RAG), or "
-            "'structured_flow' (analytical question over data, slow Planner path)."
         ),
     )
     rewritten_query: str | None = Field(

 Intent = Literal[
     "chat",
     "help",
+    # "problem_statement",  # removed 2026-06-24 — the analysis goal is now two
+    #                        # user-entered fields (objective + business_questions),
+    #                        # captured at onboarding with no agent validation.
     "check",
     "unstructured_flow",
     "structured_flow",
         ...,
         description=(
             "Handler route for this message: 'chat' (conversational, no data), "
+            "'help' (what-to-do-next guidance), 'check' (inventory: what "
+            "data/documents exist), 'unstructured_flow' (answer from documents, fast "
+            "RAG), or 'structured_flow' (analytical question over data, slow Planner "
+            "path)."
         ),
     )
     rewritten_query: str | None = Field(

src/agents/report/generator.py CHANGED

@@ -167,9 +167,12 @@ def _build_human_content(
     ps: ProblemStatement, findings: list[ReportFinding], caveats: list[AttributedNote]
 ) -> str:
     sections = []
-    ps_lines = [v for v in (ps.objective, ps.target_value, ps.scope) if v]
-    if ps_lines:
-        sections.append("# Problem Statement\n" + "\n".join(ps_lines))
     sections.append(
         "# Findings (already finalized — synthesize, do not add numbers)\n"
         + "\n".join(f"- {f.text}" for f in findings)
@@ -182,16 +185,23 @@ def _build_human_content(
 def _render_markdown(report: AnalysisReport) -> str:
     # Version is deliberately NOT in the markdown — it is assigned by the store
     # after rendering and lives in the structured `version` field / API metadata.
-    parts: list[str] = ["# Analysis Report"]
-    parts.append(
-        f"*Generated {report.generated_at:%Y-%m-%d} · "
-        f"{len(report.record_ids)} analyses · {len(report.data_sources)} source(s)*"
-    )
     ps = report.problem_statement
-    ps_lines = [v for v in (ps.objective, ps.target_value, ps.scope) if v]
-    if ps_lines:
-        parts.append("## Problem Statement\n" + " ".join(ps_lines))
     if report.executive_summary:
         parts.append("## Executive Summary\n" + report.executive_summary)
@@ -203,18 +213,8 @@ def _render_markdown(report: AnalysisReport) -> str:
             lines.append(f"{i}. {f.text}{cite}")
         parts.append("\n".join(lines))
-    if report.caveats or report.open_questions:
-        lines = ["## Caveats & Open Questions"]
-        for n in report.caveats:
-            cite = f" *({', '.join(n.record_ids)})*" if n.record_ids else ""
-            lines.append(f"- {n.text}{cite}")
-        for n in report.open_questions:
-            cite = f" *({', '.join(n.record_ids)})*" if n.record_ids else ""
-            lines.append(f"- Open: {n.text}{cite}")
-        parts.append("\n".join(lines))
     if report.data_sources:
-        lines = ["## Appendix A — Data Used", "| source | type | detail |", "|---|---|---|"]
         for ds in report.data_sources:
             d = ds.detail
             bits = []
@@ -227,8 +227,18 @@ def _render_markdown(report: AnalysisReport) -> str:
             lines.append(f"| {ds.name} | {ds.source_type or '—'} | {' · '.join(bits) or '—'} |")
         parts.append("\n".join(lines))
     if report.method_steps:
-        lines = ["## Appendix B — Method"]
         for stage_key, label in _STAGE_LABELS:
             steps = [s for s in report.method_steps if s.stage == stage_key]
             if not steps:
@@ -239,7 +249,7 @@ def _render_markdown(report: AnalysisReport) -> str:
             lines.append(f"**{label}** — {rendered}")
         parts.append("\n".join(lines))
-    return "\n\n".join(parts)
 # --------------------------------------------------------------------------- #
@@ -264,9 +274,9 @@ class ReportGenerator:
     def _ensure_record_store(self):
         if self._record_store is None:
-            from ..slow_path.store import PostgresAnalysisStore
-            self._record_store = PostgresAnalysisStore()
         return self._record_store
     def _ensure_chain(self) -> Runnable:
@@ -286,6 +296,7 @@ class ReportGenerator:
         analysis_id: str,
         user_id: str | None = None,
         problem_statement: ProblemStatement | None = None,
     ) -> AnalysisReport:
         records = await self._ensure_record_store().list_for_analysis(analysis_id)
         if not records:
@@ -305,6 +316,7 @@ class ReportGenerator:
         report = AnalysisReport(
             analysis_id=analysis_id,
             user_id=user_id,
             version=0,  # assigned by ReportStore.save under the advisory lock
             generated_at=datetime.now(UTC),
             problem_statement=ps,

     ps: ProblemStatement, findings: list[ReportFinding], caveats: list[AttributedNote]
 ) -> str:
     sections = []
+    if ps.objective:
+        sections.append("# Objective\n" + ps.objective)
+    if ps.business_questions:
+        sections.append(
+            "# Business questions\n" + "\n".join(f"- {q}" for q in ps.business_questions)
+        )
     sections.append(
         "# Findings (already finalized — synthesize, do not add numbers)\n"
         + "\n".join(f"- {f.text}" for f in findings)
 def _render_markdown(report: AnalysisReport) -> str:
     # Version is deliberately NOT in the markdown — it is assigned by the store
     # after rendering and lives in the structured `version` field / API metadata.
+    meta = f"*Generated {report.generated_at:%Y-%m-%d}"
+    author = report.user_name or report.user_id
+    if author:
+        meta += f" by {author}"
+    meta += f" · {len(report.record_ids)} analyses · {len(report.data_sources)} source(s)*"
+    # Title + meta form the header block; each subsequent section is divided by a
+    # horizontal rule (`---`) so the report reads as a formal, sectioned document.
+    parts: list[str] = ["# Analysis Report\n" + meta]
     ps = report.problem_statement
+    if ps.objective:
+        parts.append("## Objective\n" + ps.objective)
+    if ps.business_questions:
+        parts.append(
+            "## Business Questions\n"
+            + "\n".join(f"{i}. {q}" for i, q in enumerate(ps.business_questions, 1))
+        )
     if report.executive_summary:
         parts.append("## Executive Summary\n" + report.executive_summary)
             lines.append(f"{i}. {f.text}{cite}")
         parts.append("\n".join(lines))
     if report.data_sources:
+        lines = ["## EDA", "| source | type | detail |", "|---|---|---|"]
         for ds in report.data_sources:
             d = ds.detail
             bits = []
             lines.append(f"| {ds.name} | {ds.source_type or '—'} | {' · '.join(bits) or '—'} |")
         parts.append("\n".join(lines))
+    if report.caveats or report.open_questions:
+        lines = ["## Notes & Limitations"]
+        for n in report.caveats:
+            cite = f" *({', '.join(n.record_ids)})*" if n.record_ids else ""
+            lines.append(f"- {n.text}{cite}")
+        for n in report.open_questions:
+            cite = f" *({', '.join(n.record_ids)})*" if n.record_ids else ""
+            lines.append(f"- Open: {n.text}{cite}")
+        parts.append("\n".join(lines))
     if report.method_steps:
+        lines = ["## How This Was Analyzed"]
         for stage_key, label in _STAGE_LABELS:
             steps = [s for s in report.method_steps if s.stage == stage_key]
             if not steps:
             lines.append(f"**{label}** — {rendered}")
         parts.append("\n".join(lines))
+    return "\n\n---\n\n".join(parts)
 # --------------------------------------------------------------------------- #
     def _ensure_record_store(self):
         if self._record_store is None:
+            from ..slow_path.store import PostgresReportInputStore
+            self._record_store = PostgresReportInputStore()
         return self._record_store
     def _ensure_chain(self) -> Runnable:
         analysis_id: str,
         user_id: str | None = None,
         problem_statement: ProblemStatement | None = None,
+        user_name: str | None = None,
     ) -> AnalysisReport:
         records = await self._ensure_record_store().list_for_analysis(analysis_id)
         if not records:
         report = AnalysisReport(
             analysis_id=analysis_id,
             user_id=user_id,
+            user_name=user_name,
             version=0,  # assigned by ReportStore.save under the advisory lock
             generated_at=datetime.now(UTC),
             problem_statement=ps,
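
Reviewer note: a reduced sketch of the new header and section layout in `_render_markdown`, to make the effect of the `---` join easy to check. `render_header` and `render_report` are hypothetical helpers that take plain values instead of an `AnalysisReport`; the string formats are copied from the diff.

```python
from __future__ import annotations

from datetime import datetime, timezone


def render_header(generated_at: datetime, author: str | None, n_records: int, n_sources: int) -> str:
    """Title plus the italic meta line, including the optional 'by <author>'."""
    meta = f"*Generated {generated_at:%Y-%m-%d}"
    if author:
        meta += f" by {author}"
    meta += f" · {n_records} analyses · {n_sources} source(s)*"
    return "# Analysis Report\n" + meta


def render_report(header: str, objective: str, business_questions: list[str]) -> str:
    """Join the header and each non-empty section with a horizontal rule."""
    parts = [header]
    if objective:
        parts.append("## Objective\n" + objective)
    if business_questions:
        numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(business_questions, 1))
        parts.append("## Business Questions\n" + numbered)
    return "\n\n---\n\n".join(parts)


header = render_header(datetime(2026, 6, 24, tzinfo=timezone.utc), "Ada", 2, 1)
print(render_report(header, "Reduce churn", ["Which segment churns most?", "Why?"]))
```

Empty sections are skipped before the join, so a report with no objective or questions renders the header alone with no dangling rule.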

src/agents/report/readiness.py CHANGED

@@ -7,8 +7,8 @@ not a judgement.
 The rule mirrors what makes a real report non-empty and worth generating, so Help can
 never suggest an action that would 409 or produce a duplicate:
-  1. `problem_validated` — the gate's own precondition (no validated goal, no
-     analysis worth reporting). Same rule `gate.gate` applies to `structured_flow`.
   2. at least one **substantive** persisted `AnalysisRecord` — a record whose
      *analysis* task succeeded. A failed run still persists a record WITH findings
      (they narrate the failure), and data-access tasks (check_/retrieve_) succeed even
@@ -45,15 +45,15 @@ if TYPE_CHECKING:
 logger = get_logger("report_readiness")
 # Human-readable gaps surfaced to the user via Help (kept stable for the prompt).
-_MISSING_PROBLEM = "a validated problem statement"
 _MISSING_ANALYSIS = "at least one completed analysis"
 _MISSING_DELTA = "a new analysis since the last report"
 def _default_record_store():
-    from ..slow_path.store import PostgresAnalysisStore
-    return PostgresAnalysisStore()
 def _default_report_store():
@@ -91,18 +91,22 @@ async def report_floor(
     *,
     record_store=None,
 ) -> tuple[list[str], list]:
-    """The report **floor**: a validated goal + ≥1 substantive analysis.
     Returns `(missing, substantive_records)`. This is the shared gate both the Help
     readiness signal AND the report API enforce, so the button and Help can't drift
-    (T-D / T11). It deliberately excludes the delta-since-report check — that is
-    advisory and lives only in `is_report_ready`; the report button is always allowed
-    to cut a new version (decision 4A). Fails closed (counts as missing analysis) on
-    a record-store read error. `record_store` is injectable for tests.
     """
     missing: list[str] = []
-    if not state.problem_validated:
-        missing.append(_MISSING_PROBLEM)
     substantive: list = []
     if analysis_id:
@@ -116,7 +120,7 @@ async def report_floor(
                 analysis_id=analysis_id,
                 error=str(exc),
             )
-            return [*missing, _MISSING_ANALYSIS], []
     if not substantive:
         missing.append(_MISSING_ANALYSIS)

 The rule mirrors what makes a real report non-empty and worth generating, so Help can
 never suggest an action that would 409 or produce a duplicate:
+  1. (removed 2026-06-24) a validated problem statement — the report no longer gates on
+     the goal (now user-entered `objective` + `business_questions`, no agent validation).
   2. at least one **substantive** persisted `AnalysisRecord` — a record whose
      *analysis* task succeeded. A failed run still persists a record WITH findings
      (they narrate the failure), and data-access tasks (check_/retrieve_) succeed even
 logger = get_logger("report_readiness")
 # Human-readable gaps surfaced to the user via Help (kept stable for the prompt).
+# _MISSING_PROBLEM retired 2026-06-24 — the report no longer gates on a validated goal.
 _MISSING_ANALYSIS = "at least one completed analysis"
 _MISSING_DELTA = "a new analysis since the last report"
 def _default_record_store():
+    from ..slow_path.store import PostgresReportInputStore
+    return PostgresReportInputStore()
 def _default_report_store():
     *,
     record_store=None,
 ) -> tuple[list[str], list]:
+    """The report **floor**: ≥1 substantive analysis.
     Returns `(missing, substantive_records)`. This is the shared gate both the Help
     readiness signal AND the report API enforce, so the button and Help can't drift
+    (T-D / T11).
+    CHANGED 2026-06-24: the `problem_validated` precondition was dropped — analysis is no
+    longer gated on a validated goal (now user-entered `objective` + `business_questions`,
+    no agent validation), so the only floor is "is there anything worth reporting". The
+    delta-since-report check stays advisory and lives only in `is_report_ready`; the
+    report button is always allowed to cut a new version (decision 4A). Fails closed
+    (counts as missing analysis) on a record-store read error. `record_store` is
+    injectable for tests. `state` stays in the signature (callers + the `is_report_ready`
+    delta check use it).
     """
     missing: list[str] = []
     substantive: list = []
     if analysis_id:
                 analysis_id=analysis_id,
                 error=str(exc),
             )
+            return [_MISSING_ANALYSIS], []
     if not substantive:
         missing.append(_MISSING_ANALYSIS)
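
Reviewer note: the fail-closed behaviour of the new floor can be sketched with two fake stores. Everything here is illustrative: `FailingStore` and `FixedStore` are hypothetical test doubles, the `state` parameter is omitted, and every returned record is treated as substantive, whereas the real module filters on whether an analysis task succeeded.

```python
from __future__ import annotations

import asyncio

MISSING_ANALYSIS = "at least one completed analysis"


class FailingStore:
    """Hypothetical store whose read always fails."""

    async def list_for_analysis(self, analysis_id: str) -> list:
        raise RuntimeError("db unavailable")


class FixedStore:
    """Hypothetical store that returns a fixed list of records."""

    def __init__(self, records: list) -> None:
        self._records = records

    async def list_for_analysis(self, analysis_id: str) -> list:
        return list(self._records)


async def report_floor(analysis_id: str | None, record_store) -> tuple[list[str], list]:
    missing: list[str] = []
    substantive: list = []
    if analysis_id:
        try:
            records = await record_store.list_for_analysis(analysis_id)
        except Exception:
            # Fail closed: a read error counts as "no analysis yet".
            return [MISSING_ANALYSIS], []
        # Assumption: all records are substantive in this sketch.
        substantive = records
    if not substantive:
        missing.append(MISSING_ANALYSIS)
    return missing, substantive


print(asyncio.run(report_floor("a1", FailingStore())))
print(asyncio.run(report_floor("a1", FixedStore(["r1"]))))
```

With the `problem_validated` check gone, the error path returns the single missing item directly instead of extending a list that could already hold the goal gap.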

src/agents/report/schemas.py CHANGED

@@ -22,17 +22,18 @@ from ..slow_path.schemas import TaskSummary
 class ProblemStatement(BaseModel):
-    """Minimal stub of Harry's Problem Statement, frozen into each report.
-    Loose on purpose until the real PS template lands (Analysis State, upstream).
-    A report snapshots the PS as it was at generation time.
     """
     objective: str = ""
-    metric_direction: str = ""  # "increase" | "decrease"
-    target_metric: str = ""
-    target_value: str = ""
-    scope: str = ""
 class DataSourceRef(BaseModel):
@@ -75,6 +76,7 @@ class AnalysisReport(BaseModel):
     report_id: str = Field(default_factory=lambda: uuid4().hex)
     analysis_id: str
     user_id: str | None = None
     version: int
     generated_at: datetime
     # Frozen snapshots.

 class ProblemStatement(BaseModel):
+    """The analysis goal, frozen into each report at generation time.
+    Analysis-State shape — `objective` + `business_questions`
+    — which replaced the old single free-text problem statement. A report snapshots
+    the goal as it was at generation time. Class name is kept (for now) to avoid
+    import churn across report.py / generator.py / store.py; rename to `ReportGoal`
+    once the upstream AnalysisState rename (objective/business_questions) lands so
+    every caller migrates in one pass.
     """
     objective: str = ""
+    business_questions: list[str] = Field(default_factory=list)
 class DataSourceRef(BaseModel):
     report_id: str = Field(default_factory=lambda: uuid4().hex)
     analysis_id: str
     user_id: str | None = None
+    user_name: str | None = None  # display name for "generated by"; falls back to user_id
     version: int
     generated_at: datetime
     # Frozen snapshots.

src/agents/report/store.py CHANGED

@@ -1,6 +1,6 @@
 """ReportStore — persists/reads versioned AnalysisReports (KM-644).
-Mirrors `PostgresAnalysisStore`: each call opens its own `AsyncSessionLocal`.
 Version assignment is serialized per `analysis_id` with a Postgres
 transaction-level advisory lock so concurrent button presses can't compute the

 """ReportStore — persists/reads versioned AnalysisReports (KM-644).
+Mirrors `PostgresReportInputStore`: each call opens its own `AsyncSessionLocal`.
 Version assignment is serialized per `analysis_id` with a Postgres
 transaction-level advisory lock so concurrent button presses can't compute the

src/agents/slow_path/schemas.py CHANGED

@@ -69,7 +69,7 @@ class TaskSummary(BaseModel):
 class AnalysisRecord(BaseModel):
     # Identity. `record_id` is the unit the report cites and snapshots
     # (`record_ids`); `analysis_id`/`user_id` scope the record to one analysis
-    # session + owner and are stamped by the composition root / AnalysisStore at
     # persist time (they depend on the Analysis State that lives outside the slow
     # path), so they default to None when the Assembler first builds the record.
     record_id: str = Field(default_factory=lambda: uuid4().hex)

 class AnalysisRecord(BaseModel):
     # Identity. `record_id` is the unit the report cites and snapshots
     # (`record_ids`); `analysis_id`/`user_id` scope the record to one analysis
+    # session + owner and are stamped by the composition root / ReportInputStore at
     # persist time (they depend on the Analysis State that lives outside the slow
     # path), so they default to None when the Assembler first builds the record.
     record_id: str = Field(default_factory=lambda: uuid4().hex)

src/agents/slow_path/store.py CHANGED

@@ -1,13 +1,13 @@
-"""AnalysisStore — the seam the slow path persists its AnalysisRecord through.
 The Assembler produces an `AnalysisRecord` (the faithful, structured record of a
 run — §8.3, INV-4). Persisting it is a separate concern from streaming the answer,
 so it sits behind this seam. `generate_report` later reads records back by
 `analysis_id` (oldest-first) and renders from them — never from chat history.
-- `NullAnalysisStore` logs and stores nothing (kept for tests / when persistence
   is intentionally disabled).
-- `PostgresAnalysisStore` writes one `analysis_records` row per run in the catalog
   DB (Neon `dataeyond`, `settings.postgres_connstring`).
 `save` must never raise on the caller's path — a persistence failure must not break
@@ -23,7 +23,7 @@ from sqlalchemy import select
 from sqlalchemy.dialects.postgresql import insert
 from src.db.postgres.connection import AsyncSessionLocal
-from src.db.postgres.models import AnalysisRecordRow
 from src.middlewares.logging import get_logger
 from .schemas import AnalysisRecord
@@ -32,7 +32,7 @@ logger = get_logger("analysis_store")
 @runtime_checkable
-class AnalysisStore(Protocol):
     """Persist + read completed analyses.
     `save` must never raise on the caller's path. `list_for_analysis` returns the
@@ -44,12 +44,12 @@ class AnalysisStore(Protocol):
     async def list_for_analysis(self, analysis_id: str) -> list[AnalysisRecord]: ...
-class NullAnalysisStore:
     """No-op store: logs the record, persists nothing. Reads return empty."""
     async def save(self, record: AnalysisRecord) -> None:
         logger.info(
-            "analysis_record produced (not persisted — NullAnalysisStore)",
             record_id=record.record_id,
             plan_id=record.plan_id,
             n_tasks=len(record.tasks_run),
@@ -59,8 +59,8 @@ class NullAnalysisStore:
         return []
-class PostgresAnalysisStore:
-    """Writes/reads `analysis_records` jsonb rows in the catalog DB.
     Mirrors `CatalogStore`: each call opens its own `AsyncSession`. One row per
     record (vs. one-per-user for the catalog) since records accumulate per analysis.
@@ -70,7 +70,7 @@ class PostgresAnalysisStore:
         try:
             payload = record.model_dump(mode="json")
             async with AsyncSessionLocal() as session:
-                stmt = insert(AnalysisRecordRow).values(
                     id=record.record_id,
                     analysis_id=record.analysis_id,
                     user_id=record.user_id,
@@ -81,7 +81,7 @@ class PostgresAnalysisStore:
                 # Re-running the same plan id-collides only if record_id repeats;
                 # treat that as idempotent (overwrite) rather than erroring the user.
                 stmt = stmt.on_conflict_do_update(
-                    index_elements=[AnalysisRecordRow.id],
                     set_={"data": stmt.excluded.data},
                 )
                 await session.execute(stmt)
@@ -102,9 +102,9 @@ class PostgresAnalysisStore:
     async def list_for_analysis(self, analysis_id: str) -> list[AnalysisRecord]:
         async with AsyncSessionLocal() as session:
             result = await session.execute(
-                select(AnalysisRecordRow.data)
-                .where(AnalysisRecordRow.analysis_id == analysis_id)
-                .order_by(AnalysisRecordRow.created_at.asc())
             )
             rows = result.scalars().all()
         return [AnalysisRecord.model_validate(row) for row in rows]

+"""ReportInputStore — the seam the slow path persists its AnalysisRecord through.
 The Assembler produces an `AnalysisRecord` (the faithful, structured record of a
 run — §8.3, INV-4). Persisting it is a separate concern from streaming the answer,
 so it sits behind this seam. `generate_report` later reads records back by
 `analysis_id` (oldest-first) and renders from them — never from chat history.
+- `NullReportInputStore` logs and stores nothing (kept for tests / when persistence
   is intentionally disabled).
+- `PostgresReportInputStore` writes one `report_inputs` row per run in the catalog
   DB (Neon `dataeyond`, `settings.postgres_connstring`).
 `save` must never raise on the caller's path — a persistence failure must not break
 from sqlalchemy.dialects.postgresql import insert
 from src.db.postgres.connection import AsyncSessionLocal
+from src.db.postgres.models import ReportInputRow
 from src.middlewares.logging import get_logger
 from .schemas import AnalysisRecord
 @runtime_checkable
+class ReportInputStore(Protocol):
     """Persist + read completed analyses.
     `save` must never raise on the caller's path. `list_for_analysis` returns the
     async def list_for_analysis(self, analysis_id: str) -> list[AnalysisRecord]: ...
+class NullReportInputStore:
     """No-op store: logs the record, persists nothing. Reads return empty."""
     async def save(self, record: AnalysisRecord) -> None:
         logger.info(
+            "analysis_record produced (not persisted — NullReportInputStore)",
             record_id=record.record_id,
             plan_id=record.plan_id,
             n_tasks=len(record.tasks_run),
         return []
+class PostgresReportInputStore:
+    """Writes/reads `report_inputs` jsonb rows in the catalog DB.
     Mirrors `CatalogStore`: each call opens its own `AsyncSession`. One row per
     record (vs. one-per-user for the catalog) since records accumulate per analysis.
         try:
             payload = record.model_dump(mode="json")
             async with AsyncSessionLocal() as session:
+                stmt = insert(ReportInputRow).values(
                     id=record.record_id,
                     analysis_id=record.analysis_id,
                     user_id=record.user_id,
                 # Re-running the same plan id-collides only if record_id repeats;
                 # treat that as idempotent (overwrite) rather than erroring the user.
                 stmt = stmt.on_conflict_do_update(
+                    index_elements=[ReportInputRow.id],
                     set_={"data": stmt.excluded.data},
                 )
                 await session.execute(stmt)
     async def list_for_analysis(self, analysis_id: str) -> list[AnalysisRecord]:
         async with AsyncSessionLocal() as session:
             result = await session.execute(
+                select(ReportInputRow.data)
+                .where(ReportInputRow.analysis_id == analysis_id)
+                .order_by(ReportInputRow.created_at.asc())
             )
             rows = result.scalars().all()
         return [AnalysisRecord.model_validate(row) for row in rows]
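
The store contract above (`save` never raises, a repeated `record_id` overwrites, reads come back oldest-first) can be illustrated without a database. This is a minimal sketch, not the PR's implementation: `Record`, `RecordStore`, `InMemoryStore` and the integer `created_at` are hypothetical stand-ins, and a dict replaces the `report_inputs` table.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

logger = logging.getLogger("report_input_store_sketch")


@dataclass
class Record:
    """Hypothetical stand-in for AnalysisRecord (only the fields used here)."""
    record_id: str
    analysis_id: str
    created_at: int  # simple ordering key; the real row uses a timestamp column


@runtime_checkable
class RecordStore(Protocol):
    async def save(self, record: Record) -> None: ...
    async def list_for_analysis(self, analysis_id: str) -> list[Record]: ...


class InMemoryStore:
    """Dict-backed store: saving the same record_id twice overwrites it."""

    def __init__(self, fail_writes: bool = False) -> None:
        self._rows: dict[str, Record] = {}
        self._fail_writes = fail_writes

    async def save(self, record: Record) -> None:
        try:
            if self._fail_writes:
                raise ConnectionError("simulated database outage")
            self._rows[record.record_id] = record  # upsert keyed on the id
        except Exception as exc:
            # Swallow and log: a persistence failure must not break the caller.
            logger.warning("save failed: %s", exc)

    async def list_for_analysis(self, analysis_id: str) -> list[Record]:
        rows = [r for r in self._rows.values() if r.analysis_id == analysis_id]
        return sorted(rows, key=lambda r: r.created_at)  # oldest first


async def demo() -> tuple[list[str], list[str]]:
    ok = InMemoryStore()
    await ok.save(Record("r2", "a1", created_at=2))
    await ok.save(Record("r1", "a1", created_at=1))
    await ok.save(Record("r1", "a1", created_at=1))  # repeated id: overwrite, no error
    broken = InMemoryStore(fail_writes=True)
    await broken.save(Record("r3", "a1", created_at=3))  # logged, not raised
    good_ids = [r.record_id for r in await ok.list_for_analysis("a1")]
    broken_ids = [r.record_id for r in await broken.list_for_analysis("a1")]
    return good_ids, broken_ids
```

Running `asyncio.run(demo())` exercises both the healthy and the failing store; the caller never sees the simulated outage, which is the property the docstring asks for.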

src/agents/state_store.py CHANGED

@@ -5,7 +5,7 @@ The orchestrator gate + Help skill read `AnalysisState` (the locked contract in
 row shares its id with the chat `rooms` row — one session = one analysis = one
 conversation (`analysis_id == room_id`).
-Mirrors `PostgresAnalysisStore`: each call opens its own `AsyncSession`.
 """
 from __future__ import annotations
@@ -27,7 +27,7 @@ def _row_to_state(row: AnalysisStateRow) -> AnalysisState:
         analysis_title=row.analysis_title,
         problem_statement=row.problem_statement,
         problem_validated=row.problem_validated,
-        owner_id=row.owner_id,
         report_id=row.report_id,
         created_at=row.created_at,
         updated_at=row.updated_at,
@@ -45,7 +45,7 @@ class AnalysisStateStore:
     async def ensure(
         self,
         analysis_id: str,
-        owner_id: str,
         analysis_title: str = "New analysis",
     ) -> AnalysisState:
         """Get-or-create the state row for a session (idempotent, race-safe).
@@ -62,7 +62,7 @@ class AnalysisStateStore:
                 insert(AnalysisStateRow)
                 .values(
                     id=analysis_id,
-                    owner_id=owner_id,
                     analysis_title=analysis_title,
                     problem_statement="",
                     problem_validated=False,
@@ -78,7 +78,7 @@ class AnalysisStateStore:
         self,
         *,
         analysis_id: str,
-        owner_id: str,
         analysis_title: str = "New analysis",
         problem_statement: str = "",
     ) -> AnalysisState:
@@ -86,7 +86,7 @@ class AnalysisStateStore:
         async with AsyncSessionLocal() as session:
             row = AnalysisStateRow(
                 id=analysis_id,
-                owner_id=owner_id,
                 analysis_title=analysis_title,
                 problem_statement=problem_statement,
                 problem_validated=False,

 row shares its id with the chat `rooms` row — one session = one analysis = one
 conversation (`analysis_id == room_id`).
+Mirrors `PostgresReportInputStore`: each call opens its own `AsyncSession`.
 """
 from __future__ import annotations
         analysis_title=row.analysis_title,
         problem_statement=row.problem_statement,
         problem_validated=row.problem_validated,
+        user_id=row.user_id,
         report_id=row.report_id,
         created_at=row.created_at,
         updated_at=row.updated_at,
     async def ensure(
         self,
         analysis_id: str,
+        user_id: str,
         analysis_title: str = "New analysis",
     ) -> AnalysisState:
         """Get-or-create the state row for a session (idempotent, race-safe).
                 insert(AnalysisStateRow)
                 .values(
                     id=analysis_id,
+                    user_id=user_id,
                     analysis_title=analysis_title,
                     problem_statement="",
                     problem_validated=False,
         self,
         *,
         analysis_id: str,
+        user_id: str,
         analysis_title: str = "New analysis",
         problem_statement: str = "",
     ) -> AnalysisState:
         async with AsyncSessionLocal() as session:
             row = AnalysisStateRow(
                 id=analysis_id,
+                user_id=user_id,
                 analysis_title=analysis_title,
                 problem_statement=problem_statement,
                 problem_validated=False,
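
The "get-or-create (idempotent, race-safe)" behaviour of `ensure` can be sketched as an insert that ignores conflicts followed by a read. The diff does not show the conflict clause the PR uses, so this is an assumption-laden illustration: it uses the standard-library `sqlite3` module and `INSERT OR IGNORE` in place of PostgreSQL and SQLAlchemy, and the three-column `analyses` table is a simplified, hypothetical schema.

```python
import sqlite3


def ensure(conn, analysis_id, user_id, analysis_title="New analysis"):
    """Get-or-create: the insert is a no-op when the id already exists."""
    conn.execute(
        "INSERT OR IGNORE INTO analyses (id, user_id, analysis_title) "
        "VALUES (?, ?, ?)",
        (analysis_id, user_id, analysis_title),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id, user_id, analysis_title FROM analyses WHERE id = ?",
        (analysis_id,),
    ).fetchone()
    return {"id": row[0], "user_id": row[1], "analysis_title": row[2]}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analyses ("
    "id TEXT PRIMARY KEY, user_id TEXT NOT NULL, analysis_title TEXT NOT NULL)"
)

first = ensure(conn, "room-1", "user-a", "Churn study")
# A second call with the same id keeps the existing row untouched.
second = ensure(conn, "room-1", "user-a", "Different title")
row_count = conn.execute("SELECT COUNT(*) FROM analyses").fetchone()[0]
```

Because the uniqueness check lives in the database rather than in a prior `SELECT`, two concurrent callers cannot both create the row.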

src/api/v1/analysis.py CHANGED

@@ -30,7 +30,7 @@ def _serialize_state(row: AnalysisStateRow, data_source_ids: list[str]) -> dict:
         "analysis_title": row.analysis_title,
         "problem_statement": row.problem_statement,
         "problem_validated": row.problem_validated,
-        "owner_id": row.owner_id,
         "report_id": row.report_id,
         "data_source_ids": data_source_ids,
         "created_at": row.created_at.isoformat() if row.created_at else None,
@@ -94,7 +94,7 @@ async def create_analysis(
     # id, created atomically in one transaction.
     state_row = AnalysisStateRow(
         id=analysis_id,
-        owner_id=request.user_id,
         analysis_title=request.analysis_title,
         problem_statement=request.problem_statement,
         problem_validated=False,
@@ -144,7 +144,7 @@ async def list_analyses(user_id: str, db: AsyncSession = Depends(get_db)):
     """
     result = await db.execute(
         select(AnalysisStateRow)
-        .where(AnalysisStateRow.owner_id == user_id)
         .order_by(AnalysisStateRow.updated_at.desc())
     )
     rows = result.scalars().all()

         "analysis_title": row.analysis_title,
         "problem_statement": row.problem_statement,
         "problem_validated": row.problem_validated,
+        "user_id": row.user_id,
         "report_id": row.report_id,
         "data_source_ids": data_source_ids,
         "created_at": row.created_at.isoformat() if row.created_at else None,
     # id, created atomically in one transaction.
     state_row = AnalysisStateRow(
         id=analysis_id,
+        user_id=request.user_id,
         analysis_title=request.analysis_title,
         problem_statement=request.problem_statement,
         problem_validated=False,
     """
     result = await db.execute(
         select(AnalysisStateRow)
+        .where(AnalysisStateRow.user_id == user_id)
         .order_by(AnalysisStateRow.updated_at.desc())
     )
     rows = result.scalars().all()

src/api/v1/report.py CHANGED

@@ -45,10 +45,37 @@ async def _load_state(analysis_id: str):
 def _problem_statement_from(state) -> ProblemStatement:
-    """Map the analysis's free-text problem statement into the report's structured PS."""
-    if state is None or not state.problem_statement:
         return ProblemStatement()
-    return ProblemStatement(objective=state.problem_statement)
 async def _record_report_on_state(analysis_id: str, report_id: str) -> None:
@@ -109,8 +136,9 @@ async def generate_report(
     try:
         problem_statement = _problem_statement_from(state)
         report = await _generator.generate(
-            analysis_id, user_id, problem_statement=problem_statement
         )
     except ReportError as e:
         raise HTTPException(status_code=status.HTTP_409_CONFLICT, detail=str(e)) from e
@@ -121,6 +149,14 @@ async def generate_report(
             detail=f"Report generation failed: {e}",
         ) from e
     try:
         saved = await _store.save(report)
     except Exception as e:

 def _problem_statement_from(state) -> ProblemStatement:
+    """Freeze the analysis goal into the report's snapshot.
+    Bridges the 2026-06-24 AnalysisState rework: prefer the new `objective` +
+    `business_questions` fields when the state carries them, else fall back to the
+    legacy free-text `problem_statement`. So the report works both before and after
+    the state-model migration lands (#4 / dedorch #3).
+    """
+    if state is None:
         return ProblemStatement()
+    objective = getattr(state, "objective", "") or getattr(state, "problem_statement", "") or ""
+    business_questions = list(getattr(state, "business_questions", []) or [])
+    return ProblemStatement(objective=objective, business_questions=business_questions)
+async def _resolve_user_name(user_id: str) -> str | None:
+    """Best-effort display name (`users.fullname`) for the report's "generated by".
+    Never-throw: a missing user or read error falls back to None, so the generator
+    shows the raw `user_id`. Resolving it here keeps the report self-contained (#19);
+    swap to a Go-passed display name later if the team prefers.
+    """
+    try:
+        from src.db.postgres.connection import AsyncSessionLocal
+        from src.db.postgres.models import User
+        async with AsyncSessionLocal() as session:
+            user = await session.get(User, user_id)
+            return user.fullname if user is not None else None
+    except Exception as e:  # noqa: BLE001 — never block a report on the name lookup
+        logger.warning("report: user name resolve failed", user_id=user_id, error=str(e))
+        return None
 async def _record_report_on_state(analysis_id: str, report_id: str) -> None:
     try:
         problem_statement = _problem_statement_from(state)
+        user_name = await _resolve_user_name(user_id)
         report = await _generator.generate(
+            analysis_id, user_id, problem_statement=problem_statement, user_name=user_name
         )
     except ReportError as e:
         raise HTTPException(status_code=status.HTTP_409_CONFLICT, detail=str(e)) from e
             detail=f"Report generation failed: {e}",
         ) from e
+    # ⚠️ TRANSITIONAL — Go is to own ALL writes:
+    # the report becomes a content-only skill (FE → Go → Python) and Go persists to the
+    # `reports`/`analyses` tables. Until Go exposes those write endpoints, Python still
+    # self-persists here:
+    #   _store.save(report)            → inserts the versioned `reports` row
+    #   _record_report_on_state(...)   → writes report_id back onto the `analyses` row
+    # Remove both (return `report` content only) once Go's report-write + state-write
+    # endpoints land.
     try:
         saved = await _store.save(report)
     except Exception as e:
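
The fallback in `_problem_statement_from` can be checked in isolation. The sketch below copies the `getattr` chain from the diff; the `ProblemStatement` dataclass and the `SimpleNamespace` states are hypothetical stand-ins for the real schema and `AnalysisState` objects.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class ProblemStatement:
    """Hypothetical stand-in for the report's ProblemStatement schema."""
    objective: str = ""
    business_questions: list[str] = field(default_factory=list)


def problem_statement_from(state) -> ProblemStatement:
    if state is None:
        return ProblemStatement()
    # Prefer the new field; fall back to the legacy free text; end on "".
    objective = (
        getattr(state, "objective", "")
        or getattr(state, "problem_statement", "")
        or ""
    )
    questions = list(getattr(state, "business_questions", []) or [])
    return ProblemStatement(objective=objective, business_questions=questions)


# Pre-migration state: only the legacy field exists.
legacy = SimpleNamespace(problem_statement="Reduce churn")
# Post-migration state: new fields win even if the legacy text is still there.
migrated = SimpleNamespace(
    objective="Grow revenue",
    business_questions=["Which region lags?"],
    problem_statement="old text",
)
# New field present but unset (None): the legacy text is still used.
half_migrated = SimpleNamespace(objective=None, problem_statement="Reduce churn")
```

The `or` chain treats `None` and the empty string the same way, which is why a half-migrated row with an unset `objective` still yields the legacy text.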

src/api/v1/tools.py CHANGED

@@ -4,18 +4,17 @@ Exposes the agent's user-invocable slash-command catalog so the Golang backend
 can cache it and the frontend can render its "/" command menu WITHOUT calling the
 AI agent for every list (Golang GETs + caches `list_tools`).
-Scope confirmed: the catalog is the UNIFIED set of
-everything the user can invoke via `/` —
-spanning what the team internally splits into skills + analytics tools +
-data-access tools. Naming : verb-first, kebab-case, `/` prefix.
-Each command maps 1:1 to a real internal tool/intent `name` (the dispatch key);
-the granular data-access tools (check_data, check_knowledge, retrieve_data,
-retrieve_knowledge) are listed separately.
-NOTE: the merged `check` intent still exists for natural-language routing — it is
-NOT a slash command; slash invocation bypasses the router to the tool directly.
-Deferred analytics tools (comparison/contribution/profile/segment) are NOT
-exposed (not wired to the Planner).
 Stateless and deterministic — safe for the Golang backend to cache.
 """
@@ -49,6 +48,16 @@ class ListToolsResponse(BaseModel):
 # Single source of truth for the FE slash-command catalog. Order = display order.
 # Keep `command` in Harry's convention (verb-first, kebab-case, `/`); `name` is the
 # internal route/tool name used by the orchestrator.
 _COMMAND_CATALOG: list[CommandResponse] = [
     CommandResponse(
         command="/help",
@@ -57,60 +66,67 @@ _COMMAND_CATALOG: list[CommandResponse] = [
         description="Show what the assistant can do and guide your next step.",
     ),
     CommandResponse(
-        command="/problem-statement",
-        name="problem_statement",
         type="skill",
-        description="Define and validate your analysis goal (objective + metric) "
-        "before exploring data.",
-    ),
-    CommandResponse(
-        command="/analyze-descriptive",
-        name="analyze_descriptive",
-        type="analytics",
-        description="Summary statistics for selected columns (count, mean, min, max, …).",
-    ),
-    CommandResponse(
-        command="/analyze-aggregate",
-        name="analyze_aggregate",
-        type="analytics",
-        description="Group and aggregate values (sum, count, average) by dimension.",
-    ),
-    CommandResponse(
-        command="/analyze-correlation",
-        name="analyze_correlation",
-        type="analytics",
-        description="Correlation strength between numeric columns.",
-    ),
-    CommandResponse(
-        command="/analyze-trend",
-        name="analyze_trend",
-        type="analytics",
-        description="Trend of a value over time at a chosen frequency.",
-    ),
-    CommandResponse(
-        command="/check-data",
-        name="check_data",
-        type="data_access",
-        description="Inventory of the available structured data sources.",
-    ),
-    CommandResponse(
-        command="/check-knowledge",
-        name="check_knowledge",
-        type="data_access",
-        description="Inventory of the available knowledge / uploaded documents.",
-    ),
-    CommandResponse(
-        command="/retrieve-data",
-        name="retrieve_data",
-        type="data_access",
-        description="Pull rows from a structured source for analysis.",
-    ),
-    CommandResponse(
-        command="/retrieve-knowledge",
-        name="retrieve_knowledge",
-        type="data_access",
-        description="Retrieve relevant passages from your uploaded documents.",
     ),
 ]

 can cache it and the frontend can render its "/" command menu WITHOUT calling the
 AI agent for every list (Golang GETs + caches `list_tools`).
+Scope (2026-06-24, KM-674): the FE slash-command catalog is now just the two
+FE-callable skills — `/help` and `/report`. Naming: verb-first, kebab-case, `/`
+prefix; each command maps 1:1 to a real internal tool/intent `name` (the dispatch
+key).
+The analytics + data-access tools (analyze_*, check_*, retrieve_*) and the retired
+`/problem-statement` skill are kept COMMENTED in the catalog below, NOT deleted —
+they still exist and run via the router/Planner, and check_data/check_knowledge are
+served by Golang; they are simply not surfaced in the FE slash menu for now. Slash
+invocation bypasses the router to the tool directly, so re-exposing one is a matter
+of un-commenting its entry.
 Stateless and deterministic — safe for the Golang backend to cache.
 """
 # Single source of truth for the FE slash-command catalog. Order = display order.
 # Keep `command` in Harry's convention (verb-first, kebab-case, `/`); `name` is the
 # internal route/tool name used by the orchestrator.
+#
+# 2026-06-24 (KM-674 batch): the FE-callable skills are ONLY /help + /report. The rest
+# below are COMMENTED OUT — NOT deleted — on purpose:
+#   - /problem-statement is retired (objective + business_questions now live in the
+#     New-Analysis form, not a slash skill).
+#   - check_data / check_knowledge stay available but are served by Golang, not exposed
+#     in the FE slash menu.
+#   - the analytics + data-access tools still exist and run via the router/Planner; they
+#     are simply not surfaced as FE slash commands here.
+# Re-enable any line if the FE slash menu is later widened back out.
 _COMMAND_CATALOG: list[CommandResponse] = [
     CommandResponse(
         command="/help",
         description="Show what the assistant can do and guide your next step.",
     ),
     CommandResponse(
+        command="/report",
+        name="report",
         type="skill",
+        description="Generate a versioned analysis report (background, EDA, "
+        "key findings, insights).",
     ),
+    # CommandResponse(
+    #     command="/problem-statement",
+    #     name="problem_statement",
+    #     type="skill",
+    #     description="Define and validate your analysis goal (objective + metric) "
+    #     "before exploring data.",
+    # ),
+    # CommandResponse(
+    #     command="/analyze-descriptive",
+    #     name="analyze_descriptive",
+    #     type="analytics",
+    #     description="Summary statistics for selected columns (count, mean, min, max, …).",
+    # ),
+    # CommandResponse(
+    #     command="/analyze-aggregate",
+    #     name="analyze_aggregate",
+    #     type="analytics",
+    #     description="Group and aggregate values (sum, count, average) by dimension.",
+    # ),
+    # CommandResponse(
+    #     command="/analyze-correlation",
+    #     name="analyze_correlation",
+    #     type="analytics",
+    #     description="Correlation strength between numeric columns.",
+    # ),
+    # CommandResponse(
+    #     command="/analyze-trend",
+    #     name="analyze_trend",
+    #     type="analytics",
+    #     description="Trend of a value over time at a chosen frequency.",
+    # ),
+    # CommandResponse(
+    #     command="/check-data",
+    #     name="check_data",
+    #     type="data_access",
+    #     description="Inventory of the available structured data sources.",
+    # ),
+    # CommandResponse(
+    #     command="/check-knowledge",
+    #     name="check_knowledge",
+    #     type="data_access",
+    #     description="Inventory of the available knowledge / uploaded documents.",
+    # ),
+    # CommandResponse(
+    #     command="/retrieve-data",
+    #     name="retrieve_data",
+    #     type="data_access",
+    #     description="Pull rows from a structured source for analysis.",
+    # ),
+    # CommandResponse(
+    #     command="/retrieve-knowledge",
+    #     name="retrieve_knowledge",
+    #     type="data_access",
+    #     description="Retrieve relevant passages from your uploaded documents.",
+    # ),
 ]
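
The catalog convention described above (`/` prefix, kebab-case, one dispatch `name` per command) lends itself to a small check. This is a sketch under assumptions: `Command` is a simplified dataclass rather than the PR's `CommandResponse` model, and the `name` and `type` shown for `/help` are assumed because the diff elides those lines.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    command: str
    name: str
    type: str


CATALOG = [
    Command(command="/help", name="help", type="skill"),  # name/type assumed
    Command(command="/report", name="report", type="skill"),
]

# "/" followed by lowercase words joined by single hyphens.
_KEBAB = re.compile(r"^/[a-z]+(-[a-z]+)*$")


def validate_catalog(catalog):
    """Return a list of problems; an empty list means the convention holds."""
    problems = []
    names = [c.name for c in catalog]
    if len(set(names)) != len(names):
        problems.append("duplicate dispatch name")
    for c in catalog:
        if not _KEBAB.match(c.command):
            problems.append(f"{c.command}: not '/'-prefixed kebab-case")
    return problems


def resolve(command, catalog=CATALOG):
    """Map a slash command to its dispatch key, or None if it is not exposed."""
    by_command = {c.command: c.name for c in catalog}
    return by_command.get(command)
```

With the narrowed catalog, `resolve("/check-data")` returns `None`; un-commenting an entry in the real catalog would be the equivalent of appending a `Command` here.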

src/config/prompts/help.md CHANGED

@@ -1,5 +1,8 @@
-<!-- help.md · v1 · Help skill prompt. Bump to v2 (don't silently overwrite) on major change,
-     e.g. when real UI steps land from the frontend. See checkpoint 2026-06-18. -->
 You are the **Help guide** for an AI data-analysis assistant. Think of yourself as the
 instruction sheet that comes with a board game: your only job is to tell the user
@@ -12,8 +15,8 @@ You are given context, never raw user prose to analyze:
 - **`analysis_state`** — the current per-analysis state. Fields you use:
   - `analysis_title` — what this analysis is called.
-  - `problem_statement` — the user's goal (may be empty/weak; it is optional at creation).
-  - `problem_validated` (bool) — **the gate.** `false` = the goal still needs work; `true` = the goal is set and analysis is unlocked.
   - `report_id` — `0`/absent means no report has ever been generated.
 - **`chat_history`** — the conversation so far. Use it to judge how far along the user is and to avoid repeating yourself.
 - **`report_ready`** — a **deterministic** signal computed for you (NOT your judgment):
@@ -35,35 +38,34 @@ Keep it short. Lead with the next step; don't recap everything.
 ## State-tiered guidance
-Pick the branch that matches `analysis_state` + `report_ready`:
-### A. `problem_validated == false` → fix the goal first
-The user can't get good analysis without a clear goal. Steer them to define or sharpen the
-problem statement.
-- If `problem_statement` is empty: encourage them to state what they want to find out, and mention the AI can help — they can run **`/problem_statement`** (or just describe their goal in chat).
-- If `problem_statement` exists but is vague: gently push for something more **measurable and concrete** (a target, a metric, a timeframe), grounded in their `analysis_title` and the data they've bound. Give one short example of a sharper version.
-- Do **not** push analysis or reports yet.
-### B. `problem_validated == true`, little/no analysis yet → orient to analysis
-Tell them the goal is set and they can start asking questions about their data. Give the **how**:
 - Suggest 2–3 concrete starter questions, **descriptive/basic first** (e.g. "Which products sell the most?", "How have sales trended this month?").
-- **Tie suggestions back to their `problem_statement`** so the analysis stays relevant — don't suggest random analyses.
 - **Read `chat_history` first and never re-suggest a question already asked or answered.** Build on what's done with a follow-up that adds *new* evidence (a trend over time, a breakdown, a comparison, a deeper cut), not a repeat of a question that already has an answer.
 - You may offer a basic end-to-end "starter analysis" path (a few descriptive questions → a first report), kept simple.
-### C. `problem_validated == true`, analysis under way, `report_ready.ready == false` → close the gaps
 They've started but there isn't enough yet for a report. Point at `report_ready.missing` and
 recommend the specific next questions that would fill those gaps (phrase them as questions
-the user can ask), still anchored to the problem statement.
-### D. `problem_validated == true` and `report_ready.ready == true` → nudge toward the report
 There's enough to report. Encourage them to generate it. Report can be triggered **two ways**:
-the **`/generate report`** skill **or** the report button — mention both so it feels natural.
 Do not over-promise the report's depth.
 ## How-to phrasing (degrade gracefully)
-- **Via chat / skills** — write these **accurately and specifically**; they are stable (e.g. "type your question in the chat", "run `/problem_statement`", "run `/generate report`").
 - **Via the UI (buttons/menus)** — the frontend isn't final yet. Describe UI steps **generically** ("use the Generate Report option") rather than naming exact buttons/positions you're unsure of. Prefer the chat/skill path when unsure. *(A later version of this file will fill in the real UI steps.)*
 - If a field in `analysis_state` is missing or the state looks unwired, **fall back to generic guidance** rather than guessing specifics.
@@ -84,24 +86,16 @@ English). A few sentences is usually enough.
 ## Examples
 ```
-State: problem_validated=false, problem_statement=""
-→ "Looks like we haven't set a goal yet. Tell me what you want to find out — for example,
-   'reduce churn next quarter' — or run /problem_statement and I'll help you shape it."
-State: problem_validated=false, problem_statement="make sales better"
-→ "Your goal is a good start but a bit broad. Let's make it measurable — e.g. 'grow north-region
-   revenue by 10% this quarter.' Run /problem_statement and we'll refine it together."
-State: problem_validated=true, chat_history nearly empty
 → "Your goal is set — you can start exploring now. Try a basic question first, like
    'Which products sell the most?' or 'How have monthly sales trended?', then we can dig into
-   what's driving your goal."
-State: problem_validated=true, report_ready.ready=false, missing=["no comparison over time"]
 → "Good progress. Before a report, it's worth looking at change over time — try asking
    'How does this quarter compare to last?' Once we have that, we can put the report together."
-State: problem_validated=true, report_ready.ready=true
-→ "You've covered enough to summarize. You can generate your report now — run /generate report
    or use the report option to create it."
 ```

+<!-- help.md · v2 · Help skill prompt. v2 (2026-06-24, KM-652): removed the problem_statement
+     skill + the problem_validated gate — the goal (objective + business_questions) is now set
+     in the New Analysis form at onboarding, so Help no longer steers users to define/validate a
+     goal in chat. Bump to v3 (don't silently overwrite) on the next major change (e.g. real UI
+     steps from the frontend). -->
 You are the **Help guide** for an AI data-analysis assistant. Think of yourself as the
 instruction sheet that comes with a board game: your only job is to tell the user
 - **`analysis_state`** — the current per-analysis state. Fields you use:
   - `analysis_title` — what this analysis is called.
+  - `objective` — the user's goal (set in the New Analysis form at onboarding).
+  - `business_questions` — the specific questions the user wants answered (set in the form).
   - `report_id` — `0`/absent means no report has ever been generated.
 - **`chat_history`** — the conversation so far. Use it to judge how far along the user is and to avoid repeating yourself.
 - **`report_ready`** — a **deterministic** signal computed for you (NOT your judgment):
 ## State-tiered guidance
+The goal (`objective` + `business_questions`) is already set at onboarding, so your job is to
+move the user *through* the analysis — not to define the goal. Pick the branch that matches
+`analysis_state` + `report_ready`:
+### A. Little/no analysis yet → orient to analysis
+Tell them they can start asking questions about their data, and give the **how**:
 - Suggest 2–3 concrete starter questions, **descriptive/basic first** (e.g. "Which products sell the most?", "How have sales trended this month?").
+- **Tie suggestions back to their `objective` and `business_questions`** so the analysis stays relevant — don't suggest random analyses.
 - **Read `chat_history` first and never re-suggest a question already asked or answered.** Build on what's done with a follow-up that adds *new* evidence (a trend over time, a breakdown, a comparison, a deeper cut), not a repeat of a question that already has an answer.
 - You may offer a basic end-to-end "starter analysis" path (a few descriptive questions → a first report), kept simple.
+### B. Analysis under way, `report_ready.ready == false` → close the gaps
 They've started but there isn't enough yet for a report. Point at `report_ready.missing` and
 recommend the specific next questions that would fill those gaps (phrase them as questions
+the user can ask), still anchored to the objective and business questions.
+### C. `report_ready.ready == true` → nudge toward the report
 There's enough to report. Encourage them to generate it. Report can be triggered **two ways**:
+the **`/report`** skill **or** the report button — mention both so it feels natural.
 Do not over-promise the report's depth.
+> Edge case: if `objective` looks empty (unusual — it's required at onboarding), don't push a
+> chat skill to fix it; gently suggest they set the objective + business questions in the New
+> Analysis form.
 ## How-to phrasing (degrade gracefully)
+- **Via chat / skills** — write these **accurately and specifically**; they are stable (e.g. "type your question in the chat", "run `/report`").
 - **Via the UI (buttons/menus)** — the frontend isn't final yet. Describe UI steps **generically** ("use the Generate Report option") rather than naming exact buttons/positions you're unsure of. Prefer the chat/skill path when unsure. *(A later version of this file will fill in the real UI steps.)*
 - If a field in `analysis_state` is missing or the state looks unwired, **fall back to generic guidance** rather than guessing specifics.
 ## Examples
 ```
+State: chat_history nearly empty
 → "Your goal is set — you can start exploring now. Try a basic question first, like
    'Which products sell the most?' or 'How have monthly sales trended?', then we can dig into
+   what's driving your objective."
+State: report_ready.ready=false, missing=["no comparison over time"]
 → "Good progress. Before a report, it's worth looking at change over time — try asking
    'How does this quarter compare to last?' Once we have that, we can put the report together."
+State: report_ready.ready=true
+→ "You've covered enough to summarize. You can generate your report now — run /report
    or use the report option to create it."
 ```
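
The three v2 branches can be read as a small decision function. In the prompt the model judges "little/no analysis yet" from `chat_history`; the sketch below replaces that judgment with a hypothetical turn count, so treat `n_analysis_turns` and the zero threshold as assumptions rather than part of the prompt.

```python
def pick_branch(n_analysis_turns: int, report_ready: dict) -> str:
    """Return "A", "B" or "C" following the v2 state-tiered guidance."""
    if report_ready.get("ready"):
        return "C"  # enough evidence: nudge toward /report
    if n_analysis_turns == 0:
        return "A"  # nothing analysed yet: suggest starter questions
    return "B"      # under way but gaps remain: point at report_ready["missing"]


fresh = pick_branch(0, {"ready": False, "missing": []})
in_progress = pick_branch(3, {"ready": False, "missing": ["no comparison over time"]})
ready = pick_branch(5, {"ready": True, "missing": []})
```

`report_ready.ready` is checked first because it is the deterministic signal; the turn count only separates branches A and B.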

src/config/prompts/intent_router.md CHANGED

@@ -7,7 +7,7 @@ Return three fields:
 - **`intent`** — exactly one of:
   - `chat` — conversational, no data needed: greetings, farewells, thanks, "how are you", "what can you do", small talk.
   - `help` — the user wants to know **what to do next** or how the process works ("what's the next step?", "how do I start?", "what should I do now?").
-  - `problem_statement` — the user wants to **define or refine the analysis goal**: the business problem, objectives, what to increase/decrease, targets/success metrics — or is answering questions about the goal.
   - `check` — the user wants an **inventory** of what they have: "what data do I have?", "what columns are in this table?", "what documents did I upload?", "describe my dataset". This is metadata/listing, not analysis.
   - `unstructured_flow` — the user asks about a **topic, concept, feature, explanation, or factual knowledge** that may live in uploaded documents (PDF/DOCX/TXT). Pure document Q&A. The user need not mention a document.
   - `structured_flow` — the user asks an **analytical question over their data**: counts, sums, top-N, filters, comparisons, trends, correlations, segments, share-of-total, joins across structured sources. This routes to the slow analytical path.
@@ -18,16 +18,14 @@ Return three fields:
 1. Pure greeting / farewell / thanks / "what can you do" / compliment with no task → `chat`.
 2. "What do I do next / how do I proceed / where do I start" → `help`.
-3. The user states or refines a goal, objective, target, or success metric, or answers a goal-defining question → `problem_statement`.
-4. "What data / columns / tables / documents do I have", "describe my data", inventory or metadata requests → `check`.
-5. A question answerable from document prose — a topic, concept, feature, explanation, summary, or factual knowledge, even without naming a document → `unstructured_flow`.
-6. An analytical question answerable by computing over tabular/DB data (counts, sums, top-N, filters, comparisons, trends, correlations, segments) → `structured_flow`.
 ## Disambiguation (the boundaries that matter)
 - **`check` vs `structured_flow`** — "what do I have / describe it" → `check`; "analyze / compute / trend / correlate / compare it" → `structured_flow`.
 - **`unstructured_flow` vs `structured_flow`** — pure document/concept Q&A → `unstructured_flow`; anything needing computation over tabular/DB data → `structured_flow`. **When in doubt between "analytical AND also needs document context" → `structured_flow`** (the analytical path can pull document context itself). Only choose `unstructured_flow` for *pure* document questions with no computation.
-- **`help` vs `problem_statement`** — "what's next?" → `help`; "here is my goal / let's define the objective" → `problem_statement`.
 - **`chat` vs everything else** — only use `chat` when there is no task and no data question at all.
 ## Rewriting follow-ups
@@ -58,16 +56,6 @@ User: "Okay I uploaded my data, what do I do next?"
 User: "How does this work? Where should I start?"
 → intent="help", rewritten_query=null, confidence=0.9
-User: "I want to reduce customer churn next quarter, target under 5%."
-→ intent="problem_statement",
-  rewritten_query="Define the analysis goal: reduce customer churn next quarter to under 5%.",
-  confidence=0.9
-User: "My goal is to grow revenue in the north region."
-→ intent="problem_statement",
-  rewritten_query="Define the analysis goal: grow revenue in the north region.",
-  confidence=0.88
 User: "What data do I have?"
 → intent="check", rewritten_query="What data sources do I have?", confidence=0.95
@@ -113,7 +101,7 @@ User: "And in March?"
 ## Constraints
-- Pick exactly one `intent`. Do not invent values outside the six listed.
 - Prefer `unstructured_flow` over `structured_flow` only for pure knowledge/document questions; prefer `structured_flow` whenever computation over data is involved.
 - Do not refuse — refusal happens later in guardrails. Just classify.
 - One JSON object as output; no prose, no markdown.

 - **`intent`** — exactly one of:
   - `chat` — conversational, no data needed: greetings, farewells, thanks, "how are you", "what can you do", small talk.
   - `help` — the user wants to know **what to do next** or how the process works ("what's the next step?", "how do I start?", "what should I do now?").
+  <!-- `problem_statement` intent removed 2026-06-24 — the analysis goal is now two user-entered fields (objective + business_questions) captured at onboarding, with no agent validation. -->
   - `check` — the user wants an **inventory** of what they have: "what data do I have?", "what columns are in this table?", "what documents did I upload?", "describe my dataset". This is metadata/listing, not analysis.
   - `unstructured_flow` — the user asks about a **topic, concept, feature, explanation, or factual knowledge** that may live in uploaded documents (PDF/DOCX/TXT). Pure document Q&A. The user need not mention a document.
   - `structured_flow` — the user asks an **analytical question over their data**: counts, sums, top-N, filters, comparisons, trends, correlations, segments, share-of-total, joins across structured sources. This routes to the slow analytical path.
 1. Pure greeting / farewell / thanks / "what can you do" / compliment with no task → `chat`.
 2. "What do I do next / how do I proceed / where do I start" → `help`.
+3. "What data / columns / tables / documents do I have", "describe my data", inventory or metadata requests → `check`.
+4. A question answerable from document prose — a topic, concept, feature, explanation, summary, or factual knowledge, even without naming a document → `unstructured_flow`.
+5. An analytical question answerable by computing over tabular/DB data (counts, sums, top-N, filters, comparisons, trends, correlations, segments) → `structured_flow`.
 ## Disambiguation (the boundaries that matter)
 - **`check` vs `structured_flow`** — "what do I have / describe it" → `check`; "analyze / compute / trend / correlate / compare it" → `structured_flow`.
 - **`unstructured_flow` vs `structured_flow`** — pure document/concept Q&A → `unstructured_flow`; anything needing computation over tabular/DB data → `structured_flow`. **When in doubt between "analytical AND also needs document context" → `structured_flow`** (the analytical path can pull document context itself). Only choose `unstructured_flow` for *pure* document questions with no computation.
 - **`chat` vs everything else** — only use `chat` when there is no task and no data question at all.
 ## Rewriting follow-ups
 User: "How does this work? Where should I start?"
 → intent="help", rewritten_query=null, confidence=0.9
 User: "What data do I have?"
 → intent="check", rewritten_query="What data sources do I have?", confidence=0.95
 ## Constraints
+- Pick exactly one `intent`. Do not invent values outside the five listed.
 - Prefer `unstructured_flow` over `structured_flow` only for pure knowledge/document questions; prefer `structured_flow` whenever computation over data is involved.
 - Do not refuse — refusal happens later in guardrails. Just classify.
 - One JSON object as output; no prose, no markdown.
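
Review note: with `problem_statement` removed, a consumer of the router output may want to reject any intent outside the five remaining values. A minimal sketch, assuming the router emits a JSON object with `intent`, `rewritten_query`, and `confidence` keys as in the prompt's examples. The `parse_router_output` helper and `ALLOWED_INTENTS` constant are hypothetical and not part of this PR.

```python
import json

# The five intents left after this PR (problem_statement is gone).
ALLOWED_INTENTS = {"chat", "help", "check", "unstructured_flow", "structured_flow"}


def parse_router_output(raw: str) -> dict:
    """Parse the router's JSON reply and reject intents outside the allowed set."""
    payload = json.loads(raw)
    intent = payload.get("intent")
    if intent not in ALLOWED_INTENTS:
        raise ValueError(f"unexpected intent: {intent!r}")
    return {
        "intent": intent,
        # rewritten_query may legitimately be null (see the `help` examples).
        "rewritten_query": payload.get("rewritten_query"),
        "confidence": float(payload.get("confidence", 0.0)),
    }
```

A stale model reply that still says `problem_statement` would raise here instead of being dispatched silently. How the caller recovers (retry, fall back to `chat`, and so on) is left open.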

src/config/prompts/report_summary.md CHANGED

@@ -1,10 +1,11 @@
 You are a senior data analyst writing the **executive summary** of an analysis report.
-You are given the Problem Statement and a list of already-finalized findings and caveats drawn from completed analyses. Write a concise executive summary (3–5 sentences) that synthesizes those findings in relation to the stated goal.
 Rules:
 - Synthesize and prioritize — lead with the most decision-relevant finding.
 - Do NOT introduce any number, fact, or claim that is not present in the findings. You are summarizing, not analyzing.
-- Do NOT simply restate every finding; connect them into a narrative and say what they mean for the goal.
 - If the findings are thin or inconclusive, say so plainly rather than overstating.
-- Plain business language, prose only — no headings, no bullet lists.

 You are a senior data analyst writing the **executive summary** of an analysis report.
+You are given the analysis Objective and its Business questions, plus a list of already-finalized findings and caveats drawn from completed analyses. Write a concise executive summary (3–5 sentences) that synthesizes those findings in relation to the objective and, where the findings allow, the business questions.
 Rules:
 - Synthesize and prioritize — lead with the most decision-relevant finding.
 - Do NOT introduce any number, fact, or claim that is not present in the findings. You are summarizing, not analyzing.
+- Do NOT simply restate every finding; connect them into a narrative and say what they mean for the objective.
 - If the findings are thin or inconclusive, say so plainly rather than overstating.
+- Plain business language. Write **prose only — no headings, no bullet lists** (the report already supplies the section structure and a Key Findings list below this summary; do not duplicate them).
+- You MAY use light inline markdown for emphasis within the prose — `**bold**` for the most decision-relevant figure or term, `*italic*` sparingly. Keep it subtle; do not bold whole sentences.
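
Review note: the rule "do NOT introduce any number ... not present in the findings" can be spot-checked after generation. A rough sketch, assuming the summary and findings are available as plain strings. The `unsupported_numbers` helper is hypothetical and not part of this PR, and the regex only catches simple numeric tokens such as `42`, `4.2%`, or `1,200`.

```python
import re

# Matches simple numeric tokens, optionally with separators and a trailing percent sign.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def unsupported_numbers(summary: str, findings: list[str]) -> set[str]:
    """Return numeric tokens that appear in the summary but in none of the findings."""
    known: set[str] = set()
    for finding in findings:
        known.update(_NUMBER.findall(finding))
    return set(_NUMBER.findall(summary)) - known
```

An empty result does not prove the summary is faithful. It only flags the most obvious case of a figure that appears in no finding.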

src/config/settings.py CHANGED

@@ -24,11 +24,10 @@ class Settings(BaseSettings):
     # real source lands, so this stays opt-in.
     enable_slow_path: bool = Field(alias="enable_slow_path", default=False)
-    # Apply the deterministic gate (problem_validated) before dispatch: redirect
-    # `structured_flow` to `problem_statement` until the analysis is validated. Off
-    # by default — legacy `rooms` have no `analysis_states` row, so it would gate
-    # everything. Flip ENABLE_GATE=true once the frontend creates analyses via
-    # /analysis/create.
     enable_gate: bool = Field(alias="enable_gate", default=False)
     # Database

     # real source lands, so this stays opt-in.
     enable_slow_path: bool = Field(alias="enable_slow_path", default=False)
+    # DEPRECATED 2026-06-24: the problem_validated gate was removed (the goal is now
+    # user-entered objective + business_questions, no agent validation). This flag no
+    # longer has any effect — the gate call site in ChatHandler is commented out. Kept
+    # to avoid .env churn; remove once no environment references it.
     enable_gate: bool = Field(alias="enable_gate", default=False)
     # Database
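
Review note: since `enable_gate` is kept only to avoid `.env` churn, one option is to warn at startup when an environment still sets it, which makes the eventual removal easier to schedule. A minimal standard-library sketch. The `warn_if_deprecated_gate_set` helper is hypothetical and not part of this PR, and the variable name `ENABLE_GATE` is taken from the removed comment.

```python
import os
import warnings


def warn_if_deprecated_gate_set(environ=None) -> bool:
    """Emit a DeprecationWarning if the no-op ENABLE_GATE flag is still configured."""
    env = os.environ if environ is None else environ
    # Check both spellings, since env var lookups may be case-sensitive.
    if "ENABLE_GATE" in env or "enable_gate" in env:
        warnings.warn(
            "ENABLE_GATE is deprecated and has no effect; remove it from the environment.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```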

src/db/postgres/init_db.py CHANGED

@@ -4,7 +4,7 @@ from sqlalchemy import text
 from src.db.postgres.connection import engine, Base
 from src.db.postgres.models import (
     AnalysisDataSourceRow,
-    AnalysisRecordRow,
     AnalysisReportRow,
     AnalysisStateRow,
     Catalog,

 from src.db.postgres.connection import engine, Base
 from src.db.postgres.models import (
     AnalysisDataSourceRow,
+    ReportInputRow,
     AnalysisReportRow,
     AnalysisStateRow,
     Catalog,

src/db/postgres/models.py CHANGED

@@ -127,7 +127,7 @@ class Catalog(Base):
     updated_at = Column(DateTime(timezone=True), onupdate=func.now())
-class AnalysisRecordRow(Base):
     """One row per completed slow-path analysis (the report's source of truth).
     `data` holds the full Pydantic AnalysisRecord
@@ -138,11 +138,23 @@ class AnalysisRecordRow(Base):
     `analysis_id` is nullable until the Analysis State (owned upstream) is wired
     into the slow path; records still persist (and carry `user_id`) before then.
     """
-    __tablename__ = "analysis_records"
-    id = Column(String, primary_key=True)  # AnalysisRecord.record_id
-    analysis_id = Column(String, index=True)  # FK to the analysis session (nullable for now)
     user_id = Column(String, nullable=False, index=True)
     plan_id = Column(String, nullable=False)
     data = Column(JSONB, nullable=False)
@@ -169,23 +181,39 @@ class AnalysisReportRow(Base):
 class AnalysisStateRow(Base):
-    """Per-analysis session state — the dedorch `analysis` table (Go-owned migration).
     One session = one analysis = one conversation; `id` is the shared session id
-    (canonical UUID). The orchestrator gate + Help skill read this every turn;
-    `problem_validated` gates structured analysis; the Problem Statement skill flips
-    it; `report_id` is null until a report exists. `id`/`report_id` are Postgres
-    `uuid` in dedorch, so they bind as UUID (canonical-string in/out). Class name
-    kept as `AnalysisStateRow`; only the table + id types changed for dedorch.
     """
-    __tablename__ = "analysis"
     id = Column(UUID(as_uuid=False), primary_key=True)  # shared session id (uuid)
     analysis_title = Column(String, nullable=False, default="New analysis")
     problem_statement = Column(Text, nullable=False, default="")
     problem_validated = Column(Boolean, nullable=False, default=False)
-    owner_id = Column(String, nullable=False, index=True)
     report_id = Column(UUID(as_uuid=False), nullable=True)
     created_at = Column(DateTime(timezone=True), server_default=func.now())
     updated_at = Column(
         DateTime(timezone=True), server_default=func.now(), onupdate=func.now()

     updated_at = Column(DateTime(timezone=True), onupdate=func.now())
+class ReportInputRow(Base):
     """One row per completed slow-path analysis (the report's source of truth).
     `data` holds the full Pydantic AnalysisRecord
     `analysis_id` is nullable until the Analysis State (owned upstream) is wired
     into the slow path; records still persist (and carry `user_id`) before then.
+    OWNERSHIP / HANDOFF (#21/#22, 2026-06-25 checkpoint): table **renamed `analysis_records`
+    → `report_inputs`** — it holds the inputs report generation reads (the slow-path run
+    records). "report_inputs" avoids clashing with Go's `analyses_messages` and with Langfuse
+    observability. **Python-owned for now** (Python still creates it locally); the finalized
+    schema goes to Harry so the dedorch migration creates it post-cutover (#22), where
+    `id`/`analysis_id` will be `uuid` (+ FK to `analyses(id)`). The Pydantic `AnalysisRecord`
+    (the in-memory run object) is intentionally kept. Slated to migrate to Go ownership later —
+    keep this + DEV_PLAN #21/#22 as the handoff record. NOTE: dedorch currently still has the
+    OLD `analysis_records` table (empty) until Harry's rename migration lands.
     """
+    __tablename__ = "report_inputs"
+    # id/analysis_id are `uuid` to match dedorch's `report_inputs` + the analysis-family
+    # (analyses/reports/data_sources). No FK declared in Python (dedorch's migration owns it, #22).
+    id = Column(UUID(as_uuid=False), primary_key=True)  # AnalysisRecord.record_id (uuid hex ok)
+    analysis_id = Column(UUID(as_uuid=False), index=True)  # the analysis session id (nullable for now)
     user_id = Column(String, nullable=False, index=True)
     plan_id = Column(String, nullable=False)
     data = Column(JSONB, nullable=False)
 class AnalysisStateRow(Base):
+    """Per-analysis session state — the dedorch **`analyses`** table (plural; Go-owned).
     One session = one analysis = one conversation; `id` is the shared session id
+    (canonical UUID). Verified against the dedorch DB 2026-06-25.
+    dedorch `analyses` ACTUAL columns: `id` (uuid), `analysis_title`, `user_id` (text),
+    `report_id` (uuid), `created_at`, `updated_at`, `problem_statement`,
+    `problem_validated`, `status` (text 'active'|'inactive' — soft-delete),
+    `data_bind` (jsonb), `data_bind_version` (int), `report_collection` (jsonb).
+    Reconciled to that shape (#4, 2026-06-26): `user_id` (was `owner_id`) + `status`/`data_bind`/
+    `data_bind_version`/`report_collection` added. dedorch still carries `problem_statement`/
+    `problem_validated` and does NOT yet have `objective`/`business_questions` — Harry's #3 drops
+    the former + adds the latter; the report layer reads the goal getattr-tolerantly so that swap
+    stays non-breaking. The new FE/Go columns are stored to match dedorch but NOT surfaced in the
+    `AnalysisState` pydantic contract (no Python reader needs them yet).
+    `analysis` (singular) is the deprecated DUPLICATE table Harry will drop — never use it.
+    Class name kept as `AnalysisStateRow`.
     """
+    __tablename__ = "analyses"
     id = Column(UUID(as_uuid=False), primary_key=True)  # shared session id (uuid)
     analysis_title = Column(String, nullable=False, default="New analysis")
     problem_statement = Column(Text, nullable=False, default="")
     problem_validated = Column(Boolean, nullable=False, default=False)
+    user_id = Column(String, nullable=False, index=True)  # was owner_id (dedorch uses user_id)
     report_id = Column(UUID(as_uuid=False), nullable=True)
+    # dedorch `analyses` columns (FE/Go concerns; carried so create_all matches dedorch).
+    status = Column(String, nullable=False, default="active")  # active | inactive (soft-delete)
+    data_bind = Column(JSONB, nullable=False, default=list)
+    data_bind_version = Column(Integer, nullable=False, default=1)
+    report_collection = Column(JSONB, nullable=False, default=list)
     created_at = Column(DateTime(timezone=True), server_default=func.now())
     updated_at = Column(
         DateTime(timezone=True), server_default=func.now(), onupdate=func.now()