Agentic-Service-Data-Eyond-Catalog

sofhiaazzhr committed 19 days ago

Commit

faf8a4f

2 Parent(s): 0ae8717 212dad3

Merge commit 'refs/pr/1' of https://huggingface.co/spaces/DataEyond/Agentic-Service-Data-Eyond-Catalog into pr/1

Files changed (5)

PROGRESS.md +29 -9
pyproject.toml +2 -1
src/query/compiler/pandas.py +1 -1
src/query/compiler/sql.py +288 -11
src/query/executor/db.py +171 -8

PROGRESS.md CHANGED

@@ -2,8 +2,8 @@
 Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "Team — division of work". Update as PRs land. Future Claude Code sessions read this to know what's already done.
-**Last updated**: 2026-05-07 (PR2a — enricher + structured pipeline shipped)
-**Current open PR**: PR2a (DB owner — CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension)
 ---
@@ -22,9 +22,10 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 |---|---|---|---|
 | PR1 | `[x]` merged | DB | Contract locks + catalog plumbing + DB introspector + IR validator + tests |
 | PR1-tab | `[ ]` | TAB | Tabular introspector + golden IR examples for tabular |
-| PR2a | `[~]` open | DB | CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension on Table |
 | PR2b | `[ ]` | B | IntentRouter + planner prompt (pair) + planner LLM service |
-| PR3 | `[ ]` | B (split) | SQL compiler + DB executor (DB), pandas compiler + tabular executor (TAB) |
 | PR4 | `[ ]` | B (pair) | ExecutorDispatcher + QueryService + chat stream endpoint integration |
 | PR5 | `[ ]` | B | Retry/self-correction loop on execution failure |
 | PR6 | `[ ]` | B | Eval harness (golden question→IR→result examples) |
@@ -91,10 +92,10 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | # | Item | Status | Notes |
 |---|---|---|---|
-| 25 | SQL compiler (`query/compiler/sql.py`) | `[ ]` | PR3 — IR → (sql, params); identifiers from catalog (quoted), values parameterized |
-| 26 | DB executor (`query/executor/db.py`) | `[ ]` | PR3 — asyncpg/pymysql; sqlglot SELECT-only guard; RO txn; 30s timeout |
 | 27 | Credential encryption (`security/credentials.py`) | `[ ]` | Stub exists; PR1 reused Phase 1 `utils/db_credential_encryption.py` instead. Move in cleanup PR |
-| 28 | User-DB connection management | `[ ]` | Reused Phase 1 `db_pipeline_service.engine_scope` in PR1; new helper in PR3 if needed |
 ### Query — Tabular path
@@ -124,7 +125,7 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | # | Item | Owner | Status | Notes |
 |---|---|---|---|---|
-| 38 | DB compiler golden tests (`tests/query/compiler/test_sql.py`) | DB | `[ ]` | PR3 — pure-Python, no LLM |
 | 39 | Pandas compiler golden tests (`tests/query/compiler/test_pandas.py`) | TAB | `[ ]` | PR3 — pure-Python, no LLM |
 | 40 | IR validator tests (`tests/query/ir/test_validator.py`) | B | `[x]` | PR1 — 19 tests, all rules covered |
 | — | PII detector tests (`tests/catalog/test_pii_detector.py`) | B | `[x]` | PR1 — 26 tests (parametrized) |
@@ -139,7 +140,26 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 ---
-## What just shipped (PR2a — DB owner)
 **Files implemented**:
 - `src/catalog/enricher.py` — Azure OpenAI GPT-4o + structured output (`EnrichmentResponse`), `render_source` (reusable by planner prompt later), `apply_descriptions` merger, injectable `structured_chain` for tests

 Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "Team — division of work". Update as PRs land. Future Claude Code sessions read this to know what's already done.
+**Last updated**: 2026-05-07 (PR3-DB — SQL compiler + DB executor shipped)
+**Current open PR**: PR3-DB (DB owner — SqlCompiler + DbExecutor + golden IR→SQL tests)
 ---
 |---|---|---|---|
 | PR1 | `[x]` merged | DB | Contract locks + catalog plumbing + DB introspector + IR validator + tests |
 | PR1-tab | `[ ]` | TAB | Tabular introspector + golden IR examples for tabular |
+| PR2a | `[x]` merged | DB | CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension on Table |
 | PR2b | `[ ]` | B | IntentRouter + planner prompt (pair) + planner LLM service |
+| PR3-DB | `[~]` open | DB | SqlCompiler (Postgres) + DbExecutor (sqlglot guard, RO + statement_timeout, asyncio.to_thread) + 36 golden IR→SQL tests |
+| PR3-TAB | `[ ]` | TAB | Pandas compiler + tabular executor + golden IR→DataFrame tests |
 | PR4 | `[ ]` | B (pair) | ExecutorDispatcher + QueryService + chat stream endpoint integration |
 | PR5 | `[ ]` | B | Retry/self-correction loop on execution failure |
 | PR6 | `[ ]` | B | Eval harness (golden question→IR→result examples) |
 | # | Item | Status | Notes |
 |---|---|---|---|
+| 25 | SQL compiler (`query/compiler/sql.py`) | `[x]` | PR3-DB — Postgres dialect (Supabase reuses); deterministic IR → (sql, named-params dict); double-quoted identifiers from catalog; all whitelisted ops (=, !=, <, <=, >, >=, in, not_in, is_null, is_not_null, like, between); alias-aware order_by; `CompiledSql.params: dict[str, Any]` (changed from `list`). MySQL/BigQuery/Snowflake compilers later. |
+| 26 | DB executor (`query/executor/db.py`) | `[x]` | PR3-DB — sync engine via `db_pipeline_service.engine_scope` inside `asyncio.to_thread`. sqlglot SELECT-only / no-DML guard. Postgres-only session settings: `default_transaction_read_only=on` + `statement_timeout=30000`. asyncio.wait_for backstop. Never raises — populates `QueryResult.error`. 10k row hard cap. |
 | 27 | Credential encryption (`security/credentials.py`) | `[ ]` | Stub exists; PR1 reused Phase 1 `utils/db_credential_encryption.py` instead. Move in cleanup PR |
+| 28 | User-DB connection management | `[x]` | PR3-DB reused Phase 1 `db_pipeline_service.engine_scope` (same as PR1 introspector); no new helper needed |
 ### Query — Tabular path
 | # | Item | Owner | Status | Notes |
 |---|---|---|---|---|
+| 38 | DB compiler golden tests (`tests/query/compiler/test_sql.py`) | DB | `[x]` | PR3-DB — 36 tests across all whitelisted ops, identifier quoting, agg / count_distinct / count(*), order_by alias resolution, parameter sequencing, error paths. Pure-Python, no LLM, no DB. |
 | 39 | Pandas compiler golden tests (`tests/query/compiler/test_pandas.py`) | TAB | `[ ]` | PR3 — pure-Python, no LLM |
 | 40 | IR validator tests (`tests/query/ir/test_validator.py`) | B | `[x]` | PR1 — 19 tests, all rules covered |
 | — | PII detector tests (`tests/catalog/test_pii_detector.py`) | B | `[x]` | PR1 — 26 tests (parametrized) |
 ---
+## What just shipped (PR3-DB — DB owner)
+**Files implemented**:
+- `src/query/compiler/sql.py` — `SqlCompiler` for Postgres dialect; `CompiledSql(sql, params)` dataclass with `params: dict[str, Any]` (changed from `list`); supports all 12 whitelisted filter ops, all 6 aggs, alias-aware order_by; `_qident` escapes embedded double-quotes
+- `src/query/executor/db.py` — `DbExecutor` with sqlglot SELECT-only guard, Postgres session-level read-only + 30s `statement_timeout`, `asyncio.wait_for` backstop, 10k row hard cap; rejects non-`schema` source_type and `dbclient://` URI mismatch; never raises (populates `QueryResult.error`)
+**Files extended**:
+- `src/query/compiler/pandas.py` — fixed pre-existing UP035 (Callable import)
+- `pyproject.toml` — added `S608` to `tests/**` ruff ignore (false positive: tests assert literal SQL strings)
+**Tests added** (36 new, all passing — total now 100):
+- `tests/query/compiler/test_sql.py` — every filter op, every agg, count(*), count_distinct, order_by alias vs column, multi-filter AND, identifier quoting escape, error paths
+**Lint**: `ruff check` clean on Phase 2 paths.
+**Hand-off note for teammate**: `CompiledSql.params` is now `dict[str, Any]` not `list`. The pandas compiler will follow the same convention (or document its own) — coordinate when PR3-TAB lands.
+---
+## What shipped previously (PR2a — DB owner)
 **Files implemented**:
 - `src/catalog/enricher.py` — Azure OpenAI GPT-4o + structured output (`EnrichmentResponse`), `render_source` (reusable by planner prompt later), `apply_descriptions` merger, injectable `structured_chain` for tests
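
The hand-off note about `CompiledSql.params` being a `dict[str, Any]` is easier to picture with a concrete value. The following is a minimal sketch, not captured compiler output: the `orders` table, its columns, and the SQL string are hypothetical and hand-written in the shape the notes describe (double-quoted identifiers, `:p_0, :p_1, ...` placeholders). The standard-library `sqlite3` driver stands in for the SQLAlchemy `text(sql)` call on Postgres, on the assumption that both bind `:name` placeholders from a dict.

```python
import sqlite3

# Hand-written example in the documented shape: double-quoted identifiers,
# values bound through named placeholders :p_0, :p_1, ...
sql = (
    'SELECT "orders"."status", COUNT(*) AS "n" '
    'FROM "orders" '
    'WHERE "orders"."amount" >= :p_0 AND "orders"."status" IN (:p_1, :p_2) '
    'GROUP BY "orders"."status" ORDER BY "n" DESC LIMIT 10'
)
params = {"p_0": 100, "p_1": "paid", "p_2": "shipped"}

# In-memory SQLite database used only as a stand-in for the user's Postgres DB.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "orders" ("status" TEXT, "amount" INTEGER)')
conn.executemany(
    'INSERT INTO "orders" VALUES (?, ?)',
    [("paid", 150), ("paid", 300), ("shipped", 120), ("paid", 50), ("new", 500)],
)

# The dict is passed as-is; each :p_N placeholder is looked up by name.
rows = conn.execute(sql, params).fetchall()
print(rows)
```

A dict keyed by placeholder name keeps the binding independent of placeholder order, which a positional `list` would not.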

pyproject.toml CHANGED

@@ -121,7 +121,8 @@ ignore = [
 ]
 [tool.ruff.lint.per-file-ignores]
-"tests/**" = ["S101", "S105", "S106"]
 [tool.mypy]
 python_version = "3.12"

 ]
 [tool.ruff.lint.per-file-ignores]
+# S608 in tests is a false positive — tests assert literal SQL strings as fixtures.
+"tests/**" = ["S101", "S105", "S106", "S608"]
 [tool.mypy]
 python_version = "3.12"

src/query/compiler/pandas.py CHANGED

@@ -5,7 +5,7 @@ For tabular sources. The callable encapsulates the chain of operations
 to a DataFrame loaded eagerly or via predicate pushdown / polars lazy scan.
 """
-from typing import Callable
 from ...catalog.models import Catalog
 from ..ir.models import QueryIR

 to a DataFrame loaded eagerly or via predicate pushdown / polars lazy scan.
 """
+from collections.abc import Callable
 from ...catalog.models import Catalog
 from ..ir.models import QueryIR

src/query/compiler/sql.py CHANGED

@@ -1,28 +1,305 @@
-"""SqlCompiler — IR → (SQL string, parameters list).
-Identifiers (table, column names) come from the catalog (trusted).
-Values come from IR.filters and are ALWAYS parameterized — never inlined.
-Output is validated by sqlglot before reaching the executor.
 """
-from dataclasses import dataclass
-from ...catalog.models import Catalog
-from ..ir.models import QueryIR
 from .base import BaseCompiler
 @dataclass
 class CompiledSql:
     sql: str
-    params: list[object]
 class SqlCompiler(BaseCompiler):
-    """Deterministic IR → SQL. No LLM."""
-    def __init__(self, catalog: Catalog) -> None:
         self._catalog = catalog
     def compile(self, ir: QueryIR) -> CompiledSql:
-        raise NotImplementedError

+"""SqlCompiler — IR → (SQL string, named-params dict).
+Identifiers (table / column names) come from the catalog and are emitted as
+double-quoted identifiers with embedded double quotes escaped. The IR validator
+has already checked them against the catalog, so they are not a route for
+injection at this layer.
+Values from filter clauses are ALWAYS parameterized.
+The output `CompiledSql.sql` uses SQLAlchemy-style named placeholders
+(`:p_0, :p_1, ...`) so it can be executed via `text(sql)` with a params
+dict on a sync SQLAlchemy engine.
+v1 supports the Postgres dialect only. Supabase reuses the same compiler
+output (Supabase = Postgres). MySQL / BigQuery / Snowflake compilers will
+be separate classes that implement `BaseCompiler`.
 """
+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import Any
+from ...catalog.models import Catalog, Column, Source, Table
+from ..ir.models import (
+    AggSelect,
+    ColumnSelect,
+    FilterClause,
+    OrderByClause,
+    QueryIR,
+    SelectItem,
+)
 from .base import BaseCompiler
 @dataclass
 class CompiledSql:
     sql: str
+    params: dict[str, Any] = field(default_factory=dict)
+class SqlCompilerError(Exception):
+    pass
+_NULLARY_OPS = frozenset({"is_null", "is_not_null"})
+_LIST_OPS = frozenset({"in", "not_in"})
+_COMPARISON_OPS = frozenset({"=", "!=", "<", "<=", ">", ">="})
 class SqlCompiler(BaseCompiler):
+    """Deterministic IR → Postgres SQL. No LLM."""
+    def __init__(self, catalog: Catalog, dialect: str = "postgres") -> None:
+        if dialect not in {"postgres", "supabase"}:
+            raise SqlCompilerError(
+                f"only 'postgres' / 'supabase' supported in v1, got {dialect!r}"
+            )
         self._catalog = catalog
+        self._dialect = dialect
     def compile(self, ir: QueryIR) -> CompiledSql:
+        _, table, cols_by_id = self._lookup(ir)
+        params: dict[str, Any] = {}
+        param_seq = [0]
+        select_clause, select_aliases = self._build_select(ir.select, table, cols_by_id)
+        from_clause = self._build_from(table)
+        where_clause = self._build_where(ir.filters, table, cols_by_id, params, param_seq)
+        groupby_clause = self._build_groupby(ir.group_by, table, cols_by_id)
+        orderby_clause = self._build_orderby(
+            ir.order_by, table, cols_by_id, select_aliases
+        )
+        limit_clause = self._build_limit(ir.limit)
+        parts: list[str] = [select_clause, from_clause]
+        for clause in (where_clause, groupby_clause, orderby_clause, limit_clause):
+            if clause:
+                parts.append(clause)
+        return CompiledSql(sql=" ".join(parts), params=params)
+    # ------------------------------------------------------------------
+    # Catalog lookup
+    # ------------------------------------------------------------------
+    def _lookup(self, ir: QueryIR) -> tuple[Source, Table, dict[str, Column]]:
+        source = next(
+            (s for s in self._catalog.sources if s.source_id == ir.source_id), None
+        )
+        if source is None:
+            raise SqlCompilerError(f"source_id {ir.source_id!r} not in catalog")
+        table = next(
+            (t for t in source.tables if t.table_id == ir.table_id), None
+        )
+        if table is None:
+            raise SqlCompilerError(
+                f"table_id {ir.table_id!r} not in source {ir.source_id!r}"
+            )
+        return source, table, {c.column_id: c for c in table.columns}
+    # ------------------------------------------------------------------
+    # Identifier quoting
+    # ------------------------------------------------------------------
+    @staticmethod
+    def _qident(name: str) -> str:
+        """Postgres-style double-quoted identifier with embedded-quote escape."""
+        return '"' + name.replace('"', '""') + '"'
+    def _qcol(self, table: Table, col: Column) -> str:
+        return f"{self._qident(table.name)}.{self._qident(col.name)}"
+    # ------------------------------------------------------------------
+    # Clauses
+    # ------------------------------------------------------------------
+    def _build_select(
+        self,
+        items: list[SelectItem],
+        table: Table,
+        cols_by_id: dict[str, Column],
+    ) -> tuple[str, set[str]]:
+        if not items:
+            raise SqlCompilerError("select clause cannot be empty")
+        parts: list[str] = []
+        aliases: set[str] = set()
+        for i, item in enumerate(items):
+            expr, alias = self._select_item(item, table, cols_by_id, i)
+            if alias:
+                parts.append(f"{expr} AS {self._qident(alias)}")
+                aliases.add(alias)
+            else:
+                parts.append(expr)
+        return "SELECT " + ", ".join(parts), aliases
+    def _select_item(
+        self,
+        item: SelectItem,
+        table: Table,
+        cols_by_id: dict[str, Column],
+        index: int,
+    ) -> tuple[str, str | None]:
+        if isinstance(item, ColumnSelect):
+            col = self._require_col(cols_by_id, item.column_id, f"select[{index}]")
+            return self._qcol(table, col), item.alias
+        if not isinstance(item, AggSelect):
+            raise SqlCompilerError(
+                f"select[{index}]: unknown SelectItem kind {type(item).__name__}"
+            )
+        return self._compile_agg(item, table, cols_by_id, index), item.alias
+    def _compile_agg(
+        self,
+        item: AggSelect,
+        table: Table,
+        cols_by_id: dict[str, Column],
+        index: int,
+    ) -> str:
+        if item.fn == "count_distinct":
+            if item.column_id is None:
+                raise SqlCompilerError(
+                    f"select[{index}].fn=count_distinct requires column_id"
+                )
+            col = self._require_col(cols_by_id, item.column_id, f"select[{index}]")
+            return f"COUNT(DISTINCT {self._qcol(table, col)})"
+        if item.column_id is None:
+            if item.fn != "count":
+                raise SqlCompilerError(
+                    f"select[{index}].fn={item.fn!r} requires column_id "
+                    "(only 'count' may omit it for COUNT(*))"
+                )
+            return "COUNT(*)"
+        col = self._require_col(cols_by_id, item.column_id, f"select[{index}]")
+        return f"{item.fn.upper()}({self._qcol(table, col)})"
+    def _build_from(self, table: Table) -> str:
+        return f"FROM {self._qident(table.name)}"
+    def _build_where(
+        self,
+        filters: list[FilterClause],
+        table: Table,
+        cols_by_id: dict[str, Column],
+        params: dict[str, Any],
+        param_seq: list[int],
+    ) -> str:
+        if not filters:
+            return ""
+        parts = [
+            self._compile_filter(f, table, cols_by_id, params, param_seq, index=i)
+            for i, f in enumerate(filters)
+        ]
+        return "WHERE " + " AND ".join(parts)
+    def _compile_filter(
+        self,
+        f: FilterClause,
+        table: Table,
+        cols_by_id: dict[str, Column],
+        params: dict[str, Any],
+        param_seq: list[int],
+        index: int,
+    ) -> str:
+        col = self._require_col(cols_by_id, f.column_id, f"filters[{index}]")
+        col_ref = self._qcol(table, col)
+        op = f.op
+        if op == "is_null":
+            return f"{col_ref} IS NULL"
+        if op == "is_not_null":
+            return f"{col_ref} IS NOT NULL"
+        if op in _LIST_OPS:
+            if not isinstance(f.value, list) or not f.value:
+                raise SqlCompilerError(
+                    f"filters[{index}]: op {op!r} requires a non-empty list value"
+                )
+            placeholders = [
+                ":" + self._next_param(params, param_seq, v) for v in f.value
+            ]
+            sql_op = "IN" if op == "in" else "NOT IN"
+            return f"{col_ref} {sql_op} ({', '.join(placeholders)})"
+        if op == "between":
+            if not isinstance(f.value, list) or len(f.value) != 2:
+                raise SqlCompilerError(
+                    f"filters[{index}]: op 'between' requires a list of two values"
+                )
+            lo = self._next_param(params, param_seq, f.value[0])
+            hi = self._next_param(params, param_seq, f.value[1])
+            return f"{col_ref} BETWEEN :{lo} AND :{hi}"
+        if op == "like":
+            p = self._next_param(params, param_seq, f.value)
+            return f"{col_ref} LIKE :{p}"
+        if op in _COMPARISON_OPS:
+            p = self._next_param(params, param_seq, f.value)
+            return f"{col_ref} {op} :{p}"
+        # Should not reach here — IRValidator already filters disallowed ops
+        raise SqlCompilerError(f"filters[{index}]: unhandled op {op!r}")
+    def _build_groupby(
+        self,
+        group_by: list[str],
+        table: Table,
+        cols_by_id: dict[str, Column],
+    ) -> str:
+        if not group_by:
+            return ""
+        parts = [
+            self._qcol(table, self._require_col(cols_by_id, col_id, f"group_by[{i}]"))
+            for i, col_id in enumerate(group_by)
+        ]
+        return "GROUP BY " + ", ".join(parts)
+    def _build_orderby(
+        self,
+        order_by: list[OrderByClause],
+        table: Table,
+        cols_by_id: dict[str, Column],
+        select_aliases: set[str],
+    ) -> str:
+        if not order_by:
+            return ""
+        parts: list[str] = []
+        for i, ob in enumerate(order_by):
+            if ob.column_id in cols_by_id:
+                ref = self._qcol(table, cols_by_id[ob.column_id])
+            elif ob.column_id in select_aliases:
+                ref = self._qident(ob.column_id)
+            else:
+                raise SqlCompilerError(
+                    f"order_by[{i}].column_id: {ob.column_id!r} not in table "
+                    "columns or select aliases"
+                )
+            parts.append(f"{ref} {ob.dir.upper()}")
+        return "ORDER BY " + ", ".join(parts)
+    def _build_limit(self, limit: int | None) -> str:
+        if limit is None:
+            return ""
+        return f"LIMIT {int(limit)}"
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+    @staticmethod
+    def _next_param(
+        params: dict[str, Any], param_seq: list[int], value: Any
+    ) -> str:
+        name = f"p_{param_seq[0]}"
+        param_seq[0] += 1
+        params[name] = value
+        return name
+    @staticmethod
+    def _require_col(
+        cols_by_id: dict[str, Column], col_id: str, where: str
+    ) -> Column:
+        col = cols_by_id.get(col_id)
+        if col is None:
+            raise SqlCompilerError(f"{where}.column_id: {col_id!r} not in table")
+        return col
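
The two safety mechanisms in this file, identifier quoting and value parameterization, can be sketched in isolation. This is a simplified standalone version, assuming free functions named `qident`, `next_param`, and `compile_in` (hypothetical names that mirror `_qident`, `_next_param`, and the list-op branch of `_compile_filter`), with plain strings in place of the catalog `Table` / `Column` models.

```python
from typing import Any


def qident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def next_param(params: dict[str, Any], seq: list[int], value: Any) -> str:
    """Store value under the next p_N name and return that name."""
    name = f"p_{seq[0]}"
    seq[0] += 1
    params[name] = value
    return name


def compile_in(
    table: str,
    column: str,
    values: list[Any],
    params: dict[str, Any],
    seq: list[int],
) -> str:
    """Render an IN filter with one named placeholder per value."""
    if not values:
        raise ValueError("IN requires a non-empty list")
    placeholders = [":" + next_param(params, seq, v) for v in values]
    return f"{qident(table)}.{qident(column)} IN ({', '.join(placeholders)})"


params: dict[str, Any] = {}
seq = [0]  # one-element list so the counter is shared across calls
hostile = "x'; DROP TABLE users; --"
clause = compile_in("orders", 'we"ird', ["paid", hostile], params, seq)
print(clause)
print(params)
```

The hostile value never appears in the SQL text; it only travels in `params`, where the driver binds it as data.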

src/query/executor/db.py CHANGED

@@ -1,25 +1,188 @@
-"""DbExecutor — runs compiled SQL on a user's external DB.
 Pipeline:
-  IR → SqlCompiler → SQL string + params
        ↓
-  sqlglot validation (SELECT-only, whitelist tables/columns, LIMIT enforced)
        ↓
-  asyncpg / pymysql in read-only transaction with timeout (30s)
        ↓
-  QueryResult
 """
-from ...catalog.models import Catalog
-from ..compiler.sql import SqlCompiler
 from ..ir.models import QueryIR
 from .base import BaseExecutor, QueryResult
 class DbExecutor(BaseExecutor):
     def __init__(self, catalog: Catalog) -> None:
         self._catalog = catalog
         self._compiler = SqlCompiler(catalog)
     async def run(self, ir: QueryIR) -> QueryResult:
-        raise NotImplementedError

+"""DbExecutor — runs a compiled IR against a user's external SQL database.
 Pipeline:
+  IR → SqlCompiler.compile()  →  CompiledSql(sql, params)
        ↓
+  sqlglot guard  (defense-in-depth: SELECT-only, no DML / DDL)
        ↓
+  resolve creds (catalog.location_ref → dbclient://{client_id} → DatabaseClient
+                 row → Fernet decrypt)
        ↓
+  asyncio.to_thread(_run_sync)
+    └ db_pipeline_service.engine_scope(db_type, creds)
+       └ session-level: default_transaction_read_only + statement_timeout=30s
+                        (postgres / supabase only)
+       └ engine.connect() → conn.execute(text(sql), params)
+       ↓
+  QueryResult (always returned — errors populate `.error`, never raised)
 """
+from __future__ import annotations
+import asyncio
+import time
+from typing import Any
+import sqlglot
+import sqlglot.expressions as exp
+from sqlalchemy import text
+from ...catalog.models import Catalog, Source
+from ...database_client.database_client_service import database_client_service
+from ...db.postgres.connection import AsyncSessionLocal
+from ...middlewares.logging import get_logger
+from ...pipeline.db_pipeline import db_pipeline_service
+from ...utils.db_credential_encryption import decrypt_credentials_dict
+from ..compiler.sql import CompiledSql, SqlCompiler
 from ..ir.models import QueryIR
 from .base import BaseExecutor, QueryResult
+logger = get_logger("db_executor")
+_QUERY_TIMEOUT_SECONDS = 30
+_ROW_HARD_CAP = 10_000  # belt-and-suspenders cap regardless of LIMIT
+_DBCLIENT_PREFIX = "dbclient://"
+_POSTGRES_LIKE = frozenset({"postgres", "supabase"})
 class DbExecutor(BaseExecutor):
+    """Executes compiled SQL on the user's registered DB.
+    Constructed once per query with the user's catalog. The catalog is the
+    source of truth for identifiers; the executor never touches the user's
+    DB metadata at execution time.
+    """
     def __init__(self, catalog: Catalog) -> None:
         self._catalog = catalog
         self._compiler = SqlCompiler(catalog)
     async def run(self, ir: QueryIR) -> QueryResult:
+        started = time.perf_counter()
+        try:
+            source = self._find_source(ir.source_id)
+            if source.source_type != "schema":
+                raise ValueError(
+                    f"DbExecutor cannot run on source_type={source.source_type!r}; "
+                    "expected 'schema'"
+                )
+            compiled = self._compiler.compile(ir)
+            self._sqlglot_guard(compiled.sql)
+            client_id = self._parse_client_id(source.location_ref)
+            client = await self._fetch_client(client_id)
+            if client.user_id != self._catalog.user_id:
+                raise PermissionError(
+                    f"DatabaseClient {client_id!r} owner mismatch "
+                    f"(client.user_id != catalog.user_id)"
+                )
+            creds = decrypt_credentials_dict(client.credentials)
+            rows = await asyncio.wait_for(
+                asyncio.to_thread(self._run_sync, client.db_type, creds, compiled),
+                timeout=_QUERY_TIMEOUT_SECONDS,
+            )
+            truncated = len(rows) > _ROW_HARD_CAP
+            capped = rows[:_ROW_HARD_CAP]
+            elapsed_ms = int((time.perf_counter() - started) * 1000)
+            logger.info(
+                "db query complete",
+                source_id=ir.source_id,
+                rows=len(capped),
+                truncated=truncated,
+                elapsed_ms=elapsed_ms,
+            )
+            return QueryResult(
+                source_id=ir.source_id,
+                backend="sql",
+                rows=capped,
+                row_count=len(capped),
+                truncated=truncated,
+                elapsed_ms=elapsed_ms,
+            )
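+        except Exception as e:
+            elapsed_ms = int((time.perf_counter() - started) * 1000)
+            # str(e) is empty for bare exceptions such as the TimeoutError raised
+            # by asyncio.wait_for; fall back to the class name so `.error` is
+            # never an empty (falsy) string.
+            message = str(e) or type(e).__name__
+            logger.error(
+                "db executor failed",
+                source_id=ir.source_id,
+                error=message,
+                elapsed_ms=elapsed_ms,
+            )
+            return QueryResult(
+                source_id=ir.source_id,
+                backend="sql",
+                elapsed_ms=elapsed_ms,
+                error=message,
+            )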
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+    def _find_source(self, source_id: str) -> Source:
+        for s in self._catalog.sources:
+            if s.source_id == source_id:
+                return s
+        raise ValueError(f"source_id {source_id!r} not in catalog")
+    @staticmethod
+    def _parse_client_id(location_ref: str) -> str:
+        if not location_ref.startswith(_DBCLIENT_PREFIX):
+            raise ValueError(
+                f"DbExecutor expects 'dbclient://...' location_ref, got {location_ref!r}"
+            )
+        client_id = location_ref[len(_DBCLIENT_PREFIX):]
+        if not client_id:
+            raise ValueError("location_ref is missing client_id after 'dbclient://'")
+        return client_id
+    @staticmethod
+    async def _fetch_client(client_id: str) -> Any:
+        async with AsyncSessionLocal() as session:
+            client = await database_client_service.get(session, client_id)
+        if client is None:
+            raise ValueError(f"DatabaseClient {client_id!r} not found")
+        if client.status != "active":
+            raise ValueError(
+                f"DatabaseClient {client_id!r} is not active "
+                f"(status={client.status!r})"
+            )
+        return client
+    @staticmethod
+    def _sqlglot_guard(sql: str) -> None:
+        """Defense-in-depth: ensure the compiled SQL is a SELECT statement.
+        The compiler is already deterministic and only constructs SELECTs from
+        validated IR, but this guard catches any future bug that could leak
+        DML/DDL through.
+        """
+        try:
+            parsed = sqlglot.parse_one(sql, read="postgres")
+        except sqlglot.errors.ParseError as e:
+            raise ValueError(f"compiled SQL failed to parse: {e}") from e
+        if not isinstance(parsed, exp.Select):
+            raise ValueError(
+                f"compiled SQL is not a SELECT (got {type(parsed).__name__})"
+            )
+        forbidden = (exp.Insert, exp.Update, exp.Delete, exp.Drop, exp.AlterTable)
+        for node in parsed.find_all(forbidden):
+            raise ValueError(
+                f"compiled SQL contains forbidden DML/DDL: {type(node).__name__}"
+            )
+    @staticmethod
+    def _run_sync(db_type: str, creds: dict, compiled: CompiledSql) -> list[dict]:
+        with db_pipeline_service.engine_scope(db_type, creds) as engine:
+            with engine.connect() as conn:
+                if db_type in _POSTGRES_LIKE:
+                    # session-level read-only + per-statement timeout (ms)
+                    conn.execute(text("SET default_transaction_read_only = on"))
+                    conn.execute(
+                        text(f"SET statement_timeout = {_QUERY_TIMEOUT_SECONDS * 1000}")
+                    )
+                result = conn.execute(text(compiled.sql), compiled.params)
+                return [dict(row) for row in result.mappings()]
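
The `asyncio.wait_for` + `asyncio.to_thread` backstop combined with the "never raises" contract can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: `slow_query` and `run_with_backstop` are hypothetical names, a `time.sleep` stands in for the blocking driver call, and a plain dict stands in for `QueryResult`. The sketch falls back to the exception class name because `str()` of the timeout error raised by `wait_for` is an empty string.

```python
import asyncio
import time


def slow_query(seconds: float) -> list[dict]:
    """Stand-in for the blocking driver call; sleeps instead of querying."""
    time.sleep(seconds)
    return [{"ok": True}]


async def run_with_backstop(seconds: float, timeout: float) -> dict:
    """Return a result dict; failures are reported in 'error', never raised."""
    try:
        rows = await asyncio.wait_for(
            asyncio.to_thread(slow_query, seconds), timeout
        )
        return {"rows": rows, "error": None}
    except Exception as e:
        # The timeout error carries no message, so use the class name instead.
        return {"rows": [], "error": str(e) or type(e).__name__}


fast = asyncio.run(run_with_backstop(0.01, timeout=1.0))
slow = asyncio.run(run_with_backstop(0.3, timeout=0.05))
print(fast)
print(slow)
```

The timeout only stops the coroutine from waiting; the worker thread keeps running until the blocking call returns. That is why a server-side limit such as Postgres `statement_timeout` is still needed to actually stop the query.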