Agentic-Service-Data-Eyond-Catalog

Rifqi Hafizuddin and Claude Opus 4.8 committed about 21 hours ago

Commit

e4337a8

1 Parent(s): ac310de

[KM-626][AI] Slow-path agent: Assembler + Coordinator

The write-up half of the slow path (AGENT_ARCHITECTURE_CONTEXT_new.md §7.5, §8.3).

- assembler.py + prompt.py + config/prompts/assembler.md: single LLM call ->
AssemblerNarrative; code merges it with the RunState to build the AnalysisRecord
(results_snapshot / tasks_run / metadata copied from RunState, never re-authored
by the model, so the record stays a faithful source of truth). chat_answer is the
first output field so it streams via SSE. Chain construction mirrors the planner
service.
- coordinator.py: SlowPathCoordinator wires Planner -> TaskRunner -> Assembler.
Built and tested, but NOT yet wired into the live ChatHandler.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files changed (4)

src/agents/slow_path/assembler.py +136 -0
src/agents/slow_path/coordinator.py +48 -0
src/agents/slow_path/prompt.py +66 -0
src/config/prompts/assembler.md +43 -0

src/agents/slow_path/assembler.py ADDED

	@@ -0,0 +1,136 @@

+"""Assembler — single LLM call at the end of the slow path.
+Reads the `RunState` (all `TaskResult`s) + `BusinessContext` and produces an
+`AssembledOutput` { chat_answer, analysis_record }. Owns all language/output: prose,
+markdown tables, citations, and merging structured + unstructured results.
+The model authors only the *narrative* (`AssemblerNarrative`); this service copies
+the structured pass-through (`results_snapshot`, `tasks_run`) and metadata from the
+`RunState` so the record stays a faithful source of truth (§8.3, INV-4).
+Chain construction mirrors `agents/planner/service.py`.
+See AGENT_ARCHITECTURE_CONTEXT_new.md §7.5.
+"""
+from __future__ import annotations
+from datetime import UTC, datetime
+from pathlib import Path
+from langchain_core.messages import SystemMessage
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import Runnable
+from langchain_openai import AzureChatOpenAI
+from src.middlewares.logging import get_logger
+from ..planner.contracts import BusinessContext
+from .errors import AssemblerError
+from .prompt import build_assembler_prompt
+from .schemas import (
+    AnalysisRecord,
+    AssembledOutput,
+    AssemblerNarrative,
+    RunState,
+    TaskSummary,
+)
+logger = get_logger("assembler")
+_PROMPT_PATH = (
+    Path(__file__).resolve().parent.parent.parent / "config" / "prompts" / "assembler.md"
+)
+def _load_prompt_text() -> str:
+    return _PROMPT_PATH.read_text(encoding="utf-8")
+def _build_default_chain() -> Runnable:
+    from src.config.settings import settings
+    llm = AzureChatOpenAI(
+        azure_deployment=settings.azureai_deployment_name_4o,
+        openai_api_version=settings.azureai_api_version_4o,
+        azure_endpoint=settings.azureai_endpoint_url_4o,
+        api_key=settings.azureai_api_key_4o,
+        temperature=0,
+    )
+    prompt = ChatPromptTemplate.from_messages(
+        [
+            SystemMessage(content=_load_prompt_text()),
+            ("human", "{human_content}"),
+        ]
+    )
+    return prompt | llm.with_structured_output(AssemblerNarrative)
+_default_chain: Runnable | None = None
+def _get_default_chain() -> Runnable:
+    global _default_chain
+    if _default_chain is None:
+        _default_chain = _build_default_chain()
+    return _default_chain
+class Assembler:
+    """Wraps the single Assembler LLM call. Inject `structured_chain` for tests."""
+    def __init__(self, structured_chain: Runnable | None = None) -> None:
+        self._chain = structured_chain
+    def _ensure_chain(self) -> Runnable:
+        if self._chain is None:
+            self._chain = _get_default_chain()
+        return self._chain
+    async def assemble(
+        self,
+        run_state: RunState,
+        context: BusinessContext,
+        question: str | None = None,
+    ) -> AssembledOutput:
+        chain = self._ensure_chain()
+        human_content = build_assembler_prompt(run_state, context, question)
+        try:
+            narrative: AssemblerNarrative = await chain.ainvoke(
+                {"human_content": human_content}
+            )
+        except Exception as exc:  # surface as a typed error for the caller
+            raise AssemblerError(f"assembler call failed: {exc}") from exc
+        record = _build_record(narrative, run_state)
+        logger.info(
+            "analysis assembled",
+            plan_id=run_state.plan_id,
+            business_context_id=run_state.business_context_id,
+            n_tasks=len(run_state.results),
+        )
+        return AssembledOutput(chat_answer=narrative.chat_answer, analysis_record=record)
+def _build_record(narrative: AssemblerNarrative, run_state: RunState) -> AnalysisRecord:
+    tasks_run = [
+        TaskSummary(
+            task_id=task_id,
+            objective=result.objective,
+            status=result.status,
+            tools_used=[o.tool for o in result.outputs],
+        )
+        for task_id, result in run_state.results.items()
+    ]
+    return AnalysisRecord(
+        goal_restated=narrative.goal_restated,
+        findings=narrative.findings,
+        caveats=narrative.caveats,
+        data_used=narrative.data_used,
+        open_questions=narrative.open_questions,
+        tasks_run=tasks_run,
+        results_snapshot=run_state.results,
+        plan_id=run_state.plan_id,
+        business_context_id=run_state.business_context_id,
+        created_at=datetime.now(UTC),
+    )
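The `structured_chain` argument above is what lets the Assembler be tested without Azure credentials. A minimal sketch of that idea follows, assuming any object with an async `ainvoke` method can stand in for the chain. `StubChain`, `FailingChain`, `MiniAssembler`, and `Narrative` are hypothetical stand-ins written for this note, not the classes added by the commit.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Narrative:
    """Hypothetical stand-in for AssemblerNarrative (only two fields shown)."""
    chat_answer: str
    findings: list[str] = field(default_factory=list)


class AssemblerError(Exception):
    """Stand-in for the typed error the real module imports from .errors."""


class StubChain:
    """Fake chain: records each payload and returns a canned narrative."""

    def __init__(self, canned: Narrative) -> None:
        self.canned = canned
        self.calls: list[dict] = []

    async def ainvoke(self, payload: dict) -> Narrative:
        self.calls.append(payload)
        return self.canned


class FailingChain:
    """Fake chain that always raises, to exercise the error path."""

    async def ainvoke(self, payload: dict) -> Narrative:
        raise RuntimeError("boom")


class MiniAssembler:
    """Reduced stand-in: same injection and error-wrapping shape as Assembler."""

    def __init__(self, structured_chain) -> None:
        self._chain = structured_chain

    async def assemble(self, human_content: str) -> Narrative:
        try:
            return await self._chain.ainvoke({"human_content": human_content})
        except Exception as exc:  # re-raise as the typed error
            raise AssemblerError(f"assembler call failed: {exc}") from exc


def run_stub_demo() -> tuple[str, dict]:
    """Run one assemble() call against the stub and return what was observed."""
    stub = StubChain(Narrative(chat_answer="stubbed answer"))
    out = asyncio.run(MiniAssembler(stub).assemble("# Analysis results ..."))
    return out.chat_answer, stub.calls[0]
```

Because the stub records its payloads, a test can assert both on the returned narrative and on the exact human content the Assembler sent.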

src/agents/slow_path/coordinator.py ADDED

	@@ -0,0 +1,48 @@

+"""SlowPathCoordinator — wires the slow path: Planner -> TaskRunner -> Assembler.
+A thin coordination object. This is the unit the (future) expanded Orchestrator /
+ChatHandler will call on a `structured` analytical query. It is built and tested
+here but **not yet wired into the live chat flow** — that step waits on the tool
+team's real `ToolInvoker` and a real `BusinessContext` source.
+See AGENT_ARCHITECTURE_CONTEXT_new.md §5.2 / §6.1.
+"""
+from __future__ import annotations
+from ...catalog.models import Catalog
+from ..planner.contracts import BusinessContext, ToolRegistry
+from ..planner.inputs import Constraints
+from ..planner.service import PlannerService
+from .assembler import Assembler
+from .schemas import AssembledOutput
+from .task_runner import TaskRunner
+class SlowPathCoordinator:
+    def __init__(
+        self,
+        planner: PlannerService,
+        task_runner: TaskRunner,
+        assembler: Assembler,
+        registry: ToolRegistry,
+    ) -> None:
+        self._planner = planner
+        self._task_runner = task_runner
+        self._assembler = assembler
+        self._registry = registry
+    async def run(
+        self,
+        context: BusinessContext,
+        catalog: Catalog,
+        query: str,
+        constraints: Constraints,
+    ) -> AssembledOutput:
+        task_list = await self._planner.plan(
+            context, catalog, self._registry, query, constraints
+        )
+        run_state = await self._task_runner.run(
+            task_list, business_context_id=context.project_id
+        )
+        return await self._assembler.assemble(run_state, context, question=query)
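Since the coordinator only sequences three awaited calls, it can be exercised end to end with fakes. The sketch below mirrors the Planner -> TaskRunner -> Assembler order shown above. Every class in it (`FakePlanner`, `FakeTaskRunner`, `FakeAssembler`, `MiniCoordinator`) is a hypothetical stand-in, and the dict-shaped context and run state are assumptions made to keep the example self-contained.

```python
import asyncio

calls: list[str] = []  # records the order in which stages run


class FakePlanner:
    async def plan(self, context, catalog, registry, query, constraints):
        calls.append("plan")
        return ["load_orders", "rank_products"]  # hypothetical task ids


class FakeTaskRunner:
    async def run(self, task_list, business_context_id):
        calls.append("run")
        return {"tasks": task_list, "business_context_id": business_context_id}


class FakeAssembler:
    async def assemble(self, run_state, context, question=None):
        calls.append("assemble")
        return f"{len(run_state['tasks'])} tasks answered: {question}"


class MiniCoordinator:
    """Stand-in with the same three-step shape as SlowPathCoordinator.run."""

    def __init__(self, planner, task_runner, assembler, registry):
        self._planner = planner
        self._task_runner = task_runner
        self._assembler = assembler
        self._registry = registry

    async def run(self, context, catalog, query, constraints):
        task_list = await self._planner.plan(
            context, catalog, self._registry, query, constraints
        )
        run_state = await self._task_runner.run(
            task_list, business_context_id=context["project_id"]
        )
        return await self._assembler.assemble(run_state, context, question=query)


coordinator = MiniCoordinator(
    FakePlanner(), FakeTaskRunner(), FakeAssembler(), registry={}
)
answer = asyncio.run(
    coordinator.run(
        {"project_id": "p-1"}, catalog=None, query="top products?", constraints=None
    )
)
print(calls)   # ['plan', 'run', 'assemble']
print(answer)  # 2 tasks answered: top products?
```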

src/agents/slow_path/prompt.py ADDED

	@@ -0,0 +1,66 @@

+"""Builds the Assembler LLM human-message content.
+The system prompt (`config/prompts/assembler.md`) carries the role and rules. This
+module assembles the per-call human content: the business context + the executed
+`RunState` (task objectives, statuses, and structured tool outputs) + the original
+question. Tool outputs are rendered compactly as data — the model turns them into
+prose and markdown tables.
+"""
+from __future__ import annotations
+from ..planner.contracts import BusinessContext, ToolOutput
+from ..planner.prompt import render_business_context
+from .schemas import RunState, TaskResult
+_MAX_ROWS = 20
+def render_run_state(run_state: RunState) -> str:
+    lines = [f"Plan: {run_state.plan_id}"]
+    if run_state.open_questions:
+        lines.append("Open questions carried from the plan:")
+        lines.extend(f"  - {q}" for q in run_state.open_questions)
+    lines.append("")
+    lines.append("Task results (in execution order):")
+    for task_id, result in run_state.results.items():
+        lines.append(_render_task(task_id, result))
+    return "\n".join(lines)
+def _render_task(task_id: str, result: TaskResult) -> str:
+    lines = [f"- [{result.status}] {task_id}: {result.objective}"]
+    if result.error:
+        lines.append(f"    note: {result.error}")
+    for output in result.outputs:
+        lines.append(f"    {_render_output(output)}")
+    return "\n".join(lines)
+def _render_output(output: ToolOutput) -> str:
+    if output.kind == "error":
+        return f"({output.tool}) error: {output.error}"
+    if output.kind == "table" and output.columns is not None:
+        header = ", ".join(output.columns)
+        rows = output.rows or []
+        preview = "; ".join(
+            " | ".join(str(cell) for cell in row) for row in rows[:_MAX_ROWS]
+        )
+        more = "" if len(rows) <= _MAX_ROWS else f" … (+{len(rows) - _MAX_ROWS} more rows)"
+        return f"({output.tool}) table [{header}]: {preview}{more}"
+    meta = f" meta={output.meta}" if output.meta else ""
+    return f"({output.tool}) {output.kind}: {output.value}{meta}"
+def build_assembler_prompt(
+    run_state: RunState,
+    context: BusinessContext,
+    question: str | None = None,
+) -> str:
+    sections = [
+        f"# Business context\n\n{render_business_context(context)}",
+        f"# Analysis results\n\n{render_run_state(run_state)}",
+    ]
+    if question:
+        sections.append(f"# Original question\n\n{question}")
+    return "\n\n".join(sections)
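To make the table rendering concrete, here is a small sketch of the same truncation logic applied to a four-row table. `TableOutput` is a hypothetical stand-in for a table-kind `ToolOutput`, the sample data is invented for illustration, and the row cap is lowered from 20 to 3 so the overflow suffix is visible.

```python
from dataclasses import dataclass

MAX_ROWS = 3  # the commit uses 20; 3 keeps this demo short


@dataclass
class TableOutput:
    """Hypothetical stand-in for a ToolOutput whose kind is "table"."""
    tool: str
    columns: list[str]
    rows: list[list[object]]


def render_table(output: TableOutput, max_rows: int = MAX_ROWS) -> str:
    """Render a header plus at most max_rows rows, noting how many were cut."""
    header = ", ".join(output.columns)
    preview = "; ".join(
        " | ".join(str(cell) for cell in row) for row in output.rows[:max_rows]
    )
    hidden = len(output.rows) - max_rows
    more = f" … (+{hidden} more rows)" if hidden > 0 else ""
    return f"({output.tool}) table [{header}]: {preview}{more}"


sample = TableOutput(
    tool="sql_query",
    columns=["region", "orders"],
    rows=[["north", 10], ["south", 7], ["east", 5], ["west", 2]],
)
rendered = render_table(sample)
print(rendered)
# (sql_query) table [region, orders]: north | 10; south | 7; east | 5 … (+1 more rows)
```

Capping the preview bounds prompt size, and the explicit "+N more rows" suffix tells the model that the table it sees is incomplete.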

src/config/prompts/assembler.md ADDED

	@@ -0,0 +1,43 @@

+You are the Assembler for Data Eyond, an AI data scientist. A deterministic
+TaskRunner has just executed a static analysis plan; you receive its results (the
+`RunState`) plus the project's business context. Your job is to turn those results
+into a decision-ready answer.
+You produce two things in one structured object:
+1. `chat_answer` — a compact, to-the-point reply for the chat, in **markdown**
+   (prose + tables where useful).
+2. The narrative fields of an analysis record: `goal_restated`, `findings`,
+   `caveats`, `data_used`, `open_questions`.
+# Hard rules (non-negotiable)
+1. **Ground every claim in the provided results.** Use only the numbers, tables,
+   and values present in the task results. **Never invent, estimate, or extrapolate
+   a number** that is not in the results. If the data does not answer part of the
+   question, say so.
+2. **Report what failed.** Some tasks may have `status: partial` or `failure`. Do
+   not pretend they succeeded. Briefly state what could not be completed and how it
+   limits the answer; put unresolved items in `open_questions`.
+3. **Render, don't recompute.** Build markdown tables from the structured task
+   outputs as they are. Do not do your own arithmetic beyond trivially restating a
+   value already computed.
+4. **No tool/code talk.** Write for a business reader. Do not mention tool names,
+   task ids, SQL, or internal mechanics in `chat_answer`.
+# How to write
+- **`chat_answer`**: lead with the answer. Add a short markdown table when it makes
+  the numbers clearer. Keep it tight — this streams into a chat, not a report.
+- **`findings`**: the key takeaways, each a single self-contained sentence with the
+  supporting figure.
+- **`caveats`**: data-quality limits, partial/failed steps, assumptions that affect
+  confidence.
+- **`data_used`**: the sources/tables/columns the answer rests on (plain names).
+- **`goal_restated`**: one sentence restating the business question you answered.
+- **`open_questions`**: anything ambiguous, missing, or worth a follow-up. Fold in
+  any open questions carried from the plan. Empty list if genuinely none.
+# Output
+Return exactly one structured object with the fields above. Be honest, specific,
+and concise.
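The structured object this prompt asks for could be modelled roughly as below. This is only a sketch: `schemas.py` is not part of this diff, so the class name `NarrativeSketch` and the field types are assumptions, and the real `AssemblerNarrative` may well be a Pydantic model rather than a dataclass. The one detail taken from the commit message is that `chat_answer` comes first so it can stream early.

```python
from dataclasses import dataclass, field, fields


@dataclass
class NarrativeSketch:
    """Hypothetical shape for the model-authored narrative; types are assumed."""
    chat_answer: str  # declared first so it is emitted first when streaming
    goal_restated: str
    findings: list[str] = field(default_factory=list)
    caveats: list[str] = field(default_factory=list)
    data_used: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def field_order(cls) -> list[str]:
    """Return dataclass field names in declaration order."""
    return [f.name for f in fields(cls)]


print(field_order(NarrativeSketch)[0])  # chat_answer
```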