[KM-624][AI] Planner: realign stub registry + examples to composite analyze_* tools
Team decision: v1 uses composite "family" tools (analyze_*), not the atomic
compute_* set. Realign the planner-facing stub to the real KM-624 inventory so
the Planner plans against tools that exist.
- registry.py: replace the 9 atomic entries (compute_median/stddev/percentile/
mode, date_trunc, ...) with 12 composite entries -- 4 data-access
(query_structured, retrieve_documents, list_sources, describe_source) + 8
analyze_* (descriptive, aggregate, comparison, contribution, profile,
correlation, segment, trend). Each analyze_* takes a `data` "${t<id>}"
placeholder (Pattern A, assumed pending the tool team).
- examples.py: Example A -> analyze_contribution; Example B -> analyze_trend
(drops the removed date_trunc/compute_stddev chain).
- planner.md: rewrite the "compute_* tools" bullet as data-access vs analytics.
Validator/prompt/service unchanged (generic over the registry). Planner tests
updated locally (tests/ is gitignored): 32 passing + 1 gated, ruff clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- src/agents/planner/examples.py +40 -58
- src/agents/planner/registry.py +203 -57
- src/config/prompts/planner.md +8 -4
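
To make the assumed Pattern A data flow concrete, here is a minimal sketch of how a `"${t<id>}"` placeholder in a tool call's args might be swapped for an upstream task's output at execution time. `resolve_args`, the `outputs` mapping, and the digits-only id pattern are hypothetical illustrations, not code from this commit; the real wrapper layer is still pending the tool team.

```python
import re
from typing import Any

# Assumption: task ids look like "t" followed by digits, as in "${t2}".
_PLACEHOLDER = re.compile(r"^\$\{(t\d+)\}$")


def resolve_args(args: dict[str, Any], outputs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `args` with whole-string placeholders replaced by upstream outputs."""
    resolved: dict[str, Any] = {}
    for key, value in args.items():
        match = _PLACEHOLDER.match(value) if isinstance(value, str) else None
        if match is None:
            # Plain argument (e.g. a column alias): pass through unchanged.
            resolved[key] = value
            continue
        task_id = match.group(1)
        if task_id not in outputs:
            raise KeyError(f"placeholder {value!r} references unknown task {task_id!r}")
        resolved[key] = outputs[task_id]
    return resolved


# Upstream rows as t2 (query_structured) might return them, keyed by column alias.
upstream = {"t2": [{"category": "toys", "revenue": 120.0}, {"category": "books", "revenue": 80.0}]}
call_args = {"data": "${t2}", "dimension": "category", "value_column": "revenue", "agg": "sum"}
resolved = resolve_args(call_args, upstream)
```

Only `data` is rewritten; `dimension` and `value_column` stay as alias strings, which matches the convention that column arguments name the upstream query's aliases.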
|
@@ -2,7 +2,11 @@
|
|
| 2 |
|
| 3 |
Two illustrative (question -> TaskList) pairs that teach the OUTPUT SHAPE:
|
| 4 |
stages, dependency edges, parallelism, ordered tool-call chains, inline QueryIR,
|
| 5 |
-
|
| 6 |
(`src_sales` / `t_orders`); these ids are part of the illustration and are not
|
| 7 |
validated against the user's real catalog. v1 is descriptive/diagnostic — no
|
| 8 |
modeling tasks.
|
|
@@ -17,6 +21,9 @@ from .schemas import Task, TaskList, ToolCall
|
|
| 17 |
# --------------------------------------------------------------------------- #
|
| 18 |
# Example A — exploratory, no modeling.
|
| 19 |
# "Which product categories drove last quarter's revenue?"
|
| 20 |
# --------------------------------------------------------------------------- #
|
| 21 |
|
| 22 |
_EXAMPLE_A = TaskList(
|
|
@@ -38,8 +45,8 @@ _EXAMPLE_A = TaskList(
|
|
| 38 |
),
|
| 39 |
Task(
|
| 40 |
id="t2",
|
| 41 |
-
stage="
|
| 42 |
-
objective="
|
| 43 |
tool_calls=[
|
| 44 |
ToolCall(
|
| 45 |
tool="query_structured",
|
|
@@ -49,12 +56,7 @@ _EXAMPLE_A = TaskList(
|
|
| 49 |
"table_id": "t_orders",
|
| 50 |
"select": [
|
| 51 |
{"kind": "column", "column_id": "c_category", "alias": "category"},
|
| 52 |
-
{
|
| 53 |
-
"kind": "agg",
|
| 54 |
-
"fn": "sum",
|
| 55 |
-
"column_id": "c_revenue",
|
| 56 |
-
"alias": "revenue",
|
| 57 |
-
},
|
| 58 |
],
|
| 59 |
"filters": [
|
| 60 |
{
|
|
@@ -64,54 +66,36 @@ _EXAMPLE_A = TaskList(
|
|
| 64 |
"value_type": "date",
|
| 65 |
}
|
| 66 |
],
|
| 67 |
-
"
|
| 68 |
-
"order_by": [{"column_id": "revenue", "dir": "desc"}],
|
| 69 |
-
"limit": 20,
|
| 70 |
}
|
| 71 |
},
|
| 72 |
)
|
| 73 |
],
|
| 74 |
-
expected_output="
|
| 75 |
-
success_criteria="Produced
|
| 76 |
depends_on=["t1"],
|
| 77 |
-
parallelizable_with=[
|
| 78 |
-
estimated_cost="
|
| 79 |
),
|
| 80 |
Task(
|
| 81 |
id="t3",
|
| 82 |
stage="evaluation",
|
| 83 |
-
objective="
|
| 84 |
tool_calls=[
|
| 85 |
ToolCall(
|
| 86 |
-
tool="
|
| 87 |
args={
|
| 88 |
-
"
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
{
|
| 93 |
-
"kind": "agg",
|
| 94 |
-
"fn": "sum",
|
| 95 |
-
"column_id": "c_revenue",
|
| 96 |
-
"alias": "total_revenue",
|
| 97 |
-
}
|
| 98 |
-
],
|
| 99 |
-
"filters": [
|
| 100 |
-
{
|
| 101 |
-
"column_id": "c_order_date",
|
| 102 |
-
"op": "between",
|
| 103 |
-
"value": ["2026-01-01", "2026-03-31"],
|
| 104 |
-
"value_type": "date",
|
| 105 |
-
}
|
| 106 |
-
],
|
| 107 |
-
}
|
| 108 |
},
|
| 109 |
)
|
| 110 |
],
|
| 111 |
-
expected_output="
|
| 112 |
-
success_criteria="Produced
|
| 113 |
-
depends_on=["
|
| 114 |
-
parallelizable_with=[
|
| 115 |
estimated_cost="low",
|
| 116 |
),
|
| 117 |
],
|
|
@@ -181,30 +165,28 @@ _EXAMPLE_B = TaskList(
|
|
| 181 |
Task(
|
| 182 |
id="t3",
|
| 183 |
stage="evaluation",
|
| 184 |
-
objective="Bucket
|
| 185 |
tool_calls=[
|
| 186 |
ToolCall(
|
| 187 |
-
tool="
|
| 188 |
-
args={
|
| 189 |
)
|
| 190 |
],
|
| 191 |
-
expected_output="
|
| 192 |
-
success_criteria=
|
| 193 |
depends_on=["t2"],
|
| 194 |
parallelizable_with=[],
|
| 195 |
estimated_cost="low",
|
| 196 |
),
|
| 197 |
-
Task(
|
| 198 |
-
id="t4",
|
| 199 |
-
stage="evaluation",
|
| 200 |
-
objective="Quantify month-to-month spread to flag unusual months.",
|
| 201 |
-
tool_calls=[ToolCall(tool="compute_stddev", args={"values": "${t3}"})],
|
| 202 |
-
expected_output="monthly_volatility",
|
| 203 |
-
success_criteria="Produced a stddev figure that flags months above the typical spread.",
|
| 204 |
-
depends_on=["t3"],
|
| 205 |
-
parallelizable_with=[],
|
| 206 |
-
estimated_cost="low",
|
| 207 |
-
),
|
| 208 |
],
|
| 209 |
)
|
| 210 |
|
|
|
|
| 2 |
|
| 3 |
Two illustrative (question -> TaskList) pairs that teach the OUTPUT SHAPE:
|
| 4 |
stages, dependency edges, parallelism, ordered tool-call chains, inline QueryIR,
|
| 5 |
+
"${t<id>}" placeholders, and the assumed data-flow convention — `query_structured`
|
| 6 |
+
pulls rows, then a composite `analyze_*` tool consumes them via a `data` placeholder
|
| 7 |
+
referencing the upstream result's column aliases (Pattern A; the tool team may
|
| 8 |
+
instead pick self-fetch by `source_id`, in which case these examples are reshaped
|
| 9 |
+
to match — see registry.py). They reference a hypothetical sales catalog
|
| 10 |
(`src_sales` / `t_orders`); these ids are part of the illustration and are not
|
| 11 |
validated against the user's real catalog. v1 is descriptive/diagnostic — no
|
| 12 |
modeling tasks.
|
|
|
|
| 21 |
# --------------------------------------------------------------------------- #
|
| 22 |
# Example A — exploratory, no modeling.
|
| 23 |
# "Which product categories drove last quarter's revenue?"
|
| 24 |
+
# Shows: query_structured pulls rows -> analyze_contribution computes each
|
| 25 |
+
# category's share of the total in one call (no manual per-category + total
|
| 26 |
+
# queries).
|
| 27 |
# --------------------------------------------------------------------------- #
|
| 28 |
|
| 29 |
_EXAMPLE_A = TaskList(
|
|
|
|
| 45 |
),
|
| 46 |
Task(
|
| 47 |
id="t2",
|
| 48 |
+
stage="data_preparation",
|
| 49 |
+
objective="Pull last quarter's order-level category and revenue rows.",
|
| 50 |
tool_calls=[
|
| 51 |
ToolCall(
|
| 52 |
tool="query_structured",
|
|
|
|
| 56 |
"table_id": "t_orders",
|
| 57 |
"select": [
|
| 58 |
{"kind": "column", "column_id": "c_category", "alias": "category"},
|
| 59 |
+
{"kind": "column", "column_id": "c_revenue", "alias": "revenue"},
|
| 60 |
],
|
| 61 |
"filters": [
|
| 62 |
{
|
|
|
|
| 66 |
"value_type": "date",
|
| 67 |
}
|
| 68 |
],
|
| 69 |
+
"limit": 10000,
|
|
|
|
|
|
|
| 70 |
}
|
| 71 |
},
|
| 72 |
)
|
| 73 |
],
|
| 74 |
+
expected_output="quarter_rows",
|
| 75 |
+
success_criteria="Produced last quarter's order rows with category and revenue.",
|
| 76 |
depends_on=["t1"],
|
| 77 |
+
parallelizable_with=[],
|
| 78 |
+
estimated_cost="medium",
|
| 79 |
),
|
| 80 |
Task(
|
| 81 |
id="t3",
|
| 82 |
stage="evaluation",
|
| 83 |
+
objective="Rank each category's revenue share of the quarter total.",
|
| 84 |
tool_calls=[
|
| 85 |
ToolCall(
|
| 86 |
+
tool="analyze_contribution",
|
| 87 |
args={
|
| 88 |
+
"data": "${t2}",
|
| 89 |
+
"dimension": "category",
|
| 90 |
+
"value_column": "revenue",
|
| 91 |
+
"agg": "sum",
|
| 92 |
},
|
| 93 |
)
|
| 94 |
],
|
| 95 |
+
expected_output="category_contribution",
|
| 96 |
+
success_criteria="Produced each category's revenue share, ranked high to low.",
|
| 97 |
+
depends_on=["t2"],
|
| 98 |
+
parallelizable_with=[],
|
| 99 |
estimated_cost="low",
|
| 100 |
),
|
| 101 |
],
|
|
|
|
| 165 |
Task(
|
| 166 |
id="t3",
|
| 167 |
stage="evaluation",
|
| 168 |
+
objective="Bucket revenue into months and summarize the trend and movement.",
|
| 169 |
tool_calls=[
|
| 170 |
ToolCall(
|
| 171 |
+
tool="analyze_trend",
|
| 172 |
+
args={
|
| 173 |
+
"data": "${t2}",
|
| 174 |
+
"date_column": "order_date",
|
| 175 |
+
"value_column": "revenue",
|
| 176 |
+
"freq": "month",
|
| 177 |
+
"agg": "sum",
|
| 178 |
+
},
|
| 179 |
)
|
| 180 |
],
|
| 181 |
+
expected_output="monthly_trend",
|
| 182 |
+
success_criteria=(
|
| 183 |
+
"Produced a per-month revenue series with direction and change rate to "
|
| 184 |
+
"flag months above/below the typical level."
|
| 185 |
+
),
|
| 186 |
depends_on=["t2"],
|
| 187 |
parallelizable_with=[],
|
| 188 |
estimated_cost="low",
|
| 189 |
),
|
| 190 |
],
|
| 191 |
)
|
| 192 |
|
|
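
Example A hands the rows from t2 to `analyze_contribution` with `dimension="category"` and `value_column="revenue"`. As a rough sketch of the share-of-total computation such a tool could perform (value, share, and running cumulative share, largest first, with an optional `top_n` that lumps the tail into an "Others" row), the function below works on plain dict rows. It is a hypothetical stand-in that only supports `sum`; the real compute lives in `src/tools/analytics/` and may differ.

```python
from collections import defaultdict


def contribution(rows, dimension, value_column, top_n=None):
    """Sum `value_column` per `dimension` value and rank by share of the total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[value_column]

    # Largest contributor first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # Optionally collapse everything past the top N into one "Others" row.
    if top_n is not None and len(ranked) > top_n:
        tail_total = sum(value for _, value in ranked[top_n:])
        ranked = ranked[:top_n] + [("Others", tail_total)]

    grand_total = sum(value for _, value in ranked)
    result, running = [], 0.0
    for name, value in ranked:
        share = value / grand_total if grand_total else 0.0
        running += share
        result.append(
            {dimension: name, value_column: value, "share": share, "cumulative_share": running}
        )
    return result


rows = [
    {"category": "toys", "revenue": 3.0},
    {"category": "books", "revenue": 3.0},
    {"category": "toys", "revenue": 1.0},
    {"category": "games", "revenue": 1.0},
]
ranked = contribution(rows, "category", "revenue")
top_one = contribution(rows, "category", "revenue", top_n=1)
```

Doing the grouping, ranking, and share arithmetic in one call is what lets the example drop the separate per-category and grand-total queries the old plan needed.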
@@ -1,19 +1,44 @@
|
|
| 1 |
-
"""STUB v1
|
| 2 |
|
| 3 |
This is the agent team's local stand-in for the tool team's inventory (KM-608)
|
| 4 |
-
so the planner is buildable and testable before the real
|
| 5 |
-
here are *contracts only* —
|
| 6 |
-
|
| 7 |
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
validated against the catalog by the existing IRValidator.
|
| 14 |
|
| 15 |
-
|
| 16 |
-
|
| 17 |
"""
|
| 18 |
|
| 19 |
from __future__ import annotations
|
|
@@ -21,6 +46,10 @@ from __future__ import annotations
|
|
| 21 |
from .contracts import ToolRegistry, ToolSpec
|
| 22 |
|
| 23 |
_P0_TOOLS: list[ToolSpec] = [
|
| 24 |
ToolSpec(
|
| 25 |
name="query_structured",
|
| 26 |
category="analytics.query",
|
|
@@ -30,11 +59,13 @@ _P0_TOOLS: list[ToolSpec] = [
|
|
| 30 |
"Run one validated, single-table query against a structured source (DB "
|
| 31 |
"schema or tabular file) and return rows. The `ir` argument is an inline "
|
| 32 |
"QueryIR (the JSON intent: source_id, table_id, select, filters, group_by, "
|
| 33 |
-
"order_by, limit) — never SQL.
|
| 34 |
-
"
|
| 35 |
-
"
|
| 36 |
-
"
|
| 37 |
-
"
|
|
|
|
|
|
|
| 38 |
),
|
| 39 |
),
|
| 40 |
ToolSpec(
|
|
@@ -81,79 +112,194 @@ _P0_TOOLS: list[ToolSpec] = [
|
|
| 81 |
"before querying it. Do NOT use it to fetch data rows (use query_structured)."
|
| 82 |
),
|
| 83 |
),
|
| 84 |
ToolSpec(
|
| 85 |
-
name="
|
| 86 |
-
category="analytics.
|
| 87 |
-
input_schema={
|
| 88 |
-
|
| 89 |
description=(
|
| 90 |
-
"
|
| 91 |
-
"
|
| 92 |
-
"
|
| 93 |
-
"
|
| 94 |
),
|
| 95 |
),
|
| 96 |
ToolSpec(
|
| 97 |
-
name="
|
| 98 |
category="analytics.aggregation",
|
| 99 |
-
input_schema={
|
| 100 |
-
|
| 101 |
description=(
|
| 102 |
-
"
|
| 103 |
-
"
|
| 104 |
-
"
|
| 105 |
),
|
| 106 |
),
|
| 107 |
ToolSpec(
|
| 108 |
-
name="
|
| 109 |
-
category="analytics.
|
| 110 |
input_schema={
|
| 111 |
-
"required": ["
|
| 112 |
"properties": {
|
| 113 |
-
"
|
| 114 |
-
"
|
| 115 |
},
|
| 116 |
},
|
| 117 |
-
output_kind="
|
| 118 |
description=(
|
| 119 |
-
"
|
| 120 |
-
"
|
| 121 |
-
"
|
| 122 |
),
|
| 123 |
),
|
| 124 |
ToolSpec(
|
| 125 |
-
name="
|
| 126 |
-
category="analytics.
|
| 127 |
-
input_schema={
|
| 128 |
-
|
| 129 |
description=(
|
| 130 |
-
"
|
| 131 |
-
"
|
| 132 |
-
"
|
| 133 |
-
"
|
|
|
|
|
|
|
| 134 |
),
|
| 135 |
),
|
| 136 |
ToolSpec(
|
| 137 |
-
name="
|
| 138 |
category="analytics.timeseries",
|
| 139 |
input_schema={
|
| 140 |
-
"required": ["
|
| 141 |
"properties": {
|
| 142 |
-
"
|
| 143 |
-
"
|
| 144 |
},
|
| 145 |
},
|
| 146 |
output_kind="series",
|
| 147 |
description=(
|
| 148 |
-
"
|
| 149 |
-
"
|
| 150 |
-
"
|
| 151 |
-
"
|
| 152 |
),
|
| 153 |
),
|
| 154 |
]
|
| 155 |
|
| 156 |
|
| 157 |
def default_registry() -> ToolRegistry:
|
| 158 |
-
"""The v1
|
| 159 |
return ToolRegistry(tools=list(_P0_TOOLS))
|
|
|
|
| 1 |
+
"""STUB v1 tool registry — composite ("family") tools.
|
| 2 |
|
| 3 |
This is the agent team's local stand-in for the tool team's inventory (KM-608)
|
| 4 |
+
so the planner is buildable and testable before the real wrapper layer lands.
|
| 5 |
+
The tools here are *contracts only* — the compute logic for the `analyze_*`
|
| 6 |
+
family already exists in `src/tools/analytics/` (KM-624), but the wrapper layer
|
| 7 |
+
(source/placeholder -> DataFrame fetch, the `ToolOutput` envelope, never-throw
|
| 8 |
+
error handling, ToolSpec registration) is still pending the Planner seam
|
| 9 |
+
(KM-418 / AGENT_ARCHITECTURE_CONTEXT_new.md §8.4). The planner plans against the
|
| 10 |
+
registry and never names a tool outside it (INV-7).
|
| 11 |
|
| 12 |
+
**Taxonomy decision (2026-06-08):** v1 uses **composite/family** tools, not the
|
| 13 |
+
atomic `compute_*` set the earlier draft assumed. One `analyze_*` call does a
|
| 14 |
+
whole analytical job (e.g. `analyze_descriptive` returns mean/median/mode/std/
|
| 15 |
+
quartiles/skew/null_rate at once, replacing four atomic `compute_*` tools). See
|
| 16 |
+
§9.3 / the decisions table in the architecture doc.
|
|
|
|
| 17 |
|
| 18 |
+
**Ownership (revised 2026-06-08): the tool team owns ALL tools** — compute,
|
| 19 |
+
data-access (`query_structured`/`retrieve_documents`/`list_sources`/
|
| 20 |
+
`describe_source`), the wrapper/invoker, and tests. This file is purely the agent
|
| 21 |
+
team's local scaffold for building/testing the Planner (and later the TaskRunner/
|
| 22 |
+
Assembler against mocks) until the real registry lands; replace it then.
|
| 23 |
+
|
| 24 |
+
**Data-flow convention (Pattern A — assumed, but the tool team's call, still open):**
|
| 25 |
+
this stub assumes the `analyze_*` tools do NOT self-fetch by `source_id`; each
|
| 26 |
+
takes a `data` argument that is a `"${t<id>}"` placeholder pointing at an upstream
|
| 27 |
+
`query_structured` table output, resolved to a DataFrame at execution time. Column
|
| 28 |
+
arguments (`column_ids`, `dimension`, `value_column`, `date_column`, …) reference
|
| 29 |
+
the *aliases* the upstream query produced. If the tool team instead picks Pattern B
|
| 30 |
+
(self-fetch by `source_id`), reshape this stub + the few-shot examples to match —
|
| 31 |
+
the agent code does not change either way (INV-7).
|
| 32 |
+
|
| 33 |
+
`input_schema` is the lightweight JSON-schema-ish dict the planner validator
|
| 34 |
+
(validator.py check #8) consumes: `required` (list of arg names) + `properties`
|
| 35 |
+
(allowed arg names). Arg *values* may be `"${t<id>}"` placeholders resolved at
|
| 36 |
+
execution time, so the validator checks arg *keys*, not value types — except
|
| 37 |
+
`query_structured.args["ir"]`, whose inline QueryIR is validated against the
|
| 38 |
+
catalog by the existing IRValidator.
|
| 39 |
+
|
| 40 |
+
When KM-608/KM-418 ship, replace `default_registry()` with the real registry
|
| 41 |
+
import. See AGENT_ARCHITECTURE_CONTEXT_new.md §9.2 / §9.3.
|
| 42 |
"""
|
| 43 |
|
| 44 |
from __future__ import annotations
|
|
|
|
| 46 |
from .contracts import ToolRegistry, ToolSpec
|
| 47 |
|
| 48 |
_P0_TOOLS: list[ToolSpec] = [
|
| 49 |
+
# ----------------------------------------------------------------------- #
|
| 50 |
+
# Data access + catalog introspection (tool-team owned per the ownership note
|
| 51 |
+
# in the docstring; wraps Phase 2 infra: QueryService / RetrievalRouter / CatalogReader).
|
| 52 |
+
# ----------------------------------------------------------------------- #
|
| 53 |
ToolSpec(
|
| 54 |
name="query_structured",
|
| 55 |
category="analytics.query",
|
|
|
|
| 59 |
"Run one validated, single-table query against a structured source (DB "
|
| 60 |
"schema or tabular file) and return rows. The `ir` argument is an inline "
|
| 61 |
"QueryIR (the JSON intent: source_id, table_id, select, filters, group_by, "
|
| 62 |
+
"order_by, limit) — never SQL. This is the data-access entry point: use it "
|
| 63 |
+
"to select, filter, and pull the rows the analytics (`analyze_*`) tools "
|
| 64 |
+
"then consume. It also does simple built-in aggregation the IR can express "
|
| 65 |
+
"(count/sum/avg/min/max/count_distinct). Do NOT use it for richer statistics "
|
| 66 |
+
"(median/percentile/mode/stddev/skew → analyze_descriptive), trends "
|
| 67 |
+
"(analyze_trend), correlation, segmentation, or share-of-total; and do NOT "
|
| 68 |
+
"use it to read documents (use retrieve_documents)."
|
| 69 |
),
|
| 70 |
),
|
| 71 |
ToolSpec(
|
|
|
|
| 112 |
"before querying it. Do NOT use it to fetch data rows (use query_structured)."
|
| 113 |
),
|
| 114 |
),
|
| 115 |
+
# ----------------------------------------------------------------------- #
|
| 116 |
+
# Analytics family (KM-624 compute; wrapper pending). Each takes `data` =
|
| 117 |
+
# a "${t<id>}" placeholder for an upstream query_structured table output.
|
| 118 |
+
# ----------------------------------------------------------------------- #
|
| 119 |
ToolSpec(
|
| 120 |
+
name="analyze_descriptive",
|
| 121 |
+
category="analytics.descriptive",
|
| 122 |
+
input_schema={
|
| 123 |
+
"required": ["data", "column_ids"],
|
| 124 |
+
"properties": {
|
| 125 |
+
"data": {"type": "string"},
|
| 126 |
+
"column_ids": {"type": "array"},
|
| 127 |
+
"metrics": {"type": "array"},
|
| 128 |
+
},
|
| 129 |
+
},
|
| 130 |
+
output_kind="stats",
|
| 131 |
description=(
|
| 132 |
+
"Single/multi-column EDA in one call: count, mean, median, mode, std, "
|
| 133 |
+
"variance, quartiles (q1/q3), min, max, skew, null_count, null_rate for each "
|
| 134 |
+
"of `column_ids`. `data` is a '${t<id>}' placeholder for an upstream "
|
| 135 |
+
"query_structured result; `column_ids` are that result's column aliases. "
|
| 136 |
+
"This replaces the atomic compute_median/mode/stddev/percentile tools — ask "
|
| 137 |
+
"for the whole profile, not one statistic at a time. Do NOT use it for "
|
| 138 |
+
"group-by aggregates (analyze_aggregate) or time trends (analyze_trend)."
|
| 139 |
),
|
| 140 |
),
|
| 141 |
ToolSpec(
|
| 142 |
+
name="analyze_aggregate",
|
| 143 |
category="analytics.aggregation",
|
| 144 |
+
input_schema={
|
| 145 |
+
"required": ["data", "aggregations"],
|
| 146 |
+
"properties": {
|
| 147 |
+
"data": {"type": "string"},
|
| 148 |
+
"aggregations": {"type": "object"},
|
| 149 |
+
"group_by": {"type": "array"},
|
| 150 |
+
},
|
| 151 |
+
},
|
| 152 |
+
output_kind="table",
|
| 153 |
description=(
|
| 154 |
+
"Group-by aggregation over an already-materialized result: per group, "
|
| 155 |
+
"compute `aggregations` like {\"revenue\": [\"sum\", \"mean\"], "
|
| 156 |
+
"\"order_id\": [\"count\"]} (sum/mean/count/min/max/median/nunique). `data` "
|
| 157 |
+
"is a '${t<id>}' placeholder; `group_by` columns and aggregated columns are "
|
| 158 |
+
"that result's aliases. Prefer query_structured for simple group-by the IR "
|
| 159 |
+
"can already express; use this to aggregate a derived/joined/intermediate "
|
| 160 |
+
"result, or for median per group (the IR cannot)."
|
| 161 |
),
|
| 162 |
),
|
| 163 |
ToolSpec(
|
| 164 |
+
name="analyze_comparison",
|
| 165 |
+
category="analytics.comparison",
|
| 166 |
input_schema={
|
| 167 |
+
"required": ["data", "dimension", "value_column", "group_a", "group_b"],
|
| 168 |
"properties": {
|
| 169 |
+
"data": {"type": "string"},
|
| 170 |
+
"dimension": {"type": "string"},
|
| 171 |
+
"value_column": {"type": "string"},
|
| 172 |
+
"group_a": {},
|
| 173 |
+
"group_b": {},
|
| 174 |
+
"agg": {"type": "string"},
|
| 175 |
},
|
| 176 |
},
|
| 177 |
+
output_kind="stats",
|
| 178 |
description=(
|
| 179 |
+
"Compare one aggregated metric between two groups of a dimension (e.g. "
|
| 180 |
+
"region 'A' vs 'B'): returns each group's value, absolute and percent "
|
| 181 |
+
"difference, and direction (higher/lower/equal); group_a is the baseline. "
|
| 182 |
+
"`data` is a '${t<id>}' placeholder; `dimension`/`value_column` are aliases; "
|
| 183 |
+
"`agg` defaults to sum. Use for exactly TWO groups. For many categories' "
|
| 184 |
+
"share of a total use analyze_contribution; for movement over time use "
|
| 185 |
+
"analyze_trend."
|
| 186 |
),
|
| 187 |
),
|
| 188 |
ToolSpec(
|
| 189 |
+
name="analyze_contribution",
|
| 190 |
+
category="analytics.decomposition",
|
| 191 |
+
input_schema={
|
| 192 |
+
"required": ["data", "dimension", "value_column"],
|
| 193 |
+
"properties": {
|
| 194 |
+
"data": {"type": "string"},
|
| 195 |
+
"dimension": {"type": "string"},
|
| 196 |
+
"value_column": {"type": "string"},
|
| 197 |
+
"agg": {"type": "string"},
|
| 198 |
+
"top_n": {"type": "integer"},
|
| 199 |
+
},
|
| 200 |
+
},
|
| 201 |
+
output_kind="table",
|
| 202 |
+
description=(
|
| 203 |
+
"Share-of-total breakdown: each category's value, share, and running "
|
| 204 |
+
"cumulative share, largest first — the tool for 'which categories drive "
|
| 205 |
+
"most of X?' and Pareto (80/20) reasoning. `data` is a '${t<id>}' "
|
| 206 |
+
"placeholder; `dimension`/`value_column` are aliases; `agg` defaults to sum; "
|
| 207 |
+
"`top_n` lumps the tail into an 'Others' row. Use for a single snapshot of "
|
| 208 |
+
"many categories. Do NOT use it to compare exactly two groups "
|
| 209 |
+
"(analyze_comparison) or to trend over time (analyze_trend)."
|
| 210 |
+
),
|
| 211 |
+
),
|
| 212 |
+
ToolSpec(
|
| 213 |
+
name="analyze_profile",
|
| 214 |
+
category="analytics.quality",
|
| 215 |
+
input_schema={
|
| 216 |
+
"required": ["data"],
|
| 217 |
+
"properties": {"data": {"type": "string"}, "column_ids": {"type": "array"}},
|
| 218 |
+
},
|
| 219 |
+
output_kind="stats",
|
| 220 |
+
description=(
|
| 221 |
+
"Per-column data-quality profile: dtype, inferred type, completeness "
|
| 222 |
+
"(null_count/null_rate), cardinality (distinct_count/rate, is_constant), and "
|
| 223 |
+
"for numeric columns min/max/mean plus an IQR-based outlier_count (top value "
|
| 224 |
+
"for non-numeric). `data` is a '${t<id>}' placeholder; `column_ids` defaults "
|
| 225 |
+
"to all columns. Use in data_understanding to judge whether data is clean "
|
| 226 |
+
"enough before deeper analysis. Do NOT use it for the analytical answer "
|
| 227 |
+
"itself — it describes data health, not the business metric."
|
| 228 |
+
),
|
| 229 |
+
),
|
| 230 |
+
ToolSpec(
|
| 231 |
+
name="analyze_correlation",
|
| 232 |
+
category="analytics.relationship",
|
| 233 |
+
input_schema={
|
| 234 |
+
"required": ["data"],
|
| 235 |
+
"properties": {
|
| 236 |
+
"data": {"type": "string"},
|
| 237 |
+
"column_ids": {"type": "array"},
|
| 238 |
+
"method": {"type": "string"},
|
| 239 |
+
},
|
| 240 |
+
},
|
| 241 |
+
output_kind="stats",
|
| 242 |
+
description=(
|
| 243 |
+
"Pairwise correlation across numeric columns: returns the full matrix plus "
|
| 244 |
+
"column pairs ranked by strength. `data` is a '${t<id>}' placeholder; "
|
| 245 |
+
"`column_ids` defaults to all numeric columns; `method` is pearson "
|
| 246 |
+
"(default), spearman, or kendall. Use for 'does X relate to Y?'. Needs at "
|
| 247 |
+
"least two numeric columns. Correlation is not causation — it does not "
|
| 248 |
+
"explain why, and is not a model."
|
| 249 |
+
),
|
| 250 |
+
),
|
| 251 |
+
ToolSpec(
|
| 252 |
+
name="analyze_segment",
|
| 253 |
+
category="analytics.segmentation",
|
| 254 |
+
input_schema={
|
| 255 |
+
"required": ["data", "column", "bins"],
|
| 256 |
+
"properties": {
|
| 257 |
+
"data": {"type": "string"},
|
| 258 |
+
"column": {"type": "string"},
|
| 259 |
+
"bins": {},
|
| 260 |
+
"method": {"type": "string"},
|
| 261 |
+
"labels": {"type": "array"},
|
| 262 |
+
"value_column": {"type": "string"},
|
| 263 |
+
"agg": {"type": "string"},
|
| 264 |
+
},
|
| 265 |
+
},
|
| 266 |
+
output_kind="table",
|
| 267 |
description=(
|
| 268 |
+
"Bucket rows by binning a numeric `column` and report how rows distribute "
|
| 269 |
+
"across segments (count, and optionally an aggregate of `value_column` per "
|
| 270 |
+
"segment). `method` 'edges' takes explicit boundaries in `bins` (e.g. "
|
| 271 |
+
"[0,18,35,60]); 'quantile' takes an integer bucket count (e.g. 4 for "
|
| 272 |
+
"quartiles). `data` is a '${t<id>}' placeholder; columns are aliases. Use "
|
| 273 |
+
"for age brackets, value tiers, etc. The binned column must be numeric."
|
| 274 |
),
|
| 275 |
),
|
| 276 |
ToolSpec(
|
| 277 |
+
name="analyze_trend",
|
| 278 |
category="analytics.timeseries",
|
| 279 |
input_schema={
|
| 280 |
+
"required": ["data", "date_column", "value_column"],
|
| 281 |
"properties": {
|
| 282 |
+
"data": {"type": "string"},
|
| 283 |
+
"date_column": {"type": "string"},
|
| 284 |
+
"value_column": {"type": "string"},
|
| 285 |
+
"freq": {"type": "string"},
|
| 286 |
+
"agg": {"type": "string"},
|
| 287 |
},
|
| 288 |
},
|
| 289 |
output_kind="series",
|
| 290 |
description=(
|
| 291 |
+
"Time-series trend in one call: bucket rows into periods (`freq` = "
|
| 292 |
+
"day/week/month/quarter/year), aggregate `value_column` per period (`agg` "
|
| 293 |
+
"defaults to sum), and summarize movement (per-period points, first vs last, "
|
| 294 |
+
"absolute/percent change, direction, linear slope). `data` is a '${t<id>}' "
|
| 295 |
+
"placeholder; `date_column`/`value_column` are aliases from the upstream "
|
| 296 |
+
"query. This replaces the atomic date_trunc tool. Do NOT use it to filter by "
|
| 297 |
+
"date — put the date filter in the query_structured IR instead."
|
| 298 |
),
|
| 299 |
),
|
| 300 |
]
|
| 301 |
|
| 302 |
|
| 303 |
def default_registry() -> ToolRegistry:
|
| 304 |
+
"""The v1 stub registry (a fresh instance per call)."""
|
| 305 |
return ToolRegistry(tools=list(_P0_TOOLS))
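
The module docstring says the planner validator checks arg *keys* against `input_schema` (`required` plus `properties`) rather than value types, since values may be unresolved placeholders. A minimal sketch of that kind of key check follows. `StubToolSpec` and `check_arg_keys` are hypothetical names for illustration; the actual `ToolSpec` in `contracts.py` and check #8 in `validator.py` are not shown in this diff and may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StubToolSpec:
    """Hypothetical stand-in carrying only the fields this check needs."""

    name: str
    input_schema: dict = field(default_factory=dict)


def check_arg_keys(spec: StubToolSpec, args: dict) -> list[str]:
    """Report missing required keys and keys outside `properties`; values are ignored."""
    required = spec.input_schema.get("required", [])
    allowed = spec.input_schema.get("properties", {})
    problems = []
    for name in required:
        if name not in args:
            problems.append(f"{spec.name}: missing required arg '{name}'")
    for name in args:
        if name not in allowed:
            problems.append(f"{spec.name}: unknown arg '{name}'")
    return problems


# Schema copied from the analyze_trend entry in this diff.
trend_spec = StubToolSpec(
    name="analyze_trend",
    input_schema={
        "required": ["data", "date_column", "value_column"],
        "properties": {
            "data": {"type": "string"},
            "date_column": {"type": "string"},
            "value_column": {"type": "string"},
            "freq": {"type": "string"},
            "agg": {"type": "string"},
        },
    },
)

ok = check_arg_keys(
    trend_spec,
    {"data": "${t2}", "date_column": "order_date", "value_column": "revenue", "freq": "month"},
)
bad = check_arg_keys(trend_spec, {"data": "${t2}", "values": "${t3}"})
```

Because the check never inspects values, a `"${t2}"` string passes for `data` even though it will only become a table at execution time.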
|
|
@@ -31,10 +31,14 @@ only a `TaskList` object that conforms to the provided schema.
|
|
| 31 |
- **Wire data between tasks with placeholders.** When a task needs an upstream
|
| 32 |
task's output as an argument, use the string `"${t<id>}"` (e.g. `"${t2}"`) as
|
| 33 |
the argument value. Set `depends_on` accordingly.
|
| 34 |
-
- **
|
| 35 |
-
|
| 36 |
-
the IR
|
| 37 |
-
|
| 38 |
- **Mixing structured + unstructured.** If qualitative context helps, add a
|
| 39 |
`retrieve_documents` task against an unstructured source listed in the catalog.
|
| 40 |
- **Parallelism.** List sibling tasks that have no data dependency on each other
|
|
|
|
| 31 |
- **Wire data between tasks with placeholders.** When a task needs an upstream
|
| 32 |
task's output as an argument, use the string `"${t<id>}"` (e.g. `"${t2}"`) as
|
| 33 |
the argument value. Set `depends_on` accordingly.
|
| 34 |
+
- **Data access vs analytics tools.** `query_structured` is the data-access entry
|
| 35 |
+
point: use it to select, filter, and pull rows (and simple built-in
|
| 36 |
+
count/sum/avg/min/max/count_distinct the IR can express). For anything richer —
|
| 37 |
+
descriptive statistics (median/percentile/mode/std/skew), time trends, group
|
| 38 |
+
comparisons, share-of-total, correlation, segmentation, or data-quality
|
| 39 |
+
profiling — run `query_structured` to fetch the rows, then pass its output to
|
| 40 |
+
the matching composite `analyze_*` tool via a `"${t<id>}"` `data` argument
|
| 41 |
+
(referencing the upstream result's column aliases).
|
| 42 |
- **Mixing structured + unstructured.** If qualitative context helps, add a
|
| 43 |
`retrieve_documents` task against an unstructured source listed in the catalog.
|
| 44 |
- **Parallelism.** List sibling tasks that have no data dependency on each other
|