Rifqi Hafizuddin Claude Opus 4.8 committed on
Commit 93a24da · 1 Parent(s): 78c598c

[KM-624][AI] Planner: realign stub registry + examples to composite analyze_* tools

Team decision: v1 uses composite "family" tools (analyze_*), not the atomic
compute_* set. Realign the planner-facing stub to the real KM-624 inventory so
the Planner plans against tools that exist.

- registry.py: replace the 9 atomic entries (compute_median/stddev/percentile/
mode, date_trunc, ...) with 12 composite entries -- 4 data-access
(query_structured, retrieve_documents, list_sources, describe_source) + 8
analyze_* (descriptive, aggregate, comparison, contribution, profile,
correlation, segment, trend). Each analyze_* takes a `data` "${t<id>}"
placeholder (Pattern A, assumed pending the tool team).
- examples.py: Example A -> analyze_contribution; Example B -> analyze_trend
(drops the removed date_trunc/compute_stddev chain).
- planner.md: rewrite the "compute_* tools" bullet as data-access vs analytics.

Validator/prompt/service unchanged (generic over the registry). Planner tests
updated locally (tests/ is gitignored): 32 passing + 1 gated, ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
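
The message above says the validator is generic over the registry. As a minimal sketch of what a key-only check of that kind could look like, assuming the `required`/`properties` shape described in the registry.py docstring: `check_arg_keys` and `ANALYZE_CONTRIBUTION_SCHEMA` are hypothetical names for illustration and are not taken from validator.py.

```python
from typing import Any

# Hypothetical constant: the `input_schema` of the analyze_contribution stub,
# copied from this commit's registry.py.
ANALYZE_CONTRIBUTION_SCHEMA: dict[str, Any] = {
    "required": ["data", "dimension", "value_column"],
    "properties": {
        "data": {"type": "string"},
        "dimension": {"type": "string"},
        "value_column": {"type": "string"},
        "agg": {"type": "string"},
        "top_n": {"type": "integer"},
    },
}


def check_arg_keys(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the keys are acceptable.

    Only argument *names* are checked. Values are ignored because they may be
    "${t<id>}" placeholders that are resolved at execution time.
    """
    required = set(schema.get("required", []))
    allowed = set(schema.get("properties", {}))
    supplied = set(args)
    problems = [f"missing required arg: {name}" for name in sorted(required - supplied)]
    problems += [f"unknown arg: {name}" for name in sorted(supplied - allowed)]
    return problems


if __name__ == "__main__":
    # An old-style atomic call shape is rejected against the composite schema.
    print(check_arg_keys(ANALYZE_CONTRIBUTION_SCHEMA, {"values": "${t3}"}))
```

Because a check like this reads only the schema dict, swapping the nine old entries for the twelve composite ones would not require touching it, which is consistent with the "unchanged" note above.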

src/agents/planner/examples.py CHANGED
@@ -2,7 +2,11 @@
2
 
3
  Two illustrative (question -> TaskList) pairs that teach the OUTPUT SHAPE:
4
  stages, dependency edges, parallelism, ordered tool-call chains, inline QueryIR,
5
- and "${t<id>}" placeholders. They reference a hypothetical sales catalog
 
 
 
 
6
  (`src_sales` / `t_orders`); these ids are part of the illustration and are not
7
  validated against the user's real catalog. v1 is descriptive/diagnostic — no
8
  modeling tasks.
@@ -17,6 +21,9 @@ from .schemas import Task, TaskList, ToolCall
17
  # --------------------------------------------------------------------------- #
18
  # Example A — exploratory, no modeling.
19
  # "Which product categories drove last quarter's revenue?"
 
 
 
20
  # --------------------------------------------------------------------------- #
21
 
22
  _EXAMPLE_A = TaskList(
@@ -38,8 +45,8 @@ _EXAMPLE_A = TaskList(
38
  ),
39
  Task(
40
  id="t2",
41
- stage="evaluation",
42
- objective="Sum last quarter's revenue per category, ranked high to low.",
43
  tool_calls=[
44
  ToolCall(
45
  tool="query_structured",
@@ -49,12 +56,7 @@ _EXAMPLE_A = TaskList(
49
  "table_id": "t_orders",
50
  "select": [
51
  {"kind": "column", "column_id": "c_category", "alias": "category"},
52
- {
53
- "kind": "agg",
54
- "fn": "sum",
55
- "column_id": "c_revenue",
56
- "alias": "revenue",
57
- },
58
  ],
59
  "filters": [
60
  {
@@ -64,54 +66,36 @@ _EXAMPLE_A = TaskList(
64
  "value_type": "date",
65
  }
66
  ],
67
- "group_by": ["c_category"],
68
- "order_by": [{"column_id": "revenue", "dir": "desc"}],
69
- "limit": 20,
70
  }
71
  },
72
  )
73
  ],
74
- expected_output="revenue_by_category",
75
- success_criteria="Produced a ranked revenue figure per category.",
76
  depends_on=["t1"],
77
- parallelizable_with=["t3"],
78
- estimated_cost="low",
79
  ),
80
  Task(
81
  id="t3",
82
  stage="evaluation",
83
- objective="Get total last-quarter revenue to contextualize each category's share.",
84
  tool_calls=[
85
  ToolCall(
86
- tool="query_structured",
87
  args={
88
- "ir": {
89
- "source_id": "src_sales",
90
- "table_id": "t_orders",
91
- "select": [
92
- {
93
- "kind": "agg",
94
- "fn": "sum",
95
- "column_id": "c_revenue",
96
- "alias": "total_revenue",
97
- }
98
- ],
99
- "filters": [
100
- {
101
- "column_id": "c_order_date",
102
- "op": "between",
103
- "value": ["2026-01-01", "2026-03-31"],
104
- "value_type": "date",
105
- }
106
- ],
107
- }
108
  },
109
  )
110
  ],
111
- expected_output="total_revenue",
112
- success_criteria="Produced a single total revenue figure for the quarter.",
113
- depends_on=["t1"],
114
- parallelizable_with=["t2"],
115
  estimated_cost="low",
116
  ),
117
  ],
@@ -181,30 +165,28 @@ _EXAMPLE_B = TaskList(
181
  Task(
182
  id="t3",
183
  stage="evaluation",
184
- objective="Bucket the order dates into months to form the monthly trend.",
185
  tool_calls=[
186
  ToolCall(
187
- tool="date_trunc",
188
- args={"values": "${t2}", "granularity": "month"},
189
  )
190
  ],
191
- expected_output="monthly_series",
192
- success_criteria="Produced a per-month revenue series.",
 
 
 
193
  depends_on=["t2"],
194
  parallelizable_with=[],
195
  estimated_cost="low",
196
  ),
197
- Task(
198
- id="t4",
199
- stage="evaluation",
200
- objective="Quantify month-to-month spread to flag unusual months.",
201
- tool_calls=[ToolCall(tool="compute_stddev", args={"values": "${t3}"})],
202
- expected_output="monthly_volatility",
203
- success_criteria="Produced a stddev figure that flags months above the typical spread.",
204
- depends_on=["t3"],
205
- parallelizable_with=[],
206
- estimated_cost="low",
207
- ),
208
  ],
209
  )
210
 
 
2
 
3
  Two illustrative (question -> TaskList) pairs that teach the OUTPUT SHAPE:
4
  stages, dependency edges, parallelism, ordered tool-call chains, inline QueryIR,
5
+ "${t<id>}" placeholders, and the assumed data-flow convention — `query_structured`
6
+ pulls rows, then a composite `analyze_*` tool consumes them via a `data` placeholder
7
+ referencing the upstream result's column aliases (Pattern A; the tool team may
8
+ instead pick self-fetch by `source_id`, in which case these examples are reshaped
9
+ to match — see registry.py). They reference a hypothetical sales catalog
10
  (`src_sales` / `t_orders`); these ids are part of the illustration and are not
11
  validated against the user's real catalog. v1 is descriptive/diagnostic — no
12
  modeling tasks.
 
21
  # --------------------------------------------------------------------------- #
22
  # Example A — exploratory, no modeling.
23
  # "Which product categories drove last quarter's revenue?"
24
+ # Shows: query_structured pulls rows -> analyze_contribution computes each
25
+ # category's share of the total in one call (no manual per-category + total
26
+ # queries).
27
  # --------------------------------------------------------------------------- #
28
 
29
  _EXAMPLE_A = TaskList(
 
45
  ),
46
  Task(
47
  id="t2",
48
+ stage="data_preparation",
49
+ objective="Pull last quarter's order-level category and revenue rows.",
50
  tool_calls=[
51
  ToolCall(
52
  tool="query_structured",
 
56
  "table_id": "t_orders",
57
  "select": [
58
  {"kind": "column", "column_id": "c_category", "alias": "category"},
59
+ {"kind": "column", "column_id": "c_revenue", "alias": "revenue"},
60
  ],
61
  "filters": [
62
  {
 
66
  "value_type": "date",
67
  }
68
  ],
69
+ "limit": 10000,
 
 
70
  }
71
  },
72
  )
73
  ],
74
+ expected_output="quarter_rows",
75
+ success_criteria="Produced last quarter's order rows with category and revenue.",
76
  depends_on=["t1"],
77
+ parallelizable_with=[],
78
+ estimated_cost="medium",
79
  ),
80
  Task(
81
  id="t3",
82
  stage="evaluation",
83
+ objective="Rank each category's revenue share of the quarter total.",
84
  tool_calls=[
85
  ToolCall(
86
+ tool="analyze_contribution",
87
  args={
88
+ "data": "${t2}",
89
+ "dimension": "category",
90
+ "value_column": "revenue",
91
+ "agg": "sum",
92
  },
93
  )
94
  ],
95
+ expected_output="category_contribution",
96
+ success_criteria="Produced each category's revenue share, ranked high to low.",
97
+ depends_on=["t2"],
98
+ parallelizable_with=[],
99
  estimated_cost="low",
100
  ),
101
  ],
 
165
  Task(
166
  id="t3",
167
  stage="evaluation",
168
+ objective="Bucket revenue into months and summarize the trend and movement.",
169
  tool_calls=[
170
  ToolCall(
171
+ tool="analyze_trend",
172
+ args={
173
+ "data": "${t2}",
174
+ "date_column": "order_date",
175
+ "value_column": "revenue",
176
+ "freq": "month",
177
+ "agg": "sum",
178
+ },
179
  )
180
  ],
181
+ expected_output="monthly_trend",
182
+ success_criteria=(
183
+ "Produced a per-month revenue series with direction and change rate to "
184
+ "flag months above/below the typical level."
185
+ ),
186
  depends_on=["t2"],
187
  parallelizable_with=[],
188
  estimated_cost="low",
189
  ),
190
  ],
191
  )
192
 
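To make the Pattern A flow in Example A concrete, the sketch below resolves a `"${t2}"` placeholder and then computes a share-of-total table. It is an illustration under stated assumptions, not the KM-624 compute or the TaskRunner: `resolve_args` and `contribution` are hypothetical names, rows are assumed to be a list of dicts keyed by the upstream aliases, only `agg="sum"` is handled, `top_n` is omitted, and the output field names (`value`, `share`, `cumulative_share`) are guesses.

```python
import re
from collections import defaultdict
from typing import Any

# Matches a value that is exactly one placeholder, e.g. "${t2}".
_PLACEHOLDER = re.compile(r"\$\{(t\w+)\}")


def resolve_args(args: dict[str, Any], results: dict[str, Any]) -> dict[str, Any]:
    """Swap whole-string "${t<id>}" values for the upstream task's output."""
    resolved = {}
    for key, value in args.items():
        match = _PLACEHOLDER.fullmatch(value) if isinstance(value, str) else None
        resolved[key] = results[match.group(1)] if match else value
    return resolved


def contribution(data, dimension, value_column, agg="sum"):
    """Rank each `dimension` value by its share of the summed `value_column`."""
    if agg != "sum":
        raise ValueError("this sketch only implements agg='sum'")
    totals = defaultdict(float)
    for row in data:
        totals[row[dimension]] += row[value_column]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    table, running = [], 0.0
    for name, value in ranked:
        share = value / grand_total if grand_total else 0.0
        running += share
        table.append(
            {dimension: name, "value": value, "share": share, "cumulative_share": running}
        )
    return table


if __name__ == "__main__":
    # Stand-in for the rows task t2 would return (aliases: category, revenue).
    upstream = {"t2": [
        {"category": "books", "revenue": 3.0},
        {"category": "toys", "revenue": 3.0},
        {"category": "books", "revenue": 1.0},
        {"category": "games", "revenue": 1.0},
    ]}
    t3_args = {"data": "${t2}", "dimension": "category", "value_column": "revenue", "agg": "sum"}
    for row in contribution(**resolve_args(t3_args, upstream)):
        print(row)
```

One call yields both the per-category figure and its share of the total, which is why the revised Example A no longer needs separate per-category and total queries.
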
src/agents/planner/registry.py CHANGED
@@ -1,19 +1,44 @@
1
- """STUB v1 P0 tool registry.
2
 
3
  This is the agent team's local stand-in for the tool team's inventory (KM-608)
4
- so the planner is buildable and testable before the real tools land. The tools
5
- here are *contracts only* — there is no implementation behind them; the planner
6
- plans against the registry and never names a tool outside it (INV-7).
 
 
 
 
7
 
8
- `input_schema` is a lightweight JSON-schema-ish dict consumed by the planner
9
- validator (validator.py check #8): it carries `required` (list of arg names) and
10
- `properties` (allowed arg names). Arg *values* may be "${t<id>}" placeholders the
11
- TaskRunner resolves at execution time, so the validator checks arg *keys*, not
12
- value types, except `query_structured.args["ir"]`, whose inline QueryIR is
13
- validated against the catalog by the existing IRValidator.
14
 
15
- When KM-608 ships, replace `default_registry()` with the real registry import.
16
- See AGENT_ARCHITECTURE_CONTEXT_new.md §9.2 / §9.3.
17
  """
18
 
19
  from __future__ import annotations
@@ -21,6 +46,10 @@ from __future__ import annotations
21
  from .contracts import ToolRegistry, ToolSpec
22
 
23
  _P0_TOOLS: list[ToolSpec] = [
 
 
 
 
24
  ToolSpec(
25
  name="query_structured",
26
  category="analytics.query",
@@ -30,11 +59,13 @@ _P0_TOOLS: list[ToolSpec] = [
30
  "Run one validated, single-table query against a structured source (DB "
31
  "schema or tabular file) and return rows. The `ir` argument is an inline "
32
  "QueryIR (the JSON intent: source_id, table_id, select, filters, group_by, "
33
- "order_by, limit) — never SQL. Use this for any selection, filtering, "
34
- "grouping, or built-in aggregation (count/sum/avg/min/max/count_distinct). "
35
- "Do NOT use it for medians/percentiles/modes/stddev (use the compute_* "
36
- "tools on its output) and do NOT use it to read documents (use "
37
- "retrieve_documents)."
 
 
38
  ),
39
  ),
40
  ToolSpec(
@@ -81,79 +112,194 @@ _P0_TOOLS: list[ToolSpec] = [
81
  "before querying it. Do NOT use it to fetch data rows (use query_structured)."
82
  ),
83
  ),
 
 
 
 
84
  ToolSpec(
85
- name="compute_median",
86
- category="analytics.aggregation",
87
- input_schema={"required": ["values"], "properties": {"values": {"type": "array"}}},
88
- output_kind="scalar",
89
  description=(
90
- "Compute the median of a numeric series. `values` is typically a "
91
- "'${t<id>}' placeholder referencing an upstream query_structured output "
92
- "column. Use this because SQL/pandas median is not exposed via the IR. Do "
93
- "NOT use it on categorical data (use compute_mode)."
 
 
 
94
  ),
95
  ),
96
  ToolSpec(
97
- name="compute_stddev",
98
  category="analytics.aggregation",
99
- input_schema={"required": ["values"], "properties": {"values": {"type": "array"}}},
100
- output_kind="scalar",
101
  description=(
102
- "Compute the standard deviation of a numeric series (`values`, usually a "
103
- "'${t<id>}' placeholder from an upstream query). Use to quantify spread or "
104
- "to flag outliers. Do NOT use on non-numeric data."
 
 
 
 
105
  ),
106
  ),
107
  ToolSpec(
108
- name="compute_percentile",
109
- category="analytics.aggregation",
110
  input_schema={
111
- "required": ["values", "percentile"],
112
  "properties": {
113
- "values": {"type": "array"},
114
- "percentile": {"type": "number"},
 
 
 
 
115
  },
116
  },
117
- output_kind="scalar",
118
  description=(
119
- "Compute a given `percentile` (0-100) of a numeric series `values` "
120
- "(usually a '${t<id>}' placeholder). Use for p90/p95-style thresholds. Do "
121
- "NOT use for the median alone (use compute_median)."
 
 
 
 
122
  ),
123
  ),
124
  ToolSpec(
125
- name="compute_mode",
126
- category="analytics.aggregation",
127
- input_schema={"required": ["values"], "properties": {"values": {"type": "array"}}},
128
- output_kind="scalar",
 
129
  description=(
130
- "Compute the most frequent value(s) of a series `values` (usually a "
131
- "'${t<id>}' placeholder). Works on categorical or numeric data. Use to find "
132
- "the typical category. Do NOT use it for an average (use query_structured "
133
- "avg)."
 
 
134
  ),
135
  ),
136
  ToolSpec(
137
- name="date_trunc",
138
  category="analytics.timeseries",
139
  input_schema={
140
- "required": ["values", "granularity"],
141
  "properties": {
142
- "values": {"type": "array"},
143
- "granularity": {"type": "string"},
 
 
 
144
  },
145
  },
146
  output_kind="series",
147
  description=(
148
- "Truncate a datetime series `values` (usually a '${t<id>}' placeholder) to "
149
- "a `granularity` ('day' | 'week' | 'month' | 'quarter' | 'year') so results "
150
- "can be grouped into time buckets for trend analysis. Do NOT use it to "
151
- "filter by date; put a date filter in the query_structured IR instead."
 
 
 
152
  ),
153
  ),
154
  ]
155
 
156
 
157
  def default_registry() -> ToolRegistry:
158
- """The v1 P0 stub registry (a fresh instance per call)."""
159
  return ToolRegistry(tools=list(_P0_TOOLS))
 
1
+ """STUB v1 tool registry — composite ("family") tools.
2
 
3
  This is the agent team's local stand-in for the tool team's inventory (KM-608)
4
+ so the planner is buildable and testable before the real wrapper layer lands.
5
+ The tools here are *contracts only* — the compute logic for the `analyze_*`
6
+ family already exists in `src/tools/analytics/` (KM-624), but the wrapper layer
7
+ (source/placeholder -> DataFrame fetch, the `ToolOutput` envelope, never-throw
8
+ error handling, ToolSpec registration) is still pending the Planner seam
9
+ (KM-418 / AGENT_ARCHITECTURE_CONTEXT_new.md §8.4). The planner plans against the
10
+ registry and never names a tool outside it (INV-7).
11
 
12
+ **Taxonomy decision (2026-06-08):** v1 uses **composite/family** tools, not the
13
+ atomic `compute_*` set the earlier draft assumed. One `analyze_*` call does a
14
+ whole analytical job (e.g. `analyze_descriptive` returns mean/median/mode/std/
15
+ quartiles/skew/null_rate at once, replacing four atomic `compute_*` tools). See
16
+ §9.3 / the decisions table in the architecture doc.
 
17
 
18
+ **Ownership (revised 2026-06-08): the tool team owns ALL tools**: compute,
19
+ data-access (`query_structured`/`retrieve_documents`/`list_sources`/
20
+ `describe_source`), the wrapper/invoker, and tests. This file is purely the agent
21
+ team's local scaffold for building/testing the Planner (and later the TaskRunner/
22
+ Assembler against mocks) until the real registry lands; replace it then.
23
+
24
+ **Data-flow convention (Pattern A — assumed, but the tool team's call, still open):**
25
+ this stub assumes the `analyze_*` tools do NOT self-fetch by `source_id`; each
26
+ takes a `data` argument that is a `"${t<id>}"` placeholder pointing at an upstream
27
+ `query_structured` table output, resolved to a DataFrame at execution time. Column
28
+ arguments (`column_ids`, `dimension`, `value_column`, `date_column`, …) reference
29
+ the *aliases* the upstream query produced. If the tool team instead picks Pattern B
30
+ (self-fetch by `source_id`), reshape this stub + the few-shot examples to match —
31
+ the agent code does not change either way (INV-7).
32
+
33
+ `input_schema` is the lightweight JSON-schema-ish dict the planner validator
34
+ (validator.py check #8) consumes: `required` (list of arg names) + `properties`
35
+ (allowed arg names). Arg *values* may be `"${t<id>}"` placeholders resolved at
36
+ execution time, so the validator checks arg *keys*, not value types — except
37
+ `query_structured.args["ir"]`, whose inline QueryIR is validated against the
38
+ catalog by the existing IRValidator.
39
+
40
+ When KM-608/KM-418 ship, replace `default_registry()` with the real registry
41
+ import. See AGENT_ARCHITECTURE_CONTEXT_new.md §9.2 / §9.3.
42
  """
43
 
44
  from __future__ import annotations
 
46
  from .contracts import ToolRegistry, ToolSpec
47
 
48
  _P0_TOOLS: list[ToolSpec] = [
49
+ # ----------------------------------------------------------------------- #
50
+ # Data access + catalog introspection (tool-team owned; wrap existing
51
+ # Phase 2 infra — QueryService / RetrievalRouter / CatalogReader).
52
+ # ----------------------------------------------------------------------- #
53
  ToolSpec(
54
  name="query_structured",
55
  category="analytics.query",
 
59
  "Run one validated, single-table query against a structured source (DB "
60
  "schema or tabular file) and return rows. The `ir` argument is an inline "
61
  "QueryIR (the JSON intent: source_id, table_id, select, filters, group_by, "
62
+ "order_by, limit) — never SQL. This is the data-access entry point: use it "
63
+ "to select, filter, and pull the rows the analytics (`analyze_*`) tools "
64
+ "then consume. It also does simple built-in aggregation the IR can express "
65
+ "(count/sum/avg/min/max/count_distinct). Do NOT use it for richer statistics "
66
+ "(median/percentile/mode/stddev/skew → analyze_descriptive), trends "
67
+ "(analyze_trend), correlation, segmentation, or share-of-total; and do NOT "
68
+ "use it to read documents (use retrieve_documents)."
69
  ),
70
  ),
71
  ToolSpec(
 
112
  "before querying it. Do NOT use it to fetch data rows (use query_structured)."
113
  ),
114
  ),
115
+ # ----------------------------------------------------------------------- #
116
+ # Analytics family (KM-624 compute; wrapper pending). Each takes `data` =
117
+ # a "${t<id>}" placeholder for an upstream query_structured table output.
118
+ # ----------------------------------------------------------------------- #
119
  ToolSpec(
120
+ name="analyze_descriptive",
121
+ category="analytics.descriptive",
122
+ input_schema={
123
+ "required": ["data", "column_ids"],
124
+ "properties": {
125
+ "data": {"type": "string"},
126
+ "column_ids": {"type": "array"},
127
+ "metrics": {"type": "array"},
128
+ },
129
+ },
130
+ output_kind="stats",
131
  description=(
132
+ "Single/multi-column EDA in one call: count, mean, median, mode, std, "
133
+ "variance, quartiles (q1/q3), min, max, skew, null_count, null_rate for each "
134
+ "of `column_ids`. `data` is a '${t<id>}' placeholder for an upstream "
135
+ "query_structured result; `column_ids` are that result's column aliases. "
136
+ "This replaces the atomic compute_median/mode/stddev/percentile tools — ask "
137
+ "for the whole profile, not one statistic at a time. Do NOT use it for "
138
+ "group-by aggregates (analyze_aggregate) or time trends (analyze_trend)."
139
  ),
140
  ),
141
  ToolSpec(
142
+ name="analyze_aggregate",
143
  category="analytics.aggregation",
144
+ input_schema={
145
+ "required": ["data", "aggregations"],
146
+ "properties": {
147
+ "data": {"type": "string"},
148
+ "aggregations": {"type": "object"},
149
+ "group_by": {"type": "array"},
150
+ },
151
+ },
152
+ output_kind="table",
153
  description=(
154
+ "Group-by aggregation over an already-materialized result: per group, "
155
+ "compute `aggregations` like {\"revenue\": [\"sum\", \"mean\"], "
156
+ "\"order_id\": [\"count\"]} (sum/mean/count/min/max/median/nunique). `data` "
157
+ "is a '${t<id>}' placeholder; `group_by` columns and aggregated columns are "
158
+ "that result's aliases. Prefer query_structured for simple group-by the IR "
159
+ "can already express; use this to aggregate a derived/joined/intermediate "
160
+ "result, or for median per group (the IR cannot)."
161
  ),
162
  ),
163
  ToolSpec(
164
+ name="analyze_comparison",
165
+ category="analytics.comparison",
166
  input_schema={
167
+ "required": ["data", "dimension", "value_column", "group_a", "group_b"],
168
  "properties": {
169
+ "data": {"type": "string"},
170
+ "dimension": {"type": "string"},
171
+ "value_column": {"type": "string"},
172
+ "group_a": {},
173
+ "group_b": {},
174
+ "agg": {"type": "string"},
175
  },
176
  },
177
+ output_kind="stats",
178
  description=(
179
+ "Compare one aggregated metric between two groups of a dimension (e.g. "
180
+ "region 'A' vs 'B'): returns each group's value, absolute and percent "
181
+ "difference, and direction (higher/lower/equal); group_a is the baseline. "
182
+ "`data` is a '${t<id>}' placeholder; `dimension`/`value_column` are aliases; "
183
+ "`agg` defaults to sum. Use for exactly TWO groups. For many categories' "
184
+ "share of a total use analyze_contribution; for movement over time use "
185
+ "analyze_trend."
186
  ),
187
  ),
188
  ToolSpec(
189
+ name="analyze_contribution",
190
+ category="analytics.decomposition",
191
+ input_schema={
192
+ "required": ["data", "dimension", "value_column"],
193
+ "properties": {
194
+ "data": {"type": "string"},
195
+ "dimension": {"type": "string"},
196
+ "value_column": {"type": "string"},
197
+ "agg": {"type": "string"},
198
+ "top_n": {"type": "integer"},
199
+ },
200
+ },
201
+ output_kind="table",
202
+ description=(
203
+ "Share-of-total breakdown: each category's value, share, and running "
204
+ "cumulative share, largest first — the tool for 'which categories drive "
205
+ "most of X?' and Pareto (80/20) reasoning. `data` is a '${t<id>}' "
206
+ "placeholder; `dimension`/`value_column` are aliases; `agg` defaults to sum; "
207
+ "`top_n` lumps the tail into an 'Others' row. Use for a single snapshot of "
208
+ "many categories. Do NOT use it to compare exactly two groups "
209
+ "(analyze_comparison) or to trend over time (analyze_trend)."
210
+ ),
211
+ ),
212
+ ToolSpec(
213
+ name="analyze_profile",
214
+ category="analytics.quality",
215
+ input_schema={
216
+ "required": ["data"],
217
+ "properties": {"data": {"type": "string"}, "column_ids": {"type": "array"}},
218
+ },
219
+ output_kind="stats",
220
+ description=(
221
+ "Per-column data-quality profile: dtype, inferred type, completeness "
222
+ "(null_count/null_rate), cardinality (distinct_count/rate, is_constant), and "
223
+ "for numeric columns min/max/mean plus an IQR-based outlier_count (top value "
224
+ "for non-numeric). `data` is a '${t<id>}' placeholder; `column_ids` defaults "
225
+ "to all columns. Use in data_understanding to judge whether data is clean "
226
+ "enough before deeper analysis. Do NOT use it for the analytical answer "
227
+ "itself — it describes data health, not the business metric."
228
+ ),
229
+ ),
230
+ ToolSpec(
231
+ name="analyze_correlation",
232
+ category="analytics.relationship",
233
+ input_schema={
234
+ "required": ["data"],
235
+ "properties": {
236
+ "data": {"type": "string"},
237
+ "column_ids": {"type": "array"},
238
+ "method": {"type": "string"},
239
+ },
240
+ },
241
+ output_kind="stats",
242
+ description=(
243
+ "Pairwise correlation across numeric columns: returns the full matrix plus "
244
+ "column pairs ranked by strength. `data` is a '${t<id>}' placeholder; "
245
+ "`column_ids` defaults to all numeric columns; `method` is pearson "
246
+ "(default), spearman, or kendall. Use for 'does X relate to Y?'. Needs at "
247
+ "least two numeric columns. Correlation is not causation — it does not "
248
+ "explain why, and is not a model."
249
+ ),
250
+ ),
251
+ ToolSpec(
252
+ name="analyze_segment",
253
+ category="analytics.segmentation",
254
+ input_schema={
255
+ "required": ["data", "column", "bins"],
256
+ "properties": {
257
+ "data": {"type": "string"},
258
+ "column": {"type": "string"},
259
+ "bins": {},
260
+ "method": {"type": "string"},
261
+ "labels": {"type": "array"},
262
+ "value_column": {"type": "string"},
263
+ "agg": {"type": "string"},
264
+ },
265
+ },
266
+ output_kind="table",
267
  description=(
268
+ "Bucket rows by binning a numeric `column` and report how rows distribute "
269
+ "across segments (count, and optionally an aggregate of `value_column` per "
270
+ "segment). `method` 'edges' takes explicit boundaries in `bins` (e.g. "
271
+ "[0,18,35,60]); 'quantile' takes an integer bucket count (e.g. 4 for "
272
+ "quartiles). `data` is a '${t<id>}' placeholder; columns are aliases. Use "
273
+ "for age brackets, value tiers, etc. The binned column must be numeric."
274
  ),
275
  ),
276
  ToolSpec(
277
+ name="analyze_trend",
278
  category="analytics.timeseries",
279
  input_schema={
280
+ "required": ["data", "date_column", "value_column"],
281
  "properties": {
282
+ "data": {"type": "string"},
283
+ "date_column": {"type": "string"},
284
+ "value_column": {"type": "string"},
285
+ "freq": {"type": "string"},
286
+ "agg": {"type": "string"},
287
  },
288
  },
289
  output_kind="series",
290
  description=(
291
+ "Time-series trend in one call: bucket rows into periods (`freq` = "
292
+ "day/week/month/quarter/year), aggregate `value_column` per period (`agg` "
293
+ "defaults to sum), and summarize movement (per-period points, first vs last, "
294
+ "absolute/percent change, direction, linear slope). `data` is a '${t<id>}' "
295
+ "placeholder; `date_column`/`value_column` are aliases from the upstream "
296
+ "query. This replaces the atomic date_trunc tool. Do NOT use it to filter by "
297
+ "date — put the date filter in the query_structured IR instead."
298
  ),
299
  ),
300
  ]
301
 
302
 
303
  def default_registry() -> ToolRegistry:
304
+ """The v1 stub registry (a fresh instance per call)."""
305
  return ToolRegistry(tools=list(_P0_TOOLS))
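
The `analyze_trend` description above packs several steps into one call. As a rough sketch of those steps, assuming rows arrive as a list of dicts with ISO `YYYY-MM-DD` date strings: `monthly_trend` is a hypothetical name, only `freq="month"` with `agg="sum"` is covered, the output keys and the `up`/`down`/`flat` labels are guesses, and this is not the KM-624 implementation.

```python
from collections import defaultdict
from datetime import date


def monthly_trend(data, date_column, value_column):
    """Sum `value_column` per calendar month and summarize the movement."""
    if not data:
        raise ValueError("no rows to trend")
    buckets = defaultdict(float)
    for row in data:
        month = date.fromisoformat(row[date_column]).strftime("%Y-%m")
        buckets[month] += row[value_column]

    periods = sorted(buckets)  # "YYYY-MM" strings sort chronologically
    values = [buckets[p] for p in periods]
    first, last = values[0], values[-1]
    change = last - first

    # Ordinary least-squares slope of value against period index 0..n-1.
    # Months with no rows are skipped rather than zero-filled in this sketch.
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    denom = sum((x - mean_x) ** 2 for x in range(n))
    numer = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))

    return {
        "points": list(zip(periods, values)),
        "first": first,
        "last": last,
        "absolute_change": change,
        "percent_change": (change / first * 100) if first else None,
        "direction": "up" if change > 0 else "down" if change < 0 else "flat",
        "slope": numer / denom if denom else 0.0,
    }


if __name__ == "__main__":
    rows = [
        {"order_date": "2026-01-05", "revenue": 10.0},
        {"order_date": "2026-01-20", "revenue": 10.0},
        {"order_date": "2026-02-03", "revenue": 30.0},
        {"order_date": "2026-03-09", "revenue": 40.0},
    ]
    print(monthly_trend(rows, "order_date", "revenue"))
```

Bucketing and summarizing in the same function is what lets Example B drop the separate `date_trunc` and `compute_stddev` tasks.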
src/config/prompts/planner.md CHANGED
@@ -31,10 +31,14 @@ only a `TaskList` object that conforms to the provided schema.
31
  - **Wire data between tasks with placeholders.** When a task needs an upstream
32
  task's output as an argument, use the string `"${t<id>}"` (e.g. `"${t2}"`) as
33
  the argument value. Set `depends_on` accordingly.
34
- - **Built-in aggregation vs compute_* tools.** Use `query_structured` for
35
- count/sum/avg/min/max/count_distinct, filtering, and grouping. For statistics
36
- the IR cannot express (median, percentile, mode, standard deviation), run
37
- `query_structured` to fetch the series, then a `compute_*` tool on its output.
 
 
 
 
38
  - **Mixing structured + unstructured.** If qualitative context helps, add a
39
  `retrieve_documents` task against an unstructured source listed in the catalog.
40
  - **Parallelism.** List sibling tasks that have no data dependency on each other
 
31
  - **Wire data between tasks with placeholders.** When a task needs an upstream
32
  task's output as an argument, use the string `"${t<id>}"` (e.g. `"${t2}"`) as
33
  the argument value. Set `depends_on` accordingly.
34
+ - **Data access vs analytics tools.** `query_structured` is the data-access entry
35
+ point: use it to select, filter, and pull rows (and simple built-in
36
+ count/sum/avg/min/max/count_distinct the IR can express). For anything richer,
37
+ such as descriptive statistics (median/percentile/mode/std/skew), time trends,
38
+ group comparisons, share-of-total, correlation, segmentation, or data-quality
39
+ profiling, run `query_structured` to fetch the rows, then pass its output to
40
+ the matching composite `analyze_*` tool via a `"${t<id>}"` `data` argument
41
+ (referencing the upstream result's column aliases).
42
  - **Mixing structured + unstructured.** If qualitative context helps, add a
43
  `retrieve_documents` task against an unstructured source listed in the catalog.
44
  - **Parallelism.** List sibling tasks that have no data dependency on each other