jashdoshi77 committed on
Commit 3d9e2c3 · 1 Parent(s): 00f0078

fix: add aggregation granularity rules to prevent product/variant GROUP BY confusion

Files changed (1)
  1. ai/signatures.py +53 -6
ai/signatures.py CHANGED
@@ -68,6 +68,44 @@ class AnalyzeAndPlan(dspy.Signature):
  → WHERE status = 'closed' on sales_table_v2_sales_order
  For product catalog or inventory questions: no status filter needed.

  ══════════════════════════════════════════════════════════════
  RULE 2.5 — SALES ORDER vs PURCHASE ORDER DISAMBIGUATION
  ══════════════════════════════════════════════════════════════
@@ -127,7 +165,7 @@ class AnalyzeAndPlan(dspy.Signature):
  join_conditions = dspy.OutputField(desc="JOIN conditions to use, or 'none'")
  where_conditions = dspy.OutputField(desc="WHERE conditions including status/date filters, or 'none'")
  aggregations = dspy.OutputField(desc="Aggregation functions to apply, or 'none'")
- group_by = dspy.OutputField(desc="GROUP BY columns, or 'none'")
  order_by = dspy.OutputField(desc="ORDER BY clause, or 'none'")
  limit_val = dspy.OutputField(desc="LIMIT value, or 'none'")
 
@@ -145,30 +183,39 @@ class SQLGeneration(dspy.Signature):
  - It tells you today's date, current year, and exact date ranges for "last year"/"this year".
  - Always use those exact year values. NEVER guess the year.

- 1. SALES ORDER vs PURCHASE ORDER — NEVER CONFUSE THEM:
  - "purchase order", "PO", "vendor" → purchase_orders_v6_purchase_order table
  - "sales order", "order", "revenue", "AOV", "highest order" (without "purchase") → sales_table_v2_sales_order table
  - Highest/biggest/top "purchase order" → purchase_orders_v6_purchase_order ORDER BY total_amount DESC
  - Highest/biggest/top "order" or "sale" → sales_table_v2_sales_order ORDER BY total_amount DESC

- 2. USE PRE-COMPUTED TOTALS — NEVER RECONSTRUCT THEM:
  - For order-level metrics (revenue, AOV): use sales_table_v2_sales_order.total_amount
  - For PO totals: use purchase_orders_v6_purchase_order.total_amount
  - NEVER add gold_amount + diamond_amount or any component columns —
    that always gives the WRONG answer (misses labour, taxes, etc.)

- 3. CORRECT FORMULAS:
  - Revenue: SELECT SUM(total_amount) FROM sales_table_v2_sales_order WHERE status = 'closed'
  - AOV: SELECT AVG(total_amount) FROM sales_table_v2_sales_order WHERE status = 'closed'
  - Per-product revenue: SUM(line_total) FROM sales_order_line_pricing
    JOIN sales_order_line JOIN sales_order WHERE status = 'closed'

- 4. DATE FILTERING (order_date is TEXT 'YYYY-MM-DD'):
  - Use the EXACT year values from the [CONTEXT] block in the question.
  - Use: order_date >= 'YYYY-01-01' AND order_date <= 'YYYY-12-31'
  - Do NOT use EXTRACT() or CAST() on order_date.

- 5. SIMPLICITY:
  - Single-record lookup = simple WHERE filter, no aggregation
  - Only JOIN when needed, only aggregate when needed
 
 
  → WHERE status = 'closed' on sales_table_v2_sales_order
  For product catalog or inventory questions: no status filter needed.

+ ══════════════════════════════════════════════════════════════
+ RULE 1.5 — AGGREGATION GRANULARITY (CRITICAL)
+ ══════════════════════════════════════════════════════════════
+ The word used in the question determines the GROUP BY level.
+ NEVER add extra columns to GROUP BY beyond what the question asks for.
+
+ PRODUCT vs VARIANT vs SKU:
+ • "by product" / "per product" / "top products"
+   → GROUP BY product_id ONLY
+   → product_id is the product-level key (e.g. PROD-0020)
+   → A product has MANY variants/SKUs — grouping by variant_sku too
+     will give per-variant rows, NOT per-product rows (WRONG).
+   → There is no separate product name column in this database.
+     Use product_id as the product identifier.
+ • "by variant" / "per variant" / "by SKU" / "per SKU"
+   → GROUP BY variant_sku (and optionally product_id)
+   → variant_sku is the fine-grained key (e.g. 105186-14K-Q12-IGI)
+ • "with product names" when asked alongside "by product"
+   → Still GROUP BY product_id — do NOT add variant_sku to GROUP BY.
+     product_id IS the product name in this database.
+
+ CUSTOMER:
+ • "by customer" / "per customer" / "top customers"
+   → GROUP BY sales_table_v2_customer_master.customer_id
+   → JOIN customer_master to get customer_name
+
+ VENDOR:
+ • "by vendor" / "per vendor" / "top vendors"
+   → GROUP BY vendor_id (or vendor_name if available in the table)
+
+ ORDER:
+ • "by order" / "per order"
+   → GROUP BY so_id (sales) or po_id (purchase)
+
+ GENERAL RULE: Match the GROUP BY exactly to the entity noun in the question.
+ Never silently add extra columns (like variant_sku) when the question says "product".
+ Never group at a finer granularity than what was asked.
+
  ══════════════════════════════════════════════════════════════
  RULE 2.5 — SALES ORDER vs PURCHASE ORDER DISAMBIGUATION
  ══════════════════════════════════════════════════════════════
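
To illustrate why RULE 1.5 matters, the sketch below loads a tiny in-memory SQLite table and compares grouping by `product_id` alone with grouping by `product_id` and `variant_sku`. Only the two column names and the example keys `PROD-0020` and `105186-14K-Q12-IGI` come from the rule text; the `line_items` table, the other keys, and all amounts are invented for the example and are not the project's schema.

```python
import sqlite3

# Hypothetical table: only product_id and variant_sku are names from RULE 1.5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items (product_id TEXT, variant_sku TEXT, line_total REAL)"
)
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?)",
    [
        ("PROD-0020", "105186-14K-Q12-IGI", 100.0),
        ("PROD-0020", "105186-18K-Q12-IGI", 250.0),  # made-up second variant
        ("PROD-0021", "105187-14K-Q12-IGI", 75.0),   # made-up second product
    ],
)

# "by product": exactly one row per product_id.
per_product = conn.execute(
    "SELECT product_id, SUM(line_total) FROM line_items "
    "GROUP BY product_id ORDER BY product_id"
).fetchall()

# Adding variant_sku to GROUP BY fragments the result into per-variant rows.
per_variant = conn.execute(
    "SELECT product_id, variant_sku, SUM(line_total) FROM line_items "
    "GROUP BY product_id, variant_sku ORDER BY product_id, variant_sku"
).fetchall()

print(per_product)       # two rows, one per product
print(len(per_variant))  # three rows, one per variant
```

With the extra column, PROD-0020 appears twice and a "top products" ranking would compare variants rather than products, which is the confusion this commit targets.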
 
  join_conditions = dspy.OutputField(desc="JOIN conditions to use, or 'none'")
  where_conditions = dspy.OutputField(desc="WHERE conditions including status/date filters, or 'none'")
  aggregations = dspy.OutputField(desc="Aggregation functions to apply, or 'none'")
+ group_by = dspy.OutputField(desc="GROUP BY columns matching the exact entity in the question (e.g. product_id for 'by product', variant_sku for 'by variant', customer_id for 'by customer'), or 'none'")
  order_by = dspy.OutputField(desc="ORDER BY clause, or 'none'")
  limit_val = dspy.OutputField(desc="LIMIT value, or 'none'")
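
The tightened `group_by` description is still only a prompt hint, so one possible complement is a small check on the planner output. The sketch below is hypothetical and not part of this commit: `ENTITY_KEYS` and `extra_group_by_columns` are illustrative names, the allowed columns are copied from the rule text, and the phrase matching is deliberately naive.

```python
# Hypothetical post-check, not part of the commit: reports GROUP BY columns
# that go beyond the entity named in the question.
ENTITY_KEYS = {
    "product": {"product_id"},
    "variant": {"variant_sku", "product_id"},  # product_id is optional here
    "sku": {"variant_sku", "product_id"},
    "customer": {"customer_id"},
    "vendor": {"vendor_id", "vendor_name"},
    "order": {"so_id", "po_id"},
}


def extra_group_by_columns(question: str, group_by: str) -> set[str]:
    """Return GROUP BY columns not allowed for the entity the question names."""
    if group_by.strip().lower() == "none":
        return set()
    # Drop table qualifiers such as sales_table_v2_customer_master.customer_id.
    columns = {c.strip().split(".")[-1] for c in group_by.split(",") if c.strip()}
    q = question.lower()
    allowed: set[str] = set()
    for entity, keys in ENTITY_KEYS.items():
        if f"by {entity}" in q or f"per {entity}" in q or f"top {entity}" in q:
            allowed |= keys
    # No recognised entity phrase: there is nothing to check against.
    return columns - allowed if allowed else set()


print(extra_group_by_columns("revenue by product", "product_id, variant_sku"))
```

A non-empty result could be logged or used to trigger a retry; whether that is worthwhile depends on how often the planner still over-groups after the prompt change.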
 
 
  - It tells you today's date, current year, and exact date ranges for "last year"/"this year".
  - Always use those exact year values. NEVER guess the year.

+ 1. GROUP BY GRANULARITY — MATCH EXACTLY TO THE QUESTION'S ENTITY:
+ - "by product" / "top products" → GROUP BY product_id (NOT variant_sku, NOT both)
+ - "by variant" / "by SKU" → GROUP BY variant_sku
+ - "by customer" / "top customers" → GROUP BY customer_id (JOIN for customer_name)
+ - "by vendor" / "top vendors" → GROUP BY vendor_id or vendor_name
+ - "by order" → GROUP BY so_id or po_id
+ Adding extra columns to GROUP BY (e.g. variant_sku when question says "product")
+ is ALWAYS WRONG — it fragments results into variant-level rows.
+
+ 2. SALES ORDER vs PURCHASE ORDER — NEVER CONFUSE THEM:
  - "purchase order", "PO", "vendor" → purchase_orders_v6_purchase_order table
  - "sales order", "order", "revenue", "AOV", "highest order" (without "purchase") → sales_table_v2_sales_order table
  - Highest/biggest/top "purchase order" → purchase_orders_v6_purchase_order ORDER BY total_amount DESC
  - Highest/biggest/top "order" or "sale" → sales_table_v2_sales_order ORDER BY total_amount DESC

+ 3. USE PRE-COMPUTED TOTALS — NEVER RECONSTRUCT THEM:
  - For order-level metrics (revenue, AOV): use sales_table_v2_sales_order.total_amount
  - For PO totals: use purchase_orders_v6_purchase_order.total_amount
  - NEVER add gold_amount + diamond_amount or any component columns —
    that always gives the WRONG answer (misses labour, taxes, etc.)

+ 4. CORRECT FORMULAS:
  - Revenue: SELECT SUM(total_amount) FROM sales_table_v2_sales_order WHERE status = 'closed'
  - AOV: SELECT AVG(total_amount) FROM sales_table_v2_sales_order WHERE status = 'closed'
  - Per-product revenue: SUM(line_total) FROM sales_order_line_pricing
    JOIN sales_order_line JOIN sales_order WHERE status = 'closed'

+ 5. DATE FILTERING (order_date is TEXT 'YYYY-MM-DD'):
  - Use the EXACT year values from the [CONTEXT] block in the question.
  - Use: order_date >= 'YYYY-01-01' AND order_date <= 'YYYY-12-31'
  - Do NOT use EXTRACT() or CAST() on order_date.

+ 6. SIMPLICITY:
  - Single-record lookup = simple WHERE filter, no aggregation
  - Only JOIN when needed, only aggregate when needed
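
The date-filtering rule works because ISO `YYYY-MM-DD` strings sort lexicographically in calendar order, so plain string comparison selects a year without `EXTRACT()` or `CAST()`. The sketch below shows this on an in-memory SQLite table. The `orders` table, its rows, and the `revenue_for_year` helper are invented for illustration; only the column names `order_date`, `total_amount`, `status` and the `'closed'` filter come from the rules above.

```python
import sqlite3

# Hypothetical stand-in for the sales order table; order_date is TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (so_id TEXT, order_date TEXT, total_amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("SO-1", "2022-12-31", 500.0, "closed"),
        ("SO-2", "2023-01-01", 200.0, "closed"),
        ("SO-3", "2023-12-31", 300.0, "closed"),
        ("SO-4", "2023-06-15", 900.0, "open"),  # excluded by the status filter
        ("SO-5", "2024-01-01", 400.0, "closed"),
    ],
)


def revenue_for_year(year: int) -> float:
    """Sum the pre-computed total_amount of closed orders in one calendar year."""
    start, end = f"{year:04d}-01-01", f"{year:04d}-12-31"
    row = conn.execute(
        "SELECT COALESCE(SUM(total_amount), 0) FROM orders "
        "WHERE status = 'closed' AND order_date >= ? AND order_date <= ?",
        (start, end),
    ).fetchone()
    return row[0]


print(revenue_for_year(2023))
```

Both boundary dates are inclusive, matching the `>=` / `<=` form in the rule, and the query sums the stored `total_amount` rather than rebuilding it from component columns.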