jashdoshi77 committed on
Commit
600d2a0
·
1 Parent(s): f2e427c

fix: ban JOIN to diamond/gold detail tables for cost queries to prevent row duplication

Files changed (1)
  1. ai/signatures.py +37 -30
ai/signatures.py CHANGED
@@ -57,25 +57,30 @@ class AnalyzeAndPlan(dspy.Signature):
  → Still filter by sales_order.status = 'closed'.

  COMPONENT COST BY PRODUCT (diamond cost, gold cost, making charges per product):
- → The sales_table_v2_sales_order_line_pricing table has ALL component costs
-   and quantity in ONE place. Use it exclusively for cost analysis.
- → Correct formula: SUM(component_amount_per_unit * quantity)
- → Columns available:
-     diamond_amount_per_unit → total diamond cost = SUM(diamond_amount_per_unit * quantity)
-     gold_amount_per_unit → total gold cost = SUM(gold_amount_per_unit * quantity)
-     making_charges_per_unit → total making cost = SUM(making_charges_per_unit * quantity)
  → GROUP BY product_id for "by product", GROUP BY variant_sku for "by variant/SKU".
- → Example - top 10 products by diamond cost:
-     SELECT product_id, SUM(diamond_amount_per_unit * quantity) AS diamond_cost
-     FROM sales_table_v2_sales_order_line_pricing
-     GROUP BY product_id
      ORDER BY diamond_cost DESC
      LIMIT 10
- → NEVER use sales_order_line_diamond or sales_order_line_gold tables for cost totals.
-   Those detail tables have diamond_amount_per_unit WITHOUT quantity - using SUM on them
-   directly gives WRONG results (undercounts because it ignores how many units were ordered).
-   Use them ONLY when the question asks about specific diamond/gold properties
-   (e.g. shape, quality, karat, size, carats) - NOT for cost or revenue calculations.

  PURCHASE ORDER TOTALS:
  → Use: purchase_orders_v6_purchase_order.total_amount
@@ -219,22 +224,24 @@ class SQLGeneration(dspy.Signature):
  - Highest/biggest/top "purchase order" → purchase_orders_v6_purchase_order ORDER BY total_amount DESC
  - Highest/biggest/top "order" or "sale" → sales_table_v2_sales_order ORDER BY total_amount DESC

- 3. COMPONENT COSTS (diamond/gold/making charges) - USE PRICING TABLE WITH QUANTITY:
-   - Correct table: sales_table_v2_sales_order_line_pricing
-   - Correct formula: SUM(diamond_amount_per_unit * quantity) for diamond cost
-                      SUM(gold_amount_per_unit * quantity) for gold cost
-                      SUM(making_charges_per_unit * quantity) for making charges
-   - NEVER use sales_order_line_diamond or sales_order_line_gold for cost aggregations.
-     Those tables lack quantity, so SUM(diamond_amount_per_unit) there is always WRONG.
-   - Examples:
      Top products by diamond cost:
-       SELECT product_id, SUM(diamond_amount_per_unit * quantity) AS diamond_cost
-       FROM sales_table_v2_sales_order_line_pricing
-       GROUP BY product_id ORDER BY diamond_cost DESC LIMIT 10
      Top SKUs by gold cost:
-       SELECT variant_sku, SUM(gold_amount_per_unit * quantity) AS gold_cost
-       FROM sales_table_v2_sales_order_line_pricing
-       GROUP BY variant_sku ORDER BY gold_cost DESC LIMIT 10

  5. USE PRE-COMPUTED TOTALS - NEVER RECONSTRUCT THEM:
  - For order-level metrics (revenue, AOV): use sales_table_v2_sales_order.total_amount
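
The prompt text's rule of multiplying every per-unit cost by quantity can be checked on a toy table. The following is a minimal sketch using Python's built-in sqlite3 module; only the table and column names come from the diff, while the `sol_id` key on the pricing table, the helper `diamond_cost`, and all row values are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical miniature of sales_table_v2_sales_order_line_pricing.
# Column names follow the diff; the rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_table_v2_sales_order_line_pricing (
        sol_id INTEGER PRIMARY KEY,
        product_id TEXT,
        quantity INTEGER,
        diamond_amount_per_unit REAL
    )
    """
)
conn.executemany(
    "INSERT INTO sales_table_v2_sales_order_line_pricing VALUES (?, ?, ?, ?)",
    [(1, "P1", 3, 100.0), (2, "P1", 1, 50.0), (3, "P2", 2, 200.0)],
)


def diamond_cost(expr):
    """Aggregate the given SQL expression per product (illustrative helper)."""
    sql = f"""
        SELECT product_id, SUM({expr}) AS diamond_cost
        FROM sales_table_v2_sales_order_line_pricing
        GROUP BY product_id
    """
    return dict(conn.execute(sql).fetchall())


# Summing the per-unit amount alone ignores how many units were ordered.
per_unit_only = diamond_cost("diamond_amount_per_unit")
# Multiplying by quantity gives the line total before summing.
with_quantity = diamond_cost("diamond_amount_per_unit * quantity")

print(per_unit_only)   # {'P1': 150.0, 'P2': 200.0}
print(with_quantity)   # {'P1': 350.0, 'P2': 400.0}
```

In this toy data the per-unit-only sum undercounts P1 by 200 and P2 by 200, which is the failure mode the prompt warns about.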
 
  → Still filter by sales_order.status = 'closed'.

  COMPONENT COST BY PRODUCT (diamond cost, gold cost, making charges per product):
+ → sales_table_v2_sales_order_line_pricing has ALL cost columns AND quantity.
+   It is SELF-SUFFICIENT. NO JOIN to any other table is needed for cost queries.
+ → Correct formula: SUM(column * quantity) - multiply every time, never skip.
+ → Columns (all in sales_order_line_pricing):
+     diamond_amount_per_unit → SUM(diamond_amount_per_unit * quantity)
+     gold_amount_per_unit → SUM(gold_amount_per_unit * quantity)
+     making_charges_per_unit → SUM(making_charges_per_unit * quantity)
  → GROUP BY product_id for "by product", GROUP BY variant_sku for "by variant/SKU".
+ → Always prefix the column with the table alias to avoid ambiguity.
+
+ EXACT TEMPLATE - top 10 products by diamond cost:
+     SELECT lp.product_id,
+            SUM(lp.diamond_amount_per_unit * lp.quantity) AS diamond_cost
+     FROM sales_table_v2_sales_order_line_pricing lp
+     GROUP BY lp.product_id
      ORDER BY diamond_cost DESC
      LIMIT 10
+
+ CRITICAL - DO NOT JOIN sales_order_line_diamond or sales_order_line_gold for costs:
+   • Those detail tables have MULTIPLE rows per sol_id (one per diamond type/shape/quality).
+   • Joining them multiplies every pricing row by the number of detail rows → WRONG totals.
+   • They have no quantity column → SUM(diamond_amount_per_unit) there is also WRONG.
+   • Only use those detail tables when the question explicitly asks about diamond/gold
+     PROPERTIES such as shape, quality, karat, carat weight, size - NOT for cost/revenue.

  PURCHASE ORDER TOTALS:
  → Use: purchase_orders_v6_purchase_order.total_amount
 
  - Highest/biggest/top "purchase order" → purchase_orders_v6_purchase_order ORDER BY total_amount DESC
  - Highest/biggest/top "order" or "sale" → sales_table_v2_sales_order ORDER BY total_amount DESC

+ 3. COMPONENT COSTS (diamond/gold/making charges) - PRICING TABLE ONLY, NO JOINS:
+   - ONE table only: sales_table_v2_sales_order_line_pricing (alias: lp)
+   - Formula: SUM(lp.diamond_amount_per_unit * lp.quantity) for diamond cost
+              SUM(lp.gold_amount_per_unit * lp.quantity) for gold cost
+              SUM(lp.making_charges_per_unit * lp.quantity) for making charges
+   - Always use table alias prefix (lp.product_id, lp.quantity, etc.) - never bare column names.
+   - ZERO joins needed. DO NOT join sales_order_line_diamond or sales_order_line_gold.
+     Joining those tables introduces duplicate rows (they have multiple rows per sol_id),
+     which inflates every SUM by 2x, 3x, or more - silently wrong results.
+   - Exact templates:
      Top products by diamond cost:
+       SELECT lp.product_id, SUM(lp.diamond_amount_per_unit * lp.quantity) AS diamond_cost
+       FROM sales_table_v2_sales_order_line_pricing lp
+       GROUP BY lp.product_id ORDER BY diamond_cost DESC LIMIT 10
      Top SKUs by gold cost:
+       SELECT lp.variant_sku, SUM(lp.gold_amount_per_unit * lp.quantity) AS gold_cost
+       FROM sales_table_v2_sales_order_line_pricing lp
+       GROUP BY lp.variant_sku ORDER BY gold_cost DESC LIMIT 10

  5. USE PRE-COMPUTED TOTALS - NEVER RECONSTRUCT THEM:
  - For order-level metrics (revenue, AOV): use sales_table_v2_sales_order.total_amount
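
The row duplication that this commit guards against can be reproduced on a toy schema. This is a minimal sketch with Python's built-in sqlite3 module, not the project's real schema: the `shape` column on `sales_order_line_diamond`, the `sol_id` key on the pricing table, and every row value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_table_v2_sales_order_line_pricing (
        sol_id INTEGER PRIMARY KEY,
        product_id TEXT,
        quantity INTEGER,
        diamond_amount_per_unit REAL
    );
    -- Hypothetical detail table: several rows per sol_id, one per diamond type.
    CREATE TABLE sales_order_line_diamond (sol_id INTEGER, shape TEXT);

    INSERT INTO sales_table_v2_sales_order_line_pricing
        VALUES (1, 'P1', 2, 100.0);
    INSERT INTO sales_order_line_diamond
        VALUES (1, 'round'), (1, 'oval'), (1, 'pear');
    """
)

# Pricing table alone: one row per order line, so the total is 100.0 * 2.
correct = conn.execute(
    """
    SELECT SUM(lp.diamond_amount_per_unit * lp.quantity)
    FROM sales_table_v2_sales_order_line_pricing lp
    """
).fetchone()[0]

# Joining the detail table repeats the pricing row once per detail row,
# so the same line total is summed three times.
inflated = conn.execute(
    """
    SELECT SUM(lp.diamond_amount_per_unit * lp.quantity)
    FROM sales_table_v2_sales_order_line_pricing lp
    JOIN sales_order_line_diamond d ON d.sol_id = lp.sol_id
    """
).fetchone()[0]

print(correct)   # 200.0
print(inflated)  # 600.0
```

With three detail rows for the single order line, the joined query reports three times the true cost and raises no error, which matches the "silently wrong results" wording in the new prompt text.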