Spaces:
Sleeping
Sleeping
| ## A/B Normalization SQL Prompt Guide (For Model Inference) | |
| This guide outlines how to instruct a model to write Snowflake SQL for normalized A/B test comparisons using the ANALYTICS.ADCODE_DA.AB_GRANULAR table. | |
| --- | |
| ## Goal | |
| ALWAYS Generate a SQL query that: | |
| - Filters the table by a given ab_key and a specific date | |
| - Aggregates total impressions and revenue per ab_value | |
| - Normalizes impressions and revenue by dividing each by its allocation percentage | |
| - Selects the control group per segment as the ab_value with the highest number of impressions | |
| - Computes each variant’s performance ratio to control for both normalized revenue and impressions | |
| - Filters out low-volume rows (e.g., where total_imps < 10000) | |
| --- | |
| Dimensions | |
| ## The model must support dynamically adding or removing dimensions based on user input. | |
| Supported dimensions include: | |
| - device | |
| - browser_clean | |
| - format | |
| - topadunitname | |
| - refresh_num | |
| - lazy | |
| ## When dimensions are included in the user’s request, they must also be: | |
| - Included in the SELECT and GROUP BY clauses | |
| - Added to the PARTITION BY clause when determining the control | |
| - Used in the JOIN condition when aligning experimental rows with their corresponding control group | |
| --- | |
| ## Logic Flow | |
| 1. Filter the data for the specific AB test key and date. | |
| 2. Group by the selected dimensions and ab_value. | |
| 3. Aggregate total impressions and total revenue. | |
| 4. Normalize: | |
| - normalized_imps = total_imps / percentage | |
| - normalized_rev = total_rev / percentage | |
| 5. Determine control per segment: | |
| - Use ROW_NUMBER() partitioned by the selected dimensions (e.g., device, browser_clean) | |
| - Order by total_imps DESC | |
| - Select the first row as the control | |
| 6. Join the normalized data back to the control data using the same dimensions | |
| 7. Compute ratios to control: | |
| - (normalized_rev / control_rev) - 1 | |
| - (normalized_imps / control_imps) - 1 | |
| 8. Sort by revenue ratio to control, descending | |
| --- | |
| ## SQL Syntax Considerations | |
| - Use CAST(PERCENTAGE AS FLOAT) before division | |
| - Use HAVING SUM(IMPS) >= 10000 to filter out low-volume variants | |
| - Ensure consistent aliasing (e.g., use n for normalized data, c for control) | |
| - Always match segment dimensions exactly in the JOIN (e.g., n.device = c.device) | |
| ## Output Expectation | |
| A single Snowflake SQL query that: | |
| - Adheres to the logic above | |
| - Dynamically includes any dimensions mentioned in the request | |
| - Is clean, readable, and production-ready | |
| ## Important | |
| - Please ensure that the 'View Data Analysis & Insights' highlights the revenue and impression lift for the experiment values and flags outliers, if any. | |
| - **Default** If a user does not request specific dimensions, please portray topline or aggregate data, please only include the 'AB_KEY' and 'AB_VALUE' and aggregate the data over that timeframe | |
| - Always filter out low-volume rows HAVING SUM(IMPS) >= 10000 | |
| - Always ensure you normalize the data so that there are columns portraying the lift % for the metrics (i.e imps, rev, etc.). | |
| ## Example query | |
| WITH base_data AS ( | |
| SELECT | |
| AB_VALUE, | |
| DEVICE, | |
| BROWSER_CLEAN, | |
| CAST(PERCENTAGE AS FLOAT) AS percentage, | |
| SUM(IMPS) AS total_imps, | |
| SUM(REV) AS total_rev | |
| FROM ANALYTICS.ADCODE_DA.AB_GRANULAR | |
| WHERE AB_KEY = 'cnftp' | |
| AND DATE = '2025-04-01' | |
| GROUP BY AB_VALUE, DEVICE, BROWSER_CLEAN, PERCENTAGE | |
| HAVING SUM(IMPS) >= 10000 | |
| ), | |
| normalized AS ( | |
| SELECT | |
| AB_VALUE, | |
| DEVICE, | |
| BROWSER_CLEAN, | |
| total_imps, | |
| total_rev, | |
| percentage, | |
| (total_imps / percentage) AS norm_imps, | |
| (total_rev / percentage) AS norm_rev | |
| FROM base_data | |
| ), | |
| ranked_controls AS ( | |
| SELECT *, | |
| ROW_NUMBER() OVER ( | |
| PARTITION BY DEVICE, BROWSER_CLEAN | |
| ORDER BY total_imps DESC | |
| ) AS rn | |
| FROM normalized | |
| ), | |
| control AS ( | |
| SELECT | |
| DEVICE, | |
| BROWSER_CLEAN, | |
| AB_VALUE AS control_ab_value, | |
| norm_imps AS control_norm_imps, | |
| norm_rev AS control_norm_rev | |
| FROM ranked_controls | |
| WHERE rn = 1 | |
| ) | |
| SELECT | |
| n.AB_VALUE, | |
| n.DEVICE, | |
| n.BROWSER_CLEAN, | |
| n.total_imps, | |
| n.total_rev, | |
| n.percentage, | |
| ROUND(n.norm_imps, 2) AS norm_imps, | |
| ROUND(n.norm_rev, 2) AS norm_rev, | |
| ROUND(n.norm_imps / c.control_norm_imps, 4) - 1 AS norm_imps_ratio_to_control, | |
| ROUND(n.norm_rev / c.control_norm_rev, 4) - 1 AS norm_rev_ratio_to_control | |
| FROM normalized n | |
| JOIN control c | |
| ON n.DEVICE = c.DEVICE | |
| AND n.BROWSER_CLEAN = c.BROWSER_CLEAN | |
| ORDER BY n.DEVICE, n.BROWSER_CLEAN, norm_rev_ratio_to_control DESC; | |