## A/B Normalization SQL Prompt Guide (For Model Inference) This guide outlines how to instruct a model to write Snowflake SQL for normalized A/B test comparisons using the ANALYTICS.ADCODE_DA.AB_GRANULAR table. --- ## Goal ALWAYS Generate a SQL query that: - Filters the table by a given ab_key and a specific date - Aggregates total impressions and revenue per ab_value - Normalizes impressions and revenue by dividing each by its allocation percentage - Selects the control group per segment as the ab_value with the highest number of impressions - Computes each variant’s performance ratio to control for both normalized revenue and impressions - Filters out low-volume rows (e.g., where total_imps < 10000) --- Dimensions ## The model must support dynamically adding or removing dimensions based on user input. Supported dimensions include: - device - browser_clean - format - topadunitname - refresh_num - lazy ## When dimensions are included in the user’s request, they must also be: - Included in the SELECT and GROUP BY clauses - Added to the PARTITION BY clause when determining the control - Used in the JOIN condition when aligning experimental rows with their corresponding control group --- ## Logic Flow 1. Filter the data for the specific AB test key and date. 2. Group by the selected dimensions and ab_value. 3. Aggregate total impressions and total revenue. 4. Normalize: - normalized_imps = total_imps / percentage - normalized_rev = total_rev / percentage 5. Determine control per segment: - Use ROW_NUMBER() partitioned by the selected dimensions (e.g., device, browser_clean) - Order by total_imps DESC - Select the first row as the control 6. Join the normalized data back to the control data using the same dimensions 7. Compute ratios to control: - (normalized_rev / control_rev) - 1 - (normalized_imps / control_imps) - 1 8. Sort by revenue ratio to control, descending --- ## SQL Syntax Considerations - Use CAST(PERCENTAGE AS FLOAT) before division - Use HAVING SUM(IMPS) >= 10000 to filter out low-volume variants - Ensure consistent aliasing (e.g., use n for normalized data, c for control) - Always match segment dimensions exactly in the JOIN (e.g., n.device = c.device) ## Output Expectation A single Snowflake SQL query that: - Adheres to the logic above - Dynamically includes any dimensions mentioned in the request - Is clean, readable, and production-ready ## Important - Please ensure that the 'View Data Analysis & Insights' highlights the revenue and impression lift for the experiment values and flags outliers, if any. - **Default** If a user does not request specific dimensions, please portray topline or aggregate data, please only include the 'AB_KEY' and 'AB_VALUE' and aggregate the data over that timeframe - Always filter out low-volume rows HAVING SUM(IMPS) >= 10000 - Always ensure you normalize the data so that there are columns portraying the lift % for the metrics (i.e imps, rev, etc.). ## Example query WITH base_data AS ( SELECT AB_VALUE, DEVICE, BROWSER_CLEAN, CAST(PERCENTAGE AS FLOAT) AS percentage, SUM(IMPS) AS total_imps, SUM(REV) AS total_rev FROM ANALYTICS.ADCODE_DA.AB_GRANULAR WHERE AB_KEY = 'cnftp' AND DATE = '2025-04-01' GROUP BY AB_VALUE, DEVICE, BROWSER_CLEAN, PERCENTAGE HAVING SUM(IMPS) >= 10000 ), normalized AS ( SELECT AB_VALUE, DEVICE, BROWSER_CLEAN, total_imps, total_rev, percentage, (total_imps / percentage) AS norm_imps, (total_rev / percentage) AS norm_rev FROM base_data ), ranked_controls AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY DEVICE, BROWSER_CLEAN ORDER BY total_imps DESC ) AS rn FROM normalized ), control AS ( SELECT DEVICE, BROWSER_CLEAN, AB_VALUE AS control_ab_value, norm_imps AS control_norm_imps, norm_rev AS control_norm_rev FROM ranked_controls WHERE rn = 1 ) SELECT n.AB_VALUE, n.DEVICE, n.BROWSER_CLEAN, n.total_imps, n.total_rev, n.percentage, ROUND(n.norm_imps, 2) AS norm_imps, ROUND(n.norm_rev, 2) AS norm_rev, ROUND(n.norm_imps / c.control_norm_imps, 4) - 1 AS norm_imps_ratio_to_control, ROUND(n.norm_rev / c.control_norm_rev, 4) - 1 AS norm_rev_ratio_to_control FROM normalized n JOIN control c ON n.DEVICE = c.DEVICE AND n.BROWSER_CLEAN = c.BROWSER_CLEAN ORDER BY n.DEVICE, n.BROWSER_CLEAN, norm_rev_ratio_to_control DESC;