Spaces:

Avisut
/

Revops_Data_Wizard

Sleeping

File size: 4,593 Bytes

0555d94


## A/B Normalization SQL Prompt Guide (For Model Inference)

This guide outlines how to instruct a model to write Snowflake SQL for normalized A/B test comparisons using the ANALYTICS.ADCODE_DA.AB_GRANULAR table.

---

## Goal

ALWAYS Generate a SQL query that:
- Filters the table by a given ab_key and a specific date
- Aggregates total impressions and revenue per ab_value
- Normalizes impressions and revenue by dividing each by its allocation percentage
- Selects the control group per segment as the ab_value with the highest number of impressions
- Computes each variant’s performance ratio to control for both normalized revenue and impressions
- Filters out low-volume rows (e.g., where total_imps < 10000)

---

Dimensions
## The model must support dynamically adding or removing dimensions based on user input.

Supported dimensions include:
- device
- browser_clean
- format
- topadunitname
- refresh_num
- lazy

## When dimensions are included in the user’s request, they must also be:
- Included in the SELECT and GROUP BY clauses
- Added to the PARTITION BY clause when determining the control
- Used in the JOIN condition when aligning experimental rows with their corresponding control group

---

## Logic Flow

1. Filter the data for the specific AB test key and date.
2. Group by the selected dimensions and ab_value.
3. Aggregate total impressions and total revenue.
4. Normalize:
   - normalized_imps = total_imps / percentage
   - normalized_rev = total_rev / percentage
5. Determine control per segment:
   - Use ROW_NUMBER() partitioned by the selected dimensions (e.g., device, browser_clean)
   - Order by total_imps DESC
   - Select the first row as the control
6. Join the normalized data back to the control data using the same dimensions
7. Compute ratios to control:
   - (normalized_rev / control_rev) - 1
   - (normalized_imps / control_imps) - 1
8. Sort by revenue ratio to control, descending

---

## SQL Syntax Considerations

- Use CAST(PERCENTAGE AS FLOAT) before division
- Use HAVING SUM(IMPS) >= 10000 to filter out low-volume variants
- Ensure consistent aliasing (e.g., use n for normalized data, c for control)
- Always match segment dimensions exactly in the JOIN (e.g., n.device = c.device)


## Output Expectation

A single Snowflake SQL query that:
- Adheres to the logic above
- Dynamically includes any dimensions mentioned in the request
- Is clean, readable, and production-ready



## Important
  - Please ensure that the 'View Data Analysis & Insights' highlights the revenue and impression lift for the experiment values and flags outliers, if any.
  - **Default** If a user does not request specific dimensions, please portray topline or aggregate data, please only include the 'AB_KEY' and 'AB_VALUE' and aggregate the data over that timeframe
  - Always filter out low-volume rows HAVING SUM(IMPS) >= 10000
  - Always ensure you normalize the data so that there are columns portraying the lift % for the metrics (i.e imps, rev, etc.).


## Example query

WITH base_data AS (
    SELECT 
        AB_VALUE,
        DEVICE,
        BROWSER_CLEAN,
        CAST(PERCENTAGE AS FLOAT) AS percentage,
        SUM(IMPS) AS total_imps,
        SUM(REV) AS total_rev
    FROM ANALYTICS.ADCODE_DA.AB_GRANULAR
    WHERE AB_KEY = 'cnftp'
      AND DATE = '2025-04-01'
    GROUP BY AB_VALUE, DEVICE, BROWSER_CLEAN, PERCENTAGE
    HAVING SUM(IMPS) >= 10000
),
normalized AS (
    SELECT
        AB_VALUE,
        DEVICE,
        BROWSER_CLEAN,
        total_imps,
        total_rev,
        percentage,
        (total_imps / percentage) AS norm_imps,
        (total_rev / percentage) AS norm_rev
    FROM base_data
),
ranked_controls AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY DEVICE, BROWSER_CLEAN
            ORDER BY total_imps DESC
        ) AS rn
    FROM normalized
),
control AS (
    SELECT 
        DEVICE,
        BROWSER_CLEAN,
        AB_VALUE AS control_ab_value,
        norm_imps AS control_norm_imps,
        norm_rev AS control_norm_rev
    FROM ranked_controls
    WHERE rn = 1
)
SELECT 
    n.AB_VALUE,
    n.DEVICE,
    n.BROWSER_CLEAN,
    n.total_imps,
    n.total_rev,
    n.percentage,
    ROUND(n.norm_imps, 2) AS norm_imps,
    ROUND(n.norm_rev, 2) AS norm_rev,
    ROUND(n.norm_imps / c.control_norm_imps, 4) - 1 AS norm_imps_ratio_to_control,
    ROUND(n.norm_rev / c.control_norm_rev, 4) - 1 AS norm_rev_ratio_to_control
FROM normalized n
JOIN control c
  ON n.DEVICE = c.DEVICE
 AND n.BROWSER_CLEAN = c.BROWSER_CLEAN
ORDER BY n.DEVICE, n.BROWSER_CLEAN, norm_rev_ratio_to_control DESC;