sgo / docs /research /ctr_calibration.md
Claude
Explore CTR calibration: map SGO scores to click-through rate predictions
109fc16 unverified

CTR Calibration with SGO

The Question

Can SGO be calibrated to predict CTR (click-through rate)?

Short answer: yes, with a thin calibration layer on top of what SGO already provides.

What SGO Already Gives You

Each SGO evaluator produces:

{
  "score": 6,
  "action": "positive",        // ← click/no-click proxy
  "attractions": [...],
  "concerns": [...],
  "dealbreakers": [...]
}

The action field is essentially a discrete intent signal:

  • positive → would engage (click, sign up, buy)
  • neutral → might engage with the right nudge
  • negative → would not engage

The score (1-10) provides a finer-grained propensity signal.

SGO also already has:

  • Bias calibration (CoBRA-inspired) to match human cognitive biases
  • Counterfactual gradient (Jacobian) to estimate which changes move scores
  • Goal-weighted aggregation (VJP) to focus on goal-relevant evaluators

The Gap: Scores → Calibrated Probability

SGO produces ordinal scores and discrete actions, not calibrated probabilities. To get "this ad will get 2.3% CTR", you need a calibration function that maps SGO's output distribution to observed rates.

Approach: Anchor + Scale

Step 1: Collect anchors

Run SGO on a small set of creatives (ads, landing pages, emails) where you already know the real CTR from production data:

Creative Real CTR SGO positive% SGO mean score
Ad A 1.2% 24% 4.1
Ad B 3.8% 52% 6.3
Ad C 0.6% 12% 3.2
Ad D 2.1% 38% 5.5

Step 2: Fit calibration function

Use Platt scaling (logistic regression) or isotonic regression to learn:

P(click) = σ(a · score_sgo + b)

where σ is the sigmoid function, fit on the anchor data.

With as few as 5-10 anchors, you get a usable mapping. This works because:

  • The ranking from SGO is already meaningful (higher score → higher CTR)
  • You only need to learn the scale and offset, not the ranking

Step 3: Predict new creatives

Run SGO on a new creative → get score distribution → apply calibration function → get predicted CTR with confidence interval.

Step 4: Use the gradient

The real power isn't just predicting CTR — it's knowing what to change:

SGO gradient for Ad E (predicted CTR: 1.4%):
  +1.8  Simplify headline to one benefit     → predicted CTR: 2.9%
  +1.2  Add social proof (customer count)    → predicted CTR: 2.3%
  -0.4  Remove pricing from ad               → predicted CTR: 1.1%

The counterfactual deltas can be converted to CTR deltas using the same calibration function's local derivative.

When This Works Well

  • Relative ranking is the main value. Even without calibration, SGO reliably ranks creatives by appeal. If you just need "which of these 5 ads will perform best?", raw SGO scores suffice.

  • Calibration shines for absolute prediction. When you need "will this hit our 2% CTR target?", the anchor-based calibration gives you a number.

  • The gradient is unique to SGO. No CTR model tells you why the CTR is what it is and what specific change would improve it most. This is SGO's core value even when paired with an existing CTR model.

When to Be Careful

  • Population mismatch. SGO evaluators are census-grounded but synthetic. If your real audience is highly specialized (e.g., only DevOps engineers at Fortune 500), use targeted cohort generation and more anchors.

  • Context effects. Real CTR depends on placement, competition, time of day, etc. SGO evaluates the creative in isolation. Calibration anchors should come from similar contexts.

  • Small calibration sets. With <5 anchors, the calibration function is fragile. Use confidence intervals and treat predictions as directional.

Integration with Existing CTR Models

If you already have a recommender system or CTR prediction model:

Existing model:  "This ad will get 1.8% CTR"
SGO:             "Here's WHY, and changing the headline would get +0.7%"

The two are complementary:

  • CTR model → accurate predictions from behavioral data, fast inference
  • SGO → causal explanations and counterfactual optimization, no traffic needed

You can also use SGO as a pre-filter: generate 20 ad variants, rank them with SGO, then only A/B test the top 3. This reduces the exploration cost of your CTR model's feedback loop.

Script

See scripts/ctr_calibrate.py for a reference implementation that:

  1. Takes SGO results from multiple runs with known CTRs
  2. Fits a Platt scaling calibration function
  3. Predicts CTR for new SGO runs
  4. Converts counterfactual deltas to CTR deltas