	# Sample Size Guide

	Reference for calculating sample sizes and test duration.

	## Sample Size Fundamentals

	### Required Inputs

	1. Baseline conversion rate: Your current rate
	2. Minimum detectable effect (MDE): Smallest change worth detecting
	3. Statistical significance level: Usually α = 0.05 (95% confidence)
	4. Statistical power: Usually 80% (1 − β, so β = 0.20)

	### What These Mean

	Baseline conversion rate: If your page converts at 5%, that's your baseline.

	MDE (Minimum Detectable Effect): The smallest improvement you care about detecting. Set this based on:
	- Business impact (is a 5% lift meaningful?)
	- Implementation cost (worth the effort?)
	- Realistic expectations (what have past tests shown?)

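	Statistical significance (95% confidence, α = 0.05): Means that if there is truly no difference between variants, there's at most a 5% chance of seeing a difference this large and wrongly declaring a winner (a false positive).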

	Statistical power (80%): Means if there's a real effect of size MDE, you have 80% chance of detecting it.
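
The four inputs above can be combined into a sample size estimate. The sketch below uses the common normal-approximation formula for a two-sided, two-proportion z-test; the function name and defaults are illustrative assumptions, not something this guide prescribes. This approximation tends to give smaller numbers than the quick reference tables in this guide, which appear to rest on more conservative assumptions, so treat both as rough planning figures.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion z-test.

    baseline: current conversion rate, e.g. 0.05 for 5%
    relative_lift: MDE as a relative change, e.g. 0.20 for a 20% lift
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # 5% baseline, 20% relative lift (5% -> 6%)
    print(sample_size_per_variant(0.05, 0.20))
```

Halving the MDE roughly quadruples the required sample, which is why the choice of MDE dominates test planning.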

	---

	## Sample Size Quick Reference Tables

	### Conversion Rate: 1%

	| Lift to Detect | Sample per Variant | Total Sample |
	|----------------|-------------------|--------------|
	| 5% (1% → 1.05%) | 1,500,000 | 3,000,000 |
	| 10% (1% → 1.1%) | 380,000 | 760,000 |
	| 20% (1% → 1.2%) | 97,000 | 194,000 |
	| 50% (1% → 1.5%) | 16,000 | 32,000 |
	| 100% (1% → 2%) | 4,200 | 8,400 |

	### Conversion Rate: 3%

	| Lift to Detect | Sample per Variant | Total Sample |
	|----------------|-------------------|--------------|
	| 5% (3% → 3.15%) | 480,000 | 960,000 |
	| 10% (3% → 3.3%) | 120,000 | 240,000 |
	| 20% (3% → 3.6%) | 31,000 | 62,000 |
	| 50% (3% → 4.5%) | 5,200 | 10,400 |
	| 100% (3% → 6%) | 1,400 | 2,800 |

	### Conversion Rate: 5%

	| Lift to Detect | Sample per Variant | Total Sample |
	|----------------|-------------------|--------------|
	| 5% (5% → 5.25%) | 280,000 | 560,000 |
	| 10% (5% → 5.5%) | 72,000 | 144,000 |
	| 20% (5% → 6%) | 18,000 | 36,000 |
	| 50% (5% → 7.5%) | 3,100 | 6,200 |
	| 100% (5% → 10%) | 810 | 1,620 |

	### Conversion Rate: 10%

	| Lift to Detect | Sample per Variant | Total Sample |
	|----------------|-------------------|--------------|
	| 5% (10% → 10.5%) | 130,000 | 260,000 |
	| 10% (10% → 11%) | 34,000 | 68,000 |
	| 20% (10% → 12%) | 8,700 | 17,400 |
	| 50% (10% → 15%) | 1,500 | 3,000 |
	| 100% (10% → 20%) | 400 | 800 |

	### Conversion Rate: 20%

	| Lift to Detect | Sample per Variant | Total Sample |
	|----------------|-------------------|--------------|
	| 5% (20% → 21%) | 60,000 | 120,000 |
	| 10% (20% → 22%) | 16,000 | 32,000 |
	| 20% (20% → 24%) | 4,000 | 8,000 |
	| 50% (20% → 30%) | 700 | 1,400 |
	| 100% (20% → 40%) | 200 | 400 |

	---

	## Duration Calculator

	### Formula

	```
	Duration (days) = (Sample per variant × Number of variants) / (Daily traffic × % exposed)
	```

	### Examples

	Scenario 1: High-traffic page
	- Need: 10,000 per variant (2 variants = 20,000 total)
	- Daily traffic: 5,000 visitors
	- 100% exposed to test
	- Duration: 20,000 / 5,000 = 4 days

	Scenario 2: Medium-traffic page
	- Need: 30,000 per variant (60,000 total)
	- Daily traffic: 2,000 visitors
	- 100% exposed
	- Duration: 60,000 / 2,000 = 30 days

	Scenario 3: Low-traffic with partial exposure
	- Need: 15,000 per variant (30,000 total)
	- Daily traffic: 500 visitors
	- 50% exposed to test
	- Effective daily: 250
	- Duration: 30,000 / 250 = 120 days (too long!)
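
The duration formula and the three scenarios above can be reproduced with a small helper. This is a minimal sketch; the function and parameter names are illustrative, and `exposure` is assumed to be a fraction between 0 and 1 rather than a percentage.

```python
import math


def duration_days(sample_per_variant, num_variants, daily_traffic, exposure=1.0):
    """Days needed to collect the full sample, rounded up to whole days."""
    total_sample = sample_per_variant * num_variants
    effective_daily = daily_traffic * exposure  # visitors actually entering the test
    return math.ceil(total_sample / effective_daily)


if __name__ == "__main__":
    print(duration_days(10_000, 2, 5_000))       # Scenario 1: high-traffic page
    print(duration_days(30_000, 2, 2_000))       # Scenario 2: medium-traffic page
    print(duration_days(15_000, 2, 500, 0.5))    # Scenario 3: low traffic, 50% exposed
```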

	### Minimum Duration Rules

	Even with sufficient sample size, run tests for at least:
	- 1 full week: To capture day-of-week variation
	- 2 business cycles: If B2B (weekday vs. weekend patterns)
	- Through paydays: If e-commerce (beginning/end of month)

	### Maximum Duration Guidelines

	Avoid running tests longer than 4-8 weeks:
	- Novelty effects wear off
	- External factors intervene
	- Opportunity cost of other tests

	---

	## Online Calculators

	### Recommended Tools

	Evan Miller's Calculator
	https://www.evanmiller.org/ab-testing/sample-size.html
	- Simple interface
	- Bookmark-worthy

	Optimizely's Calculator
	https://www.optimizely.com/sample-size-calculator/
	- Business-friendly language
	- Duration estimates

	AB Test Guide Calculator
	https://www.abtestguide.com/calc/
	- Includes Bayesian option
	- Multiple test types

	VWO Duration Calculator
	https://vwo.com/tools/ab-test-duration-calculator/
	- Duration-focused
	- Good for planning

	---

	## Adjusting for Multiple Variants

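	With more than 2 variants (A/B/n tests), you need more total sample, both because traffic is split more ways and because the extra comparisons call for a stricter significance threshold: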

	| Variants | Multiplier |
	|----------|------------|
	| 2 (A/B) | 1x |
	| 3 (A/B/C) | ~1.5x |
	| 4 (A/B/C/D) | ~2x |
	| 5+ | Consider reducing variants |

	Why? More comparisons increase chance of false positives. You're comparing:
	- A vs B
	- A vs C
	- B vs C (sometimes)

	Apply a Bonferroni correction (divide α by the number of comparisons you plan to make) or use tools that handle this automatically.
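
As a rough illustration of what the Bonferroni correction costs, the sketch below computes the corrected α and how much the per-variant sample grows under the same normal approximation used for two-proportion tests. The function names are illustrative assumptions, and the inflation factor applies per variant, on top of the extra traffic needed simply because there are more variants.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison significance level after a Bonferroni correction."""
    return alpha / num_comparisons


def per_variant_inflation(alpha, num_comparisons, power=0.80):
    """Ratio of per-variant sample at the corrected alpha vs. the uncorrected alpha."""
    z_beta = NormalDist().inv_cdf(power)
    z_plain = NormalDist().inv_cdf(1 - alpha / 2)
    corrected = bonferroni_alpha(alpha, num_comparisons)
    z_corrected = NormalDist().inv_cdf(1 - corrected / 2)
    return ((z_corrected + z_beta) / (z_plain + z_beta)) ** 2


if __name__ == "__main__":
    # A/B/C test where B and C are each compared against control A: 2 comparisons
    print(bonferroni_alpha(0.05, 2))
    print(round(per_variant_inflation(0.05, 2), 2))
```

Bonferroni is conservative; dedicated tools may use less strict procedures, so their numbers can come out lower.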

	---

	## Common Sample Size Mistakes

	### 1. Underpowered tests
	Problem: Not enough sample to detect realistic effects
	Fix: Be realistic about MDE, get more traffic, or don't test

	### 2. Overpowered tests
	Problem: Waiting for the planned sample size when results already look significant
	Fix: This is actually fine. You committed to a sample size, so honor it; stopping early on a significant result inflates false positives

	### 3. Wrong baseline rate
	Problem: Using wrong conversion rate for calculation
	Fix: Use the specific metric and page, not site-wide averages

	### 4. Ignoring segments
	Problem: Calculating for full traffic, then analyzing segments
	Fix: If you plan segment analysis, calculate sample for smallest segment

	### 5. Testing too many things
	Problem: Dividing traffic too many ways
	Fix: Prioritize ruthlessly, run fewer concurrent tests

	---

	## When Sample Size Requirements Are Too High

	Options when you can't get enough traffic:

	1. Increase MDE: Accept only detecting larger effects (20%+ lift)
	2. Lower confidence: Use 90% instead of 95% (risky, document it)
	3. Reduce variants: Test only the most promising variant
	4. Combine traffic: Test across multiple similar pages
	5. Test upstream: Test earlier in funnel where traffic is higher
	6. Don't test: Make decision based on qualitative data instead
	7. Longer test: Accept longer duration (weeks/months)

	---

	## Sequential Testing

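	If you must check results before reaching the planned sample size, use a sequential testing method rather than repeatedly peeking at a fixed-sample test.

	### What is it?
	A statistical method that adjusts the significance threshold so you can look at the data multiple times without inflating the false positive rate.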

	### When to use
	- High-risk changes
	- Need to stop bad variants early
	- Time-sensitive decisions

	### Tools that support it
	- Optimizely (Stats Engine)
	- VWO (SmartStats)
	- PostHog (Bayesian approach)

	### Tradeoff
	- More flexibility to stop early
	- Slightly larger sample size requirement
	- More complex analysis
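
To see why unadjusted peeking is a problem in the first place, the simulation below runs A/A tests (no real difference) and compares the false positive rate when you stop at the first significant look against checking only once at the end. It is an illustrative sketch with assumed parameters (10% conversion rate, 10 looks, 200 visitors per arm per look), not an implementation of any vendor's sequential method.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two groups of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def false_positive_rates(rate=0.10, looks=10, per_look=200, sims=400, seed=1):
    """Simulate A/A tests; return (rate with peeking, rate with a single final look)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _sim in range(sims):
        conv_a = conv_b = n = 0
        crossed = significant = False
        for _look in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            n += per_look
            significant = abs(z_stat(conv_a, conv_b, n)) > 1.96
            crossed = crossed or significant  # would have stopped at this look
        peeking_hits += crossed
        final_hits += significant  # only the last look counts here
    return peeking_hits / sims, final_hits / sims


if __name__ == "__main__":
    peeking, final_only = false_positive_rates()
    print(f"peeking at every look: {peeking:.1%}, single final look: {final_only:.1%}")
```

Sequential methods exist to pull the peeking rate back to the nominal α, which is where the extra sample size in the tradeoff list comes from.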

	---

	## Quick Decision Framework

	### Can I run this test?

	```
	Daily traffic to page: _____
	Baseline conversion rate: _____
	MDE I care about: _____

	Sample needed per variant: _____ (from tables above)
	Days to run: (Sample per variant × Number of variants) / (Daily traffic × % exposed) = _____

	If days > 60: Consider alternatives
	If days > 30: Acceptable for high-impact tests
	If days < 14: Likely feasible
	If days < 7: Easy to run, consider running longer anyway
	```
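
The worksheet above can also be expressed as a small function. This is a sketch under assumptions: the names are illustrative, `exposure` is a fraction between 0 and 1, and the 14 to 30 day range, which the framework does not explicitly cover, is labeled as a judgment call rather than given a verdict.

```python
import math


def feasibility(sample_per_variant, daily_traffic, num_variants=2, exposure=1.0):
    """Return (days to run, verdict) using the thresholds from the worksheet."""
    days = math.ceil(sample_per_variant * num_variants / (daily_traffic * exposure))
    if days > 60:
        verdict = "consider alternatives"
    elif days > 30:
        verdict = "acceptable for high-impact tests"
    elif days >= 14:
        # Assumption: the worksheet gives no verdict for this range
        verdict = "not covered by the framework; judgment call"
    elif days >= 7:
        verdict = "likely feasible"
    else:
        verdict = "easy to run; consider running longer anyway"
    return days, verdict


if __name__ == "__main__":
    # 18,000 per variant (5% baseline, 20% lift in the tables), 3,000 visitors/day
    print(feasibility(18_000, 3_000))
```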
