These answers come from the year-long archive of the chatbot that lived on my previous site, iamnicola.ai. I’ve curated the most useful sessions—real questions from operators exploring AI workflows, experimentation, and conversion work—and lightly edited them so you get the original signal without the noise.
What is statistical significance in A/B testing?
Complete Guide
A/B Testing and Experimentation: Building a Data-Driven Culture → Build a systematic experimentation program that drives real results. Learn how to prioritize tests, avoid common mistakes, and scale your testing efforts.
Direct Answer
Statistical significance tells you whether the difference between your A/B test variants reflects a real effect or just random chance. It's typically expressed as a p-value (probability value): p < 0.05 (5%) means there's less than a 5% probability of seeing a difference at least this large if no true difference existed. Most experimentation platforms treat p < 0.05 as "statistically significant," meaning you can conclude with reasonable confidence that one variant truly outperforms the other.
Why It Matters
Without statistical significance, you can't tell if a 10% conversion lift is real or just noise. Running tests without proper significance checks leads to false positives—thinking a change worked when it didn't—which wastes resources and can hurt your business. Statistical significance gives you confidence that your results are reliable and actionable.
How It Works
Statistical significance testing compares your observed results against a "null hypothesis" (the assumption that there's no real difference between variants). The process:
- Collect data: Run your test until you have enough sample size (visitors/conversions) in each variant
- Calculate p-value: Statistical tests (like chi-square or t-test) compute the probability that the observed difference occurred by chance
- Interpret results: If p < 0.05, reject the null hypothesis and conclude the difference is real
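The steps above can be sketched as a two-proportion z-test using only Python's standard library. The counts below are illustrative, the helper name is my own, and real platforms typically use more sophisticated methods (such as sequential tests):

```python
# Sketch: a two-proportion z-test with only the standard library.
# The conversion counts are made-up illustration values, not real data.
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - phi)

p = two_proportion_z_test(500, 10_000, 570, 10_000)
print(f"p-value: {p:.3f}")  # below 0.05, so reject the null hypothesis
```

A chi-square test on the same 2×2 table of converted/not-converted counts gives an equivalent result; the z-test form just makes the "difference divided by its standard error" logic explicit.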
Common Thresholds
- p < 0.05 (5%): The standard threshold for "statistically significant"; corresponds to a 95% confidence level
- p < 0.01 (1%): A stricter threshold used for high-stakes decisions; corresponds to a 99% confidence level
- p > 0.05: Not significant; the observed difference is consistent with random chance
Sample Size Requirements
Statistical significance depends on having enough data. Small sample sizes make it harder to detect real differences. As a rule of thumb:
- Low-traffic sites: Need 1,000+ conversions per variant to detect 10%+ lifts
- High-traffic sites: Can detect smaller lifts (5-7%) with 5,000+ conversions per variant
- Effect size matters: Larger expected lifts require smaller sample sizes
Tools like Optimizely's sample size calculator help you determine how long to run tests before checking significance.
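A back-of-the-envelope version of what such calculators do can be sketched with the standard formula for comparing two proportions. This sketch assumes a 5% significance level (two-sided) and 80% power, with the corresponding z-scores hard-coded; the function name and example rates are my own:

```python
# Sketch: minimum visitors per variant for a two-proportion test,
# at alpha = 0.05 (two-sided) and 80% power. Constants are approximate.
from math import ceil

Z_ALPHA = 1.96    # z-score for alpha = 0.05, two-sided
Z_BETA = 0.8416   # z-score for 80% power

def sample_size_per_variant(baseline_rate: float, relative_lift: float) -> int:
    """Visitors needed in each variant to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (Z_ALPHA + Z_BETA) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# e.g. a 5% baseline conversion rate, looking for a 10% relative lift
print(sample_size_per_variant(0.05, 0.10))
```

For a 5% baseline and a 10% relative lift this lands a bit above 31,000 visitors per variant, which illustrates why low-traffic sites struggle to detect small lifts: halving the detectable lift roughly quadruples the required sample.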
Common Mistakes
- Peeking early: Checking results before reaching minimum sample size inflates false positive rates
- Multiple comparisons: Testing many variants simultaneously without adjusting for multiple comparisons increases false discovery
- Stopping too early: Ending a test the moment p < 0.05 ignores that significance can vanish as more data arrives
- Ignoring practical significance: A statistically significant 0.1% lift might not be worth implementing
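The peeking problem can be made concrete with a small seeded simulation. This is a sketch under my own assumptions: both variants share the same true 5% conversion rate (an A/A test, so every "significant" result is a false positive), with interim checks every 1,000 visitors per variant:

```python
# Sketch: seeded Monte Carlo showing how peeking inflates false positives.
# Both arms have the same true rate, so any "winner" is a false positive.
import random

random.seed(42)

TRUE_RATE = 0.05   # identical conversion rate in both variants
LOOKS = 5          # interim checks, every STEP visitors per variant
STEP = 1_000
Z_CRIT = 1.96      # |z| threshold corresponding to p < 0.05, two-sided
TRIALS = 500

def z_stat(conv_a, n_a, conv_b, n_b):
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    return (conv_b / n_b - conv_a / n_a) / se if se > 0 else 0.0

peek_hits = fixed_hits = 0
for _ in range(TRIALS):
    conv_a = conv_b = n = 0
    peeked = False
    for _ in range(LOOKS):
        conv_a += sum(random.random() < TRUE_RATE for _ in range(STEP))
        conv_b += sum(random.random() < TRUE_RATE for _ in range(STEP))
        n += STEP
        if abs(z_stat(conv_a, n, conv_b, n)) > Z_CRIT:
            peeked = True          # would have stopped and declared a winner
    if peeked:
        peek_hits += 1
    if abs(z_stat(conv_a, n, conv_b, n)) > Z_CRIT:
        fixed_hits += 1            # checking only once, at the planned end

print(f"false positives with peeking: {peek_hits / TRIALS:.1%}")
print(f"false positives, single final check: {fixed_hits / TRIALS:.1%}")
```

Checking only at the planned end keeps the false positive rate near the nominal 5%, while stopping at the first significant interim look roughly doubles or triples it; this is exactly the problem sequential testing methods are designed to correct.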
Best Practices
- Pre-calculate sample size: Use a calculator to determine how long to run tests before starting
- Set significance threshold upfront: Decide on p < 0.05 or p < 0.01 before launching
- Wait for minimum sample size: Don't check results until you've reached the calculated minimum
- Consider practical significance: A 2% lift might be statistically significant but not worth the implementation cost
- Use sequential testing: Platforms like Optimizely use sequential testing methods that allow safe early stopping
Example: Conversion Rate Test
You test a new checkout button color. After 10,000 visitors per variant:
- Control: 500 conversions (5.0% conversion rate)
- Variant: 570 conversions (5.7% conversion rate)
- P-value: ≈ 0.03 (two-proportion z-test, statistically significant)
Since p < 0.05, you can conclude with reasonable confidence that the new button color increases conversions. The 0.7 percentage point lift is both statistically and practically significant for an e-commerce site.
Takeaway & Related Answers
Statistical significance is essential for reliable A/B testing. Always wait for adequate sample size, use proper significance thresholds (p < 0.05), and consider both statistical and practical significance when making decisions.
Related Articles & Guides
- A/B Testing Statistical Significance: Complete Guide 2025 →
- UX/UI Design for Conversion: Complete Guide 2025 →
- AI Consultant Cost UK 2025: Complete Pricing Guide →
- A/B Testing and Experimentation: Building a Data-Driven Culture →
- AI Consultant Cost 2025: Complete US Pricing Guide →
Want to go deeper?
If this answer sparked ideas or you'd like to discuss how it applies to your team, let's connect for a quick strategy call.
Book a Strategy Call