Definition
A/B testing (also split testing) is a controlled experiment that randomly divides traffic between two or more versions of a page, ad, or asset and measures which produces a statistically significant lift on a defined goal.
A/B testing replaces opinions with data. Incoming visitors are randomly assigned to one of two versions of a page (control and variant). The version with the higher conversion rate on the defined goal wins, provided the lift reaches statistical significance. Without significance, the observed difference may be noise, and picking a winner is a coin flip dressed up as confidence.
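Random assignment is often implemented by hashing a stable visitor identifier, so a returning visitor keeps seeing the same version. The sketch below assumes a string user ID is available, for example from a first-party cookie. The function name `assign_variant` and the experiment key `"headline-test"` are hypothetical, and commercial testing tools may assign traffic differently.

```python
import hashlib

# Hypothetical helper name; not the API of any particular testing tool.
def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("control", "variant")) -> str:
    """Map a visitor to one variant, stably and (effectively) at random.

    Hashing experiment + user_id spreads visitors uniformly across buckets.
    The same visitor always lands in the same bucket for this experiment.
    Including the experiment name keeps assignments independent across tests.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

# Simulate 10,000 hypothetical visitors and count the split.
counts = {"control": 0, "variant": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "headline-test")] += 1
print(counts)  # expect close to a 50/50 split
```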
The rigour matters. Calling a test early, on an insufficient sample, causes some of the most expensive errors in marketing. Teams adopt "winners" that don't actually win, then layer further tests on a false foundation. A test plan should specify the minimum detectable effect (MDE), required sample size, and test duration before the test starts, not after.
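Here is a minimal sketch of the sample-size step. It assumes a two-sided test of two conversion rates and uses the common normal-approximation formula. The text names the significance level but not statistical power, so the 80% default here is an assumption. The 5% baseline and 20% relative MDE are illustrative figures, not client data, and `sample_size_per_arm` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical helper name; standard normal-approximation formula.
def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm of a two-sided, two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # rate if the MDE is real
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = sample_size_per_arm(baseline=0.05, relative_mde=0.20)
print(n)  # roughly 8,200 visitors per arm
```

Halving the MDE roughly quadruples the required sample, which is why the MDE has to be fixed before launch.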
Origin
Roots in agricultural field experiments by R.A. Fisher (1920s), who formalised statistical testing methodology. The technique migrated to direct-mail and then web marketing through the 1990s and 2000s, with Google's 2008 launch of Website Optimizer popularising it for the masses.
How it works
- State the hypothesis — what change, what effect, on what metric, for what audience.
- Compute the required sample size from the baseline conversion rate, MDE, significance level (usually 95%) and statistical power (usually 80%).
- Build the variant; QA it across browsers and devices.
- Run the test for the full planned duration; don't stop early because an interim look shows significance.
- Analyse for statistical significance; check segment-level results for surprises.
- Ship the winner; document the learning; design the next test.
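For the analysis step, a pooled two-proportion z-test is a common choice for conversion rates on large samples. The text does not prescribe a specific test, so treat this as a sketch. The helper name `two_proportion_z_test` is hypothetical, and so are the visitor and conversion counts.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical helper name; pooled two-proportion z-test.
def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both arms convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)        # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: control 400/8,200, variant 492/8,200.
z, p = two_proportion_z_test(400, 8200, 492, 8200)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 corresponds to significance at the 95% level.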
When to use it
Use when
- On any high-traffic, high-stakes surface where wording or design changes could meaningfully move conversion.
- When the team disagrees about a change. A test is cheaper than a debate.
- Periodically on existing winners — context drifts, audiences change.
Skip when
- On low-traffic pages where reaching significance takes months.
- For decisions where the cost of running the test exceeds the expected lift.
- When you can't act on the result. Test only what you'll ship.
Key metrics
- Statistical significance (commonly 95% or higher).
- Lift over control (relative improvement in conversion).
- Sample size achieved vs. required.
- Test win rate over time (across many tests, what % achieve significance).
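The metrics above can be computed from raw counts. This sketch makes three assumptions:
- It uses an unpooled normal-approximation confidence interval for the difference in conversion rates.
- It takes `required_per_arm` from the test plan.
- `lift_report` is a hypothetical helper name.

The counts and the history of past tests are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical helper name; unpooled normal-approximation interval.
def lift_report(conv_a: int, n_a: int, conv_b: int, n_b: int,
                required_per_arm: int, alpha: float = 0.05) -> dict:
    """Relative lift, CI on the absolute difference, and sample progress."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: the usual choice for a confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "relative_lift": diff / p_a,                          # lift over control
        "diff_ci": (diff - z * se, diff + z * se),            # 95% CI by default
        "sample_achieved": min(n_a, n_b) / required_per_arm,  # 1.0 = plan met
    }

report = lift_report(400, 8200, 492, 8200, required_per_arm=8200)
print(report)

# Win rate across a hypothetical history of tests (True = reached significance).
history = [True, False, False, True, False]
win_rate = sum(history) / len(history)
print(f"win rate: {win_rate:.0%}")  # 40%
```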
Examples
- The A/B test ran for 14 days and crowned the shorter form the winner with a 19% lift.
- Without A/B testing, you're just guessing with bigger spend.
- We tested 6 variants of the headline; the winner converted 31% above control.
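The six-variant example involves six comparisons against control, and each one carries its own chance of a false positive. This minimal sketch uses the Bonferroni correction. Bonferroni is one conservative option among several; Holm's method is another. The risk calculation assumes independence, which is a simplification because all six comparisons share the same control.

```python
from statistics import NormalDist

family_alpha = 0.05
comparisons = 6          # six variants, each compared with the control

# Bonferroni: split the overall error budget evenly across comparisons.
per_test_alpha = family_alpha / comparisons
z_threshold = NormalDist().inv_cdf(1 - per_test_alpha / 2)

# Risk of at least one false "winner" if no variant truly differs
# and each comparison uses the uncorrected 0.05 (independence assumed).
uncorrected_risk = 1 - (1 - family_alpha) ** comparisons

print(f"per-test alpha: {per_test_alpha:.4f}")             # 0.0083
print(f"z threshold: {z_threshold:.2f} (vs 1.96)")
print(f"uncorrected family risk: {uncorrected_risk:.1%}")  # 26.5%
```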
In practice at Makreate
Makreate marketing engagements bake in A/B testing (landing pages, ad creative, email subject lines) so every reported win is statistically significant, not anecdotal. A recent SaaS client had been running their landing page unchanged for two years. We tested headline-only variants over six weeks, and three of five tests reached significance. The best variant lifted form fills by 23%. The cumulative effect of the rolling tests was a 42% lift in qualified leads in 90 days.
Common mistakes
- Calling the test before it reaches the planned sample size, or the moment it first shows significance. Early calls are coin flips.
- Running a test without computing sample size first.
- Testing too many variants at once on too little traffic.
- Ignoring segment-level results. A flat overall result can hide segments where the variant won decisively.
- Not documenting learnings. Repeating tests is waste; building on prior wins is leverage.
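A small A/A simulation shows what the first mistake costs. Both arms share the same true conversion rate, so every declared winner is a false positive. The sketch compares two approaches:
- Checking significance after every 50 visitors per arm and stopping at the first significant look.
- Testing once at the planned sample size.

The 10% conversion rate, the sample size and the peek interval are arbitrary assumptions. The helper names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided test at 95%

# Hypothetical helper: pooled two-proportion z for equal arm sizes.
def z_stat(conv_a: int, conv_b: int, n: int) -> float:
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):           # no variance yet: nothing to test
        return 0.0
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b - conv_a) / n / se

# Hypothetical helper: n must be a multiple of peek_every so the final
# peek coincides with the single planned look.
def aa_false_positive_rates(sims=1000, n=1000, rate=0.10, peek_every=50, seed=7):
    """Return (peeking, fixed) false-positive rates for an A/A test."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        called_early = False
        for i in range(1, n + 1):
            conv_a += rng.random() < rate   # both arms: same true rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and abs(z_stat(conv_a, conv_b, i)) > Z_CRIT:
                called_early = True         # a peeker would stop and ship here
        peeking_hits += called_early
        fixed_hits += abs(z_stat(conv_a, conv_b, n)) > Z_CRIT  # one planned look
    return peeking_hits / sims, fixed_hits / sims

peeking, fixed = aa_false_positive_rates()
print(f"false positives with peeking: {peeking:.1%}, at planned n: {fixed:.1%}")
```

Expect the peeking rate to come out well above the nominal 5%, while the single planned test stays close to it.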
Frequently asked
How long should an A/B test run?
Long enough to reach the planned sample size and at least one full business cycle (usually 1–2 weeks). Shorter tests miss day-of-week effects.
What significance level should I use?
95% (alpha 0.05) is the default. Higher (99%) for high-stakes decisions; lower (90%) for cheap, fast iterations on low-stakes surfaces.
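The trade-off between these levels is mostly traffic. With power and MDE held fixed, the normal-approximation sample size scales with `(z_alpha + z_beta) ** 2`. The ratio below shows how much traffic each level needs relative to 95%. It assumes 80% power and ignores small changes in the variance terms, so read the numbers as approximate. `relative_sample_size` is a hypothetical helper name.

```python
from statistics import NormalDist

# Hypothetical helper name; approximation, not an exact power calculation.
def relative_sample_size(alpha: float, power: float = 0.80,
                         reference_alpha: float = 0.05) -> float:
    """Traffic needed at `alpha` relative to `reference_alpha` (same MDE, power).

    Sample size scales with (z_alpha + z_beta) ** 2, where z_alpha is the
    two-sided critical value and z_beta the power quantile.
    """
    inv = NormalDist().inv_cdf

    def factor(a: float) -> float:
        return (inv(1 - a / 2) + inv(power)) ** 2

    return factor(alpha) / factor(reference_alpha)

for confidence in (0.90, 0.95, 0.99):
    alpha = round(1 - confidence, 2)
    print(f"{confidence:.0%}: {relative_sample_size(alpha):.2f}x the traffic of 95%")
# roughly 0.79x, 1.00x and 1.49x
```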
Can I test multiple things at once?
Yes, with multivariate testing (MVT), but it requires much more traffic to reach significance for each combination. Most teams should master running one A/B test at a time before attempting MVT.
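The traffic cost grows multiplicatively with the number of factors. This sketch assumes a full-factorial test in which every combination is treated as its own arm. Each combination needs the same sample as one arm of a simple A/B test. The factors, the 8,200-per-cell figure and the daily traffic are hypothetical. Factorial analyses that pool across cells to estimate main effects need less traffic, so this is closer to a worst case.

```python
from itertools import product

# Hypothetical factors for a full-factorial landing-page test.
factors = {
    "headline": ["A", "B", "C"],
    "hero_image": ["photo", "illustration"],
    "cta_text": ["Start free trial", "Book a demo"],
}
cells = list(product(*factors.values()))   # every combination of levels

required_per_cell = 8200   # hypothetical, e.g. from a sample-size calculation
daily_visitors = 1500      # hypothetical traffic to the page

mvt_total = len(cells) * required_per_cell
ab_total = 2 * required_per_cell           # control + one variant
print(f"{len(cells)} cells need {mvt_total:,} visitors "
      f"(~{mvt_total / daily_visitors:.0f} days) vs {ab_total:,} "
      f"(~{ab_total / daily_visitors:.0f} days) for a simple A/B test")
```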