Marketers, How Valid Is Your Test? Hint: It's Not About the Sample Size
A client I worked with years ago kept fastidious records of test results that involved offers, lists, and creative executions. At the outset of our relationship, the client shared the results of these direct mail campaigns and the corresponding decisions that were made based on those results. The usual response rates were in the 0.3 to 0.5% range, and the test sample sizes were always 25,000. If a particular test cell got 130 responses (0.52%), it was deemed to have beaten a test cell that received 110 responses (0.44%). Make sense? Intuitively, yes. Statistically, no.
In fact, those two cells are statistically equal. With a sample size of 25,000 and a 0.5% response rate, your results can vary by as much as 14.7% at a 90% confidence level. That means there was a 90% chance that the results from that test would land anywhere between 0.43% and 0.57%, which makes our test cell results of 110 responses (0.44%) and 130 responses (0.52%) statistically equal. I had to gently encourage the client to consider retesting at larger sample sizes.
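For readers who want to check the arithmetic, here is a minimal Python sketch using the standard normal approximation for a proportion. The function names (relative_margin, two_cell_z) are made up for illustration. The pooled two-proportion z-test is an extra check that was not part of the client's records; it is one common way to compare two cells, not the only one.

```python
from math import sqrt
from statistics import NormalDist


def relative_margin(sample_size, response_rate, confidence=0.90):
    """Margin of error as a fraction of the response rate (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    std_error = sqrt(response_rate * (1 - response_rate) / sample_size)
    return z * std_error / response_rate


def two_cell_z(responses_a, responses_b, cell_size):
    """Pooled two-proportion z statistic for two test cells of equal size."""
    rate_a = responses_a / cell_size
    rate_b = responses_b / cell_size
    pooled = (responses_a + responses_b) / (2 * cell_size)
    std_error = sqrt(pooled * (1 - pooled) * 2 / cell_size)
    return (rate_a - rate_b) / std_error


# The client's setup: 25,000 pieces per cell, roughly a 0.5% response rate.
margin = relative_margin(25_000, 0.005)
low, high = 0.005 * (1 - margin), 0.005 * (1 + margin)
print(f"margin: {margin:.1%}, range: {low:.2%} to {high:.2%}")

# Compare the 130-response cell against the 110-response cell.
z = two_cell_z(130, 110, 25_000)
print(f"z = {z:.2f} versus a two-sided 90% cutoff of about 1.645")
```

Run as written, this should report a margin of about 14.7% and a z value of about 1.29. That is below the 1.645 cutoff, which is consistent with calling the two cells statistically equal.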
There are statistical formulas for calculating sample size, but a good rule of thumb is that with 250 responses, you can be 90% confident that your results will vary by no more than ±10%. This rule of thumb is valid in any medium, online or offline. For example, if you test 25,000 emails and you get a 1% response rate, that’s 250 responses. Similarly, if you buy 250,000 impressions for an online ad and you get a 0.1% response rate, you get 250 responses. That means you can be 90% confident that (all things held equal) you will get between 0.9% and 1.1% in the email rollout, and between 0.09% and 0.11% with a continuation of the same ad in the same media. (Older editions of Ed Nash’s "Direct Marketing: Strategy, Planning, Execution" contain charts that you can reference at different sample sizes and response rates.)
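The rule of thumb holds across media because, under the normal approximation, the relative margin depends almost entirely on the number of responses rather than on the number of pieces sent. A short sketch of that idea follows; the helper name margin_from_responses and the results dictionary are illustrative assumptions, not anything taken from Nash's charts.

```python
from math import sqrt
from statistics import NormalDist

Z_90 = NormalDist().inv_cdf(0.95)  # two-sided 90% confidence, about 1.645


def margin_from_responses(responses, sample_size):
    """Relative margin of error from a response count (normal approximation)."""
    rate = responses / sample_size
    # z * sqrt(p(1-p)/n) / p simplifies to z * sqrt((1-p) / responses)
    return Z_90 * sqrt((1 - rate) / responses)


tests = [("email", 25_000, 0.01), ("online ad", 250_000, 0.001)]
results = {}
for label, sample_size, rate in tests:
    responses = round(sample_size * rate)
    margin = margin_from_responses(responses, sample_size)
    results[label] = (responses, margin)
    low, high = rate * (1 - margin), rate * (1 + margin)
    print(f"{label}: {responses} responses, +/-{margin:.1%}, "
          f"{low:.3%} to {high:.3%}")
```

Both tests come out at roughly ±10% (closer to 10.4% before rounding), even though one sample is ten times the size of the other.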
A smaller number of responses will result in a reduced confidence level or increased variance. For example, with a test size of 10,000 emails and a 1% response rate (100 responses), your variance at a 90% confidence level would be about 16%, rather than 10%. That means you can be 90% confident that you’ll get a response rate between 0.84% and 1.16%, with all things being held equal. Any response rate within that range could have been the result of variation within the sample.
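One way to sanity-check that 0.84% to 1.16% range is to skip the normal approximation and add up exact binomial probabilities. The sketch below assumes the true response rate really is 1% and that each of the 10,000 recipients responds independently; the function name binomial_pmf is just a label chosen for this example.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(k, n, p):
    """Probability of exactly k responses from n pieces at true rate p."""
    # Work in logs so the large factorials do not overflow.
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log1p(-p)
    )
    return exp(log_pmf)


emails = 10_000
true_rate = 0.01
low_count, high_count = 84, 116  # 0.84% and 1.16% of 10,000 emails

coverage = sum(
    binomial_pmf(k, emails, true_rate)
    for k in range(low_count, high_count + 1)
)
print(f"Chance of {low_count} to {high_count} responses: {coverage:.1%}")
```

The result should sit close to 90%, which means roughly one test in ten would fall outside that range purely by chance.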
Marketers are not alone in using their gut rather than statistics to determine sample sizes. Nobel Laureate Daniel Kahneman confesses in his book "Thinking, Fast and Slow":
“Like many research psychologists, I had routinely chosen samples that were too small and had often obtained results that made no sense … the odd results were actually artifacts of my research method. My mistake was particularly embarrassing because I taught statistics and knew how to compute the sample size that would reduce the risk to an acceptable level. But I had never chosen a sample size by computation. Like my colleagues, I had trusted tradition and my intuition in planning my experiments and had never thought seriously about the issue.”
The most important takeaway here is that the validity of a test is not tied to the size of the sample itself, but rather to the number of responses that you get from that sample. Choose sample sizes based on your expected response rate, not on tradition, your gut, or convenience.
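To turn that advice into a number, you can work backward from the margin you are willing to accept. The following is a rough sketch under the same normal approximation, with a hypothetical helper called sample_size_for; the ±10% target and 90% confidence level are the defaults used in this article, and the three response rates are only examples.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for(expected_rate, target_margin=0.10, confidence=0.90):
    """Pieces to send so the relative margin is about target_margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    # Responses needed: solve z * sqrt((1 - p) / responses) = target_margin
    responses_needed = (z / target_margin) ** 2 * (1 - expected_rate)
    return ceil(responses_needed / expected_rate)


sizes = {rate: sample_size_for(rate) for rate in (0.003, 0.005, 0.01)}
for rate, size in sizes.items():
    print(f"expected rate {rate:.1%}: send about {size:,} pieces")
```

The formula asks for about 270 responses, a little more than the 250-response rule of thumb. At an expected 0.5% response rate, that works out to a test cell of roughly 54,000, more than double the 25,000 my client had been using.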