Control Groups and Random Sampling
High ROI customer marketing depends heavily on a tool known as the control
group. A control group is a random sample of the customers targeted
for some kind of test program.
Let's say that you wanted to test a discount mailing to your best
customers. You would select all these customers for your list, and
then take a random sample of them to exclude from the mailing - usually
anywhere from 3% to 10% of the total.
This group is known as the control group; the others who will receive the
mailing are the test group.
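The holdout described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_test_control`, the 5% default, and the customer IDs are assumptions made for the example, not part of any particular database tool.

```python
import random

def split_test_control(customer_ids, control_fraction=0.05, seed=None):
    """Randomly hold out a control group; everyone else is the test group.

    control_fraction is the share of the list excluded from the mailing
    (the text suggests somewhere between 3% and 10%).
    """
    rng = random.Random(seed)          # a seed makes the split repeatable
    ids = list(customer_ids)
    n_control = round(len(ids) * control_fraction)
    control = set(rng.sample(ids, n_control))   # sampling without replacement
    test = [cid for cid in ids if cid not in control]
    return test, sorted(control)

# Hypothetical list of 2,000 best customers, identified by ID.
best_customers = list(range(1, 2001))
test_group, control_group = split_test_control(best_customers, 0.05, seed=42)
print(len(test_group), len(control_group))  # prints: 1900 100
```

Keeping the seed (or simply storing the list of control IDs) matters in practice, because you need to know later exactly who was held out.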
Why do this? Because the control group is drawn at random from the same
list, the control and test groups are alike in every respect except the
mailing. You can compare the buying behavior of the test group
versus the control group over time to determine what the effect of your
mailing is. Taking this approach screens out a lot of external noise
(like other promotions both groups may be exposed to) and gives you a
true read on your profitability.
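As a rough sketch of that comparison, the arithmetic below measures the mailing by the difference in average margin per customer between the two groups. The function name, the inputs, and all of the dollar figures are hypothetical, and the sketch ignores the cost of the discount itself.

```python
def incremental_result(test_margin, control_margin, n_test, n_control,
                       cost_per_piece):
    """Compare test vs. control over the same tracking period.

    test_margin / control_margin: total gross margin from each group.
    Returns (lift per customer, incremental margin, net after mailing cost).
    """
    test_avg = test_margin / n_test
    control_avg = control_margin / n_control
    lift_per_customer = test_avg - control_avg      # effect of the mailing
    incremental_margin = lift_per_customer * n_test
    mailing_cost = cost_per_piece * n_test
    return lift_per_customer, incremental_margin, incremental_margin - mailing_cost

# Hypothetical numbers: 1,900 mailed, 100 held out, $0.75 per piece.
lift, incremental, net = incremental_result(
    test_margin=57_000, control_margin=2_500,
    n_test=1_900, n_control=100, cost_per_piece=0.75)
print(lift, incremental, net)  # prints: 5.0 9500.0 8075.0
```

Because the control group bought without receiving anything, its average is the baseline; only spending above that baseline is credited to the mailing.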
Using control groups also allows for inclusion of halo effects, which are
typically high ROI and are rarely measured by most people doing
promotions. Halo effects occur when people
respond to a promotion outside of the business tracking process and so are
"not counted" as having responded. For example, you send a
discount and the customer loses it but makes a purchase anyway because you
"reminded" them of a need they had. Typically, all
anybody measures is response, which does not give a true read on
profitability.
You cannot measure halo effects without a control group. If you
aren't using controls, you are short-changing yourself, because the
promotion could be many times more profitable if you include the
halo effects.
Most controlled testing
in database marketing requires the creation of a random sample of your
customer base, either for the test group (targets receiving the mailing)
or for the control group (those not receiving the mailing).
When you are testing new concepts, you usually don't want to
blow a whole bunch of money, so a random sample of the target group is
created for the mailing (test group), and the rest of the target group
acts as control. When you are going with proven high ROI concepts,
you want to mail as many pieces as possible (test group), so the random
sample is created to act as control.
For the first case, when testing new concepts, the larger the
random sample is on a percentage basis, the more accurate its predictive power will
be. You want the results of a test to be repeatable - if it works,
you want to do it again. The larger the sample is, the more likely
the results of the test can be repeated on the next mailing.
Three percent will give you a pretty good shot. Larger samples will cost more to mail but will add extra stability to the predictive
power of the sample. Smaller samples could result in unstable predictive
power: for example, the promotion makes money the first time but
loses money when repeated.
If you can afford it, go to 10%; 5% is good, but 3% is OK. In general, the
smaller your database, the higher the percentage you should take for a test,
to even out the instability that comes from testing small
databases (under 5,000 customers). If you have only 1,000 customers, consider a 20% test, or if you
can afford it, mail the test to every customer not in the control group.
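One way to see why small databases need a bigger percentage is to look at how much an observed response rate can wobble for a given number of customers mailed. The sketch below uses the standard normal approximation for a proportion, which is an assumption brought in for illustration (it is rough for very small samples); the 5% response rate and the database sizes are hypothetical.

```python
import math

def response_rate_margin(sample_size, expected_rate, z=1.96):
    """Approximate 95% margin of error for an observed response rate.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    standard_error = math.sqrt(expected_rate * (1 - expected_rate) / sample_size)
    return z * standard_error

# Hypothetical cases: (database size, test percentage), 5% expected response.
for database_size, pct in [(1_000, 0.03), (1_000, 0.20),
                           (50_000, 0.03), (50_000, 0.10)]:
    n = round(database_size * pct)
    margin = response_rate_margin(n, expected_rate=0.05)
    print(f"{database_size:>6} customers, {pct:.0%} test = {n:>5} mailed, "
          f"response rate 5% +/- {margin:.1%}")
```

Under these assumptions, the stability comes from the number of customers in the sample, so a 3% sample of a large file can be steadier than a 20% sample of a small one.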
In the second case, tracking proven high ROI concepts, the
larger the control group sample is, the more reliable and repeatable the
results of the promotion will be. Early on in the life of a
promotion, it is a good idea to use a "fat" control group, just
to make sure the ROI is tracking. Over time, you can reduce the size
of the control group when you are confident the results are stable.
These tests are extremely important events, as the information
gained is used extensively down the line. Don’t skimp on a test if you can help it. Also make sure the sample is truly random and doesn’t introduce
any bias. Bias means the sample is not truly random because the
selection methods used have distorted the selection process.
Here's an example of introducing bias during random sample selection:
Let’s say you have
1,000 customers, and they were consecutively assigned customer IDs,
meaning your oldest customers have the lowest ID numbers. You want a 10% sample, or 100 customers.
Your customers happen to be sorted by customer ID, and you start choosing
customers with customer ID 1 and select every 5th customer. You would have the 100 customers you need by customer ID 500.
But your sample would be biased, because the customer group you
have selected has a higher percentage of old customers than the entire
customer base. The customer base was
sorted by ID, meaning your oldest customers have the lowest ID and newest
customers the highest ID. You stopped choosing at 500 instead of choosing through the entire customer
base; this creates the bias towards older customers.
If you had selected
every 10th customer instead,
you would have run all the way through to your most recent customers and had an even
sample, with no particular customer group over-represented.
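The example above is easy to check with a short sketch. The 1,000 consecutive IDs match the example in the text; the helper `share_from_older_half` is a hypothetical name used only to make the bias visible.

```python
# 1,000 customers sorted by ID, oldest (ID 1) to newest (ID 1000).
customer_ids = list(range(1, 1001))

# Biased: take every 5th customer and stop once you have 100.
biased_sample = customer_ids[::5][:100]

# Even: take every 10th customer across the whole file.
even_sample = customer_ids[::10]

def share_from_older_half(sample):
    """Fraction of the sample drawn from the oldest 500 customers."""
    return sum(1 for cid in sample if cid <= 500) / len(sample)

print(len(biased_sample), max(biased_sample))   # prints: 100 496
print(share_from_older_half(biased_sample))     # prints: 1.0
print(len(even_sample), max(even_sample))       # prints: 100 991
print(share_from_older_half(even_sample))       # prints: 0.5
```

Both samples contain 100 customers, but the first never reaches the newer half of the file, while the second draws equally from both halves.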
Bias can occur geographically, by product type, and so on. Be
careful with the way a database is sorted if you are using a "choose
every Nth customer" random selection technique.
A convenient way to
generate a random sample, if you use consecutively numbered customer
IDs, is to pick a digit location from the customer ID, and specify a
value for it. Then choose
every customer with this value at the specified location in the ID.
You’ll get roughly a 10% sample. For
example, “give me everybody whose customer number ends in 2” or
“give me everybody having a 4 in the second to last digit location”.
For this to work, you have to have at least one customer with a digit in the
next highest (to the left) digit location.
For example, if you have 5,349 total customers, you could use any
of the last 3 digit locations (right of the comma in 5,349) but not the
lead (left-most) digit location. Using
the left-most digit would introduce bias, since the lead digit only runs
up to 5 and stops partway through the 5s, so no value of it gives an even 10% sample.
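A minimal sketch of the digit technique, using the 5,349-customer example from the text. The function name `select_by_digit` and its parameters are hypothetical; IDs are assumed to be consecutive integers starting at 1.

```python
def select_by_digit(customer_ids, position_from_right, digit):
    """Pick every customer whose ID has `digit` at the given location.

    position_from_right: 0 is the last digit, 1 the second to last, etc.
    """
    place = 10 ** position_from_right
    return [cid for cid in customer_ids if (cid // place) % 10 == digit]

customer_ids = range(1, 5350)   # 5,349 consecutively numbered customers
total = len(customer_ids)

ends_in_2 = select_by_digit(customer_ids, 0, 2)      # "ends in 2"
tens_is_4 = select_by_digit(customer_ids, 1, 4)      # "4 in second to last"
lead_is_2 = select_by_digit(customer_ids, 3, 2)      # left-most digit = 2
lead_is_5 = select_by_digit(customer_ids, 3, 5)      # left-most digit = 5

for label, sample in [("ends in 2", ends_in_2), ("tens digit 4", tens_is_4),
                      ("lead digit 2", lead_is_2), ("lead digit 5", lead_is_5)]:
    print(f"{label}: {len(sample)} customers, {len(sample) / total:.1%}")
```

In this example the last-digit and second-to-last-digit selections both land close to 10% and are spread across the whole ID range, while the lead-digit selections are far off 10% and each covers only one contiguous block of IDs (one age band of customers).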