How Do I Determine the Right Sample Size for a Survey?

Ask any marketing researcher how large a survey sample should be, and you’re going to hear a range of numbers and responses that will probably include:

  • “267”
  • “385”
  • “601”
  • “1068”
  • “Around 2400”
  • “That depends – what’s your desired margin of error?”
  • If the person is more academically-trained: “That depends – what sort of power are you looking for in your results?”
  • Or, if the person is fairly new to the field: “That depends – what’s your desired margin of error, desired confidence level and anticipated variability?”

Non-researchers are often befuddled by this range. After all, where do these numbers come from, and why do some researchers respond with numbers while others respond instead with follow-up questions?

The reality is that estimating an appropriate sample size is contingent on a lot of different things about which many researchers make quick assumptions based on their experience. Some of those assumptions are practical – what a study will cost to conduct, the availability of potential respondents, the researcher’s best judgment on a sample’s ability to provide value for the cost and time, and so forth. Others are based more in a researcher’s plan for analysis – the more complex the plan, the greater the sample size required.


Before we dive into that, let’s begin with a look at the rule of thumb suggestions which result in those precise numbers researchers often quote. Those numbers describe a random sample required for a desired margin of error that includes three additional assumptions:

  • A large population (at least in the hundreds of thousands, if not millions)
  • A confidence level of 95% (conventional for most quantitative survey research)
  • A maximum level of variability in the findings

We’ll explain what all of this means in a moment, but let’s address a bigger concern first: are these assumptions reasonable? In all honesty, the answer is “most of the time, yes,” because these assumptions are typically applied when determining the sample size for a consumer sample drawn from a population that’s at least the size of a city. What’s important to understand is that these assumptions tend to overestimate the size of the sample required rather than underestimate it, and that’s why the determining factor tends to be the acceptable margin of error, which dictates an appropriate sample size under these assumptions in the chart below:

Recommended Sample Size by Various Margin of Error Levels

  +/- Margin of Error    Recommended Sample Size
  +/- 10%                97
  +/- 9%                 119
  +/- 8%                 151
  +/- 7%                 196
  +/- 6%                 267
  +/- 5%                 385
  +/- 4%                 601
  +/- 3%                 1068
  +/- 2%                 2401
  +/- 1%                 9604
(assuming 95% confidence level, maximum variability, large population)
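The numbers in that chart all come from the standard sample size formula for estimating a proportion, n = z² · p(1 − p) / e², where z is the critical value for the confidence level, p is the anticipated variability and e is the margin of error. Here’s a minimal sketch in Python using only the standard library (the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Required sample size for estimating a proportion in a large
    population, assuming maximum variability (p = 0.5) by default."""
    # Critical z-value for a two-sided interval, e.g. ~1.96 at 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

for e in (0.10, 0.05, 0.03, 0.01):
    print(f"+/- {e:.0%}: {sample_size(e)}")  # 97, 385, 1068, 9604
```

Run it with the margins of error from the chart and you’ll reproduce the recommended sample sizes above.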

But are these recommendations perfect? Nope, and I once had a statistics professor who went into considerable detail explaining what he saw as the folly of using pre-determined sample sizes like these to begin with, in large part because he felt the assumptions were flawed. One has to admire the passion that academics can generate when they really get going on a topic they know and love, and it’s always fun to listen to someone knowledgeable tell you why the conventional wisdom isn’t as wise as it first appears.

Even so, there’s a practical reason why these numbers are so widely used: generally speaking, they work. And while my statistics professor found the idea disagreeable, academics and practitioners alike use these numbers and assumptions as a starting point for estimating the sample sizes they require for their own research.

But let’s spend a little time unpacking what these terms and assumptions all mean, because there are reasons to adjust things in scenarios such as:

  • The population is small, but highly-qualified, which means a smaller sample may be all that’s required
  • The population is homogeneous, which means most people will answer similarly to their peers, also allowing for a smaller sample
  • The sample will be stratified, which means that there needs to be enough representation of the different strata to allow for meaningful comparisons, potentially necessitating a larger sample
  • The research is being used to minimize variation, which means the confidence level will be increased and the sample size will need to be larger
  • The research is being used to calculate power, which may require a different sample size than what’s recommended above.
  • The research is utilizing a non-probability-based sample, in which case these numbers are absolutely meaningless because the sample can’t be projected to a population

“Yikes!” you might be thinking. “That’s a lot of statistical jargon.” And you’d be right, because proper sampling requires a fairly strong understanding of probability and statistical analysis. This is one reason why it’s good to consult someone with strong statistical skills before conducting a study – a simple oversight on your part could seriously harm your data’s validity (how close it is to what you hoped to measure) or reliability (how repeatable your results are).


So, here’s a quick primer on much of the terminology you need to know to better understand how to draw an appropriate sample.

IMPORTANT NOTE: In this article we are discussing the selection of a random (also known as probability-based) sample. For a non-random (also known as non-probability-based or convenience) sample, the considerations are quite different but can generally be boiled down to one statement: “the bigger, the better.”

Population

This is the group of people you are hoping to be able to describe with your survey data. It’s also sometimes called a “universe.”

Populations are generally defined by some very basic characteristics like “general consumers” or “women aged 18-65,” which is a tremendously bad habit many marketers and marketing researchers have grown accustomed to through syndicated research. A good population definition should be precise.

One of the keys for drawing a probability-based sample from a population is that you need to have a master list of every single member of the population (also called a sample frame) in order to ensure that every person has an equal chance of being selected. Simply put, you can’t survey “general consumers” unless you have a list somewhere of every consumer, but you can survey a randomly-selected sample from a list that contains what you believe to be “general consumers.”

Where you have to be careful is in generalizing that the population defined by that list is the same as the population you’d like to describe. This is one reason why researchers tend to provide detailed information about where their sample actually came from. It’s also one of many reasons why two studies with the same “population” might have findings which are contradictory.

Margin of Error and Confidence Level

The margin of error is the range around a reported statistic within which we expect the true population value to fall at a stated level of confidence (often reported as a percentage called the confidence level). Anything that falls outside that margin of error is what we’d classify as an extreme result, and thus the greater the confidence we want that our findings are not extreme, the larger the sample we’ll need to maintain the same margin of error.

The confidence level can be set at any number you like. Three conventional numbers are 90% (1 in 10 survey applications will result in an extreme finding), 95% (1 in 20) and 99% (1 in 100). Most researchers use 95% for survey research because it’s conventional, but it’s by no means a rule or even necessarily good practice.

The margin of error is the range of deviation from the reported statistic, generally expressed as a percentage but sometimes as a whole number with a decimal attached. The margin of error exists because a statistic is not an exact number; it’s an approximation of one, and it’s fuzzy, like a point moving back and forth on a line:

A statistic isn’t a hard number like the amount of money in your bank account; it’s a range of probable numbers represented by a midpoint.

Practically speaking, we care about the margin of error when we’re trying to make comparisons between two data points. If we expect the differences to be pretty large, a wider margin of error is acceptable. If we expect the differences to be very small but meaningful (such as in a political race), we need a smaller margin of error.

It’s conventional to utilize a margin of error between 2-6% for a sample and to plan on subgroups within your sample having larger margins of error, which means it’s more difficult to make comparisons within slices of your sample. Researchers generally like to have enough sample available to overcome variability within the subgroups which require comparison, and that’s where our next term really comes into play.
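You can also run the calculation in the other direction and ask what margin of error a given sample size buys you. A quick sketch, assuming a large population, simple random sampling and maximum variability (the function name is ours):

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the confidence interval for a proportion,
    assuming a large population and simple random sampling."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

print(f"{margin_of_error(385):.1%}")  # ~5.0%
print(f"{margin_of_error(100):.1%}")  # ~9.8%
```

Notice how quickly the margin of error balloons as the sample shrinks; this is exactly why subgroups within a sample are harder to compare.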

Sampling Variability

Flip a coin and you’ll have an equally likely chance of receiving a result of “heads” or “tails.” Roll a six-sided die and you’ll have an equally likely chance of receiving an even number as you will an odd one. Deal from a shuffled deck of standard playing cards and you’ll have an equally likely chance of receiving a black card as you will a red one.

These are all situations that describe what we’d refer to as maximum variability, where the outcome of a random process is 50/50. It’s like asking respondents in a general population survey if they are male or female – as long as your sample is randomly-selected, you should have half of the sample report they’re male and the other half report they’re female. You cannot get more variable than this; the data will sort into two even-sized groups as your sample grows. We would describe this sample as heterogeneous, which means that if we were to look at one random element of this sample, it would be likely to significantly differ from other random elements we might pull out.

By contrast, let’s imagine that you decide to conduct a survey of adults who work as professional teachers. We would expect – and find – that a random sample would be composed of around 77% women and 23% men. In terms of gender distribution, this sample has a lower level of variability, and we would describe it as more homogeneous in this characteristic. One benefit of a lower level of variability is that we get a tighter margin of error than we would with a high level of variability – or, if you prefer, we can draw a smaller sample and have the same margin of error as we would with a larger, more heterogeneous sample.
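The teacher example can be checked with the same sample size formula used for the chart earlier, simply swapping in an anticipated proportion of 0.77 instead of the maximum-variability 0.5 (the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Required sample size for estimating a proportion
    in a large population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Maximum variability (a 50/50 split) vs. the more homogeneous
# 77/23 gender split among teachers, both at a +/- 5% margin:
print(sample_size(0.05, p=0.50))  # 385
print(sample_size(0.05, p=0.77))  # 273
```

Lower variability buys you a meaningfully smaller sample at the same margin of error.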

Before you get too excited about reducing sample size, however, keep in mind that unless a population is well-understood (which usually means a lot of research has been conducted on it already), researchers generally advise an assumption of maximum variability. In data analysis, there’s always safety in larger samples, and one benefit of having a greater level of response is that the research team can conduct more sophisticated forms of analysis.

Power

We’ll cover statistical power in much more detail in a future article, as it’s an area of statistics that’s woefully underused in marketing research practice despite being extremely useful. Where we’re concerned about power today is how it’s used to dictate sample size, and it’s particularly important to consider when we’re talking about small sample studies where the population is highly-qualified and we want to express our confidence we have in our conclusions.

If you’ve taken a statistics class, you are probably familiar with hypothesis testing, where a statistical test is utilized to accept or reject a null hypothesis in favor of an alternative hypothesis. These tests are centered around the probability of committing a Type I error (false positive). Power, by contrast, is a calculation of the probability of not committing a Type II error (false negative), and it’s often used to confirm that we reject the null hypothesis appropriately.

Conventionally, statisticians prefer for studies to have a statistical power of at least 80%, which means that statistical tests should have at least an 80% chance of detecting an effect that genuinely exists. (The higher the power, the lower the chance of a false negative.) But in order to achieve that level of power, we need to have an appropriate sample size.

Let’s put this into more practical language. Whereas researchers run a hypothesis test after collecting data, they typically conduct a power analysis before collecting data to ensure that the sample will meet the minimum size required for the desired level of statistical power and thus have a good chance of detecting real effects.

What’s nice about power is that the minimum sample size requirements tend to be easily surpassed by a sample with presumed maximum variability, a confidence level of 95% and a margin of error in the 2-6% range.

But when a study needs to compare subgroups within a population, it’s still wise to perform a power analysis to ensure that those subgroups will have sufficient sample sizes for comparison. A good power analysis may lead to the conclusion that a larger sample or a stratified random sample is required to ensure that each subgroup has sufficient representation.
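A power analysis works backward from the smallest effect you care about detecting. Here’s a rough sketch using the standard normal-approximation formula for comparing two proportions; the 50% vs. 60% scenario and the function name are illustrative assumptions of ours, not from the article:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a difference
    between two proportions (two-sided test, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Detecting a 50% vs. 60% split between two subgroups at 80% power:
print(n_per_group(0.50, 0.60))  # ~388 per group
```

Note that detecting a modest 10-point difference between two subgroups already demands nearly 400 respondents in each, which is why subgroup comparisons drive sample sizes up so quickly.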

Putting it All Together

Now that we have a rough understanding of many of the considerations that go into creating a sample size plan, let’s reconsider why researchers tend to frame sample size as primarily being governed by the desired margin of error (given the assumptions we already discussed).

First of all, it makes it easier to create a sample plan and focus on other aspects of the research like survey design. Rather than spending days or weeks trying to optimize the sample size, the researcher is instead free to begin collecting data and start analysis.

Second, conventional approaches tend to be much more acceptable as evidence than unconventional approaches, and researchers have to spend a lot less time defending their methodological choices if they utilize a sample size that is broadly accepted as sufficient. Having an insufficient sample size is often the kiss of death for most quantitative studies or anything using an experimental design, but having a random sample that is large enough to be described as “robust” often aids in establishing the credibility of the research.

Third, the margin of error tends to provide a good approximation of a finding’s capability to reflect truth. While the margin of error isn’t anywhere close to being a true measure of how reliable or accurate a study is, it does give readers a quick sense of how fuzzy a statistic or finding from a study may be. In a study with a margin of error of +/- 6%, a finding of “50%” could range anywhere from 44% to 56%, which means it has very limited utility for comparison to other data. But in a study where the margin of error is +/- 3%, that range narrows to 47%-53%, which makes comparisons much easier to make. The smaller the margin of error, the larger the sample, which often means the clearer the results.

Finally, margin of error also provides a good understanding of a data set’s eventual ability to allow a researcher to dig deeper. If a study only includes 200 respondents and there’s a desire to drill down on a subgroup of that sample that only makes up 10% of the population, the estimated margin of error for that subgroup jumps dramatically from close to 7% to well over 15%. It’s possible the actual margin of error may be lower, but if that group’s really important to us, we won’t want to take a chance on having too small of a sample and we’ll either boost our sample size or adjust our sampling plan accordingly.
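You can verify that jump with the same margin-of-error formula discussed earlier, reproduced here under the maximum-variability assumption (the function name is ours):

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the confidence interval for a proportion,
    assuming a large population and maximum variability."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

# 200 respondents overall vs. a subgroup that is 10% of the sample:
print(f"{margin_of_error(200):.1%}")  # ~6.9%
print(f"{margin_of_error(20):.1%}")   # ~21.9%
```

A 20-person subgroup carries a margin of error of well over 20 points, which makes almost any comparison against it meaningless.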

We may also find that the sample we’re seeking for this subgroup isn’t available, which will lead us to adjust our analysis plan to ensure we’re not overly relying on this group as a key component of our research!


We hope this article has been helpful to you, and we want you to know that we’re here to be a resource however we can be on anything you’d like to know about marketing research!

Please feel free to check out our other articles, watch our YouTube channel, connect with us on LinkedIn or Facebook, or contact us. We’d love to hear from you!

If you’d like to play around with a good sample size calculator, try this one. And if you’d like to play around with a Power calculator, here’s a good one. Enjoy!