Does a Survey Really Need to Collect Demographic Data?

It’s rare that a day goes by without at least one survey landing in my inbox, and I’m still amazed that after more than a decade and a half of online surveys being fairly standard practice, a number of organizations are still sending me surveys that:

  • Are way too long
  • Ask poorly worded questions that are difficult to answer truthfully
  • Aren’t well-matched to my experience with the company
  • Don’t seem to reflect any knowledge of previous feedback I’ve given
  • Include a bunch of questions about my demographic information

“Ah, I was with you until that last one,” you might be saying. “Everyone knows that asking for consumer demographic details is the conventional way to end a survey. And besides, that information’s useful for all sorts of things.”

And to be honest, a few years ago, I would have agreed with you. But as we move into the third decade of the 21st century and we continue to hear that surveys need to be short, focused and personalized to be successful, I can’t help but wonder – is all that demographic detail really so important?

Before we discuss that question, let’s define what we mean. Demographic data are generally collected from questions that ask about personal details such as a respondent’s:

  • Age
  • Gender
  • Marital status
  • Size of household
  • Number of children at home
  • Race / Ethnicity
  • Education
  • Income

These questions are usually placed at the end of a survey (unless they’re being used as screening or filtering criteria, in which case they’re placed at the beginning) because surveyors have found in decades of practice that surveys tend to have higher levels of completion when the easiest and most personal questions are at the end instead of the beginning of a survey.

Traditionally, demographic data have been used to slice and dice data sets into subgroups that can be compared in crosstabs and banner tabs. They’re sometimes used for segmentation or for identifying cohort groups, and newer software can display demographic data visually, for example on interactive maps (though this comes with a pretty big limitation that many researchers overlook).

On the other hand, because these data tend to be nominal or only coarsely ordinal, they aren’t useful for many more sophisticated forms of analysis unless they’re transformed in some way (such as collapsing race into a “white” and “non-white” binary variable). And because they’re self-reported and can be perceived as intrusive, they’re not especially reliable.

(Age and income are two variables in particular where consumers are known to fib or refuse to answer. Even with carefully constructed questions designed to elicit a more truthful response, such as “What year were you born?” or the ill-advised “What was your Adjusted Gross Income last year?”, it’s hard to verify that the information provided is accurate.)

Let’s consider some common arguments that we might offer for collecting demographic information, and why those arguments might not be as important today as they were in the past.


AN ARGUMENT FOR IT: Having demographic information allows us to validate our data against an index such as US Census data. This argument makes the most sense when we’re drawing a general consumer sample and want to ensure that what we obtained is representative. For example, if we were trying to get a sample representative of the entire nation and ours came back 70% male and only 2% non-white, we’d know the data are flawed and that we need to return to the field with quotas in place to even out our response data. (If our sample’s large enough, we can also weight it by one or more demographic variables to even things out.)
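
For readers who haven’t done it, weighting by a single variable can be as simple as dividing each group’s target share by its share of the sample. The sketch below is a minimal Python illustration, assuming a hypothetical 70/30 male/female sample and an assumed 50/50 target rather than actual Census figures; the function name `poststratification_weights` is our own. Weighting on several variables at once is usually done by raking (iterative proportional fitting), which this sketch doesn’t attempt.

```python
from collections import Counter

def poststratification_weights(sample, population_shares):
    """Weight each respondent so the weighted sample matches population_shares
    on one demographic variable (weight = target share / observed sample share).

    Every group that appears in the sample must have a target share; a group
    with zero respondents cannot be weighted up at all.
    """
    counts = Counter(sample)
    n = len(sample)
    return [population_shares[group] / (counts[group] / n) for group in sample]

# Hypothetical sample: 70 men and 30 women, weighted to an assumed 50/50 split
sample = ["male"] * 70 + ["female"] * 30
weights = poststratification_weights(sample, {"male": 0.5, "female": 0.5})

weighted_male_share = sum(w for g, w in zip(sample, weights) if g == "male") / sum(weights)
print(f"male weight = {weights[0]:.3f}, female weight = {weights[-1]:.3f}")  # 0.714, 1.667
print(f"weighted male share = {weighted_male_share:.2f}")  # 0.50
```

Note how each respondent in the underrepresented group ends up counting for more than one person. That’s why the sample needs to be large enough before weighting: if a group has only a handful of respondents, those few people end up speaking for a big share of the weighted results.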

AN ARGUMENT AGAINST IT: Most of the demographic data you’ll collect aren’t useful for index comparisons, because the study’s constraints and its margin of error make meaningful comparisons difficult in the first place. What I’ve found in practice is that many researchers simply eyeball their demographic data to make sure they see an expected distribution across a handful of variables. (The three that most commonly get eyeballed: age, gender and race/ethnicity.)

This practice made sense back in the days of mail or email surveys, when very low response rates were expected. But in an era of sample quotas and online panels, it’s no longer necessary.

So why not just set a clear population definition around these variables (along with quotas, if needed), ask the most important demographic questions in the screener and skip the exercise of collecting the rest of the information altogether? That ensures the sample matches known parameters and requires less validation after the data have been collected.
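
To put a rough number on the margin-of-error problem, the usual approximation for a proportion at about 95% confidence is z × √(p(1 − p) / n), with z ≈ 1.96. The sketch below assumes a simple random sample, which online panel samples generally are not, so treat the figures as a best case; the sample sizes are hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n respondents.

    Assumes simple random sampling and uses the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sizes: a full sample of 400 and a 40-person subgroup within it.
# p = 0.5 gives the widest (most conservative) margin.
for n in (400, 40):
    print(f"n={n}: +/- {margin_of_error(0.5, n) * 100:.1f} points")
```

That works out to roughly ±4.9 points for the full sample versus about ±15.5 points for the subgroup. A demographic cell with a few dozen respondents carries a margin wide enough that checking it against a Census benchmark tells you very little.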


AN ARGUMENT FOR IT: Demographic information is useful for profiling a data set to see if different subgroups within our population hold different views. This argument makes sense when we believe that we might find differing attitudes among different cohorts who are easily identified by demographic profiles – for example, we might hypothesize that within our sample, young singles might feel very differently about a product than middle-aged folks who are married or divorced. Having a broad swath of data allows us to identify these cohorts and use statistical tests to compare them.

AN ARGUMENT AGAINST IT: Most of the time, differences that seem to appear between demographic groups are due to stage of life rather than distinct demographic criteria. It sounds very smart at a presentation to say something like, “a test of statistical significance with a p-value of .001 finds that young urban single females are more likely to say they’ll consider our product than middle-aged suburban moms,” but that may simply be because the younger group is still experimenting with products in the category while the older group has settled on a favorite.

If understanding that difference is an aim of the study, such a finding is appropriate. But if it emerges from a research team working frantically to find some hidden “insight” in an otherwise inconclusive data set, beware: chances are good the difference is slight, and a result being “statistically significant” doesn’t mean it actually matters.
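
Statistical significance depends heavily on sample size, while the size of a difference doesn’t. The sketch below runs a standard pooled two-proportion z-test on a hypothetical result (55% of one group versus 47% of another) at two hypothetical sample sizes, and also reports Cohen’s h, one common effect-size measure for proportions. The numbers are illustrative only, not drawn from any real study.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution: erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions (about 0.2 is conventionally 'small')."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical: 55% vs. 47%, first with 100 and then with 1,000 respondents per group
for n in (100, 1000):
    z, p = two_proportion_z_test(0.55, n, 0.47, n)
    print(f"n={n} per group: z={z:.2f}, p={p:.4f}, h={cohens_h(0.55, 0.47):.2f}")
```

With 100 per group, the gap isn’t significant at the conventional .05 level; with 1,000 per group, it clears p < .001. The effect size is identical in both cases and falls below the conventional “small” threshold, which is the point: significance tells you a difference probably isn’t zero, not that it’s large enough to act on.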

And once again, if comparing two cohorts matters, that comparison should be baked into the population definition and the screener so that appropriate quotas can be set to ensure each cohort is sufficiently sampled. Collecting a bunch of demographic data at the end of a survey and then slicing and dicing the data set by arbitrary breakouts until some interesting finding shows up is not only bad practice, but also tends to produce conclusions that mistake statistical noise for actual findings.
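
The “slice until something shows up” problem can be quantified. If you run many significance tests at the conventional .05 level on data where no real differences exist, the chance that at least one comes out “significant” grows quickly. The sketch below assumes the tests are independent, which crosstabs on the same data set generally are not, so treat the figures as an illustration of the trend rather than an exact rate. The Bonferroni adjustment shown is one simple, conservative correction among several; the function names are our own.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests is
    'significant' at level alpha when there are no real differences."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall
    false-positive chance at or below alpha (Bonferroni correction)."""
    return alpha / num_tests

# Hypothetical numbers of demographic breakouts tested
for k in (1, 10, 20, 50):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%} chance of a false 'finding'; "
          f"Bonferroni per-test threshold = {bonferroni_threshold(k):.4f}")
```

With 20 breakouts, the odds are about 64% that at least one “significant” difference is pure noise, and with 50 they climb past 90%.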


AN ARGUMENT FOR IT: Demographic data allow us to profile personas or segments within our population and target them. This argument does have some merit if the purpose of a study is to conduct a cluster analysis using attitudinal variables and then use demographic and VALS (values, attitudes and lifestyles) data to build profiles that richly describe consumer segments. But note that, once again, it’s the design of the research that dictates the need for these data.

AN ARGUMENT AGAINST IT: Clustering to identify segments or personas is a very popular tool for identifying target markets or niches, but demographic data are often misused and extrapolated to imply far more certainty than the data actually support. Most marketers have heard a persona story about someone like “Taylor, who is 29, loves to spend time with her friends and works full-time in marketing” or “Mark, who is 35, is dedicated to CrossFit and works full-time as a serial entrepreneur with a side hustle as a podcaster.” But these stories are often derived from sketchy data showing a slightly higher incidence of one characteristic in group A than in group B, not a defining feature of group A. Often, an analyst flags the differences between groups, and an imaginative mind fills in the remaining details.

The truth is, demographic data are often poorly suited for segmentation profiles because they tend to provide distorted reflections of deeper characteristics that are more often rooted in stage of life or frequency of use. For example, a study might identify two consumer segments that seem very different because:

  • One is 55+ with higher average incomes and education and the other is 25-34 with lower average incomes and education.
  • The 55+ segment tends to be made up of marrieds and divorceds with no kids living at home, while the 25-34 segment tends to be made up of singles and marrieds, with about half of households having at least one child.
  • The 55+ segment tends to have stronger brand loyalty, while the 25-34 segment tends to be more focused on value and more open to trying alternative brands.
  • The 55+ segment tends to express stronger opinions about brands within the category than the 25-34 segment.

But before we start labeling these as segments, let’s think about what we’re actually being told here. One group is further along in life, more established and more financially stable. The other group is just starting out and is still establishing preferences and habits. All we’ve really uncovered are the distinctions between empty-nest retirees and young married couples who are starting families. What’s more, when we scrutinize the data, we discover that while there are significant differences in variables such as education and income, they amount to things like “55% of the 55+ group has a college degree or higher, versus 47% of the 25-34 group.” There’s a difference, but it doesn’t describe the segment as a whole.

Any “insights” we can glean from this aren’t dependent upon demographic data; they’re far more soundly shaped by stages of life such as:

  • Young and unmarried adults (college-aged)
  • Young settled adults (married or in a long-term relationship)
  • Parent of a young child
  • Parent of a teenager
  • Empty nester / grandparent
  • Retired
  • Geriatric

While these stages aren’t strictly linear and sometimes overlap or follow unpredictable patterns with some individuals (such as the retired grandfather who remarries a much younger woman and becomes a parent again), they’re generally descriptive of most people and tend to be a stronger predictor of behavior and attitudes because they reflect a lifestyle rather than arbitrary attributes of that lifestyle.

These stages also account for another problem with demographic data: they aren’t standardized across regions. For example, incomes in coastal urban and suburban areas tend to be far higher than incomes in most Midwestern areas, because the cost of living is higher in densely populated areas. Taxes also vary from state to state, so incomes aren’t directly comparable unless you’re looking at take-home pay. What really matters to a marketer is disposable income, not total income. Standardizing these data with arbitrary assumptions is difficult, and even using government-generated figures as indices is fraught with issues.

Stages of life, by contrast, are fairly consistent across regions and far more comparable. While some regions are going to have higher incidences than others of particular stages of life, the actual behavior and attitudes will be similar, and that’s much more useful for a marketing segmentation strategy.


AN ARGUMENT FOR IT: Our organization is accustomed to having demographic data, and they allow us to drill down on our target market because we define it demographically. For example, an organization might define its target market as “Women aged 25-65” and want to see how that market breaks down on other demographic dimensions and how it compares against general consumers.

AN ARGUMENT AGAINST IT: One of the worst reasons to include questions in a study is because “we’ve always done things that way.” If the information is serving a specific strategic purpose, collect it! But if the target market is what’s important, it’s best to just screen for it specifically and collect the information you actually need.

There’s usually very little value in looking at markets outside a defined target unless the scope of the study specifically includes identifying new opportunity markets. The notion that an organization might want to compare its target market against the broader population sounds nice in theory, but in practice such comparisons are rarely useful and often go unexamined.

To use an example, let’s say that a women’s apparel company decides to conduct a brand awareness study but uses a general consumer sample, intending to drill down on the target market using demographic data. Roughly half of the responses are going to come from men, and because the product isn’t targeted toward them (and probably isn’t even on the radar for the vast majority, beyond a very general level of recognition), those responses will be largely useless. Comparing the relevant data from the female group against irrelevant data from the male group doesn’t offer any useful information, and it may even create false impressions if someone within the organization misreads it as being competitive in some way. It’s far better not to waste the time, effort and expense of collecting those data in the first place, and to focus instead on the group that actually matters.


We hope this article has been helpful to you, and we want you to know that we’re here to be a resource on anything you’d like to know about marketing research!

Please feel free to check out our other articles, watch our YouTube channel, connect with us on LinkedIn or Facebook, or contact us. We’d love to hear from you!