Sometimes accidentally photographing the snow splatter of a blizzard on a window looks pretty cool

Surveys aren't getting easier to do

Feb 24, 2026

OK, it is complete BS that just when the snow from the last giant snowstorm was about to finish melting off the yards of nearby houses, a blizzard dumps 14-24 inches of snow on everything again. Anyways, I am typing this with extremely tired arms.

This week is a reminder that surveys are, to paraphrase what my friend Chris Chapman often said, a form of motivated communication. For their own reasons, and through various mechanisms, people choose to engage with a survey to communicate something — and what they communicate serves their purposes, not the researcher's. That is the case even in the best, most academically honest and rigorous situations. Things go even further off the rails when someone along the way has an axe to grind.

Today, I happened to come across a post on my Bluesky feed by Conrad Hackett, who had published a piece at the Pew Research Center about how recent surveys being cited as evidence of a Christian revival among young adults in the U.K. couldn't be replicated in random sample surveys.

Has there been a Christian revival among young adults in the U.K.? Recent surveys may be misleading
Despite the widely recognized decline of Christianity in the U.K., there have been persistent rumblings of a Christian resurgence.

Now, as an agnostic, I have little interest in most things involving religion, but I had clicked through to the main post because I wanted to know what was going on with the surveys.

The TL;DR explanation is that certain Christian church-affiliated groups (and also YouGov, a presumably neutral survey firm) ran online opt-in surveys, and all of them were claiming that their respondents showed a large bump in religious activity among adults in the 18-24/18-40/18-34 age ranges. Meanwhile, surveys that used random samples instead of opt-in, where respondents were selected via mail or other methods, showed no such increase.

Now, other sociologists and researchers had been calling out the studies that point to a "resurgence," but the article mentions that news reports and discussion still tend to take the findings at face value despite the issues. Moreover, since those studies don't release their data, it's impossible to know what is going on under the hood.
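To make the opt-in vs. random-sample distinction concrete, here's a toy simulation. All the numbers are invented for illustration — the "8x more likely to see the survey" multiplier is a made-up assumption standing in for a survey link spreading through affiliated networks — but the mechanism is the real one: when the trait you're measuring also drives who takes the survey, an opt-in sample can wildly overstate its prevalence while a random sample stays near the truth.

```python
import random

random.seed(42)

# Hypothetical population of 10,000 young adults, 10% "highly religious".
# (All numbers here are invented for illustration.)
population = [1] * 1_000 + [0] * 9_000

# Random sample: every member of the population is equally likely to be picked.
random_sample = random.sample(population, 500)

# Opt-in sample spread through affiliated networks: suppose highly religious
# people are 8x more likely to see the link and respond.
weights = [8 if person else 1 for person in population]
optin_sample = random.choices(population, weights=weights, k=500)

print(f"True rate:          {sum(population) / len(population):.1%}")
print(f"Random sample rate: {sum(random_sample) / len(random_sample):.1%}")
print(f"Opt-in sample rate: {sum(optin_sample) / len(optin_sample):.1%}")
```

With these made-up numbers the opt-in estimate lands around 45-50% against a true rate of 10% — no fraud required, just selection on the outcome. And because nobody publishes who got to see the link, the bias is invisible from the outside.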

Surveys are getting harder

To be clear, surveys were never easy – there wouldn't be whole armies of researchers dedicating their careers to refining the methodology if they were. But in the past 20-ish years, I don't think I've heard anyone make loud claims that things have gotten easier. I suppose ease of calculation and better data collection technology through the Internet have made the mechanics of running a survey less tedious. I certainly don't envy the folks generations ago who had to mail out surveys, manually tabulate the handwritten responses that came back, and then compute the statistics with the aid of human "calculators".

But the overwhelming consensus I hear is that things have gotten harder. No one's got landlines to randomly pull out of a phone book any more. People have gotten more suspicious and overloaded, so response rates to everything have dropped. And now, I've directly heard from many industry practitioners that LLMs have dialed up the "bots/mechanical turks finding my paid survey link and eating all the budget unless I do fraud detection" issue — which had always existed — to whole new heights. And this is just the badness that can happen to any survey without biased intent.

But imagine you're at an organization that wants to tip the scales in favor of a particular outcome. The headaches and problems of running a fair survey now become attack vectors. Obviously, if the biased organization is running the survey, it can easily affect the results by messing with the sampling. Opt-in samples that essentially spread by word of mouth across networks are already prone to bias, so some strategic manipulation of who gets to see the opt-in link can easily tilt things. There are all sorts of other methodological tweaks that could be done under color of proper rigor, like dropping certain "outliers/incomplete surveys". This is especially easy if you don't release the data for public inspection.

But imagine you've got access to the survey link and want to tip the scales as a third party. Well, it used to be that you'd need an army of either volunteers or paid hands to fill out surveys in accordance with whatever outcome you wanted. It usually took a decent amount of organization and some money. While mechanical turks aren't expensive, they're still not free, and there's always the risk that they don't follow instructions and just answer randomly instead of in the direction you want. Now, LLMs have definitely changed that math. I've had coworkers report that compensated survey quotas were filled within days of launch, and the vast majority of the data was completely unusable AI slop from people who wanted to cash in on the relatively small amount being paid. Moreover, these bots can take on personas now and answer consistently enough to be very difficult to detect — enough that people were starting to seriously re-think their recruiting protocols. It'd be just as trivial to create a bot army to sway any survey in any direction you wanted.

Oh, and you can vibe-code all the bot code now. 🤦

Maybe I should go back to shoveling...


Data stuff to share

Last week, Chris Chapman gave a talk about how synthetic data generated by LLMs isn't even data, and he goes on to give a lot of reasons for that conclusion. The video recording is below. While my opinions on this topic largely point in the same direction, I haven't given it nearly as much thought as he has.

Synthetic Survey Data? It’s Not Data
Many vendors claim that LLM “AI” systems can generate human-like survey responses quickly and cheaply using previous or sometimes concurrent training data. In this webinar, Chris discusses several fundamental problems that he believes should disqualify any such consideration. First, he proposes that there is no way to prove in advance that LLM data will be “good enough” for any particular question. Second, there is no way to compare LLM data to human data because the fundamental concepts of statistics do not apply to samples from LLM models. Third, he argues that the premise of synthetic data rests on a faulty assumption that the goal of survey research is to obtain an absolute “true answer” to a question. He concludes that, instead of searching for a nebulous true value, the purpose of survey research is to learn from people in real time. Chris Chapman PhD is the director of the Quant UX Association, after 24 years conducting research at Google (2012-2022), Microsoft, and Amazon. He is the co-author of several books including Chapman & Rodden, Quantitative User Experience Research (Apress, 2023) and the Quant UX Blog, https://quantuxblog.com.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing something, a data-related post to either show off work, share an experience, or want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self hosted. Support from subscribers is what makes everything possible. If you love the content, consider supporting the newsletter in any of the following ways:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions, get access to the subscriber's area in the top nav of the site too
  • Send a one time tip (feel free to change the amount)
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!