I took a brief vacation to somewhere warm

Divorcing data collection from data analysis, slightly

Jan 6, 2026

Well, 2026 didn't waste any time becoming a mess, did it?

This year I wanted to start things off with a fundamental question I want to noodle on for at least a while – What does it mean to collect good data? How does one learn how to do that? What things do we need to keep in mind? All within the context of understanding human users. If I don't at least restrict the scope somewhat to these industry concerns I'd probably find myself mired in data collection practices for particle physics or something.

The reason I'm asking this question now is because on break I was poking at describing what data collection is in my head and realized that I was handwaving over a huge amount of "stuff". Many of the details of data collection are (at least in my mind) entangled with the direct methods I use each day to the point where I have trouble separating that step from everything else. In a way it's like when I was poking at the concept of data cleaning and realized it was just analysis by another name.

My main question is, if I wanted to teach a new researcher how to collect data to answer questions in this modern day and age, what would they need to know? How should it be framed?

I'm positive I'm not going to come to any concrete answers in the short span of a couple of posts or a few hours; I'll chip away at things as the year goes on. But I do want to work on this.

"Quant data" and "qual data"?

Sometimes while working I'm asked about the difference between working with quantitative and qualitative data, as if they're completely distinct things. This is probably a quirk of how the UX research space seems to divide research work into quant and qual sub-fields (though the line between the two can be blurry depending on who you ask). Qualitative research methods primarily focus on highly detailed, rich studies involving a small sample of users, while quantitative researchers work primarily with larger collections of data, statistics, and computers. For what seem to be historical reasons, some places put "surveys" into the quant bucket, while others put them into the qual bucket or a completely separate survey bucket.

But once you get into the weeds working with experts on both sides of the line, the line barely seems to exist. Very naturally, while trying to figure out what users are experiencing and how to test, measure, and explore ideas, it all just becomes "data" (in a broad, non-technical sense) to be used to inform decisions. We even have statistical tools to work with very small sample sizes and draw certain kinds of inferences, so "small data" is still... data. Some questions are best answered from one side or the other, but neither really provides a complete picture on its own.
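To make that concrete, here's a minimal sketch of one such tool: an Adjusted Wald (Agresti-Coull) confidence interval, a standard recommendation for the tiny samples typical of usability studies. The specific numbers are invented for illustration.

```python
# Small-sample inference sketch: an Adjusted Wald (Agresti-Coull)
# 95% interval for "4 of 5 participants hit this problem".
import math

def adjusted_wald(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """CI for a proportion; adding z^2/2 pseudo-successes and z^2
    pseudo-trials behaves far better than the plain Wald interval
    when n is tiny."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

low, high = adjusted_wald(successes=4, n=5)
print(f"Observed 4/5 = 80%, 95% CI roughly {low:.0%} to {high:.0%}")
# Prints roughly 36% to 98% -- wide, but still a defensible,
# quantitative inference drawn from only five qualitative sessions.
```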

While the handling and application of data collected from qualitative methods is completely different from quantitative methods, I've been wondering how early the differentiation starts. Is it that we have "quant-coded questions" and "qual-coded questions"? That seems weird already.

For one thing, consider a universe where we had infinite time and resources. For any research question, we could run a "qualitative study" at "quantitative scale" – imagine doing thousands of observational studies and rich interviews following clear protocols. We'd have videos, notes, coding books, and infinite manpower to go back and codify all that information. Wouldn't we be able to take all that rich data and find ways to apply quantitative methods to it? With infinite research help, it'd be easy to ask "how many participants experienced this problem?" and even do statistical analysis on such questions. We could set the unit of analysis and the definition of "the problem" at any level we wanted. We could go back and put in demographics, count different things, revisit new questions, and largely do post-hoc p-hacking to find all sorts of potentially spurious signals. But all of this would give us information to use to help make our decision, or at least direct our next study. The only reasons we don't do things this way are earthly, practical ones.
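As a toy illustration of that thought experiment: once rich sessions are coded into records, "how many participants experienced this problem?" becomes an ordinary aggregation. The problem codes and observations below are entirely made up; nothing here is a real coding scheme.

```python
# One row per (participant, problem code) observation pulled from
# hypothetical session notes by our imaginary army of coders.
import pandas as pd

observations = pd.DataFrame({
    "participant": ["p01", "p01", "p02", "p03", "p03", "p04"],
    "problem_code": [
        "nav_confusion", "jargon", "nav_confusion",
        "nav_confusion", "jargon", "slow_load",
    ],
})

# Count distinct participants per problem code, then turn it into a rate.
n_participants = observations["participant"].nunique()
hits = observations.groupby("problem_code")["participant"].nunique()
print((hits / n_participants).sort_values(ascending=False))
# The unit of analysis (participant vs. session vs. utterance) and the
# definition of each code are exactly the knobs the paragraph above
# says we could revisit at will, given infinite resources.
```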

So to go back to the original question, what central concepts of data collection remain the same independent of analysis method? Before we go off on a litany of different methods and the specific concerns within them, there's got to be a common starting point. Ideally, the data we collect must somehow...

  • Be honest and true
  • Be unbiased/complete
  • Provide insight into the question we want to answer
  • Fairly represent the population we're studying
  • Be in usable form for our subsequent analysis

I'm sure many of the issues of "validity" that we're concerned about in research, such as construct validity, ecological validity, etc., build their foundations on what data is collected as part of a study. Much of study design is focused on making sure all the validity concerns are addressed at every step of the way, from data collection to analysis to presentation of results, with the idea being that if we get everything 'right' then our findings should generalize to the broader population.

Anyways, I really don't want to just cop out and say "well it just depends on your method!" because the foundations of how we construct truth must preexist all our methods. The list of methods and "how do you collect data to do X" probably comes soon after, but thinking through what it means to collect data well comes first.

I also need to look up references on this. It's been a long while since I've engaged with philosophy of science materials, so it's going to require de-rusting as I work through everything. If you happen to have favorite references on the topic, send them to me!

