A week ago I knocked a roof icicle off and it stabbed into the snow like this and is *still* there, largely unchanged.

Avoid the lure of working on metrics over constructs

Feb 3, 2026

We've had over 10 days where the temperature has not reached freezing, and I am really tired of having to do various kinds of emergency HVAC/plumbing work. Stay warm out there, folks.

As I've previously mentioned, I've been making it a point to think harder about the fundamentals surrounding data collection this year. One side effect of paying more attention to this is that I'm catching myself taking a mental shortcut that I probably should know better than to take all the time. The particular shortcut comes up when I'm asked to find a measurement for something, like the perennial request for every product: "customer satisfaction", or CSAT as many refer to it. When asked to measure CSAT, I can immediately summon up a bunch of practical measurements that "can be used for CSAT measurement" – post-purchase surveys, 5-star customer ratings and reviews, the dreaded NPS survey, and tons more.

The reason I call pulling up those metrics a "shortcut" in this instance is that they're all proxies for the actual theoretical construct called "customer satisfaction". The theory goes that all those direct measurements we can take with surveys and reviews and whatever are aligned with what we believe "customer satisfaction" should be. If a customer is satisfied, we'd expect them to leave good reviews, answer positively on surveys, come back for repeat purchases, and so on. If we were in a graduate social science research class, the professor would be asking us the critical question: "so, are these measurements actually valid measurements of the construct?"

Now, in normal scientific discourse, we'd go off and start examining the theory we're working from and what predictions that theory makes for the different measurements. You'd be able to construct models that analyze the correlations between different measures to see whether they go in the predicted directions. In our example, we have some theory that the typical review score for satisfied customers is going to be higher than for dissatisfied customers. We'd also try to validate our measurement by seeing if our metric correlates with other measures we think should work, like repeat purchases and direct interviews of customers.
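To make that a bit more concrete, here's a minimal sketch of what that kind of convergent-validity check could look like, assuming a hypothetical per-customer table with made-up columns like review_score, csat_survey, and repeat_purchases (these are stand-ins for the proxies above, not a real dataset):

```python
import pandas as pd

# Hypothetical per-customer data; in practice this would be joined
# together from your reviews, survey, and order systems.
customers = pd.DataFrame({
    "review_score":     [5, 4, 2, 5, 1, 3, 4, 2],   # 1-5 star reviews
    "csat_survey":      [9, 8, 4, 10, 2, 6, 7, 3],  # 0-10 post-purchase survey
    "repeat_purchases": [4, 3, 0, 5, 0, 1, 2, 1],   # orders in the next 90 days
})

# If these proxies all point at the same "satisfaction" construct,
# they should correlate positively with one another.
print(customers.corr(method="spearman"))
```

If the proxies barely correlate, or correlate in the wrong direction, that's a hint that at least one of them isn't measuring the construct you think it is.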

But here's the thing – in industry, we ain't got time for most of that stuff. Very few of us have the luxury of actually doing a proper, full theoretical writeup and validation research. If anything, we don't have robust theories to begin with. Much of what we work on is based on vibes and prior experience. For well-understood domains like, say, e-commerce, this isn't a big problem because the most common design patterns have withstood the test of time enough to be considered "sorta-proven". This is what I mean by my defaulting to "shortcuts" – for the domains I'm familiar with, I make the giant leap from "we need to measure $construct" to "we have $X $Y $Z direct measurements which are working proxies for $construct".

Having a bucket of mental measurement shortcuts that work for specific domains is great for working quickly in those domains. Industry timelines are so tight that I'd have long been fired if I ran actual validation studies on decade-old industry-standard metrics without justification.

But there's a lot more to the world than ads and storefronts, especially in the UX space where "customer experience" is a very complex, multi-faceted thing. There are also things that are new and unfamiliar, especially with the recent boom in products and experiences that utilize LLMs in various ways. Regardless of your personal stance on LLM-based products, measuring whether users are doing important stuff like... "accomplishing their goals" or "figuring out how to use the thing" or "actually wanting to use it" is a problem that teams across the world are struggling with right now. Even outside of that, there are always new products and features that are novel enough that there's no solid domain knowledge about the space yet.

And this is why I get concerned when I take the lazy mental shortcut of considering measurements, and the finicky implementation details involved, before I stop to ask myself "what the !@#$ are we even measuring here?" I've been doing this a while now, and even I find the allure of "figuring out all the technical implementation bits is FUN!" very tough to resist compared to stopping and putting deep thought into why we're engaging in the work to begin with.

The lure of implementation

Many of us got into the business of measuring stuff because... there's a lot of nerdy fun to be had in debating the complicated business of actually measuring something. Hell, the majority of Data Science Discourse is essentially just discussion about tools – not methods, or skills, or anything else, literally just tools. It's why this newsletter, which discusses "mundane stuff" that isn't tools, exists to begin with.

And so, my advice to every practitioner out there is to remember that the metrics we measure, monitor, and manipulate day in, day out, to the point where we barely think about what they are, point to actual theoretical constructs. Someone at some point did at least a modicum of work to show that "this metric appears correlated with this concept we care about".

And in remembering that these unobservable constructs exist, we should question whether that correlation still holds.

Times change, technology marches on, and user expectations and conventions get dragged right along with it. Our metrics can just as easily gain or lose their construct validity against that shifting background. I still remember the old days when you'd be extremely wary of letting a web store "save your credit card info" for future purchases, because it was the early 2000s and security best practices and fraud protection were not up to current-day standards. You had no way of knowing whether or not to trust that website with your info. The banks were also a hassle to deal with if fraudulent charges appeared. Now in 2026, failing to offer such a feature would be read as old-fashioned, and user satisfaction would probably be lower, if you cared to measure it, due to the inconvenience it causes for repeat customers.
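One lightweight way to keep an eye on that kind of drift is to recompute the proxy-vs-reference relationship per time period instead of just once. Here's an illustrative sketch, again with made-up column names (period, proxy_metric, reference_measure), not a prescription for how anyone's pipeline actually works:

```python
import pandas as pd

# Hypothetical long-format data: one row per customer per period,
# with the proxy metric and some slower, more trusted reference measure.
df = pd.DataFrame({
    "period":            ["2024Q1"] * 4 + ["2025Q1"] * 4,
    "proxy_metric":      [4, 5, 2, 3, 5, 4, 2, 3],
    "reference_measure": [3, 5, 1, 3, 2, 1, 4, 3],
})

# Correlation of proxy vs. reference, computed separately per period.
# A correlation that shrinks (or flips sign) over time is a warning
# that the proxy may no longer be tracking the construct.
drift = df.groupby("period").apply(
    lambda g: g["proxy_metric"].corr(g["reference_measure"], method="spearman")
)
print(drift)
```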

So, y'know. What the heck are we doing anyways?


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post, whether to show off work, share an experience, or you’d like help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted. Support from subscribers is what makes everything possible. If you love the content, consider any of the following ways to support the newsletter:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions, and you get access to the subscriber's area in the top nav of the site too
  • Send a one time tip (feel free to change the amount)
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!