It's all about balance... right, goose?

Data-Driven <-> Data-Stuck

Sep 9, 2025

Most important announcement of the week: The list of speakers/talks for DataBS Conf has been posted! I still have to check if there are any stragglers missing, as well as assign time slots. But we're getting there! Also, now's the time to register your ticket! Time does not stop! Aaaaahhhhhh!!!!

I vaguely remember a joke about large organizations and how every few years they are either centralizing or decentralizing, moving from a tall organizational structure to a flat one. On a surface level, it seems like organizations just can't make up their damned minds about how to get work done, and so everyone has a laugh.

But when you've been working for a decade or two, you actually live through the exact same reorgs, maybe even take part in planning some of them, and you slowly come to the horrible realization that the changes are in response to perceived or actual problems. Organizations, it seems, are like a giant baseball bat being balanced upon a fingertip – the mere act of staying upright and moving towards a goal requires a huge amount of correction, over-correction, and counter-correction. Zoom far enough out and it really does seem like organizations bounce between extreme points along a number of axes, though the details are always more nuanced.

While centralized/de-centralized and tall/flat org structures are the most common themes of reorganizations, when it comes to data, the extremes I tend to see organizations bounce between are "we need to move fast, so we don't need as much data" and "everything must be measured and tested".

I think as data practitioners, we can agree that the ideal state between the two poles is "somewhere in the middle". There's a magical balance between speed and rigor that is hard to describe but is something we all probably strive for. We can definitely recognize situations on either extreme that are utterly pathological and unhealthy.

For example, before an organization has any real data people in it, when it's just a startup with a couple of founders and an engineer, everything is about moving fast based on the gut intuitions of the founders. After all, the whole enterprise started on the vision and pure imagined beliefs of the founders. That conviction is what keeps the boat afloat. I'm sure we can all visualize how such systems can go wrong. As Josh Wills once said, organizations need data people when things start going wrong. Vibes and visions and gut feelings are all blameless when investors are signing, sales are rising, and features are launching. But when the streak of luck runs out, data folks are called in to analyze things and cut the waste while keeping the money printer going. Maybe those folks go by the name "accountant" instead of "data analyst", but no one needs math when the champagne keeps flowing.

My main experience with the other pathological pole manifests itself as "let's measure and test every last thing". I've seen it happen multiple times after a team starts having sustained success with using the A/B testing system to make launches and decide what to keep and what to iterate on. At some point, the teams lose track of why they are testing. Instead of using testing to learn about their users and make better products, they get addicted to showing how they can improve a metric with their launch. They become fearful of hurting metrics with a launch, forget about the learning part, and eventually just check off a laundry list of metrics to make sure they don't hurt anything. Changes become microscopic, and overall team velocity slows to a crawl as they run a dozen tests trying to squeeze an extra 0.1% gain out of a tired design.

Let's just say that being the data person handling that kind of work is demoralizing. You can only run so many experiments in parallel when you know that the changes themselves will result in no practical difference. Everything is equivalent to the null hypothesis, just dressed up in experimental trappings. Eventually, the number of teams clamoring for tests that won't matter either way becomes overwhelming. While there are all sorts of cute little data engineering projects you can take on to "make running tests self-serve" to lighten the load, giving everyone access to such tools tends to make the problem worse over time. Now teams can run even more tests without you even knowing, let alone trying to stop them!
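To make the "squeeze out another 0.1%" problem concrete, here's a rough back-of-the-envelope sketch of the sample size a basic two-proportion test would need to reliably detect that kind of microscopic lift. The baseline conversion rate, lift, and test settings are my own illustrative assumptions, not numbers from any real product:

```python
# A minimal back-of-the-envelope sketch, not a real power analysis tool.
# Baseline rate, lift, alpha, and power below are illustrative assumptions.
from scipy.stats import norm


def required_n_per_arm(p_baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    p_treat = p_baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # z-score for the desired power
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return (z_alpha + z_beta) ** 2 * variance / (p_treat - p_baseline) ** 2


# Detecting a 0.1% *relative* lift on a 5% baseline conversion rate:
print(f"{required_n_per_arm(0.05, 0.001):,.0f} users per arm")
```

That prints something on the order of 300 million users per arm, which is the math behind why a team chasing those gains ends up running tests that never resolve into anything.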

All of this experience has made me a pretty strong advocate for experiments being bold and intentional. In 2024 I did a UX Quant Con talk about doing better experiments (no recordings, but the slides are available) that pushed this idea along with other best practices.

A different failure mode

Guess what, humans are creative creatures, so we're most certainly not going to restrict ourselves to a single mode of failure. Here's a new (to me) and interesting failure mode – teams that don't get trapped running lots of useless A/B experiments, but instead somehow are only able to see the world through metrics.

To be more specific, since the overall topic of A/B experimentation has cooled down in recent years, I've been seeing less pressure from management for launches to be tested all the time. If for nothing other than speed, they can accept not knowing the specific metric lift of a feature launch in exchange for getting two launches out the door faster. The thinking goes that since we have a robust metrics measurement system in place, so long as the overall metrics of the system are moving in a good direction, it will all work out in the end.

And therein lies the newer pathological case I've started noticing – because teams are taught that "when we do something good, some of the many metrics we track should improve" they start acting like the arrow of implication can be reversed and "if metrics go up, then we did something good". They also start acting like "if metrics don't move, then we didn't do anything good."

I'm making a careful distinction here about "acting as if". If you directly talk to the people involved, they don't outright hold those thoughts as beliefs, because they're usually smart enough to have a passing recognition that affirming the consequent is a logical fallacy. If you tell a team that the most recent feature they launched didn't move metrics at all, they'd be disappointed. But if you tell them that a new survey question asking about the specific feature scores very well on some new metric, they'll happily incorporate that "thing go up, we succeeded" into their worldview.

So what's the pathological part? Well, these teams very often treat the entire product space as a large metrics map. The theory goes that all the places where customers interact with the product can be described with one or more metrics, and so you can use the metrics as a sort of "map". Find places on the map that look like they can be improved, target them with some feature work, and if things are done right, that metric should improve while also having positive knock-on effects on related metrics in a giant web. You can consider it an overly aggressive product-wide implementation of the concept of "metrics trees", though much broader in scope and not nearly as rigorous.
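For readers who haven't run into metrics trees, here's a minimal toy sketch of the idea (all the metric names are invented for illustration): each node is a metric, its children are the inputs believed to drive it, and teams end up targeting the leaves.

```python
# Toy illustration of a "metrics tree" – metric names are made up.
metrics_tree = {
    "revenue": {
        "active_users": {
            "new_signups": {},
            "retained_users": {"feature_adoption": {}, "support_tickets": {}},
        },
        "revenue_per_user": {"conversion_rate": {}, "average_order_value": {}},
    },
}


def leaf_metrics(tree):
    """Walk the tree and return the leaf metrics teams end up targeting."""
    leaves = []
    for name, children in tree.items():
        leaves.extend(leaf_metrics(children) if children else [name])
    return leaves


print(leaf_metrics(metrics_tree))
# ['new_signups', 'feature_adoption', 'support_tickets',
#  'conversion_rate', 'average_order_value']
```

The structure itself is harmless; the trouble described below comes from treating that set of leaves as an exhaustive description of the product.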

The problem with going too far with this concept is that when teams see that they can visualize potentially dozens of metrics on a dashboard associated with every single product change they want to make, they start thinking that their knowledge of the product space is essentially 'complete'. All they really need to know about the product is already embedded in the map. We measured all the important stuff already, so what's not measured isn't important. So they focus on improving the things on the map – and forget about things that aren't on the map.

This is how you get weird requests like "we count how many times people start and stop the engine on the car, so can we analyze how often people are stopping the engine to understand if we can make that experience better?" All while completely ignoring the obvious fact that the number of times a car's engine is started is typically equal to, or one off from, the number of times the engine is stopped. It's also just flailing about in search of "a thing to improve" without really considering any reason why they should bother spending time looking there to begin with.

Plus, perhaps most painfully, such a map-based strategy ignores all the stuff that isn't on the map. There are tons of things that exist in the world that we don't pre-measure because it's really expensive to. For example, we'd never send out a survey to users about their satisfaction with a really obscure feature. We might not even spend money sending out a satisfaction survey about anything at all – though we could be convinced to if a need arises. But I've seen teams that are so hyper-focused on making existing metrics improve that they don't even consider new stuff.

Sure, we're around to point out these issues and fight for sanity. But if it's an organization-wide failure, that is a lot of fighting that needs to be done. I certainly do not have the endurance to fight a hundred little battles to keep people asking strong questions. At some point, teams need pointed reminders that not all metrics that are measured are important to move, and that there are a bunch of metrics that aren't easily measured but are still valuable – things like "discoverability", "ease of use", and "affordability".

How does that kind of re-alignment happen? At some point things get gummed up with enough distractions that leadership can't help but notice things have tilted too far to one side. Maybe they notice because you've been shouting about it up the chain of command. But however it comes about, you guessed it, a re-organization happens.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you're interested in writing a data-related post – to show off work, share an experience, or even if you just want help coming up with a topic – please contact me. You don't need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted. Support from subscribers is what makes everything possible. If you love the content, consider any of the following ways to support the newsletter:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions, and you get access to the subscriber's area in the top nav of the site too
  • Send a one time tip (feel free to change the amount)
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — there's a survivorship bias shirt!