Playing chess when it's more like poker

Aug 8, 2024

As noted in my brief comment on this week's post, I'm still working on a bit of a longer rant. But as with most rants, there are a lot of ... spillover ideas (?)... that don't quite make it into the full story because they'd turn into a long aside that becomes a distraction. But that's the kind of thing that fits great into these Thursday subscriber posts!

So next week's post is at a very high level, talking about a failure of strategy that is aided and enabled by data science. It's a phenomenon of our hyper-analytic world where having "hard fact" numbers for costs and benefits can lead entire organizations to confidently march off into a self-destructive tailspin.

One of the spillover thoughts from that topic is about confidence – how numbers inspire a confidence in people that can be completely unfounded, because they're not listening to the caveats we try to give when explaining results.

What I mean is, I can tell people that version A of an A/B test is better because we found a statistically significant effect. Those people will happily launch version A to production because it is "better" along whatever metrics we measured. Meanwhile, they're much more hesitant to run with the same version A if they only have qualitative feedback that things are better. This is ostensibly because larger volumes of data (and statistical significance) are supposed to be better and more definitive than qualitative anecdotes.
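
To make that concrete, here's a minimal sketch of the kind of calculation behind that confident verdict – a plain two-proportion z-test on a single conversion metric, with entirely made-up numbers:

```python
# Minimal sketch (made-up numbers): the calculation behind a confident
# "version A won" verdict -- a two-proportion z-test on one convenient metric.
from math import sqrt
from scipy.stats import norm

# Hypothetical counts from an A/B test
conv_a, n_a = 1_250, 20_000   # version A: conversions / users
conv_b, n_b = 1_150, 20_000   # version B: conversions / users

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error of the difference
z = (p_a - p_b) / se
p_value = 2 * norm.sf(abs(z))                            # two-sided p-value

print(f"A: {p_a:.3%}  B: {p_b:.3%}  z={z:.2f}  p={p_value:.4f}")
# With these numbers p < 0.05, so A "wins" -- regardless of anything the
# chosen metric doesn't capture (confusion, brand perception, etc.).
```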

But this bias toward weighting quantitative studies over qualitative ones can be pretty wrong. Qualitative work can say all sorts of things about how users like the new version more, how they're less confused, how they come away with a better perception of the company brand. These are all things that are very tricky to operationalize into metrics that show up in a traditional A/B test. You'd have to jump through hoops with task completion times, ask users survey questions, or come up with other arcane metrics. All those measurements have more inherent error in them because they're trying to measure fuzzy constructs. Many of them may never differ enough to register as a statistical difference. So the most common solution is to... just use convenient high-level metrics like "revenue generated" and ignore the rest.
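
As a rough illustration of why those fuzzy metrics tend to fall out of tests, here's a quick power calculation with assumed (made-up) effect sizes – noisier measurements mean smaller standardized effects, which blow up the sample sizes needed to detect them:

```python
# Rough sketch (assumed effect sizes): why fuzzy metrics often never reach
# statistical significance in a test of ordinary size.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for label, effect_size in [
    ("clean metric, e.g. revenue per user", 0.10),            # assumed Cohen's d
    ("fuzzy metric, e.g. survey 'confusion' score", 0.02),    # assumed Cohen's d
]:
    n = power.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"{label}: ~{n:,.0f} users per arm for 80% power")
# Under these assumptions the noisy survey metric needs ~25x more users than
# the clean one, so in practice it quietly drops out of the test.
```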

While qualitative work, and even our own experience as users, can scream at us that version B is better, unless those things are measured and compared during the test, version A might win out simply because it does well on an easy-to-measure metric that ignores everything the qualitative work captures. A purely data-driven company would go with version A and potentially miss all the subtle benefits of version B forever. They'd also make this decision with a huge amount of confidence because, well, version A "won" the test, and it was statistically significant to boot.

I don't know how many opportunities are lost in such exchanges every day, but there must be a lot. It's not as if we don't tell data scientists that choosing the right test metrics is extremely important. I'm sure most data scientists are aware of this inherent danger and try to use the most relevant data points available. The problem is that some of the most relevant things are just extremely hard to measure. So we ignore them out of convenience. A convenience that may lead us to randomly step off a cliff.

So the question I ask myself is... how do I make it so people are less likely to forget that their decision-making process, even with the aid of data, is still loaded with lots of unknowns and confounds? That we're surrounded by surprises and blind spots. That the game we need to be playing is one of imperfect information, like poker, and not of perfect information, like chess.

Because it's so damn easy to pretend that your business works like a chess game when you focus only on the data at your fingertips. When you have years and years of historical data across millions of users, you can build a pretty convincing causal network of how your business works off of it. Given enough factors and imagination, you can p-hack your way to a metrics network that you believe explains how your business makes money. If you're silly enough to actually believe in that and base all your business decisions on that model... well, good luck, because you're going to get "disrupted" one day.
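
For a toy illustration of that trap, here's a simulation where both the "factors" and the "revenue" are nothing but random noise – test enough candidates without any correction and some of them will look significant anyway:

```python
# Toy sketch of the p-hacking trap: with enough candidate factors and no
# correction, pure noise yields "statistically significant" drivers of revenue.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_users, n_factors = 10_000, 200

revenue = rng.normal(size=n_users)                # outcome: pure noise
factors = rng.normal(size=(n_factors, n_users))   # candidate "drivers": pure noise

hits = [i for i in range(n_factors)
        if pearsonr(factors[i], revenue)[1] < 0.05]
print(f"{len(hits)} of {n_factors} random factors look 'significant' at p < 0.05")
# Expect roughly 10 false positives -- plenty of raw material for a very
# convincing (and completely fictional) causal network of the business.
```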

I don't have a clear answer to this question. What I do know is that there are two broad types of work – the chess-like part, where you analyze everything you have to make stuff better, and the poker-like part, where you need to react to the world without knowing how everything works. Most reasonable people know that both types of work exist, but I've definitely seen culture put blinders on people so that they mistake one for the other.

Whatever game we're playing, there are quantitative data tools at our disposal. They're different tools, often representing different fields of expertise, but we at least have them to apply. I'm just not quite sure if we ourselves remember when to switch toolsets. I'm also not sure if people are willing to listen and switch modes with us.