Ways to data-drive yourself into the ground!
Being “data driven” has been a hot buzzword for many years now. There are plenty of companies that don’t make it an explicit goal to become data driven — I’m fairly sure my local bakery does not have it as part of their quarterly goals. At the same time, I don’t think I’ve ever come across anyone, in any industry, who was openly against the idea of making decisions with the help of data when it’s available.
But there are good ways, and bad ways, of making decisions with data. It’s kind of tricky describing a company that is good at being data driven because, by definition, they make good decisions with data. Of course they have good processes and culture and data capabilities in place. Of course they collect good data and act appropriately. It’s hard to really describe how it works. But I’ve had a long enough career to have seen all sorts of dysfunctional data practices, and that collective experience can paint a good picture about what NOT to do.
So here are some (only slightly) exaggerated caricatures of actual failures of being “data-driven” that I’ve witnessed over the years. Details have been averaged over multiple instances to protect the guilty.
First — Disk is cheap, log everything!
You can’t make decisions based on data if you don’t have any data to use. But what data should you collect? How are we supposed to know what’s important before we’ve had a chance to look at it and decide? Well, don’t worry about it because hard drive costs are constantly going down!
Let’s just store as many fine-grained telemetry events as we can. Every user click on the interface is a given. As is every page load. But let’s not forget the load time of the page, and every single click on every widget. We’d also like to track how far the user scrolls down a page, so we’ll report back an event with the user’s visible position every time they scroll. And we’d like to know how long they stay on the page, so we’ll send regular pings back to the server to calculate time-on-page even if they close the tab. It’ll all look so cool when we put this stuff into a big dashboard on a TV hanging in the break area!
All this stuff can just get logged and dumped into our data lake like so much industrial waste. Our vendors say we can run ETL jobs and do data cleaning later, and that’s why we hired our data scientists based on their strong coding skills. Handling the data on the consumer side will make life easier for our engineers, who can quickly log arbitrary event strings to the lake. We gotta keep things launching, and our data scientists can transform the data into whatever they need later on.
Of course, that means that answering any simple question involves sifting through terabytes of useless nonsense — leading to very long processing times, complicated cleaning pipelines, and gigantic data warehouse/processing bills. Plus, since engineers were just arbitrarily sending logging events whenever they thought it would be useful, there are frequent bugs in when events get triggered, gaps in what events actually get recorded, duplicated data points, and events that come and go whenever someone decides to change the code. Then the data teams run into situations where they need eng to add tracking for something, and everything takes longer.
But hey, we can measure stuff, therefore we can manage it. That’s how the adage goes, right?
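To make the logging free-for-all concrete, here’s a minimal sketch in Python. The `track()` helper, event names, and required fields are all made up for illustration; the point is the difference between letting every engineer invent event strings on the fly and agreeing on even a tiny event schema up front.

```python
# A minimal, hypothetical sketch: free-form event logging vs. a tiny
# agreed-upon event schema. Event names and fields are made up.

import json
import time

# The free-for-all version: every engineer invents their own event
# names and payloads, so "page_load", "pageload", and "PageLoaded"
# all end up in the lake and someone reconciles them later.
def log_whatever(event_name, **stuff):
    print(json.dumps({"event": event_name, "ts": time.time(), **stuff}))

# A slightly more disciplined version: a small registry of allowed
# events and their required fields, enforced at logging time.
EVENT_SCHEMA = {
    "page_load": {"page_id", "load_ms"},
    "widget_click": {"page_id", "widget_id"},
    "scroll_depth": {"page_id", "max_depth_pct"},
}

def track(event_name, **fields):
    required = EVENT_SCHEMA.get(event_name)
    if required is None:
        raise ValueError(f"Unknown event: {event_name!r}")
    missing = required - set(fields)
    if missing:
        raise ValueError(f"{event_name} missing fields: {sorted(missing)}")
    print(json.dumps({"event": event_name, "ts": time.time(), **fields}))

track("widget_click", page_id="home", widget_id="signup_button")
# track("WidgetClicked", page_id="home")  # raises, instead of quietly
#                                         # creating a brand new event type
```

Neither version saves you from collecting useless data, but the second one at least makes the typos, gaps, and duplicate event types show up in code review instead of in a cleaning pipeline six months later.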
Bring on the metric parade
Determining good metrics to use is hard. Aside from critical health-of-the-business metrics like “money coming in, money going out,” useful metrics are not immediately obvious. The correct way to do this is to make best guesses, then spend months and sometimes years testing and validating which metrics actually matter to the business and figuring out what actions can change them.
But we don't have time for that, we need something NOW — because we’re gonna “move fast and break stuff”. So we grab the most “obvious” ones to start with. How many people are using the thing, how much money does it make. Is our NPS great, or wonderful? It’ll be a mix of vanity metrics we show investors, plus some actual ratio metrics we try to drive the business with.
Once we have those, and because by definition we make decisions based on data, we are going to be all scientific about things and run tests, and watch every metric twitch and maybe react to it.
Luckily, if we have some product momentum and the economy is doing well, just about everything tracks up and to the right. It is proof that the system works! Make a design change? Up and to the right. Remove that silly feature we all hated? Up and to the right. At worst, some metrics merely go up slower than others and we fuss about it a bit. If luck holds, we can keep doing this for years and years until everyone just believes that the existing metrics “work”.
That is, until things stop going up.
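It’s worth spelling out why “we shipped it and the number went up” proves so little during a growth period. Here’s a toy simulation with made-up numbers: a metric with steady organic growth, a “design change” on day 30 that does absolutely nothing, and a naive before/after comparison that credits the change with a healthy lift anyway.

```python
# Toy illustration with made-up numbers: a rising baseline makes any
# change look like a win in a naive before/after comparison.
import numpy as np

rng = np.random.default_rng(0)

days = np.arange(60)
# Hypothetical daily signups: steady organic growth plus noise,
# with a "design change" on day 30 that does absolutely nothing.
signups = 1000 + 5 * days + rng.normal(0, 30, size=days.size)

before = signups[:30].mean()
after = signups[30:].mean()
print(f"Avg signups before change: {before:,.0f}")
print(f"Avg signups after change:  {after:,.0f}")
print(f"Naive 'lift' from a change that did nothing: {after / before - 1:+.1%}")
# The 'after' period looks meaningfully better purely because of the
# underlying trend, not because of anything that was launched.
```

With the baseline climbing on its own, nearly any change launched mid-trend gets to claim a double-digit “improvement”, which is how a team can believe its metrics “work” for years.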
When the panic sets in that all the actions that used to work before have stopped working, the data teams get called in to sift through the terabytes of data to “optimize” or identify issues. Somewhere in the hundreds of terabytes of logs and databases, it must be possible to figure out what makes the product grow. There must be a lever we can pull somewhere to fix things — find it! ASAP!
Metric validation always works best under time pressure.
When things launch, we stick to the vision
Every so often we’re going to have to make big changes, like a big redesign to unify all the features we’ve added haphazardly the past few years, or the site just looks too dated and needs a refresh to stay relevant. Time to launch a redesign or major feature!
But hold on! General wisdom holds that users hate change! They will always complain if we change anything big like a redesign or a new product launch. Remember the last time a big site like Facebook or Google made some changes and half the planet spent a week ranting and writing news articles about it? That happens all the time and it’ll happen to us! (No joke, I once saw an irate user, in response to a large redesign at Meetup, email in saying they were going to start “national media proceedings” about how the company was “destroying community” with a page redesign.)
But it's okay! We know users are going to be angry about the process and our metrics will all take a hit as a result. We’re prepared to accept that and not panic. We've got a genius design team led by someone with vision. We believe in that vision! We're going to give users the proverbial car while they ask for faster horses. So we’re going to launch this thing and then iterate — fixing bugs and issues that come up while we give users time to get used to the new design.
While we do that, we're going to accept that metrics are going to be down. It’s all part of the plan. We're going to compensate by doing tons more experiments to get those numbers back up to where we want them. There's plenty of opportunity to make improvements. We’ll pay close attention to our metrics and data and be scientific from that point on.
… [Time passes] …
Angry screaming from users has thankfully died down, more or less. We still hear the occasional gripe about the design change but it’s barely noticeable. It’s clear that users have gotten used to the new reality. We’ve been constantly making improvements and experiments, so metrics are on an upward trend. We just have one problem — we’re somehow always too busy to stop and look back at how the metrics are doing now compared to the old baselines. We might have completely changed what metrics we’re tracking so no such baseline exists. So… we just keep going with our current version. Whether it’s actually better or not… no clue!
And let’s not forget to do bad experiments!
By bad experiments here, I’m going to skip over the usual “bad statistical methods” part of bad data science, because that’s pretty well-covered territory already. This includes things like the many ways to p-hack, having biased sampling methods, peeking at results and calling winners at the first sign of significance, etc. Those are all too “easy” to commit while appearing to be data driven.
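For anyone who hasn’t seen the peeking problem in action, here’s a quick simulation with arbitrary made-up numbers: an A/A test where both arms have identical conversion rates, but we check for significance after every batch of users and stop the moment something looks “significant”.

```python
# Simulation with made-up numbers: "peek and call a winner at the first
# sign of significance" on an A/A test, where any winner is a false positive.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def peeking_experiment(p=0.05, n_per_peek=1000, n_peeks=10):
    """Run one A/A test, checking for p < 0.05 after every batch of users."""
    a_conv = b_conv = a_n = b_n = 0
    for _ in range(n_peeks):
        a_conv += rng.binomial(n_per_peek, p)
        b_conv += rng.binomial(n_per_peek, p)
        a_n += n_per_peek
        b_n += n_per_peek
        # Two-proportion z-test at this interim look
        pooled = (a_conv + b_conv) / (a_n + b_n)
        se = np.sqrt(pooled * (1 - pooled) * (1 / a_n + 1 / b_n))
        z = (a_conv / a_n - b_conv / b_n) / se
        if 2 * (1 - norm.cdf(abs(z))) < 0.05:
            return True  # declared a "winner" and stopped early
    return False

false_positives = sum(peeking_experiment() for _ in range(2000))
print(f"A/A tests declared 'significant' when peeking: {false_positives / 2000:.1%}")
# Typically lands well above the nominal 5%, roughly in the 15-20% range
# with ten peeks, even though there is no real difference at all.
```

That familiar statistical malpractice is at least the well-documented part.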
Instead, there’s a whole realm of bad experimentation that merely leaves you wondering “why was there an experiment run in the first place?!”
One example is CYA (Cover Your Ass) type experiments. In those, the product or feature will be released no matter what the test results say. Instead, the experiment is posed as “keeping an eye on metrics to make sure they don’t drop”. As a methodology to make sure that you’re not releasing broken software, it’s fine. But it’s not a methodology for making decisions given collected data, because the decision has been largely pre-determined.
Another example is for teams to constantly run very simple, almost trivial, experiments in order to “win” all the time. Very often these teams are incentivized to celebrate victories and improvements, so they are very cautious about what they propose to experiment on to begin with. If faced with a situation where they could try to redesign the page, or merely make incremental improvements, it is much safer to attempt the latter. Except the conclusion is almost foregone because it’s so “safe”. Maybe there’s a 0.05% improvement, but the business is looking for the next 15% burst of growth.
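To put rough numbers on the gap between “safe” experiments and the growth the business actually wants, here’s a back-of-the-envelope power calculation; the 5% baseline conversion rate is just an assumption for illustration.

```python
# Rough arithmetic with assumed numbers: sample size per arm needed to
# detect lifts of various sizes in a two-proportion test.
from scipy.stats import norm

def n_per_group(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

# Assuming a 5% baseline conversion rate:
for lift in (0.0005, 0.01, 0.15):
    print(f"{lift:>7.2%} relative lift needs ~{n_per_group(0.05, lift):,.0f} users per arm")
# A 0.05% relative lift needs on the order of a billion users per arm;
# a 15% lift needs on the order of fifteen thousand.
```

At realistic traffic levels, the “safe” experiment isn’t cautious so much as unmeasurable.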
Put it all together for some product suicide!
Sometimes, when a product or feature launches and it’s too far ahead of its time, or is just a poor product for the target market in general, adoption is disappointing from the outset. This can lead to yet another variation of data-driven failure that mixes elements from all of the above.
Like with the launch non-process mentioned above, the product is released and everyone watches the metrics — and becomes disappointed. But they stick with the vision, believing that with more tweaks and band-aid fixes, users will finally “get it” and adopt the product.
This “wait for the customers to come” period goes on for a length of time, and while there are minor improvements in the numbers, nothing seriously changes because the vision is still exactly that, the vision, and we mustn’t corrupt it too much while it hasn’t had time to prove itself yet. The fundamentals aren’t changing, just little bits of chrome on top.
More time drags on and interest and product momentum have more or less died off. Ironically, neither of those two things are metrics, so no decisions are made to change them. Since adoption had been so lackluster initially and hasn’t seen meaningful improvement despite all the time and resources devoted to it, people start changing teams and budgets start shrinking.
Finally, in a data-driven pièce de résistance, the steady decline in interest in the product, largely caused by a sustained lack of investment because the initial metrics sucked, is used to justify killing the whole thing altogether. Ideally, you throw a couple of leadership changes into the mix to make sure that the situation gets “re-evaluated from first principles” so the data can be seen without context. The product is declared a failure and given the axe.
It doesn’t really take much to start on this path. Just a few decisions that make sense at the time on how much priority and investment a given thing will get, continued lackluster metrics performance, and you’re in a death spiral. I’ve seen this happen in multiple places, multiple times. Entire products and features can be slowly strangled with the help of data and a bit of organizational amnesia.
There are, of course, many many many more ways to ruin your business while claiming to be data driven. But these are the ones I’ve seen most often thus far.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
New thing: I’m also considering occasionally hosting guest posts written by other people. If you’re interested in writing a data-related post to show off work or share an experience, or you need help coming up with a topic, please contact me.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord.
Support the newsletter:
This newsletter is free and will continue to stay that way, share it with your friends without guilt! But if you like the content and want to send some love, here’s some options:
- Share posts with other people
- Consider a paid Substack subscription or a small one-time Ko-fi donation
- Tweet me with comments and questions
- Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
