Breaking things, fast movingly
This week's post is inspired by the BS I'm reading about how the current US government is being handled by a bunch of incompetent kids. I am so tired.
The cliche of modern big tech is the phrase "move fast and break things". While the phrase came out of Facebook over a decade ago now, it has entered popular awareness and, at least in certain circles, is still a dominant philosophy with regards to software and product development.
One of the primary reasons that a "launch now, fix later" approach can even work is the idea of having short feedback loops so that course corrections can be made before too much damage is done. It's a strategy that only makes sense if the cost of failure is small compared to the payoff of success. The relatively unique cost/reward structure of software – relatively fixed development costs but near-zero marginal cost to sell another copy – is what makes this easy. It's rare to see companies that work in physical goods or hardware attempt similar things because the initial investment costs can be so high.
As data folks, we stand at the forefront of the effort to provide the feedback loops that enable this behavior. By providing instrumentation, metrics, dashboards, and rapid ad-hoc analyses, we're the periscope operator, sensor operator, and sonar room for a submarine moving at high speed in uncharted waters. To a large extent, our efforts define the world that everyone else on the ship sees.
That's why it's so important for us to take care in setting up metrics so as to give a good holistic view of what is going on. While we're often not responsible for the most critical metric of all – revenue as recorded by accounting – most other stuff is within our domain and the metrics we do monitor are usually leading indicators of revenue.
And so, that brings us to the thought of the week... How do we know we're doing things right when we're asked to guide a company doing this?
If we have domain knowledge and prior experience, then the answer to this is very simple – use as much domain knowledge as possible to make educated guesses as to what would be important and go from there. The more experience you have, the easier this is. This is the least interesting answer, but it's the reason why companies will pay people with relevant experience more than someone completely new to a field. Past experience matters a ton.
Putting the obvious answer aside, what are we supposed to do when we don't have direct experience in a given product space and we've been tasked with generating metrics to help keep an eye on a "move fast and break things" project? If you've ever joined a startup early on, often as the first data person, you've probably faced this situation quite a bit.
Borrow some domain knowledge
The first obvious move, since we're lacking in domain knowledge, is to go get access to other people's domain knowledge. Usually, the founders of a company, or the people who hold the vision for what the product is supposed to be, have some idea of what they think success would look like. They're going to have things that they are worried about. While their thoughts and visions will rarely be complete enough to paint a whole metrics picture from, they're a good starting point. At the very least, it makes sure that whatever metrics you do set up align with their expectations, so that you're not going to be asked to add their stuff in later.
But cribbing someone else's knowledge is usually not enough to get the whole picture. Sometimes the vision doesn't even paint the full logical chain of events that needs to happen for the idea to actually succeed. All those ideas do is point to the high mountaintops on the landscape. The rest of the map is still covered in fog, and it's up to us to use what we know about the world, about how the business works, and about how systems work in general to fill in a bunch of blanks.
Filling in the blanks in a metrics landscape requires a fair amount of creativity on our part. You need a rough sense of how metrics influence each other – for example, how items produced at a factory, stored in a warehouse, sold online, and shipped to a customer need to be accounted for in order to get an accurate accounting of both revenue and the cost of the sale. A good metrics map would include not only the revenue, but at least some representative metrics that measure behavior at the factory, warehouse, and the online store, so that we can quickly pinpoint what is going wrong when we notice that revenue is suddenly dropping off a cliff.
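To make that slightly more concrete, here's a minimal sketch in Python of what writing such a map down might look like. Every metric name and dependency below is made up purely for illustration – the point is just that once the relationships are written down, you can mechanically answer "revenue fell off a cliff, what's upstream of it?"

```python
# A toy "metrics map": each metric lists the upstream metrics that feed into it.
# All metric names and relationships here are hypothetical, for illustration only.
METRICS_MAP = {
    "revenue": ["orders_shipped", "average_order_value"],
    "orders_shipped": ["orders_placed", "warehouse_inventory"],
    "orders_placed": ["store_visits", "checkout_conversion_rate"],
    "warehouse_inventory": ["factory_units_produced"],
}


def upstream_metrics(metric, metrics_map):
    """Return every metric upstream of `metric`, i.e. the gauges worth
    checking first when `metric` suddenly starts looking weird."""
    seen = []
    stack = list(metrics_map.get(metric, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(metrics_map.get(current, []))
    return seen


if __name__ == "__main__":
    # When revenue tanks, this tells us which upstream dials to look at.
    print(upstream_metrics("revenue", METRICS_MAP))
```

In real life this lives in a dashboard, a pipeline DAG, or someone's head, but the exercise of actually writing the edges down is where most of the value comes from.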
Try to measure everything (and maybe fail)
I've seen this happen, most often when there is no data person on board to do the heavy lifting needed to set up a metrics plan. "Storage is cheap!" says the engineering team, so just log and store everything so that we can figure things out later. In the meantime, the team will rely on revenue numbers and basic "standard metrics" like active user counts, sales, etc. to keep an eye on things while the team continues to build. Think pure "3 dudes in a basement hacking out a stealth product" energy.
While the instinct to keep an eye on the metrics that keep the business alive is good – namely sales/revenue and whether users are using the product – the strategy of storing everything for later use just delays the work that needs to be done. Now the poor soul who is finally hired to 'make sense of the data' has to go in and see if anything is usable.
The most egregious examples of this behavior are when those teams try to adopt "data mining techniques" (the hot stuff of the early 2000s), and now "AI" (the hot stuff of today), to "find patterns in the data". It rarely goes well, since any of these methods requires having someone pretty skilled and knowledgeable to ask the system good questions and filter out the bad, spurious responses.
Amusingly, even with the 'log everything' philosophy, it's still impossible to log literally everything and still get anything else done. So teams wind up selecting whatever is convenient to log and forget about tricky little edge cases. Ironically, they wind up not having a complete picture anyway, while paying a much bigger data storage bill than if they had sat down and thought about things more.
Guess, no, seriously
Sometimes, given a lack of information, the only thing that we can do is make a guess as to what is going to be relevant. Most businesses actually do this without admitting it. I've lost count of the times that I've joined a business or team that's been around for two, four, eight years successfully making enough money to hire employees including myself, and one of the first projects I'm given is "go help us understand what drives our business". People already at the company have an idea of how their business works and makes money, but they're not sure to what extent they are correct.
The fact that the business has been successful up to this point is a testament to how they managed to guess at a bunch of correct metrics to help them change course as they hit the inevitable snags and bumps along the road. Guessing based on whatever information you have at hand can work, so long as you treat your guesses for what they are – things based on flimsy evidence, subject to revision at the first sign that they're not grounded in reality.
So I guess if you have to move fast and break things, then make sure your metrics designs also move fast and expect them to break?
Still, don't do this with a giant, complex government bureaucracy. 🤦♂️
Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.
Guest posts: If you’re interested in writing something, a data-related post to either show off work, share an experience, or want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
- randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything
Supporting the newsletter
All Tuesday posts to Counting Stuff are always free. The newsletter is self hosted. Support from subscribers is what makes everything possible. If you love the content, consider any of the following ways to support the newsletter:
- Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
- Send a one time tip (feel free to change the amount)
- Share posts you like with other people!
- Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
- Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!