The gingerbread house the kiddo decorated with a friend this year. The curse demanded to be spread. I don't make the rules.

Avoiding Mistakes Via... Powerpoint (?!)

Jan 16, 2024

Data Mishaps Night is coming again on March 7th, 7pm CST, and once again I lament that I haven't done anything in the past few years that would let me give a talk about some catastrophe of my own making. It's not that I don't make mistakes; I still make plenty every year. But I also catch a bunch of my own mistakes as I work, so there aren't any spectacular fireworks of note.

So I guess this week I'll go into how I work with data so that fewer mistakes get through to the point where other people see them. It's not exactly revolutionary stuff, but maybe someone will find it useful.

Stomping down confirmation bias

Most practitioners probably learn this lesson early on: the happier or more excited you get about a result, the harder you need to slam the brakes and make sure you haven't screwed up somewhere. Just as we get paranoid when our code runs on the first try, most of us have learned that we're most vulnerable to overlooking mistakes when we're most excited about a result.

This serves as a nice first line of defense against making public mistakes, because any success is met with a reflexive "OK, I need to double-check this result." But what does double-checking look like?

For me, reading back through my code for errors is a really suboptimal way of checking my work. I just wrote the code maybe 20 minutes prior. It's fresh in my mind; I've already put in all the things I could remember to put in. It's practically impossible for my brain to spot logical flaws this way. It's like spotting the many typos I write into every newsletter post – the only way to find them by rereading is to give the text time to rest and drain from my brain first. I imagine there are people out there for whom such a method works, but it doesn't for me.

Instead, I have to come at the problem from a different angle by looking at my results from the perspective of an antagonistic reviewer.

Talking through everything

I've previously talked about "triangulation" – finding other measurement methods that should confirm my results – as a way to check my own work, but I find that it's very difficult to do. It takes a LOT of thinking, as well as knowledge about the data generation processes and the causal links within the systems. Having a "Theory of How Stuff Works" in your head to triangulate off of is not something you can expect from someone starting out. So what can you do instead?

My other method is... I pull out my slide deck editor or word processor. Yes.

What I'm going to do is write a detailed report that I would present to people who are interested – managers, team leads, executives, whatever. I'm also probably going to write more slides than strictly necessary, with the intention that a bunch of them will wind up in the appendix as the narrative develops.

The reason I go through this exercise is that I'm imagining a very savvy audience with lots of deep, sharp questions. I need to justify every step of the story, including anticipating all the questions they're going to ask. Are we talking about unique users or events? Are these users in the typical segments we discuss, or a group with unique properties? Did we exclude the specially flagged accounts we normally exclude? Are there alternative explanations for this effect, and can you prove that you've ruled them out?

I've found that as I start piecing together the story of "my amazing result" by addressing all the anticipated questions and background build-up... I wind up catching a ton of things I missed. Like, seriously, I will have forgotten so many little details to formally account for that it's kinda embarrassing.

On top of surfacing things I missed, having that internal dialogue also makes me lay out my assumptions – because the key assumptions of an analysis usually deserve to be laid out somewhere. That in turn forces me to question them. Is it really true that all users in the study came from the same geographic distribution and there wasn't some weird mix shift? Is it really true that the price didn't change during the analysis period?
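Some of these "is it really true?" questions can even be turned into quick, explicit checks in code before anything goes into a slide. Here's a minimal sketch of that idea using pandas – the dataframe, column names (`user_id`, `country`, `period`, `price`), and the specific checks are all hypothetical, just illustrating the kinds of assumptions discussed above:

```python
# Hypothetical sketch: encode the assumptions behind an analysis as
# explicit checks before presenting results. All names and data here
# are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "country": ["US", "US", "US", "CA", "CA", "US"],
    "period":  ["before", "after", "before", "before", "after", "after"],
    "price":   [9.99, 9.99, 9.99, 9.99, 9.99, 9.99],
})

# Audience question: are we counting unique users or events?
print(f"{len(df)} events from {df['user_id'].nunique()} unique users")

# Assumption: no weird geographic mix shift between periods.
mix = (df.groupby("period")["country"]
         .value_counts(normalize=True)
         .unstack(fill_value=0))
print(mix)

# Assumption: the price really didn't change during the analysis period.
assert df["price"].nunique() == 1, "Price changed mid-analysis!"
```

None of these checks are clever on their own; the point is that writing the assumption down forces you to verify it instead of trusting your memory.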

Since my whole brilliant analysis hangs on getting all these details and assumptions right, it can all fall apart if one piece doesn't work out. But thanks to the process of writing everything down, I catch these issues long before I show the result to anyone. I get to fix them before anyone knows – thus, no mistakes made here (tm)!

This is very much an extended version of rubber duck debugging. But instead of talking through your problem with the duck, you're trying to explain your findings to it. Forcing your brain to step outside your immediate work situation and into the shoes of a skeptical audience member gives you the fresh perspective needed to catch errors.

Early on in your career, you probably won't have had enough experience with sharp audiences that make you bring receipts for every single statement you utter, so you won't catch all your mistakes with this method at first. But with a bit of practice and experience, you'll be able to start saying "great question, the answer is on the next slide".

It feels really good to be able to do this.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post to show off work, share an experience, or want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything
  • Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.

Supporting the newsletter

This newsletter is free and will continue to stay that way every Tuesday, so share it with your friends without guilt! But if you like the content and want to send some love, here are some options:

  • Share posts you like with other people
  • Consider a paid subscription to pay for the servers and encourage more writing
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!