Our most important skill, spotting and questioning assumptions
Monday morning, I was tinkering with something in the "always in a state of falling apart somewhere" house. In this particular instance, I was running some CAT-6 cable and had to do the terminations into RJ45 plugs.
For those who are unfamiliar with making networking cables from scratch, it's not a super difficult task. You cut the cable to length, untwist the 4 twisted pairs of wire inside, put the wires into the plug in a specific standardized order, and then crimp the plug using a specialized tool. The problem is that the RJ45 connector is pretty compact, and the wires have to actually slot into holes inside in an up/down staggered arrangement to fit in the space. To add to the misery, the lengths of wire you have to deal with can be fairly small and fussy to manipulate with big fat fingers. Today's main post image is from the one plug I was about to finish, all the wires neatly arranged in a clever little guide piece that lets you trim and put every wire into the connector at once.
Assuming you put the wires in the correct order and have enough manual dexterity, making a cable end usually only takes a couple of minutes. But there's one mistake that I pretty consistently make at least once every time I unpack my tools. Take a guess what it is. The offending part is actually in the photo above.
If you guessed "forgetting to put the strain relief boot on the cable before starting everything else" then you must've done this before. The blue molded plastic piece in the photo slips over the connector and helps prevent the cable from bending too much right where the plastic plug housing meets the cable. There's a plastic wedge in the plug body that clamps onto the cable and lots of bending forces naturally accumulate in that area. For cables that get used and abused with multiple cycles of being plugged in, it's a natural failure point and these little plastic boots help make it harder for the cable to bend too much in that one spot. Most annoyingly, to do their job correctly, they have to hug the cable closely in at least one place. That means that if you forget to pop the boot on before going through all the fiddly trouble of wiring things up, you get to do all the work over again. This reliably happens once every time I have to make a bunch of cables.
So, this is a long build up to call out all the stupid little steps that are critical for correctness, but also somehow very easy to forget. Given the nature of analysis work, where everything is effectively a sequence of familiar steps, you'd think that it'd be hard to drop an important one like "check to make sure you have all the data and not just part of it" or "make sure you excluded all the weird 'special user' accounts that always mess up the analysis". It's all those little details that people never remember to write down. Inevitably, in our (ok, my) rush to get something out the door quickly, I make a few assumptions about my data that trip over one or more of the little caveats and wind up being false. I only catch it later when I pause and take stock of what I've done.
The most critical piece of the analysis process is having that moment of breathing room to stop my rushing line of thought, look at my results, and have the clarity to ask "so does this make sense? Could I have missed something important?" The list of things I could have missed varies significantly with the data set and the question being asked. Some questions don't care about certain caveats, while others hinge upon a specific detail.
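As a minimal sketch of what writing two of those forgettable steps down could look like, here is a small Python check for "do I have all the data?" and "did I exclude the special accounts?". Everything specific in it is an assumption made up for illustration: the function name `preflight_checks`, the `day` and `account` field names, and the account names in `SPECIAL_ACCOUNTS`. The real list of caveats depends on the data set and the question, so treat this as a prompt for that pause rather than a replacement for it.

```python
from datetime import date

# Hypothetical placeholders: swap in whatever "special user" accounts
# and date range actually apply to your own data set.
SPECIAL_ACCOUNTS = {"internal_test", "load_bot"}


def preflight_checks(rows, expected_start, expected_end,
                     special_accounts=SPECIAL_ACCOUNTS):
    """Return a list of warnings about assumptions that did not hold."""
    if not rows:
        return ["no rows at all: did the query or export fail?"]

    warnings = []

    # "Check to make sure you have all the data and not just part of it."
    days = {row["day"] for row in rows}
    if min(days) > expected_start or max(days) < expected_end:
        warnings.append(
            f"data covers {min(days)} to {max(days)}, "
            f"expected {expected_start} to {expected_end}"
        )

    # "Make sure you excluded all the weird 'special user' accounts."
    leaked = sorted({row["account"] for row in rows} & special_accounts)
    if leaked:
        warnings.append(f"special accounts still present: {leaked}")

    return warnings


# Made-up sample rows, only to show the checks firing.
rows = [
    {"day": date(2024, 1, 1), "account": "alice", "events": 3},
    {"day": date(2024, 1, 2), "account": "load_bot", "events": 9000},
]
for warning in preflight_checks(rows, date(2024, 1, 1), date(2024, 1, 7)):
    print("CHECK:", warning)
```

A check like this only catches the caveats someone already thought to write down. It does nothing for the assumptions nobody has noticed yet, which is why the "does this make sense?" pause still has to happen.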
What's somewhat scary is that throughout the whole process, I've done all the work, made all the decisions. I didn't become an amnesiac during the whole work process so it was all done with the benefit of all my years of work experience. AND YET, without taking the time to cross-examine my own results, I would've shipped something that would have required some messy explanations further down the line. Moreover, every other senior analyst I know has plenty of these exact moments in their own work. It's practically part of the standard workflow. We all learned to do things this way over many painful years of being burned by making this exact mistake. We are our own worst doubters.
In the age of LLM-based coding aids, this ability to rip our own work to pieces at the level of what assumptions are being made goes from being merely 'important' to 'unimaginably critical'. While I can bask in the glory of not having to write another pivot/sum function from scratch any more, these LLMs make the most obscene assumptions about datasets, seemingly at random. I have traded all the time saved for having to slavishly watch a tool churn through the work and stop it when it starts making dumb analytical decisions. One type of toil has been replaced with a different type of toil. The net gain in efficiency is often fairly marginal.
If the story were just about me doing analysis with aids, that'd be the end of it. But now I have people who are not analysts by training asking me for advice on how to "do quantitative things" with the help of LLM tools. I'm going to put off for another day the rant about why that whole idea is fraught with dangers. Even without going there, every one of us reading this newsletter knows that these folks won't have had those same embarrassing experiences to finely hone their ability to dig into all the assumptions in an analysis. The safety net doesn't exist for them. You could even argue that they're effectively signing up to learn those embarrassing lessons firsthand.
There may be a way out there that teaches people to do analysis work without having to fail repeatedly but I haven't figured it out yet. Maybe there's a method that teaches building cables that will remind me to not leave off the strain relief boot. If we solve one, I'm pretty sure we can make headway in solving the other.
Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.
Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or you want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.
Counting Stuff Official Forums: Discuss posts, or other data topics with the community.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.