Being air-dropped into an analysis

Jun 25, 2024

Long-time readers of this newsletter know that I strongly, and frequently, advocate that much of the value of data people comes from their rich contextual knowledge of the domain they're working in.

As an example, my employers usually hire me because I've been working on software- and tech-related analytics for 15 years. I'm nerdy enough to be pretty darn familiar with both the engineering side and the business and finance side of things. Thanks to all that experience, I can turn out good analysis quickly, and even my blind guesses rest on fairly informed priors.

But today, I want to think a bit about the opposite situation: what do you do if you're essentially kidnapped and heli-dropped into a completely unfamiliar situation and asked to do data analysis with no domain knowledge? Under what conditions can you do good work, and to what extent is that work valid and usable?

I find this question interesting because every day there are people who are asked to do this – new hires, students, people changing careers. The magical power of math allows all of us to proceed with the mechanical aspects of analysis, even if we don't know how to interpret the results yet.

Mechanics as lingua franca

When we're airdropped into a project, these are the things we carry with us, ready to use at any point: our proverbial toolbox of portable knowledge and methods.

The thing we're known for is the math stuff we use. The great thing about the mathematics that underlies most data work is that math is universal. So long as your data fits whatever minimal assumptions the math requires, the calculations will come out right.

It feels trivial to say, but a lot of the fundamental operations we make use of all the time don't have particularly onerous assumptions to uphold. Means, rolling averages, descriptive statistics, percentages, and arithmetic will work everywhere. Slightly more complex things like regression and t-tests fall into the peculiar category of being surprisingly robust against even some assumption violations.
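To make that concrete, here's a minimal sketch in Python (pandas and SciPy, with made-up numbers rather than anything from a real analysis) of the kind of mechanics that travel with you no matter what the column of numbers actually measures:

```python
# The same portable mechanics work on any column of numbers,
# whether they're daily signups, sensor readings, or wait times.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
# Made-up "metric" for two groups -- the math doesn't care what it measures
group_a = pd.Series(rng.normal(loc=100, scale=15, size=200))
group_b = pd.Series(rng.normal(loc=105, scale=15, size=200))

# Descriptive statistics and a rolling average: near-zero assumptions
print(group_a.describe())
print(group_a.rolling(window=7).mean().tail())

# A two-sample t-test: a few more assumptions, but famously robust
# to moderate violations when the samples are reasonably sized
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```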

Then there are the more abstract skills involved in numeracy that we don't think about so much. This is the knowledge of how data needs to be collected and operated on to yield correct answers. It's how we know to look for randomized samples and numeric biases. It's how we know what numbers can and can't be combined or divided, like handling unique counts or operating on weighted averages. It's how we can squint at a bunch of lines in a chart and notice interesting features.
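A toy illustration of that kind of numeracy, again with purely hypothetical numbers: unique counts don't add up across groups, and averages of averages need weights.

```python
# "What numbers can and can't be combined" -- two classic traps.
import pandas as pd

# Hypothetical event log: which users showed up on which day
events = pd.DataFrame({
    "day":  ["mon", "mon", "tue", "tue", "tue"],
    "user": ["a",   "b",   "b",   "c",   "b"],
})

# Daily unique users are mon=2, tue=2. Summing them gives 4...
daily_uniques = events.groupby("day")["user"].nunique()
print(daily_uniques.sum())        # 4 -- wrong as a "weekly uniques" number
# ...but the true unique count is 3, because user "b" gets double-counted.
print(events["user"].nunique())   # 3

# Averages of averages need weights. Two groups of very different sizes:
groups = pd.DataFrame({"avg_spend": [10.0, 50.0], "n_customers": [900, 100]})
print(groups["avg_spend"].mean())  # 30.0 -- ignores group sizes
# The correct overall average weights each group's mean by its size:
overall = (groups["avg_spend"] * groups["n_customers"]).sum() / groups["n_customers"].sum()
print(overall)                     # 14.0
```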

Similarly, our programming skills can often be transferred across different platforms, hardware, environments, and sometimes even languages. These skills are somewhat less portable than the mathematics, but that's rarely a major hurdle.

The missing domain knowledge

The implication of being dropped into an analysis without context is that the analyst has access to limited domain knowledge. As data people, we all know how ineffective we can be when we are faced with problems and numbers with no useful domain knowledge to guide us in our work.

But do our stakeholders know this?

In my experience, sorta? People are generally aware that experience in a domain counts a lot towards someone's effectiveness; this is true for data and for non-data activities. They may not be aware of just how much of a barrier this can be at times. But even the most patient non-data person will eventually get frustrated if an analyst keeps trying to divide numbers that have nothing to do with each other. People can tell when completely useless analysis is being done because it won't fit any of their gut-check mechanisms. It'd be like an analysis pointing out that our most profitable item is XYZ123 and recommending we sell it year-round instead of only in December, without understanding that XYZ123 is a special seasonal winter item. Everyone would wonder whether this person had any common sense at all.

Sometimes stakeholders will be willing to take a chance on an analyst who has no domain knowledge because they think it will be easy to train them up on that knowledge. Some things, like e-commerce shops, are naturally more accessible because we encounter them in everyday life. Meanwhile, you'd be hard pressed to find a hospital willing to take someone with no experience in healthcare data outside of the most entry-level positions.

So one of the skills that lets generalist-type analysts be successful is the ability to somehow absorb domain knowledge faster than people expect them to. We're always going to have to learn all sorts of weird, unfamiliar things along the way, so the perceived value of the analyst hinges on winning this very weird race.

Very good analysts I've met along the way tend to be quick learners, willing to learn through a mix of self-study as well as asking questions of experts. They then translate all the things they learn into their analyses. Obviously, this process involves getting a lot of feedback and help from domain experts, but the analyst has to do the hard work of figuring out how to translate things into metrics and charts that will make sense to those same experts.

Learning ability isn't an innate skill

Thus far, I think I've made things sound like being one of these drop-in analysts is some kind of innate talent or something. I don't think that at all. If this were true, then the whole field of data science consulting would be nigh impossible.

All consultants have their own little niches where they have domain expertise, but even within those niches, every new work engagement requires them to catch up on a lot of the client-specific domain knowledge that is going to be completely new and uniquely bizarre to them. Their knowledge of the broader domain lets them fill in whatever knowledge gaps they have faster than their clients expect. They also have an idea of the pieces they need to do their work, and so they can specifically target their learning efforts to those specific areas.

Normal non-consultant analysts can run a similar playbook for when they get dropped into a project they aren't familiar with:

  • Get good clarity on the scope of work to be delivered
  • Figure out what pieces need to be in place to build up to the final analysis
  • Work with the domain experts to figure out what the flaws in the analysis plan are and where the plan goes wrong
  • Leverage all the data and domain knowledge you have to pull things together, drawing parallels and making educated guesses that get validated by other experts

I focus a lot on the last bit, pulling things from existing knowledge, because I pretty strongly believe that a lot of learning new things involves drawing parallels and metaphors to everything the learner already knows. Remixing old knowledge helps with the stretch to new things. So there's always some kind of route into understanding a new environment, if only you're willing to search for it.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post, whether to show off work, share an experience, or because you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted, so support from subscribers is what makes everything possible. If you love the content, consider any of the following ways to support the newsletter:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!