Many boats in the canal locks in Ottawa. No boarding allowed.

Onboarding and fishing for tacit knowledge

Mar 5, 2024

Data Mishaps Night is this Thursday! (2024-03-07, 7pm CST). Since the event is free and there will NOT be any recordings, I highly recommend you attend and have a fun time. This year I got the OK to make an anonymized summary of the event so expect that to be next week's post.

For some reason, pathfinding is on my mind this week, especially the situation where someone new and without context gets dropped into a team. But the genre of "how do you onboard a data person", both as a manager and as the person being hired, is a pretty crowded space. There are lots of guides and checklists already about the basic things, like for example:

  • Getting all the HR administrative stuff done
  • Getting computers, equipment, proper login credentials to systems
  • Having 1:1 meetings with lots of people
  • Onboarding guides that record where common resources and issues are
  • Good first projects that enable learning
  • Learning how to commit code
  • Learning processes for how things are done
  • and many more...

All that stuff is very important and necessary for someone to onboard onto a team. When items are missing or mishandled, people get lost and have a much rougher time. There's plenty of nuance within all those topics, but today I want to put some thought into a single question – how can tacit knowledge be better transferred/obtained?

Tacit knowledge is the knowledge that's unspoken and unrecorded within an organization. It's all the operational details that people pick up while they work, the small but important things that people just don't really write down into official documentation. For example, who happens to have ownership or knowledge of different projects that are in flight? How does the recommender system work right this moment (as opposed to in the design document from two years ago)? What's important to work on? What's been done and dismissed already?

This sort of stuff doesn't get written down because it's constantly shifting and people don't have the time to write everything down except on rare occasions. Whatever does get written down goes stale.

But without this knowledge, someone new to a team will always feel out of place. It's only when they absorb enough of this knowledge themselves that they feel like they know what their job actually is. In my experience, this process takes about six months... so I always wonder if we could help it along.

Getting to the tacit knowledge

Obviously, tacit knowledge lives in the minds of people and not anywhere else. More annoyingly, people don't really realize when they're using tacit knowledge. To them, they're just doing what they learned to do – it's almost automatic and natural. Getting people to realize that what they do without thought is actually novel is very difficult, which is why it's been impossible to get people to write that stuff down. It's also extremely difficult to get people to recall the knowledge on demand without context – just like how most of us have trouble explaining to relatives how to reboot their internet router by yanking the power cable and not a different one.

In my experience, the most reliable way to get people to share tacit knowledge is to put them in situations where they have to actually use it. By shadowing people and making them go through the actual motions of a task, they're forced to recall and explain things to you.

I think a number of the guidelines for onboarding people incorporate this idea already. For example, many dev teams want people to launch a small amount of code to production within their first day. The reasoning is often framed as letting new people get familiar with the dev environment and process while also helping them feel productive and like part of the team. The perspective is focused on the new hire. But getting new hires to launch quickly also forces the existing dev team to make their existing launch process easy enough that someone new can work with it in a couple of hours. As they improve the process towards that goal, they're documenting or at least showing new hires what they should be doing. Thus, the tacit knowledge gets transferred down.

Since data science teams often take best practice cues from software engineering, we hear about programs where new hires are given access to data and tooling as quickly as possible so that they can start learning about data. But unless you're working with ML models or data engineering, data science often doesn't have a "production" to make small launches to. Instead we have very private activities like writing SQL or working on some bespoke analysis code, both of which are tools to get us to our actual outputs. So these intermediate steps don't get shared very much because they're not "presentable".

So very often, newly hired data scientists have to learn through trial and error. They look at existing analyses and queries, they talk to domain experts in 1:1 meetings to try to extract the details they need, they get results that then need to be checked by someone with more experience. It's a very slow process.

So I wonder if it wouldn't be better to just borrow the shadowing idea from software engineering and do it more deliberately? Why not have the new hire shadow a senior person as they work on projects, with the senior person narrating what they're doing so that the new hire can pick up random bits of tacit knowledge? Surely it would surface the most relevant details quickly.

There's also going the other direction of having the senior person very closely follow along as the new hire works on their first few projects. Many onboarding programs have a buddy/mentoring component that incorporates this work and it can be quite effective.

But where else can we go for info?

While shadowing, buddies, and asking questions along the way seem like a good thing to do, that still doesn't provide visibility into everything else that a person needs to know in order to work effectively in an organization. About the best one can hope for is to be introduced to various points of contact – go to this person for Engineering related stuff, that person for Product stuff, another person for Strategy, and so on. This is probably the basis for the "have lots of 1:1 meetings with people" advice in onboarding.

Maybe it's just me, but being poor at social interaction usually means that I don't get as much out of these 1:1 meetings as I should. I've been wondering whether the problem lies completely within myself (meaning it's difficult to fix) or if I'm just going about it with a poor strategy. Upon much reflection, I think these are the ways I'm doing things wrong and where I should look to improve.

  • The initial 1:1 conversations happen very early on (often within the first month) and aside from learning who the person is and roughly what they do, I don't have enough context to ask intelligent questions
  • I rarely follow up with these same people later on, when I actually do have enough context to ask good questions, because we're all busy and I'm not "new" any more.
    • Should just schedule followup conversations 3mo in advance by default
  • Either way, the conversations usually follow the same topics:
    • Introductions
    • What people are working on
    • How that person imagines I can help them given what little they know of what I can do
  • We SHOULD really be talking about the actual problems they're trying to solve, so that I can keep them in mind while learning. But I forget to ask.

Making improvements in this area would probably help in feeling less lost and overwhelmed sooner... probably... But then I'm sure that other things would keep coming up. Onboarding is just hard stuff...


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything*

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted, so support from subscribers is what makes everything possible. If you love the content, consider supporting the newsletter in any of the following ways:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!