We have DAGs for our data, but also for our work
Reminder! Call for speakers for DataBS-Conf ends on July 31st, so get yours in before something comes up to distract you!
If you've ever taken a cooking class, one thing they teach you very early on is to set up your mise en place, a French term for "everything in its place". In practical terms it means that before you cook a dish, you have all your ingredients prepped and placed in easily accessed positions before you turn on the heat and do the actual cooking. The practice prevents you from making common beginner mistakes like putting an item in a hot pan to brown, moving to another task like cutting the next ingredient, misjudging the time, and coming back to a burnt pan with ruined ingredients. This is obviously important in a professional kitchen since tasks are done by different people, but it is often recommended to home cooks as a way to minimize chaos and mistakes.
But as people who cook at home gain experience, they start learning which tasks take longer than others, and this eventually allows them to "cheat" somewhat by finding places where they can slip in some prep or cleaning work during the down time of recipes. Get even more experienced at things and people can juggle the steps of multiple dishes at once, interweaving steps for unrelated dishes to lower the overall time needed.
You're probably seeing where I'm going with this. Cooking processes are effectively DAGs, and we humans can, with practice, get extremely good at tweaking the execution of these complex DAGs through just the use of our mental heuristics and experience. In fact, the way we manage DAGs in our daily lives is significantly more nuanced and complex than most of the DAG tools we use to control our data systems. Our data pipelines are these fragile snowflakes where the slightest unexpected issue breaks things unless we layer a lot of error-absorbing logic into their execution. Validation, tests, retries, and ultimately messaging upon failure are all things we layer on top of our data pipelines to make them work. But at least our data DAGs are software engineering problems – something we can learn to engineer around.
Meanwhile, much of our actual work during the day is this messy web of tasks across systems, technologies, people, and teams. There's dependencies, wait times, coordination, and always the chance we'd forget things in the chaos. Why aren't THESE modeled in some kind of DAG? Wouldn't having some system to keep track of all these ongoing tasks, blockers, and the dependency web between everything make it easier to figure out how to get stuff done?
I'm sure the project managers among us are screaming that we do actually have systems, but most people refuse to use them. Yup, I'm talking about whatever work tracking/ticketing system your particular organization uses. The ones that most of us absolutely hate. Maybe it's the infamous JIRA. Maybe it's a bug tracking system like Bugzilla. Maybe it's some horrible Excel sheet of doom that really should be taken away from whoever insists on using it. It's pretty rare for groups of people to work together without at least some rudimentary system of coordinating work. The only places where such information is kept completely in human memory tend to be dysfunctional, but many of us attempt to get away with doing as little as possible.
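To make the "cooking is a DAG" idea concrete, here is a minimal sketch using Python's standard-library `graphlib`. The breakfast recipe and its step names are made up for illustration, and a real cook juggles timing and attention in ways this ignores. It only groups steps into rounds where everything in a round could be done side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical breakfast recipe: each step maps to the steps it depends on.
recipe = {
    "chop onions": set(),
    "crack eggs": set(),
    "heat pan": set(),
    "brown onions": {"chop onions", "heat pan"},
    "cook eggs": {"crack eggs", "brown onions"},
    "plate": {"cook eggs"},
}


def cooking_rounds(steps):
    """Group steps into rounds; every step in a round has all its prerequisites done."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the recipe is not a DAG
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps that are currently unblocked
        rounds.append(ready)
        sorter.done(*ready)  # mark them finished so later steps unlock
    return rounds


rounds = cooking_rounds(recipe)
for number, batch in enumerate(rounds, start=1):
    print(number, batch)
```

The first round is basically the mise en place: all the prep that depends on nothing else. The experienced cook's "cheating" amounts to starting a later round while an earlier one is still sitting on the heat, which this simple sketch does not model.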
But today I'm not trying to harp on the fact that most of us don't like using these systems because the overhead of inputting our projects and statuses is very painful. Data folk tend to have very full plates since there are often fewer of us than other functions, so we're spread very thin in terms of time. Today I want to remind everyone that there is structure behind the endless deluge of work that we face regularly. If we're too busy dealing with the most urgent issues of the day, we can become like those beginner home cooks who wind up burning the eggs while trying to prepare the rest of breakfast.
So yeah, we need to pay attention to our work-DAGs.
Paying attention to your work structure
Whether it's pipelines or real life work, things are still structured the same. There's tasks, states, inputs and outputs, and relationships between everything. While you may think that you have everything in your head on any given day, given our faulty human memories, it's always good to occasionally stop and just take stock of what is going on currently – to draw out on paper the actual DAG of your day of work. The goal is to dump all the knowledge you have in your head about the state of everything and make it so you don't have to use your working memory to evaluate it.
Like with many organizational tasks, the important part is that you sit down and actually do this exercise and work out what depends on what. The end picture is probably going to be incomprehensible to anyone but yourself, but that doesn't matter. By the time you write things down, you'll be able to see places that don't seem to make sense. If you're anything like me, you'll realize that something had become unblocked ages ago but you'd forgotten about it thanks to a thousand other things that came up.
Most importantly, you should be able to see the reflection of your work processes in your work DAG. You'll probably rediscover the team that's blocking you, find stuff that's been overlooked, and spot things that might be in an inconvenient order. It'll also likely give you an idea of what might be a failure point or bottleneck. Whatever applies to your situation.
Learning what actually works for you
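If paper isn't your thing, the same exercise can be sketched in a few lines of Python. Everything here is hypothetical: the `Task` class, the task names, and the idea of scoring a bottleneck by counting how many unfinished tasks are waiting behind it are just one way to do it, not a recommendation for any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    done: bool = False
    depends_on: set = field(default_factory=set)


def ready_tasks(tasks):
    """Unfinished tasks whose dependencies are all done (the forgotten unblocked stuff)."""
    by_name = {t.name: t for t in tasks}
    return sorted(
        t.name
        for t in tasks
        if not t.done and all(by_name[d].done for d in t.depends_on)
    )


def downstream_counts(tasks):
    """For each unfinished task, count the unfinished tasks that transitively wait on it."""
    dependents = {t.name: set() for t in tasks}
    for t in tasks:
        for d in t.depends_on:
            dependents[d].add(t.name)
    finished = {t.name for t in tasks if t.done}
    counts = {}
    for t in tasks:
        if t.done:
            continue
        seen, stack = set(), [t.name]
        while stack:  # walk everything reachable downstream of this task
            for child in dependents[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        counts[t.name] = len(seen - finished)
    return counts


# A made-up work-DAG for one week.
tasks = [
    Task("get warehouse access", done=True),
    Task("platform team fixes export"),
    Task("backfill events table", depends_on={"get warehouse access"}),
    Task("rebuild dashboard",
         depends_on={"platform team fixes export", "backfill events table"}),
    Task("answer exec question", depends_on={"platform team fixes export"}),
    Task("write quarterly report", depends_on={"rebuild dashboard"}),
]

ready = ready_tasks(tasks)
counts = downstream_counts(tasks)
bottleneck = max(counts, key=counts.get)
print("can start now:", ready)
print("biggest bottleneck:", bottleneck)
```

In this toy example the backfill shows up as something you could have started already, and the other team's export fix is the thing the most downstream work is waiting on. The structure matters more than the code, so a whiteboard works just as well.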
So the problem with systems like JIRA or other things is that teams typically require a TON of documentation. You have to think of a title and have an accurate description of the work. There's priorities and severities and dependency codes. There's epics and stories and all sorts of other things which don't really affect how you see your work, but affect how others understand your work. But if you're in a state where you're getting away with barely doing any of that anyways, then this is also an opportunity to figure out what actually works for you.
Maybe you're like me and only really want a short reminder in the issue title. Or maybe you find out you lean really heavily into using date fields like 'due date'. Or you're just a stickler for setting priority codes right so that you can tell people their request just isn't important enough. By seeing how you organize and visualize your work-DAG to yourself, you learn quite a bit about how you prefer to organize this information.
If you've ever worked even briefly within an organization that requires the use of these tools, however cursory, you'll notice that there's almost never any guidance for how to use them. There's requirements and guidelines that are imposed by leadership, but there's inevitably a huge amount of leeway given to individuals. No one uses these tools in the same way even when they follow the same set of rules. In fact, listening to anyone else's description of how they use the tool usually sounds like madness because you would never see the world in the same way as another person. This is one of the few places in modern corporate work life where everyone seems to agree to completely disagree on how to apply the tools.
So the best thing to do is to stop trying to align your work patterns to the views of others and figure out what works for you specifically.
Then you can get back to watching your pretty work-DAG get crashed by some executive who happened to ask "an innocent question" in a meeting on Monday.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
- randyau.com — homepage, contact info, etc.