As a mostly ad-hoc'er, I've got workflow issues

May 20, 2025

I have a confession. I completely missed the boat on the whole dbt and analytics engineer fad that went super hot and then burnt out faster than a cheap firework in a low interest rate bubble. It's not that I find anything fundamentally wrong with the concept of shared metrics definitions made in code and checked in. It's not even that I object to recording and offloading complex metric definitions to something outside of my brain. It's that as a UX researcher and "person who does primarily weird ad hoc analysis" there's a very strong tension between how much I leverage pre-existing code and my effectiveness writing new code.

It's probably easier to explain with an example of how I choose to work instead of describing it in the abstract. If I'm asked to pull some numbers to answer a question, I'll pull out a SQL client to start since that gets me what I need. I might pull out a reference file that has answered a related question and can act as a reference point. But even with those aids, I write the query out by hand with little to no copy pasting. At best, maybe code completion picks up that I keep typing the same patterns of strings.

When I say I write the query out by hand, I mean it quite sincerely. I am typing it in as if I'm writing it from scratch. There are a few reasons I do this.

It forces me to stay very familiar with the query and tables involved, to the point some of it becomes muscle memory
It keeps some of the most common patterns of conditions in my brain instead of in a doc somewhere
It keeps me familiar with typical patterns of solving certain common problems in SQL

It should be obvious that I place a very particular premium on keeping at least certain fundamental details fresh in my mind through endless repetition. This is in spite of the obvious costs involved in keeping these details in my memory, and how it's slower to rewrite common bits of SQL from scratch. It's also obviously easier to make mistakes since I'm very much a fallible human who is also constantly short on sleep.

But despite the efforts of well-meaning folk who have tried to introduce me to new and "better" workflows over the years, I still keep coming back to variations on what I have now. Because I know that if I don't put in a minimum baseline of reps as exercise, I'd get stuck when asked to pull some new weirdo thing that's never been queried before despite the data existing for years. It's like doing my daily climb up the stairs in the office instead of taking the elevator.

Trailblazing vs paving

When it comes to data analyst work, we are typically either doing weird ad-hoc analysis work or repeatable infrastructure work. We can switch between the two quite frequently, but I can't think of a situation where we are doing both at the same time.

For many data teams, work often starts with a request that has a lot of messy unknowns involved. There's data exploration, cleaning, and hypothesis testing. There are frequently dead ends as we learn just what our data is capable of because we don't know any better. The whole process is a hectic mix of clever hacks and fanciful hunches. Eventually, when we get lucky and find a story that resonates with the audience and has an impact on how decisions are made, we take the time to clean up the mess we made while exploring to build something sustainable.

From there, all the considerations are different. Paving a long-lasting road is a much different skill than hacking a path through rough terrain. Many of the things that dbt helps solve for (code reuse, shared metrics and models, layering analyses on top of one another) are downright necessary for sustained data infrastructure building.

And so, as someone who happily spends most of their time "out in the wilds" exploring, I constantly do things that seem inefficient at best, sloppy at worst. Like I mentioned, all but my most complex queries are written from scratch most of the time to keep the quirks fresh in my head. The little "SQL snippet doc" I keep on hand (everyone's got one of these, right?) is just a collection of old queries that serve to remind me how certain tables work and how certain properties are measured – I fully expect to have to read and rewrite those queries to fit my new problem without doing much copy/pasting. I still habitually look at raw rows of data even though I've seen the same table hundreds of times because I'd otherwise forget some of the more esoteric fields in them. All my code is considered disposable until proven otherwise, because experience has long taught me that this is the case. I'm very much at home in this form of chaos.

Not just "a guy's quirky workflow"

Look, I admit that I have a quirky workflow that I would not recommend to most people... or anyone really. Let's not get into how I regularly use vim for most of my small coding tasks.

The point I am trying to make today is that I've seen multiple people see the "analytics engineer" bubble zoom by and wonder if they were legitimately "a data person" because they saw no real use case for that stuff in their work. They weren't ever defining a recurring dashboard job or populating a data warehouse. Same went for the very sustained "ML/AI engineer" boom that happened over the past decade or so. It's all imposter syndrome fodder.

Folks in roles similar to mine, who spend most of their time doing ad-hoc work answering relatively unexciting bespoke questions with hand-crafted artisanal queries, are just chugging along in our niche while tech advances pass us by. The most excited we get is when there's a new useful function added to SQL or our preferred plotting library gets easier to use.

Very few of us stand up to say "so I managed to find this interesting pattern that better answers this burning business question, but it's totally unique to our specific business". Every day this is happening somewhere, but there's no one to share that success with. It's just like all the folks who quietly run linear regression models, bandit models, use Excel or Hadoop, or that one guy who's managed to slip an SVM into production somewhere in defiance of what everyone else seems to think about the method. Our peers aren't visible.

Even worse, this sort of invisibility is self-reinforcing. If no one publicly celebrates changing some executive's mind with the power of an Excel table and line chart, other people won't think to do it. Who's going to cheer about the 300 line SQL query of utter despair because whoever designed the database made a horrible design decision?

I want to cheer about this stuff. I just don't get to see it. And that makes me sad.

Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing something, a data-related post to either show off work, share an experience, or want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

randyau.com — homepage, contact info, etc.

All Tuesday posts to Counting Stuff are always free. The newsletter is self hosted. Support from subscribers is what makes everything possible. If you love the content, consider doing any of the following ways to support the newsletter:

Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions, get access to the subscriber's area in the top nav of the site too
Send a one time tip (feel free to change the amount)
Share posts you like with other people!
Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
