Making "intermediate+" content

Dec 3, 2024

This newsletter largely came about because there's never an end to content for beginners, while I could barely find anything interesting to read in that constant content stream.

I didn't want to learn how to study for a data science interview nor read another description of what the job is. I certainly didn't need another cursed Venn diagram explanation of SQL joins. There's a big gap between what I want to read and the beginner stuff that lots of influencers and content mills love to churn out. There's always an endless supply of new folk who want to consume content aimed at beginners, especially when data science work is still a highly paid position. That population creates a market that generates the traffic and clicks that drive the modern web publishing ad industry.

So, hey, be the internet you want to be and all that jazz, right? After getting lucky and having a couple of my early posts go mini-viral on data Twitter in 2019, I decided I was going to write and think about stuff I find interesting – that meant stuff that wasn't aimed at beginners.

But one issue you quickly hit upon when you strive to create stuff meant for more advanced, intermediate+ folks is that it's very tricky to get right because there is no objective definition of what that actually means. We all have a rough sense of what beginner material is – it's the content that assumes absolutely no prior knowledge or experience. It's the endless "hello world!" of writing. But if you're aiming to not write for that segment, you run the very real risk of being "too advanced" for whatever small audience you are aiming at. Anyone who's used to writing for peers, as in academic papers, and suddenly has to write a pop-sci blog post probably feels this pain.

So while the internet is in desperate need of more intermediate+ stuff, it's hard to pull off. I want to try to make it a little easier, since I've been doing it for a while now and have internalized some stuff.

Goodbye, white lies

I think over the past 4 years of writing, I've slowly come to a personal definition of what non-beginner content is like. I think that "intermediate+" content is when you stop telling as many white lies to your audience. Let me explain.

Know how when you have a new person join your team and you have to sit down and explain how the whole tech stack works? Here are the main data tables, here's the list of status codes, etc. You know full well that those descriptions are full of white lies and simplifying statements just so you can fit the whole picture into a single whiteboard or slide deck. Yes, our web site is just PHP fetching from MySQL running on nginx – just ignore the caching layer, the Redis KV store, the recommendation pipeline, the order reconciliation job, the bug we introduced last week, the refactor we are planning for Q2. All those disappear for the sake of getting that first level of understanding down. The expectation is that in later explanations, more detailed introductions to specific elements will happen as needed.

Every "intro to $language" tutorial does the exact same thing, which is why I find them so frustrating to read nowadays. Things are so sanitized and neat that I struggle to build the mental models needed to handle myself when things stray even the slightest from the curated path of a tutorial.

So my gut feeling is that intermediate content starts with deciding to selectively peel away one or more white lies to show the complexity therein. You might be replacing one big white lie with a series of smaller white lies, but you're bringing more nuance and complexity to the reader. More importantly, by explaining those nuances, you're helping the reader learn how things are put together better.

And that's why I use this process as the first mark of intermediate+ content. It's one of the earliest signs of "hey, you're not a rank beginner now and we think you can handle this extra complexity". At some point, if you peel back enough of the white lies, you're going to get into "advanced" territory, which is usually when you stop lying altogether and go deep into the nitty gritty details of why things do or do not work.

The more obvious it is, the more useful it is

The next thing I hear a lot from people who would like to write about something but are having trouble coming up with a topic is that nothing looks "at the right level" to write about. This is primarily because we all naturally judge the difficulty of things based on ourselves. Things we can't understand are probably "advanced". Things that we understand very well are probably "basic". Intermediate is a very thin slice of knowledge that is passing from one state to the other.

My advice for finding good intermediate+ topics to write about is to write about something that seems downright obvious to you. For example, take something that sounds very beginner-ish like version control. You could write a strong intermediate+ post about version control by just showing how and why you use version control to be productive. Maybe it's just tips and tricks, or maybe you take advantage of various hooks for automation. Maybe you just recently screwed up your repo and want to show how you managed to recover things. All of those topics would peel back the basic white lies of regurgitating commands like git add * && git commit -m "some stuff".

I often tell the story of a friend of mine who gave a talk that was summarized as "your QA teams should talk to the dev teams early and often to give feedback so both teams are more effective" – perhaps one of the more obvious observations about creating good software quickly. But my friend was speaking at an education/tech conference where waterfall and bureaucracy were the norm, and so there was a huge positive response from the audience because many hadn't considered the idea. Good ideas that are obvious in one context are very often not obvious in others (they might not work in other contexts, but that's a debate for another day).

Balancing context and content

The final tricky thing about writing intermediate+ content is that you have to be very aware of what shared context you are leaning upon. If you are too far off the mark with that estimation, your audience will be extremely frustrated when reading your work. They're going to think it's too difficult to follow and it's probably their fault for not being "advanced enough". I think it's very common for people trying to write more advanced content to mistakenly assume everyone knows what they're talking about, so a somewhat intermediate piece becomes a highly advanced piece.

Since I write for a pretty broad audience, I actually assume that there's very little shared background across all readers. This is because there's no real standard educational background for doing data work. I'm sure most people have at least a passing understanding of what "significance" might mean, but some people will know the actual theory behind it while others will only know the definition from their stats 101 class. (Incidentally, I lean more towards the latter than the former.) The same applies to skill with programming, SQL, or even spreadsheets. And just forget about domain knowledge.

So I wind up explaining a lot of things briefly to make sure my readers have at least the basic context in their head (or in another tab) to understand what is going on, especially for the core concepts at play for a given post. It's one of the factors (my general wordiness aside) that contribute to my typical 1500+ word post length.

My feeling is that this delicate balance, guessing what context people are likely to have while providing the necessary explanations at the level of detail or brevity needed to keep the post going, is the craft and art part of writing intermediate+ content. No matter what you do you're going to wind up leaving some people in the dark and including others, but it's up to your skill as a communicator to determine how big a group you manage to include while keeping all your other writing goals in line.

Another thing to remember is that what context you need to share is also a moving target over time as audiences and context windows shift with the ages. I definitely knew I needed to explain what the heck a webring was because no one under the age of 35 had likely seen one before. The same is increasingly true for explaining MapReduce, Hadoop, or SQL before the age of analytic window functions. Who knows what will be collectively forgotten by everyone, leaving only shadows in your mind. You have to constantly pay attention to where people are at with terminology and methodology.

But hey, as a practitioner, that's part of your normal work anyway. So you have little excuse to not write intermediate+ material for intermediate+ peers to learn from.

Oh yeah, incidentally, as noted below, I'm always happy to share guest posts written by anyone who wants to give it a shot. I act as a sort of editor and give advice and comments as needed. So always feel free to let me know if you have an idea or proposal.

Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post, whether to show off work or share an experience, or you want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything

All Tuesday posts to Counting Stuff are always free. The newsletter is self hosted. Support from subscribers is what makes everything possible. If you love the content, consider any of the following ways to support the newsletter:

Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
Send a one time tip (feel free to change the amount)
Share posts you like with other people!
Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
