Data standards and (the lack of) market forces

Mar 13, 2025

Thanks for supporting the newsletter! It's... been a busy, hectic week... and I'm not even talking about what's going on in the news.

I've been trying to cobble together a functioning office to work out of again and have a giant pile of screws loose, both on my desk and in my head. Amongst all the home fixing stuff going on, I've been trying rather unsuccessfully to find an inexpensive, ready-made desk to serve as a sorta mini workbench and storage area. The problem is that there are, quite literally, no real standards for this sort of thing.

I occasionally complain that there are barely any standards in the data world and that we could use some to make our lives collectively easier. Desk manufacturers, meanwhile, have all decided that within certain boundaries of human usability, they will do whatever they want. Some make desks with dimensions that are a whole number of inches. Others make theirs in bizarre fractional inches that were most definitely derived from a whole-number metric dimension. Desks that sorta look the same can vary in subtle, and not-so-subtle, ways across various dimensions and features. Given that I need a table that fits within a narrow range of lengths and depths, it's been a super frustrating experience. It's almost enough to make me want to just build something custom, but doing so costs many times more than buying a mass-produced factory item.

But anyways, standardization is on my mind! More specifically, a while ago I was chatting with Q McCallum on Bluesky about how standardization is sorely lacking in the data world, while the construction world has lots of standards for physical objects that entire industries have gathered around, even though adoption should be much harder to spread for physical goods. At the time I largely agreed that physical construction goods have a relatively clear set of standards. Out of the infinitely many different screw threads or lumber dimensions that could be cut, the "normal stuff for a given purpose" found in construction yards across the country can probably be counted on your fingers. When you think about it, the market forces pushing toward that level of standardization must have been gigantic to spur such widespread adoption.

And so, the argument goes, software just doesn't face the same economic pressures to rally around a standard because making new and different software with a completely different API is "cheap" – just write whatever code you want. There's no risk of buying tooling that uses one standard and can't be used to make goods of a competing standard. There are no setup costs, training costs, or materials costs involved. Go ahead and run DuckDB, have it replicate data off a janky CSV pipeline that's based on FTP. When you hit trouble, just write some shim code to bridge the feature gaps. So much easier than having to read a standards doc and make sure you're compliant.
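To make "just write some shim code" a bit more concrete, here's a rough sketch of the kind of throwaway glue I mean. Everything in it is made up for illustration: the column names, the alias table, and the list of date layouts are assumptions, not anything from a real pipeline. It only uses Python's standard library.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from each vendor's column names to the names we want.
COLUMN_ALIASES = {
    "Cust ID": "customer_id",
    "cust_id": "customer_id",
    "Order Date": "order_date",
    "date": "order_date",
    "Amt": "amount",
    "amount_usd": "amount",
}

# Date layouts we assume might show up; extend the tuple as new ones appear.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known layout and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_rows(csv_text):
    """Yield dicts with canonical keys from CSV text with vendor-specific headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        out = {}
        for key, value in row.items():
            if key is None or value is None:
                continue  # ragged row: extra or missing cells
            canonical = COLUMN_ALIASES.get(key.strip())
            if canonical is None:
                continue  # drop columns we don't know about
            out[canonical] = value.strip()
        out["order_date"] = parse_date(out["order_date"])
        # Strip currency symbols and thousands separators before converting.
        out["amount"] = float(out["amount"].replace("$", "").replace(",", ""))
        yield out


if __name__ == "__main__":
    sample = 'Cust ID,Order Date,Amt\n42,03/13/2025,"$1,200.50"\n'
    for record in normalize_rows(sample):
        print(record)
```

It's maybe forty lines and it works, which is exactly the problem: every new vendor quirk means another entry in a table like `COLUMN_ALIASES`, and nobody ever has a reason to agree on one format instead.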

But now, while in the middle of desk shopping and pacing around my room with a tape measure, I know the truth. Standards require market incentives to adopt. In the physical world, the promise of parts standardization, economies of scale, and a wide market for your product gives a very strong incentive to share standards and maybe even suppliers. When you get into the bottom-tier "particleboard on metal frame" desks, there's almost no incentive to do stuff the same way as the competition. Instead, the manufacturers seem to be using parts and dimensions that are convenient (and cheap) for them. So woe to the consumer who wants to get stuff to fit into oddly shaped configurations with varying size constraints.

And woe to the data engineer who's gotta write a thousand bits of shim code to translate from one API to another, or to swap a format. I don't think we're likely to see that change for a long, long while.