measuring <=> important ( =\ )
This bi-week's subscriber post is something of a rant that needs some more baking in my brain to be a mainline Tuesday post, but it's tangled enough that it needs to get outside my brain a bit. Think of it as a pre-draft draft.
I've mentioned multiple times in many places that in terms of "how much data to collect?" my answer is always going to be "exactly what you need to use now, plus the little extra you expect to actually use in the near future, nothing else". And, for years and years, that's been the guiding principle with which I've directed data collection questions in multiple companies. The folks I normally work with are often on board with that philosophy because it beats paying for terabytes of unused data that slow everything down and sit there vulnerable to hacks or whatnot.
But, no guiding principle is perfect. My principle is essentially balancing the many complex tradeoffs involved in data collection against the needs and abilities of the analyst tasked with using the data. An experienced analyst has a pretty good idea of the kinds of questions they're going to be asked around a certain problem and makes sure to collect the data to cover all those bases. We also know what data we'd like to have in our back pockets that won't be needed day-to-day, but may be very important when things go very wrong. So it's a restricted analytical kitchen sink instead of the literal one.
But I've seen multiple instances over the years where this sort of system goes wrong. At some point, things get distorted from "we measure this stuff because it is important to us for $use_case" to "we measure this stuff, so it must be important". Some causality arrows get turned around, some caveats evaporate into the collective memory-hole. Then you get really annoying pathological cases where people start asking "what the metrics are" for things that are instrumented and monitored for "shit hits the fan" and exploratory purposes, not for "we need to optimize this" reasons.
Here's a concrete example: