A fluffy pile of melting snow representing my sanity the past 6mo

When data (and rationality) got in my way

Mar 19, 2024

For the past 18+ months, my wife and I have been trying to navigate the ridiculous reality that is one of the most unaffordable housing markets in recent memory. To put it bluntly, it has been a soul-grinding experience. In a hyperrational world, we really should have tried to stay put and wait out the absurd combination of pent-up demand from COVID's peak era and the extremely low number of houses going on sale. But kids, school districts better than the very poor one we currently live in, and the inevitable march of time don't care about market conditions, my sanity, or my bank account.

Against this background, while sometimes lying awake in bed out of an abundance of either adrenaline or cortisol, I often marveled at how the experience, which you'd think would be a highly data-driven process, was in very many ways NOT a data-driven process.

Real estate transactions are essentially a game between a host of parties that are forced to act with unequal information. Depending on what's going on in the marketplace, knowledge about what houses are being sold at what prices, how many people are buying or selling, and a host of other statistics all work together to provide valuable information for determining a fair market price. Having this knowledge is extremely powerful, which is why realtors set up their private MLS (Multiple Listing Service) networks to share listing information of properties. It's also why huge companies like Redfin and Zillow exist, slurping data from all sorts of places, both private and public, and then using the data as a basis to provide value-added services.

For the first year or so while house hunting, we essentially dove headfirst into the data of properties being sold. Since only 0-5 houses a week were popping up that fit our required parameters while staying within our budget, it wasn't too difficult to stay abreast of things. We'd see a 1400 sqft house go for this price, an 1800 go for another, an old beat-up fixer go for a certain price bracket (likely to a developer), while the fancy post-flip houses went for another. Homes in one area were extremely valuable due to the schools, while homes just a street over had as much as a 40% discount because they weren't in that school district. We had to learn how to guesstimate renovation costs to figure out whether we were over-bidding on a fixer or not. There were dozens of parameters we had to learn the nuances of, so you'd think that a complete data nerd like me would have an advantage.

Ultimately, being too data savvy was actually a problem.

On countless occasions, we'd put in what we believed was a competitive bid on a property based off our overzealous understanding of all those parameters we learned... and we'd constantly get blown out of the water by someone willing to pay significantly more. Worse, at the time we put our bids in, we'd be told we were decently competitive, but then there'd always be a much later bid that'd win out. I hate being on roller coasters in general, but I didn't have a choice with this one.

It's possible that our estimates of renovation costs or a bunch of other factors were off. How much do you discount a property for being in a 100-year flood plain, for instance? But I suspect that there was a more subtle reason: our mental models of pricing were just unable to keep up with the rapid development of the market. House sales take multiple months to close and get recorded in public documents. By the time we saw that price, there had been a 3+ month lag and the market had completely changed.
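To make the lag concrete, here's a minimal sketch with made-up numbers. It assumes prices drift at a constant monthly rate, which is a simplification for illustration and not something I measured in my market. The function name `stale_comp_gap`, the $600k sale, and the 1.5% monthly drift are all hypothetical.

```python
def stale_comp_gap(recorded_price, monthly_drift, lag_months):
    """Return (estimated current price, gap vs. the recorded price).

    Assumes prices compound at a constant monthly rate, which is a
    simplification used only for illustration.
    """
    current = recorded_price * (1 + monthly_drift) ** lag_months
    return current, current - recorded_price


# Hypothetical numbers: a $600k recorded sale, 1.5% monthly drift, 3-month lag.
current, gap = stale_comp_gap(600_000, 0.015, 3)
print(f"Estimated price today: ${current:,.0f} (comp is stale by ${gap:,.0f})")
```

Under those assumed numbers, the publicly recorded price understates today's price by roughly $27k, which is plenty to lose a bid you thought was competitive.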

In order to be truly competitive, the only way was to dial back our reliance on the data aspect. That's not to say the data and information are useless; they still put boundaries on what is out of line. But more important than the pure stats alone was the fact that the same pool of ten, maybe fifty, buyers were all looking at the same tiny pool of houses coming on the market and making pricing decisions based on that fact. The game wasn't simply about whether you could give an offer that the seller would accept. The game was actually about whether you were willing to bid high enough against the pool of buyers actively putting in bids, and you had to guess at this number range.

Knowing what kind of game you're playing is critical here, because it can completely change your bidding strategy. We were competing against a faceless group of bidders, but probably the same ones bidding on the previous house (minus whoever bid the most last time). There's precious little information available about this mysterious, ephemeral group, which is probably why we had so much trouble for so long. A three-month lag in seeing a "sold for" price was useless because the mix of bidders had changed too much and the market conditions had nudged prices higher or lower. Instead, we had to change our tactics and start scratching for information, like asking the listing agent how many bids were coming in and how quickly, and judging how much interest there was by watching other people visiting the property at open houses. Even knowing that real estate agents can be as dishonest as they want with any statements they make, we needed to at least get a sense of the vibe in order to factor it into pricing strategy.

In more normal, not-frozen markets, this is usually less of a concern. You're competing against maybe a couple of bidders who are all likely going to price things based on comparable fair values. There are fewer people around, so you're less likely to chance upon the one desperate person who's been searching for the past 8 months and is frustrated enough to put up a premium to have a house right now instead of later.
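A toy simulation shows why the size of the bidder pool matters so much. This is only a sketch under loud assumptions: every bidder bids a shared "fair" value plus normally distributed noise, and the highest bid wins. The function name `expected_winning_bid`, the $600k fair value, and the $30k spread are hypothetical, not data from my search.

```python
import random


def expected_winning_bid(n_bidders, fair_value, noise_sd, trials=20_000, seed=0):
    """Average highest bid when each bidder bids fair value plus random noise.

    Bids are drawn from a normal distribution, a toy assumption chosen
    only for illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The winning bid in one auction is simply the largest bid in the pool.
        total += max(rng.gauss(fair_value, noise_sd) for _ in range(n_bidders))
    return total / trials


# Hypothetical numbers: $600k "fair" value, $30k spread in what people bid.
for n in (2, 10, 50):
    avg = expected_winning_bid(n, 600_000, 30_000)
    print(f"{n:>2} bidders -> average winning bid ${avg:,.0f}")
```

Even though nobody's notion of fair value changes, the average winning bid climbs as the pool grows, because a bigger pool is more likely to contain someone far out in the tail. Pricing off comps alone tells you about the middle of that distribution, not the top.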

Yes, the fact that I have to finalize pricing offers based entirely on a rough mental model mixed with a giant load of vibes is extremely frustrating. I hate every moment of it. It's expensive and wasteful. But that's what you need to do in a pathological market if you've made the decision to participate at this moment. The smarter move is to walk away and perhaps wait until things thaw out, but that's not a viable option right now.

Putting my gripes about real estate and opaque winner-take-all auction models aside for now, I feel that the experience is probably good for me. I don't think I'm going to get a more forceful lesson in how getting the research question wrong, by not realizing what game I'm supposed to be playing, can be a massive waste of time and resources.

When starting this post I was about to write how our research questions in normal data work don't usually devolve into pathological cases like my home buying experience, but then I realized that they actually might. Why else would we constantly have to ask our stakeholders/clients what they're actually trying to accomplish?

For example, consider Goodhart's Law, where metrics that become targets stop being good metrics. The law highlights how measuring something for understanding purposes is a very different problem from incentivizing a desired outcome. Similarly, I feel like I've personally worked on all sorts of projects where we start out with one question and eventually gain insight into what "the real problem" was. All that is just considered the natural evolution of research and analysis. While we usually don't devolve from doing one thing to running on "mostly vibes", we do change direction quite a bit. We often change direction so much that the original reason why we started might not even be worth mentioning. That's just how it goes and we roll with it.

This experience also highlights another aspect of data work that we usually don't think about too often: there are situations when models based around data and rationality break down. People who are familiar with economics, especially parts like behavioral economics that look into the cognitive biases that can cause humans to act in seemingly irrational ways (in the classical economics sense), know this all too well. Wonky human biases typically play a subtle part in most data science work, and we often ignore them since the biases are baked into our datasets. The same reason why it's extremely hard to "un-bias" datasets is why we can ignore plenty of irrationality for prediction. But here the wonkiness grew big enough to throw the predictions off.

Either way, as the saying often attributed to Keynes goes, markets can remain irrational for longer than you can stay solvent [betting against them]. I'm about 40% prepared for this whole situation to blow up in my face and set me back a bunch of years. We'll see how it goes.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or if you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted, so support from subscribers is what makes everything possible. If you love the content, consider supporting the newsletter in any of the following ways:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!