Don't take optimization to a growth fight
Over the years, I've worked on products with very different user and revenue numbers. They've included products used by millions of people, and others with under a hundred users. Revenue stretched from a few thousand dollars into the billions. One of the most consistently difficult things to do as an analyst working with teams on those products is knowing which class of methods is appropriate for a given situation – a growth-oriented one, or an optimization-oriented one.
In my mind, "growth-oriented" methods are the "quick and dirty" methods often employed at startup companies chasing high growth. If those teams are using statistical methods at all, they're usually trading statistical sophistication and power for speed and simplicity. Since available data in high-growth startup situations tends to be very sparse, there's a non-trivial amount of qualitative anecdote, conjecture, sloppy "directional findings", and sometimes pure guessing and betting involved.
Meanwhile, "optimization-oriented" methods tend to be significantly more rigorous and able to catch smaller effect sizes. Much like in a hill-climbing optimization problem, the first few steps in optimization tend to show relatively large gains, but eventually much of the "juice" gets squeezed out and you're running increasingly complex analyses to squeeze smaller and smaller gains out of a system.
Note that the actual methods used for either orientation can overlap significantly. You can design an A/B test that detects 0.1% effect sizes just as easily as one that detects 10% effect sizes – the former will just take many times more observations. The question is when it's appropriate to run the 0.1% test, because detecting such a small effect requires not only large sample sizes but also very focused experimental designs.
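To make that trade-off concrete, here's a minimal power-analysis sketch in Python using statsmodels. The 5% baseline conversion rate, the alpha of 0.05, and the 80% power target are all illustrative assumptions of mine, not numbers from any real product.

```python
# Back-of-envelope power analysis: observations per arm needed for an A/B test
# to detect a 10% relative lift vs. a 0.1% relative lift.
# Assumes a 5% baseline conversion rate, alpha = 0.05, power = 0.8 --
# illustrative choices only.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.05
power_calc = NormalIndPower()

for relative_lift in (0.10, 0.001):  # 10% lift vs. 0.1% lift
    treated = baseline * (1 + relative_lift)
    effect = proportion_effectsize(treated, baseline)  # Cohen's h
    n_per_arm = power_calc.solve_power(
        effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
    )
    print(f"{relative_lift:.1%} relative lift -> ~{n_per_arm:,.0f} users per arm")
```

On those assumptions, the 10% lift needs on the order of fifteen thousand users per arm, while the 0.1% lift needs well over a hundred million per arm – same test design, wildly different amounts of traffic.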
Under typical conditions, growth vs. optimization mode is dictated by the maturity of the business being operated on. If you already have a billion users, you don't expect to get another billion any time soon, so you can't rely on pure growth to fuel metrics improvements. At the same time, having a very large existing userbase means that even tiny percentage changes translate into huge numbers of users and dollars. That's why more mature businesses naturally tend to shift into an optimization mode. Small gains are worth it at scale, while at the same time no one wants to take responsibility for massively breaking a multi-billion dollar money-making machine. Thus, it makes tons of financial sense to become more risk averse over time.
Meanwhile, growth stage businesses still don't have their market or product figured out yet. They're regularly making significant changes to the product that they hope will double their business or better. A 5% increase in something is too small to matter in their business because someone could sneeze and that effect may disappear due to random external influences. The customer base is also usually quite small so a small percentage gain just isn't all that significant.
Which brings me back to the title. Very often, I notice that product teams can get confused as to what mode they should be operating under for their work at hand.
For example, large organizations with big, established customer bases naturally build up processes for doing optimization. Teams can get very used to experiment guidelines that are generally tuned to finding small-to-medium effects, because those are what a reasonable person can expect from any experiment. No sane individual at giant places like Netflix or Facebook or Google will stand up and say they're going to find a 50% increase in new users on the most widely used and established products, because it's highly improbable. For one thing, are there even enough human beings around who aren't currently using the product and could be convinced to start, in order to hit that goal?
But what if one of those big companies decides to launch some brand new small feature or product? Suddenly, instead of a million users, there's a hundred. When the team wants to improve the product using the playbook they're familiar with, they run into problems. Applying the same standards of rigor to this new product is going to be painful if not impossible. To begin with, how many effective experiments can you seriously design and launch if the entire population is 100, or even 1000?
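To put rough numbers on that, here's the same power calculation run in reverse: given a tiny population split between control and treatment, what's the smallest effect an A/B test could reliably detect? Again, the 5% baseline and the population sizes are my own illustrative assumptions.

```python
# Minimum detectable effect when the entire user population is tiny.
# Assumes a 5% baseline conversion rate, alpha = 0.05, power = 0.8 --
# illustrative choices only.
from math import asin, sin, sqrt
from statsmodels.stats.power import NormalIndPower

baseline = 0.05
power_calc = NormalIndPower()

for total_users in (100, 1_000):
    n_per_arm = total_users / 2
    # Smallest detectable effect (in Cohen's h units) for this sample size...
    h = power_calc.solve_power(
        nobs1=n_per_arm, alpha=0.05, power=0.8, alternative="two-sided"
    )
    # ...converted back into the treatment-arm conversion rate it implies.
    detectable_rate = sin(asin(sqrt(baseline)) + h / 2) ** 2
    print(
        f"{total_users} users total -> conversion must move from "
        f"{baseline:.1%} to roughly {detectable_rate:.1%} to be detectable"
    )
```

Under those assumptions, 100 users means conversion has to jump from 5% to somewhere north of 20% before a standard test will notice anything, and even 1,000 users requires it to nearly double. Only big, bold changes stand any chance of showing up.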
Progress can't be made without accepting a compromise somewhere. The team has to make bigger, bolder, riskier treatments. They need to accept that their margins of error are bigger and that they're unlikely to get a clear picture of what is going on. Convincing them of this can sometimes be really, really difficult. It's asking people to give up the tools and risk profiles they're familiar with for something that may be completely new to them. And they're supposed to bet weeks or months of work on it?
But convincing teams to take on a bigger risk profile for optimization experiments isn't even the most difficult part – sometimes you need to convince them that they're barking up the wrong tree entirely.
Going back to that hypothetical new product with a hundred users. The user base is also growing by 10 users a week. A team that's used to optimizing things all the time might start trying to improve the sign-up flow, because that will help get new users. Those people aren't entirely wrong; energy should be spent making sure that sign-up and purchasing are as simple as possible. As an analyst, you look at the sign-up funnel and find that 25% of people who seem to see the flow actually sign up. There's definitely room for improvement. But with 400 people visiting the page and only 10 getting through weekly, the team has to ask itself whether even attaining a magical (and unattainable) 100% conversion rate on 400 people a week is really what they want.
What I'm trying to say is, maybe their efforts are best spent trying to attract more users to the product in the first place. Maybe through better marketing. Maybe through SEO. Maybe through partnerships or integrations. Even if the conversion funnel of an early-stage product is leaky, pouring more potential users in at the very top can be extremely cost effective.
But guess what, telling engineering and product teams tasked with "building out a new product" that their biggest problem is a marketing/acquisition one... is not easy. It's very often outside their realm of expertise. I've seen countless engineers hold an "if I build a good product, people will come" mindset for their latest new feature, only to be disappointed when no one has heard of it and its adoption metrics are minuscule.
As analysts who help such teams figure out their metrics and experiments, we're often called upon to help teams optimize stuff. Our tools are darned good at it. But we're also in a position to see these issues coming long before the product teams themselves because we're familiar with what those methods can and can't say under tight conditions. And while it may be quite uncomfortable and awkward, it's important for us to stand up and tell teams that the comfortable thing they want to do is realistically a waste of time until they get a bigger user base.
Luckily, our tools are also pretty good at analyzing and optimizing marketing campaigns. 😆
All that being said, it's also not a great idea to use growth-oriented strategies for things that need to be optimized. Even if you tone down your expected effect sizes to a reasonable range, you eventually reach a point where the methodological equivalent of licking your finger and holding it up to the wind is riskier than blindly guessing. At least with guessing, you don't fool yourself into thinking you've discovered some durable truth that you then iterate on for multiple cycles before deciding it doesn't work.
Squeezing effects out of highly scrutinized systems takes a lot of careful work, where handwaving away assumptions can ruin things. The models get increasingly complex, as does the work to characterize and control for all the errors in all the parameters.
And so, as with everything, using the right tool for the job at hand is key. And it's part of our job to convince others what the right tool is.