There's an algorithm in artificial intelligence for finding the best solution in the local neighborhood, i.e., a local maximum.
It's called Hill Climbing. And after reading about it, I had a lot of thoughts about what experimentation at a startup generally looks like.
But first, let's see how hill climbing works.
You’re climbing a hill, determined to find the path that will lead you to the highest peak. However, there’s no map provided and it’s very foggy. To make your trips easier, you have downloaded a hiking app that tracks paths you’ve taken and measures your current altitude.
You climb the hill over and over again. Each time, you retrace the path that led you to the highest peak recorded so far, but somewhere in the middle of your journey, you take a slightly different route.
You also sometimes choose a random new starting point, a variation known as random-restart hill climbing, so that you don't linger in the same area and get stuck on a minor peak.
Essentially, the hill climbing algorithm tries to find a better solution by generating a neighboring solution: each neighbor is the best solution so far with a single element modified, and it is kept only if it scores better.
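Here's a minimal sketch of that in Python, assuming a toy setup: the `altitude` scoring function, the step size, and the number of restarts are all made up for illustration.

```python
import math
import random

def hill_climb(score, start, step=0.5, max_iters=1000):
    """Greedy hill climbing: tweak one element of the best solution so far,
    and keep the tweak only if it scores higher."""
    best = list(start)
    best_score = score(best)
    for _ in range(max_iters):
        neighbor = list(best)
        i = random.randrange(len(neighbor))          # pick a single element...
        neighbor[i] += random.uniform(-step, step)   # ...and modify only that one
        neighbor_score = score(neighbor)
        if neighbor_score > best_score:              # climb only if it's higher
            best, best_score = neighbor, neighbor_score
    return best, best_score

def random_restart(score, dims, restarts=10):
    """Random-restart hill climbing: rerun from random starting points
    so one unlucky start doesn't leave you stuck on a minor peak."""
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        start = [random.uniform(-10, 10) for _ in range(dims)]
        candidate, candidate_score = hill_climb(score, start)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

def altitude(point):
    # A bumpy toy surface with many local peaks (purely illustrative).
    return -sum(x * x for x in point) + 3 * math.sin(5 * point[0])

print(random_restart(altitude, dims=2))
```

The key property is the greedy acceptance rule: a worse neighbor is simply discarded, which is exactly why plain hill climbing can get trapped on a minor peak and why the restarts help.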
Now, imagine that you're conducting a performance marketing experiment.
There are normally two approaches:
1. If it's for a product that's going to be sold over a long period, you can have a single ad set and keep changing one variable at a time, per iteration. You run each iteration for a fixed period, pick the best-performing version so far, change one variable within it, and run a new iteration to see if you can do even better. This is essentially hill climbing, as sketched after this list.
2. The other approach works better for a content product or an event that's only going to be marketed for a short time, say, a week. In this case, you make multiple ad sets with different targeting and/or communication. Since the campaign ends after a week, you can't iterate over time. So you create all your variations at once and run them together to diversify your bets and get an optimal result.
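To make the first approach concrete, here's a rough sketch of it as hill climbing over an ad-set configuration. Everything here is made up for illustration: the variables, their alternative values, and the run_for_a_week stub, which in reality would mean launching the ad set, waiting out the fixed period, and reading back a metric like ROAS or cost per acquisition.

```python
import random

# Hypothetical ad-set configuration: each key is one variable you could tweak.
ad_set = {
    "audience": "25-34, metros",
    "creative": "video_a",
    "headline": "Get fit in 15 minutes a day",
    "daily_budget": 50,
}

# Alternative values for each variable (illustrative only).
options = {
    "audience": ["25-34, metros", "18-24, metros", "25-34, tier-2 cities"],
    "creative": ["video_a", "video_b", "static_carousel"],
    "headline": ["Get fit in 15 minutes a day", "Your gym, in your pocket"],
    "daily_budget": [30, 50, 80],
}

def run_for_a_week(config):
    # Placeholder: in reality you'd launch this config, wait out the period,
    # and read back the real metric. Here we fake a score so the sketch runs.
    return random.random() + (0.5 if config["creative"] == "video_b" else 0.0)

best, best_score = ad_set, run_for_a_week(ad_set)
for _ in range(8):                                     # eight weekly iterations
    variable = random.choice(list(options))            # change one variable...
    candidate = {**best, variable: random.choice(options[variable])}
    candidate_score = run_for_a_week(candidate)        # ...run the new iteration
    if candidate_score > best_score:                   # keep it only if it did better
        best, best_score = candidate, candidate_score

print(best, best_score)
```

The catch, as the rest of this post argues, is that run_for_a_week is not a clean function: its output also depends on everything happening in the environment that week.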
In either case, "change only one variable at a time" is pretty hard to do in reality, because you have environmental variables that keep changing all the time.
For example, let's say you're running an ad set this week and you tweak a single variable for the next iteration, which you'll run over the next week. But that week, a major media or national event happens that influences consumer spending. How do you measure to what extent your ad set's performance changed due to your tweak versus a change in external variables, whatever they might have been?
Hence, experimentation at startups, where you're dealing with 100+ contextual variables and a highly dynamic environment, can only happen by watching how the variables you hold constant perform across different contexts. That's how you build intuition about what works consistently and what doesn't.
Tweaking variables is fine, but the real learning happens when you tweak a variable that had stayed constant across a longer time span and several different contexts. If you see a performance change then, you can say with a higher degree of certainty that your tweak, and not anything else, led to it.
Another mistake people make is drawing conclusions from experiments too quickly.
Something like tone of voice or brand language cannot be evaluated over a week, because customer perception lags and shows results only over the longer term. And it's only when you have seen it perform across different contexts that you can say whether it really works or doesn't.
Sometimes, when experimenting in complex environments, there's no substitute for time. Environmental variables that you cannot control only change with time, and the effectiveness of your strategy can only be measured by how it fared over time. There's no quick way to measure it.
At least not that I can think of.