Please learn from our mistakes

No-bullshit lessons in business and careers. One mail every day. 15k+ readers love it. Join in?

4 Apr

Not all data is created equal.

How many times have you heard someone defend their startup idea by saying,

"Our startup will be valuable because we will have all this data."

My response is always this:

"And then what? What will you do with that data? How will it help you in meaningful ways?"

Data by itself isn't valuable. What kind of data you collect, what you do with it, and how it impacts your users decide how valuable that data is.

There is a phenomenon that happens in business called a Data Network Effect.

Here's how Matt Turck describes it:

“Data network effects occur when your product, generally powered by machine learning, becomes smarter as it gets more data from your users.

In other words:  the more users use your product, the more data they contribute; the more data they contribute, the smarter your product becomes (which can mean anything from core performance improvements to predictions, recommendations, personalization, etc.); the smarter your product is, the better it serves your users and the more likely they are to come back often and contribute more data – and so on and so forth.

Over time, your business becomes deeply and increasingly entrenched, as nobody can serve users as well.”

Google Search is commonly cited as a great example of a product that benefits from data network effects. The more people search, the more data they add to the system, which in turn improves search results and sharpens advertisement targeting, which then attracts more users.

But not all data is the same or can be leveraged the same way to create data network effects.

Some factors that influence data and the formation of data network effects:

1. Data capture

The more friction there is in adding data to a system, the less likely customers are to add to it, unless they're incentivized accordingly. But more on that later.

Data capture ideally needs to be a natural byproduct of the customer using the product or service, not an additional thing they have to do, like filling out a survey or entering details that are not mandatory for using the product.

Platforms like Facebook, Twitter, YouTube, and Instagram collect most of their data as a byproduct of users using their apps, which in turn helps them improve their algorithms and recommend better content.

2. Data longevity

How much does new data enhance the experience of the product?

For example, if a product on Amazon or a hotel on MakeMyTrip already has an existing sample of reviews (say, 50 to 100), a new visitor can estimate, with a fair degree of accuracy, the quality of the product or the stay. Consequently, they are no longer incentivized to add one more review that more or less matches the average sentiment. Over time, the marginal value of every new review that gets added to the system decreases. A product rated 4.7 stars with 500 reviews is not perceived very differently from a product with the same rating but with 700 reviews.

The data which directly helps users on these platforms is static and has longevity once a fair amount of it is collected.
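One rough way to see why the 700th review matters less than the 70th: treat each review as a noisy sample of the product's "true" quality and look at how uncertain the average rating is. This is only a sketch. The function name and the assumed spread of 1.0 star between individual reviews are made up for illustration, and real review data is messier than a textbook standard error.

```python
import math


def standard_error(rating_std: float, n_reviews: int) -> float:
    """Uncertainty of the average rating after n_reviews independent reviews.

    Uses the textbook standard error of the mean: std / sqrt(n).
    """
    return rating_std / math.sqrt(n_reviews)


# Assumption for illustration: individual ratings vary by about 1 star.
ASSUMED_STD = 1.0

for n in (50, 100, 500, 700):
    print(f"{n:>4} reviews -> average rating uncertain by about "
          f"{standard_error(ASSUMED_STD, n):.3f} stars")
```

Under these assumptions, going from 50 to 100 reviews tightens the estimate far more than going from 500 to 700 does, which is the same diminishing return the shopper feels intuitively.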

Even a platform like Netflix has diminishing returns on data for a specific movie or TV show once it has spent a few months in the system and amassed a lot of usage data. After a point, there's no way a highly recommended movie can be recommended even better by the system.

New data yields less value when much of it overlaps with the data you already have. The benefit of each additional datapoint goes down.

But consider a navigation product like Waze or Google Maps that relies on real-time data in order to be useful. The more users Waze has, the more real-time data the system is able to capture and the better its navigation planning and traffic forecasting get.

Truecaller is another such product. Its usefulness is dictated by how many new phone numbers the system captures on a daily basis, which allows the app to crowdsource from its users and detect the thousands of spam numbers being generated every day.

The real value of Waze is in helping the user decide in real time based on immediate data. The same goes for Truecaller: the system wouldn't work if it only had a year-old directory of phone numbers and didn't update as frequently as it does.

3. Perceived value

Is the data captured by the system being perceived by the customer as valuable?

And is the data central to the value offered by the product?

In the case of a product like Ultrahuman, real-time glucose monitoring is only valuable to a customer when they do not have existing data on what effects different kinds of foods have on their blood sugar level. But once the customer develops a good sense of what foods lead to what kinds of changes in blood sugar, the utility of the device diminishes. Unless the person is suffering from diabetes and needs frequent sugar level checks because it could be a life or death situation for them, they are prone to stop using the device.

Now, the way Ultrahuman can help the customer get more value out of that data is by being intelligent enough to detect macro trends and make recommendations that improve someone's health, mood, or productivity on any given day. But until it offers such recommendations based on a long history of data, the user isn't incentivized to keep using the product and adding to its dataset.

To put it simply, customers will only give up optional data when they know exactly how that data will benefit them.

I never share usage statistics with Microsoft or Apple because these companies never tell me exactly how this helps them make my experience better, nor is it central to the value the operating system creates for me.

Contrast this with a platform like Myntra, which automatically guesses your clothing and footwear size based on your purchase history and saves you the additional step of filtering for your size. The data I give up is tangibly being used to improve my core experience of using Myntra.

Or say hitting like or dislike on Spotify, which trains the algorithm in ways that are extremely tangible to the user. The quality of recommendations improves massively if users give Spotify this additional data, which they happily do. Over time, the algorithm improves so much that I'm not likely to choose any other platform simply because Spotify knows me so well. Every added piece of information and usage behavior increases Spotify's value for me.

Even with Truecaller, I understand that if I do not share my entire contact list with the app and if others do the same, the platform will not be able to provide the value it is meant to provide. Uploading my contacts is central to how the platform creates value for me and others using Truecaller. And the more users use Truecaller, the better its directory gets, and the better it is able to save me from spam and scams.

If it meant that my Facebook or Twitter feed would be a lot better and more tightly curated to my interests, I would happily give up a lot of data that I would otherwise hold back. But it does not. Currently, most of the value created with my data on these platforms goes to advertisers, not to me. That is why I'm more inclined to talk about privacy issues.

LinkedIn is actually a great example where a new user has to manually upload a lot of data in order to make the platform, the network, and the discovery it provides useful for them. But you still do it because building your LinkedIn profile is seen as a one-time investment (with little periodic maintenance) that promises good returns in the long term. Also, the data I add to my profile is central to the value of the product, and not simply a byproduct of using the platform.

4. Data thresholds

At what point does your system have enough data in order to be perceived as really useful?

An Amazon would not be an Amazon if it only had a handful of sellers on its platform. The data aggregated over hundreds of thousands of sellers not only enhances discovery but also acts as a price transparency mechanism for the buyers and sellers. And the more buyers and sellers it adds to the platform, the better its search and recommendation engine gets.

The higher the threshold for the amount of data you need in order to make it useful, the better your defensibility gets as a business. But at the same time, the tougher it is to get the flywheel running and overcome the chicken-and-egg problem that plagues marketplaces and, more generally, any product that directly derives value from how many users are using it at any given point in time.

So, the next time you use the word data as a selling point...

... do think about what kind of data is being collected, how it is being collected, and most importantly, how central it will be to creating value for the user.

There are startups that collect data to create value for advertisers.

There are startups that collect data to create value for users.

And there are startups that collect data to do both.

But not every startup that collects data benefits from data network effects.

Data is always useful because it helps you understand your users better. But it may or may not be core to creating compounding success for your product. That only happens when the data is central to the user experience and the value your product provides, and when the value every additional datapoint adds to the system is directly and tangibly perceived by your users.
