The world has a data problem. The more data we create, the more of it we are forced to entrust to a shrinking number of data monopolies, who profit from it.
Data is also siloed, and generally hosted on proprietary databases across vast systems, geographies and business units. Whilst there have been fixes and APIs that have helped improve the sharing of corporate and public data, fundamentally this doesn’t change the fact that client-server architecture and corporate IT networks are inherently designed to prevent data sharing.
Regulation and privacy laws combine to make organisations wary of sharing data, both internally and publicly, unless forced to do so. The Health Insurance Portability and Accountability Act (HIPAA) in the US and the Data Protection Act in the UK explicitly state how and what data can and cannot be shared. But these are complicated policies. The technical difficulty of implementing them, combined with poor UX, means people err on the side of caution when approaching these issues. There is simply no incentive to outweigh the risk and hassle of sharing data.
Even where sharing is encouraged, current infrastructure makes monetising data through open-source licensing complex and equally difficult to enforce. So ultimately, you are left with two options: give your data away for free (which is what most individuals do) or hoard it and see if you can make sense of it at some time in the future (which is what most companies do). Neither is very efficient or effective.
The consequence is that a few increasingly powerful companies acquire the vast majority of data at little cost, while large amounts of valuable data sit dormant in siloed databases.
Simply put, there is no economic incentive to share data. This is a massive issue in the AI market (expected to be worth $70 billion in 2020 according to BoA Merrill).
The best AI techniques today, such as deep learning, need lots (and lots) of quality, relevant data to deliver any kind of meaningful value, starving most new entrants (such as startups and SMEs) of the ability to compete.
AI expertise and talent are expensive and hard to come by, typically concentrating within organisations that already have data to play with, or that promise to generate vast quantities of it in the future. Companies like Google, Facebook, Microsoft and Baidu swallow up almost all the best talent, hiring computer science and AI PhDs before they even come onto the jobs market.
This creates a self-propagating cycle that increasingly benefits a few established organisations, who go on to dominate their respective markets and extract a premium for the privilege. Think of Facebook and Google in the ad market, or Amazon in retail, and now imagine that happening across every single industry vertical. Data leads to data network effects and subsequent AI advantages that are extremely hard to catch up with once the flywheel starts. The way things are going, the driverless car market will likely consolidate around a single software provider. As old industries like education, healthcare and utilities digitise their operations and start utilising data, the same will likely happen there too.
The benefits of the 4th Industrial Revolution are in the hands of fewer and fewer organisations.
Currently, rather than trying to compete, companies (if they want to stay in business) are expected to concede their data to one of the big tech clouds like Amazon or Microsoft in order to extract value from it, further extending those suppliers' unfair advantage and deepening their own dependency. Look at autonomous vehicles: German manufacturers unable to compete with Silicon Valley's AIs for self-driving cars could be left simply making the low-value hardware, whilst conceding the higher-value (and higher-margin) software that drives the intelligence controlling their cars.
I’ve always argued companies don’t want Big Data. They want actionable intelligence. But currently most large organisations have vast dumb data in silos that they simply don’t know what to do with.
But what if…
they could securely allow AI developers to run algorithms on that data whilst keeping it stored encrypted, on-premise?
And open up every database at a ‘planetary level’, turning them into a single data marketplace?
Who would own or control it? To be frank, it would require unseen levels of trust. Data is generally sensitive, revealing and something you typically would not want to share with your competitors. In a sector like consumer health, with its complex privacy laws, how could that even be possible?
What’s needed is a decentralised data marketplace to connect AI developers to data owners in a compliant, secure and affordable way. Welcome to Ocean Protocol.
Why decentralised and tokenised?
Primarily because of the need for provenance of IP, affordable payment channels, and to ensure that no single entity becomes a gatekeeper to a hoard of valuable data. Gatekeeper in the sense that it could arbitrarily ban or censor participants, but also in the sense that it would recreate the same honeypot hacking problems we encounter in today’s centralised world.
But aren’t there already decentralised data market projects?
The Ocean team have focused their design on enabling ‘exchange protocols’, creating massive potential for partnerships with other players in the domain. As investors in IOTA, we find the question of how Ocean could work with IOTA’s Data Marketplace an interesting case in point.
What we like most about Ocean is they have been deploying many of the constituent parts that underpin this marketplace over the last 4 years via a number of initiatives which they are now bringing together into one unified solution:
- Ascribe (digital ownership & attribution)
- BigchainDB (high-throughput distributed database for transactions at scale)
- IPDB (a public database network built on proven BigchainDB technology for “planetary scale”)
- COALA IP (blockchain-ready, community-driven protocol for intellectual property licensing)
What is being added is a protocol and token designed to incentivise and program rules and behaviours into the marketplace, ensuring relevant, good-quality data is committed, made available and fairly remunerated. The design anticipates processing confidential data for machine learning and aggregated analysis without exposing the raw data itself. Ocean will facilitate bringing the processing algorithms to the data through on-premise compute and, eventually, more advanced techniques such as homomorphic encryption as they mature.
Think of the Ocean Token as the ‘crypto asset’ that serves as the commodity of the data economy, incentivising the mass coordination of resources needed to secure and scale a network that turns data into actionable intelligence.
If Ocean is about trading data, can’t it use an existing cryptocurrency as its token, like Bitcoin or Ether?
While existing tokens might serve as a means of exchange, the Ocean protocol requires a token of its own because it relies on a specific form of monetary policy and rewards. Users are rewarded with newly minted tokens for providing high-quality, relevant data and keeping it available. This means the protocol requires control over the money supply, which rules out using any existing general-purpose protocol or token. Furthermore, from the perspective of Ocean users, volatility in an uncorrelated token would disrupt the orderly value exchange between stakeholders that the marketplace is designed to provide.
OCEAN Data Providers (Supplying Data)
Actors who have data and want to monetise it can make it available through Ocean for a price. When their data is used by Data Consumers, Data Providers receive tokens in return.
OCEAN Data Curators (Quality Control)
An interesting aspect of Ocean is its application of curation markets. Someone needs to decide which data on Ocean is good and which is bad. As Ocean is a decentralised system, there can’t be a central committee to do this. Instead, anyone with domain expertise can participate as a Data Curator and earn newly minted tokens by separating the wheat from the chaff. Data Curators put tokens at stake to signal that a certain dataset is of high quality. Every time they do this correctly, they receive newly minted tokens in return.
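The staking mechanics can be pictured with a short sketch. This is illustrative only: the class names and the pro-rata reward formula are assumptions for exposition, not the published Ocean Protocol specification.

```python
# Illustrative curation-market sketch: curators stake tokens on a dataset,
# and when the network mints a reward for that dataset, it is split among
# curators in proportion to their stakes. Names and the reward rule are
# assumptions, not the Ocean white paper's actual mechanism.

class Dataset:
    def __init__(self, name):
        self.name = name
        self.stakes = {}  # curator -> tokens staked on this dataset

    def stake(self, curator, amount):
        # A curator signals confidence in the dataset by locking tokens.
        self.stakes[curator] = self.stakes.get(curator, 0) + amount

    def distribute_reward(self, minted):
        """Split a freshly minted reward pro rata to each curator's stake."""
        total = sum(self.stakes.values())
        return {c: minted * s / total for c, s in self.stakes.items()}

trials = Dataset("clinical-trial-data")
trials.stake("bob", 10)
trials.stake("alice", 30)
rewards = trials.distribute_reward(minted=8)
print(rewards)  # {'bob': 2.0, 'alice': 6.0}
```

The key design property is that curators are paid only in proportion to capital they put at risk, so signalling quality is costly and therefore credible.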
OCEAN Registry of Actors (Keeping Bad Actors Out)
Because Ocean is an open protocol, it needs mechanisms to curate not only the data but also the participants themselves. For this reason a Registry of Actors is part of Ocean, again applying token staking to make good behaviour more economically attractive than bad behaviour.
OCEAN Keepers (Making Data Available)
The nodes in the Ocean network are called Keepers. They run the Ocean software and make datasets available to the network. Keepers receive newly minted tokens to perform their function. Data Providers need to use one or more Keepers to offer data to the network.
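One simple way to picture the availability check that Keepers are rewarded for is a challenge-response: the verifier sends a random nonce and the Keeper must return a hash of the nonce combined with the data, which it can only compute if it still holds the data. This is a deliberately simplified sketch with hypothetical function names; Ocean's actual proof scheme is not specified here, and production schemes avoid requiring the verifier to hold a full copy of the data.

```python
# Sketch of a proof-of-availability challenge. The verifier sends a fresh
# random nonce; the Keeper answers with H(nonce || data). Because the nonce
# is unpredictable, the Keeper cannot precompute answers and must actually
# retain the data. Simplified for illustration only.
import hashlib
import os

def challenge():
    # Verifier generates a fresh 16-byte random nonce.
    return os.urandom(16)

def keeper_response(nonce, data):
    # Keeper proves possession by hashing nonce together with the data.
    return hashlib.sha256(nonce + data).hexdigest()

def verify(nonce, data_copy, response):
    # Verifier (or an auditor with a copy) recomputes and compares.
    return hashlib.sha256(nonce + data_copy).hexdigest() == response

data = b"clinical trial measurements ..."
nonce = challenge()
resp = keeper_response(nonce, data)
assert verify(nonce, data, resp)            # Keeper still holds the data
assert not verify(nonce, b"tampered", resp) # altered data fails the check
```

Each correct answer would then trigger the minting reward described above.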
BRINGING IT ALL TOGETHER
Ocean is building a platform to enable a ‘global data commons’: a platform where anyone can share the data they contribute and be rewarded for it, with a token and protocol designed specifically to incentivise data sharing and fair remuneration.
So let’s see that in the context of a single use-case: clinical trial data.
Note: this use-case is provided for illustrative purposes only, to give a feel for how Ocean could work in practice. Some of the specifics of the Ocean protocol have yet to be finalised and published in the white paper, and might turn out differently than described here.
- Bob is a clinical physician with a data science background who uses Ocean. He knows his industry well and has experience understanding what types of clinical data are useful in trials.
- Charlie works at a company that regularly runs medical trials. He has collected a large amount of data for a very specific trial which has now concluded, and he believes it could be valuable for others but he doesn’t know exactly how.
- Charlie publishes the dataset through Ocean. Judging its value (based on the cost to produce, and therefore replicate, it), as well as his confidence in its overall quality, he stakes 5 tokens on it to assert that it is his IP, which others must pay to use. Charlie uses one of the Keeper nodes maintained by his company’s IT department.
- Bob, as a Data Curator of clinical trial data on Ocean, is notified of its submission, and sees no one has challenged its ownership. By looking at a sample he decides the data is of good quality and based on how broad its utility could be he stakes 10 Ocean tokens to back his judgement.
- Bob is not alone: a number of other Data Curators with good reputations quickly evaluate the data and make stakes of their own.
- By this point a number of AI developers see Charlie’s dataset is becoming popular and purchase it through Ocean.
- Charlie, Bob and the other curators get rewarded in newly minted tokens, proportional to the amount they staked and the number of downloads.
- The Keeper node at Charlie’s company regularly receives a request to cryptographically prove it still has the data available. Each time it answers correctly, it also receives some newly minted tokens.
- When Bob and Charlie signed up to join Ocean, they staked some tokens to get added to the Registry of Actors.
- Eve also wants to join Ocean. She stakes 100 tokens to get added to The Registry of Actors.
- Eve is actually a malicious actor. She purchases Charlie’s dataset through Ocean, then claims it’s hers and publishes it under her own account for a slightly lower price. Furthermore, she creates several more “sock puppet” accounts, each with some more tokens staked to join, to serve as Data Curators and vouch for her copy of the dataset.
- Bob and Charlie discover Eve’s malice. They successfully challenge Eve and her sock puppet accounts in the Registry of Actors. Eve and her sock puppet accounts get removed from the Registry of Actors and she loses all staking tokens.
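The challenge-and-slash step at the end of this walkthrough can be sketched as well. The class shape, minimum stake and slashing rule below are illustrative assumptions, not the finalised Ocean design.

```python
# Sketch of the Registry of Actors: joining requires a minimum stake, and a
# successfully upheld challenge removes the offender and forfeits their
# stake. Structure and numbers are illustrative, not the published spec.

class RegistryOfActors:
    def __init__(self, min_stake=100):
        self.min_stake = min_stake
        self.members = {}  # actor -> tokens staked to join

    def join(self, actor, stake):
        if stake < self.min_stake:
            raise ValueError("stake below minimum")
        self.members[actor] = stake

    def challenge(self, accused, upheld):
        """If a challenge is upheld, the accused loses membership and stake."""
        if upheld and accused in self.members:
            slashed = self.members.pop(accused)
            return slashed  # e.g. burned, or paid out to the challengers
        return 0

roa = RegistryOfActors()
roa.join("eve", 100)
roa.join("sock_puppet_1", 100)
slashed = roa.challenge("eve", upheld=True)
slashed += roa.challenge("sock_puppet_1", upheld=True)
print(slashed, "eve" in roa.members)  # 200 False
```

Because every sock-puppet account must post its own stake, Eve's attack has a real economic cost, which is exactly the deterrent the registry is designed to create.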
For more detail on Ocean go to: oceanprotocol.com
APPROACH, TRACTION & TEAM
We were greatly encouraged by the fact that Ocean was aligned with building what we term a Community Token Economy (CTE) (a model we outlined in September 2017), where multiple stakeholders (BigchainDB & Dex) partner early on to bring together complementary skills and assets.
As two existing companies (one already VC-backed), they are committing real code and IP already worth several million in value*.
*This is an important point to remember when considering the valuation and token distribution of the offering.
The open, inclusive, transparent nature of the IPDB Foundation bodes well for how Ocean will be run and how it will solve complex governance issues as the network grows.
We are also impressed with the team’s understanding of the importance of building a community. They understand that networks are only as powerful as the communities that support them. This is why they have already signed key partnerships with the XPrize Foundation, SingularityNet, Mattereum, Integration Alpha and the ixo Foundation, as well as agreeing an MOU with the Government of Singapore to provide coverage and indemnification for data-sharing sandboxes.
The team understands that the decentralisation movement is still in its early stages, and that collaboration and partnership are a more effective model than competition and going it alone.
PLACE IN THE CONVERGENCE ECOSYSTEM STACK
We believe Ocean Protocol is a fundamental requirement for the Convergence Ecosystem Stack. It is a protocol that enables a thriving AI data marketplace, and it is complementary to our other investments in IOTA and SEED, which provide marketplaces for machine data and for bots respectively.
Marketplaces are critical to the development of the Convergence Ecosystem because they enable new data-based, tokenised business models that were never before possible, unlocking value. Distributed ledgers, blockchains and other decentralisation technologies are powerful tools for authenticating, validating, securing and transporting data; but it is marketplaces that will enable companies to build sustainable businesses and crack open the incumbent data monopolies. IOTA, SEED and now Ocean are unlocking data for more equitable outcomes for users. We are proud to be partnering with Ocean to make this a reality.
The above graphic is the Outlier Ventures’ Convergence Ecosystem Framework. Sign up to the newsletter to be the first to receive the Convergence 2.0: The Convergence Ecosystem paper coming in February.
More information about the Ocean project can be found at:
DISCLAIMER: THIS DOES NOT CONSTITUTE 3RD PARTY INVESTMENT ADVICE. IT ONLY SERVES AS AN ARTICULATION ABOUT WHY WE AT OUTLIER MADE OUR INVESTMENT. PLEASE DO YOUR OWN DUE DILIGENCE WHEN CONSIDERING MAKING INVESTMENTS.