News
Chat GPT & Data Wars: Will We Tokenize the Internet?
Exploring the future: tokenizing data to navigate AI’s internet impact.
6 min
Chat GPT (Generative Pre-trained Transformer) has demonstrated how Artificial Intelligence is reshaping the way we work. What hasn’t been discussed is how it may reshape the internet.
It’s obvious that AI technology eliminates or greatly reduces the time and expense associated with performing myriad tasks common to “information work”. The current Chat GPT-3 is not even completely leveraging the expansive potential of the internet, and so it stands to reason that Chat GPT-4, which is orders of magnitude more adept (we’re told) and is designed to build neural networks by going online, will drive still more efficiencies. However, this rests upon a major assumption: a free and open internet. Will Chat GPT tip the free information superhighway into a toll-road model?
Artificial intelligence has two major friction points: access to data and compute costs. Compute costs have fallen and will continue to fall, suggesting compute will increasingly become the lesser concern. Access to data may prove to be the bottleneck. Since AI models are built by “training” neural networks, the more data you provide the model, the more proficient it becomes at parsing that data, or “learning”. The internet is the largest source of low-cost data that has ever existed. It is composed of countless contributions from all of us, along with virtually every idea that has ever permeated global civilization throughout history. Thus far we’ve shared that data freely as a tool to expedite and advance human learning and cognition, presumably because we inherently feel that the advancement of our species at large is a net benefit to us individually, or, more cynically, because it’s been profitable. In either case, share we have, as I am right now.
But Chat GPT, and AI more broadly, interrupts the continuous line of human development in an important way: our contributions are no longer offered solely to advance each other’s knowledge. It’s far more profitable and impactful to instead bolster the aggregate intellectual capacity of computers, which will access our data and use it with artificial intelligence more efficiently and profitably than any fellow human. AI doesn’t just represent exponentially faster, more “intelligent” and cheaper output; AI changes our relationship to data. In effect, we have become the data, and we may want to be paid.
AI that trains its models on the internet establishes our written thoughts as raw material. It takes my limited intelligence, refines and multiplies it, then redirects it to wherever it’s demanded at that exact moment. What do I get for my contribution to a system that takes my weak input and leverages it many times over in the marketplace? Monthly credits to use that same system. Depending on how good Chat GPT-4 is, that may not be good enough. But what recourse have we, what other way could this go? This is where tokenization enters the, um, chat.
Let’s revisit the data friction problem, or more precisely the lack thereof. Most people assume the internet will remain a wild, open expanse of endless data that AI will use at its pleasure. We could, however, revolt against the robots and erect fences, walls and tollbooths everywhere. We could tokenize every site, every blog post like this one, every research paper, poem and photograph. Rather than accept Chat GPT credits, we could demand Chat GPT surrender credits to us for use of our data. This model would require that Chat GPT purchase and remit tokens to access our URL, pay us for the data harvested to build its models, then appropriately credit us for our contributions to the models it builds with our data. This may strike you as silly, but that perhaps suggests you’re not remotely concerned that Chat GPT or its equivalent in 5-10 years will negatively impact your profitability. Perhaps you’ve heard that technology has created more jobs than it’s destroyed, and you feel this will be just the same. Do you think Google shares this view? Do you imagine a world where Google provides free search engine tools to Chat GPT, which reduces their ad revenue to near-zero? Can you think of other companies or industries under threat of extinction by AI who might also wish to gate their data?
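To make the toll-road idea a little more concrete, here is a minimal sketch of an AI crawler buying tokens and remitting them before it can read a page. Everything in it is an assumption for illustration: the `TokenLedger` and `TolledPage` names, the flat per-fetch toll, and the in-memory balances standing in for whatever blockchain or payment rail a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Hypothetical in-memory ledger; a real system would live on-chain."""
    balances: dict[str, int] = field(default_factory=dict)

    def purchase(self, account: str, amount: int) -> None:
        # Tokens are bought with money outside this sketch; we only record them.
        self.balances[account] = self.balances.get(account, 0) + amount

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        if self.balances.get(sender, 0) < amount:
            raise PermissionError(f"{sender} cannot cover a toll of {amount}")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


@dataclass
class TolledPage:
    url: str
    owner: str
    toll: int      # tokens charged per fetch (assumed flat price)
    content: str

    def fetch(self, ledger: TokenLedger, client: str) -> str:
        # Payment happens first; content is released only if it clears.
        ledger.transfer(client, self.owner, self.toll)
        return self.content


ledger = TokenLedger()
page = TolledPage("https://example.com/post", "blogger", toll=3, content="my essay")

ledger.purchase("ai_crawler", 5)
text = page.fetch(ledger, "ai_crawler")  # first fetch clears, 2 tokens remain

try:
    page.fetch(ledger, "ai_crawler")     # second fetch cannot cover the toll
    second_fetch_allowed = True
except PermissionError:
    second_fetch_allowed = False
```

The design choice worth noticing is that payment precedes access: the page owner is credited before any content leaves the site, which is the tollbooth in miniature.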
Google is in a precarious position. It has built a business around classifying and identifying all the data on the internet. If you want to find that data, it will do it for you, and monetize by serving you up some ads in the process. Google has created the formula for what deserves to be seen and what it costs to be visible. An entire industry called Search Engine Optimization emerged as a dark art that solves for invisibility, a testament to Google’s outsize success. But now all that data is integrated into a new framework that doesn’t just fetch data; it parses, integrates and deploys it towards specific human-directed tasks. It actually does something, delivering an end product by alchemizing data rather than delivering individual raw materials for humans to alchemize. In the “old model”, Google generated profits through advertising. In the new model, there are no ads and there are no ad revenues. Chat GPT acknowledges that we do not truly want data; we want results that require data. And since Artificial Intelligence is better positioned to deliver results than humans in many instances, we may want AI to pay us for the data it requires to achieve those coveted results.
Even Chat GPT acknowledges this may be the case; I asked it why it should be allowed to access human-generated data for free. It admits it may not be free in the future, and points out that the costs involved with harvesting and maintaining data may need to be recouped. But Chat GPT fails to mention the cost of human obsolescence in discrete labor markets. Should that be “recouped” by collecting fees from AI for data usage?
Closing off the internet may sound extreme, but it’s already happening. More and more websites are distinguishing between human and computer patronage. Humans can move about the internet with impunity, but robots enjoy no such privilege. reCAPTCHA, for example, is nothing more than a robot-thwarting tool. Its existence affirms that humans should not have to compete against robots, and that robots do not deserve unfettered access to data they can use in ways that deprive the data’s creator or owner of economic benefit. Isn’t that the tenor of AI data gating?
Soon every individual piece of data will be ownable and transactable through NFTs. Your height, associated with your anonymized digital identity, will be a token with a market-determined value. Same for your weight, eye color, resting heart rate, age, income and so forth. You can keep that data off-market by simply not offering it for sale, or you can bring it into a marketplace where any buyer, human or AI, can purchase it. The same will be true for this blog post, and anything digital that exists. You may be thinking, “this blog post is worthless”. Well, you might be right! Just because data can be traded doesn’t mean its value won’t approach zero.
In addition to offering your data for purchase or license, you’ll also be able to see every AI model, blog, research report and final product that used your data as an input, and be paid a royalty for its use. After many years of considering what humans are designed to accomplish on Earth, I came to the conclusion some time ago that we are simply here to create data. Every second of every passing day creates massive amounts of data, and perhaps data markets will coalesce as the tool used to value and actualize that data. Pretty heady, but that’s where I’m at on this.
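The royalty idea above can be sketched in a few lines. This is only an illustration under loud assumptions: the `registry` of which product used whose data, the revenue figures, and the rule of splitting revenue in proportion to the number of items used are all hypothetical, and real attribution inside an AI model would be far harder to measure.

```python
from collections import defaultdict


def split_royalties(revenue_tokens: int, usage: dict[str, int]) -> dict[str, int]:
    """Divide a product's revenue among data owners in proportion to usage.

    `usage` maps each owner to how many of their data items the product used.
    Integer division is used, so a few tokens may remain undistributed.
    """
    total = sum(usage.values())
    if total == 0:
        return {}
    return {owner: revenue_tokens * count // total for owner, count in usage.items()}


# Hypothetical usage registry: product -> {data owner -> items used}
registry = {
    "model_a": {"alice": 3, "bob": 1},
    "report_b": {"alice": 1},
}
# Hypothetical revenue earned by each product, in tokens
revenue = {"model_a": 100, "report_b": 10}

# Accumulate each owner's royalties across every product that used their data
payouts = defaultdict(int)
for product, usage in registry.items():
    for owner, amount in split_royalties(revenue[product], usage).items():
        payouts[owner] += amount
```

The same registry answers both questions raised here: looking up an owner shows every product that used their data, and the split shows what each use pays.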
The next step in this direction might be the broad use of wallets as a tool to sign in to websites across the internet. Wallets will be tied to anonymized Decentralized Identifiers, or “DIDs”. This would be less invasive to users than cookies and provide website owners with greater value. Likewise, bots could be restricted from accessing data without fair compensation and acknowledgement to data owners, as the DIDs would be tied to KYC’d wallets. AI and bots would be unable to receive a DID, as they do not have government identification to complete the KYC process. Website owners would then create a token supply to be used when acquiring any of their data, with tokens remitted a la carte for data usage. Presumably there will be many Artificial Intelligence providers who will compete on pricing and on how the data is ultimately used, making for a fair and robust data market.
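Here is a rough sketch of how that gate might behave, assuming a verified human signs in with a KYC’d wallet while an AI provider, having no DID, must pay in the site’s own tokens for each item. The `Wallet` and `GatedSite` names, the `did:example:123` identifier, the per-item price, and the choice to let verified humans read for free are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Wallet:
    did: str            # anonymized Decentralized Identifier (value assumed)
    kyc_verified: bool  # True only if a government-ID check was completed


@dataclass
class GatedSite:
    name: str
    price_per_item: int  # price in the site's own tokens
    items: dict[str, str] = field(default_factory=dict)
    token_balances: dict[str, int] = field(default_factory=dict)

    def sell_tokens(self, buyer: str, amount: int) -> None:
        # The site owner issues tokens from its own supply to a paying buyer.
        self.token_balances[buyer] = self.token_balances.get(buyer, 0) + amount

    def read(self, item_id: str, wallet: Optional[Wallet] = None,
             payer: Optional[str] = None) -> str:
        # Assumption: a KYC'd wallet marks a human, who reads without paying.
        if wallet is not None and wallet.kyc_verified:
            return self.items[item_id]
        # Everyone else (bots, AI providers) pays a la carte in site tokens.
        balance = self.token_balances.get(payer, 0) if payer else 0
        if balance < self.price_per_item:
            raise PermissionError("no KYC'd wallet and insufficient site tokens")
        self.token_balances[payer] = balance - self.price_per_item
        return self.items[item_id]


site = GatedSite("blog", price_per_item=2, items={"post-1": "Let's tokenize Earth."})

human = Wallet(did="did:example:123", kyc_verified=True)
human_view = site.read("post-1", wallet=human)       # free for a verified human

site.sell_tokens("ai_provider", 3)
bot_view = site.read("post-1", payer="ai_provider")  # costs 2 tokens, 1 remains

try:
    site.read("post-1", payer="ai_provider")         # 1 token cannot cover 2
    blocked = False
except PermissionError:
    blocked = True
```

Because each site issues its own tokens in this sketch, every owner sets a price independently, which is what would let competing AI providers shop across a data market.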
This may all sound like it’s years and years away from being a reality, but it’s not. If you’re interested in tokenizing your data, contact us here. Let’s tokenize Earth.