Latest Essay


Ownership has become more complex due to modern technologies. It is associated with the legal concept of property, which is broadly understood as a bundle of rights that define the relationships between the owned object, the owner, and other people [Wikipedia]. The notions of ownership and property are also the building blocks of capitalism, and varying interpretations of them lead to different varieties of capitalism, but that is a separate essay.

Conventionally, ownership means that the owner alone can do whatever they want with their property. However, this is no longer the case for either devices or data.

Devices

There are two main ownership issues for devices. The first is the right to repair and the second is device bricking.

Companies maintain an exclusive hold over their products in myriad ways. The tactic most relevant to our discussion is the threat of voiding the product’s warranty if owners use third-party repair services or spare parts. By doing so, companies effectively own the after-sales service of the product, thereby extending their profits. Having excluded competitors from this market, they hold a monopoly over it and can charge consumers whatever fee they want. This approach, when generalized, can be referred to as “walled gardens”, where companies create hermetically sealed product ecosystems that bar anyone else from entering. This is very attractive for investors since it creates a fortress around their products and provides a moat for the business.

Prior to a recent legislative push within the US, at both the state and federal levels, companies such as Tesla, Apple, and John Deere opposed the owner’s right to repair. After years of advocacy, activists have tilted the playing field in favor of the right to repair. This is an encouraging sign that demonstrates the possibility of change on a societal and systemic level. However, for manufacturers whose business model depends on after-sales products, the battle is ongoing. For instance, HP’s printers are loss leaders that require recurrent ink purchases to make a profit. HP has therefore gone to great lengths to prevent the use of third-party ink cartridges, claiming that it does so for security reasons, a claim that experts are skeptical about [Wired].

Another issue that we face today is device bricking. Many electronic gadgets today are smart or connected devices, and this trend will likely accelerate as CPUs and their related circuitry are squeezed into ever smaller form factors. However, many of these devices depend on the producer’s proprietary cloud services to function properly. If the company goes out of business, gets acquired, or decides to discontinue the product, these devices turn into electronic bricks, in other words, glorified paperweights. This has happened to Pebble smartwatches [The Verge], the Revolv smart home hub [iFixit], and the Nike Fuelband [Wikipedia]. Some diehard Pebble owners formed Rebble, a grassroots group to lengthen the lifespans of their devices [Rebble], which provides an interesting case study of how connected devices can be supported well after the demise of the company. It also ties in with the issue of walled gardens I brought up earlier: with increased transparency and interoperability, owners have more agency over their devices. I will revisit this point towards the end of this essay.

Data

Data can be thought of as a form of intellectual property. The ownership of data is contentious due to a variety of reasons, including its intangibility, the inability of governments to pass privacy laws, and the breakneck pace at which data is moved and managed across the internet.

A brief history and geography of compute and data

To wrap our heads around this topic, let’s define the types of user data that exist on the internet today. Our focus will be on the data relationships between users and platforms, since most of the internet is currently organized around platforms. A platform broadly refers to any large internet-based service, including social media sites and marketplaces. The types of data are listed in order of increasing contention.

  • User-generated data: Data that is created by the user and that they are aware of.
    • Personal data: Examples include emails and documents that we upload to cloud storage. This data is mostly for private consumption and not publicly accessible.
    • Social data: Examples include posts that you create and comments, likes, and subscriptions in response to other users’ posts, usually within the context of a social media platform. This data is often publicly available, or at least visible to everyone on the platform.
  • Platform-generated data: Data that the platform collects from the user’s behavior, often without the user being aware of it. Examples include mouse clicks on a page, how long a user spends on the page, and their browsing and search history on the website. Some of this data, such as the history of past purchases, may be available to users, but they may not know how the platform uses it.

This taxonomy may not neatly fit the practices of every platform, but it generally applies. Eagle-eyed readers may notice that the line between platform-generated data and social data can be extremely thin. For instance, how many of us are actively keeping track of all of our likes?
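To make the taxonomy concrete, here is a minimal Python sketch that sorts a piece of data into one of the three categories. The names `DataKind`, `DataRecord`, and `classify`, and the two yes/no attributes used to decide the category, are my own illustrative assumptions and do not describe how any real platform labels data.

```python
from dataclasses import dataclass
from enum import Enum


class DataKind(Enum):
    PERSONAL = "personal"            # user-generated, for private consumption
    SOCIAL = "social"                # user-generated, visible to other users
    PLATFORM_GENERATED = "platform"  # collected by the platform from behavior


@dataclass(frozen=True)
class DataRecord:
    description: str
    created_by_user: bool    # did the user knowingly produce it? (assumed attribute)
    visible_to_others: bool  # can other people on the platform see it? (assumed attribute)


def classify(record: DataRecord) -> DataKind:
    """Map a record onto the essay's taxonomy using the two assumed attributes."""
    if not record.created_by_user:
        return DataKind.PLATFORM_GENERATED
    return DataKind.SOCIAL if record.visible_to_others else DataKind.PERSONAL


examples = [
    DataRecord("document uploaded to cloud storage", True, False),
    DataRecord("comment on another user's post", True, True),
    DataRecord("time spent on a page", False, False),
]

for record in examples:
    print(f"{record.description}: {classify(record).value}")
```

A "like" shows why the line is thin: it is created by the user and often visible to others, so this sketch would call it social data, yet few users treat their likes as something they consciously authored.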

It is also instructive to think about the economics of the different types of data. Since users might complain about privacy issues if personal data is used by platforms, users are expected to pay for these services. This is why we keep getting reminders from Google about how we are reaching storage limits. On the other hand, social and platform-generated data is used to build recommendation engines, i.e. building a model of what you like and associating it with others similar to you. The two main ways that platforms profit off the recommendation engine (and just to stress, made using user data) are through advertising and increased purchasing behavior. Generally speaking, social media platforms tend to rely on advertising to provide a service where users do not have to pay, while marketplace platforms rely on selling users more stuff by suggesting other items to users. Amazon uses both approaches, selling advertisements to businesses and recommending stuff to users to capture both types of value.
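The idea of "associating you with others similar to you" can be sketched in a few lines of Python. This is a toy example under stated assumptions: it uses Jaccard similarity over sets of liked items, which is one simple technique among many and not a description of any particular platform's engine. The users, the items, and the function names `jaccard` and `recommend` are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of items two users have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, likes: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Suggest items liked by similar users that `user` has not liked yet."""
    mine = likes[user]
    scores: dict[str, float] = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        # Each unseen item is weighted by how similar its fan is to `user`.
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]


# Hypothetical social data: who liked what.
likes = {
    "ana": {"cycling", "espresso", "synths"},
    "ben": {"cycling", "espresso", "film cameras"},
    "cy": {"gardening", "film cameras"},
}

print(recommend("ana", likes))
```

Note that the only input is user data. Without the likes, there is nothing to compute, which is why the value captured by the engine is so closely tied to what users hand over.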

The platform’s Terms of Service (ToS) define the relationships between the data, the user, and the platform. For most platforms, users own their data, but the platforms have an automatic license to use that data in broad ways. Practically, this means that while we own the data, platforms can use it as they see fit.

Going back to ownership as a bundle of rights, we sign away many of those rights as soon as we agree to be on these platforms. Not only that, companies can and do change the ToS to better align with their business objectives, and most users, being ignorant, indifferent, or oblivious, simply agree to the new ToS. What are they going to do, leave the platform?

Even if we decide that our data is being used egregiously, it is difficult to move to another platform. One of the challenges of switching platforms or providers is the lack of interoperability, something that I brought up earlier in the essay. One could easily switch email providers and clients because the fundamental protocols are the same. However, changing your cloud storage or social media platform is far more difficult, as platforms are not designed with interoperability in mind. This is beginning to change, with newer social media platforms adopting open standards such as ActivityPub, but it remains the exception rather than the norm. Furthermore, platforms have network effects in their favor. If all of your friends are on a social network, you are unlikely to leave.
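As a rough illustration of what interoperability buys, the Python sketch below converts a post from a made-up platform export into an ActivityStreams 2.0 `Note`, the vocabulary that ActivityPub builds on. The `legacy_post` layout, its field names, and the `to_activitystreams_note` helper are assumptions for illustration; real platform exports differ, and a complete ActivityPub object would usually carry more fields.

```python
import json
from datetime import datetime, timezone


def to_activitystreams_note(post: dict) -> dict:
    """Convert a post in a hypothetical export format into an ActivityStreams Note."""
    published = datetime.fromtimestamp(post["created_ts"], tz=timezone.utc)
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "attributedTo": post["author_url"],
        "content": post["body"],
        # ISO 8601 timestamp in UTC, written with a trailing "Z".
        "published": published.isoformat().replace("+00:00", "Z"),
    }


# Hypothetical export format; the keys are invented for this example.
legacy_post = {
    "author_url": "https://example.com/users/ana",
    "body": "Hello from my old platform",
    "created_ts": 0,  # seconds since the Unix epoch
}

note = to_activitystreams_note(legacy_post)
print(json.dumps(note, indent=2))
```

Once data is in a shared format like this, any service that understands the standard can read it, which is the same property that makes switching email providers painless.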

Especially in the context of Large Language Models (LLMs), platforms are increasingly aware of the value of social data and are looking to either monetize it (e.g. Reddit [ArsTechnica]) or use it to train their own LLMs (e.g. Facebook [TechCrunch]). It was recently reported that LinkedIn did not even bother to change its ToS before training AI on user data [404 Media].

Companies have been using user data for profit for a long time, primarily through recommendations. However, the emergence of large AI models, which require copious amounts of data for training, signals a turning point by introducing a new way for user data to be monetized. This has made the public more conscious of the value of their data. The community feeling the biggest impact is creators. Artists, writers, singers, and content creators are pushing back against the use of their work in these large models. The notions of the Internet commons and what constitutes fair use are, and will continue to be, subjects of ongoing debate.

We do not own what we cannot defend. With our data living on these platforms, it feels like half the battle is already lost. This issue could be addressed through renewed privacy policy and legislation — and there is some renewed hope here with increased activity among US lawmakers — but that will take time. A solution was needed yesterday.

With regard to data ownership, I think that a quicker response can come from reimagining our relationship with, and dependence on, platforms by gaining technological independence and self-determination. This requires technology builders to provide alternatives to the status quo. My explorations will focus on rehoming our data and bringing it back under our direct control. This will involve building tools that enable everyday people to consume and produce data locally. I believe that if our data lived within the boundaries of our local network, the existing power dynamics between users and platforms would fundamentally shift.

Watch this space.
