Data subjects and data selves
Everybody by now has received a slew of emails about privacy policies from a range of businesses that have been forced to comply with an EU law known as the GDPR: the General Data Protection Regulation. The regulation forces companies to get users to consent to data collection, which means, as Brian Chen explains in a New York Times piece, they are sending emails that basically try to trick those who open them into consenting. The law is also supposed to allow for data portability, which would let you take "your" data to some other service. This might seem like a blow against big data-broker monopolies like Facebook, but as Ben Thompson argues here, portability works both ways: "If you can take your data out of Facebook to other applications, you can do the same thing in the other direction. The question, then, is which entity is likely to have the greater center of gravity with regards to data: Facebook, with its social network, or practically anything else?" Similarly, some (like the Obama Administration) argue that GDPR is a "regulatory moat" for entrenched tech giants, given the expense of compliance.
All of these opinions pivot on a treatment of data that seems too limited. In a New York Times op-ed this week, Atossa Araxia Abrahamian parses the idea of a "data subject," which is central to the legislation and which also explains why people who have never left the U.S. are getting these emails.
A data subject is defined as “a natural person” inside or outside the European Union whose personal data is used by “a controller or processor”; in a curious inversion, it is individuals who are the subjects of data, not the data that is secondary to the individual.
That "curious inversion" seems exactly right to me: The data is not "secondary" to an individual, because the data ultimately does not describe or represent a particular person in any static sort of way. Rather, the data is dynamic and generative. It produces a new subject that is only tenuously related (in ways companies hope are profitable) to actual users, whether they are considered as individuals or as populations.
Nobody owns their own data. "My data" is not really about me; it is about my connection to other data producers. It captures relationships, which makes data collection difficult to opt out of unilaterally. The data is produced not by me but by various assemblages that contain me, other people, and other kinds of sensors, cameras, and so on. The uses to which that data is put can't belong to me as an individual either. The data collected about "me" is immediately aggregated with other data collected about me and other users to draw conclusions and correlations not only about me but about the behavior of users who exhibit certain behavior patterns. Data from a range of users is concatenated to create the data brokers' proprietary work product — predictive analytics, maps of personal relations, bespoke audiences, the real-time value of a particular ad auction or A/B test based on aggregate behavior, etc.
Separating out what part of those things derives from my personal behavior, or from information collected about me, makes no sense. And no one is proposing a way for users to opt out of this form of data production (using "big data" to produce new correlations and products) short of more or less total refusal to participate in anything digital or networked, which is basically impossible. We can't meaningfully consent to all the possible future uses of data, or to the ways our consent affects other people and alters the terms of their consent.
We are the "subjects of data" in the sense that these data products are meant to shape the range of possibilities afforded to us, to make us amenable to various forms of pigeonholing, typecasting, and discrimination on the fly, according to the demands of the situation rather than merely on the basis of our social identity. A range of data — and not just our "own" — can be algorithmically leveraged against us, typically to exploit a potential vulnerability that data analysis suggests might be latent. Being processed in this way produces our current conditions; data collected about all of us points forward, it doesn't look back, and it is never completely spent but is constantly available for endless repurposing.
It is a mistake, I think, to see data as representing us or as a "simulation," because this assumes there was something stable there to be represented or simulated in the first place, and that we somehow become fully separated from this data — as if there is information about me but I remain fully independent from that information, free to act as I please. I don't think that was ever true; now the range of freedom we have may be even more curtailed by the feedback loops that pour data back into what can be perceived as real. Information flows dictate what I can do; I don't generate the information flows out of nothing, out of my own whim.
Instead, data produces a kind of momentum for the self that channels its development into the future. I've written a bit about this before, about the "data self" and identity as following from data rather than preceding it. But my focus was still too individualized at that point. I'm not sure individual identity is especially significant to the scaled-up systems of data aggregation; personal identity seems like a sort of by-product of data collection, a kind of exhaust (to invert another metaphor that overemphasizes personal agency). The aspiration of Google, Amazon, and Facebook, as Frank Pasquale argues here, is not to have an accurate profile of particular users but a broader social omniscience.
In an era of artificial intelligence and mass surveillance, however, the possibility of central planning has reemerged—this time in the form of massive firms. Having logged and analyzed billions of transactions, Amazon knows intimate details about all its customers and suppliers. It can carefully calibrate screen displays to herd buyers toward certain products or shopping practices, or to copy sellers with its own, cheaper, in-house offerings. Mark Zuckerberg aspires to omniscience of consumer desires, by profiling nearly everyone on Facebook, Instagram, and WhatsApp, and then leveraging that data trove to track users across the web and into the real world (via mobile usage and device fingerprinting). You don’t even have to use any of those apps to end up in Facebook/Instagram/WhatsApp files—profiles can be assigned to you. Google’s “database of intentions” is legendary, and antitrust authorities around the world have looked with increasing alarm at its ability to squeeze out rivals from search results once it gains an interest in their lines of business. Google knows not merely what consumers are searching for, but also what other businesses are searching, buying, emailing, planning—a truly unparalleled matching of data-processing capacity to raw communication flows.
For these aspirations to come to fruition, the world must be made to conform to the logic of predictive analytics: That is, the world must be so festooned with sensors and detectors and feedback devices and other manifestations of the "internet of things" that possibilities can be controlled centrally, and the world described by data can automatically be fed back into the world monitored by devices, to ensure that the results remain the same.