Jathan Sadowski recently recommended a paper by Alexander Campolo and Katia Schwerzmann, “From rules to examples: Machine learning's type of authority,” which considers the epistemological implications of the “transition from a rule-based computer programming paradigm to an example-based paradigm associated with machine learning” (also a key topic for Matteo Pasquinelli’s recent book The Eye of the Master: A Social History of Artificial Intelligence, which I wrote about here). Mainly, it is interested in what legitimizes the supposed validity of a generative model’s outputs in the absence of accepted rules for how those outputs are produced. That is, what allows Google, for instance, to prioritize AI-concocted paragraphs in its search results, despite their much-documented inaccuracy and flagrant irresponsibility in response to a range of queries? Why, besides greed and callousness, does the company feel authorized to proceed with what is obviously a very flawed product that is destructive to received standards about what constitutes knowledge, and whose dubious probabilistic methodology many, many critics have pointed out?
Campolo and Schwerzmann’s paper is pretty clearly written, so it seems redundant for me to summarize and oversimplify it, and of course you could have a chatbot summarize it for you. But what would be the principles behind its summary? How could you know why it left out and emphasized what it did? That is basically what the paper is about. Earlier approaches to machine learning adhered to the idea that computers had to be intentionally programmed to make their outputs meaningful, purposive. The reasons for the rules preceded their implementation in the machine, and those reasons were understood logically — programmers had deduced and could articulate why they worked, or what purpose they served. The definitions of concepts had to be determined in advance, along with the reasoning for why those definitions were sufficient, why they were agreed upon and accepted.
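To make the contrast concrete, here is a minimal sketch of what the rule-based paradigm looks like in code. The field names and thresholds are invented for the illustration; the point is that every criterion is specified in advance, and the programmer could explain why each one is there.

```python
# A toy illustration of the rule-based paradigm: every criterion is specified
# in advance by a programmer who can articulate the reason for each rule.
# The field names and thresholds here are invented for this sketch.

def is_dog(animal: dict) -> bool:
    """Classify an animal as a dog using explicit, human-authored rules."""
    # Each condition encodes a concept the programmer defined beforehand
    # and could defend: four legs, barks, within a plausible weight range.
    return (
        animal.get("legs") == 4
        and animal.get("vocalization") == "bark"
        and 1 <= animal.get("weight_kg", 0) <= 90
    )

print(is_dog({"legs": 4, "vocalization": "bark", "weight_kg": 20}))  # True
print(is_dog({"legs": 4, "vocalization": "meow", "weight_kg": 4}))   # False
```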
But there is ultimately no end to all the preconceptualizing one would have to do to program computers this way. And, to yet again quote a key passage from Kant’s Critique of Pure Reason, conceptualization is ultimately an irreducible art:
The concept of a dog signifies a rule in accordance with which my imagination can specify the shape of a four-footed animal in general, without being restricted to any single particular shape that experience offers me or any possible image that I can exhibit in concreto. This schematism of our understanding with regard to appearances and their mere form is a hidden art in the depths of the human soul, whose true operations we can divine from nature and lay unveiled before our eyes only with difficulty.
New approaches to machine learning propose to emulate that “hidden art” and work inductively on data sets, allowing models to develop their own schematisms that have no necessary relation to any concepts the human programmers brought to bear on the machine’s programming. You don’t tell the machine what a dog “really” is, or what criteria something must meet to be considered a dog in a series of specific contexts, all worked out in advance. Instead, in the most extreme versions of undirected training, the machine develops its own clusters of criteria — its own schematisms or “examples,” to use the terminology in Campolo and Schwerzmann’s paper — and humans come in after the fact and give those examples labels that other humans can relate to.
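For a rough sense of what the example-based alternative looks like in practice, here is a minimal sketch using scikit-learn’s off-the-shelf k-means clustering on synthetic data. The feature vectors and the post-hoc label names are all invented for the illustration: the model induces its own groupings from regularities in the data, and the human-readable names arrive only afterward.

```python
# A minimal sketch of the example-based paradigm: no concept of "dog" is
# defined in advance. The model groups unlabeled feature vectors into
# clusters on its own, and humans attach names only after the fact.
# Requires numpy and scikit-learn; the data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Unlabeled "examples": three blobs of feature vectors with no names attached.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# The model induces its own groupings from regularities in the data.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Only afterward does a human assign names that the clusters never "meant."
human_labels = {0: "dogs", 1: "cats", 2: "foxes"}  # arbitrary, post hoc
new_point = np.array([[4.8, 5.1]])
cluster = model.predict(new_point)[0]
print(f"assigned to cluster {cluster}, which a human has called "
      f"'{human_labels[cluster]}'")
```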
But whereas the “human soul” gives humans their schematisms, the “data” supposedly gives them to the machinic models. Campolo and Schwerzmann call this “artificial naturalism”:
machine learning models produce representations, which express regularities found in the data and seem to naturally emerge from it. These representations then become normative in a more traditional sense when they influence our behavior. The “ought” of examples turns into the “is” of the representations, and, in a further twist, these representations become normative (“ought”) again when the algorithmic outputs are used to make and legitimate predictions and classifications that regulate our conduct.
This has “political implications,” the authors note, as the models’ “norms appear not as human-made commands expressed explicitly as rules but as emerging from reality through the mediation of a technological assemblage of models and examples.” That is to say, when a generative model develops a particular schematism, the institutions deploying the model can invoke this logic to insist that it is objective, and that the schematisms we have consciously and deliberately negotiated socially or inherited through various forms of institutionalization and tradition are wrong and should be amended or discarded without recourse to other political means of decision-making.
Campolo and Schwerzmann argue that the persuasiveness of “artificial naturalism” rests on an assumption of “a world determined by deep statistical structures, the source of norms, that are accessible not to human perception but only through the representations produced by models ... learned from the data itself rather than from human-specified interventions.” This turns Kant upside down: Any schematism produced by human understanding should be rejected as inadequate. Humans can’t cognitively access “things in themselves” but we should pretend that at “massive scales” machines and data can — that these escape the kinds of structure humans impose on the world to make sense of it.
As an example, Campolo and Schwerzmann describe how 23andMe tries to use data to “objectively” define ethnicity, an arbitrary sociopolitical concept invented by humans.
To categorize a consumer's ancestry via DNA from a saliva sample, 23andMe relies on a group of initial testing subjects who are asked to self-identify based on what they know about their ancestors and their geographic origins. 23andMe then draws the lines around what it considers as forming a distinctive population by identifying and labeling genetic patterns discovered in the dataset.
These identified patterns are then used to negate the original self-identifications, which were grounded in humanly comprehensible traditions and logics. This is done not to eviscerate the dubious concept of ethnicity itself and undermine its power to shape life chances, but to shift the power to dictate it to a particular company. “The process of clustering and, critically, labeling these clusters in ways that are meaningful is described as ‘experimental’ and is done at the discretion of the company,” but is “framed as naturally emerging from the data, itself a faithful reflection of our DNA.” This process is fundamentally unintelligible — the clusters are not based on sociopolitical rules or human-assigned principles — but it results in people’s lived sense of the truth of their heritage being overwritten.
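The general pattern the paper describes can be sketched in a few lines of code. To be clear, this is not 23andMe’s actual pipeline, which is far more elaborate; it is a toy version, with invented data and group names, of the sequence at issue: cluster a reference panel, name the clusters by the panel’s self-identifications, then let those names override what a new customer says about themselves.

```python
# A deliberately crude sketch of the pattern described above, not 23andMe's
# actual pipeline: cluster a reference panel, name the clusters using the
# panel's self-identifications, then let those names override what a new
# customer says about themselves. All data and group names are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Reference panel: synthetic "genetic" feature vectors plus self-reported ancestry.
panel_features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 10)),
    rng.normal(loc=3.0, scale=1.0, size=(100, 10)),
])
panel_self_id = ["group A"] * 100 + ["group B"] * 100

# Step 1: discover clusters in the reference data.
model = KMeans(n_clusters=2, n_init=10, random_state=1).fit(panel_features)

# Step 2: label each cluster with the majority self-identification it contains,
# the discretionary, "experimental" step the paper highlights.
cluster_names = {}
for c in range(2):
    members = [sid for sid, lab in zip(panel_self_id, model.labels_) if lab == c]
    cluster_names[c] = max(set(members), key=members.count)

# Step 3: a new customer's self-identification is overridden by the cluster label.
customer_features = rng.normal(loc=2.8, scale=1.0, size=(1, 10))
customer_self_id = "group A"
assigned = cluster_names[model.predict(customer_features)[0]]
print(f"self-identified as {customer_self_id!r}, reported as {assigned!r}")
```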
The angry, disappointed, or skeptical comments to a blog in which 23andMe explained an update to their model dramatize the conflict between family narratives, rule-based, bureaucratic modes of identity production, and machine learning's identity attribution. After the update, the clients are critical of the new classifications, which they see as having become less accurate. Consumers seem to expect to receive a scientific confirmation of what they know about themselves. When this fails to materialize, it is not the legitimacy of the apparatus of DNA ethnic analysis and its myriad implicit assumptions about race and ethnicity that are attacked as inconsistent or problematic.
Instead, the inconsistency is located in the human subject, which becomes even more problematic. The individual’s means of sense-making are made to seem irrelevant to that individual, who comes to understand that they are just an object to be manipulated by scaled-up sense-making systems controlled by tech companies. The best way to affect how something is conceptualized in the world is not to think about it and use your understanding and the mysterious art of the human soul, but to produce data and make sure it somehow gets registered preferentially in the calculation machines. (This is a theme in Fourcade and Healy’s The Ordinal Society, which details how those calculation machines have been put into positions of power.)
The categories of human identity are produced and maintained through human practice; they are not natural phenomena registered passively in data sets. Similarly, the process of information retrieval, as Chirag Shah and Emily Bender note in this paper, “is a key human activity, with impacts on both individuals and society.” They point out that “turning to a system that presents information about word co-occurrence as if it were information about the world could be harmful or even disastrous” — and Google’s current course will inevitably provide ample evidence of this. But using generative AI to automate information retrieval doesn’t merely lead to inaccurate information production; it also further curtails human agency in the process and prevents the sorts of social negotiations that determine how information is organized. Instead, models organize information according to “discovered” norms within data sets, norms that correspond to no particular rule or intention but enforce a higher-order rule: the entities that control the data sets necessarily control how information is ordered and accessed.
Information access, Shah and Bender argue, depends on trust relations, on shared values about how knowledge is produced. But generative models presume that trust and preconceived values and social systems for ordering and validating information are superfluous because data is objective. Artificial naturalism can supplant the human contribution to information retrieval. Queries have “answers” (a matter of which words are most often next to which other words) that don’t need to be situated socially. This normalizes information search as a process that doesn’t involve subjectivity (or curiosity, collaboration, negotiation, serendipity, skepticism, etc.).
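A toy example makes the Shah and Bender point about word co-occurrence easy to see. The corpus below is invented, and the “retrieval” is nothing more than counting which words appear next to which; the “answer” it produces reflects whatever the corpus repeats most often, with no relation to whether the repeated claim is true.

```python
# A toy illustration of the Shah/Bender point: an "answer" produced purely
# from word co-occurrence statistics reflects regularities in the corpus,
# not knowledge about the world. The corpus here is invented.
from collections import Counter, defaultdict

corpus = [
    "the moon landing was staged claims the forum",
    "the moon landing was real say historians",
    "the moon landing was staged insists the thread",
]

# Count which words appear immediately next to which other words.
cooccur = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for neighbor in words[max(0, i - 1):i] + words[i + 1:i + 2]:
            cooccur[w][neighbor] += 1

def answer(query_word: str, k: int = 3) -> list[str]:
    """Return the words most often adjacent to the query word."""
    return [w for w, _ in cooccur[query_word].most_common(k)]

# "staged" outranks "real" purely on frequency of repetition in the corpus.
print(answer("was"))  # ['landing', 'staged', 'real']
```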
“Artificial naturalism” has a lot in common with John Cheney-Lippold’s “soft biopolitics,” in which the probabilistic classifications of algorithmic systems co-exist with explicit or received categorizations (like race, gender, social class, creditworthiness, reputation, etc.) and begin to reshape them. Both point to the way models can be deployed to deprive people of the ability to understand why and how they are being categorized, how often or in how many different ways they are being reclassified, and what the consequences of any of these classifications are. “Because the principles of prediction or classification are implicit, we can never know what parts of our behaviors, characteristics, or identities might have caused us to be associated with a certain output category or label,” Campolo and Schwerzmann write. Our status in relation to machine-controlled “AI” processes depends on “a series of ever-modifiable but unintelligible correlations,” which makes us more unintelligible to ourselves the more we are compelled to interact with them. (A hard-core Kantian might claim that this assaults the “transcendental unity of apperception” that allows us to experience coherent subjectivity at all.)
Campolo and Schwerzmann ask, “How might we resist being ruled by examples when each subject is interpellated by machine learning’s type of authority in a slightly different way, as no individual produces exactly the same data and, thus, is classified following the same norm?” The wager of the companies implementing their generative models is that we can’t resist, that we will instead submit and adapt to these models’ shortcomings because it will seem futile to argue against “data” and the norms extracted from it.