One of the few advantages of being as chronically disorganized as I am is that I can keep coming to the same “insights” that I had before but write about them as though they are new. I get to feel like my thinking is agile and spontaneous when I am just following lines of thought I traced years ago. (This helps with sustaining a blog/newsletter for 20 years.) The downside is that I never actually learn anything. It’s as though every morning I wake up with the same piece of pottery shattered in my mind, and I spend the rest of each day trying to reassemble the familiar fragments into something that holds water.
So I just wanted to apologize in case you have heard all this before. I am particularly concerned because I want to write a bit about Yves Citton’s 2014 book The Ecology of Attention, which I read and possibly wrote about a few years ago. Back then I was thinking mainly about algorithmic recommendation and probably wrote something about that — I couldn’t read my own handwriting in the marginal notes I left, so I’m not exactly sure.
As I was re-reading the book this time, I was thinking mainly about generative AI, which is gaining traction as an attention-saving technique and an attention simulator. When LLMs are used to summarize, synthesize, or produce documents, they are often producing the illusion that someone paid attention when they didn’t, and there may be value in that over and above whatever value the generated content may have. There may be some attention arbitrage opportunities. At the same time, if paying attention to something invests it with value, does mechanized attention-paying upend that convertibility? Does it devalue attention throughout the entire attention economy, so that paying attention to something valorizes it at a fraction of previous rates? Or does mechanized attention raise the value of human attention, as well as the value of being able to verify it?
For example, does a product review obviously written by a human become more or less useful in the midst of generated reviews and automated summaries of reviews? Can algorithms be trusted to surface such reviews when they could just generate plausible replacements? What is more valuable to a consumer, the content of the review or the sheer proof of another human’s attention and presence, the idea that one is not venturing into something alone? In Citton’s terms, does the reader want “joint attention” (“a collection of more specific, localized situations, where I know that I am not alone in the place in which I find myself, and where my consciousness of the attention of others affects the orientation of my own attention”) rather than “collective attention” (aggregated attention as conveyed in audience measurements or the kinds of statistical reductions generative models depend upon)?
Citton spends a chapter establishing how collective attention pre-patterns individual attention, providing “an infrastructure of resonances conditioning our attention to what circulates around, through, and within us.” Attention, he argues, is basically mimetic: “human attention tends to fall on objects whose forms it recognizes, under the spellbinding influence of the direction taken by the attention of others.” Attention herds people into groups, the way some people might see others standing in a line and instinctively join it. He describes collective attention as a kind of inheritance, bequeathing a set of weighted parameters to guide our individual attention:
As it collects the pertinent forms that constitute our communal enthralments, our collective attention provides each of us with a series of sensory filters which makes certain saliences appear in our environment. When they inherit these filters, each generation benefits from the accumulated beliefs and knowledge of previous generations. We may characterize these already constituted forms as clichés, through which the modes of perception of the phenomena of our environment are articulated, along with the ways in which we react to them and our manner of referring to them in communication with our fellow human beings – Philippe Descola would speak here of ‘schemas’, and Lawrence Barsalou of ‘simulators’.
This sounds a bit like how an AI believer would describe a generative model: the “accumulated beliefs and knowledge of previous generations” made accessible in schematic form as clichés.
But for Citton, collective attention requires “corrective intentional concentration” — it needs to be adjusted for context, it needs human focus added to it to make it efficacious:
our collective rational attention is nourished by the daily trials to which we subject the clichés we have inherited, as well as the corrective reorientations that we bring to them in the exceptional cases where they failed our expectations and we were obliged to make some changes.
Generative models can reproduce the clichés, but in doing so — in removing the clichés from “daily trials” in human life — do they deter us from making these corrective reorientations? Instead of the field of clichés becoming the backdrop against which we individuate ourselves, it may become the operating system that keeps us within certain prescribed patterns of behavior. Collective attention is the level at which the “attention economy” functions, and it becomes detrimental to the extent that it prevents us from doing anything else with attention but using it to add capitalist value to things. (Attention as “economic value” pre-empts attention as care, as reciprocity, as emotional connection, etc.)
It is tempting to interpret all data as attention data, to see a model’s weights as somehow corresponding with an amount of microattention paid to a particular combination of words or letters or ideas. A generative model would then be like a much more ambitious version of Google’s PageRank algorithm. Citton writes:
Google would then, as we have already noted, be the perfect symbol of the distributed productivity of our collective intelligence: it is our curiosity, our intuition, our informed choices, our personal knowledge and our considered experiences that go to nourish this empty condenser that is the PageRank algorithm with a power of communal thought.
But one could also think of data as what is left of information when you subtract “attention” (the subjectivity involved, the point of view from which something is captured) from it. Citton discusses this in terms of the difference between “vectors” and “scalars”:
To characterize attention as a ‘vector’ is, for Valéry, to insist that it is by nature pressure, prolongation, effort, conatus – or, to be even more precise, ‘direction of effort’ ... If attention selects, filters or prioritizes, it does so starting from a principle of orientation. Attention cannot be reduced to a simple given, a static number: it is much less (countable) reality than (unpredictable) ‘potential’. In other words, as it relates to thought, attention ‘is always formed in vectoral mode’ (like an arrow), and it is only when it stops to think and develop that it can be grasped ‘in scalar form’ (like a number).
This scalarization – which is to say, the operation that translates arrows into numbers – denies the fundamental nature of attention, in the same way that putting a bird in a cage denies its nature as a flying creature. But, as we have seen, it is precisely to a ubiquitous scalarization that we are condemned by the financial logic of capitalism.
The process of training generative models entails turning vectors into scalars at the largest possible scale. (Training models performs, indiscriminately and at an unfathomable scale, the quantification and commensuration/commodification that capitalism depends upon.) This, in Citton’s terms, “replaces attentional energy with electrical energy,” helping sustain the illusion of their equivalence. Mechanical calculations are made equal to purposeful consideration, to desire. Thus the idea that the more power AI consumes, the more attention machines will have paid to the problems facing the world; ergo, let’s fix the climate by letting machines consume more energy!
Another way of putting that is that models turn joint attention (humans attending to something in a reciprocal situation of some sort) into collective attention (a profitable pool of attention that sits somewhere for some reason). Does that make joint attention seem more or less valuable? More or less worth pursuing? If attention is valorizing, does all the attention absorbed by chatbots, etc., make them more visible and the sort of collective attention they compute more valuable? Or do models merely absorb attention to destroy it, neutralize it, strip it of its direction and its valorizing potential?
With generative summaries, “electrical energy” is directly substituted for “attentional energy” as a more efficient alternative. Since information and attention are made commensurate within an attention economy, the information transmitted in a summary can degrade so long as more attentional/electrical energy is saved in the process; the result still registers as a net gain, even if the information is wrong or insufficient. Citton posits “an enormous surplus value, in the form of an attention appreciation, resulting from the difference between attention given and attention received.” Generative models, understood as attention savers, can be used to increase that surplus value. This is obvious in GenAI’s usefulness as a spam generator.
But the models are also attention removers: They are subtracting the human attention that grounds the system as a whole. Saving “attention” with machines is only valuable if that machine attention is considered equivalent to human attention, and that doesn’t seem sustainable. It’s commonplace by now that the increase of available information makes attention a more scarce commodity, but the development of better attention simulators (summary engines, text generators, etc.) attenuates that: what is becoming more scarce now is “attending,” proof of the human process of paying attention, the aspect that turns “information” into “communication” and allows for meaningful exchange to occur. In other words, generative models may make “collective attention” extremely cheap and “joint attention” much more scarce. And the earlier beneficiaries of collective attention (celebrities, brands, clichés, etc.) may lose some of their value or potency as well, as collective attention no longer correlates with an aggregation of ongoing human interest but with the accumulation and expropriation of human interest gathered in the past.
Generative models are something like the opposite of the “ignorant schoolmaster.” As Citton explains, drawing on Jacques Rancière’s book about Joseph Jacotot,
The essential function of their (potentially ignorant) master is not to explain contents, but to train the pupils’ attention, whether that be through a demand made on their will or through the stimulation of their desire. It is toward ‘a habit of, and a pleasure taken in, noticing’ that every teaching experience should tend.
This is “learning to look where the other is looking” rather than simply seeing what is there.
Generative models are ignorant enough, but they only explain contents (with no guarantee of being correct) and convey nothing about the habit and pleasure of noticing, or the process of attentiveness that is instrumental to learning. Nothing about their process of generating outputs has anything to do with how humans generate outputs, so they can’t really teach anything, any more than search engine results teach you how to be a researcher. The models can put the content they are tasked with generating into inanely cheerful voices, as with Google’s NotebookLM podcast generator, but this just makes matters worse — it expects us to take pleasure in the simulation of banter rather than in the kindling of the desire to know.
“The best way of showing another how to research is still to research together,” Citton argues. Generative models may get better at simulating that kind of togetherness with chatty interfaces, but they can’t show us how to notice anything.