Platform Realism
This thread by Roland Meyer proposes the concept of “platform realism” to describe the quasi-photorealistic output of generative text-to-image models like Midjourney. The term is modeled after Socialist Realism, which likewise provided “not depictions of real events, but ‘incarnations’ of abstract concepts, as formulated in prompts.” These are obviously not realistic in the sense of being documentary, but are akin to “realism” as a literary mode, in which a set of stylistic conventions is deployed to try to evoke “ordinary people” and “everyday experience” from an apparently neutral or totalizing perspective while deferring questions about how all representations of “reality” are necessarily distortions.
Unlike Socialist Realism, in which various nomenklatura dictated these distortions, establishing the ideological parameters within which different concepts could be depicted, “platform realism” draws its ideological boundaries from the limitations and biases within a model’s training set. In a paper by Abeba Birhane, Emmanuel Kahembwe, and Vinay Uday Prabhu that analyzes the LAION-400M dataset, the authors note that for image-text pairs scraped from the internet, the
available textual descriptions are of very low quality, often ingrained with stereotypical and offensive descriptors. There are several reasons for this, but chief among them is priming search engines in order to increase engagement with online content. Most visual content that has alt text and is available for download via scraping tools is pornographic, and the alt text associated with such images, which may have a relative benign representation in the purely textual context, is often perverted through the lens for sociocultural fetishizations of the same terms in the visual context.
The “alt text” descriptions subordinate descriptive accuracy to search optimization, and the distribution of subject matter embeds the logic of pornographic scopophilia. These structure an ideology of the visible in which only what is pandering can be seen. The ideology of attention seeking and sales pitches is built into how the images are produced, interpreted, and characterized.
Many training images too are, as Meyer suggests, directly from the realm of advertising: stock photos, brand “key visuals,” and the desultory sorts of images used to illustrate posts to suit the demands of content management systems. Such images “stage abstract concepts such as competence, tradition or whatever the client wants, transforming the realistic depiction of bodies and spaces into a stage for the symbolic manifestation of ideas.”
What “competence,” “tradition,” and “what the client wants” all more or less refer to is the efficient translatability of concepts into images and back, in readily predictable ways — a common sense (with all the exclusions built into that) is made visible. As generated images are dependent on a corpus of images produced under the generalized mandate to be blatant, they can’t really be ambiguous or ambivalent even if you ask them to be; they are always trying to sell you on their accuracy to the expected (statistically probable) version of the prompt. In that way, they are always advertisements for themselves.
But worse than that, anything they depict is presented as if it were for sale, as if it were porn, as if it were being imaged as an ad for something. So even when you prompt a model for a style transfer — for an image in the style of Ingres or something like that — it gives you the commercialized understanding, the “platform realism” version, of what “Ingres” connotes conceptually. You can’t ask for images that don’t ultimately look like an ad, because advertising’s logic of symbolic communication is already programmed into how the models work fundamentally.
Generated images can then be understood as presenting a concentrated form of advertising’s cynicism, in which images can’t document “real things” and can’t be straightforwardly believed but instead crowd out the noncommercial visible world. In a lecture written in the wake of the 2011 London riots, art historian T.J. Clark argued that advertising extends an “invitation to the consumer not to believe in the imagery of happiness on offer but to relish that disbelief (that expertise) as consumerism’s highest wisdom.” This, he claimed, was “a main part … of capitalism’s hold on its subjects.” Seeing through the fakery of the visible world becomes a consolation that forestalls the imagination of a different kind of world. The implication of Clark’s claim is that when images lose their power to enchant, when they are depleted of their “symbolic efficiency,” we are not liberated from consumerism so much as thrust into a condition of knowing doomerism.
Generative models explicitly want to save us the work of imagination; they reinforce the pleasure that comes from strictly recognizing what ideas an image is trying to sell us. While models would seem to shore up the meanings of images — assigning specific statistically derived images to each and every concept — in practice they deplete images of their utopian potential and make them rote and one-dimensional. They are apparently opposed to what “the image-world of capitalism” in Clark’s view must do to maintain its credibility and legitimacy: “keep open the gap — the gap of desire and satisfaction — between the subject and whatever-it-is-the-commodity-may-provide.” Generative models purport to close this gap but only by making “satisfaction,” seeing what you asked for, feel trivial. Like advertising, the models induce a broad-based cynicism about a world made entirely of images that are only empty promises but also the only promises we can hold on to.
Clark describes the possibility of “a no-growth spectacle — an image world starved of resources, frozen and deteriorating, in a state of perpetual un-fashion — a non-ironic modernity, obliged to look itself in the face.” Text-to-image models are perhaps becoming that mirror.
Neuroforecasting
A few months ago, a paper claiming that a machine learning model could pick hit songs with 97% accuracy — whatever that could possibly mean — was making the rounds, eliciting the customary hype: AI will put A&R departments out of work and “neuroforecasting” will track our brain waves and allow models to deliver the “right entertainment … to audiences based on their neurophysiology,” according to one of the paper’s authors. One could understand the word right in this case as far right, since the authoritarian vision implied is that the culture industry can learn precisely which buttons to push to manipulate us without having to loop in our consciousness. The paper eagerly offered proof that popular culture could in fact be forced on us in top-down fashion, seeming to neatly exemplify the true coercive character of consumerism.
But as computer scientists Arvind Narayanan and Sayash Kapoor explain in this post, the study’s findings “unfortunately” are “bogus.” (Why “unfortunately”? Why would you want to use neurological scans to prove that a person’s conscious taste is an illusion, or to allow scientists to determine popular songs mechanistically rather than have artists create them? Why would you want to proceed as though culture were not a social phenomenon but a matter of atomized individual behavior? Why would you seek to derisk the culture industry and make its domination more total?) Narayanan and Kapoor are not especially concerned with the philosophical issues with such studies; they merely point out that many are marred by “data leakage” — when a “model is evaluated on the same, or similar, data as it is trained on, which makes estimates of accuracy exaggerated.” Their mission is to address the “reproducibility crisis” for machine learning studies that they discuss in this paper, not to challenge the ideology of carrying out such studies of social practices in the first place.
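For a sense of how “data leakage” can manufacture an impressive accuracy figure out of nothing, here is a minimal sketch. It is not the method of the hit-song paper or of Narayanan and Kapoor's analysis; the "songs," the single feature, and the nearest-neighbor model are all hypothetical stand-ins chosen for illustration. The labels are coin flips, so there is nothing to learn, yet scoring the model on its own training data makes it look perfect.

```python
import random

def nearest_label(x, train):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(test, train):
    """Fraction of test points whose nearest training neighbor shares their label."""
    hits = sum(nearest_label(x, train) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)
# Hypothetical "songs": one random feature each and a coin-flip "hit" label.
# By construction the feature carries no information about the label.
songs = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = songs[:200], songs[200:]

leaky_accuracy = accuracy(train, train)      # evaluated on the data it memorized
honest_accuracy = accuracy(held_out, train)  # evaluated on data it has not seen
print(leaky_accuracy, honest_accuracy)
```

The leaky score comes out at 1.0 because each song's nearest neighbor is itself, while the held-out score should hover around chance. The gap between the two numbers is the exaggeration that the quoted definition describes.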
A different sort of critique is found in “The Values Encoded in Machine Learning Research,” a qualitative analysis of machine learning papers by Abeba Birhane et al. meant to “question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing.” They include a chart (excerpted above) that shows that none of the papers examined include “respect for persons” or personal “autonomy” as values. Among their other findings is that the papers’ “top values are being defined and applied with assumptions and implications generally supporting the centralization of power,” and that there are “increasingly close ties between these highly cited papers and tech companies and elite universities” — institutions that benefit from and depend upon the centralization of power. They also find “increasing corporate presence in the most highly-cited papers.”
The drive toward centralization is complemented by an investment in the world as totally representable and controllable. Even when machine learning studies are debunked, they are still effective at promoting the idea that human behavior can be fully calculated and predicted. Birhane et al. make this point about how machine learning papers generalize from past data:
When used in the context of ML, the assumption that the future resembles the past is often problematic as past societal stereotypes and injustice can be encoded in the process. Furthermore, to the extent that predictions are performative, especially predictions that are enacted, those ML models which are deployed to the world will contribute to shaping social patterns. None of the annotated papers attempt to counteract this quality or acknowledge its presence.
One can go further and state the implications more plainly: The point of many predictive models is to extinguish freedom. They “work” when used as a pretense to dictate the reality they predict and reimpose past trends on future experiences. Billions if not trillions of dollars are poured into this sort of research not to facilitate human thriving or improve lives but because it promises an unbreakable form of social control in which people’s behavior is “predicted,” i.e. dictated by models that condition their life chances. Machine learning will be used to try to identify the most profitable patterns that people can be compelled to repeat and will feel powerless to change.
4% Pantomime
In the wake of Robbie Robertson’s death and the hagiographic obituaries that ensued, I wondered if I should give the music of the Band another try. (I’d have only one more chance at being death-driven into their discography otherwise, as only Robertson Davies look-alike Garth Hudson remains alive.) It always baffled me that 1970s rock critics were so enthralled by the Band. Whenever I’ve tried to listen to them, their music has struck me as corny and maudlin, occasionally old-timey in the fashion of those boardwalk stalls where a family can dress up in Wild West costumes. The lead vocals sound either lachrymose (Richard Manuel), mawkish (Rick Danko), or stilted hillbilly-ese (Levon Helm). And why were these Canadians writing songs about “Dixie” and “Old Virginny”? All that time I was driving around Canada last month I didn’t hear a single Band song on the radio; it is as if they had gone so far into their Americana shtick that they no longer counted as Can-Con.
The Band’s music has often been described as “timeless,” but now it seems very much of its moment in time, in and around 1970, when roots-rock nostalgia could seem revelatory. It must have read as “serious” in an era when rock was dissolving into bubblegum power pop on one side and AOR soft rock on the other. They seemed sort of like rock-and-roll librarians, with impeccable taste and musicianship that ended up feeling a bit stifling, like the music was shushing you. As great as they were as a backing band for Bob Dylan, their own music seemed like it was trying really hard. But it was clearly influential: The Band’s records help explain why Elton John put out a sepia-toned album called Tumbleweed Connection (great in its own right but best enjoyed as a Band parody), and why the first few Bruce Springsteen albums sound like they do. According to this Wikipedia page, Eric Clapton can be blamed on them too.
As a kid, I would read about how amazing the Band were supposed to be, and then I would play Music From Big Pink or The Basement Tapes and would find myself asking, like Basement Tapes liner-notes writer Greil Marcus did in another context, “What is this shit?” When I saw The Last Waltz I couldn’t contextualize the coronation; I wondered why everyone seemed so self-important. Undoubtedly the Band read their own press. In Mystery Train, Marcus claims that
the Band’s music was the most natural parallel to our hopes, ambitions, and doubts, and we were right to think so. Flowing through their music were spirits of acceptance and desire, rebellion and awe, raw excitement, good sex, open humor, a magic feel for history—a determination to find plurality and drama in an America we had met too often as a monolith. The Band’s music made us feel part of their adventure; we knew that we would win if they succeeded and lose if they failed.
I have to admit that I hear absolutely none of that; I figure it must be something generational. My sense was that we all won when the Band failed, as we weren’t consigned to listening to nothing but a million rock bands that sound like Wilco.
The best thing I learned going back through their catalog is that they recorded a song with Van Morrison for their Cahoots album called “4% Pantomime.” The version released on the album is not good, and hearing Manuel call Morrison “Belfast cowboy” makes my skin crawl. But on the 2021 reissue, there are studio outtakes where Morrison does most of the singing — I edited Manuel’s parts out of those takes and ended up with a pretty solid song. I’m looking forward to when “AI” music editing apps will let me swap in a cloned Van Morrison for the original vocalist on all the Band’s songs.
Optimal frustration
I was struck by this passage from a recent n+1 essay by Ben Parker about the many, many books of psychoanalyst Adam Phillips:
Frustration is integral to acknowledging the reality of objects, as opposed to hallucinated satisfactions: “The object of desire emerges out of the obstacles to [its] presence, as out of a fog.” That is why, for Phillips, “the worst thing we can be frustrated of is frustration itself” — as when the mother jumps in too soon and anticipates what will stop the baby crying. On the other hand, if frustration goes on for too long, and becomes intolerable, the baby might defend against its frustration, either in envious fantasies of dominating the object, or else in a blank denial of her dependency.
The context for this is a discussion of the object-relations theory Phillips frequently draws on to interrogate various forms of attachment. But it made me think of all the tech apps that target frustration and seek to replace it with frictionlessness or immediate gratification. With a lot of technology, convenience is framed not merely as a mode of efficiency, a way to become more productive, but as intrinsically and self-evidently good in itself. Following the line of thinking in the passage above, such technology, if it worked as well as its promoters claimed, would accustom us to what Parker calls “hallucinated satisfactions” as the “reality of objects” recedes into noumena.
But most algorithmic feeds fail most of the time; they don’t consistently evoke a convincing sense of their omnipotence, despite users’ occasional projective fantasy that algorithms can read their thoughts or know them better than they know themselves. Algorithmic feeds are often serially disappointing, but the nature of app interfaces is such that we usually scroll past these failures quickly without too much trauma. They seem to be good enough for most users, probably because they are palpably trying; their very existence positions the user at the center of the universe as a unique being with special needs.
Algorithms are attuned for optimal frustration, not only to sustain engagement (users are never entirely satisfied, so they keep scrolling) but to leave space for the illusion of agency. Scrolling appears as a form of self-soothing, an exercise of control rather than a surrender of agency over decisions about what objects are brought to users’ attention. Frustration is really an aspect of pacification, which is just the sort of paradox Phillips is always uncovering.