The roads are alive
Earlier this week, Google announced what it is calling a "breakthrough conversational technology": a language model along the lines of OpenAI's GPT-3 that "can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology." This claim is framed with a weird and patronizing description of what "language" is, as though we needed a team of computer scientists to break it down for us: "Language is remarkably nuanced and adaptable. It can be literal or figurative, flowery or plain, inventive or informational. That versatility makes language one of humanity’s greatest tools ..." The tone reminds me of the classic first line of L. Ron Hubbard's Dianetics, where he claims that "the creation of dianetics is a milestone for Man comparable to his discovery of fire and superior to his inventions of the wheel and arch."
Of course, the marketing language in these sorts of announcements should not be taken at face value, but in general, we should hold off from letting Google give us lessons about what should be considered a "natural" way of doing anything. Google, like other tech companies, is always in the business of trying to naturalize the use of its products, so that it seems normal, for example, to have a listening device in your house tracking your behavior or to have algorithms help compose your personal correspondence.
What Google apparently has in mind for its conversational technology is to make its search bar dialogic. That way, it could answer your question with a question of sorts and help refine your initial query into something more specific. More likely, though, it will be yet another algorithmic vector of prejudice and discrimination: As Karen Hao points out, "studies have already shown how racist, sexist, and abusive ideas are embedded in these models." After all, large language models draw their probabilistic assumptions about what sort of language use is "natural" from a more or less indiscriminate assimilation of language use by humans, many of whom are pointedly racist, sexist, and abusive.
Also, a search-bar chatbot will inevitably blur the lines between helping you and helping Google. It will be hard to ascertain whether the search bar is steering your informational inquiries toward more profitable results for Google (especially given its history of slanting its search results). The search bar will become a salesperson, a raconteur, a sideshow in itself, hoping to hold your attention while ads targeted at that captive attention are auctioned off and displayed. Perhaps those ads can embed "conversational technology" too, so that they can cajole, flatter, and harangue you into taking on a more "natural" attitude toward whatever it is that's being pushed. The technology could even generate new ad appeals responsively, in real time, similar to how TikTok shows you algorithmically sorted video. What good is an ad if it can't demonstrate to you that it knows you better than you know yourself?
A common concern about large language models is that they will be used to help conduct disinformation campaigns. This report, released a few days ago by Georgetown's Center for Security and Emerging Technology (CSET), examines that threat at length. I've always thought that particular worry was a bit overblown, and this report didn't really change my mind. Disinformation campaigns aren't lacking for deployable language so much as access to people's attention, and having more abundant garbage text won't necessarily make them more effective. GPT-3 is, of course, great at making up plausible lies — that is all it can do, given that any piece of language generated in the abstract is inherently a lie. But disinformation isn't so much a matter of more persuasive lies as of confirming existing biases or producing a smog of indifference.
Most people's lives are already saturated with unwanted language; it seems to me that algorithmically generated language wouldn't be any more effective at cutting through all that unless it were specifically targeted, crafted in response to audiences that are understood in increasingly granular ways. Social media allow disinformation efforts to be more impactful because they supply precisely that kind of data and offer a platform for the real-time testing of language to see which framings get more circulation, to see who is talking to whom, and so on. They habituate consumers to being more indifferent to where information comes from, as algorithmic sorting and the homogenization imparted by an interface can work like an imprimatur. That is to say, social media are the technologies that abet disinformation; text generators are entirely parasitic on them for any destructive capabilities they might have.
What text generators can do that humans alone can't is produce slightly differentiated text on the same theme at scale. The CSET report was unfortunately more interested in looking at whether GPT-3 could produce usable text than in laying out the sorts of scenarios in which plausibly human-seeming text generated at scale could be harmful. This piece by Henry Farrell and Bruce Schneier points to one such scenario, where generated text can be used to flood politicians' inboxes or drown out actual human comments on any proposal that is open for public comment.
Generated text can be used to overwhelm human content moderators in any situation, which would presumably drive the adoption of AI for moderation tasks. Together, the generators and the automated moderators could function as a generative adversarial network of sorts, producing ever more refined examples of generated text that can evade moderation. As Farrell and Schneier argue, "the danger isn’t just that fake support can be generated for unpopular positions, as happened with net neutrality. It is that public commentary will be completely discredited." As with deepfakes and the like, the danger is not just that fakes will be taken for real, but that it will become increasingly burdensome to substantiate anything as real. "Reality" would belong to those with the resources to prove it out. Maybe that will be a matter of who can afford the most powerful text generator.
AI models are also being used to generate music, as Alexander Billet details in "As You Were," an essay about a recent mental-health awareness initiative called "The Lost Tapes of the 27 Club." Its premise is to use AI to create works in the style of musicians who famously died young, ostensibly as a reminder of what the world might have missed out on.
I assumed the point of these grotesque parodies was to demonstrate to listeners how the actual dead musicians were irreplaceable, but apparently they were meant instead to hint at the works that might have been, a premise that, as Billet argues, presupposes those artists' inability to evolve. "The presumption behind 'Drowned in the Sun'" — the fake Nirvana song produced for this project — "is that, had Kurt Cobain not taken his own life in 1994, he would have been writing and recording songs just like this." That is the nature of any AI generator, which takes past performance as the horizon of future possibilities.
The technology cannot create anything; it can only calculate probabilities according to a set of benchmarks. This effectively abstracts musicians' generic qualities while negating the spirit that animates those qualities and allows them to transcend formula. In other words, the kind of analysis that can produce these simulations also reifies what is being simulated; it is deeply reductive, despite seeming to produce "new" material. "The idea that a person’s expressive potential could be quantified, programmed, and reproduced by artificial intelligence reinforces the sense of futility characteristic of capitalism, the same futility Cobain grappled with nearly 30 years ago," Billet writes. Artificial intelligence is a machine for implementing commodification and standardization, conformity and normativity. It's not a way to see any future.