I am not usually interested in alternate histories or fan fictions, but I have a weakness for the mp3 blogs that offer Albums That Never Were or Albums That Should Exist or Albums I Wish Existed. These blogs offer reconstructions of records, often complete with period-appropriate mock covers, that artists had announced but never released, or develop alternative track listings for albums that were released, or posit albums that could have been released if a band hadn't been dropped from its label, and so on. Often they combine officially released tracks with bootleg material to create something like a deepfake. Listeners are invited to use it as an imaginative crutch; they can suspend disbelief and indulge a fantasy about what could have been while also possessing an actual, rare-seeming artifact that only fans of a certain intensity would think to pursue.
The people who make these albums don’t seem to be audio engineers, but typically their zeal carries them into that territory, as individual songs are reimagined or reconstructed along the lines of some theory of how they were supposed to “really” sound. Lots of revisionist history is staked out. What are perceived as production errors or artistic lapses of judgment or distortions introduced to try to follow fashion can easily be altered with editing software (sometimes touted as “AI”) that can be used to isolate the different components of a recording. The room tone can be changed, backing vocals and orchestrations and other sonic sweeteners (presumably demanded by clueless record-label executives) can be removed, and instruments can be raised or lowered in the mix (fixing lapses of focus or explosions of ego). Aspects of rejected takes can be interpolated or amalgamated to make new and possibly improved or simply just different versions of familiar songs. It’s not much different from what the Beatles have done officially, remastering albums according to contemporary preferences and completing dead band members’ half-cooked demos through various forms of technological augmentation.
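For what it's worth, the component-isolation step these fans depend on has become a commodity tool. Here is a minimal sketch of that kind of workflow, assuming the open-source demucs stem separator; the blogs don't document their tooling, so the file names, the output path, and the -12 dB choice below are purely illustrative:

```python
# A minimal sketch of the stem-isolation workflow the fan editors rely on,
# using the open-source demucs separator. File names, the -12 dB choice, and
# the output path (which depends on the demucs version/model) are illustrative.
import subprocess
import soundfile as sf

TRACK = "touch_me.mp3"  # hypothetical rip of the original track

# Split the recording into four stems: drums, bass, vocals, and "other"
# (where strings and horns mostly end up).
subprocess.run(["demucs", TRACK], check=True)

stem_dir = "separated/htdemucs/touch_me"  # assumed default output location
stems = {name: sf.read(f"{stem_dir}/{name}.wav")[0]
         for name in ("drums", "bass", "vocals", "other")}
sr = sf.read(f"{stem_dir}/vocals.wav")[1]

# "Fix" the record: pull the orchestral sweeteners down by 12 dB and recombine.
stems["other"] *= 10 ** (-12 / 20)
remix = sum(stems.values())
sf.write("touch_me_deorchestrated.wav", remix, sr)
```

Everything consequential, which takes to splice and what counts as a "sweetener," still happens in the fan's head; the software only supplies the stems.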
This reimagining of The Soft Parade by the Doors is typical: The horns and orchestrations are removed from songs like "Touch Me," alternate takes from the album's sessions are tidied up and presented in place of unworthy tracks on the original release, and a side-long suite is extracted from a lengthy jam through meticulous editing and reassembly to produce a song the band never intended to write.
This is a very different kind of deep fakery from when computer scientists used generative models to make “The Roads Are Alive,” a fake Doors song complete with fake Morrison-esque lyrics. (I mentioned it briefly in 2021, after Real Life published this essay by Alexander Billet about “algorithmically reanimating dead musicians to sing against their will.” I don’t remember why we didn’t call it “Come As You Were.”) Where the superfans are exercising their judgment about what a band might have or should have done in making their new versions of songs, generative models derive a lifeless simulation through applied statistics, averaging a performer’s varying levels of inspiration and engagement into a flat constant.
But one can imagine more of the generative-model approach to simulation creeping into these fan-made reconstructions. What if Dylan sang Nashville Skyline in his 1960s folksinger voice? One could develop an AI simulation of that style and apply it to the Nashville Skyline lyrics, swapping in the results while leaving the rest of the instrumentation untouched. One could make an AI Keith Richards sing the vocals on more Rolling Stones songs. One could add string orchestrations or gospel choirs or horn charts to songs that lack them. Any song from any time can now be understood as a provisional version waiting for each individual listener to complete in their own idiosyncratic ways. Eventually this may require no technical sophistication or know-how whatsoever; one will only have to use natural language to tell a piece of software what modifications are sought.
In other words, one can prophesy for AI music what some commentators have predicted for AI video in the wake of OpenAI's video-generating model Sora: that users will personalize all the video they set out to consume according to their whims, as though they had no interest in engaging with what the original makers were trying to express. Confronted with Suno, a music-generating model detailed in this Rolling Stone article by Brian Hiatt, one could come to the same conclusion. The impulse to be a superfan and imagine alternative and expansive possibilities for the music one loves can turn into a recursive solipsism in which one can no longer conceive of hearing anything but one's own immediately gratified whims. From wanting to hear more from an artist, one supplants the artist altogether, overriding their choices and treating them as more or less arbitrary material for one's own choices. Instead of maybe learning how to play songs like your favorite musician, you can make your favorite musician perform cover versions of your self.
Is there any actual demand for that? Even if you listen to music in the most vain way, thinking that the song is about you (don’t you, don’t you?), the experience still depends on some amount of resistant otherness that allows you to feel that you are making it about you. But when software allows you to make things out of the bricolage of other people’s expression, it’s just you talking to yourself. That can be rewarding in its own way, but it is not the same as convincing yourself that someone else has had the same feelings or experiences or tastes as you.
Similarly, there is a massive experiential difference between hearing a song broadcast on the radio and having a Spotify algorithm auto-play it in your earbuds. When you can imagine other people are listening with you, hearing what you hear, then how you feel about what you're hearing matters a lot more — at worst, you can imagine imposing your taste or interpretation on them while standing on common ground. Or you might imagine participating in some kind of collective subjectivity that is in turn capable of modulating your individuality. But if listening is a purely private experience, it is like having a boring dream. If an algorithm chose your song, it's like having to hear about someone else's dream while they insist it was your own.
In the Rolling Stone article, Suno is described as a ChatGPT for music: a user types a prompt and some music is generated accordingly. As with generative-image models, the pretense is that words and sounds match in unproblematic ways, and that a certain sound can be seamlessly translated into a set of words. As dubious as that assumption is for visual concepts, it's even more questionable for sound, given that music is often conceived as nonrepresentational, specifically exceeding what can be expressed in language or in conceptual abstractions. And any listener's experience of music is highly contextualized; what a piece of music evokes, and in what terms one might express that evocation, depend a lot on what experience and knowledge a listener brings to it.
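To make that "type a prompt, get a song" premise concrete: Suno itself isn't publicly scriptable as far as I know, but Meta's open MusicGen model (from the audiocraft library) rests on the same words-to-sound assumption, and a few lines are enough to see how thin the translation is. The prompt below is just the example from the article; the output file name is made up:

```python
# Hedged sketch of the text-to-music premise, using Meta's open MusicGen model
# (audiocraft) as a stand-in for Suno, which has no public API that I know of.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=15)  # seconds of audio to generate

# The prompt from the Rolling Stone piece. Note that MusicGen produces
# instrumental audio only, so the "about a sad AI" clause has nowhere to land.
descriptions = ["solo acoustic Mississippi Delta blues about a sad AI"]
wav = model.generate(descriptions)  # returns a batch of waveform tensors

# Write the result to disk at the model's native sample rate.
audio_write("sad_ai_blues", wav[0].cpu(), model.sample_rate, strategy="loudness")
```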
Hiatt begins the article by describing the model being prompted with "solo acoustic Mississippi Delta blues about a sad AI," and without hearing a note of it, I couldn't imagine anything more boring and insulting than whatever was produced as a result. Suno's makers posit what Hiatt describes as "wildly democratized music making," promising to rebalance a supposed mismatch between the number of music makers and music listeners. In Hiatt's paraphrase of the startup's hype, Suno will be a "radically capable and easy-to-use musical instrument" that "could bring music making to everyone much the way camera phones and Instagram democratized photography." But it seems more accurate to say that Suno's makers want to demystify and disenchant music by urging people to understand it representationally, as an aural way of expressing generic ideas. Music is nothing like documentary photography, and what it expresses is not a matter of framing something that already exists in the world. (And that is not to say that it is even coherent to claim that photography has been "democratized," as though everyone now has an equal say in what constitutes a valuable photograph.)
You don't have to have ever been in a terrible band to understand that people like to make music not because of the end product but because of the process, which is an end in itself. All of practicing is about hanging out, about having something tangible structure that hanging out and emerge from it. Perhaps that is romanticizing the experience too much. But the makers of Suno seem to assume that what people care about in music is not sociality and struggle and surrender to something that refuses to be mechanized, but a kind of control under which you can consume exactly what you think you want and call it "making something." Suno invites us to see music not as made, and certainly not as made collaboratively, but as described, narrated, emerging not in the friction between performers and their instruments and between their imagination and their talent and dexterity, but in the spontaneous, whimsical act of idle curiosity. Text-to-music is music corrupted into text.
It's conceivable that Suno or other tools like it could help people make albums that never existed; they could be harnessed and subordinated to someone's imaginative intention and thereby engage other people. But it is inevitable that things like Suno will be used above all to make unimaginative music, to fill in boilerplate spaces with music-like sounds; they will allow marketing technicians to fine-tune the deployment of sound to manipulate or sedate people, to extract somatic responses that might be mistaken for emotions. Suno will mostly make the specific sort of music that no humans are willing to make without being paid, the sort of music that is imposed on people and that people are, or at least should be, ashamed of having made.
Music works as a kind of proxy for sociality; it exists to harmonize people. Or music might be understood as objectified subjectivity, held together as a totality. I realize that is a jargony way of putting this, but what I am trying to get at is that listening to music is not equivalent to an analysis of sound, and the analytical approach to music militates against what music allows one to experience, something of the reality of another person’s consciousness, a thing that can’t be reduced to statements or prompts.
Of course, music made through different kinds of technology can be social, can capture intentionality and consciousness. But the developers of models like Suno seem to take it as their mission to demonstrate how music (or art in general) can be nonsocial; the technology will let you possess control over it unilaterally and you will compel it to mean what you want when you want it.
The nature of their promises about the models invites listeners to think of music in precisely unmusical ways; it is a training module in how not to listen, or how to listen in such a way as to subtract musicality from music. It’s like listening to a song only in order to translate it into decontextualized facts: to catalog the influences or assign it to the proper genres or to paraphrase the lyrics.
But what an empty satisfaction to identify and factualize music (this sounds like x) rather than to be surprised by it into becoming something other than yourself — to find that you’re experiencing something you couldn’t have even thought to ask for, even if you could put it into words? Why would anyone want to satisfy their demand for art in an instrumentalized fashion and thereby negate it and invalidate it, obviating the very desire for art within themselves by proving to themselves that aesthetics are ultimately programmable? Why would you want to hear your own version at the expense of it being no one else’s?
We can already see that this has been occurring for decades, as the automation of sound accelerated the ebbing of the "band" concept in favor of control-freak solo artists, producers, and so on.
In "The prospects if recording" Glenn Gould predicted that this sort of editing power would become part of music consumption. In the 1960s.