PhD Vlog 09 & 10
The satisfaction of creation, the ethics of technological tools, and the ghost in the machine
A few weeks ago I sat down with Stable Audio, the generative AI music tool that has become my go-to source of AI-generated music. I like Stable Audio because it has a clear interface and some good tools for getting the most out of your prompting, such as a set of prompt presets focused on specific genres. But above all else, it offers the ability to use audio as the prompt. By itself this isn’t entirely unusual; however, the presence of a percentage slider that controls the degree to which the output resembles the input really elevates this tool and makes it much more usable for my particular purposes. If I were more inclined to delve into the software mechanics of AI training on my local machine, I’m sure I could tailor the tools to further meet my needs, but I am no software engineer and, as I have touched on in the past, writing code is beyond the purview of my area of research. So, at least for now, Stable Audio is the best tool I have found for my needs at this point in my research. This may change, but here we are.
Before I started generating audio I decided that, this time, I wanted to use exclusively my own music as the audio prompt. Over the past year I have been producing modular patches in VCV Rack, largely for demonstration on my YouTube channel, so I have several pieces of generative modular music that I thought would be ideal source material for audio prompting. I set about recording short sections of these patches, building a small library of demos to use as prompts.
With these demos in tow I moved over to Stable Audio and fed the first modular patch demo into the AI model. I set the percentage to seventy-five percent, the default setting and about where I think the sweet spot is (no doubt why it is the default). Using some of the preset text prompts in conjunction with my audio prompt, I experimented with different settings and explored the possibilities, as detailed in the PhD Vlog 09 video. I got several great results from this prompting session, but one in particular captured my imagination.
The original VCV Rack piece that I used as the audio prompt:
The generated audio from my original audio prompt:
The final piece titled “Bliss Edge”:
All of the generated pieces retained the bass note progression of my original modular demo, and retained the chords, or at least a close approximation of them. The differences between the generated results lay largely in their timbre, texture, and general sound design; otherwise they stuck fairly closely to the harmonic content of the audio prompt. This is one of the great strengths of Stable Audio: it allows users to feed in their own music as source material and get an alternative creative perspective on the sound design, production, and aesthetic sensibilities. This makes it a powerful tool for sound design, or for trying out different ideas on the same framework. It is still quite hit and miss, often requiring tweaking of the prompt, or simply trying again with the same prompt until you get a result that works.
So, with all of that in mind, let's return to the generated piece that I ended up using as the basis for a new piece of music. This piece really stood out from the crowd, offering an alternate version of my input with a few key differences that changed the mood quite significantly. To begin with, Stable Audio added a haunting vocal to the piece. I did not prompt it to add vocals, or use a prompt framework that would insinuate the presence of a vocal performance (i.e. a genre where vocals would be common). When I say vocals, they are clearly identifiable as a voice, but there are no lyrics; it's more of a background harmony of sorts. This vocal re-harmonises one of the chords in the original track, adding a dark flair that alters the mood enough to give the piece a haunted quality. It only really happens on one chord, but the alteration shifts the mood significantly, and I loved that. I knew this piece would be a great starting point to build a track around.
The next step was, of course, to build a track that would best utilise this piece and accentuate its unique aspects. Before I did anything else I brought the generated piece into Ableton Live and started mixing it using multiple layers of parallel processing, to get the best foundation to work from. Once I had reached a sound quality I was happy with, I exported it as a loop and fed it into another AI tool, this time a stem splitter from Voice.ai. This allowed me to extract the harmony, beat, bass, and vocal elements separately, giving me far more control over the audio and the mix. Since I felt the vocals were the key ingredient in this track, I wanted to really showcase them, so I set about mixing them, again with parallel processing and an array of reverbs. I did something similar with the bass elements, although I ended up layering the bass with a simple sine wave sub bass so I could have clarity in the lows whilst retaining some of the texture of the AI-generated bass. I didn’t want this to be a beat-driven track; it was too spacious and lush, and I wanted to lean into that. My original VCV patch was also quite spacious, but had a lot more glitchy strangeness in it and a much more prominent beat, so I wanted to differentiate this piece from the original source material. There is still a beat, although it arrives late and is fairly minimal. I used the AI-generated beat as the foundation but expanded upon it and, again, used parallel processing to mix the original elements, as well as incorporating a new kick drum.
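For readers curious what the sub-bass layering amounts to in signal terms, here is a minimal sketch. My actual workflow happens entirely inside Ableton Live; this Python/NumPy illustration is an assumption for explanatory purposes only, with a synthetic square-ish tone standing in for the AI-generated bass stem. The idea is simply to thin out the stem's muddy lows and let a clean sine at the fundamental carry the sub region.

```python
import numpy as np

SR = 44100   # sample rate in Hz
DUR = 2.0    # seconds of audio for the illustration
F0 = 55.0    # assumed fundamental of the bass part (A1)

t = np.arange(int(SR * DUR)) / SR

# Stand-in for the AI-generated bass stem: a square-ish tone with a little noise.
# In practice this would be the audio returned by the stem splitter.
stem = 0.4 * np.sign(np.sin(2 * np.pi * F0 * t)) + 0.05 * np.random.randn(len(t))

# Thin the stem's lows with a simple one-pole high-pass filter.
alpha = 0.995
hp = np.empty_like(stem)
prev_x, prev_y = 0.0, 0.0
for i, x in enumerate(stem):
    prev_y = alpha * (prev_y + x - prev_x)
    prev_x = x
    hp[i] = prev_y

# Layer a pure sine at the fundamental to carry the sub region.
sub = 0.5 * np.sin(2 * np.pi * F0 * t)
mix = hp + sub

# Normalise so the layered result cannot clip on export.
mix /= np.max(np.abs(mix))
```

This is the same trade the paragraph above describes: the filtered stem keeps the texture, while the sine guarantees a clean, mono-compatible low end.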
In addition to cleaning up and rearranging the AI-generated components, I also added several new elements to the piece. Perhaps most notable is the synthesized bell and lead section that appears twice in the piece and effectively acts as a chorus or crescendo. To me this is a highlight of the track and really builds the emotion. I added several textural pads in various places too, to fill out the arrangement and add movement and punctuation. These pads came from a virtual instrument called Cube, a fairly complex, modern sample-based vector synthesizer. I also used an emulation of the Access Virus TI synthesizer for a few pads towards the end. It’s not so much which instruments I used but rather how they are layered and interact with one another. All of these elements are outlined and shown in the PhD Vlog 10 video.
In my last meeting with my supervisors, Prof. Andrew Brown and Associate Prof. John Ferguson, I mentioned that I had made a new piece of music and was really happy with it. Andrew asked me what it was about this track and its process of creation that I thought worked well. Until then I hadn’t really stopped to consider why this track had captured my imagination so acutely, or why I felt the results were of such high quality. Despite not having consciously thought about it, I had evidently been quietly mulling it over, and a pretty detailed explanation fell out of me:
1
This time I had used my own work as the audio prompt, which allowed me to feel a greater sense of ownership over the resulting work. This may seem simple, but it is important to acknowledge and to examine why it might be so. Making music with generative AI as a collaborator is a little bit like remixing a song; I am taking a piece and refashioning it to my own tastes. Likewise, it is quite similar to building a track around a particular loop from a sample pack; even if another person uses that same sample, their track is unlikely to sound exactly like my version. This process of creation is built on a foundation of inspiration, a starting point from which your own ideas can flow freely. Such a process can sometimes lead us to feel less ownership over the work, and using AI-generated material can elicit a similar feeling in me.
I’m no stranger to remixing songs. I find it to be quite fun, and if I really like the source material (which I usually do; why else would I remix it?) I find a lot of value in recontextualising a beloved work into something new and fresh, something with a different feel that also captures the magic of the original. While I enjoy this process, I find that I am less connected to the finished piece. I place less importance on it and, understandably, feel less ownership over it. My feeling of ownership is stronger when I create music using AI-generated material than when I remix, because of three important factors.
Firstly, the AI-generated piece has not burrowed itself into the wider culture or become part of the zeitgeist. With a popular song, the wider listening populace ostensibly retains cultural ownership over it (if not actual creative IP ownership), making it nigh impossible for me to claim it as my own regardless of how I have reinterpreted it.
Secondly, and relatedly, no one has heard any of the pieces I have prompted the AI models to generate. With no cultural mindshare attached to them, I view them much as I view a sample library of loops, or the royalty-free music libraries online. They don’t occupy space in popular culture, or in the hearts and minds of a listening audience, and as such I am able to feel a greater sense of ownership over the resultant work.
And lastly, as I have discussed in the past, AI-generated music tends to be quite generic, with perhaps one in five prompts resulting in a piece of music that I feel has something special worth using. Even then it is only the bones of an idea; it still needs a lot of work to be brought to a level of finish that I believe to be appropriate. As such, when the piece is finished I feel that my part in bringing it to completion is a significant component of its success, and thus I feel more connection to and ownership over it.
Now, I realise these are fairly granular distinctions, although, I would argue, they are important categorisations for identifying the degree to which I feel connected to my work. So, again: because I used my own work as the prompt, the generated piece was, in essence, written by me and a result of my own work. The AI merely offered me an alternate version. This, combined with how I have chosen to arrange, produce, and supplement the work with my own aesthetic and process, leads me to feel a great sense of ownership over it. I don’t think anyone else would have approached the piece in the way I have and, I believe, it sounds like me.
Ultimately, whether I am using AI-generated material, crafting a remix from a popular song, or using a loop from a sample pack, there is a spectrum of artistic ownership on which all of these creative processes lie. From unassisted creation with no source material at one end, to using only AI to craft an entire song from beginning to end at the other, where a work sits on this spectrum has an impact on how I feel about it. Furthermore, if I use my own work as the prompt to generate variations on the original piece, I move further along the spectrum towards unassisted creativity.
2
Following on from what I have already touched on above, I see the process of using AI-generated materials for music composition and production as no different to using the suite of tools available to the modern music producer: generative sequencing, sample packs and loop libraries, synthesizer presets, MIDI chord packs, chord generators, automated EQ tools such as Soothe, and the list goes on. A common refrain I hear from musician and producer friends is that using samples feels like cheating. My answer to them is not binary; rather, it entirely depends on the context of the sample in their work and how they have used it. Naturally this is a subjective argument, but regardless of where you stand, your position sits somewhere along a spectrum from "all samples are cheating" to "everything is fair game". I tend to reside somewhere in the middle and rely on the context of use to make my appraisal.
My point here is that sample libraries and loops are merely tools, and like any tool they can be used or abused. I personally would find no satisfaction in creating a track entirely out of loops, akin to filling in a paint-by-numbers book. However, someone new to music might find great satisfaction in that approach, and perhaps it helps them learn more about how music is structured. Another example might be inserting a sample into a production that is ostensibly finished, with the sample providing the last missing piece. Some might balk at its use, but I ask: why not? If it works in the context of the track, bolstering what is already there, I see no problem with it.
Likewise with generative sequencing to create melodies or chord progressions: the tools are there, so why not use them? The greater your knowledge of music, music technology, and the intricacies of the tools therein, the better your results are going to be, even if you have decided to use generative music tools.
This all leads me back to something I have been harping on about for years: music composition and production (and art in general) is a curatorial process. We audition ideas to see what works, and generative music tools can facilitate this process and be a source of inspiration. This is all a roundabout way of saying that generative AI is just another powerful tool we can use to create art. It is down to the individual creator to decide how they want to use it, and through that, to what degree they will feel satisfaction in the process and the results. Personally, I need to be an integral part of the process to feel satisfaction, even if I am using generative tools. That's my own personal boundary, within which I feel satisfaction, connection to the work, and a sense of ownership over it.
3
Another reason I like this track so much is simply its arrangement and the way it builds. Beautiful harmony, melody, and a timbral quality best described as "lush" are hallmarks of my sound, and compositional components I can’t help but gravitate towards. This track really allowed me to indulge in those personal aesthetics that are so important to me. Similarly, the way the track builds slowly, confident in its progression, revealing more and more piece by piece, creates a sense of narrative, another hallmark of my work. I love the way the track slowly unfurls, gentle in its approach, until by the time we reach the destination it almost explodes with beauty. I find great satisfaction in the progression of the composition. Everything feels right and natural, something I always strive for in my work but don’t always achieve.
4
Lastly, I believe this piece of music is one of the best I have made (not just for this project, but across all my work) because I identified a palatable idiosyncrasy in the generated output: the aforementioned vocal element. This seemingly random addition that the AI model “chose” to incorporate was the spark I needed, and the reason I latched onto this particular piece of generated audio. It sounds dreamy and melancholic, somehow organic and synthetic at the same time. I recently wrote a long piece about the use of AI in art in which I touched on the otherworldly quality of AI-generated material, and how it feels like a collective digital hallucination. I also discussed how this somewhat unsettling aspect of generative AI is conducive to horror imagery, akin to a ghost in the machine: the complexities of generative AI lead us to ascribe an actual intelligence to the outcome, and a feeling that there is more at play. For more discussion on this topic you can find that article here. It is this unsettling feeling, this je ne sais quoi, that bolsters the haunting vocal in my track. It brings with it a powerful emotional resonance that I would never have expected; it really is the definition of a happy accident. It is soulful, pained but content, real but unreal (I suppose the word is uncanny), and it really works.
If you’d like to know more about my creative and technical process, you can watch the vlog videos embedded in this post.
