For as long as I’ve been creating music, I’ve been a little bit obsessed with the idea of taking the listener on a journey. Journey, in musical terms, often refers to a continuous progression of ideas that builds a narrative through musical motifs. My idealised version of this eschews standard song structures in favour of a unique progression of ideas culminating in a satisfying emotional crescendo. What that actually looks like in practical terms deviates starkly from the ideal, often resulting in an overabundance of ideas that don’t quite fit together. What I’m getting at is that I often lose objectivity when I’m in the flow state of playful creation.
This exact scenario happened to me this week, and I can assure whoever is reading this that it shan’t be the last time. Sometimes I find this creative giddiness irritating, but for me it simply comes with the territory. Granted, it doesn’t always happen like this, but it happens frequently enough that I’ve had to learn to adapt to it and become a much better, and more ruthless, editor. You’ve gotta know when to kill your darlings, and by that metric I’m a mass murderer. So, circling back around, this happened to me again. And again I was met with an all too familiar frustration at myself and the work.
Let me add some context. This past week I spent some time finalising one of the compositions I’ve been working on. Like all the work I’m doing for my PhD, this track was derived from AI-generated content, starting life as a single AI-generated loop that I greatly expanded upon.
Here is the original prompt from which the track was built:
Once I had created something I was pretty happy with, I rendered small sections of the piece for use as audio prompts to be fed back into generative AI models, creating a feedback loop of ideas between myself and the AI model. The resulting generations of audio retained a loose character of the audio prompts but tended to move the idea in a new direction, at least with the AI model I was using at the time - more on that later.
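As an aside for the technically curious: I do this slicing by bouncing sections straight out of the DAW session, but the same step can be scripted. Here is a minimal sketch using torchaudio, with made-up filenames and timestamps (in practice I choose the sections by ear):

```python
import torchaudio

# Load the rendered mixdown (hypothetical filename).
audio, sr = torchaudio.load("final_mix.wav")

# Carve out a few short sections to reuse as audio prompts.
# The (start, end) times in seconds are illustrative only.
sections = [(0, 12), (45, 60), (120, 135)]

for i, (start, end) in enumerate(sections):
    clip = audio[:, int(start * sr):int(end * sr)]
    torchaudio.save(f"audio_prompt_{i}.wav", clip, sr)
```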
Here is the second generation of AI creations, this time using audio prompts:
This difference in direction was interesting, and I felt I could merge these new generations with the original idea in a seamless way. To an extent, I would argue that I achieved this; creatively transitioning between disparate sections is a fun challenge for me, and I dove in with excitement. A raw glee spawned from the perfect storm of creative expression and analytical puzzle solving, the reward for success being the right to consider myself a clever little composer.
Given that this part of the creative process is driven by play, objectivity is left behind. With this track, what I was left with was a cleverly constructed, albeit chaotic, merging of ideas. If there’s one thing that will assuredly bring you back down to earth after an exciting day of playful creation, it’s showing your work to someone else. They don’t even have to say anything; their presence alone allows you to truly hear the piece, in a way for the first time. The presence of another forces you to find the objectivity you carelessly left behind, and with that in tow it becomes clear what is wrong with the work. So after I showed it to my partner, it became quite clear to me that what I had created simply contained too many ideas. The real issue was that I really liked all of the ideas and didn’t feel all that good about jettisoning any of them. So I did what I should have done all along - I turned one track into two.
It’s interesting to me how much this bifurcation improved both tracks. In the original kitchen-sink version, no one element had the focus it deserved, and the smaller sections never had enough time or space to truly shine. By separating them into two individual pieces, parts of the original that felt underdeveloped could be expanded upon and made all the more intriguing. Focusing on one or two elements rather than four or five lets the listener luxuriate in a specific sonic world and really appreciate the aesthetics without being distracted by superfluous clutter.
Here is the original “final” composition:
Here are the two resulting compositions after I split them out:
As I mentioned earlier, the results from the audio prompts were quite different from the input, even if they retained some of its aesthetic mood. I later discovered that another AI model, Stable Audio (a paid model), which also allows audio prompts, produced results much closer to the input. Stable Audio features a percentage slider for tuning how similar the output is to the input prompt. At the default level of 70%, the results were musically the same but aesthetically divergent - almost the exact opposite of the results I got from the other model (DiffRhythm). Stable Audio’s approach will allow me to get a true progression of ideas from the input prompt, giving me the control to deviate from the input in one-percent increments. I have already had great results using Stable Audio but have yet to reintegrate these new generations of audio back into the session they spawned from.
Here is the audio prompt I fed into Stable Audio:
Here are some examples of Stable Audio set to 70% similarity:
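For anyone who wants to poke at this programmatically: I used the paid web product, but Stability AI also publish an open-weights sibling (stable-audio-open-1.0) and a library, stable-audio-tools, whose audio-to-audio mode works on the same principle. The sketch below is a rough approximation rather than my actual workflow: the init_noise_level parameter plays roughly the inverse role of the web similarity slider (lower noise keeps the output closer to the input), and the filenames and text prompt are made up.

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the open-weights model and its native sample rate / window size.
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

# The rendered section of the session to use as an audio prompt
# (hypothetical filename).
prompt_audio, in_sr = torchaudio.load("audio_prompt.wav")

# Text conditioning is still required; the description here is illustrative.
conditioning = [{
    "prompt": "ambient electronic, evolving pads",
    "seconds_start": 0,
    "seconds_total": 30,
}]

# Audio-to-audio: init_audio seeds the generation with the prompt audio,
# and a lower init_noise_level keeps the output closer to it - the rough
# analogue of a high similarity setting in the web product.
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    init_audio=(in_sr, prompt_audio),
    init_noise_level=0.4,
    device=device,
)

# Collapse the batch, peak-normalise, and save a 16-bit WAV.
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32)
output = output.div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("variation.wav", output, sample_rate)
```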
