Episode Transcript
You mentioned that you weren't really releasing music.
Can you tell me about that decision?
Speaker 2: I discovered that if you typed in, across Suno and Udio, make music like The Flashbulb, it would just sound like crappy versions of my music.
Speaker 1: Ben Jordan is a musician and YouTuber, and he releases music under the name The Flashbulb.
You might not have heard his music yet, but if you've tried an AI music generator like Suno or Udio, you might have used his music.
AI music generators work kind of like an audio version of ChatGPT.
You can type in something like, make me a techno song with upbeat vocals and pianos, and it does, based on a massive library of music that it's scraped.
And when Ben tried messing around with one of these, he realized that his music had been scraped into that library too, without his consent.
Speaker 2: It's one of those things where I feel like a lot of people could type in their name and it might guess something similar to a song that they made, but this was undeniable.
This was just like, oh, this is literally everything that you would expect in a song of mine, even the weird things, except for the me part.
Speaker 1: So Ben came up with a solution: a program that adds imperceptible noise to a music track, confusing AI models and preventing them from replicating the track.
This is a technique called poison pilling.
Speaker 2: Poison pilling started with images.
There's one called Nightshade, and what it did is essentially generate some stuff in the images that was mostly invisible to humans, and then the AI would see it as something else, or it would confuse it.
Speaker 1: A couple of years ago, as LLMs were really taking off, a group of researchers at the University of Chicago developed Nightshade and Glaze.
These are programs that take an image and make tiny changes to it.
These changes are basically imperceptible to the human eye, but they confuse an AI model.
The thinking was that if artists applied Nightshade to their images, and those images were then scraped to train AI models, it would not only prevent the models from learning anything from the individual artist's work, but it would also, quote, poison the data sets and make those models less reliable.
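To make that idea concrete, here is a minimal sketch of the underlying trick, an adversarial perturbation, in Python with PyTorch. This is not Nightshade's actual algorithm (which targets the feature extractors of text-to-image models); it's the classic one-step FGSM attack against a generic image classifier, with `classifier`, `image`, and `true_label` as illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_cloak(image, classifier, true_label, epsilon=0.03):
    """Nudge each pixel by at most epsilon so a human sees no change,
    but the classifier's loss on the true label goes up."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(image), true_label)
    loss.backward()
    # Step every pixel in the direction that most increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()  # stay in valid pixel range
```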
Speaker 2: And so I was like, okay, well, how possible is this with music? Because I know that adversarial noise attacks are possible on things like your Google Home or Alexa or Siri, and it turns out that it is totally possible.
Speaker 1: I'm afraid... From Kaleidoscope and iHeart Podcasts, this is kill switch.
I'm Dexter Thomas.
I'm sorry.
When was the first time that you really started feeling AI impacting your music personally?
Speaker 2: The beginning would be almost in a positive, exploratory way. In, like, twenty sixteen, Google released Magenta, and on one of my albums I used it to sort of generate this weird morphing sound between three different instruments, and it was unlike anything I had really heard before. It was this new type of synthesis.
So of course I jumped all over it, and I was fascinated with it, and then, you know, moving on from there.
It's just the current landscape that we're in that makes it so bad.
So, for example, with Spotify refusing to pay out on songs with less than a thousand streams, you have like a year to get a thousand streams.
If you don't get that, they don't pay you.
So you have all this already, like low royalties being paid out by these digital streaming platforms, and you have way too many artists already, you know, for the system to actually work where people would be making a living.
And now you have people who aren't musicians who are just using these services to generate as many songs as they possibly can on their monthly subscription.
Speaker 1: AI-generated music is starting to creep into music platforms like Spotify and even YouTube.
Now you might have come across it at this point, and maybe you recognized it, and maybe you didn't.
Aside from being really annoying for people who actually care about music, for musicians, this is taking away attention and thus money, because a lot of artists' income depends on the number of streams they get.
And it also doesn't help that the CEO of Suno, which is one of the most popular AI music companies right now, doesn't seem to really appreciate the music creation process.
Speaker 3: It's not really enjoyable to make music now.
It takes a lot of time, it takes a lot of practice.
You need to get really good at an instrument or really good at a piece of production software.
I think the majority of people don't enjoy the majority of the time they spend making music.
Speaker 2: That's like one of the most absurd things I've ever heard in my life.
I mean, the way I hear that is a CEO trying to justify the existence of a company in a practical business sense.
Speaker 1: So before he went the poison pill route, Ben's first idea was to make something that would detect if music was generated by AI.
That way, a platform like Spotify could use it to just reject any AI generated music that someone tried to upload.
Speaker 2: Basically, when you put something on Spotify, or really anywhere you listen to music, obviously file size and bandwidth are giant considerations, and so you have to compress it all, and within that you use techniques like the inverse discrete cosine transform and a bunch of smart-sounding things, and you can detect the discrete cosine transform.
And so the thing is that Suno and Udio, they allegedly went on YouTube and Spotify and they just scraped and scraped and scraped and learned and learned and learned, and so it's quite easy to detect it.
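What Ben is describing is, roughly, that lossy codecs built on the discrete cosine transform (MP3, AAC, Ogg) throw away high-frequency detail, and that fingerprint survives even after the file is re-encoded as a WAV. A much cruder stand-in for that kind of detector, checking whether a track is suspiciously silent above a typical codec cutoff, might look like the sketch below; the file name and thresholds are illustrative assumptions, not Ben's actual method.

```python
import numpy as np
import soundfile as sf

def looks_transcoded(path, cutoff_hz=16000, floor_db=-85):
    """Heuristic: lossy codecs low-pass the audio, so a 'WAV' that is
    near-silent above ~16 kHz was probably once an MP3/AAC/Ogg."""
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)              # mix down to mono
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / sr)
    band = spectrum[freqs > cutoff_hz]
    if band.size == 0:
        return False                            # sample rate too low to judge
    level_db = 20 * np.log10((band.mean() + 1e-12) / (spectrum.max() + 1e-12))
    return level_db < floor_db                  # near-silence above the cutoff
```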
Speaker 1: I see what you're saying.
So you were able to basically detect, okay, this was downloaded from Spotify, because there's basically a sound signature.
Even if a person doesn't hear it, computer-wise, you can tell?
Speaker 2: Yep. And so, the sort of idea... after I announced that, I got a lot of people saying, yeah, but now they're just going to use raw WAV files, they're just going to use the masters. And it's like, good.
Then they have to negotiate with the artist. Like, that opens a conversation.
Speaker 1: The masters we're talking about here are the original, highest-quality tracks, which would be owned by the artist or the label.
But in order to use those master files, you need to get them directly from the person who owns them, and that would mean you'd probably need to pay them.
The goal isn't necessarily to stop AI, just to stop AI that the artist isn't getting paid for.
And Ben thought this project could dissuade AI companies from scraping data, because platforms could use this tool to detect, and then reject, that AI music.
But Spotify hasn't implemented this, and as far as we know, it hasn't stopped AI companies from continuing to scrape music.
Speaker 2: I mean, tell Sam Altman to pay for all the data that he's training everything with, then.
So I guess that's what led to the next step, right? It's like, okay, well, how do we prevent it from being trained on?
Speaker 1: Enter the poison pill.
Ben knew that you could do this in images, but how would this work in audio?
It turns out the process is actually pretty similar.
Speaker 2: It's actually not all that different from a technical standpoint, because the majority of AI music sites, how they're really working is, it's all based on the original U-Net model that was for microscopic imaging.
That's sort of like what changed this whole generative AI thing and made it so much easier to train things than it used to be.
But if you've ever seen an audio spectrogram: with a violin, you would see the note, or a line, slowly get thicker and thicker as the violin got louder, whereas a guitar or a piano would be an instant start. A snare drum would be like an instant start and then maybe a little bit of fade-out on the end, depending on how it's mixed.
You know, there are programs, for example, where you could listen to audio that you draw, and so it's basically doing that.
It's just reading the spectrogram of the audio, and then learning from that, and then re-encoding another spectrogram, and then converting that back into audio.
So it's kind of funny, because that's a little bit of a flawed way of generating audio to begin with.
Like, you can usually hear it if something's been converted to a spectrogram and back, and you can hear that in almost all AI music.
That's kind of why it sounds a little glitchy or squeaky or something.
It's hard to describe.
Speaker 1: I didn't realize that.
The way that these models are essentially interpreting music is they're really interpreting it as images.
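You can hear this round trip for yourself. The sketch below (a guess at the pipeline's shape, not any company's actual code) converts audio to a magnitude spectrogram and back with librosa; because the magnitude image carries no phase information, Griffin-Lim has to estimate it, and that estimation is one source of the glitchy, "squeaky" character Ben describes.

```python
import librosa
import soundfile as sf

# "input.wav" is a placeholder for any audio file.
audio, sr = librosa.load("input.wav", sr=None, mono=True)

# Forward: complex STFT, keeping only the magnitude (the "image").
magnitude = abs(librosa.stft(audio, n_fft=2048, hop_length=512))

# Inverse: rebuild audio from magnitude alone; Griffin-Lim iteratively
# guesses the phase that the spectrogram threw away.
reconstructed = librosa.griffinlim(magnitude, hop_length=512)

sf.write("roundtrip.wav", reconstructed, sr)  # compare by ear to input.wav
```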
Yeah. Using this technique, Ben created a process he calls Poisonify.
When you run a track through Poisonify, it'll add noise.
It's imperceptible to us but visible in the audio spectrogram.
This confuses the AI training on it to the point where it can't identify instruments.
Speaker 2: So Poisonify is essentially preventing what Magenta initially did, where it learns, primarily, to identify instruments and identify style and things like that.
It cloaks it to where it just thinks that it's hearing something else.
You could have targeted attacks, where you can say, I want my piano to sound like a harmonica or something, or you could have untargeted attacks, where it'll just kind of go with whatever's easiest.
And when using them in a particular way, you can successfully make these instruments and these styles unidentifiable.
So Suno or Udio, they would get confused.
Then when it came time to draw a new spectrogram to convert into audio, they would probably draw some of the wrong ones.
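As a rough sketch of what an untargeted attack like this can look like on a spectrogram (this is not Ben's Poisonify code; `instrument_classifier` stands in for whatever surrogate model the noise is optimized against, and the step counts and bounds are illustrative):

```python
import torch

def untargeted_perturbation(spec, instrument_classifier, true_label,
                            epsilon=0.01, steps=40, lr=0.002):
    """Iteratively grow a small noise pattern that pushes the model's
    prediction AWAY from the true instrument label."""
    delta = torch.zeros_like(spec, requires_grad=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(instrument_classifier(spec + delta), true_label)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()   # ascend the loss
            delta.clamp_(-epsilon, epsilon)   # keep the noise quiet
            delta.grad.zero_()
    return (spec + delta).detach()
```

A targeted attack is the same loop run downhill instead: minimize the loss toward a label you choose, like "harmonica," rather than maximizing it away from "piano."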
Speaker 1: So this means that you could put, say, an EDM track that's treated with Poisonify into Suno, for example, and if you ask Suno to generate something similar or extend the song, it'll spit out something totally unrelated, like acoustic guitar music.
Here's a clip from Ben testing that out with his own music in his YouTube video about this process.
So here we go.
Speaker 2: We can upload my original song here, and now here is Suno's AI extension of that song. [music plays]
Okay, now let's upload my Poisonify-encoded track, and here is Suno's AI-generated extension. [music plays]
I would describe this as music from an airport spa that somebody downloaded off of Napster in nineteen ninety-nine.
Speaker 1: The entire video is definitely worth checking out, and we'll include a link to it in the show notes.
But it's really interesting to hear how confused Suno gets when the track is encoded with Poisonify.
Speaker 2: In some of those demonstrations, a lot of people were like, that is so crazy. And it's like, no, really, what it's doing is falling back on its own safety mechanism, because it knows that it was confused.
Speaker 1: This is pretty fascinating.
Poisonify doesn't make the AI think the drums are flutes. It just confuses it so much, with the noise that it adds to the spectrogram, that the AI doesn't know what to do, and falls back and randomly chooses something that it knows to be music.
Speaker 2: So if you've ever used generative AI with images, you'll notice something that happens quite often.
You'll say, I want a dog in a canoe eating a banana, headed towards a sunset, and the image you get might be, like, a dog on a jet ski, not eating anything, in a lake with no sunset. And those are really just failsafes. Like, it tried to make a sunset, it didn't have enough confidence.
It literally is called the confidence rating, and so it just said, okay, let's just make what we normally make in the background.
And so it's sort of the same thing with music.
Speaker 1: And there's another program that takes it a step beyond Poisonify.
Instead of masking the instruments, it masks the music itself.
Speaker 2: Harmony Cloak.
That's like above my pay grade.
I don't really understand how that model works.
I'm just really glad that they're working on it.
But yeah, I mean they obfuscate melody and harmony, which is pretty crazy.
Speaker 1: I talked to the developer of Harmony Cloak about how exactly they do this, and they even helped us test it out on the kill switch theme song. That's after the break.
At the same time that Ben was working on Poisonify, researchers at the University of Tennessee Knoxville were working on another way to poison pill music, a program they call Harmony Cloak.
Speaker 4: Humans and machines interpret data in different ways, so there's a perceptual gap between humans and the machines.
Speaker 1: Jian Liu is an assistant professor at the University of Tennessee, Knoxville, and the lead developer of Harmony Cloak.
He's also really into music himself.
Speaker 4: I love music.
Actually, I also play music.
I play bass guitar.
Speaker 1: Harmony Cloak is similar to Poisonify in that it adds imperceptible noise to the file.
But unlike Poisonify, Harmony Cloak doesn't just work on the level of the instruments.
It completely confuses the AI, so it can't learn from the music at all.
Speaker 4: So what we are doing right now is to use perturbation.
So we inject imperceptible perturbations into the music samples to trick the model into believing that it has already learned this before.
So there's no new knowledge, no new information embedded in these music samples, so they couldn't learn anything from this piece of work.
Speaker 1: So the AI thinks there's no new information and essentially ignores everything that's in the file.
This means that AI models can't train on music with Harmony Cloak applied.
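The trick Jian is describing is usually called an "unlearnable example": instead of pushing the training loss up, as in the adversarial sketches above, the perturbation pushes it down toward zero, so the sample looks already learned and contributes nothing. Below is a hedged sketch of that error-minimizing loop, with `model` and `target` as generic placeholders rather than the actual HarmonyCloak implementation.

```python
import torch

def error_minimizing_noise(spec, model, target,
                           epsilon=0.005, steps=60, lr=0.001):
    """Shape a small perturbation that drives the training loss on this
    sample toward zero, so the model 'learns nothing' from it."""
    delta = torch.zeros_like(spec, requires_grad=True)
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        loss = loss_fn(model(spec + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # descend the loss this time
            delta.clamp_(-epsilon, epsilon)   # stay below audibility
            delta.grad.zero_()
    return (spec + delta).detach()
```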
You're talking about introducing noise, or, what you call, you know, the technical term, perturbations, right? Perturbations into the music, which you say are imperceptible.
Are these actually imperceptible? Right? If you're adding noise, if you're adding extra data into the music, can I, as a listener, hear that?
Speaker 4: The perturbation we injected should have a minimal impact on the perceptual quality of the music, because no one wants to add noise to their artwork.
So we conducted a very comprehensive user study.
We presented both the original one and the perturbed one to musicians, and we asked them to tell the difference, and our study shows that they can't tell the difference between these two.
I think in terms of the musical quality, there's no big difference.
Speaker 1: So, actually, the noise itself is audible?
Speaker 4: So if you listen to the noise only, like, you separate the perturbation from the music samples, you can hear something. It's audible.
But if you combine these two, the noise will be hidden under the music samples, because we leveraged the psychoacoustic phenomenon.
So when we listen to the music samples and the perturbation together, the noise will become imperceptible.
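The psychoacoustic effect Jian is leveraging is frequency masking: a quiet sound close in frequency to a loud one becomes inaudible. Real masking models, like the ones inside MP3 encoders, are far more detailed, but a toy version of the idea, keeping the perturbation a fixed margin below the music's own spectral envelope in every frequency bin, could look like this (all numbers are illustrative, and the music and noise are assumed to share a length and sample rate):

```python
import numpy as np
import librosa

def mask_below_music(music, noise, margin_db=20.0):
    """Scale the noise, bin by bin, so it always sits margin_db below
    the music's magnitude and gets perceptually hidden under it."""
    M = librosa.stft(music)
    N = librosa.stft(noise)
    ceiling = np.abs(M) * (10 ** (-margin_db / 20))   # per-bin loudness cap
    scale = np.minimum(1.0, ceiling / (np.abs(N) + 1e-12))
    shaped = librosa.istft(N * scale, length=len(music))
    return music + shaped
```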
Speaker 1: I was curious about this whole imperceptible thing.
So Jian said he would not only help us test it, but also use an updated process they're calling Music Shield.
So I'm going to play a snippet of the kill switch theme song with and without Music Shield applied, and you see if you can tell the difference.
Here's sample one... okay, and here is sample two. So, you tell me.
Could you hear the difference?
It's very slight, if anything.
Oh, and by the way, if you were wondering, the one that was run through Music Shield was the first sample.
And here's something else.
Music Shield actually goes one step further than the Harmony Cloak process that we were talking about.
It not only stops AI models from training on the music, but it can also prevent music generators like Suno or Udio from being able to edit or remix tracks.
So let's give it a shot, and let's put this thing into Suno.
So if we upload our original untreated theme into Suno and tell it to remix it, here's what we get, which for an AI generator is not bad.
That's in the ballpark of what the original song sounds like.
And this is what happens when we upload our same theme that's been music shielded.
Okay, yeah, this is different.
It's a lot more soothing than the song we gave it.
It kind of feels like a corporate video, maybe for investors at a defense contractor.
There's maybe a guy in a suit on the screen and he's telling you about how their business is really all about family.
Clearly it works. Suno got so confused that it just spit out some generic corporate music.
So, I've read the paper that you published recently, entitled HarmonyCloak: Making Music Unlearnable for Generative AI.
One of the things that I found really interesting is the language that you use, right? Because ostensibly you're talking about music, a very kind of broad, very easy to understand thing.
But as I'm reading through your paper, I'm realizing I'm reading a security paper.
Section three point one is entitled threat model.
And then as I read further down, I'm seeing, you know, I'm just gonna read from this a little bit.
Yeah, you're laughing, but this is amazing to me.
I mean: the attacker, e.g., AI companies or model owners, might scrape music data from the Internet or music streaming platforms to train their music generative AI models, potentially leading to copyright infringements and harming musicians.
This part right here, I love this: we assume the attacker possesses substantial advantages and capabilities, including unrestricted access to the training data set and model parameters, facilitating comprehensive data and gradient inspections, and the ability to perform adaptive attack strategies.
And it goes on.
But I mean, this is fascinating, because usually if you read a security paper, you're thinking of the defender as, you know, somebody with some resources.
It could be a bank, it could be a tech company, it could be a governmental agency, and the attacker is somebody with considerably less resources.
This is the reverse of that.
The attacker is somebody with a lot of resources, probably a large tech company, and the defender is just, you know, some kid with a bass guitar.
Speaker 4: Yeah, exactly.
And actually, in the paper we also discussed the possible attacks, because the big tech companies may also leverage additional strategies to relearn or reprocess your protected music, to learn something from it.
Right? So one very straightforward way is to use noise cancellation techniques to remove any perturbations from the music samples.
So that's maybe one strategy they leverage.
But the thing here is that you can leverage whatever way to remove the noise.
Yes, you can reduce the effectiveness of our framework, but on the other side, the quality of the music will drop as well, because when you remove the noise, certain musical features will be removed as well.
Speaker 1: Yeah, so I think I see what you're saying here.
So your framework, Harmony Cloak, the entire purpose is to make the song unusable for the tech company.
And if the tech company then tries to do something to mitigate that, to remove the noise, the perturbations that you've introduced into the music, that reduces the quality. And do they really want to be putting bad-quality data into the data set?
No. So you've accomplished your goal, which is, again, to make it unusable for them.
Speaker 4: Yeah.
Speaker 1: Ben Jordan has a similar philosophy with his Poisonify project.
He knows it's going to be a battle with AI companies, but that's kind of the point.
Speaker 2: A lot of people have said, well, you know, with what happened with Nightshade and images, they're just going to do something like that.
Speaker 1: To clarify what Ben's talking about here: pretty soon after Nightshade came out, and that's the poison pilling tool for images mentioned earlier, people started saying that they'd figured out a way to bypass it by blurring or sharpening out the noise.
The team that developed Nightshade disputes this, but their other popular tool, called Glaze, was also briefly bypassed using some image upscaling techniques.
And it's basically a back and forth war here.
And because audio AI is still processing a spectrogram image, in theory an AI company could also use similar techniques to bypass Poisonify and Harmony Cloak.
Speaker 2: They might. There are a couple of considerations, though. So, like, when we talked about that audio going into a spectrogram and back into audio thing: think about how a snare drum works in a spectrogram. Okay, it kind of starts immediately and then it fades out a little bit, and that gives it, like, the right sound. Yeah, if you were to blur that, now it sounds really bad.
Speaker 1: You have this.
Speaker 2: And so different things need to be precise.
And not only that. So they use, like, the blurring and the AI sharpening, and if they were to do that with spectrograms, still, that's a lot of extra compute and expense. And really, the goal is just to pressure them to work with musicians. Like, if they actually want to make money off of this and they want to continue selling subscriptions to generate this stuff, then, uh, you know, it's just to pressure them to make getting the WAV file directly from a musician cheaper and easier than doing it the way they've been doing it, without consent.
Speaker 1: Uh huh.
So part of this is you're not necessarily thinking that this is an undefeatable attack against you know, AI scraping.
It kind of sounds like you're sort of hoping that it becomes obsolete because we know there's going to be an arms race.
We know companies are going to figure out a way to defeat your poison pill attack.
Let's just be real.
They got more resources than you, they got more engineers than you do.
Yeah, they're going to figure it out, sure. But it kind of sounds like you'd just rather them decide, you know what, this ain't worth it.
Speaker 2: Yeah. I mean, you know, I do hear, like, the arms race analogy all the time, and it's like, war is almost always a net loss. And if they have anybody smart in whoever's funding them, they will understand...
Speaker 1: That you can make it so annoying, yeah, to scrape people's music, that they're just going to, you know, quote unquote, do the right thing.
Speaker 2: Yeah. Ultimately, you might have it become so omnipresent that AI music sites actually have to say, okay, well, we need to just talk to artists now, we need to just start training on stuff that we know doesn't have this, because we're wasting too much money on compute for things that are just degrading the model quality.
Speaker 1: So if you're a musician, you're probably thinking, when can I start using this stuff to protect my music?
Or if you're a music fan, you might want to know when this stuff drops so your favorite artists can stop getting their music scraped and stolen.
Well, I've got bad news for you, but also some good news that's after the break.
So I know that there's definitely gonna be some musicians who will hear this and say, this sounds amazing.
I want this now.
Yeah, is this something that's available right now?
Speaker 2: It's funny, because after I released that video, I probably got one hundred emails of people just linking me to, like, Google Drives of their songs, and I'm just like, okay, this is not how it works, unfortunately. Yeah. So, I'm not an ML developer. I can't write efficient code.
I believe it took about two weeks for ten songs, or something like that, on my machine with two brand-new, state-of-the-art, you know, big video cards.
Speaker 1: Two weeks, as in on-and-off encoding?
Speaker 2: No, nonstop.
Speaker 1: Two weeks of nonstop encoding to do your album. Okay, yeah, that is... that's not accessible.
I will not be asking you to handle my album for me, then. Never mind.
Speaker 2: Yeah. And so it's also like, even if I could, it's still like, okay, well, if it's this inefficient, like, using this much power... you know, what we don't want is to set the planet on fire just to protect our music from a couple of startups.
Speaker 1: So, Ben Jordan's program might not be available to the public anytime soon.
That's the bad news.
But here's some good news.
Professor Jian Liu and his team do have some near-future plans to make their software more widely available.
For a musician, in the future, what would protecting your music with something like Harmony Cloak look like?
Is it downloading an app?
Is it uploading it to a site and re-downloading it?
What are they doing?
Speaker 4: There are many, many ways to use this technology to protect their music.
First of all, we are thinking to integrate these technologies with other platforms, for example Apple Music or Spotify, so in that case, once they upload their music to the platform, it can automatically protect their music.
We'll also create a website, so on our website they can upload their music, then download the perturbed version from it, so musicians can use this very easily.
Speaker 1: Do you have a timeline for when this might be available to the public?
Speaker 4: In July, we plan to launch a test program which will involve around two hundred musicians, so that we can further fine-tune and improve this system before large-scale deployment. And if everything goes smoothly, I think integration of this technology into the platforms will be very quick.
Hopefully this can be integrated in August or September of this year.
Speaker 1: Despite the fact that he's working on programs that are actively fighting against AI, Jian Liu is not universally anti-AI, and neither is Ben Jordan.
They both think that AI can be a useful tool.
Speaker 4: I think AI, machine learning itself, doesn't have any problems.
The problem is how these big tech companies train their models.
And also, from the musicians' perspective: we talked to many, many musicians, and actually some of them use AI.
They feel these AI models are pretty useful.
But if these companies want to use their music samples for training models, they need to get explicit permission.
Also they need to offer compensation to musicians.
Speaker 1: How do you feel, like, the next six months to a year plays out for music and AI?
Speaker 2: One thing that, I mean, is probably good news for anybody who's worried about AI music taking over anything, is that psychoacoustics are really, really complicated. Like, telling a computer to hear something without any sort of image analysis, to just not go that spectral conversion route and just hear something the way a human hears, is like... you may as well just ask it to become self-aware, because that sounds easier to me. Just because of, like, what's happening, you know? Our hearing is by far the most sensitive sense that we have.
And when you think about what happens, from picking up pressure waves, to the little hairs in our ears picking this up and interpreting them, in conjunction with our brain, into sounds...
It's kind of mysterious and crazy.
And so to just tell AI, like, hey, listen to this sonic pressure and figure out how to make it again... that's a much bigger ask than I think it sounds to your average investor or something, who thinks that AI music is going to eventually, I don't know, I guess, replace musicians or something.
Speaker 1: Yeah.
Speaker 2: Ideally, what I would really like to see, as people get more used to AI, is two things. I would like people to use it locally.
So, for example, Imogen Heap.
She sent me a bunch, like probably over an hour, of her singing, sometimes in really weird ways, and then we sort of worked together and I created a voice model, and then she sang through the voice model, and that was all local.
It wasn't happening through any sort of service.
I really liked that idea.
And I like the idea of artists being able to sell their voice or sell their music style or their instruments or something like that and put it in a marketplace.
And really, the only technology that would need to exist in any sort of centralized way would just be somebody to watermark it or something.
I really like that. And the other thing is, right now we're in this land in generative AI where the ideas are just huge and kind of nonsensical. Like, you know, what if AI replaced music?
It's like, nope, it's not gonna do that. But what if it replaced samplers, you know, like violin sample instruments or something like that?
That's actually somewhere where AI can do a really, really good job, to make writing music more fun and more accurate, I guess, and, you know, things like that.
And so once we sort of realize that not every single person is going to adopt AI music and stop listening to humans, then maybe we can invest money into making practical solutions.
Speaker 1: And that is it for this particular discussion about AI and music.
And I say this particular discussion because this is not the last time we're going to be talking about AI and music.
This is a really big topic, and all of us at kill switch are pretty into music, so we're absolutely going to be getting back into this again.
And you know, if you've got any music-related stuff that you're curious about, let us know.
Before I get out of here, though, I gotta do some shout outs.
First, big shout out to Ben Jordan.
If you found his Poisonify concept interesting, he has a whole YouTube video on it, and the link for that is in the show notes.
He's also started a company called TopSet Labs that's developing AI voice models that are trained on artists who have given their explicit consent, and a lot of them are making more money in those royalties than they do on Spotify.
So if you're a musician, or you're just curious about how that works, you might want to check that out too.
Also a big shout-out to our other guest, Professor Jian Liu, as well as Syed Irfan from the University of Tennessee, Knoxville, for letting us test out Harmony Cloak and Music Shield.
And if you want to check out that paper we were referencing, there's a link to that also in the show notes.
Thank you so much again for listening to kill switch, and let us know what you think, and if there's something you want us to cover. We're easy to find.
You can hit us up at killswitch@kaleidoscope.nyc, or you can check us out on Instagram at @killswitchpod, or I'm @dexdigi.
That's d-e-x-d-i-g-i on Instagram or Bluesky. And wherever you're listening to us, make sure to leave us a review, because it helps other people find the show, and that helps us keep doing our thing.
kill switch is hosted by me, Dexter Thomas.
It's produced by Shina Ozaki, Darluk Potts and Kate Osborne.
Our theme song is by me and Kyle Murdoch and Kyle also mixed the show.
From Kaleidoscope, our executive producers are Oz Woloshyn, Mangesh Hattikudur, and Kate Osborne.
From iHeart, our executive producers are Katrina Norvell and Nikki Ettore.
Catch you on the next one.