Hi folks, another shitty story from the slop-pocalypse (AI-slopalypse?).
Article from Billboard (archive)
NB: I think this story is bullshit. I imagine some parts are true, but there’s no concrete source given for the “$3 million” figure. So my speculation is that this story is hype cooked up by Suno (the AI company enabling all this) and thrown at publishers for an easy headline. Also, the human behind this has their name spelled differently in the two articles, so clearly some quality journalism is happening.
IIRC, you tell the thing “sing these lyrics in the style of [insert artist here]” or “as if you were a soulful blues singer with a two-pack-a-day cigarette habit” or whatever, and the thing does it for you.
It can get the tones and inflections right because it’s been trained on a huge amount of vocal audio with the correct tones and inflections, and that training produced an extraordinarily complex model for generating them.
And if it doesn’t? One of the interesting things about AI generation is that it’s inherently randomized: unless the random seed is fixed, no two results of the same prompt will be exactly alike. So even if the AI only nails part of the song on the first “take”, humans can run the prompt over and over and stitch together pieces from different takes into a full song, just like studios do with recordings of actual singers.
Star Trek used to imagine AIs that would be great at logic but bad at emotion. It’s the other way around. Our AI tools are amazing at mimicking emotion, because they’ve been trained on millions of works with genuine emotion. But give them a math problem and they’re apt to screw it up.
(I mean, this one could also be an AI puppet with a human voice behind it. We’ve had VTuber tech for like ten years now. But it could also be completely AI.)
I don’t know why I didn’t think of that. Audio prompts can be just as approximate as video prompts. Very simple answer. Thank you.
Of course, I’m still not crazy about it, but that’s a different topic.