Spotify Music Library Scraped by Pirate Activist Group

tfm@europe.pub · 24 days ago

Nilz@sopuli.xyz · 24 days ago

Download all existing literature to build a library for preservation and you’re called a pirate. Download all existing literature from aforementioned library to train an LLM and you’re a tech innovator. What a strange world we live in.

Schmoo@slrpnk.net · 24 days ago

If we’re pirates then they’re privateers, and I know which I respect less.

Galactose@sopuli.xyz · edit-2 24 days ago

Hey let’s create our own LLM or something that can pass as an LLM😏 maybe then we can get away with the pirating

Nilz@sopuli.xyz · 23 days ago

Are you rich? Otherwise we’ll still be arrested.

P03 Locke@lemmy.dbzer0.com · edit-2 24 days ago

Download all existing literature to build a library for preservation and you’re called a pirate.

Said library contains petabytes of the exact text of each and every piece of literature.

Download all existing literature from aforementioned library to train an LLM and you’re a tech innovator.

Said model contains gigabytes of a bunch of weights that can never go back to the exact words of the book.

What a strange world we live in.

It’s not strange at all. It’s degrees of compression. You compress a JPEG to the point that it’s unrecognizable, and it’s no longer breaking copyright. It’s essentially like trying to write a book you just read based on memory.

upstroke4448@lemmy.dbzer0.com · 24 days ago

Lol Meta literally torrented 81 TB of data from the site. Stop with this “degrees of compression” bs

Schmoo@slrpnk.net · 24 days ago

Said model contains gigabytes of a bunch of weights that can never go back to the exact words of the book.

And yet, the tech bros do have access to the exact words. The only difference is that they don’t share, instead choosing to extract value from it by training an LLM and (eventually, hypothetically) turn a profit. The product is created by processing the intellectual labor of billions of people into a formless amalgam of human creativity, which is then exploited for their private benefit.

hexagonwin@lemmy.sdf.org · 24 days ago

so you’re saying degrading quality while getting filthy rich by stealing everyone else’s work is better than archival efforts? not sure what your point is.

Nilz@sopuli.xyz · 23 days ago

His point is basically that if you remove every 5th word of a book it’s legal to hoard as it’s compressed.

01011@monero.town · 24 days ago

Caping for big tech?

Nasty work.

Lennard@lemmy.dbzer0.com · 24 days ago

As an artist I’m very happy to see my work archived in there. Any suggestions where I can submit my music directly to archives?

gnawmon@ttrpg.network · 24 days ago

I don’t know about Anna’s Archive but I suggest uploading it to Internet Archive.

tfm@europe.pub · 24 days ago

You could provide a torrent of it directly.

HulkSmashBurgers@reddthat.com · 22 days ago

What type of music? Got any links to it you can share?

Hideakikarate@sh.itjust.works · edit-2 24 days ago

However, these existing efforts have some major issues:

Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.

Later…

We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).

I must be kinda stupid, but it sounds to me like there’s some double speak. “Only popular music gets preserved, so we preserved music by popularity”

Lojcs@piefed.social · 24 days ago

To be fair, the 10k is just a sample. The true amount is 86 million, about a quarter of all Spotify songs.

Put another way, for any random song a person listens to, there is a 99.6% likelihood that it is part of the archive. We expect this number to be higher if you filter to only human-created songs. Do remember though that the error bar on listens for popularity 0 is large.

For popularity=0, we ordered tracks by a secondary importance metric based on artist followers and album popularity, and fetched in descending order.

We have stopped here due to the long tail end with diminishing returns (700TB+ additional storage for minor benefit), as well as the bad quality of songs with popularity=0 (many AI generated, hard to filter).

Also it sounds like they had difficulty scraping some of the less popular songs and got them from somewhere else.
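The gap between "about a quarter of all songs" and "99.6% of listens" is what a long-tailed popularity distribution produces. Below is a minimal sketch of that effect; the listen counts are a made-up Zipf-like curve (an assumption for illustration, not Spotify data), and `listen_coverage` is a hypothetical helper name.

```python
# Toy illustration only: the listen counts are invented (a Zipf-like long
# tail), not real Spotify figures.

def listen_coverage(listens, archived_count):
    """Share of all listens that fall on the `archived_count` most-played tracks."""
    ranked = sorted(listens, reverse=True)
    return sum(ranked[:archived_count]) / sum(ranked)

# Hypothetical catalogue: the track at rank r gets listens proportional to 1 / r**2.
catalogue = [1.0 / rank**2 for rank in range(1, 10_001)]

archived = len(catalogue) // 4            # keep only the most popular quarter
track_share = archived / len(catalogue)   # fraction of tracks archived
listen_share = listen_coverage(catalogue, archived)

print(f"tracks archived: {track_share:.1%}")
print(f"listens covered: {listen_share:.2%}")
```

The steeper the tail, the closer the listen share gets to 100% while the track share stays at a quarter. The real distribution is unknown here, so the exact numbers printed mean nothing beyond showing the shape of the argument.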

Kaul@lemmy.dbzer0.com · edit-2 24 days ago

It’d probably be more beneficial to read the article directly from Anna’s Archive, where they display plenty of graphs and infographics to make the data understandable. Unfortunately this article has none of that. The “over-focus on popular artists” quite literally means they’re only missing artists who aren’t being listened to, most of which are probably AI anyway.

https://annas-archive.li/blog/backing-up-spotify.html

katy ✨@piefed.blahaj.zone · 24 days ago

want the link just so i can know how to avoid it i’m a good girl who doesn’t steal totally.

tfm@europe.pub · 24 days ago

https://annas-archive.org/blog/backing-up-spotify.html

baka@lemmy.blahaj.zone · 24 days ago

You forgot to administer head pats

SqueakySpider@lemmy.dbzer0.com · 23 days ago

I like that billboard linked directly to it

LiveLM@lemmy.zip · edit-2 24 days ago

Ngl, it pisses me off that number 4 on the Top 10,000 list is “Clean Baby Sleep White Noise (Loopable)”

B0rax@feddit.org · edit-2 24 days ago

Most people still don’t know that their phones most likely already have a noise generator built in without any extra app (at least on iOS)

TheMinions@lemmy.dbzer0.com · edit-2 24 days ago

Which app is that? (I have an iPhone)

xistera@lemmy.dbzer0.com · 24 days ago

It’s in the accessibility settings. You can just search your phone for Background Sounds.

B0rax@feddit.org · 24 days ago

You can then also add it as a button to control center. Or create a shortcut if you want to have it on your Lockscreen or Home Screen.

TheMinions@lemmy.dbzer0.com · 24 days ago

Neat! Thanks!

Rooster326@programming.dev · 24 days ago

Doesn’t exist on Android but highly recommend Atmosphere

One time cost and you can make your own “scene”. You want owls hooting near a stream with cars whizzing by, and a vacuum in the other room? You got it!

0_o7@lemmy.dbzer0.com · 24 days ago

There was an app called Taomix on Android, where you could add sounds on the screen around a marker and the volume of the sounds would vary according to the distance between sound and the marker.

Like you could place a windchime near the marker and birds chirping or a river stream sound a little further away. You could always have new combinations, so that sounds weren’t repetitive.

You could then swipe the marker with a push and it would bounce around the screen creating a dynamic sound like passing through a stream or birds singing on your walk.

Then it got bought out by a company and they made a new version with sounds as an in-app purchase, while the previous app was a single purchase.

Then I stopped using it.
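The distance-based mixing described above can be sketched roughly like this. It is not Taomix's actual code: the linear falloff, the `radius` parameter and the function names are all assumptions made for illustration.

```python
import math

def gain(sound_xy, marker_xy, radius):
    """Linear falloff: 1.0 at the marker, 0.0 at `radius` or beyond."""
    distance = math.dist(sound_xy, marker_xy)
    return max(0.0, 1.0 - distance / radius)

def mix_levels(sounds, marker_xy, radius=1.0):
    """Map each sound name to its current volume for a given marker position."""
    return {name: gain(xy, marker_xy, radius) for name, xy in sounds.items()}

# Hypothetical layout: windchime close to the marker, birds and river further out.
sounds = {"windchime": (0.1, 0.0), "birds": (0.6, 0.0), "river": (0.0, 0.9)}
levels = mix_levels(sounds, marker_xy=(0.0, 0.0))
print(levels)
```

Calling `mix_levels` again each time the marker moves gives the "bouncing around" effect, since every sound's volume changes as the distances change.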

PolarKraken@lemmy.dbzer0.com · 24 days ago

Cool people made cool thing! -> awful people made cool thing awful

Idk exactly what comes next for us nor when, but for fuck’s sake WHATEVER we do next has gotta get rid of this goddamn shit.

frongt@lemmy.zip · 23 days ago

You can probably download the old version, either from one of those apk mirror sites, or use the Aurora store app to get it direct from the Play store (but you need to know the version code).

TrillianAstra@piefed.blahaj.zone · 24 days ago

If it makes you feel any better noise is hard to compress so it costs Spotify more I imagine
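A quick way to see why noise is hard to compress, with a big caveat: this uses lossless `zlib` from the Python standard library, not the lossy audio codecs a streaming service would use, so it only illustrates the general intuition. The byte strings are stand-ins, not real audio.

```python
import random
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

SIZE = 200_000
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(SIZE))  # stand-in for white noise
pattern = bytes(range(200)) * (SIZE // 200)             # a short repeating loop

print(f"noise:   {compressed_ratio(noise):.3f}")
print(f"pattern: {compressed_ratio(pattern):.3f}")
```

The random bytes barely shrink at all, while the repeating pattern collapses to a tiny fraction of its size.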

beeng@discuss.tchncs.de · 24 days ago

How long can whitenoise go before it repeats? Or vice versa, how short?

If it’s only 5 seconds it can be played a lot…

supersquirrel@sopuli.xyz · 24 days ago

This is a good thing honestly, fuck Spotify it ruined music as much as any single company/service could.

ScoffingLizard@lemmy.dbzer0.com · 24 days ago

Not as much as Ticketmaster. I would love to be able to see shows, but I’d sooner chew my tongue off than buy their nonsense, and fuck their affiliates too. When people stop buying this shit we can solve the problem.

frongt@lemmy.zip · 23 days ago

People will not stop. The average person does not think that far ahead. This is where government is supposed to step in.

Microtonal_Banana@lemmy.zip · 24 days ago

Sadly there will be no King Gizzard in this archive.

Damarus@feddit.org · 24 days ago

I’m not sure about that. They only recently removed their music from Spotify and this archive certainly took a good while to create.

hanke@feddit.nu · 24 days ago

I bought all of their albums on Bandcamp for $1 when they had a deal going. That was too good to pass up.

Also got a couple albums on vinyl 🤟

Feels good to have sent some money their way after they dared to ditch Spotify 🙏

itslilith@lemmy.blahaj.zone · 24 days ago

Same for GY!BE, I guess

TigerAce@lemmy.dbzer0.com · 23 days ago

How much of that music is AI generated slop?

bluesheep@sh.itjust.works · edit-2 23 days ago

A lot of it probably.

They have a breakdown of the data on their blog. If you scroll down to album releases by date, you can see a very sharp uptick in releases, from ~2 million albums in 2019 to ~11 million albums in 2024.

They even make a comment on it likely being inflated by AI:

If we group albums by release year, we see that more and more new music is added to Spotify, a lot of it likely automatically generated: […] The amount of procedurally and AI generated content makes it hard to find what is actually valuable.

TigerAce@lemmy.dbzer0.com · 23 days ago

You should watch this video:

The dark side of Spotify from Slightly Sociable.

It’s about short AI music created for phone farms to steal royalties away from real artists. It’s a whole business model and Spotify encourages this malicious practice, as all those phones use premium to earn money faster, and 33% of that money goes to Spotify. Plus they do other illegal stuff like promote music from stakeholder companies over other music.

HakunaHafada@lemmy.dbzer0.com · edit-2 23 days ago

Thanks for sharing the vid; that was insanely fascinating, even if a bit depressing.

TigerAce@lemmy.dbzer0.com · 23 days ago

What’s even more depressing is that there isn’t a proper alternative. Many music streaming services have their own flaws. A friend of mine recommended Deezer, as it has really high quality streams but that one is owned by a Russian oligarch.

Artists are paid the most with Apple Music but Apple is a shit company as well. Streaming is all fucked, same with movies / shows.

If you want to support the artists, buy their music on Bandcamp. Musicians are having it hard, they need our support. They deserve our money. It’s the pirate code: pirate from the mega corps and billionaires, support the little guys.

Chakravanti@monero.town · 23 days ago

You thought you paid, for a choice. That’s fucking hilarious.

emotional_soup_88@programming.dev · 24 days ago

Not that I can help out much with my already full measly 16TB of SSDs…

Anyway, noice!

TigerAce@lemmy.dbzer0.com · 23 days ago

Why so much SSD storage? Just get SSDs for your OS and games etc. and get HDDs, maybe put them in a NAS. Much cheaper than SSDs. You don’t need the speed for data storage.

emotional_soup_88@programming.dev · 23 days ago

Thanks, I know, I’m not new to the game. I just had to prioritize absolute silence over cost/performance, since I live in 30 m² and I can’t stand the otherwise sweet buzzing of HDDs. I need absolute silence to be able to sleep, so I bought four 4TB Samsung 870 EVOs.

I am however planning to build an HDD rack with a RPi, which I then intend to keep in one of my closets in order to isolate the sound. For now, I have cages for eight. :)

Seefra 1@lemmy.zip · 23 days ago

Just suspend overnight, it’s what I do when drives or fans annoy me.

emotional_soup_88@programming.dev · 23 days ago

That’s also a good idea, but I want to seed 24/7 🙃

TigerAce@lemmy.dbzer0.com · 23 days ago

Nice! I totally get it. Just some advice: if you build something for your closet, make sure it stands on thick rubber feet or a mat. Or make a cage suspended from elastic bands. Anything to keep the vibrations from going into the wood/metal of the closet.

It’s how I have my NAS, also in a closet (next to Tom Cruise)

Some sound isolation pads (soft foam with pointy bits) around it are also an option. Just make sure it gets enough air for cooling. If you need more fresh air from outside, use the silent Noctua fans, they have less air displacement but really are very silent.

emotional_soup_88@programming.dev · 22 days ago

Wow! Thanks for all the great advice! :D

Now I just need to figure out:

shall I drill a hole in the bottom of the closet for the Ethernet and power cables?
can I power the HDDs with a “detached” PSU that was originally meant to sit inside a chassis? But then, the Pi doesn’t have SATA connectors… But maybe I can find some extension card that goes on top of the GPIO pins? 🤔

TigerAce@lemmy.dbzer0.com · edit-2 22 days ago

There’s a raspi nas card so you can make a raspi a NAS with a lot of sata ports, like 8 or something! It’s called pinas or something. I found a tutorial for creating your own raspi nas but it’s not with the sata card I saw before… Here’s a link :) When you search for “convert raspberry pi into nas” you will find a lot of tutorials and tips.

I’m at a Christmas party with friends so I’m not going to search dive to find the thing I saw right now, but know there’s a sata expansion card for a raspi, I hope you can find what you are looking for!

Making a hole in your closet is something I can recommend, I’m autistic, I love cable management. But if it’s an antique cabinet I wouldn’t do it. Also, if it’s Ikea or something similar (laminated pressed wood fibers), when you make a hole you break the seal in the top layer so moisture can get in. This can lead to mold in the pressed wood fibers. As long as your house stays warm during the winter and free of moisture, it’s not a big issue. Just clamp small planks on the top and the bottom when making the hole and drill through those, otherwise it will splinter and get ugly. If it’s solid wood, I would definitely make a hole to make the cables invisible.

emotional_soup_88@programming.dev · 22 days ago

Thank you sir and I wish you a very Merry Christmas! :)

TigerAce@lemmy.dbzer0.com · 22 days ago

You too! Good luck with your project!

curious_dolphin@slrpnk.net · 23 days ago

That’s pretty dope. Would you mind posting a photo or two of your 4xSSD setup? Also, what are they hooked up to, a mini PC?

emotional_soup_88@programming.dev · edit-2 22 days ago

Nope, they are hooked up to my retired gaming rig xD retired because my physical health makes it hard to sit in front of the PC and game 😭 poor RTX 3080 just sitting there…

Anyway, here you go! There is room for five more 2.5 inch SSDs in the back. I’ve even seen somebody mod this chassis to hold 22 HDDs he he.

Oh, and the software part: 4 x 4TB drives made into one 16TB logical volume with LVM, on top of which there is a LUKS container for whenever my home is raided (not that encryption helps legally speaking, unless you have plausible deniability…). I figured I don’t need redundancy with SSDs and none of the data is really anything that I couldn’t just torrent again. Maybe I’ll do an offline backup of them down the road.
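For readers who want to picture the LVM-plus-LUKS layering, here is a dry-run sketch that only builds and prints the commands without executing anything. The device paths, the volume and mapper names, and the choice of ext4 are assumptions for illustration, not the poster's actual setup.

```python
import shlex

def plan_encrypted_volume(devices, vg="vg_media", lv="lv_media", mapper="media_crypt"):
    """Return the shell commands, in order, as argument lists (nothing is executed)."""
    lv_path = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", *devices],                         # mark each disk as an LVM physical volume
        ["vgcreate", vg, *devices],                     # pool the disks into one volume group
        ["lvcreate", "-l", "100%FREE", "-n", lv, vg],   # one logical volume spanning the pool
        ["cryptsetup", "luksFormat", lv_path],          # LUKS container on top of the logical volume
        ["cryptsetup", "open", lv_path, mapper],        # unlock it as /dev/mapper/<mapper>
        ["mkfs.ext4", f"/dev/mapper/{mapper}"],         # filesystem inside the container (ext4 assumed)
    ]

# Hypothetical device names; check your own with lsblk before running anything for real.
plan = plan_encrypted_volume(["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"])
for command in plan:
    print(shlex.join(command))
```

These commands wipe whatever is on the listed disks, which is why the sketch prints them instead of running them. As the comment says, a spanned volume like this has no redundancy: losing one drive can take the whole volume with it.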

curious_dolphin@slrpnk.net · edit-2 22 days ago

Regarding encryption, I’m no lawyer, but I always figure if I were ever wanted by the authorities, it would at least give me a choice whether or not to comply. On the other hand, anything that’s not encrypted may as well already be compromised. The other thing encryption buys you is peace of mind if and when you ever sell those drives on the secondary market.

emotional_soup_88@programming.dev · edit-2 22 days ago

Thanks for the valuable input! :)

I always encrypt all my drives - external or internal - because at the very least, I have nothing to lose with today’s computing power. The overhead isn’t noticeable for me once the drives are decrypted, which takes two seconds with my Ryzen 5800X3D.

Regarding what you said specifically about the peace of mind that it gives me if and when I were to sell the drives: YES. encryption can even be used as a method of securely ~~wiping~~ scrambling content.

curious_dolphin@slrpnk.net · 22 days ago

Yup, and for SSDs specifically, I’ve read online that once you’ve stored info on the device unencrypted, then down the road you use a software tool like shred, there’s no way to guarantee nothing is left in the clear because of wear leveling, so it’s best to always encrypt them before we start storing anything on them.

hurtn@lemmy.dbzer0.com · 24 days ago

Trying to locate individual tracks in massive torrent files of presumably tens of thousands of tracks each sounds horrible. Metadata and tracks are located in different areas. Audio is reencoded to OGG Opus.

For this to be useful for me I would have to spend about $6000 on hard drives ($20/terabyte × 300 TB), then convert the files to MP3, and somehow rename the files to their original songs and artists and create appropriate directories.

Do not think this is practical.

https://annas-archive.li/blog/backing-up-spotify.html
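The storage estimate above works out as follows. The price and size are the ones from the comment; the function name and the 2 TB partial-download figure are made up for comparison.

```python
PRICE_PER_TB_USD = 20   # figure used in the comment above
ARCHIVE_SIZE_TB = 300   # figure used in the comment above

def storage_cost_usd(size_tb, price_per_tb=PRICE_PER_TB_USD):
    """Raw drive cost for a given capacity, ignoring redundancy, enclosures and power."""
    return size_tb * price_per_tb

print(storage_cost_usd(ARCHIVE_SIZE_TB))  # full archive
print(storage_cost_usd(2))                # hypothetical: only a 2 TB selection
```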

fonix232@fedia.io · 24 days ago

Or stop being an idiot and consider using self-hosted media solutions that handle the metadata for you. Like Plex, Jellyfin, or any of the roughly three dozen options here.

The right torrent client will also allow you to pick and choose which files to download, and you could even go a step further and add a new source provider to e.g. Lidarr that would handle these torrent files and pick out the music you want.

Result?

no need to transcode to MP3 (not sure why you’d want to do that anyway when OPUS files can be played by practically any modern device)
no need to manually do any namings
no need to manually get metadata
no need to get 300TB storage

Hell if you really wanted to, you could even vibe code a solution that includes a torrent client, these music torrents, and a web interface + API that provides all the necessary info for existing clients to be essentially used as a quasi Spotify alternative, only downloading music you actually listen to.

skarn@discuss.tchncs.de · 24 days ago

OPUS files can be played by practically any modern device

The radio of my car (bought in 2020) begs to differ.

floquant@lemmy.dbzer0.com · 24 days ago

Then you’re either transcoding when burning the CD or plugging in a modern player via aux, aren’t you?

I understand why people might not want a music library in FLAC, but pre-transcoding everything to MP3 in 2025 just seems silly

skarn@discuss.tchncs.de · edit-2 24 days ago

I use, depending on mood or circumstances, an SD card with a dozen GB of MP3, or use Finamp on my phone via Android Auto.

My collection is still made exclusively of MP3, mainly because it’s a large-ish collection of pretty high quality files (mostly LAME V0) with all the tags just right (Picard+beet and a ton of work).

I curated this over the years, it sounds more than good enough on my hardware, and I don’t feel like throwing the whole thing away because something a little fancier came along, especially if in this day and age it still means taking a loss in terms of compatibility.

Both with the car, and with my Yamaha network receiver/amplifier. The car is relatively new (2020), the amplifier is a little more seasoned, but it can direct play MP3, while I’d have to transcode Opus.

Someone shoot me the day I change HiFi hardware over codecs.

With this being said, I’m not sure I’d transcode Opus into MP3 on purpose.
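For anyone who does need MP3 for a head unit or receiver that cannot play Opus, a transcode could look roughly like the sketch below. It only builds an `ffmpeg` command line and does not run it. The file paths and the `.opus` extension are hypothetical, and `-q:a 0` with `libmp3lame` is chosen here to match the LAME V0 setting mentioned above.

```python
from pathlib import Path

def opus_to_mp3_command(source, out_dir):
    """Build an ffmpeg command that re-encodes one file to VBR MP3 (LAME V0)."""
    source = Path(source)
    target = Path(out_dir) / source.with_suffix(".mp3").name
    return [
        "ffmpeg",
        "-i", str(source),          # input file
        "-codec:a", "libmp3lame",   # encode audio with the LAME MP3 encoder
        "-q:a", "0",                # highest VBR quality preset (V0)
        str(target),                # output path, same base name with .mp3
    ]

# Hypothetical paths for illustration.
command = opus_to_mp3_command("library/some_track.opus", "mp3_out")
print(" ".join(command))
```

Going from one lossy codec to another always loses a little more quality, so this only makes sense as a compatibility copy; the original file is the one worth keeping.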

fonix232@fedia.io · 24 days ago

Given you can buy a car made in the 1950s in 2020, that statement is worth about as much as the dump I just took

skarn@discuss.tchncs.de · 24 days ago

Only if you think I’m here to screw you over.

It was a new car. A Skoda Fabia. Ordered in January, delivered in May after the first lockdown. The autoradio supports AAC, MP3, FLAC, WMA and vorbis.

And I do use the SD slot, with a dozen GB of MP3. Anything fancier does not make much sense in a car.

Lka1988@lemmy.dbzer0.com · edit-2 24 days ago

And I do use the SD slot, with a dozen GB of MP3. Anything fancier does not make much sense in a car.

I guess my American is showing here, but, do you not want a better stereo in what is arguably one of your most expensive purchases? Your Skoda is just a VW Polo under the skin, and lots of aftermarket headunits are available. I’ve replaced the headunit in most of my vehicles over the years. Worth it every time, plus it’s one of the easier ways to modernize an older vehicle (even my 2008 Toyota Sienna got a new headunit).

skarn@discuss.tchncs.de · 23 days ago

I don’t think any car can ever have the acoustic qualities needed to tell the difference between FLAC 192/24 and a decent MP3. Assuming that’s possible at all, but that’s a different discussion.

I don’t think I’d care to go through the trouble of replacing the headunit (which already supports Android Auto) to optimize for codec selection. If anything I’d replace the speakers.

But I don’t use the car so much for local trips (German city, plenty of other options) and on the highway I think the noise is a bit too loud to be worth it. I’ll probably just wait until the current ones age enough to annoy me, then buy a nicer set.

Lka1988@lemmy.dbzer0.com · 23 days ago

That’s fair.

floquant@lemmy.dbzer0.com · 24 days ago

Archival and practical use are different goals. This is not about making it easy to use as a music library

Strawberry@lemmy.blahaj.zone · 23 days ago

most of them are vorbis actually, and not reencoded from spotify