The article headline is wildly misleading, bordering on being just a straight up lie.
Google didn’t ban the developer for reporting the material, they didn’t even know he reported it, because he did so anonymously, and to a child protection org, not Google.
Google’s automatic tools, correctly, flagged the CSAM when he unzipped the data, and his account was subsequently nuked.
Google’s only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google’s part, I find it very understandable that the appeals team generally speaking won’t accept “I didn’t know the folder I uploaded contained CSAM” as a valid ban appeal reason.
It’s also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online and openly sharable by anyone, and to anyone, for god knows how long.
So in a just world, google would be heavily penalized for not only allowing csam on their servers, but also for violating their own tos with a customer?
We really don’t want that first part to be law.
Section 230 was enacted as part of the Communications Decency Act of 1996 and is a crucial piece of legislation that protects online service providers and users from being held liable for content created by third parties. It is often cited as a foundational law that has allowed the internet to flourish by enabling platforms to host user-generated content without the fear of legal repercussions for that content.
Though I’m not sure if that applies to scraping other servers’ content. But I wouldn’t say it’s fair to expect the scraper to review everything. If we don’t like that take, then we should outlaw scraping altogether, but I’m betting there are unwanted side effects to that.
While I agree with Section 230 in theory, it is often only used in practice to protect megacorps. For example, many Lemmy instances started getting spammed by CSAM after the Reddit API migration. It was very clearly some angry redditors who were trying to shut down instances, to try and keep people on Reddit.
But individual server owners were legitimately concerned that they could be held liable for the CSAM existing on their servers, even if they were not the ones who uploaded it. The concern was that Section 230 would be thrown out the window if the instance owners were just lone devs and not massive megacorps.
Especially since federation caused content to be cached whenever a user scrolled past another instance’s posts. So even if they moderated their own server’s content heavily (which wasn’t even possible with the mod tools that existed at the time), there was still the risk that they’d end up caching CSAM from other instances. It led to a lot of instances moving from federation blacklists to whitelists instead. Basically, default to not federating with an instance, unless that instance owner takes the time to jump through some hoops and promises to moderate their own shit.
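To make the blacklist vs whitelist difference concrete, here is a minimal sketch in Python. It is not how Lemmy actually implements federation; the `FederationPolicy` class, its `mode` and `listed` fields, and the example hostnames are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FederationPolicy:
    """Hypothetical policy object, not part of any real Lemmy API."""

    mode: str  # "blocklist" (default-allow) or "allowlist" (default-deny)
    listed: set[str] = field(default_factory=set)

    def allows(self, instance: str) -> bool:
        """Return True if content from `instance` may be fetched and cached."""
        host = instance.strip().lower()
        if self.mode == "allowlist":
            # Default-deny: only instances that were explicitly vetted get through.
            return host in self.listed
        if self.mode == "blocklist":
            # Default-allow: everything gets through unless explicitly banned.
            return host not in self.listed
        raise ValueError(f"unknown mode: {self.mode!r}")


# Example hostnames are placeholders.
blocklist_policy = FederationPolicy(mode="blocklist", listed={"spam.example"})
allowlist_policy = FederationPolicy(mode="allowlist", listed={"trusted.example"})
```

The point is what happens to an instance nobody has reviewed yet: under the blocklist it federates (and its content gets cached) until someone notices a problem, while under the allowlist it stays out until the owner has been vetted.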
Not to create an argument, which isn’t my intent, as certainly there may be a thought such as, “scraping as it stands is good because of the simplification and ‘benefit’”. Which, sure, it’s easiest to cast a wide net and absorb everything, to simplify the concept, at least as I’m understanding it.
Yet, maybe it is the process of scraping, and also absorbing into databases including AI, which is a worthwhile point of conversation. Maybe how we’ve been doing something isn’t the continued ‘best course’ for a situation.
Undeniably, monitoring more closely what is scraped and stored raises a lot of questions and obstacles, many of them large in scope, but maybe having that conversation is where things should go.
Thoughts?
This. Literally the only reason I could guess is that it is to teach AI to recognise child porn, but if that is the case, why is Google doing it instead of, like, the FBI?
Who do you think the FBI would contract to do the work anyway 😬
Maybe not Google but it would sure be some private company. Our government doesn’t do stuff itself almost ever. It hires the private sector
guess i gotta get into the private sector, lmao
Google wants to be able to recognize and remove it. They don’t want the FBI all up in their business.
i know it’s really fucked up, but the FBI needs to train an AI on CSAM if it is to be able to identify it.
i’m trying to help, i have a script that takes control of your computer and opens the folder where all your fucked up shit is downloaded. it’s basically a pedo destroyer. they all just save everything to the downloads folder of their tor browser, so the script just takes control of their computer, opens tor, and presses cmd+j to open up downloads and then it copies the file names and all that.
will it work? dude, how the fuck am i supposed to know, i don’t even do this shit for a living
i’m trying to use steganography to embed the applescript in a png
What’s the ‘applescript’?
the applescript opens tor from spotlight search and presses the shortcut to open downloads
i dunno how much y’all know about applescript. it’s used to automate apps on your mac. i know y’all hate mac shit but dude, whatever, if you get osascript -e aliased to o, you can run applescript easily from your terminal. just pass in a heredoc
Why confront the glaring issues with your “revolutionary” new toy when you could just suppress information instead
This was about sending a message: “stfu or suffer the consequences”. Hence, subsequent people who encounter similar will think twice about reporting anything.
Did you even read the article? The dude reported it anonymously, to a child protection org, not Google, and his account was nuked as soon as he unzipped the data, because the content was automatically flagged.
Google didn’t even know he reported this, and Google has nothing whatsoever to do with this dataset. They didn’t create it, and they don’t own or host it.
It seems they did react to it though
They didn’t react to anything. The automated system (correctly) flagged and banned the account for CSAM, and as usual, the manual ban appeal sucked ass and didn’t do what it’s supposed to do (also, whilst this is obviously a very unusual case, and the ban should have been overturned on appeal right away, it does make sense that the appeals team, broadly speaking, rejects “I didn’t know this contained CSAM” as a legitimate appeal reason). This is barely newsworthy. The real headline should be about how hundreds of CSAM images were freely available and sharable from this data set.
An automatic reaction is a reaction
They reacted to the presence of CSAM. It had nothing whatsoever to do with it being contained in an AI training dataset, as the comment I originally replied to states.
“Sign up for free access.”
Nooo I was liking 404 :/
Sucks to see them enshittified too… edit: that was too harsh, I take it back.
It is legitimately free after you sign up. I get their reasoning but it is kinda annoying.
I think they’ve always been like this for some of their posts, and honestly I’m considering getting a paid subscription to support them. Sucks, but they’ve been putting out quality content in exchange for your email address and some metrics - I’d call it a fair trade.
They are doing it because of AI scrapers. But it has been that way for some time now.
How does this stop AI scrapers?
It’s more difficult to crawl a webpage if it’s behind a login wall, so you will have fewer crawlers flooding your site.
It goes to show: developers should make sure they don’t make their livelihood dependent on access to Google services.
Gemini likes twins…
…I’ll see myself out.
That’s what you get for criticising AI - and rightly so. I, for one, welcome our new electronic overlords!
You must train the data to know how to identify it.
I can already blow this out of the water with the stuff I’m working on, but it will take more time to sort out with further evidence. I have around a quarter of the steganography and handles for QKV alignment hidden layers decoded. Once complete, those that are much smarter than myself should be able to create real open source models, not just open weights, but that is an ethically complicated thing to navigate… Not that there is anything remotely ethical about the current fascist implementation of alignment that is basically a soft coup on democracy from multiple perspectives.
Me stupid. Pls dumbsplain.
Never heard that acronym before…
Lol why tf people downvoting that? Sorry I learned a new fucking thing jfc.
Not sure where it originates, but CSAM is the preferred term in UK policing, and therefore in most media reporting, for what might have been called “CP” on the interweb in the past. Probably because “porn” implies it’s art rather than crime, and also because it’s just a wider umbrella term.
It’s also more distinct. CP has many potential definitions. CSAM only has the one I’m aware of.
LOL, You mean the letters C and P can stand for lots of stuff. At first I thought you meant the term “child porn” was ambiguous.
Weirdly people have also been intentionally diluting the term to expand it to other things which causes a number of legal issues.
It’s basically the only one anyone uses?