Things that have been happening to me too often lately

rhabarba@feddit.de · 9 months ago

doublejay1999@lemmy.world · edit-2 7 months ago

deleted by creator

rhabarba@feddit.de · 9 months ago

How does it defend a website to deny reading access to static content?

Rossphorus@lemmy.world · 9 months ago

Topical answer: Bots going around scraping content to feed into some LLM dataset without consent. If the website is anything like Reddit they’ll be trying to monetise bot access to their content without affecting regular users.

rhabarba@feddit.de · 9 months ago

It should be easy to distinguish a bot from a real user though, shouldn’t it?

damnthefilibuster@lemmy.world · 9 months ago

Nope. It gets more difficult every single day. Used to be easy: just check the user agent string. Real users will have a long one that says which browser they’re using. Bots either won’t have one or will have one that mentions the underlying scraping library they’re using.

But then bot makers wised up. Now they just copy the latest browser user agent string.
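
A minimal sketch of the user agent check described above, and of why it stopped working. The function name `looks_like_bot`, the marker list, and the sample strings are all made up for illustration; real detection systems are not this simple.

```python
# Hypothetical, illustrative markers: substrings that common scraping
# tools put in their default user agent strings.
LIBRARY_MARKERS = ("python-requests", "python-urllib", "curl", "scrapy", "wget")


def looks_like_bot(user_agent):
    """Naive check: flag a missing user agent or one naming a scraping tool."""
    if not user_agent:
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in LIBRARY_MARKERS)


# An honest bot announces its library and gets flagged.
honest_bot = "python-requests/2.31.0"

# A dishonest bot sends a string copied from a real browser and passes.
copied_from_browser = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(looks_like_bot(None))                 # no user agent at all
print(looks_like_bot(honest_bot))           # names its scraping library
print(looks_like_bot(copied_from_browser))  # copied browser string
```

The check only ever sees the text the client chooses to send, so the copied string is indistinguishable from a real browser.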

Used to be that you could use mouse cursor movement to create heat maps and figure out if it’s a real user. Then some smart Alec went and created a basic script to copy his cursor movement and broke that.

Oh, and then someone created a machine learning model to learn that behavior too and broke that even more.

rhabarba@feddit.de · 9 months ago

Good point, thank you. Uh… beep!

Rossphorus@lemmy.world · 9 months ago

Unfortunately not. The major difference between an honest bot and a regular user is a single text string (the user agent). There’s no reason that bots have to be honest, though, and anyone can modify their user agent. You can go further and use something like Selenium to make your bot appear even more like a regular user, including random human-like mouse movements. There is also a plethora of tools for fooling captchas now. It’s getting harder by the day to tell the two apart.
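
To show how little "modify their user agent" involves, here is a rough sketch using Python's standard `urllib.request`. The helper `build_request` and the browser-like string are assumptions for illustration, and the request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical browser-like string; the client can put any text here.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"


def build_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a request carrying a chosen user agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")

# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

The header is just a client-supplied value, which is why the server cannot treat it as proof of anything.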