• medem@lemmy.wtf
    arrow-up
    22
    arrow-down
    2
    ·
    23 hours ago

    <Stupidquestion>

    What advantage does this software provide over simply banning bots via robots.txt?

    </Stupidquestion>

    • irotsoma@lemmy.blahaj.zone
      arrow-up
      23
      ·
      19 hours ago

      TL;DR: You should have both due to the explicit breaking of the robots.txt contract by AI companies.

      AI scrapers generally don’t obey robots.txt. That file just tells scrapers what they shouldn’t scrape, and it relies on their good faith. Many AI companies have explicitly chosen not to comply with robots.txt, thus breaking the contract, so this is a system that gets the scrapers that won’t comply stuck in a black hole of junk, wasting their time. This is a countermeasure, not a solution. It’s just far less complex than options that simply block these connections and then leave you getting pounded with retries. This way the scraper bot gets stuck for a while and you don’t waste as many resources blocking it over and over again.
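
      To make the "good faith" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and both helper functions are made up for illustration. The point is that the check runs entirely on the client, so a scraper can simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler does: parse the rules and obey them."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_can_fetch(user_agent: str, url: str) -> bool:
    """What a non-compliant scraper does: never consult robots.txt at all."""
    return True


if __name__ == "__main__":
    url = "https://example.com/private/page"
    print("polite crawler fetches:", polite_can_fetch("ExampleBot", url))
    print("rude crawler fetches:  ", rude_can_fetch("ExampleBot", url))
```

      Nothing on the server side enforces the first function over the second, which is why a countermeasure that acts on the connection itself is needed alongside robots.txt.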

    • kcweller@feddit.nl
      arrow-up
      75
      ·
      23 hours ago

      robots.txt expects the client to respect the rules, for instance by identifying itself as a scraper.

      AI scrapers don’t respect this trust, and thus robots.txt is meaningless.

    • medem@lemmy.wtf
      arrow-up
      43
      ·
      22 hours ago

      Well, now that y’all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file…

    • thingsiplay@beehaw.org
      arrow-up
      14
      ·
      20 hours ago

      The difference is:

      • robots.txt is a promise without a door
      • Anubis is a physical, closed door that opens up after some time
    • Mwa@thelemmy.club
      English
      arrow-up
      8
      ·
      22 hours ago

      The problem is that AI doesn’t follow robots.txt, so Cloudflare and Anubis each developed a solution.