Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • danc4498@lemmy.world · 12 up, 1 down · 4 days ago

    Is it? The entire point of federation is that you can download all the data from another instance. Facebook is just training AI on the data that they’ve downloaded.

    • halcyoncmdr@lemmy.world · 54 up · edited · 3 days ago

      The point they’re making is that they don’t need to scrape the data. It is available via federation. Scraping the data is less efficient and can negatively affect the platform’s performance, versus the built-in federation system, where that data sync is intentional.

      Especially when Meta has a fediverse presence. The reason they’re scraping is likely because instances have blocked theirs, in part to prevent this exact thing.

      • kn33@lemmy.world · 15 up · 3 days ago

        They could just spin up a no-name instance that isn’t associated with them to get it through federation, though. It still doesn’t make sense to scrape.

        • halcyoncmdr@lemmy.world · 13 up · 3 days ago

          They’d have to host it from somewhere not related to Meta in any way; otherwise someone on the fediverse would find that link and spread the word, and it would be blocked the exact same way. It only takes one person making that connection, and Meta knows they’re hated.

          • Clent@lemmy.dbzer0.com · 6 up · 3 days ago

            Mega corps do that all the time. They have shell corporations for the exact purpose of obfuscating their future intentions.

            • halcyoncmdr@lemmy.world · 5 up · 3 days ago

              Or they could just use their existing scrapers and try to brute force it. Meta isn’t exactly known for being sneaky.

      • danc4498@lemmy.world · 1 up · 3 days ago

        Oh, right. I assumed “scraping” wasn’t meant literally. I assumed they were actually using an instance to pull in data (maybe using Threads), then training the AI off the data from their instance. If it is literally scraping, that’s pretty dumb.