• mesa@piefed.social
    English
    arrow-up
    9
    ·
    11 hours ago

    I’ve heard the same, but I haven’t seen real evidence anywhere, so I’m skeptical. But yes, I agree: if they CAN get that data, it means the training data is better-ish…

    But we are still on this site for a reason :)

    • NuXCOM_90Percent@lemmy.zip
      English
      arrow-up
      17
      arrow-down
      1
      ·
      10 hours ago

      I mean… if the reason you left is that you didn’t want your data scraped, then… the fediverse is one of the worst places to go? Anyone can run a modified Lemmy instance and pull everything through the tools specifically designed to do that.

      Let alone just scraping websites that don’t have teams of big corporate lawyers.

    • Alex@lemmy.ml
      English
      arrow-up
      3
      ·
      10 hours ago

      It’s all relative, I guess. I can see why the original GPTs used the Reddit corpus for training. However, I’ve always been a little sceptical about the quality of a training set drawn from any social media, given how much it exaggerates the extremes of people’s behaviour.