I’ve heard the same but I haven’t seen real evidence anywhere, so I’m skeptical. But yes, I agree, if they CAN get that data, it means the training data is better-ish…
But we are still on this site for a reason :)
I mean… if the reason you left is that you didn’t want your data scraped, then… the fediverse is one of the worst places to go? Because anyone can run a modified Lemmy instance to pull everything through the tools specifically designed to do that.
Let alone just scraping websites that don’t have teams of big corporate lawyers.
It’s all relative, I guess. I can see why the original GPTs used the Reddit corpus for training. However, I’ve always been a little sceptical about the quality of a training set drawn from any social media, given how much it exaggerates the extremes of people’s behaviour.