The BBC and the European Broadcasting Union have produced a large study of how well AI chatbots handle summarising the news. In short: badly.

The researchers asked ChatGPT, Copilot, Gemini, and Perplexity about current events. 45% of the chatbot answers had at least one major issue: 31% had serious sourcing problems, and 20% had major accuracy issues, such as hallucinations or outdated information. This held across multiple languages and multiple countries.

The AI distortions are “significant and systemic in nature.”

    • very_well_lost@lemmy.world · 14 hours ago

      You know it’s an unfixable problem because the AI boosters are trying so hard to gaslight everyone into thinking that this is a feature, not a bug.

      “You don’t actually want an AI that doesn’t hallucinate — that would take away its cReAtIvItY!”

      • T156@lemmy.world · 12 hours ago

        It’s not wrong to think of it that way, but at the same time, there’s a very good question of why you want a model capable of creative writing summarising your news to begin with.

        It’s basically like an overstuffed kitchen gadget.

  • minorkeys@lemmy.world · 16 hours ago

    And those people will vote based on that misinformation. They will believe a worldview filled with that misinformation and the resulting behavior will hurt everyone around them.

    • runblack@feddit.org · 13 hours ago

      It will hurt, but they don’t actually vote based on the information they receive. Their vote is based on emotions. That’s why all the fact-checking etc. is utterly useless when it comes to changing people’s minds.

  • fodor@lemmy.zip · 13 hours ago

    Well no shit. The whole point of most news is that it is already a summary. That’s why it’s not a livestream or novel, typically.

    Summarizing a summary accurately is hard if the author was a halfway skilled writer. You have to omit facts, and then you fuck it all up.

  • mudkip@lemdro.id · 17 hours ago

    This was a very poorly conducted study. Every single tester was a journalist from the very companies losing traffic to AI. They had a direct stake in making the results look bad. If you dig into the actual report, you see how they get the numbers. Most of the errors are “sourcing issues”: the AI assistant doesn’t cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

    Also, the models are heavily outdated (4o for GPT, Flash for Gemini, which aren’t even equivalent in intelligence). They don’t list the full model versions from what I can tell.

    • RedstoneValley@sh.itjust.works · 14 hours ago

      You might want to read the actual report then.

      You’ll find that the second study was conducted in May/June 2025, and you’ll find the model versions, which were the free options available at the time (page 20).

      Also, the sourcing errors found were not about which source was selected (i.e. a bias in source selection, as you seem to imply). The report explicitly states:

      Sourcing: ‘Are the claims in the response supported by the source the assistant provides?’ (page 9)

      “Sourcing was the biggest cause of problems, with 31% of all responses having significant issues with sourcing – this includes information in the response not supported by the cited source, providing no sources at all, or making incorrect or unverifiable sourcing claims.” (page 10)

      GPT-4o and Gemini Flash were not “heavily outdated” at the time the study was conducted; these were the models provided in the free versions, which is what the researchers used (pages 20 and 62).

      The goal of the study is not to find the best-performing model or to compare the performance of different models, but to use the publicly available AI offerings the way a normal consumer would. You might get better results with a paid pro model or a specialized model of some kind, but that’s not the point here.

      • logi@piefed.world · 16 hours ago

        In which case we’re supposed to ignore all the problems with it?