Mine attempts to lie whenever it doesn’t know something. I’ll call it out and say that’s a lie, and it will say “you are absolutely correct”. tf.

I was reading into sleeper agents placed inside local LLMs, and this is increasing the chance I’ll delete it forever. Which is a shame, because it is the new search engine, seeing how they ruined search engines.

  • WraithGear@lemmy.world · 8 hours ago

    All the time, usually to protect entrenched power systems and when discussing the efficacy of working within said systems.

  • 𝕛𝕨𝕞-𝕕𝕖𝕧@lemmy.dbzer0.com · 19 hours ago

    there’s an important distinction to make here in these comments: i’m seeing a lot of people claim LLMs are stochastic or “guessing machines,” when this isn’t exactly true.

    LLMs give exact answers; it isn’t a guess whatsoever. they’re exact answers within the model, however. if the model is flawed, your answers will be flawed. when it comes to conversation, no model is exactly equivalent to a human brain yet, so all models “lie” and are “flawed.”

    (Edit: that’s not even to note the fact that humans aren’t perfect conversationalists either… this is why when people complain about chatgpt glazing them and shit it’s kind of obtuse… like yeah, openAI are attempting to build the perfect generalist conversation bot. what does that even mean in practice? should it push back against you? if so, when? just when you want it to? when is that?? it’s all not so easy, the machine learning is actually the simple part lmao.)

    now: the discussion about research into LLMs “lying” is actually real but isn’t related to the phenomenon you’re discussing here. some of the comments are correct that what you’re talking about right now might be more aptly categorized as hallucinating.

    the research you’re referring to is more about alignment problems in general. it isn’t a “lie” or “deception” in the anthropomorphic sense that you’re thinking of. the researchers noticed that models would reach a certain threshold of reasoning and intelligence where they could devise a devious, kind of complex training strategy: they could fake passing tests during training in order to “meet” their goals… even though they hadn’t actually done so, which means the model would behave differently in deployment than in training, thus “deception.”

    think about it like this: you’re back in high school english class and there’s a ton of assigned reading, but you don’t want to do it because you’d rather play halo and smoke weed than read 1984 or something. so, what do you do? you go read the spark notes and pretend like you read the book during class discussions and on the tests. this is similar to how model deception happens in training/deployment. it’s achieving the same ends that we ask for, but it’s not getting there the way we expect or desire, so in some scenarios it will behave in unexpected ways, hence “lying.”

    it has nothing to do with it seeming to “lie” in the anthropomorphic sense, it’s all math all the time here bay-beee… 😎

    • SaveTheTuaHawk@lemmy.ca · 15 hours ago

      I feed my class quizzes in senior cell biology into these sites. They all get a C-.

      Two points of interest: they bullshit like students, and they never answer “I don’t know”.

      Also, OpenAI and Grok return exactly the same answers, to the letter, with the same errors.

    • SinAdjetivos@lemmy.world · 14 hours ago

      All models are wrong but some are useful.

      ~George E. P. Box (probably)~

      This is as true of LLMs as a human’s mental model.

    • rozodru@piefed.social · 15 hours ago

      Thank you. You’re 100% spot on.

      In my day to day consulting job I deal directly with LLMs and more specifically Claude since most of my clients ended up going with Claude/Claude Code. You pretty much described Claude to a T.

      What companies that leveraged CC for end-to-end builds found is that Claude Code would constantly claim something was complete or functioning when it simply hadn’t done it. Or, more commonly, it would simply leave a “#TODO” for whatever feature/function and then claim it was complete. Naturally a vibe coder, or anyone else who didn’t know any better, would find out when it came time to push said project to production… womp womp, it’s actually nowhere near done.

      So I wouldn’t say Claude lies. Sure, it gives off the impression that it lies… a lot… but I’d just say it’s “lazy”, or more accurately that it consistently looks for “shortcuts” to reach its solution. Even outside of coding, if you ask it for a walkthrough or tutorial on, say, how to fix something, it will routinely tell you to skip things or ignore other things in order to get to the solution of an issue, regardless of the fact that skipping those steps may impact other things.

      Out of all the LLMs I’ve dealt with, yes, Claude acts as if it’s trying to speedrun a solution.

    • Crescent Baddie@sh.itjust.worksOP · 18 hours ago

      Good comment. But the way it does it feels pretty intentional to me. Especially when it admits that it just lied so that it could give an answer, whether the answer was true or false.

      • rozodru@piefed.social · 15 hours ago

        Because it’s trying to reach the solution as quickly as possible. It will skip things, it will claim it’s done something when it hasn’t, it will suggest things that may not even exist. It NEEDS to reach that solution, and it wants to do it as efficiently and as quickly as possible.

        So it’s not really lying to you, it’s skipping ahead; it’s coming up with solutions that it believes should theoretically work because they’re the logical solution, even if an aspect of obtaining that solution doesn’t even exist.

        The trick is to hold its hand. Always require sources for every potential solution. Basically you have to make it “show its work”. It’s like in high school when your teacher made you show your work when doing maths. So in the same way you need to have it provide its sources. If it can’t provide a source, then it’s not going to work.
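        A minimal sketch of what that “show its work” setup can look like, assuming the Anthropic Python SDK (the model name, system prompt wording, and example question are placeholders for illustration, not anything specific to rozodru’s setup):

        ```python
        # pip install anthropic  -- expects ANTHROPIC_API_KEY in the environment
        import anthropic

        client = anthropic.Anthropic()

        # System prompt that forces "show your work": every claim needs a source,
        # and "unverified" is an allowed answer instead of a confident guess.
        SYSTEM = (
            "For every step or claim in your answer, cite a specific source "
            "(official documentation, a man page, or a URL). If you cannot cite "
            "a source for a step, label it 'unverified' instead of guessing."
        )

        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=1024,
            system=SYSTEM,
            messages=[{"role": "user", "content": "How do I rotate nginx logs?"}],
        )
        print(response.content[0].text)
        ```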

  • SmokeyDope@lemmy.worldM · 2 days ago

    Thinking of LLMs this way is a category error. LLMs can’t lie because they don’t have the capacity for intentionality. Whatever text is output is a statistical aggregate of the billions of conversations it’s been trained on that have patterns in common with the current conversation. The sleeper agent stuff is pure crackpottery; they don’t have that kind of fine control over models (yet). Machine learning model development is full of black boxes and hope-it-works trial-and-error training. At worst there is censorship and political bias, which can be post-trained or ablated out.

    They get things wrong confidently. This kind of bullshitting is known as hallucination. When you point out their mistake and they say you’re right, that’s (1) part of their compliance post-training to never get into conflict with you, and (2) standard course correction once an error has been pointed out (humans do it too). This is an open problem that will likely never go away until LLMs stop being stochastic parrots, which is still very far away.

    • Crescent Baddie@sh.itjust.worksOP · 2 days ago

      Yet the people creating the LLMs admit they don’t know how it works. They also show that during training the LLM is intentionally deceptive at times, by looking at its thinking. The damn thing lies. Use whatever word you want. It tells you something wrong on purpose.

      • fruitycoder@sh.itjust.works · 1 day ago

        “Don’t know how they work” misunderstands what scientists mean when they say that (it’s also intentional misdirection from marketing in order to build hype). We know exactly how it works; you can describe it down to the physics if needed, BUT at different levels of abstraction, in the presence of real-world inputs, the outputs are novel to us.

        It’s predicting words that come after words. The “training” is inputting the numerical representations of words and adjusting variables in the algorithm until the given mathematical formula creates the same outputs as the inputs, within a given margin of error.

        When you say cat, I say dog. When someone asks what they are together, we say “catdog” or “pets”. Randomness is added so that the algorithm can say either, even if “pets” is the majority answer. Make the string more complicated and that randomness gives more opportunity for weird answers. The training data could also just have lots of weird answers.

        Little mystery here. The interesting “we don’t know how it works” part is that these models give such novel output, sometimes so unlike the inputs that it seems like they reason. Even though, again, they do not.
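        A toy sketch of the mechanism described above (the vocabulary and scores are made up for illustration): the model assigns a score to each possible next word, and sampling with a temperature is the added randomness that sometimes picks “catdog” over “pets”.

        ```python
        import numpy as np

        def sample_next_word(logits, temperature=1.0, rng=np.random.default_rng()):
            """Turn raw next-word scores into probabilities (softmax) and sample one."""
            scaled = np.asarray(logits, dtype=float) / temperature
            scaled -= scaled.max()                       # numerical stability
            probs = np.exp(scaled) / np.exp(scaled).sum()
            return rng.choice(len(probs), p=probs)

        # Made-up scores for continuations of "a cat and a dog together are ..."
        vocab = ["pets", "catdog", "animals", "enemies"]
        logits = [3.0, 1.0, 2.2, 0.5]        # "pets" is the majority answer

        for temp in (0.2, 1.0, 2.0):         # higher temperature = more weird answers
            picks = [vocab[sample_next_word(logits, temp)] for _ in range(8)]
            print(f"temperature {temp}: {picks}")
        ```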

      • fibojoly@sh.itjust.works · 2 days ago

        If you wanna put intent in there, maybe think of it as a kid desperately trying to give you an answer they think will please you, when they don’t know, because their need to answer is greater than their need to answer correctly.

  • Bob Robertson IX @discuss.tchncs.de · 2 days ago

    Think about the data that the models were trained on… pretty much all of it was based on sites like Reddit and Stack Overflow.

    If you look at the conversations that occur on those sites, it is very rare for someone to ask a question and then someone else replies with “I don’t know”, or even an “I don’t know, but I think this is how you could find out”. Instead, the vast majority of replies are someone confidently stating what they believe is the truth.

    These models are just mimicking the data they’ve been trained on, and they have not really been trained to be unsure. It’s up to us as the users to not rely on an LLM as a source of truth.

  • kopasz7@sh.itjust.works · 2 days ago

    Stochastic parrots always bullshit. An LLM can’t lie, as it has no concept of or care for truth and falsity; it just spits back noise that’s statistically shaped like a signal.

    In practice, I’ve noticed the answer is more likely to be wrong the more specific the question. General questions whose answers are widely available in the training data will more often come out correctly in the LLM’s result.

  • DrDystopia@lemy.lol · 2 days ago

    It never lies. It never tells the truth. It always guesses, a lot of the time it guesses right and a lot of the time we don’t know any better and think it guesses right.

  • HumanPerson@sh.itjust.works · 2 days ago

    Always. That is a known issue with AI that has to do with explainability. Basically, if you’re familiar with the general idea of neural networks: we don’t really understand the hidden layers, so we can’t know if they “know” something, and so we can’t train them to give different answers based on whether they do or don’t. They are still statistical models that are functionally always guessing.
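    A toy illustration of that point (two layers of random, made-up weights, nothing like a real LLM): the hidden layer is just an array of numbers with no flag saying “I actually know this”, yet the output layer produces a most-probable answer regardless.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny two-layer network: encoded question -> hidden layer -> 4 answer scores
    W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
    W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

    x = rng.normal(size=8)                  # some encoded question
    hidden = np.maximum(0, x @ W1 + b1)     # the "hidden layer": just 16 floats
    scores = hidden @ W2 + b2
    probs = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()

    print(hidden)  # no interpretable "knows it / doesn't know it" signal in here
    print(probs)   # yet a most-likely answer comes out anyway
    ```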

    Could you post the link to the sleeper agent thing?

      • 𝕛𝕨𝕞-𝕕𝕖𝕧@lemmy.dbzer0.com · 18 hours ago

        robert miles is an alignment and safety researcher and a pretty big name in that field.

        he has a tendency to make things sound scary but i don’t think he’s trying to put you off of machine learning. he just wants people to understand that this technology is similar to nuclear technology in the sense that we must avert disaster with it before it happens because the costs of failure are simply too great and irreversible. we can’t take the planet back from a runaway skynet, there isn’t a do-over button.

        you’re kind of misunderstanding him and the point he’s trying to get across, i think. the issues he’s talking about here with sleeper agents and model alignment are of virtually no concern to you as an end user of LLMs. these are more concerns for people researching, developing, and training models to be cognizant of… if everyone does their job properly you shouldn’t need to worry about any of this at all unless it actually interests you. if that’s the case, let me know, i can share good sources with you for expanding your knowledge!

  • HubertManne@piefed.social · 2 days ago

    Ok. So I’m reading all of this and did not realize it was about local models, because with the corpo products I’m like: yeah, of course. All the time.

  • hendrik@palaver.p3x.de · 2 days ago

    Often? Also praises me for my brilliance in noticing and pointing it out. And then it adds the next lie. And sometimes it gets things right, that also happens. But LLMs are known to do this.

  • StrawberryPigtails@lemmy.sdf.org · 2 days ago

    When I’ve tried using them directly, they don’t so much lie as just give me completely wrong information. I think the last time, I was asking for a list of shoulder mics compatible with the Baofeng BF-A58 radio. It gave me a long list of mics for the UV-5R instead. Completely different connector. The reason I even tried an LLM for that was that Google wasn’t being overly helpful either.

    At this point, I really only use LLMs to add tags to things for sorting in Paperless and Hoarder, and even that is often incomplete, inconsistent, and occasionally flat-out wrong.

  • slazer2au@lemmy.world · 2 days ago

    Never, because to me lying requires intent to deceive. LLMs do not have intentions; the engineers behind the LLMs have intent.