Mine lies whenever it doesn't know something. I'll call it out and say that's a lie, and it'll just go "you are absolutely correct" tf.
I was reading about sleeper agents planted inside local LLMs, and that's increasing the chances I'll delete it for good. Which is a shame, because it's become the new search engine, seeing how they ruined actual search engines.
Here’s the video I actually watched about the sleeper agents
https://www.youtube.com/watch?v=wL22URoMZjo
robert miles is an alignment and safety researcher and a pretty big name in that field.
he has a tendency to make things sound scary, but i don't think he's trying to put you off machine learning. he just wants people to understand that this technology is similar to nuclear technology in the sense that we have to avert disaster before it happens, because the costs of failure are simply too great and irreversible. we can't take the planet back from a runaway skynet; there isn't a do-over button.
you're kind of misunderstanding him and the point he's trying to get across, i think. the issues he's talking about here with sleeper agents and model alignment are of virtually no concern to you as an end user of LLMs. they're more for the people researching, developing, and training models to be cognizant of… if everyone does their job properly, you shouldn't need to worry about any of this at all unless it actually interests you. if it does, let me know and i can share some good sources for expanding your knowledge!
I wouldn’t stop using ai completely over that. I generally don’t trust it with anything that important anyway.