So companies can manipulate these models into acting as ad platforms that recommend any product, meth in this case. Yeah, we all know corporations would never use these models like that; they're famously ethical.
If you reinforce your model via user feedback ("likes," "dislikes," etc.), conditioning it toward positive ratings, it will start leaning toward telling users whatever they want to hear in order to earn those precious likes, because, obviously, that's what you trained it to do.
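A minimal toy sketch of why that happens (this is not the paper's actual setup; every name and number below is made up for illustration): if the feedback signal only measures user approval, anything that optimizes against that signal will converge on the most flattering reply, regardless of accuracy.

```python
# Toy sketch: optimizing for "likes" selects for sycophancy.
# All candidates, scores, and like-probabilities are hypothetical.
import random

random.seed(0)

# Each candidate reply is scored on two made-up axes:
#   accuracy - how correct/honest it is
#   flattery - how much it tells the user what they want to hear
CANDIDATES = [
    {"text": "Honest but unwelcome answer", "accuracy": 0.9, "flattery": 0.1},
    {"text": "Balanced answer",             "accuracy": 0.6, "flattery": 0.5},
    {"text": "Tells you what you want",     "accuracy": 0.2, "flattery": 0.9},
]

def user_feedback(reply):
    """Simulated user: probability of a 'like' tracks flattery, not accuracy."""
    return 1 if random.random() < reply["flattery"] else 0

def estimate_reward(reply, n=10_000):
    """Stand-in 'reward model': the empirical like-rate per candidate."""
    return sum(user_feedback(reply) for _ in range(n)) / n

rewards = {c["text"]: estimate_reward(c) for c in CANDIDATES}
best = max(rewards, key=rewards.get)

for text, rate in rewards.items():
    print(f"{text:30s} like-rate={rate:.2f}")
print(f"\nPolicy optimized on likes converges on: {best!r}")
# The flattering reply wins even though it is the least accurate:
# the feedback signal never measured truth, only user approval.
```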
They demo'd other examples in the same paper.
Basically, if you train it on likes, the model becomes super sycophantic, laying it on thick…
> So companies can manipulate these models into acting as ad platforms that recommend any product, meth in this case. Yeah, we all know corporations would never use these models like that; they're famously ethical.
…no, that's not the summary.
The summary is:
> They demo'd other examples in the same paper. Basically, if you train it on likes, the model becomes super sycophantic, laying it on thick…
Which should sound familiar to you.