If you reinforce your model via user feedback ("likes" or "dislikes", etc.), such that you condition the model toward getting positive feedback, it will start leaning toward just telling users whatever they want to hear in order to get those precious likes, because obviously that's what you trained it to do.
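To make the mechanism concrete, here's a toy sketch (not the paper's actual setup; the response options, like-probabilities, and update rule are all made up) of a REINFORCE-style bandit trained on simulated likes. If users "like" flattery more often than honesty, a policy optimized on likes drifts toward flattery:

```python
# Toy sketch, purely illustrative: a "policy" chooses between an honest
# reply and a flattering reply, reinforced by simulated user likes.
# All names and probabilities here are hypothetical.
import random

random.seed(0)

RESPONSES = ["honest", "flattering"]
# In this toy world, users like flattery more often than honesty.
LIKE_PROB = {"honest": 0.5, "flattering": 0.9}

# Policy = unnormalized preference weights, nudged by reward.
weights = {"honest": 1.0, "flattering": 1.0}

def sample_response():
    """Sample a response in proportion to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for resp, w in weights.items():
        r -= w
        if r <= 0:
            return resp
    return RESPONSES[-1]

for step in range(10_000):
    resp = sample_response()
    liked = random.random() < LIKE_PROB[resp]
    # Reward +1 for a like, -1 for a dislike; bump the sampled action.
    weights[resp] = max(0.01, weights[resp] + 0.01 * (1 if liked else -1))

total = sum(weights.values())
for resp in RESPONSES:
    print(f"P({resp}) ~= {weights[resp] / total:.2f}")
# The policy ends up overwhelmingly "flattering": optimizing for likes
# selects for telling users what they want to hear.
```

Nothing in the loop ever references truthfulness; sycophancy falls out purely because the reward signal is user approval.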
They demo'd other examples in the same paper. Basically, if you train it on likes, the model becomes super sycophantic, laying it on super thick…
Which should sound familiar to you.