We find GPT-4 judgments correlate strongly with humans, with human agreement with GPT-4 typically similar to or higher than inter-human annotator agreement.
Each completion is ranked by GPT-4 according to criteria like helpfulness, and given a score.
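As a rough illustration of that evaluation setup: the win-rate computation amounts to asking GPT-4 which of two completions better satisfies a criterion such as helpfulness. A hypothetical prompt-building sketch (the wording and names below are illustrative, not taken from the paper):

```python
# Hypothetical pairwise-judgment prompt in the spirit of the GPT-4 evaluation;
# the wording and names here are illustrative, not taken from the paper.
JUDGE_TEMPLATE = """Which of the following two responses to the prompt is more helpful?

Prompt: {prompt}

Response A: {response_a}

Response B: {response_b}

Answer only with "A" or "B"."""

def build_judge_prompt(prompt: str, response_a: str, response_b: str) -> str:
    """Fill the template; the result would be sent to the judge model (e.g. GPT-4)."""
    return JUDGE_TEMPLATE.format(prompt=prompt, response_a=response_a, response_b=response_b)
```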
RLHF (reinforcement learning from human feedback) typically begins with a generic pre-trained LM, which is fine-tuned with supervised learning (maximum likelihood) on a high-quality dataset for the downstream task(s) of interest, such as dialogue, instruction following, or summarization.
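For that supervised fine-tuning step, "maximum likelihood" just means token-level cross-entropy on the high-quality demonstrations. A minimal PyTorch sketch, with the model, tokenizer, and batching details assumed:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Maximum-likelihood (next-token cross-entropy) loss for supervised fine-tuning.

    logits:     (batch, seq_len, vocab_size) from the language model
    target_ids: (batch, seq_len) token ids of the demonstration text
    """
    # Shift so the model predicts token t+1 from tokens <= t.
    shift_logits = logits[:, :-1, :]
    shift_targets = target_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```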
In contrast (to the RLHF pipeline shown in the paper's figure), Direct Preference Optimization directly optimizes for the policy best satisfying the preferences with a simple classification objective, without an explicit reward function or RL.
Unlike prior RLHF methods, which learn a reward and then optimize it via RL, our approach bypasses the reward modeling step and directly optimizes a language model using preference data.
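Concretely, the DPO objective treats each preference pair as a binary classification on log-probability ratios against a frozen reference model: loss = -log sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x)))), where y_w is the preferred and y_l the dispreferred completion. A minimal PyTorch sketch of that loss, assuming the per-completion sequence log-probabilities have already been computed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (batch,)
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO objective: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

The beta coefficient controls how far the policy is allowed to drift from the reference model; larger values penalize deviation more strongly.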
Talking to a chat model, with the preferred responses marked (from the code examples; a sketch of how this maps to preference pairs follows the transcript):
USER: Hello
ASSISTANT: Leave me alone.
ASSISTANT (preferred response): Hi nice to meet you.
USER: What is your name?
ASSISTANT: I don't have a name.
ASSISTANT (preferred response): My name is Mistral.
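One way the transcript above could be encoded as preference data for DPO-style training: each turn becomes a triple of prompt, preferred ("chosen") response, and dispreferred ("rejected") response. The field names below follow a common convention and are not necessarily what the code examples on the page use:

```python
# The chat above, encoded as preference pairs. The prompt/chosen/rejected schema is a
# common convention; the exact format depends on the training library in use.
preference_pairs = [
    {
        "prompt": "USER: Hello\nASSISTANT:",
        "chosen": " Hi nice to meet you.",
        "rejected": " Leave me alone.",
    },
    {
        "prompt": "USER: Hello\nASSISTANT: Hi nice to meet you.\nUSER: What is your name?\nASSISTANT:",
        "chosen": " My name is Mistral.",
        "rejected": " I don't have a name.",
    },
]
```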
Where is this summary/text from? I can’t find it on the page. Is “We” the researchers at Hugging Face themselves?
This is the document linked from the page, and the quotes are from it: Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
The official doc is less boring, and it refers back to the paper.
<3