The model is trained on his own Orca-style dataset, as well as some Airoboros, apparently to increase creativity.
Quants:
https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GPTQ
OpenOrca and Dolphin seem to have the same purpose, just with different flavors. There is already a Mistral fine-tune for roleplay/NSFW :/ (for both Orca and Dolphin). People mix and upload releases faster than we can post news about them. ^^
It took 48 hours to train 10 epochs on 4x A100s.
Does anyone know why some releases only take 1 epoch to train and others take up to 10 epochs?
It depends on the learning rate. Typically it's ideal (and higher quality) to learn slowly over many epochs, but it's cheaper and obviously faster to learn quickly over fewer epochs.
Also, the dataset size is important to consider.
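As a rough illustration of that tradeoff, here is a back-of-envelope sketch. Only the 48 h / 10 epochs on 4x A100 figure comes from this thread; the learning rates and the "fast" schedule are made-up numbers for comparison, not from the Dolphin card.

```python
# Back-of-envelope sketch of the learning-rate vs. epochs tradeoff.
# Only the 48 h / 10 epochs on 4x A100 figure is from this thread;
# the learning rates and the "fast" schedule are made-up examples.

HOURS_FOR_10_EPOCHS = 48               # reported above
hours_per_epoch = HOURS_FOR_10_EPOCHS / 10

# Two hypothetical schedules over the same dataset:
schedules = {
    "slow, many epochs": {"learning_rate": 2e-5, "epochs": 10},
    "fast, few epochs":  {"learning_rate": 2e-4, "epochs": 1},
}

for name, cfg in schedules.items():
    wall_clock = cfg["epochs"] * hours_per_epoch
    print(f"{name}: lr={cfg['learning_rate']:.0e}, "
          f"epochs={cfg['epochs']}, ~{wall_clock:.0f} h on 4x A100")
```

Same data either way; the fast schedule just takes fewer, bigger steps, which is why it finishes in a fraction of the wall-clock time but often ends up lower quality.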
I was concerned that a large dataset with low sentence similarity might take longer to train. I'm not sure whether my idea that novels take less time to train than a Q&A dataset with detailed answers is true: generic roleplay vs. encyclopedic knowledge.
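For what it's worth, per-epoch compute is mostly a function of total token count rather than how similar the sentences are, so a novel corpus and a Q&A corpus of the same size should cost about the same per epoch; whether they need the same number of epochs to converge is a separate question. A minimal sketch with made-up corpus sizes and batch settings:

```python
# Rough sketch: per-epoch compute is driven mostly by total token count,
# not by whether the text is novels or Q&A. All numbers are assumptions.

def steps_per_epoch(total_tokens: int, seq_len: int = 4096,
                    global_batch_size: int = 64) -> int:
    """Optimizer steps needed to see every token once."""
    tokens_per_step = seq_len * global_batch_size
    return -(-total_tokens // tokens_per_step)   # ceiling division

# Two hypothetical corpora of equal size: "novels" vs. "detailed Q&A".
for name, tokens in [("novels", 50_000_000), ("detailed Q&A", 50_000_000)]:
    print(f"{name}: ~{steps_per_epoch(tokens):,} optimizer steps per epoch")
```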
Reading these datasets, I think the GPT-3/4 conversations go into too much detail, and current (1-40B) language models can't be trained to that level of detail; these conversations would only be useful for humans. But I might be wrong about training, since I don't have experience with 100B+ models and how they scale down.