The model is trained on his own Orca-style dataset, as well as some Airoboros, apparently to increase creativity.
Quants:
https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GPTQ
OpenOrca and Dolphin seem to have the same purpose, just with different flavors. There is already a Mistral fine-tune for roleplay/NSFW :/ (for both Orca and Dolphin). People mix and upload releases faster than we can post news about them. ^^
It took 48 hours to train 10 epochs on 4x A100s.
Does anyone know why some releases only take 1 epoch to train and others take up to 10 epochs?
It depends on the learning rate. Typically it's ideal (and higher quality) to learn slowly over many epochs, but it's cheaper and obviously faster to learn quickly over fewer epochs.
Also, the dataset size is important to consider.
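As a rough illustration of that tradeoff, here is a back-of-envelope sketch. Only the 48 h / 10 epochs on 4x A100 figure comes from this thread; the learning rates and the "fast" schedule are made-up numbers for comparison, not from the Dolphin card.

```python
# Back-of-envelope sketch of the learning-rate vs. epochs tradeoff.
# Only the 48 h / 10 epochs on 4x A100 figure is from this thread;
# the learning rates and the "fast" schedule are made-up examples.

HOURS_FOR_10_EPOCHS = 48               # reported above
hours_per_epoch = HOURS_FOR_10_EPOCHS / 10

# Two hypothetical schedules over the same dataset:
schedules = {
    "slow, many epochs": {"learning_rate": 2e-5, "epochs": 10},
    "fast, few epochs":  {"learning_rate": 2e-4, "epochs": 1},
}

for name, cfg in schedules.items():
    wall_clock = cfg["epochs"] * hours_per_epoch
    print(f"{name}: lr={cfg['learning_rate']:.0e}, "
          f"epochs={cfg['epochs']}, ~{wall_clock:.0f} h on 4x A100")
```

Same data either way; the fast schedule just takes fewer, bigger steps, which is why it finishes in a fraction of the wall-clock time but often ends up lower quality.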
I was concerned that a large dataset with low sentence similarity might take longer to train. I'm not sure whether my idea that novels take less time to train than a Q&A dataset with detailed answers is true: generic roleplay vs. encyclopedic knowledge.
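For what it's worth, per-epoch compute is mostly a function of total token count rather than how similar the sentences are, so a novel corpus and a Q&A corpus of the same size should cost about the same per epoch; whether they need the same number of epochs to converge is a separate question. A minimal sketch with made-up corpus sizes and batch settings:

```python
# Rough sketch: per-epoch compute is driven mostly by total token count,
# not by whether the text is novels or Q&A. All numbers are assumptions.

def steps_per_epoch(total_tokens: int, seq_len: int = 4096,
                    global_batch_size: int = 64) -> int:
    """Optimizer steps needed to see every token once."""
    tokens_per_step = seq_len * global_batch_size
    return -(-total_tokens // tokens_per_step)   # ceiling division

# Two hypothetical corpora of equal size: "novels" vs. "detailed Q&A".
for name, tokens in [("novels", 50_000_000), ("detailed Q&A", 50_000_000)]:
    print(f"{name}: ~{steps_per_epoch(tokens):,} optimizer steps per epoch")
```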
Reading these datasets, I think the GPT-3/4 conversations go into too much detail, and current (1-40B) language models can't be trained to that level of detail; these conversations would only be useful for humans. But I might be wrong about training, since I don't have experience with 100B+ models and how they scale down.