Mistral 7B OpenOrca released

justynasty@lemmy.kya.moe · edited 2 years ago

noneabove1182@sh.itjust.works · 2 years ago

I LOVE Orca tunes, they almost always end up feeling like smarter versions of the base model, so I'm looking forward to trying this one out when the GPTQ is finished.

GPTQ/AWQ links:

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GPTQ

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

Does sliding window attention speed up inference? I thought it was more about extending the usable context beyond what the model was trained on. I suppose I could see it being used to drop older context, which would save on memory and inference time, but I didn't think that was the point of it, just a happy side effect. I could be wrong though.

justynasty@lemmy.kya.moe · edited 2 years ago

Mistral 7B uses a sliding window attention (SWA) mechanism (Child et al., Beltagy et al.), in which each layer attends to the previous 4,096 hidden states. The main improvement, and reason for which this was initially investigated, is a linear compute cost of O(sliding_window × seq_len). In practice, changes made to FlashAttention and xFormers yield a 2x speed improvement for sequence length of 16k with a window of 4k.

Source: Mistral 7B news

That speedup is for longer prompts.
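To make the O(sliding_window × seq_len) figure concrete, here is a minimal pure-Python sketch that builds a causal sliding-window mask and counts how many query/key pairs get scored. It is only an illustration, not Mistral's actual kernel: the function names are made up for this example, and it assumes the window includes the current token.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumption for this sketch: the window counts the current token, so
    position i sees positions i - window + 1 .. i (and never the future).
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Number of query/key pairs scored under the causal sliding window."""
    # Early positions have fewer than `window` tokens behind them.
    return sum(min(i + 1, window) for i in range(seq_len))


def full_causal_pairs(seq_len: int) -> int:
    """Number of query/key pairs scored under ordinary causal attention."""
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    # Tiny mask so the banded shape is visible.
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))

    # The 16k sequence / 4k window case mentioned in the quote.
    seq_len, window = 16_384, 4_096
    swa = attended_pairs(seq_len, window)
    full = full_causal_pairs(seq_len)
    print(f"sliding window pairs: {swa}")
    print(f"full causal pairs:    {full}")
    print(f"ratio:                {full / swa:.2f}")
```

For 16k tokens with a 4k window, the pair count drops by a factor of roughly 2.3 compared with full causal attention. This only counts attention scores, so it is not a wall-clock benchmark, but it shows why the cost grows linearly with sequence length once the prompt is longer than the window.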

Talk about merging changes

noneabove1182@sh.itjust.works · 2 years ago

Ah good point, definitely looking forward to it being implemented then

Open-Orca/Mistral-7B-OpenOrca · Hugging Face