- cross-posted to:
- [email protected]
- [email protected]
- [email protected]
Yesterday Mistral AI released a new language model called Mistral 7B. @[email protected] already posted the Sliding Window Attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.
Mistral 7B is not based on Llama. And they claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. So truly an ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the openness might be a bit premature. We’ll see.]
(It uses Grouped-query attention and Sliding Window Attention.)
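For anyone wondering what those two terms mean in practice, here is a rough toy sketch in plain Python. It is not Mistral's code. The function names, the window size and the head counts are all made up for illustration, and I'm assuming the usual causal setup where a token only looks backwards. Sliding window attention means each position only attends to the last few positions instead of the whole context. Grouped-query attention means several query heads share one key/value head.

```python
def sliding_window_mask(seq_len, window):
    """Return a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must be one of the last
    `window` positions, counting i itself (i - j < window).
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: index of the key/value head a query head shares."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per key/value head
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes only, chosen so the pattern is easy to see.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The printed mask is a diagonal band instead of a full lower triangle, which is where the memory savings on long sequences come from. The head mapping shows why the key/value cache shrinks: with 8 query heads and 2 key/value heads, only 2 sets of keys and values need to be stored.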
Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.
- Details are on Mistral AI’s Announcement
- TechCrunch news article including information about the company
- They released a base/foundation model and an instruction-tuned one on HuggingFace
- And llama.cpp is already compatible, and GGUF versions are out there.
I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of details regarding the training could be a downside, though. These were not included in this initial release of the model.)
EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say there is no new information in it; they mostly copied their announcement.)
As of now, it is clear they don’t want to publish any details about the training.
I also have a de-googled smartphone, with a firewall installed (without a jailbreak). My name doesn’t show up on Google. I use generic usernames, not unique ones. I don’t upload photographs of my relatives to the cloud, as services acquire a fingerprint (hash) of their faces and extract metadata from the uploaded JPEGs. …and I’m not hiding from anyone, I just don’t like the unremovable (unforgettable) traces we leave here.
That’s what Firefox has in its browser now. :D Desktop version…
People will have less time to talk to other people because they’ll exchange pics with their favorite agent. xd
There are already services that charge for ML tasks. “You want a calendar notification from AI?” - pay more.
“You want to summarize your daily emails” - pay double, save more.
“You want to talk to your friend, who is asleep.” - talk to a virtual AI character, that looks and sounds like your friend. It even remembers your past conversations! /s