Yesterday Mistral AI released a new language model called Mistral 7B. @[email protected] already posted about the Sliding Window Attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.

Mistral 7B is not based on Llama. They claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. So truly an ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the openness might be a bit premature. We’ll see.]

(It uses Grouped-query attention and Sliding Window Attention.)
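
For anyone wondering what Sliding Window Attention means in practice, here is a minimal sketch of the attention mask as I understand the idea: each token attends only to itself and a fixed number of preceding tokens, instead of the whole context. The function name `sliding_window_mask` and the tiny window of 3 are my own choices for illustration, not taken from Mistral's code, and the real model uses a much larger window.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is not in the future (j <= i) and lies within the last
    `window` positions ending at i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only (assumption): 6 tokens, window of 3.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so the attention cost per token stays constant as the sequence grows. Information from older tokens can still reach later ones indirectly, because every layer shifts the reachable range back by another window.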

Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.

I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of details regarding the training could be a downside, though. These were not included in this initial release of the model.)


EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say no new information in it, they mostly copied their announcement)

As of now, it is clear they don’t want to publish any details about the training.

  • justynasty@lemmy.kya.moe
    1 year ago

    I also have a de-googled smartphone, with a firewall installed (without a jailbreak). My name doesn’t show up on Google. I use generic usernames, not unique ones. I don’t upload photographs of my relatives to the cloud, as services acquire a fingerprint (hash) of their faces and extract metadata from the uploaded JPEGs. …and I’m not hiding from anyone, I just don’t like the unremovable (unforgettable) traces we leave here.

    translates between arbitrary languages on the fly

    That’s what Firefox has in its browser now. :D desktop version…

    hallucinates less and gets adapters for specific tasks and multimodal capabilities

    People will have less time to talk to other people because they’ll exchange pics with their favorite agent. xd

    And that’s where I expect their gifts to stop. I will still have my chatbot / AI companion.

    There are already services that charge for ML tasks. “You want a calendar notification from AI?” - pay more.

    “You want to summarize your daily emails?” - pay double, save more.

    “You want to talk to your friend, who is asleep?” - talk to a virtual AI character that looks and sounds like your friend. It even remembers your past conversations! /s