Pay more attention: Recap of the last week

justynasty@lemmy.kya.moe · edited 2 years ago

justynasty@lemmy.kya.moe · 2 years ago

This is a fix for a problem that shouldn’t have been there in the first place: one of the many architectural oversights in the Llama model (and its predecessors).

There isn’t much to see for an end-user. This change won’t mean much to those who use (koboldcpp’s) smart context, summarization or character cards.

Smaller models (7B down to 350M) can handle long conversations better, and they no longer produce garbage output when the text isn’t truncated.

I am still waiting for the breakthrough in large models.

Kerfuffle@sh.itjust.works · 2 years ago

Smaller models (7B down to 350M) can handle long conversations better

What are you basing that on? It is true that more small models than big ones support very long context lengths, but that’s not because smaller models handle them better. It’s because training big models takes a lot more resources, so people usually do that kind of fine-tuning on small models: fine-tuning a 70B model to a 32K context would take a crazy amount of compute and hardware.

If you could afford fine-tuning it, though, I’m pretty sure the big model has at least the same inherent capabilities. Larger models usually deal with ambiguity and stuff better, so there’s a pretty good chance it would actually do better than the small model, assuming everything else was equal.

justynasty@lemmy.kya.moe · 2 years ago

I meant that smaller models profit more from the stable perplexity over a long prompt that the recently released code changes provide. Because the paper(s) mention that some of these changes do not require further fine-tuning, we can use small models on text that is longer than their context size.
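
For anyone wondering what that looks like in practice, here is a rough sketch, assuming the changes in question are the attention-sink approach: the cache keeps the first few tokens forever and only a rolling window of the most recent ones. The class name `SinkCache` and the parameters `num_sinks` and `window` are made up for illustration, and the entries are plain position numbers rather than real key/value tensors. It is not code from llama.cpp, koboldcpp or the paper.

```python
from collections import deque


class SinkCache:
    """Toy cache eviction policy (hypothetical, for illustration only).

    Keeps the first `num_sinks` entries permanently and a rolling
    window of the `window` most recent entries; everything in
    between is dropped.
    """

    def __init__(self, num_sinks=4, window=8):
        self.num_sinks = num_sinks
        self.sinks = []                     # first tokens, never evicted
        self.recent = deque(maxlen=window)  # oldest entry falls out when full

    def append(self, entry):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(entry)
        else:
            self.recent.append(entry)

    def entries(self):
        # What the model would attend to at the current step.
        return self.sinks + list(self.recent)


cache = SinkCache(num_sinks=4, window=8)
for position in range(100):  # stand-in for 100 generated tokens
    cache.append(position)

print(cache.entries())  # [0, 1, 2, 3, 92, 93, 94, 95, 96, 97, 98, 99]
```

The cache size stays fixed no matter how long the conversation gets, which is why the text can run past the context size. The model still only sees the sink tokens plus the recent window, so it does not remember what was dropped in between.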

🕳️ Attention Sinks in LLMs for endless fluency