Office Space meme:
“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”
Would it? Not sure how that would be a better analogy. The argument is that it’s nearly all open, but it still doesn’t count because the data set, before it’s processed by the LLM, is not open (in my analogy, the data set the emulator uses would be a Nintendo ROM). That data set, if provided, would be so massive that it would defeat the point of tokenization and be completely unusable by literally ANYONE without multiple data centers redlining for WEEKS. Under that standard of scrutiny, not only could no LLM ever qualify, but projects that are currently considered open source would not qualify either, which makes the distinction meaningless.
An emulator without a ROM mounted is still an emulator, even if not usable.
I don’t understand your objections. Even if the amount of data is rather big, it doesn’t change that this data is part of the source, and leaving it out makes the whole project non-open-source.
What? No? Open-source projects literally do meet this standard.