Office Space meme:

“If y’all could stop calling an LLM “open source” just because they published the weights… that would be great.”

  • whotookkarl@lemmy.world
    2 days ago

    A closer analogy would be providing only the binary output of the emulator build and calling it open source. If you can’t reproduce the build output from what they provide, in what way is it reproducible? The model is the output; the training data and the algorithm that builds the model from that data are the input.
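    A toy sketch of that input/output relationship, under loose assumptions: a two-parameter least-squares fit stands in for LLM training, and `train_linear` and both training sets are hypothetical. With the data and the algorithm, anyone can rebuild the weights exactly. With only the weights, you cannot tell which of two different training sets produced them.

```python
from fractions import Fraction


def train_linear(data):
    """Fit y = w*x + b by ordinary least squares, in exact rational arithmetic.

    Hypothetical stand-in for a training run: `data` plus this function play
    the role of the source; the returned (w, b) pair plays the role of the
    published weights.
    """
    n = len(data)
    sx = sum(Fraction(x) for x, _ in data)
    sy = sum(Fraction(y) for _, y in data)
    sxx = sum(Fraction(x) * x for x, _ in data)
    sxy = sum(Fraction(x) * y for x, y in data)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return w, b


# Two different (hypothetical) training sets.
training_set_a = [(0, 1), (1, 3), (2, 5)]
training_set_b = [(0, 0), (0, 2), (1, 3), (1, 3), (2, 4), (2, 6)]

weights_a = train_linear(training_set_a)
weights_b = train_linear(training_set_b)

# With the data and the algorithm, the weights can be rebuilt exactly...
assert train_linear(training_set_a) == weights_a
# ...but both training sets yield the same weights (w = 2, b = 1), so the
# weights alone cannot tell you which data produced them.
print(weights_a == weights_b)  # True
```

    Exact fractions are used only so the equality check is not muddied by floating-point rounding; real training runs add nondeterminism on top of this.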

    Edit: Say I have a Java project I want to open source. Normally (oversimplifying a bit), the .java source files go through a compiler that builds intermediate bytecode in .class files, and then just-in-time (JIT) compilation turns that bytecode into native machine code as it runs in the JVM. It’s not open source if I only share the class files, even if I can use them to recreate source files that can be recompiled into the same class files. Starting at an intermediate step of the process isn’t the source.
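    A rough analogue of that pipeline, sketched in Python rather than Java for brevity (assumption: CPython’s built-in `compile()` stands in for javac, and a function’s code object stands in for a .class file; the `area` function and both sources are hypothetical). Two different source files compile to the same instruction bytes, so the intermediate artifact cannot give you back what was dropped along the way, such as the comment.

```python
# Two hypothetical source files that differ only in a comment.
SOURCE_WITH_COMMENT = """
def area(r):
    # Area of a circle, using a rough approximation of pi.
    return 3.14159 * r * r
"""

SOURCE_WITHOUT_COMMENT = """
def area(r):
    return 3.14159 * r * r
"""


def compiled_instructions(source):
    """Compile `source` and return the raw bytecode of its `area` function."""
    module_code = compile(source, "<example>", "exec")
    namespace = {}
    exec(module_code, namespace)  # runs only the `def`, creating the function
    return namespace["area"].__code__.co_code


# Different sources, identical instruction bytes: the comment does not
# survive compilation.
print(SOURCE_WITH_COMMENT != SOURCE_WITHOUT_COMMENT)  # True
print(compiled_instructions(SOURCE_WITH_COMMENT)
      == compiled_instructions(SOURCE_WITHOUT_COMMENT))  # True
```

    Only the raw instruction bytes (`co_code`) are compared here; other parts of the code object, such as the line-number table, can still differ between the two.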

    • WraithGear@lemmy.world
      2 days ago

      Would it? I’m not sure how that would be a better analogy. The argument is that it’s nearly all open… but it still doesn’t count because the data set, before it’s manipulated by the LLM, is not open (in my analogy, that data set would be the Nintendo ROM the emulator uses). If provided, that data set would be so massive that it would defeat the purpose of tokenization and be completely unusable by literally ANYONE without multiple data centers redlining for WEEKS. Under that standard of scrutiny not only could there never be an LLM that would qualify, but projects that are considered open source would not be. Thus making the distinction meaningless.

      An emulator without a ROM mounted is still an emulator, even if not usable.

      • FooBarrington@lemmy.world
        2 days ago

        I don’t understand your objections. Even if the amount of data is rather big, that doesn’t change the fact that this data is part of the source, and leaving it out makes the whole project non-open-source.

        “Under that standard of scrutiny not only could there never be an LLM that would qualify, but projects that are considered open source would not be. Thus making the distinction meaningless.”

        What? No. Open-source projects literally do meet this standard.