Office Space meme:

“If y’all could stop calling an LLM “open source” just because they published the weights… that would be great.”

  • Prunebutt@slrpnk.net (OP) · ↑15 · 2 days ago

    So an emulator can’t be open source if the methodology the developers used to work out how to read Nintendo ROMs was not disclosed?

    No. The emulator is open source if it supplies the way to get the binary in the end. I don’t know how else to explain it to you: No LLM is open source.

    • WraithGear@lemmy.world · ↑5 ↓4 · 2 days ago

      So I still don’t see your issue with DeepSeek, because just like an emulator, everything is open source, with the exception of the data. The end result is dependent on the ROM put into it; you can always make your own ROM if you had the tools and the end result followed the expected format. And if the ROM was removed, the emulator is still the emulator.

      So if DeepSeek removed its data set, would you then consider DeepSeek open source?

      • Fushuan [he/him]@lemm.ee · ↑5 · 2 days ago

        The engine is open source, the model is not.

        The emulator is open source, the games it can run are not.

        I don’t see how it’s so hard to understand.

        They are saying that the model that the engine is running is open source because they released the model. That’s like saying that a game is open source because I released an emulator and the executable file. It’s just not true.

      • Prunebutt@slrpnk.net (OP) · ↑7 · 2 days ago

        everything is open source, with the exception of the data

        If I distribute a set consisting of an emulator and a ROM of a closed source game (without the source code), then the full set is not open source.

        So if DeepSeek removed its data set, would you then consider DeepSeek open source?

        Kind of, but that’s like expecting a console without any firmware. The weights are the important bit of an LLM distribution.

        • WraithGear@lemmy.world · ↑5 ↓3 · 2 days ago

          So like an emulator. Or at least the PS2 ones when you had to dump your BIOS from your machine (or snatch someone else’s).

          But that’s my point! The data set is interchangeable. So it’s not what makes DeepSeek THE DeepSeek LLM. But without the data set it would be functionally useless. And there would be no way possible to satisfy your requirement for data set openness. You said there is some line in the sand somewhere where you might be satisfied with some amount of the data, but your argument states that granularity must be absolute in order to justify calling it open source. You demand an impossible, unnecessary standard that other open source projects are not held to.

          • Prunebutt@slrpnk.net (OP) · ↑1 · 1 day ago

            The difference is that the dataset is baked into the weights of the model. Your emulation analogy simply doesn’t have a leg to stand on. I don’t think you know how neural networks work.

            The standards are literally the basis of open source.

            • WraithGear@lemmy.world · ↑1 · 1 day ago

              I made my level of understanding kinda open at the start. You say it’s not open source; most say it is, and they explained why, and when I checked, all their points were true, and I tried to understand as best I could. The bottom line is that the reason for the disagreement is that you say the training data and the weights together are an inseparable part of the whole, and if any part of that is not open then the project as a whole is not open. I don’t see how that tracks when the weights are open, and both they and the training data can be removed and switched to something else. But I have come to believe the response would just boil down to “you can’t separate it.” There really is nowhere else to go at this point.

              • Prunebutt@slrpnk.net (OP) · ↑1 · 1 day ago

                You can read all the other comments which explained why it is not open source. You can’t really retrain the model without petabytes of data. Even if you “train” stuff on your dataset, it’s more like tweaking the model weights a bit, rather than building the model from scratch.

                “Open source” is PR talk by Meta and DeepSeek.

          • xttweaponttx@sh.itjust.works · ↑3 · 2 days ago

            Just wanted to thank you both for this discourse! As somebody who’s interested in AI but totally ignorant of how the hell it works, I found this conversation very helpful! I would say you both have good points. Happy days to you both! 🙂

          • mamotromico@lemmy.ml · ↑6 · 2 days ago

            Just to add, a good chunk of newer emulators require you to get a dump of the firmware externally, not just the PS2. Pretty much anything from PS2 onwards is like that.