OK, say you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

  • supersquirrel@sopuli.xyz
    arrow-up
    21
    ·
    26 days ago

    Why would I bother?

    Calculators exist, logic exists, so no… LLMs are a laughably bad fit for directly doing math. They are bullshit engines: they cannot “store” a value without fundamentally exposing it to hallucinating tendencies, which is the worst property a calculator could possibly have.

    • tal@olio.cafe
      English
      arrow-up
      1
      ·
      24 days ago

      Why would I bother?

      Because you want to have a single interface that accepts natural-language input and gives answers.

      That doesn’t mean that using an LLM as a calculator is a reasonable approach, though a larger system that incorporates an LLM might be. But I think that the goal is very understandable. I have Maxima, a symbolic math package, on my smartphone and computers. It’s quite competent at just about any sort of mathematical problem that a typical person might want to solve. It costs nothing. But… you do need to learn something about the package to be able to use it, and that barrier is enough that most people won’t use it. By contrast, you don’t have to learn much of anything that a typical member of the public doesn’t already know to use a prompt that accepts natural-language input.

    • Farmdude@lemmy.worldOP
      arrow-up
      3
      arrow-down
      3
      ·
      26 days ago

      It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred times each, same numbers.
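The kind of test described above can be sketched in Python. This is only an illustration under stated assumptions: `consensus` and `verified` are hypothetical helper names, and the answer lists in the usage note are made-up placeholders, not results from any real model. The point it shows is that unanimous agreement and correctness are separate checks, so the consensus is only accepted when it also matches an independently computed reference.

```python
from collections import Counter
from fractions import Fraction


def consensus(answers):
    """Return the most common answer and the share of responses that gave it."""
    value, count = Counter(answers).most_common(1)[0]
    return value, Fraction(count, len(answers))


def verified(answers, reference):
    """Accept the answers only if they are unanimous AND match an independent computation."""
    value, share = consensus(answers)
    return share == 1 and value == reference


# Independent reference, computed deterministically rather than by a model.
reference = sum(range(1, 101))
```

With placeholder data, `verified([5050] * 6, reference)` is true, while `verified([5000] * 6, reference)` is false even though all six hypothetical answers agree with each other.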

      • supersquirrel@sopuli.xyz
        arrow-up
        22
        ·
        edit-2
        26 days ago

        Right, so because LLMs are atrocious at actually precisely carrying out logic operations, the solution was likely to just throw a normal calculator inside the AI, make the AI use the calculator, and then turn around and handwave that the entire thing is AI.

        So… you could just skip the bullshit and use a calculator; the AI just repackages the same answer with more boilerplate bullshit.

        Wolfram Alpha is the non-bullshit version of this.

        https://www.wolframalpha.com/
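
The "calculator inside the AI" arrangement described in this comment could look roughly like the sketch below. It is an assumption about the general pattern, not a description of how any particular vendor does it. `fake_model` is a hypothetical, hard-coded stand-in for an LLM, and `calculate` and `answer` are illustrative names. The model's only job here is to turn the question into an expression; the number itself comes from a deterministic evaluator.

```python
import ast
import operator

# Whitelisted operators: anything else is rejected rather than guessed at.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str):
    """Deterministically evaluate a plain arithmetic expression."""
    def walk(node):
        # Only literal ints and floats are allowed as values.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def fake_model(question: str) -> dict:
    """Hypothetical stand-in for an LLM: it only emits a tool call."""
    # A real model would derive this from the question; here it is hard-coded.
    return {"tool": "calculate", "expression": "(1234 * 5678) + 91"}


def answer(question: str) -> str:
    call = fake_model(question)
    result = calculate(call["expression"])  # the calculator, not the model, produces the number
    return f"{call['expression']} = {result}"
```

In this sketch, `answer("What is 1234 times 5678, plus 91?")` returns `(1234 * 5678) + 91 = 7006743`, and the same value comes back from calling `calculate` directly, which is the "skip the wrapper" option the comment is pointing at.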