• Sl00k@programming.dev
      10 months ago

      [Image 1: Olympic Arena analysis] [Image 2: OpenAI analyses]

      Compare the increase from their V2 GPT-4o model to their o1-preview reasoning model. The jumps from last year’s GPT-3.5 -> GPT-4 were also quite large. Secondly, if you want to take OpenAI’s own research into account, that’s in the second image.

      • TootSweet@lemmy.world
        10 months ago

        if you want to take OpenAI’s own research into account

        No thank you.

        OlympicArena validation set (text-only)

        “Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)”

        • The OlympicArena analysis that you cited.
        • Sl00k@programming.dev
          10 months ago

          The jump from GPT-4o -> o1 (the preview, not the full release) was a 20% cumulative knowledge jump. If that’s not an improvement in accuracy, I’m not sure what is.
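
For anyone following the numbers: a "20%" jump can be read two ways, and the difference matters. The rough sketch below starts from the 39.97% GPT-4o overall accuracy quoted earlier in the thread. Applying the 20% figure to that particular baseline is an assumption made for illustration, and the function names are made up for this example. No actual o1 score is implied.

```python
# Baseline: the GPT-4o overall accuracy quoted from the OlympicArena analysis.
BASELINE_ACCURACY = 39.97  # percent

# The "20%" figure comes from the comment above. Applying it to this
# baseline is an assumption for illustration, not a reported result.
CLAIMED_JUMP = 20.0


def absolute_gain(baseline: float, points: float) -> float:
    """Read the jump as percentage points added to the baseline."""
    return baseline + points


def relative_gain(baseline: float, percent: float) -> float:
    """Read the jump as a percent increase relative to the baseline."""
    return baseline * (1 + percent / 100)


if __name__ == "__main__":
    print(f"absolute reading: {absolute_gain(BASELINE_ACCURACY, CLAIMED_JUMP):.2f}%")
    print(f"relative reading: {relative_gain(BASELINE_ACCURACY, CLAIMED_JUMP):.2f}%")
```

Under these assumptions the two readings land about 12 points apart, so it helps to say which one is meant when citing a jump like this.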