• Sl00k@programming.dev
    10 months ago

    [Images: Olympic Arena analysis; OpenAI analyses]

    Compare the increase from their V2 GPT-4o model to their reasoning o1-preview model. The jumps from last year's GPT-3.5 -> GPT-4 were also quite large. Secondly, if you want to take OpenAI’s own research into account, that’s in the second image.

    • TootSweet@lemmy.world
      10 months ago

      if you want to take OpenAI’s own research into account

      No thank you.

      OlympicArena validation set (text-only)

      “Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)”

      - The OlympicArena analysis that you cited.
      • Sl00k@programming.dev
        10 months ago

        The jump from GPT-4o -> o1 (the preview, not the full release) was a 20% cumulative knowledge jump. If that’s not an improvement in accuracy, I’m not sure what is.