"participants who had access to an AI assistant wrote significantly less secure code" and "were also more likely to believe they wrote secure code" - 2023 Stanford University study published at CCS23

Arthur Besse@lemmy.ml · 10 months ago

"participants who had access to an AI assistant wrote significantly less secure code" and "were also more likely to believe they wrote secure code" - 2023 Stanford University study published at CCS23

Sl00k@programming.dev · 10 months ago

[Image: OlympicArena analysis] [Image: OpenAI analysis]

Compare the increase from their V2 GPT-4o model to their reasoning o1-preview model. The jumps from last year’s GPT-3.5 -> GPT-4 were also quite large. Secondly, if you want to take OpenAI’s own research into account, that’s in the second image.

TootSweet@lemmy.world · 10 months ago

if you want to take OpenAI’s own research into account

No thank you.

OlympicArena validation set (text-only)

“Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)”

The OlympicArena analysis that you cited.

Sl00k@programming.dev · 10 months ago

The jump from GPT-4o -> o1 (preview, not the full release) was a 20% cumulative knowledge jump. If that’s not an improvement in accuracy, I’m not sure what is.