2023? Like last year? Like when LLMs were just a curiosity more than anything useful?
They should be doing these studies continuously…
Edit: Oh no, I forgot Lemmy hates LLMs. Oh well, can’t blame you guys, hate is the basic manifestation towards what scares you, and it’s revealing.
I’m sure they will, here’s year one.
Unlike this year when LLMs are more of a huge scam.
We’re entering the ‘blockchain for every need’ stage. Expect massive money to flow into scams, poor ideas, and outright dangerous uses for a few years.
Before blockchain we had ‘the web’ itself in the dot-com era. Before that? I saw basic computing itself pitched as a solution to everything.
Curious why your perspective is that they’re more of a scam when by all metrics they’ve only improved in accuracy?
One or two models have increased in accuracy. Meanwhile all the grifters have caught on and there are 1000x more AI companies out there that are just reselling ChatGPT with some new paint.
Source?
Compare the GPT increase from their V2 GPT-4o model to their reasoning o1 preview model. The jumps from last year’s GPT 3.5 -> GPT 4 were also quite large. Secondly, if you want to take OpenAI’s own research into account, that’s in the second image.
“if you want to take OpenAI’s own research into account”
No thank you.
OlympicArena validation set (text-only): “Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)”
The jump from GPT-4o -> o1 (preview, not the full release) was a 20% cumulative knowledge jump. If that’s not an improvement in accuracy, I’m not sure what is.
They do. Reality is not going to change though. You can enable a handicapped developer to code with LLMs, but you can’t win a foot race by using a wheelchair.
I’m just waiting for someone to lecture me how the speed record in wheelchair sprint beats feet’s ass…
Hmm. To me 2023 was the breakthrough year for them. Now we are already getting used to their flaws.
Hmmm, it’s almost like the study was testing people’s perception of the usefulness of AI vs the actual usefulness and results that came out.
While I agree with the “they should be doing these studies continuously” point of view, I think the bigger red flag here is that, given the advancements in AI, a study published in 2023 (meaning the experiment was done much earlier) is deeply irrelevant today in late 2024. It feels misleading and disingenuous to be sharing this today.
No. I would suggest you actually read the study.
The problem that the study reveals is that people who use AI-generated code, as a rule, don’t understand it and aren’t capable of debugging it. As a result, bigger LLMs will not change that.
I did in fact read the paper before my reply. I’d recommend considering the participant pool. This is a very common problem in most academic research, but it is very relevant to the argument you’re making: the vast majority of the participants were students (over 60% if memory serves; I’m on mobile currently and can’t go back to read easily), and most of those were undergraduates with very limited exposure to actual dev work. They are then prompted, quite literally as the first question, to produce code for asymmetrical encryption and decryption.
Seasoned developers know not to implement their own encryption because it is a very challenging space; this is similar to asking undergraduate students to conduct brain surgery and expecting them to know what to look for.
It’s the inherent disconnect between “News” and “Science”.
Science requires rigorous study and incremental advancement. A 2023 article based on 2022 data is inherently understood to be… 2022 data (note: I did not actually check but that is the timeline I assume. It is in the study).
But news and social media just want headlines that get people angry and reinforce whatever nonsense people want to believe.
It is similar to explaining basic concepts. Been a minute since the last time I was properly briefed, but think stuff like “Do NOT say ‘theory’ of evolution. Instead, talk about how evolution is the only accepted justification based on evidence and research”
Completely agree with you on the news vs science aspect. At the same time, it is worth considering that not all scientific research is evergreen… I know this all too well; as a UX researcher in the late 2000s / early 2010s studying mobile UX/UI, most of what our lab did was basically irrelevant the year after it was published. Yet the lab persevered and continues to conduct studies and add incremental knowledge to the field. At the pace generative AI/LLMs are progressing, studies against commercially available models in 2023 are largely irrelevant in the space we are in, and while updated studies are still important, I feel older articles don’t shine an appropriate light on the subject in this context.
A lot of words to say that despite the linked article being scientific research, since it was dropped here without context or any leading discussion, it leans more towards the news end of the spectrum, and gives off the impression that OP just wants to leverage the headline to stir emotion and reinforce people’s beliefs based on outdated information.
It isn’t about being “evergreen”. It is about having historical evidence.
Because maybe someone will do a study in 2030 and want to be able to compare to your UX research in the 2000s. If you wrote your paper properly they can reproduce your experiments (to the degree reasonable) and actually demonstrate progress.