Oxford pretends "AI" benchmarks are science, not marketing

pivot-to-ai.com

technocrit@lemmy.dbzer0.com to Fuck AI@lemmy.world · 8 hours ago

Chatbot vendors routinely make up a new benchmark, then brag how well their hot new chatbot does on it. Like that time OpenAI’s o3 model trounced the FrontierMath benchmark, and it’s just a coincid…

This paper treats chatbot benchmarks as defective science that can be fixed. But science was never what chatbot benchmarks were for.

The Oxford Reasoning With Machines Lab is pretending not to understand something that they absolutely should understand, given most of the lab’s work is chatbots.

That’s because this paper is also marketing — to sell Reasoning With Machines’ services to the chatbot vendors, so they can do their marketing better. And make the benchmark lies a bit less obvious.
