This paper treats chatbot benchmarks as defective science that can be fixed. And that was never what chatbot benchmarks were for.

The Oxford Reasoning With Machines Lab is pretending not to understand something it absolutely should understand, given that most of the lab’s work is on chatbots.

That’s because this paper is also marketing: it sells Reasoning With Machines’ services to the chatbot vendors, so they can do their marketing better. And make the benchmark lies a bit less obvious.