They presented their methodology in an open and clear way and provided their data for everyone to interpret. You can disagree with the conclusions, but it’s pretty harsh to call the paper “misleading” simply because you don’t like the results.
They just translated one algorithm into many languages, without using each language’s constructs or idioms to make the implementation perform decently.
They used two datasets, if you read the paper… It wasn’t “one algorithm”; it was several, drawn from publicly available implementations. They chose an “optimized” set of implementations from The Computer Language Benchmarks Game to produce results for well-optimized code in each language. They then used implementations of various algorithms from Rosetta Code, which were more typical and not heavily focused on performance.
In fact, using “typical language constructs or specificities” hurt the Java implementations, since List is slower than plain arrays. Java performed much better (surprisingly well, actually) in the optimized tests than in the Rosetta Code tests.
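For a sense of why that is, here’s a minimal sketch (not from the paper, and only a naive timing rather than a proper JMH benchmark) contrasting a sum over an int[] with the same sum over a List&lt;Integer&gt;, where every element is boxed and reached through an extra layer of indirection:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: sums the same data held in an int[] and in a
// List<Integer>. The List path autoboxes every element and adds pointer
// indirection, which is the kind of "typical" Java code Rosetta Code
// implementations tend to use. Naive System.nanoTime timing is just to show
// the shape of the difference; a real measurement would use JMH.
public class ArrayVsList {
    public static void main(String[] args) {
        final int n = 10_000_000;

        int[] array = new int[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i); // autoboxing: each int becomes an Integer object
        }

        long t0 = System.nanoTime();
        long arraySum = 0;
        for (int i = 0; i < n; i++) arraySum += array[i];
        long t1 = System.nanoTime();

        long listSum = 0;
        for (int v : list) listSum += v; // unboxing on every access
        long t2 = System.nanoTime();

        System.out.printf("array: %d ms, list: %d ms (sums: %d / %d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, arraySum, listSum);
    }
}
```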
Honestly that’s all you need to know to throw this paper away.
Why?
It’s a very heavily gamed benchmark. The most frequent issues I’ve seen are:
They’ve finally started labelling stupid submissions with “contentious” labels at least, but that wasn’t the case when this study was done.
They provide the specific implementations used here: https://github.com/greensoftwarelab/Energy-Languages
I dislike the “I thought of something that might be an issue, therefore I’ll dismiss all of the work without thinking” approach.