Coders spent more time prompting and reviewing AI generations than they saved on coding. On the surface, METR’s results seem to contradict other benchmarks and experiments that demonstrate increases in coding efficiency when AI tools are used. But those often also measure productivity in terms of total lines of code or the number of discrete tasks/code commits/pull requests completed, all of which can be poor proxies for actual coding efficiency. These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to “settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.” While those factors may not apply in “many realistic, economically relevant settings” involving simpler code bases, they could limit the impact of AI tools in this study and similar real-world situations.
You have to ignore the obsequious optimism bias LLM’s often have. It all comes down to their training set and if they have seen more than you have.
I don’t generally use them on projects I’m already familiar with unless it’s for fairly boring repetitive work that would be fiddly with search and replace, e.g. extract the common code out of these functions and refactor.
When working with unfamiliar code they can have an edge so if I needed a simple mobile app I’d probably give the LLM a go and then tidy up the code once it’s working.
At most I’ll give it 2 or 3 attempts to correct the original approach before I walk away and try something else. If it starts making up functions it APIs that don’t exist that is usually a sign out didn’t know so time to cut your losses and move on.
Their real strengths come in when it comes to digesting large amounts of text and sumerising. Great for saving you reading all the documentation on a project just to try a small thing. But if your going to work on the project going forward your going to want to invest that training data yourself.