Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit like an AI doomsday cult, and they take funding from the AI doomsday cult organisat…
The study was centered on bugfixing in large, established projects. That is not really the kind of task that AI helpers excel at.
Also, the small number of participants (16), the fact that the participants were familiar with the code base, and the relatively short completion time of all the tasks can skew the results.
Hence the divergence between the study's results and many people's personal experience of increased productivity: they are doing different tasks in a different scenario.
“AI is good for Hello World projects written in javascript.”
Managers will still fire real engineers though.
I find it more useful for large language transformations and for delving into unknown patterns, languages, or environments.
If I know a codebase head to toe and I'm proficient with that environment, it offers little help, especially if it's a highly specialized problem.
Since the SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here: performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands, leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make them work on isolated files, or code manually and rely on the autocomplete.
Call me crazy but I think developers should understand what they’re working on, and using LLM tools doesn’t provide a shortcut there.
You have to get familiar with the codebase at some point, and when you are unfamiliar, in my experience, LLMs can help you understand it: copy large portions of code you don’t really understand and ask for an analysis and explanation (something like the sketch at the end of this comment).
Not so long ago I used it on assembly code. It would have taken me ages to decipher what it was doing by myself; the AI sped up the process.
But once you are very familiar with an established project you have worked on a lot, I don’t even bother asking LLMs anything, as in my experience I come up with better answers quicker.
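In practice that "paste it in and ask" workflow is just a few lines. Here is a rough sketch; the openai Python client, the model name, and the file path are my own assumptions for illustration, not anything from the study:

    # Rough sketch: feed an unfamiliar chunk of code to a model and ask
    # for an explanation. Client, model name and file path are assumptions.
    from pathlib import Path

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    snippet = Path("legacy/scheduler.s").read_text()  # hypothetical file

    prompt = (
        "Explain step by step what this code does, and point out any "
        "non-obvious invariants or side effects:\n\n" + snippet
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model would do
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)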
At the end of the day we must understand that an LLM is more or less a statistical autocomplete trained on a large dataset (see the toy sketch at the end of this comment). If your solution is not in the dataset, the thing is not really going to come up with a creative solution. And it is not going to run a debugger on your code either, afaik.
When I use it, the question I ask myself most before bothering is “is the solution likely to be in the training dataset?” or “is it a task that can be solved as a language problem?”
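What I mean by “statistical autocomplete”, as a toy sketch (GPT-2 via the Hugging Face transformers library is just my choice for illustration):

    # Toy illustration of "statistical autocomplete": a small pretrained
    # model simply continues the prompt with statistically likely tokens.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    completions = generator(
        "def fibonacci(n):",     # prompt to be continued
        max_new_tokens=25,       # only a short continuation
        do_sample=True,          # sample from the token distribution
        num_return_sequences=3,  # a few alternative continuations
    )
    for c in completions:
        print(c["generated_text"])
        print("---")

If the training data contains lots of similar code, the continuations look sensible; if not, you just get plausible-looking tokens.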