Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
I find it more useful doing large language transformations and delving into unknown patterns, languages or environments.
If I know a source head to toe, and I’m proficient with that environment, it’s going to offer little help. Specially if it’s a highly specialized problem.
Since SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here, performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make it work on isolated files, or code manually and rely on the auto complete.
So far I’ve only found it useful when describing bite-sized tasks in order to get suggestions on which functions are useful from the library/API I’m using. And only when those functions have documentation available on the Internet.
I find it more useful doing large language transformations and delving into unknown patterns, languages or environments.
If I know a source head to toe, and I’m proficient with that environment, it’s going to offer little help. Specially if it’s a highly specialized problem.
Since SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here, performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make it work on isolated files, or code manually and rely on the auto complete.
So far I’ve only found it useful when describing bite-sized tasks in order to get suggestions on which functions are useful from the library/API I’m using. And only when those functions have documentation available on the Internet.