Yeah, LLMs are decent with coding tasks if you know what you’re doing and can properly guide them (and check their work!), but fuck if they don’t take a lot of effort to rein in. I will say they’re pretty damned good at debugging the shit I wrote. I’ve been working on an audit project for a few months and 4o/5 have helped me a good bit in finding persistent errors in my execution logic that I just kept missing on rereads and debug runs.
But generating new code is painful. I had 5 generate a new function for me yesterday to do some issue recon and report generation, and I spent 20 minutes going back and forth with it because it kept dropping fields from the output. Even on 5, it still struggles at times not to give you the same wrong answer more than once, or it just waffles between wrong answers.
LLMs are decent with coding tasks if you know what you’re doing
Only if the thing you are trying to do is commonly used and well documented, but in that case you could just read the documentation instead and learn a thing yourself, right?
The other day I tried to get some instructions on how to do something specific in a rather obscure, opaquely documented CLI tool that I need for work. I couldn’t quite make sense of the documentation, and I found the program’s behavior a bit erratic, so that’s why I turned to AI. It cheerfully and confidently told me (I’m paraphrasing): oh, to do “this specific thing” you have to use the --something-specific switch, and then it gave some command line examples using that switch that looked like they made complete sense.
So I thought: oh, did I overlook that switch? Could it be that easy? So I looked in the documentation and sure enough… the AI had been bullshitting me and that switch didn’t exist.
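These days I don’t trust a suggested switch until it actually shows up in the help output. A check as dumb as this sketch would have saved me the round trip (the tool and switch names here are placeholders, obviously, not the real thing):

```python
# Sanity check: does the switch the model suggested actually exist?
# "sometool" and "--something-specific" are placeholders, not the real tool.
import subprocess

def flag_exists(tool: str, flag: str) -> bool:
    """Return True if `flag` appears anywhere in the tool's --help output."""
    result = subprocess.run([tool, "--help"], capture_output=True, text=True)
    return flag in (result.stdout + result.stderr)

if __name__ == "__main__":
    print(flag_exists("sometool", "--something-specific"))
```

It doesn’t prove the switch does what the model claims, but it catches the flat-out invented ones in about a second.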
Then there was the time when I asked it to generate an ARM template (again, poorly documented bullshit) to create some service in Azure with some specific parameters. It gave me something that looked like an ARM template, but sure as hell wasn’t a valid one. This one wasn’t completely useless, though: I was able to cross-reference it with an existing template, and with some trial and error I copied over the elements I needed.
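The cross-referencing step really boils down to comparing the skeleton of the generated file against one that actually deploys. Something like this sketch (file names made up; the only hard fact here is that a valid ARM template needs at least $schema, contentVersion, and resources at the top level):

```python
# Compare the top-level keys of the LLM's template against a known-good one.
# File names are invented for the example; a valid ARM template needs at least
# "$schema", "contentVersion", and "resources" at the top level.
import json

with open("known_good_template.json") as f:
    good = json.load(f)
with open("llm_generated_template.json") as f:
    generated = json.load(f)

print("Missing top-level keys:", sorted(set(good) - set(generated)))
print("Unexpected top-level keys:", sorted(set(generated) - set(good)))
```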
Dude, forgetting stuff has to be one of the most frustrating parts of the entire process. Like forgetting a column in a database, or just an entire piece of a function you just pasted in… Or trying to change things you never asked it to touch. So freaking annoying. I had standing instructions in its memory to not leave out pieces or modify things I didn’t ask for, and I’ll put that stuff in the prompt too, and it just does not care lol.
I’ve used it a lot for coding because I’m not a real programmer (more a code hacker) and need to get things done for a website, but I know just enough to know it’s really stupid sometimes lol.
Dude, forgetting stuff has to be one of the most frustrating parts of the entire process. Like forgetting a column in a database, or just an entire piece of a function you just pasted in
It was actually worse. I was pulling data out of local logs and processing events. I asked it to assess a couple of columns that I was struggling to parse properly, and it got those ones in, but dropped some of my existing columns. I pointed out the error, it acknowledged the issue, then spat out code that reverted to the first output!
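A dumb guard like this would have caught it on the first paste instead of the third (pandas assumed; the column and file names are invented for the sketch):

```python
# Make sure a "fixed" parsing step doesn't silently drop columns that already
# worked. File name is invented; assumes the parsed events sit in pandas.
import pandas as pd

events = pd.read_csv("parsed_events.csv")   # the columns I already had
original_columns = set(events.columns)

reworked = events.copy()                    # stand-in for the regenerated parsing output

dropped = original_columns - set(reworked.columns)
if dropped:
    raise ValueError(f"Regenerated code dropped columns: {sorted(dropped)}")
```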
Though that wasn’t nearly as bad as it telling me that a variable a couple hundred lines and multiple transformations in wasn’t being populated by an earlier variable, so I literally went in, copied each declaration line, and sent it back like I was smacking an intern on the nose or something…
For a bot designed to read and analyze text, it is surprisingly bad at the whole ‘reading’ aspect. But maybe that’s just how human-like the intelligence is /s
Or trying to change things you never asked it to touch. So freaking annoying. I had standing instructions in its memory to not leave out pieces or modify things I didn’t ask for, and I’ll put that stuff in the prompt too, and it just does not care lol
OMFG this. I’ve had decent luck recently after setting up a project and explicitly laying out a number of global directives, because yeah, it was awful trying to figure out exactly what changed when I diffed the input and output and everything was red because even the goddamned comments got changed. But even just getting it to understand basic style requirements was a solid half hour of arguing (only partially because I forgot the proper names of the casings) so it wouldn’t make me re-lint the whole goddamned script when I’d only told it to analyze and fix one item.
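The one thing that makes the all-red diffs bearable is stripping comment lines out of both sides before comparing, so only actual code changes light up. A rough sketch (assumes #-style comments; swap the predicate for other languages):

```python
# Diff two versions of a script with full-line comments stripped, so the
# comparison only shows real code changes. Assumes "#" comments.
import difflib

def strip_comments(text: str) -> list[str]:
    return [
        line.rstrip()
        for line in text.splitlines()
        if not line.lstrip().startswith("#")
    ]

def clean_diff(before: str, after: str) -> str:
    return "\n".join(difflib.unified_diff(
        strip_comments(before),
        strip_comments(after),
        fromfile="original",
        tofile="llm_output",
        lineterm="",
    ))

if __name__ == "__main__":
    print(clean_diff("x = 1\n# old comment\n", "x = 2\n# reworded comment\n"))
```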
Yessir, I’ve basically run into all of that. It’s fucking infuriating. It really is like talking to a toddler at times. There seems to be a limit to the complexity of what it can process before it just starts messing everything up. Like once you hit its limit, it will not get the whole thing right no matter how many times you piece it back together, like your example. You fix one problem and then it just forgets a different piece. FFFFFFFFFF.
Out of curiosity, do you feel that you would have been able to write that new function without an LLM in less time than you spent fighting GPT5?
I could definitely write it, but probably not as fast, even with fighting it. The report I got in 25-30 minutes would normally take me closer to 45-60, counting having to research what to analyze, figure out how to parse the different log formats, break them up and collate them, and give a pretty output.
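For scale, the function in question is roughly this shape, nothing more; the log formats and field names here are invented for illustration: normalize each log format into a common record, collate, and summarize.

```python
# Rough shape of the recon/report function: normalize two invented log
# formats into common records, then count events for the report.
import json
from collections import Counter

def parse_json_line(line: str) -> dict:
    record = json.loads(line)
    return {"source": "json_log", "event": record.get("event", "unknown")}

def parse_pipe_line(line: str) -> dict:
    _ts, event, _detail = line.split("|", 2)
    return {"source": "pipe_log", "event": event.strip()}

def build_report(json_lines: list[str], pipe_lines: list[str]) -> str:
    records = [parse_json_line(l) for l in json_lines]
    records += [parse_pipe_line(l) for l in pipe_lines]
    counts = Counter(r["event"] for r in records)
    return "\n".join(f"{event}: {count}" for event, count in counts.most_common())

if __name__ == "__main__":
    print(build_report(
        ['{"event": "login_failure"}', '{"event": "login_failure"}'],
        ["2024-01-01 00:00:00|login_failure|bad password"],
    ))
```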