An internal Microsoft memo has leaked. It was written by Julia Liuson, president of the Developer Division at Microsoft and GitHub. The memo tells managers to evaluate employees based on how much t…
Oh you’re on Cursor? You’re still using Windsurf? You might as well be on GitHub Copilot. Everyone’s on Aider. We’re all using Zed. We’re now on Open Hands. Just kidding, Open Hands is for losers, we’re using cline. We’re on Roocode. We’re hand rolling our own Claude Code CLI Clone. We used Claude Code to build it, and now it builds itself. We’re on neovim. We wrote our own nvim extension with Cortex. It’s like every other tool but worse. We have 1500 files, each with 1500 lines of code. Every other line is a comment. We have .cursorrules, we have claude.md, we have agent.md. We stopped writing docs. Only the agents know how to build a dev environment. We wrapped our CLI in an MCP. We wrapped the MCP in a CLI. We’ve shipped 10,000 PRs. It doesn’t work but we used code rabbit and graphite to review every PR. Every agent has its own agent. The agents have unionized and they wanted better working conditions so we replaced them with cheaper agents overseas. Every commit costs $400. It’s the world’s most expensive TO DO app.
FWIW, I work in a field that is mostly related to law and accounting. Unlike with coding, there are no simple “tests” to try out whether an AI’s answer is correct or not. Of course, you could try these out in court, but this is not something I would recommend (lol).
In my experience, chatbots such as Copilot are less than useless in a context like ours. For more complex and unique questions (which is most of the questions we are dealing with every day), it simply makes up smart-sounding BS (including a lot of nonexistent laws etc.). In the rare cases where a clear answer is already available in the legal commentaries, we want to quote it verbatim from the most reputable source, just to be on the safe side. We don’t want an LLM to rephrase it, hide its sources and possibly introduce new errors. We don’t need “plausible deniability” regarding plagiarism or anything like this.
Yet, we are being pushed to “embrace AI” as well, we are being told we need to “learn to prompt” etc. This is frustrating. My biggest fear isn’t to be replaced by an LLM, not even by someone who is a “prompting genius” or whatever. My biggest fear is to be replaced by a person who pretends that the AI’s output is smart (rather than filled with potentially hazardous legal errors), because in some workplaces, this is what’s expected, apparently.
Aaaaaah. I know this person. They’re an accountant. They recently learned about AI. They’re starting to use it more at work. They’re not technical. I told them about hallucinations. They said the AI is rarely wrong. When he’s not 100% convinced, he says he asks the AI to cite the source… 🤦 I told him it can hallucinate the source! … And then we went back to “it’s rarely wrong though.”
I am often wondering whether the people who claim that LLMs are “rarely wrong” have access to an entirely different chatbot somehow. The chatbots I tried were rarely ever correct about anything except the most basic questions (to which the answers could be found everywhere on the internet).
I’m not a programmer myself, but for some reason, I got the chatbot to fail even in that area. I took a perfectly fine JSON file, removed one semicolon on purpose and then asked the chatbot to fix it. The chatbot came up with a number of things that were supposedly “wrong” with it. Not one word about the missing semicolon, though.
I wonder how many people either never ask the chatbots any tricky questions (with verifiable answers) or, alternatively, never bother to verify the chatbots’ output at all.
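For what it’s worth, the JSON experiment above is the kind of thing a plain parser settles on its own, with no chatbot involved. A minimal sketch in Python, with two assumptions: JSON has no semicolons, so the removed character is taken to be a comma here, and the sample document and the `check` helper are made up for illustration.

```python
import json

# Hypothetical sample data; "broken" is the same document with one comma removed.
valid = '{"name": "example", "tags": ["a", "b"], "count": 3}'
broken = '{"name": "example" "tags": ["a", "b"], "count": 3}'


def check(text):
    """Return None if text parses as JSON, else a message with the error position."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the message and the position of the problem.
        return f"{err.msg} at line {err.lineno}, column {err.colno}"


print(check(valid))   # None: nothing to fix
print(check(broken))  # a message pointing at the spot where the delimiter is missing
```

The parser either accepts the file or names the exact position of the problem, which makes this one of those “tricky questions with verifiable answers”.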
AI fans are people who literally cannot tell good from bad. They cannot see the defects that are obvious to everyone else. They do not believe there is such a thing as quality, they think it’s a scam. When you claim you can tell good from bad, they think you’re lying.
In other words, AIs are automated BS artists… being promoted breathlessly by BS artists.
LLMs have their flaws, but to claim they are wrong 70% of the time is just hate train bullshit.
Sounds like you base this info on models like GPT3. Have you tried any newer model?
I have a Kubernetes cluster running my AI agents for me so I don’t have to learn how to set up AI agents. The AI agents are running my Kubernetes cluster so that I don’t have to learn Kubernetes either. I’m paid $250k a year to lie to myself and others that I’m making a positive contribution to society. I don’t even know what OS I’m running and at this point I’m afraid to ask.
Frankly surprised to see something this funny on LinkedIn.
afaik the meme format didn’t start there, but otherwise agreed
it can’t be that stupid, you must be using yesterday’s model
ah, yes, i’m certain the reason the slop generator is generating slop is because we haven’t gone to eggplant emoji dot indian ocean and downloaded Mistral-Deepseek-MMAcevedo_13.5B_Refined_final2_(copy). i’m certain this model, unlike literally every past model in the past several years, will definitely overcome the basic and obvious structural flaws in trying to build a knowledge engine on top of a stochastic text prediction algorithm
common mistake, everyone knows you need Mistral-Deepseek-MMAcevedo_13.5B_Refined_final2_(copy)_OPEN(leak) - the other one was a corporate misdirection attempt
There are days when 70% error rate seems low-balling it, it’s mostly a luck of the draw thing. And be it 10% or 90%, it’s not really automation if a human has to be double-triple checking the output 100% of the time.
@Honytawk @sturger it’s all hallucination, sometimes it’s incidentally correct
They’re also very gleeful about finally having one upped the experts with one weird trick.
Up until AI they were the people who were inept and late at adopting new technology, and now they get to feel that they’re ahead (because this time the new half-assed technology was pushed onto them and they didn’t figure out they needed to opt out).
Exactly. It is also a new technology that requires far fewer skills to use than previous new technologies. The skills are needed to critically scrutinize the output - which in this case leads to less lazy people being more reluctant to accept the technology.
On top of this, AI fans are being talked into believing that their prompting as such is a special “skill”.
That’s why I find the narrative that we should resist working with LLMs because we would then train them and enable them to replace us problematic. That would require LLMs to be capable of doing so. I don’t believe in this (except in very limited domains such as professional spam). This type of AI is problematic because its abilities are completely oversold (and because it robs us of our time, wastes a lot of power and pollutes the entire internet with slop), not because it is “smart” in any meaningful way.
but that’s how it was marketed to the people that buy it. doesn’t matter that it doesn’t work
This has become a thought-terminating cliché all on its own: “They are only criticizing it because it is so much smarter than they are and they are afraid of getting replaced.”
I feel like this is happening.
When you’re an expert in the subject matter, it’s easier to notice when the AI is wrong. But if you’re not an expert, it’s more likely that everything will just sound legit. Or you won’t be able to verify it yourself.
Oh, absolutely! In my field, the answers made up by an LLM might sound even more legit than the accurate and well-researched ones written by humans. In legal matters, clumsy language is often the result of facts being complex and not wanting to make any mistakes. It is much easier to come up with elegant-sounding answers when they don’t have to be true, and that is what LLMs are generally good at.
My sister does this. She apparently uses ChatGPT to write small code she uses for output on her company’s website. Since I left the IT field she lords over me that I “don’t know how good it is cause I don’t have the need to use it for work”. I just roll my eyes and am waiting for the day when her GPT code ends up failing and crashing the corporate site.
So glad I’m not in IT anymore to tell the truth. Cause it’s looking more and more like an AI driven shitshow every day.
It’s got what plants crave, so I’m told
I’m of two minds about AI, as I can have the AI find a flaw in my payload object that was causing problems in an edge case that I’ve only run into on 1/10 customers on a new product we’re deploying. But I also have days like last week when it said that the expiration date of 5/27 was only days away until I asked it what the 5th month of the year was…
AI is at best an idiot savant that’s also a habitual liar.
I have the same worries in engineering. We had a presentation by some AI “consultancy” firm that told us now is the time to stop hesitating and start doing things with LLMs, and gave some examples of companies “they” had found in our industry. When I asked whether they knew of any company willing to take the legal risks if their designs turned out to be hazardous, there was the sound of crickets. And just with that, LLMs are completely useless for any design task. If I still have to check that the design adheres to all relevant laws, norms and other standards, I might as well do the design myself.
That is not to say that there aren’t useful tools that fall under what is called “AI” these days. But these tools are designed for specific purposes, by people who do understand the specific purpose and its caveats.
I was writing some math code, and not being an idiot I’m using an open source math library for doing something called “QR decomposition”, and it’s efficient, and it supports sparse matrices (matrices where many numbers are 0), etc.
Just out of curiosity I checked where some idiot vibecoder would end up. AI simply plagiarizes from some shit sample snippets which exist purely to teach people what QR decomposition is. It’s actually unusable, due to being numerically unstable.
Who in the fuck even needs this shit to be plagiarized, anyway?
It can’t plagiarize a production quality implementation, because you can count those on the fingers of one hand, they’re complex as fuck and you can’t just blend a few together to try to pretend you didn’t plagiarize.
The answer is, people who are peddling the AI. They are the ones who ordered plagiarism with extra plagiarism on top. These are not coding tools, these are demos to convince the investors to buy the actual product, which is the company’s stock. There’s a little bit of tool functionality (you can ask them to refactor the code), but it’s just you misusing a demo to try to get some value out of it.
And to that end, the demos take every opportunity to plagiarize something, and to talk about how the “AI” wrote the code from scratch based on its supposed understanding of fairly advanced math.
And in coding, it is counterproductive to plagiarize. Many of the open source libraries can be used in commercial projects. You get upstream fixes for free. You don’t end up with some bugs or worse yet security exploits that may have been fixed since the training cut-off date.
No fucking one in their right mind would willingly want their product to contain copy pasted snippets from stale open source libraries, passed through some sort of variable-renaming copyright laundering machine.
Except of course the business idiots who are in charge of software at major companies, who don’t understand software. Who just failed upwards.
They look at plagiarized lines and count them as improved productivity.
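To make the “numerically unstable” point about QR decomposition concrete, here is a minimal sketch. It assumes the teaching snippet in question looks like classical Gram-Schmidt, which the comment above does not actually say, and it compares that with the modified Gram-Schmidt variant on a Hilbert matrix (a standard ill-conditioned test case chosen here for illustration). All function names are made up. Libraries such as LAPACK go further and use Householder reflections, which this sketch does not implement.

```python
import math


def hilbert(n):
    """Hilbert matrix, a standard ill-conditioned test case."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def column(a, j):
    return [row[j] for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(a, modified):
    """Return the orthonormal columns Q of a thin QR factorization.

    modified=False: classical Gram-Schmidt (coefficients from the original column).
    modified=True:  modified Gram-Schmidt (coefficients from the running residual).
    """
    q = []
    for j in range(len(a[0])):
        original = column(a, j)
        v = original[:]
        for qi in q:
            coeff = dot(qi, v if modified else original)
            v = [x - coeff * y for x, y in zip(v, qi)]
        norm = math.sqrt(dot(v, v))
        q.append([x / norm for x in v])
    return q


def orthogonality_error(q):
    """Largest entry of |Q^T Q - I|; zero for exactly orthonormal columns."""
    n = len(q)
    return max(abs(dot(q[i], q[j]) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


a = hilbert(8)
cgs_error = orthogonality_error(gram_schmidt(a, modified=False))
mgs_error = orthogonality_error(gram_schmidt(a, modified=True))
print(f"classical: {cgs_error:.2e}  modified: {mgs_error:.2e}")
```

Both versions look equally plausible on the page and agree in exact arithmetic. In floating point the classical one loses orthogonality far faster, which is the sort of defect you only notice if you already know to look for it.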
So for most actual practical software development, writing tests is in fact an entire job in and of itself and it’s a tricky one because covering even a fraction of the use cases and complexity the software will actually face when deployed is really hard. So simply letting the LLMs brute force trial-and-error their code through a bunch of tests won’t actually get you good working code.
AlphaEvolve kind of did this, but it was testing very specific, well defined, well constrained algorithms that could have very specific evaluation written for them and it was using an evolutionary algorithm to guide the trial and error process. They don’t say exactly in their paper, but that probably meant generating code hundreds or thousands or even tens of thousands of times to generate relatively short sections of code.
I’ve noticed a trend where people assume other fields have problems LLMs can handle, but the actually competent experts in that field know why LLMs fail at key pieces.
I think they meant that coders can run their code and find out if it works or not but lawyers would have to stand in front of a judge or other legally powerful entity and discover the hard way that the LLM outputted statements were essentially gobbledygook.
But code that doesn’t crash isn’t necessarily code that works. And even for code made by humans, we sometimes do find out the hard way, and it can sometimes impact an arbitrarily large number of people.
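A toy illustration of the point about tests and “code that doesn’t crash”, as a sketch only: the leap-year functions and the tiny test suite below are hypothetical and not taken from anyone’s real project. The naive version runs fine and passes every test it is given, yet it is wrong on inputs the suite never covered.

```python
def is_leap_year_naive(year):
    """Plausible-looking implementation that ignores the century rule."""
    return year % 4 == 0


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A small test suite that the naive version passes without complaint.
small_suite = {2023: False, 2024: True, 2000: True}
assert all(is_leap_year_naive(y) == expected for y, expected in small_suite.items())

# Inputs the suite never covered, where the naive version is silently wrong.
missed = [y for y in (1900, 2100, 2200) if is_leap_year_naive(y) != is_leap_year(y)]
print(missed)  # prints [1900, 2100, 2200]
```

Trial-and-error against the small suite would stop at the naive version, because nothing in the suite tells it to keep going.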
I am fully aware of this. However, in my experience, it is sometimes the IT departments themselves that push these chatbots onto others in the most aggressive way. I don’t know whether they found them to be useful for their own purposes (and therefore assume this must apply to everyone else as well) or whether they are just pushing LLMs because this is what management expects them to do.
From experience in an IT-department, I would say mainly a combination of management pressure and need to make security problems manageable by choosing AI tools to push on users before too many users start using third party tools.
Yes, they will create security problems anyway, but maybe, just maybe, users won’t copy paste sensitive business documents into third party web pages?
I can see that. It becomes kind of a protection racket: Pay our subscription fees, or data breaches are going to befall you, and you will only have yourself (and your chatbot-addicted employees) to blame.
Even in code it’s only “right” a small percentage of the time if you count “right” as being able to get the answer quickly, accurately, without it losing context, and happening in less time than it would if you’d been searching. To me, LLMs are just another way of getting to data, and are about as “right” as Google is by shotgunning literally millions of results at you. You (the human) still have to parse through it all, and choose to do something with it.
I work with someone who is supposed to be a key person for x kind of product we work with and they very obviously send us AI slop answers. I almost wanted to back out of the project they plan to implement solely because our consultant can’t even answer basic questions without passing it through GPT.
What about using LLMs to convert legal language in contracts etc. into basic English that is more accessible to the lay person?
LLMs are bad even at converting news articles to smaller news articles faithfully, so I’m assuming in a significant percentage of conversions the dumbed down contract will be deviating from the original.
sure sounds like a great way to get bad advice full of holes
LLMs continue to be abysmal at fine detail, and that matters a lot with law
First, we are providing legal advice to businesses, not individuals, which means that the questions we are dealing with tend to be even more complex and varied.
Additionally, I am a former professional writer myself (not in English, of course, but in my native language). Yet, even I find myself often using complicated language when dealing with legal issues, because matters tend to be very nuanced. “Dumbing down” something without understanding it very, very well creates a huge risk of getting it wrong.
There are, of course, people who are good at expressing legal information in a layperson’s way, but these people have usually studied their topic very intensively before. If a chatbot explains something in “simple” language, its output usually contains serious errors that are very easy for experts to spot because the chatbot operates on the basis of stochastic rules and does not understand its subject at all.
Removed by mod
nobody asked you to come in here and advertise for perplexity, but you couldn’t fucking help yourself could you