I do not believe that LLMs are intelligent. That said, I have no fundamental understanding of how they work. I hear and often regurgitate things like “language prediction”, but I want a more specific grasp of what’s going on.

I’ve read great articles/posts about the environmental impact of LLMs, their dire economic situation, and their dumbing effects on people/companies/products. But the articles I’ve read that ask questions like “can AI think?” basically just go “well, it’s just language and language isn’t the same as thinking, so no.” I haven’t been satisfied with this argument.

I guess I’m looking for something that dives deeper into that type of assertion that “LLMs are just language” with a critical lens. (I am not looking for a comprehensive lesson on the technical side of LLMs because I am not knowledgeable enough for that; some Goldilocks zone would be great.) If you guys have any resources you would recommend, pls lmk, thanks.

  • xxce2AAb@feddit.dk · 3 days ago

    They’re just statistical prediction models being leveraged for a specific purpose. Maybe start by reading up on Markov Chains. There’s no awareness, actual thought or creativity involved. And that’s the root of many of the issues with them. You’ve no doubt heard of their tendency to ‘hallucinate’ facts?
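
    To make the “statistical prediction” bit concrete, here’s a minimal word-level Markov chain sketch (toy text and toy code of my own, nothing like a real LLM’s architecture): it counts which word follows which in some training text, then samples a continuation from those counts.

    ```python
    import random
    from collections import Counter, defaultdict

    # Toy word-level Markov chain: count which word follows which, then
    # sample continuations from those counts. (Illustrative only; real
    # LLMs use neural networks over subword tokens, not raw counts.)
    training_text = (
        "the bicycle has two wheels and a frame . "
        "the unicycle has one wheel and a seat . "
        "the tandem bicycle has two seats and two wheels ."
    )

    transitions = defaultdict(Counter)  # word -> counts of following words
    words = training_text.split()
    for current_word, next_word in zip(words, words[1:]):
        transitions[current_word][next_word] += 1

    def generate(start_word, length=12):
        """Repeatedly sample a plausible next word from the counts."""
        output = [start_word]
        for _ in range(length):
            counts = transitions[output[-1]]
            if not counts:
                break
            choices, weights = zip(*counts.items())
            output.append(random.choices(choices, weights=weights)[0])
        return " ".join(output)

    print(generate("the"))
    # One possible run: "the bicycle has one wheel and a seat . the unicycle has two"
    # Each step is statistically plausible; there is no model of what a bicycle is.
    ```

    Real LLMs replace the count table with a neural network conditioned on a long context, which is why their output is far more fluent, but the “predict a plausible next token” framing is the same.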

    There was this story a while ago (can’t find it now, sorry) where somebody had asked an LLM to generate a bicycle schematic with all major parts labelled. The result wasn’t merely lackluster; it was hilariously wrong. Extra stacked handlebars, small third wheels, unconnected gears, etc.

    The issue is that when you query a trained LLM for an output, the only guarantee is that said output is statistically close to an amalgamation of the inputs on which it was trained. That’s all. In other words, said LLM had been trained on a plethora of Internet discussions, wikis and image sets of mountain bikes, city bicycles, tandem bikes, unicycles from that one circus performer forum, and on and on. All these things and more informed the model’s conception of what a ‘bicycle’ is. But because there’s no actual reasoning going on, when asked to generate a bicycle, it emitted something with the same average statistical properties as all of those things munged up together. It should not be surprising that the result didn’t match what any sane human would consider a bicycle. And that’s all “hallucinations” are.
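
    As a toy illustration of that “average statistical properties” point (made-up numbers and parts, nothing to do with how image or text generators actually represent bicycles): if a generator only matches the per-part statistics of a mixed training set and samples each part independently, it will happily emit combinations no single training example contains.

    ```python
    import random

    # Made-up training set: counts of parts in three kinds of "bicycle".
    training_examples = [
        {"wheels": 2, "seats": 1, "handlebars": 1},  # city bike
        {"wheels": 2, "seats": 2, "handlebars": 2},  # tandem
        {"wheels": 1, "seats": 1, "handlebars": 0},  # unicycle
    ]

    def sample_part(part):
        """Sample one part's count from its marginal distribution only."""
        return random.choice([example[part] for example in training_examples])

    generated = {part: sample_part(part) for part in ("wheels", "seats", "handlebars")}
    print(generated)
    # e.g. {'wheels': 1, 'seats': 2, 'handlebars': 2}: a one-wheeled tandem with
    # stacked handlebars. Each part is individually plausible, but nothing ever
    # checked the whole against any notion of a working bicycle.
    ```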

    I once read a book by a lecturer from the Danish Academy of Fine Arts. In it, the author raised a number of very good points, one of which I think is pertinent here: he pointed out that when he asked novices to draw A) a still life of a head of cauliflower and B) a portrait of a person, and then asked them to self-evaluate how well they did, all of them consistently thought they did better with the cauliflower. That, of course, wasn’t the case. The truth was that they sucked at both equally; it’s just that we’ve evolved to be very good at recognizing faces, but not heads of cauliflower (hence the Uncanny Valley effect).

    The reason I bring that up here is that we’re seeing something horrifyingly similar with the output of LLMs: when you’re an expert in a subject matter, it becomes immediately clear that LLMs don’t do any better with, say, code than they do with a labelled bicycle; it’s all trash. It’s just that most people don’t know what good code is or what it’s supposed to look like. All LLM output is at best somewhat tangential to ground truth, and even then mostly by happy accident.

    They’re useless. Do not trust their output; it’s largely ‘hallucinated’ nonsense.

    • katsura@leminal.space (OP) · 3 days ago

      thank you for your response, the cauliflower anecdote was enlightening. your description of it being a statistical prediction model is essentially my existing conception of LLMs, but this was only really from gleaning others’ conceptions online, and I’ve recently been concerned it was maybe an incomplete simplification of the process. I will definitely read up on Markov chains to try and solidify my understanding of LLM ‘prediction’.

      I have kind of a follow-up if you have the time. I hear a lot that LLMs are “running out of data” to train on. When it comes to creating a bicycle schematic, it doesn’t seem like additional data would make an LLM more effective at a task like this, since it’s already producing a broken amalgamation. It seems like, generally, these shortcomings of LLMs’ generalizations would not be alleviated by increased training data. So what exactly is being optimized by the massive increases in training data at this point? Or, conversely, what is threatened by a limited pot?

      I ask this because lots of people who preach that LLMs are doomed/useless seem to focus on this idea that their training data is limited. To me, their generalization seems like evidence enough that we are nowhere near the tech-bro dreams of AGI.

      • xxce2AAb@feddit.dk · 3 days ago

        No, you’re quite correct: additional training data might increase the potential for novel responses and thus enhance the appearance of creativity, but that’s just another way to say “decrease correctness”. To stick with the example, if you wanted an LLM to yield a better bicycle, you would, if anything, want to partition and curate the training data. Garbage in, garbage out. Mess in, mess out.

        Put another way: novelty implies surprise, and surprise implies randomness. Correctness implies consistently yielding the single correct answer. The two are inherently opposed.
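
        That trade-off is literally a dial on these models, usually called “temperature”. A minimal sketch of the idea (made-up scores, simplified softmax over three tokens instead of a full vocabulary):

        ```python
        import math
        import random

        # Temperature sampling, sketched. Low temperature collapses onto the
        # single most likely token (consistent, never surprising); high
        # temperature flattens the distribution (novel, but less correct).
        scores = {"wheel": 4.0, "seat": 2.0, "banana": 0.5}  # made-up model scores

        def sample_next(scores, temperature):
            weights = {tok: math.exp(s / temperature) for tok, s in scores.items()}
            total = sum(weights.values())
            tokens, probs = zip(*((tok, w / total) for tok, w in weights.items()))
            return random.choices(tokens, weights=probs)[0]

        print([sample_next(scores, 0.1) for _ in range(5)])  # almost always 'wheel'
        print([sample_next(scores, 2.0) for _ in range(5)])  # 'banana' starts showing up
        ```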

        If you’re interested in how all this nonsense got started, I highly recommend going back and reading Weizenbaum’s original 1966 paper on ELIZA. Even back then, he knew better:

        If, for example, one were to tell a psychiatrist “I went for a long boat ride” and he responded “Tell me about boats”, one would not assume that he knew nothing about boats, but that he had some purpose in so directing the subsequent conversation. It is important to note that this assumption is one made by the speaker. Whether it is realistic or not is an altogether separate question. In any case, it has a crucial psychological utility in that it serves the speaker to maintain his sense of being heard and understood. The speaker further defends his impression (which even in real life may be illusory) by attributing to his conversational partner all sorts of background knowledge, insights and reasoning ability. But again, these are the speaker’s contribution to the conversation.

        Weizenbaum quickly discovered the harmful effects of human interactions with these kinds of models:

        “I had not realized … that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.” (1976)

        • lime!@feddit.nu · 3 days ago

          god, the reactions to eliza are such a harbinger of doom. real cassandra moment. it’s an extra weird touchstone for me because we had it on our school computers in the late 90s. the program was called DOCTOR and basically behaved identically to the original, e.g. find a noun and use it in a sentence. as a 9-year-old i found it to be ass, but i’ve only recently learned that some people anthropomorphise everything and can lose themselves totally in “tell me about boats” even if they rationally know what the program is actually doing.

          as a 30-something with some understanding of natural language processing, i find eliza quite nifty.
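
          for anyone curious, the whole DOCTOR trick fits in a few lines. a minimal sketch (my own toy rules, far simpler than Weizenbaum’s actual keyword/decomposition script): match a pattern, reflect a fragment of the user’s own words back, otherwise fall back to a canned prompt.

          ```python
          import random
          import re

          # Toy ELIZA/DOCTOR-style responder: pattern match, reflect the user's
          # own words back, otherwise emit a canned prompt. No understanding anywhere.
          RULES = [
              (r"\bI (?:went for|had|took) (?:a |an )?(.+)", "Tell me about {0}."),
              (r"\bI feel (.+)", "Why do you feel {0}?"),
              (r"\bmy (\w+)", "Tell me more about your {0}."),
          ]
          FALLBACKS = ["Please go on.", "I see.", "What does that suggest to you?"]

          def respond(user_input):
              for pattern, template in RULES:
                  match = re.search(pattern, user_input, re.IGNORECASE)
                  if match:
                      return template.format(*match.groups())
              return random.choice(FALLBACKS)

          print(respond("I went for a long boat ride"))  # Tell me about long boat ride.
          print(respond("my mother called yesterday"))   # Tell me more about your mother.
          ```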