Looking for resources to better understand LLMs

katsura@leminal.space · 2 months ago

thank you for your response, the cauliflower anecdote was enlightening. your description of it being a statistical prediction model is essentially my existing conception of LLMs, but this was only really from gleaning others’ conceptions online, and I’ve recently been concerned that it was an incomplete simplification of the process. I will definitely read up on Markov chains to try and solidify my understanding of LLM 'prediction'.
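For anyone else reading along, here is a minimal sketch of the Markov chain idea mentioned above: a word-level model that only records which word followed which in some text, then samples from those records. The function names `build_bigram_model` and `generate` are made up for illustration, and this is a toy. Real LLMs condition on far more context than one previous word and use a learned neural network rather than a lookup table.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=None):
    """Walk the chain: repeatedly pick a random recorded follower."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = followers.get(word)
        if not candidates:
            # Dead end: this word was never followed by anything.
            break
        word = rng.choice(candidates)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    model = build_bigram_model("the cat sat on the mat and the cat slept")
    print(generate(model, "the", 8, seed=1))
```

Every pair of adjacent words in the output occurred somewhere in the input, but the sentence as a whole need not have, which is the "statistically plausible but not necessarily correct" behaviour being discussed.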

I have kind of a follow-up if you have the time. I hear a lot that LLMs are “running out of data” to train on. When it comes to creating a bicycle schematic, it doesn’t seem like additional data would make an LLM more effective at a task like this, since it’s already producing a broken amalgamation. It seems like these shortcomings of LLMs’ generalizations generally would not be alleviated by more training data. So what exactly is being optimized by massive increases (at this point) in training data, or, conversely, what is threatened by a limited pot?

I ask this because lots of people who preach that LLMs are doomed/useless seem to focus in on this idea that their training is limited. To me their generalization seems like evidence enough that we are nowhere near the tech-bro dreams of AGI.

xxce2AAb@feddit.dk · 2 months ago

No, you’re quite correct: Additional training data might increase the potential for novel responses and thus enhance the perception of apparent creativity, but that’s just another way to say “decrease correctness”. To stick with the example, if you wanted to have an LLM yield a better bicycle, you should if anything be partitioning the training data and curating it. Garbage in, garbage out. Mess in, mess out.

Put it another way: Novelty implies surprise, surprise implies randomness. Correctness implies consistently yielding the solitary correct answer. The two are inherently mutually opposed.

If you’re interested in how all this nonsense got started, I highly recommend going back and reading Weizenbaum’s original 1966 paper on ELIZA. Even back then, he knew better:

If, for example, one were to tell a psychiatrist “I went for a long boat ride” and he responded “Tell me about boats”, one would not assume that he knew nothing about boats, but that he had some purpose in so directing the subsequent conversation. It is important to note that this assumption is one made by the speaker. Whether it is realistic or not is an altogether separate question. In any case, it has a crucial psychological utility in that it serves the speaker to maintain his sense of being heard and understood. The speaker further defends his impression (which even in real life may be illusory) by attributing to his conversational partner all sorts of background knowledge, insights and reasoning ability. But again, these are the speaker’s contribution to the conversation.

Weizenbaum quickly discovered the harmful effects of human interactions with these kinds of models:

“I had not realized … that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.” (1976)

lime!@feddit.nu · edit-2 2 months ago

god, the reactions to eliza are such a harbinger of doom. real cassandra moment. it’s an extra weird touchstone for me because we had it on our school computers in the late 90s. the program was called DOCTOR and basically behaved identically to the original, e.g. find a noun and use it in a sentence. as a 9-year-old i found it to be ass, but i’ve only recently learned that some people anthropomorphise everything and can lose themselves totally in “tell me about boats” even if they rationally know what the program is actually doing.

as a 30-something with some understanding of natural language processing, eliza is quite nifty.
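to make the “find a noun and use it in a sentence” trick concrete, here is a rough sketch of that behaviour. it is not Weizenbaum’s actual script (the original matched ranked keywords against decomposition and reassembly rules); the `TOPICS` table and the `respond` function are invented for illustration, and a hand-written word list stands in for any real noun detection.

```python
import re

# Hypothetical lookup table: a word to spot -> the topic to echo back.
TOPICS = {
    "boat": "boats",
    "boats": "boats",
    "mother": "your mother",
    "dream": "your dreams",
    "dreams": "your dreams",
}


def respond(utterance):
    """Echo the first recognised topic, or fall back to a stock prompt."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for word in words:
        if word in TOPICS:
            return f"Tell me about {TOPICS[word]}."
    # Nothing recognised: keep the speaker talking.
    return "Please go on."


if __name__ == "__main__":
    print(respond("I went for a long boat ride"))
    print(respond("Nothing much happened today"))
```

the program holds no knowledge about boats at all; everything that makes the reply feel meaningful is supplied by the person typing, which is exactly the point of the Weizenbaum quote above.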