LLMs are Large Language Models and generate text, not images.
(ok, LLMs can’t count either but still)
I was lazy the other day and I asked Gemini to set an alarm for me, then asked how long it was until that alarm. Not even fucking close to the right amount of time. I figured it would be smart enough to just subtract for me…
Should have said 8 hours and 2 mins… Nope, just made up a time
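For the record, the subtraction the assistant fumbled is trivial clock arithmetic. A minimal sketch (the clock times here are made up for illustration, picked so the answer comes out to 8 hours 2 minutes):

```python
from datetime import datetime, timedelta

def time_until_alarm(now: datetime, alarm: datetime) -> timedelta:
    """Plain subtraction -- if the alarm time has already passed today,
    assume it fires tomorrow."""
    if alarm <= now:
        alarm += timedelta(days=1)
    return alarm - now

# hypothetical times, chosen only so the gap is 8h02m
now = datetime(2024, 1, 1, 22, 58)
alarm = datetime(2024, 1, 2, 7, 0)
print(time_until_alarm(now, alarm))  # 8:02:00
```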
Right but as I said in the other thread as well, what do you think is handling the text part of text-to-image creation tools?
Image generators are reverse llms, tbf. Steve Mould has a good explanation of it.
well, ish. llms have a vector space of words, image generators of features. they use a second model to associate words with features. Steve’s explanation is a great intro but for a deep dive i recommend Self-Cannibalizing AI from 37C3.
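To make the "associate words with features" idea concrete, here's a toy sketch: two hand-made vector spaces and a nearest-neighbour match by cosine similarity. Everything here (the vectors, the feature names) is invented for illustration; real systems like CLIP learn the two encoders jointly rather than using hard-coded vectors.

```python
import math

# toy "word space" and "feature space" -- hand-made, purely illustrative
word_vecs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
feature_vecs = {
    "whiskers_pointy_ears": [0.9, 0.1],
    "floppy_ears_tail_wag": [0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_features(word):
    """Stand-in for the 'second model': pick the image feature whose
    vector lies closest to the word's vector."""
    return max(feature_vecs, key=lambda f: cosine(word_vecs[word], feature_vecs[f]))

print(match_features("cat"))  # whiskers_pointy_ears
```

The real trick is that the two spaces are trained so this kind of nearest-neighbour lookup is meaningful; the toy above just assumes that alignment already exists.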