Notice how all the black people are at the back of the boat.
Why are all the white ones only raising their right hand to chest height while everyone else has both hands up?
That’s not fair, one of the white guys also has a phantom black hand popping out of his head. Now don’t you feel foolish?
He had to specify no Nazi salute in his prompt?
Never thought AI would be the one pushing technofascism.
That’s sarcastic, right?
LLM slop factories are overtly racist because they’re trained on shit lifted straight off the internet.
That’s image generation, not LLM (language/text generation), but the point stands
Hate to bring it to you, but today’s image generation comes through LLMs
(Multimodal) GPT ≠ “pure” LLM. GPT-4o uses an LLM for the language parts, as well as having voice processing and generation built-in, but it uses a technically distinct (though well-integrated) model called “GPT Image 1” for generating images.
You can’t really train or treat image generation with the same approach as natural language, given it isn’t natural language. A binary string doesn’t adhere to the same patterns as human speech.
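To make the "distinct but integrated" point concrete: the simplest integration, and the one the next comment asks about, is a pure prompt hand-off, where the LLM's only interface to the image model is a text string it writes itself. A minimal sketch (the function names and stub behavior here are illustrative assumptions, not OpenAI's actual internals):

```python
# Illustrative sketch of the "prompt hand-off" pattern between an LLM
# and a separate image model. Both functions are hypothetical stand-ins;
# a real system would call the actual models.

def llm_rewrite_prompt(user_request: str) -> str:
    """Stand-in for the LLM: expands a terse user request into a
    detailed image prompt. A real system samples this from the model."""
    return f"A photorealistic rendering of {user_request}, studio lighting"

def image_model_generate(prompt: str) -> bytes:
    """Stand-in for the separate image model. Here it just returns
    placeholder bytes tagged with the prompt it received."""
    return f"<image generated from: {prompt}>".encode()

def generate_image(user_request: str) -> bytes:
    # In this pattern the LLM never touches pixels: all coupling
    # between the two models runs through this one text string.
    prompt = llm_rewrite_prompt(user_request)
    return image_model_generate(prompt)

print(generate_image("a rowing crew on a river").decode())
```

Deeper integrations skip the text bottleneck and condition the image model on the LLM's internal embeddings instead, but the text-prompt version above is the loosest possible coupling and shows why the two models can be trained completely separately.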
Just curious, does the LLM generate a text prompt for the image model, or is there a deeper integration at the embedding level/something else?