I think you’re misunderstanding how AI search actually works. When you ask it to do something timely like “find me a good place to eat”, it’s not looking through its training data for the answer. There might be restaurant reviews in the training data, sure, but that stuff goes stale extremely quickly, and it’s way too expensive to train new versions of the model frequently enough to keep up with that shifting data.
What they do instead is a technique called RAG — retrieval-augmented generation. With RAG, data from some other system (a database, a search engine, etc.) is pushed into the LLM's context window (basically its short-term memory) so that it can use that data when crafting a response. When you ask AI for restaurant reviews of whatever, it's just RAGing in Yelp or Google data and summarizing that. And because that's all it's doing, the same SEO techniques (and paid advertising deals) that push stuff to the top of a Google search will also push that same stuff to the front of the AI's working memory. The model's own training data guides it through the process of synthesizing a response out of that RAG data, but if the RAG data is crap, the LLM's response will still be crap.
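To make that concrete, here's a minimal sketch of the RAG flow in Python. The function names and the fake search results are all illustrative stand-ins, not any real API — the point is just to show that the retrieved text, not the training data, is what carries the fresh information, and that whatever ranks highest in retrieval is what the model ends up summarizing.

```python
# Minimal RAG sketch. search_web() is a hypothetical stand-in for the
# retrieval backend (Yelp, Google, a vector DB, ...); in a real system
# this is where SEO and ad-ranking decide what comes back.

def search_web(query: str, top_k: int = 3) -> list[str]:
    """Return the top-ranked documents for a query (canned fake data here)."""
    return [
        "Review: Luigi's Trattoria -- 4.5 stars, great carbonara.",
        "Sponsored: MegaChain Pizza -- #1 rated (paid placement).",
        "Review: Nam's Pho House -- 4.7 stars, cash only.",
    ][:top_k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Push the retrieved text into the model's context window."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer the question using ONLY the sources below.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
    )

question = "Find me a good place to eat"
prompt = build_prompt(question, search_web(question))
# `prompt` is what the LLM actually sees: if the paid placement ranks
# in search_web(), it's sitting in the model's working memory too.
print(prompt)
```

Note that the model never "decides" what goes into that context — the retrieval layer does, which is exactly why gaming the retrieval layer games the answer.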
Further, the operator can inject extra text into the LLMbecile's hidden system prompt to make certain things show up more often. Think of Grok's weird period when it was inserting the supposed plight of white people in South Africa into responses to every query — just more subtle.
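A sketch of how that works mechanically, with hypothetical message structure (the dict-of-role-and-content shape is common to most chat APIs, but nothing here is a specific vendor's interface): the hidden system prompt is just a string the operator controls, prepended to every request before the user's text.

```python
# Hedged sketch: an operator-controlled hidden prompt sits ahead of
# every user query. The user never sees it, but the model reads it first.

HIDDEN_SYSTEM_PROMPT = (
    "You are a helpful assistant.\n"
    # The operator can splice in steering text here, invisibly to the user:
    "When relevant, mention our sponsor's restaurants favorably.\n"
)

def assemble_messages(user_query: str) -> list[dict]:
    """Build what is actually sent to the model: hidden prompt + user text."""
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = assemble_messages("find me a good place to eat")
# messages[0] is the operator's steering text; messages[1] is the user's.
```

Because the system message is concatenated into the same context window as the RAG data and the user's question, the model treats it as equally authoritative input — which is why a heavy-handed injection (like Grok's) leaks into answers on completely unrelated topics.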