

If you allow my SearXNG search scraper, then an AI scraper is indistinguishable from it.
If you mean “Google and DuckDuckGo are whitelisted”, then Lemmy will only be searchable there, on those specific whitelisted hosts. And Google's search indexer is also an AI scraper bot.
If rendering data for scrapers were really the problem, then the solution is simple: just offer downloadable dumps of the publicly available information. That would be extremely efficient and cost fractions of pennies in monthly bandwidth. Plus, the data would be far more usable for whatever they are using it for.
The problem is trying to have freely available data while the host keeps the ability to leverage that data later.
I don’t think we can have both of these.