cross-posted from: https://lemmy.today/post/35487250
I’m looking at self-hosting SearXNG. I have an old Win 11 machine and figure this might be the only way it can be useful.
Two questions I haven’t seen answered so far:
I would be hosting on my own home network, which is on a VPN 24/7, but for added privacy my devices are sometimes on VPN connections to other IPs. So I need to know the external IP of the instance to be able to find it. Are there any added measures I should put in place to prevent randoms looking at IPs or port scanning from finding the instance and going to town?
If this is on my home network anyway, are there any risks of data leaking or triangulation of, say, referrals or image searches that would just point back to my home network?
My threat model is for big tech to leave me alone, so it’s not exactly huge stakes, but I also don’t want to bother self-hosting if added complexity makes it not worth it.
(Not an expert.) Hosting your own instance will make you more identifiable to big tech than if you used a public instance, but it would still increase your privacy compared to giving everything to them, and it also keeps you from giving a public instance your data. I currently use “priv.au” but do plan on hosting my own in the near future. Some people who host their own instance even intentionally open it up to the public to crowdsource more data points so that their traffic blends in better (not saying I recommend that though).
TL;DR: it should still be worth it.
In regards to connecting, you should still be able to hop from other VPNs to your home network; just keep in mind that you will get higher latency jumping from their VPN network back to yours. I don’t recommend opening it up publicly just to do that, unless you plan on going all in and putting something in front of it like “fail2ban” and “Anubis”. Another option is looking into “Tailscale”, and if you don’t trust their central server you can self-host it with “Headscale” or use a different but adjacent product, “Pangolin”. These products basically let you create your own VPN that spans multiple networks.
Thanks, this is helpful. It sounds like maybe cycling a few known public instances makes more sense for me personally. The inherent MITM aspect always kind of creeped me out, but the results are pretty good, so I always come back to it.
My only thought on a way to easily have it open internet-facing and still not get overwhelmed would be to put it all behind a bare bones login page with super long credentials and rate limiting and I just save the credentials in a password manager. But if it’s just going to bring Big G looking back at me, I’d rather not bother since that’s the thing I’m trying to avoid.
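For what the rate limiting part of that idea could look like, here is a minimal sketch of a sliding-window limiter for login attempts, keyed by client IP. The class name, the limits, and the injectable clock are all made up for illustration; in a real setup this would more likely be handled by the reverse proxy or a tool like fail2ban than by hand-written code.

```python
import time


class LoginRateLimiter:
    """Allow at most `max_attempts` login attempts per client within `window` seconds.

    Hypothetical helper for illustration; not part of SearXNG.
    """

    def __init__(self, max_attempts=5, window=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.attempts = {}      # client id (e.g. IP address) -> recent attempt timestamps

    def allow(self, client_id):
        """Return True and record the attempt if the client is under the limit."""
        now = self.clock()
        # Keep only the attempts that still fall inside the sliding window.
        recent = [t for t in self.attempts.get(client_id, []) if now - t < self.window]
        if len(recent) >= self.max_attempts:
            self.attempts[client_id] = recent
            return False
        recent.append(now)
        self.attempts[client_id] = recent
        return True


if __name__ == "__main__":
    limiter = LoginRateLimiter(max_attempts=3, window=60.0)
    for attempt in range(5):
        print(attempt, limiter.allow("203.0.113.7"))
```

The login handler would call allow() with the client's address before checking the password and reject the request outright when it returns False. Note that this only limits login attempts; it does nothing about port scans hitting the machine itself.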
Thanks again - this is a huge help.
No problem!
I completely know what you mean; it took a lot of research before I felt comfortable enough to trust and use a public instance.
So that solution would still decrease their ability to fingerprint you by a lot, but really the big problem would be all the people/scripts randomly hammering your IP. They wouldn’t get past your password, but it being public and discoverable would mean you’d constantly be getting hit with a bunch of automation scanning your ports. The security risk isn’t the concern; it’s more the heavy traffic from them slowing down your connection. It sounds like you’d be fine from a security standpoint, but you’d have to put up something to block the traffic.
You could always self-host, use that instance for more personal searches when you’re at home or connected to home through VPN, and then use public instances for more general or vague searches when you’re connected to other VPNs. Mixing and matching like that will at least add some noise and make you less identifiable. Kind of best of both worlds.
As a semi-simple compromise, it would be cool if there was some way to have the cycling between different Searx instances be done automatically, e.g. either as a browser feature/browser extension, or as some private self-hosted interface to which I send my requests and which then selects the server at random from some subset of the list on searx.space. Or, while a bit hacky, the easiest way could be to do this on the DNS level. It should be doable with just one or two existing tools, even standard ones.