We're building a search engine to compete with DuckDuckGo. No JS, no WASM, no spying. Just a statically generated results page.

UnHidden@lemmy.world · 2 years ago

Onno (VK6FLAB)@lemmy.radio · 2 years ago

I love the notion. The marketing “better than DDG” is a little janky. Perhaps consider a positive statement, like “finally find what you’re looking for”.

This is a crowded landscape. I’ve been here since Gopher and seen plenty of services come and go. With that in mind, here are some questions you might want to consider:

How does it compare with products like SearXNG, specifically their ecosystem of plug-in search types?

How do you plan to pay for it?

How do you expect to protect the index against spam?

How will you scale it to a global audience?

How will you handle language?

Good luck!

UnHidden@lemmy.world · 2 years ago

To answer your questions in order:

We have our own index; it’s not a shitshow of mixed results like Searx tends to be. This also means that we’re not chasing breaking changes from some larger engine when they decide they don’t want us, like Twitter did to Nitter and Bing did to Searx.
We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.
We’ve already got spam prevention and removal measures in place, but I won’t discuss them.
We don’t know how to scale it, since it’s centralized by design and the frontend and backend are tightly integrated, mostly because the frontend is generated on the fly by the backend. Maybe host a copy for each region we’re aiming to acquire users from?
Our engine already understands 5 languages, and we hope to expand to CJK languages soon.

hydroptic@sopuli.xyz · edit-2 2 years ago

We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.

A subscription-based model might be the only viable one, since ads will inevitably lead to a conflict of interest and voluntary donations are mostly a no-go. The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

Personally I pay for Kagi which has been decent enough

Amerikan Pharaoh@lemmygrad.ml · edit-2 2 years ago

I mean, a search engine is literally the last thing on the internet I’d pay a subscription for. In a world where literally everything else nickels-and-dimes us for subscription service, search engines, torrent trackers, game modders who paywall their mods, and other kitschy non-essentials are literally the first things to get shuffled off the monthly budget.

If we weren’t in such a deep recession that I pay as much a week for my gas as I do my groceries, with rent and ACTUAL bills eating the majority of what’s left, I’d feel a bit differently; but if wishes were horses, we’d all ride. I literally had to start growing my own green rather than buying it, the economy’s so shit.

KoboldCoterie@pawb.social · 2 years ago

The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

A huge part of that is that most people don’t consider privacy concerns to be a cost. All they factor into their evaluation is whether it costs them actual money.

Orbituary@lemmy.world · edit-2 2 years ago

Personally I pay for Kagi which has been decent enough

What’s “decent enough” mean? I’ve been curious and you’re the only person I’ve known who pays for it.

pacmondo@sh.itjust.works · 2 years ago

I pay for it. The results are quality, and the fact that my brain doesn’t have to sift through ad results and can just look at the real data is so nice. Additionally, they have a large number of “lenses” which can change the scope of your search. For example, they have a lens for searching Lemmy as well as lenses for the “small web”, which filters out all the results from massive corporate websites and gives way more personal project sites and the like.

All in all I’m a fan.

nick@midwest.social · 2 years ago

Seconded this. Been paying for a long time now, no regrets.

sudneo@lemmy.world · 2 years ago

I personally really like the gazillion bangs available and the personal up/downranking/blocking of websites, and their quick answer is often fairly good (I mostly use it for documentation lookup). The lenses are definitely the best feature though, especially coupled with bangs. I even converted my wife, who really loves it.

ParetoOptimalDev@lemmy.today · 2 years ago

I never thought I’d pay for Kagi, and I thought paying for a search engine was ridiculous. Then I kept seeing loudly positive feedback from reputable people in my circle and tried the trial.

I pay for it and never have the “I only ever use !g on duckduckgo” problem.

Sorting by web pages with least ad trackers is a cheat code to find old style websites with people sharing knowledge for knowledge’s sake rather than profit.
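
To make the idea concrete, here’s a rough sketch of what “fewest trackers first” could look like. It is only an illustration: the tracker domain list, the `Result` fields and the function names are all made up for this example, and it is not a description of how Kagi actually ranks anything.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would come from a maintained tracker list.
TRACKER_DOMAINS = {"doubleclick.net", "google-analytics.com", "facebook.net"}


@dataclass
class Result:
    url: str
    relevance: float  # higher is better
    script_urls: list[str] = field(default_factory=list)  # scripts the page loads


def tracker_count(result):
    """Count scripts served from a known tracker domain (or a subdomain of one)."""
    count = 0
    for src in result.script_urls:
        host = urlparse(src).hostname or ""
        if any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS):
            count += 1
    return count


def rank_by_fewest_trackers(results):
    """Fewest trackers first; ties are broken by relevance, best first."""
    return sorted(results, key=lambda r: (tracker_count(r), -r.relevance))
```

With this ordering, a hobby page that loads no third-party scripts outranks a more “relevant” page that loads several trackers.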

DominusOfMegadeus@sh.itjust.works · 2 years ago

Just my two cents, but I keep trying it out and I have not seen anything good enough to warrant paying for it. And I am not against paying for privacy, I pay Proton.

WetBeardHairs@lemmy.ml · 2 years ago

You could let people host their own as a method of scaling. But that limits it to geeks like us.

Use Kubernetes and let it scale and pay for hosting on CDNs.

ProdigalFrog@slrpnk.net · 2 years ago

Ahh, you’re the guys who posted over on Reddit, before your thread got locked, who think it’s a good idea to promote Russian propaganda equally with Ukrainian content, because you don’t want to ‘take sides’ politically. Closed source too, so that’s pretty much a dealbreaker right there, especially for privacy-focused users. We’ve been abused by closed source software for far too long to trust anything less.

You also have absolutely no plan on how to monetize, as others have said in this thread already.

I certainly won’t be supporting you, not with those values.

PrincessLeiasCat@sh.itjust.works · 2 years ago

Thank you for taking the time to point this out.

ExtremeDullard@lemmy.sdf.org · edit-2 2 years ago

I applaud your efforts and I admire your idealism.

Unfortunately, the minute you get the bill from your internet provider, you’ll need to find a way to pay for it, and your good intentions will instantly dissolve in the murky realities of modern corporate surveillance capitalism.

But at least while you haven’t gotten your first bill, it’s refreshing to watch your enthusiasm.

sugar_in_your_tea@sh.itjust.works · 2 years ago

pay for it

I wonder what a distributed search engine would look like. Basically, the index would be sharded across user computers, and queries would hit some representative sample of that index. This means:

hosting costs are very low - just need a way to proxy requests to the network
search times should improve as more people use the service
no risk of the service logging anything - individual nodes don’t need to know who requested the data, just who to send the response to
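
A toy version of that idea, just to show the shape of it. Everything here is an assumption made for illustration: the `Peer` class, the round-robin placement of documents and the `sampled_search` helper are invented, and a real network would need replication, trust and ranking on top.

```python
import random
from collections import defaultdict


class Peer:
    """One user machine holding a small slice of the index (hypothetical)."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of URLs containing it

    def add(self, url, text):
        for term in text.lower().split():
            self.index[term].add(url)

    def search(self, query):
        """Return URLs on this peer that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        hits = [self.index.get(t, set()) for t in terms]
        return set.intersection(*hits)


def build_network(docs, n_peers):
    """Spread documents over peers round-robin (a stand-in for real sharding)."""
    peers = [Peer() for _ in range(n_peers)]
    for i, (url, text) in enumerate(sorted(docs.items())):
        peers[i % n_peers].add(url, text)
    return peers


def sampled_search(peers, query, sample_size, rng):
    """Ask only a random sample of peers and merge whatever they return."""
    chosen = rng.sample(peers, min(sample_size, len(peers)))
    results = set()
    for peer in chosen:
        results |= peer.search(query)
    return results
```

The trade-off shows up right away: a smaller sample is cheaper and faster, but it can miss documents that only live on peers that were not asked.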

My biggest concern is how to build the index, but if OP is willing to share that, I might start hacking on a distributed version.

grue@lemmy.world · 2 years ago

Don’t start new; contribute to what already exists: https://en.wikipedia.org/wiki/YaCy

sugar_in_your_tea@sh.itjust.works · 2 years ago

Awesome! That’s pretty much exactly what I’m looking for, though I’m interested to see how easy it is to limit certain peers to certain functions. Not everyone has resources to crawl and index pages, but a lot of people can store the index.

I’m interested in having client-side web storage, so you can participate in the network by just having the search page open (opt-in of course).

I’m honestly not actively working on it, but if OP provides the database and/or crawler, I’ll do some research on feasibility.

Waraugh@lemmy.dbzer0.com · 2 years ago

This is really neat and I’m just hearing about it after over twenty years of development. I need to try it out, thank you. How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

grue@lemmy.world · 2 years ago

How do you stay in the know about this kind of stuff?

By being terminally online, I guess?

More concretely, I’ve spent (probably too much) time on Slashdot, Reddit and now Lemmy over the years (subscribed to Free Software and privacy-related communities in particular). Also, looking through sites like https://awesome-selfhosted.net/ and https://www.privacytools.io/, wiki-walking through articles about Free Software projects on Wikipedia, browsing the Debian repositories, etc.

I’m sure there are plenty of things I haven’t heard of either, though.

ElectroVagrant@lemmy.world · edit-2 2 years ago

How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

I was going to mention YaCy as well if nobody else was, so I can chip in to this somewhat. My method is to keep wondering and researching. In this case it was a matter of being interested in alternative search engines and different applications of peer to peer/decentralized technologies that led me to finding this.

So from this you might go: take something you’re even passingly interested in, try to find more information about it, and follow whatever tangential trails it leads to. With rare exceptions, there are good chances someone out there on the internet will also have had some interest in whatever it is, asked about it, and written about it.

Also be willing to make throwaway accounts to get into the walled gardens for whatever info might be buried away there and, if you think others may be interested, share it outside of those spaces.

octopus_ink@lemmy.ml · 2 years ago

I wonder what a distributed search engine would look like.

Isn’t that what Searx is/can be?

https://en.wikipedia.org/wiki/Searx#Instances

I admit it’s not something I’ve looked closely at.

grue@lemmy.world · 2 years ago

No, Searx is a metasearch engine that queries and aggregates results from multiple normal search engines (Google, Bing, etc.)
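
To illustrate the “aggregates results” part, here is a minimal sketch that merges ranked lists from several engines using reciprocal rank fusion. That technique is swapped in purely as an example; it is not a claim about how Searx scores results, and `merge_results` and the constant `k=60` are just the illustrative choices made here.

```python
from collections import defaultdict


def merge_results(ranked_lists, k=60):
    """Merge several ranked URL lists into one.

    Each URL scores 1 / (k + rank) per list it appears in, so URLs that
    several engines rank highly float to the top. Ties are broken by URL.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Note that nothing here crawls or indexes anything; a metasearch engine only reorders what the upstream engines hand back.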

A distributed search engine would be more like YaCy, which does its own crawling and stores the index as a distributed hash table shared across all instances.
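
For the distributed hash table part, the core trick is that every peer can work out which peer is responsible for a given search term without asking a central server. Below is a minimal consistent-hashing sketch of that idea. It is an assumption-laden toy (the `TermRing` class and the peer names are invented) and not YaCy’s actual scheme.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on a 64-bit ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TermRing:
    """Assign each search term to the first peer clockwise from its hash."""

    def __init__(self, peer_ids):
        self._ring = sorted((ring_position(p), p) for p in peer_ids)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, term):
        i = bisect.bisect_right(self._positions, ring_position(term))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

A useful property of this layout is that when a peer leaves, only the terms it owned move to another peer; everything else stays where it was.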

octopus_ink@lemmy.ml · 2 years ago

Ah thanks - appreciate the clarification.

sugar_in_your_tea@sh.itjust.works · 2 years ago

Exactly. The main difference I would bring is a web client that hooks into the network, and perhaps an alternative client (e.g. I’m interested in Tauri, so I may rewrite part of the BE to Rust).

But I’m probably not going to start on this project on my own. DDG is good enough for now, so I’m putting my efforts elsewhere.

sqw@lemmy.sdf.org · 2 years ago

I feel that decentralized search is an extremely valuable thing to start thinking about, but the devil is in practically every one of the details.

sugar_in_your_tea@sh.itjust.works · 2 years ago

Yup. Even if you trust all your peers (which isn’t reasonable), there’s still a ton of practical issues that need to be resolved:

pagination with a different set of peers
moderation of CSAM and whatnot
outdated peers and stale data
how much data gets sent around, and where results are reduced

It’s a really complex problem without getting p2p involved, and p2p just adds a ton of other problems.

So I’m probably going to stick with building my Reddit clone, which I think is simpler (search doesn’t need to happen at the start).

pixelscript@lemmy.ml · 2 years ago

My thoughts exactly when reading this.

I believe people when they claim to develop free software. Often because it’s software the dev wants for themselves anyway and they’ve merely elected to share it rather than sell it. The only major cost is time to develop, which is “paid” for by the creation of the product itself.

You (OP) are proposing a service. Services have ongoing fees to run and maintain, and the value they create goes to your users, not you. These are by definition cost centers. You will need a stable source of funding to run this. That does not in any way mix with “free”. Not unless you’re some gajillionaire who pivoted to philanthropy after a life of robber baroning, or you’re relying on a fickle stream of donations and grants.

You indicate in other comments you will not open the source of your backend because you don’t want someone scooping it and stealing your future revenue. That’s fine, but what revenue? I thought this was free? What’s your business model?

It sounds like what you want to do here is have a free tier anyone can use, supported by a paid tier that offers extended features. That’s fine, I guess. But if you want to “compete with DuckDuckGo”, you are going to need to generate enough revenue to support the volume of freeloaders that DDG does. If your paid tier base doesn’t cover the bill, you will need to start finding new and exciting ways to passively monetize those non-revenue-generating users. That usually means one or more of taking features away and putting them behind the paywall to drive more subscriptions, increasingly invasive ads on the platform, or data-harvesting dark patterns.

Essentially what I’m saying here is, as-proposed, the eventual failure and/or enshittification of your service seems inevitable. Which makes it no better than DDG long term.

It is, at any rate, a very intriguing project.

UnHidden@lemmy.world · 2 years ago

For now we’re going to host on residential connections, and if any ISPs ban us, we’ll just find other ISPs.

fishos@lemmy.world · 2 years ago

Yeah, when you say stuff like this, it shows how woefully unprepared you are for the realities of this. You can’t scale, can’t self host for long, don’t see a way to pay for this… When I can already pay Kagi for a fully working, excellent service, why would I choose you? This is guaranteed to crash and burn the moment your ISP tells you you can’t run a commercial grade server through your residential connection. They’ll either cap your bandwidth to unusable levels or disconnect you entirely. If you’re lucky you’ll have 1 or 2 other options to choose from, who will blacklist you shortly after. Then, after you’ve burnt through all the “easy” ways to host, all you’ll be left with is professional grade services that you admit you can’t afford.

Also, you make zero mention of user privacy. So what happens when you get your first subpoena? Or before that, why should I trust you with my data in general? What policies do you have in place to ensure my legal rights are protected? Do you even know what the legal rights are per state/country and how the location of where someone connects from impacts you? How are you gonna handle visitors from the EU with GDPR?

Nifty idea, but way too much “I’m gonna single handedly reinvent the wheel” vibes.

octopus_ink@lemmy.ml · edit-2 2 years ago

Would make Richard Stallman smile :)

If this is a closed source project, that statement doesn’t work even as a joke.

However, the screenshots looked good. :)

Possibly linux@lemmy.zip · 2 years ago

Richard Stallman cares more about what is running on your computer than he does about what is running on a server.

Fair point though

UnHidden@lemmy.world · 2 years ago

That comment is there specifically to drive engagement up with all of the people correcting me in the comments.

rar@discuss.online · 2 years ago

Ah, the 4chan method of engagement, right?

Railcar8095@lemm.ee · 2 years ago

Lying is a great way to get engagement in the post, and then see your project crash and burn.

I’m only interested in your rant in a few weeks when nobody cares.

octopus_ink@lemmy.ml · 2 years ago

Fair!

mypasswordis1234@lemmy.world · 2 years ago

I got so excited reading this post, but as I read that the project will not be open source, my excitement immediately faded away

wischi@programming.dev · 2 years ago

They won’t open source it because the Rust code is very likely a joke. They are proud of just using two dependencies, don’t know that their “statically generated” stuff is actually called server side rendering and are hosting this stuff on a fuckin laptop.

It’s probably a project that will teach them a lot. But in practice their implementation is worthless to everybody else because they are obviously completely inexperienced.

That said, that project is likely not worthless to them because they will probably learn a ton about why it’s hard to build a search engine.

Lunya \ she/it@iusearchlinux.fyi · 2 years ago

Would make Richard Stallman smile :)

source (code)?

Lemongrab@lemmy.one · 2 years ago

Closed source

octopus_ink@lemmy.ml · edit-2 2 years ago

Closed source

Yeah, not sure how they can include that line about Stallman with a straight face. That’s almost libel.

Robert7301201@slrpnk.net · edit-2 2 years ago

https://lemmy.world/comment/8535938

They just said that to “drive engagement”.

Lemongrab@lemmy.one · 2 years ago

Agreed

Lunya \ she/it@iusearchlinux.fyi · 2 years ago

Schwim Dandy@reddthat.com · 2 years ago

Please post when you’re ready for beta waters. I’m looking forward to seeing it.

Lemongrab@lemmy.one · 2 years ago

Closed source and privacy most of the time don’t mix. Or more so, the privacy crowd and closed source don’t mix. You won’t see much support for your project if it remains like that. Maybe a source-available but still closed license would be better. Think about your monetization strategy a bit as well. Consider having premium features and make it a freemium product.

Pantherina@feddit.de · 2 years ago

Wow this is great!

If you are using your own index, I think you could use a more economical approach to fight the spam bullshit of the modern web.

instead of using badness enumeration, crawling everything and filtering malware, use an opt-in principle
have a community method of gathering new trusted websites
use websites internal search functions to get more results
use categories to split up the websites, reinventing what people should find: general, news, navigation, science, politics, IT, technology (not code), art, music, philosophy, …
have an app or submission website where users can submit new websites, and some form of community control over it (kinda censorship but in a good way)
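
The opt-in idea above could be sketched roughly like this: instead of crawling everything and filtering out the bad, only crawl domains the community has vouched for, and tag each one with a category. All names, the vote threshold and the data layout are made up for illustration.

```python
from urllib.parse import urlparse


def approve_submissions(submissions, min_votes=3):
    """Turn community submissions into allowlist entries once enough users vouch.

    Each submission is a dict with "domain", "category" and "votes" keys
    (a hypothetical format for this sketch).
    """
    return {
        s["domain"]: s["category"]
        for s in submissions
        if s["votes"] >= min_votes
    }


def category_for(url, allowlist):
    """Return the category of an allowlisted URL, or None if it must not be crawled.

    Subdomains inherit the entry of their parent domain.
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    for i in range(len(parts) - 1):  # never match on the bare TLD
        candidate = ".".join(parts[i:])
        if candidate in allowlist:
            return allowlist[candidate]
    return None
```

A crawler built this way would skip any URL for which `category_for` returns `None`, which is the opposite of enumerating badness after the fact.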

This could fix the web as it currently is, by rethinking what should be found, pushed etc. Rating websites by quality could also be helpful.

Also, if you support payments in crypto or cash, there should be no problem making it paid.

Fat Tony@discuss.online · 2 years ago

Looking at the picture, why are you guys using Brave?

Sotuanduso@lemm.ee · 2 years ago

I don’t know DuckDuckGo, but what’s the purpose of trying to compete with it? This is not a rhetorical question. Is there something wrong with DuckDuckGo, something you feel you can do better, or are you just making a competitor for the principle?

space@lemmy.dbzer0.com · 2 years ago

Not OP, but there is value in having competition. DDG is just a Bing front-end. The big search engines have a major problem with the quality of results going down, as the internet is SEOd to death. The companies behind these engines don’t seem to be very eager to fix it; they are just hoping to replace them with AI. We’ve also seen how these engines have been turned into ad platforms, which changes the incentives… Instead of ranking quality, they are ranking who pays more.

Taking a different approach to ranking results that isn’t ad driven, and that can punish AI generated content and low-quality results, would bring huge value.

ShortN0te@lemmy.ml · 2 years ago

DDG is just a Bing front-end.

That is wrong. Yes, they are licensing the Bing search database, but it is not the only one they use. They have their own crawler too.

source

wischi@programming.dev · 2 years ago

“Only two crates used”. What’s great about reinventing the wheel? A closed source project with big claims trying to reinvent everything from scratch. Nice project 🤣

Mubelotix@jlai.lu · 2 years ago

Every dependency is a security hole

CameronDev@programming.dev · 2 years ago

Pages are statically generated

Can you elaborate on that? To me, statically generated would mean you are pre-rendering an HTML page for every possible search, which doesn’t sound possible? Do you mean that it’s all server side generated (at the time of search)?

blujan@sopuli.xyz · 2 years ago

I think he means pages are presented as static HTML+CSS pages, generated dynamically on the back end.
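
Roughly, that would look like the sketch below: the backend builds a plain HTML string for each request and the browser never has to run any script. This is Python purely for illustration (the project itself is reportedly Rust with Rocket), and `render_results_page` is a made-up name, not anything from their code.

```python
from html import escape


def render_results_page(query, results):
    """Build a complete results page on the server.

    `results` is a list of (title, url) pairs. All user-controlled text is
    escaped, so a query containing markup cannot inject a script tag.
    """
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in results
    )
    return (
        "<!doctype html>\n"
        f"<title>{escape(query)} - results</title>\n"
        f"<ul>\n{items}\n</ul>\n"
    )
```

The output is static in the sense that it is just HTML with no client-side code, but it is still generated per search, which is what is usually called server-side rendering.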

SorteKanin@feddit.dk · 2 years ago

Only two crates used - TOML and Rocket (plus Rust’s standard library)

This seems like a bit of a weird approach. There are lots of nice existing Rust crates to build with, so why use such a minimal approach?

Also Rocket has essentially been superseded by more mature frameworks like Axum.

UnHidden@lemmy.world · 2 years ago

It’s about reducing attack surface and risk by minimizing dependencies.

wischi@programming.dev · 2 years ago

Reducing the attack surface by not using well established and battle tested crates but reinventing the wheel inside this closed source project 🤣

SorteKanin@feddit.dk · edit-2 2 years ago

Well that’s a bit of a double-edged sword. Libraries also include lots of built-in failsafes that you’ll then need to implement yourself. And you’ll need to be confident that you don’t introduce security issues in your own code instead of relying on widely used libraries. But it makes sense if you’re worried about supply chain attacks.

tanja@lemmy.blahaj.zone · edit-2 2 years ago

That’s a neat project.
You can be proud of your work 😊

But I for one won’t donate to your cause, as the software seems to be closed-source, and I already have DuckDuckGo & Google for my searching needs.

I genuinely believe that the only viable niches for new search engines are environmentally-friendly (e.g. Ecosia) or open-source.

Literally no one will pay for a closed-source search engine.

But I like your tech stack, and your project’s looking good.

One more thing: You claim to be against censorship; how will you combat spam & SEO farming?