YSK: Your Lemmy activities (e.g. downvotes) are far from private

Muddybulldog@mylemmy.win · edit-2 2 years ago

Wander@yiffit.net · edit-2 2 years ago

To anyone surprised at this: welcome to the fediverse, please treat everything you do or say as public.

The way to achieve privacy around here is by following the long forgotten arts of the old internet before Facebook was a thing: use a nickname and don't tell strangers on the internet your real identity.

Your home instance will act as a proxy and only they have access to your email and IP address. That does stay private.

So, as long as you trust your home instance to not leak or disclose your connection or sign up data (which would be illegal in EU countries), just sign up with an alias.

A very positive aspect of this is that it should allow us to detect voting manipulation by correlating the activity of certain potentially malicious actors. If Lemmy instances take vote manipulation seriously and do their best to block bots, this has the chance to make Lemmy / Kbin much more transparent and credible than Reddit ever was.
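
A rough sketch of what "correlating the activity" could look like, assuming an admin has already pulled each account's votes into a dict. The account names, the `threshold` and `min_votes` parameters, and the data layout are all made up for illustration; this is not how any Lemmy tool is known to work.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of votes two accounts share out of all votes either one cast."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(votes: dict, threshold: float = 0.9, min_votes: int = 3) -> list:
    """Return (account, account, similarity) for pairs that vote almost identically.

    `votes` maps an account name to a set of (post_id, score) tuples.
    """
    flagged = []
    for u1, u2 in combinations(sorted(votes), 2):
        v1, v2 = votes[u1], votes[u2]
        # Skip accounts with too little activity to say anything useful.
        if min(len(v1), len(v2)) < min_votes:
            continue
        sim = jaccard(v1, v2)
        if sim >= threshold:
            flagged.append((u1, u2, sim))
    return flagged


# Hypothetical data: two accounts on the same instance voting in lockstep.
votes = {
    "alice@a.example": {("p1", 1), ("p2", -1), ("p3", 1)},
    "bot1@x.example": {("p7", 1), ("p8", 1), ("p9", 1), ("p10", -1)},
    "bot2@x.example": {("p7", 1), ("p8", 1), ("p9", 1), ("p10", -1)},
}
print(suspicious_pairs(votes))
```

Identical voting alone is not proof of anything (friends often agree), so a result like this is a reason for an admin to look closer, not a verdict.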

booty_flexx@lemmy.world · edit-2 2 years ago

To illustrate op’s point I’m going to spin up an instance, federate with everyone, and not tell anyone what that instance is.

Then I’m going to feed all that data into my new website, called Open Lemmy Stats, where anyone can query the user data I’ve accumulated. The homepage will be rife with insights, leaderboards and all kinds of data on prolific users.

Additionally, I’ll display a snapshot/profile of a random user by feeding that user’s data to GPT4 to make inferences about the user’s political affiliations and display the results.

Worst of all, I’m not going to out my instance for everyone to know it as the one to defederate. In fact I’m spinning up a few instances that will host innocuous communities that I plan to mod and support to give my instances cover for their true purpose: redundant fediverse datastreams for my site, Open Lemmy Stats.

I’ll also have a store where anyone can buy my collected fediverse data for a handsome sum.

Just kidding I’m not doing any of this. But someone absolutely will or already is.

TimewornTraveler@lemm.ee · edit-2 2 years ago

> Edit: Obligatory RIP my inbox.

Can we leave this kinda stuff behind? It is NOT obligatory.

ScaNtuRd@lemmy.world · edit-2 2 years ago

Not to sound harsh or anything, but those of you saying that it’s okay that all this data is public are insane. This completely goes against the entire philosophy of the Fediverse and FOSS in general. The reason we all are fleeing from Big Tech is because they collect so much data on us. At least, they keep it hidden from public view. This is a major issue in my opinion, and needs to be addressed ASAP before we can claim to have superior platforms on the Fediverse. Why can’t this data at least be encrypted?

deweydecibel@lemmy.world · edit-2 2 years ago

Reading these comments, seeing so many excuses, sarcastic responses, and handwaving, makes me realize a great deal of users really need to develop some imagination.

This is not about privacy. It’s about data that can easily be used for targeting and profiling users, and how that creates countless avenues for targeted harassment and wide scale retaliation. It’s about all of the innumerable ways public vote information can and will be abused to manipulate scoring across the site with targeted/automated shadow banning and shared blocklists. Raise your hand if you trust every single admin to never abuse such a tool to curate the outward appearance of an instance to fit a narrative.

For a different example: I could say something about how great Nazis are right now, and have a bot programmed to read every single person that downvoted me, add those names to a shared blocklist, and voilà, I’ve made myself and all my alts invisible to the people that would challenge me on a massive scale.

I promise you this is going to be a big issue as tools for this site get more sophisticated over time.

zeus ⁧ ⁧ ∽↯∼@lemm.ee · edit-2 2 years ago

alternatively, if votes were private, you could spin up a bot network to mass upvote your comment; making it far more influential as most people are more inclined to believe statements they think others also feel. thankfully, votes are open, so you can’t

as long as there is a system, people will try to game the system; and when there is a new system, people will come up with new games

Darkassassin07@lemmy.ca · 2 years ago

While I agree this shouldn’t be so publicly accessible, I’m curious about the possible benefits of limited sharing between instances to give spam/bot detection tools more power.

Users on A vote on a post on B. The admins from A and B can see the fine details of who did what, but the admins of C (and all of the general users regardless of instance) just see totals of up/down votes.
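
The rule above can be sketched as a small access check. This is only an illustration of the idea: the `Vote` class, the `visible_votes` function, and the example domains are hypothetical and are not part of Lemmy or ActivityPub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str   # full handle, e.g. "alice@a.example"
    score: int   # +1 or -1


def instance_of(handle: str) -> str:
    """Return the domain part of a user@domain handle."""
    return handle.rsplit("@", 1)[1]


def visible_votes(votes, post_instance, viewer_instance, viewer_is_admin):
    """Everyone gets totals; only admins of the involved instances get details."""
    totals = {
        "up": sum(v.score > 0 for v in votes),
        "down": sum(v.score < 0 for v in votes),
    }
    details = []
    if viewer_is_admin:
        for v in votes:
            # The post's host sees every vote; a voter's home instance
            # sees only the votes cast by its own users.
            if viewer_instance in (post_instance, instance_of(v.voter)):
                details.append(v)
    return totals, details


votes = [Vote("alice@a.example", 1), Vote("bob@a.example", -1)]
print(visible_votes(votes, "b.example", "c.example", viewer_is_admin=True))
print(visible_votes(votes, "b.example", "b.example", viewer_is_admin=True))
```

Note that this only describes who is shown what. Instance B still has to receive the per-user votes from A for any of it to work, so the scheme relies on B's software and admins honouring the rule.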

QuadratureSurfer@lemmy.world · 2 years ago

Ideally, detecting bots should be up to the Admins. They should have access to the vote information, and they can share the tools with other admins to detect it. But the average user should not have unrestricted access to this data.

sauerkraus@lemmy.world · 2 years ago

The average user can run their own instance as an admin.

QuadratureSurfer@lemmy.world · 2 years ago

Let me be a little more clear, the Admins of your account’s particular instance should be the only ones that have access to your votes.

Now the question remains about when your account posts/comments into a different instance, who should have access to those votes? Perhaps your instance has a way of obfuscating the votes of any user coming from your instance, or else only the admins of the community that you’re posting into will have access to your votes?

The problem really comes down to how we avoid the problem with duplicating votes. Currently this is easy as each vote is public so every instance can verify the correct vote count. But implementing either of the solutions above will need a way to verify the correct number of votes.

To top it off you would also need a way to detect if a malicious instance had come along and started lying about how many votes had been cast.

One thing we can look at under the hood would be how cryptocurrency works as they have solved both the problem of duplicate values as well as the ability to trust those values being sent. All of the code is free and open source so we can pick out the parts that we need and reuse it. (And no, I’m not telling people to go out and buy crypto).

Zcash would be a particularly good one to look at as it ensures a “zero knowledge” (or “zero trust”) method of sending the values across “nodes” (or in our case “instances”). Using this, who is voting on what would be hidden, but we could ensure that the values are correct.

Additionally you could probably throw out the second hashing algorithm altogether and just keep the Blake2b hashing algorithm as this one is far more efficient and quick to compute (and that second algorithm was mostly thrown in to prevent people with specialized hardware from being able to come in and beat anyone else running on just a GPU/CPU). https://github.com/zcash/zcash

However, using this particular method would make it so that not even the instance admins would be able to view the details of anyone’s votes (which may be a good thing after all if we decide that any random instance admin is not to be trusted).

sauerkraus@lemmy.world · 2 years ago

There’s no need to complicate things by bringing crypto buzzwords into it. It’s already been solved faster, better, and easier just like everything else cryptobros invent a problem for.

QuadratureSurfer@lemmy.world · 2 years ago

The crypto example was only a suggestion because they have simply solved the exact same problem we are looking at: duplicate votes (transactions) and verifying the results while being able to hide it.

I would love to hear any other suggestions that people may have that solve these problems. Copying open source code from crypto isn’t the only option. So let’s look for solutions instead of dismissals (unless you’re arguing for keeping votes public of course).

neuromancer@lemmy.world · edit-2 2 years ago

deleted by creator

Boz (he/him)@lemmy.one · 2 years ago

I agree with you about harassment issues, and the importance of controlling the transfer of admin-level data between instances, but for your last scenario, doesn’t blocking only apply to users who are logged in? Assuming your hypothetical tankies and Nazis were actually posting as well as blocking, it would be easy to find them just by logging out, and there are a lot of ways to get them banned or otherwise counteract their activities that don’t require someone to interact directly with them while logged in. The case you’re describing is not the kind of situation where the most important action is to argue with them. Arguing with extremists usually just validates their delusions, and encourages them to keep doing what they’re doing.

RyanHx@vlemmy.net · 2 years ago

People raise a good point that in countries where political dissent can actually be dangerous, this would very much dissuade people from voting on things they believe in, or even coming anywhere near Lemmy period.

A better approach I think would be to have the user’s host instance save their votes (the database obviously needs to remember what you voted on), but when federating those votes with other instances just hand over a cumulative total, e.g., “here on vlemmy.net we have +18 votes for this comment”, which the other instances can then add. There’s no need to send user information with that data.
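
A minimal sketch of that idea, assuming each home instance keeps the per-user votes locally and only ever sends out a running total. The class, the function names and the message shape are invented for illustration; nothing here is an existing Lemmy or ActivityPub message.

```python
from collections import defaultdict


class HomeInstance:
    """Keeps who-voted-what locally and federates only a per-comment total."""

    def __init__(self, domain: str):
        self.domain = domain
        self._votes = defaultdict(dict)  # comment_id -> {username: score}

    def vote(self, username: str, comment_id: str, score: int) -> None:
        # One entry per user, so voting again replaces the earlier vote.
        self._votes[comment_id][username] = score

    def federated_total(self, comment_id: str) -> dict:
        # No usernames leave the instance, only the sum.
        total = sum(self._votes[comment_id].values())
        return {"instance": self.domain, "comment": comment_id, "total": total}


def combined_score(reports: list) -> int:
    """Add up the latest total reported by each instance."""
    latest = {}
    for report in reports:
        latest[report["instance"]] = report["total"]
    return sum(latest.values())


vlemmy = HomeInstance("vlemmy.net")
for i in range(18):
    vlemmy.vote(f"user{i}", "comment-1", +1)

other = HomeInstance("other.example")
other.vote("zed", "comment-1", -1)

reports = [vlemmy.federated_total("comment-1"), other.federated_total("comment-1")]
print(combined_score(reports))
```

The weak point is visible in `combined_score`: the receiving side has no way to check a reported total, so it has to take each instance's number on trust.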

deweydecibel@lemmy.world · 2 years ago

The problem that Reddit realized early on is that user voting is the engine behind the content aggregation. That aggregation is one of the main selling points of Reddit. The more users vote on what they see, the more information Reddit has for how to aggregate that content. That’s what keeps the front page fresh, that’s what keeps content moving up and down on the site. In a very real sense, the voting is the heart pumping blood through the site.

So it behooves the site to not give any reason for users not to vote how they feel. Keeping votes private was part of that. It is one of the most basic tenets of democracy: the only way to give people the freedom to vote honestly and frequently is to give them the privacy to do it.

The potential for retaliation against users, in any number of conceivable ways, far outweighs any benefits that come from making votes public.

The voting information also makes it insanely easy to automate mass blocking of any opinion under the sun. Nobody in this thread seems to grasp all the things you can do with that data to manipulate user interactions on this site. If you think troll armies are bad, wait till those troll armies have a shared automated block list of every single person that has ever downvoted them.

Feirdro@lemmy.world · 2 years ago

Agreed, especially because I believe we’re headed for a repressive regime here in the US in about 2 years.

Places like this will need to get very careful if they want to remain bastions of free speech and places where people can come to find the information that will no longer be available in mainstream channels.

nicholas@lemmy.world · 2 years ago

Lemmy is not a bastion of free speech lmfao

Paradox@lemdro.id · edit-2 2 years ago

Pretty easy to make an instance that would auto vote certain things with suspicious amounts of votes

As it stands now, they have to fake the origin of some of those votes. Not much of a barrier, the fediverse generally accepts any user an instance says exists, but still, it’s a barrier

And of course any instance that’s blatantly manipulating votes is going to be defederated, but I’m more concerned with an instance that behaves normally until it encounters a keyword or user it’s been set to watch for, and then gives their posts a -5 or whatever

Distributed@lemmy.world · edit-2 2 years ago

These were my thoughts as well. I understand the need for an audit trail.

Would be very easy to build up an interaction graph with this data that could be used for fingerprinting. If this is an issue for you, though, just browse without signing in/interacting

Was just thinking about this more though, and unfortunately there can also be rogue instances that allow bot users to be created and interact with other instances posts, so this issue could still persist.

plumbercraic@lemmy.sdf.org · edit-2 2 years ago

Could replace the usernames with UUIDs, and keep the username-UUID map back on the source instance? Then you get an audit trail, but not associated with user identity. There’s also no guarantee that people don’t use bob_jones as their username, and this is Personally Identifiable Information, which brings up some GDPR stuff too.
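
A small sketch of the username-to-UUID idea, assuming the map never leaves the source instance. `PseudonymMap`, `outgoing_vote` and the message fields are hypothetical names, not anything Lemmy implements.

```python
import uuid


class PseudonymMap:
    """Lives on the home instance; maps each local username to a random UUID."""

    def __init__(self):
        self._by_user = {}
        self._by_alias = {}

    def alias_for(self, username: str) -> str:
        # Create the alias on first use, then always return the same one
        # so remote instances can still spot duplicate votes.
        if username not in self._by_user:
            alias = str(uuid.uuid4())
            self._by_user[username] = alias
            self._by_alias[alias] = username
        return self._by_user[username]

    def resolve(self, alias: str):
        """Audit lookup, only possible on the home instance."""
        return self._by_alias.get(alias)


def outgoing_vote(pseudonyms: PseudonymMap, username: str, comment_id: str, score: int) -> dict:
    """Build the vote as other instances would see it."""
    return {"actor": pseudonyms.alias_for(username), "object": comment_id, "score": score}


pseudonyms = PseudonymMap()
msg = outgoing_vote(pseudonyms, "bob_jones", "comment-1", 1)
print(msg)
print(pseudonyms.resolve(msg["actor"]))
```

Because the alias is stable, all of one user's votes can still be linked to each other by anyone who receives them. That makes the votes pseudonymous rather than anonymous.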

Muddybulldog@mylemmy.win · 2 years ago

The problem with that is that every interaction that any user has with a post or a comment would require calls back to the home instance in order to lookup those usernames. That’s a LOT of extra load

JackbyDev@programming.dev · 2 years ago

There is no reason you couldn’t only do it for votes and not for posts and comments.

kolorafa@lemmy.world · 2 years ago

That would allow faking votes, as I can tweak my instance to spew any number I want.

Zyansheep@lemmy.fmhy.ml · 2 years ago

Can probably fake votes anyway by faking usernames right? Harder, but still doable 🤔

astral_avocado@lemmynsfw.com · 2 years ago

I think those users who live under oppressive governments should be used to using tools like Tor and accounts with a proton email to interact on the internet.

Virtual Insanity @lemmy.world · 2 years ago

There is a fundamental misunderstanding here.

Our data has never been ‘invisible’… We’ve just trusted that places like Reddit and their staff will do the right thing. That’s literally how it already works.

If you sign up for Reddit, Reddit staff can see your posts and votes if they want to.

If you sign up for a private forum the admin there can also see database contents.

One way encryption is not possible without stopping functionality… If data about you was encrypted then posts you make couldn’t be displayed. If you include a means to decrypt then there was no point encrypting anyway.

This is how it’s always been, and Lemmy doesn’t change this status quo much.

A faceless corporation that has had access to your data is just replaced by a variety of admins distributed across instances.

This isn’t a good or bad thing, the potential for abuse does exist, but when we have literally made agreements with places like Reddit that they can use and sell our data… then what difference does it make if an admin takes a peek?

It wouldn’t be great… but nothing is perfect.

It’s still worth working on however, to see if a better solution can be found, but at this time I’d say just be aware that it is possible that your data can be seen and understand the only safeguard against that if you need to communicate something private would be to use direct messaging with end to end encryption.

Muddybulldog@mylemmy.win · 2 years ago

I’ll contribute that my intent with this post is not evangelism. I like the voting system and would be disappointed to see it disappear.

A vote in Reddit was, from a practical perspective, anonymous. While it was recorded in the database and admins had access to this information there were mitigations in place to deter abuse and the end result was that the person you up or down voted was not going to know that YOU, personally, downvoted them. It was also of limited value to external data sifters in creating social graphs.

Since Lemmy votes are non-anonymously propagated across the Fediverse and, literally, anyone can be an admin there are people who may want to reconsider whether they upvote or downvote a particular post or comment. The actual reasons may vary; they don’t want to be outed as sympathetic to a political view or cause, they don’t want it used in a social graph for targeted advertising or even spear-phishing. In many cases there will be people who don’t care at all.

Just trying to contribute to transparency. Not everyone can read code, sift data or visualize how a social network would work behind the scenes. There’s plenty of opportunities for others to use our data, good and evil. I believe that efforts to bring to light non-obvious consequences of actions is good citizenship.

Virtual Insanity @lemmy.world · 2 years ago

I agree with everything you’re saying, but it’s frustrating that people are jumping to conclusions to think this is deliberate, nefarious etc…

Lemmy, being a federated system, has different practical realities to Reddit. You can’t have a federated system with multiple instances, each with their own admins, and have it function if you cut off data flow. For voting to work in a federated system, vote data must flow, and people need to understand this.

Reddit was **not** a federated system, so there was no need for vote data to flow, and people also need to appreciate this difference.

The only solution is to remove voting. It’s as simple as that.

Maybe long term a system could be devised.

I’m not in denial, i do firmly believe that this is an issue, and that it WILL be abused by someone. But I’m also a realist, and the features we have can’t survive without voting data. People need to be aware of this, i think it’s fair that everyone knows the risks. At an individual level people can choose not to vote, and thus have no vote data associated with them, but i suspect there might be more than vote data, i don’t know for sure without looking at the code, but I suspect saved posts might be a privacy concern.

Personal opinion, i think abuse will happen, but it will be limited, just a feel i have. I do however suspect this abuse will exponentially ramp up if lemmy gets big traction.

quintium@lemmy.world · 2 years ago

The problem is that it’s actively worse than Reddit. While only Reddit employees can access your data and it’s being sold to the highest bidder, Lemmy sells your data for $0.00.

Anyone can become an instance admin through their own instance, so your voting data is pretty much unprotected. That is the opposite of privacy. I get that it’s a consequence of the fediverse, but then it just may not be the solution to social media.

Virtual Insanity @lemmy.world · 2 years ago

Your choice of wording is driving me to take you less seriously. You sound passionate though so I’ll explain.

Actively worse? Your use of the word active implies that something deliberately malicious is happening. It’s not the case, this issue is a side effect of how lemmy works. It is an issue, it is a concern and it does need addressing, but your hyperbole is unwarranted.

Lemmy isn’t selling anything… it’s a piece of software. This is the most false and malicious claim you’ve made. If our data were to be used nefariously then it would be the actions of a rogue server admin.

The definition of privacy is somewhat flexible. Nothing is private unless end to end encryption is employed. And nothing like lemmy can work with end to end encryption. So there is the dilemma. Yes it sucks… yes voting should be private. How about you propose a solution? Because at this point, outside of shutting everything down there is none.

The technical fact is the software must be able to reference data in a database to then create this page you are viewing, this text you are reading and the votes you are seeing.

Possible lemmy / server side solution…

remove voting
remove accounting of voting (this can’t really work, as without connecting a vote to a user, any user can upvote or downvote something unlimited times)

Possible user side solution…

Don’t vote

Simply not participating in voting means there is no voting data tied to you; and this is, believe it or not, an actual valid solution if you are concerned.

Ultimately for lemmy to work some tradeoffs are required. I do agree that where there is some gain to be made, someone will abuse the system, I’m not naive enough to say there is no problem here… there is. It’s just that yelling at the wall isn’t going to fix it. I’ve exhausted myself trying to think of a solution, and the only real and workable solution i can think of is as i said above, that voting be removed, or that you simply don’t participate in voting.

So if you’re really concerned and want an immediate solution effective for you then don’t vote on anything ever. I’m not saying this to be a prick, but as a piece of legitimate advice. If no vote data exists for you then no one can harvest your voting preferences.

czech@no.faux.moe · 2 years ago

Activities are public and easily viewable on kbin. It’s been interesting. Seems mostly positive other than people harassing those who down-vote them demanding explanations.

CoolSouthpaw@lemmy.world · 2 years ago

Oh no, so my upvotes on c/spacedicks aren’t private?

/s

dukk@programming.dev · 2 years ago

Couldn’t we just use a hash for the usernames instead?

Nothing too over the top, but just a simple hash and match that instead?

Also, there’s way too much trust in instances. Like, one person could easily make a post on lemmy.world, go on their personal instance, and just give themselves, say, 2000 upvotes.

Instances should have their own settings on what instances are allowed to keep a local copy. (Default behavior should be to get the post itself from the instance “hosting” it).

chris@l.roofo.cc · 2 years ago

If that is a solution you’d need to change the ActivityPub specification. You are more than welcome to submit your idea.

> Also, there’s way too much trust in instances. Like, one person could easily make a post on lemmy.world, go on their personal instance, and just give themselves, say, 2000 upvotes.

I’d first have to create 2000 users, then I’d have to send 2000 upvotes. And then I’d get blocked by all instances.

> Instances should have their own settings on what instances are allowed to keep a local copy.

This is also not compatible with the ActivityPub spec but even if it were you’d win nothing because as soon as you fetch the post it is still on the server.

lalo@discuss.tchncs.de · 2 years ago

Hey, just curious: how would all the instances discover this type of fraud?

dukk@programming.dev · 2 years ago

They’d have to check the upvotes, notice most of them came from one instance, look at the instance, check multiple users, and if they realize that these users were just created to get upvotes then they can defederate. However, it’s too big of an assumption that moderators will go through that kind of effort to validate all the upvotes.
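
The first step of that manual check (noticing that most upvotes came from one instance) is easy to automate. A sketch under assumptions: the voter handles are available as `user@domain` strings, and the `max_share` and `min_votes` thresholds are arbitrary values picked for the example.

```python
from collections import Counter


def instance_share(voters: list) -> dict:
    """Fraction of a post's votes contributed by each instance."""
    counts = Counter(handle.rsplit("@", 1)[1] for handle in voters)
    total = sum(counts.values())
    return {instance: n / total for instance, n in counts.items()}


def dominant_instances(voters: list, max_share: float = 0.5, min_votes: int = 20) -> list:
    """Instances that supplied more than `max_share` of the votes on a post."""
    # Small posts are skipped: one instance dominating 3 votes means nothing.
    if len(voters) < min_votes:
        return []
    shares = instance_share(voters)
    return sorted(inst for inst, share in shares.items() if share > max_share)


# Hypothetical post: 18 of 20 upvotes come from a single unknown instance.
voters = [f"u{i}@shady.example" for i in range(18)] + ["a@lemmy.world", "b@lemm.ee"]
print(dominant_instances(voters))
```

A flag like this only narrows down where to look. A post in a community hosted on a large instance will naturally get most of its votes from that instance, so the remaining steps (checking the accounts themselves) still need a human.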

Serinus@lemmy.ml · 2 years ago

It’s a lot easier to fake a hash than a username. If I’m an instance owner and I suspect another instance of this, I can grab a random username and check their post history. Pretty easy to see rampant fraud that way.

If you’re putting something out on the internet, even upvotes or clicks, expect it to be public.

grimsolem@lemmy.dbzer0.com · edit-2 2 years ago

> Couldn’t we just use a hash for the usernames instead?

The hash function would still need to be public to share data between instances.

dukk@programming.dev · edit-2 2 years ago

That’s the point of a hash function. You have a public hash function, say SHA-256. It’s easy to check a username against its hash, but virtually impossible to reverse the hash back to the username.

Edit: Instead of storing, say, eddie, we’d store 3b9d8298f1b5086d012618feebb2da1a394357c1dab7523443c9f6a743c4c84d. Then when the instance gets a Like from eddie, it hashes his username to get 3b9d8298f1b5086d012618feebb2da1a394357c1dab7523443c9f6a743c4c84d, realizes there’s a match, and doesn’t update the count.

Note that when given 3b9d8298f1b5086d012618feebb2da1a394357c1dab7523443c9f6a743c4c84d, it would take millions of CPU years to compute the original username from it. Therefore, we can check for duplicates without actually checking the name itself (a similar method is used for checking passwords; Lemmy is open source, we know the hashing algorithm, but we can’t unhash user passwords, only check them).
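
A minimal sketch of the duplicate check described above, using Python's standard `hashlib`. The `VoteCounter` class is made up for the example, and the digest quoted in the comment is not recomputed or verified here.

```python
import hashlib


def voter_digest(username: str) -> str:
    """SHA-256 of the username, as a 64-character hex string."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


class VoteCounter:
    """Counts likes on one post while storing only hashes of the voters."""

    def __init__(self):
        self.seen = set()   # digests of users who already voted
        self.count = 0

    def like(self, username: str) -> bool:
        digest = voter_digest(username)
        if digest in self.seen:
            return False    # same user again: do not update the count
        self.seen.add(digest)
        self.count += 1
        return True


counter = VoteCounter()
print(counter.like("eddie"))   # first like is counted
print(counter.like("eddie"))   # repeat is rejected
print(counter.count)
```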

quintium@lemmy.world · edit-2 2 years ago

While there is an enormous amount of possible passwords, there is only a limited (and quite small) amount of users. Couldn’t you just hash all the usernames one by one and map the hashes to the usernames? So you could still reverse engineer the usernames of those who voted on a post.

Edit: Salting with the post id would make this attacking process harder, but still realistic. Probably the only real solution is to hide the votes table from federated instances, I’m not sure if that brings technical problems.
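
The objection can be shown in a few lines. This sketch assumes the attacker already has a list of known usernames (which any federated instance would) and knows the salt, since a per-post salt such as the post id is public. The function names and the `"post42:"` salt format are invented for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_lookup(known_usernames, salt: str = "") -> dict:
    """Hash every known username once and index the results by digest."""
    return {sha256_hex(salt + name): name for name in known_usernames}


def unmask(vote_digests, known_usernames, salt: str = "") -> list:
    """Map each hashed voter back to a username, or None if it is unknown."""
    lookup = build_lookup(known_usernames, salt)
    return [lookup.get(digest) for digest in vote_digests]


# Hashed voters on one post, salted with a (public) post id.
salt = "post42:"
digests = [sha256_hex(salt + name) for name in ["eddie", "mallory"]]

# The attacker only knows some usernames, which is enough for those users.
print(unmask(digests, ["alice", "eddie", "bob"], salt))
```

The cost is one hash per known username per post, which is why a public salt slows the lookup down without preventing it.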

dukk@programming.dev · 2 years ago

That was what I was implying, yes.

Just hash each username and store it. Then just check the username’s hash to see if it matches.

quintium@lemmy.world · 2 years ago

I was more commenting that you could still reverse engineer the users who voted on a post

dukk@programming.dev · 2 years ago

Actually, you’re not really wrong.

All the more reason to give out limited data to all other instances. Why do these instances really need this data? Mastodon doesn’t need it, not quite sure why Lemmy does it.

quintium@lemmy.world · 2 years ago

Yeah I don’t understand why every instance can’t keep track of their own votes privately. Sure, voting manipulation is a thing, but it’s possible regardless.

Honestly I really hope Lemmy does something to address this issue. Otherwise it’s kind of a dealbreaker for me.

sab@lemmy.world · 2 years ago

If anything, wouldn’t that make vote abuse even easier? Just send 100 upvotes with 100 random hashes.

sab@lemmy.world · edit-2 2 years ago

> Also, there’s way too much trust in instances.

I say there’s too much care about votes. Because someone can just give themselves infinite votes from their private instance, it makes it all the more worthless.

> Instances should have their own settings on what instances are allowed to keep a local copy.

There’s a setting for that, it’s called the allowed list - configures who are allowed to federate with you. Beyond that - if it’s out, it’s out.

dukk@programming.dev · 2 years ago

Votes are the only real way currently to gauge opinion about the post itself. IMO, if the votes system is so bad that people are starting to completely disregard it, there’s something wrong.

kennydidwhat@lemmy.world · 2 years ago

There’s something amusing about people feeling violated by their activity being made public, but not necessarily by corporations hoarding and capitalizing on that activity & data. I mean, one of them is out in the open. The other is pure abuse.

gravitas_deficiency@sh.itjust.works · 2 years ago

Woah woah woah. Hold the phone. You’re telling me that things that I post… on the internet… are… PUBLIC???

AncientMariner@lemmy.world · 2 years ago

Not post, upvote. I find it interesting that you like Asian Babes (obviously you don’t, it’s just some information you wouldn’t expect to be public or shared).

trachemys@lemmy.world · 2 years ago

All the more reason to keep a different alt for each area of interest.

OmniGlitcher@lemmy.world · 2 years ago

Ah yes, because the practical option is to be constantly switching accounts and instances based on what you want to look at for 5 minutes each.

Boz (he/him)@lemmy.one · 2 years ago

I find it’s possible to be logged into two instances on the same browser, so it doesn’t need to be more difficult than switching tabs. (That may change, I don’t know whether it’s technically desirable, but if it’s relevant to someone’s interests…)

fuckyou_m8@lemmy.fmhy.ml · 2 years ago

OK so let’s tell the regular user “hey come to lemmy, but don’t forget to keep multiple accounts because here everyone can spy on you”

That’s not a good message

sproketboy@lemmy.world · 2 years ago

Here’s an upvote to add to the database.

eierkuchen@feddit.de · 2 years ago

I like how you two think

two_wheel2@lemm.ee · 2 years ago

Now I only upvote in comments so they’re super public.

!UPVOTE

intensely_human@lemm.ee · 2 years ago

I’ll just use my short username then

Jerkface (any/all)@lemmy.ca · 2 years ago

Holy shit. HOLY SHIT.

I just realized what this actually MEANS.

It means that when you like or dislike something so much that you unvote and then vote a second time, people can tell. This will change karma forever.

17000HerbsAndSpices@lemmy.world · 2 years ago

“I wish I could upvote this twice” is officially a reality. We should have moved here ages ago!

Muddybulldog@mylemmy.win · 2 years ago

That is a use case I had not considered. Excellent thinking.

stevedidWHAT@lemmy.world · 2 years ago

I’m gonna write a script that randomly injects upvotes and downvotes and then also posts responses after a random amount of time to prevent time correlations. People freak out most often when they don’t understand something, I continue to implore everyone to think things through logically and try not to assume new subjects are anything other than new and likely to cause some confusion/incorrect assumptions

sebi@lemmy.world · edit-2 2 years ago

So any instance admin can analyze all users’ upvotes/downvotes and possibly derive political standpoints, likes/dislikes, opinions and location data from it

Muddybulldog@mylemmy.win · 2 years ago

Yes.

Just muddling around I’ve built queries that: (a) list all of my posts & comments, everybody who voted on them, and their votes; (b) tally how many times specific users have upvoted or downvoted me; (c) identify the most prolific voters across the Fediverse and the communities they are voting in; (d) identify users with the same username or display name across all instances and correlate the activities across those accounts.

These are all for the sake of learning and are innocuous the way I’m using them. It is plain to see that someone with skills and an agenda could make more out of it than I have.
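
To give a feel for how little it takes, here is a sketch of query (b) against a toy database. The two-table schema, the handles and the function name are invented for the example; this is not Lemmy's actual schema and not the queries mentioned above.

```python
import sqlite3

# Toy stand-in for an instance database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE vote (post_id INTEGER, voter TEXT, score INTEGER);
INSERT INTO post VALUES
    (1, 'me@mylemmy.example'),
    (2, 'me@mylemmy.example'),
    (3, 'other@a.example');
INSERT INTO vote VALUES
    (1, 'x@a.example',  1),
    (2, 'x@a.example',  1),
    (1, 'y@b.example', -1),
    (2, 'y@b.example', -1),
    (3, 'y@b.example',  1);
""")


def tally_votes_on(author: str) -> list:
    """Per voter: how many upvotes and downvotes they gave to `author`'s posts."""
    return conn.execute(
        """
        SELECT v.voter,
               SUM(v.score > 0) AS ups,
               SUM(v.score < 0) AS downs
        FROM vote AS v
        JOIN post AS p ON p.id = v.post_id
        WHERE p.author = ?
        GROUP BY v.voter
        ORDER BY v.voter
        """,
        (author,),
    ).fetchall()


for voter, ups, downs in tally_votes_on("me@mylemmy.example"):
    print(voter, ups, downs)
```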

Pizzacheese4@lemmy.world · 2 years ago

How is this different than any other website?

madsen@lemmy.world · 2 years ago

I can’t just spin up a website and automatically get that info from other websites, but I can spin up a lemmy instance and get that info from everyone it’s federated with.

sebi@lemmy.world · 2 years ago

I agree, someone has to store and maintain your data, but giving all instances access to it is a risk that could be avoided