The Fediverse Desperately Needs Sustainable File Hosting

sosodev@lemmy.world · edit-2 17 days ago

NuXCOM_90Percent@lemmy.zip · 17 days ago

We totally need sustainable file hosting. Freedom!

Wait… the fuck did you just upload? Oh god. Oh god no. Do I have to call the cops on you? Oh no. Wait, does this count as possession? FUCK!!!

We need someone else to handle the totally sustainable file hosting. Freedom!

Kalkaline@leminal.space · 17 days ago

Yep, there needs to be moderation tools that can be quickly deployed to stop the illegal/immoral/evil stuff from spreading and taking over self-hosted servers.

NuXCOM_90Percent@lemmy.zip · 17 days ago

And moderation of this kind of content almost always sounds like torture when you hear about what facebook and the like are outsourcing.

Theoretically, this is a good problem for computer vision/machine learning. But there are a LOT of false positives (I think it was Aftermath who did an article on a study of when a nipple becomes female?). And… what ethical responsibility do you have to report on the fiftieth time that SheIsReallyAnEightThousandYearOldDragon_6969 uploaded CSAM? And how quick do you think people are going to lose faith in you and start wondering if you’ll also report on the rampant piracy?

And… there are also false negatives. At which point you find out you have been hosting something truly heinous for the past few months… possibly when local law enforcement tells you.

Like a lot of things: it sounds great. But nobody in their right mind is going to host this for free. And once you start accepting money you start opening yourself up to a LOT of regulations.

rigatti@lemmy.world · 17 days ago

A nipple becomes female precisely when it wants to!

grue@lemmy.world · edit-2 16 days ago

Nipples are wizards, confirmed.

Kalkaline@leminal.space · 17 days ago

It doesn’t even have to be to that extent, just being able to slow/stop that awful content from being uploaded by a bunch of malicious bots. Even if it’s not malicious content, you could still have people uploading spam that eats up the server space.

Lost_My_Mind@lemmy.world · 16 days ago

Illegal I can begrudgingly agree with. Even though I am a proponent of piracy, I will concede that for growth’s sake, the tools need a clear, well-defined path to moderation.

That being said, who’s to say what IS immoral and evil?

In Republicans’ minds, porn is evil and should be banned. Trans rights are evil and should be banned. Abortion is evil and should be banned.

I disagree with all those claims. I do not think any of them are immoral, or evil.

I think pineapple on pizza is wrong, and evil. Some agree, others don’t. If I had my way, promoting of pineapple on pizza would be banned.

Now, who’s to say what is, and what isn’t evil? I think the only clear line to a moderation approach is to have a clear, unquestionable set of rules. These rules are to be based on public laws.

Everything else, I feel you should have the freedom to do as you wish. But also, I believe other people that you don’t agree with should be free to do as they wish.

You may never know how someone feels, or understand their perspective, but as long as they aren’t breaking laws, I feel they should have the ability to feel that way consequence free.

I may not like that you put pineapples on your pizza, but I feel that you should have the right to enjoy it. Even if it goes against MY views as to what constitutes a REAL pizza! Much to my surprise, pineapple on pizza ISN’T illegal. So you should have the right to enjoy it…

And yes, I did take the most pedantic example I could think of, in order to show how absurdly easy it is to accept others’ rights in this world when they don’t affect you.

Now just apply that same concept to every other example in the world. Then take into consideration that by using vague undefined terms to define your rules, you create grey area that’s easy to exploit. Who’s to say what IS evil? Adults told their teenagers in the 1950s that Elvis was evil. Parents in the 1920s told their teenagers that jazz was evil.

We need to define the terms that define our rules.

hendrik@palaver.p3x.de · edit-2 16 days ago

In the federated world it’s the moderators and admins who get to make the rules and/or decide what they deem appropriate. It’s as simple as that.

Kalkaline@leminal.space · 16 days ago

I would say you go to the extremes and work back from there for what’s immoral/evil.

grue@lemmy.world · 16 days ago

Seems to me that this is a use-case ~~Freenet~~ Hyphanet would be good for, both because it distributes the problem of file storage load and because it eliminates responsibility for each host to police his node by making it impossible for anyone to know which file chunks said node is hosting.

NuXCOM_90Percent@lemmy.zip · 16 days ago

Nothing solves the problem of CSAM quite like… making everyone partially culpable in the storage and distribution of CSAM.

You can’t prove I was hosting child porn. Statistically, we all only had a 70% probability of having it on our computers

grue@lemmy.world · 16 days ago

Stuff that isn’t accessed eventually gets deleted. If the Lemmy instances (which are clearnet, of course) delete the references to it, it would go away.

NuXCOM_90Percent@lemmy.zip · 16 days ago

Which gets back to volunteers going through and moderating it. And the ethical and moral question of whether people who upload it are reported.

And… honestly? if there is even a 20% chance that running a file sharing node (because I just love to give away both bandwidth and storage…) is being used to store CSAM? I ain’t doing that shit and most people will similarly run screaming and call the cops.

hperrin@lemmy.world · edit-2 17 days ago

Ok, hear me out.

We find the users with the slowest internet and start sending them all the data. They don’t have to keep anything on disk. Then they send it all back and forth between each other. Any time a user makes a request, we just wait for one of the slow nodes to come across the data and send it out.

We use the slowest wires for all the storage. It’s foolproof.

sosodev@lemmy.world · 17 days ago

Somebody actually did make this as a joke years ago haha https://github.com/yarrick/pingfs

Dojan@lemmy.world · 16 days ago

I was brushing my teeth when reading this comment and inadvertently ended up swallowing all my toothpaste.

Darth_Mew@lemmy.world · 16 days ago

don’t forget to spit

hperrin@lemmy.world · 17 days ago

Ha! That’s awesome!

NaibofTabr@infosec.pub · 17 days ago

You jest but… delay line memory

themoonisacheese@sh.itjust.works · 17 days ago

https://youtu.be/JcJSW7Rprio

hperrin@lemmy.world · 17 days ago

This is amazing and I love it.

grubbyweasel@sh.itjust.works · 17 days ago

Top 5 YouTube videos this one

M0oP0o@mander.xyz · 16 days ago

Too wet for server racks in the forest.

sosodev@lemmy.world · 16 days ago

They grew there

M0oP0o@mander.xyz · 16 days ago

Look, I know they’re called “farms”, but like I told the last forest gnome, the dank woods is no place to host data.

poVoq@slrpnk.net · edit-2 17 days ago

Have you considered providing something like this: https://jortage.com/ and maybe contribute to their efforts to develop a specific API for that? Source code is here: https://github.com/jortage

sosodev@lemmy.world · 17 days ago

Jortage is a really interesting approach. It definitely helps reduce the impact of the file hosting problem but it doesn’t fully address the underlying cost issue. The cost of storing files grows every month indefinitely while donations typically don’t.

I would like to see a file hosting pool come to lemmy though. So I will look into it. :)

poVoq@slrpnk.net · edit-2 16 days ago

Pict-rs that is used by Lemmy to store images already supports S3 type storage, so in theory it should work with Jortage, but I don’t think anybody has tested that yet. The people behind Feddit.org might have experimented with it as they expressed interest a while back.

Deebster@lemmy.ml · 16 days ago

I think the major advantage is the deduplication - when an image goes viral across Mastodon (or Lemmy) it’s currently stored hundreds or thousands of times, each with its own cost. Do you dedupe (for either your customers’ benefit or your own)?
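To make the dedupe idea concrete, here is a minimal sketch, assuming files are keyed by a SHA-256 hash of their bytes. `DedupStore` and `upload` are names made up for illustration; this is not how Jortage or any Fediverse server actually implements it.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> file bytes
        self.names = {}  # (instance, filename) -> sha256 hex digest

    def upload(self, instance: str, filename: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        # Every uploader still gets their own name pointing at the shared blob.
        self.names[(instance, filename)] = digest
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
meme = b"the same viral image bytes"
for instance in ("mastodon.social", "infosec.exchange", "lemmy.ml"):
    store.upload(instance, "meme.png", meme)
```

Three uploads of the same bytes end up as three name entries pointing at a single stored blob, so the storage cost is paid once.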

sosodev@lemmy.world · edit-2 16 days ago

Are the images duplicated when shared? My understanding is that only a link to the file is replicated across servers and duplication comes from users manually uploading the same file to another server.

My website does not do any deduplication at this time.

Deebster@lemmy.ml · edit-2 16 days ago

Yes, for example go to https://infosec.exchange/explore

I see the top post as https://infosec.exchange/@[email protected]/113433063621462027 and the image is https://media.infosec.exchange/infosec.exchange/cache/media_attachments/files/113/433/063/582/671/258/original/71da3801e4e4f08c.png

The link is to the original on https://files.mastodon.social/media_attachments/files/113/433/062/676/773/993/original/f828afef5cc7ed1c.png but when you click the image, the JavaScript loads a modal with the local cached version (the same image as the thumbnail that infosec.exchange loads).

There’s lots of different codebases across the fediverse so perhaps some hotlink, but local copies is the default.

sosodev@lemmy.world · edit-2 16 days ago

The Lemmy server config indicates that this is an optional setting to improve user privacy, so requests from the client never hit the original server. Those cached files are only temporary and will be deleted after some time. So it’s not really full-blown duplication.

The default setting is to only generate the thumbnails and store those locally (indefinitely?) but even that can be turned off. I checked and it appears that lemmy.world has the thumbnail generation disabled so all images from other instances just link to the original on that instance.

Deebster@lemmy.ml · 16 days ago

Ok, so Lemmy doesn’t cause the same amount of duplication, but I’d still argue that dedupe is valuable: it saves on hosting costs (your costs, in this case) and users will get a small advantage in having slightly higher cache hits.

sosodev@lemmy.world · 16 days ago

For sure, I’ll add it to the list. :)

mlg@lemmy.world · 16 days ago

IPFS?

Ludrol@szmer.info · 16 days ago

As I stated in this comment, it’s not really feasible due to the ~5s delay that was measured some time ago.

Lemmchen@feddit.org · 14 days ago

That’s the wrong comment.

sosodev@lemmy.world · 16 days ago

What would an IPFS solution look like here? That’s a genuine question. I don’t have much experience with IPFS. It seems like it isn’t really used outside of blockchain applications.

Fuck Yankies@lemmy.ml · edit-2 16 days ago

The sustainability of it is questionable. If I’m not mistaken, IPFS is based on Ethereum, which has gone over to proof of stake rather than proof of work, but it’s still a pretty cumbersome system.

We’re talking about something that needs to compete with Quic and CloudFlare. I’m not sure that Ethereum, or even crypto itself, is efficient enough as a content delivery method, so IPFS, though a nice idea, seems unrealistic.

But that’s just speculation from someone who has zero knowledge behind IPFS as a technology and protocol, so take it with a grain of salt.

EDIT: honestly, why qualify with “I’m not sure” when besserwissers and their alts roam the fediverse instead of going to therapy. Smh. Give the people a Tl;Dr at least. I’m not here for long form content.

Scio@lemmy.world · edit-2 16 days ago

IPFS has absolutely nothing whatsoever to do with Ethereum, or indeed any blockchain. It is a protocol for storing, distributing, and addressing data by hashes of the content over a peer-to-peer network.

There is however an initiative to create a commercial market for “pinning*”, which is blockchain based. It still has nothing to do with Ethereum, and is a distinct project that uses IPFS rather than being part of the protocol, thankfully. It is also not a “proof of work” sort of waste, but built around proving content that was promised to be stored is actually stored.

Pinning in IPFS is effectively “hosting” data permanently. IPFS is inherently peer to peer: content you access gets added to your local cache and gets served to any peer near you asking for it, like BitTorrent, until that cache is cleared to make space for new content you access. If nobody keeps a copy of some data you want others to access when your machines are offline, IPFS wouldn’t be particularly useful as a CDN. So peers on the network can choose to pin some data, making it exempt from being cleared with the cache. It is perfectly possible to offer pinning services that have nothing to do with Filecoin or the blockchain, and those exist already. But the organization developing IPFS wanted an independent blockchain-based solution simply because they felt it would scale better and give them a potential way to sustain themselves.
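The cache-versus-pin behavior described above can be sketched roughly like this. It is a toy model only: `PinnableCache` is a made-up name, and least-recently-used eviction is an assumption for illustration, not a description of how the real IPFS garbage collector works.

```python
from collections import OrderedDict


class PinnableCache:
    """Toy cache: unpinned entries are evicted oldest-first, pinned ones never."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> data, least recently used first
        self.pinned = set()

    def pin(self, key):
        self.pinned.add(key)

    def put(self, key, data):
        self.items[key] = data
        self.items.move_to_end(key)  # mark as most recently used
        self._evict()

    def _evict(self):
        # Drop the oldest unpinned entries until the cache fits again.
        while len(self.items) > self.capacity:
            victim = next((k for k in self.items if k not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned, so nothing can be evicted
            del self.items[victim]


cache = PinnableCache(capacity=2)
cache.put("a", b"content someone chose to pin")
cache.pin("a")
cache.put("b", b"transient content 1")
cache.put("c", b"transient content 2")
```

Here the pinned entry `a` survives, while the older unpinned entry `b` is dropped to make room for `c`.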

Frankly, it was a bad idea then, as crypto grift was already becoming obvious. And it didn’t really take off. But since Filecoin has always been a completely separate thing from IPFS, it doesn’t affect how IPFS works in any way, and IPFS continues to work.

There are many aspects of IPFS the actual protocol that could stand to be improved. But in a lot of ways, it does do many of the things a Fediverse “CDN” should. But that’s just the storage layer. Getting even the popular AP servers to agree to implement IPFS is going to be almost as realistic an expectation as getting federated identity working on AP. A personal pessimistic view.

Fuck Yankies@lemmy.ml · edit-2 16 days ago

TL;Dr. From Wikipedia

IPFS allows users to host and receive content in a manner similar to BitTorrent. As opposed to a centrally located server, IPFS is built around a decentralized system of user-operators who hold a portion of the overall data. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT).

So it’s BitTorrent in the web browser… thanks. How is that to be competitive with CloudFlare and Quic again? It has the same network issues that the blockchain has, in that it will be cumbersome and slow - for anyone else that doesn’t have millions to throw into infrastructure. Welcome to the same problem again, but in a different way.

Scio@lemmy.world · 16 days ago

Ironically, because there’s no UDP in browsers, we can’t actually get proper p2p on the web. WebRTC through centralized coordination servers at best. Protocol Labs has all but given up on this use-case in favor of using some bootstrapped selection of remote helper nodes.

C126@sh.itjust.works · 16 days ago

Any way to use the torrent protocol somehow? Like Popcorn Time did?

tiddy@sh.itjust.works · 16 days ago

IPFS would be similar but more purpose-built

C126@sh.itjust.works · 16 days ago

Never heard of that before; it looks built exactly for this problem, better than torrent. Good call.

nutsack@lemmy.world · edit-2 16 days ago

this is actually a super interesting idea of which i have never seen proof of concept. other than maybe freenet or something.

TriflingToad@sh.itjust.works · 16 days ago

If it was easy to set up I’d definitely host a terabyte of Lemmy. As a pet.

moseschrute@lemmy.world · 15 days ago

What would you name your pet

tehn00bi@lemmy.world · 16 days ago

Is file hosting really a must? I mean Reddit and feddit are basically forums. And not many forums allow file uploads. Also, we should have retention limits. Low value posts are allowed to fade away. High value posts that have some level of interaction stay alive longer.

hendrik@palaver.p3x.de · 16 days ago

A lot of pictures and memes get posted here. And every other post shows a thumbnail picture. These images are all files.

tehn00bi@lemmy.world · 16 days ago

Not denying that. But maybe we should accept that photos and memes and whatnot aren’t that valuable and limit their size or the volume allowed per user. Just a thought.

hendrik@palaver.p3x.de · edit-2 16 days ago

Yeah, I wonder if that would fly with the users. I just scrolled through my timeline and nearly every post has some colorful image to it. (except in Ask Lemmy and No Stupid Questions.) I’m not sure if users would accept this platform if it were mostly textual. And putting restrictions in place would certainly reduce the number of images. Scrolling through Lemmy would feel like Hackernews, not any modern social media platform. I doubt mainstream people appreciate that.

But yeah, that’d be possible. We could just close the meme communities for example. Or exclude them from individual instances to save some space there.

nasi_goreng@lemmy.zip · 16 days ago

limit their size or the volume allowed per user

Fedi software like Misskey already does exactly that. Each user has a limited “drive” which can be upgraded/customized per user. People can even reuse images they already have on their drive, so there won’t be any duplicate files.
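As a rough illustration of a per-user drive with a size limit and file reuse, here is a minimal sketch. `Drive`, `quota_bytes`, and `upload` are hypothetical names for this example and are not Misskey's actual API or data model.

```python
import hashlib


class Drive:
    """Toy per-user drive: enforces a byte quota and reuses identical files."""

    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self.files = {}  # sha256 hex digest -> size in bytes

    def used(self) -> int:
        return sum(self.files.values())

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.files:
            return digest  # already on the drive, reuse it at no extra cost
        if self.used() + len(data) > self.quota_bytes:
            raise ValueError("drive quota exceeded")
        self.files[digest] = len(data)
        return digest


drive = Drive(quota_bytes=10)
first = drive.upload(b"12345")
second = drive.upload(b"12345")  # same bytes: counted only once
```

Posting the same image twice costs the user nothing extra, while a new file that would push the drive over its limit is rejected.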

Azzu@lemm.ee · 16 days ago

Reddit is basically entirely image or video posts, all hosted by reddit directly.

sean@lemmy.wtf · 13 days ago

People forget why Imgur was created

abff08f4813c@j4vcdedmiokf56h3ho4t62mlku.srv.us · 17 days ago

You’re not the first to think about this.

See https://aumetra.xyz/posts/the-fedi-ddos-problem - there, an embed server is proposed, to be shared by multiple instances (ideally a great many would use just the one), which can host things like image files and previews.

cum@lemmy.cafe · 16 days ago

There’s a big issue with this.

If malicious content like CP gets uploaded on to a server, obviously other servers do not want this to be replicated to their servers. So how would you solve this problem? Well they could give all moderation power to the original server they’re replicating, but that could be far too slow or they could even miss malicious content like this. Or maybe they even disagree about taking down certain things.

Another solution is that any server participating in the content mirroring could take it down for just themselves or for all the other members as well. The issue here is now you’re expanding moderation abilities, while also giving the other servers much more responsibilities.

It’s not as simple as wanting to replicate content. If you host it, you are responsible for any illegal content a user may upload to it. Not to mention laws vary by country as well. Ignoring the technical challenges here, any server that replicates another server’s data must also choose to be responsible for what gets uploaded. And that is a really big ask. The law doesn’t care about the technical reasons, they’ll just see illegal content uploaded to your server.

abff08f4813c@j4vcdedmiokf56h3ho4t62mlku.srv.us · 16 days ago

This issue already exists, regardless of the embed server problem. Right now, images posted by users to an instance get sent to that community’s instance and then copied to all instances of all subscribers.

If anything, the embed server provides a potential solution - rather than federate the image directly, simply link to the copy of the image on the embed server. (I’ve done some customized code changes on top of pyfedi to implement this idea there.)

I imagine instance admins would still want to monitor and delete links to CP, but under this idea only the admins of the embed server and their delegates would have the ability to remove CP from the embed server itself. (Should they delegate this ability to other instance admins? Probably only on a case-by-case basis at most.)

Perhaps they could support a reporting function for mods and instance admins though…

tofuwabohu@slrpnk.net · 17 days ago

Interesting approach, good luck! Admittedly I’m not sure if many users want to take their media uploading in their own hands and pay for it but maybe I’m wrong. Where are the images stored? Do you have your own hardware? Backups etc?

Also since you’re interested in Fediverse media storage, I recently read about https://jortage.com/ It’s a third party storage for your instance with deduplication, pretty interesting idea. Takes away a bit of the federated part though

sosodev@lemmy.world · edit-2 17 days ago

The files are uploaded to two separate S3 buckets. One is backed by Wasabi and the other by Backblaze. So if one fails, randomly bans my account, etc., then I can switch the primary to the other and set up another mirror afterwards.
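Roughly, the two-bucket setup looks like the sketch below. It uses in-memory stand-ins instead of real S3 clients, and `Bucket`, `MirroredStorage`, and `promote_mirror` are names invented for this example. The automatic read fallback is an assumption for illustration; the switch described above is a manual one.

```python
class Bucket:
    """In-memory stand-in for an S3 bucket at some provider."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}
        self.available = True

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.objects[key]


class MirroredStorage:
    """Writes every file to both buckets; reads prefer the primary."""

    def __init__(self, primary: Bucket, mirror: Bucket):
        self.primary = primary
        self.mirror = mirror

    def put(self, key: str, data: bytes) -> None:
        self.primary.put(key, data)
        self.mirror.put(key, data)

    def get(self, key: str) -> bytes:
        try:
            return self.primary.get(key)
        except ConnectionError:
            return self.mirror.get(key)  # fall back to the second copy

    def promote_mirror(self) -> None:
        # Swap roles, e.g. after the primary provider fails or bans the account.
        self.primary, self.mirror = self.mirror, self.primary


storage = MirroredStorage(Bucket("wasabi"), Bucket("backblaze"))
storage.put("cat.png", b"image bytes")
storage.primary.available = False  # simulate losing the primary provider
recovered = storage.get("cat.png")
storage.promote_mirror()
```

After the swap, a fresh bucket at a third provider could take over the mirror role.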

Compute is hosted by fly.io and the CDN is bunny.net

JaggedRobotPubes@lemmy.world · 17 days ago

This feels like something the Fediverse is ultimately going to build for itself. I know jack squat about the details, but it’s gonna have to be a thing eventually, I think.

HulkSmashBurgers@reddthat.com · 17 days ago

I think IPFS could help the fediverse with storage.

osaerisxero@kbin.melroy.org · 17 days ago

I expect something like this to end up being the solution, but I think we’re far from a consensus in that regard.

hendrik@palaver.p3x.de · edit-2 16 days ago

I think most architecture design decisions are made by the developers of the fediverse projects. If the 3 Lemmy devs or the Mastodon maintainers agreed to do it… (And it’s technically feasible.) I suppose it could be done.

I mean as long as it works seamlessly and doesn’t violate ActivityPub, we don’t really need a consensus of all the users and admins. We just need the server admins to install the next update.

cum@lemmy.cafe · edit-2 16 days ago

To actually keep data persistent on IPFS and not be deleted by the garbage collector, you need to have a server(s) pin the node that holds that data.

You either host these servers yourself, or pay providers to store it for you.

And at that point you just reinvented a server simply hosting your data but with extra steps.

sosodev@lemmy.world · 16 days ago

Thank you for pointing that out. I’m not familiar with IPFS but I tend to agree there’s no free lunch here. People think you can wave the blockchain wand and free computing appears but there’s always costs built in somewhere.

bulwark@lemmy.world · 17 days ago

I wish there was some version of PBS for Lemmy, like public funds for hosting. I’ll admit I haven’t really thought this through, so there’s probably some problems with my idea.

Trainguyrom@reddthat.com · 10 days ago

At least as far as US law is concerned, a federally hosted and administrated social media platform gets interesting with America’s unusually strong free speech laws, since there’s content which is legal but unethical which they likely would not be allowed to block or moderate, such as bullying, hate speech, misinformation, etc. but also illegal content would be immediately moderated away, which might include content that falls into legal grey areas or ethical but technically illegal content, like someone copy/pasting the contents of a paywalled article, or discussing any kind of DRM or digital security bypass

Honestly I think there’s good reason for governments to host a Mastodon instance for their representatives to use for communications, but inviting the public to use it might get weird for sure

bulwark@lemmy.world · 10 days ago

Oh yeah, I totally agree with you that governments should at least host their own Mastodon instances. I thought it was weird when Twitter became the go to for communication from the US Government.

cum@lemmy.cafe · edit-2 16 days ago

Personally I’m in the camp that I want history to be lost. That’s part of the appeal to me. In fact my favorite feature in the fedi is Mastodon’s option to enable auto-deleting posts of a certain age.

Only content that is explicitly pinned or reaches a certain amount of interactions should be saved imo. Since that’s the stuff you’d actually want to preserve rather than the 99% of forgettable content, and it would also drastically cut down on file hosting.
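A retention rule like that could be sketched as follows. The 14-day window and the 50-interaction threshold are placeholder assumptions, and `Post` and `should_keep` are names made up for this example rather than anything Mastodon or Lemmy actually exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    created: datetime
    interactions: int
    pinned: bool = False


def should_keep(post: Post, now: datetime,
                max_age: timedelta = timedelta(days=14),
                min_interactions: int = 50) -> bool:
    """Keep a post if it is pinned, popular enough, or still recent."""
    if post.pinned or post.interactions >= min_interactions:
        return True
    return now - post.created <= max_age


now = datetime(2024, 11, 20)
old = now - timedelta(days=30)
forgettable = Post(created=old, interactions=3)
pinned = Post(created=old, interactions=3, pinned=True)
popular = Post(created=old, interactions=120)
fresh = Post(created=now - timedelta(days=2), interactions=0)
```

A cleanup job would run `should_keep` over stored posts and delete the ones that return `False`, along with their attached files.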

Another thing is that a federation should only act as the exchange between users on ActivityPub. It should only cache relevant information and not be expected to store everything, like I wrote before. The user should be a portable account that is stored on a device. The federation server would sync your account between your devices, but not store it. You send your content to the federation, and then the federation sends it out into the world where they choose to do what they want with it. The federation shouldn’t hoard it indefinitely.

Also this makes sense from a privacy perspective. If you care about privacy, why would you also want all your data indefinitely stored? Unless certain things are relevant and explicitly kept, it should be expected to expire and be lost by default. Where did we get this expectation that data should be stored forever? Also you expect it to be stored forever and not be trained on by AI?

This comment, for example: after about a week or two, most of its visibility and interaction will drop to zero. At that point, this comment should expire and no longer exist. I wrote this comment, it reached some people, and it served its purpose and should expire. I’m not going to pretend like this comment is some kind of historic document that should be indefinitely preserved, nor do I expect or want it to be.

Ludrol@szmer.info · 16 days ago

Can you judge a work of art by its virality? Should you judge by virality?

A lot of times in history, artists got the recognition they deserved only after their death. When they were alive, they lived in poverty, struggling to make ends meet.

There is a lot of internet 1.0 preserved by the Internet Archive that I didn’t get to experience. There are Flash games that I would love to preserve and show the next generation.

We wouldn’t have known what Scott Cawthon’s games looked like before he made FNAF if not for the preservation efforts.

MrMakabar@slrpnk.net · 16 days ago

Usually those artists did get some recognition during their life, but never got into the mainstream. That changed due to the mainstream changing and the people who did like the art showing it again. That is actually rather easy to do with something like the Fediverse. It just requires a download option. Especially when everybody is aware that the content will be deleted, that would be a decent option.

Also a lot of content on social media in general is very short term. Stuff like political discussions are fairly useless after a few months in most cases. So that can be deleted without much care and again, if somebody wants to preserve it, they can easily just download it.

Lemmchen@feddit.org · 14 days ago

This comment for example, after about a week or two most of the visibility and interaction of it will drop to zero. At that point, this comment should expire and no longer exist.

That’s an incredibly naive and egoistic take. Think about all the knowledge that gets lost with this approach. How many times have you searched for some obscure thing and found the answer only in some five-year-old reddit post? That information would be lost forever if you had your way.

cum@lemmy.cafe · 14 days ago

I think the massive privacy benefits outweigh things like that, which should be documented properly anyways

Valmond@lemmy.world · 17 days ago

I think Tenfingers could be an interesting option as hosters do not know what they host, the data can be modified, and it’s 100% decentralised.

TORFdot0@lemmy.world · 16 days ago

Is a p2p system for media with the instances just hosting magnet links too slow for fediverse purposes? To me this seems like the most resilient way to handle media in a decentralized system

ayyy@sh.itjust.works · 16 days ago

If a social network is to take off, it must be accessible from mobile devices behind CGNAT (carrier grade network address translation).

pinkystew@reddthat.com · 16 days ago

Why?

ayyy@sh.itjust.works · 16 days ago

Because that’s where all the users are. The “social” aspect of a “social” network. Anyone can host a forum but it’s useless without users.

56!@lemmy.ml · 16 days ago

p2p from behind a CGNAT works just fine as long as a single server is accessible and can mediate connections between other peers. Most non-servers are behind some sort of NAT these days.