An Amazon Web Services outage has been causing major disruptions around the world. The service provides remote computing services to many apps, websites, governments, universities and companies.

On Downdetector, a website that tracks online outages, users reported issues with Amazon Alexa, Amazon Prime, Snapchat, Ring, Roblox, Fortnite, online broker Robinhood, the McDonald’s app and many others.

  • despite_velasquez@lemmy.world
    22 hours ago

    I think most companies don’t have a three nines SLA with their customers, yet were sold the idea that cloud (… and then serverless) should be the right decision for them.

    When the initial cloud migration happened, I saw a handful of startups and scale-ups go bankrupt doing lift and shift.

    Don’t get me wrong, I agree with what you’re saying. My point is more about the tribal consensus that was built in the tech community around 2016-2018: that the cloud is the future, for everyone, and that managing your own infrastructure is being a brute.

    • NuXCOM_90Percent@lemmy.zip
      21 hours ago

      With ANY of the “nines” notation, a good rule of thumb is to move the decimal point two or three spots to the left. But it is more the mindset and planning built around that.
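
For reference, the arithmetic behind the “nines” notation can be sketched as below. This is a minimal illustration, not anything from a provider's SLA: the function name `downtime_budget_hours` and the 365-day year are assumptions made for the example. It only converts a nominal availability figure into the downtime it permits; three nines (99.9%) works out to roughly 8.76 hours per year.

```python
def downtime_budget_hours(nines: int, period_hours: float = 365 * 24) -> float:
    """Downtime (in hours) permitted per period at an availability of `nines` nines.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001 (99.9% available)
    return period_hours * unavailability


if __name__ == "__main__":
    # Print the yearly downtime budget for one through five nines.
    for n in range(1, 6):
        availability_pct = 100 * (1 - 10 ** (-n))
        hours = downtime_budget_hours(n)
        print(f"{n} nines ({availability_pct:.3f}%): {hours:.2f} hours/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the planning and staffing around the number matter more than the number itself.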

      For MOST companies and products? “Shit broke, we’ll fix it in the morning” is 100% reasonable. But when you are big enough that you are on the front page of downdetector? EVERYONE comes out of the woodwork to insist you are horrible and mismanaged and blahdy blah blah. Which might actually have investor implications.

      Which is the other aspect. If I am going to pay a hosting company (with my business hat on), I need some uptime metrics/guarantees. Violate those and I am expecting compensation. Violate those sufficiently and my bosses are going to have the lawyers see how much of our bad Q2 we can blame on the hosting company. And… there is a lot of value in the department head’s responsibility being to send angry emails to Amazon rather than figuring out which employee is getting fired… and if it is them.

      But yeah. I saw someone else make the joke of “on -> off -> on -> off” prem cycles but… that is kind of reality.

      When you are three people in a garage moonlighting in a way that you can pretend this all started after you all turn in your notice (seriously. One of my favorite goofing off activities is to check the repository of any company that actually has an open source project and laugh at how many MRs and commits were apparently done over the course of a month and TOTALLY weren’t rewritten for legal purposes)? Your very initial proof of concept might be a server in a closet but you very rapidly will shift to “the cloud” because you don’t have the resources for a full time IT person to even manage the VPS, let alone a rack.

      Then, as you get bigger, you hire that sysadmin and either switch to a VPS or on prem to save money. Then you get bigger still and realize that sysadmin’s team is as big as engineering and start looking for ways to cut/offload costs… which tends to be The Cloud.

      Then you get sufficiently large and have the kinds of customers where data protection is a full time job and start realizing it makes more sense to hire back the two or three competent sysadmins you had and rent some place in a data center. And THEN you get big enough that the entire world notices if you go down for 5 minutes and…

      And… yeah. A lot of companies will fail at one of those points. Partially because they don’t run the numbers and factor in their runway. But also because those tend to be when work structures are most taxed. A whiteboard where people grab index cards works until you have teams that might not be fully staffed by people with double digit percentages of the company stocks and so forth.