Amazon Prime Day traffic led to crashed servers and over an hour of outages


Monday, July 16th marked Amazon’s annual Prime Day sale, offering discounts on products site-wide.

However, Amazon’s sites across the globe were immediately plagued by issues when the Prime Day sales went live at 3pm ET. Users were unable to browse products properly, with links getting stuck in an endless loop that led back to the home page.

Now, a new report has shed some light on what exactly went wrong on Amazon’s end on Prime Day.

According to internal Amazon documents obtained by CNBC, Amazon was unable to keep up with the global surge in traffic, resulting in widespread server crashes. Further complications, external experts told CNBC, likely came from a failure in Amazon’s auto-scaling feature, which automatically detects traffic fluctuations and adjusts server capacity accordingly. These experts say a lack of auto-scaling functionality presumably forced the e-commerce giant to add servers manually to keep up with demand.
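
To illustrate what an auto-scaling rule does in general terms, here is a minimal Python sketch. It is not Amazon’s implementation, which the CNBC report does not describe. The function name, the per-server capacity figure, the utilization target and the fleet limits are all hypothetical values chosen for illustration.

```python
import math


def desired_servers(requests_per_second: float,
                    capacity_per_server: float,
                    target_utilization: float = 0.5,
                    min_servers: int = 2,
                    max_servers: int = 1000) -> int:
    """Return how many servers a simple scaling rule would request.

    All parameters are hypothetical: capacity_per_server is the number of
    requests per second one server can handle at full load, and
    target_utilization is the fraction of that capacity we aim to use.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    # Size the fleet so each server runs at the target utilization.
    usable_capacity = capacity_per_server * target_utilization
    needed = math.ceil(requests_per_second / usable_capacity)

    # Clamp to the configured fleet limits.
    return max(min_servers, min(max_servers, needed))
```

With these assumed defaults, a load of 6,000 requests per second on servers rated at 100 requests per second would call for 120 servers. When a rule like this stops running, capacity stays fixed while traffic climbs, which is the situation the experts say likely forced Amazon to add servers by hand.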

“Significantly higher than expected” order rates put too much pressure on Amazon’s internal ‘Sable’ system, which powers the company’s computing and storage services for its retail and digital businesses. The strain also caused glitches with related services like Prime, authentication and Alexa. The documents even state that some of Amazon’s many warehouses reported being unable to scan products or pack orders for a period of time.

Sable is already used by 400 teams across Amazon and handled a total of 5.623 trillion service requests, or 63.5 million requests per second, during last year’s Prime Day. The additional traffic brought in by Prime Day put an unmanageable strain on Sable’s already heavy workload.

According to the documents, Amazon’s behind-the-scenes response unfolded as follows. With issues occurring across its various platforms around 3pm ET, Amazon quickly decided to put up a temporary simplified homepage to reduce load on its servers. Shortly after, by 3:15pm ET, Amazon chose to temporarily cut off all international traffic to “reduce pressure” on its Sable system.

By 3:37pm, the company had re-opened its default front page to only 25 percent of visitors. While Amazon was able to improve Sable’s performance at 3:40pm, within two minutes it had gone back to “consider” blocking approximately 5 percent of “unrecognized traffic to U.S.” Despite these efforts, Amazon’s site “error rate” worsened until about 4:05pm ET, before finally experiencing a significant improvement at 4:10pm and eventually returning to normal shortly after.
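
Showing a page to a fixed share of visitors, as in the 25 percent figure above, is commonly done by assigning each visitor to a stable bucket. The Python sketch below shows one generic way to do that. The documents do not say how Amazon selected visitors, so the function name, the use of a visitor ID and the hashing scheme are assumptions for illustration only.

```python
import hashlib


def sees_full_page(visitor_id: str, percent: int) -> bool:
    """Decide whether a visitor falls inside the admitted percentage.

    Hypothetical helper: hashes the visitor ID into one of 100 buckets so
    the same visitor always gets the same answer for a given percentage.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    # Hash the ID and map the first 8 bytes onto buckets 0..99.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100

    # Visitors in buckets below the threshold get the full page.
    return bucket < percent
```

Because the bucket depends only on the ID, raising the percentage later admits more visitors without removing anyone who was already let in.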

As CNBC reports, Jeff Wilke, Amazon’s CEO of worldwide retail, noted in an internal email that his team was “disappointed” about the Prime Day issues. A person familiar with the Prime Day issues described Amazon’s office scene to CNBC as “chaotic,” estimating that more than 300 people were brought into an emergency conference call at one point.

In the email, Wilke said Amazon is already working on ways to prevent any similar issues from happening in the future, although he stressed that Prime Day was still a success. “Tech teams are already working to improve our architecture, and I’m confident we’ll deliver an even better experience next year,” he wrote.

Indeed, despite the fact that Amazon reportedly lost $1 million a minute during the outages, the company is still touting this year’s Prime Day as its biggest sale event to date. Altogether, more than 100 million products were sold during the 36-hour sale, with small and medium-sized businesses in particular bringing in over $1 billion in revenue.


Source: CNBC