Outage - Livestreams are down in some regions

Incident Report for Livepeer Studio

Postmortem

Summary

This post-mortem describes the incident investigated on 05/20/24: https://status.livepeer.studio/incidents/tdrw49vj8y87

Incident

Description

Users reported frequent rebuffering and the inability to start or view streams. Upon investigation, the Livepeer Studio team discovered that this issue affected all regions. Our primary cloud storage provider reported an outage on May 20 from 15:11 UTC to 15:59 UTC. During this outage, a bug in the retry mechanism for uploading recordings caused several servers to lock up and become unresponsive.

Impact

  • Livestreams:

    • Active livestreams in some regions (London, Frankfurt, Stockholm) experienced rebuffering
    • New livestreams in some regions may have been unable to start
  • Viewers:

    • Current and new viewers in some regions experienced a high rebuffer rate
  • VOD:

    • Uploaded assets and live recordings took a long time to process

Current status

The service has been fully restored.

https://status.livepeer.studio/

Timeline

  • 11:00 AM EDT (15:00 UTC) - An internal alert prompted the Livepeer Studio team to investigate its cause
  • 11:11 AM EDT (15:11 UTC) - A status alert from our storage provider notified us of an outage in one of the US regions
  • 12:05 PM EDT (16:05 UTC) - The investigation identified the storage provider outage that began at 11:11 AM EDT as one of the causes of this incident
  • 12:18 PM EDT (16:18 UTC) - After monitoring the fix for the incident, the Livepeer Studio team concluded that the issue was resolved

Prevention

  • We enhanced our retry mechanism and implemented additional failover solutions. These solutions switch to our secondary (backup) storage provider if the primary storage provider experiences an outage.
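
As an illustration of the kind of bounded retry with failover described above, the following Python sketch tries each storage provider in order, retries a limited number of times with exponential backoff, and then moves on to the next provider instead of retrying indefinitely. It is not Livepeer Studio's actual implementation: `upload_with_failover`, `UploadError`, and the provider callables are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, Sequence


class UploadError(Exception):
    """Raised when every provider has been tried and none succeeded."""


def upload_with_failover(
    data: bytes,
    providers: Sequence[Callable[[bytes], str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Upload `data` to the first provider that accepts it.

    Each provider (hypothetical callable: bytes -> storage URL) gets at most
    `max_attempts` tries, so a provider outage cannot cause endless retries.
    """
    last_error: Optional[Exception] = None
    for upload in providers:  # primary first, then backups
        for attempt in range(max_attempts):
            try:
                return upload(data)
            except Exception as exc:  # sketch only: real code should catch narrower errors
                last_error = exc
                # Back off before retrying the same provider (0.5s, 1s, 2s, ...).
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
        # Retries exhausted for this provider: fall through to the next one.
    raise UploadError("all storage providers failed") from last_error


if __name__ == "__main__":
    def primary(data: bytes) -> str:
        # Simulates the primary provider being unavailable.
        raise ConnectionError("primary storage outage")

    def secondary(data: bytes) -> str:
        return "secondary://recordings/example"

    # The sleep is replaced with a no-op so the demo finishes immediately.
    print(upload_with_failover(b"segment", [primary, secondary], sleep=lambda s: None))
```

The key design choice in this sketch is that both loops are bounded: a failing provider consumes at most `max_attempts` tries before the next provider is used, and the function raises once all providers are exhausted rather than blocking the caller.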
Posted May 24, 2024 - 17:51 UTC

Resolved

This incident has been resolved.
Posted May 20, 2024 - 16:18 UTC

Investigating

We are currently investigating this issue.
Posted May 20, 2024 - 15:28 UTC
This incident affected: Livepeer Streaming API.