API not responding
Incident Report for Livepeer Studio
Postmortem

API Outage April 29, 2024

Summary

This post-mortem describes the incident investigated on 04/29/24: https://status.livepeer.studio/incidents/2mklfnf2hqbf

Incident

Description

Internal alerts notified the Livepeer Studio team of high memory and CPU utilization within the queuing system. A required update to the queuing system had previously been tested successfully in the staging environment. However, when it was deployed to production, the upgrade became stuck, which led to the issue.

Impact

  • Livestreams:

    • New streams could not be started
  • Viewers:

    • Only existing streams could be viewed

Regions:

  • Europe (Sweden/Russia), North America (Los Angeles/New York), South America (Brazil)

Current status

The service has been fully restored.

https://status.livepeer.studio/

Timeline

  • 7:52 AM EST - The Livepeer Studio team was alerted to an incident related to APIs not responding
  • 7:57 AM EST - The Livepeer Studio team's investigation found that tasks in the AMQP queuing system had become disconnected and backed up. This caused high CPU and memory consumption, which led to tasks timing out
  • 9:10 AM EST - The Livepeer Studio team's automatic upgrade of the queuing system became stuck partway through, which caused this issue
  • 8:23 AM EST - The Livepeer Studio team put a fix in place and began monitoring the systems
  • 9:55 AM EST - After monitoring the fix for the incident, the Livepeer Studio team concluded that the issue was resolved

Prevention

We are conducting broader audits and revamping our queue utilization practices.
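As an illustration only, the failure mode described in the timeline (tasks disconnected and backed up in the queue) could be caught by a check along the following lines. This is a minimal sketch and not Livepeer Studio's actual tooling: the `QueueStats` type, the `find_unhealthy_queues` function, the queue names, and the `max_ready` threshold are all hypothetical, and a real check would read these numbers from whatever the queuing system exposes.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Hypothetical snapshot of one queue's state."""
    name: str
    ready_messages: int  # tasks waiting to be picked up
    consumers: int       # workers currently connected


def find_unhealthy_queues(stats, max_ready=1000):
    """Return (queue name, reason) pairs for queues that look disconnected or backed up.

    max_ready is an assumed, illustrative threshold.
    """
    problems = []
    for q in stats:
        if q.consumers == 0 and q.ready_messages > 0:
            # Work is waiting but nothing is connected to process it.
            problems.append((q.name, "no consumers connected"))
        elif q.ready_messages > max_ready:
            # Consumers exist but are not keeping up with the backlog.
            problems.append((q.name, "backlog above threshold"))
    return problems


if __name__ == "__main__":
    snapshot = [
        QueueStats("tasks", ready_messages=5, consumers=0),
        QueueStats("events", ready_messages=5000, consumers=3),
        QueueStats("webhooks", ready_messages=10, consumers=2),
    ]
    for name, reason in find_unhealthy_queues(snapshot):
        print(f"{name}: {reason}")
```

Alerting on queue depth and consumer count, rather than only on CPU and memory, would in principle surface a stuck or disconnected queue before tasks begin to time out.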

Posted May 01, 2024 - 16:26 UTC

Resolved
This incident has been resolved.
Posted Apr 29, 2024 - 11:30 UTC