Live Streaming Degradation

Incident Report for Livepeer Studio

Postmortem

Incident on 12/19/2024

Overview

On Thursday, December 19th, 2024, an issue in our infrastructure disrupted live streams and video-on-demand (VOD) in the New York, Chicago, Miami, and Madrid regions, degrading streaming services and the user experience in those regions.

Incident Details

An issue within the Livepeer infrastructure triggered the disruption during ongoing maintenance and configuration updates. This caused interruptions in services, including:

  • Streaming Availability: Users in the New York, Chicago, and Miami regions experienced interruptions or inability to access streams.
  • VOD Availability: Users in the Madrid region experienced interruptions or inability to upload or access videos.
  • Playback Availability:

    • Some livestream playback sessions failed to start, disconnected unexpectedly, or experienced buffering.
    • Some assets could not be played back.

Resolution

After identifying the root cause, our team implemented a fix. The fix resolved the service disruptions and restored normal operations across all affected regions.

Mitigation Steps

To mitigate the impact and ensure a smooth recovery, we:

  • Isolated Affected Services: Redirected workloads to unaffected regions to minimize user impact.
  • Applied Fixes: Implemented configuration updates and restarted the impacted service in the affected regions.
  • Monitored Service Restoration: Closely monitored infrastructure recovery to ensure stability.

Root Cause

  • Primary Cause: Configuration changes in Livepeer infrastructure triggered service disruptions.

Impact Assessment

  • Users Affected: Users in the New York, Chicago, and Miami regions experienced streaming interruptions, and users in the Madrid region experienced VOD interruptions.
  • Service Downtime: Approximately 3 hours before all services were fully restored.
  • Impact Scope: Regional degradation of streaming services with no data loss reported.

Next Steps

  • To prevent configuration changes from impacting Livepeer’s service, we will prioritize implementing processes and tools to monitor, validate, and maintain stability both during and after these changes.
Posted Dec 19, 2024 - 18:32 UTC

Resolved

We have isolated the issue and implemented a fix.
Posted Dec 19, 2024 - 17:18 UTC

Investigating

Some servers are not responding. We are currently investigating the root cause.
Posted Dec 19, 2024 - 15:51 UTC
This incident affected: Livepeer Studio Ingest and Playback (Chicago (MDW) ingest and playback, New York (NYC) ingest and playback, Miami (MIA) ingest and playback).