Partial API Outage

Incident Report for balena.io

Postmortem

Between 00:35 GMT and 01:00 GMT on January 28, 2026, users may have experienced intermittent connectivity issues with the balenaCloud API. Consequently, users may have had difficulty using the balenaCloud dashboard, and some devices also faced delays in reporting their current state to the backend.

Root Cause: The incident was triggered during a planned configuration update to our environment. This update caused a brief, simultaneous restart of several core service components. As these services resumed, a high volume of concurrent device Cloudlink reconnections created a temporary surge in traffic. This surge exceeded the immediate processing capacity of our API backend, leading to increased latency and a temporary reduction in available service instances as they struggled to clear the request queue.

Resolution: Our engineering team monitored the API as it processed the initial backlog. Service stabilized as the surge of reconnection requests subsided and API instances returned to healthy operating levels. Normal service was fully restored by 01:00 GMT.

Follow-up Actions: We are reviewing the API deployment configuration and performance thresholds to better handle rapid spikes in traffic.

We sincerely apologize for the disruption to your workflow and appreciate your continued patience as we improve the resilience of our platform.

Posted Feb 26, 2026 - 13:36 UTC

Resolved

A configuration change causing multiple component restarts resulted in the API being unavailable for a few minutes.
Posted Jan 28, 2026 - 01:00 UTC