Elevated API Errors
Incident Report for Balena.io
Postmortem

An update to the backend VPN configuration caused balenaOS os-config service on hundreds of thousands of devices to pick up the new configuration from the API at the same time and attempt to reconnect to the new VPN backend instance.

The resulting thundering herd effect caused system instability lasting a number of hours. All services were eventually restored. We are now looking at improvements on the OS side to prevent this from happening in the future, such as adding random delays ensuring device updates are staggered.

Posted May 08, 2022 - 03:14 UTC

Resolved
This incident has been resolved.
Posted May 08, 2022 - 03:04 UTC
Update
We are continuing to monitor for any further issues.
Posted May 08, 2022 - 03:04 UTC
Update
Devices should begin reconnecting to the backend. This will take some time..
Posted May 08, 2022 - 02:28 UTC
Update
We are continuing to monitor for any further issues.
Posted May 08, 2022 - 02:12 UTC
Update
We are continuing to monitor for any further issues.
Posted May 08, 2022 - 00:26 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 08, 2022 - 00:21 UTC
Identified
We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted May 08, 2022 - 00:09 UTC
This incident affected: API, Dashboard, Device URLs, and Cloudlink (VPN).