The database master experienced very high CPU utilization during this incident. To address the elevated utilization, we upgraded a database node and performed a failover to it. We are still investigating why API performance degraded; the slow queries observed throughout the incident suggest a link to a fleet update.
Posted May 29, 2023 - 14:07 UTC
Resolved
This incident has been resolved.
Posted May 27, 2023 - 13:45 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 27, 2023 - 13:42 UTC
Identified
We have found that the performance of the underlying storage for our master database has degraded. Consequently, we executed a failover to one of our replica instances and are monitoring the performance of the promoted database. Note that API response times may remain elevated while the backlog of retrying transactions is processed.
Posted May 27, 2023 - 11:53 UTC
Investigating
We're experiencing an elevated level of API errors and are currently looking into the issue.