Elevated GIT/Application Builder Errors
Incident Report for Balena.io
Postmortem

Over the past few months, we’ve been focussed heavily on improving the performance of the balenaCloud backend in order to scale with the growing number of devices joining the platform.

As part of that work, we recently implemented cross-instance metrics throttling, so that the API instances in the cluster are aware of each other when throttling incoming device metrics. Yesterday, however, we discovered a bug in our original implementation that effectively negated the throttling gains and increased the load on the backend database by a factor of 3-4.
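
To illustrate why cross-instance awareness matters, here is a minimal sketch. It is not balenaCloud's actual implementation, and the postmortem does not describe the bug itself. The names `SharedStore`, `ApiInstance` and `simulate`, the 10-second interval and the round-robin routing are all assumptions made for illustration. The sketch compares instances that each keep their own throttle state with instances that consult one shared store.

```python
from typing import Dict


class SharedStore:
    """Hypothetical stand-in for throttle state visible to every API instance."""

    def __init__(self) -> None:
        self._last_write: Dict[str, float] = {}

    def try_claim(self, device_id: str, now: float, interval: float) -> bool:
        """Return True if no write for this device was recorded within `interval` seconds."""
        last = self._last_write.get(device_id)
        if last is not None and now - last < interval:
            return False
        self._last_write[device_id] = now
        return True


class ApiInstance:
    """Hypothetical API instance that counts how many metrics writes reach the database."""

    def __init__(self, store: SharedStore, interval: float) -> None:
        self.store = store
        self.interval = interval
        self.db_writes = 0

    def report_metrics(self, device_id: str, now: float) -> bool:
        # Only write to the database when the throttle state allows it.
        if self.store.try_claim(device_id, now, self.interval):
            self.db_writes += 1
            return True
        return False


def simulate(shared: bool, instances: int = 3, reports: int = 30,
             interval: float = 10.0) -> int:
    """One device reports once per second; requests are routed round-robin (assumed)."""
    common = SharedStore()
    apis = [
        ApiInstance(common if shared else SharedStore(), interval)
        for _ in range(instances)
    ]
    for t in range(reports):
        apis[t % instances].report_metrics("device-1", float(t))
    return sum(api.db_writes for api in apis)


if __name__ == "__main__":
    print(simulate(shared=True))   # 3 database writes in 30 seconds
    print(simulate(shared=False))  # 9 database writes in 30 seconds
```

In this toy setup, three instances with independent throttle state produce three times as many database writes as instances sharing that state, so a throttle that is not applied across instances can multiply database load by roughly the number of instances.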

We’ve now deployed a fix that corrects and optimizes the handling of cross-instance metrics throttling across all API backends, and we have observed no further instances of unsustainably high backend load.

Posted Nov 10, 2021 - 15:37 UTC

Resolved
This incident has been resolved.
Posted Nov 10, 2021 - 00:26 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 10, 2021 - 00:17 UTC
Update
We've deployed a fix to the backend ensuring better performance under increased load and are now re-enabling device metrics.
Posted Nov 09, 2021 - 23:00 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 09, 2021 - 20:52 UTC
Update
We are continuing to investigate this issue.
Posted Nov 09, 2021 - 20:38 UTC
Update
While we continue to investigate, we've temporarily disabled device metrics.
Posted Nov 09, 2021 - 20:38 UTC
Investigating
We're experiencing an elevated level of errors in our application builder infrastructure and are currently looking into the issue.
Posted Nov 09, 2021 - 20:30 UTC
This incident affected: API and Application Builder.