The database master experienced very high CPU utilization during this incident. To address the elevated utilization, we upgraded a database node and performed a failover to it. We are still investigating why API performance degraded; the slow queries observed throughout the incident suggest a link to a fleet update.
Posted May 29, 2023 - 14:07 UTC
Resolved
This incident has been resolved.
Posted May 27, 2023 - 13:45 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 27, 2023 - 13:42 UTC
Identified
We have found that the performance of the underlying storage for our master database has degraded. Consequently, we executed a failover to one of our replica instances and are monitoring the performance of the promoted database. Note that API response times may remain elevated while the backlog of retrying transactions is processed.
Posted May 27, 2023 - 11:53 UTC
Investigating
We're experiencing an elevated level of API errors and are currently looking into the issue.