Elevated API Errors
Incident Report for Balena.io
Postmortem

We’ve observed a recurrence of the earlier database connectivity issue.

@13:40 PST DB swap usage spiked and settled at around 5GB

  • DB free memory dropped to around 4GB

@15:10 PST we were alerted that the API was failing to respond

  • DB connections were topping out at around 800

@15:14 PST the database was failed over to the standby instance and the primary instance was rebooted
@15:21 PST the API deployment was restarted because the existing containers were still reporting errors
@15:27 PST the API began to respond to requests

Things we’ve tried so far:

  • examine DB monitoring metrics
  • attempt to correlate specific queries with times leading up to the outage
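
As a rough illustration of the query correlation step, the sketch below totals query time within a window before an event such as the swap spike. It is a minimal sketch only: the log format, the query labels, the durations, and the `rank_queries_before` helper are all hypothetical and are not taken from the incident data or from any tooling used during the investigation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical slow-query log entries: (start time, query label, duration in seconds).
# All values are invented for illustration and do not come from the incident.
SAMPLE_LOG = [
    (datetime(2021, 1, 14, 13, 35), "report_query", 40.0),
    (datetime(2021, 1, 14, 13, 38), "device_lookup", 0.2),
    (datetime(2021, 1, 14, 13, 39), "report_query", 55.0),
    (datetime(2021, 1, 14, 12, 0), "report_query", 30.0),  # outside the window
]


def rank_queries_before(log, event_time, window):
    """Total the duration of each query that started within `window` before `event_time`."""
    totals = defaultdict(float)
    for started, label, duration in log:
        if event_time - window <= started <= event_time:
            totals[label] += duration
    # Heaviest queries first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Assumed event time: the swap spike noted in the timeline (13:40 local time).
swap_spike = datetime(2021, 1, 14, 13, 40)
ranked = rank_queries_before(SAMPLE_LOG, swap_spike, timedelta(minutes=10))
print(ranked)
```

Ranking by total duration rather than by count is one possible choice here; it favors a few heavy queries over many cheap ones, which matches the kind of load suspected of blocking new connections.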
Posted Jan 20, 2021 - 01:06 UTC

Resolved
We're experiencing an elevated level of API errors and are currently looking into the issue.

incident @21:40 UTC -- Major API
Heavy database queries are blocking new connections from being established.
Posted Jan 14, 2021 - 21:40 UTC