Elevated API Errors

Incident Report for Balena.io

Postmortem

We’ve observed another recurrence of the previous database connectivity issue.

@13:40 PST DB swap usage spiked and settled at around 5GB

DB free memory dropped to around 4GB

@15:10 PST we were alerted that the API was failing to respond

DB connections were topping out at around 800

@15:14 PST the database was failed over to the standby instance and the primary instance was rebooted
@15:21 PST the API deployment was restarted as the existing containers were still reporting errors
@15:27 PST the API began to respond to requests
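
For reference, the gaps between the timeline entries above can be worked out with a short script. This is only an illustrative sketch: the times are copied from the timeline, while the helper name minutes_between is made up for this example.

```python
from datetime import datetime

def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Times taken from the timeline above (all PST, same day).
swap_spike = "13:40"
alerted = "15:10"
failover = "15:14"
api_restart = "15:21"
api_recovered = "15:27"

warning_window = minutes_between(swap_spike, alerted)      # swap spike to alert
time_to_failover = minutes_between(alerted, failover)      # alert to failover
time_to_recover = minutes_between(alerted, api_recovered)  # alert to API responding

print(warning_window, time_to_failover, time_to_recover)
```

Under these timestamps, the swap spike preceded the alert by 90 minutes, and the API was responding again 17 minutes after the alert.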

Things we’ve tried so far:

examine DB monitoring metrics
attempt to correlate specific queries with times leading up to the outage
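
As a rough illustration of the correlation step, the sketch below filters a list of query records down to the slow ones that started shortly before an outage. Everything in it is an assumption made for the example: the record layout (start time, duration in milliseconds, query text), the function name queries_before_outage, the thresholds, and the sample data. It does not reflect the actual tooling or logs used during this incident.

```python
from datetime import datetime, timedelta

def queries_before_outage(records, outage_start, window_minutes=90, min_duration_ms=1000):
    """Return slow queries that started in the window leading up to the outage,
    slowest first. Each record is (started_at, duration_ms, query_text)."""
    window_start = outage_start - timedelta(minutes=window_minutes)
    hits = [
        r for r in records
        if window_start <= r[0] < outage_start and r[1] >= min_duration_ms
    ]
    return sorted(hits, key=lambda r: r[1], reverse=True)

# Made-up sample records for illustration only; not data from this incident.
# The calendar date is arbitrary; only the time of day mirrors the timeline.
outage_start = datetime(2021, 1, 19, 15, 10)
sample = [
    (datetime(2021, 1, 19, 12, 5), 4200, "query A"),   # before the window
    (datetime(2021, 1, 19, 13, 45), 9500, "query B"),
    (datetime(2021, 1, 19, 14, 30), 300, "query C"),   # below the duration threshold
    (datetime(2021, 1, 19, 15, 2), 2500, "query D"),
]

suspects = queries_before_outage(sample, outage_start)
for started_at, duration_ms, query_text in suspects:
    print(started_at.strftime("%H:%M"), duration_ms, query_text)
```

A 90-minute window is used here because that is roughly the gap between the swap spike and the alert in the timeline; a different window or duration threshold may suit other incidents better.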

Posted Jan 20, 2021 - 01:06 UTC

Resolved

We're experiencing an elevated level of API errors and are currently looking into the issue.

incident @21:40 UTC -- Major API
Heavy database queries blocking new connections from being established.

Posted Jan 14, 2021 - 21:40 UTC