Elevated Device SSH Errors

Incident Report for balena.io

Postmortem

On June 25, 2025, at around 17:20 UTC, a routine infrastructure deployment caused intermittent availability issues with Device URLs and web terminal access. Devices remained online and functional throughout, and CLI-based SSH access was unaffected.

The issue was caused by a configuration change that intentionally disabled several internal services no longer required by our proxy infrastructure. However, pod health checks still depended on these services, so the checks began to fail once the services were disabled. A misconfigured override mechanism applied the change to production before it had passed all required release gate checks, which would have caught the failing health checks.
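As an illustration of the kind of check a release gate could run, the following is a minimal sketch, not our actual tooling. It assumes a simplified configuration model in which each pod lists its enabled services and the services its health checks probe. The names `PodConfig`, `orphaned_health_checks`, `gate_passes`, and the service names `tunnel` and `legacy-metrics` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PodConfig:
    """Hypothetical, simplified view of one pod's configuration."""
    name: str
    enabled_services: set[str]
    # Services that the pod's health checks probe.
    health_check_targets: set[str] = field(default_factory=set)


def orphaned_health_checks(pod: PodConfig) -> set[str]:
    """Return health check targets that point at services which are not enabled."""
    return pod.health_check_targets - pod.enabled_services


def gate_passes(pods: list[PodConfig]) -> bool:
    """A release gate passes only if no pod has an orphaned health check."""
    return all(not orphaned_health_checks(pod) for pod in pods)


# Example: a service was disabled, but a health check still probes it.
proxy = PodConfig(
    name="proxy",
    enabled_services={"tunnel"},
    health_check_targets={"tunnel", "legacy-metrics"},
)
print(sorted(orphaned_health_checks(proxy)))  # ['legacy-metrics']
print(gate_passes([proxy]))                   # False
```

A check like this only helps if every change passes through it, so an override path that skips the gate would need to be restricted or validated separately.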

The issue was identified quickly through automated monitoring, and service was restored manually while a permanent fix was deployed. We have since corrected the underlying configuration override mechanism and are adding monitoring coverage to catch similar issues before they reach production.

We apologize for the disruption and thank you for your patience.

Posted Feb 25, 2026 - 18:42 UTC

Resolved

This incident has been resolved.
Posted Feb 24, 2026 - 20:54 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Feb 24, 2026 - 18:41 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Feb 24, 2026 - 18:29 UTC

Update

We are continuing to investigate this issue.
Posted Feb 24, 2026 - 17:29 UTC

Investigating

We're experiencing an elevated level of device SSH errors and are currently looking into the issue.
Posted Feb 24, 2026 - 17:27 UTC
This incident affected: SSH proxy.