Partial VPN and Device URLs outage

Incident Report for balena.io

Postmortem

When a device connects to our VPN pool, it is assigned to an instance, and we store that assignment in our database so that we can resolve future device URL requests. When a device URL request comes in, it hits a VPN instance at random, which checks whether the device is connected to it. If the device is not connected to that instance, the instance looks up the correct instance in the database and forwards the request there.

Under certain circumstances, the instance to forward the request to could be the instance itself, causing an infinite forwarding loop that led to a large number of API requests and a large volume of network traffic.
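
The failure mode above can be illustrated with a minimal sketch. This is not our production code, and this report does not describe the exact fix that was deployed. The class `VpnInstance`, its fields, and the returned strings are all hypothetical, and a plain dictionary stands in for the database. The sketch only shows one possible guard: an instance refuses to forward a request to itself.

```python
from dataclasses import dataclass, field


@dataclass
class VpnInstance:
    """Hypothetical model of one VPN instance in the pool."""
    instance_id: str
    registry: dict  # device_id -> instance_id; stands in for the database
    pool: dict      # instance_id -> VpnInstance
    connected: set = field(default_factory=set)

    def handle(self, device_id: str) -> str:
        # Serve directly when the device is connected to this instance.
        if device_id in self.connected:
            return f"served by {self.instance_id}"

        # Otherwise look up which instance the database says owns the device.
        owner = self.registry.get(device_id)
        if owner is None:
            return "device not connected"

        # Guard: the database points back at this instance even though the
        # device is not connected here. Forwarding would loop forever.
        if owner == self.instance_id:
            return "stale entry: not forwarding to self"

        return self.pool[owner].handle(device_id)
```

Without the `owner == self.instance_id` check, a stale database entry pointing at the receiving instance would make `handle` call itself repeatedly, which corresponds to the loop described above.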

On the day of the incident we immediately implemented a short-term fix and started developing a longer-term fix. The long-term fix was deployed on April 7th.

Posted Apr 09, 2019 - 17:27 UTC

Resolved

This incident has been resolved.

Posted Apr 01, 2019 - 14:35 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Apr 01, 2019 - 14:10 UTC

Investigating

We are currently investigating this issue.

Posted Apr 01, 2019 - 13:33 UTC

This incident affected: Device URLs and Cloudlink (VPN).