An update to the backend VPN configuration caused balenaOS os-config service on hundreds of thousands of devices to pick up the new configuration from the API at the same time and attempt to reconnect to the new VPN backend instance.
The resulting thundering herd effect caused system instability lasting a number of hours. All services were eventually restored. We are now looking at improvements on the OS side to prevent this from happening in the future, such as adding random delays ensuring device updates are staggered.