The issue was discovered when builders began crash-looping, with error codes related to overly long strings caused by large buffer sizes. It was quickly traced to an accumulation of uncleaned images on the builder instances.
A manual volume prune was run to remove the accumulated images and temporarily restore builder performance.
A patch is awaiting review that lowers the maximum storage the builder workers may consume before automatic cleanup is triggered. Projects have also been proposed to patch the upstream libraries and/or move to ephemeral builder workers.
We also have a project in the queue to add host metrics monitoring to the builder-worker nodes to catch issues like this earlier.