Starting around March 11, some cloud builds began failing intermittently with "no such image" errors. The failures were non-deterministic and affected all architectures. At peak, some users saw failure rates of around 50%.
We identified and fixed several bugs in the builder's image garbage collector that caused it to over-count freed disk space and run too aggressively, eventually deleting images that in-progress builds still needed. Fixes were deployed between March 19 and April 14, with build failure rates dropping to near-zero after the final deploy.
We're continuing to monitor build failure rates and are working on additional safeguards to prevent the garbage collector from targeting images that active builds depend on.
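To illustrate the kind of safeguard described above, here is a minimal sketch of a garbage collector that skips images pinned by in-progress builds and counts only the bytes it actually deletes. It is not the builder's real implementation: `Image`, `ImageStore`, `pin`, `unpin`, and `collect` are hypothetical names chosen for this example, and the least-recently-used eviction order is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Image:
    image_id: str
    size_bytes: int
    last_used: float  # timestamp; a smaller value means less recently used


@dataclass
class ImageStore:
    """Hypothetical image store, used only to illustrate the safeguard."""

    images: Dict[str, Image] = field(default_factory=dict)
    # image_id -> number of in-progress builds that still need the image
    pins: Dict[str, int] = field(default_factory=dict)

    def add(self, image: Image) -> None:
        self.images[image.image_id] = image

    def pin(self, image_id: str) -> None:
        # Called when a build starts depending on an image.
        self.pins[image_id] = self.pins.get(image_id, 0) + 1

    def unpin(self, image_id: str) -> None:
        # Called when a build finishes or no longer needs the image.
        remaining = self.pins.get(image_id, 0) - 1
        if remaining > 0:
            self.pins[image_id] = remaining
        else:
            self.pins.pop(image_id, None)

    def collect(self, bytes_to_free: int) -> List[str]:
        """Delete least-recently-used, unpinned images until the target is met."""
        freed = 0
        deleted: List[str] = []
        for image in sorted(self.images.values(), key=lambda i: i.last_used):
            if freed >= bytes_to_free:
                break
            if self.pins.get(image.image_id, 0) > 0:
                continue  # an active build depends on this image; never delete it
            del self.images[image.image_id]
            freed += image.size_bytes  # count only bytes that were really removed
            deleted.append(image.image_id)
        return deleted
```

In this sketch, a build would call `pin` before it starts using an image and `unpin` when it finishes, so a collection pass that runs in between cannot remove the image even if it is the oldest one on disk.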