Thursday 2nd February 2017

Sol Backporting concurrency deadlock check

On February 2, beginning at approximately 7:10 AM EST (GMT-05:00) and ending by 7:35 AM EST, Sol experienced connectivity issues with MySQL. The root cause was a deadlock bug that can be triggered when several conditions coincide. This bug has been a known concern since it was first encountered with MySQL 4.0, released in 2003; since then, steps have been taken to reduce exposure to it.

Sol uses a second-generation monitor that works more efficiently, tracks a wider array of defects, and can detect service flaps. Until today, that monitor did not check for this deadlock bug. The check has now been ported to Sol and newer platforms, so incidents like this should be detected and resolved going forward.

For those curious about the bug conditions: (1) the table storage engine must be MyISAM; (2) the user must be over quota; (3) identical UPDATE or INSERT queries must be issued within a millisecond of each other; and (4) another account must issue an UPDATE or INSERT query between those two queries, before they are postponed due to the quota overage.
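To make the four conditions concrete, here is a minimal Python sketch of a predicate that checks whether three observed queries line up the way the conditions describe. The `Query` record, the `matches_bug_conditions` function, and the `postponed_ms` parameter are hypothetical names for illustration only. The sketch assumes the two identical queries come from the same over-quota account and treats "within a millisecond" as a gap of at most 1 ms. It is not Sol's actual monitor check.

```python
from dataclasses import dataclass
from typing import AbstractSet

WRITE_VERBS = {"UPDATE", "INSERT"}


@dataclass(frozen=True)
class Query:
    """Hypothetical record of one query as observed by the server."""
    account: str      # account that issued the query
    sql: str          # full query text
    issued_ms: float  # issue time, in milliseconds
    engine: str       # storage engine of the target table, e.g. "MyISAM"


def is_write(q: Query) -> bool:
    """True if the query text starts with UPDATE or INSERT."""
    words = q.sql.split()
    return bool(words) and words[0].upper() in WRITE_VERBS


def matches_bug_conditions(first: Query, second: Query, other: Query,
                           over_quota_accounts: AbstractSet[str],
                           postponed_ms: float) -> bool:
    """Return True only if all four listed conditions hold.

    postponed_ms is the (assumed) time at which the over-quota
    account's queries get postponed.
    """
    # (1) The table storage engine must be MyISAM.
    if first.engine != "MyISAM" or second.engine != "MyISAM":
        return False
    # (2) The user must be over quota.
    # Assumption: both identical queries come from the same account.
    if first.account != second.account or first.account not in over_quota_accounts:
        return False
    # (3) Identical UPDATE/INSERT queries issued within 1 ms of each other.
    if not (is_write(first) and first.sql == second.sql):
        return False
    if not (first.issued_ms < second.issued_ms <= first.issued_ms + 1.0):
        return False
    # (4) Another account issues an UPDATE/INSERT between the two queries,
    # before the over-quota queries are postponed.
    if other.account == first.account or not is_write(other):
        return False
    return (first.issued_ms < other.issued_ms < second.issued_ms
            and other.issued_ms < postponed_ms)


a = Query("alice", "UPDATE t SET n = n + 1 WHERE id = 1", 100.0, "MyISAM")
b = Query("alice", "UPDATE t SET n = n + 1 WHERE id = 1", 100.8, "MyISAM")
c = Query("bob", "INSERT INTO log VALUES (1)", 100.4, "MyISAM")
print(matches_bug_conditions(a, b, c, {"alice"}, postponed_ms=101.0))  # True
```

Dropping any single condition, such as removing "alice" from the over-quota set or switching the table to InnoDB, makes the predicate return False. This mirrors why the bug needs all four conditions to line up at once.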