Thursday 16th August 2018

Atlas MySQL hard lock

A lock that prevented access to MySQL was noted on August 16 at approximately 3:18 AM EDT (GMT-0400). Internal monitoring is designed to catch these events and restart the affected service automatically. Further analysis of the event revealed a situation in which the MySQL server is active but unresponsive to connections. A second check, which validates connection counts, depends on the first check. If the first check hangs indefinitely during the authentication handshake, the result is a deadlock.

A timeout has been added to both MySQL service checks, which will resolve such situations should they occur in the future.
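
As a rough illustration of what a time-bounded check can look like, here is a minimal sketch in Python. It is not the monitoring code used on Atlas; the function name mysql_responds and its parameters are hypothetical. It relies only on the fact that a MySQL server sends its greeting packet first when a client connects, so a server that accepts the connection but never sends anything is treated as unresponsive once the timeout expires.

```python
import socket


def mysql_responds(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if the server sends any greeting bytes within `timeout` seconds.

    Hypothetical helper for illustration only; it does not authenticate.
    """
    try:
        # The timeout bounds the TCP connect as well as the read below.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            # A MySQL server speaks first; an empty read means it closed on us.
            return len(conn.recv(4)) > 0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A dependent check, such as one that validates connection counts, could call a helper like this first and skip its own work when the result is False, so that a hung handshake fails within a bounded time.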