Unnecessary 3am (emergency) call

Help, the database is on fire. Well, it probably is, but the solution may also be easy. Here are a few steps for the part-time MySQL DBA/sysadmin/developer. Total time to address the problem: 2 minutes. The inability to get back to sleep afterwards: not priceless.

First, confirm that access to the DB server is possible, and that the mysql command-line client works. If you are getting “Too many connections”, read Why GRANT ALL is bad.
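A quick sanity check from the shell confirms both (the hostname and credentials here are placeholders for your environment):

$ mysql -u root -p -h db-host-01 -e "select version(), now()"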

Given the message “the database is on fire”, the likely cause is rogue SQL.

mysql> show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 5000  |
+-----------------+-------+
1 row in set (0.02 sec)

mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_cached    | 1     |
| Threads_connected | 5001  |
| Threads_created   | 5002  |
| Threads_running   | 5001  |
+-------------------+-------+
4 rows in set (0.01 sec)

Yeah, as expected. A SHOW PROCESSLIST in this situation shows all queries stuck in the “query end” state, which to me indicates the obvious culprit: a disk space problem.
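With 5,000 threads the raw processlist is unwieldy; a summary from information_schema gives the same picture faster (a general sketch, not output captured during this incident):

mysql> select state, count(*) as threads, max(time) as longest_seconds
    ->   from information_schema.processlist group by state order by threads desc;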

For the record, I do not recommend setting max_connections to 5,000. MySQL (and Linux) does not function well with 5,000 concurrent connections, especially when they all want to create a disk-based temporary table, but that’s another story. What is missing is a maximum-threads-running configuration option, along with applicable coding practices and proactive server connection management, to prevent such a situation.
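Since a stock server offers no such threads-running cap, a more defensive configuration at least keeps connection limits at a level the host can actually sustain. A sketch only; the numbers are illustrative assumptions to be sized for your hardware and workload:

[mysqld]
# Illustrative limits only - size these for your environment
max_connections      = 500
# Limit any single account so one runaway application cannot consume every slot
max_user_connections = 400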

Confirming my suspicions:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde             7.9G  2.5G  5.0G  34% /
tmpfs                 7.3G     0  7.3G   0% /dev/shm
/dev/xvdf1            197G   42G  146G  23% /mysql
/dev/xvdh1             99G   94G  8.0K 100% /mysql/binlog

The solution here is simple: remove old binary logs. The recommendation is NOT to delete the files manually but to use PURGE MASTER LOGS. However, this produced an error.

mysql> purge master logs to 'mysql-bin.000029';
ERROR 3 (HY000): Error writing file '/mysql/binlog/mysql-bin.index_crash_safe' (Errcode: 28 - No space left on device)

That’s a new message, possibly a new MySQL 5.6 error situation. The only choice now is to remove a physical file first. A suggestion to MySQL engineers: create this file during normal operations with sufficient blank bytes pre-allocated, so MySQL can still write it when the disk is full, avoiding the need for manual file manipulation.

$ rm mysql-bin.000002

Then:

mysql> purge master logs to 'mysql-bin.000029';
Query OK, 0 rows affected, 1 warning (1.37 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------+
| Level   | Code | Message                                                       |
+---------+------+---------------------------------------------------------------+
| Warning | 1612 | Being purged log /mysql/binlog/mysql-bin.000002 was not found |
+---------+------+---------------------------------------------------------------+
1 row in set (0.01 sec)
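With space reclaimed, it is also worth reducing the chance of a repeat. A sketch only; the 7-day retention is an assumption and must comfortably exceed your slave lag and backup cycle:

mysql> set global expire_logs_days = 7;

And the matching my.cnf entry so the setting survives a restart:

[mysqld]
expire_logs_days = 7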

As always, a recommendation when problems occur is to check the error log (see Monitoring the MySQL Error Log). Here the issue was also immediately apparent.

2013-08-23 06:28:36 30609 [Warning] Disk is full writing '/mysql/binlog/mysql-bin.000097' (Errcode: 28 - No space left on device). Waiting for someone to free space...
2013-08-23 06:28:36 30609 [Warning] Retry in 60 secs. Message reprinted in 600 secs
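If you are not sure where the error log lives, the server will tell you, and a simple grep finds this condition quickly (the log path below is a placeholder):

mysql> show global variables like 'log_error';
$ grep "No space left on device" /path/to/mysqld.err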

I would also like to point out that being proactive, with monitoring and instrumentation in place, is critical in ALL startups if you want to be successful. This is Point 1 of my Successful Scalability Principles recommendations.
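As a trivial example of the kind of instrumentation meant here (a sketch only; the mount point, threshold and alert mechanism are assumptions), a cron-able disk check for the binary log filesystem might look like:

#!/bin/sh
# Sketch: warn when the binary log filesystem exceeds a usage threshold.
# Mount point, threshold and alert address are placeholders for illustration.
MOUNT=/mysql/binlog
THRESHOLD=80
USAGE=$(df -P "$MOUNT" | awk 'NR==2 {gsub("%","",$5); print $5}')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
  echo "WARNING: ${MOUNT} is ${USAGE}% full" | mail -s "MySQL binlog disk space" dba@example.com
fi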

Was it really an emergency?
