MySQL performance schema threads

A trap for those new to the MySQL Performance Schema is the expectation that thread_id in tables such as events_statements_current matches the id you find in the MySQL processlist. This is NOT TRUE.

If we look at the INFORMATION_SCHEMA.PROCESSLIST table we will find information like:

mysql> select id,db,command,state from information_schema.processlist order by id;
| id        | db                 | command | state                                                            |
|         1 | NULL               | Connect | Slave has read all relay log; waiting for the slave I/O thread t |
|         2 | NULL               | Connect | Waiting for master to send event                                 |
|         5 | NULL               | Sleep   |                                                                  |
|  34404870 | NULL               | Sleep   |                                                                  |
| 106759077 | performance_schema | Sleep   |                                                                  |
| 106904549 | performance_schema | Query   | executing                                                        |
| 107004213 | abc                | Sleep   |                                                                  |
| 107004600 | abc                | Sleep   |                                                                  |
| 107004877 | abc                | Sleep   |                                                                  |
| 107004937 | abc                | Sleep   |                                                                  |
| 107004989 | abc                | Sleep   |                                                                  |
| 107005164 | abc                | Sleep   |                                                                  |
| 107005305 | abc                | Sleep   |                                                                  |
| 107005310 | abc                | Sleep   |                                                                  |
| 107005314 | abc                | Sleep   |                                                                  |
| 107005316 | abc                | Sleep   |                                                                  |
| 107005317 | abc                | Sleep   |                                                                  |
| 107005321 | abc                | Sleep   |                                                                  |

However, if I look for the two slave threads in performance_schema.events_statements_current, I find no matching data.

mysql> select * from performance_schema.events_statements_current where thread_id=1;
Empty set (0.00 sec)
mysql> select * from performance_schema.events_statements_current where thread_id=2;
Empty set (0.00 sec)

In the Performance Schema, you need to use the threads table to match processlist entries to threads. This table also lists all the BACKGROUND threads in MySQL.

mysql> select thread_id,name,type,processlist_id from threads;
| thread_id | name                                   | type       | processlist_id |
|         1 | thread/sql/main                        | BACKGROUND |           NULL |
|         2 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         3 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         4 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         5 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         6 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         7 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         8 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|         9 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        10 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        11 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        12 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        13 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        14 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        15 | thread/innodb/io_handler_thread        | BACKGROUND |           NULL |
|        17 | thread/innodb/srv_lock_timeout_thread  | BACKGROUND |           NULL |
|        18 | thread/innodb/srv_error_monitor_thread | BACKGROUND |           NULL |
|        19 | thread/innodb/srv_monitor_thread       | BACKGROUND |           NULL |
|        20 | thread/innodb/srv_master_thread        | BACKGROUND |           NULL |
|        21 | thread/innodb/srv_purge_thread         | BACKGROUND |           NULL |
|        22 | thread/innodb/page_cleaner_thread      | BACKGROUND |           NULL |
|        23 | thread/sql/signal_handler              | BACKGROUND |           NULL |
|        24 | thread/sql/slave_io                    | BACKGROUND |           NULL |
|        25 | thread/sql/slave_sql                   | BACKGROUND |           NULL |
|        28 | thread/sql/one_connection              | FOREGROUND |              5 |
| 107013952 | thread/sql/one_connection              | FOREGROUND |      107013929 |
| 107013989 | thread/sql/one_connection              | FOREGROUND |      107013966 |
| 106759100 | thread/sql/one_connection              | FOREGROUND |      106759077 |
| 107014180 | thread/sql/one_connection              | FOREGROUND |      107014157 |
| 107014291 | thread/sql/one_connection              | FOREGROUND |      107014268 |
| 106904572 | thread/sql/one_connection              | FOREGROUND |      106904549 |
| 107014443 | thread/sql/one_connection              | FOREGROUND |      107014420 |
| 107014490 | thread/sql/one_connection              | FOREGROUND |      107014467 |
| 107014491 | thread/sql/one_connection              | FOREGROUND |      107014468 |
| 107014493 | thread/sql/one_connection              | FOREGROUND |      107014470 |
| 107014497 | thread/sql/one_connection              | FOREGROUND |      107014474 |
| 107014498 | thread/sql/one_connection              | FOREGROUND |      107014475 |
|  34404893 | thread/sql/one_connection              | FOREGROUND |       34404870 |
| 107013233 | thread/sql/one_connection              | FOREGROUND |      107013210 |
39 rows in set (0.00 sec)

While it’s great that the performance schema can track the statements executed in the Slave SQL thread (I’ll discuss that later), it’s disappointing that processlist_id, the one thing that can join our two sources of data together, has no value for the slave threads.

mysql> select * from threads where thread_id=25\G
*************************** 1. row ***************************
          THREAD_ID: 25
               NAME: thread/sql/slave_sql
               TYPE: BACKGROUND
  PROCESSLIST_STATE: Slave has read all relay log; waiting for the slave I/O thread t
               ROLE: NULL
1 row in set (0.00 sec)
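
To make the mapping concrete, here is a minimal Python sketch that uses a few rows copied from the threads output above. The function name processlist_to_thread is purely illustrative and is not part of MySQL or any connector; in practice the rows would come from a query against performance_schema.threads.

```python
# A few (thread_id, name, type, processlist_id) rows copied from the
# performance_schema.threads output above. None stands in for SQL NULL.
THREADS = [
    (24, "thread/sql/slave_io", "BACKGROUND", None),
    (25, "thread/sql/slave_sql", "BACKGROUND", None),
    (28, "thread/sql/one_connection", "FOREGROUND", 5),
    (106759100, "thread/sql/one_connection", "FOREGROUND", 106759077),
    (106904572, "thread/sql/one_connection", "FOREGROUND", 106904549),
    (34404893, "thread/sql/one_connection", "FOREGROUND", 34404870),
]


def processlist_to_thread(rows):
    """Map processlist id -> performance schema thread_id.

    Rows with a NULL processlist_id (the background threads, which here
    include both slave threads) cannot be mapped and are skipped.
    """
    return {pid: tid for tid, _name, _type, pid in rows if pid is not None}


mapping = processlist_to_thread(THREADS)
print(mapping.get(106904549))  # the connection that was running a query
print(mapping.get(1))          # processlist id 1 is the slave SQL thread
```

The first lookup finds thread_id 106904572, while the lookup for processlist id 1 finds nothing, which is exactly the gap described above.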

For normal queries we can get a relationship between the processlist and the performance schema with:

mysql> select p.id,p.db,p.command,p.state,s.*
from information_schema.processlist p
inner join performance_schema.threads t on p.id = t.processlist_id
inner join performance_schema.events_statements_current s using (thread_id)
where p.command='Query'\G
*************************** 2. row ***************************
                     id: 106904549
                     db: performance_schema
                command: Query
                  state: executing
              THREAD_ID: 106904572
               EVENT_ID: 78
           END_EVENT_ID: NULL
             EVENT_NAME: statement/sql/select
            TIMER_START: 268504759200898000
              TIMER_END: NULL
             TIMER_WAIT: NULL
              LOCK_TIME: 580000000
               SQL_TEXT: select p.id,p.db,p.command,p.state,s.* from information_schema.processlist p inner join performance_schema.threads t on p.id = t.processlist_id inner join performance_schema.events_statements_current s using (thread_id) where p.command='Query'
                 DIGEST: NULL
            DIGEST_TEXT: NULL
         CURRENT_SCHEMA: performance_schema
            OBJECT_TYPE: NULL
            OBJECT_NAME: NULL
            MYSQL_ERRNO: 0
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 1
          ROWS_EXAMINED: 0
           SELECT_RANGE: 0
            SELECT_SCAN: 1
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 1

What causes LOST_EVENTS error in the MySQL binary log?

Using MySQL 5.6.13 under CentOS 6.4, I had a replication failure on one master/slave topology because the master binary log had the following entry that was intentionally written by the MySQL server.

$ mysqlbinlog --start-position=244670849 mysql-bin.000029
# at 244670849
#131028 19:31:38 server id 39  end_log_pos 244670906 CRC32 0xc5d084ec
# Incident: LOST_EVENTS
RELOAD DATABASE; # Shall generate syntax error
# at 244670906

The question is why? I do not know the answer. Searching online indicates this can occur in a MySQL Cluster environment, and can occur around the use of GRANT statements. Neither of these situations is applicable here. This site runs 4 different master/slave topologies all running the same version, and this has been the only occurrence.

The message, if taken literally, could indicate that the master server failed to write all events to the master binary log, but instead of reporting any information (say in the MySQL error log), it basically tells you the binary log (aka the replication stream) is now incomplete and you need to rebuild your entire replication topology. If this is true, then this is a poor explanation, and there is no MySQL documentation I have found to shed light here.

The situation of course does not manifest to the end user on the master (or say monitoring the MySQL error log), but on the slave with a message similar to:

        Relay_Master_Log_File: mysql-bin.000029
          Exec_Master_Log_Pos: 244670849
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1590
                   Last_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log

What appears to happen on the master is that the LOST_EVENTS binary log event is written and the binary logs are then flushed, as indicated by the file sizes.

-rw-rw---- 1 mysql mysql 1073742236 Oct 27 23:33 mysql-bin.000027
-rw-rw---- 1 mysql mysql 1073741890 Oct 28 11:28 mysql-bin.000028
-rw-rw---- 1 mysql mysql  244670953 Oct 28 19:31 mysql-bin.000029
-rw-rw---- 1 mysql mysql 1073742184 Oct 29 08:55 mysql-bin.000030

Further analysis of the statements before the event in the master log shows nothing of note. Analysis of the binary logs at the error and immediately afterwards shows timestamps that are identical to the second, so there is no *obvious* loss of operations, but when there are 100s of transactions per second, using second granularity is ineffective.

After confirming there were no events after this position on the slave, and that the binary logs had been flushed, the solution taken was to reset the slave to the next valid event.

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000030', MASTER_LOG_POS=120;

MySQL/NoSQL/Cloud Conference Latin America

Last week I was a guest speaker at the second annual MySQL/NoSQL/Cloud Conference held in Buenos Aires, Argentina. Thanks to Santiago Lertora from Binlogic, who has taken on the responsibility of organizing an event for the MySQL community in South America.

My presentation slides for my 3 talks.

While the MySQL community and ecosystem have changed dramatically over the past 5 years, one thing has remained the same: the friendships and relationships forged while working for MySQL Inc/AB as employees. No matter where you are in the world, you can spend time talking about the past, present and future with alumni and have a good time. Even here in South America there were many, including Colin, Giuseppe, Monty, Rich, Gerardo, Vladim and myself.

An unexplained connection experience

The “Too many connections” problem is a common issue with applications using excessive permissions (and those that grant said global permissions). MySQL will always grant a user with SUPER privileges access to a DB to investigate the problem with a SHOW PROCESSLIST and where you can check the limits. I however found the following.

mysql> show global variables like 'max_connections';
| Variable_name   | Value |
| max_connections | 2000  |
1 row in set (0.00 sec)

mysql> show global status like 'max%';
| Variable_name        | Value |
| Max_used_connections | 6637  |
1 row in set (0.00 sec)

How can Max_used_connections exceed max_connections? This is possible because you can dynamically change max_connections in a normal MySQL environment. However, this is AWS RDS, where you cannot change variables dynamically via the mysql client. You can via other command line options, but this has not happened. Furthermore, this server is using the default.mysql.5.5 parameter group, which further validates the claim that it has not been changed.

I do not have an answer for the client in this case.

I would also add this as another ding on the usability of RDS in production environments. I was locked out of the DB for a long time, with no visibility of what was going on. The only options were to wait or to restart the server. RDS does not provide this level of visibility of the processlist using a privileged user that could see what was going on. Perhaps an interface they should consider in future.

A Friday MongoDB funny

I had to laugh (just a bit) at this on the exhibitor floor at Oracle Open World 2013. There was a large MongoDB presence at the Slot 301. There are a few reasons.
First, the identity crisis remains. There is no MongoDB in the list of exhibitors, it’s 10gen, but where is the 10gen representation on the sign? 99.99% of attendees would not know this.
Second, the first and only slide I saw (as shown below), tries to directly compare implementing a solution to Oracle. The speaker made some comment but I really zoned out quickly. Having worked with MongoDB, even on one of my own projects, contemplated the ROI of being proficient in this for consulting, even discussing at length with the CEO and CTO, and hearing only issues with MongoDB with existing MySQL clients, I have come to the conclusion that MongoDB is a niche product. It’s great for a very particular situation, and absolutely not suitable for general use to replace a relational database (aka something with transactions to start with). A young and eager 10gen employee approached me, all excited to convince me of the savings. My first question to him was, how long have you been working at 10gen? After he responded 6 months, I informed him that I knew more about his product and specifically the ecosystem he was now in.

Finally, it was rather sad to think that 10gen/MongoDB was not interested in exhibiting at the MySQL Connect conference, a competitor product in its space. They obviously feel that MySQL is dead, and no longer even a viable competitor in the market space.

I have nothing personal against MongoDB, and it continues to mature as a product; however, it’s a niche product with some strengths over an RDBMS in a minority of points. It definitely is not the right product for general OLTP applications.

Additional DB objects in AWS RDS

To expand on Jervin’s Default RDS Account Privileges, RDS for MySQL provides a number of routines and triggers defined in the ‘mysql’ meta schema. These help in various tasks because the SUPER privilege is not provided.

SELECT routine_schema,routine_name
FROM information_schema.routines;
| routine_schema | routine_name                      |
| mysql          | rds_collect_global_status_history |
| mysql          | rds_disable_gsh_collector         |
| mysql          | rds_disable_gsh_rotation          |
| mysql          | rds_enable_gsh_collector          |
| mysql          | rds_enable_gsh_rotation           |
| mysql          | rds_kill                          |
| mysql          | rds_kill_query                    |
| mysql          | rds_rotate_general_log            |
| mysql          | rds_rotate_global_status_history  |
| mysql          | rds_rotate_slow_log               |
| mysql          | rds_set_configuration             |
| mysql          | rds_set_gsh_collector             |
| mysql          | rds_set_gsh_rotation              |
| mysql          | rds_show_configuration            |
| mysql          | rds_skip_repl_error               |
15 rows in set (0.00 sec)

SELECT trigger_schema, trigger_name,
          CONCAT(event_object_schema,'.',event_object_table) AS table_name,
          CONCAT(action_timing,' ',event_manipulation) AS trigger_action
FROM information_schema.triggers;
| trigger_schema | trigger_name | table_name | trigger_action |
| mysql          | block_proc_u | mysql.proc | BEFORE UPDATE  |
| mysql          | block_proc_d | mysql.proc | BEFORE DELETE  |
| mysql          | block_user_i | mysql.user | BEFORE INSERT  |
| mysql          | block_user_u | mysql.user | BEFORE UPDATE  |
| mysql          | block_user_d | mysql.user | BEFORE DELETE  |

MySQL MHA and Perl pathing

I am certainly not a knowledgeable Perl person, however I recently ran into the error Can’t locate MHA/ on Red Hat 6.x. I have installed MySQL MHA on various systems before without any issues.

$ masterha_manager -version
Can't locate MHA/ in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_manager line 26.
BEGIN failed--compilation aborted at /usr/bin/masterha_manager line 26.

The issue was that MySQL MHA is not installed in any of the acceptable default paths for this distro’s default installation.

$ find / -type d -name MHA

The fix was simple on this OS, but I expect there is a correct Perl approach?

ln -s /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl/
$ cat /etc/*release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
Red Hat Enterprise Linux Server release 6.3 (Santiago)

Cloning MySQL 5.6 instances

A tip for all those cloud users that like cloning database servers (as reported in my book Effective MySQL – Replication Techniques in Depth).

Starting with MySQL 5.6, MySQL instances have a UUID. Cloning servers to quickly create slaves will result in the following error message.

  Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.

The solution is simple. Remove the file-based configuration file auto.cnf (located in the MySQL datadir) and restart the MySQL instance, which generates a new UUID on startup.

$ rm -f /mysql/data/auto.cnf
$ service mysql restart

Unnecessary 3am (emergency) call

Help, the database is on fire. Well, it probably is, but the solution may also be easy. Here are a few steps for the part-time MySQL DBA/sysadmin/developer. Total time to address this problem was 2 minutes; the inability to go back to sleep, not priceless.

First, access to the DB server is possible, and also the mysql command line client. If you are getting “Too Many Connections” read Why GRANT ALL is bad.

Given the message “the database is on fire”, the likely cause is rogue SQL.

mysql> show global variables like 'max_connections';
| Variable_name   | Value |
| max_connections | 5000  |
1 row in set (0.02 sec)

mysql> show global status like 'threads%';
| Variable_name     | Value |
| Threads_cached    | 1     |
| Threads_connected | 5001  |
| Threads_created   | 5002  |
| Threads_running   | 5001  |
4 rows in set (0.01 sec)

Yeah, as expected. A SHOW PROCESSLIST in this situation shows all queries stuck in the state of “query end”, which indicates the obvious problem to me (a disk space problem).

For the record, I do not recommend setting max_connections to 5,000. MySQL (and Linux) do not function well with 5,000 concurrent connections (especially when they all want to create a disk-based temporary table, but that’s another story). What is missing is a maximum threads running configuration option, along with applicable coding practices and proactive server connection management to prevent such a situation.

Confirming my suspicions.

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde             7.9G  2.5G  5.0G  34% /
tmpfs                 7.3G     0  7.3G   0% /dev/shm
/dev/xvdf1            197G   42G  146G  23% /mysql
/dev/xvdh1             99G   94G  8.0K 100% /mysql/binlog

The solution here is simple: you need to remove old binary logs. It is NOT recommended that you delete the files manually; use PURGE MASTER LOGS instead. However, this produced an error.

mysql> purge master logs to 'mysql-bin.000029';
ERROR 3 (HY000): Error writing file '/mysql/binlog/mysql-bin.index_crash_safe' (Errcode: 28 - No space left on device)

That’s a new message, possibly a new MySQL 5.6 error situation. The only choice now is to remove a physical file first. A suggestion to MySQL engineers. Let’s create this file in normal operations with sufficient blank bytes, enabling MySQL to be able to create the file even when the disk is full, and then avoid manual file manipulation.

$ rm mysql-bin.000002


mysql> purge master logs to 'mysql-bin.000029';
Query OK, 0 rows affected, 1 warning (1.37 sec)
mysql> show warnings;
| Level   | Code | Message                                                       |
| Warning | 1612 | Being purged log /mysql/binlog/mysql-bin.000002 was not found |
1 row in set (0.01 sec)

As always, a recommendation when problems occur is to check the error log (aka Monitoring the MySQL Error Log). This issue is also immediately apparent.

2013-08-23 06:28:36 30609 [Warning] Disk is full writing '/mysql/binlog/mysql-bin.000097' (Errcode: 28 - No space left on device). Waiting for someone to free space...
2013-08-23 06:28:36 30609 [Warning] Retry in 60 secs. Message reprinted in 600 secs

I would also like to point out that being proactive and having monitoring and instrumentation in ALL startups is critical if you want to be successful. Point 1 in my recommendations of Successful Scalability Principles.
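
As an illustration of the kind of proactive check that would have caught the full /mysql/binlog partition before 3am, here is a minimal Python sketch. The function names and the 80% warning threshold are assumptions chosen for the example, not a recommendation from any monitoring product; a real setup would feed the result into whatever alerting you already use.

```python
import shutil


def used_percent(path):
    """Percentage of the filesystem holding `path` that is in use.

    Computed as used / (used + free), where free is the space available
    to unprivileged users. This is roughly how df derives Use%, which is
    why a partition can show 100% while a few reserved blocks remain.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def check_paths(paths, warn_at=80.0):
    """Return (path, percent) for every path at or above the threshold."""
    alerts = []
    for path in paths:
        percent = used_percent(path)
        if percent >= warn_at:
            alerts.append((path, round(percent, 1)))
    return alerts


# On the server in this post the list would include /mysql and
# /mysql/binlog; the current directory is used here so the sketch runs anywhere.
for path, percent in check_paths(["."], warn_at=80.0):
    print(f"WARNING: {path} is {percent}% full")
```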

Was it really an emergency?

Identifying empty tables in MySQL

The following simple INFORMATION_SCHEMA statement will identify and also verify tables that have no rows. These may be candidate tables to remove from your data model.

mysql -N -e "select CONCAT('SELECT \"',table_schema,'.',table_name,'\" AS tbl, COUNT(*) AS cnt FROM ',table_schema,'.',table_name,';') as cmd from information_schema.tables where table_schema not in ('mysql','performance_schema','information_schema') and table_rows=0;" | mysql -N
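
The verification step matters because table_rows in information_schema.tables is only an estimate for InnoDB tables, so each candidate is confirmed with a real COUNT(*). The following Python sketch shows the statements the one-liner generates. The function name and the two table names are hypothetical, and the sketch assumes schema and table names contain no backticks or quotes.

```python
def build_count_statements(tables):
    """Return one verifying COUNT(*) statement per (schema, table) pair."""
    statements = []
    for schema, table in tables:
        label = f"{schema}.{table}"
        statements.append(
            f"SELECT '{label}' AS tbl, COUNT(*) AS cnt "
            f"FROM `{schema}`.`{table}`;"
        )
    return statements


# Hypothetical candidates, as if returned by the table_rows=0 query.
candidates = [("abc", "audit_log"), ("abc", "old_import")]
for statement in build_count_statements(candidates):
    print(statement)
```

Any table whose cnt comes back as 0 is genuinely empty at the time of the check.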

A simple FAILED SQL statement breaks MySQL 5.6 replication

I set up 6 shiny new MySQL 5.6.13 servers, ready for testing and production deployment tomorrow.

I found that the New Relic MySQL Monitoring was throwing the following error.

[2013-08-08 03:53:33 +0000] com.newrelic.plugins.mysql.MySQL | SEVERE | An SQL error occured running 'SHOW ENGINE INNODB STATUS' Access denied; you need (at least one of) the PROCESS privilege(s) for this operation

Simple fix: the user I am gathering metrics with requires the PROCESS privilege. Again, simple enough.

mysql> grant PROCESS on xxx.* to xxx@'10.%';
ERROR 1221 (HY000): Incorrect usage of DB GRANT and GLOBAL PRIVILEGES
mysql> grant PROCESS on *.* to xxx@'10.%';
Query OK, 0 rows affected (0.00 sec)

As you can see, I got an error for a global privilege; again trivial, with an easy fix to the correct syntax. However, the failed statement breaks replication with a very nondescriptive message.

Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log

Bug #68892 reported this in April, verified by Oracle support, but is listed as ‘non-critical’. I agree with the bug author, given MySQL 5.6 touts many MySQL replication improvements, a simple failed statement should not break replication.

DP#8 The disadvantages of row at a time processing

It can be hard for software engineers to understand the following principle, however it is very important for improving performance and obtaining immediate scalability options. The principle is “Do Less Work”. That is, run fewer SQL statements.

Just one method of executing fewer SQL statements is to eliminate Row At a Time (RAT) processing. In simple terms, do not perform identical repeating SQL statements in a loop. Relational algebra and the Structured Query Language (SQL) specification are specifically designed to work with sets of data, or as I describe it, Chunk At a Time (CAT) processing.

Customer Example

Your online social media website lets you send messages to multiple friends at one time. You enter the message, select the friends you want to receive the message and click send. While the user waits a moment and gets a success message, behind the scenes the application runs the following SQL statements to record your request.

INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a reminder. The poker game will start on Friday at 8pm.',NOW());
SELECT @message_id :=LAST_INSERT_ID();
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,16,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 16;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,18,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 18;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,99,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 99;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,21,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 21;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 62;

You can define the table structures used in this example with:

DROP TABLE IF EXISTS dp8_message_sent;
CREATE TABLE dp8_message_sent(
  message_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id     INT UNSIGNED NOT NULL,
  message     VARCHAR(500) NOT NULL,
  created     DATETIME NOT NULL,
  PRIMARY KEY(message_id)
);
DROP TABLE IF EXISTS dp8_message_recipient;
CREATE TABLE dp8_message_recipient(
  message_id    INT UNSIGNED NOT NULL,
  from_user_id  INT UNSIGNED NOT NULL,
  to_user_id    INT UNSIGNED NOT NULL,
  status        ENUM('New','Read','Deleted') NOT NULL,
  PRIMARY KEY(message_id,to_user_id)
);
DROP TABLE IF EXISTS dp8_user_notification;
CREATE TABLE dp8_user_notification(
  user_id            INT UNSIGNED NOT NULL,
  new_message        ENUM ('Y','N') NOT NULL DEFAULT 'N',
  new_message_count  INT UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY(user_id)
);

The average software developer may not see the problem here. In your test environment you executed 12 SQL statements and the code worked fine, i.e. it met the requirements for the function. However, while producing the correct result, this is a poor code approach.

This example shows not one repeating query, but two. Lucky you only sent the message to a few friends. If you sent it to 200 friends, you have significantly more SQL statements to execute: the code now executes 402 SQL statements for the same feature. The response time to the user is longer, the application connection has to remain open longer and the database has more work to do.
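
The arithmetic behind those numbers is simple enough to sketch in a few lines of Python. The function name is made up for this illustration; the formula just counts the statements in the example above.

```python
def row_at_a_time_statements(recipients):
    """Count the SQL statements issued by the row-at-a-time example.

    One INSERT for the message, one SELECT to capture LAST_INSERT_ID(),
    then one INSERT and one UPDATE for every recipient.
    """
    return 2 + 2 * recipients


for friends in (5, 200):
    print(friends, "friends:", row_at_a_time_statements(friends), "statements")
```

The statement count grows linearly with the number of recipients, which is the core of the problem.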

This popular site is sending thousands of messages per second, so the problem is compounded to produce an excess of unnecessary work, not just for the database, but for the application web server connections as there are longer open requests.

The solution is straightforward. Remove repeating queries. It’s not rocket science. This is a simple design practice I teach, as the problem is evident on most consulting engagements. Popular products including Drupal and WordPress also implement this poor practice, and developers that extend these products propagate this poor practice excessively. If this development approach can be easily found in a few common functions, it is generally a clear indicator that this problem can be found throughout the code.

Here is the same operation performed efficiently.

INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a better reminder. The poker game will start on Friday at 8pm.',NOW());
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (LAST_INSERT_ID(),42,16,'New'),
       (LAST_INSERT_ID(),42,18,'New'),
       (LAST_INSERT_ID(),42,99,'New'),
       (LAST_INSERT_ID(),42,21,'New'),
       (LAST_INSERT_ID(),42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id IN (16,18,99,21,62);

No matter how many friends you send a message to, only 3 SQL statements are executed. In these queries we see two different examples of leveraging the set capabilities of SQL to perform chunk at a time processing. We discuss the benefits of the multi-values INSERT in more detail with DP#10.
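
From the application side, the same three-statement pattern could look like the following Python sketch. It uses the standard library sqlite3 module with simplified in-memory tables purely as a stand-in so the example runs without a MySQL server; the helper names make_demo_db and send_message are hypothetical, and a MySQL driver would use its own placeholder style and would be subject to the server's packet size limit.

```python
import sqlite3


def make_demo_db():
    """Create in-memory stand-ins for the three dp8_* tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dp8_message_sent(
            message_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            message TEXT NOT NULL,
            created TEXT NOT NULL);
        CREATE TABLE dp8_message_recipient(
            message_id INTEGER NOT NULL,
            from_user_id INTEGER NOT NULL,
            to_user_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            PRIMARY KEY(message_id, to_user_id));
        CREATE TABLE dp8_user_notification(
            user_id INTEGER PRIMARY KEY,
            new_message TEXT NOT NULL DEFAULT 'N',
            new_message_count INTEGER NOT NULL DEFAULT 0);
        """
    )
    return conn


def send_message(conn, from_user, recipients, text):
    """Record one message for any number of recipients in 3 statements."""
    cur = conn.cursor()
    # Statement 1: the message itself.
    cur.execute(
        "INSERT INTO dp8_message_sent(user_id, message, created) "
        "VALUES (?, ?, datetime('now'))",
        (from_user, text),
    )
    message_id = cur.lastrowid
    # Statement 2: one multi-row INSERT covering every recipient.
    row_sql = ", ".join(["(?, ?, ?, 'New')"] * len(recipients))
    row_params = [value for to_user in recipients
                  for value in (message_id, from_user, to_user)]
    cur.execute(
        "INSERT INTO dp8_message_recipient"
        "(message_id, from_user_id, to_user_id, status) VALUES " + row_sql,
        row_params,
    )
    # Statement 3: one UPDATE with an IN list covering every recipient.
    in_sql = ", ".join(["?"] * len(recipients))
    cur.execute(
        "UPDATE dp8_user_notification "
        "SET new_message = 'Y', new_message_count = new_message_count + 1 "
        "WHERE user_id IN (" + in_sql + ")",
        list(recipients),
    )
    conn.commit()
    return message_id


conn = make_demo_db()
friends = [16, 18, 99, 21, 62]
conn.executemany(
    "INSERT INTO dp8_user_notification(user_id) VALUES (?)",
    [(user_id,) for user_id in friends],
)
message_id = send_message(conn, 42, friends, "Poker on Friday at 8pm.")
recipient_count = conn.execute(
    "SELECT COUNT(*) FROM dp8_message_recipient WHERE message_id = ?",
    (message_id,),
).fetchone()[0]
print(recipient_count, "recipients recorded")
```

The values are always passed as bound parameters; only the number of placeholders is built dynamically.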

Customer Example 2

The following is a simple example for an online store processing function. Your shipping provider provides an update of all packages that were processed by them for a given date. For each package tracking code that you have recorded with orders, they provide a last known status, for example whether the package was successfully delivered, is in transit, or has been returned.

A typical and very common developer process is to open the file, read each line looping through all the rows, and for each row perform a single update without using transactions like:

open file
for each line
  UPDATE dp8_order
  SET    last_shipping_status=?, last_shipping_update=?
  WHERE  tracking_code=?;
close file

As the size of data increases so does the processing time, because you execute one statement per row. When there are 10 packages, there are 10 SQL statements; when there are 300,000 packages, there are 300,000 SQL statements.

This batch process does not have a user response time requirement like online applications where performance is key to retaining your users. However, while eliminating row at a time processing is critical for providing a better user experience it is also just as important for batch processing.

stmt = 'INSERT INTO dp8_batch_tracking (batch_id, tracking_code, status, last_update) VALUES'
sep = ''

open file
for each line
  stmt = stmt + sep + '(42, ?, ?, ?)'
  sep  = ','
close file

UPDATE dp8_order o, dp8_batch_tracking bt
SET    o.last_shipping_status=bt.status, o.last_shipping_update=bt.last_update
WHERE  bt.batch_id = 42
AND    bt.tracking_code = o.tracking_code;
-- DELETE FROM dp8_batch_tracking WHERE batch_id=42;

This example removes the one query per row problem, and results in just 2 SQL queries for processing the file regardless of size.

NOTE: In MySQL there is a limit to the length of an SQL statement (i.e. the INSERT). This can be adjusted with the max_allowed_packet variable. If you are processing very large files, the preceding code would be modified to perform the INSERT for every ‘n’ records; only a single UPDATE is still required. See DP#10 for an example of using max_allowed_packet.
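
One way to perform the INSERT for every ‘n’ records is sketched below in Python. This is an illustration under stated assumptions rather than the code from DP#10: the helper names chunked and build_batch_inserts are hypothetical, the %s placeholders assume a MySQL driver that uses that parameter style, and the chunk size is something you would tune against your own max_allowed_packet.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_batch_inserts(rows, batch_id, chunk_size):
    """Yield (sql, params) pairs, one multi-row INSERT per chunk.

    Each row is a (tracking_code, status, last_update) tuple.
    """
    prefix = ("INSERT INTO dp8_batch_tracking "
              "(batch_id, tracking_code, status, last_update) VALUES ")
    for batch in chunked(rows, chunk_size):
        placeholders = ", ".join(["(%s, %s, %s, %s)"] * len(batch))
        params = [value for row in batch for value in (batch_id, *row)]
        yield prefix + placeholders, params


# Hypothetical file contents: 7 tracking records, inserted 3 at a time.
records = [(f"TRK{i}", "In transit", "2013-08-01 00:00:00") for i in range(7)]
statements = list(build_batch_inserts(records, 42, 3))
print(len(statements), "INSERT statements for", len(records), "records")
```

Each (sql, params) pair would be passed to the driver's execute call, followed by the single UPDATE shown earlier.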

This example shows just one way to optimize this operation with the least amount of code changes to the existing application. An even better approach is to use the LOAD DATA INFILE syntax to populate the batch table directly. This requires additional SQL privileges and file system access and hence is a more complex solution.

Why is the impact of removing these repeating queries so significant? To answer that question we need to look at the anatomy of the execution of an SQL statement.

SQL statement workflow

To the end user viewing your website with a browser, the result of clicking send on a webpage is a [short] delay before the expected results are displayed or the applicable action occurs. Behind the scenes an extensive amount of work is performed. Anybody who has looked at a waterfall chart showing the responses from a web server knows there is far greater complexity in rendering the page you are looking at. While the browser may request 100s of files, it is generally the first response, the actual page, that is involved in executing the necessary SQL statements, and this is the focus of this design practice.

When an HTTP request is made to a web container the application performs a number of operations to satisfy the request and produce a response. With your application, regardless of the programming language, access to the MySQL database is performed by SQL statements. Each statement is passed to the language specific MySQL connector required with your web container. For example, when using the Apache HTTP server and the PHP programming language, the MySQL Native Driver (mysqlnd) is the necessary MySQL Connector. There are connectors for the popular languages including C, C++, Java, .Net, Python, Ruby etc.

Here is a short summarized list of what occurs with all SQL statements.

  1. The application executes an SQL statement.
  2. The MySQL client connector accepts the SQL statement then connects across the network to the specified MySQL server and passes the SQL statement to the MySQL server.
  3. The MySQL server processes all incoming SQL statements in individual threads, so many SQL statements can be executed concurrently.
  4. The MySQL server first parses the SQL statement for valid SQL syntax, and produces a summarized structure of the tables and columns used in the SQL statement.
  5. The MySQL server performs a security check to ensure the user that is requesting this SQL statement has the necessary privileges to be able to access/modify the information requested in the SQL statement.
  6. The MySQL server then passes the parsed SQL statement to the MySQL query optimizer. This is the heart of the decision making process, where the cost-based optimizer creates a decision tree and evaluates the various options, pruning the expensive paths to produce the optimal Query Execution Plan (QEP).
  7. The MySQL server then passes the QEP to the applicable MySQL storage engine(s) to perform the physical work of storing and/or retrieving the data for the given SQL statement.
  8. Depending on the type of query, the MySQL server may have to do additional work, for example to join multiple tables, sort results etc.
  9. When the MySQL server has produced the results for the SQL statement, these results are sent back across the network to the application server.

NOTE: This is a simplified representation of the execution path of an SQL statement in MySQL. The use of the MySQL Query Cache discussed in QP#9 introduces additional steps and can also produce a significantly simplified and faster execution path.

To summarize, every SQL statement is passed to the MySQL server, and the network overhead of points 2 and 9 is the most expensive part of the total time in a well tuned MySQL application. This alone is the greatest reason to run fewer SQL statements.

Every SQL statement is parsed, checked for application permissions and optimized before execution. This is most applicable for example when combining INSERT statements with multiple VALUES clauses. In addition to saving the network round trip, this overhead is also eliminated by combining SQL statements.
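
To get a feel for how the per-statement overhead adds up, here is a rough back-of-the-envelope model in Python. The two timing constants are hypothetical numbers chosen only for illustration, not measurements, and the model assumes the batch approach touches every row twice (once for the INSERT into the batch table and once for the UPDATE).

```python
def total_time_us(statements, rows, overhead_us, row_work_us):
    """Estimate elapsed microseconds: fixed cost per statement plus work per row."""
    return statements * overhead_us + rows * row_work_us


ROWS = 300_000       # package count used in the example above
OVERHEAD_US = 500    # hypothetical per-statement cost (round trip, parse, optimize)
ROW_WORK_US = 50     # hypothetical per-row work inside the server

# One UPDATE per row: every row pays the per-statement overhead.
row_at_a_time = total_time_us(ROWS, ROWS, OVERHEAD_US, ROW_WORK_US)

# Two statements in total, but each row is written twice (INSERT, then UPDATE).
batched = total_time_us(2, 2 * ROWS, OVERHEAD_US, ROW_WORK_US)

print("row at a time: %.1f seconds" % (row_at_a_time / 1e6))
print("batched:       %.1f seconds" % (batched / 1e6))
```

Even though the batch approach does more work per row in this model, removing the per-statement overhead dominates as long as that overhead is larger than the per-row work.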

Universal Application

This same principle can be applied to other products that process data. For example, memcache is a popular product to improve performance and scalability of your application by providing a memory caching layer. The following figures are for an example benchmark with 28 objects in memcache using two cloud servers in Rackspace Cloud.

Using an individual get call 28 times sequentially in a single PHP file, simulating a client example, the total response time of the benchmark ranged from 24 to 56 milliseconds. Using the same configuration with a single multi-get call for the same 28 objects the results ranged from 4 to 7 milliseconds.

It does not require a graph to see the 6x-10x improvement in performance from eliminating row-at-a-time processing. The saving of 20-50 milliseconds may seem small, however when multiplied across environments with thousands of concurrent users, thousands of times per second, it has a large impact on resources.


This principle shows a simple technique for reducing the number of SQL statements by eliminating repeating queries. With a goal of “Do Less Work”, this is only one case. DP#16 discusses several other query saving techniques that can eliminate repeating and unwanted queries, providing improved performance.

About Expert MySQL Design Practices

This new series by leading MySQL Expert Ronald Bradford helps the software engineer understand, appreciate and develop the right skills and techniques to build scalable software solutions. These proven and reproducible design practices will ensure your use of MySQL to improve performance, scalability and reliability.

These expert design practices are from 25 years of professional experience following formal university qualifications in computer science. All of these practices are written for use with a MySQL based data system however most of the content in these practices predate the existence of the MySQL product and have stood the test of time with emerging technologies and software development approaches. Many practices apply directly to other data stores, whether relational or the new NoSQL products and include working with persistent and non-persistent data storage products.

More information about the series can be found at

MySQL configuration file created in /usr/my.cnf

As part of uncovering a new security improvement in MySQL 5.6 with the default installation being secured with a unique root MySQL password, the MySQL provided rpm’s installed the MySQL configuration file in /usr.

I feel this may be a bug, however this post is just an explanation for others to comment on.
The following steps reproduce this.

1. Launch a CentOS 6.4 AWS EC2 Instance

ec2-run-instances ami-eb6b0182 --instance-type ${INSTANCE_TYPE} --region us-east-1 --availability-zone us-east-1b --key db --group db | tee ${LOG}
INSTANCE=`grep ^INSTANCE ${LOG} | awk '{print $2}'`
ec2-describe-instances ${INSTANCE} | tee ${LOG}
while [ `grep ^INSTANCE ${LOG} | awk '{print $6}'` != 'running' ] ;  do cat ${LOG}; sleep 5; ec2-describe-instances ${INSTANCE} | tee ${LOG}; done
SERVER=`grep ^INSTANCE ${LOG} | awk '{print $4}'`
echo "#instance=${INSTANCE},server=${SERVER}"


2. Connect to server

ssh -i ~/.ssh/db.pem [email protected]

3. Install software

yum update -y

# Remove the Postfix MySQL dependency
rpm -qa | grep -i mysql
rpm -e --nodeps mysql-libs-5.1.69-1.el6_4.x86_64

# Just to prove no config here
ls -l /etc/my.cnf
ls: cannot access /etc/my.cnf: No such file or directory
ls -l /usr/my.cnf
ls: cannot access /usr/my.cnf: No such file or directory

# Install MySQL prerequisites
yum install -y libaio perl

cd /tmp
yum install -y wget
tar xvf MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
rpm -ivh MySQL*.rpm

4. The results of the installation of the default MySQL 5.6 rpms.

You will find that password in '/root/.mysql_secret'.

You must change that password on your first connect,
no other statement but 'SET PASSWORD' will be accepted.
See the manual for the semantics of the 'password expired' flag.

Also, the account for the anonymous user has been removed.

In addition, you can run:


which will also give you the option of removing the test database.
This is strongly recommended for production servers.

See the manual for more instructions.

Please report any problems with the /usr/bin/mysqlbug script!

The latest information about MySQL is available on the web at

Support MySQL by buying support/licenses at

New default config file was created as /usr/my.cnf and
will be used by default by the server when you start it.
You may edit this file to change server settings

As you can see via the rpm installation instructions, a new configuration file was created in “/usr/my.cnf”

$ cat /usr/my.cnf
# For advice on how to change settings please see


# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M

# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin

# These are commonly set, remove the # and set as required.
# basedir = .....
# datadir = .....
# port = .....
# server_id = .....
# socket = .....

# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M


DP#4 The importance of using sql_mode

What if the data you retrieved from the database did not match the data the application claimed to have successfully stored? How comfortable would your organization feel about your skills and the products that are being used to store important information if data integrity was not guaranteed?

MySQL employs a terrible default technique known as silent truncation where the product determines that it knows about your data better than you. Never has the saying “do not assume” because it makes an “ass” out of “u” and “me” been more applicable.

Customer Example

An HTML form for new customers provides input fields for the customer first and last name. Good design was considered with the HTML form client validation to ensure that each field could not exceed 20 characters in length. However, the database design is different, where the first name is only defined as 10 characters. In most cases this is sufficient, however for first names longer than 10 characters, the data retrieved does not match the data that was apparently successfully stored because there was no SQL error. The following SQL reproduces this situation.

  first_name      VARCHAR(10) NOT NULL,
  last_name       VARCHAR(20) NOT NULL,
  PRIMARY KEY (customer_id)

INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Evangeline','Jones');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','Smith');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Alexander','Bell');

| customer_id | first_name | last_name |
|           1 | Evangeline | Jones     |
|           2 | Christophe | Smith     |
|           3 | Alexander  | Bell      |
3 rows in set (0.00 sec)

As you can see, the first name of Christopher Smith is not actually correctly stored in the database. MySQL DID NOT produce an error message, rather it performed a silent truncation of the data.

Defining sql_mode

To demonstrate what level of data integrity you should expect with MySQL, you must define the sql_mode configuration option. The following example demonstrates the dynamic syntax for a given connection and the error you should expect.


INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','James');
ERROR 1406 (22001): Data too long for column 'first_name' at row 1
Empty set (0.00 sec)

When MySQL is first installed the following configuration option should always be added to all environments as a default.

$ cat /etc/my.cnf

Starting with MySQL 5.6, the default MySQL configuration file, when created by MySQL, defines the sql_mode variable with NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES. This is a step in the right direction.

For more information, refer to the MySQL Reference Manual for sql_mode at

NOTE: MySQL provides many different options with sql_mode. Careful consideration is needed to determine which options are best for your application. Some options help in providing syntax and compatibility with other database products however these can affect and even break existing products written specifically for MySQL.

MySQL Warnings

The underlying cause of this loss of data integrity is how MySQL handles success and error conditions with SQL statements. There are the obvious success and failure states, however MySQL has a third state known as warnings, or more specifically success with warnings. As the use of warnings is uncommon with other data store products, many applications, developers and programming languages ignore checking for warnings, or are simply unaware of this inbuilt feature.

Using the MySQL command line client, you get a visual indication of warnings following an SQL statement, which prompts you to review what warnings occurred.

SET SESSION sql_mode='';
INSERT INTO dp4 (customer_id, first_name, last_name)
VALUES (NULL,'Christopher','Smith');
Query OK, 1 row affected, 1 warning (0.00 sec)

| Level   | Code | Message                                         |
| Warning | 1265 | Data truncated for column 'first_name' at row 1 |
1 row in set (0.00 sec)

When using PHP there is no indication of SQL warnings unless you specifically check after every SQL statement. For example:

  $con = mysqli_connect('localhost', 'scott', 'sakila', 'design');
  if (mysqli_connect_errno()) {
    print 'Failed to connect to MySQL: ' . mysqli_connect_error() . "\n";
    exit(1);
  }

  if (!mysqli_query($con, 'INSERT INTO dp4 (customer_id, first_name, last_name) ' .
                          'VALUES (NULL,"Christopher","Holt") ')) {
    print 'Failed to insert data: ' . mysqli_error($con) . "\n";
  }
  // A successful statement can still carry warnings, so check the count explicitly.
  if (($warnings = mysqli_warning_count($con)) > 0) {
    if ($rs = mysqli_query($con, "SHOW WARNINGS")) {
      while ($row = mysqli_fetch_row($rs)) {
        printf("%s (%d): %s\n", $row[0], $row[1], $row[2]);
      }
      mysqli_free_result($rs);
    }
  }
  mysqli_close($con);

The best recommendation is to avoid all situations where MySQL can produce a warning and does not provide the best possible data integrity.

Refer to the MySQL Reference Manual for more information on SHOW WARNINGS at

The Larger Problem

This underlying problem is actually more difficult to correct for an existing production system than enabling the correct sql_mode configuration option. Using the customer example, the identification of any rows that are 10 characters in length could be valid, or may have been truncated. There is no easy way to obtain the actual value that was originally submitted. The use of the correct numerical data type (DP#14) can provide a check constraint for values, however it can also suffer from the same truncation problem. You especially hope that this does not affect your payroll, your frequent flyer points balance or your accumulated score from your favorite online game.
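
To illustrate why cleanup is hard, the following Python sketch flags the candidate rows whose value is exactly as long as the column allows. It is only a sketch under assumptions: it uses the standard library sqlite3 module as a stand-in for MySQL (where CHAR_LENGTH() would be the equivalent function), the function name possibly_truncated is hypothetical, and the sample rows are the ones from the customer example.

```python
import sqlite3


def possibly_truncated(conn, table, column, max_length):
    """Return values whose length equals the column's defined maximum.

    Such rows may be legitimate or may have been silently truncated; the
    query can only flag candidates for manual review. table and column are
    interpolated into the SQL, so pass trusted identifiers only.
    """
    sql = "SELECT {col} FROM {tab} WHERE LENGTH({col}) = ? ORDER BY {col}".format(
        col=column, tab=table
    )
    return [row[0] for row in conn.execute(sql, (max_length,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dp4 (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO dp4 (first_name, last_name) VALUES (?, ?)",
    [("Evangeline", "Jones"), ("Christophe", "Smith"), ("Alexander", "Bell")],
)
print(possibly_truncated(conn, "dp4", "first_name", 10))
```

Both the valid name Evangeline and the truncated name Christophe are flagged, and nothing in the data distinguishes them, which is exactly the problem.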

The solution is to avoid the problem of producing incorrect data.


While this example is using a character data type, field truncation can also occur with numeric and date data types. The use of applicable sql_mode configuration settings is a critical MySQL design practice to ensure adequate data integrity that all systems need to implement.

Unexplained (trivial) MySQL behavior

The -N or --skip-column-names option is a convenient way with the mysql client to skip the header line of output.
However I found when viewing the output via the terminal, some interesting and unexplained output.

| VARIABLE_NAME                  | VARIABLE_VALUE |
| ABORTED_CLIENTS                | 710001         |
| ABORTED_CONNECTS               | 31             |
| BINLOG_CACHE_DISK_USE          | 0              |
| BINLOG_CACHE_USE               | 0              |
| BYTES_RECEIVED                 | 2522301004     |
| BYTES_SENT                     | 317785976      |
| COM_ADMIN_COMMANDS             | 2890667        |
| COM_ASSIGN_TO_KEYCACHE         | 0              |
| COM_ALTER_DB                   | 0              |
| COM_ALTER_DB_UPGRADE           | 0              |
|                ABORTED_CLIENTS |       710001 |
|               ABORTED_CONNECTS |           31 |
|          BINLOG_CACHE_DISK_USE |            0 |
|               BINLOG_CACHE_USE |            0 |
|                 BYTES_RECEIVED |   2522947764 |
|                     BYTES_SENT |    348838502 |
|             COM_ADMIN_COMMANDS |      2890742 |

As you can see all the values of the first column are right aligned in terminal display. When written to a file (which is the intended outcome), the data is not.

$ mysql ... > /tmp/x
$ head /tmp/x
BYTES_SENT	530046795

Improved Security with MySQL 5.6

Installed on a clean CentOS 6.4 AWS instance.

sudo su -
cd /tmp
tar xvf MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
yum install -y libaio perl
rpm -i MySQL*.rpm

The following output is the sign that security is being considered with new MySQL versions. Woot!

You will find that password in '/root/.mysql_secret'.

You must change that password on your first connect,
no other statement but 'SET PASSWORD' will be accepted.
See the manual for the semantics of the 'password expired' flag.

Also, the account for the anonymous user has been removed.

In addition, you can run:


which will also give you the option of removing the test database.
This is strongly recommended for production servers.

See the manual for more instructions.

Please report any problems with the /usr/bin/mysqlbug script!

The latest information about MySQL is available on the web at

Support MySQL by buying support/licenses at

New default config file was created as /usr/my.cnf and
will be used by default by the server when you start it.
You may edit this file to change server settings

However, moving the MySQL configuration to /usr. WTF?

Identify package management file contents

Recently when implementing Milliseconds in MySQL Proxy Lua Script I needed to identify what was installed from a given package.

For CentOS

$ sudo yum install -y yum-utils
$ repoquery -q -l --plugins lua-socket


For Ubuntu

$ dpkg-query -L liblua5.1-socket2


MySQL Proxy and microseconds

By default the included Lua within MySQL proxy (0.8.3) does not include socket, necessary for getting microsecond granularity. To setup you have to install Lua and socket on the OS first:

For CentOS

$ sudo yum install lua lua-socket

For Ubuntu

$ sudo apt-get install lua5.1 liblua5.1-socket2

The following enables use within MySQL Proxy.

cp /usr/share/lua/5.1/socket.lua /path/to/mysqlproxy/lib/mysql-proxy/lua
cp -r /usr/lib64/lua/5.1/socket /path/to/mysqlproxy/lib/mysql-proxy/lua
cp -r /usr/lib64/lua/5.1/mime /path/to/mysqlproxy/lib/mysql-proxy/lua

My lua script can now use syntax similar to:

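local socket = require 'socket'

function read_query( packet )
  local query = string.sub(packet, 2)   -- strip the leading command byte
  local now = socket.gettime()          -- seconds since the epoch, with a fractional part
  print( string.format("# %s.%03d\n%s;\n", os.date("%X", now), math.floor(select(2, math.modf(now)) * 1000), query))
end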

Announcing the MySQL Plugin for New Relic

Many application developers will know of New Relic, a SaaS performance and monitoring tool targeted at web application monitoring, including PHP, Ruby, Java, .Net, Python and Node.

With the release today (June 19, 2013) of the New Relic Platform, custom monitoring of data stores including MySQL is now possible. Try it now for free: a free standard account (no cost, no billing details necessary) enables you to perform application monitoring, server monitoring, MySQL instance monitoring and monitoring of many other products via many plugins.

Over the next few posts I will be discussing some of the design decisions I made for this MySQL plugin. New Relic has certain features that really help developers monitor and diagnose the application (I have been surprised how it has helped in debugging DB and OS problems directly from PHP code, for example). However, knowing the server resources and the database usage is often critical to seeing the whole picture, and with the new plugins New Relic gives developers, system administrators and database administrators some well targeted insights. When building custom dashboards you can see CPU usage, database usage, and your web application volume, page load time and other metrics on one page.

The MySQL plugin has two prerequisites: a MySQL server running 5.0 or better, and a Java JRE 1.6 or better. The plugin can run either directly on each MySQL server/instance, which therefore needs the JRE, or it is possible to configure a single server to collect all MySQL statistics and report them to New Relic. There is nothing that restricts this plugin from working with any MySQL variants/forks; in fact I specifically designed the plugin to be forward compatible with new versions and status variables, for example, and to support custom recording of any metrics (more on that later).

This is the first release of the New Relic Platform and MySQL plugin so I expect a lot of refinements, improvements and suggestions as we move forward. As an integral part of developing the MySQL plugin and using the New Relic Java SDK, there are a number of roadmap items to better serve MySQL and other products that will be coming in future releases. The beta version of the MySQL plugin has been running on production MySQL servers for several months now and working well.

New Relic provides two ways to display data, first by graphs, and second by tables. There is a handy information option in the Server monitoring that is not presently available for the plugins. Graphs work best with multiple data points and constantly changing data, and record averages. When looking at the SQL commands, for example, it is great to see the total breakdown, whereas monitoring MySQL replication lag (a single metric) that is hopefully generally zero can appear a little bland. A cool trick is to click on the legend; this toggles the displayed value, and can really help when one value in a graph hogs the metric.

At present the plugin has 4 tabs of display:

  • Overview provides a high level view of total reads v total writes, database connections, network communications and a table of key utilizations (which I will discuss more in another post)
  • Query Analysis shows details of SQL commands, temporary (memory/disk) tables, slow queries, query cache usage, select query types, sort types and table locking.
  • InnoDB Metrics include buffer pool operations, a page breakdown, row operations, log writes, log and data fsyncs, checkpoint age, history list, internal threads and mutexes.
  • Replication shows lag, relay log volume, I/O thread lag, slave errors and master binary log volume (when on a master).

The plugin is written to be extensible via JSON configuration. Different variables, or say custom metrics from storage engines like Tokutek, can be easily defined as either a key/value set or a single row of metrics. However, the initial version of the dashboards does not allow customers to modify the present dashboard. Requests are welcome for me to expand the current global dashboards.

In the News

Percona Ireland??

Anybody else noticed that Percona appears to not be a US entity any more?

I observed it today.

$ sudo /usr/bin/innobackupex ...
InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona Ireland Ltd 2009-2012.  All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

In previous versions this did say Percona Inc, 2009-2012.

The footer of the Documentation also states © Copyright Percona Ireland Ltd. 2009-2013.

Percona Live Conference Recommendations

Percona Live MySQL Conference and Expo, April 22-25, 2013

While many attendees are repeat offenders, if 2013 is your first MySQL conference and you are relatively new with MySQL (say < 2 years experience), it can be daunting to determine which of the 8 or more concurrent sessions you should attend during the conference.

Here are my top recommendations that give you a good grounding in the various conference topics and a wealth of experience from known MySQL authorities, on important topics.

  1. A backup today saves you tomorrow by Ben Mildred at Pythian. Losing your data is a terrible experience. Learn what is needed to keep your data safe and your system highly available.
  2. Survey of Percona Toolkit: Command-Line tools for MySQL by Daniel Nichter at Percona. There are a wealth of additional MySQL tools that any resource should be familiar with. These are some of the most popular.
  3. Script It. Make Professional DBA tools out of nothing by Giuseppe Maxia at Continuent. A seasoned expert in the MySQL field, his expertise is invaluable for learning how to use MySQL efficiently. Giuseppe is also the creator of MySQL Sandbox, a huge productivity tool for developers.
  4. Practices for reducing MySQL database size by Yoshinori Matsunobu at Facebook. This is a consulting technique I use for great advantage with clients to improve performance. Yoshinori is also one of the most popular technical speakers at events.
  5. MHA: Getting started and moving past the quirks by Colin Charles at Monty Program. Creating a HA environment is essential for any successful application. MHA is one open source approach that should be considered.
  6. Managing data and data archiving using MySQL 5.6 new features of portable tablespace and exchange partition by Marco Tusa at Pythian. If there was one footnote feature in MySQL 5.6 that has a huge benefit, this is the feature. As data continues to grow rapidly in size, archiving is more important.

This year’s conference talks are organized by topic and skill level. This can also help you find talks specific to your needs. Topics include the following:

  • Developing Applications
  • Tools
  • Best Practices for Businesses
  • Database Administration
  • Utilizing Hardware
  • Replication and High Availability Strategies
  • Trends in Architecture and Design
  • New Features

2013 is sure to be a great event, with a lineup of many MySQL product features for the MySQL ecosystem.

When is the error log filename not the right filename

When evaluating a MySQL system one of the first things to look at is the MySQL error log. This is defined by the log[_-]error variable in the MySQL Configuration file. Generally found like:

grep log.error /etc/my.cnf

It is possible to find multiple rows because this could be defined in the [mysqld] and [mysqld_safe] sections. It is also possible it is incorrectly defined twice in any given section.

Immediately I see a problem here, and the following describes why. If you look at this file name, in this case it’s actually found, but the file is empty.

$ ls -l /var/lib/mysql/logs/mysql_error_log
-rw-r----- 1 mysql mysql 0 Feb 19 20:35 /var/lib/mysql/logs/mysql_error_log

An error log should never exist and be empty, because starting the instance produces messages. An error log could be empty because the system does a log rotate (BTW, never rotate the error log, see The correct approach to rolling MySQL logs). However, it is empty in this case because MySQL is not writing to the error log filename as defined, because it does not have a file extension. NOTE: there is no .log or similar extension. Looking more closely:

$ ls -l /var/lib/mysql/logs/mysql_error_log*
-rw-r----- 1 mysql mysql      0 Feb 19 20:35 /var/lib/mysql/logs/mysql_error_log
-rw-rw---- 1 mysql root  394530 Feb 19 20:35 /var/lib/mysql/logs/mysql_error_log.err

As you can see, MySQL has overwritten your parameter and given the file an extension, as verified by the runtime value.

mysql> show global variables like 'log_error';
| Variable_name | Value                                   |
| log_error     | /var/lib/mysql/logs/mysql_error_log.err |

So, the tip is, always use a .log extension for your MySQL error log (and slow query log).
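
The behavior observed above can be approximated in a few lines of Python, which may be handy when a script needs to locate the real error log from the configured value. This is a sketch under an assumption: it models only the rule seen in this post (a name without an extension gets .err appended), and the function name effective_error_log is hypothetical.

```python
import os.path


def effective_error_log(configured):
    """Approximate the file name the server uses for a configured log_error.

    Assumption: a name without an extension gets ".err" appended, as seen
    in the listing above; other server-side rules are not modeled.
    """
    root, ext = os.path.splitext(configured)
    return configured if ext else configured + ".err"


print(effective_error_log("/var/lib/mysql/logs/mysql_error_log"))
```

When in doubt, the runtime value from SHOW GLOBAL VARIABLES LIKE 'log_error' remains the authoritative answer.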

NOTE: Placing the log files in the MySQL datadir (which defaults to /var/lib/mysql) is also a bad idea. A topic for another blog post at a later time.

The MySQL symlink trap

Many users of MySQL install and use the standard directories for MySQL data and binary logs. Generally this is /var/lib/mysql.
As your system grows and you need more disk space on the general OS partition that commonly holds /tmp, /usr and often /home, you create a dedicated partition, for example /mysql. The MySQL data, binary logs etc are then moved to this partition (hopefully in dedicated directories). For example data is placed in /mysql/data.
Often however, a symbolic link (symlink) is used so that MySQL still refers to the data in /var/lib/mysql.

When it comes to removing the symlink and correctly configuring MySQL, you first stop MySQL and correctly define the datadir variable in my.cnf to point to the right location. However, MySQL still keeps the legacy directory information around and this will cause MySQL replication to fail in several ways when you attempt to restart your MySQL instance.

The binary log index, the relay log index, and the relay log info files all contain the legacy path. MySQL also does not make it easy to determine the actual names of these files.

The relay_log_index variable defines the relay log index, but defaults to [relay_log].index when not defined, so with SHOW GLOBAL VARIABLES this may be blank.
log-bin-index is a configurable option, but has no matching global variable. It defaults to [log-bin].index.
relay_log_info_file will contain a value, generally only a file name that is relative to the data directory.

In these situations, your only option is to manually edit these files, specifying the new datadir (or log-bin) path in order to correctly remove symlinks.
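
The manual edit can be scripted. The following Python sketch rewrites the leading directory of each line in an index file and leaves everything else untouched. It is a sketch under assumptions: the function name rewrite_index_paths and the binary log file names are hypothetical, the directories are the ones used as examples in this post, and it should only be run against copies of the files while MySQL is stopped.

```python
def rewrite_index_paths(lines, old_dir, new_dir):
    """Return index file lines with a leading old_dir replaced by new_dir.

    Lines that do not start with old_dir (for example relative ./ entries)
    are returned unchanged.
    """
    old_prefix = old_dir.rstrip("/") + "/"
    new_prefix = new_dir.rstrip("/") + "/"
    rewritten = []
    for line in lines:
        if line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        rewritten.append(line)
    return rewritten


# Hypothetical index file contents: one absolute and one relative entry.
index = ["/var/lib/mysql/mysql-bin.000001\n", "./mysql-bin.000002\n"]
print("".join(rewrite_index_paths(index, "/var/lib/mysql", "/mysql/data")), end="")
```

Matching only on a leading directory prefix keeps the change narrow, so other lines in the relay log info file, such as log positions, pass through unmodified.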

The best advice is to consider the design of your system first, and never place data in default locations if you feel this will have to be modified later. Define those dedicated directories before you start using your MySQL instance.

Poor programming practices

When will it stop? These amateur programmers that simply cut/paste code really affect those good programmers in the ecosystem trying to make a decent living. I was reviewing a developed (but incomplete) PHP/MySQL system using a common framework (which in itself is irrelevant for this post).

In one source file there were 12 repetitions of the following code:

    if (!array_key_exists($id,$this->session->userdata['permissions']) OR
	!array_key_exists('id', $this->session->userdata['permissions'][$id]) OR
	!array_key_exists('scope', $this->session->userdata['permissions'][$id]['name'])){
      $this->session->set_flashdata('alert', 'You are not authorized to go there.');

It’s bad enough when code is repeated and not put in a simple refactored function. When it’s repeated 12 times in one file, and OMG over 100 times in the product, that is a recipe for bugs and high maintenance code due to extremely poor coding practice.
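
For illustration, here is what pulling the repeated condition into one function could look like, sketched in Python rather than PHP to keep it language neutral. The function name is_authorized and the sample data are hypothetical; the three checks simply mirror the condition shown above, so any bug in that condition only has to be fixed in one place.

```python
def is_authorized(permissions, permission_id):
    """Mirror the repeated condition once: all three keys must be present."""
    entry = permissions.get(permission_id)
    if entry is None or "id" not in entry:
        return False
    name = entry.get("name")
    return isinstance(name, dict) and "scope" in name


# Hypothetical session permissions, shaped like the PHP array in the snippet.
permissions = {
    7: {"id": 7, "name": {"scope": "admin"}},
    8: {"id": 8, "name": {}},
}

for permission_id in (7, 8, 9):
    if not is_authorized(permissions, permission_id):
        print("You are not authorized to go there.")
```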

Carbonite Online Backup is a fraud

Do not listen to the hype or the advertising. Carbonite backup solution is a fraud. I never realized the extent of the failures of the software until I had a problem, which is exactly when you expect and demand that commercial software you pay for works.

Ironically, looking now via Google search for Carbonite restore problems there are plenty of horror stories. And just to add to the experience, the definition of Carbonite in the dictionary is “Explosive”.

Here is the first red flag. You log on to the website, and if you click on “View Files”, or under the “Backup” tab with a nice cloud icon you click “Access Files”, there is no information available. You will receive the error “We are unable to access your files on this computer right now. Please contact Customer Support at for further assistance.”. What it really means is “Until your computer is online and your Carbonite software is working, you have no access to the details of your files that are apparently backed up.”

When you contact customer support, they have no idea what that message means, and after wasting your time (for me in a chat session), a ticket was opened with technical support. The problem is there is no way to track your ticket online, get updates, post information etc. I minimized the window, and now my chat session is closed (most likely by the rather ill-informed customer representative). Guess what? There is no information about the ticket number in your account. So I may as well not have had that conversation; any evidence of it is now lost.

I was told that until a technical support person could access my computer there was no way I could access my files. WTF? If there is no centralized list or log of my files on your backup solution, and no way to see it, how do I know you ever backed up my files? A fancy progress bar that flashes and says backing up files? Any 2 year old can write that faux display. Red flag number 2. It seems the only way to see my files is to install this software on yet another system to restore files. That is, as Google searching indicates, a likely lesson in extreme frustration.

The ultimate cause of the problem was that my system crashed, and when it restarted, the Carbonite software was stuck in a state of “Registering”. I was told to just re-install the software. That’s not an answer in my book. That is red flag number 3.

I have definitely removed my credit card from their site to stop any automatic renewal of this crap software.

For the record, my home office backup solution includes important files on a Drobo. Backups of Documents to DropBox and then sync’d to another system. Backup of all files on several machines to a central external USB, and then regular backups of that which are taken offsite.

You cannot be too careful with important things like photos. Unfortunately this solution lacks a central catalog and versioning of files. For example, I overwrote an important presentation and did not realize for a few months. When I looked at all my backups of it, 3 or 4 copies, they were all the overwritten file, not the original. It took about a day to actually find a copied version, not a backed up version.

Upgrading to MySQL 5.5 on Ubuntu 10.04 LTS

Ubuntu does not provide an apt-get repository package for MySQL 5.5 on this older OS; however, this is still a widely used long-term support version. The following steps will upgrade an existing MySQL 5.1 apt-get version to a standard MySQL 5.5 binary.

1. Remove existing MySQL 5.1, retaining data and configuration

sudo su -
service mysql stop
cp -r /etc/mysql /etc/mysql.51
cp -r /var/lib/mysql /var/lib/mysql.51
which mysqld
dpkg -P mysql-server mysql-server-5.1 mysql-server-core-5.1
which mysqld
which mysql
dpkg -P mysql-client-5.1 mysql-client-core-5.1
which mysql
dpkg -P libdbd-mysql-perl libmysqlclient16 mysql-common
# This will not remove /etc/mysql if any other files are in the directory
dpkg -l | grep mysql
[ -d /etc/mysql ] && mv /etc/mysql /etc/mysql.uninstall
cp -r /etc/mysql.51 /etc/mysql

2. Prepare configuration and required directories.

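sudo su -
MYCNF=/etc/mysql/my.cnf   # assumed config file location; adjust if yours differs
grep basedir ${MYCNF}
# Use | as the sed delimiter because the replacement contains slashes
sed -i -e 's|^basedir.*$|basedir=/opt/mysql|' ${MYCNF}
grep basedir ${MYCNF}
# Add skip-syslog on the line after the [mysqld_safe] section header (GNU sed)
sed -i -e '/^\[mysqld_safe\]/a skip-syslog' ${MYCNF}
chown -R mysql:mysql /var/lib/mysql
mkdir -p /var/run/mysqld
chown mysql:mysql /var/run/mysqld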

3. Install MySQL 5.5

sudo su -
mkdir -p /opt
cd /opt
# Install MySQL 5.5 Binaries
apt-get install -y libaio-dev  # New 5.5 dependency
tar xvfz mysql*.tar.gz
ln -s mysql-5.5.28-linux2.6-x86_64 /opt/mysql
echo "export MYSQL_HOME=/opt/mysql
export PATH=\$MYSQL_HOME/bin:\$PATH" > /etc/profile.d/
chmod +x /etc/profile.d/
. /etc/profile.d/
echo $PATH
which mysql

4. Upgrade and verify MySQL Instance

su - mysql
cd /opt/mysql   # the bin/ paths below are relative to the MySQL basedir
bin/mysqld_safe --skip-syslog &
tail /var/log/mysql/error.log
# There will be some expected ERRORS in error log
bin/mysql_upgrade -uroot
bin/mysqladmin -uroot  shutdown
bin/mysqld_safe --skip-syslog &
tail -100 /var/log/mysql/error.log
mysql -uroot -e "SELECT VERSION();"
bin/mysqladmin -uroot  shutdown

5. Setup MySQL for system use

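# As Root
sudo su -
INIT=/etc/init.d/mysql   # assumed init script path; adjust if yours differs
cp /opt/mysql/support-files/mysql.server ${INIT}
# Use | as the sed delimiter because the replacements contain slashes
sed -i -e 's|^basedir=$|basedir=/opt/mysql|;s|^datadir=$|datadir=/var/lib/mysql|' ${INIT}
${INIT} start
mysql -uroot -e "SELECT VERSION();"
${INIT} stop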

The heavy handed LinkedIn approach to your contacts

I recently wanted to add two individuals to my list of professional contacts at LinkedIn. I was extremely disappointed at the modified user interface (UI) experience that made it difficult to do so. In the past, you just entered a list of emails.

Many companies these days pressure you into opening up your entire network of contacts so that they benefit from knowing your social graph. This is unacceptable.

You now have to go through the following steps just to send an email request for a connection in LinkedIn.

  • Add Connections
  • Select any email (last button of options)
  • Click Invite by individual email (hidden at bottom of page)

Not a cool new feature for Master_Host

I was surprised to find this new syntax for Master_Host in SHOW SLAVE STATUS on a customer MySQL server.

*************************** 1. row ***************************
               Slave_IO_State: Connecting to master
                  Master_Host: or 10.XXX.XX.XXX
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1-354215-bin-log.000005
          Read_Master_Log_Pos: 1624
               Relay_Log_File: db2-354214-relay-log.000001

Is this a fancy new Percona Server feature? No. It’s operator error.

Reading a little further, we find:

             Slave_IO_Running: Connecting

                Last_IO_Errno: 2005
                Last_IO_Error: error connecting to master 'repl@ or 10.XXX.XX.XXX' - retry-time: 60  retries: 86400

How can this be created?
Using MySQL MHA, you get the following message in the output of the commands that manage replication.

Thu Jan  3 17:06:40 2013 - [info]  All other slaves should start replication from here.
Statement should be: CHANGE MASTER TO MASTER_HOST=' or 10.XXX.XX.XXX', MASTER_PORT=3306,
 MASTER_LOG_FILE='db1-354215-bin-log.000005', MASTER_LOG_POS=1624, MASTER_USER='repl',

Needless to say, this syntax was taken literally, and MySQL did not complain.

I would suggest that, while MySQL does not validate that the MASTER_HOST value in the CHANGE MASTER TO command is resolvable, it should at least check that the value looks like a DNS name or an IPv4 or IPv6 address. A space ‘ ‘ is not a valid character in any of these.
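Until the server does such a check, a wrapper script can do it before issuing CHANGE MASTER TO. The following is a minimal sketch of the kind of character check suggested above. The function name is_plausible_host is hypothetical and is not part of MySQL or MHA, and the check only looks at which characters are present; it does not test that the host resolves.

```shell
#!/bin/sh
# is_plausible_host is a hypothetical helper, not part of MySQL or MHA.
# It accepts only letters, digits, '.', ':', '_' and '-', which covers
# DNS names, IPv4 addresses and IPv6 addresses. It does NOT resolve the name.
# Assumption: '_' is allowed because it appears in some internal host names.
is_plausible_host() {
    case $1 in
        '') return 1 ;;                  # empty value
        *[!A-Za-z0-9.:_-]*) return 1 ;;  # contains a space or other invalid character
        *) return 0 ;;
    esac
}
```

A value such as ' or 10.XXX.XX.XXX' from the MHA output fails this check because of the spaces, so a script could refuse to run the statement and report the bad value instead.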