Binary Log Replayer

Whether applied via the replication slave stream, or by piping mysqlbinlog output from a binary/relay log into the mysql command line client, all statements are executed in a single thread, as quickly as possible.

I am seeking a tool that can replay a binary/relay log for a benchmark at a pace representative of the original statements. For a simple example, if the binary log contains 3 transactions in the first second, 2 transactions in the second and 5 transactions in the third, I want the replay to take roughly 3 seconds, not to run as fast as possible (which would be sub-second). The tool should wait out the remainder of each second before processing the next SQL statements in the incoming stream.
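To make the pacing concrete, here is a minimal sketch of the kind of wrapper I have in mind, not a finished tool. The script name, replay host and user are purely illustrative, and it assumes GNU date plus the standard "#YYMMDD HH:MM:SS ..." event headers that mysqlbinlog prints before each event.

#!/bin/bash
# replay-paced.sh -- rough sketch only; the host and user below are hypothetical.
# Pipes mysqlbinlog output to the mysql client, but sleeps so that each second
# of original binary log traffic takes roughly a second to replay.
BINLOG="$1"                                # e.g. /path/to/mysql-bin.000999
TARGET="--host=replay-host --user=bench"   # hypothetical replay target

start_wall=$(date +%s)
first_ts=""

mysqlbinlog "$BINLOG" | while IFS= read -r line; do
  # Event headers look like "#100816 14:05:21 server id 1  end_log_pos ..."
  if [[ $line =~ ^#([0-9]{2})([0-9]{2})([0-9]{2})[[:space:]]+([0-9]{1,2}:[0-9]{2}:[0-9]{2}) ]]; then
    ts=$(date -d "20${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]} ${BASH_REMATCH[4]}" +%s)
    [ -z "$first_ts" ] && first_ts=$ts
    # Wait out the remainder of the original second before continuing.
    target=$(( start_wall + ts - first_ts ))
    now=$(date +%s)
    [ "$target" -gt "$now" ] && sleep $(( target - now ))
  fi
  printf '%s\n' "$line"
done | mysql $TARGET

A real tool would also need to handle multiple log files, sub-second pacing and replaying concurrent sessions in parallel, but the above captures the basic throttle.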

Does anybody know of a tool that currently provides this type of functionality? Any input appreciated before I create my own.

Part 2 – Simple lessons in improving scalability

Given the popular response to my first lesson in improving scalability, where I detailed simple ways to eliminate unnecessary SQL, let me share another common MySQL scalability bottleneck that can be instantly overcome.

Analyzing the writes that occur on a system can expose obvious potential bottlenecks. The MySQL Binary Log is a wealth of information that can be mined. Simple DML counts per table can be obtained with a single command.

Let’s look at the following example from a production system:

mysqlbinlog /path/to/mysql-bin.000999 |  
   grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter"  | 
   cut -c1-100 | tr '[A-Z]' '[a-z]' |  
   sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" |
   sort | uniq -c | sort -nr

Of the approximately 100,000 DML statements, we get the following breakdown.

  55283 update sessions
  25204 insert into sessions
  12610 update items
  10536 insert into item_categories
   7532 update users
   5168 delete from item_categories
 

More than 50% of the statements that are written to the binary log, and therefore replicated, are UPDATEs of the sessions table. A further 25% are INSERTs into the same table. The two most frequent statements alone represent some 75% of all DML.

What is disappointing is that these statements do not belong in MySQL. This is an example of MySQL being used for a purpose to which other products are far better suited. While there is an argument for keeping this data in MySQL, the impact on MySQL memory management, backup/recovery, and slave replication throughput and lag can significantly limit the scalability of your important MySQL data.

What we are observing here is session management, for which a key-value store is the better alternative. In most circumstances this information does not even need to be persisted. The obvious replacement is memcached. If you do wish to persist this data, there is an ever-increasing list of products, including Redis, Tokyo Cabinet/Kyoto Cabinet, Membrain, Voldemort etc., that are specifically designed as key-value stores. Even the popular NoSQL MongoDB can easily be substituted as a key-value session manager, with the added benefit of being a more fully featured product for other purposes.
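As a rough illustration only (the session id, payload and TTL below are made up), storing a session in memcached becomes a single key-value set with an expiry, rather than an INSERT followed by repeated UPDATEs of a sessions table. Using memcached's plain text protocol via nc against a hypothetical local server:

# hypothetical session id and JSON payload; 1800 = 30 minute TTL, 17 = payload length in bytes
printf 'set session:ab12cd34 0 1800 17\r\n{"user_id": 4242}\r\nquit\r\n' | nc 127.0.0.1 11211

None of these writes appear in the binary log, so they no longer compete with your important data for buffer pool, backup/recovery or replication resources.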

This is a common mistake when using a framework such as Ruby on Rails (RoR) or PHP CodeIgniter, among many others.