Given the popular response to my first lesson in improving scalability, where I detailed simple ways to eliminate unnecessary SQL, let me share another common MySQL scalability bottleneck that can be instantly overcome.
Analyzing the writes that occur on a system can expose obvious potential bottlenecks. The MySQL Binary Log is a wealth of information that can be mined. Simple DML counts per table can be achieved with a single-line command.
Let’s look at the following example output of a production system:
mysqlbinlog /path/to/mysql-bin.000999 | \
  grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter" | \
  cut -c1-100 | tr '[A-Z]' '[a-z]' | \
  sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | \
  sed -e "s/ where .*$//" | \
  sort | uniq -c | sort -nr
Of the approximately 100,000 DML statements, we get the following breakdown:
  55283 update sessions
  25204 insert into sessions
  12610 update items
  10536 insert into item_categories
   7532 update users
   5168 delete from item_categories
More than 50% of the statements that are written to the binary log, and therefore replicated, are UPDATEs of the sessions table. A further 25% are INSERTs into the same table. The two most frequent statements alone represent some 75% of all DML.
What is disappointing is that these statements do not belong in MySQL. This is an example of MySQL being used for a purpose to which other products are better suited. While there is an argument for keeping all data in MySQL, the cost to MySQL memory management, backup/recovery, and slave replication throughput and lag can significantly affect the scalability of your important MySQL data.
What is observed here is session management, for which a key-value store is the better alternative. In most circumstances this information does not even need to be persisted. The obvious replacement is memcached. If you do wish to persist this data, there is an ever-increasing list of products, including Redis, Tokyo Cabinet/Kyoto Cabinet, Membrain and Voldemort, that are specifically designed as key-value stores. Even the popular NoSQL database MongoDB can easily be substituted as a key-value session manager, with the added benefit of being a more fully featured product for other purposes.
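As a minimal sketch of the key-value approach (the session key and JSON payload here are made up), the entire lifecycle of a session entry reduces to a handful of commands against a local Redis instance:

redis-cli SETEX session:4f2a9c1e 1800 '{"user_id":42,"cart":[]}'   # store with a 30-minute TTL
redis-cli GET session:4f2a9c1e                                     # fetch on the next request
redis-cli EXPIRE session:4f2a9c1e 1800                             # refresh the TTL for a sliding expiry

The TTL gives you session expiry for free, work that the sessions table above almost certainly performs with some of those frequent UPDATE and DELETE statements.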
This is a common mistake when using a framework such as Ruby on Rails (RoR) or PHP CodeIgniter, among many others.
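In PHP, for instance, moving the native session handler off the database is a configuration change rather than a code change. A sketch, assuming the PECL memcached extension and a memcached instance on the default port (the fixed session id exists only so that repeated CLI runs share state):

php -d session.save_handler=memcached \
    -d session.save_path="localhost:11211" \
    -r 'session_id("demo123"); session_start();
        $_SESSION["hits"] = isset($_SESSION["hits"]) ? $_SESSION["hits"] + 1 : 1;
        echo $_SESSION["hits"], PHP_EOL;'

Run it twice and it prints 1 then 2: the session state now lives in memcached and never touches MySQL or the binary log.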
Pavel says
Quite a usable binlog grep! Thank you.
Simon J Mudd says
I also use this technique quite a lot to collect data from the binlogs over a period of time, to allow me to see why there might be a sudden increase in load or, as you suggest here, to figure out which tables get the most changes.
See also: http://blog.wl0.org/2010/10/how-analysing-your-binlogs-can-be-quite-informative/ which I wrote a few months ago.
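A minimal sketch of that over-time collection (the binlog location and the top-10 cut-off here are assumptions, not from Simon's post):

for f in /var/lib/mysql/mysql-bin.[0-9]*; do
  echo "== $f"
  mysqlbinlog "$f" 2>/dev/null | \
    grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" | \
    cut -c1-100 | tr '[A-Z]' '[a-z]' | \
    sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//;s/ where .*$//" | \
    sort | uniq -c | sort -nr | head -10
done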
ronald says
Simon raises a great point about monitoring. You should always be gathering statistics over time, especially to identify changes from the norm.
Claudio Nanni says
…and I would say in 90% of the cases it is terrible code!
Most developers have no idea of the implications of opening a connection to the database, and they use it as if it were a local variable. I have seen PHP scripts looping through thousands of IDs and firing a SELECT for each ID, bringing a complete cluster (web servers, MySQL with replication) to its knees, compounded by MySQL's global mutexes.
sanket kolhe says
True, I used memcached to store sessions and it eased the load on MySQL by about 70%.
Hassan says
Thanks for those insights.
I've done the analysis you mentioned above and found lots of queries on my session table that don't need to be replicated to the slave. Is it possible to avoid/skip selected INSERT/UPDATE/DELETE statements from being logged?
Please suggest.
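One known approach (a sketch, not from the original exchange; sessiondb.sessions and its expires column are hypothetical names): a connection with the SUPER privilege can disable binary logging for its own statements with SET sql_log_bin. Alternatively, the sessions table can live in its own schema excluded via the binlog-ignore-db server option, noting that under statement-based logging that filter applies to the current default database.

# sessiondb.sessions and the expires column are hypothetical names
mysql -u root -e "SET SESSION sql_log_bin = 0;
                  DELETE FROM sessiondb.sessions WHERE expires < NOW();"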
RA says
Very useful post with a nice binlog grep. However, we use the row-based (RBR) binlog format; in this case, do you know of a similar grep to get the bottom-line details of how much work MySQL is doing to manage sessions in our case? Thanks!
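A sketch for row-based binlogs (not from the original post): mysqlbinlog can decode row events into pseudo-SQL comments with --base64-output=decode-rows -v, and those "### " lines can then be counted per table. Note this counts rows changed rather than statements.

mysqlbinlog --base64-output=decode-rows -v /path/to/mysql-bin.000999 | \
  grep -i -e '^### insert into' -e '^### update' -e '^### delete from' | \
  tr '[A-Z]' '[a-z]' | tr -d '#`' | sed -e 's/^ *//' | \
  sort | uniq -c | sort -nr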
Derek Downey says
Thank you for the very nice and innovative grep command!