Archive for February 24th, 2011

Part 2 – Simple lessons in improving scalability

Thursday, February 24th, 2011

Given the popular response from my first lesson in improving scalability where I detailed simple ways to eliminate unnecessary SQL, let me share another common bottleneck with MySQL scalability that can be instantly overcome.

Analyzing the writes that occur on a system can expose obvious potential bottlenecks. The MySQL Binary Log is a wealth of information that can be mined. Simple DML Counts per table can be achieved by a single line command.

Let’s look at the following example output of a production system:

mysqlbinlog /path/to/mysql-bin.000999 |  \
   grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter"  | \
   cut -c1-100 | tr '[A-Z]' '[a-z]' |  \
   sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" |  \
   sort | uniq -c | sort -nr

Of the approx 100,000 DML statements we get the following breakdown.

  55283 update sessions
  25204 insert into sessions
  12610 update items
  10536 insert into item_categories
   7532 update users
   5168 delete from item_categories
 

More then 50% of the statements that are written to the binary log and therefore replicated are INSERT’s into the sessions table. A further 25% are UPDATE’s to the same table. This represents 75% of DML statements in just the two most frequent statements.

What is disappointing is that these statements do not belong in MySQL. This is an example of when MySQL is being abused for a purpose where other products are more suited. While there is the argument in using MySQL for storing data, the impact in MySQL memory management, backup/recovery and slave replication throughput and lag can significantly impact scalability of your important MySQL data.

What is observed here is session management where a key value store product should be used as an alternative. In most circumstances it is likely this information is not even required to be persisted. The obvious replacement here is using memcached. If you do wish to persist this data there is an ever increasing list of products including Redis, Tokyo Cabinet/Kyoto Cabinent, Membrain, Voldemort etc that are specifically designed as a key-value store. Even the popular noSQL MongoDB can be easily substituted to perform as a key-value session manager with the added benefits of being a more fully functional product for other purposes.

This is often a common mistake when you use a framework such as Ruby on Rails (RoR) or PHP Code Igniter and many others.

Optimizing UPDATE and DELETE statements

Thursday, February 24th, 2011

Updated Nov 2011. Check out my latest book on Optimizing SQL Statements for more information. MySQL 5.6.2 also now provides an EXPLAIN syntax for UPDATE and DELETE statements natively.

While most people look at performance optimizations for SELECT statements, UPDATE and DELETE statements are often overlooked. These can benefit from the principles of analyzing the Query Execution Plan (QEP). You can only run an EXPLAIN on a SELECT statement, however it’s possible to rewrite an UPDATE or DELETE statement to perform like a SELECT statement.

To optimize an UPDATE, look at the WHERE clause. If you are using the PRIMARY KEY, no further analysis is necessary. If you are not, it is of benefit to rewrite your UPDATE statement as a SELECT statement and obtain a QEP as previously detailed to ensure optimal indexes are used. For example:

UPDATE t
SET	c1 = ‘x’, c2 = ‘y’, c3 = 100
WHERE c1 = ‘x’
AND	d = CURDATE()

You can rewrite this UPDATE statement as a SELECT statement for using EXPLAIN:

EXPLAIN SELECT c1, c2, c3 FROM	t WHERE c1 = ‘x’ AND	d = CURDATE()

You should now apply the same principles as you would when optimizing SELECT statements.