10x Performance Improvements in MySQL – A Case Study

The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.

The end result was a page load improvement from 700+ms load time to a a consistent 60ms.

Using the Query Cache effectively

Maximize your strengths, minimize your weaknesses.

You can apply this approach to many things in life, I apply it to describing and using MySQL the product, and it’s components. The Query Cache like many features in MySQL, and indeed features in many different RDBMS products (don’t get me started on Oracle *features*) have relative benefits. In one context it can be seen as ineffective, or even detrimental to your performance, however it’s course grain nature makes it both trivial to disable dynamically (SET GLOBAL query_cache_size=0;), and also easy to get basic statistics on current performance (SHOW GLOBAL STATUS LIKE ‘QCache%';) to determine effectiveness and action appropriately.

The Query Cache is course grained, that is it is rather simple/dumb in nature. When you understand the path of execution of a query within the MySQL kernel you learn a few key things.

  • When enabled, by default the Query Cache will cache all SELECT statements within certain defined system parameter conditions. There are of course exceptions such as non-deterministic functions, prepared statements in earlier versions etc.
  • Any DML/DDL statement for a table that has a query cached, flushes all query cache results that pertain to this table.
  • You can use SQL_CACHE and SQL_NO_CACHE as hints however you can’t configure on a table by table, or query basis.
  • The query cache works on an exact match of the query (including spaces and case) and other settings such the client character set, and protocol version. If a match is found, data is returned in preformed network packets.<.li>

The Query Cache was not good when set to large values (e.g. > 128M) due to in-efficient cache invalidation. I’m not certain of the original source of this condition however Bug #21074, fixed in 5.0.50 and 5.1.21 is likely the reason.

My advice is to disable the Query Cache by default, especially for testing. As a final stress test you can enable to determine if there is a benefit.

I wish MySQL would spend time in improving key features, for example the Query Cache lacks sufficient instrumentation like what queries are in the cache, what tables are in the cache, and also lack all the sufficient system parameters exposed to fine tune. I believe there is a patch to show the queries for example, but I was unable to find via a google search.

It is a powerful and easy technology if you use it well. It involves architecting your solution appropriately, and knowing when the Query Cache is ineffective.

I have a number of circumstances where the query cache is extremely effective, or could be with simple modifications. A recommendation to a recent client with a 1+TB database was to split historical and current data into two different instances. The data was already in separated tables, the application already performed dual queries, so the change was a simple as a new connection pool. The benefits were huge, not only would the backup process be more efficient, some 500GB of data now only had to be backed up once (as is was 100% static), the scaling and recovery process improved, but the second MySQL instance could enable the query cache and the application would get a huge performance improvement with ZERO code changes for caching. That’s a quick and easy win.

On a side note, I wanted to title this “The MySQL Query Cache is not useless”. When I was a MySQL employee I got reprimanded (twice) for blogging anything about MySQL that wasn’t positive. This blog post is in direct response to Konstantin, a Sun/MySQL employee who actually works on the actually MySQL server code who wrote Query cache = useless?. In my view it is not useless.