SQL Analysis with MySQL Proxy – Part 2

As I outlined in Part 1 MySQL Proxy can be one tool for performing SQL analysis. The impact with any monitoring is the art of monitoring will affect the results, in this case the performance. I don’t recommend enabling this level of detailed monitoring in production, these techniques are designed for development, testing, and possibly stress testing.

This leads to the question, how do I monitor SQL in production? The simple answer to this question is, Sampling. Take a representative sample of your production system. The implementation of this depends on many factors including your programming technology stack, and your MySQL topology.

If for example you are using PHP, then defining MySQL proxy on a production system, and executing firewall rules to redirect incoming 3306 traffic to 4040 for a period of time, e.g. 2 seconds can provide a wealth of information as to what’s happening on the server now. I have used this very successfully in production as an information gathering an analysis tool. It is also reasonably easy to configure, execute and the impact on any failures for example are minimized due to the sampling time.

If you run a distributed environment with MySQL Slaves, or many application servers, you can also introduce sampling to a certain extent as these specific points, however like scaling options, it is key to be able to handle and process the write load accurately.

Another performance improvement is to move processing of the gathered information in MySQL proxy to a separate thread or process, removing this work from the thread execution path and therefore increasing the performance. I’m interested to explore the option of passing this information off to memcached or gearman and having MySQL proxy simply capture the packet information and distributing the output. I have yet to see how memcached and/or gearman integrate with the Lua/C bindings. If anybody has experience or knowledge I would be interested to know more.

It is interesting to know that Drizzle provides a plugin to send this level of logging information to gearman automatically.

SQL query analysis with MySQL Proxy

Long before there was the official Query Analyzer (QUAN), a component of MySQL Enterprise, SQL analysis was possible using MySQL Proxy.

The following is an introduction to logging and query analysis with MySQL Proxy.

Get MySQL Proxy

You need to first download MySQL Proxy. In this example I am using the Linux RHEL5 64bit OS and Version 0.7.2

$ wget http://dev.mysql.com/get/Downloads/MySQL-Proxy/mysql-proxy-0.7.2-linux-rhel5-x86-64bit.tar.gz/from/http://mirror.trouble-free.net/mysql_mirror/
$ tar xvfz mysql-proxy-0.7.2-linux-rhel5-x86-64bit.tar.gz
$ ln -s mysql-proxy-0.7.2-linux-rhel5-x86-64bit mysql-proxy
$ export PATH=`pwd`/mysql-proxy/sbin:$PATH
$ mysql-proxy --help-all


MySQL Proxy uses TCP/IP, so it is important you connect via the actual hostname. You should first confirm this, as appropriate MySQL permissions may be necessary. For example:

$ mysql -h`hostname` -u -p

On confirmation this works, you can then connect directly to the proxy

$ mysql -h`hostname` -P4040 -u -p


$ cd mysql-proxy/share/doc/mysql-proxy/
$ wget -O log.lua http://ronaldbradford.com/mysql-dba/mysql-proxy/log.lua
$ mysql-proxy --proxy-lua-script=share/doc/mysql-proxy/log.lua &
$ tail -f mysql.log

This script is based on simple query logging which requires a modification to work in more current versions of MySQL proxy.

$ mysql -hhostname -P4040 -u -p
mysql>  SELECT host,user,password FROM mysql.user;
mysql>  SELECT table_schema,COUNT(*) FROM information_schema.tables GROUP BY table_schema;
mysql>  SELECT NOW(), SLEEP(3);
mysql>  EXIT
$ cat mysql.log
2009-09-02 17:15:01     58 -- select @@version_comment limit 1
2009-09-02 17:16:15     58 -- SELECT host,user,password FROM mysql.user
2009-09-02 17:16:30     58 -- SELECT table_schema,COUNT(*) FROM information_schema.tables GROUP BY table_schema
2009-09-02 17:16:39     58 -- SELECT NOW(), SLEEP(3)

Query Analysis

Restart proxy with the histogram.lua sample provided.

$ mysql-proxy --proxy-lua-script=share/doc/mysql-proxy/histogram.lua &

Connect and run some queries.

$ mysql -hhostname -P4040 -u -p
mysql>  SELECT host,user,password FROM mysql.user;
mysql>  SELECT table_schema,COUNT(*) FROM information_schema.tables GROUP BY table_schema;
mysql>  SELECT NOW(), SLEEP(3);

While connected to the proxy, you can now review data from two pseudo tables.

mysql>  SELECT * FROM histogram.tables;
mysql>  SELECT * FROM histogram.queriesG
mysql>  DELETE FROM histogram.tables;
mysql>  DELETE FROM histogram.queries;

mysql> SELECT * FROM histogram.tables;
| table                     | reads | writes |
| information_schema.tables |     1 |      0 |
| mysql.user                |     1 |      0 |

mysql> SELECT * FROM histogram.queries;
| query                                                                                            | count | max_query_time | avg_query_time |
| SELECT @@version_comment LIMIT ?                                                                 |     1 |            300 |            300 |
| SELECT `table_schema` , COUNT( * ) FROM `information_schema` . `tables` GROUP BY `table_schema`  |     1 |           1822 |           1822 |
| SELECT `host` , `user` , `password` FROM `mysql` . `user`                                        |     1 |            494 |            494 |
| SELECT NOW( ) , SLEEP( ? )                                                                       |     1 |        3000735 |        3000735 |

Moving forward

The power is that with Lua you have the flexibility to write your own logging. Some improvements to these scripts could be.

  • Add the query time, number of rows, and result set size to the logging
  • Be able to sort histogram results or see top percentile. Being able to copy data into real tables would enable any level of analysis
  • Combine the logging and histogram scripts
  • Enable global enable/disabling of logging with SET GLOBAL commands
  • Support variable length IN queries, those that pass multiple values, so you end up with a subset of all queries
  • Provide a actual query example, making it easy to do a QEP. For normalized queries you need to do additional work to find values.
  • The histogram does not support the C API multi query functionality, where multiple queries can be passed to the server at one time. The problem is there is no way to time the individual queries.

Read on in SQL Analysis with MySQL Proxy – Part 2.


A good introduction document
MySQL Proxy – From architecture to implementation – OSCON 2008

MySQL Proxy lua scripts from presentation

The following Lua scripts are the examples are from my MySQL Proxy @ OSCON 08 presentation.


MySQL Proxy Analyze Query.

Requires MySQL Proxy Logging Module.

What is released is the Version for MySQL 5.0. A generic version for all MySQL versions is not yet released.


This script is part of the standard MySQL Proxy examples.

Other Scripts

Additional Lua scripts from MySQL forge are available here.

MySQL Proxy @ OSCON 08

Today I presented with Giuseppe Maxia of Sun Microsystems Inc at OSCON 08 on “MySQL Proxy: From Architecture to Implementation”. I was surprised to find that MySQL has a strong showing with a number of presentations this week.

Our talk covered the basics of MySQL Proxy, what’s coming in future features, and a number of examples of how I have used Proxy in consulting engagements to improve the information retrieval particularly for identifying performance problems.

Download Presentations Slides

Off to OSCON

I will be heading to my first OSCON next week where I will be presenting MySQL Proxy: from Architecture to Implementation in conjunction with Giuseppe Maxia .

As was written by Colin Charles Our booth is yours… Sun at OSCON, Sun/MySQL would appear to also have a reasonable turnout. So it will be good to see some old colleagues and friends, and hopefully meet some new contacts.

While I am based on the East Coast, I do also provide expert MySQL consulting for clients in any location. Should you like to find out more about my offerings covering Architecture, Performance, Scaling, Migration and Knowledge Transfer for MySQL Solutions, please Contact Me and I will arrange a time to meet next week.

MySQL NY Meetup Presentation

Tonight I spoke at the NY MySQL Meetup. The topic “Practical Performance Tips & Tricks” was a full packed 1 hour session, with 4 x 15 minute sections on Beginner, Intermediate, Advanced and MySQL Proxy. The goal to hopefully cover content for different level of attendees. This meeting followed up the large turnout from last month’s meeting with at least 35 people. Thanks again to Logicworks for sponsoring the night and providing the beer and food, especially to adjust for the late arriving presenter.

My Slides are download able in PDF format here.

Some additional links that were discussed during the meeting for reference included.

Details of my previous MySQL presentations can be found in my Articles section.

MySQL Proxy. Playing with the tutorials

I was playing with the 5 sample tutorial Lua scripts available here with the MySQL Proxy, but I was doing something a little inefficiently.

I started mysqld, then I started the MySQL Proxy with the lua script, then connected to MySQL via the proxy. To test a different script I was actually killing the MySQL Proxy and restarting with appropriate script, but this is unnecessary. MySQL Proxy will re-read the lua script, as specified with –proxy-lua-script on new connection. All I need to do is copy in the file in question and get a new mysql client connection.

The tech version of the right way:

$ cp tutorial-basic.lua running.lua
$ ./mysql-proxy --proxy-lua-script=running.lua &
$ mysql -uusr -p -P4040 -h127.0.0.1
mysql> # do my stuff
mysql> exit;
$ cp tutorial-inject.lua running.lua
$ mysql -uusr -p -P4040 -h127.0.0.1

In the tutorial-resultset.lua example, we see the creation of a new command, show querycounter which gives you the number of queries executed for the connection. Immediately I can see that people will be creating pseudo Com_insert,Com_select,… variables per connection so you can get a better granularity of information then what default status variables can provide.

It was interesting however to see the counter be a value of 20 when you open a connection. What you may ask, but we forget the little things sometimes when running the mysql client. On invocation it actually executed 20 commands to the server, that is 20 round trips.

There are indeed: SHOW DATABASES; SHOW TABLES; then 16 Field Lists for each mysql table (e.g. host,db,func…), then a SELECT @@version_comment. Most of these as a result of the rehash or tab-completion capabilities in mysql client.

Wow, I also learned something new, there is a @@version_comment, I just knew about @@version. You learn some new trivial point every day!

 mysql> select @@version,@@version_comment;
| @@version  | @@version_comment            |
| 5.0.37-log | MySQL Community Server (GPL) |
1 row in set (0.00 sec)

MySQL Proxy. Get it here.

Got MySQL Proxy yet!

If you haven’t got MySQL Proxy yet, then stop and get it now. Jan announced the release a few days ago of this new product offering from MySQL.

I first heard about MySQL Proxy at the recent MySQL Conference 2007 and actually used it a few weeks later to help address slow running queries during benchmarking with a granularity of milliseconds — Wow. The product has grown immensely since then and I’ve watched in true amazement at the speed of development by Jan, who I only found out recently was the creator of Lighthttpd.

Now it’s past midnight and I think of those things I’ve wanted, can the proxy now do them. Here are a few ideas for me to play with.

  • Find Slow Queries < 1 second
  • Add a time (HH:MM:SS) column to SHOW GLOBAL STATUS output, and really hope it comes out in mysqladmin extended-status. Cross fingers here
  • Be able to log slow queries to a Memory table, a little like mysql.slow_log but more intelligent, and in 5.0.
  • True SQL access to things like PROCESSLIST, SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS, again like the 5.1 I_S tables, but I want them now in 5.0

The mind boggles. As Giuseppe describes it’s also an excuse to learn a new language called Lua. Those familiar with the MySQL GUI Products and have played with the GRT would have already seen Lua in action, as early as the 2006 User Conference.

Woot! Time to hit post and check out the real www.woot.com

Top 5 wishes for MySQL

Note: My views are just that: mine.

1. Real time Query Monitoring

MySQL 5.0 GA provides only 3 ways to look at queries that are executed on a server in some way or another. Slow Query Log, General Query Log and Binary Log. All require a server reboot to activate and de-activate. In a production system, it’s sometimes critical to be able to know “what is going on”, and you simply can’t reboot the server twice (once to turn on, once to turn off). 5.1 goes some way with Log Tables to being able to turn on General and Slow Logs into tables. Question is, as Kevin Burton listed in his points, when is 5.1 going to be out.

Real time query monitoring also needs to have a granularity of operation better then “server”. There needs to be a capacity to assign this on per connection basis. A server is being hammered, certain status variables are increasing greatly, I need to know now what queries are causing this. MySQL provides no means of doing this. MySQL Proxy is a great new idea/project people will start hearing more about, and it goes a long way to helping, but it’s not dynamic in that I can simply turn on logging on a production system without impact to the MySQL server or connections.

Real time query monitoring granularity of time, also need to be in better units, it’s very difficult to find slow running queries > 100ms when the present granularity is seconds. MySQL Proxy as mentioned, and also Connector/J provide this, (BTW Connector/J has excellent features in it’s many connection options if you develop with Java but it’s yet another output to look at, and when your application server and database servers are on different machine architectures it’s a lot of work to sync).

I am also behind SHOW PROFILE. I’d like to see it being able to be attached to existing connections, and applied to queries, and then output discarded for a time base condition (say < 100ms). Granted the act of observation slows things done, it the ability to be able to observe, see and use the information that’s around in bits that would be a start.

2. Consistent Release Cycles

As Stewart pointed out, it’s ridiculous to have 2 years for release of a product. A consistent cycle is needed. 5.0 Change Notes shows first release to GA 22 months. 5.1 Change Notes still not GA is at 19 months.
We talk now about getting new features, 5.1 is frozen, 5.2 seems lost now in any discussions. 6.0 has a few key features but I’m sure significant new features will be limited to ensure the exposure of Falcon. So, a key new feature maybe in 6.1+

Scope creep, lack of clear planning, test coverage, and user community contribution I feel are all factors. I know that user community testing and contribution is continuing to increase and I applaud the valuable contribution of the community. I wish I could do more myself.

3. Information Schema Extensions

I’ve heard of a Pluggable API for I_S tables in 5.1 Could somebody really confirm? I’ve seen Google doing File System Storage Engines (e.g. for /proc info). I really, really, really wish things like 5.1 processlist and status/variables tables were backported to 5.0 to start with.

In addition, now that I have started, there is need for more detailed information on queries, extensions of status variables that are needed. SHOW PROFILE goes someway internally to indicate what’s going on, but knowing that a certain buffer is being used, and what portion of it per connection will help in sizing. It’s important for example to know the tmp_table_size actually used in a result set, VARCHAR and COLLATION have a huge effect that people simply don’t consider. As the number of Pluggable storage engines increases, the need to know what is really happening is going to be more important. Some of this may be more in relation to Real Time Query Monitoring, but I feel certain additional information is needed to be stored.

4. Online Table Maintenance

It was not until I had to time operations recently for ALTER TABLE ADD COLUMN|INDEX did I realize the extent of the time it takes for InnoDB tables (i.e. is takes your database table offline during this time for any OLTP). My tests were taking over an hour (and I was not in the 3 digit GB range for a table). A real uptime system can’t support downtime like this. Traditionally large scale out MySQL applications have been developed around this limitation, however to compete more with Enterprise experiences, and resources coming from enterprise background this is simply not an option. Add the fact you can’t add a datafile to a Innodb Tablespace online (why not!). While speaking of datafiles, I echo Frank’s comment with the limitation of when using innodb_file_per_table, you can’t copy the file between MySQL Instances (assuming for example all the same version of H/W, O/S, MySQL).

I really hope that Falcon addresses these issues to provide a transactional storage engine offering with these enterprise uptime features.

5. Published Benchmarks

MySQL does not publish any benchmarks, well at least not what I know about. The first problem is: how long is a piece of string. There are millions of variables, but it would be great if even a number of cases of straight forward cases were proven. People may then have a better indication of baseline systems.

Here is my initial wish list.

  • Classification of Server Configuration. Let’s say a comparison between 1 CPU (dual Core), 2 CPU (dual Core) and 4 CPU (dual Core) commodity H/W. With the same memory (4G), disk (local disk), O/S (64 bit to support > 4GB), sample data (20-50GB) and same queries (simple OLTP only) just what are the benefits. Can we get a cost of throughput to $ Cost.
  • Disk Configuration. Just how does local Raid 1/Raid 5/Raid 10, compare with SAN (Raid 5) and SAN (Raid 10). Ok there are many types of disks as well as Raid, but start with commodity SATA 72K, 8MB cache. In addition how does a machine with 6 drives (in Raid 1 with OS,Data,Logs split) compare with Raid 5 or Raid 10.
  • RAM. How do our tests run when we take a 4GB system with 20-50GB of data and give it now 16GB.
  • Backup/Recovery. How long does it take to backup and restore.
  • Admin, how long does it take to ALTER TABLE, add data file, even something simply like how long to load data into a memory table across H/W has been interesting

As you can see it turns into a nightmare quickly, we didn’t talk about storage engines like MyISAM/InnoDB, tuning parameters, different O/S etc but surely something really is better then nothing. If only there was a baseline of data and queries to start with. Surely with the data sources available out there, some enterprising person could create a 20GB, 50GB + realistic production type data source, and 20-50 OLTP queries and we have a baseline.

There is talk of the Build Farm by Jay for compiling, let’s get that baseline so we can run some tests across thousands of configurations.
If sufficient work was done by MySQL to get some standard start, then the community might take up the challenge of taking the data/queries/benchmarking framework and test on all the configurations out there, tune to the wazoo and provide back to the MySQL Forge data for everybody to look at.

One day, knowing that this type of disk with this type of battery backup in this RAID configuration just isn’t anywhere near as good as 3 other types of options at roughly the same price.

The Rest

There are more, but in keeping with the spirit of 5 knowing that at least 2 people have shown scope creep already I’ll stop. I really want to mention more.

More About Top 5

Jay Pipes started the Top 5 wishes for MySQL recently. Here are the Planet MySQL contributors to date.

Jay Pipes
Marten Mickos – MySQL CEO
Stewart Smith
Kevin Burton
Farhan Mashaqi
Jeremy Cole

It’s almost like a chain letter, so I’ll start it by passing it onto 3 more, my challenge is to: — Mike Kruckenberg, my evil(he isn’t really evil) twin – Roland Bouman & Paul McCullagh.

27 June 2007 Update
Since my posting we have also had:
Antony Curtis
Alan Kasindorf
Jim Winstead
Jonathon Coombes