mk-query-digest Tips – Showing all hosts & users

The Maatkit toolkit provides a suite of additional MySQL commands. There is one command I use constantly, and that is mk-query-digest.

Unfortunately the documentation leaves a lot to be desired for usability. While thorough, it is a man page and not a user guide. Several of us have discussed writing better documentation, however it's always a matter of time. I have, however, learned a number of tips and I'd like to share them in smaller digests.

The first tip is showing the full detail of truncated lines. Maatkit truncates per-line output to a reasonable length, around 73 characters.

One of those lines is, for example, the list of hosts that connected to MySQL for a given query.

# Hosts                  4 192.168.40... (2), 192.168.40... (2)... 2 more
# Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more

The problem is I want to know what that "1 more" is, so I can gather a complete list of IP addresses that connect to this server. You do that with the --show-all=host argument.

Without

$ cat external.tcpdump | ./mk-query-digest --type tcpdump | grep Hosts | uniq -c
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more
      1 # Hosts                  1 99.99.139.140

With

$ cat external.tcpdump | ./mk-query-digest --type tcpdump --show-all=host | grep Hosts | uniq -c
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6), 99.99.139.140 (2)
      1 # Hosts                  1 99.99.139.140

You can apply the same principle to the Users line as well with --show-all=user.

$ cat external.tcpdump | ./mk-query-digest --type tcpdump  --show-all=user | grep Users | uniq -c
      1 # Users                  2 xxx (13), phpmysqlmo... (5)
     49 # Users                  1  xxx

The problem is I still get a truncation of the name ‘phpmysqlmo…’. That's the one thing I'm trying to uncover, because that IP and username do not have valid permissions for this system.
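One way to dig further into that user is the --filter option, which, as I understand it, evaluates a snippet of Perl code against each $event. Treat the following as a sketch only: it assumes the tcpdump parser populates a user attribute, and the regex simply matches the visible prefix of the truncated name.

$ cat external.tcpdump | ./mk-query-digest --type tcpdump --show-all=host \
    --filter '($event->{user} || "") =~ m/^phpmysql/'

That should limit the report to events from the suspect user, and the Hosts line then lists only the addresses it connected from.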

tcpdump errors on FreeBSD for mk-query-digest

While I use this tcpdump command for MySQL query analysis with mk-query-digest, I recently found that it didn't work on FreeBSD.

$ tcpdump -i bge0 port 3306 -s 65535 -x -n -q -tttt -c 5
tcpdump: syntax error

It left me perplexed, and reading the man page seemed to indicate my options were valid. I tried a few variants just to be sure, without success.

$ tcpdump -i bge0 -c 5 port 3306 -x
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -q
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -tttt
tcpdump: syntax error

The solution was actually quite simple in the end. It had nothing to do with the options themselves; it had everything to do with their order. Placing the port filter expression last solved the problem.

$ tcpdump -i bge0 -s 65535 -x -n -q -tttt -c 5  port 3306
$ uname -a
FreeBSD db4.example.com 6.3-RELEASE-p3 FreeBSD 6.3-RELEASE-p3 #0: Wed Jul 16 05:13:50 EDT 200
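With the ordering corrected, the same pipeline used on Linux works on FreeBSD as well. A minimal sketch, reusing the bge0 interface from above:

$ tcpdump -i bge0 -s 65535 -x -n -q -tttt port 3306 | ./mk-query-digest --type tcpdump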

Take a look at mk-query-digest

Q: What SQL is running on your MySQL database server now?
A: This is the bane of MySQL DBAs, as there is no official MySQL instrumentation that is dynamic and sufficiently fine-grained to solve this problem at the SQL interface.

While hybrid solutions exist, the lack of dynamic, real-time visibility is the issue. There is, however, great work being done by Baron and others on Maatkit's mk-query-digest and packet sniffing of the MySQL TCP packets.

$ sudo tcpdump -i eth0 port 3306 -s 65535  -x -n -q -tttt | ./mk-query-digest --type tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
# Caught SIGINT.
5444 packets captured
8254 packets received by filter
2809 packets dropped by kernel
# 2.1s user time, 40ms system time, 22.23M rss, 57.60M vsz
# Overall: 1.58k total, 38 unique, 262.67 QPS, 2.34x concurrency _________
#                    total     min     max     avg     95%  stddev  median
# Exec time            14s    41us      2s     9ms    23ms    72ms   236us
# Time range        2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.052460
# bytes            271.31k       8   8.79k  176.28  621.67  412.51   44.60
# Rows affe            248       0       3    0.16    0.99    0.38       0
# Warning c              0       0       0       0       0       0       0
#   3% (58)   No_index_used

# Query 1: 118.67 QPS, 1.23x concurrency, ID 0x16219655761820A2 at byte 2167682
#              pct   total     min     max     avg     95%  stddev  median
# Count         45     712
# Exec time     52      7s    41us      1s    10ms    23ms    80ms   138us
# Hosts                 11 10.251.199... (132), 10.251.103... (129)... 9 more
# Time range 2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.051177
# bytes          2   6.43k       8      17    9.25   16.81    3.15    7.70
# Errors                 1    none
# Rows affe      1       4       0       1    0.01       0    0.07       0
# Warning c      0       0       0       0       0       0       0       0
#   0% (1)    No_index_used
# Query_time distribution
#   1us
#  10us  ################################################
# 100us  ################################################################
#   1ms  #####
#  10ms  #############
# 100ms  #
#    1s  #
#  10s+
# EXPLAIN
select 1\G

....

# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x16219655761820A2     7.3861 54.9%     712   0.010374 SELECT
#    2 0x930DE584EC815E11     1.6664 12.4%      35   0.047611 SELECT X
#    3 0x68B1E4E47977667A     1.4265 10.6%      71   0.020092 SELECT Y Z
...

In this real-time example, SELECT 1, a Connector/J keep-alive in version 3.1.4 used with iBATIS, is the major SQL statement. (Yes MM, I know about /* ping */; I have suggested it to the client.) With additional sample times I was, however, able to identify a new query and confirm a full table scan caused by the lack of a good index. Monitoring had highlighted an increase in SQL statements and table scans, but you need tools such as this to identify the problem SQL in a well-tuned system.
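If you want the report without the keep-alive noise, the same --filter option can drop those events before aggregation. Again only a sketch, assuming SELECT 1 fingerprints to 'select ?':

$ sudo tcpdump -i eth0 -s 65535 -x -n -q -tttt port 3306 | \
    ./mk-query-digest --type tcpdump --filter '($event->{fingerprint} || "") ne "select ?"'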

There is a lot of information to digest in this output, in order to confirm and determine the relative benefit of each number, the histogram, etc. But the identification of SQL in real time, together with the overall summaries and the ready-made EXPLAIN and SHOW CREATE TABLE statements, shows this tool has been designed by MySQL DBAs for MySQL DBAs.
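If your version has the --explain option, mk-query-digest can also run EXPLAIN for you against a live server and include the output in the report. A sketch only; the h=, u= and p= values below are placeholders for your own server and credentials:

$ cat external.tcpdump | ./mk-query-digest --type tcpdump --explain h=localhost,u=dba_user,p=dba_pass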

Sheeri just wrote about Dynamic General and Slow Query Log Before MySQL 5.1, which, apart from the file I/O overhead, is an idea I'd not considered before. What may be a good idea is to pass this information into a named pipe and then let another process do whatever it wants with it. Drizzle solves this problem by allowing query logging information to be shipped off to Gearman.
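A rough sketch of that named-pipe idea, with the paths being my assumptions and relying on mk-query-digest supporting the genlog input type in your version:

$ mkfifo /var/lib/mysql/general.fifo
# point MySQL's general query log at the FIFO (log= in my.cnf, or SET GLOBAL general_log_file on 5.1+)
$ cat /var/lib/mysql/general.fifo | ./mk-query-digest --type genlog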