MySQL Monitoring 101

At the 2009 MySQL Conference and Expo I presented to a full room on MySQL Monitoring 101.

This presentation focused on the following four goals.

  • Know what to monitor
  • Know how you can monitor
  • Learn practices to diagnose problems
  • Have a foundation of historical information

Updated 09/18/09
You can also find additional materials at:

A change in the MySQL Binary distributions

Yesterday was the surprise announcement of MySQL 5.4 at the 2009 MySQL Conference and Expo. It was unfortunate that the supporting information was not that forthcoming on the MySQL website. I tried for several hours to try and download, but no mirrors were initially available. Today I see some information on the mysql.com home page and finally able to get the binary.

What I found most significant with this new major version release is a change in the binary distribution, as seen on the Download page.

MySQL 5.4 is only available on 3 platforms:

  • Linux (AMD64 / Intel EM64T)
  • Solaris 10 (SPARC, 64-bit)
  • Solaris 10 (AMD64 / Intel EM64T, 64-bit)

I was also surprised that this beta release highlights the emphasis of community contributions (long overdue), yet the community and indeed many employees of Sun/MySQL were simply unaware of this work. This is clearly a change in involving the community. While I applaud the beta status, hopefully a more stable product to start with, it’s development was done in a very closed company model.

What's happening with InnoDB

I have moved on to InnoDB: Innovative Technologies for Performance and Data Protection by Ken Jacobs at MySQL Conference and Expo.

With a brief history lesson of inception from 1994, inclusion in MySQL in 2000 and acquired by Oracle in 2005. Most of the work was done by one person. InnoDB is based on sound database computer science using Gray & Reuters definitive text on database design.

Some key points in Ken’s discussion.

  • Adaptive Hash indexing for frequent queries on keys.
  • In plugin Adaptive Hash is configurable
  • Insert Buffering – Deferring secondary index writes
  • Fast Index Create – doesn’t requires all indexes to be rebuilt
  • Table Compression – Changing the page size

The InnoDB plugin available in 5.1 has a number of new benefits.

  • fast index creation
  • table compression
  • info schema tables
  • new row storage format
  • file format management

All InnoDB 1.0.3 plugin features will be available in MySQL 5.4

The big announcement is a new product – Embedded InnoDB. This has the high performance, reliability and rich functionality of InnoDB, has a flexible programmatic API. No SQL, No security.

Search at Craigslist

I am now sitting in on MySQL and Search at Craigslist by Jeremy Zawodny at MySQL Users Conference

Some of the technical difficulties that required addressing.

  • High churn rate
  • half life can be very short
  • Growth
  • Traffic
  • Need to archive postings, e.g. 100M but be searchable
  • Internationalization and UTF-8

Some of the Craigslist Goals

  • Open Source
  • Easy and approachable
  • be green with energy use

A review of the Internals server configuration

  • Load Balancer (perlbal like)
  • Read Proxy Array (perl+memcached)
  • Web Read Array (apache 1.3 + mod_perl)
  • Object Cache (Perl + memcached)
  • Read DB Cluster (MySQL 5.0.x)
  • Search Cluster (Sphinx)

Clusters of DB servers have good vertical partitioning by Roles. These being

  • Users
  • Classified
  • Forums
  • Stats
  • Archive

Sphinx is a full standalone full text search that is used. Did compare with Apache Solr, but it seemed more complex and complicated. The Sphinx configuration:

  • Partitioned based on cities (people search locally)
  • Attributes v Keywords
  • Persistent Connections
  • Minimal stopword list
  • Partition in 2 clusters (1 master, 4 slaves)

The results of implementing Sphinx were:

  • decrease in 25 MySQL boxes to 10 sphinx boxes
  • no locking
  • 1,000+ qps
  • 50M queries per day
  • Better separation of code

MySQL Users Conference Opening Lines

Opening introduction from Colin Charles got us started. Karen Tegan Padir VP MySQL & Software Infrastructure was the opening keynote.

She comes from a strong tech background and is passionate about open source, the communities and how to make a successful product.

There isn’t a person that doesn’t go a day without interacting with a website or hardware system that uses a MySQL database.

The big news was the announcement of MySQL 5.4 – Performance & Scalability. Key features include.

  • InnoDb scalability 16way x86 and 64 way CMT servers
  • subquery optimization
  • new query algorithms
  • improved stored procedures, and prepared statements
  • enhanced Information Schema
  • improved DTrace Support

More information at MySQL 5.4 Announcement Details….

Other key points includes:

1. Ken Jacobs announces today an Embedded Innodb with a powerful API (not SQL based). Read more at Innobase Introduces Embedded InnoDB
2. MySQLCluster 7.0 is also released today. Some benchmarks 4.3x improvements. New features also include LDAP support.
3. The next release of MySQL Query Analyzer, 2.1 announced.
4. Sun announces a commitment to accept contributions from the community.
5. Community also gets the Monthly Rapid Updates.
6. MySQL Drizzle Project is discussed as a technology incubator.

Partners of the year: Intel, Infobright and Lifeboard.
Appliation of the year: Zappos, Alcatel-lucent and Symantec.
Community members of the year: Marc Delisle, Ronald Bradford, Shlomi Noach.

Where is the MySQL in Sun's announcement

I find it surprising that in the official Sun Announcement there is no mention of MySQL for two reasons. Firstly, this was Sun largest single purchase of $1 billion only 12 months ago. Second, MySQL’s largest competitor is Oracle.

While the Sun website shows the news in grandeur, the MySQL website is noticeably absent in any information of it’s owners’ acquisition.

On my professional side, as an independent speaker for Sun Microsystems with plans for upcoming webinars and future speaking on “Best Practices in Migrating to MySQL from Oracle”, this news does not benefit my bottom line.

A Drizzle update – Running version 2009.03.970-development

I’ve not looked at compiling and running Drizzle on my server for the past four weeks. Well overdue time for a check and see how it’s going. I saw in today’s planet.mysql.com by Eric Day a new dependency is needed. libdrizzle 0.2.0 now in Drizzle is now required, so I started there.

cd ~/bzr
bzr branch lp:libdrizzle
cd libdrizzle
./config/autorun.sh
./configure
make
sudo make install

No problems there, also documented at the Drizzle Wiki. Great to see the docs up to date. I see my old work on starting the compiling page still relevant. Tested on CentOS 5 and Mac OS/X 10.5

Compiling drizzle was not much more difficult.

cd ~/bzr/drizzle
bzr update
make distclean
./config/autorun.sh
./configure --prefix=/home/drizzle/deploy
make
make install

The problems happened when I started drizzle. Initially I was using bin/drizzled_safe, but it was recommended via IRC#drizzle I stick with sbin/drizzled

sbin/drizzled &
error while loading shared libraries: libprotobuf.so.2: cannot open shared object file: No such file or directory

An investigation of Google Proto Buffers.

$ protoc --version
libprotoc 2.0.2

I see that protobuf 2.0.3 is now available, but this was not the problem.

I got around the problem by specifying the current library path:

$ LD_LIBRARY_PATH=/usr/local/lib sbin/drizzled &

I corrected this problem by adding /usr/local/lib to the default ld path, both the libdrizzle and libprotobuf libs are located there.

$ echo "/usr/local/lib" > /etc/ld.so.conf.d/drizzle.conf
$ ldconfig
$ ls -l /usr/local/lib
total 37240
-rw-r--r-- 1 root root  1194602 Mar 31 17:42 libdrizzle.a
-rwxr-xr-x 1 root root      940 Mar 31 17:42 libdrizzle.la
lrwxrwxrwx 1 root root       19 Mar 31 17:42 libdrizzle.so -> libdrizzle.so.0.0.2
lrwxrwxrwx 1 root root       19 Mar 31 17:42 libdrizzle.so.0 -> libdrizzle.so.0.0.2
-rwxr-xr-x 1 root root  1117979 Mar 31 17:42 libdrizzle.so.0.0.2
-rw-r--r-- 1 root root 12199302 Nov 30 23:32 libprotobuf.a
-rwxr-xr-x 1 root root      836 Nov 30 23:32 libprotobuf.la
lrwxrwxrwx 1 root root       20 Nov 30 23:32 libprotobuf.so -> libprotobuf.so.2.0.0
lrwxrwxrwx 1 root root       20 Aug 27  2008 libprotobuf.so.0 -> libprotobuf.so.0.0.0
-rwxr-xr-x 1 root root  5027949 Aug 27  2008 libprotobuf.so.0.0.0
lrwxrwxrwx 1 root root       20 Nov 30 23:32 libprotobuf.so.2 -> libprotobuf.so.2.0.0
-rwxr-xr-x 1 root root  5586965 Nov 30 23:32 libprotobuf.so.2.0.0
-rw-r--r-- 1 root root  9264068 Nov 30 23:32 libprotoc.a
-rwxr-xr-x 1 root root      852 Nov 30 23:32 libprotoc.la
lrwxrwxrwx 1 root root       18 Nov 30 23:32 libprotoc.so -> libprotoc.so.0.0.0
lrwxrwxrwx 1 root root       18 Nov 30 23:32 libprotoc.so.0 -> libprotoc.so.0.0.0
-rwxr-xr-x 1 root root  3645396 Nov 30 23:32 libprotoc.so.0.0.0
drwxr-xr-x 2 root root     4096 Mar 31 17:42 pkgconfig

Starting

$ sbin/drizzled &
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins.
090331 18:38:08  InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
090331 18:38:08  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
090331 18:38:08 InnoDB Plugin 1.0.3 started; log sequence number 46419
sbin/drizzled: ready for connections.
Version: '2009.03.970-development'  socket: ''  port: 4427  Source distribution

Verifying

$ bin/drizzle -uroot
Welcome to the Drizzle client..  Commands end with ; or g.
Your Drizzle connection id is 1
Server version: 2009.03.970-development Source distribution
Type 'help;' or 'h' for help. Type 'c' to clear the buffer.
drizzle> select version();
+-------------------------+
| version()               |
+-------------------------+
| 2009.03.970-development |
+-------------------------+
1 row in set (0.00 sec)
drizzle> exit

Sweet! Now to try some testing & benchmarking before the barrage of conferences next month, 2009 MySQL Camp, Percona Performance Conference and MySQL Conference & Expo.

I’m going to check out The Juice Database Benchmark next as a more realistic benchmark to DBT2 and sysbench.

Identifying resource bottlenecks – CPU

One of the first steps when addressing a MySQL performance tuning problem is to perform a system audit of the physical hardware resources, then identify any obvious bottlenecks in these resources.

When dealing with CPU, a quick audit should include identifying the number of CPU cores your server has, and the types of these cores. The key file on Linux systems is /proc/cpuinfo.

Number of cores can be found via the command cat /proc/cpuinfo | grep “^processor” | wc -l

You need to look more closely at the file to determine the type of CPU (e.g. below the model name shows Intel(R) Xeon(R) CPU X3220 @ 2.40GHz. The combination of knowing the number of processors (cores) listed and physical id and siblings helps identify how many CPUs and how many cores per CPU exist.

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           X3220  @ 2.40GHz
stepping	: 11
cpu MHz		: 2394.051
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips	: 4789.96
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

...

Other commands that help with identifying CPU/cores include mpstat and top.

$ mpstat -P ALL 5

11:43:43 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:43:48 AM  all    0.00    0.00    0.00    0.00    0.05    0.00    0.00   99.95   1033.00
11:43:48 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1000.40
11:43:48 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:43:48 AM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     31.40
11:43:48 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      1.00
$ top
0

top - 11:42:09 up 36 days, 13:17,  2 users,  load average: 0.20, 0.24, 0.25
Tasks: 133 total,   1 running, 132 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4050776k total,  3825584k used,   225192k free,   397580k buffers
Swap:  1052248k total,      128k used,  1052120k free,  2302408k cached

You can easily identify a CPU bottleneck using the vmstat command.

The following shows an idle system.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    128 234088 409632 2474372    0    0     0     0 1081  198  0  0 100  0  0
 0  0    128 234088 409632 2474396    0    0     0     0 1003   59  0  0 100  0  0
 0  0    128 234088 409636 2474392    0    0     0   100 1085  209  0  0 100  0  0
 0  0    128 233836 409636 2474396    0    0     0     0 1014  184  3  0 97  0  0
 0  0    128 233284 409636 2474396    0    0     0     0 1182  435  2  0 98  0  0
 0  0    128 233176 409636 2474396    0    0     0     0 1024  104  1  0 99  0  0
 0  0    128 233176 409636 2474396    0    0     0     0 1079  195  0  0 100  0  0
 1  0    128 233168 409644 2474396    0    0     0   232 1021  188  3  0 97  0  0
 0  0    128 233176 409644 2474396    0    0     0     0 1111  213  2  0 98  0  0
 0  0    128 233176 409644 2474396    0    0     0     0 1005   60  0  0 100  0  0

The key columns (from the man page are)

CPU – These are percentages of total CPU time.

  • us: Time spent running non-kernel code. (user time, including nice time)
  • sy: Time spent running kernel code. (system time)
  • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
  • wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
  • st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
  • Procs

  • r: The number of processes waiting for run time.

NOTE: The columns of vmstat may vary between different Linux Operating Systems.

If you system is CPU Bound then you will observe this. Look at id,us,sy,r

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    128 275684 397176 2300672    0    0     0     0 1118  427 74  2 25  0  0
 3  0    128 217404 397176 2300672    0    0     0     0 1017  138 74  1 25  0  0
 6  0    128 239584 397176 2300672    0    0     0     0 1086  350 93  2  5  0  0
 4  0    128 269468 397176 2300672    0    0     0     0 1005  229 98  2  0  0  0
 4  0    128 217636 397180 2300668    0    0     0   168 1087  251 99  2  0  0  0
 4  0    128 240576 397180 2300668    0    0     0     0 1006  182 99  2  0  0  0
 4  0    128 270708 397180 2300668    0    0     0     0 1079  338 98  2  0  0  0
 4  0    128 218752 397180 2300684    0    0     0     0 1005  106 99  1  0  0  0
 4  0    128 226316 397180 2300684    0    0     0     0 1077  308 98  2  0  0  0
 4  0    128 198664 397184 2300680    0    0     0    76 1010  250 99  1  0  0  0
 4  0    128 179444 397184 2300680    0    0     0     0 1077  238 100  0  0  0  0
 4  0    128 185396 397184 2300688    0    0     0     0 1006  210 99  1  0  0  0
 4  0    128 199408 397184 2300688    0    0     0     0 1079  336 99  1  0  0  0

You should also be wary of a Single CPU Bound process. This is why knowing the number of cores is important. In this example, one CPU is bound.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 0  0    128  99592 412544 2477580    0    0     0     0 1017   89  0  0 100  0  0
 0  0    128  99592 412544 2477580    0    0     0     0 1090  222  0  0 100  0  0
 0  0    128  99592 412544 2477580    0    0     0     0 1019   98  0  0 100  0  0
 1  0    128  99592 412544 2477580    0    0     0     0 1096  347 14  0 86  0  0
 1  0    128  99592 412548 2477576    0    0     0    84 1030  194 25  0 75  0  0
 1  0    128  99592 412548 2477576    0    0     0     0 1094  300 25  0 75  0  0
 1  0    128  99592 412548 2477580    0    0     0     0 1012   76 25  0 75  0  0
 1  0    128  99592 412548 2477580    0    0     0     0 1096  318 25  0 75  0  0
 1  0    128  73192 412548 2477580    0    0     0     0 1039  273 29  0 70  0  0
 1  0    128  77284 412556 2477572    0    0     0   268 1122  373 25  1 75  0  0
 2  0    128  83592 412556 2477584    0    0     0     0 1036  374 27  1 72  0  0
 0  0    128  56220 412564 2477576    0    0     0   172 1017   84  7  0 94  0  0
 0  0    128  56220 412564 2477576    0    0     0     0 1078  192  0  0 100  0  0
$ mpstat -P ALL 1
12:15:55 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
12:15:56 PM  all   25.00    0.00    0.00    0.00    0.00    0.00    0.00   75.00   1072.00
12:15:56 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1001.00
12:15:56 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
12:15:56 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     62.00
12:15:56 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      9.00

12:15:56 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
12:15:57 PM  all   25.00    0.00    0.00    0.00    0.00    0.00    0.00   75.00   1021.00
12:15:57 PM    0    0.00    0.00    0.00    0.00    0.00    1.00    0.00   99.00   1001.00
12:15:57 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
12:15:57 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     18.00
12:15:57 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      2.00

I will be detailing identifying bottlenecks of Memory, Disk and Network in future posts. You can also find out more at the MySQL User Conference “Monitoring 101 – Simple stuff to save your bacon” session.