Announcing the MySQL Plugin for New Relic

Many application developers would know of New Relic. A SaaS performance and monitoring tool targeted towards your web application monitoring including PHP, Ruby, Java, .Net, Python and Node.

With the release today (June 19, 2013) of the New Relic Platform, custom monitoring of data stores including MySQL are now possible. Try it now free. This link will provide you a free standard account (no cost, no billing details necessary), that enables you to perform application monitoring, server monitoring, MySQL instance monitoring and monitoring of many other products via many plugins.

Over the next few posts I will be discussing some of the design decisions I made for this MySQL plugin. New Relic has certain features that lend towards really helping developers monitor and diagnose the application (I have been surprised how it has helped in debugging DB and OS problems directly from PHP code for example). However, often it’s important that knowing the server resources, the database usage is critical to seeing the whole picture, and with the new plugins, New Relic gives developers, system administrators and database administrator some well targeted insights. When building custom dashboards you can see CPU usage, Database usage, and your web application volume, page load time and other metrics on one page.

The MySQL plugin has two pre-requisite requirements. A MySQL server running 5.0 or better, and a Java JRE 1.6 or better. The plugin can work either directly on each MySQL server/instance, and therefore needs the JRE, or it is possible to configure a single server to collect all MySQL statistics and report them to New Relic. There are no specifics that restrict this plugin working for any MySQL variants/forks, infact I specifically designed the plugin to be forward compatible with new version and status variables for example, and support custom recording of any metrics (more on that later).

This is first release of the New Relic Platform and MySQL plugin so I expect a lot of refinements, improvements and suggestions as we move forward. As an integral part of developing the MySQL plugin and using the New Relic Java SDK, there are a number of roadmap items to better serve MySQL and other products that will be coming in future releases. The beta version of the MySQL plugin has been running on production MySQL servers for several months now and working well.

New Relic provides two ways to display data, first by graphs, and second by tables. There is a handy information option in the Server monitoring that is not presently available for the plugins. Graphs work best with multiple data points and constantly changing data and records averages. When looking at the SQL commands for example is great to see the total breakdown, monitoring MySQL replication lag (a single metric), that hopefully is generally zero can appear a little bland. A cool trick is to click on the legend, this toggles the displayed value, and can really help when one value in a graph hogs the metric.

At present the plugin has 4 tabs of display:

  • Overview provides a high level view of total reads v total writes, database connections, network communications and a table of key utilizations (which I will discuss more in another post)
  • Query Analysis shows details of SQL commands, temporary (memory/disk) tables, slow queries, query cache usage, select query types, sort types and table locking.
  • InnoDB Metrics include buffer pool operations, a page breakdown, row operations, log writes, log and data fsyncs, checkpoint age, history list, internal threads and mutexes.
  • Replication shows lag, relay log volume, I/O thread lag, slave errors and master binary log volume (when on a master).

The plugin is written to be extensible via JSON configuration. Those wishing to monitor different variables, or say custom metrics from storage engines like Tokutek can be easily defined, either a key/value set, or single row of metrics. However, the initial version of the dashboards does not allow the customers to modify the present dashboard. Requests are welcome for me to expand the current global dashboards.

In the News

What do MySQL Consultants do?

One role of a MySQL consultant is to review an existing production system. Sometimes you have sufficient time and access, and other times you don’t. If I am given a limited time here is a general list of things I look at.

  1. Review Server architecture, OS, Memory, Disks (including raid and partition type), Network etc
  2. Review server load and identify physical bottleneck
  3. Look at all running processes
  4. Look specifically at MySQL processes
  5. Review MySQL Error Log
  6. Determine MySQL version
  7. Look at MySQL configuration (e.g. /etc/my.cnf)
  8. Look at running MySQL Variables
  9. Look at running MySQL status (x n times)
  10. Look at running MySQL INNODB status (x n times) if used
  11. Get Database and Schema Sizes
  12. Get Database Schema
  13. Review Slow Query Log
  14. Capture query sample via SHOW FULL PROCESSLIST (locked and long running)
  15. Analyze Binary Log file
  16. Capture all running SQL

Here are some of the commands I would run.

2. Review server load and identify physical bottleneck

$ vmstat 5 720 > vmstat.`date +%y%m%d.%H%M%S`.txt

4. Look at MySQL processes

$ ps -eopid,fname,rss,vsz,user,command | grep -e "RSS" -e "mysql"
  PID COMMAND    RSS    VSZ USER     COMMAND
 5463 grep       764   5204 ronald   grep -e RSS -e mysql
13894 mysqld_s   596   3936 root     /bin/sh /usr/bin/mysqld_safe
13933 mysqld   4787812 5127208 mysql /usr/sbin/mysqld --basedir=/usr --datadir=/vol/mysql/mysqldata --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
13934 logger     608   3840 root     logger -p daemon.err -t mysqld_safe -i -t mysqld

$ ps -eopid,fname,rss,vsz,user,command | grep " mysqld " | grep -v grep | awk '{print $3,$4}'
4787820 5127208

5. Review MySQL Error Log

The error log can be found in various different places based on the operating system and configuration. It is important to find the right log, the SHOW GLOBAL VARIABLES LIKE ‘log_error’ will determine the location.

This is generally overlooked, however this can quickly identify some underlying problems with a MySQL environment.

7. Look at MySQL configuration

$ [ -f /etc/my.cnf ] &&  cat /etc/my.cnf
$ [ -f /etc/mysql/my.cnf ] &&  cat /etc/mysql/my.cnf
$ find / -name  "*my*cnf" 2>/dev/null

8. Look at running MySQL Variables

$ mysqladmin -uroot -p variables

9. Look at running MySQL status (x n times)

$ mysqladmin -uroot -p extended-status

It is important to run this several times at regular intervals, say 60 seconds, 60 minutes, or 24 hours.

I also have dedicated scripts that can perform this. Check out Log MySQL Stats.

11. Get Database and Schema Sizes

Check out my scripts on my MySQL DBA page

14. Capture Locked statements

Check out my script for Capturing MySQL sessions.

15. Analyze Binary Log file

Check out my post on using mk-query-digest.

16. Capture all SQL

Check out my post on DML Stats per table

Moving forward

Of course the commands I run exceeds this initial list, and gathering this information is only

Monitoring MySQL with MONyog

It just works. In absence of any MySQL monitoring for your site, I have found no solution that gets you operational as quickly and easily. MONyog can be deployed in 60 seconds, and configured in another 60 seconds. Within 5 minutes you can have visual monitoring of your MySQL environment.

MONyog is an agentless process, which is an advantage for easy install, but does not provide for monitoring redundancy in the capture of information due to agentless nature. It’s a static standalone executable which is great if you need something to work out of the box. You can easily configure multiple servers in a replication topology, or different servers in your environment. You get the ability to monitor all the usual information, with a dashboard and detailed graphs. While MONyog does provide customizations of rules for the graphs and presentation order, that’s about it. You can’t at this time for example change the colors, what’s on graphs except for what MONyog monitors or the security of certain options in the GUI to different users, however I hope they offer these suggestions in future releases.

MONyog includes some nice features that are overlooked in other products. You have the ability to monitor the MySQL error log (if configured appropriately) which is a common complaint of end users. You can also see the process list, and when configured you can also perform query gathering and analysis.

MONyog is a well priced commercial product with a free trial download without registration requirements which gives no barrier to access and evaluate. As a solution and ease of deployment, there is no excuse not to evaluate this product. If you have no monitoring, you can now quickly and easily. I find a number of clients that simply have no monitoring. There really is no excuse as it’s critical information you need to have and record for a successful business.

You can get it from www.webyog.com.

What alert monitoring do you use?

More importantly, how often to you confirm access to your server and database with that alert monitoring?

With a client yesterday the primary database server while still usable and serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became non responsive and required a physical reboot. With alert monitoring for system availability only recorded every 5 minutes this was simply too long a delay.

This lead to a discussion with more questions then answers including.

  • How often should you ping your server(s), both internally and externally?
  • How often do you connect physically to your server for confirmation, e.g. a ssh keyed authentication test?
  • How often do you perform a physical database connection test?
  • How often do you do an end to end test, including http request to database query test?

As with all of these, you also want to time these operations for any deviations.

I’ve created a very simple MySQL Alert Monitoring survey. I would appreciate your input.

MySQL Monitoring 101 – Graph your results

The problem

Hands up those that don’t monitor their production MySQL web server. I’m a little surprised by this, but I’ve visited several clients that have absolutely no monitoring other then “the customers will tell us when something is wrong”. The lack of system monitoring is one of the topics in my book “101 ways to screw up a successful startup”.

Why is monitoring important? First it can tell you when something is wrong, most monitoring systems introduce some level of constraints that trigger notifications via email, SMS or red flashing screens. Second, and I consider more important, is it allows you to analyze change and compare results over time. Let’s say you added more memory to your server, and then remembered to also increase the MySQL buffers appropriately. How much improvement did it make? Rather then “it seems faster”, you can have hard and fast numbers to back it up.

The Monitoring

If you have zero monitoring, you need to implement at least the following. Create the following script and run daily at midnight via cron. I’ve made the script as simple as can be.

#!/bin/sh
SCRIPT_NAME="monitor"
DATETIME=`date +%Y%m%d.%H%M`
HOSTNAME=`hostname -s`
LOG_DIR="/tmp"

vmstat 5 17280 > $LOG_DIR/os.vmstat.5.$HOSTNAME.$DATETIME.log &
iostat -x 5 17280 > $LOG_DIR/os.iostat.5.$HOSTNAME.$DATETIME.log &
exit 0

If you don’t have iostat installed, you will find iostat/sar as part of the sysstat package that most distributions will have generally available.

The results

vmstat output of relative idle system.

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 304332 289484 435052    0    0     3     4  206  177  1  0 98  0
 0  0      0 304108 289496 435072    0    0     0    45  411  736  5  0 95  0
 0  0      0 304116 289500 435072    0    0     0    13  395  716  3  0 97  0

vmstat of system under disk load

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 234784 289844 516800    0    0     3     4  206  177  1  0 98  0
 1  1      0 101636 289988 645552    0    0     1 33190  448  626 35  5 31 29
 1  2      0  33092 280888 723276    0    0     4 17233  469  616 24  4  6 66
 1  0      0  62876 177444 797056    0    0     2 14846  837  938 33  5 31 30
 1  1      0  33252 168440 834064    0    0     1 30376  969  904 31  6 22 41

iostat of system under load

 iostat -x 5
Linux 2.6.22-14-generic (newyork)       02/16/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.31    0.00    0.29    0.15    0.00   98.26

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.83     5.82    0.71    0.94    13.04    53.56    40.53     0.09   52.60   2.08   0.34
sr0               0.00     0.00    0.00    0.00     0.00     0.00    10.40     0.00  237.00 235.00   0.01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          41.30    0.00    5.70   29.00    0.00   24.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  5153.60    0.60   75.80     4.80 42774.40   559.94    81.57 1354.93  13.09 100.00
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.63    0.00    4.10   61.36    0.00    3.90

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  3850.00    0.20   60.00     1.60 39763.20   660.54    83.68 1450.90  13.86  83.44
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.10    0.00    3.80   42.30    0.00   31.80

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  7645.80    0.40   66.80     3.20 44009.60   654.95   100.34 1192.33  14.14  95.04
sr0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

The analysis

The below graph shows CPU usage over a few hours. What was only observed via the graph was a spike of CPU that occurred every 5 minutes. In fact it occurred just on one server of several servers, and this enabled analysis to find a rouge cron job running every 5 minutes.

The below graph shows CPU idle usage over a 24 hour period. The colors represent different usage patterns such as during the day, batch processing overnight etc.

The below graph shows IO utilization over the same 24 hour period.

NOTE: Color’s added manually to highlight points.
These past two graphs highlight a number of things that require further analysis, but as in this example, it only shows part of a bigger picture. For example only CPU idle is graphed, not Wait I/O for example which as you can see from the I/O is excessive at times.

Still, these graphs were created from grep’ing the appropriate data into a CSV file, loading into Open Office, and click just the graph button, about a 60 second process for each type of data. This alone gave a better picture of what was happening then no picture.

What’s next

This level of monitoring is impractical for production systems, especially if you are monitoring more then one system. You need to monitoring more things, compare different results at the same time, and get better results at hand instantly. In my next topic “Monitoring 201″ I’ll talk about a number of easy and practical options.