Monitoring MySQL resource limits

October 27, 2009 by ronald

I have for the first time seen a client implement MySQL Resource Limits. I got the following error tying to connect to the database.

$ mysql -udba -p
ERROR 1226 (42000): User 'dba' has exceeded the 'max_user_connections' resource (current value: 10)

I see from the documentation the ability to see the limits in the mysql.user table. I see this is included in the SHOW GRANTS output.

SHOW GRANTS for 'dba'@'%';
+--------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for dba@%                                                                                                                         |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'dba'@'%' IDENTIFIED BY PASSWORD '*CAABA4CFB7E71E51477E0658FC2D2BBA1267E669' WITH MAX_USER_CONNECTIONS 10 |
+--------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

The documentation includes details that you can flush the resource limits, however I have found no way to monitor the current usage.

I welcome any feedback from the MySQL Community.

Monitoring MySQL Product Options

October 16, 2009 by ronald

I’ve had plenty of comments on specific products to Monitoring MySQL Options before providing the completed list. Here are the results from my survey to give everybody a more complete list.

Nagios	25	xxxxxxxxxxxxxxxxxxxxxxxxx
MONyog	8	xxxxxxxx
Cacti	4	xxxx
Munin	3	xxx
MySQL Enterprise Monitor/Merlin	3	xxx
Hyperic	2	xx
KontrolBase	2	xx
Zabbix	2	xx
Big Brother	1	x
iGlass	1	x
MyDBA	1	x
MySQL AR	1	x
pacemaker	1	x
Panopta	1	x
Opsview	1	x
Monit	1	x
Tivoli	1	x

NOTE: Some answers included multiple products, these are all counted separately in the above figures.

There are a few products that are not listed at Monitoring MySQL in this list.

If you want to list what you use, please continue to use the MySQL Alert Monitoring Survey. Thanks to all those that replied.

Monitoring MySQL options

October 15, 2009 by ronald

My recent poll What alert monitoring do you use? showed 25% of the 58 respondents to bravely state they had no MySQL monitoring. I see 1 in 3, ~33% in my consulting so this is consistent.

There is no excuse to not have some MySQL Monitoring on your production system. At the worse case, you should be logging important MySQL information for later analysis. I use my own Logging and Analyzing scripts on every client for an immediate assessment regardless of what’s available. I combine that with my modified statpack to give me immediate text based analysis, broken down by hour chunks for quick reference. These help me in troubleshooting, but they are not a complete solution.

The most popular options I see and are also reflected in the results are:

Cacti with MySQL Cacti Templates – Open Source
Nagios used in conjunction with Cacti for alert monitoring – Open Source
MONyog – Commercial with Free trial

There is a good list, including some products I did not know. My goal is to get this information included in the Monitoring-MySQL information site.

I have some additional information on Cacti and MONyog, and I’ll be sharing this information in upcoming posts.

Unknown locale for statpack & maatkit

October 15, 2009 by ronald

I had trouble today on a client site using my MySQL power tools Maatkit and Statpack.

$ ~/scripts/statpack.py --files=mysql.status.091015.080001.txt,mysql.status.091015.090001.txt
Traceback (most recent call last):
  File "/home/rbradfor/scripts/statpack.py", line 563, in ?
    main()
  File "/home/rbradfor/scripts/statpack.py", line 527, in main
    locale.setlocale(locale.LC_NUMERIC, '')
  File "/usr/lib64/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

$ cat /var/log/slow-query.log | ./mk-query-digest
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "e_US"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

I tracked a difference in the LANG environment variable with what is on another production server. Changing it addressed the problem, however I suspect a more underlying OS related problem that I did not have time to address.

$ env
...
LANG=e_US
...
$ export LANG=en_US.UTF-8

I didn’t track the source of e_US, en_US also worked.

Take a look at mk-query-digest

October 8, 2009 by ronald

Q: What SQL is running on your MySQL database server now?
A: The bane of pain for MySQL DBA’s when there is no official MySQL instrumentation that is dynamic and fine grained sufficiently to solve this problem at the SQL interface.

While hybrid solutions exist, the lack of dynamic and real-time are the issues. There is however great work being done by Baron and others on Maatkit mk-query-digest and packet sniffing the MySQL TCP packets.

$ sudo tcpdump -i eth0 port 3306 -s 65535  -x -n -q -tttt | ./mk-query-digest --type tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
# Caught SIGINT.
5444 packets captured
8254 packets received by filter
2809 packets dropped by kernel
# 2.1s user time, 40ms system time, 22.23M rss, 57.60M vsz
# Overall: 1.58k total, 38 unique, 262.67 QPS, 2.34x concurrency _________
#                    total     min     max     avg     95%  stddev  median
# Exec time            14s    41us      2s     9ms    23ms    72ms   236us
# Time range        2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.052460
# bytes            271.31k       8   8.79k  176.28  621.67  412.51   44.60
# Rows affe            248       0       3    0.16    0.99    0.38       0
# Warning c              0       0       0       0       0       0       0
#   3% (58)   No_index_used

# Query 1: 118.67 QPS, 1.23x concurrency, ID 0x16219655761820A2 at byte 2167682
#              pct   total     min     max     avg     95%  stddev  median
# Count         45     712
# Exec time     52      7s    41us      1s    10ms    23ms    80ms   138us
# Hosts                 11 10.251.199... (132), 10.251.103... (129)... 9 more
# Time range 2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.051177
# bytes          2   6.43k       8      17    9.25   16.81    3.15    7.70
# Errors                 1    none
# Rows affe      1       4       0       1    0.01       0    0.07       0
# Warning c      0       0       0       0       0       0       0       0
#   0% (1)    No_index_used
# Query_time distribution
#   1us
#  10us  ################################################
# 100us  ################################################################
#   1ms  #####
#  10ms  #############
# 100ms  #
#    1s  #
#  10s+
# EXPLAIN
select 1G

....

# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x16219655761820A2     7.3861 54.9%     712   0.010374 SELECT
#    2 0x930DE584EC815E11     1.6664 12.4%      35   0.047611 SELECT X
#    3 0x68B1E4E47977667A     1.4265 10.6%      71   0.020092 SELECT Y Z
...

In this real-time example, the SELECT 1 a Connector/J keep alive in version 3.1.4 using iBATIS is the major SQL statement used. (yes MM I know about /* ping*/, have suggested to client). I was however with additional sample times able to identify a new query and confirm a full table scan by lack of good index. Monitoring had highlighted an increase in SQL statements and table scans, but you need tools such as this to identify the problem SQL in a well tuned system.

There is a lot of information to digest with this output, to confirm and determine the relative benefit of each number, the histogram etc, but the identification of SQL in real-time and the good work of overall summaries and comments for EXPLAIN and SHOW CREATE TABLE for example shows this tool has been designed by MySQL DBA’s for MySQL DBA’s.

Sheeri just wrote about Dynamic General and Slow Query Log Before MySQL 5.1 which apart from the File I/O overhead is an idea I’d not considered before. What may be a good idea, is to pass this information into a named pipe and then let another process do whatever. Drizzle solves this problem with query logging information being able to be shipped off to Gearman.

Simplicity

October 7, 2009 by ronald

Simplicity – Always strive for a simpler solution.

This is a principle I have held and have used for many years in my technology based profession.

It’s very surprising that many organizations when addressing a problem forget to look at what is indeed right infront of them. The same is said about solving a technology problem. When I first heard about Agile Methodology practices in eXtreme Programming (XP) in 1999, I was quick to adopt this approach, because for no other words, it simplified the software development approach. It used a common sense, was practical and provided a test driven approach to improving quality which is key to successful software.

I was reading To Change Effectively, Change Just One Thing where Peter Bregman states “Just simplify it. Reduce it to its essence.” and ” The brilliance is rarely in the model, it’s in the implementation.” I’d encourage you to read the full article which has several valuable reference points.

The KISS – Keep it simple, stupid principle is something you should practice daily. Not just in your job, but in your life.

Looking just at the data

October 7, 2009 by ronald

There are many areas you need to review when addressing MySQL performance such as current database load, executed SQL statements, connections, configuration parameters, memory usage, disk to memory ratio, hardware performance & bottlenecks just to name a few.

If you were to just look at the data that is held in the database, what would you consider?
Here are my tips, when looking just at the data.

What is the current database size?
What is the growth of data over time, say daily, weekly?
Which are the 2 largest tables now?
What 2 tables are growing the fastest?
What tables have greatest churn, specifically DELETE’s?
How often do you optimize your tables?
What is your archiving/purging strategy? Do you even have one?
Review data types? I average 25% reduction in footprints, just by choosing optimal data types, generally with zero code changes.
What further data simplification can occur to reduce size, eg. INT for IP’s, enums, removing repeating text etc?
What normalization of data can occur?
What storage engines are in use?
What data is write once data?
Can data be stored in other forms, e.g. outside a relational database?

Even without looking at the SQL statements or the MySQL configuration you can generally deduce a lot of information about the application by just looking at the data.

What alert monitoring do you use?

October 7, 2009 by ronald

More importantly, how often to you confirm access to your server and database with that alert monitoring?

With a client yesterday the primary database server while still usable and serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became non responsive and required a physical reboot. With alert monitoring for system availability only recorded every 5 minutes this was simply too long a delay.

This lead to a discussion with more questions then answers including.

How often should you ping your server(s), both internally and externally?
How often do you connect physically to your server for confirmation, e.g. a ssh keyed authentication test?
How often do you perform a physical database connection test?
How often do you do an end to end test, including http request to database query test?

As with all of these, you also want to time these operations for any deviations.

I’ve created a very simple MySQL Alert Monitoring survey. I would appreciate your input.

NoSQL options

October 6, 2009 by ronald

The NoSQL event in New York had a number of presentations on non relational technologies including of Hadoop, MongoDB and CouchDB.

Coming historically from a relational background of 20 years with Ingres, Oracle and MySQL I have been moving my focus towards non relational data store. The most obvious and well used today is memcached, a non persistent distributed key/value pair store. There are a number of persistent key/value stores in the marketplace, Tokyo Cabinet, Project Voldemort and Redis to name a few.

My list of data store products helps to identify the complex name space of varying products that now exist. A trend is towards schema less solutions, the ability to better manage dynamically typed/formatted information and the Agile Methodology release approach is simply non achievable in a statically type relational database table/column structure. The impact of constant ALTER TABLE commands in a MySQL database makes your production system unusable.

In a highly distribute online and increasing offline operation, fault tolerance and data synchronization and eventual consistency are required features in complex topologies such as multi-master.

I advise and promote a technology agnostic solution when possible. With the use of an API this is actually achievable, however in order to use a variety of backend data store products, one must consider the design patterns for optimal management. Two factors to support a highly distributed data set are no joins and minimal transactional semantics. The Facebook API is a great example, where there are no joins for their MySQL Relational backend. The movement back to a logical and non-normalized schema, or move towards a totally schemaless solution do require great though in the architectural concepts of your application.

Ultimately feature requirements will dictate the relative strengths and weaknesses of products. Full text search is a good example. CouchDB provides native support via Lucene. Another feature I like of couchDB is its append only data mode. This makes durability easy, and auto-recovery after crash a non issue, another feature a transactional relational database can not achieve.

With a 2 day no:sql(east) conference this month, there is definitely greater interest in this space.

Testability

October 2, 2009 by ronald

If I was to provide one tip for organizations on how to implement a successful technology solution, I would state you need to ensure your product/software/system is completely testable. Independent on how you elect to test your system, the design of creating a completely testable infrastructure will enable exponential savings as your business grows.

You achieve this by implementing an Application Programming Interface (API) for all data access. Your goal should be to move away from technology dependence and towards a technology agnostic solution, your dependency is now your business specification. This does not mean you are going to expose this API to the Internet, your own applications are your first clients, your web site and your management reporting tools. Your website is just a client presentation of your most valuable asset, your information.

Creating an environment that enables you test and verify your information independently from how is renders in a browser, enables a complete level of possible automation for testing this component of your communication channel. While end to end testing is also necessary, this becomes more complex and is impractical if this is your only means of testing. The principle of any popular Agile methodology approach is around testing where one popular term is Test Driven Development (TDD). While you may not implement TDD, knowing and applying the principals enables testability.

As you continue to grow, you will realize you now have the infrastructure and ability to stress test your most important system features. It is a common misconception that testing is about ensuring your software works as designed. Testing should not be about what works, but what doesn’t break. The goal of testing should be to break your software. The ability to stress test your system is to know when your system will fail. This ability to predict can benefit you ahead of time. You do not want your startup to suffer a successful catastrophe where you meet all your marketing goals, but you system crashes, and while the “Twitter failed whale” is frustrating, this is one approach attempt to mediate a total failure.