Monitoring the right MySQL slow queries

December 23, 2009 by ronald

When looking at a set of SQL statements in isolation with tools such as the slow query log, processlist and tcpdump/mk-query-digest it is easy to identify queries that are slow.

It is more difficult however to identify the frequency of the query, and whether the slow query is actually acceptable in your overall system design.

I very simple technique to help you is to comment your queries. For example:

SELECT /* 10m cache */ ...;

When I’m working for a client, this process enables me to realize the purpose of the query more quickly and to help prioritize which queries I need to improve first. This does not mean I simply ignore these less frequent queries, however my choices for reviewing and indexing queries on given tables is adjusted generally for OLTP queries first, and batch second.

As with all practices it is important to implement across your full code base.

What do MySQL staff think of the acquisition?

December 21, 2009 by ronald

It finally dawned on me while reflecting on the year past this Sunday that the missing voice since the announcement of the Oracle acquisition of Sun Microsystems (and therefore MySQL) has been the MySQL employees.

When I worked as an employee for MySQL Inc, the acquisition by Sun Microsystems in 2008 lead to several requirements about the acquisition.

You were not allowed to talk about the acquisition publically.
You were not allowed to communicate with any Sun (i.e. the acquirer) resources.

In other words it was “business as usual” which is really an oxymoron, because business will never be exactly as it was before the announcement. The ongoing delay in pending acquisition by Oracle Corporation is really hurting everybody with getting on with doing their jobs, being happy with their work, and making a difference in open source and in the lives of all the benefit from using MySQL.

I’m sure many that have words to say are disappointed, worried or even fearful of their own future careers. Comments are always welcome via Mr Anonymous using 10 minute email.

Understanding installing MySQL rpm versions

December 16, 2009 by ronald

I have a problem with an easy way to install MySQL via rpm without resorting to specifying the exact point release of MySQL. Presently my local yum repository has versions of 5.0, 5.1,5.4 and 5.5.

If I want to install MySQL Sever, I can just run:

$ sudo yum install -y MySQL-server
Setting up Install Process
Package MySQL-server-community-5.5.0-1.rhel5.x86_64 already installed and latest version
Nothing to do

The issue here is the most current version is installed. If I want to install the most current version of 5.1 for example, I have found no way to specify MySQL-server-5.1, or MySQL-server-community-5.1, I have to specify the point release MySQL-server-community-5.1.40

I suspect there is some internal aliasing that may be possible within rpm’s to support this. I’m seeking help from any rpm experts and would appreciate any feedback.

My current products include:

$ sudo yum list MySQL-server-community
Installed Packages
MySQL-server-community.x86_64      5.5.0-1.rhel5        installed
Available Packages
MySQL-server-community.x86_64      5.0.82-0.rhel5       mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.0.82-0.rhel5       mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.1.40-0.rhel5       mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.1.40-0.rhel5       mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.4.3-0.rhel5        mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.4.3-0.rhel5        mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.5.0-1.rhel5        mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.5.0-1.rhel5        mydb-rhel5-x86_64

The Oracle EU statement on MySQL – What's missing

December 15, 2009 by ronald

Many providers embedd MySQL with their commercial products including Adobe, Macfee, Nokia, Symantec and ScienceLogic just to name a few. In addition most commercial third party storage engines have for years been forced to provided very customized versions of MySQL due to limitations in the storage engine API. These situations require a license agreement necessary with the trademark holder of MySQL. The Oracle Corporation EU Statement released on December 14, 2009 has carefully worded in the statement about these OEM licenses and storage engine providers there will be no changes for 5 years.

One specific detail is missing, what happens then?

As an individual that uses, recommends, promotes and advises clients especially on various storage engine offerings, I see the downstream effect of uncertainty for these providers can kill the entrepreneurial spirit that the economy desperately needs.

I see the possible impact as two fold. If company X that pays a license agreement is forced in 5 years to now pay say 3x the fee for example and cannot do so, the supplier chain is broken. What about companies that use the product from company X. Are they now liable with a cease and desist notice, therefore jeopardizing these businesses as well.

I see a further complexity in the agreement for storage engine providers. While there is a commitment to maintain the API, basically all storage engine providers don’t use the API, they have custom extensions and custom binaries essential for operation. The agreement makes specific reference to using the provided interfaces. Will the ability to provide heavily customized versions be permitted?

I’m not an expert in open source and commercial licenses and I am not a lawyer, however in the past I admit I may have been ignorant to the specifics that can affect the livelihood of many including myself.

As with any problem, I seek to have or find a solution. I am unclear here what options do exist, and would appreciate feedback.

Due to the anti competitive (*) nature of this resulting acquisition (#), should there be provisions to ensure unrestricted commercial licensing will always be available in the future regardless of time frame. Should limits on the amount of increase or change in license costs to a reasonable amount be enforced to protect companies.

It is most unclear about what will happen and this uncertainly I’m sure is affecting commercial interests.

Until this statement was released, the greatest fear in the MySQL community has been uncertainly. This statement does mention some details but the specifics of commercial business will never be known and therefore still affects us.

(*) Anti competitive nature. Contrary to statements by others, MySQL and Oracle are competitors in the relational database space. While generally they service different spectrums, there is sufficient overlap. As an expert and speaker in Oracle to MySQL migrations I clearly see the competition with these products. For example, while I will never purchase a Ferrari over a Holden for example due to cost and desirable but unnecessary features, they are still both cars and are competitors in their respective industry, the same applies to Oracle and MySQL.

(#) It is important to realize that Oracle is not acquiring MySQL directly. Oracle is acquiring Sun Microsystems which includes Sun hardware platform, the Solaris Operating System and Java which I’m sure are the primary motivations. MySQL is included as it is now owned by Sun Microsystems, and this is the current issue.

Couldn't load plugin named 'innodb'

December 12, 2009 by ronald

As part of reviewing storage engines for my work on the upcoming Expert PHP and MySQL book, I finally had an excuse to try out the InnoDB Plugin for MySQL which is now conveniently included with MySQL 5.1 since 5.1.38.

Following the MySQL 5.1 Reference Manual instructions at 13.6. The InnoDB Storage Engine I included the bare minimum as documented to my my.cnf.

[mysqld]
ignore-builtin-innodb
plugin-load=innodb=ha_innodb_plugin.so ;innodb_trx=ha_innodb_plugin.so ;innodb_locks=ha_innodb_plugin.so ;innodb_cmp=ha_innodb_plugin.so ;innodb_cmp_reset=ha_innodb_plugin.so ;innodb_cmpmem=ha_innodb_plugin.so ;innodb_cmpmem_reset=ha_innodb_plugin.so

However to my misfortune the following error occured.

091212 17:45:14 [ERROR] Can't open shared library '/home/rbradfor/mysql/mysql-5.1.41-linux-x86_64-glibc23/lib/plugin/ha_innodb_plugin.so ' (errno: 22 cannot open shared object file: No such file or directory)
091212 17:45:14 [ERROR] Couldn't load plugin named 'innodb' with soname 'ha_innodb_plugin.so '.

I double checked the files were indeed included in the binary release before undertaking a google search, to reveal the official Innodb Plugin Manual page but still not much to go on with the information ” Can’t open shared library library_name — Diagnose the cause from the following message details.”

Back to re-reading the error message, as referenced, I observed there is a trailing space ‘ ‘ ater the plugin name. Observing that I had indeed had this the my.cnf, the result of joining the lines from the MySQL manual which were broken down for presentation means (as noted).

Removed the space, and viola it works.

mysql> show plugins;
+---------------------+----------+--------------------+---------------------+---------+
| Name                | Status   | Type               | Library             | License |
+---------------------+----------+--------------------+---------------------+---------+
| binlog              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| partition           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| ARCHIVE             | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| BLACKHOLE           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| CSV                 | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| FEDERATED           | DISABLED | STORAGE ENGINE     | NULL                | GPL     |
| MEMORY              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MyISAM              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MRG_MYISAM          | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| InnoDB              | ACTIVE   | STORAGE ENGINE     | ha_innodb_plugin.so | GPL     |
| INNODB_TRX          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_LOCKS        | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP_RESET    | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM       | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM_RESET | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
+---------------------+----------+--------------------+---------------------+---------+

I would question that this level of pedantic checking is unnecessary. A simple trim on semi-colon ‘;’ separated values could be an easy fix.

Is your database schema in sync?

November 25, 2009 by ronald

If you have more then a single MySQL database for your production environment, e.g. a development and test environment, or a MySQL replication topology, ensuring your schema’s are in sync can be task that requires some time if not managed correctly.

There is a tool I do use for MySQL environments called Schema Sync – a MySQL Schema Versioning and Migration Utility. There are many reasons why schema’s get out of sync. Developers may not ensure their changes are reflected in any software to be deployed, and when not tested you could end up with broken functionality. A DBA might try some different index strategies on a slave, but not the master, and never implement or revert.

While some people want the quick and hidden just sync version akin to Rails, I really like this product as it produces proper patch and revert SQL scripts. As a DBA I really want to know what’s going to be applied to my schema, I want to be able to review it before it’s just magically applied. The revert process is also ideal, in time critical situations when you may need to rollback a change, either it’s taking to long to deploy.

This is definitely a tool to use over time, especially if you have multiple developers, DBA’s and more then one MySQL environment that should be the same.

Monitoring MySQL with MONyog

November 25, 2009 by ronald

It just works. In absence of any MySQL monitoring for your site, I have found no solution that gets you operational as quickly and easily. MONyog can be deployed in 60 seconds, and configured in another 60 seconds. Within 5 minutes you can have visual monitoring of your MySQL environment.

MONyog is an agentless process, which is an advantage for easy install, but does not provide for monitoring redundancy in the capture of information due to agentless nature. It’s a static standalone executable which is great if you need something to work out of the box. You can easily configure multiple servers in a replication topology, or different servers in your environment. You get the ability to monitor all the usual information, with a dashboard and detailed graphs. While MONyog does provide customizations of rules for the graphs and presentation order, that’s about it. You can’t at this time for example change the colors, what’s on graphs except for what MONyog monitors or the security of certain options in the GUI to different users, however I hope they offer these suggestions in future releases.

MONyog includes some nice features that are overlooked in other products. You have the ability to monitor the MySQL error log (if configured appropriately) which is a common complaint of end users. You can also see the process list, and when configured you can also perform query gathering and analysis.

MONyog is a well priced commercial product with a free trial download without registration requirements which gives no barrier to access and evaluate. As a solution and ease of deployment, there is no excuse not to evaluate this product. If you have no monitoring, you can now quickly and easily. I find a number of clients that simply have no monitoring. There really is no excuse as it’s critical information you need to have and record for a successful business.

You can get it from www.webyog.com.

MySQL Permissions – Restarting MySQL

November 19, 2009 by ronald

I am working with a client that is using managed hosting on dedicated servers. This has presented new challenges in obtaining the right permissions to undertake MySQL tasks but not have either ‘root’ or ‘mysql’ access and not have to involve a third party everytime.

Adding the following to the /etc/sudoers file enabled the ability to restart MySQL.

User_Alias	DBA = rbradfor, user2, etc
Host_Alias 	DB_SERVERS = server1.example.com, server2.example.com, etc
Cmnd_Alias	MYSQL = /etc/init.d/mysqld, /usr/sbin/tcpdump

DBA DB_SERVERS = MYSQL

As you can see I also got tcpdump, which I find valuable to monitor via mk-query-digest.

Next, permissions for log files.

Monitoring MySQL resource limits

October 27, 2009 by ronald

I have for the first time seen a client implement MySQL Resource Limits. I got the following error tying to connect to the database.

$ mysql -udba -p
ERROR 1226 (42000): User 'dba' has exceeded the 'max_user_connections' resource (current value: 10)

I see from the documentation the ability to see the limits in the mysql.user table. I see this is included in the SHOW GRANTS output.

SHOW GRANTS for 'dba'@'%';
+--------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for dba@%                                                                                                                         |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'dba'@'%' IDENTIFIED BY PASSWORD '*CAABA4CFB7E71E51477E0658FC2D2BBA1267E669' WITH MAX_USER_CONNECTIONS 10 |
+--------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

The documentation includes details that you can flush the resource limits, however I have found no way to monitor the current usage.

I welcome any feedback from the MySQL Community.

Monitoring MySQL Product Options

October 16, 2009 by ronald

I’ve had plenty of comments on specific products to Monitoring MySQL Options before providing the completed list. Here are the results from my survey to give everybody a more complete list.

Nagios	25	xxxxxxxxxxxxxxxxxxxxxxxxx
MONyog	8	xxxxxxxx
Cacti	4	xxxx
Munin	3	xxx
MySQL Enterprise Monitor/Merlin	3	xxx
Hyperic	2	xx
KontrolBase	2	xx
Zabbix	2	xx
Big Brother	1	x
iGlass	1	x
MyDBA	1	x
MySQL AR	1	x
pacemaker	1	x
Panopta	1	x
Opsview	1	x
Monit	1	x
Tivoli	1	x

NOTE: Some answers included multiple products, these are all counted separately in the above figures.

There are a few products that are not listed at Monitoring MySQL in this list.

If you want to list what you use, please continue to use the MySQL Alert Monitoring Survey. Thanks to all those that replied.

Monitoring MySQL options

October 15, 2009 by ronald

My recent poll What alert monitoring do you use? showed 25% of the 58 respondents to bravely state they had no MySQL monitoring. I see 1 in 3, ~33% in my consulting so this is consistent.

There is no excuse to not have some MySQL Monitoring on your production system. At the worse case, you should be logging important MySQL information for later analysis. I use my own Logging and Analyzing scripts on every client for an immediate assessment regardless of what’s available. I combine that with my modified statpack to give me immediate text based analysis, broken down by hour chunks for quick reference. These help me in troubleshooting, but they are not a complete solution.

The most popular options I see and are also reflected in the results are:

Cacti with MySQL Cacti Templates – Open Source
Nagios used in conjunction with Cacti for alert monitoring – Open Source
MONyog – Commercial with Free trial

There is a good list, including some products I did not know. My goal is to get this information included in the Monitoring-MySQL information site.

I have some additional information on Cacti and MONyog, and I’ll be sharing this information in upcoming posts.

Unknown locale for statpack & maatkit

October 15, 2009 by ronald

I had trouble today on a client site using my MySQL power tools Maatkit and Statpack.

$ ~/scripts/statpack.py --files=mysql.status.091015.080001.txt,mysql.status.091015.090001.txt
Traceback (most recent call last):
  File "/home/rbradfor/scripts/statpack.py", line 563, in ?
    main()
  File "/home/rbradfor/scripts/statpack.py", line 527, in main
    locale.setlocale(locale.LC_NUMERIC, '')
  File "/usr/lib64/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

$ cat /var/log/slow-query.log | ./mk-query-digest
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "e_US"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

I tracked a difference in the LANG environment variable with what is on another production server. Changing it addressed the problem, however I suspect a more underlying OS related problem that I did not have time to address.

$ env
...
LANG=e_US
...
$ export LANG=en_US.UTF-8

I didn’t track the source of e_US, en_US also worked.

Take a look at mk-query-digest

October 8, 2009 by ronald

Q: What SQL is running on your MySQL database server now?
A: The bane of pain for MySQL DBA’s when there is no official MySQL instrumentation that is dynamic and fine grained sufficiently to solve this problem at the SQL interface.

While hybrid solutions exist, the lack of dynamic and real-time are the issues. There is however great work being done by Baron and others on Maatkit mk-query-digest and packet sniffing the MySQL TCP packets.

$ sudo tcpdump -i eth0 port 3306 -s 65535  -x -n -q -tttt | ./mk-query-digest --type tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
# Caught SIGINT.
5444 packets captured
8254 packets received by filter
2809 packets dropped by kernel
# 2.1s user time, 40ms system time, 22.23M rss, 57.60M vsz
# Overall: 1.58k total, 38 unique, 262.67 QPS, 2.34x concurrency _________
#                    total     min     max     avg     95%  stddev  median
# Exec time            14s    41us      2s     9ms    23ms    72ms   236us
# Time range        2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.052460
# bytes            271.31k       8   8.79k  176.28  621.67  412.51   44.60
# Rows affe            248       0       3    0.16    0.99    0.38       0
# Warning c              0       0       0       0       0       0       0
#   3% (58)   No_index_used

# Query 1: 118.67 QPS, 1.23x concurrency, ID 0x16219655761820A2 at byte 2167682
#              pct   total     min     max     avg     95%  stddev  median
# Count         45     712
# Exec time     52      7s    41us      1s    10ms    23ms    80ms   138us
# Hosts                 11 10.251.199... (132), 10.251.103... (129)... 9 more
# Time range 2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.051177
# bytes          2   6.43k       8      17    9.25   16.81    3.15    7.70
# Errors                 1    none
# Rows affe      1       4       0       1    0.01       0    0.07       0
# Warning c      0       0       0       0       0       0       0       0
#   0% (1)    No_index_used
# Query_time distribution
#   1us
#  10us  ################################################
# 100us  ################################################################
#   1ms  #####
#  10ms  #############
# 100ms  #
#    1s  #
#  10s+
# EXPLAIN
select 1G

....

# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x16219655761820A2     7.3861 54.9%     712   0.010374 SELECT
#    2 0x930DE584EC815E11     1.6664 12.4%      35   0.047611 SELECT X
#    3 0x68B1E4E47977667A     1.4265 10.6%      71   0.020092 SELECT Y Z
...

In this real-time example, the SELECT 1 a Connector/J keep alive in version 3.1.4 using iBATIS is the major SQL statement used. (yes MM I know about /* ping*/, have suggested to client). I was however with additional sample times able to identify a new query and confirm a full table scan by lack of good index. Monitoring had highlighted an increase in SQL statements and table scans, but you need tools such as this to identify the problem SQL in a well tuned system.

There is a lot of information to digest with this output, to confirm and determine the relative benefit of each number, the histogram etc, but the identification of SQL in real-time and the good work of overall summaries and comments for EXPLAIN and SHOW CREATE TABLE for example shows this tool has been designed by MySQL DBA’s for MySQL DBA’s.

Sheeri just wrote about Dynamic General and Slow Query Log Before MySQL 5.1 which apart from the File I/O overhead is an idea I’d not considered before. What may be a good idea, is to pass this information into a named pipe and then let another process do whatever. Drizzle solves this problem with query logging information being able to be shipped off to Gearman.

Simplicity

October 7, 2009 by ronald

Simplicity – Always strive for a simpler solution.

This is a principle I have held and have used for many years in my technology based profession.

It’s very surprising that many organizations when addressing a problem forget to look at what is indeed right infront of them. The same is said about solving a technology problem. When I first heard about Agile Methodology practices in eXtreme Programming (XP) in 1999, I was quick to adopt this approach, because for no other words, it simplified the software development approach. It used a common sense, was practical and provided a test driven approach to improving quality which is key to successful software.

I was reading To Change Effectively, Change Just One Thing where Peter Bregman states “Just simplify it. Reduce it to its essence.” and ” The brilliance is rarely in the model, it’s in the implementation.” I’d encourage you to read the full article which has several valuable reference points.

The KISS – Keep it simple, stupid principle is something you should practice daily. Not just in your job, but in your life.

Looking just at the data

October 7, 2009 by ronald

There are many areas you need to review when addressing MySQL performance such as current database load, executed SQL statements, connections, configuration parameters, memory usage, disk to memory ratio, hardware performance & bottlenecks just to name a few.

If you were to just look at the data that is held in the database, what would you consider?
Here are my tips, when looking just at the data.

What is the current database size?
What is the growth of data over time, say daily, weekly?
Which are the 2 largest tables now?
What 2 tables are growing the fastest?
What tables have greatest churn, specifically DELETE’s?
How often do you optimize your tables?
What is your archiving/purging strategy? Do you even have one?
Review data types? I average 25% reduction in footprints, just by choosing optimal data types, generally with zero code changes.
What further data simplification can occur to reduce size, eg. INT for IP’s, enums, removing repeating text etc?
What normalization of data can occur?
What storage engines are in use?
What data is write once data?
Can data be stored in other forms, e.g. outside a relational database?

Even without looking at the SQL statements or the MySQL configuration you can generally deduce a lot of information about the application by just looking at the data.

What alert monitoring do you use?

October 7, 2009 by ronald

More importantly, how often to you confirm access to your server and database with that alert monitoring?

With a client yesterday the primary database server while still usable and serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became non responsive and required a physical reboot. With alert monitoring for system availability only recorded every 5 minutes this was simply too long a delay.

This lead to a discussion with more questions then answers including.

How often should you ping your server(s), both internally and externally?
How often do you connect physically to your server for confirmation, e.g. a ssh keyed authentication test?
How often do you perform a physical database connection test?
How often do you do an end to end test, including http request to database query test?

As with all of these, you also want to time these operations for any deviations.

I’ve created a very simple MySQL Alert Monitoring survey. I would appreciate your input.

NoSQL options

October 6, 2009 by ronald

The NoSQL event in New York had a number of presentations on non relational technologies including of Hadoop, MongoDB and CouchDB.

Coming historically from a relational background of 20 years with Ingres, Oracle and MySQL I have been moving my focus towards non relational data store. The most obvious and well used today is memcached, a non persistent distributed key/value pair store. There are a number of persistent key/value stores in the marketplace, Tokyo Cabinet, Project Voldemort and Redis to name a few.

My list of data store products helps to identify the complex name space of varying products that now exist. A trend is towards schema less solutions, the ability to better manage dynamically typed/formatted information and the Agile Methodology release approach is simply non achievable in a statically type relational database table/column structure. The impact of constant ALTER TABLE commands in a MySQL database makes your production system unusable.

In a highly distribute online and increasing offline operation, fault tolerance and data synchronization and eventual consistency are required features in complex topologies such as multi-master.

I advise and promote a technology agnostic solution when possible. With the use of an API this is actually achievable, however in order to use a variety of backend data store products, one must consider the design patterns for optimal management. Two factors to support a highly distributed data set are no joins and minimal transactional semantics. The Facebook API is a great example, where there are no joins for their MySQL Relational backend. The movement back to a logical and non-normalized schema, or move towards a totally schemaless solution do require great though in the architectural concepts of your application.

Ultimately feature requirements will dictate the relative strengths and weaknesses of products. Full text search is a good example. CouchDB provides native support via Lucene. Another feature I like of couchDB is its append only data mode. This makes durability easy, and auto-recovery after crash a non issue, another feature a transactional relational database can not achieve.

With a 2 day no:sql(east) conference this month, there is definitely greater interest in this space.

Testability

October 2, 2009 by ronald

If I was to provide one tip for organizations on how to implement a successful technology solution, I would state you need to ensure your product/software/system is completely testable. Independent on how you elect to test your system, the design of creating a completely testable infrastructure will enable exponential savings as your business grows.

You achieve this by implementing an Application Programming Interface (API) for all data access. Your goal should be to move away from technology dependence and towards a technology agnostic solution, your dependency is now your business specification. This does not mean you are going to expose this API to the Internet, your own applications are your first clients, your web site and your management reporting tools. Your website is just a client presentation of your most valuable asset, your information.

Creating an environment that enables you test and verify your information independently from how is renders in a browser, enables a complete level of possible automation for testing this component of your communication channel. While end to end testing is also necessary, this becomes more complex and is impractical if this is your only means of testing. The principle of any popular Agile methodology approach is around testing where one popular term is Test Driven Development (TDD). While you may not implement TDD, knowing and applying the principals enables testability.

As you continue to grow, you will realize you now have the infrastructure and ability to stress test your most important system features. It is a common misconception that testing is about ensuring your software works as designed. Testing should not be about what works, but what doesn’t break. The goal of testing should be to break your software. The ability to stress test your system is to know when your system will fail. This ability to predict can benefit you ahead of time. You do not want your startup to suffer a successful catastrophe where you meet all your marketing goals, but you system crashes, and while the “Twitter failed whale” is frustrating, this is one approach attempt to mediate a total failure.

Unexplained function output

September 28, 2009 by ronald

I was asked today to confirm the operation of INET_ATON() and INET_NTOA() functions for converting IP4 strings to numeric representations. My tests on the machine I was just connected to at the very instant reported the following results.

mysql> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 5.1.31-1ubuntu2 |
+-----------------+
1 row in set (0.00 sec)

mysql> SELECT INET_ATON('74.165.97.204') AS ipn;
+------------+
| ipn        |
+------------+
| 1252352460 |
+------------+
1 row in set (0.00 sec)

mysql> SELECT INET_NTOA(1252352460) as ipa;
+---------------+
| ipa           |
+---------------+
| 74.165.97.204 |
+---------------+
1 row in set (0.01 sec)

The results on the client via ODBC using MySQL 5.0.45 was?

ipn
1252352460

ipa
37342E3136352E39372E323034

What is causing this? It was immediately suspicious the pattern of 2E (the only letter), which translates from Hex to decimal to ‘.’ and 37 becomes 7 and 34 becomes 4, so the string is in hexidecimal and then with the UNHEX() function you get the right answer?

mysql> select unhex('37342E3136352E39372E323034');
+-------------------------------------+
| unhex('37342E3136352E39372E323034') |
+-------------------------------------+
| 74.165.97.204                       |
+-------------------------------------+

Could it be the ODBC drivers? Investigation showed them current as 5.01.05.00

Searching the MySQL Bugs database didn’t reveal anything noticable, nor on the forums.

My best suggestion was to post on the MySQL ODBC forums but I welcome any feedback from my readers.

MySQL Query Cache path

September 28, 2009 by ronald

Just how effective is the Query Cache on performance? If you are not comfortable reading the MySQL code you can cheat very easily with the SHOW PROFILE command. This demonstration will show you at a high level the relative impact without and with the Query Cache. First let us confirm the Query Cache is not used.

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | ON      |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
5 rows in set (0.00 sec)

We now enable profiling.

mysql> SET PROFILING=1;

We run our sample query.

mysql> select name,id,competitions from olympic_games where host_city='Sydney'G

We can then enable the query cache and re-run the query.

mysql> SET GLOBAL query_cache_size=1024*1024*16;
mysql> select name,id,competitions from olympic_games where host_city='Sydney'G
mysql> select name,id,competitions from olympic_games where host_city='Sydney'G

NOTE: We run the query twice after enabling the cache, the first time, the query is cached, the second time it is retrieved from the cache. Now let us look at the profiling information.

mysql> SHOW PROFILES;
+----------+------------+-------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                   |
+----------+------------+-------------------------------------------------------------------------+
|        1 | 0.00096100 | select name,id,competitions from olympic_games where host_city='Sydney' |
|        2 | 0.00029700 | SET GLOBAL query_cache_size=1024*1024*16                                |
|        3 | 0.00837900 | select name,id,competitions from olympic_games where host_city='Sydney' |
|        4 | 0.00009500 | select name,id,competitions from olympic_games where host_city='Sydney' |
+----------+------------+-------------------------------------------------------------------------+

mysql> SHOW PROFILE SOURCE FOR QUERY 3;
+--------------------------------+----------+---------------------------+---------------+-------------+
| Status                         | Duration | Source_function           | Source_file   | Source_line |
+--------------------------------+----------+---------------------------+---------------+-------------+
| starting                       | 0.000033 | NULL                      | NULL          |        NULL |
| checking query cache for query | 0.000088 | send_result_to_client     | sql_cache.cc  |        1262 |
| Opening tables                 | 0.000025 | open_tables               | sql_base.cc   |        4482 |
| System lock                    | 0.000006 | mysql_lock_tables         | lock.cc       |         258 |
| Table lock                     | 0.000081 | mysql_lock_tables         | lock.cc       |         269 |
| init                           | 0.000037 | mysql_select              | sql_select.cc |        2350 |
| optimizing                     | 0.000016 | optimize                  | sql_select.cc |         772 |
| statistics                     | 0.000021 | optimize                  | sql_select.cc |         954 |
| preparing                      | 0.000021 | optimize                  | sql_select.cc |         964 |
| executing                      | 0.000005 | exec                      | sql_select.cc |        1648 |
| Sending data                   | 0.000500 | exec                      | sql_select.cc |        2190 |
| end                            | 0.000008 | mysql_select              | sql_select.cc |        2395 |
| query end                      | 0.000005 | mysql_execute_command     | sql_parse.cc  |        4821 |
| freeing items                  | 0.007489 | mysql_parse               | sql_parse.cc  |        5827 |
| storing result in query cache  | 0.000028 | query_cache_end_of_result | sql_cache.cc  |         813 |
| logging slow query             | 0.000007 | log_slow_statement        | sql_parse.cc  |        1628 |
| cleaning up                    | 0.000009 | dispatch_command          | sql_parse.cc  |        1595 |
+--------------------------------+----------+---------------------------+---------------+-------------+
17 rows in set (0.00 sec)

mysql> SHOW PROFILE SOURCE FOR QUERY 4;
+--------------------------------+----------+-----------------------+--------------+-------------+
| Status                         | Duration | Source_function       | Source_file  | Source_line |
+--------------------------------+----------+-----------------------+--------------+-------------+
| starting                       | 0.000035 | NULL                  | NULL         |        NULL |
| checking query cache for query | 0.000014 | send_result_to_client | sql_cache.cc |        1262 |
| checking privileges on cached  | 0.000010 | send_result_to_client | sql_cache.cc |        1346 |
| sending cached result to clien | 0.000026 | send_result_to_client | sql_cache.cc |        1441 |
| logging slow query             | 0.000005 | log_slow_statement    | sql_parse.cc |        1628 |
| cleaning up                    | 0.000005 | dispatch_command      | sql_parse.cc |        1595 |
+--------------------------------+----------+-----------------------+--------------+-------------+
6 rows in set (0.00 sec)

It does not take a rocket scientist to determine that 6 steps within the MySQL kernel is better then 17, regardless of what those steps are, and how different in timing they may be.

I’m not wanting to represent how much saving you may have here, there are many factors such as a realistic example, a loaded warmed up environment etc. You should try this in your own environment with your own queries.

This information was to provide an introduction into looking a little deeper at the Query Cache path within MySQL.

Using the Query Cache effectively

September 28, 2009 by ronald

Maximize your strengths, minimize your weaknesses.

You can apply this approach to many things in life, I apply it to describing and using MySQL the product, and it’s components. The Query Cache like many features in MySQL, and indeed features in many different RDBMS products (don’t get me started on Oracle *features*) have relative benefits. In one context it can be seen as ineffective, or even detrimental to your performance, however it’s course grain nature makes it both trivial to disable dynamically (SET GLOBAL query_cache_size=0;), and also easy to get basic statistics on current performance (SHOW GLOBAL STATUS LIKE ‘QCache%';) to determine effectiveness and action appropriately.

The Query Cache is course grained, that is it is rather simple/dumb in nature. When you understand the path of execution of a query within the MySQL kernel you learn a few key things.

When enabled, by default the Query Cache will cache all SELECT statements within certain defined system parameter conditions. There are of course exceptions such as non-deterministic functions, prepared statements in earlier versions etc.
Any DML/DDL statement for a table that has a query cached, flushes all query cache results that pertain to this table.
You can use SQL_CACHE and SQL_NO_CACHE as hints however you can’t configure on a table by table, or query basis.
The query cache works on an exact match of the query (including spaces and case) and other settings such the client character set, and protocol version. If a match is found, data is returned in preformed network packets.<.li>

The Query Cache was not good when set to large values (e.g. > 128M) due to in-efficient cache invalidation. I’m not certain of the original source of this condition however Bug #21074, fixed in 5.0.50 and 5.1.21 is likely the reason.

My advice is to disable the Query Cache by default, especially for testing. As a final stress test you can enable to determine if there is a benefit.

I wish MySQL would spend time in improving key features, for example the Query Cache lacks sufficient instrumentation like what queries are in the cache, what tables are in the cache, and also lack all the sufficient system parameters exposed to fine tune. I believe there is a patch to show the queries for example, but I was unable to find via a google search.

It is a powerful and easy technology if you use it well. It involves architecting your solution appropriately, and knowing when the Query Cache is ineffective.

I have a number of circumstances where the query cache is extremely effective, or could be with simple modifications. A recommendation to a recent client with a 1+TB database was to split historical and current data into two different instances. The data was already in separated tables, the application already performed dual queries, so the change was a simple as a new connection pool. The benefits were huge, not only would the backup process be more efficient, some 500GB of data now only had to be backed up once (as is was 100% static), the scaling and recovery process improved, but the second MySQL instance could enable the query cache and the application would get a huge performance improvement with ZERO code changes for caching. That’s a quick and easy win.

On a side note, I wanted to title this “The MySQL Query Cache is not useless”. When I was a MySQL employee I got reprimanded (twice) for blogging anything about MySQL that wasn’t positive. This blog post is in direct response to Konstantin, a Sun/MySQL employee who actually works on the actually MySQL server code who wrote Query cache = useless?. In my view it is not useless.

EXPLAIN – An essential tool for MySQL developers.

September 26, 2009 by ronald

Just recently I came across the presentation “Bend SQL to your will with EXPLAIN” by Ligaya Turmelle via the Linked In – MySQL Speakers and Presenters group. Slides available at Slideshare. While I know a little about MySQL, I always enjoy spending time reviewing content when possible, perhaps as a good reference when asked, perhaps to glean some small point of knowledge I didn’t know, or didn’t understand completely, or had forgotten about. It’s also fun to randomly find worthwhile information.

EXPLAIN is an essential tool for MySQL developers, if you don’t know what QEP is, the listed google search link gives you a clear and obvious easy definition (using basic contextual searching techniques). You would then use EXPLAIN to determine the QEP, and then learn how to use it well.

For those that want to learn about EXPLAIN as an essential tool for MySQL developers I recommend you check the presentation out.

Calculating your database size

September 25, 2009 by ronald

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb.

This exact query *provoked* a crash:

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.

Understanding Innodb Transaction Isolation

September 24, 2009 by ronald

The MySQL Innodb storage engine provides ACID compliance, the ‘I’ being isolation. There are four states defined in MySQL with the tx_isolation system variable, READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ and SERIALIZABLE.

Generally MySQL installations do not modify the default value of tx_isolation = REPEATABLE-READ, however I have seen with a number of clients the default value has been changed to READ-COMMITTED. When I attempt to identify the reason why, I have always received the same reason. Oracle uses a default transaction isolation of READ-COMMITTED. See how Oracle Manages Data Concurrency and Consistency for more information.

However, while the literal string is the same, the actual implementation of READ-COMMITTED in Oracle more closely represents the REPEATABLE-READ in MySQL.

The following demonstrates what you can expect to see between the operation of these two modes.

REPEATABLE-READ
Session 1	Session 2
DROP TABLE IF EXISTS transaction_test; CREATE TABLE transaction_test( id INT UNSIGNED NOT NULL AUTO_INCREMENT, val VARCHAR(20) NOT NULL, created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY(id) ) ENGINE=InnoDB DEFAULT CHARSET latin1; INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation; +-----------------------+------------------------+ \| @@global.tx_isolation \| @@session.tx_isolation \| +-----------------------+------------------------+ \| REPEATABLE-READ \| REPEATABLE-READ \| +-----------------------+------------------------+ START TRANSACTION; SELECT * FROM transaction_test; +----+-----+---------------------+ \| id \| val \| created \| +----+-----+---------------------+ \| 1 \| a \| 2009-09-21 00:19:43 \| \| 2 \| b \| 2009-09-21 00:19:43 \| \| 3 \| c \| 2009-09-21 00:19:43 \| +----+-----+---------------------+ SELECT SLEEP(20);
	START TRANSACTION; INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z'); SELECT * FROM transaction_test; +----+-----------------+---------------------+ \| id \| val \| created \| +----+-----------------+---------------------+ \| 1 \| a \| 2009-09-21 00:19:43 \| \| 2 \| b \| 2009-09-21 00:19:43 \| \| 3 \| c \| 2009-09-21 00:19:43 \| \| 4 \| x \| 2009-09-21 00:21:00 \| \| 5 \| y \| 2009-09-21 00:21:00 \| \| 6 \| z \| 2009-09-21 00:21:00 \| +----+-----------------+---------------------+ COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation); SELECT * FROM transaction_test; +----+-----------------+---------------------+ \| id \| val \| created \| +----+-----------------+---------------------+ \| 1 \| a \| 2009-09-21 00:19:43 \| \| 2 \| b \| 2009-09-21 00:19:43 \| \| 3 \| c \| 2009-09-21 00:19:43 \| \| 7 \| REPEATABLE-READ \| 2009-09-21 00:21:01 \| +----+-----------------+---------------------+ COMMIT;
READ-COMMITTED
SET SESSION tx_isolation='READ-COMMITTED'; TRUNCATE TABLE transaction_test; INSERT INTO transaction_test(val) VALUES ('a'),('b'),('c');
SELECT @@global.tx_isolation, @@session.tx_isolation; +-----------------------+------------------------+ \| @@global.tx_isolation \| @@session.tx_isolation \| +-----------------------+------------------------+ \| REPEATABLE-READ \| READ-COMMITTED \| +-----------------------+------------------------+ START TRANSACTION; SELECT * FROM transaction_test; +----+-----+---------------------+ \| id \| val \| created \| +----+-----+---------------------+ \| 1 \| a \| 2009-09-23 22:49:44 \| \| 2 \| b \| 2009-09-23 22:49:44 \| \| 3 \| c \| 2009-09-23 22:49:44 \| +----+-----+---------------------+ SELECT SLEEP(20);
	START TRANSACTION; INSERT INTO transaction_test(val) VALUES ('x'),('y'),('z'); SELECT * FROM transaction_test; +----+-----+---------------------+ \| id \| val \| created \| +----+-----+---------------------+ \| 1 \| a \| 2009-09-23 22:49:44 \| \| 2 \| b \| 2009-09-23 22:49:44 \| \| 3 \| c \| 2009-09-23 22:49:44 \| \| 4 \| x \| 2009-09-23 22:52:38 \| \| 5 \| y \| 2009-09-23 22:52:38 \| \| 6 \| z \| 2009-09-23 22:52:38 \| +----+-----+---------------------+ COMMIT;
INSERT INTO transaction_test(val) VALUES (@@session.tx_isolation); SELECT * FROM transaction_test; +----+----------------+---------------------+ \| id \| val \| created \| +----+----------------+---------------------+ \| 1 \| a \| 2009-09-23 22:49:44 \| \| 2 \| b \| 2009-09-23 22:49:44 \| \| 3 \| c \| 2009-09-23 22:49:44 \| \| 4 \| x \| 2009-09-23 22:52:38 \| \| 5 \| y \| 2009-09-23 22:52:38 \| \| 6 \| z \| 2009-09-23 22:52:38 \| \| 7 \| READ-COMMITTED \| 2009-09-23 22:56:10 \| +----+----------------+---------------------+ COMMIT;

As you can see, under READ-COMMITTED your result set can change during the transaction. However, how practical is this example in an actual application.

In what circumstances would you consider using READ-COMMITTED? Is there an improvement in locking contention that can lead to less deadlock contention? What is the overhead in other areas?

Harrison writes in My Favorite New Feature of MySQL 5.1: Less InnoDB Locking that best locking out of InnoDB in 5.1 will be with READ-COMMITTED. Note that as mentioned, the impact is a change in replication mode that may have a more dramatic effect.

Heikki Tuuri comments in Understanding InnoDB MVCC that using READ-COMMITTED should help in a specific locking issue.

I am still unclear of the specific benefits in general terms for all environments. Review of the The InnoDB Transaction Model and Locking and specifically Consistent Nonlocking Reads provides “With READ COMMITTED isolation level, each consistent read within a transaction sets and reads its own fresh snapshot” which indicates that for certain workloads the reduced locking is a benefit.

Every environment is different and ultimately the actual transaction statements will determine what options and benefits work best.

SELECT INTO DUMPFILE

September 19, 2009 by ronald

While learning a new ORDER BY syntax recently, as a diligent architect/DBA I reviewed the documentation. What I also found in the SELECT syntax which I did not also know was the keyword DUMPFILE.

The SELECT Syntax from MySQL 5.1 Manual states:

If you use INTO DUMPFILE instead of INTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a BLOB value in a file.

It’s a shame there is no middle ground, where you get the features of OUTFILE (i.e. all rows), and the features of DUMPFILE (i.e. no heading)

Identifying Resource Bottlenecks – Disk

September 18, 2009 by ronald

With a discussion on identifying CPU and Memory bottlenecks achieved, let us now look at how Disk can affect MySQL performance.

One of the first steps when addressing a MySQL performance tuning problem is to perform a system audit of the physical hardware resources (CPU,Memory,Disk,Network), then identify any obvious bottlenecks in these resources.

There are several commands you can use including vmstat, iostat and sar (both part of the sysstat package) to provide information on disk I/O.

vmstat is a good place to start for an overall view for multiple resources, however for disk it only provides a total system overview.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  4  0 249256 107768 116360 1519596    0    0     0   112 8151 11256 40  4 56  0
 3  0 249256 107588 116360 1519680    0    0     4  1068 8021 11514 51  2 47  0
 1  0 249256 107396 116360 1519788    0    0     0    88 8880 12832 35  6 60  0
 0  0 249256 107432 116360 1519928    0    0     4    40 9393 14561  8  4 89  0
 2  0 249256 107160 116360 1519988    0    0     4  5636 9790 14245 23  6 71  0
 1  0 249256 107140 116360 1520356    0    0     4   180 9077 13285 33  3 65  0
 3  0 249256 107100 116360 1520352    0    0     0  1516 7970 13099 22  2 75  0
 4  1 249256 107184 116360 1520476    0    0     4   108 9756 15478 67  4 29  0
 2  0 249256 106196 116360 1520652    0    0     0     0 9512 14212 61  4 35  0

We want to look at is bi, bo and wa. The descriptions of these columns from the man page is:

bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
wa: Time spent waiting for IO.

As you can see from this above example, there is no disk reading, just disk writing, and there is no wait for I/O. This is good.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  2 888552  86716 321940 712480   32  636  5352  1800 18177 22731 31  9 29 31
 0  5 888552  86748 321944 712592    0    0  2072   264 15592 19435 27  6 45 23
 4  5 888552  86296 321944 712796   16    0  5556  8388 15559 19674 28  5 11 55
 4  2 888552  86112 321948 713408   24    0  4404  4936 15215 19354 26  6 20 48
 6  0 888552  85732 321948 713608   56    0  6348  4368 15123 19109 25  5 37 34
 2  3 888552  85188 321956 713936   60    0  3080  4104 16322 21044 29  6 48 18
 2  3 888552  84972 321964 714376   20    0  4464 10852 20483 26013 33  9 25 34
 1 10 888552  84772 321980 714724   12    0  9332 12868 16981 21948 28  6 19 48
 2  3 888552  84080 321988 714952  112    0 11144  8944 15973 20190 27  6  1 65

In this above example we see a production system that has high disk reads and writes, and wait I/O is high. If you see the CPU waiting for Disk I/O at 60%-70%-80% you have effectively reached disk saturation.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0 28      0  14436   4616 2405504    0    0     0     2  101   92  0  0  0 100
 0  4      0  14320   4616 2405504    0    0   120  2300  191  213  0  0  1 98
 0  5      0  15064   4616 2405056    0    0  1688    62 1961 2080  2  0 32 63
 0  4      0  14136   4616 2405916    0    0   952   272  584  569  0  0  0 98
 0  5      0  16864   4624 2403068    0    0   336    76  886 1389  2  1  0 97
 0  3      0  16052   4624 2403928    0    0   800   288  373  352  0  0  0 99
 0  3      0  15380   4624 2404556    0    0   720   240  373  357  0  0  0 98
 0  3      0  14452   4624 2405588    0    0   912   400  330  324  1  0  0 97
 0 57      0  15688   4624 2404380    0    0  1956   893  439  828  1  0  0 97
 0 56      0  15572   4632 2404544    0    0   208    12  947 1402  2  0  0 97
 0 53      0  14784   4632 2405320    0    0   784     1  310  261  0  0  0 98
 0 50      0  14436   4632 2405664    0    0   288   120  175  140  0  0  0 99
 0 50      0  14228   4632 2405688    0    0   160     0   97   85  0  0  0 100
 1 49      0  14112   4632 2406032    0    0   276     0  183  184  0  0  0 100

In the above example the system is past total disk saturation. The system is waiting completely for disk. This is the output of an actual production system. This is a good example because it is important to look at all three figures. Look at how low the amount of bi/bo in ration to wa. This is an indicator of a potential underlying disk subsystem problem, and in-fact this system failed within 24 hours of this output. I have never see 100% Disk Wait I/O before this example.

To get an indication of load on a multi disk system you need to use iostat for example. In the following example, we have an idle system with two drives. I simulate load on sdb with the following command.

$ dd if=/dev/zero of=/disk1/dd.out bs=1024k count=10000

$ iostat 5
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.80         1.60        97.60          8        488
sdb               2.80         0.00        25.60          0        128

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.05    0.10    0.00   99.75

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               5.80        19.20       225.60         96       1128
sdb               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00   11.86    6.30    0.00   81.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              19.20        17.60       294.40         88       1472
sdb              18.20         0.00     18246.40          0      91232

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.40    0.00    3.15   34.88    0.00   61.57

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              35.20        30.40     14409.60        152      72048
sdb              86.20         1.60     83763.20          8     418816

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.30    0.00    1.45   38.87    0.00   58.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              30.80         4.80       451.20         24       2256
sdb              84.40         0.00     84172.80          0     420864

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.40    0.00    1.15   35.43    0.00   61.02

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              15.40         3.20       270.40         16       1352
sdb              84.80         0.00     83353.60          0     416768

You can see from this example, the increase in blocks written to sdb, and overal %iowait.

If you want to do a blanket test of your disk subsystem you should consider a program that adequately test the different interactions, especially any caching or battery backed cached that your system may have. I have written several articles on using Bonnie++ including Using Bonnie++, Extending Bonnie++ and Bonnie++ Results.

There are a number of variables that make it more difficult to audit disks. RAID configuration is often difficult as this involves custom OS/RAID provider commands. The disk controller cache, and battery backed cache (BBRU) are just two factors. It is important you know these commands, you study them and most importantly you know when your system is running in a degraded mode. The popular Dell 1950/2950 etc series generally have PERC 5/6i cards, you can use the folllowing as a guide to Understanding PERC RAID Controllers.

My 60 second take on RAID and Disk Configuration. I concur with Matt Yonkivit. You should separate your OS from your database on disk, RAID 1 (2 drives) works fine for the OS. For databases, in theory (pure math when understanding RAID), RAID10 is better then RAID5.

My favorite MySQL data type – DECIMAL(31,0)

September 18, 2009 by ronald

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. The column was being used to represent a integer primary or foreign key column.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (a 81% saving). This being my record for any GB+ tables in my time working with the MySQL database.

Had this server for example had 4GB of RAM, and say 2.5GB allocated to the innodb_buffer_pool_size, this one change moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed a clear improvement in Innodb buffer pool reads and hit ratio.

Today’s lesson as described in my 2008 conference presentation Top 20 design tips for data architects is, choose the right integer data type for your data.

MySQL SHOW PRIVILEGES

September 18, 2009 by ronald

Some days you learn about MySQL commands even without knowing about them. Today I wanted to check the privileges a user had because they did not have permissions to drop a view. Rather then typing SHOW GRANTS I quite by accident typed SHOW PRIVILEGES only to realize not only was it a valid command, it actually provided information that means I don’t have to go to the Privileges Provided by MySQL documentation page which I was already on.

Old dog, new trick.

mysql> show privileges;
+-------------------------+---------------------------------------+-------------------------------------------------------+
| Privilege               | Context                               | Comment                                               |
+-------------------------+---------------------------------------+-------------------------------------------------------+
| Alter                   | Tables                                | To alter the table                                    |
| Alter routine           | Functions,Procedures                  | To alter or drop stored functions/procedures          |
| Create                  | Databases,Tables,Indexes              | To create new databases and tables                    |
| Create routine          | Functions,Procedures                  | To use CREATE FUNCTION/PROCEDURE                      |
| Create temporary tables | Databases                             | To use CREATE TEMPORARY TABLE                         |
| Create view             | Tables                                | To create new views                                   |
| Create user             | Server Admin                          | To create new users                                   |
| Delete                  | Tables                                | To delete existing rows                               |
| Drop                    | Databases,Tables                      | To drop databases, tables, and views                  |
| Event                   | Server Admin                          | To create, alter, drop and execute events             |
| Execute                 | Functions,Procedures                  | To execute stored routines                            |
| File                    | File access on server                 | To read and write files on the server                 |
| Grant option            | Databases,Tables,Functions,Procedures | To give to other users those privileges you possess   |
| Index                   | Tables                                | To create or drop indexes                             |
| Insert                  | Tables                                | To insert data into tables                            |
| Lock tables             | Databases                             | To use LOCK TABLES (together with SELECT privilege)   |
| Process                 | Server Admin                          | To view the plain text of currently executing queries |
| References              | Databases,Tables                      | To have references on tables                          |
| Reload                  | Server Admin                          | To reload or refresh tables, logs and privileges      |
| Replication client      | Server Admin                          | To ask where the slave or master servers are          |
| Replication slave       | Server Admin                          | To read binary log events from the master             |
| Select                  | Tables                                | To retrieve rows from table                           |
| Show databases          | Server Admin                          | To see all databases with SHOW DATABASES              |
| Show view               | Tables                                | To see views with SHOW CREATE VIEW                    |
| Shutdown                | Server Admin                          | To shut down the server                               |
| Super                   | Server Admin                          | To use KILL thread, SET GLOBAL, CHANGE MASTER, etc.   |
| Trigger                 | Tables                                | To use triggers                                       |
| Update                  | Tables                                | To update existing rows                               |
| Usage                   | Server Admin                          | No privileges - allow connect only                    |
+-------------------------+---------------------------------------+-------------------------------------------------------+
29 rows in set (0.00 sec)

More woes with java version on Ubuntu

September 18, 2009 by ronald

Armed with more information on Drizzle JDBC being a JDBC 4.0 implementation (helps to explain my issues in Getting started with Drizzle JDBC) I took the time to read about some other new JDBC 4.0 features.

There was reference to handling chained exceptions, however when trying to get this working for SQLException was more complex on Ubuntu 9.04 then I anticipated.

My first problem was an apparent source level problem.

$ javac ExampleDrizzle.java
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	    ^^^^^^^^^^^^^^^^
Syntax error, 'for each' statements are only available if source level is 1.5

That’s weird, what java version was I running now I’d changed with update-alternatives –config java yesterday.

$ java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

No issues here, a quick man reference gives me:

-1.5                    set compliance level to 1.5

I try that, and well that fixes one problem, but creates another.

$ javac -1.5 ExampleDrizzle.java
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	                  ^^
Can only iterate over an array or an instance of java.lang.Iterable

Now Class SQLException 1.6 javadocs shows SQLException as implementing the generics Iterable<Throwable>, while 1.5 javadoc does not. I guess I need to use 1.6 then.

$ javac -1.6 ExampleDrizzle.java
Annotation processing got disabled, since it requires a 1.6 compliant JVM
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	                  ^^
Can only iterate over an array or an instance of java.lang.Iterable

Wait a minute, I’m using a 1.6 compliant JVM. Double checking

$ ls -al /etc/alternatives/java*
lrwxrwxrwx 1 root root 36 2009-09-17 18:53 /etc/alternatives/java -> /usr/lib/jvm/java-6-sun/jre/bin/java
lrwxrwxrwx 1 root root 46 2009-09-17 18:53 /etc/alternatives/java.1.gz -> /usr/lib/jvm/java-6-sun/jre/man/man1/java.1.gz
lrwxrwxrwx 1 root root 31 2009-09-17 17:50 /etc/alternatives/javac -> /usr/lib/jvm/java-gcj/bin/javac
lrwxrwxrwx 1 root root 41 2009-09-17 17:50 /etc/alternatives/javac.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javac.1.gz
lrwxrwxrwx 1 root root 33 2009-09-17 17:50 /etc/alternatives/javadoc -> /usr/lib/jvm/java-gcj/bin/javadoc
lrwxrwxrwx 1 root root 43 2009-09-17 17:50 /etc/alternatives/javadoc.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javadoc.1.gz
lrwxrwxrwx 1 root root 31 2009-09-17 17:50 /etc/alternatives/javah -> /usr/lib/jvm/java-gcj/bin/javah
lrwxrwxrwx 1 root root 41 2009-09-17 17:50 /etc/alternatives/javah.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javah.1.gz
lrwxrwxrwx 1 root root 33 2009-09-11 10:06 /etc/alternatives/javap -> /usr/lib/jvm/java-6-sun/bin/javap
lrwxrwxrwx 1 root root 43 2009-09-11 10:06 /etc/alternatives/javap.1.gz -> /usr/lib/jvm/java-6-sun/man/man1/javap.1.gz
lrwxrwxrwx 1 root root 39 2009-09-11 10:06 /etc/alternatives/java_vm -> /usr/lib/jvm/java-6-sun/jre/bin/java_vm
lrwxrwxrwx 1 root root 38 2009-09-11 10:06 /etc/alternatives/javaws -> /usr/lib/jvm/java-6-sun/jre/bin/javaws
lrwxrwxrwx 1 root root 48 2009-09-11 10:06 /etc/alternatives/javaws.1.gz -> /usr/lib/jvm/java-6-sun/jre/man/man1/javaws.1.gz

javac is not using Sun Java 6. I have no idea how that happened, but it explains now the problem, should be checking javac version, not java version.

$ javac -version
Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM Corp 2000, 2008. All rights reserved.

What the? I was writing Java code on this server by hand, but decided last night to install eclipse after the fact. Did this affect this. I’m not certain whether I installed eclipse before or after my work last night.

I try to change the alternatives again.

$ sudo update-alternatives --config java

There are 4 alternatives which provide `java'.

  Selection    Alternative
-----------------------------------------------
*         1    /usr/lib/jvm/java-6-sun/jre/bin/java
          2    /usr/bin/gij-4.3
          3    /usr/bin/gij-4.2
 +        4    /usr/lib/jvm/java-gcj/jre/bin/java

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/jre/bin/java' to provide 'java'.

$ javac -version
Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM Corp 2000, 2008. All rights reserved.

That doesn’t work. One needs to know that java and javac operate independently.

$ sudo update-alternatives --config javac

There are 4 alternatives which provide `javac'.

  Selection    Alternative
-----------------------------------------------
          1    /usr/lib/jvm/java-6-sun/bin/javac
          2    /usr/bin/ecj
          3    /usr/bin/gcj-wrapper-4.3
*+        4    /usr/lib/jvm/java-gcj/bin/javac

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/bin/javac' to provide 'javac'.
$ javac -version
javac 1.6.0_16

$ javac ExampleDrizzle.java

Buyer beware with Ubuntu and it’s rather messed up implementation approach toward alternative java JVM’s.

Getting started with Drizzle JDBC

September 17, 2009 by ronald

In preparation for some Java work I wanted to configure and test the Drizzle JDBC Driver. Any chance to swing Drizzle into a MySQL discussion is worth the research. What I found was an issue compiling and an issue running on Ubuntu 9.04

You can start by downloading and building the Drizzle JDBC. My first problem was when I tried to build a usable .jar. I got errors in the test cases which caused by default no built .jar to work with. I raised Bug #432146 – org.drizzle.jdbc.MySQLDriverTest Tests fail. As I stated it may not be a real bug, but it seems at present that you require a running MySQL instance as well as a running Drizzle instance. In my case I didn’t have MySQL running, and I think to be fair, I should be able to build a Drizzle driver without MySQL.

Anyway, as per the Wiki Docs I proceeded to package without successful test cases. My next problem was more interesting, and perhaps found earlier from the tests?

I first created a test schema my code was going to use.

$ ~/drizzle/deploy/bin/drizzle
Your Drizzle connection id is 724
Server version: 2009.09.1126 Source distribution (trunk)

drizzle> create schema test_java;
Query OK, 1 row affected (0 sec)
drizzle> exit

I wrote a simple Java program.

$ cat ExampleDrizzle.java
import java.sql.*;

public class ExampleDrizzle {

  public static void main(String args[]) {

    try {
      Class.forName("org.drizzle.jdbc.Driver");
    } catch (Exception e) {
      System.out.println(e.getMessage());
      System.exit(1);
    }

    try {
      Connection con = DriverManager.getConnection("jdbc:drizzle://localhost:4427/test_java");
      Statement st = con.createStatement();
      st.executeUpdate("CREATE TABLE a (id int not null primary key, value varchar(20))");
      st.close();
      con.close();
    } catch (SQLException e) {
      System.out.println(e.getMessage());
    }
  }
}

Compiled.

$ javac ExampleDrizzle.java

Ran.

$ java ExampleDrizzle
org.drizzle.jdbc.Driver not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:mysql-connector-java-5.1.8-bin.jar,file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}

Oops, been a while since using Java. I was amazed I could write the code in vi in the first place.

$ export CLASSPATH=drizzle-jdbc-0.5-SNAPSHOT.jar:.
$ java ExampleDrizzle
17-Sep-09 6:48:45 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol 
INFO: Connected to: localhost:4427
Exception in thread "main" java.lang.NoClassDefFoundError: org.drizzle.jdbc.DrizzleConnection
   at java.lang.Class.initializeClass(libgcj.so.90)
   at org.drizzle.jdbc.Driver.connect(Driver.java:74)
   at java.sql.DriverManager.getConnection(libgcj.so.90)
   at ExampleDrizzle.main(ExampleDrizzle.java:15)
Caused by: java.lang.ClassNotFoundException: java.sql.SQLFeatureNotSupportedException not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:drizzle-jdbc-0.5-SNAPSHOT.jar,file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
   at java.net.URLClassLoader.findClass(libgcj.so.90)
   at java.lang.ClassLoader.loadClass(libgcj.so.90)
   at java.lang.ClassLoader.loadClass(libgcj.so.90)
   at java.lang.Class.forName(libgcj.so.90)
   at java.lang.Class.initializeClass(libgcj.so.90)
   ...3 more

Hmmm, that’s disappointing. I thought about it a minute, figured some guidance would be beneficial , so I sought out the best Java person on #drizzle IRC. Getting a name, but no response from an initial inquiry after about a half hour I thought again at the problem. Just what java are you using?

$ java -version
java version "1.5.0"
gij (GNU libgcj) version 4.3.3

$ ls -l /usr/bin/java
lrwxrwxrwx 1 root root 22 2009-07-17 12:36 /usr/bin/java -> /etc/alternatives/java

$ sudo find / -name java
[sudo] password for rbradfor:
/usr/lib/java
/usr/lib/ure/share/java
/usr/lib/jvm/java-6-sun-1.6.0.16/bin/java
/usr/lib/jvm/java-6-sun-1.6.0.16/jre/bin/java
/usr/lib/jvm/java-1.5.0-gcj-4.3-1.5.0.0/bin/java
/usr/lib/jvm/java-1.5.0-gcj-4.3-1.5.0.0/jre/bin/java
/usr/bin/java
/usr/include/c++/4.3/gnu/java
/usr/include/c++/4.3/java
/usr/local/include/google/protobuf/compiler/java

$ ls -l /etc/alternatives/j*
...
lrwxrwxrwx   1 root root    33 2009-09-17 17:50 jar -> /usr/lib/jvm/java-gcj/jre/bin/jar
lrwxrwxrwx   1 root root    39 2009-09-17 17:50 jar.1.gz -> /usr/lib/jvm/java-gcj/man/man1/jar.1.gz
lrwxrwxrwx   1 root root    35 2009-09-17 17:50 jarsigner -> /usr/lib/jvm/java-gcj/bin/jarsigner
lrwxrwxrwx   1 root root    45 2009-09-17 17:50 jarsigner.1.gz -> /usr/lib/jvm/java-gcj/man/man1/jarsigner.1.gz
lrwxrwxrwx   1 root root    34 2009-09-17 17:50 java -> /usr/lib/jvm/java-gcj/jre/bin/java
lrwxrwxrwx   1 root root    40 2009-09-17 17:50 java.1.gz -> /usr/lib/jvm/java-gcj/man/man1/java.1.gz
lrwxrwxrwx   1 root root    31 2009-09-17 17:50 javac -> /usr/lib/jvm/java-gcj/bin/javac
lrwxrwxrwx   1 root root    41 2009-09-17 17:50 javac.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javac.1.gz
...

I wonder if I should use the real Sun Java.

$ sudo apt-get install sun-java6-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
sun-java6-jdk is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
$ sudo update-alternatives --config java

There are 4 alternatives which provide `java'.

  Selection    Alternative
-----------------------------------------------
          1    /usr/lib/jvm/java-6-sun/jre/bin/java
          2    /usr/bin/gij-4.3
          3    /usr/bin/gij-4.2
*+        4    /usr/lib/jvm/java-gcj/jre/bin/java

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/jre/bin/java' to provide 'java'.

$ ls -l /usr/bin/java
lrwxrwxrwx 1 root root 22 2009-07-17 12:36 /usr/bin/java -> /etc/alternatives/java
$ ls -l /etc/alternatives/java
lrwxrwxrwx 1 root root 36 2009-09-17 18:53 /etc/alternatives/java -> /usr/lib/jvm/java-6-sun/jre/bin/java

Yep, it took a minute to discover the update-alternatives command, lucky I didn’t try that manually.

A second try.

$ javac ExampleDrizzle.java
$ java ExampleDrizzle
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol 
INFO: Connected to: localhost:4427
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol close
INFO: Closing connection
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.common.packet.AsyncPacketFetcher run
INFO: Connection closed

$ ~/drizzle/deploy/bin/drizzle test_java
Server version: 2009.09.1126 Source distribution (trunk)

drizzle> show tables;
+---------------------+
| Tables_in_test_java |
+---------------------+
| a                   |
+---------------------+
1 row in set (0 sec)

drizzle> desc a;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int         | NO   | PRI | NULL    |       |
| value | varchar(20) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0 sec)

And I’ve got a working testcase.