Advanced reporting options for MySQL

I’m seeking help from the MySQL community for what tools are used today to generate complex reports for enterprise applications that use MySQL. In an Oracle world, you have Oracle Report Writer, in Microsoft Crystal Reports.

In the open source world there is Jasper Reports, Pentaho Reports and BIRT however I don’t know the power of complex reporting with these.

If anybody has experience using or evaluating these tools please let me know. This may lead to possible work.

Migrating MySQL latin1 to utf8 – The process

Having covered the preparation and character set options of performing a latin1 to utf8 MySQL migration, just how do you perform the migration correctly.

Example Case

Just to recap, we have the following example table and data.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+

Migration approach 1

The easiest way is to use mysqldump to dump the schema and data, to change the schema definitions and then a client reload process, all with the correct client character sets. Working on our sample table test_latin1 which I have in a conv schema you can do.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset conv > conv.sql
$ sed -i -e "s/latin1/utf8/g" conv.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.sql

If you attempt this which technically works (as I've seen a similar example on the Internet) your bound to screw up something. While the mysqldump and mysql load use the correct client command line options, performing a blind conversion of all references of latin1 to utf8 will not only change your table definition, in my example it will change the name of table, and if would change any values of data that contained the word 'latin1'. For example.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺             |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.01 sec)

Be sure to realize now you need to connect with --default-character-set=utf8 or in our example, we use set names as a stop gap measure.

Migration Approach 2

A good practice with using mysqldump regardless of migration is to always dump the schema with the --no-data option, and then dump the data separately with the --no-create-info option. In this case, you can then manually edit the schema file, carefully changing latin1 where appropriate. This is your production data, so some manual hand verification is a good thing. The additional benefit is you can create your schema and verify the syntax is correct before loading any data. While mysqldump creates a single file, due to this dump and reload, you can split your data file and perform some level of parallelism for data loading depending on your hardware capabilities.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-data conv > conv.schema.sql
$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-create-info conv > conv.data.sql
# For simplicity in this example I'm using a more strict global replacement. NOTE: this syntax depends on the version of mysql
$ sed -i -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.data.sql
$ mysql -uroot -p --default-character-set=utf8 conv
mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺             |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

Other software interactions

While I only discuss the MySQL data and mysql client usage here, you should ensure that your programming language connecting to MySQL uses the correct character set management, e.g. PHP uses the default_charset settings in php.ini and even your webserver may benefit from the correct encoding, e.g. Apache httpd uses AddDefaultCharset httpd.conf.

Migration Approach Problems

In our example our migration is successfully, however like life, your production data is unlikely to be perfect. In my next post I will talk about identifying and handling exceptions, including data that was wrongly encoded originally into latin1, or translation such as html entities from rich editor input fields for example.

Don't Assume – Data Integrity

MySQL has the same level of data integrity for numbers and strings as Oracle; when MySQL is correctly configured. By default (a reason I wish I knew why it is still the default), MySQL performs silent conversions on boundary conditions of data that will result in your data not always being what is specified. Let’s look at the following examples to demonstrate default behavior.

For numbers

mysql> DROP TABLE IF EXISTS example;
mysql> CREATE TABLE example(i1  TINYINT, i2 TINYINT UNSIGNED, c1 VARCHAR(5));
mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
|  127 | NULL | NULL |
+------+------+------+
4 rows in set (0.00 sec)

As you can see for one value we inserted 500, yet the value of 127 is stored? For this example I have used the TINYINT numeric data type to demonstrate truncation. TINYINT is a 1 byte integer that stores values from -128 to +127. Unlike Oracle, MySQL has 9 different data types for numeric columns, and using these wisely can improve your database disk footprint, for example BIGINT v INT. Is there a big deal?.

MySQL also has a nice feature for numeric data types, the UNSIGNED attribute that ensures only a positive integer or 0 value. Let’s see what happens with this column.

Unsigned

mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
| NULL |    1 | NULL |
| NULL |    0 | NULL |
| NULL |  100 | NULL |
| NULL |  255 | NULL |
+------+------+------+
4 rows in set (0.00 sec)

Now you see that -1 and 500 are now not the expected values, and before while 500 was silently truncated to 127, now it’s truncated to 255.

For Strings

As you can now assume, the following also occurs for strings.

mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SELECT * FROM example;
+------+------+-------+
| i1   | i2   | c1    |
+------+------+-------+
| NULL | NULL | NULL  |
| NULL | NULL | a     |
| NULL | NULL | abcde |
| NULL | NULL | xyz12 |
+------+------+-------+
4 rows in set (0.00 sec)

Show warnings

As you can see here, the mysql client shows that warnings occurred, but if you don’t review the warning you would never know, a situation that is rarely reviewed with development in richer programming languages. Let us look at these actual warnings more closely.

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i1' at row 4 |
+---------+------+---------------------------------------------+

mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i2' at row 2 |
| Warning | 1264 | Out of range value for column 'i2' at row 4 |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 'c1' at row 4 |
+---------+------+-----------------------------------------+
1 row in set (0.00 sec)

Using sql_mode

The solution is the sql_mode configuration variable and at minimum the value of STRICT_ALL_TABLES defined. We can demonstrate the expected behavior with the following syntax.

mysql> set SESSION sql_mode=STRICT_ALL_TABLES;
Query OK, 0 rows affected (0.00 sec)

mysql> TRUNCATE TABLE example;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
ERROR 1264 (22003): Out of range value for column 'i1' at row 4
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
+------+------+------+
3 rows in set (0.00 sec)

As you can see, even with an error for a single INSERT statement, some data was actually stored. You should read Don’t Assume – Transactions for some insights here.

When it comes to dates, there is greater complexity and this is grounds for another entry of this series.

References

About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

How to crash mysqld intentionally

While some may think I’m daft, I have a legitimate reason for wanting to crash mysqld. However first we need to find a way to crash it.

Great thanks to Alan K, Mark L, Harrison and Hartmut on #mysql-dev for several suggestions and a config option I was unaware of. My investigation even lead to a documentation bug logged as #51739.

My first thought was to find a known bug and if necessary install the correct version to test that. A good one was suggested, Bug #48508 which fails on several versions that I will use to demonstrate with, however the simplest way is to issue kill -11

By default, no core file will be produced which is what I’m seeking but with the right options this is possible. First, the user running mysqld probably has a core file limit size of 0.

$ ulimit -c
0

You can fix this with with ulimit or you can specify this in the [mysqld_safe] section with core-file-size=unlimited

$ ulimit -c unlimited
$ ulimit -c
unlimited

The option I was not aware of is you also have to also specify core-file in your my.cnf

[mysqld]
core-file

I also for my CentOS 5.4 installation ran the following kernel commands, but this may be unnecessary.

sudo /sbin/sysctl -w kernel.core_pattern="core"
sudo /sbin/sysctl -w fs.suid_dumpable= 1

It is now easy to produce a core file.

$ bin/mysqld_safe &
$ killall -11 mysqld
$ bin/mysqld_safe: line 137:  2656 Segmentation fault      (core dumped) ...
100304 16:46:43 mysqld_safe Number of processes running now: 0
100304 16:46:43 mysqld_safe mysqld restarted
$ find . -name "core*"
./data/core.99999

NOTE: Do no run killall on a multi-instance server. I use this syntax here only for simplicity in presentation. It is best to run ps and kill the appropriate pid.

On a side note, I also tried to produce a core on Mac OS X without success. I’d still like to document that way, so if anybody can assist please ping me.

Don't Assume – Transactions

MySQL by default is a NON transactional database. For the hobbyist (See The Hobbyist and the Professional), startup entrepreneur and website developer this may not appear foreign, however to the seasoned Oracle DBA who has only used Oracle the concept is very foreign.

In MySQL you have to be concerned with two situations that will catch the unprepared out. The first is the default autocommit mode. This is TRUE, i.e. all statements are automatically committed on completion.

mysql> SELECT @@autocommit,TRUE;
+--------------+------+
| @@autocommit | TRUE |
+--------------+------+
|            1 |    1 |
+--------------+------+
1 row in set (0.00 sec)

The second is the storage engine used. Again a foreign term for Oracle DBA’s, a storage engine is a technology that stores and retrieves the underlying data from the MySQL database. MySQL has many different storage engines, each with relative strengths and weaknesses and different features. For the purpose of this discussion it is important to know that engines are either non-transactional or transactional. The default storage engine MyISAM is NON transactional. MySQL provides by default the InnoDB storage engine which is transactional. There are distinct advantages of a non transactional environment which I will not go into at this time.

Having recently written about this in my upcoiming book Expert PHP and MySQL I will demonstrate what happens with both MyISAM and InnoDB.

Non-transactional Tables

To show the difference, Listing 6-7 demonstrates that atomicity is not possible with non-transactional tables. The following tables are used in this example.

DROP TABLE IF EXISTS non_trans_parent;
CREATE TABLE non_trans_parent (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(10) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (val)
) ENGINE=MyISAM DEFAULT CHARSET latin1;
DROP TABLE IF EXISTS non_trans_child;
CREATE TABLE non_trans_child (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT UNSIGNED NOT NULL,
  created   TIMESTAMP NOT NULL,
PRIMARY KEY (id),
INDEX (parent_id)
) ENGINE=MyISAM DEFAULT CHARSET latin1;

To test things out, perform a sample transaction that inserts records into these two tables:

START TRANSACTION;
INSERT INTO non_trans_parent(val) VALUES(‘a’);
INSERT INTO non_trans_child(parent_id,created) VALUES(LAST_INSERT_ID(),NOW());
INSERT INTO non_trans_parent (val) VALUES(‘a’);
ERROR 1062 (23000): Duplicate entry ‘a’ for key ‘val’
ROLLBACK;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+--------+------+--------------------------------------------------------------
| Level  | Code | Message
+--------+------+--------------------------------------------------------------
| Warning| 1196 | Some non-transactional changed tables couldn’t be rolled back
+--------+------+--------------------------------------------------------------
SELECT * FROM non_trans_parent;
+----+-----+
| id | val |
+----+-----+
|  1 | a   |
+----+-----+
SELECT * FROM non_trans_child;
+----+-----------+---------------------+
| id | parent_id | created             |
+----+-----------+---------------------+
|  1 |         1 | 2009–09–21 23:44:25 |
+----+-----------+---------------------+

As you can see, data that you would have expected to not exist from the transaction is present.

Transactional Tables

Repeat these SQL statements using the transactional storage engine InnoDB; you will observe the difference between transactional and non transactional processing. The following tables, shown in Listing 6-8, are used in this example.

DROP TABLE IF EXISTS trans_parent;
CREATE TABLE trans_parent (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(10) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (val)
) ENGINE=InnoDB DEFAULT CHARSET latin1;
DROP TABLE IF EXISTS trans_child;
CREATE TABLE trans_child (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT UNSIGNED NOT NULL,
  created   TIMESTAMP NOT NULL,
PRIMARY KEY (id),
INDEX (parent_id)
) ENGINE=InnoDB DEFAULT CHARSET latin1;

Perform a sample transaction that inserts records into these two tables:

START TRANSACTION;
INSERT INTO trans_parent (val) VALUES(‘a’);
INSERT INTO trans_child (parent_id,created) VALUES(LAST_INSERT_ID(),NOW());
INSERT INTO trans_parent (val) VALUES(‘a’);
ERROR 1062 (23000): Duplicate entry ‘a’ for key ‘val’
ROLLBACK;
Query OK, 0 rows affected (0.01 sec)
SELECT * FROM trans_parent;
Empty set (0.00 sec)
SELECT * FROM trans_child;
Empty set (0.00 sec)

As you can see, no data has been recorded as part of the failing transaction.

About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Don't Assume – Common Terminology

In Oracle the default transaction isolation is READ_COMMITTED. In MySQL the default is REPEATABLE_READ. Because MySQL also has READ_COMMITTED I have seen in more then one production MySQL environment a transaction isolation of READ_COMMITTED. The explanation and ultimately incorrect assumption is the default in Oracle is READ_COMMITTED so we made that the default in MySQL.

I’m not going to discuss the specific differences of these isolation levels (see reference lines below) except to say it that READ_COMMITTED in Oracle more closely relates to the MySQL default of REPEATABLE_READ and not READ_COMMITTED. Just because the same term for a common feature exists, don’t assume the underlying functionality is the same or that either or both actually conform to the SQL ANSI standard.

While switching your MySQL environment to READ_COMMITTED is possible, there is still conjucture if this actually provides any performance improvement. There are different cases of improving locking contention, in one case Heikki Tuuri the creator of InnoDB suggests READ_COMMITTED may overcome an adjacent range gap locking contention problem while in a tpcc-like benchmark a far greater number of deadlocks were detected.

I will close by stating two facts. When changing the MySQL transaction isolation from the default of REPEATABLE_READ you are using a code path that is less tested and not used as frequently to the millions of default MySQL installations, and you are also required to change the default replication format, again a code path less tested and potential a significant increase in I/O load.

References

About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Don't Assume – Session Scope

MySQL system variables and status variables have two scopes. These are GLOBAL and SESSION which are self explanatory.
This is important to realize when altering system variables dynamically. The following example does not produce the expected results.

mysql> USE test;
Database changed
mysql> CREATE TABLE example1(
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -> col1 VARCHAR(10) NOT NULL,
    -> PRIMARY KEY(id)
    -> );
Query OK, 0 rows affected (0.29 sec)

mysql> SHOW CREATE TABLE example1G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

We see that the table has a default CHARACTER SET of latin1. If you wanted to ensure all tables are created as utf8 you change the appropriate system variable. For example, we change the GLOBAL system variable and re-create the table.

mysql> SHOW GLOBAL VARIABLES like 'char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /Users/rbradfor/mysql/mysql-5.1.39-osx10.5-x86/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> SET GLOBAL character_set_server=utf8;
Query OK, 0 rows affected (0.10 sec)
mysql> DROP TABLE example1;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE example1( id INT UNSIGNED NOT NULL AUTO_INCREMENT, col1 VARCHAR(10) NOT NULL, PRIMARY KEY(id) );
Query OK, 0 rows affected (0.12 sec)

mysql> SHOW CREATE TABLE example1G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

The table still is latin1. This is because now we have a SESSION scope that differs from the GLOBAL scope as seen in this output.

mysql> SELECT @@GLOBAL.character_set_server,@@SESSION.character_set_server;
+-------------------------------+--------------------------------+
| @@GLOBAL.character_set_server | @@SESSION.character_set_server |
+-------------------------------+--------------------------------+
| utf8                          | latin1                         |
+-------------------------------+--------------------------------+
1 row in set (0.00 sec)

The solution is easy however the trap can be easily overlooked and especially when changing other MySQL system variables.

mysql> SET SESSION character_set_server=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE example1;Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE example1( id INT UNSIGNED NOT NULL AUTO_INCREMENT, col1 VARCHAR(10) NOT NULL, PRIMARY KEY(id) );
Query OK, 0 rows affected (0.09 sec)

mysql> SHOW CREATE TABLE example1G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

By default, and when not specified in SHOW and SET commands the default scope is GLOBAL, however prior to MySQL 5.0.2 the default was SESSION. A note you will find in the 5.0 Reference Manual but not the current GA version 5.1 Reference Manual. See also SHOW STATUS Gotcha written in August 2006.

There are also other gotchas with scope that we will discuss at some other time.

References

About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for other MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Don't Assume Series – MySQL for the Oracle DBA

As part of my MySQLCamp for the Oracle DBA series of talks to help the Oracle DBA understand, use and appreciate MySQL I have also developed a series of short interesting posts I have termed “Don’t Assume”. Many of these are re-occurring points during my consulting experiences as I observe Oracle DBA’s using MySQL. I am putting the finishing touches to my MySQL for the Oracle DBA series of talks and I’m excited to highlight some of the subtle differences and unique characteristics of MySQL RDBMS in comparison to Oracle and some extent other products including SQL Server.

Stay tuned for more soon.

I will be presenting at the MySQL Users Conference 2010 in Santa Clara, April 2010 two presentations from this series, IGNITION and LIFTOFF. This series also includes JUMPSTART and VELOCITY.

Edward Screven of Oracle to Answer Questions for future of MySQL

For those of you on the O’Reilly MySQL conference list you will no doubt see this email, but for readers here is the important bits.


Oracle Executive Will Speak at O’Reilly MySQL Conference & Expo
Edward Screven to Answer Questions re: Future of MySQL

Sebastopol, CA, February 24, 2010—Wonder about the future of MySQL? Curious about what Oracle plans for the open source database software? Expect answers when Edward Screven, Oracle’s chief corporate architect and leader of the MySQL business, speaks at the O’Reilly MySQL Conference & Expo, scheduled for April 12-15, at the Santa Clara Convention Center and the Hyatt Regency Santa Clara.

Edward Screven reports to CEO Larry Ellison, and he drives technology and architecture decisions across all Oracle products to ensure that product directions are consistent with Oracle’s overall strategy. He’ll discuss the current and future state of MySQL, now part of the Oracle family of products. His presentation will also cover Oracle’s investment in MySQL technology and community, as well as the role that open source in general is playing within heterogeneous customer environments around the world.

I have not found a link yet to provide reference to this.

Ineffective concatenated indexes

In MySQL significant performance improvements can be achieved by the correct use of indexes. It is important to understand different MySQL index implementations and one key improvement on indexes defined on single columns is to use multiple column or more commonly known concatenated indexes.

However it’s also possible to define ineffective indexes. This example shows you how to identify a concatenated index that is ineffective.

CREATE TABLE example (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  INT UNSIGNED NOT NULL,
  b  INT UNSIGNED NOT NULL,
  c  INT UNSIGNED NOT NULL,
  d  INT UNSIGNED NOT NULL,
  x  VARCHAR(10),
  y  VARCHAR(10),
  z  VARCHAR(10),
PRIMARY KEY (id),
UNIQUE INDEX (a,b,c,d)
) ENGINE=InnoDB;

INSERT INTO example(a,b,c,d) VALUES
(1,0,1,1),(1,0,1,2), (1,0,2,3), (1,0,4,5),
(2,0,2,1),(2,0,2,2), (2,0,2,3), (2,0,2,5),
(3,0,2,1),(3,0,2,3), (3,0,3,3), (3,0,3,5);

And our sample query is

SELECT id,x,y,z
FROM   example
WHERE  a = 2
AND    c = 2
AND    d = 3;

The EXPLAIN plan is

+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | example | ref  | a             | a    | 4       | const |    4 | Using where |
+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+

While we are using the index (see the key column), the full benefit of the index is not utilized (see the key_len). 4 indicates the number of bytes used, that is only 1 INT column.

Let’s look at a second example.

SELECT id,x,y,z
FROM   example
WHERE  a = 2
AND    b = 0
AND    c = 2
AND    d = 3;

+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref                     | rows | Extra |
+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+
|  1 | SIMPLE      | example | const | a             | a    | 16      | const,const,const,const |    1 |       |
+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+

In this example the index is used however the key_len is 16, that is 4 x 4 byte INT columns. This is ideal for this index.

In the above example, the client was using a common data structure however was not using one column, it’s values were all effectively 0. Certain queries were written with this knowledge and instead of specifying the column, they elected to remove it, however the impact of this developer code change was increased load on the database and more inefficient performance.

While this was easily addressed by a code change, an alternative could have been to change the index definition. It was not possible to remove the column due to legacy requirements.

ALTER TABLE example
DROP INDEX a,  ADD UNIQUE INDEX (a,c,d);
mysql> explain SELECT id,x,y,z FROM   example  WHERE  a = 2 AND    c = 2 AND    d = 3;
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref               | rows | Extra |
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+
|  1 | SIMPLE      | example | const | a             | a    | 12      | const,const,const |    1 |       |
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+

A better solution may be to enforce better integrity on this business rule, that b only contains 0. MySQL does not support check constraint. The closest option would be to use an ENUM data type and a STRICT sql_mode however ENUM is only a string object so this would break other code.

ALTER TABLE example MODIFY b ENUM('0') NOT NULL DEFAULT '0';

Not only is performance impacted in this situation, in other examples of ineffective indexes it’s simply a waste of diskspace to sort additional columns in an index that are never used, and also a waste of memory as the index pages stored in memory contain information that is not used.

Getting started with Cassandra

With the motivation from today’s public news on Twitter’s move from MySQL to Cassandra, my own skills desire following in-depth discussions at last November’s Open SQL Camp to consider Cassandra and yesterday’s discussion with a new client on persistent key-value store products, today I download installed and configured for the first time. Not that today’s news was unexpected, if you follow the Twitter Engineering Open Source projects you would have seen Cassandra as well as other products being used or evaluated by Twitter.

So I went from nothing to a working Cassandra node in under 5 minutes. This is what I did.

  1. While I knew this was an Apache project, a Google Search yields for me the 3rd link for the The Apache Cassandra Project at http://incubator.apache.org/cassandra/. Congrats for Cassandra now a top level Apache Project. This url will update soon.
  2. Download Cassandra. Hard to miss with a big green button on home page. Current version is 0.5
  3. I read Getting Started, which is the 3rd top level link on menu after Home and Download. Step 1 is picking a version which I’ve already done, Step 2 is Running a single node.
  4. The Getting Started indicated a problem on Mac OS X for the required minimum Java version. I was installing on Mac OS X 10.5 and CentOS 5.4. I’ve experienced this Java 6 default path issue before. Set my JAVA_HOME and PATH accordingly (after I updated the wiki with correct value)
  5. I extracted the tar file, changed to the directory and took at look at the README.txt file. Yes, I always check this first with any software and relevant because it includes valuable instructions on creating the default data and log directories.
  6. Start with bin/cassandra -f. No problems!
  7. I then followed the instructions from the link in Step 2 with the CassandraCli. This tests and confirms the installation is operational.

Ok, a working environment. I’ve now installed on a second machine and tested however I now need to configure the cluster, and the documentation is not as straightforward. Time to try out Google again.

On a side note, this is one reason why I love Open Source. I followed the instructions online and found a mistake in the Mac OS X path, I simply registered and corrected providing the benefit of my experience for the next reader(s).

You may also like to view future posts including.

The correct approach to rolling MySQL logs

I say correct because there are several incorrect approaches to managing MySQL logs. In MySQL you have two important log files, the MySQL error log (configured with –log-error) and the MySQL slow query log (configured with –log-slow-queries or –slow-query-log and –slow-query-log-file which is available from 5.1.29).

The ideal management of these log files is different for each type of file.

The MySQL Error Log

With the error log you want to have one file showing a contiguous history of the server instance. This log provides valuable information over time and you should not discard this. You do NOT want to roll your error log. If for feel the content in the error log is too much, then these are errors you need to be addressing, and archiving after correcting. There are circumstances where error logs are rolled outside of your control.

The first is the FLUSH LOGS command. When this command is run, the error log is renamed to -old and a new log is created. For example:

$ ls -l log/error*
-rw-rw---- 1 mysql root  1733 Feb 22 16:11 error.log

$ mysql -uroot -p -e "FLUSH LOGS"

$ ls -l log/error*
-rw-rw---- 1 mysql mysql    0 Feb 22 18:08 error.log
-rw-rw---- 1 mysql root  1733 Feb 22 16:11 error.log-old

What happens when you run this command again?

$ mysql -uroot -p -e "FLUSH LOGS"

$ ls -l log/error*
-rw-rw---- 1 mysql mysql 0 Feb 22 18:10 log/error.log
-rw-rw---- 1 mysql mysql 0 Feb 22 18:08 log/error.log-old

You have now lost all valuable information in the error log, both the current log and the -old log are ZERO bytes in size.

The second example is the Ubuntu specific MySQL distribution on Ubuntu OS that logs MySQL information to the system error log (i.e. /var/log/syslog). You are then receiving a daily log rotate via the default OS log-rotate settings. You effectively lose information after 7 days. Here is what you will find on a stock Ubuntu server.

$ ls -l /var/log
...
-rw-r----- 1 mysql       adm       0 2008-05-28 20:33 mysql.err
-rw-r----- 1 mysql       adm       0 2008-05-28 20:33 mysql.log
...
-rw-r----- 1 syslog      adm  278480 2010-02-22 20:22 syslog
-rw-r----- 1 syslog      adm  366934 2010-02-22 06:25 syslog.0
-rw-r----- 1 syslog      adm   21025 2010-02-21 06:27 syslog.1.gz
-rw-r----- 1 syslog      adm   18551 2010-02-20 06:47 syslog.2.gz
-rw-r----- 1 syslog      adm   20086 2010-02-19 06:25 syslog.3.gz
-rw-r----- 1 syslog      adm   17135 2010-02-18 06:40 syslog.4.gz
-rw-r----- 1 syslog      adm   19238 2010-02-17 06:32 syslog.5.gz
-rw-r----- 1 syslog      adm   16101 2010-02-16 06:34 syslog.6.gz

You have to troll the syslog files to find any mysql specific information, even that is not possible with one command.

$ grep syslog syslog.0 | grep mysql

$ zcat syslog*gz | grep mysql
Feb 16 22:12:20 db1 mysqld[21769]: 100216 22:12:20 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_40.MYI'; try to repair it
Feb 16 22:12:20 db1 mysqld[21769]: 100216 22:12:20 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_33.MYI'; try to repair it
Feb 16 22:30:15 db1 mysqld[21769]: 100216 22:19:13 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_29.MYI'; try to repair it
Feb 16 22:30:17 db1 mysqld[21769]: 100216 22:19:13 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_44.MYI'; try to repair it
...

I’m lucky I looked, because tomorrow would have been too late to see these errors, as this is the oldest log file.

I place a disclaimer that if you have proper backups in place, these files are retrievable, but that takes time and effort which I consider unnecessary for a database server. The mysql log is one of the most important log files for this type of server, you want this information on the server.

The solution to this problem is easy in Ubuntu, always define the -log-error variable for the [mysqld] and [mysqld_safe] sections to a specific file, e.g. /var/log/mysql/mysql.log

The MySQL slow query log

With the slow query log, you DO want to rotate the log file produced. The best practice is to rotate this file daily. In addition you should both analyze the log file producing a top 5 or top 10 list of slow SQL queries each day. As load may not always predictable, it’s ideal to also analyze the combined logs of the past 7 days for cross reference.

The daily granularity allows you to track your load of slow performing queries more consistently and it also enables you verify more easily the impact of improvements made when they have been deployed.

Care needs to be taken to roll your log filed. Simply moving the log file will not work. For example.

A good tip to help in slow query SQL analysis is to
Comment your SQL (url)

$ mysql -uroot -p -e "select sleep(5)"
$ tail -1 log/slow.log
select sleep(5);

$ ls -l log/slow*
-rw-rw---- 1 mysql mysql 352 Feb 22 15:26 log/slow.log

$ mv log/slow.log log/slow.log.1; touch log/slow.log
$ ls -l log/slow*
-rw-r--r-- 1 root  root    0 Feb 22 15:26 log/slow.log
-rw-rw---- 1 mysql mysql 352 Feb 22 15:26 log/slow.log.1


$ mysql -uroot -p -e "SELECT SLEEP(4)"

$ ls -l log/slow*
-rw-r--r-- 1 root  root    0 Feb 22 15:26 log/slow.log
-rw-rw---- 1 mysql mysql 533 Feb 22 16:01 log/slow.log.1
$ tail -1 log/slow.log.1
SELECT SLEEP(4);

As you can see, the slow log was not written to, but the previous file which has the same inode.

$ rm -f log/slow.log; mv log/slow.log.1 log/slow.log
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 533 Feb 22 16:01 log/slow.log
$ cp log/slow.log log/slow.log.`date +%M`; > log/slow.log
$ mysql -uroot -p -e "SELECT SLEEP(3)"
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 181 Feb 22 16:03 log/slow.log
-rw-r----- 1 root  root  533 Feb 22 16:03 log/slow.log.03
$  tail -1 log/slow.log
SELECT SLEEP(3);

$ cp log/slow.log log/slow.log.`date +%M`; > log/slow.log
$ mysql -uroot -p -e "SELECT SLEEP(6)"
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 181 Feb 22 16:04 log/slow.log
-rw-r----- 1 root  root  533 Feb 22 16:03 log/slow.log.03
-rw-r----- 1 root  root  181 Feb 22 16:04 log/slow.log.04
$ tail -1 log/slow.log
SELECT SLEEP(6);

As you can see by copying and truncating you can perform an effective log rotate manually. Ideally you should config logrotate to manage this log.

More Information

What's your MySQL version?

I’ve heard that the mechanic’s wife always has a car that needs repair or tuneup, the painter’s wife always had walls of peeling paint, you get the picture. What about MySQL DBA’s and their own databases? While I have many versions of MySQL for testing including for example the latest 5.1.44 which I was using for my previous post, what is running on my production server? Let’s see:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.25-rc |
+-----------+

That’s really old. And yes, to prove my point that we can be our own worst enemy, the previous version before 5.1.25 was 5.1.6. Yes, .6 which worked just fine, and never crashed once for my 20+ websites. While I have downloaded onto my production server several versions ready for upgrade including versions 5.1.30, 5.1,38, and 5.4.1 I’ve never actually gone through the upgrade process.

Migrating MySQL latin1 to utf8 – Character Set Options

Continuing on from preparation in our MySQL latin1 to utf8 migration let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

  • Instance
  • Schema
  • Table
  • Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example.

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always be a default of utf8, you can changed the server variable character_set_server. This can be set dynamically.

mysql> set global character_set_server=utf8;
mysql> set session character_set_server=utf8;
mysql> create table test2(c1 varchar(10) not null);
mysql> show create table test2G
Create Table: CREATE TABLE `test2` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

If you change this dynamically be sure to include the option in your my.cnf to ensure this option is persisted for a mysqld restart.

You can define the default character set for all new tables in a given schema. You specify this when you create the schema.

mysql> set global character_set_server=latin1;
mysql> set session character_set_server=latin1;
mysql> create schema test_ucs2 default character set ucs2;
mysql> use test_ucs2;
mysql> create table test3(c1 varchar(10) not null);
mysql> show create table test3G
Create Table: CREATE TABLE `test3` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ucs2

Even though we have a schema default, you can always specify the default character set for a given table which overrides any defaults.

mysql> use test_ucs2;
mysql> create table test4_utf8 (c varchar(10) not null) default charset utf8;
mysql> show create table test4_utf8G
Create Table: CREATE TABLE `test4_utf8` (
  `c` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

And finally, if you really wanted to be specific you can define the character set on a per column level.

mysql> create table test4_utf8_latin1 (c varchar(10) not null, c2 varchar(20) charset latin1) default charset utf8;
mysql> show create table test4_utf8_latin1G
*************************** 1. row ***************************
       Table: test4_utf8_latin1
Create Table: CREATE TABLE `test4_utf8_latin1` (
  `c` varchar(10) NOT NULL,
  `c2` varchar(20) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

With great flexibility comes great responsibility. You should have a defined standard for your application that is simple and easy to understand. I am not a proponent of using utf8 for everything, the primary reason why is memory. As part of my consulting I spend a lot of time with clients that have limited resources, e.g. database servers with 2GB or 4GB of RAM. MySQL stores utf8 efficiently on disk, but when this data is stored in memory for internal usage, it automatically uses 3 bytes, when on disk it may only be 1 byte. You can test this by creating a MEMORY table with latin1 and utf8 examples and comparing the difference in size. Is this a serious problem? Well that depends on many factors such as the number of database connections, persistent or not persistent connections, the size of the results etc. While it’s difficult in MySQL to instrument the memory precisely on a per connection basis, prudence should be a consideration for any physical resources, especially RAM.

Now that we understand what’s possible, how can we change our existing latin1 tables in our preparation example?

We could try a simple ALTER TABLE command.

mysql> alter table test_latin1 default charset utf8;
mysql> show create table test_latin1G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) CHARACTER SET latin1 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

This does not work, because we are only changing the default storage engine of the table. The underlying columns remain the same. If we were to add a new column, it would default to utf8. We can however achieve what we expected with the CONVERT option.

mysql> alter table test_latin1 convert to character set utf8;
mysql> show create table test_latin1G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

We look at our data, and it looks great? Are we done with our conversion?

mysql> select * from test_latin1;
+---------------+
| c             |
+---------------+
| a             |
| abc           |
| ☺           |
| abc ☺☹☻ |
+---------------+

The answer is no. While it may look like the data is correct, MySQL also manages character sets for the communication channel. In this case, we are still communicating in latin1. To ensure moving forward in the future we must always communicate in utf8 to ensure we correctly pass utf8 to the database. We can test this with the mysql client, and as you will see our data is still corrupt.

mysql> set names utf8;
mysql> select * from test_latin1;
+------------------------+
| c                      |
+------------------------+
| a                      |
| abc                    |
| ☺                 |
| abc ☺☹☻ |
+------------------------+

mysql> show session variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

While you can see how we could migrate the schema definition, this does not complete our migration. In my next post, I will discuss the various different ways to correctly perform a data migration between latin1 and utf8.

Checked your MySQL recovery process recently?

I sound like a broken record with every client when I talk to about the resilience of their production environments. It’s very simple in theory, however in practice many organizations fail.

Ask yourself these checklist questions for your MySQL backup and recovery process?

  1. Do you have MySQL backups in place?
  2. Do you backup ALL your MySQL data?
  3. Do you have consistent MySQL backups?
  4. Do you have backups that include both static snapshot and point in time transactions?
  5. Do you review your backup logs EVERY SINGLE day or have tested backup monitoring in place?
  6. Do you perform a test recovery of your static backup?
  7. Do you perform a test recovery to point in time?
  8. Do you time your backup and recovery process and review over time?
  9. Do you have off-site copies of your backups?
  10. Do you backup your primary binary logs?

In the past month I’ve discovered clients that have an online only business (i.e. Their MySQL data is their only tangible asset), they perform daily backups but they don’t have binary logging enabled. I’ve also discovered an example of backup logs not being checked, and an underlying mysqldump error was resulting in an incomplete backup, yet the backup script apparently completed successfully.

Disaster is inevitable.

If you don’t score 8 or better in the above checklist in your business, you are at higher risk. If you are a owner/founder/executive this should keep you awake at night if your not sure of your business viability. If your organization needs help, please contact me for assistance.

  • Can you recover from a small or large disaster?
  • Do you have confidence in your DR plan?
  • Do you know how long your DR plan will take.
  • What does your online business look like or operate during your DR time?

The Blue Pill or the Red Pill

At the recent FOSDEM 2010 event, I presented in my keynote Dolphins, now and beyond a option which I termed the “Blue Pill” or the “Red Pill”. The following slide produced noticed interest in a packed room, and subsequent conversation.

While the ownership of MySQL has changed, the option between MySQL and Oracle as a product for use still remains. While MySQL is the most popular for modern online applications, Oracle continues to have the widely used enterprise database product and has a large number of Oracle DBAs in the IT marketplace.

Over the past 5 years I have presented a number of topics on MySQL for Oracle DBA’s. At the upcoming MySQL Users Conference 2010 I will be presenting the first two talks in my new series “MySQLCamp for the Oracle DBA”.

  • IGNITION – is the preparation necessary for a successful launch of a MySQL ecosystem for an Oracle DBA. This volume covers the preparation needed to be ready for ongoing production administration of MySQL.
  • LIFTOFF – Only after a successful preparation covered in IGNITION can you be ready for the implementation and management of a MySQL ecosystem and a successful launch of your product.

Migrating MySQL latin1 to utf8 – Preparation

This article is Part 1 of a series of articles regarding MySQL character set conversion.
Be sure to also check out character set options and the process for more information.

Before undertaking such migration the first step is a lesson in understanding more about how latin1 and utf8 work and interact in MySQL. latin1 in a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0

  • latin1 is a single byte character set.
  • utf8 is a 1-3 byte character set depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4 byte sequences. (Updated: MySQL 5.5 now supports full Unicode support with the ” utf8m4″charset))

MySQL variables

MySQL has a number of different system variables to consider, the following is the default representation in MySQL 5.1

mysql> show global variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

MySQL enables you to specify the character set for data at the Instance,Schema,Table and Column level. This is important because this complexity, especially between the Table and Columns can catch you out.

It is also important to ensure that not only is data stored in the appropriate format, the communication between client and server also needs to support the character set you wish to use.

latin1 example

We start by creating a simple table, inserting some data, and reviewing the data.

mysql> create table test_latin1(c varchar(100) not null) default charset latin1;
mysql> insert into test_latin1(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

We use the LENGTH() and CHAR_LENGTH() functions to confirm the known length of the string and the true number of characters.

In the above examples, the smiley characters can be reproduced in a web page with the following.

&#9786;&#9785;&#9787;

utf8 example

In a separate session (because we re-use these later) we repeat for utf8.

mysql> set  names utf8;
mysql> show session variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

mysql> create table test_utf8(c varchar(100) not null) default charset utf8;
mysql> insert into test_utf8(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺            |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

As you can see, the key difference here is the CHAR_LENGTH() of the utf8 strings differ.

Mixing latin1 with utf8

If we now look at the latin1 data in the utf8 session we see that while the underlying data via LENGTH(),CHAR_LENGTH() and HEX() remains unchanged (because this is the database representation of the data), the actual output presentation is garbled due to the mismatch in the client communication.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_latin1;
+------------------------+-----------+----------------+------------+----------------------------+
| c                      | length(c) | char_length(c) | charset(c) | hex(c)                     |
+------------------------+-----------+----------------+------------+----------------------------+
| a                      |         1 |              1 | latin1     | 61                         |
| abc                    |         3 |              3 | latin1     | 616263                     |
| ☺                    |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻          |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+------------------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

If we look at the utf8 data in latin1, we see a different garbled representation.

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

Conclusion

Armed now with a basic understanding we are ready to consider what approaches we may undertake to migrate this example table, and how we may be able to verify our data.

References

Character Set Support
Connection Character Sets and Collations

Beyond MySQL GA: patches, storage engines, forks, and pre-releases – FOSDEM 2010

Kristian Nielsen presented “Beyond MySQL GA: patches, storage engines, forks, and pre-releases”.
This included a history of current products:

Google Patches (5.0 & 5.1) included improvements in :

  • statistics/monitoring
  • lock contention
  • binlog
  • malloc()
  • filesorts
  • innodb I/O and wait statistics
  • SHOW …STATISTICS statements
  • smp scalability
  • I/O scalability
  • semisync replication
  • many more

Percona Patches (5.0) focus on

  • statistics/monitoring
  • performance/scalability
  • buffer pool content/mutexes
  • microslow patch

These have been ported to 5.1 and mainly integrated into XtraDB.

EBay Patches (5.0) have included:

  • variable length memory storage engine
  • pool of threads
  • Virtual columns

XtraDB storage engine (5.1) includes

  • Percona patches
  • Google patches
  • Innodb patches
  • Has XtraBackup for backup

Other engines/patches discussed included:

  • PBXT storage engine – community contribution
  • FederatedX – replacement to Federated
  • Sphinx storage engine
  • Pinba storage engine – Collects PHP statistics
  • Others OQGraph/Spider
  • Galera – Synchronous replication
  • Drizzle

Alternative packaging options for MySQL 5.0 and MySQL 5.1 including Our Delta, Percona and MariaDB.

FOSDEM 2010 MySQL Developer Room Schedule
FOSDEM 2010 Website
Brussels, Belgium
February 7, 2010

Multi-Master Manager for MySQL – FOSDEM 2010

The next presentation by Piotr Biel from Percona was on Multi-Master Manager for MySQL.

The introduction included a discussion of the popular MySQL HA solutions including:

  • MySQL Master-slave replication with failover
  • MMM managed bi-directional replication
  • Heartbeat/SAN
  • Heartbeat/DRBD
  • NDB Cluster

A key problem that was clarified in the talk is the discussion of Multi-Master and this IS NOT master-master. You only write to a single node. With MySQL is this critical because MySQL replication does not manage collision detection.

The MMM Cluster Elements are:

  • monitoring node
  • database nodes

And the Application Components are:

  • mon
  • agent
  • angel

MMM works with 3 layers.

  • Network Layer – uses a virtual IP address, related to servers, not a physical machine
  • Database Layer
  • Application Layer

MMM uses two roles for management with your application.

  • exclusive – also known as the writer
  • balanced – also known as the reader

There are 3 different statuses are used to indicate node state

  • proper operation
  • maintenance
  • fatal errors

The mmm_control is the tool used to manage the cluster including:

  • move roles
  • enable/disable individual nodes
  • view cluster status
  • configure failover

The Implementation challenges require the use of the following MySQL settings to minimize problems.

  • auto_increment_offset/auto_increment_increment
  • log_slave_updates
  • read_only

FOSDEM 2010 MySQL Developer Room Schedule
FOSDEM 2010 Website
Brussels, Belgium
February 7, 2010

10x Performance Improvements in MySQL – A Case Study

The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.

The end result was a page load improvement from 700+ms load time to a a consistent 60ms.

State of phpMyAdmin – FOSDEM 2010

Following the opening keynote “Dolphins, now and beyond”, Marc Delisle presented on “State of phpMyAdmin”.

phpMyAdmin is an DBA administration tool for MySQL available today in 57 different languages. This is found today in many distributions, LAMP stack products and also in cpanel. The product is found at http://phpmyadmin.net.

There are current two versions, the legacy 2.x version to support older php 3.x & 4.x, The current version 3.x is for PHP 5.2 or greater.

The current UI includes some new features including.

  • calendar input for date fields
  • meta data for mime types e.g images, which is great for showing the output as an image, otherwise blob data
  • Relational designer with the able to show and create foreign keys

The New features in 3.3 (currently in beta) include:

  • Replication support including configuring master/slave, start/stop slave.
  • Synchronization model showing structure and data differences between two servers and ability to sync.
  • New export to php array, xslx, mediawiki, new importing features including progress bar.
  • Changes tracking for changes on per instance or per table. Providing change report and export options.

FOSDEM 2010 MySQL Developer Room Schedule
FOSDEM 2010 Website
Brussels, Belgium
February 7, 2010

Be sure to know your my.cnf [sections]

The MySQL configuration file, e.g. /etc/my.cnf has a number of different section headings including [mysql], [mysqld], [mysqld_safe]. It is important that you ensure you put the right variables into the right section. For example, the following my.cnf configuration file will not operate as the user probably expects.

[mysqld]
...
log-bin=mysql-bin
server-id=1
query_cache_size = 100M
query_cache_type = 1

...

[mysqld_safe]
...
key_buffer_size=600M
skip-innodb
...

In this example, this configuration does not give you a MyISAM key buffer of 600M, it’s actually the default of 8M.

mysql> show global variables like 'key_buffer_size';
+-----------------+---------+
| Variable_name   | Value   |
+-----------------+---------+
| key_buffer_size | 8388600 |
+-----------------+---------+

Be sure to add the right options to the [mysqld] section.

What I didn’t know until yesterday was that some programs read from multiple groups. From the 5.1.2. Server Command Options MySQL reference manual page. In helping the describe the problem for the benefit of readers I actually learned something new myself.


mysqld reads options from the [mysqld] and [server] groups. mysqld_safe reads options from the [mysqld], [server], [mysqld_safe], and [safe_mysqld] groups. mysql.server reads options from the [mysqld] and [mysql.server] groups.

I have for example always put log-error in both the [mysqld_safe] and [mysql]d sections because both of these write different errors. Seems that is unnecessary.

Speaking at MySQL UC 2010

My talk on 10x performance improvements – A case study has just been approved for the 2010 MySQL Conference. This will be my 5th straight year speaking at the MySQL conferences. For those in Europe wanting a sneak peek I am also speaking at FOSDEM 2010 in Brussels on Feb 7th where I’ll be giving an abridged version.

As an independent MySQL consultant, my work generally covers performance tuning and scalability and sometimes database architecture. Often however my work involves a review of a given problem and recommendations (immediate, short and long term). I’m rarely involved in the full implementation and generally do not see the full fruits of the proposed work.

Recently however I was able to work with a client, first in resolving critical performance issues and then in a review of the application architecture and MySQL environment, provide recommendations and also helping internal resources in the successful implementation. The result was a very successful engagement and an ideal case study on a strategy for tackling improving performance of an application using MySQL.

With an existing environment that included a MySQL master, 3 database slaves and 6 web servers this application provided the grounds of a well sized configuration and the need for greater availability and scalability.

The Call for Papers for the 2010 MySQL conference is still open. If you would like to share your experiences with MySQL submit your talk today.


O'Reilly MySQL Conference & Expo 2010

Europe conference options for MySQL Developers

For those in the US the annual MySQL UC is taking place again in April. For those in Europe we have dedicated room for MySQL and MySQL related products/variants/branches at FOSDEM 2010 being held in Brussels, Belgium on 6-7 Feb.

This conference will feature a full day of talks with a format of 20 minutes presentation and 5 minutes Q&A. More information about submissions can be found at Call for Papers for “MySQL and Friends” Developer Room at FOSDEM 2010 now open!

Other references:

Updated

Wednesday January 6th is the last date for submissions. Extension for FOSDEM MySQL

Monitoring the right MySQL slow queries

When looking at a set of SQL statements in isolation with tools such as the slow query log, processlist and tcpdump/mk-query-digest it is easy to identify queries that are slow.

It is more difficult however to identify the frequency of the query, and whether the slow query is actually acceptable in your overall system design.

I very simple technique to help you is to comment your queries. For example:

SELECT /* 10m cache */ ...;

When I’m working for a client, this process enables me to realize the purpose of the query more quickly and to help prioritize which queries I need to improve first. This does not mean I simply ignore these less frequent queries, however my choices for reviewing and indexing queries on given tables is adjusted generally for OLTP queries first, and batch second.

As with all practices it is important to implement across your full code base.

What do MySQL staff think of the acquisition?

It finally dawned on me while reflecting on the year past this Sunday that the missing voice since the announcement of the Oracle acquisition of Sun Microsystems (and therefore MySQL) has been the MySQL employees.

When I worked as an employee for MySQL Inc, the acquisition by Sun Microsystems in 2008 lead to several requirements about the acquisition.

  • You were not allowed to talk about the acquisition publically.
  • You were not allowed to communicate with any Sun (i.e. the acquirer) resources.

In other words it was “business as usual” which is really an oxymoron, because business will never be exactly as it was before the announcement. The ongoing delay in pending acquisition by Oracle Corporation is really hurting everybody with getting on with doing their jobs, being happy with their work, and making a difference in open source and in the lives of all the benefit from using MySQL.

I’m sure many that have words to say are disappointed, worried or even fearful of their own future careers. Comments are always welcome via Mr Anonymous using 10 minute email.

Understanding installing MySQL rpm versions

I have a problem with an easy way to install MySQL via rpm without resorting to specifying the exact point release of MySQL. Presently my local yum repository has versions of 5.0, 5.1,5.4 and 5.5.

If I want to install MySQL Sever, I can just run:

$ sudo yum install -y MySQL-server
Setting up Install Process
Package MySQL-server-community-5.5.0-1.rhel5.x86_64 already installed and latest version
Nothing to do

The issue here is the most current version is installed. If I want to install the most current version of 5.1 for example, I have found no way to specify MySQL-server-5.1, or MySQL-server-community-5.1, I have to specify the point release MySQL-server-community-5.1.40

I suspect there is some internal aliasing that may be possible within rpm’s to support this. I’m seeking help from any rpm experts and would appreciate any feedback.

My current products include:

$ sudo yum list MySQL-server-community
Installed Packages
MySQL-server-community.x86_64      5.5.0-1.rhel5        installed
Available Packages
MySQL-server-community.x86_64      5.0.82-0.rhel5       mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.0.82-0.rhel5       mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.1.40-0.rhel5       mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.1.40-0.rhel5       mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.4.3-0.rhel5        mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.4.3-0.rhel5        mydb-rhel5-x86_64
MySQL-server-community.x86_64      5.5.0-1.rhel5        mydb-rhel5-server-x86_64
MySQL-server-community.x86_64      5.5.0-1.rhel5        mydb-rhel5-x86_64

The Oracle EU statement on MySQL – What's missing

Many providers embedd MySQL with their commercial products including Adobe, Macfee, Nokia, Symantec and ScienceLogic just to name a few. In addition most commercial third party storage engines have for years been forced to provided very customized versions of MySQL due to limitations in the storage engine API. These situations require a license agreement necessary with the trademark holder of MySQL. The Oracle Corporation EU Statement released on December 14, 2009 has carefully worded in the statement about these OEM licenses and storage engine providers there will be no changes for 5 years.

One specific detail is missing, what happens then?

As an individual that uses, recommends, promotes and advises clients especially on various storage engine offerings, I see the downstream effect of uncertainty for these providers can kill the entrepreneurial spirit that the economy desperately needs.

I see the possible impact as two fold. If company X that pays a license agreement is forced in 5 years to now pay say 3x the fee for example and cannot do so, the supplier chain is broken. What about companies that use the product from company X. Are they now liable with a cease and desist notice, therefore jeopardizing these businesses as well.

I see a further complexity in the agreement for storage engine providers. While there is a commitment to maintain the API, basically all storage engine providers don’t use the API, they have custom extensions and custom binaries essential for operation. The agreement makes specific reference to using the provided interfaces. Will the ability to provide heavily customized versions be permitted?

I’m not an expert in open source and commercial licenses and I am not a lawyer, however in the past I admit I may have been ignorant to the specifics that can affect the livelihood of many including myself.

As with any problem, I seek to have or find a solution. I am unclear here what options do exist, and would appreciate feedback.

Due to the anti competitive (*) nature of this resulting acquisition (#), should there be provisions to ensure unrestricted commercial licensing will always be available in the future regardless of time frame. Should limits on the amount of increase or change in license costs to a reasonable amount be enforced to protect companies.

It is most unclear about what will happen and this uncertainly I’m sure is affecting commercial interests.

Until this statement was released, the greatest fear in the MySQL community has been uncertainly. This statement does mention some details but the specifics of commercial business will never be known and therefore still affects us.

(*) Anti competitive nature. Contrary to statements by others, MySQL and Oracle are competitors in the relational database space. While generally they service different spectrums, there is sufficient overlap. As an expert and speaker in Oracle to MySQL migrations I clearly see the competition with these products. For example, while I will never purchase a Ferrari over a Holden for example due to cost and desirable but unnecessary features, they are still both cars and are competitors in their respective industry, the same applies to Oracle and MySQL.

(#) It is important to realize that Oracle is not acquiring MySQL directly. Oracle is acquiring Sun Microsystems which includes Sun hardware platform, the Solaris Operating System and Java which I’m sure are the primary motivations. MySQL is included as it is now owned by Sun Microsystems, and this is the current issue.

Couldn't load plugin named 'innodb'

As part of reviewing storage engines for my work on the upcoming Expert PHP and MySQL book, I finally had an excuse to try out the InnoDB Plugin for MySQL which is now conveniently included with MySQL 5.1 since 5.1.38.

Following the MySQL 5.1 Reference Manual instructions at 13.6. The InnoDB Storage Engine I included the bare minimum as documented to my my.cnf.

[mysqld]
ignore-builtin-innodb
plugin-load=innodb=ha_innodb_plugin.so ;innodb_trx=ha_innodb_plugin.so ;innodb_locks=ha_innodb_plugin.so ;innodb_cmp=ha_innodb_plugin.so ;innodb_cmp_reset=ha_innodb_plugin.so ;innodb_cmpmem=ha_innodb_plugin.so ;innodb_cmpmem_reset=ha_innodb_plugin.so

However to my misfortune the following error occured.

091212 17:45:14 [ERROR] Can't open shared library '/home/rbradfor/mysql/mysql-5.1.41-linux-x86_64-glibc23/lib/plugin/ha_innodb_plugin.so ' (errno: 22 cannot open shared object file: No such file or directory)
091212 17:45:14 [ERROR] Couldn't load plugin named 'innodb' with soname 'ha_innodb_plugin.so '.

I double checked the files were indeed included in the binary release before undertaking a google search, to reveal the official Innodb Plugin Manual page but still not much to go on with the information ” Can’t open shared library library_name — Diagnose the cause from the following message details.”

Back to re-reading the error message, as referenced, I observed there is a trailing space ‘ ‘ ater the plugin name. Observing that I had indeed had this the my.cnf, the result of joining the lines from the MySQL manual which were broken down for presentation means (as noted).

Removed the space, and viola it works.

mysql> show plugins;
+---------------------+----------+--------------------+---------------------+---------+
| Name                | Status   | Type               | Library             | License |
+---------------------+----------+--------------------+---------------------+---------+
| binlog              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| partition           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| ARCHIVE             | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| BLACKHOLE           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| CSV                 | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| FEDERATED           | DISABLED | STORAGE ENGINE     | NULL                | GPL     |
| MEMORY              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MyISAM              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MRG_MYISAM          | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| InnoDB              | ACTIVE   | STORAGE ENGINE     | ha_innodb_plugin.so | GPL     |
| INNODB_TRX          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_LOCKS        | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP_RESET    | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM       | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM_RESET | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
+---------------------+----------+--------------------+---------------------+---------+

I would question that this level of pedantic checking is unnecessary. A simple trim on semi-colon ‘;’ separated values could be an easy fix.

Is your database schema in sync?

If you have more then a single MySQL database for your production environment, e.g. a development and test environment, or a MySQL replication topology, ensuring your schema’s are in sync can be task that requires some time if not managed correctly.

There is a tool I do use for MySQL environments called Schema Sync – a MySQL Schema Versioning and Migration Utility. There are many reasons why schema’s get out of sync. Developers may not ensure their changes are reflected in any software to be deployed, and when not tested you could end up with broken functionality. A DBA might try some different index strategies on a slave, but not the master, and never implement or revert.

While some people want the quick and hidden just sync version akin to Rails, I really like this product as it produces proper patch and revert SQL scripts. As a DBA I really want to know what’s going to be applied to my schema, I want to be able to review it before it’s just magically applied. The revert process is also ideal, in time critical situations when you may need to rollback a change, either it’s taking to long to deploy.

This is definitely a tool to use over time, especially if you have multiple developers, DBA’s and more then one MySQL environment that should be the same.