Additional DB objects in AWS RDS

To expand on Jervin’s Default RDS Account Privileges, RDS for MySQL provides a number of routines and triggers defined the the ‘mysql’ meta schema. These help in various tasks because the SUPER privilege is not provided.

SELECT routine_schema,routine_name
FROM information_schema.routines;
+----------------+-----------------------------------+
| routine_schema | routine_name                      |
+----------------+-----------------------------------+
| mysql          | rds_collect_global_status_history |
| mysql          | rds_disable_gsh_collector         |
| mysql          | rds_disable_gsh_rotation          |
| mysql          | rds_enable_gsh_collector          |
| mysql          | rds_enable_gsh_rotation           |
| mysql          | rds_kill                          |
| mysql          | rds_kill_query                    |
| mysql          | rds_rotate_general_log            |
| mysql          | rds_rotate_global_status_history  |
| mysql          | rds_rotate_slow_log               |
| mysql          | rds_set_configuration             |
| mysql          | rds_set_gsh_collector             |
| mysql          | rds_set_gsh_rotation              |
| mysql          | rds_show_configuration            |
| mysql          | rds_skip_repl_error               |
+----------------+-----------------------------------+
15 rows in set (0.00 sec)

SELECT trigger_schema, trigger_name,
          CONCAT(event_object_schema,'.',event_object_table) AS table_name,
          CONCAT(action_timing,' ',event_manipulation) AS trigger_action
FROM information_schema.triggers;
+----------------+--------------+------------+----------------+
| trigger_schema | trigger_name | table_name | trigger_action |
+----------------+--------------+------------+----------------+
| mysql          | block_proc_u | mysql.proc | BEFORE UPDATE  |
| mysql          | block_proc_d | mysql.proc | BEFORE DELETE  |
| mysql          | block_user_i | mysql.user | BEFORE INSERT  |
| mysql          | block_user_u | mysql.user | BEFORE UPDATE  |
| mysql          | block_user_d | mysql.user | BEFORE DELETE  |
+----------------+--------------+------------+----------------+

MySQL MHA and Perl pathing

I am certainly not a knowledgeable Perl person, however I recently ran into the error Can’t locate MHA/MasterMonitor.pm on Red Hat 6.x. I have installed MySQL MHA on various systems before without any issues.

$ masterha_manager -version
Can't locate MHA/MasterMonitor.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_manager line 26.
BEGIN failed--compilation aborted at /usr/bin/masterha_manager line 26.

The issue was that MySQL MHA is not installed in any of the acceptable default paths for this disto default installation.

$ find / -type d -name MHA
/usr/lib/perl5/vendor_perl/MHA

The fix was simple on this OS, but I expect there is a correct Perl approach?

ln -s /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl/
$ cat /etc/*release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
Red Hat Enterprise Linux Server release 6.3 (Santiago)

Cloning MySQL 5.6 instances

A tip for all those cloud users that like cloning database servers (as reported in my book Effective MySQL – Replication Techniques in Depth).

Starting with MySQL 5.6, MySQL instances have a UUID. Cloning servers to quickly create slaves will result in the following error message.

mysql> SHOW SLAVE STATUSG
...
  Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
...

The solution is simple. Clear our the file based configuration file (located in the MySQL datadir) and restart the MySQL instance.

$ rm -f /mysql/data/auto.cnf
$ service mysql restart

Unnecessary 3am (emergency) call

Help, the database is on fire. Well, it probably is but the solution may also be easy. Here are a few steps for the part-time MySQL DBA/sysadmin/developer. Total time to address this solution was 2 minutes, the inability to not go back to sleep, not priceless.

First, access to the DB server is possible, and also the mysql command line client. If you are getting “Too Many Connections” read Why GRANT ALL is bad.

Given the message “the database is on fire”, the likely cause is rogue SQL.

mysql> show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 5000  |
+-----------------+-------+
1 row in set (0.02 sec)

mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_cached    | 1     |
| Threads_connected | 5001  |
| Threads_created   | 5002  |
| Threads_running   | 5001  |
+-------------------+-------+
4 rows in set (0.01 sec)

Yeah, as expected. A SHOW PROCESSLIST in this situation shows all queries stuck in the state of “query end”, which indicates the obvious problem to me (a disk space problem).

For the record I do not recommend setting max_connections to 5,000. MySQL (and Linux) does not function with 5,000 concurrent connections (especially when they all want to create a disk based temporary table, but that’s another story). What is missing is a maximum threads running configuration option, and applicable coding practices and proactive server connection management to prevent such a situation.

Confirming my suspicions.

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde             7.9G  2.5G  5.0G  34% /
tmpfs                 7.3G     0  7.3G   0% /dev/shm
/dev/xvdf1            197G   42G  146G  23% /mysql
/dev/xvdh1             99G   94G  8.0K 100% /mysql/binlog

The solution here is simple, you need to remove old binary logs. It is NOT the recommendation you delete the files manually but use PURGE MASTER LOGS. However this produced an error.

mysql> purge master logs to 'mysql-bin.000029';
ERROR 3 (HY000): Error writing file '/mysql/binlog/mysql-bin.index_crash_safe' (Errcode: 28 - No space left on device)

That’s a new message, possibly a new MySQL 5.6 error situation. The only choice now is to remove a physical file first. A suggestion to MySQL engineers. Let’s create this file in normal operations with sufficient blank bytes, enabling MySQL to be able to create the file even when the disk is full, and then avoid manual file manipulation.

$ rm mysql-bin.000002

Then:

mysql> purge master logs to 'mysql-bin.000029';
Query OK, 0 rows affected, 1 warning (1.37 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------+
| Level   | Code | Message                                                       |
+---------+------+---------------------------------------------------------------+
| Warning | 1612 | Being purged log /mysql/binlog/mysql-bin.000002 was not found |
+---------+------+---------------------------------------------------------------+
1 row in set (0.01 sec)

As always, a recommendation when problems occur is to check the error log (aka Monitoring the MySQL Error Log). This issue is also immediately apparent.

2013-08-23 06:28:36 30609 [Warning] Disk is full writing '/mysql/binlog/mysql-bin.000097' (Errcode: 28 - No space left on device). Waiting for someone to free space...
2013-08-23 06:28:36 30609 [Warning] Retry in 60 secs. Message reprinted in 600 secs

I would also like to point out that being proactive and having monitoring and instrumentation in ALL startups is critical if you want to be successful. Point 1 in my recommendations of Successful Scalability Principles.

Was it really an emergency?

Identifying empty tables in MySQL

The following simple INFORMATION_SCHEMA statement will identify and also verify tables that have no rows. These may be candidate tables to remove from your data model.

mysql --defaults-file=.my.cnf -N -e "select CONCAT('SELECT "',table_schema,'.',table_name,'" AS tbl, COUNT(*) AS cnt FROM ',table_schema,'.',table_name,';') as cmd from information_schema.tables where table_schema not in ('mysql','performance_schema','information_schema') and table_rows=0;" | mysql --defaults-file=.my.cnf -N

A simple FAILED SQL statement breaks MySQL 5.6 replication

I setup 6 shiny new MySQL 5.6.13 MySQL servers, ready for testing and production deployment tomorrow.

I found that the New Relic MySQL Monitoring was throwing the following error.

[2013-08-08 03:53:33 +0000] com.newrelic.plugins.mysql.MySQL | SEVERE | An SQL error occured running 'SHOW ENGINE INNODB STATUS' Access denied; you need (at least one of) the PROCESS privilege(s) for this operation

Simple fix, the user I am gathering metrics requires the PROCESS privilege. Again simple enough.

mysql> grant PROCESS on xxx.* to xxx@'10.%';
ERROR 1221 (HY000): Incorrect usage of DB GRANT and GLOBAL PRIVILEGES
mysql> grant PROCESS on *.* to xxx@'10.%';
Query OK, 0 rows affected (0.00 sec)

As you can see I got an error for a global privilege, again trivial, easy fix to correct syntax. However, it breaks replication with a very non descriptive message.

mysql> SHOW SLAVE STATUSG
...
Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log
...

Bug #68892 reported this in April, verified by Oracle support, but is listed as ‘non-critical’. I agree with the bug author, given MySQL 5.6 touts many MySQL replication improvements, a simple failed statement should not break replication.

DP#8 The disadvantages of row at a time processing

It can be hard for software engineers to understand the following principle, however it is very important for improving performance and obtaining immediate scalability options. The principle is “Do Less Work”. That is, run less SQL statements.

Just one method to achieving the execution of less SQL statements is to eliminate Row At a Time (RAT) processing. In simple terms, do not perform identical repeating SQL statements in a loop. Relational algebra, and the Structure Query Language (SQL) specification is specifically designed to work with sets of data, or as I describe, Chunk At a Time (CAT) processing.

Customer Example

Your online social media website lets you send messages to multiple friends at one time. You enter the message, select the friends you want to receive the message and click send. While the user waits a moment and gets a success message, behind the scenes the application runs the following SQL statements to record your request.

START TRANSACTION;
INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a reminder. The poker game will start on Friday at 8pm.',NOW());
SELECT @message_id :=LAST_INSERT_ID();
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,16,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 16;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,18,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 18;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,99,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 99;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,21,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 21;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 62;
COMMIT;

You can define the table structures used in this example with:

DROP TABLE IF EXISTS dp8_message_sent;
CREATE TABLE dp8_message_sent(
  message_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id     INT UNSIGNED NOT NULL,
  message  VARCHAR(500) NOT NULL,
  created     DATETIME NOT NULL,
  PRIMARY KEY(message_id),
  KEY(user_id)
) ENGINE=InnoDB CHARSET utf8;
DROP TABLE IF EXISTS dp8_message_recipient;
CREATE TABLE dp8_message_recipient(
  message_id     INT UNSIGNED NOT NULL,
  from_user_id     INT UNSIGNED NOT NULL,
  to_user_id     INT UNSIGNED NOT NULL,
  status   ENUM('New','Read','Deleted') NOT NULL,
  PRIMARY KEY(message_id,to_user_id),
  KEY(from_user_id)
) ENGINE=InnoDB CHARSET utf8;
DROP TABLE IF EXISTS dp8_user_notification;
CREATE TABLE dp8_user_notification(
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  new_message ENUM ('Y','N') NOT NULL DEFAULT 'N',
  new_message_count INT UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY(user_id)
) ENGINE=InnoDB CHARSET utf8;

The average software developer may not see the problem here. In your test environment you executed 12 SQL statements and the code worked fine, i.e. it met the requirements for the function. However, while producing the correct result, this is a poor code approach.

This example shows not one repeating query, but two. Lucky you only sent the message to a few friends. If you sent it to 200 friends, you have a significant number more SQL statements to execute. This time the code executes 402 SQL statements for the same feature. The response time to the user is longer, the application connection has to remain open longer and the database has more work to do.

This popular site is sending thousands of messages per second, so the problem is compounded to produce an excess of unnecessary work, not just for the database, but the application web server connections as their are longer open requests.

The solution is straightforward. Remove repeating queries. It’s not rocket science. This is a simple design practice I teach as the problem is evident on most consulting engagements. Popular products including Drupal and WordPress also implement this poor practice and developers that extend these products propagate this poor practice excessively. If this development approach can be easily found in a few common functions, in it generally a clear indicator this problem can be found throughout the code.

Here is the same operation performed efficiently.

START TRANSACTION;
INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a better reminder. The poker game will start on Friday at 8pm.',NOW());
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES
(LAST_INSERT_ID(),42,16,'New'),
(LAST_INSERT_ID(),42,18,'New'),
(LAST_INSERT_ID(),42,99,'New'),
(LAST_INSERT_ID(),42,21,'New'),
(LAST_INSERT_ID(),42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id IN (16,18,99,21,62);
COMMIT;

No matter how many friends you send a message to, only 3 SQL statements are executed. In these queries we see two different examples of leveraging the set capabilities of SQL to perform chunk at a time processing. We discuss the benefits of the multi-values INSERT in more detail with DP#10.

Customer Example 2

The following is a simple example for an online store processing function. Your shipping provider provides an update of all packages that were processed by them for a given date. For each packing tracking code that you have recorded with orders they provide a last known status. For example if the package were successfully delivered, is in transit, or has been returned.

A typical and very common developer process is to open the file, read each line looping through all the rows, and for each row perform a single update without using transactions like:

open file
for each line
do
  UPDATE dp8_order
  SET    last_shipping_status=?, last_shipping_update=?
  WHERE  tracking_code=?;
done
close file

As the size of data increases so does the processing time because you execute one statement per row. When there are are 10 packages, 10 SQL statements, when there are 300,000 packages, there are 300,000 SQL statements.

This batch process does not have a user response time requirement like online applications where performance is key to retaining your users. However, while eliminating row at a time processing is critical for providing a better user experience it is also just as important for batch processing.

stmt = 'INSERT INTO dp8_batch_tracking (batch_id, tracking_code, status, last_update) VALUES'
sep = ''

open file
for each line
do
  stmt = stmt + sep + '(42, ?, ?, ?)'
  sep  = ''
done
close file

START TRANSACTION;
EXECUTE IMMEDIATE stmt;
UPDATE dp8_order o, dp8_batch_tracking bt
SET    o.last_shipping_status=bt.status, o.last_shipping_update=bt.last_update
WHERE  bt.batch_id = 42
AND    bt.tracking_code = o.tracking_code;
--DELETE FROM batch_tracking WHERE batch_id=42;
COMMIT;

This example removes the one query per row problem, and results in just 2 SQL queries for processing the file regardless of size.

NOTE: In MySQL there is a limit to the length of the SQL statement (i.e. The INSERT). This can be adjusted with the max_allowed_packet variable which can be set per SQL statement. If you are processing very large files, the following code would be modified to perform the INSERT for ‘n’ records, however only a single UPDATE is still required. See DP#10 for an example of using max_allowed_packet.

This example shows just one way to optimize this operation with the least amount of code changes to the existing application. An even better approach is to use the LOAD DATA INFILE syntax to populate the batch table directly. This requires additional SQL privileges and file system access and hence is a more complex solution.

Why is the impact of removing these repeating queries so significant? To answer that question we need to look at the anatomy of the execution of an SQL statement.

SQL statement workflow

To the end user viewing your website with a browser, the result of clicking send on a webpage is a [short] delay before the expected results are displayed or the applicable action occurs. Behind the scenes an extensive amount of work is performed. For anybody that has looked at a waterfall chart showing the response from a web server, there is a far greater complexity for rendering the page you are looking at. The following article gives a good introduction to browser waterfall graphs — http://www.webperformancetoday.com/2010/07/09/waterfalls-101/. While the browser may render 100s of files, it is generally the first response, the actual page that is involved in executing the necessary SQL statements, and the focus of this design practice.

When a HTTP request is made to a web container the application performs a number of operations to satisfy the request and produce a response. With your application, regardless of the programming language, access to the MySQL database is performed by SQL statements. Each statement is passed to the language specific MySQL connector required with your web container. For example, when using the Apache HTTP server and the PHP programming language, the MySQL Native Driver (mysqlnd) is the necessary MySQL Connector. There are connectors for the popular languages including C, C++, Java, .Net, Python, Ruby etc.

Here is a short summarized list of what occurs with all SQL statements.

  1. The application executes an SQL statement.
  2. The MySQL client connector accepts the SQL statement then connects across the network to the specified MySQL server and passes the SQL statement to the MySQL server.
  3. The MySQL server processes all incoming SQL statements in individual threads, so many SQL statements can be executed concurrently.
  4. The MySQL server first parses the SQL statement for valid SQL syntax, and produces a summarized structure of the tables and columns used in the SQL statement.
  5. The MySQL server performs a security check to ensure the user that is requesting this SQL statement has the necessary privileges to be able to access/modify the information requested in the SQL statement.
  6. The MySQL server then passes the parsed SQL statement to the MySQL query optimizer. This is heart of the decision making process where the cost-based optimizer creates a decision tree, evaluates the various options by pruning the expensive paths to produce the optimal Query Execution Plan (QEP).
  7. The MySQL server then passes the QEP to the applicable MySQL storage engine(s) to perform the physical work of storing and/or retrieving the data for the given SQL statement.
  8. Depending on the type of query, the MySQL server may have to do additional work, for example to join multiple tables, sort results etc.
  9. When the MySQL server has produced the results for the SQL statement, these results are send back across the network to the application server.

NOTE: This is a simplified representation of the execution path of an SQL statement in MySQL. The use of the MySQL Query Cache discussed in QP#9 introduces additional steps and can also produce a significantly simplified and faster execution path.

To summarize, every SQL statement is passed to the MySQL server, the network overhead of points 2 and 9 are the most expensive amount of time in a well tuned MySQL application. This alone is the greatest reason to run less SQL statements.

Every SQL statement is parsed, checked for application permissions and optimized before execution. This is most applicable for example when combining INSERT statements with multiple VALUES clauses. In addition to saving the network round trip, this overhead is also eliminated by combining SQL statements.

Universal Application

This same principle can be applied to other products that process data. For example, memcache is a popular product to improve performance and scalability of your application by providing a memory caching layer. The following figures are for an example benchmark with 28 objects in memcache using two cloud servers in Rackspace Cloud.

Using an individual get call 28 times sequentially in a single PHP file, simulating a client example, the total response time of the benchmarked ranged from 24 to 56 milliseconds. Using the same configuration with a single multi-get call for the same 28 objects the results ranged from 4 to 7 milliseconds.

It does not require a graph to see the 6x-10x improvement in performance by eliminating row at a time processing. The saving of 20-50 milliseconds may seem small, however when multiplied in environments with thousands of concurrent users, thousands of times per second, has a large impact on resources.

Recap

This principle shows a simple technique for reducing the number of SQL statements by eliminate repeating queries. As a goal of “Do Less Work”, this is only one case. DP#16 discusses several other query saving techniques that can eliminate repeating and unwanted queries providing improved performance.

More References

About Expert MySQL Design Practices

This new series by leading MySQL Expert Ronald Bradford helps the software engineer understand, appreciate and develop the right skills and techniques to build scalable software solutions. These proven and reproducible design practices will ensure your use of MySQL to improve performance, scalability and reliability.

These expert design practices are from 25 years of professional experience following formal university qualifications in computer science. All of these practices are written for use with a MySQL based data system however most of the content in these practices predate the existence of the MySQL product and have stood the test of time with emerging technologies and software development approaches. Many practices apply directly to other data stores, whether relational or the new NoSQL products and include working with persistent and non-persistent data storage products.

More information about the series can be found at http://ronaldbradford.com/blog/expert-mysql-design-practices/

MySQL configuration file created in /usr/my.cnf

As part of uncovering a new security improvement in MySQL 5.6 with the default installation being secured with a unique root MySQL password, the MySQL provided rpm’s installed the MySQL configuration file in /usr.

I feel this maybe a bug, however this is just as an explanation for others to comment on.
The following steps reproduces this.

1. Launch a CentOS 6.4 AWS EC2 Instance

LOG=/tmp/create-instance.tmp.$$
INSTANCE_TYPE=m1.medium
ec2-run-instances ami-eb6b0182 --instance-type ${INSTANCE_TYPE} --region us-east-1 --availability-zone us-east-1b --key db --group db | tee ${LOG}
INSTANCE=`grep ^INSTANCE ${LOG} | awk '{print $2}'`
ec2-describe-instances ${INSTANCE} | tee ${LOG}
while [ `grep ^INSTANCE ${LOG} | awk '{print $6}'` != 'running' ] ;  do cat ${LOG}; sleep 5; ec2-describe-instances ${INSTANCE} | tee ${LOG}; done
SERVER=`grep ^INSTANCE ${LOG} | awk '{print $4}'`
echo "#instance=${INSTANCE},server=${SERVER}"

#instance=i-xxx,server=ec2-xxx.compute-1.amazonaws.com

2. Connect to server

ssh -i ~/.ssh/db.pem [email protected]

3. Install software

yum update -y

# Remove the Postfix MySQL dependency
rpm -qa | grep -i mysql
rpm -e --nodeps mysql-libs-5.1.69-1.el6_4.x86_64

# Just to prove no config here
ls -l /etc/my.cnf
ls: cannot access /etc/my.cnf: No such file or directory
ls -l /usr/my.cnf
ls: cannot access /usr/my.cnf: No such file or directory

# Install MySQL prerequisites
yum install -y libaio perl

cd /tmp
yum install -y wget
wget http://cdn.mysql.com/Downloads/MySQL-5.6/MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
tar xvf MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
rpm -ivh MySQL*.rpm

4. The results of the installation of the default MySQL 5.6 rpms.

A RANDOM PASSWORD HAS BEEN SET FOR THE MySQL root USER !
You will find that password in '/root/.mysql_secret'.

You must change that password on your first connect,
no other statement but 'SET PASSWORD' will be accepted.
See the manual for the semantics of the 'password expired' flag.

Also, the account for the anonymous user has been removed.

In addition, you can run:

  /usr/bin/mysql_secure_installation

which will also give you the option of removing the test database.
This is strongly recommended for production servers.

See the manual for more instructions.

Please report any problems with the /usr/bin/mysqlbug script!

The latest information about MySQL is available on the web at

http://www.mysql.com

Support MySQL by buying support/licenses at http://shop.mysql.com

New default config file was created as /usr/my.cnf and
will be used by default by the server when you start it.
You may edit this file to change server settings

As you can see via the rpm installation instructions, a new configuration file was created in “/usr/my.cnf”

$ cat /usr/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

[mysqld]

# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M

# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin

# These are commonly set, remove the # and set as required.
# basedir = .....
# datadir = .....
# port = .....
# server_id = .....
# socket = .....

# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

DP#4 The importance of using sql_mode

What if the data you retrieved from the database did not match the data the application claimed to had successfully stored? How comfortable would your organization feel about your skills and the products that are being used to store important information if data integrity was not guaranteed?

MySQL employs a terrible default technique known as silent truncation where the product determines that it knows about your data better than you. Never has the saying “do not assume” because it makes an “ass” out of “u” and “me” been more applicable.

Customer Example

A HTML form for new customers provide input fields for the customer first and last name. Good design was considered with the HTML form client validation to ensure that each field could not exceed 20 characters in length. However, the database design is different, where the first name is only defined as 10 characters. In most cases this is sufficient, however for first names longer than 10 characters, the data retrieved does not match the data that was apparently successfully stored because there was no SQL error. The following SQL reproduces this situation.

DROP TABLE IF EXISTS dp4;
CREATE TABLE dp4 (
  customer_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name      VARCHAR(10) NOT NULL,
  last_name       VARCHAR(20) NOT NULL,
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB DEFAULT CHARSET utf8;

INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Evangeline','Jones');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','Smith');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Alexander','Bell');

SELECT * FROM dp4;
+-------------+------------+-----------+
| customer_id | first_name | last_name |
+-------------+------------+-----------+
|           1 | Evangeline | Jones     |
|           2 | Christophe | Smith     |
|           3 | Alexander  | Bell      |
+-------------+------------+-----------+
3 rows in set (0.00 sec)

As you can see, the first name of Christopher Smith is not actually correctly stored in the database. MySQL DID NOT produce an error message, rather it performed a silent truncation of the data.

Defining sql_mode

To demonstrate what level of data integrity you should expect with MySQL, you must define the sql_mode configuration option. The following example demonstrates the dynamic syntax for a given connection and the error you should expect.

SET SESSION sql_mode='STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

TRUNCATE TABLE dp4;
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','James');
ERROR 1406 (22001): Data too long for column 'first_name' at row 1
SELECT * FROM dp4;
Empty set (0.00 sec)

When MySQL is first installed the following configuration option should always be added to all environments as a default.

$ cat /etc/my/cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION

Starting with MySQL 5.6, the default MySQL configuration files when created by MySQL define sql_mode variable with STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION. This is a step in the right direction.

For more information, refer to the MySQL Reference Manual for sql_mode at http://dev.mysql.com/doc/refman/5.6/en/server-sql-mode.html

NOTE: MySQL provides many different options with sql_mode. Careful consideration is needed to determine which options are best for your application. Some options help in providing syntax and compatibility with other database products however these can affect and even break existing products written specifically for MySQL.

MySQL Warnings

The underlying cause of this loss of data integrity is how MySQL handles success and error conditions with SQL Statements. There are the obvious success and failure states, however MySQL has a third state known was warnings, or more specifically success with warnings. As the use of warnings is uncommon with other data store products, many applications, developers and programming languages ignore checking for warnings, or are simply unaware of this inbuilt feature.

Using the MySQL command line client, you can get a visual indication of warnings following an SQL statement which then help the need for reviewing what warnings occurred.

SET SESSION sql_mode='';
INSERT INTO dp4 (customer_id, first_name, last_name)
VALUES (NULL,'Christopher','Smith');
Query OK, 1 row affected, 1 warning (0.00 sec)

SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level   | Code | Message                                         |
+---------+------+-------------------------------------------------+
| Warning | 1265 | Data truncated for column 'first_name' at row 1 |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)

When using PHP there is no indication of SQL warnings unless you specifically check after every SQL statement. For example:

<?php
  $con = mysqli_connect('localhost', 'scott', 'sakila', 'design');
  if (mysqli_connect_errno()) {
    print 'Failed to connect to MySQL: ' . mysqli_connect_error() . "n";
    exit(1);
  }

  if (!mysqli_query($con, 'INSERT INTO dp4 (customer_id, first_name, last_name) ' .
                          'VALUES (NULL,"Christopher","Holt") ')) {
    print 'Failed to insert data: ' . mysqli_error($con) . "n";
  }
  if (($warnings = mysqli_warning_count($con)) > 0) {
    if ($rs = mysqli_query($con, "SHOW WARNINGS")) {
        $row = mysqli_fetch_row($rs);
        printf("%s (%d): %sn", $row[0], $row[1], $row[2]);
        mysqli_free_result($rs);
    }
  }
  mysqli_close($con);
?>

The best recommendation is to avoid all situations where MySQL can produce a warning and does not provide the best possible data integrity.

Refer to the MySQL Reference Manual for more information on SHOW WARNINGS at
http://dev.mysql.com/doc/refman/5.6/en/show-warnings.html

The Larger Problem

This underlying problem is actually more difficult to correct for an existing production system than enabling the correct sql_mode configuration option. Using the customer example, the identification of any rows that are 10 characters in length could be valid, or may have been truncated. There is no easy way to obtain the actual value that was originally submitted. The use of the correct numerical data type (DP#14) can provide a check constraint for values, however it can also suffer from the same truncation problem. You especially hope that this does not affect your payroll, your frequent flyer points balance or your accumulated score from your favorite online game.

The solution is to avoid the problem of producing incorrect data.

Review

While this example is using a character data type, field truncation can also occur with numeric and date data types. The use of applicable sql_mode configuration settings is a critical MySQL design practice to ensure adequate data integrity that all systems need to implement.

More References

About Expert MySQL Design Practices

This new series by leading MySQL Expert Ronald Bradford helps the software engineer understand, appreciate and develop the right skills and techniques to build scalable software solutions. These proven and reproducible design practices will ensure your use of MySQL to improve performance, scalability and reliability.

These expert design practices are from 25 years of professional experience following formal university qualifications in computer science. All of these practices are written for use with a MySQL based data system however most of the content in these practices predate the existence of the MySQL product and have stood the test of time with emerging technologies and software development approaches. Many practices apply directly to other data stores, whether relational or the new NoSQL products and include working with persistent and non-persistent data storage products.

More information about the series can be found at http://ronaldbradford.com/blog/expert-mysql-design-practices/

Unexplained (trivial) MySQL behavior

The -N or –skip-column-names is a convenient option with the mysql client to skip the header line of output.
However I found when viewing the output via the terminal, some interesting and unexplained output.

$ mysql -h*** -u*** -p -e "SELECT VARIABLE_NAME,VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS"
+--------------------------------+----------------+
| VARIABLE_NAME                  | VARIABLE_VALUE |
+--------------------------------+----------------+
| ABORTED_CLIENTS                | 710001         |
| ABORTED_CONNECTS               | 31             |
| BINLOG_CACHE_DISK_USE          | 0              |
| BINLOG_CACHE_USE               | 0              |
| BYTES_RECEIVED                 | 2522301004     |
| BYTES_SENT                     | 317785976      |
| COM_ADMIN_COMMANDS             | 2890667        |
| COM_ASSIGN_TO_KEYCACHE         | 0              |
| COM_ALTER_DB                   | 0              |
| COM_ALTER_DB_UPGRADE           | 0              |
$ mysql -N -h*** -u*** -p -e "SELECT VARIABLE_NAME,VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS"
+--------------------------------+--------------+
|                ABORTED_CLIENTS |       710001 |
|               ABORTED_CONNECTS |           31 |
|          BINLOG_CACHE_DISK_USE |            0 |
|               BINLOG_CACHE_USE |            0 |
|                 BYTES_RECEIVED |   2522947764 |
|                     BYTES_SENT |    348838502 |
|             COM_ADMIN_COMMANDS |      2890742 |


As you can see all the values of the first column are right aligned in terminal display. When written to a file (which is the intended outcome), the data is not.

$ mysql ... > /tmp/x
$ head /tmp/x
ABORTED_CLIENTS	710009
ABORTED_CONNECTS	31
BINLOG_CACHE_DISK_USE	0
BINLOG_CACHE_USE	0
BYTES_RECEIVED	2526760299
BYTES_SENT	530046795
COM_ADMIN_COMMANDS	2890742
COM_ASSIGN_TO_KEYCACHE	0
COM_ALTER_DB	0
COM_ALTER_DB_UPGRADE	0

Improved Security with MySQL 5.6

Installed on a clean CentOS 6.4 AWS instance.

sudo su -
cd /tmp
wget http://cdn.mysql.com/Downloads/MySQL-5.6/MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
tar xvf MySQL-5.6.13-1.el6.x86_64.rpm-bundle.tar
yum install -y libaio perl
rpm -i MySQL*.rpm

The following output is the sign that security is being considered with new MySQL versions. Woot!

A RANDOM PASSWORD HAS BEEN SET FOR THE MySQL root USER !
You will find that password in '/root/.mysql_secret'.

You must change that password on your first connect,
no other statement but 'SET PASSWORD' will be accepted.
See the manual for the semantics of the 'password expired' flag.

Also, the account for the anonymous user has been removed.

In addition, you can run:

  /usr/bin/mysql_secure_installation

which will also give you the option of removing the test database.
This is strongly recommended for production servers.

See the manual for more instructions.

Please report any problems with the /usr/bin/mysqlbug script!

The latest information about MySQL is available on the web at

http://www.mysql.com

Support MySQL by buying support/licenses at http://shop.mysql.com

New default config file was created as /usr/my.cnf and
will be used by default by the server when you start it.
You may edit this file to change server settings

However, moving the MySQL configuration to /usr. WTF?