DP#4 The importance of using sql_mode

What if the data you retrieved from the database did not match the data the application claimed to had successfully stored? How comfortable would your organization feel about your skills and the products that are being used to store important information if data integrity was not guaranteed?

MySQL employs a terrible default technique known as silent truncation where the product determines that it knows about your data better than you. Never has the saying “do not assume” because it makes an “ass” out of “u” and “me” been more applicable.

Customer Example

A HTML form for new customers provide input fields for the customer first and last name. Good design was considered with the HTML form client validation to ensure that each field could not exceed 20 characters in length. However, the database design is different, where the first name is only defined as 10 characters. In most cases this is sufficient, however for first names longer than 10 characters, the data retrieved does not match the data that was apparently successfully stored because there was no SQL error. The following SQL reproduces this situation.

DROP TABLE IF EXISTS dp4;
CREATE TABLE dp4 (
  customer_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name      VARCHAR(10) NOT NULL,
  last_name       VARCHAR(20) NOT NULL,
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB DEFAULT CHARSET utf8;

INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Evangeline','Jones');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','Smith');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Alexander','Bell');

SELECT * FROM dp4;
+-------------+------------+-----------+
| customer_id | first_name | last_name |
+-------------+------------+-----------+
|           1 | Evangeline | Jones     |
|           2 | Christophe | Smith     |
|           3 | Alexander  | Bell      |
+-------------+------------+-----------+
3 rows in set (0.00 sec)

As you can see, the first name of Christopher Smith is not actually correctly stored in the database. MySQL DID NOT produce an error message, rather it performed a silent truncation of the data.

Defining sql_mode

To demonstrate what level of data integrity you should expect with MySQL, you must define the sql_mode configuration option. The following example demonstrates the dynamic syntax for a given connection and the error you should expect.

SET SESSION sql_mode='STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

TRUNCATE TABLE dp4;
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','James');
ERROR 1406 (22001): Data too long for column 'first_name' at row 1
SELECT * FROM dp4;
Empty set (0.00 sec)

When MySQL is first installed the following configuration option should always be added to all environments as a default.

$ cat /etc/my/cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION

Starting with MySQL 5.6, the default MySQL configuration files when created by MySQL define sql_mode variable with STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION. This is a step in the right direction.

For more information, refer to the MySQL Reference Manual for sql_mode at http://dev.mysql.com/doc/refman/5.6/en/server-sql-mode.html

NOTE: MySQL provides many different options with sql_mode. Careful consideration is needed to determine which options are best for your application. Some options help in providing syntax and compatibility with other database products however these can affect and even break existing products written specifically for MySQL.

MySQL Warnings

The underlying cause of this loss of data integrity is how MySQL handles success and error conditions with SQL Statements. There are the obvious success and failure states, however MySQL has a third state known was warnings, or more specifically success with warnings. As the use of warnings is uncommon with other data store products, many applications, developers and programming languages ignore checking for warnings, or are simply unaware of this inbuilt feature.

Using the MySQL command line client, you can get a visual indication of warnings following an SQL statement which then help the need for reviewing what warnings occurred.

SET SESSION sql_mode='';
INSERT INTO dp4 (customer_id, first_name, last_name)
VALUES (NULL,'Christopher','Smith');
Query OK, 1 row affected, 1 warning (0.00 sec)

SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level   | Code | Message                                         |
+---------+------+-------------------------------------------------+
| Warning | 1265 | Data truncated for column 'first_name' at row 1 |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)

When using PHP there is no indication of SQL warnings unless you specifically check after every SQL statement. For example:

<?php
  $con = mysqli_connect('localhost', 'scott', 'sakila', 'design');
  if (mysqli_connect_errno()) {
    print 'Failed to connect to MySQL: ' . mysqli_connect_error() . "n";
    exit(1);
  }

  if (!mysqli_query($con, 'INSERT INTO dp4 (customer_id, first_name, last_name) ' .
                          'VALUES (NULL,"Christopher","Holt") ')) {
    print 'Failed to insert data: ' . mysqli_error($con) . "n";
  }
  if (($warnings = mysqli_warning_count($con)) > 0) {
    if ($rs = mysqli_query($con, "SHOW WARNINGS")) {
        $row = mysqli_fetch_row($rs);
        printf("%s (%d): %sn", $row[0], $row[1], $row[2]);
        mysqli_free_result($rs);
    }
  }
  mysqli_close($con);
?>

The best recommendation is to avoid all situations where MySQL can produce a warning and does not provide the best possible data integrity.

Refer to the MySQL Reference Manual for more information on SHOW WARNINGS at
http://dev.mysql.com/doc/refman/5.6/en/show-warnings.html

The Larger Problem

This underlying problem is actually more difficult to correct for an existing production system than enabling the correct sql_mode configuration option. Using the customer example, the identification of any rows that are 10 characters in length could be valid, or may have been truncated. There is no easy way to obtain the actual value that was originally submitted. The use of the correct numerical data type (DP#14) can provide a check constraint for values, however it can also suffer from the same truncation problem. You especially hope that this does not affect your payroll, your frequent flyer points balance or your accumulated score from your favorite online game.

The solution is to avoid the problem of producing incorrect data.

Review

While this example is using a character data type, field truncation can also occur with numeric and date data types. The use of applicable sql_mode configuration settings is a critical MySQL design practice to ensure adequate data integrity that all systems need to implement.

More References

About Expert MySQL Design Practices

This new series by leading MySQL Expert Ronald Bradford helps the software engineer understand, appreciate and develop the right skills and techniques to build scalable software solutions. These proven and reproducible design practices will ensure your use of MySQL to improve performance, scalability and reliability.

These expert design practices are from 25 years of professional experience following formal university qualifications in computer science. All of these practices are written for use with a MySQL based data system however most of the content in these practices predate the existence of the MySQL product and have stood the test of time with emerging technologies and software development approaches. Many practices apply directly to other data stores, whether relational or the new NoSQL products and include working with persistent and non-persistent data storage products.

More information about the series can be found at http://ronaldbradford.com/blog/expert-mysql-design-practices/