Why SQL_MODE is important? Part I

MySQL pre version 5.0 was very lax in it’s management of valid data. It was easy for data integrity to be abused if you knew how. The most common examples were truncations and silent conversions that if not understood could provide a serious data integrity issue.

In version 5.0, the introduction of SQL_MODE solved this problem. We will look at one example of how SQL_MODE can be enabled to provided improved data integrity.

You want to store the individual RGB (red/green/blue) decimal values of colors in a table. Each of these has a range from 0 to 255. You read that you can store 255 values in a TINYINT Integer data type, so you create a table like:

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);

You insert some data like:

INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('black',0,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('red',255,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('green',0,255,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('yellow',255,255,0);

Great, but when you look at you data you get?

SELECT     name, red, green, blue
FROM       color_to_decimal
ORDER BY name;
+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  127 |
| green  |   0 |   127 |    0 |
| red    | 127 |     0 |    0 |
| white  | 127 |   127 |  127 |
| yellow | 127 |   127 |    0 |
+--------+-----+-------+------+
6 rows in set (0.01 sec)

What happened, you delete the data and re-insert only to find no changes. You have been the victim of a silent conversion, via a means of truncation.

The TINYINT data type is 1 byte (8 bits). 8 bits can store the values from 0 to 255. When you use this integer data type, only 7 bits are actually available, which gives the range of 0 to 127. Why? Because MySQL reserved one bit for the sign, either positive or negative, even though you didn’t want a sign.

So knowing this, you go back and recreate your table with the following definition.

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT UNSIGNED NOT NULL,
green TINYINT UNSIGNED NOT NULL,
blue   TINYINT UNSIGNED NOT NULL);

You load your data and look at it again, and you see.

+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  255 |
| green  |   0 |   255 |    0 |
| red    | 255 |     0 |    0 |
| white  | 255 |   255 |  255 |
| yellow | 255 |   255 |    0 |
+--------+-----+-------+------+
6 rows in set (0.00 sec)

But, should you have been told about this, should there have been an error. Well, in MySQL this is actually a warning, and most applications never support and cater for warnings. It is only when you use the MySQL client program, as in these examples, you are given an indication, with the following line after each insert. If you look closely.

mysql> INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
Query OK, 1 row affected, 1 warning (0.00 sec)


However there is a savior for this situation, and that is SQL_MODE.

When set to the setting TRADITIONAL, an error and not a warning is generated, and most applications support catching errors. Look at what happens in our example using the original table.

SET SQL_MODE=TRADITIONAL;
DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
ERROR 1264 (22003): Out of range value for column 'red' at row 1


As an added benefit you get this data integrity for free. We didn’t test it, because we know the data coming in is in the range of 0-255, but what if the user entered 500 for example. Let’s see.

TRUNCATE TABLE color_to_decimal;
SET SQL_MODE='';
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
SELECT name, red, green, blue FROM color_to_decimal;
SET SQL_MODE=TRADITIONAL;
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);

Looking closely at the response in the client.

mysql> TRUNCATE TABLE color_to_decimal;
Query OK, 0 rows affected (0.02 sec)

mysql> SET SQL_MODE='';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT name, red, green, blue FROM color_to_decimal;
+-------------+-----+-------+------+
| name        | red | green | blue |
+-------------+-----+-------+------+
| a bad color | 127 |     0 |    0 |
+-------------+-----+-------+------+
1 row in set (0.00 sec)

mysql> SET SQL_MODE=TRADITIONAL;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
ERROR 1264 (22003): Out of range value for column 'red' at row 1

As discussed in my presentation Top 20 design tips for Data Architects the UNSIGNED column construct should be always defined unless there is a specific reason not to.

In some respects I would argue that the default for an Integer column should be actually UNSIGNED, and that SIGNED should be specified when you want a sign. Most integer columns generally in schema’s only contain positive numbers. There are of course plenty of examples, positional geo data, financial data, medical data for example.

One could also argue that MySQL should make the default SQL_MODE at least TRADITIONAL, and that only when you want backward compatibility should you then change the SQL_MODE.

This is Part I on SQL_MODE, there are few more interesting cases to discuss at a later time.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

References

Comments

  1. says

    Hi,

    “One could also argue that MySQL should make the default SQL_MODE at least TRADITIONAL, and that only when you want backward compatibility should you then change the SQL_MODE.”

    I’d like that too, except that many applications (esp. the typical PHP webapps) will instantly crap out on you (Try it). In fact, many applications do not withstand any non-default options, like the default engine, default auto_commit etc. (so it’s not just confined to SQL_MODE)

    As for sql_mode, both ANSI and TRADITIONAL are most likely to sink the typical PHP app. Many PHP apps simply use double quotes to delimit strings in their SQL statements, and many schemas simply rely on ‘0000-00-00′ dates.

    I have been told by some appdevs that I am shooting myself in the foot by changing things like the sql_mode at the server level, and that’s only fair that I should suffer from problems with their apps.

    I disagree.

    Personally I would urge every application developer to always have their code explicitly set the sql_mode (and any other settings, such as autocommit mode) at the session level immediately after connecting to MySQL to what suits their application best. For most applications that would boil down to a single statement:

    SET sql_mode := ”, autocommit = true, storage_engine = MyISAM;

    If you look at the installation documentation for most PHP apps, you will almost never see any reference to these kinds of MySQL config requirements. They simply assume a default installation, always.

    I don’t see this getting better anytime soon. If MySQL would change the default configuration, it would just lead to a lot of broken apps and scrolls and scrolls of slashdot rants.

  2. Mark R says

    I think that setting the server to TRADITIONAL for a server used in a service where well-tested code is used, which has been tested under the same circumstances, is a very good thing to do.

    Change is bad. Change will introduce bugs. Changing SQL_MODE is a major change to the way that MySQL works which will cause new bugs in existing code- therefore doing so in an existing application is significantly risky.

    One case where changing SQL_MODE might be acceptable is if you’re doing such a major change anyway that every single query must be tested- for example upgrading the MySQL version.

    Typical open source PHP applications are generally so buggy and have such poor error handling that even diagnosing or detecting bugs caused by SQL_MODE will be very time consuming. Convincing their maintainers that these are bugs proves difficult (in my experience). Don’t use such applications in production.

    For a new application- great – provided you are completely sure that the developers’ test environments will also use that setting (which should be easy, as your developers must always use a system which is as close in configuration to production as possible).

    Mark

  3. says

    “In version 5.0, the introduction of SQL_MODE solved this problem” I would say partially solved. It was a good step towards sanity, but there are still ways to get garbage in :)

  4. Paolo says

    sql_mode traditional doesn’t affect conversions to varchar fields.

    set session sql_mode=TRADITIONAL;
    create table nogood (string varchar(8));
    insert into uff (string) values (10);

    it inserts “10” into nogood. PostgreSQL would have given an error.
    I bet varchar are quite common as db fields and I googled up to this page after a bug triggered by some queries comparing a varchar field to 0 instead of to ‘0’. A different db (I’m thinking about psql) would have let us pick this bug as soon as we wrote it.