Why SQL_MODE is important? Part I

MySQL pre version 5.0 was very lax in it’s management of valid data. It was easy for data integrity to be abused if you knew how. The most common examples were truncations and silent conversions that if not understood could provide a serious data integrity issue.

In version 5.0, the introduction of SQL_MODE solved this problem. We will look at one example of how SQL_MODE can be enabled to provided improved data integrity.

You want to store the individual RGB (red/green/blue) decimal values of colors in a table. Each of these has a range from 0 to 255. You read that you can store 255 values in a TINYINT Integer data type, so you create a table like:

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);

You insert some data like:

INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('black',0,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('red',255,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('green',0,255,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('yellow',255,255,0);

Great, but when you look at you data you get?

SELECT     name, red, green, blue
FROM       color_to_decimal
ORDER BY name;
+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  127 |
| green  |   0 |   127 |    0 |
| red    | 127 |     0 |    0 |
| white  | 127 |   127 |  127 |
| yellow | 127 |   127 |    0 |
+--------+-----+-------+------+
6 rows in set (0.01 sec)

What happened, you delete the data and re-insert only to find no changes. You have been the victim of a silent conversion, via a means of truncation.

The TINYINT data type is 1 byte (8 bits). 8 bits can store the values from 0 to 255. When you use this integer data type, only 7 bits are actually available, which gives the range of 0 to 127. Why? Because MySQL reserved one bit for the sign, either positive or negative, even though you didn’t want a sign.

So knowing this, you go back and recreate your table with the following definition.

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT UNSIGNED NOT NULL,
green TINYINT UNSIGNED NOT NULL,
blue   TINYINT UNSIGNED NOT NULL);

You load your data and look at it again, and you see.

+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  255 |
| green  |   0 |   255 |    0 |
| red    | 255 |     0 |    0 |
| white  | 255 |   255 |  255 |
| yellow | 255 |   255 |    0 |
+--------+-----+-------+------+
6 rows in set (0.00 sec)

But, should you have been told about this, should there have been an error. Well, in MySQL this is actually a warning, and most applications never support and cater for warnings. It is only when you use the MySQL client program, as in these examples, you are given an indication, with the following line after each insert. If you look closely.

mysql> INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
Query OK, 1 row affected, 1 warning (0.00 sec)


However there is a savior for this situation, and that is SQL_MODE.

When set to the setting TRADITIONAL, an error and not a warning is generated, and most applications support catching errors. Look at what happens in our example using the original table.

SET SQL_MODE=TRADITIONAL;
DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
ERROR 1264 (22003): Out of range value for column 'red' at row 1


As an added benefit you get this data integrity for free. We didn’t test it, because we know the data coming in is in the range of 0-255, but what if the user entered 500 for example. Let’s see.

TRUNCATE TABLE color_to_decimal;
SET SQL_MODE='';
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
SELECT name, red, green, blue FROM color_to_decimal;
SET SQL_MODE=TRADITIONAL;
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);

Looking closely at the response in the client.

mysql> TRUNCATE TABLE color_to_decimal;
Query OK, 0 rows affected (0.02 sec)

mysql> SET SQL_MODE='';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT name, red, green, blue FROM color_to_decimal;
+-------------+-----+-------+------+
| name        | red | green | blue |
+-------------+-----+-------+------+
| a bad color | 127 |     0 |    0 |
+-------------+-----+-------+------+
1 row in set (0.00 sec)

mysql> SET SQL_MODE=TRADITIONAL;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
ERROR 1264 (22003): Out of range value for column 'red' at row 1

As discussed in my presentation Top 20 design tips for Data Architects the UNSIGNED column construct should be always defined unless there is a specific reason not to.

In some respects I would argue that the default for an Integer column should be actually UNSIGNED, and that SIGNED should be specified when you want a sign. Most integer columns generally in schema’s only contain positive numbers. There are of course plenty of examples, positional geo data, financial data, medical data for example.

One could also argue that MySQL should make the default SQL_MODE at least TRADITIONAL, and that only when you want backward compatibility should you then change the SQL_MODE.

This is Part I on SQL_MODE, there are few more interesting cases to discuss at a later time.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald .

References

Tagged with: Databases General MySQL

Related Posts

Creating a More Realistic Benchmark

Common benchmark approaches fall into two general categories, synthetic testing and realistic testing. You have the most generic operations from a synthetic test, starting with the most simple example using a single table, a single column, and for a single DML operation.

Read more

Testing, Benchmarking, Evaluating

Testing and benchmarking are widely used terms in software technology, each serving a distinct purpose and goal. With the increasing adoption of AI in software development, the term evaluating has become significant and with this the re-emergence of what is quality assurance.

Read more

Your Attack Vector Extends Beyond Production Systems

A common data security issue is the unprotected copying of production data to non-production environments without any redaction, masking, or filtering. This practice poses a serious risk. A malicious actor will target the weakest link in your infrastructure, including non-production accounts and the developer systems accessing them.

Read more