Recently on a client site I had to fight the pain of having no way to confirm loss of data integrity when optimizing data types. Due to MySQL’s ability to perform silent conversion of data, when converting a number of columns we enabled sql_mode to catch any truncations as errors.
sql_mode=STRICT_ALL_TABLES
This ensured that should any data truncations occur, an error is thrown not a warning. The following shows an example case study for converting an INT to TINYINT UNSIGNED and shows that without sql_mode silent conversions occur.
mysql> drop schema if exists tmp; Query OK, 25 rows affected (0.40 sec) mysql> create schema tmp; Query OK, 1 row affected (0.01 sec) mysql> use tmp Database changed mysql> create table t1(i1 INT NULL); Query OK, 0 rows affected (0.15 sec) mysql> insert into t1 values(1),(2),(3),(256),(65536),(NULL); Query OK, 6 rows affected (0.06 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1; +-------+ | i1 | +-------+ | 1 | | 2 | | 3 | | 256 | | 65536 | | NULL | +-------+ 6 rows in set (0.03 sec) mysql> set session sql_mode=STRICT_ALL_TABLES; Query OK, 0 rows affected (0.05 sec) mysql> alter table t1 modify i1 TINYINT UNSIGNED NULL; ERROR 1264 (22003): Out of range value for column 'i1' at row 4 mysql> set session sql_mode=''; Query OK, 0 rows affected (0.03 sec) mysql> alter table t1 modify i1 TINYINT UNSIGNED NULL; Query OK, 6 rows affected, 2 warnings (0.07 sec) Records: 6 Duplicates: 0 Warnings: 2 mysql> show warnings; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1264 | Out of range value for column 'i1' at row 4 | | Warning | 1264 | Out of range value for column 'i1' at row 5 | +---------+------+---------------------------------------------+ 2 rows in set (0.00 sec) mysql> select * from t1; +------+ | i1 | +------+ | 1 | | 2 | | 3 | | 255 | | 255 | | NULL | +------+ 6 rows in set (0.00 sec)
The process however for the client included converting a number of DATETIME columns to DATE columns. The input data specification called for the input values to be dates only so storing and using DATETIME was simply a waste of space.
Unfortunately the use of sql_mode didn’t help here, so the combination of changing multiple columns in one time simply meant it was impossible to determine if any truncations occurred.
This combined with the sheer data volume, tables between 10GB – 500GB, with 10 – 100 million rows.
When you see 14+ millions rows modified, 35+million rows and of course SHOW WARNINGS by default only shows a very small number of actual warnings
It is of course impossible to run individual ALTER statements on columns. Of the top 10 tables, excluding the largest, alter times were between 3 and 15 hrs.
The only solution is to run pre SQL checks on the data to search for the boundary conditions manually. This is not the ideal solution for a customer.