Managing SQL Drift: Ensuring Stability in Database Transitions

SQL drift is a significant challenge that occurs when SQL statements from an existing system produce unexpected results after migration to a new environment or system. These issues manifest in several critical ways: SQL statements may generate new execution errors, experience significant performance degradation, or yield differences in data integrity. Such challenges extend beyond simple compatibility issues, stemming from variations in database engines, optimization strategies, and SQL implementations. SQL drift represents a fundamental shift in how SQL behaves across different platforms and versions. Whether during on-premises to cloud migrations, transitions to managed services, database vendor switches, or even routine version upgrades, SQL drift presents a critical consideration for data-driven applications.

SQL drift frequently occurs during:

  • On-premises to cloud migrations
  • Cloud to managed service transitions
  • Cross-product migrations (e.g., switching database vendors)
  • Database version upgrades
  • Platform modernization efforts

The implications of SQL drift can be significant, leading to application instability, increased operational costs, and delayed migration timelines. The impact often extends to compromised data quality and results in a degraded user experience as systems become less reliable and responsive. Successfully managing SQL drift involves four key stages:

  1. Identification
  2. Prioritization
  3. Correction
  4. Validation

Identification is the critical first step in managing SQL drift, focusing on systematically discovering potential issues. This phase involves detecting SQL statements that may behave differently in the new environment, analyzing syntax compatibility across platforms, establishing performance baselines, and validating data outputs to ensure consistency.

Prioritization involves evaluating SQL drift issues based on business impact, risk assessment, resource allocation, and migration scheduling to determine the optimal order for resolution.

Correction addresses SQL drift through code remediation, performance optimization, syntax updates, and developing alternative solutions when necessary.

Validation confirms SQL drift corrections through comprehensive testing, performance verification against established baselines, and data integrity checks to ensure the corrected SQL maintains its intended functionality.

An effective way to demonstrate the impact of SQL drift is by using a sample collection of SQL statements executed across different versions of MySQL. The End of Life (EOL) for MySQL 5.7, coupled with AWS RDS and AWS RDS Aurora beginning extended support in 2024, has increased costs for organizations that are not proactive in managing database migrations. This situation is particularly common in development-focused teams that lack dedicated architecture and operations resources.

A MySQL demonstration of SQL Drift

Using a subset of SQL statements executed in MySQL 5.7 and the subsequent MySQL versions 8.0, 8.4, 9.0, and 9.1, Next BaseLine can examine the impact of SQL drift. This output shows the changing state of errors, deprecations, warnings, and notices for the 42 example SQL statements.

Example Output from Next BaseLine

In MySQL 5.7, the use of the keyword SQL_NO_CACHE in an SQL statement produces a deprecation warning.

17 Deprecations
ID: 5, Hash: f31f2e99b2
  SQL: "SELECT SQL_NO_CACHE 1;"
  Deprecation: (1681) 'SQL_NO_CACHE' is deprecated and will be removed in a future release.

In MySQL 8.0, the MySQL Query Cache is removed; however, the use of SQL_NO_CACHE in SQL statements is still valid. Even in the next GA version, 8.4, this SQL keyword is still on the deprecated list, and it continues to be deprecated in the current 9.1 innovation release.

A different example is the pair of deprecated functions ENCRYPT and DES_ENCRYPT.

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Deprecation: (1287) 'ENCRYPT' is deprecated and will be removed in a future release. Please use AES_ENCRYPT instead
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Deprecation: (1287) 'DES_ENCRYPT' is deprecated and will be removed in a future release. Please use AES_ENCRYPT instead

In MySQL 8.0, these SQL statements produce a hard error. Rather than reporting “FUNCTION does not exist”, the server treats the removed built-in functions as routines in the schema in use (airport) and reports a permission error. (More on this later.)

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Error 1370 (42000): execute command denied to user 'nextbaseline'@'%' for routine 'airport.ENCRYPT'
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Error 1370 (42000): execute command denied to user 'nextbaseline'@'%' for routine 'airport.DES_ENCRYPT'

Some example GIS SQL statements are deprecated in MySQL 5.7; however, each reports a different warning code.

ID: 19, Hash: f319748e0c
  SQL: "SELECT CONTAINS(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POINT(5 5)'));"
  Deprecation: (1287) 'CONTAINS' is deprecated and will be removed in a future release. Please use MBRCONTAINS instead
ID: 20, Hash: d686267b19
  SQL: "SELECT ST_GeomFromWKB(Point(0, 0));"
  Deprecation: (3195) st_geometryfromwkb(geometry) is deprecated and will be replaced by st_srid(geometry, 0) in a future version. Use st_geometryfromwkb(st_aswkb(geometry), 0) instead.

In MySQL 8.0+, these two deprecated statements produce different error messages.

ID: 19, Hash: f319748e0c
  SQL: "SELECT CONTAINS(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POINT(5 5)'));"
  Error 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POI' at line 1
ID: 20, Hash: d686267b19
  SQL: "SELECT ST_GeomFromWKB(Point(0, 0));"
  Error 3037 (22023): Invalid GIS data provided to function st_geomfromwkb.
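
The comparison behind a report like this can be sketched in a few lines of Python. This is a minimal illustration only, not how Next BaseLine is implemented: the Outcome record, the status labels, the severity ordering, and the classify_drift function are all hypothetical names chosen for this sketch. The sample data is copied from the MySQL 5.7 and 8.0 output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Result of one statement on one server version (hypothetical record)."""
    status: str                 # "ok", "warning", "deprecation", or "error"
    code: Optional[int] = None  # MySQL error/warning number, if any


# Higher number means more severe. This ordering is an assumption of the sketch.
SEVERITY = {"ok": 0, "warning": 1, "deprecation": 2, "error": 3}


def classify_drift(before: Outcome, after: Outcome) -> str:
    """Label how a statement's outcome changed between two versions."""
    if before == after:
        return "unchanged"
    if SEVERITY[after.status] > SEVERITY[before.status]:
        return "regressed"
    if SEVERITY[after.status] < SEVERITY[before.status]:
        return "improved"
    return "changed"  # same status, but a different code


# Statement ID -> outcome, taken from the example output above.
mysql57 = {
    17: Outcome("deprecation", 1287),  # ENCRYPT
    18: Outcome("deprecation", 1287),  # DES_ENCRYPT
    19: Outcome("deprecation", 1287),  # CONTAINS
    20: Outcome("deprecation", 3195),  # ST_GeomFromWKB(Point(0, 0))
}
mysql80 = {
    17: Outcome("error", 1370),
    18: Outcome("error", 1370),
    19: Outcome("error", 1064),
    20: Outcome("error", 3037),
}

for stmt_id in sorted(mysql57):
    print(stmt_id, classify_drift(mysql57[stmt_id], mysql80[stmt_id]))
```

Statements labelled as regressed are the natural starting point for the prioritization stage described earlier.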

Migrating a WordPress site

A more realistic example would involve taking a production workload, such as WordPress running on a self-hosted MySQL 5.7 server, and assessing the potential impact of switching to MySQL 8.0 without upgrading the application (not a recommended approach). We have collected representative production SQL statements for this WordPress setup, referred to as a BaseLine.

After collecting SQL traffic and testing this workload against a MySQL 5.7 environment, previously unnoticed SQL warnings were highlighted for the team.

When executed against an upgraded MySQL 8.0 instance, problematic SQL statements were immediately identified. For a larger, more complex product, this process would help prioritize where resources are most needed.

A modern cloud database implementation

Finally, let’s consider TiDB from PingCAP as an example of validating your application with a cloud implementation. Using the same small set of 42 SQL statements, TiDB has taken a proactive approach by entirely eliminating warnings in their MySQL protocol. In TiDB, SQL statements are now either valid SQL syntax or produce a hard error.

What was a deprecation for ENCRYPT is now a hard error. TiDB also provides a more accurate error message: ‘FUNCTION does not exist’.

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Error 1305 (42000): FUNCTION ENCRYPT does not exist
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Error 1305 (42000): FUNCTION DES_ENCRYPT does not exist

In MySQL 5.7, ENCODE was deprecated and in MySQL 8.0+ it was removed. In TiDB, it is a valid function.

TiDB also produces some interesting artifacts in error messages for SQL statements that are not seen with MySQL. An example is Error 1235 ... has only noop implementation in tidb now .... The message, however, shows that a setting (tidb_enable_noop_functions) can change the status of these SQL statements.

ID: 14, Hash: dbcb4b05a2
  SQL: "SELECT table_name, count(*) FROM information_schema.tables GROUP BY table_name ASC;"
  Error 1235 (42000): function GROUP BY expr ASC|DESC has only noop implementation in tidb now, use tidb_enable_noop_functions to enable these functions
...
ID: 26, Hash: 7369c77d51
  SQL: "SELECT SQL_CALC_FOUND_ROWS * FROM information_schema.schemata;"
  Error 1235 (42000): function SQL_CALC_FOUND_ROWS has only noop implementation in tidb now, use tidb_enable_noop_functions to enable these functions

During development, an unintended bug in early testing produced an interesting error with TiDB.

ID: 31, Hash: 9cae50cbfc
  SQL: "SELECT DATE('2024-01-01   10:00:00'); /* Example of bad data causing warning */SELECT 'abc' AS full;"
  Error 8130 (HY000): client has multi-statement capability disabled. Run SET GLOBAL tidb_multi_statement_mode='ON' after you understand the security risk

Conclusion

Next BaseLine is now available in limited beta. Eliminate the uncertainty around “Will the migration work?” by performing an independent risk assessment of your product in a migrated database environment before committing to ad-hoc engineering efforts. If you’re interested in seeing a demo with your own SQL workload, you can register here.

Next BaseLine currently supports MySQL, PostgreSQL, Oracle, and SQL Server RDBMS products, covering both self-hosted and cloud-managed implementations across AWS, GCP, Azure, and Alibaba. It supports multiple MySQL- and PostgreSQL-compatible databases, including TiDB, SingleStore, Neon Serverless, Nile, ElephantSQL, TimeScale, and more. Additional compatibility is available for Snowflake, ClickHouse, and DuckDB.

Database testing for all version changes (including minor versions)

We know that SQL statement compatibility can change with major database version upgrades and that you should adequately test for them. But what about minor version upgrades?

It is dangerous to assume that your existing SQL statements work with a minor update, especially when using an augmented version of an open-source database, such as one from a cloud provider that may not be as transparent about all changes.

Over the decades I have always found reading the release notes to be an important architectural practice, yet many organizations skip this step and get caught off guard, particularly when there are no dedicated DBAs or architects in the engineering workforce.

Real-world examples of minor version upgrade issues

Here are two real-world situations common in the AWS RDS ecosystem using MySQL.

  1. You are an organization that uses RDS Aurora MySQL for its production systems, and you upgrade one minor version at a time. A diligent approach is to be one minor version behind unless a known bug is fixed in a newer version you depend on.
  2. You are an organization that, to save costs with a comprehensive engineering team, uses AWS RDS MySQL (not Aurora) for developers and some testing environments.

I’ve simplified a real-world example to a simple SQL statement and combined these two separate use cases into one simulated situation for demonstration purposes.

mysql> SELECT content_type FROM reserved2;
Empty set (0.00 sec)

mysql> SELECT VERSION(), @@aurora_version;
+-----------+------------------+
| VERSION() | @@aurora_version |
+-----------+------------------+
| 8.0.28    | 3.04.2           |
+-----------+------------------+

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.34    |
+-----------+
1 row in set (0.00 sec)

This is a simple enough query. It runs in AWS RDS Aurora MySQL 3.04.2 (the present Aurora MySQL long-term support (LTS) release). This version is based on MySQL 8.0.28 which, for what it is worth, is no longer a supported AWS RDS MySQL version; the minimum is now 8.0.32 (see Supported MySQL minor versions on Amazon RDS).

It also runs in AWS RDS MySQL 8.0.34, which is, for example, the version used in your developer setup.

An AWS RDS MySQL Aurora minor version upgrade

You decide to upgrade from Aurora 3.04.x/3.05.x to 3.06.x. This Aurora version is actually based on MySQL 8.0.34 (the version you just tested in RDS). Without adequate due diligence, you roll out to production, only to find after the fact that this SQL statement (remember, this is one simplified example for demonstration purposes) now breaks for no apparent reason.

mysql> select content_type from reserved2;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'content_type from reserved2' at line 1

mysql> SELECT VERSION(), @@aurora_version;
+-----------+------------------+
| VERSION() | @@aurora_version |
+-----------+------------------+
| 8.0.34    | 3.06.0           |
+-----------+------------------+

Now you need to investigate the problem, which can take hours or even days of resource time, and a lot of head shaking, before you realize it has nothing to do with your application code and everything to do with the minor version upgrade, which you simply cannot roll back. See Risks from auto upgrades with managed database services for some interesting facts.

Wait, what just happened?

If you performed this upgrade to the latest AWS RDS Aurora MySQL 3.06.0 version sometime after the release on 3/7/24 and before 6/4/24, a 3-month period, you are left with one choice. You have to make application code changes to address the breakage.

How many man-hours or man-days does this take? If you upgraded to this version in the past two weeks, technically you have a second choice. You can go to the most current version, 3.07.0, but you have already spent time testing and deploying 3.06.0; you now need to re-test, then roll out to non-production accounts, and then roll out to production. How many man-days of work is this?

It may be hard to justify the cost of automated testing until you uncover a situation like this one; however, it can easily be avoided in the future.

So why did this happen?

Let's look deeper at the fine print

RDS Aurora MySQL 3.06.0

Aurora MySQL version 3.06.0 supports Amazon Bedrock integration and introduces the new reserved keywords accept, aws_bedrock_invoke_model, aws_sagemaker_invoke_endpoint, content_type, and timeout_ms. Check the object definitions for the usage of the new reserved keywords before upgrading to version 3.06.0. To mitigate the conflict with the new reserved keywords, quote the reserved keywords used in the object definitions. For more information on the Amazon Bedrock integration and handling the reserved keywords, see What is Amazon Bedrock? in the Amazon Aurora User Guide. For additional information, see Keywords and Reserved Words, The INFORMATION_SCHEMA KEYWORDS Table, and Schema Object Names in the MySQL documentation.

From AWS RDS Aurora MySQL 3.06.0 release notes (3/7/24).

While less likely you would name a column aws_bedrock_invoke_model, column names of content_type and timeout_ms are common words.
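
A pre-upgrade check for this kind of conflict can be sketched as follows. This is a hedged illustration, not an AWS or Next BaseLine tool: find_conflicts and quote_identifier are hypothetical helper names, the keyword list is copied from the release notes quoted above, and the column names mirror the test case at the end of this post.

```python
# Reserved keywords introduced in Aurora MySQL 3.06.0, per the release notes above.
AURORA_3_06_RESERVED = frozenset({
    "accept",
    "aws_bedrock_invoke_model",
    "aws_sagemaker_invoke_endpoint",
    "content_type",
    "timeout_ms",
})


def find_conflicts(identifiers, reserved=AURORA_3_06_RESERVED):
    """Return the identifiers that collide with a reserved keyword.

    MySQL keywords are case-insensitive, so names are compared in lowercase.
    """
    return sorted(name for name in identifiers if name.lower() in reserved)


def quote_identifier(name):
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


# Column names used by the reserved1, reserved2, and reserved3 test tables.
columns = ["id", "accept", "content_type", "timeout_ms"]
for column in find_conflicts(columns):
    print(f"{column} -> {quote_identifier(column)}")
```

In practice the identifier list could be read from INFORMATION_SCHEMA.COLUMNS rather than typed by hand. Quoting fixes the stored object definitions, but the SQL statements embedded in application code need the same treatment.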

RDS Aurora MySQL 3.07.0

Aurora MySQL version 3.06.0 added support for Amazon Bedrock integration. As part of this, new reserved keywords (accept, aws_bedrock_invoke_model, aws_sagemaker_invoke_endpoint, content_type, and timeout_ms) were added. In Aurora MySQL version 3.07.0, these keywords have been changed to nonreserved keywords, which are permitted as identifiers without quoting. For more information on how MySQL handles reserved and nonreserved keywords, see Keywords and reserved words in the MySQL documentation.

From AWS RDS Aurora MySQL 3.07.0 release notes (6/4/24). Clearly someone at AWS saw the breaking change and reverted it. While many customers may never encounter this situation, it is one specific use case.

Conclusion

The moral of the database story here is Be Prepared.

You should always be prepared for future breaking compatibility. You should test with a regular software upgrade cadence and leverage automation as much as possible.

Next BaseLine is a software product that automates testing for many use cases, including this simple SQL compatibility issue. Adding it to your CI/CD pipeline can help identify any risk in all SQL database access, including new engineering software releases or infrastructure updates. This product can be implemented in a few hours, and costs significantly less than the large amount of time lost with this one realistic situation.

Next BaseLine - Helping to create a better and faster next version of your data-driven product

Footnote

This example was not uncovered from a customer situation. It was uncovered and used as a demonstration because I read the release notes.

Test Case


SELECT VERSION();
SELECT VERSION(), @@aurora_version; /* No way to comment out the !Aurora example */
CREATE SCHEMA IF NOT EXISTS test;
USE test;
CREATE TABLE reserved1(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, accept CHAR(1) NOT NULL DEFAULT 'N');
CREATE TABLE reserved2(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, content_type VARCHAR(10) NULL DEFAULT 'text/plain');
CREATE TABLE reserved3(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, timeout_ms INT UNSIGNED NOT NULL);
SELECT accept FROM reserved1;
SELECT content_type FROM reserved2;
SELECT timeout_ms FROM reserved3;

The curse of MySQL warnings

MySQL warnings are an anti-pattern when it comes to maintaining data integrity. When the information retrieved from a database does not match what was entered, and this is not identified immediately, the data can be permanently lost.

For several decades, until the most recent versions, MySQL by default allowed you to insert incorrect data, to insert data that was then truncated, and to follow other patterns that resulted in failed data integrity. Very few applications considered handling warnings as errors, and there is a generation of software products that have never informed the developers that warnings were occurring.

The simplest example is:

CREATE SCHEMA IF NOT EXISTS warnings;
USE warnings;

CREATE TABLE short_name(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY(id)
);

INSERT INTO short_name (name) VALUES ('This name is too long and will get truncated');
ERROR 1406 (22001): Data too long for column 'name' at row 1

This is what you expect would happen. In many, many applications IT DOES NOT.

For almost 20 years the default setting was to support possible data corruption

If you used an older version without setting a stricter SQL_MODE than the default, you end up with:

INSERT INTO short_name (name) VALUES ('This name is too long and will get truncated');
Query OK, 1 row affected, 1 warning (0.00 sec)

SELECT * FROM short_name;
+----+----------------------+
| id | name                 |
+----+----------------------+
|  1 | This name is too lon |
+----+----------------------+
1 row in set (0.00 sec)

Only if you run SHOW WARNINGS immediately after the actual SQL statement would you know. There is no other way to find this information in any logs.

mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------+
| Level   | Code | Message                                   |
+---------+------+-------------------------------------------+
| Warning | 1265 | Data truncated for column 'name' at row 1 |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)

Numerous other examples can shock a customer when, after some time, expected data in production is lost and unretrievable.

If you came from a stricter RDBMS background, or you tuned your MySQL installation after uncovering this and many other poor defaults, you would have improved your data integrity with an improved SQL_MODE.
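
One way to audit this across environments is to check the value returned by SELECT @@sql_mode for a strict setting. The sketch below is only an illustration under stated assumptions: is_strict is a hypothetical helper name, and the sample values are examples rather than the configuration of any particular server.

```python
# Modes that turn truncation and invalid-data warnings into errors.
# TRADITIONAL is a combination mode that includes both strict modes.
STRICT_MODES = {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES", "TRADITIONAL"}


def is_strict(sql_mode: str) -> bool:
    """Return True if a comma-separated sql_mode value enables strict mode."""
    modes = {mode.strip().upper() for mode in sql_mode.split(",") if mode.strip()}
    return bool(modes & STRICT_MODES)


# Example values as they might be returned by SELECT @@sql_mode.
samples = [
    "",
    "NO_ENGINE_SUBSTITUTION",
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION",
]
for value in samples:
    print(repr(value), "strict" if is_strict(value) else "NOT strict")
```

A non-strict result means the truncating INSERT shown above would succeed with a warning instead of failing with ERROR 1406.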

So MySQL warnings are bad? No, they are ideal when used appropriately. However, the next critical dilemma occurs.

Warnings are valuable when used to identify important characteristics of an SQL statement that a developer or database administrator should be aware of. However, the only way to retrieve these warnings is from the application making the connection to the database at each statement, and generally, these warnings are just lost.
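
Collecting warnings from the application side can be sketched as below. This is a minimal sketch, not a definitive implementation: execute_with_warnings is a hypothetical helper name, and it assumes a Python DB-API 2.0 cursor that returns rows as tuples and is connected to MySQL.

```python
def execute_with_warnings(cursor, sql):
    """Run one statement and return (rows, warnings).

    SHOW WARNINGS must run on the same connection, directly after the
    statement, because a later statement replaces the warning list.
    """
    cursor.execute(sql)
    # Statements such as INSERT return no result set (description is None).
    rows = cursor.fetchall() if cursor.description else []

    cursor.execute("SHOW WARNINGS")
    # Each SHOW WARNINGS row is (Level, Code, Message).
    warnings = [
        (level, int(code), message)
        for level, code, message in cursor.fetchall()
    ]
    return rows, warnings
```

With a driver such as PyMySQL or mysql-connector-python, the cursor would come from connection.cursor(). The returned warnings could then be written to a log or a table so that they are no longer lost. Note that this adds one extra round trip per statement, which may matter for high-volume workloads.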

Here are some examples of warnings that matter to the engineering team, such as deprecation notices, which are important for production database upgrades.

SELECT JSON_MERGE('["a"]','["b"]');
Warning (Code 1287): 'JSON_MERGE' is deprecated and will be removed in a future release. Please use JSON_MERGE_PRESERVE/JSON_MERGE_PATCH instead

SELECT ST_GeomFromWKB(Point(0, 0));
Warning (Code 3195): st_geometryfromwkb(geometry) is deprecated and will be replaced by st_srid(geometry, 0) in a future version. Use st_geometryfromwkb(st_aswkb(geometry), 0) instead.

SELECT DATE('2024-01-01 10:00:00');
Warning (Code 4096): Delimiter ' ' in position 11 in datetime value '2024-01-01 10:00:00' at row 1 is superfluous and is deprecated. Please remove.

SELECT BINARY 'a' = 'A';
Warning (Code 1287): 'BINARY expr' is deprecated and will be removed in a future release. Please use CAST instead

You definitely want to know about these: collect them (which is hard), add them to your backlog, and don’t leave them until it is too late, when a critical last-minute database upgrade forces you to address them.

There are also warnings that should be collected and used for performance verification, which apply to running systems. I wanted to show one specific example uncovered during testing of a MySQL upgrade to version 8.0.

Warning (Code 3170): Memory capacity of 8388608 bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.

In fact, this warning occurs in MySQL 5.7, but the customer never knew because they did not look at the warnings. How many other SQL statements in your application produce warnings now? How can you find this out?

It was rather easy to create a reproducible test case, but what now?

  • Do you set range_optimizer_max_mem_size=0?
  • Do you set it to the value you need? You can identify this with SELECT * FROM performance_schema.memory_summary_by_thread_by_event_name WHERE thread_id=PS_CURRENT_THREAD_ID() AND event_name='memory/sql/test_quick_select'\G
  • Do you need to modify your optimizer_switch settings?
  • Do you try something else?
  • Do you refactor your application?
  • Do you just leave it as is?

When you want to consider several different options, which one works best for this query? What about the impact on your entire production workload? Knowing statistically which is the best choice for your full workload and under various conditions is the optimal output, but how?

Next BaseLine was built to perform experiments comparing changes to your data, configuration, and infrastructure, to validate that the next version of your product statistically performs better than your current version across all of your application at different workloads.

Next BaseLine also provides numerous benefits for a major database upgrade, so I’ve focussed on getting these capabilities to customers quicker to save money. It provides the benefit of detecting SQL statements that produce errors in the next MySQL version, enabling you to categorize and prioritize areas of your application that must be corrected. It also captures important information about the performance and quality of the data from your MySQL queries; this also can help in identifying the most critical aspects of your application to invest engineering time and mitigate risk in your database upgrade plan. It can also collect warning messages such as these discussed when considering migrating from MySQL 5.7 to MySQL 8, or it can just find them with your current application.

What is your pain point with MySQL database upgrades? What are you doing right now to help reduce this additional budget spend? Join our private beta program now to find out more.
