Mastering MySQL 5.7 EOL migrations

In a recent podcast on Mastering EOL Migrations: Lessons learned from MySQL 5.7 to 8.0 I discuss with my colleague Adam North not only the technical issues that accompany a major migration but also the key business and management requirements of having a well-articulated strategy that covers:

  • Planning
  • Testing
  • Preparedness
  • Proactiveness

Having a plan is key to any significant task including data migrations. You should heed the warnings and the deprecations and consider all potential downstream product impacts such as connector upgrades. The plan includes a timeline but also needs to define all the stakeholders both technical and business, the definition of a successful migration, and most importantly the decision tree for a non-successful migration that would include any outage, failback, rollback, or fix-forward requirements.

Test, test, test. Leveraging the simple design pattern of read-write splitting (hint: if your application does not support this, it should), you can test all of your application reads with minimal risk and with real load, from 1% to 100%. You can also validate all writes, although this does not match production concurrency; however, you can emulate load testing and, using this two-way door strategy, verify and prevent many common problems before the decision point of failover.

Being prepared means assuming your migration will fail rather than assuming it will succeed. It means rehearsing all steps so they are documented and reproducible, validating that your backup and recovery strategy is still optimal and operational with the new version, and preparing supporting staff for availability before, during, and after the migration. There are probably not many technologists that can say, “Well, that was a boring, uneventful, successful migration”. The question is: why not?

Being proactive is just as important. Leaving a large migration to the last minute is procrastination and a cause of unneeded stress during a non-successful migration. The Meltdown/Spectre vulnerabilities are one example of a highly impactful event outside of your control that sidelined entire teams in many companies for months. Does an outage of your cloud provider impact your uptime requirements and force you to delay a last-minute migration due to customer SLA obligations? While being prepared is for the reasons you could think of, being proactive and prepared is for the situations you have not thought of.

Having solid architectural design practices will aid greatly in many critical business requirements of uptime, read-only mode, scale-out, scale-up, and sharding. These design patterns also greatly enhance the likelihood of a successful database migration.

We have also created a Checklist to cover the planning and execution of a migration. Any input is welcome.

You can check out the video podcast on YouTube or listen with your favorite podcast tool.

Defensive Data Techniques

As a data architect I always ensure that for any database schema change there is a fully recoverable execution path.
I have generally advised creating a patch/revert process for every change. For example, if a change adds a new column or index to a table, a revert script would remove the respective column or index.
The goal is to always have a defensive position for any change. The concept is that simple; it is not complex.

In its simplest form I use the following directory and file structure.

/schema
    schema.sql
    /patch
        YYYYMMDDXX.sql     where XX,ZZ are sequential 2 digit numbers, e.g. 01,02
        YYYYMMDDZZ.sql
    /revert
        YYYYMMDDXX.sql     This is the same file name in the revert sub-directory.
        YYYYMMDDZZ.sql

At any commit or tag in configuration management it is possible to create a current copy of the schema, i.e. use schema.sql.
It is also possible to take the first version of schema.sql and apply chronologically all the patch scripts to arrive at the same consistent structure of the schema that is in schema.sql. You can also run a validation process to confirm these are equivalent.
For each tagged version or commit of this directory structure and files in version control, this should always hold true.
While not the desired execution path, every revert script can be applied in reverse chronological order to return to the first version of the schema.
If you want to maintain a first_schema.sql file within the directory structure, you can always create any version of the schema from a given commit in a roll-forward or roll-back scenario.
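As a concrete illustration, here is a minimal, hypothetical patch and its matching revert. The table and column names are assumptions for the example only; the key point is that the revert is the exact inverse of the patch.

-- /schema/patch/YYYYMMDD01.sql
ALTER TABLE customer ADD COLUMN referral_code VARCHAR(20) NULL;
ALTER TABLE customer ADD KEY idx_referral_code (referral_code);

-- /schema/revert/YYYYMMDD01.sql
ALTER TABLE customer DROP KEY idx_referral_code;
ALTER TABLE customer DROP COLUMN referral_code;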

In reality, however, this is rarely implemented. There is always divergence or drift. Drift occurs for several primary reasons. The first is non-adherence to the defined process. The second and more critical is the lack of adequate testing and verification at each and every step. A Test Driven Development (TDD) approach that validates the given process would enable verification of the end state of the schema as well as verification at each accumulated patch along the way.

In addition to each patch/revert there needs to be a record of state maintained of what has been applied. Generally for an RDBMS, storing this metadata within a table is recommended.
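A minimal sketch of such a metadata table, with hypothetical names; a real framework would likely record additional detail such as a checksum or execution duration.

DROP TABLE IF EXISTS schema_patch_log;
CREATE TABLE schema_patch_log(
  patch_name  VARCHAR(100) NOT NULL,
  applied     DATETIME NOT NULL,
  PRIMARY KEY(patch_name)
) ENGINE=InnoDB CHARSET utf8;

-- Each successfully applied patch records itself, for example
INSERT INTO schema_patch_log (patch_name, applied) VALUES ('YYYYMMDD01.sql', NOW());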

The above example shows files with a .sql extension. Any schema management process also needs to cater for .sh, .py or other extensions to support more complex operations.
 
What about data changes? I would recommend that for all configuration information you follow the same management principles as for schema objects; that is, you have a patch to insert/update/delete data, and you have a revert script that can restore that data. Generally the complexity of the rollback process is a hurdle for developers/engineers. Having a framework is important to manage how data consistency is maintained. This framework could generate a statement to restore the data (e.g. a selective mysqldump), require a hand-crafted statement, or leverage the benefit of the RDBMS by storing the data in intermediate shallow tables.
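As a sketch of the intermediate shallow table approach, with hypothetical table, column and key names, the patch preserves the affected rows before changing them, and the revert restores them.

-- patch: preserve the affected rows, then change them
CREATE TABLE config_revert_YYYYMMDD02 AS
  SELECT * FROM app_config WHERE config_key = 'legacy_checkout';
DELETE FROM app_config WHERE config_key = 'legacy_checkout';

-- revert: restore the preserved rows and clean up
INSERT INTO app_config SELECT * FROM config_revert_YYYYMMDD02;
DROP TABLE config_revert_YYYYMMDD02;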

Using a least privileged model complicates an applicable framework approach. Does the user applying the change now require the FILE privilege, or the CREATE/DROP privilege to create tables, for the ability to restore data?
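For example, a deployment account that uses the intermediate table approach above would need grants along these lines; the account and schema names are hypothetical.

GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'deployer'@'%';
GRANT CREATE, DROP ON appdb.* TO 'deployer'@'%';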

If there is strict referential integrity at the database level, those protections will defend against unintended consequences, for example deleting a row that other rows depend on via a foreign key relationship. In normal operating systems, accommodations are generally made for the sake of performance, but also to support poor data cleansing requirements. If the application maintains a level of referential integrity, the schema management process also needs to support this, adding further complexity. Ensuring data integrity is an important separate topic. If there is a dangling row, what is the impact? The data still exists, it is just not presented in a user interface or included in calculations. This leads to greater unintended consequences that are generally never obvious at the time of execution, but rather days, weeks or months later.
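When referential integrity is maintained only by the application, dangling rows can at least be detected with an anti-join. A minimal sketch, assuming hypothetical parent and child tables:

SELECT c.*
FROM   child_table c
LEFT JOIN parent_table p ON p.parent_id = c.parent_id
WHERE  p.parent_id IS NULL;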

When it comes to objects within the structure of an RDBMS, the situation is more complex. A classic example in MySQL is a user. A user in MySQL is actually the user definition, which is just the username, password and host. A user has one or more grants. The user may be the owner of additional objects. Using default and legacy MySQL, it is simply not possible to determine if a user is actually being used. Percona Server and other variants support INFORMATION_SCHEMA.USER_STATISTICS, which is a better method of evaluating the use of a user. This does however require time-based data collection, as this table holds the cumulative statistics since an instance restart or flush.

With this type of object, or meta object, several defensive techniques exist.

If you had the user `blargie` and that user had grants to read data from several schemas, is the user used? “I don’t think so, let’s just delete it” is not a fact-based approach to avoiding a subsequent problem.
Is the user used? Let’s revoke the user’s privileges and monitor for errors or user feedback? Or let’s change the user’s password? With each of these strategies it is important to always have a defensive process to rollback.
A different approach is to use a common data technique of marking information as deleted before it’s physically deleted (think trash can before you empty the trash). For MySQL users there is no default functionality for this (in the most recent versions of MySQL you can lock an account). One implementation to apply this pattern is to rename the user, which has the benefit of keeping the user’s password and privileges intact, therefore reducing the amount of complexity in restoring.
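A minimal sketch of this rename approach; the host portion of the account is an assumption and should match the actual account definition.

-- Review what the account can currently do
SHOW GRANTS FOR 'blargie'@'%';

-- Soft delete: rename the account, keeping its password and privileges
RENAME USER 'blargie'@'%' TO 'deleted_blargie'@'%';

-- Defensive rollback path
RENAME USER 'deleted_blargie'@'%' TO 'blargie'@'%';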

Regardless of the technique, it is important there is always a recovery path. In a subsequent post I will discuss this approach towards cloud metadata, for example an AWS KMS policy, IAM Role or ASG setting, and the impact of Infrastructure as Code (IaC) tools such as Terraform.

More reading https://en.wikipedia.org/wiki/Test-driven_development, https://en.wikipedia.org/wiki/Defensive_programming

MySQL Data Security Risk Assessment presentation

Securing your data is only as good as your weakest link. A clear-text password in a file or history file, shared privileges between test and production, or open sudo access when you can connect as an unprivileged user are all security flaws. This talk discusses how to navigate the poor defaults MySQL has in place, how to strengthen processes, and how to audit your environment. It also covers the complexity of deploying changes in an always-available production environment.

Presented at the Data.Ops Conference in Barcelona, Spain.
Download slides

DP#8 The disadvantages of row at a time processing

It can be hard for software engineers to understand the following principle, however it is very important for improving performance and obtaining immediate scalability options. The principle is “Do Less Work”. That is, run fewer SQL statements.

Just one method of achieving the execution of fewer SQL statements is to eliminate Row At a Time (RAT) processing. In simple terms, do not perform identical repeating SQL statements in a loop. Relational algebra, and the Structured Query Language (SQL) specification, is specifically designed to work with sets of data, or as I describe it, Chunk At a Time (CAT) processing.

Customer Example

Your online social media website lets you send messages to multiple friends at one time. You enter the message, select the friends you want to receive the message and click send. While the user waits a moment and gets a success message, behind the scenes the application runs the following SQL statements to record your request.

START TRANSACTION;
INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a reminder. The poker game will start on Friday at 8pm.',NOW());
SELECT @message_id :=LAST_INSERT_ID();
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,16,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 16;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,18,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 18;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,99,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 99;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,21,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 21;
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES (@message_id,42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id = 62;
COMMIT;

You can define the table structures used in this example with:

DROP TABLE IF EXISTS dp8_message_sent;
CREATE TABLE dp8_message_sent(
  message_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id     INT UNSIGNED NOT NULL,
  message  VARCHAR(500) NOT NULL,
  created     DATETIME NOT NULL,
  PRIMARY KEY(message_id),
  KEY(user_id)
) ENGINE=InnoDB CHARSET utf8;
DROP TABLE IF EXISTS dp8_message_recipient;
CREATE TABLE dp8_message_recipient(
  message_id     INT UNSIGNED NOT NULL,
  from_user_id     INT UNSIGNED NOT NULL,
  to_user_id     INT UNSIGNED NOT NULL,
  status   ENUM('New','Read','Deleted') NOT NULL,
  PRIMARY KEY(message_id,to_user_id),
  KEY(from_user_id)
) ENGINE=InnoDB CHARSET utf8;
DROP TABLE IF EXISTS dp8_user_notification;
CREATE TABLE dp8_user_notification(
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  new_message ENUM ('Y','N') NOT NULL DEFAULT 'N',
  new_message_count INT UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY(user_id)
) ENGINE=InnoDB CHARSET utf8;

The average software developer may not see the problem here. In your test environment you executed 12 SQL statements and the code worked fine, i.e. it met the requirements for the function. However, while producing the correct result, this is a poor coding approach.

This example shows not one repeating query, but two. Lucky you only sent the message to a few friends. If you sent it to 200 friends, you would have significantly more SQL statements to execute; this time the code executes 402 SQL statements for the same feature. The response time to the user is longer, the application connection has to remain open longer, and the database has more work to do.

This popular site is sending thousands of messages per second, so the problem compounds to produce an excess of unnecessary work, not just for the database, but also for the application web server connections, as there are longer open requests.

The solution is straightforward. Remove repeating queries. It’s not rocket science. This is a simple design practice I teach, as the problem is evident on most consulting engagements. Popular products including Drupal and WordPress also implement this poor practice, and developers that extend these products propagate it excessively. If this development approach can be easily found in a few common functions, it is generally a clear indicator that this problem can be found throughout the code.

Here is the same operation performed efficiently.

START TRANSACTION;
INSERT INTO dp8_message_sent(message_id, user_id, message, created)
VALUES(NULL, 42, 'Hey guys. Just a better reminder. The poker game will start on Friday at 8pm.',NOW());
INSERT INTO dp8_message_recipient(message_id, from_user_id, to_user_id, status)
VALUES
(LAST_INSERT_ID(),42,16,'New'),
(LAST_INSERT_ID(),42,18,'New'),
(LAST_INSERT_ID(),42,99,'New'),
(LAST_INSERT_ID(),42,21,'New'),
(LAST_INSERT_ID(),42,62,'New');
UPDATE dp8_user_notification
SET    new_message = 'Y',
       new_message_count = new_message_count + 1
WHERE  user_id IN (16,18,99,21,62);
COMMIT;

No matter how many friends you send a message to, only 3 SQL statements are executed. In these queries we see two different examples of leveraging the set capabilities of SQL to perform chunk at a time processing. We discuss the benefits of the multi-values INSERT in more detail with DP#10.

Customer Example 2

The following is a simple example of an online store processing function. Your shipping provider provides an update of all packages that were processed by them for a given date. For each package tracking code that you have recorded with orders, they provide a last known status, for example whether the package was successfully delivered, is in transit, or has been returned.

A typical and very common developer process is to open the file, read each line looping through all the rows, and for each row perform a single update (without using transactions), like:

open file
for each line
do
  UPDATE dp8_order
  SET    last_shipping_status=?, last_shipping_update=?
  WHERE  tracking_code=?;
done
close file

As the size of data increases, so does the processing time, because you execute one statement per row. When there are 10 packages, there are 10 SQL statements; when there are 300,000 packages, there are 300,000 SQL statements.

This batch process does not have a user response time requirement like online applications, where performance is key to retaining your users. However, while eliminating row at a time processing is critical for providing a better user experience, it is just as important for batch processing. A better approach is to build one multi-row INSERT into a batch staging table and then apply a single set-based UPDATE:

stmt = 'INSERT INTO dp8_batch_tracking (batch_id, tracking_code, status, last_update) VALUES'
sep = ''

open file
for each line
do
  stmt = stmt + sep + '(42, ?, ?, ?)'
  sep  = ','
done
close file

START TRANSACTION;
EXECUTE IMMEDIATE stmt;
UPDATE dp8_order o, dp8_batch_tracking bt
SET    o.last_shipping_status=bt.status, o.last_shipping_update=bt.last_update
WHERE  bt.batch_id = 42
AND    bt.tracking_code = o.tracking_code;
-- DELETE FROM dp8_batch_tracking WHERE batch_id=42;
COMMIT;

This example removes the one query per row problem, and results in just 2 SQL queries for processing the file regardless of size.
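The staging table used in this example is not shown above; a minimal sketch of what it might look like follows, with the column types being assumptions.

DROP TABLE IF EXISTS dp8_batch_tracking;
CREATE TABLE dp8_batch_tracking(
  batch_id       INT UNSIGNED NOT NULL,
  tracking_code  VARCHAR(50) NOT NULL,
  status         VARCHAR(20) NOT NULL,
  last_update    DATETIME NOT NULL,
  PRIMARY KEY(batch_id, tracking_code)
) ENGINE=InnoDB CHARSET utf8;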

NOTE: In MySQL there is a limit to the length of an SQL statement (i.e. the INSERT), controlled by the max_allowed_packet variable. If you are processing very large files, the above code would be modified to perform the INSERT for every ‘n’ records, however only a single UPDATE is still required. See DP#10 for an example of using max_allowed_packet.

This example shows just one way to optimize this operation with the least amount of code changes to the existing application. An even better approach is to use the LOAD DATA INFILE syntax to populate the batch table directly. This requires additional SQL privileges and file system access, and hence is a more complex solution.
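A minimal sketch of the LOAD DATA INFILE alternative; the file path, column order and delimiters are assumptions about the provider's file format.

LOAD DATA INFILE '/tmp/shipping_update.csv'
INTO TABLE dp8_batch_tracking
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(tracking_code, status, last_update)
SET batch_id = 42;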

Why is the impact of removing these repeating queries so significant? To answer that question we need to look at the anatomy of the execution of an SQL statement.

SQL statement workflow

To the end user viewing your website with a browser, the result of clicking send on a webpage is a [short] delay before the expected results are displayed or the applicable action occurs. Behind the scenes an extensive amount of work is performed. For anybody that has looked at a waterfall chart showing the response from a web server, there is far greater complexity in rendering the page you are looking at. The following article gives a good introduction to browser waterfall graphs: http://www.webperformancetoday.com/2010/07/09/waterfalls-101/. While the browser may render hundreds of files, it is generally the first response, the actual page, that is involved in executing the necessary SQL statements, and that is the focus of this design practice.

When an HTTP request is made to a web container, the application performs a number of operations to satisfy the request and produce a response. With your application, regardless of the programming language, access to the MySQL database is performed by SQL statements. Each statement is passed to the language-specific MySQL connector required by your web container. For example, when using the Apache HTTP server and the PHP programming language, the MySQL Native Driver (mysqlnd) is the necessary MySQL connector. There are connectors for the popular languages including C, C++, Java, .Net, Python, Ruby etc.

Here is a short summarized list of what occurs with all SQL statements.

  1. The application executes an SQL statement.
  2. The MySQL client connector accepts the SQL statement then connects across the network to the specified MySQL server and passes the SQL statement to the MySQL server.
  3. The MySQL server processes all incoming SQL statements in individual threads, so many SQL statements can be executed concurrently.
  4. The MySQL server first parses the SQL statement for valid SQL syntax, and produces a summarized structure of the tables and columns used in the SQL statement.
  5. The MySQL server performs a security check to ensure the user that is requesting this SQL statement has the necessary privileges to be able to access/modify the information requested in the SQL statement.
  6. The MySQL server then passes the parsed SQL statement to the MySQL query optimizer. This is the heart of the decision-making process, where the cost-based optimizer creates a decision tree and evaluates the various options by pruning the expensive paths to produce the optimal Query Execution Plan (QEP).
  7. The MySQL server then passes the QEP to the applicable MySQL storage engine(s) to perform the physical work of storing and/or retrieving the data for the given SQL statement.
  8. Depending on the type of query, the MySQL server may have to do additional work, for example to join multiple tables, sort results etc.
  9. When the MySQL server has produced the results for the SQL statement, these results are sent back across the network to the application server.

NOTE: This is a simplified representation of the execution path of an SQL statement in MySQL. The use of the MySQL Query Cache discussed in QP#9 introduces additional steps and can also produce a significantly simplified and faster execution path.

To summarize, every SQL statement is passed to the MySQL server, and the network overhead of points 2 and 9 is the most expensive portion of time in a well-tuned MySQL application. This alone is the greatest reason to run fewer SQL statements.

Every SQL statement is parsed, checked for application permissions and optimized before execution. This is most applicable for example when combining INSERT statements with multiple VALUES clauses. In addition to saving the network round trip, this overhead is also eliminated by combining SQL statements.

Universal Application

This same principle can be applied to other products that process data. For example, memcache is a popular product to improve performance and scalability of your application by providing a memory caching layer. The following figures are for an example benchmark with 28 objects in memcache using two cloud servers in Rackspace Cloud.

Using an individual get call 28 times sequentially in a single PHP file, simulating a client example, the total response time of the benchmark ranged from 24 to 56 milliseconds. Using the same configuration with a single multi-get call for the same 28 objects, the results ranged from 4 to 7 milliseconds.

It does not require a graph to see the 6x-10x improvement in performance gained by eliminating row at a time processing. The saving of 20-50 milliseconds may seem small, however when multiplied in environments with thousands of concurrent users, thousands of times per second, it has a large impact on resources.

Recap

This principle shows a simple technique for reducing the number of SQL statements by eliminating repeating queries. In the goal of “Do Less Work”, this is only one case. DP#16 discusses several other query saving techniques that can eliminate repeating and unwanted queries, providing improved performance.


About Expert MySQL Design Practices

This new series by leading MySQL Expert Ronald Bradford helps the software engineer understand, appreciate and develop the right skills and techniques to build scalable software solutions. These proven and reproducible design practices will ensure your use of MySQL improves performance, scalability and reliability.

These expert design practices are from 25 years of professional experience following formal university qualifications in computer science. All of these practices are written for use with a MySQL based data system, however most of the content in these practices predates the existence of the MySQL product and has stood the test of time with emerging technologies and software development approaches. Many practices apply directly to other data stores, whether relational or the new NoSQL products, and include working with persistent and non-persistent data storage products.

More information about the series can be found at http://ronaldbradford.com/blog/expert-mysql-design-practices/

DP#4 The importance of using sql_mode

What if the data you retrieved from the database did not match the data the application claimed to have successfully stored? How comfortable would your organization feel about your skills and the products that are being used to store important information if data integrity was not guaranteed?

MySQL employs a terrible default technique known as silent truncation, where the product decides it knows your data better than you do. Never has the saying “do not assume” because it makes an “ass” out of “u” and “me” been more applicable.

Customer Example

An HTML form for new customers provides input fields for the customer's first and last name. Good design was applied to the HTML form with client-side validation to ensure that each field could not exceed 20 characters in length. However, the database design is different, where the first name is defined as only 10 characters. In most cases this is sufficient, however for first names longer than 10 characters, the data retrieved does not match the data that was apparently successfully stored, because there was no SQL error. The following SQL reproduces this situation.

DROP TABLE IF EXISTS dp4;
CREATE TABLE dp4 (
  customer_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name      VARCHAR(10) NOT NULL,
  last_name       VARCHAR(20) NOT NULL,
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB DEFAULT CHARSET utf8;

INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Evangeline','Jones');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','Smith');
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Alexander','Bell');

SELECT * FROM dp4;
+-------------+------------+-----------+
| customer_id | first_name | last_name |
+-------------+------------+-----------+
|           1 | Evangeline | Jones     |
|           2 | Christophe | Smith     |
|           3 | Alexander  | Bell      |
+-------------+------------+-----------+
3 rows in set (0.00 sec)

As you can see, the first name of Christopher Smith is not actually correctly stored in the database. MySQL DID NOT produce an error message, rather it performed a silent truncation of the data.

Defining sql_mode

To obtain the level of data integrity you should expect with MySQL, you must define the sql_mode configuration option. The following example demonstrates the dynamic syntax for a given connection and the error you should now expect.

SET SESSION sql_mode='STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

TRUNCATE TABLE dp4;
INSERT INTO dp4 (customer_id, first_name, last_name)
          VALUES (NULL,'Christopher','James');
ERROR 1406 (22001): Data too long for column 'first_name' at row 1
SELECT * FROM dp4;
Empty set (0.00 sec)

When MySQL is first installed the following configuration option should always be added to all environments as a default.

$ cat /etc/my.cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION

Starting with MySQL 5.6, the default configuration files created by MySQL define the sql_mode variable with STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION. This is a step in the right direction.
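One way to verify what a running server is actually enforcing is to query the variable directly.

SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;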

For more information, refer to the MySQL Reference Manual for sql_mode at http://dev.mysql.com/doc/refman/5.6/en/server-sql-mode.html

NOTE: MySQL provides many different options with sql_mode. Careful consideration is needed to determine which options are best for your application. Some options help in providing syntax and compatibility with other database products however these can affect and even break existing products written specifically for MySQL.

MySQL Warnings

The underlying cause of this loss of data integrity is how MySQL handles success and error conditions with SQL statements. There are the obvious success and failure states, however MySQL has a third state known as warnings, or more specifically, success with warnings. As the use of warnings is uncommon with other data store products, many applications, developers and programming languages ignore checking for warnings, or are simply unaware of this inbuilt feature.

Using the MySQL command line client, you get a visual indication of warnings following an SQL statement, which highlights the need to review what warnings occurred.

SET SESSION sql_mode='';
INSERT INTO dp4 (customer_id, first_name, last_name)
VALUES (NULL,'Christopher','Smith');
Query OK, 1 row affected, 1 warning (0.00 sec)

SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level   | Code | Message                                         |
+---------+------+-------------------------------------------------+
| Warning | 1265 | Data truncated for column 'first_name' at row 1 |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)

When using PHP there is no indication of SQL warnings unless you specifically check after every SQL statement. For example:

<?php
  $con = mysqli_connect('localhost', 'scott', 'sakila', 'design');
  if (mysqli_connect_errno()) {
    print 'Failed to connect to MySQL: ' . mysqli_connect_error() . "\n";
    exit(1);
  }

  if (!mysqli_query($con, 'INSERT INTO dp4 (customer_id, first_name, last_name) ' .
                          'VALUES (NULL,"Christopher","Holt") ')) {
    print 'Failed to insert data: ' . mysqli_error($con) . "\n";
  }
  if (($warnings = mysqli_warning_count($con)) > 0) {
    if ($rs = mysqli_query($con, "SHOW WARNINGS")) {
        $row = mysqli_fetch_row($rs);
        printf("%s (%d): %s\n", $row[0], $row[1], $row[2]);
        mysqli_free_result($rs);
    }
  }
  mysqli_close($con);
?>

The best recommendation is to avoid all situations where MySQL can produce a warning, as these are the situations where MySQL does not provide the best possible data integrity.

Refer to the MySQL Reference Manual for more information on SHOW WARNINGS at
http://dev.mysql.com/doc/refman/5.6/en/show-warnings.html

The Larger Problem

This underlying problem is actually more difficult to correct for an existing production system than simply enabling the correct sql_mode configuration option. Using the customer example, any row whose first name is exactly 10 characters in length could be valid, or may have been truncated. There is no easy way to obtain the actual value that was originally submitted. The use of the correct numerical data type (DP#14) can provide a check constraint for values, however it can also suffer from the same truncation problem. You especially hope that this does not affect your payroll, your frequent flyer points balance or your accumulated score from your favorite online game.
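A simple query can at least identify the candidate rows that may have been truncated, using the table from this example.

SELECT customer_id, first_name, last_name
FROM   dp4
WHERE  CHAR_LENGTH(first_name) = 10;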

The solution is to avoid the problem of producing incorrect data.

Review

While this example uses a character data type, field truncation can also occur with numeric and date data types. The use of applicable sql_mode configuration settings is a critical MySQL design practice to ensure adequate data integrity, and one that all systems need to implement.
