Using MySQL Table Checksum

The MySQL Table Checksum, part of the MySQL Toolkit (having to be renamed soon) is an invaluable community tool for use with MySQL. Most sites or installations of any volume will use MySQL Replication for one purpose or another, be it for read scalability, backups, testing, upgrading etc.

Why is it needed?
There are two primary compelling reasons. First, MySQL replication is an asynchronous process and there is no absolute guarantee that the Master Database and the Slave Database are the same (By definition that can be different). Second, MySQL does not provide any tools relating to checking, managing, reporting differences. Luckily the community has addressed this present lack of product feature in current versions of MySQL. Special thanks to Baron who has made this happen.

Should I be concerned?
Yes, you should. While some people will not like that I’ve made this alarming comment, a lot of organizations use MySQL Replication for backups. It would be unprofessional to be ignorant to make the assumption.

How to I get it?
Download MySQL Table Checksum. It is recommended that you download all components in the toolkit as there are many valuable tools then just this one.

MySQL Table Checksum will require Perl which is generally always installed with any Linux distribution by default (For Windows you need to do yourself). You will also require the Perl Modules DBI and DBD::mysql. Use CPAN for easy installation of these modules.

These are the steps I did (NOTE: this is a specific version, you should always check for newer versions)

wget http://internap.dl.sourceforge.net/sourceforge/mysqltoolkit/mysqltoolkit-1204.tar.gz
tar xvfz mysqltoolkit-1204.tar.gz
cd mysqltoolkit-1204/bin
./mysql-table-checksum

If everything is installed correctly, you should see.

Usage: mysql-table-checksum [OPTION]... HOST [HOST...]

Errors in command-line arguments:
  * No hosts specified.

mysql-table-checksum checksums MySQL tables efficiently on one or more HOSTs.
Each HOST is specified as a DSN and missing values are inherited from the first
HOST.  If you specify multiple HOSTs, the first is assumed to be the master.
For more details, please use the --help option, or try 'perldoc
mysql-table-checksum' for complete documentation.

How do I use it?
Start with the documentation, RTFM

perldoc mysql-table-checksum

There are a number of ways to run MySQL Table Checksum and different means of using arguments etc. This is what I do.

By Default:

./mysql-table-checksum u=root,p=sakila,h=localhost

DATABASE TABLE       CHUNK HOST      ENGINE      COUNT       CHECKSUM TIME WAIT STAT  LAG
xxx      table1          1 localhost InnoDB       NULL     1678710928    0    0 NULL NULL
xxx      table2          1 localhost InnoDB       NULL     3023415523    0    0 NULL NULL
xxx      table3          1 localhost InnoDB       NULL     1692517818    0    0 NULL NULL
xxx      table4          1 localhost InnoDB       NULL              0    0    0 NULL NULL
xxx      table5          1 localhost InnoDB       NULL     2295061143    0    0 NULL NULL
xxx      table6          1 localhost InnoDB       NULL     2238111875    0    0 NULL NULL
xxx      table7          1 localhost InnoDB       NULL      823770692    0    0 NULL NULL
xxx      table8          1 localhost InnoDB       NULL     2313561225    3    0 NULL NULL
xxx      table9          1 localhost InnoDB       NULL     3524358173    0    0 NULL NULL
xxx      table10         1 localhost InnoDB       NULL              0    0    0 NULL NULL

The output will provide a checksum via the MySQL CHECKSUM function. As this doesn’t provide a row count, the COUNT column is NULL.

A better method is to use the –replicate function. This provides the results into a Database Table (handy for lots of things) as well as getting table counts. This requires a pre-requisite table. For this example I’ve added it to the test database schema.

$ mysql -uroot -psakila test
mysql > CREATE TABLE checksum (
                db         char(64)     NOT NULL,
                tbl        char(64)     NOT NULL,
                chunk      int          NOT NULL,
                boundaries char(64)     NOT NULL,
                this_crc   char(40)     NOT NULL,
                this_cnt   int          NOT NULL,
                master_crc char(40)         NULL,
                master_cnt int              NULL,
                ts         timestamp    NOT NULL,
                PRIMARY KEY (db, tbl, chunk)
             );

My execution for this using this table

./mysql-table-checksum u=root,p=sakila,h=localhost --replicate=test.checksum

DATABASE TABLE    CHUNK HOST      ENGINE      COUNT                                 CHECKSUM TIME WAIT STAT  LAG
xxx      table1       1 localhost InnoDB       2236 cdd6689dbb23ff547540561c4815c717b3d01bf3    0 NULL NULL NULL
xxx      table2       1 localhost InnoDB       3157 f4d538a7a83168acb2cf3374a6edc4949809e723    0 NULL NULL NULL
xxx      table3       1 localhost InnoDB         99 daa0d8403feb200beac5db6456e92c86de6a9b84    0 NULL NULL NULL
xxx      table4       1 localhost InnoDB          0                                     NULL    0 NULL NULL NULL
xxx      table5       1 localhost InnoDB       1006 118b996e10f76153b786479b15f134c08cd21b62    0 NULL NULL NULL
xxx      table6       1 localhost InnoDB    2499397 6a60e07f45d5980bb5de3bd75e1b2ce1e5be25b8   29 NULL NULL NULL
xxx      table7       1 localhost InnoDB      11879 00929c2d6b9278f585b6b7346816bb7a0e483b9e    0 NULL NULL NULL
xxx      table8       1 localhost InnoDB        675 e8e17f9ed15986965ff678ca0c26cd5397ea272d    0 NULL NULL NULL
xxx      table9       1 localhost InnoDB    1730498 373e7fb3fec77b1d0edcfb6a5e28619e9c91d5e4   25 NULL NULL NULL
xxx      table10      1 localhost InnoDB    8557931 7913803bf65bbcf18e679255c657dba53045b88e   99 NULL NULL NULL

The screen output is great, I actually run the following command keeping the output. See later for more info,

./mysql-table-checksum u=root,p=sakila,h=localhost > checksum.mysql4.replicate.`date +%y%m%d.%H%M` &

The table contents are a little different.

mysql> select * from checksum limit 1G
*************************** 1. row ***************************
        db: xxx
       tbl: table20
     chunk: 1
boundaries: 1=1
  this_crc: f7c4d19f6342105f97a4b2770ee7b9094e57a6f2
  this_cnt: 2236
master_crc: f7c4d19f6342105f97a4b2770ee7b9094e57a6f2
master_cnt: 2236
        ts: 2007-11-08 15:37:16

What is most impressive is that in a Master/Slave environment these commands are all replicated, and a simple comparison of data in this table on each slave will confirm if your database is in sync. The docs also confirm, the structure of the tables are not compared so a slave may indeed have some changes.

Migration?
I’ve been able to use the checksum to test a MySQL 4 to MySQL 5 migration. I’ve been able to confirm via checksums as an initial verification that the data has indeed been loaded correctly.

The issue was using the log files for comparison between to instances is the TIME column value changed. As the format of the rows is spaced, it was not possible to easily cut as each schema name was a different length. A small inconvience for the benefit in the results.

I have found however that optimizing the schema in 5, e.g. INT for BIGINT and TIMESTAMP for DATETIME did cause the checksums to fail.

Conclusion
This is a quick introduction, this tool has a lot to offer and is only one in a whole toolkit.

Comments

  1. says

    Can you elaborate on this please?

    “What is most impressive is that in a Master/Slave environment these commands are all replicated, and a simple comparison of data in this table on each slave will confirm if your database is in sync.”

    How do you do the comparison?

    thanks,

    Keith

  2. Lou says

    I remember when you came to help us out with our dbs and you made the comment about replication not ensuring data integrity between databases. All the checks we did previously didn’t come up with any problems, but just this week I found our own little replication issue:

    http://bugs.mysql.com/bug.php?id=41344&thanks=2&notify=67

    Good to know that at least there’s starting to be tools out there to watch for this.